By Jim


2008-12-03 04:25:27 8 Comments

I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.

18 comments

@Marcio Martins 2019-02-08 14:08:05

I believe you are not taking accented Latin and other Unicode characters into account in your matches. For example, if you need to match "ã" or "ü", "\w" won't work in flavors where it is ASCII-only.

You can, alternatively, use this approach:

^[A-ZÀ-Ýa-zà-ý0-9_]+$

Hope it helps!
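A quick way to see what those ranges actually cover is a small Python sketch (Python is only my choice for illustration, and is_latin1_word is a made-up helper name). À-Ý spans U+00C0 to U+00DD and à-ý spans U+00E0 to U+00FD, so the class also lets × and ÷ through and leaves out ß and ÿ:

```python
import re

# Same class as the answer above, written with \u escapes so the code points
# are explicit: À-Ý is U+00C0..U+00DD and à-ý is U+00E0..U+00FD.
LATIN1_WORD = re.compile(r"[A-Z\u00C0-\u00DDa-z\u00E0-\u00FD0-9_]+")


def is_latin1_word(text: str) -> bool:
    """Return True if every character of text is in the class (hypothetical helper)."""
    return LATIN1_WORD.fullmatch(text) is not None


print(is_latin1_word("S\u00e3o_Paulo1"))  # ã (U+00E3) is inside à-ý
print(is_latin1_word("a\u00d7b"))         # × (U+00D7) is inside À-Ý, so it is accepted
print(is_latin1_word("stra\u00dfe"))      # ß (U+00DF) falls between the two ranges
```

If the multiplication and division signs matter for your input, the ranges would need to be split around them.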

@Mukund 2017-11-14 15:50:30

^\w*$ will work for the combinations below: 1, 123, 1av, pRo, av1

@Danuel O'Neal 2012-01-31 13:38:39

In Computer Science, an alphanumeric value often means that the first character is not a number but a letter or an underscore. After that, each character can be 0-9, A-Z, a-z, or underscore (_).

Here is how you would do that:

Tested under PHP:

$regex = '/^[A-Za-z_][A-Za-z\d_]*$/';

or take this

^[A-Za-z_][A-Za-z\d_]*$

and place it in your development language.
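For instance, dropped into Python it could look like the sketch below. The helper name is_identifier is made up for illustration, and I pass re.ASCII on the assumption that \d should stay limited to 0-9 as the answer describes:

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, or underscores. re.ASCII keeps \d limited to 0-9 (an assumption
# about the intent; without it Python's \d also matches other Unicode digits).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z\d_]*", re.ASCII)


def is_identifier(text: str) -> bool:
    """Return True if the whole string is identifier-like (hypothetical helper)."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("_tmp1"))  # starts with an underscore
print(is_identifier("1abc"))   # starts with a digit
```

fullmatch takes the place of the ^ and $ anchors here.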

@Saurabh 2015-05-20 13:02:51

This works for me; you can try [\p{Alnum}_]

@Day Davis Waterbury 2012-06-09 22:53:02

Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:

^[[:alnum:]_]+$

However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:

^\w+$

Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non-ASCII strings. Range-based regexes won't, since they rely on the underlying ordering of the ASCII characters, which may differ in other character sets, and will therefore exclude some non-ASCII characters (letters such as œ) that you might want to capture.

@Agustin 2012-04-03 14:57:40

For those of you looking for unicode alphanumeric matching, you might want to do something like:

^[\p{L} \p{Nd}_]+$

Further reading at http://unicode.org/reports/tr18/ and at http://www.regular-expressions.info/unicode.html
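Not every engine understands \p{...}. Python's standard re module, as far as I know, does not, so here is a rough sketch of the same idea using unicodedata instead. Note the assumptions: is_unicode_word is a made-up name, and I left out the space that appears inside the character class above, so only letters, decimal digits and underscore are accepted.

```python
import unicodedata


def is_unicode_word(text: str) -> bool:
    """Approximate a letters/decimal-digits/underscore check (hypothetical helper).

    A character passes if its general category starts with "L" (any letter),
    is exactly "Nd" (decimal digit), or if it is the underscore.
    """
    if not text:
        return False  # mirrors the + quantifier: at least one character
    for ch in text:
        category = unicodedata.category(ch)
        if ch != "_" and not category.startswith("L") and category != "Nd":
            return False
    return True


print(is_unicode_word("кукареку_1"))
print(is_unicode_word("a-b"))
```

Combining marks (category Mn) are rejected by this sketch, so decomposed accented text would fail.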

@Agustin 2012-04-04 02:38:36

If you just want Latin, use \p{Latin} instead of \p{L}.

@Shantanu 2012-01-11 00:52:41

Try these multi-lingual extensions I have made for string.

IsAlphaNumeric - String must contain at least 1 alpha (letter in the Unicode range specified in charSet) and at least 1 number (specified in numSet). Also, the string should consist only of alphas and numbers.

IsAlpha - String should contain at least 1 alpha (in the language charSet specified) and consist only of alphas.

IsNumeric - String should contain at least 1 number (in the language numSet specified) and consist only of numbers.

The charSet/numSet range for the desired language can be specified. The Unicode ranges are available at the link below:

http://www.ssec.wisc.edu/~tomw/java/unicode.html

API :

    public static bool IsAlphaNumeric(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";
        const string numSet = @"0-9";

        //Greek
        //const string charSet = @"\u0388-\u03EF";            
        //const string numSet = @"0-9";

        //Bengali
        //const string charSet = @"\u0985-\u09E3";
        //const string numSet = @"\u09E6-\u09EF";

        //Hindi
        //const string charSet = @"\u0905-\u0963";
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet + @"]+$").Success;
    }

    public static bool IsNumeric(this string stringToTest)
    {
        //English
        const string numSet = @"0-9";

        //Hindi
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^[" + numSet + @"]+$").Success;
    }

    public static bool IsAlpha(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";

        return Regex.Match(stringToTest, @"^[" + charSet + @"]+$").Success;
    }

Usage :

        //English
        string test = "AASD121asf";

        //Greek
        //string test = "Ϡϛβ123";

        //Bengali
        //string test = "শর৩৮";

        //Hindi
        //string test = @"क़लम३७ख़";

        bool isAlphaNum = test.IsAlphaNumeric();

@Shah 2012-01-11 13:37:50

What about only letters?

@Shantanu 2012-04-20 03:27:20

@Shah: I have added letters-only (and numbers-only) versions too.

@mylesmckeown 2010-06-24 09:25:57

For me there was an issue in that I want to distinguish between alpha, numeric and alpha numeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric, I used :

^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$
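One thing to watch with a pattern shaped like this: alternation binds more loosely than the anchors, so ^ applies only to the left branch and $ only to the right one. A small Python sketch of the difference (GROUPED is my own variant, not part of the answer):

```python
import re

# Pattern exactly as posted: ^ belongs to the first alternative only,
# $ belongs to the second alternative only.
ORIGINAL = re.compile(r"^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$")

# Hypothetical variant: a non-capturing group makes both anchors apply
# to both alternatives.
GROUPED = re.compile(r"^(?:([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+)$")

print(bool(ORIGINAL.search("abc1!!!")))  # the left branch matches the "abc1" prefix
print(bool(GROUPED.search("abc1!!!")))   # the trailing "!!!" is now rejected
print(bool(GROUPED.search("1abc")))
```

Even when grouped, a string such as a1b is rejected, because each repetition must be letters followed by digits (or the reverse).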

@Aniket kale 2018-12-24 09:45:44

Exactly what I want... Thanks

@boooloooo 2010-11-12 18:20:07

Use lookaheads to do the "at least one" stuff. Trust me, it's much easier.

Here's an example that would require 1-10 characters, containing at least one digit and one letter:

^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$

NOTE: could have used \w but then ECMA/Unicode considerations come into play increasing the character coverage of the \w "word character".
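To try the pattern out, here is a minimal Python sketch (the function name has_letter_and_digit is just for illustration). Each lookahead scans ahead from the start without consuming anything, and the final class with {1,10} does the real matching:

```python
import re

# (?=.*\d)        somewhere ahead there is a digit
# (?=.*[A-Za-z])  somewhere ahead there is a letter
# [A-Za-z0-9]{1,10}  the string itself: 1 to 10 letters or digits
PATTERN = re.compile(r"^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$")


def has_letter_and_digit(text: str) -> bool:
    """Return True if text passes the lookahead pattern (hypothetical helper)."""
    return PATTERN.search(text) is not None


print(has_letter_and_digit("a1b"))     # mixed order is fine
print(has_letter_and_digit("abcdef"))  # no digit
print(has_letter_and_digit("123456"))  # no letter
```

Because the order of letters and digits is not fixed, inputs like a1b pass without any extra alternation.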

@Krishna Prasad 2014-09-17 09:01:43

Thank you very much...

@Rahi 2015-09-30 11:26:17

How would we do this if we wanted to add _ and - to the list?

@Jean-Denis Muys 2009-07-10 08:56:41

Matching diacritics in a regexp opens a whole can of worms, especially when taking Unicode into consideration. You might want to read about POSIX locales in particular.

@Big 2017-05-29 17:29:20

Can you please provide a link or a little explanation?

@kch 2008-12-05 05:25:04

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (unless we introduce unicode to the mix)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.
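How far \w reaches depends on the flavor. As one concrete case, in Python 3 it is Unicode-aware by default for str patterns, and the re.ASCII flag narrows it to [A-Za-z0-9_]. A small sketch (only_word_chars is a made-up helper):

```python
import re

UNICODE_WORD = re.compile(r"\w+")          # Python 3 default for str patterns
ASCII_WORD = re.compile(r"\w+", re.ASCII)  # \w limited to [A-Za-z0-9_]


def only_word_chars(text: str, ascii_only: bool = False) -> bool:
    """Return True if text consists solely of \\w characters (hypothetical helper)."""
    pattern = ASCII_WORD if ascii_only else UNICODE_WORD
    return pattern.fullmatch(text) is not None


print(only_word_chars("abc_123"))
print(only_word_chars("кукареку"))                   # accepted by the Unicode-aware \w
print(only_word_chars("кукареку", ascii_only=True))  # rejected once \w is ASCII-only
```

Swap + for * in both patterns if the empty string should pass as well.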

@tchrist 2012-06-10 05:09:58

\w isn’t usually restricted to ASCII alone.

@Alex 2017-09-11 18:21:45

English is not the only language in the world, so this should be the accepted answer, not the [a-z] and its variations. \w will capture non-latin characters too. Like šēēā or кукареку

@Lupus Ossorum 2017-09-18 05:14:33

Why is this answer not higher in the list?

@guidotex 2018-11-16 19:30:13

Validated on page 318 of the O'Reilly "Mastering Regular Expressions"

@Jay 2008-12-03 04:31:51

The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl
use strict;
use warnings;

# string to validate, taken from the first command-line argument
my $arg1 = $ARGV[0] // '';

# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
  print "Failed.\n";
} else {
  print "Success.\n";
}

@BenAlabaster 2008-12-03 04:35:41

The pattern in your code is correct, but the pattern above only checks a single instance.

@Jay 2008-12-03 04:46:23

That was intentional; the code sample was meant to clarify usage when actually checking a string. That is also why the code has the beginning- and end-of-line markers, which are not in the regex example.

@Jay 2008-12-03 05:04:20

@Windows programmer - not sure if you're just trying to be humorous or clever, but alphanumeric specifically refers to the Latin alphabet and Arabic numerals, so it wouldn't include ñ or any of the other special chars you've referenced in the comments here.

@Windows programmer 2008-12-03 06:41:37

When did ñ stop being Latin?

@Jan Goyvaerts 2008-12-03 07:48:43

@Jay: I think your answer would be a lot clearer if the regex above the source code snippet was the proper regex, rather than a partial regex. People who don't know Perl will look at your regex, but not at the Perl snippet.

@Jay 2008-12-05 04:55:35

@Windows programmer - en.wikipedia.org/wiki/Alphanumeric - latin alphabet, not "latin character set" which is what includes diacritics etc. Purely a semantics issue, but I personally go with the common usage of the term alphanumeric as A-Z and 0-9.

@Jay 2008-12-05 04:56:21

@Jan - added the full regex anyway, though there's already an accepted answer so it probably doesn't matter. Helps if people specify the language they're working in in the first place so we don't have to guess ;)

@Windows programmer 2008-12-05 05:57:34

ñ is a letter of the alphabet in Spanish, including in Latin America.

@Windows programmer 2008-12-05 06:02:04

"I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores" doesn't limit it to Latin letters. "The following regex matches alphanumeric characters and underscore" doesn't limit it to Latin letters. "^[a-zA-Z0-9_]+$" fails.

@Charlie 2008-12-03 04:33:50

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.

EDIT As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.
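One more detail worth checking in your own language is what $ means exactly. In Python, for example, $ also matches just before a trailing newline, so the anchored pattern accepts "abc\n". A minimal sketch of the explicit form that avoids this by using fullmatch (is_valid is a made-up name):

```python
import re

# Explicit character class from the breakdown above; fullmatch requires the
# whole string to match, so no ^ or $ is needed.
WORD_CHARS = re.compile(r"[a-zA-Z0-9_]*")


def is_valid(text: str) -> bool:
    """Return True if text has only letters, digits, underscores (hypothetical helper)."""
    return WORD_CHARS.fullmatch(text) is not None


# With ^...$ and re.match, a single trailing newline slips through in Python.
print(bool(re.match(r"^[a-zA-Z0-9_]*$", "abc\n")))
print(is_valid("abc\n"))
print(is_valid(""))  # the * quantifier allows the empty string
```

Other engines treat $ differently, so this is an observation about Python rather than about the pattern itself.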

@Windows programmer 2008-12-03 06:42:35

If you ever go to Germany or if you ever see just about any German text you'll see what I'm saying.

@Jan Goyvaerts 2008-12-03 07:45:35

\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.

@Trejkaz 2011-10-24 22:24:23

The original question did say "upper and lowercase letters", so it would seem that "letters" from non-Latin scripts should match.

@tchrist 2012-06-10 05:09:36

[\p{upper}\p{lower}\p{gc=Number}_] is all you need to do this right, presuming there are no combining characters.

@Induster 2012-07-31 19:50:50

I've seen this in many places, but it still allows the '$' character for me. All other special characters are blocked that I've tested so far.

@Chris Harrison 2013-02-19 05:14:25

I get "No ending delimiter '^' found", when I use this pattern with preg_match

@Charlie 2013-02-20 14:37:12

It looks like preg_match requires your pattern to be enclosed with delimiters, which are normally slashes. So you would need "/^[a-zA-Z0-9_]*$/". See this question for more info: stackoverflow.com/questions/6445133/…. See also this page: forums.phpfreaks.com/topic/…

@doug65536 2013-10-05 17:45:20

What's going on with all the up-votes? This is not correct. It only works for English. If you are going to make an edit, EDIT it. Don't add on an "Edit:", just make it correct.

@JohnMerlino 2014-02-09 21:22:23

I like how you broke down the regular expressions too

@SomeRandomDeveloper 2014-09-08 15:34:42

Upvote for actually breaking down and explaining the pattern! Well done!

@jlaverde 2015-05-29 18:07:50

@heisenberg YES. x100. I took formal languages a few years ago and this brought it all back.

@unknown6656 2015-09-11 14:33:02

what about characters like "öäüßÿ...." --> Characters in other languages, which have accents etc.?

@Sandburg 2018-08-30 07:44:49

+ doesn't work in some grep implementations. The supported syntax is limited, so be careful.

@BenAlabaster 2008-12-03 04:31:41

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

This matches one or more upper- or lowercase alphanumeric characters or underscores. If the string can be zero length, then just replace the + with *:

^[A-Za-z0-9_]*$

Edit:

If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:

^\w+$

Or

^\w*$

@BenAlabaster 2008-12-03 05:54:21

Well now that you mention it, I also missed a whole bunch of other French characters...

@Jan Goyvaerts 2008-12-03 07:49:29

\w is the same as [\w] with less typing effort

@BenAlabaster 2008-12-03 14:30:48

Yeah, you still need the + or * and the ^ and $ - \w just checks that it contains word characters, not that it only contains word characters...

@Induster 2012-07-31 19:51:57

Oddly, this still allows the $ sign.

@Sebas 2016-04-09 02:02:13

@Induster, it's because of what BenAlabaster just pointed out
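The "contains" versus "contains only" distinction is easy to reproduce. Here is a small Python sketch with two made-up helpers: an unanchored \w succeeds as soon as one word character is found, while the whole-string version rejects the $ signs.

```python
import re


def has_word_char(text: str) -> bool:
    """True if text contains at least one \\w character anywhere (hypothetical helper)."""
    return re.search(r"\w", text) is not None


def all_word_chars(text: str) -> bool:
    """True if text consists only of \\w characters (hypothetical helper)."""
    return re.fullmatch(r"\w+", text) is not None


print(has_word_char("pa$$word"))   # unanchored: the "p" is enough
print(all_word_chars("pa$$word"))  # whole-string check: "$" is not a word character
```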

@Anton 2008-12-03 05:08:09

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then, to indicate that the entire string must match, you use:

^

to indicate that the string must start with that character, and

$

to indicate that the string must end with that character. Then use

\w+ or \w*

to indicate "1 or more" or "0 or more". Putting it all together, we have:

^\w*$

@Jan Goyvaerts 2008-12-03 07:45:01

\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.

@mson 2008-12-03 04:44:06

Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

[^a-zA-Z0-9 _]{1,255}
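Note that a ^ right after the opening bracket negates the class, so as written this matches runs of characters that are NOT letters, digits, space or underscore. A Python sketch, assuming the intent was to accept 1 to 255 letters, digits or underscores (BOUNDED and is_bounded_word are my own names, not part of the answer):

```python
import re

# As posted: a negated class, matching characters outside the listed set.
NEGATED = re.compile(r"[^a-zA-Z0-9 _]{1,255}")

# Assumed intent: 1 to 255 letters, digits, or underscores, whole string.
BOUNDED = re.compile(r"[a-zA-Z0-9_]{1,255}")


def is_bounded_word(text: str) -> bool:
    """Return True if text is 1..255 allowed characters (hypothetical helper)."""
    return BOUNDED.fullmatch(text) is not None


print(bool(NEGATED.fullmatch("abc")))  # letters are excluded by the negation
print(bool(NEGATED.fullmatch("!!!")))  # punctuation is what the negated class matches
print(is_bounded_word("a" * 255))
print(is_bounded_word("a" * 256))      # one character too long
```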

@Drew Hall 2008-12-03 04:31:17

How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).

@David Norman 2008-12-03 04:33:10

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$
