By Vitor Silva

2008-10-02 10:53:17 8 Comments

How can I check if a given string is a valid URL address?

My knowledge of regular expressions is basic and doesn't allow me to choose from the hundreds of regular expressions I've already seen on the web.


@Ravi Matani 2018-01-23 18:35:47

The expression below works for all popular domains. It accepts the following URLs:

In addition, it turns a URL inside a message into a link,
e.g. please visit
In the example above, the URL becomes a hyperlink.

if (new RegExp("([-a-z0-9]{1,63}\\.)*?[a-z0-9][-a-z0-9]{0,61}[a-z0-9]\\.(com|com/|org|gov|cm|net|online|live|biz|us|uk|||in||int|info|edu|mil|ca|co||org/|gov/|cm/|net/|online/|live/|biz/|us/|uk/|||in/||int/|info/|edu/|mil/|ca/|co/|[-\\[email protected]\\+\\.~#\\?*&/=% ]*)?$").test(strMessage) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?[-\\[email protected]\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[-\\[email protected]\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage))) {
  if (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?$").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) {
    var url1 = /(^|<|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|||in||int|info|edu|mil|ca|co|\s|>|$)/g;
    var html = $.trim(strMessage);
    if (html) {
      html = html.replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');
    }
    returnString = html;
    return returnString;
  } else {
    var url1 = /(^|&lt;|\s)(www\..+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|||in||int|info|edu|mil|ca|co|[^,\s]*)(\s|&gt;|$)/g,
      url2 = /(^|&lt;|\s)(((https?|ftp):\/\/|mailto:).+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|||in||int|info|edu|mil|ca|co|[^,\s]*)(\s|&gt;|$)/g,
      url3 = /(^|&lt;|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|||in||int|info|edu|mil|ca|co|[^,\s]*)(\s|&gt;|$)/g;

    var html = $.trim(strMessage);
    if (html) {
      html = html.replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3').replace(url2, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="$2">$2</a>$5').replace(url3, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');
    }
    returnString = html;

    return returnString;
  }
}

@AmerllicA 2020-01-25 06:04:57

Without any more description:

const url = /^((http(s?)?):\/\/)?([wW]{3}\.)?[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?$/g;
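One caveat when reusing this pattern, sketched under the assumption that it is called through `test()`: the trailing `g` flag makes the regex object stateful, so repeated calls on the same object can flip between true and false. The sample URL below is only an illustration.

```javascript
const url = /^((http(s?)?):\/\/)?([wW]{3}\.)?[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?$/g;

// With the g flag, test() resumes from url.lastIndex on the next call.
const first = url.test("https://www.example.com");  // matches, lastIndex moves to the end
const second = url.test("https://www.example.com"); // search starts at the end, so ^ cannot match

// Rebuilding the pattern without flags keeps every call independent.
const stateless = new RegExp(url.source);
const third = stateless.test("https://www.example.com");
const fourth = stateless.test("https://www.example.com");

console.log(first, second, third, fourth);
```

For plain validation, dropping the `g` flag is probably the simplest fix.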

@Toto 2020-01-25 11:04:05

Please don't post the same (wrong) answer multiple times; instead, flag the question for closing as a duplicate.

@AmerllicA 2020-01-25 12:19:57

Thanks, dear @MartijnPieters, for clarifying how to answer multiple questions about the same concept. I will do just as you said in your comments on my deleted posts. Thanks.

@AmerllicA 2020-01-25 12:29:54

Dear @Toto, I fixed the error in the code and, as thanks, I left an upvote on one of your answers. Good luck.

@Kerem 2019-05-08 09:09:09

If you would like to apply a stricter rule, here is what I have developed:

function isValidUrl(input) {
    var regex = /^(((H|h)(T|t)(T|t)(P|p)(S|s)?):\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,100}\.[a-zA-Z]{2,10}(\/([-a-zA-Z0-9@:%_\+.~#?&//=]*))?/
    return regex.test(input)
}

@Dragana Le Mitova 2019-05-02 12:19:50


Detects Urls like these:



@Dragana Le Mitova 2019-05-02 15:27:27

It detects valid URLs in a string.

@Dragana Le Mitova 2019-10-19 13:55:24

I improved it, you can use it with .pl sites now.

@Akiva 2020-02-13 16:10:37

Your pattern fails to work on

@Dragana Le Mitova 2020-02-14 01:46:15

There is a proof that it works. You can use this site link to learn and test your regex.

@Dmytro Huz 2019-05-09 09:54:24

Here is a good rule that covers all possible cases: ports, params, etc.


@Ray Li 2020-01-07 21:07:03

Unfortunately, this regex does not cover params containing a URL-encoded value such as

@Mahfuzur Rahman 2018-11-26 12:10:31

I think this is a very simple way, and it works very well.

var hasURL = (str) =>{
	// "\\w" so the regex sees \w; a bare "\w" inside a string literal collapses to "w"
	var url_pattern = new RegExp("(www.|http://|https://|ftp://)\\w*");
	if(!url_pattern.test(str)){
		document.getElementById("demo").innerHTML = 'No URL';
	}else{
		document.getElementById("demo").innerHTML = 'String has a URL';
	}
}
<p>Please enter a string and test it has any url or not</p>
<input type="text" id="url" placeholder="url" onkeyup="hasURL(document.getElementById('url').value)">
<p id="demo"></p>

@Elie G. 2018-12-10 04:40:15

Your regex doesn't work at all bro. All it validates is that your string contains either www immediately followed by one character (any character since you haven't escaped the .) or http:// or https:// or ftp:// and any of these can be followed by any alphanumeric characters. So, in other words, all the following strings would result as being valid but they are obviously not valid urls : www., www▓, £¢¤£¢¤www¢ (See on regex101). You could have used a shorter regex: (www.|(https?|ftp)://)\w*. (This is still not a good regex btw)
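To make the point above concrete, here is a small sketch that runs the three strings from this comment through both the original pattern and the shorter one. Both are written as regex literals here, and `example.com` is an extra sample added as an assumption, to show a plausible URL that is rejected.

```javascript
// The answer's pattern and the shorter equivalent suggested in this comment.
const original = /(www.|http:\/\/|https:\/\/|ftp:\/\/)\w*/;
const shorter = /(www.|(https?|ftp):\/\/)\w*/;

// The first three strings come from the comment; "example.com" is an added sample.
const samples = ["www.", "www▓", "£¢¤£¢¤www¢", "example.com"];

const results = samples.map((s) => ({
  input: s,
  original: original.test(s),
  shorter: shorter.test(s),
}));

console.log(results);
```

The three odd strings are accepted because the pattern is unanchored and the dot after `www` matches any character, while `example.com` is rejected because it has neither `www` nor a protocol.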

@Mahfuzur Rahman 2018-12-10 05:34:57

Obviously www., www▓ and £¢¤£¢¤www¢ are not valid URLs. But I think those are not meaningful strings either. I was just trying to simplify the URL pattern. @DrunkenPoney

@Elie G. 2018-12-10 16:35:28

My goal wasn't to write meaningful strings but to show that weird strings would be accepted and anyway since your regex validate for www I suppose you don't necessarily need the protocol to be specified but your regex wouldn't allow urls like Moreover, one of the problems I was trying to show you is that your regex matches wherever the validation parts (www, http, ...) are in the string. You could at least specify that your string needs to start with it.

@Elie G. 2018-12-10 16:38:04

And if you want a quick regex to validate url but is not 100% safe here is one I made which I used to extract the different parts from an url but can be used to validate that a string contains the base parts of an url.

@eyelidlessness 2008-10-10 07:23:01

I wrote my URL (actually IRI, internationalized) pattern to comply with RFC 3987. These are in PCRE syntax.

For absolute IRIs (internationalized):


To also allow relative IRIs:

/^(?:[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4]
[0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FD
F0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0
-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{
2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email 
protected]])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i

How they were compiled (in PHP):


/* Regex convenience functions (character class, non-capturing group) */
function cc($str, $suffix = '', $negate = false) {
    return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;
}
function ncg($str, $suffix = '') {
    return '(?:' . $str . ')' . $suffix;
}

/* Preserved from RFC3986 */

$ALPHA = 'a-z';
$DIGIT = '0-9';
$HEXDIG = $DIGIT . 'a-f';

$sub_delims = '!\\$&\'\\(\\)\\*\\+,;=';
$gen_delims = ':\\/\\?\\#\\[\\]@';
$reserved = $gen_delims . $sub_delims;
$unreserved = '-' . $ALPHA . $DIGIT . '\\._~';

$pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);

$dec_octet = ncg(implode('|', array(
    cc($DIGIT),
    cc('1-9') . cc($DIGIT),
    '1' . cc($DIGIT) . cc($DIGIT),
    '2' . cc('0-4') . cc($DIGIT),
    '25' . cc('0-5')
)));

$IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}');

$h16 = cc($HEXDIG, '{1,4}');
$ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);

$IPv6address = ncg(implode('|', array(
    ncg($h16 . ':', '{6}') . $ls32,
    '::' . ncg($h16 . ':', '{5}') . $ls32,
    ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,
    ncg(ncg($h16 . ':', '{0,1}') . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,
    ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,
    ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,
    ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,
    ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,
    ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',
)));

$IPvFuture = 'v' . cc($HEXDIG, '+') . '\\.' . cc($unreserved . $sub_delims . ':', '+');

$IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]';

$port = cc($DIGIT, '*');

$scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*');

/* New or changed in RFC3987 */

$iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}';

$ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' .
    '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' .
    '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' .
    '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' .
    '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' .
    '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}';

$iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar;

$ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));

$ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*');

$iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*');

$isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');
$isegment_nz = ncg($ipchar, '+');
$isegment = ncg($ipchar, '*');

$ipath_empty = '(?!' . $ipchar . ')';
$ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*');
$ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*');
$ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )
$ipath_abempty = ncg('\\/' . $isegment, '*');

$ipath = ncg(implode('|', array(
))) . ')';

$ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims), '*');

$ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));
$iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');
$iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');

$irelative_part = ncg(implode('|', array(
    '\\/\\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_noscheme . '',
    '' . $ipath_empty . ''
)));

$irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');

$ihier_part = ncg(implode('|', array(
    '\\/\\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_rootless . '',
    '' . $ipath_empty . ''
)));

$absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?');

$IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');

$IRI_reference = ncg($IRI . '|' . $irelative_ref);

Edit 7 March 2011: Because of the way PHP handles backslashes in quoted strings, these are unusable by default. You'll need to double-escape backslashes except where the backslash has a special meaning in regex. You can do that this way:

$escape_backslash = '/(?<!\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[A-Z]|)/';
$absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI);
$IRI = preg_replace($escape_backslash, '\\\\', $IRI);
$IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference);

@Peter Di Cecco 2010-01-06 19:27:02

If you think that's bad, you should see the one for e-mail:

@Gumbo 2010-07-08 10:57:01

HTTP URLs have no authentication information. This is only valid for FTP URLs.

@eyelidlessness 2010-07-08 15:05:10

@Gumbo, it's allowed in the spec and used in URI implementations for HTTP applications. It's discouraged (for obvious reasons) but perfectly valid and should be anticipated. Most (if not all?) browsers sometimes translate HTTP authentication into the URL for subsequent access.

@Donal Fellows 2010-08-29 16:13:56

I don't suppose you have an RE that also fully validates the hostname part of the URL, only accepting names that are valid hostnames in the network location where the browser is currently located? ;-)

@eyelidlessness 2010-08-29 16:51:27

@Donal, I'm not sure regex supports that... ;)

@ntziolis 2011-03-07 10:25:15

I have seen lots of web pages use the authentication in http urls.

@CyberJunkie 2011-06-07 03:46:29

what the.. is that a regex for valid URLs??

@eyelidlessness 2011-06-07 03:54:19

@CyberJunkie, yes it is. The answer provides a link to the RFC, code for how it was compiled, and an explanation about how to use it in PHP. Did you downvote? If so, could you explain why?

@Devin G Rhode 2011-10-16 23:24:40

@eyelidlessness I think it would be appreciated if you wrapped that in a function that simply takes in the IRI string and returns true (valid) or false (invalid). Then we could include your code in a separate file and just call your function.

@eyelidlessness 2011-10-17 00:26:44

@Devin, in a function in what language? I compiled it in PHP, but it can be used in other languages. Should I write a function in all of those languages? Alternately, it would be pretty simple for you to do the same in a language of your choosing.

@Devin G Rhode 2011-10-17 01:10:59

Perhaps I post a question specifically on wrapping your code in functions in various languages? I think that would keep things organized.

@joshcomley 2011-11-22 15:40:32

Thanks for posting this answer - the only thing I'm finding is that in RegexBuddy it's not working; things like \x{10000}-\x{1FFFD} are causing trouble. Any ideas?

@bruha 2012-02-13 01:51:57

@joshcomley replace \x{ABCD} to \uABCD, if you write it in JS
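As a rough illustration of that conversion, under the assumption that the target is a JavaScript engine with ES2015 support: `\uXXXX` covers code points up to U+FFFF, while the ranges above U+FFFF need the `\u{...}` form together with the `u` flag. The two ranges below are taken from the `ucschar` list in the answer; the variable names are made up for the example.

```javascript
// PCRE [\x{A0}-\x{D7FF}] becomes a plain \uXXXX range in JavaScript.
const bmpRange = /[\u00A0-\uD7FF]/;

// PCRE [\x{10000}-\x{1FFFD}] needs \u{...} and the u flag in JavaScript.
const astralRange = /[\u{10000}-\u{1FFFD}]/u;

const bmpHit = bmpRange.test("\u00E9");          // U+00E9 is inside A0-D7FF
const astralHit = astralRange.test("\u{1F600}"); // U+1F600 is inside 10000-1FFFD
const asciiMiss = astralRange.test("a");         // U+0061 is outside the range

console.log(bmpHit, astralHit, asciiMiss);
```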

@velop 2013-11-22 12:26:23

Is http://com a correct URL? Because it is passing with your regex. And furthermore, isn't a u modifier missing at the end for Unicode support (at least in PHP)?

@eyelidlessness 2013-11-22 17:18:49

Yes, http://com is a valid URL. http://localhost is, why wouldn't other words be? You are correct that the u modifier is necessary in PHP. I want to be clear that while I generated these with PHP, they are not meant to be PHP-specific.

@Hamid Sarfraz 2014-01-28 00:58:07

thanks @eyelidlessness. I have an issue. This Regex is passing http://? as a valid URL. Is it so?

@aliteralmind 2014-04-10 01:18:11

This answer has been added to the Stack Overflow Regular Expression FAQ, under "Common Validation Tasks".

@Hans 2015-06-23 22:19:04

@eyelidlessness This regex erroneously allows | in querystrings. Eg| matches. I think it's because of the stray | in $iprivate

@eyelidlessness 2015-06-24 04:42:55

@Hans, why wouldn't | be allowed in a query string? It is. It matches the %xF0000-FFFFD range of iprivate. The | in the $iprivate variable is a regex special character that means OR. See

@Hans 2015-06-24 13:07:31

@eyelidlessness Per RFC 3987 IRI Syntax unicode char VERTICAL LINE u+007C is not allowed anywhere in IRI's at all, in fact. The | in $iprivate represents a literal, NOT an alternation operator, since it's enclosed in a character class.

@eyelidlessness 2015-06-26 05:36:05

@Hans, while I see now that the | is treated as a literal as you say, I see nothing in the spec disallowing the character anywhere, and it does in fact match the iprivate range %xF0000-FFFFD. The pipe is allowed, and there's no reason it shouldn't be.

@Hans 2015-06-26 19:51:42

@eyelidlessness Why do you think u+007c is in the range u+F0000-u+FFFFD? If you need further convincing, just test /[\x{F0000}-\x{FFFFD}]/u against | to observe that it does not match. If still not convinced, take a look at IRI validators across various languages such as Python's rfc3987 package or .NET's Uri.IsWellFormedUriString method with IRI support enabled. None of them allow for |. See sample results here
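The check suggested above can also be reproduced in JavaScript, assuming an engine that supports the `u` flag; this is only a sketch of the same test, with `\u{...}` standing in for PCRE's `\x{...}`. The variable names are invented for the example.

```javascript
// JavaScript spelling of /[\x{F0000}-\x{FFFFD}]/u from the comment above.
const iprivatePlane15 = /[\u{F0000}-\u{FFFFD}]/u;

const pipeCodePoint = "|".codePointAt(0);                   // numeric code point of VERTICAL LINE
const pipeMatches = iprivatePlane15.test("|");               // is the pipe inside the range?
const privateUseMatches = iprivatePlane15.test("\u{F0000}"); // first code point of the range

console.log(pipeCodePoint.toString(16), pipeMatches, privateUseMatches);
```

The pipe is U+007C, far below U+F0000, so the character class does not match it.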

@eyelidlessness 2015-06-26 21:58:43

@Hans, I apologize, you are correct. I was very quickly trying to verify by converting the pattern to JS to test in my console, because I don't have a PHP environment to test in anymore. But I was not paying attention to converting the character classes correctly. I guess I was surprised because there's really no obvious reason that a pipe would be disallowed. Thanks for the correction.

@Hans 2015-06-26 22:25:23

@eyelidlessness No worries. Thanks for updating the answer. BTW, is an excellent tool for testing both pcre and js regex's.

@avgvstvs 2016-06-05 14:57:52

When running these through a regex debugger, the amount of backtracking in these regexes is horrific. While it's certainly not a valid url,佐贺诺伦-^ńörén.jpg takes 6862 steps to match a failure, and parses the entire string 15x before it finally returns a negative answer. The regex might work, but if your regex engine relies on backtracking (pretty much anything other than grep) this could cause a performance bottleneck.

@user 2017-02-24 17:01:42

Does anyone have a C# version of this?

@RokL 2017-02-28 17:59:10

@avgvstvs it could be optimized by adding possessive quantifiers.

@avgvstvs 2017-03-01 13:45:31

@UMad On the ESAPI project we encountered a problem like this for a different regex. Execution time with possessives only netted an improvement from 86s to parse the URL, to 52s. Still WAY too long. I used the tool regexbuddy and it demonstrated the backtracking behavior I brought up here. My bigger point: URL validation should be handled by writing a grammar and generating a parser, not regex.

@RokL 2017-03-02 17:01:28

@avgvstvs Another option is to use regex implementation that doesn't use backtracking, but rather generates a DFA.

@avgvstvs 2017-03-09 16:53:36

@UMad, or even better, Thompson NFA: (Some benchmarks provided about 3/4 of the way down)

@avgvstvs 2017-03-09 16:54:47

This question was "non constructive," but there's an answer here discussing drawbacks with DFAs as well:

@アレックス 2017-06-28 04:55:27

1) does your regexp support right-to-left spelling? 2) does it support all unicode allowed in IRI's?

@Evert 2017-08-24 13:59:44

Strongly suggest you learn about the /x modifier for regexes and add comments.

@me_ 2017-10-01 19:26:02

this matches "h t t p :" but doesn't match "google . com" which is the most common way people quickly write urls...

@Nodarii 2019-04-02 06:41:18


live demo:

I have tested various expressions to match my requirements.

As a user I can hit browser search bar with following strings:

valid urls

invalid urls

@Sajeeb Chandan 2019-02-14 12:26:52


You can use this pattern for detecting URLs.

Following is the proof of concept

RegExr: URL Detector

@Elie G. 2018-12-10 18:31:57

Here is a regex I made which extracts the different parts from an URL:


((?:https?|ftp):\/\/?)?(group 1): extracts the protocol
([^:/\s.]+\.[^:/\s]|localhost)(group 2): extracts the hostname
(:\d+)?(group 3): extracts the port number
((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?(groups 4 & 5): extracts the path part
([^#]+)?(group 6): extracts the query part
(#[\w-]+)?(group 7): extracts the hash part

For every part of the regex listed above, you can remove the ending ? to force it (or add one to make it facultative). You can also remove the ^ at the beginning and $ at the end of the regex so it won't need to match the whole string.

See it on regex101.

Note: this regex is not 100% safe and may accept some strings which are not necessarily valid URLs but it does indeed validate some criterias. Its main goal was to extract the different parts of an URL not to validate it.
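The full pattern is not shown above, so here is a sketch that simply concatenates the listed fragments in order and wraps them in the `^` and `$` anchors the answer mentions. The helper name `splitUrl`, the property names, and the sample URL are assumptions made for illustration.

```javascript
// Fragments copied from the group list above, in order.
const fragments = [
  String.raw`((?:https?|ftp):\/\/?)?`,                  // group 1: protocol
  String.raw`([^:/\s.]+\.[^:/\s]|localhost)`,           // group 2: hostname
  String.raw`(:\d+)?`,                                  // group 3: port
  String.raw`((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?`,       // groups 4 & 5: path
  String.raw`([^#]+)?`,                                 // group 6: query
  String.raw`(#[\w-]+)?`,                               // group 7: hash
];
const urlParts = new RegExp("^" + fragments.join("") + "$");

// Hypothetical helper: returns the captured parts, or null when nothing matches.
function splitUrl(input) {
  const m = urlParts.exec(input);
  if (!m) return null;
  return {
    protocol: m[1], hostname: m[2], port: m[3],
    directory: m[4], file: m[5], query: m[6], hash: m[7],
  };
}

const parts = splitUrl("http://localhost:8080/a/b/file.txt?x=1#top");
console.log(parts);
```

As the note says, this is an extraction aid rather than a validator, so the groups are best checked against the kinds of URLs you actually expect.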

@Laurie Stearn 2019-04-10 09:22:36

Thanks. The group approach to these answers is best. Here's hoping for updates following the direction of this article linked on the next page, and a revision of the "not 100% safe". A quantification like 99.9% is enough for most readers. :P

@Matthew O'Riordan 2011-11-22 22:38:54

I've just written up a blog post for a great solution for recognizing URLs in most used formats such as:

The regular expression used is:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

@cmroanirgo 2012-12-01 07:11:09

This is good, but fails in a few spots (as @Matthew ack's in his comments in his blog): and and

@Jaime Cham 2013-03-15 08:58:21

That one also works, but it's missing support for the port number (useful in debugging). The modified version would be /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

@Oliver Moran 2013-05-05 20:17:00

Without http:// before it, the above does recognise the URL of this page. Depends on expected usage but strikes me as weak in many cases.

@RobH 2013-07-10 09:28:39

This Regex doesn't handle links with parenthesis in them: e.g.

@Anthony 2013-08-08 17:04:38

Shouldn't the dot be escaped after www?

@machineghost 2014-01-31 00:01:51

Perhaps they run their code against unit tests, and those unit tests contain strings they thought up which look like URLs but aren't?

@Cas Bloem 2014-02-07 15:29:51

Got another match mate: width:210px; and margin:3px

@Steve P 2014-03-25 10:10:09

another match: "dot:." the main search it is looking for is 3-9 characters, followed by a colon ":" then any character or number from a-z && 0-9 + others :) @whiteb0x if you are looking for a regex validator check here.

@Imran 2015-04-03 20:53:06

Can this be easily modified to avoid matching a trailing dot, as in ""?

@Gustav 2015-06-27 18:07:30

Doesn't match ""...?

@Chad Brown 2016-05-31 21:45:35

Big issue with this one as it also matches javascript:alert(0);

@basit raza 2016-12-29 06:40:21

It also matches the email address pattern, e.g. [email protected]

@Salem Artin 2018-01-26 22:21:02

This regex does not work for URLs that contain a comma For example:,456

@Mr_and_Mrs_D 2018-02-23 12:45:03

Link broken fix please

@icecub 2019-10-15 06:43:27

This regex also matches string:string. So it will match test:P or test:DDd. Should probably fix that..

@DroidOS 2012-10-24 12:06:38

This is a rather old thread now, and the question asks for a regex-based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs, I eventually settled on another way to do things: using PHP's parse_url function.

It returns boolean false if the URL cannot be parsed. Otherwise, it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own, but the result can be drilled down into for further analysis. If the intent is simply to catch typos, invalid schemes etc., it is perfectly adequate!
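For comparison, the same parse-then-inspect idea can be sketched in JavaScript. This is not PHP's parse_url: it assumes the built-in WHATWG URL class (browsers and Node.js), and the helper name parseUrlParts is made up for illustration. Like the approach above, it returns false when parsing fails and otherwise hands back the parts for further checks.

```javascript
// Hypothetical helper: a JavaScript analogue of the parse-then-inspect approach.
// Assumes the built-in WHATWG URL class; this is not PHP's parse_url.
function parseUrlParts(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch (e) {
    return false; // the string could not be parsed as an absolute URL
  }
  return {
    scheme: parsed.protocol.replace(/:$/, ""),
    host: parsed.hostname,
    port: parsed.port,
    path: parsed.pathname,
    query: parsed.search,
    fragment: parsed.hash,
  };
}

const parts = parseUrlParts("https://example.com:8080/docs?page=2#intro");

// A caller can then drill down, e.g. reject unexpected schemes or empty hosts.
const looksUsable =
  parts !== false &&
  ["http", "https"].includes(parts.scheme) &&
  parts.host !== "";
```

The parse step only says the string has URL shape; the follow-up checks are where typos and unwanted schemes get caught.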

@Erick Maynard 2018-09-26 00:20:20

Interestingly, none of the answers above worked for what I needed, so I figured I would offer my solution. I needed to be able to do the following:

  • Match http(s)://,,, and
  • Match Github markdown style links like [Google](
  • Match all possible domain extensions, like .com, or .io, or .guru, etc. Basically anything between 2-6 characters in length
  • Split everything into proper groupings so that I could access each part as needed.

Here was the solution:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$/

This gives me all of the above requirements. You could optionally add the ability for ftp and file if necessary:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https|ftp|file):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$/
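A rough sketch of how the groupings can be read back in JavaScript. It uses the first pattern, written as a complete regex literal; the helper name splitLink and the sample strings are assumptions for illustration only.

```javascript
// The http/https pattern from this answer, as a complete JavaScript literal.
const linkPattern = /^(\[[A-z0-9 _]*\]\()?((?:(http|https):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$/;

// Hypothetical helper that names the capture groups.
function splitLink(text) {
  const m = linkPattern.exec(text);
  if (!m) return null;
  return {
    markdownPrefix: m[1], // "[Label](" for markdown-style links, else undefined
    url: m[2],            // scheme (if any) plus host
    scheme: m[3],         // "http" or "https", else undefined
    markdownSuffix: m[4], // ")" for markdown-style links, else undefined
  };
}

const plain = splitLink("www.example.io");
const markdown = splitLink("[Example](https://example.com)");
const withPath = splitLink("https://example.com/path"); // null: the pattern has no path part
```

Note that the pattern stops at the top-level domain, so anything with a path, query or port is rejected rather than split.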

@Hank Gay 2008-10-02 11:14:58

Non-validating URI-reference Parser

For reference purposes, here's the IETF Spec: (TXT | HTML). In particular, Appendix B. Parsing a URI Reference with a Regular Expression demonstrates how to parse a URI reference with a regex. This is described as,

for an example of a non-validating URI-reference parser that will take any given string and extract the URI components.

Here's the regex they provide:


As someone else said, it's probably best to leave this to a lib/framework you're already using.
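For anyone who wants to try it, here is a small JavaScript sketch built on the expression given in RFC 3986, Appendix B (slashes escaped for a regex literal). The helper name splitUriReference is an assumption; the group numbers follow the RFC's own description (scheme = $2, authority = $4, path = $5, query = $7, fragment = $9).

```javascript
// Regular expression from RFC 3986, Appendix B, written as a JavaScript literal.
const uriReference = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: every part of the pattern is optional, so exec() always
// returns a match and the function never needs a null check.
function splitUriReference(text) {
  const m = uriReference.exec(text);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// The example used in the RFC itself.
const example = splitUriReference("http://www.ics.uci.edu/pub/ietf/uri/#Related");

// Junk input still "parses": the whole string lands in the path component.
const junk = splitUriReference("<<<>>>");
```

This is why it is a parser and not a validator: it splits whatever it is given and leaves the judgement to the caller.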

@Alex D 2013-04-13 19:39:44

Completely useless. Can someone show me a string which this regex does not match? (Both "#?#?#" or "<<<>>>" match. What kind of URIs are those?)

@Hank Gay 2013-07-18 14:07:51

@AlexD Don't complain to me. That's the official specification for a URI. Take it up with the IETF if you don't like it.

@andyg0808 2013-12-13 10:12:10

@AlexD I think those might be considered relative references. See RFC 3986, section 4.2.

@Alex D 2013-12-13 18:34:48

@andyg0808, you may be right, but the fact remains that this regex matches virtually any string under the sun.

@Evan Carroll 2018-08-27 05:47:47

This is not a good answer because it's not validating, as per the question. It's parsing. Those are two different functions. If you give this regex trash, it tries to parse it. If the URL isn't valid, the parsing isn't guaranteed to work.

@Evan Carroll 2018-08-27 05:52:19

@AlexD this isn't a validating regex it's just a parsing regex.

@Laurie Stearn 2019-04-10 10:04:46

@Evan Carroll: Anything can be parsed according to some criteria. Feed any regex on this page with a string, and where it doesn't parse to a valid URL, it's an invalid URL by assertion. Then trialing the result validates the regex assertion. You're right, the answer says Non-validating URI-reference Parser "for reference purposes", which might be included in an answer to something like this thread, and then cross-linked.

@Wayne Werner 2020-03-19 21:54:17

@AlexD According to Python's urllib.parse.urlparse(), an entirely valid URI: ParseResult(scheme='', netloc='', path='', params='', query='', fragment='?#?#'). Just because it's useless doesn't mean it's invalid.

@Nik Kov 2018-07-27 15:02:42

The best regex I've found is: /(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/gi

For iOS Swift: (^|\\s)((https?:\\/\\/)?[\\w-]+(\\.[\\w-]+)+\\.?(:\\d+)?(\\/\\S*)?)

Found here
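A minimal sketch of using the JavaScript version to pull URL-like tokens out of free text. The helper name findUrls and the sample sentences are assumptions; group 2 is used because group 1 only captures the leading whitespace.

```javascript
// The pattern from this answer, unchanged.
const urlInText = /(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/gi;

// Hypothetical helper: collect group 2 (the candidate URL) from every match.
function findUrls(text) {
  return Array.from(text.matchAll(urlInText), (m) => m[2]);
}

const found = findUrls("see https://example.com/a?b=1 and www.test.org now");

// Because the scheme is optional, anything shaped like word.word is accepted,
// including version numbers.
const falsePositive = findUrls("version 1.5 released");
```

So it works well for linkifying chat-style text, but it should not be treated as a validator.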

@dev_khan 2018-03-07 14:44:14

After rigorous searching, I finally settled on the following:


This should also work for URLs in general going forward.

@Divya-Systematix 2017-08-30 17:20:08

I hope it's helpful for you...


@diazdeteran 2018-04-08 23:20:44

You're allowing for more than two forward slashes (/) in http or https and \w already includes the other two w and any digit. You probably meant this: ^(http|https):\/\/\w+\.\w+(\/\w+)?

@IT Eng - BU 2018-01-01 09:08:52

As far as I have found, this expression is good for me-


Working example-

function RegExForUrlMatch() {
  var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})/g;

  var regex = new RegExp(expression);
  // Read the current value of the text input below
  var t = document.getElementById("url").value;

  if (t.match(regex)) {
    document.getElementById("demo").innerHTML = "Successful match";
  } else {
    document.getElementById("demo").innerHTML = "No match";
  }
}
<input type="text" id="url" placeholder="url" onkeyup="RegExForUrlMatch()">

<p id="demo">Please enter a URL to test</p>

@jeffkmeng 2018-03-28 23:28:38

Not all urls start with www or don't have a subdomain.

@Keng 2008-10-02 17:53:23

Here's what RegexBuddy uses.


It matches these below (inside the ** ** marks):


You can download RegexBuddy at

@toohool 2008-10-02 18:00:19

What about gopher? Poor, forgotten gopher.

@PandaWood 2010-11-13 07:18:13

Your regex doesn't match any url I can come up with - including those you've included. I paste your regex into and it says "Forward slashes must be escaped." Is there a typo or can you clarify by getting it to work at

@Keng 2010-11-15 14:39:38

@PandaWood that's because you need to format for Ruby. What is Ruby's escape character?

@PandaWood 2010-11-22 01:28:27

Hi Keng, even if I copy your exact RegEx above into RegexBuddy, I can't match it on any URL. I guess there's something gone amiss in the markup. Ruby regex is hardly any different at this basic syntax level.

@Keng 2010-11-22 04:26:51

@PandaWood wait...if you have REB just go to the library and grab it. that's where i got it...check to see if they are the same.

@jpillora 2013-01-16 00:16:25

As a JavaScript RegExp literal: /\b(https?|ftp|file):\/\/[\-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[\-A-Za-z0-9+&@#\/%=~_|]/
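A quick sketch of how that literal behaves, with a made-up helper name (firstUrl) and made-up sample strings. The final character class leaves out punctuation such as "." and ",", so a trailing full stop is not swallowed, while a comma inside the URL is kept.

```javascript
// The RegExp literal from the comment above.
const schemeUrl = /\b(https?|ftp|file):\/\/[\-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[\-A-Za-z0-9+&@#\/%=~_|]/;

// Hypothetical helper: return the first URL found in the text, or null.
function firstUrl(text) {
  const m = schemeUrl.exec(text);
  return m ? m[0] : null;
}

const withComma = firstUrl("Visit http://example.com/page,456.");
const noScheme = firstUrl("plain words without a scheme");
```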

@Keng 2014-03-21 18:26:29

@Mahesh Chand thanks for the update; the edit got rejected so I couldnt get a moderator to reinstate the enhancement. I think it got rejected because the reviewers would rather see code changes in comments and then let the OP add it. I made the update though. thanks.

@Salem Artin 2018-01-26 22:22:48

The good thing about this regex is that it matches URLs with commas. For example:,456

@Michael Foukarakis 2018-05-30 08:30:46

This regex not only does not match many valid URIs, but also matches anything like [-A-Za-z0-9+&@#/%?=~_|!:,.;], which is of course nothing like a URI. I suggest deletion.

@Teejay 2018-09-17 10:42:52

This matches nearly everything... useless

@Synetech 2019-09-09 17:21:46

@toohool, at least gopher lasted longer than archie. :-\ (Do people still finger? 🤔)

@tk_ 2017-09-07 06:00:39

How about this:


These are the test cases:

Test cases

You can try it out in here :

@tk_ 2018-05-04 12:24:37

@downvoter can you let me know what is the issue with this regex? because I'm using this in my production setup :P

@AndroidDev 2017-08-17 06:43:02

This is not a regular expression but accomplishes the same thing (Javascript only):

function isAValidUrl(url) {
  try {
    // The URL constructor throws a TypeError if the string cannot be parsed
    new URL(url);
    return true;
  } catch(e) {
    return false;
  }
}
@Ali Habibzadeh 2017-12-06 14:43:38

The problem with this is that h ttp://bla is a valid URL (the space between h and t is so SO doesn't make it an actual URL)
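One way to tighten this, sketched under assumptions: keep the URL constructor for parsing, then add a scheme allow-list and require a dot in the host name. The function name isHttpUrl and both extra rules are illustrative choices, not something the URL API prescribes; the dot rule in particular would also reject hosts like localhost or IPv6 literals.

```javascript
// Hypothetical stricter check layered on top of the URL constructor.
function isHttpUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch (e) {
    return false; // not parseable as an absolute URL
  }
  // Assumed policy: only http(s), and the host must contain at least one dot.
  const schemeOk = parsed.protocol === "http:" || parsed.protocol === "https:";
  return schemeOk && parsed.hostname.includes(".");
}

const accepted = isHttpUrl("https://example.com/x");
const bareHost = isHttpUrl("http://bla");
const scriptScheme = isHttpUrl("javascript:alert(0)");
```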

@Blair Conrad 2008-10-02 10:59:09

The post Getting parts of a URL (Regex) discusses parsing a URL to identify its various components. If you want to check if a URL is well-formed, it should be sufficient for your needs.

If you need to check if it's actually valid, you'll eventually have to try to access whatever's on the other end.

In general, though, you'd probably be better off using a function that's supplied to you by your framework or another library. Many platforms include functions that parse URLs. For example, there's Python's urlparse module, and in .NET you could use the System.Uri class's constructor as a means of validating the URL.

@S.p 2013-07-18 04:47:04

For me, the best regular expression for a URL would be:

"(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?"

@rektide 2014-02-02 22:25:42

this seems to be limited w/r/t number of domains it'll accept?

@James Kuang 2014-02-03 23:19:30

Thanks! Here's the escaped version that worked for me on iOS: (([\\w]+:)?//)?(([\\d\\w]|%[a-fA-f\\d]{2,2})+(:([\\d\\w]|%[a-fA-f\\d]{2,2})+)?@)?([\\d\\w][-\\d\\w]{0,253}[\\d\\w]\\.)+[\\w]{2,4}(:[\\d]+)?(/([-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)*(\\?(&?([-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})=?)*)?(#([-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)?

@ndm13 2017-05-05 20:25:13

This regex only matches suffixes up to 4 characters long and fails on IP addresses (v4 and v6), localhost, and domain names with foreign characters. I would recommend editing your inclusion size ranges and replacing \w with \p{L} at a minimum.

@Yoav Feuerstein 2017-08-31 03:59:00

Note that this RegEx doesn't capture URLs that have subdomains of one letter only, like "". In order to fix that, I had to change ([\d\w][-\d\w]{0,253}[\d\w]\.)+ into ([\d\w][-\d\w]{0,253}[\d\w]?\.)+ (add a question mark near the end of it)

@maxspan 2016-10-07 01:56:06

There are various options for matching a URL, depending on your requirements. Below are a few.



And there is a link which gives you more than 10 different variations of validation for URL.

@ctwheels 2016-09-13 20:34:37

I created a similar regex (PCRE) to the one @eyelidlessness provided, following RFC3987 along with other RFC documents. The main differences between @eyelidlessness's regex and mine are readability and URN support.

The regex below is all one piece (instead of being mixed with PHP) so it can be used in different languages very easily (so long as they support PCRE)

The easiest way to test this regex is to use regex101 and copy paste the code and test strings below with the appropriate modifiers (gmx).

To use this regex in PHP, insert the regex below into the following code:

$regex = <<<'EOD'
// Put the regex here
EOD;

To match a link without a scheme (e.g. [email protected]), replace this section:


with this:


Note, however, that by replacing this, the regex does not become 100% reliable.

Regex (PCRE) with gmx modifiers for the multi-line test string below

  # Definitions
  # URI Parts

Test Strings

# Valid URIs
[email protected]/top_story.htm
mailto:[email protected]
ftp://username:[email protected]/path/to/file/somefile.html?queryVariable=value#fragment
mailto:[email protected][2001:DB8::1]
mailto:[email protected][255:192:168:1]
mailto:[email protected]
# Note that the example below IS valid, as it follows RFC standards

# These work with the optional scheme group although I'd suggest making the scheme mandatory as misinterpretations can occur
[email protected]

@MithPaul 2016-08-29 04:17:51

I think I found a more general regexp to validate URLs, particularly websites:

(https?:\/\/)?(www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)|(https?:\/\/)?(www\.)?(?!ww)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

it does not allow for instance www.something or http://www or http://www.something

Check it here:

@Besnik Kastrati 2014-06-05 10:46:03

This will match all URLs

  • with or without http/https
  • with or without www

...including sub-domains and those new top-level domain name extensions such as .museum, .academy, .foundation etc. which can have up to 63 characters (not just .com, .net, .info etc.)

(([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?

Because today the maximum length of an available top-level domain is 13 characters, such as .international, you can change the number 63 in the expression to 13 to prevent someone misusing it.

as javascript

var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/;

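// 'input.url' is an assumed selector; the snippet's original event binding was lost
$('input.url').on('input', function () {
  var url = $(this).val();
  $(this).toggleClass('invalid', !urlreg.test(url));
});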

<script src=""></script>

Wikipedia Article: List of all internet top-level domains
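One caveat worth illustrating: the expression is unanchored, so test() succeeds as soon as any substring looks like a URL. A hedged sketch of wrapping it in ^...$ follows; the variable name anchored and the sample strings are assumptions, and the pattern is the one from this answer with the user-info group written as ")?@)?".

```javascript
// Pattern from this answer, as a JavaScript literal.
const urlreg = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/;

// Anchored variant: the whole string must match, not just a substring.
const anchored = new RegExp("^(?:" + urlreg.source + ")$");

// Unanchored, "http//example.com" passes because "//example.com" matches on its own.
const looseResult = urlreg.test("http//example.com");
const strictResult = anchored.test("http//example.com");

const fullUrl = anchored.test("https://www.example.museum/path?x=1#top");

// Each host label needs at least two characters, so single-letter domains fail.
const singleLetter = anchored.test("a.com");
```

Anchoring addresses the "http// without :" observation, but not the single-letter label limit, which comes from the label sub-pattern itself.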

@user1063287 2014-07-14 07:34:53

Could anyone please convert this for use in Javascript?

@Alkasai 2015-03-20 19:03:10

Finally!! Can someone mark this as an answer? Or at least upvote it. One thing though: I don't think it matches single-letter domains. How would you adjust it to handle these cases?

@AwokeKnowing 2016-01-26 00:49:46

it seems to allow http// without :

@Can Rau 2016-11-28 21:28:02

matches telephone numbers and email addresses have a look at copy pasted your regex, just escaped all slashes

@Mecki 2008-10-02 11:08:46

If you're really looking for the ultimate match, you'll probably find it at "A Good Url Regular Expression?".

But a regex that really matches all possible domains and allows anything that is allowed according to RFCs is horribly long and unreadable, trust me ;-)

@Rahul Desai 2015-10-09 00:50:48

I found the following Regex for URLs, tested successfully with 500+ URLs:

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. :)

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

@Jonathan Maim 2015-11-10 04:42:15

Your regex does the work in 155'000 steps. Here is another regex that evaluates all 580 URLs you provided in 19'000 steps (regex101 link): /(https?):\/\/([\w-]+(\.[\\w-]+)*\.([a-z]+))(([\w.,@?^=%&amp;:\/~+#()!-]*)([\w@?^=%&amp;\/~+#()!-]))?/gi

@Kiril 2012-02-14 21:32:43

Mathias Bynens has a great article on the best comparison of a lot of regular expressions: In search of the perfect URL validation regex

The best one posted is a little long, but it matches just about anything you can throw at it.

JavaScript version

/^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i

PHP version

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS
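A short sketch of the JavaScript version in use. The constant name strictUrl, the wrapper isValidUrl, and the sample inputs are assumptions; the pattern is the JavaScript one above with the user-info group written as "(?:\S+(?::\S*)?@)?".

```javascript
// JavaScript version of the pattern from this answer.
const strictUrl = /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i;

// Hypothetical wrapper around the pattern.
function isValidUrl(text) {
  return strictUrl.test(text);
}

const ordinary = isValidUrl("https://example.com/path?x=1");
const privateIp = isValidUrl("http://10.1.1.1");      // private ranges are excluded by the lookaheads
const noHost = isValidUrl("http://");                 // a host is mandatory
const otherScheme = isValidUrl("javascript:alert(0)"); // only http, https and ftp are accepted
```

Because the scheme is mandatory and the pattern is anchored at both ends, it avoids several of the false positives discussed for the shorter expressions in this thread.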

@Toby Beresford 2016-10-05 13:59:03

For preg_match use with PHP use %^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu

@Venryx 2017-04-29 01:16:18

On that page, I prefer stephenhay's solution, because it's 38 chars instead of 502!

@Matt Fletcher 2019-01-23 15:15:09

Also doesn't allow for IP addresses

@stackdave 2019-02-12 16:55:42

give valid (slash slash) : //
