By Vitor Silva


2008-10-02 10:53:17 8 Comments

How can I check if a given string is a valid URL address?

My knowledge of regular expressions is basic and doesn't allow me to choose from the hundreds of regular expressions I've already seen on the web.

30 comments

@eyelidlessness 2008-10-10 07:23:01

I wrote my URL (actually IRI, internationalized) pattern to comply with RFC 3987 (http://www.faqs.org/rfcs/rfc3987.html). These are in PCRE syntax.

For absolute IRIs (internationalized):

/^[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i

To also allow relative IRIs:

/^(?:[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i

How they were compiled (in PHP):

<?php

/* Regex convenience functions (character class, non-capturing group) */
function cc($str, $suffix = '', $negate = false) {
    return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;
}
function ncg($str, $suffix = '') {
    return '(?:' . $str . ')' . $suffix;
}

/* Preserved from RFC3986 */

$ALPHA = 'a-z';
$DIGIT = '0-9';
$HEXDIG = $DIGIT . 'a-f';

$sub_delims = '!\\$&\'\\(\\)\\*\\+,;=';
$gen_delims = ':\\/\\?\\#\\[\\]@';
$reserved = $gen_delims . $sub_delims;
$unreserved = '-' . $ALPHA . $DIGIT . '\\._~';

$pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);

$dec_octet = ncg(implode('|', array(
    cc($DIGIT),
    cc('1-9') . cc($DIGIT),
    '1' . cc($DIGIT) . cc($DIGIT),
    '2' . cc('0-4') . cc($DIGIT),
    '25' . cc('0-5')
)));

$IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}');

$h16 = cc($HEXDIG, '{1,4}');
$ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);

$IPv6address = ncg(implode('|', array(
    ncg($h16 . ':', '{6}') . $ls32,
    '::' . ncg($h16 . ':', '{5}') . $ls32,
    ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,
    ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,
    ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,
    ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,
    ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,
    ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,
    ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',
)));

$IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+');

$IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]';

$port = cc($DIGIT, '*');

$scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*');

/* New or changed in RFC3987 */

$iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}';

$ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' .
    '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' .
    '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' .
    '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' .
    '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' .
    '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}';

$iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar;

$ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));

$ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*');

$iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*');

$isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');
$isegment_nz = ncg($ipchar, '+');
$isegment = ncg($ipchar, '*');

$ipath_empty = '(?!' . $ipchar . ')';
$ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*');
$ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*');
$ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )
$ipath_abempty = ncg('\\/' . $isegment, '*');

$ipath = ncg(implode('|', array(
    $ipath_abempty,
    $ipath_absolute,
    $ipath_noscheme,
    $ipath_rootless,
    $ipath_empty
))) . ')';

$ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*');

$ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));
$iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');
$iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');

$irelative_part = ncg(implode('|', array(
    '\\/\\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_noscheme . '',
    '' . $ipath_empty . ''
)));

$irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');

$ihier_part = ncg(implode('|', array(
    '\\/\\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_rootless . '',
    '' . $ipath_empty . ''
)));

$absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?');

$IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');

$IRI_reference = ncg($IRI . '|' . $irelative_ref);

Edit 7 March 2011: Because of the way PHP handles backslashes in quoted strings, these are unusable by default. You'll need to double-escape backslashes except where the backslash has a special meaning in regex. You can do that this way:

$escape_backslash = '/(?<!\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[A-Z]|)/';
$absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI);
$IRI = preg_replace($escape_backslash, '\\\\', $IRI);
$IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference);

@Peter Di Cecco 2010-01-06 19:27:02

If you think that's bad, you should see the one for e-mail: ex-parrot.com/~pdw/Mail-RFC822-Address.html

@Gumbo 2010-07-08 10:57:01

HTTP URLs have no authentication information. This is only valid for FTP URLs.

@eyelidlessness 2010-07-08 15:05:10

@Gumbo, it's allowed in the spec and used in URI implementations for HTTP applications. It's discouraged (for obvious reasons) but perfectly valid and should be anticipated. Most (if not all?) browsers sometimes translate HTTP authentication into the URL for subsequent access.

@Donal Fellows 2010-08-29 16:13:56

I don't suppose you have an RE that also fully validates the hostname part of the URL, only accepting names that are valid hostnames in the network location where the browser is currently located? ;-)

@eyelidlessness 2010-08-29 16:51:27

@Donal, I'm not sure regex supports that... ;)

@ntziolis 2011-03-07 10:25:15

I have seen lots of web pages use the authentication in http urls.

@CyberJunkie 2011-06-07 03:46:29

what the.. is that a regex for valid URLs??

@eyelidlessness 2011-06-07 03:54:19

@CyberJunkie, yes it is. The answer provides a link to the RFC, code for how it was compiled, and an explanation about how to use it in PHP. Did you downvote? If so, could you explain why?

@Devin G Rhode 2011-10-16 23:24:40

@eyelidlessness I think it would be appreciated if you wrapped that in a function that simple takes in the IRI string and returns true valid or false invalid. Then we include your code in a separate file and just call your function.

@eyelidlessness 2011-10-17 00:26:44

@Devin, in a function in what language? I compiled it in PHP, but it can be used in other languages. Should I write a function in all of those languages? Alternately, it would be pretty simple for you to do the same in a language of your choosing.

@Devin G Rhode 2011-10-17 01:10:59

Perhaps I post a question specifically on wrapping your code in functions in various languages? I think that would keep things organized.

@joshcomley 2011-11-22 15:40:32

Thanks for posting this answer - the only thing I'm finding is that in RegexBuddy it's not working; things like \x{10000}-\x{1FFFD} are causing trouble. Any ideas?

@bruha 2012-02-13 01:51:57

@joshcomley replace \x{ABCD} to \uABCD, if you write it in JS

@velop 2013-11-22 12:26:23

Is http://com a correkt url? Because it is passing with your regex. And furthermore isn't a u-modifier missing at the end for unicode support (at least in php)?

@eyelidlessness 2013-11-22 17:18:49

Yes, http://com is a valid URL. http://localhost is, why wouldn't other words be? You are correct that the u modifier is necessary in PHP. I want to be clear that while I generated these with PHP, they are not meant to be PHP-specific.

@Hamid Sarfraz 2014-01-28 00:58:07

thanks @eyelidlessness. I have an issue. This Regex is passing http://? as a valid URL. Is it so?

@aliteralmind 2014-04-10 01:18:11

This answer has been added to the Stack Overflow Regular Expression FAQ, under "Common Validation Tasks".

@Hans 2015-06-23 22:19:04

@eyelidlessness This regex erroneously allows | in querystrings. Eg http://foo.com?a=| matches. I think it's because of the stray | in $iprivate

@Hans 2015-06-24 01:20:03

@eyelidlessness 2015-06-24 04:42:55

@Hans, why wouldn't | be allowed in a query string? It is. It matches the %xF0000-FFFFD range of iprivate. The | in the $iprivate variable is a regex special character that means OR. See regular-expressions.info/refquick.html

@Hans 2015-06-24 13:07:31

@eyelidlessness Per RFC 3987 IRI Syntax unicode char VERTICAL LINE u+007C is not allowed anywhere in IRI's at all, in fact. The | in $iprivate represents a literal, NOT an alternation operator, since it's enclosed in a character class.

@eyelidlessness 2015-06-26 05:36:05

@Hans, while I see now that the | is treated as a literal as you say, I see nothing in the spec disallowing the character anywhere, and it does in fact match the iprivate range %xF0000-FFFFD. The pipe is allowed, and there's no reason it shouldn't be.

@Hans 2015-06-26 19:51:42

@eyelidlessness Why do you think u+007c is in the range u+F0000-u+FFFFD? If you need further convincing, just test /[\x{F0000}-\x{FFFFD}]/u against | to observe that it does not match. If still not convinced, take a look at IRI validators across various languages such as Python's rfc3987 package or .NET's Uri.IsWellFormedUriString method with IRI support enabled. None of them allow for |. See sample results here

@eyelidlessness 2015-06-26 21:58:43

@Hans, I apologize, you are correct. I was very quickly trying to verify by converting the pattern to JS to test in my console, because I don't have a PHP environment to test in anymore. But I was not paying attention to converting the character classes correctly. I guess I was surprised because there's really no obvious reason that a pipe would be disallowed. Thanks for the correction.

@Hans 2015-06-26 22:25:23

@eyelidlessness No worries. Thanks for updating the answer. BTW, regex101.com is an excellent tool for testing both pcre and js regex's.

@avgvstvs 2016-06-05 14:57:52

When running these through a regex debugger, the amount of backtracking in these regexes is horrific. While it's certainly not a valid url, http://core-jenkins.scansafe.cisco.com/佐贺诺伦-^ńörén.jpg takes 6862 steps to match a failure, and parses the entire string 15x before it finally returns a negative answer. The regex might work, but if your regex engine relies on backtracking (pretty much anything other than grep) this could cause a performance bottleneck.

@user 2017-02-24 17:01:42

Does anyone have a C# version of this?

@U Mad 2017-02-28 17:59:10

@avgvstvs it could be optimized by adding possessive quantifiers.

@avgvstvs 2017-03-01 13:45:31

@UMad On the ESAPI project we encountered a problem like this for a different regex. Execution time with possessives only netted an improvement from 86s to parse the URL, to 52s. Still WAY too long. I used the tool regexbuddy and it demonstrated the backtracking behavior I brought up here. My bigger point: URL validation should be handled by writing a grammar and generating a parser, not regex.

@U Mad 2017-03-02 17:01:28

@avgvstvs Another option is to use regex implementation that doesn't use backtracking, but rather generates a DFA.

@avgvstvs 2017-03-09 16:53:36

@UMad, or even better, Thompson NFA: swtch.com/~rsc/regexp/regexp1.html (Some benchmarks provided about 3/4 of the way down)

@avgvstvs 2017-03-09 16:54:47

This question was "non constructive," but there's an answer here discussing drawbacks with DFAs as well: stackoverflow.com/a/17119079/557153

@アレックス 2017-06-28 04:55:27

1) does your regexp support right-to-left spelling? 2) does it support all unicode allowed in IRI's?

@Evert 2017-08-24 13:59:44

Strongly suggest you learn about the /x modifier for regexes and add comments.

@me_ 2017-10-01 19:26:02

this matches "h t t p :" but doesn't match "google . com" which is the most common way people quickly write urls...

@Nodarii 2019-04-02 06:41:18

^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$

live demo: https://regex101.com/r/HUNasA/2

I have tested various expressions to match my requirements.

As a user I can hit browser search bar with following strings:

valid urls

invalid urls

@Sajeeb Chandan 2019-02-14 12:26:52

https?:\/{2}(?:[\/-\w.]|(?:%[\da-fA-F]{2}))+

You can use this pattern for detecting URLs.

Following is the proof of concept

RegExr: URL Detector

@DrunkenPoney 2018-12-10 18:31:57

Here is a regex I made which extracts the different parts from an URL:

^((?:https?|ftp):\/\/?)?([^:/\s.]+\.[^:/\s]|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]+)?$

((?:https?|ftp):\/\/?)?(group 1): extracts the protocol
([^:/\s.]+\.[^:/\s]|localhost)(group 2): extracts the hostname
(:\d+)?(group 3): extracts the port number
((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?(groups 4 & 5): extracts the path part
([^#]+)?(group 6): extracts the query part
(#[\w-]+)?(group 7): extracts the hash part

For every part of the regex listed above, you can remove the ending ? to force it (or add one to make it facultative). You can also remove the ^ at the beginning and $ at the end of the regex so it won't need to match the whole string.

See it on regex101.

Note: this regex is not 100% safe and may accept some strings which are not necessarily valid URLs but it does indeed validate some criterias. Its main goal was to extract the different parts of an URL not to validate it.

@Laurie Stearn 2019-04-10 09:22:36

Thanks. The group approach to these answers is best. Here's hoping for updates following the direction of this article linked on the next page, and a revision of the "not 100% safe". A quantification like 99.9% is enough for most readers. :P

@Matthew O'Riordan 2011-11-22 22:38:54

I've just written up a blog post for a great solution for recognizing URLs in most used formats such as:

The regular expression used is:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w][email protected])?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w][email protected])[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

@cmroanirgo 2012-12-01 07:11:09

This is good, but fails in a few spots (as @Matthew ack's in his comments in his blog): beta.foobar.com and goo.gl and bit.ly

@Jaime Cham 2013-03-15 08:58:21

That one also works, but it's missing support for the port number (useful in debugging). Modified would be /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w][email protected])?[A-Za-z0-9.‌​-]+(:[0-9]+)?|(?:www‌​.|[-;:&=\+\$,\w][email protected])[‌​A-Za-z0-9.-]+)((?:\/‌​[\+~%\/.\w-_]*)?\??(‌​?:[-\+=&;%@.\w_]*)#?‌​(?:[\w]*))?)/

@Oliver Moran 2013-05-05 20:17:00

Without http:// before it, the above does recognise the URL of this page. Depends on expected usage but strikes me as weak in many cases.

@RobH 2013-07-10 09:28:39

This Regex doesn't handle links with parenthesis in them: e.g. msdn.microsoft.com/en-us/library/ms563775(v=office.14).aspx

@Anthony 2013-08-08 17:04:38

Shouldn't the dot be escaped after www?

@machineghost 2014-01-31 00:01:51

Perhaps they run their code against unit tests, and those unit tests contain strings they thought up which look like URLs but aren't?

@Cas Bloem 2014-02-07 15:29:51

Got another match mate: width:210px; and margin:3px

@Steve P 2014-03-25 10:10:09

another match: "dot:." the main search it is looking for is 3-9 characters, followed by a colon ":" then any character or number from a-z && 0-9 + others :) @whiteb0x if you are looking for a regex validator check here.

@Yuri 2014-04-02 06:52:53

Does not work for ipv6 links. http://[ff00::1]:80/test

@Imran 2015-04-03 20:53:06

Can this be easily modified to avoid matching a trailing dot, as in "www.test.com."?

@Gustav 2015-06-27 18:07:30

Doesn't match "example.com"...?

@Chad Brown 2016-05-31 21:45:35

Big issue with this one as it also matches javascript:alert(0);

@SOFe 2016-09-04 15:13:37

I really wonder what is the signifance of www. in URLs.

@basit raza 2016-12-29 06:40:21

It also match the email address pattern i.e [email protected]

@Salem Artin 2018-01-26 22:21:02

This regex does not work for URLs that contain a comma For example: example.com/todo.do?id=123,456

@Mr_and_Mrs_D 2018-02-23 12:45:03

Link broken fix please

@Mahfuzur Rahman 2018-11-26 12:10:31

I think it in a very simple way. And it works very good.

var hasURL = (str) =>{
	var url_pattern = new RegExp("(www.|http://|https://|ftp://)\w*");
	if(!url_pattern.test(str)){
		document.getElementById("demo").innerHTML = 'No URL';
	}
	else
		document.getElementById("demo").innerHTML = 'String has a URL';
};
<p>Please enter a string and test it has any url or not</p>
<input type="text" id="url" placeholder="url" onkeyup="hasURL(document.getElementById('url').value)">
<p id="demo"></p>

@DrunkenPoney 2018-12-10 04:40:15

Your regex doesn't work at all bro. All it validates is that your string contains either www immediately followed by one character (any character since you haven't escaped the .) or http:// or https:// or ftp:// and any of these can be followed by any alphanumeric characters. So, in other words, all the following strings would result as being valid but they are obviously not valid urls : www., www▓, £¢¤£¢¤www¢ (See on regex101). You could have used a shorter regex: (www.|(https?|ftp)://)\w*. (This is still not a good regex btw)

@Mahfuzur Rahman 2018-12-10 05:34:57

Obviously www. , www▓, £¢¤£¢¤www¢ those are not valid urls. But I think, those are not also meaningful string. I just try to simplify the url pattern. @ DrunkenPoney

@DrunkenPoney 2018-12-10 16:35:28

My goal wasn't to write meaningful strings but to show that weird strings would be accepted and anyway since your regex validate for www I suppose you don't necessarily need the protocol to be specified but your regex wouldn't allow urls like google.com. Moreover, one of the problems I was trying to show you is that your regex matches wherever the validation parts (www, http, ...) are in the string. You could at least specify that your string needs to start with it.

@DrunkenPoney 2018-12-10 16:38:04

And if you want a quick regex to validate url but is not 100% safe here is one I made which I used to extract the different parts from an url but can be used to validate that a string contains the base parts of an url.

@DroidOS 2012-10-24 12:06:38

This is a rather old thread now and the question asks for a regex based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs. I eventually settled on another way to do things - by using PHP's parse_url function.

It returns boolean false if the url cannot be parsed. Otherwise, it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own, but can be drilled down into for further analysis. If the intent is to simply catch typos, invalid schemes etc. It is perfectly adequate!

@Erick Maynard 2018-09-26 00:20:20

Interestingly, none of the answers above worked for what I needed, so I figured I would offer my solution. I needed to be able to do the following:

  • Match http(s)://www.google.com, http://google.com, www.google.com, and google.com
  • Match Github markdown style links like [Google](http://www.google.com)
  • Match all possible domain extensions, like .com, or .io, or .guru, etc. Basically anything between 2-6 characters in length
  • Split everything into proper groupings so that I could access each part as needed.

Here was the solution:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$

This gives me all of the above requirements. You could optionally add the ability for ftp and file if necessary:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https|ftp|file):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$

@Hank Gay 2008-10-02 11:14:58

Non-validating URI-reference Parser

For reference purposes, here's the IETF Spec: (TXT | HTML). In particular, Appendix B. Parsing a URI Reference with a Regular Expression demonstrates how to parse a valid regex. This is described as,

for an example of a non-validating URI-reference parser that will take any given string and extract the URI components.

Here's the regex they provide:

 ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

As someone else said, it's probably best to leave this to a lib/framework you're already using.

@Alex D 2013-04-13 19:39:44

Completely useless. Can someone show me a string which this regex does not match? (Both "#?#?#" or "<<<>>>" match. What kind of URIs are those?)

@Hank Gay 2013-07-18 14:07:51

@AlexD Don't complain to me. That's the official specification for a URI. Take it up with the IETF if you don't like it.

@andyg0808 2013-12-13 10:12:10

@AlexD I think those might be considered relative references. See RFC 3986, section 4.2.

@Alex D 2013-12-13 18:34:48

@andyg0808, you may be right, but the fact remains that this regex matches virtually any string under the sun.

@Evan Carroll 2018-08-27 05:47:47

This is not a good answer because it's not validating, as per the question. It's parsing. Those are two different functions. If you give this regex trash, it tries to parse it. If the URL isn't valid, the parsing isn't guaranteed to work.

@Evan Carroll 2018-08-27 05:52:19

@AlexD this isn't a validating regex it's just a parsing regex.

@Laurie Stearn 2019-04-10 10:04:46

@Evan Carroll: Anything can be parsed according to some criteria. Feed any regex on this page with a string, and where it doesn't parse to a valid URL, it's an invalid URL by assertion. Then trialing the result validates the regex assertion. You're right, the answer says Non-validating URI-reference Parser "for reference purposes", which might be included in an answer to something like this thread, and then cross-linked.

@Nik Kov 2018-07-27 15:02:42

The best regex, i've found is: /(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/gi

For ios swift : (^|\\s)((https?:\\/\\/)?[\\w-]+(\\.[\\w-]+)+\\.?(:\\d+)?(\\/\\S*)?)

http://jsfiddle.net/9BYdp/1/

Found here

@dev_khan 2018-03-07 14:44:14

After rigorous searching i finally settled with the following

^[a-zA-Z0-9]+\:\/\/[a-zA-Z0-9]+\.[-a-zA-Z0-9]+\.?[a-zA-Z0-9]+$|^[a-zA-Z0-9]+\.[-a-zA-Z0-9]+\.[a-zA-Z0-9]+$

And this thing work for general in future URLs.

@Divya-Systematix 2017-08-30 17:20:08

I hope it's helpful for you...

^(http|https):\/\/+[\www\d]+\.[\w]+(\/[\w\d]+)?

@diazdeteran 2018-04-08 23:20:44

You're allowing for more than two forward slashes (/) in http or https and \w already includes the other two w and any digit. You probably meant this: ^(http|https):\/\/\w+\.\w+(\/\w+)?

@Ravi Matani 2018-01-23 18:35:47

Below expression will work for all popular domains. It will accept following urls:
www.yourwebsite.com
http://www.yourwebsite.com
www.yourwebsite.com
yourwebsite.com
yourwebsite.co.in
In addition it will make message with url as link also
e.g. please visit yourwebsite.com
In above example it will make yourwebsite.com as hyperlink

if (new RegExp("([-a-z0-9]{1,63}\\.)*?[a-z0-9][-a-z0-9]{0,61}[a-z0-9]\\.(com|com/|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au|org/|gov/|cm/|net/|online/|live/|biz/|us/|uk/|co.us/|co.uk/|in/|co.in/|int/|info/|edu/|mil/|ca/|co/|co.au/)(/[-\\[email protected]\\+\\.~#\\?*&/=% ]*)?$").test(strMessage) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?[-\\[email protected]\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[-\\[email protected]\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage))) {

        if (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?$").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_][email protected])?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) {

            var url1 = /(^|&lt;|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au))(\s|&gt;|$)/g;
            var html = $.trim(strMessage);
            if (html) {
                html = html
                      .replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');
            }
            returnString = html;
            return returnString;   
        }
        else {

            var url1 = /(^|&lt;|\s)(www\..+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g,
                url2 = /(^|&lt;|\s)(((https?|ftp):\/\/|mailto:).+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g,
                url3 = /(^|&lt;|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g;

            var html = $.trim(strMessage);
            if (html) {
                html = html
                    .replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3')
                    .replace(url2, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="$2">$2</a>$5')
                    .replace(url3, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');
            }
            returnString = html;

            return returnString;
        }
    }

@IT Eng - BU 2018-01-01 09:08:52

As far as I have found, this expression is good for me-

(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})

Working example-

function RegExForUrlMatch()
{
  var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})/g;

  var regex = new RegExp(expression);
  var t = document.getElementById("url").value;

  if (t.match(regex)) {
    document.getElementById("demo").innerHTML = "Successful match";
  } else {
    document.getElementById("demo").innerHTML = "No match";
  }
}
<input type="text" id="url" placeholder="url" onkeyup="RegExForUrlMatch()">

<p id="demo">Please enter a URL to test</p>

@jeffkmeng 2018-03-28 23:28:38

Not all urls start with www or don't have a subdomain.

@Keng 2008-10-02 17:53:23

Here's what RegexBuddy uses.

(\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]

It matches these below (inside the ** ** marks):

**http://www.regexbuddy.com**  
**http://www.regexbuddy.com/**  
**http://www.regexbuddy.com/index.html**  
**http://www.regexbuddy.com/index.html?source=library**  

You can download RegexBuddy at http://www.regexbuddy.com/download.html.

@toohool 2008-10-02 18:00:19

What about gopher? Poor, forgotten gopher.

@PandaWood 2010-11-13 07:18:13

Your regex doesn't match any url I can come up with - including those you've included. I paste your regex into rubular.com and it says "Forward slashes must be escaped." Is there a typo or can you clarify by getting it to work at rubular.com?

@Keng 2010-11-15 14:39:38

@PandaWood that's because you need to format for Ruby. What is Ruby's escape character?

@PandaWood 2010-11-22 01:28:27

Hi Keng, even if I copy your exact RegEx above into RegexBuddy, I can't match it on any URL. I guess there's something gone amiss in the markup. Ruby regex is hardly any different at this basic syntax level.

@Keng 2010-11-22 04:26:51

@PandaWood wait...if you have REB just go to the library and grab it. that's where i got it...check to see if they are the same.

@jpillora 2013-01-16 00:16:25

As a JavaScript RegExp literal: /\b(https?|ftp|file):\/\/[\-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[\-A‌​-Za-z0-9+&@#\/%=~_|]‌​/

@Jayson Ragasa 2013-01-16 08:07:01

what about "The Regulator" ??? :( osherove.com/tools

@faizan 2013-05-16 14:45:19

this regex returns true for ajkshkadfshjdaf

@Keng 2014-03-21 18:26:29

@Mahesh Chand thanks for the update; the edit got rejected so I couldnt get a moderator to reinstate the enhancement. I think it got rejected because the reviewers would rather see code changes in comments and then let the OP add it. I made the update though. thanks.

@user6125029 2017-10-18 21:17:44

Thanks a lot! This exp is working on c++!

@Salem Artin 2018-01-26 22:22:48

The good thing about this regex is that it matches URLs with commas. For example: example.com/todo.do?id=123,456

@Michael Foukarakis 2018-05-30 08:30:46

This regex not only does not match many valid URIs, but also matches anything like [-A-Za-z0-9+&@#/%?=~_|!:,.;], which is of course nothing like a URI. I suggest deletion.

@Teejay 2018-09-17 10:42:52

This matches nearly everything... useless

@tk_ 2017-09-07 06:00:39

How about this:

^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})$

These are the test cases:

Test cases

You can try it out in here : https://regex101.com/r/mS9gD7/41

@tk_ 2018-05-04 12:24:37

@downvoter can you let me know what is the issue with this regex? because I'm using this in my production setup :P

@AndroidDev 2017-08-17 06:43:02

This is not a regular expression but accomplishes the same thing (Javascript only):

function isAValidUrl(url) {
  try {
    new URL(url);
    return true;
  } catch(e) {
    return false;
  }
}

@Ali Habibzadeh 2017-12-06 14:43:38

The problem with this is that h ttp://bla is a valid URL (the space between h and t is so SO doesn't make it an actual URL)

@Ali Habibzadeh 2017-12-06 14:51:47

even h ttp://http://s is valid

@Blair Conrad 2008-10-02 10:59:09

The post Getting parts of a URL (Regex) discusses parsing a URL to identify its various components. If you want to check if a URL is well-formed, it should be sufficient for your needs.

If you need to check if it's actually valid, you'll eventually have to try to access whatever's on the other end.

In general, though, you'd probably be better off using a function that's supplied to you by your framework or another library. Many platforms include functions that parse URLs. For example, there's Python's urlparse module, and in .NET you could use the System.Uri class's constructor as a means of validating the URL.

@S.p 2013-07-18 04:47:04

The best regular expression for URL for me would be:

"(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?"

@rektide 2014-02-02 22:25:42

this seems to be limited w/r/t number of domains it'll accept?

@James Kuang 2014-02-03 23:19:30

Thanks! Here's the escaped version that worked for me on iOS: (([\\w]+:)?//)?(([\\d\\w]|%[a-fA-f\\d]{2,2})+(:([\\d\\w]|%[a‌​-fA-f\\d]{2,2})+)[email protected])‌​?([\\d\\w][-\\d\\w]{‌​0,253}[\\d\\w]\\.)+[‌​\\w]{2,4}(:[\\d]+)?(‌​/([-+_~.\\d\\w]|%[a-‌​fA-f\\d]{2,2})*)*(\\‌​?(&?([-+_~.\\d\\w]|%‌​[a-fA-f\\d]{2,2})=?)‌​*)?(#([-+_~.\\d\\w]|‌​%[a-fA-f\\d]{2,2})*)‌​?

@ndm13 2017-05-05 20:25:13

This regex only matches suffixes up to 4 characters long and fails on IP addresses (v4 and v6), localhost, and domain names with foreign characters. I would recommend editing your inclusion size ranges and replacing \w with \p{L} at a minimum.

@Yoav Feuerstein 2017-08-31 03:59:00

Note that this RegEx doesn't capture URLs that have subdomains of one letter only, like "m.sitename.com". In order to fix that, I had to change ([\d\w][-\d\w]{0,253}[\d\w]\.)+ into ([\d\w][-\d\w]{0,253}[\d\w]?\.)+ (add a question mark near the end of it)

@maxspan 2016-10-07 01:56:06

To Match a URL there are various option and it depend on you requirement. below are few.

_(^|[\s.:;?\-\]<\(])(https?://[-\w;/?:@&=+$\|\_.!~*\|'()\[\]%#,☺]+[\w/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])_i

#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#iS

And there is a link which gives you more than 10 different variations of validation for URL.

https://mathiasbynens.be/demo/url-regex

@ctwheels 2016-09-13 20:34:37

I created a similar regex (PCRE) to the one @eyelidlessness provided following RFC3987 along with other RFC documents. The major difference between @eyelidlessness and my regex are mainly readability and also URN support.

The regex below is all one piece (instead of being mixed with PHP) so it can be used in different languages very easily (so long as they support PCRE)

The easiest way to test this regex is to use regex101 and copy paste the code and test strings below with the appropriate modifiers (gmx).

To use this regex in PHP, insert the regex below into the following code:

$regex = <<<'EOD'
// Put the regex here
EOD;


You can match a link without a scheme by doing the following:
To match a link without a scheme (i.e. [email protected] or www.google.com/pathtofile.php?query), replace this section:

  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )?

with this:

  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )?

Note, however, that by replacing this, the regex does not become 100% reliable.


Regex (PCRE) with gmx modifiers for the multi-line test string below

(?(DEFINE)
  # Definitions
  (?<ALPHA>[\p{L}])
  (?<DIGIT>[0-9])
  (?<HEX>[0-9a-fA-F])
  (?<NCCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    @
  )
  (?<PCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    :|
    @|
    \/
  )
  (?<UCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    :
  )
  (?<RCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)
  )
  (?<PCT_ENCODED>%(?&HEX){2})
  (?<UNRESERVED>
    ((?&ALPHA)|(?&DIGIT)|[-._~])
  )
  (?<RESERVED>(?&GEN_DELIMS)|(?&SUB_DELIMS))
  (?<GEN_DELIMS>[:\/?#\[\]@])
  (?<SUB_DELIMS>[!$&'()*+,;=])
  # URI Parts
  (?<d_scheme>
    (?!urn)
    (?:
      (?&ALPHA)
      ((?&ALPHA)|(?&DIGIT)|[+-.])*
      (?=:)
    )
  )
  (?<d_hier_part_slashes>
    (\/{2})?
  )
  (?<d_authority>(?&d_userinfo)?)
  (?<d_userinfo>(?&UCHAR)*)
  (?<d_ipv6>
    (?![^:]*::[^:]*::[^:]*)
    (
      (
        ((?&HEX){0,4})
        :
      ){1,7}
      ((?&d_ipv4)|:|(?&HEX){1,4})
    )
  )
  (?<d_ipv4>
    ((?&octet)\.){3}
    (?&octet)
  )
  (?<octet>
    (
      25[]0-5]|
      2[0-4](?&DIGIT)|
      1(?&DIGIT){2}|
      [1-9](?&DIGIT)|
      (?&DIGIT)
    )
  )
  (?<d_reg_name>(?&RCHAR)*)
  (?<d_urn_name>(?&UCHAR)*)
  (?<d_port>(?&DIGIT)*)
  (?<d_path>
    (
      \/
      ((?&PCHAR)*)*
      (?=\?|\#|$)
    )
  )
  (?<d_query>
    (
      ((?&PCHAR)|\/|\?)*
    )?
  )
  (?<d_fragment>
    (
      ((?&PCHAR)|\/|\?)*
    )?
  )
)
^
(?<link>
  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )
  (?(urn)
    (?:
      (?<namespace_identifier>[0-9a-zA-Z\-]+)
      :
      (?<namespace_specific_string>(?&d_urn_name)+)
    )
    |
    (?<hier_part>
      (?<slashes>(?&d_hier_part_slashes))
      (?<authority>
        (?:
          (?<userinfo>(?&d_authority))
          @
        )?
        (?<host>
          (?<ipv4>\[?(?&d_ipv4)\]?)|
          (?<ipv6>\[(?&d_ipv6)\])|
          (?<domain>(?&d_reg_name))
        )
        (?:
          :
          (?<port>(?&d_port))
        )?
      )
      (?<path>(?&d_path))?
    )
    (?:
      \?
      (?<query>(?&d_query))
    )?
    (?:
      \#
      (?<fragment>(?&d_fragment))
    )?
  )
)
$

Test Strings

# Valid URIs
ftp://cnn.example.com&[email protected]/top_story.htm
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:[email protected]
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:isbn:0451450523
urn:oid:2.16.840
urn:isan:0000-0000-9E59-0000-O-0000-0000-2
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
http://localhost/test/somefile.php?query=someval&variable=value#fragment
http://[2001:db8:a0b:12f0::1]/test
ftp://username:[email protected]/path/to/file/somefile.html?queryVariable=value#fragment
https://subdomain.domain.com/path/to/file.php?query=value#fragment
https://subdomain.example.com/path/to/file.php?query=value#fragment
mailto:john.smith(comment)@example.com
mailto:[email protected][2001:DB8::1]
mailto:[email protected][255:192:168:1]
mailto:[email protected]
http://localhost:4433/path/to/file?query#fragment
# Note that the example below IS a valid as it does follow RFC standards
localhost:4433/path/to/file

# These work with the optional scheme group although I'd suggest making the scheme mandatory as misinterpretations can occur
[email protected]
www.google.com/pathtofile.php?query
[192a:123::192.168.1.1]:80/path/to/file.html?query#fragment

@MithPaul 2016-08-29 04:17:51

I think I found a more general regexp to validate urls, particularly websites

​(https?:\/\/)?(www\.)[[email protected]:%._\+~#=]{2,256}\.[a-z]{2,4}\b([[email protected]:%_\+.~#?&//=]*)|(https?:\/\/)?(www\.)?(?!ww)[[email protected]:%._\+~#=]{2,256}\.[a-z]{2,4}\b([[email protected]:%_\+.~#?&//=]*)

it does not allow for instance www.something or http://www or http://www.something

Check it here: http://regexr.com/3e4a2

@Besnik Kastrati 2014-06-05 10:46:03

This will match all URLs

  • with or without http/https
  • with or without www

...including sub-domains and those new top-level domain name extensions such as .museum, .academy, .foundation etc. which can have up to 63 characters (not just .com, .net, .info etc.)

(([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?

Because today maximum length of the available top-level domain name extension is 13 characters such as .international, you can change the number 63 in expression to 13 to prevent someone misusing it.

as javascript

var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/;

$('textarea').on('input',function(){
  var url = $(this).val();
  $(this).toggleClass('invalid', urlreg.test(url) == false)
});

$('textarea').trigger('input');
textarea{color:green;}
.invalid{color:red;}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea>http://www.google.com</textarea>
<textarea>http//www.google.com</textarea>
<textarea>googlecom</textarea>
<textarea>https://www.google.com</textarea>

Wikipedia Article: List of all internet top-level domains

@user1063287 2014-07-14 07:34:53

Could anyone please convert this for use in Javascript?

@Alkasai 2015-03-20 19:03:10

Finally!! Can someone mark this as an answer? Or at lease upvote it. I thing though, i don't think it matches single letter domains, i.e. t.co. How would you adjust it to handle these case?

@AwokeKnowing 2016-01-26 00:49:46

it seems to allow http// without :

@Can 2016-11-28 21:28:02

matches telephone numbers and email addresses have a look at regexr.com/3eosr copy pasted your regex, just escaped all slashes

@Mecki 2008-10-02 11:08:46

If you really search for the ultimate match, you probably find it on "A Good Url Regular Expression?".

But a regex that really matches all possible domains and allows anything that is allowed according to RFCs is horribly long and unreadable, trust me ;-)

@Rahul Desai 2015-10-09 00:50:48

I found the following Regex for URLs, tested successfully with 500+ URLs:

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)[email protected])?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. :)

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

@Jonathan Maim 2015-11-10 04:42:15

Your regex is doing the work in 155'000 steps. Here is another regex that is evaluating all the 580 URLS your provided in 19'000 steps regex101 link: /(https?):\/\/([\w-]+(\.[\\w-]+)*\.([a-z]+))(([\w.,@?^=%&amp‌​;:\/~+#()!-]*)([\[email protected]?‌​^=%&amp;\/~+#()!-]))‌​?/gi

@Kiril 2012-02-14 21:32:43

Mathias Bynens has a great article on the best comparison of a lot of regular expressions: In search of the perfect URL validation regex

The best one posted is a little long, but it matches just about anything you can throw at it.

JavaScript version

/^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)[email protected])?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i

PHP version

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)[email protected])?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS

@Toby Beresford 2016-10-05 13:59:03

For preg_match use with PHP use %^(?:(?:https?|ftp)://)(?:\S+(?::\S*)[email protected]|\d{1,3}(?:\.\d{1,3})‌​{3}|(?:(?:[a-z\d\x{0‌​0a1}-\x{ffff}]+-?)*[‌​a-z\d\x{00a1}-\x{fff‌​f}]+)(?:\.(?:[a-z\d\‌​x{00a1}-\x{ffff}]+-?‌​)*[a-z\d\x{00a1}-\x{‌​ffff}]+)*(?:\.[a-z\x‌​{00a1}-\x{ffff}]{2,6‌​}))(?::\d+)?(?:[^\s]‌​*)?$%iu

@Venryx 2017-04-29 01:16:18

On that page, I prefer stephenhay's solution, because it's 38 chars instead of 502!

@fishbone 2018-11-13 08:08:52

returns false for http://localhost/

@Matt Fletcher 2019-01-23 15:15:09

Also doesn't allow for IP addresses

@stackdave 2019-02-12 16:55:42

give valid (slash slash) : //www.2test.com/

@Fredmat 2015-05-14 13:19:38

You don't specify which language you're using. If PHP is, there is a native function for that:

$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';

if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
    // Wrong
}
else {
    // Valid
}

Returns the filtered data, or FALSE if the filter fails.

Check it here >>

Hope it helps.

@Daniel Mihai 2015-04-06 13:36:43

This should work:

function validateUrl(value){
	return /^(http(s)?:\/\/.)?(www\.)?[[email protected]:%._\+~#=]{2,256}\.[a-z]{2,6}\b([[email protected]:%_\+.~#?&//=]*)$/gi.test(value);
}

console.log(validateUrl('google.com')); // true
console.log(validateUrl('www.google.com')); // true
console.log(validateUrl('http://www.google.com')); // true
console.log(validateUrl('http:/www.google.com')); // false
console.log(validateUrl('www.google.com/test')); // true

@Fernando Gabrieli 2018-06-02 23:12:45

Thank you Daniel, you need to add port support like localhost:8080

@kash 2015-02-15 14:49:19

Here's a ready-to-go Java version from the Android source code. This is the best one I've found.

public static final Matcher WEB  = Pattern.compile(new StringBuilder()                 
.append("((?:(http|https|Http|Https|rtsp|Rtsp):")                      
.append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)")                         
.append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_")                         
.append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?")                         
.append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+")   // named host                            
.append("(?:")   // plus top level domain                         
.append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])")                         
.append("|(?:biz|b[abdefghijmnorstvwyz])")                         
.append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])")                         
.append("|d[ejkmoz]")                         
.append("|(?:edu|e[cegrstu])")                         
.append("|f[ijkmor]")                         
.append("|(?:gov|g[abdefghilmnpqrstuwy])")                         
.append("|h[kmnrtu]")                         
.append("|(?:info|int|i[delmnoqrst])")                         
.append("|(?:jobs|j[emop])")                         
.append("|k[eghimnrwyz]")                         
.append("|l[abcikrstuvy]")                         
.append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])")                         
.append("|(?:name|net|n[acefgilopruz])")                         
.append("|(?:org|om)")                         
.append("|(?:pro|p[aefghklmnrstwy])")                         
.append("|qa")                         
.append("|r[eouw]")                         
.append("|s[abcdeghijklmnortuvyz]")                         
.append("|(?:tel|travel|t[cdfghjklmnoprtvwz])")                         
.append("|u[agkmsyz]")                         
.append("|v[aceginu]")                         
.append("|w[fs]")                         
.append("|y[etu]")                         
.append("|z[amw]))")                         
.append("|(?:(?:25[0-5]|2[0-4]") // or ip address                                                 
.append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]")                             
.append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]")                         
.append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}")                         
.append("|[1-9][0-9]|[0-9])))")                         
.append("(?:\\:\\d{1,5})?)") // plus option port number                             
.append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~")  // plus option query params                         
.append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?")                         
.append("(?:\\b|$)").toString()                 
).matcher("");

@osgx 2017-02-28 02:17:31

This don't work with "New gTLDs", check en.wikipedia.org/wiki/List_of_Internet_top-level_domains & newgtlds.icann.org/en/program-status/delegated-strings. Hardcoding list of TLD is bad practice... Some public suffix lists are available, they include recent variant of TLD: publicsuffix.org (used in Firefox, Chrome, IE)

@ndm13 2017-04-17 23:18:43

My first thought at seeing this: there's no kill like overkill. They literally took all ccTLDs and built a regex to match them specifically. Cuts down on false positives, I suppose, but a terrible way to handle the situation.

@runlevel0 2015-01-14 14:20:24

To match the URL up to the domain:

(^(\bhttp)(|s):\/{2})(?=[a-z0-9-_]{1,255})\.\1\.([a-z]{3,7}$)

It can be simplified to:

(^(\bhttp)(|s):\/{2})(?=[a-z0-9-_.]{1,255})\.([a-z]{3,7})

the latter does not check for the end for the end line so that it can be later used create full blown URL with full paths and query strings.

Related Questions

Sponsored Content

8 Answered Questions

[SOLVED] Is there a regular expression to detect a valid regular expression?

  • 2008-10-05 17:07:35
  • psytek
  • 101508 View
  • 660 Score
  • 8 Answer
  • Tags:   regex

28 Answered Questions

16 Answered Questions

[SOLVED] What is the maximum length of a URL in different browsers?

  • 2009-01-06 16:14:30
  • Sander Versluys
  • 1098587 View
  • 4435 Score
  • 16 Answer
  • Tags:   http url browser

24 Answered Questions

[SOLVED] How to replace plain URLs with links?

  • 2008-09-01 09:58:49
  • Sergio del Amo
  • 226818 View
  • 421 Score
  • 24 Answer
  • Tags:   javascript regex

10 Answered Questions

[SOLVED] Converting user input string to regular expression

  • 2009-05-17 14:20:18
  • Gordon Gustafson
  • 236022 View
  • 297 Score
  • 10 Answer
  • Tags:   javascript html regex

18 Answered Questions

[SOLVED] How do you use a variable in a regular expression?

  • 2009-01-30 00:11:05
  • JC Grubbs
  • 602793 View
  • 1131 Score
  • 18 Answer
  • Tags:   javascript regex

71 Answered Questions

16 Answered Questions

[SOLVED] How do you access the matched groups in a JavaScript regular expression?

  • 2009-01-11 07:21:20
  • nickf
  • 650322 View
  • 1169 Score
  • 16 Answer
  • Tags:   javascript regex

31 Answered Questions

[SOLVED] What is the difference between a URI, a URL and a URN?

  • 2008-10-06 21:26:58
  • Sean McMains
  • 1051552 View
  • 4054 Score
  • 31 Answer
  • Tags:   http url uri urn rfc3986

7 Answered Questions

Sponsored Content