By AndreLiem


2008-10-15 19:24:51

I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.


@thespacecamel 2018-08-31 18:01:42

For anyone developing with WordPress, just use

esc_url_raw($url) === $url

to validate a URL (here's WordPress's documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is Unicode- and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

@Some_North_korea_kid 2018-07-20 11:14:54

"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"
  1. (http(s?):\/\/) means http:// or https://.

  2. ([a-z0-9\-]+\.)+ breaks down as follows:
     2.1 [a-z0-9\-] means any letter a-z, any digit 0-9, or the - sign.
     2.2 The + after it means one or more such characters, e.g. a1w, a9-, c559s, f.
     2.3 \. is a literal . sign.
     2.4 The + after the whole group ([a-z0-9\-]+\.) means 2.1 to 2.3 must occur at least once, as in abc.defgh0.ig, aa.b.ced.f.gh, or www.yyy.com.

  3. [a-z]{2,4} means at least 2 and at most 4 letters (the top-level domain). It is meant to reject cases like https://www.google.co.kr.asdsdagfsdfsf, but because the pattern has no ^ and $ anchors, that string still matches on its prefix https://www.google.co.kr.asds.

  4. (\.[a-z]{2,4})*(\/[^ ]+)* breaks down as follows:
     4.1 \.[a-z]{2,4} is like 3, but starts with a . sign.
     4.2 The * after (\.[a-z]{2,4}) means that part is optional and may repeat.
     4.3 \/ means a literal / sign.
     4.4 [^ ] means any character except a space.
     4.5 The + after [^ ] means one or more such characters.
     4.6 The * after (\/[^ ]+) means 4.3 to 4.5 are optional and may repeat, which covers paths such as https://stackoverflow.com/posts/51441301/edit.

  5. In PHP, the regex is written between / delimiters, as in the pattern string above.

  6. The trailing i flag means case-insensitive matching: A is the same as a, and SoRRy the same as sorry.

Note: Sorry for my bad English; it is not widely spoken in my country.
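To check point 3 in practice, here is the pattern run against a few of the examples above. The anchored variant with ^ and $ is a hypothetical addition for comparison, not part of the answer:

```php
<?php
// Pattern from the answer, unchanged. It has no ^...$ anchors, so it also
// matches strings that merely contain something URL-like.
$pattern = '/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i';

// Hypothetical anchored variant: the whole string has to match.
$anchored = '/^(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*$/i';

$samples = [
    'https://stackoverflow.com/posts/51441301/edit',
    'https://www.google.co.kr.asdsdagfsdfsf',
    'http://localhost',
];

foreach ($samples as $url) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf("%-46s unanchored=%d anchored=%d\n", $url, preg_match($pattern, $url), preg_match($anchored, $url));
}
```

With these inputs, the google.co.kr string matches only the unanchored pattern, and http://localhost matches neither, because at least one dot-terminated label is required before the TLD.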

@Nic3500 2018-07-20 11:41:35

Did you notice how old this question is? Please explain your regex, users who do not know already will have a hard time understanding it without details.

@Kitson88 2017-02-08 16:01:28

Here's a simple class that validates URLs using a regex and then cross-references the domain against popular RBL (Realtime Blackhole List) servers:

Install:

require 'URLValidation.php';

Usage:

require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance

Pass a URL as the parameter of the domain() method and check the return value.

$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];
foreach ($urlArray as $url) {
    var_dump($urlVal->domain($url)); // prints bool(true) or bool(false)
    echo ' URL: ' . $url . '<br>';
}

Output:

bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com

As you can see above, www.bokranzr.com is listed as a malicious website on an RBL, so domain() returned false for it.
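The URLValidation class itself isn't reproduced here, so how it queries the lists is unknown. As a rough sketch of how a domain-based blocklist lookup generally works (an assumption, not the class's actual code): the domain is prepended to the list's DNS zone and resolved, and a listed domain resolves to an address (typically in 127.0.0.0/8) while an unlisted one does not. The function names and the zone passed in are hypothetical placeholders.

```php
<?php
// Hypothetical helper: build the DNS name to query for a domain-based
// blocklist, assuming the common "<domain>.<zone>" convention.
function blocklistQueryName(string $url, string $zone): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component, nothing to look up
    }
    $host = strtolower($host);
    // Simplification: strip a leading "www."; real lists expect the registered
    // domain, which needs a public-suffix lookup to get right.
    if (strncmp($host, 'www.', 4) === 0) {
        $host = substr($host, 4);
    }
    return $host . '.' . rtrim($zone, '.');
}

// Hypothetical helper: a listed domain has an A record in the list's zone.
function isListed(string $url, string $zone): bool
{
    $name = blocklistQueryName($url, $zone);
    return $name !== null && checkdnsrr($name, 'A');
}
```

Only blocklistQueryName() is deterministic; isListed() performs a live DNS query, so its result depends on network access and on the list operator's usage policy.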

@Xavi Montero 2016-12-13 23:35:25

Inspired by this .NET Stack Overflow question and the article referenced in it, here is a URI validator (URI means it validates both URLs and URNs).

if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
    throw new \RuntimeException( "URI has not a valid format." );
}

I have successfully unit-tested this check inside a value object I made named Uri, which is tested by UriTest.

UriTest.php (Contains valid and invalid cases for both URLs and URNs)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( 'http://example.com' );
        $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ 'http://example.com' ],
                [ 'http://example.com/' ],
                [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'random-protocol://example.com' ],
                [ 'http://example.com:80' ],
                [ 'http://example.com?no-path-separator' ],
                [ 'http://example.com/pa%20th/' ],
                [ 'ftp://example.org/resource.txt' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#one-fragment' ],
                [ 'http://example.edu:8080#one-fragment' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'missing.protocol.example.com/path/' ],
                [ 'http://example.com\\bad-separator' ],
                [ 'http://example.com|bad-separator' ],
                [ 'ht tp://example.com' ],
                [ 'http://exampl e.com' ],
                [ 'http://example.com/pa th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#two-fragments#not-allowed' ],
                [ 'http://example.edu:portMustBeANumber#one-fragment' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}

Uri.php (Value Object)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        // https://stackoverflow.com/questions/30847/regex-to-validate-uris
        // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
        {
            throw new \RuntimeException( "URI has not a valid format." );
        }
    }
}

Running the unit tests

There are 65 assertions in 46 tests. Caution: there are 2 data providers for valid expressions and 2 more for invalid ones; in each pair, one is for URLs and the other for URNs. If you are using PHPUnit 5.6 or earlier, you need to merge each pair of data providers into a single one.

~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

There is 100% code coverage in this sample URI checker.

@Tim Groeneveld 2014-02-19 05:47:47

OK, so this is a little more complex than a simple regex, but it allows for different types of URLs.

Example inputs, all of which should be marked as valid, are listed after the function.

function is_valid_url($url) {
    // First check: is the url just a domain name? (allow a slash at the end)
    $_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
    if (preg_match($_domain_regex, $url)) {
        return true;
    }

    // Second: Check if it's a url with a scheme and all
    $_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
    if (preg_match($_regex, $url, $matches)) {
        // pull out the domain name, and make sure that the domain is valid.
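        $_parts = parse_url($url);
        // Inputs such as "www.example.com/page" pass the regex above but have no
        // scheme or host, so bail out before indexing into $_parts.
        if (!isset($_parts['scheme'], $_parts['host']))
            return false;
        if (!in_array($_parts['scheme'], array( 'http', 'https' )))
            return false;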

        // Check the domain using the regex, stops domains like "-example.com" passing through
        if (!preg_match($_domain_regex, $_parts['host']))
            return false;

        // This domain looks pretty valid. Only way to check it now is to download it!
        return true;
    }

    return false;
}

Note that there is an in_array check for the protocols that you want to allow (currently only http and https are in that list).

var_dump(is_valid_url('google.com'));         // true
var_dump(is_valid_url('google.com/'));        // true
var_dump(is_valid_url('http://google.com'));  // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true

@user3396065 2016-11-20 15:34:05

Throws ErrorException: Undefined index: scheme if the protocol is not specified. I suggest checking whether it is set first.

@Tim Groeneveld 2016-11-28 01:31:17

@user3396065, can you please provide an example input that throws this?
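For reference, one input that reaches parse_url() without a scheme is www.example.com/page: the path makes it fail the domain-only regex, but it matches the www alternative of the second regex, and parse_url() then returns no 'scheme' key. Both patterns are copied unchanged from the answer; the input is an arbitrary example.

```php
<?php
// Both patterns copied from is_valid_url().
$_domain_regex = '|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|';
$_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';

$url = 'www.example.com/page';
var_dump(preg_match($_domain_regex, $url)); // int(0): the path is not allowed here
var_dump(preg_match($_regex, $url));        // int(1): the "www." alternative matches

// parse_url() only returns the components it finds, so 'scheme' is missing.
$parts = parse_url($url);
var_dump(isset($parts['scheme']));          // bool(false)
var_dump($parts['path']);                   // string(20) "www.example.com/page"
```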

@Stanislav 2008-10-16 06:55:36

Use the filter_var() function to validate whether a string is URL or not:

var_dump(filter_var('example.com', FILTER_VALIDATE_URL));

It is bad practice to use regular expressions when not necessary.

EDIT: Be careful, this solution is not Unicode-safe and not XSS-safe. If you need complex validation, it may be better to look elsewhere.
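Note that filter_var() returns the filtered string on success (not true) and false on failure, so comparing against false is the safest check. A minimal illustration with arbitrary inputs:

```php
<?php
// FILTER_VALIDATE_URL requires a scheme, so a bare domain is rejected.
var_dump(filter_var('example.com', FILTER_VALIDATE_URL));        // bool(false)

// On success the URL string itself is returned, not true.
var_dump(filter_var('http://example.com', FILTER_VALIDATE_URL)); // string(18) "http://example.com"

// Single-label hosts such as localhost are accepted as well.
$ok = filter_var('http://localhost', FILTER_VALIDATE_URL) !== false;
var_dump($ok);                                                   // bool(true)
```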

@Owen 2008-10-19 08:07:13

This is definitely a great alternative; unfortunately it's PHP 5.2+ only (unless you install the PECL version).

@John Scipione 2009-11-11 22:24:29

filter_var only works in PHP >= 5.2.0

@vamin 2010-06-01 23:27:41

There's a bug in 5.2.13 (and I think 5.3.2) that prevents urls with dashes in them from validating using this method.

@Cesar 2010-09-06 19:30:19

filter_var will reject test-site.com. I have domain names with dashes, whether they are valid or not. I don't think filter_var is the best way to validate a URL. It will allow a URL like http://www

@Stanislav 2010-09-07 10:34:30

> It will allow a url like 'www'

That is OK, in the same way that a URL like 'localhost' is.

@Mathias Bynens 2010-12-03 13:18:04

This will allow http://??/.

@liviucmg 2011-03-29 17:43:42

One particular problem: This validates URLs according to RFC 2396 which does not allow underscores in subdomains, but some websites do have underscores in subdomains.

@Benji XVI 2011-05-10 13:24:13

The other problem with this method is it is not unicode-safe.

@Zack Zatkin-Gold 2012-01-12 02:04:55

The filter_var function has since been updated, and it's now possible to validate URLs with dashes in them, rendering your comment incorrect, @vamin (see bug report here).

@vamin 2012-01-23 18:12:07

@zzatkin, the bug report states that the fix is incorporated into the later 5.2.14 and 5.3.3 versions (it came too late for 5.2.13 and 5.3.2), though I agree it's not really an issue anymore so long as you keep PHP up to date.

@Andrii Nemchenko 2012-07-08 17:30:01

@Galen Your answer doesn't contain a code example.

@Bretticus 2012-11-19 19:31:43

As I'm finding out today, it will also validate onedomain.com<br>http://www.anotherone.com<br>http:/… Not what I had in mind! Going back to a regular expression alternative (PHP Version => 5.4.4)

@Sawny 2012-12-16 19:06:33

Doesn't accept UTF-8 characters. Will return false for http://wiki.com/öva/mä/åäö.

@mic 2013-09-30 09:35:52

filter_var appears to validate all kinds of URL formats, whether they are valid or not; it seems that regex is the way to correctly validate URLs.

@bhaskarc 2015-03-15 17:59:23

Yet another issue is that it does not validate against newer TLDs like .me, .cm, .guru, etc.

@RisingSun 2015-05-04 18:46:00

This is a bad solution which should not have so many up votes. Highly XSS vulnerable.

@Nick Rice 2016-09-12 11:09:42

Downvoted as dangerous. Read the comments about it in the online PHP manual!

@S. Imp 2017-05-12 21:53:42

FILTER_VALIDATE_URL has a lot of problems that need fixing. Also, the docs describing the flags do not reflect the actual source code where references to some flags have been removed entirely. More info here: news.php.net/php.internals/99018

@thespacecamel 2018-08-31 18:31:41

Here's another article explaining the problems with this: d-mueller.de/blog/…

@Fredmat 2015-05-14 13:13:23

There is a PHP native function for that:

$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';

if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
    // Wrong
}
else {
    // Valid
}

Returns the filtered data, or FALSE if the filter fails.

Check it here

@suspectus 2015-06-30 12:13:28

This answer duplicates one of the answers from 2008!

@Owen 2008-10-15 19:30:51

I used this on a few projects, I don't believe I've run into issues, but I'm sure it's not exhaustive:

$text = preg_replace(
  '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
  // plain replacement string: without /e, extra outer quotes would end up in the output
  '<a href="$1" target="_blank">$3</a>$4',
  $text
);

Most of the random junk at the end is there to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked, I've more or less just copied it over from project to project.
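With no /e modifier, the replacement can be a plain single-quoted string; extra outer quotes would otherwise be copied into the output. A small wrapper (linkify() is a hypothetical name) shows the trailing-period handling described above:

```php
<?php
// Owen's pattern with a plain replacement string. linkify() is a
// hypothetical wrapper name used only for this example.
function linkify(string $text): string
{
    return preg_replace(
        '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
        '<a href="$1" target="_blank">$3</a>$4',
        $text
    );
}

// The trailing ". " is captured by group 4 and put back after </a>.
echo linkify('I read http://domain.com. It was good.'), "\n";
// A URL at the very end of the string is handled by the $ alternative.
echo linkify('Go to https://example.org'), "\n";
```

Note that the link text is group 3, so the scheme is dropped from the visible text while the href keeps it.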

@alex 2009-05-27 03:30:22

This has been downvoted... can anyone explain why?

@Alan Moore 2009-05-30 05:53:50

Some things that jump out at me: use of alternation where character classes are called for (every alternative matches exactly one character); and the replacement shouldn't have needed the outer double-quotes (they were only needed because of the pointless /e modifier on the regex).

@Gumbo 2010-01-04 08:30:57

@John Scipione: google.com is only a valid relative URL path but not a valid absolute URL. And I think that’s what he’s looking for.

@Softy 2011-02-02 09:06:46

This doesn't work in this case - it includes the trailing ": 3 cantari noi in albumul <a href="audio.resursecrestine.ro/cantece/index-autori/andrei-rosu/…>

@Stephen P 2011-07-27 23:55:04

@Softy something like http://example.com/somedir/... is a perfectly legitimate URL, asking for the file named ... - which is a legitimate file name.

@Tamlyn 2012-08-15 14:44:39

Doesn't work if the url is at the end of the string.

@Joko Wandiro 2013-11-26 08:03:04

I'm using Zend\Validator\Regex to validate URLs with your pattern, but it still detects http://www.example as valid.

@Graham T 2014-05-17 14:37:28

Great and comprehensive resource here: mathiasbynens.be/demo/url-regex

@Thomas Venturini 2014-08-21 09:17:01

Here is the way I did it. I should mention that I am not so sure about the regex, but it should work though :)

$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
$text = preg_replace_callback($pattern, function($m) {
    return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
}, $text);

This way you won't need the e (eval) modifier on your pattern, which was deprecated in PHP 5.5 and removed in PHP 7.

Hope it helps :)

@George Milonas 2012-12-07 03:15:06

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "&#x00E6;", // æ
    "&#x00C6;", // Æ
    "&#x00C0;", // À
    "&#x00E0;", // à
    "&#x00C1;", // Á
    "&#x00E1;", // á
    "&#x00C2;", // Â
    "&#x00E2;", // â
    "&#x00E5;", // å
    "&#x00C5;", // Å
    "&#x00E4;", // ä
    "&#x00C4;", // Ä
    "&#x00C7;", // Ç
    "&#x00E7;", // ç
    "&#x00D0;", // Ð
    "&#x00F0;", // ð
    "&#x00C8;", // È
    "&#x00E8;", // è
    "&#x00C9;", // É
    "&#x00E9;", // é
    "&#x00CA;", // Ê
    "&#x00EA;", // ê
    "&#x00CB;", // Ë
    "&#x00EB;", // ë
    "&#x00CE;", // Î
    "&#x00EE;", // î
    "&#x00CF;", // Ï
    "&#x00EF;", // ï
    "&#x00F8;", // ø
    "&#x00D8;", // Ø
    "&#x00F6;", // ö
    "&#x00D6;", // Ö
    "&#x00D4;", // Ô
    "&#x00F4;", // ô
    "&#x00D5;", // Õ
    "&#x00F5;", // õ
    "&#x0152;", // Œ
    "&#x0153;", // œ
    "&#x00FC;", // ü
    "&#x00DC;", // Ü
    "&#x00D9;", // Ù
    "&#x00F9;", // ù
    "&#x00DB;", // Û
    "&#x00FB;", // û
    "&#x0178;", // Ÿ
    "&#x00FF;", // ÿ 
    "&#x00D1;", // Ñ
    "&#x00F1;", // ñ
    "&#x00FE;", // þ
    "&#x00DE;", // Þ
    "&#x00FD;", // ý
    "&#x00DD;", // Ý
    "&#x00BF;", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "&#x00DF;", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

  // The rest of the path for a standard URL.
  $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  // Each check below recognises one kind of valid link, so a match means valid.
  if (strpos($text, '<front>') === 0) {
    return true;
  }
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return true;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return true;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return true;
  }

  return false;
}

@Jeff Puckett 2016-09-26 20:17:20

There are a lot more top level domains.

@Vikash Kumar 2012-10-16 08:57:46

    function validateURL($URL) {
      $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
      $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";       
      if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
        return true;
      } else{
        return false;
      }
    }

@user3396065 2016-11-20 15:20:21

Doesn't work with links like 'www.w3schools.com/home/3/?a=l'

@Jeremy Moore 2012-08-05 23:17:48

I've found this to be the most useful for matching a URL:

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

@andrewsi 2012-09-30 20:27:27

Will that match URLs that begin with ftp:?

@Shahbaz 2013-09-26 11:43:48

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
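To answer the ftp question empirically, here is the delimited pattern run against a few arbitrary inputs. ftp:// URLs are rejected, since the optional scheme group only covers http and https, and : is not in the host character class. There is also no i flag, so uppercase input fails.

```php
<?php
// Jeremy Moore's pattern as delimited by Shahbaz, unchanged.
$re = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

foreach (['https://www.example.com/path/to/page', 'example.com', 'ftp://example.com', 'HTTP://EXAMPLE.COM'] as $url) {
    // 1 = match, 0 = no match
    printf("%-38s %d\n", $url, preg_match($re, $url));
}
```

The nested ([\/\w \.-]*)* quantifier can also backtrack heavily on long inputs that ultimately fail, so it is worth testing with realistic data before relying on it.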

@promaty 2011-05-11 18:29:05

I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities, and even if you did, there is still a chance that the URL simply doesn't exist.

Here is a very simple way to test if url actually exists and is readable :

if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";

(if there is no preg_match then this would also validate all filenames on your server)

@jini 2011-03-30 20:45:07

function is_valid_url ($url="") {

        if ($url=="") {
            $url=$this->url;
        }

        $url = @parse_url($url);

        if ( ! $url) {


            return false;
        }

        $url = array_map('trim', $url);
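        // default to the scheme's standard port (443 for https, 80 otherwise)
        $url['port'] = isset($url['port']) ? (int)$url['port'] : ((isset($url['scheme']) && $url['scheme'] === 'https') ? 443 : 80);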
        $path = (isset($url['path'])) ? $url['path'] : '';

        if ($path == '') {
            $path = '/';
        }

        $path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';



        if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
            if ( PHP_VERSION >= 5 ) {
                $headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
            }
            else {
                $fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);

                if ( ! $fp ) {
                    return false;
                }
                fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
                $headers = fread ( $fp, 128 );
                fclose ( $fp );
            }
            $headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
            // a group, not a character class: only these three status codes pass
            return ( bool ) preg_match ( '#^HTTP/.*\s+(200|301|302)\s#i', $headers );
        }

        return false;
    }
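When written as [(200|301|302)]+, the status check is a character class rather than a group: it matches any run of the characters ( ) | 0 1 2 3, so a status such as 201 passes while 404 does not. A quick comparison against made-up status lines (no request is made):

```php
<?php
// Character class: any run of the characters ( ) | 0 1 2 3 counts as a "code".
$classPattern = '#^HTTP/.*\s+[(200|301|302)]+\s#i';
// Group: only the literal codes 200, 301 and 302 match.
$groupPattern = '#^HTTP/.*\s+(200|301|302)\s#i';

foreach (['HTTP/1.1 200 OK', 'HTTP/1.1 201 Created', 'HTTP/1.1 404 Not Found'] as $line) {
    printf("%-24s class=%d group=%d\n", $line, preg_match($classPattern, $line), preg_match($groupPattern, $line));
}
```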

@Yuda Prawira 2011-07-18 07:42:26

pretty nice on security XD

@pgee70 2014-09-28 21:41:05

Hi, this solution is good, and I upvoted it, but it doesn't take into account the standard port for https. I suggest you just replace 80 with '' where it works out the port.

@Raz0rwire 2016-07-18 13:34:59

I ended up implementing a variation on this, because my domain cares whether a URL actually exists or not :)

@abhiomkar 2011-03-13 11:46:46

As per John Gruber (Daring Fireball):

Regex:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

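Using it in preg_match() (with # as the delimiter, because the pattern itself contains /, and the u flag so the UTF-8 quote characters are treated as whole characters):

preg_match("#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#u", $url)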

Here is the extended regex pattern (with comments):

(?xi)
\b
(                       # Capture 1: entire matched URL
  (?:
    https?://               # http or https protocol
    |                       #   or
    www\d{0,3}[.]           # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                       # One or more:
    [^\s()<>]+                  # Run of non-space, non-()<>
    |                           #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                       # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                               #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)

For more details please look at: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
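Gruber's pattern is designed to find URLs inside running text rather than to validate a single string, so preg_match_all() is a natural fit. A sketch using the compact form with # as the delimiter and the u flag (the sample text is made up):

```php
<?php
// Compact form of Gruber's pattern; '#' is the delimiter because the pattern
// contains '/', and the u flag makes «»“”‘’ single characters.
$gruber = "#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#u";

$text = 'Visit https://example.com/page, or www.example.org.';
preg_match_all($gruber, $text, $m);

// $m[0] holds each full match; trailing punctuation is left out.
print_r($m[0]);
```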

@Roger 2011-02-11 13:04:14

Just in case you want to know if the url really exists:

function url_exist($url){//returns true if the URL exists
    $c=curl_init();
    curl_setopt($c,CURLOPT_URL,$url);
    curl_setopt($c,CURLOPT_HEADER,1);//get the header
    curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
    curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
    if(!curl_exec($c)){
        //echo $url.' does not exist';
        return false;
    }else{
        //echo $url.' exists';
        return true;
    }
    //$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
    //return ($httpcode<400);
}

@Yzmir Ramirez 2011-08-06 18:14:07

I would still do some kind of validation on $url before actually verifying that the URL is real, because the above operation is expensive, perhaps as much as 200 milliseconds depending on file size. In some cases the URL may not actually have a resource available at its location yet (e.g. creating a URL to an image that has yet to be uploaded). Additionally, you're not using a cached version, so it's not like file_exists(), which caches a stat on a file and returns nearly instantly. The solution you provided is still useful though. Why not just use fopen($url, 'r')?

@PJ Brunet 2012-03-20 20:24:09

Thanks, just what I was looking for. However, I made a mistake trying to use it. The function is "url_exist" not "url_exists" oops ;-)

@siliconpi 2012-05-10 07:14:28

Is there any security risk in directly accessing the user entered URL?

@Camaleo 2018-03-12 13:28:07

You may want to add a check for a 404 response:

$httpCode = curl_getinfo( $c, CURLINFO_HTTP_CODE );
//echo $url . ' ' . $httpCode . '<br>';
if( $httpCode == 404 ) { echo $url.' 404'; }

@Frankie 2009-03-12 17:17:07

Edit:
As incidence pointed out, this code relies on eregi(), which was DEPRECATED with the release of PHP 5.3.0 (2009-06-30) and removed in PHP 7.0, so use it accordingly.


Just my two cents, but I've developed this function and have been using it successfully for a while. It's well documented and separated, so you can easily change it.

// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
    if($url==NULL) return false;

    $protocol = '(http://|https://)';
    $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

    $regex = "^". $protocol . // must include the protocol
             '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
             '[a-z]' . '{2,6}'; // followed by a TLD
    if(eregi($regex, $url)==true) return true;
    else return false;
}
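Since eregi() no longer exists in PHP 7 and later, here is a hypothetical port of the same function to preg_match(). The pattern is unchanged; # is used as the delimiter because the pattern contains /, and the i flag stands in for eregi()'s case-insensitivity. The name isURLPreg() is made up to avoid clashing with the original.

```php
<?php
// Hypothetical port of Frankie's isURL() from eregi() to preg_match().
function isURLPreg(?string $url = null): bool
{
    if ($url === null || $url === '') {
        return false; // the original's $url == NULL also rejects ''
    }

    $protocol = '(http://|https://)';
    $allowed  = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

    $regex = '^' . $protocol .              // must include the protocol
             '(' . $allowed . '{1,63}\.)+' . // one or more subdomains
             '[a-z]{2,6}';                   // followed by a TLD

    // '#' delimiter because the pattern contains '/'; 'i' mirrors eregi().
    return preg_match('#' . $regex . '#i', $url) === 1;
}

var_dump(isURLPreg('http://example.com')); // bool(true)
var_dump(isURLPreg('example.com'));        // bool(false): no protocol
```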

@jussi 2009-12-10 15:48:03

Eregi will be removed in PHP 6.0.0. And domains with "öäåø" will not validate with your function. You probably should convert the URL to punycode first?

@Frankie 2009-12-10 18:05:05

@incidence absolutely agree. I wrote this in March and PHP 5.3 only came out late June setting eregi as DEPRECATED. Thank you. Gonna edit and update.

@Yzmir Ramirez 2011-08-06 18:15:51

Correct me if I'm wrong, but can we still assume TLDs will have a minimum of 2 characters and maximum of 6 characters?

@Nick Rice 2016-09-12 11:02:35

@YzmirRamirez (All these years later...) If there was any doubt when you wrote your comment there certainly isn't now, with TLDs these days such as .photography

@Yzmir Ramirez 2016-09-13 17:03:04

@NickRice you are correct...how much the web changes in 5 years. Now I can't wait until someone makes the TLD .supercalifragilisticexpialidocious

@joedevon 2009-05-30 05:11:48

Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.

Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:

^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}

Untested but I think that should work.

Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html

I put the following line:

(\S*?\.\S*?)

in the "regexp" section and the following line:

-hello.com

under the "sample text" section.

The result allowed the minus character through, because \S means any non-space character.

Note the regex from Frankie handles the minus because it has this part for the first character:

[a-z0-9]

Which won't allow the minus or any other special character.
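The same experiment can be reproduced in PHP. On its own, the lazy (\S*?\.\S*?) stops right after the first dot, but the point stands: the leading minus is part of the match. The combined pattern is shown with an i flag, which is an addition mirroring Frankie's eregi().

```php
<?php
// Domain part of Owen's pattern: \S accepts a leading minus sign.
preg_match('/(\S*?\.\S*?)/', '-hello.com', $m);
var_dump($m[1]);                                   // string(7) "-hello."

// Pattern built from Frankie's components: each label must start with [a-z0-9].
$frankie = '#^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}#i';
var_dump(preg_match($frankie, 'http://-hello.com')); // int(0)
var_dump(preg_match($frankie, 'http://hello.com'));  // int(1)
```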

@catchdave 2008-12-27 14:12:29

As per the PHP manual - parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore in this case - regex is the better method.

@Kzqai 2010-07-19 00:50:59

This argument doesn't follow. If FILTER_VALIDATE_URL is a little more permissive than you want, tack on some additional checks to deal with those edge cases. Reinventing the wheel with your own attempt at a regex against urls is only going to get you further from a complete check.
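As a sketch of that approach (the helper name isHttpUrl() and the exact extra checks are illustrative choices, not from this thread): let filter_var() do the structural work, then add the constraints it does not enforce, such as an http/https-only scheme and a dotted host, which rules out http://www. Restricting the scheme also rejects javascript: URLs.

```php
<?php
// Hypothetical helper: filter_var() plus a few extra edge-case checks.
function isHttpUrl(string $url): bool
{
    // Structural check first; false means filter_var() rejected it.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Only allow web schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return false;
    }

    // Require a dotted host without leading/trailing dots (rejects http://www).
    $host = (string) parse_url($url, PHP_URL_HOST);
    return strpos($host, '.') !== false && trim($host, '.') === $host;
}

var_dump(isHttpUrl('http://example.com')); // bool(true)
var_dump(isHttpUrl('http://www'));         // bool(false)
```

Whether to also require a dotted host is a policy decision; intranet setups that rely on names like localhost would drop that last check.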

@Kzqai 2010-07-19 02:54:06

See all the shot-down regexes on this page for examples of why -not- to write your own.

@catchdave 2010-07-20 04:54:50

You make a fair point Tchalvak. Regexes for something like URLs can (as per other responses) be very hard to get right. Regex is not always the answer. Conversely regex is also not always the wrong answer either. The important point is to pick the right tool (regex or otherwise) for the job and not be specifically "anti" or "pro" regex. In hindsight, your answer of using filter_var in combination with constraints on its edge-cases, looks like the better answer (particularly when regex answers start to get to greater than 100 chars or so - making maintenance of said regex a nightmare)

@Peter Bailey 2008-10-15 19:36:07

I've used this one with good success - I don't remember where I got it from

$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
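A quick look at how permissive the pattern is (inputs chosen for illustration): after http://, ftp:// or www. it only checks the allowed character set, not the host structure, so a numeric TLD such as example.123 passes; when scanning text, the final character class keeps trailing commas out of the match.

```php
<?php
// Peter Bailey's pattern, unchanged.
$pattern = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

var_dump(preg_match($pattern, 'http://example.123'));  // int(1)

preg_match($pattern, 'see www.example.com, thanks', $m);
var_dump($m[0]);                                          // string(15) "www.example.com"
```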

@andrewbadera 2009-08-26 15:54:55

^(http://|https://)?(([a-z0-9]?([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6} (may be too greedy, not sure yet, but it's more flexible on protocol and leading www)

@Roger 2011-02-11 12:40:26

Peter Bailey's regex passes example.123

@M A SIDDIQUI 2017-01-03 13:37:01

I used this and it passes: meraj
