By bergin

2010-09-29 10:18:14 8 Comments

Trying to find the links on a page.

My regex is:

/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/

but it seems to fail on

<a title="this" href="that">what?</a>

How would I change my regex to deal with href not placed first in the a tag?


@Meloman 2019-01-22 12:54:27

The following works for me and returns both the href and the text of the anchor tag.

preg_match_all("'\<a.*?href=\"(.*?)\".*?\>(.*?)\<\/a\>'si", $html, $match);
$urls = array();
if ($match) {
    foreach ($match[0] as $k => $e) {
        $urls[] = array(
            'anchor'    =>  $e,              // the full matched anchor tag
            'href'      =>  $match[1][$k],   // first capture group: href value
            'value'     =>  $match[2][$k]    // second capture group: link text
        );
    }
}

The multidimensional array called $urls now contains associative sub-arrays that are easy to use.

@Milan Malani 2016-08-26 11:17:59

For anyone who still hasn't found a solution, this is very easy and fast using SimpleXML:

$a = new SimpleXMLElement('<a href="">Click here</a>');
echo $a['href']; // echoes the href value (an empty string here)

It's working for me.

@Ravi Prakash 2016-07-06 05:23:10

preg_match_all("/(]>)(.?)(</a)/", $contents, $impmatches, PREG_SET_ORDER);

It is tested and it fetches all a tags from any HTML code.

@Gordon 2010-09-29 10:35:53

Reliable regexes for HTML are difficult to write. Here is how to do it with DOM:

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHTML($node), PHP_EOL;
}

The above would find and output the "outerHTML" of all A elements in the $html string.

To get all the text values of the node, you do

echo $node->nodeValue; 

To check if the href attribute exists you can do

echo $node->hasAttribute( 'href' );

To get the href attribute you'd do

echo $node->getAttribute( 'href' );

To change the href attribute you'd do

$node->setAttribute('href', 'something else');

To remove the href attribute you'd do

$node->removeAttribute('href');

You can also query for the href attribute directly with XPath

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach($nodes as $href) {
    echo $href->nodeValue;                       // echo current attribute value
    $href->nodeValue = 'new value';              // set new attribute value
    $href->parentNode->removeAttribute('href');  // remove attribute
}

On a sidenote: I am sure this is a duplicate and you can find the answer somewhere in here
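
As a self-contained sketch of the DOM approach described above, the snippet below collects the href and text of every anchor into an array. The sample $html string and the $links variable are made up for illustration, and the @ in front of loadHTML() is just one way to silence the warnings that messy real-world markup tends to trigger.

```php
<?php
// Hypothetical sample input; any HTML string would do.
$html = '<p><a title="this" href="that">what?</a> <a name="top">no href</a></p>';

$dom = new DOMDocument;
// Suppress libxml warnings about imperfect markup.
@$dom->loadHTML($html);

$links = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    // Skip anchors that carry no href attribute at all.
    if (!$node->hasAttribute('href')) {
        continue;
    }
    $links[] = array(
        'href' => $node->getAttribute('href'),
        'text' => $node->nodeValue,
    );
}

print_r($links);
```

The position of href inside the tag does not matter here, which is the problem the original regex ran into.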

@Asciiom 2013-10-10 14:11:56

Reliable regexes for parsing HTML are inherently impossible, since HTML is not a regular language.
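
One small illustration of that point, as a sketch rather than a proof: a regex has no idea what an HTML comment is, while a parser does. The input string and the variable names below are invented for the example.

```php
<?php
// Hypothetical input: one link is commented out, one is live.
$html = '<!-- <a href="old">gone</a> --><a href="new">live</a>';

// Regex approach: it matches text, so the commented-out anchor is found too.
preg_match_all('/<a\s[^>]*href=(["\'])(.*?)\1/', $html, $m);
$regexHrefs = $m[2];

// DOM approach: the parser stores the comment as a comment node,
// so only the live anchor shows up as an element.
$dom = new DOMDocument;
@$dom->loadHTML($html);
$domHrefs = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    $domHrefs[] = $node->getAttribute('href');
}

print_r($regexHrefs);
print_r($domHrefs);
```

The regex could be patched to skip comments, but then script blocks, CDATA sections and attribute values containing ">" need the same treatment, which is where the approach stops scaling.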

@Toto 2010-09-29 11:43:02

I agree with Gordon, you MUST use an HTML parser to parse HTML. But if you really want a regex, you can try this one:

/^<a.*?href=(["\'])(.*?)\1.*$/

This matches <a at the beginning of the string, followed by any number of any characters (non-greedy) .*?, then href=, followed by the link surrounded by either " or '.

$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);

array(3) {
  [0]=>
  string(37) "<a title="this" href="that">what?</a>"
  [1]=>
  string(1) """
  [2]=>
  string(4) "that"
}

@Michal - wereda-net 2014-11-28 17:51:56

Just for info: if we search in a text containing many a elements, then the expression (.*?) is wrong.
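
One way to read this comment, sketched under that assumption: a lazy .*? between <a and href= is still free to run past the end of the tag, so with several anchors the match can start in one element and end in another. The sample string and variable names are made up; [^>]* is one possible way to keep the match inside a single opening tag.

```php
<?php
// Hypothetical input: the first anchor has no href, the second one does.
$html = '<a name="top">Top</a> <a href="/x">X</a>';

// .*? may cross tag boundaries, so the match starts at the FIRST <a.
preg_match('/<a.*?href=(["\'])(.*?)\1/', $html, $loose);

// [^>]* cannot pass a ">", so the match stays inside one opening tag.
preg_match('/<a\s[^>]*href=(["\'])(.*?)\1/', $html, $strict);

echo $loose[0], PHP_EOL;   // whole matched text of the loose pattern
echo $strict[0], PHP_EOL;  // whole matched text of the stricter pattern
```

Both patterns capture the same href here, but the loose one attributes it to the wrong anchor, which matters as soon as you also capture the link text.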

@Aif 2010-09-29 10:21:13

Why don't you just match it like this:

<?php

$str = '<a title="this" href="that">what?</a>';

$res = array();

preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);

var_dump($res);

?>

$ php test.php
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(27) "<a title="this" href="that""
  }
  [1]=>
  array(1) {
    [0]=>
    string(4) "that"
  }
}

This works. I've just removed the first capture group.

@Ignacio Bustos 2013-10-22 15:33:25

I recommend using preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res, PREG_SET_ORDER); in order to catch all href values correctly, then looping with foreach($res as $key => $val){echo $val[1];}
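
A short sketch of what PREG_SET_ORDER changes: each entry of $res becomes one complete match (index 0 is the whole match, index 1 the first capture group), so a single foreach walks the links one by one. The two-link sample string and the $hrefs array are made up for the example.

```php
<?php
// Hypothetical input with two links, one per quoting style.
$str = '<a title="this" href="that">what?</a> <a href=\'other\'>more</a>';

$res = array();
preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res, PREG_SET_ORDER);

$hrefs = array();
foreach ($res as $val) {
    // $val[0] is the full match, $val[1] the captured href value.
    $hrefs[] = $val[1];
}

print_r($hrefs);
```

Without the flag, the default PREG_PATTERN_ORDER groups results by capture group instead, so the same values would sit in $res[1].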

@Ruel 2010-09-29 10:25:36

Using your regex, I modified it a bit to suit your need.


I personally suggest you use an HTML parser

EDIT: Tested

@bergin 2010-09-29 10:28:04

using - sorry, it doesn't find the links

@bergin 2010-09-29 10:38:50


@Ruel 2010-09-29 10:41:17

Can you please tell me the text to match? I use: <a title="this" href="that">what?</a>

@Adam 2010-09-29 10:25:32

I'm not sure what you're trying to do here, but if you're trying to validate the link then look at PHP's filter_var()
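
A minimal sketch of the filter_var() idea, assuming the goal is to keep only absolute URLs: FILTER_VALIDATE_URL returns the value when it looks like a URL and false otherwise. The candidate values, including the example.com address, are invented for illustration.

```php
<?php
// Hypothetical href values, e.g. as extracted from a page.
$candidates = array('that', 'http://example.com/page?id=1');

$valid = array();
foreach ($candidates as $href) {
    // filter_var() returns false when the value is not a valid URL.
    if (filter_var($href, FILTER_VALIDATE_URL) !== false) {
        $valid[] = $href;
    }
}

print_r($valid);
```

Note that relative links such as "that" are rejected by this check, so it only helps if absolute URLs are what you are after.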

If you really need to use a regular expression then check out this tool, it may help:

@CharlesLeaf 2010-09-29 10:23:22

Quick test: <a\s+[^>]*href=(\"\'??)([^\1]+)(?:\1)>(.*)<\/a> seems to do the trick, with the 1st match being " or ', the second the 'href' value 'that', and the third the 'what?'.

The reason I left the first match of "/' in there is that you can use it to backreference it later for the closing "/' so it's the same.
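
To show what the backreference buys you, here is a small sketch with an href that contains the other kind of quote. The pattern is a simplified variant written for this example, not the exact regex quoted above, and the sample string is made up.

```php
<?php
// Hypothetical input: a double-quoted href containing an apostrophe.
$html = '<a href="it\'s.html">apostrophe</a>';

// With \1 the value must end with the same quote character that opened it.
preg_match('/<a\s[^>]*href=(["\'])(.*?)\1/', $html, $withBackref);

// Without it, either quote character ends the value, so it is cut short.
preg_match('/<a\s[^>]*href=["\'](.*?)["\']/', $html, $withoutBackref);

echo $withBackref[2], PHP_EOL;     // href captured via the backreference
echo $withoutBackref[1], PHP_EOL;  // href captured without it
```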

@CharlesLeaf 2010-09-29 10:30:43

@bergin please specify, what doesn't work? I get the exact value from the href in your test HTML. What are you expecting that this doesn't do? I see you use a different site for testing; there I also get the 'href' value successfully from your example.

@Alex Pliutau 2010-09-29 10:22:23

The pattern you want to look for would be the link anchor pattern, like (something):

$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";

@funerr 2016-09-09 11:36:40

What if the anchor has more attributes?
