By stage


2012-05-14 14:11:11 8 Comments

I searched for a solution but nothing was relevant, so here is my problem:

I want to parse a string which contains HTML text. I want to do it in JavaScript.

I tried this library but it seems that it parses the HTML of my current page, not from a string. Because when I try the code below, it changes the title of my page:

var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

My goal is to extract links from an HTML external page that I read just like a string.

Do you know an API to do it?

8 comments

@AnthumChris 2019-03-07 14:22:46

const parse = Range.prototype.createContextualFragment.bind(document.createRange());

document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );


Only valid child Nodes within the parent Node (start of the Range) will be parsed. Otherwise, unexpected results may occur:

// <body> is "parent" Node, start of Range
const parseRange = document.createRange();
const parse = Range.prototype.createContextualFragment.bind(parseRange);

// Returns Text "1 2" because td, tr, tbody are not valid children of <body>
parse('<td>1</td> <td>2</td>');
parse('<tr><td>1</td> <td>2</td></tr>');
parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');

// Returns <table>, which is a valid child of <body>
parse('<table> <td>1</td> <td>2</td> </table>');
parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');

// <tr> is parent Node, start of Range
parseRange.setStart(document.createElement('tr'), 0);

// Returns [<td>, <td>] element array
parse('<td>1</td> <td>2</td>');
parse('<tr> <td>1</td> <td>2</td> </tr>');
parse('<tbody> <td>1</td> <td>2</td> </tbody>');
parse('<table> <td>1</td> <td>2</td> </table>');

@Cilan 2014-02-19 03:28:46

It's quite simple:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/html');
// do whatever you want with htmlDoc.getElementsByTagName('a');

According to MDN, to do this in Chrome you need to parse as XML, like so:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/xml');
// do whatever you want with htmlDoc.getElementsByTagName('a');

It is currently unsupported by WebKit, where you would have to follow Florian's answer instead, and it is not known to work on most mobile browsers.

Edit: Now widely supported
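For the goal stated in the question (collecting link targets from an external page held as a string), a minimal sketch could look like the following. The helper name collectUrlAttributes and the selector list are illustrative assumptions, not part of any library. It reads the raw attribute values from a Document such as the one returned by parseFromString(txt, 'text/html').

```javascript
// Collect raw href/src attribute values from an already parsed document.
// `doc` is any Document, for example the result of DOMParser#parseFromString.
function collectUrlAttributes(doc) {
  var found = [];
  var nodes = doc.querySelectorAll(
    'a[href], img[src], script[src], frame[src], iframe[src]'
  );
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    // Anchors carry the target in href; the other elements use src.
    found.push(
      node.hasAttribute('href') ? node.getAttribute('href') : node.getAttribute('src')
    );
  }
  return found;
}

// Browser usage (DOMParser is not available in plain Node.js):
// var htmlDoc = new DOMParser().parseFromString(txt, 'text/html');
// var urls = collectUrlAttributes(htmlDoc);
```

Because DOMParser builds a complete document rather than setting innerHTML on a div, frame elements should survive parsing here as well, but that is worth checking in the browsers you target.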

@aendrew 2016-03-09 11:21:02

Worth noting that in 2016 DOMParser is now widely supported. caniuse.com/#feat=xml-serializer

@Expenzor 2017-06-06 13:30:40

"text/html" works fine on Chrome

@ceving 2017-11-03 00:17:46

Worth noting that all relative links in the created document are broken, because the document gets created by inheriting the documentURL of window, which most likely differs from the URL of the string.
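One possible workaround, sketched under the assumption that you know the address the HTML string was fetched from: read the raw attribute values with getAttribute('href') and resolve them yourself with the standard URL constructor. The function name resolveHrefs and the parameter pageUrl are hypothetical names chosen for this example.

```javascript
// Resolve raw href values against the URL the HTML string came from.
// `pageUrl` (hypothetical parameter) is the address of the external page.
function resolveHrefs(rawHrefs, pageUrl) {
  return rawHrefs.map(function (href) {
    try {
      return new URL(href, pageUrl).href;
    } catch (e) {
      return null; // the value could not be parsed as a URL
    }
  });
}

// In a browser, take the raw values from getAttribute('href'), not from the
// .href property, which is already resolved against the wrong base:
// var raw = Array.from(htmlDoc.querySelectorAll('a[href]'), function (a) {
//   return a.getAttribute('href');
// });
var resolved = resolveHrefs(
  ['test0', '/test1', 'https://example.org/x'],
  'https://example.com/dir/page.html'
);
// resolved[0] is 'https://example.com/dir/test0'
// resolved[1] is 'https://example.com/test1'
// resolved[2] is 'https://example.org/x'
```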

@Jack Giffin 2018-05-19 17:36:34

Worth noting that you should only call new DOMParser once and then reuse that same object throughout the rest of your script.
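A small sketch of what that reuse could look like: one lazily created parser shared by every call. The names sharedParser and parseHtmlDocument are made up for this example, and whether reuse gives a measurable benefit is an assumption you would want to verify for your own workload.

```javascript
// One lazily created DOMParser instance, reused for every call.
var sharedParser = null;

function parseHtmlDocument(html) {
  if (sharedParser === null) {
    sharedParser = new DOMParser(); // browser global; not defined in plain Node.js
  }
  return sharedParser.parseFromString(html, 'text/html');
}

// Browser usage:
// var first = parseHtmlDocument('<a href="test0">test01</a>');
// var second = parseHtmlDocument('<a href="test1">test02</a>'); // same parser instance
```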

@Justin 2019-03-07 17:39:24

The parse() solution below is more reusable and specific to HTML. This is nice if you need an XML document, however.

@Shariq Musharaf 2019-06-20 09:14:44

How can I display this parsed webpage in a dialog box or something? I was not able to find a solution for that.

@Munawwar 2015-10-24 17:52:11

EDIT: The solution below is only for HTML "fragments", since html, head and body are removed. I guess the solution for this question is DOMParser's parseFromString() method.

For HTML fragments, the solutions listed here work for most HTML; however, in certain cases they won't work.

For example, try parsing <td>Test</td>. This won't work with the div.innerHTML solution, with DOMParser.prototype.parseFromString, or with the range.createContextualFragment solution. The td tag goes missing and only the text remains.

Only jQuery handles that case well.

So the future solution (MS Edge 13+) is to use the template tag:

function parseHTML(html) {
    var t = document.createElement('template');
    t.innerHTML = html;
    return t.content.cloneNode(true);
}

var documentFragment = parseHTML('<td>Test</td>');

For older browsers I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99

@Jeff Laughlin 2017-09-29 17:06:37

If you want to write forward-compatible code that also works on old browsers you can polyfill the <template> tag. It depends on custom elements which you may also need to polyfill. In fact you might just want to use webcomponents.js to polyfill custom elements, templates, shadow dom, promises, and a few other things all at one go.

@John Slegers 2013-12-09 03:38:55

The following function parseHTML returns either a Document (when the markup starts with a doctype) or a DocumentFragment (when it does not).

The code:

function parseHTML(markup) {
    if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
        // Full document: parse into a new, detached HTML document
        var doc = document.implementation.createHTMLDocument("");
        doc.documentElement.innerHTML = markup;
        return doc;
    } else if ('content' in document.createElement('template')) {
        // Template tag exists!
        var tpl = document.createElement('template');
        tpl.innerHTML = markup;
        return tpl.content;
    } else {
        // Template tag doesn't exist!
        var docfrag = document.createDocumentFragment();
        var el = document.createElement('body');
        el.innerHTML = markup;
        // appendChild moves each node out of el, so loop until el is empty
        while (el.firstChild) {
            docfrag.appendChild(el.firstChild);
        }
        return docfrag;
    }
}

How to use :

var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');

@Sebastian Carroll 2014-01-10 06:21:32

I couldn't get this to work on IE8. I get the error "Object doesn't support this property or method" for the first line in the function. I don't think the createHTMLDocument function exists

@John Slegers 2014-01-14 15:03:35

What exactly is your use case? If you just want to parse HTML and your HTML is intended for the body of your document, you could do the following : (1) var div=document.createElement("DIV"); (2) div.innerHTML = markup; (3) result = div.childNodes; --- This gives you a collection of childnodes and should work not just in IE8 but even in IE6-7.

@Sebastian Carroll 2014-01-22 22:04:57

Thanks for the alternate option, I'll try it if I need to do this again. For now though I used the jQuery solution above.

@Toothbrush 2016-12-24 21:02:03

@SebastianCarroll Note that IE8 doesn't support the trim method on strings. See stackoverflow.com/q/2308134/3210837.

@John Slegers 2016-12-29 14:53:37

@Toothbrush : Is IE8 support still relevant at the dawn of 2017?

@Toothbrush 2016-12-29 15:36:41

@JohnSlegers For some companies, yes.

@John Slegers 2017-01-02 08:30:57

@Toothbrush : Good to know :-)

@Florian Margaine 2012-05-14 14:14:36

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements

@stage 2012-05-14 15:10:17

Just a note: With this solution, if I do a "alert(el.innerHTML)", I lose the <html>, <body> and <head> tag....

@stage 2012-05-21 10:10:20

Problem: I need to get links from <frame> tags, but with this solution the frame tags are deleted...

@Florian Margaine 2012-05-21 10:15:03

You can clone the <frame> and work on the clone. This way, you keep the original untouched and work on the cloned element (which you can delete/whatever). To clone, you can use: var c = el.cloneNode( true ); or with jQuery: var c = $( el ).clone();.

@stage 2012-05-21 10:26:50

I think I didn't understand because when I try it, it doesn't work: var c = el.cloneNode( true ); alert(c.innerHTML); The frame tag is still deleted

@Florian Margaine 2012-05-21 12:03:09

It does work in there: jsfiddle.net/Ralt/nkPjp . If what you want is getting elements from an iframe on another domain, then it is not possible for security reasons.

@stage 2012-05-21 13:28:28

I've got this: jsfiddle.net/aHWJ8. I cannot grab the link. As you can see, even the <body>, <head> and <html> tags are deleted.

@stage 2012-05-21 13:54:24

The link is in the "src" of the frame. <FRAME SRC='web-pages/page.html'>

@Florian Margaine 2012-05-21 14:05:47

Well, that's completely different from what your question states. You should ask another question for this.

@Florian Margaine 2012-05-21 14:16:47

But the problem is that you can't do that. Even jQuery will strip off the frame tags, since it's just using innerHTML. I don't think using frames is a good idea btw.

@stage 2012-05-21 14:16:54

But this is what I asked: "My goal is to extract links from a HTML external page that I read just like a String." I extract links from <img>, <script>, <a>... I just miss FRAME because it's deleted by the innerHTML method.

@Florian Margaine 2012-05-21 14:19:58

In an HTML page, a link is an anchor tag (a), that's how everybody answered you :-). You can't get the FRAME source. innerHTML is the only way to do this, so you can't do it. Your only way would be to send the html server side with ajax so that you can work with it.

@Nick 2013-12-03 06:20:36

Thanks for posting an answer that involves vanilla Javascript! Almost in 99.999% of the cases there's no need to use jQuery! Occasionally, I get lazy and use $.get/post, but that's it.

@omninonsense 2015-05-20 17:21:21

@stage I'm a little bit late to the party, but you should be able to use document.createElement('html'); to preserve the <head> and <body> tags.

@JMRC 2015-11-20 17:31:24

I was afraid of ID collisions, but this did not happen. Just in case another newbie was wondering the same thing.

@symbiont 2017-08-16 11:39:05

it looks like you are putting an html element within an html element

@RiA 2018-03-27 03:42:30

In my case, my page needs to repeat this activity over and over again. Would repeatedly creating a dummy dom element get memory intensive? Is there a way to dispose of the dom element once the innerHtml has been extracted? I'm not quite familiar with how the browser handles javascript variables.

@Justin 2019-03-07 17:36:47

I'm concerned this is upvoted as the top answer. The parse() solution below is more reusable and elegant.

@Joel Richard 2015-02-08 04:41:29

The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:

var range = document.createRange();
range.selectNode(document.body); // required in Safari
var fragment = range.createContextualFragment('<h1>html...</h1>');
var firstNode = fragment.firstChild;

I would recommend creating a helper function that uses createContextualFragment if available and falls back to innerHTML otherwise.
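Such a helper might look roughly like this. It is only a sketch: the name parseFragment and the optional doc parameter (added so the function can be used with any document object) are assumptions of this example, not part of the answer above.

```javascript
// Parse an HTML fragment string into a DocumentFragment.
// `doc` is optional and defaults to the page's global document.
function parseFragment(html, doc) {
  doc = doc || document;
  var range = doc.createRange ? doc.createRange() : null;
  if (range && typeof range.createContextualFragment === 'function') {
    range.selectNode(doc.body); // context node, noted above as required in Safari
    return range.createContextualFragment(html);
  }
  // Fallback: let innerHTML parse the string, then move the nodes over.
  var container = doc.createElement('div');
  container.innerHTML = html;
  var fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild); // appendChild removes it from container
  }
  return fragment;
}

// Browser usage:
// var fragment = parseFragment('<h1>html...</h1>');
// var firstNode = fragment.firstChild;
```

Both branches parse relative to a body or div context, so markup such as a bare <td> will still lose its tag, as discussed elsewhere on this page.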

Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3

@Ry- 2015-08-28 22:54:24

Note that, like (the simple) innerHTML, this will execute an <img>’s onerror.

@Munawwar 2015-10-05 21:47:12

An issue with this is that HTML like '<td>test</td>' would ignore the td in the document.body context (and only create a 'test' text node). OTOH, if it is used internally in a templating engine, then the right context would be available.

@Munawwar 2015-10-05 21:49:36

Also BTW, IE 11 supports createContextualFragment.

@sea26.2 2019-04-19 01:38:31

The question was how to parse with JS - not Chrome or Firefox

@Mathieu 2012-05-14 14:18:00

var $doc = new DOMParser().parseFromString($html, "text/html");
var $As = $('a', $doc);

@Rob W 2012-05-15 13:08:49

Why are you prefixing $? Also, as mentioned in the linked duplicate, text/html is not supported very well, and has to be implemented using a polyfill.

@Mathieu 2012-05-15 13:23:44

I copied this line from a project. I'm used to prefixing variables with $ in JavaScript applications (not in libraries). It's just to avoid having a conflict with a library. That's not very useful now, as almost every variable is scoped, but it used to be useful. It also (maybe) helps to identify variables easily.

@Jokester 2013-04-24 16:51:36

Sadly, DOMParser doesn't work with text/html in Chrome either; this MDN page gives a workaround.

@jmar777 2012-05-14 14:17:13

If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:

var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
var anchors = $('<div/>').append(html).find('a').get();

Edit - just saw @Florian's answer which is correct. This is basically exactly what he said, but with jQuery.

@Florian Margaine 2012-05-14 14:30:43

I edited to add a jquery solution, not exactly like yours!
