By Chris W.


2011-08-31 21:44:11

I'm using the Python bindings to run Selenium WebDriver.

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so...

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with...

wd.page_source

But is there any way to get the "element source"?

elem.source   # <-- returns the HTML as a string

The selenium webdriver docs for Python are basically non-existent and I don't see anything in the code that seems to enable that functionality.

Any thoughts on the best way to access the HTML of an element (and its children)?


@Rusty 2018-02-04 17:32:45

My preferred way to get the rendered HTML is the following:

driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

However, the method above strips all the tags (nested tags included) and returns only the text content. If you are interested in the HTML markup as well, use the method below.

print(body_html.get_attribute("innerHTML"))

@Rusty 2018-02-05 04:58:38

You can also use driver.find_element_by_tag_name("body") to reach the body content of the page.
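Note that Selenium 4.3 removed the `find_element_by_*` helpers used throughout this thread, so on current versions the lookup is written as `find_element(by, value)`. A minimal sketch of the answer above under that assumption; `body_text_and_html` is a hypothetical helper name, and `"tag name"` is the string value of Selenium's `By.TAG_NAME` constant, used here so the helper does not need to import anything:

```python
def body_text_and_html(driver):
    """Return (visible text, inner HTML) of the page <body>.

    Hypothetical helper; `driver` is assumed to be a Selenium 4 WebDriver.
    "tag name" is the value of selenium's By.TAG_NAME constant.
    """
    body = driver.find_element("tag name", "body")
    text = body.text                        # rendered text only, tags stripped
    html = body.get_attribute("innerHTML")  # markup of the body's children
    return text, html


# Usage (requires a real browser session):
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("http://www.google.com")
# text, html = body_text_and_html(driver)
# print(html)
```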

@Nerijus 2011-12-20 12:49:48

You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JS:

element.getAttribute('innerHTML');

PHP:

$elem->getAttribute('innerHTML');

Tested and works with the ChromeDriver.

@bibstha 2012-03-22 13:53:46

innerHTML is not a DOM attribute, so the above answer wouldn't work. innerHTML is a JavaScript property, and doing the above would return null. The answer by nilesh is the proper answer.

@Ryan Shillington 2012-07-10 02:04:43

This works great for me, and is much more elegant than the accepted answer. I'm using Selenium 2.24.1.

@CuongHuyTo 2012-07-23 10:57:50

Though innerHTML is not a DOM attribute, it is well supported by all major browsers (quirksmode.org/dom/w3c_html.html). It works also well for me.

@Kelvin 2012-08-20 19:45:01

+1 This appears to work in Ruby also. I have a feeling that the getAttribute method (or its equivalent in other languages) just calls the JS method whose name is the argument. However, the documentation doesn't explicitly say this, so nilesh's solution should be a fallback.

@Andrew Badr 2012-08-30 20:00:06

I'm getting this: content.get_attribute('innerHTML') == u'<div>...</div>'

@acdcjunior 2014-05-22 20:54:05

This fails for HtmlUnitDriver. Works for ChromeDriver, FirefoxDriver, InternetExplorerDriver (IE10) and PhantomJSDriver (I haven't tested others).

@Momer 2014-06-17 20:12:34

@acdcjunior - HtmlUnit's JavaScript support is pretty weak; I'd imagine that, by extension, they haven't supported this. More info at this thread.

@mvndaai 2014-11-07 20:36:42

In Ruby it's element.attribute("innerHTML"), if anyone needs it.

@Bharat Mane 2016-04-26 12:43:43

Nice: elem.get_attribute("innerHTML") works.

@ShaBANG 2016-10-10 19:36:42

In Node: element.getAttribute('innerHTML')

@Shubham Jain 2017-09-03 07:18:46

innerHTML returns the content inside the selected element, and outerHTML returns that inner HTML together with the selected element itself.

Example: suppose your element is as follows:

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML output:

<td>A</td><td>B</td>

outerHTML output:

<tr id="myRow"><td>A</td><td>B</td></tr>
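To see the same relationship outside a browser, here is an offline sketch using Python's standard `xml.etree.ElementTree` on the row above. It is only an illustration: it needs well-formed markup, whereas Selenium reads these properties from the browser's live DOM.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')

# outerHTML: the element itself plus everything inside it
outer_html = ET.tostring(row, encoding="unicode")

# innerHTML: the element's leading text plus each serialized child
# (tostring also includes any tail text that follows a child)
inner_html = (row.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in row
)

print(inner_html)  # <td>A</td><td>B</td>
print(outer_html)  # <tr id="myRow"><td>A</td><td>B</td></tr>
```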

Live example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below is the syntax for the different bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want the whole page's HTML, use the code below:

driver.getPageSource();

@oleksii.burdin 2011-09-07 14:23:30

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

The Java method is described there:

java.lang.String    getText() 

Unfortunately, it's not available in Python. So you can translate the method names from Java to Python and try other logic using the methods that are available, without getting the whole page source...

E.g.

 my_id = elem[0].get_attribute('my-id')

@Chris W. 2011-09-07 21:17:18

Python actually does have a getText equivalent (I think it's just the "text" attribute?), but that only returns the plain text between HTML tags and won't return the full HTML source.

@Ryan Shillington 2012-07-10 02:06:18

This returns only the plain text (not the html) in Java too.

@HelloW 2013-09-12 18:17:19

You must reference it as elem[0], as you said; otherwise it doesn't work.

@StanleyD 2013-07-09 14:18:56

If you are interested in a solution for Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

@Shane 2013-08-04 00:01:28

Thanks for the help, I have used this. I also find that innerHTML = {selenium selector code}.text works just the same.

@WltrRpo 2016-03-29 21:25:03

Java with Selenium 2.53.0

driver.getPageSource();

@Corey Goldberg 2017-05-31 02:18:00

That's not what the question asked for.

@Stephan 2017-07-25 06:23:54

Depending on the WebDriver, the getPageSource method may not return the actual current page source (i.e., including any changes made by JavaScript); it may return the raw source sent by the server. Check the WebDriver documentation to be sure.
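If you need the DOM as it currently stands rather than whatever `page_source` happens to return, one option is to serialize `document.documentElement` through JavaScript. A minimal sketch, assuming `driver` is a Selenium WebDriver; `live_dom_html` is a hypothetical helper name. Keep in mind that `outerHTML` is the browser's own serialization of the DOM, so it omits the doctype and may be formatted differently from the original server response.

```python
def live_dom_html(driver):
    """Serialize the current DOM, including changes made by scripts.

    Hypothetical helper; `driver` is assumed to be a Selenium WebDriver.
    document.documentElement is the <html> element, so the doctype is not included.
    """
    return driver.execute_script("return document.documentElement.outerHTML;")


# Usage (requires a real browser session):
# html = live_dom_html(driver)
```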

@John Alberts 2013-04-15 20:59:33

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.

@nilesh 2011-09-03 03:29:14

There is no really straightforward way to get the HTML source of a WebElement. You will have to use JS. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element); 
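In the Python bindings, the counterpart of `JavascriptExecutor.executeScript` is `WebDriver.execute_script`. The sketch below (hypothetical helper name `inner_html`, with `driver` and `element` assumed to be a Selenium WebDriver and WebElement) tries `get_attribute("innerHTML")` first and only falls back to this JavaScript approach if it returns `None`, as some commenters reported for older driver and browser combinations:

```python
def inner_html(driver, element):
    """Return the innerHTML of a Selenium WebElement.

    Hypothetical helper: prefer get_attribute, and fall back to reading
    the DOM property through JavaScript if the driver returned None.
    """
    html = element.get_attribute("innerHTML")
    if html is None:
        # Inside the script, arguments[0] is the element passed after it
        html = driver.execute_script("return arguments[0].innerHTML;", element)
    return html
```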

@Chris W. 2011-09-07 21:15:12

This is essentially what I ended up doing, albeit with the Python equivalent.

@Ryan Shillington 2012-07-10 02:05:28

I think the answer below, using element.getAttribute("innerHTML") is a lot easier to read. I don't understand why people are voting it down.

@Anthon 2014-04-30 08:15:40

No need to call javascript at all. In Python just use element.get_attribute('innerHTML')

@nilesh 2014-04-30 13:25:36

@Anthon innerHTML is not a DOM attribute. When I answered this question in 2011, it did not work for me; it looks like some browsers now support it. If it works for you, then using innerHTML is cleaner. However, there is no guarantee it will work on all browsers.

@Illidan 2015-06-06 14:29:10

Apparently, this is the only way to get innerHTML when using RemoteWebDriver.

@Zorgijs 2014-05-30 10:25:21

And in PHPUnit selenium test it's like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');

@nefski 2014-03-06 14:52:17

Looks outdated, but let it be here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0)

@Mark 2013-03-20 18:08:52

Sure, you can get the entire HTML source with the script below in Selenium Python:

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

If you want to save it to a file:

# source_code is already a str; let open() handle the UTF-8 encoding
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

I suggest saving to a file because the source code is very long.

@David 2013-08-22 11:47:07

This worked in Ruby: elem.attribute("outerHTML")

@CodeGuru 2013-10-17 23:41:28

Can I set a delay and get the latest source? There are dynamic contents loaded using javascript.

@TheRookierLearner 2014-10-20 16:01:42

Does this work even if the page is not fully loaded? Also, is there any way to set a delay like @FlyingAtom mentioned?

@Tiffany G 2013-03-22 15:46:21

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute("class").

The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the String of an element, element.attribute(String).

@Ilya 2012-08-31 04:04:51

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);

This code really works to get JavaScript from source as well!
