By KevinLeng


2012-11-16 07:53:17 8 Comments

I want to use the "findall" method of the ElementTree module to locate some elements in a source XML file.

However, the source XML file (test.xml) has a namespace. Here is a truncated part of the file as a sample:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

The sample python code is below:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Returns [] (no match without the namespace)
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Returns [<Element '{http://www.test.com}PAID_OFF' at 0xb78b90>]

Although this works, it is very inconvenient to add the namespace "{http://www.test.com}" in front of each tag.

How can I ignore the namespace when using methods such as "find" and "findall"?

9 comments

@z33k 2019-08-13 09:00:10

Let's combine nonagon's answer with mzjn's answer to a related question:

from pathlib import Path
from typing import Dict, Tuple
import xml.etree.ElementTree as ET

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

Using this function we:

  1. Create an iterator to get both namespaces and a parsed tree object.

  2. Iterate over the created iterator to get the namespaces dict that we can later pass in each find() or findall() call as suggested by iMom0.

  3. Return the parsed tree's root element object and namespaces.

I think this is the best approach all around, as it involves no manipulation of either the source XML or the resulting parsed xml.etree.ElementTree output.

I'd also like to credit barny's answer with providing an essential piece of this puzzle (that you can get the parsed root from the iterator). Until then I actually traversed the XML tree twice in my application (once to get the namespaces, and a second time for the root).
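
A minimal sketch of how the returned namespaces dict might then be used, assuming Python 3.8 or later (where an empty-string prefix in the dict is applied to unprefixed tag names in the path). SAMPLE and the name parse_with_namespaces are stand-ins invented for illustration; the function mirrors parse_xml above but reads from an in-memory buffer instead of a file path:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative stand-in for test.xml, trimmed from the question's sample.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""


def parse_with_namespaces(source):
    """Return (root, namespaces) from a single pass over the document."""
    xml_iter = ET.iterparse(source, events=["start-ns"])
    # Each "start-ns" event carries a (prefix, uri) pair; the default
    # namespace is reported with an empty-string prefix.
    namespaces = dict(pair for _, pair in xml_iter)
    # The root attribute is only populated once the iterator is exhausted.
    return xml_iter.root, namespaces


root, ns = parse_with_namespaces(io.BytesIO(SAMPLE))

# Assumption: Python 3.8+. Unprefixed names in the path are resolved
# against the "" entry of the namespaces dict.
matches = root.findall("DEAL_LEVEL/PAID_OFF", ns)
paid_off_values = [el.text for el in matches]
print(paid_off_values)
```

On older Python versions the empty-prefix entry is not applied to unprefixed names, so the path would need an explicit prefix mapped in the dict instead.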

@est 2019-03-20 13:11:31

I might be late for this, but I don't think re.sub is a good solution.

However, the rewrite of xml.parsers.expat does not work on Python 3.x.

The main culprit is xml/etree/ElementTree.py; see the bottom of its source code:

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

Which is kinda sad.

The solution is to get rid of it first.

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

Tested on Python 3.6.

The try statement is useful in case somewhere in your code you reload or import a module twice; otherwise you get some strange errors like

  • maximum recursion depth exceeded
  • AttributeError: XMLParser

btw damn the etree source code looks really messy.

@lijat 2018-12-12 07:52:21

Improving on the answer by ericspod:

Instead of changing the parse mode globally we can wrap this in an object supporting the with construct.

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
        self.oldcreate = expat.ParserCreate
        expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
        expat.ParserCreate = self.oldcreate

This can then be used as follows

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
    tree = ET.parse("test.xml")

The beauty of this way is that it does not change any behaviour for unrelated code outside the with block. I ended up creating this after getting errors in unrelated libraries after using the version by ericspod which also happened to use expat.

@AndreasT 2018-12-26 00:42:20

This is sweet AND healthy! Saved my day! +1

@wimous 2013-11-20 19:07:52

The answers so far explicitly put the namespace value in the script. For a more generic solution, I would rather extract the namespace from the XML:

import re
def get_namespace(element):
  m = re.match(r'\{.*\}', element.tag)  # raw string, so the backslashes reach the regex engine unchanged
  return m.group(0) if m else ''

And use it in find method:

namespace = get_namespace(tree.getroot())
print(tree.find('./{0}parent/{0}version'.format(namespace)).text)
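
As a rough check, the helper above can be tried against the sample from the question. SAMPLE is an illustrative in-memory stand-in for test.xml, and the path uses the question's DEAL_LEVEL/PAID_OFF tags rather than parent/version:

```python
import re
import xml.etree.ElementTree as ET

# Illustrative stand-in for test.xml, trimmed from the question's sample.
SAMPLE = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""


def get_namespace(element):
    """Return the '{uri}' prefix of the element's tag, or '' if it has none."""
    m = re.match(r"\{.*\}", element.tag)
    return m.group(0) if m else ""


root = ET.fromstring(SAMPLE)
namespace = get_namespace(root)

# Reuse the root's namespace for every step of the path. This only works
# when all the elements on the path share that one namespace.
paid_off = root.find("./{0}DEAL_LEVEL/{0}PAID_OFF".format(namespace))
print(namespace, paid_off.text)
```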

@Kashyap 2014-03-18 19:29:48

Too much to assume that there is only one namespace

@ericspod 2018-01-19 15:56:47

If you're using ElementTree and not cElementTree you can force Expat to ignore namespace processing by replacing ParserCreate():

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree tries to use Expat by calling ParserCreate() but provides no option to omit the namespace separator string. The code above causes the separator to be ignored, but be warned that this could break other things.

@lijat 2018-12-11 12:42:39

This is a better way than other current answers as it does not depend on string processing

@barny 2019-02-13 14:52:15

In Python 3.7.2 (and possibly earlier) AFAICT it's no longer possible to avoid using cElementTree, so this workaround may not be possible :-(

@ericspod 2019-02-19 14:31:22

cElementTree is deprecated, but there is shadowing of types being done with the C accelerators. The C code isn't calling into expat, so yes, this solution is broken.

@est 2019-03-20 12:24:12

@barny it's still possible: ElementTree.fromstring(s, parser=None). I am trying to pass a parser to it.

@barny 2015-11-30 11:21:06

Here's an extension to nonagon's answer, which also strips namespaces off attributes:

from io import StringIO  # on Python 2: from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too (copy the keys, since attrib is modified in the loop)
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

@user2212280 2013-03-26 15:44:24

If you remove the xmlns attribute from the xml before parsing it then there won't be a namespace prepended to each tag in the tree.

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

@david.barkhuizen 2014-05-20 19:00:11

+100, someone mint this developer a cryptocoin

@Michael Rice 2014-09-15 20:53:58

Just FYI, this only works on Python 2.x; Python 3.x will throw: TypeError: can't use a string pattern on a bytes-like object
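
That TypeError appears when a str pattern is applied to bytes input. A minimal sketch of one way around it on Python 3, assuming the document was read as bytes (the raw value below is an illustrative stand-in, not the real test.xml): use a bytes pattern and a bytes replacement so the types match.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative stand-in for a document read in binary mode.
raw = b'<XML_HEADER xmlns="http://www.test.com"><TYPE>Updates</TYPE></XML_HEADER>'

# Pattern and replacement are both bytes, matching the type of the input.
# count=1 removes only the first default-namespace declaration.
stripped = re.sub(rb' xmlns="[^"]+"', b"", raw, count=1)

root = ET.fromstring(stripped)
type_text = root.find("TYPE").text
print(root.tag, type_text)
```

The same caveats as the original regex apply: it only handles a single default namespace declaration written with double quotes.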

@nonagon 2014-09-18 19:38:26

This worked in many cases for me, but then I ran into multiple namespaces and namespace aliases. See my answer for another approach that handles these cases.

@Mike 2015-02-15 19:48:25

-1 Manipulating the XML via a regular expression before parsing is just wrong. Though it might work in some cases, this should not be the top-voted answer and should not be used in a professional application.

@Parthian Shot 2015-06-16 20:11:07

@Mike HE COMES.

@nonagon 2014-09-18 19:37:36

Instead of modifying the XML document itself, it's best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:

from io import StringIO  # on Python 2: from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
root = it.root

This is based on the discussion here: http://bugs.python.org/issue18304
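
A small sketch of the same loop run on Python 3 against a document with two namespaces. SAMPLE, its second namespace URI, and the x prefix are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative document: a default namespace plus a prefixed one.
SAMPLE = b"""<root xmlns="http://www.test.com" xmlns:x="http://example.com/other">
    <DEAL_LEVEL>
        <x:PAID_OFF>N</x:PAID_OFF>
    </DEAL_LEVEL>
</root>"""

it = ET.iterparse(io.BytesIO(SAMPLE))
for _, el in it:
    # Parsed tags look like "{uri}name"; keep only the part after "}".
    if "}" in el.tag:
        el.tag = el.tag.split("}", 1)[1]
root = it.root

# Plain, namespace-free paths now match regardless of the original namespace.
paid_off = root.find("DEAL_LEVEL/PAID_OFF")
print(root.tag, paid_off.text)
```

Because the tags are rewritten in the tree itself, serializing the result again produces output without the original xmlns declarations, so this fits read-only processing better than round-tripping.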

@Jess 2014-10-11 03:08:06

This. This this this. Multiple name spaces were going to be the death of me.

@Tomasz Gandor 2014-11-14 15:12:34

OK, this is nice and more advanced, but it's still not et.findall('{*}sometag'). It also mangles the element tree itself, rather than just "performing the search ignoring namespaces this one time, without re-parsing the document, while retaining the namespace information". Well, for that case you obviously need to iterate through the tree and check for yourself whether each node matches your wishes after removing the namespace.
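
For what it's worth, Python 3.8 and later accept this kind of {*} wildcard in find() and findall() paths, which matches a tag in any (or no) namespace and leaves the tree untouched. A minimal sketch, assuming Python 3.8+ and using an illustrative in-memory copy of the question's sample:

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for test.xml, trimmed from the question's sample.
SAMPLE = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""

root = ET.fromstring(SAMPLE)

# Assumption: Python 3.8+. "{*}name" matches "name" in any namespace.
matches = root.findall("{*}DEAL_LEVEL/{*}PAID_OFF")
texts = [el.text for el in matches]

# The parsed tree is not modified: tags keep their namespace.
print(texts, root.tag)
```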

@TraceKira 2016-08-29 19:28:03

This works by stripping the string, but when I save the XML file using write(...), the xmlns="bla" namespace declaration disappears from the beginning of the XML. Please advise.

@tzp 2013-10-08 10:18:17

You can use the elegant string formatting construct as well:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

or, if you're sure that PAID_OFF only appears in one level in tree:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)
