Proper XML Namespaces support #367

ddevienne · 2020-08-19T09:17:38Z

I've searched the issues, and although a few relate to namespaces, none tackles the real issue,
which is that every node's namespace URI should be available. Knowing a node's namespace-qualified name
(i.e. NS-Prefix colon Local-Name) is insufficient, because of default namespaces and because prefixes are arbitrary:
only namespace URIs matter. Namespaces are pretty essential in XML.

The XML parser is best placed to keep track of in-scope namespaces and assign them to nodes.
Otherwise, client code needs to scan the whole document for all namespace-related attributes and
all element names, maintain a stack of in-scope namespaces, and keep an external shadow DOM
(i.e. a map) to know the NS URI of every node, which is possible, but cumbersome and inefficient.

(BTW, I don't see how the namespace-uri() = X XPath predicate can be correct and efficient w/o the
parse-tree knowing about the NS of all nodes, as described above)

I'm currently using https://github.com/svgpp/rapidxml_ns, but would welcome being able to replace it with pugixml,
provided it gains proper XML Namespaces support. Please consider this a formal request for enhancement. Thanks, --DD


zeux · 2020-09-22T20:38:42Z

> (BTW, I don't see how the namespace-uri() = X XPath predicate can be correct and efficient w/o the
> parse-tree knowing about the NS of all nodes, as described above)

To get the namespace URI for a single node, it's enough to scan the ancestry chain for the relevant attributes. It's not as efficient as already having the information, but it doesn't require scanning the entire document.
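As a rough illustration of the ancestor scan described above, here is a minimal sketch. The `Element` struct and the `namespace_uri` function are hypothetical stand-ins written for this example, not pugixml's API. It assumes attributes are available by name and that each element knows its parent, and it ignores the predefined `xml` prefix.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal element type for illustration; this is NOT pugixml's xml_node.
struct Element {
    std::string name;                               // qualified name, e.g. "svg:rect" or "rect"
    std::map<std::string, std::string> attributes;  // includes any xmlns / xmlns:prefix declarations
    const Element* parent = nullptr;                // null for the root element
};

// Resolves the namespace URI of an element by walking up the ancestor chain.
// Returns an empty string when no matching declaration is in scope.
std::string namespace_uri(const Element* node) {
    assert(node != nullptr);

    // An unprefixed element uses the default namespace ("xmlns");
    // a prefixed one uses the declaration "xmlns:<prefix>".
    const std::string::size_type colon = node->name.find(':');
    const std::string declaration = (colon == std::string::npos)
        ? std::string("xmlns")
        : "xmlns:" + node->name.substr(0, colon);

    // The nearest declaration (starting at the element itself) wins.
    for (const Element* e = node; e != nullptr; e = e->parent) {
        const auto it = e->attributes.find(declaration);
        if (it != e->attributes.end()) return it->second;
    }
    return std::string();
}
```

The cost of one lookup is proportional to the element's depth times the per-element attribute lookup, which is why it avoids a whole-document scan but is still slower than having the URI stored on the node.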

First-class support for namespaces would introduce memory overhead for all nodes to store the extra URI data, and would make the parser slower because it would need to identify xmlns attributes and maintain the relevant structures. Because of this, I actually believe that the external tracking approach is ideal: with it, users only pay for the extra namespace information when it's relevant. The implementation would be less efficient than a first-class implementation, but not by much, and it doesn't need to make the core more complex.

It's possible to include a helper like this in pugixml, maybe a separate class xml_namespaces that you can create from xml_document which would pre-record the association between nodes and namespace URIs. The implementation requires a hash map but there's already an implementation for compact mode that could be reused for this.

Perhaps an interface like this could work. This would assume that the tree doesn't mutate after construction.

class xml_namespaces
{
public:
    xml_namespaces();
    explicit xml_namespaces(xml_node root); // alternatively an explicit reset() method

    void reset(xml_node root);

    const char* local_name(xml_node node) const;
    const char* namespace_uri(xml_node node) const;

    // possibly also something like this for more efficient lookup:
    const void* get_namespace(const char* uri) const;
    bool has_namespace(xml_node node, const void* id) const;
};
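
To make the external tracking idea more concrete, here is a minimal sketch of such a side table. It is only an illustration under stated assumptions: `Node` and `NamespaceIndex` are hypothetical names, the tree type is a stand-in rather than pugixml's `xml_node`, and a standard `std::unordered_map` is used instead of pugixml's internal compact-mode hash map. Like the interface above, it assumes the tree does not mutate after the index is built.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal tree node for illustration; NOT pugixml's xml_node.
struct Node {
    std::string name;                               // qualified name, e.g. "p:item"
    std::map<std::string, std::string> attributes;  // includes xmlns declarations
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical side table that pre-records node -> namespace URI in one traversal.
class NamespaceIndex {
public:
    explicit NamespaceIndex(const Node& root) { reset(root); }

    void reset(const Node& root) {
        uris_.clear();
        visit(root, Scope());
    }

    // Returns "" for nodes without a namespace or nodes outside the indexed tree.
    const std::string& namespace_uri(const Node& node) const {
        static const std::string none;
        const auto it = uris_.find(&node);
        return it == uris_.end() ? none : it->second;
    }

    static std::string local_name(const Node& node) {
        const std::string::size_type colon = node.name.find(':');
        return colon == std::string::npos ? node.name : node.name.substr(colon + 1);
    }

private:
    using Scope = std::map<std::string, std::string>;  // prefix -> URI; "" is the default namespace

    // The scope is passed by value so that declarations made on one element
    // are visible to its descendants only. Simple, but it copies per element.
    void visit(const Node& node, Scope scope) {
        for (const auto& attr : node.attributes) {
            if (attr.first == "xmlns")
                scope[""] = attr.second;
            else if (attr.first.compare(0, 6, "xmlns:") == 0)
                scope[attr.first.substr(6)] = attr.second;
        }
        const std::string::size_type colon = node.name.find(':');
        const std::string prefix =
            colon == std::string::npos ? std::string() : node.name.substr(0, colon);
        const auto it = scope.find(prefix);
        uris_[&node] = (it == scope.end()) ? std::string() : it->second;
        for (const auto& child : node.children) visit(*child, scope);
    }

    std::unordered_map<const Node*, std::string> uris_;
};
```

Storing one string per node keeps the sketch short. Interning each distinct URI once and storing a pointer to it per node would reduce memory and would allow the pointer-comparison style of lookup hinted at by `get_namespace` and `has_namespace`.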

zeux added the enhancement label Sep 22, 2020
