extracting attributes using --xml option results in confusing output with whitespace #62

Open
goekce opened this issue Nov 30, 2020 · 4 comments


goekce commented Nov 30, 2020

When I try to extract the attributes using XPath and --xml, I get empty XML output:

$ wget https://raw.githubusercontent.com/DiseaseOntology/HumanDiseaseOntology/main/src/ontology/doid.owl

$ xidel -se "//rdfs:label[text()='malignant hyperthermia']/../@rdf:about" doid.owl
http://purl.obolibrary.org/obo/DOID_8545

# extracting a single attribute in XML format does not work.
$ xidel --xml -se "//rdfs:label[text()='malignant hyperthermia']/../@rdf:about" doid.owl
<?xml version="1.0" encoding="UTF-8"?>
<xml>

</xml>

This behavior is confusing when browsing an XML file. Would it make sense to output a pseudo element that carries the matched attribute?

For example, xmllint outputs attribute="value" when an attribute is addressed.
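
To make the two candidate output shapes concrete, here is a minimal Python sketch using only the standard library. It is not Xidel's implementation: the inline sample document is a tiny stand-in for doid.owl, and the `<match>` wrapper and the helper names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Tiny stand-in for doid.owl (assumption: the real file has this shape).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_8545">
    <rdfs:label>malignant hyperthermia</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def find_about(xml_text, label):
    """Return rdf:about of the element whose rdfs:label child equals label."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        for child in parent.findall(f"{{{RDFS}}}label"):
            if child.text == label:
                return parent.get(f"{{{RDF}}}about")
    return None


def as_attribute_text(name, value):
    """Render the attribute the way xmllint does: name="value"."""
    return f'{name}="{value}"'


def as_pseudo_element(value):
    """Attach the attribute to a hypothetical <match> element."""
    ET.register_namespace("rdf", RDF)  # keep the rdf: prefix on output
    elem = ET.Element("match", {f"{{{RDF}}}about": value})
    return ET.tostring(elem, encoding="unicode")


about = find_about(SAMPLE, "malignant hyperthermia")
print(as_attribute_text("rdf:about", about))
print(as_pseudo_element(about))
```

The first form is compact but is not an XML document on its own; the second stays parseable as XML at the cost of an invented wrapper element.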

Owner

benibela commented Dec 1, 2020

I should probably change that. However, I cannot decide whether it should output attribute="value" or raise an error.

I wrote Xidel's own output code first and later implemented the standard fn:serialize; now I plan to merge the two.

attribute="value" would be more useful, but fn:serialize with the xml method raises an error:

$ xidel  -se "serialize(//rdfs:label[text()='malignant hyperthermia']/../@rdf:about, {'method':'adaptive'})" doid.owl
rdf:about="http://purl.obolibrary.org/obo/DOID_8545"
$ xidel  -se "serialize(//rdfs:label[text()='malignant hyperthermia']/../@rdf:about, {'method':'xml'})" doid.owl
Error:
err:SENR0001: Cannot serialize attribute
$ xidel  -se "serialize(//rdfs:label[text()='malignant hyperthermia']/../@rdf:about, {'method':'text'})" doid.owl
Error:
err:SENR0001: Cannot serialize attribute
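
The difference between the three calls can be pictured with a toy model in Python. It is not Xidel's code and only mimics what the output above shows: the adaptive method has a textual form for a bare attribute, while the xml and text methods reject it with err:SENR0001. The `Attribute` class, `SerializationError`, and this `serialize` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical stand-in for an XPath attribute node."""
    name: str
    value: str


class SerializationError(Exception):
    """Toy counterpart of err:SENR0001."""


def serialize(item, method):
    """Toy model of the behaviour shown above for a single attribute node."""
    if not isinstance(item, Attribute):
        raise TypeError("this sketch only models attribute nodes")
    if method == "adaptive":
        # The adaptive method renders a bare attribute as name="value".
        return f'{item.name}="{item.value}"'
    if method in ("xml", "text"):
        # A bare attribute has no parent element to be written into.
        raise SerializationError("err:SENR0001: Cannot serialize attribute")
    raise ValueError(f"method not modelled in this sketch: {method}")


about = Attribute("rdf:about", "http://purl.obolibrary.org/obo/DOID_8545")
print(serialize(about, "adaptive"))
for method in ("xml", "text"):
    try:
        serialize(about, method)
    except SerializationError as exc:
        print(f"{method}: {exc}")
```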


Author

goekce commented Dec 1, 2020

My impression is that --xml tries to output correct XML; that is why xidel always outputs a correct header, even if it otherwise outputs only useless whitespace. Is that what you mean by Xidel's output?

If --xml should continue to output correct XML, then it should not output <xml>rdf:about="http://purl.obolibrary.org/obo/DOID_8545"</xml>

However, I did not know about fn:serialize! If this function could be used easily via an option (without writing the function name and parentheses around the XPath expression), it would be very convenient for browsing an XML file, in my opinion.

My workflow:

  • build the right XPath while browsing the XML file using --xml
  • then remove --xml and use the values in the next processing step.

Owner

benibela commented Dec 2, 2020

> My impression is that --xml tries to output correct XML; that is why xidel always outputs a correct header, even if it otherwise outputs only useless whitespace. Is that what you mean by Xidel's output?

Yes

Xidel also has options that serialize does not have, e.g. converting JSON to XML (which might have been a bad idea due to its very non-standard output):

$ xidel --output-format xml-wrapped -e '{"a":1}' 
**** Processing: data:,<empty/> ****
<?xml version="1.0" encoding="UTF-8"?>
<seq>
<e><object><a>1</a></object></e>
</seq>

> If --xml should continue to output correct XML, then it should not output <xml>rdf:about="http://purl.obolibrary.org/obo/DOID_8545"</xml>

However, it is still correct XML even if it is text rather than an attribute.
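
This point can be checked outside Xidel with a small Python sketch using the standard-library parser. It assumes the `<xml>…</xml>` wrapper shape from the earlier comment; the helper name `is_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Output shape discussed in the thread: the attribute rendered as text inside <xml>.
candidate = '<xml>rdf:about="http://purl.obolibrary.org/obo/DOID_8545"</xml>'


def is_well_formed(text):
    """Return True if the standard-library parser accepts text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(candidate)
print(is_well_formed(candidate))  # whether the wrapper parses
print(root.attrib)                # attributes on the <xml> element itself
print(root.text)                  # character data inside the element
```

The parser accepts the string, but the result carries the attribute only as character data, so a consumer can no longer tell it apart from ordinary text.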

> However, I did not know about fn:serialize! If this function could be used easily via an option (without writing the function name and parentheses around the XPath expression), it would be very convenient for browsing an XML file, in my opinion.

There is another standard XQuery way, but it is even worse:

xidel doid.owl -e 'declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization"; declare option output:method "xml"; //rdfs:label/...'

I could predefine the output namespace, but then it would be non-standard again.

Author

goekce commented Dec 2, 2020

--xml should probably continue to output correct XML. If XML can also contain plain text, that would be an option, but it would not be consistent with Xidel's behavior when outputting element nodes (where Xidel even adds namespaces to child nodes).

The only idea I have is to introduce another option, such as --excerpt, that outputs only the matching parts of the input file without the effort that --xml puts into its output.


2 participants