handleOnXML tries to parse .xlsx files #790
Comments
This doesn't only affect xlsx, but also docx, pptx, and other document types of that kind.
To add to this, it would be nice to be able to have more granularity over what XML is parsed. For example, we use an OnXML handler to follow links in an XML sitemap, but our site contains many SVGs (
The handleOnXML function attempts to parse responses with the content-type `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`. This is because the function looks for any mention of xml in the content type. This results in a parse error when `xmlquery.Parse()` is called (for example: `encoding/xml.SyntaxError {Msg: "illegal character code U+0003", Line: 1}`). XLSX files are packaged as a zip, so they can't be directly parsed as XML.
It would be ideal to not try and parse these files, possibly by being more explicit in which content-types we consider to be XML.
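
One possible shape for a more explicit check is sketched below. This is only an illustration, not the library's actual code: `isXMLContentType` is a hypothetical helper name, and the accepted set (`text/xml`, `application/xml`, and any media type ending in `+xml`) is an assumption about what should count as XML. It uses the standard library's `mime.ParseMediaType` to strip parameters such as `charset` before comparing.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isXMLContentType is a hypothetical helper (not part of the library).
// It reports whether a Content-Type header value names an XML media type,
// instead of checking whether the string merely contains "xml".
func isXMLContentType(contentType string) bool {
	// ParseMediaType lowercases the media type and drops parameters
	// such as "; charset=utf-8".
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	switch mediaType {
	case "text/xml", "application/xml":
		return true
	}
	// Assumption: structured-syntax types such as application/rss+xml
	// should also be treated as XML.
	return strings.HasSuffix(mediaType, "+xml")
}

func main() {
	for _, ct := range []string{
		"text/xml; charset=utf-8",
		"application/rss+xml",
		"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
	} {
		fmt.Printf("%s -> %v\n", ct, isXMLContentType(ct))
	}
}
```

With this rule the spreadsheet content type is rejected because it neither matches an XML type exactly nor ends in `+xml`. Note that `image/svg+xml` would still be accepted, so excluding SVGs would need an additional, caller-controlled filter.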