reader treats all bozo feeds as errors #270
Comments
Some conclusions from playing with the Atom feed below:
Also, when the loose parser is used, the feed should be considered stale; that is, we should always prefer entries from the non-broken feed. I'm thinking of something like this:
This would favor feeds that are temporarily broken and eventually get fixed. For feeds that become permanently broken, it results in old strict entries not receiving updates.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<id>one</id>
<title>1</title>
<summary>i</summary>
</entry>
<entry>
<id>two</id>
<title>Atom-Powered Robots Run Amok
<summary>Summary.&veryundefinedentity;
<content>Content.</content>
</entry>
<entry>
<id>three</id>
<title>3</title>
<summary>iii</summary>
</entry>
</feed>
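
One possible reading of the "prefer entries from the non-broken feed" idea, as a minimal sketch only. The function `merge_entries`, the `(entry, was_loose)` storage shape, and the `parsed_is_loose` flag are all hypothetical and are not part of reader or feedparser. The assumption is that an entry stored from a strict parse is never overwritten by data from a loose parse, while new entries and strict updates are accepted.

```python
def merge_entries(stored, parsed, parsed_is_loose):
    """Merge freshly parsed entries into stored ones (hypothetical helper).

    stored: dict mapping entry id -> (entry, was_loose)
    parsed: list of entry dicts, each with an 'id' key
    parsed_is_loose: True if the loose (bozo) parser produced `parsed`
    """
    merged = dict(stored)
    for entry in parsed:
        entry_id = entry["id"]
        old = merged.get(entry_id)
        if old is not None and parsed_is_loose and not old[1]:
            # A strict version is already stored; treat the loose one as stale.
            continue
        merged[entry_id] = (entry, parsed_is_loose)
    return merged
```

Under this sketch, a feed that is broken for a while and later fixed ends up with strict data for every entry, whereas a permanently broken feed keeps its old strict entries frozen, which is the trade-off described above.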
reader treats all bozo feeds as errors, even if the loose parser managed to parse them:
We still need a heuristic to tell that apart from complete garbage (version, and the presence of entries?):
>>> feedparser.parse("garbage")
{'bozo': 1, 'entries': [], 'feed': {}, 'headers': {}, 'encoding': 'utf-8', 'version': '', 'bozo_exception': SAXParseException('syntax error'), 'namespaces': {}}
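
The heuristic floated above (version, plus the presence of entries) could be sketched roughly as follows. The name `looks_like_feed` is hypothetical and this is not reader's actual implementation; it only relies on the `bozo`, `version`, and `entries` keys visible in the output above, and it assumes that a bozo result with a version but no entries should still count as garbage.

```python
def looks_like_feed(result):
    """Guess whether a feedparser result is usable (hypothetical helper).

    `result` is anything with a dict-like .get(), such as the value
    returned by feedparser.parse().
    """
    if not result.get("bozo"):
        # The strict parser succeeded; nothing to second-guess.
        return True
    # Garbage input yields version '' and an empty entries list, so
    # require both a detected version and at least one entry.
    return bool(result.get("version")) and bool(result.get("entries"))
```

With the garbage result shown above this returns False, while a bozo result that still has a detected version and some entries would be accepted.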