Crash in feedparser 6.0.10 #378

Open
TheVamp opened this issue Jun 20, 2023 · 6 comments

@TheVamp

TheVamp commented Jun 20, 2023

I noticed that the latest released version of feedparser crashes when a CDATA section contains a C code snippet. Here is an example of how to reproduce the issue.

  • Install feedparser via python -m pip install feedparser
  • RSS XML Crash example - rss.zip
    • or you could use the original feed https://blog.trailofbits.com/feed/

import feedparser

with open("./rss_code_crash.xml", "r") as f:
    rss_data = f.read()
rss = feedparser.parse(rss_data)
# Or just this:
# rss = feedparser.parse('https://blog.trailofbits.com/feed/')

I tested the same issue on the develop branch, but the crash does not occur there.
Thanks for your support.

@kurtmckee
Owner

This is the minimum reproducible example:

<content:encoded xmlns:content="bogus">
    <![CDATA[
        <!h<!h<!h<
    ]]>
</content:encoded>

The crash is coming from within the Python standard library -- _markupbase.py at line 134 raises an AssertionError stating "unexpected '<' char in declaration".
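
Until a fixed release is available, a caller could guard the parse call against this specific exception. The following is a minimal sketch, not part of feedparser: `parse_or_none` is a hypothetical helper name, and it assumes the failure surfaces as an `AssertionError` as described above. It takes the parse function as an argument, so it does not itself depend on feedparser being installed.

```python
# The snippet from the comment above, kept verbatim as a string.
MINIMAL_FEED = """<content:encoded xmlns:content="bogus">
    <![CDATA[
        <!h<!h<!h<
    ]]>
</content:encoded>
"""


def parse_or_none(parse, data):
    """Call parse(data); return None if it raises AssertionError.

    `parse` is any callable, for example feedparser.parse.
    `parse_or_none` is a hypothetical helper, not part of feedparser.
    """
    try:
        return parse(data)
    except AssertionError as exc:
        # The standard library reports the malformed declaration this way.
        print(f"parser raised AssertionError: {exc}")
        return None
```

With feedparser installed, the call would look like `parse_or_none(feedparser.parse, MINIMAL_FEED)`. Whether the exception is actually raised depends on the installed feedparser and Python versions.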

On a side note, it appears that Trail of Bits is using Wordpress. Perhaps this is a bug that exists in Wordpress or one of the plugins in its ecosystem and could be fixed there, as well!

@TheVamp
Author

TheVamp commented Jun 21, 2023

Is there a specific code change in the develop branch that fixed the problem and interprets the content in a different way?

In the develop branch everything works as expected:

  • python -m pip install git+https://github.com/kurtmckee/feedparser@develop
  • using your RSS sample or my RSS sample as input
  • executing the python script from above and everything works fine

That is why I thought it was a bug in feedparser.
I will have a look into the Wordpress topic.

@kurtmckee
Owner

Yep, I saw the same thing with the develop branch.

The crash is a bug in the feedparser 6.0.10 release. However, that's happening because Wordpress is failing to escape the code in its <pre> blocks. It's two bugs, in different products, not one.
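
To illustrate the escaping that is missing on the publishing side, here is a small sketch using Python's standard `html.escape` on the problematic text from the minimal example. It only shows what escaped output would look like and is not a claim about how Wordpress implements escaping.

```python
import html

# The raw text from the minimal reproducible example.
snippet = "<!h<!h<!h<"

# html.escape replaces '<' with '&lt;', so no '<!' sequence
# remains for a markup parser to mistake for a declaration.
escaped = html.escape(snippet)
print(escaped)  # &lt;!h&lt;!h&lt;!h&lt;
```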

@fchorney

Coincidentally, I was about to raise this exact same issue for the same feed. Looking forward to a fix for it.

@fchorney

Hi, just curious if this is going to be addressed at some point. Since the issue seems to be fixed in the develop branch, could we get a new release? I understand that this is also a Wordpress issue, but if the feed can be parsed with the changes in develop, a release would be great. Thanks. (I'll note that I haven't actually checked whether the feed itself has changed and fixed itself yet.)

@kurtmckee
Owner

The develop branch is not in a state where it can be released yet; it will take many, many hours of work to get it into a stable state, and I can't commit the required time until after the new year.
