Some sites parsed incorrectly #73

Open
pztrn opened this issue Jan 1, 2023 · 3 comments

Comments

pztrn commented Jan 1, 2023

Hello, I'm running e38a6ee in Docker with the mail output plugin.

Some sites are parsed incorrectly, e.g. new releases from GitHub repositories sometimes appear like this:

(screenshot)

with no actual release information.

Confirmed feeds:

It happens at random: sometimes the feed is parsed normally, and sometimes something like the HTML head ends up in the email (as in the screenshot).

I was using the latest release before, and it worked fine.

Owner

ncarlier commented Jan 2, 2023

Hello, are you using the fetch filter plugin?

Author

pztrn commented Jan 2, 2023

Yes, it is enabled.

Owner

ncarlier commented Jan 2, 2023

The feed is parsed correctly, but the "fetch" filter then tries to retrieve the HTML content of the original URL (via web scraping techniques). Some websites are not scraped well; it depends mainly on the page structure. I suggest you add a tag only to the feeds you want to be scraped (e.g. tofetch), then add a condition on the fetch plugin so that it is activated only for this tag (e.g. "tofetch" in Tags).
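
To illustrate the idea, here is a minimal Python sketch of tag-gated fetching. It is not the tool's actual code or configuration syntax: the `Article` class, `should_fetch`, `apply_fetch_filter`, and the `fetch` callable are all hypothetical names used only to show the effect of a condition like "tofetch" in Tags.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical stand-in for a parsed feed entry."""
    title: str
    link: str
    tags: List[str] = field(default_factory=list)


def should_fetch(article: Article, required_tag: str = "tofetch") -> bool:
    # Mirrors the suggested condition: "tofetch" in Tags
    return required_tag in article.tags


def apply_fetch_filter(
    articles: List[Article],
    fetch: Callable[[Article], Article],
) -> List[Article]:
    # Scrape only the articles that carry the tag; pass the rest through untouched.
    return [fetch(a) if should_fetch(a) else a for a in articles]
```

With this gating, an article from an untagged feed (for example a GitHub releases feed) keeps the content that came from the feed itself, and only feeds explicitly tagged for scraping go through the fetch step.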
