remove_tags not working on html comments #158

Closed
mosynaq opened this issue Jun 15, 2020 · 4 comments

mosynaq commented Jun 15, 2020

I want to strip the tags from '<div><!--<A href="/mypage.htm">-->text</div>' with remove_tags. The result I get is '-->text', but I expected 'text'.
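For readers wondering how a stray '-->' can survive, here is a minimal sketch of one plausible cause. It assumes a naive "anything between < and the next >" pattern; the names `NAIVE_TAG` and `naive_remove_tags` are hypothetical and this is not w3lib's actual code. The pattern treats the start of the comment plus the tag inside it as a single tag and stops at the first '>', so the comment terminator is left behind.

```python
import re

# Hypothetical pattern for illustration only, not w3lib's implementation:
# a "tag" is everything from "<" up to the first ">".
NAIVE_TAG = re.compile(r"<[^>]*>")


def naive_remove_tags(text: str) -> str:
    """Delete every span that the naive pattern considers a tag."""
    return NAIVE_TAG.sub("", text)


raw = '<div><!--<A href="/mypage.htm">-->text</div>'

# The second match is '<!--<A href="/mypage.htm">', which ends at the first
# ">" and therefore leaves the comment terminator "-->" in the output.
result = naive_remove_tags(raw)
print(result)  # -->text
```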

@Gallaecio
Member

I think there are two parts here:

  • I would expect you to get <!--<A href="/mypage.htm">-->text, and I think not getting that is a bug.

  • You want remove_tags to also support removing comments, which I would say is a feature request.

Gallaecio added the bug label Jun 15, 2020
Author

mosynaq commented Jun 16, 2020

I meant the latter. Since an HTML comment is an HTML tag too, it would be nice to remove comments whenever removing HTML tags is requested. The former would be a good output if it were available through a parameter. Either way, the current output is not acceptable.

Member

Laerte commented Jun 12, 2024

@mosynaq If we call remove_comments before running remove_tags, the output is the expected value.

from w3lib.html import remove_tags, remove_comments

raw = '<div><!--<A href="/mypage.htm">-->text</div>'

assert remove_tags(remove_comments(raw)) == "text"

@Gallaecio If we consider HTML comments to be tags, we could either call remove_comments inside remove_tags or add a new parameter to remove_tags. Which one sounds better?
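To make the second option concrete, here is a rough sketch of what a parameter could look like. It uses only the standard library as a stand-in: `strip_markup`, its `remove_comments` flag, and the regular expression are all hypothetical and are not part of w3lib. With the flag off, comments are kept intact (the output described earlier in the thread as the expected one); with the flag on, they are removed along with the tags.

```python
import re

# Hypothetical stand-in pattern, NOT w3lib's internals. Comments are tried
# first so that markup inside a comment is never mistaken for a real tag.
_COMMENT_OR_TAG = re.compile(r"(?P<comment><!--.*?-->)|<[^>]*>", re.DOTALL)


def strip_markup(text: str, remove_comments: bool = False) -> str:
    """Remove tags; remove comments too only when remove_comments is True."""

    def replace(match: re.Match) -> str:
        comment = match.group("comment")
        if comment is not None:
            # Whole comment matched: keep it verbatim unless asked to drop it.
            return "" if remove_comments else comment
        # Ordinary tag: always dropped.
        return ""

    return _COMMENT_OR_TAG.sub(replace, text)


raw = '<div><!--<A href="/mypage.htm">-->text</div>'

kept = strip_markup(raw)                           # comment preserved
dropped = strip_markup(raw, remove_comments=True)  # comment removed
print(kept)
print(dropped)
```

Defaulting the flag to False keeps existing behaviour for callers who rely on comments surviving, while the first option (always stripping comments) would change output for everyone.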

Author

mosynaq commented Jun 13, 2024

Thank you @Laerte. I'm closing this issue.

mosynaq closed this as completed Jun 13, 2024