How to get values that are not in the HTML tag, e.g. the span tag #433

Closed
davidAg9 opened this issue Nov 7, 2022 · 2 comments

davidAg9 commented Nov 7, 2022

    I have an issue where the value I want to get isn't in the span tag but beside it 

<span class="fas fa-user"></span> Published By <a href="mailto:[email protected]"> University Relations Directorate<!--Henry Amoah--></a> | <span class="fas fa-calendar"> </span> Monday October 3, 2022 | <span class="fas fa-clock"></span> 2:27 pm

How do I go about writing it?

Originally posted by @davidAg9 in #298 (comment)

Member

mna commented Nov 8, 2022

Hello David,

You can see an example of how to get that here: #287 (comment)

Basically, you can select the parent node and then iterate over the selection returned by Contents(), looking for the text nodes to extract their text. Contents() is the one to use here because it selects all types of child nodes, including text and comments, unlike Children(), which returns only HTML elements. What you have there is somewhat unusual HTML (usually the spans are there to hold some text, but here they are empty apart from a CSS class, and the text sits in between them), but sometimes we have to work with broken HTML, so that may be your case.

For example:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

const data = `
<html>
<body>
	<p>
		<span class="fas fa-user"></span> Published By <a href="mailto:[email protected]"> University Relations Directorate<!--Henry Amoah--></a> | <span class="fas fa-calendar"> </span> Monday October 3, 2022 | <span class="fas fa-clock"></span> 2:27 pm
	</p>
</body>
</html>
`

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}

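	// Contents() returns every child node of <p>, including text and comment nodes.
	doc.Find("p").Contents().Each(func(i int, s *goquery.Selection) {
		// Keep only the text nodes; goquery.NodeName reports them as "#text".
		if goquery.NodeName(s) == "#text" {
			fmt.Printf(">>> (%d) >>> %s\n", i, s.Text())
		}
	})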
}

This would print the following. Note that the first text node is only whitespace, and that "University Relations Directorate" is missing because it is not a "free text" node: it is text inside the <a> element.

>>> (0) >>> 
		
>>> (2) >>>  Published By 
>>> (4) >>>  | 
>>> (6) >>>  Monday October 3, 2022 | 
>>> (8) >>>  2:27 pm

Hope this helps!
Martin

Author

davidAg9 commented Nov 8, 2022

Thank you very much

davidAg9 closed this as completed Nov 8, 2022