Add ability to format the result #3
Comments
Re: API, I've toyed with the idea of passing arrays to indicate selector + transforms, a la |
Let's say you select all links in a document and want to filter out duplicates. Any user-defined subroutine is called once per item in the array, not on the array as a whole, right? |
It can be done if the subroutine combines select and read :)
sl: (subject, v, b) => selectSubroutine(subject, ['a', '{0,}'], b).map(match => readSubroutine(match, ['attribute', 'href'], b)) |
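Wow, powerful stuff:
function sortAndRemoveDups(arr) {
  // Sort a copy so the caller's array is not mutated.
  const sorted = [...arr].sort();
  const uniq = [];
  for (let i = 0; i < sorted.length; i += 1) {
    // After sorting, duplicates are adjacent, so compare with the previous item.
    if (i === 0 || sorted[i] !== sorted[i - 1]) { uniq.push(sorted[i]); }
  }
  return uniq;
}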
...
slb: (s, v, b) => sortAndRemoveDups(
  selectSubroutine(s, [v.concat('a:not([href^="#"])').join(' '), '{0,}'], b)
    .map(m => readSubroutine(m, ['attribute', 'href'], b))
)
...
allRealLinksUnderBody: slb body |
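If sorted output is not needed, the de-duplication step could probably be shortened with a `Set`. This is only a sketch: `uniqueInOrder` is a hypothetical helper name, not something the library provides, and unlike `sortAndRemoveDups` it keeps the links in first-seen order rather than sorting them.

```javascript
// Hypothetical helper, not part of the library: drops duplicates while
// keeping first-seen order and leaving the input array untouched.
const uniqueInOrder = (values) => [...new Set(values)];

// Stand-in data for the href values that the select + read step would return.
const hrefs = ['/b', '/a', '/b', '/c', '/a'];
const unique = uniqueInOrder(hrefs);

console.log(unique); // [ '/b', '/a', '/c' ]
```

It should drop into the `slb` subroutine in place of `sortAndRemoveDups`, assuming the order of the results does not matter to the caller.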
There has been a request to add a "formatting" ability like the one in the scrape-it library.
It's documented as:
Example:
  {
      articles: {
          listItem: ".article"
        , data: {
              createdAt: {
                  selector: ".date"
+               , convert: x => new Date(x)
              }
            , title: "a.article-title"
            , tags: {
                  listItem: ".tags > span"
              }
            , content: {
                  selector: ".article-content"
                , how: "html"
              }
          }
      }
  }
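To make the request concrete, here is a rough sketch of what a `convert`-style step amounts to: once a value has been read, it is passed through an optional user-supplied function. `readField` and the way the field definition is inspected are assumptions made for illustration; this is not how scrape-it or this project is actually implemented.

```javascript
// Hypothetical sketch: `readField` and the shape of `field` are assumptions
// for illustration, not scrape-it's or this project's actual internals.
const readField = (rawValue, field) => {
  // A plain string is treated as a bare selector with no conversion.
  if (typeof field === 'string' || typeof field.convert !== 'function') {
    return rawValue;
  }
  // Otherwise the already-extracted value is handed to the formatter.
  return field.convert(rawValue);
};

const createdAt = readField('2017-01-15', {
  selector: '.date',
  convert: (x) => new Date(x),
});
const title = readField('Hello', 'a.article-title');

console.log(createdAt instanceof Date); // true
console.log(title); // Hello
```

The point is that formatting runs on a value that has already been extracted, so it can stay a plain function and does not need to know anything about selectors.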
Considerations: