
[PoC] use limit & offset in cardigann definitions #13996

Draft · wants to merge 3 commits into master
Conversation

@ilike2burnthing (Contributor) commented Feb 10, 2023

TL;DR - worth looking into further?


This is in response to #13681 (comment), but I completely forgot about it until eb93dbb jogged my memory.

It's just a proof of concept rather than an actual PR, as:

  1. it will apply to a lot of other definitions
  2. there may need to be C# Cardigann and/or core changes made as well

From a quick glance at the definitions, two main cases are currently workable:

  • animeclipse animetracker - just limit, largely pointless but might as well if this does move forward
  • digitalcore - limit and offset with single torrents per results

Then there's:

  • anilibria - limit and offset with multiple torrents per results

This is broken due to how Jackett handles limit. For anilibria, limit=1 will return 1 'result', but it may still have more than 1 magnet. However, Jackett will only show the first magnet.

Also, if I use limit=100 in a torznab search, Jackett shows 100 results, but with limit=0 or limit= (or a keywordless search in Jackett's UI) it shows 190 results, despite the anilibria search URL being the same. Jackett trims any results over the limit.

Not sure if this can be easily fixed, and whether it's a core issue or just Cardigann.
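The trimming behaviour described above can be sketched as follows (a hypothetical Python mirror of the observed behaviour, not Jackett's actual code; `trim_results` is an invented name):

```python
def trim_results(results, limit=None):
    # limit=0 or an empty limit= behaves like "no limit": all results shown
    if not limit:
        return results
    # any positive limit trims everything past it
    return results[:limit]

all_results = list(range(190))  # e.g. anilibria returning 190 results
assert len(trim_results(all_results, 100)) == 100  # limit=100 -> 100 shown
assert len(trim_results(all_results, 0)) == 190    # limit=0   -> all shown
assert len(trim_results(all_results)) == 190       # limit=    -> all shown
```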


As for other issues:

@mynameisbogdan (Contributor) commented Feb 10, 2023

  • as we're HTML-scraping most trackers, they generally use pages rather than an offset, so we'd need some method like the following to allow the use of offset with them:
    {{ .Query.Offset }} / {{ .Query.Limit }} + 1

This one is quite an easy fix: new variables.
Something like {{ .Query.CurrentPage0 }} and {{ .Query.CurrentPage1 }}, for when pagination starts with page=0 or page=1 respectively.

And for .Query.Limit, I think this could be the pagesize defined in the YML.
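As a rough illustration of how a YML-defined pagesize and a torznab limit would interact (a hypothetical helper, Python for brevity; nothing here is Jackett's actual code):

```python
import math

def pages_needed(limit, pagesize):
    # Fall back to a single page fetch when either value is unset/zero
    if limit <= 0 or pagesize <= 0:
        return 1
    # Otherwise fetch enough pages to cover the requested limit
    return math.ceil(limit / pagesize)

assert pages_needed(100, 50) == 2  # limit=100 over 50-per-page -> 2 fetches
assert pages_needed(101, 50) == 3  # a partial page still costs a fetch
assert pages_needed(0, 50) == 1    # no limit -> single page
```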

@ilike2burnthing (Author) commented Feb 10, 2023

Ok, so we throw the maths of {{ .Query.Offset }} / {{ .Query.Limit }} + 1 over to the Cardigann indexer, and that gets fed out to {{ .Query.CurrentPage0 }} or {{ .Query.CurrentPage1 }} depending on whether they start on page=0 or page=1 respectively?

Bit of an edge case here, but what happens if someone manually uses (for example) limit=10 and offset=15? Obviously there's no page 1.5, so what happens in that case? Cardigann rounds up?

pagesize is something else that's overdue - #250.

@mynameisbogdan (Contributor) commented Feb 10, 2023

Ok, so we throw the maths of {{ .Query.Offset }} / {{ .Query.Limit }} + 1 over to the Cardigann indexer, and that gets fed out to {{ .Query.CurrentPage0 }} or {{ .Query.CurrentPage1 }} depending on whether they start on page=0 or page=1 respectively?

Yeah.

// Expose the raw limit/offset plus derived 0- and 1-indexed page numbers
variables[".Query.Limit"] = query.Limit.ToString();
variables[".Query.Offset"] = query.Offset.ToString();
variables[".Query.CurrentPage0"] = Math.Max(0, query.Offset > 0 && query.Limit > 0 ? query.Offset / query.Limit : 0).ToString();
variables[".Query.CurrentPage1"] = Math.Max(1, query.Offset > 0 && query.Limit > 0 ? (query.Offset / query.Limit) + 1 : 1).ToString();

Bit of an edge case here, but what happens if someone manually uses (for example) limit=10 and offset=15? Obviously there's no page 1.5, so what happens in that case? Cardigann rounds up?

With only integers involved in the math, I'm certain it will be page 1 (integer division truncates, so there's no rounding up).
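The integer-division behaviour of the snippet above can be checked with a quick sketch (a Python mirror of the C# logic, hypothetical names; both languages truncate identically for the positive values guarded here):

```python
def current_pages(offset, limit):
    # Mirrors the C# ternaries: derive pages only when both values are positive
    if offset > 0 and limit > 0:
        page0 = max(0, offset // limit)       # 0-indexed page
        page1 = max(1, offset // limit + 1)   # 1-indexed page
    else:
        page0, page1 = 0, 1
    return page0, page1

# limit=10, offset=15: the division truncates, so the request lands on
# 0-indexed page 1 (i.e. 1-indexed page 2) -- there is no "page 1.5"
assert current_pages(15, 10) == (1, 2)
# no offset -> first page in both conventions
assert current_pages(0, 10) == (0, 1)
```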

@ilike2burnthing (Author):
3 quick examples:

  • bigfangroup - CurrentPage0
  • bitsearch - CurrentPage1
  • anirena - just Offset

@ilike2burnthing (Author):
Resolve conflict.

@garfield69 thoughts?

@garfield69 (Contributor):
thoughts?

I am not able to make much time this week to properly read and absorb this topic, but my first thought is that we need to ensure the limit doesn't get abused, by setting a cap on the limit size, or we risk trashing a site.
Also, given that a large limit could result in multi-page fetches, a cap on the number of pages fetched should be considered, as well as enforcing a requestdelay to pace requests and prevent flooding.

@ilike2burnthing (Author):
Yea, I was wondering last week whether it would be worth implementing a global request delay of 2s, which could still be overridden per indexer/definition.

I suppose in a similar vein then, we could have a default page cap of 5(?) and limit cap of 500 (1000?).

@bakerboy448 (Contributor) commented Feb 18, 2023

For context, Starr has a hard cap of 30 pages or 1,000 results, whichever is hit first.

I'd suggest matching that, otherwise you're artificially limiting results.

Defaults of 10 pages / 1,000 results seem sane - the assumption is 100 results per page.
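Putting the numbers floated above together (nothing here is settled; `plan_fetches` is an invented helper sketching the cap interaction, not any agreed implementation):

```python
import math

def plan_fetches(limit, pagesize, page_cap=10, result_cap=1000):
    # Clamp the requested limit to the result cap, then cap the page count
    effective_limit = min(limit, result_cap) if limit > 0 else result_cap
    pages = math.ceil(effective_limit / pagesize)
    return min(pages, page_cap)

assert plan_fetches(1000, 100) == 10  # 1,000 results / 100 per page
assert plan_fetches(5000, 100) == 10  # result cap kicks in first
assert plan_fetches(50, 100) == 1     # small limit -> one fetch
```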

@mynameisbogdan mynameisbogdan force-pushed the limit/offset branch 2 times, most recently from c0e1549 to fcb7975 Compare March 5, 2023 12:51
@mynameisbogdan mynameisbogdan added the Don't Merge Hold up - don't merge this label Mar 5, 2023
@mynameisbogdan mynameisbogdan force-pushed the limit/offset branch 3 times, most recently from 236457b to 2de12fa Compare March 5, 2023 20:48
@garfield69 (Contributor):
schema.json needs updating

@ilike2burnthing (Author):
Working now.

@mynameisbogdan (Contributor):
Please use rebase when possible.

@mynameisbogdan mynameisbogdan force-pushed the limit/offset branch 3 times, most recently from 5b6fc20 to a96bbe4 Compare March 6, 2023 13:10
@mynameisbogdan (Contributor) commented Mar 6, 2023

@ilike2burnthing I think it's better to use strictly positive integers, but for PageSize should we set a minimum of 1, since 0 would be the default?

@ilike2burnthing (Author):
Sure, whatever makes the most sense. I was just doing a quick fix to satisfy the check.

@Roardom commented Apr 10, 2023

Hi, just saw this PR. Sorry to make things more complicated. After doing some logging on our tracker and determining that Jackett was using 32% of our total database execution time, we've spent some time optimizing our api (HDInnovations/UNIT3D-Community-Edition#2679).

Among other things, such as removing a couple of redundant queries and implementing some basic caching, we've changed our pagination from using perPage and page query parameters (which used to be easily mapped to limit and offset with some simple math) to cursor pagination. Now, links to the previous/next page are found within the api response, or are null if there is no previous/next page, but it's no longer possible for a client to specify a particular page themselves. Supposedly, you could base64 encode/decode the cursor query parameter and change it as you see fit, but that functionality is part of the framework we use and not something we can modify/support.

I would think that, as trackers in active development grow larger, they too will switch to cursor pagination instead of limit/offset for the greater efficiency (https://shopify.engineering/pagination-relative-cursors explains the advantages if you're interested). Unfortunately, this concept doesn't map directly onto torznab's pagination implementation.
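For comparison, consuming cursor pagination looks roughly like this (a generic sketch, not UNIT3D's or Jackett's actual client code; `fetch_page` is a hypothetical callable returning a page of results plus the next cursor, or None when exhausted):

```python
def fetch_all(fetch_page, first_url, page_cap=10):
    # Follow next-page links until the cursor runs out or the cap is hit;
    # a client cannot jump to an arbitrary page, only walk the links.
    results, url, pages = [], first_url, 0
    while url is not None and pages < page_cap:
        page_results, url = fetch_page(url)
        results.extend(page_results)
        pages += 1
    return results

# Toy backend: three pages of results linked by opaque cursors.
pages = {"p1": ([1, 2], "p2"), "p2": ([3, 4], "p3"), "p3": ([5], None)}
assert fetch_all(lambda u: pages[u], "p1") == [1, 2, 3, 4, 5]
```

Note the contrast with limit/offset: there is no way to translate a torznab offset into a cursor, which is why Roardom's point doesn't fit this PR's model.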

Labels: Don't Merge · Work in Progress

5 participants