Atom/RSS feed paging #152

Open
jameysharp opened this issue Jun 27, 2018 · 1 comment

@jameysharp

RFC 5005, "Feed Paging and Archiving", is a standard published in 2007 for Atom and RSS. I'd like to see Granary support paging of those output types using this standard. Ideally, it would also be able to consume paged feeds and convey the paging information in all of its other output formats.

There are three major sections in this standard, not counting introductory and supplemental material.

Section 2, "Complete Feeds", just adds an empty <fh:complete/> tag to indicate that the contents of this feed document represent the complete history. If you can tell that the data you've consumed from your upstream source is complete (there are no earlier or later pages for this query) then you should add this tag to the generated RSS or Atom feed.
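To make the section 2 case concrete, here is a minimal sketch of adding the marker to an Atom document with Python's standard library. The function name `mark_complete` is made up for illustration and is not part of Granary; the `fh` namespace URI is the one RFC 5005 defines. For RSS the tag would go inside `<channel>` instead, which this sketch doesn't handle.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
FH_NS = "http://purl.org/syndication/history/1.0"  # namespace from RFC 5005

# Keep readable prefixes when serializing.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("fh", FH_NS)


def mark_complete(atom_xml: str) -> str:
    """Return the Atom feed with <fh:complete/> added as a child of <feed>.

    Hypothetical helper: only call this when the upstream query is known
    to have no earlier or later pages.
    """
    feed = ET.fromstring(atom_xml)
    tag = f"{{{FH_NS}}}complete"
    if feed.find(tag) is None:  # don't add the marker twice
        feed.insert(0, ET.Element(tag))
    return ET.tostring(feed, encoding="unicode")
```

Calling it on a feed that already carries the marker leaves the feed unchanged, so it should be safe to apply at the end of any output path.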

Section 3, "Paged Feeds", is useful when you don't know how many entries the query could return, or if there could be an infinite sequence of results. It ties the pages together with link elements whose rel values are "first", "last", "previous", and "next". I haven't looked much at this section because for my use cases I've only cared about collections where I want to fetch all pages, where section 4 is more efficient. But section 3 lets you provide a simple cursor interface to clients, which I think is a good fit for what you're doing. Ideally, you'd also support consuming paged feeds and exposing the upstream cursor somehow in the various output formats, but it sounds like that's a longer-term project?
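As a rough sketch of what the consuming side of section 3 could look like, the code below walks an Atom feed by following its rel="next" links. The names `next_href` and `iter_pages` and the `fetch` callback are assumptions made for this example, not existing Granary APIs; `fetch` is injected so the caller decides how documents are actually downloaded.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def next_href(feed: ET.Element) -> Optional[str]:
    """Return the href of the feed-level rel="next" link, if there is one."""
    for link in feed.findall(f"{ATOM}link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_pages(
    start_url: str, fetch: Callable[[str], str], max_pages: int = 10
) -> Iterator[ET.Element]:
    """Yield parsed feed pages, following rel="next" links.

    Stops when there is no next link, when a URL repeats, or after
    max_pages pages, since a paged feed may be unbounded.
    """
    url: Optional[str] = start_url
    seen = set()
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        yield feed
        href = next_href(feed)
        # Link hrefs may be relative, so resolve them against the page URL.
        url = urljoin(url, href) if href else None
```

The page cap and the repeated-URL check are there because nothing guarantees the sequence ever ends or that a server won't link back to an earlier page.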

Section 4, "Archived Feeds", is semantically kind of a combination of sections 2 and 3. It indicates that if you fetch all the pages of the feed, then you will have the complete history of the feed. But there are some details specified for efficiency that I think make this section complicated for Granary. The archived feed page served at a particular URL may be treated by clients as if it has a far-future Expires header, so if old entries are inserted, deleted, or edited, then the URL needs to be changed before clients are guaranteed to pick it up. Also, the same entry may appear in multiple feed documents, in which case only the copy from the most recent page is supposed to be used.
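To illustrate the duplicate-entry rule as I've described it, here is a small sketch that merges already-fetched archive pages into one list of entries. It assumes the caller passes pages ordered from most recent to oldest and that entries are matched by their Atom id; `merge_archived` is a hypothetical name, not something from Granary or the RFC.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

ATOM = "{http://www.w3.org/2005/Atom}"


def merge_archived(pages_newest_first: Iterable[ET.Element]) -> List[ET.Element]:
    """Combine entries from several feed pages, keeping one copy per id.

    Assumption: pages arrive newest first, so the first copy of an
    entry that we see is the one from the most recent page.
    """
    merged: Dict[str, ET.Element] = {}
    for feed in pages_newest_first:
        for entry in feed.findall(f"{ATOM}entry"):
            entry_id = entry.findtext(f"{ATOM}id")
            if entry_id is None:
                continue  # skip malformed entries that have no id
            # setdefault keeps the existing (newer) copy if one is present.
            merged.setdefault(entry_id, entry)
    return list(merged.values())
```

This only covers deduplication. It does nothing about the caching behavior of archive URLs, which is the part that seems hard to fit into Granary.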

So it's not clear to me that Granary can do anything with section 4 archived feeds except pass them through when converting between RSS and Atom, or something like that. But maybe there's some API you consume that turns out to be a good fit for that paging model, I don't know.

I'm guessing that sections 2 and 3 are easy to implement, though, and I'd love to see that happen!

@snarfed
Owner

snarfed commented Jun 28, 2018

thanks for filing, and for all the details! and great to meet you! this definitely makes sense. i'd happily merge a PR for this, or maybe even implement it myself when it bubbles up my todo list. :P
