Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenAPI docs #371

Open
uniquestring opened this issue Sep 25, 2021 · 17 comments
Open

Add OpenAPI docs #371

uniquestring opened this issue Sep 25, 2021 · 17 comments

Comments

@uniquestring
Copy link

TL;DR

I propose to switch to OpenAPI (fka Swagger) for API documentation. I am willing to contribute to this effort, but we need to discuss if and how this should be pulled of before. I am aware of ajayyy/SponsorBlock#706.

Why switch?

I think the API and docs just grew as this project developed and there is nothing wrong with that. I think that a point is reached where a switch to OpenAPI would make sense. OpenAPI can not only provide the same information that the wiki page currently does, but also improve on it. There are also additional advantages to using this widely adopted standard that I will go into.

I am aware that there is an inofficial OpenAPI documentation as the result of ajayyy/SponsorBlock#706. But I think that this doesn't really get us anywhere. Besides the fact that this OpenAPI doc is unofficial, it is also hard to find. I only came across them by accident and only after a while and after starting my on draft. Because of that I think in the current situation this existing OpenAPI doc it is almost as good as none.

Advantages

You can read about the advantages of OpenAPI on the official website, but here are the main advantages that I see in this case.

  1. Better accessibility of the API
    • raw spec instead of wiki page
    • -> better accessible for machines
    • -> better accessible for people e.g. via Swagger UI
      • (even) nicer presentation than a wiki page
      • allows to play around with an explore the API
      • would allow for stuff that would be hard/tedious to squeeze into an wiki page (e.g. https://swagger.io/docs/specification/links/)
  2. Better docs
    • less ambiguous than the current docs are, in some places
    • more consistency/less "doc duplication"
      • e.g. compare the "segments" data model returned by /skipSegments and /skipSegments/:sha256HashPrefix.
      • The wiki shows two different models. In reality both responses use the same model.
      • OpenAPI would just reuse the model in this case, reducing doc duplication and also preventing mistakes like this one.
    • simpler way to version the api, deprecate stuff, etc.
  3. Code generation
    • generate API clients for different languages
    • generate server stubs for different languages
    • e.g. https://github.com/swagger-api/swagger-codegen
    • -> spend less time on boring boilerplate code and have more time for actual features
    • -> less work for other developers that want to interact with SponsorBlock

I think code generation might have the biggest impact. It might be able to replace the manually written API client code in the web extension. And third party developers might be able to just go ahead use the API instead of writing their own code doing the tedious ground work first.
I'm not sure, if there is currently an generator suitable to the SponsorBlock server, but if not, there might be one in the future. It could also become handy if the server implementation is switched to a different language or reimplemented later on.

Options

In general there are two options on how the switch to OpenAPI could be done:

  1. Maintain docs separately
    • Maintaining them completely separate is always possible. This would be the same kind of work, that currently happens in the wiki.
    • There are also solutions, that allow you to put OpenAPI doc into doc blocks inside the code. This allows you for example, to document each API endpoint right above the exact function that will handle the corresponding request. This makes it easier to keep the docs up to date and check them against what's actually happening.
      • One doc block solution that I played with is https://github.com/readmeio/oas. It works completely transparently and doesn't add any build dependency to the code itself, since it will just parse comments in the source code, completely independently from the build of the application
  2. Use a framework
    • Frameworks for different languages are available. Many provide advantages like deducing API endpoints/parameters/data models/error codes from the actual code, completely removing the need to manually maintain this info. Some frameworks are also able to take care of some boilerplate code, providing you with already parsed request/response data and error handling.

I would like to contribute to this effort. As I said, I played around with a doc block based solution, and it worked just fine. I am not familiar with typescript, so I don't see myself diving into OpenAPI frameworks that could be used for this project. I want to discuss which approach should be pursued (or none at all) before I put too much work into it.

@ajayyy
Copy link
Owner

ajayyy commented Sep 25, 2021

Maybe linking @mchangrh's swagger page from the wiki could help?

The readmeio generated doc does look like a cool solution for future updates though.

From a very quick search, it seems most of the auto-generators require brute-forcing through api calls, which might just be more work than necessary.

@uniquestring
Copy link
Author

Maybe linking @mchangrh's swagger page from the wiki could help?

It wouldn't hurt for the time being, sure. But if you want to switch to OpenAPI, I would a) integrate it as an official part of the project and b) completely replace the current docs with OpenAPI when it is ready. There is just no point in maintaining multiple docs.

The readmeio generated doc does look like a cool solution for future updates though.

If this looks like the way to go to you, I am happy to start working on it.

From a very quick search, it seems most of the auto-generators require brute-forcing through api calls, which might just be more work than necessary.

I think you got things mixed up. If you have OpenAPI docs, no brute force is needed, as the docs contain exactly the info needed to generate the wrappers.
But there are also tools that brute force APIs or deduce the OpenAPI doc from looking at requests made during normal use. This is helpful if you need OpenAPI docs for a third party API that doesn't provide this information itself.

@ajayyy
Copy link
Owner

ajayyy commented Sep 25, 2021

Actually, looking at https://github.com/mchangrh/sb-openapi-source/blob/7e697901828106c877c598abd3c47e4969b6a2f0/openapi.yaml, I think it might be easier to just write the xml instead of the comment in the code, as the comment still needs to specify types and params anyway, since it looks like it doesn't care about typescript types.

@uniquestring
Copy link
Author

I think you are right. I first played around with doc blocks (via https://github.com/readmeio/oas). They have some sorthands for defining parameters, but they are very limited. So I ended up writing full blown OpenAPI notation anyways, which makes the doc blocks relatively long.

In the end, the difference between the doc blocks and the full blown OpenAPI file isn't that big. You can basically just cut part of the OpenAPI file and paste it somewhere as a comment. So it should be easy to switch later on if necessary.

How about this: @mchangrh's OpenAPI spec is already a very good starting point. It was generated by Postman, so some things can be generalized a bit more, but it should save some work. After generalizing, we should have a good first version, that you can take a look at. If you are happy with the result we could continue to work towards the switch of the API.

Ok?


One more thing in advance: An example:

/skipSegments

{  
// ...
  category: string, // Optional, defaults to "sponsor", can be repeated for multiple categories.
  // OR
  categories: string[], // Optional, array of categories, will look like ["sponsor","intro"]
    // See https://github.com/ajayyy/SponsorBlock/wiki/Types#category
// ...
}

These are pretty much redundant as they can both achieve same. category would map quite nicely (as an enum) to OpenAPI while the other would just be a dumb string, with no advantage over the other parameter. This might be the reason why @mchangrh left some parameters etc. out.

I think this API rework might be a good occasion do some clean up work. Not only parameters but also default values, etc. This could also lead to some simplification inside the server code.

How do you feel about this? Would you be ok with some minor API changes along with the rework? Nothing that would really impact functionality, just redundant stuff etc.

@ajayyy
Copy link
Owner

ajayyy commented Sep 25, 2021

I don't think this is redundant as I find the repeated parameter not very intuitive.

@mchangrh
Copy link
Contributor

mchangrh commented Sep 26, 2021

those docs were written ground up since postman didn't let me port them in - I gave up on that idea completely and ended up rewriting from scratch

skipSegment by hash differs slightly since it also returns the video hash

I would be okay linking the swagger docs, I feel bad self-promoting so have tended to keep it away from the main pages.

I maintained the old wiki docs, the mediawiki docs and the sb-openapi repo and have been contributing quite a bit to the actual repository (see status of docs here https://mchang.icu/sb-matrix)

@mchangrh
Copy link
Contributor

There are also solutions, that allow you to put OpenAPI doc into doc blocks inside the code. This allows you for example, to document each API endpoint right above the exact function that will handle the corresponding request. This makes it easier to keep the docs up to date and check them against what's actually happening.

I have looked into this but there doesn't seem to be any adapters I could find for mediawiki, which is our current wiki system

Most of the stuff is generalized, some isn't since it's not seen anywhere else. I'm open to PRs and recommendations and have interacted with most of the server code at some point so I'm no stranger to it.

@uniquestring
Copy link
Author

I don't think this is redundant as I find the repeated parameter not very intuitive.

You can use both: &category=category1&category=category2&category=category3 and ["category1","category2","category3"] (as categories value) to achieve the same result. So technically they are redundant.

Another thing is, that this redundancy just adds complexity. You can have one parameter filled, or the other, or both, or none and you have to handle each of these cases. It might even get tricky to settle on what "the expected behaviour" is in some of those edge cases.

I mean, look at this this for example:

https://github.com/ajayyy/SponsorBlockServer/blob/master/src/routes/getSkipSegments.ts#L285

const categories: Category[] = req.query.categories
        ? JSON.parse(req.query.categories as string)
        : req.query.category
            ? Array.isArray(req.query.category)
                ? req.query.category
                : [req.query.category]
            : ["sponsor"];
if (!Array.isArray(categories)) {
    res.status(400).send("Categories parameter does not match format requirements.");
    return false;
}

Just having one way to define categories, and making the parameter mandatory, could make this much simpler. Should look something like this:

// Maybe you could even leave this extra variable out and just use req.query.categories instead
const categories: Category[] = req.query.categories
if (!Array.isArray(categories)) {
    res.status(400).send("Categories parameter does not match format requirements.");
    return false;
}

Removing such redundancies reduces the potential for bugs and unexpected behaviour and also makes the API easier to understand.

Developers need to know the API anyways. So there isn't much benefit to providing multiple parameters for doing the same thing. Neither does having an optional parameter with an "intransparent" default value instead of just making the parameter mandatory.

@mchangrh
Copy link
Contributor

I think the default override is necessary since some (quite) old clients are using the old endpoint and specify no other parameters, but the repeating categories over JSON array might be better

@mchangrh
Copy link
Contributor

also parsing the JSON causes some complications, if you go up then you can see the try/catch for JSON syntax errors :P

@uniquestring
Copy link
Author

skipSegment by hash differs slightly since it also returns the video hash

Yes. With 'the "segments" data model' I meant just the model for each segment. Of course /skipSegments/:sha256HashPrefix wraps this with a container object, but if you just look at the segment, they still differ:

{
   segment: float[], //[0, 15.23] start and end time in seconds
   UUID: string,
   category: string,
   videoDuration: float // Duration of video when submission occurred (to be used to determine when a submission is out of date)
}

vs.

{
   segment: float[], // [0, 15.23] start and end time in seconds
   UUID: string,
   category: string
}

I have looked into this but there doesn't seem to be any adapters I could find for mediawiki, which is our current wiki system

Yeah and even if there was, I don't think it would make sense to use. Personally, I think the API docs could just be separate. Not all data can be pressed into the same format. You wouldn't document a whole project on a Swagger page, and maybe, the other way around, you shouldn't try too hard putting all docs on a wiki page. I've seen a lot of other Software, where they have some kind of handbook, but just link to the OpenAPI definition/Swagger UI for their API. Nothing wrong with that if you ask me.


I think the default override is necessary since some (quite) old clients are using the old endpoint and specify no other parameters, but the repeating categories over JSON array might be better

If backwards compatibility is a concern, then sure, maybe some stuff should be left in for now. But it isn't really practical to keep it like this forever. At some point the API will (and probably should) break. Third party projects can't rely on APIs never changing anyways. With API versions and deprecations, there are the right tools to signify such changes


also parsing the JSON causes some complications, if you go up then you can see the try/catch for JSON syntax errors :P

Yeah, sure. This only was an example. The point was, that you have to worry about less.

Btw. this is exactly what a dedicated framework would take care of. Like always presenting you with an ready to go Category[] array or returning the appropriate error to the caller, if none was provided. All without you needing to do any of that manually in the code.


Most of the stuff is generalized, some isn't since it's not seen anywhere else. I'm open to PRs and recommendations and have interacted with most of the server code at some point so I'm no stranger to it.

I think I'll just have my shot at it and then we can take a look and discuss. It'll be easier than just hypothetically discussing all of this.

@ajayyy
Copy link
Owner

ajayyy commented Sep 26, 2021

At some point the API will (and probably should) break

That is something that I very much disagree with, but for this case, the actual extension uses the JSON method, the other method was only added for castblock.

I'm proud to say that v1.0.1 of the extension still works perfectly to this day.

@uniquestring
Copy link
Author

At some point the API will (and probably should) break
That is something that I very much disagree with [...]

I'm proud to say that v1.0.1 of the extension still works perfectly to this day.

In some cases it might make sense to break compatibility, like if you develop the API further or remove some old stuff. But in other cases it might be more sensible to keep some "obsolete" functionality and avoid breaking existing applications. Especially when breaking the functionality wouldn't bring that much meaningful improvement, but work to all consumer applications.

I can understand your standpoint and I don't think breakage of the API would be absolutely necessary at this point. That's why I asked you how you feel about it.

My train of thought when working with OpenAPI was: "Mh ok, this category parameter perfectly makes sense. I declare an enum with all the possible values. I then can have a category parameter for the /skipSegments endpoint, that is an optional array of this enum. OpenAPI now documents very clearly, that you can have 0, 1 or multiple categories and they can only be from a certain set of values. This can take care of all the current use cases."

If you have this, you can only ask yourself "Why should there be another parameter?", because:

  1. It wouldn't add anything meaningful.
  2. It has some custom format that coincidentally can't be defined with OpenAPI, so you would need to stick to a dumb string instead of having the more detailed information, that this is an list of values, which values are allowed and other fine granular control.
  3. You need to explain, that either one or the other parameter should be defined or you need to document how the Server will behave if you declare both. (You can't see it from the current docs, but category is ignored if categories is defined. How should anyone know if they don't know the server code and just want to use the API?)

Parts of the documentation in the wiki are already outdated, so from an documentation standpoint, you are already breaking compatibility, even if you just update it to the current version and not change any actual code. That's why I thought, it could be an good opportunity to clean up even more and maybe simplify things while we are at it.


But ok, as I said, I don't see the breakage as absolutely necessary. And I don't want to use much time trying to talk you into it. As I said, it is possible to keep everything as it is and still introduce OpenAPI documentation. I did that anyway when I was playing around with OpenAPI. I just added the "redundant" parameters (as a simple string in the worst case) and noted which parameter could be used instead.

How about this:
I'll create the OpenAPI docs including everything, even the stuff I deem "obsolete". This way, you don't lose any functionality, which seems to be the thing that is important to you. At the same time this project will have an official OpenAPI documentation, which was the goal I was aiming for. You can still just edit the OpenAPI docs and add/remove parameters, endpoints, etc. so it's not like this is set in stone for all time to come.

Would you be ok with that?

@mchangrh
Copy link
Contributor

In terms of oAPI, category can be declared as an array and enum'd so it will repeat instead of going through the json format.

Also I think some of the features you mentioned were v3 exclusive and swaggerUI only supports up to v2 so beware of compatibility issues there. Also, to the best of my knowledge the wiki documentation is up to date but I will go over it later

@uniquestring
Copy link
Author

In terms of oAPI, category can be declared as an array and enum'd so it will repeat instead of going through the json format.

That is exactly what I wrote. If you look at the docs about serialization, you'll see that you can achieve something like /users?id=3&id=4&id=5, which matches the expected format for category, but there isn't anything matching the ["sponsor","intro"] that the categories parameter expects. That's why the only option I see is a dumb string.


Also I think some of the features you mentioned were v3 exclusive and swaggerUI only supports up to v2 so beware of compatibility issues there.

I don't think so. I used OAS 3.0 when I played around with this project and it works fine. Swagger UI does seem to support OAS 3.0 for over 3 years now: https://github.com/swagger-api/swagger-ui#compatibility

Swagger UI Version Release Date OpenAPI Spec compatibility
3.18.3 2018-08-03 2.0, 3.0

You even used OAS 3.0 in your own OpenAPI spec:

openapi: 3.0.4

https://github.com/mchangrh/sb-openapi-source/blob/main/openapi.yaml#L1


Also, to the best of my knowledge the wiki documentation is up to date but I will go over it later

I didn't go into it very deeply, just the first few endpoints for testing, but as I wrote before, there are some misalignments I stumbled upon: #371 (comment). But nothing big as far as I can see.

@mchangrh
Copy link
Contributor

Sorry, I must have been thinking about OAS 3.1.
I'll have to dig through my history to find the relevant discussion, but I believe it was type being an array of strings and then having a custom serializer to JSON-ify it made the weird categories=[`"sponsor", "selfpromo"] possible

@mchangrh
Copy link
Contributor

borrowing from https://github.com/Podcastindex-org/docs-api, RapiDoc might be a way to have presentable docs without going full-on with swagger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants