Tantivy Internal Architecture Documentation #2300

Open
kj3moraes opened this issue Jan 12, 2024 · 10 comments

Comments

@kj3moraes

This is a request for full-fledged documentation of Tantivy's algorithms and implementation. Thoroughly documenting Tantivy's internals and the overall "flow" in a modern documentation style would be a great resource.

Proposal

We document the "internals" of Tantivy using MkDocs (specifically the Material theme for MkDocs, since it is widely used):

  • the modules (core, collector, indexer, etc.): what they are used for and how they achieve it
  • the essentials of distributed search
  • relation to Apache Lucene (similarities, differences)

Essentially, take the ARCHITECTURE.md file, flesh it out further, and put it on an easily accessible site (with a good UI).

This is not a step-by-step code walkthrough or detailed documentation of every method. It is aimed at someone

  • who knows Rust well
  • who wants to learn about search engines, etc.
  • who wants to learn about Tantivy's implementation
  • who does not have the time for an in-depth code read-through.

Plan

I have studied Apache Lucene recently and have asked @fulmicoton if I can work on the internal documentation. My idea is that we make a branch called internal-docs and set up the documentation there. (Material for MkDocs has a great GitHub integration, so I'm biased, but we can use whatever everyone collectively decides on.)

@PSeitz
Contributor

PSeitz commented Jan 13, 2024

I think the Rust docs would be a good place for this, as they don't get outdated so easily and are the entry point for documentation.

Or do you think there are certain things we cannot do in rustdoc?

Btw, I think core should mostly be dissolved; there's a PR for that: #2259

@kj3moraes
Author

kj3moraes commented Jan 13, 2024

I was thinking more of a walkthrough of how indexing and searching occur internally. The Rust docs are great when developing and when you need a reference, but if I wanted to understand what exactly happens in the internals of Tantivy, I wouldn't be able to grok that from the docs easily.

It's just a suggestion to have something to learn from about the process of distributed search and Tantivy's implementation of it.

@PSeitz
Contributor

PSeitz commented Jan 16, 2024

Yes, but we can have a walkthrough in the Rust docs, or are there features missing?

> if I wanted to understand what exactly happened in the internals of Tantivy, I wouldn't be able to grok that from the docs easily.

I think that's an issue in the docs that others also have currently and should be fixed.

@kj3moraes
Author

Fair enough, we could do it in the Rust docs themselves. Do they support:

  • flowcharts
  • diagrams

If these are present (or can be added with some patches) then Rust docs sounds good. We would need some segregation to explain which is the internal documentation / walkthrough and which is the API reference.

Could you share some links to these kinds of docs that others have made?

@fulmicoton
Collaborator

@kj3moraes I'm OK with something outside of rustdoc as long as it is in Markdown. The Rust world tends to use mdBook for that.

@PSeitz
Contributor

PSeitz commented Jan 24, 2024

> Fair enough, we could do it in the Rust docs itself. Do they support
>
> * flowcharts
> * diagrams

There's a Mermaid integration, which looks promising: https://docs.rs/simple-mermaid/latest/simple_mermaid/

> If these are present (or can be added with some patches) then Rust docs sounds good. We would need some segregation to explain which is the internal documentation / walkthrough and which is the API reference.

I don't think this needs much separation. A walk-through is really helpful on the API level. Internals are also helpful to understand how to use an API.

Any documentation outside of CI (which rustdoc is part of) will become obsolete, which is already the case for several Tantivy docs.
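
For illustration, here is a minimal sketch of what a CI-checked walkthrough inside rustdoc could look like. Everything in it is hypothetical: `ToyIndex`, `tokenize`, and the crate name `toy_docs` are made up for this example and are not Tantivy's API. The point is only that the prose, a small text diagram, and a runnable example can sit next to the code, and that `cargo test` compiles and runs the fenced example as a doctest, so the walkthrough fails CI when it drifts from the code.

```rust
use std::collections::BTreeMap;

/// A toy inverted index (hypothetical, NOT Tantivy's API).
///
/// # Walkthrough
///
/// 1. [`tokenize`] splits the text into lowercase terms.
/// 2. [`ToyIndex::add`] appends the document id to each term's posting list.
/// 3. [`ToyIndex::search`] returns the posting list for a term.
///
/// ```text
/// text -> tokenize -> posting lists -> search
/// ```
///
/// The example below is a doctest. It assumes the crate is named
/// `toy_docs` (an assumption made for this sketch).
///
/// ```
/// use toy_docs::ToyIndex;
///
/// let mut index = ToyIndex::default();
/// index.add(0, "Hello search");
/// assert_eq!(index.search("hello"), vec![0]);
/// ```
#[derive(Default)]
pub struct ToyIndex {
    /// Maps each term to the sorted list of document ids containing it.
    postings: BTreeMap<String, Vec<u32>>,
}

impl ToyIndex {
    /// Indexes `text` under `doc_id`. Documents are expected to be added
    /// in increasing `doc_id` order so posting lists stay sorted.
    pub fn add(&mut self, doc_id: u32, text: &str) {
        for term in tokenize(text) {
            let list = self.postings.entry(term).or_default();
            // Skip the push if this term already occurred in this document.
            if list.last() != Some(&doc_id) {
                list.push(doc_id);
            }
        }
    }

    /// Returns the ids of documents containing `term` (case-insensitive).
    pub fn search(&self, term: &str) -> Vec<u32> {
        self.postings
            .get(&term.to_lowercase())
            .cloned()
            .unwrap_or_default()
    }
}

/// Splits on whitespace and lowercases each term.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}
```

Rendered diagrams (for example via the Mermaid integration linked above) would replace the `text` block; the doctest part works with plain rustdoc.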

@kj3moraes
Author

Yeah, fair enough, we can get started on it then. Should we make a separate branch for it?

Also,

> Could you share some links for these kinds of docs that others have made ?

for reference.

@PSeitz
Contributor

PSeitz commented Jan 26, 2024

> Yeah fair enough, we can get started on it then. Should we make a separate branch for it ?

I don't think we need a branch for this.

> Also
>
> > Could you share some links for these kinds of docs that others have made ?
>
> for reference

Are you looking for something specific? The bigger crates have more extensive documentation, e.g. https://docs.rs/tokio/latest/tokio/

@kj3moraes
Author

Hey @PSeitz, how do you propose we begin?

@PSeitz
Contributor

PSeitz commented Feb 8, 2024

I like the straightforward style of ARCHITECTURE.md and think it would make a good addition to the docs.

It probably makes sense to first identify which high-level concepts or structure are missing and add them, before going into details.

PRs against the main branch are fine. You can also join our Discord channel if you have questions.


3 participants