Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can loadJSON be added as an instance method which merges indices? #247

Open
acnebs opened this issue Jan 31, 2024 · 3 comments
Open

Can loadJSON be added as an instance method which merges indices? #247

acnebs opened this issue Jan 31, 2024 · 3 comments

Comments

@acnebs
Copy link

acnebs commented Jan 31, 2024

I'm having an issue where my index is getting periodically garbage-collected by the system, without the rest of my code doing so. So I end up in a situation where I'm expecting an index to be there, but it is not. I have implemented logic to check for when the index is smaller than I expect and re-index items, however this can be quite performance intensive depending on the size of the indexed content.

So I am also persisting the index to IndexedDb with JSON.stringify(minisearchInstance) and then loading it first. However Minisearch.loadJSON is a constructor, so I need to entirely replace my instance. However there are some instances when I have already indexed new items into my fresh index before I have finished deserializing my persisted index, at which point those documents are lost.

Would it be possible to make loadJSON an instance method of a MiniSearch instance which merged the deserialized index with whatever is already in the instance to address such a use case?

@acnebs acnebs changed the title Can loadJSON be added as an instance method? Can loadJSON be added as an instance method which merges indexes? Jan 31, 2024
@acnebs acnebs changed the title Can loadJSON be added as an instance method which merges indexes? Can loadJSON be added as an instance method which merges indices? Jan 31, 2024
@lucaong
Copy link
Owner

lucaong commented Jan 31, 2024

Hi @acnebs ,
I need to think more deeply about it, but it is probably not possible to merge two indexes more efficiently than in a reindex.

That said, the index should never be garbage collected if your MiniSearch instance is still available. Under the hood, the index is referenced by a simple _index property in the MiniSearch instance object, so as long as the instance is reachable by live code, the garbage collector will not collect the index. I suspect something else might be going on in your app, and I think it is better to fix the root issue. Could you explain more about the issue? What do you observe when the index disappears?

Without knowing much about your system and issue, some question that could maybe point in the right direction:

  • Do you ever call removeAll()? A bug causing unnecessary calls to this method would explain the behavior. The removeAll() method, when called with no argument, clears the whole index immediately.
  • Do you make use of discard / discardAll / replace? While these are all safe to use, it is possible that a bug in their usage could cause unnecessary deletions and hence the behavior you observe.

@acnebs
Copy link
Author

acnebs commented Jan 31, 2024

This is in the context of a capacitor app, and mostly happens on iOS.

The app itself creates a new instance of minisearch when the app first loads, indexing all the items. If the user leaves the app for a while, iOS eventually clears everything from memory. When you go back to the app though, it seems like Apple is doing some trickery (or capacitor is) so that the entire web app does not reload -- that initializing step which inits a class-based wrapper (as a singleton) around minisearch does not run.

But there are no items in the minisearch index anymore - they have been cleared.

@lucaong
Copy link
Owner

lucaong commented Jan 31, 2024

I see, thanks @acnebs for the explanation.

I would advise trying to figure out how to re-initialize the app: the MiniSearch instance and the index are ultimately made of plain JavaScript objects (although many nested ones for the index), so if iOS or Capacitor clears memory and does not re-initialize, in principle what you observe could happen with any other piece of state too.

That said, I will think if I can expose a method to efficiently load a serialized index on top of what is already indexed. Without having thought much about it, it does not sound promising, but I don't want to exclude it right away.

Meanwhile, one thing you could do on the application side, is to re-index only the documents that were added after the last time the index was persisted. For example, you could:

  • Upon detecting that the index has shrunk, get a list of all the document IDs that are in the smaller index by performing a wildcard search: miniSearch.search(MiniSearch.wildcard)
  • Load a new MiniSearch instance from the serialized index with loadJSON, then only re-index the documents in the list you found with the wildcard search

Note that this strategy won't work if you also perform deletions, but that's also the case with an "index merging" feature. If you also delete documents, you might need to keep track of which documents you added or deleted since the last serialization, and perform such updates on top of the serialized index upon recovery.

As you can see, it can get messy, so ideally you would find how to force proper re-initialization of the app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants