Fulltext Index Search Adapter for DataScript in ClojureScript

I needed fulltext search in a text-heavy application that uses DataScript.

Is it any good?

Yes, but it's still early.

Design

The search adapter maintains a fulltext index in a separate DataScript database. This is not ideal, but the current design of Datomic and DataScript does not support extensible indices.

The fulltext adapter:

listens for changes in the source connection using (d/listen! conn),
inspects the incoming tx-report,
filters on attributes that have :db/fulltext true in their schema,
tokenises the string value,
removes stop words like "the" and "and",
maintains a multi-cardinality attribute in the fulltext DataScript DB.
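
The steps above can be sketched roughly as follows. This is not the adapter's actual source: the names `stop-words`, `tokenise`, `fulltext-attrs` and `index-tx` are hypothetical, the stop-word list is made up, and only newly asserted values are handled (retractions are ignored). The pure functions run on plain Clojure data; the wiring to real connections is shown in a `comment` form and assumes `datascript.core` is required as `d`.

```clojure
(ns example.fulltext-sketch
  (:require [clojure.string :as str]))

;; Hypothetical stop-word list; the adapter's real list is not shown here.
(def stop-words #{"the" "and" "a" "of"})

(defn tokenise
  "Lower-cases s, splits on non-word characters and drops stop words."
  [s]
  (->> (str/split (str/lower-case s) #"\W+")
       (remove str/blank?)
       (remove stop-words)
       distinct))

(defn fulltext-attrs
  "Returns the set of attributes marked :db/fulltext true in a schema map."
  [schema]
  (set (for [[attr props] schema
             :when (:db/fulltext props)]
         attr)))

(defn index-tx
  "Turns the :tx-data of a tx-report into index operations:
   one [:db/add e a token] per token of each newly asserted string."
  [schema tx-data]
  (let [attrs (fulltext-attrs schema)]
    (for [{:keys [e a v added]} tx-data
          :when (and added (attrs a) (string? v))
          token (tokenise v)]
      [:db/add e a token])))

(comment
  ;; Wiring against real connections (assumes [datascript.core :as d]
  ;; and an ft-conn whose indexed attributes are cardinality-many):
  (d/listen! conn :fulltext
             (fn [tx-report]
               (d/transact! ft-conn
                            (index-tx (:schema @conn) (:tx-data tx-report))))))
```

Keeping the tokenising and filtering steps as pure functions makes them easy to test without a running connection; the listener only glues them to `d/transact!`.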

Using a separate connection makes it convenient to keep a one-to-one attribute mapping and to manage cache eviction, since the index could grow large. In practice the separation is not an issue because you can query across DataScript databases, e.g. (d/q '[:in $ $1 ...] db1 db2).
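
As an illustration of such a cross-database query, here is a minimal sketch. The index layout is an assumption, not necessarily what the adapter writes: it supposes the fulltext database reuses the source entity IDs and attribute names, with one datom per token. The name `search-text` is hypothetical.

```clojure
(ns example.cross-db
  (:require [datascript.core :as d]))

;; Source database holding the original strings.
(def src-db
  (d/db-with (d/empty-db)
             [{:db/id 1 :message/text "hi there"}
              {:db/id 2 :message/text "goodbye"}]))

;; Assumed index layout: same entity ID and attribute, one datom per token.
(def ft-db
  (d/db-with (d/empty-db {:message/text {:db/cardinality :db.cardinality/many}})
             [[:db/add 1 :message/text "hi"]
              [:db/add 1 :message/text "there"]
              [:db/add 2 :message/text "goodbye"]]))

(defn search-text
  "Finds entities whose token set in ft contains token,
   then joins back to src to fetch the full text."
  [src ft token]
  (d/q '[:find ?e ?text
         :in $ $ft ?token
         :where
         [$ft ?e :message/text ?token]
         [$ ?e :message/text ?text]]
       src ft token))

(search-text src-db ft-db "hi")
;; => #{[1 "hi there"]}
```

The join on `?e` only works because both databases are assumed to share entity IDs; a different index layout would need an explicit reference attribute instead.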

Usage

(ns datascript-fulltext.example
  (:require [reagent.core :as r :refer [atom]]
            [datascript.core :as d]
            [com.theronic.datascript.fulltext :as ft]))

;; Mark :message/text for fulltext indexing in the source schema.
(def conn (d/create-conn {:message/text {:db/fulltext true}}))
(def !ft-conn (atom nil))

(defn parent-component [conn]
  (let [!input (atom "hi")] ;; todo text input
    [:div [:code "Matching fulltext entities: " (ft/query! @!input)]]))

(defn init! []
  (let [fulltext-conn (ft/install-fulltext! conn)]
    (d/transact! conn [{:db/id -1 :message/text "hi there"}]) ;; load from storage after connecting to sync.
    (ft/search @ft/ft-conn "hi") ;; => will yield message ID.
    (reset! !ft-conn fulltext-conn))
  ;; The mount point is an assumption: a DOM element with id "app".
  (r/render [parent-component conn] (.getElementById js/document "app")))

(init!)

Todo

~~Delete index values on mutation. Should be quick to add, and then do a smart diff to avoid writes.~~
Store hashed token values instead of strings.
Use schema definition of source connection.
Track source schema and rebuild index on change.
Batch updates using queued web workers to prevent locking main thread.
Add adapter for off-site storage, e.g. Redis.
Match source transaction IDs if possible.
Fork & extend DataScript to support (fulltext ...) search function.
[in progress] Add soundex or double-metaphone.
Maintain indexed token counts for relevance ranking. Maybe :db/index does this already?
Support n-grams (can get heavy).
Bloom filters.
Seq over matching datoms directly with pagination.
