Optimize database reindex #4558
base: main
Conversation
Overall looks good, nice perf wins 🎉 See questions on `forAllResources` and transactions.
```typescript
const lastUpdated = resource.meta?.lastUpdated as string;
currentTimestamp = lastUpdated;
await this.withTransaction(async (conn) => {
```
Is this transaction necessary? Trying to imagine what it's needed for, versus letting the callback handle it.
Hm, I see, you're using this to avoid per-resource transactions in the reindex step. imho, that kinda breaks the abstraction of `forAllResources`:
- I'm not sure the transaction is fully necessary in `reindexResource`
- If it is both necessary and a major performance win, then I think we should probably modify the API contract of `forAllResources` (maybe a "page" callback and a "resource" callback?)
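A sketch of what that revised contract could look like: a per-page callback lets the caller wrap one transaction around a whole page, while the per-resource callback preserves today's behavior. All names here (`forAllResources`, `ForAllResourcesOptions`, `onPage`, `onResource`) are illustrative, not Medplum's actual API.

```typescript
interface Resource {
  id: string;
  meta?: { lastUpdated?: string };
}

interface ForAllResourcesOptions {
  // Called once per page; the caller decides how to batch work
  // (e.g. one transaction per page).
  onPage?: (page: Resource[]) => Promise<void>;
  // Called once per resource when no page callback is given
  // (today's behavior).
  onResource?: (resource: Resource) => Promise<void>;
}

// Iterates pages of resources and dispatches to whichever callback
// the caller provided. Returns the total number of resources visited.
async function forAllResources(
  pages: Resource[][],
  options: ForAllResourcesOptions
): Promise<number> {
  let count = 0;
  for (const page of pages) {
    if (options.onPage) {
      await options.onPage(page);
    } else if (options.onResource) {
      for (const resource of page) {
        await options.onResource(resource);
      }
    }
    count += page.length;
  }
  return count;
}
```

With this shape, the reindex job would pass `onPage` and open a single transaction per page, while existing callers keep using `onResource` unchanged.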
In testing on localhost over resource types with 15k–100k resources in the table, this combination of transaction batching and concurrent handling of the lookup tables reduced the overall full reindex time by 30–50% (saving several minutes even on my small data set).
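To illustrate the batching idea (this is a hedged sketch with a fake connection, not Medplum's actual code): one transaction wraps the whole page of resources, and within it the lookup-table writes for each resource are issued concurrently with `Promise.all` instead of sequentially.

```typescript
// Fake connection that records issued SQL, standing in for a real
// database client (assumed names; for illustration only).
class FakeConnection {
  statements: string[] = [];
  async query(sql: string): Promise<void> {
    this.statements.push(sql);
  }
}

// Minimal transaction wrapper: BEGIN, run the callback, COMMIT,
// rolling back on error.
async function withTransaction(
  fn: (conn: FakeConnection) => Promise<void>
): Promise<FakeConnection> {
  const conn = new FakeConnection();
  await conn.query('BEGIN');
  try {
    await fn(conn);
    await conn.query('COMMIT');
  } catch (err) {
    await conn.query('ROLLBACK');
    throw err;
  }
  return conn;
}

// Hypothetical lookup tables; the real set depends on the resource type.
const lookupTables = ['HumanName', 'Identifier', 'Address'];

// One transaction per page of resources, with the per-resource
// lookup-table writes dispatched concurrently.
async function reindexPage(resourceIds: string[]): Promise<FakeConnection> {
  return withTransaction(async (conn) => {
    for (const id of resourceIds) {
      await Promise.all(
        lookupTables.map((table) =>
          conn.query(`UPDATE "${table}" SET content = content WHERE resourceId = '${id}'`)
        )
      );
    }
  });
}
```

Compared with one transaction per resource, this trades a longer-lived transaction for far fewer BEGIN/COMMIT round trips, which is where the reported 30–50% savings plausibly comes from.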
This PR also moves the reindex logic into an async worker that processes a batch of rows and then enqueues another job to handle the next chunk. This makes the job much more robust and allows it to run safely for extended periods of time.
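The self-enqueueing pattern can be sketched as follows, using a plain function and an injected `enqueue` callback in place of a real job queue (the actual implementation would hand the follow-up job to whatever queue the server uses; `ReindexJob`, `BATCH_SIZE`, and `processReindexJob` are illustrative names):

```typescript
interface ReindexJob {
  resourceType: string;
  cursor: number; // offset of the next batch to process
}

const BATCH_SIZE = 500;

// Processes one batch of resource IDs, then enqueues a follow-up job
// for the next chunk rather than looping in-process. Keeping each job
// small means progress survives worker restarts and no single job
// holds resources for an extended period.
async function processReindexJob(
  job: ReindexJob,
  allIds: string[],
  enqueue: (job: ReindexJob) => void
): Promise<string[]> {
  const batch = allIds.slice(job.cursor, job.cursor + BATCH_SIZE);
  // ...reindex the resources in `batch` here...
  if (job.cursor + BATCH_SIZE < allIds.length) {
    enqueue({ resourceType: job.resourceType, cursor: job.cursor + BATCH_SIZE });
  }
  return batch;
}
```

A driver then repeatedly pops a job, processes it, and lets the enqueued follow-up continue the scan until the cursor passes the end of the table.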