
Accidental inclusion of a defective preview feature leading to wide performance difference #5

Open · sorenbs wants to merge 1 commit into main
Conversation

@sorenbs commented Mar 28, 2024

Hey Drizzle team 👋

First of all, thanks a lot for putting in the effort to create these benchmarks comparing Drizzle with Prisma ORM. We found these to be a useful resource for performance comparisons and appreciate your work!

**Accidental inclusion of a defective preview feature leading to the wide performance difference**

While reviewing the benchmark, we discovered that most queries perform almost identically between Drizzle and Prisma. Two queries use the experimental full-text search feature, which is in preview, and almost the entire difference in performance can be attributed to those two queries.

This defective preview feature is used in two routes in the benchmark setup where FTS is applied.

We have documented the performance issue with this preview feature and provided instructions for performing full-text search with a raw SQL query to avoid the problem.

As you will see in the benchmark output below, changing these two routes to use raw SQL queries causes Drizzle and Prisma to perform very similarly.
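For illustration, here is a minimal sketch of the two variants (route wiring is omitted; the preview-API call shape is an assumption based on Prisma's `fullTextSearch` preview feature, while the raw query follows the one shown later in this thread):

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Before: the preview full-text search API the benchmark originally used.
// Requires `previewFeatures = ["fullTextSearch"]` in schema.prisma.
async function searchPreviewApi(term: string) {
  return prisma.customer.findMany({
    where: { companyName: { search: term } },
  });
}

// After: the raw SQL replacement this PR switches to. $queryRaw passes
// ${tsquery} as a bound parameter, so the term is not string-interpolated.
async function searchRawSql(term: string) {
  const tsquery = `${term}:*`;
  return prisma.$queryRaw`
    select * from "customers"
    where to_tsvector('english', "customers"."company_name")
      @@ to_tsquery('english', ${tsquery})`;
}
```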

**If we fix the broken query, Prisma ORM and Drizzle have similar performance**

If we drop the broken preview feature and replace it with a raw SQL query (using Prisma ORM’s $queryRaw escape hatch), the results show that Prisma ORM and Drizzle are in the same ballpark when it comes to performance. In that case, Drizzle is about 8% faster (in terms of the iterations measured by k6):

|                        | Drizzle | Prisma ORM |
| ---------------------- | ------- | ---------- |
| Iterations per second  | 4814    | 4414       |
| Avg iteration duration | 517 ms  | 564 ms     |

I also ran the initial setup with the defective full-text search feature, and can confirm that the results for Prisma are about 5x worse in that case.

This PR changes the full-text search queries to use raw SQL, and updates Prisma to the latest version.

To ensure that developers can still successfully use full-text search with Prisma ORM, we have updated our documentation with instructions to drop down to raw SQL if they observe slow queries using the search API.

@AndriiSherman (Member)
Hey @sorenbs!

Thanks a lot. That all makes sense. We haven't updated the benchmarks in a while; we're planning to update them and add more use cases for different production scenarios. We would also appreciate any ideas for project structure from your side, if you ever want to share.

We would need to rerun those benchmarks ourselves on separate machines, the same way we did before, as explained in the README. After this is done, we will update the results and merge this PR!

@hansaskov commented Apr 2, 2024

> Hey @sorenbs!
>
> Thanks a lot. That all makes sense. We haven't updated the benchmarks in a while; we're planning to update them and add more use cases for different production scenarios. We would also appreciate any ideas for project structure from your side, if you ever want to share.
>
> We would need to rerun those benchmarks ourselves on separate machines, the same way we did before, as explained in the README. After this is done, we will update the results and merge this PR!

I have previously used this repository as a starting point when I wanted to benchmark Hasura against Drizzle, so here are some improvements I would recommend for your next update.

1. Consider having the server and DB on separate machines, so 3 machines in total. This gives additional insight into the CPU usage of the server and the DB, specifically whether poor performance is caused by inefficient queries or by additional computation on the server.
2. Interfaces, interfaces, interfaces. Refactor the Drizzle and Prisma implementations to use a unified interface, and use this interface as the input when wiring up the routes with Hono. This makes it easier to add and swap query libraries with your preferred backend (see the sketch after this list).
3. Consider only starting one server at a time: start either the Drizzle server or the Prisma server on the same port. Don't start them at the same time on different ports.

That's it. Please ask if you would like more clarification :)
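A hedged sketch of what point 2 could look like (the interface name, methods, and routes are hypothetical, not taken from the repo):

```ts
import { Hono } from "hono";

// Hypothetical unified interface that both ORM implementations satisfy.
interface BenchmarkRepo {
  customerById(id: number): Promise<unknown>;
  searchCustomers(term: string): Promise<unknown[]>;
}

// The Hono app is built once against the interface, so swapping Drizzle,
// Prisma, or any other query library is a one-line change at startup.
function createApp(repo: BenchmarkRepo) {
  const app = new Hono();
  app.get("/customer/:id", async (c) =>
    c.json(await repo.customerById(Number(c.req.param("id"))))
  );
  app.get("/search", async (c) =>
    c.json(await repo.searchCustomers(c.req.query("term") ?? ""))
  );
  return app;
}

// Startup then picks exactly one implementation on one port (point 3):
// const app = createApp(drizzleRepo); // or createApp(prismaRepo)
```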

@AndriiSherman (Member)

> > Hey @sorenbs!
> > Thanks a lot. That all makes sense. We haven't updated the benchmarks in a while; we're planning to update them and add more use cases for different production scenarios. We would also appreciate any ideas for project structure from your side, if you ever want to share.
> > We would need to rerun those benchmarks ourselves on separate machines, the same way we did before, as explained in the README. After this is done, we will update the results and merge this PR!
>
> I have previously used this repository as a starting point when I wanted to benchmark Hasura against Drizzle, so here are some improvements I would recommend for your next update.
>
> 1. Consider having the server and DB on separate machines, so 3 machines in total. This gives additional insight into the CPU usage of the server and the DB, specifically whether poor performance is caused by inefficient queries or by additional computation on the server.
> 2. Interfaces, interfaces, interfaces. Refactor the Drizzle and Prisma implementations to use a unified interface, and use this interface as the input when wiring up the routes with Hono. This makes it easier to add and swap query libraries with your preferred backend.
> 3. Consider only starting one server at a time: start either the Drizzle server or the Prisma server on the same port. Don't start them at the same time on different ports.
>
> That's it. Please ask if you would like more clarification :)

Thanks a lot! We'll consider all of the points.

@sorenbs (Author) commented Apr 3, 2024

Thanks @AndriiSherman!

I found the structure of the benchmark repository to be pretty good! The changes suggested by hansaskov above are good. In addition, I would really like to see the individual queries benchmarked separately. That way developers can see how Drizzle and Prisma perform on different kinds of queries.

I have one question, now that we are talking. After having looked at this for a while, I still don't understand what this graph is referring to:

[image: avg latency graph]

It says `avg latency`, but I haven't been able to match the 8.7ms for Drizzle or the 1.49s for Prisma to anything: not the screenshot of k6 you included in the README, and not the results of my own runs of the benchmark, neither before nor after my changes.

In fact, my own testing shows Drizzle and Prisma to be within 10% of each other, not two orders of magnitude apart!

@AndriiSherman (Member)

> Thanks @AndriiSherman!
>
> I found the structure of the benchmark repository to be pretty good! The changes suggested by hansaskov above are good. In addition, I would really like to see the individual queries benchmarked separately. That way developers can see how Drizzle and Prisma perform on different kinds of queries.
>
> I have one question, now that we are talking. After having looked at this for a while, I still don't understand what this graph is referring to:
>
> [image: avg latency graph] It says `avg latency`, but I haven't been able to match the 8.7ms for Drizzle or the 1.49s for Prisma to anything: not the screenshot of k6 you included in the README, and not the results of my own runs of the benchmark, neither before nor after my changes.
>
> In fact, my own testing shows Drizzle and Prisma to be within 10% of each other, not two orders of magnitude apart!

I would need to mention @AlexBlokh to help me with this. He can explain more about the benchmarks UI on our site.

Regarding benchmarking separate queries: the whole purpose was to show a real project use case. Most of the time, you are not executing just one query in a production project. It's always about a specific use case and a specific set of queries that will be executed by your application's users. So I don't think it's useful to just run `select * from users` with k6; this query will be the same for any ORM out there, I guess.

@sorenbs (Author) commented Apr 3, 2024

> Regarding benchmarking separate queries: the whole purpose was to show a real project use case. Most of the time, you are not executing just one query in a production project. It's always about a specific use case and a specific set of queries that will be executed by your application's users. So I don't think it's useful to just run `select * from users` with k6; this query will be the same for any ORM out there, I guess.

That makes complete sense to me. I think a breakdown by query would be an interesting level of additional detail, and I can certainly see it being useful to show the blended result as the top-line metric. It's your benchmark, so I'll let you all have that conversation and make that decision.

_Inline review comment on the changed search route:_

```ts
  },
});
const term = `${c.req.query("term")}:*`;
const result = await prisma.$queryRaw`select * from "customers" where to_tsvector('english', "customers"."companyName") @@ to_tsquery('english', ${term});`;
```
btw, I ran this on my end, and it seems like this line is supposed to select `"customers"."company_name"` instead, since the Prisma schema uses `companyName` with `@map("company_name")`.

```prisma
// schema.prisma (excerpt)
model Customer {
  id          Int    @id @default(autoincrement())
  companyName String @map("company_name") @db.VarChar
  contactName String @map("contact_name") @db.VarChar
  // ...
}
```

Furthermore, if we run this query without changing `companyName` to `company_name`, Prisma will throw an error that it is unable to find `customers.companyName`.

So this line should be:

```ts
const result = await prisma.$queryRaw`select * from "customers" where to_tsvector('english', "customers"."company_name") @@ to_tsquery('english', ${term});`;
```

@AlexBlokh (Contributor) commented Apr 18, 2024

hey, @sorenbs!

There have been many concerns that we were comparing Drizzle with an outdated version of Prisma, which is not the case; both were the latest versions back then.

I would say that there was nothing accidental on our end. We've spent an enormous amount of time preparing proper, meaningful benchmarks, as close to a real-world case as possible, and we were aiming to continue adding comparisons if there was demand and interest from the audience. They took 8 months to build.

> I would really like to see the individual queries benchmarked separately

We deliberately didn't do it that way. We've done synthetic benchmarks with mitata in the past, and in the SQLite department we were beating Prisma by up to 120x on some queries, while on others every other ORM and Prisma were much the same. While that's a great marketing statement, it doesn't bring a lot of actual business value. So we decided to showcase that if you have a VPS with Postgres and a Node.js app, you can have 9x more throughput with Drizzle than with Prisma, while having lower latency.

As I've said previously, we were not updating the benchmarks because there was no interest in them among developers until recently, and since we're not VC-backed and have very limited resources, we have to carefully spend time on the most important stuff that moves us forward. 8 months have passed since the release, the benchmarks are gaining momentum on social media, and we will gladly bring them up to date and add MySQL and SQLite, Turso with read replicas, etc.

> I still don't understand what this graph is referring to:

We generate a set of 1M weighted random requests to the server, where the majority of them fetch items by ID and lists of products with pagination, and the others are meaningfully distributed based on the real-world e-commerce experience we have. Then we run those 1M requests with k6, collect the data into InfluxDB, and aggregate it to plot on the 10-minute graph (it can be less than 10 minutes if the server manages to handle the 1M requests faster).
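As a rough sketch of that setup (the endpoints and weights below are illustrative, not the benchmark's actual distribution), a weighted request mix in a k6 script could look like this; k6 runs JavaScript, so TypeScript like this would be bundled first:

```ts
import http from "k6/http";

// Hypothetical route mix; weights sum to 100 but any positive weights work.
const routes = [
  { weight: 60, url: () => `http://localhost:3000/customer/${1 + Math.floor(Math.random() * 10000)}` },
  { weight: 30, url: () => `http://localhost:3000/products?page=${1 + Math.floor(Math.random() * 50)}` },
  { weight: 10, url: () => "http://localhost:3000/search?term=cha" },
];
const total = routes.reduce((sum, r) => sum + r.weight, 0);

export default function () {
  // Pick a route with probability proportional to its weight.
  let pick = Math.random() * total;
  for (const route of routes) {
    pick -= route.weight;
    if (pick <= 0) {
      http.get(route.url());
      return;
    }
  }
}
```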

> I think a breakdown by query would be an interesting level of additional detail

While that would be an additional UI/UX challenge for us to represent, it's not really relevant either, since there are multiple synthetic benchmarks out there for people to look at, and I've mentioned that we didn't want to build another one.

Regarding the pull request: I do see that you've changed your search API invocation to raw SQL, which honestly defeats the purpose of this benchmark. Drizzle is used in the benchmarks as it's meant to be used, due to its API, while Prisma is positioned as the proper way to be abstracted away from SQL itself, and we used the search API based on your docs. Your change would be mostly the same as comparing us to the naked driver or the Kysely extension, which doesn't make any sense. We've chosen a real-world scenario for e-commerce and used the Prisma API for search. Let's wait until you fix the bug in your API, and we'll gladly update the benchmarks and the results!
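For reference, the Drizzle side expresses the same FTS predicate through its `sql` template; a minimal sketch, with the schema import and table object assumed rather than copied from the repo:

```ts
import { sql } from "drizzle-orm";
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
// Assumed: `customers` is the table object exported by the benchmark schema.
import { customers } from "./schema";

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

// The `sql` template binds ${term} as a parameter, much like $queryRaw does,
// while column references stay tied to the typed schema object.
async function searchCustomers(term: string) {
  return db
    .select()
    .from(customers)
    .where(
      sql`to_tsvector('english', ${customers.companyName}) @@ to_tsquery('english', ${term})`
    );
}
```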

P.S. I do see you're making statements on X that we're making faulty claims. I don't really understand why, given that you've admitted yourself that there is (or was) a bug in the Prisma API. We haven't rerun the benchmarks ourselves yet, but we will, and I still expect a meaningful gap to remain, based on @SaltyAom's tweet about CPU usage: CPU usage is the cause of Prisma flattening out at 500 req/s while Drizzle goes to 4.5k and then flattens out.
