Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLO-106 Added index on status field for posts table #20181

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

vershwal
Copy link
Member

@vershwal vershwal commented May 9, 2024

There are slow queries on the posts table that take up to 5 minutes to browse posts for customers with a large number of posts.
Sentry link: https://ghost-foundation.sentry.io/performance/?isDefaultQuery=false&metricSearchSetting=metricsOnly&project=1879038&query=transaction%3A%22GET+%2Fghost%2Fapi%2Fadmin%2Fposts%2F%22&statsPeriod=7d&utc=true

There are two slow queries and an index on [status, type] will improve the performance of both of these queries.

@github-actions github-actions bot added the migration [pull request] Includes migration for review label May 9, 2024
Copy link
Contributor

github-actions bot commented May 9, 2024

It looks like this PR contains a migration 👀
Here's the checklist for reviewing migrations:

General requirements

  • Satisfies idempotency requirement (both up() and down())
  • Does not reference models
  • Filename is in the correct format (and correctly ordered)
  • Targets the next minor version
  • All code paths have appropriate log messages
  • Uses the correct utils
  • Contains a minimal changeset
  • Does not mix DDL/DML operations

Schema changes

  • Both schema change and related migration have been implemented
  • For index changes: has been performance tested for large tables
  • For new tables/columns: fields use the appropriate predefined field lengths
  • For new tables/columns: field names follow the appropriate conventions
  • Does not drop a non-alpha table outside of a major version

Data changes

  • Mass updates/inserts are batched appropriately
  • Does not loop over large tables/datasets
  • Defends against missing or invalid data
  • For settings updates: follows the appropriate guidelines

@vershwal vershwal force-pushed the indexForPostsTable branch 2 times, most recently from 1987efe to a279d05 Compare May 9, 2024 10:47
@vershwal vershwal changed the title Added index on status field for posts table SLO-106 Added index on status field for posts table May 9, 2024
@erik-ghost
Copy link

@vershwal Thanks for creating this! Can you attach EXPLAIN ANALYZE output of this query before and after the change so we can understand the impact it's having on the query analyzer? Can you also test this before/after on a database in staging that's having this problem and share what improvement this is making for the customer as experienced from the browser?

@vershwal
Copy link
Member Author

vershwal commented May 9, 2024

Query 1

SELECT `posts`.*, (
SELECT COUNT(DISTINCT `members_click_events`.`member_id`)
FROM `members_click_events`
INNER JOIN `redirects` ON
  `members_click_events`.`redirect_id` = `redirects`.`id`
WHERE posts.id = redirects.post_id) AS `count__clicks`
FROM `posts`
WHERE (
  `posts`.`status` in ('draft') AND `posts`.`type` = 'post'
)
ORDER BY CASE WHEN posts.status = 'scheduled' THEN 1 WHEN
  posts.status = 'draft' THEN 2 ELSE 3 END ASC,CASE WHEN
  posts.status != 'draft' THEN posts.published_at END
  DESC,posts.updated_at DESC,posts.id DESC
LIMIT 20

Current behavior:-

-> Limit: 20 row(s)  (cost=6015 rows=20) (actual time=876..876 rows=20 loops=1)
    -> Sort row IDs: (case when (posts.`status` = 'scheduled') then 1 when (posts.`status` = 'draft') then 2 else 3 end), (case when (posts.`status` <> 'draft') then posts.published_at end) DESC, posts.updated_at DESC, posts.id DESC, limit input to 20 row(s) per chunk  (cost=6015 rows=7991) (actual time=876..876 rows=20 loops=1)
        -> Filter: ((posts.`type` = 'post') and (posts.`status` = 'draft'))  (cost=6015 rows=7991) (actual time=0.615..875 rows=800 loops=1)
            -> Table scan on posts  (cost=6015 rows=7991) (actual time=0.604..872 rows=15000 loops=1)
-> Select #2 (subquery in projection; dependent)
    -> Aggregate: count(distinct members_click_events.member_id)  (cost=1091 rows=1) (actual time=0.0633..0.0633 rows=1 loops=20)
        -> Nested loop inner join  (cost=1000 rows=908) (actual time=0.063..0.063 rows=0 loops=20)
            -> Covering index lookup on redirects using redirects_post_id_foreign (post_id=posts.id)  (cost=1.6 rows=5.29) (actual time=0.0629..0.0629 rows=0 loops=20)
            -> Index lookup on members_click_events using members_click_events_redirect_id_foreign (redirect_id=redirects.id)  (cost=175 rows=171) (never executed)

After adding index:-

-> Limit: 20 row(s)  (cost=280 rows=20) (actual time=120..121 rows=20 loops=1)
    -> Sort row IDs: (case when (posts.`status` = 'scheduled') then 1 when (posts.`status` = 'draft') then 2 else 3 end), (case when (posts.`status` <> 'draft') then posts.published_at end) DESC, posts.updated_at DESC, posts.id DESC, limit input to 20 row(s) per chunk  (cost=280 rows=800) (actual time=120..121 rows=20 loops=1)
        -> Index lookup on posts using status_type (status='draft', type='post')  (cost=280 rows=800) (actual time=0.662..119 rows=800 loops=1)
-> Select #2 (subquery in projection; dependent)
    -> Aggregate: count(distinct members_click_events.member_id)  (cost=1091 rows=1) (actual time=0.0395..0.0395 rows=1 loops=20)
        -> Nested loop inner join  (cost=1000 rows=908) (actual time=0.0391..0.0391 rows=0 loops=20)
            -> Covering index lookup on redirects using redirects_post_id_foreign (post_id=posts.id)  (cost=1.59 rows=5.29) (actual time=0.0389..0.0389 rows=0 loops=20)
            -> Index lookup on members_click_events using members_click_events_redirect_id_foreign (redirect_id=redirects.id)  (cost=175 rows=171) (never executed)

The estimated cost has decreased from 6015 to 280
The actual time taken has decreased from 876 ms to 120 ms

@vershwal
Copy link
Member Author

vershwal commented May 9, 2024

Query 2

SELECT COUNT(DISTINCT posts.id) AS aggregate
FROM `posts`
WHERE (
  `posts`.`status` in ('draft') AND `posts`.`type` = 'post'
)

Current behavior:-

-> Limit: 200 row(s)  (cost=6744 rows=1) (actual time=882..882 rows=1 loops=1)
    -> Aggregate: count(distinct posts.id)  (cost=6744 rows=1) (actual time=882..882 rows=1 loops=1)
        -> Filter: ((posts.`type` = 'post') and (posts.`status` = 'draft'))  (cost=6736 rows=79.9) (actual time=1.53..880 rows=800 loops=1)
            -> Table scan on posts  (cost=6736 rows=7991) (actual time=1.51..878 rows=15000 loops=1)

After adding index

-> Limit: 200 row(s)  (cost=207 rows=1) (actual time=20.5..20.5 rows=1 loops=1)
    -> Aggregate: count(distinct posts.id)  (cost=207 rows=1) (actual time=20.5..20.5 rows=1 loops=1)
        -> Covering index lookup on posts using status_type (status='draft', type='post')  (cost=127 rows=800) (actual time=0.126..0.938 rows=800 loops=1)

The estimated cost has decreased from 6744 to 207
The actual time taken has decreased from 882 ms to 20.5 ms

const {createNonTransactionalMigration} = require('../../utils');
const {addIndex, dropIndex} = require('../../../schema/commands');

module.exports = createNonTransactionalMigration(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this by using createAddIndexMigration, just like

module.exports = createAddIndexMigration('members_created_events', ['attribution_id']);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migration [pull request] Includes migration for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants