Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new time on page #3645

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

ruslandoga
Copy link
Contributor

@ruslandoga ruslandoga commented Dec 20, 2023

Changes

This PR changes how Time on Page statistic is calculated.

Before it was:

sum(duration(page)) / count(unique(page, next_page, session))

Now it is:

avg(sum(duration(page) per user) > 0)

TODOs

  • filter by "active visitors" sum(duration(sessions) per user) > 0 and not "active pages" sum(duration(page)) per user > 0?

Tests

  • Automated tests have been updated

Changelog

  • Entry has been added to changelog

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

}
)
else
from e in no_select_timed_pages_q,
select: {e.pathname, fragment("sum(?)/countIf(?)", e.duration, e.transition)}
from e in no_select_timed_pages_q, select: {e.pathname, avg(e.duration)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New SQL:

SELECT 
  pathname,
  avg(duration) 
FROM (
  SELECT
    pathname,
    sum(next_timestamp - timestamp) AS duration
  FROM (
    SELECT
      leadInFrame(timestamp) OVER (
        PARTITION BY session_id
        ORDER BY timestamp
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
      ) AS next_timestamp,
      timestamp,
      pathname,
      user_id
    FROM events_v2 SAMPLE 20000000 
    WHERE -- {...site and date range filter...}
  )
  WHERE next_timestamp != 0 AND -- {...some page filter...}
  HAVING duration > 0
  GROUP BY pathname, user_id
)
GROUP BY pathname

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous SQL:

SELECT 
  pathname,
  sum(duration)/countIf(transition)
FROM (
  SELECT
    pathname,
    next_pathname != pathname AS transition,
    sum(next_timestamp - timestamp) AS duration
  FROM (
    SELECT
      leadInFrame(timestamp) OVER event_horizon AS next_timestamp,
      leadInFrame(pathname) OVER event_horizon AS next_pathname,
      timestamp,
      pathname,
      session_id
    FROM events_v2 SAMPLE 20000000
    WHERE -- {...site and date range filter...}
    WINDOW event_horizon AS (
      PARTITION BY session_id
      ORDER BY timestamp
      ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
    )
  )
  WHERE next_timestamp != 0 AND -- {...some page filter...}
  GROUP BY pathname, next_pathname, session_id
)
GROUP BY pathname

which tried to recreate the original neighbour SQL:

SELECT
  pathname,
  round(sum(duration)/count(case when next_pathname != pathname then 1 end))
FROM (
  SELECT
    pathname,
    next_pathname,
    sum(next_timestamp-timestamp) AS duration
  FROM (
    SELECT
      *,
      neighbor(timestamp, 1) as next_timestamp,
      neighbor(pathname, 1) as next_pathname,
      neighbor(session_id, 1) as next_session_id
    FROM (
      SELECT
        pathname,
        timestamp,
        session_id
      FROM events_v2 SAMPLE 20000000
      WHERE -- {...site and date range filter...}
      ORDER BY session_id, timestamp
    )
  )
  WHERE session_id=next_session_id AND -- {...some page filter...}
  GROUP BY pathname, next_pathname, session_id
)
GROUP BY pathname

@@ -362,15 +355,15 @@ defmodule Plausible.Stats.Breakdown do
select: %{
page: i.page,
time_on_page: sum(i.time_on_page),
visits: sum(i.pageviews) - sum(i.exits)
visitors: sum(i.visitors)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this imported_pages query might have been under-reporting time_on_page as it was dividing by visits rather than visitors which I assume is lower. The reason I'm changing it to visitors is because that's how google seems to calculate time_on_page on its dashboards and it also maps better to avg(time_on_page across sessions) which the events_v2 query does.

populate_stats(site, [
build(:pageview, pathname: "/", user_id: @user_id, timestamp: ~N[2021-01-01 00:00:00]),
build(:pageview, pathname: "/", user_id: @user_id, timestamp: ~N[2021-01-01 00:10:00])
])

assert [%{"name" => "/", "time_on_page" => nil}] =
assert [%{"name" => "/", "time_on_page" => 600.0}] =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach supports single page visits / applications. The previous approach was dividing by zero in that case (since there were no page transitions).

@ruslandoga ruslandoga self-assigned this Feb 18, 2024
@ruslandoga ruslandoga removed their assignment Apr 6, 2024
@ruslandoga ruslandoga marked this pull request as ready for review April 23, 2024 17:51
@ruslandoga ruslandoga marked this pull request as draft April 23, 2024 17:59
@ruslandoga ruslandoga marked this pull request as ready for review April 23, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant