New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new time on page #3645
base: master
Are you sure you want to change the base?
new time on page #3645
Conversation
} | ||
) | ||
else | ||
from e in no_select_timed_pages_q, | ||
select: {e.pathname, fragment("sum(?)/countIf(?)", e.duration, e.transition)} | ||
from e in no_select_timed_pages_q, select: {e.pathname, avg(e.duration)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New SQL:
SELECT
pathname,
avg(duration)
FROM (
SELECT
pathname,
sum(next_timestamp - timestamp) AS duration
FROM (
SELECT
leadInFrame(timestamp) OVER (
PARTITION BY session_id
ORDER BY timestamp
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
) AS next_timestamp,
timestamp,
pathname,
user_id
FROM events_v2 SAMPLE 20000000
WHERE -- {...site and date range filter...}
)
WHERE next_timestamp != 0 AND -- {...some page filter...}
HAVING duration > 0
GROUP BY pathname, user_id
)
GROUP BY pathname
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous SQL:
SELECT
pathname,
sum(duration)/countIf(transition)
FROM (
SELECT
pathname,
next_pathname != pathname AS transition,
sum(next_timestamp - timestamp) AS duration
FROM (
SELECT
leadInFrame(timestamp) OVER event_horizon AS next_timestamp,
leadInFrame(pathname) OVER event_horizon AS next_pathname,
timestamp,
pathname,
session_id
FROM events_v2 SAMPLE 20000000
WHERE -- {...site and date range filter...}
WINDOW event_horizon AS (
PARTITION BY session_id
ORDER BY timestamp
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
)
)
WHERE next_timestamp != 0 AND -- {...some page filter...}
GROUP BY pathname, next_pathname, session_id
)
GROUP BY pathname
which tried to recreate the original neighbour SQL:
SELECT
pathname,
round(sum(duration)/count(case when next_pathname != pathname then 1 end))
FROM (
SELECT
pathname,
next_pathname,
sum(next_timestamp-timestamp) AS duration
FROM (
SELECT
*,
neighbor(timestamp, 1) as next_timestamp,
neighbor(pathname, 1) as next_pathname,
neighbor(session_id, 1) as next_session_id
FROM (
SELECT
pathname,
timestamp,
session_id
FROM events_v2 SAMPLE 20000000
WHERE -- {...site and date range filter...}
ORDER BY session_id, timestamp
)
)
WHERE session_id=next_session_id AND -- {...some page filter...}
GROUP BY pathname, next_pathname, session_id
)
GROUP BY pathname
@@ -362,15 +355,15 @@ defmodule Plausible.Stats.Breakdown do | |||
select: %{ | |||
page: i.page, | |||
time_on_page: sum(i.time_on_page), | |||
visits: sum(i.pageviews) - sum(i.exits) | |||
visitors: sum(i.visitors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this imported_pages query might have been under-reporting time_on_page
as it was dividing by visits
rather than visitors
which I assume is lower. The reason I'm changing it to visitors
is because that's how google seems to calculate time_on_page
on its dashboards and it also maps better to avg(time_on_page across sessions)
which the events_v2
query does.
populate_stats(site, [ | ||
build(:pageview, pathname: "/", user_id: @user_id, timestamp: ~N[2021-01-01 00:00:00]), | ||
build(:pageview, pathname: "/", user_id: @user_id, timestamp: ~N[2021-01-01 00:10:00]) | ||
]) | ||
|
||
assert [%{"name" => "/", "time_on_page" => nil}] = | ||
assert [%{"name" => "/", "time_on_page" => 600.0}] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new approach supports single page visits / applications. The previous approach was dividing by zero in that case (since there were no page transitions).
0127766
to
21403a4
Compare
Changes
This PR changes how Time on Page statistic is calculated.
Before it was:
Now it is:
TODOs
sum(duration(sessions) per user) > 0
and not "active pages"sum(duration(page)) per user > 0
?Tests
Changelog
Documentation
Dark mode