Statistics in Tobira #1155

Open
LukasKalbertodt opened this issue Apr 3, 2024 · 11 comments
Labels
kind:new-feature A new feature needs:decision Needs a decision of some kind (discussion thread) needs:research Needs research as we are lacking knowledge to make an informed decision

Comments

@LukasKalbertodt
Member

Users of Tobira should be able to see some statistics about their events, pages, ... in Tobira. The most basic one would be a video "click" counter. This is a very general issue describing all the ways one could approach that.

With #1099, it is possible to let the Paella player send usage statistics to a Matomo server. As described in #1038, this is not the full solution, as that data is only stored in Matomo, not displayed in Tobira itself.

Where to store the statistical data?

Matomo

The collection would work as it does with the Paella plugin now: matomo.js would be loaded from the Matomo server, and events would be emitted, which are sent to the same server.

Advantages:

  • Central (can aggregate view counts for a video from Tobira, an LMS, etc.)
  • Is optimized for storing this kind of data & has built-in UI tools to show & analyze the data

Disadvantages:

  • Data collection can be blocked rather easily by the browser or plugins (my browser does block Matomo, for example)
  • Tobira would have to retrieve the data from Matomo, adding a new communication link, which complicates the architecture of Tobira, making the whole system potentially harder to set up and more brittle.
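For context on why blocking is easy: whatever the player plugin does internally, Matomo's tracker ultimately reports a view as an HTTP request to a well-known endpoint on the Matomo server (typically `matomo.php`). A minimal sketch of such a page-view request via Matomo's HTTP Tracking API, assuming a hypothetical Matomo host `matomo.example.org` and site ID 1 (the Paella plugin may send other parameters, e.g. events instead of page views):

```python
from urllib.parse import urlencode


def matomo_tracking_url(base_url: str, site_id: int, page_url: str, action_name: str) -> str:
    """Builds a page-view request for Matomo's HTTP Tracking API (matomo.php)."""
    params = {
        "idsite": site_id,           # ID of the tracked site in Matomo
        "rec": 1,                    # required; tells Matomo to actually record the request
        "url": page_url,             # page the view happened on
        "action_name": action_name,  # title shown in Matomo's reports
    }
    return f"{base_url.rstrip('/')}/matomo.php?{urlencode(params)}"


# Hypothetical hosts and video path, for illustration only.
print(matomo_tracking_url(
    "https://matomo.example.org", 1,
    "https://tobira.example.org/!v/abc", "My video",
))
```

Because both the script name (`matomo.js`) and the endpoint name are predictable, a filter list can block them with a simple URL rule.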

Tobira

Tobira would just use its own API to let the frontend send certain events and statistics.

Advantages:

  • Data collection very unlikely to get blocked -> better data.
  • No reliance on another external system.
  • There is some Tobira-specific data (e.g. page views), so Tobira likely wants to store some statistics itself anyway. Technically, they could also be ingested into Matomo, but that seems a bit weird to me.

Disadvantages:

  • Is not central, i.e. video views through an LMS are not counted.
  • Incurs some implementation work that we would get for free in Matomo, e.g. figuring out how to best store such data or how to protect against bad actors (e.g. adding a million views to a video)
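One piece of that work is protecting the counter against bad actors. As a rough illustration only, here is a minimal in-memory Python sketch that counts at most one view per client and video within a time window. The class name, the 30-minute window and the use of a hashed IP address plus user agent as the "client" are all assumptions, not anything Tobira does today; a real implementation would persist counts in the database.

```python
import hashlib
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple


class ViewCounter:
    """Counts at most one view per (video, client) within a time window.

    Hypothetical in-memory sketch; not Tobira's actual implementation.
    """

    def __init__(self, window_seconds: float = 30 * 60):
        self.window = window_seconds
        self.counts: Dict[str, int] = defaultdict(int)
        self.last_counted: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def client_key(ip: str, user_agent: str) -> str:
        # Hashed so that raw IP addresses are not stored directly.
        return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()

    def record_view(self, video_id: str, ip: str, user_agent: str,
                    now: Optional[float] = None) -> bool:
        """Returns True if the view was counted, False if it was a repeat."""
        now = time.time() if now is None else now
        key = (video_id, self.client_key(ip, user_agent))
        last = self.last_counted.get(key)
        if last is not None and now - last < self.window:
            return False  # same client, same video, still inside the window
        self.last_counted[key] = now
        self.counts[video_id] += 1
        return True
```

This only stops naive inflation such as repeatedly reloading a page; someone rotating IP addresses or user agents would still get through, and `last_counted` grows without bound unless old entries are pruned.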

Opencast

Opencast could get APIs to send events to (similar to Matomo). And/or Opencast could just check incoming HTTP requests for certain files, and trace them back to an event.

Advantages:

  • Central (can aggregate view counts for a video from Tobira, an LMS, etc.)
  • Data collection very unlikely to get blocked -> better data.
  • Tobira would only need to fetch statistical data from Opencast, where a communication channel already exists, meaning the system architecture does not change.

Disadvantages:

  • Incurs some implementation work that we would get for free in Matomo. (See same point in Tobira section above)
  • Implementing that requires discussions in the OC community, i.e. this would be much slower than just implementing what we need in Tobira.
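To illustrate the second Opencast option (tracing HTTP requests for media files back to an event), here is a minimal Python sketch that extracts event IDs from web server access log lines in the common "combined" format. It assumes, hypothetically, that distributed media URLs contain the event ID as a UUID path segment after `engage-player/`; the actual URL layout depends on the Opencast distribution configuration.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Request line and status code of an nginx/Apache "combined" log entry.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')
# Assumption: the event ID is the UUID path segment right after "engage-player/".
EVENT_RE = re.compile(
    r"/engage-player/(?P<event>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})/"
)


def event_id(line: str) -> Optional[str]:
    """Returns the event ID of a successful media request, or None."""
    match = REQUEST_RE.search(line)
    if match is None or match.group("status") not in ("200", "206"):
        return None
    event = EVENT_RE.search(match.group("path"))
    return event.group("event") if event else None


def requests_per_event(lines: Iterable[str]) -> Counter:
    """Counts successful media requests per event ID."""
    return Counter(eid for eid in map(event_id, lines) if eid is not None)
```

Note that one playback typically produces many requests (HTTP range requests, or one request per segment with HLS), so raw request counts overstate views; they would still need to be deduplicated, for example per visitor.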

As you can see, this is not a simple choice.

I personally lean towards not relying on Matomo, so as not to make the architecture more involved. But I can totally understand the requirement to also show video views that happened in an LMS. And I'm not entirely sure how difficult implementing all of this would be, especially the protection against bad actors.

A mix of these options is certainly possible. For example, we could task Opencast with only gathering basic statistics about videos, while Tobira collects most other data itself.

@LukasKalbertodt LukasKalbertodt added kind:new-feature A new feature needs:decision Needs a decision of some kind (discussion thread) needs:research Needs research as we are lacking knowledge to make an informed decision labels Apr 3, 2024
@oas777
Collaborator

oas777 commented Apr 5, 2024

In light of resources being scarce this year, I suggest keeping it simple:

  • Make Tobira gather the most basic statistic: Clicks per video.
  • Display this figure, with an option to hide it so as not to reveal how poorly your video is doing.
  • Any data beyond this has to be requested from the Opencast/Matomo admin.
  • Opencast uses Matomo to aggregate statistics from Tobira, LMS, and other distribution channels.
  • Matomo feeds some data to the statistic tab in Opencast (I understand this is somehow working in Bern).
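On the Tobira side, this suggestion amounts to one counter per video plus a visibility switch. A minimal sketch of the display rule, assuming a hypothetical per-video `show_views_publicly` setting and assuming (this is not decided in this thread) that users with write access always see the number:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoStats:
    views: int = 0
    show_views_publicly: bool = True  # hypothetical opt-out setting


def displayed_views(stats: VideoStats, viewer_has_write_access: bool) -> Optional[int]:
    """Returns the view count to show, or None if it should be hidden."""
    # Assumption: owners/editors always see the number, even when it is hidden publicly.
    if viewer_has_write_access or stats.show_views_publicly:
        return stats.views
    return None
```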

Questions to change my mind:

  • How big is the difference between Tobira and Matomo in gathering data?
  • How reliable is the Bern solution to feed Matomo data to Opencast? Is there any data beyond "clicks" being used?
  • Can you share statistics from Matomo with owners of a video / a series?

If anyone wants better statistics in Opencast and/or Tobira, they can revisit this as an Opencast/Tobira feature next year.

@LukasKalbertodt
Member Author

How big is the difference between Tobira and Matomo in gathering data?

It's really quite difficult to get good numbers on that, for somewhat obvious reasons: browsers that block Matomo tracking are hard to track. I have the following two plugins installed in all the desktop browsers I regularly use:

Both of these seem to block Matomo in their default configuration. In uBlock Origin's case, the "Easy Privacy" list seems to be responsible for the blocking; I believe it is also used by other ad blockers. Of course, these absolute numbers about "users" above are not very helpful. It's also important to understand that the typical users of a video portal are not the "average population" and might be more inclined to install such a privacy or ad-blocking plugin. There are also some Chromium-based browsers, like Brave, that are getting fairly popular. Many of these also promise enhanced privacy, and I saw reports of at least Brave blocking Matomo in some cases.

But even with this research, it's hard to put numbers on it. If I were asked to take a guess, I would say that between 5% and 50% of your ETH video portal users would block Matomo. That's quite the range, I know :P If I were pressed to guess one number... maybe 15%?

And again, some users (e.g. computer science students) are much more likely to have such a blocker installed. So all your computer science lectures might have significantly fewer views if using Matomo for that 😬


Regarding your suggestion: I don't mind if Tobira starts collecting very basic statistics. Then we will probably start doing that some time soon.

@oas777
Collaborator

oas777 commented Apr 8, 2024

Thanks, Lukas. So if my video was clicked 100 times, Matomo reports 5-50 clicks, right? What would Tobira report?

@LukasKalbertodt
Member Author

No, Matomo would report 50-95 clicks. My percentages talk about the probability of Matomo being blocked, i.e. unable to report anything. Tobira would report all 100.
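The arithmetic behind this answer, as a small Python sketch: if a fraction `block_rate` of viewers blocks Matomo, Matomo records `true_clicks * (1 - block_rate)`. The function names are made up for illustration, and the inverse only gives a rough estimate, because the block rate likely differs between audiences (e.g. computer science students, as noted above).

```python
def matomo_reported(true_clicks: float, block_rate: float) -> float:
    """Clicks Matomo would record if a fraction `block_rate` of viewers block it."""
    return true_clicks * (1 - block_rate)


def estimate_true_clicks(reported: float, block_rate: float) -> float:
    """Inverse: rough estimate of real clicks from Matomo's count (needs block_rate < 1)."""
    return reported / (1 - block_rate)


for rate in (0.05, 0.15, 0.50):
    print(f"{rate:.0%} blocked -> {matomo_reported(100, rate):.0f} of 100 clicks recorded")
```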

@oas777
Collaborator

oas777 commented Apr 9, 2024

Tobira would report all 100.

That's tempting. Anyway, let's wait for David and others to comment on my suggestions.

@dagraf
Collaborator

dagraf commented Apr 12, 2024

Here are my comments:

  • I agree with the 'keep it simple solution' sketched by Olaf here. At a later stage, and depending on more findings about how easy/feasible it is to get more data from Matomo to Opencast (see comment below), we could end up wanting Tobira to gather more information than just clicks.

Questions about our solutions:

  • Currently, we only provide "clicks" to users with the "Write" right on a series or video (see the screenshot of the Opencast "Statistics" tab below). As far as I know, it would be possible to provide more information coming from Matomo, but there seems to be an issue when Opencast sends a lot of requests to Matomo. @snoesberger, can you give more details about this issue?
  • How reliable is the Bern solution for feeding Matomo data to Opencast? There are some tricks to circumvent some ad blockers when working with Matomo (again, @snoesberger can give you the details). With these tricks in place, we compared the clicks counted by Matomo with our server logs in late 2024. See the screenshot below for what we decided to communicate to our users after analyzing the numbers. In short, and in English: "Experience has shown that deviations of up to 20% can occur. With low access figures, these deviations can also be higher."
[Screenshots (2024-04-12): the Opencast "Statistics" tab, and the notice communicated to users about possible deviations]

@oas777
Collaborator

oas777 commented Apr 16, 2024

Pending Sascha's comments, and in order to limit this discussion to "Statistics in Tobira", I would suggest

  • to implement the simple solution for the time being and
  • have Sascha and Oli align what's feasible in Opencast.

@snoesberger

Finally, here you have my comments about different topics in this conversation:

  • How big is the difference between Tobira and Matomo in gathering data?

As Lukas already mentioned, AdBlockers are a big problem for Matomo. But there are ways to avoid being blocked by them, see e.g. https://github.com/0x11DFE/Matomo-Anti-Adblock. With these settings I was able to bypass AdBlockers like uBlock or Ad Blocker Ultimate. At the moment, however, this only works for Paella 6; in Paella 7 there is no way to change the name of the Matomo JavaScript file that has to be loaded by the player (Paella GitHub issue).

It is difficult to get real numbers on how many views or clicks you are missing with Matomo. Most of the blocking happens on the client side, and from your (server) perspective you will never know when statistics were blocked. One way to get an idea is to compare the accesses to the video files in the server logs with the data in Matomo. Using our real, live data, I compared our Matomo unique visitors with our nginx access logs. The user IP and the user agent string are used to recognize a unique visitor in the nginx access log file. These are the results:

  • ~70% - 75% of the videos do have exactly the same number of unique visitors
  • for ~85% of the videos the difference of unique visitors between Matomo and server logs is less than +/- 20%
  • ~5% of the videos have more unique visitors in Matomo
  • ~25% of the videos have more unique visitors in the server logs

The AdBlocker bypass as described above was in place for this analysis.
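The comparison described above can be sketched roughly as follows. This is not the script used for the analysis; it assumes the log lines have already been parsed into `(video_id, ip, user_agent)` tuples, and it measures the ±20% tolerance relative to the server-log count, which is an assumption since the baseline isn't stated. Exact matches also count as within tolerance, since the ~85% figure appears to include the ~70-75% exact matches.

```python
from typing import Dict, Iterable, Set, Tuple


def unique_visitors(records: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Counts distinct (IP, user agent) pairs per video."""
    seen: Dict[str, Set[Tuple[str, str]]] = {}
    for video_id, ip, user_agent in records:
        seen.setdefault(video_id, set()).add((ip, user_agent))
    return {video_id: len(visitors) for video_id, visitors in seen.items()}


def compare(matomo: Dict[str, int], logs: Dict[str, int], tolerance: float = 0.20) -> Dict[str, int]:
    """Buckets videos by how Matomo's unique visitors compare to the server logs."""
    summary = {"equal": 0, "within_tolerance": 0, "more_in_matomo": 0, "more_in_logs": 0}
    for video_id in matomo.keys() | logs.keys():
        m, s = matomo.get(video_id, 0), logs.get(video_id, 0)
        if m == s:
            summary["equal"] += 1
        elif m > s:
            summary["more_in_matomo"] += 1
        else:
            summary["more_in_logs"] += 1
        # Relative difference measured against the server-log count (assumption).
        if m == s or (s > 0 and abs(m - s) / s < tolerance):
            summary["within_tolerance"] += 1
    return summary
```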

  • How reliable is the Bern solution to feed Matomo data to Opencast? Is there any data beyond "clicks" being used?

At the moment we only provide "clicks" in Opencast. To do this, we copy data from Matomo to the InfluxDB instance required by the Opencast statistics feature. The copy script uses the "segment" parameter of the Matomo API to get the hourly data. This can lead to problems if the API is called with the "segment" parameter several times in a row. To avoid these problems, we copy the data only once per hour.
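For readers unfamiliar with the Matomo Reporting API, a request like the one described might look roughly as follows. This is a hypothetical sketch, not the Bern copy script: it assumes per-hour rows come from `VisitTime.getVisitInformationPerServerTime` and that a single video can be selected with a `pageUrl` "contains" (`=@`) segment on its event ID; the host, site ID, and token are placeholders. The made-up `HourlyGuard` class mirrors the once-per-hour workaround.

```python
import time
from typing import Optional
from urllib.parse import urlencode


def matomo_report_url(base_url: str, site_id: int, date: str, event_id: str, token: str) -> str:
    """Builds a Reporting API request for one day's per-hour visits, segmented to one video."""
    params = {
        "module": "API",
        "method": "VisitTime.getVisitInformationPerServerTime",  # one row per hour
        "idSite": site_id,
        "period": "day",
        "date": date,                       # e.g. "2024-04-16"
        "segment": f"pageUrl=@{event_id}",  # assumption: the event ID appears in the tracked URL
        "format": "JSON",
        "token_auth": token,
    }
    return f"{base_url.rstrip('/')}/index.php?{urlencode(params)}"


class HourlyGuard:
    """Allows a job to run at most once per interval (default: one hour)."""

    def __init__(self, interval_seconds: float = 3600):
        self.interval = interval_seconds
        self.last_run: Optional[float] = None

    def try_run(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.last_run is not None and now - self.last_run < self.interval:
            return False  # too soon; skip this run to avoid hammering the API
        self.last_run = now
        return True
```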

  • Can you share statistics from Matomo with owners of a video / a series?

In the Matomo UI you can't restrict access to the statistics of just the videos or series a user owns. A user logged in to Matomo always has access to the statistics of all videos and series.

Conclusion

  • Matomo has the big advantage of having the video statistics from different sources, like LMSs, Opencast, Tobira, etc., in one place, but the data will never be 100% accurate.
  • Getting the data you want from the Matomo API can be difficult and excessive use of the Matomo API may result in incorrect responses (zero data where data should be available).
  • In our case most of the users are using ILIAS to access our Opencast videos. A solution directly implemented in Opencast or Tobira would not cover these accesses (or you would end up building your own "Matomo").
  • Providing access to statistics for owners of videos and series requires custom development, the Matomo UI can't be used for this.
  • For the time being, I would go for the simple solution described by Olaf
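Restricting statistics to owners, as described, means filtering on access rights outside of the Matomo UI. A minimal sketch, assuming hypothetical inputs: view counts per event ID, the set of roles with write access for each event (the "Write" right mentioned earlier), and the current user's roles. The role names are made up for illustration.

```python
from typing import Dict, Set


def visible_statistics(
    views: Dict[str, int],
    write_roles: Dict[str, Set[str]],
    user_roles: Set[str],
) -> Dict[str, int]:
    """Keeps only statistics for events on which the user holds a role with write access."""
    return {
        event_id: count
        for event_id, count in views.items()
        if write_roles.get(event_id, set()) & user_roles  # non-empty intersection
    }
```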

@oas777
Collaborator

oas777 commented Apr 17, 2024

@oliverkarlETH

@oas777
Collaborator

oas777 commented Apr 19, 2024

Thanks Sascha for your explanations. I think we have agreement on what we want in Tobira for the time being. Let's discuss everything else somewhere else.

@LukasKalbertodt
Member Author

Thanks @snoesberger for your data. I just saw your talk as well. Very very useful information! I am also happy to see that my estimates weren't that far off. What surprised me is that "The AdBlocker bypass as described above was in place for this analysis." So even with this workaround, Matomo missed around 25% of users. That's very interesting. So overall, I guess the number of users blocking Matomo is not insignificant and certainly not something one can easily ignore.

While I do think the "Tobira only counts views in Tobira itself" limitation is quite a major one, I agree with both of you that we should start with that. Even if we eventually move this into Opencast (to count all views), development and experimentation with this in Tobira is likely faster. Once we have something in Tobira that we are decently happy with, one can still try to move it into Opencast.

I'm very interested in tackling this, but as you know, some other things have a higher priority right now. We will see when I'll get to it.

Projects
Status: Todo
Development

No branches or pull requests

4 participants