[SIP-129] POC - Real-time Dashboards powered by data streams #28272

Open
surapuramakhil opened this issue Apr 30, 2024 · 2 comments
Labels
sip Superset Improvement Proposal

Comments

@surapuramakhil

surapuramakhil commented Apr 30, 2024

Please make sure you are familiar with the SIP process documented
here. The SIP will be numbered by a committer upon acceptance.

[SIP] Proposal for ...<title>

Motivation

Today, real-time dashboards rely on repeated polling at a 10-second interval. On every poll, SQL queries are executed, and typically the entire dataset needed for the dashboard is fetched again.

This puts a heavy load on the warehouse DB, especially when many users are active on Superset and many real-time dashboards are open at the same time. This load can be skipped entirely if your dashboards have low retention periods.

Proposed Change

Just as SQL Lab runs SQL queries and generates the datasets that power dashboards, an alternate pipeline powered by streams would generate and update those datasets. Like SQL Lab, there would be another module where users can specify how a stream is consumed (consumer functions) and how the datasets are updated.

  1. Stream as a data source
  2. Stream consumer functions for dataset population; everything else remains the same (restricting scope)
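
As a rough illustration of item 2, a consumer function could be a plain callable that receives the current dataset rows and one stream event. This is a minimal sketch under assumptions: `StreamDataset`, `ConsumerFn`, `append_valid_orders`, and the `max_rows` retention limit are hypothetical names invented here, not existing Superset APIs.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable

Event = Dict[str, object]
# Hypothetical contract: a consumer function receives the dataset rows and
# one stream event, and mutates the rows in place.
ConsumerFn = Callable[[Deque[Event], Event], None]


class StreamDataset:
    """Hypothetical dataset kept up to date by a user-supplied consumer function."""

    def __init__(self, consumer: ConsumerFn, max_rows: int = 1000) -> None:
        # A bounded deque models a low-retention dataset: old rows fall off.
        self.rows: Deque[Event] = deque(maxlen=max_rows)
        self._consumer = consumer

    def consume(self, events: Iterable[Event]) -> None:
        for event in events:
            self._consumer(self.rows, event)


def append_valid_orders(rows: Deque[Event], event: Event) -> None:
    """Example consumer: keep only events that carry a positive amount."""
    amount = event.get("amount")
    if isinstance(amount, (int, float)) and amount > 0:
        rows.append(event)


dataset = StreamDataset(append_valid_orders, max_rows=3)
dataset.consume([
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -5.0},  # dropped by the consumer
    {"order_id": 3, "amount": 7.5},
    {"order_id": 4, "amount": 2.0},
    {"order_id": 5, "amount": 1.0},   # evicts order 1 because max_rows=3
])
```

In a real deployment the list of events would come from a stream client (Kafka, Kinesis, or similar), which this sketch deliberately leaves out.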

New or Changed Public Interfaces

Describe any new additions to the model, views or REST endpoints. Describe any changes to existing visualizations, dashboards and React components. Describe changes that affect the Superset CLI and how the Superset is deployed.

New dependencies

Describe any npm/PyPI packages that are required. Are they actively maintained? What are their licenses?

Migration Plan and Compatibility

Describe any database migrations that are necessary, or updates to stored URLs.

Rejected Alternatives

Describe alternative approaches that were considered and rejected.

@surapuramakhil surapuramakhil added the sip Superset Improvement Proposal label Apr 30, 2024
@rusackas
Member

I've always conjectured that this would be tied into the Global Async Queries feature, for two main reasons:

  1. It has a Redis caching layer, so that might lighten the load on your DB in general.
  2. If we want to do real realtime analytics, the dashboard would need a major overhaul so that (a) charts all subscribe to a websocket for updates, which would be published on query completion OR on a push of subscribed streamed data, and (b) all charts are updated to support transitioning/transposing on data updates rather than just blinking a hard refresh.
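
The subscription model in point 2 might look roughly like the following. It is only a sketch: `asyncio.Queue` stands in for a websocket connection, and `ChartChannel`, the `snapshot`/`delta` message kinds, and the `chart` coroutine are hypothetical names, not part of Global Async Queries.

```python
import asyncio
from typing import Dict, List


class ChartChannel:
    """Hypothetical in-process stand-in for a websocket channel."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    async def publish(self, message: Dict[str, object]) -> None:
        # Fan the message out to every subscribed chart.
        for queue in self._subscribers:
            await queue.put(message)


async def chart(queue: asyncio.Queue, expected: int) -> List[int]:
    """Apply incoming messages to local state instead of re-fetching everything."""
    points: List[int] = []
    for _ in range(expected):
        message = await queue.get()
        if message["kind"] == "snapshot":   # full result of a completed query
            points = list(message["rows"])
        elif message["kind"] == "delta":    # pushed stream data, applied incrementally
            points.extend(message["rows"])
    return points


async def main() -> List[int]:
    channel = ChartChannel()
    queue = channel.subscribe()
    task = asyncio.create_task(chart(queue, expected=3))
    await channel.publish({"kind": "snapshot", "rows": [1, 2, 3]})
    await channel.publish({"kind": "delta", "rows": [4]})
    await channel.publish({"kind": "delta", "rows": [5, 6]})
    return await task


final_points = asyncio.run(main())
```

Because deltas extend the existing state, a chart could animate the transition between the old and new data rather than redrawing from scratch.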

@rusackas rusackas changed the title [SIP] POC - Real-time Dashboards powered by data streams [SIP-129] POC - Real-time Dashboards powered by data streams May 1, 2024
@surapuramakhil
Author

Restricting the scope of this SIP to the following, which are necessary and sufficient for this to work:

  1. Adding streams as an alternate data source
  2. Stream processors - "consumer functions" for dataset population

Once the dataset is populated, the rest of the flow would remain the same. Chart queries will hit the in-memory dataset where the underlying data lies.
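
To make the last point concrete, here is a small sketch in which an in-memory SQLite database plays the role of the in-memory dataset. SQLite is an assumption chosen only because it ships with Python; the proposal does not name a storage engine, and the `page_views` table and `consume` function are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the in-memory dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")


def consume(event: dict) -> None:
    """Hypothetical consumer function: insert one stream event as a row."""
    conn.execute(
        "INSERT INTO page_views (page, views) VALUES (?, ?)",
        (event["page"], event["views"]),
    )


# Simulated stream events; a real consumer would read these from a stream.
for event in [
    {"page": "/home", "views": 3},
    {"page": "/docs", "views": 5},
    {"page": "/home", "views": 2},
]:
    consume(event)

# A chart query is ordinary SQL against the in-memory table,
# so the warehouse DB is not touched on refresh.
chart_rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
```

The chart layer keeps issuing SQL as it does today; only the target of the query changes.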


2 participants