Is your feature request related to a problem? Please describe.
Currently, the decision to run a batch query in distributed mode or locally is based solely on the query structure, without considering data size. This is a problem when the table being scanned is small: distributed mode can incur excessive overhead, because scheduling costs may exceed execution costs. I propose using table statistics (e.g., row count, table size) to estimate an upper bound on the I/O of a batch query, and executing the query in distributed mode only if the expected I/O is sufficiently high.
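A rough sketch of the heuristic, in Python for illustration only. Every name here (`TableStats`, `estimate_io_upper_bound`, `should_run_distributed`) and the 64 MiB threshold are hypothetical and not taken from the codebase. It assumes each scanned table exposes a row count and an average row size, and it falls back to the existing structure-based decision when statistics are missing.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical threshold: below this estimated I/O, scheduling overhead is
# assumed to outweigh the benefit of distributed execution.
DISTRIBUTED_IO_THRESHOLD_BYTES = 64 * 1024 * 1024


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics; row_count is None when unknown."""
    row_count: Optional[int]
    avg_row_bytes: int = 0


def estimate_io_upper_bound(scanned: Iterable[TableStats]) -> Optional[int]:
    """Sum full-scan sizes of all scanned tables as an I/O upper bound.

    Returns None if any table lacks statistics, since no bound can be given.
    """
    total = 0
    for stats in scanned:
        if stats.row_count is None:
            return None
        total += stats.row_count * stats.avg_row_bytes
    return total


def should_run_distributed(
    scanned: Iterable[TableStats],
    structure_prefers_distributed: bool,
    threshold: int = DISTRIBUTED_IO_THRESHOLD_BYTES,
) -> bool:
    """Combine the existing structure-based choice with the I/O estimate."""
    if not structure_prefers_distributed:
        # The query shape already selects local mode; nothing changes.
        return False
    bound = estimate_io_upper_bound(scanned)
    if bound is None:
        # Unknown statistics: keep the current structure-based behaviour.
        return True
    return bound >= threshold
```

With these assumptions, a query scanning a table of 1,000 rows at about 100 bytes each would stay local, while one scanning 10,000,000 such rows would still be scheduled in distributed mode.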
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response