PyHive, Presto connector returning wrong resultset #460
Comments
Which versions of PyHive and SQLAlchemy are you using?
Sorry for my late reply, I was on vacation.
Can you try the latest version of PyHive, i.e. 0.7.0, to check whether the issue still exists?
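To answer the version question, the installed versions can be read from the same Python environment that Superset runs in. This is a minimal sketch. It assumes Python 3.8+ (for `importlib.metadata`) and that the distribution names are `PyHive` and `SQLAlchemy`, as published on PyPI. The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("PyHive", "SQLAlchemy")):
    """Return {distribution name: version string, or None if not installed}.

    Hypothetical helper: it only reads installed package metadata and
    does not import PyHive or SQLAlchemy themselves.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the interpreter Superset uses, because a different virtualenv can report different versions.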
I'm using a Presto cluster to process large amounts of data.
To visualize the data I use the connector recommended by the official Superset documentation, PyHive (through its SQLAlchemy dialect), with the default connection settings.
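With default settings, the Superset connection to Presto is a PyHive-style SQLAlchemy URI of the form `presto://[user@]host:port/catalog/schema`. The sketch below only builds that string. The helper name, the port 8080, and the `hive`/`default` catalog and schema defaults are assumptions for illustration, not values taken from this report.

```python
from urllib.parse import quote


def presto_sqlalchemy_uri(host, port=8080, catalog="hive", schema="default", username=None):
    """Build a PyHive-style Presto SQLAlchemy URI (hypothetical helper).

    The username, if given, is percent-encoded so characters such as
    spaces or '@' cannot break the URI.
    """
    auth = f"{quote(username, safe='')}@" if username else ""
    return f"presto://{auth}{host}:{port}/{catalog}/{schema}"
```

For example, `presto_sqlalchemy_uri("localhost")` gives `presto://localhost:8080/hive/default`. That string can be pasted into Superset's database form, or passed to `sqlalchemy.create_engine` when PyHive is installed.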
When I run a very simple query, "SELECT * FROM test_table", through the PyHive Presto connector, the result set contains the wrong number of rows compared with the same query run in presto-cli, the official client provided with the Presto documentation.
I wrote two simple Python scripts to test the Presto connection: one using PyHive and one using the official Presto JDBC driver (jdbc.jar).
The PyHive script returned the wrong number of rows, about 817,000. This is exactly the number of rows returned by the Superset chart. The script using the official JDBC driver returned the correct number of rows, 875,000.
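The attached scripts are not reproduced here. As a hedged sketch of the PyHive side of such a comparison (not the author's script), the rows can be counted through any DB-API 2.0 cursor, which PyHive's Presto cursor provides. Comparing that client-side count with a server-side `count(*)` on the same connection helps show whether rows are lost while fetching. The function names, host, catalog, and schema below are hypothetical.

```python
def count_rows(cursor, query, batch_size=10_000):
    """Execute query on a DB-API 2.0 cursor and count every row delivered.

    Rows are drained with fetchmany() until an empty batch comes back, so
    the total is what the driver actually handed to the client.
    """
    cursor.execute(query)
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)
    return total


def server_side_count(cursor, table):
    """Ask the engine itself how many rows the table has.

    The table name is interpolated directly, so only pass trusted names.
    """
    cursor.execute(f"SELECT count(*) FROM {table}")
    return cursor.fetchone()[0]


# Hypothetical usage against Presto (requires PyHive and a reachable cluster):
#   from pyhive import presto
#   conn = presto.connect(host="presto-coordinator", port=8080,
#                         catalog="hive", schema="default")
#   print(count_rows(conn.cursor(), "SELECT * FROM test_table"))
#   print(server_side_count(conn.cursor(), "test_table"))
```

If the two numbers differ on the PyHive connection but match on JDBC, that points at result fetching in the client rather than at the query itself.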
It looks like the issue is caused by the PyHive connector. Is it possible to switch the connection method from PyHive to the official JDBC driver?
I'm attaching the two Python scripts I used to reproduce the issue.