Superset 3.0.0 is showing up "PrestoDBSQLValidator was unable to check your query." if #28326

Open
3 tasks done
rabindragogoi opened this issue May 3, 2024 · 10 comments
Labels
data:connect:presto Related to Presto

Comments

rabindragogoi commented May 3, 2024

Bug description

Superset: 3.0.0

While running a query in SQL Lab with more than 20k values in an IN clause, the query is not submitted and instead returns:

"superset.databases.commands.exceptions.ValidatorSQLError:
PrestoDBSQLValidator was unable to check your query."
Please recheck your query.
Exception: HTTPSConnectionPool(host='xxxx.uat.com', port=7778): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))"

The error is not on the Presto side: the query never gets submitted and fails within Superset itself.

Error Stack trace:

2024-05-03 02:33:54,745:ERROR:superset.databases.commands.validate_sql:HTTPSConnectionPool(host='XXXX.UAT.com', port=7778): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/superset/databases/commands/validate_sql.py", line 68, in run
errors = self._validator.validate(sql, schema, self._model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 174, in validate
annotation = cls.validate_statement(statement, database, cursor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 146, in validate_statement
raise ex
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 76, in validate_statement
db_engine_spec.execute(cursor, sql)
File "/opt/app-root/lib64/python3.11/site-packages/superset/db_engine_specs/base.py", line 1526, in execute
raise cls.get_dbapi_mapped_exception(ex) from ex
File "/opt/app-root/lib64/python3.11/site-packages/superset/db_engine_specs/base.py", line 1524, in execute
cursor.execute(query)
File "/opt/app-root/lib64/python3.11/site-packages/pyhive/presto.py", line 269, in execute
response = self._requests_session.post(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/adapters.py", line 517, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='XXXX.UAT.com', port=7778): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
SupersetErrorException
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/superset/databases/commands/validate_sql.py", line 68, in run
errors = self._validator.validate(sql, schema, self._model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 174, in validate
annotation = cls.validate_statement(statement, database, cursor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 146, in validate_statement
raise ex
File "/opt/app-root/lib64/python3.11/site-packages/superset/sql_validators/presto_db.py", line 76, in validate_statement
db_engine_spec.execute(cursor, sql)
File "/opt/app-root/lib64/python3.11/site-packages/superset/db_engine_specs/base.py", line 1526, in execute
raise cls.get_dbapi_mapped_exception(ex) from ex
File "/opt/app-root/lib64/python3.11/site-packages/superset/db_engine_specs/base.py", line 1524, in execute
cursor.execute(query)
File "/opt/app-root/lib64/python3.11/site-packages/pyhive/presto.py", line 269, in execute
response = self._requests_session.post(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/adapters.py", line 517, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='XXXX.UAT.com', port=7778): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/views/base_api.py", line 127, in wraps
raise ex
File "/opt/app-root/lib64/python3.11/site-packages/superset/views/base_api.py", line 121, in wraps
duration, response = time_function(f, self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/utils/core.py", line 1526, in time_function
response = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/utils/log.py", line 255, in wrapper
value = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/databases/api.py", line 1037, in validate_sql
validator_errors = ValidateSQLCommand(pk, sql_request).run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/superset/databases/commands/validate_sql.py", line 87, in run
raise ValidatorSQLError(superset_error) from ex
superset.databases.commands.exceptions.ValidatorSQLError: PrestoDBSQLValidator was unable to check your query.
Please recheck your query.
Exception: HTTPSConnectionPool(host='XXXX.UAT.com', port=7778): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))

======================

The error is raised from "/opt/app-root/lib64/python3.11/site-packages/superset/databases/commands/validate_sql.py".

NOTE:
The query is submitted to Presto and returns results when the IN clause contains 1k to 3k values.

How to reproduce the bug

  1. Go to SQL Lab.
  2. Create a query with more than 20K values in its IN clause.
  3. Run it: the query stays in a pending state for a long time and then errors out.
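To gauge how much text the reproduction actually sends, here is a minimal sketch that builds a comparable query. The table name, column name and 7-character values are hypothetical placeholders. It assumes, based on the pyhive/presto.py frame in the traceback, that the statement text travels as the body of the POST to /v1/statement, so its encoded length is only a rough proxy for the request size.

```python
def build_in_query(n_values: int, table: str = "example_table", column: str = "id") -> str:
    """Build a SELECT whose IN list holds n_values quoted string literals."""
    # Hypothetical values 'v000000', 'v000001', ...; real data will differ.
    literals = ", ".join(f"'v{i:06d}'" for i in range(n_values))
    return f"SELECT * FROM {table} WHERE {column} IN ({literals})"


for n in (2_000, 20_000):
    sql = build_in_query(n)
    # Byte length of the UTF-8 statement, a rough stand-in for the POST body size.
    print(f"{n:>6} values -> {len(sql.encode('utf-8')):>8} bytes")
```

With these placeholders the 20k-value query is roughly ten times larger than the 2k one (about 220 KB versus 22 KB), which brackets the sizes reported to work and to fail.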

Screenshots/recordings


Superset version

master / latest-dev

Python version

3.11

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
rusackas added the data:connect:presto (Related to Presto) label May 3, 2024

rabindragogoi (Author) commented May 3, 2024

Any update on this issue? @rusackas

rusackas (Member) commented May 3, 2024

Nope, but I'm highly doubtful that many people have 20K values in their "in" clause. Normally you would keep that many values in a table, and just do a JOIN or other (far more terse) query. Is that possible in your case? You might just be sending far too much text to the SQL parser or something like that, causing some bottleneck/timeout, which seems like a rare and avoidable edge case.
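Following the suggestion to keep the values in a table and JOIN, a minimal sketch might load the values into a lookup table in small batches (about 1,000 per INSERT, within the sizes reported to work) and then join against it. It assumes a writable catalog/schema in Presto; the table names, the `id VARCHAR` column and all helper names are hypothetical. Values are deduplicated first so the JOIN returns the same rows as the original IN filter.

```python
from typing import Iterator, List, Sequence


def chunked(values: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def lookup_table_statements(values: Sequence[str], lookup_table: str,
                            batch_size: int = 1000) -> List[str]:
    """CREATE a hypothetical lookup table, then INSERT the values in batches."""
    unique = list(dict.fromkeys(values))  # dedupe while keeping first-seen order
    statements = [f"CREATE TABLE {lookup_table} (id VARCHAR)"]
    for batch in chunked(unique, batch_size):
        rows = ", ".join(f"({quote_literal(v)})" for v in batch)
        statements.append(f"INSERT INTO {lookup_table} VALUES {rows}")
    return statements


def join_query(fact_table: str, key_column: str, lookup_table: str) -> str:
    """Replace `WHERE key IN (...)` with an inner join on the lookup table."""
    return (f"SELECT f.* FROM {fact_table} f "
            f"JOIN {lookup_table} k ON f.{key_column} = k.id")
```

Each statement from lookup_table_statements would be run once, after which join_query replaces the long IN list. If the key column is not VARCHAR, the lookup column type (or a CAST in the join) would need to match it.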

Here's what GPT has to say about your issue, in case you want to investigate some more:

The error you're encountering in Apache Superset seems to be originating from an SSL error during the HTTPS connection attempt to your PrestoDB server. The error message "EOF occurred in violation of protocol" typically indicates a lower-level issue with the SSL/TLS handshake between Superset and your PrestoDB server.

Here are several steps and checks you can perform to troubleshoot and potentially resolve this issue:

  1. Verify SSL Configuration: Check the SSL configuration on the PrestoDB server. Ensure that the SSL certificate is correctly installed, valid, and not expired. Verify that the server is configured to use TLS protocols supported by the client (Superset in this case).
  2. Check Network Issues: Since the error message includes "Max retries exceeded," it's also worth checking if there are any network connectivity issues between the Superset instance and the PrestoDB server. You can test connectivity using tools like ping or curl to ensure there are no intermittent network issues or firewalls blocking the connection.
  3. Update Dependencies: Ensure that the Python libraries related to networking and HTTP requests (requests, urllib3) are up to date. Sometimes, older versions of these libraries might have bugs or incompatibilities with certain SSL/TLS configurations.
  4. Adjust SSL/TLS Settings in Superset: If the SSL issue persists, you can try adjusting the SSL settings in your Superset configuration. For example, you might adjust the SSL verification settings temporarily to see if that allows the connection to succeed (not recommended for production environments due to security concerns).
  5. Increase Timeout and Retries: The error mentions that maximum retries have been exceeded. You can increase the timeout and retry settings in Superset's configuration to allow more time for the connection to be established, especially if the server is under heavy load or slow to respond.
  6. Review Superset Logs: Further investigate any additional log entries around the time of the error to see if there are more detailed messages that might indicate the root cause. This could provide more context on why the SSL/TLS handshake is failing.
  7. Simplify the Query: Since the error occurs when a large number of values are used in the IN clause of your SQL query, test with a smaller set of values to determine if the issue is related to the size of the request. If the smaller query succeeds, the issue might be related to how large queries are handled or transmitted over the network.
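Step 7 can be made systematic. A minimal sketch, assuming failures are monotonic in the number of values (if n values work, fewer also work): binary-search between a size known to work (about 1k in this issue) and one known to fail (20k). `passes` is a hypothetical callback you would implement to run the query with n values and report whether it succeeded.

```python
from typing import Callable


def largest_passing_size(passes: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest n in [low, high] for which passes(n) is True.

    Assumes monotonic outcomes (if n values succeed, any smaller n succeeds)
    and that `low` is a size already known to work.
    """
    if not passes(low):
        raise ValueError("`low` must be a size that is known to pass")
    while low < high:
        mid = (low + high + 1) // 2  # round up so `low = mid` always makes progress
        if passes(mid):
            low = mid  # mid works: the threshold is mid or larger
        else:
            high = mid - 1  # mid fails: the threshold is below mid
    return low
```

Each call to `passes` runs a real query, so about 15 runs cover the 1k to 20k range. A sharp, repeatable threshold would support the size-related hypothesis in step 7.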

To proceed, you might start with the simpler checks (like verifying network connectivity and SSL certificates) and then move towards more complex configurations and adjustments as needed.
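For the connectivity and certificate checks, a stdlib-only sketch that opens a TCP connection and completes a TLS handshake without sending any query. `tls_probe` and its parameters are hypothetical names; `cafile` would point at the CA bundle that issued the Presto server's certificate. Because smaller queries reportedly succeed, a successful handshake here would suggest the certificate setup is fine and that the failure depends on the request itself.

```python
import socket
import ssl
from typing import Optional, Tuple


def tls_probe(host: str, port: int, cafile: Optional[str] = None,
              timeout: float = 10.0) -> Tuple[bool, str]:
    """Connect, perform a TLS handshake, and report (succeeded, detail)."""
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately on a connected socket.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"{tls.version()} {tls.cipher()[0]}"
    except OSError as exc:
        # ssl.SSLError (including SSLEOFError), resets and timeouts are all OSErrors.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, tls_probe("presto.example.com", 7778, cafile="/path/to/ca.pem") with your own (hypothetical here) host and CA path returns the negotiated protocol and cipher on success, or the exception type and message on failure.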

rabindragogoi (Author) commented

Thanks @rusackas: I am trying to upgrade to superset-3.1.2 in the hope of overcoming both the Trino issue (#26768) and this SSLError issue.

Could you please provide the requirements.txt file for superset-3.1.2? That would be a great help.

Thanks in advance

rabindragogoi (Author) commented May 7, 2024

@rusackas: I tried superset-3.1.2, but I am still getting "SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)'))".

One more point I would like to mention:
  • The same query with 20k values in the IN clause works perfectly in superset-1.5.2.
  • In superset-3.1.2 the query works with a limited number of values (say 2k) in the IN clause, but fails with 20k values.

I am using the requirements.txt file below.

alembic==1.12.0
amqp==5.1.1
apache-superset==3.1.2
apispec==6.3.0
async-timeout==4.0.3
attrs==23.1.0
Babel==2.13.0
backoff==2.2.1
bcrypt==4.0.1
billiard==4.1.0
blinker==1.6.3
Brotli==1.1.0
#cachelib==0.8.0
cachelib==0.9.0
celery==5.3.4
certifi==2023.7.22
cffi==1.16.0
click==8.1.7
click-didyoumean==0.3.0
click-option-group==0.5.6
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.6
convertdate==2.4.0
cron-descriptor==1.4.0
croniter==1.4.1
#cryptography==39.0.2
cryptography==42.0.4
Deprecated==1.2.14
deprecation==2.1.0
dnspython==2.4.2
email-validator==1.3.1
exceptiongroup==1.1.3
Flask==2.3.3
#Flask-AppBuilder==4.3.7
Flask-AppBuilder==4.4.1
Flask-Babel==2.0.0
#Flask-Caching==1.11.1
Flask-Caching==2.1.0
Flask-Compress==1.14
Flask-JWT-Extended==4.5.3
Flask-Login==0.6.2
Flask-LDAP==0.1.5
Flask-Migrate==3.1.0
Flask-SQLAlchemy==2.5.1
flask-talisman==1.1.0
Flask-WTF==1.2.1
Flask-SimpleLDAP==1.3.3
func-timeout==4.3.5
geographiclib==2.0
python-ldap==3.3.1
geopy==2.4.0
gevent==23.9.1
greenlet==3.0.0
#gunicorn==20.1.0
gunicorn==21.2.0
h11==0.14.0
hashids==1.3.1
hijri-converter==2.3.1
#holidays==0.23
holidays==0.25
humanize==4.8.0
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.0
isodate==0.6.1
itsdangerous==2.1.2
Jinja2==3.1.2
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
kombu==5.3.2
korean-lunar-calendar==0.3.1
limits==3.6.0
Mako==1.2.4
Markdown==3.5
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-sqlalchemy==0.26.1
mdurl==0.1.2
msgpack==1.0.7
mysqlclient==2.1.0
nh3==0.2.14
numpy==1.23.5
ordered-set==4.1.0
outcome==1.2.0
packaging==21.3
#pandas==1.5.3
pandas==2.0.3
#paramiko==3.3.1
paramiko==3.4.0
parsedatetime==2.6
pgsanity==0.2.9
Pillow==10.0.1
polyline==2.0.1
prison==0.2.1
prompt-toolkit==3.0.39
#pyarrow==12.0.1
pyarrow==14.0.1
pycparser==2.21
Pygments==2.16.1
PyJWT==2.8.0
PyMeeus==0.5.12
PyNaCl==1.5.0
pyparsing==3.1.1
PySocks==1.7.1
python-dateutil==2.8.2
python-dotenv==1.0.0
python-geohash==0.8.5
pytz==2023.3.post1
PyYAML==6.0.1
redis==4.6.0
referencing==0.30.2
rich==13.6.0
rpds-py==0.10.4
selenium==4.9.1
shortid==0.1.2
simplejson==3.19.2
six==1.16.0
slack-sdk==3.23.0
sniffio==1.3.0
sortedcontainers==2.4.0
SQLAlchemy==1.4.49
SQLAlchemy-Utils==0.38.3
sqlparse==0.4.4
sshtunnel==0.4.0
tabulate==0.8.10
trio==0.22.2
trio-websocket==0.11.1
typing_extensions==4.8.0
tzdata==2023.3
#urllib3==2.0.6
urllib3==2.2.1
vine==5.0.0
wcwidth==0.2.8
Werkzeug==2.3.7
wrapt==1.15.0
wsproto==1.2.0
WTForms==3.0.1
WTForms-JSON==0.3.5
XlsxWriter==3.0.9
zipp==3.17.0
zope.event==5.0
zope.interface==6.1
environs==9.3.3
PyHive[presto]==0.6.4
pydruid==0.6.2
elasticsearch-dbapi==0.2.10
sqlalchemy-trino==0.5.0
thrift==0.16.0
trino==0.326.0
requests==2.31.0
requests-kerberos==0.14.0
pykerberos==1.2.4
pipdeptree==2.10.0
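GPT's step 3 and the version questions in this thread both come down to knowing what is actually installed. A stdlib sketch (Python 3.8+ for importlib.metadata) that reads the active name==version pins from a file like the one above and reports packages whose installed version differs or that are missing. It only handles plain == pins, with optional extras such as PyHive[presto]; the function names are hypothetical.

```python
import re
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# name, optional [extras], "==", version; stops at whitespace, '#' or ';'.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s#;]+)")


def parse_pins(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for each active `name==version` line."""
    pins = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out pins such as "#urllib3==2.0.6"
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare(pins: Dict[str, str],
            lookup: Callable[[str], Optional[str]] = installed_version
            ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Run inside the Superset virtualenv, compare(parse_pins(open("requirements.txt").read())) lists every pin that does not match the environment, including the requests, urllib3 and PyHive versions that sit under the failing call in the traceback.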

It would be great if you could provide the requirements.txt file for superset-3.1.2.

Thanks in advance.

rabindragogoi (Author) commented

@rusackas: Could you please tell me which Trino version is compatible with superset-3.1.2?

ChatGPT suggested that this is a compatibility issue.


rusackas (Member) commented May 7, 2024

@dosu-bot

dosubot (bot) commented May 7, 2024

Hello @rusackas,

Nice to meet you! I'm Dosu, a bot here to assist you with bugs, answer questions, and help you get more involved with Apache Superset while we wait for a human maintainer. I'm diving into your issue now and will get back to you shortly with some insights.

rusackas (Member) commented May 7, 2024

I'm not sure which version currently works. I haven't used Trino/Presto since Nov/Dec, and I was just using whatever was built into Preset at the time (it was for a demo). Maybe @nytai knows a compatible version?

Triggering our new friend Dosu to see if it provides something helpful here ;)

dosubot (bot) commented May 7, 2024

To address the SSL error with your query in Apache Superset 3.1.2, ensure your database connection configuration includes the necessary SSL parameters. Specifically, if you're encountering SSL errors, verify that the server_cert is correctly configured in your Superset database connection settings. This involves ensuring the certificate is valid and correctly references the CA that issued the server's certificate. If the certificate is self-signed or not trusted by the system running Superset, you'll need to add the certificate to the trusted store or provide it directly in the connection configuration.

For a practical example, you might use the following format in your connection string to include the certificate:

trino://{UserName}:{Password}@{trino url without https}:443/{catalog name}?verify={ca certificate path on superset host}

This approach has helped others resolve similar issues, as discussed in Configuring Secured Trino with Superset.
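The connection string above packs credentials and a file path into a URI, so characters such as @, : or / in a password need percent-encoding. A minimal stdlib sketch that assembles a string in that format; the function name and sample values are hypothetical, and it assumes, as the reply states, that the Trino SQLAlchemy dialect reads `verify` from the query string.

```python
from urllib.parse import quote, urlencode


def trino_uri(user: str, password: str, host: str, catalog: str,
              ca_path: str, port: int = 443) -> str:
    """Build trino://user:password@host:port/catalog?verify=<ca_path>."""
    # Percent-encode credentials so '@', ':' or '/' cannot break the URI.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # Keep '/' readable in the CA path; other special characters are encoded.
    query = urlencode({"verify": ca_path}, safe="/")
    return f"trino://{credentials}@{host}:{port}/{quote(catalog, safe='')}?{query}"
```

For instance, a password of p@ss:word becomes p%40ss%3Aword in the URI, while the CA path stays readable after verify=.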

Regarding the compatible Trino version for Superset 3.1.2, while there's no explicit mention of compatible Trino versions in the documentation you've provided, generally, Trino versions 350 and newer should work well with Superset 3.1.2. Ensure your sqlalchemy-trino library is up to date, as it facilitates the interaction between Superset and Trino. The version included in your requirements.txt, 0.5.0, should be compatible, but always check for newer versions that might include fixes or improvements.
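The traceback, however, shows the failing request going through PyHive's Presto driver (pyhive/presto.py), not the trino driver, so a CA bundle would be supplied through that driver's connect arguments rather than a trino:// URI. A minimal sketch of the database's extra JSON, assuming Superset forwards engine_params.connect_args to SQLAlchemy and on to pyhive.presto.connect(), and that PyHive's `protocol` and `requests_kwargs` parameters behave as described in the comments; the CA bundle path is hypothetical, and where this goes in the database form varies by Superset version.

```python
import json

# Hypothetical path to the CA bundle that signed the Presto server certificate.
CA_BUNDLE = "/etc/ssl/certs/presto-ca.pem"

extra = {
    "engine_params": {
        # Forwarded to SQLAlchemy's create_engine(connect_args=...),
        # which passes them to the DB-API connect() of the presto:// dialect.
        "connect_args": {
            "protocol": "https",
            # PyHive hands these keyword arguments to the HTTP requests it makes,
            # so "verify" selects the CA bundle used to check the server cert.
            "requests_kwargs": {"verify": CA_BUNDLE},
        }
    }
}

print(json.dumps(extra, indent=2))
```

Kerberos/keytab settings for the same driver would also live under connect_args; they are omitted here rather than guessed.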

If you face any specific issues, examining the Superset and Trino server logs can offer insights into potential compatibility problems. Adjusting log levels for more detailed information might also aid in diagnosing and resolving issues.
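For the suggestion to raise log levels, a minimal sketch using the standard logging module. The logger names are assumptions based on the libraries in the traceback and requirements (urllib3 underneath requests, pyhive) plus requests_kerberos for keytab connections; calling it from superset_config.py is likewise an assumption about the deployment. It only lowers logger levels, so records still need a handler (in a standalone script, call logging.basicConfig() first).

```python
import logging
from typing import Iterable, List

# Hypothetical selection: urllib3 (used by requests), pyhive (the Presto
# driver in the traceback) and requests_kerberos (keytab authentication).
DEBUG_LOGGERS = ("urllib3", "pyhive", "requests_kerberos")


def enable_debug_logging(names: Iterable[str] = DEBUG_LOGGERS) -> List[str]:
    """Set the given library loggers to DEBUG and return their names.

    Child loggers such as "urllib3.connectionpool" inherit the level
    as long as their own level is left unset.
    """
    enabled = []
    for name in names:
        logging.getLogger(name).setLevel(logging.DEBUG)
        enabled.append(name)
    return enabled
```

Comparing the DEBUG output of a username/password connection with a keytab connection for the same query may show where the two paths diverge.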


About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


rabindragogoi (Author) commented

Hi @rusackas: One thing I noticed is that the query with 20k values runs fine when I connect to Presto with a username and password, but fails with this error when I connect with a keytab. With a keytab I can only run queries with 1k to 2k values.
