
#1011: Running sql using sparkconnect should not print full stack trace (PR #1012)

Open

b1ackout wants to merge 9 commits into master
Conversation

@b1ackout b1ackout commented May 10, 2024

Describe your changes

  1. Updated is_non_sqlalchemy_error to handle pyspark.sql.utils.AnalysisException, so that when short_errors is enabled only the Spark SQL error is shown instead of the full stack trace

Issue number

Closes #1011

Checklist before requesting a review


📚 Documentation preview 📚: https://jupysql--1012.org.readthedocs.build/en/1012/

@b1ackout b1ackout requested a review from edublancas as a code owner May 10, 2024 10:16
Two resolved review threads on src/sql/run/sparkdataframe.py (outdated)
@edublancas edublancas added the feature Adds a new feature label May 21, 2024
@b1ackout b1ackout marked this pull request as draft May 23, 2024 07:34
src/sql/util.py Outdated
@@ -559,6 +559,7 @@ def is_non_sqlalchemy_error(error):
# Pyspark
"UNRESOLVED_ROUTINE",
"PARSE_SYNTAX_ERROR",
"AnalysisException",
b1ackout (Author) commented:

After looking through the code, I think adding AnalysisException here will solve the issue, since PARSE_SYNTAX_ERROR already works as expected. AnalysisException covers all of these error conditions.
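
For context, a minimal sketch of how the existing substring check works (simplified from src/sql/util.py; the real list also contains entries for other backends):

```python
# Simplified sketch: errors are matched by substring against known markers,
# so adding "AnalysisException" only helps if that text appears in str(error).
def is_non_sqlalchemy_error(error):
    specific_db_errors = [
        # ...entries for other backends elided...
        "UNRESOLVED_ROUTINE",
        "PARSE_SYNTAX_ERROR",
        "AnalysisException",  # proposed addition
    ]
    return any(msg in str(error) for msg in specific_db_errors)
```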

b1ackout (Author) commented:


I just need to test it somehow. I'll try to package jupysql and install it in a Spark environment.

A reviewer replied:

you can install like this:

pip install git+https://github.com/b1ackout/jupysql@running-sql-using-sparkconnect-should-not-print-full-stack-trace

b1ackout (Author) commented May 27, 2024:

I tested it, but no luck: the error message doesn't contain AnalysisException, only the SQL error conditions listed here, which are all raised as AnalysisException. So instead of including all of these in the list (which would also need to be updated regularly), I think checking whether the error is an instance of AnalysisException would be a lot cleaner.

Tested it and it works.
[screenshot: shortened error output]
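
To make the failure mode above concrete, here is a hypothetical reproduction (it assumes an active SparkSession named `spark` and a table name that does not exist; the commented output is illustrative):

```python
from pyspark.sql.utils import AnalysisException

try:
    spark.sql("SELECT * FROM table_that_does_not_exist")
except AnalysisException as e:
    # str() yields only the error message, not the exception class name,
    # so substring-matching on "AnalysisException" never fires.
    print("AnalysisException" in str(e))     # False
    print(isinstance(e, AnalysisException))  # True -- the cleaner check
```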

]
return any(msg in str(error) for msg in specific_db_errors)
is_pyspark_analysis_exception = (
b1ackout (Author) commented:

If AnalysisException can be imported, this checks whether the error is an instance of pyspark's AnalysisException and handles it accordingly.
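
A minimal sketch of the approach described above (not the exact code in this PR; the import guard and the elided list entries are assumptions for illustration):

```python
# Import AnalysisException only if pyspark is installed; otherwise skip the check.
try:
    from pyspark.sql.utils import AnalysisException
except ModuleNotFoundError:
    AnalysisException = None


def is_non_sqlalchemy_error(error):
    specific_db_errors = [
        # ...entries for other backends elided...
        "pyodbc.ProgrammingError",
        # Clickhouse errors
        "DB::Exception:",
    ]
    is_pyspark_analysis_exception = (
        AnalysisException is not None and isinstance(error, AnalysisException)
    )
    return (
        any(msg in str(error) for msg in specific_db_errors)
        or is_pyspark_analysis_exception
    )
```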

@@ -556,11 +562,14 @@ def is_non_sqlalchemy_error(error):
"pyodbc.ProgrammingError",
# Clickhouse errors
"DB::Exception:",
# Pyspark
b1ackout (Author) commented:

Removed these, as they are already covered by AnalysisException.

@b1ackout b1ackout requested a review from edublancas May 27, 2024 09:25
@b1ackout b1ackout marked this pull request as ready for review May 27, 2024 09:26
Labels: feature (Adds a new feature)
Projects: None yet
Development: no linked issues
2 participants