Failure on creation of a new Presto dataset due to failure to load columns #25962

Closed
eyalsh99 opened this issue Nov 12, 2023 · 13 comments · Fixed by #26782 · May be fixed by #28305
Labels
data:connect:athena Related to Athena data:connect:presto Related to Presto

Comments

@eyalsh99

We have a Presto database connection that works fine with virtual datasets (added through SQL Lab).
When we try to add a new dataset from a physical table, we get an error.

Reproducing the bug:

  1. Go to Datasets
  2. Click on '+ Dataset'
  3. Select a Presto DB
  4. Select the Schema
  5. Select the Table
  6. See error

Expected results

See the table columns on the right pane

Actual results

An error message appears on the right pane:
"An Error Occurred
Unable to load columns for the selected table. Please select a different table."

Couldn't find any errors in the Superset app container log or any other container.

Screenshots

image

Environment

  • Google Chrome, Version 119.0.6045.123
  • Superset version: 3.0.1
  • python version: 3.11.4
  • node.js version: 20.9.0
  • any feature flags active: none

Checklist

  • [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

@sfirke
Contributor

sfirke commented Nov 14, 2023

Possibly a duplicate of #23982. Do you have that sort of datetime partition, or can you try a table without one to isolate the issue? Specifically, the comment #23982 (comment) has a fix you could try.

@sfirke sfirke added the data:connect:presto Related to Presto label Nov 14, 2023
@sfirke
Contributor

sfirke commented Nov 14, 2023

@bkyryliuk you are listed as a Presto user in the database rolodex - do you have thoughts on this or that issue I linked?

@eyalsh99
Author

> Possibly a duplicate of #23982. Do you have that sort of datetime partition, or can you try a table without one to isolate the issue? Specifically, the comment #23982 (comment) has a fix you could try.

@sfirke Thanks for your comment, but I don't think it's the same problem. It happens for all tables. The attached table has only "string" columns.

@eyalsh99
Author

eyalsh99 commented Nov 19, 2023

I was able to debug the problem and found that it is caused by the stringification of the data type in the _create_column_info function. The calling code expects a column type object, not a string representation of it.
Once I changed that, I was able to load the table columns and create the dataset.

@sfirke
Contributor

sfirke commented Nov 19, 2023

Nice problem solving! Thanks for the update. Are you able to propose a fix to the codebase and send a pull request?

@eyalsh99
Author

> Nice problem solving! Thanks for the update. Are you able to propose a fix to the codebase and send a pull request?

Thanks. Sure, I'll send the PR.

@ameedbakri

@eyalsh99
What did you change to fix the issue?

@eyalsh99
Author

eyalsh99 commented Feb 7, 2024

> @eyalsh99 what did you change to fix the issue?

In superset/db_engine_specs/presto.py, make the following change in the `_create_column_info` function:
`"type": f"{data_type}",` ==> `"type": data_type,`

The function is expected to return the native column type, not a stringified version of it.
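
To illustrate why the stringified value matters, here is a minimal, self-contained sketch. It uses neither Superset nor SQLAlchemy: `Varchar`, `column_info_stringified`, and `column_info_native` are hypothetical stand-ins, and the dictionary shape is an assumption rather than the actual return value of `_create_column_info`. It only shows that the f-string discards the type object, while passing `data_type` through preserves it.

```python
from typing import Any, Dict, Optional


class Varchar:
    """Hypothetical stand-in for a SQLAlchemy type object such as VARCHAR."""

    def __init__(self, length: Optional[int] = None) -> None:
        self.length = length

    def __str__(self) -> str:
        return "VARCHAR" if self.length is None else f"VARCHAR({self.length})"


def column_info_stringified(name: str, data_type: Any) -> Dict[str, Any]:
    # Before the change: the f-string turns the type object into a plain str.
    return {"column_name": name, "type": f"{data_type}"}


def column_info_native(name: str, data_type: Any) -> Dict[str, Any]:
    # After the change: the type object is passed through untouched.
    return {"column_name": name, "type": data_type}


before = column_info_stringified("city", Varchar(255))
after = column_info_native("city", Varchar(255))

print(type(before["type"]).__name__)  # str
print(type(after["type"]).__name__)   # Varchar
```

Any caller that needs attributes of the type object (here, `length`) can only work with the second form, since the first one carries nothing but the text `VARCHAR(255)`.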

brouberol added a commit to brouberol/superset that referenced this issue Feb 7, 2024
…able/schemas

This patch fixes failures occurring when performing a schema preview of a
Presto table.

The `PrestoBaseEngineSpec.where_latest_partition` attempts to construct
SQLAlchemy `Column` objects based on a name and a type. However, this leads
to the following error in our case:

```console
sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'VARCHAR'
```

This comes from the fact that we run `Column('column_name', 'VARCHAR')` instead of
`Column('column_name', sqlalchemy.types.VARCHAR)`. We fix this particular error by
passing the _actual_ type class, and not just a string.
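
The difference between the two calls can be reproduced outside Superset. The following is a small sketch, assuming only that SQLAlchemy is installed; `try_build_column` is a hypothetical helper written for this illustration and is not part of Superset or SQLAlchemy.

```python
import sqlalchemy
from sqlalchemy import Column
from sqlalchemy.exc import ArgumentError


def try_build_column(name, column_type):
    """Return the Column, or the ArgumentError message if construction fails."""
    try:
        return Column(name, column_type)
    except ArgumentError as exc:
        return str(exc)


# A plain string is not recognised as a type, so SQLAlchemy treats it as a
# schema item (such as a constraint) and rejects it.
bad = try_build_column("column_name", "VARCHAR")

# Passing the type class itself works; SQLAlchemy instantiates it.
good = try_build_column("column_name", sqlalchemy.types.VARCHAR)

print(bad)
print(repr(good.type))
```

The first call should report the same `'SchemaItem' object ... expected` message quoted above, while the second yields a `Column` whose `type` is a `VARCHAR` instance.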

> [!NOTE]
> This also fixes the same issue for Trino tables, as `TrinoEngineSpec` inherits
> from `PrestoBaseEngineSpec`, the Presto db client class.

Fixes apache#25962
@rusackas
Member

rusackas commented Mar 1, 2024

@eyalsh99 are you still planning to open a PR for that change (and maybe add a test if we're lucky?)

@RyzhkovIlia

@eyalsh99 I am facing the same problem with Athena. Do you know of a solution for my case?

@rusackas rusackas added the data:connect:athena Related to Athena label Apr 19, 2024
@sfirke
Contributor

sfirke commented Apr 26, 2024

Someone in Slack just reported experiencing this in Athena, too. Same issue or different?

@RyzhkovIlia

@sfirke Same issue, but on Athena rather than Presto. And athena.py doesn't have a _create_column_info function.

@eyalsh99
Author

eyalsh99 commented May 1, 2024

> @eyalsh99 are you still planning to open a PR for that change (and maybe add a test if we're lucky?)

Apologies for the late response. Submitted the PR: #28305

5 participants