
expect_column_values_to_of_type does not work #9626

Open
sfan0704 opened this issue Mar 14, 2024 · 20 comments

Comments

@sfan0704

sfan0704 commented Mar 14, 2024

Opening an issue similar to this #5565

I'm also running into this problem while using a Snowflake datasource to check the type of the columns.

Here are the specs,

  • python==3.10.13
  • great_expectations==0.18.10
  • SQLAlchemy==1.4.51
  • snowflake-connector-python==2.8.3
  • snowflake-sqlalchemy==1.4.7

I was debugging through this and found that the metrics object only has the key "name" in the dict.
[Screenshot: metrics dict contents]

Stack trace:
[Screenshot: stack trace]

Originally posted by @sfan0704 in #5565 (comment)

@sfan0704 sfan0704 changed the title Hi, I'm also running into this problem while using a Snowflake datasource checking the type of the columns. expect_column_values_to_of_type does not work Mar 14, 2024
@austiezr
Contributor

Hey @sfan0704 -- I'm so far unable to reproduce this. Are you able to share your configuration for this expectation as well?

@sfan0704
Author

Hey @austiezr thanks for responding to this. My expectation suite looks like this. Not sure what else I can provide to make this clearer.

    {
      "expectation_type": "expect_column_values_to_be_of_type",
      "kwargs": {
        "column": "reference_currency_price",
        "type_": "FLOAT"
      },
      "meta": {}
    },
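
For reference, a minimal sketch of how an entry like this could be added to a suite programmatically with GX 0.18 (the suite name here is illustrative, not the one from our project):

import great_expectations as gx
from great_expectations.core.expectation_configuration import ExpectationConfiguration

context = gx.get_context()

# Illustrative suite name.
suite = context.add_or_update_expectation_suite(expectation_suite_name="my_suite")

# Mirrors the JSON entry above: expect the column to be typed FLOAT.
suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_of_type",
        kwargs={"column": "reference_currency_price", "type_": "FLOAT"},
    )
)

context.save_expectation_suite(expectation_suite=suite)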

@austiezr
Contributor

How did you initially create this datasource? Is this error arising after running a checkpoint?

@sfan0704
Author

I create the datasource like this,

self.data_context.sources.add_snowflake(name="source_name",  connection_string="STRING")

And correct, the exception was thrown after we ran the checkpoint.
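
For completeness, a minimal sketch of that flow (asset, suite, and checkpoint names are illustrative, and the suite is assumed to already exist):

import great_expectations as gx

context = gx.get_context()

# Fluent Snowflake data source, as created above.
data_source = context.sources.add_snowflake(
    name="source_name",
    connection_string="<snowflake connection string>",
)

# Table asset for the table being validated.
data_asset = data_source.add_table_asset(
    name="my_table_asset",
    table_name="<snowflake table name>",
)

# Run the existing expectation suite against the asset via a checkpoint.
checkpoint = context.add_or_update_checkpoint(
    name="my_checkpoint",
    validations=[
        {
            "batch_request": data_asset.build_batch_request(),
            "expectation_suite_name": "my_suite",
        }
    ],
)
checkpoint_result = checkpoint.run()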

@NehaNRane

Hi Team,

Any update on this issue? I had worked with this expectation and it was working until March 14, 2024. Today, when I run the same validation with the same arguments, it gives me the error below:

Calculating Metrics: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.16it/s]
'type'

@sfan0704
Author

Can you share your python dependency versions?

@NehaNRane

Can you share your python dependency versions?

I am using
python==3.9.6
great_expectations==0.18.7
SQLAlchemy==1.4.51
snowflake-connector-python==3.6.0
snowflake-sqlalchemy==1.5.1

@NehaNRane

Hi Team,

Any update on this issue? Any ETA for a fix?

@rachhouse
Contributor

Hey @NehaNRane @sfan0704, Austin handed this issue over to me - I'm going to investigate and try to reproduce the error you're getting.

@rachhouse
Contributor

@sfan0704 @NehaNRane Ok, I've tested in two different virtual environments matching your dependencies and have not been able to replicate the error - the expect_column_values_to_be_of_type Expectation runs without issue on a Snowflake Table Asset.

As a next troubleshooting step, could you please install great-expectations[snowflake] (using the version currently giving you the error) in a fresh virtual environment and retry your code?

If you still get the same error after a fresh install, we'll take a closer look at your code and Snowflake data next.
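
If it helps, here is a quick sketch (standard library only) to confirm which versions the fresh environment actually resolved; the package names below are as they appear in the installed metadata and may need adjusting:

from importlib.metadata import version

# Print the resolved versions of the packages relevant to this issue.
for package in (
    "great_expectations",
    "SQLAlchemy",
    "snowflake-connector-python",
    "snowflake-sqlalchemy",
):
    print(package, version(package))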

@NehaNRane

@rachhouse, I tried again by creating a fresh virtual environment, but I am getting the same error.
python==3.9.6
SQLAlchemy==1.4.52
great-expectations==0.18.12
snowflake-connector-python==3.8.1
snowflake-sqlalchemy==1.5.1

Error:
Calculating Metrics: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.27it/s]
Traceback (most recent call last):
File "c:\Users\10700792\OneDrive - LTIMindtree\Backup - 01-03-2023\Backup on 1st Mar\Documents\Extras\poc\all_expectation_testing.py", line 154, in
validation_result = validator.expect_column_values_to_be_of_type(column="PaymentMethod", type_= "VARCHAR")
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 590, in inst_expectation
raise err
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 553, in inst_expectation
validation_result = expectation.validate(
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\expectation.py", line 1311, in validate
] = validator.graph_validate(
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 1089, in graph_validate
raise err
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 1073, in graph_validate
result = configuration.metrics_validate(
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\core\expectation_configuration.py", line 1494, in metrics_validate
return expectation_impl(self).metrics_validate(
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\expectation.py", line 1100, in metrics_validate
] = self._validate(
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py", line 496, in _validate
actual_column_type = [
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py", line 497, in
type_dict["type"]
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\metrics\util.py", line 367, in getitem
item = self.data[key]
KeyError: 'type'
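
For what it's worth, here is a minimal sketch of the failure mode the stack trace points at (not GX internals, just an illustration): the per-column metadata only carries a "name" key in this scenario, so the "type" lookup done by expect_column_values_to_be_of_type raises KeyError: 'type'.

# Illustrative only: column metadata missing the "type" key,
# indexed the same way the expectation's _validate step indexes it.
columns_metadata = [{"name": "PaymentMethod"}]  # note: no "type" key

try:
    actual_column_type = [
        type_dict["type"]
        for type_dict in columns_metadata
        if type_dict["name"] == "PaymentMethod"
    ]
except KeyError as err:
    print(f"KeyError: {err}")  # -> KeyError: 'type'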

@rachhouse
Contributor

Hi @NehaNRane, thanks for retrying with a fresh install and including the stacktrace. Can you please also share your GX workflow code from all_expectation_testing.py that generates this error?

@rachhouse
Contributor

I've tested this Expectation (expect_column_values_to_be_of_type) against a collection of differently cased Snowflake table columns, but still have not reproduced an error.

At this point, I am suspicious that this behavior might be a Windows issue. I see you are using Windows from your stacktrace, @NehaNRane, and when this same behavior was reported in #5565, it also occurred for a Windows user.

@sfan0704, are you running your code on Windows or WSL?

@sfan0704
Author

I'm running on a Mac

@NehaNRane

NehaNRane commented Apr 11, 2024

Hi @rachhouse,
The code below is from all_expectation_testing.py. It's a simple script.

import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.core.expectation_configuration import ExpectationConfiguration
import yaml
from configs.datasource_config import snowflake_datasource

context = gx.get_context()
snowflake_datasource = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": connection_string
        "create_temp_table": False,
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}

db_datasource = context.add_or_update_datasource(**snowflake_datasource)
table_name = "PAYMENT"
schema_name = "STRIPE"
db_name = "RAW_DATA"

batch_request = RuntimeBatchRequest(
    datasource_name="my_snowflake_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name=table_name,
    runtime_parameters={
        "query": f'SELECT * FROM "{db_name}"."{schema_name}"."{table_name}"'
    },
    batch_identifiers={"default_identifier_name": "default_identifier"}
)

validator = context.get_validator(batch_request=batch_request)
validation_result = validator.expect_column_values_to_be_of_type(column="status", type_= "VARCHAR")
print(validation_result)

@rachhouse
Contributor

@NehaNRane, it looks like the issue is that you're using outdated code - the use of RuntimeBatchRequest is deprecated. Can you try updating your code to use Fluent Data Sources? Here is a simple example of how you can connect to a Snowflake table and run the expect_column_values_to_be_of_type Expectation:

import great_expectations as gx

DATA_SOURCE_NAME = "<your data source name>"
DATA_ASSET_NAME = "<your data asset name>"
EXPECTATION_SUITE_NAME = "<your expectation suite name>"
CHECKPOINT_NAME = "<your checkpoint name>"

context = gx.get_context()

data_source = context.sources.add_snowflake(
    name=DATA_SOURCE_NAME,
    connection_string="<snowflake connection string>"
)

data_asset = data_source.add_table_asset(
    name=DATA_ASSET_NAME,
    table_name="<snowflake table name>"
)

batch_request = data_asset.build_batch_request()

context.add_or_update_expectation_suite(expectation_suite_name=EXPECTATION_SUITE_NAME)

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=EXPECTATION_SUITE_NAME,
)

validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="FLOAT")
validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="VARCHAR")

validator.save_expectation_suite(discard_failed_expectations=False)

checkpoint = context.add_or_update_checkpoint(
    name=CHECKPOINT_NAME,
    validator=validator,
)

checkpoint_result = checkpoint.run()

@sfan0704 Thanks for confirming your environment. Can you share your full GX workflow code that generates the error?

@NehaNRane

NehaNRane commented Apr 12, 2024

Hi @rachhouse,

I observed that this issue occurs only when the data asset is created as a query asset. We have a table_name in mixed case, and to handle this we are using a query asset. In our use case, table_name and column_name can be in any case. How should we handle such a scenario?
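
For context, a query asset like the one described here might be created along these lines (names and query are illustrative):

import great_expectations as gx

context = gx.get_context()

data_source = context.sources.add_snowflake(
    name="my_snowflake_datasource",
    connection_string="<snowflake connection string>",
)

# Query asset used so the mixed-case table identifier can be quoted explicitly.
data_asset = data_source.add_query_asset(
    name="payment_query_asset",
    query='SELECT * FROM "RAW_DATA"."STRIPE"."Payment"',
)

batch_request = data_asset.build_batch_request()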

@rachhouse
Contributor

Thanks @NehaNRane, those details are very helpful - I'm now able to consistently generate an error when using a Snowflake table with a mixed case name, a QueryAsset, and expect_column_values_to_be_of_type. I'll dig into this with Engineering next week.

@sfan0704
Author

Sorry for the delayed response, we're also doing exactly what @rachhouse is doing. Looking forward to the potential solution here. Thanks.

@rachhouse
Contributor

Hi @NehaNRane @sfan0704, Engineering and I dug further into this error, and the problem is caused by Snowflake tables that have a MixedCaseName identifier that needs to be quoted (for example, DATABASE.SCHEMA."MyTable"). Fixing the underlying issue is ticketed and on our backlog, but we don't have a timeline yet for when we'll be able to fix it.

The current workaround is to either:
a) create a view of your MixedCaseTableName table that uses an UPPERCASE_NAME or
b) rename your MixedCaseTableName table to use an UPPERCASE_NAME

and use a GX Table Asset with your renamed table or created view. From there, you should be able to run expect_column_values_to_be_of_type on your asset and have it behave as expected.

In testing, I created a view, UCCTABLENAME_VIEW, from a table with a quoted MixedCaseName:

create view UCCTABLENAME_VIEW as select * from "UccTableName";

used the view to create a Table Asset:

data_asset = data_source.add_table_asset(
    name="snowflake-table",
    table_name="UCCTABLENAME_VIEW"
)

and was able to run expect_column_values_to_be_of_type using the subsequent validator. In testing, MixedCaseColumnNames worked fine for the Expectation - as long as the table/view name was uppercase.

validator.expect_column_values_to_be_of_type("ColumnOne", type_="FLOAT")
validator.expect_column_values_to_be_of_type("ColumnTwo", type_="VARCHAR")
