Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Macro adapter.get_columns_in_relation() doesn't return any columns during unit testing #10139

Open
2 tasks done
DmytroSly opened this issue May 14, 2024 · 6 comments
Open
2 tasks done
Labels
bug Something isn't working unit tests Issues related to built-in dbt unit testing functionality

Comments

@DmytroSly
Copy link

DmytroSly commented May 14, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

The model I configured a unit test for, relies on the macro adapter.get_columns_in_relation() that I use for one of the models, referenced with the ref() macro. However, the macro adapter.get_columns_in_relation() returns an empty list for a relation during unit testing. This causes the SQL code to compile improperly and fail.

Expected Behavior

The macro adapter.get_columns_in_relation() should return a list of columns for a referenced relation during unit testing.

Steps To Reproduce

Create a SQL model that relies on the list of columns returned by the macro adapter.get_columns_in_relation(), then configure and run a unit test for this model

Relevant log output

No response

Environment

- OS: Windows 10 Enterprise
- Python: 3.11.4
- dbt: 1.8.0

Which database adapter are you using with dbt?

snowflake

Additional Context

I hope it is possible to fix this, since dbt knows for sure the list of columns of all the referenced models. Apart from the input columns provided in the unit test config, dbt injects all other columns with NULL values into the resulting SQL for such a unit test. The input tables for a model, that I am trying to unit test, are really wide - 70-80 columns.

@DmytroSly DmytroSly added bug Something isn't working triage labels May 14, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label May 14, 2024
@DmytroSly
Copy link
Author

FYI
I tried using the override functionality to set the list of columns to be returned by the macro adapter.get_columns_in_relation(), but the problem with the override in my case is that this macro is used two times, and each time for a different model. A different list of columns should be returned for every model, but the override makes it return the same list of columns for different models.

@dbeatty10
Copy link
Contributor

I'm not sure if this is the same underlying issue or not, but I just opened this issue that looks similar:

@dbeatty10
Copy link
Contributor

Thanks for reaching out about this @DmytroSly !

Could you provide a simplified version of the models you are having trouble with?

e.g., something like this:

models/model_a.sql

select 1 as id

models/model_b.sql

{%- set columns = adapter.get_columns_in_relation(ref('model_a')) -%}

{% for column in columns %}
  {{ log("Column: " ~ column, info=true) }}
{% endfor %}

select 2 as guid

@Elisabeth-Kury
Copy link

I ran into the same problem when writing a unit test using the dbt.utils get_columns_in_relation function, hence why I'm adding to this issue, hope thats fine :)
I then wrote a really simplified test model that looks as follows, basically in this example I want to concat all column values except the values for the columns createdat and updatedat.

The model input seeds/test.csv is just a simple test file with one row:

id,name,mail,date,updatedat,createdat
1,John,john.example,2024-05-08,2024-05-01,2024-05-01

models/stg_test.sql

{%- set columns = get_columns_in_relation(ref('test')) -%}
{% set column_names = columns|map(attribute='name')|reject("in", ['createdat','updatedat'])|list %}

select
    *, {{ concat( column_names ) }} as row_hash_value
from {{ ref('test') }}

The dbt run is successful and the compiled code looks fine:

select
     *, id || name || mail || date as row_hash_value
from "my_dwh"."seeds"."stg_test"
    )

However when I run a dbt unit test with the same input as a csv fixture, the test fails with the following error message:

09:04:39    Runtime Error in unit_test logic_unit_test_test (models\b_staging\stg_monta\_stg_monta__unit_tests.yml)
  An error occurred during execution of unit test 'logic_unit_test_test'. There may be an error in the unit test definition: check the data types.
   Database Error
    syntax error at or near "as"
    LINE 42:             *,  as row_hash_value
                             ^

Looking at the compiled code it seems that the dbt unit test is not correctly passing the get_columns_in_relation function or the function is returning nothing:

with __dbt__cte__test as (

-- Fixture for test
select 
    
    cast('1' as integer)
 as id, 
    
    cast('John' as text)
 as name, 
    
    cast('john.example' as text)
 as mail, 
    
    cast('2024-05-08' as date)
 as date, 
    
    cast('2024-05-01' as date)
 as updatedat, 
    
    cast('2024-05-01' as date)
 as createdat
) select
    *,  as row_hash_value
from __dbt__cte__test

Hope this helps, would be great if this was resolved or anyone has an idea how to fix this behaviour.

@dbeatty10
Copy link
Contributor

That is very helpful @Elisabeth-Kury 🤩

Full details for a reproducible example, including unit test definition below.

Reprex

seeds/test.csv

id,name,mail,date,updatedat,createdat
1,John,john.example,2024-05-08,2024-05-01,2024-05-01

models/stg_test.sql

{%- set columns = get_columns_in_relation(ref('test')) -%}
{% set column_names = columns|map(attribute='name')|reject("in", ['createdat','updatedat'])|list %}

select
    *, {{ concat( column_names ) }} as row_hash_value
from {{ ref('test') }}

models/_unit_tests.yml

unit_tests:

  - name: test__stg_test
    model: stg_test
    given:
      - input: ref('test')
        rows:
          - {id: 2, name: Ari, mail: [email protected], date: 2024-01-01, updatedat: 2024-01-01, createdat: 2024-01-01}
    expect:
        rows:
          - {id: 2, name: Ari, mail: [email protected], date: 2024-01-01, updatedat: 2024-01-01, createdat: 2024-01-01, row_hash_value: [email protected]}
dbt seed -s test --full-refresh
dbt run -s stg_test
dbt build  -s stg_test

@dbeatty10 dbeatty10 removed the triage label May 29, 2024
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 3, 2024

I think this is another case where actually materializing the fixtures as database objects would work, because then each call to get_columns_in_relation would just go the columns in each relation, without any need to mock different args / return values:

I expect that will be significantly slower in the general case, but it would allow the framework to support more-complex models — so while it shouldn't replace the default, it should be something that users can opt into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

4 participants