Add the ability to lint and fix areas of templated code that are not actually executed #4061
Comments
In some cases, forcing execution of a branch not normally taken could cause runtime errors, parse errors, etc. The linter should ignore these, but in case of potential side effects (e.g. it's common in dbt to call out to the database, perhaps other external services), we should provide a way to suppress this behavior, e.g. a comment like:
Hey all, is it possible to run sqlfluff for jinja (dbt) in a way it resolves all possible if-else nodes? For example, in my dbt project I have this:
and I would like to validate The output of the parse command for else block is: |
Thanks for your reply.
For info, I've created a function in R, as more familiar with that than Python. I've set incremental load within the project.yml for all models within models\raw folder where the function output to:
Hi folks!!,
Here is an example where we override a ref.
in the model its basically a select some_columns from |
the first time that the model run the model, the table will be created and the pre_combined key will be added, in this run the Few notes:
I think that you are missing something in your model to de-duplicate the input data.
This model will first deduplicate the data, then return only the latest record: dbt merge/incremental does not do the de-duplication for you. |
I've already started merging in some of the logic for this, so I've assigned this issue to me for now. |
@alanmcruickshank -- do you feel like you have a good path for fixing this in SQLFluff? If not, we've discussed potential ways to override/set state consistently for issues like linting conditional logic so conditionals always compile in a certain direction. The idea would be to give you some levers to pull to get more consistent results out of |
@gwenwindflower - I think we've got a good solution for the Jinja parsing bit already 👍 The part that is causing me a headache is managing multiple versions of a file and getting them to all play nice with each other. That being said - if you've got good suggestions for making this easier then I'm all ears! |
@alanmcruickshank cool! Re our ideas, it was more or less this: basically allowing people to construct state in a very precise way, then figuring out what the best way for sqlfluff to use that in compile would be (for instance letting people customize the args to the compile command for the dbt templater via the |
@gwenwindflower - check out #5822 which supports a very minimal version of this. It might be best to get on the phone to talk more though and work through some of the details. In particular, this feature is still very experimental so it might not be ready for the big time yet - totally worth starting to test out to find bugs though! |
Search before asking
Description
Several users have asked if SQLFluff does (or could) support this feature. Another SQL formatting tool, sqlfmt, supports this (although it's less sophisticated and doesn't actually parse the SQL code).
Alan and Barry discussed how that might be done in SQLFluff:
this pr means we got fairly accurate links between closing and starting jinja tags: #3936
and the rework of placeholder segments (currently bound up in the reindent pr), means that we know where code is being rendered and where not.
...so we could detect if statements and find cases where code is never rendered.
the only thing we'd need to do would be to allow the templater to optionally return multiple versions of a query, parse them all and then lint them all - deduplicating the results.
if one of the versions fails, we'd probably hide the parsing errors because it could be us incorrectly interpreting the loops: we wouldn't really understand the loops, we're just replacing code.
....thinking about it - the other thing we could do would be preprocess the templates, replacing any if statements with True or False to force clauses to be evaluated. 🤷
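A minimal sketch of that preprocessing idea, using only Python's standard library rather than SQLFluff's or Jinja's internals. The names `IF_TAG`, `force_if_conditions` and `all_forced_variants` are hypothetical, and the regex is an assumption that only handles plain `{% if ... %}` opening tags (it leaves `elif`/`else` alone and ignores loops entirely).

```python
import itertools
import re

# Matches the opening tag of a Jinja if-block, e.g. {% if is_incremental() %}
# or {%- if x -%}. elif/else/endif tags are deliberately left untouched.
IF_TAG = re.compile(r"(\{%-?\s*if\s+)(.*?)(\s*-?%\})", re.DOTALL)


def force_if_conditions(template, choices):
    """Replace the n-th `if` condition with the literal bool in choices[n]."""
    values = iter(choices)

    def _sub(match):
        # Keep the tag delimiters, swap only the condition expression.
        return f"{match.group(1)}{next(values)}{match.group(3)}"

    return IF_TAG.sub(_sub, template)


def all_forced_variants(template):
    """Yield one rewritten template per combination of forced branches."""
    n_ifs = len(IF_TAG.findall(template))
    for choices in itertools.product([True, False], repeat=n_ifs):
        yield force_if_conditions(template, choices)
```

Each variant would then be rendered, parsed and linted like a normal file. Note this enumerates every combination, so it is only sensible for templates with a handful of `if` tags.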
all of this feels quite hacky though. Maybe just allowing users to add a --sqlfluff directive which tells the templater what other options to insert for some context values to generate multiple versions of a query. Whatever we do it all boils down to linting multiple versions of the same query - which has a performance overhead.
that being said, templating is fast. We could pretty quickly evaluate for any query if any alternative variations exist. For most queries, they won't - so performance is the same. We could also evaluate how many exist, and limit ourselves to only evaluating the most significant n (e.g. the variations with the most different code).
Altering the templater to optionally return an iterable of TemplatedFile wouldn't be too hard, and deduplicating errors based on their source position already exists (#4041) and the same method could easily be used here, so that the user doesn't know the file got linted more than once.
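To illustrate the deduplication step, here is a rough sketch. It does not use SQLFluff's real classes or the method from #4041; `Violation` and `dedupe_violations` are hypothetical stand-ins, and the assumption is that every lint result carries a position in the original (untemplated) source file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    # Hypothetical stand-in for a lint result; not SQLFluff's real class.
    rule_code: str
    source_line: int
    source_pos: int
    description: str


def dedupe_violations(per_variant_results):
    """Merge results from several variants, keeping one per (rule, source position)."""
    seen = set()
    merged = []
    for results in per_variant_results:
        for v in results:
            key = (v.rule_code, v.source_line, v.source_pos)
            if key not in seen:
                seen.add(key)
                merged.append(v)
    # Report in file order so the output looks like a single lint pass.
    return sorted(merged, key=lambda v: (v.source_line, v.source_pos, v.rule_code))
```

Because the key is the source position rather than the templated position, a violation in code shared by every variant is reported once.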
I'm worried about the potential for extreme cases. Just as we have people trying to lint 100,000 line files, someone will try to lint a file with 10,000 if statements, which means 2^10,000 possible paths.
Do you have thoughts on how to determine and limit to the most significant variations?
limiting is easy once we know how many variations.
determining the variations is the hard bit.
I'm tempted initially to just start generating them and then just stop iteration after generating 5.
then we filter from those files.
That would catch all the simple cases, and then we just warn for complicated cases.
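A sketch of "start generating and stop after 5", assuming variants are described by one True/False choice per `if` statement. `MAX_VARIANTS` and `limited_variants` are made-up names for illustration. The point is that the enumeration is lazy, so even the 10,000-if file never materialises 2^10,000 combinations.

```python
import itertools
import warnings

MAX_VARIANTS = 5  # the cap suggested in the discussion


def limited_variants(n_ifs, limit=MAX_VARIANTS):
    """Lazily enumerate branch choices, stopping after `limit` of them."""
    combos = itertools.product([True, False], repeat=n_ifs)
    # islice pulls at most `limit` items from the lazy product.
    chosen = list(itertools.islice(combos, limit))
    if 2 ** n_ifs > limit:
        warnings.warn(
            f"2**{n_ifs} possible variants; only linting the first {limit}."
        )
    return chosen
```

This simply takes the first few combinations in enumeration order; it makes no attempt to pick the most significant ones.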
That could be a good start. Now that I think about it, basically we just want to cover all the code. And JinjaTracer knows exactly what code is covered. So I think we could do this pretty easily. The tracer already does something slightly similar when building the trace -- it knows from the trace output what code got executed, but not necessarily how it got there. It has some very simple heuristics to determine that, e.g. if it executed code above the current point, then it needs to find a loop end whose beginning is above the target point. OTOH, if the next step is below, then find the shortest path to get there (nothing fancy, just skip over any "if"s).
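One way to read "we just want to cover all the code" is as a set-cover problem. The sketch below is a plain greedy set cover, not JinjaTracer's actual logic; it assumes we already know which source blocks each candidate variant renders, and `pick_covering_variants` plus the block ids are hypothetical.

```python
def pick_covering_variants(coverage):
    """Greedy set cover: coverage maps variant name -> set of covered block ids."""
    uncovered = set().union(*coverage.values())
    remaining = dict(coverage)
    picked = []
    while uncovered and remaining:
        # Take the variant that renders the most not-yet-covered blocks.
        best = max(remaining, key=lambda name: len(remaining[name] & uncovered))
        gained = remaining.pop(best) & uncovered
        if not gained:
            break
        picked.append(best)
        uncovered -= gained
    return picked
```

For example, with `{"default": {1, 2, 4}, "if_true": {1, 2, 3, 4}, "if_false": {1, 4, 5}}` the default render adds nothing once the two forced branches are linted, so only two variants need processing.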
Use case
Lint or format the whole SQL file without requiring the user to process the file multiple times with different Jinja variables.
Dialect
n/a
Are you willing to work on and submit a PR to address the issue?
Code of Conduct