Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint and fix areas of templated code that are not actually executed #4202

Draft
wants to merge 78 commits into
base: main
Choose a base branch
from

Conversation

barrywhart
Copy link
Member

@barrywhart barrywhart commented Dec 24, 2022

Notes for remaining work & debugging:

  • Take the list of violations returned by LintedFile.get_violations() and apply only those using BaseSegment.apply_fixes(). This will replace the current patch-based strategy.
  • New design: 1. Lint the main variant using the normal linter loop (looping over rules). When it generates a fix, run the same rule sequentially against the other variants to see if they generate the same violation. If so, keep and apply it to all the variants. If not, discard it. 2. After step 1 finishes, now run the linter against each variant, retaining fixes only if they are confined to code not rendered in the main variant. This two-part design has already been implemented as a post-processing step in LintedFile.get_violations()`.
  • When generating variants, only allow one variant per unreached slice. This will ensure in fix_string() that we don't have to reconcile varying changes to the same slice. The greedy_set_cover() function here may be useful for choosing a minimum set of variants that covers as much of the raw file as possible.
  • Are all the new tests working?
  • Think about (and test) cases where there are more than 2 variants

Tests:

  • L025: Tables should not be aliased if that alias is not used
    • Define alias in shared code, only use in one 'if' branch
  • L027: References should be qualified if select has more than one referenced table/view
    • Define shared first table without alias, then in one 'if' branch, reference a second table without alias
  • L028: References should be consistent in statements with a single table.
  • L010: Inconsistent capitalisation of keywords
    • 'if' at top of file, shared keyword at bottom
  • L014: Inconsistent capitalisation of unquoted identifiers.

==

  • Review list of SQLFluff rules to come up with new test cases. Focus on rules that "remove unused aliases", things like that which span across areas of the file.
  • Consider limiting the rules that run on the variants (e.g. just the formatting and capitalization rules, safe and/or localized changes)

Brief summary of the change made

Fixes #4061

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@barrywhart barrywhart marked this pull request as draft December 24, 2022 19:13
trace = tracer_probe.trace(append_to_templated=append_to_templated)
print(f"Yielding trace for {trace.templated_str!r}")
yield trace.raw_sliced, trace.sliced_file, trace.templated_str
return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this return to enable "multi-linting"

@barrywhart barrywhart marked this pull request as ready for review December 30, 2022 19:27
@barrywhart barrywhart marked this pull request as draft December 31, 2022 18:55
@alanmcruickshank
Copy link
Member

I'm going to try and pick this work up and finish it off - that will likely involve quite a bit of rework - and I might start this as a new branch. I'll keep this open until I've ported the code across and then close this PR at that point.

@alanmcruickshank
Copy link
Member

Just picking up on this for anyone watching it. Regardless of the implementation here, this feature would likely mean longer runtimes (because we end up parsing and linting multiple versions of the same file). In the best case scenario we can use the "base" variant as a hint for parsing the other versions, but it's still likely going to be longer.

I'll pick this up once I've made a fair inroad into that performance frontier.

@barrywhart
Copy link
Member Author

@alanmcruickshank: A followup to your performance comment -- that is true, but we have coverage information on the rendered template and can skip the whole process of generating and linting variants if the initial rendering covered everything. So in most cases, there'll be no impact on performance. I think the draft PR does this.

@alanmcruickshank
Copy link
Member

Oh, totally agree for files that are fully covered in the first pass, it's the others that will experience the slowdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to lint and fix areas of templated code that are not actually executed
3 participants