Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug to allow parenthesis query after INSERT INTO #1157

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jonathanlehto
Copy link
Contributor

Fixing a bug where the columns after hive check would consume a query

@jonathanlehto jonathanlehto changed the title added bugfix Added Query Consumption Bug Fix Feb 29, 2024
@alamb alamb changed the title Added Query Consumption Bug Fix Fix bug to allow parenthesis query after INSERT INTO Feb 29, 2024
@alamb
Copy link
Collaborator

alamb commented Feb 29, 2024

Hi @jonathanlehto -- can you please fix the CI tests?

I think one needs cargo fmt and there appears to be another error https://github.com/sqlparser-rs/sqlparser-rs/actions/runs/8098985632/job/22142812958?pr=1157

@jonathanlehto
Copy link
Contributor Author

Apologies, I may not be as familiar with MySQL syntax, but it looks like the test case error is due to asserting that the parser will fail the REPLACE statement. From comparing the INSERT and REPLACE docs, it looks like that statement should be succeeding (I could very well be wrong).

If that test case should fail, I can get another update in. I just want to make sure we are on the same page first!

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bearing with me @jonathanlehto

I left some questions about this PR

// make sure we are not about to consume a query
let mut after_columns = vec![];
match self.peek_nth_token(1).token {
Token::Word(w) if w.keyword != Keyword::SELECT => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for select seems incorrect here

What about a query like

INSERT INTO tbla (cola) (VALUES (1))

The Hive manual https://cwiki.apache.org/confluence/display/hive/languagemanual+dml doesn't appear to say anything about this syntax

Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

I may be missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading those docs, it looks like the PARTITION keyword is required. I moved the after_columns parsing into that statement. I don't know why I didn't do that originally, but I think this should resolve the situation. I added another test case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have mis-communicated this. I went ahead and confirmed that the failing test case is incorrect.

Test case:

    #[test]
    fn test_replace_into_select() {
        let sql = r#"REPLACE INTO t1 (a, b, c) (SELECT * FROM t2)"#;

        assert!(Parser::parse_sql(&GenericDialect {}, sql).is_err());
    }

I created a local mysql table and successfully ran these commands.

mysql> INSERT INTO tab (col) (SELECT col FROM tab);
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> REPLACE INTO tab (col) (SELECT col FROM tab);
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

However, you can not wrap values in mysql, copied below. From what I can tell, only Trino / Athena allow values statement to be wrapped.

mysql> INSERT INTO tab (col) VALUES (1);
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO tab (col) (VALUES (1));
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(1))' at line 1

Unfortunately, I don't have the time to check every Dialect that is supported, but I know that Athena / Trino will succeed and at least MySQL will fail.

In this case, do we only want to succeed a parenthesis values statement in common, and fail for every other dialect? There seems to be another bug here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and updated this with what my best guess for a solution would be. I only allow VALUES as a subquery in the GenericDialect. Let me know your thoughts

@@ -106,6 +106,9 @@ fn parse_insert_values() {
}

verified_stmt("INSERT INTO customer WITH foo AS (SELECT 1) SELECT * FROM foo UNION VALUES (1)");

// allow parenthesis query after insert into
verified_stmt("INSERT INTO tbla (cola) (SELECT cola FROM tblb)");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this syntax mean? Is it the same as

INSERT INTO tbla (cola) SELECT cola FROM tblb

(no parens?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, well at the least, running an explain statement in Athena (Trino) yields no difference between

INSERT INTO tbla (cola) SELECT cola FROM tblb
and
INSERT INTO tbla (cola) (SELECT cola FROM tblb)

And registers both as valid queries. From what I recall, I think you can generally wrap any select statements in parenthesis

@alamb
Copy link
Collaborator

alamb commented Mar 3, 2024

🤔 it seems like the CI is still failing on this PR

@coveralls
Copy link

coveralls commented Mar 5, 2024

Pull Request Test Coverage Report for Build 8238393022

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 25 of 26 (96.15%) changed or added relevant lines in 1 file are covered.
  • 376 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.05%) to 87.873%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/parser/mod.rs 25 26 96.15%
Files with Coverage Reduction New Missed Lines %
src/parser/mod.rs 66 82.93%
src/ast/mod.rs 310 79.97%
Totals Coverage Status
Change from base Build 8128645646: 0.05%
Covered Lines: 20506
Relevant Lines: 23336

💛 - Coveralls

@alamb
Copy link
Collaborator

alamb commented Mar 11, 2024

It appears the tests are still failing on this PR

@jonathanlehto
Copy link
Contributor Author

jonathanlehto commented Mar 11, 2024

I updated the mr to fix the lint and codestyle errors. I was hoping to get your thoughts on this comment, so, if you have some time, I would love to know if you think that implementation is okay!


let is_generic = dialect_of!(self is GenericDialect);
if !is_generic {
if let SetExpr::Values(_) = *subquery.body {
Copy link
Collaborator

@alamb alamb Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is applying a semantic check: https://github.com/sqlparser-rs/sqlparser-rs?tab=readme-ov-file#syntax-vs-semantics

I think any such errors should be generated downstream. Can you please remove this check?

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jonathanlehto -- sorry for the delay in reviewing -- the time I can spend on this crate is quite limited and I don't have the bandwidth to study problems in depth typically.

I took a look over this PR again and left some feedback. Let me know what you think

fn test_insert_into_values() {
let sql = r#"INSERT INTO t1 (a) VALUES(1)"#;

assert!(Parser::parse_sql(&GenericDialect {}, sql).is_ok());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb
Copy link
Collaborator

alamb commented Apr 9, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants