Issue with Parsing NDJSON File in DuckDB: Unexpected Quotation Marks #12188

Answered by lnkuiper
Max0u asked this question in Q&A
Hi @Max0u, I think the issue is that we're not annotating the strings going into the Parquet file as the Parquet JSON type. pqrs therefore interprets these columns as plain VARCHAR and surrounds the values with double quotes.

If we add a cast like so:

duckdb -c "COPY (SELECT * FROM read_ndjson('path_to_file.ndjson', maximum_depth=1)) TO 'my.parquet'";
duckdb --jsonlines -c "SELECT field1::JSON field1, field2::JSON field2 FROM 'my.parquet'";

We get proper JSON output without the double quotes:

{"field1":"value1","field2":{"subfield1":"subvalue1"}}
{"field1":"value2","field2":{"subfield2":"subvalue2"}}
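To illustrate why the annotation matters, here is a minimal Python sketch of the same effect using only the standard library. It is an analogy, not DuckDB's or pqrs's actual code: the `row` dictionary and both helper functions are hypothetical, and the sketch simply assumes `field2` holds JSON text stored as an ordinary string.

```python
import json

# Hypothetical row: field2 holds JSON text stored as an ordinary string,
# analogous to a VARCHAR column with no JSON annotation.
row = {"field1": "value1", "field2": '{"subfield1":"subvalue1"}'}


def as_plain_strings(r):
    # Serialize every value as a plain string: the nested JSON text
    # is wrapped in double quotes and its inner quotes are escaped.
    return json.dumps(r, separators=(",", ":"))


def as_json_values(r):
    # Parse field2 as JSON first (loosely analogous to casting to JSON),
    # so it is emitted as a nested object rather than a quoted string.
    parsed = dict(r, field2=json.loads(r["field2"]))
    return json.dumps(parsed, separators=(",", ":"))


print(as_plain_strings(row))
print(as_json_values(row))
```

The first call prints `field2` as a quoted, escaped string, while the second prints it as a nested object, which matches the shape of the output shown above.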

Answer selected by Max0u