Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Unquoted Values #723

Open
ryber opened this issue Oct 18, 2021 · 21 comments
Open

Support Unquoted Values #723

ryber opened this issue Oct 18, 2021 · 21 comments
Labels
pr-needed Feature request for which PR likely needed (no active development but idea is workable)

Comments

@ryber
Copy link

ryber commented Oct 18, 2021

In both Gson and org.json a unquoted value will be interpreted as a string. So given this array (note that apple is not quoted):

[33.5, 42, "foo", true, apple]

The resulting ArrayNode equivalent in those frameworks (JsonArray and JSONArray respectfully) will interpret it as

[33.5, 42, "foo", true, "apple"]

I have a OSS utility (Unirest-Java), that will allow users to bring their own favorite Json parsing engine. So I have an abstraction for them, and trying to make them operate with the same rules. This one I can't deal with unless Jackson were to handle it. I noticed you have a JsonReadFeature for ALLOW_UNQUOTED_FIELD_NAMES, it would be super cool if there was a ALLOW_UNQUOTED_VALUES

Thanks for all the hard work. Jackson is a fantastic library and a real backbone of the world of Java you should be really proud of it!

@cowtowncoder
Copy link
Member

Yes, this would require a new option like JsonReadFeature.ALLOW_UNQUOTED_VALUES. I probably won't have time to tackle this, but maybe someone else might find time.
This feature would need to be implemented for all 4 backends (Reader, InputStream, DataInput, non-blocking); some of code used for unquoted field names might prove valuable.

One thing to note is that the definition of where an unquoted value ends will probably require "interesting" heuristics -- given that such values are not legal JSON (and shame on org.json / Gson accepting them by default...), there is no definition of how such values should be decoded; choice of end markers is rather large.

@cowtowncoder cowtowncoder added the pr-needed Feature request for which PR likely needed (no active development but idea is workable) label Jun 20, 2022
@micw
Copy link

micw commented Nov 5, 2022

Hello @cowtowncoder,
would you probably find the time to investigate in this feature request?

Kind regards,
Michael.

@cowtowncoder
Copy link
Member

@micw No I don't think I will have time to work on this anytime soon. Person who has worked on similar things lately is @pjfanning (not saying he'd have time or interest, but that he is probably quite familiar with code by now).

@pjfanning
Copy link
Member

Honestly, I simply don't understand the need to support cases that are disallowed by the JSON spec - this change would be low priority for me.

@cowtowncoder
Copy link
Member

cowtowncoder commented Nov 6, 2022

@pjfanning Fair enough. I think majority of these requests are for various JSON-like variants like something called "JSON5" (see #734), and may make slightly more sense. But I agree that it's sort of slippery slope.

@GedMarc
Copy link

GedMarc commented Nov 6, 2022

unquoted keys are one thing, unquoted values seem to border on the ridiculous - You would need to restrict all values to not have the "," character, or set the end of key-value pair value to anything.... which many, MANY, will -
Then on investigation you still have to determine if its numeric, boolean, or a string value - overhead would be crazy -

Not a viable request in my opinion @micw @ryber

@pjfanning
Copy link
Member

I'm not sure this is allowed by json5 - see https://json5.org/ - Strings can be single or double quoted but need some sort of quotes - are you said above, without quotes, it is hard to spot the end of the string.

In the example from the OP, you would need to trim the string - so you would probably need 2 deserialization features - one to support having no quotes and the other to support trimming the results.

@micw
Copy link

micw commented Nov 6, 2022

My use case is to process lot of json from sources not under my controll. In particular, I read statistics from "freifunk" nodes (free, community driven wifi mesh nodes) to combine it to a map and load it into grafana.

It used to happen that the script generated JSON has some missing quotes. Especially when a script that is expected to return a number returns an error message instead ;-)

I agree that this is a rare use case and the better option would be to fix the script. But on the other hand, there is an implementation with a different JSON parser (need to figure out which one) which does not fail on this issue ;-)

@cowtowncoder
Copy link
Member

@GedMarc Yeah. YAML is pretty problematic when "no" means false etc.

@pjfanning Ok. I wasn't sure if this was specifically related to JSON5; good to know it is not.

@cowtowncoder
Copy link
Member

@micw Thank you for sharing more details of your use case. Yes, such "JSON" should be fixed if and when it is a flow in generation and not following some JSON-like alternative format.

But just to be clear: any JSON parser that accepts such values is technically non-compliant. Jackson strictly enforces JSON specification unless instructed to allow certain deviations. So even if GSON did alllow this (does it really? :-o ), or org.json I would not consider that a strong argument for inclusion.

@ryber
Copy link
Author

ryber commented Nov 7, 2022

Yes, GSON and org.json really do this. Here are a couple of tests (BTW, I personally don't really care, I think it's silly that they actually do this, I only opened the issue originally to kind of document the discrepancy:

   @Test
    void orgJsonUnquotedvalues() {
        JSONObject o = new JSONObject("{\"foo\": bar, \"baz\": qux}");
        assertEquals("bar", o.getString("foo"));
    }

   @Test
    void gsonUnquotedValues() {
        Gson gson = new Gson();
        JsonObject e = gson.fromJson("{\"foo\": bar, \"baz\": qux}", JsonObject.class);
        assertEquals("bar", e.get("foo").getAsString());
    }

@cowtowncoder
Copy link
Member

Ok thank you for confirming @ryber. Interesting. Filing an issue (be it for documentation or having option for strict handling) makes sense.

@ryber
Copy link
Author

ryber commented Nov 7, 2022

Interestingly, org.json will let you have spaces in the value, but GSON will not.

@micw
Copy link

micw commented Nov 7, 2022

Yes. Once you leave the spec path, results might be surprising ;-)

@niquola
Copy link

niquola commented Sep 2, 2023

Hi, guys. This is an essential feature for us!
We need identifiers in json (like symbols in cloure/EDN or variables in js):

{ 
  type: namespace.Identifier,
  subtype: Enum/Type,
  value: "...",

}

Can I somehow extend the existing Jackson (something like a hook)? Or can it only be implemented in the core?

@niquola
Copy link

niquola commented Sep 2, 2023

Hi, guys. This is a very important feature for us! Can I somehow extend the existing Jackson (something like a hook)? Or can it only be implemented in the core?

Oh, I see, it is hardcoded here

public String nextFieldName() throws IOException

@cowtowncoder
Copy link
Member

Yes, there are no hooks as this really is a low-level tokenization aspect and for performance reason that is not modular.
Nor is it clear whether supporting things that are strictly against JSON spec is beneficial per se.

Having said that, there are a few other opt-in settings so if someone was to tackle this, it would be considered for inclusion.

@micw
Copy link

micw commented Sep 2, 2023

A pluggable tokenizer with implementation for "strict" and "lax" could do it and should not affect performance.

@pjfanning
Copy link
Member

pjfanning commented Sep 2, 2023

It might be faster to use https://github.com/marhali/json5-java and if you need to integrate with other Jackson modules, you could write a class that implements the Jackson JsonParser API and that delegates to the json5-java parser.

This could be done in a 'dataformat' module - a bit like how https://github.com/FasterXML/jackson-dataformat-xml delegates the parser/generator work to woodstox.

@cowtowncoder
Copy link
Member

A pluggable tokenizer with implementation for "strict" and "lax" could do it and should not affect performance.

I don't think so: adding true pluggability itself would almost certainly add measurable overhead if done within jackson-core.
I have no interest in spending time to create unnecessary abstractions (which I consider this to be).

Implementing alternate handling with switches (as is done for other aspect) is different story and does not need to add significant overhead.

As per @pjfanning's suggestion, yes, implementing alternate format backend would make sense for anyone wanting to tackle this: this is kind of pluggability Jackson already supports quite well.

@ash211
Copy link

ash211 commented Apr 13, 2024

Relevant for predibase/lorax#392, I would be interested in a Jackson option that allowed parsing this invalid JSON:

{"key": 2022-02-02}

as if it were this valid JSON:

{"key": "2022-02-02"}

by treating the first character after {"key": that did not match regex [0-9T:.+-] as the end of the string. The regex represents characters in valid ISO 8601 datetimes with time zone (might need tweaks to be 100% correct).

My first preference is to fix the code creating this invalid JSON to create valid JSON instead of using a Jackson option, but if a Jackson option did exist I would be using it in the interim until the JSON generation is fixed. Sharing as a data point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-needed Feature request for which PR likely needed (no active development but idea is workable)
Projects
None yet
Development

No branches or pull requests

7 participants