Whitespace handling in some contexts is ambiguous #53
Comments
Thanks for the feedback. The syntax page (http://hjson.org/syntax.html#quoteless-strings) does mention "preceding and trailing whitespace is ignored" but you are right that it is missing from the RFC. I will update and clarify this (along with multiline strings) when I update the spec for #56.
There are some behaviors related to whitespace that neither the syntax page nor the RFC specifies. As long as they exist, these ambiguities reduce my trust in Hjson.
(search keywords: ambiguous, ambiguity, ambiguities, unspecified)
Is whitespace at the beginning of quoteless strings included?
In the above Hjson, there are two spaces after the colon. Is the value "very high", " very high", or "  very high"? I am guessing you want the answer to be the first one, but this is not clear.
The Objects section of the RFC contains the grammar name-separator ws-c value. If ws-c is greedy, then the answer is that initial whitespace is consumed before the quoteless string starts. However, neither the RFC nor the referenced RFC 5234 explicitly spells out that repetitions should be greedy.
Whether or not you choose to define that repetition to be greedy, I would also suggest that you spell out this behavior in the Quoteless Strings section for clarity. You could write something like this:
(The HTML character entity &nbsp; will be helpful when writing examples like the above. I used that entity after the colon so as to prevent size: and large from being on separate lines, which would make the number of spaces used unclear.)
This ambiguity was one of the first objections I had to quoteless strings when I read about them on http://hjson.org/. So I think it is worth explaining the correct behavior on the syntax page, not just in the RFC.
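To make the two readings of ws-c concrete, here is a minimal Python sketch. It is not taken from any Hjson implementation: the function name parse_quoteless_member, the greedy_ws flag, and the key level are hypothetical, and the sketch handles only a single "name: value" line with an unquoted name.

```python
from typing import Tuple


def parse_quoteless_member(line: str, greedy_ws: bool = True) -> Tuple[str, str]:
    """Split a single 'name: value' line into (name, value).

    Hypothetical helper for illustration only; it is not a full Hjson parser.
    """
    name, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing name separator ':'")
    if greedy_ws:
        # Greedy ws-c: every space or tab after the colon is consumed
        # before the quoteless string begins.
        rest = rest.lstrip(" \t")
    # Trailing whitespace is ignored, as the syntax page states.
    value = rest.rstrip(" \t\r\n")
    return name.strip(), value


if __name__ == "__main__":
    print(parse_quoteless_member("level:  very high"))
    print(parse_quoteless_member("level:  very high", greedy_ws=False))
```

With greedy_ws=True the value is "very high"; with greedy_ws=False both spaces after the colon stay in the value, which is the outcome the issue asks the spec to rule out explicitly.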
Handling inconsistent indentation for multiline strings
This is the current explanation of whitespace handling for multiline strings:
But consider this Hjson:
The rules do not say what to do with non-whitespace characters before the first column.
I think such formatting should be specified to be invalid Hjson. The document must be fixed by the author before being parsed. Otherwise it is unclear whether the author wanted the string to be "one\ntwo\nthree", "one\ntwo\n three", or " one\ntwo\n three".
Also, in “whitespace up to the column of the first single quote”, “column” is not well-defined. Consider this multiline string, where . means a space and \t means a horizontal tab:
What column does ..\t reach? You could specify that one tab equals 8 spaces, so “hello” starts on column 10, meaning the string starts with 6 spaces. Or you could specify that one tab equals 4 or 2 spaces, because those sizes are more common for indentation. Or you could specify that, as in plain text, a tab moves to the next tabstop, that is, the next column that is a multiple of 8. In that case, “hello” would be on column 8 and the string would start with 4 spaces.
Because all those options are plausible, I would suggest removing reference to “column” in the specification, and defining inconsistent indentation to be an error. Something like this:
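The three readings of “column” discussed above can be made concrete with a small Python sketch. It only illustrates the ambiguity and is not proposed spec text; the function name column_after and the mode names "fixed" and "tabstop" are hypothetical, and columns are counted from 0.

```python
def column_after(prefix: str, mode: str, tab_width: int = 8) -> int:
    """Return the 0-based column reached after the whitespace in prefix.

    mode="fixed":   every tab counts as exactly tab_width columns.
    mode="tabstop": a tab advances to the next multiple of tab_width.
    """
    col = 0
    for ch in prefix:
        if ch != "\t":
            col += 1
        elif mode == "fixed":
            col += tab_width
        elif mode == "tabstop":
            col += tab_width - (col % tab_width)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return col


if __name__ == "__main__":
    prefix = "  \t"  # two spaces followed by a horizontal tab
    print(column_after(prefix, "fixed", 8))    # tab counted as 8 spaces
    print(column_after(prefix, "fixed", 4))    # tab counted as 4 spaces
    print(column_after(prefix, "tabstop", 8))  # tab moves to next tabstop
```

For two spaces followed by a tab, the fixed 8-space reading gives column 10, the fixed 4-space reading gives column 6, and the tabstop reading gives column 8. The same bytes therefore yield different amounts of stripped indentation depending on which rule a parser picks.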