Whitespace handling in some contexts is ambiguous #53

Open
roryokane opened this issue Aug 21, 2016 · 1 comment

roryokane commented Aug 21, 2016

There are some behaviors related to whitespace that neither the syntax page nor the RFC specifies. As long as they exist, these ambiguities reduce my trust in Hjson.

(search keywords: ambiguous, ambiguity, ambiguities, unspecified)

Is whitespace at the beginning of quoteless strings included?

rating:  very high

In the above Hjson, there are two spaces after the colon. Is the value "very high", " very high", or "  very high"? I am guessing you want the answer to be the first one, but this is not clear.

The Objects section of the RFC contains the grammar name-separator ws-c value. If ws-c is greedy, then all initial whitespace is consumed before the quoteless string starts. However, neither the RFC nor the referenced RFC 5234 explicitly says that repetitions are greedy.

Whether you choose to define that repetition to be greedy or not, I would also suggest that you spell out this behavior in the Quoteless Strings section for clarity. You could write something like this:

Quoteless strings cannot start with whitespace – in all contexts, whitespace is consumed before the start of a quoteless string. So, for example, in the object key-value pair size: large, the quoteless string after the colon is equivalent to the JSON string "large", not to " large".

(The HTML character entity &nbsp; will be helpful when writing examples like the above. I used that entity after the colon so as to prevent size: and large from being on separate lines, which would make the number of spaces used unclear.)

This ambiguity was one of the first objections I had to quoteless strings when I read about them on http://hjson.org/. So I think it is worth explaining the correct behavior on the syntax page, not just in the RFC.
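The greedy versus non-greedy question above can be made concrete with a small sketch. It uses Python regular expressions, not ABNF or any real Hjson parser, and the helper name read_value is hypothetical. It only shows that the same input yields each of the three candidate values depending on how much whitespace the separator is allowed to consume.

```python
import re

LINE = "rating:  very high"  # two spaces after the colon


def read_value(line, ws_pattern):
    """Split 'name:<ws>value' using ws_pattern for the whitespace after the colon.

    Hypothetical helper for illustration; this is not how any Hjson
    implementation is known to work.
    """
    match = re.fullmatch(r"(\w+):" + ws_pattern + r"(.*)", line)
    return match.group(2)


greedy = read_value(LINE, r"[ \t]*")     # consume all whitespace
lazy = read_value(LINE, r"[ \t]*?")      # consume as little as possible
one_space = read_value(LINE, r"[ \t]?")  # consume at most one character

print(repr(greedy), repr(lazy), repr(one_space))
```

Only the greedy reading produces the value without leading spaces, which is why the specification needs to state the rule explicitly.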

Handling inconsistent indentation for multiline strings

This is the current explanation of whitespace handling for multiline strings:

  • Whitespace on the first line is ignored.
  • The first three single quotes define the head. On the following lines all whitespace up to the column of the first single quote is ignored.
  • All other whitespace is assumed to be part of the string.
  • The last newline is ignored to allow for better formatting.

But consider this Hjson:

  '''
  one
two
  three
  '''

The rules do not say what to do when non-whitespace characters appear to the left of the column of the first single quote, as on the line containing two.

I think such formatting should be specified to be invalid Hjson. The document must be fixed by the author before being parsed. Otherwise it is unclear whether the author wanted the string to be "one\ntwo\nthree", "one\ntwo\n  three", or "  one\ntwo\n  three".
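As a rough illustration of why the intent is unclear, the sketch below applies two plausible dedent strategies to the body lines of the example. Both strategies are assumptions made for illustration, and neither is taken from the Hjson specification or an existing implementation. The helper strip_up_to_column is hypothetical; textwrap.dedent is the Python standard-library function that removes whitespace common to all lines.

```python
import textwrap

# Body lines of the example, between the opening and closing '''.
BODY = ["  one", "two", "  three"]
OPENING_INDENT = 2  # the opening ''' is preceded by two spaces


def strip_up_to_column(lines, width):
    """Remove at most `width` leading whitespace characters from each line."""
    out = []
    for line in lines:
        leading = len(line) - len(line.lstrip(" \t"))
        out.append(line[min(leading, width):])
    return "\n".join(out)


# Strategy 1: strip whatever whitespace exists, up to the opening column.
lenient = strip_up_to_column(BODY, OPENING_INDENT)

# Strategy 2: strip only the indentation shared by every line.
common = textwrap.dedent("\n".join(BODY))

print(repr(lenient))
print(repr(common))
```

The first strategy yields "one\ntwo\nthree" and the second yields "  one\ntwo\n  three", which are two of the three readings listed above. Both are defensible, so rejecting the input avoids having to guess.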

Also, in “whitespace up to the column of the first single quote”, “column” is not well-defined. Consider this multiline string, where . means a space and \t means a horizontal tab:

....'''
..\thello
....'''

What column does ..\t reach? You could specify that one tab equals 8 spaces, so “hello” starts on column 10, meaning the string starts with 6 spaces. Or you could specify that one tab equals 4 or 2 spaces, because those sizes are more common for indentation. Or you could specify that as in plain text, a tab moves to the next tabstop – the next column that is a multiple of 8. In that case, “hello” would be on column 8 and the string would start with 4 spaces.
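The arithmetic above can be checked with a short sketch. The function names are hypothetical, columns are counted from 0 as in the paragraph above, and str.expandtabs is the Python built-in that advances each tab to the next tabstop.

```python
def column_fixed(prefix, tab_width):
    """Column reached if every tab counts as a fixed number of spaces."""
    return sum(tab_width if ch == "\t" else 1 for ch in prefix)


def column_tabstop(prefix, tabstop=8):
    """Column reached if a tab advances to the next multiple of `tabstop`."""
    return len(prefix.expandtabs(tabstop))


PREFIX = "  \t"     # the ..\t in front of "hello"
OPENING_COLUMN = 4  # the opening ''' follows four spaces

for label, column in [
    ("tab = 8 spaces", column_fixed(PREFIX, 8)),
    ("tab = 4 spaces", column_fixed(PREFIX, 4)),
    ("tab = 2 spaces", column_fixed(PREFIX, 2)),
    ("tabstop of 8", column_tabstop(PREFIX, 8)),
]:
    print(f"{label}: hello at column {column}, "
          f"{column - OPENING_COLUMN} leading spaces in the string")
```

Each interpretation gives a different amount of leading whitespace in the resulting string (6, 2, 0, and 4 respectively), so "column" alone does not determine the value.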

Because all those options are plausible, I would suggest removing reference to “column” in the specification, and defining inconsistent indentation to be an error. Something like this:

If the sequence of whitespace characters on any line after the starting ''', including the line of the closing ''', does not exactly match the sequence of whitespace characters before the starting ''', then the multiline string is invalid Hjson. For example:

[give an example here]
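One way the proposed rule could be checked is sketched below. This is a minimal illustration under stated assumptions, not the behavior of any existing Hjson parser. It assumes the opening and closing ''' each sit on their own line, it reads "match" as "every line must begin with the exact whitespace characters that precede the opening '''", and it applies that strictly to empty lines too. The names dedent_multiline and InconsistentIndentError are hypothetical.

```python
class InconsistentIndentError(ValueError):
    """Raised when a line does not start with the opening indentation."""


def dedent_multiline(block):
    """Return the string value of a '''-delimited block, or raise.

    Assumption: `block` starts with a line holding only whitespace and '''
    and ends with a matching closing line.
    """
    lines = block.split("\n")
    head = lines[0]
    # The exact whitespace characters before the opening ''' (spaces or tabs).
    prefix = head[: len(head) - len(head.lstrip(" \t"))]
    if head[len(prefix):] != "'''":
        raise ValueError("expected opening ''' on its own line")
    if lines[-1] != prefix + "'''":
        raise InconsistentIndentError(
            "closing ''' must use the same indentation as the opening")
    body = []
    for number, line in enumerate(lines[1:-1], start=2):
        if not line.startswith(prefix):
            raise InconsistentIndentError(
                f"line {number} does not start with the opening indentation")
        body.append(line[len(prefix):])
    return "\n".join(body)


good = "  '''\n  one\n    two\n  '''"
print(repr(dedent_multiline(good)))
```

Because the comparison is on characters rather than columns, a line indented with two spaces and a tab under an opening indented with four spaces is rejected without any need to define a tab width.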

laktak (Member) commented Aug 22, 2016

Thanks for the feedback. The syntax page (http://hjson.org/syntax.html#quoteless-strings) does mention "preceding and trailing whitespace is ignored" but you are right that it is missing from the RFC.

I will update and clarify this (along with multiline strings) when I update the spec for #56.
