Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguous status of octothorpes and semicolons #46

Open
edukisto opened this issue Jul 15, 2021 · 12 comments
Open

Ambiguous status of octothorpes and semicolons #46

edukisto opened this issue Jul 15, 2021 · 12 comments

Comments

@edukisto
Copy link

On the one hand, an inline sequence with a leading # and ; “shall neither be parsed as a comment nor as part of the section name, pair (defined below) key or value in which it was inserted” (from the spec).

On the other hand, this and this normative tests allow # and ; inside values.

Could you please clarify which of the statements is correct?

@xuhdev
Copy link
Member

xuhdev commented Jul 15, 2021

This is strange. The spec should be correct. I think we should delete those two tests. Hope I'm not missing anything here? @jedmao @cxw42

@edukisto
Copy link
Author

edukisto commented Jul 15, 2021

It seems to me that it is necessary to clarify that # and ; are special characters that require escaping in section names, keys and values. An unescaped character makes the entire string invalid (non-standard).

@edukisto
Copy link
Author

Below are some examples.

EditorConfig for Visual Studio Code v0.16.4 ignores section names with unescaped # or ;. Properties of these sections get carried over to the previous section.

If you use Sublime Text plugin for EditorConfig v1.0.0, a single unescaped # or ; in a section name destroys the whole configuration. The same thing happens with the EditorConfig Vim Plugin v1.1.1.

All these plugins allow an inline comment after a key-value pair. In this case, the Sublime Text plugin requires a space before #/; and ignores the property otherwise.

@jednano
Copy link
Member

jednano commented Jul 16, 2021

The spec was written to align with the existing implementation and tests. See also these tests for the Rust implementation. I never really liked this rule, but it would be a breaking change to remove this "feature."

cc @florianb

@jednano
Copy link
Member

jednano commented Jul 16, 2021

Also, if the intent is to remove this behavior, deleting the tests is not enough. We should change the tests to explicitly not support this.

@florianb
Copy link
Member

florianb commented Jul 16, 2021 via email

@florianb
Copy link
Member

florianb commented Jul 16, 2021 via email

@edukisto
Copy link
Author

It is clear that developers implement non-standard inline comments at their own risk. But...

  1. “Section header... may contain any characters between the square brackets”.

    “Inserting a # or ; after non-whitespace characters in a line (i.e., inline) shall neither be parsed as a comment nor as part of the section name... in which it was inserted”.

    • Am I correct in understanding that # or ; may be a part of the header but not a part of the name (i. e., [foo#not_a_part_of_the_name] is valid and corresponds to the foo file)? AFAIK, all the plugin developers understood it differently. Some of them simply ignore such headers.
    • How do I specify a file named # or ;? According to the specification, [#] and [;] are invalid. The test set, at the same time, points directly to the solution, so if you want a # [1; 2] or ; [3; 4] in a section name, you should escape it. All the plugins I’ve seen associate [\#] and [\;] with the # and ; files respectively.
  2. “Inserting a # or ; after non-whitespace characters in a line (i.e., inline) shall neither be parsed as a comment nor as part of the... pair (defined below) key or value in which it was inserted”.

    • The specification implicitly defines # and ; inside values as non-recommended special characters, but tests don’t interpret them (both escaped and unescaped) this way [5; 6].

While the specification simply doesn’t recommend # and ; inside section names, keys, and values, the test set hints that these characters in escaped form are still possible, so developers should not simply ignore them.

@florianb
Copy link
Member

@edukisto thank you very much for pointing that out!

English is not my mother tongue – i guess i am not the best person to answer this right now and i always had my struggles with "neither" constructs.

For a more streamline EditorConfig future i would love to see inline comment symbols being entirely ignored (simply being treated as part of the name, key or value).

I don't know how others look at this issue - but i guess we need to clarify the specification at that point. I would also like to think about moving away from inline-comment handling at all.

@xuhdev
Copy link
Member

xuhdev commented Jul 20, 2021

I agree with @florianb that it is also an option to completely ignore inline comment symbols (simply being treated as part of the value (not key or name)). This is in line with what our tests currently enforce and is also a reasonable choice. I agree we should clarify on this -- shall we start a poll on this?

@edukisto
Copy link
Author

@florianb, @xuhdev, the main problem is that the specification implicitly classifies # and ; as special characters. These characters are not recommended for usage, but this is what makes them special. It seems to me that it would be wrong to just declare them as regular characters.

Moreover, according to these tests, there are three special characters: #, ;, and \.

I’ve tested C, JavaScript, and Python cores and here are some suggestions below.

  1. All the cores pass [test\#.c] (called from here) and [test\;.c] (called from here).

    If you decide that escaping of # and ; in section names is impossible, then [\#], [\;], and [\] will point to \#, \;, and '', respectively. This doesn’t match any implementation I know.

    What files do the section headers point to?

    Section header C and Python JavaScript Desired behavior
    [#] File error # #
    [\#] # # \#
    [;] File error ; ;
    [\;] ; ; \;
    [\] Line error \ \
    [\\] \ \ \\

    It seems to me that the requirement to escape #, ;, and \ inside section headers is the least painful way to eliminate the ambiguity.

  2. The JavaScript core fails [test11.c] (called from here) and [test5.c] (called from here).

    The JavaScript core also fails [test12.c] and [test6.c]. However, I can’t see the purpose of these tests, because they are not called from anywhere.

    Are inline comments included in values?

    Key/value pair C and Python JavaScript
    k=v#c v#c v
    k=v;c v;c v
    k=v\#c v\#c v\\
    k=v\;c v\;c v\\
    k=v #c v v
    k=v ;c v v
    k=v \#c v \#c v \\
    k=v \;c v \;c v\\

    As we can see, escaping of # and ; only matters if there is a space before the special character.

    Besides, there is no technical need to maintain inline comments.

    It is obvious that the behavior should be unified, but I don’t see a simple way to do it yet.

@florianb
Copy link
Member

@edukisto – thank you very much for your exhausting analysis. Indeed - the current ecosystem of Ini-parsers is very fragmented in multiple ways.

My intention at this point would be to first simplify the specification and fix any ambiguities. This would also issue the demand to fix all dependent implementations in the second step. This is no easy task and i guess it will take time.

The one-ini project already aims to get rid of these underlying parser based ambiguities. And i guess i should (besides of the fact that i need to spend time on one-ini after i return from holiday) also propose a draft for a process-guidance for implementors, also providing goals for text mangling tools, fixers and linters.

Your feedback came to the right time - we finally need to fix the issues with the ini format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants