Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime(java): Stop handpicking syntax groups for @javaTop #14727

Merged
merged 3 commits into from May 10, 2024

Conversation

zzzyxwvut
Copy link
Contributor

@zzzyxwvut zzzyxwvut commented May 7, 2024

Also:

  • Remove the obsolete comment for g:java_allow_cpp_keywords.
  • Remove the commented out groups java\%[Debug\]StringError.
  • Infer and set the preferred formatting Vim options from
    the modeline.
  • Set &encoding and &termencoding to "utf-8" for test files.
  • Limit non-ASCII charset to [§ƒɐɘʬʭΑ-Τα-μ] for test files.

Since vim-6-0u, non-contained syntax groups can be referred
to by using the contains=TOP... argument.

@zzzyxwvut zzzyxwvut marked this pull request as draft May 7, 2024 16:36
@zzzyxwvut
Copy link
Contributor Author

So, a syntax test for input/java_methods_style.java fails
on Mac (Huge, 14). Apparently for a multi-byte character 𝓔
(f0 9d 93 94).

@zzzyxwvut
Copy link
Contributor Author

Well, tests seem to pass now that &enc and &tenc are set
to "utf-8".

@zzzyxwvut
Copy link
Contributor Author

I have locally run make with script in xterm-256color
($TERM), with default &tenc and &enc, and with the options
set to "utf-8".

cd runtime/syntax/
script --log-out /tmp/make.script --log-timing /tmp/make.timing --command 'make clean test'
scriptreplay --divisor 0.05 --log-out /tmp/make.script --log-timing /tmp/make.timing

(I wonder if it would be beneficial to instrument make with
script for the CI so that its generated files and log files
can be downloaded for post-mortem investigation for matching
terminals.)

In either case, multi-byte characters are replaced on screen
with what appears like (ef bf bd) (Replacement Character)
when the generated script file is replayed at a slower rate.
So, I'm not sure if &enc and &tenc matter here.

It would be helpful if someone with access to Mac (12, 14)
and make have tried to locally reproduce test failure and
uploaded the generated failed/ test files.

@@ -341,8 +321,6 @@ if exists("java_highlight_debug")
syn region javaDebugStrTemplEmbExp contained matchgroup=javaDebugStrTempl start="\\{" end="}" contains=javaComment,javaLineComment,javaDebug\%(Paren\)\@!.*
syn region javaDebugStrTempl contained start=+\%(\.[[:space:]\n]*\)\@<="+ end=+"+ contains=javaDebugStrTemplEmbExp,javaDebugSpecial
syn region javaDebugStrTempl contained start=+\%(\.[[:space:]\n]*\)\@<="""[ \t\x0c\r]*$+hs=e+1 end=+"""+he=s-1 contains=javaDebugStrTemplEmbExp,javaDebugSpecial,javaDebugTextBlockError
" The next line is commented out, it can cause a crash for a long line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true? I'm curious about the bug rather than the error highlighting feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was never a test suite for it so I wouldn't know.

Apparently, javaStringError became a burden sometime around
October 2000, and javaDebugStringError was just recently
still its copy squirreled away.

Anyway, sooner or later any fledgeling Javaist would learn
that:

It is a compile-time error for a line terminator (§3.4) to
appear after the opening " and before the matching closing
".

and that no line splicing is supported for either "a" or
'a':

The line continuation escape sequence can appear in a text
block, but cannot appear in a character literal or a string
literal because each disallows a LineTerminator.

(JLS-21, §3.10.5, §3.10.7.)

Let's toss it out.

@chrisbra
Copy link
Member

chrisbra commented May 8, 2024

uploaded the generated failed/ test files.

Yeah, that is currently hard to do. I wondered if we can just simply cat all files from the failed/ directory, so one can at least copy/paste the content locally to be able to inspect it.

Somthing like we do for ASAN here:

- name: ASan logs
if: contains(matrix.extra, 'asan') && !cancelled()
run: |
for f in $(grep -lR '#[[:digit:]]* *0x[[:xdigit:]]*' "${LOG_DIR}"); do
asan_symbolize -l "$f"
false # in order to fail a job
done

Something like this perhaps?

bash -c "shopt -s globstar; tail -n +1  **/failed/*" 2>/dev/null || true

@zzzyxwvut
Copy link
Contributor Author

The failed test is not a regression. I have found at least
four other earlier run failures for Mac (Huge, 14): 1, 2, 3,
4, since the introduction of input/java_methods_style.java.
(The links are anchored at the top of the file, scroll down
a little to read the failure message.)

What is puzzling is that there are enough of successful test
runs in between the failures, and between the first failure
and the initial committing of the test file.

Maybe the fact that 𝓔 (Unicode category: Letter, Uppercase)
does not qualify as a letter in Vim but a punctuation symbol

vim/src/mbyte.c

Line 2951 in d5c8c09

{0x1d400, 0x1d7ff, 1}, // Mathematical Alphanumeric Symbols

can be a clue. (It is matchable with \P or \F, but not with
\k or \K.) Or is it a general aversion for characters made
of three or more bytes (it is four: f0 9d 93 94)?

Should I replace such problematic characters in tests?

Should I drop the setting of &enc and &tenc?

@chrisbra
Copy link
Member

chrisbra commented May 9, 2024

Yes, I would say keep it as simple as possible in the test, so 𝓔 should be replaced by something simpler. I think setting enc and tenc is fine, it's probably closer to what a user has set as well.

Also:

- Remove the obsolete comment for g:java_allow_cpp_keywords.
- Remove the commented out groups java\%[Debug\]StringError.
- Infer and set the preferred formatting Vim options from
  the modeline.

Since vim-6-0u, non-contained syntax groups can be referred
to by using the "contains=TOP..." argument.
@zzzyxwvut
Copy link
Contributor Author

With /[^[:alnum:][:blank:][:punct:]] for aid, I have looked
through all Java test files and replaced all characters made
of more than two bytes with either ASCII characters or some
two-byte characters ([§ƒɐɘʬʭΑ-Τα-μ]).

Let's see if these were the problem.

(By the way, testing on the linux console (Ctrl-Alt-F1, with
$TERM saying linux) is also successful, with and without
setting &enc and &tenc to "utf-8".)

@zzzyxwvut zzzyxwvut marked this pull request as ready for review May 9, 2024 16:46
@chrisbra
Copy link
Member

alright, thanks!

@chrisbra chrisbra merged commit 06bdac1 into vim:master May 10, 2024
35 checks passed
clason added a commit to clason/neovim that referenced this pull request May 10, 2024
runtime(java): Stop handpicking syntax groups for @javaTop (vim/vim#14727)

* runtime(java): Stop handpicking syntax groups for @javaTop

Also:

- Remove the obsolete comment for g:java_allow_cpp_keywords.
- Remove the commented out groups java\%[Debug\]StringError.
- Infer and set the preferred formatting Vim options from
  the modeline.

Since vim-6-0u, non-contained syntax groups can be referred
to by using the "contains=TOP..." argument.

* Set &encoding and &termencoding to "utf-8" for test files

* Limit non-ASCII charset to [§ƒɐɘʬʭΑ-Τα-μ] for test files

vim/vim@06bdac1

Co-authored-by: Aliaksei Budavei <[email protected]>
clason added a commit to neovim/neovim that referenced this pull request May 10, 2024
runtime(java): Stop handpicking syntax groups for @javaTop (vim/vim#14727)

* runtime(java): Stop handpicking syntax groups for @javaTop

Also:

- Remove the obsolete comment for g:java_allow_cpp_keywords.
- Remove the commented out groups java\%[Debug\]StringError.
- Infer and set the preferred formatting Vim options from
  the modeline.

Since vim-6-0u, non-contained syntax groups can be referred
to by using the "contains=TOP..." argument.

* Set &encoding and &termencoding to "utf-8" for test files

* Limit non-ASCII charset to [§ƒɐɘʬʭΑ-Τα-μ] for test files

vim/vim@06bdac1

Co-authored-by: Aliaksei Budavei <[email protected]>
@zzzyxwvut
Copy link
Contributor Author

It is still a problem, now with Τ (GREEK CAPITAL LETTER TAU)
of two bytes (ce a4) BUT for Mac (Huge, 12).

And there are two other candidates: input/vim_keymap.vim and
input/vim9_keymap.vim with their á (c3 a1) and (e2 80 9c).

Since it fails only so often, can it be time related (sleep
or TermWait())? And without actual failed/ files to look
at, it is still a mystery who's at fault here: readfile() or
term_dumpwrite()?

Another fun fact: when Vim is started with LC_ALL=C and it
is needed to set &enc and &tenc to "utf-8" for the terminal
buffer, it turns out that they must be set in a non-terminal
buffer as well to have effect for the terminal buffer. Huh?
So, you would probably just set it once before term_start()
is called. (That should account for replacement characters
() filling in for other multi-byte characters that was
mentioned above.)

@zzzyxwvut
Copy link
Contributor Author

@chrisbra, there appears to be a way to fetch failed/* files
with the upload-artifact action.

Storing artifacts uses storage space on GitHub. GitHub
Actions usage is free for standard GitHub-hosted runners
in public repositories, and for self-hosted runners.

Just following GitHub examples, we can give it a try with
(please double check my YAML):

jobs:

  . . . . . . .

  macos:
    steps:

      . . . . . . .

      - name: Test
        timeout-minutes: 20
        run: |
          make ${TEST}

      - name: Upload failed syntax tests
      - uses: actions/upload-artifact@v4
        with:
          # Name of the artifact to upload.
          # Optional. Default is 'artifact'
          # (Note that due to how artifacts are created in this v4
          # version, it is no longer possible to upload to the same
          # named artifact multiple times. You must either split the
          # uploads into multiple artifacts with different names, or
          # only upload once. Otherwise you will encounter an error.)
          name: ${{ matrix.runner }}-${{ matrix.features }}-${{ github.workflow_sha }}-${{ github.run_attempt }}-failed-syntax-tests

          # A file, directory or wildcard pattern that describes what
          # to upload.
          # Required.
          # (Note that within an individual job, there is a limit of
          # 500 artifacts that can be created for that job.)
          path: ${{ github.workspace }}/runtime/syntax/testdir/failed/

          # The desired behavior if no files are found using the
          # provided path.
          # Available Options:
          #   warn: Output a warning but do not fail the action
          #   error: Fail the action with an error message
          #   ignore: Do not output any warnings or errors, the action
          #     does not fail
          # Optional. Default is 'warn'
          if-no-files-found: ignore

          # Duration after which artifact will expire in days. 0 means
          # using default retention.
          # Minimum 1 day.
          # Maximum 90 days unless changed from the repository
          # settings page.
          # Optional. Defaults to repository settings.
          retention-days: 0

          # The level of compression for Zlib to be applied to the
          # artifact archive.
          # The value can range from 0 to 9.
          # For large files that are not easily compressed, a value of
          # 0 is recommended for significantly faster uploads.
          # Optional. Default is '6'
          compression-level: 9

          # If true, an artifact with a matching name will be deleted
          # before a new one is uploaded.
          # If false, the action will fail if an artifact for the
          # given name already exists.
          # Does not fail if the artifact does not exist.
          # Optional. Default is 'false'
          overwrite: true

Zip files should be available for download:

At the bottom of the workflow summary page, there is
a dedicated section for artifacts.

and there are also REST API endpoints for Curl and friends.

@chrisbra
Copy link
Member

thanks, I created #14771 for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants