Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copying subtitles with diacritics unexpectedly strips spaces #101

Open
artjomsR opened this issue Jul 27, 2023 · 6 comments
Open

Copying subtitles with diacritics unexpectedly strips spaces #101

artjomsR opened this issue Jul 27, 2023 · 6 comments

Comments

@artjomsR
Copy link
Contributor

artjomsR commented Jul 27, 2023

Environment

  1. OS name
    Windows 10, mpvacious_v0.23
    Describe the bug
    A description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Load subtitles with diacritics. Copy the subtitles either automatically or using Ctrl+C.
  2. Subtitles have spaces trimmed if string contains diacritics. So, "héllo héllo" becomes "héllohéllo"

Expected behavior
Subtitles are copied as-is

I can't seem to isolate where the problem lies. Individual components work correctly but collectively it doesn't work. A bat file outputs the content as expected:

@echo off
chcp 65001 >nul
echo héllo héllo|clip

I've also tried hardcoding the string and that produced strings with whitespaces correctly:

self.copy_to_clipboard = function(text)
    ...
    mp.commandv("run", "cmd.exe", "/d", "/c", string.format("@echo off & chcp 65001 >nul & echo %s|clip", "héllo héllo"))

I tried using set /p instead of echo but that didn't do anything

I've tried powershell but that didn't seem to work at all - it just flashes a terminal window without copying anything, maybe I didn't get the params quite right

Any suggestions would be appreciated

@tatsumoto-ren
Copy link
Member

tatsumoto-ren commented Jul 27, 2023

Any suggestions would be appreciated

You can use powershell to send text to clipboard, like it was done in videoclip, here. Not sure if you tried this exact method or not.

Also, have you tried GNU/Linux? Here is a list of recommended GNU distributions.

@artjomsR
Copy link
Contributor Author

Thank you for getting back to me! I've figured out the problem: this works fine before 0.20. And 0.21 is a breaking change. I've set nuke_spaces=no and it works as expected, strings with diacritics do not get spaces removed! 🎉

I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?

Thank you for your suggestion, I'm using Linux as well but haven't switched completely due to other tools I'm using :)

@tatsumoto-ren
Copy link
Member

I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?

Yes

@tatsumoto-ren
Copy link
Member

It would be beneficial if someone made a PR that replaces cmd.exe calls with powershell when copying text to the system clipboard. Initially, we implemented copying via cmd.exe for compatibility with windows 7, but today I think people don't use that anymore. I can't do this PR myself because I don't use windows.

@artjomsR
Copy link
Contributor Author

artjomsR commented Jul 29, 2023

If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd, but rather the implementation of nuke_spaces. And I think this issue will be present in Mac / Unix systems too

I think it's a bug, because it's clearly confusing how spaces are being removed (and it used to work fine before 0.21). I think nuke_spaces should either check that

  • ALL characters are non-latin and only then trim spaces
  • Or maybe enable spaces as soon as ANY latin character is detected (if it's a mix of e.g. English and Japanese we still want spaces?)

@tatsumoto-ren
Copy link
Member

If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd

Some concerns have been raised regarding the use of cmd. #20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants