Resolving interop-issue while running wsl-vpnkit as a systemd-service in own wsl-distro #247

Open
dabeck81 opened this issue Nov 8, 2023 · 12 comments

Comments

@dabeck81

dabeck81 commented Nov 8, 2023

Description of issue:

I created my own WSL distribution in which I wanted wsl-vpnkit to start immediately, via a systemd service, whenever the distro starts.

For this I created a Dockerfile containing the following instructions:

RUN wget -cO wsl-vpnkit.tar.gz "https://github.com/sakai135/wsl-vpnkit/releases/download/v0.4.1/wsl-vpnkit.tar.gz" \
    && mkdir -p /opt/wsl-vpnkit \
    && tar --directory /opt/wsl-vpnkit --strip-components=1 -xf wsl-vpnkit.tar.gz app/wsl-vpnkit app/wsl-gvproxy.exe app/wsl-vm \
    && rm wsl-vpnkit.tar.gz

RUN systemctl enable wsl-vpnkit

As you can see, wsl-vpnkit is installed under /opt/wsl-vpnkit.

I adapted the wsl-vpnkit.service file to point to that directory, but the service's journalctl logs kept showing the error "wsl-gvproxy.exe is not executable due to WSL interop settings or Windows permissions".

Solution implemented

The problem was that when the systemd service started, no WSL_INTEROP environment variable was set to point to the socket used for running .exe files in a WSL distro.
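
A quick way to confirm this diagnosis from inside the service is to log the state of WSL_INTEROP before launching anything. The sketch below is only illustrative: the function name interop_status is made up for this example and is not part of wsl-vpnkit.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: report the state of WSL_INTEROP.
#   unset  -> the variable is missing or empty (the situation described above)
#   socket -> the variable points at an existing socket file
#   stale  -> the variable is set, but the path is not a socket
interop_status() {
    if [ -z "${WSL_INTEROP:-}" ]; then
        echo "unset"
    elif [ -S "$WSL_INTEROP" ]; then
        echo "socket"
    else
        echo "stale"
    fi
}
```

Calling interop_status at the start of the service's command line should make the result show up in the journalctl log, which helps tell a missing variable apart from a Windows permission problem.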

I came up with the following solution after reading the WSL thread about this issue: microsoft/WSL#5065

Adapt the service file as follows:

[Unit]
Description=wsl-vpnkit

[Service]
# for wsl-vpnkit setup as a standalone script
# important to set type to idle, we want the service to be one of the last ones to be executed
Type=idle
# before running the wsl-vpnkit script we want to provide the correct WSL_INTEROP variable
ExecStart=/bin/sh -c '. /etc/systemd/system/wsl-interop-env.sh; /opt/wsl-vpnkit/wsl-vpnkit'
Environment=VMEXEC_PATH=/opt/wsl-vpnkit/wsl-vm
Environment=GVPROXY_PATH=/opt/wsl-vpnkit/wsl-gvproxy.exe

Restart=always
KillMode=mixed

[Install]
WantedBy=multi-user.target

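And here is the wsl-interop-env.sh script that initializes the WSL_INTEROP variable:

#!/bin/sh

# keep the live interop socket with the highest PID and remove stale ones
export WSL_INTEROP=
for socket in $(ls /run/WSL|sort -n); do
   # -w matches whole words only, so 1_interop does not also match 11_interop
   if ss -elx | grep -qwF -- "$socket"; then
      export WSL_INTEROP=/run/WSL/$socket
   else
      # use the full path: the working directory is not /run/WSL
      rm -f "/run/WSL/$socket"
   fi
done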

Proposal as a change in wsl-vpnkit

Add a check to the wsl-vpnkit script that sets the WSL_INTEROP variable when it is empty or nonexistent, based on the code in the wsl-interop-env.sh script.
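
A minimal sketch of what such a check could look like, reusing the loop from wsl-interop-env.sh. The function name ensure_wsl_interop and its optional directory argument are assumptions for illustration, not code that wsl-vpnkit ships:

```shell
#!/bin/sh
# Hypothetical helper: set WSL_INTEROP only when it is empty or unset.
# The directory argument defaults to /run/WSL and exists mainly for testing.
ensure_wsl_interop() {
    dir=${1:-/run/WSL}
    # Respect a value that the environment already provides.
    [ -n "${WSL_INTEROP:-}" ] && return 0
    for name in $(ls "$dir" 2>/dev/null | sort -n); do
        # Keep the last (highest-PID) socket that has a listener.
        if ss -lx 2>/dev/null | grep -qwF -- "$name"; then
            WSL_INTEROP=$dir/$name
        fi
    done
    # Fail when no listening socket was found.
    [ -n "${WSL_INTEROP:-}" ] || return 1
    export WSL_INTEROP
}
```

With a helper like this near the top of the script, a service started without the variable would pick up a socket on its own, while an interactive shell that already has WSL_INTEROP set would be left untouched.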

If you have another working proposal for running wsl-vpnkit as a service within your own distro, please let me know.

Thanks for this awesome project.

@pauko

pauko commented Nov 14, 2023

Hi @dabeck81!

I encountered the same problem with wsl-vpnkit. Thanks to your instructions above I was able to get it running partially, but not as I'd expect. It works as long as I start the systemd service manually with sudo, but not on start-up of my WSL distribution (after I did systemctl enable wsl-vpnkit). Then the "wsl-gvproxy.exe not executable" error shows up again.

I understand you implemented the "standalone script" setup variant mentioned in the top-level README.

What needs to be modified, in addition to your fix, so that wsl-vpnkit doesn't run into the "wsl-gvproxy.exe not executable" issue when using the "distro setup"?

@pauko

pauko commented Nov 15, 2023

I have to correct my comment above; my bad.
I did not copy @dabeck81's wsl-vpnkit.service exactly. For instance, I missed the Type=idle.

My experience might also be affected by another issue I encountered with the local proxy px, which I use in combination with wsl-vpnkit (see "px 0.8.4: PAC configuration seems to cause unstable behaviour"). Maybe there is an interdependency.

So, to conclude, it seems that @dabeck81's solution does work!

However, I'm still interested in the "distro setup".

@santschi

santschi commented Nov 15, 2023

I used @dabeck81's script and modified it for the distro setup. I only did the following:

Create the script:

/usr/local/bin/start-vpn-kit.sh

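#!/bin/sh

# keep the live interop socket with the highest PID and remove stale ones
export WSL_INTEROP=
for socket in $(ls /run/WSL|sort -n); do
   # -w matches whole words only, so 1_interop does not also match 11_interop
   if ss -elx | grep -qwF -- "$socket"; then
      export WSL_INTEROP=/run/WSL/$socket
   else
      # use the full path: the working directory is not /run/WSL
      rm -f "/run/WSL/$socket"
   fi
done

# also start wsl-vpnkit
/mnt/c/Windows/system32/wsl.exe -d wsl-vpnkit --cd /app wsl-vpnkit

Don't forget to chmod +x the script.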

Modify the service to use the script:

/etc/systemd/system/wsl-vpnkit.service

[Unit]
Description=wsl-vpnkit
After=network.target

[Service]
# for wsl-vpnkit setup as a distro
ExecStart=/usr/local/bin/start-vpn-kit.sh

Restart=always
KillMode=mixed

[Install]
WantedBy=multi-user.target

That was already sufficient to get my setup working again.

@dabeck81
Author

I have created a pull request to resolve my issue: #250
It sets the WSL_INTEROP value correctly inline in the wsl-vpnkit script, so there is no longer any need for an extra script that runs before the service starts.

Most importantly, it is self-healing: if you have more than one bash terminal running and you close the process linked to the WSL_INTEROP socket, this is detected and the script switches over to another running socket.

@grepwood

grepwood commented Nov 22, 2023

@dabeck81 your PR assumes that your interop socket will live forever which is just not true. The stability of your solution will literally hang on the skin of your Bash terminal's teeth. If you close all those terminals, you will ensure that wsl-vpnkit will never find a working interop socket until you manually restart the service from a new terminal.
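
One way a service script might cope with this situation, where no usable socket exists until a new terminal is opened, is to keep polling instead of giving up after a single lookup. This is a rough sketch and not part of the PR: retry_until is a name invented for this example, and the command it runs stands for whatever socket lookup the script already uses.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds
# or until roughly <timeout> seconds have been spent sleeping.
# Usage: retry_until <timeout-seconds> <interval-seconds> <command> [args...]
retry_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    while ! "$@"; do
        # Give up once the time budget is used.
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, retry_until 60 2 find_live_socket would retry a (hypothetical) find_live_socket lookup every 2 seconds for about a minute. This only reduces the need for a manual restart; it does not remove the dependency on some terminal owning a socket.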

I think a better solution would be the ability to request and free WSL interop sockets, much like memory is requested and freed in C. For this purpose, I have created microsoft/WSL#10812 and described my use case and the findings my team and I made.

Secondly, I don't think WSL_INTEROP="/run/WSL/$(ls /run/WSL | sort -n | tail -n 1)" is going to produce reliable results on faster machines. On my machine, the sockets created by /sbin/init (systemd, PID 1), /init (child of systemd, PID 2) and my terminal (PID whatever), are created within the same second. A more reliable way I think would be to compare the sockets' creation mtime, which is accurate to 10^-10th of a second:

WSL_INTEROP=$(find /run/WSL -type s -printf "%T+ %p\n" | sort -nk1 | tail -n1 | awk '{print $2}')

I'll be honest. I run into this scenario where there is no valid interop socket on a daily basis, because I use qterminal. I don't want to use PowerShell's terminal because it's slow and strange, and cmd.exe is just feature-famished - the choice is obvious for me. I found that Qt apps in WSL2 have a bug where the first time they start in the runtime of a WSL2 instance, the GUI of all Qt apps is not interactive. You need to shut them all down and start one of them again, and then all Qt apps will be interactive until you reboot the WSL2 instance. This is how the cookie crumbles on my end :(

@dabeck81
Author

@grepwood,
I just ran your proposal based on the socket's mtime. Your solution is equivalent to mine:

  • Yours looks at the creation time of the socket.
  • Mine looks for the highest PID. Because the name of the _interop socket holds the PID of the process responsible for creating it, the socket with the highest PID is also the most recently created one. There is no need to include the creation time this way.
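
For comparison, both selection strategies can be written as small functions. This is a sketch under assumptions: the names newest_by_pid and newest_by_mtime are invented here, the directory is passed as an argument, and find -printf requires GNU find. Note that the PID-based variant only matches creation order as long as PIDs have not wrapped around.

```shell
#!/bin/sh
# Name of the entry that carries the highest PID, e.g. 101_interop.
# sort -n compares the leading number, so 9_interop sorts before 10_interop.
newest_by_pid() {
    ls "$1" 2>/dev/null | sort -n | tail -n 1
}

# Full path of the socket with the most recent modification time.
# %T@ prints the mtime as seconds since the epoch with a fractional part.
newest_by_mtime() {
    find "$1" -type s -printf '%T@ %p\n' 2>/dev/null |
        sort -n | tail -n 1 | cut -d ' ' -f 2-
}
```

Running both against /run/WSL on the same machine and comparing the results is a simple way to check whether the two approaches agree in a given setup.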

@blakeduffey

Any word on getting these PRs approved/merged and a new version out?

@nealey

nealey commented Nov 28, 2023

I've thrown in on microsoft/WSL#10812, providing the logs they requested.

Based on my cursory examination, it looks like @grepwood has it exactly right: the /run/WSL/1_interop socket is magical somehow, and won't allow other processes to open Windows apps. This is why the systemd script is failing, and why the solutions to "borrow" a socket from a terminal work.

@dabeck81
Author

Any word on getting these PRs approved/merged and a new version out?

I am afraid only @sakai135 is able to approve and merge the pull request, so we will need to wait for that.

@grepwood

I've read @OneBlue's post and I have a dastardly idea how to cheese the system for our gain. I'll let you know if it worked.

@grepwood

And there it is https://gist.github.com/grepwood/8e42b2bd0e56cd964cbd77b6d182aff0

@nealey

nealey commented Dec 1, 2023

And there it is https://gist.github.com/grepwood/8e42b2bd0e56cd964cbd77b6d182aff0

"dastardly" was right!

6 participants