Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait does not return on signal with handler #10193

Open
stevenpelley opened this issue Jan 6, 2024 · 4 comments
Open

wait does not return on signal with handler #10193

stevenpelley opened this issue Jan 6, 2024 · 4 comments

Comments

@stevenpelley
Copy link

> uname -a
Darwin MacBook-Pro.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:12 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8103 arm64
> fish --version
fish, version 3.7.0

the "wait" command does not return when receiving a signal for which a handler is defined. Consider the following script:

#!/usr/bin/env fish                                                                                     
                                                                                                        
fish -c 'sleep 300' &                                                                                   
set mypid $last_pid

echo "bg pid $mypid"
echo "this pid $fish_pid"                                                                               
                                                                                                        
set sig SIGTERM                                                                                         

function myhandler --on-signal $sig --inherit-variable mypid --inherit-variable sig                     
  echo "kill -$sig $mypid"                                                                              
  kill -$sig $mypid
  echo "after kill"
end
  
while jobs;                                                                                             
  # uncommenting this and commenting out echo...wait line below verifies the
  # signal handler runs when sleep returns.
  #sleep 5

  wait $mypid
  echo "wait status $status"                                                                   
end

If I kill -SIGTERM <PID> with the script's PID nothing happens. When I uncomment the "sleep 5" and comment out the "wait" I see the output of "jobs" every 5 seconds, and when I again kill -SIGTERM <PID> it will execute the handler and subsequently terminate once the "sleep" completes.

Changing "set sig SIGTERM" to "set sig SIGINT" results in different behavior: With "wait" (not "sleep") a SIGTERM sent to the fish script (kill -SIGINT, not ctrl-c) does cause wait to return and then execute the handler. It correctly returns 130 (128 + SIGINT's 2) according to POSIX sh's wait semantics (not documented in fish). However, the SIGINT does not kill the child fish process and so the loop continues. It appears that even a manual kill -SIGINT <PID> fails to kill it. But this is a different problem or undocumented intended behavior (couldn't find anything, sorry if it is documented).

The documentation doesn't actually specify that wait will return immediately on any signal with a defined hander/trap, but the syntax so closely matches POSIX and related shells' wait command that I assume the intent is to match those semantics. Ideally I want something akin to https://www.gnu.org/software/bash/manual/html_node/Signals.html to list the intricacies of various signals, handlers, and interaction with commands/builtins

@stevenpelley
Copy link
Author

Out of curiosity I made a test of trap behavior for bash, zsh, and fish at https://github.com/stevenpelley/shell-wait-trap-test
While building this I learned that bash ignores SIGTERM in the absence of any traps so that kill 0 terminates all processes in the shell's process group aside from the shell itself. I suspect the behavior I'm seeing in fish was to provide the same behavior, but it was removed in all cases, not just interactive mode.

If I decide to take this any further I'll post elsewhere first (mailing list?)

@faho
Copy link
Member

faho commented Jan 11, 2024

If I decide to take this any further I'll post elsewhere first (mailing list?)

The mailing list is pretty inactive and not where we tend to discuss bugs or enhancements.
This is the right spot to post things like this, we're just busy.

And speaking for myself I barely use wait so I'm not sure what exactly the right behavior here is. Truth be told I've been hoping for someone else to figure it out.

@faho faho added this to the fish-future milestone Jan 11, 2024
@stevenpelley
Copy link
Author

Thanks @faho
I learned about/investigated wait in bash, zsh, and posix sh for other reasons and wrote an opinion piece in the README at https://github.com/stevenpelley/bash-playground
If nothing else this can act as a bit of documentation/reference and includes a few links to other tools' docs.

My take is that bash/zsh/sh wait is itself "incomplete" and has become confusing as it evolved. In https://github.com/stevenpelley/bash-playground?tab=readme-ov-file#proposed-new-options I list my desired behavior. Since fish isn't aiming to be posix compliant there's an opportunity to rethink wait's behavior. Of course, you want some out of the box familiarity for those coming from other shells. If fish provided my desired behavior it would be a strong reason for me to use it over other shells. I've also seen a number of confused and frustrated posts on HN, reddit, and stack overflow by people trying to solve similar problems with traditional shells.

Some related details that I find confusing:

Fish has TRAP, and allows disabling signals with an empty string (TRAP "" SIGTERM), but it's not clear how/if this interacts with event handlers for signals. Does this disable event handlers for the same signal, or does it only ignore default signal handling and undo previous calls to TRAP?

ignored traps are also ignored in subshells (I know there are none in fish) and any child processes (https://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html -- paragraph starts Signals set to the default action; also discussed in https://pubs.opengroup.org/onlinepubs/009695399/utilities/trap.html). I haven't tested this, it's just something I ran into with bash, is related to this discussion, and the behavior should be defined whether it follows sh or not.

@stevenpelley
Copy link
Author

stevenpelley commented Feb 22, 2024

I'm interested in moving this forward and might be willing to contribute depending on how messy it gets.

In the short term I believe this is solved in signals.rs::fish_signal_handler in "match sig" SIGTERM case by adding topic_monitor_principal().post(topic_t::sighupint) as is done for SIGINT and SIGHUP. This would cause a handled SIGTERM to awake wait(). Note that if SIGTERM is not observed (there is no defined handler) we'd be resetting the TERM handler and raising SIGTERM anyways, so it would never get there, as intended.

Simply doing this leaves some confusion (or contributes to it, posting to a signal-named topic that doesn't include "term") and additionally does nothing for all of the other signals that might be observed/trapped but do not post to a topic monitor.
I recommend/TODO:

  • hoisting (or adding a new) topic monitor posting out of "match sig" into the preceding "if observed" block. Any observed/trapped signal should awake a call to wait(), and so must post a topic monitor.
  • Investigate the downstream effects of the defined topics. Today topics are clearly "a thing that can happen" but to awake wait() we may want to additionally allow topics that are "a thing I want to do" so that we don't need to define a topic for every signal. If the existing topics fit this pattern it may simply be renamed, otherwise a new or additional organization of topics may be required.
  • the "implementation details" paragraph below lists a number of mechanisms to list or post events (topic monitors, pending events, handle_winch, reader_sighup, CANCELLATION_SIGNAL, reader_handle_sigint). Investigate which of these are repetitive or can otherwise be refactored by common triggers even if the downstream state needs to stay separated. The goal is to streamline the logic of fish_signal_handler to make it clear what needs to happen in all cases if a signal is observed, in all cases unconditionally, and in cases for specific signals.

Reference:
Opengroup doc on wait and sh signals

wait and trap behavior. I want to aim for POSIX-compatible sh semantics, and deviate from that when there's a good reason to. Or at least there are useful aspects of POSIX sh that I'm aiming for with this improvement.
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11 (section 2.11)

When a signal for which a trap has been set is received while the shell is waiting for the completion of a utility executing a foreground command, the trap associated with that signal shall not be executed until after the foreground command has completed. When the shell is waiting, by means of the wait utility, for asynchronous commands to complete, the reception of a signal for which a trap has been set shall cause the wait utility to return immediately with an exit status >128, immediately after which the trap associated with that signal shall be taken.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/wait.html for general wait semantics. I'm going to skip advanced features like -n for now and focus on general behavior and interaction with signals.

Rust implementation details: to check my understanding and document.
src/signal.rs::fish_signal_handler is the single signal handler for all signals.
event.rs::is_signal_observed/OBSERVED_SIGNALS lists signals with some event handler
in fish_signal_handler if signal is observed it is enqueued via event.rs::enqueue_signal/PENDING_SIGNALS. This will be queried in fire_delayed around every statement and other places, at any time where we may run event handlers.

In fish_signal_handler there are then a number of per-signal actions in a match block. These contain:
topic monitor posts (sighupint for SIGHUP and SIGINT; sigchld for SIGCHLD) that awakes a call to command wait via topic_monitor:await_gens/check proc:process_mark_finished_children/proc_wait_any wait:wait_for_completion/wait. But I don't see anything requiring that the signal is "observed" -- it might simply be that in all cases the topic_monitor signals must wake wait() because a child finished, the signal is observed, or the signal is going to terminate/interrupt the script. This may be not be correct for all signal types. Posting topic monitors is unconditional on whether the signal is observed. SIGTERM does not post a topic_monitor. Git annotate on the 3.7.0 branch's C++ code looks like this is a historical oversight, not deliberate.
"not observed" conditional behavior -- SIGHUP reader_sighup(); SIGTERM restore term foreground process group for exit, reset SIGTERM handler to default, raise SIGTERM; SIGINT store to CANCELLATION_SIGNAL
observed-independent unconditional behavior -- SIGWINCH term resize; SIGINT reader_handle_sigint() -- input.rs reads reader's INTERRUPTED. I assume this is detecting ctrl-C

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants