containerized iamlive proxy doesn't generate --output-file on SIGHUP nor on exit #57

timblaktu opened this issue Aug 26, 2022 · 10 comments

Comments

@timblaktu

timblaktu commented Aug 26, 2022

Thank you for creating this amazing project! My iamlive container, running v0.49.0, is now successfully proxying aws cli requests, as proven by its stdout captured in the following docker log entry, output in response to aws sts get-caller-identity --debug --profile <myprofile>:

my-iamlive-1  | {"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["sts:GetCallerIdentity"],"Resource":"*"}]}

...but it is not dumping this text into its --output-file, neither at graceful exit nor on SIGHUP.

The iamlive container is based on this one, and executes iamlive in its entrypoint as:

    /app/iamlive \
        --output-file ${IAMLIVE_SHARED_PATH}/iamlive.log \
        --mode proxy \
        --bind-addr 0.0.0.0:10080 \
        --ca-bundle ${IAMLIVE_SHARED_PATH}/ca.pem \
        --ca-key ${IAMLIVE_SHARED_PATH}/ca.key \
        | jq -c .

The ${IAMLIVE_SHARED_PATH} folder (actually /home/appuser/.iamlive) is the container mount point for a named docker volume that is shared with another "client container" being monitored for AWS API calls. Below is the relevant excerpt from the docker compose config that orchestrates these two containers.

services:
  main:
    environment:
      - IAMLIVE_SHARED_PATH=${IAMLIVE_SHARED_PATH}
    .
    .
    .
    build:
      args:
        - IAMLIVE_SHARED_PATH=${DEFAULT_CONTAINER_HOME}/${IAMLIVE_SHARED_FOLDER}
    .
    .
    .
    # Ensure iamlive container is run before main
    depends_on:
      - "iamlive"
    volumes:
      - "dot-iamlive:${DEFAULT_CONTAINER_HOME}/${IAMLIVE_SHARED_FOLDER}:rw"
  iamlive:
    environment:
      - IAMLIVE_SHARED_PATH=${IAMLIVE_SHARED_PATH}
    .
    .
    .
    build:
      args:
        - IAMLIVE_SHARED_PATH=${IAMLIVE_SHARED_PATH}
    .
    .
    .
    user: "${HOST_UID}:${HOST_GID}"
    expose:
       - "${IAMLIVE_LISTENER_HTTP_HTTPS_PORT}"
    volumes:
      - "dot-iamlive:${IAMLIVE_SHARED_PATH}:rw"
volumes:
  dot-iamlive:

(All I've removed from the above are many superfluous and noisy variable definitions.)

Mutual access to the shared volume has already been proven to work correctly, since:

  1. the IAMLIVE_SHARED_PATH mount point in the iamlive container is where iamlive dumps the required ca.pem and ca.key files and
  2. the IAMLIVE_SHARED_PATH mount point in the main (client) container is where the client application instructs the AWS CLI to read the certificate bundle from, using:
export AWS_CA_BUNDLE=${IAMLIVE_SHARED_PATH}/ca.pem

As mentioned at the top, I've also proven that this scheme of two applications/containers is working in terms of networking and application configuration. The iamlive proxy is receiving the sts get-caller-identity request and dumping to stdout a policy document correctly containing an sts:GetCallerIdentity action.

The issue

I've yet to see an iamlive.log file get dumped.

Use case 1: iamlive exits

At first, I had the main container sleep several seconds after the successful aws sts get-caller-identity transaction, and then exit. Because main depends_on iamlive, main is stopped first, then iamlive. Here, I expected that the iamlive application would be sent SIGTERM, catch it, and then run this code to write (and flush??) GetPolicyDocument()'s return value to the outputFileFlag. Since the file being written is in a mounted folder backed by a docker volume on the host, I expected it to persist until the next container run if it got written at all. (There is nothing in this system that deletes files in that shared volume folder.)

Use case 2: SIGHUP

Next, I modified the project to enable the client application running in the main container to send UNIX process signals to other containers, specifically so that it could send SIGHUP to the iamlive container as a way to force it to dump the policy to disk before exiting. For the curious, this required:

  1. sharing with main the path to the host's docker daemon socket via a host docker volume, and
  2. ensuring the user in main has the correct permissions to access that socket, and
  3. POSTing to this socket at the URL http://localhost/containers/<target_container_id>/kill?signal=SIGHUP (a sketch of the call is shown after this list)
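
For reference, a minimal sketch of what that call could look like from inside the main container, assuming the host's docker daemon socket is mounted at /var/run/docker.sock and curl is available (the container ID is a placeholder):

    # Send SIGHUP to the iamlive container via the Docker Engine API over the
    # mounted daemon socket; a 204 No Content response indicates success.
    curl --silent --show-error \
        --unix-socket /var/run/docker.sock \
        -X POST \
        "http://localhost/containers/<target_container_id>/kill?signal=SIGHUP"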

When testing this, however, everything worked just fine (the POST gets a 204 No Content response, which is the expected "successful" result for this API call), except that the iamlive.log file did not get dumped. I confirmed that I was using the correct docker daemon API and the correct target_container_id by removing the ?signal=SIGHUP part of the URL (which makes the call send SIGKILL by default) and observing the iamlive container exit immediately after the request was POSTed from the client application running in the main container.

Summary

So, this feels like a bug, but I could also use some help troubleshooting this from the iamlive side, so please send me any ideas you have on troubleshooting techniques for this app. I've not seen any debugging or verbose mode, nor have I looked at the source code much yet, but I am now stumped and receptive to any help. I realize that this usage mode is unusual (most people seem to be monitoring AWS CLI activity from the host system instead of from another container), which is why I explained myself so thoroughly. Still, let me know if you need any more info to help.

Thanks again for creating this amazing project!

@timblaktu
Author

Since my goal here is actually to monitor terraform calls, despite this issue I went ahead and modified the client application to wait to enable the iamlive proxy (by setting the HTTP*_PROXY and AWS_CA_BUNDLE env vars, roughly as sketched below) until right before it calls terraform apply, and "let her rip". As in the case reported above, where I was just doing a single AWS API call, I can see in the iamlive container's stdout as it runs that the proxy is indeed working as expected.
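
For context, a minimal sketch of that client-side proxy enablement, assuming the compose service name iamlive resolves from the main container and the 10080 port from the bind-addr shown earlier (both are specific to this setup, not iamlive defaults):

    # Route AWS CLI / terraform provider requests through the iamlive proxy and
    # trust the CA certificate that iamlive wrote into the shared volume.
    export HTTP_PROXY=http://iamlive:10080
    export HTTPS_PROXY=http://iamlive:10080
    export AWS_CA_BUNDLE=${IAMLIVE_SHARED_PATH}/ca.pem

    terraform apply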

I will report back later if I have the same issue with the file not getting dumped, as another data point. The terraform project I'm running takes about 20 minutes to run and is creating and bootstrapping an EKS cluster.

@iann0036
Owner

Hey @timblaktu,

Thanks for raising and for providing your detailed setup.

I've heard from others that containerising can significantly mess with signal processing. I also haven't been able to nail down the true solution here; however, there is an addition that may help you.

If you use the --refresh-rate flag and set it to 1, then have a sleep 1; after your terraform execution, the system should have all the actions captured due to this continuous writing mechanism. Basically it'll write every second regardless of the signals. It's not as clean, but it might help you through some of your issue.

Let me know how you go.

@timblaktu
Author

timblaktu commented Aug 26, 2022

Thanks for the tip, @iann0036. So far, no joy using --refresh-rate 1 in the iamlive call in my container's entrypoint:

/app/iamlive --output-file /home/appuser/.iamlive/iamlive.log \
    --mode proxy --refresh-rate 1 --bind-addr 0.0.0.0:10080 \
    --ca-bundle /home/appuser/.iamlive/ca.pem \
    --ca-key /home/appuser/.iamlive/ca.key \
    | jq -c .

Still no output file appearing at the specified location. But now I have some ideas:

  1. @iann0036 Is there some way to make the iamlive process produce more verbose logging output about the operation of the application, as opposed to the generated policy doc? I'd love to get more insight into what's happening in the signal handler function, or even confirm that it's running at all in my case.
  2. I wonder if there's something fishy going on with my docker entrypoint command using a pipe to format output with jq. I will test removing that.
    a. UPDATE: tried this to no avail. The jq -c . pipe was actually working fine as is, since I was seeing the generated policy appear on one line at stdout.

@timblaktu
Author

timblaktu commented Aug 26, 2022

@iann0036 re: signal handling in iamlive (or any) containers, I have learned that signals issued to containers by docker kill, or by the corresponding docker daemon API call, are sent only to the container process with PID 1, i.e. the entrypoint process. If an entrypoint process spawns other processes that need to handle signals, then as described in the docker docs it must either use the exec/gosu pattern, or spawn child processes normally, implement signal handlers with the trap command, forward the signals to the child processes, and perform any other desired cleanup.
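
For illustration, a sketch of the exec approach in this context; it assumes the jq formatting pipe is dropped, since handing PID 1 to iamlive means it can no longer sit at the head of a pipeline:

    #!/usr/bin/env bash
    # docker-entrypoint-iamlive.sh (sketch): replace the shell with iamlive so
    # that iamlive itself becomes PID 1 and receives SIGTERM/SIGHUP directly.
    exec /app/iamlive \
        --output-file "${IAMLIVE_SHARED_PATH}/iamlive.log" \
        --mode proxy \
        --bind-addr 0.0.0.0:10080 \
        --ca-bundle "${IAMLIVE_SHARED_PATH}/ca.pem" \
        --ca-key "${IAMLIVE_SHARED_PATH}/ca.key"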

In my case, I defined a docker ENTRYPOINT that was a script which called iamlive in a bash subshell (because of the script's shebang: #!/usr/bin/env bash), which resulted in a process tree that prevented signals from ever reaching the iamlive app:

/app $ ps aux
PID   USER     TIME  COMMAND
    1 1000      0:00 bash /app/docker-entrypoint-iamlive.sh
   10 1000      0:03 /app/iamlive --output-file /home/appuser/.iamlive/iamlive.log --mode proxy --refresh-rate 1 --bind-addr 0.0.0.0:10080 --ca-bundle /home/appuser/.iamlive/ca.pem --ca-key /home/appuser/.iamlive/ca.key
   59 1000      0:00 sh
   66 1000      0:00 ps aux

So, it's clear that I was never seeing an output file because iamlive was never receiving the SIGTERM or SIGHUP signals from the docker daemon. iamlive was instead being killed by its parent bash process, and this was happening ungracefully (SIGKILL) and unhandled.

I didn't really want the added complexity of signal handlers in my entrypoint script, but I preferred that over installing another package (gosu) onto my container, so I decided to just trap the four signals iamlive is listening for (SIGHUP, SIGTERM, SIGINT, and SIGQUIT) in the parent process and pass them on to iamlive. I'm implementing this now and will report back with results.
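
A minimal sketch of such a trap-and-forward entrypoint, with the jq pipe and most flags elided for brevity (the helper name is illustrative):

    #!/usr/bin/env bash
    # docker-entrypoint-iamlive.sh (sketch): run iamlive in the background and
    # relay the four signals it listens for to that child process.
    /app/iamlive --output-file "${IAMLIVE_SHARED_PATH}/iamlive.log" --mode proxy &
    iamlive_pid=$!

    forward() {
        # Pass the trapped signal straight on to iamlive.
        kill -s "$1" "${iamlive_pid}" 2>/dev/null
    }

    for sig in HUP TERM INT QUIT; do
        trap "forward ${sig}" "${sig}"
    done

    # Keep waiting while the child is alive; wait returns early whenever a
    # trapped signal (e.g. SIGHUP) interrupts it, so loop until iamlive exits.
    while kill -0 "${iamlive_pid}" 2>/dev/null; do
        wait "${iamlive_pid}"
    done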

@timblaktu
Author

I'm definitely getting an --output-file dumped on SIGTERM now, which is progress! SIGHUP is being caught by my script and passed on to the iamlive process, but I'm not getting any file dumped from that event. I was also hoping that SIGHUP wouldn't cause the iamlive process to abort, but that's the behavior I'm seeing. I can see in the handler code that SIGHUP is a special case that does NOT call os.Exit(), so there must be some issue in my environment. I'm simply trying to use SIGHUP to force-dump the file at strategic points in the execution of a terraform project, coupled with copying/moving the dumped file from the shared volume, so that I can essentially use iamlive like an "IAM profiler" of arbitrary processes that make AWS calls. I'll report more as the plot thickens. Thanks for your support.
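
For illustration, that "profiler" workflow might look roughly like this sketch, assuming the SIGHUP is delivered with docker kill (or the daemon API call described earlier) and that the snapshot path is a made-up example:

    # Ask iamlive to flush its policy document, then snapshot the dumped file so
    # later flushes don't overwrite it.
    docker kill --signal=SIGHUP my-iamlive-1
    sleep 1
    cp "${IAMLIVE_SHARED_PATH}/iamlive.log" ./snapshots/policy-after-step-1.json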

@timblaktu
Author

@iann0036 I have since discovered NO_PROXY and as such have less reason to "control" iamlive with signals from my client process. I'd like to keep this issue open until I resolve it, but for now I wanted to suggest that the README include NO_PROXY in the proxy-mode instructions alongside HTTP_PROXY, to cover situations where the monitored client process sends requests to endpoints other than the AWS APIs.
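
A sketch of what that README addition could look like; the 127.0.0.1:10080 proxy address matches iamlive's usual proxy-mode setup, while the NO_PROXY hosts are purely illustrative:

    # Proxy AWS API traffic through iamlive, but bypass the proxy for endpoints
    # the monitored process also talks to (the excluded hosts are examples).
    export HTTP_PROXY=http://127.0.0.1:10080
    export HTTPS_PROXY=http://127.0.0.1:10080
    export NO_PROXY=localhost,127.0.0.1,registry.terraform.io
    export AWS_CA_BUNDLE=~/.iamlive/ca.pem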

@jgrumboe

Have you tried adding tini to your container with iamlive? https://computingforgeeks.com/use-tini-init-system-in-docker-containers/
Tini is an init process for containers, built especially for forwarding signals to forked processes.

@timblaktu
Author

timblaktu commented Sep 2, 2022

@jgrumboe No, I've never used --init or the tini init system. Thanks. Perhaps I'll try it as a sanity check, since it looks like a C implementation of what I've done in bash (probably with some error...). Its -g option makes it forward SIGHUP (and most other interesting signals) to the child process group. Reading the docs, I don't see any way to pass tini args using docker --init, so I probably have to call tini from my entrypoint (literally replacing my bash script with tini). Thanks again.

Tini discussion stating that it's just a signal forwarder.

@jgrumboe

jgrumboe commented Sep 2, 2022

You're welcome. 👍 Maybe you can report back your findings.

@stv-io

stv-io commented Sep 21, 2023

Suggestion, probably aimed towards @iann0036: would it make sense to add a Dockerfile to this repo, capturing the findings and ideas presented here? There could even be a workflow which builds and pushes an image to ghcr.io, along with the rest of the binary releases.
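
For discussion, a minimal sketch of what such a Dockerfile might look like, pulling together the signal-handling findings from this thread; the base images, build command, and paths are assumptions rather than the project's actual layout:

    # Sketch only: build iamlive from source and run it under tini -g so that
    # SIGTERM/SIGHUP reach the process and the --output-file gets written.
    FROM golang:1.21-alpine AS build
    WORKDIR /src
    COPY . .
    RUN go build -o /out/iamlive .

    FROM alpine:3.19
    RUN apk add --no-cache tini
    COPY --from=build /out/iamlive /app/iamlive
    ENTRYPOINT ["/sbin/tini", "-g", "--", "/app/iamlive"]
    CMD ["--mode", "proxy", "--bind-addr", "0.0.0.0:10080"]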
