Broken workflows due to the driver's lack of [re]start container capabilities #301

Open
walterdolce opened this issue Jun 20, 2018 · 3 comments



walterdolce commented Jun 20, 2018

Hi,

I am not sure this is intended behaviour, but I have found myself having to manually manage the containers originally created by test-kitchen via the kitchen-docker driver, because the driver is currently unable to (re)start the containers it created.

For example, when I start from scratch with a clean slate (no container running and no image pulled from upstream sources), the usual kitchen create, kitchen converge, kitchen verify workflow works as expected.

But if I am not yet done with something and leave things as they are for the day (restart the laptop, etc.) and come back the next day, then the moment I tell test-kitchen to create or converge, I start to get the following:

kitchen create -l debug                                                                                                                             

-----> Starting Kitchen (v1.21.2)
D      [local command] BEGIN (docker >> /dev/null 2>&1)
D      [local command] END (0m0.07s)
WARN: Unresolved specs during Gem::Specification.reset:
      winrm (~> 2.0)
      winrm-fs (~> 1.0, ~> 1.1)
      docker-api (~> 1.26)
      aws-sdk (~> 2)
      addressable (>= 2.5.1, ~> 2.3, ~> 2.4, ~> 2.5)
      multi_json (~> 1.10, ~> 1.11)
      mixlib-versioning (>= 0)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
-----> Creating <default-centos-7>...
D      [kitchen::driver::docker command] BEGIN (docker -H unix:///var/run/docker.sock port a9f6376b90514ce79fb1c49433348c95528687cd221ad89d09d5b29f134fe45e 22/tcp)
       0.0.0.0:32770
D      [kitchen::driver::docker command] END (0m0.06s)
D      [SSH] opening connection to kitchen@localhost<{:user_known_hosts_file=>"/dev/null", :port=>32770, :compression=>false, :compression_level=>0, :keepalive=>true, :keepalive_interval=>60, :timeout=>15, :keys_only=>true, :keys=>["/path/to/the/project/folder/.kitchen/docker_id_rsa"], :auth_methods=>["publickey"], :verify_host_key=>false}>
D      [SSH] connection failed (#<Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:32770>)
       Waiting for SSH service on localhost:32770, retrying in 3 seconds

D      [SSH] opening connection to kitchen@localhost<{:user_known_hosts_file=>"/dev/null", :port=>32770, :compression=>false, :compression_level=>0, :keepalive=>true, :keepalive_interval=>60, :timeout=>15, :keys_only=>true, :keys=>["/path/to/the/project/folder/.kitchen/docker_id_rsa"], :auth_methods=>["publickey"], :verify_host_key=>false, :logger=>#<Logger:0x007fb1e614c888 @progname=nil, @level=4, @default_formatter=#<Logger::Formatter:0x007fb1e614c838 @datetime_format=nil>, @formatter=nil, @logdev=#<Logger::LogDevice:0x007fb1e614c7c0 @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDERR>>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x007fb1e614c748>>>, :password_prompt=>#<Net::SSH::Prompt:0x007fb1e614c6f8>, :user=>"kitchen"}>
D      [SSH] connection failed (#<Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:32770>)
       Waiting for SSH service on localhost:32770, retrying in 3 seconds

# ... the above repeats indefinitely...

As you can see, it is stuck endlessly retrying the connection. This is also slightly confusing, because it does not say what the real issue is; that only becomes apparent when a destroy command is issued. Please see below.

kitchen destroy -l debug                                                                                                                             


-----> Starting Kitchen (v1.21.2)
D      [local command] BEGIN (docker >> /dev/null 2>&1)
D      [local command] END (0m0.08s)
WARN: Unresolved specs during Gem::Specification.reset:
      winrm (~> 2.0)
      winrm-fs (~> 1.0, ~> 1.1)
      docker-api (~> 1.26)
      aws-sdk (~> 2)
      addressable (>= 2.5.1, ~> 2.3, ~> 2.4, ~> 2.5)
      multi_json (~> 1.10, ~> 1.11)
      mixlib-versioning (>= 0)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
-----> Destroying <default-centos-7>...
D      [kitchen::driver::docker command] BEGIN (docker -H unix:///var/run/docker.sock top a9f6376b90514ce79fb1c49433348c95528687cd221ad89d09d5b29f134fe45e)
       Error response from daemon: Container a9f6376b90514ce79fb1c49433348c95528687cd221ad89d09d5b29f134fe45e is not running
D      [kitchen::driver::docker command] END (0m0.06s)
D      [kitchen::driver::docker command] BEGIN (docker -H unix:///var/run/docker.sock rmi bfb1c2035a5a)
       Error response from daemon: conflict: unable to delete bfb1c2035a5a (must be forced) - image is being used by stopped container a9f6376b9051
D      [kitchen::driver::docker command] END (0m0.06s)
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Failed to complete #destroy action: [Expected process to exit with [0], but received '1'
---- Begin output of docker -H unix:///var/run/docker.sock rmi bfb1c2035a5a ----
STDOUT:
STDERR: Error response from daemon: conflict: unable to delete bfb1c2035a5a (must be forced) - image is being used by stopped container a9f6376b9051
---- End output of docker -H unix:///var/run/docker.sock rmi bfb1c2035a5a ----
Ran docker -H unix:///var/run/docker.sock rmi bfb1c2035a5a returned 1] on default-centos-7
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

The real issue here seems to be that test-kitchen (via the kitchen-docker driver) is unable to do anything because of the stopped container a9f6376b9051.
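
As a stopgap, and assuming the IDs shown in the output above, removing the stopped container by hand frees the image, after which the destroy should be able to complete (a sketch of a manual workaround, not a fix):

docker rm a9f6376b9051    # remove the stopped container that is holding the image
kitchen destroy           # the rmi step should now succeed

This obviously defeats the point of the driver managing the containers itself.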

Shouldn't the driver be able to restart stopped containers so that it can then perform the actions it has been instructed to perform (create, converge, destroy, etc.)?
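
In docker CLI terms, the kind of behaviour I have in mind would be roughly the following (purely illustrative, not how the driver currently works):

docker inspect -f '{{.State.Running}}' a9f6376b9051   # prints "false" for a stopped container
docker start a9f6376b9051                              # bring it back up before waiting on SSH
docker port a9f6376b9051 22/tcp                        # re-read the host port so the recorded state can be refreshed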

Thanks.

PS: I think this is somehow related to #285 as stopped containers are mentioned there, too.


walterdolce commented Jun 20, 2018

It is probably worth noting that manually starting the stopped container does not help either, because it will most likely come back up bound to a different port.

For example, continuing from what was reported above:

docker start a9f6376b9051

This will start the container, but now its port 22 is bound to a different host port than before (32770). Please see below.

 docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                   NAMES
a9f6376b9051        bfb1c2035a5a        "/usr/sbin/sshd -D -…"   18 hours ago        Up 39 seconds       0.0.0.0:32768->22/tcp   defaultcentos7-random-container-name-j4ij3pcd

This means that a kitchen create or kitchen converge still cannot do anything with the [re]started container, because the driver no longer knows how to reach it: it still expects the container on the old port (32770), as recorded in the .kitchen/default-centos.yml file generated by the driver itself.
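
As a rough illustration (the exact key names in the generated state file may differ), the new mapping can at least be read back and copied into that file by hand:

docker port a9f6376b9051 22/tcp
# prints e.g. 0.0.0.0:32768 -> update the recorded port to 32768

That gets kitchen talking to the container again, but it is exactly the kind of manual intervention described below.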

Manually restarting the container as shown above does at least allow engineers to run kitchen destroy and start from scratch, but this is, IMO, not ideal.

I guess the above highlights the driver's lack of behaviour for dealing with these scenarios, and that manually [re]starting the stopped container doesn't actually solve the problem: it leaves things in a broken state and reliant on "manual intervention".

Hope this makes sense. If you have any questions, let me know!

walterdolce changed the title from "Broken workflows due to inability to [re]start containers" to "Broken workflows due to the driver's lack of [re]start container capabilities" on Jun 20, 2018
@coderanger (Contributor)

Are you trying to run kitchen from inside a container? That's usually the cause of these kinds of things.

@walterdolce (Author)

@coderanger no, I'm not.
