ec2_vpc_nat_gateway using a dynamically-allocated eIP sometimes fails with botocore exception InvalidElasticIpID.NotFound #1872

Open
1 task done
pluto00987 opened this issue Nov 21, 2023 · 2 comments
Labels
jira waiting_on_contributor Needs help. Feel free to engage to get things unblocked

Comments

pluto00987 commented Nov 21, 2023

Summary

Creating a NAT gateway with ec2_vpc_nat_gateway using a dynamically-allocated eIP sometimes fails with the botocore exception InvalidElasticIpID.NotFound. This happens even though the eIP allocation it references (eipalloc-0faae3f7d465f76f9 in the example error message below) does exist, at least after the fact, and even though no eIP is provided in the YAML, so the module is creating that eIP itself (as expected).

It's unclear to me why this happens, i.e. whether it's a collection issue or a boto issue. I don't see any 'state' or similar attribute on an eIP that would suggest it might not be 'ready' as soon as it 'exists', so I'm not sure whether or how the collection could check for that between eIP creation and NAT gateway creation.

This is with amazon.aws collection 6.2.0, but I don't see any changes to ec2_vpc_nat_gateway.py in newer 6.x versions.
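
One conceivable check between eIP creation and NAT gateway creation is to poll DescribeAddresses until the new allocation ID is visible. The following is only a sketch of that idea, not something the collection does: `wait_for_eip_allocation` is a hypothetical helper, `client` is assumed to be a boto3 EC2 client, and the `InvalidAllocationID.NotFound` error code is an assumption about how a not-yet-visible allocation is reported. Even when the describe call succeeds, CreateNatGateway might still not see the allocation, so this may narrow the window rather than close it.

```python
import time


def wait_for_eip_allocation(client, allocation_id, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll DescribeAddresses until allocation_id is visible.

    Hypothetical helper: returns True once the allocation is listed,
    False if it never shows up within `attempts` polls.
    """
    for _ in range(attempts):
        try:
            response = client.describe_addresses(AllocationIds=[allocation_id])
        except Exception as exc:
            # botocore's ClientError carries the code in response["Error"]["Code"].
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            # Assumption: an allocation that is not visible yet fails with this code.
            if code != "InvalidAllocationID.NotFound":
                raise
        else:
            if response.get("Addresses"):
                return True
        sleep(delay)
    return False
```

The `sleep` parameter exists only so the delay can be stubbed out when experimenting without real AWS calls.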

Issue Type

Bug Report

Component Name

ec2_vpc_nat_gateway

Ansible Version

$ ansible --version
ansible [core 2.14.3]
  config file = /runner/project/ansible.cfg
  configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/requirements_collections:/home/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.16 (main, Dec  8 2022, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Collection Versions

$ ansible-galaxy collection list

# /usr/share/ansible/collections/ansible_collections
Collection              Version
----------------------- -------
@NAMESPACE@.@NAME@      3.0.1
amazon.aws              5.4.0
ansible.posix           1.5.1
ansible.windows         1.13.0
awx.awx                 21.13.0
azure.azcollection      1.15.0
community.vmware        *
google.cloud            1.1.3
kubernetes.core         2.4.0
openstack.cloud         2.0.0
redhatinsights.insights 1.0.7
theforeman.foreman      3.9.0

# /runner/requirements_collections/ansible_collections
Collection         Version
------------------ -------
amazon.aws         6.2.0
ansible.netcommon  3.1.0
ansible.utils      2.11.0
ansible.windows    1.11.1
awx.awx            19.2.2
community.aws      6.1.0
community.docker   1.9.0
community.general  3.4.0
community.windows  1.11.0
oasis_roles.system 1.1.3

AWS SDK versions

$ pip show boto boto3 botocore
WARNING: Package(s) not found: boto
Name: boto3
Version: 1.26.99
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: botocore, jmespath, s3transfer
Required-by:
---
Name: botocore
Version: 1.29.99
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

Configuration

$ ansible-config dump --only-changed
ANSIBLE_FORCE_COLOR(env: ANSIBLE_FORCE_COLOR) = True
ANSIBLE_PIPELINING(/runner/project/ansible.cfg) = True
COLLECTIONS_PATHS(env: ANSIBLE_COLLECTIONS_PATHS) = ['/runner/requirements_collections', '/home/runner/.ansible/collections', >
CONFIG_FILE() = /runner/project/ansible.cfg
DEFAULT_CALLBACK_PLUGIN_PATH(env: ANSIBLE_CALLBACK_PLUGINS) = ['/runner/artifacts/2081/callback']
DEFAULT_ROLES_PATH(env: ANSIBLE_ROLES_PATH) = ['/runner/requirements_roles', '/home/runner/.ansible/roles', '/usr/share/ansibl>
DEFAULT_STDOUT_CALLBACK(env: ANSIBLE_STDOUT_CALLBACK) = awx_display
HOST_KEY_CHECKING(env: ANSIBLE_HOST_KEY_CHECKING) = False
INVENTORY_UNPARSED_IS_FAILED(env: ANSIBLE_INVENTORY_UNPARSED_FAILED) = True
RETRY_FILES_ENABLED(env: ANSIBLE_RETRY_FILES_ENABLED) = False

OS / Environment

CentOS Stream release 9

Steps to Reproduce

- name: Ensure the VPC has NAT gateway for agent subnets
  amazon.aws.ec2_vpc_nat_gateway:
    if_exist_do_not_create: yes
    region: "{{ region }}"
    subnet_id: "{{ subnet_id }}"
    wait: yes
  register: natgw
  when: agent_nat

Expected Results

This should create a new public NAT gateway, using a freshly-allocated Elastic IP.

Actual Results

"An error occurred (InvalidElasticIpID.NotFound) when calling the CreateNatGateway operation: The elasticIp ID 'eipalloc-0faae3f7d465f76f9' does not exist"

"Traceback (most recent call last):
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/modules/ec2_vpc_nat_gateway.py\", line 630, in create
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/retries.py\", line 105, in deciding_wrapper
    return retrying_wrapper(*args, **kwargs)
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py\", line 119, in _retry_wrapper
    return _retry_func(
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py\", line 68, in _retry_func
    return func()
  File \"/usr/local/lib/python3.9/site-packages/botocore/client.py\", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File \"/usr/local/lib/python3.9/site-packages/botocore/client.py\", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidElasticIpID.NotFound) when calling the CreateNatGateway operation: The elasticIp ID 'eipalloc-0f0e36392ebfc5490' does not exist
",

Code of Conduct

  • I agree to follow the Ansible Code of Conduct

pluto00987 (Author) commented

On the surface this seems similar to #1320 but I don't think it's quite the same issue.

tremble (Contributor) commented Nov 21, 2023

The most likely cause is the AWS APIs being "eventually" consistent (the same as #1320). Sometimes an API call returns something like the ID of a net-new resource before that resource can be consistently referenced by other calls.

Updating the client creation call to something like the following will probably work around the issue:

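# AWSRetry lives in the collection's module_utils (see retries.py in the traceback above).
from ansible_collections.amazon.aws.plugins.module_utils.retries import AWSRetry

# Also retry when the freshly allocated eIP is not yet visible to CreateNatGateway.
retry_decorator = AWSRetry.jittered_backoff(
    catch_extra_error_codes=["InvalidElasticIpID.NotFound"],
)
client = module.client("ec2", retry_decorator=retry_decorator)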

Would you be willing to open a PR?
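
To illustrate the general pattern behind the suggestion above (retry with jittered backoff when a specific extra error code comes back), here is a minimal standalone sketch. It is not the collection's `AWSRetry` implementation: `call_with_backoff` and `FakeClientError` are hypothetical names, the latter standing in for botocore's `ClientError`, and the backoff parameters are arbitrary.

```python
import random
import time


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, code):
        super().__init__(code)
        # botocore exposes the error code under response["Error"]["Code"].
        self.response = {"Error": {"Code": code}}


def call_with_backoff(func, retry_codes, retries=5, base_delay=0.5,
                      max_delay=10.0, sleep=time.sleep):
    """Call func(), retrying with jittered exponential backoff.

    Only errors whose code is in retry_codes are retried; anything else,
    or the final failed attempt, is re-raised to the caller.
    """
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in retry_codes or attempt == attempts - 1:
                raise
            # Jitter: sleep a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

In this sketch, a create call that initially fails with `InvalidElasticIpID.NotFound` would be passed as `func` with `retry_codes={"InvalidElasticIpID.NotFound"}`, so transient "not found" responses are retried while unrelated errors surface immediately.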

@hakbailey hakbailey added the waiting_on_contributor Needs help. Feel free to engage to get things unblocked label Nov 28, 2023
@gravesm gravesm added jira and removed needs_triage labels Dec 5, 2023