vector refuses to start when connectivity to one/any external service is not working #20382

Open
james-stevens opened this issue Apr 26, 2024 · 0 comments
Labels
domain: config Anything related to configuring Vector domain: observability Anything related to monitoring/observing Vector type: bug A code related bug.

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When running vector on RHEL9, the file /usr/lib/systemd/system/vector.service contains the line

 ExecStartPre=/usr/bin/vector validate

This means vector will refuse to start if an external connection is unavailable at startup, instead of starting and then retrying the connection, which is what it does when a connection goes down after a successful start.

From our config, I tried removing

healthchecks:
  require_healthy: true

and removing this from every sink

    healthcheck:
      enabled: true

but vector validate still fails, so the service refuses to start.

I suggest that the ExecStartPre=/usr/bin/vector validate line in the vector.service file have either --no-environment or --skip-healthchecks added, so that vector starts up and retries the external connection afterwards, which is what it does when the connection fails during normal operation.
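
As a local workaround until the packaged unit changes, the same idea can be sketched as a systemd drop-in. This is only a sketch under assumptions: the drop-in path and file name are my own choice, not something shipped by the vector package, and it relies on --no-environment skipping the component and health checks during validation, as I understand that flag.

```ini
# /etc/systemd/system/vector.service.d/override.conf
# (hypothetical drop-in path; any *.conf file in this directory would do)
[Service]
# An empty assignment clears the ExecStartPre= inherited from the packaged unit.
ExecStartPre=
# Still validate the configuration syntax, but without environment checks
# (assumed to cover the sink health checks that fail when a peer is unreachable).
ExecStartPre=/usr/bin/vector validate --no-environment
```

After creating the file, systemctl daemon-reload followed by a restart of the vector service should pick it up. Substituting --skip-healthchecks for --no-environment would be the narrower variant of the same change.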

Because we use vector to run other data migration services, in this case aggregating metrics, having all of them fail because one (or more) is not working is not a useful mode of operation.

Configuration

api:
  enabled: true
  address: 127.0.0.1:8686

expire_metrics_secs: 300

healthchecks:
  require_healthy: true

sources:
  vector_metrics:
    type: internal_metrics


  services_metrics:
    type: prometheus_scrape
    scrape_interval_secs: 15
    scrape_timeout_secs: 2
    endpoints:
      - "http://127.0.0.1:9200/metrics"
      - "http://127.0.0.1:9167/metrics"


  dnstap:
    type: dnstap
    socket_path: /var/lib/vector/dnstap.sock
    socket_file_mode: 0o777
    mode: unix
    multithreaded: true

  relay_blocks:
    type: vector
    address: 10.17.252.114:9001

sinks:
  output_my_prom:
    type: prometheus_exporter
    address: 172.17.252.114:9100
    inputs:
      - vector_metrics
      - services_metrics

  vector_dnstap:
    inputs: [ dnstap ]
    type: vector
    address: "<hostname>:9000"
    buffer:
      max_size: 2684354880
      type: "disk"
      when_full: "drop_newest"
    healthcheck:
      enabled: true
    tls:
      enabled: true
      ca_file: /etc/vector/pems/myCA.pem
      key_file: /etc/vector/pems/vector.pem
      crt_file: /etc/vector/pems/vector.pem
      key_pass: "****"
      verify_certificate: true
      verify_hostname: true

  vector_relay_blocks:
    inputs: [ relay_blocks ]
    type: vector
    address: "<hostname>:9001"
    buffer:
      max_size: 2684354880
      type: "disk"
      when_full: "drop_newest"
    healthcheck:
      enabled: true
    tls:
      enabled: true
      ca_file: /etc/vector/pems/myCA.pem
      key_file: /etc/vector/pems/vector.pem
      crt_file: /etc/vector/pems/vector.pem
      key_pass: "****"
      verify_certificate: true
      verify_hostname: true

Version

vector 0.37.0 (x86_64-unknown-linux-gnu c1da408 2024-03-26 13:41:34.870460047)

Debug Output

# vector validate
√ Loaded ["/etc/vector/vector.yaml"]
√ Component configuration
2024-04-26T11:02:06.910152Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} } component_kind="sink" component_type="vector" component_id=vector_relay_blocks
x Health check for "vector_relay_blocks" failed: Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} }
2024-04-26T11:02:07.010816Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} } component_kind="sink" component_type="vector" component_id=vector_dnstap
x Health check for "vector_dnstap" failed: Request failed: status: Unavailable, message: "error trying to connect: error:0A000086:SSL routines:(unknown function):certificate verify failed:ssl/statem/statem_clnt.c:2092:: unable to get local issuer certificate", details: [], metadata: MetadataMap { headers: {} }
√ Health check "output_my_prom"

Example Data

No response

Additional Context

No response

References

No response

@james-stevens james-stevens added the type: bug A code related bug. label Apr 26, 2024
@jszwedko jszwedko added domain: observability Anything related to monitoring/observing Vector domain: config Anything related to configuring Vector labels Apr 26, 2024