No logs with reduce + starts_with #20452

Closed
beliys opened this issue May 7, 2024 · 12 comments

Labels
type: bug A code related bug.

Comments

beliys commented May 7, 2024


Problem

Hello! I need help understanding what I am doing wrong.
I have the following logs:

Info;myservice-6.40.268-dev-1-1565;<10000121289>;network.Connection.~Connection;23360962488064;Connection has been destroyed (create_count=121291, destroy_count=121290, connections_count=1)
Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
        UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
        WHERE id=3416;
        ]
Info;myservice-6.40.268-dev-1-1565;;myMgr::Update;23360987703040;Prepare data for saving myData: 0ms
Info;myservice-6.40.268-dev-1-1565;;myMgr::Update;23360987703040;Send myData take: 0ms
Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
        UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
        WHERE id=3416;
        ]

With the configuration below everything is delivered, but for this type of log:

Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
        UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
        WHERE id=3416;
        ]

four separate events are created instead of one. Here is the configuration:

data_dir: "/var/lib/vector"
expire_metrics_secs: 60
api:
 enabled: true
 address: 0.0.0.0:8686
 playground: true
sources:
 vector_agents:
   type: vector
   address: 0.0.0.0:9000
   version: "2"
transforms:
 prepare_logs:
   type: remap
   inputs:
     - vector_agents
   source: |-
     .body = del(.)
     .service = del(.body.kubernetes.container_name)
     .namespace = del(.body.kubernetes.pod_namespace)
     .node_name = del(.body.kubernetes.pod_node_name)
     .message = del(.body.message)
     .timestamp = del(.body.timestamp)
     del(.body)
sinks:
 opensearch:
   type: elasticsearch
   api_version: v8
   inputs:
     - prepare_logs
   endpoints:
     - https://127.0.0.1:9200
   auth:
     strategy: "basic"
     user: "vector"
     password: "MyPass"
   compression: gzip
   batch:
     max_bytes: 200000
   distribution:
     retry_max_duration_secs: 300
   mode: "bulk"
   bulk:
     action: "create"
     index: "{{ .service }}-%Y.%m.%d"
   encoding:
     except_fields: ["index_name"]
   healthcheck: true
   tls:
     verify_hostname: true
     verify_certificate: false
   acknowledgements:
     enabled: true

I wanted the start of a new message to be any line that does not begin with a space.
Here is my attempt to do that with reduce + starts_with:

data_dir: "/var/lib/vector"
expire_metrics_secs: 60
api:
 enabled: true
 address: 0.0.0.0:8686
 playground: true
sources:
 vector_agents:
   type: vector
   address: 0.0.0.0:9000
   version: "2"
transforms:
 prepare_logs:
   type: remap
   inputs:
     - vector_agents
   source: |-
     .body = del(.)
     .service = del(.body.kubernetes.container_name)
     .namespace = del(.body.kubernetes.pod_namespace)
     .node_name = del(.body.kubernetes.pod_node_name)
     .message = del(.body.message)
     .timestamp = del(.body.timestamp)
     del(.body)
 multiline_reduce:
   type: "reduce"
   inputs:
     - prepare_logs
   merge_strategies:
     .message: "concat_newline"
   starts_when:
     source: starts_with!(.message, " ")
     type: vrl
sinks:
 opensearch:
   type: elasticsearch
   api_version: v8
   inputs:
     - multiline_reduce
   endpoints:
     - https://127.0.0.1:9200
   auth:
     strategy: "basic"
     user: "vector"
     password: "myPass"
   compression: gzip
   batch:
     max_bytes: 200000
   distribution:
     retry_max_duration_secs: 300
   mode: "bulk"
   bulk:
     action: "create"
     index: "{{ .service }}-%Y.%m.%d"
   encoding:
     except_fields: ["index_name"]
   healthcheck: true
   tls:
     verify_hostname: true
     verify_certificate: false
   acknowledgements:
     enabled: true

But with this configuration the logs stop being collected, and I see the following error in Vector's log:

2024-05-07T07:36:34.836322Z ERROR sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=19}: vector::sinks::elasticsearch::service: Response contained errors. error_code="http_response_200" response=Response { status: 200, version: HTTP/1.1, headers: {"content-type": "application/json; charset=UTF-8", "content-length": "610"}, body: b"{\"took\":0,\"errors\":true,\"items\":[{\"create\":{\"_index\":\"<null>-2024.05.01\",\"_id\":null,\"status\":400,\"error\":{\"type\":\"invalid_index_name_exception\",\"reason\":\"Invalid index name [<null>-2024.05.01], must not contain the following characters [ , \\\", *, \\\\, <, |, ,, >, /, ?]\",\"index\":\"<null>-2024.05.01\",\"index_uuid\":\"_na_\"}}},{\"create\":{\"_index\":\"<null>-2024.05.01\",\"_id\":null,\"status\":400,\"error\":{\"type\":\"invalid_index_name_exception\",\"reason\":\"Invalid index name [<null>-2024.05.01], must not contain the following characters [ , \\\", *, \\\\, <, |, ,, >, /, ?]\",\"index\":\"<null>-2024.05.01\",\"index_uuid\":\"_na_\"}}}]}" }

Any help would be appreciated.


Version

vector 0.37.1 (x86_64-unknown-linux-gnu cb6635a 2024-04-09 13:45:06.561412437)


Example Data

Info;myservice-6.40.268-dev-1-1565;<10000121289>;network.Connection.~Connection;23360962488064;Connection has been destroyed (create_count=121291, destroy_count=121290, connections_count=1)
Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
WHERE id=3416;
]

Info;myservice-6.40.268-dev-1-1565;;myMgr::Update;23360987703040;Send myData take: 0ms
Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
WHERE id=3416;
]
Info;myservice-6.40.268-dev-1-1565;;myMgr::Update;23360987703040;Prepare data for saving myData: 0ms


beliys added the type: bug label on May 7, 2024
jszwedko (Member) commented May 7, 2024

Hi @beliys,

I think your reduce condition is not quite right. What you have:

   starts_when:
     source: starts_with!(.message, " ")
     type: vrl

Would cause merging to start on the line after the one you actually want to start on (that is, a new event would begin on a line that starts with a space). Instead, I think you want to start a new event when the line begins with the log level, like:

   starts_when:
     source: starts_with!(.message, "Debug") || starts_with!(.message, "Info") # etc
     type: vrl

This will cause the reduce transform to start a new event at each line that begins with a log level and merge everything up to the next line that begins with a log level.
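
For illustration (the intended result as I understand it, not verified output): with the concat_newline merge strategy from your configuration, the four lines of the Debug query should then leave the transform as a single event whose .message is:

Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
        UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
        WHERE id=3416;
        ]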

beliys (Author) commented May 8, 2024

Hi @jszwedko,
thank you very much for your reply. With source: starts_with!(.message, "Debug") || starts_with!(.message, "Info") the logs are now collected, but they are cut short. Instead of

Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [
        UPDATE mytable SET myfield1 = 0,myfield2 = 0, myfield3 = true
        WHERE id=3416;
        ]

only

Debug;myservice-6.40.268-dev-1-1565;;data.DB.Update;23360985601792;query: [

ends up in the record. Maybe I chose the wrong function, but I would like the complete multi-line log in a single record.

jszwedko (Member) commented May 8, 2024

Hmm, can you share the new reduce transform configuration? That is not what I expected; it should create a single (multi-line) log event.

beliys (Author) commented May 8, 2024

data_dir: "/var/lib/vector"
expire_metrics_secs: 60
api:
 enabled: true
 address: 0.0.0.0:8686
 playground: true
sources:
 vector_agents:
   type: vector
   address: 0.0.0.0:9000
   version: "2"
transforms:
 prepare_logs:
   type: remap
   inputs:
     - vector_agents
   source: |-
     .body = del(.)
     .service = del(.body.kubernetes.container_name)
     .namespace = del(.body.kubernetes.pod_namespace)
     .node_name = del(.body.kubernetes.pod_node_name)
     .message = del(.body.message)
     .timestamp = del(.body.timestamp)
     del(.body)
 multiline_reduce:
   type: "reduce"
   inputs:
     - prepare_logs
   merge_strategies:
     .message: "concat_newline"
   starts_when:
     source: starts_with!(.message, "Debug") || starts_with!(.message, "Info")
     type: vrl
sinks:
 opensearch:
   type: elasticsearch
   api_version: v8
   inputs:
     - multiline_reduce
   endpoints:
     - https://127.0.0.1:9200
   auth:
     strategy: "basic"
     user: "vector"
     password: "myPass"
   compression: gzip
   batch:
     max_bytes: 200000
   distribution:
     retry_max_duration_secs: 300
   mode: "bulk"
   bulk:
     action: "create"
     index: "{{ .service }}-%Y.%m.%d"
   encoding:
     except_fields: ["index_name"]
   healthcheck: true
   tls:
     verify_hostname: true
     verify_certificate: false
   acknowledgements:
     enabled: true

jszwedko (Member) commented May 8, 2024

@beliys Ah, I think I see the issue: your logs will all be intermixed. reduce requires the stream to be ordered (a partition key like file can be specified). In your case the source is a vector source, so there is no guarantee of ordering for logs coming from the upstream agents. I think you'll need to put this reduce transform in the agents that are doing the log collection and specify the group_by option (https://vector.dev/docs/reference/configuration/transforms/reduce/#group_by) so that it sees ordered streams it can reduce.
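
As a rough sketch only (the source name and group_by field are assumptions; on the agents the inputs and field paths will differ, for example the file field from a kubernetes_logs source), an agent-side reduce with group_by might look like:

 multiline_reduce:
   type: reduce
   inputs:
     - kubernetes_logs          # assumption: the agent's log source
   group_by:
     - file                     # assumption: one ordered stream per log file
   merge_strategies:
     .message: "concat_newline"
   starts_when:
     source: starts_with!(.message, "Debug") || starts_with!(.message, "Info")
     type: vrl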

beliys (Author) commented May 9, 2024

@jszwedko, thanks a lot. It works now. But I don't really like hard-coding the possible beginnings of the line. Is it possible to instead treat any line that does not start with a space as the start of a new message?

jszwedko (Member) commented May 9, 2024

Is it possible to instead treat any line that does not start with a space as the start of a new message?

I'm not sure I follow entirely, but if you mean basing the condition on whether the line starts with a space instead, you could do that with something like:

starts_when:
     source: !starts_with!(.message, " ")
     type: vrl

beliys (Author) commented May 9, 2024

Thanks, that looks like what's needed, but I'm getting an error:

Error : named tag handle cannot contain such characters at line 49, column 29:
            source: !starts_with!(.message, " ")
                                ^
Line : undefined  undefined

jszwedko (Member) commented May 9, 2024

Thanks, that looks like what's needed, but I'm getting an error:

Error : named tag handle cannot contain such characters at line 49, column 29:
            source: !starts_with!(.message, " ")
                                ^
Line : undefined  undefined

That error seems to be coming from somewhere other than Vector 🤔.

beliys (Author) commented May 9, 2024

Yes, it's a YAML validator inside Helm.

jszwedko (Member) commented May 9, 2024

Yes, it's a YAML validator inside Helm.

Maybe you need to quote it then?
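
For example (untested; the issue is that an unquoted YAML scalar starting with ! is parsed as a tag), quoting the whole expression, or using a block scalar, should get it past the validator:

   starts_when:
     source: '!starts_with!(.message, " ")'
     type: vrl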

beliys (Author) commented May 9, 2024

Many thanks. Everything works as expected. I'm happy :)

beliys closed this as completed May 9, 2024