Telemetry.V2 Otel Protocol

A framework that aims to ease the logging affair: Logs, Traces and Metrics.

The V2 version adopts the OpenTelemetry specification for all logging directions. This means that all logging propagators use the OTEL protocol.

Tel uses zap.Logger as the heart of the system; that is why it passes all zap functions through.

Motto

Only context, all logs.

Decrease external dependencies as much as possible.

Features

All-In-One

The library establishes a connection via the gRPC OTLP protocol with opentelemetry-collector-contrib (the official OTEL collector) and sends logs, traces and metrics. The collector, in turn, distributes them to Loki, Tempo and Prometheus, or to any other services you prefer and the collector supports. Furthermore, we have prepared working dashboards in the ./__grafana folder, created for our middlewares for the most popular servers and clients.

Logs

Our goal is to support the logfmt format for Loki viewing, via the simple zap library interface.

In addition, you can enrich log attributes that will be written only when they are really needed:

// create a copy of ctx which we enrich with log attributes
ctx := tel.Global().Copy().Ctx()

// pass ctx through controller->store->root layers and enrich information
err := func(ctx context.Context) error {
	tel.FromCtx(ctx).PutAttr(extract...)
	return fmt.Errorf("some error")
}(ctx)

// ...and write the log message only when it is really needed,
// with all attributes put via ctx from ALL layers earlier.
// No need to look at previous info/debug messages:
// all the needed information is in one message, with all attributes you already added,
// which is written only when you actually call `Error()`, `Info()`, `Debug()` and so on.
//
// for example: only when you got an error
if err != nil {
	tel.FromCtx(ctx).Error("error happened", tel.Error(err))
}
Trace

The library simplifies the creation of trace spans.

Also, you can send not only logs but also attach them to the trace as events:

	span, ctx := tel.StartSpanFromContext(req.Context(), "my trace")
	defer span.End()

	tel.FromCtx(ctx).Info("this message will be saved both in LogInstance and trace",
		// and this will come to the trace as attribute
		tel.Int("code", errCode))
Metrics

Working with metrics is simplified as well:

	m := tel.Global().Meter("github.com/MyRepo/MyLib/myInstrumentation")

	requestLatency, err := m.SyncFloat64().Histogram("demo_client.request_latency",
		instrument.WithDescription("The latency of requests processed"))
	if err != nil {
		t.Fatal("metric load error", tel.Error(err))
	}

	...
	start := time.Now()
	...

	// record the elapsed time in microseconds
	us := float64(time.Since(start).Microseconds())

	requestLatency.Record(ctx, us,
		attribute.String("userID", "e64916d9-bfd0-4f79-8ee3-847f2d034d20"),
		attribute.Int("orderID", 1),
	)
Middleware
  • Recovery flow

  • Instantiates a new copy of tel for the downstream handler

  • Basic metrics with a respective Grafana dashboard

  • Trace propagation

    • client part - sends (injects) the current trace span to the server

    • server part - reads (extracts) the trace and creates a new child span (or a completely new trace if no trace info was provided, or if it was not properly wrapped via the propagator protocol of the OTEL specification) - see the sketch after this list
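
If you need to do the propagation by hand (outside our middlewares), the sketch below shows the idea using the standard OTEL propagation API. It assumes tel has already configured the global OTEL text-map propagator; the URL, handler and function names are purely illustrative.

package example

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"

	"github.com/tel-io/tel/v2"
)

// client part: inject the current trace span into the outgoing request headers
func callServer(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://server/api", nil)
	if err != nil {
		return err
	}

	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// server part: extract the incoming trace context and start a child span
func handler(w http.ResponseWriter, r *http.Request) {
	ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))

	span, ctx := tel.StartSpanFromContext(ctx, "handle /api")
	defer span.End()

	tel.FromCtx(ctx).Info("request handled")
	w.WriteHeader(http.StatusOK)
}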

Logging stack

Logging data is exported via OTEL's gRPC protocol. tel is designed to pass it through the open-telemetry collector, which routes log data to any desired log receivers.

Keep in mind that the collector has a plugin distribution, collector-contrib: a gateway-adapter to numerous protocols that do not yet support OTEL natively, for example Grafana Loki.

For instance, you can use opentelemetry-collector-contrib as the tel receiver and route logging data to Grafana Loki, trace data to Grafana Tempo and metric data to Prometheus + Grafana ;)
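
To make that routing concrete, a collector-contrib pipeline of this shape might look roughly like the sketch below. Exporter names and options differ between collector versions, so treat docker/otel-collector-config.yaml in this repository as the authoritative example.

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]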

Grafana references feature

loki to tempo

tel's approach is to put a traceID field holding the actual trace ID. All our middlewares do that; otherwise the developer should do it themselves.

Just call UpdateTraceFields before writing any logs:

tel.UpdateTraceFields(ctx)
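
For example (a sketch; the handler name is illustrative):

func handlePayment(ctx context.Context) {
	// attach the current traceID to the logger stored in ctx
	tel.UpdateTraceFields(ctx)

	// this log line now carries a traceID=<...> field that the derivedFields rule below can match
	tel.FromCtx(ctx).Info("payment accepted")
}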

Accordingly, Grafana should have derivedFields set up for the Loki data source:

  - name: Loki
    type: loki
    url: http://loki:3100
    uid: loki
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: "traceID=(\\w+)"
          name: trace
          url: '$${__value.raw}'

tempo to loki

We match Tempo with Loki by the service_name label. All logs should contain traceID (under any key form) and service_name. In Grafana, the Tempo datasource should be configured with tracesToLogs:

  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
    jsonData:
      nodeGraph:
        enabled: true
      tracesToLogs:
        datasourceUid: loki
        filterBySpanID: false
        filterByTraceID: true
        mapTagNamesEnabled: false
        tags:
          - service_name

prometheus to loki

Install

go get github.com/tel-io/tel/v2@latest
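
A minimal bootstrap usually looks like the sketch below. The constructor helpers used here (tel.New, tel.GetConfigFromEnv) are assumptions on our part and may differ between versions; check example/demo for the exact, current API.

package main

import (
	"context"

	"github.com/tel-io/tel/v2"
)

func main() {
	ctx := context.Background()

	// NOTE: constructor names are assumed here; see example/demo for the real API
	t, closer := tel.New(ctx, tel.GetConfigFromEnv())
	defer closer()

	// work with a context-bound copy, as in the Logs section above
	ctx = t.Copy().Ctx()
	tel.FromCtx(ctx).Info("service started")
}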

collector

The OTEL collector configuration (labels) is part of the setup; this means that if you do not set it up properly, you will not be able to see the appropriate result.

See docker/otel-collector-config.yaml for the reference configuration.

Features

  • OTEL logs implementation

Env

OTEL_SERVICE_NAME

service name

type: string

NAMESPACE

project namespace

type: string

DEPLOY_ENVIRONMENT

ENUM: dev, stage, prod

type: string

LOG_LEVEL

log level (default: info)

type: string. Valid values: debug, info, warn, error, dpanic, panic, fatal

LOG_ENCODE

valid options: console, json or none

none - disables printing to the console (only OTEL export or critical errors)

DEBUG

used by the IsDebug() function

type: bool

MONITOR_ENABLE

default: true

MONITOR_ADDR

address where the health check and Prometheus endpoints listen

Note: the address semantics follow the description of net.Listen.

OTEL_ENABLE

default: true

OTEL_COLLECTOR_GRPC_ADDR

Address of the OTEL collector server (gRPC protocol)

OTEL_EXPORTER_WITH_INSECURE

With insecure …​

OTEL_ENABLE_COMPRESSION

default: true

Enables gzip compression for gRPC connections

OTEL_METRIC_PERIODIC_INTERVAL_SEC

default: "15"

Interval, in seconds, at which metrics are gathered

OTEL_COLLECTOR_TLS_SERVER_NAME

Checks the server certificate against the given DNS name.

Disables OTEL_EXPORTER_WITH_INSECURE if set

LOGGING_OTEL_CLIENT

default: false

required OTEL_ENABLE = true

Injects a logger adapter into the OTEL library's gRPC client to get log information related to this transport

LOGGING_OTEL_PROCESSOR

default: false

required OTEL_ENABLE = true

Injects a logger adapter into the OTEL processor library related to the collector's behaviour

LOGS_ENABLE_RETRY

default: false

Enable retrying when sending logs to the collector.

LOGS_SYNC_INTERVAL

default: 1s

Limits how often logs with level.Error are flushed.

Example: 1s means one flush per second is allowed when logs have level.Error.

LOGS_MAX_MESSAGE_SIZE

default: 256

Limits the message size. If the limit is exceeded, the message is truncated.

LOGS_MAX_MESSAGES_PER_SECOND

default: 100

Limits the rate of messages per second. If the limit is exceeded, a warning is logged and messages are dropped. The value 0 disables the limit.

LOGS_MAX_LEVEL_MESSAGES_PER_SECOND

default: ``

The same as LOGS_MAX_MESSAGES_PER_SECOND, but allows configuring the limit per level.

Value format: <level1>=<n>,<level2>=<n>. Ex: LOGS_MAX_LEVEL_MESSAGES_PER_SECOND="error=0,info=100"

TRACES_ENABLE_RETRY

default: false

Enable retrying when sending traces to the collector.

TRACES_SAMPLER

default: statustraceidratio:0.1

Sets the sampling strategy. Options: never, always, traceidratio:<float64>, statustraceidratio:<float64>

where <float64> is required and must be a valid floating point number from 0.0 to 1.0, e.g. TRACES_SAMPLER=traceidratio:0.25

TRACES_ENABLE_SPAN_TRACK_LOG_MESSAGE

default: false

Enable adding all log messages to active span as event.

TRACES_ENABLE_SPAN_TRACK_LOG_FIELDS

default: true

Enable adding all log fields to active span as attributes.

TRACES_CARDINALITY_DETECTOR_ENABLE

default: true

Enable cardinality check for span names.

TRACES_CARDINALITY_DETECTOR_MAX_CARDINALITY

default: 0

Limits the cardinality of span attributes. Currently not used, hence the default value of 0.

TRACES_CARDINALITY_DETECTOR_MAX_INSTRUMENTS

default: 500

Limit the number of unique span names.

TRACES_CARDINALITY_DETECTOR_DIAGNOSTIC_INTERVAL

default: 10m

Enable diagnostic loop that checks for cardinality violations and logs a warning.

You can disable it by setting the value to 0.

METRICS_ENABLE_RETRY

default: false

Enable retrying when sending metrics to the collector.

METRICS_CARDINALITY_DETECTOR_ENABLE

default: true

Enable cardinality check for metrics' labels.

METRICS_CARDINALITY_DETECTOR_MAX_CARDINALITY

default: 100

Limits the cardinality of a metric's labels. If the limit is exceeded, the metric is ignored, but previously registered metrics keep working as before.

METRICS_CARDINALITY_DETECTOR_MAX_INSTRUMENTS

default: 500

Limits the number of unique metric names (name only, without labels).

METRICS_CARDINALITY_DETECTOR_DIAGNOSTIC_INTERVAL

default: 10m

Enable diagnostic loop that checks for cardinality violations and logs a warning.

You can disable it by setting the value to 0.

OTEL_COLLECTOR_TLS_CA_CERT

TLS CA certificate body

OTEL_COLLECTOR_TLS_CLIENT_CERT

TLS client certificate

OTEL_COLLECTOR_TLS_CLIENT_KEY

TLS client key

OTEL_RESOURCE_ATTRIBUTES

This optional variable is handled by the open-telemetry SDK. Use it to set additional resource attributes as comma-separated key=value pairs; very convenient!
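
For illustration, a hypothetical local-development environment combining several of the variables above could look like this (all values are examples, not defaults):

OTEL_SERVICE_NAME=demo-service
NAMESPACE=demo
DEPLOY_ENVIRONMENT=dev
LOG_LEVEL=debug
LOG_ENCODE=console
MONITOR_ENABLE=true
MONITOR_ADDR=0.0.0.0:8011
OTEL_ENABLE=true
OTEL_COLLECTOR_GRPC_ADDR=otel-collector:4317
OTEL_EXPORTER_WITH_INSECURE=true
TRACES_SAMPLER=statustraceidratio:0.1
OTEL_RESOURCE_ATTRIBUTES=team=payments,region=eu-west-1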

ToDo

  • ❏ Expose health check to specific metric

  • ❏ Duplicate trace messages for root - ztrace.New just adds to the chain tree

Usage

Take a look at the example/demo folder.
