doc: add symbol upload protocol specification (#31)
Co-authored-by: Christos Kalkanis <[email protected]>
athre0z and christos68k committed May 16, 2024
1 parent 3b12f0d commit 494f1b8
Showing 2 changed files with 347 additions and 0 deletions.
98 changes: 98 additions & 0 deletions docs/symb-proto/README.md
@@ -0,0 +1,98 @@
Elastic symbolization protocol
==============================

## `symbfile` format

`symbfile` is our custom file format for efficiently storing large amounts of
symbol information. A symbfile is a concatenation of protobuf messages, each
prefixed with its length and message type. The purpose of a symbfile is to provide a mapping
from each address in an executable to the corresponding function name, file name
and line number, including support for inline functions. One or more symbfiles
always describe just one executable. Recording associations of one or more
symbfiles with an executable (e.g. via a file hash) is left to other components.

We currently use two different symbol information representations:

- **Range based records ([`RangeV1`])**\
These map an ELF virtual address range to symbol information and a `depth` integer that
determines the depth within an inline chain. Inline chains are flattened into
multiple overlapping range records. To determine the inline trace for any
given address, a reader sweeps through the whole symbfile, collects all
ranges that contain the desired address, and orders the resulting range
records by their `depth` field. These records are the ground truth for symbol
information.
- **Return pad records ([`ReturnPadV1`])**\
These map a single address to the symbols of a full inline trace. We generate
such records for each instruction following a `call`. The idea here is that
when building stack traces, all but the last frame will have addresses that
point to such return addresses. Special-casing these allows the symbolization
service to proactively insert the symbols for all non-leaf frames, which
massively reduces the number of frames that need to be symbolized lazily.
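
The lookup described for range records can be sketched as follows. This is a
minimal Python illustration, not the service's implementation: the `Range`
tuple and the `inline_trace` function are hypothetical names, and the records
are assumed to be already decoded with absolute start addresses.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Hypothetical, already-decoded range record with an absolute start address."""
    elf_va: int
    length: int
    func: str
    depth: int


def inline_trace(ranges: list[Range], addr: int) -> list[str]:
    """Return the function names covering `addr`, outermost (depth 0) first."""
    # Sweep the whole record list and keep every range that contains the address.
    hits = [r for r in ranges if r.elf_va <= addr < r.elf_va + r.length]
    # Ordering by depth reconstructs the inline chain.
    hits.sort(key=lambda r: r.depth)
    return [r.func for r in hits]


# Illustrative records: `strdup` inlined into `main`, `malloc` inlined into `strdup`.
records = [
    Range(0x0000, 0x0210, "main", 0),
    Range(0x0010, 0x0081, "malloc", 2),
    Range(0x000A, 0x0100, "strdup", 1),
]
print(inline_trace(records, 0x0020))  # ['main', 'strdup', 'malloc']
print(inline_trace(records, 0x0150))  # ['main']
```

The linear sweep mirrors the description above; a real consumer would more
likely load the records into an indexed store first.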

While the symbfile format would generally also allow mixing both record types
into a single file, we currently always generate a separate symbfile per record
kind.

More details about the format itself can be found in the documentation comments
of the [protobuf definition][symbfile-proto].
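
As a rough illustration of the framing, the sketch below walks a symbfile and
yields each message's type and raw payload without decoding the protobuf
bodies. It follows the layout described in the comments of `symbfile.proto`
(an 8-byte `symbfile` magic, then per message a varint length, a varint
message type, and the payload). The names `read_varint` and `iter_messages`
are hypothetical, and the sketch does not check that the first message is a
`Header`.

```python
import io
from typing import BinaryIO, Iterator, Optional

MAGIC = b"symbfile"


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one protobuf-style varint; return None on a clean EOF."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # EOF at a message boundary
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def iter_messages(stream: BinaryIO) -> Iterator[tuple[int, bytes]]:
    """Yield (message_type, payload) pairs until the stream ends."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing symbfile magic")
    while True:
        length = read_varint(stream)
        if length is None:
            return
        msg_type = read_varint(stream)
        if msg_type is None:
            raise EOFError("missing message type")
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message payload")
        yield msg_type, payload


# An empty header (type 1) followed by a 3-byte placeholder payload of type 2.
data = MAGIC + bytes([0x00, 0x01]) + bytes([0x03, 0x02]) + b"\x0a\x01x"
for msg_type, payload in iter_messages(io.BytesIO(data)):
    print(msg_type, payload)
```

Each payload would then be handed to the generated protobuf class that matches
its message type.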

[`RangeV1`]: ./symbfile.proto#L120
[`ReturnPadV1`]: ./symbfile.proto#L212
[symbfile-proto]: ./symbfile.proto

## REST API

Symbfiles are uploaded via a REST API. The `symbtool push-symbols` command
extracts and uploads at least two symbol files ("ranges" and "returnpads") to
the symbolization service via HTTP(S). The "ranges" files, which hold the
ground-truth symbol information needed for leaf frames, are uploaded via
`/api/symbols-ranges`. The "returnpads" files, which hold the symbol
information for non-leaf frames, are uploaded via `/api/symbols-returnpads`.
Symbfiles may be split and uploaded in multiple chunks (in separate HTTP
requests) for improved load balancing in the presence of multiple symbolizer
services.

### File metadata (request)

While the binary file data forms the request body, the needed file metadata is
set as HTTP headers. We use the following HTTP headers:

| Header | Description | Example |
| -------------- | ---------------------- | ------------------------------------------------------------------------------------ |
| FileID | Base64 encoded FileID | `FileID: d--nFqkSpJIXRFeHMp_Smg` |
| FilePart | Part number 0..N-1 | `FilePart: 1` |
| FileParts | Number of parts N | `FileParts: 5` |
| Content-Length | Length of body | `Content-Length: 735912` |
| Authorization | Contains an API key | `Authorization: APIKey QzJqQ1Q0WUI1NlR0QVl4NTlZcXg6Y0xhcFN1S2tTSXlyTFlNTUloclJvdw==` |
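
To make the table concrete, here is a small Python sketch that assembles the
headers for one chunk. The helper names are hypothetical, and two details are
inferred rather than specified: the example `FileID` looks like unpadded
URL-safe Base64 of a 16-byte value, and the API key is passed through as an
opaque, already-encoded string.

```python
import base64


def encode_file_id(raw: bytes) -> str:
    """Encode a raw file ID as unpadded URL-safe Base64 (inferred from the example)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def upload_headers(file_id: bytes, part: int, parts: int,
                   body: bytes, api_key: str) -> dict[str, str]:
    """Build the metadata headers for chunk `part` (0-based) of `parts`."""
    if not 0 <= part < parts:
        raise ValueError("part must be in the range 0..parts-1")
    return {
        "FileID": encode_file_id(file_id),
        "FilePart": str(part),
        "FileParts": str(parts),
        "Content-Length": str(len(body)),
        "Authorization": f"APIKey {api_key}",
    }


# Placeholder values for illustration only.
headers = upload_headers(bytes(16), part=1, parts=5,
                         body=b"\x00" * 735912, api_key="<api-key>")
for name, value in headers.items():
    print(f"{name}: {value}")
```

The HTTP method and the rules for splitting a symbfile into chunks are not
stated in this document, so the sketch leaves sending the request to the
caller.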

### Response

The response to an upload request is JSON formatted.

A successful upload sets the HTTP status code to 200. The response body looks
like this:

```json
{
  "success": true,
  "status": 200
}
```

In the failure case, the HTTP status code is 4xx or 5xx. The response body
explains the failure in greater detail, for example:

```json
{
  "success": false,
  "uuid": "f1ada52e-d705-423f-a1b0-fc054eb8900e",
  "error": {
    "Code": "1000",
    "Text": "Something went wrong on our side."
  },
  "status": 400
}
```

The `uuid` field ties user reports to server logs: an error report from the
user that contains the UUID allows finding the logs needed for investigation
and debugging.
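
A client might surface these fields roughly as follows. This is a sketch only:
`check_upload_response` is a hypothetical helper, and it assumes the response
body always parses as JSON with the field names shown in the examples above.

```python
import json


def check_upload_response(http_status: int, body: str) -> None:
    """Raise RuntimeError with the error details if the upload failed."""
    reply = json.loads(body)
    if http_status == 200 and reply.get("success"):
        return
    error = reply.get("error", {})
    raise RuntimeError(
        f"upload failed (HTTP {http_status}, code {error.get('Code')}): "
        f"{error.get('Text')} [uuid {reply.get('uuid')}]"
    )


failure = """{
  "success": false,
  "uuid": "f1ada52e-d705-423f-a1b0-fc054eb8900e",
  "error": {"Code": "1000", "Text": "Something went wrong on our side."},
  "status": 400
}"""

check_upload_response(200, '{"success": true, "status": 200}')  # returns quietly
try:
    check_upload_response(400, failure)
except RuntimeError as exc:
    print(exc)
```

Including the UUID in the raised message keeps it visible in whatever the user
copies into a bug report.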
249 changes: 249 additions & 0 deletions docs/symb-proto/symbfile.proto
@@ -0,0 +1,249 @@
syntax = "proto3";
package symbfile;

// # General format
//
// The file starts with a magic byte sequence of "symbfile" (8 bytes).
// Following this magic, all further data is stored as protobuf messages
// that are prefixed with:
//
// - a variable-length integer indicating the length of the message
// - a variable-length integer indicating the message type
//
// The variable-length integers are encoded using the same algorithm as `uint32`
// in the protobuf wire protocol. The first message must always be of kind
// `Header`.
//
// # Example file
//
// ```text
// ┃ File offset    ┃ Contents                       ┃ Comment                             ┃
// ┣━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
// ┃ [0x000, 0x008) ┃ "symbfile"                     ┃ File magic                          ┃
// ┃ [0x008, 0x009) ┃ 0x00                           ┃ Length of the empty header          ┃
// ┃ [0x009, 0x00A) ┃ 0x01                           ┃ Message type = MT_HEADER            ┃
// ┃ [0x00A, 0x00B) ┃ 0x21                           ┃ Length of the next message          ┃
// ┃ [0x00B, 0x00C) ┃ 0x02                           ┃ Message type = MT_RANGE_V1          ┃
// ┃ [0x00C, 0x02D) ┃ [0x21 bytes of protobuf data]  ┃ Protobuf serialized RangeV1 message ┃
// ┃ [0x02D, 0x02F) ┃ 0x81, 0x02                     ┃ Length of the next message          ┃
// ┃ [0x02F, 0x030) ┃ 0x02                           ┃ Message type = MT_RANGE_V1          ┃
// ┃ [0x030, 0x131) ┃ [0x101 bytes of protobuf data] ┃ Protobuf serialized RangeV1 message ┃
// ┃ EOF            ┃ N/A                            ┃ Parsing ends gracefully             ┃
// ```
//
// # Design goals
//
// - Size-efficiently store symbolization information in a format that can
//   efficiently (memory, compute) be ingested into a database.
// - Simple: should be able to implement a reader and writer in a day
// - Allow for lazy/streamed reading and insertion into our DB tables
//   - No forward references
//   - Preferably little state to be kept in memory during reading
// - Good compressibility with zstd
//   - Paying some extra memory and computation overhead during writing is
//     acceptable if it improves compressibility or read performance
//   - Creating files doesn't have to be streamable
// - Good forward and backward compatibility
//   - Served by global service that will have a wide range of different
//     symbolizer versions asking it for symbols
//   - Symbol data can remain relevant for many years

// Header message. Currently empty.
message Header {}

// Defines the type of a message.
enum MessageType {
  // Sentinel value to ensure that an uninitialized field of this type can
  // be told from one initialized to the first valid value.
  MT_INVALID = 0;

  // The message is of type `Header`.
  MT_HEADER = 1;

  // The message is of type `RangeV1`.
  MT_RANGE_V1 = 2;

  // The message is of type `ReturnPadV1`.
  MT_RETURN_PAD_V1 = 3;

  // The message is of type `StringTableV1`.
  MT_STRING_TABLE_V1 = 4;
}

// Symbol information for a range of instructions.
//
// The ranges essentially represent a flattened interval tree of inline functions.
//
// Consider the following source file `main.c`:
//
// ```text
// 01 ┃ #include <stdio.h>
// 02 ┃ #include <stdlib.h>
// 03 ┃ #include <string.h>
// 04 ┃
// 05 ┃ int main() {
// 06 ┃ char* s = strdup("Hello, world!");
// 07 ┃ puts(s);
// 08 ┃ free(s);
// 09 ┃ return 0;
// 10 ┃ }
// ```
//
// Imagine the compiler optimized the binary as follows, inlining the `strdup`,
// `puts` and `free` functions into `main` and then additionally inlining `malloc`
// and `strcpy` into the `strdup` inline instance (thus transitively also into
// `main`).
//
// ```text
// Depth
//   2 ┃   [ malloc ][ strcpy ]
//   1 ┃  [       strdup       ] [  puts  ] [ free ]
//   0 ┃ [                   main                    ]
// ━━━━╋━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━┷━━━━━ ELF VA
//     0x000                0x100                0x200
// ```
//
// The corresponding `RangeV1` records would look like this (arbitrary order):
//
// ```text
// ┃ elf VA ┃ length ┃ func    ┃ file              ┃ call line ┃ depth ┃
// ┣━━━━━━━━╋━━━━━━━━╋━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━╋━━━━━━━┫
// ┃ 0x0000 ┃ 0x0210 ┃ main    ┃ /home/bob/main.c  ┃ 0         ┃ 0     ┃
// ┃ 0x000A ┃ 0x0100 ┃ strdup  ┃ /lib/glibc/sdup.c ┃ 6         ┃ 1     ┃
// ┃ 0x0010 ┃ 0x0081 ┃ malloc  ┃ /lib/glibc/mm.c   ┃ 1234      ┃ 2     ┃
// ┃ 0x0091 ┃ 0x0073 ┃ strcpy  ┃ /lib/glibc/scpy.c ┃ 1240      ┃ 2     ┃
// ┃ 0x0110 ┃ 0x0084 ┃ puts    ┃ /lib/glibc/io.c   ┃ 7         ┃ 1     ┃
// ┃ 0x0194 ┃ 0x006A ┃ free    ┃ /lib/glibc/io.c   ┃ 8         ┃ 1     ┃
// ```
message RangeV1 {
  // Start address of the instruction range, in ELF virtual address space.
  oneof elfVA {
    // Update ELF VA with an offset relative to the
    // previous range or return pad record's ELF VA.
    sint64 deltaElfVA = 1;

    // Set ELF VA to a new absolute value.
    uint64 setElfVA = 12;
  }

  // Length of the instruction sequence.
  uint64 length = 2;

  // Demangled name of the function.
  oneof func {
    string funcStr = 3;
    uint32 funcRef = 9;
  }

  // Source file that these instructions were generated from.
  oneof file {
    string fileStr = 4;
    uint32 fileRef = 10;
  }

  // Absolute line number of the call to the inline function. 0 if `depth` is 0.
  uint32 callLine = 5;

  // The file that issued the call to the inline function. 0 if `depth` is 0 or
  // the call file is equal to the file of the parent record (`depth - 1`).
  oneof callFile {
    string callFileStr = 6;
    uint32 callFileRef = 11;
  }

  // Depth in the inline function tree, starting at 0 for the top-level function.
  uint32 depth = 7;

  // Line table for this executable range.
  LineTable lineTable = 8;
}

// Columnar array mapping range offsets to line numbers.
//
// The line table only contains information for ranges that aren't
// covered by other inline functions.
message LineTable {
  // Byte offset from the range start address.
  //
  // The first offset is encoded relative to the `elfVA` field of the containing
  // `RangeV1` message; all following offsets are relative to the previous offset
  // (cumulative sum). The addresses constructed via the cumulative sum then
  // denote the start of a source line mapping range that either spans to the next
  // record or to the end of the range (`elfVA` + `length`).
  //
  // For example, the following range
  //
  // ```text
  // RangeV1 {
  //   elfVA: 0x123,
  //   length: 0x20,
  //   // [other fields omitted]
  //   lineTable: LineTable {
  //     offset: [0x3, 0x10],
  //     lineNumber: [10, 53],
  //   },
  // }
  // ```
  //
  // corresponds to the following table of address mappings:
  //
  // ```text
  // ┃ Range          ┃ Line number                                                ┃
  // ┣━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
  // ┃ [0x123..0x126) ┃ Unknown. Hole in table, likely covered by inline instances ┃
  // ┃ [0x126..0x136) ┃ Line 10                                                    ┃
  // ┃ [0x136..0x143) ┃ Line 53                                                    ┃
  // ```
  repeated uint32 offset = 1;

  // Line number in the source file.
  repeated uint32 lineNumber = 2;
}

// Information about an instruction that can serve as the next instruction after
// a call instruction. These are instructions that can show up as non-leaf
// frames in stack traces, which is why we special-case them.
//
// This message stores the whole inline function stack for the given address in
// a columnar fashion. The records are ordered by their depth in the inline
// stack in ascending order (top-level function is first).
message ReturnPadV1 {
  // Address of the return pad, in ELF virtual address space.
  //
  // This is the address of the instruction following a call, minus 1. This
  // offset is applied to be consistent with the addresses sent by the host
  // agent: because disassembling backwards is a hard problem, the host agent
  // simply subtracts 1 from the addresses of all non-leaf frames to indicate
  // that we want traces for the previous instruction.
  oneof elfVA {
    // Update ELF VA with an offset relative to the
    // previous range or return pad record's ELF VA.
    sint64 deltaElfVA = 1;

    // Set ELF VA to a new absolute value.
    uint64 setElfVA = 5;
  }

  // Name of the function.
  //
  // Reference into the string table.
  repeated uint32 func = 2;

  // Source file that these instructions were generated from.
  //
  // Reference into the string table.
  repeated uint32 file = 3;

  // Absolute source line number.
  //
  // This is the call line for the first n-1 records and the line
  // number corresponding to the return pad for the nth record.
  repeated uint32 line = 4;
}

// Replace the string lookup table in the reader.
message StringTableV1 {
  // New string table. String indices in other messages correspond to indices
  // in this array.
  repeated string strings = 1;
}
