faster print time (< 10ns) #2931

Theramar · 2023-11-05T12:35:55Z

Taken from fmtlog (emphasis mine)

The results on a linux server with "Intel(R) Xeon(R) Gold 6144 CPU @ 3.50GHz" is:

Message	fmtlog	Nanolog	spdlog
staticString	6.4 ns, 7.08 M/s	6.5 ns, 33.10 M/s	156.4 ns, 6.37 M/s
stringConcat	6.4 ns, 6.05 M/s	7.5 ns, 14.20 M/s	209.4 ns, 4.77 M/s
singleInteger	6.3 ns, 6.22 M/s	6.5 ns, 50.29 M/s	202.3 ns, 4.94 M/s
twoIntegers	6.4 ns, 4.87 M/s	6.6 ns, 39.25 M/s	257.2 ns, 3.89 M/s
singleDouble	6.2 ns, 5.37 M/s	6.5 ns, 39.62 M/s	225.0 ns, 4.44 M/s
complexFormat	6.4 ns, 2.95 M/s	6.7 ns, 24.30 M/s	390.9 ns, 2.56 M/s

Note that the throughput of Nanolog is not comparable here because it outputs to binary log file instead of human-readable text format, e.g. it saves an int64 timestamp instead of a long formatted date time string.

How can fmtlog achieve such low and stable latency? Two key optimization techniques are employed inspired by Nanolog:

One is allocating a single producer single consumer queue for each logging thread, and have the background thread polling for all these queues. This avoids threads contention and performance will not deteriorate when thread number increases. The queue is automatically created on the first log msg of a thread, so queue is not created for threads that don't use fmtlog. The thread queue has a default size of 1 MB(can be changed by macro FMTLOG_QUEUE_SIZE), and it takes a little time to allocate the queue. It's recommended that user actively calls fmt::preallocate() once the thread is created, so even the first log can have low latency.

What happens when the queue is full? By default, fmtlog simply dump addtional log msgs and return. Alternatively, front-end logging can be blocked while the queue is full by defining macro FMTLOG_BLOCK=1, then no log will be missing. User can register a callback function when log queue is full by fmtlog::setLogQFullCB(cb, userData), by which user can be aware that the consumer(polling thread) is not keeping up. Normally, the queue being full is seldom a problem, but incautious user could leave log statements that are invoked in an unexpected high frequency, e.g. a tcp client spamming with "Connection refused" errors without a connection retry delay. To handle this problem in an elegant way, fmtlog provides a log macro which limits frequency of this log: FMTLOG_LIMIT and 4 shortcuts logil, logwl and logel respectively, user needs to pass the mininum interval in nanosecond as the first argument, e.g.

logil(1e9, "this log will be displayed at most once per second");

The other optimization is that static information of a log(such as format string, log level and location) is saved in a table at its first call, and fmtlog simply pushes the index of the static info table entry with dynamic arguments to the queue, minimizing the msg size. In addition, fmtlog defines a decoding function for each log statment, which is invoked in fmtlog::poll()when the log msg is popped from the queue.

However, these decoding functions bloat program size with each function consuming around 50 bytes. In addition, the static infomation entry also consumes 50-ish bytes runtime memory for each log statement. Such memory overhead may not be worthwhile for those infrequent and latency insensitive logs(e.g. program initialization info), thus fmtlog provides user with another log macro which disables this optimization: FMTLOG_ONCE and of couse shortcuts: logdo, logio, logwoand logeo. FMTLOG_ONCE will not create a static info table entry, nor add a decoding function: it pushes static info along with formatted msg body onto the queue. Note that passing argument by pointer is not supported by `FMTLOG_ONCE`.

For those who prefer to further optimize memory usage by filtering log at compile time, macro FMTLOG_ACTIVE_LEVEL is applied with a default value FMTLOG_LEVEL_INF, meaning debug logs will simply be discarded at compile time. Note that FMTLOG_ACTIVE_LEVEL only applies to log shortcut macros, e.g. logi, but not FMTLOG. Similarly, runtime log level filtering can be disabled by defining macro FMTLOG_NO_CHECK_LEVEL, which will increase performance and reduce generated code size a bit.

Is there a way in `spdlog` to achieve similar speed? Or at least access the same set of features.

The text was updated successfully, but these errors were encountered:

gabime · 2023-11-05T12:58:26Z

spdlog async is much slower than synchronous spdlog. It is interesting to compare to the synchronous version of spdlog.

Theramar · 2023-11-05T14:31:18Z

Surely it would be nice however IIRC spdlog does not offer an binary encoding/decoding (see Nanolog) mechanism which can increase performance

Theramar · 2023-11-06T13:45:51Z

@gabime the benchmark shows synchronous spdlog, see MengRao/fmtlog#84

Theramar · 2023-11-08T19:37:31Z

any updates?

tt4g · 2023-11-08T21:38:35Z

I am not familiar with how fmtlog works, but it does not appear to support enhancements by developers.
spdlog is extensible in that developers can define subclasses of sink.
Instead, there is the overhead of calling virtual functions at runtime, which can have a significant impact on print speed (and thus limits compile-time optimization).

Most of fmtlog's optimizations are not likely to be imported into spdlog, but a PR that improves print speed would be welcome.

hanzz2007 · 2023-11-29T02:37:48Z

seems fmtlog compare with sync spdlog while itself use async mode inside. So that is unfair

Theramar mentioned this issue Nov 5, 2023

Improve benchmark: synchronous spdlog MengRao/fmtlog#84

Open

gabime closed this as completed May 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

faster print time (< 10ns) #2931

faster print time (< 10ns) #2931

Theramar commented Nov 5, 2023 •

edited

Is there a way in `spdlog` to achieve similar speed? Or at least access the same set of features.

gabime commented Nov 5, 2023

Theramar commented Nov 5, 2023

Theramar commented Nov 6, 2023

Theramar commented Nov 8, 2023

tt4g commented Nov 8, 2023

hanzz2007 commented Nov 29, 2023 •

edited

faster print time (< 10ns) #2931

faster print time (< 10ns) #2931

Comments

Theramar commented Nov 5, 2023 • edited

Is there a way in spdlog to achieve similar speed? Or at least access the same set of features.

gabime commented Nov 5, 2023

Theramar commented Nov 5, 2023

Theramar commented Nov 6, 2023

Theramar commented Nov 8, 2023

tt4g commented Nov 8, 2023

hanzz2007 commented Nov 29, 2023 • edited

Theramar commented Nov 5, 2023 •

edited

Is there a way in `spdlog` to achieve similar speed? Or at least access the same set of features.

hanzz2007 commented Nov 29, 2023 •

edited