Performance improvements #1

victor-istomin · 2015-05-29T02:46:41Z

Hello,

I've tried this logger and found it pretty light and useful. However, I've found a multithreading concurrency bottleneck, so I'd like to improve it a bit.
The test code and benchmarking results are provided below.

A mutex lock is not necessary while formatting the line and converting it to UTF-8, so I have moved the string manipulation out of the locked scope. This increased concurrency and gave about a 3x speedup on Windows with my test sample. Unfortunately, this change alone doesn't improve things on Linux.
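
The idea can be sketched as follows. LockedSink and its write() method are hypothetical names, not plog's API; the point is only that the whole line is built before the mutex is taken, so threads contend for the lock only during the final append.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sink (not plog's actual class): formatting runs without the
// lock; only the write to the shared stream is serialized.
class LockedSink
{
public:
    explicit LockedSink(std::ostream& out) : out_(out) {}

    void write(const std::string& severity, const std::string& message)
    {
        // Build the complete line locally; no shared state is touched here.
        std::ostringstream line;
        line << severity << ' ' << message << '\n';
        const std::string text = line.str();

        // The critical section shrinks to a single append.
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << text;
    }

private:
    std::ostream& out_;
    std::mutex mutex_;
};
```
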
Investigating on Linux, I found that the application was still effectively running "in one thread". The profiler shows a lot of time spent in mutex lock/unlock, not in disk I/O. The problem is inside pthread_mutex_lock, which does too few spin iterations before putting the thread to sleep. I added spin iterations manually, so the code became parallel and about 3 times faster in my sample on Linux too.
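
The exact change isn't shown in this comment, so the following is only a sketch of the general spin-then-block idea, built on std::mutex::try_lock. SpinThenBlockMutex and the spin count of 1000 are hypothetical; a production version would usually add a CPU pause hint or a yield between attempts.

```cpp
#include <mutex>

// Hypothetical wrapper: retry try_lock() a bounded number of times before
// falling back to the blocking lock, which may put the thread to sleep.
class SpinThenBlockMutex
{
public:
    void lock()
    {
        for (int i = 0; i < kSpinCount; ++i)
        {
            if (mutex_.try_lock())
                return;  // acquired without sleeping
        }
        mutex_.lock();   // stop spinning and block
    }

    bool try_lock() { return mutex_.try_lock(); }
    void unlock() { mutex_.unlock(); }

private:
    static constexpr int kSpinCount = 1000;  // assumed value; tune by profiling
    std::mutex mutex_;
};
```

Because it provides lock(), try_lock() and unlock(), it can be used with std::lock_guard wherever a std::mutex was used before.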
Omitting several ANSI/UTF-16 conversions and changing 'stream << char[1]' to 'stream << char' gave me another few percent of performance; the latter just removes a redundant strlen() call on a char[1]. Additionally, strftime is about 30% faster on Windows than manual stringstream conversion, and it looks simpler.
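
A minimal sketch of both ideas, using a hypothetical formatTimestamp() helper rather than plog's actual formatter: strftime produces the date and time in one call, and the '.' separator is inserted as a char, so operator<< has no string length to compute.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not plog's code): date/time via strftime, then milliseconds.
std::string formatTimestamp(const std::tm& t, int milliseconds)
{
    char buf[32];
    // "%Y-%m-%d %H:%M:%S" yields e.g. "2015-05-29 02:46:41" (19 chars)
    std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &t);

    std::ostringstream ss;
    ss.write(buf, static_cast<std::streamsize>(n));
    // '.' is a char, so the char overload of operator<< is used (no strlen()).
    ss << '.' << std::setfill('0') << std::setw(3) << milliseconds;
    return ss.str();
}
```

For example, a std::tm holding 2015-05-29 02:46:41 with 7 ms formats as "2015-05-29 02:46:41.007".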

Multithreading test code

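    #include <plog/Log.h>  // LOG() and plog::init(); recent plog releases may also need
                           // plog/Initializers/RollingFileInitializer.h for init(severity, file)
    #include <ctime>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Each worker writes 800000 debug lines to the shared log file.
    void threadFunction()
    {
        for (int i = 0; i < 800000; ++i)
            LOG(plog::debug) << "i=" << i << "clock:" << clock();
    }

    int main(int argc, char* argv[])
    {
        plog::init(plog::debug, "Hello.txt");
        std::vector<std::thread> threads(4);

        // On Linux, clock() returns CPU time summed over all threads of the
        // process, in units of CLOCKS_PER_SEC (1000000), not elapsed time.
        clock_t start = clock();

        for (size_t i = 0; i < threads.size(); ++i)
            threads[i] = std::thread(threadFunction);

        for (std::thread& thread : threads)
            thread.join();

        clock_t finish = clock();

        std::cout << "Time: " << finish - start << std::endl;

        return 0;
    }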
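
Note that clock() means different things on different platforms: on POSIX systems it reports CPU time consumed by the whole process, while the MSVC runtime returns elapsed wall-clock time. To get comparable numbers, a steady_clock-based helper such as the hypothetical measureWallMs() below could wrap the thread start/join section. This is a suggested variant, not what produced the results below.

```cpp
#include <chrono>

// Hypothetical helper: runs f() once and returns elapsed wall-clock time in ms.
// steady_clock is monotonic, so the result has the same meaning on every platform.
template <typename F>
long long measureWallMs(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(finish - start).count();
}
```

In the test above it would wrap both loops, e.g. `long long ms = measureWallMs([&] { /* start and join the threads */ });`.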

Benchmarking results for Linux

Test system: Core i5-4300M with SSD. OS: Windows 10 host, VM: Ubuntu Linux 14.04.2.

Original branch

top/iotop results: Disk write speed 4-5 MB/s, CPU 25-30% total (one core at 100%).

  victor@ubuntu:~/plog$ time ./test_master_o2
  Time: 106639234

  real  0m58.393s
  user  0m4.579s
  sys   1m42.063s

Performance improvement branch

top/iotop results: Disk write speed 17-22 MB/s, CPU 100% total

  victor@ubuntu:~/plog$ time ./test_perf_o2_3
  Time: 39805657

  real  0m11.434s
  user  0m16.325s
  sys   0m23.483s
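
Because POSIX fixes CLOCKS_PER_SEC at 1000000 and clock() on Linux counts CPU time across all threads, the "Time" values are microseconds of user + sys time. The time(1) output is consistent with that reading:

```latex
\begin{aligned}
\text{original: } & 4.579\,\mathrm{s} + 102.063\,\mathrm{s} = 106.642\,\mathrm{s} \approx 106\,639\,234\,\mu\mathrm{s} \\
\text{improved: } & 16.325\,\mathrm{s} + 23.483\,\mathrm{s} = 39.808\,\mathrm{s} \approx 39\,805\,657\,\mu\mathrm{s} \\
\text{CPU-time ratio: } & 106.642 / 39.808 \approx 2.7 \\
\text{wall-clock (real) ratio: } & 58.393 / 11.434 \approx 5.1
\end{aligned}
```

So the roughly 3x figure corresponds to CPU time, while elapsed time improved by about 5x.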

Benchmarking results for Windows are very similar.

Thanks for the review, and feel free to ask additional questions if needed.
Victor.

SergiusTheBest · 2015-06-03T10:21:31Z

Interesting findings, great thanks.

stephane-martin · 2016-03-14T02:56:41Z

AFAIK pthread_yield does not exist on Mac OS X:

    cd /usr/include/pthread
    grep -r yield *
    pthread.h:void pthread_yield_np(void);
    sched.h:extern int sched_yield(void);

SergiusTheBest · 2016-03-14T09:58:36Z

@stephane-martin

Yes, the Linux man page says this about pthread_yield:

"This call is nonstandard, but present on several other systems. Use the standardized sched_yield instead."

Thanks for noticing it.
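
A minimal portable helper following the man page's advice might look like this. yieldThread() is a hypothetical name; on POSIX systems (Linux and macOS, where the grep above shows sched_yield in sched.h) it calls sched_yield(), and on Windows it falls back to C++11 std::this_thread::yield().

```cpp
#include <thread>
#if !defined(_WIN32)
#include <sched.h>  // POSIX: declares sched_yield()
#endif

// Hypothetical replacement for the nonstandard pthread_yield().
inline void yieldThread()
{
#if defined(_WIN32)
    std::this_thread::yield();  // no POSIX sched_yield() on Windows
#else
    sched_yield();              // standardized by POSIX
#endif
}
```

Since C++11, std::this_thread::yield() alone is also portable; calling sched_yield() directly just mirrors the man page's recommendation.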

SergiusTheBest · 2016-03-14T10:14:33Z

I remember the performance improvements made by @victor-istomin, and I have also done some additional profiling and research. The results are rather sad:

- std::mutex and pthread_mutex implementations don't use spinning by default.
- C/C++ string conversion functions are very slow.
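
On glibc-based Linux there is a nonportable way to opt in to spinning: the PTHREAD_MUTEX_ADAPTIVE_NP mutex type, which spins briefly before sleeping. The sketch below, with the hypothetical helper initSpinningMutex(), only illustrates that glibc extension; the thread does not say it was used. On other C libraries it keeps the default mutex type.

```cpp
#include <pthread.h>

// Hypothetical helper: initializes *m as a glibc "adaptive" (spinning) mutex
// when available. Returns 0 on success, like the pthread calls it wraps.
int initSpinningMutex(pthread_mutex_t* m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
#if defined(__GLIBC__)
    // Nonportable glibc extension: spin for a while before blocking.
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```
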

I tried custom string conversion functions and got a nearly 5x performance boost!
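
Which conversions were replaced isn't stated. Given the UTF-16/UTF-8 conversions discussed earlier in this thread, one plausible candidate is a hand-written UTF-16 to UTF-8 encoder that avoids locale-aware library calls. This is a hypothetical sketch (utf16ToUtf8 is an illustrative name), not plog's code; unpaired surrogates become U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoder: UTF-16 (std::u16string) to UTF-8 without locales or codecvt.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);  // at most 3 bytes per UTF-16 unit
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)  // high surrogate
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // consumed the low surrogate too
            }
            else
                cp = 0xFFFD;  // unpaired high surrogate
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            cp = 0xFFFD;      // unpaired low surrogate

        if (cp < 0x80)
            out += static_cast<char>(cp);
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```
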


victor-istomin added 9 commits on May 27, 2015 02:05:

- 9cda6d0 Execution concurrency improved
- 6c7de68 Less unicode conversions: Windows unicode performance +20%
- f718ae4 Optimized time formatting: +30%
- 35a32bc Fixed linux build
- 4af4794 Cosmetics: tab to spaces
- db4d44e Threading concurrency improved for pthreads
- 3e05ff9 CXXFLAGS: -pthread
- 2524df1 CXXFLAGS: -pthread
- a1a9e7f CXXFLAGS: -pthread

SergiusTheBest force-pushed the master branch from 393d175 to f4c22b0 on October 21, 2019 12:02

SergiusTheBest force-pushed the master branch from fb189bb to 82d73f5 on June 7, 2020 20:06

SergiusTheBest force-pushed the master branch from b4837f8 to d8461e9 on March 13, 2021 23:01

SergiusTheBest force-pushed the master branch 2 times, most recently from 7831b21 to e2650b8, on April 26, 2022 11:15

SergiusTheBest force-pushed the master branch from d47bdc8 to 3306953 on June 10, 2022 14:49
