gk: convert all batched lookups into coroutines #370

Open: wants to merge 4 commits into base: master

AltraMayor
Owner

GK blocks currently batch their lookups to the flow table and the LPM table. This pull request reverts these batched lookups to single lookups and moves the lookup code into coroutines.
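
As an illustration of the idea only (the pull request itself is C code, and nothing below is taken from it), here is a minimal Python sketch in which each single-key lookup runs as a generator-based coroutine and a round-robin loop interleaves them. The names `lookup_co` and `run_round_robin`, and the dict standing in for the flow table, are hypothetical.

```python
from collections import deque


def lookup_co(table, key, results):
    """Coroutine performing one single-key lookup (hypothetical sketch)."""
    # Assumption: a real implementation would issue a memory prefetch for
    # the key's bucket here. Yielding lets other lookups make progress
    # while that load would be in flight.
    yield
    results[key] = table.get(key)


def run_round_robin(coroutines):
    """Resume each coroutine in turn until all of them have finished."""
    ready = deque(coroutines)
    switches = 0
    while ready:
        co = ready.popleft()
        switches += 1
        try:
            next(co)
            ready.append(co)  # not finished yet: schedule it again
        except StopIteration:
            pass  # finished: drop it from the ready queue
    return switches


flow_table = {"10.0.0.1": "granted", "10.0.0.2": "declined"}
keys = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
results = {}
switches = run_round_robin([lookup_co(flow_table, k, results) for k in keys])
print(results, switches)
```

Each lookup is written as straight-line single-key code, and the interleaving that batching used to provide comes from the scheduler instead. The cost is one context switch per resume, which is the overhead the benchmarks in this thread try to measure.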

@AltraMayor AltraMayor added this to the First deployment milestone Nov 7, 2019
Owner Author

@AltraMayor AltraMayor left a comment

gk: add support to coroutines

gk/main.c (outdated, resolved)
Collaborator

@mengxiang0811 mengxiang0811 left a comment

gk: add support to coroutines

gk/co.c (outdated, resolved)
gk/co.c (outdated, resolved)
lua/gk.lua (outdated, resolved)
gk/co.c (outdated, resolved)
gk/co.h (outdated, resolved)
Collaborator

@mengxiang0811 mengxiang0811 left a comment

gk: add support to coroutines

gk/co.h (outdated, resolved)
gk/co.c (outdated, resolved)
gk/co.c (outdated, resolved)
Collaborator

@mengxiang0811 mengxiang0811 left a comment

gk: move processing of front packets to coroutines

gk/co.c (resolved)
gk/co.c (outdated, resolved)
@AltraMayor
Owner Author

The code has incorporated all the feedback.

Collaborator

@mengxiang0811 mengxiang0811 left a comment

gk: add support to coroutines

gk/co.h (outdated, resolved)
@mengxiang0811
Collaborator

This pull request has conflicts that need to be resolved.

Library CORO implements coroutines.
@mengxiang0811
Collaborator

Below is the performance evaluation of the master branch with our updated scripts:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 5.58 | 8.28 | 8.29 | 7.32 | 10.16 | 10.73 | 11.47 | 10.8 |
| 1 | 2^20 | 6.8 | 7.58 | 7.89 | 7.48 | 9.04 | 10.29 | 11.4 | 10.34 |
| 1 | 2^25 | 3.79 | 5.43 | 5.62 | 5.0 | 9.75 | 10.44 | 11.13 | 10.45 |
| 2 | 2^15 | 10.32 | 10.51 | 10.7 | 10.5 | 9.82 | 10.58 | 11.08 | 10.58 |
| 2 | 2^20 | 9.71 | 10.19 | 10.6 | 10.15 | 9.61 | 10.44 | 11.15 | 10.45 |
| 2 | 2^25 | 5.85 | 8.3 | 10.0 | 7.96 | 9.0 | 10.37 | 11.09 | 10.32 |
| 3 | 2^15 | 9.36 | 9.49 | 9.62 | 9.5 | 9.24 | 9.72 | 10.58 | 9.75 |
| 3 | 2^20 | 8.34 | 8.79 | 9.71 | 8.84 | 8.92 | 9.74 | 10.52 | 9.77 |
| 3 | 2^25 | 6.21 | 7.95 | 8.98 | 7.68 | 9.05 | 9.84 | 10.49 | 9.8 |

@mengxiang0811
Collaborator

Below is the performance evaluation after applying this pull request:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.62 | 6.47 | 6.48 | 6.04 | 9.36 | 10.52 | 11.17 | 10.45 |
| 1 | 2^20 | 3.73 | 4.15 | 4.35 | 4.11 | 9.61 | 10.53 | 11.12 | 10.53 |
| 1 | 2^25 | 2.19 | 2.38 | 3.35 | 2.59 | 9.62 | 10.6 | 11.16 | 10.54 |
| 2 | 2^15 | 10.14 | 10.42 | 10.5 | 10.39 | 9.8 | 10.48 | 10.95 | 10.45 |
| 2 | 2^20 | 8.57 | 9.53 | 10.11 | 9.43 | 9.72 | 10.37 | 10.96 | 10.36 |
| 2 | 2^25 | 3.75 | 4.75 | 6.4 | 4.81 | 9.65 | 10.3 | 11.12 | 10.34 |
| 3 | 2^15 | 9.12 | 9.63 | 10.04 | 9.59 | 8.91 | 9.85 | 10.56 | 9.8 |
| 3 | 2^20 | 8.27 | 8.82 | 9.17 | 8.76 | 9.14 | 9.89 | 10.7 | 9.88 |
| 3 | 2^25 | 4.14 | 7.07 | 9.16 | 7.04 | 9.17 | 9.81 | 10.54 | 9.83 |
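
To make the two evaluations easier to compare, the following sketch computes the relative change in the mean "GK Mpps rcvd" column between the master-branch table and the table for this pull request. The numbers are copied from those two tables; the variable and function names are only illustrative.

```python
# Mean "GK Mpps rcvd" per (lcores, table size), copied from the two tables.
master = {
    (1, "2^15"): 7.32, (1, "2^20"): 7.48, (1, "2^25"): 5.0,
    (2, "2^15"): 10.5, (2, "2^20"): 10.15, (2, "2^25"): 7.96,
    (3, "2^15"): 9.5, (3, "2^20"): 8.84, (3, "2^25"): 7.68,
}
pull_request = {
    (1, "2^15"): 6.04, (1, "2^20"): 4.11, (1, "2^25"): 2.59,
    (2, "2^15"): 10.39, (2, "2^20"): 9.43, (2, "2^25"): 4.81,
    (3, "2^15"): 9.59, (3, "2^20"): 8.76, (3, "2^25"): 7.04,
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {key: pct_change(master[key], pull_request[key]) for key in master}
for (lcores, size), change in changes.items():
    print(f"{lcores} lcore(s), table size {size}: {change:+.1f}%")
```

By this measure the largest drop is with 1 lcore and 2^25 entries (about 48% lower mean throughput), while the configurations with 2^15 entries and 2 or 3 lcores change by roughly 1% in either direction.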

@mengxiang0811
Collaborator

mengxiang0811 commented Nov 27, 2019

Below is the CPU cycles profiling with 3 lcores + 2^25 flow entries:

Samples: 668K of event 'cycles:ppp', Event count (approx.): 8729181903458, Thread: lcore-slave-12
  Children      Self  Command         Shared Object       Symbol
+    3.97%     0.00%  lcore-slave-12  gatekeeper          [.] eal_thread_loop
+    2.75%     1.10%  lcore-slave-12  gatekeeper          [.] gk_proc
+    2.57%     0.00%  lcore-slave-12  [unknown]           [.] 0x00007f38d4bfb480
+    2.09%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d4c0
+    1.95%     1.94%  lcore-slave-12  gatekeeper          [.] rte_hash_lookup_with_hash
+    1.73%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d500
+    1.54%     1.54%  lcore-slave-12  gatekeeper          [.] ip_flow_cmp_eq
+    1.28%     1.26%  lcore-slave-12  gatekeeper          [.] rte_hash_cuckoo_make_space_mw
+    1.19%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d540
+    0.99%     0.00%  lcore-slave-12  gatekeeper          [.] rte_hash_k16_cmp_eq
     0.92%     0.90%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt_final
+    0.91%     0.82%  lcore-slave-12  libc-2.27.so        [.] __memset_sse2_unaligned_erms
     0.80%     0.79%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt
+    0.75%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d580
+    0.70%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000002203000a
+    0.61%     0.00%  lcore-slave-12  [unknown]           [.] 0000000000000000
     0.60%     0.54%  lcore-slave-12  gatekeeper          [.] ixgbe_recv_pkts_vec
+    0.56%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d5c0
+    0.53%     0.53%  lcore-slave-12  gatekeeper          [.] __rte_hash_add_key_with_hash
     0.53%     0.53%  lcore-slave-12  libc-2.27.so        [.] __memcmp_sse4_1
     0.53%     0.52%  lcore-slave-12  gatekeeper          [.] gk_co_scan_flow_table
     0.50%     0.48%  lcore-slave-12  gatekeeper          [.] gk_process_request.isra.9
     0.50%     0.46%  lcore-slave-12  gatekeeper          [.] coro_transfer
     0.46%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000040c0000073
     0.43%     0.43%  lcore-slave-12  gatekeeper          [.] gk_co_main
     0.42%     0.42%  lcore-slave-12  [kernel.kallsyms]   [k] nmi
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000000312e3030
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x0000000000000001
     0.39%     0.37%  lcore-slave-12  gatekeeper          [.] encapsulate
     0.35%     0.34%  lcore-slave-12  gatekeeper          [.] process_flow_entry
     0.35%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d600
     0.32%     0.32%  lcore-slave-12  gatekeeper          [.] rte_hash_prefetch_buckets_non_temporal
     0.29%     0.29%  lcore-slave-12  gatekeeper          [.] __rte_hash_del_key_with_hash
     0.28%     0.00%  lcore-slave-12  [unknown]           [k] 0x000000011120d4e8
     0.24%     0.15%  lcore-slave-12  gatekeeper          [.] process_cmds_from_mailbox
     0.18%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d640
     0.18%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d528
     0.15%     0.15%  lcore-slave-12  gatekeeper          [.] adjust_pkt_len
     0.11%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d680
     0.11%     0.10%  lcore-slave-12  gatekeeper          [.] pkt_copy_cached_eth_header
     0.10%     0.07%  lcore-slave-12  gatekeeper          [.] common_ring_mp_enqueue
     0.09%     0.09%  lcore-slave-12  gatekeeper          [.] send_pkts

Below is the memory loads profiling with 3 lcores + 2^25 flow entries:

Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 288164965, Thread: lcore-slave-12
Overhead  Command         Shared Object      Symbol
   3.52%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_pkts_vec
   3.38%  lcore-slave-12  gatekeeper         [.] rte_hash_cuckoo_make_space_mw
   3.16%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt
   2.93%  lcore-slave-12  gatekeeper         [.] rte_hash_prefetch_buckets_non_temporal
   1.38%  lcore-slave-12  gatekeeper         [.] coro_transfer
   1.24%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt_final
   1.16%  lcore-slave-12  gatekeeper         [.] rte_hash_lookup_with_hash
   1.04%  lcore-slave-12  gatekeeper         [.] gk_proc
   0.99%  lcore-slave-12  libc-2.27.so       [.] __memcmp_sse4_1
   0.89%  lcore-slave-12  gatekeeper         [.] __rte_hash_del_key_with_hash
   0.87%  lcore-slave-12  gatekeeper         [.] __rte_hash_add_key_with_hash
   0.85%  lcore-slave-12  gatekeeper         [.] gk_process_request.isra.9
   0.81%  lcore-slave-12  gatekeeper         [.] common_ring_mc_dequeue
   0.75%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table
   0.55%  lcore-slave-12  gatekeeper         [.] gk_solicitor_enqueue_bulk
   0.41%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_scattered_pkts_vec
   0.36%  lcore-slave-12  gatekeeper         [.] process_flow_entry
   0.34%  lcore-slave-12  gatekeeper         [.] process_cmds_from_mailbox
   0.28%  lcore-slave-12  gatekeeper         [.] encapsulate
   0.16%  lcore-slave-12  libc-2.27.so       [.] __memmove_sse2_unaligned_erms
   0.15%  lcore-slave-12  gatekeeper         [.] memcmp@plt
   0.09%  lcore-slave-12  gatekeeper         [.] adjust_pkt_len
   0.06%  lcore-slave-12  gatekeeper         [.] common_ring_mp_enqueue
   0.04%  lcore-slave-12  gatekeeper         [.] pkt_copy_cached_eth_header
   0.03%  lcore-slave-12  gatekeeper         [.] lpm_lookup_ipv4
   0.03%  lcore-slave-12  gatekeeper         [.] gk_co_main
   0.01%  lcore-slave-12  gatekeeper         [.] ip_flow_cmp_eq
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] rcu_check_callbacks
   0.00%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table_final
   0.00%  lcore-slave-12  gatekeeper         [.] memcpy@plt
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] account_user_time
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] update_load_avg
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] hrtimer_active
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] smp_apic_timer_interrupt
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] ktime_get_update_offsets_now
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] update_curr
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] __acct_update_integrals
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] perf_mux_hrtimer_handler
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] __indirect_thunk_start
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] ktime_get
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] cpuacct_account_field
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] rcu_irq_enter

Below is the stats with 3 lcores + 2^25 flow entries:

    1871042.695725      task-clock (msec)         #    5.764 CPUs utilized          
            63,216      context-switches          #    0.034 K/sec                  
                 7      cpu-migrations            #    0.000 K/sec                  
             2,980      page-faults               #    0.002 K/sec                  
 5,587,194,847,661      cycles                    #    2.986 GHz                      (33.33%)
 2,968,010,189,245      stalled-cycles-frontend   #   53.12% frontend cycles idle     (33.33%)
 7,681,911,254,771      instructions              #    1.37  insn per cycle         
                                                  #    0.39  stalled cycles per insn  (40.00%)
 1,052,629,516,672      branches                  #  562.590 M/sec                    (40.00%)
    14,861,588,607      branch-misses             #    1.41% of all branches          (40.00%)
 2,583,453,994,051      L1-dcache-loads           # 1380.756 M/sec                    (39.99%)
    62,738,934,810      L1-dcache-load-misses     #    2.43% of all L1-dcache hits    (13.33%)
    25,734,384,853      LLC-loads                 #   13.754 M/sec                    (13.33%)
    11,696,300,564      LLC-load-misses           #   45.45% of all LL-cache hits     (20.00%)
   <not supported>      L1-icache-loads                                             
       653,660,469      L1-icache-load-misses                                         (26.67%)
 2,583,931,932,672      dTLB-loads                # 1381.012 M/sec                    (26.67%)
    11,710,806,623      dTLB-load-misses          #    0.45% of all dTLB cache hits   (13.33%)
     1,281,604,483      iTLB-loads                #    0.685 M/sec                    (13.33%)
       133,702,153      iTLB-load-misses          #   10.43% of all iTLB cache hits   (20.00%)
   <not supported>      L1-dcache-prefetches                                        
    11,946,665,954      L1-dcache-prefetch-misses #    6.385 M/sec                    (26.67%)

     324.605504881 seconds time elapsed

Note that the profiling results above were collected after applying the pull request.
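
The ratios that perf prints next to the raw counters can be re-derived from the counts themselves. The sketch below does that arithmetic for a few of them, using the counter values copied from the stats output above; the variable names are only illustrative.

```python
# Raw counters copied from the perf stat output.
task_clock_ms = 1871042.695725
elapsed_s = 324.605504881
cycles = 5_587_194_847_661
stalled_cycles_frontend = 2_968_010_189_245
instructions = 7_681_911_254_771
branches = 1_052_629_516_672
branch_misses = 14_861_588_607
llc_loads = 25_734_384_853
llc_load_misses = 11_696_300_564

cpus_utilized = task_clock_ms / (elapsed_s * 1000.0)
ipc = instructions / cycles
frontend_stall_pct = stalled_cycles_frontend / cycles * 100.0
branch_miss_pct = branch_misses / branches * 100.0
llc_miss_pct = llc_load_misses / llc_loads * 100.0

print(f"CPUs utilized:       {cpus_utilized:.3f}")
print(f"Instructions/cycle:  {ipc:.2f}")
print(f"Frontend stalls:     {frontend_stall_pct:.2f}%")
print(f"Branch miss rate:    {branch_miss_pct:.2f}%")
print(f"LLC load miss rate:  {llc_miss_pct:.2f}%")
```

Note that most counters were multiplexed (the percentages in parentheses in the perf output), so these ratios are estimates rather than exact figures.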

This patch gets everything set up to do the work of gk_proc()
inside coroutines, but no work is actually moved into coroutines.
The following patches gradually move the work of gk_proc() into
coroutines.
@mengxiang0811
Collaborator

Below is the performance evaluation after applying the latest update of this pull request:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.73 | 4.73 | 4.82 | 4.74 | 9.19 | 10.3 | 10.89 | 10.31 |
| 1 | 2^20 | 4.22 | 4.34 | 4.42 | 4.33 | 9.54 | 10.28 | 10.91 | 10.29 |
| 1 | 2^25 | 1.64 | 2.6 | 2.86 | 2.54 | 8.83 | 9.54 | 10.58 | 9.65 |
| 2 | 2^15 | 3.98 | 7.96 | 9.08 | 7.59 | 9.32 | 10.15 | 11.04 | 10.22 |
| 2 | 2^20 | 6.05 | 6.59 | 6.91 | 6.54 | 9.2 | 10.13 | 10.81 | 10.1 |
| 2 | 2^25 | 3.35 | 4.51 | 4.94 | 4.4 | 9.6 | 10.16 | 10.91 | 10.22 |
| 3 | 2^15 | 6.39 | 9.52 | 10.04 | 8.84 | 9.05 | 10.05 | 10.85 | 9.98 |
| 3 | 2^20 | 7.0 | 8.19 | 8.71 | 8.1 | 9.21 | 10.14 | 10.78 | 10.1 |
| 3 | 2^25 | 3.66 | 4.44 | 7.37 | 4.89 | 8.73 | 10.08 | 10.9 | 10.07 |

Below is the CPU cycles profiling with 3 lcores + 2^25 flow entries:

Samples: 750K of event 'cycles:ppp', Event count (approx.): 8654593548709, Thread: lcore-slave-12
  Children      Self  Command         Shared Object       Symbol
+    3.16%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] ret_from_intr
+    3.16%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] do_IRQ
+    3.13%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] irq_exit
+    3.13%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] __softirqentry_text_start
+    3.12%     0.06%  lcore-slave-12  [kernel.kallsyms]   [k] net_rx_action
+    2.96%     0.23%  lcore-slave-12  [kernel.kallsyms]   [k] ixgbe_poll
+    2.88%     0.00%  lcore-slave-12  gatekeeper          [.] eal_thread_loop
+    2.70%     0.10%  lcore-slave-12  [kernel.kallsyms]   [k] napi_consume_skb
+    2.58%     0.02%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_all
+    2.40%     0.00%  lcore-slave-12  [unknown]           [.] 0x00007f1b9a56d3e0
+    2.38%     1.72%  lcore-slave-12  gatekeeper          [.] rte_hash_lookup_and_yield_with_hash
+    2.05%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d4c0
+    2.00%     0.72%  lcore-slave-12  gatekeeper          [.] gk_proc
+    1.92%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d500
+    1.51%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d540
+    1.42%     0.36%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_data
+    1.36%     1.01%  lcore-slave-12  gatekeeper          [.] rte_hash_cuckoo_make_space_mw
+    1.32%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d580
+    1.08%     0.25%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_head_state
+    1.07%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d5c0
+    1.07%     0.01%  lcore-slave-12  [kernel.kallsyms]   [k] skb_free_head
+    1.06%     0.29%  lcore-slave-12  [kernel.kallsyms]   [k] kfree
     0.97%     0.72%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt_final
     0.94%     0.72%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt
+    0.83%     0.00%  lcore-slave-12  [unknown]           [.] 0000000000000000
+    0.81%     0.07%  lcore-slave-12  [kernel.kallsyms]   [k] __slab_free
     0.77%     0.68%  lcore-slave-12  [kernel.kallsyms]   [k] sock_wfree
     0.77%     0.59%  lcore-slave-12  gatekeeper          [.] coro_transfer
+    0.77%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d600
     0.71%     0.71%  lcore-slave-12  [kernel.kallsyms]   [k] cmpxchg_double_slab.isra.61
+    0.70%     0.56%  lcore-slave-12  gatekeeper          [.] gk_process_request.isra.10
+    0.68%     0.52%  lcore-slave-12  libc-2.27.so        [.] __memset_sse2_unaligned_erms
     0.62%     0.46%  lcore-slave-12  gatekeeper          [.] gk_co_scan_flow_table
+    0.59%     0.49%  lcore-slave-12  gatekeeper          [.] ip_flow_cmp_eq
     0.53%     0.35%  lcore-slave-12  gatekeeper          [.] ixgbe_recv_pkts_vec
+    0.51%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d640
+    0.50%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000012203000a
     0.49%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000000312e3030
     0.49%     0.00%  lcore-slave-12  [unknown]           [.] 0x0000000000000001
     0.43%     0.31%  lcore-slave-12  gatekeeper          [.] encapsulate
     0.42%     0.32%  lcore-slave-12  gatekeeper          [.] process_flow_entry
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000040c0000073

Below is the memory-load profile with 3 lcores and 2^25 flow entries:

Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 275410232, Thread: lcore-slave-12
Overhead  Command         Shared Object      Symbol
   4.71%  lcore-slave-12  gatekeeper         [.] rte_hash_cuckoo_make_space_mw
   2.52%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_pkts_vec
   1.83%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt
   1.57%  lcore-slave-12  gatekeeper         [.] rte_hash_lookup_and_yield_with_hash
   1.40%  lcore-slave-12  gatekeeper         [.] coro_transfer
   1.24%  lcore-slave-12  gatekeeper         [.] prefetch_and_yield
   1.16%  lcore-slave-12  gatekeeper         [.] __rte_hash_add_key_with_hash
   0.89%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_data
   0.87%  lcore-slave-12  gatekeeper         [.] gk_process_request.isra.10
   0.83%  lcore-slave-12  [kernel.kallsyms]  [k] sock_def_write_space
   0.81%  lcore-slave-12  gatekeeper         [.] gk_proc
   0.80%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt_final
   0.75%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table
   0.58%  lcore-slave-12  [kernel.kallsyms]  [k] sock_wfree
   0.56%  lcore-slave-12  gatekeeper         [.] common_ring_mc_dequeue
   0.53%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_head_state
   0.52%  lcore-slave-12  libc-2.27.so       [.] __memcmp_sse4_1
   0.46%  lcore-slave-12  [kernel.kallsyms]  [k] __slab_free
   0.42%  lcore-slave-12  [kernel.kallsyms]  [k] __indirect_thunk_start
   0.37%  lcore-slave-12  gatekeeper         [.] __rte_hash_del_key_with_hash
   0.34%  lcore-slave-12  gatekeeper         [.] process_flow_entry
   0.32%  lcore-slave-12  [kernel.kallsyms]  [k] cmpxchg_double_slab.isra.61
   0.30%  lcore-slave-12  gatekeeper         [.] gk_solicitor_enqueue_bulk
   0.28%  lcore-slave-12  gatekeeper         [.] process_cmds_from_mailbox
   0.27%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_scattered_pkts_vec
   0.26%  lcore-slave-12  [kernel.kallsyms]  [k] napi_consume_skb
   0.25%  lcore-slave-12  libc-2.27.so       [.] __memmove_sse2_unaligned_erms
   0.24%  lcore-slave-12  [kernel.kallsyms]  [k] ixgbe_poll
   0.20%  lcore-slave-12  gatekeeper         [.] encapsulate
   0.18%  lcore-slave-12  [kernel.kallsyms]  [k] kfree
   0.07%  lcore-slave-12  gatekeeper         [.] gk_co_main
   0.06%  lcore-slave-12  [kernel.kallsyms]  [k] kmem_cache_free_bulk
   0.06%  lcore-slave-12  gatekeeper         [.] common_ring_mp_enqueue
   0.06%  lcore-slave-12  gatekeeper         [.] adjust_pkt_len
   0.06%  lcore-slave-12  gatekeeper         [.] pkt_copy_cached_eth_header
   0.05%  lcore-slave-12  [kernel.kallsyms]  [k] dql_completed
   0.04%  lcore-slave-12  gatekeeper         [.] memcmp@plt
   0.03%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_all
   0.02%  lcore-slave-12  gatekeeper         [.] lpm_lookup_ipv4
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] do_IRQ
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] common_interrupt
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] napi_schedule_prep

Below are the perf stat counters with 3 lcores and 2^25 flow entries:


    1867274.876021      task-clock (msec)         #    5.776 CPUs utilized          
           103,235      context-switches          #    0.055 K/sec                  
                 7      cpu-migrations            #    0.000 K/sec                  
             2,974      page-faults               #    0.002 K/sec                  
 5,575,698,245,139      cycles                    #    2.986 GHz                      (33.33%)
 2,965,248,807,024      stalled-cycles-frontend   #   53.18% frontend cycles idle     (33.33%)
 7,532,386,362,044      instructions              #    1.35  insn per cycle         
                                                  #    0.39  stalled cycles per insn  (40.00%)
 1,042,765,025,951      branches                  #  558.442 M/sec                    (40.00%)
    18,869,190,878      branch-misses             #    1.81% of all branches          (40.00%)
 2,573,455,686,225      L1-dcache-loads           # 1378.188 M/sec                    (39.99%)
    69,274,863,575      L1-dcache-load-misses     #    2.69% of all L1-dcache hits    (13.33%)
    26,623,265,673      LLC-loads                 #   14.258 M/sec                    (13.34%)
    11,355,842,090      LLC-load-misses           #   42.65% of all LL-cache hits     (20.00%)
   <not supported>      L1-icache-loads                                             
     1,549,898,302      L1-icache-load-misses                                         (26.67%)
 2,573,731,016,293      dTLB-loads                # 1378.335 M/sec                    (26.67%)
    10,794,095,805      dTLB-load-misses          #    0.42% of all dTLB cache hits   (13.34%)
     1,685,548,829      iTLB-loads                #    0.903 M/sec                    (13.33%)
       559,210,349      iTLB-load-misses          #   33.18% of all iTLB cache hits   (20.00%)
   <not supported>      L1-dcache-prefetches                                        
    12,222,916,777      L1-dcache-prefetch-misses #    6.546 M/sec                    (26.67%)

     323.295308910 seconds time elapsed
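
For anyone who wants to re-derive the ratios in the right-hand column, here is a small sketch that recomputes them from the raw counters above. The struct and function names are made up for this comment and are not part of Gatekeeper or perf. Note that most counters were multiplexed (the percentages in parentheses), so the ratios are estimates, and that perf computes the "of all ... hits" figures as misses divided by loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the raw counters copied from the output above. */
struct perf_counters {
	double cycles, stalled_frontend, instructions;
	double branches, branch_misses;
	double l1d_loads, l1d_load_misses;
	double llc_loads, llc_load_misses;
};

/* Values taken verbatim from the perf stat run (3 lcores, 2^25 flows). */
static const struct perf_counters run = {
	.cycles           = 5575698245139.0,
	.stalled_frontend = 2965248807024.0,
	.instructions     = 7532386362044.0,
	.branches         = 1042765025951.0,
	.branch_misses    = 18869190878.0,
	.l1d_loads        = 2573455686225.0,
	.l1d_load_misses  = 69274863575.0,
	.llc_loads        = 26623265673.0,
	.llc_load_misses  = 11355842090.0,
};

/* Guarded division so that an unsupported (zero) counter yields 0. */
static double ratio(double num, double den)
{
	return den > 0.0 ? num / den : 0.0;
}

/* Print the derived metrics in the same order perf stat reports them. */
static void print_derived(const struct perf_counters *c)
{
	assert(c != NULL);
	printf("frontend stalls:  %.2f%% of cycles\n",
		100.0 * ratio(c->stalled_frontend, c->cycles));
	printf("insn per cycle:   %.2f\n",
		ratio(c->instructions, c->cycles));
	printf("branch misses:    %.2f%% of branches\n",
		100.0 * ratio(c->branch_misses, c->branches));
	printf("L1d load misses:  %.2f%% of L1d loads\n",
		100.0 * ratio(c->l1d_load_misses, c->l1d_loads));
	printf("LLC load misses:  %.2f%% of LLC loads\n",
		100.0 * ratio(c->llc_load_misses, c->llc_loads));
}
```

Calling `print_derived(&run)` from a `main()` prints the five ratios; the LLC figure is the one most relevant to this pull request, since it is the miss rate that prefetching before a yield is meant to hide.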

This patch moves the following work to coroutines:
1. scanning of the flow table;
2. processing of front packets.

Besides moving the processing of front packets to coroutines,
this patch streamlines the code to better fit the new model.
For example, it simplifies the parameters of
process_flow_entry() and its subordinate functions:
gk_process_request(), gk_process_granted(), gk_process_declined(),
and gk_process_bpf().
This patch also prefetches the transmission fields of a packet
once the packet is ready to be prepared for transmission.
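
To illustrate the general idea of replacing a batched lookup with per-packet coroutines that prefetch and then yield, here is a minimal sketch. It is not the code in this pull request: the patch switches stacks with `coro_transfer()`, whereas this sketch uses a stackless coroutine written as an explicit state machine so that it stays portable and short. The table layout, the hash function, and every name below (`lookup_co`, `lookup_co_resume()`, `lookup_batch()`, `BATCH`) are assumptions made for the example, and `__builtin_prefetch()` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024	/* Hypothetical size; must be a power of two. */
#define BATCH 8		/* Hypothetical number of lookups in flight. */

struct entry { uint32_t key; uint32_t value; int used; };
static struct entry table[TABLE_SIZE];

enum co_state { CO_START, CO_PROBE, CO_DONE };

/* All state a lookup needs to survive across a yield. */
struct lookup_co {
	enum co_state state;
	uint32_t key;
	size_t slot, probes;
	const struct entry *result;
};

static size_t hash_key(uint32_t key)
{
	return (key * 2654435761u) & (TABLE_SIZE - 1);
}

/* Plain linear-probing insert, used only to populate the table. */
static int table_insert(uint32_t key, uint32_t value)
{
	size_t slot = hash_key(key);
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		struct entry *e = &table[(slot + i) & (TABLE_SIZE - 1)];
		if (!e->used || e->key == key) {
			e->key = key; e->value = value; e->used = 1;
			return 0;
		}
	}
	return -1;	/* Table is full. */
}

/* Resume one lookup. Returns 1 if it yielded, 0 once it has finished. */
static int lookup_co_resume(struct lookup_co *co)
{
	const struct entry *e;

	switch (co->state) {
	case CO_START:
		co->slot = hash_key(co->key);
		co->probes = 0;
		co->state = CO_PROBE;
		__builtin_prefetch(&table[co->slot]);
		return 1;	/* Yield while the cache line is fetched. */
	case CO_PROBE:
		e = &table[co->slot];
		if (e->used && e->key == co->key) {
			co->result = e;		/* Hit. */
		} else if (e->used && ++co->probes < TABLE_SIZE) {
			co->slot = (co->slot + 1) & (TABLE_SIZE - 1);
			__builtin_prefetch(&table[co->slot]);
			return 1;	/* Yield before the next probe. */
		} else {
			co->result = NULL;	/* Miss. */
		}
		co->state = CO_DONE;
		return 0;
	default:
		return 0;
	}
}

/* Round-robin scheduler: interleaves up to BATCH lookups; returns hits. */
static size_t lookup_batch(const uint32_t *keys,
	const struct entry **results, size_t n)
{
	struct lookup_co co[BATCH];
	size_t pending = n, hits = 0;

	assert(n <= BATCH);
	for (size_t i = 0; i < n; i++)
		co[i] = (struct lookup_co){ .state = CO_START, .key = keys[i] };
	while (pending > 0)
		for (size_t i = 0; i < n; i++)
			if (co[i].state != CO_DONE && !lookup_co_resume(&co[i]))
				pending--;
	for (size_t i = 0; i < n; i++) {
		results[i] = co[i].result;
		hits += results[i] != NULL;
	}
	return hits;
}
```

Each lookup is written as a single lookup, and the scheduler supplies the overlap: while one coroutine waits for its prefetched bucket, the others make progress. A stackful library keeps the single-lookup code even closer to its original form, at the cost of a stack switch per yield.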