
The uplink thread runs in a single process #302

Open
Rorsachach opened this issue Sep 22, 2022 · 6 comments

Comments

@Rorsachach

The hardware I'm using:
network device: X722 for 10GbE SFP+ 37d0
CPU: Xeon(R) D-2177NT @ 1.90GHz
The driver I'm using:
vfio-pci

Here is my startup.conf:

cpu {
    main-core 0
    workers 10
}

dpdk {
    dev default {
        num-rx-queues 5
        num-tx-queues 5
    }
}

When I ran the UPG and measured throughput with 10 Gbps of upstream and downstream traffic at the same time, the results were not ideal. I then executed show run and found that only one thread was processing the uplink data, while the number of threads processing the downlink data varied with the traffic volume and number of users. Could you tell me how to speed up uplink processing, please?

I'm sorry I can't paste the exact command output, because the machine has no Internet connection.

@RoadRunnr
Member

a) That problem starts with the NIC not distributing the load across multiple CPUs. It is a generic VPP issue; you will need to ask the VPP community for help with that.
b) The UPF function currently has race conditions that are highly likely to crash VPP if you run it on multiple worker threads. Don't do that!

@Rorsachach
Author

> a) That problem starts with the NIC not distributing the load across multiple CPUs. It is a generic VPP issue; you will need to ask the VPP community for help with that.
> b) The UPF function currently has race conditions that are highly likely to crash VPP if you run it on multiple worker threads. Don't do that!

Will the UPF crash with one main thread and one worker thread? I tried configuring the CPU this way, and it always crashed during PDU session delete. I found what looks like an error in the following code.

/* upf_pfcp.c */

void
pfcp_free_session (upf_session_t *sx)
{
  /* ... */

  sparse_free_rules (sx->teid_by_chid);

  /* ... */
}

Then I looked at the VPP definition of sparse_vec_free, added the following check, and recompiled:

mspace_is_heap_object(
    sparse_vec_header(sx->teid_by_chid),
    clib_mem_get_per_cpu_heap()
);

It showed that, at deletion time, the vector cannot be found in the heap of the current CPU.

Is this caused by the introduction of multi-threading? Is the problem in the UPF or in VPP? I would appreciate your reply. Thank you.

@RoadRunnr
Member

@sergeymatov it seems you were the last one to touch that piece of code; maybe you can comment on that?

To me it looks like the root problem must be somewhere else. sparse_vec is not a per-CPU structure. It is IMHO more likely that something else has already freed either the whole sx structure or just the teid_by_chid. In both cases, the problem would be a race condition between the management task and the worker thread.

@sergeymatov
Contributor

The sparse vector for the TEID mapping should only be used (whether for reads or writes) by PFCP-related code. We currently run the PFCP server on the main core, and workers cannot modify a PFCP session.
@Rorsachach you can try adding checks that the session and the vector actually exist before they are freed, and raise a clib_warning message such as

clib_warning ("Invoking sparse vec free, thread %d", vlib_get_thread_index ());

to trace which threads are active there.

@Rorsachach
Author

@sergeymatov Thank you for your reply. I ran some more tests.

First I compared the teid_by_chid returned by sparse_vec_new with the teid_by_chid passed to sparse_vec_free. They are the same.

Then I checked with clib_mem_is_heap_object (sparse_vec_header (sx->teid_by_chid)). The return value is sometimes true and sometimes false.

Then I ran the UPG with a single core, and the same problem occurred.

I recompiled the UPG several times without changing any other part of the code and found that sometimes it crashed and sometimes it didn't. So the only thing I can be sure of is that sometimes the vector is not in the current CPU's heap. I don't know exactly what the problem is; I suspect it might be in VPP's sparse_vec.

@mgumz mgumz changed the title The uplink thead runs in a single process The uplink thread runs in a single process Oct 14, 2022
@dibasdas02

Any update on this issue?
