Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[message-pool] evict message from coap pending request queue if send queue is empty #9984

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sarveshkumarv3
Copy link
Contributor

If the send queue is empty, currently the messages are not evicted and this could cause received messages to be dropped when too many pending address query notifications. The changes made are to evict the message from head of the coap pending request queue if send queue is empty and new buffer cannot be allocated

Copy link

size-report bot commented Apr 2, 2024

Size Report of OpenThread

Merging #9984 into main(0e36799).

name branch text data bss total
ot-cli-ftd main 467112 760 66364 534236
#9984 467240 760 66364 534364
+/- +128 0 0 +128
ot-ncp-ftd main 436380 760 61576 498716
#9984 436516 760 61576 498852
+/- +136 0 0 +136
libopenthread-ftd.a main 236152 0 40310 276462
#9984 236270 0 40310 276580
+/- +118 0 0 +118
libopenthread-cli-ftd.a main 57406 0 8075 65481
#9984 57406 0 8075 65481
+/- 0 0 0 0
libopenthread-ncp-ftd.a main 31857 0 5916 37773
#9984 31857 0 5916 37773
+/- 0 0 0 0
ot-cli-mtd main 364888 760 51220 416868
#9984 365008 760 51220 416988
+/- +120 0 0 +120
ot-ncp-mtd main 347420 760 46448 394628
#9984 347564 760 46448 394772
+/- +144 0 0 +144
libopenthread-mtd.a main 158411 0 25182 183593
#9984 158529 0 25182 183711
+/- +118 0 0 +118
libopenthread-cli-mtd.a main 39787 0 8059 47846
#9984 39787 0 8059 47846
+/- 0 0 0 0
libopenthread-ncp-mtd.a main 24737 0 5916 30653
#9984 24737 0 5916 30653
+/- 0 0 0 0
ot-cli-ftd-br main 536064 768 131076 667908
#9984 536208 768 131076 668052
+/- +144 0 0 +144
libopenthread-ftd-br.a main 299705 5 104998 404708
#9984 299823 5 104998 404826
+/- +118 0 0 +118
libopenthread-cli-ftd-br.a main 71069 0 8099 79168
#9984 71069 0 8099 79168
+/- 0 0 0 0
ot-rcp main 62248 564 20604 83416
#9984 62248 564 20604 83416
+/- 0 0 0 0
libopenthread-rcp.a main 9542 0 5052 14594
#9984 9542 0 5052 14594
+/- 0 0 0 0
libopenthread-radio.a main 18821 0 214 19035
#9984 18821 0 214 19035
+/- 0 0 0 0

Copy link
Member

@abtink abtink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks @sarveshkumarv3.
Some suggestions below:

src/core/coap/coap.cpp Outdated Show resolved Hide resolved
src/core/coap/coap.cpp Outdated Show resolved Hide resolved
src/core/coap/coap.cpp Outdated Show resolved Hide resolved
src/core/coap/coap.hpp Show resolved Hide resolved
src/core/coap/coap.hpp Outdated Show resolved Hide resolved
src/core/coap/coap.hpp Outdated Show resolved Hide resolved
src/core/common/message.cpp Outdated Show resolved Hide resolved
src/core/coap/coap.cpp Outdated Show resolved Hide resolved
@abtink
Copy link
Member

abtink commented Apr 2, 2024

Thinking more this PR and #9949, some aspects to consider:

TMF Eviction Impact: This PR allows any message (regardless of priority) to possibly evict a TMF agent request. Since TMF messages are generally high priority themselves, should we add checks for such evictions (e.g., based on priority)?

Investigating Address Query/Notification Overload: From #9949, this change aims to handle situations where a device receives many address queries, leading to frequent TMF address notifications. It may help to delve deeper into this and understand it better (address the root cause):

  • Query Origin: Are address queries coming from a single device/BR or multiple sources?
  • Rate Limiting: If queries originate from the same device, are existing query rate-limiting mechanisms functioning correctly? Are the limits appropriate?
  • CoAP Acknowledgement: TMF address notification messages are high-priority, confirmable CoAP messages. Upon receiving a CoAP acknowledgement, they should be swiftly removed from the request queue. Why is ack message being delayed, leading to request pile-up? (particularly since device seem to keep receiving query message from the same sender?)

@sarveshkumarv3 sarveshkumarv3 force-pushed the evict_coap_msg_ver2 branch 3 times, most recently from 4250361 to 63ce30c Compare April 2, 2024 21:34
@sarveshkumarv3
Copy link
Contributor Author

Thinking more this PR and #9949, some aspects to consider:

TMF Eviction Impact: This PR allows any message (regardless of priority) to possibly evict a TMF agent request. Since TMF messages are generally high priority themselves, should we add checks for such evictions (e.g., based on priority)?

thanks for your inputs@abtink!

The eviction ordering is to first evict low priority messages in send queue, high priority messages in send queue and only if that is empty to evict TMF agent request right? Also we evict from the head of the request queue, which means TMF agent messages are consuming lot of messages and one at the head is already in the buffers for a long time (In the log, I see around 26s to go through all the retransmission attempts)?

Please let me know if anything else can be prioritized ahead of TMF agent messages? The other thought is MLE queue or MPL queues. The reassembly queues anyway have a limit of 2s after which it would be discarded?

Investigating Address Query/Notification Overload: From #9949, this change aims to handle situations where a device receives many address queries, leading to frequent TMF address notifications. It may help to delve deeper into this and understand it better (address the root cause):

  • Query Origin: Are address queries coming from a single device/BR or multiple sources?
  • Rate Limiting: If queries originate from the same device, are existing query rate-limiting mechanisms functioning correctly? Are the limits appropriate?
  • CoAP Acknowledgement: TMF address notification messages are high-priority, confirmable CoAP messages. Upon receiving a CoAP acknowledgement, they should be swiftly removed from the request queue. Why is ack message being delayed, leading to request pile-up? (particularly since device seem to keep receiving query message from the same sender?)

Yes, I agree, from the given logs, it is not clear why we see so frequent address queries from the accessory (with no visibility here), but I suspect there is some underlying matter stack related issue, causing multiple accessories to go into a crash-loop (running out of buffers) and hence do not handle the AddressNotify.

The changes here would still help to mitigate the Border Router being able to receive from other accessories (protecting its RX buffers) in case we encounter scenario of misbehaving accessories?

Copy link
Member

@abtink abtink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarveshkumarv3 for submitting the PR and all the updates LGTM.

Couple of smaller suggestions below.

On this:

Also we evict from the head of the request queue, which means TMF agent messages are consuming lot of messages and one at the head is already in the buffers for a long time (In the log, I see around 26s to go through all the retransmission attempts)?

I cannot think of any good way to use the priority when deciding to evict from TMF agent queue. So the approach in the PR sounds reaonable.

src/core/coap/coap.cpp Outdated Show resolved Hide resolved
src/core/common/message.cpp Outdated Show resolved Hide resolved
@sarveshkumarv3
Copy link
Contributor Author

Thanks @sarveshkumarv3 for submitting the PR and all the updates LGTM.

Couple of smaller suggestions below.

On this:

Also we evict from the head of the request queue, which means TMF agent messages are consuming lot of messages and one at the head is already in the buffers for a long time (In the log, I see around 26s to go through all the retransmission attempts)?

I cannot think of any good way to use the priority when deciding to evict from TMF agent queue. So the approach in the PR sounds reaonable.

thanks for the inputs @abtink, incorporated the changes now

Copy link

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 86.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 82.32%. Comparing base (1fb1d9a) to head (5710ebb).
Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #9984       +/-   ##
===========================================
+ Coverage   68.31%   82.32%   +14.01%     
===========================================
  Files         494      549       +55     
  Lines       57933    68748    +10815     
===========================================
+ Hits        39577    56600    +17023     
+ Misses      18356    12148     -6208     
Files Coverage Δ
src/core/coap/coap.hpp 84.61% <ø> (+60.61%) ⬆️
src/core/common/message.cpp 95.88% <100.00%> (+1.30%) ⬆️
src/core/coap/coap.cpp 81.22% <75.00%> (+35.15%) ⬆️

... and 361 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants