This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

util/network-devp2p: fix possible deadlock caused by conflicting lock order #11769

Open · wants to merge 1 commit into master

Conversation

@BurtonQin (Contributor) commented Jun 6, 2020

Similar to #11766

This PR fixes a possible deadlock caused by conflicting lock order.
I found three pairs of locks that may be acquired in conflicting orders (a simplified sketch of what a conflicting order looks like follows the list below):

reserved_nodes->handlers:
762-825,1118-1122,977-978 conflict with 851-865,851-876,1095-1098.

session->handlers:
763-825 conflicts with 851-850, 851-869.

sessions->handlers:
953-959 conflicts with 851-853.
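
As a hedged illustration only (hypothetical, simplified code, not the actual host.rs), a conflicting order between two of these locks looks like the following, assuming a write-preferring lock such as parking_lot's RwLock:

```rust
// Hypothetical sketch, not the actual network-devp2p code: two code paths
// that acquire the same pair of locks in opposite orders.
use parking_lot::RwLock;

struct Host {
    reserved_nodes: RwLock<Vec<String>>,
    handlers: RwLock<Vec<String>>,
}

impl Host {
    // Path 1: reserved_nodes first, then handlers.
    fn path_one(&self) {
        let _nodes = self.reserved_nodes.read();
        let _handlers = self.handlers.read();
    }

    // Path 2: handlers first, then reserved_nodes -- the opposite order.
    // On its own this is harmless; combined with concurrent writers and a
    // write-preferring lock, the conflicting order makes a wait cycle possible.
    fn path_two(&self) {
        let _handlers = self.handlers.read();
        let _nodes = self.reserved_nodes.write();
    }
}

fn main() {
    let host = Host {
        reserved_nodes: RwLock::new(Vec::new()),
        handlers: RwLock::new(Vec::new()),
    };
    host.path_one();
    host.path_two();
}
```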

Note that a write lock has higher priority than a read lock: while a writer is waiting, new readers are blocked. So the following interleaving can lead to deadlock (each column is one thread; time flows downward):

Thread-A     Thread-B     Thread-C     Thread-D
             a.read()     b.read()
b.write()                             a.write()
             b.read()     a.read()

Thread-B holds a.read() and Thread-C holds b.read():
a.read() in C waits for the pending a.write() (high priority) in D, which in turn waits for a.read() in B to be released.
Similarly, b.read() in B waits for the pending b.write() (high priority) in A, which in turn waits for b.read() in C to be released. No thread can make progress.
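
A minimal, self-contained demonstration of the premise (not code from this PR), assuming a fair, write-preferring RwLock such as parking_lot's: a new read() does not succeed while a writer is queued.

```rust
// Demonstrates that a fair, write-preferring RwLock blocks *new* readers
// while a writer is waiting, even though another reader still holds the lock.
use parking_lot::RwLock;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let a = Arc::new(RwLock::new(0u32));

    // Like Thread-B above: hold a read lock on `a`.
    let r1 = a.read();

    // Like Thread-D above: a writer queues up behind the reader.
    let a_writer = Arc::clone(&a);
    let writer = thread::spawn(move || {
        *a_writer.write() += 1;
    });
    thread::sleep(Duration::from_millis(100)); // give the writer time to block

    // Like Thread-C above: a late reader arrives. Because a writer is already
    // waiting, the fair lock does not let it in, so the timed attempt fails
    // (timing-dependent, but reliable in practice).
    let a_reader = Arc::clone(&a);
    let late_reader = thread::spawn(move || {
        a_reader.try_read_for(Duration::from_millis(100)).is_some()
    });
    assert!(
        !late_reader.join().unwrap(),
        "late reader should have been blocked by the queued writer"
    );

    drop(r1); // releasing the first read lock finally lets the writer in
    writer.join().unwrap();
}
```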

The fix is to enforce a consistent acquisition order for these lock pairs.
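
A sketch of that idea (hypothetical code; the concrete order below is an arbitrary choice for illustration, not necessarily the one used in this PR): pick one global order, e.g. reserved_nodes before handlers, and make every path follow it.

```rust
// Hypothetical sketch of the fix: a single global lock order,
// here reserved_nodes -> handlers (an arbitrary choice for illustration).
use parking_lot::RwLock;

struct Host {
    reserved_nodes: RwLock<Vec<String>>,
    handlers: RwLock<Vec<String>>,
}

impl Host {
    fn path_one(&self) {
        let _nodes = self.reserved_nodes.read();
        let _handlers = self.handlers.read();
    }

    fn path_two(&self) {
        // Previously this path took handlers before reserved_nodes; with a
        // single global order there is no cycle of threads waiting on each
        // other, regardless of reader/writer priority.
        let _nodes = self.reserved_nodes.write();
        let _handlers = self.handlers.read();
    }
}

fn main() {
    let host = Host {
        reserved_nodes: RwLock::new(Vec::new()),
        handlers: RwLock::new(Vec::new()),
    };
    host.path_one();
    host.path_two();
}
```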

@dvdplm dvdplm requested a review from vorot93 June 8, 2020 12:06
@sorpaas sorpaas added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. labels Jul 29, 2020
@rakita commented Aug 27, 2020

Hello, I didn't dive deep enough to approve this, but a conflicting lock order on its own is a bad code smell. I just wanted to point out when this can happen and put it into context.

My reasoning:
1. It is already stated that for this to happen we need two RwLocks, each of which is at some point locked for both read and write, and I totally agree. (Of course, the acquisitions of those locks also need to be interleaved and their order has to conflict; you have already shown that the order is conflicting, but let's leave that for later.)

2. The locks in question are: reserved_nodes, sessions, handlers.

3. The approach I used to check this is to find all write locks on those three (do a search for some_mutex.write()) and see when they happen during execution.

  1. Results:
  • handlers.write() is used only in:
    • fn message when we get NetworkIoMessage::AddHandler (this happens only once, at application initialization)
  • reserved_nodes.write() is used in:
    • fn add_reserved_node, which is called from:
      • Host::new, called only once at app initialization
      • fn add_reserved_peer, which is called from the parity_addReservedPeer API
    • fn remove_reserved_node, which is called from:
      • fn remove_reserved_peer, which is called from the parity_removeReservedPeer API
  • sessions.write() is called multiple times:
    • fn create_connection, called from fn connect_peer and fn accept. There are a few more functions behind those, but we can say this write happens when a new peer connects.
    • fn deregister_stream, called from fn kill_connection and fn session_writable; like create_connection, this can be called at any time.
  2. Summary:
  • handlers.write() is not called at all (except at initialization).
  • reserved_nodes.write() is called only from the APIs (parity_addReservedPeer and parity_removeReservedPeer).
  • sessions.write() is called on peer connection/disconnection, which can happen at any time.
  3. Conclusion:
    This deadlock can only happen when a peer connects/disconnects and, at the same time, somebody calls the reserved-peers API.
