Other peer discovery mechanism #47

Open
mhchia opened this issue Aug 16, 2018 · 13 comments
Labels: design (Need design), question (Further information is requested)

Comments

@mhchia
Collaborator

mhchia commented Aug 16, 2018

What is wrong?

We use a global topic on which nodes broadcast their ShardPreference, which tells other nodes who is listening on which shards. A ShardPreference occupies only SHARD_COUNT bits, plus some bytes of packet headers, but the broadcast traffic could still become an issue once the number of nodes in the sharding P2P network grows very large.
Another problem is that a ShardPreference is not easily verified, so scams are hard to prevent. A node can connect to the node that just broadcast a ShardPreference and ask for Proof-of-Custody data to verify that it actually listens to that shard, but this is still quite tricky.
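
To make the size argument concrete, here is a minimal Python sketch of a ShardPreference packed into a bitfield. The bit layout, the helper names, and the SHARD_COUNT value of 1024 are assumptions for illustration only; they are not taken from the actual implementation.

```python
SHARD_COUNT = 1024  # assumption: illustrative value, not confirmed by the spec


def encode_shard_preference(shards):
    """Pack a set of shard IDs into a SHARD_COUNT-bit bitfield."""
    bits = bytearray((SHARD_COUNT + 7) // 8)
    for shard_id in shards:
        if not 0 <= shard_id < SHARD_COUNT:
            raise ValueError(f"shard id out of range: {shard_id}")
        bits[shard_id // 8] |= 1 << (shard_id % 8)
    return bytes(bits)


def decode_shard_preference(bitfield):
    """Recover the set of shard IDs whose bit is set."""
    return {
        i for i in range(SHARD_COUNT)
        if (bitfield[i // 8] >> (i % 8)) & 1
    }


pref = encode_shard_preference({0, 5, 1023})
listening = decode_shard_preference(pref)
```

With these assumptions each message body is 128 bytes before headers, and every node has to receive one such message from every other node. Note that nothing in the bitfield proves the sender really listens on the shards it claims.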

How can it be fixed?

Find other peer discovery approaches.
Possible options we already have in mind are:

  • Two DHTs: one dedicated to translating peerID to IP and port, and another used for peer discovery. The key of the second one should correlate with shardID and possibly peerID(?), and the value is peerID.
    • It's just an idea. I'm not sure if it works or not.
  • rendezvous protocol
    • Is it quite experimental?
  • DHT providers with topics (example is here)
  • Others

Edit: added "scamming through ShardPreference channel" in "What is wrong?"

@mhchia added the question (Further information is requested), design (Need design), and later labels on Aug 16, 2018
@mhchia mhchia mentioned this issue Aug 16, 2018
7 tasks
@mhchia mhchia removed the later label Sep 13, 2018
@mhchia
Collaborator Author

mhchia commented Sep 13, 2018

Now should be a good time to start investigating this.

@jrhea
Contributor

jrhea commented Sep 15, 2018

I'm curious, how does a node decide what shard to join? Is it random, or do they join the shard that has the least number of participants?

@mhchia
Collaborator Author

mhchia commented Sep 18, 2018

A node can choose which shard to join of its own accord. Validators, however, are assigned specific shards to join by the beacon chain.

@mhchia
Collaborator Author

mhchia commented Sep 18, 2018

Jannik is working on this design. Reference: jannikluhn/sharding-netsim#3, jannikluhn/sharding-netsim#4

@raulk
Contributor

raulk commented Sep 18, 2018

Quick note just to bring provider records into consideration. With go-libp2p-kad-dht, you can declare yourself as a provider of a CID (content ID).

Other nodes can look up providers for a given CID on the DHT. We could experiment with setting a value like "eth:shard:" for the payload of the CID, hashed with whatever function and encoded in base58 or something else.

Nodes can then look up members "providing" membership in a shard using FindProviders: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/routing.go#L456
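
A rough Python sketch of that idea, to show the shape of it rather than the real API. Everything here is an assumption: the shard number appended to the "eth:shard:" prefix, SHA-256 as the hash, the plain base58 string standing in for a proper CID (a real CID carries multihash and codec prefixes), and the in-memory `ToyProviderTable` standing in for the DHT.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58(data: bytes) -> str:
    """Base58-encode bytes (Bitcoin alphabet); leading zero bytes become '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def shard_key(shard_id: int) -> str:
    """Hypothetical key for a shard: base58(sha256("eth:shard:<id>"))."""
    payload = f"eth:shard:{shard_id}".encode()
    return base58(hashlib.sha256(payload).digest())


class ToyProviderTable:
    """In-memory stand-in for DHT provider records (key -> peer IDs)."""

    def __init__(self):
        self._providers = {}

    def provide(self, key, peer_id):
        self._providers.setdefault(key, set()).add(peer_id)

    def find_providers(self, key):
        return sorted(self._providers.get(key, set()))


table = ToyProviderTable()
table.provide(shard_key(3), "peerA")
table.provide(shard_key(3), "peerB")
members = table.find_providers(shard_key(3))
```

In a real DHT the records for one key live on the nodes closest to that key, so `find_providers` would be a network lookup rather than a dictionary read.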

I'll also enquire what the status of rendezvous is.

@jannikluhn

@raulk Curious about this, could you please elaborate a little on how this works? I'm guessing the DHT maps CIDs to a list of node ids? I briefly thought about something like this, but it seemed a bit weird (and potentially dangerous) to me that there would be nodes that know about all nodes in a single shard.

@jrhea
Contributor

jrhea commented Sep 19, 2018

A node can choose which shard to join of its own accord. Validators, however, are assigned specific shards to join by the beacon chain.

Ok so if nodes can join a shard of their choosing, how do you ensure that there are enough nodes in a shard? Will each type of client (i.e. Nimbus, PegaSys, etc) implement different logic for choosing a shard to join, or will they all just initially select a shard to join at random?

@mhchia
Collaborator Author

mhchia commented Sep 20, 2018

@jrhea

Ok so if nodes can join a shard of their choosing, how do you ensure that there are enough nodes in a shard?

I think that is what "shard load balancing" wants to solve, but IMO we currently don't have a specific approach. One thing that might mitigate this is to let clients connect to a random shard by default.

Will each type of client (i.e. Nimbus, PegaSys, etc) implement different logic for choosing a shard to join, or will they all just initially select a shard to join at random?

I think it might be possible; maybe we can reach a consensus on how to do this later.

@jrhea
Contributor

jrhea commented Sep 20, 2018

@mhchia and @jannikluhn, I was thinking about a scheme for deciding which shard a client should join...

  • peer is as defined in libp2p, where peer.id = SHA256(peer.pubkey)
  • c is the number of shards

A client performs the following calculation to determine which shard to join:

peer.id mod c

Even if c isn't a factor of 2^256, the bias would be so low (on the order of 2^-256) that it would be undetectable.

Benefits:

  • shard topics would be evenly populated by clients
  • determining the shard a peer belongs to could be calculated instead of relying on other methods

I haven't thought about how to manage the scenario when a client needs to switch shards, but peer.id would have to account for it somehow.
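
A quick Python sketch of the calculation, in case it helps. It treats peer.id as a raw SHA-256 digest read as a big-endian integer, which is a simplification (real libp2p peer IDs are multihash-encoded), and the function name and demo keys are made up for illustration.

```python
import hashlib
from collections import Counter


def shard_for_peer(pubkey: bytes, shard_count: int) -> int:
    """Return peer.id mod c, with peer.id = SHA256(pubkey) as an integer."""
    peer_id = hashlib.sha256(pubkey).digest()
    return int.from_bytes(peer_id, "big") % shard_count


# Spread 4000 made-up keys over 4 shards and count how many land on each.
counts = Counter(
    shard_for_peer(f"demo-key-{i}".encode(), 4) for i in range(4000)
)
```

On the bias: of the 2^256 possible IDs, the number mapping to any two shards differs by at most one, and when c is a power of two (for example 1024) the split is exact.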

@jannikluhn

I think in most cases users should decide manually what shard they want to join (mostly because they are interested in a particular contract on that shard). "Forcing" them to a different shard would not be a good solution. What one could do is "soft load balancing", i.e. if nodes don't have a preference because they are joining for the first time, suggest the shard with the lowest gas price.

To ensure that shards aren't empty (especially in the beginning when usage is low), I see two viable options:

  • have one (or more) bootstrapping nodes for each shard that can serve at least the number of validators assigned to that shard at all times
  • make validators by default connect to a random shard (in addition to their validation assignment). That should ensure roughly a 1:1 ratio of validators to static nodes.
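
A minimal Python sketch of how these rules could fit together on the client side. The function name, its parameters, and the `gas_prices` mapping (shard ID to current gas price, from some unspecified source) are all hypothetical.

```python
import random


def shards_to_join(shard_count, assigned=None, preferred=None,
                   gas_prices=None, rng=random):
    """Pick the set of shards a node connects to.

    assigned:   shard given to a validator by the beacon chain, if any
    preferred:  shard the user picked manually, if any
    gas_prices: hypothetical mapping of shard id -> gas price
    """
    if assigned is not None:
        joined = {assigned}
        if shard_count > 1:
            # One extra random shard, guaranteed to differ from the assignment.
            extra = rng.randrange(shard_count - 1)
            if extra >= assigned:
                extra += 1
            joined.add(extra)
        return joined
    if preferred is not None:
        return {preferred}  # never force the user elsewhere
    if gas_prices:
        # "Soft load balancing": suggest the cheapest shard.
        return {min(gas_prices, key=gas_prices.get)}
    return {rng.randrange(shard_count)}


validator_shards = shards_to_join(8, assigned=3, rng=random.Random(0))
newcomer_shards = shards_to_join(8, gas_prices={0: 10, 1: 2, 2: 7})
```

The manual preference always wins for non-validators; the gas price hint only applies when there is no preference at all.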

@jrhea
Contributor

jrhea commented Sep 21, 2018

@jannikluhn thanks for clearing that up - great explanation. If you don't mind, I have a couple of follow-up questions.

I think in most cases users should decide manually what shard they want to join (mostly because they are interested in a particular contract on that shard).

How does the user know what shard a contract is on? Is that info stored on the main chain, will they find out by asking members of a global topic, or something else?

make validators by default connect to a random shard (in addition to their validation assignment). That should ensure roughly a 1:1 ratio of validators to static nodes.

What is the definition of a 'static node'? Again, thanks for the response and sorry for all the questions.

@jannikluhn

How does the user know what shard a contract is on?

I'd imagine it to be the same way users know contract addresses today; the shard id would just be an additional prefix to the address. So mostly "off-chain", but name resolvers on some shard or the main chain are also possible.

What is the definition of a 'static node'?

I meant nodes that don't change their shards frequently (to distinguish from validators).

@raulk
Contributor

raulk commented Sep 24, 2018

@jannikluhn

Curious about this, could you please elaborate a little on how this works? I'm guessing the DHT maps CIDs to a list of node ids? I briefly thought about something like this, but it seemed a bit weird (and potentially dangerous) to me that there would be nodes that know about all nodes in a single shard.

I like to think of the "provider" entry like a "symlink" in the DHT. The mapping of [key=>nodes who store it] is done by distance metric, but instead of storing the actual value, it stores who is known to possess that value.

But I think you are right. Given the discrete domain of shards (1024 shards?), the CIDs would be predictable and the nodes responsible for those prefixes could become attack targets.

go-libp2p-rendezvous (for reference purposes: spec, impl) seems like a direction to explore, as well as pubsub (which would require bootstrap nodes as well, i.e. rendezvous).

I guess one of the complexities is how to guard against spurious peers. Perhaps the rendezvous nodes could send challenges to nodes registered on a shard periodically – or the users of discovery could ask for deregistration of spurious nodes by presenting a proof of data unavailability (or something similar) to the rendezvous nodes? i.e. if I connect to a client that's registered on shard X, but I discover that it's a lie, I can present a proof to have that node de-registered.
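
As a toy illustration of the deregistration flow, here is a Python sketch. It is not the go-libp2p-rendezvous API: the class, its method names, and the `verify_proof` callback are hypothetical, and how a proof of data unavailability would actually be checked is left open.

```python
class ToyRendezvous:
    """In-memory stand-in for a rendezvous node tracking shard membership."""

    def __init__(self, verify_proof):
        # verify_proof(peer_id, shard_id, proof) -> bool; scheme is undefined.
        self._verify_proof = verify_proof
        self._registrations = {}  # shard_id -> set of peer IDs

    def register(self, shard_id, peer_id):
        self._registrations.setdefault(shard_id, set()).add(peer_id)

    def discover(self, shard_id):
        return sorted(self._registrations.get(shard_id, set()))

    def report(self, shard_id, peer_id, proof):
        """Deregister peer_id from shard_id if the proof checks out."""
        registered = self._registrations.get(shard_id, set())
        if peer_id in registered and self._verify_proof(peer_id, shard_id, proof):
            registered.discard(peer_id)
            return True
        return False


# Stub verifier for the demo: accepts only the literal bytes b"valid".
rv = ToyRendezvous(lambda peer, shard, proof: proof == b"valid")
rv.register(7, "honest-peer")
rv.register(7, "lying-peer")
rejected = rv.report(7, "lying-peer", b"bogus")
accepted = rv.report(7, "lying-peer", b"valid")
remaining = rv.discover(7)
```

The hard part is the verifier: it has to be cheap for the rendezvous node to check and impossible to forge, otherwise reports themselves become a way to evict honest peers.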

4 participants