*: consider merging absolute8511/nsq fork (HA, replication, in-order delivery, log-based) #887

Open
absolute8511 opened this issue Apr 17, 2017 · 15 comments

@absolute8511
Contributor

NSQ is great for most use cases, and we have been using it for a long time. However, there are some important features we urgently need, so we redesigned NSQ to add them. The main features are:

  • Replication and high availability
  • Automatic balancing and migration
  • In-order message delivery
  • Consumption of historical messages
  • Dynamic log level and more detailed logs
  • More stats in nsqadmin
  • Performance improvements

After the redesign, this version has been used at Youzan.com for several months and has processed 150B messages so far.

To give back to the original project, we have open sourced the entire redesigned project, including the client SDKs, and if possible we would like to merge it into master. Since there are many differences between the two projects, it may be very hard work.
The forked project: https://github.com/absolute8511/nsq

Thanks again for the great work on NSQ.

@ploxiln
Member

ploxiln commented Apr 17, 2017

Merging back into this project would have to be done one feature at a time. And as you say, it'll be a lot of work - not just because of merge conflicts, but also because of design review and revision.

It appears that you've integrated etcd into your fork - that's something we've thought about. Can you describe how your fork uses it exactly?

@mreiferson
Member

Wow, thanks, this is great!

In a similar vein to @ploxiln's comments, I would love to start by reading some design and architecture docs to better understand how you've accomplished all of this. After that, we can decide what makes the most sense in terms of merging it into mainline.

@absolute8511
Contributor Author

Yes, we use etcd as the storage for the cluster metadata, such as the topic partition count, the replica configuration, the node placement strategy, and some other options. @ploxiln
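For illustration only, here is a minimal Go sketch of how per-topic cluster metadata could be stored in etcd. The key layout, field names, and use of the etcd v3 client are my assumptions, not the fork's actual schema (the fork may well use the older v2 API):

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// TopicMeta is a hypothetical shape for per-topic cluster metadata.
type TopicMeta struct {
	PartitionNum int    `json:"partition_num"` // number of partitions
	Replica      int    `json:"replica"`       // replication factor
	Placement    string `json:"placement"`     // node placement strategy
}

// saveTopicMeta writes one metadata record per topic under an assumed key prefix.
func saveTopicMeta(cli *clientv3.Client, topic string, meta TopicMeta) error {
	data, err := json.Marshal(meta)
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	_, err = cli.Put(ctx, "/nsq/cluster/topics/"+topic+"/meta", string(data))
	return err
}
```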

There are some documents under doc (including the slides I presented at the GopherChina Conference), and I am preparing a more detailed article describing what we have done in the fork. @mreiferson

@ploxiln
Member

ploxiln commented Apr 18, 2017

looks like the document useful for English speakers is this one: https://github.com/absolute8511/nsq/blob/master/doc/NSQ-redesigned-details.pdf

One of the big changes is to use a shared file-based queue for all channels. That's similar to the stalled effort in #625 - so that is indeed a topic we're interested in. Can you summarize how requeued messages and deferred-published messages are handled? (I can think of one or two ways, but I'm curious about your strategy and how it works out in practice.)

Is the performance improvement in your fork mostly from this change, from go channels to the shared file-based queue?

Another big change - even bigger, I'd say - is the system of assigning "topic leader" and "topic follower" roles to nsqd processes, using etcd to coordinate. I understand that this is so a "leader" can die abruptly, and a "follower" will have all of the successfully-published messages, and can take-over. Please correct me if I'm wrong: does this mean that all processes which publish a message on Topic "A" must publish to the same single nsqd, the "leader"? Currently, separate processes which are producing messages for a single topic can separately publish to N independent nsqd if desired, for the same topic "A". This change would seem to significantly limit the scalability of nsqd for handling a single topic. That may be a trade-off some users are willing to make for this functionality. But again, correct me if I'm wrong :)

@absolute8511
Contributor Author

absolute8511 commented Apr 19, 2017

For requeues, we currently have two strategies: the first puts the message back into the requeue channel, and the second puts it back at the end of the disk file. So for deferred publishes or deferred requeues, the deferred time may be imprecise. This imprecision cannot be avoided when there are lots of deferred messages. The priority queue NSQ uses for deferred messages is not enough when too many messages are deferred, so we are planning to use a hierarchical timing wheel to store them on disk (details are still under discussion).

Besides the file-based queue, there are some other improvements, such as group commit to reduce IO, an ID generator using an atomic int64, a buffer pool to reduce GC pressure, and prefetching on disk reads to reduce syscalls.
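For illustration, a minimal Go sketch of two of these ideas, the atomic-int64 ID generator and a sync.Pool-based buffer pool; names and structure are assumptions, not the fork's actual code:

```go
package main

import (
	"bytes"
	"sync"
	"sync/atomic"
)

// idGenerator hands out monotonically increasing IDs with a single atomic add.
type idGenerator struct {
	last int64
}

func (g *idGenerator) NextID() int64 {
	return atomic.AddInt64(&g.last, 1)
}

// bufPool reuses byte buffers across messages to reduce GC pressure.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// withBuffer borrows a buffer from the pool for the duration of fn.
func withBuffer(fn func(*bytes.Buffer)) {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset()
	fn(b)
	bufPool.Put(b)
}
```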

About the leader and follower, you're correct. To address scalability we introduced partitions: a topic can have several partitions, and each partition of a topic can have an independent leader to handle reads and writes.
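To make the partition/leader idea concrete, here is a hypothetical Go sketch of how a producer might route a publish to the leader of one partition; the lookup source and hashing scheme are assumptions, not the fork's actual design:

```go
package main

import "hash/fnv"

// partitionLeader is a hypothetical view of the metadata a producer would
// look up (e.g. via the lookup service backed by etcd) for each partition.
type partitionLeader struct {
	Topic      string
	Partition  int
	LeaderAddr string // the nsqd leader handling reads/writes for this partition
}

// pickLeader routes a publish to one partition's leader by hashing a key.
func pickLeader(leaders []partitionLeader, key string) partitionLeader {
	h := fnv.New32a()
	h.Write([]byte(key))
	idx := int(h.Sum32() % uint32(len(leaders)))
	return leaders[idx]
}
```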

@ploxiln
Member

ploxiln commented Apr 19, 2017

ah, partitions, reminds me of Kafka

@shinzui

shinzui commented Apr 20, 2017

This is really exciting. I hope this work could make its way back to the main project.

@absolute8511
Contributor Author

absolute8511 commented Apr 21, 2017

Yeah, I think the first thing that needs to be done is: every message should be written to the disk file, without relying on the channel overflowing to disk. All these features are based on this assumption.

@mreiferson
Member

Yeah, I think the first thing that needs to be done is: every message should be written to the disk file, without relying on the channel overflowing to disk. All these features are based on this assumption.

I agree with this as a pre-requisite. Have you reviewed the work in #625?

@mreiferson mreiferson changed the title Add some great features for the NSQ and maybe merge to master *: consider merging absolute8511/nsq fork (HA, replication, in-order delivery, log-based) Apr 22, 2017
@absolute8511
Contributor Author

absolute8511 commented Apr 25, 2017

After reviewing the changes, I noticed that a WAL is added for each topic. In my redesigned solution, I tried to make the fewest changes possible. Since the topic disk file has all the message data, I treated the disk file as the WAL itself. So what I did was just remove the code that handles channel overflow to disk.

And to keep compatibility with the original NSQ, I added most of the replication-related code outside the topic code, without modifying the old topic. This way we can also turn off replication in the topic unit tests.

The biggest change I made to the topic is changing the message ID from the 16-byte GUID to a 16-byte binary ID (8 bytes for a uint64 internal ID and 8 bytes for an external trace ID).
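A rough Go sketch of what such a 16-byte binary ID could look like; the byte order and method names are assumptions, not the fork's definitions:

```go
package main

import "encoding/binary"

// MessageID packs an 8-byte internal ID and an 8-byte trace ID into 16 bytes.
type MessageID [16]byte

func NewMessageID(internalID, traceID uint64) MessageID {
	var id MessageID
	binary.BigEndian.PutUint64(id[:8], internalID)
	binary.BigEndian.PutUint64(id[8:], traceID)
	return id
}

func (id MessageID) InternalID() uint64 { return binary.BigEndian.Uint64(id[:8]) }
func (id MessageID) TraceID() uint64    { return binary.BigEndian.Uint64(id[8:]) }
```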

@lihan

lihan commented May 8, 2017

@absolute8511 "Dynamic Log level and more detail logs" Possible to submit a separate branch to address this one? It would be way quicker to merge smaller set over the time than a big chunk in one go.

@mreiferson
Member

Dynamic log level and more detailed logs

This is now happening in #892. Please review if you haven't already.

After reviewing the changes, I noticed that a WAL is added for each topic. In my redesigned solution, I tried to make the fewest changes possible. Since the topic disk file has all the message data, I treated the disk file as the WAL itself. So what I did was just remove the code that handles channel overflow to disk.

@absolute8511 I see. How did you implement independent per-channel "cursors" on top of the diskqueue though? That's basically the reason why I wrote and used mreiferson/wal.

The biggest change I made to the topic is changing the message ID from the 16-byte GUID to a 16-byte binary ID (8 bytes for a uint64 internal ID and 8 bytes for an external trace ID).

Agreed, my assumption is that the on-disk structure will need to change to support these kinds of features.

@mreiferson
Member

@absolute8511 taking a step back, how do you propose we begin the process of discussing design and architecture? Assuming we reach consensus on those things, we can then discuss implementation details and roadmap.

@absolute8511
Contributor Author

absolute8511 commented May 19, 2017

@mreiferson The channel cursor holds the fileNum of the disk file and the offset within that file. When a message is read from the file, we fill in the offset and the occupied size in the message metadata. When the message is finished, the channel cursor is updated based on that metadata. Additionally, we introduced a log index file outside of the topic to help sync with replicas.
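To make this concrete, here is a rough Go sketch of such a cursor; all names are illustrative, and the real code also has to handle out-of-order FINs and the log index file mentioned above:

```go
package main

// diskQueueOffset locates a position in the topic's segmented disk files.
type diskQueueOffset struct {
	FileNum int64 // which segment file
	Pos     int64 // byte offset within that file
}

// messageMeta is filled in when a message is read from disk.
type messageMeta struct {
	Offset diskQueueOffset
	Size   int32 // bytes the message occupies on disk
}

// channelCursor tracks how far this channel's consumers have confirmed.
type channelCursor struct {
	confirmed diskQueueOffset
}

// advance moves the cursor past a finished message (simplified: assumes in-order FINs).
func (c *channelCursor) advance(m messageMeta) {
	c.confirmed = m.Offset
	c.confirmed.Pos += int64(m.Size)
}
```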

I have been a little busy recently, and I am still preparing some technical articles on the redesigned NSQ. I think we can begin the merge discussion after I finish these design documents.

PS: the first overview article is here:
Redesign Overview
More details will follow in the next few weeks.

@yunnian

yunnian commented Jan 15, 2018

This is my coworker; his Chinese name is 李文.
