Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Greenfield] Add sentTimestamp field to delivery data (Fix #3146) #3278

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

NicolasDorier
Copy link
Member

@NicolasDorier NicolasDorier commented Jan 10, 2022

The timestamp of the creation of the delivery may be different to the timestamp when it was actually sent.
This is caused because we are queuing the events so the receiver get them in order.

This caused confusion in #3146.

I think another issue which caused the confusion is that deliveries aren't save in database until they are sent. Meaning that the user suddenly saw the event show up in the UI way after it has been created because the callback server took time to respond properly.

This need probably a bigger refactoring though, so I keep it simple here, and may improve the API later.
This would also be a breaking change.

Note that the confusion was caused at the UI level, and this add a field to the API... So I'm not entirely satisfied by my "solution" either. Maybe we don't even need this field until we refactor the API?

@NicolasDorier NicolasDorier marked this pull request as draft January 10, 2022 13:46
@NicolasDorier
Copy link
Member Author

Putting this in draft, need to think a bit.
I think we should better refactor the deliveries to save before they are sent.

@woutersamaey
Copy link
Contributor

we are queuing the events so the receiver get them in order.

This may sound logical at first, but I'm not sure it's a good idea to be honest.
In every client I've made so far the first thing I do is take the InvoiceID and throw away everything else. Then I fetch fresh data.
Relying on posted data is dangerous because of potential timing and concurrency issues. You'll see the same thing in webhooks and APIs like with PayPal.

We should reduce the fields in the webhook to only include the invoice ID + fields that are relevant to the delivery itself (like was it resent?). This will result in a very very small message.

If you're considering refactoring, that's what I would advise.

How does the current queue work exactly? How long is a message put on hold? Can the queue get stuck? Do we need to consider an unclogging feature?

Ignoring the sequence and just shooting messages is better, also for scaling. Today nobody runs BTCPay in a cluster, but the queuing could also prevent larger setups (maybe needed in a few years).

@woutersamaey
Copy link
Contributor

woutersamaey commented Jan 11, 2022

Maybe we can add an allowOutOfSequenceDelivery field as boolean when registering the webhook?
This makes me think of how AWS SQS has 2 types of queues and makes you choose which one you want, each with pros and cons.

Junior devs and integrations that you cannot customize may prefer in sequence.
Advanced users would likely go for out of sequence, I believe.

In any case, the queue system should be explained on the page where you manage webhooks, so as a dev you know about this. Both @ndeet and I have been working with the API for a long time now and we didn't know of this queue system.

I'm guessing out of sequence events would also greatly simplify the refactoring needed of global events...

@NicolasDorier
Copy link
Member Author

NicolasDorier commented Apr 26, 2022

@woutersamaey I won't change the format of webhooks, as this is a breaking change, and the data in the event is safe if the hmac is verified.

The queue is kind of bare, just keeping it in a queue memory.

Now I am thinking of doing of just doing my best to be ordered, but drop the ordering if queue is stuck.

@woutersamaey
Copy link
Contributor

@NicolasDorier my suggestion was not so much about making a breaking change, but more like having 2 types where the user can choose if he wants ordered or instant webhooks like AWS SQS is doing. Both have their use case.

The ordered queue we already have.
The unordered "instant" queue should be easy to develop as there is far less complexity.

So just have 2 types with pros and cons you can pick from, like AWS SQS.

That was my idea.

@ndeet
Copy link
Contributor

ndeet commented Apr 26, 2022

To reiterate, th main problem of the current queue referred to in #3146 is that if one webhook fails the whole queue is stuck. So e.g. if you sent a webhook event for invoice id 1 and the webhook endpoints responds with 404 because it can't find any record. It will also block events to be sent for all follow up invoice id 2, 3, 4, ... until it retries and fails again making other events stuck and invoice 2, 3, 4... will stay in status new although they are paid already.

A quick solution might be to only lock the queue when there is a 500 error. If there is a 404 it should continue retrying or just stop for that invoice?

A long term solution would be to have the queue locked per invoice id, so if id 1 fails id 2 still sent and not blocked.

Timestamps and database entry:
Yes this is very confusing. The proper way would be to write the queue to db including timestamp before sending the actual event and then update the status if it succeeded or not. Having many events popping up suddenly is very confusing.

Unordered queue:
If it is optional might make sense. I like the ordered and signed queue though. As it is signed and ordered you can actually rely on the data. This means no roundtrip to load the invoice which is imo more efficient as it does not open many connections on each received event. So if you have unordered notifications hitting endpoint you have to re-query the invoice all the time, causing multiple requests although the invoice may be settled already. So not sure if it is more efficent of bytes transferred :D But agree both have their pros and cons and if it is optional it would be ok for me but ymmv.

@woutersamaey
Copy link
Contributor

@ndeet IMHO unordered queues are more future proof as we'll likely see more webhooks in the future, including ones that are not invoice related. Guarding sequence could be a big pain for events that are related to eachother, but don't have a single topic.

Example: New user created webhook => store created webhook => payment method activated webhook => invoice created webhook

Unordered queues allow us to send any webhook at any time and not be bothered by sequence or contents of the data. Web requests are by nature subject to race conditions, so trying to solve this is only possible in a limited scope like in this case "everything related to 1 invoice".

This was my reasoning...

@ndeet
Copy link
Contributor

ndeet commented May 4, 2022

@woutersamaey yes from that POV I agree and most of the cases the order will be kept anyway. What you describe is how most of the queues work but some have some kind of prioritization and stuff for non-blocking or important events.

What makes me prefer ordered events in this case is that it would be weird to receive e.g. a InvoiceSettled event before the InvoiceReceivedPayment. Or could maybe problematic if someone relied on InvoiceCreated event but the InvoiceSettled happens first the system receiving the webhook likely will never know that the invoice already settled. I know it is very unlikely to happen but just wanted to bring that up so we can discuss if it is reasonable to do it anyway.

@woutersamaey
Copy link
Contributor

@ndeet it could be wierd if webhooks arive in a different sequence, but I see this as a n00b issue.
A skilled dev or any bitcoiner who knows decentralized systems prolly already knows that networks take time to propagate messages and should consider this in their app design.
Trying to make networks synchronous is perhaps accommodating noobs too much?

@NicolasDorier
Copy link
Member Author

Or maybe just doing a best effort. If we notice the queue is stuck, drop the ordering.

@pavlenex
Copy link
Contributor

@NicolasDorier @woutersamaey @ndeet What's blocking this PR? Seems still in draft and it seems that it will help us fix #3146 so should we get back to it and include it as a part of a bug fix release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants