Enable safe retry of aggregate commands in the event of (perceived) failures #2554
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Proposal
This is a draft proposal. It aims to enable safe retry of aggregate commands in the event of (perceived) failures, without changing current default behavior (i.e. opt-in).
Context
NoHandlerForCommandException
). As such, it makes sense to introduce the concept of aRetryScheduler
.COMMAND_TIMEOUT
)The problem and the proposal are demonstrated in the following repository: https://github.com/jvanheesch/axon-command-idempotency
Approach
Message.getIdentifier()
) in the event@CommandHandler
Limitations
This approach cannot be used to make the following CHs idempotent:
I'm not entirely happy about this, but I do believe the benefits largely outweigh these limitations. For CH constructors, this problem can be mitigated by including a
@TargetAggregateIdentifier
in the command message and selecting an event store implementation that enforces(aggregate_identifier, sequence_number)
uniqueness (e.g. when using axon server, duplicate creation commands result inOUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 0 for aggregate a1, expected 1)
.Final thoughts
There is definitely room for improvement. Some current questions/concerns:
AggregateSnapshotter
stores ALL previouscorrelationIds
(for idempotent aggregates), which presumably makes snapshot size unbounded. Is such naive implementation sufficient, or is a more complex strategy (retention policy) required?return null;
the most appropriate way of handling duplicate commands? Would some kind of exception make more sense?idempotent
attribute of the@Aggregate
annotation is not always taken into account. This value should be available to every piece of code that instantiates anAnnotatedAggregate
.AggregateConfigurer
passes this value toEventSourcingRepository
and as such, everyAnnotatedAggregate
returned byEventSourcingRepository
will have the correct value for itsidempotent
property. There are however several other places where anAnnotatedAggregate
is sourced (fromDomainEventStream
or state) that currently do not have access to this newly added property, e.g.:AggregateSnapshotter
does not have access to this value, and resorts to providing the methodshouldAddCorrelationIds()
, enablingSpringAggregateSnapshotter
to mitigate this problemCommandHandlerInvoker.DisruptorRepository
does not have access to this value, and as suchDisruptorCommandBus
will not filter out redeliveriesGenericJpaRepository
does not have access to this value. On top of that, additional changes would be required to take this value into account, asGenericJpaRepository
currently does not store (and hence cannot restore) command ids.idempotent
should be globally configurable, eliminating the need foridempotent = true
in every@Aggregate
annotationI believe safe retries would greatly improve command failure handling, and I hope they will be available some day. I'd be happy to improve upon this draft, but would like to hear your thoughts first.