fix: remove acl-check and cancel instead when REPLCONF ACK fails to validate #2920

Merged
kostasrim merged 8 commits into main from fix_acl_check_deadlock on May 1, 2024

Conversation

kostasrim
Contributor

@kostasrim kostasrim commented Apr 17, 2024

resolves #2907

  • remove acl-checker which is also the source of this bug
  • add a fallback that cancels replication when REPLCONF ACK fails to validate

@kostasrim kostasrim self-assigned this Apr 17, 2024
@romange
Collaborator

romange commented Apr 17, 2024

@kostasrim why does the acl checker (on the replica?) call SendCommandAndReadResponse?

@kostasrim
Contributor Author

@kostasrim why does the acl checker (on the replica?) call SendCommandAndReadResponse?

Because it periodically pings the master to re-validate the connection against ACL changes: the ACLs of the authenticated user might change, in which case replication needs to be cancelled.

Snippet:

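    // Replica-side round trip: asks the master to re-validate the ACLs of the
    // authenticated replication user and cancels replication on any failure.
    void CheckAclRoundTrip() {
      if (auto ec = SendCommandAndReadResponse(StrCat("REPLCONF acl-check ", "0")); ec) {
        // Transport-level failure: the command could not be sent or the reply read.
        cntx_->Cancel();
        LOG(INFO) << "Error in REPLCONF acl-check: " << ec.message();
      } else if (!CheckRespIsSimpleReply("OK")) {
        // The master answered with something other than OK, e.g. an ACL error.
        cntx_->Cancel();
        LOG(INFO) << "Error: " << ToSV(LastResponseArgs().front().GetBuf());
      }
    }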


@romange
Collaborator

romange commented Apr 17, 2024

@kostasrim but if the replication needs to be cancelled, why does the replica need to know about this? Why not shut down the connection on the master side?

@kostasrim
Contributor Author

kostasrim commented Apr 18, 2024

@romange

but if the replication needs to be cancelled, why does the replica need to know about this? Why not shut down the connection on the master side?

Because if we just shut down the connection, the replica does not really know what happened, but it should: there was an ACL change that somehow invalidated the ACLs of the user masterauth && masterpass. We can't just kill the connection. We want to make sure that the replica is aware of this change, and a killed connection is not necessarily an indicator of an ACL failure. Redis does this as well IMO; I made sure we have consistent semantics.

@romange
Collaborator

romange commented Apr 18, 2024

I agree that killing a connection is not an indicator, but let's simulate the rest of the flow:

  1. The master kills the connection.
  2. Replica does not know why so it tries to reconnect.
  3. It authenticates and gets an invalid auth response.

Why is this flow not good?

@kostasrim
Contributor Author

kostasrim commented Apr 18, 2024

I agree that killing a connection is not an indicator, but let's simulate the rest of the flow:

  1. The master kills the connection.
  2. Replica does not know why so it tries to reconnect.
  3. It authenticates and gets an invalid auth response.

Why is this flow not good?

One issue I can think of is that this introduces a race. Once a connection is killed, there is a very small window in which the ACLs can change again BEFORE the replica tries to reconnect. So for example,

  1. replication works all good
  2. acl changes to the authenticated user for replication
  3. acl-check causes the master to kill replication
  4. acl changes again to the authenticated user for replication
  5. now replica tries to reconnect and it works

The replica got disconnected, retried, and connected successfully, and we just made an error message disappear. Imagine seeing a bunch of connects/disconnects in the logs for no apparent reason.
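
To make the race above concrete, here is a minimal, self-contained sketch. It is not Dragonfly code: `MockMaster` and `SimulateRace` are hypothetical names invented for illustration, and the ACL state is reduced to a single boolean. It only shows that when the master silently drops the link, a revoke followed by a quick reinstate leaves no ACL error in the replica's log.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the master (not Dragonfly code).
struct MockMaster {
  bool replica_user_allowed = true;  // current ACL verdict for the replication user
  bool connected = true;             // state of the replication link

  // Master-side enforcement: silently drop the link if the ACL forbids it.
  void EnforceAcl() {
    if (!replica_user_allowed) connected = false;
  }

  // Reconnect handshake: succeeds only if the ACL allows the user right now.
  bool TryReconnect() {
    connected = replica_user_allowed;
    return connected;
  }
};

// Replays steps 2-5 of the list above and returns what the replica would log.
std::vector<std::string> SimulateRace() {
  std::vector<std::string> replica_log;
  MockMaster master;

  master.replica_user_allowed = false;  // step 2: ACL of the replication user changes
  master.EnforceAcl();                  // step 3: the master kills replication
  if (!master.connected) replica_log.push_back("disconnected (reason unknown)");

  master.replica_user_allowed = true;   // step 4: ACL changes again before the retry
  if (master.TryReconnect()) {          // step 5: the reconnect works
    replica_log.push_back("reconnected");
  } else {
    replica_log.push_back("auth failed");
  }
  return replica_log;
}
```

Calling `SimulateRace()` yields a disconnect followed by a successful reconnect, with no entry that mentions ACLs at all.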

Another problem is that you are slightly changing the protocol:

We validate commands against the ACLs. The problem is that on reconnect other commands are used, and they belong to different categories. So you can also get a failure for some other ACL category that is completely irrelevant to the original flow (I am thinking of PING, because if I recall correctly we ping during reconnect). Any other command called during that flow might cause an ACL failure completely unrelated to the original one (we are in Partial Mode vs FullSync).

Also, I don't really understand why you would want to funnel the ACL validation into the normal replication reconnect flow, or how this would have helped with the issue in this PR. Why not keep them separate, since they differ logically and there is no real regression (I doubt that the extra "OK" replies from the master are even measurable) or reason to take a different route (other than the potential to introduce more bugs because we missed something, like the two examples I give above)?

@romange
Collaborator

romange commented Apr 18, 2024

Kostas, I am not against the fix; after this reply I am even more confident that this check is redundant. You remove the code and the system works better. I prefer not to have code in the codebase that does not contribute value, so we won't spend time fixing code that should not have been there in the first place. ACL around replication is not interesting. Replication has ADMIN access to the system; it's binary: either it has the access or it does not. There is no point in having fine-grained ACL access for your replica from a product perspective, because it copies all the data anyway and the replica should reside inside your security perimeter.

@@ -2598,9 +2598,6 @@ void ServerFamily::ReplConf(CmdArgList args, ConnectionContext* cntx) {
VLOG(2) << "Received client ACK=" << ack;
cntx->replication_flow->last_acked_lsn = ack;
return;
} else if (cmd == "ACL-CHECK") {
Contributor Author

This breaks replication when the replica runs an older version. My two cents is that we don't care about this case, since this setup doesn't really make much sense. Let me know if you disagree.

Collaborator

please keep the code in master and add a TODO with your anniversary date to remove this code.

Contributor Author

I will add an alarm on my calendar as well 😉

@@ -893,53 +884,6 @@ void Replica::RedisStreamAcksFb() {
}
}

class AclCheckerClient : public ProtocolClient {
Contributor Author

This introduces a small bug if the master is an older version, which we cover in our tests. The problem is that if the ACL of the masteruser changes, then REPLCONF ACK will start failing silently and replication will break without anyone noticing :(

I guess this setup is more probable, since upgrading from one version to another would involve a replica copying the data and then being promoted to master. An intermediate solution would be to keep this check for older-version masters and deprecate or completely remove it at some later point.

I am open for suggestions
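
One way to picture the intermediate solution mentioned above is a version gate on the replica. The sketch below is only an illustration under stated assumptions: `ParseVersion` and `NeedsLegacyAclCheck` are hypothetical helpers, not Dragonfly functions, and the cutoff version is passed in by the caller because the thread does not fix one.

```cpp
#include <array>
#include <sstream>
#include <string>

// major, minor, patch
using Version = std::array<int, 3>;

// Parses "v1.17.1" or "1.17.1"; components that fail to parse stay 0.
// Hypothetical helper for illustration only.
Version ParseVersion(const std::string& text) {
  std::string digits = text;
  if (!digits.empty() && digits[0] == 'v') digits = digits.substr(1);

  Version v{0, 0, 0};
  std::istringstream in(digits);
  char dot = 0;
  in >> v[0] >> dot >> v[1] >> dot >> v[2];
  return v;
}

// True if the master is older than the first version assumed to carry the
// REPLCONF ACK fallback, i.e. the replica should keep its own acl-check.
bool NeedsLegacyAclCheck(const std::string& master_version,
                         const std::string& first_version_with_fallback) {
  // std::array compares lexicographically: major, then minor, then patch.
  return ParseVersion(master_version) < ParseVersion(first_version_with_fallback);
}
```

For example, with an assumed cutoff of `v1.18.0`, a `v1.17.1` master would still get the periodic check while a `v1.18.0` master would not.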

Collaborator

this is fine, the use-case is not interesting.

@kostasrim kostasrim requested a review from romange April 18, 2024 11:39
@kostasrim kostasrim changed the title from "fix: deadlock on acl-check" to "fix: remove acl-check and cancel instead when REPLCONF ACK fails to validate" Apr 18, 2024
@@ -893,6 +893,7 @@ void Replica::RedisStreamAcksFb() {
}
}

// TODO(kostasrim): Remove this on 20/6/2024
Collaborator

@romange romange Apr 19, 2024

I am a bit confused. Shouldn't you remove the replica-side checks so that later you will be able to remove the master-side responses?
Where do you remove acl-check, as written in the PR title?

Contributor Author

Sorry, my bad here. While I was making the changes I realized that the code I introduced above is enough to allow us a smooth transition (that means that as long as two subsequent versions of df contain the code below, we can remove both acl-checks and be fine without the clients noticing).

      auto info_ptr = server_family_.GetReplicaInfo(dfly_cntx);
      if (info_ptr) {
        info_ptr->cntx.Cancel();
      }
      return;

However, I do agree we should remove this now. I will push an update soon.
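
For readers without the full diff, the following is a rough, self-contained sketch of the fallback idea: when a command fails ACL validation and that command is REPLCONF ACK, the master cancels the replication flow instead of replying. Everything here (`ReplicaInfo`, `DispatchResult`, `Dispatch`, the `acl_allows` flag) is a simplified assumption made for illustration and does not reproduce the actual `Service::DispatchCommand` code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-replica state kept by the master.
struct ReplicaInfo {
  bool cancelled = false;
  void Cancel() { cancelled = true; }
};

enum class DispatchResult { kExecuted, kRejected, kReplicationCancelled };

// Case-insensitive comparison, since command names may arrive in any case.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
           return std::tolower(x) == std::tolower(y);
         });
}

// acl_allows models the outcome of ACL validation for this command.
// info may be null when the connection is not a registered replica.
DispatchResult Dispatch(const std::vector<std::string>& args, bool acl_allows,
                        ReplicaInfo* info) {
  if (args.empty()) return DispatchResult::kRejected;
  if (acl_allows) return DispatchResult::kExecuted;

  // The replica does not read replies to REPLCONF ACK, so an error reply would
  // go unnoticed. Cancel the replication flow instead of answering.
  if (args.size() >= 2 && EqualsIgnoreCase(args[0], "REPLCONF") &&
      EqualsIgnoreCase(args[1], "ACK")) {
    if (info != nullptr) info->Cancel();
    return DispatchResult::kReplicationCancelled;
  }
  return DispatchResult::kRejected;  // any other command gets a normal ACL error
}
```

In this sketch only a rejected REPLCONF ACK cancels replication; every other rejected command is left to the ordinary error path.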

@@ -1147,7 +1147,10 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx)
// Bonus points because this allows to continue replication with ACL users who got
// their access revoked and reinstated
if (cid->name() == "REPLCONF" && absl::EqualsIgnoreCase(ArgS(args_no_cmd, 0), "ACK")) {
LOG(ERROR) << "Tried to reply to REPLCONF";
auto info_ptr = server_family_.GetReplicaInfo(dfly_cntx);
Collaborator

@adiholden please help me review these changes

@@ -116,6 +116,10 @@ class ProtocolClient {
return sock_.get();
}

void SetSocketTimeout(uint32_t msec) {
Collaborator

Is this called?

@@ -1147,7 +1147,10 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx)
// Bonus points because this allows to continue replication with ACL users who got
// their access revoked and reinstated
if (cid->name() == "REPLCONF" && absl::EqualsIgnoreCase(ArgS(args_no_cmd, 0), "ACK")) {
LOG(ERROR) << "Tried to reply to REPLCONF";
auto info_ptr = server_family_.GetReplicaInfo(dfly_cntx);
if (info_ptr) {
Collaborator

Don't you need to call DflyCmd::CancelReplication?

@adiholden
Collaborator

@kostasrim how does this solve #2907?

@kostasrim kostasrim merged commit 1fd16ab into main May 1, 2024
10 checks passed
@kostasrim kostasrim deleted the fix_acl_check_deadlock branch May 1, 2024 06:57
kireque pushed a commit to kireque/home-ops that referenced this pull request May 10, 2024
…nfly ( v1.17.1 → v1.18.0 ) (#539)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
|
[docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly)
| minor | `v1.17.1` -> `v1.18.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly
(docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

###
[`v1.18.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.18.0)

[Compare
Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.17.1...v1.18.0)

##### Dragonfly v1.18.0

Some prominent changes include:

- ACL improvements:
[#&#8203;2945](https://togithub.com/dragonflydb/dragonfly/issues/2945)
[#&#8203;2943](https://togithub.com/dragonflydb/dragonfly/issues/2943)
[#&#8203;2920](https://togithub.com/dragonflydb/dragonfly/issues/2920)
[#&#8203;2982](https://togithub.com/dragonflydb/dragonfly/issues/2982)
[#&#8203;2995](https://togithub.com/dragonflydb/dragonfly/issues/2995)
- Implementation of json.merge
[#&#8203;2960](https://togithub.com/dragonflydb/dragonfly/issues/2960)
-   Replication - memory improvements
- Very much alpha support for data tiering. Try it out with
`--tiered_prefix=/pathto/ssd/base` and see how your memory usage goes
down (STRING type only). Do not use it in prod! 😸

##### What's Changed

- feat: retry ACK if the configs are different
[#&#8203;2833](https://togithub.com/dragonflydb/dragonfly/issues/2833)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2906
- chore(tiering): Update Get, Set, Del by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2897
- chore: preparation step for lock fingerprints by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2899
- fix(transaction): Use FinishHop in schedule by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2911
- chore(tiering): Fix MacOs build by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2913
- feat(cluster): Migration cancellation support by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2869
- feat: process migration data after FIN opcode
[#&#8203;2864](https://togithub.com/dragonflydb/dragonfly/issues/2864)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2918
- chore(string_family): Refactor SetCmd by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2919
- fix: Improve reply latency of HELLO by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2925
- chore: improve reply latency of SendScoredArray by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2929
- Namespace support in prometheus rule by
[@&#8203;Pothulapati](https://togithub.com/Pothulapati) in
[dragonflydb/dragonfly#2931
- fix: socket closed when RegisterOnErrorCb is called in HandleRequests
by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2932
- chore: bring more clarity when replayer fails by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2933
- Slot migration cancel crash fix by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2934
- feat: add ability reaply config with migration
[#&#8203;2924](https://togithub.com/dragonflydb/dragonfly/issues/2924)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2926
- fix(test): Unflake fuzzy cluster migration test by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2927
- chore: Remove Schedule() call by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2938
- chore: get rid of lock keys by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2894
- fix: introduce info_replication_valkey_compatible flag by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2936
- feat(metrics): adding max_clients to metrics and info output
([#&#8203;2912](https://togithub.com/dragonflydb/dragonfly/issues/2912))
by [@&#8203;racamirko](https://togithub.com/racamirko) in
[dragonflydb/dragonfly#2940
- chore: adjust transaction code to keystep/3 commands by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2941
- feat(tiering): Get, GetSet, Set test by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2921
- chore(acl): adjust some ACL command responses by
[@&#8203;Niennienzz](https://togithub.com/Niennienzz) in
[dragonflydb/dragonfly#2943
- chore: Pull helio with new future by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2944
- refactor: add cluster namespace by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2948
- chore: Introduce ShardArgs as a distinct type by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2952
- chore: Log db_index in traffic logger by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2951
- fixes for v1.18.0 by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2956
- feat(tiering): Support append (and modifications in general) by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2949
- feat: extended bracket index in jsonpath by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2954
- chore: Remove TieringV1 by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2962
- fix(pytests): replace proc.wait() with proc.communicate() to avoid
deadlocks by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2964
- feat(tiering): Registered buffers by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2967
- feat: add slot migration error processing by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2957
- chore(acl): allow multiple users in acl deluser by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2945
- feat: implement json.merge by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2960
- fix: fix deadlock and slot flush for migration cancel
[#&#8203;2968](https://togithub.com/dragonflydb/dragonfly/issues/2968)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2972
- chore(tiering): Lots of metrics by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2977
- fix: crash during migration when connection is closing by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2973
- fix: remove acl-check and cancel instead when REPLCONF ACK fails to
validate by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2920
- fix: check return code of process after communicate by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2976
- fix: allow non hashed passwords when loading users from acl file by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2982
- chore: update our container distributions versions by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2983
- chore: remove version checks when running our regtests by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2988
- chore(acl): add vlog and check on deluser flow by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2995
- fix(memcached): Register memcached listener to handle `--maxclients`
by [@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2985
- chore: another preparation commit to get rid of kv_args in transaction
by [@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2996
- chore: improve performance of Scan operation by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2990
- fix(server): small string allocations only under 256 bytes str by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2991
- fix(cluster-migration): Support cancelling migration right after
starting it by [@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2992
- chore: fix double header issue by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#3002
- chore: small tiering fixes by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2966
- feat(benchmark-tests): run in K8s by
[@&#8203;zacharya19](https://togithub.com/zacharya19) in
[dragonflydb/dragonfly#2965
- Benchmark fixes by
[@&#8203;zacharya19](https://togithub.com/zacharya19) in
[dragonflydb/dragonfly#3005
- fix(tiering): rename v2 + max_file_size by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#3004
- chore: fix tiering macos stub by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#3006
- chore: export listener stats by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#3007
- chore: pull latest helio by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#3009
- fix(server): lag is 0 when server not in stable state by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#3010
- chore: get rid of kv_args and replace it with slices to full_args by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2942
- fix(server): non auto journal write after callback finish by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#3016
- fix(server): shrink replication steaming buf by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#3012
- fix(zset): fix random in ZRANDMEMBER command by
[@&#8203;BagritsevichStepan](https://togithub.com/BagritsevichStepan) in
[dragonflydb/dragonfly#2994
- Fix benchmark by [@&#8203;adiholden](https://togithub.com/adiholden)
in
[dragonflydb/dragonfly#3017
- chore: Remove tiering test skip by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#3011
- feat(tiering): simple offload loop by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2987
- feat(tiering): MGET support by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#3013
- Revert "chore: get rid of kv_args and replace it with slices to
full\_… by [@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#3024
- chore(dash): Replace comparator with predicate by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#3025
- feat: add defragment command by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#3003

##### Huge thanks to all the contributors! ❤️

##### New Contributors

- [@&#8203;racamirko](https://togithub.com/racamirko) made their first
contribution in
[dragonflydb/dragonfly#2940
- [@&#8203;BagritsevichStepan](https://togithub.com/BagritsevichStepan)
made their first contribution in
[dragonflydb/dragonfly#2994

**Full Changelog**:
dragonflydb/dragonfly@v1.17.0...v1.18.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).

Co-authored-by: kireque-bot[bot] <143391978+kireque-bot[bot]@users.noreply.github.com>
lumiere-bot bot added a commit to coolguy1771/home-ops that referenced this pull request May 11, 2024
…18.0 ) (#4656)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
|
[ghcr.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly)
| minor | `v1.17.1` -> `v1.18.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

Co-authored-by: lumiere-bot[bot] <98047013+lumiere-bot[bot]@users.noreply.github.com>
Successfully merging this pull request may close these issues.

replicaof no one hangs on replica when connectivity to master is broken
3 participants