
Implement Redis Cache #32

Open
richard-churchman opened this issue Feb 23, 2024 · 2 comments
richard-churchman commented Feb 23, 2024

PostgreSQL is currently used to cache transaction data. While it cannot be considered an in-memory database, its shared buffers mean that read performance, although slower than an in-memory cache, is not degraded to an extent that materially affects response time when traded off against the durability guarantees provided by PostgreSQL. Read performance aside, in-memory databases are extremely expensive to run, a nightmare to administer, and demand a degree of duplication (in Jube at least) given its key-value pair access paths (while PostgreSQL queries are indexed on multiple keys, these keys would instead be duplicated, with transaction history being stored in the HSET values).

There is no contest in writes, however, and Jube response times are severely impacted: reading Abstraction keys overall might take 3 ms to 6 ms, while writing might take 17 ms, which is hard to defend in a real-time system. Currently, writes to the cache are made in the transaction flow, which is important, as serialisation across requests is required. Ideally all writes would be moved out to background threads performing bulk inserts, but this would not provide the serialisation guarantees from transaction to transaction (consider a velocity rule made up of counts on the PostgreSQL cache). Turning asynchronous commit on allows some relief, but without moving to UNLOGGED tables (which attract their own problems) it still does not come close to the write performance desired.

Redis will be implemented as an in-memory database as follows.

In respect to the Abstraction cache:

  • In the event that the cache is enabled, no comparable inserts will be made to the PostgreSQL cache tables. Instead, for each transaction, a key will be set in Redis on the basis of the search key and value (e.g. IP:123.456.789), with an HSET of a MessagePack serialisation of the payload dictionary. In this respect the key serves as the index, with the transactions carried in the HSET values.
  • On each transaction, using async methods, a request will be made to Redis on the Abstraction key (e.g. IP:123.456.789).
  • A Time To Live (TTL) definition will be created to accompany the specification of the search key. Given the TTL definition, the expiry of the key will be extended on each transaction (otherwise the key will be allowed to expire), removing all HSET values. Redis does not support per-member expiry at this time, which means that data will not expire until there are no further transactions on that key. It follows that in the real-time flow there should be some online pruning of the values, and/or:
  • A background job that serves to prune the expired HSET values also.
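Since Redis cannot expire individual hash members, the pruning described above has to be done by the application. A minimal sketch of that filter follows (Python used purely for illustration; the function name is hypothetical, and it assumes hash-field names embed a reference timestamp as their second colon-separated component, per the key model sketched later in this thread):

```python
import time


def expired_hash_fields(field_names, ttl_seconds, now=None):
    """Return the hash-field names whose embedded reference timestamp has
    aged past the TTL. The caller would HDEL these, either inline in the
    transaction flow (online pruning) or from a background job."""
    now = int(time.time()) if now is None else now
    expired = []
    for name in field_names:
        # Second colon-separated component is the reference timestamp,
        # e.g. "1713689193:1713689193:<guid>:690".
        reference = int(name.split(":")[1])
        if now - reference > ttl_seconds:
            expired.append(name)
    return expired
```

The same function serves both the real-time flow and the background job; only the source of the field names (HSCAN versus an already-fetched hash) would differ.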

In respect to TTL Counter:

  • The transactional incrementing of TTL Counters will be done in a Redis write.
  • The writing of the TTL Counter entry, which is used to decrement TTL Counters in a background process, will be written to an in-memory asynchronous queue for bulk insert, which will be done in a separate thread across all models and all tenants (as above). It follows that PostgreSQL will continue to be used for winding back TTL Counters.
  • The background process will update the Redis cache instead of the same table in PostgreSQL (which is to say, not duplicated). The durability guarantees provided by the Redis AOF log and its rewriting mean that the Redis cache is unlikely to need to be reconstituted, and the possibility of one second of incremental counter loss can be conveyed as an accepted risk.
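The split described above, a synchronous counter increment in the transaction flow plus a queued decrement entry drained later for bulk insert, can be sketched as follows (an in-memory stand-in, not Jube's actual classes; the names are assumptions):

```python
import queue


class TtlCounterCache:
    """Hypothetical stand-in for the Redis TTL Counter keys: a dict plays
    the role of Redis INCR, and a queue holds decrement entries destined
    for bulk insert into PostgreSQL by a background thread."""

    def __init__(self):
        self.counters = {}
        self.decrement_queue = queue.Queue()

    def increment(self, counter_key, expiry_unix):
        # Transactional path: the increment is synchronous, as
        # serialisation across requests is required.
        self.counters[counter_key] = self.counters.get(counter_key, 0) + 1
        # The decrement entry is merely enqueued; PostgreSQL remains the
        # store used for winding counters back when the entry expires.
        self.decrement_queue.put((counter_key, expiry_unix))
        return self.counters[counter_key]
```

The key point of the design is that only the increment sits on the latency-critical path; everything needed to later reverse it moves off-thread.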

In respect to cached Abstraction values:

  • The background process responsible for calculating counters will also write the values to Redis based on the Abstraction key, with the aggregations being stored in the HSET.
  • The transactional process will instead read from the Redis cache rather than the equivalent in PostgreSQL.
  • There are no proposals to deprecate the writing of aggregations to PostgreSQL as this is useful for tracing the calculations, which is a complex process and benefits from the verbose trace.
  • The same durability considerations apply in respect of the guarantees provided by the Redis AOF log and its rewriting.
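The write and read sides of the cached-aggregation flow reduce to an HSET mapping write and an HGET read. A minimal sketch, with a plain dict standing in for the Redis hash (function names are illustrative, not Jube's):

```python
def write_aggregations(store, abstraction_key, aggregations):
    """Background job side: write calculated aggregations under the
    Abstraction key as hash fields (HSET name mapping={...} in
    redis-py terms). Values are stored as strings, as Redis would."""
    store.setdefault(abstraction_key, {}).update(
        {name: str(value) for name, value in aggregations.items()})


def read_aggregation(store, abstraction_key, name, default=0.0):
    """Transactional path: read a single aggregation (HGET equivalent),
    falling back to a default when the field is absent."""
    raw = store.get(abstraction_key, {}).get(name)
    return float(raw) if raw is not None else default
```

Because the transactional path only ever reads, the background job can rewrite the whole mapping atomically without coordinating with in-flight transactions.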

The functionality will be optional and in the absence of a Redis cache being available, existing functionality will prevail.

Connection strings to Redis should be configured at the application level and fully support multiple endpoints, such that FAILOVER can be invoked to resize Redis instances.

richard-churchman commented
#31 has largely prepared for the development of this issue.

richard-churchman commented Apr 21, 2024

To kick off the project the first step is to document the key names and patterns (may as well put this into markdown documentation). For example, for a payload entry:

KeyName:TenantId:EntityAnalysisModelId:Key:Value

Where the field key within the hash is:

CreatedDateTimestamp:ReferenceDateTimestamp:EntityAnalysisModelInstanceEntryGuid:Bytes

So, selecting in an indexed manner might be:

Payload:1:1:AccountId:123456

And the HSET field key carries several helpful values, such that the HSET can be iterated and filtered for the most common cases without going to the MessagePack byte array:

1713689193:1713689193:dc77732f-c211-435e-82eb-5211d2631105:690

The above values are promoted largely because subsequent filtering will take place by iteration. Bytes are held so it is possible to iterate keys and identify the tenants and models using the most space, which might be helpful if we ever get around to creating a cloud offer, in which case the Redis and PostgreSQL overhead will need to be charged for.
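The key and field patterns above can be captured as small builder and parser helpers, which is effectively what the key model for each Cache class would encode (Python used for illustration; these names are not Jube's actual identifiers):

```python
from dataclasses import dataclass


def build_payload_key(key_name, tenant_id, model_id, key, value):
    """KeyName:TenantId:EntityAnalysisModelId:Key:Value"""
    return f"{key_name}:{tenant_id}:{model_id}:{key}:{value}"


@dataclass
class HashField:
    """CreatedDateTimestamp:ReferenceDateTimestamp:EntryGuid:Bytes"""
    created: int        # Unix timestamp
    reference: int      # Unix timestamp
    entry_guid: str
    bytes_len: int      # size of the MessagePack payload


def build_hash_field(f):
    return f"{f.created}:{f.reference}:{f.entry_guid}:{f.bytes_len}"


def parse_hash_field(s):
    # The GUID contains hyphens, not colons, so a plain split is safe.
    created, reference, guid, size = s.split(":")
    return HashField(int(created), int(reference), guid, int(size))
```

For example, `build_payload_key("Payload", 1, 1, "AccountId", "123456")` yields `Payload:1:1:AccountId:123456`, matching the indexed selection shown above, and `parse_hash_field` recovers the timestamps and byte count for filtering without deserialising the MessagePack value.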

For every Cache class in the application, start by creating such a key model.

It is supposed that using a Unix timestamp will be more efficient given the amount of filtering based on date, but this will be tested.
