
Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

Open
anandhu-gopi opened this issue May 17, 2023 · 0 comments


Issue Description

Current Behavior

The all_pks method in the redis-om library does not expose the count parameter of the underlying SCAN command. As a result, scans fall back to the server's default COUNT value, which causes performance issues when the Redis instance contains a large number of keys.

https://github.com/redis/redis-om-python/blob/2b450b5feec5602ac823d14cbb696ec73cf02347/aredis_om/model/model.py#LL1359C1-L1368C1

    #! Current Behavior
    @classmethod
    async def all_pks(cls):  # type: ignore
        key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
        # TODO: We need to decide how we want to handle the lack of
        #  decode_responses=True...
        return (
            remove_prefix(key, key_prefix)
            if isinstance(key, str)
            else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
            async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH")
        )

Desired Behavior

I would like to propose adding an option to pass the count parameter in the all_pks method. This would allow users to customize the number of keys processed at a time, providing better control over performance when working with Redis.

    #! Expected Behavior
    @classmethod
    # We can make count an optional parameter
    # so that the change is non-breaking.
    async def all_pks(cls, count: Optional[int] = None):  # type: ignore
        key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
        # TODO: We need to decide how we want to handle the lack of
        #  decode_responses=True...
        return (
            remove_prefix(key, key_prefix)
            if isinstance(key, str)
            else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
            async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH", count=count)
        )

Steps to Reproduce (if applicable)

N/A

Impact

Adding this feature would improve the performance and scalability of the library when working with Redis instances containing a large number of keys. It would provide users with better control over the data retrieval process and enable them to fine-tune the performance according to their needs.

Additional context:

I have gathered some examples that show the performance gains from tuning the count parameter. They illustrate the improvements available if users can customize the count value in the all_pks method. (I used pyinstrument to profile the code.)

Note:

It is important to note that the ideal count value depends on the specific Redis instance and the number of items it contains. The count value used in my examples is based on the number of items in my Redis instance and may not be universally applicable.

Test Code

from typing import Optional
from aredis_om import HashModel, get_redis_connection

class DummyModel(HashModel):
    
    class Meta:
        global_key_prefix = "logomatcher_orchestrator"
        model_key_prefix = "SimilarityScoreModel"
        database = get_redis_connection(
            host="redishost",
            port=6379,
            db=0,
        )


async def print_keys_with_prefix_matching(prefix: str, count: Optional[int] = None):
    async for key in DummyModel.db().scan_iter(f"{prefix}*", _type="HASH", count=count):
        print(key)

Time taken with the default count value

>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test") )

>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d7002a60>
>>>
>>> profiler.print()

  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:40:33  Samples:  57622
 /_//_/// /_\ / //_// / //_'/ //     Duration: 97.692    CPU time: 48.077
/   _/                      v4.4.0

Program:

97.692 MainThread  <thread>:140504873899840
├─ 49.604 Redis.scan_iter  aioredis/client.py:2413
│     [114 frames hidden]  aioredis, asyncio, <built-in>, async_...
├─ 43.834 <module>  <stdin>:1
│     [330 frames hidden]  <stdin>, asyncio, aioredis, async_tim...
├─ 1.468 HiredisParser.can_read  aioredis/connection.py:477
│     [8 frames hidden]  aioredis, async_timeout
└─ 1.124 StreamReader.read  asyncio/streams.py:643
      [4 frames hidden]  asyncio

Time taken with count=1000

>>> import asyncio
>>> from pyinstrument import Profiler
>>>
>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test",count=1000) )
>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d6fa4cd0>
>>>
>>> profiler.print()

  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:46:30  Samples:  580
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.782     CPU time: 0.471
/   _/                      v4.4.0

Program:

0.781 MainThread  <thread>:140504873899840
├─ 0.644 <module>  <stdin>:1
│     [130 frames hidden]  <stdin>, asyncio, selectors, <built-i...
└─ 0.136 Redis.scan_iter  aioredis/client.py:2413
      [78 frames hidden]  aioredis, asyncio, <built-in>, async_...

Environment

Library version: 0.0.27
Python version: 3.9.16
