Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the ability (and tests of) integer, float, tuple, and frozenset keys #74

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Jwink3101
Copy link

closes #73

@menshikh-iv
Copy link
Contributor

Thanks for the PR, I have 2 main questions:

  • How this change affects to performance (memory usage, speed)?
  • What's about other types (that didn't mentioned here)?

@Jwink3101
Copy link
Author

Jwink3101 commented Mar 8, 2018

Performance should be trivially affected. For string/bytes keys (all that currently work) the only overhead is a few conditionals. For the new key types, there is the slight overhead of json but that is only incurred for the new key types and again is very small and likely not even measurable compared to other aspects of the code.

As for other types, I do not have a good answer. While there are a few exceptions, any hashable python object can be a key but would require manual implementation to handle them. One option would be to instead pickle the object and compare the keys that way. I think this introduces even more overhead. Also, should the user wish to look at the sqlite file elsewhere, the keys would be imperceptible (In my use of sqlitedict, I use a modified json encoder with pickle fallback so the dict remains human readable). Still, this may be a valid approach. update: a quick test shows this pickle-based approach may falter. While the resulting object between say (u'a',u'b') on python 2 and 3 pass equality checks, the pickled form, including base64 encoding too, is not equal. None of this changes whether this approach is good for the mentioned types of objects (I still think it is) but it precludes reliably using pickles as keys /update

I will address a few more issues in the discussion of #73

@Jwink3101 Jwink3101 mentioned this pull request Mar 8, 2018
@Jwink3101 Jwink3101 mentioned this pull request Oct 29, 2018
Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run your code through flake8 so that it conforms to the PEP8 standard. e.g. 2 line spacing after functions, no lines with only whitespace, etc.

Next, while I think your new behavior is useful, it introduces the risk of backward compatibility and performance issues. So, I think this behavior should be disabled by default. Add two keyword parameters to the constructor:

  • serialize: convert an arbitary object to a string
  • deserialize: convert an arbitrary string to an object

where deserialize(serialize(obj)) == obj

By default, both of them should be None, in which case, the old behavior is used. If the user specifies only one of them, raise ValueError. If the user specifies both of them, then we can use them instead of your current convertkey/unconvertkey functions.

This approach has several benefits:

  1. Maintains existing behavior (compatibility, efficiency)
  2. Addresses the problem in the original issue
  3. Extensible: people can use their own serialization functions if the built-in ones are insufficient

@@ -213,7 +214,7 @@ def __bool__(self):
def iterkeys(self):
GET_KEYS = 'SELECT key FROM "%s" ORDER BY rowid' % self.tablename
for key in self.conn.select(GET_KEYS):
yield key[0]
yield _unconvertkey(key[0])
Copy link
Collaborator

@mpenkov mpenkov Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a more helpful function name, like serialize (and conversely, deserialize).

@mpenkov
Copy link
Collaborator

mpenkov commented Dec 29, 2019

Ping @Jwink3101 are you able to finish this PR?

return '___.float.' + repr(key)
# These are recursive
elif isinstance(key,tuple):
return '___.tuple.' + json.dumps([_convertkey(k) for k in key])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very helpful to allow passing in a JSONEncoder to be used here as a cls=.. argument, to allow inclusion of types which need help serialising.

This could be done after your PR, if you like, but that may be a useful element in the design of this PR.

db['1'] = 1
db[1] = 'ONE'
db[('a',1)] = 'testtuple'
db[frozenset([1,2,'2'])] = 'testfrozenset'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have assertions about the underlying generated key names, possibly as separate test method, but also here so it is possible to understand the implementation by reading the tests

@mpenkov mpenkov added the stale No recent activity from PR author label Nov 25, 2022
@jungerm2
Copy link

This might be part of a more significant issue, a user treating an SqliteDict as a normal dictionary will expect it to work like a normal python dictionary: any hashable object should be able to be used as a key, and will not be mutated as it's added to the dict.

Currently, keys are mutated:

In [1]: from sqlitedict import SqliteDict

In [2]: db = SqliteDict("/nobackup/tmp/db1.sqlite")

In [3]: key, value = 1, 1

In [4]: db[key] = value

In [5]: list(db.keys())
Out[5]: ['1']

In [6]: list(db.values())
Out[6]: [1]

This seems to occur because the default encode_key is the identity but sqlite turns it into a string. Why isn't the pickle/base64 based encode/decode_key the default?

@Jwink3101
Copy link
Author

The problem with using python’s hash or pickle is that they are not cross platform or cross versions. It’s a tough problem!

My code is one option but another is just education.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale No recent activity from PR author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Key ambiguity
5 participants