New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed #373 -- Added CompositePrimaryKey-based Meta.primary_key. #18056
base: main
Are you sure you want to change the base?
Conversation
I was trying out this exciting branch and ran into this error when running a test:
The issue stems from the use of
Curious if anyone ran into this as well. Edited for traceback:
So, this is part of |
Thanks for testing and reporting the issue @grjones! Indeed, I forgot to handle this use case. I'll look into it this week. |
6a26b19
to
c75dcdd
Compare
c75dcdd
to
4be1c68
Compare
@grjones, FYI I pushed the fix |
Nice! I hope this gets merged in soon. Your branch has been working great for me. |
I may have found one other small issue. When adding a regular |
@grjones , thanks, I appreciate the feedback, I'll look into it. If a model defines |
I'll see if I can give you a solid failing test. My "unique constraint" phrasing might not be exactly right. But ultimately, I believe Django queries the DB first to see if the new object's PK already exists and throws a validation error. The composite key logic doesn't seem to be doing that and so an unhandled IntegrityError is raised instead. |
@grjones , sorry for the late reply, I've been busy last week. Could you give me more specifics? What's the error message you expect? |
Actually, I think it's mostly ok. I was using Django Spanner and it's just not quite working with composite keys and will need to be fixed there. I wrote this and it passed. It probably shouldn't say
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start!
I've left a bunch of ideas for improvement. Feel free to push back if you think I'm wrong about anything.
django/db/models/base.py
Outdated
if cls._meta.primary_key and any( | ||
field for field in cls._meta.fields if field.primary_key | ||
): | ||
errors.append( | ||
checks.Error( | ||
"primary_key=True must not be set if Meta.primary_key " | ||
"is defined.", | ||
obj=cls, | ||
id="models.E042", | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to call out specifically which field has primary_key
incorrectly set.
django/db/models/fields/composite.py
Outdated
raise ValueError( | ||
"The right-hand side of the 'exact' lookup must be an iterable" | ||
) | ||
if len(list(self.lhs)) != len(list(self.rhs)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should need the list
calls here. Is there a particular type you were thinking of here that doesn't implement __len__
?
if len(list(self.lhs)) != len(list(self.rhs)): | |
if len(self.lhs) != len(self.rhs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I simplified it, thanks!
django/db/models/fields/composite.py
Outdated
raise ValueError( | ||
"The left-hand side of the 'exact' lookup must be an instance of Cols" | ||
) | ||
if not isinstance(self.rhs, Iterable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Iterable
is the right interface here - it doesn't check for __getitem__
based iteration or __len__
. Also, it allows rhs
to be a generator or other lazy iterator, which would be exhaustible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, could be restricted to tuple
s and list
s, I think that should be fine.
django/db/models/fields/composite.py
Outdated
lhs_len = len(tuple(self.lhs)) | ||
if not all(lhs_len == len(tuple(vals)) for vals in self.rhs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lhs_len = len(tuple(self.lhs)) | |
if not all(lhs_len == len(tuple(vals)) for vals in self.rhs): | |
lhs_len = len(self.lhs) | |
if not all(lhs_len == len(vals) for vals in self.rhs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, you're right, it can be simplified.
django/db/models/fields/composite.py
Outdated
def __iter__(self): | ||
return iter(self.get_source_expressions()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add __len__
too - we can do a cheap length check by checking len(self.targets)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good idea - added!
tests/composite_pk/test_get.py
Outdated
def test_get_tenant_by_pk(self): | ||
test_cases = [ | ||
{"id": self.tenant.id}, | ||
{"pk": self.tenant.pk}, | ||
] | ||
|
||
for lookup in test_cases: | ||
with self.subTest(lookup=lookup): | ||
with CaptureQueriesContext(connection) as context: | ||
obj = Tenant.objects.get(**lookup) | ||
|
||
self.assertEqual(obj, self.tenant) | ||
self.assertEqual(len(context.captured_queries), 1) | ||
if connection.vendor in ("sqlite", "postgresql"): | ||
t = Tenant._meta.db_table | ||
self.assertEqual( | ||
context.captured_queries[0]["sql"], | ||
f'SELECT "{t}"."id" ' | ||
f'FROM "{t}" ' | ||
f'WHERE "{t}"."id" = {self.tenant.id} ' | ||
f"LIMIT {MAX_GET_RESULTS}", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this test, since Tenant
itself doesn't use a CompositeField
.
tests/composite_pk/test_get.py
Outdated
with CaptureQueriesContext(connection) as context: | ||
obj = User.objects.only("pk").get(**lookup) | ||
|
||
self.assertEqual(obj, self.user) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have more than just one User
in the database? Perhaps one with a different id
and another with a different tenant_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, there's no hurt in adding more test data.
tests/composite_pk/test_update.py
Outdated
with CaptureQueriesContext(connection) as context: | ||
result = User.objects.filter(pk=self.user.pk).update(id=8341) | ||
|
||
self.assertEqual(result, 1) | ||
self.assertFalse(User.objects.filter(pk=self.user.pk).exists()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a bit clearer:
with CaptureQueriesContext(connection) as context: | |
result = User.objects.filter(pk=self.user.pk).update(id=8341) | |
self.assertEqual(result, 1) | |
self.assertFalse(User.objects.filter(pk=self.user.pk).exists()) | |
old_pk = self.user.pk | |
with CaptureQueriesContext(connection) as context: | |
result = User.objects.filter(pk=self.user.pk).update(id=8341) | |
self.assertEqual(result, 1) | |
self.assertFalse(User.objects.filter(pk=old_pk).exists()) |
By storing the old_pk
we don't implicitly rely on self.user.pk
being stale.
tests/composite_pk/tests.py
Outdated
self.assertIsInstance(self.tenant.pk, int) | ||
self.assertGreater(self.tenant.id, 0) | ||
self.assertEqual(self.tenant.pk, self.tenant.id) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Tenant
doesn't have a CompositeField
I don't think these asserts add any value.
self.assertIsInstance(self.tenant.pk, int) | |
self.assertGreater(self.tenant.id, 0) | |
self.assertEqual(self.tenant.pk, self.tenant.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I removed these lines.
tests/composite_pk/tests.py
Outdated
) | ||
|
||
def test_error_on_pk_conflict(self): | ||
with self.assertRaises(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be more specific about the exception type than Exception
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split this into two tests with IntegrityError
(as the second assert was actually raising a transaction management error).
Thank you so much for taking the time to review my changes @LilyFoote !
|
I don't feel strongly that this is better or worse than another option here, so happy to go with what you think is best.
I like your tests quite a bit - they're pretty readable and comprehensive. The main issue I have with them is that they're written for specific databases instead of for generic database features. Where possible Django strongly prefers to test based on features because then the tests apply to as many databases as possible (including third party database libraries). I think the asserts of the actual SQL might be a bit tricky to adapt though, so we might need a different way to check what they're checking. Also, after I reviewed yesterday, I thought of some more things:
|
tests/composite_pk/models/tenant.py
Outdated
user = models.ForeignObject( | ||
User, | ||
on_delete=models.CASCADE, | ||
from_fields=("tenant_id", "user_id"), | ||
to_fields=("tenant_id", "id"), | ||
related_name="+", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think this should be a stretch goal to get it working. See the comment above about MultiColSource
.
django/db/models/fields/composite.py
Outdated
return compiler.compile(WhereNode(exprs, connector=OR)) | ||
|
||
|
||
class Cols(Expression): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is an opportunity to merge this TuplesIn
, Cols
, and friends logic with MultiColSource
so it's less of an 👽. They both do very similar thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @charettes , I'll need to look into this, I wasn't aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged Cols
with MultiColSource
(ad51da4) however, I'm not sure this is correct.
As far as I understand, MultiColSource
was meant to represent columns in a JOIN, and as such, it has a sources
field. Cols
, on the other hand, was meant to represent a list of columns and it doesn't need a sources
field. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments here and there. It's great to see the progress you've made on combining MultiColSource
🎉
Something that came through my mind while reviewing is that we likely want a plan to eventually deprecate Options.pk
in favor of Options.primary_key
?
self.targets, self.sources, self.alias = targets, sources, alias | ||
self.source_expressions = [Col(self.alias, target) for target in self.targets] | ||
|
||
def __repr__(self): | ||
return "{}({}, {})".format(self.__class__.__name__, self.alias, self.field) | ||
|
||
def __len__(self): | ||
return len(self.targets) | ||
|
||
def get_source_expressions(self): | ||
return self.source_expressions | ||
|
||
def set_source_expressions(self, exprs): | ||
assert all(isinstance(expr, Col) for expr in exprs) | ||
self.source_expressions = exprs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common pattern in other expressions is not to define self.source_expressions
but to return properties directly
e.g.
self.targets, self.sources, self.alias = targets, sources, alias | |
self.source_expressions = [Col(self.alias, target) for target in self.targets] | |
def __repr__(self): | |
return "{}({}, {})".format(self.__class__.__name__, self.alias, self.field) | |
def __len__(self): | |
return len(self.targets) | |
def get_source_expressions(self): | |
return self.source_expressions | |
def set_source_expressions(self, exprs): | |
assert all(isinstance(expr, Col) for expr in exprs) | |
self.source_expressions = exprs | |
self.targets, self.sources, self.alias = targets, sources, alias | |
def __repr__(self): | |
return "{}({}, {})".format(self.__class__.__name__, self.alias, self.field) | |
def __len__(self): | |
return len(self.targets) | |
def get_source_expressions(self): | |
return [Col(self.alias, target) for target in self.targets] | |
def set_source_expressions(self, exprs): | |
assert all(isinstance(expr, Col) for expr in exprs) | |
self.targets = [col.target for col in exprs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
@@ -9,22 +10,38 @@ | |||
) | |||
|
|||
|
|||
class MultiColSource: | |||
class MultiColSource(Expression): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is thing is meant to be used for non-related fields as field it might be worth graduating it to models.expressions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Col
is in models.expressions
, it would make sense to move this to models.expressions
as well. But please see my other comment about this first.
sql, _ = col.as_sql(compiler, connection) | ||
sqls.append(sql) | ||
|
||
return ", ".join(sqls), [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we'd need parathensis wrapping here but maybe I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place where parenthesis would be needed is when performing an operation on SQL tuples. e.g. WHERE (field_1, field_2) = (1, 2)
.
This is something I could add to the TupleIn
, TupleExact
lookups in a follow-up PR, but I don't think it's necessary for this one.
cols = self.get_source_expressions() | ||
|
||
for col in cols: | ||
sql, _ = col.as_sql(compiler, connection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of consistency I think we should accumulate parameters even if they are meant to be empty.
django/db/models/fields/composite.py
Outdated
def get_col(self, alias, output_field=None): | ||
return self.cached_col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to take into account alias
for usage in subqueries an multiple inclusions of the same table.
tests/composite_pk/test_filter.py
Outdated
self.assertEqual( | ||
list(Comment.objects.order_by("-pk")), | ||
[ | ||
# -pk only adds DESC to the last field of the composite key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should definitely add DESC
to both but it's true that ordering is kind of convoluted due to how it resolves lately.
django/db/models/fields/composite.py
Outdated
for vals in self.rhs: | ||
exprs.append( | ||
WhereNode( | ||
[Exact(col, val) for col, val in zip(cols, vals)], connector=AND | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to eventually use row level comparison instead for backends that support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could work on this in the next PR, however, I'm not sure if there's any real benefit to it other than generating nicer SQL.
django/db/models/fields/composite.py
Outdated
from django.utils.functional import cached_property | ||
|
||
|
||
class TupleExact(Exact): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there might be a way to define a TupleLookupMixin
that can be mixed with Exact
, In
, GreaterThan
, and others to do the right thing by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could remove the TupleExact
class and handle the case of comparing tuples inside Exact
, if that's what you're suggesting? It would be a bit more elegant I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I looked into it and I started implementing the other lookups. I don't think it's possible to define a mixin that does the right thing by default, but it's possible to make some abstractions to reduce duplication.
|
||
def contribute_to_class(self, cls, name, **_): | ||
super().contribute_to_class(cls, name, private_only=True) | ||
cls._meta.pk = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense for the time being. In the future we could introduce a public CompositeField
API and deprecate the internal and specialized CompositePrimaryKey
for CompositeField(primary_key=True)
.
# If a model defines Meta.primary_key and a foreign key refers to it, | ||
# the check should be skipped (since primary keys are unique). | ||
pk = self.remote_field.model._meta.primary_key | ||
if pk and set(pk) == {field.attname for field in self.foreign_related_fields}: | ||
return [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should eventually extend Options.total_unique_constraints
instead or replace with an a method that returns set of fields that are totally unique together by combining field.unique
, Meta.unique_together
, Meta.constraints
(for unique constraints), and now primary_key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, and also field.primary_key
I suppose? I would lean towards replacing it with a new method then, since changing this total_unique_constraints
would break backwards compatibility (although it's not really a public API, is it?).
Thanks @charettes !
I'm not sure what you mean by that, I don't think we can, because |
So as far as I understand, at the moment I'm not sure it's the right decision to reuse this for composite fields, which on the other hand don't need Let me know what you think! |
You're completely right. In this case is |
django/db/models/fields/composite.py
Outdated
|
||
@cached_property | ||
def cached_col(self): | ||
return MultiColSource(self.model._meta.db_table, self.fields, self.fields, self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that you set self.fields
for both source
and target
is a good sign that MultiColSource
is not the appropriate expression for cached_col
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a new expression Cols
makes most sense
django/db/models/fields/composite.py
Outdated
elif lookup_name == "in": | ||
return TupleIn | ||
|
||
return super().get_lookup(lookup_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a sense that all the other lookups won't work by default so we should likely error out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not implemented, it will throw a django.db.utils.ProgrammingError
when executing the query - not sure if an explicit error is necessary.
Btw, I implemented gte
, gt
, lte
, lt
too now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two minor points about error message wording. They're not urgent to address.
tests/composite_pk/test_get.py
Outdated
ValueError, "'primary_key' must be a tuple or a list" | ||
): | ||
Comment.objects.get(pk=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this could be "'pk' must be a tuple or a list"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is possible unfortunately, pk
is translated to the field
django/db/models/fields/composite.py
Outdated
) | ||
if not all(len(self.lhs) == len(vals) for vals in self.rhs): | ||
raise ValueError( | ||
"The left-hand side and right-hand side of the 'in' lookup must " | ||
"have the same number of elements" | ||
f"'{self.lhs.field.name}' must have {len(self.lhs)} elements" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is quite clear. If the pk
has two component fields, but the lookup is pk__in=([1, 2], [3])
we will raise an error 'pk' must have 2 elements
but the tuple does have two elements. I'm not sure what to suggest instead though - it's tricky...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some changes, let me know what you think
It would not be set, if it's a regular primary key, |
Trac ticket number
ticket-373
Branch description
Proposal
Previous PR
Checklist
main
branch.