Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Use Tuple[K, V] instead of DictEntry[K, V] #2554

Open
1 task done
gabrieldemarmiesse opened this issue May 6, 2024 · 9 comments
Open
1 task done
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@gabrieldemarmiesse
Copy link
Contributor

Review Mojo's priorities

What is your request?

As title

What is your motivation for this change?

DictEntry was there because Tuple only worked with AnyRegType, now that Tuple works with AnyType, there is no need for DictEntry anymore. We can just return Tuple when working with Dict.items()

Reference:

Any other details?

No response

@gabrieldemarmiesse gabrieldemarmiesse added enhancement New feature or request mojo-repo Tag all issues with this label labels May 6, 2024
@rparolin rparolin added the good first issue Good for newcomers label May 6, 2024 — with Linear
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 6, 2024 — with Linear
@ematejska ematejska added mojo-repo Tag all issues with this label and removed mojo-repo Tag all issues with this label labels May 6, 2024
@VMois
Copy link

VMois commented May 8, 2024

I would like to work on this feature.

@jayzhan211
Copy link
Contributor

@gabrieldemarmiesse If we replace with Tuple[K, V], how do we deal with the key hash?

@value
struct DictEntry[K: KeyElement, V: CollectionElement](CollectionElement):
    """Store a key-value pair entry inside a dictionary.

    Parameters:
        K: The key type of the dict. Must be Hashable+EqualityComparable.
        V: The value type of the dict.
    """

    var hash: Int
    """`key.__hash__()`, stored so hashing isn't re-computed during dict lookup."""
    var key: K
    """The unique key for the entry."""
    var value: V
    """The value associated with the key."""

    fn __init__(inout self, owned key: K, owned value: V):
        """Create an entry from a key and value, computing the hash.

        Args:
            key: The key of the entry.
            value: The value of the entry.
        """
        self.hash = hash(key)
        self.key = key^
        self.value = value^

@VMois
Copy link

VMois commented May 13, 2024

@gabrieldemarmiesse If we replace with Tuple[K, V], how do we deal with the key hash?

@value
struct DictEntry[K: KeyElement, V: CollectionElement](CollectionElement):
    """Store a key-value pair entry inside a dictionary.

    Parameters:
        K: The key type of the dict. Must be Hashable+EqualityComparable.
        V: The value type of the dict.
    """

    var hash: Int
    """`key.__hash__()`, stored so hashing isn't re-computed during dict lookup."""
    var key: K
    """The unique key for the entry."""
    var value: V
    """The value associated with the key."""

    fn __init__(inout self, owned key: K, owned value: V):
        """Create an entry from a key and value, computing the hash.

        Args:
            key: The key of the entry.
            value: The value of the entry.
        """
        self.hash = hash(key)
        self.key = key^
        self.value = value^

We can make DictEntry an internal structure and only return Tuple[K, V] to the user.

@VMois
Copy link

VMois commented May 13, 2024

@jayzhan211 are you working on this issue? If yes, please, continue. If not, I will start working on it.

@jayzhan211
Copy link
Contributor

@jayzhan211 are you working on this issue? If yes, please, continue. If not, I will start working on it.

not working on it.

@VMois
Copy link

VMois commented May 19, 2024

I started working on it. A few questions:

  1. Should Tuple return key and value directly (Tuple[K, V]) or references to them (Tuple[Reference[K], Reference[V]])? I am a bit worried that if Tuple[K, V] is used, unnecessary object copies will be made.

  2. Do we have a way to benchmark a Dict? I would like to test Dict performance before and after I made changes in case of any regressions.

Thank you.

@gabrieldemarmiesse
Copy link
Contributor Author

Thanks for working on this!

  1. Prefer using references instead of copies. That will allow the iterator to work with non-copyable types.
  2. we don't have benchmarks yet, but if you avoid copying the data around and use references in the tuple, this should be pretty fast. We can always make it faster later on.

The Modular staff is currently working on a benchmarking infrastructure. It's not ready yet. But in general avoiding copies is always a good way to have good performance out of the box.

@VMois
Copy link

VMois commented May 20, 2024

I have a very unusual issue.

I tried Tuple[K, V], and the tests pass, but when I try Tuple[Reference[K,...], Reference[V, ...]], it throws a weird error during the test run (nothing when building the library).

The error is here:

/Users/vmois/Projects/mojo_stuff/mojo/stdlib/src/collections/dict.mojo:628:29: error: symbol use argument #1 has type 

'!kgen.pointer<struct<(!kgen.pack<[[pointer<T>, {"__moveinit__" : (!kgen.pointer<pointer<T>> init_self, !kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}], [pointer<U>, {"__moveinit__" : (!kgen.pointer<pointer<U>> init_self, !kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}]]>) memoryOnly>>'

but @stdlib::collections::dict::_DictEntryToTupleIter::__next__(stdlib::collections::dict::_DictEntryToTupleIter[$0, $1, $2, $3]&) expected type 

'!kgen.pointer<struct<(!kgen.pack<[[pointer<T>, {"__moveinit__" : (!kgen.pointer<pointer<T>> init_self, !kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> {  }, 0>, "__del__" : (!kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> {  }, 0>}], [pointer<U>, {"__moveinit__" : (!kgen.pointer<pointer<U>> init_self, !kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}]]>) memoryOnly>>'
        for kv in self.items()

And here is the code - https://github.com/VMois/mojo/pull/1/files (previous commits have Tuple[K, V] changes).

The only difference between the current and expected types is *"self'2x" (present in current) for pointer T but not in expected ({ }).

Could someone help? Maybe my approach is wrong. Thank you.

@jayzhan211
Copy link
Contributor

jayzhan211 commented May 21, 2024

Should we replace DictEntry with Tuple in _DictEntryIter instead of introducing _DictEntryToTupleIter

Upd: I rethink about the issue here. Given that we cant remove DictEntry because we need keyhash, I think replacing DictEntry with Tuple has no benefit at all. _DictEntryIter already returns Reference[DictEntry], if we want to replace it with Tuple[Reference[K], Reference[V]] we then need to get the referenced key and value from DictEntry and combine them into Tuple, it is quite challenging without benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library
Projects
None yet
Development

No branches or pull requests

5 participants