NTuple{UInt8} not getting correctly written out #427

Open
Moelf opened this issue Apr 13, 2023 · 3 comments
Comments

@Moelf
Contributor

Moelf commented Apr 13, 2023

Similar to #411, but this one corresponds to fixedsizelist.jl:

julia> data1 = (; x = [(0x01, 0x02), (0x03, 0x04)])

julia> Arrow.write("/tmp/julia1.feather", data1)

julia> data2 = (; x = [b"\x01\x02", b"\x03\x04"])

julia> Arrow.write("/tmp/julia2.feather", data2)

julia> data3 = (; x = [(0x0001, 0x0002), (0x0003, 0x0004)])

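julia> Arrow.write("/tmp/julia3.feather", data3)

Reading the files back with pyarrow, julia1 (UInt8 tuples) comes out as binary values, just like julia2, while julia3 (UInt16 tuples) comes out as lists:

In [12]: pyarrow.feather.read_table("/tmp/julia1.feather")["x"]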
Out[12]:
<pyarrow.lib.ChunkedArray object at 0x7fd62050c400>
[
  [
    0102,
    0304
  ]
]

In [13]: pyarrow.feather.read_table("/tmp/julia2.feather")["x"]
Out[13]:
<pyarrow.lib.ChunkedArray object at 0x7fd62387ee30>
[
  [
    0102,
    0304
  ]
]

In [14]: pyarrow.feather.read_table("/tmp/julia3.feather")["x"]
Out[14]:
<pyarrow.lib.ChunkedArray object at 0x7fd62046da30>
[
  [
    [
      1,
      2
    ],
    [
      3,
      4
    ]
  ]
]
@Moelf Moelf changed the title Tuple{UInt8} not getting correctly written out NTuple{UInt8} not getting correctly written out Apr 13, 2023
@quinnj
Member

quinnj commented Jun 6, 2023

Yeah, I agree this isn't ideal. At the time, I thought this was a reasonable way to translate to the Arrow fixed-size binary data type, but in hindsight we should have found a way to limit it to Base.CodeUnits only, like we do now for the list data type. The problem is that we now unequivocally treat Base.CodeUnits as list, so there's no straightforward way to say, "hey, I have a vector of fixed-size binary data and want the fixed-size binary Arrow data type". We could create a wrapper like Arrow.FixedSizeBinary that people would have to use explicitly, but that's a bit annoying. Let me think on this one for just a bit.

In any case, we would probably want to modify FixedSizeListKind in ArrowTypes to take a third type parameter that tracks whether the fixed-size data should be binary or not (since we don't want to unequivocally treat a UInt8 eltype as binary, which is the core issue here).

@Moelf
Contributor Author

Moelf commented Jun 6, 2023

Let me know if you want me to try my hand at this one (once you have a design idea).

@Moelf
Contributor Author

Moelf commented Feb 1, 2024

This is fairly critical, bump?
