[Bug]: bfloat16 and float16 vectors cannot be inserted from a pandas DataFrame, and the error message is not user-friendly #2024

Open
1 task done
binbinlv opened this issue Apr 9, 2024 · 1 comment
Labels
kind/bug Something isn't working

binbinlv commented Apr 9, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Inserting bfloat16 or float16 vectors from a pandas DataFrame fails, and the error message is not user-friendly:

>>> import pandas as pd
>>> df = pd.DataFrame({"int64": [i for i in range(nb)], "float16_vector": vectors})
>>> res = collection.insert(df)
RPC error: [batch_insert], <ParamError: (code=1, message=Collection field dim is 128, but entities field dim is 64)>, <Time:{'RPC start': '2024-04-09 18:04:56.172554', 'RPC error': '2024-04-09 18:04:56.172686'}>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 500, in insert
    return conn.batch_insert(
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 147, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 143, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 182, in handler
    return func(self, *args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 122, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 87, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 575, in batch_insert
    raise err from err
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 558, in batch_insert
    request = self._prepare_batch_insert_request(
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 542, in _prepare_batch_insert_request
    else Prepare.batch_insert_param(collection_name, entities, partition_name, fields_info)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 527, in batch_insert_param
    location = cls._pre_batch_check(entities, fields_info)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 472, in _pre_batch_check
    location, primary_key_loc, auto_id_loc = traverse_info(fields_info, entities)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/utils.py", line 303, in traverse_info
    raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=Collection field dim is 128, but entities field dim is 64)>

Expected Behavior

float32 vectors can be inserted from a DataFrame, so it would be better to support bfloat16 and float16 vector data as well. At the very least, the error message should be more user-friendly, such as "non-float32 vectors do not support DataFrame..."
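
As a rough illustration of the friendlier error requested above, here is a minimal sketch of a pre-insert check. Everything in it is hypothetical and not part of pymilvus: the helper `check_dataframe_fields`, the exception `UnsupportedDataFrameFieldError`, and the use of plain type-name strings in place of `DataType` members.

```python
# Hypothetical sketch, not pymilvus code: reject DataFrame columns that map to
# float16/bfloat16 vector fields with a message that names the real problem.

# Assumption: field types are passed as plain strings such as "BFLOAT16_VECTOR".
NON_FLOAT32_VECTOR_TYPES = {"FLOAT16_VECTOR", "BFLOAT16_VECTOR"}


class UnsupportedDataFrameFieldError(ValueError):
    """Raised when a DataFrame column targets an unsupported vector type."""


def check_dataframe_fields(column_names, field_types):
    """Raise a descriptive error for float16/bfloat16 vector columns.

    column_names: iterable of DataFrame column names (e.g. df.columns)
    field_types:  dict mapping field name -> type name string
    """
    unsupported = [
        name for name in column_names
        if field_types.get(name) in NON_FLOAT32_VECTOR_TYPES
    ]
    if unsupported:
        raise UnsupportedDataFrameFieldError(
            "DataFrame insert does not support non-float32 vector fields: "
            + ", ".join(unsupported)
            + ". Pass the data as a list of columns instead."
        )
```

With the schema from this report, calling `check_dataframe_fields(df.columns, {"int64": "INT64", "float16_vector": "BFLOAT16_VECTOR"})` would raise an error that names `float16_vector` instead of reporting a dimension mismatch.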

Steps/Code To Reproduce behavior

import random

import pandas as pd
import tensorflow as tf
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect()
dim = 128
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
# The field is named "float16_vector" but its type is BFLOAT16_VECTOR.
bfloat16_vector = FieldSchema(name="float16_vector", dtype=DataType.BFLOAT16_VECTOR, dim=dim)
schema = CollectionSchema(fields=[int64_field, bfloat16_vector])
collection_name = "vector"
collection = Collection(collection_name, schema=schema)


def gen_bf16_vectors(num, dim):
    """
    Generate brain float16 (bfloat16) vector data.
    raw_vectors : the original float vectors
    bf16_vectors: the bfloat16 data used for insert
    return: raw_vectors and bf16_vectors
    """
    raw_vectors = []
    bf16_vectors = []
    for _ in range(num):
        raw_vector = [random.random() for _ in range(dim)]
        raw_vectors.append(raw_vector)
        bf16_vector = tf.cast(raw_vector, dtype=tf.bfloat16).numpy()
        bf16_vectors.append(bf16_vector)
    return raw_vectors, bf16_vectors


num = 1000
nb = num
vectors = gen_bf16_vectors(num, dim)[1]

# Column-based insert: a list with one entry per field.
res = collection.insert([[i for i in range(nb)], vectors])

# DataFrame insert: raises the ParamError shown above.
df = pd.DataFrame({"int64": [i for i in range(nb)], "float16_vector": vectors})
res = collection.insert(df)
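
Until DataFrame input is supported for these vector types, one possible workaround is to unpack the DataFrame into the column-based list form before inserting. The helper below is a hypothetical sketch, not part of pymilvus. It also assumes the list form is accepted for bfloat16 data: the reproduction calls that form first and reports an error only for the DataFrame call, which suggests this but does not confirm it.

```python
# Hypothetical workaround sketch: turn a DataFrame (or any mapping of
# column name -> sequence) into the list-of-columns form used by
# collection.insert([...]) in the reproduction script.


def dataframe_to_columns(df, field_names):
    """Return one list per field, ordered as in field_names."""
    # "name in df" checks column labels for a pandas DataFrame and keys for a dict.
    missing = [name for name in field_names if name not in df]
    if missing:
        raise KeyError(f"columns missing from input: {missing}")
    # list(...) iterates the column values without altering each element.
    return [list(df[name]) for name in field_names]


# Intended use with the objects from the reproduction (not executed here):
#   columns = dataframe_to_columns(df, ["int64", "float16_vector"])
#   res = collection.insert(columns)
```

The field order passed in `field_names` is assumed to match the order of fields in the collection schema, as it does in the list-based insert in the script.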

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0):
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

binbinlv added the kind/bug label Apr 9, 2024
binbinlv commented Apr 9, 2024

/assign @XuanYang-cn
