[Bug]: bulkwriter does not support add row for new json datatype #2080

Open
1 task done
zhuwenxing opened this issue May 11, 2024 · 1 comment
Assignees
Labels
kind/bug Something isn't working

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

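The new JSON datatype can hold an int, float, varchar, or array value as well as a key-value object, but bulkwriter validates JSON field data as a key-value pair only, so `append_row` rejects any other value.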

[2024-05-10T09:38:20.579Z]         with RemoteBulkWriter(
[2024-05-10T09:38:20.579Z]             schema=schema,
[2024-05-10T09:38:20.579Z]             remote_path="bulk_data",
[2024-05-10T09:38:20.579Z]             connect_param=RemoteBulkWriter.ConnectParam(
[2024-05-10T09:38:20.579Z]                 bucket_name=self.bucket_name,
[2024-05-10T09:38:20.579Z]                 endpoint=self.minio_endpoint,
[2024-05-10T09:38:20.579Z]                 access_key="minioadmin",
[2024-05-10T09:38:20.579Z]                 secret_key="minioadmin",
[2024-05-10T09:38:20.579Z]             ),
[2024-05-10T09:38:20.579Z]             file_type=BulkFileType.NUMPY,
[2024-05-10T09:38:20.579Z]         ) as remote_writer:
[2024-05-10T09:38:20.579Z]             json_value = [
[2024-05-10T09:38:20.579Z]                 1,
[2024-05-10T09:38:20.579Z]                 1.0,
[2024-05-10T09:38:20.579Z]                 "1",
[2024-05-10T09:38:20.579Z]                 [1, 2, 3],
[2024-05-10T09:38:20.579Z]                 ["1", "2", "3"],
[2024-05-10T09:38:20.579Z]                 [1, 2, "3"],
[2024-05-10T09:38:20.579Z]                 {"key": "value"},
[2024-05-10T09:38:20.579Z]             ]
[2024-05-10T09:38:20.579Z]             for i in range(entities):
[2024-05-10T09:38:20.579Z]                 row = {
[2024-05-10T09:38:20.579Z]                     df.pk_field: i,
[2024-05-10T09:38:20.579Z]                     df.int_field: 1,
[2024-05-10T09:38:20.579Z]                     df.float_field: 1.0,
[2024-05-10T09:38:20.579Z]                     df.string_field: "string",
[2024-05-10T09:38:20.579Z]                     df.json_field: json_value[i%len(json_value)],
[2024-05-10T09:38:20.579Z]                     df.vec_field: cf.gen_vectors(1, dim)[0]
[2024-05-10T09:38:20.579Z]                 }
[2024-05-10T09:38:20.579Z]                 if auto_id:
[2024-05-10T09:38:20.579Z]                     row.pop(df.pk_field)
[2024-05-10T09:38:20.579Z]                 if enable_dynamic_field:
[2024-05-10T09:38:20.579Z]                     row["name"] = fake.name()
[2024-05-10T09:38:20.579Z]                     row["address"] = fake.address()
[2024-05-10T09:38:20.579Z] >               remote_writer.append_row(row)
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z] /home/jenkins/agent/workspace/tests/python_client/testcases/test_bulk_insert.py:1121: 
[2024-05-10T09:38:20.579Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/remote_bulk_writer.py:259: in append_row
[2024-05-10T09:38:20.579Z]     super().append_row(row, **kwargs)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/local_bulk_writer.py:90: in append_row
[2024-05-10T09:38:20.579Z]     super().append_row(row, **kwargs)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:89: in append_row
[2024-05-10T09:38:20.579Z]     self._verify_row(row)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:198: in _verify_row
[2024-05-10T09:38:20.579Z]     row[field.name], size = self._verify_json(row[field.name], field)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:137: in _verify_json
[2024-05-10T09:38:20.579Z]     self._throw(f"Illegal JSON value for field '{field.name}', type mismatch")
[2024-05-10T09:38:20.579Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z] self = <pymilvus.bulk_writer.remote_bulk_writer.RemoteBulkWriter object at 0x7f3a40438160>
[2024-05-10T09:38:20.579Z] msg = "Illegal JSON value for field 'json', type mismatch"
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z]     def _throw(self, msg: str):
[2024-05-10T09:38:20.579Z]         logger.error(msg)
[2024-05-10T09:38:20.579Z] >       raise MilvusException(message=msg)
[2024-05-10T09:38:20.579Z] E       pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Illegal JSON value for field 'json', type mismatch)>

Expected Behavior

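Rows whose JSON field holds any value the new JSON datatype allows (int, float, varchar, array, or key-value object) can be added through bulkwriter.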
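
A minimal sketch of the difference, assuming the current check only accepts a dict (the body of `_verify_json` is not shown in the traceback above, so this is a guess at its behavior). The names `verify_json_strict` and `verify_json_relaxed` are hypothetical and are not part of pymilvus; the `(value, size)` return shape mirrors the `row[field.name], size = self._verify_json(...)` call in the traceback.

```python
import json
from typing import Any, Tuple

# Python types that correspond to top-level JSON values.
JSON_TYPES = (dict, list, str, int, float)


def verify_json_strict(value: Any) -> Tuple[Any, int]:
    """Assumed model of the current check: only key-value objects pass."""
    if not isinstance(value, dict):
        raise ValueError("Illegal JSON value, type mismatch")
    return value, len(json.dumps(value))


def verify_json_relaxed(value: Any) -> Tuple[Any, int]:
    """Hypothetical relaxed check: any JSON-serializable value passes."""
    if not isinstance(value, JSON_TYPES):
        raise ValueError("Illegal JSON value, type mismatch")
    try:
        # Serializing also catches unsupported types nested inside lists/dicts.
        encoded = json.dumps(value)
    except (TypeError, ValueError) as exc:
        raise ValueError("Illegal JSON value, not serializable") from exc
    return value, len(encoded)


if __name__ == "__main__":
    # Same JSON values as in the failing test case.
    json_value = [1, 1.0, "1", [1, 2, 3], ["1", "2", "3"], [1, 2, "3"], {"key": "value"}]
    for v in json_value:
        print(type(v).__name__, verify_json_relaxed(v))
```

With the relaxed check every entry of `json_value` from the test passes, while the strict model accepts only `{"key": "value"}`.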

Steps/Code To Reproduce behavior

No response

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0):
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

@zhuwenxing zhuwenxing added the kind/bug Something isn't working label May 11, 2024
@zhuwenxing
Contributor Author

/assign @yhmo
