Restructure Matrix read operations #314

Open · wants to merge 3 commits into main
Conversation

NikolaosPapailiou (Collaborator)

Restructure Matrix read operations to handle incomplete queries and batching.

This pull request has been linked to Shortcut Story #43913: Restructure Matrix read operations.

@jparismorgan (Contributor) left a comment:

Thanks for doing this!

(Optional, non-blocking) It would be really nice if we could unit test this; if it breaks right now we'll have no way to know. Maybe we could set readBatchSize in the constructor, defaulting to READ_BATCH_SIZE, and then set it to a smaller number in some unit tests.
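
A minimal sketch of what that could look like, assuming a hypothetical wrapper class (no identifier here except DEFAULT_READ_BATCH_SIZE_CELLS is from the PR):

  #include <cstddef>

  constexpr std::size_t DEFAULT_READ_BATCH_SIZE_CELLS = 100'000'000;

  // Hypothetical: expose the batch size in the constructor so unit tests
  // can shrink it and force the multi-batch read path.
  class tdbMatrixReader {
   public:
    explicit tdbMatrixReader(
        std::size_t read_batch_size_cells = DEFAULT_READ_BATCH_SIZE_CELLS)
        : read_batch_size_cells_(read_batch_size_cells) {
    }

    std::size_t read_batch_size_cells() const {
      return read_batch_size_cells_;
    }

   private:
    std::size_t read_batch_size_cells_;
  };

  // In a unit test: tdbMatrixReader reader{/*read_batch_size_cells=*/128};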

src/include/tdb_defs.h (review thread outdated, resolved)
src/include/detail/linalg/tdb_matrix_with_ids.h (review thread outdated, resolved)
@@ -36,6 +36,28 @@
#include <string>
#include <tiledb/tiledb>

// Default batch size for all TileDB read operations.
// This is expressed in number of array cells read per request.
constexpr size_t DEFAULT_READ_BATCH_SIZE_CELLS = 100000000;
Collaborator:

I suggest using 100'000'000 for legibility.
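
For reference, with C++14 digit separators the constant would read:

  constexpr size_t DEFAULT_READ_BATCH_SIZE_CELLS = 100'000'000;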

@lums658 (Collaborator) left a comment:

It looks like the reads are done in batches until the total size of the array is reached. I don't think this is a good way to do it. The initial request should be for the total_size we want to read, as that will complete in the vast majority of cases. If it doesn't, then we should continue the query and get more.

There is no reason to batch things in this way -- that is a pattern for a different use case.

Especially if there is a fixed overhead per query, we want to minimize the number of actual queries submitted.

tiledb_helpers::submit_query(tdb_func__, uri_, query);
query.set_subarray(subarray).set_layout(layout_order);
tiledb::Query::Status status;
do {
Collaborator:

Does this result in incomplete queries whenever read_batch_size_cells is less than total_size? I don't think we should batch reads unless we actually need to. That is, we should always try to read total_size and then iterate if we don't get total_size.

Having incomplete queries here is a rare edge case. Reading in batches may or may not solve that issue, but it adds latency to the common case.
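
A minimal sketch of the approach proposed here, using the TileDB C++ API; it assumes the caller has already set the query's buffers, subarray, and layout, and the helper name is illustrative rather than from the PR:

  #include <tiledb/tiledb>

  // Request the full range once; resubmit only when TileDB reports the
  // query as incomplete, which should be the rare case.
  void read_until_complete(tiledb::Query& query) {
    tiledb::Query::Status status;
    do {
      query.submit();
      status = query.query_status();
      // On INCOMPLETE, TileDB filled the buffers as far as it could; the
      // caller would consume or re-point the buffers before the next pass.
    } while (status == tiledb::Query::Status::INCOMPLETE);
  }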

NikolaosPapailiou (Collaborator, Author):

The point of read_batch_size_cells is to set a cap on the size of each individual request to TileDB. The cap is configurable, so we could set it to a large value in order to perform the whole read in one go.

To explain why I expect this to reduce our memory footprint without a large latency overhead, let's use an example (a loop sketch follows the numbers):

  • Assume we want to read 15GB of data from a tiledb:// array.
  • Assume that TileDB requires some extra memory (20%) to run the filter pipeline (decompression, etc.).
  • Assume read I/O throughput and network speed are 200MB/s and all operations are I/O bound (no CPU overhead).
  • Assume each extra call to core adds 500ms of latency overhead.
  • One go:
    • Peak allocated memory on client: 15GB
    • Peak allocated memory on server: 15GB + 3GB = 18GB
    • Peak allocated memory (REST + client when collocated during a realtime UDF): 33GB
    • Latency: 500ms + 2 * 15GB / 200MB/s = 150.5s
  • With batch size 1GB:
    • Peak allocated memory on client: 15GB
    • Peak allocated memory on server: 1GB + 0.2GB = 1.2GB
    • Peak allocated memory (REST + client when collocated during a realtime UDF): 16.2GB
    • Latency: 15 * 500ms + 2 * 15GB / 200MB/s = 157.5s
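
A sketch of how the cap could translate into per-request batches (identifier names are illustrative; the real code would build one TileDB subarray and query per batch):

  #include <algorithm>
  #include <cstddef>

  // Walk the requested row range in batches of at most
  // read_batch_size_cells cells; each batch becomes one TileDB query.
  void for_each_read_batch(
      std::size_t first_row,
      std::size_t last_row,  // exclusive
      std::size_t num_cols,
      std::size_t read_batch_size_cells) {
    std::size_t batch_rows =
        std::max<std::size_t>(1, read_batch_size_cells / num_cols);
    for (std::size_t row = first_row; row < last_row; row += batch_rows) {
      std::size_t end = std::min(row + batch_rows, last_row);
      // Submit one read query covering rows [row, end) here.
      (void)end;
    }
  }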

lums658 (Collaborator):

One rule in software is "don't pay for what you don't need." It is true that there is only a 5% penalty with this particular set of assumptions. However, I expect the server is going to deal with its memory footprint in its own way, which is why we might get incomplete reads. I would say just request what is needed and let the server say how much memory it wants to devote to that or not.

Do other apps try to be parsimonious this way?

I guess, on the other hand, since this is configurable, we can always just make it really large.

@lums658 (Collaborator) left a comment:

See in-line comment.

