
Major compaction 1st edition #31804

Merged 28 commits into milvus-io:major_compaction on Apr 8, 2024
Conversation


@wayblink wayblink commented Apr 1, 2024

No description provided.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wayblink
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added area/internal-api area/test sig/testing test/integration integration test size/XXL Denotes a PR that changes 1000+ lines. labels Apr 1, 2024

mergify bot commented Apr 1, 2024

⚠️ The sha of the head commit of this PR conflicts with #31660. Mergify cannot evaluate rules on this PR. ⚠️

@wayblink wayblink changed the title Major test Major compaction 1st edition Apr 1, 2024
@mergify mergify bot mentioned this pull request Apr 2, 2024
@@ -435,6 +435,22 @@ dataCoord:
maxParallelTaskNum: 10 # max parallel compaction task number
indexBasedCompaction: true

major:
@wayblink wayblink Apr 2, 2024

About the terminology here: we have both major and L2.

I think L2 requires users to understand the concept of segment types and the relationships between the different compaction types and segment types. The compaction here is actually L1 + L2 -> L2, and we will add L2 single compaction and L2 merge compaction in the future, so I think L2 may cause more confusion.

For major compaction, we will split one major compaction into multiple sub-compaction tasks to handle large datasets and handoff. It is a little weird to also call these sub-tasks major; we developers should be aware of that.

I prefer major over L2 for easier user understanding. And here is another proposal: clusteringCompaction. Clustering is more accurate and echoes the concept of clusteringKey.

The final compaction types will be:
Level0DeleteCompaction
SingleCompaction: add support for L2
MinorCompaction: add support for L2
ClusteringCompaction (MajorCompaction): L1 + L2 -> L2


But since there are L0 and L1 segments, does it make more sense to have an L2 segment and thus L2 compaction?

Clustering compaction looks clearer to me, but definitely not major compaction.

Major compaction usually means merging all segments/shards into one.


mergify bot commented Apr 2, 2024

⚠️ The sha of the head commit of this PR conflicts with #31660. Mergify cannot evaluate rules on this PR. ⚠️

Signed-off-by: wayblink <[email protected]>
Signed-off-by: wayblink <[email protected]>
GetRemoteCentroidsObjectPrefix() const {
return rcm_->GetRootPath() + "/files" + std::string(ANALYZE_ROOT_PATH) +
"/" + std::to_string(index_meta_.build_id) + "/" +
std::to_string(index_meta_.index_version) + "/" +

It's not a good idea to put index_version and buildID before collectionID.

It is much better to put collectionID at the front so we get all collection info in one S3 subdirectory. Is anything blocking us from doing this?
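The collection-first layout the reviewer suggests can be sketched in Go; the helper name, root, and path segment order are illustrative assumptions, not the actual Milvus code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// centroidsPrefix builds the remote prefix with collectionID leading, so every
// artifact of one collection lands under a single storage subdirectory and can
// be listed or cleaned with one prefix scan. (Illustrative names only.)
func centroidsPrefix(root string, collectionID, buildID, indexVersion int64) string {
	return strings.Join([]string{
		root, "analyze_stats",
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(buildID, 10),
		strconv.FormatInt(indexVersion, 10),
	}, "/")
}

func main() {
	fmt.Println(centroidsPrefix("files", 100, 7, 1))
}
```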

throw SegcoreError(FileOpenFailed, err_msg.str());
}
auto fileName = GetFileName(file);
auto fileSize = local_chunk_manager->Size(file);

There is no need to do an exist-then-size check.
Size should return an error if the file does not exist.


auto parallel_degree = 16;

if (batch_remote_files.size() >= parallel_degree) {

You don't really need to control parallel_degree here. Simply submit all tasks one by one, and AddBatchCompactionResultFiles can handle that.

std::unordered_map<std::string, int64_t>& map) {
auto local_chunk_manager =
LocalChunkManagerSingleton::GetInstance().GetChunkManager();
auto& pool = ThreadPools::GetThreadPool(milvus::ThreadPoolPriority::HIGH);

I don't think it's a good idea to read everything and then write.
Why not simply use the high-priority pool to read each file out and then write it back to remote?

That makes memory easy to control (fully based on the thread pool worker count).

BTW, should you use milvus::ThreadPoolPriority::HIGH, or another thread pool? I saw PutCompactionResultData uses the MIDDLE pool.


This could also trigger a deadlock. If I'm correct, it's always the MIDDLE pool that triggers tasks on the HIGH-priority pool, not vice versa.

namespace milvus::indexbuilder {

template <typename T>
KmeansMajorCompaction<T>::KmeansMajorCompaction(

This should not be named major compaction.
This is indeed KMeansAnalysis.

Compaction happens in Milvus and has nothing to do with the index.

train_num = data_num;
}
auto train_size_new = train_num * dim * sizeof(T);
auto buf = Sample(data_files, offsets, train_size_new, data_size);

Need a log for the sampling time.

res.what()));
}
dataset.reset(); // release train data
auto centroids = reinterpret_cast<const T*>(res.value()->GetTensor());

Move the rest of the function to another function

result_files_.emplace_back(centroid_stats_path);
WritePBFile(stats, centroid_stats_path);

auto compute_num_in_centroid = [&](const uint32_t* centroid_id_mapping,

All of the code below should be another function.

std::vector<std::string> data_files;
std::vector<uint64_t> offsets;
uint32_t dim = 0;
auto data_size = file_manager_->CacheCompactionRawDataToDisk(

When is the cached file cleaned?

We need to check everywhere that touches the cache file. It should be cleaned when the analysis is cleaned up or fails.

it != insert_files.value().end();
it++) {
gather_segment_id.emplace_back(it->first);
gather_size += offsets[i];

Is there a special reason this has to be grouped?

How large could the WritePBFile output be? I'm assuming we can write everything into one file.

If the stats are less than 10GB, we can read all the files in parallel, concurrently write into the same stats, and finally WritePBFile to persistent storage.

CompactionStagePath = `compaction_stage`

// AnalyzeStatsPath storage path const for analyze.
AnalyzeStatsPath = `filesanalyze_stats`

This is not the same as the C++ side has.


Seems many other places share this path.


func NewScalarFieldValue(dtype schemapb.DataType, data interface{}) ScalarFieldValue {
switch dtype {
case schemapb.DataType_Int8:

what happened to boolean?

}
ats.updateTaskState(taskID, taskInProgress)
case taskRetry:
if !ats.dropIndexTask(taskID, t.NodeID) {

What is the difference between analysisTask and indexTask? Can we simply use one scheduler?


I think it is wise to use a new concept, Job. Index is a job; analysis is another job.
The scheduler works on jobs and doesn't need to worry about what kind of job it is.
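The proposed Job abstraction could look like this minimal Go sketch; the interface, type names, and trivial Run bodies are all illustrative assumptions, not the actual Milvus types:

```go
package main

import "fmt"

// Job is the reviewer's proposed abstraction: the scheduler only knows how to
// run jobs, not whether a job builds an index or runs a kmeans analysis.
type Job interface {
	Name() string
	Run() error
}

type indexJob struct{ buildID int64 }

func (j indexJob) Name() string { return fmt.Sprintf("index-%d", j.buildID) }
func (j indexJob) Run() error   { return nil } // would build the index

type analysisJob struct{ taskID int64 }

func (j analysisJob) Name() string { return fmt.Sprintf("analysis-%d", j.taskID) }
func (j analysisJob) Run() error   { return nil } // would run the analysis

// schedule runs jobs in order; a real scheduler would add queues, retries, and
// the state transitions seen in the diff (taskInProgress, taskRetry, ...).
func schedule(jobs []Job) []string {
	var done []string
	for _, j := range jobs {
		if err := j.Run(); err == nil {
			done = append(done, j.Name())
		}
	}
	return done
}

func main() {
	fmt.Println(schedule([]Job{indexJob{1}, analysisJob{2}}))
}
```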


The current implementation is a weird combination of the index and analysis tasks.


@@ -516,6 +510,50 @@ func (c *compactionPlanHandler) handleMergeCompactionResult(plan *datapb.Compact
return nil
}

func (c *compactionPlanHandler) handleMajorCompactionResult(plan *datapb.CompactionPlan, result *datapb.CompactionPlanResult) error {

What is the difference between handleMajorCompactionResult and handleMergeCompactionResult?

internal/datanode/io/binlog_io.go (outdated review thread, resolved)
@@ -289,6 +289,18 @@ func (s *Server) GetMetrics(ctx context.Context, request *milvuspb.GetMetricsReq
return s.indexnode.GetMetrics(ctx, request)
}

func (s *Server) Analysis(ctx context.Context, request *indexpb.AnalysisRequest) (*commonpb.Status, error) {

Use CreateJob, and it should support both AnalysisRequest and IndexRequest, something like:

message CreateJobRequestV2 {
  AnalysisRequest analysis_request = 1;
  IndexRequest index_request = 2;
}

"github.com/milvus-io/milvus/pkg/common"
"github.com/milvus-io/milvus/pkg/log"
"github.com/milvus-io/milvus/pkg/util/distance"
"github.com/milvus-io/milvus/pkg/util/funcutil"
"github.com/milvus-io/milvus/pkg/util/merr"
"github.com/milvus-io/milvus/pkg/util/typeutil"
)

const defaultFilterRatio float64 = 0.5

Need monitoring metrics for the pruned ratio here.

@@ -152,6 +164,7 @@ func FilterSegmentsByVector(partitionStats *storage.PartitionStatsSnapshot,
}
// currently, we only support float vector and only one center one segment
if disErr != nil {
log.Error("calculate distance error", zap.Error(disErr))

Below this line

	switch searchReq.GetMetricType() {
	case distance.L2:
		sort.SliceStable(segmentsToSearch, func(i, j int) bool {
			return segmentsToSearch[i].distance < segmentsToSearch[j].distance
		})
	case distance.IP, distance.COSINE:
		sort.SliceStable(segmentsToSearch, func(i, j int) bool {
			return segmentsToSearch[i].distance > segmentsToSearch[j].distance
		})
	}

What if the search request doesn't have a metric type?
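A defensive fallback for the missing-metric-type case could look like this Go sketch; the plain metric strings and segment struct are assumptions (the real code uses the distance package constants), and leaving the order unchanged for an unknown metric is a suggested fallback, not the PR's behavior:

```go
package main

import (
	"fmt"
	"sort"
)

type segment struct {
	id       int64
	distance float32
}

// sortByMetric orders candidate segments by their distance to the query.
// For L2, smaller distance is closer; for IP/COSINE, a larger score is better.
// An unknown or empty metric type leaves the order unchanged, so no segment
// gets pruned on the basis of an undefined ordering.
func sortByMetric(metric string, segs []segment) []segment {
	switch metric {
	case "L2":
		sort.SliceStable(segs, func(i, j int) bool { return segs[i].distance < segs[j].distance })
	case "IP", "COSINE":
		sort.SliceStable(segs, func(i, j int) bool { return segs[i].distance > segs[j].distance })
	}
	return segs
}

func main() {
	segs := []segment{{1, 0.9}, {2, 0.1}, {3, 0.5}}
	fmt.Println(sortByMetric("L2", segs)[0].id) // closest first under L2
	fmt.Println(sortByMetric("IP", segs)[0].id) // largest score first under IP
}
```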

"github.com/milvus-io/milvus/pkg/common"
"github.com/milvus-io/milvus/pkg/log"
"github.com/milvus-io/milvus/pkg/util/distance"
"github.com/milvus-io/milvus/pkg/util/funcutil"
"github.com/milvus-io/milvus/pkg/util/merr"
"github.com/milvus-io/milvus/pkg/util/typeutil"
)

const defaultFilterRatio float64 = 0.5

The filter ratio might not be good enough; we need a lookup table here:
If segment count <= 16, do not filter
[17, 32]: segment * 0.5
[33, 256]: 16 + (segment - 32) * 0.2
[257, 1024]: 60 + (segment - 256) * 0.05
[1025, more]: 98 + (segment - 1024) * 0.01
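The proposed tiers can be sketched as a small lookup function; the function name is illustrative, integer division approximates the fractional factors, and as noted below, the boundaries should come from config rather than be hard-coded:

```go
package main

import "fmt"

// searchSegmentBudget maps the total candidate segment count n to the number
// of segments actually searched, following the reviewer's proposed tiers:
// <=16: no filtering; 17-32: n*0.5; 33-256: 16+(n-32)*0.2;
// 257-1024: 60+(n-256)*0.05; >1024: 98+(n-1024)*0.01.
func searchSegmentBudget(n int) int {
	switch {
	case n <= 16:
		return n
	case n <= 32:
		return n / 2
	case n <= 256:
		return 16 + (n-32)/5
	case n <= 1024:
		return 60 + (n-256)/20
	default:
		return 98 + (n-1024)/100
	}
}

func main() {
	for _, n := range []int{10, 32, 256, 1024, 2024} {
		fmt.Println(n, "->", searchSegmentBudget(n))
	}
}
```

Note the tiers are continuous at the boundaries (32*0.5 = 16, 16 + 224*0.2 = 60.8, 60 + 768*0.05 = 98.4), so the budget never drops when the segment count grows.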


and we should make sure this is tunable


Need to test recall on multiple datasets as well.


ideally we expect segment size to be 100+

auto local_chunk_manager =
storage::LocalChunkManagerSingleton::GetInstance().GetChunkManager();
// train data fits in memory, read by sequence and generate centroids and id_mapping in one pass
if (train_size >= total_size) {

Suggest moving the code that reads data into the buffer to another method.

if constexpr (!std::is_same_v<T, float>) {
PanicInfo(
ErrorCode::UnexpectedError,
fmt::format("kmeans major compaction only supports float32 now"));

Shall we add the current type to the error message?

auto insert_files = milvus::index::GetValueFromConfig<
std::map<int64_t, std::vector<std::string>>>(config_, "insert_files");
AssertInfo(insert_files.has_value(),
"insert file paths is empty when major compaction");

Use clustering instead of the major compaction concept everywhere in the segcore layer.

std::string id_mapping_path =
output_path + std::to_string(gather_segment_id[j]);
result_files_.emplace_back(id_mapping_path);
WritePBFile(stats, id_mapping_path);

Is there any guarantee that the generated files are cleaned up when an error occurs?

auto& config = analysis_info->config;
config["insert_files"] = analysis_info->insert_files;
config["segment_size"] = analysis_info->segment_size;
config["train_size"] = analysis_info->train_size;

Add a log for each analysis configuration entry.

}

func (kc *Catalog) SaveAnalysisTask(ctx context.Context, task *model.AnalysisTask) error {
key := buildAnalysisTaskKey(task.TaskID)

Suggest adding the db ID and collection ID into the key, so scanning keys is easier in a multi-database, multi-collection scenario.
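A key layout leading with the db and collection IDs could look like this Go sketch; the prefix string and field order are illustrative assumptions, not the actual buildAnalysisTaskKey implementation:

```go
package main

import "fmt"

// buildAnalysisTaskKey leads with dbID and collectionID so all tasks of one
// collection (or one database) can be listed with a single prefix scan,
// instead of scanning every task key and filtering afterwards.
func buildAnalysisTaskKey(dbID, collectionID, taskID int64) string {
	return fmt.Sprintf("analysis-task/%d/%d/%d", dbID, collectionID, taskID)
}

func main() {
	fmt.Println(buildAnalysisTaskKey(1, 100, 42))
}
```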

@xiaofan-luan

We cannot assume we can use local disk for caching (currently I think it should be more like a pure memory + remote storage mode).

All of the I/O needs to go through the chunkManager, and we should carefully design the cleanup path.

@@ -109,6 +111,37 @@ class IndexFactory {
throw std::invalid_argument(invalid_dtype_msg);
}
}

MajorCompactionBasePtr
CreateCompactionJob(DataType type,

  1. It is weird to create a clustering job in an index factory.
  2. This factory itself is bad design legacy; it is only a collection of functions instead of an object.
  3. For now we don't need a creator/factory for jobs; we can simply make a function in an unnamed namespace in analysis_c.cpp.


I think it is OK to create a new folder for Clustering Job

}

CStatus
AppendAnalysisFieldMetaInfo(CAnalysisInfo c_analysis_info,

Just curious: why do we need this bunch of appending APIs? Did we analyze the cost of the CGO calls? Some potential alternative solutions:

  1. Just pass the params through the CGO function call.
  2. If we need more flexibility, can JSON help with this?

Comment on lines +38 to +41
auto& config = analysis_info->config;
config["insert_files"] = analysis_info->insert_files;
config["segment_size"] = analysis_info->segment_size;
config["train_size"] = analysis_info->train_size;

Why do we need to have an empty JSON in the c_analysis_info object and then fill it with other fields of the same object?

@@ -64,6 +64,12 @@ class DiskFileManagerImpl : public FileManagerImpl {
std::string
GetLocalRawDataObjectPrefix();

std::string

FileManager's API comes from Knowhere; its purpose is to help Knowhere manipulate files stored in Milvus storage. We should not add any APIs beyond that.


Looks like there is a lot of misuse here, including the Tantivy support.


For all disk interactions of this clustering work, let's move them into a clustering I/O util file.

@@ -435,6 +435,23 @@ dataCoord:
maxParallelTaskNum: 10 # max parallel compaction task number
indexBasedCompaction: true

clustering:
@czs007 czs007 Apr 8, 2024

need to talk about the configuration

Comment on lines +37 to +41
Train() override;

BinarySet
Upload() override;


Why do we need to separate these two parts?
Also, Train and Upload are not suitable APIs for a Job; a Job can only Run.

@liliu-z liliu-z left a comment

Too many disk operations without error handling, and destruction is mixed with the clustering job logic. Let's create a new disk util file to handle all of these.

err_msg << "Error: write local file '" << file_path << " failed, "
<< strerror(errno);
throw SegcoreError(FileWriteFailed, err_msg.str());
}

Need to close the stream

@wayblink wayblink force-pushed the major-test branch 2 times, most recently from b1ce6ac to edc71bd Compare April 8, 2024 06:17
@czs007 czs007 merged commit 17bf3ab into milvus-io:major_compaction Apr 8, 2024
4 of 18 checks passed
wayblink added a commit to wayblink/milvus that referenced this pull request Apr 10, 2024
Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>
wayblink added a commit to wayblink/milvus that referenced this pull request Apr 10, 2024
Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>
czs007 pushed a commit that referenced this pull request Apr 10, 2024
Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>
sunby pushed a commit to sunby/milvus that referenced this pull request Apr 22, 2024
Signed-off-by: chyezh <[email protected]>

Add metric for lru and fix lost delete data when enable lazy load  (milvus-io#31868)

Signed-off-by: chyezh <[email protected]>

feat: Support stream reduce v1 (milvus-io#31873)

related: milvus-io#31410

---------

Signed-off-by: MrPresent-Han <[email protected]>

Change do wait lru dev (milvus-io#31878)

Signed-off-by: sunby <[email protected]>

enhance: add config for disk cache (milvus-io#31881)

fix config not initialized (milvus-io#31890)

Signed-off-by: sunby <[email protected]>

fix error handle in search (milvus-io#31895)

Signed-off-by: sunby <[email protected]>

fix: thread safe vector (milvus-io#31898)

fix: insert record cannot reinsert (milvus-io#31900)

enhance: cancel concurrency restrict for stream reduce and add metrics (milvus-io#31892)

Signed-off-by: MrPresent-Han <[email protected]>

fix: bit set (milvus-io#31905)

fix bitset clear to reset (milvus-io#31908)

Signed-off-by: MrPresent-Han <[email protected]>

Fix 0404 lru dev (milvus-io#31914)

fix:
1. sealed_segment num_rows reset to std::null opt
2. sealed_segment lazy_load reset to true after evicting to avoid
shortcut

---------

Signed-off-by: MrPresent-Han <[email protected]>

fix possible block due to unpin fifo activating principle (milvus-io#31924)

Signed-off-by: MrPresent-Han <[email protected]>

Add lru reloader lru dev (milvus-io#31952)

Signed-off-by: sunby <[email protected]>

fix query limit (milvus-io#32060)

Signed-off-by: sunby <[email protected]>

fix: lru cache lost delete and wrong mem size (milvus-io#32072)

issue: milvus-io#30361

Signed-off-by: chyezh <[email protected]>

enhance: add more metrics for cache and search (milvus-io#31777) (milvus-io#32097)

issue: milvus-io#30931

Signed-off-by: chyezh <[email protected]>

fix:panic due to empty search result when stream reducing(milvus-io#32009) (milvus-io#32083)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <[email protected]>

fix: sealed segment may not exist when throw (milvus-io#32098)

issue: milvus-io#30361

Signed-off-by: chyezh <[email protected]>

Major compaction 1st edition (milvus-io#31804) (milvus-io#32116)

Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>

fix: inconsistent between state lock and load state (milvus-io#32171)

issue: milvus-io#30361

Signed-off-by: chyezh <[email protected]>

enhance: Throw error instead of crash when index cannot be built (milvus-io#31844)

issue: milvus-io#27589

---------

Signed-off-by: Cai Zhang <[email protected]>

(cherry picked from commit 1b76766)
Signed-off-by: jaime <[email protected]>

update knowhere to support clustering (milvus-io#32188)

Signed-off-by: chasingegg <[email protected]>

fix: segment release is not sync with cache (milvus-io#32212)

issue: milvus-io#32206

Signed-off-by: chyezh <[email protected]>

fix: incorrect pinCount resulting unexpected eviction(milvus-io#32136) (milvus-io#32238)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <[email protected]>

fix: possible panic when stream reducing (milvus-io#32247)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <[email protected]>

enhance: [lru-dev] add the related data size for the read apis (milvus-io#32274)

cherry-pick: milvus-io#31816

---------

Signed-off-by: SimFG <[email protected]>

add debug log (milvus-io#32303)

Signed-off-by: Cai Zhang <[email protected]>

Refine code for analyze task scheduler (milvus-io#32122)

Signed-off-by: Cai Zhang <[email protected]>

fix: memory leak on stream reduce (milvus-io#32345)

related: milvus-io#32304

Signed-off-by: MrPresent-Han <[email protected]>

feat: adding cache stats support (milvus-io#32344)

See milvus-io#32067

Signed-off-by: Ted Xu <[email protected]>

Fix bug for version (milvus-io#32363)

Signed-off-by: Cai Zhang <[email protected]>

fix: remove sub entity in load delta log, update entity num in segment itself (milvus-io#32350)

issue: milvus-io#30361

Signed-off-by: chyezh <[email protected]>

fix: clear data when loading failure (milvus-io#32370)

issue: milvus-io#30361

Signed-off-by: chyezh <[email protected]>

fix: stream reduce memory leak for failing to release stream reducer(milvus-io#32345) (milvus-io#32381)

related: milvus-io#32345

Signed-off-by: MrPresent-Han <[email protected]>

Keep InProgress state when getting task state is init (milvus-io#32394)

Signed-off-by: Cai Zhang <[email protected]>

add log for search failed (milvus-io#32367)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <[email protected]>

enable asan by default (milvus-io#32423)

Signed-off-by: sunby <[email protected]>

Major compaction refactoring (milvus-io#32149)

Signed-off-by: wayblink <[email protected]>

Lru dev debug (milvus-io#32414)

Co-authored-by: wayblink <[email protected]>

fix: protect loadInfo with atomic, remove rlock at cache to avoid dead lock (milvus-io#32436)

issue: milvus-io#32435

Signed-off-by: chyezh <[email protected]>

fix: use Get but not GetBy of SegmentManager (milvus-io#32438)

issue: milvus-io#32435

Signed-off-by: chyezh <[email protected]>

fix: return growing segment when sealed (milvus-io#32460)

issue: milvus-io#32435

Signed-off-by: chyezh <[email protected]>

enhance: add request resource for lru loading process(milvus-io#32205) (milvus-io#32452)

related: milvus-io#32205

Signed-off-by: MrPresent-Han <[email protected]>

fix: unexpected deleted index files when lazy loading(milvus-io#32136) (milvus-io#32469)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <[email protected]>

fix: reference count leak cause release blocked (milvus-io#32465)

issue: milvus-io#32379

Signed-off-by: chyezh <[email protected]>

Fix compaction fail (milvus-io#32473)

Signed-off-by: wayblink <[email protected]>
wayblink added a commit to wayblink/milvus that referenced this pull request May 9, 2024
Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>
wayblink added a commit to wayblink/milvus that referenced this pull request May 9, 2024
Signed-off-by: wayblink <[email protected]>
Signed-off-by: Cai Zhang <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Co-authored-by: chasingegg <[email protected]>
7 participants