Merge stable/v1.25 into main #4891

Closed
wants to merge 103 commits into main from stable/v1.25
Changes from 47 commits
Commits (103)
252bca3
Remove consistency flag from getTenants
dirkkul May 6, 2024
1be8e9d
Fix test
dirkkul May 6, 2024
0aebbb8
use flag for consistency wait timeout and rename it
reyreaud-l May 7, 2024
d23863f
Merge pull request #4859 from weaviate/tenant_get_consistency
dirkkul May 7, 2024
43817de
Merge pull request #4864 from weaviate/lre/use-flag-for-consistency-w…
reyreaud-l May 7, 2024
d9776bc
chore: add data dir to remove in restart_dev_environment
moogacs May 7, 2024
a11fc5b
fix: vector validation in batch operation
jeroiraz May 6, 2024
0b614e5
test: batch insertion with invalid vector size
jeroiraz May 7, 2024
a0fb6fb
all local to use weaviate-0
moogacs May 7, 2024
4d6c69b
remove backups dirs
moogacs May 7, 2024
cc6fcc5
Merge pull request #4858 from weaviate/fix_batch_vector_validation
jeroiraz May 7, 2024
18c80e3
Merge pull request #4867 from weaviate/restart-script
moogacs May 7, 2024
13bf814
Change modelId setting to model in Ollama modules
antas-marcin May 7, 2024
a47564b
Merge pull request #4870 from weaviate/change-modelid-to-model-in-oll…
dirkkul May 7, 2024
ddeb480
add config option to limit max segment size
etiennedi May 8, 2024
68beb52
add unit tests
etiennedi May 8, 2024
0e801dc
activate for other buckets
etiennedi May 8, 2024
aabf6b9
add license headers to new files
etiennedi May 8, 2024
72ec672
raft: reserve raft as a class name
moogacs May 8, 2024
fdb9e15
normalize class name and add tests
moogacs May 8, 2024
d1982a2
add more cases and fix all letters
moogacs May 8, 2024
3399011
fix typos in test description
etiennedi May 8, 2024
18e100e
Merge pull request #4874 from weaviate/reser-raft-class-name
moogacs May 8, 2024
fa83fd2
make bootstrap return on successful notify
reyreaud-l May 8, 2024
50b6b87
Merge pull request #4871 from weaviate/lre/fix-bootstrap-timeout-restart
reyreaud-l May 8, 2024
edafd0e
raft: refactor db opening & logging
moogacs May 8, 2024
ef303b7
raft: always apply local schema before db
moogacs May 8, 2024
359f73e
raft: always apply local db before schema
moogacs May 8, 2024
2609b82
Clean up target vectors in hybrid
dirkkul May 8, 2024
c481646
raft: open db on service.Open instead on apply and always apply schem…
moogacs May 8, 2024
e46a41e
Merge pull request #4879 from weaviate/target_vectors_hybrid
dirkkul May 8, 2024
458ac5e
fix unit test after applying always to db
moogacs May 8, 2024
7c9b231
compare initialLastAppliedIndex to raft index to decide on db opening
moogacs May 8, 2024
a0080dd
raft: remove schemaOnly and rename lastAppliedIndexOnStart
moogacs May 8, 2024
51021c2
Merge pull request #4872 from weaviate/introduce-max-segment-size
antas-marcin May 9, 2024
749ba5f
Fix text2vec-aws module Bedrock support
antas-marcin May 8, 2024
a1a70ea
make HNSW max log size configurable
etiennedi May 9, 2024
32be448
Merge pull request #4881 from weaviate/fix-text2vec-aws-module
antas-marcin May 9, 2024
348da5b
more renaming
moogacs May 9, 2024
9cd8264
Merge pull request #4884 from weaviate/set-hnsw-condensing-limit-dyna…
etiennedi May 9, 2024
283a7fc
Add support for Amazon Titan Text Embeddings V2 model
antas-marcin May 9, 2024
f892e91
Merge pull request #4886 from weaviate/add-support-for-titan-v2
antas-marcin May 9, 2024
4ce0d22
prepare release v1.24.12
antas-marcin May 9, 2024
7363ddd
fix: max length field validations when marshalling objects
jeroiraz May 8, 2024
9c0248f
test: invalid object serialization cases
jeroiraz May 9, 2024
a1cd73a
Merge pull request #4887 from weaviate/prepare-release-v1.24.12
antas-marcin May 9, 2024
a0d86f3
test: avoid high alloc scenarios
jeroiraz May 9, 2024
aac580e
Merge remote-tracking branch 'origin/stable/v1.24' into merge-stable-…
antas-marcin May 9, 2024
3a60440
remove debug logs
etiennedi May 9, 2024
d2eef0e
Merge pull request #4889 from weaviate/remove-debug-log-max-segment
antas-marcin May 9, 2024
8f7f276
Merge pull request #4877 from weaviate/fix_maxvectorlen
jeroiraz May 9, 2024
8afdaf3
Merge pull request #4888 from weaviate/merge-stable-v1.24-into-stable…
antas-marcin May 9, 2024
828155e
Merge pull request #4890 from weaviate/stable/v1.24
antas-marcin May 9, 2024
1539ad7
prepare release v1.25.0
antas-marcin May 9, 2024
4e498c4
Merge pull request #4892 from weaviate/prepare-release-v1.25.0
antas-marcin May 9, 2024
4b5b5e9
fix: prevent empty segment generation
jeroiraz May 9, 2024
48a6249
Migrate huggingface to batching
dirkkul May 9, 2024
8b773b5
Fix linter
dirkkul May 9, 2024
51dcc38
Merge pull request #4893 from weaviate/fix_prevent_empty_segments
jeroiraz May 9, 2024
b489a8d
Merge pull request #4875 from weaviate/refactor-db-open-1
reyreaud-l May 10, 2024
df50575
Merge pull request #4876 from weaviate/refactor-db-open-2
reyreaud-l May 10, 2024
9a018c7
Merge pull request #4878 from weaviate/refactor-db-open-3
reyreaud-l May 10, 2024
4ca4c0f
Merge pull request #4882 from weaviate/refactor-db-open-4
reyreaud-l May 10, 2024
597a606
improve log messages to apply to also dump cmd related output
reyreaud-l May 10, 2024
44d6749
Merge pull request #4895 from weaviate/lre/improve-apply-log-messages
reyreaud-l May 10, 2024
e4d4854
Revert "raft: remove schemaOnly and rename lastAppliedIndexOnStart"
reyreaud-l May 10, 2024
ddb34ca
Ensure classCache is added to context before validate objects call
tsmith023 May 13, 2024
cba31db
Merge pull request #4902 from weaviate/fix-objects-validate-api
tsmith023 May 13, 2024
34d5b2b
Fix batch delete proto Java option setting
antas-marcin May 13, 2024
f7e440e
Merge pull request #4903 from weaviate/fix-batch-delete-proto-java-op…
antas-marcin May 13, 2024
b8506f8
Fix headers with GRPC
dirkkul May 13, 2024
a1ffa35
Add WIP image generation
dirkkul May 13, 2024
bfc69dd
Fix api keys from env vars
dirkkul May 13, 2024
030fb30
chore: warn with older partially written compacted segments
jeroiraz May 13, 2024
fc01e49
Merge pull request #4906 from weaviate/octoai_header
dirkkul May 13, 2024
6cb869b
Merge pull request #4908 from weaviate/chore_warn_old_tmp_files
jeroiraz May 13, 2024
a12d014
add reload on RAFT and DB aligned to handle catch up on restart
reyreaud-l May 10, 2024
8dda171
Merge pull request #4897 from weaviate/revert-4882-refactor-db-open-4
reyreaud-l May 14, 2024
f5c53dc
RAFT: support back RF scale+/-
moogacs May 6, 2024
f98c89a
reset shards shut logic and activate in replication reinit func
moogacs May 6, 2024
f9389c3
nit: untouch file
moogacs May 6, 2024
cd8d1be
reinit only
moogacs May 6, 2024
b43fe35
add tests to check descale RF and verbose replication ac tests
moogacs May 7, 2024
c014f64
don't update replication factor twice
moogacs May 9, 2024
cce8f65
fix race in delete logic
asdine May 14, 2024
f1cc22b
reimplements Shard::reinit, updates replication factor
aliszka May 14, 2024
c68fd7d
improves replication tests
aliszka May 14, 2024
d625d85
satisfies linter: atomic.Int64 ref
aliszka May 14, 2024
d9f1fd6
Merge pull request #4913 from weaviate/support-raft-rf-scale_with_ref…
aliszka May 14, 2024
e5dafaf
RAFT: fix data race on registering call backs and atomic lastAppliedI…
moogacs May 14, 2024
4d46bbc
Merge pull request #4914 from weaviate/fix-data-race
moogacs May 14, 2024
e902c0f
Merge pull request #4856 from weaviate/support-raft-rf-scale
moogacs May 14, 2024
5a4ef90
replaces namedlocks with keylocker
aliszka May 14, 2024
e248385
Merge pull request #4919 from weaviate/chore/replace_named_locks
aliszka May 14, 2024
444383b
Fix generative-aws module
antas-marcin May 14, 2024
4fb8881
Merge pull request #4918 from weaviate/fix-generative-aws-module
antas-marcin May 14, 2024
0d52f16
Merge pull request #4922 from weaviate/stable/v1.24
antas-marcin May 14, 2024
a1d850b
Merge pull request #4894 from weaviate/huggingface_batch
antas-marcin May 15, 2024
7e0bb45
replaces generic "shard not found" error with actual one
aliszka May 14, 2024
3547c07
add missing lock
asdine May 15, 2024
5310e94
Merge pull request #4923 from weaviate/chore/improve_err_msg_on_shard…
aliszka May 15, 2024
9d8afa5
Merge pull request #4910 from weaviate/fix-race-and-flaky-test
asdine May 15, 2024
a2824a4
Remove unsearchable properties (#4930)
donomii May 15, 2024
6 changes: 0 additions & 6 deletions adapters/handlers/grpc/v1/parse_search_request.go
@@ -368,12 +368,6 @@ func extractTargetVectors(req *pb.SearchRequest, class *models.Class) (*[]string
var targetVectors *[]string
if hs := req.HybridSearch; hs != nil {
targetVectors = &hs.TargetVectors
if hs.NearText != nil {
targetVectors = &hs.NearText.TargetVectors
}
if hs.NearVector != nil {
targetVectors = &hs.NearVector.TargetVectors
}
}
if na := req.NearAudio; na != nil {
targetVectors = &na.TargetVectors
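This hunk removes the per-sub-search overrides: for hybrid queries, target vectors are now taken only from the top-level `HybridSearch.TargetVectors` field, and the near-text / near-vector sub-searches no longer replace them (the test changes below reflect the same move). A minimal sketch of the resulting behaviour, using hypothetical stand-in types rather than the real `pb.*` messages:

```go
// Stand-in types for illustration only, not the actual protobuf structs.
type nearTextSearch struct{ TargetVectors []string }
type nearVectorSearch struct{ TargetVectors []string }

type hybridSearch struct {
	TargetVectors []string
	NearText      *nearTextSearch
	NearVector    *nearVectorSearch
}

// extractHybridTargets mirrors the post-change logic: only the hybrid-level
// field is consulted, even when NearText or NearVector are set.
func extractHybridTargets(hs *hybridSearch) *[]string {
	if hs == nil {
		return nil
	}
	return &hs.TargetVectors
}
```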
18 changes: 9 additions & 9 deletions adapters/handlers/grpc/v1/parse_search_request_test.go
@@ -204,11 +204,11 @@ func TestGRPCRequest(t *testing.T) {
Alpha: 1.0,
Query: "nearvecquery",
NearVector: &pb.NearVector{
VectorBytes: byteops.Float32ToByteVector([]float32{1, 2, 3}),
TargetVectors: []string{"custom"},
Certainty: &one,
Distance: &one,
VectorBytes: byteops.Float32ToByteVector([]float32{1, 2, 3}),
Certainty: &one,
Distance: &one,
},
TargetVectors: []string{"custom"},
},
},
out: dto.GetParams{
@@ -221,12 +221,12 @@
Query: "nearvecquery",
FusionAlgorithm: 1,
NearVectorParams: &searchparams.NearVector{
Vector: []float32{1, 2, 3},
TargetVectors: []string{"custom"},
Certainty: 1.0,
Distance: 1.0,
WithDistance: true,
Vector: []float32{1, 2, 3},
Certainty: 1.0,
Distance: 1.0,
WithDistance: true,
},
TargetVectors: []string{"custom"},
},
},
error: false,
4 changes: 2 additions & 2 deletions adapters/handlers/grpc/v1/tenants.go
@@ -27,7 +27,7 @@ func (s *Service) tenantsGet(ctx context.Context, principal *models.Principal, r
var err error
var tenants []*models.Tenant
if req.Params == nil {
tenants, err = s.schemaManager.GetConsistentTenants(ctx, principal, req.Collection, req.IsConsistent, []string{})
tenants, err = s.schemaManager.GetConsistentTenants(ctx, principal, req.Collection, true, []string{})
if err != nil {
return nil, err
}
@@ -38,7 +38,7 @@
if len(requestedNames) == 0 {
return nil, fmt.Errorf("must specify at least one tenant name")
}
tenants, err = s.schemaManager.GetConsistentTenants(ctx, principal, req.Collection, req.IsConsistent, requestedNames)
tenants, err = s.schemaManager.GetConsistentTenants(ctx, principal, req.Collection, true, requestedNames)
if err != nil {
return nil, err
}
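Both call sites now pass `true` for the consistency argument instead of `req.IsConsistent`, matching the "Remove consistency flag from getTenants" commit: gRPC tenant reads are always served consistently, regardless of what the request carries. A hedged sketch of the calling pattern; treating `schemaManager` as a standalone parameter type here is an assumption made purely for illustration:

```go
// Sketch only ("context" and models imports assumed): the handler always
// requests a consistent read; the request's IsConsistent field is ignored.
func fetchTenants(ctx context.Context, sm schemaManager, principal *models.Principal,
	collection string, names []string,
) ([]*models.Tenant, error) {
	return sm.GetConsistentTenants(ctx, principal, collection, true, names)
}
```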
54 changes: 27 additions & 27 deletions adapters/handlers/rest/configure_api.go
@@ -193,6 +193,8 @@ func MakeAppState(ctx context.Context, options *swag.CommandLineOptionsGroup) *s
MemtablesMaxSizeMB: appState.ServerConfig.Config.Persistence.MemtablesMaxSizeMB,
MemtablesMinActiveSeconds: appState.ServerConfig.Config.Persistence.MemtablesMinActiveDurationSeconds,
MemtablesMaxActiveSeconds: appState.ServerConfig.Config.Persistence.MemtablesMaxActiveDurationSeconds,
MaxSegmentSize: appState.ServerConfig.Config.Persistence.LSMMaxSegmentSize,
HNSWMaxLogSize: appState.ServerConfig.Config.Persistence.HNSWMaxLogSize,
RootPath: appState.ServerConfig.Config.Persistence.DataPath,
QueryLimit: appState.ServerConfig.Config.QueryDefaults.Limit,
QueryMaximumResults: appState.ServerConfig.Config.QueryMaximumResults,
@@ -251,8 +253,6 @@
remoteIndexClient, appState.Logger, appState.ServerConfig.Config.Persistence.DataPath)
appState.Scaler = scaler

/// TODO-RAFT START
//
server2port, err := parseNode2Port(appState)
if len(server2port) == 0 || err != nil {
appState.Logger.
@@ -268,31 +268,31 @@
dataPath := appState.ServerConfig.Config.Persistence.DataPath

rConfig := rStore.Config{
WorkDir: filepath.Join(dataPath, "raft"),
NodeID: nodeName,
Host: addrs[0],
RaftPort: appState.ServerConfig.Config.Raft.Port,
RPCPort: appState.ServerConfig.Config.Raft.InternalRPCPort,
RaftRPCMessageMaxSize: appState.ServerConfig.Config.Raft.RPCMessageMaxSize,
ServerName2PortMap: server2port,
BootstrapTimeout: appState.ServerConfig.Config.Raft.BootstrapTimeout,
BootstrapExpect: appState.ServerConfig.Config.Raft.BootstrapExpect,
HeartbeatTimeout: appState.ServerConfig.Config.Raft.HeartbeatTimeout,
RecoveryTimeout: appState.ServerConfig.Config.Raft.RecoveryTimeout,
ElectionTimeout: appState.ServerConfig.Config.Raft.ElectionTimeout,
SnapshotInterval: appState.ServerConfig.Config.Raft.SnapshotInterval,
SnapshotThreshold: appState.ServerConfig.Config.Raft.SnapshotThreshold,
UpdateWaitTimeout: time.Second * 10, // TODO-RAFT read from the flag
MetadataOnlyVoters: appState.ServerConfig.Config.Raft.MetadataOnlyVoters,
DB: nil,
Parser: schema.NewParser(appState.Cluster, vectorIndex.ParseAndValidateConfig, migrator),
AddrResolver: appState.Cluster,
Logger: appState.Logger,
LogLevel: logLevel(),
LogJSONFormat: !logTextFormat(),
IsLocalHost: appState.ServerConfig.Config.Cluster.Localhost,
LoadLegacySchema: schemaRepo.LoadLegacySchema,
SaveLegacySchema: schemaRepo.SaveLegacySchema,
WorkDir: filepath.Join(dataPath, config.DefaultRaftDir),
NodeID: nodeName,
Host: addrs[0],
RaftPort: appState.ServerConfig.Config.Raft.Port,
RPCPort: appState.ServerConfig.Config.Raft.InternalRPCPort,
RaftRPCMessageMaxSize: appState.ServerConfig.Config.Raft.RPCMessageMaxSize,
ServerName2PortMap: server2port,
BootstrapTimeout: appState.ServerConfig.Config.Raft.BootstrapTimeout,
BootstrapExpect: appState.ServerConfig.Config.Raft.BootstrapExpect,
HeartbeatTimeout: appState.ServerConfig.Config.Raft.HeartbeatTimeout,
RecoveryTimeout: appState.ServerConfig.Config.Raft.RecoveryTimeout,
ElectionTimeout: appState.ServerConfig.Config.Raft.ElectionTimeout,
SnapshotInterval: appState.ServerConfig.Config.Raft.SnapshotInterval,
SnapshotThreshold: appState.ServerConfig.Config.Raft.SnapshotThreshold,
ConsistencyWaitTimeout: appState.ServerConfig.Config.Raft.ConsistencyWaitTimeout,
MetadataOnlyVoters: appState.ServerConfig.Config.Raft.MetadataOnlyVoters,
DB: nil,
Parser: schema.NewParser(appState.Cluster, vectorIndex.ParseAndValidateConfig, migrator),
AddrResolver: appState.Cluster,
Logger: appState.Logger,
LogLevel: logLevel(),
LogJSONFormat: !logTextFormat(),
IsLocalHost: appState.ServerConfig.Config.Cluster.Localhost,
LoadLegacySchema: schemaRepo.LoadLegacySchema,
SaveLegacySchema: schemaRepo.SaveLegacySchema,
}
for _, name := range appState.ServerConfig.Config.Raft.Join[:rConfig.BootstrapExpect] {
if strings.Contains(name, rConfig.NodeID) {
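Beyond the cosmetic realignment, this file wires three new settings through: `LSMMaxSegmentSize` and `HNSWMaxLogSize` from the persistence config into the index config (see the `MaxSegmentSize` / `HNSWMaxLogSize` fields added here and in index.go and init.go below), plus the Raft `ConsistencyWaitTimeout` that replaces the previously hard-coded 10s `UpdateWaitTimeout`. A simplified sketch of the persistence plumbing, with stand-in structs instead of the real config types:

```go
// Stand-in config structs; the field names follow the hunks in this PR.
type persistenceConfig struct {
	LSMMaxSegmentSize int64
	HNSWMaxLogSize    int64
}

type indexConfig struct {
	MaxSegmentSize int64 // 0 means "no limit", see compactionFitsSizeLimit below
	HNSWMaxLogSize int64
}

// buildIndexConfig shows how the two persistence limits travel from the
// server configuration into the per-index configuration.
func buildIndexConfig(p persistenceConfig) indexConfig {
	return indexConfig{
		MaxSegmentSize: p.LSMMaxSegmentSize,
		HNSWMaxLogSize: p.HNSWMaxLogSize,
	}
}
```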
2 changes: 1 addition & 1 deletion adapters/handlers/rest/doc.go

Generated file; diff not rendered by default.

4 changes: 2 additions & 2 deletions adapters/handlers/rest/embedded_spec.go

Generated file; diff not rendered by default.

8 changes: 6 additions & 2 deletions adapters/repos/db/helper_for_test.go
@@ -294,7 +294,7 @@ func testObject(className string) *storobj.Object {
}
}

func createRandomObjects(r *rand.Rand, className string, numObj int) []*storobj.Object {
func createRandomObjects(r *rand.Rand, className string, numObj int, vectorDim int) []*storobj.Object {
obj := make([]*storobj.Object, numObj)

for i := 0; i < numObj; i++ {
@@ -304,7 +304,11 @@ func createRandomObjects(r *rand.Rand, className string, numObj int) []*storobj.
ID: strfmt.UUID(uuid.NewString()),
Class: className,
},
Vector: []float32{r.Float32(), r.Float32(), r.Float32(), r.Float32()},
Vector: make([]float32, vectorDim),
}

for d := 0; d < vectorDim; d++ {
obj[i].Vector[d] = r.Float32()
}
}
return obj
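`createRandomObjects` previously always produced 4-dimensional vectors; the added `vectorDim` parameter lets the new batch tests generate objects whose vector size deliberately mismatches the index. An illustrative call from a test (values are arbitrary, `math/rand` import assumed):

```go
// Illustrative only: 100 random objects with 4-dimensional vectors.
r := rand.New(rand.NewSource(42))
objs := createRandomObjects(r, "MyClass", 100, 4)
_ = objs
```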
2 changes: 2 additions & 0 deletions adapters/repos/db/index.go
@@ -555,6 +555,8 @@ type IndexConfig struct {
MemtablesMaxSizeMB int
MemtablesMinActiveSeconds int
MemtablesMaxActiveSeconds int
MaxSegmentSize int64
HNSWMaxLogSize int64
ReplicationFactor int64
AvoidMMap bool
DisableLazyLoadShards bool
2 changes: 2 additions & 0 deletions adapters/repos/db/init.go
@@ -89,6 +89,8 @@ func (db *DB) init(ctx context.Context) error {
MemtablesMaxSizeMB: db.config.MemtablesMaxSizeMB,
MemtablesMinActiveSeconds: db.config.MemtablesMinActiveSeconds,
MemtablesMaxActiveSeconds: db.config.MemtablesMaxActiveSeconds,
MaxSegmentSize: db.config.MaxSegmentSize,
HNSWMaxLogSize: db.config.HNSWMaxLogSize,
TrackVectorDimensions: db.config.TrackVectorDimensions,
AvoidMMap: db.config.AvoidMMap,
DisableLazyLoadShards: db.config.DisableLazyLoadShards,
5 changes: 5 additions & 0 deletions adapters/repos/db/lsmkv/bucket.go
@@ -118,6 +118,10 @@ type Bucket struct {
// optionally supplied to prevent starting memory-intensive
// processes when memory pressure is high
allocChecker memwatch.AllocChecker

// optional segment size limit. If set, a compaction will skip segments that
// sum to more than the specified value.
maxSegmentSize int64
}

func NewBucketCreator() *Bucket { return &Bucket{} }
@@ -178,6 +182,7 @@ func (*Bucket) NewBucket(ctx context.Context, dir, rootDir string, logger logrus
forceCompaction: b.forceCompaction,
useBloomFilter: b.useBloomFilter,
calcCountNetAdditions: b.calcCountNetAdditions,
maxSegmentSize: b.maxSegmentSize,
}, b.allocChecker)
if err != nil {
return nil, fmt.Errorf("init disk segments: %w", err)
7 changes: 7 additions & 0 deletions adapters/repos/db/lsmkv/bucket_options.go
@@ -148,6 +148,13 @@ func WithCalcCountNetAdditions(calcCountNetAdditions bool) BucketOption {
}
}

func WithMaxSegmentSize(maxSegmentSize int64) BucketOption {
return func(b *Bucket) error {
b.maxSegmentSize = maxSegmentSize
return nil
}
}

/*
Background for this option:

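`WithMaxSegmentSize` is the functional option that carries the new limit into a bucket; leaving it unset keeps the old behaviour, since a zero value disables the check. A hedged usage sketch: the full `NewBucket` argument list is abbreviated in this diff, so only the option value itself is shown, and the 512 MiB figure is picked purely for illustration:

```go
// Illustrative: cap compacted segments at roughly 512 MiB for this bucket.
// How the options are handed to NewBucket is elided here.
opts := []lsmkv.BucketOption{
	lsmkv.WithMaxSegmentSize(512 * 1024 * 1024),
}
_ = opts
```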
5 changes: 4 additions & 1 deletion adapters/repos/db/lsmkv/segment_group.go
@@ -62,7 +62,8 @@ type SegmentGroup struct {
calcCountNetAdditions bool // see bucket for more details
compactLeftOverSegments bool // see bucket for more details

allocChecker memwatch.AllocChecker
allocChecker memwatch.AllocChecker
maxSegmentSize int64
}

type sgConfig struct {
@@ -75,6 +76,7 @@ type sgConfig struct {
useBloomFilter bool
calcCountNetAdditions bool
forceCompaction bool
maxSegmentSize int64
}

func newSegmentGroup(logger logrus.FieldLogger, metrics *Metrics,
@@ -99,6 +101,7 @@ func newSegmentGroup(logger logrus.FieldLogger, metrics *Metrics,
useBloomFilter: cfg.useBloomFilter,
calcCountNetAdditions: cfg.calcCountNetAdditions,
compactLeftOverSegments: cfg.forceCompaction,
maxSegmentSize: cfg.maxSegmentSize,
allocChecker: allocChecker,
}

16 changes: 16 additions & 0 deletions adapters/repos/db/lsmkv/segment_group_compaction.go
@@ -146,6 +146,12 @@ func (sg *SegmentGroup) compactOnce() (bool, error) {
leftSegment := sg.segmentAtPos(pair[0])
rightSegment := sg.segmentAtPos(pair[1])

if !sg.compactionFitsSizeLimit(leftSegment, rightSegment) {
// nothing to do this round, let's wait for the next round in the hopes
// that we'll find smaller (lower-level) segments that can still fit.
return false, nil
}

path := filepath.Join(sg.dir, "segment-"+segmentID(leftSegment.path)+"_"+segmentID(rightSegment.path)+".db.tmp")

f, err := os.Create(path)
@@ -470,3 +476,13 @@ func (s *segmentLevelStats) report(metrics *Metrics,
}).Set(float64(count))
}
}

func (sg *SegmentGroup) compactionFitsSizeLimit(left, right *segment) bool {
if sg.maxSegmentSize == 0 {
// no limit is set, always return true
return true
}

totalSize := left.size + right.size
return totalSize <= sg.maxSegmentSize
}
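The gate is deliberately coarse: it sums the two candidate segments and, if the pair exceeds the limit, `compactOnce` skips the whole round (candidate selection itself happens in `bestCompactionCandidatePair`, which the new tests below exercise). With the 10000-byte limit used in those tests, two 8000-byte segments (16000 total) are skipped, while two 4000-byte segments (8000 total) still compact. A standalone restatement of the same check:

```go
// Sketch: mirrors compactionFitsSizeLimit above as a free function.
func fitsSizeLimit(maxSegmentSize, leftSize, rightSize int64) bool {
	if maxSegmentSize == 0 {
		return true // no limit configured
	}
	return leftSize+rightSize <= maxSegmentSize
}

// fitsSizeLimit(10000, 8000, 8000) == false -> round skipped
// fitsSizeLimit(10000, 4000, 4000) == true  -> compaction proceeds
```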
101 changes: 101 additions & 0 deletions adapters/repos/db/lsmkv/segment_group_compaction_test.go
@@ -0,0 +1,101 @@
// _ _
// __ _____ __ ___ ___ __ _| |_ ___
// \ \ /\ / / _ \/ _` \ \ / / |/ _` | __/ _ \
// \ V V / __/ (_| |\ V /| | (_| | || __/
// \_/\_/ \___|\__,_| \_/ |_|\__,_|\__\___|
//
// Copyright © 2016 - 2024 Weaviate B.V. All rights reserved.
//
// CONTACT: [email protected]
//

package lsmkv

import (
"testing"

"github.com/stretchr/testify/assert"
)

func TestSegmentGroup_BestCompactionPair(t *testing.T) {
var maxSegmentSize int64 = 10000

tests := []struct {
name string
segments []*segment
expectedPair []string
}{
{
name: "single segment",
segments: []*segment{
{size: 1000, path: "segment0", level: 0},
},
expectedPair: nil,
},
{
name: "two segments, same level",
segments: []*segment{
{size: 1000, path: "segment0", level: 0},
{size: 1000, path: "segment1", level: 0},
},
expectedPair: []string{"segment0", "segment1"},
},
{
name: "multiple segments, multiple levels, lowest level is picked",
segments: []*segment{
{size: 4000, path: "segment0", level: 2},
{size: 4000, path: "segment1", level: 2},
{size: 2000, path: "segment2", level: 1},
{size: 2000, path: "segment3", level: 1},
{size: 1000, path: "segment4", level: 0},
{size: 1000, path: "segment5", level: 0},
},
expectedPair: []string{"segment4", "segment5"},
},
{
name: "two segments that don't fit the max size, but eligible segments of a lower level are present",
segments: []*segment{
{size: 8000, path: "segment0", level: 3},
{size: 8000, path: "segment1", level: 3},
{size: 4000, path: "segment2", level: 2},
{size: 4000, path: "segment3", level: 2},
},
expectedPair: []string{"segment2", "segment3"},
},
}

for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
sg := &SegmentGroup{
segments: test.segments,
maxSegmentSize: maxSegmentSize,
}
pair := sg.bestCompactionCandidatePair()
if test.expectedPair == nil {
assert.Nil(t, pair)
} else {
leftPath := test.segments[pair[0]].path
rightPath := test.segments[pair[1]].path
assert.Equal(t, test.expectedPair, []string{leftPath, rightPath})
}
})
}
}

func TestSegmenGroup_CompactionLargerThanMaxSize(t *testing.T) {
maxSegmentSize := int64(10000)
// this test only tests the unhappy path, which has an early exit condition,
// meaning we don't need real segments; only their metadata is evaluated
// here.
sg := &SegmentGroup{
segments: []*segment{
{size: 8000, path: "segment0", level: 3},
{size: 8000, path: "segment1", level: 3},
},
maxSegmentSize: maxSegmentSize,
}

ok, err := sg.compactOnce()
assert.False(t, ok, "segments are too large to run")
assert.Nil(t, err)
}