
[Bug]: Query failed: Timestamp lag too large lag(129h36m35.55s) max(24h0m0s) #29619

Open · 1 task done
weiZhenkun opened this issue Jan 2, 2024 · 6 comments

weiZhenkun commented Jan 2, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): external kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Queries fail with the following error:
failed to search/query delegator 10 for channel demo-cluster3-rootcoord-dml_1_445021135592215568v0: fail to Search, QueryNode ID=10, reason=Timestamp lag too large lag(129h36m35.55s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable

Expected Behavior

Queries succeed without the timestamp-lag error.

Steps To Reproduce

Not provided; queries fail with the error above.

Milvus Log

Not attached here; logs were later shared offline (see the comments below).

Anything else?

None.

@weiZhenkun weiZhenkun added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2024
yanliang567 (Contributor)

It seems that the tt lag is huge in your Milvus cluster, @weiZhenkun. Could you please refer to this doc to export the whole Milvus logs for investigation?
For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

/assign @weiZhenkun
/unassign

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2024
yanliang567 (Contributor)

/assign @congqixia
I got the logs offline; please ping me when you need them, @congqixia.

/unassign @weiZhenkun

@sre-ci-robot sre-ci-robot assigned congqixia and unassigned weiZhenkun Jan 2, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Jan 2, 2024
@yanliang567 yanliang567 added this to the 2.3.4 milestone Jan 2, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.4, 2.3.5 Jan 12, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.5, 2.3.6 Jan 22, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.6, 2.3.7 Jan 30, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.7, 2.3.9, 2.3.10 Feb 18, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.10, 2.3.11 Feb 28, 2024
cuiyuan605
Has the problem been solved? I had the same problem.

xiaofan-luan (Contributor)
> Has the problem been solved? I had the same problem.

There might be many different situations that could trigger this issue.
If the latest 2.3.11 does not fix it, please provide detailed logs.

@yanliang567 yanliang567 modified the milestones: 2.3.11, 2.3.12 Mar 11, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.12, 2.3.13 Mar 22, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.13, 2.3.14 Apr 15, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.14, 2.3.15 Apr 23, 2024
lucifax301

I also hit this problem (version 2.3.4 or 2.4.1); detailed logs below:
[2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"] [2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [role=proxy] [db=default] [collection=zzp] [partitions="[KB3ebe25f7f0eb4401b74a6f5fba54759e,KBb39d6898ef1042f1a7d69b69ae342d5c,KBba5023df84fb4033849b62b6d331d794]"] [dsl=] [len(PlaceholderGroup)=3084] [OutputFields="[chunk_id,file_id,file_name,file_path,timestamp,content]"] 
[search_params="[{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"params\",\"value\":\"{\\\"nprobe\\\":256}\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"ignore_growing\",\"value\":\"False\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"anns_field\",\"value\":\"embedding\"}]"] [guarantee_timestamp=1] [nq=1] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]

@yanliang567 yanliang567 modified the milestones: 2.3.15, 2.3.16 May 16, 2024
xiaofan-luan (Contributor)

> I also hit this problem (version 2.3.4 or 2.4.1); detailed logs below: […]

We will need more logs from the QueryNode and DataNode to understand why this happens.
This is a protection mechanism indicating that the QueryNodes cannot keep up with the incoming stream of inserts and deletes.
It is usually caused by too many writes or by blocked message consumption.
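
For readers trying to make sense of the error, here is a minimal conceptual sketch in Go (not the actual Milvus source; tsoPhysical, checkTimestampLag, lastTimeTick and maxLag are illustrative names) of the kind of guard that produces "Timestamp lag too large": the shard delegator tracks the newest time tick it has consumed from the DML channel, and once that time tick falls further behind the wall clock than a configured maximum (24h in the errors above), it rejects search/query requests instead of serving them against badly stale data.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tsoPhysical extracts the physical (wall-clock) part of a hybrid
// TSO-style timestamp: the upper bits carry milliseconds, the lowest
// 18 bits carry a logical counter.
func tsoPhysical(ts uint64) time.Time {
	ms := int64(ts >> 18)
	return time.UnixMilli(ms)
}

// checkTimestampLag mimics the guard behind "Timestamp lag too large":
// if the delegator's last consumed time tick is older than maxLag
// (24h in the errors above), the request is rejected up front instead
// of waiting for the channel to catch up.
func checkTimestampLag(lastTimeTick uint64, maxLag time.Duration) error {
	lag := time.Since(tsoPhysical(lastTimeTick))
	if lag > maxLag {
		return fmt.Errorf("Timestamp lag too large lag(%v) max(%v): %w",
			lag.Truncate(10*time.Millisecond), maxLag, errors.New("service unavailable"))
	}
	return nil
}

func main() {
	// A time tick ~130h in the past reproduces the shape of the reported error.
	stale := uint64(time.Now().Add(-130*time.Hour).UnixMilli()) << 18
	fmt.Println(checkTimestampLag(stale, 24*time.Hour))
}
```

That is why the comments above focus on write pressure and blocked consumption rather than on the 24h limit itself: the error is a symptom of the delegator falling behind the DML channel.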
