
[Bug]: Query failed: Timestamp lag too large lag(129h36m35.55s) max(24h0m0s) #29619

Open · 1 task done
weiZhenkun opened this issue Jan 2, 2024 · 6 comments

weiZhenkun commented Jan 2, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): external kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Queries fail with the following error:
failed to search/query delegator 10 for channel demo-cluster3-rootcoord-dml_1_445021135592215568v0: fail to Search, QueryNode ID=10, reason=Timestamp lag too large lag(129h36m35.55s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable

Expected Behavior

Queries succeed without the timestamp-lag error.

Steps To Reproduce

Not provided; queries fail with the error above.

Milvus Log

Not attached here; logs were later shared offline (see the comments below).

Anything else?

None.

@weiZhenkun weiZhenkun added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2024
yanliang567 (Contributor)

It seems that the tt lag is huge in your Milvus cluster, @weiZhenkun. Could you please refer to this doc to export the whole Milvus logs for investigation?
For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

/assign @weiZhenkun
/unassign

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2024
yanliang567 (Contributor)

/assign @congqixia
I got the logs offline; please ping me when you need them, @congqixia.

/unassign @weiZhenkun

@sre-ci-robot sre-ci-robot assigned congqixia and unassigned weiZhenkun Jan 2, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Jan 2, 2024
@yanliang567 yanliang567 added this to the 2.3.4 milestone Jan 2, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.4, 2.3.5 Jan 12, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.5, 2.3.6 Jan 22, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.6, 2.3.7 Jan 30, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.7, 2.3.9, 2.3.10 Feb 18, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.10, 2.3.11 Feb 28, 2024
cuiyuan605
Has the problem been solved? I had the same problem.

xiaofan-luan (Contributor)
> Has the problem been solved? I had the same problem.

There might be many different situations that could trigger this issue.
If the latest 2.3.11 does not fix it, please provide detailed logs.

@yanliang567 yanliang567 modified the milestones: 2.3.11, 2.3.12 Mar 11, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.12, 2.3.13 Mar 22, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.13, 2.3.14 Apr 15, 2024
@yanliang567 yanliang567 modified the milestones: 2.3.14, 2.3.15 Apr 23, 2024
lucifax301

I also hit this problem (version 2.3.4 or 2.4.1); detailed logs below:
[2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"] [2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [role=proxy] [db=default] [collection=zzp] [partitions="[KB3ebe25f7f0eb4401b74a6f5fba54759e,KBb39d6898ef1042f1a7d69b69ae342d5c,KBba5023df84fb4033849b62b6d331d794]"] [dsl=] [len(PlaceholderGroup)=3084] [OutputFields="[chunk_id,file_id,file_name,file_path,timestamp,content]"] 
[search_params="[{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"params\",\"value\":\"{\\\"nprobe\\\":256}\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"ignore_growing\",\"value\":\"False\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"anns_field\",\"value\":\"embedding\"}]"] [guarantee_timestamp=1] [nq=1] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]

@yanliang567 yanliang567 modified the milestones: 2.3.15, 2.3.16 May 16, 2024
xiaofan-luan (Contributor)

> I also hit this problem (version 2.3.4 or 2.4.1); detailed logs below: […]

We will need more logs from the QueryNode and DataNode to understand why this happens.
This is a protection mechanism indicating that the QueryNodes cannot keep up with the incoming stream of inserts and deletes.
It is usually caused by too many writes or by blocked message consumption.
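
For readers trying to make sense of the error, here is a minimal conceptual sketch in Go (not the actual Milvus source; tsoPhysical, checkTimestampLag, lastTimeTick and maxLag are illustrative names) of the kind of guard that produces "Timestamp lag too large": the shard delegator tracks the newest time tick it has consumed from the DML channel, and once that time tick falls further behind the wall clock than a configured maximum (24h in the errors above), it rejects search/query requests instead of serving them against badly stale data.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tsoPhysical extracts the physical (wall-clock) part of a hybrid
// TSO-style timestamp: the upper bits carry milliseconds, the lowest
// 18 bits carry a logical counter.
func tsoPhysical(ts uint64) time.Time {
	ms := int64(ts >> 18)
	return time.UnixMilli(ms)
}

// checkTimestampLag mimics the guard behind "Timestamp lag too large":
// if the delegator's last consumed time tick is older than maxLag
// (24h in the errors above), the request is rejected up front instead
// of waiting for the channel to catch up.
func checkTimestampLag(lastTimeTick uint64, maxLag time.Duration) error {
	lag := time.Since(tsoPhysical(lastTimeTick))
	if lag > maxLag {
		return fmt.Errorf("Timestamp lag too large lag(%v) max(%v): %w",
			lag.Truncate(10*time.Millisecond), maxLag, errors.New("service unavailable"))
	}
	return nil
}

func main() {
	// A time tick ~130h in the past reproduces the shape of the reported error.
	stale := uint64(time.Now().Add(-130*time.Hour).UnixMilli()) << 18
	fmt.Println(checkTimestampLag(stale, 24*time.Hour))
}
```

That is why the comments above focus on write pressure and blocked consumption rather than on the 24h limit itself: the error is a symptom of the delegator falling behind the DML channel.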
