Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions - BlobTrigger container scan should not skip past changes made during the scan #3029

Open
AartBluestoke opened this issue Oct 20, 2023 · 0 comments

Comments

@AartBluestoke
Copy link

AartBluestoke commented Oct 20, 2023

When a Blob trigger scans a container, and the container contains more than 1 page of blobs (10,000) it may skip changes which occurred on the first page while it s reading the second or later pages, if they also have changes.

Repro steps

  1. Set up a function with a blob trigger

  2. Write 2 blobs per second from with a name of a number starting at 50000, one counting up, the other counting down (eg, so after 1000 seconds you are writing blobs named 49000 and 51000)

  3. Wait until you have over 20,000 blobs

  4. Observe the standard Azure Functions Log Stream (the full logs including the container scan logs)

  5. Observe that the count of blob triggers does not match the count of blobs written during the time between scans

Expected behavior

The count of changed blobs found matches the number of seoconds passed

Actual behavior

Observe that 'Line 29524' in the logs below starts polling for blobs newer than '3:33:44', however the first 2 pages of blobs read (lines 25626 at 3:33:23 and 29273 at 3:33:35) where checked before that time - any changes to these 20,000 blobs between about 3:33:23 and 3:33:44 would never trigger - they had not yet changed when checked and by the next check (29524 and 33614) the "high water mark" has already advanced past them. The same is true for many of the High water mark updates in the following log stream.

Please find a log snippet found from production that illistrates the problem:
Line 64: 2023-09-20T03:30:49Z [Verbose] Poll for blobs newer than '2023-09-20T03:29:43.000Z' in container found 0 blobs in 11 ms. ContinuationToken: False.

Line  4280: 2023-09-20T03:31:13Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:30:17.000Z' in container found 0 blobs in 561 ms. ContinuationToken: True.
Line  8356: 2023-09-20T03:31:38Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:30:17.000Z' in container found 0 blobs in 28 ms. ContinuationToken: False.

Line  8483: 2023-09-20T03:31:48Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:31:22.000Z' in container found 23 blobs in 661 ms. ContinuationToken: True.
Line 12204: 2023-09-20T03:32:01Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:31:22.000Z' in container found 0 blobs in 751 ms. ContinuationToken: True.
Line 12296: 2023-09-20T03:32:12Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:31:22.000Z' in container found 44 blobs in 1319 ms. ContinuationToken: True.
Line 17632: 2023-09-20T03:32:25Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:31:22.000Z' in container found 0 blobs in 54 ms. ContinuationToken: False.

Line 17769: 2023-09-20T03:32:36Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:10.000Z' in container found 22 blobs in 599 ms. ContinuationToken: True.
Line 21200: 2023-09-20T03:32:48Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:10.000Z' in container found 0 blobs in 657 ms. ContinuationToken: True.
Line 25510: 2023-09-20T03:33:12Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:10.000Z' in container found 0 blobs in 52 ms. ContinuationToken: False.

Line 25626: 2023-09-20T03:33:23Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:57.000Z' in container found 23 blobs in 577 ms. ContinuationToken: True.
Line 29273: 2023-09-20T03:33:35Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:57.000Z' in container found 0 blobs in 622 ms. ContinuationToken: True.
Line 29416: 2023-09-20T03:33:58Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:32:57.000Z' in container found 0 blobs in 62 ms. ContinuationToken: False.

Line 29524: 2023-09-20T03:34:09Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:33:44.000Z' in container found 22 blobs in 705 ms. ContinuationToken: True.
Line 33614: 2023-09-20T03:34:21Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:33:44.000Z' in container found 0 blobs in 637 ms. ContinuationToken: True.
Line 39379: 2023-09-20T03:34:45Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:33:44.000Z' in container found 0 blobs in 65 ms. ContinuationToken: False.

Line 39495: 2023-09-20T03:34:55Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:34:30.000Z' in container found 23 blobs in 649 ms. ContinuationToken: True.
Line 43502: 2023-09-20T03:35:08Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:34:30.000Z' in container found 0 blobs in 644 ms. ContinuationToken: True.
Line 43594: 2023-09-20T03:35:18Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:34:30.000Z' in container found 43 blobs in 647 ms. ContinuationToken: True.
Line 50061: 2023-09-20T03:35:31Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:34:30.000Z' in container found 0 blobs in 82 ms. ContinuationToken: False.

Line 50188: 2023-09-20T03:35:42Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:35:17.000Z' in container found 19 blobs in 612 ms. ContinuationToken: True.
Line 53534: 2023-09-20T03:35:54Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:35:17.000Z' in container found 0 blobs in 654 ms. ContinuationToken: True.
Line 59261: 2023-09-20T03:36:18Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:35:17.000Z' in container found 0 blobs in 89 ms. ContinuationToken: False.

Line 59398: 2023-09-20T03:36:28Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:03.000Z' in container found 22 blobs in 606 ms. ContinuationToken: True.
Line 62747: 2023-09-20T03:36:40Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:03.000Z' in container found 0 blobs in 628 ms. ContinuationToken: True.
Line 66891: 2023-09-20T03:37:04Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:03.000Z' in container found 0 blobs in 104 ms. ContinuationToken: False.

Line 66986: 2023-09-20T03:37:14Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:49.000Z' in container found 21 blobs in 583 ms. ContinuationToken: True.
Line 70334: 2023-09-20T03:37:26Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:49.000Z' in container found 0 blobs in 680 ms. ContinuationToken: True.
Line 70447: 2023-09-20T03:37:37Z   [Verbose]   Poll for blobs newer than '2023-09-20T03:36:49.000Z' in container found 40 blobs in 596 ms. ContinuationToken: True.

This log sequence should be finding 2 files per second, one in the first batch of the poll, one in the last batch of the poll. It isn't. The files not triggered on will be permanently missed..

Known workarounds

Due to bug #3023 a partial workaround of only relying on the log scan to keep up is not available if your data is in a different container to the azure webjobs scratch pad.

Related information

PR #3014 resolves this issue

Provide any related information

PR raised in response to bug discussed in azure support ticket TrackingID#2308210030000318

  • Package version

Latest - but Git Blame shows it has been this way for years (perhaps 5?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant