Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TTreeProcessorMP processes events multiple times when there are more threads than entries #15425

Open
1 task done
iguinn opened this issue May 6, 2024 · 4 comments
Open
1 task done
Assignees
Labels

Comments

@iguinn
Copy link

iguinn commented May 6, 2024

Check duplicate issues.

  • Checked for duplicates

Description

TTreeProcessorMP process events multiple times when you read multiple files using TChain, and some of the files have fewer entries than there are threads. This has been observed using TChain and TSelector, but I haven't tried it out with any other methods of using TTreeProcessorMP.

The attached reproducer minimally shows this bug by showing that the Process method is run more than it should be. Another observations that are not shown in the reproducer is that the additional events are repeats of existing events. In the code where I first observed this, we frequently have files where one thread is processing multiple events (5-10ish), and most of them are real events, with the last few being repeats. It also seems like it always tries to process at least 15 events for each file if you use 16 threads, even if those events don't exist; if there are more than 15 events than it seems to behave as expected.

Reproducer

mp_bug.zip
Run:

mkdir files
root FillRootFiles.C
root TestMP.C+

This will create 100 root files with 1 entry each called files/f_0###.root. It will then use TTreeProcessorMP with 16 threads to read through the files and count the number of events read. This should be 100; instead it is 1500.

ROOT version

v6.28/06 and 6.26/10

Installation method

Built from source

Operating system

Ubuntu 22.04

Additional context

No response

@iguinn iguinn added the bug label May 6, 2024
@dpiparo dpiparo self-assigned this May 9, 2024
@bellenot
Copy link
Member

bellenot commented Jun 5, 2024

Not sure if it's a bug or a feature... With TTreeProcessorMP, if the number of files to process is larger than the number of worker processes, the process is invoked once per file. Otherwise entries are divided equally between workers.
To see this, change your code to have 10 files and 1000 events per file:

  ULong_t n_files = 10;
  ULong_t ev_per_file = 1000;

you'll see:

TParameter<int> ct = 10000

But I might be wrong and @pcanal, @dpiparo, or @guitargeek can correct me...

@dpiparo
Copy link
Member

dpiparo commented Jun 13, 2024

But isn't the report about the opposite case, i.e. when there are more workers than events to process?

@bellenot
Copy link
Member

100 root files with 16 threads, so more files than workers. Or did I miss something?

@iguinn
Copy link
Author

iguinn commented Jun 13, 2024

It's about having more threads in a file than events (there's one event per file). As mentioned above, it seems like the process is getting invoked once per thread per file, but if there aren't enough events, the extra threads seem to reprocess an old event. This is what we had noticed with the much more complex code that led us to identify this, which is that certain events were getting processed and histogrammed multiple times in cases like this. We were able to work around this successfully by checking if the same event is being processed multiple times in a row.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants