romio: Fix wrong communicator use in ADIOI_GEN_OpenColl #6884

Open. Wants to merge 1 commit into base: main.

range3 commented Jan 25, 2024

Pull Request Description

When ADIOI_GEN_OpenColl is called with ADIO_CREATE set in the access mode,
the file is created by a single rank, the one for which rank == fd->hints->ranklist[0], which calls ADIOI_xxx_Open with MPI_COMM_SELF.
If creation succeeds, that rank closes the file with ADIOI_xxx_Close, and all ranks later open the file without the creation flag.

The problem is that ADIOI_xxx_Close is called with the communicator of ADIOI_GEN_OpenColl instead of MPI_COMM_SELF.
If ADIOI_xxx_Close makes a collective call on this communicator, it may hang, because only the creating rank enters the call.
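
To illustrate why the communicator matters, here is a toy model in plain C. It is not ROMIO code and does not use MPI. It only assumes that a collective (for example a barrier) returns once every member of the communicator has entered it. The names `toy_comm`, `collective_completes`, and `close_after_create_completes` are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an MPI communicator: only its size matters here. */
typedef struct {
    int size;   /* number of ranks that belong to the communicator */
} toy_comm;

/* A collective returns only after every member of the communicator has
 * entered it. If fewer ranks call it, the callers wait forever. */
static bool collective_completes(toy_comm comm, int callers)
{
    assert(comm.size >= 1);
    assert(callers >= 1 && callers <= comm.size);
    return callers == comm.size;
}

/* Models the close that follows file creation. Only the creating rank
 * (a single caller) reaches this point, so whether a collective inside
 * the close routine completes depends on the communicator it sees. */
static bool close_after_create_completes(toy_comm comm_seen_by_close)
{
    const int callers = 1;  /* only rank == ranklist[0] created the file */
    return collective_completes(comm_seen_by_close, callers);
}
```

With a communicator of size 1 (the MPI_COMM_SELF case) `close_after_create_completes` returns true. With a communicator of size 4 it returns false, which corresponds to the hang. A single-process job never sees the problem in this model, which is consistent with the "may hang" wording above.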

Fixes: #6868

Contribution Agreement

I have sent the MPICH Individual Contributor License Agreement by e-mail and am awaiting approval.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou (Contributor) commented Jan 25, 2024

test:mpich/authorship
test:mpich/warnings

hzhou (Contributor) commented Jan 25, 2024

test:mpich/authorship

range3 (Author) commented Feb 9, 2024

Your message to cla awaits moderator approval

My MPICH individual CLA is still stuck on the mailing list ([email protected]), waiting for moderator approval.
Could you please confirm this?

hzhou (Contributor) commented Feb 9, 2024

Your message to cla awaits moderator approval

My MPICH individual CLA is still stuck on the mailing list ([email protected]), waiting for moderator approval. Could you please confirm this?

We got your CLA. You are all set.

hzhou (Contributor) commented Feb 9, 2024

test:mpich/ch3/most
test:mpich/ch4/most

Successfully merging this pull request may close these issues.

romio: Inconsistent Use of Communicator in ADIOI_GEN_OpenColl for ADIOI_xxx_Close