
glusterd may try to start bricks twice #4080

Open
xhernandez opened this issue Mar 27, 2023 · 0 comments · May be fixed by #4088
Comments

@xhernandez
Contributor

There is a race that can cause glusterd to start the same brick twice at the same time. One of the brick processes will detect that another brick process is already running and will stop, which is correct. However, depending on the order in which this happens, glusterd may think that the brick has not started when it's actually running.
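To make the window concrete, here is a minimal, self-contained sketch of the racy check-then-start pattern described above. This is not glusterd code: `brick_t`, `brick_is_running()` and `brick_spawn()` are hypothetical stand-ins for the real management logic. Both threads can observe "not running" before either one has spawned the process, so both spawn it.

```c
/* Minimal standalone sketch of the racy check-then-start pattern.
 * Hypothetical names; not glusterd code. Build with: gcc -pthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    bool running;          /* the manager's view of the brick state */
} brick_t;

static brick_t brick = { .running = false };

static bool brick_is_running(brick_t *b) {
    return b->running;     /* in glusterd this also probes the process */
}

static void brick_spawn(brick_t *b, int thread_id) {
    printf("thread %d: spawning brick process\n", thread_id);
    usleep(1000);          /* simulate slow initialization */
    b->running = true;
}

static void *start_brick(void *arg) {
    int id = *(int *)arg;
    /* RACE: the check and the spawn are not atomic, so two threads
     * can both see "not running" and both spawn the brick. */
    if (!brick_is_running(&brick))
        brick_spawn(&brick, id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    pthread_create(&t1, NULL, start_brick, &id1);
    pthread_create(&t2, NULL, start_brick, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```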

xhernandez added a commit to xhernandez/glusterfs that referenced this issue Mar 29, 2023
There was a race in glusterd code that could cause two threads to
start the same brick at the same time. One of the brick processes
will fail because it will detect the other one already running.
Depending on which one fails, glusterd will report a start failure
and mark the brick as stopped even though it's actually running.

The problem is caused by an attempt to connect to a brick that's being
started by another thread. If the brick is not fully initialized, it
will refuse all connection attempts. When this happens, glusterd receives
a disconnection notification, which forcibly marks the brick as stopped.

Now, if another attempt to start the same brick happens, glusterd
will believe that the brick is stopped and will start it again. If
this happens very soon after the first start attempt, the checks
done to see if the brick is already running will still fail,
triggering the start of the brick process once more. One of the
brick processes will fail to initialize and will report an error.
If the failed one is processed by glusterd second, the brick will
be marked as stopped, even though the process is actually there
and working.

Fixes: gluster#4080
Signed-off-by: Xavi Hernandez <[email protected]>
xhernandez linked a pull request Mar 29, 2023 that will close this issue
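One conventional way to close this kind of check-then-start window is to track an explicit "starting" state under a per-brick lock, so a second caller neither re-spawns the brick nor misreads a transient failure as "stopped". The sketch below only illustrates that idea with hypothetical names; it is not necessarily the approach taken in the linked pull request.

```c
/* Sketch of serializing brick starts with an explicit STARTING state.
 * Hypothetical names; not glusterd code. */
#include <pthread.h>
#include <stdbool.h>

typedef enum { BRICK_STOPPED, BRICK_STARTING, BRICK_RUNNING } brick_state_t;

typedef struct {
    pthread_mutex_t lock;
    brick_state_t state;
} brick_t;

/* Returns true if this caller won the right to spawn the process. */
static bool brick_begin_start(brick_t *b) {
    bool do_start = false;
    pthread_mutex_lock(&b->lock);
    if (b->state == BRICK_STOPPED) {
        b->state = BRICK_STARTING;   /* claim the start atomically */
        do_start = true;
    }
    pthread_mutex_unlock(&b->lock);
    return do_start;                 /* false: another thread owns it */
}

static void brick_finish_start(brick_t *b, bool ok) {
    pthread_mutex_lock(&b->lock);
    b->state = ok ? BRICK_RUNNING : BRICK_STOPPED;
    pthread_mutex_unlock(&b->lock);
}
```

With this shape, a disconnect event seen while the state is BRICK_STARTING can also be ignored instead of forcing the state back to "stopped".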
xhernandez added a commit to xhernandez/glusterfs that referenced this issue May 10, 2023
When a brick is started asynchronously, it's likely that an immediate
connection attempt will fail. In this case, just avoid the connection.
It will be created the next time the brick is needed.

Fixes: gluster#4080
Signed-off-by: Xavi Hernandez <[email protected]>
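A rough sketch of the idea this commit message describes, with hypothetical names (`brick_t`, `brick_connect()` and `brick_rpc_connect()` are not the real glusterd identifiers): after an asynchronous start, the immediate connection attempt is skipped, so a refused connection can't generate the disconnect event that marks the brick as stopped.

```c
/* Sketch of deferring the first connection after an async start.
 * Hypothetical names; the real change lives in glusterd's brick
 * management code. */
#include <stdbool.h>

typedef struct {
    bool started_async;   /* set when the brick was just spawned */
    /* ... rpc handle, state, etc. ... */
} brick_t;

static int brick_rpc_connect(brick_t *b) {
    (void)b;              /* stub standing in for the real RPC connect */
    return 0;
}

static int brick_connect(brick_t *b) {
    if (b->started_async) {
        /* The process may not be accepting connections yet; don't
         * try now. The connection will be established lazily the
         * next time the brick is actually needed. */
        b->started_async = false;
        return 0;
    }
    return brick_rpc_connect(b);
}
```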