Possible initialization bug for LORSolver<HypreBoomerAMG>
#4286
Do you get the same initcheck errors if you try an example with AMG, but without LOR preconditioning? For example:
(Yes, with the LOR solvers, set the preconditioner after setting the operator. Maybe the LOR solvers should just give an error if you try to call
Hi @pazner, and that makes sense about setting up the preconditioner after setting the operator. I just wanted to make sure that was the accepted usage.
I also get some errors (not the same as the ones you are getting), but I don't know whether they are false positives. Either way, this suggests that the problem is not caused by the LOR preconditioner but by something in the hypre AMG setup. The errors I see are all like this one:
Any ideas, @v-dobrev, @liruipeng?
@pazner I think we should close this for now until I can further isolate the problem. I changed some things in my stack and the issue went away, and I didn't save my old Spack concretization, so it could have been a false positive. I can reopen the issue if I can isolate two configurations where one causes the sanitizer to trip and the other doesn't.
Sounds good, please re-open if needed.
On GPUs, I've experienced non-deterministic behavior in the lor_elast miniapp I submitted. It's possibly due to some objects not being initialized, since CUDA's initcheck sanitizer throws errors. To investigate, I made an MRE, mre_lor.gz, that's a modification of ex1p.cpp and solves Poisson. The system is solved with CGSolver matrix-free (partial assembly) and preconditioned with a LORSolver<HypreBoomerAMG> approximation of the same bilinear form. This is a simplification of the lor_elast miniapp, which additionally builds a block-diagonal preconditioner of LORSolver<HypreBoomerAMG>s for a vector-valued PDE. Two sample runs are:
compute-sanitizer --tool initcheck ./mre_lor -d cuda
compute-sanitizer --tool initcheck ./mre_lor -b
The first run is similar to the miniapp in that it sets the preconditioner in CGSolver AFTER setting the operator. This runs, but the sanitizer reports 5 errors, and the number of errors scales with the problem size. When MFEM is built for CPU only, I don't see any sanitizer issues with this approach.

The second run sets the preconditioner BEFORE setting the operator, which is recommended, but this results in a runtime error. Here's the code snippet for details about why.

prec is a LORSolver<HypreBoomerAMG>. Am I setting up the solver wrong?
I'm not sure how accurate CUDA's sanitizers are, but I tried this on a few different combinations of compiler, MPI implementation, and CUDA version.
Thanks!