
Shorten, simplify and generalise code for Nédélec finite elements #3840

Merged
merged 3 commits into libMesh:devel on Jun 11, 2024

Conversation

nmnobre
Member

@nmnobre nmnobre commented Apr 23, 2024

Hi Roy (@roystgnr), Alex (@lindsayad),

The overarching idea here is to generalise (some of) the code for arbitrary order finite elements.
I'm aware this would be a mammoth task, but I thought we could start with some small steps to better grasp the range of changes that'd be needed.

Since we are working on adding support for 2nd order Nédélec elements of the first kind, I thought instead of just hard-coding the various properties (e.g. the no. of dofs) we could try to generalise them. In the same spirit, for the shape functions, since they are orientation-dependent, I tried to avoid writing them twice as we had done thus far because even though that's not a lot for 1st order, trust me, it is for 2nd order. Here, it'd also be interesting to experiment with obtaining the shape functions from the dof definitions, although this would (a bit more fundamentally) change the way we think about assembly (we'd need to store the shape functions a-priori for a given element type and order perhaps the first time we see such an element).

Lastly, I'd like to understand what n_dofs_per_elem is supposed to return, and in particular its relationship with the middle node on TRI7, QUAD9 or HEX27 elements. For 2nd order Nédélec, we do have some dofs that are associated with the element, so where should they go? The middle node, or the element?

Cheers,
-Nuno

@nmnobre nmnobre marked this pull request as draft April 23, 2024 17:25
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 6a268a4 to ebbefa6 on April 24, 2024 07:24
@moosebuild

moosebuild commented Apr 24, 2024

Job Coverage on b377c8b wanted to post the following:

Coverage

          196a0a     #3840 (b377c8)
          Total      Total            +/-       New
Rate      62.69%     62.70%           +0.01%    80.17%
Hits      69280      69242            -38       97
Misses    41226      41194            -32       24

Diff coverage report

Full coverage report

Warnings

  • New line coverage rate 80.17% is less than the suggested 90.0%

This comment will be updated on new commits.

@roystgnr
Member

I thought instead of just hard-coding the various properties (e.g. the no. of dofs) we could try to generalise them.

This is usually a good idea and a pretty easy one; see e.g. fe_hierarchic.C for a decent example of it.

Heavily refactoring shape function definitions is also usually a good idea and usually not a pretty easy one; see e.g. fe_hierarchic_shape_3D.C for an it-was-the-best-I-could-do example of it.

we'd need to store the shape functions a-priori for a given element type and order perhaps the first time we see such an element

With Lagrange shape functions we cache results in between elements, but (except for elements that are just shifts of each other, where we cache everything) it doesn't save quite as much as you'd think; you don't have to recompute in master space each time, but the physical-space gradients still have some cost.

n_dofs_per_elem is intended to return the number of DoFs which are stored on the Elem. If you have an element with a middle node and you've got a nodal basis like LAGRANGE or BERNSTEIN then the node is where any "bubble" shape functions belong and that's where we store them; if not then we usually just put them on the Elem; there's no big difference one way or another.

@nmnobre nmnobre force-pushed the ndHO branch 6 times, most recently from 6ac3a4d to d36c8ba on April 29, 2024 12:20
@nmnobre
Member Author

nmnobre commented Apr 29, 2024

This is usually a good idea and a pretty easy one; see e.g. fe_hierarchic.C for a decent example of it.

Perfect, my code was already very similar to that, but I've refactored to follow the same style.

Heavily refactoring shape function definitions is also usually a good idea and usually not a pretty easy one; see e.g. fe_hierarchic_shape_3D.C for an it-was-the-best-I-could-do example of it.

Okay, let's leave that for later then. Let's focus on fe_nedelec.one.C for now, and in particular on nedelec_one_nodal_soln(). I got a bit confused with the logic surrounding order and total_order, and fe_type (previously used for n_sf) and p_refined_fe_type (used for vis_fe), could you please confirm the changes I made are sound?

if not then we usually just put them on the Elem; there's no big difference one way or another.

Good, stored on the element then.

@nmnobre nmnobre marked this pull request as ready for review April 29, 2024 12:33
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 0926a3f to 55155d4 on April 29, 2024 14:15
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 9d3baea to e5e9fa8 on June 3, 2024 22:55
@roystgnr
Member

Sorry for the late review; this all looks like obvious improvements to me (assuming the higher-order cases are correct; I can't wait until we have that working!).

I'm going to be out shortly, but I should find time to merge this late today regardless unless anyone objects first.

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

Hi Roy,

Thanks a lot for finding the time to review this.
Spoiler: we have 2nd order implementations on both QUADs and TRIs working already.

There are two things I'd like to ask before you merge:

  1. If you are happy with 0f0ca51, ideally n_dofs would move to fe.C and we could remove the specialisation for most FE families. Off the top of your head, are there any FE families (or any other reason) this couldn't happen?
  2. In the same vein, could you have a close look at nedelec_one_nodal_soln()? Except for the checks which might be particular to specific FE families, the method we're using to evaluate the solution at the nodes is pretty general... I understand for nodal variables this might not be optimal (performance-wise) but it'd certainly still work (i.e. it'd still compute the correct result). Also, what's currently happening at AMR interfaces? Do we need special provisions there?

@roystgnr
Member

If you are happy with 0f0ca51

I'm fine with it, but I actually wouldn't have gone in that direction myself.

are there any FE families (or any other reason) this couldn't happen?

There are no families for which this couldn't happen, but there are two reasons why IMHO it shouldn't:

  1. We need to benchmark it first, in a worst-case (first-order LAGRANGE on HEX27?) scenario. We've done a lot of micro-optimization over the years, at least for the LAGRANGE case, and I'd hate to only find out a year down the road that switching cost us performance.

  2. We get just a little bit more error checking in METHOD=dbg, in DofMap::dof_indices IIRC, where we extend a vector of indices n_dofs_per_elem and n_dofs_per_node at a time, but then at the end assert that we got n_dofs out of it. This has caught bugs in new FE types and new specializations of FE types in the past, but if we replaced n_dofs with something that just sums the n_dofs_per_elem and n_dofs_per_node counts, then we're replacing that verification with a tautology. I didn't say anything about that commit because I'm willing to grant that you don't need the extra bug-checking, but I'm afraid I need it.

In the same vein, could you have a close look at nedelec_one_nodal_soln()? Except for the checks which might be particular to specific FE families, the method we're using to evaluate the solution at the nodes is pretty general... I understand for nodal variables this might not be optimal (performance-wise) but it'd certainly still work

I didn't see any problems, but I fear my disdain for nodal_soln() might have me subconsciously not looking as hard as I should have been; IIRC I wrote RATIONAL_BERNSTEIN nodal_soln() with a bug not too long ago. I wouldn't worry at all about performance there, though: it gets called once per solution output (e.g. once per timestep) when we're decimating for visualization, and because of that whole "decimating" thing it should never be getting called from anything like a residual assembly that might happen more frequently.

Also, what's currently happening at AMR interfaces?

When we see a non-conforming interface, we call FE<dim,NEDELEC_ONE>::compute_constraints() to calculate the constraint equations needed to make the solution conforming on that interface, and that ends up throwing libmesh_not_implemented()

Do we need special provisions there?

I'm afraid so. We have a generic compute_proj_constraints that looks at FE continuity type, and then either returns (DISCONTINUOUS) or calculates the constraints via local L2 (C_ZERO, SIDE_DISCONTINUOUS) or H1 (C_ONE) projections, but nobody's added support for H_DIV or H_CURL there yet.

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

There are no families for which this couldn't happen, but there are two reasons why IMHO it shouldn't:

  1. We need to benchmark it first, in a worst-case (first-order LAGRANGE on HEX27?) scenario. We've done a lot of micro-optimization over the years, at least for the LAGRANGE case, and I'd hate to only find out a year down the road that switching cost us performance.
  2. We get just a little bit more error checking in METHOD=dbg, in DofMap::dof_indices IIRC, where we extend a vector of indices n_dofs_per_elem and n_dofs_per_node at a time, but then at the end assert that we got n_dofs out of it. This has caught bugs in new FE types and new specializations of FE types in the past, but if we replaced n_dofs with something that just sums the n_dofs_per_elem and n_dofs_per_node counts, then we're replacing that verification with a tautology. I didn't say anything about that commit because I'm willing to grant that you don't need the extra bug-checking, but I'm afraid I need it.

These are exactly the two points I was worried about: 1) performance, and 2) losing a potential check just like the one you described (even though I couldn't find it). Before adding that commit, I actually checked myself that adding the element and node dofs produced the expression I was writing down under n_dofs. I'll recover that by dropping the commit. In any case, my intent was always to replace n_dofs either for all FE families or for none, not just for Nédélec. Consistency should be queen. :)

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

I wouldn't worry at all about performance there, though, it gets called once per solution output (e.g. once per timestep) when we're decimating for visualization

Perfect, so are we saying we can move this to fe.C and call that same method for all families (except for any checks specific to particular families, like element type)? Hopefully, by applying this to all other families, CI would also flag any obvious mistakes we haven't spotted?

and that ends up throwing libmesh_not_implemented()

Hm, that's good. That means we can proceed without AMR support until we need it? 😇

@roystgnr
Member

so are we saying we can move this to fe.C and call that same method for all families (except for any checks specific to particular families, like element type)? Hopefully, by applying this to all other families, CI would also flag any obvious mistakes we haven't spotted?

We definitely still want to allow individual families to override the default, and we'll want to keep the existing per-family code for cases where there's a property that makes for simple+cheap reimplementation (e.g. LAGRANGE or SCALAR), or where there's a serious performance issue (e.g. RATIONAL_*), or where there's an issue in even defining a nodal solution sensibly (SIDE_HIERARCHIC) ... but yeah, I think at least 80% of our FE families probably ought to be calling the same code here. Let's save that for a separate PR, though.

That means we can proceed without AMR support until we need it?

Sadly, but yes. 😅

@roystgnr roystgnr merged commit 22800f6 into libMesh:devel Jun 11, 2024
20 checks passed
@nmnobre
Member Author

nmnobre commented Jun 12, 2024

For completeness and future reference.

I compiled libMesh with MOOSE's default configuration plus --enable-perflog.
Then I ran the 2D Nédélec one example, i.e. vector_fe/ex3, with grid_size = 100, five times with
for i in {1..5}; do ./example-opt element_type=TRI7 -pc_type jacobi | grep "| FE " -A2; done.

BEFORE:

| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0368      0.000000    0.0368      0.000000    0.50     0.50     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0382      0.000000    0.0382      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0379      0.000000    0.0379      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0384      0.000000    0.0384      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0377      0.000000    0.0377      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0382      0.000000    0.0382      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0382      0.000000    0.0382      0.000000    0.52     0.52     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |

AFTER:

| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0423      0.000000    0.0423      0.000000    0.57     0.57     |
|   init_shape_functions()           121600     0.0396      0.000000    0.0396      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0424      0.000000    0.0424      0.000000    0.57     0.57     |
|   init_shape_functions()           121600     0.0387      0.000000    0.0387      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0429      0.000000    0.0429      0.000000    0.58     0.58     |
|   init_shape_functions()           121600     0.0394      0.000000    0.0394      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0436      0.000000    0.0436      0.000000    0.59     0.59     |
|   init_shape_functions()           121600     0.0396      0.000000    0.0396      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0428      0.000000    0.0428      0.000000    0.58     0.58     |
|   init_shape_functions()           121600     0.0401      0.000000    0.0401      0.000000    0.54     0.54     |

Same, but with grid_size = 150...

BEFORE:

| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0851      0.000000    0.0851      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0845      0.000000    0.0845      0.000000    0.40     0.40     |
|   init_shape_functions()           272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0854      0.000000    0.0854      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0862      0.000000    0.0862      0.000000    0.41     0.41     |

AFTER:

| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0964      0.000000    0.0964      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0880      0.000000    0.0880      0.000000    0.42     0.42     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0959      0.000000    0.0959      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0862      0.000000    0.0862      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0949      0.000000    0.0949      0.000000    0.45     0.45     |
|   init_shape_functions()           272400     0.0867      0.000000    0.0867      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0966      0.000000    0.0966      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0893      0.000000    0.0893      0.000000    0.43     0.43     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0965      0.000000    0.0965      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0880      0.000000    0.0880      0.000000    0.42     0.42     |

@roystgnr
Member

That's not a huge difference but it is much more than I'd expected. Init might be 1% slower or that might be my imagination, but compute looks 15% slower? IMHO that's worth it for more general and simpler code, but it's at least debatable.

@nmnobre
Member Author

nmnobre commented Jun 12, 2024

Yeah... (sigh)... a bit disappointing. Comparing the fastest times, init is roughly 1% slower, and compute is about 15% slower for the smaller grid size and 12% slower for the larger grid size I just added. I'm inclined to agree with you that it's worth it since, at least for this potato problem, assembly is <0.5% of the total time.
