
Shorten, simplify and generalise code for Nédélec finite elements #3840

Merged
merged 3 commits into libMesh:devel on Jun 11, 2024

Conversation

nmnobre
Member

@nmnobre nmnobre commented Apr 23, 2024

Hi Roy (@roystgnr), Alex (@lindsayad),

The overarching idea here is to generalise (some of) the code for arbitrary order finite elements.
I'm aware this would be a mammoth task, but I thought we could start with some small steps to better grasp the range of changes that'd be needed.

Since we are working on adding support for 2nd order Nédélec elements of the first kind, I thought instead of just hard-coding the various properties (e.g. the no. of dofs) we could try to generalise them. In the same spirit, for the shape functions, since they are orientation-dependent, I tried to avoid writing them twice as we had done thus far because even though that's not a lot for 1st order, trust me, it is for 2nd order. Here, it'd also be interesting to experiment with obtaining the shape functions from the dof definitions, although this would (a bit more fundamentally) change the way we think about assembly (we'd need to store the shape functions a-priori for a given element type and order perhaps the first time we see such an element).

Lastly, I'd like to understand what n_dofs_per_elem is supposed to return, and in particular its relationship with the middle node on TRI7, QUAD9 or HEX27 elements. For 2nd order Nédélec, we do have some dofs that are associated with the element, so where should they go? The middle node, or the element?

Cheers,
-Nuno

@nmnobre nmnobre marked this pull request as draft April 23, 2024 17:25
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 6a268a4 to ebbefa6 on April 24, 2024 07:24
@moosebuild

moosebuild commented Apr 24, 2024

Job Coverage on b377c8b wanted to post the following:

Coverage

          196a0a     #3840 (b377c8)
          Total      Total            +/-       New
Rate      62.69%     62.70%           +0.01%    80.17%
Hits      69280      69242            -38       97
Misses    41226      41194            -32       24

Diff coverage report

Full coverage report

Warnings

  • New line coverage rate 80.17% is less than the suggested 90.0%

This comment will be updated on new commits.

@roystgnr
Member

I thought instead of just hard-coding the various properties (e.g. the no. of dofs) we could try to generalise them.

This is usually a good idea and a pretty easy one; see e.g. fe_hierarchic.C for a decent example of it.

Heavily refactoring shape function definitions is also usually a good idea and usually not a pretty easy one; see e.g. fe_hierarchic_shape_3D.C for an it-was-the-best-I-could-do example of it.

we'd need to store the shape functions a-priori for a given element type and order perhaps the first time we see such an element

With Lagrange shape functions we cache results in between elements, but (except for elements that are just shifts of each other, where we cache everything) it doesn't save quite as much as you'd think; you don't have to recompute in master space each time, but the physical-space gradients still have some cost.

n_dofs_per_elem is intended to return the number of DoFs which are stored on the Elem. If you have an element with a middle node and you've got a nodal basis like LAGRANGE or BERNSTEIN then the node is where any "bubble" shape functions belong and that's where we store them; if not then we usually just put them on the Elem; there's no big difference one way or another.

@nmnobre nmnobre force-pushed the ndHO branch 6 times, most recently from 6ac3a4d to d36c8ba on April 29, 2024 12:20
@nmnobre
Member Author

nmnobre commented Apr 29, 2024

This is usually a good idea and a pretty easy one; see e.g. fe_hierarchic.C for a decent example of it.

Perfect, my code was already very similar to that, but I've refactored to follow the same style.

Heavily refactoring shape function definitions is also usually a good idea and usually not a pretty easy one; see e.g. fe_hierarchic_shape_3D.C for an it-was-the-best-I-could-do example of it.

Okay, let's leave that for later then. Let's focus on fe_nedelec.one.C for now, and in particular on nedelec_one_nodal_soln(). I got a bit confused with the logic surrounding order and total_order, and fe_type (previously used for n_sf) and p_refined_fe_type (used for vis_fe), could you please confirm the changes I made are sound?

if not then we usually just put them on the Elem; there's no big difference one way or another.

Good, stored on the element then.

@nmnobre nmnobre marked this pull request as ready for review April 29, 2024 12:33
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 0926a3f to 55155d4 on April 29, 2024 14:15
@nmnobre nmnobre force-pushed the ndHO branch 3 times, most recently from 9d3baea to e5e9fa8 on June 3, 2024 22:55
@roystgnr
Member

Sorry for the late review; this all looks like obvious improvements to me (assuming the higher-order cases are correct; I can't wait until we have that working!).

I'm going to be out shortly, but I should find time to merge this late today regardless unless anyone objects first.

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

Hi Roy,

Thanks a lot for finding the time to review this.
Spoiler: we have 2nd order implementations on both QUADs and TRIs working already.

There are two things I'd like to ask before you merge:

  1. If you are happy with 0f0ca51, ideally n_dofs would move to fe.C and we could remove the specialisation for most FE families. Off the top of your head, are there any FE families (or any other reason) this couldn't happen?
  2. In the same vein, could you have a close look at nedelec_one_nodal_soln()? Except for the checks which might be particular to specific FE families, the method we're using to evaluate the solution at the nodes is pretty general... I understand for nodal variables this might not be optimal (performance-wise) but it'd certainly still work (i.e. it'd still compute the correct result). Also, what's currently happening at AMR interfaces? Do we need special provisions there?

@roystgnr
Member

If you are happy with 0f0ca51

I'm fine with it, but I actually wouldn't have gone in that direction myself.

are there any FE families (or any other reason) this couldn't happen?

There are no families for which this couldn't happen, but there are two reasons why IMHO it shouldn't:

  1. We need to benchmark it first, in a worst-case (first-order LAGRANGE on HEX27?) scenario. We've done a lot of micro-optimization over the years, at least for the LAGRANGE case, and I'd hate to only find out a year down the road that switching cost us performance.

  2. We get just a little bit more error checking in METHOD=dbg, in DofMap::dof_indices IIRC, where we extend a vector of indices n_dofs_per_elem and n_dofs_per_node at a time, but then at the end assert that we got n_dofs out of it. This has caught bugs in new FE types and new specializations of FE types in the past, but if we replaced n_dofs with something that just sums the n_dofs_per_elem and n_dofs_per_node counts, then we're replacing that verification with a tautology. I didn't say anything about that commit because I'm willing to grant that you don't need the extra bug-checking, but I'm afraid I need it.

In the same vein, could you have a close look at nedelec_one_nodal_soln()? Except for the checks which might be particular to specific FE families, the method we're using to evaluate the solution at the nodes is pretty general... I understand for nodal variables this might not be optimal (performance-wise) but it'd certainly still work

I didn't see any problems, but I fear my disdain for nodal_soln() might have me subconsciously not looking as hard as I should have been; IIRC I wrote RATIONAL_BERNSTEIN nodal_soln() with a bug not too long ago. I wouldn't worry at all about performance there, though: it gets called once per solution output (e.g. once per timestep) when we're decimating for visualization, and because of that whole "decimating" thing it should never be getting called from anything like a residual assembly that might happen more frequently.

Also, what's currently happening at AMR interfaces?

When we see a non-conforming interface, we call FE<dim,NEDELEC_ONE>::compute_constraints() to calculate the constraint equations needed to make the solution conforming on that interface, and that ends up throwing libmesh_not_implemented()

Do we need special provisions there?

I'm afraid so. We have a generic compute_proj_constraints that looks at FE continuity type, and then either returns (DISCONTINUOUS) or calculates the constraints via local L2 (C_ZERO, SIDE_DISCONTINUOUS) or H1 (C_ONE) projections, but nobody's added support for H_DIV or H_CURL there yet.

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

There are no families for which this couldn't happen, but there are two reasons why IMHO it shouldn't:

  1. We need to benchmark it first, in a worst-case (first-order LAGRANGE on HEX27?) scenario. We've done a lot of micro-optimization over the years, at least for the LAGRANGE case, and I'd hate to only find out a year down the road that switching cost us performance.
  2. We get just a little bit more error checking in METHOD=dbg, in DofMap::dof_indices IIRC, where we extend a vector of indices n_dofs_per_elem and n_dofs_per_node at a time, but then at the end assert that we got n_dofs out of it. This has caught bugs in new FE types and new specializations of FE types in the past, but if we replaced n_dofs with something that just sums the n_dofs_per_elem and n_dofs_per_node counts, then we're replacing that verification with a tautology. I didn't say anything about that commit because I'm willing to grant that you don't need the extra bug-checking, but I'm afraid I need it.

These are exactly the two points I was worried about: 1) performance, and 2) losing a potential check just like the one you described (even though I couldn't find it). Before adding that commit, I actually checked myself that adding the element and node dofs produced the expression I was writing down under n_dofs. I'll recover that by dropping the commit. In any case, my intent was always to replace n_dofs either for all FE families or for none, not just for Nédélec. Consistency should be queen. :)

@nmnobre
Member Author

nmnobre commented Jun 10, 2024

I wouldn't worry at all about performance there, though, it gets called once per solution output (e.g. once per timestep) when we're decimating for visualization

Perfect, so are we saying we can move this to fe.C and call that same method for all families (except for any checks specific to particular families, like element type)? Hopefully, by applying this to all other families, CI would also flag any obvious mistakes we haven't spotted?

and that ends up throwing libmesh_not_implemented()

Hm, that's good. That means we can proceed without AMR support until we need it? 😇

@roystgnr
Member

so are we saying we can move this to fe.C and call that same method for all families (except for any checks specific to particular families, like element type)? Hopefully, by applying this to all other families, CI would also flag any obvious mistakes we haven't spotted?

We definitely still want to allow individual families to override the default, and we'll want to keep the existing per-family code for cases where there's a property that makes for simple+cheap reimplementation (e.g. LAGRANGE or SCALAR), or where there's a serious performance issue (e.g. RATIONAL_*), or where there's an issue in even defining a nodal solution sensibly (SIDE_HIERARCHIC) ... but yeah, I think at least 80% of our FE families probably ought to be calling the same code here. Let's save that for a separate PR, though.

That means we can proceed without AMR support until we need it?

Sadly, but yes. 😅

@roystgnr roystgnr merged commit 22800f6 into libMesh:devel Jun 11, 2024
20 checks passed
@nmnobre
Member Author

nmnobre commented Jun 12, 2024

For completeness and future reference.

I compiled libMesh with MOOSE's default configuration plus --enable-perflog.
Then I ran the 2D Nédélec one example, i.e. vector_fe/ex3, with grid_size = 100, five times with
for i in {1..5}; do ./example-opt element_type=TRI7 -pc_type jacobi | grep "| FE " -A2; done.

BEFORE:

| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0368      0.000000    0.0368      0.000000    0.50     0.50     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0382      0.000000    0.0382      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0379      0.000000    0.0379      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0384      0.000000    0.0384      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0377      0.000000    0.0377      0.000000    0.51     0.51     |
|   init_shape_functions()           121600     0.0382      0.000000    0.0382      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0382      0.000000    0.0382      0.000000    0.52     0.52     |
|   init_shape_functions()           121600     0.0386      0.000000    0.0386      0.000000    0.52     0.52     |

AFTER:

| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0423      0.000000    0.0423      0.000000    0.57     0.57     |
|   init_shape_functions()           121600     0.0396      0.000000    0.0396      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0424      0.000000    0.0424      0.000000    0.57     0.57     |
|   init_shape_functions()           121600     0.0387      0.000000    0.0387      0.000000    0.52     0.52     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0429      0.000000    0.0429      0.000000    0.58     0.58     |
|   init_shape_functions()           121600     0.0394      0.000000    0.0394      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0436      0.000000    0.0436      0.000000    0.59     0.59     |
|   init_shape_functions()           121600     0.0396      0.000000    0.0396      0.000000    0.53     0.53     |
| FE                                                                                                              |
|   compute_shape_functions()        121600     0.0428      0.000000    0.0428      0.000000    0.58     0.58     |
|   init_shape_functions()           121600     0.0401      0.000000    0.0401      0.000000    0.54     0.54     |

Same, but with grid_size = 150...

BEFORE:

| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0851      0.000000    0.0851      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0845      0.000000    0.0845      0.000000    0.40     0.40     |
|   init_shape_functions()           272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0857      0.000000    0.0857      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0859      0.000000    0.0859      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0854      0.000000    0.0854      0.000000    0.41     0.41     |
|   init_shape_functions()           272400     0.0862      0.000000    0.0862      0.000000    0.41     0.41     |

AFTER:

| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0964      0.000000    0.0964      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0880      0.000000    0.0880      0.000000    0.42     0.42     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0959      0.000000    0.0959      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0862      0.000000    0.0862      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0949      0.000000    0.0949      0.000000    0.45     0.45     |
|   init_shape_functions()           272400     0.0867      0.000000    0.0867      0.000000    0.41     0.41     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0966      0.000000    0.0966      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0893      0.000000    0.0893      0.000000    0.43     0.43     |
| FE                                                                                                              |
|   compute_shape_functions()        272400     0.0965      0.000000    0.0965      0.000000    0.46     0.46     |
|   init_shape_functions()           272400     0.0880      0.000000    0.0880      0.000000    0.42     0.42     |

@roystgnr
Member

That's not a huge difference but it is much more than I'd expected. Init might be 1% slower or that might be my imagination, but compute looks 15% slower? IMHO that's worth it for more general and simpler code, but it's at least debatable.

@nmnobre
Member Author

nmnobre commented Jun 12, 2024

Yeah... (sigh)... a bit disappointing. Comparing the fastest times, init is roughly 1% slower, and compute is about 15% slower for the smaller grid size and 12% slower for the larger grid size I just added. I'm inclined to agree with you that it's worth it since, at least for this potato problem, assembly is <0.5% of the total time.
