ABI: How to handle optional datatypes? #6877

Open
dalcinl opened this issue Jan 21, 2024 · 7 comments

Comments

@dalcinl
Contributor

dalcinl commented Jan 21, 2024

In the current MPICH ABI, optional unsupported datatypes are set to MPI_DATATYPE_NULL. This allows for an easy check at runtime to determine whether any optional datatype is actually available.

In the new, in-development MPI ABI, optional datatypes have an integral value different from that of the MPI_DATATYPE_NULL handle. However, in the current MPICH implementation of the new MPI ABI, unsupported datatypes (e.g. MPI_INTEGER16, MPI_REAL2, MPI_COMPLEX4) eventually get translated back to MPI_DATATYPE_NULL, and using them in any current MPI API triggers an error. Therefore, there is no direct way to check whether an optional datatype is supported.

Relying on error handling is cumbersome in general: it would require querying the current error handler, overriding it with MPI_ERRORS_RETURN, calling a datatype API like MPI_Type_size, checking the returned error code, restoring the original error handler, and freeing the reference.

How are we going to handle this issue? At first, I thought about adding some sort of custom workaround for MPICH. But this issue would be exactly the same with any MPI implementation, so IMHO it is worth exploring a more general solution.
For example, we could mandate that MPI_Type_size() return 0 or, even better, MPI_UNDEFINED, although it feels a bit awkward to use that routine for such a purpose. A possibly nicer and more general solution I came up with is adding an MPI_COMBINER_UNSUPPORTED value to be returned from MPI_Type_get_envelope to flag unsupported datatypes. Or perhaps even better, return combiner=MPI_UNDEFINED, so that we do not need to introduce a new constant.

cc @jeffhammond

@hzhou
Contributor

hzhou commented Jan 22, 2024

Both MPI_Type_size and MPI_Type_get_envelope returning MPI_UNDEFINED sounds fine. We can support both.

@dalcinl
Contributor Author

dalcinl commented Jan 22, 2024

> Both MPI_Type_size and

Perhaps MPI_Type_size failing outright is better?

If we allow MPI_Type_size to succeed and return MPI_UNDEFINED, why not do the same with MPI_Type_get_[true_]extent? And then, there are other query routines that could be allowed to succeed and return some special output.

Making everything but MPI_Type_get_envelope fail helps ensure that errors do not pass silently, e.g. calls to MPI_Type_size whose result is then unknowingly used as a negative size value, leading to potentially disastrous bugs that can slip into production. You can argue that MPI_Type_get_envelope can also be used with little care, but IMHO this routine is far less used, and the whole datatype decoding business is so cumbersome that those who use it usually know MPI better and are more aware of the gory details.
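To make the "negative size" hazard concrete, here is a small sketch in plain C with no MPI calls. `SKETCH_UNDEFINED` is a stand-in for MPI_UNDEFINED; the value -32766 is the one MPICH uses and is assumed here purely for illustration. Both helper names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for MPI_UNDEFINED (assumed value, as used by MPICH). */
#define SKETCH_UNDEFINED (-32766)

/* Naive: trusts whatever size the query routine reported. A negative
 * size converts to a huge unsigned value and the product wraps. */
static size_t buffer_bytes_naive(int count, int type_size)
{
    return (size_t)count * (size_t)type_size;
}

/* Checked: rejects negative inputs before computing the byte count.
 * Returns 0 on success, -1 if either argument is negative. */
static int buffer_bytes_checked(int count, int type_size, size_t *bytes)
{
    if (count < 0 || type_size < 0)
        return -1;
    *bytes = (size_t)count * (size_t)type_size;
    return 0;
}
```

Passing `SKETCH_UNDEFINED` as the size to the naive version yields an enormous byte count that would then flow into `malloc` or `memcpy`, which is the kind of silent failure an outright error from MPI_Type_size would prevent.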

@jeffhammond
Member

We can think of this another way. For example, MPI_REAL16 is defined to be a 16-byte type. There is no problem with using it in MPI to move 16-byte slabs of memory, even if the Fortran compiler does not support REAL*16 or REAL(kind=REAL128). The problem arises when a reduction is performed because, per the following, this is the one place where the representation matters.

Therefore, it might be reasonable to defer errors until reductions are used, because that's the only place where there is a real problem. I recognize this is problematic for users to detect support for such types, hence it may be necessary to add a utility function to query for optional type support.

I'll note that today we already have a problem in this area because all the MPI implementations are doing arithmetic in C, which means that if a Fortran compiler ever defines arithmetic in a different manner than C, MPI implementations will produce incorrect results.

> Rationale. Particularly for the longer floating-point types, C and Fortran may use different representations. For example, a Fortran compiler may define a 16-byte REAL type with 33 decimal digits of precision while a C compiler may define a 16-byte long double type that implements an 80-bit (10 byte) extended precision floating point value. Both of these types are 16 bytes long, but they are not interoperable. Thus, these types are defined by Fortran, even though C may define types of the same length. (End of rationale.)
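The mismatch in the rationale can be inspected from the C side with a short sketch like the one below. The struct and function names are hypothetical. On x86-64 Linux one commonly sees 16 bytes of storage with a 64-bit mantissa (the 80-bit extended format), whereas an IEEE binary128 type has a 113-bit mantissa and 33 decimal digits; actual values depend on the platform and compiler.

```c
#include <float.h>
#include <stddef.h>

/* Properties of the C compiler's long double (hypothetical struct). */
struct long_double_info {
    size_t storage_bytes;   /* sizeof(long double), padding included */
    int mantissa_bits;      /* LDBL_MANT_DIG */
    int decimal_digits;     /* LDBL_DIG */
};

static struct long_double_info describe_long_double(void)
{
    struct long_double_info info;
    info.storage_bytes = sizeof(long double);
    info.mantissa_bits = LDBL_MANT_DIG;
    info.decimal_digits = LDBL_DIG;
    return info;
}

/* Returns 1 only if long double is 16 bytes wide AND has the 113-bit
 * mantissa of IEEE binary128, i.e. size alone is not sufficient. */
static int long_double_looks_like_binary128(void)
{
    return sizeof(long double) == 16 && LDBL_MANT_DIG == 113;
}
```

Two types can agree on `storage_bytes` and still disagree on `mantissa_bits`, which is why moving the bytes is safe but performing a reduction on them is not.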

@dalcinl
Contributor Author

dalcinl commented Jan 22, 2024

> We can think of this another way. For example, MPI_REAL16 is defined to be a 16-byte type.

What about MPI_INTEGER, MPI_REAL, etc.? These types may be optional/unsupported simply because the MPI implementation does not support Fortran, or was configured without Fortran support. In this case, would you assume the sizes to be the usual ones you get from most Fortran compilers without special flags?

> hence it may be necessary to add a utility function to query for optional type support.

I definitely agree we still need something to flag optional types. However, rather than adding a new API, my proposal of "reusing" MPI_Type_get_envelope and combiner=MPI_UNDEFINED is an equivalent alternative that can be implemented right now on top of the current MPI 4.1 standard. You may find the semantics contrived and not like it. I'm just trying to prevent API explosion.

@jeffhammond
Member

You're right, although the proposed MPI_ABI_DETAILS seems like a reasonable way to determine the Fortran ABI, no?

Because the size of the default integer and real types isn't specified in the ABI, those can't be used at all when they are unsupported.

@dalcinl
Contributor Author

dalcinl commented Jan 22, 2024

> You're right, although the proposed MPI_ABI_DETAILS seems like a reasonable way to determine the Fortran ABI, no?

No idea what you are talking about 😥.

> Because the size of the default integer and real types isn't specified in the ABI, those can't be used at all when they are unsupported.

Then back to my original point: even if "we can think of this another way" and defer errors as you said, in the end we would still need a runtime mechanism to flag optional datatypes.

@hzhou
Contributor

hzhou commented Jan 22, 2024

> Therefore, it might be reasonable to defer errors until reductions are used, because that's the only place where there is a real problem. I recognize this is problematic for users to detect support for such types, hence it may be necessary to add a utility function to query for optional type support.

I like this. Now I recall this was my thinking several years ago :). We can always define a type size and always define the datatype. The type size may be inaccurate if the corresponding language/compiler, e.g. Fortran, does not support the type, but that shouldn't cause issues since users shouldn't be using it from an unsupported language anyway. The reduction should fail for such datatypes when the implementation can't do it; this is the current behavior anyway.

This removes the question of datatype availability altogether. If it compiles, it is available. Whether reduction is supported is a separate question.
