Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid implicit conversions for bitwise operators #2708

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mnijhuis-tos
Copy link

@mnijhuis-tos mnijhuis-tos commented Jun 2, 2023

This PR partially addresses #2707.

When using XSimd, the following code will compile, however, the resulting code is inefficient if the return type of operator|(const char&, const char&) is int instead of char. [C++ allows implicitly converting/promoting char to int.]
(https://en.cppreference.com/w/cpp/language/implicit_conversion).

#include <xtensor/xfixed.hpp>

void xtensor_or(xt::xtensor_fixed<char, xt::xshape<16>>& b1,
                const xt::xtensor_fixed<char, xt::xshape<16>>& b2) {
    b1 |= b2;    
}

The generated assembly code performs the operation using 32-bit integers and contains many pack and unpack instructions around it. Supplying explict return types in the functors instead of 'auto' avoids those pack and unpack instructions and has a single 'vpor' instruction (when using avx2) that or's all 16 values at once.

Although this improvement works for small integral values like char and unsigned short, using bool still results in inefficient code since xt_simd::simd_condition<bool, bool>::value is false. xsimd::detail::simd_condition explicitly is false when both types are bool, for unknown reasons.

I have only applied the improvement to bitwise operators, since it's probably not safe for other operators. For example, adding characters may overflow so there the implicit conversion to int is useful. On the other side, adding ints themselves could also overflow and they are not promoted to other types. Perhaps this optimization is possible for all operators.

#define UNARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \
struct NAME \
{ \
template <class A1> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is the name A1 something that was used in similar bits of codes. If not, I would find e.g. T more logical (or even A or E which is used in many places)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing UNARY_OPERATOR_FUNCTOR indeed also uses A1. I'll happily change both to T, A, or E.

return OP arg; \
} \
template <class B> \
constexpr auto simd_apply(const B& arg) const \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question out of ignorance: Why don't you use std::decay_t<B> here.
Style: please use the same template name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using std::decay_t<B> indeed is consistent with the return type of operator().
Style: Do you mean I should use the same template argument name in operator() and simd_apply, e.g., use template <class T> in both cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great, yes!

Comment on lines +107 to +109
constexpr std:: \
conditional_t<(sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>)), std::decay_t<T1>, std::decay_t<T2>> \
operator()(T1&& arg1, T2&& arg2) const \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what happens here: the opening and closing (..) and <..> don't even seem to match?

Copy link
Author

@mnijhuis-tos mnijhuis-tos Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky part of this expression is that one of the > is a greater-than sign:

The first argument to std::condition_t is sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>). Because of the greater-than sign, I'm using extra parentheses around it. I can probably replace it by std::greater, however, I'm afraid the expression will mainly get longer and doesn't become more readable.

The other arguments to std::condition_t are std::decay_t<T1> and std::decay_t<T2>. So, the conditional selects the largest type of T1 and T2. A (bitwise) operator on two chars will now yield a char, instead of an int. Combining a short and a long yields a long.

Combining two equally-sized unsigned and signed types currently yields the second type, e.g, combining uint16_t and int16_t yields an int16_t return value. For bitwise operations, this shouldn't be a problem. Note that I can easily make it return the first type using >= instead of > when comparing the sizes.

For other operations, like addition or subtraction, I can imagine we have to follow the C++ rules closer. I found the following results when using decltype with g++ 11.3:

  • decltype(int8_t() + int8_t()) -> int

  • decltype(uint8_t() + uint8_t()) -> int`

  • decltype(int8_t() + uint8_t()) -> int

  • decltype(uint8_t() + int8_t()) -> int

  • (u)int16_t: Same result as (u)int8_t: Always int.

  • decltype(int32_t() + int32_t()) -> int32_t

  • decltype(uint32_t() + uint32_t()) -> uint32_t`

  • decltype(int32_t() + uint32_t()) -> uint32_t

  • decltype(uint32_t() + int32_t()) -> uint32_t

  • decltype(int32_t() + int64_t()) -> int64_t

  • decltype(uint32_t() + uint64_t()) -> uint64_t`

  • decltype(int32_t() + uint64_t()) -> uint64_t

  • decltype(uint32_t() + int64_t()) -> int64_t

  • decltype(int32_t() + float()) -> float

  • decltype(uint32_t() + float()) -> float

  • decltype(float() + uint32_t()) -> float

  • decltype(float() + int32_t()) -> float

  • decltype(int64_t() + float()) -> float (even though sizeof(float) (4 bytes) is smaller than sizeof(int64_t)!

  • decltype(uint64_t() + float()) -> float

  • decltype(float() + uint64_t()) -> float

  • decltype(float() + int64_t()) -> float

Perhaps the following stategy will work for all binary operations:

  • If the input types are equal, use that as the return type.
  • If both types are floating point types, return the largest type.
  • If only one of the types is a floating point type, return the floating point type.
  • If the type sizes differ, return the largest type (which can be signed).
  • If the type sizes are equal and they only differ in signedness, return the unsigned type.

Code similar to simd_return_type_impl in XSimd could implement this return type strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the clarification!! I'm just not sure about the latter case: stripping the signedness is dangerous here no?

#define BINARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \
struct NAME \
{ \
template <class T1, class T2> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use T and S

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing BINARY_OPERATOR_FUNCTOR uses T1 and T2, too. Using T1 and T2 as names reflects the fact that the types are typically similar or the same, so I rather stick to T1 and T2.

using xt::detail::operator OP; \
return (std::forward<T1>(arg1) OP std::forward<T2>(arg2)); \
} \
template <class B> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you allow the two types here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing BINARY_OPERATOR_FUNCTOR also has a single template argument for simd_apply. I don't know why. Using two arguments seems more logical, indeed.

@tdegeus
Copy link
Member

tdegeus commented Jun 14, 2023

Thanks. This is not my expertise, but I do already have some questions.

@spectre-ns
Copy link
Contributor

@tdegeus the checks on this appear to be stuck burning up compute time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants