
bf16 kernel (OpSet13) for MatMul in CPU EP #20630

Open
ZchiPitt opened this issue May 9, 2024 · 1 comment
Labels
core runtime (issues related to core runtime), feature request (request for unsupported feature or enhancement)

Comments

ZchiPitt commented May 9, 2024

Describe the issue

MatMul in ONNX opset 13 added support for bfloat16 (https://onnx.ai/onnx/operators/onnx__MatMul.html).

However, we don't see an implementation for bfloat16 in the CPU EP for MatMul(13): https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/math/matmul.cc#L61-L89

Is there any reason this is still not supported, given that the opset was released a long time ago?

If we want to implement it on our own, is there a PR I can reference?
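For illustration, a minimal sketch of the kind of registration that appears to be missing from matmul.cc, assuming the existing ONNX_CPU_OPERATOR_TYPED_KERNEL macro and a hypothetical MatMul&lt;BFloat16&gt; kernel specialization (illustrative only, not actual onnxruntime code):

```cpp
// Illustrative sketch only: how a bf16 MatMul kernel might be registered in the CPU EP,
// mirroring the existing float registration in matmul.cc. MatMul<BFloat16> is a
// hypothetical specialization that does not exist in the codebase today.
ONNX_CPU_OPERATOR_TYPED_KERNEL(
    MatMul,
    13,
    BFloat16,
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<BFloat16>()),
    MatMul<BFloat16>);
```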

Ping @snnn @pranavsharma for help

To reproduce

NA

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

NA

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the platform:windows (issues related to the Windows platform) label on May 9, 2024
snnn added the feature request (request for unsupported feature or enhancement) and core runtime (issues related to core runtime) labels and removed the platform:windows (issues related to the Windows platform) label on May 9, 2024
tianleiwu (Contributor) commented May 10, 2024

The reason behind the slow adoption of bf16 on CPU: for training, most models are trained on GPU; for inference, int8 and int4 quantization have better support (and do not require specialized hardware).

If you want to implement it, I think you can use a hardware-specific library. For Intel CPUs, you can start from the following:
https://github.com/oneapi-src/oneDNN/blob/df3022638aaab0d1fdf62bc6ab16d9031739a0fc/src/cpu/gemm/gemm.cpp#L286
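As a concrete starting point, a minimal sketch of a bf16 matmul through the oneDNN matmul primitive (assuming the oneDNN 3.x C++ API; the shapes, the f32 destination, and the uint16_t stand-in for bf16 storage are illustrative assumptions, and the primitive may fall back to emulation or fail to build on CPUs without bf16 support):

```cpp
// Sketch: bf16 x bf16 -> f32 matmul via the oneDNN matmul primitive (oneDNN 3.x API).
#include <dnnl.hpp>
#include <cstdint>
#include <vector>

int main() {
  using namespace dnnl;
  engine eng(engine::kind::cpu, 0);
  stream strm(eng);

  const memory::dim M = 64, K = 128, N = 32;
  auto a_md = memory::desc({M, K}, memory::data_type::bf16, memory::format_tag::ab);
  auto b_md = memory::desc({K, N}, memory::data_type::bf16, memory::format_tag::ab);
  auto c_md = memory::desc({M, N}, memory::data_type::f32, memory::format_tag::ab);

  // bf16 values are 16 bits wide; uint16_t buffers stand in for a BFloat16 type here.
  std::vector<uint16_t> a_data(M * K), b_data(K * N);
  std::vector<float> c_data(M * N);

  memory a_mem(a_md, eng, a_data.data());
  memory b_mem(b_md, eng, b_data.data());
  memory c_mem(c_md, eng, c_data.data());

  // Create and run the matmul primitive; with an f32 destination, accumulation stays in f32.
  auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md);
  matmul(pd).execute(strm, {{DNNL_ARG_SRC, a_mem},
                            {DNNL_ARG_WEIGHTS, b_mem},
                            {DNNL_ARG_DST, c_mem}});
  strm.wait();
  return 0;
}
```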
