Unknown buffer size limitation for CRC operation #36

Open
bartlomiejgrzeskowiak opened this issue Oct 3, 2023 · 8 comments
bartlomiejgrzeskowiak commented Oct 3, 2023

What is the acceptable input buffer size for the CRC operation?

I have been experimenting with different CRC buffer sizes.
The DML library accepts different sizes, but in some cases it returns an error or even crashes with a segmentation fault.

Example execution:

[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_1KB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 1KB.
Calculated CRC is: 0x2cdf6e8f
Finished successfully.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path
The example will be run on the hardware path.
Starting CRC job example.
Caclulating CRC for region of size 4MB.
An error (15) occured during job execution.
[bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_16MB hardware_path
Segmentation fault (core dumped)
[bgrzesko@fl31ca105bs0411 build]$ 

How to reproduce (apply the diff below and compile the example):

[bgrzesko@fl31ca105bs0411 build]$ git diff
diff --git a/examples/low-level-api/crc_example.c b/examples/low-level-api/crc_example.c
index 3c12df2..ad03704 100644
--- a/examples/low-level-api/crc_example.c
+++ b/examples/low-level-api/crc_example.c
@@ -9,7 +9,8 @@
 #include "dml/dml.h"
 #include "examples_utils.h"
 
-#define BUFFER_SIZE 1024 // 1 KB
+//#define BUFFER_SIZE 4 * 1024 * 1024 // 4 MB
+#define BUFFER_SIZE 16 * 1024 * 1024 // 16 MB
 
 /*
 * This example demonstrates how to create and run a crc operation.
@mzhukova
Contributor

mzhukova commented Oct 9, 2023

Hi @bartlomiejgrzeskowiak,
This is for hardware_path only, correct?
Could you please check what is the max_transfer_size setting that you have set for the WQ? (accel-config list | grep transfer)

@bartlomiejgrzeskowiak
Author

> Hi @bartlomiejgrzeskowiak, This is for hardware_path only, correct?

It is most probably the software path, since I am using crc_example.c, where DML_PATH_SW is set.

dml_path_t execution_path = DML_PATH_SW;

> Could you please check what is the max_transfer_size setting that you have set for the WQ? (accel-config list | grep transfer)

[bgrzesko@fl31ca105bs0411 ~]$ accel-config list | grep transfer
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
    "max_transfer_size":2147483648,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,
            "max_transfer_size":2097152,

@mzhukova
Contributor

Hi @bartlomiejgrzeskowiak,
Sorry, I was not clear. I believe you were running with hardware_path (meaning DSA was used for execution), since the command line is [bgrzesko@fl31ca105bs0411 build]$ ./examples/low-level-api/ll_crc_example_4MB hardware_path. I was just trying to double-check whether you observed a similar error on software_path/DML_PATH_SW as well.

Let me try to reproduce on my side and I'll get back to you.

@bartlomiejgrzeskowiak
Author

Hi @mzhukova,

You're totally right. I was executing the HW path.
Sorry for misleading you. It was some time ago, and I did not notice that the command-line argument overrides the path set in the code.

Please let me know if you're able to reproduce the issue.

BR
Bartek

@abdelrahim-hentabli
Contributor

abdelrahim-hentabli commented Oct 11, 2023

Hey @bartlomiejgrzeskowiak, 16 MB is too large to be allocated on the stack. If you want to run a 16 MB example, you need to allocate the buffer with malloc().

Simple godbolt example for large allocation: https://godbolt.org/z/Ts9xndqcq
Quick reference I found on the default stack size on Linux, which is somewhere between 8 and 10 MB: https://unix.stackexchange.com/questions/473416/why-on-modern-linux-the-default-stack-size-is-so-huge-8mb-even-10-on-some-di
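
For readers following along, the malloc() suggestion could look roughly like the sketch below. It assumes the example keeps its source buffer in a local array sized by BUFFER_SIZE, which this thread does not show. The helper name alloc_source_buffer is hypothetical and not part of DML. The macro is also parenthesized here, because an unparenthesized `16 * 1024 * 1024` can expand incorrectly inside larger expressions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 16 MB; parenthesized so the macro is safe inside larger expressions. */
#define BUFFER_SIZE (16u * 1024u * 1024u)

/*
 * Hypothetical helper (not a DML function): allocates the source buffer
 * on the heap instead of the stack and fills it with a byte pattern.
 * Returns NULL if the allocation fails; the caller must free() the result.
 */
uint8_t *alloc_source_buffer(size_t size, uint8_t fill)
{
    uint8_t *buffer = malloc(size);
    if (buffer == NULL) {
        return NULL;
    }
    memset(buffer, fill, size);
    return buffer;
}
```

A caller would check the returned pointer for NULL, pass it where the stack array was used before, and call free() on it once the job has finished.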

abdelrahim-hentabli self-assigned this Oct 11, 2023
@abdelrahim-hentabli
Contributor

Hey @bartlomiejgrzeskowiak, it seems that your work queues' max_transfer_size is 2 MB (2097152 bytes). A 4 MB buffer exceeds that limit, which would explain the error in the 4 MB example.
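
One generic way to stay under such a limit is to process the buffer in chunks and carry the running CRC from one chunk to the next. The sketch below shows only that idea in plain software. It uses CRC-32C purely for illustration and does not claim to match the polynomial or seed convention DML uses. Whether DML jobs can be chained this way is not confirmed in this thread. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (reflected polynomial 0x82F63B78).
 * `crc` is the value returned by the previous call, or 0 for the first chunk.
 */
uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Apply the polynomial only when the lowest bit is set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/*
 * Computes the CRC of `data` in pieces of at most `max_chunk` bytes.
 * A `max_chunk` of 0 is treated as "no limit".
 */
uint32_t crc32c_chunked(const uint8_t *data, size_t len, size_t max_chunk)
{
    uint32_t crc = 0;
    size_t offset = 0;

    if (max_chunk == 0) {
        max_chunk = len;
    }
    while (offset < len) {
        size_t n = len - offset;
        if (n > max_chunk) {
            n = max_chunk;
        }
        crc = crc32c_update(crc, data + offset, n);
        offset += n;
    }
    return crc;
}
```

With a 2097152-byte limit, a 4 MB buffer would be covered by two calls to crc32c_update, and the result equals a single pass over the whole buffer.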

@bartlomiejgrzeskowiak
Author

bartlomiejgrzeskowiak commented Oct 12, 2023

Hi @abdelrahim-hentabli ,

Ok, but:

  1. max_transfer_size can be configured by the system admin, so I cannot assume a fixed value. How can I get this value in my code? Which API function returns max_transfer_size?
  2. What about 16 MB? I would expect that neither the library nor the example should ever crash.

@abdelrahim-hentabli
Contributor

Hey @bartlomiejgrzeskowiak

  1. Currently, DML does not have an API to get the max_transfer_size. You would need to use libaccel-config's API to get this value: accfg_wq_get_max_transfer_size().
  2. Please see my comment above about the 16 MB stack allocation: Unknown buffer size limitation for CRC operation #36 (comment)
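
Once the limit is known, a caller can reject an oversized request before submitting a job instead of hitting an error at execution time. The sketch below assumes the limit has already been obtained elsewhere, for example from accfg_wq_get_max_transfer_size() as mentioned above, and is passed in as a plain number. The type and function names are hypothetical and not part of DML or libaccel-config.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the pre-flight check. */
typedef enum {
    SIZE_OK = 0,
    SIZE_EMPTY = 1,
    SIZE_TOO_LARGE = 2
} size_check_t;

/*
 * Compares the requested transfer size with the work queue limit.
 * `max_transfer_size` is supplied by the caller; this function does not
 * query the system itself.
 */
size_check_t check_transfer_size(size_t nbytes, uint64_t max_transfer_size)
{
    if (nbytes == 0) {
        return SIZE_EMPTY;
    }
    if ((uint64_t)nbytes > max_transfer_size) {
        return SIZE_TOO_LARGE;
    }
    return SIZE_OK;
}
```

With the 2097152-byte limit reported earlier in this thread, a 1 KB request passes and a 4 MB request is flagged as too large.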
