Batchsize=40 failure of TensorRT 8.6.1 when running transformers on GPU A30 #3855

cwlseu · 2024-05-11T03:33:56Z

Description

When the input batch size is large (such as 40, 512, or 1024), the output of the TensorRT engine is inconsistent with ONNX Runtime (onnxrt).

In addition, our tests found that with batchsize=32 the difference in results is tolerable, but with larger batch sizes the outputs diverge, for example with batchsize=40 and sequence length=40.

Environment

Base Docker Image: nvcr.io/nvidia/tensorrt:24.04-py3

TensorRT Version: 8.6.2

NVIDIA GPU: A30

NVIDIA Driver Version: 535.104.12

CUDA Version: cuda-12.4

CUDNN Version:

Operating System: Ubuntu

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.1.0

Baremetal or Container (if so, version):
Container based on nvcr.io/nvidia/tensorrt:24.04-py3; the issue also reproduces there.

Issue

[I] RUNNING | Command: /usr/local/bin/polygraphy run opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
[I] onnxrt-runner-N0-05/11/24-03:02:24  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-05/11/24-03:02:24
    ---- Inference Input(s) ----
    {concat_ids [dtype=int32, shape=(40, 40)],
     concat_mask [dtype=bool, shape=(40, 40)],
     sent_ids [dtype=int32, shape=(40, 40)],
     pos_ids [dtype=int32, shape=(40, 40)]}
[I] onnxrt-runner-N0-05/11/24-03:02:24
    ---- Inference Output(s) ----
    {score [dtype=float32, shape=(40,)]}
[I] onnxrt-runner-N0-05/11/24-03:02:24  | Completed 1 iteration(s) in 32.39 ms | Average inference time: 32.39 ms.
[I] Saving inference results to debug_output.json
[I] PASSED | Runtime: 1.813s | Command: /usr/local/bin/polygraphy run cross_model_opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
[I] RUNNING | Command: /usr/local/bin/polygraphy debug build cross_model_opset_18_40_s40.onnx --fp16 --artifacts-dir replays --artifacts replay.json --until=1 --check polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     RUNNING | Iteration 1
[I]     Configuring with profiles:[
            Profile 0:
                {concat_ids [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 concat_mask [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 sent_ids [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 pos_ids [min=[40, 40], opt=[40, 40], max=[40, 40]]}
        ]
[I]     Building engine with configuration:
        Flags                  | [FP16]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 24062.19 MiB, TACTIC_DRAM: 24062.19 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[W]     TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W]     If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W]     Check verbose logs for the list of affected weights.
[W]     - 32 weights are affected by this issue: Detected subnormal FP16 values.
[W]     - 10 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[I]     Finished engine building in 94.185 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[I]     ========== CAPTURED STDOUT ==========
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
        [I] trt-runner-N0-05/11/24-03:04:45     | Activating and starting inference
        [I] Loading bytes from /relevance-match/convert/polygraphy_debug.engine
        [I] trt-runner-N0-05/11/24-03:04:45
            ---- Inference Input(s) ----
            {concat_ids [dtype=int32, shape=(40, 40)],
             concat_mask [dtype=bool, shape=(40, 40)],
             sent_ids [dtype=int32, shape=(40, 40)],
             pos_ids [dtype=int32, shape=(40, 40)]}
        [I] trt-runner-N0-05/11/24-03:04:45
            ---- Inference Output(s) ----
            {score [dtype=float32, shape=(40,)]}
        [I] trt-runner-N0-05/11/24-03:04:45     | Completed 1 iteration(s) in 1285 ms | Average inference time: 1285 ms.
        [I] Loading inference results from debug_output.json
        [I] Accuracy Comparison | trt-runner-N0-05/11/24-03:04:45 vs. onnxrt-runner-N0-05/11/24-03:02:24
        [I]     Comparing Output: 'score' (dtype=float32, shape=(40,)) with 'score' (dtype=float32, shape=(40,))
        [I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-05/11/24-03:04:45: score | Stats: mean=0.89683, std-dev=0.096175, var=0.0092497, median=0.92944, min=0.52441 at (20,), max=0.99219 at (12,), avg-magnitude=0.89683
        [I]             ---- Values ----
                            [0.9667969  0.9243164  0.921875   0.9506836  0.9838867  0.81591797
                             0.93359375 0.93115234 0.9375     0.95166016 0.93847656 0.9482422
                             0.9921875  0.9277344  0.96875    0.88964844 0.8774414  0.9604492
                             0.9013672  0.9614258  0.52441406 0.96240234 0.98291016 0.93603516
                             0.97802734 0.734375   0.8618164  0.67822266 0.91015625 0.68359375
                             0.80371094 0.8535156  0.83984375 0.9194336  0.96533203 0.94970703
                             0.9003906  0.9790039  0.88427734 0.84277344]
        [I]             ---- Histogram ----
                        Bin Range      |  Num Elems | Visualization
                        (0.524, 0.572) |          1 | ##
                        (0.572, 0.62 ) |          0 |
                        (0.62 , 0.667) |          0 |
                        (0.667, 0.715) |          2 | #####
                        (0.715, 0.762) |          1 | ##
                        (0.762, 0.81 ) |          1 | ##
                        (0.81 , 0.857) |          4 | ###########
                        (0.857, 0.905) |          6 | #################
                        (0.905, 0.952) |         14 | ########################################
                        (0.952, 1    ) |         11 | ###############################
        [I]         onnxrt-runner-N0-05/11/24-03:02:24: score | Stats: mean=0.95434, std-dev=0.046837, var=0.0021937, median=0.97326, min=0.82633 at (27,), max=0.99999 at (16,), avg-magnitude=0.95434
        [I]             ---- Values ----
                            [0.99522805 0.9172077  0.94710356 0.9738695  0.99986017 0.90418845
                             0.95747596 0.95086133 0.9116657  0.9969764  0.9866816  0.85559285
                             0.9961591  0.985543   0.9866173  0.90548503 0.9999934  0.99338365
                             0.9253262  0.97857046 0.96708155 0.9988443  0.9822408  0.982418
                             0.96222353 0.93281376 0.8284521  0.8263327  0.97884107 0.8802821
                             0.9248937  0.92526066 0.9966129  0.9726457  0.992773   0.9534787
                             0.9908974  0.9937494  0.9982102  0.91759574]
        [I]             ---- Histogram ----
                        Bin Range      |  Num Elems | Visualization
                        (0.524, 0.572) |          0 |
                        (0.572, 0.62 ) |          0 |
                        (0.62 , 0.667) |          0 |
                        (0.667, 0.715) |          0 |
                        (0.715, 0.762) |          0 |
                        (0.762, 0.81 ) |          0 |
                        (0.81 , 0.857) |          3 | ####
                        (0.857, 0.905) |          2 | ###
                        (0.905, 0.952) |         10 | ################
                        (0.952, 1    ) |         25 | ########################################
        [I]         Error Metrics: score
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.44267] OR [rel=0.45774] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.066281, std-dev=0.079112, var=0.0062587, median=0.034903, min=0.00066936 at (22,), max=0.44267 at (20,), avg-magnitude=0.066281
        [I]                 ---- Values ----
                                [0.02843118 0.00710869 0.02522856 0.02318591 0.01597345 0.08827049
                                 0.02388221 0.01970899 0.02583432 0.04531622 0.04820502 0.09264934
                                 0.00397158 0.05780864 0.01786733 0.0158366  0.12255198 0.03293443
                                 0.02395904 0.01714468 0.44266748 0.03644198 0.00066936 0.04638284
                                 0.01580381 0.19843876 0.0333643  0.14811003 0.06868482 0.19668835
                                 0.12118274 0.07174504 0.15676916 0.05321211 0.02744097 0.00377166
                                 0.09050679 0.01474547 0.11393285 0.07482231]
        [I]                 ---- Histogram ----
                            Bin Range          |  Num Elems | Visualization
                            (0.000669, 0.0449) |         21 | ########################################
                            (0.0449  , 0.0891) |          9 | #################
                            (0.0891  , 0.133 ) |          5 | #########
                            (0.133   , 0.177 ) |          2 | ###
                            (0.177   , 0.222 ) |          2 | ###
                            (0.222   , 0.266 ) |          0 |
                            (0.266   , 0.31  ) |          0 |
                            (0.31    , 0.354 ) |          0 |
                            (0.354   , 0.398 ) |          0 |
                            (0.398   , 0.443 ) |          1 | #
        [I]             Relative Difference | Stats: mean=0.070319, std-dev=0.083742, var=0.0070127, median=0.038379, min=0.00068146 at (22,), max=0.45774 at (20,), avg-magnitude=0.070319
        [I]                 ---- Values ----
                                [0.0285675  0.00775036 0.02663759 0.02380802 0.01597568 0.09762399
                                 0.02494288 0.02072751 0.0283375  0.04545365 0.0488557  0.10828672
                                 0.00398689 0.05865664 0.01810968 0.01748963 0.12255279 0.03315378
                                 0.02589253 0.01752013 0.45773542 0.03648414 0.00068146 0.04721294
                                 0.01642426 0.21273139 0.04027305 0.17923777 0.07016953 0.22343786
                                 0.13102342 0.07754035 0.15730195 0.05470862 0.02764072 0.00395569
                                 0.0913382  0.01483822 0.11413713 0.08154169]
        [I]                 ---- Histogram ----
                            Bin Range          |  Num Elems | Visualization
                            (0.000681, 0.0464) |         22 | ########################################
                            (0.0464  , 0.0921) |          8 | ##############
                            (0.0921  , 0.138 ) |          5 | #########
                            (0.138   , 0.184 ) |          2 | ###
                            (0.184   , 0.229 ) |          2 | ###
                            (0.229   , 0.275 ) |          0 |
                            (0.275   , 0.321 ) |          0 |
                            (0.321   , 0.366 ) |          0 |
                            (0.366   , 0.412 ) |          0 |
                            (0.412   , 0.458 ) |          1 | #
        [E]         FAILED | Output: 'score' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
        [E]     FAILED | Mismatched outputs: ['score']
        [E] Accuracy Summary | trt-runner-N0-05/11/24-03:04:45 vs. onnxrt-runner-N0-05/11/24-03:02:24 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 13.374s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[E]     ========== CAPTURED STDERR ==========
[E]     Artifact: replay.json does not exist, skipping.
        Was the artifact supposed to be generated?
[E]     FAILED | Iteration 1 | Duration 110.64828515052795s
[I] Finished 1 iteration(s) | Passed: 0/1 | Pass Rate: 0.0%
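
The error metrics in the log can be checked by hand. The following is a minimal sketch, not Polygraphy's own code: it assumes the relative difference is the absolute difference divided by the magnitude of the ONNX Runtime value, which is consistent with the numbers printed above. The helper name `elemwise_errors` is hypothetical, and only three of the 40 scores (indices 0, 20, and 22) are copied from the log.

```python
# Scores for output indices 0, 20 and 22, copied from the log above.
trt = [0.9667969, 0.52441406, 0.98291016]   # TensorRT FP16 engine
ort = [0.99522805, 0.96708155, 0.9822408]   # ONNX Runtime reference


def elemwise_errors(out, ref):
    """Return per-element absolute and relative differences.

    Assumption: the relative difference is taken with respect to the
    reference (ONNX Runtime) value, which matches the logged numbers.
    """
    abs_diff = [abs(o - r) for o, r in zip(out, ref)]
    rel_diff = [d / abs(r) for d, r in zip(abs_diff, ref)]
    return abs_diff, rel_diff


abs_diff, rel_diff = elemwise_errors(trt, ort)

# Position (within this 3-element sample) of the largest absolute difference.
worst = max(range(len(abs_diff)), key=abs_diff.__getitem__)

for i, (a, r) in enumerate(zip(abs_diff, rel_diff)):
    print(f"sample {i}: abs={a:.8f} rel={r:.8f}")
print(f"worst sample: {worst}")
```

The second sample corresponds to index 20 in the log, where the largest absolute and relative differences are reported.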

Steps To Reproduce

Download the model: https://drive.google.com/file/d/1-ivPE2ZaQESiQbdRxWwyYa7uQfADyVOy/view?usp=sharing
polygraphy run opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
polygraphy debug build opset_18_40_s40.onnx --fp16 --artifacts-dir replays --artifacts replay.json --until=1 --check polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
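
As a side note on the comparison itself: the engine is built with --fp16 and the check runs with the default tolerance of 1e-05. The rough sketch below, using only the Python standard library, estimates how much error FP16 rounding of a single output value can introduce. The helper `to_fp16` is hypothetical, and the three reference scores are copied from the log. Under these assumptions, rounding alone already exceeds 1e-05, so a looser tolerance would probably be needed for any FP16 comparison, while the reported maximum difference of 0.44267 is far larger than this rounding bound and so does not appear to be explained by output rounding alone.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# ONNX Runtime reference scores for indices 0, 20 and 22, copied from the log.
reference = [0.99522805, 0.96708155, 0.9822408]

rounding_errors = [abs(to_fp16(r) - r) for r in reference]
worst_rounding = max(rounding_errors)

# Adjacent FP16 values in [0.5, 1) are 2**-11 apart, so rounding a single
# value in that range can cost at most 2**-12.
half_ulp = 2.0 ** -12

print(f"worst FP16 rounding error: {worst_rounding:.6f}")
print(f"exceeds default tolerance 1e-05: {worst_rounding > 1e-5}")
print(f"0.44267 relative to the rounding bound: {0.44267 / half_ulp:.0f}x")
```

This only bounds the error from storing one output value in FP16; error accumulated inside the network can be larger.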


lix19937 · 2024-05-13T14:47:22Z

The maximum absolute difference is 0.44267.

Use the following command to compare:

min_shapes= 
opt_shapes=
max_shapes=  
cur_shapes=   
    
polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes  $min_shapes    \
--trt-opt-shapes  $opt_shapes  \
--trt-max-shapes  $max_shapes  \
--input-shapes   $cur_shapes
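
The shape variables in the command above are left blank in the comment. As an illustration only, the sketch below assembles the same command for the fixed (40, 40) profile shown in the original log; these values are an assumption taken from that log, not something the commenter specified. The helpers `shape_args` and `build_command` are hypothetical, and the shape syntax `name:[d0,d1]` is the form Polygraphy accepts for these options.

```python
import shlex

# Input names taken from the log in the original post.
INPUTS = ["concat_ids", "concat_mask", "sent_ids", "pos_ids"]


def shape_args(shape):
    """Format one 'name:[d0,d1,...]' argument per model input."""
    dims = ",".join(str(d) for d in shape)
    return [f"{name}:[{dims}]" for name in INPUTS]


def build_command(model, min_shape, opt_shape, max_shape, cur_shape):
    """Assemble the polygraphy comparison command as an argument list."""
    cmd = ["polygraphy", "run", model, "--trt", "--onnxrt"]
    cmd += ["--trt-min-shapes", *shape_args(min_shape)]
    cmd += ["--trt-opt-shapes", *shape_args(opt_shape)]
    cmd += ["--trt-max-shapes", *shape_args(max_shape)]
    cmd += ["--input-shapes", *shape_args(cur_shape)]
    return cmd


# Assumption: a static (40, 40) profile, matching the failing case in the log.
cmd = build_command("model.onnx", (40, 40), (40, 40), (40, 40), (40, 40))

# shlex.join quotes the bracketed shapes so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

To test a dynamic profile instead, the min, opt, and max shapes can differ while `cur_shape` stays within that range.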

zerollzeng · 2024-05-17T11:54:08Z

Sorry for the late reply, please also check the latest TRT 10 release.

zerollzeng self-assigned this May 17, 2024

zerollzeng added the triaged Issue has been triaged by maintainers label May 17, 2024
