Batchsize=40 failure of TensorRT 8.6.1 when running transformers on GPU A30 #3855

Open
cwlseu opened this issue May 11, 2024 · 2 comments

cwlseu commented May 11, 2024

Description

When the input batch size is large (for example 40, 512, or 1024), the TensorRT output of the model is inconsistent with the ONNX Runtime (onnxrt) output.

In our tests, the difference is tolerable at batch size 32, but the outputs diverge at larger batch sizes, for example batch size 40 with sequence length 40.

Environment

Base Docker Image: nvcr.io/nvidia/tensorrt:24.04-py3

TensorRT Version: 8.6.2

NVIDIA GPU: A30

NVIDIA Driver Version: 535.104.12

CUDA Version: cuda-12.4

CUDNN Version:

Operating System: Ubuntu

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.1.0

Baremetal or Container (if so, version):
Container. The issue also reproduces in an image based on nvcr.io/nvidia/tensorrt:24.04-py3.

issue

[I] RUNNING | Command: /usr/local/bin/polygraphy run opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
[I] onnxrt-runner-N0-05/11/24-03:02:24  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-05/11/24-03:02:24
    ---- Inference Input(s) ----
    {concat_ids [dtype=int32, shape=(40, 40)],
     concat_mask [dtype=bool, shape=(40, 40)],
     sent_ids [dtype=int32, shape=(40, 40)],
     pos_ids [dtype=int32, shape=(40, 40)]}
[I] onnxrt-runner-N0-05/11/24-03:02:24
    ---- Inference Output(s) ----
    {score [dtype=float32, shape=(40,)]}
[I] onnxrt-runner-N0-05/11/24-03:02:24  | Completed 1 iteration(s) in 32.39 ms | Average inference time: 32.39 ms.
[I] Saving inference results to debug_output.json
[I] PASSED | Runtime: 1.813s | Command: /usr/local/bin/polygraphy run cross_model_opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
[I] RUNNING | Command: /usr/local/bin/polygraphy debug build cross_model_opset_18_40_s40.onnx --fp16 --artifacts-dir replays --artifacts replay.json --until=1 --check polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     RUNNING | Iteration 1
[I]     Configuring with profiles:[
            Profile 0:
                {concat_ids [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 concat_mask [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 sent_ids [min=[40, 40], opt=[40, 40], max=[40, 40]],
                 pos_ids [min=[40, 40], opt=[40, 40], max=[40, 40]]}
        ]
[I]     Building engine with configuration:
        Flags                  | [FP16]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 24062.19 MiB, TACTIC_DRAM: 24062.19 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[W]     TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W]     If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W]     Check verbose logs for the list of affected weights.
[W]     - 32 weights are affected by this issue: Detected subnormal FP16 values.
[W]     - 10 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[I]     Finished engine building in 94.185 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[I]     ========== CAPTURED STDOUT ==========
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
        [I] trt-runner-N0-05/11/24-03:04:45     | Activating and starting inference
        [I] Loading bytes from /relevance-match/convert/polygraphy_debug.engine
        [I] trt-runner-N0-05/11/24-03:04:45
            ---- Inference Input(s) ----
            {concat_ids [dtype=int32, shape=(40, 40)],
             concat_mask [dtype=bool, shape=(40, 40)],
             sent_ids [dtype=int32, shape=(40, 40)],
             pos_ids [dtype=int32, shape=(40, 40)]}
        [I] trt-runner-N0-05/11/24-03:04:45
            ---- Inference Output(s) ----
            {score [dtype=float32, shape=(40,)]}
        [I] trt-runner-N0-05/11/24-03:04:45     | Completed 1 iteration(s) in 1285 ms | Average inference time: 1285 ms.
        [I] Loading inference results from debug_output.json
        [I] Accuracy Comparison | trt-runner-N0-05/11/24-03:04:45 vs. onnxrt-runner-N0-05/11/24-03:02:24
        [I]     Comparing Output: 'score' (dtype=float32, shape=(40,)) with 'score' (dtype=float32, shape=(40,))
        [I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-05/11/24-03:04:45: score | Stats: mean=0.89683, std-dev=0.096175, var=0.0092497, median=0.92944, min=0.52441 at (20,), max=0.99219 at (12,), avg-magnitude=0.89683
        [I]             ---- Values ----
                            [0.9667969  0.9243164  0.921875   0.9506836  0.9838867  0.81591797
                             0.93359375 0.93115234 0.9375     0.95166016 0.93847656 0.9482422
                             0.9921875  0.9277344  0.96875    0.88964844 0.8774414  0.9604492
                             0.9013672  0.9614258  0.52441406 0.96240234 0.98291016 0.93603516
                             0.97802734 0.734375   0.8618164  0.67822266 0.91015625 0.68359375
                             0.80371094 0.8535156  0.83984375 0.9194336  0.96533203 0.94970703
                             0.9003906  0.9790039  0.88427734 0.84277344]
        [I]             ---- Histogram ----
                        Bin Range      |  Num Elems | Visualization
                        (0.524, 0.572) |          1 | ##
                        (0.572, 0.62 ) |          0 |
                        (0.62 , 0.667) |          0 |
                        (0.667, 0.715) |          2 | #####
                        (0.715, 0.762) |          1 | ##
                        (0.762, 0.81 ) |          1 | ##
                        (0.81 , 0.857) |          4 | ###########
                        (0.857, 0.905) |          6 | #################
                        (0.905, 0.952) |         14 | ########################################
                        (0.952, 1    ) |         11 | ###############################
        [I]         onnxrt-runner-N0-05/11/24-03:02:24: score | Stats: mean=0.95434, std-dev=0.046837, var=0.0021937, median=0.97326, min=0.82633 at (27,), max=0.99999 at (16,), avg-magnitude=0.95434
        [I]             ---- Values ----
                            [0.99522805 0.9172077  0.94710356 0.9738695  0.99986017 0.90418845
                             0.95747596 0.95086133 0.9116657  0.9969764  0.9866816  0.85559285
                             0.9961591  0.985543   0.9866173  0.90548503 0.9999934  0.99338365
                             0.9253262  0.97857046 0.96708155 0.9988443  0.9822408  0.982418
                             0.96222353 0.93281376 0.8284521  0.8263327  0.97884107 0.8802821
                             0.9248937  0.92526066 0.9966129  0.9726457  0.992773   0.9534787
                             0.9908974  0.9937494  0.9982102  0.91759574]
        [I]             ---- Histogram ----
                        Bin Range      |  Num Elems | Visualization
                        (0.524, 0.572) |          0 |
                        (0.572, 0.62 ) |          0 |
                        (0.62 , 0.667) |          0 |
                        (0.667, 0.715) |          0 |
                        (0.715, 0.762) |          0 |
                        (0.762, 0.81 ) |          0 |
                        (0.81 , 0.857) |          3 | ####
                        (0.857, 0.905) |          2 | ###
                        (0.905, 0.952) |         10 | ################
                        (0.952, 1    ) |         25 | ########################################
        [I]         Error Metrics: score
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.44267] OR [rel=0.45774] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.066281, std-dev=0.079112, var=0.0062587, median=0.034903, min=0.00066936 at (22,), max=0.44267 at (20,), avg-magnitude=0.066281
        [I]                 ---- Values ----
                                [0.02843118 0.00710869 0.02522856 0.02318591 0.01597345 0.08827049
                                 0.02388221 0.01970899 0.02583432 0.04531622 0.04820502 0.09264934
                                 0.00397158 0.05780864 0.01786733 0.0158366  0.12255198 0.03293443
                                 0.02395904 0.01714468 0.44266748 0.03644198 0.00066936 0.04638284
                                 0.01580381 0.19843876 0.0333643  0.14811003 0.06868482 0.19668835
                                 0.12118274 0.07174504 0.15676916 0.05321211 0.02744097 0.00377166
                                 0.09050679 0.01474547 0.11393285 0.07482231]
        [I]                 ---- Histogram ----
                            Bin Range          |  Num Elems | Visualization
                            (0.000669, 0.0449) |         21 | ########################################
                            (0.0449  , 0.0891) |          9 | #################
                            (0.0891  , 0.133 ) |          5 | #########
                            (0.133   , 0.177 ) |          2 | ###
                            (0.177   , 0.222 ) |          2 | ###
                            (0.222   , 0.266 ) |          0 |
                            (0.266   , 0.31  ) |          0 |
                            (0.31    , 0.354 ) |          0 |
                            (0.354   , 0.398 ) |          0 |
                            (0.398   , 0.443 ) |          1 | #
        [I]             Relative Difference | Stats: mean=0.070319, std-dev=0.083742, var=0.0070127, median=0.038379, min=0.00068146 at (22,), max=0.45774 at (20,), avg-magnitude=0.070319
        [I]                 ---- Values ----
                                [0.0285675  0.00775036 0.02663759 0.02380802 0.01597568 0.09762399
                                 0.02494288 0.02072751 0.0283375  0.04545365 0.0488557  0.10828672
                                 0.00398689 0.05865664 0.01810968 0.01748963 0.12255279 0.03315378
                                 0.02589253 0.01752013 0.45773542 0.03648414 0.00068146 0.04721294
                                 0.01642426 0.21273139 0.04027305 0.17923777 0.07016953 0.22343786
                                 0.13102342 0.07754035 0.15730195 0.05470862 0.02764072 0.00395569
                                 0.0913382  0.01483822 0.11413713 0.08154169]
        [I]                 ---- Histogram ----
                            Bin Range          |  Num Elems | Visualization
                            (0.000681, 0.0464) |         22 | ########################################
                            (0.0464  , 0.0921) |          8 | ##############
                            (0.0921  , 0.138 ) |          5 | #########
                            (0.138   , 0.184 ) |          2 | ###
                            (0.184   , 0.229 ) |          2 | ###
                            (0.229   , 0.275 ) |          0 |
                            (0.275   , 0.321 ) |          0 |
                            (0.321   , 0.366 ) |          0 |
                            (0.366   , 0.412 ) |          0 |
                            (0.412   , 0.458 ) |          1 | #
        [E]         FAILED | Output: 'score' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
        [E]     FAILED | Mismatched outputs: ['score']
        [E] Accuracy Summary | trt-runner-N0-05/11/24-03:04:45 vs. onnxrt-runner-N0-05/11/24-03:02:24 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 13.374s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
[E]     ========== CAPTURED STDERR ==========
[E]     Artifact: replay.json does not exist, skipping.
        Was the artifact supposed to be generated?
[E]     FAILED | Iteration 1 | Duration 110.64828515052795s
[I] Finished 1 iteration(s) | Passed: 0/1 | Pass Rate: 0.0%
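
The error metrics in the log can be re-derived from the two printed `score` arrays. The following is a minimal sketch, not Polygraphy's own implementation: the variable names are hypothetical, the values are copied from the log, and it assumes the relative difference is taken with respect to the ONNX Runtime value, which is consistent with the logged maximum of 0.45774.

```python
# Scores copied from the Polygraphy log above (40 values per runner).
trt_scores = [
    0.9667969, 0.9243164, 0.921875, 0.9506836, 0.9838867, 0.81591797,
    0.93359375, 0.93115234, 0.9375, 0.95166016, 0.93847656, 0.9482422,
    0.9921875, 0.9277344, 0.96875, 0.88964844, 0.8774414, 0.9604492,
    0.9013672, 0.9614258, 0.52441406, 0.96240234, 0.98291016, 0.93603516,
    0.97802734, 0.734375, 0.8618164, 0.67822266, 0.91015625, 0.68359375,
    0.80371094, 0.8535156, 0.83984375, 0.9194336, 0.96533203, 0.94970703,
    0.9003906, 0.9790039, 0.88427734, 0.84277344,
]
onnxrt_scores = [
    0.99522805, 0.9172077, 0.94710356, 0.9738695, 0.99986017, 0.90418845,
    0.95747596, 0.95086133, 0.9116657, 0.9969764, 0.9866816, 0.85559285,
    0.9961591, 0.985543, 0.9866173, 0.90548503, 0.9999934, 0.99338365,
    0.9253262, 0.97857046, 0.96708155, 0.9988443, 0.9822408, 0.982418,
    0.96222353, 0.93281376, 0.8284521, 0.8263327, 0.97884107, 0.8802821,
    0.9248937, 0.92526066, 0.9966129, 0.9726457, 0.992773, 0.9534787,
    0.9908974, 0.9937494, 0.9982102, 0.91759574,
]

# Element-wise absolute difference between the two runners.
abs_diff = [abs(t - o) for t, o in zip(trt_scores, onnxrt_scores)]

# Assumption: relative difference is measured against the ONNX Runtime value.
rel_diff = [d / abs(o) for d, o in zip(abs_diff, onnxrt_scores)]

# Index of the batch element with the largest and smallest absolute error.
worst = max(range(len(abs_diff)), key=abs_diff.__getitem__)
best = min(range(len(abs_diff)), key=abs_diff.__getitem__)

print(f"max abs diff  = {abs_diff[worst]:.5f} at index {worst}")
print(f"min abs diff  = {abs_diff[best]:.8f} at index {best}")
print(f"max rel diff  = {max(rel_diff):.5f}")
print(f"mean abs diff = {sum(abs_diff) / len(abs_diff):.6f}")
```

Both the largest absolute and the largest relative error fall on batch element 20, so that element is a natural starting point when bisecting layers.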

Steps To Reproduce

  1. Download the model: https://drive.google.com/file/d/1-ivPE2ZaQESiQbdRxWwyYa7uQfADyVOy/view?usp=sharing
  2. Run the model with ONNX Runtime and save the reference outputs: polygraphy run opset_18_40_s40.onnx --onnxrt --save-outputs debug_output.json
  3. Build an FP16 TensorRT engine and compare it against the saved outputs: polygraphy debug build opset_18_40_s40.onnx --fp16 --artifacts-dir replays --artifacts replay.json --until=1 --check polygraphy run polygraphy_debug.engine --trt --load-outputs debug_output.json
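
Step 3 builds the engine with `--fp16`, so it may help to see how much error FP16 rounding of the final scores could explain on its own. The sketch below is only an illustration under that narrow assumption: `to_fp16` is a hypothetical helper built on the standard `struct` module, and the reference values are a handful of ONNX Runtime scores copied from the log.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# A few ONNX Runtime reference scores copied from the log above.
reference = [0.99522805, 0.9172077, 0.94710356, 0.9738695, 0.99986017, 0.96708155]

# Error introduced if only the final score were rounded to FP16.
rounding_errors = [abs(to_fp16(v) - v) for v in reference]
worst_rounding = max(rounding_errors)

# FP16 has 11 significant bits, so values in [0.5, 1) are spaced 2**-11 apart
# and round-to-nearest is off by at most half of that spacing.
half_ulp = 2.0 ** -12

observed_max_abs_diff = 0.44267  # maximum absolute difference from the log

print(f"worst output-rounding error: {worst_rounding:.6f} (bound {half_ulp:.6f})")
print(f"observed max abs difference: {observed_max_abs_diff}")
print(f"ratio: {observed_max_abs_diff / half_ulp:.0f}x")
```

Rounding the output alone is bounded by about 0.00024, far below the observed 0.44267, which suggests (but does not prove) that the error builds up in intermediate layers rather than in the final cast.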

lix19937 commented May 13, 2024

The maximum absolute difference is 0.44267.

Use the following command to compare TensorRT and ONNX Runtime directly:

# Fill in the shapes for your model before running.
min_shapes=
opt_shapes=
max_shapes=
cur_shapes=

polygraphy run model.onnx --trt --onnxrt \
    --trt-min-shapes $min_shapes \
    --trt-opt-shapes $opt_shapes \
    --trt-max-shapes $max_shapes \
    --input-shapes $cur_shapes
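
The shape variables in the command are left empty. As a hedged sketch, the helper below (the name `shape_arg` is hypothetical) formats shapes in Polygraphy's `name:[d0,d1]` syntax, assuming the four input names and the static 40x40 shape shown in the optimization profile of the original log. Because that profile is static, the same string would serve for the min, opt, max, and current shapes.

```python
def shape_arg(shapes: dict[str, tuple[int, ...]]) -> str:
    """Format input shapes as space-separated Polygraphy `name:[d0,d1,...]` entries."""
    return " ".join(
        f"{name}:[{','.join(str(d) for d in dims)}]" for name, dims in shapes.items()
    )


# Input names and shape taken from the profile printed in the log (assumption:
# all four inputs share the static shape batch=40, sequence length=40).
input_names = ["concat_ids", "concat_mask", "sent_ids", "pos_ids"]
shapes = {name: (40, 40) for name in input_names}

arg = shape_arg(shapes)
print(arg)
```

The printed string can be assigned to each of the four shell variables before running the comparison.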

zerollzeng self-assigned this on May 17, 2024
zerollzeng added the "triaged" label (Issue has been triaged by maintainers) on May 17, 2024
zerollzeng (Collaborator) commented

Sorry for the late reply. Please also check whether the issue reproduces with the latest TRT 10 release.
