Unable to cast Python instance to C++ type of TensorRT 8.4 when running INT8 calibration on GPU A100 #3871

Closed
yjiangling opened this issue May 16, 2024 · 1 comment

Comments

@yjiangling

When I try to run INT8 quantization in Python, it always gives the following error during the calibration procedure:

[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2904, GPU 74855 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2904, GPU 74863 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2904, GPU 74839 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2904, GPU 74847 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +16, now: CPU 130, GPU 272 (MiB)
[05/16/2024-18:22:28] [TRT] [I] Starting Calibration.
[ERROR] Exception caught in get_batch(): Unable to cast Python instance to C++ type (compile in debug mode for details)
[05/16/2024-18:22:30] [TRT] [I] Post Processing Calibration data in 2.704e-06 seconds.
[05/16/2024-18:22:30] [TRT] [E] 1: Unexpected exception _Map_base::at
Failed to create the engine

How can I fix it? The get_batch() function in my calibrator instance is implemented like this:

class ASRCalibrator(trt.IInt8EntropyCalibrator2):
	def __init__(self, calibration_files=[], batch_size=1, cache_file="", preprocess_func=None):
		super().__init__()
		self.cache_file = cache_file
		self.batch_size = batch_size
		self.files = calibration_files
		self.batch = (None, None)

		self.batches = self.load_batches()
		self.preprocess_func = preprocess_func

	def get_batch_size(self):
		return self.batch_size

	def load_batches(self):
		for filename in self.files:
			self.batch = self.preprocess_func(filename)
			yield self.batch

	def get_batch(self, names):
		try:
			batch = next(self.batches)
			data, data_len = batch

			device_input0 = cuda.mem_alloc(data.nbytes)
			device_input1 = cuda.mem_alloc(data_len.nbytes)

			# Copy the calibration data from host (CPU) to device (GPU)
			cuda.memcpy_htod(device_input0, data.ravel())
			cuda.memcpy_htod(device_input1, data_len.ravel())

			return [(device_input0, data.shape), (device_input1, data_len.shape)]

		except StopIteration:
			return []

	def read_calibration_cache(self):
		# If the calibration cache file exists, read the cache from it directly
		if os.path.exists(self.cache_file):
			with open(self.cache_file, "rb") as f:
				return f.read()

	def write_calibration_cache(self, cache):
		# If calibration was performed, write the cache to a file for reuse next time
		with open(self.cache_file, "wb") as f:
			f.write(cache)
			f.flush()
@yjiangling (Author)

@rmccorm4 Hi, I wrote the get_batch() function following your instructions in issue https://github.com/NVIDIA/TensorRT/issues/688, but it still hits the error: RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details). Could you please help me check what's wrong? Thank you very much!
