Segmentation fault happening inside docker container but not outside #12128

Open
subhankardori opened this issue May 16, 2024 · 3 comments
@subhankardori

subhankardori commented May 16, 2024

Hi,
here is a very simple PaddleOCR script that I am running outside the Docker container:

`from paddleocr import PaddleOCR
from matplotlib import pyplot as plt
import cv2
import os
import numpy as np
import glob
from time import time

ocr_model = PaddleOCR(lang='en', use_angle_cls=True, use_gpu=True)

src_dir = 'abc'
dst_dir = 'xyz'

os.makedirs(dst_dir, exist_ok=True)

image_paths = glob.glob(os.path.join(src_dir, '*'))

for img_path in image_paths:
    # Time a single OCR call on this image.
    t1 = time()
    result = ocr_model.ocr(img_path)
    t2 = time()
    print("PaddleOCR time:", (t2 - t1))

    # Each entry of result[0] is indexed as [box, (text, score)].
    boxes = [res[0] for res in result[0]]
    texts = [res[1][0] for res in result[0]]
    scores = [res[1][1] for res in result[0]]

    img = cv2.imread(img_path)
    ann = img.copy()

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Draw every detected box and its label on the annotated copy.
    for box, text, score in zip(boxes, texts, scores):
        box = np.array(box).astype(int)
        cv2.polylines(ann, [box], isClosed=True, color=(0, 255, 0), thickness=2)

        label = f'{text} ({score:.2f})'
        cv2.putText(ann, label, (box[0][0], box[0][1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # Save the annotated image under the same file name in dst_dir.
    filename = os.path.basename(img_path)
    cv2.imwrite(os.path.join(dst_dir, filename), ann)`

Running this code outside the container produces no errors:
[2024/05/16 16:27:43] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/home/shubankar/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/home/shubankar/.paddleocr/whl/rec/en/en_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/home/shubankar/Desktop/envdori/lib/python3.10/site-packages/paddleocr/ppocr/utils/en_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='/home/shubankar/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='en', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2024/05/16 16:27:45] ppocr DEBUG: dt_boxes num : 20, elapsed : 0.26409912109375
[2024/05/16 16:27:45] ppocr DEBUG: cls num : 20, elapsed : 0.03380465507507324
[2024/05/16 16:27:45] ppocr DEBUG: rec_res num : 20, elapsed : 0.0963449478149414
PaddleOCR time: 0.4418179988861084
[2024/05/16 16:27:45] ppocr DEBUG: dt_boxes num : 26, elapsed : 0.037672996520996094
[2024/05/16 16:27:45] ppocr DEBUG: cls num : 26, elapsed : 0.01969003677368164
[2024/05/16 16:27:45] ppocr DEBUG: rec_res num : 26, elapsed : 0.10423398017883301
PaddleOCR time: 0.20465898513793945
[2024/05/16 16:27:45] ppocr DEBUG: dt_boxes num : 46, elapsed : 0.04460334777832031
[2024/05/16 16:27:45] ppocr DEBUG: cls num : 46, elapsed : 0.042574167251586914
[2024/05/16 16:27:46] ppocr DEBUG: rec_res num : 46, elapsed : 0.17322564125061035
PaddleOCR time: 0.30808138847351074
[2024/05/16 16:27:46] ppocr DEBUG: dt_boxes num : 29, elapsed : 0.03878664970397949
[2024/05/16 16:27:46] ppocr DEBUG: cls num : 29, elapsed : 0.030057191848754883
[2024/05/16 16:27:46] ppocr DEBUG: rec_res num : 29, elapsed : 0.11749863624572754
PaddleOCR time: 0.22779440879821777
[2024/05/16 16:27:46] ppocr DEBUG: dt_boxes num : 20, elapsed : 0.03751087188720703
[2024/05/16 16:27:46] ppocr DEBUG: cls num : 20, elapsed : 0.0183870792388916
[2024/05/16 16:27:46] ppocr DEBUG: rec_res num : 20, elapsed : 0.07560086250305176
PaddleOCR time: 0.17191839218139648
[2024/05/16 16:27:46] ppocr DEBUG: dt_boxes num : 36, elapsed : 0.038983821868896484
[2024/05/16 16:27:46] ppocr DEBUG: cls num : 36, elapsed : 0.029161930084228516
[2024/05/16 16:27:46] ppocr DEBUG: rec_res num : 36, elapsed : 0.15292072296142578
PaddleOCR time: 0.26297926902770996
[2024/05/16 16:27:47] ppocr DEBUG: dt_boxes num : 21, elapsed : 0.03756856918334961
[2024/05/16 16:27:47] ppocr DEBUG: cls num : 21, elapsed : 0.02182292938232422
[2024/05/16 16:27:47] ppocr DEBUG: rec_res num : 21, elapsed : 0.08386802673339844
PaddleOCR time: 0.18845915794372559
[2024/05/16 16:27:47] ppocr DEBUG: dt_boxes num : 19, elapsed : 0.046122074127197266
[2024/05/16 16:27:47] ppocr DEBUG: cls num : 19, elapsed : 0.02371668815612793
[2024/05/16 16:27:47] ppocr DEBUG: rec_res num : 19, elapsed : 0.07404541969299316
PaddleOCR time: 0.18595504760742188
[2024/05/16 16:27:47] ppocr DEBUG: dt_boxes num : 47, elapsed : 0.04474592208862305
[2024/05/16 16:27:47] ppocr DEBUG: cls num : 47, elapsed : 0.03702855110168457
[2024/05/16 16:27:47] ppocr DEBUG: rec_res num : 47, elapsed : 0.174943208694458
PaddleOCR time: 0.30758070945739746
[2024/05/16 16:27:48] ppocr DEBUG: dt_boxes num : 24, elapsed : 0.0385584831237793
[2024/05/16 16:27:48] ppocr DEBUG: cls num : 24, elapsed : 0.020543336868286133
[2024/05/16 16:27:48] ppocr DEBUG: rec_res num : 24, elapsed : 0.09856367111206055
PaddleOCR time: 0.19896602630615234
[2024/05/16 16:27:48] ppocr DEBUG: dt_boxes num : 27, elapsed : 0.0385439395904541
[2024/05/16 16:27:48] ppocr DEBUG: cls num : 27, elapsed : 0.023215293884277344
[2024/05/16 16:27:48] ppocr DEBUG: rec_res num : 27, elapsed : 0.10276460647583008
PaddleOCR time: 0.2052445411682129
[2024/05/16 16:27:48] ppocr DEBUG: dt_boxes num : 20, elapsed : 0.035872697830200195
[2024/05/16 16:27:48] ppocr DEBUG: cls num : 20, elapsed : 0.018585205078125
[2024/05/16 16:27:48] ppocr DEBUG: rec_res num : 20, elapsed : 0.07669782638549805
PaddleOCR time: 0.17434000968933105
[2024/05/16 16:27:48] ppocr DEBUG: dt_boxes num : 36, elapsed : 0.0416712760925293
[2024/05/16 16:27:48] ppocr DEBUG: cls num : 36, elapsed : 0.030147075653076172
[2024/05/16 16:27:49] ppocr DEBUG: rec_res num : 36, elapsed : 0.14519858360290527
PaddleOCR time: 0.25902247428894043
[2024/05/16 16:27:49] ppocr DEBUG: dt_boxes num : 24, elapsed : 0.0388178825378418
[2024/05/16 16:27:49] ppocr DEBUG: cls num : 24, elapsed : 0.021373271942138672
[2024/05/16 16:27:49] ppocr DEBUG: rec_res num : 24, elapsed : 0.10416460037231445
PaddleOCR time: 0.20946550369262695
[2024/05/16 16:27:49] ppocr DEBUG: dt_boxes num : 26, elapsed : 0.03809237480163574
[2024/05/16 16:27:49] ppocr DEBUG: cls num : 26, elapsed : 0.028120994567871094
[2024/05/16 16:27:49] ppocr DEBUG: rec_res num : 26, elapsed : 0.09642601013183594
PaddleOCR time: 0.20334506034851074
[2024/05/16 16:27:49] ppocr DEBUG: dt_boxes num : 24, elapsed : 0.03925800323486328
[2024/05/16 16:27:49] ppocr DEBUG: cls num : 24, elapsed : 0.019763946533203125
[2024/05/16 16:27:49] ppocr DEBUG: rec_res num : 24, elapsed : 0.09666180610656738
PaddleOCR time: 0.19680356979370117

However, when I run the same script inside the Docker container, I hit the following error:
[2024/05/16 11:00:15] ppocr ERROR: error in loading image:abc/four-part.1.3.jpg
Traceback (most recent call last):
  File "paddle_and_annotate.py", line 25, in <module>
    result = ocr_model.ocr(img_path)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 660, in ocr
    img = preprocess_image(img)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 650, in preprocess_image
    _image = alpha_to_color(_image, alpha_color)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/ppocr/utils/utility.py", line 86, in alpha_to_color
    if len(img.shape) == 3 and img.shape[2] == 4:
AttributeError: 'NoneType' object has no attribute 'shape'

So I restarted the Docker container, but then got the following:

`root@4eb0df9924f1:/code/build/data# python3 paddle_and_annotate.py
[2024/05/16 11:00:36] ppocr DEBUG: Namespace(alpha=1.0, alphacolor=(255, 255, 255), benchmark=False, beta=1.0, binarize=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/code/build/data/3', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/code/build/data/1', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_id=0, gpu_mem=500, help='==SUPPRESS==', image_dir=None, image_orientation=False, invert=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='en', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv4', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='/usr/local/lib/python3.8/site-packages/paddleocr/ppocr/utils/en_dict.txt', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='/code/build/data/2', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ser_model_dir=None, show_log=True, 
sr_batch_num=1, sr_image_shape='3, 32, 128', sr_model_dir=None, structure_version='PP-StructureV2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)


C++ Traceback (most recent call last):

0 inflateReset2


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: *** Aborted at 1715857238 (unix time) try "date -d @1715857238" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x5895aaca3904) received by PID 94 (TID 0x709c6334fb80) from PID 18446744072279963908 ***]

Segmentation fault (core dumped)`

I have dealt with segfaults before, and they usually go away after downgrading to a previous library version by trial and error. That did not seem to work here.

Kindly help me troubleshoot this (language preference: English)
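
One general way to narrow down a crash like the one above is Python's standard-library `faulthandler` module, which prints the Python-level stack when the process receives a fatal signal such as SIGSEGV. The lines below are a minimal sketch, assuming they are placed at the very top of `paddle_and_annotate.py`, before `paddleocr` is imported. They do not fix the crash; they only show which Python call was active when it happened.

```python
import faulthandler

# Dump the Python traceback of every thread to stderr if the interpreter
# receives a fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).
faulthandler.enable(all_threads=True)

# The rest of the script (the paddleocr, cv2, ... imports and the OCR loop)
# would follow here unchanged.
```

The same effect is available without editing the script by running `python3 -X faulthandler paddle_and_annotate.py`.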

@UserWangZz
Collaborator

[2024/05/16 11:00:15] ppocr ERROR: error in loading image:abc/four-part.1.3.jpg
Traceback (most recent call last):
  File "paddle_and_annotate.py", line 25, in <module>
    result = ocr_model.ocr(img_path)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 660, in ocr
    img = preprocess_image(img)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 650, in preprocess_image
    _image = alpha_to_color(_image, alpha_color)
  File "/usr/local/lib/python3.8/site-packages/paddleocr/ppocr/utils/utility.py", line 86, in alpha_to_color
    if len(img.shape) == 3 and img.shape[2] == 4:
AttributeError: 'NoneType' object has no attribute 'shape'

Based on this error, please make sure the image path you pass in is correct. The error shows that PaddleOCR could not read the image.
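
A quick way to check this inside the container is to test every input path before handing it to OCR. The sketch below is only an illustration: `find_unreadable` and `load_with_cv2` are hypothetical helper names, not part of PaddleOCR, and the only library call assumed is `cv2.imread`, which returns `None` when it cannot read or decode a file.

```python
import os


def load_with_cv2(path):
    """Read an image with OpenCV; the result is None when reading fails."""
    import cv2  # imported lazily so the path checks also work without OpenCV
    return cv2.imread(path)


def find_unreadable(paths, loader=load_with_cv2):
    """Return (path, reason) pairs for inputs that cannot be loaded as images."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            # Missing file, directory, or broken volume mount.
            problems.append((path, "not a regular file"))
        elif os.path.getsize(path) == 0:
            problems.append((path, "empty file"))
        elif loader(path) is None:
            # The file exists but the decoder could not read it.
            problems.append((path, "decoder returned None"))
    return problems
```

Calling `find_unreadable(glob.glob(os.path.join('abc', '*')))` in the container and printing the result would show whether files such as `abc/four-part.1.3.jpg` are missing, empty, or undecodable there. Note that `glob` with `'*'` also returns subdirectories, which this check reports as "not a regular file".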

@subhankardori
Author

Understood. I was able to resolve it by changing the version of paddlepaddle-gpu to 2.5.2.
Can you please help me understand why I am seeing the following warning log, possibly from PaddleOCR?

W0517 10:00:34.561643 259 default_variables.cpp:95] Fail to fscanf: Success [0]
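
Since the fix was a version change, and the logs above show Python 3.10 on the host but Python 3.8 in the container, it can help to print the installed package versions in both environments and compare them. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_versions` is a hypothetical helper, and the package names in the list are assumptions about what each environment has installed.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package names; adjust the list to match the environment.
    names = ["paddlepaddle-gpu", "paddlepaddle", "paddleocr", "opencv-python", "numpy"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running the script once on the host and once in the container gives two lists that can be compared line by line.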

@UserWangZz
Collaborator

This message comes from C++ code and appears to be a warning about reading a file. More context is needed to determine the specific problem.
