
Segmentation fault while training #70

Open
WEIIEW97 opened this issue Sep 2, 2022 · 6 comments

Comments


WEIIEW97 commented Sep 2, 2022

Every time I train this network it crashes with 'Segmentation fault (core dumped)' for the CPU version and 'cudaCheckError() failed : invalid device function. Segmentation fault (core dumped)' for the CUDA version. How can this be solved? Thanks in advance.
[screenshot: error]

@ironheads

same problem

@WEIIEW97
Author

same problem

I fixed this problem by porting the original CUDA code from C++ to Python (numba.cuda), which probably avoids some compilation bugs. (I tried pybind11 as well, but it also failed.)
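
For illustration, a minimal numba.cuda sketch of what such a trilinear forward kernel could look like. The kernel name, the [3, dim, dim, dim] LUT layout, and the flattened [3, n_pixels] image layout below are assumptions made for this sketch, not the exact code in the attachment:

import numpy as np
from numba import cuda

@cuda.jit
def trilinear_forward(lut, img, out, dim, binsize, n_pixels):
    i = cuda.grid(1)
    if i >= n_pixels:
        return
    # img and out are laid out as [3, n_pixels]: one row per colour channel
    r = img[0, i]
    g = img[1, i]
    b = img[2, i]
    # locate the surrounding LUT cell; clamp so id + 1 stays inside the grid
    r_id = min(int(r / binsize), dim - 2)
    g_id = min(int(g / binsize), dim - 2)
    b_id = min(int(b / binsize), dim - 2)
    r_d = (r - r_id * binsize) / binsize
    g_d = (g - g_id * binsize) / binsize
    b_d = (b - b_id * binsize) / binsize
    for c in range(3):
        # blend the 8 LUT entries around (r, g, b) for output channel c
        out[c, i] = (
            (1 - r_d) * (1 - g_d) * (1 - b_d) * lut[c, r_id, g_id, b_id]
            + r_d * (1 - g_d) * (1 - b_d) * lut[c, r_id + 1, g_id, b_id]
            + (1 - r_d) * g_d * (1 - b_d) * lut[c, r_id, g_id + 1, b_id]
            + r_d * g_d * (1 - b_d) * lut[c, r_id + 1, g_id + 1, b_id]
            + (1 - r_d) * (1 - g_d) * b_d * lut[c, r_id, g_id, b_id + 1]
            + r_d * (1 - g_d) * b_d * lut[c, r_id + 1, g_id, b_id + 1]
            + (1 - r_d) * g_d * b_d * lut[c, r_id, g_id + 1, b_id + 1]
            + r_d * g_d * b_d * lut[c, r_id + 1, g_id + 1, b_id + 1]
        )

# example launch (assumes pixel values in [0, 1] and contiguous device arrays)
dim = 33
binsize = 1.0 / (dim - 1)
lut = cuda.to_device(np.random.rand(3, dim, dim, dim).astype(np.float32))
img = cuda.to_device(np.random.rand(3, 4096).astype(np.float32))
out = cuda.device_array_like(img)
threads = 256
blocks = (img.shape[1] + threads - 1) // threads
trilinear_forward[blocks, threads](lut, img, out, dim, binsize, img.shape[1])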


ironheads commented Sep 13, 2022

@WEIIEW97
Hi,
I actually dug into the CUDA C++ file and found that the input image tensor should have shape [CHANNEL_SIZE(3), BATCH_SIZE, WIDTH, HEIGHT]. However, in the Python code the input image is [BATCH_SIZE, CHANNEL_SIZE(3), WIDTH, HEIGHT], so the RGB values are not read correctly. Another thing: you should make sure the image pixel values lie in [0, 1].
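
A small PyTorch illustration (hypothetical, not taken from this repo) of why the layout matters: a kernel that expects [3, batch, width, height] and walks flat memory reads the R/G/B of pixel i at offsets i, i + pixels, and i + 2*pixels, which only holds after the permute:

import torch

batch, width, height = 2, 2, 2
img_bchw = torch.rand(batch, 3, width, height)            # layout the Python code passes in
img_cbhw = img_bchw.permute(1, 0, 2, 3).contiguous()      # layout the kernel expects

pixels = batch * width * height
flat_right = img_cbhw.flatten()   # channel-major: what the kernel assumes
flat_wrong = img_bchw.flatten()   # batch-major: what it actually gets without the permute

i = 5
print(flat_right[i], flat_right[i + pixels], flat_right[i + 2 * pixels])  # the true R, G, B
print(flat_wrong[i], flat_wrong[i + pixels], flat_wrong[i + 2 * pixels])  # mixed-up values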

Actually, I was using this code because another work depends on it, and I use it to re-implement that work, so I had not dug into the Python code of this project. But after I changed the compose procedure (composing the results of the different LUTs) and the TrilinearInterpolationFunction in the Python code, and made sure the values of the image tensor are in [0, 1], the code of that other work (which depends on this one) seems to run normally.

I also changed some of the C++/CUDA code, but I'm not sure whether it helps.

I'd like to know whether you fixed this bug in your Python (numba.cuda) code, and it would be great if you could share your numba.cuda file.

Looking forward to your reply.

Thanks.
trilinear_cpp.zip

@WEIIEW97
Author

@ironheads
Hello,

Thanks for your response and for sharing! I did not dig as deep as you did, so I really appreciate the information.
Attached below is my Python (numba.cuda) code. I cannot guarantee robustness since it is just a basic migration.

trilinearcuda.zip


ironheads commented Sep 16, 2022

@WEIIEW97
Thanks for sharing.
If it is a basic migration, I think some code in the original training/evaluation Python scripts still needs to be modified. Some of the modifications are shown below.

# LUT0 = Generator3DLUT_identity()
# LUT1 = Generator3DLUT_zero()
# LUT2 = Generator3DLUT_zero()
#...
#...
# img = a batch of images from the dataset with shape [batch_size,3,width,height];
# make sure the values are in range [0,1] (my segmentation fault came from this)
# the following code comes from image_adaptive_lut_train_paired.py
pred = classifier(img).squeeze() # img is still [batch_size,3,width,height]
# then modify the code as follows
new_img = img.permute(1,0,2,3).contiguous()
gen_A0 = LUT0(new_img)
gen_A1 = LUT1(new_img)
gen_A2 = LUT2(new_img)
combine_A = new_img.new(new_img.size())
for b in range(new_img.size(1)):
    combine_A[:,b,:,:] = pred[b,0] * gen_A0[:,b,:,:] + pred[b,1] * gen_A1[:,b,:,:] + pred[b,2] * gen_A2[:,b,:,:] #+ pred[b,3] * gen_A3[:,b,:,:] + pred[b,4] * gen_A4[:,b,:,:]

result_A = combine_A.permute(1,0,2,3) #get the [batch_size,3,width,height] combined image
# the key is to make the LUT's input have shape [3,batch_size,width,height]; other code that uses the LUT may also need to be modified. I don't list it all because I do not use all the Python code in this work.

# another important modification is TrilinearInterpolationFunction; it should be modified as follows.
class TrilinearInterpolationFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, lut, x):
        x = x.contiguous()
        output = x.new(x.size())
        dim = lut.size()[-1]
        shift = dim ** 3
        binsize = 1.000001 / (dim-1)
        W = x.size(2)
        H = x.size(3)
        batch = x.size(1) # this changes
        # print(batch)
        assert 1 == trilinear.forward(lut, 
                                      x, 
                                      output,
                                      dim, 
                                      shift, 
                                      binsize, 
                                      W, 
                                      H, 
                                      batch)

        int_package = torch.IntTensor([dim, shift, W, H, batch])
        float_package = torch.FloatTensor([binsize])
        variables = [lut, x, int_package, float_package]
        
        ctx.save_for_backward(*variables)
        
        return lut, output
    
    @staticmethod
    def backward(ctx, lut_grad, x_grad):
        
        lut, x, int_package, float_package = ctx.saved_variables
        dim, shift, W, H, batch = int_package
        dim, shift, W, H, batch = int(dim), int(shift), int(W), int(H), int(batch)
        binsize = float(float_package[0])
            
        assert 1 == trilinear.backward(x, 
                                       x_grad, 
                                       lut_grad,
                                       dim, 
                                       shift, 
                                       binsize, 
                                       W, 
                                       H, 
                                       batch)
        return lut_grad, x_grad

All the modifications aim to make the LUT's input shape [3, batch_size, width, height] and then reshape the LUT's result back to [batch_size, 3, width, height].

Another important thing is that the input values should be in the range [0, 1].
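
For example, a minimal way to ensure this, assuming a standard torchvision pipeline (the transform composition below is illustrative, not necessarily what this repo's dataset code uses):

import torchvision.transforms as transforms

# ToTensor converts uint8 PIL images / numpy arrays to float tensors scaled to [0, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
])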

I don't know whether this helps if your code fails when batch_size > 1, but if you set batch_size > 1 you do need to modify the code, because with batch_size > 1 the original code is not a correct implementation of the LUT.


WEIIEW97 commented Sep 16, 2022

@ironheads
Thank you so much! I will do further research on it.
