
Extensions #15

Open
pythonometrist opened this issue Sep 24, 2019 · 13 comments

@pythonometrist

Thanks to your help, I have added custom losses, special initialization, and a bunch of other things as extensions.

I am now trying to modify the sentence classification model itself. It is a linear layer on top of the BERT model. What I would like to do is a) freeze all of BERT, and b) add a CNN on top, along the lines of https://github.com/Shawn1993/cnn-text-classification-pytorch/blob/master/model.py

I want to compare results with a frozen and an unfrozen BERT. Any pointers would be most appreciated.

@ThilinaRajapakse
Owner

Should be pretty similar to adding custom losses. You can freeze all the layers by setting requires_grad = False for all of them in your subclassed model. You can add your convolutional layers to it as well, and define how you want them to be used in the forward method.
Hopefully, it won't mess with loading the weights from the pretrained model. I don't think it will.
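
A rough sketch of what I mean (untested; it assumes the BertModel class from pytorch_transformers and made-up hyperparameters, so adapt the names to your setup):

```python
import torch
import torch.nn as nn
from pytorch_transformers import BertModel  # or transformers, depending on your install

class BertCnnClassifier(nn.Module):
    """Illustrative only: frozen BERT encoder with a small CNN head on top."""

    def __init__(self, pretrained_name, num_labels, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_name)
        # a) freeze every BERT parameter
        for param in self.bert.parameters():
            param.requires_grad = False
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # b) convolutions over the token dimension of the last hidden states
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        # sequence_output: (batch, seq_len, hidden)
        sequence_output = self.bert(input_ids,
                                    attention_mask=attention_mask,
                                    token_type_ids=token_type_ids)[0]
        x = sequence_output.transpose(1, 2)          # (batch, hidden, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2)[0]  # max-pool over time per kernel size
                  for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.classifier(features)             # logits: (batch, num_labels)
```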

@pythonometrist
Author

Cool - let me try it out. config.hidden_size is the size of the last layer from BERT (and, in some sense, the size of my embedding), but I am struggling to figure out the vocabulary size. It's probably the BERT vocabulary size hiding somewhere in the config. max_seq_length is user-specified, so we can already assume padded sequences. Agreed, the rest is carefully initializing the model and writing the forward method correctly (which might be non-trivial for me!). Let me get back to you. Thanks.
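
For reference (and someone correct me if this is wrong), the vocabulary size should be config.vocab_size, but if BERT's hidden states replace the embedding lookup from the CNN reference model, the CNN head may never need it. Roughly, assuming bert-base-uncased:

```python
from pytorch_transformers import BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")
print(config.vocab_size)   # ~30522: the BERT vocabulary size; only the tokenizer/embeddings need it
print(config.hidden_size)  # 768: plays the role of the embedding dimension for a CNN head
# max_seq_length is user-specified and sequences are already padded/truncated to it,
# so a CNN on top would see tensors of shape (batch_size, max_seq_length, hidden_size).
```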

@ThilinaRajapakse
Owner

If it doesn't work, you can always decouple BERT and the CNN and just feed the BERT outputs to the CNN.

I'm no expert myself, but you seem to be doing fine to me!
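
If you do go the decoupled route, something like this should work (just a sketch; bert, cnn_head, and dataloader are placeholders for your own objects): cache the BERT outputs once, then train the CNN on them separately.

```python
import torch
from torch.utils.data import TensorDataset

bert.eval()
features, labels = [], []
with torch.no_grad():  # the frozen encoder never needs gradients
    for input_ids, attention_mask, batch_labels in dataloader:
        sequence_output = bert(input_ids, attention_mask=attention_mask)[0]
        features.append(sequence_output.cpu())
        labels.append(batch_labels)

# Cache the features and train the CNN head on them like any ordinary dataset
cached = TensorDataset(torch.cat(features), torch.cat(labels))
```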

@pythonometrist
Author

pythonometrist commented Sep 24, 2019

Well - I got a model to work with some simple linear layers, so that is progress. I need to work out tensor sizes - BERT is sending out tensors of shape (64 x 768), where 64 is the batch size. I assume that for each sentence I am receiving one embedding of size 768. I've got to figure out how to go from there to a vocabulary x document matrix - I think it means that somewhere BERT is averaging over the words. Or I simply need to forget about word embeddings and do a 1D convolution at the document level... will think some more and update.
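
(A hedged guess at the shapes, with bert and the input tensors assumed to be prepared elsewhere: the (64 x 768) tensor looks like the pooled [CLS] output, while the per-token embeddings live in the first element of the output tuple, so no vocabulary x document matrix should be needed.)

```python
outputs = bert(input_ids, attention_mask=attention_mask)
sequence_output = outputs[0]  # (64, max_seq_length, 768): one 768-dim vector per token
pooled_output = outputs[1]    # (64, 768): a single summary vector per sentence ([CLS] pooler)

# A word-level CNN would convolve over the token axis of sequence_output;
# the pooled output only supports sentence/document-level layers.
cnn_input = sequence_output.transpose(1, 2)  # (64, 768, max_seq_length) for nn.Conv1d
```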

@pythonometrist
Author

You da boss. Yep, you can do all sorts of models once you realize they offer access to all the layers to convolve / LSTM over. I am curious whether you know about the apex installation - one version seems to be pure Python while the other uses a C++ compiler - which one do you use?
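
For reference, this is roughly how I am grabbing the intermediate layers (a sketch; it assumes pytorch_transformers and bert-base-uncased, with input_ids prepared elsewhere):

```python
from pytorch_transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased")
config.output_hidden_states = True  # ask for every encoder layer, not just the last one
bert = BertModel.from_pretrained("bert-base-uncased", config=config)

outputs = bert(input_ids)          # input_ids assumed prepared elsewhere
all_hidden_states = outputs[-1]    # tuple of 13 tensors: embeddings + 12 encoder layers
# Each entry is (batch, seq_len, hidden_size) and can feed a CNN or LSTM head.
```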

@ThilinaRajapakse
Owner

Great!

I use the Apex version with C++ extensions. The pure python version is lacking a few features. I don't see any reason not to use the C++ version.

@pythonometrist
Author

I am having some issues with apex on a Debian server... well, fingers crossed. Thanks for all the input! I had been wanting to get into PyTorch for a while and now I am in!

@ThilinaRajapakse
Owner

Odd. I never had issues with any Ubuntu-based distros.

Welcome to PyTorch!

@pythonometrist
Author

Thanks - it's a server which is stuck on pip 8.1. But it looks like I could get it to work with conda. Fingers crossed.

@pythonometrist
Author

OK, it works with conda! Should apex keep_batchnorm_fp32 be True? And O1 vs O2 - which one worked for you?

@ThilinaRajapakse
Owner

I don't think I changed batchnorm. Doesn't it get set when you change the opt level? I used opt level O1. O2 was giving me NaN losses.
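
For context, the amp setup I use is essentially the documented pattern (a sketch; model, optimizer, and loss come from your own training loop):

```python
from apex import amp

# Wrap the model and optimizer once, before the training loop
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Inside the training loop, scale the loss so fp16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```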

@pythonometrist
Author

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic

That is the default when I run the models - not sure if keep_batchnorm_fp32 : None should be something else. I'll dig around and report.
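
(From what I can tell from the apex docs - treat this as an assumption - keep_batchnorm_fp32 only matters when the whole model is cast to fp16, i.e. O2/O3; with O1 the patched torch functions handle it, so None should be fine. If you do try O2, the override looks like this:)

```python
# Hypothetical override for O2, per the apex amp documentation
model, optimizer = amp.initialize(model, optimizer, opt_level="O2", keep_batchnorm_fp32=True)
```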

@ThilinaRajapakse
Owner

Yeah, I just kept the defaults there.
