
Loss Curve #10

Open
pvcastro opened this issue Mar 25, 2020 · 11 comments


@pvcastro

Hi @plkmo !

Great work!

Do you have the loss curves available from your training runs (MTB pre-training and SemEval fine-tuning), so we can check whether our experiments match yours?

Thanks!

@pvcastro

Sorry, I missed the results folder.

@plkmo

plkmo commented Mar 26, 2020

Hi, yes, the training loss curve for SemEval training is in the results folder. Please note that the MTB model has been updated, so the old loss curve for MTB pre-training should be ignored.

@pvcastro

Thanks @plkmo !
Are you uploading an updated one?

@plkmo

plkmo commented Mar 28, 2020

Yup, will do so once I have the GPU compute available to satisfactorily pre-train it on suitable data.

@pvcastro pvcastro reopened this Mar 31, 2020
@pvcastro

[attached plots: loss_vs_epoch_0, accuracy_vs_epoch_0]
I'm reopening since we're still discussing this 😅
I got these losses. Do you think they are ok?

@plkmo

plkmo commented Apr 7, 2020

Looks good; I also got something like this with the CNN dataset. But note that the loss consists of lm_loss + MTB_loss. From what I can see, lm_loss seems to decrease much more than the MTB loss.

If you can, try a larger dataset for MTB pre-training, as the CNN dataset might be too small. E.g. the paper used Wikipedia dump data, which is huge.
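The point about the loss being a sum of two components can be sketched in a few lines. This is a minimal pure-Python illustration (function names and inputs are mine, not the repo's) of why it helps to log lm_loss and mtb_loss separately: the total can fall even when the MTB term is flat.

```python
import math

def masked_lm_loss(token_log_probs):
    # mean negative log-likelihood over the masked positions only
    return -sum(token_log_probs) / len(token_log_probs)

def mtb_loss(scores, labels):
    # binary cross-entropy on the "same entity pair?" score of each sentence pair
    losses = []
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the matching score
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

def total_loss(token_log_probs, scores, labels):
    lm = masked_lm_loss(token_log_probs)
    mtb = mtb_loss(scores, labels)
    # return the components as well, so curves can be plotted per term
    return lm + mtb, lm, mtb
```

Plotting the two returned components separately makes it obvious whether a falling total curve reflects genuine MTB progress or just the language-model term improving.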

@pvcastro

pvcastro commented Apr 7, 2020

@plkmo From what I could see, you weren't able to get good results from MTB using the CNN dataset either, right? I did a pre-training run and applied it to the task afterwards, and the results were quite a bit worse than using BERT alone.

@plkmo

plkmo commented Apr 11, 2020

Yeah, no good results pretraining MTB based on CNN dataset so far. Best is to directly fine-tune using pre-trained BERT.

@potatoper

Sorry to bother you, but when I ran the program on the first day, it worked. The next day, the same program failed with:

IndexError: ('list index out of range', 'occurred at index 47')

Please help; I'd really appreciate it if you could have a look. Sorry for taking your time, and thanks a lot.

prog-bar: 100%|██████████| 8000/8000 [00:01<00:00, 4026.81it/s]
prog-bar: 1%| | 96/8000 [00:00<00:00, 13751.82it/s]
Traceback (most recent call last):
File "C:/article/MTB/main_task.py", line 49, in <module>
net = train_and_fit(args)
File "C:\article\MTB\src\tasks\trainer.py", line 33, in train_and_fit
train_loader, test_loader, train_len, test_len = load_dataloaders(args)
File "C:\article\MTB\src\tasks\preprocessing_funcs.py", line 178, in load_dataloaders
train_set = semeval_dataset(df_train, tokenizer=tokenizer, e1_id=e1_id, e2_id=e2_id)
File "C:\article\MTB\src\tasks\preprocessing_funcs.py", line 133, in __init__
e1_id=self.e1_id, e2_id=self.e2_id), axis=1)
File "C:\ProgramData\Anaconda3\lib\site-packages\tqdm\std.py", line 767, in inner
return getattr(df, df_function)(wrapper, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 6004, in apply
return op.get_result()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py", line 142, in get_result
return self.apply_standard()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py", line 248, in apply_standard
self.apply_series_generator()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py", line 277, in apply_series_generator
results[i] = self.f(v)
File "C:\ProgramData\Anaconda3\lib\site-packages\tqdm\std.py", line 762, in wrapper
return func(*args, **kwargs)
File "C:\article\MTB\src\tasks\preprocessing_funcs.py", line 133, in <lambda>
e1_id=self.e1_id, e2_id=self.e2_id), axis=1)
File "C:\article\MTB\src\tasks\preprocessing_funcs.py", line 129, in get_e1e2_start
e1_e2_start = ([i for i, e in enumerate(x) if e == e1_id][0] , [i for i, e in enumerate(x) if e == e2_id][0])
IndexError: ('list index out of range', 'occurred at index 47')
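The last frame shows the actual failure: `[i for i, e in enumerate(x) if e == e1_id][0]` indexes into an empty list, which means one of the entity marker ids never appears in that tokenized row (commonly because truncation dropped the marker, or the tokenizer changed between runs so e1_id/e2_id shifted). A hedged sketch of a defensive variant, mirroring the names in the traceback; the idea of having the caller drop rows that return None is my assumption, not the repo's behaviour:

```python
def get_e1e2_start(tokens, e1_id, e2_id):
    # Return the positions of the entity marker tokens, or None if either
    # marker is missing from this tokenized sequence, so the caller can
    # filter out such rows instead of crashing with an IndexError.
    try:
        e1_start = next(i for i, t in enumerate(tokens) if t == e1_id)
        e2_start = next(i for i, t in enumerate(tokens) if t == e2_id)
    except StopIteration:
        return None
    return (e1_start, e2_start)
```

It is also worth checking that the tokenizer being loaded on the second run is the same one that produced the saved, preprocessed data; a stale pickle with old token ids would trigger the same error.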

@zjucheri

zjucheri commented Jul 9, 2020

Yeah, no good results pretraining MTB based on CNN dataset so far. Best is to directly fine-tune using pre-trained BERT.

My results from MTB pre-training on the CNN dataset are bad too, and pre-training takes a long time. I wonder how long it takes to pre-train MTB to a good result?

@drevicko

@zjucheri: I found that MTB training on the CNN data beyond about 9 epochs degraded performance on FewRel. The key to better performance is probably to use a larger (and perhaps more relevant, or at least more generic) dataset such as Wikipedia.

@plkmo: Thanks for sharing your rather nice code :)
