[Question] Predict labels given a translation #196

Open · echan00 opened this issue Aug 7, 2019 · 24 comments
Labels: pinned, question (Further information is requested)

echan00 (Contributor) commented Aug 7, 2019

@BrikerMan I want to say thank you for developing this library and providing incredible support along with it. Kudos to you!

I want to create a model that predicts the labels for a translation, given the source sentence and its labels. For example:

# THREE INPUTS
这是一只狗和这是一只红猫
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2
This is a dog and that is a panda

# OUTPUT
0 0 0 B-TERM1 0 0 0 0 B-TERM2     (the labels for the English input sentence)

My question is: do you think this is possible with your library if I customize my own multi-input model? And if so, do you have any tips or suggestions for me?
It looks like I will mainly be using the stacked-embedding feature, and I am also referencing this article: https://kashgari.bmio.net/advance-use/multi-output-model/

echan00 added the question label on Aug 7, 2019
echan00 (Contributor, Author) commented Aug 8, 2019

Also, do you think it is better to create a model that only labels one term, e.g. B-TERM?

Ultimately, I want to be able to identify multiple terms (B-TERM1, B-TERM2, B-TERM3, etc.) in the translated text, but it's not clear to me whether this would become an issue for the model, since B-TERM1, B-TERM2, B-TERM3 are just identifiers rather than distinct labels.

echan00 (Contributor, Author) commented Aug 8, 2019

I'm thinking out loud here... since this is ultimately a transformer model, I presume the sequences need to be aligned.

Would I structure the data this way?

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 0

# OUTPUT
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 B-TERM1 0 0 0 0 B-TERM2 

BrikerMan (Owner) commented:

> I'm thinking out loud here... since this is ultimately a transformer model, I presume the sequences need to be aligned.
>
> Would I structure the data this way?
>
> # TWO INPUTS
> 这是一只狗和这是一只红猫<div>This is a dog and that is a panda
> 0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 0
>
> # OUTPUT
> 0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 B-TERM1 0 0 0 0 B-TERM2

I think this is the right way to do it. Just stack a text embedding and a numeric-feature embedding layer, then convert the second input into a numeric feature.

Only one thing you need to keep in mind: when using a pre-trained embedding for the text, use embedding.processor.bos_token (this might be the wrong key, but something similar) as the separator token.
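
For reference, here is a minimal sketch of that stacking with Kashgari's numeric-feature support. The BERT model folder, the sequence length, and the label2idx dict are assumptions for illustration; check the handle-numeric-features docs for the exact API.

import kashgari
from kashgari.embeddings import BERTEmbedding, NumericFeaturesEmbedding, StackedEmbedding

SEQ_LEN = 128  # assumed; pick one that fits the concatenated source<SEP>target sequence

# Text embedding over the concatenated token sequence
text_embedding = BERTEmbedding('<bert-model-folder>',
                               task=kashgari.LABELING,
                               sequence_length=SEQ_LEN)

# Numeric-feature embedding over the source-side label ids (0 = O, 1 = B-TERM1, ...)
label_embedding = NumericFeaturesEmbedding(feature_count=len(label2idx),
                                           feature_name='source_labels',
                                           sequence_length=SEQ_LEN)

stack_embedding = StackedEmbedding([text_embedding, label_embedding])

# x is a tuple of aligned inputs: (token sequences, numeric label sequences)
stack_embedding.analyze_corpus((train_tokens, train_label_ids), train_y)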

echan00 (Contributor, Author) commented Aug 8, 2019

Thanks, I will try this.

How do you suggest I convert the second input into a numeric feature?
I want to keep B-TERM and I-TERM to capture the beginning and the rest of each term. But I also want to be able to use B-TERM1, I-TERM1, B-TERM2, I-TERM2, B-TERM3, I-TERM3, etc., so I know which term correlates with which on the other side.

In my example above:
B-TERM1 is 狗 on the Chinese side and dog on the English side;
B-TERM2 and I-TERM2 are 红猫 on the Chinese side, and B-TERM2 is panda on the English side.

I want the model to be able to specify the location of B-TERM1 and B-TERM2 (and more terms) in the translation.

BrikerMan (Owner) commented Aug 9, 2019

Just make a label dict, then convert the labels to a numeric sequence, e.g.:

dic = {
    "O": 0,
    "B-TERM1": 1,
    "B-TERM2": 2,
    "B-TERM3": 3,
    "B-TERM4": 4,
}

This will work.
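
Applied to the running example, the conversion is a per-token lookup against that dict. A sketch (the fallback to O for unseen labels is an added assumption):

def labels_to_ids(labels, label2idx):
    # Map each token-level label to its integer id; unseen labels fall back to O
    return [label2idx.get(label, label2idx["O"]) for label in labels]

labels_to_ids("O O O O B-TERM1 O O O O O B-TERM2".split(), dic)
# -> [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2]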

echan00 (Contributor, Author) commented Aug 27, 2019

What about the [SEP], [CLS], [BOS] tokens in a numeric sequence? Do I keep them as they are, or also convert them to digits?

BrikerMan (Owner) commented Aug 28, 2019

You need to convert those tokens to digits.
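
Concretely, that means reserving ids for the special tokens in the same dict, for example (the exact token strings and the id assignment are assumptions; use whatever your embedding's processor emits):

dic = {
    "[PAD]": 0,
    "[CLS]": 1,
    "[SEP]": 2,
    "[BOS]": 3,
    "O": 4,
    "B-TERM1": 5,
    "I-TERM1": 6,
    "B-TERM2": 7,
    "I-TERM2": 8,
    # ... and so on for the remaining terms
}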

echan00 (Contributor, Author) commented Aug 29, 2019

So it looks like my labeled data includes examples where B-TERM can go as high as B-TERM26 in one sentence (26 different terms tagged in the source and target text). Two questions:

1) Does it make sense to create additional training data where B-TERM1, B-TERM2, B-TERM3,..., B-TERM26 are more obviously interchangeable?

For example,
a) The original labeled data:

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 0
 
# OUTPUT
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 B-TERM1 0 0 0 0 B-TERM2

b) The example above, augmented to make sure the model learns that B-TERM1 through B-TERM26 are interchangeable.

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 B-TERM13 0 0 0 0 0 B-TERM26 I-TERM26<div>0 0 0 0 0 0 0 0 0
 
# OUTPUT
0 0 0 0 B-TERM13 0 0 0 0 0 B-TERM26 I-TERM26<div>0 0 0 B-TERM13 0 0 0 0 B-TERM26

2) Since the model only learns word alignment, would it make sense to have data where fewer tags are used?

For example,
a) The original labeled data:

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 0
 
# OUTPUT
0 0 0 0 B-TERM1 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 B-TERM1 0 0 0 0 B-TERM2

b) The example above, augmented to create two new examples:
i)

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 B-TERM13 0 0 0 0 0 0 0<div>0 0 0 0 0 0 0 0 0
 
# OUTPUT
0 0 0 0 B-TERM13 0 0 0 0 0 0 0<div>0 0 0 B-TERM13 0 0 0 0 0

ii)

# TWO INPUTS
这是一只狗和这是一只红猫<div>This is a dog and that is a panda
0 0 0 0 0 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 0
 
# OUTPUT
0 0 0 0 0 0 0 0 0 0 B-TERM2 I-TERM2<div>0 0 0 0 0 0 0 0 B-TERM2

BrikerMan (Owner) commented Aug 29, 2019

Yes. Since you don't have 26 tags in one sentence, and all you want the model to do is locate each entity's translation separately, approach ii) is much better and simpler for the model to learn.
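
If it helps, the ii)-style augmentation is mechanical: for each distinct term in a sentence, emit one copy of the example that keeps only that term's tags and blanks out the rest. A rough sketch, assuming the B-/I-TERM<n> label format shown above:

import re

TERM_RE = re.compile(r"[BI]-TERM(\d+)$")

def term_id(label):
    m = TERM_RE.match(label)
    return m.group(1) if m else None

def split_by_term(labels):
    # Yield one label sequence per distinct term, keeping only that term's tags
    ids = sorted({term_id(l) for l in labels if term_id(l)}, key=int)
    for tid in ids:
        yield [l if term_id(l) == tid else "O" for l in labels]

for seq in split_by_term("O O O O B-TERM1 O O O O O B-TERM2 I-TERM2".split()):
    print(" ".join(seq))
# O O O O B-TERM1 O O O O O O O
# O O O O O O O O O O B-TERM2 I-TERM2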

echan00 (Contributor, Author) commented Aug 29, 2019

Yes, I agree that a single term is much simpler for the model to learn.

However, in the case where a sentence has 26 different terms (most sentences average 5-6 terms), a model trained on only one term at a time will require one inference pass per term, typically 5-6 per sentence, correct?
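
For what it's worth, a sketch of what that per-term loop could look like at inference time. Passing a tuple of input lists to model.predict follows the stacked-embedding convention sketched earlier, and mapping the generic predicted tag back to a per-term id is an assumed post-processing step:

def predict_all_terms(model, tokens, per_term_label_ids):
    # One prediction pass per source term; merge the per-pass outputs
    merged = ["O"] * len(tokens)
    for term_idx, label_ids in enumerate(per_term_label_ids, start=1):
        pred = model.predict(([tokens], [label_ids]))[0]
        for i, tag in enumerate(pred[:len(merged)]):
            if tag != "O" and merged[i] == "O":
                # The single-term model predicts a generic TERM tag;
                # rename it to this pass's term id
                merged[i] = tag.replace("TERM1", f"TERM{term_idx}")
    return merged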

echan00 (Contributor, Author) commented Sep 5, 2019

Posting my model and evaluation results below. It seems like something went wrong: val_accuracy dropped significantly at epochs 7 and 9 (from 0.4062 -> 0.0117 -> 0.0078), and the model basically makes no useful predictions.

>>> model = BiLSTM_CRF_Model(embedding=stack_embedding)
>>> model.fit(train_final_x, train_final_y, valid_final_x, valid_final_y, epochs=30)

Model: "model_13"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input-Token (InputLayer)        [(None, 512)]        0                                            
__________________________________________________________________________________________________
Input-Segment (InputLayer)      [(None, 512)]        0                                            
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 512, 768), ( 91812096    Input-Token[0][0]                
__________________________________________________________________________________________________
Embedding-Segment (Embedding)   (None, 512, 768)     1536        Input-Segment[0][0]              
__________________________________________________________________________________________________
Embedding-Token-Segment (Add)   (None, 512, 768)     0           Embedding-Token[0][0]            
                                                                 Embedding-Segment[0][0]          
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 512, 768)     393216      Embedding-Token-Segment[0][0]    
__________________________________________________________________________________________________
Embedding-Dropout (Dropout)     (None, 512, 768)     0           Embedding-Position[0][0]         
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 512, 768)     1536        Embedding-Dropout[0][0]          
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     2362368     Embedding-Norm[0][0]             
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     0           Embedding-Norm[0][0]             
                                                                 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-1-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-1-MultiHeadSelfAttention-
                                                                 Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-1-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-1-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-2-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-1-FeedForward-Norm[0][0] 
                                                                 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-2-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-2-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-2-MultiHeadSelfAttention-
                                                                 Encoder-2-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-2-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-2-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-2-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-3-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-2-FeedForward-Norm[0][0] 
                                                                 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-3-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-3-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-3-MultiHeadSelfAttention-
                                                                 Encoder-3-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-3-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-3-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-3-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-4-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-3-FeedForward-Norm[0][0] 
                                                                 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-4-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-4-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-4-MultiHeadSelfAttention-
                                                                 Encoder-4-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-4-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-4-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-4-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-5-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-4-FeedForward-Norm[0][0] 
                                                                 Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-5-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-5-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-5-MultiHeadSelfAttention-
                                                                 Encoder-5-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-5-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-5-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-5-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-6-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-5-FeedForward-Norm[0][0] 
                                                                 Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-6-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-6-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-6-MultiHeadSelfAttention-
                                                                 Encoder-6-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-6-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-6-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-6-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-7-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-6-FeedForward-Norm[0][0] 
                                                                 Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-7-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-7-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-7-MultiHeadSelfAttention-
                                                                 Encoder-7-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-7-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-7-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-7-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-8-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-7-FeedForward-Norm[0][0] 
                                                                 Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-8-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-8-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-8-MultiHeadSelfAttention-
                                                                 Encoder-8-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-8-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-8-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-8-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-9-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-8-FeedForward-Norm[0][0] 
                                                                 Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-9-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-9-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-9-MultiHeadSelfAttention-
                                                                 Encoder-9-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-9-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-9-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-9-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-9-FeedForward-Norm[0][0] 
                                                                 Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward-Dropout  (None, 512, 768)     0           Encoder-10-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-10-FeedForward-Add (Add (None, 512, 768)     0           Encoder-10-MultiHeadSelfAttention
                                                                 Encoder-10-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-10-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-10-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-10-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-10-FeedForward-Norm[0][0]
                                                                 Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward-Dropout  (None, 512, 768)     0           Encoder-11-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-11-FeedForward-Add (Add (None, 512, 768)     0           Encoder-11-MultiHeadSelfAttention
                                                                 Encoder-11-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-11-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-11-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-11-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-11-FeedForward-Norm[0][0]
                                                                 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward-Dropout  (None, 512, 768)     0           Encoder-12-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-12-FeedForward-Add (Add (None, 512, 768)     0           Encoder-12-MultiHeadSelfAttention
                                                                 Encoder-12-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-12-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-Output (Concatenate)    (None, 512, 3072)    0           Encoder-9-FeedForward-Norm[0][0] 
                                                                 Encoder-10-FeedForward-Norm[0][0]
                                                                 Encoder-11-FeedForward-Norm[0][0]
                                                                 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
input_prelabel (InputLayer)     [(None, 512)]        0                                            
__________________________________________________________________________________________________
non_masking_layer_1 (NonMasking (None, 512, 3072)    0           Encoder-Output[0][0]             
__________________________________________________________________________________________________
layer_embedding_prelabel (Embed (None, 512, 48)      336         input_prelabel[0][0]             
__________________________________________________________________________________________________
layer_concatenate (Concatenate) (None, 512, 3120)    0           non_masking_layer_1[0][0]        
                                                                 layer_embedding_prelabel[0][0]   
__________________________________________________________________________________________________
layer_blstm (Bidirectional)     (None, 512, 256)     3328000     layer_concatenate[0][0]          
__________________________________________________________________________________________________
layer_dense (Dense)             (None, 512, 64)      16448       layer_blstm[0][0]                
__________________________________________________________________________________________________
layer_crf_dense (Dense)         (None, 512, 7)       455         layer_dense[0][0]                
__________________________________________________________________________________________________
layer_crf (CRF)                 (None, 512, 7)       49          layer_crf_dense[0][0]            
==================================================================================================
Total params: 180,608,136
Trainable params: 3,345,288
Non-trainable params: 177,262,848
__________________________________________________________________________________________________
Epoch 1/30
138/138 [==============================] - 168s 1s/step - loss: 29.4145 - accuracy: 0.9795 - val_loss: 874.7754 - val_accuracy: 0.4062
Epoch 2/30
138/138 [==============================] - 162s 1s/step - loss: 5.9310 - accuracy: 0.9922 - val_loss: 868.6963 - val_accuracy: 0.4062
Epoch 3/30
138/138 [==============================] - 160s 1s/step - loss: 4.6266 - accuracy: 0.9923 - val_loss: 863.4594 - val_accuracy: 0.4043
Epoch 4/30
138/138 [==============================] - 160s 1s/step - loss: 4.1985 - accuracy: 0.9922 - val_loss: 858.2342 - val_accuracy: 0.4014
Epoch 5/30
138/138 [==============================] - 160s 1s/step - loss: 3.8214 - accuracy: 0.9919 - val_loss: 854.0635 - val_accuracy: 0.4033
Epoch 6/30
138/138 [==============================] - 161s 1s/step - loss: 3.5015 - accuracy: 0.9923 - val_loss: 849.8724 - val_accuracy: 0.4033
Epoch 7/30
138/138 [==============================] - 161s 1s/step - loss: 3.2591 - accuracy: 0.9918 - val_loss: 846.0209 - val_accuracy: 0.0117
Epoch 8/30
138/138 [==============================] - 162s 1s/step - loss: 3.0582 - accuracy: 0.9924 - val_loss: 842.0798 - val_accuracy: 0.0117
Epoch 9/30
138/138 [==============================] - 161s 1s/step - loss: 2.8829 - accuracy: 0.9923 - val_loss: 838.9713 - val_accuracy: 0.0078
Epoch 10/30
138/138 [==============================] - 160s 1s/step - loss: 2.7249 - accuracy: 0.9923 - val_loss: 836.8273 - val_accuracy: 0.0078
Epoch 11/30
138/138 [==============================] - 161s 1s/step - loss: 2.5940 - accuracy: 0.9924 - val_loss: 834.6110 - val_accuracy: 0.0078
Epoch 12/30
138/138 [==============================] - 160s 1s/step - loss: 2.4716 - accuracy: 0.9923 - val_loss: 832.8401 - val_accuracy: 0.0078
Epoch 13/30
138/138 [==============================] - 160s 1s/step - loss: 2.3710 - accuracy: 0.9916 - val_loss: 831.4882 - val_accuracy: 0.0078
Epoch 14/30
138/138 [==============================] - 162s 1s/step - loss: 2.2678 - accuracy: 0.9920 - val_loss: 830.6600 - val_accuracy: 0.0078
Epoch 15/30
138/138 [==============================] - 161s 1s/step - loss: 2.1885 - accuracy: 0.9921 - val_loss: 830.3491 - val_accuracy: 0.0078
Epoch 16/30
138/138 [==============================] - 160s 1s/step - loss: 2.1182 - accuracy: 0.9922 - val_loss: 830.5637 - val_accuracy: 0.0078
Epoch 17/30
138/138 [==============================] - 161s 1s/step - loss: 2.0460 - accuracy: 0.9923 - val_loss: 831.3858 - val_accuracy: 0.0078
Epoch 18/30
138/138 [==============================] - 162s 1s/step - loss: 1.9921 - accuracy: 0.9918 - val_loss: 829.7682 - val_accuracy: 0.0078
Epoch 19/30
138/138 [==============================] - 161s 1s/step - loss: 1.9395 - accuracy: 0.9922 - val_loss: 834.9973 - val_accuracy: 0.0078
Epoch 20/30
138/138 [==============================] - 161s 1s/step - loss: 1.8993 - accuracy: 0.9921 - val_loss: 834.2047 - val_accuracy: 0.0078
Epoch 21/30
138/138 [==============================] - 162s 1s/step - loss: 1.8549 - accuracy: 0.9921 - val_loss: 837.5099 - val_accuracy: 0.0078
Epoch 22/30
138/138 [==============================] - 161s 1s/step - loss: 1.8235 - accuracy: 0.9923 - val_loss: 845.4752 - val_accuracy: 0.0078
Epoch 23/30
138/138 [==============================] - 160s 1s/step - loss: 1.7807 - accuracy: 0.9920 - val_loss: 848.3061 - val_accuracy: 0.0078
Epoch 24/30
138/138 [==============================] - 161s 1s/step - loss: 1.7530 - accuracy: 0.9920 - val_loss: 855.9905 - val_accuracy: 0.0078
Epoch 25/30
138/138 [==============================] - 161s 1s/step - loss: 1.7240 - accuracy: 0.9922 - val_loss: 862.0945 - val_accuracy: 0.0078
Epoch 26/30
138/138 [==============================] - 160s 1s/step - loss: 1.6949 - accuracy: 0.9923 - val_loss: 864.2795 - val_accuracy: 0.0078
Epoch 27/30
138/138 [==============================] - 162s 1s/step - loss: 1.6772 - accuracy: 0.9920 - val_loss: 876.2658 - val_accuracy: 0.0078
Epoch 28/30
138/138 [==============================] - 161s 1s/step - loss: 1.6572 - accuracy: 0.9921 - val_loss: 884.0258 - val_accuracy: 0.0078
Epoch 29/30
138/138 [==============================] - 160s 1s/step - loss: 1.6319 - accuracy: 0.9923 - val_loss: 892.2784 - val_accuracy: 0.0078
Epoch 30/30
138/138 [==============================] - 161s 1s/step - loss: 1.6425 - accuracy: 0.9921 - val_loss: 900.7044 - val_accuracy: 0.0078
<tensorflow.python.keras.callbacks.History object at 0x7ff4d7bcbda0>


>>> model.new_evaluate(test_final_x, test_final_y, debug_info=False)

              precision    recall  f1-score   support

        0     0.0040    0.0020    0.0027     11506
        2     1.0000    0.9519    0.9753      2929
        5     0.8000    0.0016    0.0032      4962
        1     1.0000    0.9997    0.9998      2929
        4     0.7500    0.0015    0.0030      5904
        3     0.9737    0.9594    0.9665      2929

micro avg     0.5924    0.2749    0.3755     31159
macro avg     0.5505    0.2749    0.2786     31159

Possibly related to #217 ?

BrikerMan (Owner) commented:

Please try it without the CRF layer; let's check whether this is an issue related to the CRF layer.

echan00 (Contributor, Author) commented Sep 6, 2019

I tried BiLSTM_Model (val accuracy stayed at 0.99 throughout training), but the results don't seem to be good.

Accuracy for the [CLS], [SEP], [BOS] tokens is 98-99%, but the B-TERM, I-TERM, and O predictions are mostly wrong.

I would appreciate any suggestions or ideas!

>>> model = BiLSTM_Model(embedding=stack_embedding)
>>> model.fit(train_final_x, train_final_y, valid_final_x, valid_final_y, epochs=30)

Model: "model_6"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input-Token (InputLayer)        [(None, 512)]        0                                            
__________________________________________________________________________________________________
Input-Segment (InputLayer)      [(None, 512)]        0                                            
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 512, 768), ( 91812096    Input-Token[0][0]                
__________________________________________________________________________________________________
Embedding-Segment (Embedding)   (None, 512, 768)     1536        Input-Segment[0][0]              
__________________________________________________________________________________________________
Embedding-Token-Segment (Add)   (None, 512, 768)     0           Embedding-Token[0][0]            
                                                                 Embedding-Segment[0][0]          
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 512, 768)     393216      Embedding-Token-Segment[0][0]    
__________________________________________________________________________________________________
Embedding-Dropout (Dropout)     (None, 512, 768)     0           Embedding-Position[0][0]         
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 512, 768)     1536        Embedding-Dropout[0][0]          
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     2362368     Embedding-Norm[0][0]             
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     0           Embedding-Norm[0][0]             
                                                                 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________                                       
Encoder-1-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-1-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-1-MultiHeadSelfAttention-
                                                                 Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-1-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-1-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-2-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-1-FeedForward-Norm[0][0] 
                                                                 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-2-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-2-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-2-MultiHeadSelfAttention-
                                                                 Encoder-2-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-2-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-2-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-2-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-3-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-2-FeedForward-Norm[0][0] 
                                                                 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-3-FeedForward[0][0]  
__________________________________________________________________________________________________                                       
Encoder-3-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-3-MultiHeadSelfAttention-
                                                                 Encoder-3-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-3-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-3-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-3-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-4-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-3-FeedForward-Norm[0][0] 
                                                                 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-4-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-4-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-4-MultiHeadSelfAttention-
                                                                 Encoder-4-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-4-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-4-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-4-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-5-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-4-FeedForward-Norm[0][0] 
                                                                 Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-5-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-5-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-5-MultiHeadSelfAttention-
                                                                 Encoder-5-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-5-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-5-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-5-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-6-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-5-FeedForward-Norm[0][0] 
                                                                 Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-6-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-6-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-6-MultiHeadSelfAttention-
                                                                 Encoder-6-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-6-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-6-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-6-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-7-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-6-FeedForward-Norm[0][0] 
                                                                 Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-7-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-7-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-7-MultiHeadSelfAttention-
                                                                 Encoder-7-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-7-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-7-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-7-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-8-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-7-FeedForward-Norm[0][0] 
                                                                 Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-8-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-8-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-8-MultiHeadSelfAttention-
                                                                 Encoder-8-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-8-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-8-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     2362368     Encoder-8-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-9-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     0           Encoder-8-FeedForward-Norm[0][0] 
                                                                 Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 512, 768)     1536        Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward (FeedForw (None, 512, 768)     4722432     Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward-Dropout ( (None, 512, 768)     0           Encoder-9-FeedForward[0][0]      
__________________________________________________________________________________________________
Encoder-9-FeedForward-Add (Add) (None, 512, 768)     0           Encoder-9-MultiHeadSelfAttention-
                                                                 Encoder-9-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-9-FeedForward-Norm (Lay (None, 512, 768)     1536        Encoder-9-FeedForward-Add[0][0]  
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-9-FeedForward-Norm[0][0] 
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-9-FeedForward-Norm[0][0] 
                                                                 Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward-Dropout  (None, 512, 768)     0           Encoder-10-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-10-FeedForward-Add (Add (None, 512, 768)     0           Encoder-10-MultiHeadSelfAttention
                                                                 Encoder-10-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-10-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-10-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-10-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-10-FeedForward-Norm[0][0]
                                                                 Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward-Dropout  (None, 512, 768)     0           Encoder-11-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-11-FeedForward-Add (Add (None, 512, 768)     0           Encoder-11-MultiHeadSelfAttention
                                                                 Encoder-11-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-11-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-11-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     2362368     Encoder-11-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     0           Encoder-11-FeedForward-Norm[0][0]
                                                                 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 512, 768)     1536        Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward (FeedFor (None, 512, 768)     4722432     Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward-Dropout  (None, 512, 768)     0           Encoder-12-FeedForward[0][0]     
__________________________________________________________________________________________________
Encoder-12-FeedForward-Add (Add (None, 512, 768)     0           Encoder-12-MultiHeadSelfAttention
                                                                 Encoder-12-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 512, 768)     1536        Encoder-12-FeedForward-Add[0][0] 
__________________________________________________________________________________________________
Encoder-Output (Concatenate)    (None, 512, 3072)    0           Encoder-9-FeedForward-Norm[0][0] 
                                                                 Encoder-10-FeedForward-Norm[0][0]
                                                                 Encoder-11-FeedForward-Norm[0][0]
                                                                 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
input_prelabel (InputLayer)     [(None, 512)]        0                                            
__________________________________________________________________________________________________
non_masking_layer (NonMaskingLa (None, 512, 3072)    0           Encoder-Output[0][0]             
__________________________________________________________________________________________________
layer_embedding_prelabel (Embed (None, 512, 48)      336         input_prelabel[0][0]             
__________________________________________________________________________________________________
layer_concatenate (Concatenate) (None, 512, 3120)    0           non_masking_layer[0][0]          
                                                                 layer_embedding_prelabel[0][0]   
__________________________________________________________________________________________________
layer_blstm (Bidirectional)     (None, 512, 256)     3328000     layer_concatenate[0][0]          
__________________________________________________________________________________________________
layer_dropout (Dropout)         (None, 512, 256)     0           layer_blstm[0][0]                
__________________________________________________________________________________________________
layer_time_distributed (TimeDis (None, 512, 7)       1799        layer_dropout[0][0]              
__________________________________________________________________________________________________
activation (Activation)         (None, 512, 7)       0           layer_time_distributed[0][0]     
==================================================================================================
Total params: 180,592,983
Trainable params: 3,330,135
Non-trainable params: 177,262,848
__________________________________________________________________________________________________
Epoch 1/30
138/138 [==============================] - 40s 287ms/step - loss: 0.0509 - acc: 0.9789 - val_loss: 0.0169 - val_acc: 0.9912
Epoch 2/30
138/138 [==============================] - 36s 258ms/step - loss: 0.0175 - acc: 0.9894 - val_loss: 0.0145 - val_acc: 0.9902
Epoch 3/30
138/138 [==============================] - 36s 258ms/step - loss: 0.0144 - acc: 0.9912 - val_loss: 0.0119 - val_acc: 0.9922
Epoch 4/30
138/138 [==============================] - 35s 256ms/step - loss: 0.0124 - acc: 0.9921 - val_loss: 0.0113 - val_acc: 0.9922
Epoch 5/30
138/138 [==============================] - 35s 256ms/step - loss: 0.0121 - acc: 0.9921 - val_loss: 0.0112 - val_acc: 0.9922
Epoch 6/30
138/138 [==============================] - 35s 256ms/step - loss: 0.0119 - acc: 0.9923 - val_loss: 0.0112 - val_acc: 0.9922
Epoch 7/30
138/138 [==============================] - 35s 257ms/step - loss: 0.0118 - acc: 0.9920 - val_loss: 0.0110 - val_acc: 0.9922
Epoch 8/30
138/138 [==============================] - 36s 258ms/step - loss: 0.0115 - acc: 0.9925 - val_loss: 0.0110 - val_acc: 0.9922
Epoch 9/30
138/138 [==============================] - 35s 257ms/step - loss: 0.0114 - acc: 0.9924 - val_loss: 0.0110 - val_acc: 0.9922
Epoch 10/30
138/138 [==============================] - 35s 256ms/step - loss: 0.0114 - acc: 0.9923 - val_loss: 0.0110 - val_acc: 0.9922
Epoch 11/30
138/138 [==============================] - 35s 255ms/step - loss: 0.0114 - acc: 0.9923 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 12/30
138/138 [==============================] - 35s 256ms/step - loss: 0.0114 - acc: 0.9921 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 13/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0113 - acc: 0.9923 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 14/30
138/138 [==============================] - 36s 261ms/step - loss: 0.0112 - acc: 0.9924 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 15/30
138/138 [==============================] - 36s 258ms/step - loss: 0.0114 - acc: 0.9924 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 16/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0113 - acc: 0.9921 - val_loss: 0.0110 - val_acc: 0.9922
Epoch 17/30
138/138 [==============================] - 36s 261ms/step - loss: 0.0114 - acc: 0.9919 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 18/30
138/138 [==============================] - 36s 259ms/step - loss: 0.0113 - acc: 0.9920 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 19/30
138/138 [==============================] - 35s 257ms/step - loss: 0.0111 - acc: 0.9923 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 20/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0112 - acc: 0.9921 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 21/30
138/138 [==============================] - 36s 258ms/step - loss: 0.0111 - acc: 0.9924 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 22/30
138/138 [==============================] - 36s 259ms/step - loss: 0.0111 - acc: 0.9920 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 23/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0111 - acc: 0.9922 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 24/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0111 - acc: 0.9920 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 25/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0111 - acc: 0.9919 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 26/30
138/138 [==============================] - 36s 260ms/step - loss: 0.0111 - acc: 0.9921 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 27/30
138/138 [==============================] - 36s 259ms/step - loss: 0.0110 - acc: 0.9923 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 28/30
138/138 [==============================] - 36s 262ms/step - loss: 0.0110 - acc: 0.9923 - val_loss: 0.0108 - val_acc: 0.9922
Epoch 29/30
138/138 [==============================] - 36s 262ms/step - loss: 0.0111 - acc: 0.9920 - val_loss: 0.0109 - val_acc: 0.9922
Epoch 30/30
138/138 [==============================] - 36s 259ms/step - loss: 0.0112 - acc: 0.9920 - val_loss: 0.0108 - val_acc: 0.9922
<tensorflow.python.keras.callbacks.History object at 0x7f1ae76a5b00>

>>> model.new_evaluate(test_final_x, test_final_y, debug_info=False)

           precision    recall  f1-score   support

        3     1.0000    0.9884    0.9942      2929
        1     0.1392    0.0791    0.1009     11506
        4     0.9993    0.9805    0.9898      2929
        2     1.0000    1.0000    1.0000      2929
        6     1.0000    0.1296    0.2294      5904
        7     0.2857    0.0016    0.0032      4962

micro avg     0.6474    0.3331    0.4399     31159
macro avg     0.5683    0.3331    0.3617     31159

@BrikerMan
Owner

@echan00 What results do you get when you evaluate on the training and validation sets? Maybe this is because the dataset isn't big enough?

@echan00
Contributor Author

echan00 commented Sep 6, 2019

Providing an example of my training data:

label2idx = {
    "O": 1, "[CLS]": 2, "[SEP]": 3, "[BOS]": 4, "B-TERM1": 6, "I-TERM1": 7
}

>>> train_x[10]
['[CLS]', 'during', 'the', 'track', 'record', 'period', ',', 'our', 'average', 'unit', 'land', 'acquisition', 'cost', 'based', 'on', 'g', '##fa', 'represented', '16', '.', '9', '%', 'of', 'our', 'average', 'unit', 'selling', 'price', 'based', 'on', 'g', '##fa', '.', '[BOS]', '于', '往', '绩', '记', '录', '期', '内', ',', '按', '建', '筑', '面', '积', '计', '算', '的', '平', '均', '单', '位', '土', '地', '收', '购', '成', '本', '为', '按', '建', '筑', '面', '积', '计', '算', '的', '平', '均', '单', '位', '售', '价', '的', '16', '.', '9', '%', '。', '[SEP]']

>>> train_x1[10]
[2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 7, 7, 7, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3]

>>> train_y[10]
[2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 7, 7, 7, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 7, 7, 7, 7, 7, 1, 1, 1, 1, 1, 1, 3]
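
For context, here is roughly how I am wiring these arrays together. This is a minimal sketch assuming Kashgari v0.5's StackedEmbedding and NumericFeaturesEmbedding API; the BERT model folder and feature_count are placeholders for my actual setup:

import kashgari
from kashgari.embeddings import BERTEmbedding, NumericFeaturesEmbedding, StackedEmbedding
from kashgari.tasks.labeling import BiLSTM_Model

SEQ_LEN = 512

# text branch: the concatenated English + Chinese token list (train_x)
bert_embedding = BERTEmbedding('<bert-model-folder>',  # placeholder path
                               task=kashgari.LABELING,
                               sequence_length=SEQ_LEN)

# numeric branch: the source-side label ids (train_x1)
label_embedding = NumericFeaturesEmbedding(feature_count=8,  # max label id + 1
                                           feature_name='source_labels',
                                           sequence_length=SEQ_LEN)

stacked = StackedEmbedding([bert_embedding, label_embedding])
stacked.analyze_corpus((train_x, train_x1), train_y)

model = BiLSTM_Model(embedding=stacked)
model.fit((train_x, train_x1), train_y, epochs=30)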

@BrikerMan
Owner

I mean the result of

model.new_evaluate(train_final_x, train_final_y, debug_info=False)
model.new_evaluate(valid_final_x, valid_final_y, debug_info=False)

@echan00
Contributor Author

echan00 commented Sep 6, 2019

What results do you get when you evaluate on the training and validation sets? Maybe this is because the dataset isn't big enough?

Good idea. The evaluation results on the training and validation sets are similar to those on the test set.

>>> model.new_evaluate(train_final_x, train_final_y, debug_info=False)

           precision    recall  f1-score   support

        3     1.0000    0.9829    0.9914      8788
        1     0.1368    0.0778    0.0992     34376
        4     0.9997    0.9761    0.9877      8788
        2     0.9995    0.9995    0.9995      8788
        6     1.0000    0.1281    0.2271     17691
        7     0.2348    0.0018    0.0035     15145

micro avg     0.6456    0.3309    0.4376     93576
macro avg     0.5590    0.3309    0.3597     93576

>>> model.new_evaluate(valid_final_x, valid_final_y, debug_info=False)

           precision    recall  f1-score   support

        3     1.0000    0.9802    0.9900      5859
        1     0.1357    0.0772    0.0984     22870
        4     0.9998    0.9739    0.9867      5859
        2     0.9993    0.9993    0.9993      5859
        6     1.0000    0.1273    0.2259     11787
        7     0.2184    0.0019    0.0037     10183

micro avg     0.6446    0.3299    0.4364     62417
macro avg     0.5557    0.3299    0.3587     62417

@echan00
Contributor Author

echan00 commented Sep 7, 2019

It seems like the model isn't learning anything. @BrikerMan, do you think this is a problem with the library, or something I'm not doing correctly?

@BrikerMan
Owner

However, in the case where I have a sentence that has 26 different terms (most sentences avg. 5-6 terms), a model trained on only one term will require 5-6 separate inferences/predictions correct?

Maybe our strategy is wrong. Try using 26 different tags, and check which labels are hard to learn?
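
Something like this for the tag set (just a sketch; 13 is an assumed cap on term pairs per sentence):

tags = ['O', '[CLS]', '[SEP]', '[BOS]']
for i in range(1, 14):                       # up to 13 term pairs -> 26 B-/I- tags
    tags += ['B-TERM%d' % i, 'I-TERM%d' % i]
label2idx = {tag: idx + 1 for idx, tag in enumerate(tags)}  # keep 0 for padding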

@echan00
Contributor Author

echan00 commented Sep 8, 2019

As I imagined, the model treats each of the 26 different tags as a separate entity, which makes it harder to learn. Predictions for [CLS], [SEP], and [BOS] are still 95%+ accurate, but the rest of the predictions are mostly incorrect.

           precision    recall  f1-score   support

        6     0.4298    0.0705    0.1211      1390
        1     0.2075    0.1009    0.1357      7257
       28     0.0000    0.0000    0.0000       381
       10     0.2336    0.0349    0.0607       917
        7     0.1521    0.0880    0.1115      1171
        2     0.9974    0.9974    0.9974       771
       17     0.0000    0.0000    0.0000       133
        9     0.0278    0.0071    0.0113       984
       19     0.0000    0.0000    0.0000        93
       12     0.0000    0.0000    0.0000       735
       14     0.0698    0.0108    0.0187       556
       30     0.0000    0.0000    0.0000       285
        8     0.1883    0.0899    0.1217      1146
       11     0.0671    0.0130    0.0218       767
        4     0.9225    0.9887    0.9545       710
       32     0.0000    0.0000    0.0000       196
        3     1.0000    0.9351    0.9665       771
       27     0.0000    0.0000    0.0000        23
       29     0.0000    0.0000    0.0000       324
       15     0.0242    0.0286    0.0262       454
       31     0.0000    0.0000    0.0000       233
       13     0.0408    0.0097    0.0157       618
       40     0.0000    0.0000    0.0000        30
       24     0.0000    0.0000    0.0000        59
       26     0.0000    0.0000    0.0000        32
       49     0.0000    0.0000    0.0000        20
       33     0.0000    0.0000    0.0000       165
       21     0.0000    0.0000    0.0000        80
       20     0.0000    0.0000    0.0000        93
       18     0.0000    0.0000    0.0000       115
       16     0.0000    0.0000    0.0000       156
       25     0.0000    0.0000    0.0000        47
       47     0.0000    0.0000    0.0000        10
       35     0.0000    0.0000    0.0000         2
       23     0.0000    0.0000    0.0000        55
       38     0.0000    0.0000    0.0000         8
       39     0.0000    0.0000    0.0000         8
       22     0.0000    0.0000    0.0000        65
       46     0.0000    0.0000    0.0000        11
       41     0.0000    0.0000    0.0000        26
       44     0.0000    0.0000    0.0000        12
       43     0.0000    0.0000    0.0000         5
       50     0.0000    0.0000    0.0000        10
       48     0.0000    0.0000    0.0000        24
       42     0.0000    0.0000    0.0000         7
       34     0.0000    0.0000    0.0000         2
       45     0.0000    0.0000    0.0000        11
       37     0.0000    0.0000    0.0000         1
       51     0.0000    0.0000    0.0000         7
       36     0.0000    0.0000    0.0000         2

micro avg     0.3828    0.1574    0.2231     20978
macro avg     0.2412    0.1574    0.1778     20978

@echan00
Contributor Author

echan00 commented Sep 8, 2019

Looking at the results, my guess is that the simpler model with a single term tag is better.

But the core issue remains that the model isn't learning the term labels at all. In theory, BERT embeddings should be able to handle this task... I wonder what else we can try to debug the issue and figure out how to get this to work.

@echan00
Contributor Author

echan00 commented Sep 11, 2019

While debugging, I've found that even when the label input fed to the model is identical to the desired output, the model is still inaccurate.

For example:

# TWO INPUTS
[CLS] 这是一只狗和这是一只红猫 [BOS] This is a dog and that is a panda [SEP]
3 0 0 0 0 1 0 0 0 0 0 0 0 4 0 0 0 1 0 0 0 0 0 2

# OUTPUT
3 0 0 0 0 1 0 0 0 0 0 0 0 4 0 0 0 1 0 0 0 0 0 2

The majority of predictions will look like this:

3 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 2

Perhaps there is a bug in the stacked or numeric embedding?
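
One way I can try to narrow this down is to probe the numeric branch in isolation. A sketch, assuming the label_embedding object from the training script above, and that embed_model is the Keras sub-model Kashgari builds for each embedding:

import numpy as np

# two label sequences that differ everywhere except the marker tokens;
# if the branch works, its outputs must differ as well
seq_a = np.array([[3] + [0] * 510 + [2]])
seq_b = np.array([[3] + [1] * 510 + [2]])

out_a = label_embedding.embed_model.predict(seq_a)
out_b = label_embedding.embed_model.predict(seq_b)
print(np.abs(out_a - out_b).max())  # ~0 here would point at the embedding branch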

@BrikerMan
Owner

I think this issue may be related to the numeric embedding feature. Maybe we should concatenate the raw numeric feature with the word embedding directly, rather than embedding the numeric feature first and then combining the word-embedding and numeric-embedding outputs.

Here is the new numeric embedding; add it to your code in place of the old one.

from tensorflow import keras
from tensorflow.keras import layers as L
from kashgari.embeddings import NumericFeaturesEmbedding

class NumericFeaturesEmbeddingV2(NumericFeaturesEmbedding):
    """Pass the numeric feature through unchanged, with no trainable embedding layer"""
    def _build_model(self, **kwargs):
        # (batch, seq_len) -> (batch, seq_len, 1) so the raw value can be
        # concatenated with the word-embedding output
        self.embed_model = keras.Sequential([
            L.Reshape((self.sequence_length, 1), input_shape=(self.sequence_length,))
        ])

Old Stacked Embedding = Word Embedding + Numeric feature Embedding
New Stacked Embedding = Word Embedding + Numeric feature (no embedding needed, since it is just a number)
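
Usage stays the same; just swap the class in when building the stacked embedding (same placeholder arguments as in the earlier sketch, and bert_embedding built as before):

term_embedding = NumericFeaturesEmbeddingV2(feature_count=8,  # max label id + 1
                                            feature_name='source_labels',
                                            sequence_length=512)
stacked = StackedEmbedding([bert_embedding, term_embedding])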

@echan00
Contributor Author

echan00 commented Sep 12, 2019

Here is the new numeric embedding, just add this to your code and replace the old numeric embedding.

I just tried this, and the results are identical. It looks like the numeric features (with or without embedding) are not registering, since the F1 score is still close to zero.
