
Consulting perplexity test question with RNN cell in ENAS #47

Open
shuotian17 opened this issue Sep 2, 2019 · 0 comments

Comments

@shuotian17

Hello, Kim. I have been testing the RNN architecture shown in Figure 6 of the paper. However, the perplexity I get is about 84 at around epoch 41, which does not match the 55.8 reported in Table 1, Section 3.2 of the ENAS paper. The details of my test experiment are as follows:
In the code, I use the "single" mode in config.py to train the architecture from Figure 6 of the paper. The DAG used is {-1: [Node(id=0, name='tanh')], -2: [Node(id=0, name='tanh')], 0: [Node(id=1, name='tanh')], 1: [Node(id=2, name='ReLU'), Node(id=3, name='tanh')], 2: [Node(id=4, name='ReLU'), Node(id=5, name='tanh'), Node(id=6, name='tanh')], 6: [Node(id=7, name='ReLU')], 7: [Node(id=8, name='ReLU')], 8: [Node(id=9, name='ReLU'), Node(id=10, name='ReLU'), Node(id=11, name='ReLU')], 3: [Node(id=12, name='avg')], 4: [Node(id=12, name='avg')], 5: [Node(id=12, name='avg')], 9: [Node(id=12, name='avg')], 10: [Node(id=12, name='avg')], 11: [Node(id=12, name='avg')], 12: [Node(id=13, name='h[t]')]}.
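
For reference, here is how I am reading that same DAG, written out as Python. This is a sketch that assumes Node is the collections.namedtuple('Node', ['id', 'name']) defined in the repo's utils.py; figure6_dag is just an illustrative name.

```python
import collections

# Assumed to match the Node namedtuple used by the repo (utils.py).
Node = collections.namedtuple('Node', ['id', 'name'])

# The Figure 6 cell as quoted above; keys -1 and -2 are the two cell inputs
# (x[t] and h[t-1]), and node 12 averages its inputs to produce the output h[t].
figure6_dag = {
    -1: [Node(0, 'tanh')],
    -2: [Node(0, 'tanh')],
     0: [Node(1, 'tanh')],
     1: [Node(2, 'ReLU'), Node(3, 'tanh')],
     2: [Node(4, 'ReLU'), Node(5, 'tanh'), Node(6, 'tanh')],
     6: [Node(7, 'ReLU')],
     7: [Node(8, 'ReLU')],
     8: [Node(9, 'ReLU'), Node(10, 'ReLU'), Node(11, 'ReLU')],
     3: [Node(12, 'avg')],
     4: [Node(12, 'avg')],
     5: [Node(12, 'avg')],
     9: [Node(12, 'avg')],
    10: [Node(12, 'avg')],
    11: [Node(12, 'avg')],
    12: [Node(13, 'h[t]')],
}
```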

The dataset used is PTB. For the Penn Treebank experiments, ω is trained for about 400 steps, each on a minibatch of 64 examples, where the gradient ∇ω is computed using back-propagation through time, truncated at 35 time steps. I evaluate perplexity over the entire validation set (batch size = 1).
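
To be concrete about the evaluation step, this is roughly how I compute validation perplexity. It is only a sketch: the model(inputs, hidden) call signature and the variable names are illustrative, not the repo's exact API.

```python
import math
import torch

def evaluate_ppl(model, val_tokens, bptt=35):
    """Average token-level cross-entropy over the whole validation
    sequence (batch size = 1), then exponentiate to get perplexity."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    hidden = None  # assumes the model initializes its own hidden state on None
    with torch.no_grad():
        for i in range(0, val_tokens.size(0) - 1, bptt):
            seq_len = min(bptt, val_tokens.size(0) - 1 - i)
            inputs = val_tokens[i:i + seq_len].unsqueeze(1)          # (seq_len, 1)
            targets = val_tokens[i + 1:i + 1 + seq_len].reshape(-1)  # next-token targets
            logits, hidden = model(inputs, hidden)                   # assumed signature
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), targets, reduction='sum')
            total_loss += loss.item()
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```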

The weights were trained using SGD with an initial learning rate of 20, which after 15 epochs is decayed by a factor of 0.96. A total of 150 epochs were run. The hidden size is 1000, the embedding size is 1000, and the number of activation function blocks is 12. The total number of parameters is (1000 + 1000) * 1000 * 12 = 24M.
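
In other words, the schedule and parameter count I am using look like this. This is only a sketch of my reading of the config: the function and variable names are illustrative, and I am assuming the 0.96 decay is applied once per epoch after epoch 15.

```python
def lr_at_epoch(epoch, base_lr=20.0, decay_start=15, decay=0.96):
    """SGD learning rate at a given epoch under the schedule described above."""
    if epoch <= decay_start:
        return base_lr
    return base_lr * decay ** (epoch - decay_start)

# Rough cell parameter count as quoted above.
hidden_size = 1000
embed_size = 1000
num_blocks = 12
cell_params = (hidden_size + embed_size) * hidden_size * num_blocks  # 24,000,000 ≈ 24M
```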

As for regularization techniques, dropout = 0.5. I set activation_regularization, temporal_activation_regularization, and temporal_activation_regularization_amount to True in config.py to enable the weight-penalty techniques in the code. Weight tying is also used. Additionally, I augment the simple transformations between nodes in the constructed recurrent cell with highway connections (Zilly et al., 2017).
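
For clarity, I believe those flags correspond to the activation regularization and temporal activation regularization penalties of Merity et al.; under that assumption, a minimal sketch of the extra loss terms they enable looks like this (coefficient names are illustrative, not the exact config.py fields):

```python
def activation_penalties(output, alpha=2.0, beta=1.0):
    """output: torch.Tensor of shape (seq_len, batch, hidden), the cell activations.

    Returns the extra regularization term added on top of the cross-entropy loss."""
    ar = alpha * output.pow(2).mean()                      # keep activations small (AR)
    tar = beta * (output[1:] - output[:-1]).pow(2).mean()  # keep them smooth over time (TAR)
    return ar + tar
```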

Could you please tell me whether I have done something wrong with the learning rate or other configuration when testing the DAG from Figure 6 of the paper? I am looking forward to your reply.
