
Consulting perplexity test question with RNN cell in ENAS #47

Open
shuotian17 opened this issue Sep 2, 2019 · 0 comments

Comments

@shuotian17

Hello, Kim. I have been testing the RNN architecture shown in Figure 6 of the paper. However, the perplexity I get is about 84 at around epoch 41, which does not match the 55.8 reported in Table 1, Section 3.2 of the ENAS paper. The details of my test experiment are as follows:
In the code, I use the "single" mode in config.py to train the architecture from Figure 6 of the paper. The DAG used is {-1: [Node(id=0, name='tanh')], -2: [Node(id=0, name='tanh')], 0: [Node(id=1, name='tanh')], 1: [Node(id=2, name='ReLU'), Node(id=3, name='tanh')], 2: [Node(id=4, name='ReLU'), Node(id=5, name='tanh'), Node(id=6, name='tanh')], 6: [Node(id=7, name='ReLU')], 7: [Node(id=8, name='ReLU')], 8: [Node(id=9, name='ReLU'), Node(id=10, name='ReLU'), Node(id=11, name='ReLU')], 3: [Node(id=12, name='avg')], 4: [Node(id=12, name='avg')], 5: [Node(id=12, name='avg')], 9: [Node(id=12, name='avg')], 10: [Node(id=12, name='avg')], 11: [Node(id=12, name='avg')], 12: [Node(id=13, name='h[t]')]}.
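
For reference, here is how I am reading that same DAG, written out as Python. This is a sketch that assumes Node is the collections.namedtuple('Node', ['id', 'name']) defined in the repo's utils.py; figure6_dag is just an illustrative name.

```python
import collections

# Assumed to match the Node namedtuple used by the repo (utils.py).
Node = collections.namedtuple('Node', ['id', 'name'])

# The Figure 6 cell as quoted above; keys -1 and -2 are the two cell inputs
# (x[t] and h[t-1]), and node 12 averages its inputs to produce the output h[t].
figure6_dag = {
    -1: [Node(0, 'tanh')],
    -2: [Node(0, 'tanh')],
     0: [Node(1, 'tanh')],
     1: [Node(2, 'ReLU'), Node(3, 'tanh')],
     2: [Node(4, 'ReLU'), Node(5, 'tanh'), Node(6, 'tanh')],
     6: [Node(7, 'ReLU')],
     7: [Node(8, 'ReLU')],
     8: [Node(9, 'ReLU'), Node(10, 'ReLU'), Node(11, 'ReLU')],
     3: [Node(12, 'avg')],
     4: [Node(12, 'avg')],
     5: [Node(12, 'avg')],
     9: [Node(12, 'avg')],
    10: [Node(12, 'avg')],
    11: [Node(12, 'avg')],
    12: [Node(13, 'h[t]')],
}
```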

The dataset used is PTB. For the Penn Treebank experiments, ω is trained for about 400 steps, each on a minibatch of 64 examples, where the gradient ∇ω is computed using back-propagation through time, truncated at 35 time steps. I evaluate perplexity over the entire validation set (batch size = 1).
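
To be concrete about the evaluation step, this is roughly how I compute validation perplexity. It is only a sketch: the model(inputs, hidden) call signature and the variable names are illustrative, not the repo's exact API.

```python
import math
import torch

def evaluate_ppl(model, val_tokens, bptt=35):
    """Average token-level cross-entropy over the whole validation
    sequence (batch size = 1), then exponentiate to get perplexity."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    hidden = None  # assumes the model initializes its own hidden state on None
    with torch.no_grad():
        for i in range(0, val_tokens.size(0) - 1, bptt):
            seq_len = min(bptt, val_tokens.size(0) - 1 - i)
            inputs = val_tokens[i:i + seq_len].unsqueeze(1)          # (seq_len, 1)
            targets = val_tokens[i + 1:i + 1 + seq_len].reshape(-1)  # next-token targets
            logits, hidden = model(inputs, hidden)                   # assumed signature
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), targets, reduction='sum')
            total_loss += loss.item()
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```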

The weights were trained using SGD with an initial learning rate of 20, which after 15 epochs is decayed by a factor of 0.96. A total of 150 epochs were run. The hidden size is 1000, the embedding size is 1000, and the number of activation function blocks is 12. The total number of parameters is (1000 + 1000) * 1000 * 12 = 24M.
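
In other words, the schedule and parameter count I am using look like this. This is only a sketch of my reading of the config: the function and variable names are illustrative, and I am assuming the 0.96 decay is applied once per epoch after epoch 15.

```python
def lr_at_epoch(epoch, base_lr=20.0, decay_start=15, decay=0.96):
    """SGD learning rate at a given epoch under the schedule described above."""
    if epoch <= decay_start:
        return base_lr
    return base_lr * decay ** (epoch - decay_start)

# Rough cell parameter count as quoted above.
hidden_size = 1000
embed_size = 1000
num_blocks = 12
cell_params = (hidden_size + embed_size) * hidden_size * num_blocks  # 24,000,000 ≈ 24M
```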

As for regularization techniques, dropout = 0.5. I set activation_regularization, temporal_activation_regularization, and temporal_activation_regularization_amount to True in config.py to enable the weight-penalty techniques in the code. Weight tying is also used. Additionally, I augment the simple transformations between nodes in the constructed recurrent cell with highway connections (Zilly et al., 2017).
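
For clarity, I believe those flags correspond to the activation regularization and temporal activation regularization penalties of Merity et al.; under that assumption, a minimal sketch of the extra loss terms they enable looks like this (coefficient names are illustrative, not the exact config.py fields):

```python
def activation_penalties(output, alpha=2.0, beta=1.0):
    """output: torch.Tensor of shape (seq_len, batch, hidden), the cell activations.

    Returns the extra regularization term added on top of the cross-entropy loss."""
    ar = alpha * output.pow(2).mean()                      # keep activations small (AR)
    tar = beta * (output[1:] - output[:-1]).pow(2).mean()  # keep them smooth over time (TAR)
    return ar + tar
```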

Could you please tell me whether I have done something wrong with the learning rate or other configuration when testing the DAG from Figure 6 of the paper? I am looking forward to your reply.
