Unstable VAE train example #259

swapnull7 opened this issue Dec 9, 2019 · 0 comments
Labels
topic: examples Issue about examples

swapnull7 commented Dec 9, 2019

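Training the VAE example with default parameters is unstable: every logged metric (nll, KL, rc, log_ppl, ppl) becomes NaN between steps 800 and 1000 of epoch 0 and stays NaN afterwards.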
Logs:

train: epoch 0, step 200, nll 150.3577, klw: 0.1153, KL 0.2740,  rc 150.3278, log_ppl 6.7723, ppl 873.2936, time elapsed: 19.7s
train: epoch 0, step 400, nll 148.7724, klw: 0.1305, KL 1.0032,  rc 148.6502, log_ppl 6.6977, ppl 810.5463, time elapsed: 36.8s
train: epoch 0, step 600, nll 148.2995, klw: 0.1457, KL 1.5545,  rc 148.0989, log_ppl 6.6933, ppl 806.9564, time elapsed: 53.9s
train: epoch 0, step 800, nll 147.0893, klw: 0.1609, KL 1.3469,  rc 146.9111, log_ppl 6.6390, ppl 764.3558, time elapsed: 70.9s
train: epoch 0, step 1000, nll nan, klw: 0.1761, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 87.8s
train: epoch 0, step 1200, nll nan, klw: 0.1914, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 104.0s

train: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan


valid: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan


test: epoch 0, nll nan, KL nan, rc nan, log_ppl nan, ppl nan

train: epoch 1, step 0, nll nan, klw: 0.2002, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 0.2s
train: epoch 1, step 200, nll nan, klw: 0.2154, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 15.5s
train: epoch 1, step 400, nll nan, klw: 0.2306, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 32.6s
train: epoch 1, step 600, nll nan, klw: 0.2458, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 49.4s
train: epoch 1, step 800, nll nan, klw: 0.2610, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 64.2s
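To pin down where a run like this first diverges, the log can be scanned for the first non-finite `nll`. The following is a minimal sketch, not part of the example itself: the helper name `first_non_finite` and the regular expression are assumptions written against the log format shown above, and the two sample lines are copied from the logs in this issue.

```python
import math
import re

# Two lines copied verbatim from the log above.
LOG = """\
train: epoch 0, step 800, nll 147.0893, klw: 0.1609, KL 1.3469,  rc 146.9111, log_ppl 6.6390, ppl 764.3558, time elapsed: 70.9s
train: epoch 0, step 1000, nll nan, klw: 0.1761, KL nan,  rc nan, log_ppl nan, ppl nan, time elapsed: 87.8s
"""

# Hypothetical pattern for per-step training lines: epoch, step, and nll value.
PATTERN = re.compile(r"epoch (\d+), step (\d+), nll (\S+?),")


def first_non_finite(log_text):
    """Return (epoch, step) of the first logged line whose nll is NaN or inf."""
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue  # skip epoch summaries and blank lines
        epoch, step = int(match.group(1)), int(match.group(2))
        nll = float(match.group(3))  # float("nan") parses to NaN
        if not math.isfinite(nll):
            return epoch, step
    return None


print(first_non_finite(LOG))  # (0, 1000)
```

Because metrics are only logged every 200 steps, this narrows the failure to the window between steps 800 and 1000 rather than identifying the exact step.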

For reference, logs from the vae-train example in texar-pytorch, which trains without NaNs under the same schedule:

train: epoch 0, step 0, nll 202.0137, klw 0.1002, KL 0.0218, rc 202.0115, log_ppl 9.2349, ppl 10248.7397, time_cost 0.6
train: epoch 0, step 200, nll 145.4623, klw 0.1154, KL 1.2449, rc 145.3253, log_ppl 6.5687, ppl 712.4463, time_cost 23.2
train: epoch 0, step 400, nll 139.3877, klw 0.1306, KL 1.8096, rc 139.1726, log_ppl 6.3007, ppl 544.9752, time_cost 45.7
train: epoch 0, step 600, nll 135.5956, klw 0.1458, KL 2.1700, rc 135.3190, log_ppl 6.1319, ppl 460.2903, time_cost 68.1
train: epoch 0, step 800, nll 133.1483, klw 0.1610, KL 2.3624, rc 132.8281, log_ppl 6.0154, ppl 409.6862, time_cost 90.7
train: epoch 0, step 1000, nll 130.8279, klw 0.1762, KL 2.4912, rc 130.4704, log_ppl 5.9178, ppl 371.5942, time_cost 112.9
train: epoch 0, step 1200, nll 129.0042, klw 0.1914, KL 2.5828, rc 128.6131, log_ppl 5.8383, ppl 343.1805, time_cost 134.9

train: epoch 0, nll 128.1585, KL 2.6265, rc 127.7491, log_ppl 5.7997, ppl 330.2135

valid: epoch 0, nll 119.1858, KL 2.9482, rc 116.2376, log_ppl 5.4454, ppl 231.7005

test: epoch 0, nll 118.1185, KL 2.8654, rc 115.2531, log_ppl 5.3893, ppl 219.0593
train: epoch 1, step 0, nll 117.8860, klw 0.2003, KL 3.2506, rc 117.2353, log_ppl 5.1465, ppl 171.8215, time_cost 0.1
train: epoch 1, step 200, nll 117.5880, klw 0.2155, KL 3.3343, rc 116.8944, log_ppl 5.2860, ppl 197.5439, time_cost 22.2
train: epoch 1, step 400, nll 115.9266, klw 0.2307, KL 3.5084, rc 115.1690, log_ppl 5.2572, ppl 191.9527, time_cost 44.4
train: epoch 1, step 600, nll 115.5976, klw 0.2459, KL 3.6699, rc 114.7753, log_ppl 5.2438, ppl 189.3889, time_cost 66.2
train: epoch 1, step 800, nll 115.2580, klw 0.2611, KL 3.8097, rc 114.3733, log_ppl 5.2201, ppl 184.9485, time_cost 88.1
train: epoch 1, step 1000, nll 114.8173, klw 0.2763, KL 3.9137, rc 113.8769, log_ppl 5.1968, ppl 180.7011, time_cost 109.8
train: epoch 1, step 1200, nll 114.5588, klw 0.2915, KL 3.9752, rc 113.5725, log_ppl 5.1819, ppl 178.0216, time_cost 130.7

train: epoch 1, nll 114.3053, KL 4.0000, rc 113.2953, log_ppl 5.1728, ppl 176.4117

valid: epoch 1, nll 113.7273, KL 4.2768, rc 109.4505, log_ppl 5.1961, ppl 180.5585

test: epoch 1, nll 112.7315, KL 4.1861, rc 108.5454, log_ppl 5.1436, ppl 171.3240
train: epoch 2, step 0, nll 107.0595, klw 0.3004, KL 4.1894, rc 105.8015, log_ppl 4.9940, ppl 147.5299, time_cost 0.1
train: epoch 2, step 200, nll 108.8126, klw 0.3156, KL 4.3093, rc 107.4859, log_ppl 4.9412, ppl 139.9346, time_cost 20.1
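The reference numbers can serve as a sanity baseline when comparing the two implementations. In the valid and test lines above, `nll` appears to equal `rc + KL` and `ppl` appears to equal `exp(log_ppl)`. This is inferred from the printed values only, not confirmed against either example's source, and the helper `check` below is a hypothetical name. A small sketch that verifies both relations on three lines copied from the reference logs:

```python
import math
import re

# Evaluation lines copied verbatim from the texar-pytorch reference logs above.
EVAL_LOG = """\
valid: epoch 0, nll 119.1858, KL 2.9482, rc 116.2376, log_ppl 5.4454, ppl 231.7005
test: epoch 0, nll 118.1185, KL 2.8654, rc 115.2531, log_ppl 5.3893, ppl 219.0593
valid: epoch 1, nll 113.7273, KL 4.2768, rc 109.4505, log_ppl 5.1961, ppl 180.5585
"""

# Matches "name value" pairs; \b keeps "ppl" from matching inside "log_ppl".
FIELD = re.compile(r"\b(nll|KL|rc|log_ppl|ppl) ([-\d.]+)")


def check(line):
    """Return (|nll - (rc + KL)|, relative gap between ppl and exp(log_ppl))."""
    values = {name: float(text) for name, text in FIELD.findall(line)}
    nll_gap = abs(values["nll"] - (values["rc"] + values["KL"]))
    ppl_gap = abs(values["ppl"] - math.exp(values["log_ppl"])) / values["ppl"]
    return nll_gap, ppl_gap


for line in EVAL_LOG.splitlines():
    nll_gap, ppl_gap = check(line)
    # Tolerances allow for the four-decimal rounding in the printed logs.
    print(line.split(",")[0], nll_gap < 1e-3, ppl_gap < 1e-3)
```

Once the unstable run produces finite values again, the same check can be pointed at its evaluation lines to see whether the two implementations report these metrics consistently.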
gpengzhi added the "topic: examples" label on Dec 23, 2019