Would you mind explaining an issue about gradient descent in lecture 1b #10

Open
theanhle opened this issue Mar 2, 2017 · 1 comment

Comments


theanhle commented Mar 2, 2017

  • I've read your slides for lecture 1b ("Deep neural networks are our friends"). On the slide "Gradients are our friends", which works through arg min C(w, b) starting from w0, b0 = 2, 2 with C(w0, b0) = 68, that part is correct. But after that, I don't understand why the terms of the expression sum(-2(ŷ - y)*x) are 8, -40, -72. I think -8, 40, 72 would be correct.
  • By the way, I implemented this simple network, but when I trained it for 100 epochs the value of the cost function did not converge. Here is my code:
import numpy as np 
x=np.array([1,5,6])
y=np.array([0,16,20])
w = 2
b = 2
epoches = 101
learning_rate = 0.05
for epoch in range(epoches):
    out = x*w + b
    cost = np.sum((y - out)**2) 
    if(epoch % 10 ==0):
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2*(out - y)*x)
    dcdb = np.sum(-2*(out - y))
    w = w - learning_rate*dcdw
    b = b - learning_rate*dcdb

and here is the result:
Epoch: 0 , cost: 68
Epoch: 10 , cost: 1.1268304493e+19
Epoch: 20 , cost: 3.00027905999e+36
Epoch: 30 , cost: 7.98849058743e+53
Epoch: 40 , cost: 2.12700154184e+71
Epoch: 50 , cost: 5.66331713039e+88
Epoch: 60 , cost: 1.50790492101e+106
Epoch: 70 , cost: 4.01492128811e+123
Epoch: 80 , cost: 1.06900592505e+141
Epoch: 90 , cost: 2.84631649237e+158
Epoch: 100 , cost: 7.57855254577e+175

Could you please explain this? Thank you in advance!


mleue commented Mar 23, 2017

Hey, two issues here.

First: your gradient calculation is off. When you define the cost as (y - out)**2, the derivative w.r.t. w is -2*(y - out)*x, not -2*(out - y)*x, so it looks like you just flipped the sign there. The same applies to your gradient w.r.t. b.
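
Spelled out with the chain rule (writing out = w*x + b):

dC/dw = d/dw sum((y - out)**2) = sum(2*(y - out) * (-x)) = sum(-2*(y - out)*x)
dC/db = sum(2*(y - out) * (-1)) = sum(-2*(y - out))

Plugging in the slide's starting point w0, b0 = 2, 2 gives out = [4, 12, 14] and y - out = [-4, 4, 6], so the per-example terms of -2*(y - out)*x are [8, -40, -72] and the cost is 16 + 16 + 36 = 68. The slide's numbers therefore match the derivative of (y - out)**2; with (out - y) inside the expression the terms would indeed be [-8, 40, 72], which is where the sign confusion comes from.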

Second: a diverging cost is usually a sign of a too-high learning rate. Try something lower; go down in steps of a factor of 10.
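
For reference, here is a minimal sketch of your loop with both fixes applied, assuming the same toy data; the learning rate of 0.005 is just one example of going a step lower (the original 0.05 overshoots on this data even with the correct gradients):

import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2.0, 2.0
epochs = 1001
learning_rate = 0.005  # example value; 0.05 is too large for this data

for epoch in range(epochs):
    out = x * w + b                    # forward pass: prediction w*x + b
    cost = np.sum((y - out) ** 2)      # squared-error cost
    if epoch % 100 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2 * (y - out) * x)  # dC/dw, note (y - out), not (out - y)
    dcdb = np.sum(-2 * (y - out))      # dC/db
    w = w - learning_rate * dcdw
    b = b - learning_rate * dcdb

With these changes the cost drops steadily instead of blowing up (this toy data is fit exactly by w = 4, b = -4).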
