Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This commit fixes #191 #205

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

dev-chauhan
Copy link

@dev-chauhan dev-chauhan commented Feb 24, 2020

Replaced outdated truncate!() with reset!()

@dev-chauhan dev-chauhan requested review from MikeInnes and avik-pal and removed request for MikeInnes February 24, 2020 20:30
@MikeInnes
Copy link
Member

truncate! actually does something slightly different from reset!, and probably isn't necessary if using Zygote. I guess this model uses an old Flux version, though; if changing this we need to also update the manifests.

cc @dhairyagandhi96 for the state of the manifest for this model, I'm not completely up to date on how we're organising the zoo.

@dev-chauhan
Copy link
Author

dev-chauhan commented Feb 25, 2020

I have done these changes without looking into the workings of flux.
Now as I have looked how gradients are calculated using Zygote in Flux, I have come to conclusion that we do not need any truncate! function while using Zygote to calculate gradients.
Need for truncate! arose due to Tracker.jl which had been tracking gradients of params for all inputs given to rnn cell e.g. if we have very large sequence a = [a1 a2] (a1 is first half and a2 is second half of a), if we call loss(a1) , update!(), loss(a2), update!() we will have result equivalent to loss(a1), update!(), reset!(), loss([a1 a2]), update!() because while we do loss(a2) after loss(a1) Tracker still remembers what graph computation of a1 created so in calculation of gradient it will calculate as if input was loss([a1 a2]).
If we do loss(a1) ; update!(); truncate!(); loss(a2); update!()
gradient due to loss(a1) will not be considered second time which saves computation + time and final outcome will be same as loss([a1 a2]); update!() which we want.
But in Zygote we calculate gradient for given function and given input so when we do
loss(a1) ; update!(); loss(a2); update!()
it is same as loss([a1 a2]); update!(), because after loss(a1) state of rnn is changed but during calculation of loss(a2), calculation done before is irrelevent but state of rnn continues where it left so final effect will be same as loss([a1 a2]); update!() which is what we want.
So in Zygote we do not need truncate! function.
Am I correct ?

@MikeInnes
Copy link
Member

Yup, that's exactly right.

@CarloLucibello
Copy link
Member

so you should just delete the truncate line here, right?

@dev-chauhan
Copy link
Author

Yes but according to the project environment (model-zoo) uses Flux with Tracker so to do this first have to update that and check for all models if they are compatible with new version or not. It was me who tried to run char-rnn.jl on new version of flux but if we use project env it works as should work.
So first we have to decide how we want to manage this project as mentioned by @MikeInnes. If we go with current env no need to change any thing with truncate else if we're going to work with the current version of Flux we have to remove truncate line in master branch or say remove reset from this commit.
We can also create env for char-rnn directory instead of changing Project.toml and Manifest.toml.

@CarloLucibello
Copy link
Member

we should update this to reflect latest Flux, zygote changes. You can create a separate Project.toml and Manifest.toml for this example

See also the discussion in #207

@CarloLucibello
Copy link
Member

If you have the time and the patience to do it, You could entirely revamp this script to resemble the one in #207, in an effort to improve consistency among the various scripts. Otherwise, we can just merge this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants