Session 5; provided pre-trained model incompatible (and not loaded) #78

Open
gabrielmontagne opened this issue Jun 26, 2017 · 6 comments

@gabrielmontagne

On Session 5, part 2, the check

    if os.path.exists(ckpt_name):
        saver.restore(sess, ckpt_name)
        print("Model restored.")

... does not match the provided checkpoint file, which is actually named trump.ckpt.data-00000-of-00001, so the load is never attempted.
As a result, the model is untrained in the example and the predictions are non-deterministic random strings:

!!--sssjjj44www???ggvvvwwwx??ggggvaaa577777t777t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t77t777t​
??ffd88l:ttttt?efiiiii8880cc99v666444sszssxrpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiiiiii888ccc8994444sszszsírpp/mmmxxxxr!//33gggggkgggkmmmeeiiiii​

etc.
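
A minimal sketch of why the existence check fails, assuming the provided checkpoint is in TensorFlow's V2 format: in that format the saver writes ckpt_name + '.index' and one or more ckpt_name + '.data-XXXXX-of-XXXXX' files, but no file named exactly ckpt_name, so os.path.exists(ckpt_name) is False even when the model is present. The helper name checkpoint_prefix_exists is hypothetical and is not part of the notebook or of TensorFlow.

```python
import os


def checkpoint_prefix_exists(prefix):
    """Return True if a V2-format checkpoint appears to exist for `prefix`.

    Assumption: the checkpoint was saved in the V2 format, which writes
    `<prefix>.index` plus `<prefix>.data-XXXXX-of-XXXXX` shards and no
    file named exactly `<prefix>`.
    """
    return os.path.exists(prefix + ".index")
```

In the notebook cell, the guard would then read `if checkpoint_prefix_exists(ckpt_name):` while saver.restore(sess, ckpt_name) still receives the bare prefix.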

If the check is removed, then the saver does pick up the provided model, but then cannot load it.
It fails with the error below. While chasing this, I also noticed that the encoder and decoder used for this part of the exercise are built from a different text: the one from Part 4 - Character-Level Language Model. I guess that's fine, as the character sets are probably equivalent in this case, but wouldn't it be better to add a cell that regenerates them for the later section?

I'm running TF 1.1.0 with GPU support on a Debian 4.9.18-1 laptop.

INFO:tensorflow:Restoring parameters from ./trump.ckpt
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1038     try:
-> 1039       return fn(*args)
   1040     except errors.OpError as e:

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1020                                  feed_dict, fetch_list, target_list,
-> 1021                                  status, run_metadata)
   1022 

~/anaconda3/envs/potrero/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     88             try:
---> 89                 next(self.gen)
     90             except StopIteration:

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2048] rhs shape= [800]
	 [[Node: save/Assign_17 = Assign[T=DT_FLOAT, _class=["loc:@rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases, save/RestoreV2_17/_49)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-8-999cfb8483be> in <module>()
     11     saver = tf.train.Saver()
     12     # if os.path.exists(ckpt_name):
---> 13     saver.restore(sess, ckpt_name)
     14     # print("Model restored.")
     15 

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1455     logging.info("Restoring parameters from %s", save_path)
   1456     sess.run(self.saver_def.restore_op_name,
-> 1457              {self.saver_def.filename_tensor_name: save_path})
   1458 
   1459   @staticmethod

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    776     try:
    777       result = self._run(None, fetches, feed_dict, options_ptr,
--> 778                          run_metadata_ptr)
    779       if run_metadata:
    780         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    980     if final_fetches or final_targets:
    981       results = self._do_run(handle, final_targets, final_fetches,
--> 982                              feed_dict_string, options, run_metadata)
    983     else:
    984       results = []

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1030     if handle is None:
   1031       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1032                            target_list, options, run_metadata)
   1033     else:
   1034       return self._do_call(_prun_fn, self._session, handle, feed_dict,

~/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1050         except KeyError:
   1051           pass
-> 1052       raise type(e)(node_def, op, message)
   1053 
   1054   def _extend_graph(self):

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2048] rhs shape= [800]
	 [[Node: save/Assign_17 = Assign[T=DT_FLOAT, _class=["loc:@rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases, save/RestoreV2_17/_49)]]

Caused by op 'save/Assign_17', defined at:
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-999cfb8483be>", line 11, in <module>
    saver = tf.train.Saver()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1056, in __init__
    self.build()
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1086, in build
    restore_sequentially=self._restore_sequentially)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
    validate_shape=validate_shape)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/gabriel/anaconda3/envs/potrero/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2048] rhs shape= [800]
	 [[Node: save/Assign_17 = Assign[T=DT_FLOAT, _class=["loc:@rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](rnn/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases, save/RestoreV2_17/_49)]]
@indraastra
Contributor

I ran into a similar error but TF complained about a different, more interpretable issue:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [512,53] rhs shape= [200,53]
	 [[Node: save/Assign_7 = Assign[T=DT_FLOAT, _class=["loc:@prediction/W"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](prediction/W/Adam_1, save/RestoreV2_7)]]

When I changed n_cells from 512 to 200, the error went away, but I started getting mostly garbage output again.
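
Both shape errors are consistent with a checkpoint trained with n_cells = 200 being loaded into a graph built with n_cells = 512. A rough sketch of the arithmetic, assuming the standard LSTM layout in which the four gates share one concatenated bias vector of length 4 * n_cells (the function name is hypothetical):

```python
def lstm_units_from_bias_shape(bias_shape):
    """Infer the LSTM layer width from a 1-D bias shape such as [2048].

    Assumption: the cell concatenates the biases of its four gates,
    so the bias length is 4 * n_cells.
    """
    (size,) = bias_shape
    if size % 4 != 0:
        raise ValueError("bias length %d is not a multiple of 4" % size)
    return size // 4


# lhs is the variable in the current graph, rhs is the value in the checkpoint.
graph_units = lstm_units_from_bias_shape([2048])      # 512
checkpoint_units = lstm_units_from_bias_shape([800])  # 200
```

The prediction/W error tells the same story more directly: its first dimension is 512 in the graph and 200 in the checkpoint.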

@indraastra
Contributor

Ah, I forgot to make the "os.path.exists(ckpt_name + '.index')" fix at inference time. With that check added after initialization, I see reasonable output (well, as reasonable as one can expect in this case).

@pkmital
Owner

pkmital commented Jul 20, 2017

Is this working now then or is there still some issue?

@indraastra
Contributor

It works for me with the following changes (but doesn't work as-is):

  1. Replace all existence checks for ckpt_name with ckpt_name + '.index'
  2. Change n_cells from 512 to 200

@pkmital
Owner

pkmital commented Sep 6, 2017

New model provided in 5ef9564

@pkmital pkmital closed this as completed Sep 6, 2017
@pkmital
Owner

pkmital commented Sep 6, 2017

Sorry, this is still half an issue: the new model is compatible, but in infer, the code that checks whether the model exists does not work. It should use e.g. latest_checkpoint, or the check should be removed entirely; otherwise the model is not loaded.
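
One caveat on the latest_checkpoint route: tf.train.latest_checkpoint resolves the prefix from a state file named checkpoint in the model directory, so it returns None if only the .index and .data files are shipped. Below is a hedged, pure-Python stand-in for that case. It is not TensorFlow's implementation; the function name is hypothetical, and it assumes V2-format checkpoints sitting directly in the given directory.

```python
import os


def newest_checkpoint_prefix(directory):
    """Return the prefix of the most recently modified V2 checkpoint, or None.

    Assumption: each checkpoint has a `<prefix>.index` file located
    directly inside `directory`.
    """
    suffix = ".index"
    index_files = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    ]
    if not index_files:
        return None
    # Pick the index file with the latest modification time.
    newest = max(index_files, key=os.path.getmtime)
    # Strip the suffix to obtain the prefix that saver.restore expects.
    return newest[: -len(suffix)]
```

The returned prefix (when not None) is what would be passed to saver.restore(sess, prefix); a None result means there is nothing to restore and the model stays untrained.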

@pkmital pkmital reopened this Sep 6, 2017