
Suggestion: the batchsize dimension should never be broadcastable, even if batchsize == 1 #873

Open
guoxuesong opened this issue Aug 9, 2017 · 8 comments

Comments

guoxuesong (Author) commented Aug 9, 2017

The following code in lasagne/layers/input.py:

    input_var_type = T.TensorType(theano.config.floatX,
                                  [s == 1 for s in self.shape])

would be best changed to:

    input_var_type = T.TensorType(theano.config.floatX,
                                  [False] + [s == 1 for s in self.shape[1:]])
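To make the difference concrete, here is a small pure-Python sketch (no Theano required) of the broadcast pattern each variant would produce for a batchsize of 1; the helper names are hypothetical:

```python
# Sketch of the broadcast pattern InputLayer would derive under the
# current code versus the proposed change (pure Python, no Theano needed).

def current_pattern(shape):
    # current lasagne/layers/input.py: every size-1 dim becomes broadcastable
    return [s == 1 for s in shape]

def proposed_pattern(shape):
    # proposed: the batch dimension is never broadcastable
    return [False] + [s == 1 for s in shape[1:]]

print(current_pattern((1, 3)))   # [True, False]  -> batch dim broadcastable
print(proposed_pattern((1, 3)))  # [False, False] -> batch dim fixed
```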
f0k (Member) commented Sep 8, 2017

Sorry for the delay. Why would you want this? There could be use cases for a broadcastable batch dimension.

guoxuesong (Author) commented:

@f0k Consider this case: we train a model with batchsize > 1 and run inference with batchsize = 1. This breaks code that uses the output of a layer of shape (batchsize,) as a vector: at inference time it is not a plain vector, because dim 0 is broadcastable, while at training time it is.

f0k (Member) commented Sep 15, 2017

> This will break the code using the output of a layer which has shape (batchsize,) as a vector.

Which layer breaks when it sees a vector of broadcast pattern (True,) instead of (False,)? And if you train with batchsize > 1, why not leave the batchsize unspecified as None when compiling for inference?
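As an aside on the suggestion above: because InputLayer derives the broadcast pattern as `[s == 1 for s in shape]`, leaving the batchsize as None already yields a non-broadcastable batch dimension (since `None == 1` is False). A minimal pure-Python sketch:

```python
# InputLayer computes the broadcast pattern as [s == 1 for s in shape];
# an unspecified batchsize (None) therefore never becomes broadcastable.
pattern_fixed = [s == 1 for s in (1, 10)]     # batchsize given as 1
pattern_none  = [s == 1 for s in (None, 10)]  # batchsize left as None

print(pattern_fixed)  # [True, False]  -> broadcastable batch dim
print(pattern_none)   # [False, False] -> ordinary batch dim
```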

guoxuesong (Author) commented Sep 15, 2017

@f0k I see your point, but leaving the batchsize unspecified as None looks like a workaround tip rather than a designed behaviour.

Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it. If we treat the documentation as a statement of the design, then the implementation should support the straightforward reading of that documentation.

Maybe a broadcastable batch dimension is useful in some cases, but nobody knows about it without reading the code. On the other hand, expecting batchsize == 1 to behave like batchsize > 1 is reasonable if you only read the documentation.

By the way, the details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output, so no Lasagne code itself breaks.

f0k (Member) commented Sep 15, 2017

> Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it.

Yes, I agree. We want it to work better or equally well when providing the batchsize, not worse.

> The details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output, so no Lasagne code itself breaks.

I still don't understand what breaks -- it shouldn't make a difference whether it's broadcastable or not?

guoxuesong (Author) commented Sep 15, 2017

@f0k When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

Traceback (most recent call last):
  File "mnist.py", line 39, in <module>
    main()
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 2197, in main
    run(args)
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 1478, in run
    ordered_errors = get_ordered_errors(raw_errors)
  File "/home/ubuntu/deepstacks/deepstacks/lasagne/utils.py", line 17, in ordered_errors
    res += [[prefix+t, map(curry(lasagne.layers.get_output,deterministic=deterministic), errors[t])]]
  File "/home/ubuntu/deepstacks/deepstacks/utils/curry.py", line 14, in __call__
    return self.fun(*(self.pending + args), **kw)
  File "/home/ubuntu/Lasagne/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/ubuntu/Lasagne/lasagne/layers/merge.py", line 352, in get_output_for
    output = self.merge_function(output, input)
  File "/home/ubuntu/deepstacks/deepstacks/framework/macros.py", line 13, in <lambda>
    'equal':[target,'classify',lambda x,y:r*lasagne.objectives.categorical_crossentropy(x,y),],
  File "/home/ubuntu/Lasagne/lasagne/objectives.py", line 179, in categorical_crossentropy
    return theano.tensor.nnet.categorical_crossentropy(predictions, targets)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 2099, in categorical_crossentropy
    return crossentropy_categorical_1hot(coding_dist, true_dist)
  File "/home/ubuntu/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 1452, in make_node
    tensor.lvector))
TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

The following patch was applied to deepstacks/deepstacks/framework/main.py to reproduce the behaviour of lasagne.layers.InputLayer:

--- main.py     2017-09-16 06:08:06.931262231 +0800
+++ main-likelasagne.py 2017-09-16 06:08:08.427262241 +0800

@@ -1453,7 +1453,7 @@
         logging.info(string.strip(out.getvalue()))
         name=k
         input_var_type = T.TensorType(dtypes[k],
-                [False,]+[s == 1 for s in m[k][1:]])
+                [s == 1 for s in m[k][:]])
         var_name = ("%s.input" % name) if name is not None else "input"
         input_var = input_var_type(var_name)
         inputs[k]=lasagne.layers.InputLayer(name=name,input_var=input_var,shape=m[k])

f0k (Member) commented Dec 31, 2017

> When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError

Hmm, this should either be changed in Theano's or Lasagne's categorical_crossentropy(). It should be able to handle a singleton batch dimension.
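One possible fix on the Lasagne side, sketched here as a hypothetical patch against lasagne/objectives.py (untested; context and placement are illustrative): unbroadcasting the targets with theano.tensor.unbroadcast before handing them to Theano would let a singleton batch dimension through.

--- objectives.py
+++ objectives-unbroadcast.py

 def categorical_crossentropy(predictions, targets):
+    # drop a broadcastable batch dimension so Theano sees a plain lvector
+    if targets.ndim == 1 and targets.broadcastable[0]:
+        targets = theano.tensor.unbroadcast(targets, 0)
     return theano.tensor.nnet.categorical_crossentropy(predictions, targets)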

guoxuesong (Author) commented:

I agree
