
Suggestion: the batchsize dimension should never be broadcastable, even if batchsize == 1 #873

Open
guoxuesong opened this issue Aug 9, 2017 · 8 comments

Comments

guoxuesong (Author) commented Aug 9, 2017

The following code in lasagne/layers/input.py:

    input_var_type = T.TensorType(theano.config.floatX,
                                  [s == 1 for s in self.shape])

would be best changed to:

    input_var_type = T.TensorType(theano.config.floatX,
                                  [False] + [s == 1 for s in self.shape[1:]])
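To make the difference concrete, here is a small pure-Python sketch (no Theano required) of the broadcast pattern each variant would produce for a batchsize of 1; the helper names are hypothetical:

```python
# Sketch of the broadcast pattern InputLayer would derive under the
# current code versus the proposed change (pure Python, no Theano needed).

def current_pattern(shape):
    # current lasagne/layers/input.py: every size-1 dim becomes broadcastable
    return [s == 1 for s in shape]

def proposed_pattern(shape):
    # proposed: the batch dimension is never broadcastable
    return [False] + [s == 1 for s in shape[1:]]

print(current_pattern((1, 3)))   # [True, False]  -> batch dim broadcastable
print(proposed_pattern((1, 3)))  # [False, False] -> batch dim fixed
```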
f0k (Member) commented Sep 8, 2017

Sorry for the delay. Why would you want this? There could be use cases for a broadcastable batch dimension.

guoxuesong (Author) commented:

@f0k Consider this case: we train a model with batchsize > 1 and run inference with batchsize = 1. This breaks code that uses the output of a layer of shape (batchsize,) as a vector: at inference time it is not a plain vector, because dim 0 is broadcastable, while at training time it is.

f0k (Member) commented Sep 15, 2017

> This will break the code using the output of a layer which has shape (batchsize,) as a vector.

Which layer breaks when it sees a vector of broadcast pattern (True,) instead of (False,)? And if you train with batchsize > 1, why not leave the batchsize unspecified as None when compiling for inference?
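As an aside on the suggestion above: because InputLayer derives the broadcast pattern as `[s == 1 for s in shape]`, leaving the batchsize as None already yields a non-broadcastable batch dimension (since `None == 1` is False). A minimal pure-Python sketch:

```python
# InputLayer computes the broadcast pattern as [s == 1 for s in shape];
# an unspecified batchsize (None) therefore never becomes broadcastable.
pattern_fixed = [s == 1 for s in (1, 10)]     # batchsize given as 1
pattern_none  = [s == 1 for s in (None, 10)]  # batchsize left as None

print(pattern_fixed)  # [True, False]  -> broadcastable batch dim
print(pattern_none)   # [False, False] -> ordinary batch dim
```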

guoxuesong (Author) commented Sep 15, 2017

@f0k I see your point, but leaving the batchsize unspecified as None looks like a workaround tip rather than a designed behaviour.

Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it. If we treat the documentation as a statement of the design, then the implementation should support the straightforward reading of that documentation.

Maybe a broadcastable batch dimension is useful in some cases, but nobody knows about it without reading the code. On the other hand, expecting batchsize == 1 to behave like batchsize > 1 is reasonable if you only read the documentation.

By the way, the details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output, so no Lasagne code itself breaks.

f0k (Member) commented Sep 15, 2017

> Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it.

Yes, I agree. We want it to work better or equally well when providing the batchsize, not worse.

> The details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output, so no Lasagne code itself breaks.

I still don't understand what breaks -- it shouldn't make a difference whether it's broadcastable or not?

guoxuesong (Author) commented Sep 15, 2017

@f0k When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

Traceback (most recent call last):
  File "mnist.py", line 39, in <module>
    main()
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 2197, in main
    run(args)
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 1478, in run
    ordered_errors = get_ordered_errors(raw_errors)
  File "/home/ubuntu/deepstacks/deepstacks/lasagne/utils.py", line 17, in ordered_errors
    res += [[prefix+t, map(curry(lasagne.layers.get_output,deterministic=deterministic), errors[t])]]
  File "/home/ubuntu/deepstacks/deepstacks/utils/curry.py", line 14, in __call__
    return self.fun(*(self.pending + args), **kw)
  File "/home/ubuntu/Lasagne/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/ubuntu/Lasagne/lasagne/layers/merge.py", line 352, in get_output_for
    output = self.merge_function(output, input)
  File "/home/ubuntu/deepstacks/deepstacks/framework/macros.py", line 13, in <lambda>
    'equal':[target,'classify',lambda x,y:r*lasagne.objectives.categorical_crossentropy(x,y),],
  File "/home/ubuntu/Lasagne/lasagne/objectives.py", line 179, in categorical_crossentropy
    return theano.tensor.nnet.categorical_crossentropy(predictions, targets)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 2099, in categorical_crossentropy
    return crossentropy_categorical_1hot(coding_dist, true_dist)
  File "/home/ubuntu/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 1452, in make_node
    tensor.lvector))
TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

The following patch was applied to deepstacks/deepstacks/framework/main.py to reproduce the behaviour of lasagne.layers.InputLayer:

--- main.py     2017-09-16 06:08:06.931262231 +0800
+++ main-likelasagne.py 2017-09-16 06:08:08.427262241 +0800

@@ -1453,7 +1453,7 @@
         logging.info(string.strip(out.getvalue()))
         name=k
         input_var_type = T.TensorType(dtypes[k],
-                [False,]+[s == 1 for s in m[k][1:]])
+                [s == 1 for s in m[k][:]])
         var_name = ("%s.input" % name) if name is not None else "input"
         input_var = input_var_type(var_name)
         inputs[k]=lasagne.layers.InputLayer(name=name,input_var=input_var,shape=m[k])

f0k (Member) commented Dec 31, 2017

> When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError

Hmm, this should either be changed in Theano's or Lasagne's categorical_crossentropy(). It should be able to handle a singleton batch dimension.
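One possible fix on the Lasagne side, sketched here as a hypothetical patch against lasagne/objectives.py (untested; context and placement are illustrative): unbroadcasting the targets with theano.tensor.unbroadcast before handing them to Theano would let a singleton batch dimension through.

--- objectives.py
+++ objectives-unbroadcast.py

 def categorical_crossentropy(predictions, targets):
+    # drop a broadcastable batch dimension so Theano sees a plain lvector
+    if targets.ndim == 1 and targets.broadcastable[0]:
+        targets = theano.tensor.unbroadcast(targets, 0)
     return theano.tensor.nnet.categorical_crossentropy(predictions, targets)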

guoxuesong (Author) commented:

I agree
