tensorflow hvd.DistributedOptimizer bug #3994

Open
Chenjingliang1 opened this issue Oct 12, 2023 · 0 comments · May be fixed by #3995

Environment:

  1. Framework: TensorFlow
  2. Framework version: 2.12
  3. Horovod version: master

`hvd.DistributedOptimizer` fails when the `groups` parameter is set. The error can be reproduced with:
```python
opt = hvd.DistributedOptimizer(opt, op=hvd.Sum, groups=4)
```

The problem is in `grouped_allreduce`: https://github.com/horovod/horovod/blob/master/horovod/tensorflow/__init__.py#L322

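`tensors` can contain both dense `tf.Tensor` objects and `tf.IndexedSlices`, but only `IndexedSlices` has a `dense_shape` attribute. The list comprehension that rebuilds the sparse results zips `new_values` and `new_indices` with `tensors`, so whenever one of those positions in `tensors` holds a dense tensor, reading `t.dense_shape` raises `AttributeError`. The comprehension should iterate over `indexed_slices_list` instead of `tensors`: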

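```python
# Before: dense_shape is read from `tensors`, which may hold dense tf.Tensor objects
new_indexed_slices = [tf.IndexedSlices(x, i, dense_shape=t.dense_shape) for x, i, t in zip(new_values, new_indices, tensors)]
# After: dense_shape is read only from the IndexedSlices entries
new_indexed_slices = [tf.IndexedSlices(x, i, dense_shape=t.dense_shape) for x, i, t in zip(new_values, new_indices, indexed_slices_list)]
```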
Error output:

```
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 764, in compute_gradients
[0]<stderr>:    avg_grads = _filtered_reduce_grads(grads, vars)
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 729, in _filtered_reduce_grads
[0]<stderr>:    rg = self._allreduce_grads(rg, rv)
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 601, in allreduce_grads
[0]<stderr>:    ignore_name_scope=use_generic_names)
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 411, in _grouped_allreduce_cond
[0]<stderr>:    allreduce_fn, id_fn)
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
[0]<stderr>:    raise e.with_traceback(filtered_tb) from None
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 401, in allreduce_fn
[0]<stderr>:    return grouped_allreduce(tensors, *args, process_set=process_set, **kwargs)
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 324, in grouped_allreduce
[0]<stderr>:    dense_shape=t.dense_shape) for x,i,t in zip(new_values, new_indices, tensors)]
[0]<stderr>:  File "/opt/apps/local/lib64/python3/dist-packages/horovod/tensorflow/__init__.py", line 324, in <listcomp>
[0]<stderr>:    dense_shape=t.dense_shape) for x,i,t in zip(new_values, new_indices, tensors)]
[0]<stderr>:AttributeError: 'Tensor' object has no attribute 'dense_shape'
```
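The pairing problem can be illustrated without Horovod or TensorFlow. The sketch below is a simplified model, not Horovod code: `DenseTensor`, `SparseSlices`, and `rebuild` are hypothetical stand-ins for `tf.Tensor`, `tf.IndexedSlices`, and the list comprehension in `grouped_allreduce`, and the "gathered" values and indices are just copied from the inputs. It assumes, as the variable names suggest, that `new_values` and `new_indices` have one entry per element of `indexed_slices_list`.

```python
# Hypothetical stand-ins for tf.Tensor and tf.IndexedSlices (NOT TensorFlow classes);
# they carry just enough structure to show the zip() pairing issue.
class DenseTensor:
    def __init__(self, name):
        self.name = name  # no dense_shape attribute, like a dense tf.Tensor


class SparseSlices:
    def __init__(self, name, values, indices, dense_shape):
        self.name = name
        self.values = values
        self.indices = indices
        self.dense_shape = dense_shape


def rebuild(new_values, new_indices, source):
    """Rebuild sparse results, reading dense_shape from `source` (mirrors the comprehension in the issue)."""
    return [SparseSlices(t.name, x, i, t.dense_shape)
            for x, i, t in zip(new_values, new_indices, source)]


# A mixed group: a dense gradient followed by a sparse (embedding-style) gradient.
tensors = [
    DenseTensor("dense_bias"),
    SparseSlices("embedding", values=[[1.0]], indices=[0], dense_shape=(10, 1)),
]
indexed_slices_list = [t for t in tensors if isinstance(t, SparseSlices)]

# Stand-ins for the gathered results: one entry per element of indexed_slices_list.
new_values = [t.values for t in indexed_slices_list]
new_indices = [t.indices for t in indexed_slices_list]

# Buggy: zipping against `tensors` pairs the first sparse result with the dense tensor.
try:
    rebuild(new_values, new_indices, tensors)
except AttributeError as err:
    print("buggy:", err)  # prints: buggy: 'DenseTensor' object has no attribute 'dense_shape'

# Fixed: zipping against `indexed_slices_list` pairs every result with a sparse entry.
fixed = rebuild(new_values, new_indices, indexed_slices_list)
print("fixed:", [(s.name, s.dense_shape) for s in fixed])
```

In this model the buggy version fails only when a dense entry falls within the first `len(indexed_slices_list)` positions of `tensors`; if all sparse entries come first, the two lists coincide on those positions and the comprehension happens to work.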
Chenjingliang1 linked a pull request on Oct 24, 2023 that will close this issue.