[Fixed]Can't Train Phi3 on Kaggle (ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)) #443

ShazzadAliShozol opened this issue May 9, 2024 · 4 comments


I'm trying to train Phi3 on Kaggle. I have already trained it for 300 steps on Google Colab with no issues and saved the model to wandb. I then downloaded the model from wandb to continue training on Kaggle, but I keep getting this error whenever I try to train it. Can anyone help me out? The error seems to come from Triton. I've installed the dependencies as instructed in the notebook.

!pip install -U "xformers<0.0.26" --index-url
!pip install "unsloth[kaggle-new] @ git+"

# Temporary fix for
!pip install datasets==2.16.0 fsspec==2023.10.0 gcsfs==2023.10.0

ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 trainer_stats = trainer.train(resume_from_checkpoint = True)

File /opt/conda/lib/python3.10/site-packages/trl/trainer/, in SFTTrainer.train(self, *args, **kwargs)
    358 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
    359     self.model = self._trl_activate_neftune(self.model)
--> 361 output = super().train(*args, **kwargs)
    363 # After training we make sure to retrieve back the original forward pass method
    364 # for the embedding layer by removing the forward post hook.
    365 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File /opt/conda/lib/python3.10/site-packages/transformers/, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1778         hf_hub_utils.enable_progress_bars()
   1779 else:
-> 1780     return inner_training_loop(
   1781         args=args,
   1782         resume_from_checkpoint=resume_from_checkpoint,
   1783         trial=trial,
   1784         ignore_keys_for_eval=ignore_keys_for_eval,
   1785     )

File <string>:355, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File /opt/conda/lib/python3.10/site-packages/transformers/, in Trainer.training_step(self, model, inputs)
   3033     return loss_mb.reduce_mean().detach().to(self.args.device)
   3035 with self.compute_loss_context_manager():
-> 3036     loss = self.compute_loss(model, inputs)
   3038 if self.args.n_gpu > 1:
   3039     loss = loss.mean()  # mean() to average on multi-gpu parallel training

File /opt/conda/lib/python3.10/site-packages/transformers/, in Trainer.compute_loss(self, model, inputs, return_outputs)
   3057 else:
   3058     labels = None
-> 3059 outputs = model(**inputs)
   3060 # Save past state if it exists
   3061 # TODO: this needs to be fixed and made cleaner later.
   3062 if self.args.past_index >= 0:

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/, in DataParallel.forward(self, *inputs, **kwargs)
    183     return self.module(*inputs[0], **module_kwargs[0])
    184 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 185 outputs = self.parallel_apply(replicas, inputs, module_kwargs)
    186 return self.gather(outputs, self.output_device)

File /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/, in DataParallel.parallel_apply(self, replicas, inputs, kwargs)
    199 def parallel_apply(self, replicas: Sequence[T], inputs: Sequence[Any], kwargs: Any) -> List[Any]:
--> 200     return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

File /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/, in parallel_apply(modules, inputs, kwargs_tup, devices)
    106     output = results[i]
    107     if isinstance(output, ExceptionWrapper):
--> 108         output.reraise()
    109     outputs.append(output)
    110 return outputs

File /opt/conda/lib/python3.10/site-packages/torch/, in ExceptionWrapper.reraise(self)
    718 except TypeError:
    719     # If the exception takes multiple arguments, don't try to
    720     # instantiate since we don't know how to
    721     raise RuntimeError(msg) from None
--> 722 raise exception

ValueError: Caught ValueError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/", line 83, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/models/", line 882, in PeftModelForCausalLM_fast_forward
    return self.base_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/models/", line 213, in MistralForCausalLM_fast_forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/models/", line 650, in LlamaModel_fast_forward
    hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/amp/", line 115, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/models/", line 369, in forward
    (output,) = forward_function(hidden_states, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/models/", line 432, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/kernels/", line 190, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/unsloth/kernels/", line 144, in forward
  File "/opt/conda/lib/python3.10/site-packages/triton/runtime/", line 550, in run
ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)

Did you turn on the GPU?

Yes. I had GPU T4 x2 enabled for the sessions.
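
Note that the traceback passes through `DataParallel.forward` and reports "Caught ValueError in replica 1 on device 1", which means the Trainer saw both T4s and replicated the model across them. One thing that may be worth trying in that situation is exposing only a single GPU to the process. The snippet below is a minimal sketch of that idea, not necessarily the fix suggested in this thread; `restrict_to_single_gpu` is a hypothetical helper name, and it only has an effect if it runs before `torch` or `unsloth` initializes CUDA (for example, in the first notebook cell after a kernel restart).

```python
import os


def restrict_to_single_gpu(device_index: int = 0) -> str:
    """Hypothetical helper: make only one GPU visible to this process.

    CUDA reads CUDA_VISIBLE_DEVICES when it is first initialized, so this
    must run before importing torch/unsloth (or at least before any CUDA
    call). Assumption: with one visible device, the Trainer will not wrap
    the model in DataParallel.
    """
    value = str(device_index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this first, then import torch / unsloth and build the trainer as usual.
restrict_to_single_gpu(0)
```

After the imports, `torch.cuda.device_count()` can be used to confirm that only one device is visible.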

Try it:

@ShazzadAliShozol ShazzadAliShozol changed the title Can't Train Phi3 on Kaggle (ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)) [Fixed]Can't Train Phi3 on Kaggle (ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)) May 13, 2024
Yeah, it's working now. Thank you very much. Since the problem is fixed, I'll close the issue.

