The transition_parser in spaCy is not compatible with the use of CUDA for inference #13462
This is expected behaviour. The transition parser involves making a prediction on each word of the sentence, and then making a state transition using the predicted action. This requires features from the current state, so the predictions cannot be made all at once across the sentence. This sequence of small matrix multiplications is slow on GPU, so it's faster to do the whole-document feature extraction on GPU and then copy the result over to the CPU to predict the transitions.

We've actually tried pretty extensively to get away from this, but the transition-based model is very good, and we can't match it with a more GPU-friendly approach. A key point is that the transition-based approach is able to operate on unsegmented documents, so it can do joint sentence segmentation and parsing. You can find the biaffine parser module in …
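To make the sequential dependency concrete, here is a toy shift/reduce loop (purely illustrative, not spaCy's actual implementation): each action is chosen from the stack and buffer left behind by the previous action, so the per-step predictions cannot be batched the way a whole-document feature extraction can.

```python
def parse(words):
    """Toy transition-based 'parser': attach each word to the word below it
    on the stack. The point is the control flow, not the parsing quality."""
    stack, buf, arcs = [], list(words), []
    while buf or len(stack) > 1:
        # In a real transition-based parser, a small neural network would
        # score the possible actions here, using features of the *current*
        # stack/buffer state. Because that state only exists after the
        # previous action has been applied, these tiny per-step matrix
        # multiplications cannot be batched across the sentence, which is
        # why they run faster on CPU than on GPU.
        if buf and len(stack) < 2:
            stack.append(buf.pop(0))          # SHIFT
        else:
            child = stack.pop()               # REDUCE: attach top of stack
            arcs.append((stack[-1], child))   # to the word below it
    return arcs

print(parse(["a", "b", "c"]))  # [('a', 'b'), ('a', 'c')]
```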
I am facing an issue where I am trying to run a spaCy-based pipeline using the `en_core_web_trf:3.7.3` model, whereby the `transition_parser` seems to be placing tensors on the CPU instead of the GPU, as can be seen in the logs below. I tried multiple fixes, such as `torch.set_default_device("cuda:0")` and `torch.set_default_dtype`, but these don't seem to work.

How to reproduce the behaviour
This error is encountered using the model in an MLServer deployment. It is a bit difficult to provide reproduction code here.
Your Environment