
GroundingDINO Python Package #88

Open
FANGAreNotGnu opened this issue May 3, 2023 · 20 comments

Comments

@FANGAreNotGnu

Hi, thanks for the great work! Is there a plan for the official pypi release?

@rentainhe
Collaborator

rentainhe commented May 4, 2023

Hi, thanks for the great work! Is there a plan for the official pypi release?

Sure. To make usage more convenient, we will try to publish a PyPI version in a future release.

@tonyhoo

tonyhoo commented May 5, 2023

I would like to have this available on PyPI as well, for easy installation.

@giswqs

giswqs commented May 22, 2023

I have added the package to PyPI. Will try to get it on conda-forge as well. I would be happy to add maintainers to the package if anyone is interested.

PyPI: https://pypi.org/project/groundingdino-py
GitHub: https://github.com/giswqs/GroundingDINO

pip install groundingdino-py

PS: There are some other packages on PyPI with groundingdino in their names, so I had to use the alternative package name groundingdino-py, as PyPI does not allow the name groundingdino.

I wanted to add groundingdino to PyPI for the downstream package segment-geospatial
opengeos/segment-geospatial#62 (comment)

@yeldarby

yeldarby commented Jun 2, 2023

I have added the package to PyPI. Will try to get it on conda-forge as well. I would be happy to add maintainers to the package if anyone is interested.

@giswqs - looks like that strips out the CUDA stuff so only runs on CPU (but with no warning); is that correct?

@giswqs

giswqs commented Jun 2, 2023

@yeldarby I think it can still utilize the GPU. The GPU installation is handled by torch-gpu, so GroundingDINO does not have to handle it.

I learned this from @darshats in #8 and his repo: main...darshats:GroundingDINO:main

I have been using GroundingDINO with the segment-geospatial package, and it seems to be working fine: https://samgeo.gishub.org/examples/text_prompts/
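For anyone unsure which code path they are on, a quick check is possible (a sketch: it assumes the custom ops are compiled into a _C extension module inside the package, which is what ms_deform_attn.py tries to import):

import torch

# Is a CUDA-capable GPU visible to PyTorch at all?
print("CUDA available:", torch.cuda.is_available())

# Did the custom C++/CUDA extension compile and load?
try:
    from groundingdino import _C  # noqa: F401
    print("Custom ops loaded: deformable attention uses the compiled kernel.")
except ImportError:
    print("Custom ops missing: the pip fork falls back to plain PyTorch.")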

@yeldarby

yeldarby commented Jun 2, 2023

The GPU installation is handled by torch-gpu, so GroundingDINO does not have to handle it.

GroundingDINO has these custom C++ and CUDA files: https://github.com/IDEA-Research/GroundingDINO/tree/main/groundingdino/models/GroundingDINO/csrc

Are the compiled versions of those not needed to run with GPU?

@giswqs

giswqs commented Jun 2, 2023

It failed to compile on my Linux machine with the latest CUDA, which is why I had to remove the CUDA code from GroundingDINO. After that, the installation went smoothly, and I was able to use it with SAM. It is pretty fast. See the example below. However, I am not sure whether GroundingDINO uses the GPU or not in this case, as I am not a GroundingDINO expert. It would be great if GroundingDINO could make the installation a bit smoother. There are many installation-related issues reported here, and I spent hours trying to install it.

03-text-prompt.mp4
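For reference, this kind of compile failure is often caused by the CUDA_HOME environment variable not being set when building from source; the upstream README suggests setting it before installing (the path below is illustrative and machine-specific):

export CUDA_HOME=/path/to/cuda-11.3
pip install -e .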

@yeldarby

yeldarby commented Jun 2, 2023

It would be great if GroundingDINO could make the installation a bit smoother. There are many installation-related issues reported here, and I spent hours trying to install it.

Definitely agree! I've been trying to build generic wheels linked against various versions of PyTorch and CUDA with torch-extension-builder, but haven't quite been able to get it working.

I posted a bounty on Replit as a bit of an incentive if anyone wants to make the install more robust! https://replit.com/bounties/@roboflow/package-open-source

@yeldarby

yeldarby commented Jun 3, 2023

As a followup, I ran a benchmark:

  • Installing via cloning the repo (A100 on Colab): 100 inferences in 19.1 seconds (5.2 fps)
  • Installing via cloning the repo (CPU on Colab): 100 inferences in 1709 seconds (0.06 fps)
  • Installing via @giswqs' groundingdino-py fork from PyPI (A100 on Colab): 100 inferences in 22.1 seconds (4.5 fps)

So GPU acceleration definitely makes a big difference (and the fork appears to be running mostly on the GPU). That's probably good enough for my purposes! Besides the ~15% slowdown, the only downside is needing to restart the runtime after installing, due to "The following packages were previously imported in this runtime: [cycler, pyparsing]".

Do users have to supply their own config/GroundingDINO_SwinT_OGC.py to use your package from pip? Or is there an easy way to use the bundled one?
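A timing loop for reproducing this kind of benchmark might look like the following (a sketch, not the exact harness used above; the config/weight paths, image, and prompt are illustrative):

import time
import torch
from groundingdino.util.inference import load_model, load_image, predict

# Illustrative paths; point these at your own config and checkpoint.
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py",
                   "weights/groundingdino_swint_ogc.pth")
image_source, image = load_image("example.jpg")

start = time.time()
for _ in range(100):
    with torch.no_grad():
        boxes, logits, phrases = predict(
            model=model, image=image, caption="dog",
            box_threshold=0.35, text_threshold=0.25,
        )
elapsed = time.time() - start
print(f"100 inferences in {elapsed:.1f} seconds ({100 / elapsed:.2f} fps)")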

@giswqs

giswqs commented Jun 3, 2023

@yeldarby Thanks for sharing the benchmark. It is great to know the pip package does run on GPU.

The pip package already includes the config files. The package only removes the CUDA compilation step to make the installation easier; all other files remain the same as in the original GroundingDINO repo. See https://github.com/giswqs/GroundingDINO/tree/main/groundingdino/config
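With the pip package there should therefore be no need to supply your own copy; one way to resolve the bundled config from the installed package is a sketch like this (it assumes the standard package layout linked above):

import os
import groundingdino

# Resolve the config file that ships inside the installed package.
config_path = os.path.join(
    os.path.dirname(groundingdino.__file__),
    "config",
    "GroundingDINO_SwinT_OGC.py",
)
print(config_path)  # pass this path to load_model() instead of a local copy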

@rohit901

rohit901 commented Jun 5, 2023

Thanks a lot for providing the pip package, @giswqs. I also faced a lot of issues while trying to compile/install this package on my remote machine. The pip package seems to work properly. Could the authors @rentainhe @SlongLiu add this pip installation to the README?

@giswqs

giswqs commented Jun 5, 2023

If anyone wants to be a maintainer of the PyPI package, please let me know. I would be happy to add maintainers or transfer ownership.

@rentainhe rentainhe pinned this issue Jun 5, 2023
@rentainhe
Collaborator

rentainhe commented Jun 5, 2023

Thanks a lot for providing the pip package, @giswqs. I also faced a lot of issues while trying to compile/install this package on my remote machine. The pip package seems to work properly. Could the authors @rentainhe @SlongLiu add this pip installation to the README?

Sure! Thank you so much for providing this! We will highlight it in the README and pin this issue. You can refine the issue title to let more people know about this update.

@rentainhe rentainhe changed the title Is there a plan for the official pypi release? GroundingDINO Python Package Jun 5, 2023
@rohit901

rohit901 commented Jun 6, 2023

I would like to post an update on this pip package, @giswqs @rentainhe.
Yesterday I did not try running inference with the installation from the pip package; I had just imported the module, and since it loaded fine I assumed it was working.

However, today I tried to run inference with the model on GPU after installing this library from the PyPI package provided by @giswqs, and the moment I import the library

from groundingdino.util.inference import load_model, load_image, predict, annotate

I get the following warning that it failed to load the custom C++ ops:

/home/rohit.bharadwaj/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
  warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")

Further, when trying to run inference by passing images to the model on GPU, I get the following error and the code does not work:

NameError: name '_C' is not defined

Complete logs:

8 with torch.no_grad():
----> 9     output = model(image, captions = TEXT_PROMPT_LIST)
     11 prediction_logits = output["pred_logits"].cpu().sigmoid()  # prediction_logits.shape = (batch, nq, 256)
     12 prediction_boxes = output["pred_boxes"].cpu() # prediction_boxes.shape = (batch, nq, 4)

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/groundingdino.py:313, in GroundingDINO.forward(self, samples, targets, **kw)
    310         poss.append(pos_l)
    312 input_query_bbox = input_query_label = attn_mask = dn_meta = None
--> 313 hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
    314     srcs, masks, input_query_bbox, poss, input_query_label, attn_mask, text_dict
    315 )
    317 # deformable-detr-like anchor update
    318 outputs_coord_list = []

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py:258, in Transformer.forward(self, srcs, masks, refpoint_embed, pos_embeds, tgt, attn_mask, text_dict)
    253 enc_topk_proposals = enc_refpoint_embed = None
    255 #########################################################
    256 # Begin Encoder
    257 #########################################################
--> 258 memory, memory_text = self.encoder(
    259     src_flatten,
    260     pos=lvl_pos_embed_flatten,
    261     level_start_index=level_start_index,
    262     spatial_shapes=spatial_shapes,
    263     valid_ratios=valid_ratios,
    264     key_padding_mask=mask_flatten,
    265     memory_text=text_dict["encoded_text"],
    266     text_attention_mask=~text_dict["text_token_mask"],
    267     # we ~ the mask . False means use the token; True means pad the token
    268     position_ids=text_dict["position_ids"],
    269     text_self_attention_masks=text_dict["text_self_attention_masks"],
    270 )
    271 #########################################################
    272 # End Encoder
    273 # - memory: bs, \sum{hw}, c
   (...)
    277 # - enc_intermediate_refpoints: None or (nenc+1, bs, nq, c) or (nenc, bs, nq, c)
    278 #########################################################
    279 text_dict["encoded_text"] = memory_text

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py:576, in TransformerEncoder.forward(self, src, pos, spatial_shapes, level_start_index, valid_ratios, key_padding_mask, memory_text, text_attention_mask, pos_text, text_self_attention_masks, position_ids)
    574 # main process
    575 if self.use_transformer_ckpt:
--> 576     output = checkpoint.checkpoint(
    577         layer,
    578         output,
    579         pos,
    580         reference_points,
    581         spatial_shapes,
    582         level_start_index,
    583         key_padding_mask,
    584     )
    585 else:
    586     output = layer(
    587         src=output,
    588         pos=pos,
   (...)
    592         key_padding_mask=key_padding_mask,
    593     )

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/utils/checkpoint.py:211, in checkpoint(function, *args, **kwargs)
    208 if kwargs:
    209     raise ValueError("Unexpected keyword arguments: " + ",".join(arg for arg in kwargs))
--> 211 return CheckpointFunction.apply(function, preserve, *args)

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/utils/checkpoint.py:90, in CheckpointFunction.forward(ctx, run_function, preserve_rng_state, *args)
     87 ctx.save_for_backward(*tensor_inputs)
     89 with torch.no_grad():
---> 90     outputs = run_function(*args)
     91 return outputs

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/transformer.py:785, in DeformableTransformerEncoderLayer.forward(self, src, pos, reference_points, spatial_shapes, level_start_index, key_padding_mask)
    780 def forward(
    781     self, src, pos, reference_points, spatial_shapes, level_start_index, key_padding_mask=None
    782 ):
    783     # self attention
    784     # import ipdb; ipdb.set_trace()
--> 785     src2 = self.self_attn(
    786         query=self.with_pos_embed(src, pos),
    787         reference_points=reference_points,
    788         value=src,
    789         spatial_shapes=spatial_shapes,
    790         level_start_index=level_start_index,
    791         key_padding_mask=key_padding_mask,
    792     )
    793     src = src + self.dropout1(src2)
    794     src = self.norm1(src)

File ~/.conda/envs/RNCDL/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py:338, in MultiScaleDeformableAttention.forward(self, query, key, value, query_pos, key_padding_mask, reference_points, spatial_shapes, level_start_index, **kwargs)
    335     sampling_locations = sampling_locations.float()
    336     attention_weights = attention_weights.float()
--> 338 output = MultiScaleDeformableAttnFunction.apply(
    339     value,
    340     spatial_shapes,
    341     level_start_index,
    342     sampling_locations,
    343     attention_weights,
    344     self.im2col_step,
    345 )
    347 if halffloat:
    348     output = output.half()

File ~/Documents/Projects/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py:53, in MultiScaleDeformableAttnFunction.forward(ctx, value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, im2col_step)
     42 @staticmethod
     43 def forward(
     44     ctx,
   (...)
     50     im2col_step,
     51 ):
     52     ctx.im2col_step = im2col_step
---> 53     output = _C.ms_deform_attn_forward(
     54         value,
     55         value_spatial_shapes,
     56         value_level_start_index,
     57         sampling_locations,
     58         attention_weights,
     59         ctx.im2col_step,
     60     )
     61     ctx.save_for_backward(
     62         value,
     63         value_spatial_shapes,
   (...)
     66         attention_weights,
     67     )
     68     return output

NameError: name '_C' is not defined

Perhaps this is because some of the CUDA-related files were removed while building the PyPI package, @giswqs? Would it be possible for someone to provide an updated PyPI package compiled with these CUDA files as well?

@rohit901

rohit901 commented Jun 6, 2023

Sorry for the above comment and the confusion; I verified with inference, and it seems to be working fine.

I was running the code from the wrong directory, so Python was importing the modules from the current directory instead of from the PyPI package.

So to confirm: the above PyPI package seems to work even with GPU. Thank you again, @giswqs.
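For anyone who hits the same shadowing issue, a quick way to check which copy of the package Python actually imported (a one-line sketch):

import groundingdino

# A path under site-packages means the pip install is in use;
# a path under your working directory means a local clone is shadowing it.
print(groundingdino.__file__)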

@giswqs

giswqs commented Jul 19, 2023

I have added groundingdino to conda-forge, so it can now be easily installed with conda or mamba. Let me know if anyone is interested in becoming a maintainer of the conda-forge package.

mamba install -c conda-forge groundingdino-py

@ash368

ash368 commented Aug 13, 2023

I have added the package to PyPI. Will try to get it on conda-forge as well. I would be happy to add maintainers to the package if anyone is interested.

PyPI: https://pypi.org/project/groundingdino-py GitHub: https://github.com/giswqs/GroundingDINO

pip install groundingdino-py

PS: There are some other packages on PyPI with groundingdino in their names, so I had to use the alternative package name groundingdino-py, as PyPI does not allow the name groundingdino.

I wanted to add groundingdino to PyPI for the downstream package segment-geospatial opengeos/segment-geospatial#62 (comment)

This saved me, thank you!

@MLRadfys

Hi, and thanks for providing Grounding DINO as a pip package, @giswqs!

I compared the inference output of the original repo with the output of the package, and they don't seem to be the same. For label prompts, both the original repo and the pip package give the same result; however, the pip package seems to have issues with sentence prompts.

Has anyone else encountered this issue?

Cheers,

M
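A minimal sketch for comparing the two installs on a sentence prompt (paths, image, and thresholds are illustrative; run it once per install and diff the outputs):

from groundingdino.util.inference import load_model, load_image, predict

model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py",
                   "weights/groundingdino_swint_ogc.pth")  # illustrative paths
image_source, image = load_image("example.jpg")

# A full sentence prompt, as opposed to a simple label such as "dog".
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="a dog lying on the sofa next to a cushion",
    box_threshold=0.35,
    text_threshold=0.25,
)
print(phrases, logits)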

@xiaobanni

Hi, and thanks for providing Grounding DINO as a pip package, @giswqs!

I compared the inference output of the original repo with the output of the package, and they don't seem to be the same. For label prompts, both the original repo and the pip package give the same result; however, the pip package seems to have issues with sentence prompts.

Has anyone else encountered this issue?

Cheers,

M

Can you provide some specific examples for better analysis?

@iorileslie

How can I install or use this on arm64?
