This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

asctb-ct-label-mapper: A package to recommend controlled vocabulary for annotations of scRNA-seq datasets, and thereby enable cross-dataset or cross-experiment comparison of annotations.

hubmapconsortium/asctb-ct-label-mapper

ASCT+B Cell-Type Label Mapper

asctb_ct_label_mapper is a package to ensure controlled vocabulary for annotations of scRNA-seq datasets. The goal is to enable cross-dataset or cross-experiment comparison of data by aligning annotations to a standard reference point.

Given a specific organ's scRNA-seq annotated dataset (.h5ad/.rds), you can create a translation file for mapping raw-labels to the ASCT+B naming convention.


General flow:

  1. Create the reference embeddings for the corresponding ASCT+B organ table (latest version):
  • Fetch the organ's table from the ASCT+B Master Tables.
  • Parse the data into three columns: CT-ID, CT-Name, CT-Label.
  • Fetch the description of each unique CT-ID from the Cell Ontology.
  • Apply standard NLP preprocessing to the text fields.
  • Use a Sentence-Transformer model hosted on Hugging Face to create embeddings of shape c × 768 (c is the number of unique CTs in the ASCT+B Master Table).

  2. For each input raw Cell-Type annotation/cluster label, create the embedding and compare it against the reference embeddings generated in step 1.

  3. Identify the best-matching ASCT+B label for the input raw label.

  4. You can also visualize the agreement of cross-dataset annotations before and after using the ASCT+B CT Label Mapper.
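The preprocessing step in the flow above is not spelled out in this README, so the following is only a minimal sketch of what such normalization might look like. The function names `normalize_label` and `build_reference_text` are hypothetical and are not part of the package's API; the exact cleaning rules are assumptions.

```python
import re


def normalize_label(text: str) -> str:
    """Lowercase a label, turn separators and punctuation into spaces,
    and collapse repeated whitespace. '+' is kept because it is
    meaningful in marker names such as 'CD8+'. (Assumed rules.)"""
    text = text.lower()
    text = re.sub(r"[_\-/]+", " ", text)      # separators -> space
    text = re.sub(r"[^a-z0-9+ ]", " ", text)  # other punctuation -> space
    return re.sub(r"\s+", " ", text).strip()


def build_reference_text(ct_name: str, ct_label: str, description: str = "") -> str:
    """Combine CT-Name, CT-Label and the Cell Ontology description into
    one string to embed, skipping empty or repeated parts. (Hypothetical helper.)"""
    parts = []
    for field in (ct_name, ct_label, description):
        cleaned = normalize_label(field) if field else ""
        if cleaned and cleaned not in parts:
            parts.append(cleaned)
    return ". ".join(parts)


if __name__ == "__main__":
    print(normalize_label("CD8+ T_cell (activated)"))
    print(build_reference_text("Podocyte", "podocyte",
                               "A specialized kidney epithelial cell."))
```

The resulting strings would then be passed to the Sentence-Transformer model; the same normalization should be applied to the raw query labels so that both sides are embedded from comparable text.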


A walkthrough is available on Google Colab here.


Architecture:


Step 1: Create Reference Embeddings


Step 2: Map input Cell-Type labels to these Reference Embeddings


Output: The top-2 matches from ASCT+B, offered as suggestions for each query Cell-Type annotation label.

An expert then reviews these suggestions to finalize the translation from the query annotation label to the ASCT+B annotation label.
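As an illustration of how the top-2 suggestions could be produced, here is a minimal sketch that ranks reference labels by cosine similarity to a query embedding. It uses toy 3-dimensional vectors in place of real 768-dimensional embeddings, and the labels, vectors, and the function name `top_k_matches` are made up for the example rather than taken from the package.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for a zero vector: treat as no similarity
    return dot / (norm_u * norm_v)


def top_k_matches(query_vec, reference, k=2):
    """Return the k reference labels most similar to query_vec,
    as (label, score) pairs sorted from best to worst."""
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in reference.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy reference embeddings (illustrative values only).
    reference = {
        "podocyte": [1.0, 0.0, 0.0],
        "mesangial cell": [0.0, 1.0, 0.0],
        "endothelial cell": [0.8, 0.6, 0.0],
    }
    query = [1.0, 0.1, 0.0]  # stand-in for the embedding of a raw label
    for label, score in top_k_matches(query, reference, k=2):
        print(f"{label}\t{score:.3f}")
```

Returning two candidates with their scores, rather than a single best match, leaves the final decision to the expert, which fits the review step described above.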

[Figure: Output summary]


Cosine Similarity
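For reference, the standard definition of cosine similarity is given below. The symbols are chosen here for illustration: $q$ denotes the embedding of a query label and $r_i$ the embedding of the $i$-th of the $c$ reference cell types.

$$
\operatorname{sim}(q, r_i) = \frac{q \cdot r_i}{\lVert q \rVert \, \lVert r_i \rVert}, \qquad i = 1, \dots, c
$$

The score lies in $[-1, 1]$, with values closer to 1 indicating that the two embeddings point in nearly the same direction.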

