Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addGeneAnnotation module can not annotate the genes on the minus strand #35

Open
fsnibs10 opened this issue Jun 4, 2024 · 2 comments
Open

Comments

@fsnibs10
Copy link

fsnibs10 commented Jun 4, 2024

Hi authors,

My studied organism is Schizosaccharomyces pombe . The latest GFF annotation file is from the Pombase website, not from Ensembl. So I construct the TxDb object by giving the GFF file. The BSgenome is also self constructed by using the BSgenome package. I want to find the spacer sequences targeting a specific gene guided by the website Design_CRISPRko_Cas9.

It is strange that the program worked very well for all genes on the plus strand, but it promotes an error for genes on the minus strand ( 'start' or 'end' cannot contain NAs).

The detailed procession is shown below.

> pombe_gff_file <- "./DY47073.gff3"
> pombe_txdb <- getTxDb(pombe_gff_file, organism=NA)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse



> # gene on the plus strand
> pombe_ade6_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.13")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_ade6_guideSet <- findSpacers(pombe_ade6_gr,
+                              bsgenome=mybsgenome,
+                              crisprNuclease=SpCas9)
> 
> pombe_ade6_guideSet <- addGeneAnnotation(pombe_ade6_guideSet,
+                                    txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns


> # gene on the minus strand
> pombe_vtc4_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="cds",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_vtc4_guideSet <- findSpacers(pombe_vtc4_gr,
+                                    bsgenome=mybsgenome,
+                                    crisprNuclease=SpCas9)
> 
> pombe_vtc4_guideSet <- addGeneAnnotation(pombe_vtc4_guideSet,
+                                          txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
Error in.new_IRanges_from_start_end(start, end): 
  'start' or 'end' cannot contain NAs


Below are the part of two genes from gff file.

chrIII	Liftoff	gene	1363196	1364854	.	+	.	ID=SPCC1322.13;Name=ade6;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.13_0
chrIII	Liftoff	mRNA	1363196	1364854	.	+	.	ID=SPCC1322.13.1;Parent=SPCC1322.13;product=phosphoribosylaminoimidazole carboxylase Ade6;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1363196	1364854	.	+	.	ID=SPCC1322.13.1:exon:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0
chrIII	Liftoff	CDS	1363196	1364854	.	+	0	ID=SPCC1322.13.1:CDS:1;Parent=SPCC1322.13.1;product=phosphoribosylaminoimidazole carboxylase Ade6;extra_copy_number=0

chrIII	Liftoff	gene	1365284	1367449	.	-	.	ID=SPCC1322.14c;Name=vtc4;so_term_name=protein_coding_gene;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SPCC1322.14c_0
chrIII	Liftoff	mRNA	1365284	1367449	.	-	.	ID=SPCC1322.14c.1;Parent=SPCC1322.14c;product=vacuolar transporter chaperone (VTC) complex subunit;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
chrIII	Liftoff	exon	1365284	1367449	.	-	.	ID=SPCC1322.14c.1:exon:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0
chrIII	Liftoff	CDS	1365284	1367449	.	-	0	ID=SPCC1322.14c.1:CDS:1;Parent=SPCC1322.14c.1;product=vacuolar transporter chaperone (VTC) complex subunit;extra_copy_number=0

I don't know how to debug this. I am looking forward to your reply.

@fsnibs10
Copy link
Author

Hi all,

With the help of my colleague,this bug is now fixed. When constructing the TxDb from the GFF file, I forgot to include the chrominfo parameter, which includes the chromosome length. This is necessary for the genes on the minus strand. All in all, it is a powerful tool.

@fsnibs10
Copy link
Author

Hi all,

I also want to show my script for others to use.

> library(BSgenome.Spombe.DY47073.4CrisprVerse)
> mybsgenome <- BSgenome.Spombe.DY47073.4CrisprVerse
> data(SpCas9, package="crisprBase")

> gfffile <- "./DY47073.gff3"
> chrominfo <- data.frame(chrom=c("chrI","chrII","chrIII","chrrDNA_distal_contig1","chrrDNA_distal_contig2","chrMT"),
                        length=c(5623850,4644548,2508823,16591,8860,19433),
                        is_circular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE))
> pombe_txdb <- getTxDb(file=gfffile,organism=NA,chrominfo=chrominfo)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> 
> pombe_grList <- TxDb2GRangesList(pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> GenomeInfoDb::genome(pombe_grList) <- "Spombe"
> GenomeInfoDb::seqlengths(pombe_grList)
                  chrI                  chrII                 chrIII 
               5623850                4644548                2508823 
chrrDNA_distal_contig1 chrrDNA_distal_contig2                   chrM 
                 16591                   8860                  19433 


###### test gene on the minus strand
> pombe_gene_gr <- queryTxObject(txObject=pombe_txdb,
+                                featureType="transcript",
+                                queryColumn="gene_id",
+                                queryValue="SPCC1322.14c")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> pombe_gene_guideSet <- findSpacers(pombe_gene_gr,
+                                    bsgenome=mybsgenome,
+                                    crisprNuclease=SpCas9)
Warning:
In .merge_two_Seqinfo_objects(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrMT
  - in 'y': chrM
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
> pombe_gene_guideSet <- addGeneAnnotation(pombe_gene_guideSet,
+                                          txObject=pombe_txdb)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant