Performance

Table of Contents:

Results

PharmCAT comparison

We tested 64 targeted HiFi datasets that were sequenced on Revio (40) and Sequel IIe (24) systems. For each dataset, we ran both PharmCAT and pb-StarPhase to calculate the diplotypes. Each tool was run twice per dataset: once on an unphased VCF (DeepVariant) and once on a phased VCF (DeepVariant + pbsv + HiPhase). The following table shows the percentage of datasets with identical diplotype calls for each CPIC gene. RYR1 was excluded because it is not called by PharmCAT. HLA-A, HLA-B, and CYP2D6 were excluded because they are not called by either tool.

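| Gene | Identical (Unphased) | Identical (Phased) |
|------|----------------------|--------------------|
| ABCG2 | 100% | 100% |
| CACNA1S | 100% | 100% |
| CFTR | 100% | 100% |
| CYP2B6 | 100% | 100% |
| CYP2C19 | 100% | 100% |
| CYP2C9 | 100% | 98.4% |
| CYP3A5 | 100% | 100% |
| CYP4F2 | 100% | 100% |
| DPYD | 82.8% | 68.8% |
| G6PD | 100% | 100% |
| IFNL3 | 100% | 100% |
| NUDT15 | 100% | 100% |
| RNR1 (MT) | 100% | 100% |
| SLCO1B1 | 100% | 100% |
| TPMT | 100% | 100% |
| UGT1A1 | 100% | 100% |
| VKORC1 | 100% | 100% |
| Overall | 99.0% | 98.1% |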
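
The percentages in the table can be sanity-checked against the dataset count. The following is a minimal sketch, assuming that every gene was compared in all 64 datasets and that the "Overall" row pools all gene-level comparisons. The mismatch counts are back-calculated from the published percentages (only the single CYP2C9 discrepancy is stated explicitly in the text), so treat them as an inference rather than raw results.

```python
N_DATASETS = 64  # Revio (40) + Sequel IIe (24)
N_GENES = 17     # number of gene rows in the table

# Mismatching datasets per gene; genes not listed are assumed to have zero.
# ASSUMPTION: counts are back-calculated from the published percentages.
MISMATCHES = {
    "unphased": {"DPYD": 11},
    "phased": {"CYP2C9": 1, "DPYD": 20},
}


def percent_identical(mismatches: int, total: int) -> float:
    """Percentage of comparisons where both tools gave the same diplotype."""
    return 100.0 * (total - mismatches) / total


def overall_identity(mode: str) -> float:
    """Pooled identity across all genes for 'unphased' or 'phased' input."""
    total = N_DATASETS * N_GENES
    return percent_identical(sum(MISMATCHES[mode].values()), total)


for mode, per_gene in MISMATCHES.items():
    for gene, count in per_gene.items():
        print(f"{mode:<10}{gene:<9}{percent_identical(count, N_DATASETS):.1f}%")
    print(f"{mode:<10}{'Overall':<9}{overall_identity(mode):.1f}%")
```

Under these assumptions the pooled values round to the 99.0% and 98.1% reported in the "Overall" row.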

We manually curated the remaining differences and determined they are due to either differences in reporting or mishandling of phased alleles in PharmCAT:

  • CYP2C9 - There was one discrepancy in the phased solutions. In this case, PharmCAT reported "Unknown/Unknown" while pb-StarPhase reported a single diplotype. Notably, PharmCAT reported the same answer as pb-StarPhase when unphased variants were provided. When we inspected the variants, PharmCAT appeared to be erroneously treating variants in different phase blocks as being in phase with each other. This created haplotypes that have not been described before and led to the "Unknown/Unknown" report.
  • DPYD - This gene differs from the others in that traditional haplotypes have not been built into CPIC yet. Instead, each "haplotype" is just a single variant.
    • Unphased - If more than two variants are identified, PharmCAT reports every variant as "{variant}/None". In contrast, pb-StarPhase reports "NO_MATCH/NO_MATCH" because it has no special treatment for DPYD. All reported diplotypes with two or fewer variants were identical between PharmCAT and pb-StarPhase.
    • Phased - The phase block issue described for CYP2C9 also affects the DPYD results. In addition to the mismatches from three or more variants, PharmCAT can misinterpret phase blocks, which further reduces the identity score for this gene.
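
The phase block issue noted for CYP2C9 and DPYD can be illustrated with a small example. In a phased VCF, the `|` separator only orders alleles relative to other calls in the same phase set (the `PS` tag), so two calls from different phase sets have no known cis/trans relationship. The sketch below is not the implementation used by either tool; the `HetCall` type and `relative_phase` function are hypothetical names, and it assumes simple heterozygous genotypes only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HetCall:
    """A heterozygous variant call (hypothetical, simplified record)."""
    name: str
    gt: str            # "0|1" or "1|0" when phased, "0/1" when unphased
    ps: Optional[int]  # VCF phase set (PS) identifier, None if unphased


def relative_phase(a: HetCall, b: HetCall) -> str:
    """Return 'cis', 'trans', or 'unknown' for two heterozygous calls."""
    both_phased = "|" in a.gt and "|" in b.gt
    if not both_phased or a.ps is None or b.ps is None:
        return "unknown"
    if a.ps != b.ps:
        # Different phase blocks: the allele order is not comparable.
        return "unknown"
    return "cis" if a.gt == b.gt else "trans"


a = HetCall("variant_A", "1|0", 1000)
b = HetCall("variant_B", "1|0", 1000)
c = HetCall("variant_C", "1|0", 2000)  # same GT string, different phase block

print(relative_phase(a, b))  # same block, same side
print(relative_phase(a, c))  # different blocks, so no phase relationship
```

Comparing only the genotype strings would place `variant_A` and `variant_C` on the same haplotype, which is the kind of error that can produce a haplotype that has not been described before.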

In summary, we expect most unphased results to be identical between the two tools, with DPYD reporting being the notable exception. With phased HiFi data, pb-StarPhase is more likely to generate correct results because it properly handles variants that fall in different phase blocks. However, in our experiment these situations were relatively rare.

Tool definitions

pb-StarPhase

This is the tool released with this repository.

  • Version - v0.7.2
  • Database version - v0.6.1-20230914
  • Extra parameters - None

PharmCAT

The PharmCAT pipeline (/pharmcat/pharmcat_pipeline) was run from the provided Docker image:

  • Version - v2.7.1
  • Image - docker://pgkb/pharmcat:2.7.1
  • Extra parameters - --matcher-all-results --missing-to-ref --reporter-save-json
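
For reproducibility, the settings above can be assembled into a single command line. This is a hedged sketch rather than the exact invocation used for the comparison: the container runtime (`apptainer`, chosen because the image is given as a `docker://` URI), the positional VCF argument, and the `build_pharmcat_command` helper are assumptions, while the image, pipeline path, and extra parameters are taken from the list above.

```python
import shlex
from typing import List

PHARMCAT_IMAGE = "docker://pgkb/pharmcat:2.7.1"
PHARMCAT_PIPELINE = "/pharmcat/pharmcat_pipeline"
EXTRA_PARAMETERS = [
    "--matcher-all-results",
    "--missing-to-ref",
    "--reporter-save-json",
]


def build_pharmcat_command(vcf_path: str, runtime: str = "apptainer") -> List[str]:
    """Build an argument list for running the PharmCAT pipeline on one VCF.

    ASSUMPTION: the runtime and the trailing positional VCF path are
    illustrative; adjust them to match the local container setup.
    """
    return [
        runtime, "exec", PHARMCAT_IMAGE,
        PHARMCAT_PIPELINE, *EXTRA_PARAMETERS,
        vcf_path,
    ]


# Print the command for inspection; pass the list to subprocess.run to execute.
print(shlex.join(build_pharmcat_command("sample.vcf.gz")))
```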