Pf Target Hunt Helper
This web app, created by the Winzeler Lab, was designed as a resource for the Malaria Drug Accelerator (MalDA) Consortium's Plasmodium falciparum target prioritization efforts. We have compiled a variety of annotations for all genes in the Pf3D7 genome (PlasmoDB release 66). Each annotation bears on a gene's potential as an antimalarial target.
To navigate this app, enter the PlasmoDB ID of a gene or transcript of interest (e.g. PF3D7_0417200) at the end of the URL, or click a link in the "Gene List" tab. "Toggle Layout" switches gene information pages between a compact layout (suited to wide windows) and a basic layout (suited to narrow windows).
Methods
P. falciparum 3D7 Genes
We considered 5,318 protein-coding genes (5,389 transcripts) from the Plasmodium falciparum 3D7 reference genome in PlasmoDB Release 66 (28 Nov 2023; accessed January 3, 2024).
PlasmoDB Annotations
The following gene/transcript annotations were taken directly from PlasmoDB Release 66 using the "Search Strategy" function:
- Gene/transcript ID, Entrez ID, UniProt ID(s), gene symbol, product description, ortholog group (OrthoMCL)
- Genomic location, gene/CDS/protein length, molecular weight, isoelectric point
- Domain annotations (InterPro, Pfam, SUPERFAMILY), number of transmembrane (TM) domains, SignalP signal peptide prediction [ref]
- Computed and curated Gene Ontology (GO) components, functions and processes; Enzyme Commission (EC) numbers
- Genetic variation across all strains in PlasmoDB (number of unique noncoding, synonymous, nonsynonymous, and nonsense SNPs)
PfTargetHuntHelper includes linkouts to BRENDA for EC numbers and AmiGO for GO terms.
Functional Information
To provide additional data on biological function, we include the following:
- Summary of gene expression across ring, trophozoite, schizont and gametocyte stages, based on the Malaria Cell Atlas Chromium 10x transcriptomics dataset [ref]. For each stage, we report the 75th percentile (3rd quartile) of normalized expression values across cells. Expression data were taken directly from pf-ch10x-exp.csv in pf.zip, downloaded from the Malaria Cell Atlas website in February 2024.
- Total of 37,624 cells; number of cells per stage: ring - 7,071 | trophozoite - 13,436 | schizont - 8,159 | gametocyte - 8,958
- Link to a gene ID search on the Malaria Parasite Metabolic Pathways (MPMP) website. MPMP includes a variety of information on the function and subcellular localization of the genes in its metabolic pathway maps. It is maintained by Hagai Ginsburg of the Hebrew University of Jerusalem.
- Link to the corresponding entry, if present, in BiGG Models for iAM_Pf480. iAM_Pf480 is a genome-scale metabolic model (GeMM) with 480 Pf genes, constructed by Abdel-Haleem et al. 2018 [ref].
- Link to a gene ID search on the STRING website, which displays networks of known and predicted protein-protein interactions from the STRING database [ref].
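The per-stage expression summary described above is the 75th percentile of normalized expression across all cells of each stage. It can be sketched as follows. This is a minimal illustration, not the app's code, and it makes three assumptions:

- The expression matrix has already been read into a mapping of gene ID to per-cell values. The actual column layout of pf-ch10x-exp.csv is not described here.
- Every cell has a stage label.
- Cells absent from a sparse row are treated as zero expression.

The percentile uses linear interpolation between order statistics, the same default as NumPy's `percentile`.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]), matching NumPy's default method."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def stage_q3(expression, cell_stage):
    """Return {gene: {stage: 75th percentile of expression across that stage's cells}}.

    expression: {gene_id: {cell_id: normalized_value}}  (assumed, possibly sparse, layout)
    cell_stage: {cell_id: stage}, e.g. "ring", "trophozoite", "schizont", "gametocyte"
    """
    summary = {}
    for gene, per_cell in expression.items():
        by_stage = defaultdict(list)
        for cell, stage in cell_stage.items():
            # Cells missing from a sparse row are counted as zero expression.
            by_stage[stage].append(per_cell.get(cell, 0.0))
        summary[gene] = {stage: percentile(vals, 75) for stage, vals in by_stage.items()}
    return summary
```

Iterating over `cell_stage` rather than over the gene's row keeps the denominator equal to the full cell count per stage. Sparse exports usually omit zeros, so this matters.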
Essentiality Information
Three sources of evidence were used to assess Pf gene essentiality:
- Zhang et al. 2018 [ref]: used piggyBac transposon insertion mutagenesis to generate 38,000 Pf mutants. They calculated mutagenesis index scores (MIS) and mutagenesis fitness scores (MFS) for 5,399 nuclear genes, identifying 2,680 genes as essential for in vitro asexual blood stage growth. TABLE S5
- Pf3D7 gene IDs were directly mapped to the PlasmoDB release 66 gene list
- PlasmoGEM (Plasmodium Genetic Modification Project) [ref]: produced P. berghei (Pb) gene disruption vectors with enhanced transfection efficiency. The website includes blood stage phenotypic data (relative growth rate) for 2,578 Pb genes. PlasmoGEM link
- PbANKA gene IDs were mapped to Pf3D7 gene IDs based on OrthoMCL (release 6.19) ortholog groups
- Data obtained via "Download CSV" in January 2024
- RMgmDB (Rodent Malaria genetically modified Parasites database) [ref]: a manually curated web repository of genotype and phenotype information for 5,300+ P. berghei mutants (as of the August 12, 2023 update). RMgmDB link
- Searched the database using all Pf3D7 gene IDs (RMgmDB stores its own PbANKA-Pf3D7 mapping). We considered "successful" gene modifications of any type with a reported asexual blood stage phenotype.
- Accessed January 2024
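Mapping PbANKA gene IDs to Pf3D7 gene IDs through OrthoMCL ortholog groups amounts to a join on the group ID. The same idea underlies the ortholog-group matching in the Binding Information section below.

The following is a minimal sketch, assuming a gene-to-group table has already been extracted from OrthoMCL 6.19. The group and gene IDs in the usage example are hypothetical placeholders. The text above does not say how groups with several Pf3D7 members were resolved, so this version simply keeps every Pf3D7 gene in the group.

```python
from collections import defaultdict


def map_via_ortholog_groups(gene_to_group, source_prefix, target_prefix):
    """Map source-species gene IDs to target-species gene IDs that share an ortholog group.

    gene_to_group: {gene_id: ortholog_group_id}, covering genes of both species.
    Species are told apart by gene ID prefix (e.g. "PBANKA_" vs "PF3D7_").
    Returns {source_gene: sorted list of target genes}; source genes whose
    group has no target-species member are omitted.
    """
    group_to_targets = defaultdict(set)
    for gene, group in gene_to_group.items():
        if gene.startswith(target_prefix):
            group_to_targets[group].add(gene)

    mapping = {}
    for gene, group in gene_to_group.items():
        if gene.startswith(source_prefix) and group_to_targets.get(group):
            mapping[gene] = sorted(group_to_targets[group])
    return mapping


# Hypothetical example: one group with two Pf members, one group with no Pf member.
groups = {"PBANKA_A": "OG_1", "PF3D7_A": "OG_1", "PF3D7_B": "OG_1", "PBANKA_C": "OG_2"}
print(map_via_ortholog_groups(groups, "PBANKA_", "PF3D7_"))
```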
Binding Information
We also considered the presence of a small molecule binding domain a key factor for target potential. To approximate this, we searched for evidence of similarity to proteins with known ligand interactions through three complementary approaches:
- Orthology of the Pf3D7 protein to at least one protein in BindingDB [ref], a public database of experimentally determined protein-ligand binding affinities (1,218,340 compounds and 9,195 targets as of the 2024-01-28 update). BindingDB link
- Accessed January 30th, 2024 with help from Mike Gilson
- We considered 6,202 proteins with a measured affinity of 10 micromolar or better (i.e., ≤10 micromolar) for at least one compound.
- Orthology was determined by the presence of the P. falciparum 3D7 gene in the same ortholog group as the BindingDB UniProt entry. Four orthology databases were used: HOGENOM, OMA and OrthoDB (available in the UniProt ID mapping web tool, accessed 2/2/2024), and OrthoMCL.
- Additionally, we ran BLAST on the protein sequence of each BindingDB entry. Sequences for all BindingDB UniProt identifiers were obtained with UniProt's REST tool. They were searched with the blastp function of BLAST 2.15 against a custom database of the full OrthoMCL protein set (FASTA release OrthoMCL-6.19). Pf genes were then mapped based on ortholog group match.
- AlphaFill [ref] ligand predictions for Pf3D7 AlphaFold (v4) models, taken from the AlphaFill databank. The AlphaFill algorithm identifies candidate ligands by searching the PDB for sequence homologs with known ligands. Of 5,099 Pf3D7 genes with a corresponding UniProt ID that had AlphaFill information, 2,771 had at least one ligand hit. AlphaFill link
- Inhibitors listed in the BRENDA Enzyme Database [ref] (Release 2023.1, updated 2/1/2023) for the Enzyme Commission (EC) numbers assigned to Pf3D7 genes in PlasmoDB release 66. This approach applies only to enzymes, and other ligand types were not considered. For genes with incomplete EC number annotations, all EC numbers matching the wildcard were considered.
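For incomplete EC annotations, "all EC numbers matching the wildcard" can be resolved level by level. The following is a minimal sketch, assuming partial EC numbers use the conventional "-" placeholder for unspecified levels (e.g. 2.7.11.-). The list of known EC numbers in the example is illustrative, not an extract from BRENDA.

```python
def ec_matches(pattern, ec):
    """True if a full EC number (e.g. "2.7.11.1") falls under a pattern such as "2.7.11.-"."""
    p_parts = pattern.split(".")
    e_parts = ec.split(".")
    if len(p_parts) != 4 or len(e_parts) != 4:
        raise ValueError("EC numbers must have four levels")
    # A "-" level matches anything; specified levels must match exactly.
    return all(p == "-" or p == e for p, e in zip(p_parts, e_parts))


def expand_ec(pattern, known_ecs):
    """All EC numbers in known_ecs covered by pattern; a complete number matches only itself."""
    return sorted(ec for ec in known_ecs if ec_matches(pattern, ec))


print(expand_ec("2.7.11.-", ["2.7.11.1", "2.7.11.24", "2.7.10.2", "3.1.3.16"]))
```

Comparing level by level, rather than with a plain string prefix, avoids false matches such as "2.7.1" matching "2.7.11.1".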
Human Orthologs
The existence of a highly similar human ortholog raises cytotoxicity concerns for a candidate target. Homo sapiens genes orthologous to Pf3D7 genes were identified from OrthoMCL. Both sequence and structural similarity were then evaluated by pairwise comparison of the Pf3D7 and human ortholog AlphaFold (v4) structures in TM-align [ref]. Of 2,006 genes with human ortholog(s), 1,972 had AlphaFold structures enabling TM-align comparison.
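TM-align reports a TM-score, which rates a structural alignment on a length-normalized 0 to 1 scale. Values above about 0.5 generally indicate the same fold. For a fixed alignment and superposition, the score is

TM = (1/L) Σ 1 / (1 + (d_i / d0)²), with d0 = 1.24 (L − 15)^(1/3) − 1.8,

where L is the normalizing chain length and d_i is the distance between the i-th aligned residue pair.

The sketch below only evaluates this formula for given distances. TM-align itself also searches for the alignment and superposition that maximize the score, and it reports scores normalized by each chain's length. Neither step is reproduced here. Very short chains are rejected rather than handled as TM-align does.

```python
def d0(length):
    """TM-score distance scale in angstroms for a chain of the given length."""
    if length <= 21:
        raise ValueError("this sketch omits TM-align's special handling of very short chains")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score of a fixed alignment.

    distances: distances (angstroms) between aligned residue pairs after superposition.
    target_length: length of the chain used for normalization.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


# A perfect superposition of half the residues of a 100-residue chain scores 0.5.
print(tm_score([0.0] * 50, 100))
```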
Resistome Mutations
In vitro selections with antimalarial compounds yield mutations that may play a role in resistance. We have compiled a Pf "resistome" database of all SNV and indel mutations identified by whole-genome sequencing (WGS) of 1,138 Pf clones or bulk culture samples. Each sample had evolved resistance to one of 146 diverse compounds. Here, we report data on the following classes of mutations in the resistome database:
- "Disruptive" mutations: any SNV or indel occurring within a gene that is not a synonymous, intron, or stop retained variant (e.g. inframe indels, frameshift, start lost, splice region, nonsense, missense)
- Missense mutations: missense SNVs
- "Interesting" missense mutations: missense SNVs that do not fall in a low-complexity region (PlasmoDB) and, where the protein's AlphaFold structure is available, have pLDDT >70 at the affected residue. These are highlighted in green in PfTargetHuntHelper.
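The three mutation classes above can be written as simple predicates over annotated variants. This is a minimal sketch under the following assumptions:

- The `Variant` record and its field names are hypothetical.
- Multiple SnpEff effect terms are joined with "&".
- The non-disruptive set is synonymous_variant, intron_variant and stop_retained_variant. These are standard Sequence Ontology effect names, but the exact term list used for PfTargetHuntHelper is an assumption.
- The pLDDT criterion is skipped when no AlphaFold structure exists (plddt is None).

```python
from dataclasses import dataclass
from typing import Optional

# Effects that do NOT make a within-gene variant "disruptive" (assumed term list).
NON_DISRUPTIVE = {"synonymous_variant", "intron_variant", "stop_retained_variant"}


@dataclass
class Variant:
    # Hypothetical record layout; field names are illustrative only.
    effect: str                     # SnpEff effect(s), e.g. "missense_variant&splice_region_variant"
    is_snv: bool                    # False for indels
    in_gene: bool                   # variant lies within a gene
    low_complexity: bool = False    # affected residue in a PlasmoDB low-complexity region
    plddt: Optional[float] = None   # AlphaFold pLDDT at the residue; None if no structure


def effects(v):
    return set(v.effect.split("&"))


def is_disruptive(v):
    # Within a gene and carrying at least one effect outside the non-disruptive set.
    return v.in_gene and bool(effects(v) - NON_DISRUPTIVE)


def is_missense(v):
    return v.is_snv and "missense_variant" in effects(v)


def is_interesting_missense(v):
    if not is_missense(v) or v.low_complexity:
        return False
    # The pLDDT filter applies only where an AlphaFold structure exists (assumption).
    return v.plddt is None or v.plddt > 70
```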
To determine variants, WGS reads were aligned to the Pf3D7 v13 genome and processed according to the GATK 3.5 pipeline. GATK HaplotypeCaller with default parameters was used to generate SNV/indel variant calls, and SnpEff was used to annotate genes and variant effects.

Initial filtering retained SNVs and indels meeting both of the following:

- in the resistant sample: total depth ≥2 and alternate allele frequency (AAF) >0.2
- in its parent sample: depth ≥3 and a 0/0 genotype call

These variants were then restricted to those with a SnpEff gene annotation, which excludes intergenic variants more than 1,000 nt from the nearest gene. They also had to meet either of two conditions: major alt allele depth >40 and major AAF >0.4, or major alt allele depth >10 and AAF >0.8.
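The two filtering steps above can be expressed as a pair of predicates. This is a minimal sketch, assuming per-sample depth, AAF and genotype values have already been extracted from the GATK/SnpEff output into plain numbers and strings. The function and argument names are hypothetical. "Major alt allele depth" and "major AAF" are taken to mean the depth and frequency of the most abundant alternate allele, and the same major AAF is used in both branches of the second step.

```python
def passes_initial_filter(res_depth, res_aaf, parent_depth, parent_gt):
    """Step 1: supported in the resistant sample and confidently reference in its parent."""
    return (
        res_depth >= 2
        and res_aaf > 0.2
        and parent_depth >= 3
        and parent_gt == "0/0"
    )


def passes_second_filter(has_gene_annotation, major_alt_depth, major_aaf):
    """Step 2: gene-annotated, and either high depth at moderate frequency
    or moderate depth at high frequency (major AAF used in both branches, an assumption)."""
    if not has_gene_annotation:
        return False
    return (major_alt_depth > 40 and major_aaf > 0.4) or (
        major_alt_depth > 10 and major_aaf > 0.8
    )
```

The two branches of the second step trade depth for frequency. A variant seen at moderate frequency needs strong read support, while a near-fixed variant can pass with fewer reads.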
Protein Information
All Protein Data Bank (PDB) records associated with a Pf gene ID were downloaded in February 2024. NOTE: PDB IDs are listed only for exact gene ID matches. To check thoroughly whether a protein has an experimentally determined structure in the PDB, search the PDB using other keywords.
We include visualization of the AlphaFold protein structure, if available, in an embedded version of iCn3D, a web-based 3D structure viewer from the NIH.
Field Variants
We used the MalariaGEN Pf7 [ref] dataset of 20,864 worldwide samples to assess field genetic variation. Variant (SNV and indel) call data were accessed via the malariagen_data Python package. We restricted our analysis to the 5,868,659 variants marked as "passing", i.e. variants that pass quality filters and lie outside hypervariable regions and the mitochondrial and apicoplast genomes (see Methods of the Pf7 paper). Note: this means that no variants are reported for subtelomeric and non-nuclear genes, even though they may have field variants.

Variants were then sorted by effect:

- "synonymous" includes stop retained
- "disruptive" is anything other than synonymous
- "missense" is restricted to missense SNVs

They were also sorted by prevalence among all Pf7 samples:

- "singleton" variants occur in only one sample
- "doubleton" variants occur in two samples
- "rare" variants occur in 3 to 20 samples (<0.1% prevalence)
- all others are "common"

Whether a sample carries a variant was determined in two ways. The first counts homozygous genotype calls only (e.g. 1/1, indicating that allele 1 is present at nearly 100% frequency). The second counts any genotype call containing the allele (e.g. 1/2 counts as both allele 1 and allele 2 occurring in that sample).
TL;DR: Each number in the table indicates how many unique variants (non-3D7-reference alleles at a locus within the gene) of a particular effect type fall into a given prevalence bin across the Pf7 dataset. In computing prevalence, whether a sample "has" a variant is defined either by homozygous calls only or by any genotype call.
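The prevalence bins and the two ways of deciding whether a sample carries an allele can be sketched as below. This illustrates the rules stated above; it is not the code used to build the tables. It assumes genotypes are given as one tuple of allele indices per sample, e.g. (1, 1) for 1/1 and (1, 2) for 1/2. Index 0 is the 3D7 reference allele and negative indices mark missing calls; this encoding is an assumption. Samples with missing calls are skipped, which is also an assumption.

```python
from collections import Counter


def prevalence_bin(n_samples):
    """Bin a variant by the number of Pf7 samples carrying it."""
    if n_samples < 1:
        raise ValueError("variant must occur in at least one sample")
    if n_samples == 1:
        return "singleton"
    if n_samples == 2:
        return "doubleton"
    if n_samples < 21:  # 21 of 20,864 samples is about 0.1%
        return "rare"
    return "common"


def samples_per_allele(genotypes, homozygous_only):
    """Count samples carrying each non-reference allele at one site.

    genotypes: one tuple of allele indices per sample; negative values mark missing calls.
    homozygous_only: if True, count a sample only when all its alleles are identical.
    """
    counts = Counter()
    for call in genotypes:
        if any(a < 0 for a in call):
            continue  # skip missing or partially missing calls (assumption)
        alleles = set(call)
        if homozygous_only and len(alleles) != 1:
            continue
        for allele in alleles - {0}:  # 0 is the reference allele
            counts[allele] += 1
    return counts


gts = [(0, 0), (1, 1), (1, 2), (2, 2), (-1, -1)]
print(samples_per_allele(gts, homozygous_only=False))  # 1/2 counts toward both alleles
print(samples_per_allele(gts, homozygous_only=True))
```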
Associated Publications
The gene2pubmed.gz file (version 2024-02-21) was downloaded from the NCBI FTP site. It contains taxonomy ID, gene ID (Entrez) and PubMed ID (PMID). Gene IDs were mapped to this annotation set and the corresponding PMIDs were extracted. Some references are linked to a gene name (symbol) rather than to an Entrez ID. To include these, PMID details were obtained programmatically using the efetch function of the NCBI E-utilities. The title, authors and DOI were retrieved for each record.
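The gene2pubmed step amounts to filtering a gzipped, tab-delimited table for the Entrez gene IDs of interest. The table's columns are taxonomy ID, GeneID and PubMed ID, and its header line begins with "#". The following is a minimal sketch using only the Python standard library. It reads a local copy of the file, and the function name is hypothetical. The optional taxonomy filter can be set to "36329", the NCBI Taxonomy ID of P. falciparum 3D7. The separate efetch lookup by gene symbol is not shown.

```python
import csv
import gzip
from collections import defaultdict


def pmids_for_genes(path, gene_ids, tax_id=None):
    """Return {Entrez GeneID: PMIDs sorted numerically} from a gene2pubmed.gz file.

    gene_ids: iterable of Entrez gene IDs as strings.
    tax_id: optional taxonomy ID string used to restrict rows (e.g. "36329").
    """
    wanted = set(gene_ids)
    result = defaultdict(set)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip the header line
            row_tax, gene_id, pmid = row[:3]
            if tax_id is not None and row_tax != tax_id:
                continue
            if gene_id in wanted:
                result[gene_id].add(pmid)
    return {gene: sorted(pmids, key=int) for gene, pmids in result.items()}
```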
More Resources
- Essentiality and localization for "conserved protein, unknown function" genes on Pf3D7 chromosome 3: TABLE (Kimmel et al. 2023 [ref])
- Structural similarities to known domains found in 353 Pf3D7 proteins of unknown function, using AlphaFold predictions and DALI search against PDB: TABLES (Behrens and Spielmann 2024 [ref])
References
- Abdel-Haleem, A. M., Hefzi, H., Mineta, K., Gao, X., Gojobori, T., Palsson, B. O., Lewis, N. E., & Jamshidi, N. (2018). Functional interrogation of Plasmodium genus metabolism identifies species- and stage-specific differences in nutrient essentiality and drug targeting. PLoS computational biology, 14(1), e1005895. https://doi.org/10.1371/journal.pcbi.1005895
- Behrens, H., & Spielmann, T. (2023). Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on AlphaFold predictions. bioRxiv 2023.06.05.543710 [Preprint]. https://doi.org/10.1101/2023.06.05.543710
- Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic acids research, 49(D1), D498–D508. https://doi.org/10.1093/nar/gkaa1025
- Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic acids research, 44(D1), D1045–D1053. https://doi.org/10.1093/nar/gkv1072
- Hekkelman, M.L., de Vries, I., Joosten, R.P. et al. (2023). AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 20, 205–213. https://doi.org/10.1038/s41592-022-01685-y
- Howick, V. M. et al. (2019). The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science 365, eaaw2619. https://doi.org/10.1126/science.aaw2619
- Khan, S.M., Kroeze, H., Franke-Fayard, B., & Janse, C.J. (2013). Standardization in generating and reporting genetically modified rodent malaria parasites: the RMgmDB database. Methods in molecular biology (Clifton, N.J.) 923, 139–150. https://doi.org/10.1007/978-1-62703-026-7_9
- Kimmel, J., Schmitt, M., Sinner, A. et al. (2023). Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell systems, 14(1), 9–23.e7. https://doi.org/10.1016/j.cels.2022.12.001
- MalariaGEN, Abdel Hamid, M. M., Abdelraheem, M. H., Acheampong, D. O., Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., Amenga-Etego, L., Andagalu, B., Anderson, T., Andrianaranjaka, V., Aniebo, I., Aninagyei, E., Ansah, F., Ansah, P. O., Apinjoh, T., Arnaldo, P., Ashley, E., … van der Pluijm, R. W. (2023). Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome open research 8, 22. https://doi.org/10.12688/wellcomeopenres.18681.1
- Schwach, F., Bushell, E., Gomes, A.R., Anar, B., Girling, G., Herd, C., Rayner, J.C., & Billker, O. (2015). PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites. Nucleic Acids Research 43(D1), D1176–D1182. https://doi.org/10.1093/nar/gku1143
- Szklarczyk, D., Kirsch, R., Koutrouli, M., Nastou, K., Mehryary, F., Hachilif, R., Gable, A. L., Fang, T., Doncheva, N. T., Pyysalo, S., Bork, P., Jensen, L. J., & von Mering, C. (2023). The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research, 51(D1), D638–D646. https://doi.org/10.1093/nar/gkac1000
- Teufel, F., Almagro Armenteros, J.J., Johansen, A.R. et al. (2022). SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40, 1023–1025. https://doi.org/10.1038/s41587-021-01156-3
For more information about SignalP, see this Springer Nature blog post
- Zhang, M. et al. (2018). Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. Science 360, eaap7847. https://doi.org/10.1126/science.aap7847
- Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research, 33(7), 2302–2309. https://doi.org/10.1093/nar/gki524