Pf Target Hunt Helper

This web app, created by the Winzeler Lab, was designed as a resource for the MalDA (Malaria Drug Accelerator) Consortium's Plasmodium falciparum target prioritization efforts. We have compiled a variety of annotations for all genes in the Pf3D7 genome (PlasmoDB release 66) relevant to their potential as antimalarial targets.

To navigate this app, simply enter the PlasmoDB ID of a gene or transcript of interest (e.g. PF3D7_0417200) at the end of the URL, or click on a link in the "Gene List" tab. "Toggle Layout" switches between compact (suitable for wide windows) and basic (suitable for narrow windows) formatting of gene information pages.

Methods

P. falciparum 3D7 Genes

5,318 protein-coding genes (5,389 transcripts) were considered from the Plasmodium falciparum 3D7 reference genome in PlasmoDB Release 66 (28 Nov 2023; accessed January 3rd, 2024).

PlasmoDB Annotations

The following gene/transcript annotations were taken directly from PlasmoDB Release 66 using the "Search Strategy" function:

PfTargetHuntHelper includes linkouts to BRENDA for EC numbers and AmiGO for GO terms.

Functional Information

To provide additional data on biological function, we include the following:

Essentiality Information

Three sources of evidence were used to assess Pf gene essentiality:

  1. Zhang et al. 2018 [ref]: used piggyBac transposon insertion mutagenesis to generate 38,000 Pf mutants; calculated mutagenesis index scores (MIS) and mutagenesis fitness scores (MFS) for 5,399 nuclear genes, identifying 2,680 genes as essential for in vitro ABS growth. TABLE S5
  2. PlasmoGEM (Plasmodium Genetic Modification Project) [ref]: produced P. berghei gene disruption vectors with enhanced transfection efficiency; website includes blood stage phenotypic data (relative growth rate) for 2,578 Pb genes. PlasmoGEM link
  3. RMgmDB (Rodent Malaria genetically modified Parasites database) [ref]: manually curated web repository of genotype and phenotype information for 5,300+ P. berghei mutants (as of August 12, 2023 update). RMgmDB link

Binding Information

We also treated the presence of a small-molecule binding domain as a key factor for target potential. To approximate this, we searched for evidence of similarity to proteins with known ligand interactions through three complementary approaches:

  1. Orthology of Pf3D7 protein to at least one protein in BindingDB [ref], a public database of experimentally determined protein-ligand binding affinities (1,218,340 compounds and 9,195 targets as of 2024-01-28 update). BindingDB link
  2. AlphaFill [ref] predictions of ligands corresponding to Pf3D7 AlphaFold(v4) models, taken from the AlphaFill databank. The AlphaFill algorithm determines candidate ligands by searching for sequence homologs in PDB with known ligands. Of 5,099 Pf3D7 genes having a corresponding Uniprot ID with AlphaFill information, 2,771 had at least one ligand hit. AlphaFill link
  3. Inhibitors linked to EC (Enzyme Commission) number classes in the BRENDA Enzyme Database [ref] (Release 2023.1, updated 2/1/2023) for Pf3D7 genes based on EC number annotations in PlasmoDB release 66, applicable only to enzymes. Other ligand types were not considered. For genes with incomplete EC number annotations, all EC numbers matching the wildcard were considered.
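
The wildcard handling for incomplete EC numbers in item 3 can be illustrated with a minimal sketch. It assumes that an incomplete annotation uses "-" in the unresolved fields (e.g. 2.7.11.-); the function names and the short list of EC numbers are hypothetical and are not taken from the app or from BRENDA.

```python
def ec_matches(pattern: str, ec_number: str) -> bool:
    """Return True if a complete EC number falls under a possibly incomplete one.

    A '-' in any of the four fields of `pattern` is treated as a wildcard
    (assumed convention for incomplete EC annotations).
    """
    pattern_fields = pattern.split(".")
    ec_fields = ec_number.split(".")
    if len(pattern_fields) != 4 or len(ec_fields) != 4:
        raise ValueError("EC numbers must have four dot-separated fields")
    return all(p == "-" or p == e for p, e in zip(pattern_fields, ec_fields))


def expand_ec(pattern: str, known_ec_numbers: list[str]) -> list[str]:
    """List every known EC number covered by `pattern`, preserving input order."""
    return [ec for ec in known_ec_numbers if ec_matches(pattern, ec)]


# Illustrative subset of EC numbers, chosen only for the example.
known = ["2.7.11.1", "2.7.11.17", "2.7.1.1", "3.4.21.62"]
print(expand_ec("2.7.11.-", known))  # ['2.7.11.1', '2.7.11.17']
```

Under this convention, a gene annotated only as 2.7.11.- would be linked to the inhibitors of every EC number in the 2.7.11 subclass.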

Human Orthologs

Existence of a highly similar human ortholog poses cytotoxicity concerns for a candidate target. Homo sapiens genes orthologous to Pf3D7 genes were determined from OrthoMCL, and both sequence and structural similarity were evaluated through pairwise comparison of Pf3D7 and human ortholog AlphaFold(v4) structures in TM-align [ref]. Of 2,006 genes with human ortholog(s), 1,972 had AlphaFold structures enabling TM-align comparison.

Resistome Mutations

In vitro selections with antimalarial compounds yield mutations that may play roles in resistance. We have compiled a Pf "resistome" database consisting of all SNV and indel mutations identified from whole genome sequencing of 1,138 Pf clones or bulk culture samples with evolved resistance to one of 146 diverse compounds. Here, we report data on the following classes of mutations in the resistome database:

To determine variants, WGS reads were aligned to the Pf3D7 v13 genome and processed according to the GATK 3.5 pipeline. GATK HaplotypeCaller with default parameters was used to generate SNV/indel variant calls, and SnpEff was used to annotate genes and variant effects. Initial filtering retained SNVs and indels with total depth ≥2 and alternate allele frequency (AAF) >0.2 in a resistant sample, along with depth ≥3 and a genotype call of 0/0 in its parent sample. Next, these variants were filtered to require a SnpEff gene annotation (which excludes intergenic variants >1,000 nt from the nearest gene) and either major alt allele depth >40 with major AAF >0.4, or major alt allele depth >10 with AAF >0.8.
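
The two filtering stages described above can be written as a pair of predicates. This is a minimal sketch, not the pipeline's actual code: the record layout and field names are hypothetical, and it assumes that the "AAF >0.8" threshold in the second criterion refers to the major alternate allele frequency.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """One variant in a resistant sample and its parent (hypothetical layout)."""
    resistant_depth: int       # total read depth in the resistant sample
    resistant_aaf: float       # alternate allele frequency in the resistant sample
    parent_depth: int          # total read depth in the parent sample
    parent_genotype: str       # genotype call in the parent, e.g. "0/0"
    has_gene_annotation: bool  # True if SnpEff assigned the variant to a gene
    major_alt_depth: int       # reads supporting the major alternate allele
    major_aaf: float           # frequency of the major alternate allele


def passes_initial_filter(v: VariantCall) -> bool:
    """Stage 1: present in the resistant sample, reference call in the parent."""
    return (
        v.resistant_depth >= 2
        and v.resistant_aaf > 0.2
        and v.parent_depth >= 3
        and v.parent_genotype == "0/0"
    )


def passes_final_filter(v: VariantCall) -> bool:
    """Stage 2: gene annotation plus one of two depth/frequency criteria."""
    if not v.has_gene_annotation:
        return False
    deep_moderate_freq = v.major_alt_depth > 40 and v.major_aaf > 0.4
    # Assumption: the 0.8 threshold applies to the major AAF.
    shallow_high_freq = v.major_alt_depth > 10 and v.major_aaf > 0.8
    return deep_moderate_freq or shallow_high_freq


def keep_variant(v: VariantCall) -> bool:
    """A variant is retained only if it passes both stages."""
    return passes_initial_filter(v) and passes_final_filter(v)


example = VariantCall(60, 0.55, 35, "0/0", True, 33, 0.55)
print(keep_variant(example))  # False: depth 33 is not >40 and AAF 0.55 is not >0.8
```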

Protein Information

All records from Protein Data Bank (PDB) associated with a Pf gene ID were downloaded in February 2024. NOTE: PDB ID(s) are based on exact gene ID match. To thoroughly check whether a protein has a crystal structure in PDB, search PDB using other keywords.

We include visualization of the AlphaFold protein structure, if available, in an embedded version of iCn3D, a web-based 3D structure viewer from the NIH.

Field Variants

We used the MalariaGEN Pf7 [ref] dataset of 20,864 worldwide samples to assess field genetic variation. Variant (SNV and indel) call data were accessed via the malariagen_data Python package. We restricted our analysis to the 5,868,659 variants marked as "passing" based on quality filters and the exclusion of variants in hypervariable regions and in the mitochondrial and apicoplast genomes (see Methods of the Pf7 paper). Note: this means that no variants are reported for subtelomeric and non-nuclear genes, even though they may have field variants.

Variants were then sorted by effect and by prevalence among all Pf7 samples. For effect, "synonymous" includes stop-retained variants, "disruptive" is anything other than synonymous, and "missense" is restricted to missense SNVs. For prevalence, a "singleton" occurs in only one sample, a "doubleton" occurs in two samples, a "rare" variant occurs in 3 to 20 samples (fewer than 21, i.e. <0.1% prevalence), and all others are "common".

Whether a variant occurs in a sample was further stratified by genotype call: either homozygous calls only (e.g. 1/1, indicating that allele 1 is present at nearly 100% frequency) or any genotype call containing the allele (e.g. 1/2 would count as both allele 1 and allele 2 occurring in the sample).

TL;DR: Each number in the table indicates how many unique variants (non-3D7 reference allele at a locus within the gene) of a particular effect type fall into a certain prevalence bin across the Pf7 dataset. In computing prevalence, whether or not a sample "has" a variant is defined either by homozygous only or any genotype call.
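
The prevalence binning and the two ways of counting samples can be sketched as follows. This is an illustration only: the function names are hypothetical, genotype calls are represented as tuples of allele indices (0 is the 3D7 reference allele), and the example calls are made up.

```python
def count_samples_with_allele(genotypes, allele, homozygous_only):
    """Count samples that carry `allele`.

    genotypes: one tuple of allele indices per sample, e.g. (1, 1) or (1, 2).
    homozygous_only: if True, only calls made up entirely of `allele` count;
    otherwise any call containing `allele` counts.
    """
    count = 0
    for call in genotypes:
        if homozygous_only:
            carries = len(call) > 0 and all(a == allele for a in call)
        else:
            carries = allele in call
        if carries:
            count += 1
    return count


def prevalence_bin(n_samples):
    """Assign a prevalence bin from the number of samples carrying a variant."""
    if n_samples < 1:
        raise ValueError("a variant must occur in at least one sample")
    if n_samples == 1:
        return "singleton"
    if n_samples == 2:
        return "doubleton"
    if n_samples < 21:
        return "rare"
    return "common"


# Made-up genotype calls for four samples at one locus.
calls = [(1, 1), (1, 2), (0, 0), (0, 1)]
homozygous = count_samples_with_allele(calls, allele=1, homozygous_only=True)
any_call = count_samples_with_allele(calls, allele=1, homozygous_only=False)
print(homozygous, prevalence_bin(homozygous))  # 1 singleton
print(any_call, prevalence_bin(any_call))      # 3 rare
```

The same allele can therefore land in different prevalence bins depending on which counting rule is used, which is why both are reported.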

Associated Publications

The gene2pubmed.gz file (version 2024-02-21), which lists tax ID, Entrez gene ID, and PubMed ID, was downloaded from the NCBI FTP site. Gene IDs were mapped to this annotation set and the corresponding PMIDs were extracted. To include references linked to a gene name (symbol) rather than to an Entrez ID, PMID details were obtained programmatically using the NCBI E-utilities efetch function. Title, authors, and DOI were retrieved for each record.
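
The extraction of PMIDs from gene2pubmed can be sketched as below. This is a minimal illustration rather than the code used for the app: it assumes the file is tab-separated with columns tax ID, gene ID, and PubMed ID and a header line starting with "#", the function name is hypothetical, and the gene IDs and PMIDs in the sample are placeholders. The tax ID 36329 is used here on the assumption that it is the NCBI taxonomy ID for P. falciparum 3D7; verify it before reuse.

```python
import csv
import gzip
import io
from collections import defaultdict


def pmids_by_gene(lines, tax_id):
    """Map each Entrez gene ID to a sorted list of PubMed IDs for one taxon.

    lines: an iterable of text lines in gene2pubmed format.
    tax_id: taxonomy ID to keep, as a string.
    """
    collected = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        row_tax_id, gene_id, pmid = row[0], row[1], row[2]
        if row_tax_id == tax_id:
            collected[gene_id].add(int(pmid))
    return {gene_id: sorted(pmids) for gene_id, pmids in collected.items()}


# Placeholder content standing in for the real gene2pubmed.gz download.
sample = (
    "#tax_id\tGeneID\tPubMed_ID\n"
    "36329\t1001\t111\n"
    "36329\t1001\t222\n"
    "9606\t2002\t333\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

# For a real file, pass its path to gzip.open instead of the BytesIO object.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8", newline="") as handle:
    mapping = pmids_by_gene(handle, tax_id="36329")

print(mapping)  # {'1001': [111, 222]}
```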

More Resources

References

  1. Abdel-Haleem, A. M., Hefzi, H., Mineta, K., Gao, X., Gojobori, T., Palsson, B. O., Lewis, N. E., & Jamshidi, N. (2018). Functional interrogation of Plasmodium genus metabolism identifies species- and stage-specific differences in nutrient essentiality and drug targeting. PLoS computational biology, 14(1), e1005895. https://doi.org/10.1371/journal.pcbi.1005895
  2. Behrens, H. & Spielmann, T. Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710; doi: https://doi.org/10.1101/2023.06.05.543710 [Preprint]
  3. Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic acids research, 49(D1), D498–D508. https://doi.org/10.1093/nar/gkaa1025
  4. Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic acids research, 44(D1), D1045–D1053. https://doi.org/10.1093/nar/gkv1072
  5. Hekkelman, M.L., de Vries, I., Joosten, R.P. et al. (2023). AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 20, 205–213. https://doi.org/10.1038/s41592-022-01685-y
  6. Howick, V. M. et al. (2019). The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science 365, eaaw2619. https://doi.org/10.1126/science.aaw2619
  7. Khan, S.M., Kroeze, H., Franke-Fayard, B., & Janse, C.J. (2013). Standardization in generating and reporting genetically modified rodent malaria parasites: the RMgmDB database. Methods in molecular biology (Clifton, N.J.) 923, 139–150. https://doi.org/10.1007/978-1-62703-026-7_9
  8. Kimmel, J., Schmitt, M., Sinner, A. et al. (2023). Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell systems, 14(1), 9–23.e7. https://doi.org/10.1016/j.cels.2022.12.001
  9. MalariaGEN, Abdel Hamid, M. M., Abdelraheem, M. H., Acheampong, D. O., Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., Amenga-Etego, L., Andagalu, B., Anderson, T., Andrianaranjaka, V., Aniebo, I., Aninagyei, E., Ansah, F., Ansah, P. O., Apinjoh, T., Arnaldo, P., Ashley, E., … van der Pluijm, R. W. (2023). Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome open research 8, 22. https://doi.org/10.12688/wellcomeopenres.18681.1
  10. Teufel, F., Almagro Armenteros, J.J., Johansen, A.R. et al. (2022). SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40, 1023–1025. https://doi.org/10.1038/s41587-021-01156-3
  11. For more information about SignalP, see this Springer Nature blog post

  12. Schwach, F., Bushell, E., Gomes, A.R., Anar, B., Girling, G., Herd, C., Rayner, J.C., & Billker, O. (2015). PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites. Nucleic Acids Research 43(D1), D1176–D1182. https://doi.org/10.1093/nar/gku1143
  13. Szklarczyk, D., Kirsch, R., Koutrouli, M., Nastou, K., Mehryary, F., Hachilif, R., Gable, A. L., Fang, T., Doncheva, N. T., Pyysalo, S., Bork, P., Jensen, L. J., & von Mering, C. (2023). The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research, 51(D1), D638–D646. https://doi.org/10.1093/nar/gkac1000
  14. Zhang, M. et al. (2018). Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. Science 360, eaap7847. https://doi.org/10.1126/science.aap7847
  15. Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research, 33(7), 2302–2309. https://doi.org/10.1093/nar/gki524