KGIPA

Document


1. Example Results

Running KGIPA with SHAP[1-3] analysis usually takes more than 30 minutes, and the wait can grow with the number of tasks queued on the backend. We therefore provide two pre-run online examples, one for KGIPA and one for KGIPA-Fast, each combined with SHAP analysis, so that users can explore the results directly.

KGIPA + SHAP:   http://bliulab.net/KGIPA/myresult/114.246.199.190_1760077846.275579

KGIPA-Fast + SHAP:   http://bliulab.net/KGIPA/myresult/114.246.199.190_1760077854.4525082

Our models predict seven types of non-covalent bonds on both the peptide and protein sides, as well as non-covalent interactions without distinguishing bond types. The corresponding SHAP analyses are bond-type-specific: the SHAP results vary according to the bond type selected by the user.


2. Datasets

The datasets used in this study are derived from the RCSB PDB[4]. Pairs showing over 80% similarity between the training and independent test datasets were removed using CD-HIT[5], and dataset labels were extracted using PDB-BRE[6] and PLIP[7] based on the 3D structures of the complexes. The processed datasets can be downloaded from the following links:

Training Dataset:   TrainingDataset-KGIPA.pkl

Independent Datasets:   LEADS-PEP.pkl   Test167.pkl   Test251.pkl

Note: The datasets above include labels for various types of non-covalent bonds and binding residues.
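
Once downloaded, the .pkl files can be inspected with Python's standard pickle module. The sketch below is illustrative only: the internal layout of these files is not described here, so the helper (a hypothetical name, summarize_pickle) reports only the top-level type and, where available, the size and a few keys. Unpickling can execute arbitrary code, so load only files obtained from a trusted source.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Makes no assumption about the dataset layout: it reports only the
    type, the length (if the object has one), and up to five keys (if
    the object is a dict).
    """
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)  # only unpickle files you trust

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["size"] = len(obj)
    if isinstance(obj, dict):
        summary["sample_keys"] = list(obj)[:5]
    return summary


if __name__ == "__main__":
    # File name taken from the download list above; adjust the path as needed.
    dataset = Path("TrainingDataset-KGIPA.pkl")
    if dataset.exists():
        print(summarize_pickle(dataset))
```

If the files were saved with objects from third-party libraries (for example NumPy or pandas), those libraries must be installed for loading to succeed.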


3. Tools

KGIPA uses several tools for peptide and protein feature extraction: SCRATCH-1D (v1.2)[8], IUPred2A[9], ncbi-blast (v2.13.0)[10], ProtT5[11], and trRosetta[12]. To run KGIPA locally, these tools must be properly configured. Installation and configuration instructions are available at the following links:

SCRATCH-1D:   https://download.igb.uci.edu/

IUPred2A:   https://iupred2a.elte.hu/

ncbi-blast:   https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html

ProtT5:   https://zenodo.org/records/4644188

trRosetta:   https://yanglab.qd.sdu.edu.cn/trRosetta/
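
Before a local run, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of KGIPA itself: the function name find_missing_executables is hypothetical, and the executable names in the example are assumptions that depend on how each tool was installed. ProtT5 and trRosetta are typically used through Python environments or model files rather than a single executable, so they are not checked here.

```python
import shutil


def find_missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    # "psiblast" is the BLAST+ PSI-BLAST program; the other two are
    # placeholders for the SCRATCH-1D and IUPred2A entry points.
    required = ["psiblast", "run_SCRATCH-1D_predictors.sh", "iupred2a.py"]
    missing = find_missing_executables(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed executables were found.")
```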


4. Other Notes

KGIPA and its associated tools also rely on several databases, which we have compiled for researchers' convenience. They can be downloaded directly via the links below:

blosum62:   blosum62.txt

nrdb90:   nrdb90.tar.gz
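
As one illustration of how such a database might be used, the sketch below assembles a PSI-BLAST command line that searches a formatted nrdb90 database and writes an ASCII PSSM. This is an assumption for illustration, not a description of the KGIPA pipeline: the function name, file paths, iteration count, and E-value are hypothetical, and the archive must first be unpacked and, if it ships as FASTA, formatted with makeblastdb. The flags themselves are standard BLAST+ psiblast options.

```python
import subprocess


def build_psiblast_command(query_fasta, db_path, pssm_out,
                           iterations=3, evalue=0.001, threads=1):
    """Assemble (but do not run) a psiblast argv list that writes an ASCII PSSM."""
    return [
        "psiblast",
        "-query", str(query_fasta),
        "-db", str(db_path),
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out_ascii_pssm", str(pssm_out),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_psiblast_command("peptide.fasta", "db/nrdb90", "peptide.pssm")
    print(" ".join(cmd))
    # Uncomment to run once BLAST+ and the database are in place:
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to log and check before any long search is started.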


5. References

[1] Ribeiro, M. T., Singh, S. & Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. 1135-1144 (2016).
[2] Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
[3] Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D. K.-W., Newman, S.-F., Kim, J. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749-760 (2018).
[4] Burley, S. K. et al. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 51, D488-D508 (2023).
[5] Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152 (2012).
[6] Chen, S., Yan, K. & Liu, B. PDB-BRE: A ligand-protein interaction binding residue extractor based on Protein Data Bank. Proteins Struct. Funct. Bioinf. 92, 145-153 (2024).
[7] Adasme, M. F. et al. PLIP 2021: Expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 49, W530-W534 (2021).
[8] Cheng, J., Randall, A. Z., Sweredoski, M. J. & Baldi, P. SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33, W72-W76 (2005).
[9] Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329-W337 (2018).
[10] Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S. & Madden, T. L. NCBI BLAST: a better web interface. Nucleic Acids Res. 36(suppl_2), W5-W9 (2008).
[11] Elnaggar, A. et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112-7127 (2021).
[12] Du, Z. et al. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634-5651 (2021).

Cite

Users of KGIPA are requested to cite the following work:

Shutao Chen, Ke Yan, Jiangyi Shao, Xiangxiang Zeng, and Bin Liu*.
Pragmatic analysis with knowledge-guided for unraveling peptide-protein pairwise non-covalent mechanisms. (Submitted)