Signal peptide discrimination and cleavage site identification using SVM and NN

Kazemian, Hassan and Yusuf, S. A. and White, Kenneth (2014) Signal peptide discrimination and cleavage site identification using SVM and NN. Computers in Biology and Medicine, 45. pp. 98-110. ISSN 0010-4825

SP_Paper_28th Jan 14 final with page numbers.pdf - Published Version
Official URL: http://www.computersinbiologyandmedicine.com/artic...

Abstract

About 15% of all proteins in a genome contain a signal peptide (SP) sequence at the N-terminus that targets the protein to intracellular secretory pathways. Once the protein has been targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains owing to their similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification.

The proposed method uses a dual-phase classification approach, with an SVM as the primary classifier to discriminate SP sequences from Non-SP sequences, followed by NNs that predict the most suitable cleavage site candidates. In phase one, the SVM discriminates SP from Non-SP using hydrophobic propensities as the primary feature vector, extracted by symmetric sliding-window analysis of the amino-acid sequence. In phase two, an NN uses asymmetric sliding-window sequence analysis to predict the cleavage site.
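
The two windowing schemes can be illustrated with a minimal Python sketch. It is not the paper's implementation: the Kyte-Doolittle hydropathy scale, the window sizes, the function names and the example sequence are all assumptions made here for illustration, since the abstract does not specify them.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def symmetric_window_features(seq, half_width):
    """Mean hydropathy of each window of 2*half_width + 1 residues.

    Each window is centred on one residue, with the same number of
    neighbours on either side (hypothetical helper).
    """
    feats = []
    for i in range(half_width, len(seq) - half_width):
        window = seq[i - half_width : i + half_width + 1]
        feats.append(mean(KD[aa] for aa in window))
    return feats


def asymmetric_windows(seq, left, right):
    """List (cut, window) pairs for every candidate cleavage position.

    `cut` is the index of the first residue after the candidate site;
    the window holds `left` residues before it and `right` residues
    from it onwards (hypothetical helper).
    """
    return [
        (cut, seq[cut - left : cut + right])
        for cut in range(left, len(seq) - right + 1)
    ]


if __name__ == "__main__":
    n_terminus = "MKKLLLAVALLSAASA"  # made-up sequence, not from the paper's data
    print(symmetric_window_features(n_terminus, half_width=3))
    print(asymmetric_windows(n_terminus, left=5, right=2))
```

In a pipeline of this kind, the symmetric-window values would feed the first-stage classifier and the asymmetric windows would be encoded as inputs to the second stage. Residues outside the 20 standard amino acids raise a `KeyError` in this sketch.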

The proposed SVM-NN method was tested using UniProt non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate a score of 0.90 for SP and Non-SP discrimination with the SVM, as measured by the Matthews Correlation Coefficient (MCC). For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model.
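
For readers unfamiliar with the MCC, the sketch below computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the counts in the example are hypothetical and are not taken from the paper's results.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    marginal total is zero and the coefficient is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 45 SP and 45 Non-SP sequences classified correctly,
    # 5 false positives and 5 false negatives.
    print(mcc(tp=45, tn=45, fp=5, fn=5))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when SP and Non-SP classes are unevenly represented.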

Item Type: Article
Uncontrolled Keywords: signal peptide discrimination, support vector machine, cleavage site prediction, neural networks
Subjects: 500 Natural Sciences and Mathematics > 570 Life sciences; biology
Department: School of Human Sciences
Depositing User: Hassan Kazemian
Date Deposited: 29 Sep 2015 09:00
Last Modified: 28 Jun 2016 08:51
URI: http://repository.londonmet.ac.uk/id/eprint/761
