Searching for remotely homologous sequences in protein databases with hybrid PSI-blast
Li, Yuheng

2006, Doctor of Philosophy, Ohio State University, Biophysics.
Sequence alignment is one of the fundamental techniques used in molecular biology. It has been widely used in many biological applications, such as protein classification, gene finding, homology modeling, structure and function prediction, phylogenetic analysis and database annotation. In high-sensitivity sequence homology database searches, progressive sequence model refinement by means of iterative searches is an effective method and is currently employed in many popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. We have demonstrated that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. We also evaluated the accuracy of two proposed approaches to edge-effect correction in short-sequence alignment statistics, which turns out to be one of the crucial issues in developing a hybrid-alignment-based version of PSI-BLAST. In addition, we have exploited other benefits of the hybrid alignment. We show that incorporating information about the suboptimal alignments, otherwise ignored in PSI-BLAST, already improves the sensitivity of PSI-BLAST. In one experiment, we found a set of sequences on which our tool disagrees with the classification given by SCOP. Careful examination points to a possible misclassification in SCOP. Cross-referencing with two other methods of protein structure classification, CATH and DALI, supports this view, indicating that the enriched information from suboptimal alignments is valuable for detecting more weakly related sequences. Finally, we have integrated position-specific gap penalties into PSI-BLAST, which were intentionally left out because of a theoretical limitation of its underlying Smith-Waterman score statistics.
We also investigated several strategies to adjust the position-based gap costs derived from the forward-backward algorithm. The results show that, of these strategies, using the degree of conservedness, calculated as a localized relative entropy from the position-specific substitution matrix, is the most effective. Such enhancements further improve the sensitivity of PSI-BLAST for remote homology detection in database searches.
Mario Lauria (Advisor)
171 p.

Recommended Citations

APA Citation

Li, Y. (2006). Searching for remotely homologous sequences in protein databases with hybrid PSI-blast. (Electronic Thesis or Dissertation). Retrieved from https://etd.ohiolink.edu/

MLA Citation

Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." Electronic Thesis or Dissertation. Ohio State University, 2006. OhioLINK Electronic Theses and Dissertations Center. 19 Apr 2015.

Chicago Citation

Li, Yuheng. "Searching for remotely homologous sequences in protein databases with hybrid PSI-blast." Electronic Thesis or Dissertation. Ohio State University, 2006. https://etd.ohiolink.edu/

Files

osu1164741421.pdf (1.18 MB)