SEA-PHAGES | Excellent BLAST but poor HHPred

Posted 20 Mar, 2018 17:25
jross1025

I seem to recall that before the redesign of the HHPred website we got rather good correspondence between NCBI BLASTp and HHPred, at least in terms of using HHPred to confirm a function call based originally only or mainly on BLAST. In our experience, really good BLAST results were a good predictor of at least quite good HHPred results. We have only just begun to use the “new” site, and the results seem quite different. For example, feature 4 in our autoannotation of ugenie5, commencing at 3230 bp, translates as follows:

MADLGIRVDADSLVLWRGRDFKWNFENLDASQTPIPYPPGRLFFELQTGGEHNALHRVYITGATGGTYTLKCNGIDTAAIDYNDVSENPQGLAGDITDAVLGAVGAGNAVIHPVSLYPAWTLNFNLNSSKPLTEQLVNTINKTANDFFDTFDSLLGVDVEMTVTDQLNFKLVVTSRRSFDEVGVVTFAVDVTSTAVKNFFNAAAGLIGAVNAVSTDFYWNREYNIEYTGDLALTPIPATTANATGLVGTNKRIVTEVLEPGKEPMTIWEFVIEDSIASIKIESEEADKIANRVKWQLVFLPEGEVAGGDPIALGTVSKVGZ

In NCBI BLASTp, this returns numerous very high-quality hits: 100% aligned 1:1, E values of zero or very nearly so, and top scores well over 1000. The highest-scoring hit, and 7 of the top ten hits in total, call either “minor tail protein” or “tail protein” as the function. However, at least in our hands, when the same amino acid sequence is run through HHPred using the PDB_mmCIF70_25Feb, SCOPe70_2.07, Pfam-a_v31.0, and NCBI_conserved_domains(CD)_v3.16 databases (the closest I could get to what is suggested in the SEA online guide), we get a very different picture: seemingly patchy alignments (at least judging from the graphic) and VERY high E values (over 100). We’re accustomed to seeing this kind of thing only when the BLAST results themselves are rather iffy.

Thoughts, anyone?


Posted 03 Apr, 2018 14:07
welkin

Hi Joe,

I am not surprised that you didn't get HHPred hits to minor tail proteins. The best HHPred data come from crystal structures, and tail proteins are extremely hard to crystallize because they are long and fibrous. They are also extremely modular, which can make it difficult to generate a conserved domain or multiple sequence alignment (the other sources of data for HHPred searches).

The minor tail proteins at the beginning of the genome in Cluster A phages were initially assigned this function by Graham when he worked on the annotation of L5 and D29, so now we are using synteny more than anything else to assign these functions, as in, "Those big long extended proteins at the left end of Cluster A genomes that are not lysins are tail proteins". A lot of minor tail proteins get assigned functions by synteny: they are the right size and in the right place in the genome, and so that's what they have to be.

Make sense?
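
For anyone who wants to see the synteny reasoning above spelled out, here is a minimal sketch in Python. It is not an official SEA-PHAGES tool or rule: the `Gene` record, the `synteny_call` function, and both cutoffs (`LEFT_ARM_END_BP`, `MIN_TAIL_LENGTH_AA`) are hypothetical and chosen only for illustration. The logic simply encodes "long, in the left end of the genome, and not a lysin".

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start_bp: int        # genome coordinate where the gene starts
    length_aa: int       # length of the translated product
    blast_function: str  # top BLAST function call, "" if none


# Hypothetical cutoffs for illustration only; they are not published values.
LEFT_ARM_END_BP = 25_000
MIN_TAIL_LENGTH_AA = 300


def synteny_call(gene: Gene) -> str:
    """Return a tentative call based only on position, size, and 'not a lysin'."""
    in_left_end = gene.start_bp < LEFT_ARM_END_BP
    is_long = gene.length_aa >= MIN_TAIL_LENGTH_AA
    is_lysin = "lysin" in gene.blast_function.lower()
    if in_left_end and is_long and not is_lysin:
        return "minor tail protein (by synteny)"
    return "no synteny-based call"


# The start coordinate comes from the question above; the length is an
# illustrative placeholder, not a measured value.
example = Gene("feature 4", start_bp=3230, length_aa=340,
               blast_function="minor tail protein")
print(synteny_call(example))
```

A call made this way is only as good as the cutoffs and the cluster-specific gene order behind it, so it is best read as a summary of the argument, not as a replacement for checking the genome map by eye.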

