The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


Using BLAST to validate start sites

| posted 03 Mar, 2016 15:49
Maybe I'm missing something obvious, but I really don't see how BLAST results can be used to validate (or call into question) a proposed start site. I understand the concept that moving the start can perhaps result in better BLAST results (more hits, more 1/1 alignments, higher scores with lower E values, etc.), but what is it that people look at to determine that BLAST supports a start at, say, 14639 bp as opposed to a start at 14651 bp? Or am I just fundamentally misunderstanding something?
| posted 03 Mar, 2016 16:15
Hi Joe,

I think the value of using BLAST when choosing start sites lies in the fact that you're comparing your choice to the final choices made by other (often experienced) people. So it's not biological evidence, but it's sort of consensus evidence. You could argue, of course, that simply because someone else has called it a certain way is no reason to call it that way too, and the skeptic in all of us resists conforming to the norm. But many of the genomes in GenBank have been very carefully QCed and pored over by those who do this work more than anyone else.

So I think of it kind of like checking your work with an expert. If you BLAST, and get 1:1 hits with many published things, it's like, "Oh, good, seems like everyone agrees with this start call. That makes me a little more confident in it." If all the BLAST hits are not 1:1, it's like "Hmmm…looks like everyone else called a different start for some possibly good reason. Maybe I should double-check and really buttress my argument on this one."

Two things to keep in mind when using BLAST results for start sites:
  1. Who submitted this genome to GenBank? Was it Debbie/Welkin/Graham? Or was it Jimmy FirstPhage? Give more weight to trusted experts.
  2. When was this genome submitted? We have more evidence now for making start site calls than we did in the past, so give more weight to newer genomes than older ones.

Hope that helps,
| posted 03 Mar, 2016 17:28
Dan: Thanks once again for helping out! I think my main problem is that perhaps I don't quite understand what 1/1 alignment actually means. I thought it simply meant, for example, query=ala-ala-gly-gly-phe-met-met-etc and target=ala-ala-gly-gly-phe-met-met-etc. In other words, the sequences are the same from beginning to end (maybe off a bit here and there) WHEREVER IT IS THEY ACTUALLY START IN THE SEQUENCE, so the query could start at 14456, the target could start at 14471, and they're still 1/1 aligned. Whereas, say, 1/4 alignment would mean something like query=ala-ala-gly-gly-phe-met-met-etc and target=aa-aa-aa-ala-ala-gly-gly-phe-met-met-etc, so the aligned region in the target starts at aa #4 and continues from there until whenever. In other words, the region of alignment is just sort of shifted down so that it begins at aa #4. Should we just assume that 1/1 alignment means actual start site agreement?
| posted 03 Mar, 2016 18:47
Hey Joe,

What 1:1 means is that the first amino acid of the query sequence matches the first amino acid of the subject sequence. This implies that the start site of your query sequence matches the start site of the published subject sequence.

Or, like you said, a 1:4 alignment would mean that the first amino acid of the query matches the fourth amino acid of the subject. This means there are three more amino acids in the published subject sequence that are not in your query. This can indicate that you're choosing a different start than the published one.

I made a picture that hopefully helps show these different situations. Remember that the grayed-out bases in this picture won't appear in BLAST results, because they're not part of the alignment. But I included them in this picture to give a better sense of how the protein sequences are really lining up.
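
For anyone who wants the arithmetic spelled out, here is a rough Python sketch of how a "query start : subject start" pair translates into a start-site difference. It is not part of any SEA-PHAGES tool; the function names `start_offset` and `describe` are made up for illustration. It assumes the alignment has no gaps near its beginning, that both genes are compared in the same orientation, and that each amino acid corresponds to 3 bp.

```python
def start_offset(query_start: int, subject_start: int) -> tuple[int, int]:
    """Return (extra_aa, extra_bp) for a BLASTP hit.

    query_start / subject_start are the 1-based amino acid positions where
    the alignment begins in your protein and in the published protein.
    A positive result means the published protein has that many extra amino
    acids before the point where your protein starts.
    Assumption: no gaps near the start of the alignment.
    """
    if query_start < 1 or subject_start < 1:
        raise ValueError("BLAST alignment positions are 1-based")
    extra_aa = subject_start - query_start
    return extra_aa, extra_aa * 3  # 3 bp per codon


def describe(query_start: int, subject_start: int) -> str:
    """Turn an alignment start pair into a plain-language note."""
    extra_aa, extra_bp = start_offset(query_start, subject_start)
    label = f"{query_start}:{subject_start}"
    if extra_aa == 0:
        return f"{label}: starts agree"
    if extra_aa > 0:
        return (f"{label}: published protein has {extra_aa} extra aa, "
                f"so its start is {extra_bp} bp upstream of yours")
    return (f"{label}: your protein has {-extra_aa} extra aa, "
            f"so your start is {-extra_bp} bp upstream of the published one")


if __name__ == "__main__":
    for q, s in [(1, 1), (1, 4), (1, 5), (4, 1)]:
        print(describe(q, s))
```

By this arithmetic, the two candidate starts in the original question (14639 and 14651) are 12 bp, or 4 codons, apart. If the published gene used the upstream start and you called the downstream one, you would expect something like a 1:5 alignment rather than 1:1, assuming the two genomes line up that way.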

Edited 03 Mar, 2016 18:49
| posted 09 Mar, 2016 20:05
Thanks again, Dan, this is helpful!