The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Tutorial on Phamerator and Starterator Use?

| posted 03 Feb, 2016 15:08
Dan Russell
saleadon
Can you "overwhelm" Starterator? Analyzing one of our genes produced 351 tracks! The graph was incomplete: it only listed the first few start sites for just one of the tracks. The report appears to be useful, although it would be nice to see the graph.
Thanks.

Hey Steve,

What was the gene and Pham number that this happened on?

–Dan

Hi Dan,
The gene was 58 in Iridoclysis, Pham 6732. I have attached the file.
Steve
| posted 05 Feb, 2016 18:04
I have been keeping a list of bugs and possible improvements to Starterator (see the issues on my GitHub cdshaffer/starterator repo if you want the specific list). I saw a very similar result in the output for phage MitKao, pham 1510, and was able to do a little sniffing around in that case. The problem was a single unusual gene with a very long ORF upstream of the start codon, which threw off the calculation of the scaling used for the X axis. Another pham had a different issue but a similar output: there was so much protein sequence divergence among the pham members that there was no pink, simply because so little was conserved across all members.

So in both cases I investigated, the cause was not simply the size of the pham but unusual properties of the specific pham. This is very typical in bioinformatics. The computer programs will take care of 95-99% of cases, but since biology is not math there are always unusual corner cases that just don't work well. In the MitKao case, one of the assumptions made by Starterator is that there will be an in-frame stop codon not too far upstream of the annotated start codon. In rare cases this assumption is incorrect and the output fails to give meaningful results.
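To make the upstream-stop assumption concrete, here is a minimal Python sketch. It is not Starterator's actual code; the function names, the forward-strand 0-based coordinates, and the "5x the median" cutoff are all assumptions chosen for illustration. It measures how far upstream the nearest in-frame stop codon lies for each gene and flags the pham members whose upstream ORF is unusually long or never terminates.

```python
from statistics import median

STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_orf_length(genome, start):
    """Bases between the nearest in-frame upstream stop codon and the start.

    `genome` is a forward-strand DNA string and `start` is the 0-based index
    of the first base of the annotated start codon (illustrative convention).
    Returns None when no in-frame stop is found before the sequence begins.
    """
    pos = start - 3
    while pos >= 0:
        if genome[pos:pos + 3].upper() in STOP_CODONS:
            return start - (pos + 3)
        pos -= 3
    return None


def flag_long_upstream(lengths, factor=5):
    """Names of genes whose upstream ORF is missing a stop or is much longer
    than typical for the pham. `factor` is an arbitrary illustrative cutoff."""
    known = [v for v in lengths.values() if v is not None]
    if not known:
        return sorted(lengths)
    cutoff = factor * max(median(known), 3)
    return sorted(
        name for name, v in lengths.items() if v is None or v > cutoff
    )


if __name__ == "__main__":
    # Hypothetical per-gene upstream lengths for one pham.
    lengths = {"geneA": 6, "geneB": 9, "geneC": 300, "geneD": None}
    print(flag_long_upstream(lengths))
```

A gene flagged this way is the kind of outlier that could stretch the X axis for every other track, so checking for it first can tell you whether an unreadable graph comes from one odd member rather than from the size of the pham.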

I always use results like this as a teaching moment. This is a great example that no computer program is 100% successful, and it is why manual annotation is still worthwhile. In this case the "experiment" (i.e. the automated analysis of a multiple sequence alignment of all genes in a pham using ClustalW) failed to give a result. I would explain to the student that we now have a decision to make: try to do the analysis manually or just move on. This is an opportunity to discuss cost/benefit analysis and how it relates to research: there is never enough time to do everything, and a good researcher makes good choices about where to invest time and money to get the best outcome they can afford. I would then probably say that in this case the manual analysis is not worth the time and effort, and just put in the notes that Starterator was NI (not informative), as suggested in the Annotation Guide (see page 76).
| posted 08 Feb, 2016 14:01
Chris,
Thanks for your reply and your insights.
Steve
 