The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

LO: Designation Question

| posted 16 Feb, 2016 16:09
We want to make certain we are completing the information in DNA Master correctly. At our recent training we were given the two choices for the "LO:" description.

LO: Longest Reasonable ORF -or-
Not Longest Reasonable ORF (explain in notes below)

I think we have some confusion over this language and a worksheet we 'borrowed' from another school. That worksheet actually has the student record the length of the longest open reading frame. Am I correct that the DNA Master "Notes" section does not require this information?

When would one choose an ORF that is not the longest reasonable ORF? If there was a long overlap, that would make it unreasonable? (We are finding some 90+ bp overlaps that have been used in PhagesDB.) If the SD sequence was 3' of the ATG that would make it unreasonable? What else might cause one to select an ORF other than the "Longest Reasonable ORF"?

Can you give me some scenarios that might lead one to select the "longest 'unreasonable' ORF" or a "shorter-than-longest reasonable ORF"?

I hope these questions are reasonable!

| posted 16 Feb, 2016 18:34
GregFrederick@letu.edu
We want to make certain we are completing the information in DNA Master correctly. At our recent training we were given the two choices for the "LO:" description.

LO: Longest Reasonable ORF -or-
Not Longest Reasonable ORF (explain in notes below)

I think we have some confusion over this language and a worksheet we 'borrowed' from another school. That worksheet actually has the student record the length of the longest open reading frame. Am I correct that the DNA Master "Notes" section does not require this information?

Correct. You do not need to include the length of the open reading frame.

When would one choose an ORF that is not the longest reasonable ORF? If there was a long overlap, that would make it unreasonable? (We are finding some 90+ bp overlaps that have been used in PhagesDB.) If the SD sequence was 3' of the ATG that would make it unreasonable? What else might cause one to select an ORF other than the "Longest Reasonable ORF"?

The most common example of "not the longest reasonable ORF" is that the start chosen leaves a gap between the gene you are working on and the upstream gene that could be made smaller by choosing an alternate start. So I am not looking so much for reasonable vs. not reasonable, as longest vs. not longest. People were taking "longest" as the most important criterion, and saying that genes were not the longest ORF because there was one distant upstream start codon that could cause a 90% overlap with the upstream gene. That is not a "reasonable" start to consider.

If something has a 90bp overlap, I definitely want to know about it.
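
To make the gap-versus-overlap comparison concrete, here is a minimal Python sketch. It is not part of DNA Master or any SEA-PHAGES tool. It assumes both genes are on the forward strand and that coordinates are 1-based and inclusive; the function names and all coordinates are hypothetical.

```python
def gap_to_upstream(upstream_stop: int, candidate_start: int) -> int:
    """Bases between the upstream gene's last base and a candidate start.

    Positive = gap, zero = the genes abut, negative = overlap.
    Assumes forward-strand genes and 1-based inclusive coordinates.
    """
    return candidate_start - upstream_stop - 1


def describe_starts(upstream_stop, candidate_starts):
    """List each candidate start with its gap or overlap, most upstream first."""
    rows = []
    for start in sorted(candidate_starts):
        gap = gap_to_upstream(upstream_stop, start)
        if gap < 0:
            label = f"{-gap} bp overlap"
        else:
            label = f"{gap} bp gap"
        rows.append((start, gap, label))
    return rows


# Hypothetical example: the upstream gene ends at base 1000.
for start, gap, label in describe_starts(1000, [911, 997, 1051]):
    print(start, label)
```

A table like this only shows which starts shrink the gap and which create a long overlap. Deciding which of them is "reasonable" still takes the other evidence discussed in this thread.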

Can you give me some scenarios that might lead one to select the "longest 'unreasonable' ORF" or a "shorter-than-longest reasonable ORF"?

Sure. When comparative genomics, such as Starterator or BLAST, shows that the start that gives you the longest ORF for the gene in your genome isn't present in closely related genes, and one that gives you a shorter gene product is. Solid comparative data trumps everything.
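
As a rough illustration of that comparison, here is a small Python sketch that tallies which start is called in related finished genomes and then keeps only the starts that exist in your own gene. It is not how Starterator works internally. The phage names, the start labels, and both function names are made up for the example.

```python
from collections import Counter


def tally_start_calls(calls):
    """Count how often each start label is called.

    `calls` maps a genome name to the start label called for the
    homologous gene in that genome (finished annotations only).
    Returns (start_label, count) pairs, most frequent first.
    """
    return Counter(calls.values()).most_common()


def starts_available_here(tally, starts_in_my_gene):
    """Keep only the tallied starts that are actually present in my gene."""
    return [(start, n) for start, n in tally if start in starts_in_my_gene]


# Hypothetical data: start labels called in four related finished genomes.
related_calls = {"PhageA": 5, "PhageB": 5, "PhageC": 3, "PhageD": 5}

# Hypothetical: my gene contains starts 2 and 5; start 2 gives the longest ORF.
tally = tally_start_calls(related_calls)
print(starts_available_here(tally, {2, 5}))
```

In this made-up case start 2 gives the longest ORF but is never called in the related genomes, while start 5 is called in three of them, so the comparative data would point to the shorter gene product.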

I hope that helps!
| posted 16 Feb, 2016 19:41
Welkin Pope
Can you give me some scenarios that might lead one to select the "longest 'unreasonable' ORF" or a "shorter-than-longest reasonable ORF"?

Sure. When comparative genomics, such as Starterator or BLAST, shows that the start that gives you the longest ORF for the gene in your genome isn't present in closely related genes, and one that gives you a shorter gene product is. Solid comparative data trumps everything.

I hope that helps!

Thanks Welkin!

So what if a BLASTP search ends up showing both ORFs (or even more than two) in published, non-draft genomes? Do you suggest using the longest published/finished ORF, even if it breaks one or more of the guiding principles? Or the most abundant BLASTP result, or what? Deciding what is 'reasonable' can be complicated!
| posted 16 Feb, 2016 22:31
Here is an example of what is "reasonable".

All of the phages we compared to Wunderphul are in the PhagesDB Phamerator database.

Look at feature 28. In Zaka and Cloudwang3 they apparently called the longest ORF. But based on guiding principles it does not seem "reasonable" because of the entire overlap with feature 27 in those two genomes.

This one is easy to determine in our case because we have a STOP and this is the largest ORF in our case. But it obviously was not in Zaka and Cloudwang3. QUESTION: Do you want us to note those discrepancies so they can be edited? Are previous calls ever corrected/updated based on newer genomic information?

There are multiple examples of numerous starts being used in the database for a lot of our genes, so determining the best and most reasonable start call remains complex even if homology trumps everything. QUESTION: In these cases do we rely on the "longest reasonable ORF" strategy (unless it severely breaks one or more guiding principles, like feature 28 above)? If one frequent call results in a 20-ish bp overlap and another frequent call results in a 3-4 bp overlap, does one choose the longer or the shorter?

Obviously my "mental algorithm" is stuck in a loop, and I am trying to add code that will help break out of it. Thanks.
| posted 16 Feb, 2016 23:21
Another confusing one is two adjacent start codons. (I asked a friend on the QC team. But I guess I'll throw it out here too.) In some of the finished genomes in PhagesDB the first ATG is used. In some the second is called even if there is little or no overlap. The SD values seem almost identical.

In most cases both starts seem to include the entire "coding potential". But only some of the finished genomes align 1:1. Others align 1:2 or 2:1.

If everything else is the same, do you call the first start, the second, or the one Starterator prefers?

Questions. Questions. Thanks for sharing your wisdom!
| posted 18 Feb, 2016 16:42
GregFrederick@letu.edu
Look at feature 28. In Zaka and Cloudwang3 they apparently called the longest ORF. But based on guiding principles it does not seem "reasonable" because of the entire overlap with feature 27 in those two genomes.

This one is easy to determine in our case because we have a STOP and this is the largest ORF in our case. But it obviously was not in Zaka and Cloudwang3. QUESTION: Do you want us to note those discrepancies so they can be edited? Are previous calls ever corrected/updated based on newer genomic information?

Greg,

I am going to suggest you have a look at the Annotation Guide section 9.4.1 in regard to your question about the feature 27 and 28 overlap you are seeing in Zaka and Cloudwang. This is a special situation.

Lee
| posted 18 Feb, 2016 18:30
Thanks Lee. I had forgotten about that possibility. But does that mean that DaVinci (see image above) and all the others that do not have the programmed frameshift called are wrong and should be corrected?
| posted 18 Feb, 2016 19:36
GregFrederick@letu.edu
Thanks Lee. I had forgotten about that possibility. But does that mean that DaVinci (see image above) and all the others that do not have the programmed frameshift called are wrong and should be corrected?

My guess would be that there is an error in DaVinci. That is one that should probably be reviewed. Which others (that aren't drafts) didn't have the frameshift annotated?

Lee
| posted 18 Feb, 2016 19:51
Lee Hughes
GregFrederick@letu.edu
Thanks Lee. I had forgotten about that possibility. But does that mean that DaVinci (see image above) and all the others that do not have the programmed frameshift called are wrong and should be corrected?

My guess would be that there is an error in DaVinci. That is one that should probably be reviewed. Which others (that aren't drafts) didn't have the frameshift annotated?

Lee

I have to say I don't remember the names of the other phages. (Can I blame it on old age even though I don't like to use that card?) It's been more than a few days and a few genes examined since I was doing those Phamerator alignments. I'll try to pull up more of the subcluster into Phamerator in class this afternoon and let you know. Thanks.
Edited 18 Feb, 2016 20:04
 