SEA-PHAGES | All posts created by welkin

Link to this post | posted 05 Apr, 2018 12:15
welkin	Hi Miriam, There is a document on the Faculty Information page in the Bioinformatics section that describes how to fix a corrupted file: http://seaphages.org/media/docs/How_to_fix_corrupted_files_wp_2016.pdf

Posted in: DNA Master → Corrupted files for merge

Link to this post | posted 03 Apr, 2018 14:11
welkin	Unlike many other clusters, Cluster A phages have some minor tail proteins at the left end of the genome, upstream of the lysin and terminase genes (around genes 4-6 or so). You can recognize these proteins by their size. Some of them may have structural motifs that suggest long, extended proteins, such as collagen repeats or coiled coils.

Posted in: Cluster A Annotation Tips → minor tail proteins

Link to this post | posted 03 Apr, 2018 14:07

welkin

Hi Joe,
I am not surprised that you didn't get HHpred hits to minor tail proteins. The best HHpred data come from crystal structures, and tail proteins are extremely hard to crystallize because they are long and fibrous. They are also extremely modular, which can make it difficult to generate a conserved domain or multiple sequence alignment (the other sources of data for HHpred searches).
The minor tail proteins at the beginning of the genome in Cluster A phages were initially assigned this function by Graham when he worked on the annotation of L5 and D29, so now we are using synteny more than anything else to assign these functions, as in, "Those big long extended proteins at the left end of Cluster A genomes that are not lysins are tail proteins". A lot of minor tail proteins get assigned functions by synteny: they are the right size and in the right place in the genome, and so that's what they have to be.

Make sense?

Posted in: Functional Annotation → Excellent BLAST but poor HHPred

Link to this post | posted 22 Mar, 2018 15:08
welkin	Cluster H genomes have a non-canonical frameshift sequence in their tail assembly chaperones. While we are pretty sure we know where it is based on sequence alignments with Send513, we are not including it in our annotations until we have wet bench data to back it up.

Posted in: Cluster H Annotation Tips → frameshifts

Link to this post | posted 22 Mar, 2018 15:07
welkin	Cluster H genomes have a large gap in the center with no CDS or tRNA features. We don't know what goes in there and it is OK to leave it empty.
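
If you want to confirm where the empty stretch sits in your own annotation, a quick scan of the feature coordinates will do it. This is only a sketch: find_gaps is a hypothetical helper, the 500 bp threshold is an arbitrary assumption, and the coordinates are made up.

```python
def find_gaps(features, min_gap=500):
    """List uncovered stretches between annotated features.

    features: (start, stop) pairs, 1-based inclusive, start <= stop.
    Returns (gap_start, gap_stop, length) for every stretch of at least
    min_gap bases that no feature covers. min_gap=500 is an arbitrary
    illustrative threshold, not a SEA-PHAGES rule.
    """
    gaps = []
    covered_to = None  # right-most base covered by the features seen so far
    for start, stop in sorted(features):
        if covered_to is not None:
            length = start - covered_to - 1
            if length >= min_gap:
                gaps.append((covered_to + 1, start - 1, length))
            covered_to = max(covered_to, stop)
        else:
            covered_to = stop
    return gaps


# Made-up coordinates for illustration only.
example_features = [(100, 900), (950, 2000), (1800, 2100), (3500, 4200)]
print(find_gaps(example_features))
```

Only gaps between features are reported; the stretches before the first feature and after the last one are ignored.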

Posted in: Cluster H Annotation Tips → a big gap

Link to this post | posted 14 Mar, 2018 17:43

welkin

This is in the Online Bioinformatics Guide, and restated here:

Here is a list to highlight what we look for during the initial annotation review. Initial review is an iterative, rather than exhaustive process, and your files may be returned to you after we identify any one of the following issues. Please use a request for revision as an opportunity to double-check all the items below and it will be less likely that your files will be returned again.

Most common reasons files are returned for revision during annotation preliminary review:

1. Missing the official SEA-PHAGES cover sheet and checklist.
We designed this sheet to highlight the items that should be addressed for every annotation. If you don't use it, we don't know if you've addressed them.

2. No annotation of the programmed translational frameshift in the tail assembly chaperones.
We can't identify this region for all clusters, but in those where we can, it should be addressed. See the online guide and the Cluster-Specific annotation tips for help. Frameshifts should ONLY be added for the tail assembly chaperones, as those are the only genes that have wet bench evidence for a shift.
(https://seaphagesbioinformatics.helpdocsonline.com/article-54)

3. tRNAs are not correctly trimmed. Remember that the autoannotation tRNA predictions may not be correct and should be reviewed. (https://seaphagesbioinformatics.helpdocsonline.com/documenting-trnas-in-dna-master)

4. Functions do not match Official SEA-PHAGES function list.
Unfortunately, we have to be extra picky about this — spelling, capitalization, and extraneous punctuation all matter as we move towards automated curation of our data. The computer thinks everything written out a different way is a different thing. Even hidden carriage return marks can interfere with downstream formatting.
If you think your genome contains a gene with a function that is not on the list, create a new thread on the "request a new function for the official function list" forum.
https://seaphagesbioinformatics.helpdocsonline.com/article-96

5. File formatting for any of the required files is not up-to-date. The old DNA Master Annotation Guide is out of date and no longer being updated; please use the new Online Guide.
https://seaphagesbioinformatics.helpdocsonline.com/documentation

6. The phage page at phagesdb.org is missing the GPS coordinates and the complete name of the student(s) who found the phage. This information becomes part of the GenBank file, and we can't submit the files until we have that information. This is also a good time to check that the phage page record is complete and accurate.

7. Flagrant violations of the Guiding Principles of annotation: genes annotated on top of each other, huge gaps with no genes predicted and no explanation, etc.

8. No evidence that programs like HHpred and Starterator were used. These programs are essential for the identification of gene starts and accuracy of functional assignments.

9. Annotation is missing, or has too many of, the most common phage functions as laid out on the "Functions present in (almost) all phage genomes" page in the Online Bioinformatics Guide. (https://seaphagesbioinformatics.helpdocsonline.com/article-91)
This page is a quick reference guide to help you out with things like "how many portal genes should I expect to find in my genome?".
Exceptions to this will be noted in the Cluster-specific annotation forums.

10. Finally, to generate the best annotations, please refer to the Cluster-Specific annotation forums. We've added lots of tips to help you out.

If you have any questions, please ask. We are happy to help!
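
For item 4, a check along these lines can catch mismatches before you submit. This is only a sketch: APPROVED is a three-entry stand-in and not the real official function list, and check_function_name is a hypothetical helper, not part of DNA Master or any SEA-PHAGES tool.

```python
# Stand-in for the official list; the real list is much longer.
APPROVED = {"minor tail protein", "tail assembly chaperone", "portal protein"}


def check_function_name(name, approved=APPROVED):
    """Return a list of problems found in one annotated function name."""
    problems = []
    # Hidden carriage returns or newlines are invisible in most editors.
    if "\r" in name or "\n" in name:
        problems.append("hidden line-break character")
    cleaned = name.replace("\r", "").replace("\n", "")
    if cleaned != cleaned.strip():
        problems.append("leading or trailing whitespace")
    cleaned = cleaned.strip()
    if cleaned in approved:
        return problems
    # Not an exact match: see whether only the capitalization is off.
    by_lowercase = {entry.lower(): entry for entry in approved}
    match = by_lowercase.get(cleaned.lower())
    if match is not None:
        problems.append(f"capitalization differs from '{match}'")
    else:
        problems.append("not on the approved list")
    return problems


for candidate in ["minor tail protein", "Minor Tail Protein",
                  "portal protein\r", "tail fiber"]:
    print(repr(candidate), check_function_name(candidate))
```

An empty list means the name matches an approved entry exactly. Anything else is worth fixing by hand, or, if the function really is new, raising on the forum as described in item 4.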

Edited 04 Apr, 2018 19:31

Posted in: How to Pass Preliminary Annotation Review → How to Pass Preliminary Annotation Review

Link to this post | posted 14 Mar, 2018 16:11
welkin	The MPMEs (mycobacteriophage mobile elements) are very small and closely related. So how do you tell them apart? If your MPME is IDENTICAL to BPs 58, you have MPME1. If your MPME is IDENTICAL to Fruitloop 71, you have MPME2. If yours is not IDENTICAL to either, let us know! Maybe you found a new one! See Sampson et al. for more information. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2833263/

Posted in: Cluster G Annotation Tips → MPMEs--which one?

Link to this post | posted 14 Mar, 2018 14:46

welkin

The MPMEs (mycobacteriophage mobile elements) are very small and closely related. So how do you tell them apart?

If your MPME is IDENTICAL to BPs 58, you have MPME1.

If your MPME is IDENTICAL to Fruitloop 71, you have MPME2.

If yours is not IDENTICAL to either, let us know! Maybe you found a new one!

See Sampson et al. for more information.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2833263/
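
The identity test above can be done with an exact string comparison once you have the reference sequences in hand. A minimal sketch, assuming you have pulled the sequences yourself: the strings in toy_references are short made-up placeholders, NOT the real BPs 58 or Fruitloop 71 sequences, and classify_mpme is a hypothetical helper.

```python
def classify_mpme(query, references):
    """Return the name of the reference that is exactly identical to query.

    Whitespace and letter case are ignored; any other difference, even a
    single residue, counts as "not identical" and gives None.
    """
    normalized = "".join(query.split()).upper()
    for name, reference in references.items():
        if normalized == "".join(reference.split()).upper():
            return name
    return None


# Placeholder sequences for illustration only; substitute the real
# BPs 58 (MPME1) and Fruitloop 71 (MPME2) sequences before use.
toy_references = {
    "MPME1": "ATGGCTAAG",
    "MPME2": "ATGGCTAAA",
}

print(classify_mpme("atggctaag\n", toy_references))
print(classify_mpme("ATGGCTAAT", toy_references))
```

A result of None is the "let us know" case: the element matches neither reference exactly.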

Posted in: Cluster F Annotation Tips → MPMEs--which one?

Link to this post | posted 14 Mar, 2018 14:41
welkin	Cluster F phages can have either +1 or -1 frameshifts in the tail assembly chaperones. Make sure you know which one your phage has.
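
One way to tell which shift an annotated chaperone carries is to compare the coordinates of the two joined segments. A minimal sketch, assuming forward-strand, 1-based inclusive coordinates and that the first segment ends on the last base read before the slip; frameshift_direction and the coordinates are hypothetical.

```python
def frameshift_direction(first_segment_stop, second_segment_start):
    """Return -1 or +1 for a two-segment frameshifted gene.

    Without a shift, translation would continue at first_segment_stop + 1.
    Restarting one base earlier re-reads a base (-1 shift); restarting one
    base later skips a base (+1 shift).
    """
    shift = second_segment_start - (first_segment_stop + 1)
    if shift not in (-1, 1):
        raise ValueError(f"segments imply a shift of {shift}, expected -1 or +1")
    return shift


# Made-up coordinates for illustration only.
print(frameshift_direction(1500, 1500))  # second segment re-reads base 1500
print(frameshift_direction(1500, 1502))  # second segment skips base 1501
```

Reverse-strand genes would need the coordinates mirrored before the same arithmetic applies.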

Posted in: Cluster F Annotation Tips → frameshifts

Link to this post | posted 14 Mar, 2018 14:40
welkin	The right arms of Cluster F genomes are characterized by TONS of tiny genes. These genes are sometimes so small that the gene prediction programs have a really difficult time predicting them. You can usually identify them easily, though, because their start and stop codons overlap those of the flanking genes in a 4 bp overlap.
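
To list the neighbouring gene pairs that already share a 4 bp overlap, you can compare each stop coordinate with the next start. A minimal sketch, assuming all genes are on the forward strand with 1-based inclusive coordinates; the gene names and coordinates are made up, and four_bp_overlaps is a hypothetical helper.

```python
def four_bp_overlaps(genes):
    """Return (upstream, downstream) name pairs that overlap by exactly 4 bp.

    genes: (name, start, stop) tuples, forward strand, 1-based inclusive.
    With a start such as ATG immediately followed by A, the upstream stop
    codon TGA ends 3 bases after the downstream start, so the two genes
    share stop - start + 1 = 4 bases.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    pairs = []
    for (name_a, _, stop_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if stop_a - start_b + 1 == 4:
            pairs.append((name_a, name_b))
    return pairs


# Made-up genes for illustration only.
example_genes = [
    ("gene_60", 1000, 1203),
    ("gene_61", 1200, 1352),
    ("gene_62", 1360, 1500),
]
print(four_bp_overlaps(example_genes))
```

When a tiny gene was missed by the prediction programs, a candidate start that would create this overlap with the upstream stop is a good place to look.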

Posted in: Cluster F Annotation Tips → 4 bp overlaps
