SEA-PHAGES | All posts created by welkin


Link to this post \| posted 13 Apr, 2018 18:24
welkin	Cluster CQ phages contain a number of tRNAs in a cluster near the left end of the genome, transcribed from left to right. This appears as a gap in the Phamerator map, because Phamerator maps do not show tRNAs.

Posted in: Cluster CQ Annotation Tips → a number of tRNAs

Link to this post \| posted 13 Apr, 2018 18:23
welkin	Cluster CQ does not have an identifiable lysin B. Edited 16 Aug, 2023 16:48

Posted in: Cluster CQ Annotation Tips → lysin B

Link to this post \| posted 13 Apr, 2018 18:21
welkin	Lysin A, which is a single multi-domain protein in many actinobacteriophages, is split at the domain boundaries and encoded by two adjacent genes in Cluster CQ. The function of each gene should be reported as "lysin A, [domain name]", where the domain name is whatever domain that gene encodes, to indicate that neither gene encodes the full lysin A.

Posted in: Cluster CQ Annotation Tips → lysin A in two parts

Link to this post \| posted 13 Apr, 2018 17:32
welkin	Like the Cluster A mycobacteriophages, Cluster CA phages have two overlapping DNA primase genes back to back. It is unclear how the correct full-length primase gets made.

Posted in: Cluster CA Annotation Tips → Double DNA primase

Link to this post \| posted 13 Apr, 2018 14:38
welkin	Yep, thanks, Lee. Not an error. For the exact slip coordinate, I usually pick the middle nucleotide of the slippery sequence.
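As a small illustration of the "middle nucleotide" convention, here is a minimal sketch. It assumes 1-based, inclusive genome coordinates. The function name and the example coordinates are made up for illustration and do not come from any SEA-PHAGES tool.

```python
def middle_of_slippery_sequence(start: int, stop: int) -> int:
    """Return the middle coordinate of a slippery sequence.

    start and stop are assumed to be 1-based, inclusive genome coordinates.
    For an even-length sequence there are two central nucleotides; this
    sketch returns the left one.
    """
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    return (start + stop) // 2


# Hypothetical 7-nucleotide slippery sequence spanning 12001 to 12007.
print(middle_of_slippery_sequence(12001, 12007))  # prints 12004
```

For a typical 7-nucleotide slippery sequence the middle is unambiguous; the left-of-center choice for even lengths is only an assumption of this sketch.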

Posted in: Frameshifts and Introns → Frameshift-C1

Link to this post \| posted 13 Apr, 2018 14:30
welkin	Hi,
You've identified a neat little region in the B3s that is not well conserved. Every B3 seems to have its own nucleotide sequence right here, which makes it hard to use comparative genomics to give you the answer. The Phamerator map has actually called a tiny forward gene, and your GeneMark TB data looks like it should be a slightly larger reverse gene in the same place, although the coding potential overlaps the RNA ligase, so that's no good.

I guess I am leaning towards "not a gene", mostly because of the lack of small genes upstream of the RNA ligase in the other B3s.

Edited 13 Apr, 2018 14:35


Posted in: Gene or not a Gene → A gene or not a gene - Morty007 #68

Link to this post \| posted 13 Apr, 2018 13:30
welkin	Hi Arturo,
This draft annotation in Phamerator is an excellent illustration of the limitations of the gene prediction programs. Genes 6, 7, and 8 are all in the draft annotation because something about their nucleotide content scored highly enough with the algorithms to rate as a "gene". However, we also know that the gene prediction programs are wrong somewhere between 5 and 10% of the time.

You are also correct that these calls, (6, 7, and 8) and (13, 14), violate the guiding principles and should be resolved. You should explore all the predictions via BLAST and HHpred and see if the sequences are found in other phages and/or if they have predicted functions. From looking at an EG Phamerator map, it looks like keeping 14 and trimming 13 is the choice that was made for the related genomes.

And to be clear: the guide states that 120 bp is a normal lower size limit for genes, not a hard and fast rule. We know of a number of exceptions that we've characterized at the bench. So you should absolutely NOT delete small ORFs from a draft annotation just because they are small.

https://seaphagesbioinformatics.helpdocsonline.com/article-27

Best,
Welkin
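To make the point about the 120 bp guideline concrete, here is a minimal sketch of how small calls could be flagged for a closer look rather than deleted. The `GeneCall` class, the function name, and all coordinates are hypothetical and invented for illustration; only the 120 bp figure comes from the guide.

```python
from dataclasses import dataclass

# Typical lower size limit from the guide. It is a guideline, not a hard rule.
MIN_TYPICAL_GENE_BP = 120


@dataclass
class GeneCall:
    name: str
    start: int  # 1-based, inclusive (assumed convention)
    stop: int   # 1-based, inclusive (assumed convention)

    @property
    def length_bp(self) -> int:
        return abs(self.stop - self.start) + 1


def calls_needing_review(calls):
    """Return the names of calls shorter than the typical limit.

    Nothing is deleted: short calls are only flagged so they can be
    checked with BLAST, HHpred, and comparison to related genomes.
    """
    return [c.name for c in calls if c.length_bp < MIN_TYPICAL_GENE_BP]


# Made-up coordinates, not taken from any real genome.
draft = [
    GeneCall("6", 3000, 3098),   # 99 bp
    GeneCall("7", 3100, 3400),   # 301 bp
    GeneCall("8", 3350, 3460),   # 111 bp
]
print(calls_needing_review(draft))  # prints ['6', '8']
```

The flagged list is a to-do list for manual inspection; whether each small ORF stays in the annotation still depends on the evidence.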


Posted in: Gene or not a Gene → Cluster EG-Annotation guiding principles

Link to this post \| posted 13 Apr, 2018 13:19
welkin	Hi Joe,
For BLAST: the gene accession number is available when you BLAST on NCBI. The database you are BLASTing against is either NCBI or phagesdb. We have worked hard to sync these two databases with respect to our own data; however, in the phagesdb database we have altered the annotation for some phages that we did not isolate or sequence. For phagesdb, there won't be a gene accession number.

In HHpred, you are doing your alignment against four databases at a time. They do not all have equivalently reliable information. So if your function comes from a crystal structure, you'd write "PDB". If it is a Pfam entry, you'd write "pfam". Etc.

Regarding the lines of evidence: we are asking you to investigate all three for every gene to make sure that you don't find conflicting answers. You may find that a scaffolding protein doesn't have any sequence similarity to anything via BLAST, and no entries via HHpred, but it is still the only small protein between the protease and the capsid protein. In that case, it is fine that two lines are NKF, and synteny gets you "scaffolding".
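The logic of checking all three lines of evidence can be sketched roughly as follows. This is only an illustration under simple assumptions: function names are compared as exact strings, "NKF" marks a line with no known function, and `summarize_evidence` is a hypothetical helper, not part of any SEA-PHAGES software.

```python
NKF = "NKF"  # no known function


def summarize_evidence(blast: str, hhpred: str, synteny: str):
    """Combine three lines of evidence into (proposed_function, conflict).

    Lines reporting NKF are ignored. If the remaining lines agree, that
    function is proposed. If they disagree, no function is proposed and
    the conflict flag is set so the gene gets a closer look.
    """
    informative = {f for f in (blast, hhpred, synteny) if f != NKF}
    if not informative:
        return NKF, False
    if len(informative) == 1:
        return informative.pop(), False
    # Two or more different functions: the lines of evidence conflict.
    return None, True


# The scaffolding case from the post: only synteny is informative.
print(summarize_evidence(NKF, NKF, "scaffolding"))
# prints ('scaffolding', False)
```

A real comparison would need judgment that exact string matching cannot capture, for example when BLAST and HHpred describe the same function in different words.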


Posted in: Notes and Final Files → Clarification regarding "SIF"

Link to this post \| posted 12 Apr, 2018 17:59
welkin	Cluster Q phages have a natural gap in coding sequences in the right arm, starting around coordinate 51600 in Giles. This is because they have a conserved small RNA, as demonstrated in Dedrick et al. 2013: https://www.ncbi.nlm.nih.gov/pubmed/23560716

Posted in: Cluster Q Annotation Tips → small RNA in right arm

Link to this post \| posted 05 Apr, 2018 12:19
welkin	Hi Arturo,
You are right that in a circularly permuted genome you should pay attention to the gap between gene 1 and its upstream gene (in this case, the last gene). However, we ask you to write down the gap to highlight the space or overlap between genes and to get you to think about whether or not you've chosen the correct start. Since you know where gene 1 starts, it becomes somewhat irrelevant to note the gap. So gene 1 in every genome has always received a pass on the "gap". You can just write "n/a" for gene 1.
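For anyone who wants to see the gap arithmetic spelled out, here is a minimal sketch. It assumes each gene is given as a (left, right) pair of 1-based, inclusive coordinates in genome order; the function name and the example coordinates are invented for illustration.

```python
def gap_notes(genes):
    """Return the gap entry for each gene, as a list of strings.

    genes is a list of (left, right) coordinate pairs, 1-based and
    inclusive, sorted in genome order (an assumption of this sketch).
    A positive number is a gap in bp, zero means the genes abut, and a
    negative number is an overlap. Gene 1 always gets "n/a".
    """
    if not genes:
        return []
    notes = ["n/a"]  # gene 1 gets a pass on the gap
    for (_, prev_right), (next_left, _) in zip(genes, genes[1:]):
        notes.append(str(next_left - prev_right - 1))
    return notes


# Made-up coordinates: gene 2 overlaps gene 1 by 4 bp, gene 3 follows a 9 bp gap.
print(gap_notes([(1, 500), (497, 900), (910, 1200)]))
# prints ['n/a', '-4', '9']
```

The 4 bp overlap in the example is just a number chosen for the demonstration; the point is only that the gap is the count of nucleotides strictly between the two genes.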


Posted in: Choosing Start Sites → How to choose the start of the first gene for a circularly permuted genome
