SEA-PHAGES | All posts created by cdshaffer

Link to this post | posted 13 May, 2025 15:02

Yes. The genome is going to circularize after entry, at which point there is only one copy of the repeat within the genome. This means that annotations of the linear genome will always have the possibility of this kind of quirk. In addition, if you annotated a partial gene at the start of the genome, it would cause all kinds of issues for all the computational checks and make handling the genome much more labor intensive, so the best approach is as you suggest: annotate the copy that is the full intact gene at the end and do not annotate the partial gene at the beginning.

Posted in: Functional Annotation → Help with Annotating Direct Terminal Repeats

Link to this post | posted 04 May, 2025 00:37
cdshaffer	Database 598 was released with 1 replacement phage to reflect a name spelling change. The whole starterator report for this phage has been added to the collection found here, just to have the correct spelling: https://wustl.box.com/v/Actino-phage Phage Released in 598: Schaffner

Posted in: Starterator → Whole phage starterator reports

Link to this post | posted 26 Apr, 2025 18:57
cdshaffer	Database 597 was released with 1 new phage. The whole starterator report for this phage has been added to the collection found here: https://wustl.box.com/v/Actino-phage New Phage Released in 597: KillerQueen

Posted in: Starterator → Whole phage starterator reports

Link to this post | posted 19 Apr, 2025 22:54
cdshaffer	Database 596 was released with 4 new phage. The whole starterator reports for these phage have been added to the collection found here: https://wustl.box.com/v/Actino-phage New Phage Released in 596: Keough, Lilo27, MamaT, and Skelbel

Posted in: Starterator → Whole phage starterator reports

Link to this post | posted 18 Apr, 2025 21:37

cdshaffer

You posted really good evidence in support of the presence of zinc finger domains, which are often part of DNA binding domains. However, in the Mol Microbiol. 2017 Aug;105(4) paper they note that the Fin protein actually has a protein:protein interaction with RNA polymerase. I only did a quick scan of the figures, but it appears they did nuclear magnetic resonance chemical shift analysis of Fin in the presence of B' to look for evidence of which amino acids are involved in this protein:protein interaction. See fig 5C. There it looks like the residues that are the most perturbed, and therefore the most likely to be critical for binding, are Gly 15 and Glu 45. So the question is: are those residues conserved? If so, then you might have an argument for an annotation, but just having a zinc finger motif means the only annotation justified is "zinc finger domain", and we usually do not annotate just a domain. I think this is especially valid in this case, as most people, including myself, think of zinc finger domains as DNA binding domains, but this paper says that Fin binds RNA polymerase, not DNA.
So I would stick with NKF.

Posted in: Request a new function on the SEA-PHAGES official list → Fin anti-sigmaF factor

Link to this post | posted 14 Apr, 2025 17:28

cdshaffer

As a follow-up: if you want to examine the predicted structures in 2D, like Aragorn shows them, you can use this site:

https://rnacentral.org/r2dt

Take any result from the tRNAscan struct file like this:

Seq: TCAGGTGcGGGGAAGATGGTAATCCGTTGGTTTTGGAAACCTAAGAcACCCGGTTCGATTCCGGGgCACCTGAC
Str: >>>>>>>..>>>..........<<<.>.>>>.......<<<.<.....>>>>>.......<<<<<.<<<<<<<.

Get rid of the "Seq:" and "Str:" labels, replace every > with ( and every < with ), and then add a header line that starts with > followed by a description:

>watermoore_tRNA4
TCAGGTGcGGGGAAGATGGTAATCCGTTGGTTTTGGAAACCTAAGAcACCCGGTTCGATTCCGGGgCACCTGAC
(((((((..(((..........))).(.(((.......))).).....(((((.......))))).))))))).

Paste that into the web page and hit run to get a nice colorful 2D rendering, see attached.
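
If you have more than a couple of tRNAs, the hand edits above can be scripted. Here is a minimal Python sketch, assuming you copy one "Seq:" line and its matching "Str:" line out of the struct file yourself. The function name `struct_to_dot_bracket` is made up for this example and is not part of tRNAscan-SE or R2DT.

```python
def struct_to_dot_bracket(seq_line: str, str_line: str, name: str) -> str:
    """Turn one Seq/Str pair from a tRNAscan-SE struct file into a
    three-line record: >header, sequence, dot-bracket structure."""
    # Drop the "Seq:" / "Str:" labels and any surrounding whitespace.
    seq = seq_line.split(":", 1)[1].strip()
    struct = str_line.split(":", 1)[1].strip()
    # Some editors and web pages turn "..." into one ellipsis character;
    # turn it back into three plain dots so the lengths line up.
    struct = struct.replace("\u2026", "...")
    # tRNAscan-SE marks paired bases with > and <; dot-bracket uses ( and ).
    struct = struct.translate(str.maketrans("><", "()"))
    if len(seq) != len(struct):
        raise ValueError("sequence and structure differ in length")
    return f">{name}\n{seq}\n{struct}"


record = struct_to_dot_bracket(
    "Seq: TCAGGTGcGGGGAAGATGGTAATCCGTTGGTTTTGGAAACCTAAGAcACCCGGTTCGATTCCGGGgCACCTGAC",
    "Str: >>>>>>>..>>>..........<<<.>.>>>.......<<<.<.....>>>>>.......<<<<<.<<<<<<<.",
    "watermoore_tRNA4",
)
print(record)
```

The length check is there because a mangled structure line (for example one with ellipsis characters the script did not expect) is the most likely way for the paste to fail.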

Edited 14 Apr, 2025 17:36

Posted in: Bioinformatic Tools and Analyses → tRNAscan-SE

Link to this post | posted 14 Apr, 2025 16:13

cdshaffer

As was discussed in the Friday CAT session, tRNAscan-SE has been down for a few days. The good news is that the program is also available for download and can be run on the command line. With that in mind, I just ran tRNAscan-SE on all 750-odd phages that are marked as draft. If your phage is missing, post a follow-up message to this thread. You can get your results at the following link:

http://phages.wustl.edu/trnascan/

Your results will be a zip archive with 3 files. These files are named starting with the Phage name and include "info", "struct", and "table" as part of the file name.

The "info" file was created to confirm that the phage was analyzed and has the exact details on run parameters and version (you will need the version number if you intend to publish). For the record: tRNAscan-SE was run with parameters "-X 0 -d -B -I -D -q" and the exact version was 2.0.12.

The "struct" file contains the predicted structure of the tRNA as printed by tRNAscan-SE. The "table" file contains the results in a table format very similar to the table results you get from the web page.

If your genome has NO predicted tRNAs, the "struct" and "table" files will be there but will be empty.

Just be aware that the command line version does not produce those nice graphical structure predictions like Aragorn does. The structure results look something like the text below and must be manually parsed. For this, the struct files are best viewed using a monospace font like Consolas or Courier (not available in forum posts):


Seq: TCAGGTGcGGGGAAGATGGTAATCCGTTGGTTTTGGAAACCTAAGAcACCCGGTTCGATTCCGGGgCACCTGAC
Str: >>>>>>>..>>>..........<<<.>.>>>.......<<<.<.....>>>>>.......<<<<<.<<<<<<<.
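
If matching up the > and < symbols by eye gets tedious, here is a minimal Python sketch that lists which positions pair with which. It assumes the structure line uses only >, < and dots as in the example above, and that pairs nest the way brackets do. The function name `paired_positions` is made up for this example and is not part of tRNAscan-SE.

```python
def paired_positions(struct: str) -> list[tuple[int, int]]:
    """Return 1-based (i, j) base pairs from a tRNAscan-SE style
    structure string, where > opens a pair and < closes one."""
    # Undo any "..." that was auto-converted to a single ellipsis character.
    struct = struct.replace("\u2026", "...")
    open_positions = []  # stack of positions whose partner is not yet seen
    pairs = []
    for pos, symbol in enumerate(struct, start=1):
        if symbol == ">":
            open_positions.append(pos)
        elif symbol == "<":
            if not open_positions:
                raise ValueError(f"unmatched < at position {pos}")
            # The most recently opened base pairs with this closing base.
            pairs.append((open_positions.pop(), pos))
    if open_positions:
        raise ValueError("unmatched > in structure")
    return sorted(pairs)


example = ">>>>>>>..>>>..........<<<.>.>>>.......<<<.<.....>>>>>.......<<<<<.<<<<<<<."
pairs = paired_positions(example)
for i, j in pairs:
    print(i, j)
```

Each printed line is one base pair, so the stems show up as runs where the first number goes up while the second goes down.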

Edited 14 Apr, 2025 17:11

Posted in: Bioinformatic Tools and Analyses → tRNAscan-SE

Link to this post | posted 29 Mar, 2025 02:09
cdshaffer	Database 593 was released with 5 new phage. The whole starterator reports for these phage have been added to the collection found here: https://wustl.box.com/v/Actino-phage New Phage Released in 593: FatCactus, Liberone, Neuvillette, RoseMarie, and Rossetti

Posted in: Starterator → Whole phage starterator reports

Link to this post | posted 22 Mar, 2025 00:56
cdshaffer	Database 592 was released with one new phage. The whole starterator report for this phage has been added to the collection found here: https://wustl.box.com/v/Actino-phage New Phage Released in 592: WileyE

Posted in: Starterator → Whole phage starterator reports

Link to this post | posted 14 Mar, 2025 16:56

cdshaffer

Here is a simple example with super simple sequences:

query = AAA

subject = CCCAAATTTGGG

When blast does the alignment with appropriate settings (blast would never show this by default as it is too short, but you can force it), you would in theory get this result:


1 AAA 3
  |||
4 AAA 6

The % aligned is 100% because the entire query is found in the alignment.
The % coverage is 25% since only 3 of the 12 bases in the subject are in the alignment.
The % identity is 100% because 100% of the bases in the alignment match identically.
For DNA there is no % similar (the % similar is only used for amino acid alignments), but for amino acids you would just count the alignment columns that are either identical or similar and divide by the length of the alignment.
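
To make the arithmetic concrete, here is a minimal Python sketch that computes the three numbers for the toy example, using the definitions given in this post. The function name `alignment_stats` and the dictionary keys are made up for this example, and a particular program may label or compute these values somewhat differently.

```python
def alignment_stats(query_len: int, subject_len: int,
                    aligned_query: str, aligned_subject: str) -> dict:
    """Compute % aligned, % coverage and % identity from the two rows
    of a pairwise alignment (gaps written as '-')."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("alignment rows must be the same length")
    columns = len(aligned_query)
    # Count the real bases (not gaps) from each sequence in the alignment.
    query_bases = sum(1 for c in aligned_query if c != "-")
    subject_bases = sum(1 for c in aligned_subject if c != "-")
    # A column is identical when both rows show the same base.
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject)
                    if q == s and q != "-")
    return {
        "pct_aligned": 100 * query_bases / query_len,       # share of the query
        "pct_coverage": 100 * subject_bases / subject_len,  # share of the subject
        "pct_identity": 100 * identical / columns,          # share of columns
    }


stats = alignment_stats(len("AAA"), len("CCCAAATTTGGG"), "AAA", "AAA")
print(stats)
```

For the toy example this gives 100% aligned, 25% coverage and 100% identity, matching the numbers worked out by hand above.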

CDD is a database of protein domains (i.e. small parts of proteins seen widely); think of things like a zinc finger or an ATP binding domain.
Thus, for interpretation of CDD hits you care most about the % coverage and % similar; the % aligned is mostly irrelevant.

Posted in: Bioinformatic Tools and Analyses → % Identity vs % Aligned vs % Coverage
