
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 11 Sep, 2023 22:48
Unfortunately Newbler is no longer being developed.

For me, slow assembly times mostly happen when I don't have enough memory; that can easily slow things down by orders of magnitude. So start by trying to increase the memory available to the VM if you can. I give my machines 4 or 6 GB if I can. Ask Google, or post another query here if you need help with that.

If you have given the VM the maximum memory you can and it is still really slow, then try fewer reads. Most of my 100X coverage genomes assemble just fine, so you could easily reduce your read count by half and still very likely get a good assembly. If that fails, try 50X (i.e. reduce the read count by a factor of 4). It is really just trial and error: you need enough data to get a good assembly, but not so much data that you overload the memory available on your machine. This is why, when I last assembled a whole genome of a Drosophila species (~180 Mb haploid genome), I used a campus computer with 128 GB of available memory; that made the assembly take 4-6 hours instead of the 4-6 months it would have taken on my laptop.
Posted in: Newbler / Getting Started with Phage Assembly
| posted 07 Sep, 2023 17:12
So your example large contig has ~35X coverage (6,441 reads × 150 bp per read ÷ ~28,000 bp). 35X is a bit low for Illumina; the recommended minimum is 50X, but for these tiny genomes, since sequencing is so cheap, I typically go for 200-300X.

For a 70,000 bp genome and 150 bp reads I would probably use 100,000 to 150,000 reads. So adjust your "head" command to extract more reads and try another assembly. I would just work with the R1 reads; they tend to be better quality than the R2 reads. R2 reads are really useful for mapping reads to large complex genomes, but for de novo assembly I stick with the R1 reads. Since each sequence takes up 4 lines, you want somewhere between 400,000 and 600,000 lines of your fastq file to get the 100,000 to 150,000 reads. So instead of the 20,000 in your example command, use 500,000. That would give an estimated coverage of ~268X for a 70 kb genome.
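If it helps, the arithmetic above is easy to script. This is just a sketch of the calculations in this thread (the function names are mine; swap in your own genome size and read length):

```python
# Sketch of the coverage arithmetic from this thread. Coverage = total
# sequenced bases / genome size; a fastq record is 4 lines per read.

def coverage(n_reads, read_len, genome_size):
    """Estimated fold coverage for n_reads reads of read_len bp each."""
    return n_reads * read_len / genome_size

def fastq_lines_for_coverage(target_cov, read_len, genome_size):
    """Number of fastq lines to extract (e.g. with head) for a target coverage."""
    n_reads = target_cov * genome_size / read_len
    return 4 * round(n_reads)

# The large contig discussed above: 6,441 reads of 150 bp over ~28,000 bp
print(round(coverage(6441, 150, 28000)))          # 35

# 500,000 fastq lines = 125,000 reads; 150 bp reads on a 70,000 bp genome
print(round(coverage(500000 // 4, 150, 70000)))   # 268
```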
Posted in: Newbler / Getting Started with Phage Assembly
| posted 05 Sep, 2023 18:50
How to handle R1 and R2 reads depends on your exact sequencer, the quality of the reads, the library prep method, and the read length. There are enough variables here that you will just need to do trial and error and see what works for you. Lately I have been using 150 bp reads, and the R1 reads have been of such high quality that I get good assemblies by just using the correct number of R1 reads. I would suggest you try this simple approach first and see if you get a good assembly. If you do, great; if not, then more work prepping the reads prior to assembly is worth trying (see the next paragraph). The other issue is using the correct number of reads; see farther below.

On older machines, whose reads had higher error rates, I would run the program "pear" to merge the R1 and R2 reads into a single longer, higher quality "read" that would improve assembly, but this requires a library prep protocol with shorter 300-500 bp DNA fragments and longer reads. I used to do this when running 250-300 bp reads. I used pear because it was easy to install on my old Intel Mac. Not sure what I would use now that I am on a newer Mac, or if I had a PC.

As for the issue of a small number of large contigs and hundreds of smaller ones, this is exactly the result you will get if you use too many reads. See my comments above on error and why too many reads can be a "Bad Thing". I would recommend you try the "head …" command, where you extract a smaller number of reads and try assembling those. That is not really a step you can skip if you want a nice clean assembly. If you did reduce the number of reads and you are still getting this result, you may have either contamination or too few reads. The next paragraph covers how to get evidence on this question.

Have you looked at your contigs? Do they look like phage genomes by BLAST, or like contaminants? For the large contigs, how many reads are in each contig and how long is it? More specific details here would help. Note that the newbler assembler creates a file called "454LargeContigs.fna" containing the sequences of all the "large" contigs. You can open this file with a text editor and copy out sections of sequence to use in BLAST searches to see if a contig is likely phage sequence or some other contaminant. If you get phage hits, you can likely estimate the size of your genome to help you pick the correct number of reads.
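If you would rather not eyeball the file, a few lines of Python can list each contig's name and length (this is plain FASTA parsing, a sketch I use as an example; the filename is the standard newbler output):

```python
# Summarize a newbler 454LargeContigs.fna file: collect each contig's
# name and length. Plain FASTA parsing, no external libraries.

def contig_lengths(path):
    """Return {contig_name: length_in_bp} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]   # name = header up to first space
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths

# for name, length in contig_lengths("454LargeContigs.fna").items():
#     print(name, length)
```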

See if all that advice solves your issue; if not, post specific answers to as many of the above questions as you can and we can go from there.
Posted in: Newbler / Getting Started with Phage Assembly
| posted 21 Jul, 2023 19:01
The password discussed above is the password for the MySQL database; it is not the password for running sudo commands. For sudo, you use the password you used to log in. But there is a more important issue.

Are you logged in as SEA student or SEA faculty? The SEA student account does not have permission to run sudo commands, so the seastudent password will fail. If this is the problem, you should get an error message about not being in "sudoers". You have to do all this by logging into the SEA faculty account, as that account has permission to run the sudo command. If you are on the seafaculty account and it is still failing, there is some unusual issue with your setup; in that case, the next step is probably to post the exact steps you are on and the exact error message.

Note: The password discussed above is the password for the MySQL database, not the sudo password. You can use mysql with the password "phage" from the seastudent account; you just cannot run the sudo command with the seastudent login password.
Edited 21 Jul, 2023 19:04
Posted in: Phamerator / Install Guest Additions to VM ---- Without "SEAFaculty Login ability"
| posted 17 Jul, 2023 18:38
I use fastqc to check, but the last few years the data has been so good I would probably not bother if I had to set up a new pipeline.

I use trimmomatic to trim, mostly because it was an easy install using brew (a Mac package manager). But, as Dan said above, the most recent machines are so good at short reads (I am currently using 2x150) that it is probably not necessary. So this will depend on the machine your sequencing center uses.
I then move the data into the old SEA VM and use newbler to assemble and consed to screen quality, determine strand, and find base 1.

I would not recommend you use the whole fastq file unless it contains a tiny number of reads. One of the main jobs of an assembler is to distinguish read errors from valid sequence. If you give the assembler too many reads, it will see the same error multiple times and evaluate that sequence as valid. I typically use 50,000 to 150,000 reads in an assembly (depending on genome size); this gives about 200X coverage for my typical genomes. I use the unix command "head" to get just the number of reads I want from the beginning of the fastq file. Just remember that each read in a fastq file takes up 4 lines. So, as an example, to get the first 50,000 reads (which would be 200,000 lines of data) from my "data.fastq" file and create a new "data_50k.fastq" file, I would execute this on the unix command line:

head -n 200000 data.fastq > data_50k.fastq
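If you are on a system without the unix "head" command, the same subsetting can be done with a few lines of Python (a sketch; the function name is mine):

```python
# Python equivalent of using head to take the first reads of a fastq
# file, remembering that each fastq record is exactly 4 lines.

def take_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads fastq records from src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for _ in range(4 * n_reads):
            line = src.readline()
            if not line:          # the file had fewer reads than requested
                break
            dst.write(line)

# take_reads("data.fastq", "data_50k.fastq", 50000)
```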

Contamination will only be an issue if the fraction of bacterial reads is high; assemblers are totally fine with low levels of contamination. So try an assembly, and if you get a number of smaller contigs with lower-than-expected coverage, try again with more reads. The optimal number of reads will need to be determined empirically; just try to get the number of phage reads to where you have 100-200X coverage of the phage genome in your sequence data. If the bacterial contamination rate is too high you may, as you suggest, need to "fish out" the phage reads from your fastq file before you attempt assembly, but that would require the substantial effort of testing each read, and you likely have several million reads, so you only want to go that route if there is just too much contamination for the assembler to handle.
Posted in: Newbler / Getting Started with Phage Assembly
| posted 28 Jun, 2023 20:06
Not updating for me either. No news since this discussion:
https://seaphages.org/forums/topic/5558/
Posted in: Bioinformatic Tools and Analyses / DNA Master not updating
| posted 16 May, 2023 15:47
I am looking on the server and everything looks good here. I see vordorf gene 5 in this starterator report for pham 80704, as described in Phamerator: http://phages.wustl.edu/starterator/Pham80704Report.pdf, and gene 21 here: http://phages.wustl.edu/starterator/Pham80706Report.pdf

The pham numbers you quote for starterator are out of sync. Those numbers, 78088 and 78092, are from version 510 of the database. You can actually get older starterator reports by using the version number in the URL, so here are the same genes in the older starterator reports for version 510 and the pham numbers you mentioned:

http://phages.wustl.edu/510/Pham78088Report.pdf
http://phages.wustl.edu/510/Pham78092Report.pdf
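The URL pattern is regular enough to build programmatically; here is a small sketch based on the links above (the pattern is just what these URLs show, not a documented API, and the function name is mine):

```python
# Build starterator report URLs following the pattern of the links in
# this post: current reports live under /starterator/, archived ones
# under /<database version>/.

def starterator_url(pham, version=None):
    base = "http://phages.wustl.edu"
    if version is None:
        return f"{base}/starterator/Pham{pham}Report.pdf"
    return f"{base}/{version}/Pham{pham}Report.pdf"

print(starterator_url(78088, version=510))
# http://phages.wustl.edu/510/Pham78088Report.pdf
```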
Posted in: Starterator / Pham not found in Starterator
| posted 12 May, 2023 20:08
I too think we could call gp98 an HNH. I did an HHPRED search with gp98 against the pfam database, since there is a pfam motif with the label "HNH endonuclease" (PF01844). In this case, looking at the alignment, gp98 would be of the HKH type. So given the definition used by that paper from Deb, gp98 should be annotated as "HNH endonuclease".
Others may want a stricter definition of "only those endonucleases that actually have those exact 3 specific amino acids" and might argue that we should call gp98 an HKH endonuclease, or just endonuclease.

There is no right or wrong answer to how a term should be defined, but given Fred's totally valid points and the comments from Deb's paper, I think we should just change the note on the approved terms list to "Has H-N-H within 30 aa span but minor variations allowed, see forum topic 5505" or something similar.
Posted in: Functional Annotation / Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 05 May, 2023 19:33
As for a simple method for students to use:
I just copied the sequences into Word, then used advanced find and replace to make all the H's a red font style and the N's green; that took all of 30 seconds. It then took me less than 5 minutes to screen all the proteins by eye, and I was able to find an HNH pattern in all the sequences except one (gp98). One had HNNH; since any amino acid can be between the H's and the N, I would say that having more than 1 N is OK, but maybe not. If HNNH should be rejected, we need to clarify the simple test that there "Must have H-N-H over a 30 aa span."

Easy enough for students to do.

See attached, with the colors and my underlines for the HNH patterns I found.
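The same screen could also be scripted. Here is a sketch in Python of the rule as I read it (an H, then an N, then an H, with the whole match spanning at most 30 residues; whether HNNH should count is exactly the open question above, and this sketch accepts it):

```python
# Find an H-N-H pattern within a 30 amino acid span: an H, followed by
# an N, followed by an H, where the whole match covers at most max_span
# residues. Returns (start, end) indices of the first match, else None.

def find_hnh(seq, max_span=30):
    seq = seq.upper()
    for i, aa in enumerate(seq):
        if aa != "H":
            continue
        window = seq[i:i + max_span]         # the match must fit in here
        n_pos = window.find("N", 1)          # an N strictly after the first H
        if n_pos == -1:
            continue
        h_pos = window.find("H", n_pos + 1)  # an H after that N
        if h_pos != -1:
            return (i, i + h_pos)
    return None

print(find_hnh("GSHACNLMHKL"))   # (2, 8)
print(find_hnh("HNNH"))          # (0, 3) -- the debatable HNNH case
```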
Posted in: Functional Annotation / Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 12 Apr, 2023 17:18
#32 has a clear and strong HHPRED match to several different HTH pfams, and as Adam mentioned, HHPRED does predict helix-turn-helix-like secondary structure for the amino acids around 19-55. So I would add an annotation of "helix-turn-helix DNA binding domain".

For me, I teach my students that these words we are adding are "annotations", not necessarily "functions". Only rarely is the evidence sufficient to be convinced of the exact biological role of a protein. But that does not mean that we cannot add value by adding annotations that help the readers of our "publication" (i.e. GenBank entries). So adding a "helix-turn-helix DNA binding domain" annotation helps the reader limit the possible roles (and is therefore a good annotation) even if the evidence is not sufficient to generate a better, more informative annotation like "sigma factor" etc.

If a protein has good evidence for an HTH domain, I assume there is a high probability it does indeed bind DNA, but I typically only use the approved term "DNA binding protein" if I find good evidence for a DNA binding motif that is not an HTH, like a zinc finger, leucine zipper, or talons. If I see an HTH domain, I prefer "helix-turn-helix DNA binding domain" over "DNA binding protein": even though it very likely does bind DNA if it has an HTH, I will just cite the evidence (matches an HTH domain) and leave the assumption of activity (actually binds DNA) up to the reader.
Posted in: Annotation / helix-turn-helix binding domain or protein?