The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 22 Oct, 2020 00:15
To me the results of the Starterator report are quite telling. The two choices you point out are labelled start 12 and start 15 in the current Starterator report here. First, the level of conservation for start 12 is much higher than for start 15. In fact, only 2 of 56 phage don't have start 12, and both of those have a start very close in position to start 12. On the other hand, start 15 is only seen in 2/3rds of these genes, and for 7 of the 30 tracks there are no starts anywhere near start 15.

To me it is hard to believe that evolution would continue to keep the bases that code for start 12 in virtually all these genes if start 15 were really the start of translation, so I have a strong preference for start 12.
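
To put rough numbers on that comparison, here is a small shell sketch. It assumes the counts above mean that 54 of 56 genes carry start 12, and it treats "2/3rds" as exactly 2 out of 3. The function name pct is made up for illustration.

```shell
# Hypothetical helper: percentage of genes that carry a given start.
pct() {
    # $1 = genes with the start, $2 = total genes
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

pct 54 56   # start 12, reading "2 of 56 don't have it" as 54 of 56 that do
pct 2 3     # start 15, "2/3rds of these genes"
```

Under those assumptions start 12 shows up in roughly 96% of the genes, versus about 67% for start 15.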

As for coding potential (CP): if you look carefully you can see examples of other regions in the genome where you know the sequence is coding but the CP signal drops to zero. These are regions that are downstream of a strong CP signal but before the stop codon. See the CP for gene 14, where there are easily at least 100 bases with no CP signal. This is why I have a "rule" that a positive CP signal is good evidence there IS a gene, but no CP signal is not quite as good at indicating there IS NOT a gene. Said more formally, the CP algorithm makes more false negative errors than false positive errors. So in this case, where one start implies a CP false positive (the start with the 245 gap) and the other implies a CP false negative, I would say that CP is also slightly more supportive of the big-overlap start.

Taken together, I would annotate this gene to start at 1322. If I were helping a student with this, I would now ask them to go back and double check that gene 2 is real, just because of that super large overlap. But even if gene 2 is real, I would probably still stick with that huge overlap given the strong level of conservation seen in the Starterator report.
Posted in: Cluster A Annotation Tips > VIP2-like toxin/ ADP-ribosyltransferase
| posted 24 Sep, 2020 20:17
The term "AAA-ATPase" refers to a domain found in a wide variety of proteins; it indicates a specific fold pattern that creates an ATPase pocket. So an annotation with that term says you want people to know that the protein very likely cleaves ATP and uses that specific fold structure. The term "terminase" reflects the biological role the protein plays for the phage. To me, neither term is inherently better than the other; it all depends on what one is interested in. I am sure there are many enzymologists out there who are much more interested in the presence of an enzymatic domain, while others are more interested in why the phage would have that gene at all.

My own perspective is to go with the biological role if I can find sufficient evidence in support, and to mention the presence of domains (as they are often good hints to the biological role) only when there is not sufficient evidence to call the role.
Posted in: Functional Annotation > cluster K terminases
| posted 24 Sep, 2020 20:02
For those of you who have the SEA VM with pdm_utils installed that came out in June 2020, it has version 2.0.5 of tRNAscan-SE already installed. It is the command line version, not the web page, but it is pretty straightforward to run.

First copy the genome fasta file into the VM and put the file on the Desktop. Open a terminal and change the working directory to the Desktop with this command:

cd Desktop

Now use the following command (I tried to use the parameter settings that would mimic the settings recommended on the web interface). You will need to replace the items bounded by the <> with your specific values:

tRNAscan-SE -B -I --max -D -o <name of file with results> <name of fasta file>

Here is an example of the command I used recently on phage Jada where the fasta genome file is called Jada.fasta, and I wanted the results file to be called Jada.tRNAscan.txt.

tRNAscan-SE -B -I --max -D -o Jada.tRNAscan.txt Jada.fasta
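
If you have several genomes to scan, the same command can be wrapped in a small loop. This is only a sketch: the function name run_trnascan and the DRY_RUN switch are made up for illustration, and it assumes each genome file name ends in .fasta. The flags are the ones from the command above, with --max typed as two plain hyphens.

```shell
# Hypothetical wrapper: run tRNAscan-SE on each fasta file given.
# With DRY_RUN=1 it only prints the commands, so you can check them first.
run_trnascan() {
    for fasta in "$@"; do
        # Jada.fasta -> Jada.tRNAscan.txt
        out="${fasta%.fasta}.tRNAscan.txt"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "tRNAscan-SE -B -I --max -D -o $out $fasta"
        else
            tRNAscan-SE -B -I --max -D -o "$out" "$fasta"
        fi
    done
}

# Preview the commands for two genome files (Other.fasta is a placeholder):
DRY_RUN=1 run_trnascan Jada.fasta Other.fasta
```

Drop the DRY_RUN=1 to actually run the scans.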

The results will be a text file that you can copy back to your computer. It can be opened with TextEdit on a Mac or WordPad on Windows. Here are the first few lines of the output for phage Jada:

Sequence  tRNA    Bounds          tRNA  Anti   Intron Bounds Inf
Name      tRNA #  Begin   End     Type  Codon  Begin  End    Score   Note
--------  ------  -----   ------  ----  -----  -----  ----   ------  ------
Jada      1       65495   65565   Pro   TGG    0      0      50.1
Jada      2       65734   65808   Pro   TGG    0      0      72.6
Jada      3       66800   66873   Gln   CTG    0      0      54.6
Jada      4       66926   66996   Trp   CCA    0      0      26.0
Jada      5       67273   67347   Ile   GAT    0      0      68.1
Jada      6       67752   67835   Ser   TGA    0      0      42.0
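
If you would rather pull out just the key columns, or skip low-scoring hits, a little awk does the job. This is a minimal sketch, assuming the results file has the whitespace-separated columns shown above (name, tRNA #, begin, end, type, anticodon, intron begin, intron end, score). The function name summarize_trnas and the score cutoff of 40 are arbitrary choices for illustration, not recommended values.

```shell
# Hypothetical helper: list tRNAs at or above a minimum Infernal score.
# Usage: summarize_trnas MIN_SCORE [RESULTS_FILE]   (reads stdin if no file)
summarize_trnas() {
    min="${1:-0}"
    [ "$#" -gt 0 ] && shift
    # Data rows are the ones whose third column (Begin) is a number,
    # which skips the three header lines.
    awk -v min="$min" '$3 ~ /^[0-9]+$/ && NF >= 9 && $9 + 0 >= min + 0 {
        printf "%s\t%s-%s\t%s(%s)\t%s\n", $2, $3, $4, $5, $6, $9
    }' "$@"
}

# Two rows copied from the Jada output above, piped in as a demo:
printf 'Jada\t1\t65495\t65565\tPro\tTGG\t0\t0\t50.1\nJada\t4\t66926\t66996\tTrp\tCCA\t0\t0\t26.0\n' |
    summarize_trnas 40
```

On a real results file you would run something like: summarize_trnas 40 Jada.tRNAscan.txt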
Posted in: tRNAs > tRNAScan-SE Down-ish?
| posted 02 Sep, 2020 18:52
Wow! Talk about a spectacularly unhelpful error message.

There are just a few very basic things to try:
1. Make sure there is enough room on the hard drive for the installation.
2. Make sure any virus protection software is TURNED OFF. VirtualBox does some very deep-level installation that could be blocked.
3. Make sure you are logged into an account with Administrator privileges (to check this, go to System Preferences -> Users & Groups and make sure that the "Admin" label appears under the user name in the left column).
That is all I can come up with off the top of my head. Good luck!
Posted in: SEA-PHAGES Virtual Machine > Virtual box installation error
| posted 04 Aug, 2020 23:42
Oh, I see, great questions. As far as I know, nobody has built a version of the VM that can run both Starterator and pdm_utils. I am not sure it is even possible, though I have never tried, so it could be. The issue is that Starterator requires some pretty old versions of some of the graphical libraries, while pdm_utils requires some of the newer libraries that are only available on the newer versions of Ubuntu. It is possible that there are versions of the graphics libraries that are compatible with both Starterator and the newer versions of Ubuntu, but I have not found them.
Posted in: Starterator > Release of Starterator version 1.2
| posted 04 Aug, 2020 17:01
It is probably working on the old VM because you have not run Phamerator, so you are on an old (pre-update) version of the database. It is important to remember that Starterator does not do all the database checking and management; it relies on the old Phamerator to do that. So if you are using a VM, you should always run Phamerator first to get the most recent database and then run Starterator.
Posted in: Starterator > Release of Starterator version 1.2
| posted 04 Aug, 2020 16:55
These issues are likely due to the changes in the database structure that have been released over the past few months. Sorry, but I have just not had time to get the changes to Starterator in good enough shape to "publish" them as a new release. There are currently two options:
1. If you want a whole phage report, just send me the name of the phage (if it is in phagesdb) or send me a DNA Master file (if it is not in phagesdb). I can easily run these on my "experimental" version of Starterator.
2. Change your VM so it runs the experimental version of Starterator. Briefly, to do this you need to pull the cdshaffer/phambymatch branch from my cdshaffer GitHub repository and change the Phamerator preferences to a different server location, to get the version of the Phamerator database that is compatible with the phambymatch version of Starterator. If you are interested in that, let me know; I can easily provide more detailed command line instructions for getting the experimental version of Starterator and for changing Phamerator.
Posted in: Starterator > Release of Starterator version 1.2
| posted 24 Jul, 2020 22:15
Wow, somehow I missed this message, sorry about that.
Yes, Starterator is still being updated, but it often lags behind the other pages. The issue is that right now the Hatfull lab is working on a large refactoring of the database, and while these changes are ongoing all the web pages are much more likely to get out of sync.

Exactly which version of the database each web page is on varies. I think phagesdb and phamerator are on version 362 while PECAAN and Starterator are on version 364. So if you are using PECAAN things should work OK, but if you are looking for phams based on phagesdb or phamerator.org you need to go to an older version of the Starterator reports. To do that, just replace the "starterator" part of the URL with the database number. So to get pham 8041 from database version 362, go to:

http://phages.wustl.edu/362/Pham8041Report.pdf


And voilà, you have your report. Since most phams stay the same across releases, most links from phagesdb should work, but when they don't, just substitute "362" into the URL. This will work until phagesdb is updated to the current version, at which point all links should work.
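
The substitution is easy to script if you have a lot of links to fix. This is a small sketch, assuming the current report links have the form http://phages.wustl.edu/starterator/Pham8041Report.pdf (inferred from the description above, not checked against the live site). The function name is made up for illustration.

```shell
# Hypothetical helper: swap the "starterator" part of a report URL
# for a database version number.
versioned_report_url() {
    # $1 = current report URL, $2 = database version (e.g. 362)
    printf '%s\n' "$1" | sed "s#/starterator/#/$2/#"
}

versioned_report_url "http://phages.wustl.edu/starterator/Pham8041Report.pdf" 362
```

For the example above this prints the version 362 link for pham 8041.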
Posted in: Starterator > Pham not found in Starterator
| posted 19 Jun, 2020 17:47
I guess the answer depends on exactly what you are looking for in your comparison.

DNA Master has a nice comparison tool if you only want to check a small number of genomes like you mention. Just like Phamerator, it will compare predicted protein sequences and group similar proteins into a single group that it assigns a unique color. The comparison algorithm is not exactly the same as Phamerator's, so you get a "different but similar" kind of grouping, but if all your phage are in the same subcluster it works pretty well. You will need DNA Master files for all the phage.

The DNA Master protocol works really well for gene content analysis and moderately well if you are trying to see whether you have called the same start sites as your comparison phage. For the latter, though, I have now switched to using the whole phage Starterator report, which I think is much better for checking start codon choices. If you want a whole genome Starterator report, please check out this thread: https://seaphages.org/forums/topic/4998/
It is trivially easy for me to set the run up and post the results, so always happy to run the analysis.
Posted in: Phamerator > PhamDB: Make your own Phamerator databases
| posted 11 Jun, 2020 21:25
As the pham number is likely to change, here is the link to Lokk gene 39 to check the current pham number: Lokk_CDS_35

P.S. Remember that gene numbers can change with the annotation and with whether you are counting tRNA genes as well as protein genes, so confirming genes by coordinates is always recommended.
Posted in: Cluster A Annotation Tips > Pham 23651 function assignment