
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 27 Apr, 2016 16:52
Wow! There were three genes in Kerberos that crashed Starterator. These are all issues with protein names, a known incompatibility between Starterator and the Phamerator database that needs work. Hopefully this will get addressed this summer.

In the meantime, I did get a Starterator report of all the genes that did not crash (i.e., the report is missing the three genes that crash, but the rest are there). I suggest that in your notes you just put NA for the Starterator comment for those three genes.

You can download the complete PDF using this link.
Edited 27 Apr, 2016 19:42
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 19 Apr, 2016 14:05
Good news, my run worked. Link to full report.
Edited 19 Apr, 2016 15:01
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 06 Apr, 2016 15:34
Welcome to Unix! The problem is likely one of two issues. The most common is that you have a folder in the path with a space in its name. This can be any folder in the hierarchy, from the very top all the way down to the edit_dir. If you are using the SEA virtual machine, this folder path will start like this: /home/seastudent… and then proceed with each folder down to edit_dir. If there are spaces in the names of any of those folders, Consed will fail to find the files. This is why folder names so often have underscores (e.g. edit_dir, not "edit dir"). So check all your folder names.

The second most common issue is that the entries in the .fof file are not exactly correct or were made with the wrong text encoding. Check the .fof file, paying particular attention to case. Unix is case sensitive, so File.ab1 is not the same as file.ab1. Finally, if you made the .fof file in a text editor that uses either Mac or DOS encoding, it can be unreadable by Consed. Consed requires Unix encoding.
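
As a rough sketch, the checks above can be scripted. The function names here are hypothetical helpers, not part of Consed, and the carriage-return test assumes that "Mac or DOS encoding" refers to the line endings those editors write:

```shell
#!/bin/sh
# Hypothetical helpers for checking a Consed project (not part of Consed).

# path_has_space DIR
# Succeeds if the absolute path of DIR contains a space anywhere in it.
path_has_space() {
    abs=$(cd "$1" && pwd) || return 2
    case "$abs" in
        *" "*) return 0 ;;
        *) return 1 ;;
    esac
}

# has_carriage_returns FILE
# Succeeds if FILE contains carriage returns, which is assumed here to
# indicate DOS or classic Mac line endings rather than Unix ones.
has_carriage_returns() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# missing_reads FOF READS_DIR
# Prints every entry of FOF that does not name a file in READS_DIR.
# On a case-sensitive file system (such as the Linux virtual machine),
# File.ab1 and file.ab1 are treated as different names.
missing_reads() {
    fof=$1
    reads_dir=$2
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue
        [ -f "$reads_dir/$name" ] || printf '%s\n' "$name"
    done < "$fof"
}
```

For example, after loading these functions, `path_has_space edit_dir && echo "space in path"` flags the first issue, and `missing_reads edit_dir/reads.fof chromat_dir` lists entries with no matching read file.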

Here is the protocol I use with my students:
1. create a new folder and put all the sequences files in that folder
2. cd to that folder in terminal
3. create the .fof file with the following command in terminal:

ls *.ab1 > reads.fof

4. you will now see another file in the folder with the reads called "reads.fof"

This reads.fof is a Unix-encoded text file and is readable by Consed. Letting the computer build the list also avoids typos. You should still open reads.fof in a text editor and double-check that the list of files looks good.

Now proceed with step two in Dan's protocol and move the .ab1 files into the chromat_dir and the reads.fof file into the edit_dir.
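
The steps above could be sketched as two small shell functions. The function names are hypothetical, and the sketch assumes the project folder already contains chromat_dir and edit_dir:

```shell
#!/bin/sh
# Hypothetical helpers sketching the protocol above (not part of Consed).

# make_fof READS_DIR
# Writes reads.fof inside READS_DIR, listing every .ab1 file found there.
make_fof() {
    (
        cd "$1" || exit 1
        ls *.ab1 > reads.fof
    )
}

# stage_reads READS_DIR PROJECT_DIR
# Moves the .ab1 files into PROJECT_DIR/chromat_dir and reads.fof into
# PROJECT_DIR/edit_dir. Assumes both target folders already exist.
stage_reads() {
    reads_dir=$1
    project_dir=$2
    mv "$reads_dir"/*.ab1 "$project_dir/chromat_dir/" &&
    mv "$reads_dir/reads.fof" "$project_dir/edit_dir/"
}
```

Usage would look like `make_fof new_reads` followed by `stage_reads new_reads my_phage_project`, where both folder names are placeholders. Check reads.fof by eye between the two calls.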

The above covers about 90% of the 512 errors. If neither fix helps, post a follow-up.

Posted in: Consed > Adding Sanger Reads
| posted 05 Apr, 2016 18:12
OK, thanks for that phage; it revealed an error in my code for finding these end-spanning genes. After more improvements to the code, I finally got Starterator to complete a run.

You can get the report here.


The end-spanning gene is not in the report, so you will just want to put NA for its Starterator notes.
Posted in: Starterator > phage that crash starterator
| posted 05 Apr, 2016 17:14
According to the Phamerator map, the wrap-around gene is pham 9781. I cannot find any proteins in pham 9872, so I am going to guess you had a typo when you put 9782. If that is not the case, let me know and I will chase down this discrepancy (it could be an issue with a database and be causing the crash).

Proceeding with the analysis based on the typo assumption, I can confirm that Tonenili crashes for me on the wrap-around gene. I am going to run the version of the program that should skip that gene. These runs take a while; I will post again when I have results.
Edited 05 Apr, 2016 17:31
Posted in: Starterator > phage that crash starterator
| posted 04 Apr, 2016 21:49
Tonenili is a C1 phage, so I bet there is a wrap-around gene that is causing the crash. I have a version of Starterator that tries to recognize this situation and just skips the analysis for that gene. I am running Tonenili on that version now to see if we can avoid the crash. I will post an update one way or another as soon as I can.
Edited 05 Apr, 2016 17:01
Posted in: Starterator > phage that crash starterator
| posted 16 Mar, 2016 20:31
My own feeling is that coding potential is a good positive signal but a lousy negative signal. That is, seeing coding potential is a good sign for the presence of a gene, but no coding potential is not good evidence for the absence of a gene. Coding potential is mostly about matching the nucleotide biases of the genes in the training set (be it self-training on the phage itself or training on a nearby host). It's always possible that a gene has evolved for some reason to specifically NOT match a particular bias: for example, for expression in a different host, or for slow translation rates through the use of rare codons.

In addition, I was trained that when in doubt it is always best to over-annotate, not under-annotate. The idea was that it is better to have a false positive protein in a database than a false negative protein missing from the database. It's typically easier to find and reject an error than it is to find something that has been erroneously left out.

Having said that, I will mention that the first point is my own opinion, and I would not be surprised if other reasonable annotators disagree. It's really about data interpretation without guidance from any experiments, so it's all about opinion. As for the latter point, I have found that policy to be much more of a consensus in the community of human annotators that trained me and much less so in the community of people I have met who work with prokaryotes. So a reasonable counterargument is that it is better to match the standards of the community, and on that I would defer to others to voice an opinion.
Posted in: Gene or not a Gene > Cluster B gene with no coding potential
| posted 12 Mar, 2016 02:47
TyrionL ran fine on my computer with the normal version of Starterator. This suggests that you may have a corrupted database or some other issue with your copy of the virtual machine. If you need to run a bunch more phage through Starterator, you may want to re-download the virtual disk and reinstall.

If you just have a small number of other phage to do, post them here and I am happy to run them. The only real issue is the slow turnaround time, since I can get busy and not check the forum for days.

You can download the full report for TyrionL from this link.
Posted in: Starterator > phage that crash starterator
| posted 07 Mar, 2016 16:47
Just an FYI: you can create the GenBank files you need by using the DNA Master "Submit to GenBank" tool, which, not surprisingly, is found in the Tools menu.
Posted in: Phamerator > PhamDB: Make your own Phamerator databases
| posted 04 Mar, 2016 18:02
The only graphical way to do this is the recently published PhamDB (see my post in the Starterator forum). If you want to go the traditional route, updating the database requires the Python scripts posted by Charles Bowman at his Bitbucket repository for k_phamerate. K_phamerate uses a set of programs for database creation that are incompatible with the version of Ubuntu running on the SEA-PHAGES virtual machine, so you need a different machine to manage databases. I have a virtual machine with all the k_phamerate code installed that I can get to you if you want. That machine is quite a bit larger, at almost 12 GB, so it is not a trivial undertaking.
Edited 04 Mar, 2016 18:03
Posted in: Phamerator > Analyze our near finished sequence in Phamerator???