The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by cdshaffer

| posted 10 Feb, 2016 06:48
OK,
I ran Starterator on Geralt_Draft, gene 13. On my version of Starterator, with my code updates and the most recent version of the database, it appears to have run OK. Your results may differ from mine, since I am using updated code and a more recent database.

Here is the starterator output for Geralt_Draft, Gene # 13.

This is an unusual pham in that both Geralt 13 and Geralt 14 are in the phamily. I suspect that in some phages the two proteins are expressed as a single polypeptide.

In this version Geralt 13 is now track 70 and Geralt 14 is track 154. There is a minor bug with track numbering: the tracks on each page are numbered 1 to 50, so you have to do a little math to find that track 70 is track 20 on the second page. Track 154 is the 4th track on the 4th page.
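The "little math" is just integer division. A minimal Python sketch, assuming 50 tracks per page as in this report (the function name is made up for illustration and is not part of Starterator):

```python
TRACKS_PER_PAGE = 50  # assumption: matches the report layout described above

def locate_track(track, per_page=TRACKS_PER_PAGE):
    """Return (page, position on that page), both counted from 1."""
    page, offset = divmod(track - 1, per_page)
    return page + 1, offset + 1

for track in (70, 154, 155):
    page, position = locate_track(track)
    print(f"track {track}: page {page}, track {position} on that page")
```

Subtracting 1 before dividing keeps track 50 on page 1 instead of pushing it to page 2.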

This looks to me like a case where the automated analysis does not work well because this is a very diverse group of proteins, so I would just say that Starterator is Not Informative. That said, start 7 @ 8532 is the best supported for Geralt 13, and start 61 @ 9035 is the best supported for Geralt 14.

There are a number of blank tracks after the last track, which is track 155 (i.e. the 5th track on page 4). This is a minor bug where empty "tracks" are written to fill the page.
Edited 10 Feb, 2016 16:17
Posted in: Starterator > Empty Track
| posted 10 Feb, 2016 06:33
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 09 Feb, 2016 03:00
Yes.
Open Phamerator.
Select Preferences from the Edit menu, then click the "Force database update" button.
I always start these updates first thing in class, before I start any lecturing, to give them the maximum time. I also have at most 5 student computers updating at any one time so the wireless network does not get saturated.
Posted in: Phamerator > Force A Database Update? How?
| posted 07 Feb, 2016 01:43
As for the total failure, it is hard to diagnose without specifics. If it is failing on everything, then your assumption that it cannot connect to the database is probably correct. The first thing to test is whether you can connect to the Actino_draft database with Phamerator. That will help separate database problems from Starterator problems. Open Phamerator and look in the preferences to make sure Actino_draft is the selected database, then see if Phamerator can find recently added phages and make phage maps. Post the results of that test.
Posted in: Starterator > phage that crash starterator
| posted 07 Feb, 2016 01:36
I ran all 4 of the phages above on my system, where I have updated my own local copy of Starterator to fix a few bugs that were posted last fall. Results for the 4 phages:

Picard ran fine, so it likely had an issue that I have already corrected. You can get the full report from this link.

bubbles123 crashed for me on pham 68 of 104, pham number 5447. This pham contains gene 107, the rightmost gene, which is on the negative strand. The gene starts 2 bases in from the end of the phage genome; this is a corner case that needs fixing. The exact error was reported at line 220 of phams.py, in find_most_common_start: ValueError: 0 is not in list. This is a new bug I have not seen before from Starterator. Fixing and double-checking the code will take time, so for now I have just added temporary code to my copy of Starterator to skip that gene so I could create a full report of all the other genes. That full report is available here.
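For anyone curious what that message means: Python raises exactly this ValueError when list.index() is asked for a value that is not in the list. The sketch below is not the actual Starterator code; the helper name and the numbers are made up to show the failure and one defensive way to skip a gene instead of crashing.

```python
def index_or_none(values, target):
    """Return the position of target in values, or None if it is absent."""
    try:
        return values.index(target)
    except ValueError:
        # list.index raises ValueError when the target is not in the list
        return None

candidate_starts = [3, 17, 42]  # made-up offsets; note that 0 is missing

try:
    candidate_starts.index(0)
except ValueError as err:
    print("would have crashed with:", err)

print(index_or_none(candidate_starts, 0))
print(index_or_none(candidate_starts, 17))
```

Returning None lets the caller decide to skip the gene and carry on with the rest of the report, which is roughly what my temporary workaround does.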

Roosevelt crashed for me on pham 31 of 89, pham number 8871. Pham 8871 corresponds to gene 35. The error was reported at line 98 of pham.py, in add_aligment: KeyError "TWAMP_Draft_33". This might be a naming error: the geneid in the pham table is "SEA_TWAMP_33", not "TWAMP_Draft_33". There is a cluster of gene names from several phages that start with "SEA_". I am not sure why, but genes with names like this appear to be mishandled. This is a new bug and, like the one above, fixing it will take time. I have had Starterator run and just skip gene 35. The full report without gene 35 is available here.
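To illustrate the kind of mismatch I mean: a dictionary keyed by one style of gene name raises KeyError when it is asked for the other style. This is only a sketch, not the Starterator code. The normalizing rule (drop any "SEA" or "Draft" piece) is my guess at a workaround and would need checking against real gene names.

```python
def normalize_gene_id(gene_id):
    """Reduce a gene id to PHAGE_NUMBER by dropping 'SEA' and 'Draft' pieces.

    Hypothetical rule for illustration only; phage names that themselves
    contain these words would need more careful handling.
    """
    parts = [p for p in gene_id.split("_") if p.upper() not in ("SEA", "DRAFT")]
    return "_".join(parts).upper()

# Made-up table keyed the way the pham table names the gene.
alignments = {"SEA_TWAMP_33": "alignment data"}

try:
    alignments["TWAMP_Draft_33"]
except KeyError as err:
    print("KeyError:", err)

# Looking up by normalized id finds the same gene under either name.
by_normalized = {normalize_gene_id(k): v for k, v in alignments.items()}
print(by_normalized[normalize_gene_id("TWAMP_Draft_33")])
```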

Mojorita also ran fine, so its crash was likely caused by a bug I have already squashed. You can get the full report from this link.

I have posted bug reports for the bubbles123 and Roosevelt problems to my GitHub issue tracker, where I am keeping a running list of known bugs and possible improvements. I am not sure when I will be able to get to these for permanent fixes. Thanks for all the crash reports; the program cannot get better without reports like these.
Edited 07 Feb, 2016 21:03
Posted in: Starterator > phage that crash starterator
| posted 06 Feb, 2016 22:27
When I look at ShiaLaBeouf in Phamerator I see that it is labeled "ShiaLaBeouf_draft" and that there are 231 genes. The "_draft" means that the phage was run through DNA Master auto-annotation and those auto-annotated genes were incorporated into the database. That database is used by both Starterator and Phamerator, so I always look in Phamerator first when debugging Starterator.

I am not sure why your DNA Master has a different number. It could be something as simple as the DNA Master total including the tRNA genes (the Phamerator database only counts protein-coding genes), or someone having added genes to the DNA Master file. Alternatively, it could be something more complicated, such as a difference in settings or default configuration between your copy of DNA Master and the copy that was run to create the auto-annotation that ended up in the database. Another possibility is that a glitch caused an error in the database.

Starterator was designed to deal with this situation (i.e. you want to analyze a gene that is not in the phamerator database) by allowing you to enter in coordinates that define a gene (it is the routine listed in the start window as "One unphamerated gene" ). You are supposed to be able to enter the relevant data, phage, phage sequence, gene coordinates and strand and get a result, but I have not had good luck with that routine. It is certainly something that needs work under the hood with the code.

Anyway, if you still want to try to track down this discrepancy, the first step would be a careful comparison of the gene list in DNA Master with the one in the Phamerator database. I extracted the gene list from the Phamerator database to help with the comparison. You can get the file from this link.
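If both gene lists can be written out as simple (start, stop, strand) records, the comparison can be done with sets. A rough Python sketch; the coordinates below are invented, and how you get the lists out of DNA Master and Phamerator is left open:

```python
def compare_gene_lists(dna_master, phamerator):
    """Return the genes found only in each list; genes are (start, stop, strand)."""
    dm, ph = set(dna_master), set(phamerator)
    return sorted(dm - ph), sorted(ph - dm)

# Invented example coordinates, not real ShiaLaBeouf genes.
dna_master_genes = [(100, 400, "+"), (450, 900, "+"), (950, 1020, "-")]
phamerator_genes = [(100, 400, "+"), (450, 900, "+")]

only_dm, only_ph = compare_gene_lists(dna_master_genes, phamerator_genes)
print("only in DNA Master:", only_dm)
print("only in Phamerator:", only_ph)
```

Genes that appear in only one list are the ones to look at by hand, for example tRNAs or genes someone added after the auto-annotation.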
Posted in: Starterator > Empty Track
| posted 05 Feb, 2016 20:11
The easy way (which I use in class when time is an issue): I tell students to open a browser, connect to email, and send the files to themselves as attachments. Bonus feature: automatic backup of files.

If you need to move many files back and forth:
1. Make sure you are using a recent version of VirtualBox and have installed the Guest Additions in the Ubuntu machine.
2. With the machine off, go to Settings -> Shared Folders.
3. Click the tiny folder with the plus sign to add a shared folder.
4. In the folder path entry, click the arrowhead and select "Other".
5. Select the Mac Desktop folder to share between guest and host.
6. Select the "Make permanent" setting so you only have to do this once.
7. The shared folder will be in the /media folder in the guest, which is at the top level of the file system (i.e. two folders up from your home folder). The name of the folder inside the guest machine will be something like sf_Desktop.
8. Files placed in there will appear on the Desktop of your Mac.
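Once the shared folder is set up, anything copied into it from inside the guest shows up on the Mac. Here is a small Python sketch of that copy, assuming the folder is mounted at /media/sf_Desktop (check the actual name on your machine). The report file name is made up. On Ubuntu guests the user usually also has to belong to the vboxsf group to write there.

```python
import shutil
from pathlib import Path

SHARED = Path("/media/sf_Desktop")  # assumption: see step 7 above

def copy_to_shared(source, shared_dir=SHARED):
    """Copy one file into the shared folder and return the new path."""
    source = Path(source)
    shared_dir = Path(shared_dir)
    if not shared_dir.is_dir():
        raise FileNotFoundError(f"shared folder not found: {shared_dir}")
    return Path(shutil.copy2(source, shared_dir / source.name))

if __name__ == "__main__":
    report = Path.home() / "Geralt_report.pdf"  # hypothetical file name
    if report.exists() and SHARED.is_dir():
        print("copied to", copy_to_shared(report))
```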

You can also set up drag'n'drop, which is supposed to let you move files back and forth, but I have had less success with that technique.
Posted in: SEA-PHAGES Virtual Machine > Shared folders
| posted 05 Feb, 2016 18:34
I will look into this when I can. Unfortunately my computer's motherboard died last night and I am working from a loaner until it is repaired, so until then my Starterator virtual machine is unavailable. This sounds like a couple of other cases I have run that showed little or no pink. In both of those cases it was not a bug so much as a weird corner case that Starterator was not built to handle.
Posted in: Starterator > Empty Track
| posted 05 Feb, 2016 18:04
I have been trying to keep a list of bugs and possible improvements to Starterator (see the issues on my GitHub cdshaffer/starterator repo if you want to see the specific list). I saw a very similar result in the output for phage MitKao, pham 1510, and I was able to do a little sniffing around in that case. The problem was a single unusual gene with a very long ORF upstream of the start codon, which messed up the calculation of the scaling to use for the X axis. Another pham had a different issue but a similar output: there was just too much protein sequence divergence among the pham members, so there was no pink simply because there was so little conservation among all members.

So in both cases I investigated, the cause was not simply the size of the pham but unusual properties of the specific pham. This is very typical in bioinformatics. The computer programs will take care of 95-99% of cases, but since biology is not math there are always unusual corner cases that just don't work well. In the MitKao case, one of the assumptions made by Starterator is that there will be an in-frame stop codon not too far upstream of the annotated start codon. In rare cases this assumption is incorrect and the output fails to give meaningful results.
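To make that assumption concrete: walking upstream from the annotated start in steps of three, you normally hit an in-frame stop codon fairly soon, and that bounds the ORF. The Python sketch below is my own illustration, not Starterator's code. It handles only a forward-strand gene and uses a 0-based start index.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_inframe_stop(sequence, start):
    """Return the 0-based index of the nearest in-frame stop codon upstream
    of `start`, or None if the frame stays open to the end of the sequence."""
    sequence = sequence.upper()
    pos = start - 3
    while pos >= 0:
        if sequence[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return None

# Toy sequences with the annotated ATG start at index 9.
print(upstream_inframe_stop("TAAGCTGCAATGGCC", 9))  # stop codon close by
print(upstream_inframe_stop("CCCGCTGCAATGGCC", 9))  # no stop in frame
```

When the function has to walk a very long way, or returns None, the ORF upstream of the start is unusually long, which is the situation that threw off the X axis scaling in the MitKao pham.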

I always use results like this as a teaching moment. This is a great example that no computer program is 100% successful, and it is why it is still worthwhile doing manual annotation. So in this case, the "experiment" (i.e. the automated analysis of a multiple sequence alignment of all genes in a pham using ClustalW) failed to give a result. I would explain to the student that we now have a decision to make: try to do the analysis manually or just move on. This brings up the opportunity to discuss cost/benefit analysis and how it relates to research: there is never enough time to do everything, and a good researcher makes good choices about where to invest time and money to get the best outcome they can afford. I would then probably say that in this case the manual analysis is not worth the time and effort, and just put in the notes that Starterator was NI (not informative), as suggested in the Annotation Guide (see page 76).
Posted in: Phamerator > Tutorial on Phamerator and Starterator Use?
| posted 25 Jan, 2016 18:02
I have not tried Windows 10 but the checksums for the virtual disk images are here.
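Checking a download against a published checksum takes only a few lines of Python. This sketch assumes the published value is an MD5 or SHA-256 hex digest (use whichever one the checksum page lists). The function names are my own.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to save memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

Call matches() with the path to the downloaded disk image, the published value, and the algorithm name. A result of False usually means the download was corrupted or incomplete.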
Posted in: DNA Master > DNA Master and Windows 10