The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by cdshaffer

| posted 10 Feb, 2016 22:18
I did a bunch of debugging last summer and early fall. I was able to fix some of the "off by 1" errors and figure out why some of the phage genes were failing. Starterator is a pretty large and sophisticated program, and I still don't have a good handle on how it does everything; that would take many hours of reading and thinking about the code, which I just don't have time to do right now.

I have been reluctant to push my fixes out to everyone. First, because I don't understand everything, I was worried that my changes would create problems for some small percentage of phage even as they fixed problems for others. Also, not all my solutions are high quality. For example, Starterator was crashing on negative-strand genes that wrap around the end of the sequence. I could not figure out a way to fix this bug without substantially rewriting a huge chunk of code, so instead I added a tiny bit of code so that Starterator just skips over those genes. Not ideal, but better to have a report with all but one gene than no report at all. It's one thing for me to change Starterator for myself; it's something else entirely for me to set that as the policy for every copy of Starterator out there.
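
To illustrate the "skip genes that span the end" idea, here is a minimal Python sketch. The Gene record and its coordinate convention (start is the first base of the start codon, stop the last base of the stop codon, both 1-based) are assumptions for illustration, not Starterator's actual data structures. Under that convention a + strand gene normally has start < stop and a - strand gene start > stop, so coordinates that run the other way indicate a gene crossing the end of the genome.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene record; not Starterator's own class."""
    name: str
    start: int   # 1-based position of the first base of the start codon
    stop: int    # 1-based position of the last base of the stop codon
    strand: str  # "+" or "-"


def spans_genome_end(gene: Gene) -> bool:
    """True if the gene wraps past the end of the sequence back to position 1.

    A forward gene normally runs start < stop and a reverse gene start > stop;
    coordinates running the opposite way can only mean the gene crosses the end.
    """
    if gene.strand == "+":
        return gene.start > gene.stop
    if gene.strand == "-":
        return gene.start < gene.stop
    raise ValueError(f"unknown strand {gene.strand!r}")


def split_spanning_genes(genes: List[Gene]) -> Tuple[List[Gene], List[Gene]]:
    """Return (genes to analyze, genes skipped because they span the end)."""
    kept: List[Gene] = []
    skipped: List[Gene] = []
    for gene in genes:
        (skipped if spans_genome_end(gene) else kept).append(gene)
    return kept, skipped


if __name__ == "__main__":
    genes = [
        Gene("g1", 100, 400, "+"),
        Gene("g2", 900, 601, "-"),
        Gene("g3", 2, 49801, "-"),  # reverse gene wrapping past the end
    ]
    kept, skipped = split_spanning_genes(genes)
    print([g.name for g in kept], [g.name for g in skipped])  # ['g1', 'g2'] ['g3']
```

Skipping trades completeness for a report that actually finishes; handling the wrap properly would mean treating such a gene as two joined segments of the sequence.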

I will say I am feeling more confident that my modifications do not cause more harm than good, now that I have run about half a dozen phage through them with no new errors beyond those that also show up with the standard code base. Maybe we should have a few beta testers each try several phage with my code before we push the changes out to everyone.

All my changes are freely available if anyone wants to download them and try their phage. You can get my version of the code from my GitHub repository (github.com/cdshaffer/starterator). Anyone comfortable using the git command line to pull from a remote repository can easily download and test the code. My most recent branch is called filterSpanningGenes.
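
For anyone who would rather script the download, here is a small Python sketch that shells out to git. The repository URL is assumed from the repository name given above, and the remote name "cdshaffer" is just a label chosen for this example; the git subcommands themselves (clone --branch, remote add, fetch, checkout -b) are standard git.

```python
import subprocess
from typing import List

REPO_URL = "https://github.com/cdshaffer/starterator.git"  # assumed from the repo name above
BRANCH = "filterSpanningGenes"


def fresh_clone_commands(dest: str = "starterator") -> List[List[str]]:
    """Clone the fork and check out the branch in one step."""
    return [["git", "clone", "--branch", BRANCH, REPO_URL, dest]]


def existing_clone_commands(repo_dir: str, remote: str = "cdshaffer") -> List[List[str]]:
    """Add the fork as a remote to an existing Starterator clone and switch to the branch."""
    return [
        ["git", "-C", repo_dir, "remote", "add", remote, REPO_URL],
        ["git", "-C", repo_dir, "fetch", remote],
        # create a local branch that tracks the fork's branch
        ["git", "-C", repo_dir, "checkout", "-b", BRANCH, f"{remote}/{BRANCH}"],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each git command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For a new copy, call run(fresh_clone_commands()); if you already have a Starterator checkout, call run(existing_clone_commands("path/to/starterator")) instead.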
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 10 Feb, 2016 16:39
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator > phage that crash starterator
| posted 10 Feb, 2016 06:48
OK, I ran Starterator on Geralt_Draft, gene #13. On my version of Starterator, with my code updates and the most recent version of the database, it appears to have run OK. Your results may differ from mine if you are running the standard code or an older copy of the database.

Here is the Starterator output for Geralt_Draft, gene #13.

This is an unusual pham in that both Geralt 13 and Geralt 14 are in the phamily. I suspect that in some phage the two proteins are expressed as a single polypeptide.

In this version, Geralt 13 is now track 70 and Geralt 14 is track 154. There is a minor bug with track numbering: each page is numbered 1 to 50, so you have to do a little math to find that track 70 is track 20 on the second page. Track 154 will be the 4th track on the 4th page.
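
The page arithmetic is just integer division by 50. A short Python sketch (track_location is a made-up name for illustration) that reproduces the conversions in this post:

```python
from typing import Tuple

TRACKS_PER_PAGE = 50  # each report page numbers its tracks 1 to 50


def track_location(track: int, per_page: int = TRACKS_PER_PAGE) -> Tuple[int, int]:
    """Convert an overall track number into (page, track on that page), both 1-based."""
    if track < 1:
        raise ValueError("track numbers start at 1")
    page, offset = divmod(track - 1, per_page)
    return page + 1, offset + 1


for t in (70, 154, 155):
    print(t, track_location(t))
# 70 (2, 20)
# 154 (4, 4)
# 155 (4, 5)
```

Subtracting 1 before dividing is what keeps track 50 on page 1 and track 51 as the first track of page 2.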

This looks to me like a case where the automated analysis does not work well for this very diverse group of proteins, so I would just say that Starterator is Not Informative. That said, start 7 @ 8532 is the best supported start for Geralt 13, and start 61 @ 9035 is the best supported start for Geralt 14.

There are a number of blank tracks after the last track, which is track 155 (i.e., the 5th track on page 4). This is a minor bug where empty "tracks" are written to fill the page.
Edited 10 Feb, 2016 16:17
Posted in: Starterator > Empty Track
| posted 10 Feb, 2016 06:33
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 09 Feb, 2016 03:00
Yes:
1. Open Phamerator.
2. Select Preferences from the Edit menu.
3. Click the "Force database update" button.

I always start these updates first thing in class, before I start any lecturing, to give them the maximum time. I also have at most 5 student computers updating at any one time so the wireless network does not get saturated.
Posted in: Phamerator > Force A Database Update? How?
| posted 07 Feb, 2016 01:43
As for the total failure, it is hard to diagnose without specifics. If it is failing on everything, then your assumption that it cannot connect to the database is probably correct. The first thing to test, then, is whether you can connect to the Actino_draft database with Phamerator; that will help separate database problems from Starterator problems. Open Phamerator, check in the preferences that Actino_draft is the selected database, and then see whether Phamerator can find recently added phages and make phage maps. Post the results of that test.
Posted in: Starterator > phage that crash starterator
| posted 07 Feb, 2016 01:36
I ran all 4 of the phage above on my system, where I have updated my own local copy of Starterator to fix a few bugs that were posted last fall. Results for the 4 phage:

Picard ran fine, so it likely had an issue that I have already corrected. You can get the full report from this link.

bubbles123 crashed for me on pham 68 of 104 (pham 5447). This pham contains gene 107, the right-most gene, which is on the negative strand. The gene starts 2 bases in from the end of the phage genome; this is a corner case that needs fixing. The exact error is reported at line 220 of find_most_common_start in phams.py: ValueError: 0 is not in list. This is a new Starterator bug that I have not seen before. Fixing and double-checking the code will take time, so for now I have added temporary code to my copy of Starterator to skip that gene so I could create a full report of all the other genes. That full report is available here.
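
For readers unfamiliar with this error: in Python, list.index raises ValueError when the value is absent, which is the kind of message reported above. I have not traced why a 0 ends up being looked up here, so the guard below is only a generic sketch of how a lookup can fail soft instead of crashing; safe_index is a hypothetical helper, not part of Starterator.

```python
from typing import Optional, Sequence


def safe_index(items: Sequence, value) -> Optional[int]:
    """Return the position of value in items, or None if it is not present."""
    try:
        return items.index(value)
    except ValueError:
        # list.index raises ValueError when value is absent
        return None


starts = [12, 45, 78]
print(safe_index(starts, 45))  # 1
print(safe_index(starts, 0))   # None instead of a crash
```

Returning None only moves the problem: the caller still has to decide what to do, for example skip that gene and note it in the report, which is essentially what the temporary workaround does.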

Roosevelt crashed for me on pham 31 of 89 (pham 8871), which corresponds to gene 35. The error reported was at line 98 of add_aligment in pham.py: KeyError: "TWAMP_Draft_33". This might be a naming error: the geneid in the pham table is "SEA_TWAMP_33", not "TWAMP_Draft_33". There is a cluster of gene names from several phage that start with "SEA_"; I'm not sure why, but genes with names like this appear to be mishandled. Like the one above, this is a new bug, and fixing it will take time. I have had Starterator run and just skip gene 35. The full report without gene 35 is available here.
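
If the mismatch really is just a naming difference, one workaround is to normalize gene IDs before using them as dictionary keys. The rule below (drop a leading "SEA_", drop "_Draft" before the gene number, compare in upper case) is a hypothetical guess based on the two names in this report, not a confirmed fix, and normalize_gene_id and lookup_alignment are made-up names.

```python
import re
from typing import Dict, Optional


def normalize_gene_id(gene_id: str) -> str:
    """Reduce variants such as 'SEA_TWAMP_33' and 'TWAMP_Draft_33' to one key."""
    name = re.sub(r"^SEA_", "", gene_id, flags=re.IGNORECASE)  # drop a SEA_ prefix
    name = re.sub(r"_Draft(?=_\d+$)", "", name, flags=re.IGNORECASE)  # drop _Draft before the gene number
    return name.upper()


def lookup_alignment(alignments: Dict[str, str], gene_id: str) -> Optional[str]:
    """Find an entry by normalized ID, returning None instead of raising KeyError."""
    by_key = {normalize_gene_id(k): v for k, v in alignments.items()}
    return by_key.get(normalize_gene_id(gene_id))


alignments = {"SEA_TWAMP_33": "alignment for gene 33"}
print(lookup_alignment(alignments, "TWAMP_Draft_33"))  # alignment for gene 33
```

The risk of this approach is that two genuinely different IDs could collapse to the same key, so any real fix should be checked against the full pham table.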

Mojorita ran fine, so it was likely also affected by a bug I have already squashed. You can get the full report from this link.

I have posted bug reports for the bubbles123 and Roosevelt problems to my GitHub issue tracker, where I am keeping a running list of known bugs and possible improvements. I'm not sure when I will be able to get to these for permanent fixes. Thanks for all the crash reports; the program cannot get better without reports like these.
Edited 07 Feb, 2016 21:03
Posted in: Starterator > phage that crash starterator
| posted 06 Feb, 2016 22:27
When I look at ShiaLaBeouf in Phamerator, I see that it is labeled "ShiaLaBeouf_draft" and that there are 231 genes. The "_draft" means that the phage was run through DNA Master auto-annotation and those auto-annotated genes were incorporated into the database. That database is used by both Starterator and Phamerator, so I always look in Phamerator first when debugging Starterator.

I'm not sure why your DNA Master file has a different number. It could be something as simple as the DNA Master total including the tRNA genes (the Phamerator database counts only protein-coding genes), or someone having added genes to the DNA Master file. Alternatively, it could be something complicated, based on the settings or default configuration of your copy of DNA Master compared to the copy that was run to create the auto-annotation that ended up in the database. Another possibility is that a glitch caused an error in the database.

Starterator was designed to deal with this situation (i.e., you want to analyze a gene that is not in the Phamerator database) by letting you enter coordinates that define a gene; this is the routine listed in the start window as "One unphamerated gene". You are supposed to be able to enter the relevant data (phage, phage sequence, gene coordinates, and strand) and get a result, but I have not had good luck with that routine. It certainly needs work under the hood.
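
Conceptually, the first thing such a routine has to do with those inputs is simple: cut out the coordinates and, for a reverse-strand gene, take the reverse complement. A minimal Python sketch, assuming 1-based inclusive coordinates given as (left, right) with left <= right; extract_gene is a hypothetical name and this is not Starterator's code.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_gene(genome: str, left: int, right: int, strand: str) -> str:
    """Return a gene's sequence read 5' to 3' on its own strand.

    left and right are 1-based, inclusive genome coordinates with left <= right.
    """
    if not 1 <= left <= right <= len(genome):
        raise ValueError("coordinates must satisfy 1 <= left <= right <= genome length")
    region = genome[left - 1:right]  # convert to a 0-based, end-exclusive slice
    if strand == "+":
        return region
    if strand == "-":
        return region.translate(COMPLEMENT)[::-1]  # reverse complement
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


genome = "ATGAAACCCTTTGGG"
print(extract_gene(genome, 1, 6, "+"))    # ATGAAA
print(extract_gene(genome, 10, 15, "-"))  # CCCAAA
```

Off-by-one mistakes in exactly this 1-based to 0-based conversion are a common source of "off by 1" errors in tools like this.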

Anyway, if you still want to track down this discrepancy, the first step would be a careful comparison of the gene list in DNA Master with the gene list in the Phamerator database. I extracted the gene list from the Phamerator database to help with the comparison. You can get the file from this link.
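
Once both lists are in hand, the comparison itself is a pair of set differences. A Python sketch, assuming each gene has been reduced to a (start, stop, strand) tuple; how you export those tuples from DNA Master or from the file above is left open, and compare_gene_lists is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

GeneCoords = Tuple[int, int, str]  # (start, stop, strand)


def compare_gene_lists(dnamaster: Iterable[GeneCoords],
                       phamerator: Iterable[GeneCoords]) -> Dict[str, object]:
    """Report genes found in only one list, plus how many the lists share."""
    a, b = set(dnamaster), set(phamerator)
    return {
        "only_in_dnamaster": sorted(a - b),
        "only_in_phamerator": sorted(b - a),
        "shared": len(a & b),
    }


dm = [(41, 1243, "+"), (1240, 2004, "+"), (3001, 2100, "-")]
ph = [(41, 1243, "+"), (3001, 2100, "-")]
result = compare_gene_lists(dm, ph)
print(result["only_in_dnamaster"])  # [(1240, 2004, '+')]
print(result["shared"])             # 2
```

A gene whose start differs between the two files will appear in both "only" lists, which is often exactly the kind of discrepancy you are looking for.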
Posted in: Starterator > Empty Track
| posted 05 Feb, 2016 20:11
The easy way (which I use in class when time is an issue): I tell students to open a browser, log in to their email, and send the files to themselves as attachments. Bonus feature: an automatic backup of the files.

If you need to move many files back and forth:
1. Make sure you are using a recent version of VirtualBox and have installed Guest Additions in the Ubuntu machine.
2. With the machine off, go to Settings -> Shared Folders.
3. Click the tiny folder icon with the plus sign to add a shared folder.
4. In the Folder Path entry, click the arrowhead and select "other".
5. Select the Mac Desktop folder to share between guest and host.
6. Select the "Make permanent" setting so you only have to do this once.
7. The shared folder will be in the /media folder in the guest, which is at the top level of the folder hierarchy (i.e., two folders up from your home folder). The name of the folder inside the guest machine will be something like sf_Desktop.
8. Files placed in there will appear on the Desktop of your Mac.
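
Once the folder is mounted, a script running in the Ubuntu guest can find it by its sf_ prefix. A Python sketch under that assumption (find_shared_folders and copy_to_shared are hypothetical names). Note that on many Ubuntu guests your user must also be a member of the vboxsf group before the shared folder is readable.

```python
import shutil
from pathlib import Path
from typing import List


def find_shared_folders(media_root: str = "/media") -> List[Path]:
    """List VirtualBox auto-mounted shared folders (named sf_<name>) under media_root."""
    root = Path(media_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir() and p.name.startswith("sf_"))


def copy_to_shared(src: str, shared_folder: Path) -> Path:
    """Copy one file into the shared folder, keeping its name and timestamps."""
    dest = Path(shared_folder) / Path(src).name
    shutil.copy2(src, dest)
    return dest
```

For example, copy_to_shared("report.pdf", find_shared_folders()[0]) would place the file in the first shared folder found, which with the setup above is the Mac Desktop.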

You can also set up drag-and-drop, which is supposed to let you move files back and forth, but I have had less success with that technique.
Posted in: SEA-PHAGES Virtual Machine > Shared folders
| posted 05 Feb, 2016 18:34
I will look into this when I can; unfortunately my computer's motherboard died last night, and I am working from a loaner until it is repaired. Until then my Starterator virtual machine is unavailable. It sounds like a couple of other cases I have run with little or no pink. In both of those cases it was not a bug so much as weird corner cases that Starterator was not built to handle.
Posted in: Starterator > Empty Track