The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 24 Feb, 2016 17:42
Yes, the intermittent nature is very curious. Steve has updated the bzr archive with Actino_draft as the default. Please let us know if that seems to fix the problem. If not, then I agree with you: it may take some serious doing to track down the issue, so if anyone else is having these issues, please post here.
Edited 24 Feb, 2016 17:43
Posted in: Phamerator / Database reversion
| posted 19 Feb, 2016 21:27
Good news: my updated code worked for this one. You can download the complete file here. Wow! Over 550 pages; those are some big phams.

Sorry it took me so long; I have had too much grading to stop by SEA-PHAGES the last couple of days.
If anyone is in a hurry, feel free to send me an email noting that you posted a request and I will get on it. It really doesn't take long to set up the run, and then the computer does all the work, so don't hesitate to ask!
Edited 19 Feb, 2016 21:30
Posted in: Starterator / phage that crash starterator
| posted 18 Feb, 2016 16:58
OK, general question. We have a new phage with lots of tRNA genes interdigitated with Glimmer/GeneMark protein predictions. There are long spaces between the tRNA genes, so there is certainly room for a protein gene, but some of the Glimmer and GeneMark predictions come pretty close to the ARAGORN hits. How much space should there be between these features? Do we need to leave room for a promoter any time we switch from a tRNA gene to a protein gene or vice versa?

Should I tell the students to leave at least 25-50 bp for a promoter? I will have students look at the protein vs. tRNA distribution in the host bacteria, but I wanted to check whether there is some kind of general rule to maintain consistency across all SEA-PHAGES annotations.
Edited 18 Feb, 2016 17:34
Posted in: tRNAs / How close can one pack protein and tRNA's genes
| posted 17 Feb, 2016 17:17
Many Cluster C phages have a gene that spans the physical end. This gives many computer programs fits; it's one of the reasons Starterator crashes on some phage. Phamerator has issues as well (although, thankfully, it does not crash), and the whole genome maps created by Phamerator often don't include genes of that type.

Glimmer (and maybe GeneMark) will predict genes that span the ends if you tell it that you have a circular genome (DNA Master does do this when it submits the sequence to NCBI for auto-annotation). So it is possible they will show up on your auto-annotation list.

As for finding them, I always have my students check all "largish" regions without genes (say, larger than 150 bp) by BLAST. You can have DNA Master locate these "holes" automatically: in DNA Master click the "Validate" button below the feature list, then in the bottom right panel click the "control" tab and then "Locate gray holes" with a size of 150. The resulting list gives the positions and sequences of the "holes", which can then be searched by BLASTX against the protein database. If students do find hits, I would have them consider the quality of the hit (is it real or spurious?) and examine the region carefully for a missing gene (evidence would include coding potential and the presence of an ORF that does not overlap too much with other genes).
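
For anyone who wants to double-check a feature list outside DNA Master, here is a minimal Python sketch of the same idea: list the uncovered stretches of at least a chosen size. It is not DNA Master's own code; the function name `find_gaps`, the coordinate convention (1-based, inclusive, start <= end) and the example coordinates are all assumptions made for illustration, and genes that wrap around the genome end are not handled.

```python
def find_gaps(features, genome_length, min_size=150):
    """Return (start, end) intervals not covered by any feature.

    features: iterable of (start, end) pairs, 1-based inclusive, start <= end
              (an assumed convention, not DNA Master's export format).
    Only gaps of at least min_size bases are reported.
    """
    gaps = []
    covered_to = 0  # last base covered by the features seen so far
    for start, end in sorted(features):
        if start - covered_to - 1 >= min_size:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    # Check the stretch between the last feature and the genome end.
    if genome_length - covered_to >= min_size:
        gaps.append((covered_to + 1, genome_length))
    return gaps


# Made-up coordinates, for illustration only.
example_features = [(1, 400), (380, 900), (1100, 1500), (1550, 2000)]
print(find_gaps(example_features, genome_length=2300))
# prints [(901, 1099), (2001, 2300)]
```

The reported intervals can then be cut out of the genome sequence and used as BLASTX queries, as described above.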
Edited 17 Feb, 2016 18:38
Posted in: DNA Master / Genes Across COS sites???
| posted 15 Feb, 2016 22:26
It is failing for me too. I can get to the web pages for Glimmer and GeneMark directly on the web (the exact location can be found in DNA Master preferences; here is the link for Glimmer), but when I try to run a Glimmer or GeneMark prediction on the web pages I get errors. Here is an example:
Job failed.
Error Message : NCBI C++ Exception: Error: (CNetSrvConnException::eSrvListEmpty) "/netopt/ncbi_tools64/c++.by-date/production/20120616/GCC442-Release64/../src/connect/services/netservice_api.cpp", line 952: ncbi::CNetService::Iterate() — Couldn't find any available servers for the NS_Glimmer service.
Submit new Data

So it's a problem with NCBI. I posted a note to the help desk, but mostly we are just going to have to wait until NCBI fixes the issue, and of course it's the official dead presidents holiday, so...
Edited 16 Feb, 2016 00:15
Posted in: DNA Master / Glimmer Failure on Auto Annotation
| posted 12 Feb, 2016 17:42
Not a problem, really. It takes about 3 minutes to start things rolling, and then everything runs in the background. Then it takes another 3 or 4 minutes to post to Box and copy the shared link.

Gideon did run just fine so it is likely a known bug. Here is the report.
Posted in: Starterator / Read First: Common Starterator Troubleshooting
| posted 10 Feb, 2016 22:18
I did a bunch of debugging last summer/early fall. I was able to fix some of the "off by 1" errors and figure out why some of the phage genes were failing. Starterator is a pretty large and sophisticated program, and I still don't have a good handle on how it does everything. That would take many hours of reading and thinking about the code, which I just don't have time to do right now.

I have been reluctant to push my fixes out to everyone. First, because I don't really understand everything, I was worried that my changes would create problems for some small percentage of phage even as they fixed problems for others. Also, not all my solutions are high quality. For example, Starterator was crashing on negative strand genes that wrap around the end of the sequence. I could not figure out a way to fix this bug without substantially rewriting a huge chunk of code. So instead I just added a tiny bit of code so that Starterator skips over those genes. Not ideal, but better to have a report with all but one gene than no report at all. It's one thing for me to change Starterator for myself; it's something else entirely for me to set that as the policy for every copy of Starterator out there.
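
To give a feel for what "skip over those genes" means, here is a small Python sketch of that kind of filter. This is not the actual Starterator code: the `Gene` record, its field names, and the rule that a wrapping gene is recorded with its left coordinate larger than its right one are all assumptions made for illustration, and unlike the fix described above it sets aside wrapping genes on either strand.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    # Hypothetical record; Starterator's real data structures differ.
    name: str
    left: int    # leftmost coordinate as recorded, 1-based
    right: int   # rightmost coordinate as recorded, 1-based
    strand: str  # "+" or "-"


def wraps_end(gene):
    """Assumed convention: a gene spanning the genome end has left > right."""
    return gene.left > gene.right


def filter_spanning_genes(genes):
    """Split genes into (kept, skipped) so a report can still be built."""
    kept, skipped = [], []
    for gene in genes:
        if wraps_end(gene):
            skipped.append(gene)  # set aside instead of crashing on it
        else:
            kept.append(gene)
    return kept, skipped


# Made-up genes, for illustration only.
genes = [Gene("gp1", 100, 900, "+"), Gene("gp2", 155000, 300, "-")]
kept, skipped = filter_spanning_genes(genes)
print([g.name for g in kept], [g.name for g in skipped])
# prints ['gp1'] ['gp2']
```

Returning the skipped genes as well makes it easy to list them in the report, so nobody is surprised that a gene is missing.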

I will say I am feeling more confident that my modifications do not cause more harm than good, now that I have run about half a dozen phage through with no new errors that did not also show up with the standard code base. Maybe we should have a few beta testers each try several phage with my code before we push the changes out to everyone.

All my changes are freely available if anyone wants to download them and try their phage. You can get my version of the code from my GitHub repository (github.com/cdshaffer/starterator). Anyone comfortable with using the git command line to pull from a remote repository can easily download and test the code. My most recent branch is called filterSpanningGenes.
Posted in: Starterator / Read First: Common Starterator Troubleshooting
| posted 10 Feb, 2016 16:39
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator / phage that crash starterator
| posted 10 Feb, 2016 06:48
OK,
I ran Starterator on Geralt_Draft, Gene # 13. On my version of Starterator, with my code updates and the most recent version of the database, it appears to have run OK. Because I am using updated code and a more recent database, your results may differ from mine.

Here is the starterator output for Geralt_Draft, Gene # 13.

This is an unusual pham in that both Geralt 13 and Geralt 14 are in the phamily. I suspect that in some phage the two proteins are expressed as a single polypeptide.

In this version Geralt 13 is now track 70 and Geralt 14 is track 154. There is a minor bug with track numbering: the tracks on each page are numbered 1 to 50, so you have to do a little math to find that track 70 is track 20 on the second page. Track 154 will be the 4th track on the 4th page.
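
If you would rather not do the math by hand, the conversion is just a division with remainder. Here is a tiny Python sketch, assuming 50 tracks per page as in this report; the function name `track_position` is made up for illustration and is not part of Starterator.

```python
def track_position(track, per_page=50):
    """Convert an overall track number into (page, track on that page)."""
    page_index, offset = divmod(track - 1, per_page)
    return page_index + 1, offset + 1


print(track_position(70))   # prints (2, 20)
print(track_position(154))  # prints (4, 4)
```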

This looks to me like a case where the automated analysis does not work well for this very diverse group of proteins, so I would just say that Starterator is Not Informative. That said, start 7 @ 8532 is the best supported for Geralt 13, and start 61 @ 9035 is the best supported for Geralt 14.

There are a number of blank tracks after the last track, which is track 155 (i.e. the 5th track on page 4). This is a minor bug where empty "tracks" are written to fill the page.
Edited 10 Feb, 2016 16:17
Posted in: Starterator / Empty Track
| posted 10 Feb, 2016 06:33
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator / Read First: Common Starterator Troubleshooting