The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 24 Feb, 2016 18:30
Just as a check, I wrote a quick Biopython script to run a BLAST search using the qblast server at NCBI, and it worked. At nearly the same time I tried to submit a search from DNA Master, and I too am getting the same error. This suggests that DNA Master is the issue and that it is not a general problem with the NCBI qblast service. So it is possible that DNA Master (from all of us) has submitted too many requests in too short a time and NCBI has blocked DNA Master (i.e. all copies everywhere) from submitting any more searches. If this is indeed the case, and if I understand NCBI policy in this regard, Dr. Lawrence needs to get in touch with NCBI and get DNA Master on some kind of whitelist so we don't get blocked. Dr. Lawrence should know more about these NCBI policies.
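
A check like this can be sketched roughly as follows. This is not the original script, just a minimal sketch: the helper names `clean_sequence` and `qblast_hit_count` and the default program and database are assumptions for illustration. `NCBIWWW.qblast` and `NCBIXML.read` are Biopython calls, and the search itself needs Biopython installed and network access.

```python
def clean_sequence(seq):
    """Remove whitespace, upper-case, and reject anything that is not A/C/G/T/N."""
    seq = "".join(seq.split()).upper()
    bad = set(seq) - set("ACGTN")
    if bad:
        raise ValueError("unexpected characters in sequence: %s" % sorted(bad))
    return seq


def qblast_hit_count(seq, program="blastn", database="nt"):
    """Submit one search to NCBI qblast and return the number of alignments.

    Hypothetical helper: if this succeeds while another client fails with the
    same query, the problem is probably specific to that client.
    """
    # Imported here so the rest of the file works without Biopython installed.
    from Bio.Blast import NCBIWWW, NCBIXML

    handle = NCBIWWW.qblast(program, database, clean_sequence(seq))
    try:
        record = NCBIXML.read(handle)  # parse the single XML result
    finally:
        handle.close()
    return len(record.alignments)
```

Calling `qblast_hit_count` with a short stretch of phage sequence is enough for a yes/no test of the service. A search can take a minute or more to come back, so a single request is plenty.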

P.S. HAHA I only have to wait 58 million seconds, apparently NCBI likes me more than you :)
Edited 24 Feb, 2016 20:30
Posted in: DNA Master > BLAST in DNAM
| posted 24 Feb, 2016 17:46
OK thanks.
Looking more closely, we have now found at least several tRNAs where there are NO bases between the end of the tRNA and the first base of the auto-annotated gene. Splicing of the RNA to excise the tRNA would thus produce a leaderless message. I have been talking to our local faculty member who works almost exclusively with Streptomyces (the phage host in this case), and apparently in some of these strains leaderless messages are not that uncommon, so I think we will go ahead and annotate both no matter how close, as long as they don't overlap. I tend to agree with the general annotation policy that it is better to annotate a false positive than a false negative; so when things are truly ambiguous, over-predict, don't under-predict.
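
The "no matter how close, as long as they don't overlap" rule comes down to simple coordinate arithmetic. Here is a minimal sketch, assuming each feature is given as a 1-based inclusive (start, stop) pair with start less than stop; the function name `bases_between` is made up for illustration.

```python
def bases_between(feature_a, feature_b):
    """Count the bases separating two features given as (start, stop).

    Returns 0 when the features abut (the leaderless case described above)
    and a negative number when they overlap.
    """
    (_, left_stop), (right_start, _) = sorted([feature_a, feature_b])
    return right_start - left_stop - 1


trna = (1000, 1072)
gene = (1073, 1500)
print(bases_between(trna, gene))  # 0, so annotate both under the rule above
```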

He also raised the possibility that this is an intentional organization that could be used as some sort of regulation of expression: by splicing out an overlapping tRNA you effectively kill the protein-encoding gene, resulting in less protein production than if the tRNA weren't there. I don't know if this has ever been described before, but it would be very cool if true.

Posted in: tRNAs > How close can one pack protein and tRNA's genes
| posted 24 Feb, 2016 17:42
Yes, the intermittent nature is very curious. Steve has updated the bzr archive with Actino_draft as the default. Please let us know if that seems to fix the problem. If not, then I agree with you: it may take some serious doing to track down the issue, so if anyone else is having these issues, please post here.
Edited 24 Feb, 2016 17:43
Posted in: Phamerator > Database reversion
| posted 19 Feb, 2016 21:27
Good news, my updated code worked for this one. You can download the complete file here. Wow! over 550 pages, that's some big phams.

Sorry it took me so long, too much grading to stop by SEA Phages the last couple days.
If anyone is in a hurry, feel free to send me an email noting that you posted a request and I will get on it. It really doesn't take long to set up the run, and then the computer does all the work, so don't hesitate to ask!
Edited 19 Feb, 2016 21:30
Posted in: Starterator > phage that crash starterator
| posted 18 Feb, 2016 16:58
OK, general question. We have a new phage with lots of tRNA genes interdigitated with Glimmer/GeneMark protein predictions. There are long spaces between the tRNA genes, so there is certainly room for a protein gene, but some of the Glimmer and GeneMark predictions come pretty close to the Aragorn hits. How much space should there be between these features? Do we need to leave room for a promoter anytime we switch from a tRNA gene to a protein gene or vice versa?

Should I tell the students to leave at least the 25-50 bp needed for a promoter? I will have students look at the protein vs. tRNA distribution in the host bacteria, but I wanted to check if there is some kind of general rule to maintain consistency across all SEA-PHAGES annotations.
Edited 18 Feb, 2016 17:34
Posted in: tRNAs > How close can one pack protein and tRNA's genes
| posted 17 Feb, 2016 17:17
Many Cluster C phages have a gene that spans the physical end. This gives many computer programs fits; it's one of the reasons Starterator crashes on some phage. Phamerator has issues as well (although, thankfully, it does not crash), and the whole-genome maps created by Phamerator often don't include genes of that type.

Glimmer (and maybe GeneMark) will predict genes that span the ends if you tell it that you have a circular genome (DNA Master does do this when it submits the sequence to NCBI for auto-annotation). So it is possible they will show up on your auto-annotation list.

As for finding them, I always have my students check all "largish" regions without genes (say larger than 150 bp) by BLAST. You can have DNA Master locate these "holes" automatically: in DNA Master click the "Validate" button below the feature list, then in the bottom right panel click the "control" tab and then "Locate gray holes" with a size of 150. The resulting list gives the positions and sequences of the "holes" which can then be used to search specifically by BLASTX to the protein database. If students do find hits, I would have them consider the quality of the hit (is it real or spurious) and examine the region carefully for a missing gene (evidence would include coding potential and the presence of an ORF that does not have too much overlap with other genes).
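
The same kind of "hole" search can be sketched outside DNA Master. This is only an illustration, not how DNA Master does it: the function name `find_holes` is made up, features are assumed to be 1-based inclusive (start, stop) pairs with start less than stop regardless of strand, and a hole is reported when it is at least `min_size` bases long.

```python
def find_holes(features, genome_length, min_size=150):
    """Return (start, stop) for every feature-free region of at least min_size bp."""
    holes = []
    covered_to = 0  # rightmost base covered by any feature seen so far
    for start, stop in sorted(features):
        if start - covered_to - 1 >= min_size:
            holes.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, stop)
    # Check the stretch between the last feature and the end of the genome.
    if genome_length - covered_to >= min_size:
        holes.append((covered_to + 1, genome_length))
    return holes


features = [(100, 400), (380, 900), (1200, 1500)]
print(find_holes(features, genome_length=1600))  # [(901, 1199)]
```

Each reported region can then be cut out of the genome sequence and searched with BLASTX as described above. Note that this sketch treats the genome as linear, so a gene that spans the physical ends would need to be entered as two pieces.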
Edited 17 Feb, 2016 18:38
Posted in: DNA Master > Genes Across COS sites???
| posted 15 Feb, 2016 22:26
It is failing for me too. I can get to the web pages for Glimmer and GeneMark directly on the web (the exact location can be found in DNA Master preferences; here is the link for Glimmer), but when I try to run a Glimmer or GeneMark prediction on the web pages I get errors. Here is an example:
Job failed.
Error Message : NCBI C++ Exception: Error: (CNetSrvConnException::eSrvListEmpty) "/netopt/ncbi_tools64/c++.by-date/production/20120616/GCC442-Release64/../src/connect/services/netservice_api.cpp", line 952: ncbi::CNetService::Iterate() — Couldn't find any available servers for the NS_Glimmer service.
Submit new Data

So it's a problem with NCBI. I posted a note to the help desk, but mostly we are just going to have to wait until NCBI fixes the issue, and of course it's the official dead presidents holiday so…….
Edited 16 Feb, 2016 00:15
Posted in: DNA Master > Glimmer Failure on Auto Annotation
| posted 12 Feb, 2016 17:42
Not a problem really, takes about 3 minutes to start things rolling and then everything runs in the background. Then takes another 3 or 4 minutes to post to box and copy the shared link.

Gideon did run just fine so it is likely a known bug. Here is the report.
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 10 Feb, 2016 22:18
I did a bunch of debugging last summer/early fall. I was able to fix some of the "off by 1" errors and figure out why some of the phage genes were failing. Starterator is a pretty large and sophisticated program, and I still don't have a good handle on how it does everything. That would take many hours of reading and thinking about the code, which I just don't have time to do right now.

I have been reluctant to push my fixes out to everyone. First, because I don't really understand everything, I was worried that my changes would create problems for some small percentage of phage even as they fixed problems for others. Also, not all my solutions are high quality. For example, Starterator was crashing on negative-strand genes that wrap around the end of the sequence. I could not figure out a way to fix this bug without substantially rewriting a huge chunk. So instead I just added a tiny bit of code so Starterator just skips over those genes. Not ideal, but better to have a report with all but one gene instead of no report at all. It's one thing for me to change Starterator for myself; it's something else entirely for me to set that as the policy for every copy of Starterator out there.
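
The general idea of the workaround can be sketched like this. This is not the actual Starterator code: the `Gene` record, the function names, and the assumption that a wrap-around gene is stored with a left coordinate larger than its right coordinate are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical gene record: left/right are genome coordinates, strand is "+" or "-".
Gene = namedtuple("Gene", ["name", "left", "right", "strand"])


def spans_end(gene):
    """Assumption: a gene that wraps around the genome end has left > right."""
    return gene.left > gene.right


def filter_spanning_genes(genes):
    """Split genes into those safe to process and those skipped."""
    kept = [g for g in genes if not spans_end(g)]
    skipped = [g for g in genes if spans_end(g)]
    return kept, skipped


genes = [Gene("gp1", 100, 900, "+"), Gene("gp2", 51000, 250, "-")]
kept, skipped = filter_spanning_genes(genes)
print([g.name for g in kept])     # ['gp1']
print([g.name for g in skipped])  # ['gp2']
```

Returning the skipped genes alongside the kept ones makes it easy to list them in the report, so a reader knows which gene is missing and why.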

I will say I am feeling more confident that my modifications do not cause more harm than good, now that I have run about half a dozen phage through with no new errors that did not show up with the standard code base. Maybe we should have a few beta testers each try several phage with my code before we push the changes out to everyone.

All my changes are freely available if anyone wants to download them and try their phage. You can get my version of the code from my GitHub repository (github.com/cdshaffer/starterator). Anyone comfortable with using the git command line to pull from a remote repository can easily download and test the code. My most recent branch is called filterSpanningGenes.
Posted in: Starterator > Read First: Common Starterator Troubleshooting
| posted 10 Feb, 2016 16:39
Both Andies and WillSterrel appear to have issues that have already been solved, as both ran fine on my copy with my bug fixes. Links to the full reports are below:
Full starterator report for Andies
Full starterator report for WillSterrel
Posted in: Starterator > phage that crash starterator