The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

New Version of Starterator for 2017?

| posted 06 Jan, 2017 17:21
Hi all,

So we've got the new 2017 SEAVM up and running. I was expecting to see the new version of Starterator in there (with the names of the phages directly on the tracks, green (for human-annotated) vs. yellow (for auto-annotated) called starts, etc.). However, that's not what I get; I get the same outputs as the 2016 version (just the track number on each track, only blue lines for called starts). I can, however, access the new PDFs for each pham online.

Am I missing something in terms of getting the new outputs in Starterator in Ubuntu?

Thanks!
Nikki
| posted 10 Jan, 2017 01:15
Sorry. I did get a chance to write some new code for Starterator over Thanksgiving in hopes of getting it into the VM, but it was not ready soon enough given the timing of the in silico workshop and the need to get the VM out to everyone attending. So if you are using the SEA 2017 VM and have not updated manually (using the git command line), you still have the original version.

You can still use the SEA 2017 VM but you will not get any of my updates.
The current best way to get those reports is to use the pham reports I have been posting on the web. I post new reports every time there is a new phamerator database and I use my most recent code (for good or bad).

To get those, go to http://phages.wustl.edu/starterator to see a list of all the pham reports (for all phams with two or more members). Just scroll down until you find the link for the pham you are interested in and click it. However, given the length of the list, an easier technique is to type in the address directly using the pham number, since everything else in the address is identical.

For example, to get to the report for pham 21476, just type in this address:

http://phages.wustl.edu/starterator/Pham21476Report.pdf

And if you want pham 66 just replace the 21476 in the above address with 66:

http://phages.wustl.edu/starterator/Pham66Report.pdf
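
Since only the pham number changes, the address pattern above can be sketched as a small shell helper. This is just an illustration: the function name pham_report_url is made up for this example and is not part of Starterator.

```shell
# Sketch: build the report address for a given pham number.
# pham_report_url is a hypothetical helper name, not part of Starterator.
pham_report_url() {
    printf 'http://phages.wustl.edu/starterator/Pham%sReport.pdf\n' "$1"
}

# Print the address for pham 66.
pham_report_url 66
```

The printed address can be pasted into a browser, or handed to a downloader such as curl -O if you prefer to save the PDF from the command line.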
| posted 11 Jan, 2017 18:30
Thanks for the clarification, Chris! I might try updating manually with the git command.
| posted 10 Feb, 2017 14:07
Hi Chris,

Thanks for working on the starterator program and updating the output. It looks great. The idea of pre-running on the phams and posting that output is excellent. Very helpful and likely how we will use the functionality most often.

Would you be able to add a post (or add to github) the command line procedure for updating to the current version of starterator in the 2017 SEA VM?

Thanks,

Aaron
| posted 10 Feb, 2017 16:51
The answer to your question depends on which version of Starterator you want. There is currently version 1.1, which has many bug fixes (some bugs, not all, are fixed) and other updates that I did mostly over the summer. It is the master branch at:

https://github.com/SEA-PHAGES/starterator.git

Based on discussions with Welkin, Deb and extra feedback from the in silico workshop, I was able to update a lot of the text output over the winter. I am using this code to generate the pre-computed PDFs, but that version has not been fully tested to make sure it is "release ready" (i.e., it runs on my machine with lots of extra modifications, but I have not tested it with the default SEA 2017 VM). The version that is doing all the pre-computing is the byphagewithbase branch in my personal repo at:

https://github.com/cdshaffer/starterator.git

Having said that, anyone can install and run any version of the code, but it takes a good understanding of Unix administration and git:

1. Ensure you have the dependencies (if you are not using the SEA 2017 VM):

sudo apt-get install python-pip ncbi-blast+ git
sudo pip install PyPDF2
sudo pip install beautifulsoup4
sudo pip install requests

2. Remove all old Starterator files:

cd $HOME/Applications
rm -rfv $HOME/.starterator
rm -rfv $HOME/Applications/Starterator

3. Create a new Starterator folder:

mkdir $HOME/Applications/Starterator
cd $HOME/Applications/Starterator

4. Clone the version you want:

git clone https:<insert URL here for the repo you want to clone>

5. Check out the branch you want:

cd $HOME/Applications/Starterator/starterator
git checkout <insert branch name here>

6. Run your checked-out version of Starterator:

bash starterator.sh

Please remember, there are no guarantees that anything in my repo will work on the 2017 VM. However, if you try it and it does work, please do let me know.
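
For anyone who would rather script steps 2 through 5, here is a rough sketch that strings them together. It is an illustration under assumptions, not a tested installer: the variable names (REPO_URL, BRANCH, APP_DIR, DRY_RUN) and the run helper are invented for this example, the defaults point at the master branch mentioned above, and it uses mkdir -p rather than plain mkdir. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: reinstall Starterator from a chosen repo and branch.
# REPO_URL, BRANCH, APP_DIR, DRY_RUN and run() are names invented for this
# example. Dependencies (step 1) are assumed to be installed already.
REPO_URL="${REPO_URL:-https://github.com/SEA-PHAGES/starterator.git}"
BRANCH="${BRANCH:-master}"
APP_DIR="$HOME/Applications/Starterator"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_starterator() {
    # Each step runs only if the previous one succeeded.
    run rm -rfv "$HOME/.starterator" &&
    run rm -rfv "$APP_DIR" &&
    run mkdir -p "$APP_DIR" &&
    run git clone "$REPO_URL" "$APP_DIR/starterator" &&
    run git -C "$APP_DIR/starterator" checkout "$BRANCH"
}

reinstall_starterator
```

To try the pre-computing branch instead, set REPO_URL=https://github.com/cdshaffer/starterator.git and BRANCH=byphagewithbase before running it, and set DRY_RUN=0 only once the printed commands look right, since the rm steps delete the existing install. Step 6 (bash starterator.sh from inside the cloned folder) is left as a manual step.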
Edited 13 Feb, 2017 17:04
| posted 11 Feb, 2017 17:58
Thanks, Chris. I'll give it a go with your latest code and let you know if I run into any errors.
| posted 11 Feb, 2017 18:23
Hi Chris,

I was able to easily install the byphagewithbase branch on the 2017 SEA VM, and it runs. However, the output produced is not identical to your pre-run reports with respect to the text. On my installation, I get "Phages represented in each track" output, but that is where the report ends. I don't get "Summary of Final Annotations" sections. Any thoughts?

In the instructions above, there was no need to install PyPDF2 or requests. They were already there in the default configuration.

Also, one other note on the output. If you could add the track number to the tracks, that would make it easier to identify them rather than having to count down the page. Definitely don't lose the phage names on the track, but maybe begin each of those lines with "track_number: phage name + X".
| posted 13 Feb, 2017 17:05
Aaron,

If you are doing whole phage, then yes, that is the current output for that branch. That branch is stuck in the middle of being updated: I have removed the old code for whole phage reports but have not had time to put in the new code that would create output similar to the pre-computed pham reports, with extra text based on which whole phage is being analyzed. Welcome to the cutting edge, where you often get cut.

I did create a "whole phage" report manually by extracting all the pham numbers for the phage using command line mysql, creating all the pham PDFs using command line Starterator, and finally concatenating all the PDFs together using command line Ghostscript. This creates a single PDF with all the pham reports for that phage, but it is still missing the first few header pages with the map and the list of suggested starts. If you want more details on the exact commands, let me know.
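
As a rough illustration of the last step only (the concatenation), something along these lines should work with Ghostscript's pdfwrite device. This is a sketch under assumptions, not the exact commands used: the function name merge_pham_reports is invented, and it assumes the per-pham PDFs sit in the current directory with the same PhamNNNReport.pdf naming as the web reports. The mysql and command line Starterator steps are not shown.

```shell
#!/bin/sh
# Sketch only: merge per-pham PDFs into one file with Ghostscript.
# merge_pham_reports is a hypothetical name. Input files are assumed to be
# named Pham<N>Report.pdf in the current directory (local names may differ).
merge_pham_reports() {
    merged_output="$1"
    shift
    # Swap each pham number in the argument list for its PDF file name:
    # append the file name, then drop the number from the front.
    for pham in "$@"; do
        set -- "$@" "Pham${pham}Report.pdf"
        shift
    done
    # -dBATCH and -dNOPAUSE make gs run non-interactively;
    # the pdfwrite device writes all inputs into one PDF, in order.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
       -sOutputFile="$merged_output" "$@"
}

# Example call (the pham numbers are just the ones used earlier in this thread):
# merge_pham_reports WholePhage.pdf 66 21476
```

The reports end up in the merged file in the order the pham numbers are given, so listing them in genome order keeps the result easier to page through.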

I like your idea of adding the track No. to the text on the track. I have added this as an issue on the starterator github page here:

https://github.com/SEA-PHAGES/starterator/issues/26

Not sure when I will have time to get back to coding on starterator, hopefully some over spring break.
Edited 13 Feb, 2017 17:50
| posted 14 Feb, 2017 17:50
I realized that even though I don't have time now to write code that creates phage-specific Starterator text for whole phage reports, something is better than nothing. So I just cloned the code I used for creating the text of the individual pham reports into the code for whole phage. I ran a test with phage Amgine, and you can download the results of that whole phage report here.

This solution is not ideal, as the new individual pham reports give more details on each and every gene in the pham, so individual pham reports on large phams can get quite large. When you put many of those together to make a whole phage report, the file can get very big. For example, the whole phage Amgine report is over 1100 pages long. However, this may work better than going to the numerous pham reports on the web. I would suggest using a PDF viewer with good searching tools to quickly find things.

If anyone wants the updated code you just need to pull and run the most current version of the byphagewithbase branch from my github repo.
 