
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by DanRussell

Posted 12 Aug, 2021 19:19
byrumc@cofc.edu
Quick question… During assembly of the genomes, is there a program used to trim the reads, or is there no need to trim the 150-bp single-end reads before looking at them in Newbler? Thanks!

Christine

Hi Christine,

We usually don't bother trimming reads for phage assembly, since it's generally fairly straightforward, so raw reads are totally fine.

We do use skewer to trim reads for our bacterial assemblies, partly because those are often 300-base reads, which more frequently have lower quality toward their ends.

https://github.com/relipmoc/skewer
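To illustrate what quality trimming of read ends means, here is a minimal Python sketch of a simple 3' tail trim: it drops bases from the end of a read while their Phred score is below a cutoff. This is not skewer's algorithm; the Phred+33 encoding, the cutoff of 20, and the function names are assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Remove bases from the 3' end while their quality is below min_q.

    Hypothetical helper for illustration; skewer's own trimming works differently.
    """
    scores = phred_scores(quality, offset)
    end = len(seq)
    # Walk back from the last base until we reach one at or above the cutoff
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33
print(trim_3prime("ACGTAC", "IIII##"))  # prints ('ACGT', 'IIII')
```

A fixed-threshold tail trim like this is the simplest possible approach; dedicated trimmers also handle adapters and use more careful quality models.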

–Dan
Posted in: Newbler > Getting Started with Phage Assembly
Posted 15 Jul, 2021 14:25
Hey Evan,

You make a good case as to why this one merits a check as a potential sequencing error, and it has some of those red flags (different from similar genomes, breaks a gene). But I just checked the sequencing data and see this:

[Screenshot of the read alignment at this position]

The base in question is a couple of bases to the right of the green line, and the "A" called there is very strongly supported, with no conflicting reads. So it's a real biological thing!

–Dan
Posted in: Annotation > Is this a sequencing error?
Posted 22 Jun, 2021 13:17
Hi all,

This is partly my fault! Since we switched to a new web-hosting company during the winter break, I haven't managed to get all of my update scripts working again, so I've had to update to each new Phamerator database manually and have often been behind.

But I just updated the Phamerator script on PhagesDB and tried to make sure it's correct going forward, so hopefully there won't be any more major issues.

–Dan
Posted in: Starterator > Pham not found in Starterator
Posted 25 May, 2021 16:54
Hi Kathleen,

Looks like that screenshot shows someone using a Windows VM on a Mac, so they can't use Chris's or Debbie's procedures, which are Mac-specific. Instead, they can follow these directions to connect to an FTP site from Windows:
https://www.howtogeek.com/272176/how-to-connect-to-ftp-servers-in-windows-without-extra-software/

Basically:
1. In Windows, open up a File Explorer window either using the Start Menu or clicking the little folder icon at the bottom of the screen.
2. In that File Explorer window, click on "This PC" on the left.
3. Right-click in the blank area under "Devices and drives", and select "Add a network location" from the menu.
4. Then click "Choose a custom network location".
5. When it asks for an internet address, put "ftp://cobamide2.bio.pitt.edu/", then on the next screen make sure "Log in anonymously" is checked.
6. Complete the wizard, leaving the option to open the new location now selected.
7. You should now see a folder open with the contents of that FTP site. Open the "DNAMas" folder, find the file "dna master.exe", and drag it to your desktop.

Then you should be able to install! Much more complicated than it used to be, but hopefully this will work.

–Dan
Posted in: DNA Master > DNA master server down?
Posted 26 Mar, 2021 13:09
Hi Kyle,

Yes, the way I most commonly "downsample" is by using a simple "head" command, which should be available on almost any Unix/Linux system. So if PhageReads.fastq is your big file of all reads, you'd do something like the following:

head -n 400000 PhageReads.fastq > 100k_PhageReads.fastq

You can play with the exact number. "head" just gives you the first "n" lines of the file. Each read in a fastq file is stored in 4 lines, and so when I ask for 400,000 lines, I'm getting 100,000 reads. If you use 1,000,000 in the command, you'll get 250,000 reads.

The ">" tells it to store the output in a new file, which you can name whatever you want. Then you can use that new file to move through the assembly process.
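For anyone who would rather do this in Python than in the shell, here is a minimal sketch of the same idea: keep the first 4 × N lines to get the first N reads. The function names are hypothetical, and like `head`, this keeps the first reads in the file rather than a random subset.

```python
from itertools import islice

LINES_PER_READ = 4  # FASTQ record: @header, sequence, "+" line, quality string


def head_reads(lines, n_reads):
    """Return the first n_reads FASTQ records as a list of lines."""
    return list(islice(lines, LINES_PER_READ * n_reads))


def downsample_fastq(in_path, out_path, n_reads=100_000):
    """Same idea as: head -n <4 * n_reads> in_path > out_path"""
    with open(in_path) as src, open(out_path, "w") as dst:
        # islice stops reading once it has enough lines, so the rest of the file is never read
        dst.writelines(islice(src, LINES_PER_READ * n_reads))
```

For example, `downsample_fastq("PhageReads.fastq", "100k_PhageReads.fastq", 100_000)` writes the same 400,000 lines as the `head -n 400000` command.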

Good luck!
–Dan
Posted in: Newbler > Getting Started with Phage Assembly
Posted 22 Feb, 2021 15:51
Hi Greg,

Yes, indeed, we had to move everything to a new location when our old web hosting company was acquired by G*D***y. Here's the new URL for our databases:

http://databases.hatfull.org/Actino_Draft/

Specifically, if you want the same database that's current in Phamerator and PhagesDB:

http://databases.hatfull.org/Actino_Draft/Actino_Draft.sql

(Don't click that link, it'll download the whole database!)

Take care,
–Dan
Posted in: Phamerator > Connecting to Actinobacteriophage Database
Posted 27 Jan, 2021 19:34
Hi Chris,

As Debbie said, the key difference with BLASTx is that it checks all possible reading frames, not just a single one. This definitely has its uses, which are different from those of BLASTp. For example, if you have a gap in a phage annotation where no genes have been called, but there appear to be two or three possible ORFs in that region, you could BLASTx that nucleotide sequence and check all those ORFs at once. Or if you had a gene with a suspected intron and wanted to check all the frames at once for coding elements with matches in the database, you could use BLASTx.

DNA Master is actually sending amino acid sequence to the BLAST program, which checks that against an amino acid database, hence it's a BLASTp. If it were sending nucleotide sequence, BLAST itself would do the translation into all frames and then it would be BLASTx.

Another important distinction is that when you are using BLASTp, you've already made some choices about how your nucleotide sequence is translated. Not just the frame (which is very important of course, as mentioned above!) but also where it starts. A BLASTp search will just show you matches for the portion you've already decided is the gene, whereas a BLASTx will take all the ORFs (in their largest possible versions) and do lots of searches.
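To make the frame point concrete, here is a minimal Python sketch of the six-frame translation that BLASTx performs on a nucleotide query before searching a protein database: three frames on the forward strand and three on the reverse complement. It only does the translation step, not the search, assumes plain A/C/G/T input, and uses the standard codon assignments (which bacterial translation table 11 shares). The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order (TTT, TTC, TTA, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations of all six reading frames, keyed +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA")["+1"])  # prints MA*
```

A BLASTp query corresponds to picking just one of these frames (and one start), which is why BLASTx can reveal coding potential that a single annotated gene product would miss.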

While checking results from mass spec is definitely one use for BLASTp, I'm sure that the vast majority of both BLASTp queries and BLASTp results are bioinformatically predicted rather than experimentally verified.

Hope that makes some sense!
–Dan
Posted in: DNA Master > DNA Master Blast type
Posted 27 Jan, 2021 14:40
Hi Chris,

If you BLAST a gene in DNA Master, you're actually BLASTing the gene product. (Since DNA Master has annotated genes, it's using those annotated amino acid sequences as the queries for BLAST.) So it's a BLASTp in DNA Master. Same for PECAAN, it's running BLASTp on PhagesDB or NCBI using the currently-annotated gene product as the query.

The only BLAST (I think) in Phamerator is the pairwise-genome comparison that shows coloration between genomes. That's a BLASTn of the genomic nucleotide sequences.

–Dan
Posted in: DNA Master > DNA Master Blast type
Posted 20 Nov, 2020 15:06
QUANTITATIVE BIOLOGIST - Assistant Professor of Teaching

Description: The University of California, Davis invites applications for an Assistant Professor of Teaching position in quantitative biology education. Professors of Teaching are Academic Senate faculty members whose expertise and responsibilities center on undergraduate education and scholarly analysis and improvement of teaching methods. The official title is Lecturer with the Potential for Security of Employment (LPSOE). LPSOEs are eligible for promotion to “Security of Employment,” which is analogous to tenure. The position will reside in the Department of Evolution and Ecology and be affiliated with the planned interdisciplinary major in Quantitative Biology, which seeks to serve as a national model for quantitative biology education. The appointee will be expected to carry out a high-level teaching program focused on the development of innovative curricula and use of effective teaching methods in quantitative biology. The appointee also will be expected to conduct scholarly research, which for this position is expected to include rigorous development and assessment of teaching methods and curricula in quantitative biology. The appointee will teach four undergraduate courses per year, including Mathematics or Statistics courses enrolling biology students. The appointee will also provide service to the planned Quantitative Biology major and the Department of Evolution and Ecology. Service expectations include, but are not limited to, undergraduate quantitative biology curriculum development, administration, and assessment, as well as outreach activities promoting the planned Quantitative Biology major. Academic scholarship expectations include dissemination of educational findings at a national level through peer-reviewed publications and conference presentations, and the potential to attract extramural funding to support this scholarship. In addition, the appointee should be committed to mentoring and fostering diversity, equity, and inclusion.

UC Davis, located approximately one hour from the San Francisco Bay Area, has a large and highly collaborative community of Teaching Professors within the College of Biological Sciences and throughout the larger campus, a strong history of interdisciplinary educational initiatives in STEM fields, and a nationally renowned Center for Educational Effectiveness that together provide an excellent environment for work in quantitative biology education.

To ensure full consideration, completed applications should be received by November 30, 2020.

For questions about the position, please contact Mark Goldman (msgoldman@ucdavis.edu) or Sebastian Schreiber (sschreiber@ucdavis.edu).

For more details and to apply, please see:
https://recruit.ucdavis.edu/JPF03860
Posted in: General Message Board > Job Opportunity at UC Davis
Posted 11 Nov, 2020 15:51
Hi Kyle,

Good questions! There are a few resources that might be helpful here. One is that I wrote a small software package that helps streamline some of the assembly/QC process for phage genomes. It's called phageAssembler and is on github.

https://github.com/SEA-PHAGES/phageAssembler

It's only really meant to be installed on the 2017 SEA Virtual Machine. (I didn't really spend the time to make it thoroughly cross-platform.) But it should work there if you follow the Quick Start instructions. Because Newbler and consed are already installed on the SEA VM, it can use those installations and basically does the following:

INPUT: fastq file
1. Downsample reads from your fastq file to get a workable number (default 80,000)
2. Assemble those reads with Newbler
3. Report #s of contigs & sizes
4. BLAST large contigs against a phage database and report possible cluster
5. Attempt to locate base 1 by similarity to genomes in the database
6. Report coverage and GC% of assembled contigs
7. Run AceUtil to search and tag assembly weak areas
8. Create consed-ready file for review
9. Write findings to a log file
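The GC% and coverage numbers from step 6 are easy to sanity-check by hand. Here is a minimal Python sketch, assuming you have the contig as a sequence string and estimating depth from read counts alone; this is not necessarily how phageAssembler calculates them, and the function names are hypothetical.

```python
def gc_percent(seq):
    """Percent G+C among unambiguous bases; N and other characters are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def mean_coverage(n_reads, read_length, contig_length):
    """Rough average depth, assuming every read comes from this one contig."""
    return n_reads * read_length / contig_length
```

For example, the default 80,000 reads at 150 bases each over a hypothetical 50,000-bp genome gives `mean_coverage(80_000, 150, 50_000)` = 240.0, i.e. roughly 240-fold coverage.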

You can certainly do all those steps independently if you'd like to learn the process, but this script kind of gets you to the actual analysis part, skipping a lot of the need to learn command-line stuff for many different programs.

The second resource is a chapter I wrote that details the whole process:
https://pubmed.ncbi.nlm.nih.gov/29134591/

(If you can't get access, I can share the manuscript.) It's a more general look at what things you need to think about when sequencing and finishing phage genomes.

Finally, there are some video tutorials I made that walk through some of the assembly/finishing process. These are a bit old and potentially outdated, but probably still have some useful info if you want to do more of the steps yourself.
https://phagesdb.org/workflow/Sequencing/

And also, if you do sequence/assemble your own, we would definitely like to double-check them and include them in PhagesDB. To do so, we'd need your final sequence file and the sequencing reads.

Hope that helps!
–Dan
Edited 11 Nov, 2020 16:00
Posted in: Newbler > Getting Started with Phage Assembly