
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by DanRussell

| posted 17 Nov, 2021 16:24
Hi Kyle,

Very interesting stuff! We have some Nanopore experience as well, but I'm pretty wary of its readiness to be a one-technology phage-sequencing option. In our most recent runs using a previously-sequenced (known) phage, single reads are around 89% accurate, but even high-coverage assemblies are still only around 98-99% accurate. Obviously, that means that 1 in every 50-100 bases would be wrong or gapped, even after lots of coverage, and that's not good enough to consider a phage "sequenced" or proceed with annotation.

(Side note: many of the remaining errors were 1-2 base insertions/deletions, so they'd definitely throw a wrench in annotation.)
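
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 50,000 bp genome length is an assumed round figure for illustration only (not a measurement from our runs), and the function names are made up for this example.

```python
def bases_per_error(accuracy):
    """Average number of bases between errors at a given per-base accuracy."""
    return round(1 / (1 - accuracy))


def expected_errors(genome_length, accuracy):
    """Expected number of wrong or gapped bases in a consensus of this length."""
    return round(genome_length * (1 - accuracy))


GENOME_LENGTH = 50_000  # assumed, illustrative phage genome size in bp

for accuracy in (0.98, 0.99):
    print(accuracy, bases_per_error(accuracy), expected_errors(GENOME_LENGTH, accuracy))
```

For a genome of that assumed size, 98-99% consensus accuracy works out to somewhere between 500 and 1,000 errors.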

That said, technologies improve over time, as does the software to make sense of their raw data. To really feel confident that an only-Nanopore-sequenced phage genome is reliable, we'd need to do several phages with known sequences and compare the Nano output to the reference. Chris actually did this with PacBio sequencing a bunch of years ago, and convinced me that when using the proper type of PacBio reads with enough coverage, you could trust a final sequence that came out of PacBio.

You're right that, while Illumina-Nano hybrid assemblies have been great for bacterial sequencing, they're overkill for phages. Since almost all phages assemble fine with Illumina reads only, the Nanopore isn't necessary. But that doesn't mean it can't have a use in phage research or a SEA-PHAGES classroom. For example, it's probably economically feasible (and cool) for students to each get a little bit of Nanopore data for their phages, and then you could use that to decide which ones to send for Illumina sequencing, or add a Cluster to the phage's profile.

We'll be talking about this stuff more at the next virtual faculty meeting! I think it's slated for Dec 17th; hopefully you'll be free.

Quick question: which Nanopore library prep kit did you use for your phage sequencing?

–Dan
Posted in: Sequencing, Assembling, and Finishing Genomes > Nanopore
| posted 21 Sep, 2021 20:04
Hi Steve,

Jeffrey has made a couple of changes that should help:

  1. Moved the DNA Master install file to an HTTP address instead of an FTP address to make the initial download easier
  2. Changed the default FTP mode within DNA Master from Active to Passive

The first will help with downloading the installer, while the second should make the first couple of updates work on most systems. That said, the usual caveats still apply: run as Admin, make sure preferences are correct, etc. Not sure why yours suddenly stopped working, but mine seems to be okay.

–Dan
Edited 21 Sep, 2021 20:05
Posted in: DNA Master > DNA master server down?
| posted 12 Aug, 2021 19:19
byrumc@cofc.edu
Quick question…During assembly of the genomes, is there a program used to trim the reads or is there no need to trim the 150-bp single end reads before looking at them in Newbler? Thanks!

Christine

Hi Christine,

We don't usually bother trimming the reads for phage assembly, since the assembly tends to be fairly straightforward. So raw reads are totally fine.

We do use skewer to trim reads for our bacterial assemblies, partly because those are often 300-base reads and have lower quality more frequently towards their ends.

https://github.com/relipmoc/skewer
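
For anyone curious what trimming low-quality read ends means in practice, here is a minimal Python sketch. It is NOT skewer's algorithm, just a simple illustration that strips trailing bases whose quality falls below a cutoff. It assumes the usual Phred+33 quality encoding, and the function name and the cutoff of 20 are arbitrary choices for this example.

```python
PHRED_OFFSET = 33  # assumes Phred+33 encoded quality strings


def trim_3prime(seq, qual, min_q=20):
    """Drop trailing bases whose quality score is below min_q.

    Simplified illustration only; real trimmers such as skewer use
    more sophisticated methods.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2, so the last three bases are removed.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
print(trimmed_seq, trimmed_qual)
```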

–Dan
Posted in: Newbler > Getting Started with Phage Assembly
| posted 15 Jul, 2021 14:25
Hey Evan,

You make a good case as to why this one merits a check as a potential sequencing error, and it has some of those red flags (different from similar genomes, breaks a gene). But I just checked the sequencing data and see this:

[Screenshot of the aligned sequencing reads at this position; image not preserved]

The base in question is a couple to the right of the green line, and the "A" called there is really strongly supported with no conflicting reads. So it's a real biological thing!

–Dan
Posted in: Annotation > Is this a sequencing error?
| posted 22 Jun, 2021 13:17
Hi all,

This is partly my fault! Since we switched to a new web-hosting company back during the winter break, I haven't managed to get all of my update scripts working again, so I've had to load each new Phamerator database manually and have often been behind.

But I just tried to update the Phamerator script on PhagesDB and make sure it's correct going forward, so hopefully there won't be major issues anymore.

–Dan
Posted in: Starterator > Pham not found in Starterator
| posted 25 May, 2021 16:54
Hi Kathleen,

Looks like that screenshot shows someone using a Windows VM within a Mac, so they can't use Chris or Debbie's procedures, which would be Mac-specific. Instead, they can follow the directions here to connect to an FTP site from Windows:
https://www.howtogeek.com/272176/how-to-connect-to-ftp-servers-in-windows-without-extra-software/

Basically:
1. In Windows, open up a File Explorer window either using the Start Menu or clicking the little folder icon at the bottom of the screen.
2. In that File Explorer window, click on "This PC" on the left.
3. Right-click in the blank area under "Devices and drives", and select "Add a network location" from the dropdown menu.
4. Then click "Choose a custom network location".
5. When it asks for an internet address, put "ftp://cobamide2.bio.pitt.edu/", then on the next screen make sure "Log on anonymously" is checked.
6. Complete the wizard with the open now option selected.
7. You should now see a folder open with the contents of that FTP site. Open the "DNAMas" folder, find the file "dna master.exe", and drag it to your desktop.

Then you should be able to install! Much more complicated than it used to be, but hopefully this will work.

–Dan
Posted in: DNA Master > DNA master server down?
| posted 26 Mar, 2021 13:09
Hi Kyle,

Yes, the way I most commonly "downsample" is by using a simple "head" command, which should be available on almost any Unix/Linux system. So if PhageReads.fastq is your big file of all reads, you'd do something like the following:

head -n 400000 PhageReads.fastq > 100k_PhageReads.fastq

You can play with the exact number. "head" just gives you the first "n" lines of the file. Each read in a fastq file is stored in 4 lines, and so when I ask for 400,000 lines, I'm getting 100,000 reads. If you use 1,000,000 in the command, you'll get 250,000 reads.

The ">" tells it to store the output in a new file, which you can name whatever you want. Then you can use that new file to move through the assembly process.
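
If "head" isn't available (for example, on a Windows machine), the same subsetting can be sketched in Python. This is only an illustration of the idea above: the function name is made up for this example, and like "head" it takes the first reads in the file rather than a random sample.

```python
from itertools import islice

LINES_PER_READ = 4  # each FASTQ record is stored in 4 lines


def head_reads(in_path, out_path, n_reads):
    """Copy the first n_reads records from in_path to out_path.

    Returns the number of lines written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in islice(src, n_reads * LINES_PER_READ):
            dst.write(line)
            written += 1
    return written


# Example call, using the file names from the command above:
# head_reads("PhageReads.fastq", "100k_PhageReads.fastq", 100_000)
```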

Good luck!
–Dan
Posted in: Newbler > Getting Started with Phage Assembly
| posted 22 Feb, 2021 15:51
Hi Greg,

Yes, indeed, we had to move everything to a new location when our old web hosting company was acquired by G*D***y. Here's the new URL for our databases:

http://databases.hatfull.org/Actino_Draft/

Specifically, if you want the same database that's current in Phamerator and PhagesDB:

http://databases.hatfull.org/Actino_Draft/Actino_Draft.sql

(Don't click that link, it'll download the whole database!)

Take care,
–Dan
Posted in: Phamerator > Connecting to Actinobacteriophage Database
| posted 27 Jan, 2021 19:34
Hi Chris,

As Debbie said, the key difference with BLASTx is that it's checking all possible reading frames, not just a single one. This definitely has its uses, which are different from those of BLASTp. For example, if you have a gap in a phage annotation where no genes have been called, but there appear to be two or three possible ORFs in that region, you could BLASTx that nucleotide sequence and check all those ORFs at once. Or if you had a gene with a suspected intron and wanted to check all the frames at once for coding elements with matches in the database, you could use BLASTx.

DNA Master is actually sending amino acid sequence to the BLAST program, which checks that against an amino acid database, hence it's a BLASTp. If it were sending nucleotide sequence, BLAST itself would do the translation into all frames and then it would be BLASTx.

Another important distinction is that when you are using BLASTp, you've already made some choices about how your nucleotide sequence is translated. Not just the frame (which is very important of course, as mentioned above!) but also where it starts. A BLASTp search will just show you matches for the portion you've already decided is the gene, whereas a BLASTx will take all the ORFs (in their largest possible versions) and do lots of searches.
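
To make "all frames" concrete, here is a small Python sketch of six-frame translation: three frames on the forward strand and three on the reverse complement. It is not how BLAST is implemented, just an illustration of what gets translated before the protein search. The function names are made up for this example, and it uses the standard codon assignments.

```python
from itertools import product

BASES = "TCAG"
# Standard codon assignments, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Translate a nucleotide sequence in all six reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translations("ATGAAATAG").items():
    print(frame, protein)
```

A BLASTp query corresponds to just one of these translations, with a start position you've already chosen.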

While checking results from mass spec is definitely one use for BLASTp, I'm sure that the vast majority of both BLASTp queries and BLASTp results are bioinformatically predicted rather than experimentally verified.

Hope that makes some sense!
–Dan
Posted in: DNA Master > DNA Master Blast type
| posted 27 Jan, 2021 14:40
Hi Chris,

If you BLAST a gene in DNA Master, you're actually BLASTing the gene product. (Since DNA Master has annotated genes, it's using those annotated amino acid sequences as the queries for BLAST.) So it's a BLASTp in DNA Master. The same goes for PECAAN: it's running BLASTp on PhagesDB or NCBI using the currently-annotated gene product as the query.

The only BLAST (I think) in Phamerator is the pairwise-genome comparison that shows coloration between genomes. That's a BLASTn of the genomic nucleotide sequences.

–Dan
Posted in: DNA Master > DNA Master Blast type