SEA-PHAGES | Nanopore

Link to this post | posted 10 Nov, 2021 17:26

kmaclea

So, I've recently sequenced a couple of actinophages with Nanopore. I know Nanopore has high error rates, and there are several ways to deal with this, including of course hybrid short-read assemblies, and also polishing of Nanopore data.

In this case, though, re-sequencing with Illumina would seem to defeat much of the point of using Nanopore, since Illumina-only assembly is fairly trivial most of the time.

In the latest example, I have a new phage that looks to be a member of cluster AW. My longest read (about 115 kb) is a little more than twice what I would expect for the genome size, so I assume I basically have two copies in there. There are actually three such reads, and they all BLAST to cluster AW.

I'm wondering if we can have a discussion about the use of Nanopore data in the process. Has this been done before? I don't see any mentions on the Forum of Nanopore or MinION or similar.

Any thoughts?

Kyle

–
Kyle MacLea
Associate Professor, University of New Hampshire at Manchester
kyle.maclea@unh.edu +1 603-641-4129

Link to this post | posted 10 Nov, 2021 17:27

kmaclea

I should note I have over 9000 reads longer than 50 kb in my data set as well. Lots of long reads. Polishing should be fruitful to reduce the error rates in the data.

K

Link to this post | posted 11 Nov, 2021 20:24

cdshaffer

Very cool. I think it could be fun to set it up so students could do their own sequencing on a Nanopore sometime late in the first semester. These genomes are so small that I think we could get enough data even on the smallest (i.e. cheapest) of the Nanopore sequencers. Which did you use? Were you using the standard MinION or one of the smaller Flongles? Or was this outsourced on a GridION? Thanks for being the initial test subject.

My only comment is that I thought that, at some rate, the pore picks up the second strand and starts sequencing it right after the first. Does your single long 115k read look like an inverted repeat? That is what I would anticipate if you were reading the second strand after pulling the first strand through the pore.
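One rough way to look for that inverted-repeat signature is sketched below in Python. It is only an illustration under stated assumptions: the function names (`half_similarity` and friends), the choice of k = 11, and the idea of splitting the read at its midpoint are all hypothetical choices made here, not something anyone in this thread ran. It compares the k-mers of the first half of a read with the second half, both directly and as reverse complements.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def kmers(seq: str, k: int) -> set:
    """Collect every k-length substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def half_similarity(read: str, k: int = 11) -> tuple:
    """Compare the two halves of a read.

    Returns (direct, inverted): the fraction of first-half k-mers found
    unchanged in the second half, and the fraction whose reverse
    complement is found in the second half.
    """
    read = read.upper()
    mid = len(read) // 2
    first = kmers(read[:mid], k)
    second = kmers(read[mid:], k)
    if not first:
        return 0.0, 0.0
    direct = sum(km in second for km in first) / len(first)
    inverted = sum(reverse_complement(km) in second for km in first) / len(first)
    return direct, inverted
```

An inverted score well above the direct score would be consistent with the second strand following the first through the pore; the opposite pattern would point to a head-to-tail concatemer instead. With raw Nanopore error rates, exact k-mer matches will be much rarer than in clean sequence, and the junction need not sit at the midpoint, so a self-alignment with a proper long-read aligner is probably the more reliable check. An odd k is used so that no k-mer is its own reverse complement.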

Have you tried assembly yet? Do you get a single contig of the size expected for an AW phage? If so, I would think you could get an estimate of the quality of the genome assembly just by running the draft assembly through auto-annotation. We have such a strong expectation of "tightly packed" genes, and since many assembly/sequencing errors would disrupt genes, I would think an auto-annotation of that preliminary assembly could provide a decent estimate of the quality and of whether Illumina polishing is needed.

Link to this post | posted 11 Nov, 2021 21:06

kmaclea

Great to hear from you, Chris!

I haven't yet done any assembly of the smaller reads to see if we get a single contig of the size expected for an AW phage. My initial guess is just as you propose–the second strand was sequenced directly after the first, and this is why we get the 'doubled' sequence. But I need to look in more detail and prove that out.

Your suggestion to run the auto-annotation to see whether assembly/sequencing errors are disruptive enough to be an issue for annotation is a good one. When I have a moment I will try this and see what I come up with. I will report back.

As for what I was doing–in this case I had access to a bunch of old, post-warranty MinION/GridION flow cells, so I did in fact (waste, er…) use a whole flow cell on my AW phage. Clearly an entire MinION/GridION flow cell is vast overkill. That being said, if you use the Native Barcoding 1-12 and 13-24 kits along with your library preps, you could absolutely sequence (I am convinced) 24 phages on one flow cell, which brings the costs into a manageable range. For the classroom, it may be necessary to have instructional staff do the library preps (just based on the amount of time it takes to DO them), but the students could help with or watch the loading process and see the data coming in in real time. I think it could be a great addition… Perhaps a goal could be to set aside the phages being sent to Pitt for sequencing, and sequence all of your class's OTHER phages, or a subset of them. I have now ordered some Flongles, but I have not tried them yet.

Back-of-envelope estimation: 50 Gb per MinION flow cell vs 2.8 Gb per Flongle. If you ran 24 samples multiplexed on a single MinION flow cell, you would expect about 2.08 Gb per sample on average, which should be plenty of capacity to sequence a single phage, and is comparable to, if a bit lower than, the Flongle capacity. It's currently $1460 for 12 Flongles or $4400 for 48 (about $92-122 per Flongle, i.e. per phage, not including the library prep, which is probably another ~$60 or so per sample). The MinION starter kit is $1000, which I believe comes with one sequencing kit but no barcoding kits, which are $288 each. So let's assume you need to spend $1576 (the starter kit plus two barcoding kits) but can then process 24 phages. The cost for a 12-pack of Flongles is then about $182/phage, and for a multiplexed MinION run about $66/phage. (This also doesn't include that you ALSO need to have purchased a MinION to use the Flongles with, so the initial cost for Flongle use should be incremented by $1000, but then you would ALSO have another whole flow cell, and the library prep costs could be covered if you used what came with your starter kit.)
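For anyone who wants to redo this arithmetic with their own quotes, here is a minimal Python sketch of the same calculation. The prices and yields are simply the figures from the paragraph above (a November 2021 snapshot); the constant and function names are made up for illustration, and the assumption that two barcoding kits cover 24 samples follows the 1-12 and 13-24 kits mentioned earlier.

```python
# Figures as quoted in the post above; swap in current prices as needed.
FLONGLE_PACK_PRICES = {12: 1460, 48: 4400}  # USD per pack, keyed by pack size
LIBRARY_PREP_PER_SAMPLE = 60                # USD, the post's rough guess
MINION_STARTER_KIT = 1000                   # USD
BARCODING_KIT = 288                         # USD each; two kits cover barcodes 1-24
MINION_FLOW_CELL_YIELD = 50.0               # yield per MinION flow cell, as quoted
MULTIPLEXED_SAMPLES = 24


def flongle_cost_per_phage(pack_size: int) -> float:
    """One Flongle per phage, plus a separate library prep for each sample."""
    return FLONGLE_PACK_PRICES[pack_size] / pack_size + LIBRARY_PREP_PER_SAMPLE


def multiplexed_outlay() -> int:
    """Starter kit plus the two barcoding kits needed for 24 samples."""
    return MINION_STARTER_KIT + 2 * BARCODING_KIT


def multiplexed_cost_per_phage() -> float:
    """Spread the whole outlay over the 24 barcoded phages."""
    return multiplexed_outlay() / MULTIPLEXED_SAMPLES


def multiplexed_yield_per_sample() -> float:
    """Average share of the flow cell yield per barcoded sample."""
    return MINION_FLOW_CELL_YIELD / MULTIPLEXED_SAMPLES


if __name__ == "__main__":
    print(f"Flongle, 12-pack: ${flongle_cost_per_phage(12):.2f} per phage")
    print(f"Flongle, 48-pack: ${flongle_cost_per_phage(48):.2f} per phage")
    print(f"Multiplexed MinION outlay: ${multiplexed_outlay()}")
    print(f"Multiplexed MinION: ${multiplexed_cost_per_phage():.2f} per phage")
    print(f"Yield per multiplexed sample: {multiplexed_yield_per_sample():.2f}")
```

The results land within a dollar of the per-phage figures quoted in the post. Note that the multiplexed figure, like the post, leans on the sequencing kit that comes with the starter kit and does not add a separate per-sample library prep cost, so the two numbers are not a perfectly like-for-like comparison.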

One intangible, though–with Flongles each student can actually load the Flongle mini-flow cell themselves, which is a valuable experience. However, that also means finding the TIME for it during the semester if you have a finite number of MinIONs to use your Flongles with. Each Flongle runs for about 16 hours, so you would need to run them consecutively: basically one Flongle sample per calendar day per MinION device you own. I suspect that with a decent number of students this will be unsustainable in terms of the days available in the semester.

So I have now convinced myself that the multiplexed MinION would be the way to go for large classes at least. For situations where very small numbers of students are involved, the Flongles may be a good angle.

Regardless–I completely agree that this could be a really valuable addition.

Once I have a sense of what error rates look like (and what Nanopore-only-polishing does for those error rates) I think I will be able to give some better answers to your queries.

But I think this could be really enjoyable and beneficial for the students!

Kyle


Link to this post | posted 17 Nov, 2021 15:28

kmaclea

cdshaffer
Have you tried assembly yet? Do you get a single contig of the size expected for an AW phage? If so, I would think you could get an estimate of the quality of the genome assembly just by running the draft assembly through auto-annotation. We have such a strong expectation of "tightly packed" genes, and since many assembly/sequencing errors would disrupt genes, I would think an auto-annotation of that preliminary assembly could provide a decent estimate of the quality and of whether Illumina polishing is needed.

I did the original single-phage Nanopore run that I described above, and then I did a multiplexed Nanopore run with 8 phages on one flow cell. If anything, the multiplex was BETTER for assembly–because there were fewer reads for each phage, the phages assembled quickly and easily.

Now–polishing is the next question. You can polish Nanopore data with just Nanopore reads (racon and other programs). It may be that this is sufficient versus doing an Illumina "polish." I suspect more Illumina on top of Nanopore is really overkill, so I want to spend some time on the Nanopore-only polishing steps for the phages I've done so far. But if those go well I think this could be really interesting.
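As a rough illustration of what a Nanopore-only polish with racon can look like, here is a small Python sketch that builds the command lines for a couple of rounds. It assumes `minimap2` and `racon` are installed and on the PATH; the file names, the `polish` working directory, the number of rounds, and the helper names `plan_polishing` and `run_steps` are all hypothetical choices for this example, not a description of what was actually run on these phages.

```python
import subprocess
from pathlib import Path


def plan_polishing(reads, draft, rounds=2, workdir="polish", threads=4):
    """Build minimap2 + racon commands for successive polishing rounds.

    Returns a list of (argv, output_path) pairs. Both tools write their
    result to stdout, so output_path is where stdout should be saved.
    """
    steps = []
    current = draft
    for i in range(1, rounds + 1):
        overlaps = f"{workdir}/round{i}.paf"
        polished = f"{workdir}/round{i}.fasta"
        # Map the raw Nanopore reads back onto the current draft.
        steps.append((["minimap2", "-x", "map-ont", "-t", str(threads),
                       current, reads], overlaps))
        # racon takes, in order: reads, overlaps, target sequence.
        steps.append((["racon", "-t", str(threads),
                       reads, overlaps, current], polished))
        # The polished output becomes the draft for the next round.
        current = polished
    return steps


def run_steps(steps):
    """Run each planned command, saving its stdout to the planned file."""
    for argv, output_path in steps:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Calling `run_steps(plan_polishing("reads.fastq", "draft.fasta"))` would leave the final consensus in `polish/round2.fasta`. Whether one, two, or more rounds helps is exactly the kind of thing that would need checking against a phage with a known sequence.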

Kyle


Link to this post | posted 17 Nov, 2021 16:24

DanRussell

Hi Kyle,

Very interesting stuff! We have some Nanopore experience as well, but I'm pretty wary of its readiness to be a one-technology phage-sequencing option. In our most recent runs using a previously sequenced (known) phage, single reads are around 89% accurate, but even high-coverage assemblies are still only around 98-99% accurate. Obviously, that means that 1 in every 50-100 bases would be wrong or gapped, even after lots of coverage, and that's not good enough to consider a phage "sequenced" or proceed with annotation.

(Side note: many of the remaining errors were 1-2 base insertions/deletions, so they'd definitely throw a wrench in annotation.)
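To put those percentages in terms of a whole genome, here is a small Python sketch. The 50,000 bp genome length and the 1,000 bp gene length are hypothetical round numbers chosen for illustration, not measurements from any phage in this thread, and the calculation assumes errors are independent and evenly spread, which real data will not strictly obey.

```python
def bases_per_error(accuracy: float) -> float:
    """Average spacing between errors at a given per-base accuracy."""
    return 1.0 / (1.0 - accuracy)


def expected_errors(length_bp: int, accuracy: float) -> float:
    """Expected number of wrong or gapped bases in a sequence."""
    return length_bp * (1.0 - accuracy)


def error_free_probability(length_bp: int, accuracy: float) -> float:
    """Chance that a stretch of this length contains no error at all,
    assuming independent errors."""
    return accuracy ** length_bp


if __name__ == "__main__":
    genome_bp = 50_000  # hypothetical genome size
    gene_bp = 1_000     # hypothetical gene length
    for accuracy in (0.98, 0.99):
        print(
            f"accuracy {accuracy:.0%}: "
            f"one error per ~{bases_per_error(accuracy):.0f} bases, "
            f"~{expected_errors(genome_bp, accuracy):.0f} errors per genome, "
            f"P(error-free gene) = {error_free_probability(gene_bp, accuracy):.1e}"
        )
```

Under these assumptions, essentially no gene of that length would come through untouched at 98-99% accuracy, which is in line with the point that indels at this rate would throw a wrench in annotation.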

That said, technologies improve over time, as does the software to make sense of their raw data. To really feel confident that an only-Nanopore-sequenced phage genome is reliable, we'd need to do several phages with known sequences and compare the Nano output to the reference. Chris actually did this with PacBio sequencing a bunch of years ago, and convinced me that when using the proper type of PacBio reads with enough coverage, you could trust a final sequence that came out of PacBio.

You're right that, while Illumina-Nano hybrid assemblies have been great for bacterial sequencing, they're overkill for phages. Since almost all phages assemble fine with Illumina reads only, the Nanopore isn't necessary. But that doesn't mean it can't have a use in phage research or a SEA-PHAGES classroom. For example, it's probably economically feasible (and cool) for students to each get a little bit of Nanopore data for their phages, and then you could use that to decide which ones to send for Illumina sequencing, or add a Cluster to the phage's profile.

We'll be talking about this stuff more at the next virtual faculty meeting! I think it's slated for Dec 17th, hopefully you'll be free.

Quick question: which Nanopore library prep kit did you use for your phage sequencing?

–Dan

Link to this post | posted 17 Nov, 2021 16:41

kmaclea

DanRussell
Hi Kyle,

Very interesting stuff! We have some Nanopore experience as well, but I'm pretty wary of its readiness to be a one-technology phage-sequencing option. In our most recent runs using a previously sequenced (known) phage, single reads are around 89% accurate, but even high-coverage assemblies are still only around 98-99% accurate. Obviously, that means that 1 in every 50-100 bases would be wrong or gapped, even after lots of coverage, and that's not good enough to consider a phage "sequenced" or proceed with annotation. (Side note: many of the remaining errors were 1-2 base insertions/deletions, so they'd definitely throw a wrench in annotation.)

Indels are definitely the biggest problem. My sense speaking with others is that a good polish with the newest software against the raw reads can make this rate quite low, but I agree with you that verification of this seems not only wise but essential to have confidence.

DanRussell
That said, technologies improve over time, as does the software to make sense of their raw data. To really feel confident that an only-Nanopore-sequenced phage genome is reliable, we'd need to do several phages with known sequences and compare the Nano output to the reference. Chris actually did this with PacBio sequencing a bunch of years ago, and convinced me that when using the proper type of PacBio reads with enough coverage, you could trust a final sequence that came out of PacBio.

I agree. I have one phage in hand that I have personally sequenced and assembled from Illumina data (which I need to send you to verify that I have correctly found base 1–expect that soon). I could do that one in Nanopore as a comparison to include in our discussions. I then do have the ones I just did in Nanopore only that we could try to get Illumina data on. Is that something it would make sense for me to pursue separately for verification, or do you think that could fit into any existing runs you have planned?

DanRussell
You're right that, while Illumina-Nano hybrid assemblies have been great for bacterial sequencing, they're overkill for phages. Since almost all phages assemble fine with Illumina reads only, the Nanopore isn't necessary. But that doesn't mean it can't have a use in phage research or a SEA-PHAGES classroom. For example, it's probably economically feasible (and cool) for students to each get a little bit of Nanopore data for their phages, and then you could use that to decide which ones to send for Illumina sequencing, or add a Cluster to the phage's profile.

I had not thought of this as a pre-screen method to find cluster assignment or to select which phage(s) to sequence with Illumina, but that would also make sense. It would make even more sense with DOGEMS–if you could narrow down a group to pool that included no cluster-mates, you could be confident you will get a number of separate assembling sequences out the other end. In our last DOGEMS iteration we had a couple of (I think) AZ phages that would not assemble in the presence of the others. If we had known ahead of time we could have customized the DOGEMS pool to include only one of those, perhaps enabling higher numbers of successful assemblies.

DanRussell
We'll be talking about this stuff more at the next virtual faculty meeting! I think it's slated for Dec 17th, hopefully you'll be free.

I would definitely like to be a part of this discussion. I think that will be a good week for me, and my last day of regular childcare for the year–so I will plan to participate!

DanRussell
Quick question: which Nanopore library prep kit did you use for your phage sequencing?
–Dan

I am in my sabbatical lab right now, which had a bunch of LSK-109 Ligation Sequencing Kits and Rev9 flow cells for me to use. So far I have used LSK-109 on one phage alone (WAAAAAAAY overkill, but the flow cell was expired and no one else would use it, so…) and on one flow cell where I combined LSK-109 with the Native Barcoding Kit 1-12 to multiplex.

I have my own home lab MinION, though, and could use other kits if needed.

Kyle

