The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Getting Started with Phage Assembly

| posted 12 Aug, 2021 20:05
Thanks, Dan! That helps!
Christine
| posted 07 Aug, 2022 17:41
Hi, I have a (circular) genome that was assembled at NCSU, and its closest hit is a phage that was annotated at Wellcome Trust Sanger back in 2010 (NC_015296.1). So, no UPitt assembly QC on either, I'm afraid. Hopefully, someone will help me wrap my head around this.

The strand is completely opposite between the two, and there is a relative inversion in the middle. That is, all my forward genes are listed as complement in the other phage and vice versa. But (with gp210 approximately at the ends of both) the gp numbers are partially inverted (gp1+strand=gp64-strand, gp2+strand=gp63-strand….. gp61-=gp1+; gp62-=gp210+, gp63=gp209….etc). So, it is like there are two giant crossovers if you map them in a Phamerator fashion. I know that BLASTN also searches the reverse complement, but all of these genes are listed as Q1:S1 essentially, not reverse.

So, I think even if I assume that my genome should have gp 61 as gp1, there is still an inversion relative to each other. Otherwise, I would have gp1=gp210, gp2=gp209….

Finally, there are other phages (published more recently but with less sequence conservation) that have the same strand but whose halves are flipped relative to mine: gp1+=gp142+, gp27-=gp168-, gp32-=gp172-.

So, my current hypothesis is that phage A may be in GenBank as reverse complement but that doesn't matter. Coding is strand agnostic, so I think if I had gp1=gp210, gp2=gp209…. for the whole genome it would not matter, right?

Q1: Is it possible that my assembly should be inverted in some way? "Complementing the contig" for the whole genome? Even if I were to base it on the reverse complement, it would still have an inversion compared to the other.

Q2: Or, should I start with gp1 in the middle and flip the halves?

Q3: Alternatively, am I thinking about this whole thing stupidly and there really is no consensus on which strand or where to start gp1?

Thanks.
| posted 07 Aug, 2022 18:04
More info added in bold

So, my current hypothesis is that phage A may be in GenBank as reverse complement but that doesn't matter. Coding is strand agnostic, so I think if I had gp1=gp210, gp2=gp209…. for the whole genome it would not matter, right? But we have about 80% NKF, so large-scale functional synteny is a challenge. Still, the tape measure protein appears to be after the transition from forward to reverse around our gp30. So it is early in the genome relative to our gp1 but not in the forward direction, and a couple of tail fiber proteins are "after" the tape measure as reverse genes. So that would indicate that our assembly is maybe entirely reverse complement compared to the consensus setup?

Q1: Is it possible that my assembly should be inverted in some way? "Complementing the contig" for the whole genome? (Even if I were to base it on the reverse complement, it would still have an inversion compared to the other.)

Q2: Or, should I start with gp1 in the middle and flip the halves?

Thanks.
| posted 08 Aug, 2022 19:10
Q1: No assembler that I know of is aware of scientific standards about which strand should be the top strand and which should be the bottom strand. These standards are set community by community and so vary from one system to another. For example, in eukaryotes we typically use the standard set by the cytologists and how they present whole chromosomes. Thus, in a very high quality assembly (where we probably have evidence for the locations of centromeres and telomeres) we will publish the sequence to match the typical cytological display.

The phagesdb community has standards for determination of base 1 and strand. For all our phages, determination of base 1 depends on the type of phage end structure, while strand is usually picked so that the structural genes are on the top strand and near the beginning of the sequence. Dan posted some videos here with lots of help on this, but you need to be able to look at the raw assembly to answer some of these questions.

Finally, a lot of published sequences are actually sequences of prophages, and base 1 and orientation are set by the location of the insertion site and the standard orientation for the host genome. [Based on your gene matching, I think this is the case for NC_015296.1.] There is a collection of phages like this in the Phamerator database where the order and orientation of the sequence have been changed from the GenBank record so as to match (as best as possible) the typical order in the Phamerator database. This makes drawing and interpreting the comparison maps at phamerator.org much easier.
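To make the effect of a change in orientation concrete, here is a minimal sketch of how gene coordinates and strands map onto the reverse complement of a genome. The `Gene` tuple, the `flip_genes` helper, and the toy coordinates are all made up for illustration; this is not how Phamerator or GenBank store or convert records.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, start <= end
    end: int
    strand: str  # "+" or "-"


def flip_genes(genes: List[Gene], genome_len: int) -> List[Gene]:
    """Remap gene coordinates onto the reverse complement of the genome."""
    flipped = [
        Gene(
            g.name,
            genome_len - g.end + 1,    # old end becomes the new start
            genome_len - g.start + 1,  # old start becomes the new end
            "-" if g.strand == "+" else "+",
        )
        for g in genes
    ]
    # Re-sort by position: the gene order is now reversed.
    return sorted(flipped, key=lambda g: g.start)


# Hypothetical two-gene genome of length 100.
genes = [Gene("gpA", 1, 10, "+"), Gene("gpB", 20, 30, "-")]
flipped = flip_genes(genes, 100)
```

Every forward gene becomes a reverse gene and the numbering runs backwards, which is the pattern you would expect if two records hold the same genome in opposite orientations. The genes themselves are unchanged.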

Q2: I would recommend you set your base 1 and strand using a similar strategy, that is to say, pick the base 1 and orientation to make the subsequent steps of comparison as easy as possible. But that of course depends on what you're comparing your genome to. The good news here is that DNA Master has a nice feature if you want to "roll" the genome around to set a different base 1, as well as the ability to switch to the complementary strand.
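For anyone who wants to see what those two operations do to the sequence itself, here is a rough sketch in plain Python. The function names `roll` and `reverse_complement` are my own and are not DNA Master's implementation. The sketch assumes a circular (or circularly permuted) genome and an unambiguous A/C/G/T alphabet.

```python
def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def roll(seq: str, new_base1: int) -> str:
    """Rotate a circular sequence so that 1-based position new_base1 becomes base 1."""
    if not 1 <= new_base1 <= len(seq):
        raise ValueError("new_base1 must lie within the sequence")
    i = new_base1 - 1
    return seq[i:] + seq[:i]


# Toy example: make position 3 the new base 1, then switch strands.
rolled = roll("AACCGGTT", 3)
other_strand = reverse_complement("AACG")
```

Neither step changes the underlying molecule. They only change where the coordinates start and which strand is written out as the top strand.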
| posted 08 Aug, 2022 20:56
Finally, a lot of published sequences are actually sequences of prophages, and base 1 and orientation are set by the location of the insertion site and the standard orientation for the host genome. [Based on your gene matching, I think this is the case for NC_015296.1.]
Ah, ok, that is interesting. I hadn't suspected that.

I would recommend you set your base 1 and strand using a similar strategy, that is to say, pick the base 1 and orientation to make the subsequent steps of comparison as easy as possible. But that of course depends on what you're comparing your genome to. The good news here is that DNA Master has a nice feature if you want to "roll" the genome around to set a different base 1, as well as the ability to switch to the complementary strand.
Great information, Chris. I'll give that a try. Maybe it will also help us pull out a few more functions for structural genes by synteny. Thank you so much for your help.
 