
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


Genes Across COS sites???

| posted 16 Feb, 2016 23:22
I have not looked yet, but are there any examples of genes spanning the COS site?

We have about 1 kb on the right end of one of our genomes that has no called features.

Has anyone ever looked for genes that span the COS sites? Is there an easy way to do that? I'm just wondering if we should do so. Or if we might be missing genes there if we do not look.

Thanks.
| posted 17 Feb, 2016 17:17
Many Cluster C phages have a gene that spans the physical end. This gives many computer programs fits; it's one of the reasons Starterator crashes on some phages. Phamerator has issues as well (although, thankfully, it does not crash), and the whole genome maps created by Phamerator often don't include genes of that type.

Glimmer (and maybe GeneMark) will predict genes that span the ends if you tell it that you have a circular genome (DNA Master does do this when it submits the sequence to NCBI for auto-annotation). So it is possible they will show up on your auto-annotation list.
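
To illustrate what "treating the genome as circular" means for an ORF search, here is a minimal Python sketch. It is not how Glimmer, GeneMark, or DNA Master work internally; it only joins the last `window` bases to the first `window` bases and scans the forward strand for ORFs that cross the junction. The function name, the start codon set (ATG/GTG/TTG), and the default sizes are assumptions made for the example.

```python
# Hypothetical helper: report forward-strand ORFs that wrap around the
# physical ends of a genome when it is treated as circular.
STARTS = {"ATG", "GTG", "TTG"}   # assumed bacterial/phage start codons
STOPS = {"TAA", "TAG", "TGA"}


def wrap_orfs(seq, window=1000, min_len=150):
    """Return (start, stop, length) for ORFs crossing the end/start junction.

    start and stop are 1-based genome coordinates; start lies near the
    right end of the genome and stop lies near the left end.
    """
    seq = seq.upper()
    n = len(seq)
    window = min(window, n // 2)          # avoid using any base twice
    joined = seq[n - window:] + seq[:window]
    junction = window                     # index in `joined` of genome base 1
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(joined) - 2, 3):
            codon = joined[i:i + 3]
            if start is None and codon in STARTS:
                start = i                 # first start codon in this open frame
            elif start is not None and codon in STOPS:
                end = i + 3               # one past the last base of the stop
                if start < junction < end and end - start >= min_len:
                    genome_start = n - window + start + 1
                    genome_stop = end - window
                    hits.append((genome_start, genome_stop, end - start))
                start = None
    return hits
```

A real search would also need to check the reverse strand and weigh coding potential; this only shows why a linear scan misses such genes while a circular one can find them.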

As for finding them, I always have my students check all "largish" regions without genes (say, larger than 150 bp) by BLAST. You can have DNA Master locate these "holes" automatically: in DNA Master, click the "Validate" button below the feature list, then in the bottom right panel click the "control" tab and then "Locate gray holes" with a size of 150. The resulting list gives the positions and sequences of the "holes", which can then be used as BLASTX queries against the protein database. If students do find hits, I would have them consider the quality of the hit (is it real or spurious?) and examine the region carefully for a missing gene (evidence would include coding potential and the presence of an ORF that does not have too much overlap with other genes).
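
If you want to do the same kind of check outside DNA Master, the idea can be sketched in a few lines of Python. This is only an illustration of the gap-finding logic, not DNA Master's implementation; the function name, the `(start, stop)` input format, and the coordinates in the usage example are made up for the sketch.

```python
# Hypothetical helper: list regions of a genome with no called features.
def find_gaps(features, genome_length, min_gap=150):
    """Return (first_base, last_base) for every uncovered region of at
    least `min_gap` bp, using 1-based inclusive coordinates.

    `features` is a list of (start, stop) pairs; reverse-strand features
    may be given with start > stop.
    """
    # Normalize so start <= stop, then walk the features left to right.
    spans = sorted((min(a, b), max(a, b)) for a, b in features)
    covered_to = 0                        # last base covered so far
    gaps = []
    for start, stop in spans:
        if start - covered_to - 1 >= min_gap:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, stop)
    # Check the right end of the genome as well.
    if genome_length - covered_to >= min_gap:
        gaps.append((covered_to + 1, genome_length))
    return gaps


# Example with invented coordinates: one internal gap and about 1 kb
# with no features at the right end.
print(find_gaps([(100, 500), (900, 520), (1200, 2000)], 3100))
```

The sequence of each reported gap could then be pulled out and used as a BLASTX query, as described above.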
Edited 17 Feb, 2016 18:38
| posted 17 Feb, 2016 19:38
Great info! Thanks again! Greg
 