SEA-PHAGES | All posts created by jross1025

Link to this post | posted 26 Mar, 2019 19:48

Debbie Jacobs-Sera
Joe,
I think you are confusing apples and oranges here. The "standard code" settings have no effect anywhere in auto-annotation. Also, if you have chosen ATG, GTG, and TTG start codons in the Translation tab of the Local Settings in your preferences, DNA Master will display them.

You can select "Bacteria and Plant Plastid Code" in the New Features Tab of the Local Settings. Is that what changes?

What is changing on you?
debbie

Deb:
If you open an auto-annotation and go to the Features tab, Description sub-tab, there's a drop-down menu called "translation table"; that's what I'm referring to. "Bacteria and Plant Plastid code" is what is chosen in preferences under "New Features", but "standard code" is what's displayed when you actually run the auto-annotation. I'm really just trying to determine whether this is going to cause problems later on. I suspect it isn't, but it's worrying…
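For context on what the two labels mean, here is a small Python sketch built from NCBI's published translation tables 1 ("Standard") and 11 ("Bacterial, Archaeal and Plant Plastid"). It assumes, without having checked DNA Master's internals, that DNA Master's "Standard Code" corresponds to NCBI table 1; the helper names are made up for illustration. It checks whether the two tables assign the same amino acids and which start codons differ.

```python
# Sketch only: compares NCBI genetic-code table 1 ("Standard") with
# table 11 ("Bacterial, Archaeal and Plant Plastid"). Assumes DNA Master's
# "Standard Code" label means NCBI table 1 (not verified).
from itertools import product

# NCBI codon order: TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest).
CODONS = ["".join(p) for p in product("TCAG", repeat=3)]

# Amino-acid strings as published by NCBI for each table ("*" = stop).
AAS_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AAS_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codons NCBI lists as possible initiators for each table.
STARTS_1 = {"TTG", "CTG", "ATG"}
STARTS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def codon_table(aas):
    """Map each of the 64 codons to its one-letter amino acid."""
    return dict(zip(CODONS, aas))


def translate(orf, aas, starts):
    """Translate an ORF whose first codon must be an allowed start.

    The initiator codon is read as Met; trailing stop(s) are dropped
    from the returned protein. Incomplete final codons are ignored.
    """
    table = codon_table(aas)
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    if not codons or codons[0] not in starts:
        raise ValueError(f"{orf[:3]!r} is not a start codon in this table")
    protein = "M" + "".join(table[c] for c in codons[1:])
    return protein.rstrip("*")


print(codon_table(AAS_1) == codon_table(AAS_11))  # True: same amino acids
print(sorted(STARTS_11 - STARTS_1))  # ['ATA', 'ATC', 'ATT', 'GTG']
print(translate("GTGAAACCCTAA", AAS_11, STARTS_11))  # MKP
```

Per NCBI's tables, internal codons translate identically under both codes; the only difference is which codons may initiate a gene (ATT, ATC, ATA, and GTG are listed as starts only in table 11). The sketch says nothing about whether DNA Master actually consults this setting during auto-annotation.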

Posted in: DNA Master → DNAM failing to use proper translation code

Link to this post | posted 26 Mar, 2019 15:24

jross1025

Dear All: Last year I noticed, almost accidentally, that I had seemingly auto-annotated a genome using the so-called "standard" code. I'm not sure when this first actually popped up, because seeing "standard" anything isn't necessarily going to set off red flags, but I'm almost positive that at the time my preferences specified the "bacterial and plant plastid" code, which is the correct one.

This most recent year it cropped up again. I uninstalled and reinstalled, reset the preferences, and again "bacterial and plant plastid" is selected, and again DNAM is auto-annotating using (so it claims…) the "standard" code. Just yesterday evening I observed the same thing happening on one of our computer lab machines, so this is not specific either to my computer or to my download (lab computer software is installed by an IT person, who pulls down his own copies of the program from the web). Also, just yesterday, for the first time I saw a quite different code pop up, "CmB….something….", which I think is not even in the drop-down menu. I saw this both in the lab and on my own machine, so again whatever this is transcends any particular hardware or downloaded copy.

I'm not sure it's doing any harm: Ugenie5, and maybe LilDestine, were both auto-annotated using (again, supposedly) the "standard" code, and both are in GenBank without having been sent back to me, but this is troubling nonetheless. Any thoughts?

Edited 26 Mar, 2019 15:25

Posted in: DNA Master → DNAM failing to use proper translation code

Link to this post | posted 11 Jul, 2018 19:06

jross1025

There appears to be some issue right now with adding a gene in PECAAN (at least in my hands). This is new; I distinctly remember doing it successfully yesterday or the day before. But just now, when I went in to annotate a frameshift and clicked the "add a gene" button, the window didn't seem to display properly. In two different browsers, Chrome and Firefox, what you see at the very top is the "forward/reverse" drop-down menu, and below that the "advanced" button, but there's no visible place to enter the "stop" coordinate. Again, this wasn't the case yesterday or the day before, and I've fiddled with it today for quite some time. BTW, this is in the LilDestine genome, if that makes a difference.
Thoughts?

Posted in: PECAAN → problem with adding a gene

Link to this post | posted 26 May, 2018 21:51

jross1025

Welkin Pope
Hi Joe,
You've got most of the nuances.

The idea behind the "longest start" language is that there could be a conserved start upstream of the starts chosen in all the files. In our dataset, we've annotated phages as we sequenced them, so the start we selected when a phage was a singleton may not be the best choice. You could imagine a scenario in which we now have ten phages in the cluster and they all share a longer start, even though we selected the shorter start in all of them because we didn't have enough comparative data.
So if you find such a start in the alignment, regardless of whether it was chosen most often in the GenBank files, it is probably time to reassess and reannotate all equivalent genes across the cluster.

Does that make more sense?

YES, THANKS
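The rule Welkin describes (find a start present in every gene of the pham, prefer the one that gives the longest gene, and reassess if it lies upstream of the start most files chose) can be sketched in Python. This is a minimal illustration, not Starterator's own logic: the input format (alignment columns per gene, smaller column = further upstream) and the function name `analyze_pham` are hypothetical.

```python
from collections import Counter


def analyze_pham(genes):
    """Find the longest conserved start and compare it to the most-called start.

    genes: hypothetical input mapping gene name -> (candidate_columns, called_column),
    where columns are positions in a shared alignment and a smaller column
    means a start further upstream (i.e. a longer gene).
    """
    if not genes:
        raise ValueError("need at least one gene")
    candidate_sets = [set(cands) for cands, _ in genes.values()]
    # Starts present in every gene of the pham.
    conserved = set.intersection(*candidate_sets)
    longest_conserved = min(conserved) if conserved else None
    # Start chosen most often in the existing annotations.
    most_called = Counter(c for _, c in genes.values()).most_common(1)[0][0]
    # Flag a conserved start upstream of the usual call.
    reassess = longest_conserved is not None and longest_conserved < most_called
    return {
        "longest_conserved": longest_conserved,
        "most_called": most_called,
        "reassess": reassess,
    }


# Made-up example: every gene shares starts at columns 40 and 85,
# but all of them were annotated at 85.
pham = {
    "gene_A": ({40, 85, 112}, 85),
    "gene_B": ({40, 85, 130}, 85),
    "gene_C": ({40, 85}, 85),
}
print(analyze_pham(pham))
# {'longest_conserved': 40, 'most_called': 85, 'reassess': True}
```

The result is deliberately a flag rather than a decision, since the Starterator guide itself notes exceptions where the analysis is not informative or not applicable.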

Posted in: Starterator → Help! I don't understand what this means!

Link to this post | posted 25 May, 2018 22:04

jross1025

I seem to have done a crappy job teaching the new notes format, so I'm having to do a lot of cleanup on a very late file, and believe it or not it is just now occurring to me that I don't have a clue what this language, taken straight from the Starterator guide, means:

When interpreting Starterator data, in general the start that is present in all genes that yields the longest possible gene is the correct one. The underlying rationale for this is that upstream sequence is more likely to vary than protein encoding sequence, and so the most conserved start that yields the longest genes should be selected. As always, there are exceptions to this, and so sometimes the analysis is not informative or not applicable. Examples of this will be described below.

I've simply been notating "SS" when the auto-annotated start is the same one that the large majority of non-draft annotations use, without even asking whether it gives the longest gene. If there doesn't seem to be a start that significantly more than half of non-drafts use, I call it NI, and when the start isn't even available I call it NA. I don't know whether what I'm doing is correct.
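To make the SS / NI / NA shorthand above concrete, here is a hypothetical Python helper that applies it to one gene. The 0.6 cutoff standing in for "significantly more than half", the reading of "not available" as "not among the gene's candidate starts", and the function name are all assumptions for illustration.

```python
from collections import Counter


def starterator_note(auto_start, nondraft_calls, candidates, majority=0.6):
    """Return "SS", "NI", "NA", or None following the shorthand described above.

    auto_start:     start number of the auto-annotated start
    nondraft_calls: start numbers chosen in non-draft annotations of the pham
    candidates:     start numbers available for this gene
    majority:       assumed cutoff standing in for "significantly more than half"
    """
    # NA: the auto-annotated start is not among this gene's available starts
    # (one reading of "the start isn't even available").
    if auto_start not in candidates:
        return "NA"
    # NI: no start is used by enough of the non-draft annotations.
    if not nondraft_calls:
        return "NI"
    top_start, count = Counter(nondraft_calls).most_common(1)[0]
    if count / len(nondraft_calls) < majority:
        return "NI"
    # SS: the auto-annotated start matches the majority start.
    if top_start == auto_start:
        return "SS"
    # A clear majority start that differs from the auto call is not covered
    # by the scheme as written, so it is left for a human to decide.
    return None


print(starterator_note(5, [5, 5, 5, 7], {3, 5, 7}))  # SS
print(starterator_note(5, [5, 7, 3, 7], {3, 5, 7}))  # NI
print(starterator_note(9, [5, 5], {3, 5, 7}))        # NA
```

Like the practice it mirrors, this never checks whether the chosen start gives the longest gene, which is the part of the guide's rule the question is about.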

Posted in: Starterator → Help! I don't understand what this means!

Link to this post | posted 10 Apr, 2018 15:54

jross1025

I'm still a little unclear what's going on here:

SIF-BLAST [NKF / function, database, phage name, gene number, database gene accession number, %alignment, evalue]
SIF-HHPred [NKF / function, database, phage name, gene number, database accession number, %alignment*, probability]
SIF-Syn: ("syn" refers to "synteny") [NKF / function, phage(s) used to infer]
I understand that y'all want us to pursue all three lines of evidence, but if the bottom line in one (or all) of them is "NKF", do we still need all the other stuff, i.e., gene number etc.? And where are we going to get a "gene accession number" from BLAST results? I don't remember ever seeing that in there. Likewise, what does "database" refer to in HHPred? We just use the standard set of databases (or as close as we can get) from the drop-down menu in HHPred for basically every run, and quite frankly I wouldn't know how to do otherwise, and certainly wouldn't know how to explain to students how to do otherwise. jross
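As a way of seeing what the template asks for, here is a hypothetical Python formatter that assembles one SIF line from named fields in the template's order. The field order is copied from the template quoted above; the separators and all names in the code are assumptions, not the official notes format.

```python
# Field order copied from the SIF template quoted above; the separators
# (": " and ", ") are an assumption, not the official notes format.
SIF_FIELDS = {
    "SIF-BLAST": ["function", "database", "phage", "gene_number",
                  "accession", "percent_alignment", "evalue"],
    "SIF-HHPred": ["function", "database", "phage", "gene_number",
                   "accession", "percent_alignment", "probability"],
    "SIF-Syn": ["function", "phages"],
}


def format_sif(tag, **values):
    """Build one notes line; "function" is either a function call or "NKF"."""
    order = SIF_FIELDS[tag]
    missing = [name for name in order if name not in values]
    if missing:
        raise ValueError(f"{tag} is missing: {', '.join(missing)}")
    return f"{tag}: " + ", ".join(str(values[name]) for name in order)


# Hypothetical values, for illustration only.
print(format_sif("SIF-Syn", function="minor tail protein", phages="PhageX"))
# SIF-Syn: minor tail protein, PhageX
```

Because this sketch raises an error whenever a field is missing, it treats every field as required even for NKF; whether the template actually intends that is exactly the open question in the post.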

Posted in: Notes and Final Files → Clarification regarding "SIF"

Link to this post | posted 20 Mar, 2018 17:25

jross1025

I seem to recall that before the redesign of the HHPred website we got rather good correspondence between NCBI BLASTp and HHPred, at least in terms of how one could use HHPred to sort of confirm a function call based originally only or mainly on BLAST. Generally speaking, in our experience a really good BLAST result was a good predictor of an (at least) quite good HHPred result. We have only just now begun to use the "new" site, and the results seem quite different. For example, feature 4 in our auto-annotation of ugenie5, commencing at 3230 bp, translates as follows:

MADLGIRVDADSLVLWRGRDFKWNFENLDASQTPIPYPPGRLFFELQTGGEHNALHRVYITGATGGTYTLKCNGIDTAAIDYNDVSENPQGLAGDITDAVLGAVGAGNAVIHPVSLYPAWTLNFNLNSSKPLTEQLVNTINKTANDFFDTFDSLLGVDVEMTVTDQLNFKLVVTSRRSFDEVGVVTFAVDVTSTAVKNFFNAAAGLIGAVNAVSTDFYWNREYNIEYTGDLALTPIPATTANATGLVGTNKRIVTEVLEPGKEPMTIWEFVIEDSIASIKIESEEADKIANRVKWQLVFLPEGEVAGGDPIALGTVSKVGZ

In NCBI BLASTp, this returns numerous very high-quality hits: 100% aligned 1:1, extensively aligned, with E values of zero or very nearly so and top scores well over 1000. The highest-scoring hit, and 7 of the top ten hits in total, call either "minor tail protein" or "tail protein" as the function. However, at least in our hands, when the same amino acid sequence is run through HHPred using the PDB_mmCIF70_25Feb, SCOPe70_2.07, Pfam-a_v31.0, and NCBI_conserved_domains(CD)_v3.16 databases (the closest I could get to what is suggested in the SEA online guide), you get a very different picture: seemingly patchy alignments (at least judging from the graphic) and VERY high E values (over 100). We're accustomed to seeing this kind of thing only when the BLAST results themselves are rather iffy.

Thoughts, anyone?
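One piece of background that may help when comparing the two tools (not a diagnosis of the HHPred change): BLAST's E-value comes from Karlin-Altschul statistics, where m is the query length, n the total database length, S the raw alignment score, and λ and K parameters of the scoring system. The second line is the same relationship written with the normalized bit score S'.

```latex
\begin{aligned}
E  &= K\,m\,n\,e^{-\lambda S} \\
S' &= \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
\end{aligned}
```

Because E scales with the search space m·n, the same alignment can get a different E-value against a different database, and HHPred uses its own scoring and statistics, so its E-values are not on the same scale as BLASTp's.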

Posted in: Functional Annotation → Excellent BLAST but poor HHPred

Link to this post | posted 06 Jul, 2017 20:57

jross1025

Hi everyone: At the symposium last month someone spoke about a column-less method of DNA extraction using Zn precipitation. I looked up the paper, and it seems we don't have access to the whole thing without a lot of gyrations. Can anyone send me the protocol? It would be much appreciated.

Posted in: Phage Discovery/Isolation → DNA isolation w/Zn precipitation

Link to this post | posted 21 Mar, 2017 15:04

jross1025

I am just now noticing that the score we tend to pay attention to when we run scans in this software package is an "inFernal" score, not "inTernal" as I originally read it. This doesn't appear to be a typo, as it also occurs in the help documentation, but they really don't explain it. Is this just a strange contraction of "inferential"? Or is it some kind of obscure (at least to me) statistical term? A Google search doesn't seem to help…

Posted in: tRNAs → "Infernal" score in tRNAScanSE?

Link to this post | posted 07 Feb, 2017 15:01

jross1025

Yesterday at about 4 I started a BLAST-all-genes run for our L cluster guy using the updated DNA Master with the secure NCBI connections, the first time I've done this since the update. The result was very spotty: lots of genes were skipped. Right now I'm doing "BLAST this gene" one at a time, which seems to be working fine, but that's a pain. Should I have waited until late at night or the weekend? Also, I'm seeing something now in one gene that I've only ever seen in HHPred, not in BLAST: E values greater than 1. If this is really a mathematical probability, how is that possible? Is it just BLAST's way of saying "this is a crummy hit" (which it was)?
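On the E-value question: in BLAST the E-value is the number of alignments at least that good expected by chance in a search of that size, not a probability, which is why it can exceed 1. Assuming chance hits follow a Poisson distribution with mean E, the probability of at least one such chance hit is P = 1 - e^(-E). A minimal Python sketch of that conversion (the function name is ours):

```python
import math


def evalue_to_pvalue(e):
    """Probability of at least one chance hit scoring this well, given E.

    Assumes chance hits are Poisson-distributed with mean E, so
    P = 1 - exp(-E). expm1 keeps tiny E-values accurate.
    """
    if e < 0:
        raise ValueError("E-values cannot be negative")
    return -math.expm1(-e)


for e in (1e-50, 0.01, 1.0, 5.0):
    print(f"E = {e:g}  ->  P = {evalue_to_pvalue(e):.3g}")
# E = 1e-50  ->  P = 1e-50
# E = 0.01  ->  P = 0.00995
# E = 1  ->  P = 0.632
# E = 5  ->  P = 0.993
```

For small E the two numbers are nearly equal; an E-value of 5 means about five chance alignments that good are expected, while the probability stays below 1. Either way, it marks a weak hit.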

Posted in: DNA Master → BLASTing whole genome with "secure" connection
