The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by fbaliraine

| posted 26 Apr, 2020 20:40
Thanks a lot Debbie!
This helps a lot.
Case closed!
Fred
Posted in: Annotation > Tricky Start position decision
| posted 26 Apr, 2020 02:34
Hi Debbie,
Thanks for your response. It is phage Heath. The file is attached. I inserted gp 60, so every feature beyond gp 60 is numbered one higher than in the draft sequence. The question is about feature 79 (78 in the draft sequence). I’ve checked Starterator and realized that it is an Orpham, so there is no data. Now that you have the file, could you kindly also look at gp 74 (73 in the draft)? It is also an Orpham. The explanatory notes are available in the notes section. In both cases, it seems to me that GL/GM are preferring ATG over GTG starts.
Thanks!
Fred
Edited 26 Apr, 2020 02:44
Posted in: Annotation > Tricky Start position decision
| posted 25 Apr, 2020 08:55
Dear Phage hunters,
Please help me decide on this start.
Dilemma: From BLAST searches, everyone else seems to have selected the start called by both GM and GL. I don’t want to go against the tide, but I see a trade-off. The start at 63861 bp (ATG), which was called by both GL and GM, gives the best RBS score (Z = 3.35, FS = -1.993), but it leaves a huge (423 bp) gap and gives a short (162 bp) ORF. The alternative start at 64122 bp (GTG) has a weaker RBS score (Z = 2.151, FS = -5.229), but it has a 30 bp overlap, which is acceptable according to the guiding principles of annotation, and gives a far longer ORF (423 bp vs. 162 bp).
Thanks!
Posted in: Annotation > Tricky Start position decision
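The trade-off in the 25 Apr, 2020 post comes down to a little coordinate arithmetic. As a rough sketch in Python, using only the numbers quoted in that post, the two candidate starts can be laid side by side. The `Candidate` class and its field names are made up for illustration, and "higher is better" for both RBS metrics is an assumption taken from how the post ranks them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate start, holding the values quoted in the post."""
    start: int          # start coordinate (bp)
    codon: str          # start codon
    z_score: float      # RBS Z value
    final_score: float  # RBS final score
    orf_length: int     # ORF length (bp) as reported in the post


atg = Candidate(63861, "ATG", 3.35, -1.993, 162)
gtg = Candidate(64122, "GTG", 2.151, -5.229, 423)

# Distance between the two starts. Extending the ORF from one start to the
# other should add exactly this many bases to its length.
extension = abs(gtg.start - atg.start)
print("start-to-start distance:", extension, "bp")
print("reported lengths consistent:",
      atg.orf_length + extension == gtg.orf_length)

# Assumption: for both RBS metrics a higher (less negative) value is better.
best_by_z = max((atg, gtg), key=lambda c: c.z_score)
best_by_final = max((atg, gtg), key=lambda c: c.final_score)
longest = max((atg, gtg), key=lambda c: c.orf_length)

print("best Z value:    ", best_by_z.codon, "at", best_by_z.start)
print("best final score:", best_by_final.codon, "at", best_by_final.start)
print("longest ORF:     ", longest.codon, "at", longest.start)
```

The sketch only organizes the evidence. It does not make the call, which still depends on the gap/overlap rules and on what related phages do.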
| posted 14 Mar, 2019 20:15
Steven Caruso
If you look at the tape measure protein in coliphage HK97: https://www.ncbi.nlm.nih.gov/protein/NP_037710.1. It's a big protein, 1089 amino acids long. I think what HHPred is seeing is your protein looks like a small chunk of it, but clearly you have a much better candidate in gp17 for tape measure.

Since it is far past tape measure, and not one of the large genes immediately following it, it's hard to justify using synteny to call it a minor tail protein. Your best HHPred hit is to a DUF, as is your second good match, then human signaling proteins. It would be hard for me to make a call on this one other than NKF.

Steve

Thank you Steve! Case closed!
Fred
Posted in: Functional Annotation > minor tail protein, tape measure, or NKF?
| posted 14 Mar, 2019 17:52
The following sequence is for Glaske Gp 29 (120 aa):
MNLTDALRTAAEVYNPDDTIDLLGLFIIGLPGSLPAIAALWVTIRGQRRGRARAQRVDAKTDEIHEHVVNTHTSNMRKDLDDLRELVVDGFRRVERDIGGIREEIRTERKERIAGDRRE

I note that the majority of BLASTp hits show NKF for Glaske Gp 29. Although the top hits show NKF, Glaske Gp 29 hits the minor tail protein of phage Xavier with q1:s1 in both PhagesDB and NCBI. On the other hand, HHPred shows no minor tail protein, but hit #5 (PF06120.11) shows 90% probability with the “tail length tape measure protein” of phage HK97 (https://toolkit.tuebingen.mpg.de/#/jobs/Glaske_gp29 ). Nevertheless, no crystal structure or related publication is shown for this hit, and even the BLAST hits were direct submissions with no related paper. Note also that Glaske has a clear tape measure protein at Gp 17 (1217 aa), which is more than 10x longer than Gp 29 (120 aa). What’s the verdict?
I am also attaching a PDF copy of the HHPred results in case the above link expires.
Posted in: Functional Annotation > minor tail protein, tape measure, or NKF?
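The size comparison in the 14 Mar, 2019 post can be checked directly from the pasted sequence. The sketch below is plain Python. It counts the residues, confirms that the string contains letters that are not nucleotide codes (so the lengths are best read as amino acids), and compares the result with the 1217 figure quoted for Gp 17 and the 1089 amino acids quoted for the HK97 tape measure protein. The variable names are illustrative only.

```python
# Sequence exactly as pasted in the post (one-letter codes).
gp29 = (
    "MNLTDALRTAAEVYNPDDTIDLLGLFIIGLPGSLPAIAALWVTIRGQRRGRARAQRVDAKTDEIHEHVVN"
    "THTSNMRKDLDDLRELVVDGFRRVERDIGGIREEIRTERKERIAGDRRE"
)

GP17_LENGTH = 1217      # length quoted in the post for the Gp 17 tape measure
HK97_TMP_LENGTH = 1089  # amino acids, quoted in the reply for HK97

n_residues = len(gp29)

# Letters that cannot be nucleotide codes: if any are present, the string is
# a protein sequence and its length is a residue count, not base pairs.
non_nucleotide = sorted(set(gp29) - set("ACGTUN"))

print("residues in pasted sequence:", n_residues)
print("non-nucleotide letters present:", "".join(non_nucleotide))
print(f"Gp 17 / Gp 29 length ratio: {GP17_LENGTH / n_residues:.1f}")
print(f"Gp 29 as a fraction of HK97 TMP: {n_residues / HK97_TMP_LENGTH:.0%}")
```

A ratio of about ten is consistent with the reply's reading that Gp 29 resembles only a small chunk of a tape measure protein.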
| posted 23 Jan, 2018 16:47
Question 1: In Spring 2017, the BLAST notes included both sources, i.e. NCBI GenBank and PhagesDB. This year's summary notes do not include these two. Do we need to include the PhagesDB BLAST or just go by the NCBI BLAST? See below what I am reading from the example in the Resource Guide (https://seaphagesbioinformatics.helpdocsonline.com/officialdocumentation):

"Blast-Start: [phage name, gene number, database, Q : S, coverage, e value/no significant BLAST alignments]. The best BLAST match for this gene that reflects the alignment at the start of the protein, and the alignment of the gene start with that BLAST match. (For example, “Matches KBG gp32, Query 1 to Subject 1”, 100%, 0.0” or “Aligns with Thibault gp45 q3:s45 65% 10-16). If the best BLAST alignment has an e value above 10-4, report “no significant BLAST alignments”
Which BLAST Database is being assumed in this example: PhagesDB or GenBank?
Question 2: With respect to e-values, supposing you get an e-value of 1.7 x 10-17, should we just record -17 as in the above example, or do we include the leading value (1.7 in this example)? In the same vein, DNA Master does not allow for writing exponents; what are we expected to do in this case?
Question 3: In the above example, the "Query to subject" is given two ways: written out in full in one case and abbreviated in the other. Previously we were required to abbreviate it as Q:S. For consistency, which way does the QC Team want us to record this?
Thanks!
Posted in: Notes and Final Files > Documenting Gene Calls in DNA Master
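One way to picture Questions 2 and 3 is to build the Blast-Start note from its parts. The sketch below is a hypothetical helper, not anything from DNA Master or the Resource Guide. It writes e-values in plain-text e-notation (1.7e-17) so no superscript is needed, always uses the short q#:s# form, and reads the guide's "10-4" cutoff as 1e-4. Whether the QC team accepts e-notation is exactly what the post is asking, so treat that choice as an assumption. The database label passed in the example is also only a placeholder.

```python
def format_evalue(evalue: float) -> str:
    """Write an e-value in plain text, with no superscripts."""
    if evalue == 0:
        return "0.0"
    return f"{evalue:.1e}"  # keeps the leading value, e.g. 1.7e-17


def blast_start_note(phage, gene, database, q_start, s_start,
                     coverage_pct, evalue, cutoff=1e-4):
    """Assemble a Blast-Start note in the field order quoted from the guide.

    `cutoff` is the guide's e-value threshold, assumed here to mean 1e-4.
    """
    if evalue > cutoff:
        return "no significant BLAST alignments"
    return (f"{phage} gp{gene}, {database}, q{q_start}:s{s_start}, "
            f"{coverage_pct}%, {format_evalue(evalue)}")


# "NCBI" is a placeholder database label for illustration only.
print(blast_start_note("Thibault", 45, "NCBI", 3, 45, 65, 1e-16))
print(blast_start_note("Thibault", 45, "NCBI", 3, 45, 65, 0.5))
print(format_evalue(1.7e-17))
```

Passing the database as an explicit argument keeps the note unambiguous about which BLAST search it came from, which is the concern behind Question 1.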
| posted 21 Mar, 2017 16:08
We are finalizing our annotations but want to be sure of one thing before submission to the QC team. According to Fig 12.2 (attached) in the Resource Guide, the “Product” field contains just the feature number, i.e. gp5. If we had, say, a “Hypothetical protein” or a known product like “lysin A”, do we still leave this information out of the Product field (and thus not replace the “gp5” in this example)? Is it up to the QC team to fill in the product name?
Posted in: Notes and Final Files > Filling in the Product field
| posted 21 Apr, 2016 15:55
We have identified and annotated some programmed frameshifts. When we BLAST our fusion protein in PhagesDB, we note that some hits indicate the frameshift, whereas others do not. For example, for gp 15 in phage Gideon, one hit where the frameshift information is indicated is its 1:1, 100% alignment with "Angel_15, tail assembly chaperone; -1 frameshift." Should we note this in the DNA Master file? If so, where exactly: under "Logic", beside the function (i.e. "F: tail assembly chaperone; -1 frameshift"), or under "Product"? Thanks!
Edited 21 Apr, 2016 15:57
Posted in: Frameshifts and Introns > Annotation Advise: Frameshift
| posted 25 Feb, 2016 23:47
Just to be sure: In the DNA Master description regarding SD scores, i.e. “SD: Final=-#.###, Z=#.###, Best SD” or “No, Final=-#.###, Z=#.###, ##nd Best (explain)”, we provide both the Z value and the final score. Which of the two should we ultimately use when stating whether a start has the best SD score? For example, compare one start with final score = -4.711 and Z = 2.412 against an alternative start with final score = -4.122 and Z = 2.379. Ranked by final score, -4.711 is second best; ranked by Z value, Z = 2.412 is the best. The two values give conflicting answers even though they belong to the same start position. Should we therefore report the SD score ranking on the basis of the final score or the Z value? Thanks!
Posted in: Notes and Final Files > SD Scoring in notes
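The conflict described in the 25 Feb, 2016 post is easy to reproduce. A minimal Python sketch, using the two pairs of values from the post and assuming (as the post does) that a higher, less negative value is better for both metrics, ranks the same two starts once by final score and once by Z value. The labels "start A" and "start B" are placeholders.

```python
# The two candidate starts from the post, with their SD metrics.
starts = {
    "start A": {"final": -4.711, "z": 2.412},
    "start B": {"final": -4.122, "z": 2.379},
}


def rank(metric):
    """Return start names ordered best-first for one metric.

    Assumption: higher (less negative) is better for both metrics.
    """
    return sorted(starts, key=lambda name: starts[name][metric], reverse=True)


by_final = rank("final")
by_z = rank("z")

print("ranked by final score:", by_final)
print("ranked by Z value:    ", by_z)
print("rankings agree:", by_final == by_z)
```

Because the two orderings disagree, the "Best SD" label in the notes only means something once one metric is chosen as the basis for ranking.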
| posted 23 Feb, 2016 15:48
Just to be sure: Does BLAST in the DNA Master description (BLAST: Matches Phage gp##, q#:s# or Aligns with PhageName gp## q##:s##) strictly refer to BLASTp in NCBI/DNA Master, or could it also be in PhagesDB? The reason I am asking is that DNA Master may show a q1:s1 match for a phage, but in some cases, when I change the start position from the one originally called by Glimmer/GeneMark, I get a q1:s1 in PhagesDB but not in NCBI. For example, the 28917 bp start position in Gideon_draft (original Glimmer call at bp 28728) gives a 100% match, q1:s1, E = 0.0, with the integrase (Y-int) of phages Phreak, Gomashi, Frosty24, Cedasite, Avrafan, Annihilator, Hope, and Angel in PhagesDB, but in NCBI the top hit is the integrase of Mycobacterium abscessus, q66:s2, E = 9e-120, over 61% of the protein.
Posted in: Notes and Final Files > BLAST notes: PhagesDB or GenBank?