SIF-Blast; SIF-HHPred; SIF-Syn
Posted 06 Mar, 2018 19:10
Hi Greg, I will try to get to all of this and/or update the guide to clarify for everyone.

As of right now, all of the entries in the phagesdb database should match the entries for the same phages in the NCBI GenBank database. The reason for BLASTing against NCBI is to find information that is not found at phagesdb, so SIF-BLAST will be more complete if you use NCBI. If your top hit is not a phage but has a good e-value and % alignment, that is OK; you should still report it. As we move into more distantly related hosts, we are likely to see more database matches that are not just actinobacteriophages. It certainly is a pain to find the gene number when the hit is not a phage, so you may omit the gene number in that case. Make sure you supply the NCBI gene record number if you can't find the gene number. phagesdb uses the BLAST package that NCBI provides. The % alignment does not come with the package as a reported number the way it does when you BLAST on the NCBI site or through DNA Master.

HHpred is both for finding new functions and for supporting your BLAST functional assignments. The two outputs should agree, or at least not assign two completely different functions. There are many phage genes that have been crystallized or added to the Pfam database. If your best match is not a phage, supply the organism name and the database record number.

Synteny: comparing three to five phages should give you a good idea about what genes to look for. You should also scroll through all the pham pages on phagesdb to make sure that you are not missing underreported functions, which is what can happen when you choose 3-5 genomes at random. Synteny can be used for more than those twelve genes; those twelve are the minimum that it can be used for. I will clarify in the guide.

I agree that NA is probably more correct. Either NA or NKF is fine. You do not know that the five phages are correct simply because they have a function listed and the rest don't.
The first phage gene could have been assigned that function in error and the rest could be blind copies, which is why we are having three lines of function investigation for every gene.

Which brings me to the next point: conflicting functional assignments. If all the assignments are variations on the same function (LysA, endolysin, lysin A), choose the function that matches the official list. If the functions do not agree (portal vs. capsid morphogenesis protein), you've found a database error, and you will have to use the rest of your investigations to figure out what the right answer is. Hopefully we will soon get some kind of tracker going for people to report database errors so we can fix them. You should not pick the most specific function for your gene unless you can support it. We want the most specific supportable function for each gene.

As far as synteny goes, yes, you can use it on everything, but it just isn't as important in all cases. It is very important in the structural genes, less so in the integration cassette, and still worthwhile in genes that have partners, like RecE and RecT. As we uncover more functions, we may find more gene sets that are always together, and therefore synteny should always be evaluated.

GregFrederick@letu.edu
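
For anyone who wants to check a set of hits quickly, the rule above about variations versus true conflicts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table and the `reconcile` function are hypothetical, the table entries are examples rather than the official function list, and nothing here is a phagesdb or NCBI tool.

```python
# Hypothetical synonym table mapping reported names (lowercased) to one term.
# These entries are illustrative only; use the official function list in practice.
APPROVED_SYNONYMS = {
    "lysa": "lysin A",
    "endolysin": "lysin A",
    "lysin a": "lysin A",
    "portal": "portal protein",
    "portal protein": "portal protein",
    "capsid morphogenesis protein": "capsid morphogenesis protein",
}

# Labels that mean "no function reported" and should not count as a conflict.
NO_FUNCTION = {"", "na", "nkf"}


def reconcile(assignments):
    """Classify a list of reported function names for one gene.

    Returns (status, functions), where status is one of:
      "no_function" - nothing but NA/NKF was reported
      "consistent"  - every reported name maps to the same approved term
      "conflict"    - names disagree or are not in the synonym table
    """
    canonical = set()
    unknown = set()
    for name in assignments:
        key = name.strip().lower()
        if key in NO_FUNCTION:
            continue
        if key in APPROVED_SYNONYMS:
            canonical.add(APPROVED_SYNONYMS[key])
        else:
            unknown.add(name.strip())
    if not canonical and not unknown:
        return ("no_function", [])
    if len(canonical) == 1 and not unknown:
        return ("consistent", sorted(canonical))
    return ("conflict", sorted(canonical | unknown))


print(reconcile(["LysA", "endolysin", "lysin A", "NKF"]))
print(reconcile(["portal", "capsid morphogenesis protein"]))
```

A "consistent" result only means the database entries agree with each other. It does not show that the function is correct, since the matching entries could all be copies of one early error, so the BLAST, HHpred, and synteny evidence still has to support the call.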
Posted 06 Mar, 2018 19:24
GregFrederick@letu.edu. Hi Greg, I have updated the Guide, and I think I've clarified all your questions. Best, Welkin