2002

3. Aggarwal, Gautam; Ramaswamy, Ramakrishna. Ab initio gene identification: Prokaryote genome annotation with GeneScan and GLIMMER. Journal Article, Journal of Biosciences, 27 (1 SUPPL. 1), pp. 7–14, 2002, ISSN: 0250-5991. Tags: GeneScan, GLIMMER

@article{Aggarwal2002,
  title = {Ab initio gene identification: Prokaryote genome annotation with {GeneScan} and {GLIMMER}},
  author = {Gautam Aggarwal and Ramakrishna Ramaswamy},
  url = {https://ramramaswamy.org/papers/083.pdf},
  doi = {10.1007/BF02703679},
  issn = {0250-5991},
  year = {2002},
  date = {2002-01-01},
  journal = {Journal of Biosciences},
  volume = {27},
  number = {1 SUPPL. 1},
  pages = {7--14},
  abstract = {We compare the annotation of three complete genomes using the ab initio gene identification methods GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, was made using GeneMark. We find a number of novel genes that are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark but are not identified by either of the non-consensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have benchmarked this program as well as GLIMMER using three complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan than for GLIMMER, but in a significant number of cases the two techniques give similar results. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.},
  keywords = {GeneScan, GLIMMER},
  pubstate = {published},
  tppubtype = {article}
}
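The sensitivities and specificities quoted in the 2002 abstract are accuracy measures taken against a reference annotation (there, GenBank). The sketch below illustrates one way to compute them at the nucleotide level. It assumes the convention common in gene-finding benchmarks, sensitivity = TP/(TP+FN) and specificity = TP/(TP+FP); the abstract does not state the paper's exact definitions. The half-open interval representation and the helper names `coding_mask` and `nucleotide_accuracy` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a gene is a half-open [start, end) interval on one strand.
Interval = Tuple[int, int]


def coding_mask(intervals: Iterable[Interval], length: int) -> List[bool]:
    """Mark every nucleotide covered by at least one interval as coding."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def nucleotide_accuracy(predicted: Iterable[Interval],
                        annotated: Iterable[Interval],
                        length: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed convention: specificity = TP / (TP + FP), i.e. the fraction of
    predicted coding nucleotides that the reference also marks as coding.
    """
    pred = coding_mask(predicted, length)
    ref = coding_mask(annotated, length)
    tp = sum(p and r for p, r in zip(pred, ref))
    fp = sum(p and not r for p, r in zip(pred, ref))
    fn = sum(r and not p for p, r in zip(pred, ref))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity
```

For a 300-nt sequence annotated with genes at (0, 90) and (200, 290), a prediction of (0, 90) and (180, 290), which extends the second gene 20 nt upstream, gives a sensitivity of 1.0 and a specificity of 0.9.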
1999

2. Ramakrishna, Ramaswamy; Srinivasan, Ramachandran. Gene identification in bacterial and organellar genomes using GeneScan. Journal Article, Computers and Chemistry, 23 (2), pp. 165–174, 1999, ISSN: 0097-8485. Tags: Fourier, GeneScan, Haemophilus, Mycoplasma, Plasmodium

@article{Ramakrishna1999,
  title = {Gene identification in bacterial and organellar genomes using {GeneScan}},
  author = {Ramaswamy Ramakrishna and Ramachandran Srinivasan},
  url = {https://ramramaswamy.org/papers/071.pdf},
  doi = {10.1016/S0097-8485(98)00034-5},
  issn = {0097-8485},
  year = {1999},
  date = {1999-01-01},
  journal = {Computers and Chemistry},
  volume = {23},
  number = {2},
  pages = {165--174},
  abstract = {The performance of the GeneScan algorithm for gene identification has been improved by incorporating a directed iterative scanning procedure. It is applied here to bacterial and organellar genomes. The sensitivity of gene identification was 100\% in the Plasmodium falciparum plastid-like genome (35 kb) and 98\% in both the Mycoplasma genitalium genome ($\sim$580 kb) and the Haemophilus influenzae Rd genome ($\sim$1.8 Mb). Sensitivity improved both for Open Reading Frames (ORFs) that have been identified as genes (by homology or by other methods) and for those classified as hypothetical. False positive assignments (at the nucleotide level) were 0.25\% in the H. influenzae genome and 0.3\% in M. genitalium. There were no false positive assignments in the plastid-like genome. The agreement between the GeneScan and GeneMark predictions of putative ORFs was 97\% in the M. genitalium genome and 86\% in the H. influenzae genome. In terms of an exact match between predicted genes/ORFs and the annotation in the databank, GeneScan performance was between 72\% and 90\% in different genomes. We predict five putative ORFs that were not previously annotated in the GenBank files for both the M. genitalium and H. influenzae genomes. Our preliminary analysis of the newly sequenced G+C-rich genome of Mycobacterium tuberculosis H37Rv also shows comparable sensitivity (99\%). {\textcopyright} 1999 Elsevier Science Ltd. All rights reserved.},
  keywords = {Fourier, GeneScan, Haemophilus, Mycoplasma, Plasmodium},
  pubstate = {published},
  tppubtype = {article}
}
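The 1999 abstract reports gene-level performance of 72% to 90% as the share of exact matches between predicted genes/ORFs and the databank annotation. The minimal sketch below makes two assumptions: an "exact match" means identical start, end and strand, and ORFs are given as `(start, end, strand)` tuples. `exact_match_rate` is a hypothetical helper and is not part of GeneScan.

```python
from typing import Iterable, Tuple

# Hypothetical ORF representation: (start, end, strand), strand being "+" or "-".
Orf = Tuple[int, int, str]


def exact_match_rate(predicted: Iterable[Orf], annotated: Iterable[Orf]) -> float:
    """Fraction of annotated ORFs that are reproduced exactly by a prediction.

    Assumes an exact match requires identical start, end and strand; a
    prediction that is off by even one base at either end does not count.
    """
    annotated_set = set(annotated)
    if not annotated_set:
        return 0.0
    predicted_set = set(predicted)
    return len(annotated_set & predicted_set) / len(annotated_set)
```

Under this definition, an exact-match score is much stricter than nucleotide-level sensitivity: a gene whose predicted start is shifted by a few codons still contributes almost fully at the nucleotide level but counts as a miss here.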
1997

1. Tiwari, S; Ramachandran, S; Bhattacharya, S; Bhattacharya, A; Ramaswamy, R. Prediction of probable genes by Fourier analysis of genomic sequences. Journal Article, Bioinformatics, 13 (3), pp. 263–270, 1997. Tags: Fourier, Genes, GeneScan

@article{Tiwari1997,
  title = {Prediction of probable genes by {Fourier} analysis of genomic sequences},
  author = {S Tiwari and S Ramachandran and S Bhattacharya and A Bhattacharya and R Ramaswamy},
  url = {https://doi.org/10.1093/bioinformatics/13.3.263},
  doi = {10.1093/bioinformatics/13.3.263},
  year = {1997},
  date = {1997-06-01},
  journal = {Bioinformatics},
  volume = {13},
  number = {3},
  pages = {263--270},
  abstract = {Motivation: The major signal in coding regions of genomic sequences is a three-base periodicity. Our aim is to use Fourier techniques to analyse this periodicity, and thereby to develop a tool to recognize coding regions in genomic DNA. Result: The three-base periodicity in the nucleotide arrangement appears as a sharp peak at frequency f = 1/3 in the Fourier (or power) spectrum. From extensive spectral analysis of DNA sequences totalling over 5.5 million base pairs from a wide variety of organisms (including the human genome), and by separately examining coding and non-coding sequences, we find that the relative height of the peak at f = 1/3 in the Fourier spectrum is a good discriminator of coding potential. We use this feature to detect probable coding regions in DNA sequences by examining the local signal-to-noise ratio of the peak within a sliding window. While the overall accuracy is comparable to that of other techniques currently in use, the proposed measure is independent of training sets or existing database information, and can thus find general application. Availability: A computer program, GeneScan, which locates coding open reading frames and exonic regions in genomic sequences, has been developed and is available on request.},
  keywords = {Fourier, Genes, GeneScan},
  pubstate = {published},
  tppubtype = {article}
}
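The 1997 abstract uses the height of the f = 1/3 peak, measured as a signal-to-noise ratio within a sliding window, as the coding-potential measure. The sketch below illustrates that idea but does not reproduce the GeneScan program. It rests on four assumptions. Each nucleotide b in {A, C, G, T} is encoded as a binary indicator sequence u_b. The peak power is S(1/3) = sum over b of |U_b(1/3)|^2. The signal-to-noise ratio is S(1/3) divided by the mean power over all frequencies. The window length (120) and step (3) are hypothetical defaults. No coding/non-coding threshold is chosen here, and the helper names are illustrative.

```python
import cmath
from typing import List, Tuple

BASES = "ACGT"
# exp(-2*pi*i*n/3) depends only on n mod 3, so three roots of unity suffice.
ROOTS = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]


def period3_snr(window: str) -> float:
    """Assumed signal-to-noise ratio of the f = 1/3 peak in one window.

    Peak power: S(1/3) = sum_b |sum_n u_b[n] * exp(-2*pi*i*n/3)|^2, where
    u_b[n] = 1 if window[n] == b else 0. By Parseval's theorem, the mean of
    the total power over all N DFT frequencies equals the number of A/C/G/T
    positions, so no full transform is needed for the denominator.
    """
    seq = window.upper()
    mean_power = sum(seq.count(b) for b in BASES)
    if mean_power == 0:
        return 0.0
    peak = 0.0
    for b in BASES:
        coeff = sum(ROOTS[n % 3] for n, c in enumerate(seq) if c == b)
        peak += abs(coeff) ** 2
    return peak / mean_power


def sliding_snr(seq: str, window: int = 120, step: int = 3) -> List[Tuple[int, float]]:
    """Return (window start, SNR) pairs; window and step are hypothetical defaults."""
    return [(start, period3_snr(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

As a sanity check, a window made of the repeated codon ATG (120 nt) scores 40, because each of the three bases contributes a full-strength peak. A homopolymer window scores essentially 0, since its contributions cancel over every three positions. Choosing a window length that is a multiple of 3 keeps f = 1/3 exactly on the DFT frequency grid.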