Genome Annotation and Functional Genomics

Because much of post-genomic biological research (including our own) relies on high-quality genome annotations, we are actively engaged in the development and application of computational methods to improve the annotation of functional biological features in genome sequences. Our work focuses on improving annotation of non-protein-coding regions of the genome, including conserved noncoding sequences (CNSs), cis-regulatory modules (CRMs), transcription factor binding sites (TFBSs), transposable elements (TEs) and noncoding RNA (ncRNA) genes. Current projects include improving (i) the annotation of TEs in the fly and yeast genomes, (ii) the annotation of CRMs and TFBSs in the fly genome, and (iii) the analysis of transposon knockout collections in flies. Research in this area is supported by the EC FP7 programme.

Genome and Molecular Evolution

Whole-genome sequence data offer an unparalleled resource for the evolutionary analysis of biological sequences, and allow analyses of genomic regions and types of mutation that were not accessible in the pre-genomic era. We use diverse comparative sequence analysis methods to understand the patterns, rates and forces of genome evolution within and between species. Currently, our work focuses on (i) the evolution of gene expression and noncoding sequences in the fly genome, (ii) the evolution of TEs in fly and yeast genomes, and (iii) the rate and pattern of genome rearrangement in eukaryotes. Research in this area is supported by HFSP and NERC.

Text and Data Mining

The growing rate of publication in the biological sciences has resulted in an explosion of information that can no longer be synthesized by individual scientists. To cope with this flood of information, we are increasingly interested in applying text and data mining methods to the areas of genome annotation and functional genomics. Currently, we are pursuing methods for the automated extraction of DNA sequences from full-text articles and their mapping to genes and genomes, as well as other strategies for integrating the literature with genomic data. Research in this area is supported by BBSRC.
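
The idea of extracting DNA sequences from article text and mapping them to a genome can be illustrated with a minimal sketch. Everything in it is an illustrative assumption, not a description of the methods used in this work: the function names, the 15-nucleotide length threshold, and the exact-match lookup against a small in-memory reference are all hypothetical. A realistic system would also need to handle sequences split across lines and would use a proper alignment tool for mapping.

```python
import re

# Translation table used to complement unambiguous nucleotides.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_dna_sequences(text, min_length=15):
    """Return contiguous runs of A/C/G/T at least min_length long, upper-cased.

    The threshold is a hypothetical choice: short runs are too likely to be
    ordinary words or abbreviations rather than nucleotide sequences.
    """
    pattern = re.compile(r"\b[ACGTacgt]{%d,}\b" % min_length)
    return [m.group(0).upper() for m in pattern.finditer(text)]


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_to_reference(seq, reference):
    """Find exact matches of seq on either strand of each reference sequence.

    reference is a dict mapping a sequence name to an upper-case string.
    Returns a list of (name, zero-based start, strand) tuples.
    """
    hits = []
    for name, ref in reference.items():
        for strand, query in (("+", seq), ("-", reverse_complement(seq))):
            start = ref.find(query)
            while start != -1:
                hits.append((name, start, strand))
                start = ref.find(query, start + 1)
    return hits
```

In use, each sequence returned by `extract_dna_sequences` for an article would be passed to `map_to_reference`, and the resulting coordinates could then be compared with existing gene annotations to link the article to the features it describes.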
