Archive for the ‘microbes’ Category

A Reference “Hologenome” for D. melanogaster

Reference genomes are crucial for many applications of next-generation sequencing (NGS), especially whole-genome resequencing studies that form the basis of population genomics. Typically, one uses an off-the-shelf reference genome assembly for the organism of interest (e.g. human, Drosophila, Arabidopsis) obtained from the NCBI, UCSC or Ensembl genome databases. However, single-organism reference genomes neglect the reality that most organisms live in symbiosis with a large number of microbial species. When performing a whole-genome shotgun sequencing experiment on a macro-organism like D. melanogaster, it is inevitable that some proportion of its symbionts will also be sequenced, especially intracellular endosymbionts like Wolbachia. To reduce mismapping and to generate material for population genomics of both symbionts and their hosts, it is therefore appropriate to map reads to the genome of the species of interest as well as to the genomes of its symbiotic microbial species, where available. However, reference genomes are currently stored on a per-species basis, and the “hologenome” for any organism does not yet exist in a readily accessible form.

In my limited experience, attempting to construct such a “hologenome” can be a very tedious and potentially error-prone process, since reference genomes for model metazoan species and microbes are not always available in the same location or format. In gearing up for the second iteration of our work on the population genomics of microbial symbionts of D. melanogaster, I decided to script the construction of a D. melanogaster reference hologenome that includes all microbial symbionts associated with D. melanogaster whose genomes are publicly available, which I thought may be of use to others. The code for this process is as follows and should work on any UNIX-based machine.

makeDmelHologenome.sh (hosted on GitHub)
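At its core, building a hologenome amounts to merging the host assembly and each symbiont assembly into one multi-FASTA file. The sketch below is a hypothetical illustration of that step only, not the contents of makeDmelHologenome.sh: the function name `build_hologenome` and all file names are assumptions, and it does not download anything. It adds two guards against the error-prone parts of merging files from different databases: it refuses to write output if two sequences share an ID, and it appends a newline to any input file that lacks one, so that a final sequence line is never glued onto the next file's header.

```shell
#!/bin/sh
# Hypothetical helper (not the author's makeDmelHologenome.sh): concatenate a
# host genome and any number of symbiont genomes into one multi-FASTA file.
# Usage: build_hologenome OUT.fa HOST.fa SYMBIONT1.fa [SYMBIONT2.fa ...]
build_hologenome() {
    out="$1"; shift
    # Collect the sequence ID (first word of each header) from every input;
    # awk reads each file separately, so a missing final newline cannot hide a header.
    dups=$(awk '/^>/ { sub(/^>/, ""); print $1 }' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate sequence IDs, not writing $out: $dups" >&2
        return 1
    fi
    : > "$out"    # start from an empty output file
    for f in "$@"; do
        cat "$f" >> "$out"
        # If the input does not end in a newline, add one before the next file.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$out"
        fi
    done
}
```

For example, with hypothetical input names, `build_hologenome dm3_hologenome.fa dm3.fa wMel.fa` would write the host sequences followed by the Wolbachia sequences. Checking IDs up front matters because downstream tools that index a reference generally expect each sequence name to be unique.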

Finally, some tools (e.g. SAMtools faidx) require reference sequences to have the same number of characters per line, but different genome databases use different line widths, so the file produced above will have heterogeneous line lengths for different species. To fix this, I use fasta_formatter from the FASTX toolkit to convert dm3_hologenome.fa into a file with a fixed line length. To download and run this script, including the conversion to fixed line lengths, execute the following:

$ wget
$ sh
$ fasta_formatter -i dm3_hologenome.fa -o dm3_hologenome_v1.fa -w 50
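If the FASTX toolkit is not installed, the same rewrapping can be done with awk. The sketch below is an assumption-laden stand-in for `fasta_formatter -w`, not part of the original workflow; the function name `wrap_fasta` is hypothetical. It streams the input and emits full-width lines as soon as enough bases are buffered, so even chromosome-sized records do not need to be held in memory.

```shell
#!/bin/sh
# Sketch: rewrap every sequence in a multi-FASTA file to a fixed line width.
# Hypothetical helper name; an awk-based stand-in for `fasta_formatter -w`.
# Usage: wrap_fasta IN.fa OUT.fa [WIDTH]   (WIDTH defaults to 50)
wrap_fasta() {
    awk -v w="${3:-50}" '
        # Header line: flush leftover bases of the previous record, then print it.
        /^>/ { if (buf != "") print buf; buf = ""; sub(/\r$/, ""); print; next }
        {
            gsub(/[ \t\r]/, "")          # drop stray whitespace and DOS carriage returns
            buf = buf $0
            # Print full-width lines as soon as enough bases are buffered.
            while (length(buf) >= w) {
                print substr(buf, 1, w)
                buf = substr(buf, w + 1)
            }
        }
        END { if (buf != "") print buf }   # final partial line of the last record
    ' "$1" > "$2"
}
```

Usage mirrors the fasta_formatter command above: `wrap_fasta dm3_hologenome.fa dm3_hologenome_v1.fa 50`, after which `samtools faidx dm3_hologenome_v1.fa` can index the result.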

Credits: Thanks go to Douda Bensasson for some sed tips during the construction of this script.

Compendium of Drosophila Microbial Symbiont Genome Assemblies

Posted 25 Jan 2013 — by caseybergman
Category drosophila, genome bioinformatics, microbes

Following on from a recent post where we compiled genome assemblies for species in the Drosophila genus, here we bring together the growing list of genome assemblies for microbial symbionts of Drosophila species. Please add information about any other Drosophila symbiont genome assemblies that might be missing from this table in the comments below.

Species | Strain | Host | Assembly producer | Accession
Wolbachia pipientis | wMel | D. melanogaster | TIGR | AE017196
Wolbachia pipientis | wMelPop | D. melanogaster | O’Neill Lab | AQQE00000000
Wolbachia pipientis | wAu | D. simulans | Sinkins Lab | LK055284
Wolbachia pipientis | wSim | D. simulans | Salzberg Lab | AAGC00000000
Wolbachia pipientis | wRi | D. simulans | Andersson Lab | NC_012416
Wolbachia pipientis | wHa | D. simulans | Andersson Lab | NC_021089
Wolbachia pipientis | wNo | D. simulans | Andersson Lab | NC_021084
Wolbachia pipientis | wSuzi | D. suzukii | Anfora Lab | CAOU02000001
Wolbachia pipientis | wAna | D. ananassae | Salzberg Lab | AAGB00000000
Wolbachia pipientis | n.a. | D. willistoni | Salzberg Lab | AAQP00000000
Wolbachia pipientis | wRec | D. recens | Bordenstein Lab | JQAM00000000
Pseudomonas entomophila | L48 | D. melanogaster | Genoscope | NC_008027
Commensalibacter intestini | A911 | D. melanogaster | Lee Lab | AGFR00000000
Acetobacter pomorum | DM001 | D. melanogaster | Lee Lab | AEUP00000000
Gluconobacter morbifer | G707 | D. melanogaster | Lee Lab | AGQV00000000
Providencia sneebia | DSM 19967 | D. melanogaster | Lazzaro Lab | AKKN00000000
Providencia rettgeri | Dmel1 | D. melanogaster | Lazzaro Lab | AJSB00000000
Providencia alcalifaciens | Dmel2 | D. melanogaster | Lazzaro Lab | AKKM00000000
Providencia burhodogranariea | DSM 19968 | D. melanogaster | Lazzaro Lab | AKKL00000000
Lactobacillus brevis | EW | D. melanogaster | Lee Lab | AUTD00000000
Lactobacillus plantarum | WJL | D. melanogaster | Lee Lab | AUTE00000000
Lactobacillus plantarum | DmelCS_001 | D. melanogaster | Douglas Lab | JOJT00000000
Lactobacillus fructivorans | DmelCS_002 | D. melanogaster | Douglas Lab | JOJZ00000000
Lactobacillus brevis | DmelCS_003 | D. melanogaster | Douglas Lab | JOKA00000000
Acetobacter pomorum | DmelCS_004 | D. melanogaster | Douglas Lab | JOKL00000000
Acetobacter malorum | DmelCS_005 | D. melanogaster | Douglas Lab | JOJU00000000
Acetobacter tropicalis | DmelCS_006 | D. melanogaster | Douglas Lab | JOKM00000000
Lactococcus lactis | Bpl1 | D. melanogaster | Douglas Lab | JRFX00000000
Enterococcus faecalis | Fly1 | D. melanogaster | Broad Institute | ACAR00000000

Wolbachia and Mitochondrial Genomes from Drosophila melanogaster Resequencing Projects

As part of ongoing efforts to characterize complete genome sequences of microbial symbionts of Drosophila species, the Bergman Lab has been involved in mining complete genomes of the Wolbachia endosymbiont from whole-genome shotgun sequences of D. melanogaster. This work is inspired by Steve Salzberg and colleagues’ pioneering paper in 2005 showing that Wolbachia genomes can be extracted from the whole-genome shotgun sequence assemblies of Drosophila species. We have adapted this technique to use short-read next-generation sequencing data from population genomics projects in D. melanogaster, together with the reference Wolbachia genome published by Wu et al. (2005).

Currently we have identified infection status and extracted nearly complete genomes from the two major resequencing efforts in D. melanogaster: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Population Genomics Project (DPGP). We employ a fairly standard BWA/SAMtools pipeline, with a few tricks that are essential for getting good-quality assemblies and consensus sequences. Our first iteration of this pipeline was used to predict infection status in the DGRP strains. These results were reported in the main DGRP paper, published in early 2012, and can be found in Supplemental Table 9 of that paper or, more usefully, in a comma-separated file at the DGRP website.
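One simple way to turn a BWA/SAMtools mapping into an infection call is to compare read density on the Wolbachia reference with read density on the host sequences. The sketch below is a hypothetical illustration of that idea, not the Bergman Lab pipeline or its tricks. It assumes reads were mapped to a reference containing both host and Wolbachia sequences and summarized with `samtools idxstats` (columns: sequence name, sequence length, mapped reads, unmapped reads); the helper name `call_infection` and the default 0.1 ratio threshold are arbitrary placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: call Wolbachia infection status from `samtools idxstats`
# output by comparing mapped reads per base on the Wolbachia sequence with
# mapped reads per base on all other (host) sequences.
# Usage: call_infection IDXSTATS.txt WOLBACHIA_SEQ_NAME [MIN_RATIO]
call_infection() {
    awk -v wol="$2" -v min="${3:-0.1}" '
        $1 == "*" { next }                          # skip the unplaced-reads line
        $1 == wol { wlen += $2; wreads += $3; next } # Wolbachia sequence
        { hlen += $2; hreads += $3 }                 # everything else counts as host
        END {
            if (wlen == 0 || hlen == 0 || hreads == 0) { print "NA"; exit }
            ratio = (wreads / wlen) / (hreads / hlen)
            print (ratio >= min + 0 ? "infected" : "uninfected")
        }
    ' "$1"
}
```

A possible invocation, with hypothetical file and sequence names, is `samtools idxstats sample.bam > sample.idxstats` followed by `call_infection sample.idxstats wMel`. Because the mitochondrial genome is present in many copies per cell, a real analysis would likely exclude it from the host total, and any threshold would need to be calibrated against strains of known infection status.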

We have now applied an updated version of our pipeline to both the DGRP and DPGP datasets and have used these “essentially complete” genome sequences to study the recent evolutionary history of Wolbachia in D. melanogaster. While generating these data we received several inquiries about the status of this project so, in the spirit of Open Science that makes genomics such a productive field, we have released the consensus sequences and alignments of 179 Wolbachia and 290 mitochondrial genomes from the DGRP and DPGP prior to publication of our manuscript.

Since we are not the primary data producer for these assemblies, it is not appropriate to employ a data release policy based on the NHGRI guidelines. Instead, we have chosen to release these data under a Creative Commons Attribution 3.0 Unported License. If you use these assemblies or alignments in your projects, please cite the main DGRP and DPGP papers for the raw data and cite the following reference for the mtDNA or Wolbachia assemblies:

Richardson, M.F., L.A. Weinert, J.J. Welch, R.S. Linheiro, M.M. Magwire, F.M. Jiggins & C.M. Bergman (2012) Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster. PLOS Genetics 8:e1003129.

A tar.gz archive containing a reference-based multiple sequence alignment of release 1.0 of the complete D. melanogaster mtDNA and Wolbachia assemblies can be obtained here. If you have questions about the methods used to produce these assemblies, any associated metadata, or the content of our manuscript, please contact Casey Bergman for details.
