Release of 20 European Drosophila melanogaster genomes

Posted 17 Jul 2012 in drosophila, high throughput sequencing, population genomics

[Update: These genome sequences (and 30 others) have been deposited in ENA under project accession ERP009059. If you use these data in your work, please cite as Bergman & Haddrill (2015) Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. F1000Research 2015, 4:31.]

Background: As part of an ongoing effort to characterize genetic diversity in the nuclear and cytoplasmic genomes of D. melanogaster, the Haddrill and Bergman Labs have collaborated to sequence the complete genomes of 20 D. melanogaster isofemale strains collected by Penny Haddrill in Montpellier, France in August 2010. These 20 genomes are a random sample of the full collection described by Haddrill and Vespoor (2011), which also reports microsatellite variation for these strains.

Following the very generous early release of D. melanogaster genomes by major resequencing efforts in Drosophila, we have decided to follow suit and release these genomes prior to publication, to maximise their utility to the wider research community and prevent unnecessary duplication of effort. One major aim of sequencing a reasonably large number of strains from this European population is to provide a complementary dataset to help interpret the larger samples of North American and (predominantly) African strains from the Drosophila Genetic Reference Panel and the Drosophila Population Genomics Project, respectively. For more on why we have decided to release these data early, please see this blog post on genomic data release by individual labs in the next-generation sequencing era.

Methods: For each isofemale line, Penny Haddrill prepared genomic DNA by pooling fifty females, snap freezing them in liquid nitrogen, and extracting DNA using a standard phenol-chloroform protocol with ethanol and ammonium acetate precipitation. BGI-Hong Kong constructed 500 bp short-insert libraries and generated 91 bp paired-end reads on an Illumina HiSeq 2000, to an estimated coverage of ~50x per strain. BGI performed basic QC on the reads, and mapping reads to the Wolbachia genome following the protocol in Richardson et al. (submitted) confirmed, for each strain, the same infection status as determined by PCR in Haddrill and Vespoor (2011).
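To put the ~50x figure in context, the usual back-of-the-envelope relation between sequencing effort and mean depth is C = N × L / G (number of reads × read length / genome size). The Python sketch below applies it to 91 bp paired-end reads. The genome size is a hypothetical round number chosen only for illustration, not a value from this post, so substitute the size of whatever reference you map to.

```python
import math

# Expected-coverage arithmetic: C = N * L / G, where N is the number of reads,
# L the read length and G the genome size.

# HYPOTHETICAL genome size, for illustration only; replace with the size of
# the reference assembly you actually use.
ASSUMED_GENOME_SIZE_BP = 140_000_000

READ_LENGTH_BP = 91   # from the post: 91 bp reads
READS_PER_PAIR = 2    # paired-end: one forward and one reverse read
TARGET_COVERAGE = 50  # from the post: ~50x per strain


def expected_coverage(n_pairs: int,
                      read_len: int = READ_LENGTH_BP,
                      genome_size: int = ASSUMED_GENOME_SIZE_BP) -> float:
    """Mean depth implied by n_pairs read pairs of length read_len."""
    return n_pairs * READS_PER_PAIR * read_len / genome_size


def pairs_needed(coverage: float,
                 read_len: int = READ_LENGTH_BP,
                 genome_size: int = ASSUMED_GENOME_SIZE_BP) -> int:
    """Read pairs required to reach a target mean depth, rounded up."""
    return math.ceil(coverage * genome_size / (READS_PER_PAIR * read_len))
```

For example, `pairs_needed(TARGET_COVERAGE)` gives the nominal number of read pairs per strain under the assumed genome size. Realised depth after mapping will usually be somewhat lower, since duplicate, low-quality and unmapped reads do not contribute.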

Conditions for use: The Haddrill and Bergman labs intend to use these data to study patterns of genetic diversity in the nuclear and cytoplasmic genomes, to estimate the ratio of diversity on the X chromosome relative to the autosomes, to detect signatures of both positive and negative selection in the nuclear and cytoplasmic genomes, and to investigate the impact of variation in recombination rate across the genome.
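For readers less familiar with the X/autosome comparison mentioned above, here is a minimal Python sketch of one common way to set it up: per-site nucleotide diversity (π) computed from allele counts, averaged over all surveyed sites, followed by the ratio π_X / π_A. The function names and toy counts are hypothetical, and this is not necessarily the estimator we will use; it ignores missing data, sequencing error and the residual heterozygosity of isofemale lines.

```python
from typing import Sequence


def site_diversity(allele_counts: Sequence[int]) -> float:
    """Unbiased expected heterozygosity at one site: n/(n-1) * (1 - sum p_i^2)."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_p2 = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_p2)


def nucleotide_diversity(polymorphic_sites: Sequence[Sequence[int]],
                         n_sites_surveyed: int) -> float:
    """Per-site pi: diversity summed over polymorphic sites, divided by all
    surveyed sites (monomorphic sites contribute zero to the sum)."""
    total = sum(site_diversity(counts) for counts in polymorphic_sites)
    return total / n_sites_surveyed


def x_to_autosome_ratio(pi_x: float, pi_a: float) -> float:
    """Ratio of X-linked to autosomal nucleotide diversity."""
    return pi_x / pi_a


# Toy (hypothetical) allele counts for 20 sampled chromosomes per site.
toy_x = nucleotide_diversity([[10, 10]], n_sites_surveyed=100)
toy_a = nucleotide_diversity([[10, 10], [15, 5]], n_sites_surveyed=100)
toy_ratio = x_to_autosome_ratio(toy_x, toy_a)
```

Under a simple neutral model with equal numbers of breeding males and females, the expected X/A ratio is 3/4, so departures from 0.75 are what such comparisons typically look for.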

We have decided to release these genomic data under a Creative Commons CC-BY license, which requires only that you credit the originators of the work as specified below. However, we hope that users of these data will respect the established model of genomic data release under the Ft. Lauderdale agreement, which is traditionally honored by major sequencing centers. Until the paper describing these genomes is published, please cite these data as:

  • Haddrill, P. and C.M. Bergman (2012) 20 Drosophila melanogaster genomes from Montpellier, France. http://bergmanlab.genetics.uga.edu/?p=1685

[Update: These genome sequences have now been published. If you use these data in your work, please cite as Bergman & Haddrill (2015) Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. F1000Research 2015, 4:31.]

The data: Gzipped Illumina fastq files for forward and reverse paired reads can be downloaded at the following locations. A script to download all files in serial can be found below the table.

[Update: These genome sequences (and 30 others) have been deposited in ENA under project accession ERP009059. If you use these data in your work, please cite accession numbers ERR705945-ERR705964.]
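Since the reads now live in ENA, here is a minimal Python sketch for fetching them serially by run accession (ERR705945 to ERR705964). It assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` field behave as described in ENA's documentation at the time of writing; the helper names are hypothetical, so please check the current ENA docs before relying on it.

```python
import csv
import io
import os
import urllib.request
from urllib.parse import urlencode

# Run accessions for the 20 Montpellier strains, as quoted in this post.
FIRST_RUN, LAST_RUN = 705945, 705964

# ASSUMPTION: ENA Portal API "filereport" endpoint returning a TSV with a
# "fastq_ftp" column of semicolon-separated paths (no URL scheme).
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def run_accessions(first: int = FIRST_RUN, last: int = LAST_RUN) -> list:
    """Inclusive list of ERR run accessions."""
    return [f"ERR{i}" for i in range(first, last + 1)]


def filereport_url(accession: str) -> str:
    """Build the (assumed) filereport query for one run."""
    query = urlencode({"accession": accession, "result": "read_run",
                       "fields": "fastq_ftp", "format": "tsv"})
    return f"{FILEREPORT_URL}?{query}"


def fastq_urls(report_tsv: str) -> list:
    """Parse a filereport TSV into download URLs, one per FASTQ file."""
    urls = []
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        for path in filter(None, (row.get("fastq_ftp") or "").split(";")):
            urls.append(path if "://" in path else "ftp://" + path)
    return urls


def download_all(dest_dir: str = ".") -> None:
    """Fetch every FASTQ file one at a time (in serial)."""
    os.makedirs(dest_dir, exist_ok=True)
    for acc in run_accessions():
        with urllib.request.urlopen(filereport_url(acc)) as resp:
            report = resp.read().decode()
        for url in fastq_urls(report):
            out = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
            urllib.request.urlretrieve(url, out)
```

Calling `download_all("reads")` would then fetch the forward and reverse files for all 20 runs, one after another. Downloading in serial is slow but gentle on the server; the per-run file list is looked up rather than hard-coded because ENA storage paths are not guaranteed to stay fixed.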


10 Comments

  1. lakshmi

    Could you give more information about the samples and the genome data available on this site? Do these contain information on all the chromosomes of Dmel, or only specific ones?

  2. caseybergman

    These samples are whole-genome shotgun sequences, so they contain sequence data for the entire set of genomes (nuclear, cytoplasmic, symbiotic) that are present in the sample.

  3. Pablo Duchen

    We have assembled these genomes and noticed that every SNP in the sample is also heterozygous for one of the alleles. If we were to mask all heterozygotes to N’s (which is a standard procedure for downstream pop-gen analyses) we would end up with zero polymorphism. We tried different mapping programs and also treated the reads as “single-end” but the problem remains. If anyone has a hint on what might be going on I would appreciate your input.

  4. caseybergman

    Dear Pablo – thanks for posting this information. We have not analysed these data extensively, so I can’t give a definitive answer about why this might be happening. But we will look into it further and post what we find here as soon as we can.

  5. Anne

    Hello,

    These seem to be gone. Is it just a temporary server error, or are these reads moving to another page?

  6. Roger

    Hello,

    I tried the links and it looks like the sequences have been taken offline. Is there any way to get access to them?

    Thanks,

    -R

  7. caseybergman

    Anne/Roger –

    Thanks for pointing this out. We are moving resources to a new server and haven’t set these filepaths correctly. I’ll try to make sure this gets fixed asap.

    Best, Casey

  8. caseybergman

    Anne/Roger –

    These files are now web-accessible, many thanks for bringing this to our attention.

    Best, Casey

  9. zhoujj

    Hi Caseybergman, I can’t download the dataset. It seems that the download links are broken. Could you please fix them?

  10. caseybergman

    Hi zhoujj –

    This is fixed now, thanks for the notice.

    – Best, Casey


1 Trackbacks/Pingbacks

  1. Genomes of 20 European D. melanogaster strains. 23 08 12
