The UCSC Genome Bioinformatics site is widely regarded as one of the most powerful bioinformatics portals for experimental and computational biologists on the web. While the UCSC Genome Browser provides excellent functionality for loading and visualizing custom data tracks, there are cases when your lab may have genome-wide data it wishes to share internally or with the wider scientific community, or when you might want to use the Genome Browser for an assembly that is not in the main or microbial Genome Browser sites. In those cases you may need to install a local copy of the Genome Browser.
A blog post by noborujs on E-notações explains how to install the UCSC Genome Browser on an Ubuntu system, and the UCSC Genome Browser team provides a general walk-through on their wiki and an internal presentation. Here I will provide a similar walk-through for installing it on a CentOS system. The focus of these posts is to go beyond the basic installation and explain the structure of the databases that underlie the browser, how these database tables are used to create the web front-end, and how to customize a local installation with your own data. A clearer understanding of the database and browser architecture should make the process of loading your own data far easier.
This blog entry has grown to quite a size, so I’ve decided to split it into 3 more manageable parts and you can find a link to the next part at the end of this post.
The 3 parts are as follows:
1) Introduction and CentOS install (this post)
2) The databases & web front-end
3) Loading custom data
The walk-through presented here installs CentOS 5.7 using VirtualBox, so you can follow it on your desktop if you have sufficient disk space. Like the Ubuntu walk-through linked to above, this post will speed you through the install process with little or no reference to what the databases actually contain or how they relate to the browser; that information is covered in part 2.
CentOS Installation
• Grab the CentOS 5.7 i386 CD ISOs from one of the mirrors. Discs 1-5 are required for this install.
• Start VirtualBox and select ‘New’. Settings chosen:
• Name = CentOS 5.7
• OS = linux
• Version = Red Hat
• Virtual memory of 1GB
• Create a new hard disk with fixed virtual storage of 64GB
• Start the new virtual machine and select the first CentOS iso file as the installation source
• Choose ‘Server GUI’ as the installation type when prompted
• Set the hostname, time etc. and make sure you enable http when asked about services
• When the install prompts you for a different CD, select “Devices -> CD/DVD devices -> Choose a virtual CD/DVD disk file” and choose the relevant ISO file.
• Login as root, run Package Updater, reboot.
UCSC Browser Install
• Install dependencies:
yum install libpng-devel
yum install mysql-devel
• Set environment variables. The following were added to root’s .bashrc:
export MACHTYPE=i386
export MYSQLLIBS="/usr/lib/mysql/libmysqlclient.a -lz"
export MYSQLINC=/usr/include/mysql
export WEBROOT=/var/www
Each time you open a new terminal these environment variables will be set automatically.
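Since the build steps that follow depend on these four variables, it can help to confirm they are all set before running make. The following is a minimal sketch, not part of the UCSC tooling; the function name check_build_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report any of the build variables listed above
# that is unset or empty in the current shell.
check_build_env() {
    missing=""
    for name in MACHTYPE MYSQLLIBS MYSQLINC WEBROOT; do
        # Indirect lookup of the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "build environment looks complete"
}
```

Running check_build_env in a fresh terminal before the build should show whether .bashrc was picked up.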
• Download the Kent source tree from http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, unpack it and then build it:
wget http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
unzip jksrc.zip
mkdir -p ~/bin/$MACHTYPE # only used when you build other utilities from the source tree
cd kent/src/lib
mkdir $MACHTYPE
make
cd ../jkOwnLib
make
cd ../hg/lib
make
cd ..
make install DESTDIR=$WEBROOT CGI_BIN=/cgi-bin DOCUMENTROOT=$WEBROOT/html
You will now have all the cgi’s necessary to run the browser in /var/www/cgi-bin and some JavaScript and CSS in /var/www/html. We need to tell SELinux to allow access to these:
restorecon -R -v '/var/www'
• Grab the static web content. First edit kent/src/product/scripts/browserEnvironment.txt to reflect your environment (MACHTYPE, DOCUMENTROOT etc.) then
cd kent/src/product/scripts
./updateHtml.sh browserEnvironment.txt
• Create directories to hold temporary files for the browser:
mkdir $WEBROOT/trash
chmod 777 $WEBROOT/trash
ln -s $WEBROOT/trash $WEBROOT/html/trash
• Set up MySQL for the browser. We need an admin user with full access to the databases we’ll be creating later, and a read-only user for the cgi’s.
In MySQL:
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
ON hgcentral.*
TO ucsc_admin@localhost
IDENTIFIED BY 'admin_password';
GRANT SELECT, CREATE TEMPORARY TABLES
ON hgcentral.*
TO ucsc_browser@localhost
IDENTIFIED BY 'browser_password';
The above commands will need repeating for each of the databases that we subsequently create.
We also need a third user that has read/write permissions to the hgcentral database only:
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER
ON hgcentral.*
TO ucsc_readwrite@localhost
IDENTIFIED BY 'readwrite_password';
Note: You should replace the passwords listed here with something sensible!
The installation README (path/to/installation/readme) suggests granting FILE to the admin user, but FILE can only be granted globally (i.e. GRANT FILE ON *.*), so we must do this as follows:
GRANT FILE ON *.* TO ucsc_admin@localhost;
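Because the admin and read-only grants have to be repeated for every database, one option is to generate the statements from the database name rather than retyping them. This is only a sketch under that assumption: the function name ucsc_grants is hypothetical, and the passwords are the placeholders used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the admin and read-only GRANT statements
# for the database named in $1. Nothing is sent to MySQL here.
# The passwords are the placeholders from this post; replace them.
ucsc_grants() {
    db=$1
    cat <<EOF
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
  ALTER, CREATE TEMPORARY TABLES
  ON $db.*
  TO ucsc_admin@localhost
  IDENTIFIED BY 'admin_password';
GRANT SELECT, CREATE TEMPORARY TABLES
  ON $db.*
  TO ucsc_browser@localhost
  IDENTIFIED BY 'browser_password';
EOF
}
```

The output of, say, ucsc_grants hg19 can be reviewed and then pasted or piped into a MySQL session belonging to an account that is allowed to grant privileges.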
We now have the code for the browser and a database engine ready to receive some data. As an example, getting a human genome assembly installed requires the following:
• Create the primary gateway database for the browser:
mysql -u ucsc_admin -p -e "CREATE DATABASE hgcentral"
wget http://hgdownload.cse.ucsc.edu/admin/hgcentral.sql
mysql -u ucsc_admin -p hgcentral < hgcentral.sql
• Create the main configuration file for the browser hg.conf and save it in /var/www/cgi-bin:
cp kent/src/product/ex.hg.conf /var/www/cgi-bin/hg.conf
Then edit /var/www/cgi-bin/hg.conf to reflect the specifics of your installation.
• Admin users should also maintain a copy of this file saved as ~/.hg.conf, since the data loader applications look in your home directory for hg.conf. It is a good idea for ~/.hg.conf to be made private (i.e. only the owner can access it) otherwise your database admin password will be accessible by other users:
cp /var/www/cgi-bin/hg.conf ~/.hg.conf
chmod 600 ~/.hg.conf
When we issue commands to load custom data (see part 3) it is this copy of hg.conf that will supply the necessary database passwords.
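As a rough illustration of what the private copy might contain, the sketch below writes a minimal ~/.hg.conf-style file with owner-only permissions. The key names are my recollection of those in ex.hg.conf and should be checked against the example file in your source tree; the function name write_private_hg_conf is hypothetical, and the user names and passwords are the placeholders from the GRANT statements above.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal private hg.conf to the path in $1,
# using the database user in $2 and password in $3.
# Key names are assumed to match ex.hg.conf; verify them before use.
write_private_hg_conf() (
    out=$1
    db_user=$2
    db_pass=$3
    umask 077               # a newly created file is owner-only
    cat > "$out" <<EOF
db.host=localhost
db.user=$db_user
db.password=$db_pass
db.trackDb=trackDb
central.db=hgcentral
central.host=localhost
central.user=ucsc_readwrite
central.password=readwrite_password
central.domain=
EOF
    chmod 600 "$out"        # also tighten a file that already existed
)
```

For example, write_private_hg_conf ~/.hg.conf ucsc_admin admin_password would produce the admin copy. The function body runs in a subshell, so the umask change does not leak into the calling shell.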
The browser is now installed and functional, but will generate errors because the databases specified in the hgcentral database are not there yet. The gateway page needs a minimum human database in order to function even if the browser is being built for the display of other genomes.
To install a human genome database, the minimal set of MySQL tables required within the hg19 database is:
• grp
• trackDb
• hgFindSpec
• chromInfo
• gold – for performance this table is split by chromosome so we need chr*_gold*
• gap – split by chromosome as with gold so we need chr*_gap*
To install minimal hg19:
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
ON hg19.*
TO ucsc_admin@localhost
IDENTIFIED BY 'admin_password';
GRANT SELECT, CREATE TEMPORARY TABLES
ON hg19.*
TO ucsc_browser@localhost
IDENTIFIED BY 'browser_password';
mysql -u ucsc_admin -p -e "CREATE DATABASE hg19"
cd /var/lib/mysql/hg19
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/grp* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/trackDb* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/hgFindSpec* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chromInfo* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chr*_gold* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chr*_gap* .
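The same minimal table set is needed for any other assembly you mirror, so the rsync commands can be generated from a list of table patterns. The sketch below only prints the commands; the function name minimal_table_rsync is hypothetical, and the patterns are the six listed above.

```shell
#!/bin/sh
# Hypothetical helper: print one rsync command per table pattern in the
# minimal set for the database named in $1. Nothing is downloaded here.
minimal_table_rsync() {
    db=$1
    for pattern in 'grp*' 'trackDb*' 'hgFindSpec*' 'chromInfo*' \
                   'chr*_gold*' 'chr*_gap*'; do
        # The URL is quoted so the wildcard is passed on to the server
        # instead of being expanded by the local shell.
        echo "rsync -avP 'rsync://hgdownload.cse.ucsc.edu/mysql/$db/$pattern' ."
    done
}
```

After changing into the database directory, the printed commands can be inspected and then run, for example with minimal_table_rsync hg19 | sh.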
The DNA sequence can be downloaded thus:
cd /gbdb
mkdir -p hg19/nib
cd hg19/nib
rsync -avP rsync://hgdownload.cse.ucsc.edu/gbdb/hg18/nib/chr*.nib .
Again we will need to tell SELinux to allow the webserver to access these files:
semanage fcontext -a -t httpd_sys_content_t "/gbdb(/.*)?"
restorecon -R -v /gbdb
Create hgFixed database:
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
ON hgFixed.*
TO ucsc_admin@localhost
IDENTIFIED BY 'admin_password';
GRANT SELECT, CREATE TEMPORARY TABLES
ON hgFixed.*
TO ucsc_browser@localhost
IDENTIFIED BY 'browser_password';
mysql -u ucsc_admin -p -e "create database hgFixed"
The browser now functions properly and we can browse the hg19 assembly.
The third part of this blog post looks at loading custom data; for this we will use some Drosophila melanogaster data taken from the modENCODE project. We will therefore need to repeat the steps we used to mirror the hg19 assembly to produce a local copy of the dm3 database. Alternatively, if you have sufficient disk space on your virtual machine, you can grab all the D. melanogaster data like so:
mysql -u ucsc_admin -p -e "create database dm3"
cd /var/lib/mysql/dm3
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/dm3/* .
cd /gbdb
mkdir dm3
cd dm3
rsync -avP rsync://hgdownload.cse.ucsc.edu/gbdb/dm3/* .
At present these commands will download approximately 30 GB.
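With a 64GB virtual disk it is worth checking free space before starting a download of this size. A minimal sketch, assuming a POSIX df; the function name has_free_gb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding the path in $1
# has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints a fixed POSIX layout; with -k, column 4 is the
    # available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, has_free_gb /var/lib/mysql 30 && has_free_gb /gbdb 30 tests both targets against the combined estimate, which is conservative if they sit on separate filesystems.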
[This tutorial continues in Part 2: The UCSC genome browser database structure]