Customizing a Local UCSC Genome Browser Installation I: Introduction and CentOS Install

10 Comments
Posted 29 Mar 2012 in database, drosophila, genome bioinformatics, linux, UCSC genome browser

The UCSC Genome Bioinformatics site is widely regarded as one of the most powerful bioinformatics portals for experimental and computational biologists on the web. While the UCSC Genome Browser provides excellent functionality for loading and visualizing custom data tracks, there are cases when your lab may have genome-wide data it wishes to share internally or with the wider scientific community, or when you might want to use the Genome Browser for an assembly that is not in the main or microbial Genome Browser sites. In those cases you may need to install a local copy of the Genome Browser.

A blog post by noborujs on E-notações explains how to install the UCSC Genome Browser on an Ubuntu system, and the UCSC Genome Browser team provides a general walk-through on their wiki and an internal presentation. Here I will provide a similar walk-through for installing it on a CentOS system. The focus of these posts is to go beyond the basic installation and explain the structure of the databases that underlie the browser, how these database tables are used to create the web front-end, and how to customize a local installation with your own data. Hopefully a clearer understanding of the database and browser architecture will make the process of loading your own data far easier.

This blog entry has grown to quite a size, so I’ve decided to split it into 3 more manageable parts and you can find a link to the next part at the end of this post.

The 3 parts are as follows:

1) Introduction and CentOS install (this post)
2) The databases & web front-end
3) Loading custom data

The walk-through presented here installs CentOS 5.7 using VirtualBox, so you can follow it on your desktop if you have sufficient disk space. Like the Ubuntu walk-through linked to above, it will speed you through the install process with little or no reference to what the databases actually contain or how they relate to the browser; that information is covered in part 2.

CentOS Installation

• Grab CentOS 5.7 i386 CD iso’s from one of the mirrors. Discs 1-5 are required for this install.
• Start VirtualBox and select ‘New’. Settings chosen:
• Name = CentOS 5.7
• OS = linux
• Version = Red Hat
• Virtual memory of 1GB
• Create a new hard disk with fixed virtual storage of 64GB
• Start the new virtual machine and select the first CentOS iso file as the installation source
• Choose ‘Server GUI’ as the installation type when prompted
• Set the hostname, time etc. and make sure you enable http when asked about services
• When the install prompts you for a different CD, select “Devices -> CD/DVD devices -> Choose a virtual CD/DVD disk file” and choose the relevant ISO file.
• Login as root, run Package Updater, reboot.

UCSC Browser Install

• Install dependencies:

yum install libpng-devel
yum install mysql-devel

• Set environment variables. The following were added to root’s .bashrc:

export MACHTYPE=i386
export MYSQLLIBS="/usr/lib/mysql/libmysqlclient.a -lz"
export MYSQLINC=/usr/include/mysql
export WEBROOT=/var/www

Each time you open a new terminal these environment variables will be set automatically.
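
As a rough sketch, the same export lines could be generated from the machine architecture instead of being typed by hand. The function name kent_env is hypothetical, and the /usr/lib64/mysql location for 64-bit systems is an assumption based on a reader's report in the comments below; check where libmysqlclient.a actually lives on your system.

```shell
#!/bin/sh
# Hypothetical helper: print export lines for the build environment.
# Assumption: 64-bit systems keep libmysqlclient.a under /usr/lib64/mysql.
kent_env() {
    arch=$1
    # MACHTYPE must be a simple string without '-' characters.
    machtype=$(printf '%s' "$arch" | sed 's/-//g')
    case "$machtype" in
        x86_64) libdir=/usr/lib64/mysql ;;
        *)      libdir=/usr/lib/mysql ;;
    esac
    echo "export MACHTYPE=$machtype"
    echo "export MYSQLLIBS=\"$libdir/libmysqlclient.a -lz\""
    echo "export MYSQLINC=/usr/include/mysql"
    echo "export WEBROOT=/var/www"
}

# Print the lines for the current machine; review before adding to .bashrc.
kent_env "$(uname -m)"
```

The output is only printed, so it can be reviewed before being appended to .bashrc.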

• Download the Kent source tree from http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, unpack it and then build it:

wget http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
unzip jksrc.zip
mkdir -p ~/bin/$MACHTYPE # only used when you build other utilities from the source tree
cd kent/src/lib
mkdir $MACHTYPE
make
cd ../jkOwnLib
make
cd ../hg/lib
make
cd ..
make install DESTDIR=$WEBROOT CGI_BIN=/cgi-bin DOCUMENTROOT=$WEBROOT/html

You will now have all the cgi’s necessary to run the browser in /var/www/cgi-bin and some JavaScript and CSS in /var/www/html. We need to tell SELinux to allow access to these:

restorecon -R -v '/var/www'

• Grab the static web content. First edit kent/src/product/scripts/browserEnvironment.txt to reflect your environment (MACHTYPE, DOCUMENTROOT etc.), then run:

cd kent/src/product/scripts
./updateHtml.sh browserEnvironment.txt
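
Several readers (see the comments) report ".: browserEnvironment.txt: file not found" at this step even though the file sits next to the script. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is that the script reads the file with the shell's "." command, which in POSIX mode looks up a name containing no slash in $PATH rather than in the current directory. Passing a path that contains a slash avoids that lookup. The sketch below shows the idea with a throwaway file and a made-up variable name:

```shell
#!/bin/sh
# Demonstration with a temporary stand-in for browserEnvironment.txt.
demo_dir=$(mktemp -d)
echo 'DEMO_BROWSERHOME=/var/www' > "$demo_dir/browserEnvironment.txt"

# An absolute path contains a slash, so '.' never falls back to a $PATH search.
env_file="$demo_dir/browserEnvironment.txt"
. "$env_file"
echo "BROWSERHOME read from file: $DEMO_BROWSERHOME"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

For the real script the equivalent would be something like ./updateHtml.sh "$PWD/browserEnvironment.txt".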

• Create directories to hold temporary files for the browser:

mkdir $WEBROOT/trash
chmod 777 $WEBROOT/trash
ln -s $WEBROOT/trash $WEBROOT/html/trash

• Set up MySQL for the browser. We need an admin user with full access to the databases we’ll be creating later, and a read-only user for the cgi’s.

In MySQL:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
  ON hgcentral.*
  TO ucsc_admin@localhost
  IDENTIFIED BY 'admin_password';

GRANT SELECT, CREATE TEMPORARY TABLES
  ON hgcentral.*
  TO ucsc_browser@localhost
  IDENTIFIED BY 'browser_password';

The above commands will need repeating for each of the databases that we subsequently create.

We also need a third user that has read/write permissions to the hgcentral database only:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER
  ON hgcentral.*
  TO ucsc_readwrite@localhost
  IDENTIFIED BY 'readwrite_password';

Note: You should replace the passwords listed here with something sensible!
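
Because the admin and browser grants have to be repeated for every database, it may be convenient to generate them. This is a minimal sketch: print_grants is a hypothetical helper, the user names and passwords are the placeholders used in this post, and the statements use the same MySQL 5-era GRANT ... IDENTIFIED BY syntax as above.

```shell
#!/bin/sh
# Hypothetical helper: print the two per-database GRANT statements.
# User names and passwords are the placeholders used in this post.
print_grants() {
    db=$1
    cat <<EOF
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
  ON $db.*
  TO ucsc_admin@localhost
  IDENTIFIED BY 'admin_password';

GRANT SELECT, CREATE TEMPORARY TABLES
  ON $db.*
  TO ucsc_browser@localhost
  IDENTIFIED BY 'browser_password';
EOF
}

# Print the statements for each database created later in this post.
for db in hg19 hgFixed dm3; do
    print_grants "$db"
done
```

The output could then be piped into a MySQL session opened by a user that is allowed to grant privileges, for example mysql -u root -p.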

The installation README (path/to/installation/readme) suggests granting FILE to the admin user, but FILE can only be granted globally (i.e. GRANT FILE ON *.*), so we must do this as follows:

GRANT FILE ON *.* TO ucsc_admin@localhost;

We now have the code for the browser and a database engine ready to receive some data. As an example, getting a human genome assembly installed requires the following:

• Create the primary gateway database for the browser:

mysql -u ucsc_admin -p  -e "CREATE DATABASE hgcentral"
wget http://hgdownload.cse.ucsc.edu/admin/hgcentral.sql
mysql -u ucsc_admin -p hgcentral < hgcentral.sql

• Create the main configuration file for the browser hg.conf and save it in /var/www/cgi-bin:

cp kent/src/product/ex.hg.conf /var/www/cgi-bin/hg.conf

Then edit /var/www/cgi-bin/hg.conf to reflect the specifics of your installation.
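
As an illustration only, the database-related settings for the users created above might look like the file written by this sketch. The key names are given from memory of the conventions in ex.hg.conf, so treat them as assumptions and check them against your own copy; write_hg_conf and the output path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal set of hg.conf database settings.
# Key names are assumed to match ex.hg.conf; verify against your copy.
write_hg_conf() {
    cat > "$1" <<'EOF'
db.host=localhost
db.user=ucsc_browser
db.password=browser_password
db.trackDb=trackDb
central.db=hgcentral
central.host=localhost
central.user=ucsc_readwrite
central.password=readwrite_password
EOF
}

# Write an example file to a scratch location and show it.
example="${TMPDIR:-/tmp}/hg.conf.example"
write_hg_conf "$example"
cat "$example"
```

Here the read-only user serves the genome databases while the read/write user is used for hgcentral, matching the grants made earlier.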

• Admin users should also maintain a copy of this file saved as ~/.hg.conf, since the data loader applications look in your home directory for hg.conf. It is a good idea for ~/.hg.conf to be made private (i.e. only the owner can access it) otherwise your database admin password will be accessible by other users:

cp /var/www/cgi-bin/hg.conf ~/.hg.conf
chmod 600 ~/.hg.conf

When we issue commands to load custom data (see part 3) it is this copy of hg.conf that will supply the necessary database passwords.
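
A quick way to confirm that the private copy really is private is to inspect its mode. This is a small sketch assuming GNU stat, as shipped with CentOS; is_private is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if a file is accessible by its owner alone.
# Assumes GNU stat (as on CentOS), whose -c '%a' prints the octal mode.
is_private() {
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        600|400) return 0 ;;
        *)       return 1 ;;
    esac
}

# Warn if the admin copy of hg.conf exists but is not private.
conf="$HOME/.hg.conf"
if [ -e "$conf" ] && ! is_private "$conf"; then
    echo "warning: $conf is accessible by other users" >&2
fi
```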

The browser is now installed and functional, but will generate errors because the databases specified in the hgcentral database are not there yet. The gateway page needs a minimum human database in order to function even if the browser is being built for the display of other genomes.

To install a human genome database, the minimal set of MySQL tables required within the hg19 database is:
• grp
• trackDb
• hgFindSpec
• chromInfo
• gold – for performance this table is split by chromosome so we need chr*_gold*
• gap – split by chromosome as with gold so we need chr*_gap*

To install minimal hg19:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
  ON hg19.*
  TO ucsc_admin@localhost
  IDENTIFIED BY 'admin_password';

GRANT SELECT, CREATE TEMPORARY TABLES
  ON hg19.*
  TO ucsc_browser@localhost
  IDENTIFIED BY 'browser_password';

mysql -u ucsc_admin -p -e "CREATE DATABASE hg19"
cd /var/lib/mysql/hg19
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/grp* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/trackDb* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/hgFindSpec* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chromInfo* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chr*_gold* .
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/chr*_gap* .
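
The six downloads differ only in the table pattern, so they can be expressed as a loop. The sketch below prints the commands rather than running them; minimal_rsync_cmds is a hypothetical helper, and whether another assembly uses the same split gold and gap tables is an assumption you would need to check for that assembly.

```shell
#!/bin/sh
# Hypothetical helper: print one rsync command per minimal table pattern.
# It only prints the commands; pipe the output to sh to run them.
minimal_rsync_cmds() {
    db=$1
    for pattern in 'grp*' 'trackDb*' 'hgFindSpec*' 'chromInfo*' 'chr*_gold*' 'chr*_gap*'; do
        # Quote the URL so the wildcard is expanded by the server, not locally.
        echo "rsync -avP 'rsync://hgdownload.cse.ucsc.edu/mysql/$db/$pattern' ."
    done
}

minimal_rsync_cmds hg19
```

Run from inside /var/lib/mysql/hg19, the printed commands could be executed with minimal_rsync_cmds hg19 | sh.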

The DNA sequence can be downloaded thus:

cd /gbdb
mkdir -p hg19/nib
cd hg19/nib
rsync -avP rsync://hgdownload.cse.ucsc.edu/gbdb/hg18/nib/chr*.nib .

Again we will need to tell SELinux to allow the webserver to access these files:

semanage fcontext -a -t httpd_sys_content_t "/gbdb(/.*)?"
restorecon -R -v /gbdb

Create hgFixed database:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP,
ALTER, CREATE TEMPORARY TABLES
  ON hgFixed.*
  TO ucsc_admin@localhost
  IDENTIFIED BY 'admin_password';

GRANT SELECT, CREATE TEMPORARY TABLES
  ON hgFixed.*
  TO ucsc_browser@localhost
  IDENTIFIED BY 'browser_password';

mysql -u ucsc_admin -p -e "create database hgFixed"

The browser now functions properly and we can browse the hg19 assembly.

The third part of this blog post looks at loading custom data; for this we will use some Drosophila melanogaster data taken from the modENCODE project. We will therefore need to repeat the steps we used to mirror the hg19 assembly to produce a local copy of the dm3 database. Alternatively, if you have sufficient disk space on your virtual machine you can grab all the D. melanogaster data like so:

mysql -u ucsc_admin -p -e "create database dm3"
cd /var/lib/mysql/dm3
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/dm3/* .
cd /gbdb
mkdir dm3
cd dm3
rsync -avP rsync://hgdownload.cse.ucsc.edu/gbdb/dm3/* .

At present these commands will download approximately 30 GB.
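
Given the size of the download, it may be worth checking free space first. This is a minimal sketch using POSIX df; has_free_gb is a hypothetical helper, and because the data is split between /var/lib/mysql and /gbdb, checking a single directory against the full figure is a deliberately conservative simplification.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is there at least N GB free under a path?
# Uses POSIX 'df -Pk', whose fourth column is available space in 1 KB blocks.
has_free_gb() {
    dir=$1
    need_gb=$2
    [ -d "$dir" ] || return 1
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

if has_free_gb /var/lib/mysql 30; then
    echo "enough space for the dm3 download"
else
    echo "less than 30 GB free under /var/lib/mysql, or directory missing" >&2
fi
```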

[This tutorial continues in Part 2: The UCSC genome browser database structure]


10 Comments

  1. cjjimenez

    I try to use the indicated command which is:
    ./updateHtml.sh browserEnvironment.txt

    and I got the following error:
    ./updateHtml.sh: line 30: .: browserEnvironment.txt: file not found
    ERROR: BROWSERHOME directory does not exist:
    ERROR: CGI_BIN directory does not exist:
    ERROR: check on the existence of the mentioned directories

    Please lend some advice in light of this problem. Thanks much

  2. cjjimenez

    By the way, this is my browserEnvironment.txt:

    #
    # browserEnvironment.txt
    #
    # example UCSC genome mirror site definition file
    #
    # copy this entire product/scripts/ directory to some directory outside
    # of the kent source tree. Edit this browserEnvironment.txt file
    # for your local site. This file is used as an argument to the scripts
    # for building the source tree and fetching /gbdb/ and goldenPath/ files
    # from UCSC and loading database tables.
    #

    # MACHTYPE needs to be a simple string such as: i386 or ppc or x86_64
    # see what your uname -m produces, make sure it has no - characters
    export MACHTYPE=`uname -m`
    # KENTHOME - directory where you want the kent source tree and built
    # binaries to exist. This is typically your $HOME/ directory
    export KENTHOME="/scratch/tmp"
    # Set KENTBIN to a directory where you want the results of the kent
    # source tree build. *NOTE* This will also require a symlink
    # in your HOME directory: $HOME/bin/$MACHTYPE -> $KENTBIN/$MACHTYPE
    # as the source tree often places resulting binaries in
    # $HOME/bin/$MACHTYPE despite this directive (this deficiency
    # to be corrected in later versions of the source tree)
    export KENTBIN="${KENTHOME}/bin"
    export BINDIR="${KENTBIN}/$MACHTYPE"
    # This SCRIPTS is a directory where scripts from the kent source tree build
    # will be placed
    export SCRIPTS="${KENTBIN}/scripts"
    # kentSrc - location where you want your master kent source tree to live
    export kentSrc="${KENTHOME}/kent"
    # LOGDIR - location to keep log files of script activities:
    export LOGDIR="${KENTHOME}/browserLogs"
    # GBDB - You will need a symlink /gbdb -> to this directory with gbdb downloads
    export GBDB="/scratch/tmp/gbdb"
    # BROWSERHOME - directory where cgi-bin/ trash/ and htdocs/ should exist
    # This is typically something like /var/www or /usr/local/apache
    export BROWSERHOME="/scratch/tmp"
    # DOCUMENTROOT - location of top level htdocs for downloading UCSC static html
    # file hierarchy
    export DOCUMENTROOT="${BROWSERHOME}/htdocs"
    # GOLD - where to download goldenPath/database/ directories
    export GOLD="${DOCUMENTROOT}/goldenPath"
    # TRASHDIR - the full path to your "trash" directory
    # *BEWARE* this TRASHDIR specification is used by the trashCleaner.sh
    # script to remove files in this directory. It needs to be correct.
    # It must be an absolute path starting with / and it must end in /trash
    export TRASHDIR="${BROWSERHOME}/trash"
    # cgi-bin/ must be in the same directory as htdocs/ and trash/
    export CGI_BIN="${BROWSERHOME}/cgi-bin"
    # Adding these to your path will enable finding all the kent commands
    export PATH="$KENTBIN/$MACHTYPE:$KENTBIN/:$PATH"
    # where to find the hgsql command for database operations
    export HGSQL="$KENTBIN/$MACHTYPE/hgsql"
    # You will need to find out where your MySQL libs are.
    # see notes in: README.building.source about finding them
    export MYSQLLIBS="/usr/lib64/mysql/libmysqlclient.a -lz"
    export MYSQLINC="/usr/include/mysql"
    # If your mysql is SSL enabled, use the -lssl:
    # export MYSQLLIBS="/usr/lib64/mysql/libmysqlclient.a -lssl -lz"

    # You may need to specify the location of your png library if
    # it is not in any of: /usr/lib/libpng.a /usr/lib64/libpng.a
    # and your compiler can't find it when specified as: -lpng
    # export PNGLIB="/opt/local/lib/libpng.a"
    # export PNGINCL"-I/opt/local/include/libpng12"

    # Optional support of the BAM file format, see also:
    # http://genomewiki.ucsc.edu/index.php/Build_Environment_Variables
    # You will need to install the samtools library from
    # http://sourceforge.net/projects/samtools/files/samtools/
    # export USE_BAM=1
    # export KNETFILE_HOOKS=1
    # export SAMDIR=${BROWSERHOME}/samtools-0.1.11
    # export SAMINC=${SAMDIR}
    # export SAMLIB=${SAMDIR}/libbam.a
    # Optional support of the tabix compression and binary index format
    # (required for support of the VCF file format),
    # analogous to configuration of samtools above, see;
    # http://genomewiki.ucsc.edu/index.php/Build_Environment_Variables
    # You will need to install the tabix library from
    # http://sourceforge.net/projects/samtools/files/tabix/
    # export USE_TABIX=1
    # export KNETFILE_HOOKS=1
    # export TABIXDIR=${BROWSERHOME}/tabix-0.2.3
    # export TABIXINC=${TABIXDIR}
    # export TABIXLIB=${TABIXDIR}/libtabix.a

    export ENCODE_PIPELINE_BIN="${BINDIR}"
    # protect these scripts by specifying the single machine on which they
    # should run and the single user name which should be running them.
    # for the AUTH_MACHINE you may need to see what uname -n says
    export AUTH_MACHINE="yourMachineName"
    export AUTH_USER="yourUserName"

    ########################################################################
    # probably no changes needed with these variables, they are just convenient
    # global variables for all the scripts
    # rsync command used to fetch from hgdownload.cse.ucsc.edu
    # Note the explicit arguments for mirroring in an attempt to work-around
    # the UCSC permissions on directories and files which are not
    # consistent for 'group' rwx permissions.
    export RSYNC="rsync -rltgoD"
    # rsync location at UCSC to fetch files, select one of these two servers:
    export HGDOWNLOAD="rsync://hgdownload.cse.ucsc.edu"
    # alternate rsync location at UCSC to fetch files
    # export HGDOWNLOAD="rsync://hgdownload-sd.cse.ucsc.edu"
    #
    # END OF CONFIGURATION VARIABLES
    ########################################################################

    # verify this machine and user are correct
    export MACHINE_NAME=`uname -n | sed -e 's/\..*//'`
    if [ "X${AUTH_MACHINE}Y" != "X${MACHINE_NAME}Y" ]; then
    echo "ERROR: must be run on ${AUTH_MACHINE}, this machine is: ${MACHINE_NAME}"
    exit 255;
    fi

    if [ -z "${LOGNAME}" ]; then
    echo "ERROR: not found, expected environment variable: LOGNAME"
    exit 255;
    fi

    if [ -z "${USER}" ]; then
    USER=${LOGNAME}
    export USER
    fi

    if [ "X${AUTH_USER}Y" != "X${USER}Y" ]; then
    echo "ERROR: must be run only by user ${AUTH_USER}, your are: ${USER}"
    exit 255;
    fi

  3. caseybergman

    It looks to me like you have not created the BROWSERHOME and CGI_BIN directories, or they don’t have the right permissions.

  4. Hamed

    cjjimenez, have you solved your problem?

    I have the same problem.

    ./updateHtml.sh: line 30: .: browserEnvironment.txt: file not found
    ERROR: BROWSERHOME directory does not exist:
    ERROR: CGI_BIN directory does not exist:
    ERROR: check on the existence of the mentioned directories

    And:

    BROWSERHOME is “/var/www”
    CGI_BIN is “/var/www/cgi-bin”
    DOCUMENTROOT is “/var/www/htdocs”

    both directories already exist.

    I don’t know what to do, would you please help me?

  5. cjjimenez

    Hamed,

    I’m sorry, it is only now that I have read your post. I still have no idea how to edit the “browserEnvironment.txt”? By the way, I have skipped this step because as I see it, it only arranges the directory system of the kent source tree. How about you, have you solved it already?

  6. Tim Burgis

    Hi Hamed and cjjimenez,

    The browserEnvironment.txt file should be in the same directory as the updateHtml.sh script but if you can’t see it there (or you have placed it elsewhere) try giving the full path to browserEnvironment.txt file. For example, if it was in my home directory then I would use the command:

    ./updateHtml.sh /home/tim/browserEnvironment.txt

    I hope that fixes your problem. Let me know how you get on.

  7. cjjimenez

    Tim Burgis,

    Thanks for the answer. But I have confirmed that the files updateHtml.sh and browserEnvironment.txt are in the same directory before I posted here. That’s why I’m wondering what is the problem when these two files are in the same directory.

  8. Great job Casey.

  9. Juan

    Hi there,

    I was following this post to install the UCSC Genome Browser on our CentOS server, and I believe it is still valid although it is two years old, right? Anyway, I get an error here:

    $ mysql -u ucsc_admin -p hgcentral < hgcentral.sql
    Enter password:
    ERROR 1044 (42000) at line 177: Access denied for user 'ucsc_admin'@'localhost' to database 'hgcentral'

    I know this is not directly related to the installation but rather to MySQL. ucsc_admin has all the privileges granted earlier, but still I can't create the database…

    So, I did mysql -u root -p hgcentral < hgcentral.sql

    Will that be OK?

    Thanks in advance!
    Juan

  10. Adrian Platts

    This page remains a great resource, thanks for putting it together. I just wanted to add some comments having set up a browser mirror on CentOS7 this morning…

    >export MACHTYPE=i386

    I think this should generally now be:

    export MACHTYPE=x86_64

    >export MYSQLLIBS="/usr/lib/mysql/libmysqlclient.a -lz"

    not only has this shifted over by default to the 64bit directory, but I found I needed an extra library to be linked as well:

    export MYSQLLIBS="/usr/lib64/mysql/libmysqlclient.a -lz -ldl"

    this probably also needs adjusting in browserEnvironment.txt

    One of the things I hit that many others seem to find was the permissions errors generated around trash directories, and I didn’t see a good reply on the UCSC lists. For me it turned out that not only did I need to set ownership/permissions, but I also needed to let SELinux know that apache was good writing in this directory:

    chcon -R -t httpd_sys_rw_content_t trash

    Finally the rsync settings seemed to have changed slightly and I had to tweak the ones above to enable compression:

    rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/mysql/hg19/grp* .

    But other than those small tweaks, this guide really helped. Thanks.


