Compiling UCSC Source Tree Utilities on OSX

Posted 12 Mar 2009 in genome bioinformatics, OSX hacks, UCSC genome browser

The UCSC genome bioinformatics site is widely regarded as one of the most powerful bioinformatics portals for experimental and computational biologists on the web. The ability to visualize genomics data through the genome browser and perform data mining through the table browser, coupled with the ability to easily import custom data, permits a large range of genome-wide analyses to be performed with relative ease. One limitation of web-based access to the UCSC genome browser is the inability to automate your own analyses, which has led to the development of systems such as Galaxy that provide ways to record and share your analysis pipeline.

However, for those of us who would rather type than click, another solution is to download the source code (originally developed by Jim Kent) that builds and runs the UCSC genome browser and integrate the amazing set of stand-alone executables into your own command-line workflows. As concisely summarized by Peter Schattner in an article in PLoS Computational Biology, the Source Tree includes:

“programs for sorting, splitting, or merging fasta sequences; record parsing and data conversion using GenBank, fasta, nib, and blast data formats; sequence alignment; motif searching; hidden Markov model development; and much more. Library subroutines are available for everything from managing C data structures such as linked lists, balanced trees, hashes, and directed graphs to developing routines for SQL, HTML, or CGI code. Additional library functions are available for biological sequence and data manipulation tasks such as reverse complementation, codon and amino acid lookup and sequence translation, as well as functions specifically designed for extracting, loading, and manipulating data in the UCSC Genome Browser Databases.”

Compiling and installing the utilities from the source tree is fairly straightforward on most Linux systems, although my earliest attempts to install on a PowerPC OSX machine failed several times. The problems relate to building some executables around MySQL libraries, which I never fully sorted out, but I’ve now gotten a fairly robust protocol for installation on an i386 OSX machine. These instructions are adapted from the general installation notes in kent/src/README.

1) Install MySQL (5.0.27) and MySQL-dev (3.23.58) using Fink.

2) Install libpng. [Note: my attempts to do this via Fink were unsuccessful.]

3) Obtain and Make Kent Source Tree Utilities

$ wget http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
$ unzip jksrc.zip
$ mkdir -p $HOME/bin/i386
$ sudo mkdir -p /usr/local/apache/
$ sudo mkdir -p /usr/local/apache/cgi-bin-yourusername
$ sudo chown -R yourusername /usr/local/apache/cgi-bin-yourusername
$ sudo mkdir -p /usr/local/apache/htdocs/
$ sudo chown -R yourusername /usr/local/apache/htdocs
$ export PATH=$PATH:$HOME/bin/i386

[Note: the bin directory must be added to PATH before running make, since some parts of the build require executables that are put there earlier in the build.]
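
To confirm that the bin directory really is on PATH in the shell you will build from, a check along the following lines may help. This is only a sketch: the helper name on_path and the variable BINDIR are my own and are not part of the Kent source tree.

```shell
# Hypothetical helper (not part of the Kent tree):
# succeeds if the directory given as $1 is an entry in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# BINDIR is an illustrative name for the directory created above.
BINDIR="$HOME/bin/i386"

if on_path "$BINDIR"; then
    echo "$BINDIR is on PATH"
else
    echo "$BINDIR is missing from PATH; adding it for this shell"
    PATH="$PATH:$BINDIR"
    export PATH
fi
```

Wrapping PATH in colons before matching means a directory such as $HOME/bin/i386-old is not mistaken for $HOME/bin/i386.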

$ export MACHTYPE=i386
$ export MYSQLLIBS="/sw/lib/mysql/libmysqlclient.a -lz"
$ export MYSQLINC=/sw/include/mysql
$ cd kent/src/lib
$ make
$ cd ../jkOwnLib
$ make
$ cd ..
$ make
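
Since the make steps rely on MACHTYPE, MYSQLLIBS and MYSQLINC being set in the same shell, a small pre-flight check before running make may save a failed build. The following is a minimal sketch under that assumption; the function name check_kent_env is hypothetical and does not come from the Kent source tree.

```shell
# Hypothetical pre-flight check (not part of the Kent tree):
# prints each build variable that is unset or empty and
# returns non-zero if any of them is missing.
check_kent_env() {
    missing=0
    for var in MACHTYPE MYSQLLIBS MYSQLINC; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "not set: $var"
            missing=1
        fi
    done
    return "$missing"
}

if check_kent_env; then
    echo "environment looks ready for make"
fi
```

The check only tests that the variables are non-empty; it does not verify that the MySQL library and include paths they point to actually exist.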

These instructions should (hopefully) cleanly build the code base that runs a mirror of the UCSC genome browser, as well as the ~600 utilities, including my personal favorite, overlapSelect (which I plan to write more about later).

Notes: This solution works on a 2.4 GHz Intel Core 2 Duo MacBook running Mac OS X 10.5.6 using i686-apple-darwin9-gcc-4.0.1. Thanks go to Max Haeussler for tipping me off to the Source Tree and the power of overlapSelect. This protocol was updated 19 March 2011 and works on the 9 March 2011 UCSC jksrc.zip file.


5 Comments

  1. Max

    What might be even easier is to simply download the Mac OS binaries directly from the UCSC binary download directories: http://hgdownload.cse.ucsc.edu/admin/exe/

  2. caseybergman

    Thanks for the pointer, Max, I hadn't seen this. However, it looks like these directories only have binaries for ~40 of the ~600 UCSC tools (not including overlapSelect), so it may still be necessary in most cases to build these tools from source.

  3. Max

    Oups… I hadn't seen this…

  4. Maxim

    Great tutorial,

    after two days of having no luck installing the JK source tree package on Ubuntu/Suse/Redhat, this tutorial worked like a charm.

    However, on my MacBook Pro (1.82 GHz, OS X 10.5.8) the install additionally needed

    export OSTYPE

    to be run before compiling the libs.
    Best
    Maxim

  5. Yifang

    I spent two weeks trying to figure out how to compile some of the tools from this package/collection, with no luck. For example, I tried the tool faToFastq, which I want to compile from source code. What I did:
    1) Pulled out all the header files that are listed in the #include "xxxx.h" lines
    2) Put these files in the same folder as faToFastq.c
    3) gcc -Wall -o fa2fq faToFastq.c

    ///////////////////////////////////////////////////////////////
    /tmp/ccsf9ywG.o: In function `usage':
    faToFastq.c:(.text+0xf): undefined reference to `errAbort'
    /tmp/ccsf9ywG.o: In function `faToFastq':
    faToFastq.c:(.text+0x32): undefined reference to `lineFileOpen'
    faToFastq.c:(.text+0x47): undefined reference to `mustOpen'
    faToFastq.c:(.text+0xf5): undefined reference to `faMixedSpeedReadNext'
    faToFastq.c:(.text+0x109): undefined reference to `carefulClose'
    /tmp/ccsf9ywG.o: In function `main':
    faToFastq.c:(.text+0x132): undefined reference to `optionInit'
    faToFastq.c:(.text+0x153): undefined reference to `optionVal'
    collect2: error: ld returned 1 exit status
    ///////////////////////////////////////////////////////////////////////////

    Then I found out the errors are all related to the prototypes of those functions in the faToFastq.c file, as none of the header files included the prototypes of these functions.
    I spent hours and hours trying to find them, without success. Do you have any idea how to find those function prototypes/declarations for this program? Thanks a lot!


