Genome-wide association studies (GWAS) have revolutionised the genetic mapping of complex traits and diseases over the last decade [1-3]. However, a considerable proportion of the markers identified to date lie within non-coding regions and/or may be only proxies for the actual causal variants [2,3]. Tools that aid the visual inspection of these loci may facilitate the identification of functional elements located near GWAS-associated variants. LocusZoom [4] and SNAP-plot [5] have become widely used tools for generating locus-specific graphical displays of association results in the context of linkage disequilibrium (LD), the position of nearby genes and local recombination hotspots. However, it is becoming increasingly important to also visualise GWAS results in the context of functional annotations beyond genes (e.g. chromatin state, transcription factor binding sites, phylogenetic conservation, etc.) [6]. Thus, we have developed LocusTrack, a web-based application that allows the user to generate regional GWAS results plots that incorporate genomic annotations within the same figure. Currently, LocusTrack supports both user-provided custom tracks and tracks from the UCSC Genome Browser.
Implementation
Features and functionality
LocusTrack plots display regional GWAS results in the top panel (Figure 1a). Here, the user can choose between showing P-values on the –log10 scale (i.e. LocusZoom-like fashion) on the left y-axis or displaying LD (r²) (i.e. SNAP-like fashion), which is often useful for investigating a region in the absence of P-values. Recombination rates are represented on the right y-axis. By default, LocusTrack selects the SNP with the strongest association and generates a plot according to a user-defined window-frame size. However, the user can also specify any other SNP(s) if desired. The plot also shows the pairwise LD pattern of each SNP with the user-specified SNP. Users can choose to compute LD (r²) estimates from any of the available 1000 Genomes Project populations.
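The logic of the top panel can be illustrated with a minimal Python sketch. It is not LocusTrack's own code: the record layout (dictionaries with `id`, `chrom`, `pos` and `p` keys), the function names, and the interpretation of the window size as a flank on either side of the lead SNP are all assumptions made for illustration. The r² function uses the standard definition for two biallelic SNPs from phased haplotypes coded 0/1.

```python
import math


def neg_log10(p):
    """Transform a P-value to the -log10 scale plotted on the left y-axis."""
    return -math.log10(p)


def select_region(snps, window_bp, lead_id=None):
    """Pick the lead SNP and return it with the SNPs inside the window.

    snps      : list of dicts with keys "id", "chrom", "pos", "p" (assumed layout)
    window_bp : flank, in base pairs, on each side of the lead SNP (assumption)
    lead_id   : optional SNP chosen by the user; defaults to the smallest P-value
    """
    if lead_id is None:
        lead = min(snps, key=lambda s: s["p"])
    else:
        lead = next(s for s in snps if s["id"] == lead_id)
    start, end = lead["pos"] - window_bp, lead["pos"] + window_bp
    in_window = [
        s for s in snps
        if s["chrom"] == lead["chrom"] and start <= s["pos"] <= end
    ]
    return lead, in_window


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


# Hypothetical example data, for illustration only.
example_snps = [
    {"id": "rs1", "chrom": "1", "pos": 1000, "p": 1e-3},
    {"id": "rs2", "chrom": "1", "pos": 5000, "p": 1e-8},
    {"id": "rs3", "chrom": "1", "pos": 9000, "p": 0.2},
    {"id": "rs4", "chrom": "2", "pos": 5000, "p": 0.5},
]
lead, region = select_region(example_snps, window_bp=3000)
```

In this toy example the default lead SNP is `rs2`, and with a 3,000 bp flank only `rs2` itself falls inside the window; passing `lead_id` reproduces the option of centring the plot on any other SNP.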
The second LocusTrack panel displays the symbols and locations of genes within the region (Figure 1b). Intron and exon positions are displayed in a similar fashion to LocusZoom. The orientation of the transcribed strand is indicated by differential colouring (blue = plus strand; red = minus strand) and arrows. The position of gene symbols is automatically adjusted to minimise the area occupied in the figure and to avoid overlap with one another.
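One common way to lay out genes compactly without overlap is greedy row assignment, sketched below in Python. The text does not describe the layout algorithm LocusTrack actually uses, so this is an illustrative stand-in; the function name, the `(symbol, start, end)` tuple layout and the `min_gap` parameter are hypothetical.

```python
# Strand colours as described for the gene panel.
STRAND_COLOUR = {"+": "blue", "-": "red"}


def assign_rows(genes, min_gap=0):
    """Assign each gene to the first row in which it does not overlap.

    genes   : iterable of (symbol, start, end) tuples (assumed layout)
    min_gap : minimum spacing, in base pairs, between genes in a row
    Returns a dict mapping gene symbol to a 0-based row index.
    """
    row_ends = []      # end coordinate of the last gene placed in each row
    placement = {}
    for symbol, start, end in sorted(genes, key=lambda g: g[1]):
        for row, last_end in enumerate(row_ends):
            if start > last_end + min_gap:
                row_ends[row] = end
                placement[symbol] = row
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            placement[symbol] = len(row_ends) - 1
    return placement


# Hypothetical gene coordinates, for illustration only.
rows = assign_rows([("A", 100, 200), ("B", 150, 300), ("C", 250, 400)])
```

Here gene B overlaps gene A and is pushed to a second row, whereas gene C starts after A ends and reuses the first row, which keeps the panel as shallow as possible.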
LocusZoom provides the option for data points to reflect different genomic annotations, such as synonymous variants, splice variants, transcription factor binding sites, conservation, and presence in the GWAS catalogue. LocusTrack can incorporate any type of annotation in the form of genomic tracks in a third panel (Figure 1c). The user can specify between 1 and 10 different tracks, which can be custom tracks (i.e. the user must upload the data), LD tracks (i.e. a track displaying LD of the SNPs in another population), or publicly available UCSC tracks. Note that LocusTrack uses the Bioconductor package rtracklayer [7] to retrieve and parse UCSC tables. However, some tables come in a non-parseable form, or are truncated by the UCSC browser if they exceed certain limits (usually around 100,000 records), so they cannot be obtained by the program. This is particularly true for wiggle and big-wiggle format files. In these cases, the user can download the tracks directly via the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and input them as custom tracks.
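As an illustration of how a downloaded table might be prepared as a custom track, the Python sketch below parses BED-like text and keeps the features that overlap the plotted region. The input format is an assumption (the text does not state which custom-track formats LocusTrack accepts), and the function names are hypothetical. Coordinates are treated as 0-based and half-open, following the BED convention.

```python
def parse_bed(text):
    """Parse BED-like text into (chrom, start, end, name) tuples.

    Header lines beginning with "#", "track" or "browser" are skipped.
    """
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        features.append((chrom, start, end, name))
    return features


def overlapping(features, chrom, region_start, region_end):
    """Keep features that overlap the half-open region [region_start, region_end)."""
    return [
        f for f in features
        if f[0] == chrom and f[1] < region_end and f[2] > region_start
    ]


# Hypothetical track content, for illustration only.
bed_text = (
    "track name=example\n"
    "chr1\t100\t200\tpeak1\n"
    "chr1\t500\t600\tpeak2\n"
    "chr2\t100\t200\tpeak3\n"
)
in_region = overlapping(parse_bed(bed_text), "chr1", 150, 400)
```

Restricting a large table to the region of interest before upload also keeps the file small, which helps when the full table would exceed the browser's record limit.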
Our application also allows users to zoom in and focus on a smaller region in the bottom panel, drawn from that shown in the first two panels. This provides a closer look at the annotation tracks in the region of interest, without modifying the plots in the upper panels. This region can be defined either by an LD cut-off or by a simple zoom-in. In addition, to facilitate inspection, LocusTrack can display every assessed SNP in a track-like fashion, using the same colour-coding of SNPs as in the first panel.
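One plausible reading of the LD-based zoom is that the region spans all SNPs whose r² with the lead SNP meets the cut-off. The Python sketch below implements that interpretation; it is an assumption rather than a description of LocusTrack's internals, and the function name and default threshold of 0.8 are hypothetical.

```python
def ld_zoom_region(positions, r2_with_lead, cutoff=0.8):
    """Return (start, end) spanning all SNPs with r^2 >= cutoff, or None.

    positions    : base-pair positions of the SNPs in the plotted region
    r2_with_lead : r^2 of each SNP with the lead SNP, in the same order
    cutoff       : LD threshold (the default is an illustrative choice)
    """
    selected = [
        pos for pos, r2 in zip(positions, r2_with_lead) if r2 >= cutoff
    ]
    if not selected:
        return None
    return min(selected), max(selected)


# Hypothetical values, for illustration only.
zoom = ld_zoom_region(
    positions=[100, 200, 300, 400, 500],
    r2_with_lead=[0.1, 0.85, 1.0, 0.9, 0.3],
    cutoff=0.8,
)
```

With these values the zoomed region runs from position 200 to 400, excluding the two flanking SNPs in weak LD with the lead SNP.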
Finally, our application generates an R object with the annotations requested for each specified locus (e.g. genes located in that region, LD, and the information from the selected tracks), facilitating GWAS annotation.