Genes are segments of DNA that carry the genetic or inherited information within all living organisms and interact with each other to influence the organism's physical development and behavior. Gene expression is a key indicator of gene activity and can be measured in microarray experiments. Microarray experiments allow biologists to monitor the activity of thousands of genes in parallel across multiple conditions or, more commonly, across multiple stages of a biological process. In the latter case they monitor changes in the expression of genes over time and result in microarray time-series data. The analysis of this data is relevant to many areas of biology and medicine, such as studying treatments, disease, and developmental stages.
To illustrate the Time-series Explorer concept we have developed, we will use an example microarray time-series data set from an experiment to study the activity of genes from breast tissue in a mouse during stages of development from a virgin state to post-pregnancy. Knowledge gained through analysis of such data can be related to similar physiology in human beings and used in the study of diseases such as breast cancer. Figure 1 shows the activity-against-time plot of developmental microarray time-course data for a single gene. Here there are 17 time-points labelled to indicate the timing in relation to stages of development. V denotes the virgin stage and V10 denotes 10 hours into the virgin stage. The other stages are pregnancy (P), lactation (LAC) and involution (INV). A typical microarray time-course experiment produces data like this for around 8,000 genes, each having an expression value at every time-point.
Analysis of microarray time-course data generally involves the detection and exploration of patterns of correlated gene activity which relate to some biological activity. These patterns can be dominant trends, less dominant patterns or interval patterns. Dominant trends associate large numbers of genes with similar activity over the majority of time-points. Less dominant patterns associate smaller numbers of genes, and interval patterns associate genes with correlated activity over a limited period of time. Figure 2 illustrates an interval pattern, showing activity plotted against time for a set of genes with activity correlated over the interval from P1 to P3. This type of pattern is likely to suggest that the group of genes is related to a particular biological process and that this process is associated with the experimental conditions.
When patterns are found, exploring those patterns is likely to involve finding relationships between them, relating patterns to predefined groupings and relating patterns to the experimental conditions. Each of these activities is likely to increase the biologist's understanding of the underlying biological system and bring them nearer to forming valuable hypotheses. The problem is that, due to the complexity and scale of the data, many relevant patterns are often very difficult to find. Indeed, without significant prior knowledge of a pattern's existence, the majority of applicable techniques only allow a biologist to find dominant patterns and a limited number of the less dominant patterns. The Time-series Explorer technique is the only technique that allows biologists to find interval patterns without prior knowledge of both the interval over which activity is correlated and the profile of that correlated activity.
The Time-series Explorer allows biologists to explore microarray time-course data with the use of two main coordinated visualisations: a graph view as described above and a scatter-plot. The primary purpose of the scatter-plot is to provide a representation that can be used to relate genes to patterns of activity associated with intervals (periods of time with a start time, end time and duration) of the overall time-frame. To do this, the scatter-plot displays genes as single points on a two-dimensional plot of gene expression against change in gene expression for the selected interval. This results in a plot with genes of high rising expression occurring in the top right quadrant and those of low falling expression in the lower left quadrant of the plot. Figure 3 shows a screenshot of the Time-series Explorer software interface. This contains five main panels with which the user can interact to manipulate the representations of their data in order to find patterns. These are the toolbar, graph view, scatter-plot, gene list and grouping panel.
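The mapping from a gene's time-course to a single scatter-plot point can be sketched as follows. This is a minimal illustration rather than the software's actual computation: it assumes that the vertical coordinate is the mean expression over the selected interval, that the horizontal coordinate is the last value in the interval minus the first, and that expression values are scaled so that zero separates high from low. The function names and the example profiles are hypothetical.

```python
from statistics import mean


def interval_coordinates(profile, start, end):
    """Return (change, level) for one gene over the inclusive index interval.

    Assumptions (not taken from the source): `level` is the mean expression
    over the interval and `change` is the last value minus the first.
    """
    if not 0 <= start < end < len(profile):
        raise ValueError("interval must satisfy 0 <= start < end < len(profile)")
    window = profile[start:end + 1]
    change = window[-1] - window[0]
    level = mean(window)
    return change, level


def quadrant(change, level):
    """Name the scatter-plot quadrant for a point (x = change, y = level)."""
    vertical = "top" if level >= 0 else "lower"
    horizontal = "right" if change >= 0 else "left"
    return f"{vertical} {horizontal}"


# Hypothetical scaled expression profiles over five time-points.
genes = {
    "gene_a": [-1.0, -0.5, 0.5, 1.5, 2.0],   # ends high and rising
    "gene_b": [1.0, 0.5, -0.5, -1.5, -2.0],  # ends low and falling
}

placements = {
    name: quadrant(*interval_coordinates(profile, 2, 4))
    for name, profile in genes.items()
}
```

With these assumptions, a gene that is high and rising over the interval lands in the top right quadrant and one that is low and falling lands in the lower left, matching the layout described above.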
The graph view (ii. in Figure 3) allows users to select and adjust the parameters (start time, end time or duration) of the interval to be displayed, either to focus in on a specific interval or to animate the interval scatter-plot to reveal general trends and outliers across the time-course. The interaction mechanism of the graph view is a multi-range dynamic query slider which utilizes the internal slider space for a visual representation of the data. The selected interval is represented by a vertical bar overlaid onto the view. Dragging the edges of this bar allows the user to adjust the start and end times independently. Dragging the centre of the bar shifts the selected interval, changing the start and end times while the duration remains constant. During this interaction, if the selected interval is shifted in the positive direction, from earlier to later time-points, changes across time in the animated scatter-plot convey changes across time in the data.
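The three drag operations on the interval bar can be modelled with a small immutable value type. The sketch below is an assumption about how such a slider might behave, not a description of the tool's implementation: time-points are represented by indices 0 to 16 (for the 17 time-points of the example data set), drags are clamped so that the interval stays inside the time-course, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A selected interval over time-point indices (hypothetical model)."""
    start: int
    end: int
    n_points: int = 17  # the example data set has 17 time-points

    def __post_init__(self):
        if not 0 <= self.start < self.end < self.n_points:
            raise ValueError("need 0 <= start < end < n_points")

    @property
    def duration(self):
        return self.end - self.start

    def drag_start(self, new_start):
        """Drag the left edge: the end time stays fixed."""
        new_start = max(0, min(new_start, self.end - 1))
        return Interval(new_start, self.end, self.n_points)

    def drag_end(self, new_end):
        """Drag the right edge: the start time stays fixed."""
        new_end = max(self.start + 1, min(new_end, self.n_points - 1))
        return Interval(self.start, new_end, self.n_points)

    def shift(self, steps):
        """Drag the centre: both ends move and the duration is preserved."""
        steps = max(-self.start, min(steps, self.n_points - 1 - self.end))
        return Interval(self.start + steps, self.end + steps, self.n_points)


selected = Interval(3, 5)
shifted = selected.shift(2)     # moves to indices 5..7, duration unchanged
clamped = selected.shift(100)   # stops at the end of the time-course
widened = selected.drag_end(9)  # same start, later end
```

Stepping `shift(1)` repeatedly and redrawing the scatter-plot after each step would correspond to the forward animation described above.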
An animation across time can be initiated either by clicking the interval as it is represented in the graph view and dragging it from left to right, or by using the play button in the toolbar for an animation with a regular frame rate. During such an animation, genes with outlying high, low, rising or falling interval activity remain on the periphery of the scatter-plot and move smoothly in a predictable anticlockwise rotation around the axes' origin. This effect is best explained by relating the axes of the scatter-plot to the cardinal points of a compass (Figure 4). Conversely, if the animation of the scatter-plot is reversed by shifting the selected interval backwards, gene representations will move clockwise. Whichever the direction of animation, this uniformity of rotational direction makes it easier to interpret more complex patterns of activity. This is because, with a uniform rotational direction, the representations of genes are less likely to have crossing paths and more likely to remain distinct during the animation. This, in turn, reduces the ambiguity in relating the representation of genes from one selected interval to another and allows genes to be tracked across time with multiple trends in activity related to the same genes or gene groupings.
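The rotation can be checked numerically for an idealised gene. The intuition is that a rising gene (positive change, right half of the plot) must move upwards as its expression level increases, while a falling gene (left half) must move downwards, which together give an anticlockwise path. The sketch below assumes the horizontal axis is change and the vertical axis is mean expression over the interval, uses a hypothetical sine-shaped profile over 17 time-points, and determines the turn direction from the sign of the cross product of consecutive positions.

```python
import math


def interval_point(profile, start, end):
    """Return (change, level) over the inclusive interval (assumed definitions)."""
    window = profile[start:end + 1]
    return window[-1] - window[0], sum(window) / len(window)


def turn_directions(profile, width, reverse=False):
    """Classify each animation step as an anticlockwise or clockwise turn."""
    starts = range(len(profile) - width)
    if reverse:
        starts = reversed(starts)
    points = [interval_point(profile, s, s + width) for s in starts]
    directions = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # A positive cross product means the point turned anticlockwise
        # about the origin between the two frames.
        cross = x1 * y2 - y1 * x2
        directions.append("anticlockwise" if cross > 0 else "clockwise")
    return directions


# Hypothetical gene whose scaled expression follows one full sine cycle.
profile = [math.sin(2 * math.pi * t / 16) for t in range(17)]

forward = turn_directions(profile, width=2)
backward = turn_directions(profile, width=2, reverse=True)
```

For this profile every forward step turns anticlockwise and every backward step turns clockwise. Real expression profiles are noisier, so this only illustrates the tendency for smoothly varying outlying genes.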
To complement the benefits of uniform rotational direction, tight control of selected interval manipulation using the graph view gives users control over the pace of the animation. This allows them to slow down as interesting features become apparent, reverse the animation when they want to look at something again and stop the animation, when appropriate, to focus in on an interesting interval and investigate patterns occurring over that interval in more detail by interacting with the scatter-plot view.
Once the user ceases interacting with the graph view, there are a number of different options for interacting with the scatter-plot (iii. in Figure 3). The majority of these interactions employ standard brushing and linking information visualisation operations. If the labelling tool is activated from the interface toolbar, moving the mouse over gene representations in the scatter-plot view causes them to be labelled and have their activity over the entire time-course highlighted in the graph view. The excentric labelling tool functions in a similar way, except that all gene representations within the bounds of a visible circle are labelled and highlighted. The additional information revealed by labelling and the subsequent coordination between scatter-plot and graph views allows the user to rapidly assess a pattern's significance. If the user is interested in a smaller number of genes or wishes to investigate a sample of selected genes in more detail, double-clicking on gene representations in the scatter-plot opens a pop-up details-on-demand window describing the un-scaled recorded intensities for the subject gene and a summary of the groupings to which the gene belongs. This again leads to a more informed assessment of a pattern's significance.
As an alternative to labelling, when the freehand or box selection tools are activated, genes can be selected by clicking and dragging to draw a box or a freeform shape around their representations in the scatter-plot. In either case, the representations of un-selected genes are greyed out in both the graph and scatter-plot views, allowing users to focus in on selections, which can be colour-coded, labelled, animated and selected again (using logical AND or OR rules) independently of the un-selected data. This allows the user to find groupings within groupings and combine selections to uncover or investigate more complex patterns in the data.
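Combining selections with logical AND or OR rules maps naturally onto set intersection and union. The following sketch is illustrative only: the gene names, coordinates and function names are hypothetical, only box selection is shown, and scatter-plot positions are assumed to be (change, level) pairs.

```python
def box_select(points, x_min, x_max, y_min, y_max):
    """Return the set of genes whose scatter-plot position lies inside the box."""
    return {
        gene
        for gene, (x, y) in points.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    }


def combine(current, new, rule):
    """Combine an existing selection with a new one using an AND or OR rule."""
    if rule == "AND":
        return current & new  # genes in both selections
    if rule == "OR":
        return current | new  # genes in either selection
    raise ValueError("rule must be 'AND' or 'OR'")


# Hypothetical scatter-plot positions as (change, level) pairs.
points = {
    "g1": (1.5, 1.2),
    "g2": (1.1, -0.4),
    "g3": (-0.9, 0.8),
    "g4": (-1.3, -1.0),
}

rising = box_select(points, 0.0, 2.0, -2.0, 2.0)  # right half of the plot
high = box_select(points, -2.0, 2.0, 0.0, 2.0)    # top half of the plot

high_and_rising = combine(rising, high, "AND")
high_or_rising = combine(rising, high, "OR")
greyed_out = set(points) - high_and_rising  # un-selected genes
```

Applying an AND rule to a second box drawn at a different interval would narrow a selection to a grouping within a grouping, as described above.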
The toolbar (i. in Figure 3) contains 17 buttons in five groups with various functions, such as animating the scatter-plot view, changing the selection mode on the scatter-plot and viewing details for a selection. The operation of the grouping panel (v. in Figure 3) is similar to that of the Microsoft Windows file explorer tree-pane. Imported known groupings of genes are stored within folders that correspond to grouping categories. Clicking on a folder causes its contents to be expanded or collapsed, and clicking on a grouping name causes the genes which belong to that grouping to be selected. Buttons on the grouping panel mini-toolbar allow grouping categories to be added or deleted, or new groupings to be generated from the genes that are currently selected.
Figure 5 describes a pattern found when the Time-series Explorer was evaluated with the data used to investigate the developmental stages in mouse breast tissue described earlier. The pattern was discovered, in part, when investigating general trends across the entire time-period. As the scatter-plot animated through days 1 to 3 of the pregnancy stage, an outlying group of gene representations showed significant rising then falling expression. To investigate this further, the relevant interval was animated again (Figure 5a), then stopped so that the outlying genes could be labelled by moving the mouse over their representations in the scatter-plot (Figure 5b). This revealed that the majority of the genes also shared low expression over the remainder of the time-course (Figure 5c). Next, the genes were selected and cross-referenced with pre-defined gene classifications. Significantly, the selection was found to contain a high proportion of keratin-associated genes. Figure 5 illustrates this pattern, showing selected frames of the initial animation from interval P1 to P2 through to interval P2 to P3, the labelled scatter-plot at P1 to P2 and the effect of labelling in the coordinated graph view where the genes are highlighted.
Finding this pattern was a significant outcome of our user evaluation, verifying that the Time-series Explorer is uniquely capable of revealing certain previously unsuspected patterns of temporal activity. Later, the biologist involved in the evaluation was able to verify that the pattern found was of biological significance. Moreover, the technique also proved capable of revealing suspected patterns of temporal activity, and the evaluation uncovered significant advantages of the Time-series Explorer over other more established techniques. Specifically, when the technique was used to uncover general trends occurring over limited periods of the time-course, the user had the advantage of being able to quickly identify interesting sub-groupings; when identifying suspected outliers over smaller intervals, the technique offered the biologists the ability to perceive distinct groupings of outliers; and when looking for general trends across the entire time-course, the biologists found it easier to assess more subtle patterns of general activity. We believe, therefore, that other biologists would benefit from using the technique with their time-course data to help them find these types of pattern.
Our initial user evaluation of the software demonstrator showed its potential to allow biologists to overcome a significant limitation found to exist with other techniques. The tool was suitable for the bench biologist to explore their data rather than having to send the data to a statistician to analyse and return the results. This brought the biologist closer to the experiment and the exploration of the resulting data. The technique proved flexible, allowing the user to find a range of patterns without having to switch between unrelated views of the data.
However, the tool as it currently stands lacks some features which would be necessary to further improve the productivity and efficiency of microarray time-course data analysis by biologists.
1 Stein T, Morris J, Davies C, Weber-Hall S, Duffy M-A, Heath V, Bell A, Ferrier R, Sandilands G, and Gusterson B. Involution of the mouse mammary gland is associated with an immune cascade and an acute-phase response, involving LBP, CD14 and STAT3. Breast Cancer Research 2004; 6(2): R75-R91.
2 Craig P, Kennedy JB, and Cumming A. Animated Interval Scatter-plot Views for the Exploratory Analysis of Large Scale Microarray Time-course Data. Information Visualisation 2005; 4(3): (accepted for publication, to appear Autumn 2005)
3 Strategic Analysis of World DNA Microarray Markets. 2004, Frost & Sullivan.
4 World Bio-Informatics Market (2005-2010). 2005, RNCOS.