Table of Contents
Abstract
Build & Installation
Launching Hawkeye
Command Line Options
Sample Assembly
Contact Information
Acknowledgements
Displays
  Launch Pad
  Scaffold View
  Contig View
Scaffold View
The scaffold view of the assembly shows how the contigs and inserts are placed on the scaffold. It uses the mate-pair relationships and library sizes to categorize the "happiness" of each insert, that is, whether its paired reads are correctly oriented and at the expected distance apart. The threshold distance for a "happy" insert can be adjusted by setting the maximum number of standard deviations an insert size may lie from the library mean. Details on any object displayed in the view can be found by clicking on it, and right-clicking a read highlights the mate of an unhappy insert. Along with inserts and contigs, the view plots the read and insert coverage at each position along the scaffold, and a measure of the overall happiness of the inserts called the CE statistic. The viewer also highlights the locations of arbitrary features along the scaffold. This functionality is currently used to highlight clues of mis-assembly, such as regions of the genome where the assembly has a high occurrence of unhappy insert coverage, or regions with a high density of correlated SNPs. Both kinds of events are strong evidence of mis-assembly, and their co-occurrence at one location is nearly conclusive evidence.
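The happiness categories described above can be sketched in a few lines. This is a minimal illustration, not Hawkeye's code: it assumes both reads lie on the same scaffold and that the library is an "innie" library (left read forward, right read reverse), and it covers only four of the categories. `PlacedRead` and `classify_mate_pair` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    """A read's placement on the scaffold (hypothetical type for illustration)."""
    start: int     # leftmost scaffold coordinate covered by the read
    end: int       # one past the rightmost scaffold coordinate
    forward: bool  # True if the read is oriented forward on the scaffold


def classify_mate_pair(a: PlacedRead, b: PlacedRead,
                       lib_mean: float, lib_sd: float,
                       happy_distance: float = 3.0) -> str:
    """Return 'happy', 'compressed', 'stretched' or 'misoriented'.

    Simplified sketch: assumes both reads are on the same scaffold and an
    "innie" library (left read forward, right read reverse).
    """
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not (left.forward and not right.forward):
        return "misoriented"
    insert_size = right.end - left.start
    # Number of standard deviations the insert lies from the library mean.
    z = (insert_size - lib_mean) / lib_sd
    if z < -happy_distance:
        return "compressed"
    if z > happy_distance:
        return "stretched"
    return "happy"


# Usage with a hypothetical 3 kb library (standard deviation 300 bp).
print(classify_mate_pair(PlacedRead(0, 800, True),
                         PlacedRead(2200, 3000, False), 3000, 300))
```

Raising `happy_distance` widens the window, mirroring how the Happy Distance setting turns borderline compressed or stretched inserts back into happy ones.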
The view is divided into three regions. Along the right are the control panel (f in the picture) and the details panel (g), which allow users to filter objects, set display parameters, and see the details of selected objects. Along the bottom left (e) is an overview of the entire scaffold, showing the contig placement and features. The rest of the display (a-d), the main display, shows the placement of the contigs (b), features (c), and inserts (d), together with the statistical levels (a) for the currently selected region of the scaffold. The range slider below the overview and the magnifying glass tools select the region of interest. The display is interactive, and the details of every object are available on demand by clicking on it.
Mis-assembly Detection
The combination of evidence displayed in the scaffold view makes it possible to identify mis-assemblies quickly. Consider the region displayed below, where happy mates have been hidden and k-mer coverage is plotted, but otherwise default parameters are used. Our analysis begins at the cluster of yellow compressed inserts. A single compressed mate is not unusual on its own, since inserts sample the library size distribution, but this cluster is unusually large. Similarly, the cluster of singleton mates (purple) below the compression is unusually large. Moving up, the small red features indicate multiple correlated SNPs in the same region; because this is a haploid organism, such SNPs most likely reflect differences between collapsed repeat copies rather than true variation. Further investigation in the Contig View should confirm that these are not chance correlations, but given the low background rate of correlated SNPs, we can assume this is most likely due to mis-assembly. The bright white in the read coverage heat map indicates the highest read coverage in the scaffold. The CE statistic indicates a very strong compression: at -6 it is well below the threshold of -3.
Finally, the spike in k-mer coverage (yellow) at the top of the plot indicates that this is a complicated repeat region. Every mis-assembly characteristic is present, so we can conclude with confidence that this is a mis-assembly. In contrast, the repeat (high k-mer coverage) on the left of the plot has only 2 compressed mates, no correlated SNPs, and even coverage. This illustrates why mis-assemblies are hard to interpret: while nearly every mis-assembly occurs in a repeat, not every repeat is mis-assembled, and isolated clues such as individual compressed mates are inconclusive. Only the combination of evidence allows one to prove a mis-assembly.
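The CE reading quoted in this example can be made concrete. This section does not give the formula, so the sketch below assumes the common definition of the compression-expansion statistic: the mean size of the n inserts of one library spanning a position, minus the library mean, divided by the standard error σ/√n. The ±3 cut-offs follow the Coverage and CE Statistic description; `ce_statistic` and `ce_flag` are hypothetical names.

```python
import math
from typing import Optional, Sequence, Tuple


def ce_statistic(inserts: Sequence[Tuple[int, int]], position: int,
                 lib_mean: float, lib_sd: float) -> float:
    """CE value at `position` for one library.

    `inserts` are (start, end) scaffold coordinates of that library's mate pairs.
    Assumed definition: (local mean - library mean) / (library sd / sqrt(n)),
    where the local mean is taken over the n inserts spanning `position`.
    """
    sizes = [end - start for start, end in inserts if start <= position < end]
    n = len(sizes)
    if n == 0:
        return math.nan  # no insert of this library spans the position
    local_mean = sum(sizes) / n
    return (local_mean - lib_mean) / (lib_sd / math.sqrt(n))


def ce_flag(ce: float, threshold: float = 3.0) -> Optional[str]:
    """Label a CE value using the +/-3 cut-offs described for the CE plot."""
    if ce < -threshold:
        return "compression"
    if ce > threshold:
        return "expansion"
    return None  # within the expected range (or NaN: no spanning inserts)


# Four hypothetical 2,100 bp inserts from a 3,000 +/- 300 bp library span position 1000.
inserts = [(0, 2100), (100, 2200), (200, 2300), (300, 2400)]
ce = ce_statistic(inserts, 1000, 3000, 300)
print(ce, ce_flag(ce))
```

With these made-up inserts the sketch yields a CE value of -6, the same magnitude as the compression in the example. Note that dividing by σ/√n means a handful of moderately short inserts can produce a large negative value, which is why a cluster of compressed mates is more telling than any single one.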
Main Display
The main display (a-d) shows the contigs, inserts, features, and statistical information for a region of the scaffold. The scaffold of contigs (b) is represented as appropriately spaced and sized rectangles. The color of each rectangle indicates whether the contig is oriented forward (blue) or reverse (dark blue) in the scaffold. Immediately beneath the contigs are two heat maps. The first, in purple, highlights regions where the insert coverage is exceptionally high or low. Similarly, the green track highlights high and low read coverage.
Features
Beneath the heat maps are the feature tracks. Features are regions of scaffolds or contigs that have been flagged as interesting. AMOS comes bundled with tools for computing mis-assembly features, but arbitrary features of any type can be loaded from a tab-delimited file using 'loadFeatures'.
The currently available feature types are as follows:
Coverage and CE Statistic
Above the contigs (a) are two plots: coverage levels (top) and the CE statistic (bottom). In the coverage plot, the purple line indicates the insert coverage and the green line indicates the read coverage. Mean values in the current scaffold are displayed as dashed lines. If Hawkeye is loaded with -K, k-mer coverage is plotted in yellow; see Command Line Options for more information. Beneath the coverage plot is a plot of the CE statistic at each position (green and red lines). The CE statistic measures the level of compression or expansion of the inserts spanning a particular position. Values near zero indicate no deviation, large negative values (< -3) indicate statistically unlikely compression, and large positive values (> 3) indicate statistically unlikely expansion, so the statistic flags both compression and expansion type mis-assemblies. The value is computed on a per-library basis, so there are as many plots as libraries represented in the scaffold. See the Libraries legend in the Control Panel for the library colors. A manuscript describing the CE statistic in more detail is in preparation.
Inserts
Below the feature tracks are the inserts in the scaffold. Where possible,
the mate-pairs are drawn connected by a thin line, and in all cases
a thick rectangle indicates the position of a read. By default,
the inserts are colored and partitioned into categories based on their mate happiness.
There are 7 happiness levels:
Overview
The Overview panel (e) shows the entire scaffold and its features. The background is tinted to highlight the region currently visible in the main display. The panel is aligned with the range slider beneath it, which selects the region to display. Clicking in the panel recenters the display on the click point.
Control Panel
The top of the control panel (f) controls the pointer. The arrow allows a user to select objects to see their details in the details box (g). Right-clicking an insert with the arrow highlights its mate if the mate is within the current scaffold, and control-clicking a read jumps to its mate even if the mate is in a different scaffold. This allows one to follow a chain of mates between scaffolds. The magnifying glass tools zoom the main display in or out.
Queries
The Queries box controls the main display. The Search box finds any object by a regular expression on its name (eid or iid). The Happy Distance sets the maximum number of standard deviations an insert size may lie from the library mean and still be classified as happy.
Features
Next is a box for the features. Each predefined feature type has a checkbox and a slider. The checkbox controls whether that feature type is displayed, and the slider controls how severe a feature must be to be displayed. By default, all features are displayed, but the sliders can be used to show, for example, only regions with extreme insert or read coverage. The colors of the feature sliders act as a legend for the features in the main display.
Mate Types
Next is a series of toggles for the mate types, each controlling whether that type is displayed. For example, happy mates are often the least informative, so they can be hidden.
Display Toggles
Below the mate types is a series of toggles, as follows:
Mate Colors
Next to the display toggles is a radio button group controlling
how the mates are colored as follows:
Libraries
The final region of the control panel is a legend for the libraries. Each library is listed by iid along with its mean and standard deviation. Each library's color is shown on a sample insert, and the same colors are used for the CE statistic plot.
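The library mean and standard deviation listed in this legend are the quantities the Happy Distance setting scales. This section does not say how they are estimated, so the sketch below simply assumes a plain mean and sample standard deviation over observed insert sizes, and then shows the resulting window of happy insert sizes. `library_stats` and `happy_window` are hypothetical names.

```python
import statistics
from typing import Sequence, Tuple


def library_stats(insert_sizes: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a library's insert sizes (assumed estimator)."""
    return statistics.mean(insert_sizes), statistics.stdev(insert_sizes)


def happy_window(mean: float, sd: float, happy_distance: float = 3.0) -> Tuple[float, float]:
    """Range of insert sizes that count as happy for a given Happy Distance."""
    return mean - happy_distance * sd, mean + happy_distance * sd


# Hypothetical insert sizes for one library.
mean, sd = library_stats([2700, 3000, 3300])
print(mean, sd, happy_window(mean, sd))
```

Because the window is expressed in standard deviations, libraries with a wide size distribution tolerate larger absolute deviations before an insert is marked compressed or stretched.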