Teaching:TUW - UE InfoVis WS 2010/11 - Gruppe 05 - Aufgabe 2

From InfoVis:Wiki
Jump to navigation Jump to search

Pargnostics: Screen-Space Metrics for Parallel Coordinates

Introduction

Visualization still takes place in a space with a limited number of discrete pixels. The result of this often is over-plotting, clutter or other things. Most of the time this visual structures are avoided. But sometimes this artifacts can be useful, because they might point out interesting structures in the data. Good visualizations have to show the relevant information at the first glance and therefore show the information in a clear structure. Until now not a lot of attention is paid to the way a visualzation is presented on the screen. For the case of parallel coordinates so called Pargnostics (Parallel coordinates diagnostics) should act as a bridge between the created visualization and the perceptual system of the user. Based on metrics it's possible to provide an optimization which maximize or minimize certain visual artifacts. [Dasgupta and Kosara, 2010]

Metrics

Figure 1: Pixel-space histograms, see [Dasgupta and Kosara, 2010] p1018

To automatically enhance the analytical tasks of users Dasgupta and Kosara [2010] proposed several metrics. The metrics are used to measure the properties of parallel coordinates. First pixel-space histograms are calculated to optimize the calculation of these metrics. Pixel-space histograms discretize the lines drawn in parallel coordinates into bins - each bin being one pixel (for a total of h pixels in an axis):

  • One-Dimensional Axis Histogram: A vector b containing the number of lines that start or end at this pixel - see columns A and B in figure 1.
  • One-Dimensional Distance Histogram: A vector d where each component measures the slope of lines.
  • Two-Dimensional Axis Pair Histogram: A matrix where each cell xi,j states that n lines are going from pixel i to pixel j - see matrix in figure 1.

The metrics proposed can be used for measuring different data properties: correlation (number of line crossings, angles of crossing), aggregation (parallelism), many-to-one/one-to-many relationships (convergence, divergence), quality (over-plotting), information density (pixel-based entropy).

One of the most useful optimization is the line crossing and the parallelism metrics. Angels of crossing helps out, where line crossing gets difficult to read. The convergence-divergence metric works for categorical axes very well and the pixel based entropy optimizes the alpha value, which is useful on larger datasets.

Number of Line Crossings
This metric intuitively just does what the name implies, count how many line crossings there are between two axes of the prallel coordinates. The count is efficiently calculated by using intervals as proposed by [Allen, 1983; Rit 1986] with a complexity of O(h4). This count is then normalized by the maximum number of possible crossings in order compare the metric between different axis combinations. [Dasgupta and Kosara, 2010]
Angles of Crossing
Lines crossing at flat angles tend to create clutter [Dasgupta and Kosara, 2010]. To calculate this metric, first the crossing angles between every pair of lines which are crossing get calculated. From the results of these calculations the median crossing angle is obtained to be used as the resulting metric value.
Parallelism
A pair of lines which is not crossing is parallel to each other. Parallelism is often prefered as a metric compared to the number of crossings because it tends to produce less clutter [Dasgupta and Kosara, 2010]. Parallelism is calculated by taking the inverse of the absolute interquartile range of line distances.
Mutual Information
TBD
TBD
Convergence, Divergence
Lines from the left axis joining in single points on the right axis are converging. Divergence is the inverse of this - i.e. lines from the right joining on the left.
Over-plotting
Over-plotting measures how many lines are aggregated on a single pixel when the parallel coordinates are drawn. This metric is directly dependent on the number of bins (pixels) used for an axis [Dasgupta and Kosara, 2010]. The value obtained by this metric should be minimized to increase the quality of the visualization.
Pixel-based Entropy
TBD
TBD

Dimension Order Optimization

Using the metrics from above helps to find an optimization of the visualization display. This is an NP-complete problem, but by using a binned data model and by considering the special properties of parallel coordinates it's possible to reduce the amount of time which is needed to find optimal solutions [Dasgupta and Kosara, 2010]. Using a branch-and-bound algorithm also can reduce the time necessary.

The purpose of the branch-and-bound algorithm is to find the optimal order of axes. To do that a matrix of all axis pairs and their associated costs gets constructed. The costs are a combination of several metrics.

The construction of the matrix has to be done only once at the beginning of the algorithm.

Axis Inversions

While constructing the matrix of axis pairs both the inverted and noninverted situation for every axis pair is taken into account. The state - inverted or not inverted - with the lower cost is used in the matrix and the algorithm keeps track which one it was. This happens locally so inverting one axis pair doesn't have an immediate effect on other axis pairs.

Branch-and-Bound Optimization

The branch-and-bound algorithm uses a priority queue and best-first search. For that kind of implementations it's very important to make precise estimates which subtrees can be culled and which can't. Since these estimates are based on the full cost matrix which is constructed at the beginning of the algorithm they are indeed very precise.

Based on case studies by Dasgupta and Kosara [2010] the branch-and-bound algorithm finds the optimal solution really quick. The main reason for this is that the metrics are constructed for every axis pair on their own and not for every possible combination of the axis pairs in the entire visualization.

Conclusion

In this sense, Pargnostics fills a gap in the existing literature on parallel coordinates. Being able to analyze what ends up on the screen makes it possible to provide better visualization setups that take the specific properties of the visualization technique into account.
Dasgupta and Kosara, 2010


The metrics and the optimization explained above are an important step towards better visualiztions. This metrics not only describe the image which is rendered to screen, but also the different visual structures which can be seen in this image.

References

  • [Allen, 1983] J. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26:832-843, 1983.
  • [Dasgupta and Kosara, 2010] Aritra Dasgupta and Robert Kosara. Pargnostics: Screen-Space Metrics for Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics, 16(6):1017-1026, November/December 2010.
  • [Rit, 1986] J.-F. Rit. Propagating temporal constraints for scheduling. In Proceedings of the Fifth National Conference on Artificial Intelligence, pages 383-388, 1986