Graphical Inference for Infovis

The following article summarizes the work of [Wickham et al., 2010] on graphical inference.

Introduction

Information visualization provides tools to reveal new relations in data. While this technique can be very effective, it is threatened by apophenia, the human tendency to see patterns in noise. Statistics, in contrast, provides methods to examine whether an assumption is supported by the given sample data, and is therefore used to expose invalid hypotheses. Graphical inference tries to find a balance between these two methods: information visualization to improve the identification of new hypotheses, and statistics to reveal faulty conclusions.

The authors present two experimental protocols, Rorschach and Line-Up, which show how both techniques can be combined.

Motivation

Statistical methods generally try to determine whether a hypothesis holds. More specifically, statistics investigates whether a difference exists (testing) or how big the difference is (estimation). Graphical inference asks whether a difference actually exists, so it works as a testing procedure.

Statistical Foundation

A statistical test requires a so-called null hypothesis H₀, which is tested against the alternative hypothesis H₁. Because statistical results are based on probability, errors can happen. The two possible errors are classified as follows:

	Null hypothesis (H₀) is true	Alternative hypothesis (H₁) is true
Null hypothesis is accepted	Right decision	Type II error (false negative)
Null hypothesis is rejected	Type I error (false positive)	Right decision

(See also the External Links section for further information on statistical hypothesis testing.)

The testing process in statistics can be compared with the criminal justice system, where an accused is judged guilty or innocent. During the trial the defense tries to show that the null hypothesis is true, while the prosecution advocates the alternative hypothesis.

The statistical test compares the accused with known innocents, using a specific metric. To assess the guilt of the accused, the ratio of innocents that look more guilty than the accused is computed. A type I error corresponds to convicting an innocent person, and a type II error to acquitting a guilty one.

In statistics we use test statistics (like the t-statistic) to calculate the p-value: the probability of observing data at least as extreme as the real data if the null hypothesis were true. A small p-value is evidence against the null hypothesis. When we use visual testing instead, the data is plotted and the visual difference is measured by a human judge or jury.
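The "ratio of innocents that look more guilty than the accused" can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not the procedure from the paper: the test statistic (absolute difference of group means), the function names, and the use of shuffling to create "innocents" are all chosen here for illustration.

```python
import random


def mean_difference(a, b):
    """Test statistic (assumed for this sketch): absolute difference of means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a p-value by comparing the real data with shuffled data.

    Each shuffle of the pooled values plays the role of a known innocent.
    The returned value is the share of innocents that look at least as
    guilty as the accused (the observed data).
    """
    rng = random.Random(seed)
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if mean_difference(shuffled_a, shuffled_b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


if __name__ == "__main__":
    # Hypothetical measurements, invented purely for illustration.
    control = [1, 2, 3, 4, 5]
    treated = [101, 102, 103, 104, 105]
    print(permutation_p_value(control, treated))
```

A value close to 0 means that almost no shuffled dataset looks as extreme as the real one, which corresponds to a convincing case for the prosecution.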

Protocols

Two different protocols are presented: Rorschach and Line-Up.

Rorschach

The Rorschach protocol was named after the Rorschach test, in which a subject has to interpret abstract ink blots.

Similar to that, for the Rorschach protocol a series of plots is generated and presented to a subject, who is then asked to find patterns in the visualisations.

An example is given in Figure 1: Nine histograms summarizing the accuracy with which 500 participants perform nine tasks. What do you see?

The goal of this exercise is to train the observer's eye to recognize random deviations, and thereby reduce the effect of apophenia for the given type of visualization.
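A Rorschach-style exercise can be sketched in a few lines. The sketch below assumes that the plots show pure noise drawn from a uniform distribution and renders them as text histograms. The distribution, the panel and sample sizes, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import random


def null_samples(n_plots=9, n_points=500, seed=0):
    """Draw n_plots datasets of pure noise (uniform on [0, 1), an assumption)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_points)] for _ in range(n_plots)]


def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins on [0, 1)."""
    counts = [0] * bins
    for value in values:
        index = min(int(value * bins), bins - 1)
        counts[index] += 1
    return counts


def render(counts, scale=5):
    """Render one histogram as text, one row of '#' characters per bin."""
    return "\n".join("#" * (count // scale) for count in counts)


if __name__ == "__main__":
    for number, sample in enumerate(null_samples(), start=1):
        print(f"Plot {number}")
        print(render(histogram_counts(sample)))
        print()
```

Every histogram here comes from the same flat distribution, so any bump or gap an observer notices is random variation only.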

Line-Up

The idea behind the line-up is to show the real data plot together with decoys. If the observer is able to identify the real data, we can assume that the real data differs from the decoys. The line-up consists of the following steps:

generate n - 1 decoys
make a plot of the decoys together with a plot of the real data (positioning the real data plot randomly)
let an observer assess which plot shows the deviation.
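The steps above can be sketched as follows. The sketch assumes that decoys are created by shuffling one variable, which destroys any relationship between x and y. That choice, the default of n = 20 panels, and the function names are assumptions made for illustration. An observer who guesses blindly picks the real plot with probability 1/n.

```python
import random


def make_lineup(x, y, n=20, seed=None):
    """Build a line-up of n panels and return (panels, true_position).

    Each panel is an (x, y) pair. The n - 1 decoys keep x and shuffle y
    (an assumed way of generating null data); the real data is placed at
    a random position among them.
    """
    rng = random.Random(seed)
    decoys = []
    for _ in range(n - 1):
        shuffled_y = list(y)
        rng.shuffle(shuffled_y)
        decoys.append((list(x), shuffled_y))
    true_position = rng.randrange(n)
    real = (list(x), list(y))
    panels = decoys[:true_position] + [real] + decoys[true_position:]
    return panels, true_position


def chance_of_lucky_guess(n):
    """Probability that a blind guess picks the real plot out of n panels."""
    return 1 / n


if __name__ == "__main__":
    # Hypothetical data with a clear linear relationship.
    xs = list(range(10))
    ys = [2 * value for value in xs]
    panels, position = make_lineup(xs, ys, n=20, seed=1)
    print(len(panels), "panels, real data at index", position)
    print("chance of a lucky guess:", chance_of_lucky_guess(20))
```

In a real study the panels would be drawn as plots and the true position kept hidden from the observer until after the choice is made.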

External Links

Statistics

Fun and Trivia

Sesame Street's interpretation of the line-up

References

[Wickham et al., 2010] Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja. Graphical Inference for Infovis. IEEE Transactions on Visualization and Computer Graphics, 16(6):973-979, November/December 2010.

Teaching:TUW - UE InfoVis WS 2010/11 - Gruppe 01 - Aufgabe 2
