Graphical Inference for Infovis

The following article summarizes the work of [Wickham et al., 2010] on graphical inference.

Introduction

Information visualization provides tools to reveal new relations in data. While this approach can be very effective, it is threatened by apophenia, the human tendency to see patterns in noise. Statistics, in contrast, provides methods for checking whether a hypothesis is supported by the given sample data, and is therefore used to expose invalid hypotheses. Graphical inference tries to strike a balance between the two: information visualization to help identify new hypotheses, and statistics to reveal faulty conclusions.

The authors present two experimental protocols, Rorschach and Line-Up, which show how both techniques can be combined.

Motivation

Statistical methods generally try to determine whether a hypothesis holds. More specifically, statistics investigates whether a difference exists (testing) or how big the difference is (estimating). In graphical inference you want to know whether a difference is actually there, so graphical inference works as a testing procedure. A statistical test sets a null hypothesis H0 (the status quo: there is no difference) against an alternative hypothesis H1 (the assumption that there is a difference). Such a test can go wrong in two ways:

false positive: H0 is rejected although H0 is true (also called a type I error)
false negative: H0 is not rejected although H1 is true (also called a type II error)

The testing process in statistics can be compared with the criminal justice system, where an accused is judged guilty or not guilty. During the trial the defense tries to show that the null hypothesis (innocence) is true, while the prosecution advocates the alternative hypothesis (guilt).

The statistical test compares the accused with known innocents, using a specific metric. To assess the guilt of the accused, the proportion of innocents that look more guilty than the accused is computed. A type I error corresponds to convicting an innocent person, and a type II error to acquitting a guilty one.

In statistics we use test statistics (like the t-statistic) to calculate the p-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. A small p-value is evidence against the null hypothesis. When we use visual testing instead, the data is plotted and the visual difference is assessed by a human judge or jury.
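The comparison of the accused with known innocents can be illustrated with a permutation test. The following is a minimal sketch, not the procedure from the paper: it assumes the null hypothesis is "both groups come from the same distribution", uses the absolute difference of group means as the metric, and creates "innocents" by shuffling the group labels. The function names and the measurements are made up for illustration.

```python
import random
from statistics import mean


def mean_difference(a, b):
    """Test statistic (assumed metric): absolute difference of the group means."""
    return abs(mean(a) - mean(b))


def permutation_p_value(a, b, n_null=2000, seed=0):
    """Proportion of null ("innocent") data sets that look at least as
    extreme ("guilty") as the observed data ("the accused")."""
    rng = random.Random(seed)
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_null):
        # Shuffling the pooled values simulates H0: group membership is irrelevant.
        rng.shuffle(pooled)
        null_stat = mean_difference(pooled[:len(a)], pooled[len(a):])
        if null_stat >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_null


# Made-up illustrative measurements, not data from the paper.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9]
group_b = [4.2, 4.0, 4.5, 4.8, 4.1, 4.4, 4.6, 4.3]
p_value = permutation_p_value(group_a, group_b)
print(f"estimated p-value: {p_value}")
```

A small estimated p-value means that few shuffled data sets look as extreme as the real one, which is the numerical counterpart of an observer finding the real plot conspicuous.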

Protocols

Two different protocols are presented: Rorschach and Line-Up.

Rorschach

The Rorschach protocol was named after the Rorschach test, in which a subject has to interpret abstract ink blots. Similar to that, for the Rorschach protocol a series of null plots is generated and presented to a subject, who is then asked to find patterns in the visualisations.

The goal of this exercise is to calibrate the viewer's eye to random variation, and thereby reduce the effect of apophenia for the given type of visualisation.
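A series of null plots could be produced along the following lines. This is only a sketch under stated assumptions: the null hypothesis is taken to be "x and y are independent and uniformly distributed", the plots are coarse text scatterplots so that no plotting library is needed, and the helper names `null_scatter` and `ascii_plot` are hypothetical.

```python
import random


def null_scatter(n_points, rng):
    """One null data set: x and y are drawn independently, so any
    visible pattern is due to chance alone."""
    return [(rng.random(), rng.random()) for _ in range(n_points)]


def ascii_plot(points, size=10):
    """Render points from the unit square as a coarse text scatterplot."""
    grid = [[" "] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = size - 1 - min(int(y * size), size - 1)  # y axis points up
        grid[row][col] = "*"
    return "\n".join("".join(row) for row in grid)


rng = random.Random(1)
null_plots = [ascii_plot(null_scatter(15, rng)) for _ in range(4)]
for i, plot in enumerate(null_plots, start=1):
    print(f"null plot {i}")
    print(plot)
    print("-" * 10)
```

Any clusters, gaps or trends a viewer sees in these plots are spurious by construction, which is what makes them useful for calibration.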

Line-Up

The idea behind the line-up is to show the plot of the real data together with decoys, that is, plots of null data. If the observer is able to identify the real data, we can conclude that the real data differs from the decoys. The line-up consists of the following steps:

generate n - 1 decoys
make a plot of each decoy and of the real data, placing the real data plot at a random position
let an observer judge which plot shows the deviation
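The steps above can be sketched as follows. This is a minimal sketch rather than the authors' implementation: it assumes the null hypothesis is "x and y are unrelated", so decoys are generated by permuting the y values, and it returns the data for each panel instead of drawing it. The names `make_lineup` and `chance_of_lucky_guess` and the sample data are hypothetical.

```python
import random


def make_lineup(xs, ys, n=20, seed=None):
    """Return (panels, true_position): n data sets, of which n - 1 are
    decoys and one, at a random index, is the real data."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n - 1):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # permuting y breaks any x-y relationship
        decoys.append(list(zip(xs, shuffled)))
    true_position = rng.randrange(n)  # random slot for the real data
    panels = list(decoys)
    panels.insert(true_position, list(zip(xs, ys)))
    return panels, true_position


def chance_of_lucky_guess(n):
    """Probability that an observer picks the real plot out of n panels
    by pure guessing, i.e. when the real data is itself null data."""
    return 1 / n


# Made-up data with a clear linear trend plus noise.
data_rng = random.Random(7)
xs = list(range(10))
ys = [2 * x + data_rng.gauss(0, 1) for x in xs]

panels, true_position = make_lineup(xs, ys, n=20, seed=42)
print(f"{len(panels)} panels, real data hidden at index {true_position}")
print(f"chance of a lucky guess: {chance_of_lucky_guess(len(panels))}")
```

The observer is shown all panels without knowing `true_position`. With n = 20, picking the real plot by guessing alone has probability 1/20, so a correct pick plays the role of a test result at the 0.05 level.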

References

[Wickham et al., 2010] Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja. Graphical Inference for Infovis. IEEE Transactions on Visualization and Computer Graphics, 16(6):973-979, November/December 2010.

[Wickham, 2010] Hadley Wickham. Created at: October 12, 2010. Retrieved at: November 16, 2010. http://vimeo.com/15791526

Teaching:TUW - UE InfoVis WS 2010/11 - Gruppe 01 - Aufgabe 2
