Evaluating Visual Table Data Understanding

From InfoVis:Wiki
Revision as of 17:28, 14 April 2007 by 128.131.219.231 (talk)

Authors


Short description

In this paper, the authors evaluate how information visualization supports exploration of visual table data. They present a controlled experiment designed to evaluate how the layout of table data affects users' understanding and their exploration process. The experiment raised interesting problems from the design phase through to the data analysis. A task taxonomy, the experiment procedure, and notes on data collection and analysis are presented, concluding with lessons learned from the experiment and a discussion of the format of future evaluations.

Controlled Experiment

The experiment compared three ordering algorithms commonly used to reorder the rows and columns of table data. The first order, alphabetic (A), served as a control. The other two, (C) and (T), come from the field of bioinformatics, where they are used to reorder DNA microarrays (visual table data presenting gene expressions): (C) is a hierarchical clustering followed by a linearization, and (T) is based on the traveling salesman problem.
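The orderings can be sketched in a few lines of Python. This is a minimal illustration with made-up data and helper names, not the authors' implementation: order (A) is a plain sort of the row labels, and order (T) is approximated here with a greedy nearest-neighbour heuristic (a common TSP approximation that places similar rows next to each other); the clustering order (C) is omitted for brevity.

```python
def alphabetic_order(labels):
    """Order (A): control condition, row indices sorted by label."""
    return sorted(range(len(labels)), key=lambda i: labels[i])

def euclidean(a, b):
    """Distance between two rows of the table."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def tsp_order(rows):
    """Order (T), sketched as a greedy nearest-neighbour tour:
    each row is followed by the most similar remaining row,
    so similar rows end up adjacent in the display."""
    remaining = set(range(len(rows)))
    current = 0
    order = [current]
    remaining.remove(current)
    while remaining:
        # pick the closest row to the one just placed
        current = min(remaining, key=lambda j: euclidean(rows[current], rows[j]))
        order.append(current)
        remaining.remove(current)
    return order

# Hypothetical toy data: four courses with two grade statistics each.
labels = ["stats", "algo", "visu", "db"]
rows = [[12, 14], [8, 9], [13, 15], [6, 10]]
print(alphabetic_order(labels))  # indices in alphabetic label order
print(tsp_order(rows))           # similar rows placed next to each other
```

The greedy heuristic is only one possible linearization; the point of the experiment is that such similarity-based orders group related rows visually, which the alphabetic control order does not.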

Datasets

Two real datasets were chosen for the experiment. The datasets had to be interesting and of reasonable size to keep the subjects engaged. The first dataset was the grades of a university master's program; the second was taken from the CIA World Factbook and consists of statistics on productivity, exports, and transportation for most countries in the world.

Tasks

The subjects had to complete tasks of varying difficulty, and the datasets were divided into three difficulty levels. At the highest level, subjects had to interpret a large portion of the data, while the easier parts of the datasets tested readability.
The experiment consisted of the following tasks:

  • Low Level: on representation and visual coding

Readability:
Find a specified student or course
Give the grade for a given student and course
Find how many students followed a given course

Interpretation:
Find the course where students perform best/worst
Find the student with the highest/lowest grades

  • Medium Level: on groups and outliers

Readability:
Circle a group
Circle outliers

Interpretation:
Give it a meaningful name
Give a representative element of the group
Explain why they are outliers


  • High Level: on correlations and trends

Readability:
Give two courses that are correlated/uncorrelated

Interpretation:
Give two courses that are correlated/uncorrelated
Give the main trends of this master's program (in terms of topics)
Explain how students/courses have been ordered

Figures

Pictures of the data sets in different sort orders.

[Figure: the datasets shown in the different sort orders]

Task complexity

[Figure: task complexity]
Lessons Learned

  • Do not use real datasets
  • Do not ask too much
  • Do not exclude free comments or open questions
  • Do not leave too much freedom
  • Do not underestimate training

Important Citations

Main challenges of the evaluation:

We chose two real datasets in order to conduct this experiment in a realistic context.
[Fekete and Henry, 2006]

We encountered problems to create relevant questionnaires for real datasets.
[Fekete and Henry, 2006]

Analyzing free comments, explanations or open questions is a nightmare.
[Fekete and Henry, 2006]