Extreme visualization: squeezing a billion records into a million pixels

From InfoVis:Wiki

Authors


Short description

Database searches are usually performed with query languages such as SQL or with form fill-in templates, and the results are shown in tabular lists.


However, increasing attention is being paid to dynamic-query sliders and other graphical selectors for query specification, with results displayed using information visualization techniques. These filtering techniques have proven effective for many tasks in which visual presentations enable the discovery of relationships, clusters, outliers, gaps, and other patterns.
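As an illustration of how slider-based filtering can stay responsive, here is a minimal sketch in Python. It assumes each slider controls one numeric attribute and that a sorted index is built per attribute ahead of time; the names `RangeIndex` and `dynamic_query` and the sample records are hypothetical and not taken from any of the tools discussed here.

```python
from bisect import bisect_left, bisect_right


class RangeIndex:
    """Sorted index over one numeric attribute (hypothetical helper).

    A slider range then maps to two binary searches instead of a full scan.
    """

    def __init__(self, records, key):
        # Sort (value, position) pairs once, up front.
        pairs = sorted((rec[key], i) for i, rec in enumerate(records))
        self.values = [v for v, _ in pairs]
        self.rows = [i for _, i in pairs]

    def query(self, low, high):
        """Return the positions of records with low <= value <= high."""
        start = bisect_left(self.values, low)
        end = bisect_right(self.values, high)
        return set(self.rows[start:end])


def dynamic_query(records, indexes, sliders):
    """Intersect the hits of every slider.

    `sliders` maps an attribute name to its (low, high) range.
    """
    hits = None
    for key, (low, high) in sliders.items():
        matched = indexes[key].query(low, high)
        hits = matched if hits is None else hits & matched
    if hits is None:  # no slider is active, so nothing is filtered out
        return list(records)
    return [records[i] for i in sorted(hits)]


# Made-up sample data for illustration only.
records = [
    {"price": 120, "year": 1999},
    {"price": 300, "year": 2004},
    {"price": 180, "year": 2007},
    {"price": 90, "year": 2001},
]
indexes = {key: RangeIndex(records, key) for key in ("price", "year")}
result = dynamic_query(
    records, indexes, {"price": (100, 200), "year": (2000, 2010)}
)
```

Here `result` holds the single record that satisfies both slider ranges. A production system would likely push this work into the database or use more compact index structures; the sketch only shows the shape of the idea.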


Scaling visual presentations from millions to billions of records will require collaborative research efforts in information visualization and database management to enable rapid aggregation, meaningful coordinated windows, and effective summary graphics.


Current and proposed solutions that facilitate sense-making during interactive visual exploration of billion-record data sets are

  • atomic,
  • aggregated,
  • and density plots.


Suitable Datatypes

Information visualizations are designed to deal with multi-dimensional and, more importantly, multi-variate data.


In addition to

  • integer,
  • categorical,
  • real,
  • and nominal

information visualizations often deal with even richer data types.


The four types

  • multi-variate,
  • time series,
  • tree,
  • and network

are tied to tasks such as finding clusters, gaps, outliers, trends, and relationships.


Figures

Atomic Visualizations

A million-node treemap showing the directory structure on a file server. Color encodes file time and area encodes file size. Visualizing a million nodes on a single display is possible if zooming is allowed or if special algorithms limit the drawing of lower-level nodes.
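The idea of limiting the drawing of lower-level nodes can be sketched as follows. This is not the algorithm behind the figure; it uses the simple slice-and-dice treemap layout and assumes, for illustration, that a subtree is collapsed into one rectangle once its area drops below a hypothetical `min_area` threshold in pixels. The node format (dicts with `name`, `size`, `children`) is also an assumption.

```python
def annotate(node):
    """Store the total size of each subtree in node["total"] and return it."""
    children = node.get("children") or []
    if children:
        node["total"] = sum(annotate(child) for child in children)
    else:
        node["total"] = node.get("size", 0)
    return node["total"]


def treemap(node, x, y, w, h, depth=0, min_area=1.0, out=None):
    """Slice-and-dice layout; returns a list of (name, x, y, w, h) rectangles.

    Subtrees whose rectangle is smaller than min_area square pixels are
    drawn as a single rectangle instead of being expanded further.
    """
    if out is None:
        out = []
    children = node.get("children") or []
    if not children or node["total"] == 0 or w * h < min_area:
        out.append((node["name"], x, y, w, h))
        return out
    offset = 0.0
    for child in children:
        share = child["total"] / node["total"]
        if depth % 2 == 0:
            # Even depth: split the rectangle left to right.
            treemap(child, x + offset * w, y, share * w, h,
                    depth + 1, min_area, out)
        else:
            # Odd depth: split the rectangle top to bottom.
            treemap(child, x, y + offset * h, w, share * h,
                    depth + 1, min_area, out)
        offset += share
    return out


# Made-up directory tree for illustration only.
tree = {"name": "root", "children": [
    {"name": "a.log", "size": 60},
    {"name": "docs", "children": [
        {"name": "b.txt", "size": 30},
        {"name": "c.txt", "size": 10},
    ]},
]}
annotate(tree)
full = treemap(tree, 0, 0, 100, 100)
coarse = treemap(tree, 0, 0, 100, 100, min_area=5000)
```

With the default threshold, `full` contains one rectangle per file; with `min_area=5000`, the `docs` directory is drawn as one rectangle and its files are skipped. The rectangle areas stay proportional to file sizes in both cases.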

Aggregate Visualizations

Clicking on an aggregation marker causes an expansion in place, but it is more effective to use coordinated windows to display the individual components. This technique allows the user to study an overview map and then select a region to see more detailed information in the coordinated window.

This Graphical Interface for Digital Libraries (GRIDL) offers a scalable approach in which each axis is an expandable hierarchy. Each grid cell shows up to 49 colored dots for documents, and shifts to an aggregation marker in the form of a bar chart when a cell holds more. Clicking on a grid cell produces a listing of titles in the upper right window. Clicking on a title produces the catalog description in the bottom right window.
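The switch between individual dots and an aggregation marker can be sketched as below. This is a minimal illustration, not the GRIDL implementation: it assumes that the bar chart summarizes document counts per color category and that the switch happens when a cell exceeds the 49-dot limit. The function names and the sample documents are hypothetical.

```python
from collections import Counter

MAX_DOTS = 49  # per-cell dot limit mentioned in the text


def build_grid(documents, x_of, y_of):
    """Group documents into grid cells keyed by (x category, y category)."""
    cells = {}
    for doc in documents:
        cells.setdefault((x_of(doc), y_of(doc)), []).append(doc)
    return cells


def render_cell(members, color_of):
    """Return a drawing instruction for one cell.

    Small cells list one color per document; larger cells fall back to
    an aggregate of counts per color (assumed content of the bar chart).
    """
    if len(members) <= MAX_DOTS:
        return {"kind": "dots", "colors": [color_of(d) for d in members]}
    counts = Counter(color_of(d) for d in members)
    return {"kind": "bars", "counts": dict(counts)}


# Made-up catalog entries for illustration only.
docs = (
    [{"topic": "HCI", "year": 1998, "type": "book"}] * 3
    + [{"topic": "DB", "year": 1999, "type": "journal"}] * 30
    + [{"topic": "DB", "year": 1999, "type": "book"}] * 25
)
grid = build_grid(docs, lambda d: d["topic"], lambda d: d["year"])
marks = {
    cell: render_cell(members, lambda d: d["type"])
    for cell, members in grid.items()
}
```

In this example the `("HCI", 1998)` cell is drawn as three dots, while the `("DB", 1999)` cell holds 55 documents and is summarized as counts per document type.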

Density Plot Visualizations

Density plots show the concentration of markers and can be interpreted as a two-dimensional histogram. For time series data, density plots can show concentrations of time points. A good model is the work on cluster displays in parallel coordinate views.
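The two-dimensional histogram reading can be made concrete with a short sketch: every record is assigned to the pixel it falls on, and the per-pixel count drives the displayed intensity. The function names are hypothetical, and the logarithmic opacity mapping is an assumption chosen so that sparse pixels stay visible next to very dense ones; other mappings are equally plausible.

```python
import math


def density_grid(points, x_range, y_range, width, height):
    """Count points per pixel; returns a list of `height` rows of `width` counts."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted range
        # Scale to pixel coordinates; clamp so the maximum lands in the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


def to_opacity(grid):
    """Map counts to opacity in [0, 1] on a log scale (an assumed choice)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0] * len(row) for row in grid]
    return [[math.log1p(c) / math.log1p(peak) for c in row] for row in grid]


# Made-up points for illustration only.
points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (1.0, 1.0), (2.0, 0.5)]
grid = density_grid(points, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
opacity = to_opacity(grid)
```

Because the output size depends only on the number of pixels, the same grid can summarize a thousand or a billion records; only the counting pass grows with the data.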

A parallel coordinates view shows 230,000 records from a fatal accident database on the left. Variable-opacity bands show meaningful clusters on the right.


Important citations

The purpose of visualization is insight, not pictures.
[Ben Shneiderman, 2008]



Evaluation/Usage

  • Spotfire
  • Tableau

Both handle at least a million records and provide dynamic query filtering with redisplay at interactive rates to support rapid exploration.


  • Hyperion
  • CrystalReports

Both are Online Analytical Processing (OLAP) systems.


  • Hierarchical Clustering Explorer (HCE)

Implements the strategy of ranking strength of features.


  • SpaceTree
  • DOITree

Both give the user control over which nodes are exposed.


  • Treemap 4.0

Allows a color-coded density plot that shows the number of nodes or aggregate values of node attributes.

References

Extreme visualization: squeezing a billion records into a million pixels

GRIDL - Graphical Interface for Digital Libraries

Web Based Visual Exploration of Patent Information


Internal References

Treemap

Zoom

Filtering