Extreme visualization: squeezing a billion records into a million pixels

From InfoVis:Wiki
== Authors ==
*[[Shneiderman, Ben | Ben Shneiderman]]


== Short description ==
Database searches are usually performed with query languages such as SQL or with form fill-in templates, and the results are shown in tabular lists.
However, increasing attention is being paid to dynamic query sliders and other graphical selectors for query specification, with results displayed by information visualization techniques. These filtering techniques have proven effective for many tasks in which visual presentations enable the discovery of relationships, clusters, outliers, gaps, and other patterns.
The scaling of visual presentations from millions to billions of records will require collaborative research efforts in information visualization and database management to enable rapid aggregation, meaningful coordinated windows, and effective summary graphics.
Current and proposed solutions that facilitate sense-making in the interactive visual exploration of billion-record data sets are
* atomic,
* aggregated,
* and density plots.




== Suitable Datatypes ==
Information visualizations are designed to deal with multi-dimensional and, more importantly, multi-variate data.
In addition to
* integer,
* categorical,
* real,
* and nominal
information visualizations often deal with even richer data types.
The four types
* multi-variate,
* time series,
* tree,
* and network
are tied to tasks such as finding clusters, gaps, outliers, trends, and relationships.




== Figures ==
[[image: Figure1.jpg]]
=== Atomic Visualizations ===
 
[[image: Figure2.jpg]]
[[image: Figure3.jpg]]
 
[[image: Figure4.jpg]]
Million-node treemap showing the directory structure on a file server. Color encodes file time and area encodes file size. Visualizing a million nodes on a single display is possible if zooming is allowed or if special algorithms limit the drawing of lower-level nodes.
 
=== Aggregate Visualizations ===
 
Clicking on an aggregation marker causes an expansion in place, but it is more effective to use coordinated windows to display the individual components. This technique allows the user to study an overview map and then select a region to see more detailed information in the coordinated window.
 
[[image: Figure5.jpg]]
[[image: Figure6.jpg]]
 
[[image: Figure7.jpg]]
This Graphical Interface for Digital Libraries offers a scalable approach where each axis is an expandable hierarchy. Each grid cell shows up to 49 colored dots for documents, and shifts to an aggregation marker in the form of a bar chart. Clicking on a grid cell produces a listing of titles in the upper right window. Clicking on a title produces the catalog description in the bottom right window.
[[image: Figure8.jpg]]
 
[[image: Figure9.jpg]]
=== Density Plot Visualizations ===
[[image: Figure10.jpg]]
 
[[image: Figure11.jpg]]
Density plots show concentrations of markers and can be interpreted as two-dimensional histograms.
For time series data, density plots can show concentrations of time points. A good model is the work on cluster displays in parallel coordinate views.
 
[[image: Figure12.jpg]]
[[image: Figure13.jpg]]
 
[[image: Figure14.jpg]]
The parallel coordinates view on the left shows 230,000 records from a fatal accident database. The variable-opacity bands on the right show meaningful clusters.








== Evaluation/Usage ==
* Spotfire
* Tableau
Both handle at least a million records and provide dynamic query filtering and redisplay at interactive rates to support rapid exploration.
 
 
* Hyperion
* CrystalReports
Both are Online Analytical Processing (OLAP) systems.
 
 
* Hierarchical Clustering Explorer (HCE)
Implements the strategy of ranking strength of features.
 
 
* SpaceTree
* DOITree
Both give the user control over which nodes are exposed.
 


* Treemap 4.0
Supports a color-coded density plot that shows the number of nodes or aggregate values of node attributes.


== References ==
[http://delivery.acm.org/10.1145/1380000/1376618/p3-shneiderman.pdf?key1=1376618&key2=4764623421&coll=GUIDE&dl=GUIDE&CFID=37341484&CFTOKEN=83388005 Extreme visualization: squeezing a billion records into a million pixels]
[http://www.cs.umd.edu/hcil/west-legal/gridl/ GRIDL - Graphical Interface for Digital Libraries]
[http://www.infovis-wiki.net/index.php?title=Web_Based_Visual_Exploration_of_Patent_Information Web Based Visual Exploration of Patent Information]




== Internal References ==
[http://infovis-wiki.net/index.php?title=Treemap Treemap]
[http://infovis-wiki.net/index.php?title=Zoom Zoom]
[http://infovis-wiki.net/index.php?title=Filtering Filtering]


[[category: techniques]]

Latest revision as of 22:58, 27 May 2009

Authors

  • Ben Shneiderman


Short description

Database searches are usually performed with query languages such as SQL or with form fill-in templates, and the results are shown in tabular lists.


However, increasing attention is being paid to dynamic query sliders and other graphical selectors for query specification, with results displayed by information visualization techniques. These filtering techniques have proven effective for many tasks in which visual presentations enable the discovery of relationships, clusters, outliers, gaps, and other patterns.
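
The filtering step behind a dynamic query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with numeric attributes) and the function name `dynamic_query` are hypothetical and do not come from any of the tools discussed here. Each slider contributes one closed range, and a record stays visible only if it satisfies every range.

```python
def dynamic_query(records, ranges):
    """Keep the records whose attributes fall inside every slider range.

    `records` is a list of dicts and `ranges` maps an attribute name to a
    (low, high) pair; both layouts are assumptions made for this sketch.
    """
    return [
        record
        for record in records
        if all(low <= record[attr] <= high for attr, (low, high) in ranges.items())
    ]


# Hypothetical sample data: three records with two numeric attributes.
records = [
    {"year": 1990, "price": 10},
    {"year": 2000, "price": 50},
    {"year": 2010, "price": 90},
]

# Moving the "year" slider to 1995..2015 hides the first record.
visible = dynamic_query(records, {"year": (1995, 2015)})
print([record["year"] for record in visible])
```

A linear scan like this one is unlikely to reach interactive rates on very large data sets; precomputed indexes or aggregates would probably be needed there.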


The scaling of visual presentations from millions to billions of records will require collaborative research efforts in information visualization and database management to enable rapid aggregation, meaningful coordinated windows, and effective summary graphics.


Current and proposed solutions that facilitate sense-making in the interactive visual exploration of billion-record data sets are

  • atomic,
  • aggregated,
  • and density plots.


Suitable Datatypes

Information visualizations are designed to deal with multi-dimensional and, more importantly, multi-variate data.


In addition to

  • integer,
  • categorical,
  • real,
  • and nominal

information visualizations often deal with even richer data types.


The four types

  • multi-variate,
  • time series,
  • tree,
  • and network

are tied to tasks such as finding clusters, gaps, outliers, trends, and relationships.


Figures

Atomic Visualizations

Million-node treemap showing the directory structure on a file server. Color encodes file time and area encodes file size. Visualizing a million nodes on a single display is possible if zooming is allowed or if special algorithms limit the drawing of lower-level nodes.
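
One way to limit the drawing of lower-level nodes is to stop descending once a rectangle becomes too small to show its children. The following Python sketch uses a simple slice-and-dice layout. The pruning rule (`min_side`), the node layout (dicts with `name`, `size`, and `children`), and the function names are assumptions chosen for illustration, not the algorithm used to produce the million-node treemap.

```python
def size_of(node):
    """Total size of a node: its own size for a leaf, else the sum of its children."""
    if "children" in node:
        return sum(size_of(child) for child in node["children"])
    return node["size"]


def layout(node, x, y, w, h, min_side=1.0, horizontal=True, out=None):
    """Slice-and-dice treemap layout that skips the children of tiny rectangles.

    Returns a list of (name, x, y, w, h) rectangles to draw.
    """
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    # Pruning rule (an assumption of this sketch): do not draw the children
    # of a rectangle whose shorter side is below `min_side` pixels.
    if not children or min(w, h) < min_side:
        return out
    total = size_of(node)
    if total == 0:
        return out
    offset = 0.0
    for child in children:
        share = size_of(child) / total
        if horizontal:
            # Split the width among the children, then alternate direction.
            layout(child, x + offset, y, w * share, h, min_side, False, out)
            offset += w * share
        else:
            layout(child, x, y + offset, w, h * share, min_side, True, out)
            offset += h * share
    return out


# Hypothetical directory tree: sizes are file sizes in arbitrary units.
tree = {
    "name": "root",
    "children": [
        {"name": "a", "size": 3},
        {"name": "b", "children": [{"name": "b1", "size": 0.5},
                                   {"name": "b2", "size": 0.5}]},
    ],
}

for rect in layout(tree, 0, 0, 4, 4, min_side=1.5):
    print(rect)
```

With `min_side=1.5` the directory `b` is drawn as a single rectangle and its two files are skipped. Lowering `min_side`, which is roughly what zooming in would amount to, makes them appear.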

Aggregate Visualizations

Clicking on an aggregation marker causes an expansion in place, but it is more effective to use coordinated windows to display the individual components. This technique allows the user to study an overview map and then select a region to see more detailed information in the coordinated window.

This Graphical Interface for Digital Libraries offers a scalable approach where each axis is an expandable hierarchy. Each grid cell shows up to 49 colored dots for documents, and shifts to an aggregation marker in the form of a bar chart. Clicking on a grid cell produces a listing of titles in the upper right window. Clicking on a title produces the catalog description in the bottom right window.
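
The switch from individual dots to an aggregation marker can be sketched as a per-cell decision. In the Python sketch below only the threshold of 49 dots comes from the description above. The document layout (`(title, category)` pairs), the choice of one bar per category, and the function name `render_cell` are assumptions made for illustration and are not taken from the GRIDL implementation.

```python
from collections import Counter

MAX_DOTS = 49  # per-cell limit stated in the description above


def render_cell(documents, max_dots=MAX_DOTS):
    """Return a drawing instruction for one grid cell.

    `documents` is a list of (title, category) pairs; this layout is an
    assumption made for the sketch.
    """
    if len(documents) <= max_dots:
        # Few enough documents: one colored dot per document.
        return {"kind": "dots", "dots": [category for _, category in documents]}
    # Too many documents: summarize them as one bar per category.
    counts = Counter(category for _, category in documents)
    return {"kind": "bars", "bars": dict(sorted(counts.items()))}


# Hypothetical cells: a sparse one and a crowded one.
sparse = [("Title A", "journal"), ("Title B", "book")]
crowded = [("T", "journal")] * 30 + [("T", "book")] * 20

print(render_cell(sparse)["kind"])
print(render_cell(crowded)["bars"])
```

Keeping the decision local to a cell means that expanding a hierarchy level on either axis only requires re-running the function for the affected cells.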

Density Plot Visualizations

Density plots show concentrations of markers and can be interpreted as two-dimensional histograms. For time series data, density plots can show concentrations of time points. A good model is the work on cluster displays in parallel coordinate views.
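
Read as a two-dimensional histogram, a density plot reduces to counting how many records fall into each pixel-sized cell and mapping the counts to shades. The Python sketch below illustrates this with synthetic data; the function names, the text shade ramp, and the Gaussian sample are assumptions made for illustration.

```python
import random


def density_grid(points, width, height, x_range, y_range):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted range
        # Map the coordinate to a cell index; the upper edge goes into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


def to_shades(grid, ramp=" .:*#"):
    """Render the counts as text shades, scaled to the densest cell."""
    peak = max(max(row) for row in grid) or 1
    return [
        "".join(ramp[(count * (len(ramp) - 1)) // peak] for count in row)
        for row in grid
    ]


# Synthetic example: 100,000 points clustered around the center of the unit square.
random.seed(0)
points = [(random.gauss(0.5, 0.1), random.gauss(0.5, 0.1)) for _ in range(100_000)]
grid = density_grid(points, 40, 20, (0.0, 1.0), (0.0, 1.0))
print("\n".join(to_shades(grid)))
```

The number of cells is fixed by the display and not by the data, so the rendering cost stays constant once the counts are available. The counting pass itself still touches every record, which is where fast aggregation in the database would matter.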

The parallel coordinates view on the left shows 230,000 records from a fatal accident database. The variable-opacity bands on the right show meaningful clusters.


Important citations

The purpose of visualization is insight, not pictures.
[Ben Shneiderman, 2008]



Evaluation/Usage

  • Spotfire
  • Tableau

Both handle at least a million records and provide dynamic query filtering and redisplay at interactive rates to support rapid exploration.


  • Hyperion
  • CrystalReports

Both are Online Analytical Processing (OLAP) systems.


  • Hierarchical Clustering Explorer (HCE)

Implements the strategy of ranking strength of features.


  • SpaceTree
  • DOITree

Both give the user control over which nodes are exposed.


  • Treemap 4.0

Supports a color-coded density plot that shows the number of nodes or aggregate values of node attributes.

References

Extreme visualization: squeezing a billion records into a million pixels

GRIDL - Graphical Interface for Digital Libraries

Web Based Visual Exploration of Patent Information


Internal References

Treemap

Zoom

Filtering