Teaching:TUW - UE InfoVis WS 2007/08 - Gruppe 07 - Aufgabe 1 - Boxplot

From InfoVis:Wiki

Definitions

In descriptive statistics, a boxplot (also known as a box-and-whisker diagram or plot or candlestick chart) is a convenient way of graphically depicting groups of numerical data through their five-number summaries (the smallest observation, lower quartile (Q1), median, upper quartile (Q3), and largest observation). A boxplot also indicates which observations, if any, might be considered outliers. The boxplot was invented in 1977 by the American statistician John Tukey.

Boxplots are able to visually show different types of populations, without making any assumptions of the underlying statistical distribution. The spacings between the different parts of the box help indicate variance, skewness and identify outliers. Boxplots can be drawn either horizontally or vertically.
[Wikipedia, 2007]


Read full article on Wikipedia
[Figure: Boxplot illustration]

Explanation

Several terms need to be explained in order to make the definition from the section above clear.

  • The box itself - It spans from the lower quartile to the upper quartile and therefore contains the middle 50% of the data.
  • The smallest observation - The lowest value in the data set.
  • The lower quartile (Q1) - The value that splits the data set into two parts: 25% of the data lie below it and 75% lie above it.
  • The median - The value that splits the data set into two equally sized halves (both halves contain the same number of elements).
  • The upper quartile (Q3) - The value that splits the data set into two parts: 75% of the data lie below it and 25% lie above it.
  • The largest observation - The highest value in the data set.
  • Outliers - Real data sets not uncommonly contain surprisingly high or surprisingly low values that lie far from the bulk of the data; these are called outliers. A common convention, going back to Tukey, treats values more than 1.5 times the interquartile range (Q3 - Q1) below Q1 or above Q3 as outliers.

Boxplots can be drawn in either orientation, horizontal or vertical.
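
The terms above can be made concrete with a short calculation. The following is a minimal Python sketch, not a definitive implementation: the function names five_number_summary and tukey_outliers and the sample values are made up for illustration, and the 1.5 x IQR outlier rule is one common convention rather than something this page prescribes. Quartiles can be computed in several slightly different ways; the sketch assumes the "inclusive" method of Python's statistics.quantiles (Python 3.8 or later).

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    ordered = sorted(values)
    # n=4 asks for the three cut points that divide the data into quarters.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range = length of the box
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical sample with one suspiciously large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]

print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 30)
print(tukey_outliers(sample))       # [30]
```

For this sample the box runs from 4 to 8 with the median at 6. The value 30 lies above the upper fence of 14 and is flagged as an outlier, so the upper whisker would end at 9, the largest value inside the fence.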

Symmetry of a Boxplot

A boxplot also gives a picture of the symmetry of a dataset and makes outliers clearly visible. However, one should be careful about drawing conclusions on the data distribution from the shape of a boxplot alone. For example, a normally distributed dataset yields a roughly symmetric boxplot, but a symmetric boxplot does not necessarily depict normally distributed data. Displaying a histogram alongside the boxplot helps in this regard; both are important tools for exploratory data analysis.
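
A small numerical check illustrates this caution. The sketch below uses a made-up two-cluster sample (an assumption for illustration, not data from this page) that is symmetric around zero but clearly not normally distributed. Its quartiles, computed with the "inclusive" method of Python's statistics.quantiles, give a perfectly symmetric boxplot, while a crude text histogram reveals the empty middle.

```python
import statistics
from collections import Counter

# Hypothetical two-cluster sample: symmetric about zero, but with no values
# anywhere near the centre, so it is far from a normal distribution.
bimodal = [-6, -5, -5, -4, -4, 4, 4, 5, 5, 6]

q1, median, q3 = statistics.quantiles(bimodal, n=4, method="inclusive")

# Lengths of the two halves of the box and of the two whiskers.
lower_half = median - q1
upper_half = q3 - median
lower_whisker = q1 - min(bimodal)
upper_whisker = max(bimodal) - q3

print("box halves:", lower_half, upper_half)
print("whiskers:  ", lower_whisker, upper_whisker)

# Crude text histogram: one row per integer value, one '#' per observation.
counts = Counter(bimodal)
for value in range(min(bimodal), max(bimodal) + 1):
    print(f"{value:>3} {'#' * counts[value]}")
```

Both box halves and both whiskers have equal length, so the boxplot alone looks like that of a well-behaved symmetric distribution. Only the histogram shows that the data form two separate clusters.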


Example

As an example, we use the values from the table below to create a boxplot (right image). Notice that the dataset is approximately balanced around zero, so the median, which the boxplot marks inside the box, lies close to zero. The values nevertheless vary considerably, ranging from about -6 to about 7. The minimum and maximum values are shown as the ends of the whiskers. The boxplot is thus a powerful visualisation that brings out the characteristic attributes of a dataset, so that viewers can quickly gain important information about the data.

Dataset
-5.13 -2.19
-2.43 -3.83
0.50 -3.25
4.32 1.63
5.18 -0.43
7.11 4.87
-3.10 -5.81
3.76 6.31
2.58 0.07
5.76 3.50
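
The summary values behind the boxplot can be recomputed from the table. The following Python sketch is only an approximation of the original figure and rests on several assumptions: both table columns are read as one sample of 20 values, quartiles use the "inclusive" method of statistics.quantiles, outliers follow the 1.5 x IQR convention, and the helper ascii_boxplot with its axis limits is a hypothetical stand-in for the image.

```python
import statistics

# Values from the table above, read row by row. Treating both columns as a
# single sample of 20 values is an assumption.
dataset = [
    -5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
    7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50,
]

q1, median, q3 = statistics.quantiles(dataset, n=4, method="inclusive")

# Fences of the 1.5 * IQR convention; values outside them count as outliers.
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in dataset if v < lower_fence or v > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences.
whisker_low = min(v for v in dataset if v >= lower_fence)
whisker_high = max(v for v in dataset if v <= upper_fence)


def ascii_boxplot(low, q1, median, q3, high, axis_min, axis_max, width=61):
    """Draw a one-line horizontal boxplot: |---[===M===]---|"""
    def column(x):
        # Map a data value to a character column on the axis.
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    row = [" "] * width
    for c in range(column(low), column(high) + 1):
        row[c] = "-"  # whiskers
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="  # box
    row[column(low)] = row[column(high)] = "|"
    row[column(q1)], row[column(q3)] = "[", "]"
    row[column(median)] = "M"
    return "".join(row)


line = ascii_boxplot(whisker_low, q1, median, q3, whisker_high, axis_min=-7, axis_max=8)
print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} outliers={outliers}")
print(line)
```

Under these assumptions the median comes out at roughly 1.07 and the quartiles at roughly -2.6 and 4.5. No value lies outside the fences, which agrees with whiskers that reach the minimum (-5.81) and the maximum (7.11).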


Related Links

References