Teaching:TUW - UE InfoVis WS 2006/07 - Gruppe 01 - Aufgabe 2: Difference between revisions

From InfoVis:Wiki
Revision as of 22:31, 13 November 2006

Poor Graphic

Never Bitten, Twice Shy: The Real Dangers Of Summer

Discussion of the original graphic

First impressions

  • The first thing one will notice when looking at this diagram is: The shark.
  • The next things that catch one's eye are the other graphical symbols, arranged from bottom left to top right, which seem to stand in a linear relationship.
  • Then you see the horizontal axis, positioned in the middle of the graphic, reading "More risk, less fear" on the left and "More fear, less risk" on the right.
  • Only then do you notice the actual values and the legend, provided you don't get distracted by the nearly unreadable article text in the upper left of the image.

Design

Before analyzing the actual data in the graphic we try to evaluate the graphic from a design point of view:

  • Data-Ink-Ratio: This image has a very poor Data-Ink-Ratio. A large number of visual elements (e.g. the huge shark) are not only unnecessary for visualizing the data but also distract from the message of the graphic. They are completely dispensable, as they add no information that is not already provided by the textual labels.
  • Space: The image takes up a great amount of space but leaves entire regions of the graphic blank and unused.
  • Axis location: The location of the only axis in the diagram is somewhat misleading. The axis is placed in the middle of the diagram, which suggests some kind of separation of the risk categories (e.g. into a negative and a positive region). There is no logical reason for locating the axis in the middle. A y-axis is not displayed at all, although elements are also arranged vertically.
  • Axis units: There is no real way to tell the units or the ranges of the x- and y-axes. The only hint is the text on the arrow, which reads "More risk, less fear" on the left and "More fear, less risk" on the right. One might conclude that the data is ordered from left to right by increasing fear and decreasing risk, but that isn't the case; see the "Detailed analysis of the data" section below.
  • Text on graphic: The article text in the upper left does not belong in the graphic itself and should be shown separately. As it stands, it is nearly unreadable due to its small size and distracts the viewer from the information in the graphic. Additionally, there are some comments (e.g. on missing values or the specific conditions under which values were obtained) placed directly on the graph. These again distract from its message, as the viewer has to read them to check whether they contain important information (which they mostly don't).
  • Data density: The graphic rates poorly here too: the amount of space used (as stated above) is out of all proportion to the size of the dataset, which has only 13 elements.

To show what can be accomplished just by improving the Data-Ink-Ratio, we created this simplification of the original graphic:

Never Bitten, Twice Shy: The Real Dangers Of Summer

Detailed analysis of the data

The "real risks of summer" data in table form

Risk | Odds of injury [1] | Odds of dying | Fear index [2]
Skin cancer | 1 in 200 | 1 in 29,500 | 102
Food poisoning | 1 in 800 | 1 in 55,600 | 257
Bicycles | 1 in 1,700 | 1 in 578,000 | 233
Lawn mowers | 1 in 5,300 | Not available | 53
Heat exposure | Not available | 1 in 950,000 | 229
Children falling out of windows | 1 in 12,800 | 1 in 2,400,000 | 89
Lyme disease | 1 in 18,100 | Not available | 47
Fireworks | 1 in 32,400 | 1 in 71,200,000 | 59
Amusement parks | 1 in 34,800 | 1 in 72,300,000 | 101
Snake bites | 1 in 41,300 | 1 in 19,300,000 | 109
Drowning (while boating) | 1 in 64,500 | 1 in 400,900 | 1,688
West Nile virus | 1 in 68,500 | 1 in 1,000,000 | 2,240
Shark attacks | 1 in 6,000,000 | 1 in 578,000,000 | 276

[1] Full column header: Odds of injury requiring medical treatment
[2] Fear index means: Number of newspaper articles written last summer about this risk

The table shows 13 risk categories each with three types of information: odds of injury, odds of death and a "fear index", built on the "number of newspaper articles written last summer" about this risk category.


  • Axis order: As stated above, it's not easy to see which unit is used on which axis, so a closer analysis is needed. The risk does decrease from left to right, as the axis says, but the fear ranking (number of articles) doesn't play into the representation at all: as can easily be seen, the risk categories aren't ordered by the number of articles. Nonetheless, the categories are arranged along an ascending line. Moreover, the line should be descending regardless of how the units are assigned (x-axis: amount of risk, y-axis: amount of fear, or the other way around), since low risk is meant to imply high fear. In reality, the data is ordered only by the "odds of injury" and then simply placed on an ascending line, implying a linear relationship that isn't there.
  • Missing values: Three records have no value for either odds of injury or odds of death, yet they are still positioned in the graphic, without any explanation of how these missing values were handled.
  • No linear correlation: As already stated there is no apparent correlation between odds of injury (or odds of death) and number of articles, though the graphic tries to convince the viewer otherwise. The few correlations that do exist can be attributed to chance.
  • Is the number of articles written about a subject really a good measure for fear of this subject? Exactly the opposite could be claimed in that the more people know about a subject (i.e. the more articles they read about it), the less they fear it.
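The claim that no linear correlation exists can be checked with a few lines of code. The following is a minimal sketch, not the analysis we actually ran: it assumes that the Pearson coefficient is computed on the "1 in N" denominators from the table and that records with a missing value are simply skipped. The names `RISKS`, `pearson` and `correlation_with_articles` are hypothetical helpers introduced for this illustration, and the result depends on these assumptions (for example, using the probabilities 1/N instead of N gives different numbers).

```python
from math import sqrt

# (risk, injury odds "1 in N", death odds "1 in N", number of articles)
# None stands for "Not available" in the table.
RISKS = [
    ("Skin cancer", 200, 29_500, 102),
    ("Food poisoning", 800, 55_600, 257),
    ("Bicycles", 1_700, 578_000, 233),
    ("Lawn mowers", 5_300, None, 53),
    ("Heat exposure", None, 950_000, 229),
    ("Children falling out of windows", 12_800, 2_400_000, 89),
    ("Lyme disease", 18_100, None, 47),
    ("Fireworks", 32_400, 71_200_000, 59),
    ("Amusement parks", 34_800, 72_300_000, 101),
    ("Snake bites", 41_300, 19_300_000, 109),
    ("Drowning (while boating)", 64_500, 400_900, 1_688),
    ("West Nile virus", 68_500, 1_000_000, 2_240),
    ("Shark attacks", 6_000_000, 578_000_000, 276),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_with_articles(column):
    """Correlate one odds column (1 = injury, 2 = death) with the articles.

    Assumption: rows with a missing value in that column are dropped.
    Returns the coefficient and the number of rows actually used.
    """
    pairs = [(row[column], row[3]) for row in RISKS if row[column] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


if __name__ == "__main__":
    for label, column in (("injury", 1), ("death", 2)):
        r, used = correlation_with_articles(column)
        print(f"odds of {label} vs. articles: r = {r:.3f} ({used} records)")
```

Dropping incomplete rows is only one possible way to treat the three records with missing values; the original graphic does not say how it handled them.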


  • There are three different dimensions of data in the picture, but where does the trend come from?
      • Odds of injury
      • Odds of death
      • Number of articles

REM: What is meant by "where does the trend come from?" We already cover that the rise follows the odds of injury. -MM

  • Outliers ??? What to write about them

REM: I don't think we need to write anything specific about the outliers. We already state that there is no correlation between injuries and articles, and the outliers don't really matter to us. -MM

Please back this up with statistical values. For example, the correlation between death and articles is only -0.147886423. Add a scatterplot of this with a trend line. State that there is sometimes a correlation between odds and articles, but not always, and certainly not as linear as in the graphic. In particular, without the outliers "Nile virus" and "drowning", the article values lie in the range [53, 276], so they are fairly flat and not linearly increasing. Then a graphic showing what it looks like when one correctly X:A

REM: Do we really need a precise statistical analysis? That's not the point at all. That the data are unrelated is plain to see anyway; do we really have to substantiate it further? -MM

Better graphic

After taking a closer look at the data, we found that the main message of the original graphic is not supported by the actual data. We analyzed the values with several different diagrams and concluded that the odds of injury are not related to the number of articles, in either direction. Consequently, we cannot show the "fear-risk-ratio" the way the source picture does. Instead, we visualize the data in a new diagram that does not support the original "more risk, less fear" thesis.

Because of the wide spread of the values, we had to use a logarithmic scale. The data are ordered by the number of articles; no connection to the other dimensions can be found. The only correlation that might exist is between the odds of injury and the odds of dying.
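Why a logarithmic scale is needed becomes clear when counting how many orders of magnitude each dimension covers. The sketch below is only an illustration: the lists repeat the available values from the table (missing entries left out), and `decades_spanned` is a hypothetical helper, not part of any plotting tool we used.

```python
import math

# "1 in N" denominators and article counts from the table;
# records without a value in the respective column are left out.
INJURY_ODDS = [200, 800, 1_700, 5_300, 12_800, 18_100, 32_400,
               34_800, 41_300, 64_500, 68_500, 6_000_000]
DEATH_ODDS = [29_500, 55_600, 578_000, 950_000, 2_400_000, 71_200_000,
              72_300_000, 19_300_000, 400_900, 1_000_000, 578_000_000]
ARTICLES = [102, 257, 233, 53, 229, 89, 47, 59, 101, 109, 1_688, 2_240, 276]


def decades_spanned(values):
    """Number of orders of magnitude between the smallest and largest value."""
    return math.log10(max(values)) - math.log10(min(values))


if __name__ == "__main__":
    for label, values in (("odds of injury", INJURY_ODDS),
                          ("odds of dying", DEATH_ODDS),
                          ("articles", ARTICLES)):
        print(f"{label}: {decades_spanned(values):.1f} decades")
```

On a linear axis, the largest value of a dimension that covers several decades would compress all other records into a sliver near zero, which is what the logarithmic scale avoids.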

REM: A request to whoever made the graphic: please place the line labels at the ends of the lines. And if possible use uniform symbols for the points (not diamonds in one place, triangles in another, ...). -MM


Another way of presenting the data is to combine the death and the injury odds into one total risk value. That way we can simplify the graphic by using a kind of block diagram. The blocks represent the total risk in percent and are ordered by the number of articles. According to the author of the original graphic, the number of articles written about a risk is equal to the risk's fear index. For that reason, we also use the fear index as the x-axis label. This makes it easy to see that there is absolutely no relation between people's fears and the risk of actually being affected by the corresponding dangers.

Total risk of the dangers
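How such a total risk value could be derived is sketched below. This is an assumption for illustration, not necessarily the formula behind the diagram: the two probabilities 1/N are simply added and expressed in percent, and a missing value contributes nothing. Since injuries and deaths are not disjoint events, the sum is only a crude indicator. `total_risk_percent` and `risks_by_fear_index` are hypothetical names.

```python
# (risk, injury odds "1 in N", death odds "1 in N", fear index)
# None stands for "Not available" in the table.
RISKS = [
    ("Skin cancer", 200, 29_500, 102),
    ("Food poisoning", 800, 55_600, 257),
    ("Bicycles", 1_700, 578_000, 233),
    ("Lawn mowers", 5_300, None, 53),
    ("Heat exposure", None, 950_000, 229),
    ("Children falling out of windows", 12_800, 2_400_000, 89),
    ("Lyme disease", 18_100, None, 47),
    ("Fireworks", 32_400, 71_200_000, 59),
    ("Amusement parks", 34_800, 72_300_000, 101),
    ("Snake bites", 41_300, 19_300_000, 109),
    ("Drowning (while boating)", 64_500, 400_900, 1_688),
    ("West Nile virus", 68_500, 1_000_000, 2_240),
    ("Shark attacks", 6_000_000, 578_000_000, 276),
]


def total_risk_percent(injury_odds, death_odds):
    """Sum of both probabilities in percent.

    Assumption: a missing value (None) is treated as contributing zero.
    """
    probability = 0.0
    for one_in_n in (injury_odds, death_odds):
        if one_in_n is not None:
            probability += 1.0 / one_in_n
    return 100.0 * probability


def risks_by_fear_index():
    """Return (risk, fear index, total risk in percent), ordered by fear index."""
    rows = [(name, fear, total_risk_percent(injury, death))
            for name, injury, death, fear in RISKS]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, fear, risk in risks_by_fear_index():
        print(f"{fear:>5}  {risk:.6f} %  {name}")
```

Printing the rows in fear-index order gives the same left-to-right arrangement as the block diagram, so the total risk values can be compared against the ordering directly.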


Links