Teaching:TUW - UE InfoVis WS 2005/06 - Gruppe G7 - Aufgabe 3


Revision as of 11:37, 22 November 2005

Topic

Webserver Logfile Visualization

Area of Application

Analysis of Application Area

General Description

Webservers typically generate logfiles containing huge amounts of information on page accesses, client software used, type of access, and much more. Analysis tools such as AWStats try to make use of this information and present simple statistics, mostly in the form of tables or simple bar charts. Unfortunately, these tools are mostly very limited and "low-level" in their information representation. More interesting questions, such as user behavior in combination with site structure, dead ends, changes in behavior over time (evolution), typical behavioral patterns, finding groups of users that share similar behavioral patterns, site entry points, or intrusion detection, cannot be answered with such tools. To deal with these kinds of topics, more advanced visual tools are needed that unveil this information. [Behlendorf et al., 2005]

Special Issues

There are many known solutions/methods for visualizing webserver logfiles. Most of them use simple chart or bar diagrams. The question is whether it is possible to present this huge amount of information in a single diagram that illustrates all the data in a simple and understandable way.

Analysis of the Dataset

Here is an example of such a logfile entry: [Cooper, 2004]


205.218.110.166 - - [08/Dec/1996:15:02:10 -0800] "GET /info/index.html HTTP/1.0" 200 14912 "http://www.yourcompany.com/index.html " "Mozilla/3.0Gold (Win95; I)" "35bebd61b31211cfbdcd00c04fd611cf"


The content of this entry explained, from left to right:

"205.218.110.166" - This is the IP address of the machine making a request of your web server. Its domain name can be determined in HitList by enabling Reverse DNS lookups, assuming your server hasn't put this information in already (many do, some don't). If the domain name were in there, you'd see it instead of the raw IP.

"-" - This first dash is the place for the remote logname (the RFC 1413 identity of the client), which most NCSA format servers don't record by default.

"-" - This second dash is the place for the authenticated username, which again many NCSA format servers don't insert by default.

"[08/Dec/1996:15:02:10 -0800]" - This is the date and time of the access, including the offset from Greenwich Mean Time; the latter is the "-0800", meaning the web server's local time is 8 hours behind GMT.

"GET /info/index.html HTTP/1.0" - This is the actual request the visitor's browser made when at your page or server.

"HTTP/1.0" refers to the protocol and its version, here being version 1.0 of the http protocol.

"200" - this is the server response code - a "successful" request (meaning the visitor's browser loaded the entire HTML/GIF/JPEG, etc.) generates a response code of 200. Others include:

206 - Partial request successful (not complete)
302 - URL has been redirected to another document
400 - Bad request was made by the client
401 - Authorization is required for this document
403 - Access to this document is forbidden
404 - Document not found
500 - Server internal error
501 - Application method (either GET or POST) is not implemented
503 - Server is out of resources

"14912" - This is the number of bytes transferred to the client during the visit. Since every request has some response, even erroneous requests will have a non-zero value for this field.

"http://www.yourcompany.com/index.html" - This is the referrer field, i.e. the site the visitor was on immediately prior to making this entry's request. In this case, the person was looking at the index.html page (probably the home page) before going to the /info/index.html page in this entry.

"Mozilla/3.0Gold (Win95; I)" - This is the user-agent field, i.e. the actual browser and OS used by the visitor. In this case, Mozilla is Netscape, the next value is the version (here, 3.0Gold), and the final value is the OS it was using (Windows 95).

Finally, the "35bebd61b31211cfbdcd00c04fd611cf" is the cookie information, which may or may not be there, depending on whether the webserver used has cookies enabled and whether one was passed from webserver to the visitor's computer.
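The field breakdown above can be sketched as a small parser. The following is a minimal sketch, assuming the NCSA combined format of the example entry; the regular expression and field names are our own illustration, not taken from any particular tool:

```python
import re

# Parse one NCSA combined-format log line into its fields.
# The pattern mirrors the field order explained above: IP, remote logname,
# user, date, request, status, bytes, then optional referer and user-agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)")?(?: "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line.strip())
    if not m:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" byte count (no body sent) is treated as zero.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

Running this over the sample entry yields the IP, the request line, status 200, and 14912 bytes as separate fields, ready to be aggregated for the visualization.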


There are several different specific Logfile Formats, for example: Microsoft IIS 3.0 and 2.0, Microsoft IIS4.0 (W3SVC format), Netscape (NCSA format with/without unique format header), Lotus Domino format, O'Reilly WebSite format...


Target Group

Identifying the Target

We identified the following Target groups:

  1. Software Companies: especially those who develop browsers and web based applications
  2. Web Designers
  3. Administrators
  4. Advertising Companies
  5. Web Users

Special Issues of the Target Group

Software Companies: In order to optimize their software, they have to explore the users' needs. Logfiles can offer them some useful information.

For Web Designers it can be helpful to know some facts about their visitors. Logfiles tell them which browsers the visitors use, when they access the site, how much traffic they generate, and so on. This information allows them to organize the webpage in a convenient and reliable way. The same holds for Web Administrators.

Advertising Companies strive to study user behaviour in order to target their ads more effectively.

Known Solutions / Methods (related to the target group)

  1. Radial Tree Viewer
  2. Anemone
  3. Internet Cartographer
  4. WebTracer
  5. WebHopper
  6. WebPath
  7. The Chicago Tribune Website
  8. Visualizing the online debate on the European Constitution
  9. Mercator

...

Intended Purpose

Goals and Objectives

  1. Determine which client software and browsers are used.
  2. Understand how users behave in combination with site structure: dead ends, changes in behavior over time.
  3. Identify groups of users who act in a similar way.
  4. Identify "hot" topics

Problems and Tasks to Solve

- Improve the Servers: Administrators, Web Designers, and Software Companies need this information to improve their servers. They need to know what people are interested in and whether there are groups that share similar behavior, navigating to the same topics in a similar way. Which topics are often searched for, and how? With this information, they can help users get what they want faster and more easily, and adapt the server to the users' needs.
In order to effectively manage a web server, it is necessary to get feedback about the activity and performance of the server as well as any problems that may be occurring.

- Adapt Advertising to Users' Interests: Advertising Companies can use the information to identify themes and products of interest to a group of users. They get an overview of the things groups of people are interested in, and can adapt the topics of their advertisements accordingly.


Proposed Design

Types of Visualization Applied

The following techniques will be used:

  • Filtering
  • Linking
  • Detail on Demand
  • Range Slider [Ahlberg and Shneiderman, 1994] with Time Lens


"Overview first, zoom and filter, then details on demand. View relationships and history..."
Visual Information-Seeking Mantra [Shneiderman, 1996]


First, the user is presented with a net of accessed web pages and resources. She is able to filter the information using the checkboxes for OS, browser, etc., and the Time Lens Range Slider. When clicking on a resource, the view zooms if necessary and marks all nodes linking to and from this page, image, etc.


Visual Mapping

(Datadimension => Attribute)

The log contains the following information:

IP, date, request, protocol, response code, bytes, referer, user agent, cookie


From a given IP it is possible to determine the country of origin, so IPs are mapped to the world map. <Unknown> will also be listed.
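The IP-to-country step could be sketched as below. A real implementation would consult a GeoIP database; here a tiny, purely hypothetical prefix table stands in for it, and the sample mappings are assumptions for illustration only:

```python
# Hypothetical prefix-to-country table; a real tool would use a GeoIP database.
COUNTRY_BY_PREFIX = {
    "205.218.": "US",   # assumed value, for illustration only
    "128.131.": "AT",   # assumed value, for illustration only
}

def country_of(ip):
    """Map an IP to a country code, or "<Unknown>" when no prefix matches."""
    for prefix, country in COUNTRY_BY_PREFIX.items():
        if ip.startswith(prefix):
            return country
    return "<Unknown>"
```

Unresolvable addresses fall into the "<Unknown>" bucket mentioned above instead of being dropped.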

The date is listed on the date lens slider. After choosing a date range, the corresponding numbers and entries change. Above this date slider the user will find bars of the bytes transferred in this date range.

The request is mapped onto the "web site" net in the middle. Further calculation is necessary to match /file/ to /file/index.php and /file/index.php?somesessionid=2139123123. Only successful requests are listed here (e.g. response code 200).
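The normalization step just described can be sketched as follows. This is a minimal sketch, assuming "index.php" as the directory index document (an assumption of this example, not a fixed rule):

```python
from urllib.parse import urlsplit

def normalize_request(request_line, index_doc="index.php"):
    """Reduce a request line like 'GET /file/?sid=1 HTTP/1.0' to '/file/index.php'.

    index_doc is an assumed directory index filename for this sketch.
    """
    parts = request_line.split()
    # The path is the second token of "METHOD PATH PROTOCOL".
    path = parts[1] if len(parts) >= 2 else request_line
    path = urlsplit(path).path          # drops '?somesessionid=...'
    if path.endswith("/"):
        path += index_doc               # '/file/' -> '/file/index.php'
    return path
```

With this, /file/, /file/index.php, and /file/index.php?somesessionid=2139123123 all collapse into the same node of the net.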

If the referer is a URL from within the server, the "web site" net draws a connection between the two pages. If the page is referred from outside, the referring page is listed in the left-hand panel "external referers".

The user agent is parsed for operating system and browser information. Each is listed on the right with checkboxes, so the user can remove or include certain browsers or OS to the data set.
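A sketch of that user-agent parsing step is shown below. Real user-agent strings are far messier than this; the token list is an assumption covering only a few cases, for illustration:

```python
import re

# Assumed OS tokens; a real parser would need a much larger table.
OS_TOKENS = {
    "Win95": "Windows 95",
    "Windows NT": "Windows NT",
    "Linux": "Linux",
    "Mac": "Mac OS",
}

def parse_user_agent(agent):
    """Return (browser, os) from a user-agent string, "<Unknown>" when unmatched."""
    # Browser name/version lead the string, e.g. "Mozilla/3.0Gold (...)".
    m = re.match(r"(\w+)/([\w.]+)", agent)
    browser = f"{m.group(1)} {m.group(2)}" if m else "<Unknown>"
    os_name = next(
        (name for token, name in OS_TOKENS.items() if token in agent),
        "<Unknown>",
    )
    return browser, os_name
```

The resulting browser and OS values feed the checkbox lists on the right.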

The number of hits is mapped to the size of the dots in the net view and external referer list.
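That hits-to-size mapping could be computed as below. Scaling the radius with the square root of the hit count makes the dot area roughly proportional to the hits, which reads more honestly than a linear radius; the pixel bounds are assumptions of this sketch:

```python
import math

def dot_radius(hits, max_hits, r_min=3.0, r_max=30.0):
    """Radius in pixels for a node with `hits` accesses, given the busiest node.

    r_min/r_max are assumed display bounds; area scales linearly with hits.
    """
    if max_hits <= 0 or hits <= 0:
        return r_min
    return r_min + (r_max - r_min) * math.sqrt(hits / max_hits)
```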

When following the interaction of users it is possible to estimate a page reading time. This is displayed as an arc in each page dot.

Description of Used Techniques

Possibilities of Interaction

Clicking on a browser, operating system, or weekday check box reduces or increases the displayed data set. Numbers next to the settings change accordingly.

Clicking on a web page in the net view also updates these numbers: the time lens and filters show the number of accesses to this resource. Other pages that were visited by users who viewed this page are marked. Clicking on a blank spot resets the view.

If the button "filtered list" is pressed, the program produces a list of the webserver logfile entries according to the settings (browser, page, time, OS, ...).

The "hide media" check box removes, if checked, all images, CSS, and media files from the list, keeping only HTML, PHP, ...
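That filter could be sketched as a simple extension check; the set of media extensions below is an assumption of this example:

```python
import os

# Assumed set of media/asset extensions to hide; easily extended.
MEDIA_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}

def hide_media(paths):
    """Keep only paths whose extension is not a known media/asset type."""
    return [p for p in paths
            if os.path.splitext(p)[1].lower() not in MEDIA_EXTENSIONS]
```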

Mockups / Fake Screenshots

Screenshot

Resources

[1] [Behlendorf et al., 2005] Brian Behlendorf, Apache HTTP Server Logfiles, Apache HTTP Server Project, Access Date: 17 October 2005, http://httpd.apache.org/docs/1.3/logs.html
[2] [Cooper, 2004] Colin Cooper, Logfile Definitions and Examples, Intranet Software Solutions (Europe) Limited [ISSEL], Access Date: 17 October 2005, http://www.issel.co.uk/FAQ/logfile_definitions_examples.htm
[3] [Shneiderman, 1996] Ben Shneiderman (1996), The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, In Proceedings of the IEEE Symposium on Visual Languages, pages 336-343, Washington. IEEE Computer Society Press.
[4] [Ahlberg and Shneiderman, 1994] Christopher Ahlberg and Ben Shneiderman (1994), Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays, In Proceedings of the Conference on Human Factors in Computing Systems (CHI'94), pages 313-317. ACM Press.