Teaching:TUW - UE InfoVis WS 2005/06 - Gruppe G8 - Aufgabe 3

From InfoVis:Wiki
Revision as of 19:43, 26 November 2005

Topic

On-line Music Portals: Analyzing the Users' Activities

Area of Application

This section describes the application area of
MICE - Music Investigation and Clustering Environment.


[Stefan Schnabl, 2005]

Analysis of Application Area

General Description

The dataset contains data from a specific online music portal:
http://www.ericsson-mediasuite.com/music/web2/dyn/home

The main aim is to extract information about user activity in the individual areas of the portal. This may help to form hypotheses about how the portal can be optimized or restructured.
If a significant correlation between certain aspects can be found, it could be of interest to the portal owners and to the technicians who established the portal.

Special Issues

We identified the following points of interest:

  1. The user of MICE has different intentions than the persons the data is about.
  2. Market Basket Analysis and Cluster Analysis are heavily statistical techniques that demand considerable background knowledge. Hence it is not easy to derive information from their results.
  3. The questions to be answered become more complex once they are answered (report chain).
  4. The answer will not be a single "result" but an (iterative) process.
  5. Creating an abstract graphical view of the data is a demanding task because of its multidimensionality.

Analysis of the Dataset

The data is organized in several tables which refer to each other by item IDs. The following tables are available:

- Album: Represents all albums available.
- Artist: Contains information about the known artists.
- Track: Holds the references between Album and Artist.
- Asset: Information concerning the products which are sold.
- Playlist: Contains information about the playlists created by the users of the on-line portal.
- Playlist Item: Associative reference between Track and Playlist.
- User: Information concerning the users of the on-line portal. Referenced to Playlist.
- User Action: Data about the operations a user executes within the on-line portal. Referenced to User and to Track, Asset, or Artist.

User behaviour is captured in the tables User and User Action. All other information is supplied by the companies distributing the content.

Description of the Datatypes

The tables consist of the following Data items:

Album
  Field Name        Datatype                     Default value        Constraints
  artistId          String (255 Bytes) [key]     NULL
  homeClusters      String (255 Bytes)           NULL
  creationDate      DateTime                     NULL
  albumId           Integer (20 Bytes)           0                    NOT NULL
  title             String (255 Bytes) [key]     NULL
  Primary Key: albumId
Artist
  Field Name        Datatype                     Default value        Constraints
  creationDate      DateTime                     NULL
  defaultAssetId    String (255 Bytes)           NULL
  firstName         String (255 Bytes)           NULL
  artistId          Integer (20 Bytes)           0                    NOT NULL
  lastName          String (255 Bytes) [key]     NULL
  homeClusters      String (255 Bytes)           NULL
  Primary Key: artistId
Track
  Field Name        Datatype                     Default value        Constraints
  artistId          Integer (20 Bytes)           NULL
  creationDate      DateTime [key]               NULL
  trackId           Integer (20 Bytes) [key]     0                    NOT NULL
  title             String (255 Bytes)           NULL
  ISRC              String (255 Bytes)           NULL
  homeCluster       String (255 Bytes)           NULL
  albumId           String (255 Bytes) [key]     NULL
  Primary Key: trackId
Asset
  Field Name        Datatype                     Default value        Constraints
  artistId          Integer (20 Bytes) [key]     NULL
  assetType         String (255 Bytes)           NULL
  creationDate      DateTime                     NULL
  assetId           Integer (20 Bytes)           0                    NOT NULL
  trackId           Integer (20 Bytes) [key]     NULL
  Primary Key: assetId
Playlist
  Field Name        Datatype                     Default value        Constraints
  claim             String (255 Bytes)           NULL
  creationDate      DateTime                     NULL
  playlistId        Integer (20 Bytes)           0                    NOT NULL
  playListName      String (255 Bytes)           NULL
  userId            Integer (20 Bytes) [key]     NULL
  Primary Key: playlistId
Playlist Item
  Field Name        Datatype                     Default value        Constraints
  trackId           Integer (20 Bytes) [key]     NULL
  playlistId        Integer (20 Bytes) [key]     NULL
User
  Field Name        Datatype                     Default value        Constraints
  handSetId         String (255 Bytes)           NULL
  userId            Integer (20 Bytes)                                NOT NULL, auto increment
  language          String (255 Bytes)           NULL
  lastLoginDate     DateTime                     NULL
  registrationDate  DateTime                     NULL
  gender            Character                    NULL
  yearOfBirth       unsigned Integer (10 Bytes)  NULL
  genrePref         String (250 Bytes)           NULL
  favouredArtists   String (no size specified)
  Primary Key: userId
User Action
  Field Name        Datatype                     Default value        Constraints
  context           Integer (11 Bytes)           -1                   NOT NULL
  userId            Integer (20 Bytes) [key]     0                    NOT NULL
  productId         Integer (20 Bytes) [key]     0                    NOT NULL
  actionTime        TimeStamp [key]              0000-00-00 00:00:00  NOT NULL
  action            String (20 Bytes)            (empty)              NOT NULL
  productType       String (20 Bytes)            (empty)              NOT NULL

Description of the Datastructures

The tables relate to each other as follows. The artist entity can be considered the central point of the structure. It may be related to zero or more assets, one or more albums, one or more tracks, and zero or more user actions. Tracks and assets can also be related to several user actions. Since a user action is always executed by a user, there has to be a user entity, which can be related to several user actions. Furthermore, playlists are created by a user, which results in a 1:n reference between the user and the playlist entity. The reference between a playlist and the tracks it contains, which would otherwise be an m:n reference, is resolved by the playlist item entity. Finally, tracks can be related to one or more clusters, and a cluster may be related to 'super clusters', which are also contained in the cluster entity.

The symbols are to be interpreted as follows: [symbol] represents a 1:n reference between the entities A and B, in which one item of entity B is related to several items of entity A. In addition, the minimum number of references may be specified with a small circle or a short line, which stand for 0 and 1 respectively. A circle (0) means that the reference pointer may be set to NULL, whereas a short line (1) means that there has to be at least one reference.
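The 1:n reference between User and User Action described above can be sketched in a few lines of Python. This is only an illustration: the field names follow the User Action table, while the class `UserAction`, the helper `actions_by_user` and the example values in the comments are assumptions, not part of the dataset or of MICE.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserAction:
    """One row of the User Action table (field names as in the table)."""
    user_id: int
    product_id: int
    product_type: str      # e.g. "track", "asset", "artist" (assumed values)
    action: str            # e.g. "VIEW", "BUY" (assumed values)
    action_time: datetime
    context: int = -1      # default value taken from the table


def actions_by_user(actions):
    """Group actions by userId, mirroring the 1:n reference User -> User Action."""
    grouped = defaultdict(list)
    for a in actions:
        grouped[a.user_id].append(a)
    return dict(grouped)
```

Grouping by `user_id` gives, per user, the list of actions that the later views aggregate.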

Target Group

Identifying the Target

The target groups for the exploration tool are mainly shopping analysts and web developers.
The aim is to suit the needs of the persons who run the on-line portal (since they are interested in optimization). On the other hand, the technical aspects must not be forgotten, so that the tool also helps the developers of such systems.

Special Issues of the Target Group

  1. Shop Owners: Extremely business focused.
  2. Web Developers: Extremely technically focused.
  3. Both groups may forget the users' need to "feel good".

Known Solutions / Methods (related to the target group)

  1. (Statistical) data mining methods (e.g. Shopping Cart Analysis, Principal Components Analysis, Cluster Analysis)
  2. SAS Enterprise Miner
  3. SPSS Statistical Investigation Tool

All of these methods are very heavily statistically oriented.

Intended Purpose

Goals and Objectives

  1. Find correlations between shopping habits and other activities of the users.
  2. Find evidence for user activity with respect to profiled and unprofiled users.
  3. Link the different visualization techniques.
  4. Zoom and filter into more detail.

Problems and Tasks to Solve

  1. Give a good overview without cutting away too many paths into the depth of the data.
  2. Clear mapping of variables.
  3. Clarify what the user can expect from using certain buttons and methods.
  4. Clear structure of the methods.
  5. Consistency in the usage of buttons and colors.
  6. Good guidance about what to do with the results of a step of a method.
  7. Keep focused on the mantra: Overview First - Zoom and Filter - Details on Demand!

Example Questions

  1. In which timespan was there a lot of activity?
  2. What do buyers do before or after they buy?
  3. How many people visit the different parts of the portal?
  4. Are VIEW and BUY correlated?
  5. Are there impulse buyers?
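As an illustration of question 4, the correlation between VIEW and BUY could be estimated from per-user action counts. The sketch below is one possible approach, not the method used in MICE: it assumes the actions are available as `(user_id, action)` pairs and that the action names are the strings "VIEW" and "BUY"; all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def per_user_counts(actions, action_name):
    """Count how often each user performed `action_name`."""
    return Counter(user for user, action in actions if action == action_name)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant data; reported as 0 in this sketch
    return cov / sqrt(var_x * var_y)


def view_buy_correlation(actions):
    """Correlate per-user VIEW counts with per-user BUY counts."""
    views = per_user_counts(actions, "VIEW")
    buys = per_user_counts(actions, "BUY")
    users = sorted(set(views) | set(buys))
    # Counter returns 0 for users who never performed the action.
    return pearson([views[u] for u in users], [buys[u] for u in users])
```

A value near 1 would suggest that users who view more also buy more; it says nothing about the order of the actions, which question 2 asks about.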

Proposed Design

Types of Visualization Applied

  1. Bubble Chart: The parts of the online platform are represented by bubbles. The traffic (the actions of people who appear in both bubbles) is represented by the edges between the bubbles. In addition, the color inside a bubble gives the proportion of persons belonging to a selected group. The remaining part stays grey, representing "the rest".
  2. TreeNet: The parts of the store (which are disjoint), e.g. Tracks, Assets, Artists, are shown as Treemaps and divided into the different possible actions. The connection between the Treemaps (the actual netting) is the number of users common to both parts. The edge is outlined with the smaller of the two user numbers, giving an idea of the whole.
  3. Parallel Coordinates: The selected variables are normalized (to a range of 0-1) and plotted in a parallel coordinate plot. Techniques such as Focus/Context must be applied to make patterns visible.
  4. (multidimensional) Scatterplot: A scatterplot that helps to understand (possibly in more detail) the relations between some variables.
  5. Morphing (timeline): The datasets are timestamped and can thus be used to morph the plots over time. This makes it possible to see whether there are special occurrences (for example, heavy advertisement may lead to more activity).
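The normalization to a range of 0-1 mentioned for the Parallel Coordinates view can be sketched as a min-max scaling per variable. The sketch assumes that all selected variables are already numeric (categorical attributes would first need a numeric encoding); the function names and the record layout are hypothetical.

```python
def normalize_column(values):
    """Min-max scale a list of numbers to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; place it at 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_records(records, variables):
    """Return one polyline (list of 0-1 values, one per axis) per record."""
    columns = {
        name: normalize_column([r[name] for r in records])
        for name in variables
    }
    return [[columns[name][i] for name in variables]
            for i in range(len(records))]
```

Each returned inner list holds the vertical positions of one record on the parallel axes, in the order given by `variables`.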

Visual Mapping

The basic data represented are user actions. The essential table should be extended with a column indicating whether the user is a profiled user or not.

Bubble Chart

  1. The size of the bubble represents the number of actions taken within that part of the portal. Since the portal is semantically divided into parts in which actions can only be performed on disjoint products, the different product types are also represented in this basic view.
  2. Different color segments of the bubble represent the (selected) types of actions.
  3. The position on the timeline represents all the actions taken up to a specific point in time. This can also be adjusted to show all actions within a specific timespan.
  4. The "whole" is always visible through a very transparent bubble chart giving an idea of all the actions.
  5. The details of the segments and bubbles become visible as tip text when the mouse pointer rolls over a segment or bubble.

[Image: BubbleSchema.jpg] [Schnabl, Seyfang, Fritz, 2005]

[Image: BubbleConnSchema.jpg] [Schnabl, Seyfang, Fritz, 2005]
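The mapping of the Bubble Chart (size from the number of actions, color segments from the action types, timeline position as a cut-off) can be sketched as a simple aggregation. The input layout of `(product_type, action, time)` tuples and the function name `bubble_data` are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def bubble_data(actions, until):
    """Aggregate all actions up to `until` into one bubble per product type.

    Returns {product_type: {"size": total, "segments": {action: share}}}.
    """
    counts = defaultdict(Counter)
    for product_type, action, time in actions:
        if time <= until:
            counts[product_type][action] += 1

    bubbles = {}
    for product_type, per_action in counts.items():
        total = sum(per_action.values())
        bubbles[product_type] = {
            "size": total,  # drives the bubble size
            "segments": {a: n / total for a, n in per_action.items()},
        }
    return bubbles
```

Calling the function without a cut-off (for example with the latest timestamp) would yield the transparent "whole" from point 4.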

TreeNet

  1. The size of each Treemap represents the number of user activities in a certain part of the portal.
  2. Each Treemap is divided according to the selected user activities.
  3. The edges between the individual maps represent the users who appear in both maps. This makes it possible to see how much activity takes place between two parts of the portal.
  4. These edges can also be divided with respect to certain variables (e.g. profiled and unprofiled users).
  5. Zooming into the depth of one Treemap or edge is provided, shrinking the context and enlarging the desired section.
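The edges of the TreeNet, i.e. the users who appear in both maps, amount to a set intersection per pair of portal parts. The following sketch assumes the actions are given as `(user_id, part)` pairs; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_user_edges(actions):
    """Return {(part_a, part_b): number of users active in both parts}."""
    users_by_part = defaultdict(set)
    for user_id, part in actions:
        users_by_part[part].add(user_id)

    edges = {}
    # Every unordered pair of parts gets one edge, keyed in sorted order.
    for a, b in combinations(sorted(users_by_part), 2):
        edges[(a, b)] = len(users_by_part[a] & users_by_part[b])
    return edges
```

Dividing an edge by a further variable (point 4) would mean computing the same intersection separately for each group of users.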

Parallel Coordinates

The following attributes should be a fixed part of the chart:

  1. Product Type
  2. User Action
  3. possibly: time of day (0-24)

The following attributes can additionally be added to the chart:

  1. Cluster
  2. Age of User
  3. Asset Type
  4. ...

It is very important that some attributes exclude each other (for example, assets and tracks are disjoint groups). This exclusion does not follow from the structure of the data but is defined explicitly.

It must be possible to color the data according to certain constraints:

  1. The profiled and unprofiled user,
  2. the timespan of the actions,
  3. ...

Additionally:

  1. Brushing
  2. Linking

(multidimensional) Scatterplot

  1. All variables can be assigned to an axis.
  2. A maximum of 3 variables can be shown at the same time.
  3. When more than 3 variables are selected, a subset of these variables is used to construct the scatterplot. The plot iterates over time and continuously exchanges variables so that all selected variables are shown.
  4. To make the graphic more readable, the user can add color with respect to certain variables.
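The rotation through subsets of at most 3 variables could be sketched as follows. The design does not say in which order the subsets are exchanged; cycling through all 3-element combinations is an assumption of this sketch, and `axis_rotation` is a hypothetical name.

```python
from itertools import combinations, cycle, islice


def axis_rotation(variables, max_axes=3):
    """Return an endless iterator over axis assignments.

    With up to `max_axes` variables there is a single assignment; with more,
    the iterator cycles through every `max_axes`-element combination.
    """
    if len(variables) <= max_axes:
        subsets = [tuple(variables)]
    else:
        subsets = list(combinations(variables, max_axes))
    return cycle(subsets)


# Example: the first five frames for four selected variables.
frames = list(islice(axis_rotation(["cluster", "age", "hour", "asset"]), 5))
```

With four variables there are four combinations, so the fifth frame repeats the first.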

Morphing / Dynamic Queries

  • Specify the start and end point.
  • Hit PLAY to let the morphing run within the assigned timespan.
  • Hit STOP to reset the playhead to the starting time.
  • Hit PAUSE to halt the playhead at any time.
  • Drag and drop the playhead to jump to a specific time.
  • Switch between an absolute and a relative time axis.
  • The start and end point also define the span for the dynamic queries applied to the active (selected) methods.
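The controls listed above can be summarized in a small state model. This is a sketch under assumptions, not the prototype's code: time is treated as a plain number, and the class `Playhead` with its methods `tick` and `in_span` is hypothetical.

```python
class Playhead:
    """Timeline playhead with start/end bounds and PLAY, PAUSE, STOP."""

    def __init__(self, start, end, speed=1.0):
        if end < start:
            raise ValueError("end must not be before start")
        self.start, self.end, self.speed = start, end, speed
        self.position = start
        self.playing = False

    def play(self):
        self.playing = True

    def pause(self):
        # PAUSE keeps the current position.
        self.playing = False

    def stop(self):
        # STOP resets the playhead to the starting time.
        self.playing = False
        self.position = self.start

    def seek(self, time):
        # Drag and drop: jump to a time, clamped to the assigned span.
        self.position = min(max(time, self.start), self.end)

    def tick(self, elapsed):
        """Advance by `elapsed` time units while playing; halt at the end."""
        if self.playing:
            self.seek(self.position + elapsed * self.speed)
            if self.position >= self.end:
                self.playing = False

    def in_span(self, actions):
        """Dynamic query: keep (time, payload) pairs within [start, end]."""
        return [a for a in actions if self.start <= a[0] <= self.end]
```

The `speed` parameter corresponds to the speed control mentioned in the event log; a view would redraw from `position` after each `tick`.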

Description of Used Techniques

(see above)

Possibilities of Interaction

  1. At any stage of the charts, the timeline can be used to focus (dynamic queries).
  2. Toggle the on/off checkboxes at the left to include or exclude certain user actions (Add / Remove Detail).
  3. Assign colors to charts to represent special groups (e.g. profiled and unprofiled users).
  4. Assign different symbols (where possible) instead of colors.
  5. Rotate through subsamples of variables when more than 3 are selected (Scatterplot).
  6. Zoom into a specific bubble down to a single user action (Zooming).
  7. Provide the data of a single user on demand and show it in the detail section (Details on Demand).
  8. When switching between methods, apply the selection automatically to the new method (Linking).
  9. When a specific timespan is selected, apply it in all method screens (Linking).
  10. Select different lines in the Parallel Coordinate Plot and keep the selection visible in the other plots (Brushing).
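Linking and Brushing (points 8 to 10) both require that a selection made in one view reaches all other views. One common way to do this is a shared selection object that notifies its subscribers; the sketch below shows the idea with hypothetical names and makes no claim about how the prototype implements it.

```python
class SharedSelection:
    """Holds the currently selected record ids and notifies all views."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    @property
    def selected(self):
        return self._selected

    def subscribe(self, listener):
        """Register a view callback and hand it the current selection."""
        self._listeners.append(listener)
        listener(self._selected)

    def select(self, ids):
        """Replace the selection (e.g. after brushing) and update every view."""
        self._selected = frozenset(ids)
        for listener in self._listeners:
            listener(self._selected)
```

Because a newly subscribed view immediately receives the current selection, switching to another method shows the same selection without further action.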

Mockups / Fake Screenshots




Prototype

http://www.sunarts.at/Studium/InfoVis/MICE.html

Event Log

  • 25.11.2005 - introducing the playhead with speed control and PLAY, PAUSE and STOP buttons
  • 23.11.2005 - introducing the timeline with the dynamic left and right boundaries
  • 22.11.2005 - placement of first version of MICE tool
