Teaching:TUW - UE InfoVis WS 2005/06 - Gruppe G8 - Aufgabe 3


Topic

On-line Music Portals: Analyzing the Users' Activities

Area of Application

This section describes the application area of
MICE - Music Investigation and Clustering Environment.


[Stefan Schnabl, 2005]

Analysis of Application Area

General Description

The dataset contains data from an on-line music portal:
http://www.ericsson-mediasuite.com/music/web2/dyn/home

The main aim is to extract information about the activity of the users in the different areas of the portal. This may help to form hypotheses about how the portal could be optimized or restructured.
If a significant correlation between certain aspects can be found, this could be of interest to the portal owners and to the technicians who built the portal.

Special Issues

We identified the following spots of interest:

  1. The user of MICE has different intentions than the persons the data is about.
  2. Market basket analysis and cluster analysis are heavily statistical techniques that require considerable background knowledge. Hence it is not easy to derive information from their results.
  3. The questions to be solved gain complexity once they are answered, because each answer leads to follow-up questions (report chain).
  4. The answer will not be a single "result" but an iterative process.
  5. The multidimensionality of the data makes it hard to realize an abstract graphical view of it.
  6. Each action happens at a single point in time, so there is practically no data flow (because the number of simultaneous actions is zero or one).
  7. The data therefore needs aggregation and is only representable as actions within a time interval.
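
Points 6 and 7 can be illustrated with a minimal sketch of such an aggregation. It assumes that the actionTime values are already available as Python datetime objects; the helper names and the sample timestamps are hypothetical and not part of the dataset.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, width):
    """Round a timestamp down to the start of its interval."""
    epoch = datetime(1970, 1, 1)
    steps = (ts - epoch) // width  # whole intervals since the epoch
    return epoch + steps * width


def count_per_interval(timestamps, width=timedelta(hours=1)):
    """Count how many actions fall into each interval of the given width."""
    return Counter(bucket_start(ts, width) for ts in timestamps)


# Hypothetical sample timestamps (not taken from the dataset).
action_times = [
    datetime(2005, 11, 22, 10, 5),
    datetime(2005, 11, 22, 10, 40),
    datetime(2005, 11, 22, 11, 15),
]
counts = count_per_interval(action_times)
```

Single actions carry no magnitude of their own; only the count per interval gives a value that can be plotted.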

Analysis of the Dataset

The data is organized in several tables which refer to each other by Item Ids. The following tables are available:

- Album: Represents all available albums.
- Artist: Contains information about the known artists.
- Track: References between Album and Artist.
- Asset: Information concerning the products which are sold.
- Playlist: Contains information about the playlists created by the users of the on-line portal.
- Playlist Item: Associative reference between Track and Playlist.
- User: Information concerning the users of the on-line portal. Referenced to Playlist.
- User Action: Information about the operations a user executes within the on-line portal. Referenced to User and to Track, Asset, or Artist.

The user behaviour is recorded in the tables User and User Action. All other information is supplied by the companies distributing the content.

Description of the Datatypes

The tables consist of the following Data items:

Album
Field Name        Datatype                     Key    Default value        Constraints
artistId          String (255 Bytes)           [key]  NULL
homeClusters      String (255 Bytes)                  NULL
creationDate      DateTime                            NULL
albumId           Integer (20 Bytes)                  0                    NOT NULL
title             String (255 Bytes)           [key]  NULL
Primary Key: albumId

Artist
Field Name        Datatype                     Key    Default value        Constraints
creationDate      DateTime                            NULL
defaultAssetId    String (255 Bytes)                  NULL
firstName         String (255 Bytes)                  NULL
artistId          Integer (20 Bytes)                  0                    NOT NULL
lastName          String (255 Bytes)           [key]  NULL
homeClusters      String (255 Bytes)                  NULL
Primary Key: artistId

Track
Field Name        Datatype                     Key    Default value        Constraints
artistId          Integer (20 Bytes)                  NULL
creationDate      DateTime                     [key]  NULL
trackId           Integer (20 Bytes)           [key]  0                    NOT NULL
title             String (255 Bytes)                  NULL
ISRC              String (255 Bytes)                  NULL
homeCluster       String (255 Bytes)                  NULL
albumId           String (255 Bytes)           [key]  NULL
Primary Key: trackId

Asset
Field Name        Datatype                     Key    Default value        Constraints
artistId          Integer (20 Bytes)           [key]  NULL
assetType         String (255 Bytes)                  NULL
creationDate      DateTime                            NULL
assetId           Integer (20 Bytes)                  0                    NOT NULL
trackId           Integer (20 Bytes)           [key]  NULL
Primary Key: assetId

Playlist
Field Name        Datatype                     Key    Default value        Constraints
claim             String (255 Bytes)                  NULL
creationDate      DateTime                            NULL
playlistId        Integer (20 Bytes)                  0                    NOT NULL
playListName      String (255 Bytes)                  NULL
userId            Integer (20 Bytes)           [key]  NULL
Primary Key: playlistId

Playlist Item
Field Name        Datatype                     Key    Default value        Constraints
trackId           Integer (20 Bytes)           [key]  NULL
playlistId        Integer (20 Bytes)           [key]  NULL

User
Field Name        Datatype                     Key    Default value        Constraints
handSetId         String (255 Bytes)                  NULL
userId            Integer (20 Bytes)                                       NOT NULL, auto increment
language          String (255 Bytes)                  NULL
lastLoginDate     DateTime                            NULL
registrationDate  DateTime                            NULL
gender            Character                           NULL
yearOfBirth       unsigned Integer (10 Bytes)         NULL
genrePref         String (250 Bytes)                  NULL
favouredArtists   String (no size specified)
Primary Key: userId

User Action
Field Name        Datatype                     Key    Default value        Constraints
context           Integer (11 Bytes)                  -1                   NOT NULL
userId            Integer (20 Bytes)           [key]  0                    NOT NULL
productId         Integer (20 Bytes)           [key]  0                    NOT NULL
actionTime        TimeStamp                    [key]  0000-00-00 00:00:00  NOT NULL
action            String (20 Bytes)                   <empty>              NOT NULL
productType       String (20 Bytes)                   <empty>              NOT NULL

Description of the Datastructures

The tables relate to each other in the following way: The artist entity can be considered the central point of the structure. It may be linked to zero or more assets, one or more albums, one or more tracks, and zero or more user actions. Tracks and assets can also be linked to several user actions. Since a user action is always executed by a user, there has to be a user entity, which can be linked to several user actions. Further, playlists can be created by a user, which results in a 1:n reference between the user and the playlist entity. The reference between a playlist and the tracks it contains, which would otherwise be an m:n reference, is split up by the playlist item entity. Finally, tracks can be linked to one or more clusters, and a cluster may be related to 'super clusters', which are also contained in the cluster entity.

The symbols in the diagram are to be interpreted as follows: a connector between the entities A and B represents a 1:n reference, in which one item of entity B relates to several items of entity A. Further, the minimum number of references may be specified with a small circle or a short line, which stand for 0 and 1 respectively. A circle (0) means that the reference pointer may be set to NULL, whereas a short line (1) means that there has to be at least one reference.
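
The 1:n and m:n references described in this section can be illustrated with plain lookups. The following is a minimal sketch and not the portal's actual code: the sample rows and the helper `index_many` are hypothetical; only the field names playlistId, trackId, userId and playListName are taken from the tables above.

```python
from collections import defaultdict

# Hypothetical sample rows; only the field names come from the schema.
playlists = [
    {"playlistId": 1, "userId": 7, "playListName": "Morning"},
    {"playlistId": 2, "userId": 7, "playListName": "Evening"},
]
playlist_items = [
    {"playlistId": 1, "trackId": 100},
    {"playlistId": 1, "trackId": 101},
    {"playlistId": 2, "trackId": 100},
]


def index_many(rows, key, value):
    """Group the `value` field of each row by its `key` field."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row[value])
    return dict(index)


# 1:n reference: one user owns several playlists.
playlists_by_user = index_many(playlists, "userId", "playlistId")

# m:n reference, resolved in both directions through Playlist Item.
tracks_by_playlist = index_many(playlist_items, "playlistId", "trackId")
playlists_by_track = index_many(playlist_items, "trackId", "playlistId")
```

Because Playlist Item holds one row per (playlist, track) pair, the same table answers both "which tracks are in this playlist" and "which playlists contain this track".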

Target Group

Identifying the Target

The target group for the exploration tool consists mainly of shopping analysts and web developers.
The aim is to suit the needs of the persons who run the on-line portal (since they are interested in optimization). On the other hand, the technical aspects must not be forgotten, so the tool should also help the developers of such systems.

Special Issues of the Target Group

  1. Shop Owners: Extremely business focused.
  2. Web Developers: Extremely technically focused.
  3. Both groups may forget the users' need to "feel good".

Known Solutions / Methods (related to the target group)

  1. (Statistical) data mining methods (e.g. shopping cart analysis, Principal Components Analysis, Cluster Analysis)
  2. SAS Enterprise Miner
  3. SPSS Statistical Investigation Tool

All of these methods are heavily statistically oriented.

Intended Purpose

Goals and Objectives

  1. Find correlations between shopping habits and other activities of the users.
  2. Find evidence of differences in user activity between profiled and unprofiled users.
  3. Link different visualization techniques.
  4. Zoom and filter into more detail.

Problems and Tasks to Solve

  1. Give a good overview without cutting away too many paths into the depth of the data.
  2. Clear mapping of variables.
  3. Clarify what the user can expect when using certain buttons and methods.
  4. Clear structure of the methods.
  5. Consistency in the usage of buttons and colors.
  6. Good guidance about what to do with the results of a step of a method.
  7. Keep focused on the mantra: Overview First - Zoom and Filter - Details on Demand!

Example Questions

  1. In which time spans was there a lot of activity?
  2. What do the buyers do before or after they buy?
  3. How many people visit the different parts of the portal?
  4. Are VIEW and BUY correlated?
  5. Are there impulse buyers?
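
Question 4 can also be checked numerically once the actions have been counted per time interval. The following is a minimal sketch that computes the Pearson correlation coefficient by hand; the hourly counts are invented for illustration, and the function name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long count series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented hourly counts of VIEW and BUY actions (not from the dataset).
views_per_hour = [12, 30, 45, 20, 8]
buys_per_hour = [1, 4, 6, 2, 0]
r = pearson(views_per_hour, buys_per_hour)
```

A value of r close to 1 would indicate that hours with many VIEW actions also tend to have many BUY actions; it says nothing about whether the same users are responsible for both.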

Proposed Design

Types of Visualization Applied

  1. Stacked Barplot: The actions in an interval (e.g. 1 hour) are counted and represented as stacked bars. Moreover, the corresponding parts of neighbouring bars are connected with lines to show the development from one interval to the next.
  2. (multidimensional) Scatterplot: A scatterplot that helps to understand (maybe in more detail) the relations between some variables.
  3. Morphing (timeline): The datasets are timestamped and can thus be used to morph the plots over time. This makes it possible to see whether there are special occurrences (maybe heavy advertisement leads to more activity).

Visual Mapping

The basic data represented are user actions. The essential table should be enhanced with a column indicating whether the user is a profiled user or not.
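
Such a column could be derived as in the following minimal sketch. Note that the source does not define what makes a user "profiled"; the sketch assumes, purely for illustration, that a user counts as profiled as soon as one of the optional User fields gender, yearOfBirth, genrePref or favouredArtists is filled in. The function names and the sample rows are hypothetical.

```python
# ASSUMPTION: a user is "profiled" if any of these optional fields is set.
PROFILE_FIELDS = ("gender", "yearOfBirth", "genrePref", "favouredArtists")


def is_profiled(user):
    """True if at least one profile field holds a non-empty value."""
    return any(user.get(field) not in (None, "") for field in PROFILE_FIELDS)


def add_profiled_column(actions, users_by_id):
    """Return copies of the user actions with an extra 'profiled' flag."""
    enriched = []
    for action in actions:
        user = users_by_id.get(action["userId"], {})
        enriched.append({**action, "profiled": is_profiled(user)})
    return enriched


# Hypothetical sample rows (not taken from the dataset).
users_by_id = {
    1: {"userId": 1, "gender": "f", "yearOfBirth": 1980, "genrePref": "Rock"},
    2: {"userId": 2, "gender": None, "yearOfBirth": None, "genrePref": None},
}
actions = [
    {"userId": 1, "action": "VIEW", "productId": 100},
    {"userId": 2, "action": "BUY", "productId": 100},
]
rows = add_profiled_column(actions, users_by_id)
```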

Stacked Barplot

  1. Bars represent the actions in a time interval (e.g. one hour).
  2. The space between the bars leaves room for drawing lines that give an idea of the trend.
  3. Difference in colour represents the specific action (e.g. green = VIEW, red = BUY, etc.)
  4. Actions are stacked on top of each other, but in a fixed order (VIEW before BUY before RATE etc.)
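
The mapping above can be sketched as a small data preparation step that produces, for every hour, the bottom and height of each bar segment. This is only an illustration under assumptions: the action names VIEW, BUY and RATE are the ones mentioned in the list, any further actions are omitted, and the function name and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Fixed stacking order as described above; further actions are omitted here.
STACK_ORDER = ["VIEW", "BUY", "RATE"]


def stacked_segments(actions):
    """Map each hour to a list of (action, bottom, height) bar segments.

    `actions` is an iterable of (actionTime, action) pairs.
    """
    counts = defaultdict(lambda: dict.fromkeys(STACK_ORDER, 0))
    for timestamp, action in actions:
        if action in STACK_ORDER:
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[hour][action] += 1

    bars = {}
    for hour in sorted(counts):
        bottom = 0
        segments = []
        for action in STACK_ORDER:  # same order in every bar
            height = counts[hour][action]
            segments.append((action, bottom, height))
            bottom += height
        bars[hour] = segments
    return bars


# Hypothetical sample records (not taken from the dataset).
sample_actions = [
    (datetime(2005, 11, 22, 10, 5), "VIEW"),
    (datetime(2005, 11, 22, 10, 20), "VIEW"),
    (datetime(2005, 11, 22, 10, 30), "BUY"),
    (datetime(2005, 11, 22, 11, 10), "VIEW"),
    (datetime(2005, 11, 22, 11, 50), "RATE"),
]
bars = stacked_segments(sample_actions)
```

The trend lines between two neighbouring bars can then be drawn by connecting the top edge (bottom + height) of the same action's segment in both bars.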


Mosaicplot

Represents count data of several factors, each with several levels, as a plot in which the tile areas represent the cell values of the underlying contingency table. This gives a graphical test of whether the factors are independent of each other.
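
The numeric counterpart of this graphical test is Pearson's chi-square statistic on the same contingency table. The following is a minimal sketch for two factors; the factor levels and counts are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def contingency(pairs):
    """Build a two-way contingency table from (row_level, col_level) pairs."""
    cells = Counter(pairs)
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    return rows, cols, cells


def chi_square(pairs):
    """Pearson chi-square statistic; 0 means the cell counts match independence."""
    rows, cols, cells = contingency(pairs)
    total = sum(cells.values())
    row_sum = {r: sum(cells[(r, c)] for c in cols) for r in rows}
    col_sum = {c: sum(cells[(r, c)] for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Count expected in this cell if the two factors were independent.
            expected = row_sum[r] * col_sum[c] / total
            statistic += (cells[(r, c)] - expected) ** 2 / expected
    return statistic


# Invented observations of (gender, action); not taken from the dataset.
independent = [("f", "VIEW")] * 30 + [("f", "BUY")] * 10 \
    + [("m", "VIEW")] * 30 + [("m", "BUY")] * 10
dependent = [("f", "VIEW")] * 20 + [("f", "BUY")] * 20 + [("m", "VIEW")] * 40
```

In a mosaic plot the first sample would show tiles that line up across both genders, whereas the second would show clearly shifted tile boundaries; the statistic grows with that shift.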


(multidimensional) Scatterplot

  1. All variables can be assigned to an axis.
  2. There is a maximum of 3 variables at the same time.
  3. When more than 3 variables are selected, a subset of these variables is used to construct the scatterplot. The plot then iterates over time and exchanges variables continuously until all selected variables have been shown.
  4. To make the graphic more readable, the user can add color with respect to certain variables.
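
The rotation described in point 3 can be sketched with the standard library. The order in which the subsets appear is an assumption (here simply the order produced by `itertools.combinations`), and the function name is hypothetical; the variable names in the example are fields from the User and User Action tables.

```python
from itertools import combinations, cycle, islice

MAX_AXES = 3  # at most three variables are shown at the same time


def axis_rotation(selected):
    """Endlessly yield the variable subsets to assign to the plot axes."""
    if len(selected) <= MAX_AXES:
        return cycle([tuple(selected)])
    # ASSUMPTION: subsets are visited in the order combinations() yields them.
    return cycle(combinations(selected, MAX_AXES))


selected = ["context", "yearOfBirth", "actionTime", "productId"]
frames = list(islice(axis_rotation(selected), 5))
```

With four selected variables there are four different subsets of three, so the fifth frame shows the first subset again.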

Morphing / Dynamic Queries

  • Specify the start and end point.
  • Hit PLAY to let the morphing run within the assigned time span.
  • Hit STOP to reset the playhead to the starting time.
  • Hit PAUSE to stop the playhead at any time.
  • Drag and drop the playhead to jump to a specific time.
  • Switch between an absolute and a relative time axis.
  • The start and end point also define the time span for the dynamic queries applied to the active (selected) methods.
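
The controls listed above can be sketched as a small state object. This is an illustration only and not the code of the prototype; the class name, method names and the sample span are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Playhead:
    """Position marker that moves between a start and an end point."""
    start: datetime
    end: datetime
    position: Optional[datetime] = None
    playing: bool = False

    def __post_init__(self):
        if self.position is None:
            self.position = self.start

    def play(self):
        self.playing = True

    def pause(self):
        # PAUSE freezes the playhead where it is.
        self.playing = False

    def stop(self):
        # STOP resets the playhead to the starting time.
        self.playing = False
        self.position = self.start

    def seek(self, when):
        # Drag and drop: jump to a time, clamped to the chosen span.
        self.position = min(max(when, self.start), self.end)

    def tick(self, step):
        # Advance one animation step while playing; halt at the end point.
        if self.playing:
            self.seek(self.position + step)
            if self.position >= self.end:
                self.playing = False


def dynamic_query(actions, start, end):
    """Keep only the (actionTime, action) records inside [start, end]."""
    return [a for a in actions if start <= a[0] <= end]


head = Playhead(start=datetime(2005, 11, 22, 0), end=datetime(2005, 11, 22, 6))
head.play()
head.tick(timedelta(hours=2))  # playhead moves forward by two hours
head.pause()
head.tick(timedelta(hours=2))  # ignored while paused
```

Keeping the span and the playhead in one object means that the same start and end values can drive both the animation and the dynamic query filter.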

Time

  • The time is represented interactively by the timeline in the upper part.
  • Additionally, the current time and date are represented by an optional calendar / clock object.

Description of Used Techniques

(see above)

Possibilities of Interaction

  1. At any stage of the charts one can use the timeline to focus (dynamic queries).
  2. Tick or untick the checkboxes at the left to include or exclude certain user actions (Add / Remove Detail).
  3. Assign colors to charts to represent special groups (e.g. profiled and unprofiled users).
  4. Assign different symbols (where possible) instead of colors.
  5. Rotation through subsets of the variables when more than 3 are selected (Scatterplot).
  6. Zoom into a specific bubble, down to a single user action (Zooming).
  7. Provide the data of a single user on demand and show it in the detail section (Details on Demand).
  8. When switching between the methods, apply the selection automatically to the new method (Linking).
  9. When a specific time span is selected, apply it in all method screens (Linking).
  10. Select different lines in the Parallel Coordinate Plot and keep the selection visible in the other plots (Brushing).
  11. When more than one visualization is selected, one can click and hold to enlarge the specified technique (Zooming). The others move to the background but stay visible.
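
Linking and brushing (points 8 to 10) are commonly built on a shared selection that notifies every view. The following is a minimal sketch of that idea and not the prototype's implementation; the class names and the numeric ids of the user actions are hypothetical.

```python
class SharedSelection:
    """Holds the brushed user actions and notifies every linked view."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, action_ids):
        self._selected = frozenset(action_ids)
        for callback in self._listeners:
            callback(self._selected)


class View:
    """Stand-in for one visualization; it only stores what to highlight."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = frozenset()
        selection.subscribe(self._on_selection)

    def _on_selection(self, action_ids):
        self.highlighted = action_ids


selection = SharedSelection()
barplot = View("Stacked Barplot", selection)
scatterplot = View("Scatterplot", selection)

# Brushing in any one view updates all linked views.
selection.select({3, 5, 8})
```

Because the views never talk to each other directly, further techniques can be linked by subscribing them to the same selection.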

Mockups / Fake Screenshots




Prototype

http://www.sunarts.at/Studium/InfoVis/MICE.html


Event Log

  • 28.11.2005 - Visualisation techniques are now selectable NEXT to one another (switches, no tables).
  • 25.11.2005 - introducing the playhead with speed control and PLAY, PAUSE and STOP buttons
  • 23.11.2005 - introducing the timeline with the dynamic left and right boundaries
  • 22.11.2005 - placement of first version of MICE tool
  • 20.12.2005 - final prototype released
  • 20.12.2005 - short description online


