Teaching:TUW - UE InfoVis WS 2005/06 - Gruppe G8 - Aufgabe 3
Topic
On-line Music Portals: Analyzing the Users' Activities
Area of Application
This section describes the application area of
MICE - Music Investigation and Clustering Environment.
Analysis of Application Area
General Description
The dataset contains data from one particular on-line music portal:
http://www.ericsson-mediasuite.com/music/web2/dyn/home
The main aim is to extract information about the users' activity in the individual areas of the portal. This may help to form hypotheses about how the portal can be optimized or restructured.
If a significant correlation between certain aspects can be found, this could be of interest to the portal owners and to the technicians who built the portal.
Special Issues
We identified the following spots of interest:
- The user of MICE has different intentions than the persons the data is about.
- Market basket analysis and cluster analysis are heavily statistical techniques that require considerable background knowledge. Hence it is not easy to derive information from their results.
- The questions to be solved will gain complexity once they are answered (report chain).
- The answer will not be a single "result" but an iterative process.
- Realizing an abstract graphical view of the data is a demanding task because of its multidimensionality.
- Each action happens at a single point in time, therefore there is practically no data flow (because the number of simultaneous actions is zero or one).
- The data needs aggregation and can only be represented as actions within a time interval.
Analysis of the Dataset
The data is organized in several tables which refer to each other by Item Ids. The following tables are available:
- Album | Represents all albums available. |
- Artist | Contains information about the known artists. |
- Track | References between Album and Artist. |
- Asset | Information concerning the products which are sold. |
- Playlist | Contains information about the playlists created by the users of the on-line portal. |
- Playlist Item | Associative reference between Track and Playlist. |
- User | Information concerning the users of the on-line portal. Referenced to Playlist. |
- User action | Data containing information about the operation a user executes within the on-line portal. Referenced to User and Track, Asset, or Artist. |
User behaviour is captured in the tables User and User action. All other information is supplied by the companies distributing the content.
Description of the Datatypes
The tables consist of the following Data items:
Album
Field Name | Datatype | Default value | Constraints |
artistId | String (255 Bytes) [key] | NULL | |
homeClusters | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
albumId | Integer (20 Bytes) | 0 | not NULL |
title | String (255 Bytes) [key] | NULL | |
albumId | Primary Key |
Artist
Field Name | Datatype | Default value | Constraints |
creationDate | DateTime | NULL | |
defaultAssetId | String (255 Bytes) | NULL | |
firstName | String (255 Bytes) | NULL | |
artistId | Integer (20 Bytes) | 0 | not NULL |
lastName | String (255 Bytes) [key] | NULL | |
homeClusters | String (255 Bytes) | NULL | |
artistId | Primary Key |
Track
Field Name | Datatype | Default value | Constraints |
artistId | Integer (20 Bytes) | NULL | |
creationDate | DateTime [key] | NULL | |
trackId | Integer (20 Bytes) [key] | 0 | not NULL |
title | String (255 Bytes) | NULL | |
ISRC | String (255 Bytes) | NULL | |
homeCluster | String (255 Bytes) | NULL | |
albumId | String (255 Bytes) [key] | NULL | |
trackId | Primary Key |
Asset
Field Name | Datatype | Default value | Constraints |
artistId | Integer (20 Bytes) [key] | NULL | |
assetType | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
assetId | Integer (20 Bytes) | 0 | not NULL |
trackId | Integer (20 Bytes) [key] | NULL | |
assetId | Primary Key |
Playlist
Field Name | Datatype | Default value | Constraints |
claim | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
playlistId | Integer (20 Bytes) | 0 | not NULL |
playListName | String (255 Bytes) | NULL | |
userId | Integer (20 Bytes) [key] | NULL | |
playlistId | Primary Key |
Playlist Item
Field Name | Datatype | Default value | Constraints |
trackId | Integer (20 Bytes) [key] | NULL | |
playlistId | Integer (20 Bytes) [key] | NULL | |
User
Field Name | Datatype | Default value | Constraints |
handSetId | String (255 Bytes) | NULL | |
userId | Integer (20 Bytes) | | NOT NULL, auto increment |
language | String (255 Bytes) | NULL | |
lastLoginDate | DateTime | NULL | |
registrationDate | DateTime | NULL | |
gender | Character | NULL | |
yearOfBirth | unsigned Integer (10 Bytes) | NULL | |
genrePref | String (250 Bytes) | NULL | |
favouredArtists | String (no size specified) | | |
userId | Primary Key |
User Action
Field Name | Datatype | Default value | Constraints |
context | Integer (11 Bytes) | -1 | NOT NULL |
userId | Integer (20 Bytes) [key] | 0 | NOT NULL |
productId | Integer (20 Bytes) [key] | 0 | NOT NULL |
actionTime | TimeStamp [key] | 0000-00-00 00:00:00 | NOT NULL |
action | String (20 Bytes) | <empty> | NOT NULL |
productType | String (20 Bytes) | <empty> | NOT NULL |
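As a rough illustration of the User Action table, one row could be modelled as shown here. This is only a minimal sketch: the class name `UserAction`, the helper `parse_row`, the dict-based input format, and the example values are assumptions made for illustration. Only the field names and the default value of `context` are taken from the table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserAction:
    """One row of the User Action table (field names as in the table)."""
    context: int
    userId: int
    productId: int
    actionTime: datetime
    action: str        # e.g. "VIEW" or "BUY"
    productType: str   # assumed to name the referenced table, e.g. "Track"


def parse_row(row: dict) -> UserAction:
    """Convert a dict of strings (hypothetical export format) into a UserAction."""
    return UserAction(
        context=int(row.get("context", -1)),  # -1 is the documented default
        userId=int(row["userId"]),
        productId=int(row["productId"]),
        actionTime=datetime.strptime(row["actionTime"], "%Y-%m-%d %H:%M:%S"),
        action=row["action"],
        productType=row["productType"],
    )


# Made-up example row, purely illustrative.
example = parse_row({
    "userId": "7",
    "productId": "42",
    "actionTime": "2005-11-22 14:05:00",
    "action": "VIEW",
    "productType": "Track",
})
```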
Description of the Datastructures
The tables relate to each other as follows. The artist entity can be considered the central point of the structure: an artist may be referenced by zero or more assets, one or more albums, one or more tracks, and zero or more user actions. Tracks and assets can also be referenced by several user actions. Because a user action is always executed by a user, there is a user entity that can be referenced by several user actions. Furthermore, playlists are created by users, which results in a 1:n reference between the user and the playlist entity. The reference between a playlist and the tracks it contains, which would otherwise be an m:n reference, is resolved by the playlist item entity. Finally, tracks can be assigned to one or more clusters, and a cluster may be related to 'super clusters', which are also contained in the cluster entity.
The symbols are to be interpreted as follows: the connector between the entities A and B represents a 1:n reference, in which one item of entity B is referred to by several items of entity A. In addition, the minimum number of references may be specified with a small circle or a short line, which stand for 0 and 1 respectively. A circle (0) means that the reference may be NULL, whereas a short line (1) means that there has to be at least one reference.
Target Group
Identifying the Target
The exploration tool is aimed mainly at shopping analysts and web developers.
The aim is to suit the needs of the persons who run the on-line portal (since they are interested in optimization). At the same time, the technical aspects must not be forgotten, so that the tool also helps the developers of such systems.
Special Issues of the Target Group
- Shop Owners: Extremely business focused.
- Web Developers: Extremely technically focused.
- Risk of forgetting the users' need to "feel good".
- (Statistical) data mining methods (e.g. shopping cart analysis, Principal Components Analysis, Cluster Analysis)
- SAS Enterprise Miner
- SPSS Statistical Investigation Tool
These methods are all very heavily statistically oriented.
Intended Purpose
Goals and Objectives
- Find correlations between shopping habits and other activities of the users.
- Find evidence for differences in user activity between profiled and unprofiled users.
- Linking between different visualization techniques.
- Zoom and Filter into more detail.
Problems and Tasks to Solve
- Giving a good overview without cutting away too many paths into the depth of the data.
- Clear mapping of variables.
- Clarify what the user can expect from using certain buttons and methods.
- Clear structure of the methods.
- Consistency in usage of buttons and colors.
- Good guidance about what to do with the results of a step of a method.
- Keep focused on the mantra: Overview First - Zoom and Filter - Details on Demand!
Example Questions
- In which timespan was a lot of activity?
- What are the buyers doing before or after they bought?
- How many people are visiting the different parts of the portal?
- Are VIEW and BUY correlated?
- Are there impulse buyers?
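One possible way to approach the question of whether VIEW and BUY are correlated is to count both actions per time interval and compute a correlation coefficient over the two series. The sketch below uses the Pearson coefficient as one candidate measure; the function name `pearson` and the hourly counts are made up for illustration and do not come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up hourly counts, purely illustrative.
views_per_hour = [2, 4, 6, 8]
buys_per_hour = [1, 2, 3, 4]
r = pearson(views_per_hour, buys_per_hour)
```

A value of `r` close to 1 or -1 would hint at a linear relationship between the two counts; it says nothing about causation, so such a result is a starting point for further exploration rather than an answer.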
Proposed Design
Types of Visualization Applied
- Stacked Barplot: The actions in an interval (e.g. 1 hour) are counted and represented as stacked bars. In addition, the corresponding segments of neighbouring bars are connected with lines to show the development from one interval to the next.
- (multidimensional) Scatterplot: A scatterplot that helps to understand (possibly in more detail) the relations between some variables.
- Morphing (timeline): The data records are timestamped and can thus be used to morph the plots over time. This makes it possible to see whether there are special occurrences (e.g. heavy advertisement leading to more activity).
Visual Mapping
The basic data represented are user actions. The underlying table should be extended with a column indicating whether the user is a profiled user (or not).
Stacked Barplot
- Bars represent actions in a time interval (e.g. one hour).
- The space between bars leaves room for drawing lines that give an idea of the trend.
- Difference in colour represents the specific action (e.g. green = VIEW, red = BUY, etc.).
- Actions are stacked on top of each other in a fixed order (VIEW before BUY before RATE etc.).
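The aggregation behind the stacked barplot can be sketched as follows: timestamped actions are counted per interval, and each bar is split into segments in a fixed order. This is a minimal sketch, not the prototype's implementation; the names `bin_actions`, `stack_segments` and `ACTION_ORDER`, the `(timestamp, action)` input format, and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Fixed stacking order; only VIEW, BUY and RATE are named in the text.
ACTION_ORDER = ["VIEW", "BUY", "RATE"]


def bin_actions(actions, start, interval=timedelta(hours=1)):
    """Count actions per interval.

    `actions` is an iterable of (timestamp, action_name) pairs.
    Returns {interval_index: Counter({action_name: count})}.
    """
    bins = defaultdict(Counter)
    for ts, name in actions:
        index = (ts - start) // interval  # timedelta // timedelta gives an int
        bins[index][name] += 1
    return bins


def stack_segments(counter):
    """Return (action, bottom, top) segments of one bar in the fixed order."""
    segments, bottom = [], 0
    for name in ACTION_ORDER:
        count = counter.get(name, 0)
        segments.append((name, bottom, bottom + count))
        bottom += count
    return segments


# Made-up sample data, purely illustrative.
start = datetime(2005, 11, 22, 14, 0)
sample = [
    (datetime(2005, 11, 22, 14, 5), "VIEW"),
    (datetime(2005, 11, 22, 14, 20), "VIEW"),
    (datetime(2005, 11, 22, 14, 40), "BUY"),
    (datetime(2005, 11, 22, 15, 10), "VIEW"),
]
bins = bin_actions(sample, start)
bars = {index: stack_segments(counts) for index, counts in bins.items()}
```

The trend lines between neighbouring bars could then be drawn by connecting the `top` value of the same action in bar `i` and bar `i + 1`.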
Mosaicplot
Represents count data from several factors with multiple levels as a plot in which the areas of the tiles represent the cell values of the underlying contingency table. This provides a graphical test of whether the factors are independent of each other.
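For two factors, the tile layout of a mosaic plot can be sketched as follows: the column widths follow the share of each level of the first factor, and the heights within a column follow the conditional share of the second factor. The function name `mosaic_tiles`, the alphabetical ordering of the levels, and the factor levels in the sample data are assumptions for illustration only.

```python
from collections import Counter


def mosaic_tiles(pairs):
    """Return {(a, b): (x, y, width, height)} inside the unit square.

    `pairs` is an iterable of (level_of_factor_A, level_of_factor_B).
    The area of each tile equals the relative frequency of its cell.
    """
    cells = Counter(pairs)
    total = sum(cells.values())
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    tiles = {}
    x = 0.0
    for a in a_levels:
        col_total = sum(cells[(a, b)] for b in b_levels)
        width = col_total / total
        y = 0.0
        for b in b_levels:
            height = cells[(a, b)] / col_total  # conditional share of b given a
            tiles[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Made-up counts, purely illustrative.
sample = (
    [("profiled", "VIEW")] * 3
    + [("profiled", "BUY")] * 1
    + [("unprofiled", "VIEW")] * 4
)
tiles = mosaic_tiles(sample)
```

If the two factors were independent, the tile heights would be the same in every column; visibly different heights (as in the made-up sample, where only profiled users buy) point to a dependency.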
(multidimensional) Scatterplot
- All variables can be assigned to an axis.
- A maximum of 3 variables is shown at the same time.
- When more than 3 variables are selected, a subset of these variables is used to construct the scatterplot. Over time, an iteration exchanges the variables continuously so that all selected variables are shown.
- To make the graphic more readable the user can add color with respect to certain variables.
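The rotation through subsets of variables could, for example, enumerate all combinations of three selected variables and cycle through them. This is one possible scheme under that assumption; the text does not specify the order in which variables are exchanged, and the function name `axis_rotation` is hypothetical.

```python
from itertools import combinations, cycle


def axis_rotation(variables, axes=3):
    """Endlessly yield tuples of at most `axes` variables to put on the axes.

    With `axes` or fewer variables the same single tuple is repeated.
    """
    size = min(axes, len(variables))
    return cycle(combinations(variables, size))


# Field names of the User Action table used as example variables.
selected = ["actionTime", "userId", "productId", "context"]
rotation = axis_rotation(selected)
first = next(rotation)
second = next(rotation)
```

Each call to `next(rotation)` would correspond to one step of the animation; after all combinations have been shown, the sequence starts over.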
Morphing / Dynamic Queries
- Specify start and end point.
- Hit PLAY to let the morphing run within the assigned timespan.
- Hit STOP to reset the playhead to the starting time.
- Hit PAUSE to halt the playhead at its current position.
- Drag and drop the playhead to jump to a specific time.
- Switch between an absolute and relative time-axis.
- The start and end point are also giving the span for the dynamic queries applied on the active (selected) methods.
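The playhead behaviour listed above can be sketched as a small state object, together with a filter that applies the start and end point as a dynamic query. The class name `Playhead`, the method `tick`, the helper `in_span`, and the decision to halt playback at the end point are assumptions for illustration, not a description of the prototype.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Playhead:
    start: datetime
    end: datetime
    position: Optional[datetime] = None
    playing: bool = False

    def __post_init__(self):
        if self.position is None:
            self.position = self.start

    def play(self):
        self.playing = True

    def pause(self):
        # Halt at the current position.
        self.playing = False

    def stop(self):
        # Halt and jump back to the starting time.
        self.playing = False
        self.position = self.start

    def seek(self, when):
        # Drag and drop: jump to a time, clamped to the assigned span.
        self.position = min(max(when, self.start), self.end)

    def tick(self, step):
        # Advance by one animation step while playing; halt at the end point.
        if self.playing:
            self.seek(self.position + step)
            if self.position >= self.end:
                self.playing = False


def in_span(actions, start, end):
    """Dynamic query: keep only (timestamp, action) pairs inside the span."""
    return [(ts, name) for ts, name in actions if start <= ts <= end]
```

A view would call `tick` on a timer and redraw itself from `in_span(actions, head.start, head.position)` or a similar window, depending on how the morphing is meant to accumulate data.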
Time
- The time is represented interactively by the timeline in the upper part.
- Additionally, the current time and date are shown by an optional calendar / clock object.
Description of Used Techniques
(see above)
Possibilities of Interaction
- At any stage, the timeline can be used to focus the charts (dynamic queries).
- Use the on/off checkboxes at the left to include/exclude certain user actions (Add / Remove Detail).
- Assign colors to charts to represent special groups (e.g. profiled and unprofiled users).
- Assign different symbols (where possible) instead of colors.
- Rotation of subsamples of variables when selecting more than 3 (Scatterplot).
- Zoom into a specific bubble, down to a single user action (Zooming).
- Provide the data of a single user on demand and show it in the detail section (Details on Demand).
- When switching between the methods apply selection automatically to new method (Linking).
- When making a selection of a special timespan, apply it in all method screens (Linking).
- Select different lines out of the Parallel Coordinate Plot and keep selection visible in the other plots (Brushing).
- When more than one visualization is selected, one can click and hold to enlarge the specified technique (Zooming). The others move to the background but stay visible.
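Linking and brushing across the different methods could be realized with a shared selection that notifies every view when it changes. The sketch below uses a simple observer pattern as one possible design; the class names `SelectionModel` and `View` and the use of integer ids for user actions are assumptions for illustration.

```python
class SelectionModel:
    """Holds the currently selected user action ids and notifies all views."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, ids):
        self._selected = frozenset(ids)
        for callback in self._listeners:
            callback(self._selected)


class View:
    """Stand-in for one visualization method (barplot, scatterplot, ...)."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = frozenset()
        model.subscribe(self.on_selection)

    def on_selection(self, ids):
        # A real view would redraw here; the sketch only stores the ids.
        self.highlighted = ids


model = SelectionModel()
barplot = View("stacked barplot", model)
scatter = View("scatterplot", model)
model.select({3, 5})  # brushing in one view updates every linked view
```

The same mechanism would carry a selected timespan instead of ids, so that a span chosen on the timeline is applied in all method screens.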
Mockups / Fake Screenshots
Prototype
http://www.sunarts.at/Studium/InfoVis/MICE.html
Event Log
- 28.11.2005 - Visualisation techniques are now selectable NEXT to one another (switches, no tables).
- 25.11.2005 - introducing the playhead with speed control and PLAY, PAUSE and STOP buttons
- 23.11.2005 - introducing the timeline with the dynamic left and right boundaries
- 22.11.2005 - placement of first version of MICE tool
- 20.12.2005 - final prototype released
- 20.12.2005 - short description online