Teaching:TUW - UE InfoVis WS 2005/06 - Group G8 - Assignment 3
Topic
On-line Music Portals: Analyzing the Users' Activities
Area of Application
This section describes the application area of MICE (Music Investigation and Clustering Environment).
Analysis of Application Area
General Description
The application area covers shopping analysis and cluster/pattern analysis in a web-based setting; all analysed data comes from online users. Areas of application include shopping optimization, web application development and technical optimization. The tool might also be of interest to pure online stores.
Special Issues
We identified the following points of interest:
- The users of MICE have different intentions from the people the data is about.
- Market basket analysis and cluster analysis are heavily statistical techniques that demand considerable expertise.
- The questions to be answered grow in complexity once they are answered, because each answer leads to follow-up questions (report chain).
- Answering them does not produce a single "result" but is an iterative process.
- Realizing an abstract graphical view of the data is a demanding task.
- ...
Analysis of the Dataset
The data is organized in several tables which refer to each other by item IDs. The following tables are available:
Table | Description |
Album | Represents all available albums. |
Artist | Contains information about the known artists. |
Track | Contains the individual tracks; each track references its Album and Artist. |
Asset | Information about the products which are sold. |
Playlist | Contains information about the playlists created by the users of the on-line portal. |
Playlist Item | Associative table between Track and Playlist. |
User | Information about the users of the on-line portal. Referenced by Playlist. |
User action | Operations a user executes within the on-line portal. References User and one of Track, Asset, or Artist. |
User behaviour is recorded in the tables User and User action. All other information is supplied by the companies distributing the content.
Description of the Datatypes
The tables consist of the following data items:
Album
Field Name | Datatype | Default value | Constraints |
artistId | String (255 Bytes) [key] | NULL | |
homeClusters | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
albumId | Integer (20 Bytes) | 0 | not NULL |
title | String (255 Bytes) [key] | NULL | |
albumId | Primary Key |
Artist
Field Name | Datatype | Default value | Constraints |
creationDate | DateTime | NULL | |
defaultAssetId | String (255 Bytes) | NULL | |
firstName | String (255 Bytes) | NULL | |
artistId | Integer (20 Bytes) | 0 | not NULL |
lastName | String (255 Bytes) [key] | NULL | |
homeClusters | String (255 Bytes) | NULL | |
artistId | Primary Key |
Track
Field Name | Datatype | Default value | Constraints |
artistId | Integer (20 Bytes) | NULL | |
creationDate | DateTime [key] | NULL | |
trackId | Integer (20 Bytes) [key] | 0 | not NULL |
title | String (255 Bytes) | NULL | |
ISRC | String (255 Bytes) | NULL | |
homeCluster | String (255 Bytes) | NULL | |
albumId | String (255 Bytes) [key] | NULL | |
trackId | Primary Key |
Asset
Field Name | Datatype | Default value | Constraints |
artistId | Integer (20 Bytes) [key] | NULL | |
assetType | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
assetId | Integer (20 Bytes) | 0 | not NULL |
trackId | Integer (20 Bytes) [key] | NULL | |
assetId | Primary Key |
Playlist
Field Name | Datatype | Default value | Constraints |
claim | String (255 Bytes) | NULL | |
creationDate | DateTime | NULL | |
playlistId | Integer (20 Bytes) | 0 | not NULL |
playListName | String (255 Bytes) | NULL | |
userId | Integer (20 Bytes) [key] | NULL | |
playlistId | Primary Key |
Playlist Item
Field Name | Datatype | Default value | Constraints |
trackId | Integer (20 Bytes) [key] | NULL | |
playlistId | Integer (20 Bytes) [key] | NULL | |
User
Field Name | Datatype | Default value | Constraints |
handSetId | String (255 Bytes) | NULL | |
userId | Integer (20 Bytes) | auto increment | NOT NULL |
language | String (255 Bytes) | NULL | |
lastLoginDate | DateTime | NULL | |
registrationDate | DateTime | NULL | |
gender | Character | NULL | |
yearOfBirth | unsigned Integer (10 Bytes) | NULL | |
genrePref | String (250 Bytes) | NULL | |
favouredArtists | String | | |
userId | Primary Key |
User Action
Field Name | Datatype | Default value | Constraints |
context | Integer (11 Bytes) | -1 | NOT NULL |
userId | Integer (20 Bytes) [key] | 0 | NOT NULL |
productId | Integer (20 Bytes) [key] | 0 | NOT NULL |
actionTime | TimeStamp [key] | 0000-00-00 00:00:00 | NOT NULL |
action | String (20 Bytes) | <empty> | NOT NULL |
productType | String (20 Bytes) | <empty> | NOT NULL |
Description of the Datastructures
The tables are related as follows: the artist entity can be considered the central point of the structure. It may be referenced by zero or more assets, one or more albums, one or more tracks and zero or more user actions. Tracks and assets can also be referenced by several user actions. Since a user action is always executed by a user, there has to be a user entity which can be referenced by several user actions. Further, playlists can be created by a user, which results in a 1:n relationship between the user and the playlist entity. The m:n relationship between a playlist and the tracks it contains is resolved by the playlist item entity. Finally, tracks can be assigned to one or more clusters, and a cluster may be related to 'super clusters', which are also stored in the cluster entity.
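The relationships described above can be illustrated with a small relational sketch. The following Python example uses the standard `sqlite3` module to create a simplified subset of the tables (only a few columns, SQLite types instead of the original ones; the snake_case table names such as `playlist_item` are hypothetical) and shows how the Playlist Item table resolves the m:n relation between playlists and tracks.

```python
import sqlite3

# Simplified subset of the schema; column types are SQLite approximations,
# not the original definitions.
SCHEMA = """
CREATE TABLE artist (
    artistId     INTEGER PRIMARY KEY,
    lastName     TEXT
);
CREATE TABLE track (
    trackId      INTEGER PRIMARY KEY,
    artistId     INTEGER REFERENCES artist(artistId),
    title        TEXT
);
CREATE TABLE user (
    userId       INTEGER PRIMARY KEY AUTOINCREMENT
);
CREATE TABLE playlist (
    playlistId   INTEGER PRIMARY KEY,
    userId       INTEGER REFERENCES user(userId),  -- 1:n user -> playlist
    playListName TEXT
);
-- Associative table resolving the m:n relation between playlist and track.
CREATE TABLE playlist_item (
    playlistId   INTEGER REFERENCES playlist(playlistId),
    trackId      INTEGER REFERENCES track(trackId),
    PRIMARY KEY (playlistId, trackId)
);
"""


def tracks_in_playlist(conn, playlist_id):
    """Return the titles of all tracks contained in one playlist."""
    rows = conn.execute(
        """
        SELECT t.title
        FROM playlist_item AS pi
        JOIN track AS t ON t.trackId = pi.trackId
        WHERE pi.playlistId = ?
        ORDER BY t.title
        """,
        (playlist_id,),
    )
    return [title for (title,) in rows]


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO artist VALUES (1, 'Doe')")
conn.executemany("INSERT INTO track VALUES (?, ?, ?)",
                 [(10, 1, "Alpha"), (11, 1, "Beta"), (12, 1, "Gamma")])
conn.execute("INSERT INTO user (userId) VALUES (7)")
conn.executemany("INSERT INTO playlist VALUES (?, ?, ?)",
                 [(100, 7, "Morning"), (101, 7, "Evening")])
# Track 11 appears in both playlists: this is the m:n case.
conn.executemany("INSERT INTO playlist_item VALUES (?, ?)",
                 [(100, 10), (100, 11), (101, 11), (101, 12)])
print(tracks_in_playlist(conn, 100))  # ['Alpha', 'Beta']
```

Track 11 is contained in both playlists, which a single foreign key from Track to Playlist could not express; the associative table stores one row per (playlist, track) pair instead.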
Target Group
Identifying the Target
The exploration tool mainly targets shopping analysts and web developers.
The aim is to suit the needs of the people who run the on-line portal, since they are interested in optimization. At the same time, the technical aspects must not be forgotten, so that the tool also helps the developers of such systems.
Special Issues of the Target Group
- Shop Owners: Extremely business focused.
- Web Developers: Extremely technically focused.
- Risk of forgetting the users' need to "feel good".
- Data mining tools:
- SAS Enterprise Miner
- SPSS Statistical Investigation Tool
- => All are very heavily statistically oriented.
Intended Purpose
Goals and Objectives
Problems and Tasks to Solve
- A good overview without cutting off too many paths into the depth of the data.
- Clear mapping of variables.
- Clarify what the user can expect from certain buttons and methods.
- Clear structure of the methods.
- Consistent use of buttons and colors.
- Good guidance on what to do with the results of each step of a method.
- Stay focused on the mantra: Overview first, zoom and filter, then details on demand!
Example Questions
- In which time span was there a lot of activity?
- What do buyers do before and after a purchase?
- How many people visit the different parts of the portal?
- Are VIEW and BUY correlated?
- Are there impulse buyers?
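Several of the example questions reduce to simple aggregations over the User Action table. The following Python sketch is an illustration under assumptions, not part of the MICE design: it assumes that the `action` field contains the values `VIEW` and `BUY` (as suggested by the question above), keeps the camelCase field names of the table, and uses a hypothetical definition of an impulse buyer as a user who buys a product without having viewed that same product within a short preceding window.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import sqrt


@dataclass(frozen=True)
class UserAction:
    """Subset of one User Action row; field names follow the table above."""
    userId: int
    productId: int
    actionTime: datetime
    action: str       # assumed values such as "VIEW" and "BUY"
    productType: str


def activity_by_hour(actions):
    """Number of actions per hour of day (0-23): when is there a lot of activity?"""
    return Counter(a.actionTime.hour for a in actions)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if a series is empty or constant."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def view_buy_correlation(actions):
    """Correlation between each user's number of VIEW and BUY actions."""
    counts = defaultdict(Counter)
    for a in actions:
        counts[a.userId][a.action] += 1
    users = sorted(counts)
    return pearson([counts[u]["VIEW"] for u in users],
                   [counts[u]["BUY"] for u in users])


def impulse_buyers(actions, window=timedelta(minutes=5)):
    """Users who bought a product without viewing that same product
    during the preceding `window` (hypothetical definition)."""
    last_view = {}
    buyers = set()
    for a in sorted(actions, key=lambda x: x.actionTime):
        key = (a.userId, a.productId)
        if a.action == "VIEW":
            last_view[key] = a.actionTime
        elif a.action == "BUY":
            seen = last_view.get(key)
            if seen is None or a.actionTime - seen > window:
                buyers.add(a.userId)
    return buyers


# Toy log for illustration only.
t0 = datetime(2005, 11, 1, 20, 0)
log = [
    UserAction(1, 10, t0, "VIEW", "track"),
    UserAction(1, 10, t0 + timedelta(minutes=2), "BUY", "track"),
    UserAction(2, 11, t0 + timedelta(hours=1), "BUY", "track"),
    UserAction(3, 12, t0, "VIEW", "asset"),
]
print(activity_by_hour(log))   # Counter({20: 3, 21: 1})
print(impulse_buyers(log))     # {2}
print(view_buy_correlation(log))
```

The window length is the main tuning knob of this hypothetical impulse-buyer definition; a shorter window classifies more purchases as impulsive.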
Proposed Design
Types of Visualization Applied
- Bubble Chart: The parts of the online platform are represented by bubbles. Traffic between the parts (i.e. people acting in more than one bubble) is represented by the edges between the bubbles. Moreover, the colors inside a bubble show the proportion of people belonging to a selected group; the remaining part stays grey and represents "the rest".
- Parallel Coordinates: The selected variables are normalized to the range 0 to 1 and plotted as parallel coordinates. Techniques such as focus+context must be applied to make patterns discernible.
- (multidimensional) Scatterplot: A scatterplot that helps to understand the relations between some of the variables, possibly in more detail.
- Morphing (timeline): The records are timestamped and can therefore be used to morph the plots over time. This makes it possible to see whether there are special occurrences (for example, whether heavy advertising leads to more activity).
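The normalization to the range 0 to 1 mentioned for the parallel coordinates can be done with simple min-max scaling. The Python sketch below assumes numeric attributes stored in dict records; mapping a constant column to 0.5 (the middle of its axis) is a hypothetical design choice that avoids a division by zero.

```python
def normalize(values):
    """Min-max scale a list of numbers to [0, 1].

    A constant column (max == min) is mapped to 0.5, i.e. the middle of its
    axis, to avoid dividing by zero (an assumed convention).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows, columns):
    """Normalize each selected column of a list of dict records independently."""
    scaled = {c: normalize([row[c] for row in rows]) for c in columns}
    return [{c: scaled[c][i] for c in columns} for i in range(len(rows))]


# Toy records: yearOfBirth comes from the User table, hour from actionTime.
rows = [
    {"yearOfBirth": 1980, "hour": 6},
    {"yearOfBirth": 1985, "hour": 12},
    {"yearOfBirth": 1990, "hour": 18},
]
print(normalize_columns(rows, ["yearOfBirth", "hour"]))
```

Each column is scaled independently, so the position of a line on one axis says nothing about absolute values on another; this is why the axes need their own labels for minimum and maximum.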
Visual Mapping
The basic data represented are user actions. The central User action table should be extended with a column indicating whether the user is a profiled user.
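Extending the action table with such a column could look like the following SQLite sketch. How a user qualifies as "profiled" is not specified here, so the example assumes a hypothetical helper table `profiled_user` that lists the IDs of profiled users; the snake_case table names and the `profiled` column name are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_action (
    userId      INTEGER NOT NULL,
    productId   INTEGER NOT NULL,
    actionTime  TEXT    NOT NULL,
    "action"    TEXT    NOT NULL,  -- quoted because ACTION is an SQL keyword
    productType TEXT    NOT NULL
);
-- Hypothetical helper table: IDs of the users regarded as profiled.
CREATE TABLE profiled_user (userId INTEGER PRIMARY KEY);
""")


def add_profiled_flag(conn):
    """Add a 0/1 'profiled' column to user_action and fill it from profiled_user."""
    conn.execute(
        "ALTER TABLE user_action ADD COLUMN profiled INTEGER NOT NULL DEFAULT 0")
    conn.execute(
        "UPDATE user_action SET profiled = 1 "
        "WHERE userId IN (SELECT userId FROM profiled_user)")
    conn.commit()


# Toy data for illustration only.
conn.executemany(
    "INSERT INTO user_action VALUES (?, ?, ?, ?, ?)",
    [(1, 10, "2005-11-01 20:00:00", "VIEW", "track"),
     (2, 11, "2005-11-01 20:05:00", "BUY", "asset")])
conn.execute("INSERT INTO profiled_user VALUES (1)")
add_profiled_flag(conn)
print(conn.execute(
    "SELECT userId, profiled FROM user_action ORDER BY userId").fetchall())
# [(1, 1), (2, 0)]
```

Storing the flag as a column (rather than joining on every query) keeps the per-action table self-contained for the coloring options of the charts.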
Bubble Chart
- The size of a bubble represents the number of actions taken within that part of the portal. Since the portal is semantically divided into parts in which actions can only be performed on disjoint sets of products, this basic view also represents the different product types.
- Differently colored segments of a bubble represent the (selected) types of action.
- The position on the timeline selects all actions taken up to that point in time. Alternatively, the view can be restricted to the actions within a specific time span.
- The "whole" always remains visible as a very transparent bubble chart that conveys the total of all actions.
- Details of the segments and bubbles appear as a tooltip when the mouse pointer rolls over a segment or bubble.
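The data behind one frame of the bubble chart can be computed by grouping the user actions by product type (one bubble per part of the portal) and, within each bubble, by action (one segment per action type). The Python sketch below assumes actions given as `(productType, action, actionTime)` tuples; the function name `bubble_data` is hypothetical, and drawing, transparency and tooltips are left to the rendering layer.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bubble_data(actions, end, start=None):
    """Aggregate (productType, action, actionTime) tuples for the bubble chart.

    Counts actions with start <= actionTime <= end; with start=None all
    actions up to `end` are counted (cumulative timeline position).
    Returns {productType: (bubble_size, Counter mapping action -> segment size)}.
    """
    segments = defaultdict(Counter)
    for product_type, action, t in actions:
        if t <= end and (start is None or t >= start):
            segments[product_type][action] += 1
    return {pt: (sum(c.values()), c) for pt, c in segments.items()}


# Toy data for illustration only.
log = [
    ("track", "VIEW", datetime(2005, 11, 1, 10)),
    ("track", "BUY", datetime(2005, 11, 1, 11)),
    ("asset", "VIEW", datetime(2005, 11, 2, 9)),
]
# Cumulative view up to the end of 1 November:
print(bubble_data(log, end=datetime(2005, 11, 1, 23, 59)))
# Only the actions inside a time span:
print(bubble_data(log, start=datetime(2005, 11, 1, 10, 30),
                  end=datetime(2005, 11, 3)))
```

The transparent "whole" in the background corresponds to calling the same function with an `end` at or after the last recorded action.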
Parallel Coordinates
The following attributes should be a fixed part of the chart:
- Product Type
- User Action
- Maybe: time of day (0 to 24 h)
The following attributes can additionally be added to the chart:
- Cluster
- Age of User
- Asset Type
- ...
It is very important that some attributes are mutually exclusive (for example, assets and tracks are disjoint groups). This is not enforced by the data structure but holds by definition.
It must be possible to color the data according to certain constraints:
- The profiled and unprofiled user
- The time span of the actions
- ...
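Since Product Type and User Action are categorical, their values have to be placed on the axes before lines can be drawn. One simple option, sketched below in Python, is to space the categories evenly in [0, 1], scale the time of day by 24 hours, and let a caller-supplied function choose the color of each line. The sketch assumes dict records with the field names of the User Action table plus a `profiled` flag; the function names and the colors are hypothetical.

```python
from datetime import datetime


def category_positions(values):
    """Place the distinct categories of one axis evenly in [0, 1], in sorted order."""
    cats = sorted(set(values))
    if len(cats) == 1:
        return {cats[0]: 0.5}
    return {c: i / (len(cats) - 1) for i, c in enumerate(cats)}


def polylines(records, color_of):
    """Return one (y-values, color) pair per record for the fixed axes
    Product Type, User Action and time of day."""
    pt_pos = category_positions([r["productType"] for r in records])
    ac_pos = category_positions([r["action"] for r in records])
    lines = []
    for r in records:
        t = r["actionTime"]
        day = (t.hour + t.minute / 60) / 24  # time of day scaled to [0, 1)
        ys = (pt_pos[r["productType"]], ac_pos[r["action"]], day)
        lines.append((ys, color_of(r)))
    return lines


# Toy data; color profiled users red and all others grey.
records = [
    {"productType": "asset", "action": "BUY",
     "actionTime": datetime(2005, 11, 1, 12, 0), "profiled": True},
    {"productType": "track", "action": "VIEW",
     "actionTime": datetime(2005, 11, 1, 6, 0), "profiled": False},
]
for ys, color in polylines(records, lambda r: "red" if r["profiled"] else "grey"):
    print(ys, color)
```

Passing the coloring rule as a function keeps the constraint (profiled users, a time span, ...) interchangeable without touching the axis mapping.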
(multidimensional) Scatterplot
- All variables can be assigned to an axis.
- At most 3 variables can be shown at the same time.
- A larger set of variables can be selected; the plot then continuously exchanges them, showing a subset of 3 of the selected variables at a time.
- Color can be added to the plot to give additional information (see Parallel Coordinates).
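One way to realize the continuous exchange of variables is to enumerate every 3-element subset of the selected variables and cycle through them, one subset per animation step. The Python sketch below uses `itertools.combinations`; the enumeration order and the function names are assumptions, not a fixed part of the design.

```python
from itertools import combinations, cycle, islice


def axis_triples(variables, max_axes=3):
    """All ways to pick `max_axes` of the selected variables, in a fixed order.

    If no more than `max_axes` variables are selected, a single assignment is used.
    """
    if len(variables) <= max_axes:
        return [tuple(variables)]
    return list(combinations(variables, max_axes))


def animation_frames(variables, n_frames, max_axes=3):
    """Axis assignment (x, y, z) for each of n_frames steps, cycling endlessly."""
    return list(islice(cycle(axis_triples(variables, max_axes)), n_frames))


selected = ["age", "hour", "cluster", "assetType"]
for frame in animation_frames(selected, 5):
    print(frame)
```

With four selected variables there are four triples, after which the cycle starts again; the number of triples grows quickly with the selection (10 for five variables), which argues for keeping the selected set small.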
Morphing
- Specify the start and end points.
- Hit PLAY to run the morphing over the assigned time span.
- Hit STOP to reset the playhead to the starting time.
- Hit PAUSE to stop the playhead at any time.
- Drag and drop the playhead to jump to a specific time.
- Switch between an absolute and a relative time axis.
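The playback controls above map onto a small state object. The following Python sketch is one possible model, not the MICE implementation: times are plain numbers (for example seconds), the `speed` parameter and all names are hypothetical, and `morph` assumes that two frames are blended by linear interpolation of equally long lists of plot values.

```python
class Playhead:
    """Minimal playback state for the morphing timeline.

    `speed` is how much data time passes per second of wall-clock time
    while playing. `position` is the absolute time; relative() gives the
    position on a relative axis.
    """

    def __init__(self, start, end, speed=1.0):
        if end <= start:
            raise ValueError("end must be after start")
        self.start, self.end, self.speed = start, end, speed
        self.position = start
        self.playing = False

    def play(self):
        self.playing = True

    def pause(self):
        # PAUSE keeps the playhead where it is.
        self.playing = False

    def stop(self):
        # STOP resets the playhead to the starting time.
        self.playing = False
        self.position = self.start

    def seek(self, t):
        # Dragging the playhead jumps to t, clamped to [start, end].
        self.position = min(max(t, self.start), self.end)

    def tick(self, dt):
        """Advance by dt wall-clock seconds; playback stops at the end point."""
        if self.playing:
            self.seek(self.position + dt * self.speed)
            if self.position >= self.end:
                self.playing = False

    def relative(self):
        """Position on a relative axis: 0.0 at start, 1.0 at end."""
        return (self.position - self.start) / (self.end - self.start)


def morph(frame_a, frame_b, alpha):
    """Linearly interpolate two equally long lists of plot values (0 <= alpha <= 1)."""
    return [a + (b - a) * alpha for a, b in zip(frame_a, frame_b)]


p = Playhead(0, 100, speed=10)
p.play()
p.tick(3)      # position 30
p.pause()
p.tick(5)      # paused: still 30
print(p.position, p.relative())
p.seek(150)    # clamped to the end point 100
p.stop()       # back to 0
print(morph([0, 10], [10, 20], p.relative()))
```

Driving `morph` with `relative()` lets the same rendering code serve both the absolute and the relative time axis.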
Description of Used Techniques
Possibilities of Interaction
Mockups / Fake Screenshots