fill the void - bdunagan

18 Jul 2011
Visualizing 144,000 Minutes in iTunes

Digging into my Things to-do list inspired me to tackle a larger personal dataset: iTunes. One feature I love about iTunes is play counts: the app keeps a running count of how many times I’ve played each track. My iPod and iPhone update these counts too, so the numbers give an accurate picture of my music history. Multiplying each track’s play count by its length gives the time I’ve spent listening to it, and adding those up shows how I spend my days in iTunes.
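Here is a minimal sketch of that calculation, using made-up tracks. iTunes stores each track’s length ("Total Time") in milliseconds, so time listened is play count × length ÷ 1,000. `seconds_played` is a hypothetical helper that mirrors the script’s `total_seconds_played`.

```ruby
# Made-up tracks; iTunes stores "Total Time" in milliseconds.
tracks = [
  { name: "Track A", play_count: 3, total_time_ms: 275_500 },
  { name: "Track B", play_count: 10, total_time_ms: 180_000 },
]

# Hypothetical helper: seconds listened = plays x length (ms) / 1000.
# Multiplying first means integer division truncates once per track, not once per play.
def seconds_played(track)
  track[:play_count] * track[:total_time_ms] / 1000
end

total_seconds = tracks.sum { |t| seconds_played(t) }
puts "#{total_seconds} seconds, about #{total_seconds / 60} minutes"
# prints "2626 seconds, about 43 minutes"
```

If you divided by 1,000 before multiplying, Track A would come out at 825 seconds instead of 826.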

I wrote a Ruby script that uses Nokogiri’s SAX callbacks to parse ~/Music/iTunes/iTunes Music Library.xml and extract each track’s name, artist, album, length, and play count. Then I used the excellent d3 to visualize the data as a squarified treemap.
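Nokogiri is a gem rather than part of Ruby’s standard library. To keep this sketch self-contained, it swaps in REXML’s stream parser (`REXML::Document.parse_stream` with a `REXML::StreamListener`), which ships with Ruby, and applies the same SAX idea to a small, made-up fragment in the library file’s plist shape.

The listener does three things:

* It remembers the latest `<key>`.
* It buffers text until the closing tag, because SAX-style parsers may deliver one text node in several chunks.
* It keeps each `<dict>` that has a name, play count, and total time, which also skips playlist entries.

`PlayCountListener` and its field names are hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical SAX-style listener; REXML's stream parser stands in for Nokogiri here.
class PlayCountListener
  include REXML::StreamListener

  # plist keys to keep, mapped to (hypothetical) field names.
  FIELDS = { "Name" => :title, "Play Count" => :play_count, "Total Time" => :length_ms }.freeze

  attr_reader :tracks

  def initialize
    @tracks = []
    @stack = []     # one hash per open <dict>
    @key = nil      # text of the most recent <key>
    @buffer = +""   # text seen since the last tag
  end

  def tag_start(name, _attrs)
    @stack.push({}) if name == "dict"
    @buffer = +""
  end

  # Append rather than assign, in case text arrives in chunks.
  def text(string)
    @buffer << string
  end

  def tag_end(name)
    value = @buffer
    @buffer = +""
    case name
    when "dict"
      dict = @stack.pop
      # Keep only dicts that carry every field, i.e. tracks that have been played.
      @tracks << dict if FIELDS.values.all? { |field| dict.key?(field) }
    when "key"
      @key = value
    else
      # Any other element is the value for the preceding <key>.
      field = FIELDS[@key]
      @stack.last[field] = value if field && @stack.last
    end
  end
end

# Made-up fragment: one played track, one unplayed track, one playlist.
xml = <<~XML
  <plist version="1.0">
  <dict>
    <key>Tracks</key>
    <dict>
      <key>101</key>
      <dict>
        <key>Track ID</key><integer>101</integer>
        <key>Name</key><string>Song A</string>
        <key>Total Time</key><integer>275500</integer>
        <key>Play Count</key><integer>3</integer>
      </dict>
      <key>102</key>
      <dict>
        <key>Track ID</key><integer>102</integer>
        <key>Name</key><string>Never Played</string>
        <key>Total Time</key><integer>180000</integer>
      </dict>
    </dict>
    <key>Playlists</key>
    <array>
      <dict>
        <key>Name</key><string>Library</string>
        <key>Playlist Items</key>
        <array><dict><key>Track ID</key><integer>101</integer></dict></array>
      </dict>
    </array>
  </dict>
  </plist>
XML

listener = PlayCountListener.new
REXML::Document.parse_stream(xml, listener)
listener.tracks.each { |t| puts "#{t[:title]}: #{t[:play_count]} plays" }
```

Because values are collected per `<dict>`, a playlist’s "Name" key lands in the playlist’s own hash instead of overwriting the last track’s title.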

Here’s what I came up with: 2,700 tracks played, 28,000 plays, and …

100 days in iTunes is about 10% of my life over the three years my iTunes Library file has existed. Hans Zimmer’s scores to “The Dark Knight” and “Inception” account for nearly 20 days, and a single song fills 10 days, or 1% of that time. I guess variety kills my focus.
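A quick check of those numbers, assuming a 365-day year and a library exactly three years old:

```ruby
# Check the post's arithmetic. Assumes a 365-day year and a library exactly three years old.
total_minutes = 144_000
days = total_minutes / 60.0 / 24                 # minutes -> hours -> days
library_days = 3 * 365
share = days / library_days * 100                # percent of the library's lifetime
song_share = 10.0 / library_days * 100           # the single song's 10 days

printf("%.0f days = %.1f%% of %d days; one song = %.1f%%\n", days, share, library_days, song_share)
# prints "100 days = 9.1% of 1095 days; one song = 0.9%"
```

So 144,000 minutes is exactly 100 days. The "10%" and "1%" figures are rounded up from about 9.1% and 0.9%.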

Also interesting is how much I listen to podcasts:

Googling around, I found remarkably few people delving into iTunes as a dataset:

I did stumble upon Planetary for iPad. The free app visualizes your iTunes library as a universe, where each artist is a solar system, each album is a planet, and each song is a moon. The app varies the size of the moons based on play counts. It’s a fantastic app from Bloom, a company founded by people from Stamen.

I’ve included the Ruby source code below. It reads the iTunes XML file and writes two files: a JSON file with the track information and an HTML file with the d3 JavaScript code embedded.
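The JSON nests seconds played by artist, then album, then title, which is the hierarchy the treemap draws. A compact way to build that shape in Ruby is a hash whose default blocks create each level on first use. The sketch below uses made-up rows.

One deliberate difference from the script: it adds up repeated artist/album/title combinations, whereas the script’s assignment keeps only the last one.

```ruby
require 'json'

# Made-up rows standing in for parsed tracks: [artist, album, title, seconds played].
rows = [
  ["Artist A", "Album 1", "Song X", 826],
  ["Artist A", "Album 2", "Song Y", 1_800],
  ["Artist A", "Album 1", "Song X", 300],  # same title twice: summed, not overwritten
]

# artist => album => title => seconds; the default blocks create each level on first use.
tree = Hash.new { |artists, artist| artists[artist] = Hash.new { |albums, album| albums[album] = Hash.new(0) } }
rows.each { |artist, album, title, seconds| tree[artist][album][title] += seconds }

json = JSON.generate(tree)
puts json
# prints {"Artist A":{"Album 1":{"Song X":1126},"Album 2":{"Song Y":1800}}}
```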

```ruby
# Require libraries
require 'rubygems'
require 'nokogiri'
require 'json'

# Handy extension: true for non-nil values with a non-zero length (Strings here)
class Object
  def valid?
    !self.nil? && self.length > 0
  end
end

# Simple object for Tracks
class Track
  attr_accessor :title
  attr_accessor :artist
  attr_accessor :album
  attr_accessor :play_count
  attr_accessor :length   # iTunes "Total Time", in milliseconds

  def valid?
    self.title.valid? && self.play_count.valid? && self.length.valid?
  end

  # Return the summary string; the caller decides whether to print it.
  def to_s
    "[#{self.artist}] '#{self.title}' in '#{self.album}': #{self.play_count} (#{self.length.to_i / 1000})"
  end

  # Play count times length, converted from milliseconds to seconds.
  def total_seconds_played
    (self.play_count.to_i * self.length.to_i) / 1000
  end
end

# Handler for XML SAX callbacks
class TrackXMLHandler < Nokogiri::XML::SAX::Document
  attr_accessor :previous_item
  attr_accessor :track
  attr_accessor :tracks

  # Nokogiri delegate hook: receives the text between tags.
  def characters(string)
    # A "Track ID" key starts a new track, so store the previous valid Track.
    if self.previous_item == "Track ID"
      if self.track.valid?
        puts self.track.to_s
        self.tracks << self.track
      end
      self.track = Track.new
    end
    # Process Track attributes: each value follows the <key> that names it.
    unless self.track.nil?
      self.track.title = string if self.previous_item == "Name"
      self.track.artist = string if self.previous_item == "Artist"
      self.track.album = string if self.previous_item == "Album"
      self.track.play_count = string if self.previous_item == "Play Count"
      self.track.length = string if self.previous_item == "Total Time"
    end
    self.previous_item = string
  end
end

# Process iTunes library into Tracks.
trackHandler = TrackXMLHandler.new
trackHandler.tracks = []
parser = Nokogiri::XML::SAX::Parser.new(trackHandler)
# We only read the iTunes XML file, but still, run at your own risk. :)
parser.parse_file(File.expand_path("~/Music/iTunes/iTunes Music Library.xml"))

# Iterate over Tracks.
total_time = 0
total_plays = 0
hash = {}
artist_hash = Hash.new(0)
trackHandler.tracks.each do |track|
  # Convert Track objects into JSON for d3 treemap (artist => album => title => total length).
  hash[track.artist] ||= {}
  hash[track.artist][track.album] ||= {}
  hash[track.artist][track.album][track.title] = track.total_seconds_played
  # Aggregate artist play time.
  artist_hash[track.artist] += track.total_seconds_played
  # Aggregate total time and plays.
  total_time += track.total_seconds_played
  total_plays += track.play_count.to_i
end

# Display basic stats: minutes per artist (ascending), then totals.
artist_hash.sort_by { |_artist, seconds| seconds }.each do |artist, seconds|
  puts "#{(seconds / 60).to_s.rjust(6)} minutes: #{artist}"
end
puts "=> #{trackHandler.tracks.count} tracks"
puts "=> #{total_plays} plays"
puts "=> #{total_time / 60} minutes"

# Write out JSON file for the d3 JavaScript in the HTML file to read.
# The artists go under one root key (the name "iTunes" is arbitrary) because the
# page below binds only the first entry of d3.entries(json) to its single <g>.
File.open("itunes.json", "w") { |f| f.write({ "iTunes" => hash }.to_json) }

# Write out the HTML to make it work. <<'EOF' stops Ruby from interpolating "#".
html = <<'EOF'
<html>
  <head>
    <!-- HTML copied from mbostock's -->
    <script type="text/javascript" src="d3.js"></script>
    <script type="text/javascript" src="d3.layout.js"></script>
    <style type="text/css">
      rect {
        fill: none;
        stroke: #fff;
      }
      text {
        font: 10px sans-serif;
      }
    </style>
  </head>
  <body>
    <script type="text/javascript">

// Increase the size for large libraries (2k+)
var w = 520,
    h = 520,
    color = d3.scale.category20c();

var treemap = d3.layout.treemap()
    .size([w + 1, h + 1])
    .children(function(d) { return isNaN(d.value) ? d3.entries(d.value) : null; })
    .value(function(d) { return d.value; });

var svg = d3.select("body").append("svg:svg")
    .style("width", w)
    .style("height", h)
  .append("svg:g")
    .attr("transform", "translate(-.5,-.5)");

d3.json("itunes.json", function(json) {
  var cell = svg.data(d3.entries(json)).selectAll("g")
      .data(treemap)
    .enter().append("svg:g")
      .attr("class", "cell")
      .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });

  cell.append("svg:rect")
      .attr("width", function(d) { return d.dx; })
      .attr("height", function(d) { return d.dy; })
      .style("fill", function(d) { return d.children ? color(d.data.key) : null; });

//   cell.append("svg:text")
//       .attr("x", function(d) { return d.dx / 2; })
//       .attr("y", function(d) { return d.dy / 2; })
//       .attr("dy", ".35em")
//       .attr("text-anchor", "middle")
//       .text(function(d) { return d.children ? null : d.data.key; });
});
    </script>
  </body>
</html>
EOF
File.open("itunes.html", "w") { |f| f.write(html) }  # HTML file name is an assumption
```
This script is also available in my bdunagan GitHub repository.
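The treemap in the page above uses a squarified layout, following the algorithm of Bruls, Huizing and van Wijk. It lays rectangles out in rows along the shorter side and closes a row as soon as adding the next value would worsen the row’s worst aspect ratio.

The Ruby sketch below illustrates that rule for a single level of values. It is not d3’s code, and d3’s implementation differs in details. `squarify`, `worst`, `layout_row` and `Rect` are hypothetical names, and the sample is the 6×4 example with values 6, 6, 4, 3, 2, 2, 1 from the original paper.

```ruby
# Squarified treemap layout (Bruls, Huizing & van Wijk), one level deep.
Rect = Struct.new(:x, :y, :w, :h)

# Worst (largest) aspect ratio in a row of areas laid along a side of length `side`.
def worst(row, side)
  s = row.sum
  row.map { |a| [side * side * a / (s * s), (s * s) / (side * side * a)].max }.max
end

# Lay a finished row along the shorter side of (x, y, w, h); return the leftover rectangle.
def layout_row(row, x, y, w, h, out)
  s = row.sum
  if w >= h
    # Height is the shorter side: stack the row as a column along the left edge.
    width = s / h
    offset = y
    row.each { |a| out << Rect.new(x, offset, width, a / width); offset += a / width }
    [x + width, y, w - width, h]
  else
    # Width is the shorter side: lay the row as a strip along the top edge.
    height = s / w
    offset = x
    row.each { |a| out << Rect.new(offset, y, a / height, height); offset += a / height }
    [x, y + height, w, h - height]
  end
end

# Returns one Rect per positive value, in descending-value order.
def squarify(values, rect)
  values = values.select(&:positive?)
  total = values.sum.to_f
  areas = values.sort.reverse.map { |v| v * rect.w * rect.h / total }
  x, y, w, h = rect.x, rect.y, rect.w, rect.h
  out = []
  row = []
  areas.each do |a|
    side = [w, h].min
    if row.empty? || worst(row + [a], side) <= worst(row, side)
      row << a   # adding this area keeps the row at least as square
    else
      x, y, w, h = layout_row(row, x, y, w, h, out)
      row = [a]  # start a new row in the leftover rectangle
    end
  end
  layout_row(row, x, y, w, h, out) unless row.empty?
  out
end

rects = squarify([6, 6, 4, 3, 2, 2, 1], Rect.new(0.0, 0.0, 6.0, 4.0))
rects.each { |r| puts format("x=%.2f y=%.2f w=%.2f h=%.2f", r.x, r.y, r.w, r.h) }
```

The first row holds the two 6s as a 3-wide column of 3×2 rectangles. Each later row is laid into the leftover space, so every rectangle’s area stays proportional to its value.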
