Understand the Experiment

Our work is based on two distinct data sources: the Billboard charts and the Million Song Dataset.

The Billboard Charts

The Billboard charts tabulate the relative weekly popularity of songs or albums in the United States. The results are published in Billboard magazine. The two primary charts – the Hot 100 (top 100 singles) and the Billboard 200 (top 200 albums) – factor in airplay, as well as music sales in all relevant formats.

We used the yearly summary of the Hot 100 charts for our analysis.

The Million Song Dataset

The Million Song Dataset (MSD) is a freely available collection of audio features and metadata for a million contemporary popular music tracks. It started as a collaborative project between The Echo Nest and LabROSA (the Laboratory for the Recognition and Organization of Speech and Audio, at Columbia University). It is supported in part by the NSF.

The main source of audio metadata in the MSD is a service called The Echo Nest. They developed an automatic music listening system that uses intricate cognitive models to derive attributes such as song key, time signature, structure, and timbre information vectors.

Song metadata normalization

We used the MusicBrainz catalog service to normalize metadata information, notably year of release.

Our approach

We cross-referenced the popularity data from Billboard with the musical data from MSD/Echo Nest to develop a number of visualizations of selected attributes of songs that can be considered commercially successful.
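
To illustrate what combining the two sources can look like, here is a minimal sketch in Python. It is not the code used for this project: the record layout, the field names (`artist`, `title`, `rank`, `tempo`, `key`), and the matching rule (exact match on a normalized artist and title) are all assumptions made for the example, and the sample records are placeholders rather than real chart or dataset entries.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def join_charts_with_features(chart_rows, track_rows):
    """Attach audio features to each chart entry that has a matching track."""
    by_key = {}
    for track in track_rows:
        key = (normalize(track["artist"]), normalize(track["title"]))
        by_key.setdefault(key, track)  # keep only the first track per key

    joined = []
    for row in chart_rows:
        key = (normalize(row["artist"]), normalize(row["title"]))
        track = by_key.get(key)
        if track is not None:
            # Chart fields win when both records share a field name.
            joined.append({**track, **row})
    return joined


# Placeholder records (hypothetical, for illustration only).
charts = [
    {"year": 2000, "rank": 1, "artist": "Example Artist", "title": "Example Song!"},
    {"year": 2000, "rank": 2, "artist": "Other Artist", "title": "Unmatched Song"},
]
tracks = [
    {"artist": "example artist", "title": "Example  Song", "tempo": 120.0, "key": 7},
]

matched = join_charts_with_features(charts, tracks)
```

Chart entries without a matching track are simply dropped in this sketch. Exact matching on normalized strings is the simplest possible rule and will miss variants such as featured-artist credits, which is one reason a catalog service is useful for normalizing metadata.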

Such an analysis is bound to be incomplete and to have room for improvement, and we always keep our eyes open for new visualizations that can provide better insights. So don't hesitate to send us suggestions and comments.