In 1994, Kevin Bacon stated that he had "worked with everybody in Hollywood or someone who's worked with them". This sentence started what is known as the "Six Degrees of Kevin Bacon", a game in which you link the actor to anyone else in Hollywood through a maximum of 6 relationships. What is true in Hollywood is also true in music: legendary Michael Jackson producer Quincy Jones is likely to be connected to everyone in the music world through 6 levels of relationships (or fewer), and this idea is, quite literally, at the core of what Sondz is about.

During our first “get together / deep dive weekend”, the kind that startup founders do, we selected a few core principles and technologies for Sondz that still drive us to this day. After agreeing on the “obvious” mission of providing a one-stop-shop platform for any and all musical information, our first idea revolved around how small the music industry actually is: a finite number of influences, relationships, major players and improbable get-togethers that make for a funny and interesting web of connections. Our second idea, which stemmed from the first, was that it would be interesting to show the wealth of this network of people, this “social network” if you will. We thus decided to store our data as the Facebooks of this world do: in what is called a “graph database”.

The world before graph databases

Before graph databases, there was SQL. This now venerable technology was invented in the 1970s to meet early business requirements, and it was both revolutionary and simple; it still underpins most of the databases in use today. But it has a few fundamental limitations. In this system, data is stored in tables, each table is a list of items sharing common traits, keys identify each row in a unique way, and these keys provide a way to link different tables. And that is pretty much it.

Let’s take an example from music:

  • One table could be the complete list of all artists. Its rows share common attributes such as a name, a type (band, producer, guitarist…) and a date of birth. Its “key” could be the artist name.
  • Another table could be the complete list of all albums. Its rows share common attributes such as a name, a type (album, single) and a date of release. Its key could be the album name.

At this point, one could ask: why not store all this information in the same place? The answer is that artists usually work on more than one album, and SQL’s limitations quickly start to show: every table has a fixed set of columns. So if you want to store albums and artists in the same list, you need to reserve columns for the albums in advance. However, one artist could have thousands of albums and another just one. What should be the maximum number of columns, and therefore albums? What if a particular artist exceeds this limit? What if, all of a sudden, albums need a new attribute such as a Spotify URL? Do you need to add 1000 columns to add URLs to 1000 albums? Fortunately not: database pioneers had a few tricks up their sleeves and found a workaround, which is to split the data into different tables and link them.

Below is how this is best represented in SQL:

Note that a third “awkward” table has crept in between Artist and Album: the “Link” table. What is worse, this table has to exist in SQL, and it is one of the main reasons why this type of technology is poorly suited to social networks: SQL offers no simple way to store multiple interactions on a single item. In the real world, however, different artists usually work on the same album. If the Link table did not exist, the same album would have to be duplicated for every artist that worked on it: every producer, guitarist, drummer, arranger, vocalist… The only solution is to create a table of relations, the Link table, storing which artists worked on which albums and in which way. Cumbersome.
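To make the Artist / Link / Album layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, integer ids and the `contributors` helper are assumptions chosen for illustration (the article suggests names as keys), and this is not Sondz's actual schema.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE album  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The "Link" table: one row per contribution of an artist to an album.
    CREATE TABLE link (
        artist_id INTEGER NOT NULL REFERENCES artist(id),
        album_id  INTEGER NOT NULL REFERENCES album(id),
        role      TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO artist VALUES (?, ?)",
                 [(1, "Michael Jackson"), (2, "Quincy Jones")])
conn.executemany("INSERT INTO album VALUES (?, ?)",
                 [(10, "Thriller"), (11, "Bad")])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (1, 10, "vocalist"), (2, 10, "producer"),
    (1, 11, "vocalist"), (2, 11, "producer"),
])

def contributors(album_name):
    """Return (artist name, role) pairs for one album, via the link table."""
    return conn.execute("""
        SELECT artist.name, link.role
        FROM link
        JOIN artist ON artist.id = link.artist_id
        JOIN album  ON album.id  = link.album_id
        WHERE album.name = ?
        ORDER BY artist.name
    """, (album_name,)).fetchall()

print(contributors("Thriller"))
```

Even this simple question needs two joins through the link table, which hints at how quickly such queries grow once you start chaining relationships.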

Therefore, we realised very early on that we wanted, no, needed to store music the way a social network does, where artists are related to one another as friends who contribute to the same threads (albums, songs, singles…). For that, we needed the technology behind the LinkedIns and Facebooks of this world: a graph database.

The invention of the graph database

Graph databases are a relatively new technology that matured in the mid-2000s. Part of the NoSQL family of databases, they were designed mostly for the cases where SQL struggles. The point of a graph database is to store information the way a social network requires, by doing the following:

  • Store individuals in a social network;
  • Store all the things they may want to share between each other or collaborate on;
  • Store all the interactions between these individuals and the things they want to share or collaborate on.

Data is no longer stored in tables but in far more natural “nodes” and “relationships”. Let’s look at the example of Quincy Jones and some of his relations:

You can see on the left that he is the parent of Rashida Jones, together with Peggy Lipton, to whom he was married. You can also see that he is or was a member of four bands, is a teacher, is also the father of Jolie Levine… Anyone without knowledge of how databases work can understand what is being represented here, and it has the added bonus of answering all kinds of questions…

Now, if an artist works on a new album, no problem: add an Album node. If you discover that artists you were not aware of contributed to a given album, no biggie: just add the relationships between these artists and said album. You can even go further and store how they contributed, which is known as the “type” of the relationship between the artist and the album (vocalist, producer, lyricist…). Cool, huh?
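The node-and-relationship idea can be sketched in a few lines of plain Python. This is a toy in-memory model, not a real graph database and not Sondz's implementation; the `Graph` class, its method names, and the labels and relationship types (`Artist`, `PRODUCER`, `PARENT_OF`…) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Toy property graph: labelled nodes plus typed relationships."""
    nodes: dict = field(default_factory=dict)          # node name -> label
    relationships: list = field(default_factory=list)  # (source, type, target)

    def add_node(self, name, label):
        self.nodes[name] = label

    def relate(self, source, rel_type, target):
        # A relationship may only connect nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a relationship must be existing nodes")
        self.relationships.append((source, rel_type, target))

    def contributors(self, album):
        """Return sorted (artist, relationship type) pairs pointing at an album."""
        return sorted((src, rel) for src, rel, tgt in self.relationships
                      if tgt == album)

g = Graph()
g.add_node("Quincy Jones", "Artist")
g.add_node("Rashida Jones", "Person")
g.add_node("Thriller", "Album")
g.relate("Quincy Jones", "PARENT_OF", "Rashida Jones")
g.relate("Quincy Jones", "PRODUCER", "Thriller")

# A newly discovered contributor: one new node, one new relationship.
g.add_node("Michael Jackson", "Artist")
g.relate("Michael Jackson", "VOCALIST", "Thriller")

print(g.contributors("Thriller"))
```

Note that adding a contributor never touches existing nodes or requires a new column: the graph simply grows by one node and one relationship.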

How many degrees between Kevin Bacon and Quincy Jones?

At Sondz, we painstakingly transform and store millions of artists, albums, songs and labels in our graph database. And this has one huge benefit: we can tell how many steps it takes to link Quincy Jones to Kevin Bacon, or how many degrees of Quincy Jones Kevin Bacon has (and vice versa). Once a graph database is set up, the question can be answered in a surprisingly simple way as shown below:

By asking literally what is the shortest path in our musical social network between artists Kevin Bacon and Quincy Jones, we get the answer below in just 94 milliseconds:

Let’s break it down:

So the answer to the question you all certainly had in mind when you started reading this article is 3, as in the number of degrees of Quincy Jones for Kevin Bacon (if you only count the people in the network).
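For readers curious about what a shortest-path question involves, here is a minimal sketch of a breadth-first search over a tiny artist/album network. The data is entirely made up ("Artist A", "Album 1"…) and does not reflect the actual path between Kevin Bacon and Quincy Jones; the function name and the way degrees are counted (people on the path, minus one) are assumptions for illustration, not the query Sondz runs.

```python
from collections import deque

# Hypothetical collaborations: each pair links an artist to an album.
EDGES = [
    ("Artist A", "Album 1"), ("Artist B", "Album 1"),
    ("Artist B", "Album 2"), ("Artist C", "Album 2"),
    # A longer detour from Artist A to Artist C:
    ("Artist A", "Album 3"), ("Artist D", "Album 3"),
    ("Artist D", "Album 4"), ("Artist E", "Album 4"),
    ("Artist E", "Album 5"), ("Artist C", "Album 5"),
]
PEOPLE = {"Artist A", "Artist B", "Artist C", "Artist D", "Artist E"}

def shortest_path(edges, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    neighbours = {}
    for a, b in edges:  # relationships are treated as undirected
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

path = shortest_path(EDGES, "Artist A", "Artist C")
degrees = sum(1 for node in path if node in PEOPLE) - 1  # count only people
print(path, degrees)
```

Breadth-first search explores the network level by level, so the first time it reaches the target it has necessarily found a path with the fewest hops.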


Taking a step back and looking forward

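The “Six Degrees of Kevin Bacon” game started as a joke to illustrate the “network effect” that is at the core of our social networks. The funniest part is that, from a purely mathematical standpoint, it’s a no-brainer: it is all but certain that, through 6 levels of relationships, you will find anyone in Hollywood related to Kevin Bacon. Why? Because, through 6 levels of relationships, anyone in the world can in principle reach more than 6 billion other humans, so roughly everyone else! As a back-of-the-envelope illustration, if each person knew just 45 others and no two circles of acquaintances overlapped, 6 levels would reach 45^6, or about 8.3 billion people. Real circles do overlap, so treat this as an upper bound rather than a proof.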

Comedy notwithstanding, the graph database at the core of Sondz already allows us to show things that are rarely accessible: which artists have collaborated with a given artist, who their siblings and family are, which bands they have been a part of… All of this becomes obvious once you have this type of database. And, of course, we are just getting started…