Are you new to SQL? Or maybe you have some experience but you would still like to practice some more? There is no better way to learn than by having fun. This week, I’d like to recommend “The SQL Murder Mystery”, an SQL challenge designed by The Knight Lab of Northwestern University.
The first perk of this challenge is that the webpage contains editors that allow the user to solve the challenge directly on the page.
Secondly, it is designed for being suitable at any level: includes a tutorial for beginners, a database schema to support intermediate users, and of course the option to just jump into the challenge for the more advanced.
I encourage you to check out the challenge yourself, and have a go at it. After that, whether you feel stuck and would like a hint, or you finished promptly and would like to just confront your steps with mine, you can find my step by step solution below. Good luck!
Before we start, here are a couple of quick tips for beginners:
- Query tables with a simple ‘Select * from table_name limit 10’ to see column names and data available
- Look for common column between two tables, ideally a unique identifier (id)
STEP 1: Reading the report
We are provided some information about the murder at the beginning of the challenge:
With this information, we can write our first script. I advise to first query the crime_scene_report with a quick
SELECT * FROM crime_scene_report LIMIT 10
This will inform us on the data format: the date field, for instance, is formatted in an unusual way. Ah, and if you are wondering why add the limit clause at the end: this is a best practice to reduce querying computation and time when exploring datasets.
Once we have found out about the formats for the columns we need to query, we can type:
SELECT * FROM crime_scene_report WHERE type = ‘murder’ AND city = ‘SQL City’ AND date = ‘20180115’
This query allow us to find out that the crime had two witnesses
- The first witness lives at the last house on “Northwestern Dr”.
- The second witness, named Annabel, lives somewhere on “Franklin Ave”.
STEP 2: Reading the witnesses’ interview
Using the name and addresses of our witnesses we can query the interview table the following way:
WITH witnesses as (SELECT * FROM interview JOIN person p ON p.id = person_id WHERE (name LIKE’%Annabel%’ AND address_street_name = ‘Franklin Ave’) OR (address_street_name = “Northwestern Dr”))
The resulting table will contain several interviews, but only two of them contain information from witnesses of a murder:
Witness 1: I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.
Witness 2: I heard a gunshot and then saw a man run out. He had a “Get Fit Now Gym” bag. The membership number on the bag started with “48Z”. Only gold members have those bags. The man got into a car with a plate that included “H42W”.
STEP 3: Querying the gym database
By gathering the information from the witnesses’ testimony, we know that the murderer is a man with a gold membership starting with “48Z”, a car plate including “H42W”, and who went to the gym on January 9th. Or, as SQL speakers like to say:
SELECT * FROM get_fit_now_member m JOIN get_fit_now_check_in ON m.id = membership_id WHERE check_in_date = ‘20180109’ AND membership_status = ‘gold’ AND membership_id LIKE ’48Z%’
The result of this query leaves us with just two subjects:
STEP 4: Querying the drivers licence table
Now that we’re down to just 2 suspects, we can use their person_id to get to our final solution:
WITH p AS (SELECT name, license_id FROM person WHERE id in (‘28819’, ‘67318’)) SELECT * FROM drivers_license d JOIN p on p.license_id = d.id WHERE id IN (select license_id FROM p)
….and voilà! We have our murderer! Run the query on your editor to find our whodunnit and solve the SQL murder mystery.
A bonus challenge!
When I checked my solution in the “Check your solution” editor, the database revealed that the solution was indeed correct, but that by checking the murderer’s interview transcript, I could find out who the real villain of this story was. More mystery to solve!
Below you will find the step by step solution first, and then the one-query solution at the end.
STEP 1: Querying the interview table
Let’s start by querying the interview table. (To avoid a spoiler to those who are just here for a hint, I won’t reveal the murderer’s id).
select * from interview where person_id = ‘insert murderer’s id here’
The murdered declared”
I was hired by a woman with a lot of money. I don’t know her name but I know she’s around 5’5″ (65″) or 5’7″ (67″). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.
STEP 2: Querying the drivers’ license table
The driver’s licence table contains information on height, hair color and car model. Let’s start there:
SELECT id FROM drivers_license WHERE car_model = “Model S” AND gender = ‘female’
Now our options are restricted to three suspects. The other two pieces of information we have are that the lady we are looking for attended a concert at a specific time a specific amount of times, and that she is well off.
STEP 3: Querying the events table
Since the information regarding the event is more specific than the one regarding the income, let us start by querying the facebook_event_checkin table, and see if that is enough.
However, before querying the facebook_event_checkin table, we need to find an identifier for these three women (a person_id). So far we only have driver’s licence ids.
Let’s first query the person’s table:
SELECT * FROM person WHERE license_id IN (‘202298’, ‘291182’, ‘918773’)
Now that we have found the person ids, we can query the facebook_event_checkin table, and join it with the person table in order to obtain the person’s name as well as their id:
SELECT p.name, f.* FROM facebook_event_checkin f JOIN person p ON p.id = f.person_id WHERE f.person_id IN (‘78881’, ‘90700’, ‘99716’)
….and we got her! Congrats for making it so far. Now you can run the query to find our criminal mastermind.