You’ve seen and heard us talk a lot about Snowflake with a lot of enthusiasm at The Information Lab but once you’re ready to get your hands dirty (or wet) on Snowflake, where to start?
Once you’ve logged into your brand new environment you’re welcomed by … a lot of empty white space. So now that you’ve got your brand new ultra-fast and flexible cloud-based Database the first step would be to start gathering and loading some data.
Step 1. Gather your data
Before you know what your table(s) should look like, you should take a deeper look into your own data ; what are you trying to store?
As an example we can take a look at popular csv dataset :
“Data on the daily number of new reported COVID-19 cases and deaths by EU/EEA country”. The dataset can be downloaded here.
Step 2. Explore your data
This csv is setup like a lot of csv files ; the first line has the columnames, lines are used as splits between records and it uses a “,” as a seperator.
Lets take a look at the first 2 lines for a complete check :
We can see the following :
- dateRep is a date in a mm/dd/yyyy format
- day,month,year,cases,deaths are all whole numbers
- countriesAndTerritories,geoid,countryterritoryCode are all letters/words
- popData2020 is a whole number
- continentExp is a word
Step 3. Prepare your table
Now that we know what the data looks like , we can go to Snowflake and start building the Table.
Go to your Snowflake Database -> Tables and Create a Table.
Here you can use the wizard to start building the table with the information above. This should look something like :
We’re using Date, Integer, and String as field types in this “simple” table.
You can then press Finish to have Snowflake build the table.
Once the Table is ready, you can open it by clicking on the name. When it opens , you can see all the columns and their types, with a single clickable option : Load Table.
Step 4. Load the CSV into the Table
Loading the data can be done through the wizard that’s opened with “Load Table”.
- Select the Warehouse that you’re going to use to load the data.
- Select “Load files from your computer” and browse to the locally stored file.
- Select a File Format that you want to use. If non is selected, we can create a new one with the +-symbol: Keep all of the settings at default, no need to change anything except for 2 options :
– the “Header lines to skip”, we know that we can set this to 1.
– Set the dateformat to Other and type in dd/mm/yyyy. This format isn’t recognized by default.
- Try to use a name that makes sense (CSVSkip1st for instance)
- Click on Load.
This should conclude in the following screen :
Congratulations , you’ve just loaded data into your first Snowflake table!
Step 5. What’s next?
A single load is fun and all, but we don’t want to do this every day in the next blog post, we’ll try and see if we can use AWS to automate this procedure with the same file!
Here you can see the first part of this automation :
Scripting file download and upload to AWS S3 with Python