Last week, I wrote a blog post about downloading data, uploading it to an S3 bucket, and importing it into Snowflake (which you can find here). Now it’s time to start automating that process!
We can do the automation in a number of different ways, but let’s start with a Python script that we can run manually. I’m using Python as my language of choice, but feel free to use anything you feel comfortable with!
Step 1 Download the file
First, let’s download the file locally:
import requests

url = 'https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv/data.csv'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop here if the download failed

# Write the response body to disk; the file is closed automatically
with open('<fill in your folderpath ending with filename.type>', 'wb') as f:
    f.write(r.content)
This downloads the file to our local machine, overwriting the file if it’s already there.
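To make the overwrite behaviour concrete, here is a minimal sketch of the save step on its own. The helper name `save_download` is my own placeholder and not part of the script above. The point is only that opening a file in `'wb'` mode replaces whatever was there before.

```python
from pathlib import Path


def save_download(content: bytes, target: Path) -> int:
    """Write content to target, replacing any existing file.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    # Mode 'wb' truncates an existing file before writing, so a second run
    # overwrites the first download instead of appending to it.
    with open(target, "wb") as f:
        return f.write(content)
```

In the download script, this could be called as `save_download(r.content, Path('<your file path>'))` right after the `requests.get` call.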
Step 2 The S3 bucket
Firstly, what’s an S3 bucket?
An Amazon S3 bucket is a storage resource in Amazon Web Services’ (AWS) Simple Storage Service (S3), a public cloud object storage offering. It’s also free, as long as you stay within the fairly generous limits that applied at the time of writing: 5GB of Amazon S3 storage in the S3 Standard storage class; 20,000 GET Requests; 2,000 PUT, COPY, POST, or LIST Requests; and 15GB of Data Transfer Out each month for one year.
Now let’s start creating an S3 bucket:
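As a rough sanity check, the limits quoted above can be compared against what a daily upload would use. This is only a sketch: the class and function names are made up for illustration, the limits are copied from the paragraph above (check the AWS pricing page for current values), and the 10 MB file size is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyUsage:
    storage_gb: float
    get_requests: int
    put_requests: int  # PUT, COPY, POST, or LIST requests
    transfer_out_gb: float


# Limits as quoted in the text above, not fetched from AWS.
FREE_TIER = MonthlyUsage(storage_gb=5, get_requests=20_000,
                         put_requests=2_000, transfer_out_gb=15)


def within_free_tier(usage: MonthlyUsage, limits: MonthlyUsage = FREE_TIER) -> bool:
    """Return True if every usage figure is at or below its limit."""
    return (usage.storage_gb <= limits.storage_gb
            and usage.get_requests <= limits.get_requests
            and usage.put_requests <= limits.put_requests
            and usage.transfer_out_gb <= limits.transfer_out_gb)


# One upload per day for a 31-day month of an assumed 10 MB file that is
# overwritten each time, so storage stays at a single file's size.
daily_upload = MonthlyUsage(storage_gb=0.01, get_requests=31,
                            put_requests=31, transfer_out_gb=0.0)
print(within_free_tier(daily_upload))
```

Under those assumptions, a once-a-day upload uses only a small fraction of the request allowance.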
- Navigate to the S3 dashboard
- Click “Create bucket”
- Enter a bucket name.
- Click “Create bucket” at the bottom to accept the default settings and create the bucket.
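If you’d rather skip the console, the same steps can be sketched with boto3’s `create_bucket` call. The wrapper function, its name, and the default region below are my own assumptions. It expects a boto3 S3 client that you have already created with your credentials.

```python
def create_bucket(s3_client, bucket_name: str, region: str = "us-east-1") -> None:
    """Create an S3 bucket with default settings.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    Hypothetical wrapper for illustration.
    """
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed
        # as a LocationConstraint.
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

Keep in mind that bucket names are globally unique, so the call fails if someone else already owns the name you pick.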
Step 3 Uploading to S3
The next and last step is uploading the file to our S3 bucket:
import boto3
from botocore.exceptions import NoCredentialsError

ACCESS_KEY = '<Enter Access Key here>'
SECRET_KEY = '<Enter Secret Key here>'


def upload_to_aws(local_file, bucket, s3_file):
    # Create an S3 client that authenticates with the keys above
    s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)

    try:
        # Upload the local file to the bucket under the name s3_file
        s3.upload_file(local_file, bucket, s3_file)
        print("Upload Successful")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False


uploaded = upload_to_aws('<Enter path to local file here>', '<enter bucketname here>', '<store it in this folder using this filename in the s3 bucket>')
The above code was created by Ahmad Bilesanmi so huge shout out to him!
Let’s start filling in the gaps to make this piece of code work:
ACCESS_KEY = '<Enter Access Key here>'
SECRET_KEY = '<Enter Secret Key here>'
Enter the access key and secret key for your S3 environment here. Remember to keep these keys private and secure!
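One way to keep the keys out of the script itself is to read them from environment variables. The sketch below is an assumption on my part and not part of the original code: the helper name `load_aws_keys` is made up, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variable names that the AWS tools look for.

```python
import os
from typing import Mapping, Tuple


def load_aws_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key, secret_key) read from environment variables.

    Hypothetical helper; raises RuntimeError if either variable is missing.
    """
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None
```

With this in place, the two assignments become `ACCESS_KEY, SECRET_KEY = load_aws_keys()`. boto3 also picks up these two variables on its own when no keys are passed to `boto3.client('s3')`.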
uploaded = upload_to_aws('<Enter path to local file here>', '<enter bucketname here>', '<store it in this folder using this filename in the s3 bucket>')
To illustrate the formatting, here’s my own code as an example:
uploaded = upload_to_aws('/Users/mikedroog/Projects/whodata.csv', 'whocovid', 'whodata.csv')
Small note: I’m running on macOS, so if you’re running on Windows, your file path will look different and start with a drive letter like C:\
If you log into your bucket, you’ll see that the file is now uploaded. More importantly, if you check your Snowflake database, it will probably have loaded the new file already because of the pipe we created!
Next time we’ll complete our automation and leverage AWS Lambda and EventBridge to automate the entire download and upload, without the need for a manually run script (or a local download).
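To show how the two path styles relate, here is a small sketch using `pathlib`. The Windows path is a hypothetical equivalent of my macOS path, and `s3_key_for` is a made-up helper that derives the filename to use in the bucket from the local path.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def s3_key_for(local_path: PurePath, folder: str = "") -> str:
    """Build the S3 object name from a local path, optionally inside a folder."""
    folder = folder.strip("/")
    return f"{folder}/{local_path.name}" if folder else local_path.name


# macOS/Linux style path, as used in the example above
mac_path = PurePosixPath("/Users/mikedroog/Projects/whodata.csv")
# Hypothetical Windows equivalent; the raw string keeps the backslashes literal
win_path = PureWindowsPath(r"C:\Users\mikedroog\Projects\whodata.csv")

print(s3_key_for(mac_path))
print(s3_key_for(win_path))
```

Both calls give the same object name, because only the final filename is used. In a real script, `pathlib.Path` picks the right flavour for the operating system it runs on.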
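The bucket check can also be done from Python instead of the console. This is a minimal sketch, assuming a boto3 S3 client created as in the upload script. The function name `object_exists` is my own, and it relies on boto3’s `list_objects_v2` call.

```python
def object_exists(s3_client, bucket: str, key: str) -> bool:
    """Return True if an object named key exists in the bucket.

    s3_client is assumed to be a boto3 S3 client. Hypothetical helper.
    """
    # Prefix narrows the listing to keys that start with our filename;
    # 'Contents' is absent from the response when nothing matches.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key)
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

With my example values, `object_exists(s3, 'whocovid', 'whodata.csv')` should return `True` once the upload has succeeded.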