SQL is the cornerstone of database management and data manipulation. As data engineers and analysts, we spend a significant portion of our time crafting SQL queries to extract, transform, and analyze data. In the realm of analytics engineering, particularly with tools like dbt, maintaining clean and readable SQL code is crucial for collaboration, understanding, and maintenance. This is where SQLFluff comes to the rescue, making our dbt models concise and readable.
Enter SQLFluff
SQLFluff is an open-source linting tool for SQL code. It analyzes SQL queries and provides feedback on issues related to syntax errors, formatting inconsistencies, and adherence to coding conventions. Think of it as a spell checker for SQL - it helps ensure that your SQL code follows best practices and is consistent across your projects.
Why Use SQLFluff with dbt?
dbt is a powerful tool for managing the data transformation process in analytics projects. It allows you to define data models, tests, and documentation using SQL-based syntax. However, as projects grow in complexity and involve multiple contributors, maintaining a clean and consistent codebase becomes increasingly challenging.
SQLFluff is already implemented in dbt cloud where you can easily and seamlessly lint your sql models. we would like to achieve the same result using dbt core....
Linting your sql will help you in several ways:
- Consistency: SQLFluff enforces a consistent coding style across your dbt projects. By adhering to a consistent style guide, you make it easier for team members to understand and collaborate on the codebase. Think Upper/lower keywords, leading/trailing commas, indentations, aliasing rules, etc.
- Readability: Clean and well-formatted code is easier to read and understand. SQLFluff helps ensure that your SQL models in dbt are easy to comprehend, reducing the time needed for onboarding new team members or revisiting old code.
- Maintainability: Consistent formatting and coding conventions make your SQL models more maintainable. When making changes or debugging issues, having a standardized codebase simplifies the process and reduces the risk of introducing errors.
- Automated Checks: SQLFluff can be integrated into your dbt development workflow, allowing you to automatically check your SQL models for issues before they are committed to your version control system. This helps catch potential errors early in the development process.
Getting Started with SQLFluff and dbt
Integrating SQLFluff into your dbt workflow is quite straightforward. If you are working as the only developer, then you obviously have a clear understanding of how you want to style your code. When working with a team, you would first need to communicate and explain why it's important to have certain style rules.
A favorable result from this discussion would involve establishing an initial set of agreed-upon rules to adhere to. Additionally, you may consider implementing these rules for a specific section of the project, especially if it's extensive and encompasses multiple dbt projects.
Taking into account that you already have a dbt project, let's install the necessary packages:$ pip install sqlfluff sqlfluff-templater-dbt
You can confirm its installation by getting SQLFluff to show its version number:$ sqlfluff version
SQLFluff is a configurable SQL linter, which means you can configure your own linting rules. To achieve this we create a configuration file named .sqlfluff and place it in the root directory of your dbt project. Now we need to tell SQLFluff what kind of dialect to use and tune desired parameters for our rules.
dbtonic linting rules
dbt labs provides a .sqlfluff config file showcasing their approach (pasted below). You can use it for your project or change it to your taste. In the example config below, I used dialect = snowflake, you can use whatever warehouse you're familiar with. Run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.
[sqlfluff]
dialect = snowflake
templater = dbt
runaway_limit = 10
max_line_length = 80
indent_unit = space
[sqlfluff:indentation]
tab_space_size = 4
[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:aliasing.table]
aliasing = explicit
[sqlfluff:rules:aliasing.column]
aliasing = explicit
[sqlfluff:rules:aliasing.expression]
allow_scalar = False
[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower
[sqlfluff:rules:ambiguous.column_references] # Number in group by
group_by_and_order_by_style = implicit
Ignoring files in SQLFluff
.sqlfluffignore file (also in the root directory of your dbt project) provides a convenient way to tailor SQLFluff's linting process and improve its efficiency in analyzing SQL code within your project.
By listing the files or directories in the .sqlfluffignore file, you can customize SQLFluff's behavior to suit the specific needs of your project. This helps in focusing the linting process on relevant code while excluding irrelevant or generated files, test data, or other files that are not intended for linting.
Using SQLFluff
To get a feel for how to use SQLFluff, let's look at this sql model:
with
customers as (
select * from {{ ref("stg_customers") }}
),
fct_orders as (
select * from {{ ref('fct_orders') }}
),
customer_orders as (
select
customer_id
, MIN(order_date) AS first_order_date
, MAX(order_date) AS last_order_date
, count(order_id) AS cnt_orders
, sum(amount_usd) AS lifetime_value_usd
from fct_orders
GROUP BY 1
),
final as (
select
customers.customer_id
, customers.first_name
, customers.last_name
, customer_orders.first_order_date
, customer_orders.last_order_date
, coalesce(customer_orders.cnt_orders, 0) as cnt_orders
, customer_orders.lifetime_value_usd
from customers
left join customer_orders
on customers.customer_id = customer_orders.customer_id
)
select * from final
In order to test this model with SQLFluff, let's run the below command and check results:$ sqlfluff lint models/marts/core/dim_customers.sql
=== [dbt templater] Sorting Nodes…
=== [dbt templater] Compiling dbt project…
=== [dbt templater] Project Compiled.
== [models/marts/core/dim_customers.sql] FAIL
L: 11 | P: 11 | LT01 | Unnecessary trailing whitespace.
| [layout.spacing]
L: 13 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
L: 13 | P: 27 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 14 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
L: 14 | P: 27 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 15 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
All Finished 📜 🎉!
Now, fixing that sql model and applying the rules we can run a fix command:$ sqlfluff fix models/marts/core/dim_customers.sql
==== fixing violations ====
30 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] …
Attempting fixes…
Persisting Changes…
== [models/marts/core/dim_customers.sql] FIXED
Done. Please check your files to confirm.
All Finished 📜 🎉!
and voila! We fixed the sql model according to our style of choice, and this is the newly fixed model:
with
customers as (
select * from {{ ref("stg_customers") }}
),
fct_orders as (
select * from {{ ref('fct_orders') }}
),
customer_orders as (
select
customer_id,
MIN(order_date) as first_order_date,
MAX(order_date) as last_order_date,
COUNT(order_id) as cnt_orders,
SUM(amount_usd) as lifetime_value_usd
from fct_orders
group by 1
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.last_order_date,
customer_orders.lifetime_value_usd,
COALESCE(customer_orders.cnt_orders, 0) as cnt_orders
from customers
left join customer_orders
on customers.customer_id = customer_orders.customer_id
)
select * from final
By following these steps, you can leverage SQL fluff to enhance the quality and consistency of your SQL models in dbt, ultimately improving the maintainability and readability of your analytics codebase.
Give SQL fluff a try in your dbt projects and experience the benefits firsthand!
Thank you for reading this blog. Also check out our other blogs page to view more blogs on Power BI, Tableau, Alteryx, and Snowflake here.
Work together with one of our consultants and maximize the effects of your data. Contact us, and we'll help you right away.