In today’s fast-paced business environment, data is a critical asset that can make or break an organization’s success. To get the most value out of data, companies need to ensure they are measuring the right metrics accurately and consistently. This is where dbt (data build tool) comes in as a popular data transformation and modeling tool.
With dbt, businesses can easily define and calculate the metrics that are essential to their operations. Revenue, user engagement, customer churn, and product adoption are some examples of metrics businesses track to understand their performance. Defined in dbt, these metrics can be tracked reliably and consistently over time, enabling organizations to make data-driven decisions.
In this blog, we will dive deeper into dbt metrics to see how they can be defined and calculated. So, let’s get started and explore how dbt can help you better measure and understand your business performance.
There are several benefits of using dbt to manage metrics in your data pipeline:

- Metric logic lives in one place, so every team works from the same definitions, calculations, and business rules.
- Metric definitions are version-controlled alongside your models, making changes reviewable and auditable.
- Once defined, a metric can be reused across queries and downstream tools instead of being re-implemented in each one.
Please check out dbt Labs’ official ‘jaffle_shop_metrics’ repository and fork it to your GitHub account in advance so that it can be referenced from dbt Cloud.
I assume you already have a dbt account; if you don’t have one, you should definitely check it out!
I will also use a Snowflake account; if you don’t have one, feel free to create a free Snowflake trial account.
Now, let’s create a new dbt project using the remote repository we have just forked and connect it to a Snowflake account.
In order to check whether the objects on Snowflake can be operated correctly from dbt, we’ll do the next steps:

1. dbt deps (for installing external packages and managing dependencies between models). Our packages.yml looks like this:

```yaml
packages:
  - package: dbt-labs/metrics
    version: [">=1.4.0-a1", "<1.5.0"]
  - package: dbt-labs/dbt_utils
    version: 1.0.0
```

2. dbt seed and dbt run, to check whether the schema and tables are created in the target DWH.
example_develop_metric has an error. It is an example of writing a query using metrics, and we will ignore it for now, either by deleting the model or by adding a disable config to it.

Let’s have a look at the way data is structured in our pipeline, checking dependencies and derived metrics. Check out the different intermediate models to understand the underlying data and how it was joined together.
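The disable config can be added at the top of the model’s SQL file using dbt’s standard config block; this leaves the file in place but excludes the model from runs:

```sql
-- models/example_develop_metric.sql
{{ config(enabled=false) }}

-- ...the rest of the model's query stays unchanged
```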
Let’s have a look at /models/marts/average_order_amount.yml. It defines a metric that displays the average order value by day, week, and month, broken down by dimensions such as customer status.
Note: the dimensions attribute in /models/marts/average_order_amount.yml (the metric yaml file) needs to be corrected. It lists a few fields starting with ‘has’, but the fields in the orders model actually start with ‘had’.
```yaml
version: 2

metrics:
  - name: average_order_amount
    label: Average Order Amount
    model: ref('orders')
    description: "The average size of a jaffle order"
    calculation_method: average
    expression: amount
    timestamp: order_date
    time_grains: [day, week, month]
    dimensions:
      - had_credit_card_payment
      - had_coupon_payment
      - had_bank_transfer_payment
      - had_gift_card_payment
      - customer_status
```
A few mandatory and optional attributes have been used here:

- name: a unique string used to refer to the metric.
- model: the model the metric is built on, referenced as ref('<name_of_model>').
- label: an arbitrary string to express the metric more clearly.
- calculation_method: specifies how the metric is calculated (average here; other methods are available).
- expression: the field or expression the calculation is applied to.
- timestamp: the time-based component of the metric.
- time_grains: uses the field specified by timestamp to define the granularity of aggregation, such as day, week, or month.
- dimensions: a list of fields to group or filter the metric by.

You can read about all available properties in the official docs.
revenue is yet another metric; it calculates the total sales of the jaffle shop business. One more attribute appears here: filters.
```yaml
version: 2

metrics:
  - name: revenue
    label: Revenue
    model: ref('orders')
    description: "The total revenue of our jaffle business"
    calculation_method: sum
    expression: amount
    timestamp: order_date
    time_grains: [day, week, month, year]
    dimensions:
      - customer_status
      - had_credit_card_payment
      - had_coupon_payment
      - had_bank_transfer_payment
      - had_gift_card_payment
    filters:
      - field: status
        operator: '='
        value: "'completed'"
```
filters is a list of dictionaries that further restricts the metric based on specific conditions. All three properties (field, operator, and value) are required for each filter. When calculating revenue, we sum up amounts for completed orders only.
Another calculation_method that can be used is derived. A derived metric is calculated from two or more already-defined metrics via its expression attribute; check out profit.yml, where the expression subtracts one metric from another. Note that the metrics used in the expression share the specified time_grains and dimensions.
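As a sketch of what such a definition looks like (this assumes an expenses metric has been defined alongside revenue; the exact expression in profit.yml may differ):

```yaml
version: 2

metrics:
  - name: profit
    label: Profit
    description: "Revenue minus expenses"
    calculation_method: derived
    # References to other metrics, combined with ordinary SQL arithmetic
    expression: "{{ metric('revenue') }} - {{ metric('expenses') }}"
    timestamp: order_date
    time_grains: [day, week, month, year]
    dimensions:
      - customer_status
```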
After defining the metrics we would also like to use them :). To query your defined metric, you must have the dbt_metrics package installed.
A metric is queried through the package’s calculate macro, passing the metric name, a grain, and optional dimensions:

```sql
select *
from {{ metrics.calculate(
    metric('<metric_name>'),
    grain='<time_grain>',
    dimensions=['<dimension1_name>', '<dimension2_name>']
) }}
```
Secondary calculations refer to additional calculations or transformations performed on top of the base metrics to derive more complex or nuanced metrics.
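The dbt_metrics package exposes these through the secondary_calculations argument of metrics.calculate. A sketch using the revenue metric defined above (the alias names here are arbitrary):

```sql
select *
from {{ metrics.calculate(
    metric('revenue'),
    grain='month',
    dimensions=['customer_status'],
    secondary_calculations=[
        -- cumulative revenue since the start of each year
        metrics.period_to_date(aggregate="sum", period="year", alias="ytd_revenue"),
        -- change versus the previous month
        metrics.period_over_period(comparison_strategy="difference", interval=1, alias="vs_prev_month")
    ]
) }}
```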
In cases where we want to see a metric in the context of a particular filter which isn’t necessarily part of the metric definition, we can use a where clause. It takes a list of sql statements and adds them in as filters to the final CTE in the produced SQL.
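For example (a sketch; the specific filter shown is an assumption based on the fields in this project):

```sql
select *
from {{ metrics.calculate(
    metric('revenue'),
    grain='month',
    -- ad-hoc filter applied on top of the metric's own definition
    where="customer_status = 'returning'"
) }}
```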
In dbt, metrics are predefined calculations based on one or more columns of a model. They can be used to measure different aspects of the data and are commonly used to track performance, monitor KPIs (Key Performance Indicators), and derive insights.
The dbt Semantic Layer can leverage metrics to provide a consistent and standardized way of measuring and analyzing data. By defining metrics in the Semantic Layer, organizations can ensure that all users are using the same calculations, definitions, and business rules.