🟡 HubSpot Operations Practical Textbook — 2026 Edition
Chapter 4

Data Studio

“I want to analyze HubSpot's CRM data together with sales data in Snowflake.” “I want to match a budget table in Google Sheets against the deal pipeline.” Until now, the only options were to have engineers write ETL pipelines or to merge CSV files by hand. Data Studio lets you blend and transform multiple data sources, apply formulas, and build datasets in a spreadsheet-like no-code UI. This chapter covers the overall picture of Data Studio, connecting data sources, transformation and blending, AI formula suggestions, and pipeline design for each use case.

📖 Estimated reading time: 25 minutes
🎯 Target audience: RevOps, data analysts, BI engineers, HubSpot administrators
🔧 Required plan: Operations Hub Professional or higher

📋 Contents of this chapter

  1. 4-1 What is Data Studio? Differences from ETL tools and architecture
  2. 4-2 Connecting data sources—HubSpot CRM, external DBs, spreadsheets
  3. 4-3 Data transformation and blend design (joins, aggregations, filters)
  4. 4-4 Formula suggestions by AI—automatic generation of calculated columns and conditional classifications
  5. 4-5 Pipeline design by use case—reports, scoring, enrichment
Section 4-1

What is Data Studio? Differences from ETL tools and architecture

Data Studio is a no-code data integration and transformation platform added to HubSpot Data Hub in 2025. It combines and transforms HubSpot CRM data with multiple external data sources (Snowflake, BigQuery, AWS S3, Google Sheets, CSV files) in a spreadsheet-like UI, and publishes the result as a dataset for use within HubSpot.

📊 Data Studio — Revenue Attribution Dataset (example)

Sources: 🟡 HubSpot CRM / ❄️ Snowflake / 📊 Google Sheets
Transform: 🔗 JOIN settings / ⚙️ calculated columns / 🔍 filters
Output: dataset publication → 📈 report linkage

Preview — Revenue Attribution Dataset (150 rows): HubSpot Deals (CRM object) LEFT JOIN SF Revenue Data (Snowflake table), plus 3 AI-generated calculated columns.

| Deal Name | HS Amount | SF Revenue | Diff % | Status |
|---|---|---|---|---|
| ABC Co., Ltd. new introduction | ¥2,400,000 | ¥2,400,000 | 0.0% | ✓ Match |
| XYZ Corp Enterprise | ¥8,500,000 | ¥8,200,000 | -3.5% | ⚠ Confirmation required |
| Tanaka Seisakusho additional license | ¥1,200,000 | ¥1,200,000 | 0.0% | ✓ Match |

Calculated column [Diff %] = (SF_REVENUE - HS_AMOUNT) / HS_AMOUNT * 100

Differences from ETL tools and BI tools

| | Data Studio (HubSpot) | ETL tools (Fivetran / dbt / Airbyte) | BI tools (Looker / Tableau / Power BI) |
|---|---|---|---|
| Strengths | Completes within HubSpot (no tool switching); no-code UI, no engineers needed; native integration with CRM data; connects directly to HubSpot reports | Supports 200+ data sources; handles complex transformations and large-scale data; supports version control and CI/CD | Advanced visualization; real-time analysis of large-scale data; interactive drill-down analysis |
| Weaknesses | Supported data sources are limited; limits on complex transformation logic | Engineering knowledge required; cost and management of a separate tool; extra setup to join with HubSpot data | Hard to integrate bidirectionally with HubSpot; high cost ($50+/user/month); cannot write transformed values back to the CRM |
💡 Scenarios where Data Studio shines the most

"I want to combine HubSpot CRM data with one or two external data sources and use the results for HubSpot reporting and automation." Data Studio fits this need exactly. If large-scale, complex ETL is required, combining it with dbt/Fivetran is more appropriate. Data Studio can cover roughly 80% of the day-to-day data integration and reconciliation work that RevOps teams handle.

Section 4-2

Connecting data sources—HubSpot CRM, external DB, spreadsheets

There are two main types of data sources that can be connected to Data Studio: "HubSpot internal sources" and "external sources." Authentication settings are required to connect to external sources, and the sources available vary depending on the plan.

🟡 HubSpot CRM objects (internal source)
Connects to all properties and associated data for contacts, companies, deals, tickets, and custom objects. References the latest CRM values in real time.
Plan: Professional and above (all property access)

❄️ Snowflake (external data warehouse)
Connects directly to tables in a Snowflake database schema. Simply set the account identifier, warehouse name, role, and credentials, and tables become referenceable from Data Studio.
Plan: Enterprise (full integration) / Professional (read only)

🔷 BigQuery (external data warehouse)
Connects to BigQuery dataset tables in Google Cloud. Authenticates with a service account JSON key. Often used to combine GA4 or Google Ads data with HubSpot CRM.
Plan: Enterprise (full) / Professional (read only)

🟠 AWS S3 (object storage)
Reads CSV/Parquet files in an S3 bucket directly from Data Studio. Authenticates with an IAM role or access key. Used to combine with data uploaded to S3 on a regular schedule.
Plan: Professional and above

📊 Google Sheets (spreadsheet)
Connects to any sheet after authenticating with a Google account. The easiest option for merging data managed outside the CRM, such as budget tables, territory maps, and product price lists.
Plan: Professional and above

📄 CSV file upload (file source)
Upload a local CSV and use it as a one-off data source. Convenient for reference data that does not need regular updates (industry benchmarks, ICP classification tables, etc.). Maximum file size: 100 MB.
Plan: Professional and above

Best practices for connection settings

| Connection | Recommended authentication | Precautions |
|---|---|---|
| Snowflake | Dedicated service account + read-only role | Do not grant write permission to the production database. Create a read-only role dedicated to Data Studio |
| BigQuery | Service account JSON key + BigQuery Data Viewer role | Do not grant project owner privileges. Set a maximum scan volume to manage query costs |
| AWS S3 | IAM user (least privilege: s3:GetObject only) | Restrict access to a specific prefix (folder) rather than the entire bucket |
| Google Sheets | OAuth (Google account authentication) | A connection made with a departed employee's account stops retrieving data once that account is deactivated. Connect with a shared service account |
Section 4-3

Data transformation and blend design (joins, aggregations, filters)

The core of Data Studio is the ability to build pipelines that JOIN, transform, and aggregate multiple tables with no code. As you drag and drop each step in the UI, an SQL query is generated automatically behind the scenes.

🔄 Data Studio transformation pipeline (5 steps)

  1. 📥 Source selection: e.g. all contact properties, all company properties
  2. 🔗 JOIN settings: select the join key (e.g. email address / ID) and JOIN type (LEFT / INNER / FULL)
  3. 🔍 Filter: exclude unneeded rows (e.g. only Lifecycle Stage = Customer, only amount > 0)
  4. ⚙️ Transform / calculated columns: add new columns using formulas, conditional branching, and string operations. You can also ask AI for suggestions.
  5. 📤 Dataset publication: save the completed table as a dataset, available from HubSpot reports and workflows
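Conceptually, the five steps above behave like the following plain-Python sketch. This is purely illustrative: Data Studio generates SQL behind the scenes, and the field names and the ARR calculated column here are hypothetical examples, not HubSpot APIs.

```python
# Hypothetical input rows (field names are made up for illustration)
contacts = [
    {"email": "a@example.com", "lifecycle_stage": "customer"},
    {"email": "b@example.com", "lifecycle_stage": "lead"},
]
deals = [
    {"email": "a@example.com", "amount": 2400000},
]

# Step 2: LEFT JOIN on email — contacts with no matching deal keep None
deal_by_email = {d["email"]: d for d in deals}
joined = [
    {**c, "amount": deal_by_email.get(c["email"], {}).get("amount")}
    for c in contacts
]

# Step 3: filter — keep only Lifecycle Stage = customer
rows = [r for r in joined if r["lifecycle_stage"] == "customer"]

# Step 4: calculated column — e.g. annualize a monthly amount
for r in rows:
    r["arr"] = (r["amount"] or 0) * 12

print(rows)
```

The output is a single row for a@example.com with an added `arr` column; the lead contact is filtered out, exactly as steps 3 and 4 describe.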

Types and usage of JOIN

| JOIN type | Behavior | Typical usage with HubSpot |
|---|---|---|
| LEFT JOIN (recommended default) | Keeps all rows of the left table; rows with no match in the right table become NULL | "All HubSpot contacts, plus Snowflake sales data where it exists." Contacts never disappear |
| INNER JOIN | Keeps only rows that match in both tables | "Analyze only rows that exist in both HubSpot deals AND Snowflake billing records" |
| FULL OUTER JOIN | Keeps all rows from both tables; unmatched rows are NULL | "Detect both deals that are in HubSpot but not in Snowflake (missed billing) and deals that are in Snowflake but not in HubSpot (unregistered deals)" |
⚠️ A data-type mismatch in JOIN keys is the most common cause of errors

Trying to JOIN a HubSpot contact ID (numeric) to a Snowflake customer_id (string) fails. Unify the data types with a calculated column before the JOIN (example: convert the HubSpot ID to a string with CAST(contact_id AS VARCHAR)). The fastest route is to ask the AI from the UI's "Add calculated column" to "convert this column to a string."
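A tiny Python sketch of why the mismatch bites and how casting fixes it (the IDs and field names below are made up; in Data Studio the cast happens in a calculated column, not in Python):

```python
# Numeric ID on the HubSpot side, string ID on the Snowflake side
hs_deals = [{"contact_id": 301, "amount": 500000}]
sf_rows = [{"customer_id": "301", "revenue": 480000}]

sf_by_id = {r["customer_id"]: r for r in sf_rows}

# Naive join fails silently: the int 301 never matches the string "301"
assert sf_by_id.get(hs_deals[0]["contact_id"]) is None

# Fix — the equivalent of CAST(contact_id AS VARCHAR) before the JOIN
joined = [
    {**d, **sf_by_id.get(str(d["contact_id"]), {})}
    for d in hs_deals
]
print(joined[0])
```

After the cast, the row carries both `amount` and `revenue`, which is what a successful JOIN should produce.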

Section 4-4

Formula suggestion by AI—Automatic generation of calculated columns and condition classifications

One of Data Studio's major differentiating features is the "have AI suggest a formula" function. On the "Add calculated column" screen, give an instruction in natural language, such as "I want to calculate ARR from this data" or "I want to classify tiers based on industry and company size," and the AI generates the corresponding formula.

🤖 Examples of AI formula suggestions
Give an instruction in natural language via the "Ask AI to suggest a formula" button, and the formula is generated automatically.

1. ARR calculation
Instruction: "Calculate annual recurring revenue (ARR) from the monthly contract amount"
= IF(BILLING_CYCLE = "Monthly", AMOUNT * 12, IF(BILLING_CYCLE = "Quarterly", AMOUNT * 4, AMOUNT))
Annualizes the amount according to the billing cycle (monthly, quarterly, yearly). A safe design: if BILLING_CYCLE is unrecognized, AMOUNT is returned as is.

2. Customer tier classification
Instruction: "Classify customers into three tiers based on ARR and employee count"
= CASE WHEN ARR >= 5000000 AND NUM_EMPLOYEES >= 100 THEN "Enterprise" WHEN ARR >= 1000000 OR NUM_EMPLOYEES >= 50 THEN "Mid-Market" ELSE "SMB" END
A CASE expression: ARR of ¥5M or more and 100+ employees is Enterprise; ARR of ¥1M or more or 50+ employees is Mid-Market; everything else is SMB.

3. Health score calculation
Instruction: "Calculate a customer health score from login frequency, support ticket count, and number of features used"
= LEAST(100, (COALESCE(LOGIN_COUNT_30D, 0) * 2) + (COALESCE(FEATURES_USED, 0) * 5) - (COALESCE(OPEN_TICKETS, 0) * 10))
Score = logins × 2 points + features used × 5 points − unresolved tickets × 10 points. COALESCE converts NULL to 0; LEAST caps the score at 100.

4. Date difference / churn detection
Instruction: "Calculate the days elapsed since the last login and classify churn risk"
= CASE WHEN DATEDIFF('day', LAST_LOGIN_DATE, CURRENT_DATE) > 60 THEN "High Risk" WHEN DATEDIFF('day', LAST_LOGIN_DATE, CURRENT_DATE) > 30 THEN "Medium Risk" ELSE "Healthy" END
More than 60 days since the last login is high churn risk, 31 to 60 days is medium risk, and 30 days or less is healthy. DATEDIFF computes the day count.

5. Email domain extraction
Instruction: "Extract the company domain from the email address and exclude free email providers"
= CASE WHEN SPLIT_PART(EMAIL, '@', 2) IN ('gmail.com','yahoo.co.jp','hotmail.com','outlook.com') THEN NULL ELSE SPLIT_PART(EMAIL, '@', 2) END
Extracts the part after @ and returns NULL for major free email domains. Use this when only a business email domain should serve as a company identifier.
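For illustration, two of the suggested formulas can be re-expressed in Python. This is a sketch of the logic only; the column and function names mirror the SQL above, and the dates are hypothetical.

```python
from datetime import date


def churn_risk(last_login, today):
    """Mirror of the DATEDIFF churn formula above."""
    days = (today - last_login).days  # DATEDIFF('day', LAST_LOGIN_DATE, CURRENT_DATE)
    if days > 60:
        return "High Risk"
    if days > 30:
        return "Medium Risk"
    return "Healthy"


FREE_DOMAINS = {"gmail.com", "yahoo.co.jp", "hotmail.com", "outlook.com"}


def company_domain(email):
    """Mirror of the SPLIT_PART domain-extraction formula above."""
    domain = email.split("@", 1)[1]  # SPLIT_PART(EMAIL, '@', 2)
    return None if domain in FREE_DOMAINS else domain


print(churn_risk(date(2026, 1, 1), date(2026, 3, 15)))  # 73 days -> High Risk
print(company_domain("taro@gmail.com"))       # free domain -> None
print(company_domain("taro@example.co.jp"))   # business domain kept
```

Running the logic locally like this is a quick way to sanity-check boundary behavior (e.g. that exactly 30 days still counts as Healthy) before trusting a generated formula.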
✅ Tips for using AI formula suggestions

Instead of just "Calculate ARR," say: "The BILLING_CYCLE field has three values (monthly, quarterly, yearly); create an annualized ARR column by multiplying by 12, by 4, or leaving the value as is, respectively." The more specifically you state field names, conditions, and expected output, the more accurate the suggestion.

Section 4-5

Pipeline design by use case—reports, scoring, enrichment

The actual usage patterns of Data Studio can be broadly classified into three types. Let's take a closer look at each pipeline design.

Pattern A: Building a Revenue Attribution report

By reconciling deal amounts in HubSpot against billing data in the finance system, you can detect problems with forecast accuracy, unbilled deals, and duplicate registrations.

💰 Revenue Attribution Pipeline
Matching HubSpot Opportunities x Snowflake Billing Data
1
Source selection: HubSpot Deals (Closed Won only)
Select the Deals object and filter to Closed Won deals only, so that the reconciliation covers actual won revenue (get: deal_id, deal_name, amount).
2
JOIN: Snowflake's revenue_recognition table
Join key: hs_deal_id = sf_deal_id (mapped in advance on the Snowflake side). JOIN type: FULL OUTER JOIN (to detect records that exist on only one side)
JOIN snowflake.revenue.recognition ON hs_deals.deal_id = revenue.hs_deal_id
3
Add calculated columns: difference amount, difference rate, status flag
Tell the AI ​​to "Calculate the difference between HubSpot's amount and Snowflake's recognized_revenue, and flag any difference greater than 5% as 'needs confirmation'"
diff_amount = COALESCE(recognized_revenue, 0) - COALESCE(amount, 0)
diff_pct = diff_amount / NULLIF(amount, 0) * 100
status_flag = CASE WHEN ABS(diff_pct) > 5 THEN 'Confirmation required' ELSE 'Normal' END
4
Publish dataset → Connect to HubSpot reports
Publish as the "Revenue Reconciliation Dataset". Select this dataset in HubSpot's custom reports to create a list view of deals needing confirmation and a trend graph of monthly variance amounts.
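Steps 2 and 3 amount to the following reconciliation logic, sketched in Python with made-up deal IDs. Data Studio runs this as generated SQL; the sketch only shows the FULL OUTER JOIN and flagging behavior.

```python
# deal_id -> amount in HubSpot, deal_id -> recognized revenue in Snowflake
hs = {"D-1": 2400000, "D-2": 8500000, "D-3": 1200000}
sf = {"D-1": 2400000, "D-2": 8200000, "D-4": 300000}

results = []
for deal_id in sorted(set(hs) | set(sf)):  # FULL OUTER JOIN: keys from either side
    amount = hs.get(deal_id, 0)
    revenue = sf.get(deal_id, 0)
    diff = revenue - amount
    diff_pct = diff / amount * 100 if amount else None  # NULLIF(amount, 0)
    flag = (
        "Confirmation required"
        if diff_pct is None or abs(diff_pct) > 5
        else "Normal"
    )
    results.append((deal_id, diff, diff_pct, flag))

for row in results:
    print(row)
```

D-2 differs by about -3.5%, inside the 5% tolerance, so it stays Normal; D-3 (in HubSpot only, missed billing) and D-4 (in Snowflake only, unregistered deal) are both flagged, which is exactly why FULL OUTER JOIN is chosen here.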

Pattern B: Customer health scoring dataset

The use case is to combine HubSpot CRM + product usage data (BigQuery) + support tickets (Zendesk API) to calculate a customer health score and write it back to contact properties in HubSpot. The CS team's weekly manual merging of CSV files can now be completely automated.

🏥 Customer Health Scoring Pipeline
Integrated score calculation of CRM x product usage data x support data
1
Source ①: HubSpot Contacts (Customer only)
Filter by Lifecycle Stage = "customer". Get: contact_id, email, company, contract_start_date, csm_owner_id, plan_type
2
Source ②: BigQuery product_usage table (LEFT JOIN)
Join key: email. Get: login_count_30d, features_used_count, last_login_date, session_duration_avg
3
Source ③: S3 zendesk_tickets_weekly.csv (LEFT JOIN)
References the weekly export of Zendesk data to S3. Join key: customer_email. Get: open_ticket_count, avg_response_hours, csat_score_avg
4
Calculated column: Health score + risk classification
Ask AI to suggest a formula to calculate your score out of 100 and High/Medium/Low risk tier.
health_score = LEAST(100, (COALESCE(login_count_30d,0)*2) + (COALESCE(features_used_count,0)*5) + (COALESCE(csat_score_avg,3)*8) - (COALESCE(open_ticket_count,0)*10))
risk_tier = CASE WHEN health_score >= 70 THEN 'Green' WHEN health_score >= 40 THEN 'Yellow' ELSE 'Red' END
5
Publish dataset → write back to CRM with workflow
Refresh the dataset on a weekly schedule. A HubSpot workflow (Chapter 7) reads the dataset values and automatically updates the contact's health_score and risk_tier properties.
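The score formula in step 4, re-expressed as a Python sketch. Note one assumption: Python's `or` replaces 0 as well as None, unlike SQL's COALESCE, which only replaces NULL; that difference is acceptable for this illustration.

```python
def health_score(login_count_30d, features_used, csat_avg, open_tickets):
    # COALESCE(x, default) approximated by "x or default" (also catches 0)
    score = (
        (login_count_30d or 0) * 2
        + (features_used or 0) * 5
        + (csat_avg or 3) * 8
        - (open_tickets or 0) * 10
    )
    return min(100, score)  # LEAST(100, ...)


def risk_tier(score):
    if score >= 70:
        return "Green"
    if score >= 40:
        return "Yellow"
    return "Red"


s = health_score(login_count_30d=20, features_used=6, csat_avg=4, open_tickets=1)
print(s, risk_tier(s))  # 92 Green
```

A contact with all-NULL usage data lands at 24 points (the CSAT default of 3 alone) and is classified Red, which is the conservative behavior you want for customers who never log in.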
⚡ Concept of dataset update schedule

Data Studio datasets can be refreshed manually or on a schedule (hourly at the shortest). As a rule of thumb: every 1 to 4 hours for dashboards that need fresh data, daily for monthly reporting, and weekly for datasets with heavy transformation processing. Scheduled refreshes consume API quota, so increase the frequency only when you genuinely need near-real-time data.

📌 Chapter 4 Summary

Data Studio is the fastest route for "CRM × external data" combinations

ETL tools (dbt/Fivetran) are flexible but require engineering effort. BI tools (Tableau/Looker) offer powerful visualization but cannot write values back to the CRM. Data Studio is the fastest no-code way to meet the need of "combining HubSpot CRM data with 1 to 3 external sources and using the results within HubSpot."

Six types of data sources are available

HubSpot CRM (internal), Snowflake, BigQuery, AWS S3, Google Sheets, and CSV upload. External connections require a dedicated service account and least-privilege authentication. Connect with a shared organizational account rather than an individual employee's personal account, which stops working when that person leaves.

JOIN defaults to LEFT JOIN—beware of type mismatch

For most use cases, LEFT JOIN (keeps all HubSpot records) is the correct choice. Use FULL OUTER JOIN only when you want to detect differences between records that exist on both sides. Mismatched JOIN key data types (numeric vs. string) is the most common cause of errors—convert the type in the computed column before joining.

For AI formula suggestions, state field names, conditions, and expected values specifically

Instead of just "Calculate ARR," give a concrete instruction such as: "The BILLING_CYCLE field has three values, Monthly/Quarterly/Annual; create an ARR column that multiplies Monthly by 12, Quarterly by 4, and leaves Annual as is." Always preview the generated formula before applying it to production.

Revenue Reconciliation detects “discrepancies between CRM numbers and financial numbers”

With a FULL OUTER JOIN of HubSpot Opportunities x Snowflake billing data, you can automatically detect opportunities that are in HubSpot but have not been billed or that have been billed but are not in HubSpot. By simply flagging discrepancies over 5% as "confirmation required" and creating a report, the monthly sales reconciliation process will be reduced from 1 hour to 5 minutes.

Set refresh schedules to match data freshness requirements

Not every dataset needs real-time refreshes. A good guideline: every 1 to 4 hours for dashboards, daily for monthly analysis, and weekly for heavy join processing. Refreshing too frequently consumes API quota and can affect other processes. When setting a schedule, ask how stale the data can be before it affects decision-making.

Next Chapter
Chapter 5: Data Agent and Smart Properties——AI automatically generates data →