🟡 HubSpot Operations Practical Textbook — 2026 Edition
Chapter 6

Data warehouse integration (Enterprise)
Two-way integration with Snowflake / BigQuery

“I want to sync all of HubSpot's CRM data in real time to Snowflake, which our analytics team uses.” “I want to use scores from BigQuery machine learning models to drive HubSpot automation.” Data warehouse integration (DWH integration) is what solves both problems. This chapter systematically explains bidirectional synchronization with Snowflake and BigQuery: the architecture, schema design, setup procedures, data lineage tracking, and integration design with BI tools.

📖 Estimated reading time: 25 minutes
🎯 Target: Data engineers, RevOps, BI staff, CTO
🔧 Required plan: Operations Hub Enterprise (required)

📋 Contents of this chapter

  1. 6-1 Architecture of DWH integration—what gets synchronized in which direction?
  2. 6-2 Setting up a Snowflake connection—schema design and authentication settings
  3. 6-3 Setting up a BigQuery connection—project settings and schema
  4. 6-4 Data lineage tracking / BI collaboration / operation design
Section 6-1

Architecture of DWH integration—what gets synchronized in which direction?

HubSpot Data Hub Enterprise's DWH integration syncs all of your HubSpot CRM object data to your data warehouse in near real time. It is not a mere export: change-sensitive differential synchronization (CDC: Change Data Capture) reflects updates made on the HubSpot side in the DWH within minutes.
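The CDC idea can be sketched in a few lines: keep a watermark of the last successful sync and transfer only rows modified since then. This is an illustrative Python sketch, not HubSpot's actual implementation (the real sync is driven internally and does not require a full-table scan):

```python
from datetime import datetime, timezone

def changed_rows(rows, last_sync):
    """Return only rows modified after the previous sync watermark (CDC-style diff).

    `rows` is a list of dicts with an `updated_at` datetime. Illustrative only:
    a production pipeline would read a change feed rather than scan every row.
    """
    return [r for r in rows if r["updated_at"] > last_sync]

rows = [
    {"hs_object_id": 1, "updated_at": datetime(2026, 3, 1, tzinfo=timezone.utc)},
    {"hs_object_id": 2, "updated_at": datetime(2026, 3, 3, tzinfo=timezone.utc)},
]
# Only record 2 changed after the last sync on March 2, so only it is transferred.
delta = changed_rows(rows, last_sync=datetime(2026, 3, 2, tzinfo=timezone.utc))
```

Transferring only `delta` instead of all rows is what keeps the 15-minute sync cadence cheap even for large portals.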

🏗️ HubSpot ↔ Data Warehouse Unified Architecture

🟡 HubSpot Smart CRM (all objects / custom objects, engagement / activity history)
  👤 Contacts: all contact properties
  🏢 Companies: all company properties
  💰 Deals: deals / pipeline
  🎫 Tickets: support tickets

↓ HubSpot → DWH: CDC differential sync (changed rows only, as often as every 15 minutes)
↑ DWH → HubSpot: batch writeback (scheduled)

❄️ Snowflake
  V2_LIVE schema (15-minute updates)
  V2 schema (daily batch)

🔷 BigQuery
  hubspot_live dataset (15-minute updates)
  hubspot_daily dataset (daily)

Legend: CDC differential sync transfers only changed rows (as often as every 15 minutes). Writeback sends property updates from the DWH to HubSpot. Full snapshots run as a daily batch.

Types of data synchronized and constraints

| Data category | Sync to DWH | Writeback from DWH | Remarks |
|---|---|---|---|
| Standard objects (Contacts / Companies / Deals / Tickets) | ✓ All properties | ✓ Property values can be written back | Deleted records are tracked with the is_deleted flag |
| Custom objects | ✓ All properties | ✓ Supported | Enterprise custom objects are also synced |
| Engagements (email opens / clicks / activities) | ✓ Synced as event tables | ✗ Not writable | Analytics only; writes from the DWH are not possible |
| Marketing email (send history / subscription status) | ✓ Synced | △ Subscription status only | Send history cannot be overwritten |
| Attachments / images | ✗ Not supported | ✗ Not supported | Binary data is not synced; only URLs are synced |
| Workflow execution logs | ✓ Synced (Enterprise) | ✗ Not writable | Audit / compliance use |
💡 How to use V2_LIVE schema and V2 schema (Snowflake)

Snowflake automatically generates two schemas. V2_LIVE is a near-real-time schema updated every 15 minutes; use it for dashboards and checking the current day's business figures. V2 is a daily batch snapshot of all records; use it for retaining history and for heavy analytical queries. V2 is recommended as the main data source for BI tools, because V2_LIVE is expensive when queried repeatedly (each query retrieves the latest data).

Section 6-2

Setting up a Snowflake connection—schema design and authentication settings

Integration with Snowflake is configured from the "Data Warehouse" settings on the HubSpot side. Once connected, HubSpot automatically creates the database schema and tables in Snowflake and keeps them in sync from then on; there is no need to create tables manually.

❄️ Snowflake integration setup instructions
Time required: 30-60 minutes (requires Snowflake administrator privileges)
1
Snowflake side
Create a dedicated role, warehouse, and database for HubSpot
Run the following with Snowflake administrator privileges. Create a least-privilege role for HubSpot and assign it a dedicated warehouse. Separating it from the production database simplifies cost management and access control.
-- Create the role
CREATE ROLE HUBSPOT_ROLE;

-- Create the warehouse (X-Small is sufficient; scale up later as needed)
CREATE WAREHOUSE HUBSPOT_WH
  WITH WAREHOUSE_SIZE = 'X-SMALL'
       AUTO_SUSPEND = 60
       AUTO_RESUME = TRUE;

-- Create the database
CREATE DATABASE HUBSPOT_CRM;

-- Grant privileges
GRANT USAGE ON WAREHOUSE HUBSPOT_WH TO ROLE HUBSPOT_ROLE;
GRANT ALL ON DATABASE HUBSPOT_CRM TO ROLE HUBSPOT_ROLE;
2
Snowflake side
Create a service account user for HubSpot connectivity
Create a service account dedicated to HubSpot rather than reusing a human account, and generate a strong random string for its password.
CREATE USER HUBSPOT_USER
  PASSWORD = '<strong_random_password>'
  DEFAULT_ROLE = HUBSPOT_ROLE
  DEFAULT_WAREHOUSE = HUBSPOT_WH
  MUST_CHANGE_PASSWORD = FALSE;

GRANT ROLE HUBSPOT_ROLE TO USER HUBSPOT_USER;
3
HubSpot side
Enter your connection information and test your connection in HubSpot
Enter the following information in HubSpot under "Settings → Data Management → Data Warehouse → Connect Snowflake".
Account identifier: xxxxx.ap-northeast-1 (example: Tokyo region)
Warehouse name: HUBSPOT_WH
Database name: HUBSPOT_CRM
Schema name: PUBLIC (auto-generated)
Username: HUBSPOT_USER
Password: <the password you created>
4
HubSpot side
Select the objects and sync direction
After the connection succeeds, choose the objects to synchronize (contacts, companies, deals, tickets, custom objects) and the sync direction (HubSpot → Snowflake only / bidirectional). For the initial rollout, we recommend starting one-way from HubSpot to Snowflake, confirming correct operation, and only then enabling bidirectional sync.
5
Snowflake side
Confirm initial synchronization is complete and validate table structure
The first full synchronization can take anywhere from several minutes to several hours depending on the amount of data in HubSpot. After it completes, check the automatically generated tables on the Snowflake side.
-- List the created tables
SHOW TABLES IN SCHEMA HUBSPOT_CRM.V2_LIVE;

-- Check the contact count (does it match the HubSpot side?)
SELECT COUNT(*) FROM HUBSPOT_CRM.V2_LIVE.CONTACTS;

-- Check the last sync time
SELECT MAX(_HS_SYNCED_AT) FROM HUBSPOT_CRM.V2_LIVE.CONTACTS;

Structure of an automatically generated Snowflake schema

The tables HubSpot automatically generates in Snowflake correspond one-to-one to HubSpot's CRM objects. In each table, all properties are expanded as columns, and system-management columns (_HS_SYNCED_AT, HS_OBJECT_ID, IS_DELETED) are added.

HUBSPOT_CRM.V2_LIVE — schema excerpt of the main tables (Snowflake)

CONTACTS (contacts)
| Column | Type | Description |
|---|---|---|
| HS_OBJECT_ID | NUMBER | PK / HubSpot record ID |
| EMAIL | VARCHAR | Email address |
| FIRSTNAME | VARCHAR | First name |
| LASTNAME | VARCHAR | Last name |
| LIFECYCLESTAGE | VARCHAR | Lifecycle stage |
| HUBSPOT_OWNER_ID | NUMBER | FK → OWNERS |
| HS_LEAD_STATUS | VARCHAR | Lead status |
| INDUSTRY | VARCHAR | Industry |
| _HS_SYNCED_AT | TIMESTAMP_NTZ | Last sync timestamp |
| IS_DELETED | BOOLEAN | Deletion flag |

DEALS (deals)
| Column | Type | Description |
|---|---|---|
| HS_OBJECT_ID | NUMBER | PK |
| DEALNAME | VARCHAR | Deal name |
| AMOUNT | FLOAT | Amount |
| DEALSTAGE | VARCHAR | Deal stage |
| CLOSEDATE | DATE | Expected close date |
| PIPELINE | VARCHAR | Pipeline |
| HUBSPOT_OWNER_ID | NUMBER | FK → OWNERS |
| ASSOCIATIONS__CONTACT_IDS | ARRAY | Array of associated contact IDs |
| _HS_SYNCED_AT | TIMESTAMP_NTZ | Last sync timestamp |
| IS_DELETED | BOOLEAN | Deletion flag |
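Because deleted records stay in these tables with IS_DELETED = TRUE rather than disappearing, every analytical query needs to filter them out. A minimal Python sketch of that rule (illustrative only; real queries would express it in SQL as WHERE IS_DELETED = FALSE):

```python
def active_records(rows):
    """Drop soft-deleted rows: HubSpot's DWH sync keeps deletions as is_deleted = True."""
    return [r for r in rows if not r.get("is_deleted", False)]

contacts = [
    {"hs_object_id": 101, "email": "a@example.com", "is_deleted": False},
    {"hs_object_id": 102, "email": "b@example.com", "is_deleted": True},
]
# Record 102 was deleted in HubSpot, so it must be excluded from counts and reports.
active = active_records(contacts)
```

Forgetting this filter is the most common cause of "DWH counts don't match HubSpot counts" reports.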
Section 6-3

Setting up a BigQuery connection—project settings and schema

As with Snowflake, integration with BigQuery is completed with configuration on the HubSpot side alone: create a Google Cloud service account and pass its JSON key to HubSpot, and HubSpot automatically creates and manages the BigQuery datasets and tables for you.

🔷 BigQuery integration setup instructions
Time required: 20-40 minutes (requires Google Cloud IAM administrator privileges)
1
Google Cloud side
Create a service account specifically for HubSpot
Create it from Google Cloud Console → IAM & Admin → Service Accounts. Example name: [email protected]. Grant only the following minimum roles.
Required IAM roles:
- BigQuery Data Editor (hubspot- prefixed datasets only)
- BigQuery Job User (query execution)
- BigQuery Data Viewer (read access)
* Never assign BigQuery Admin or Project Owner.
2
Google Cloud side
Generate and download JSON key for service account
Select the service account's "Keys" tab → "Add key" → "JSON" and download the file. The key is confidential: store it in a safe place and never share it via email, Slack, etc.
3
HubSpot side
Upload a JSON key to HubSpot and connect
Upload the GCP project ID and JSON key file under "Settings → Data Management → Data Warehouse → Connect BigQuery". Run a connection test and confirm that it succeeds.
4
BigQuery side (check automatic generation)
See the dataset structure that HubSpot automatically generates
After the first synchronization, the following datasets are automatically generated in BigQuery.
hubspot_live (near real time: updated every 15 minutes)
├── contacts
├── companies
├── deals
├── tickets
└── engagements (engagement events)

hubspot_daily (daily snapshot: retains the past 90 days)
├── contacts_YYYYMMDD
├── deals_YYYYMMDD
└── ... (date partitions)
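The daily-snapshot naming (contacts_YYYYMMDD) and the 90-day retention window can be sketched with two small helpers. Illustrative only: the function names are my own, not part of the product, and HubSpot manages retention for you; this just shows the convention.

```python
from datetime import date, timedelta

def snapshot_table(obj, day):
    """Name of a daily snapshot table in the hubspot_daily dataset (contacts_YYYYMMDD style)."""
    return f"{obj}_{day.strftime('%Y%m%d')}"

def expired_snapshots(days_available, today, retention_days=90):
    """Snapshot dates that fall outside the 90-day retention window described above."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in days_available if d < cutoff]

name = snapshot_table("contacts", date(2026, 3, 3))
# With a 90-day window ending 2026-03-03, a November 2025 snapshot is already expired.
old = expired_snapshots([date(2025, 11, 1), date(2026, 3, 1)], today=date(2026, 3, 3))
```

If your monthly reports must reach back further than 90 days, copy the snapshots you need into your own dataset before they age out.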

Types and usage of synchronization schemas

Snowflake: V2_LIVE / BigQuery: hubspot_live
Near real-time schema
Changes made in HubSpot are reflected within 15 minutes. Use it for real-time dashboards showing the latest CRM state, checking the current day's sales progress, and alerting systems. Costs tend to be high because each query retrieves the latest data.
Update frequency: every 15 minutes at minimum (CDC transfers only changed rows)
Snowflake: V2 / BigQuery: hubspot_daily
Daily snapshot schema
Takes a full snapshot once per day. Use it for monthly reports, year-over-year comparisons, and machine learning training data that reference data as of a specific past date. Cheaper and more stable to query than V2_LIVE.
Update frequency: once a day (full snapshot; retained for the past 90 days)
Two-way sync
DWH → HubSpot writeback
Write scores, classifications, and predicted values calculated in BigQuery/Snowflake back to HubSpot properties. ML model inference results and advanced aggregates can then trigger HubSpot automation.
Writeback method: via the HubSpot API (scheduled batch or event trigger)
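A writeback job typically pushes property updates through HubSpot's CRM v3 batch update endpoint (POST /crm/v3/objects/contacts/batch/update), which accepts a limited number of inputs per call (100 is the commonly documented limit, taken here as an assumption). An illustrative sketch of building those request bodies with this chapter's health_score example; payload construction only, with sending and authentication omitted:

```python
def writeback_batches(scores, batch_size=100):
    """Build request bodies for HubSpot's CRM v3 batch update endpoint.

    `scores` maps HubSpot record IDs to DWH-computed health scores.
    Assumption: up to `batch_size` (100) inputs per call; each input is
    {"id": ..., "properties": {...}} per the CRM v3 batch format.
    """
    inputs = [
        {"id": str(obj_id), "properties": {"health_score": str(score)}}
        for obj_id, score in sorted(scores.items())
    ]
    # Chunk into request bodies of at most batch_size inputs each.
    return [
        {"inputs": inputs[i : i + batch_size]}
        for i in range(0, len(inputs), batch_size)
    ]

batches = writeback_batches({101: 82, 102: 37})
```

Each element of `batches` would then be POSTed with your access token; a scheduled job (like the "every Monday 2:00 JST" example in 6-4) iterates over them with retry on failure.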
engagement data
Event log of email/activity history
Engagement events such as email opens, clicks, page views, form submissions, calls, and meetings are synchronized as time-series tables. They can serve as training data for time-series analysis and behavior-prediction models.
Update frequency: every 15 minutes at minimum (not writable; analytics only)
⚠️ BigQuery cost management—watch your scan volume

BigQuery charges by the amount of data scanned ($5/TB). If a BI tool frequently runs full-scan queries against the hubspot_live tables, monthly cloud costs can reach tens of thousands of yen. Countermeasures: ① avoid SELECT * and select only the columns you need; ② filter on the partition key (WHERE _hs_synced_at > ...); ③ point BI tools at hubspot_daily and reserve hubspot_live for spot checks.
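The arithmetic behind these countermeasures is simple enough to sketch, using the $5/TB on-demand figure from above (actual rates vary by region and pricing model, so treat the constant as an assumption):

```python
def scan_cost_usd(bytes_scanned, usd_per_tb=5.0):
    """On-demand BigQuery cost estimate: dollars per terabyte scanned
    (using this chapter's $5/TB figure; actual pricing varies)."""
    return bytes_scanned / 1_000_000_000_000 * usd_per_tb

# A BI dashboard that full-scans a 200 GB hubspot_live table once a day:
monthly = scan_cost_usd(200_000_000_000) * 30
```

Here `monthly` comes out to $30 for a single daily full scan; multiply by every widget and every refresh and the bill grows quickly, which is why column selection and partition filters matter.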

Section 6-4

Data lineage tracking/BI collaboration/operation design

Data Lineage Tracking

Data lineage tracking is available on the Enterprise plan. It records and visualizes the full history of where a HubSpot property's value came from, which workflow wrote it, and whether it was updated via the DWH. This function is essential for compliance, auditing, and root-cause analysis of data quality issues.

🔍 Data lineage example: health_score property provenance
Track "where did this score come from?" on a single screen.

Data source: BigQuery (hubspot_live.contacts + product_usage table)
↓
Transformation: dbt model customer_health_score.sql v1.4.2 (updated 2026/03/01)
↓
Writeback: HubSpot API PATCH /contacts/{id}, every Monday 2:00 JST
↓
HubSpot property: health_score (last updated 2026/03/03, value: 82)
↓
Where used: workflow "Churn Alert" (health_score < 40 triggers a Slack notification to the CSM)
📌 Why lineage tracking matters: when you hit an incident like "health_score suddenly dropped to 0", lineage records let you identify the cause in minutes ("this started at the same time as the v1.4.2 deploy of the dbt model → the SQL divides by zero"). Without them, the investigation can take an entire day.
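A lineage record does not need to be elaborate to be useful. An illustrative sketch of the minimum fields worth storing per writeback run, and the "which version last wrote this?" lookup that makes incidents fast to triage (the record shape and function names are my own, not HubSpot's internal format):

```python
from dataclasses import dataclass

@dataclass
class LineageRecord:
    """One entry per writeback run: enough to answer 'where did this value come from?'."""
    property_name: str    # e.g. "health_score"
    source_tables: tuple  # DWH tables the value was derived from
    model_version: str    # dbt model version that produced it
    written_at: str       # ISO 8601 timestamp of the writeback

def latest_version(records, prop):
    """Model version of the most recent writeback for a property."""
    runs = [r for r in records if r.property_name == prop]
    return max(runs, key=lambda r: r.written_at).model_version

history = [
    LineageRecord("health_score", ("hubspot_live.contacts", "product_usage"),
                  "1.4.1", "2026-02-24T02:00:00+09:00"),
    LineageRecord("health_score", ("hubspot_live.contacts", "product_usage"),
                  "1.4.2", "2026-03-03T02:00:00+09:00"),
]
version = latest_version(history, "health_score")
```

Appending one such record per run (to a DWH table or even a log) is often enough to correlate a bad score with the deploy that caused it.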

Collaboration design with BI tools

| BI tool | Connection method | Recommended data source | Notes |
|---|---|---|---|
| Tableau | Snowflake / BigQuery connector (standard) | Mainly V2 (daily); V2_LIVE only for checking latest values | Avoid Live connections; Extract mode is recommended and reduces query costs |
| Looker / Looker Studio | BigQuery connector (native integration) | Mainly hubspot_daily | Looker Studio is free to use, but query costs rise with many concurrent connections; enable caching |
| Power BI | Snowflake / BigQuery connector | Import mode (daily refresh) recommended | DirectQuery raises BigQuery costs; importing the dataset and refreshing on a schedule is cheaper |
| Metabase | Snowflake / BigQuery connector | V2 / hubspot_daily | Popular with teams using both HubSpot's standard reports and BI; a free OSS edition is available |

DWH integration operational checklist

✅ Things to check before actual operation
🔐 Security/Permissions
A least-privilege service account dedicated to HubSpot has been created (no project-owner privileges)
Credentials such as JSON keys and passwords are stored in a secrets manager (never shared via Slack or email)
Access to tables containing personal information (email addresses / phone numbers) is restricted to the minimum necessary roles
A retention period and deletion policy for personal data on the DWH side has been decided with GDPR / personal data protection law in mind
🔄 Sync/Data Quality
Verified that the record counts in HubSpot and the DWH match after the initial sync completes
An alert monitors the time elapsed since the last sync using the _HS_SYNCED_AT column (notify when over 30 minutes)
The team shares the practice of always including a WHERE condition in BI queries that excludes records with is_deleted = TRUE
A mechanism monitors error logs for the writeback process (DWH → HubSpot)
💰 Cost management
A budget alert is set on BigQuery's monthly scan volume (recommended: notify above $100/month)
AUTO_SUSPEND (60 seconds) is set on the Snowflake warehouse to eliminate idle costs
Rules are shared with the team so BI tools avoid frequent full scans of hubspot_live (V2_LIVE)
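The freshness alert from the sync checklist reduces to a single comparison. An illustrative Python sketch (in production, the last-sync timestamp would come from MAX(_HS_SYNCED_AT) and the notification would go to your monitoring tool):

```python
from datetime import datetime, timedelta, timezone

def sync_is_stale(last_synced_at, now, threshold_minutes=30):
    """Checklist alert condition: flag when the newest _HS_SYNCED_AT value
    is more than 30 minutes old (the sync normally runs every 15 minutes)."""
    return now - last_synced_at > timedelta(minutes=threshold_minutes)

now = datetime(2026, 3, 3, 12, 0, tzinfo=timezone.utc)
# 45 minutes since the last sync: something is wrong, raise an alert.
stale = sync_is_stale(datetime(2026, 3, 3, 11, 15, tzinfo=timezone.utc), now)
# 10 minutes since the last sync: within the normal 15-minute cadence.
fresh = sync_is_stale(datetime(2026, 3, 3, 11, 50, tzinfo=timezone.utc), now)
```

A 30-minute threshold gives the 15-minute sync one missed cycle of slack before alerting, which avoids noise from normal jitter.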

📌 Chapter 6 Summary

DWH integration reflects HubSpot → DWH within 15 minutes with “CDC differential sync”

The CDC method transfers only changed rows, which is far more efficient than exporting all records. Two schemas are generated automatically: V2_LIVE (15-minute updates) and V2/hubspot_daily (daily). The daily schema is recommended as the main data source for BI tools, since real-time queries drive up BI costs.

The basic rule for setup is a "dedicated service account with least privileges"

On Snowflake, create a dedicated role + warehouse + database. On BigQuery, grant the service account only BigQuery Data Editor + Job User. Granting project-owner or administrator privileges to a HubSpot service account is a serious security risk.

Two-way sync allows you to use “DWH ML scores” for HubSpot automation

In addition to the one-way flow from HubSpot to the DWH, machine learning scores, segment classifications, and predicted values calculated in BigQuery/Snowflake can be written back to properties via the HubSpot API. This enables advanced automation such as "alert the CSM when the ML-calculated churn-risk score exceeds a threshold."

Become an organization that can explain the “source of scores” with data lineage tracking

Without a mechanism to track where a score came from and which model version computed it, identifying the cause of a problem can take an entire day. Build an accountable data organization with the three-piece set of data lineage records + dbt version control + writeback logs.
