Chapter 6: Data Warehouse Integration (Enterprise)—Bidirectional integration with Snowflake/BigQuery

Section 6-1

Architecture of DWH integration—what gets synchronized in which direction?

HubSpot Data Hub Enterprise's DWH integration brings together all your HubSpot CRM object data.Sync to your data warehouse in near real-timeIt is a function to do. Not just "export"Change-sensitive differential synchronization (CDC: Change Data Capture)Updates on the HubSpot side are reflected in DWH within a few minutes.

🏗️ HubSpot ↔ Data Warehouse Unified Architecture

👤

Contacts

All contact properties

🏢

Companies

All company properties

🟡

HubSpot Smart CRM

All objects/custom objects
Engagement/activity history

💰

Deals

Business negotiations/pipeline

🎫

Tickets

support ticket

↓ HubSpot → DWH (CDC differential synchronization, minimum 15 minutes)

↑ DWH → HubSpot (batch writeback/scheduling)

❄️

Snowflake

V2_LIVE Schema (15 minute update)
V2 schema (daily batch)

🔷

BigQuery

hubspot_live dataset (updated 15 minutes)
hubspot_daily dataset (daily)

CDC differential synchronization (transfer only changed rows/minimum 15 minutes)

Writeback (property updates from DWH to HubSpot)

All snapshots (daily batch)

Types of data synchronized and constraints

data category	Sync to DWH	Writeback from DWH	remarks
standard object (Contact/Company/Transaction/Ticket)	✓ All properties	✓ Write back property values	Deleted records are managed with the is_deleted flag
custom object	✓ All properties	✓ Compatible	Enterprise custom objects are also synced
engagement (email opening/click/activity)	✓ Synchronized as an event table	✗ Cannot be written back	For analytical use only. Writing from DWH is not possible
marketing email (Sending history/subscription status)	✓ Synchronization	△ Only subscription status can be written back	Email sending history cannot be overwritten
Attachments/images	✗ Not compatible	✗ Not compatible	Binary data is not synchronized. Sync URL only
Workflow execution log	✓ Synchronization (Enterprise)	✗ Cannot be written back	Audit/compliance use

💡 How to use V2_LIVE schema and V2 schema (Snowflake)

Two types of schemas are automatically generated in Snowflake.V2_LIVE: Near real-time schema updated every 15 minutes. Use it to check dashboards and the day's business figures.V2: Snapshot all items in daily batch. Used for maintaining past history and heavy analytical queries. V2 is recommended as the main data source for BI tools (V2_LIVE is expensive as it retrieves the latest data every time a query is made).

Section 6-2

Setting up a Snowflake connection—schema design and authentication settings

Integration with Snowflake is done from the "Data Warehouse" settings on the HubSpot side. Once connected, HubSpot willAutomatically creates database schema tables on Snowflake and keeps them in sync from then on. There is no need to create tables manually.

❄️ Snowflake integration setup instructions

Time required: 30-60 minutes (requires Snowflake administrator privileges)

Snowflake side

Create a HubSpot dedicated role warehouse database

Run the following with Snowflake administrator privileges. Create a least privilege role for HubSpot and assign a dedicated warehouse. Separation from the production database facilitates cost management and authority control.

-- Role creationCREATE ROLE HUBSPOT_ROLE; -- Warehouse creation (X-Small is sufficient.later scale up as needed)CREATE WAREHOUSE HUBSPOT_WH WITH WAREHOUSE_SIZE = 'X-SMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE; -- Database creationCREATE DATABASE HUBSPOT_CRM; -- AuthorizationGRANT USAGE ON WAREHOUSE HUBSPOT_WH TO ROLE HUBSPOT_ROLE; GRANT ALL ON DATABASE HUBSPOT_CRM TO ROLE HUBSPOT_ROLE;

Snowflake side

Create a service account user for HubSpot connectivity

Create a service account specifically for HubSpot instead of a human account. Generate a strong random string of characters for your password.

CREATE USER HUBSPOT_USER PASSWORD = '<strong_random_password>' DEFAULT_ROLE = HUBSPOT_ROLE DEFAULT_WAREHOUSE = HUBSPOT_WH MUST_CHANGE_PASSWORD = FALSE; GRANT ROLE HUBSPOT_ROLE TO USER HUBSPOT_USER;

HubSpot side

Enter your connection information and test your connection in HubSpot

Enter the following information from HubSpot's "Settings → Data Management → Data Warehouse → Connect Snowflake".

Account identifier: xxxxx.ap-northeast-1 (e.g. Tokyo region) Warehouse name: HUBSPOT_WH Database name: HUBSPOT_CRM Schema name: PUBLIC (automatically generated) Username: HUBSPOT_USER Password: <Password you created>

HubSpot side

Select objects and orientation to sync

After successful connection, set the objects to be synchronized (contacts, companies, deals, tickets, custom objects) and the synchronization direction (HubSpot → Snowflake only/bidirectional). For the first time, we recommend starting with one direction from HubSpot to Snowflake, confirming operation, and then enabling both directions.

Snowflake side

Confirm initial synchronization is complete and validate table structure

The first full synchronization may take several minutes to several hours depending on the amount of data in HubSpot. After completion, check the automatically generated tables on the Snowflake side.

-- Check the list of created tables SHOW TABLES IN SCHEMA HUBSPOT_CRM.V2_LIVE; -- Check the number of contacts (does it match the one on HubSpot side?) SELECT COUNT(*) FROM HUBSPOT_CRM.V2_LIVE.CONTACTS; -- Check last sync time SELECT MAX(_FIVETRAN_SYNCED) FROM HUBSPOT_CRM.V2_LIVE.CONTACTS;

Structure of an automatically generated Snowflake schema

The tables that HubSpot automatically generates in Snowflake have a one-to-one correspondence with HubSpot's CRM objects. Each table hasAll properties are expanded as columnsIn addition, columns for system management (_hs_synced_at, hs_object_id, is_deleted) are added.

HUBSPOT_CRM.V2_LIVE — Schema excerpt of main tables Snowflake

CONTACTS contact

HS_OBJECT_IDNUMBERPK/HubSpot record ID

EMAILVARCHARemail address

FIRSTNAMEVARCHARgiven name

LASTNAMEVARCHARLast name

LIFECYCLESTAGEVARCHARlife cycle stage

HUBSPOT_OWNER_IDNUMBERFK → OWNERS

HS_LEAD_STATUSVARCHARlead status

INDUSTRYVARCHARIndustry

_HS_SYNCED_ATTIMESTAMP_NTZLast sync date and time

IS_DELETEDBOOLEANDeletion flag

DEALS transaction

HS_OBJECT_IDNUMBERPK

DEALNAMEVARCHAROpportunity name

AMOUNTFLOATamount

DEALSTAGEVARCHARBusiness negotiation stage

CLOSEDATEDATEScheduled closing date

PIPELINEVARCHARpipeline

HUBSPOT_OWNER_IDNUMBERFK → OWNERS

ASSOCIATIONS__CONTACT_IDSARRAYRelated contact ID array

_HS_SYNCED_ATTIMESTAMP_NTZLast sync date and time

IS_DELETEDBOOLEANDeletion flag

Section 6-3

Setting up a BigQuery connection—project settings and schema

Similar to Snowflake, integration with BigQuery can be completed by simply configuring HubSpot.Create a Google Cloud service account and pass JSON keys to HubSpotHubSpot automatically creates and manages BigQuery datasets and tables for you.

🔷 BigQuery integration setup instructions

Time required: 20-40 minutes (requires Google Cloud IAM administrator privileges)

Google Cloud side

Create a service account specifically for HubSpot

Create from Google Cloud Console → IAM and Management → Service Account. Example name:[email protected]. Only the following minimum roles will be granted.

Required IAM roles: - BigQuery Data Editor (hubspot- prefix dataset only) - BigQuery Job User (query execution privileges) - BigQuery Data Viewer (read access) * Never assign BigQuery Admin or Project Owner.

Google Cloud side

Generate and download JSON key for service account

Select "Keys" tab of the service account → "Add key" → "JSON" and download. This file is confidential information, so keep it in a safe place and do not share it via email, Slack, etc.

HubSpot side

Upload a JSON key to HubSpot and connect

Upload the GCP project ID and JSON key file from "Settings → Data Management → Data Warehouse → Connect BigQuery". Run a connection test and check if the connection is successful.

BigQuery side (check automatic generation)

See the dataset structure that HubSpot automatically generates

After the first synchronization, the following datasets are automatically generated in BigQuery.

hubspot_live (Near real-time: updated every 15 minutes)├── contacts├── companies├── deals├── tickets└── engagements (engagement events) hubspot_daily (Daily snapshot: retains the past 90 days) ├── contacts_YYYYMMDD ├── deals_YYYYMMDD └── ... (date partition)

Types and usage of synchronization schemas

Snowflake: V2_LIVE / BigQuery: hubspot_live

Near real-time schema

Changes made in HubSpot take effect within 15 minutes. Use for real-time dashboards to see the latest CRM status, confirmation of the current day's sales progress, and alert systems. The cost tends to be high because the latest data is acquired for each query.

Update frequency:Minimum 15 minutes(Only changed rows are transferred by CDC)

Snowflake: V2 / BigQuery: hubspot_daily

Daily snapshot schema

Obtain snapshots of all items on a daily basis. Used for monthly reports, annual comparisons, and machine learning learning data that refer to data as of a specific date in the past. Cheaper than V2_LIVE and more stable queries.

Update frequency:Once a day(Snapshot of all items. Retained for the past 90 days)

Two-way sync

DWH → HubSpot writeback

Write back scores, classifications, and predicted values calculated by BigQuery/Snowflake to HubSpot properties. ML model inference results and advanced aggregate values can be used as triggers for HubSpot automation.

How to write back:Via HubSpot API(Schedule batch or event trigger)

engagement data

Event log of email/activity history

Engagement events such as email opens, clicks, page views, form submissions, phone calls, and meetings are synchronized as a chronological table. It can be used as learning data for time series analysis and behavior prediction models.

Update frequency:Minimum 15 minutes(Cannot be rewritten/only for analysis)

⚠️ BigQuery cost management——pay attention to scan volume

BigQuery charges based on the amount of data scanned ($5/TB). When a BI tool frequently runs full scan queries to the hubspot_live tableCloud costs of tens of thousands of yen per month occurThere are some things. Countermeasures: ① Avoid SELECT * and select only necessary columns / ② Filter by partition key (WHERE _hs_synced_at > ...) / ③ Point hubspot_daily to BI tools and limit hubspot_live to emergency confirmation only.

Section 6-4

Data lineage tracking/BI collaboration/operation design

Data Lineage Tracking

On the Enterprise planData lineage trackingFunctions are available. You can record and visualize the entire history of where the value of this HubSpot property came from, which workflow wrote it, and whether it was updated via DWH. This function is essential for compliance, auditing, and root cause analysis of data quality.

🔍 Data lineage example: health_score property provenance

You can track “Where did this score come from” on one screen?

data source

BigQuery

hubspot_live.contacts
+ product_usage table

→

conversion

dbt model

customer_health_score.sql
v1.4.2 (updated 2026/03/01)

→

write back

HubSpot API

PATCH /contacts/{id}
Every Monday 2:00 JST

→

HubSpot properties

health_score

Last updated: 2026/03/03
Value: 82

→

Where used

WF: Churn Alert

with health_score < 40
Slack notification to CSM

📌 Why lineage tracking is important: When you encounter an issue like ``health_score suddenly goes to 0'', with lineage records you can identify the cause in minutes: ``This occurred at the same time as deploying v1.4.2 of the dbt model → SQL division by zero occurred''. Without lineage records, the investigation can take an entire day.

Collaboration design with BI tools

BI tools	Connection method	Recommended data source	Precautions
Tableau	Snowflake / BigQuery connector (standard)	Mainly V2 (daily) ・V2_LIVE is for checking the latest value	Avoid Live connections and recommend Extract mode. Reduces query costs
Looker / Looker Studio	BigQuery connector (native integration)	Mainly use hubspot_daily	Looker Studio is free to use, but query costs increase when there are many simultaneous connections. Enable caching
Power BI	Snowflake/BigQuery connector	Import mode (daily update) recommended	DirectQuery has higher BigQuery costs. It is cheaper to import the Dataset and update it regularly.
Metabase	Snowflake/BigQuery connector	V2 / hubspot_daily	Popular with teams that use both HubSpot's standard reporting and BI. Free OSS version also available

DWH integration operational checklist

✅ Things to check before actual operation

🔐 Security/Permissions

Created a minimal privilege service account specifically for HubSpot (without project owner privileges)

I saved authentication information such as JSON keys and passwords in a secret manager (never share Slack or email)

Access to tables containing personal information (email/phone number) was restricted to the minimum necessary roles.

Determined retention period and deletion policy for personal data on the DWH side from the perspective of GDPR/Personal Information Protection Law

🔄 Sync/Data Quality

Verified that the number of records in HubSpot matches the number of records in DWH after the initial sync completes

I set an alert to monitor the "time elapsed since last synchronization" in the _HS_SYNCED_AT column (notification when over 30 minutes)

Shared the practice of always including a WHERE condition in BI queries that excludes records with is_deleted = TRUE.

Established a mechanism to monitor error logs for writeback processing (DWH → HubSpot)

💰 Cost management

Set a budget alert for BigQuery's monthly scan volume (recommended: notify when it exceeds $100 per month)

Set AUTO_SUSPEND (60 seconds) on Snowflake warehouse to reduce idle cost to zero

BI tool shared rules with team to avoid frequent full scans to hubspot_live (V2_LIVE)

📌 Chapter 6 Summary

DWH integration reflects HubSpot → DWH within 15 minutes with “CDC differential sync”

The CDC method transfers only changed rows, which is much more efficient than exporting all records. Two schemas are automatically generated: V2_LIVE (15 minute update) and V2/hubspot_daily (daily). Daily schema is recommended as the main data source for BI tools—real-time increases BI costs.

The basic rule for setup is a "dedicated service account with least privileges"

Snowflake creates a dedicated role + warehouse + database. BigQuery only grants BigQuery Data Editor + Job User to the service account. Granting project owner or administrator privileges to a HubSpot service account is a significant security risk.

Two-way sync allows you to use “DWH ML scores” for HubSpot automation

In addition to one-way communication from HubSpot to DWH, machine learning scores, segment classification, and predicted values calculated with BigQuery/Snowflake can be written back to properties via the HubSpot API. This enables advanced automation such as "alert the CSM when the churn risk score calculated by ML exceeds a threshold."

Become an organization that can explain the “source of scores” with data lineage tracking

Without a mechanism to track where this score came from and what model version it was calculated by, it would take an entire day to identify the cause of a problem. Create an accountable data organization with a 3-piece set of data lineage records + dbt version control + writeback logs.

Next Chapter

Go to Chapter 7 →

Data warehouse integration (Enterprise)
Snowflake / BigQuery Two-way collaboration with

📋 Contents of this chapter

Architecture of DWH integration—what gets synchronized in which direction?

Types of data synchronized and constraints

Setting up a Snowflake connection—schema design and authentication settings

Structure of an automatically generated Snowflake schema

Setting up a BigQuery connection—project settings and schema

Types and usage of synchronization schemas

Data lineage tracking/BI collaboration/operation design

Data Lineage Tracking

Collaboration design with BI tools

DWH integration operational checklist

📌 Chapter 6 Summary

DWH integration reflects HubSpot → DWH within 15 minutes with “CDC differential sync”

The basic rule for setup is a "dedicated service account with least privileges"

Two-way sync allows you to use “DWH ML scores” for HubSpot automation

Become an organization that can explain the “source of scores” with data lineage tracking

Data warehouse integration (Enterprise)Snowflake / BigQuery Two-way collaboration with

📋 Contents of this chapter

Architecture of DWH integration—what gets synchronized in which direction?

Types of data synchronized and constraints

Setting up a Snowflake connection—schema design and authentication settings

Structure of an automatically generated Snowflake schema

Setting up a BigQuery connection—project settings and schema

Types and usage of synchronization schemas

Data lineage tracking/BI collaboration/operation design

Data Lineage Tracking

Collaboration design with BI tools

DWH integration operational checklist

📌 Chapter 6 Summary

DWH integration reflects HubSpot → DWH within 15 minutes with “CDC differential sync”

The basic rule for setup is a "dedicated service account with least privileges"

Two-way sync allows you to use “DWH ML scores” for HubSpot automation

Become an organization that can explain the “source of scores” with data lineage tracking

Data warehouse integration (Enterprise)
Snowflake / BigQuery Two-way collaboration with