“I want to sync all of HubSpot's CRM data to Snowflake, which is used by my analytics team, in real time.” “I want to use scores from BigQuery's machine learning models to automate HubSpot.”Data warehouse integration (DWH integration)is the problem to be solved. This chapter systematically explains the architecture, schema design, setup procedure, data lineage tracking, and cooperation design with BI tools for bidirectional synchronization with Snowflake and BigQuery.
HubSpot Data Hub Enterprise's DWH integration brings together all your HubSpot CRM object data.Sync to your data warehouse in near real-timeIt is a function to do. Not just "export"Change-sensitive differential synchronization (CDC: Change Data Capture)Updates on the HubSpot side are reflected in DWH within a few minutes.
| data category | Sync to DWH | Writeback from DWH | remarks |
|---|---|---|---|
| standard object (Contact/Company/Transaction/Ticket) |
✓ All properties | ✓ Write back property values | Deleted records are managed with the is_deleted flag |
| custom object | ✓ All properties | ✓ Compatible | Enterprise custom objects are also synced |
| engagement (email opening/click/activity) |
✓ Synchronized as an event table | ✗ Cannot be written back | For analytical use only. Writing from DWH is not possible |
| marketing email (Sending history/subscription status) |
✓ Synchronization | △ Only subscription status can be written back | Email sending history cannot be overwritten |
| Attachments/images | ✗ Not compatible | ✗ Not compatible | Binary data is not synchronized. Sync URL only |
| Workflow execution log | ✓ Synchronization (Enterprise) | ✗ Cannot be written back | Audit/compliance use |
Two types of schemas are automatically generated in Snowflake.V2_LIVE: Near real-time schema updated every 15 minutes. Use it to check dashboards and the day's business figures.V2: Snapshot all items in daily batch. Used for maintaining past history and heavy analytical queries. V2 is recommended as the main data source for BI tools (V2_LIVE is expensive as it retrieves the latest data every time a query is made).
Integration with Snowflake is done from the "Data Warehouse" settings on the HubSpot side. Once connected, HubSpot willAutomatically creates database schema tables on Snowflake and keeps them in sync from then on. There is no need to create tables manually.
The tables that HubSpot automatically generates in Snowflake have a one-to-one correspondence with HubSpot's CRM objects. Each table hasAll properties are expanded as columnsIn addition, columns for system management (_hs_synced_at, hs_object_id, is_deleted) are added.
Similar to Snowflake, integration with BigQuery can be completed by simply configuring HubSpot.Create a Google Cloud service account and pass JSON keys to HubSpotHubSpot automatically creates and manages BigQuery datasets and tables for you.
BigQuery charges based on the amount of data scanned ($5/TB). When a BI tool frequently runs full scan queries to the hubspot_live tableCloud costs of tens of thousands of yen per month occurThere are some things. Countermeasures: ① Avoid SELECT * and select only necessary columns / ② Filter by partition key (WHERE _hs_synced_at > ...) / ③ Point hubspot_daily to BI tools and limit hubspot_live to emergency confirmation only.
On the Enterprise planData lineage trackingFunctions are available. You can record and visualize the entire history of where the value of this HubSpot property came from, which workflow wrote it, and whether it was updated via DWH. This function is essential for compliance, auditing, and root cause analysis of data quality.
| BI tools | Connection method | Recommended data source | Precautions |
|---|---|---|---|
| Tableau | Snowflake / BigQuery connector (standard) | Mainly V2 (daily) ・V2_LIVE is for checking the latest value | Avoid Live connections and recommend Extract mode. Reduces query costs |
| Looker / Looker Studio | BigQuery connector (native integration) | Mainly use hubspot_daily | Looker Studio is free to use, but query costs increase when there are many simultaneous connections. Enable caching |
| Power BI | Snowflake/BigQuery connector | Import mode (daily update) recommended | DirectQuery has higher BigQuery costs. It is cheaper to import the Dataset and update it regularly. |
| Metabase | Snowflake/BigQuery connector | V2 / hubspot_daily | Popular with teams that use both HubSpot's standard reporting and BI. Free OSS version also available |
The CDC method transfers only changed rows, which is much more efficient than exporting all records. Two schemas are automatically generated: V2_LIVE (15 minute update) and V2/hubspot_daily (daily). Daily schema is recommended as the main data source for BI tools—real-time increases BI costs.
Snowflake creates a dedicated role + warehouse + database. BigQuery only grants BigQuery Data Editor + Job User to the service account. Granting project owner or administrator privileges to a HubSpot service account is a significant security risk.
In addition to one-way communication from HubSpot to DWH, machine learning scores, segment classification, and predicted values calculated with BigQuery/Snowflake can be written back to properties via the HubSpot API. This enables advanced automation such as "alert the CSM when the churn risk score calculated by ML exceeds a threshold."
Without a mechanism to track where this score came from and what model version it was calculated by, it would take an entire day to identify the cause of a problem. Create an accountable data organization with a 3-piece set of data lineage records + dbt version control + writeback logs.