“I want to analyze HubSpot's CRM data and Snowflake's sales data together.” “I want to match a Google Sheets budget table against the opportunity pipeline.” Until now, the only options were to have engineers write ETL pipelines or to merge CSV files by hand. Data Studio lets you blend and transform multiple data sources, apply formulas, and build datasets in a spreadsheet-like, no-code UI. This chapter covers the overall picture of Data Studio, data source connections, transformation and blending, AI-suggested formulas, and pipeline design for each use case.
Data Studio is a no-code data integration and transformation platform added to HubSpot Data Hub in 2025. It combines and transforms multiple external data sources (Snowflake, BigQuery, AWS S3, Google Sheets, CSV files) together with HubSpot CRM data in a spreadsheet-like UI, and publishes the result as a dataset for use within HubSpot.
"I want to combine HubSpot CRM data with one or two other external data sources and use the results for HubSpot reporting and automation." Data Studio fits this need well. If large-scale, complex ETL is required, combining it with dbt/Fivetran is appropriate. As a rough guide, Data Studio can cover about 80% of the data integration and checking work that RevOps does day to day.
There are two main types of data sources that can be connected to Data Studio: "HubSpot internal sources" and "external sources." Authentication settings are required to connect to external sources, and the sources available vary depending on the plan.
| Connection destination | Recommended authentication method | Precautions |
|---|---|---|
| Snowflake | Dedicated service account + read-only role | Do not give write permission to the production database. Create a read-only role specifically for Data Studio |
| BigQuery | Service account JSON key + BigQuery Data Viewer role | Do not give project owner privileges. Set maximum scan volume to manage query costs |
| AWS S3 | IAM user (least privilege: s3:GetObject only) | We recommend restricting access to a specific prefix (folder) rather than accessing the entire bucket. |
| Google Sheets | OAuth (Google account authentication) | If the connection uses an individual employee's account, data can no longer be retrieved once that person leaves and the account's access expires. Connect with a shared account instead |
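
The least-privilege recommendation for AWS S3 in the table above can be sketched as an IAM policy document. This is a minimal illustration, not a confirmed Data Studio requirement: the bucket name `example-sales-exports`, the prefix `data-studio/`, and the helper `read_only_prefix_policy` are all hypothetical, and a real connection may need additional permissions beyond `s3:GetObject`.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document allowing s3:GetObject on one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataStudioReadOnlyPrefix",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # read objects only, no write or delete
                # Restrict to objects under the prefix, not the whole bucket
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


# Hypothetical bucket and prefix, for illustration only
policy = read_only_prefix_policy("example-sales-exports", "data-studio/")
print(json.dumps(policy, indent=2))
```

Scoping `Resource` to a prefix means that a leaked credential exposes only the exported files, not everything else stored in the bucket.
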
The core of Data Studio is the ability to build pipelines that JOIN, transform, and aggregate multiple tables with no code. As you drag and drop each step in the UI, a SQL query is automatically generated behind the scenes.
| JOIN type | Behavior | Typical usage with HubSpot |
|---|---|---|
| LEFT JOIN (recommended default) | Keep all rows of the left table. For rows with no match, the right table's columns are NULL | "Combine all HubSpot contacts with any matching sales data in Snowflake." No contacts are dropped |
| INNER JOIN | Keep only rows that match in both tables | “I want to analyze only rows that exist in both HubSpot deals AND Snowflake billing records.” |
| FULL OUTER JOIN | Keep all rows from both tables. Unmatched columns are NULL | “I want to detect both deals that are in HubSpot but not in Snowflake (missed billing) and deals that are in Snowflake but not in HubSpot (unregistered deals).” |
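
To make the three JOIN types in the table concrete, here is a small pure-Python sketch. It only imitates the row-level behavior described above and does not reproduce the SQL that Data Studio generates. The `join` helper and the sample rows are hypothetical.

```python
def join(left, right, key, how="left"):
    """Join two lists of dicts on `key`; `how` is 'left', 'inner', or 'outer'."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    left_cols = {c for row in left for c in row if c != key}
    right_cols = {c for row in right for c in row if c != key}

    out, matched = [], set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched.add(lrow[key])
            for rrow in matches:
                out.append({**lrow, **rrow})
        elif how in ("left", "outer"):
            # Keep the left row; right-side columns become None (NULL)
            out.append({**lrow, **{c: None for c in right_cols}})
    if how == "outer":
        # Also keep right rows that never matched a left row
        for k, rows in right_by_key.items():
            if k not in matched:
                for rrow in rows:
                    out.append({**{c: None for c in left_cols}, **rrow})
    return out


# Hypothetical sample data: contact 1 has no sales, sale 4 has no contact
contacts = [{"id": "1", "name": "Aoki"}, {"id": "2", "name": "Baba"}, {"id": "3", "name": "Chiba"}]
sales = [{"id": "2", "revenue": 500}, {"id": "3", "revenue": 900}, {"id": "4", "revenue": 300}]

left_rows = join(contacts, sales, "id", how="left")
inner_rows = join(contacts, sales, "id", how="inner")
outer_rows = join(contacts, sales, "id", how="outer")
print(len(left_rows), len(inner_rows), len(outer_rows))  # prints: 3 2 4
```

The LEFT JOIN keeps all three contacts, the INNER JOIN drops the contact with no sales, and the FULL OUTER JOIN additionally surfaces the sale that has no contact.
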
When you try to JOIN HubSpot's contact ID (numeric type) with Snowflake's customer_id (string type), the JOIN fails. Unify the data types using a calculated column before the JOIN (example: convert the HubSpot ID to a string with CAST(contact_id AS VARCHAR)). The fastest way is to choose "Add calculated column" in the UI and ask the AI to "convert this column to a string."
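
The effect of a key type mismatch can be reproduced in a few lines of Python. This is an analogy, not Data Studio's behavior: here the mismatched join silently matches nothing rather than raising an error, and the sample rows are hypothetical. `str(...)` plays the role of CAST(contact_id AS VARCHAR).

```python
# Hypothetical sample rows: HubSpot IDs are integers, Snowflake IDs are strings
hubspot_contacts = [
    {"contact_id": 1001, "email": "a@example.com"},
    {"contact_id": 1002, "email": "b@example.com"},
]
snowflake_sales = [
    {"customer_id": "1001", "revenue": 1200},
    {"customer_id": "1002", "revenue": 800},
]

sales_by_id = {row["customer_id"]: row for row in snowflake_sales}

# Without conversion: the integer 1001 never equals the string "1001"
naive_matches = [c for c in hubspot_contacts if c["contact_id"] in sales_by_id]

# With conversion: cast the numeric key to a string before matching
cast_matches = [c for c in hubspot_contacts if str(c["contact_id"]) in sales_by_id]

print(len(naive_matches), len(cast_matches))  # prints: 0 2
```
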
One of Data Studio's major differentiating features is "Have AI suggest a formula." On the Add calculated column screen, if you give an instruction in natural language such as "I want to calculate ARR from this data" or "I want to classify tiers based on industry and company size," the AI generates the corresponding formula.
Rather than just "Calculate ARR", write something like "The BILLING_CYCLE field has three values: monthly, quarterly, and yearly. Create an annualized ARR column by multiplying by 12, by 4, and leaving the value as is, respectively." The more specific you are about field names, conditions, and expected output, the more accurate the result will be.
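
The ARR rule described in that prompt is simple enough to write out by hand, which is useful when checking a formula the AI proposes. The sketch below is not the formula Data Studio generates. The function name `annualized_arr` is hypothetical, and accepting "annual" as a synonym for "yearly" is an assumption.

```python
# Multiplier that converts one billing period's amount into a yearly amount
CYCLE_MULTIPLIER = {"monthly": 12, "quarterly": 4, "yearly": 1, "annual": 1}


def annualized_arr(amount: float, billing_cycle: str) -> float:
    """Return the annualized value of `amount` for the given BILLING_CYCLE."""
    cycle = billing_cycle.strip().lower()
    if cycle not in CYCLE_MULTIPLIER:
        # Surface unexpected values instead of silently returning a wrong ARR
        raise ValueError(f"Unknown BILLING_CYCLE: {billing_cycle!r}")
    return amount * CYCLE_MULTIPLIER[cycle]


print(annualized_arr(100, "Monthly"))    # prints: 1200
print(annualized_arr(250, "quarterly"))  # prints: 1000
print(annualized_arr(5000, "yearly"))    # prints: 5000
```

Running a few known rows through a reference calculation like this and comparing against the preview is a quick way to confirm that the generated formula handles every BILLING_CYCLE value.
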
The actual usage patterns of Data Studio can be broadly classified into three types. Let's take a closer look at each pipeline design.
With the right property design and association design, Data Studio can detect problems with forecast accuracy, unbilled items, and duplicate registrations.
This use case combines HubSpot CRM data, product usage data (BigQuery), and support tickets (Zendesk API) to calculate a customer health score and write it back to contact properties in HubSpot. The CS team's weekly manual merging of CSV files can be fully automated.
Data Studio datasets can be refreshed manually or on a schedule (as often as once per hour). As a rule of thumb: every 1 to 4 hours for dashboards that need near-real-time data, daily for monthly reports, and weekly for datasets that involve heavy transformation processing. More frequent scheduled refreshes consume API quota, so increase the frequency only if you really need fresher data.
ETL tools (dbt/Fivetran) are flexible but require engineering man-hours. BI tools (Tableau/Looker) offer powerful visualization but cannot write back to the CRM. Data Studio is the fastest no-code way to meet the need of "combining HubSpot CRM data with 1 to 3 external sources and using the result within HubSpot."
Six source types are available: HubSpot CRM (internal), Snowflake, BigQuery, AWS S3, Google Sheets, and CSV. External connections require a dedicated service account and least-privilege authentication settings. Use a shared organizational account rather than an individual employee's personal account, which stops working when that person leaves.
For most use cases, LEFT JOIN (which keeps all HubSpot records) is the right choice. Use FULL OUTER JOIN only when you want to detect records that exist on one side but not the other. Mismatched JOIN key data types (numeric vs. string) are the most common cause of errors, so convert the type in a calculated column before joining.
Instead of "Calculate ARR", give specific instructions such as "The BILLING_CYCLE field has three values: Monthly/Quarterly/Annual. Create an ARR column where Monthly is multiplied by 12, Quarterly by 4, and Annual is left as is." Be sure to preview the generated formula before applying it to production.
With a FULL OUTER JOIN of HubSpot Opportunities x Snowflake billing data, you can automatically detect opportunities that are in HubSpot but have not been billed or that have been billed but are not in HubSpot. By simply flagging discrepancies over 5% as "confirmation required" and creating a report, the monthly sales reconciliation process will be reduced from 1 hour to 5 minutes.
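
The reconciliation logic above can be sketched in plain Python to show what the FULL OUTER JOIN and the 5% flag are doing. Several things here are assumptions rather than Data Studio behavior: the field names (`opportunity_id`, `amount`, `billed`), the status labels, measuring the discrepancy relative to the HubSpot amount, and one row per opportunity on each side.

```python
def reconcile(hubspot_opps, billing, threshold=0.05):
    """Full-outer-join on opportunity_id and flag discrepancies above `threshold`."""
    hs = {o["opportunity_id"]: o["amount"] for o in hubspot_opps}
    bl = {b["opportunity_id"]: b["billed"] for b in billing}
    report = []
    # The union of keys from both sides is what makes this a FULL OUTER JOIN
    for opp_id in sorted(hs.keys() | bl.keys()):
        amount, billed = hs.get(opp_id), bl.get(opp_id)
        if billed is None:
            status = "missed billing"            # in HubSpot, not billed
        elif amount is None:
            status = "unregistered in HubSpot"   # billed, not in HubSpot
        elif amount != billed and (
            amount == 0 or abs(amount - billed) / abs(amount) > threshold
        ):
            status = "confirmation required"     # amounts differ by more than 5%
        else:
            status = "ok"
        report.append({"opportunity_id": opp_id, "amount": amount,
                       "billed": billed, "status": status})
    return report


# Hypothetical sample data
hubspot_opps = [
    {"opportunity_id": "D1", "amount": 1000},
    {"opportunity_id": "D2", "amount": 2000},
    {"opportunity_id": "D3", "amount": 500},
    {"opportunity_id": "D5", "amount": 1000},
]
billing = [
    {"opportunity_id": "D1", "billed": 1000},
    {"opportunity_id": "D2", "billed": 1800},  # 10% lower than HubSpot
    {"opportunity_id": "D4", "billed": 700},   # no HubSpot record
    {"opportunity_id": "D5", "billed": 960},   # 4% lower, within tolerance
]

report = reconcile(hubspot_opps, billing)
for row in report:
    print(row["opportunity_id"], row["status"])
```

With this sample data, D2 is flagged for confirmation, D3 shows up as missed billing, and D4 as billed but unregistered, while D1 and D5 pass.
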
Not all datasets need to be updated in real time. A good guideline is every 1 to 4 hours for dashboards, daily for monthly analysis, and weekly for heavy join processing. Refreshing too frequently consumes your API quota and affects other processes. When setting the schedule, ask how many hours old the data can be without affecting decision-making.