🟡 HubSpot Operations Practical Textbook — 2026 Edition
Chapter 2

Data Quality Control
Maintaining a Clean CRM

AI makes mistakes not because the AI is bad, but because the data is dirty. Inaccurate lead scores, personalized emails that send clearly incorrect information, and Sales constantly asking, "Is this contact a duplicate?" are all data quality issues. This chapter explains the overall picture of Data Quality Command Center, duplicate detection and automatic merging, automatic format correction workflows, the design for complementing missing data, and data quality KPIs with weekly digest operation.

📖 Estimated reading time: 25 minutes
🎯 Target audience: HubSpot administrators, RevOps, and data management
🔧 Required plan: Free (basic) / Professional and above (automation/AI duplicate detection)

📋 Contents of this chapter

  1. 2-1 The big picture of Data Quality Command Center
  2. 2-2 Duplicate detection/automatic merging design (contacts, companies, deals)
  3. 2-3 Automatic format correction workflow (name, phone number, email, date)
  4. 2-4 Missing data detection and complementary design
  5. 2-5 Data quality KPIs and weekly digest operations
Section 2-1

The big picture of Data Quality Command Center

Data Quality Command Center is a unified dashboard that lets you understand and remediate data quality issues across your HubSpot portal from a single screen. It can be accessed from "Data Management → Data Quality".

Four types of problems are automatically detected: duplicates, formatting problems, missing data, and unused properties. For each type, the count, the trend, and the recommended actions are displayed together.

📊 Data Quality Command Center (mockup)
Last updated: 2026/03/09 09:00
Tabs: Overview / Duplicates / Formatting / Missing data / Properties

Duplicate contacts (estimated): 1,247 (▲ +83 compared to last week)
Records with formatting problems: 3,891 (▼ -214 compared to last week)
Records missing required fields: 8,450 (▲ +120 compared to last week)
Data quality score (overall): 94.2% (▼ -0.3pt compared to last week)

🔴 Duplicate contacts, same email address: 842 items → Merge now
🟠 Duplicate contacts, similar name + company (AI detection): 405 items → Check and process
🟡 Inconsistent phone number format (not E.164 compliant): 2,134 items → Set up autocorrect WF
🟡 First and last name in all uppercase/all lowercase letters: 1,757 items → Batch correction
🔵 Industry field is blank: 4,210 items → Set up complementary WF
Unused workflows (no trigger for 90 days): 23 items → Inventory

The four Command Center tabs and their roles

Tab | Problem detected | Main countermeasure actions | Required plan
Duplicates | Duplicate records (contacts, companies, deals) | Confirm and merge items one by one; set automatic merge rules | Free and above (AI detection requires Professional and above)
Formatting | Inconsistent phone number formats, mixed-case names, inconsistent date formats, malformed email addresses | Bulk correction; automatic correction rule settings (corrections then continue automatically for future records) | Professional and above
Missing Data | Records with important properties left blank (industry, company size, lifecycle stage, etc.) | Complementary workflow settings; manual bulk updates | Free and above (automation requires Professional and above)
Properties | Properties that have had no value for more than 90 days, unused workflows, and duplicate properties with similar meanings | Retire or consolidate properties; establish naming rules | Free and above
💡 Set up a weekly data quality digest

From "Configure Data Quality Digest" at the bottom right of the "Overview" tab in the Command Center, you can set up an automatic email that delivers a data quality summary every Monday morning. Administrators no longer have to log in and check every week; the digest surfaces abnormalities such as "duplicates have increased" or "formatting problems have risen sharply since last week". The setting takes about a minute, so enable it as a top priority.
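
The week-over-week comparison that such a digest reports can be sketched in a few lines. This is only an illustration of the arithmetic, not HubSpot's implementation: the function names `digest_changes` and `worsened` and the metric keys are hypothetical, and the counts are taken from the mockup above.

```python
def digest_changes(last_week: dict[str, int], this_week: dict[str, int]) -> dict[str, int]:
    """Return the change in each issue count since last week (positive = more issues)."""
    return {name: this_week[name] - last_week.get(name, 0) for name in this_week}


def worsened(changes: dict[str, int]) -> list[str]:
    """Names of the metrics whose issue count increased."""
    return sorted(name for name, delta in changes.items() if delta > 0)


# Counts reconstructed from the mockup: duplicates +83, formatting -214, missing +120.
last_week = {"duplicates": 1164, "formatting": 4105, "missing": 8330}
this_week = {"duplicates": 1247, "formatting": 3891, "missing": 8450}

changes = digest_changes(last_week, this_week)
alerts = worsened(changes)
```

Here `alerts` contains only the metrics that got worse, which is the part of the digest that deserves attention on Monday morning.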

Section 2-2

Duplicate detection/automatic merging design (contacts, companies, deals)

Duplicate records set off a chain of problems: salespeople approach the same customer separately, the customer receives the same email twice, and engagement history is split across records, which leads to incorrect AI judgments. HubSpot's duplicate detection consists of three layers, backed by a daily scan and alert.

🔍 HubSpot duplicate detection: 3 merge layers + daily scan
Layer 1
Automatic merge (immediate)
Contacts with the same email address and companies with the same domain are merged automatically and immediately on form submission, API creation, or import. This is the first line of defense and eliminates the duplicates that can be identified with the highest certainty.
All plans
Layer 2
AI duplicate candidate presentation (manual review)
AI presents records with similar names, company names, and phone numbers as "duplicate candidates" even if the email addresses are different. An administrator reviews each candidate and chooses to merge or reject it.
Professional and above
Layer 3
Automatic merge rules (Beta)
Set custom rules such as "name + company name + phone number match" and automatically merge the duplicate candidates that match the conditions. This automates duplicate management in large portals.
Professional and above (Beta)
Layer 4 (monitoring)
Daily scan & alert
The entire portal is scanned every 24 hours, and an alert email is sent if the number of duplicate candidates exceeds the upper limit (default 1,000 per day). Rapidly growing duplication patterns can be detected early.
Professional and above
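
The difference between the first two layers can be illustrated with a small sketch. It is not HubSpot's algorithm: `difflib.SequenceMatcher` from the Python standard library stands in for the AI similarity detection, and the record fields, function names, and the 0.85 threshold are assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def exact_email_groups(contacts):
    """Layer 1 idea: group contacts whose normalized email address is identical."""
    groups = defaultdict(list)
    for c in contacts:
        email = (c.get("email") or "").strip().lower()
        if email:
            groups[email].append(c["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}


def similar_candidates(contacts, threshold=0.85):
    """Layer 2 stand-in: pair contacts at the same company whose names are similar."""
    pairs = []
    for i, a in enumerate(contacts):
        for b in contacts[i + 1:]:
            if a["company"].lower() != b["company"].lower():
                continue
            ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if ratio >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


contacts = [
    {"id": 1, "name": "Taro Yamada", "company": "Example Inc", "email": "taro@example.com"},
    {"id": 2, "name": "Taro Yamada", "company": "Example Inc", "email": "TARO@example.com "},
    {"id": 3, "name": "Tarou Yamada", "company": "example inc", "email": "t.yamada@example.com"},
    {"id": 4, "name": "Hanako Sato", "company": "Example Inc", "email": "hanako@example.com"},
]

auto_merge = exact_email_groups(contacts)    # safe to merge without review
needs_review = similar_candidates(contacts)  # candidates for a human to confirm
```

Exact email matches are certain enough to merge automatically, while the similarity pairs are only candidates, which is why Layer 2 keeps a human in the loop.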

Rules for selecting the "winner record" when merging

When merging two records, you must decide which record's information is kept as "correct" (the winner record). HubSpot's default is to make the oldest record (the one created first) the winner. However, we recommend adjusting this based on the following criteria.

Judgment criteria | Recommended winner | Reason
Older vs. newer creation date | Older (default) | Older records tend to hold more engagement history and activity logs, so keeping them preserves more information
More vs. less engagement | The record with more engagement | Keeps the richer record of email opens, web browsing, and deal history
Lifecycle Stage progress | Higher stage | Make the Customer/MQL record the winner and merge the Subscriber record into it
Created via form vs. created manually | The form-created record | Information entered by the customer is the most likely to be accurate
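
The criteria above can be combined into a single comparison, sketched below. The priority order (stage first, then engagement, then form origin, then age) is an assumption made for this example, since the table does not rank the criteria against each other; the field names and `STAGE_RANK` values are also hypothetical.

```python
from datetime import date

# Assumed ordering of lifecycle stages, lowest to highest.
STAGE_RANK = {"subscriber": 0, "lead": 1, "marketingqualifiedlead": 2,
              "salesqualifiedlead": 3, "opportunity": 4, "customer": 5}


def pick_winner(a: dict, b: dict) -> dict:
    """Pick the record to keep: higher stage, more engagement, form-created, then older."""
    def key(r):
        return (
            STAGE_RANK.get(r["lifecycle_stage"], -1),
            r["engagement_count"],
            r["source"] == "form",
            -r["created"].toordinal(),  # the older record wins the final tie-break
        )
    return max((a, b), key=key)


older = {"id": 1, "lifecycle_stage": "subscriber", "engagement_count": 2,
         "source": "manual", "created": date(2023, 1, 10)}
newer = {"id": 2, "lifecycle_stage": "customer", "engagement_count": 40,
         "source": "form", "created": date(2025, 6, 1)}

winner = pick_winner(older, newer)
```

Writing the rule down as one function, whatever order your organization agrees on, makes it easy to apply the same decision consistently across reviewers.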
✅ Check for "values that will not survive the merge" before merging

HubSpot's merge keeps the property values from the winner record, so values that exist only on the loser side may be lost. In particular, check important information entered in custom properties and internal memos, and copy them manually if necessary before merging. A merge is difficult to undo once performed, so verify the behavior on a small sample before enabling large-scale automatic merging.
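
A pre-merge check of this kind can be sketched as a simple comparison of two exported records. The function name `values_at_risk` and the property names are hypothetical; the sketch assumes, as described above, that the winner's value is kept whenever the two records disagree.

```python
def values_at_risk(winner: dict, loser: dict) -> dict:
    """Loser-side values that differ from the winner's and need manual review."""
    at_risk = {}
    for prop, value in loser.items():
        if value in (None, ""):
            continue  # nothing to lose when the loser has no value
        if winner.get(prop) != value:
            at_risk[prop] = value
    return at_risk


winner = {"email": "taro@example.com", "internal_memo": "", "contract_tier": "Gold"}
loser = {"email": "taro@example.com", "internal_memo": "Prefers calls after 3pm",
         "contract_tier": "Silver", "phone": ""}

review = values_at_risk(winner, loser)
```

Anything returned in `review` is a value someone should look at, and copy to the winner if needed, before the merge runs.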

Section 2-3

Automatic format correction workflow (name, phone number, email, date)

In Data Hub Professional and above, the "Format data" action becomes available within workflows. With it, formats can be unified automatically the moment a record is created or updated, so people no longer have to keep making corrections by hand.

⚙️ Automatic format correction WF on contact creation (basic set)
Workflow name
[DQ] Automatic format correction when creating a contact
Trigger
A contact is created (whether via form, API, import, or manually)
Delay
None (immediate execution)

Action 1
Format data: first name and last name
Target properties: firstname, lastname / Format: Title case (first letter uppercase, remaining letters lowercase)
Action 2
Format data: email address
Target property: email / Format: lowercase, with leading and trailing spaces removed
Action 3
Format data: phone number
Target property: phone / Format: E.164 international format (+81XXXXXXXXXX)
Action 4 (with conditional branch)
If company name is blank → complete from email domain
Condition: company is blank / Action: Set the domain part of the email address to the company property
Action 5
If country is blank → estimate from the country code of the phone number
Condition: country is blank AND phone contains a country code / Action: Set country
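
The two conditional actions (4 and 5) can be sketched as plain logic. This is an illustration of the conditions, not the workflow's actual implementation: the function name, the dictionary-based contact, and the small `CALLING_CODE_TO_COUNTRY` lookup are all assumptions.

```python
# Hypothetical lookup; a real table would cover every calling code you need.
CALLING_CODE_TO_COUNTRY = {"+81": "Japan", "+1": "United States", "+44": "United Kingdom"}


def fill_company_and_country(contact: dict) -> dict:
    """Apply actions 4 and 5 to a copy of the contact and return the copy."""
    result = dict(contact)
    email = result.get("email") or ""
    if not result.get("company") and "@" in email:
        # Action 4: use the domain part of the email address as the company.
        result["company"] = email.rsplit("@", 1)[1].lower()
    phone = result.get("phone") or ""
    if not result.get("country"):
        # Action 5: try the longest calling-code prefix first.
        for code in sorted(CALLING_CODE_TO_COUNTRY, key=len, reverse=True):
            if phone.startswith(code):
                result["country"] = CALLING_CODE_TO_COUNTRY[code]
                break
    return result


filled = fill_company_and_country(
    {"email": "taro.yamada@example.com", "phone": "+819012345678", "company": "", "country": ""}
)
kept = fill_company_and_country(
    {"email": "hanako@example.com", "phone": "", "company": "Example Inc", "country": ""}
)
```

Both actions only fill blanks and never overwrite an existing value. A domain-derived company name is a guess, so it is worth flagging such records for confirmation.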

Specific examples of formatting problems and correction rules

👤 First name / last name
TARO YAMADA → Taro Yamada
taro yamada → Taro Yamada
YAMADA TARO → Yamada Taro
Rule: Standardize to title case (first letter uppercase). Convert full-width spaces to half-width.
📧 Email address
taro.yamada @example.com
Rule: Convert everything to lowercase. Remove leading and trailing spaces.
📞 Phone number
090-1234-5678 → +819012345678
0312345678 → +81312345678
(03) 1234-5678 → +81312345678
Rule: Standardize to E.164 format (+ country code + area code + number, with no symbols or spaces).
🏢 Company name
Example Co., Ltd. → Example Co., Ltd.
EXAMPLE INC → Example Inc
(blank) → example.com (from domain)
Rule: Standardize to title case. If the field is blank, fill it in from the email domain (confirmation required).
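
The name, email, and phone rules can be expressed as three small functions. This is a minimal sketch rather than what the "Format data" action does internally. The function names are hypothetical, the phone conversion assumes Japanese domestic numbers (country code +81), and the output follows strict E.164, which contains no hyphens or spaces.

```python
import re


def normalize_name(name: str) -> str:
    """Title-case a name and convert full-width spaces to half-width."""
    cleaned = name.replace("\u3000", " ").strip()
    return " ".join(part.capitalize() for part in cleaned.split())


def normalize_email(email: str) -> str:
    """Lowercase the address and remove leading and trailing whitespace."""
    return email.strip().lower()


def to_e164_japan(phone: str) -> str:
    """Convert a Japanese domestic number to E.164 (assumes country code +81)."""
    digits = re.sub(r"\D", "", phone)
    if phone.strip().startswith("+"):
        return "+" + digits  # already international: just drop the separators
    if digits.startswith("0"):
        return "+81" + digits[1:]  # replace the domestic leading 0 with +81
    raise ValueError(f"cannot infer country code for {phone!r}")
```

For example, `to_e164_japan("(03) 1234-5678")` yields `+81312345678`. Simple capitalization mishandles names such as "McDonald" or "O'Brien", so a production rule set would need exceptions.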
⚠️ How to apply the format correction WF to "existing records"

By default, the WF above is applied only to newly created records. To apply it in bulk to tens of thousands of existing records, select the target property on the "Formatting" tab in the Command Center and click the "Fix and Automate" button. This fixes the current problems all at once and then sets a rule that automatically fixes new records as well. Note that large-volume corrections are processed sequentially within HubSpot and may take several hours to complete.

Section 2-4

Missing data detection and complementary design

Missing data is the problem where a record has no duplicates and its format is correct, but important fields are blank. If industry, company size, and lifecycle stage are left blank, AI accuracy, segment accuracy, and personalization accuracy all decrease. Design your handling of missing data in two directions: "preventing it from occurring (requiring input)" and "filling in records that are already blank."

Commonly missing fields: causes and complementary measures

🏭
Industry
Not offered as a form field / not supplied at import / not synced from Salesforce
Autocomplete using Clearbit enrichment based on the email domain. Alternatively, configure Data Agent's Smart Properties so that AI researches the web and fills in the value (Professional and above)
👥
Company Size
Important in B2B, but difficult to collect through forms / customers do not answer accurately
Autocomplete using Clearbit/ZoomInfo enrichment. Alternatively, have Data Agent look up the LinkedIn URL and fill in the value (detailed in Chapter 5)
🔄
Lifecycle Stage
Not set at import / automatic transition WF not configured
Configure an automatic Lifecycle Stage transition WF (detailed in Chapter 7). Create a rule that sets at least "Subscriber" when importing
🌐
Country/State
Not collected through forms / locale information not captured on international sites
Automatically estimated by HubSpot from the IP address (70-80% accuracy). Alternatively, create a WF that looks up the country from the phone number's country code.
👤
Contact owner
No owner assigned at creation / Lead Routing WF not configured
A Lead Routing WF automatically assigns an owner based on territory, industry, and score (detailed in Chapter 7). Report on contacts with no owner quarterly
📊
Lead score
Scoring model not set / WF for updating the custom score property not created
Configure automatic scoring with HubSpot's "Contact Score" feature or with custom calculated properties (detailed in Chapter 7)
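
To decide which of these fields to tackle first, it helps to measure how often each one is blank. The sketch below works on an exported list of records; the function names and property keys are hypothetical, and it treats `None`, an empty string, and an absent key all as "missing".

```python
def missing_rates(records: list[dict], properties: list[str]) -> dict[str, float]:
    """Percentage of records in which each property is blank or absent."""
    total = len(records)
    if total == 0:
        return {p: 0.0 for p in properties}
    return {
        p: 100 * sum(1 for r in records if r.get(p) in (None, "")) / total
        for p in properties
    }


def top_missing(rates: dict[str, float], n: int = 5) -> list[str]:
    """Properties with the highest missing rate, worst first."""
    return sorted(rates, key=lambda p: (-rates[p], p))[:n]


records = [
    {"industry": "", "country": "JP", "owner": "A"},
    {"industry": None, "country": "JP", "owner": ""},
    {"industry": "IT", "country": "", "owner": None},
    {"country": "US", "owner": "B"},
]

rates = missing_rates(records, ["industry", "country", "owner"])
worst = top_missing(rates, n=2)
```

Ranking by missing rate keeps the complementary work focused on the few properties that hurt segmentation and AI accuracy the most.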

Block defects at the source with Required Fields

The best way to deal with missing data is to prevent blank values from being saved in the first place. If you check "Required" in HubSpot's property settings, a record cannot be saved or moved to the next stage while that property is blank. However, making every property mandatory makes data entry harder and increases incorrect or filler values, so narrow the required properties down to the truly important ones (about 3 to 5).

Phase | Properties recommended as required | Reason
When creating a lead | […] | Key for duplicate detection and the minimum needed for personalization
When promoting to MQL | Phone number / industry / contact owner | […]
When creating an opportunity | Amount, expected closing date, negotiation stage | Basic data for pipeline forecasting and reporting
[…] | […] | Starting point for CS handoff and health monitoring
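
The effect of phase-specific required fields can be sketched as a gate that a record must pass before it advances. This models the idea only and is not how HubSpot enforces the "Required" setting; the property keys and function names are hypothetical, and only the MQL and opportunity rows of the table are used.

```python
REQUIRED_BY_PHASE = {
    "mql": ["phone", "industry", "owner"],
    "opportunity": ["amount", "close_date", "deal_stage"],
}


def missing_required(record: dict, phase: str) -> list[str]:
    """Required properties for the phase that are blank on the record."""
    return [p for p in REQUIRED_BY_PHASE.get(phase, []) if record.get(p) in (None, "")]


def can_advance(record: dict, phase: str) -> bool:
    """A record may enter the phase only when nothing required is missing."""
    return not missing_required(record, phase)


contact = {"phone": "+819012345678", "industry": "", "owner": "A"}
deal = {"amount": 0, "close_date": "2026-06-30", "deal_stage": "proposal"}
```

Keeping each phase's list short is what makes the gate tolerable for the people entering data. A legitimate value such as an amount of 0 still counts as filled in.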
Section 2-5

Data quality KPIs and weekly digest operations

Data quality is not a matter of "clean once and done." Every day new records are created, forms are submitted, imports occur, and syncs arrive from external systems. In other words, new dirty data is constantly being added. To maintain quality continuously, it is essential to set KPIs and establish a cadence for checking them regularly.

Metrics and actions to check daily, weekly, monthly, and quarterly
🌅
Daily
Check sync status
Duplicate alert: Take action only if you receive the daily upper-limit alert for duplicate candidates (unnecessary if the duplicates were handled automatically)
📋
Weekly
Data quality digest: Check the week-over-week change in the number of duplicates, formatting problems, and missing-data problems in the digest email sent every Monday
Review of duplicate candidates: Spend 30 minutes a week processing the duplicate candidates detected by AI that require manual confirmation (target: 100 cases/week)
Sync error review: Check the number of errors in the synchronization log and investigate the cause if the error rate exceeds 5%
📊
Monthly
Check data quality score trends: Check whether your score is improving in the monthly report in Command Center
Dealing with the top 5 missing fields: List the 5 properties with the highest missing rates and take complementary action (add a WF, review the form)
Clean up unused properties: Retire or consolidate the properties flagged on the Properties tab
🔍
Quarterly
Overall data quality review: Report quality scores for all objects (contacts, companies, deals) to management and the RevOps team
Authority/user inventory: Check and correct accounts of departed employees and users with excessive privileges (see Chapter 9)
Review of format correction rules: Update the correction rules to reflect changes in business rules and expansion into new countries/regions

Data quality goal KPI setting example

KPI | Calculation method | Initial goal | Final goal
Duplicate rate | Number of duplicate records ÷ Total number of contacts × 100 | 5% or less | 2% or less
Email validity rate | Number of valid email addresses ÷ Total number of contacts × 100 | 85% or more | 95% or more
Industry field sufficiency rate | Number of contacts with industry entered ÷ Total number of contacts × 100 | 60% or more | 85% or more
Lifecycle Stage setting rate | Number of contacts with a stage set ÷ Total number of contacts × 100 | 80% or more | 98% or more
Contact owner setting rate | Number of contacts with an owner set ÷ Total number of contacts × 100 | 70% or more | 95% or more
Phone number E.164 compliance rate | Number of phone numbers in E.164 format ÷ Number of contacts with phone numbers × 100 | 70% or more | 99% or more
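
Several of these KPIs can be computed directly from an export of contacts. The sketch below is one possible way to do so under stated assumptions: the property keys and function names are hypothetical, the number of duplicate records is supplied from outside (for example from the Duplicates tab), and E.164 compliance is checked with a simple pattern of "+" followed by up to 15 digits.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the denominator is 0."""
    return round(100 * part / whole, 1) if whole else 0.0


def quality_kpis(contacts: list[dict], duplicate_count: int) -> dict[str, float]:
    """Compute a subset of the KPIs in the table above."""
    total = len(contacts)
    with_phone = [c for c in contacts if c.get("phone")]
    return {
        "duplicate_rate": pct(duplicate_count, total),
        "industry_fill_rate": pct(sum(1 for c in contacts if c.get("industry")), total),
        "owner_set_rate": pct(sum(1 for c in contacts if c.get("owner")), total),
        "e164_rate": pct(sum(1 for c in with_phone if E164.match(c["phone"])), len(with_phone)),
    }


contacts = [
    {"industry": "IT", "owner": "A", "phone": "+819012345678"},
    {"industry": "Retail", "owner": "", "phone": "090-1234-5678"},
    {"industry": "", "owner": "B", "phone": ""},
    {"industry": "Finance", "owner": None, "phone": "+81312345678"},
]

kpis = quality_kpis(contacts, duplicate_count=1)
```

The E.164 rate uses only contacts that have a phone number as its denominator, matching the calculation method in the table.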
⚡ What to do in the “first 30 days” of data quality improvement

① Set up the Command Center weekly digest today (5 minutes). ② Create the automatic format correction WF (name, email, phone) and apply it to existing records in bulk (1 to 2 hours). ③ Merge only the top 100 duplicate contacts from the "Duplicates" tab in Command Center (30 minutes). These three steps alone will improve most organizations' data quality scores by 5-10 points the following week. Rather than aiming for perfection, the fastest way to improve is to start with the biggest problem.

📌 Chapter 2 Summary

Make Command Center your “starting point for weekly checks”

Data Quality Command Center allows you to centrally understand problems using four tabs (duplicates, formats, missing items, and properties). By setting up weekly digest emails, administrators can notice increases and decreases in problems without having to log in. The first action today is to enable Digest.

Preventing duplicates with "3 layers": immediate merging, AI candidate presentation, and automatic merging rules

Immediate merging of records with the same email (all plans) → presentation of similar candidates by AI (Professional and above) → automatic merging using custom rules (Professional and above, Beta): bring duplicates close to zero with three layers. Before merging, unify the winner record selection criteria across the organization to prevent erroneous merges.

Format correction is done in two stages: "WF on record creation" and "batch correction of existing records"

Prevent formatting problems by setting "Format correction WF when creating contact" for new records. Existing issues can be fixed in bulk using "Fix and Automate" in Command Center. Name, email, and phone fields are the top priority targets.

Design missing data handling in two directions: "prevent occurrence" and "supplement"

While blocking input at the source by setting required fields, enrichment workflows fill in records that are already blank. Narrow down your requirements to 3 to 5 fields that are truly important. Making everything mandatory will have the opposite effect of increasing erroneous and inappropriate input.

Data quality can only be maintained with “regular cadence”

Decide on daily, weekly, monthly, and quarterly cadences, and determine in advance the indicators and actions that should be checked at each timing. In particular, your quality score will continue to improve by simply making it a habit to "review duplicate candidates (30 minutes)" weekly and "deal with missing TOP5" monthly.

Set KPIs and track “improvement” numerically

Duplicate rate, email validity rate, industry sufficiency rate