AI makes mistakes not because the AI is bad, but because the data is dirty. Inaccurate lead scores, personalized emails that contain obviously wrong information, and Sales constantly asking, "Is this contact a duplicate?" are all data quality issues. This chapter explains the overall picture of the Data Quality Command Center, duplicate detection and automatic merging, automatic format-correction workflows, the design of missing-data completion, and data quality KPIs with weekly digest operations.
The Data Quality Command Center is a unified dashboard that lets you understand and remediate data quality issues across your HubSpot portal from a single screen. It is accessed from "Data Management → Data Quality".
Four types of problems are detected automatically: duplicates, formatting problems, missing data, and unused properties. For each type, the count, the trend, and recommended actions are displayed together.
| Tab | Problems detected | Main countermeasure actions | Required plan |
|---|---|---|---|
| Duplicates | Multiple records for the same contact or company (same email address, similar names) | Review and merge items one by one; set automatic merge rules | Free and up (AI detection: Pro and up) |
| Formatting | Inconsistent phone number formats, mixed-case names, inconsistent date formats, malformed email addresses | Bulk correction / automatic correction rules (new records continue to be auto-corrected) | Professional and up |
| Missing data | Records with key properties left blank (industry, company size, lifecycle stage, etc.) | Completion workflows / manual bulk updates | Free and up (automation: Pro and up) |
| Properties | Properties with no value for more than 90 days, unused workflows, duplicate properties with similar meanings | Retire or consolidate properties; establish naming rules | Free and up |
From "Configure Data Quality Digest" at the bottom right of the Command Center's "Summary" tab, you can set up an automatic email that summarizes data quality every Monday morning. Administrators no longer have to log in every week to check: the digest surfaces anomalies such as "duplicates increased" or "formatting problems spiked since last week". Setup takes about a minute, so enable it as a top priority.
Duplicates are linked to concrete problems: two salespeople approach the same customer separately, the customer receives two emails, and the engagement history is split across records, which in turn leads to incorrect AI judgments. HubSpot's duplicate detection consists of three layers: immediate merging of records with the same email address (all plans), AI-suggested similar records (Pro and up), and automatic merging with custom rules (Pro, beta).
When merging two records, you must decide which record's information is kept as "correct" (the winner record). HubSpot's default is "the oldest record (the one created first) wins", but we recommend adjusting this based on the following criteria.
| Judgment criteria | Recommended winner | Reason |
|---|---|---|
| Older vs. newer creation date | Older record (default) | An old record with a long engagement history and activity log carries more information |
| More vs. less engagement | Record with more engagement | Preserves the rich history of email opens, web visits, and deal activity |
| More vs. less advanced lifecycle stage | More advanced stage | Make the Customer/MQL record the winner and merge the Subscriber record into it |
| Created via form vs. created manually | Form-created record | Information entered by the customer themselves is most likely to be accurate |
HubSpot's merge keeps property values from the winner record, so unique values on the loser side can be lost. In particular, always check "important information entered in custom properties" and "internal memos" before merging, and copy them over manually if necessary. A merge is difficult to undo, so verify the behavior on a small sample before enabling large-scale automatic merging.
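The winner-selection criteria above can be sketched as a small scoring function. This is a plain-Python illustration, not HubSpot's actual merge logic; the field names (`source`, `lifecycle_stage`, `engagement_count`, `created_at`) are hypothetical, not real HubSpot internal property names.

```python
# Sketch: choose the "winner" between two duplicate records, following the
# criteria table above. All field names are illustrative assumptions.

# Lifecycle stages ordered from least to most advanced.
LIFECYCLE_ORDER = ["subscriber", "lead", "mql", "sql", "opportunity", "customer"]

def pick_winner(a: dict, b: dict) -> dict:
    """Return the record that should survive the merge."""
    # 1. A form-submitted record wins: the customer entered it themselves.
    if (a.get("source") == "form") != (b.get("source") == "form"):
        return a if a.get("source") == "form" else b
    # 2. The more advanced lifecycle stage wins (Customer over Subscriber).
    sa = LIFECYCLE_ORDER.index(a.get("lifecycle_stage", "subscriber"))
    sb = LIFECYCLE_ORDER.index(b.get("lifecycle_stage", "subscriber"))
    if sa != sb:
        return a if sa > sb else b
    # 3. The record with more engagement (opens, visits, deals) wins.
    ea, eb = a.get("engagement_count", 0), b.get("engagement_count", 0)
    if ea != eb:
        return a if ea > eb else b
    # 4. Otherwise fall back to HubSpot's default: the older record wins.
    return a if a.get("created_at", 0) <= b.get("created_at", 0) else b
```

Encoding the priority order in code like this makes the organization-wide agreement from the table explicit and reviewable, instead of living in individual admins' heads.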
In Data Hub Professional and above, formatting actions can be added within a workflow. With these, the format is unified automatically the moment a record is created or updated, so no human has to keep making corrections.
By default, the workflow above applies only to newly created records. To apply it in bulk to tens of thousands of existing records, select the target property on the Command Center's "Formatting Issues" tab and click the "Fix and Automate" button. This fixes the current problems in one pass and then sets a rule that automatically fixes new records as well. Note that large volumes are processed sequentially inside HubSpot and may take several hours to complete.
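The kind of normalization such a workflow applies can be sketched in plain Python. This is an illustration of the logic only, under the assumption of Japanese domestic phone numbers for the E.164 example; in a real portal this would live in HubSpot's formatting actions or a custom code action.

```python
# Sketch of the normalization a format-correction workflow would apply
# to the three top-priority fields: name, email, and phone.
import re

def normalize_name(name: str) -> str:
    """Unify casing: 'TARO yamada' -> 'Taro Yamada'."""
    return " ".join(part.capitalize() for part in name.split())

def normalize_email(email: str) -> str:
    """Lower-case and trim surrounding whitespace."""
    return email.strip().lower()

def normalize_phone_jp(raw: str, country_code: str = "+81") -> str:
    """Convert a Japanese domestic number to E.164:
    '03-1234-5678' -> '+81312345678'.
    Assumes a domestic number with a leading trunk '0';
    production code should validate far more carefully."""
    digits = re.sub(r"\D", "", raw)  # strip hyphens, spaces, parentheses
    if digits.startswith("0"):
        digits = digits[1:]          # drop the domestic trunk prefix
    return country_code + digits
```

Running such rules on create/update keeps new records clean, while the "Fix and Automate" bulk pass handles the existing backlog.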
Missing data is the problem where there are no duplicates and the format is correct, but important fields are blank. If industry, company size, or lifecycle stage is left blank, AI accuracy, segment accuracy, and personalization accuracy all decrease. Handling missing data should be designed on two fronts: preventing blanks from occurring (required input) and filling in records that are already blank.
The best way to deal with missing data is a mechanism that blocks it at the point of entry. In HubSpot's property settings, checking "Required" prevents a record from being saved, or from advancing to the next stage, while that property is blank. However, making every property required makes data entry harder and increases wrong or junk values, so narrow it down to the truly important ones (about 3 to 5).
| Phase | Properties recommended as required | Reason |
|---|---|---|
| When creating a lead | First name / last name / email address | Minimum requirements for duplicate detection and key personalization |
| When promoting to MQL | Phone number / industry / contact owner | Information Sales needs for follow-up and segmentation |
| When creating a deal | Amount / expected close date / deal stage | Basic data for pipeline forecasting and reporting |
| When a deal is won | Contract start date / plan / CS owner | Starting point for CS handoff and health monitoring |
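The phase-based required-field design above can be sketched as a simple validation function. The property names used here (`firstname`, `hubspot_owner_id`, `closedate`, etc.) follow HubSpot's common internal naming, but this is an illustrative sketch, not a substitute for the portal's own "Required" setting.

```python
# Sketch: validate that a record has its phase-specific required
# properties filled, mirroring the table above. Property names are
# assumptions; real internal names vary per portal.
REQUIRED_BY_PHASE = {
    "lead_created": ["firstname", "lastname", "email"],
    "mql": ["phone", "industry", "hubspot_owner_id"],
    "deal_created": ["amount", "closedate", "dealstage"],
}

def missing_required(record: dict, phase: str) -> list:
    """Return the required properties that are blank for the given phase."""
    required = REQUIRED_BY_PHASE.get(phase, [])
    return [p for p in required if not record.get(p)]
```

A check like this is also useful before bulk imports: run it over the CSV rows first, and you know which records will fail the required-field gate before they ever reach HubSpot.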
Data quality is not a "clean once and done" matter. Every day new records are created, forms are submitted, imports run, and syncs arrive from external systems; in other words, data is constantly becoming new and dirty again. To maintain quality continuously, it is essential to set KPIs and build a cadence for checking them regularly.
| KPI | Calculation method | Initial goal | Target goal |
|---|---|---|---|
| Duplicate rate | Duplicate records ÷ total contacts × 100 | 5% or less | 2% or less |
| Valid email rate | Valid email addresses ÷ total contacts × 100 | 85% or more | 95% or more |
| Industry fill rate | Contacts with industry filled ÷ total contacts × 100 | 60% or more | 85% or more |
| Lifecycle stage set rate | Contacts with stage set ÷ total contacts × 100 | 80% or more | 98% or more |
| Contact owner set rate | Contacts with owner set ÷ total contacts × 100 | 70% or more | 95% or more |
| Phone number E.164 compliance rate | E.164-format numbers ÷ contacts with a phone number × 100 | 70% or more | 99% or more |
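The KPI formulas above are straightforward to compute from an exported contact list. A minimal sketch, assuming each contact is a dict of property values (with `None` for blanks) and using a deliberately simplified E.164 check:

```python
# Sketch: compute the data quality KPIs from the table above over an
# exported contact list. Contact dicts use assumed property names.

def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal; 0.0 when the denominator is 0."""
    return round(100.0 * numerator / denominator, 1) if denominator else 0.0

def data_quality_kpis(contacts: list, duplicate_count: int) -> dict:
    total = len(contacts)
    with_phone = [c for c in contacts if c.get("phone")]
    return {
        # Duplicate rate: duplicate records / total contacts
        "duplicate_rate": pct(duplicate_count, total),
        "industry_fill_rate": pct(sum(1 for c in contacts if c.get("industry")), total),
        "lifecycle_set_rate": pct(sum(1 for c in contacts if c.get("lifecyclestage")), total),
        "owner_set_rate": pct(sum(1 for c in contacts if c.get("hubspot_owner_id")), total),
        # E.164 compliance among contacts that have any phone number at all.
        # A leading "+" is a crude proxy; real validation needs more checks.
        "e164_rate": pct(sum(1 for c in with_phone if str(c["phone"]).startswith("+")), len(with_phone)),
    }
```

Note the E.164 denominator: it is contacts *with* a phone number, not all contacts, matching the formula in the table.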
① Set up the Command Center weekly digest today (5 minutes). ② Create a format auto-correction workflow (name, email, phone) and apply it to existing records in bulk (1 to 2 hours). ③ Merge only the top 100 duplicate contacts from the Command Center's "Duplicates" tab (30 minutes). These three steps alone will lift most organizations' data quality scores by 5 to 10 points within a week. Rather than aiming for perfection, starting with the biggest problem is the fastest route to improvement.
Data Quality Command Center allows you to centrally understand problems using four tabs (duplicates, formats, missing items, and properties). By setting up weekly digest emails, administrators can notice increases and decreases in problems without having to log in. The first action today is to enable Digest.
Bring duplicates close to zero with three layers: immediate merging of identical emails (all plans) → AI-suggested similar records (Pro and up) → automatic merging with custom rules (Pro, beta). Before merging, agree on winner-record selection criteria across the organization to prevent erroneous merges.
Prevent formatting problems by setting "Format correction WF when creating contact" for new records. Existing issues can be fixed in bulk using "Fix and Automate" in Command Center. Name, email, and phone fields are the top priority targets.
While blocking input at the source by setting required fields, enrichment workflows fill in records that are already blank. Narrow down your requirements to 3 to 5 fields that are truly important. Making everything mandatory will have the opposite effect of increasing erroneous and inappropriate input.
Decide on daily, weekly, monthly, and quarterly cadences, and determine in advance which indicators to check and which actions to take at each. In particular, simply making "review duplicate candidates (30 minutes)" a weekly habit and "address the top 5 missing properties" a monthly one keeps the quality score improving.
Track KPIs such as the duplicate rate, valid email rate, and industry fill rate against initial and target goals.