AI makes mistakes not because the AI is bad, but because the data is dirty. Inaccurate lead scores, personalized emails that contain obviously wrong information, and Sales constantly asking, "Is this contact a duplicate?" are all data quality issues. This chapter explains the overall picture of the Data Quality Command Center, duplicate detection and automatic merging, automatic format-correction workflows, the design of missing-data completion, and data quality KPIs with weekly digest operations.
The Data Quality Command Center is a unified dashboard that lets you understand and remediate data quality issues across your HubSpot portal from a single screen. It is accessed from "Data Management → Data Quality".
Four types of problems are detected automatically: duplicates, formatting problems, missing data, and unused properties. For each type, the count, the trend, and recommended actions are displayed together.
| Tab | Problems detected | Main countermeasure actions | Required plan |
|---|---|---|---|
| Duplicates | Multiple records for the same contact or company (same email address, similar names) | Review and merge items one by one; set automatic merge rules | Free and up (AI detection: Pro and up) |
| Formatting | Inconsistent phone number formats, mixed-case names, inconsistent date formats, malformed email addresses | Bulk correction / automatic correction rules (new records continue to be auto-corrected) | Professional and up |
| Missing data | Records with key properties left blank (industry, company size, lifecycle stage, etc.) | Completion workflows / manual bulk updates | Free and up (automation: Pro and up) |
| Properties | Properties with no value for more than 90 days, unused workflows, duplicate properties with similar meanings | Retire or consolidate properties; establish naming rules | Free and up |
From "Configure Data Quality Digest" at the bottom right of the Command Center's "Summary" tab, you can set up an automatic email that summarizes data quality every Monday morning. Administrators no longer have to log in every week to check: the digest surfaces anomalies such as "duplicates increased" or "formatting problems spiked since last week". Setup takes about a minute, so enable it as a top priority.
Duplicates are linked to concrete problems: two salespeople approach the same customer separately, the customer receives two emails, and the engagement history is split across records, which in turn leads to incorrect AI judgments. HubSpot's duplicate detection consists of three layers: immediate merging of records with the same email address (all plans), AI-suggested similar records (Pro and up), and automatic merging with custom rules (Pro, beta).
When merging two records, you must decide which record's information is kept as "correct" (the winner record). HubSpot's default is "the oldest record (the one created first) wins", but we recommend adjusting this based on the following criteria.
| Judgment criteria | Recommended winner | Reason |
|---|---|---|
| Older vs. newer creation date | Older record (default) | An old record with a long engagement history and activity log carries more information |
| More vs. less engagement | Record with more engagement | Preserves the rich history of email opens, web visits, and deal activity |
| More vs. less advanced lifecycle stage | More advanced stage | Make the Customer/MQL record the winner and merge the Subscriber record into it |
| Created via form vs. created manually | Form-created record | Information entered by the customer themselves is most likely to be accurate |
HubSpot's merge keeps property values from the winner record, so unique values on the loser side can be lost. In particular, always check "important information entered in custom properties" and "internal memos" before merging, and copy them over manually if necessary. A merge is difficult to undo, so verify the behavior on a small sample before enabling large-scale automatic merging.
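The winner-selection criteria above can be sketched as a small scoring function. This is a plain-Python illustration, not HubSpot's actual merge logic; the field names (`source`, `lifecycle_stage`, `engagement_count`, `created_at`) are hypothetical, not real HubSpot internal property names.

```python
# Sketch: choose the "winner" between two duplicate records, following the
# criteria table above. All field names are illustrative assumptions.

# Lifecycle stages ordered from least to most advanced.
LIFECYCLE_ORDER = ["subscriber", "lead", "mql", "sql", "opportunity", "customer"]

def pick_winner(a: dict, b: dict) -> dict:
    """Return the record that should survive the merge."""
    # 1. A form-submitted record wins: the customer entered it themselves.
    if (a.get("source") == "form") != (b.get("source") == "form"):
        return a if a.get("source") == "form" else b
    # 2. The more advanced lifecycle stage wins (Customer over Subscriber).
    sa = LIFECYCLE_ORDER.index(a.get("lifecycle_stage", "subscriber"))
    sb = LIFECYCLE_ORDER.index(b.get("lifecycle_stage", "subscriber"))
    if sa != sb:
        return a if sa > sb else b
    # 3. The record with more engagement (opens, visits, deals) wins.
    ea, eb = a.get("engagement_count", 0), b.get("engagement_count", 0)
    if ea != eb:
        return a if ea > eb else b
    # 4. Otherwise fall back to HubSpot's default: the older record wins.
    return a if a.get("created_at", 0) <= b.get("created_at", 0) else b
```

Encoding the priority order in code like this makes the organization-wide agreement from the table explicit and reviewable, instead of living in individual admins' heads.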
In Data Hub Professional and above, formatting actions can be added within a workflow. With these, the format is unified automatically the moment a record is created or updated, so no human has to keep making corrections.
By default, the workflow above applies only to newly created records. To apply it in bulk to tens of thousands of existing records, select the target property on the Command Center's "Formatting Issues" tab and click the "Fix and Automate" button. This fixes the current problems in one pass and then sets a rule that automatically fixes new records as well. Note that large volumes are processed sequentially inside HubSpot and may take several hours to complete.
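The kind of normalization such a workflow applies can be sketched in plain Python. This is an illustration of the logic only, under the assumption of Japanese domestic phone numbers for the E.164 example; in a real portal this would live in HubSpot's formatting actions or a custom code action.

```python
# Sketch of the normalization a format-correction workflow would apply
# to the three top-priority fields: name, email, and phone.
import re

def normalize_name(name: str) -> str:
    """Unify casing: 'TARO yamada' -> 'Taro Yamada'."""
    return " ".join(part.capitalize() for part in name.split())

def normalize_email(email: str) -> str:
    """Lower-case and trim surrounding whitespace."""
    return email.strip().lower()

def normalize_phone_jp(raw: str, country_code: str = "+81") -> str:
    """Convert a Japanese domestic number to E.164:
    '03-1234-5678' -> '+81312345678'.
    Assumes a domestic number with a leading trunk '0';
    production code should validate far more carefully."""
    digits = re.sub(r"\D", "", raw)  # strip hyphens, spaces, parentheses
    if digits.startswith("0"):
        digits = digits[1:]          # drop the domestic trunk prefix
    return country_code + digits
```

Running such rules on create/update keeps new records clean, while the "Fix and Automate" bulk pass handles the existing backlog.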
Missing data is the problem where there are no duplicates and the format is correct, but important fields are blank. If industry, company size, or lifecycle stage is left blank, AI accuracy, segment accuracy, and personalization accuracy all decrease. Handling missing data should be designed on two fronts: preventing blanks from occurring (required input) and filling in records that are already blank.
The best way to deal with missing data is a mechanism that blocks it at the point of entry. In HubSpot's property settings, checking "Required" prevents a record from being saved, or from advancing to the next stage, while that property is blank. However, making every property required makes data entry harder and increases wrong or junk values, so narrow it down to the truly important ones (about 3 to 5).
| Phase | Properties recommended as required | Reason |
|---|---|---|
| When creating a lead | First name / last name / email address | Minimum requirements for duplicate detection and key personalization |
| When promoting to MQL | Phone number / industry / contact owner | Information Sales needs for follow-up and segmentation |
| When creating a deal | Amount / expected close date / deal stage | Basic data for pipeline forecasting and reporting |
| When a deal is won | Contract start date / plan / CS owner | Starting point for CS handoff and health monitoring |
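The phase-based required-field design above can be sketched as a simple validation function. The property names used here (`firstname`, `hubspot_owner_id`, `closedate`, etc.) follow HubSpot's common internal naming, but this is an illustrative sketch, not a substitute for the portal's own "Required" setting.

```python
# Sketch: validate that a record has its phase-specific required
# properties filled, mirroring the table above. Property names are
# assumptions; real internal names vary per portal.
REQUIRED_BY_PHASE = {
    "lead_created": ["firstname", "lastname", "email"],
    "mql": ["phone", "industry", "hubspot_owner_id"],
    "deal_created": ["amount", "closedate", "dealstage"],
}

def missing_required(record: dict, phase: str) -> list:
    """Return the required properties that are blank for the given phase."""
    required = REQUIRED_BY_PHASE.get(phase, [])
    return [p for p in required if not record.get(p)]
```

A check like this is also useful before bulk imports: run it over the CSV rows first, and you know which records will fail the required-field gate before they ever reach HubSpot.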
Data quality is not a "clean once and done" matter. Every day new records are created, forms are submitted, imports run, and syncs arrive from external systems; in other words, data is constantly becoming new and dirty again. To maintain quality continuously, it is essential to set KPIs and build a cadence for checking them regularly.
| KPI | Calculation method | Initial goal | Target goal |
|---|---|---|---|
| Duplicate rate | Duplicate records ÷ total contacts × 100 | 5% or less | 2% or less |
| Valid email rate | Valid email addresses ÷ total contacts × 100 | 85% or more | 95% or more |
| Industry fill rate | Contacts with industry filled ÷ total contacts × 100 | 60% or more | 85% or more |
| Lifecycle stage set rate | Contacts with stage set ÷ total contacts × 100 | 80% or more | 98% or more |
| Contact owner set rate | Contacts with owner set ÷ total contacts × 100 | 70% or more | 95% or more |
| Phone number E.164 compliance rate | E.164-format numbers ÷ contacts with a phone number × 100 | 70% or more | 99% or more |
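The KPI formulas above are straightforward to compute from an exported contact list. A minimal sketch, assuming each contact is a dict of property values (with `None` for blanks) and using a deliberately simplified E.164 check:

```python
# Sketch: compute the data quality KPIs from the table above over an
# exported contact list. Contact dicts use assumed property names.

def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal; 0.0 when the denominator is 0."""
    return round(100.0 * numerator / denominator, 1) if denominator else 0.0

def data_quality_kpis(contacts: list, duplicate_count: int) -> dict:
    total = len(contacts)
    with_phone = [c for c in contacts if c.get("phone")]
    return {
        # Duplicate rate: duplicate records / total contacts
        "duplicate_rate": pct(duplicate_count, total),
        "industry_fill_rate": pct(sum(1 for c in contacts if c.get("industry")), total),
        "lifecycle_set_rate": pct(sum(1 for c in contacts if c.get("lifecyclestage")), total),
        "owner_set_rate": pct(sum(1 for c in contacts if c.get("hubspot_owner_id")), total),
        # E.164 compliance among contacts that have any phone number at all.
        # A leading "+" is a crude proxy; real validation needs more checks.
        "e164_rate": pct(sum(1 for c in with_phone if str(c["phone"]).startswith("+")), len(with_phone)),
    }
```

Note the E.164 denominator: it is contacts *with* a phone number, not all contacts, matching the formula in the table.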
① Set up the Command Center weekly digest today (5 minutes). ② Create a format auto-correction workflow (name, email, phone) and apply it to existing records in bulk (1 to 2 hours). ③ Merge only the top 100 duplicate contacts from the Command Center's "Duplicates" tab (30 minutes). These three steps alone will lift most organizations' data quality scores by 5 to 10 points within a week. Rather than aiming for perfection, starting with the biggest problem is the fastest route to improvement.
Data Quality Command Center allows you to centrally understand problems using four tabs (duplicates, formats, missing items, and properties). By setting up weekly digest emails, administrators can notice increases and decreases in problems without having to log in. The first action today is to enable Digest.
Bring duplicates close to zero with three layers: immediate merging of identical emails (all plans) → AI-suggested similar records (Pro and up) → automatic merging with custom rules (Pro, beta). Before merging, agree on winner-record selection criteria across the organization to prevent erroneous merges.
Prevent formatting problems by setting "Format correction WF when creating contact" for new records. Existing issues can be fixed in bulk using "Fix and Automate" in Command Center. Name, email, and phone fields are the top priority targets.
While blocking input at the source by setting required fields, enrichment workflows fill in records that are already blank. Narrow down your requirements to 3 to 5 fields that are truly important. Making everything mandatory will have the opposite effect of increasing erroneous and inappropriate input.
Decide on daily, weekly, monthly, and quarterly cadences, and determine in advance which indicators to check and which actions to take at each. In particular, simply making "review duplicate candidates (30 minutes)" a weekly habit and "address the top 5 missing properties" a monthly one keeps the quality score improving.
Track KPIs such as the duplicate rate, valid email rate, and industry fill rate against initial and target goals.