CRM data: what's in your CRM, how it decays, where to source it

07 May 26

What does CRM data actually contain, and how does it decay by field? DataLane provides discovery-first sourcing for local-business CRMs.

CRM data is the structured set of fields a business stores about its customers, leads, and accounts to coordinate sales, marketing, and service. Every record is a row of identity, descriptive, quantitative, and qualitative fields. Every field has a different decay rate, a different source of truth, and a different cost when wrong. Most articles on this topic stop at the four-bucket definition. This one goes field by field.

Before any of that, a segment qualifier. What "good CRM data" looks like depends on who you sell to. For LinkedIn-native enterprise SaaS, CRM data is mostly identity and firmographic enrichment from LinkedIn-scraped sources, and the decay problem is title and email churn. For local-business GTM (home services, restaurants, trades, franchises, route operators), most of the contact-level fields ZoomInfo, Apollo, Clay, Cognism, and Lusha populate (LinkedIn URL, work email, role title) are missing on 50%+ of records. The data problem and the decay problem are different in each segment, so the playbook has to be different too.

One more upstream distinction shapes everything that follows. Discovery (building the universe of businesses and decision-makers from scratch, especially the local segments LinkedIn doesn't index) is a different job than enrichment (filling in attributes for accounts you already know). Clay, Apollo, and ZoomInfo do enrichment. DataLane does discovery. The rest of this article assumes that split.

1. What CRM data actually is

A CRM record is a row of fields about a person, a company, or a deal. The four standard buckets are identity (who the record is), descriptive (context about the record), quantitative (behavior and engagement), and qualitative (notes, calls, conversation). Each bucket has its own sourcing pipeline and its own failure mode.

1.1. Identity data

Identity fields include full name, work email, mobile phone, LinkedIn URL, account name, and account domain. Some of these fields are stable. Account name and account domain rarely change outside of a rebrand or acquisition. Others are volatile. Work email rotates on every job change. Mobile phone rotates on number portability and role changes. LinkedIn URL can disappear when a profile is renamed or deleted. Full name is the most stable field on the record and the worst single key to dedupe on.

1.2. Descriptive data

Descriptive fields include job title, role and seniority, department, account industry or NAICS, company size, location, and tech stack. Title is one of the most-used and most-stale fields in most CRMs. NAICS is structurally unreliable for many local segments. A pizza shop and a Michelin-starred restaurant are both NAICS 722511. Tech stack data comes from third-party detection (BuiltWith, HG Insights, Wappalyzer-style web scrapers) and decays as vendors rotate contracts.

1.3. Quantitative data

Quantitative fields include email opens and clicks, page visits, content downloads, deal stage history, last-touch date, and lifecycle stage transitions. The source is mostly first-party: marketing automation, sales engagement, and product telemetry. Decay isn't really the issue here. Completeness is. So is identity stitching across tools that each generate their own contact ID.

1.4. Qualitative data

Qualitative fields are call notes, meeting summaries, support tickets, recorded objections, and custom fields filled in by AEs. This is the most valuable, most under-managed bucket in any CRM. Conversation-intelligence tools (Gong, Chorus, Avoma) increasingly write structured notes back to the record automatically. Without that pipe, qualitative data dies on the rep's laptop.

2. The decay rate of each CRM field

Contact-level CRM data decays at a widely cited baseline of about 30% per year for enterprise B2B. That's the average. Different fields decay at very different rates, and local-business records decay structurally faster than enterprise records. Match a refresh cadence to each field's decay rate, not to the slowest field on the record.

Field	Decay rate	Why it decays	Refresh strategy
Work email	High (~22-30% / yr)	Job changes, M&A, domain rotation	Monthly verification + enrichment
Mobile phone	High	Number portability, role change	On-demand verification before dial
LinkedIn URL	Medium	Profile rename, account deletion	Match-back on enrichment refresh
Job title	High	Promotions, role changes, departures	Real-time signal monitoring
Account domain	Low	Acquisitions, rebrands	Annual reconciliation
Account industry / NAICS	Low-medium	Code revisions, business pivot	Annual; structurally unreliable for local segments
Tech stack	Medium	Vendor rotation, contract churn	Quarterly re-detection
Phone (HQ / landline)	Medium-high in local segments	Closure, ownership change, line disconnect	Verify at use
Mailing address	Medium	Office moves, hybrid work	Validate at use

These rates are directional. The 30% baseline is for enterprise B2B. Local-business contact decay runs faster for structural reasons.
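To make the baseline concrete, here is a minimal sketch of how an annual decay rate compounds over time. It assumes a constant rate that compounds smoothly across the year, which is a simplification added here and not a claim about how any particular field behaves. The function name `share_still_valid` is hypothetical.

```python
def share_still_valid(annual_decay: float, months: float) -> float:
    """Fraction of records expected to still be accurate after `months`.

    Assumption: decay is constant and compounds smoothly, so a 30% annual
    rate leaves 70% of records valid after 12 months.
    """
    if not 0.0 <= annual_decay < 1.0:
        raise ValueError("annual_decay must be in [0, 1)")
    if months < 0:
        raise ValueError("months must be non-negative")
    return (1.0 - annual_decay) ** (months / 12.0)


if __name__ == "__main__":
    # Compare the 30% enterprise baseline at a few horizons.
    for horizon in (1, 3, 12, 24):
        print(horizon, "months:", round(share_still_valid(0.30, horizon), 3))
```

Under this model, a list left untouched for two years at the 30% baseline is roughly half stale, which is why cadence should follow the fastest-decaying fields you actually use.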

2.1. Why local-business records decay faster

A 30-restaurant route operator turns over phone lines and ownership at a higher rate than a Fortune 1000 sales org. Closures, ownership transitions, line disconnects, and seasonal staffing all hit the contact record. There is often no stable corporate email or LinkedIn. The decay rate isn't a number you can pull off a vendor's blog. The structural reasons are real and they show up at use time as bounce rates, wrong-number calls, and meetings that never book.

2.2. The cost of stale fields

A bounced email burns sender reputation. A wrong mobile burns AE time. A wrong title misroutes the lead. A wrong NAICS code lands the wrong messaging. Each of these has a direct operational cost. The most measurable one is rep time. Manual enrichment of a single account (LinkedIn search, license lookup, ownership match-back, mobile verification) takes about 45 minutes when done by hand. With a discovery-first stack and a clean source-of-truth assignment, the same record takes about two minutes. That delta, multiplied across the rep's daily account list, is what bad CRM data actually costs.
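The arithmetic behind that delta can be sketched in a few lines. The 45-minute and two-minute figures come from the paragraph above; the number of accounts per day is an assumption you would replace with your own.

```python
MANUAL_MINUTES_PER_ACCOUNT = 45   # figure from the article
ASSISTED_MINUTES_PER_ACCOUNT = 2  # figure from the article


def enrichment_tax_hours(accounts_per_day: int, working_days: int = 1) -> float:
    """Rep hours spent on manual enrichment beyond the assisted baseline."""
    if accounts_per_day < 0 or working_days < 0:
        raise ValueError("inputs must be non-negative")
    delta_minutes = MANUAL_MINUTES_PER_ACCOUNT - ASSISTED_MINUTES_PER_ACCOUNT
    return delta_minutes * accounts_per_day * working_days / 60


if __name__ == "__main__":
    # Hypothetical rep who researches 10 accounts a day.
    print(enrichment_tax_hours(accounts_per_day=10))
```

At 43 minutes saved per account, even a short daily account list adds up to hours of rep time.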

3. Where CRM data comes from

There are three real sources of CRM data. First-party (forms, conversations, product). Third-party LinkedIn-dependent providers (ZoomInfo, Apollo, Clay, Cognism, Lusha, RocketReach). And discovery-first, non-LinkedIn sources (license records, permits, POS detection, franchise hierarchies, citation data, business registrations). None of them is "best." Each one matches a specific field and a specific segment.

3.1. First-party data

Form fills, sales calls, product usage, and support tickets are the highest-fidelity CRM data you have. The prospect self-identified. The rep wrote the note. The product logged the event. First-party data is reliable for engaged accounts. It does not help you build the universe of accounts you should be selling to, because it only exists after engagement.

3.2. LinkedIn-dependent providers

ZoomInfo, Apollo, Clay, Cognism, Lusha, and RocketReach all share the same core architecture: LinkedIn scraping plus corporate web data plus contributory networks (where users contribute their own contacts in exchange for credits). For LinkedIn-native ICPs, this layer is the dominant source of identity and descriptive fields. For local-business ICPs, this layer caps mobile coverage at 10-20% of decision-makers, because the source data isn't there. A roofer doesn't keep a current LinkedIn. A franchise GM doesn't post on it. The architecture only works where the input exists.

3.3. Discovery-first sources

Discovery-first sources fill the cells LinkedIn-dependent enrichment can't. Contractor license records cover more than 805,000 active licenses across the US trades. Liquor and food permits, franchise corporate filings, POS technology detection, citation networks, business registrations, and payroll-derived employee counts each fill specific fields for specific segments. DataLane is built on these sources. Decision-maker mobile coverage on local-business ICPs runs 60%+ on a discovery-first stack, against 10-20% on LinkedIn-dependent stacks. Discovery-first sourcing is a complement to the LinkedIn-dependent layer, not a replacement.

Source	Best for	Limit
First-party	Engaged accounts, behavior fields	Only works post-engagement
LinkedIn-dependent (ZoomInfo, Apollo, Clay, Cognism, Lusha)	Enterprise / mid-market identity + descriptive	10-20% coverage in local-business segments
Discovery-first (license, permit, POS, citation)	Local-business + trades + franchise universe	Less mature for enterprise tech contacts

4. CRM data hygiene

Most CRM hygiene fails because teams treat every field the same way. Hygiene is a function of three things. Field-level acceptance rules. Refresh cadence per field. An unambiguous source of truth per field. Get those three right and most of the dedupe-and-validate checklist takes care of itself.

4.1. Field-level acceptance rules

Email format validation. Phone normalization to E.164. Domain canonicalization (strip www, lowercase). Country code on every address. These are cheap to implement and high ROI on every downstream automation that runs against the record.
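A minimal sketch of three of these rules using only the standard library. The function names are hypothetical, the email check is a deliberately loose format test (it does not verify deliverability), and the phone logic assumes North American numbers when no country code is given. A production system would more likely rely on a dedicated phone-parsing library.

```python
import re
from typing import Optional

# Loose format check: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_plausible_email(value: str) -> bool:
    """Format check only; says nothing about whether the mailbox exists."""
    return bool(EMAIL_RE.match(value.strip()))


def to_e164(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Normalize a phone string to E.164, or return None if it can't be.

    Assumption: numbers without a leading '+' are North American.
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        candidate = digits
    elif len(digits) == 10:
        candidate = default_country_code + digits
    elif len(digits) == 11 and digits.startswith(default_country_code):
        candidate = digits
    else:
        return None
    # E.164 allows at most 15 digits; the lower bound is a heuristic.
    if not 8 <= len(candidate) <= 15:
        return None
    return "+" + candidate


def canonical_domain(raw: str) -> str:
    """Lowercase, drop scheme and path, strip a leading 'www.'."""
    value = raw.strip().lower()
    value = re.sub(r"^[a-z][a-z0-9+.-]*://", "", value)
    value = value.split("/", 1)[0]
    if value.startswith("www."):
        value = value[4:]
    return value
```

Running these at write time, before a value reaches the record, keeps every downstream automation from having to re-clean the same field.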

4.2. Refresh cadence per field

Daily for active engagement records. Monthly for contact-level enrichment. Quarterly for account firmographic data. Annual for industry classification. Match cadence to the decay rate from the prior section. Refreshing every field every day is wasted spend; refreshing nothing is decay you're paying for at use time.
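One way to encode this cadence is a small lookup plus a due check. This is a sketch: the category names are hypothetical labels for the four tiers above, and the day counts (30, 90, 365) are one reasonable reading of "monthly", "quarterly", and "annual".

```python
from datetime import date, timedelta

# Refresh interval per field category, in days (assumed mapping).
REFRESH_INTERVAL_DAYS = {
    "engagement": 1,                 # daily
    "contact_enrichment": 30,        # monthly
    "firmographic": 90,              # quarterly
    "industry_classification": 365,  # annual
}


def is_due(category: str, last_refreshed: date, today: date) -> bool:
    """True when the category's refresh interval has elapsed."""
    interval = timedelta(days=REFRESH_INTERVAL_DAYS[category])
    return today - last_refreshed >= interval


if __name__ == "__main__":
    last = date(2026, 1, 1)
    print(is_due("contact_enrichment", last, date(2026, 1, 31)))
    print(is_due("firmographic", last, date(2026, 1, 31)))
```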

4.3. Source of truth per field

Most fights about CRM data quality are fights about which system overwrites which. Marketing automation versus sales engagement versus enrichment provider versus AE manual entry. Pick a winner per field. Document it. Enforce it on write. This is the single biggest hygiene lever in most CRMs and the one most teams skip.
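"Enforce it on write" can be as simple as a guard that checks the writing system against a documented owner. The sketch below is illustrative only: the field names, system names, and owner assignments are hypothetical, and a real CRM would implement this in its own validation or integration layer.

```python
# Documented winner per field (hypothetical assignments).
FIELD_OWNER = {
    "work_email": "enrichment_provider",
    "lifecycle_stage": "marketing_automation",
    "last_touch_date": "sales_engagement",
}


def apply_write(record: dict, field: str, value, source: str) -> bool:
    """Write `value` only if `source` is the field's documented owner.

    Fields with no registered owner are rejected, which forces the
    ownership decision to be made before the field is used.
    Returns True if the write was applied.
    """
    if FIELD_OWNER.get(field) != source:
        return False
    record[field] = value
    return True


if __name__ == "__main__":
    contact = {}
    print(apply_write(contact, "work_email", "pat@example.com", "enrichment_provider"))
    print(apply_write(contact, "work_email", "typo@example.com", "ae_manual_entry"))
    print(contact)
```

A stricter or looser policy (for example, letting any source fill an empty field) is a design choice; the point is that the rule is written down and applied automatically.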

5. CRM data models

The canonical CRM object model is Lead, Contact, Account, Opportunity, Activity. Custom fields proliferate around industry-specific attributes, intent scores, and segmentation tags. Every custom field is a maintenance liability. The standard objects are well understood; the cost lives in the custom layer.

5.1. Lead vs. contact vs. account

Lead is unqualified inbound. Contact is associated with an Account post-qualification. Account is the company. The lead-to-contact conversion is where most CRM data quality breaks, because dedupe logic at conversion is rarely tight, and the same person can exist as a Lead and a Contact at the same time.
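A tighter conversion check looks for an existing Contact before creating a new one. The sketch below matches on normalized email or phone digits and deliberately ignores full name, since names are poor dedupe keys. The field names are hypothetical, and it assumes phone numbers were already normalized to a common format upstream.

```python
def match_keys(record: dict) -> set:
    """Build the set of identity keys a record can be matched on."""
    keys = set()
    email = (record.get("email") or "").strip().lower()
    if email:
        keys.add(("email", email))
    # Assumes phones share one format already; only digits are compared.
    phone = "".join(ch for ch in (record.get("phone") or "") if ch.isdigit())
    if phone:
        keys.add(("phone", phone))
    return keys


def find_duplicate_contacts(lead: dict, contacts: list) -> list:
    """Return existing contacts that share an email or phone with the lead."""
    lead_keys = match_keys(lead)
    return [c for c in contacts if lead_keys & match_keys(c)]


if __name__ == "__main__":
    lead = {"name": "Pat Lee", "email": "Pat.Lee@Example.com"}
    contacts = [
        {"id": 1, "name": "Patricia Lee", "email": "pat.lee@example.com"},
        {"id": 2, "name": "Pat Lee", "email": "someone.else@example.com"},
    ]
    print(find_duplicate_contacts(lead, contacts))
```

In the usage example, the contact with the same name but a different email is not treated as a match, while the one with a differently cased email is.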

5.2. Custom fields

Every custom field needs an owner, an acceptance rule, and a refresh cadence. Most don't have any of those. They get created in a sprint, filled in for a quarter, and then sit at 18% completeness for years. That is the quiet cost of CRM debt.
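Completeness is easy to measure once records are exported as rows. This is a sketch under the assumption that each record is a plain dictionary and that `None` or an empty string means "not filled"; the field name in the example is hypothetical.

```python
def completeness(records: list, field: str) -> float:
    """Share of records where `field` holds a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


if __name__ == "__main__":
    rows = [
        {"route_density": "high"},
        {"route_density": ""},
        {},
        {"route_density": None},
        {"route_density": "low"},
    ]
    print(completeness(rows, "route_density"))
```

Running this per custom field on a schedule gives each field's owner a number to defend or a reason to retire the field.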

6. CRM data examples

The shape of a healthy record depends on the segment. Two examples, side by side.

6.1. An enterprise SaaS contact record

Identity: full name, work email (verified within 30 days), LinkedIn URL, mobile phone (verified at use). Descriptive: job title, role classification, department, tech stack from BuiltWith. Quantitative: last touch date, MQL date, current deal stage. Qualitative: most recent call summary from Gong. Source of truth: enrichment provider for identity, marketing automation for behavior, sales engagement for last touch.

6.2. A local-business account record

Identity: business name, primary phone (verified), owner name where available, decision-maker mobile (verified). Descriptive: license type and number, NAICS or trade classification, service area, employee count from filings. Quantitative: last contact, season-relevant signals (storm, permit pull, renewal date). Qualitative: call notes, route density. The LinkedIn URL field is often empty here. That isn't a data-quality failure. It's a sourcing reality. About half of local-business decision-makers have no current LinkedIn presence, and the discovery-first sources fill the cells that absence creates.

7. How CRM data quality connects to revenue

CRM data quality doesn't show up as a line item on the P&L. It shows up as bounce rate, dial-to-connect rate, meeting-book rate, and pipeline coverage ratio. Stale email drives bounce rate, which drives sender reputation, which drives deliverability on the next campaign. Wrong mobile drives AE time burned per booked meeting. Mis-segmented industry drives wrong messaging, which drives reply rate. None of these are clean stats to cite, but the cause-and-effect is real. The one quantified anchor: the manual enrichment tax. 45 minutes per account by hand. Two minutes per account on a clean discovery-first stack. The delta is the cost of bad CRM data, paid by your most expensive role.

8. Buying CRM data

The buyer's checklist for CRM data has five items. Source transparency: where does the data actually come from? Match-back logic: how does the provider tie a record to your existing accounts? Refresh cadence: how often does each field get re-verified? Segment coverage: test against your own 100 target accounts, not the provider's database size. Pricing structure: per-record, per-credit, or per-seat, and what gets metered?

8.1. The 100-account test

Take your 100 most important target accounts. Ask the provider to enrich them. Measure mobile coverage, email verification rate, and title accuracy on those 100 specifically. This is the honest benchmark. It works the same way for ZoomInfo, Apollo, Clay, Cognism, Lusha, or a discovery-first provider. Database size is a vanity metric. Coverage on your 100 is the only number that matters.
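Scoring the test takes only a few lines once the provider returns the enriched sample. The sketch below assumes each returned record is a dictionary with hypothetical keys `mobile`, `email_verified`, and `title_correct`, where `title_correct` is a flag you set yourself after checking the title by hand.

```python
def coverage_report(enriched: list) -> dict:
    """Compute the three benchmark rates over the enriched sample."""
    total = len(enriched)
    if total == 0:
        raise ValueError("sample is empty")

    def rate(predicate) -> float:
        return sum(1 for r in enriched if predicate(r)) / total

    return {
        "mobile_coverage": rate(lambda r: bool(r.get("mobile"))),
        "email_verified_rate": rate(lambda r: r.get("email_verified") is True),
        "title_accuracy": rate(lambda r: r.get("title_correct") is True),
    }


if __name__ == "__main__":
    sample = [
        {"mobile": "+14155550132", "email_verified": True, "title_correct": True},
        {"mobile": "", "email_verified": True, "title_correct": False},
    ]
    print(coverage_report(sample))
```

Run the same function over each provider's output for the same 100 accounts so the numbers are directly comparable.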

8.2. Pricing models

Per-record pricing aligns cost with consumption. Per-credit pricing meters specific actions (export, reveal, enrichment refresh). Per-seat pricing holds cost flat regardless of volume. Match the model to your usage curve. An SDR org with 30 reps each pulling 200 records a day has a different optimal model than a four-person founder-led sales team running monthly enrichment refreshes.
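A rough way to compare models is to compute monthly cost under each and find the volume where they cross. This is a sketch: the function names are hypothetical, and every price used in the example is a made-up placeholder, not a quote from any provider.

```python
def monthly_cost_per_record(reps: int, records_per_rep_per_day: int,
                            working_days: int, price_per_record: float) -> float:
    """Total monthly spend when every pulled record is billed."""
    return reps * records_per_rep_per_day * working_days * price_per_record


def monthly_cost_per_seat(reps: int, price_per_seat: float) -> float:
    """Total monthly spend when cost is flat per user."""
    return reps * price_per_seat


def breakeven_records_per_rep(price_per_seat: float, price_per_record: float) -> float:
    """Monthly records per rep above which a seat is the cheaper option."""
    if price_per_record <= 0:
        raise ValueError("price_per_record must be positive")
    return price_per_seat / price_per_record


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    print(monthly_cost_per_record(30, 200, 20, price_per_record=0.10))
    print(monthly_cost_per_seat(30, price_per_seat=150.0))
    print(breakeven_records_per_rep(150.0, 0.10))
```

With your own quotes plugged in, the breakeven figure shows which side of the curve each team sits on.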

Frequently asked questions

What is CRM data?

CRM data is the structured set of fields a business stores about its customers, leads, and accounts. The four standard buckets are identity (name, email, phone, LinkedIn URL, account domain), descriptive (title, role, industry, tech stack), quantitative (engagement, deal stage, last touch), and qualitative (call notes, custom fields, support tickets).

How fast does CRM data decay?

The widely cited enterprise B2B baseline is about 30% per year for contact-level data. Work email, mobile phone, and job title decay fastest. Account domain and industry classification decay slowest. Local-business records decay structurally faster because of higher closure rates, ownership transitions, and phone-line turnover.

Where does CRM data come from?

Three sources. First-party (forms, calls, product). Third-party LinkedIn-dependent providers like ZoomInfo, Apollo, Clay, Cognism, and Lusha. And discovery-first, non-LinkedIn sources like license records, permits, POS detection, and franchise filings. Each source fills different fields for different segments.

What's the difference between LinkedIn-dependent and discovery-first CRM data?

LinkedIn-dependent providers scrape LinkedIn, corporate web data, and contributory networks. They cover enterprise and mid-market identity well and cap at 10-20% mobile coverage on local-business decision-makers. Discovery-first sources pull from license records, permits, POS, and franchise data and run 60%+ mobile coverage on local-business segments.

How often should I refresh CRM data?

Match cadence to decay rate per field. Daily for active engagement. Monthly for contact-level enrichment. Quarterly for account firmographic. Annual for industry classification. Refreshing every field every day is wasted spend.

What is the manual enrichment tax?

A single account enriched by hand (LinkedIn search, license lookup, ownership match-back, mobile verification) takes about 45 minutes. The same record on a clean discovery-first stack takes about two minutes. The delta, multiplied across an AE's daily account list, is what bad CRM data costs in rep time.

How do I evaluate a CRM data provider?

Run the 100-account test. Take your 100 most important target accounts, ask the provider to enrich them, and measure mobile coverage, email verification rate, and title accuracy. Database size is a vanity metric. Coverage on your accounts is the only number that matters.

Data quality compounds. Fix the source layer first; the workflow layer is downstream.