Buyer intent data architecture: how it's collected, scored, and wired

07 May 26

Buyer intent data architecture

Buyer intent data is a set of behavioral signals (content consumption on B2B publisher sites, review-site comparisons, first-party web visits, search-keyword surges) that, when aggregated to the account level, indicate which accounts are actively in-market for a category. This article goes one layer under the taxonomy. It's about architecture and implementation: how the signals get collected, how they become a score, how they wire into Salesforce or HubSpot, and where the architecture breaks for non-LinkedIn-native ICPs. One framing note up front: intent is a discovery problem, not an enrichment problem. Enrichment tools (Clay, Apollo, ZoomInfo) fill in attributes on accounts you already know. Discovery surfaces the in-market accounts and local-business decision-makers you don't. The architecture choices below are about discovery quality, not record fill-rates.

1. What buyer intent data is, in one sentence

1.1. The one-sentence definition

Buyer intent data is a set of behavioral signals that, when aggregated to the account level, indicate which accounts are actively in-market for a category.

1.2. What this article covers (and what it doesn't)

This piece is for RevOps and sales-ops practitioners who already know what intent data is at a definitional level and need to understand how it's collected, scored, and wired into a sales workflow. For the taxonomy (first-party vs. second-party vs. third-party) and use-case breakdown, the sibling b2b-intent-data piece is the right entry. For vendor shopping, see the intent-data-providers buyer's guide. This one goes one layer below both.

1.3. Who this is written for

RevOps leads wiring intent into Salesforce or HubSpot, sales-ops managers defining SLAs on intent-flagged accounts, ABM managers evaluating provider signals against their target list, and teams selling into non-LinkedIn-native ICPs (local businesses, trades operators, franchise decision-makers) who need to understand where standard intent architectures have structural gaps before they spend the budget.

2. The four collection architectures behind buyer intent data

2.1. Publisher co-op tagging (Bombora-style third-party intent)

The dominant third-party model. A data co-op (Bombora's network of 5,000+ B2B publisher sites is the canonical example) tags content pages across participating publishers. When a user reads a page tagged "CRM data enrichment" or "sales intelligence platform," a privacy-compliant identifier is captured and resolved to a company domain via IP, cookie graph, or deterministic matching. Signals are aggregated at the account level over a rolling window, baselined against the account's own historical consumption to distinguish a surge from steady-state reading, then licensed as a feed to vendors. 6sense, Cognism, ZoomInfo, and many CRM and ABM platforms embed Bombora as their intent layer. The architecture depends entirely on the co-op covering the sites the buyer actually reads. For enterprise SaaS and B2B tech buyers, coverage is strong. For local-business operators who read trade publications, franchise newsletters, and state licensing portals instead of G2 or TechCrunch, coverage is thin by construction.

2.2. Reverse-IP and first-party web identification

First-party architecture. A JavaScript snippet on your own site captures visitor sessions, including anonymous visits. Reverse-IP resolution (RB2B, LeadForensics, Dreamdata, 6sense's first-party layer) matches the visit to a company. The visitor's behavior (pages viewed, pricing-page dwell time, return cadence) gets stitched into an account-level session graph. Fidelity is the highest of any intent type because the buyer is on your site. Reach is the narrowest because you only catch accounts that found you. Reverse-IP vendors vary significantly in how aggressively they identify known individuals versus staying at the account level. Review vendor privacy posture before signing.

2.3. Review-site and comparison-site behavior

Second-party architecture. G2, TrustRadius, Capterra, and similar platforms capture logged-in and cookied users researching categories, comparing vendors, and clicking "Get a Quote" on specific products. Because the user is demonstrating active evaluation behavior, the signal's purchase-intent density is higher than raw publisher content consumption. G2's buyer intent product is the most operationally mature of this type. Architectural limit: the category needs meaningful presence on the review site. Emerging or niche categories generate thin signal.

2.4. Search-keyword surge and ad-ecosystem signals

Less discussed but structurally real. Some providers ingest anonymized search-keyword surges, paid-ad engagement data, and technographic-change events (new tech stack detected, employee hiring surge in relevant functions). These are weaker standalone signals than co-op content consumption, useful as corroborating features in a predictive model.

2.5. What the architectures share

All four produce event streams at the account level. All four depend on entity resolution: mapping a raw event (IP, cookie, review click, keyword hit) to a canonical company identifier. Entity resolution is where most architectural variance hides. A vendor with weaker entity resolution misattributes events to the wrong account, inflating noise at the top of the funnel. When evaluating providers, ask how the event-to-account mapping is done and what the match-confidence distribution looks like, not just how many raw signals get ingested.
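As a rough illustration of why the match-confidence distribution matters, the sketch below resolves raw events to accounts and drops anything under a confidence cutoff. The lookup table, the 0.8 cutoff, and the example domains are assumptions made up for the example, not any vendor's actual method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    key: str    # raw identifier: IP, cookie ID, or similar
    topic: str  # topic the consumed content was tagged with


# Hypothetical lookup: raw key -> (account domain, match confidence in 0..1).
RESOLUTION_TABLE = {
    "203.0.113.10": ("acme.example", 0.95),
    "203.0.113.77": ("globex.example", 0.55),
}

MIN_CONFIDENCE = 0.8  # assumed cutoff; real providers tune this differently


def resolve(events, table=RESOLUTION_TABLE, min_confidence=MIN_CONFIDENCE):
    """Return (events attributed per account, count of dropped events)."""
    attributed = Counter()
    dropped = 0
    for event in events:
        match = table.get(event.key)
        if match is None or match[1] < min_confidence:
            dropped += 1  # unknown key or low-confidence match: do not attribute
            continue
        attributed[match[0]] += 1
    return attributed, dropped


events = [
    RawEvent("203.0.113.10", "crm"),
    RawEvent("203.0.113.10", "crm"),
    RawEvent("203.0.113.77", "crm"),  # resolves, but below the cutoff
    RawEvent("198.51.100.5", "crm"),  # no match at all
]
attributed, dropped = resolve(events)
```

Lowering the cutoff attributes more events and raises misattribution risk, which is the trade-off to ask a provider about.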

3. How raw signals become an account score

3.1. Topic modeling and surge detection

Raw events are classified against a topic taxonomy (CRM, sales intelligence platform, ABM platform, data enrichment). For each account-topic pair, the provider computes a baseline consumption rate and flags a surge when recent consumption exceeds baseline by a configurable threshold. Surge detection (not raw volume) is the useful signal. An account reading one article per week on "CRM" is not in-market. An account that went from zero to twelve articles on "CRM data enrichment" in fourteen days probably is.
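The baseline-versus-recent comparison can be sketched in a few lines. This is a minimal illustration, not a provider's algorithm: the two-week recent window, the 3x multiplier, the minimum event count, and the 0.5 baseline floor are all assumed values.

```python
def is_surging(weekly_counts, recent_weeks=2, threshold=3.0, min_events=5):
    """Flag a surge when the recent weekly rate is at least `threshold` times
    the account's own baseline rate for one account-topic pair.

    weekly_counts: oldest-to-newest list of weekly content-consumption counts.
    """
    if len(weekly_counts) <= recent_weeks:
        return False  # not enough history to form a baseline
    baseline = weekly_counts[:-recent_weeks]
    recent = weekly_counts[-recent_weeks:]
    if sum(recent) < min_events:
        return False  # too little recent activity to call it a surge
    baseline_rate = sum(baseline) / len(baseline)
    recent_rate = sum(recent) / len(recent)
    # Floor the baseline so an account with no history does not divide by zero
    # or trip the flag on a single stray read.
    return recent_rate >= threshold * max(baseline_rate, 0.5)


steady_reader = [1] * 10           # one article a week: not in-market
zero_to_twelve = [0] * 8 + [5, 7]  # twelve articles in fourteen days
heavy_but_flat = [10] * 10         # high volume, but no change from baseline
```

`is_surging(zero_to_twelve)` is true while the other two are false, which is the point of the paragraph above: the change against baseline carries the signal, not the raw volume.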

3.2. Predictive buying-stage models

Predictive intent platforms (6sense, Demandbase) layer a model on top of raw signals that outputs a buying-stage classification: Awareness, Consideration, Decision, Purchase. Models are trained on historical closed-won and closed-lost data against first-party, third-party, technographic, and firmographic features. Model accuracy is bounded by the training set. Accounts that resemble historical winners get classified well. Outlier segments (new verticals, non-LinkedIn-native ICPs) get classified worse because the training data is thinner.

3.3. Score recency and decay

Intent signals decay. A surge from 72 hours ago is more actionable than one from 14 days ago. Providers differ in decay functions. Some apply linear decay, some exponential, some windowed. Ask. A provider that doesn't decay old signals will show every account you've ever seen as "still in-market," which defeats the operational point.
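The three decay shapes look like this as weight functions over signal age. The 30-day horizon, 7-day half-life, and 14-day window are illustrative assumptions; actual provider parameters vary and are worth asking about.

```python
def linear_decay(age_days, horizon_days=30.0):
    """Weight falls in a straight line from 1 to 0 over the horizon."""
    return max(0.0, 1.0 - age_days / horizon_days)


def exponential_decay(age_days, half_life_days=7.0):
    """Weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def windowed_decay(age_days, window_days=14.0):
    """Full weight inside the window, zero outside it."""
    return 1.0 if age_days <= window_days else 0.0


def decayed_score(signal_ages_days, decay=exponential_decay):
    """Sum of per-signal weights; each signal contributes less as it ages."""
    return sum(decay(age) for age in signal_ages_days)
```

Under the exponential shape, a 3-day-old signal (72 hours) carries roughly three times the weight of a 14-day-old one. A windowed function treats both the same until the older one falls off the edge.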

3.4. Where the transparent signals live (and where they don't)

Transparent signals (the surge topics, the recency, the underlying publisher category) are more operationally useful than black-box scores. A rep working an account needs to know why it's flagged, not just that it's flagged. When evaluating a provider, ask what surfaces into the CRM account record beyond a single score number. If the answer is "just a score," reps won't trust it and adoption dies.

4. Wiring intent signals into CRM and the sales workflow

4.1. Standard CRM object model for intent

Intent data should land on the Account object in Salesforce or the Company object in HubSpot, not in a parallel system. Minimum fields: intent score (numeric), top surge topic (picklist), second surge topic, signal recency (date), signal source (picklist for which provider generated the signal), last refresh date. Anything missing forces a rep into a second tool, and that second tool goes unused.
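One way to pin down the six minimum fields before mapping them to CRM fields is a small typed record. The class and attribute names below are hypothetical, not Salesforce or HubSpot API names, and the sample values are invented.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class AccountIntentFields:
    """The six minimum intent fields on the Account or Company record."""
    intent_score: float                # numeric score
    top_surge_topic: str               # picklist value
    second_surge_topic: Optional[str]  # may be empty if only one topic surges
    signal_recency: date               # date of the most recent signal
    signal_source: str                 # picklist: which provider generated it
    last_refresh_date: date            # when the record was last synced

    def to_crm_payload(self):
        """Serialize to a plain dict with ISO-formatted dates."""
        payload = asdict(self)
        payload["signal_recency"] = self.signal_recency.isoformat()
        payload["last_refresh_date"] = self.last_refresh_date.isoformat()
        return payload


example = AccountIntentFields(
    intent_score=82.0,
    top_surge_topic="CRM data enrichment",
    second_surge_topic="Sales intelligence platform",
    signal_recency=date(2026, 5, 4),
    signal_source="third-party co-op",
    last_refresh_date=date(2026, 5, 7),
)
payload = example.to_crm_payload()
```

Writing the record down this way makes a missing field visible at mapping time instead of after reps have already opened a second tool.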

4.2. Native connectors vs. flat-file sync vs. iPaaS

Native Salesforce or HubSpot connectors are the gold standard. The provider pushes updates directly to the account object, handles field mapping, and supports workflow triggers. Flat-file sync (nightly CSV drops) is a downgrade that works for batch use cases but not for timing-sensitive SLAs. Routing through an iPaaS (Workato, Tray, or Zapier) adds a failure surface and latency. For Salesforce enterprise teams with a mature admin, native is non-negotiable. For HubSpot mid-market, evaluate native vs. partner-built connectors on data-pipe latency and field-mapping flexibility.

4.3. Workflow triggers on signal changes

Intent data is operational only when signal changes fire workflow. Minimum: when an account's intent score crosses a threshold, fire a Salesforce flow that assigns the account to the correct sequence, notifies the account owner via Slack or email, and updates the account tier if the signal is sustained. Without these triggers, fresh signals sit in a dashboard until the rep happens to look. Usually too late.
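The trigger logic reduces to two checks: did the score cross the threshold on this update, and has it stayed above long enough to justify a tier change. The sketch below expresses that decision in plain Python as a stand-in for whatever the Salesforce flow would do. The threshold of 70, the 7-day sustain period, and the action names are assumptions.

```python
def crossed_threshold(previous_score, current_score, threshold=70.0):
    """True only on the update where the score moves from below to at/above."""
    return previous_score < threshold <= current_score


def actions_for_update(previous_score, current_score, days_above_threshold,
                       threshold=70.0, sustained_days=7):
    """Return the workflow actions a score update should fire (hypothetical names)."""
    actions = []
    if crossed_threshold(previous_score, current_score, threshold):
        # Fire once on the crossing, not on every refresh while above threshold.
        actions += ["assign_to_sequence", "notify_owner"]
    if current_score >= threshold and days_above_threshold >= sustained_days:
        actions.append("update_tier")
    return actions
```

For example, `actions_for_update(55, 82, 0)` returns the assignment and notification actions, while `actions_for_update(80, 85, 9)` returns only the tier update because the crossing already happened on an earlier refresh.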

4.4. SLAs on intent-flagged account action

Fresh signals decay, so action windows matter. A reasonable SLA: high-intent accounts worked within 24 hours, mid-intent within a week, low-intent during normal prospecting cadence. Write the SLA into the RevOps playbook, enforce it through Salesforce reports (accounts flagged more than 48 hours ago with no activity logged), and review it in weekly pipeline. Without timing discipline, the intent layer underperforms by a large margin.
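An SLA report of this kind boils down to comparing each flagged account's first logged activity against its deadline. The following is a minimal sketch over in-memory records. The field names and the tier labels are assumptions, and a real version would be a CRM report or a query against CRM data.

```python
from datetime import datetime, timedelta

# SLA windows from the text: 24 hours for high intent, one week for mid intent.
SLA_WINDOWS = {"high": timedelta(hours=24), "mid": timedelta(days=7)}


def sla_breaches(accounts, now):
    """Return names of accounts whose SLA window passed without timely activity."""
    breaches = []
    for account in accounts:
        window = SLA_WINDOWS.get(account["tier"])
        if window is None:
            continue  # low intent: handled in the normal prospecting cadence
        deadline = account["flagged_at"] + window
        last_activity = account.get("last_activity_at")
        worked_in_time = (
            last_activity is not None
            and account["flagged_at"] <= last_activity <= deadline
        )
        if not worked_in_time and now > deadline:
            breaches.append(account["name"])
    return breaches


now = datetime(2026, 5, 7, 12, 0)
accounts = [
    {"name": "a", "tier": "high", "flagged_at": datetime(2026, 5, 5, 9, 0)},
    {"name": "b", "tier": "high", "flagged_at": datetime(2026, 5, 6, 9, 0),
     "last_activity_at": datetime(2026, 5, 6, 15, 0)},
    {"name": "c", "tier": "mid", "flagged_at": datetime(2026, 5, 4, 9, 0)},
    {"name": "d", "tier": "low", "flagged_at": datetime(2026, 5, 1, 9, 0)},
]
overdue = sla_breaches(accounts, now)
```

Here only account `a` is overdue: `b` was worked inside its window, `c` still has time left on the one-week SLA, and `d` has no SLA.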

4.5. Measurement

The honest measurement of intent ROI is incremental account entry. Did the team touch accounts they wouldn't have touched otherwise, and did those accounts convert? "Engaged accounts" is vanity without incremental-entry tracking, and it's trivially inflated by counting accounts your team was already working. Build reporting that isolates three things: the intent-surfaced accounts that were net-new to the working universe, the motion those accounts received, and their conversion rates against non-intent-flagged accounts in the same tier and vertical.
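A sketch of the incremental-entry calculation, assuming each account record carries flags for whether intent surfaced it, whether the team was already working it, and whether it converted. The field names and the sample data are invented for illustration.

```python
def conversion_rate(accounts):
    """Share of accounts that converted; 0.0 for an empty list."""
    if not accounts:
        return 0.0
    return sum(a["converted"] for a in accounts) / len(accounts)


def incremental_entry_report(accounts, tier, vertical):
    """Compare net-new intent-surfaced accounts to unflagged peers in one segment."""
    segment = [a for a in accounts if a["tier"] == tier and a["vertical"] == vertical]
    # Net-new: surfaced by intent and NOT already in the working universe.
    net_new = [a for a in segment if a["intent_flagged"] and not a["already_worked"]]
    baseline = [a for a in segment if not a["intent_flagged"]]
    return {
        "net_new_accounts": len(net_new),
        "net_new_conversion": conversion_rate(net_new),
        "baseline_conversion": conversion_rate(baseline),
    }


def acct(flagged, worked, converted, tier="mid", vertical="hvac"):
    return {"intent_flagged": flagged, "already_worked": worked,
            "converted": converted, "tier": tier, "vertical": vertical}


accounts = (
    [acct(True, False, True)] + [acct(True, False, False)] * 3      # net-new
    + [acct(True, True, True)] * 2                                  # already worked
    + [acct(False, False, True)] + [acct(False, False, False)] * 4  # baseline
    + [acct(False, False, True, vertical="saas")]                   # other segment
)
report = incremental_entry_report(accounts, tier="mid", vertical="hvac")
```

The two already-worked accounts both converted, and counting them would inflate the intent number. Excluding them is what separates incremental entry from "engaged accounts."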

5. Where the architecture breaks

5.1. Entity resolution errors in collection

When a publisher co-op can't cleanly resolve a visiting IP or cookie to a company, the event drops or gets misattributed. Teams working from mis-resolved intent end up calling on accounts that weren't actually researching. The honest benchmark is measuring signal against accounts you independently know are in-market, not trusting the vendor's internal match rates.

5.2. Publisher co-op coverage gaps by segment

Co-op architectures depend on participating publishers. B2B SaaS categories have deep coverage. Local-business operator categories (restaurant operations, HVAC, home services, multi-location retail) have thin-to-nonexistent publisher co-op coverage. The decision-makers don't read the sites in the co-op. An intent bake-off on a restaurant-operator account list returns a handful of signals. On an enterprise SaaS list, it returns many. Match architecture to ICP.

5.3. Predictive model failure on outlier segments

Predictive buying-stage models are trained on historical closed-won patterns. If your ICP looks like the training set, classification accuracy is real. If your ICP is an outlier (a new vertical, a non-LinkedIn-native segment, a product line the provider's existing customers don't sell), the model generalizes poorly: it relies on features that predicted wins in its training data but carry little signal in your segment, and it returns stage classifications that don't match reality. Ask providers for segment-specific model accuracy, not aggregate.
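To see why aggregate accuracy can mislead, the sketch below scores predicted versus actual buying stage per segment. The segment names and the records are invented for illustration; in practice the "actual" stage would come from your own closed-won and closed-lost history.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, predicted_stage, actual_stage) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for segment, predicted, actual in records:
        totals[segment] += 1
        correct[segment] += int(predicted == actual)
    return {segment: correct[segment] / totals[segment] for segment in totals}


def aggregate_accuracy(records):
    records = list(records)
    hits = sum(int(predicted == actual) for _, predicted, actual in records)
    return hits / len(records)


records = [
    ("enterprise_saas", "Decision", "Decision"),
    ("enterprise_saas", "Consideration", "Consideration"),
    ("enterprise_saas", "Awareness", "Awareness"),
    ("enterprise_saas", "Decision", "Consideration"),
    ("restaurant_ops", "Decision", "Awareness"),
    ("restaurant_ops", "Consideration", "Awareness"),
    ("restaurant_ops", "Awareness", "Awareness"),
    ("restaurant_ops", "Decision", "Consideration"),
]
per_segment = accuracy_by_segment(records)
overall = aggregate_accuracy(records)
```

In this made-up sample the aggregate is 0.5, which hides a 0.75 on the segment that resembles the training data and a 0.25 on the outlier segment.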

5.4. Intent-to-reach gap

Intent tells you which accounts are in-market. It doesn't provide the decision-maker mobiles or emails needed to execute outreach. If the contact database feeding your team returns 10-20% decision-maker mobile coverage on a given ICP against a discovery-first benchmark of 60%+, roughly 80-90% of intent-flagged accounts become unreachable. Intent plus unreachable equals a pipeline illusion. The rep knows the account is in-market, can't get the decision-maker on a call, the signal expires, the pipeline doesn't move. This is the single biggest failure mode in intent implementations and the one least discussed in vendor marketing. Coverage times accuracy equals effective coverage, and effective coverage determines whether the intent layer produces pipeline.
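The coverage arithmetic is simple enough to run on your own numbers. In the sketch below, the 200 flagged accounts and the 90% accuracy figure are assumptions chosen for the example. The 20% and 60% coverage values come from the ranges quoted above.

```python
def effective_coverage(coverage, accuracy):
    """Share of accounts with a decision-maker contact that is present AND correct."""
    return coverage * accuracy


def reachable_accounts(flagged_accounts, coverage, accuracy):
    """Expected number of intent-flagged accounts a rep can actually reach."""
    return round(flagged_accounts * effective_coverage(coverage, accuracy))


FLAGGED = 200     # assumed number of intent-flagged accounts
ACCURACY = 0.90   # assumed share of returned mobiles that are correct

low = reachable_accounts(FLAGGED, coverage=0.20, accuracy=ACCURACY)
high = reachable_accounts(FLAGGED, coverage=0.60, accuracy=ACCURACY)
```

Under these assumptions, 20% coverage leaves 36 of 200 flagged accounts reachable, and 60% coverage leaves 108. The intent feed is identical in both cases. Only the contact layer changed.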

5.5. Attribution theater in the ROI case

Vendors often claim "intent-flagged accounts close at 3x the rate." The statistic is usually confounded. In-market accounts are over-represented in both the intent-flagged population and the closed-won population, independent of the intent layer. Without a control group, the causal claim is overstated. Treat vendor ROI case studies with the same skepticism you'd apply to any self-reported efficacy claim.

5.6. Dashboard without workflow

The most common operational failure mode. The data lands, the dashboard is beautiful, and no one works it. Intent data that sits in a dashboard nobody checks is a line item without a return. The workflow wiring and SLA discipline above are the preconditions for the data producing pipeline.

6. Buyer intent data for non-LinkedIn-native ICPs

6.1. Why standard intent architectures underdeliver for local operators

Publisher co-ops don't index the content local-business owners actually consume: state licensing portals, franchise disclosure filings, trade-specific publications, POS vendor documentation, permit databases. Predictive models trained on enterprise SaaS closed-won data don't classify restaurant-operator buying stages accurately, because the training set looks nothing like the target segment. The result: a standard third-party intent buy on a local-business ICP returns thin coverage and noisy stage classifications.

6.2. Vertical event data as a better intent proxy

For non-LinkedIn-native operator segments, vertical-specific events often function as a stronger intent proxy than traditional third-party content consumption. Examples include new contractor licensing events (a new business applying for a state license is structurally in-market for software and services), permit filings in home services (active projects signal near-term software and supply needs), franchise disclosure updates (expansion signals, vendor-displacement windows), and POS and tech-stack changes captured through public data. These events are captured at the account level by discovery-first data providers like DataLane, whose architecture indexes 17M+ US local-business locations from state licensing boards, permit filings, and franchise registries rather than LinkedIn or B2B publisher co-ops.

6.3. The two-layer stack for mixed-motion ABM

Teams with mixed motions (enterprise plus local) typically run a two-layer stack. A traditional intent provider (Bombora, 6sense, G2 buyer intent) covers the LinkedIn-native accounts. A discovery-first data provider (DataLane) covers the local and SMB tail the intent provider misses. Both layers feed the same ABM or outbound workflow. DataLane is not an intent data provider. It's the data layer that surfaces vertical event data as intent-adjacent signals and provides the decision-maker mobile coverage that makes intent-flagged accounts actually reachable.

6.4. Why Clay, ZoomInfo, Apollo, Cognism, and Lusha don't solve this

Prospects often assume one of the flexible horizontal contact platforms (Clay in particular, given its workflow composability) will close the coverage gap. It doesn't. ZoomInfo, Apollo, Clay, Cognism, and Lusha all share the same core architecture: LinkedIn scraping plus corporate web data. On LinkedIn-native ICPs, this architecture is strong. On local-business operators, trades decision-makers, and franchise operators, where roughly 50% of the target universe has no LinkedIn profile, all five inherit the same coverage ceiling. Switching among them doesn't change the underlying architecture. A discovery-first layer sourced from non-LinkedIn public data is what fills the gap.

7. Evaluating a buyer intent data provider

7.1. Start with architecture match to your ICP

If your ICP is LinkedIn-native enterprise tech, a publisher co-op architecture (Bombora, 6sense) covers you. If it's mixed motion, you need the two-layer stack. If it's primarily local-business operators, traditional third-party intent will underdeliver. Prioritize vertical event data and first-party intent. Match the architecture to the ICP before comparing vendor feature matrices.

7.2. Send your 100 accounts, not the vendor's

The standard bake-off trap: the vendor sends a curated sample with strong coverage, you sign, and real-world coverage on your actual list turns out materially lower. Never accept a vendor-selected sample. Send your 100 accounts from the real ICP. Measure what the vendor returns. Compare against two other providers on the same 100. Same methodology that applies to contact data evaluation.
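Scoring the bake-off is a set intersection: for each vendor, what share of your accounts came back with signal. The account and vendor names below are placeholders, and the list is shortened to five accounts to keep the example readable.

```python
def bakeoff_coverage(target_accounts, vendor_results):
    """Share of YOUR target accounts each vendor returned signal for.

    vendor_results: dict mapping vendor name -> iterable of account domains.
    Accounts a vendor returns that are not on your list are ignored.
    """
    targets = set(target_accounts)
    return {
        vendor: len(targets & set(returned)) / len(targets)
        for vendor, returned in vendor_results.items()
    }


# Placeholder data: in a real test this would be your 100 ICP accounts.
targets = ["a.example", "b.example", "c.example", "d.example", "e.example"]
results = {
    "vendor_one": ["a.example", "b.example", "c.example", "d.example", "z.example"],
    "vendor_two": ["a.example"],
}
coverage = bakeoff_coverage(targets, results)
```

Because every vendor is scored against the same list, the numbers are directly comparable, and off-list accounts such as `z.example` don't pad anyone's result.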

7.3. Check signal depth, not just score

Ask what surfaces into the CRM account record beyond a single score: the surge topics, signal recency, signal source, baseline vs. current volume. If the only artifact is a score, reps won't trust it and won't use it.

7.4. Define the intent-to-action workflow before signing

The single highest-value thing you can do before purchasing is define the response workflow in advance. Which BDRs work intent-flagged accounts. How fast. With what sequence. In which CRM field is the signal stored. What SLA applies. Providers that resist helping you define the workflow before signing are often selling the data and skipping the operational lift, which is where the ROI actually lives.

7.5. Audit the integration depth

Native connector to your CRM, clean field mapping, workflow-trigger support, and refresh cadence that matches your SLAs. Table-stakes. If a provider sells a strong data layer but the integration is a nightly CSV drop, discount the offer accordingly.

7.6. Where each major provider is the right choice

Bombora is the right choice when you want raw third-party intent without an ABM platform wrapper and your ICP is LinkedIn-native enterprise B2B. 6sense is the right choice when you have the account volume and analyst maturity to operationalize a predictive buying-stage model and your team already runs ABM at enterprise scale. G2 buyer intent is the right choice when your category has strong G2 presence and your sales motion benefits from comparison-stage signal. Demandbase is the right choice when you want intent plus account-based advertising orchestrated in one platform. For non-LinkedIn-native ICPs, none of the above solves the coverage problem alone. Pair with a discovery-first data layer.

Frequently asked questions

What is buyer intent data?

Behavioral signals (content consumption on B2B publisher co-op sites, review-site comparisons, first-party web visits, search-keyword surges) aggregated to the account level to indicate which accounts are actively in-market for a category. Intent data is signal about accounts. The full targeting stack is intent plus contact data plus firmographics.

How is buyer intent data actually collected?

Four main architectures. Publisher co-op tagging (Bombora) aggregates anonymous content consumption across a network of B2B sites. Reverse-IP identification (Dreamdata, RB2B, LeadForensics) resolves anonymous visitors to companies. Review-site behavior (G2 buyer intent) captures users researching or comparing vendors. Search-keyword surge and ad-ecosystem signals round out the stack as corroborating features. All four produce account-level event streams that are topic-classified and surge-detected before being surfaced as a score.

What's the difference between first-party, second-party, and third-party intent data?

First-party is behavior on your owned channels (site visits, product trial, content downloads). Highest fidelity, narrowest reach. Second-party is behavior on independent platforms where buyers actively evaluate (G2, TrustRadius). High purchase-intent density. Third-party is aggregated publisher co-op signal (Bombora). Broadest reach, lower fidelity per event, dependent on co-op coverage of the sites your buyers actually read.

How do you implement buyer intent data in Salesforce or HubSpot?

Intent lands on the Account or Company object with six fields: intent score, top surge topic, second surge topic, signal recency, signal source, last refresh date. Use native connectors. Avoid nightly CSV drops for timing-sensitive workflows. Fire workflow triggers on score thresholds (assign to sequence, notify owner, update tier). Set SLAs on response (24 hours for high-intent, one week for mid-intent) and enforce through Salesforce reports.

Is buyer intent data accurate?

Depends on architecture and segment. Third-party publisher co-op accuracy is a function of co-op coverage (strong for enterprise tech, thin for local-business operators) and entity-resolution quality. First-party reverse-IP accuracy is high on identifiable traffic, with meaningful drop-off on mobile and privacy-hardened browsers. Predictive model accuracy is high on segments that resemble the training set, lower on outliers. Always ask for segment-specific accuracy, not aggregate.

How much does buyer intent data cost?

Entry-tier first-party and smaller providers run $15K to $30K annually. Mid-market bundles (Cognism with Bombora embedded, G2 buyer intent, Dreamdata) run $30K to $80K. Enterprise platforms (6sense, Demandbase, Bombora direct) run $80K to $250K+ annually depending on seat count, account volume, and signal depth. Budget for the full stack (intent plus contact data plus ABM platform plus CRM plus marketing automation plus advertising), not the intent line item in isolation.

Buyer intent data architecture is the wiring between signals and the rest of the GTM stack. The signal source determines the segment fit. Publisher co-op topic signals work for LinkedIn-native ICPs and produce little usable signal for local-business segments, because those audiences don't read the publications in the co-op.