Modern enterprises operate across dozens of systems — each generating its own data, with its own identifiers, in its own format. When an event occurs across multiple channels, correlating that activity back to a single source becomes surprisingly hard. Different systems use different IDs for the same underlying entity, and reconciling them typically requires manual effort, rigid rules, or brittle lookups that break the moment data structures change.
This fragmentation creates real problems for security: if you can’t reliably connect activity across systems, you can’t see the full picture of what’s happening — and attackers know it.
The Core Problem
When data originates from different systems, the same underlying entity often carries different identifiers depending on where it was captured. A transaction, a session, a device fingerprint: each system may label it differently, with different structure, different attributes, and different timestamps. Traditional correlation approaches struggle when the data doesn’t conform to expected patterns or when new sources are added that weren’t anticipated at design time.
The result is fragmented visibility. Analysts work with incomplete pictures. Automated systems miss connections they should catch.
A Machine Learning Approach to Cross-Channel Identity
This patent application describes a system that uses a large language model to dynamically analyze and correlate data points across disparate sources, and to generate a shared identifier that follows that data across systems.
Rather than relying on pre-defined rules or schema mappings, the model examines attributes across data points: timestamps, behavioral patterns, telemetry signals. It then determines, with a measurable confidence score, whether two or more data points originated from the same source — even when they look structurally different.
When confidence exceeds an established threshold, the system assigns a cross-channel identifier that links the data across platforms without requiring it to be moved or centralized.
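To make the threshold step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method described in the application: the `DataPoint` fields, the 0.8 threshold, the 300-second window, and the weights are all hypothetical, and `score_same_origin` is a simple heuristic standing in for the large language model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPoint:
    source: str              # originating system, e.g. "web" or "mobile"
    local_id: str            # identifier used inside that system
    timestamp: float         # capture time, seconds since epoch
    signals: frozenset       # telemetry / behavioral attributes
    cross_channel_id: Optional[str] = None


def score_same_origin(a: DataPoint, b: DataPoint, window: float = 300.0) -> float:
    """Heuristic stand-in for the model; returns a confidence in [0, 1]."""
    union = a.signals | b.signals
    overlap = len(a.signals & b.signals) / len(union) if union else 0.0
    closeness = max(0.0, 1.0 - abs(a.timestamp - b.timestamp) / window)
    # Hypothetical weighting of signal overlap versus time proximity.
    return 0.7 * overlap + 0.3 * closeness


def link_if_confident(a: DataPoint, b: DataPoint, threshold: float = 0.8) -> float:
    """Assign a shared identifier only when confidence exceeds the threshold."""
    confidence = score_same_origin(a, b)
    if confidence > threshold:
        # Reuse an existing identifier if either point already has one.
        shared = a.cross_channel_id or b.cross_channel_id or str(uuid.uuid4())
        a.cross_channel_id = shared
        b.cross_channel_id = shared
    return confidence
```

Note that the records themselves are only tagged; nothing in the sketch copies them to a central location.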
How the System Works
The process follows a clear sequence:
- Data identification — The system locates data points from multiple sources captured at different times
- LLM-driven correlation — A large language model analyzes attributes across data points to assess whether they share a common origin
- Confidence scoring — A numerical confidence rating is generated for each correlation determination
- Secondary validation — A second AI engine reviews low-confidence determinations before a decision is made
- Identifier assignment — If confidence thresholds are met, a shared identifier is applied, linking the data across all relevant systems
- Ongoing verification — When new data arrives, the system confirms consistency with established historical patterns before extending the shared identifier
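The sequence above can be sketched as a small decision routine. This is a hedged illustration only: the `Scorer` type, the `Decision` record, the 0.8 acceptance threshold, and the rule that the secondary engine's score replaces the primary score are assumptions made for the example. The source does not say how the second engine's review is combined with the first result, or how consistency with history is measured.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical scorer signature: two records in, confidence in [0, 1] out.
Scorer = Callable[[dict, dict], float]


@dataclass
class Decision:
    linked: bool        # True if a shared identifier should be applied
    confidence: float   # final confidence used for the decision
    reviewed: bool      # True if the secondary engine was consulted


def correlate(a: dict, b: dict, primary: Scorer, secondary: Scorer,
              accept: float = 0.8) -> Decision:
    """Score with the primary engine; send low-confidence results to a second one."""
    confidence = primary(a, b)
    reviewed = False
    if confidence <= accept:
        # Assumption: the secondary score replaces the primary score.
        confidence = secondary(a, b)
        reviewed = True
    return Decision(linked=confidence > accept, confidence=confidence, reviewed=reviewed)


def may_extend_identifier(new_point: dict, history: Iterable[dict],
                          scorer: Scorer, accept: float = 0.8) -> bool:
    """Extend the identifier only if the new point fits every historical point."""
    history = list(history)
    # Assumption: with no history there is nothing to confirm against.
    return bool(history) and all(scorer(new_point, old) > accept for old in history)
```

Passing the scorers in as callables keeps the sketch independent of any particular model; in practice each would wrap a call to the relevant engine.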
Built-In Security Application
The system isn’t only about data management; it also has a direct security function. When incoming data produces a confidence score below the threshold, suggesting the activity is not consistent with established patterns, the system can block the associated user session in real time.
This turns the correlation engine into an active security control: the same mechanism that unifies legitimate data across systems can flag and interrupt anomalous activity that doesn’t fit.
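As a rough sketch of that security control, the gate below blocks a session when its confidence score falls below a threshold. The class name, the 0.8 default, and the rule that a blocked session stays blocked are assumptions for illustration; the source only states that the session can be blocked in real time.

```python
from enum import Enum


class SessionAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"


class SessionGuard:
    """Hypothetical gate that blocks sessions with below-threshold confidence."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._blocked = set()   # session ids that have been blocked

    def observe(self, session_id: str, confidence: float) -> SessionAction:
        # Assumption: once blocked, a session stays blocked for its lifetime.
        if session_id in self._blocked or confidence < self.threshold:
            self._blocked.add(session_id)
            return SessionAction.BLOCK
        return SessionAction.ALLOW

    def is_blocked(self, session_id: str) -> bool:
        return session_id in self._blocked
```

A caller would feed each new correlation score for a session into `observe` and terminate the session when it returns `SessionAction.BLOCK`.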
Why Distributed Storage Matters
A deliberate design choice in this architecture is that data remains at its original location rather than being pulled into a central store. The shared identifier travels — the data doesn’t. This reduces network load, limits the blast radius of a potential breach, and respects the governance boundaries that often exist between systems in large financial institutions.
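One way to picture "the identifier travels, the data doesn’t" is an index that stores only pointers to records, never the records themselves. The sketch below is an assumption about how such a mapping could look, not a structure described in the source; `RecordRef` and `CrossChannelIndex` are hypothetical names.

```python
from collections import defaultdict
from typing import List, NamedTuple


class RecordRef(NamedTuple):
    system: str     # where the record lives, e.g. "web" or "mobile"
    local_id: str   # that system's own identifier for the record


class CrossChannelIndex:
    """Maps a shared identifier to record locations; holds no record contents."""

    def __init__(self) -> None:
        self._refs = defaultdict(set)

    def attach(self, cross_channel_id: str, ref: RecordRef) -> None:
        self._refs[cross_channel_id].add(ref)

    def locate(self, cross_channel_id: str) -> List[RecordRef]:
        # .get avoids creating empty entries for unknown identifiers.
        return sorted(self._refs.get(cross_channel_id, ()))
```

Because only `(system, local_id)` pairs are stored, each record is still read from its owning system, under that system's access controls.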
The Bigger Picture
The convergence of large language models and security infrastructure is still at an early stage. Most LLM applications in enterprise settings are about productivity: summarization, search, code generation. This work points toward a different use case: using AI’s ability to reason across heterogeneous, unstructured inputs to address problems that rule-based systems struggle with.
Cross-channel identity is one of those problems. It will not be the last.
US Patent Application No. 20260044620 — “Systems and Methods for Automatically and Dynamically Generating a Cross-Channel Identifier for Disparate Data in an Electronic Network.” Inventors: Vinicius Mouffron Ribas Da Costa, Michael R. Young, Mark A. Odiorne, Jinna Kim, Kelly Renee-Drop Keiter, Lauran Adele Hollar. Assignee: Bank of America Corporation.