The Registry That Was Deployed Before It Was Studied

On June 24, 2026, six researchers from Imperial College London and CSIRO Data61 posted the first empirical study of ERC-8004, the Ethereum standard published in August 2025 that proposes to give autonomous AI agents a permissionless trust layer for cross-organizational transactions. The standard, ERC-8004: Trustless Agents, specifies three on-chain registries: Identity, Reputation, and Validation. The authors of the empirical study, Xihan Xiong, Zelin Li, Wei Wei, Qin Wang, William Knottenbelt, and Zhipeng Wang, crawled the deployments of those registries on Ethereum, BNB Smart Chain, and Base, covering the period from initial deployment through May 13, 2026. The paper does what a year of governance discussion has not done. It asks whether the artifact, as deployed, carries the function the standard says it carries.

The findings are specific. Across the three chains, the fraction of Identity registrations exposing a valid registration file with at least one live service endpoint was 3 percent on Ethereum, 4 percent on BNB Smart Chain, and 15 percent on Base. The other 85 to 97 percent of registrations are placeholders. On the Reputation Registry, the values posted are not commensurable across agents, feedback records are rarely grounded in verifiable interactions, and reputation can be manipulated at minimal cost. The authors flag 73.6 percent of reviewers on Ethereum, 59.2 percent on BNB Smart Chain, and 90.6 percent on Base as exhibiting coordinated Sybil behavior. After removing the Sybil-flagged feedback, 15.5 percent of rated agents on Ethereum, 72.3 percent on BNB Smart Chain, and 89.4 percent on Base have no valid feedback remaining. The paper's plain statement is that the Reputation Registry, as currently deployed, cannot function as a trust signal.
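
The "no valid feedback remaining" figure is a simple set operation once reviewers have been flagged, and a toy version makes the metric concrete. The sketch below is illustrative only: the record shape, the identifiers, and the function name are assumptions, and it says nothing about how the authors detect Sybil reviewers in the first place.

```python
def share_without_valid_feedback(feedback, flagged_reviewers):
    """Fraction of rated agents left with no feedback after dropping
    every record posted by a flagged reviewer.

    feedback: iterable of (agent_id, reviewer_id) pairs (hypothetical shape).
    flagged_reviewers: set of reviewer ids already labelled as Sybil.
    """
    rated = set()         # agents with at least one feedback record
    still_rated = set()   # agents with at least one unflagged record
    for agent_id, reviewer_id in feedback:
        rated.add(agent_id)
        if reviewer_id not in flagged_reviewers:
            still_rated.add(agent_id)
    if not rated:
        return 0.0
    return (len(rated) - len(still_rated)) / len(rated)


# Toy data, not drawn from the study.
toy_feedback = [
    ("agent-1", "r1"),
    ("agent-1", "r2"),
    ("agent-2", "r1"),
    ("agent-3", "r3"),
    ("agent-4", "r2"),
]
toy_flagged = {"r1", "r2"}
toy_share = share_without_valid_feedback(toy_feedback, toy_flagged)
```

In the toy data, three of the four rated agents were reviewed only by flagged reviewers, so the share is 0.75. The metric counts agents rather than records, so a high reviewer Sybil rate does not by itself imply a high share of emptied agents.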

The shape of the gap

The Verik citation database has tracked a particular shape of governance gap for the past two months. The argument turns on a single structural point. The artifact the policy or the protocol names must in fact carry the function the policy or the protocol assigns to it.

ERC-8004 names three registries. The empirical study finds that two of them, Identity and Reputation, do not carry the functions the standard assigns to them. The Identity Registry is not, as deployed, a directory of reachable agents. It is, in 85 to 97 percent of cases, a directory of placeholders that do not resolve. The Reputation Registry is not, as deployed, a trust signal. It is a feedback channel dominated by coordinated reviewers, in which removing the coordinated feedback leaves most rated agents on BNB Smart Chain and Base with no valid feedback at all. The third component, the Validation Registry, is a hook-set whose deployment-side properties the study does not measure directly.
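
The distinction between a placeholder and a reachable entry can be sketched as a two-step check: the registration file must parse, and at least one listed endpoint must respond. The sketch below is a minimal illustration, not the study's crawler. The field names "endpoints" and "endpoint" and the injected probe function are assumptions made for the example.

```python
import json


def is_live_registration(raw_file, probe):
    """Return True if the registration file parses and at least one
    listed endpoint answers the probe.

    raw_file: the fetched registration file as text, or None if the
              fetch failed.
    probe:    callable taking a URL string and returning True when the
              endpoint responds (hypothetical; injected so the check
              itself needs no network access).
    """
    if raw_file is None:
        return False
    try:
        doc = json.loads(raw_file)
    except (TypeError, ValueError):
        return False  # unparseable file counts as a placeholder
    if not isinstance(doc, dict):
        return False
    # "endpoints" / "endpoint" are assumed field names for this sketch.
    endpoints = doc.get("endpoints")
    if not isinstance(endpoints, list):
        return False
    urls = [e.get("endpoint") for e in endpoints if isinstance(e, dict)]
    return any(isinstance(u, str) and probe(u) for u in urls)
```

Passing the probe in as a parameter keeps the classification rule separate from the network behavior, so the same rule can be run against a recorded crawl or a live one.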

What the standard says and what the deployment shows

ERC-8004's specification describes a tiered trust model with security proportional to value at risk. The standard names four tiers of mechanism: reputation systems using client feedback, validation via stake-secured re-execution, zero-knowledge machine learning proofs, and trusted execution environment oracles. The empirical study examines only the first tier, the reputation-feedback mechanism, in production. That is the tier that the standard reserves for low-stake interactions. The finding, then, is not that the high-assurance tiers fail. It is that the lowest-assurance tier, the one designated as the entry point for the protocol, has been deployed at a scale at which it has acquired the appearance of a registry without acquiring the function of a registry. Base, the chain with the highest fraction of live registrations, is also the chain with the highest Sybil rate.
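
"Security proportional to value at risk" can be read as a gating rule on the counterparty side: below some ceiling, reputation feedback is acceptable, and above it only a validation mechanism will do. The sketch below illustrates that reading under stated assumptions. The ceiling parameter, the function name, and the mechanism labels are hypothetical, and nothing here implies an assurance ordering among the three validation mechanisms.

```python
REPUTATION = "reputation-feedback"
VALIDATION_MECHANISMS = (
    "stake-secured re-execution",
    "zkML proof",
    "TEE oracle",
)


def acceptable_mechanisms(value_at_risk, low_stake_ceiling):
    """Return the trust mechanisms a counterparty might accept.

    value_at_risk:     amount exposed in the interaction, in any unit.
    low_stake_ceiling: hypothetical cutoff chosen by the counterparty,
                       in the same unit. It is an assumption of this
                       sketch, not a value taken from the standard.
    """
    if value_at_risk < 0:
        raise ValueError("value_at_risk must be non-negative")
    if value_at_risk <= low_stake_ceiling:
        # Low stakes: reputation feedback is admissible alongside
        # any of the validation mechanisms.
        return (REPUTATION,) + VALIDATION_MECHANISMS
    # Above the ceiling, reputation alone is not accepted.
    return VALIDATION_MECHANISMS
```

Where the ceiling should sit is exactly what such a rule leaves open, and the study's Sybil figures suggest a counterparty might set it very low.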

The standard was created on August 13, 2025. The first empirical study of how the standard performs in production appeared on June 24, 2026, ten months later. In that ten-month window, the protocol was adopted, the registries were populated, the registration files were written, and the feedback was posted. The instrumentation that would have detected the gap between the artifact and the function was not in place when the deployment began. It is in place now, in the form of the empirical study, after the gap has materialized.

The Agent Identity arc, sharpened

The Verik Agent Identity arc has been carrying a single open question. If an agent is to act on behalf of a principal across an organizational boundary, what is the registry against which the counterparty agent is checked, and what does the registry actually attest to?

The empirical study answers a narrower version of that question. The currently deployed permissionless trust layer for AI agent economies attests to a presence on chain that is, in 85 to 97 percent of cases, not matched by a presence at a service endpoint. The agent-to-agent transactions ERC-8004 was written to support are now occurring against a directory whose entries are mostly unreachable, and a reputation layer whose feedback is mostly coordinated. The Validation Registry was written into the standard precisely so that higher-assurance tiers could carry traffic the reputation tier cannot. The empirical study indicates that the reputation tier, as deployed, cannot be relied on for anything beyond the low-stake interactions it was specified to carry.

What remains on the table

The Identity Registry exposes a service endpoint in 3 to 15 percent of cases across the three chains studied. The remaining registrations are placeholders. What is the appropriate signaling between a counterparty agent that queries the registry and an operator who reads the empirical study on the same morning? The directory the agent queries and the directory the empirical study describes are the same directory.

Between 59.2 and 90.6 percent of the Reputation Registry's reviewers are flagged as exhibiting coordinated Sybil behavior across the three chains. After Sybil removal, 15.5 to 89.4 percent of rated agents are left with no valid feedback. The standard reserves the reputation tier for low-stake interactions. What is the threshold at which the reputation tier transitions out of the low-stake band, and what registry feature would signal that transition to a counterparty agent at query time?

The standard was created in August 2025. The first empirical study of its deployment appeared in June 2026. The instrumentation that would have measured the gap between the artifact and the function arrived ten months after the artifact was specified and deployed. What is the appropriate temporal relationship between the publication of a trust-layer standard and the publication of empirical evidence about how the standard performs in production?

ERC-8004 names four tiers of trust mechanism. The empirical study examines the lowest tier. The Validation Registry's stake-secured re-execution, zero-knowledge machine learning proofs, and trusted execution environment oracles have not yet been measured in production at scale. Which of those higher tiers, if any, is currently carrying traffic, and where is the empirical baseline that would let a counterparty agent reason about the assurance level of an attestation it receives?

The Registry was deployed. The function the Registry was specified to carry has not yet been instrumented.