Decentralizing XMTP, a minimum viable proposal

Objective

Propose a credibly neutral network architecture that provides for a high throughput, low fee messaging experience for XMTP inbox client developers.

Motivation

XMTP is a credibly neutral messaging protocol that developers can use to provide secure, private, and portable messaging experiences for users. Messaging users should have sovereignty over their messaging identity and history. They also expect a private, secure messaging experience. Consumer messaging users expect a free and low friction experience. Bulk senders would like to build reputation while reliably reaching their audiences. All users would like protections from spam. The requirements of such a system are novel, and require unique solutions made available through a scalable modular blockchain infrastructure.

Current state (Centralized XMTP)

XMTP is currently powered by a centralized Waku network, where clients store various forms of state across “topics” in the network. The state includes encrypted shared identity keys, contact bundles, conversation invitation messages, and conversation messages. The centralized network lacks censorship resistance and the commingling of state with different requirements adds significant complexity. XMTP looks like this today:

Censorship possible, with limited transparency

Centralization means a single entity can censor by dropping or tampering with messages during transport. Senders have no way of discovering this is happening, and recipients have no way to confirm they have received all data that was submitted to the network.

Security weaknesses

Message history is stored permanently in a way that’s accessible to a user’s shared key across installations, which results in having no forward secrecy on data stored in a publicly accessible network.

Complex design

The commingling of variegated state adds complexity to decentralization designs that must attempt to provide a full suite of guarantees. Contact bundle state is expected to be stored permanently, and only the owner of the externally owned account (EOA) should be allowed to advertise contact information for their address. Additionally, there’s a need for similar types of state in the future, such as prekeys (for end-to-end encryption), public user preferences (for name resolution ordering), and private user preferences (for standardized inbox filtering). At the same time, transport state should not be stored permanently due to security concerns.

Limitations in credibly neutral spam handling

Network DOS protection is afforded by IP restrictions set in centrally managed and configured network infrastructure. This provides limited flexibility, and is not a credibly neutral solution.

Additionally, the publicly-discoverable conversation invite inbox is easily susceptible to spam targeting an individual, and there’s currently no spam control mechanism for inboxes that adds friction to delivery. The work of processing messages and filtering or indexing for spam is placed on the recipient’s client/device, which can result in poor, DOS-like experiences for some recipients.

Summarizing the problems

  • The network is not credibly neutral, and the economics enrich few participants who will inevitably aggregate power
  • Censorship resistance and accountability are weak-to-non-existent
  • Commingling state in a monolithic network adds complexity to network design and build out, while weakening message security
  • There is no credibly neutral mechanism for protecting network resources
  • Spam control for user inboxes is entirely dependent on message filtering by inbox apps

Decentralizing XMTP, an overview

We propose a credibly neutral and modular EVM-based solution that equips developers with the necessary tools to provide their end users with messaging capabilities while maintaining sovereignty over identity and data, ensuring censorship resistance, and providing accountability guarantees. The solution includes a novel method of combating messaging spam through the use of well-established blockchain transaction fee mechanisms.

The blockchain ultimately facilitates three core workflows:

  1. Contact discovery: Alice discovers how to reach Bob using Bob’s advertised contact bundle.
  2. Conversation initiation and consent: Alice requests and receives consent from Bob via an invitation message (“invite”).
  3. Messaging after consent: Alice sends and receives messages with Bob, and Bob can send and receive messages with Alice, via a conversation. Crucially, if Charlie is not part of the same conversation, Charlie cannot send and receive messages with Bob or Alice.
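
As a rough illustration of how the three workflows relate, here is a minimal in-memory sketch. It is not the XMTP protocol: the `ToyNetwork` class, its method names, and the way a conversation ID is derived are all hypothetical, and encryption, fees, and the blockchain itself are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNetwork:
    """In-memory stand-in for the three workflows; every name here is hypothetical."""
    contacts: dict = field(default_factory=dict)   # address -> contact bundle
    invites: dict = field(default_factory=dict)    # recipient -> list of pending senders
    members: dict = field(default_factory=dict)    # conversation id -> set of addresses
    messages: dict = field(default_factory=dict)   # conversation id -> [(sender, payload)]

    # 1. Contact discovery
    def advertise(self, address, bundle):
        self.contacts[address] = bundle

    def discover(self, address):
        return self.contacts.get(address)

    # 2. Conversation initiation and consent
    def invite(self, sender, recipient):
        if recipient not in self.contacts:
            raise LookupError("recipient has no advertised contact bundle")
        self.invites.setdefault(recipient, []).append(sender)

    def accept(self, recipient, sender):
        if sender not in self.invites.get(recipient, []):
            raise PermissionError("no pending invite from sender")
        self.invites[recipient].remove(sender)
        conversation_id = "/".join(sorted([sender, recipient]))
        self.members[conversation_id] = {sender, recipient}
        return conversation_id

    # 3. Messaging after consent
    def send(self, conversation_id, sender, payload):
        if sender not in self.members.get(conversation_id, set()):
            raise PermissionError("sender is not a member of this conversation")
        self.messages.setdefault(conversation_id, []).append((sender, payload))


net = ToyNetwork()
net.advertise("bob", {"keys": "bob-public-keys"})
net.invite("alice", "bob")
conv = net.accept("bob", "alice")
net.send(conv, "alice", b"ciphertext-1")
net.send(conv, "bob", b"ciphertext-2")

# Charlie never received consent, so the membership check rejects him.
try:
    net.send(conv, "charlie", b"spam")
    charlie_blocked = False
except PermissionError:
    charlie_blocked = True
```

The point of the sketch is the ordering: a message can only be sent once an invite has been accepted, and an invite can only be sent to an address that has advertised a contact bundle.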

Steps towards decentralization

The proposed steps below are essential to transition XMTP away from its currently centralized state to a progressively decentralized one:

Comprehensive security upgrades

Upgrade the protocol’s message security architecture to use per-device installation keys, MLS end-to-end encryption (E2EE), and revocable keys. These changes introduce forward secrecy and post-compromise security for messaging, as well as the ability to protect users from malicious inbox apps.

Decouple invite and conversation message history storage from the transport network, such that it’s stored post-delivery on device or with a post-delivery service (PDS) provider. A recipient client can do the work of a PDS locally on its device/installation. This includes validating, filtering, and indexing message history across invites and conversations.

A trusted PDS provider can be engaged by the recipient to provide more powerful always-online PDS services, which may also include more sophisticated spam and indexing functionality. The provider can act as another installation for the recipient client, where the privacy tradeoff can be minimized in scope to sender metadata decryption only.

We can minimize trust between clients and PDS providers by having clients store their own message history and periodically check against the data stored with the provider. This may only be done with more recent data, or more of the history, depending on the configuration. If a provider is detected to be censoring a recipient, they can move to a different provider.
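
One way a client might run this periodic check is to compare a digest of its own stored history with a digest of what the provider returns. The sketch below assumes histories are ordered lists of encrypted payloads; the function names and the `recent` parameter are hypothetical and only illustrate the "recent data or more of the history" configuration.

```python
import hashlib


def digest(messages):
    """Hash an ordered list of encrypted message payloads (bytes)."""
    h = hashlib.sha256()
    for payload in messages:
        # Length-prefix each payload so message boundaries affect the digest.
        h.update(len(payload).to_bytes(8, "big"))
        h.update(payload)
    return h.hexdigest()


def provider_matches_local(local_history, provider_history, recent=None):
    """Compare the client's own copy with the provider's copy.

    recent=None checks the full history; recent=N (N >= 1) checks only
    the last N messages of each copy.
    """
    if recent is not None:
        local_history = local_history[-recent:]
        provider_history = provider_history[-recent:]
    return digest(local_history) == digest(provider_history)


local = [b"m1", b"m2", b"m3"]
honest_provider = [b"m1", b"m2", b"m3"]
censoring_provider = [b"m1", b"m3"]  # silently dropped m2

full_check_honest = provider_matches_local(local, honest_provider)
full_check_censoring = provider_matches_local(local, censoring_provider)
recent_check_censoring = provider_matches_local(local, censoring_provider, recent=1)
```

Note the tradeoff the configuration implies: the full check catches the dropped message, while checking only the most recent message does not, because the drop happened further back in the history.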

Conversation membership verification

Establish a conversation membership protocol that network nodes can verify. This introduces consent configuration state for “conversations” (i.e., conversation membership). This state can be managed locally across installations, while network transport nodes facilitate sharing and enforcement.

Reduce complexity by organizing state by permanence

Decouple and decentralize network state that needs non-temporary storage and special authorization, such as contacts, prekeys, conversation membership configuration, and user preferences.

Permanent state may include the contact identifier, fallback keys, and private and public user preferences (the “contact bundle”). This state has separate requirements, such as strong immutability and tamper-proof ownership guarantees. Contact bundles should also be widely discoverable.

A public blockchain with strong security guarantees, such as Ethereum, makes sense for contact bundle identifiers and fallback keys. We are actively prototyping an Ethereum DID directory for this state.

Session prekeys and user preferences may be better suited for a network with weaker ownership guarantees, such as IPFS, where inbox app providers can make availability and censorship guarantees for their users.

Private user preferences and conversation membership configuration can be managed by installations.

Moving contact bundle state onchain introduces fees—one transaction is required to create the user’s onchain contact bundle, and one transaction for each subsequent update to the onchain bundle. This fundamentally changes the way users interact with XMTP.

Improve censorship resistance, accountability, and spam mitigation

Decouple invites from the centralized network

Pass invite message metadata as input in a blockchain transaction, thus providing a verifiable record of message history. This provides censorship resistance and accountability.

Encrypted invite message contents are pinned in a decentralized content network, such as IPFS, for recipient retrieval.
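
The split between onchain metadata and offchain contents can be sketched as follows. This is an illustration under assumptions, not the proposed transaction format: `InviteMetadata` and `ToyContentStore` are hypothetical, and a plain SHA-256 hex digest stands in for the content identifier that a network such as IPFS would use.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InviteMetadata:
    """Hypothetical transaction input: who the invite is for and where its contents live."""
    recipient: str
    content_pointer: str  # hex SHA-256 of the encrypted contents (assumption)


class ToyContentStore:
    """Stand-in for a content-addressed network; a real one would use its own identifiers."""

    def __init__(self):
        self._blobs = {}

    def pin(self, ciphertext: bytes) -> str:
        pointer = hashlib.sha256(ciphertext).hexdigest()
        self._blobs[pointer] = ciphertext
        return pointer

    def fetch(self, pointer: str) -> bytes:
        return self._blobs[pointer]


def retrieve_invite(store, metadata):
    """Fetch the contents named onchain and reject them if they do not match the pointer."""
    ciphertext = store.fetch(metadata.content_pointer)
    if hashlib.sha256(ciphertext).hexdigest() != metadata.content_pointer:
        raise ValueError("contents do not match the onchain pointer")
    return ciphertext


store = ToyContentStore()
pointer = store.pin(b"encrypted-invite")
tx_input = InviteMetadata(recipient="bob", content_pointer=pointer)
contents = retrieve_invite(store, tx_input)
```

Because the pointer is derived from the contents, whoever hosts the contents can withhold them but cannot substitute different contents without the recipient noticing.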

Financially disincentivize spam and protect network resources from DOS using the blockchain’s transaction fee mechanism (TFM). A TFM is the most reliably proven, credibly neutral way to mitigate spam in permissionless networks.

Smart contracts introduce configurable protection for specific public inboxes. The contract acts as a gating mechanism, and may encode an auto-adjusting fee that fluctuates with inbox demand, a flat fee configured by the inbox owner, or something else entirely. This provides a level of inbox-specific spam control for recipient clients and enables them to provide better experiences for users.
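
To make the "auto-adjusting fee" option concrete, here is one possible update rule, written in Python rather than as contract code. The rule borrows its shape from Ethereum’s EIP-1559 base fee adjustment (a change of at most 12.5% per step); that choice, the function name, and all parameters are assumptions of this sketch and are not specified by the proposal.

```python
def next_inbox_fee(current_fee, invites_received, target_invites,
                   max_change=0.125, min_fee=1.0):
    """Raise the fee when demand exceeds the target and lower it when demand falls short.

    All parameters are hypothetical; fees are in arbitrary units.
    """
    if target_invites <= 0:
        raise ValueError("target_invites must be positive")
    # Relative deviation from the target, capped at +100% so a single
    # burst cannot move the fee by more than max_change.
    deviation = min((invites_received - target_invites) / target_invites, 1.0)
    return max(min_fee, current_fee * (1 + max_change * deviation))


fee = 100.0
fee_at_target = next_inbox_fee(fee, invites_received=10, target_invites=10)
fee_under_load = next_inbox_fee(fee, invites_received=1000, target_invites=10)
fee_when_idle = next_inbox_fee(fee, invites_received=0, target_invites=10)
```

A flat fee configured by the inbox owner is the degenerate case where the update step is skipped entirely.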

We use the blockchain’s TFM and smart contracts as a spam gadget to provide much needed protection for user public inboxes.

The requirement for all messages to pay network resource fees conflicts with the widely held assumption that consumer messaging must be free. By implementing fees in the protocol, we assume protocol aligned providers will come to market with consumer free tiers. We accept that provider business models may diverge from the protocol’s mandate for credible neutrality, and that users seeking the highest level of protocol guarantees may wish to pay their own fees to access the network.

Initially, special purpose gateways can provide anonymity and rate limited free tiers for senders. The gateway is a trusted provider that introduces a hop for blockchain transaction execution, and is useful for abstracting transaction submission and payment. Users have no guarantee that a gateway provider will not censor or tamper with their blockchain transaction. However, the blockchain transaction contains only a pointer to the encrypted message contents stored on the content network, which cannot be tampered with and can only be censored by the entity pinning the message (i.e., the inbox app). Further, trust between clients and gateways can be minimized by sharing the blockchain transaction hash with clients for transaction execution verification. If a gateway is not readily providing transaction confirmation or is found to be censoring, the sender can use a different provider.
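
The transaction hash check could look roughly like this. The sketch assumes the client can query a node independently of the gateway; `ToyChain` is a hypothetical stand-in for that node, and hashing the raw input with SHA-256 is a simplification of how a real chain derives transaction hashes.

```python
import hashlib


class ToyChain:
    """Stand-in for a blockchain node that the client queries independently of the gateway."""

    def __init__(self):
        self._txs = {}

    def submit(self, tx_input: bytes) -> str:
        tx_hash = hashlib.sha256(tx_input).hexdigest()
        self._txs[tx_hash] = tx_input
        return tx_hash

    def get_input(self, tx_hash: str):
        return self._txs.get(tx_hash)


def gateway_delivered(chain, tx_hash, expected_input):
    """True only if the transaction exists onchain and carries the input the client asked for."""
    return tx_hash is not None and chain.get_input(tx_hash) == expected_input


chain = ToyChain()
wanted = b"recipient=bob;pointer=abc123"

honest_hash = chain.submit(wanted)                              # gateway submits as requested
tampered_hash = chain.submit(b"recipient=eve;pointer=abc123")   # gateway alters the input
dropped_hash = None                                             # gateway never submits

results = {
    "honest": gateway_delivered(chain, honest_hash, wanted),
    "tampered": gateway_delivered(chain, tampered_hash, wanted),
    "dropped": gateway_delivered(chain, dropped_hash, wanted),
}
```

In the tampered and dropped cases the check fails, which is the signal for the sender to switch to a different gateway.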

Decentralize transport of conversation messages

Decentralize the now single-purpose temporary-storage Waku network.

Pass conversation message metadata as input in a blockchain transaction. Conversation metadata transport can live on a blockchain that has been highly optimized for throughput. The modular approach allows for plugging in solutions for data availability (DA) that trade off security for higher throughput and lower fees.

Encrypted conversation message contents are pinned in a decentralized content network, such as IPFS for recipient retrieval.

Implementation through a phased approach

We will progressively implement the previously outlined steps in three phases.

Phase 1 (Decentralizing contact bundles)

Upgrade security and migrate contacts to an onchain “Contact Directory” smart contract on an Ethereum L2.

  • Introduces MLS E2EE
  • Introduces cost for contact registration. Initially, XMTP Labs can pay for consumer users. At some point, inbox apps must subsidize consumer users, similar to Farcaster’s approach.

Phase 2 (Decentralizing messaging consent requests)

Migrate consent request (invite message) transport to an onchain “Public Inbox” smart contract on an Ethereum L2.

  • “Public” refers to the fact that anyone can send a consent request to the inbox. Message contents are encrypted, and the message sender retains anonymity by using a gateway provider.
  • Introduces cost for conversation initiation. This will impact bulk senders much more than it will consumer messaging users.
  • Encrypted invite message contents are pinned by inbox apps in a decentralized content network, such as IPFS

Phase 3 (Decentralizing message transport)

Migrate conversation message transport to “Conversation” smart contracts onchain using an Ethereum modular scaling architecture.

  • Move conversation transport network to an L3 validium that plugs in a more scalable DA, and settles to the L2
  • All encrypted message contents are pinned by inbox apps in a decentralized content network, such as IPFS

Benefits of utilizing a blockchain in system architecture

By tracking message metadata in a blockchain we get a logically centralized view of message history. The centralized view—redundantly provided by many nodes—provides the list of consistently ordered valid messages. This is useful for accountability, because without this view users would have no way of knowing if their messages are being censored by providers. If a user is censored or de-platformed by a provider, they have a transaction history of their messages, which may be used to request the message content be resent with fresh keys.

The logically centralized view of the data also allows a user client to plug into any node in the network and get the messaging transaction data. This lessens the impact of censoring by malicious inbox apps and provides protection against DOS attacks.

A blockchain also provides out-of-the-box spam mitigation in the permissionless setting. This comes in the form of a transaction fee mechanism and general purpose smart contracts. The ability to enforce predefined execution in a credibly neutral way is a powerful spam gadget for a messaging network.

Conversation transport network

Conversation message transport can live on a blockchain or in an offchain hub model. The censorship resistance and accountability guarantees of a blockchain are desirable. However, the spam gadget is less useful here: conversations are protected from spam by network-level membership verification, which leaves network DOS as the primary concern.

The specific architecture choice for the conversation transport network is less important in the short-to-medium-term. The decision will depend on progress made in developing scalable blockchain infrastructure necessary to support consumer free tiers and user privacy.

High throughput and lower fees

A blockchain is not a feasible architecture for a messaging network if the throughput and fees do not meet inbox app provider expectations.

A modular approach allows us to optimize components of the system along the dimensions of core properties offered and cost.

A modular approach also allows us to dissect system state by permanence requirements, and propose tailored solutions for each.

Key components

Settlement

Ethereum for settlement. Ethereum is the preferred trust machine. Liquidity flows and the total value of secured assets clearly validate this point. Using Ethereum for settlement is expensive, but with advances in proof generation, more L2 (or L3) transactions can be fit into a single validation transaction, thus lowering the per message cost of the L2 (and L3). As noted in phase 3, we are exploring moving high volume conversation messages to a third layer that trades off security with higher throughput and lower cost for DA and settlement.
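
The amortization argument is simple arithmetic: a fixed settlement cost is shared by every message in the batch it validates. The sketch below uses hypothetical numbers in arbitrary units and ignores execution and DA costs.

```python
def settlement_cost_per_message(settlement_tx_cost, messages_per_batch):
    """Share of one settlement transaction's cost borne by each message in the batch."""
    if messages_per_batch <= 0:
        raise ValueError("messages_per_batch must be positive")
    return settlement_tx_cost / messages_per_batch


# Hypothetical figures: the settlement cost stays fixed while the batch grows.
small_batch = settlement_cost_per_message(500.0, 1_000)
large_batch = settlement_cost_per_message(500.0, 10_000)
```

Fitting ten times as many messages under the same validation transaction cuts the per-message settlement share by a factor of ten, which is the effect the paragraph above relies on.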

Execution data availability

Specialized blockchain for DA. Progress on reducing the high cost of publishing verifiable transaction data is exciting. New publishing blockchains, including the Ethereum Beacon chain, are focused on scaling to meet chain operator needs (i.e., higher throughput and lower publishing cost). It is currently hard to predict the potential scale and cost reductions, or whether these solutions will be reliable. We are exploring data availability solutions that provide the required guarantees at the lowest cost and risk to the system.

Encrypted message content availability

Encrypted message contents are tamperproof and not necessary for verifiable state transition of the chain. DA cost can be made significantly cheaper and more predictable by posting encrypted message contents to a decentralized content network, such as IPFS. Leaving encrypted message contents out of the transaction payload reduces transaction size ~4x on average. This is likely a conservative estimate. Greater consistency in DA cost significantly improves the chain’s ability to operate sustainably.

Users rely on inbox app providers to keep message contents pinned. Inbox apps and users are incentive aligned, and guarantees can be made about encrypted message content availability. We want strong security properties around E2EE during message transport, which means encryption keys are discarded after delivery. It’s especially important to maintain these security properties because the decentralized transport network is open and publicly viewable. Encrypted message contents should be made available only long enough for recipient installations to receive the message.

Practically, this means the ability to discontinue content availability, user-inbox app incentive alignment, and cost efficiencies make the choice of pinning encrypted message contents in a separate content network an optimal solution over including them in the blockchain transaction.

Execution

Messaging user workflows fit nicely in a parallel execution environment, as there is no interdependence across invites or conversations. The chain can adopt a battle tested VM that offers parallel execution (e.g., Solana Virtual Machine).
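
The independence claim can be illustrated by partitioning a block by conversation and running the partitions concurrently. This is only a scheduling sketch with hypothetical names: it does not model any particular VM, and Python threads are used to show the structure, not to demonstrate a speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_conversation(item):
    """Apply one conversation's transactions in order and return its final state."""
    conversation_id, payloads = item
    return conversation_id, list(payloads)  # toy state: the ordered message log


def execute_block(transactions, workers=4):
    """Partition a block by conversation, then execute the partitions concurrently."""
    by_conversation = defaultdict(list)
    for conversation_id, payload in transactions:
        # Appending in block order preserves ordering within each conversation.
        by_conversation[conversation_id].append(payload)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(apply_conversation, by_conversation.items()))


block = [
    ("alice/bob", b"a1"),
    ("carol/dave", b"c1"),
    ("alice/bob", b"a2"),
]
state = execute_block(block)
```

Ordering only has to be preserved inside a conversation; no partition reads or writes another partition’s state, so no locking between them is needed.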

Economics

A parallel environment and a multidimensional fee market (i.e., DA and settlement priced separately for each state type) allow us to be precise in pricing specific usage patterns. We can tailor pricing to user personas, who vary in their ability and willingness to pay.

Protocol economics

Economic objective

The primary economic objective is sustainable protocol operation. Sustainability is achieved by delivering a valuable and reliable service to developers, fending off centralizing forces and attackers, and generating and capturing value for decentralized operators.

Customer value proposition

Permissionless and decentralized protocol operation ensures developers have irrevocable access to a large and growing network of users. In this new paradigm, developers are free to plug in protocol SDKs and build end user experiences for an existing user base that cannot be taken away from them. The result is a growing pie and economic flourishing for competitor applications, which fuels a virtuous cycle of network development and new users.

Decentralized protocol ownership

XMTP requires infrastructure operators, developers, and users. Each wishes to operate independent of a central authority and without permissions. Developers and operators choose XMTP because it is credibly neutral.

Threats to neutrality grow along with value generation, as centralizing forces seek to capture the value for themselves. In order to prevent this, ownership must be placed in the hands of operators, developers, and users as early and frequently as possible. A global network of third party infrastructure providers ensures XMTP remains neutral.

The modular architecture outlined in this proposal leverages the decentralization of Ethereum and a decentralized content network, such as IPFS. However, the XMTP L2 chain will be operated by a centralized sequencer controlled by XMTP Labs. While the permissionless L2 infrastructure is being designed, XMTP Labs must choose what to do with surplus revenue captured by the sequencer. A credibly neutral outcome would be for community governance to decide.

Protecting the network

A transaction fee mechanism (TFM) is the most reliably proven, credibly neutral way to mitigate spam in permissionless networks. Messaging fees financially disincentivize spam and promote network availability, by limiting sybil attacks and ensuring the network and users are not overwhelmed with spam messages. Fees effectively crowd out lower value spam, ensuring only valuable messages are processed by the network and reach user inboxes.

Spam gadget

We can program the TFM and smart contracts to protect network resources and user inboxes from spam.

Messaging patterns

Where possible, messaging patterns that resemble bulk messaging should be priced separately from consumer messaging patterns.

The cost to send a message to a consenting conversation recipient should be cheaper than sending a message to a nonconsenting recipient’s public invite inbox. This aligns consumer and honest bulk sender incentives with the protocol, where all parties win from less spam.

Similarly, the cost to use the network should increase with the number of addresses a user sends from, and the network should be cheaper for a user that communicates from one address.
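
The two pricing rules above can be combined into a single fee function, sketched below. The multiplier and surcharge values are hypothetical, and the sketch assumes the protocol already has some way to attribute several addresses to one sender, which this proposal does not specify.

```python
def message_fee(base_fee, recipient_consented, sender_address_count,
                invite_multiplier=10.0, per_address_surcharge=0.5):
    """Price one message; all parameters are illustrative, in arbitrary units.

    - Messages to a consenting conversation recipient pay the base fee.
    - Messages to a non-consenting recipient's public inbox pay a multiple of it.
    - Each additional sending address adds a proportional surcharge.
    """
    if sender_address_count < 1:
        raise ValueError("sender_address_count must be at least 1")
    fee = base_fee if recipient_consented else base_fee * invite_multiplier
    return fee * (1 + per_address_surcharge * (sender_address_count - 1))


conversation_fee = message_fee(1.0, recipient_consented=True, sender_address_count=1)
invite_fee = message_fee(1.0, recipient_consented=False, sender_address_count=1)
multi_address_fee = message_fee(1.0, recipient_consented=True, sender_address_count=3)
```

A consumer messaging a few consenting contacts from one address stays at the base fee, while a sender spraying invites from many addresses pays both the invite multiplier and the address surcharge.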

Value generation for network operators

A sustainable protocol requires protocol revenue that exceeds the cost incurred by chain operators in order to provide the guarantees and properties desired by developers and users.

The protocol TFM ultimately generates protocol revenue, and should efficiently price network resources for users. For XMTP this means balancing the preferences of two distinct user personas within the network: bulk senders and consumers. The gap between the two personas’ willingness and ability to pay messaging fees is fairly wide: consumers are not willing to pay, while bulk senders are willing and able to pay. Passing even nominal costs along to consumers in the form of explicit fees would lead to an unacceptable user experience, stunt network adoption, and jeopardize sustainability.

XMTP should attempt to capture value from bulk senders and distribute it to consumers. This value transfer can happen in-protocol or extra-protocol through a consumer subsidy provider.

Free usage for consumer-users

Initially, XMTP will run extra-protocol gateways that abstract fees for consumer users. Of course, users can pay their own fees if they wish to maintain full permissionless access to the system. The gateway must implement a means of verification for sybil resistance (e.g., owning an ENS name or xmtp.id, submitting to proof-of-personhood verification, or staking).

XMTP identity registration fees and sender reputation

The protocol must also be aware of sybil attackers that attempt to simulate honest users. In order for the protocol to sustainably operate, it must have some ability to reduce sybil attacks and convey sender reputation.

Bulk senders benefit by building the reputation of their known addresses. For instance, by building the reputation of a 0x address or an ENS name, bulk senders can reliably avoid inbox app filtering algorithms, reach more of their audience, and enjoy more effective campaigns.

However, mechanisms that require fees for bulk messaging patterns while offering subsidies for consumer messaging patterns create a strong incentive for bulk senders to simulate consumer patterns. This incentive is most apparent for bulk senders not necessarily concerned with reputation (i.e., low reputation spammers).

A low reputation spammer can subsidize their campaign and significantly boost ROI by simulating consumer patterns (i.e., by creating enough XMTP IDs to remain within the consumer fee structure). The protocol and users pay the price of this attack through lost protocol revenue and increased spam.

Limiting effectiveness of sybil attacks

There are currently no registration fees in XMTP. Thus, the cost to spam the network from multiple unique IDs is negligible.

The attack can be mitigated by moving contact bundle creation onchain, thus introducing a mandatory “registration fee”. However, registration is a one time event, and attackers would do well by reusing IDs. There may be ways to limit sybil address useful life, such as tracking them in extra-protocol lists maintained by inbox app providers.
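
The incentive can be made concrete by comparing what a campaign costs when sent honestly at bulk rates with what it costs when spread across enough IDs to stay inside a subsidized consumer tier. Everything in this sketch is hypothetical: the fee levels, the idea of a fixed free quota per ID, and the function name.

```python
import math


def compare_spam_strategies(messages, bulk_fee, registration_fee, free_quota_per_id):
    """Cost of one campaign sent at bulk rates versus via sybil IDs on a free tier."""
    if free_quota_per_id <= 0:
        raise ValueError("free_quota_per_id must be positive")
    bulk_cost = messages * bulk_fee
    ids_needed = math.ceil(messages / free_quota_per_id)
    sybil_cost = ids_needed * registration_fee  # messages themselves are subsidized
    return {
        "bulk_cost": bulk_cost,
        "sybil_cost": sybil_cost,
        "ids_needed": ids_needed,
        "sybil_pays_off": sybil_cost < bulk_cost,
    }


# Today: no registration fee, so splitting across IDs costs nothing.
no_fee = compare_spam_strategies(10_000, bulk_fee=0.01,
                                 registration_fee=0.0, free_quota_per_id=100)

# With an onchain registration fee above bulk_fee * free_quota_per_id,
# a single campaign is cheaper to send honestly.
with_fee = compare_spam_strategies(10_000, bulk_fee=0.01,
                                   registration_fee=2.0, free_quota_per_id=100)
```

This models a single campaign only. As noted above, registration is a one-time event, so an attacker who reuses IDs across campaigns spreads the registration fee over more messages and the deterrent weakens.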

Conclusion

The sustainability of XMTP depends on its ability to maintain credible neutrality as the network grows and generates value. The network must financially disincentivize spam and protect itself from attackers, while providing censorship resistance and sovereignty to users, and high throughput and low costs for developers and honest bulk senders. Two of the biggest challenges are mitigating sybil attacks and promoting sender reputation.


This phased proposal is flexible, well thought out and appropriate. XMTP is collaborating with Veramo Labs and the Decentralized Identity Foundation (DIF) to develop a did-eth-registry contract. This initiative lays the groundwork for a decentralized identity system for contact credentials, aligning with the DID Core Specification and EIP-1056 for Ethereum Lightweight Identity. This is a highly extensible foundation for developing the next generation of XMTP.


Thanks for your proposal @trevor.

Could the introduction of transaction fees for spam control in the XMTP network inadvertently create barriers for genuine users, particularly those who are cost-sensitive?

What are the other anticipated challenges in migrating current XMTP users to this new system, and how might these be mitigated for applications like ours that actively try to sell wallet messaging as a service?

Hi @levy. Thanks for the questions.

Could the introduction of transaction fees for spam control in the XMTP network inadvertently create barriers for genuine users, particularly those who are cost-sensitive?

Consumer users that are not willing to pay for messaging will have the hardest time with explicit fees. These users require a subsidized messaging experience. The subsidy will be facilitated by a gateway service that pays network fees on the user’s behalf. Consumer inbox apps can integrate a gateway service to offer the subsidized experience for consumer users. To start, XMTP Labs will develop and operate an open source gateway that offers the subsidy.

What are the other anticipated challenges in migrating current XMTP users to this new system, and how might these be mitigated for applications like ours that actively try to sell wallet messaging as a service?

Both consumer and bulk messaging platforms can integrate a gateway in order to seamlessly connect to the network and send messages. If desirable, the gateway can be used to pay network fees on behalf of bulk senders.

The biggest migration challenge will be for current XMTP consumer users that want subsidized messaging. These users should be verified according to the gateway’s verification policy. This is required to protect the subsidy from bots.