XIP-59: Trigger on-chain calls via wallet_sendCalls

Author: J Kishore Kumar (@justjkk)

Abstract

This content type provides a standard format for triggering on-chain calls, typically by a programmatic agent such as an AI agent responding to a user request. The client handling the content type SHOULD give the user an option to execute the provided transaction data and MAY track the status of the transaction by publishing another message with the transaction reference content type (XIP-21).

Motivation

The goal of the wallet send calls content type is to enable rich interaction between an AI agent and the user, allowing on-chain actions such as sending, swapping, or lending tokens, and any other DeFi protocol interaction that can be represented as a single on-chain transaction or a sequence of them. Because XMTP natively represents users by their wallet address, implementing this content type benefits the ecosystem both for the current DeFAI use case and for any future use case that enables on-chain interaction, such as minting an NFT or registering votes in a group for a DAO proposal.

Specification

Content type

{
  authorityId: "xmtp.org",
  typeId: "walletSendCalls",
  versionMajor: 1,
  versionMinor: 0,
}

Wallet send calls schema

type WalletSendCallsParams = {
  version: string;
  chainId: `0x${string}`; // Hex chain id
  from: `0x${string}`;
  calls: {
    to?: `0x${string}` | undefined;
    data?: `0x${string}` | undefined;
    value?: `0x${string}` | undefined; // Hex value
    gas?: `0x${string}` | undefined;
    metadata?: {
      description: string;
      transactionType: string;
    } & Record<string, any>;
  }[];
  capabilities?: Record<string, any> | undefined;
};
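
For illustration, a payload conforming to this schema might look like the following. This is only a sketch: the addresses, the "1.0" version string, the Base chain id, and the "transfer" transaction type are assumed example values, not values mandated by this XIP.

```typescript
type Hex = `0x${string}`;

// Same shape as the schema above, restated so the example is self-contained.
type WalletSendCallsParams = {
  version: string;
  chainId: Hex;
  from: Hex;
  calls: {
    to?: Hex;
    data?: Hex;
    value?: Hex;
    gas?: Hex;
    metadata?: {
      description: string;
      transactionType: string;
    } & Record<string, unknown>;
  }[];
  capabilities?: Record<string, unknown>;
};

// Hypothetical request: send 0.001 ETH on Base (chain id 8453 = 0x2105).
const example: WalletSendCallsParams = {
  version: "1.0", // assumed version string
  chainId: "0x2105",
  from: "0x1111111111111111111111111111111111111111", // placeholder address
  calls: [
    {
      to: "0x2222222222222222222222222222222222222222", // placeholder address
      value: "0x38d7ea4c68000", // 10^15 wei = 0.001 ETH, hex encoded
      metadata: {
        description: "Send 0.001 ETH on Base",
        transactionType: "transfer", // assumed label
      },
    },
  ],
};
```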

Rationale

Aligning this XIP with the existing EIP-5792 specification simplifies client integration and, by default, enables features like Paymaster and Bundling (if using an ERC-4337 smart wallet) by negotiating the wallet capabilities. For an EOA wallet that does not support the EIP-5792 Wallet API, the client SHOULD fall back to calling the eth_sendTransaction API one or more times, depending on the call data.
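
The eth_sendTransaction fallback could be sketched roughly as below: each call becomes the params object of one eth_sendTransaction request. The function and type names are hypothetical, and the sketch assumes the wallet has already been switched to the requested chain.

```typescript
type Hex = `0x${string}`;

type Call = { to?: Hex; data?: Hex; value?: Hex; gas?: Hex };
type SendCallsRequest = { chainId: Hex; from: Hex; calls: Call[] };

// Params object for a single eth_sendTransaction JSON-RPC request.
type TxRequest = { from: Hex; to?: Hex; data?: Hex; value?: Hex; gas?: Hex };

// Hypothetical helper: map a wallet_sendCalls payload to one
// eth_sendTransaction params object per call, preserving call order.
function toSendTransactionRequests(req: SendCallsRequest): TxRequest[] {
  return req.calls.map(({ to, data, value, gas }) => {
    const tx: TxRequest = { from: req.from };
    // Copy only the fields that are present so no `undefined` reaches the wallet.
    if (to !== undefined) tx.to = to;
    if (data !== undefined) tx.data = data;
    if (value !== undefined) tx.value = value;
    if (gas !== undefined) tx.gas = gas;
    return tx;
  });
}
```

Unlike a batched wallet_sendCalls request, separate transactions are not atomic, so a client using this fallback would probably want to submit them one at a time and stop at the first failure.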

Backward compatibility

To maintain backward compatibility, a content fallback is stored in the codec as text, ensuring that the intent of the wallet-send-calls content type is conveyed even in non-supporting clients.
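
As a rough sketch of what such a fallback text might contain (the wording and the helper name are assumptions, not part of the reference implementation), a codec could summarize each call using its metadata description:

```typescript
type Hex = `0x${string}`;

type FallbackInput = {
  chainId: Hex;
  calls: { to?: Hex; metadata?: { description: string } }[];
};

// Hypothetical helper: build a plain-text summary for clients that
// cannot render the wallet-send-calls content type.
function buildFallbackText(params: FallbackInput): string {
  const lines = params.calls.map((call, i) => {
    // Prefer the agent-provided description; otherwise name the target.
    const summary =
      call.metadata?.description ?? `call to ${call.to ?? "unknown address"}`;
    return `${i + 1}. ${summary}`;
  });
  const header = `[Transaction request on chain ${parseInt(params.chainId, 16)}]`;
  return [header, ...lines].join("\n");
}
```

The description is agent-supplied text, so a receiving client should treat the fallback as untrusted plain text as well.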

Test cases

Test cases will validate the interpretation of schema type and effective use of a content fallback. These are essential for ensuring interoperability across XMTP platforms.

Reference implementation

You can find a WIP reference implementation of this wallet-send-calls content type in the HeyElsa/xmtp-js repo fork, under the content-types/content-type-wallet-send-calls directory.

Security considerations

The following security risks have been identified. Clients implementing this content type should take the necessary precautions.

Threat model

Content injection

The content type defines parameters that end up as arguments to functions. Because this content may come from an untrustworthy user or agent, care should be taken to sanitize the input before using the parameters in any code. TypeScript types only indicate the parameter constraints and do not provide runtime validation. Even if runtime validation is in place, the client implementation still SHOULD sanitize the input based on the context in which each parameter ends up, e.g., to prevent SQL injection or XSS.
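
A minimal sketch of runtime shape validation plus context-specific escaping is shown below. The function names and error messages are hypothetical, and the checks cover only the hex and address formats implied by the schema; a real client would likely need stricter rules for its own context.

```typescript
const HEX_RE = /^0x[0-9a-fA-F]*$/;
const ADDRESS_RE = /^0x[0-9a-fA-F]{40}$/;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isHex(v: unknown): boolean {
  return typeof v === "string" && HEX_RE.test(v);
}

function isAddress(v: unknown): boolean {
  return typeof v === "string" && ADDRESS_RE.test(v);
}

// Returns a list of problems; an empty list means the shape checks passed.
function validateWalletSendCalls(input: unknown): string[] {
  if (!isRecord(input)) return ["payload is not an object"];
  const errors: string[] = [];
  if (typeof input["version"] !== "string") errors.push("version must be a string");
  if (!isHex(input["chainId"])) errors.push("chainId must be a hex string");
  if (!isAddress(input["from"])) errors.push("from must be a 20-byte hex address");
  const calls = input["calls"];
  if (!Array.isArray(calls)) {
    errors.push("calls must be an array");
    return errors;
  }
  calls.forEach((call: unknown, i: number) => {
    if (!isRecord(call)) {
      errors.push(`calls[${i}] is not an object`);
      return;
    }
    if (call["to"] !== undefined && !isAddress(call["to"])) {
      errors.push(`calls[${i}].to must be a 20-byte hex address`);
    }
    for (const key of ["data", "value", "gas"]) {
      if (call[key] !== undefined && !isHex(call[key])) {
        errors.push(`calls[${i}].${key} must be a hex string`);
      }
    }
  });
  return errors;
}

// Escape agent-supplied text before placing it in an HTML context.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```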

Spoofing

The user-visible part of the transaction is typically the description field of the metadata, which could be completely different from the actual transaction data if it comes from a malicious agent. As a result, the user may end up performing an action they did not intend. The client SHOULD mitigate this by doing one or more of the following:

  • Maintain an allowlist of users or agents whose messages it trusts.
  • Perform transaction simulation and cross-check it against the metadata or override the metadata that is displayed to the user.
  • Display a warning or disclaimer alerting the user that the metadata description may not be trustworthy and to use their judgement.
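
One way a client might combine these mitigations is sketched below. This is only one possible policy under stated assumptions: the names are hypothetical, the allowlist is keyed by sender inbox id, and the simulation result is reduced to an optional boolean supplied by the caller.

```typescript
type DisplayDecision = {
  showAgentDescription: boolean; // show the agent-supplied description?
  showWarning: boolean; // show a "description may be untrustworthy" notice?
};

// Hypothetical policy combining the allowlist, simulation, and warning options.
function decideDisplay(
  senderInboxId: string,
  trustedInboxIds: ReadonlySet<string>,
  simulationMatchesDescription?: boolean, // undefined = no simulation was run
): DisplayDecision {
  // A failed cross-check overrides everything: hide the description and warn.
  if (simulationMatchesDescription === false) {
    return { showAgentDescription: false, showWarning: true };
  }
  const trusted = trustedInboxIds.has(senderInboxId);
  const verified = simulationMatchesDescription === true;
  // Warn unless the sender is allowlisted or the simulation confirmed the text.
  return { showAgentDescription: true, showWarning: !trusted && !verified };
}
```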

Informed consent

Because the data could be coming from a confused/malicious agent, the client implementation MUST NOT automatically sign or execute the transaction. The user should always provide their confirmation after reviewing either the metadata or the simulation results. This also protects against any possible confusion or hallucination by the AI agent.

Privacy considerations

The client implementation SHOULD take care not to expose the user’s IP address to the agent indirectly. For example, passing a paymaster URL provided by the agent to the wallet may expose the user’s IP address if the wallet queries the paymaster API from the user’s device.
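
A conservative way to handle this is to drop agent-supplied capabilities that carry URLs before forwarding the request to the wallet. The sketch below assumes the paymaster URL arrives under the paymasterService capability key (as in ERC-7677); the helper name and the blocklist approach are illustrative assumptions.

```typescript
// Capability keys that may cause the wallet to contact an agent-chosen URL.
// Assumption: paymaster URLs arrive under the ERC-7677 `paymasterService` key.
const BLOCKED_CAPABILITIES: ReadonlySet<string> = new Set(["paymasterService"]);

// Hypothetical helper: return a copy of the capabilities without blocked keys.
function sanitizeCapabilities(
  capabilities: Record<string, unknown> | undefined,
): Record<string, unknown> {
  const source = capabilities ?? {};
  const result: Record<string, unknown> = {};
  Object.keys(source).forEach((key) => {
    if (!BLOCKED_CAPABILITIES.has(key)) {
      result[key] = source[key];
    }
  });
  return result;
}
```

A client that wants sponsored transactions could instead substitute a paymaster URL that it controls or proxies.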

Copyright

Copyright and related rights waived via CC0.


Love where this proposal is going. This strikes me as more of a SHOULD than a MUST. For example, I can imagine a client giving users control to auto-approve txns below a certain dollar value, or that are “receive only”, or in other user-defined “safe enough” situations.


I wonder whether XMTP network “consent” to message satisfies this need implicitly, or whether some additional signal might be required to do this well.

Ideally we’d let accounts self-declare/prove that they’re automated agents, which would give them a trust boost (that clients can choose to display) versus accounts that may or may not be agents (having proven neither agent- nor human-ness).

This is explored quickly in XIP-51

I think we should add an optional permit2 eip712 field. This will let the client receiving these messages know that a permit2 signature is required.

I’d propose an additional field to the calls object

// ...
calls: {
  // ...
  permit2?: {
    eip712: {
      types: {
        EIP712Domain: { name: string; type: string }[];
      } & Record<string, { name: string; type: string }[]>;
      primaryType: string;
      domain: { [key: string]: any };
      message: { [key: string]: any };
    };
  };
};
//...

EIP712: EIPs/EIPS/eip-712.md at master · ethereum/EIPs · GitHub

The signature is included with the transaction data. Here’s an example that uses viem

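// Byte helpers come from viem. signTypedDataAsync and sendTransaction are
// assumed to come from the app's wallet layer (e.g. wagmi hooks).
import { concat, numberToHex, size } from "viem";

if (permit2) {
  // Sign the EIP-712 typed data supplied by the agent.
  const signature = await signTypedDataAsync(permit2.eip712);
  // Encode the signature's byte length as a 32-byte unsigned word.
  const signatureLengthInHex = numberToHex(size(signature), {
    signed: false,
    size: 32,
  });
  // Append <length><signature> to the prepared calldata.
  transaction.data = concat([
    transaction.data,
    signatureLengthInHex,
    signature,
  ]);
}

sendTransaction({
  to: transaction.to,
  data: transaction.data,
  gas: BigInt(transaction.gas),
  gasPrice: BigInt(transaction.gasPrice),
  value: BigInt(transaction.value),
});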
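
For readers without viem at hand, the byte layout produced by the snippet above can be sketched with plain string operations. The helper name is hypothetical; it simply appends a 32-byte big-endian length word followed by the signature bytes to the existing calldata.

```typescript
type Hex = `0x${string}`;

// Hypothetical dependency-free equivalent of the concat step above:
// calldata || uint256(signature byte length) || signature.
function appendSignature(data: Hex, signature: Hex): Hex {
  const signatureBytes = (signature.length - 2) / 2; // two hex chars per byte
  const lengthWord = signatureBytes.toString(16).padStart(64, "0"); // 32 bytes
  return `${data}${lengthWord}${signature.slice(2)}` as Hex;
}
```

For a typical 65-byte ECDSA signature, the length word ends in 41 (hex for 65).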

Permit2 example useful-solidity-patterns/patterns/permit2 at main · dragonfly-xyz/useful-solidity-patterns · GitHub

Because the network also has “consent”, you could approve once and say “this can execute on my behalf anytime for this relationship.” This is where proactive agents + transactions will become magic. We want to protect users and guarantee consent, but the surface area would be more powerful if agents in the future can be proactive once I give them the right permissions + consent up front.

That is a good point. On re-reading the RFC definitions, SHOULD is definitely the right one and gives more flexibility in implementation. Let me update the draft and also add a caveat that client implementations should enable auto-approve only for safe situations.
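
To make the "safe situations" caveat concrete, here is a sketch of one possible auto-approve policy. Everything in it is an assumption for illustration: the policy fields, the rule that only plain value transfers from allowlisted senders qualify, and the use of a wei limit rather than a dollar value. It also assumes the hex fields were validated beforehand.

```typescript
type Hex = `0x${string}`;

type Call = { to?: Hex; data?: Hex; value?: Hex };

// Hypothetical user-configured policy; auto-approve stays off unless enabled.
type AutoApprovePolicy = {
  enabled: boolean;
  trustedSenders: ReadonlySet<string>; // sender inbox ids the user opted in for
  maxTotalValueWei: bigint; // upper bound on the summed call values
};

// Returns true when the user must confirm explicitly (the default outcome).
function needsUserConfirmation(
  senderInboxId: string,
  calls: Call[],
  policy: AutoApprovePolicy,
): boolean {
  if (!policy.enabled || !policy.trustedSenders.has(senderInboxId)) return true;
  let total = BigInt(0);
  for (const call of calls) {
    // Calldata can move tokens or grant approvals regardless of `value`,
    // so only plain transfers are eligible in this sketch.
    if (call.data !== undefined && call.data !== "0x") return true;
    total += BigInt(call.value ?? "0x0");
  }
  return total > policy.maxTotalValueWei;
}
```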

I think it is up to the client implementation to decide how to combat the problem of spoofing or impersonation based on their use case.

In a closed client implementation (e.g., you own both the client and the agent), the straightforward approach is to hardcode the inbox id of the agent. Reading and acting on messages from untrusted agents may lead to a confused deputy attack, because the end user may have assumed that the client is trusted and therefore that the chat and any metadata description shown in the client are also trusted.

So, if the client implementation is open to all users/agents, like xmtp.chat, the client should also display a warning or disclaimer with the transaction request, asking the user to double-check it.

Thanks for sharing the idea and example. I went through the permit2 mechanism, which involves signing EIP-712 typed data and then including the signature as input to a later contract function execution. However, the way a protocol contract accepts the permit2 signature is not standardized and is up to the function’s definition. For example, Uniswap’s Universal Router.execute takes permit2 signature data under one of its commands, like PERMIT2_PERMIT_BATCH.

Given that wallet_sendCalls operates at a low level and the transaction data is prepared beforehand, the client implementation cannot easily modify the transaction data to include the permit2 signature. One solution is to define a placeholder in the transaction data that the client can string-replace with the permit2 signature, but that approach won’t work if the transaction data has a complex structure (like ABI-encoded JSON, etc.).

Also, I’m wondering whether we should introduce a new content type (let’s say eth_signTypedData) to allow signing any EIP-712 typed data (including permit2, Hyperliquid trading, Coinbase spend permissions), since this content type closely follows EIP-5792, which corresponds to the wallet_sendCalls and eth_sendTransaction RPC calls.

eth_signTypedData doesn’t have much overlap with wallet_sendCalls parameters.

The flow will then become as follows:

  1. Agent sends the permit2 sign message using an eth_signTypedData content-type.
  2. Client shows/triggers the wallet UI for signing and after signing posts a signature message in the chat.
  3. Agent uses the signature and constructs the transaction data for execution using wallet_sendCalls content-type.
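
The three steps above can be sketched as message shapes plus the agent-side handler for step 3. All names here are hypothetical: no eth_signTypedData content type is defined yet, the requestId correlation field is an assumption, and encodeCall stands in for whatever protocol-specific calldata encoding the agent uses.

```typescript
type Hex = `0x${string}`;

// Step 1 (agent -> user): hypothetical typed-data signing request.
type SignTypedDataRequest = {
  kind: "signTypedData";
  requestId: string;
  typedData: Record<string, unknown>;
};

// Step 2 (user -> agent): signature posted back in the chat.
type SignatureReply = { kind: "signature"; requestId: string; signature: Hex };

// Step 3 (agent -> user): simplified wallet-send-calls message.
type SendCallsMessage = {
  kind: "walletSendCalls";
  chainId: Hex;
  from: Hex;
  calls: { to: Hex; data: Hex }[];
};

// Agent side of step 3: build the transaction once the matching signature arrives.
function onSignatureReply(
  pending: SignTypedDataRequest,
  reply: SignatureReply,
  tx: { chainId: Hex; from: Hex; to: Hex },
  encodeCall: (signature: Hex) => Hex, // protocol-specific, supplied by the agent
): SendCallsMessage | undefined {
  // Ignore signatures that do not answer the pending request.
  if (reply.requestId !== pending.requestId) return undefined;
  return {
    kind: "walletSendCalls",
    chainId: tx.chainId,
    from: tx.from,
    calls: [{ to: tx.to, data: encodeCall(reply.signature) }],
  };
}
```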

Please let me know your thoughts.

You’re right. Probably doesn’t make sense to add if protocols don’t have a standard way of accepting the signature.

I do think that a separate EIP712 content type makes sense. However, for using permit2 like in my example it doesn’t make sense to get the signature and send it back to the agent for processing. The client should process it.

I’m thinking that it doesn’t need to be part of the standard, but there can be non-standardized fields in the metadata object that the client must know how to process if it is to integrate. Various agents can have varying methods of doing things. We (Bankr) use 0x for swaps, and they use permit2 as I mentioned. So if clients want to interact with Bankr, they will need to integrate the custom metadata fields and combine the signature with the transaction data.

TL;DR: you’re right, and I think we can use the metadata object to add non-standard fields that clients can support if they choose. Also, +1 on the eth_signTypedData content type.


@justjkk I was chatting with a major inbox app developer today who mentioned they would prefer the eth_signTypedData content type as well. It would be a welcome addition to the proposal!