Lexicon Records

Posted 2024-10-13 16:00 ‐ 5 min read

The Smoke Signal website allows users to create and manage events and RSVPs easily. Additionally, Smoke Signal is a Lexicon for events and RSVPs that anyone can use. The goal is that the PDS records made through the Smoke Signal website could just as well be produced with a different website or tool, and that's OK. It's more than OK; it's encouraged.

Currently, Smoke Signal records adhere to the Lexicon and XRPC specifications published by Bluesky Social. The Lexicon specification provides solid foundational elements that make all of this possible.

I believe that the Lexicon spec needs to evolve to add two critical missing pieces: authority and discovery.

Proposal

I'm proposing two things:

  1. A protocol-level NSID and collection for storing lexicons as records.
  2. Changing the value of the $type record attribute to the AT-URI of a Lexicon record.

These two changes would have a big impact on the ATmosphere because they would allow Lexicons to be resolved and, through resolution, to establish ownership and authority.

The com.atproto.lexicon NSID

The top-level com.atproto.lexicon NSID would become a constant value type indicating that a given record is a Lexicon definition.

{
  "lexicon": 1,
  "id": "com.atproto.lexicon",
  "description": "An ATProtocol Lexicon Definition",
  "defs": {
    "main": {}
  }
}

An example Lexicon record would look like:

{
  "uri": "at://did:plc:tgudj2fjm77pzkuawquqhsxm/com.atproto.lexicon/3l5np3qjdst3x",
  "cid": "",
  "value": {
    "$type": "com.atproto.lexicon",
    "lexicon": "events.smokesignal.calendar.event",
    "collections": ["events.smokesignal.calendar.event"],
    "definition": {}
  }
}

As new Lexicons are developed and published, their definitions are written to this collection and made available.
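A minimal sketch of publishing a definition into the proposed collection, assuming it is written with the existing com.atproto.repo.createRecord XRPC method. The helper names `lexicon_record` and `create_record_body` are hypothetical, and authentication and the HTTP request itself are left out.

```python
import json
from typing import Any

# Collection proposed in this post; it is not part of the current AT Protocol.
LEXICON_COLLECTION = "com.atproto.lexicon"


def lexicon_record(nsid: str, collections: list, definition: dict) -> dict:
    """Build the value of a proposed com.atproto.lexicon record (hypothetical helper)."""
    return {
        "$type": LEXICON_COLLECTION,
        "lexicon": nsid,
        "collections": collections,
        "definition": definition,
    }


def create_record_body(repo_did: str, record: dict) -> str:
    """Serialize an input body for com.atproto.repo.createRecord.

    No rkey is given, so the PDS assigns one when the record is created.
    """
    body: dict[str, Any] = {
        "repo": repo_did,
        "collection": LEXICON_COLLECTION,
        "record": record,
    }
    return json.dumps(body)
```

Each call produces a new record with its own AT-URI in the publisher's repository, which is what the $type references in the rest of this post point at.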

Having lexicons exist as records does several big things:

First, it changes the relationship between records and their types. Incremental changes to a Lexicon, and versioning of those changes, become manageable because each version can be published as its own Lexicon record with its own AT-URI, offering greater flexibility.

Currently, the Lexicon specs require that all changes to Lexicons remain fully backward compatible. Although that is a great aspiration, it isn't practical, and it leads to undocumented changes and "fingers crossed" releases where you just hope that record consumers will handle them correctly.

Second, it allows Lexicon definitions to be more easily resolved and discovered. Because Lexicon definitions would be records, they can be proactively published and picked up by relay consumers.

Currently, when you encounter a record with a non-standard type, there is no direct way to locate its definition. Resolving definitions through domains is an avenue that could be explored, but the cost of introducing an additional well-known spec and dealing with domain expiration is very high.

Lastly, it allows Lexicon definitions to be easily retrieved. If you encounter a Lexicon reference in the wild, you can resolve its AT-URI to the schema and quickly see what it looks like. The AT-URI can be converted into an HTTP request for the "com.atproto.repo.getRecord" XRPC method, which you can invoke from your browser.

This dramatically lowers the bar for developers to discover and use Lexicon definitions.
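A minimal sketch of the AT-URI to getRecord conversion, assuming you already know the PDS endpoint that hosts the repository; resolving that endpoint from the DID document is left out. The names `AtUri`, `parse_at_uri`, and `get_record_url` are hypothetical, and the parser ignores query strings, fragments, and syntax validation.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class AtUri:
    """The repository, collection, and record key parts of an AT-URI."""
    repo: str        # DID (or handle) that owns the repository
    collection: str  # NSID of the collection
    rkey: str        # record key


def parse_at_uri(uri: str) -> AtUri:
    """Split an at://<repo>/<collection>/<rkey> URI into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<repo>/<collection>/<rkey>: {uri!r}")
    return AtUri(*parts)


def get_record_url(uri: str, pds_endpoint: str) -> str:
    """Build a com.atproto.repo.getRecord URL for the record named by `uri`.

    `pds_endpoint` is assumed to be the PDS hosting the repo.
    """
    at = parse_at_uri(uri)
    query = urlencode({"repo": at.repo, "collection": at.collection, "rkey": at.rkey})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.repo.getRecord?{query}"
```

With a hypothetical host of https://pds.example.com, the Lexicon record from the earlier example becomes `https://pds.example.com/xrpc/com.atproto.repo.getRecord?repo=did%3Aplc%3Atgudj2fjm77pzkuawquqhsxm&collection=com.atproto.lexicon&rkey=3l5np3qjdst3x`, a plain GET that a browser can open directly.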

Using AT-URI values in $type

The value of the $type attribute in repository records would become the AT-URI of the Lexicon that the record implements.

{
  "uri": "at://did:plc:cbkjy5n7bk3ax2wplmtjofq2/events.smokesignal.calendar.event/3l5movzhkwk2w",
  "cid": "bafyreievnkc776s3yobqazqxu4r7vtli4ikqft6oomf7shmtmsqquf53ya",
  "value": {
    "$type": "at://did:plc:tgudj2fjm77pzkuawquqhsxm/com.atproto.lexicon/3l5np3qjdst3x"
  }
}

Currently, records use the $type attribute to reference the Lexicon they implement by its NSID:

{
  "uri": "at://did:plc:cbkjy5n7bk3ax2wplmtjofq2/events.smokesignal.calendar.event/3l5movzhkwk2w",
  "cid": "bafyreievnkc776s3yobqazqxu4r7vtli4ikqft6oomf7shmtmsqquf53ya",
  "value": {
    "$type": "events.smokesignal.calendar.event"
  }
}

Although this keeps types simple, it leaves a lot of room for interpretation and raises several unanswered questions:

  • Who created that Lexicon?
  • How do I verify and validate it?

Referencing the Lexicon definition directly answers both questions immediately. The owner of the Lexicon is the DID in the AT-URI, and by fetching the record at that AT-URI, you get the definition needed to validate the record.
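The two steps described here, reading the owner DID out of an AT-URI $type and validating a record against the fetched definition, can be sketched as follows. This assumes the `definition` field of the fetched Lexicon record follows the current Lexicon schema layout for a record type (a `main` def whose `record` object lists `required` fields and typed `properties`). The function names are hypothetical, and the validator only checks required fields and three primitive types; a real validator would cover the full Lexicon type system.

```python
from typing import Any

# Proposed collection, not part of the current protocol.
LEXICON_COLLECTION = "com.atproto.lexicon"

# Small subset of Lexicon primitive types mapped to Python types.
_PRIMITIVES = {"string": str, "integer": int, "boolean": bool}


def lexicon_owner(type_value: str) -> str:
    """Return the DID that owns the Lexicon referenced by an AT-URI $type."""
    if not type_value.startswith("at://"):
        raise ValueError(f"$type is not an AT-URI: {type_value!r}")
    parts = type_value[len("at://"):].split("/", 2)
    if len(parts) != 3 or parts[1] != LEXICON_COLLECTION:
        raise ValueError(f"$type does not point at a Lexicon record: {type_value!r}")
    return parts[0]


def validate_record(value: dict, definition: dict) -> list:
    """Check required fields and primitive property types; return a list of problems."""
    schema: dict[str, Any] = definition["defs"]["main"]["record"]
    problems = []
    for name in schema.get("required", []):
        if name not in value:
            problems.append(f"missing required field {name!r}")
    for name, prop in schema.get("properties", {}).items():
        expected = _PRIMITIVES.get(prop.get("type"))
        if expected is None or name not in value:
            continue  # unsupported type or absent optional field: not checked here
        field = value[name]
        # bool is a subclass of int, so reject booleans where an integer is expected
        if not isinstance(field, expected) or (expected is int and isinstance(field, bool)):
            problems.append(f"field {name!r} should be {prop['type']}")
    return problems
```

An empty list means the record passed these checks; anything else lists what the record got wrong relative to the definition its publisher controls.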

Alternative

As an alternative to replacing the value of $type outright, it could be useful to append the AT-URI to the existing NSID, using a separator.

Borrowing from HTTP header conventions, a semicolon could work:

{
  "$type": "events.smokesignal.calendar.event;at://did:plc:tgudj2fjm77pzkuawquqhsxm/com.atproto.lexicon/3l5np3qjdst3x"
}

Alternatively, there's something appealing about keeping it simple and splitting on the first whitespace character:

{
  "$type": "events.smokesignal.calendar.event at://did:plc:tgudj2fjm77pzkuawquqhsxm/com.atproto.lexicon/3l5np3qjdst3x"
}
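A consumer that wanted to accept the current form, the replacement form, and both alternatives could parse $type along these lines. This is a minimal sketch; `split_type` is a hypothetical helper, and it assumes neither the NSID nor the AT-URI contains a semicolon or whitespace.

```python
from typing import Optional, Tuple


def split_type(type_value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a $type value into (nsid, lexicon_at_uri).

    Accepts a bare NSID, a bare AT-URI, "nsid;at-uri", or "nsid at-uri".
    """
    value = type_value.strip()
    if not value:
        raise ValueError("empty $type")
    if value.startswith("at://"):
        return None, value  # replacement form: AT-URI only
    if ";" in value:
        nsid, _, uri = value.partition(";")  # semicolon form
    else:
        parts = value.split(None, 1)  # split on the first run of whitespace
        nsid, uri = parts[0], (parts[1] if len(parts) > 1 else "")
    nsid, uri = nsid.strip(), uri.strip()
    if uri and not uri.startswith("at://"):
        raise ValueError(f"expected an AT-URI after the NSID: {type_value!r}")
    return nsid, (uri or None)
```

Because a bare NSID still parses, the same function handles records written both before and after such a change.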

Side Effects

This change can also lead to several interesting side effects:

  1. Personal Data Servers can look up a Lexicon by AT-URI to validate records when they are created through XRPC calls.

  2. SDK maintainers would have a single, versioned schema, com.atproto.lexicon, against which to validate Lexicon definitions.

  3. A Lexicon definition could be "forked" without changing the original definition.

Final Thoughts

This is a relatively easy way to create strong references between records and the definitions they implement. It satisfies both requirements, making Lexicons discoverable and authoritative. It keeps the definitions "close to home" in the PDS of the authoritative DID, and because it uses the protocol's existing personal data server and repository mechanics, Lexicons can be treated like any other highly portable record.

It removes the guesswork from working with new record types by following the paved path: interacting with records through existing XRPC features and functionality.

This change would increase confidence and awareness across the ATmosphere. Developers can be confident that when they encounter a new record, they can more easily ascertain its structure and usage. Lexicon publishers can be more confident that records implementing their Lexicon do so correctly.

What do you think? Let's talk about it: @ngerakines.me