Add tags chapter

Jon Staab
2026-04-16 16:49:18 -07:00
parent 2553cff300
commit 8a29ff39d6
6 changed files with 901 additions and 13 deletions
@@ -0,0 +1,248 @@
# Research: Tags
## Topic Summary
The tags chapter introduces a typed representation of nostr tags to replace
the `Vec<Vec<String>>` used in the events chapter. Tags are arrays of
strings whose first element names the tag and whose subsequent elements
carry values, relay hints, and markers. The chapter should cover:
- A `Tag` wrapper around `Vec<String>` with accessors for name, value, and
the rest of the entries
- Helpers that read and filter tags on an event (`find`, `find_all`,
`values`, `value`, `has`)
- The distinction between indexed single-letter tags and multi-character
tags
- Parsing and constructing address tags (`kind:pubkey:identifier`) and
`EventPointer`/`ProfilePointer`/`AddressPointer` conveniences
- NIP-10 markers on e-tags (`root`, `reply`, `mention`) and how to read
them positionally
- Integration with the `Event`/`EventContent`/etc. types from the events
chapter — swap `Vec<Vec<String>>` for `Vec<Tag>`
We want an ergonomic but minimal type: not rust-nostr's 60+ variant enum,
but a thin wrapper plus free functions on slices, close in spirit to
nostrlib or welshman.
## Philosophy
From `ref/building-nostr`:
**Tags are the structured data half of events.** An event's content is
generally human-readable; tags hold structured data. Encoding JSON into
content is an antipattern. Conversely, tags are where every reference,
index, or machine-readable annotation should live.
**Lists of lists, not maps.** Tags are arrays of arrays of strings by
design. This preserves two properties a dictionary cannot: keys may repeat
(important for multiple `e` or `p` references), and order is preserved.
The parallel drawn by building-nostr is to URL query parameters and Python
ordered dicts.
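
A minimal Rust sketch of the two properties, assuming tags are still the raw `Vec<Vec<String>>` from the events chapter. The helper names `sample_tags`, `values_for`, and `collapse_to_map` are hypothetical and exist only for illustration; the tag values are placeholders, not real ids.

```rust
use std::collections::HashMap;

/// Hypothetical sample: the name `e` repeats and tag order is meaningful.
pub fn sample_tags() -> Vec<Vec<String>> {
    [["e", "id-one"], ["p", "pubkey-one"], ["e", "id-two"]]
        .iter()
        .map(|tag| tag.iter().map(|s| s.to_string()).collect())
        .collect()
}

/// Collects the second entry of every tag named `name`, in document order.
pub fn values_for<'a>(tags: &'a [Vec<String>], name: &str) -> Vec<&'a str> {
    tags.iter()
        .filter(|tag| tag.first().map(String::as_str) == Some(name))
        .filter_map(|tag| tag.get(1).map(String::as_str))
        .collect()
}

/// Collapses tags into a name -> value map. Later entries overwrite earlier
/// ones, so repeated names are lost and the original order is not kept.
pub fn collapse_to_map(tags: &[Vec<String>]) -> HashMap<&str, &str> {
    tags.iter()
        .filter_map(|tag| Some((tag.first()?.as_str(), tag.get(1)?.as_str())))
        .collect()
}
```

With the list form, both `e` references survive in order; the map form keeps only one value per name.
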
**Keep tags short.** "In general, tags should be as short as is reasonable.
Two to three entries is all you really need; if you have more than that,
you're probably trying to pack more data into a single tag than really
belongs." Prefer multiple tags over positional fields.
**Three categories, conflated.** Building-nostr identifies three
categories of tag that were conflated in the original design: data tags
(for display/handling), filter tags (single-letter, queryable via `#x`),
and behavior tags (such as `expiration`, `-`, and `h`, which affect how
implementations handle an event independently of its kind). The conflation
is called out as "a design mistake", but we have to live with it.
**Single-letter = indexed.** Single-letter tag names (`a` through `z`, `A` through `Z`)
are the ones relays index and expose via `#e`, `#p`, etc. filters.
Multi-character names (`imeta`, `alt`, `expiration`) are typically not
indexed. The tag-name convention is therefore meaningful: naming a tag
with a single letter asserts it's intended for filtering.
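
The naming rule is simple enough to check mechanically. The following is a sketch under the assumption that "single-letter" means exactly one ASCII letter; `is_indexable` and `filter_key` are hypothetical names, not part of any library surveyed here.

```rust
/// Returns true when `name` is exactly one ASCII letter (`a`-`z` or `A`-`Z`).
pub fn is_indexable(name: &str) -> bool {
    // A byte length of 1 guarantees a single one-byte (ASCII) character.
    name.len() == 1 && name.as_bytes()[0].is_ascii_alphabetic()
}

/// Builds the filter key (`#e`, `#p`, ...) for an indexable tag name,
/// or `None` for names that relays would not be expected to index.
pub fn filter_key(name: &str) -> Option<String> {
    is_indexable(name).then(|| format!("#{name}"))
}
```

Note that `-` is a single character but not a letter, so this check treats it as non-indexable.
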
**The `e` tag is overloaded.** Eight different NIPs use `e` for different
things (reply, fork, transaction reference, report target, list member,
approval, merge, mention). Building-nostr warns: when resolving a tag's
meaning, always consult the kind spec first, then tag specs — never the
other way around. Our library should stay neutral about semantics and let
callers interpret based on kind.
**Design general-purpose tags cautiously.** Broad tags can conflict with
kind-specific semantics. Our tag type should not bake in interpretation.
## Reference Implementation Analysis
### applesauce (TypeScript)
- Tags remain `string[][]` throughout; no wrapper class.
- Type-level annotation via `NameValueTag<Name>` generic tuple; runtime
type guards (`isETag`, `isPTag`, ...) identify tag types.
- Markers (`"root" | "reply" | "mention" | ""`) as a union type.
- A-tag parsing lives in `parseReplaceableAddress(address)` returning
`AddressPointer | null`, with an inverse
`getReplaceableAddressFromPointer`.
- Operations-as-functions: `TagOperation = (tags) => tags`. Events expose
`modifyPublicTags(...ops)` that pipes operations.
- Helpers: `addEventPointerTag`, `addProfilePointerTag`,
`addAddressPointerTag`, `ensureSingletonTag`, `ensureNamedValueTag`,
`fillAndTrimTag` (normalizes nulls and trailing blanks).
### ndk (TypeScript)
- `NDKTag = string[]`, raw; `NDKEvent.tags: NDKTag[]`.
- Accessors on event: `getMatchingTags(name, marker?)`, `hasTag`,
`tagValue` (returns index 1 or undefined), plus `removeTag`, `replaceTag`.
- Address tags: `tagAddress()` constructs `${kind}:${pubkey}:${dTag}`;
`tagId()` returns event id or address depending on replaceability;
`tagType()` returns `"e" | "a"`.
- NIP-10 markers at `tag[3]`: `getRootTag`, `getReplyTag` fall back to
positional interpretation when markers are absent.
- `referenceTags(marker?)` emits `[["a", addr], ["e", id, relay, marker, pubkey]]`.
- `generateContentTags` auto-tags `npub`/`note`/`nevent`/`naddr`/hashtags
from content.
### nostr-gadgets (TypeScript, JSR)
- Raw `string[]` tags, documented by convention.
- Single helper: `getTagOr(event, tagName, dflt)`.
- Validators: `isHex32`, `isATag` (regex `^\d+:[0-9a-f]{64}:[^:]+$`).
- Composition pattern: `itemsFromTags<I>(processor)` factory — each
fetcher passes a per-tag processor to build typed items.
- Deletion (kind 5): switches on `tag[0]`, mapping `e` to an id filter and
`a` to a kind+author+`#d` filter.
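
The dispatch in the last bullet can be sketched in Rust. This is not nostr-gadgets code: `DeleteTarget` and `delete_targets` are hypothetical names, and the author is kept as an unvalidated hex string for brevity.

```rust
/// What a single tag on a deletion event points at.
#[derive(Debug, PartialEq, Eq)]
pub enum DeleteTarget {
    /// From an `e` tag: match events by id.
    ById(String),
    /// From an `a` tag: match by kind, author, and `d` identifier.
    ByAddress { kind: u16, author: String, identifier: String },
}

/// Maps the tags of a deletion event to targets, skipping every tag that is
/// neither `e` nor `a`, or that is too short or malformed to interpret.
pub fn delete_targets(tags: &[Vec<String>]) -> Vec<DeleteTarget> {
    tags.iter()
        .filter_map(|tag| {
            let value = tag.get(1)?;
            match tag.first()?.as_str() {
                "e" => Some(DeleteTarget::ById(value.clone())),
                "a" => {
                    // At most three parts, so the identifier may contain ':'.
                    let mut parts = value.splitn(3, ':');
                    let kind = parts.next()?.parse().ok()?;
                    let author = parts.next()?.to_string();
                    let identifier = parts.next()?.to_string();
                    Some(DeleteTarget::ByAddress { kind, author, identifier })
                }
                _ => None,
            }
        })
        .collect()
}
```
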
### nostrlib (Go, fiatjaf)
- `Tag = []string`, `Tags = []Tag`; embedded directly in `Event`.
- Helpers on `Tags`:
- `Find(key)`, `FindLast(key)`
- `FindWithValue(key, value)`, `FindLastWithValue`
- `FindAll(key)` returns `iter.Seq[Tag]` (lazy)
- `Has(key)`, `ContainsAny(key, values)`
- `GetD()` for the `d` identifier on parameterized replaceables
- Pointer interface: `ProfilePointer`, `EventPointer`, `EntityPointer`
all share `AsTag`, `AsTagReference`, `AsFilter`, `MatchesEvent`.
- Address parsing: `ParseAddrString("kind:pubkey:d")` splits on `:`,
validates kind (0..65535) and pubkey (hex), preserves identifier.
- Standard library only (`iter`, `slices`, `strconv`). No tag taxonomy
enum; NIPs implement their own parsing helpers over raw slices.
- Thread markers (`root`/`reply`/`mention`) and relay-list markers
(`read`/`write`) are read via index, never via typed fields.
### nostr-tools (TypeScript)
- Plain `tags: string[][]`, no wrapper.
- Direct indexing throughout: `tag[0]` name, `tag[1]` value, `tag[2]`
relay, `tag[3]` marker, `tag[4]` pubkey hint.
- Address-tag parsing inline per NIP:
`let [kind, pubkey, identifier] = tag[1].split(':')`.
- NIP-10 supports both explicit markers and legacy positional fallback
(oldest/newest heuristic).
- Each NIP module owns its own tag construction and parsing; no central
tag API.
### rust-nostr (Rust)
- `Tag` wraps `Vec<String>` plus `OnceCell<Option<TagStandard>>` for
lazy parsed enum.
- `TagStandard` enum has 60+ variants covering most NIPs (`Event`,
`PublicKey`, `Coordinate`, `Kind`, `Amount`, `Image`, `Title`, ...).
- `TagKind<'a>` categorizes: named variants, `SingleLetter(SingleLetterTag)`
with case tracking, `Custom(Cow<'a, str>)`.
- E-tag parser is position-aware: `tag[3]` attempts Marker first, falls
back to PublicKey (NIP-01 legacy); `tag[4]` is PublicKey only if `[3]`
was a marker.
- A-tag parser uses `Coordinate::from_str`.
- `Tags` collection (not `Vec<Tag>`) maintains a
`BTreeMap<SingleLetterTag, BTreeSet<String>>` index for dedup and
indexed lookup, plus helpers `event_ids()`, `public_keys()`,
`coordinates()`.
- Trade-offs: limited extensibility (every new tag type touches the enum),
`OnceCell` overhead per tag, case-preservation fields. Very thorough
but heavy.
- **We should not replicate the enum approach.** Prefer a thin wrapper
over `Vec<String>` and let callers parse.
### welshman (TypeScript — predecessor of this library)
- No wrapper class; raw `string[][]`.
- 50+ pure functions in `/util/src/Tags.ts`:
- Filters: `getTags(tagName, tags)`, `getTag(tagName, tags)`
- Value extractors: `getTagValues`, `getTagValue`
- Type-specific: `getEventTags`, `getPubkeyTags`, `getAddressTags`,
`getRelayTags`, `getTopicTags`, `getKindTags`
- Reply logic: `getReplyTags`, `getCommentTags` (NIP-10 + NIP-22
uppercase/lowercase dual-tag)
- `uniqTags` dedup, `tagger` factory
- Dedicated `Address` class with `kind`, `pubkey`, `identifier`, `relays`;
factories `from`, `fromNaddr`, `fromEvent`; `isAddress` regex
`^\d+:\w+:.*$`; `toString` and `toNaddr`.
- Event envelope types (`EventContent`, `EventTemplate`, `StampedEvent`,
...) match our Rust hierarchy exactly; this is where we borrowed it from.
Tags stay as `string[][]`.
- High-level builders in `/app/src/tags.ts`: `tagEventForReply`,
`tagEventForComment`, `tagEventForQuote`, `tagEventForReaction`.
## Common Patterns
**Raw lists dominate.** Every library except rust-nostr keeps tags as the
native string array. The rust-nostr enum is an outlier, and its heaviness
is visible (extensibility pain, memory overhead).
**Free functions over methods.** Welshman and applesauce both prefer pure
functions that take tags and return tags or values. Method-on-type
approaches (ndk) tend to get cluttered.
**Address tags get their own type.** Nostrlib (`EntityPointer`), welshman
(`Address`), applesauce (`AddressPointer`), rust-nostr (`Coordinate`)
all introduce a small struct for `kind:pubkey:d`. This is consistently
the one tag type worth parsing eagerly because it combines three fields
that are always used together.
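
A minimal sketch of such a struct in Rust, assuming the `kind:pubkey:d` form described above. The `pubkey` field is a plain hex `String` standing in for the `PublicKey` type from the keys chapter, and `AddressError` is a hypothetical error type introduced only for this example.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Address {
    pub kind: u16,
    /// 64 lowercase hex characters; a stand-in for a real `PublicKey`.
    pub pubkey: String,
    pub identifier: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AddressError {
    MissingField,
    InvalidKind,
    InvalidPubkey,
}

impl FromStr for Address {
    type Err = AddressError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // At most three parts, so the identifier may itself contain ':'.
        let mut parts = s.splitn(3, ':');
        let kind = parts.next().ok_or(AddressError::MissingField)?;
        let pubkey = parts.next().ok_or(AddressError::MissingField)?;
        let identifier = parts.next().ok_or(AddressError::MissingField)?;

        // Digits only (rejects "+5"); `u16` bounds the range to 0..=65535.
        if !kind.bytes().all(|b| b.is_ascii_digit()) {
            return Err(AddressError::InvalidKind);
        }
        let kind = kind.parse::<u16>().map_err(|_| AddressError::InvalidKind)?;

        let is_hex = |b: u8| b.is_ascii_digit() || (b'a'..=b'f').contains(&b);
        if pubkey.len() != 64 || !pubkey.bytes().all(is_hex) {
            return Err(AddressError::InvalidPubkey);
        }

        Ok(Address {
            kind,
            pubkey: pubkey.to_string(),
            identifier: identifier.to_string(),
        })
    }
}

impl fmt::Display for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}:{}", self.kind, self.pubkey, self.identifier)
    }
}

impl Address {
    /// Builds the raw `["a", "kind:pubkey:d"]` tag.
    pub fn to_tag(&self) -> Vec<String> {
        vec!["a".to_string(), self.to_string()]
    }

    /// Parses a raw tag; returns `None` unless it is a well-formed `a` tag.
    pub fn from_tag(tag: &[String]) -> Option<Self> {
        match tag {
            [name, value, ..] if name == "a" => value.parse().ok(),
            _ => None,
        }
    }
}
```

Splitting into at most three parts preserves identifiers that contain colons, and an empty identifier (a trailing `:`) still parses.
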
**Markers are positional.** No library lets a `Marker` type leak into the
base tag representation; rust-nostr parses markers, but only inside its
lazily parsed `TagStandard`. Marker interpretation happens at the reader
site (`getReplyTags` etc.), not at construction time.
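
A sketch of reader-site marker handling in Rust, operating on raw tags so that no marker type is needed. All function names here (`tag`, `marker`, `find_marked`, `reply_target`) are hypothetical, and the fallback follows the deprecated NIP-10 positional scheme, in which the last `e` tag is the reply target.

```rust
/// Hypothetical helper that builds a raw tag from string slices.
pub fn tag(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

fn is_e(tag: &[String]) -> bool {
    tag.first().map(String::as_str) == Some("e")
}

/// Reads the entry at index 3, treating an empty string as "no marker".
/// The value is returned as-is; callers decide what it means.
pub fn marker(tag: &[String]) -> Option<&str> {
    tag.get(3).map(String::as_str).filter(|m| !m.is_empty())
}

/// First `e` tag whose index-3 entry equals `wanted`.
pub fn find_marked<'a>(tags: &'a [Vec<String>], wanted: &str) -> Option<&'a [String]> {
    tags.iter()
        .map(Vec::as_slice)
        .find(|t| is_e(t) && marker(t) == Some(wanted))
}

/// Picks the `e` tag a reply points at.
pub fn reply_target(tags: &[Vec<String>]) -> Option<&[String]> {
    let uses_markers = tags.iter().any(|t| {
        is_e(t) && matches!(marker(t), Some("root" | "reply" | "mention"))
    });
    if uses_markers {
        // A direct reply to the thread root carries only a `root` marker.
        find_marked(tags, "reply").or_else(|| find_marked(tags, "root"))
    } else {
        // Positional fallback: the last `e` tag is the reply target.
        tags.iter().rev().map(Vec::as_slice).find(|t| is_e(t))
    }
}
```

Checking for known marker strings before taking the marked path avoids misreading a legacy tag whose index 3 holds something other than a marker.
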
**Single-letter indexing matters for filters.** Nostrlib and rust-nostr
explicitly model the single-letter vs multi-character distinction.
Applesauce and welshman rely on convention.
## Considerations for Our Implementation
Given our literate-programming posture and existing style (thin wrappers
over bytes in `keys`, struct pipelines in `events`), we should:
1. **Introduce `Tag(Vec<String>)` as a tuple wrapper.** Provide `name()`,
`value()` (second element or empty), `values()` (all after the first),
`get(i)`, `len()`, `as_slice()`, plus `From<Vec<String>>`,
`IntoIterator`, and `Serialize`/`Deserialize` impls that flatten
transparently to an array. Add a `new` constructor that takes a name and
any number of values (Rust has no variadic functions, so an iterator or
slice of values).
2. **Free functions on `&[Tag]`.** `find(tags, name)`, `find_all(tags, name)`,
`values(tags, name)`, `value(tags, name)`, `has(tags, name)` as
standalone helpers. Keep them name-agnostic — `name` is `&str`.
3. **An `Address` struct for `a` tags.** Fields `kind: u16`,
`pubkey: PublicKey`, `identifier: String`, plus optional `relays`.
Implement `FromStr`/`Display` for the `kind:pubkey:d` form, and an
`Address::to_tag()` / `Address::from_tag()` pair. Keep it minimal —
no `naddr` yet (that lands in the bech32/entities chapter).
4. **Update `Event` and friends to use `Vec<Tag>`.** The events chapter
left tags as `Vec<Vec<String>>` explicitly because the `Tag` type
wasn't ready. Swap it now and keep the canonical hash bytes identical
(serialize `Tag` transparently as `Vec<String>` in the canonical form).
5. **Stay neutral on semantics.** No `TagKind` enum, no marker parsing
baked into `Tag`. Building-nostr is explicit that tags must be
interpreted in the context of the event kind; a generic type should
not try to know better.
6. **Brief section on markers.** Show how to read NIP-10 markers
positionally — `tag.get(3)` — without introducing a marker type. The
marker-aware reply threading will belong in a later chapter.
7. **No hidden-tag / modify pipelines.** That belongs later, with
encryption of private tag lists.
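
Items 1 and 2 can be sketched as follows. This is an illustration of the proposed shape, not the chapter's final code: the generic signature of `new` is one possible stand-in for "a name and any number of values", and serialization is omitted to keep the block standard-library only (with serde, `#[serde(transparent)]` on the tuple struct would be one way to keep the JSON form a plain array).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tag(Vec<String>);

impl Tag {
    /// Builds a tag from a name followed by any number of values.
    pub fn new<I, S>(name: &str, values: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let mut entries = vec![name.to_string()];
        entries.extend(values.into_iter().map(Into::into));
        Tag(entries)
    }

    /// First entry, or "" for an empty tag.
    pub fn name(&self) -> &str {
        self.get(0).unwrap_or("")
    }

    /// Second entry, or "" when absent.
    pub fn value(&self) -> &str {
        self.get(1).unwrap_or("")
    }

    /// Every entry after the name.
    pub fn values(&self) -> &[String] {
        self.0.get(1..).unwrap_or(&[])
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        self.0.get(i).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    pub fn as_slice(&self) -> &[String] {
        &self.0
    }
}

impl From<Vec<String>> for Tag {
    fn from(entries: Vec<String>) -> Self {
        Tag(entries)
    }
}

/// First tag with the given name.
pub fn find<'a>(tags: &'a [Tag], name: &str) -> Option<&'a Tag> {
    tags.iter().find(|t| t.name() == name)
}

/// All tags with the given name, lazily and in order.
pub fn find_all<'a>(tags: &'a [Tag], name: &'a str) -> impl Iterator<Item = &'a Tag> + 'a {
    tags.iter().filter(move |t| t.name() == name)
}

/// The value (second entry) of every tag with the given name.
pub fn values<'a>(tags: &'a [Tag], name: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    find_all(tags, name).map(Tag::value)
}

/// The value of the first tag with the given name.
pub fn value<'a>(tags: &'a [Tag], name: &str) -> Option<&'a str> {
    find(tags, name).map(Tag::value)
}

pub fn has(tags: &[Tag], name: &str) -> bool {
    find(tags, name).is_some()
}
```

Reading a marker then stays a plain positional lookup such as `tag.get(3)`, with no marker type involved.
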
The goal is a type that disappears when you're not using it and becomes
helpful the moment you are — exactly the "little more than an empty
shell" that building-nostr describes for nostr itself.