# Research: Tags

## Topic Summary

The tags chapter introduces a typed representation of nostr tags to replace
the `Vec<Vec<String>>` used in the events chapter. Tags are arrays of
strings whose first element names the tag and whose subsequent elements
carry values, relay hints, and markers. The chapter should cover:

- A `Tag` wrapper around `Vec<String>` with accessors for name, value, and
  the remaining entries
- Helpers that read and filter tags on an event (`find`, `find_all`,
  `values`, `value`, `has`)
- The distinction between indexed single-letter tags and multi-character
  tags
- Parsing and constructing address tags (`kind:pubkey:identifier`) and
  `EventPointer`/`ProfilePointer`/`AddressPointer` conveniences
- NIP-10 markers on e-tags (`root`, `reply`, `mention`) and how to read
  them positionally
- Integration with the `Event`/`EventContent`/etc. types from the events
  chapter: swap `Vec<Vec<String>>` for `Vec<Tag>`

We want an ergonomic but minimal type: not rust-nostr's 60-plus-variant
enum, but a thin wrapper plus free functions on slices, close in spirit to
nostrlib or welshman.

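The following is a minimal sketch of the thin wrapper and slice helpers listed above. It assumes a plain tuple struct with no serde derives yet. The method and function names follow the list, but the exact signatures are our own guesses rather than settled API. For example, `Tag::value` returns `""` when the value is absent, while the slice-level `value` returns `Option<&str>`.

```rust
/// A nostr tag: the raw string array, wrapped but not reinterpreted.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Tag(Vec<String>);

impl Tag {
    /// Build a tag from a name and any number of values.
    pub fn new<I, S>(name: &str, values: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let mut entries = vec![name.to_string()];
        entries.extend(values.into_iter().map(Into::into));
        Tag(entries)
    }

    /// The first entry, or "" for a malformed empty tag.
    pub fn name(&self) -> &str {
        self.get(0).unwrap_or("")
    }

    /// The second entry, or "" when absent.
    pub fn value(&self) -> &str {
        self.get(1).unwrap_or("")
    }

    /// Every entry after the name.
    pub fn values(&self) -> &[String] {
        self.0.get(1..).unwrap_or(&[])
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        self.0.get(i).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn as_slice(&self) -> &[String] {
        &self.0
    }
}

impl From<Vec<String>> for Tag {
    fn from(entries: Vec<String>) -> Self {
        Tag(entries)
    }
}

/// The first tag named `name`.
pub fn find<'a>(tags: &'a [Tag], name: &str) -> Option<&'a Tag> {
    tags.iter().find(|t| t.name() == name)
}

/// Every tag named `name`, lazily and in document order.
pub fn find_all<'a>(tags: &'a [Tag], name: &'a str) -> impl Iterator<Item = &'a Tag> + 'a {
    tags.iter().filter(move |t| t.name() == name)
}

/// The value (index 1) of every tag named `name` that has one.
pub fn values<'a>(tags: &'a [Tag], name: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    find_all(tags, name).filter_map(|t| t.get(1))
}

/// The value of the first tag named `name`.
pub fn value<'a>(tags: &'a [Tag], name: &str) -> Option<&'a str> {
    find(tags, name).and_then(|t| t.get(1))
}

/// Whether any tag is named `name`.
pub fn has(tags: &[Tag], name: &str) -> bool {
    tags.iter().any(|t| t.name() == name)
}

fn main() {
    let tags = vec![
        Tag::new("e", ["abc", "wss://relay.example", "root"]),
        Tag::new("p", ["alice"]),
        Tag::new("p", ["bob"]),
    ];
    println!("{:?}", values(&tags, "p").collect::<Vec<_>>());
    println!("{:?}", value(&tags, "e"));
    println!("{}", has(&tags, "t"));
}
```

`find_all` and `values` return iterators, so the helpers do not allocate. Callers only `collect()` when they actually need a `Vec`.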

## Philosophy

From `ref/building-nostr`:

**Tags are the structured data half of events.** An event's content is
generally human-readable; tags hold structured data. Encoding JSON into
content is an antipattern. Conversely, tags are where every reference,
index, or machine-readable annotation should live.

**Lists of lists, not maps.** Tags are arrays of arrays of strings by
design. This preserves two properties a dictionary cannot: keys may repeat
(important for multiple `e` or `p` references), and order is preserved.
Building-nostr compares this to URL query parameters and Python ordered
dicts.

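Here is a small sketch that shows both properties, using plain `Vec<Vec<String>>` with hypothetical values. Filtering the list keeps both `p` references in their original order. Collecting the same list into a `HashMap` keyed by tag name keeps only the last `p` value, because later inserts overwrite earlier ones.

```rust
use std::collections::HashMap;

/// Hypothetical sample: two `p` references and one `e` reference.
fn sample_tags() -> Vec<Vec<String>> {
    vec![
        vec!["p".to_string(), "alice".to_string()],
        vec!["e".to_string(), "note1".to_string()],
        vec!["p".to_string(), "bob".to_string()],
    ]
}

/// Every value of every tag named `name`, in document order.
fn values_of<'a>(tags: &'a [Vec<String>], name: &str) -> Vec<&'a str> {
    tags.iter()
        .filter(|t| t.first().map(String::as_str) == Some(name))
        .filter_map(|t| t.get(1).map(String::as_str))
        .collect()
}

/// What a name -> value map would keep: later entries overwrite earlier ones.
fn as_map(tags: &[Vec<String>]) -> HashMap<&str, &str> {
    tags.iter()
        .filter_map(|t| Some((t.first()?.as_str(), t.get(1)?.as_str())))
        .collect()
}

fn main() {
    let tags = sample_tags();
    println!("{:?}", values_of(&tags, "p")); // both references, in order
    println!("{:?}", as_map(&tags).get("p")); // only the last one survives
}
```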

**Keep tags short.** "In general, tags should be as short as is reasonable.
Two to three entries is all you really need; if you have more than that,
you're probably trying to pack more data into a single tag than really
belongs." Prefer multiple tags over positional fields.

**Three categories, conflated.** Building-nostr identifies three
categories of tag that the original design conflated. Data tags are for
display or handling. Filter tags are single-letter and queryable via
`#x`. Behavior tags (such as `expiration`, `-`, and `h`) affect how
implementations handle an event, independently of its kind.
Building-nostr calls the conflation "a design mistake", but we have to
live with it.

**Single-letter = indexed.** Single-letter tag names (`a`–`z`, `A`–`Z`)
are the ones relays index and expose through `#e`, `#p`, and similar
filters. Multi-character names (`imeta`, `alt`, `expiration`) are
typically not indexed. The naming convention is therefore meaningful:
giving a tag a single-letter name asserts that it is intended for
filtering.

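Here is a sketch of the single-letter check. It assumes NIP-01's rule that only single English letters are indexed by relays. `is_indexable` and `filter_key` are hypothetical helper names, not part of any settled API.

```rust
/// True when `name` is a single ASCII letter (a-z, A-Z), the form relays index.
fn is_indexable(name: &str) -> bool {
    let mut chars = name.chars();
    matches!((chars.next(), chars.next()), (Some(c), None) if c.is_ascii_alphabetic())
}

/// The filter key for a tag name (`#e`, `#p`, ...), or None for unindexed names.
fn filter_key(name: &str) -> Option<String> {
    is_indexable(name).then(|| format!("#{name}"))
}

fn main() {
    for name in ["e", "D", "alt", "expiration", ""] {
        println!("{name:?} -> {:?}", filter_key(name));
    }
}
```

A non-ASCII single character such as `é` is rejected, matching the "English letters only" wording.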

**The `e` tag is overloaded.** Eight different NIPs use `e` for different
things (reply, fork, transaction reference, report target, list member,
approval, merge, mention). Building-nostr warns that when resolving a
tag's meaning you should always consult the kind spec first and the tag
specs second, never the other way around. Our library should stay neutral
about semantics and let callers interpret tags based on kind.

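The kind-first rule can be illustrated with a dispatch that matches on the event kind before it looks at the tag. The table is deliberately incomplete, and `describe_e_tag` is a hypothetical name. It only covers a few kinds whose `e` usage we are confident of: NIP-10 text notes, NIP-09 deletions, NIP-18 reposts, NIP-25 reactions, and NIP-56 reports.

```rust
/// Interpret an `e` tag only after looking at the event kind.
fn describe_e_tag(kind: u16, tag: &[String]) -> String {
    let id = tag.get(1).map(String::as_str).unwrap_or("");
    match kind {
        // NIP-10: text notes thread via e-tags; the marker at index 3 refines it.
        1 => match tag.get(3).map(String::as_str) {
            Some("root") => format!("thread root {id}"),
            Some("reply") => format!("replying to {id}"),
            Some("mention") => format!("mentions {id}"),
            _ => format!("thread reference {id} (unmarked)"),
        },
        5 => format!("requests deletion of {id}"), // NIP-09
        6 => format!("reposts {id}"),              // NIP-18
        7 => format!("reacts to {id}"),            // NIP-25
        1984 => format!("reports {id}"),           // NIP-56
        _ => format!("references {id} (meaning defined by kind {kind})"),
    }
}

fn main() {
    let tag = vec!["e".to_string(), "abc".to_string(), String::new(), "reply".to_string()];
    for kind in [1, 7, 30000] {
        println!("kind {kind}: {}", describe_e_tag(kind, &tag));
    }
}
```

The same tag means different things under kind 1 and kind 7. For kind 7, the marker is simply ignored. This is why `Tag` itself should not carry an interpretation.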

**Design general-purpose tags cautiously.** Broad tags can conflict with
kind-specific semantics, so our tag type should not bake in any
interpretation.

## Reference Implementation Analysis

### applesauce (TypeScript)

- Tags remain `string[][]` throughout; there is no wrapper class.
- Type-level annotation via a `NameValueTag<Name>` generic tuple; runtime
  type guards (`isETag`, `isPTag`, ...) identify tag types.
- Markers (`"root" | "reply" | "mention" | ""`) are a union type.
- A-tag parsing lives in `parseReplaceableAddress(address)`, which returns
  `AddressPointer | null`, with an inverse,
  `getReplaceableAddressFromPointer`.
- Operations as functions: `TagOperation = (tags) => tags`. Events expose
  `modifyPublicTags(...ops)`, which pipes operations.
- Helpers: `addEventPointerTag`, `addProfilePointerTag`,
  `addAddressPointerTag`, `ensureSingletonTag`, `ensureNamedValueTag`,
  `fillAndTrimTag` (normalizes nulls and trailing blanks).

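For comparison, applesauce's operations-as-functions style translates to Rust as boxed closures over the tag list. This is only a sketch. `TagOp`, `ensure_singleton`, `trim_trailing_blanks`, and `modify` are hypothetical names loosely modelled on the applesauce helpers above. Their exact semantics are our own choices, not applesauce's; for instance, `ensure_singleton` moves the tag to the end.

```rust
type Tags = Vec<Vec<String>>;

/// Rust analogue of `TagOperation = (tags) => tags`.
type TagOp = Box<dyn Fn(Tags) -> Tags>;

/// Remove every tag named `name`, then append exactly one with `value`.
fn ensure_singleton(name: &'static str, value: String) -> TagOp {
    Box::new(move |mut tags: Tags| {
        tags.retain(|t| t.first().map(String::as_str) != Some(name));
        tags.push(vec![name.to_string(), value.clone()]);
        tags
    })
}

/// Drop trailing empty entries from every tag, never removing the name.
fn trim_trailing_blanks() -> TagOp {
    Box::new(|tags: Tags| {
        tags.into_iter()
            .map(|mut t| {
                while t.len() > 1 && t.last().map_or(false, |s| s.is_empty()) {
                    t.pop();
                }
                t
            })
            .collect::<Tags>()
    })
}

/// Pipe the tags through each operation in turn.
fn modify(tags: Tags, ops: &[TagOp]) -> Tags {
    ops.iter().fold(tags, |acc, op| op(acc))
}

fn main() {
    let tags: Tags = vec![
        vec!["title".into(), "old".into()],
        vec!["p".into(), "alice".into(), "".into(), "".into()],
    ];
    let ops = [ensure_singleton("title", "new".to_string()), trim_trailing_blanks()];
    println!("{:?}", modify(tags, &ops));
}
```

Our considerations below defer modify pipelines to a later chapter. This is here only to show the shape of the pattern.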

### ndk (TypeScript)

- `NDKTag = string[]`, raw; `NDKEvent.tags: NDKTag[]`.
- Accessors on the event: `getMatchingTags(name, marker?)`, `hasTag`,
  `tagValue` (returns index 1 or undefined), plus `removeTag` and
  `replaceTag`.
- Address tags: `tagAddress()` constructs `${kind}:${pubkey}:${dTag}`;
  `tagId()` returns the event id or the address, depending on
  replaceability; `tagType()` returns `"e" | "a"`.
- NIP-10 markers at `tag[3]`: `getRootTag` and `getReplyTag` fall back to
  positional interpretation when markers are absent.
- `referenceTags(marker?)` emits `[["a", addr], ["e", id, relay, marker, pubkey]]`.
- `generateContentTags` auto-tags `npub`/`note`/`nevent`/`naddr`/hashtags
  from content.

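Here is a sketch over raw string arrays of the marker-first, positional-fallback lookup that ndk performs. It assumes NIP-10's rules as we understand them. With markers, `root` and `reply` are explicit, and a direct reply to the root carries only `root`. Without markers, the first `e` tag is the root, the last is the reply, and a lone `e` tag is both. `thread_refs`, `marked`, and `strings` are hypothetical names.

```rust
/// Convenience for building tags in examples.
fn strings(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// The id of the first e-tag whose marker (index 3) equals `marker`.
fn marked<'a>(e: &[&'a Vec<String>], marker: &str) -> Option<&'a str> {
    e.iter()
        .copied()
        .find(|t| t.get(3).map(String::as_str) == Some(marker))
        .map(|t| t[1].as_str())
}

/// (root, reply) event ids per NIP-10.
fn thread_refs(tags: &[Vec<String>]) -> (Option<&str>, Option<&str>) {
    // Only `e` tags that actually carry an id.
    let e: Vec<&Vec<String>> = tags.iter().filter(|t| t.len() > 1 && t[0] == "e").collect();

    let (root, reply) = (marked(&e, "root"), marked(&e, "reply"));
    if root.is_some() || reply.is_some() {
        // Marked style: a direct reply to the root has only a `root` marker.
        return (root, reply.or(root));
    }

    // Deprecated positional style: first = root, last = reply.
    let first = e.first().copied().map(|t| t[1].as_str());
    let last = e.last().copied().map(|t| t[1].as_str());
    (first, last)
}

fn main() {
    let tags = vec![strings(&["e", "a"]), strings(&["e", "b"]), strings(&["e", "c"])];
    println!("{:?}", thread_refs(&tags));
}
```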

### nostr-gadgets (TypeScript, JSR)

- Raw `string[]` tags, documented by convention.
- A single helper: `getTagOr(event, tagName, dflt)`.
- Validators: `isHex32`, `isATag` (regex `^\d+:[0-9a-f]{64}:[^:]+$`).
- Composition pattern: an `itemsFromTags<I>(processor)` factory; each
  fetcher passes a per-tag processor to build typed items.
- Kind-5 deletion: switches on `tag[0]` for `e` (id filter) vs `a`
  (kind + author + `#d` filter).

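nostr-gadgets' `isATag` regex can be reproduced without a regex crate. The sketch mirrors the pattern exactly, so it also inherits two of its restrictions. First, it rejects an empty identifier, yet NIP-01 writes addresses of plain replaceable events as `<kind>:<pubkey>:` with nothing after the final colon. Second, it rejects identifiers containing `:`. `is_a_tag_value` is a hypothetical name.

```rust
/// Equivalent of the regex `^\d+:[0-9a-f]{64}:[^:]+$`.
fn is_a_tag_value(s: &str) -> bool {
    let mut parts = s.splitn(3, ':');
    let (Some(kind), Some(pubkey), Some(ident)) = (parts.next(), parts.next(), parts.next()) else {
        return false;
    };
    !kind.is_empty()
        && kind.bytes().all(|b| b.is_ascii_digit())
        && pubkey.len() == 64
        && pubkey.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
        && !ident.is_empty()
        && !ident.contains(':')
}

fn main() {
    let pk = "a".repeat(64);
    println!("{}", is_a_tag_value(&format!("30023:{pk}:my-article")));
    println!("{}", is_a_tag_value(&format!("10000:{pk}:")));
}
```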

### nostrlib (Go, fiatjaf)

- `Tag = []string`, `Tags = []Tag`; embedded directly in `Event`.
- Helpers on `Tags`:
  - `Find(key)`, `FindLast(key)`
  - `FindWithValue(key, value)`, `FindLastWithValue`
  - `FindAll(key)` returns `iter.Seq[Tag]` (lazy)
  - `Has(key)`, `ContainsAny(key, values)`
  - `GetD()` for the `d` identifier on parameterized replaceables
- Pointer interface: `ProfilePointer`, `EventPointer`, and `EntityPointer`
  all share `AsTag`, `AsTagReference`, `AsFilter`, and `MatchesEvent`.
- Address parsing: `ParseAddrString("kind:pubkey:d")` splits on `:`,
  validates the kind (0..65535) and pubkey (hex), and preserves the
  identifier.
- Standard library only (`iter`, `slices`, `strconv`). There is no tag
  taxonomy enum; NIPs implement their own parsing helpers over raw slices.
- Thread markers (`root`/`reply`/`mention`) and relay-list markers
  (`read`/`write`) are read by index, never via typed fields.

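Most of nostrlib's `Tags` helpers map directly onto Rust iterator adapters, and a Rust iterator is lazy in the same way Go's `iter.Seq` is. The sketch uses hypothetical snake_case names and only approximates the Go semantics described above. Details such as whether `Find` skips one-element tags may differ in the real library.

```rust
type Tag = Vec<String>;

fn name_is(t: &Tag, key: &str) -> bool {
    t.first().map(String::as_str) == Some(key)
}

fn value_is(t: &Tag, value: &str) -> bool {
    t.get(1).map(String::as_str) == Some(value)
}

/// Like `FindAll`: nothing is scanned until the iterator is consumed.
fn find_all<'a>(tags: &'a [Tag], key: &'a str) -> impl Iterator<Item = &'a Tag> + 'a {
    tags.iter().filter(move |t| name_is(t, key))
}

fn find_last<'a>(tags: &'a [Tag], key: &str) -> Option<&'a Tag> {
    tags.iter().rev().find(|t| name_is(t, key))
}

fn find_with_value<'a>(tags: &'a [Tag], key: &str, value: &str) -> Option<&'a Tag> {
    tags.iter().find(|t| name_is(t, key) && value_is(t, value))
}

fn contains_any(tags: &[Tag], key: &str, values: &[&str]) -> bool {
    tags.iter().any(|t| name_is(t, key) && values.iter().any(|v| value_is(t, v)))
}

/// Like `GetD`: the `d` identifier, or "" when absent.
fn get_d(tags: &[Tag]) -> &str {
    find_all(tags, "d").next().and_then(|t| t.get(1)).map(String::as_str).unwrap_or("")
}

fn main() {
    let tags: Vec<Tag> = vec![
        vec!["d".into(), "slug".into()],
        vec!["p".into(), "alice".into()],
        vec!["p".into(), "bob".into()],
    ];
    println!("{:?} {:?}", get_d(&tags), find_last(&tags, "p"));
}
```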

### nostr-tools (TypeScript)

- Plain `tags: string[][]`, no wrapper.
- Direct indexing throughout: `tag[0]` name, `tag[1]` value, `tag[2]`
  relay, `tag[3]` marker, `tag[4]` pubkey hint.
- Address-tag parsing is done inline, per NIP:
  `let [kind, pubkey, identifier] = tag[1].split(':')`.
- The NIP-10 module supports both explicit markers and the legacy
  positional fallback (oldest/newest heuristic).
- Each NIP module owns its own tag construction and parsing; there is no
  central tag API.

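Inline parsing has one hazard. In JavaScript, `split(':')` splits on every colon, so destructuring three elements silently truncates an identifier that itself contains `:`. As far as we know, NIP-01 places no such restriction on `d` values. Here is a Rust sketch of the difference, using hypothetical function names. `splitn(3, ':')` keeps the remainder intact.

```rust
/// Split an `a` value into (kind, pubkey, identifier), keeping any ':' in the identifier.
fn split_address(value: &str) -> Option<(&str, &str, &str)> {
    let mut parts = value.splitn(3, ':');
    Some((parts.next()?, parts.next()?, parts.next()?))
}

/// What split-everything-then-take-three does: text after the next ':' is dropped.
fn naive_identifier(value: &str) -> Option<&str> {
    value.split(':').nth(2)
}

fn main() {
    let v = "30023:abcd:https://example.com/post";
    println!("{:?}", split_address(v));
    println!("{:?}", naive_identifier(v));
}
```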

### rust-nostr (Rust)

- `Tag` wraps `Vec<String>` plus an `OnceCell<Option<TagStandard>>` for a
  lazily parsed enum.
- The `TagStandard` enum has 60+ variants covering most NIPs (`Event`,
  `PublicKey`, `Coordinate`, `Kind`, `Amount`, `Image`, `Title`, ...).
- `TagKind<'a>` categorizes tags: named variants,
  `SingleLetter(SingleLetterTag)` with case tracking, and
  `Custom(Cow<'a, str>)`.
- The e-tag parser is position-aware. It tries to read `tag[3]` as a
  marker first and falls back to a public key (NIP-01 legacy). It reads
  `tag[4]` as a public key only if `tag[3]` was a marker.
- The a-tag parser uses `Coordinate::from_str`.
- A `Tags` collection (not `Vec<Tag>`) maintains a
  `BTreeMap<SingleLetterTag, BTreeSet<String>>` index for dedup and
  indexed lookup, plus helpers `event_ids()`, `public_keys()`, and
  `coordinates()`.
- Trade-offs: poor extensibility (every new tag type touches the enum),
  `OnceCell` overhead per tag, and case-preservation fields. Very thorough
  but heavy.
- **We should not replicate the enum approach.** Prefer a thin wrapper
  over `Vec<String>` and let callers parse.

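The position-aware rule can be expressed without an enum, by reading indices and checking what they contain. This sketch follows the description above rather than rust-nostr's source. `ETagFields`, `parse_e_tag`, and the hex check used to recognise a legacy pubkey at index 3 are our own assumptions.

```rust
/// Fields of an `e` tag, read positionally; empty entries count as absent.
#[derive(Debug, PartialEq)]
struct ETagFields<'a> {
    id: &'a str,
    relay: Option<&'a str>,
    marker: Option<&'a str>,
    pubkey: Option<&'a str>,
}

fn strings(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

fn is_marker(s: &str) -> bool {
    matches!(s, "root" | "reply" | "mention")
}

fn is_hex_pubkey(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

fn parse_e_tag(entries: &[String]) -> Option<ETagFields<'_>> {
    if entries.first().map(String::as_str) != Some("e") {
        return None;
    }
    // The non-empty entry at index i, if any.
    let at = move |i: usize| entries.get(i).map(String::as_str).filter(|s| !s.is_empty());
    let id = at(1)?;
    let (marker, pubkey) = match at(3) {
        // [3] is a marker, so [4] may hold the pubkey.
        Some(m) if is_marker(m) => (Some(m), at(4).filter(|p| is_hex_pubkey(p))),
        // Legacy layout: pubkey directly at [3].
        Some(p) if is_hex_pubkey(p) => (None, Some(p)),
        _ => (None, None),
    };
    Some(ETagFields { id, relay: at(2), marker, pubkey })
}

fn main() {
    let pk = "b".repeat(64);
    let t = strings(&["e", "id1", "wss://relay.example", "reply", pk.as_str()]);
    println!("{:?}", parse_e_tag(&t));
}
```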

### welshman (TypeScript; predecessor of this library)

- No wrapper class; raw `string[][]`.
- 50+ pure functions in `/util/src/Tags.ts`:
  - Filters: `getTags(tagName, tags)`, `getTag(tagName, tags)`
  - Value extractors: `getTagValues`, `getTagValue`
  - Type-specific: `getEventTags`, `getPubkeyTags`, `getAddressTags`,
    `getRelayTags`, `getTopicTags`, `getKindTags`
  - Reply logic: `getReplyTags`, `getCommentTags` (NIP-10 plus the NIP-22
    uppercase/lowercase dual-tag scheme)
  - `uniqTags` dedup, `tagger` factory
- A dedicated `Address` class with `kind`, `pubkey`, `identifier`, and
  `relays`; factories `from`, `fromNaddr`, and `fromEvent`; an `isAddress`
  regex (`^\d+:\w+:.*$`); `toString` and `toNaddr`.
- Event envelope types (`EventContent`, `EventTemplate`, `StampedEvent`,
  ...) match our Rust hierarchy exactly; this is where we borrowed it.
  Tags stay as `string[][]`.
- High-level builders in `/app/src/tags.ts`: `tagEventForReply`,
  `tagEventForComment`, `tagEventForQuote`, `tagEventForReaction`.

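welshman's `getCommentTags` handles NIP-22's convention for comments. Uppercase tag names (`E`, `A`, `I`) point at a comment's root scope, and lowercase names (`e`, `a`, `i`) point at its immediate parent. Here is a minimal sketch of that lookup over raw arrays. The names are hypothetical, and it ignores the kind (`K`/`k`) and author (`P`/`p`) tags.

```rust
fn pair(name: &str, value: &str) -> Vec<String> {
    vec![name.to_string(), value.to_string()]
}

/// The value of the first tag whose name is one of `names`.
fn first_value_of<'a>(tags: &'a [Vec<String>], names: &[&str]) -> Option<&'a str> {
    tags.iter()
        .find(|t| t.first().map_or(false, |n| names.iter().any(|m| *m == n.as_str())))
        .and_then(|t| t.get(1))
        .map(String::as_str)
}

/// (root, parent) references of a NIP-22 comment.
fn comment_refs(tags: &[Vec<String>]) -> (Option<&str>, Option<&str>) {
    (first_value_of(tags, &["E", "A", "I"]), first_value_of(tags, &["e", "a", "i"]))
}

fn main() {
    let tags = vec![pair("A", "30023:pk:slug"), pair("e", "parent-id")];
    println!("{:?}", comment_refs(&tags));
}
```

Because `E` and `e` are distinct single-letter names, relays index them separately. The case of a tag name therefore carries meaning.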

## Common Patterns

**Raw lists dominate.** Every library except rust-nostr keeps tags as the
native string array. The rust-nostr enum is an outlier, and its weight
shows in extensibility pain and memory overhead.

**Free functions over methods.** Welshman and applesauce both prefer pure
functions that take tags and return tags or values. Method-on-type
approaches (ndk) tend to get cluttered.

**Address tags get their own type.** Nostrlib (`EntityPointer`), welshman
(`Address`), applesauce (`AddressPointer`), and rust-nostr (`Coordinate`)
all introduce a small struct for `kind:pubkey:d`. This is consistently the
one tag type worth parsing eagerly, because it combines three fields that
are always used together.

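Here is a minimal sketch of such a struct in Rust. It makes three assumptions. A 64-character lowercase hex `String` stands in for the keys chapter's `PublicKey`. Relay hints are left out of the struct. `splitn(3, ':')` is used so an identifier may contain colons. `AddressError` and the method signatures are hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Address {
    pub kind: u16,
    pub pubkey: String, // stand-in for `PublicKey`
    pub identifier: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AddressError {
    Shape,
    Kind,
    Pubkey,
}

impl FromStr for Address {
    type Err = AddressError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // splitn(3) keeps any ':' inside the identifier.
        let mut parts = s.splitn(3, ':');
        let (kind, pubkey, identifier) = match (parts.next(), parts.next(), parts.next()) {
            (Some(k), Some(p), Some(d)) => (k, p, d),
            _ => return Err(AddressError::Shape),
        };
        // u16 parsing enforces the 0..=65535 kind range.
        let kind = kind.parse::<u16>().map_err(|_| AddressError::Kind)?;
        let is_lower_hex = pubkey.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if pubkey.len() != 64 || !is_lower_hex {
            return Err(AddressError::Pubkey);
        }
        Ok(Address { kind, pubkey: pubkey.to_string(), identifier: identifier.to_string() })
    }
}

impl fmt::Display for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}:{}", self.kind, self.pubkey, self.identifier)
    }
}

impl Address {
    /// `["a", "kind:pubkey:d"]`, plus an optional relay hint.
    pub fn to_tag(&self, relay: Option<&str>) -> Vec<String> {
        let mut tag = vec!["a".to_string(), self.to_string()];
        tag.extend(relay.map(str::to_string));
        tag
    }

    /// Parse the value of an `a` tag; any other tag yields None.
    pub fn from_tag(tag: &[String]) -> Option<Address> {
        match tag {
            [name, value, ..] if name == "a" => value.parse().ok(),
            _ => None,
        }
    }
}

fn main() {
    let pk = "f".repeat(64);
    let addr: Address = format!("30023:{pk}:my:slug").parse().unwrap();
    println!("{addr} -> {:?}", addr.to_tag(Some("wss://relay.example")));
}
```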

**Markers are positional.** No library introduces a `Marker` enum
dependency that leaks into its base tag type. Marker interpretation
happens at the read site (`getReplyTags` etc.), not at construction time.

**Single-letter indexing matters for filters.** Nostrlib and rust-nostr
explicitly model the single-letter vs multi-character distinction;
applesauce and welshman rely on convention.

## Considerations for Our Implementation

Given our literate-programming posture and existing style (thin wrappers
over bytes in `keys`, struct pipelines in `events`), we should:

1. **Introduce `Tag(Vec<String>)` as a tuple wrapper.** Provide `name()`,
   `value()` (second element or empty), `values()` (all after the first),
   `get(i)`, `len()`, `as_slice()`, plus `From<Vec<String>>`,
   `IntoIterator`, and `Serialize`/`Deserialize` impls that flatten
   transparently to an array. Add a `new` constructor that takes a name
   and a variable number of values.

2. **Free functions on `&[Tag]`.** Provide `find(tags, name)`,
   `find_all(tags, name)`, `values(tags, name)`, `value(tags, name)`, and
   `has(tags, name)` as standalone helpers. Keep them name-agnostic:
   `name` is a `&str`.

3. **An `Address` struct for `a` tags.** Fields `kind: u16`,
   `pubkey: PublicKey`, `identifier: String`, plus optional `relays`.
   Implement `FromStr`/`Display` for the `kind:pubkey:d` form, and an
   `Address::to_tag()` / `Address::from_tag()` pair. Keep it minimal: no
   `naddr` yet (that lands in the bech32/entities chapter).

4. **Update `Event` and friends to use `Vec<Tag>`.** The events chapter
   deliberately left tags as `Vec<Vec<String>>` because the `Tag` type
   wasn't ready. Swap it now and keep the canonical hash bytes identical
   by serializing `Tag` transparently as `Vec<String>` in the canonical
   form.

5. **Stay neutral on semantics.** No `TagKind` enum, and no marker parsing
   baked into `Tag`. Building-nostr is explicit that tags must be
   interpreted in the context of the event kind; a generic type should
   not try to know better.

6. **A brief section on markers.** Show how to read NIP-10 markers
   positionally with `tag.get(3)`, without introducing a marker type.
   Marker-aware reply threading belongs in a later chapter.

7. **No hidden-tag or modify pipelines.** Those belong later, alongside
   encryption of private tag lists.

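Item 4's requirement that hash bytes stay identical can be checked with a hand-written canonical encoder. As long as `Tag` exposes exactly its inner `Vec<String>`, both representations produce the same JSON. The sketch assumes NIP-01's escaping rules: `\n`, `\"`, `\\`, `\r`, `\t`, `\b`, and `\f` are escaped, and everything else is emitted verbatim. In the real chapter, a `#[serde(transparent)]` derive would likely do this job, so treat `tags_json` (a hypothetical name) as illustration only.

```rust
/// The proposed wrapper: no extra fields, so nothing extra to serialize.
struct Tag(Vec<String>);

impl AsRef<[String]> for Tag {
    fn as_ref(&self) -> &[String] {
        &self.0
    }
}

/// Append `s` as a JSON string using NIP-01's canonical escapes.
fn escape(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            '\u{08}' => out.push_str("\\b"),
            '\u{0C}' => out.push_str("\\f"),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Serialize tags as a whitespace-free JSON array of string arrays.
fn tags_json<T: AsRef<[String]>>(tags: &[T]) -> String {
    let mut out = String::from("[");
    for (i, tag) in tags.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push('[');
        for (j, entry) in tag.as_ref().iter().enumerate() {
            if j > 0 {
                out.push(',');
            }
            escape(entry, &mut out);
        }
        out.push(']');
    }
    out.push(']');
    out
}

fn main() {
    let raw: Vec<Vec<String>> = vec![
        vec!["e".into(), "abc".into()],
        vec!["t".into(), "say \"hi\"\n".into()],
    ];
    let wrapped: Vec<Tag> = raw.iter().cloned().map(Tag).collect();
    assert_eq!(tags_json(&raw), tags_json(&wrapped));
    println!("{}", tags_json(&wrapped));
}
```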

The goal is a type that disappears when you're not using it and becomes
helpful the moment you are: exactly the "little more than an empty shell"
that building-nostr describes for nostr itself.