Add filters chapter

2026-04-21 12:08:55 -07:00
parent c8f6bc1652
commit a8a57a3d77
6 changed files with 1876 additions and 36 deletions
@@ -0,0 +1,241 @@
+# Research: Filters
+
+## Topic Summary
+
+Filters are the NIP-01 data structure for matching events. They form an elegant primitive that
+is independent of the client/relay context: not just the payload of REQ messages, but a
+general-purpose abstraction for matching and querying events. The chapter should cover the filter
+structure, matching semantics (AND within a filter, OR across filters), tag filters, timestamp
+constraints, limits, and programmatic construction/manipulation of filters.
+
+## Philosophy
+
+From `ref/building-nostr`:
+
+**Filters as a conceptual category for tags**: The author identifies "filter tags" as one of
+three tag categories (alongside data and behavior tags). Filter tags are "especially useful for
+filtering and retrieval" and tend to be single-letter because relays only index single-letter
+tags to reduce database overhead.
+
+**Filter structure (NIP-01)**: Standard fields are `ids`, `authors`, `kinds`, `since`, `until`,
+`limit`. Tag filters are created by prefixing tag names with `#` (e.g., `#p`, `#e`). Extensions
+include prefix matching and NIP-50 `search`. Negative matches were proposed but rejected due to
+relay performance concerns.
+
+**Filters as a routing problem**: "Where to send a given filter" is distinct from "where to send
+a given event." Filters have less information than events, making routing harder. A routing
+heuristic "connects a filter that might be constructed to support a particular use case with the
+relay where matching events are stored." This is analogous to database indexes.
+
+**Design principles**:
+- **Minimalism**: Filters match discrete criteria without complex negation or boolean logic
+- **Decentralization**: Clients understand routing heuristics; relays don't need to understand intent
+- **Extensibility**: The `#tag` convention allows arbitrary new filters without protocol changes
+- **Trade-offs**: Single-letter indexing limits expressiveness but maintains relay scalability
+- **Partition tolerance**: Missing a relay means missing some events, which is acceptable
+
+## Reference Implementation Analysis
+
+### applesauce
+
+**Types**: `Filter` extends nostr-tools' `CoreFilter` with NIP-91 AND operator support (`&`-prefixed
+tag names) and NIP-50 `search` field. All standard NIP-01 fields present.
+
+**Matching**: `matchFilter(filter, event)` checks basic fields first for early rejection, then
+uses `getIndexableTags(event)` to build a cached `Set<string>` of `"tagName:value"` pairs on the
+event (via Symbol key). NIP-91 AND tags processed before OR tags. `matchFilters` implements OR
+across filter arrays.
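The cached tag index described above can be sketched in Rust. This is an illustration under stated assumptions, not applesauce's code: the `Event` type, `indexable_tags`, and `matches_tag` are hypothetical names, and `OnceCell` stands in for the Symbol-keyed cache on the event object.

```rust
use std::cell::OnceCell;
use std::collections::HashSet;

/// Hypothetical event type: only the tags matter for this sketch.
struct Event {
    tags: Vec<Vec<String>>,
    /// Lazily built set of "name:value" strings (stand-in for the Symbol-keyed cache).
    index: OnceCell<HashSet<String>>,
}

impl Event {
    fn new(tags: Vec<Vec<String>>) -> Self {
        Event { tags, index: OnceCell::new() }
    }

    /// Builds the index on first use; later calls reuse the cached set.
    /// Only single-letter tags that carry a value are indexed.
    fn indexable_tags(&self) -> &HashSet<String> {
        self.index.get_or_init(|| {
            self.tags
                .iter()
                .filter(|t| t.len() >= 2 && t[0].chars().count() == 1)
                .map(|t| format!("{}:{}", t[0], t[1]))
                .collect()
        })
    }
}

/// OR semantics within one tag filter: any of the listed values may match.
fn matches_tag(event: &Event, name: &str, values: &[&str]) -> bool {
    let index = event.indexable_tags();
    values.iter().any(|v| index.contains(&format!("{name}:{v}")))
}
```

Because the index lives on the event, matching the same event against many filters pays the tag scan once. Multi-letter tags are never indexed here, so a filter on them cannot match.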
+
+**Utilities**: `mergeFilters(...filters)` unions array fields and deduplicates; takes minimum
+`limit`, minimum `since`, maximum `until`. `isFilterEqual` uses `fast-deep-equal` for subscription
+deduplication. `createFilterMap` distributes filters across relays by author.
+
+**SQL layer**: Separate `buildFilterConditions` translates filters to SQL WHERE clauses. AND tags
+use `GROUP BY`/`HAVING COUNT` subqueries. OR tags use `IN` subqueries.
+
+**Patterns**: Symbol-based memoization of tag indexes on event objects. Early exit optimization.
+Only indexes single-letter tags. RxJS integration for streaming filter results.
+
+### ndk
+
+**Types**: `NDKFilter<K extends NDKKind>` is a generic, parameterized type with all NIP-01 fields
+plus `search`. Dynamic tag properties via ``[key: `#${string}`]``.
+
+**Matching**: `matchFilter(filter, event)` uses `indexOf()` for array membership. Special case:
+`#t` (hashtag) tags are case-insensitive. Does NOT check `limit` or `search`: these constrain
+how a query is submitted to a relay, not whether an individual event matches. Short-circuits on
+first mismatch.
+
+**Utilities**:
+- `mergeFilters()` — unions arrays, deduplicates via Set. Preserves filters with `limit` separately (limits can't be merged). Returns array.
+- `filterFingerprint()` — deterministic hash of filter structure for subscription grouping
+- `compareFilter()` — checks if one filter is a subset of another (for cache-hit validation)
+- `filterFromId()` — converts bech32 identifiers to filters
+- `filterForEventsTaggingId()` — creates tag filters for events referencing a given ID
+
+**Validation**: Three modes (VALIDATE/FIX/IGNORE). Checks for undefined values, type correctness,
+hex format, kind range. Guardrails catch common mistakes: empty filters, bech32 in hex arrays,
+`since > until`, `#t` with literal `#` prefix.
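Two of the guardrails listed above (bech32 strings in hex arrays, `since > until`) might look like this in Rust. It is a sketch only, not ndk's validator: the `Filter` fields, the `Problem` enum, and `validate` are hypothetical, and it reports problems rather than modelling the VALIDATE/FIX/IGNORE modes.

```rust
/// Hypothetical filter with only the fields this sketch inspects.
#[derive(Default)]
struct Filter {
    authors: Option<Vec<String>>,
    since: Option<u64>,
    until: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum Problem {
    /// A value that is not 64 lowercase hex characters (for example an npub).
    NotHex(String),
    /// `since` is later than `until`, so no event can match.
    SinceAfterUntil,
}

fn is_hex64(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Collects every problem found instead of stopping at the first one.
fn validate(filter: &Filter) -> Vec<Problem> {
    let mut problems = Vec::new();
    if let Some(authors) = &filter.authors {
        for author in authors {
            if !is_hex64(author) {
                problems.push(Problem::NotHex(author.clone()));
            }
        }
    }
    if let (Some(since), Some(until)) = (filter.since, filter.until) {
        if since > until {
            problems.push(Problem::SinceAfterUntil);
        }
    }
    problems
}
```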
+
+**Patterns**: Generic type parameterization. Pluggable validation modes. Readable subscription IDs
+generated from filter structure.
+
+### nostr-gadgets
+
+Uses `nostr-tools` filter types and functions directly — no custom filter implementation.
+
+**Construction patterns**: Mutable accumulation (`filter.authors?.push(target)`), inline literals,
+spread-based composition (`{ ...f, authors: [pubkey], since: newest }`).
+
+**Filter-based deletion**: Converts event tags to filter arrays for batch deletion operations.
+
+**Multi-level filtering**: Filters used at query construction, relay permission checking (purgatory),
+and client-side event matching.
+
+### nostrlib
+
+**Types**: Go struct with `IDs []ID`, `Kinds []Kind`, `Authors []PubKey`, `Tags TagMap`,
+`Since Timestamp`, `Until Timestamp`, `Limit int`, `Search string`, `LimitZero bool`. Uses
+fixed-size byte arrays for IDs/PubKeys.
+
+**Matching**: Two methods:
+- `Matches(event)` — full matching including timestamp constraints
+- `MatchesIgnoringTimestampConstraints(event)` — `[//go:inline]` optimized, used for live events after EOSE
+
+Tag matching via `tags.ContainsAny(tagName, values)`. Uses `slices.Contains()` for array membership.
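The split between the two matching methods can be sketched as follows, in Rust rather than Go and with hypothetical, cut-down `Event` and `Filter` types. The idea is that the timestamp-free check is the shared core, and the full check adds the `since`/`until` bounds on top.

```rust
/// Hypothetical minimal event: only the fields this sketch needs.
struct Event {
    kind: u16,
    created_at: u64,
}

#[derive(Default)]
struct Filter {
    kinds: Option<Vec<u16>>,
    since: Option<u64>,
    until: Option<u64>,
}

impl Filter {
    /// Checks everything except `since`/`until`.
    fn matches_ignoring_timestamps(&self, ev: &Event) -> bool {
        self.kinds.as_ref().map_or(true, |kinds| kinds.contains(&ev.kind))
    }

    /// Full match: the shared core plus inclusive timestamp bounds.
    fn matches(&self, ev: &Event) -> bool {
        self.matches_ignoring_timestamps(ev)
            && self.since.map_or(true, |s| ev.created_at >= s)
            && self.until.map_or(true, |u| ev.created_at <= u)
    }
}
```

A subscription could call `matches` while replaying stored events and switch to `matches_ignoring_timestamps` once EOSE has arrived, which is the pattern the notes above attribute to nostrlib.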
+
+**Utilities**: `Clone()` deep copies. `FilterEqual()` order-independent comparison. `GetTheoreticalLimit()`
+estimates max results considering replaceability. No merging functions.
+
+**Serialization**: Custom easyjson codec with `xhex` for fast hex encoding. `LimitZero` bool
+distinguishes `"limit": 0` from omitted limit.
+
+**Patterns**: Pure data structure with stateless methods. Subscription switches matching function
+after EOSE. Query optimizer scores tags by "goodness" for index selection.
+
+### nostr-tools
+
+**Types**: Simple TypeScript type with all NIP-01 fields. Index signature ``[key: `#${string}`]``
+for dynamic tag filters. All properties optional.
+
+**Matching**: `matchFilter(filter, event)` — conjunctive (AND) matching. Uses `indexOf()` for
+membership. Iterates filter properties for `#`-prefixed tag filters. Both `since` and `until` are
+inclusive. `matchFilters` implements OR across array.
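The matching semantics described here (AND within a filter, OR across filters, inclusive `since`/`until`) can be sketched in Rust. The types and function names below are hypothetical and simplified; this is not a port of nostr-tools.

```rust
use std::collections::HashMap;

/// Hypothetical event with string ids and pubkeys for brevity.
struct Event {
    id: String,
    pubkey: String,
    kind: u16,
    created_at: u64,
    tags: Vec<Vec<String>>,
}

#[derive(Default)]
struct Filter {
    ids: Option<Vec<String>>,
    authors: Option<Vec<String>>,
    kinds: Option<Vec<u16>>,
    since: Option<u64>,
    until: Option<u64>,
    /// Tag filters keyed by tag name without the leading `#`.
    tags: HashMap<String, Vec<String>>,
}

/// AND: every constraint present on the filter must hold.
fn match_filter(f: &Filter, ev: &Event) -> bool {
    if let Some(ids) = &f.ids {
        if !ids.contains(&ev.id) { return false; }
    }
    if let Some(authors) = &f.authors {
        if !authors.contains(&ev.pubkey) { return false; }
    }
    if let Some(kinds) = &f.kinds {
        if !kinds.contains(&ev.kind) { return false; }
    }
    // Both bounds are inclusive.
    if f.since.map_or(false, |s| ev.created_at < s) { return false; }
    if f.until.map_or(false, |u| ev.created_at > u) { return false; }
    // Each tag filter needs at least one event tag with a listed value.
    for (name, values) in &f.tags {
        let hit = ev
            .tags
            .iter()
            .any(|t| t.len() >= 2 && &t[0] == name && values.contains(&t[1]));
        if !hit { return false; }
    }
    true
}

/// OR: an event matches if any filter in the slice matches.
fn match_filters(filters: &[Filter], ev: &Event) -> bool {
    filters.iter().any(|f| match_filter(f, ev))
}
```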
+
+**Utilities**:
+- `mergeFilters(...filters)` — unions array properties, takes max `limit`, min `since`, max `until`
+- `getFilterLimit(filter)` — computes intrinsic limit considering replaceability:
+  - Empty arrays → 0
+  - IDs → `ids.length`
+  - Replaceable kinds → `authors.length * kinds.length`
+  - Addressable kinds → `authors.length * kinds.length * #d.length`
+  - Returns minimum across all applicable constraints
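The intrinsic-limit rules listed above can be sketched in Rust. This is an approximation, not nostr-tools' code: the `Filter` type, the `d_tags` field (standing in for `#d`), and the kind ranges used by `is_replaceable` and `is_addressable` are assumptions of this sketch, and `None` plays the role of "no upper bound".

```rust
#[derive(Default)]
struct Filter {
    ids: Option<Vec<String>>,
    authors: Option<Vec<String>>,
    kinds: Option<Vec<u16>>,
    /// Stand-in for the `#d` tag filter.
    d_tags: Option<Vec<String>>,
    limit: Option<usize>,
}

// Kind ranges assumed here: 0, 3 and 10000..20000 are replaceable,
// 30000..40000 are addressable.
fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

/// Upper bound on the number of events the filter can return, if one exists.
fn filter_limit(f: &Filter) -> Option<usize> {
    // An empty array can never match anything.
    let empty = |v: &Option<Vec<String>>| v.as_ref().map_or(false, |v| v.is_empty());
    if empty(&f.ids)
        || empty(&f.authors)
        || empty(&f.d_tags)
        || f.kinds.as_ref().map_or(false, |k| k.is_empty())
    {
        return Some(0);
    }

    let mut bounds: Vec<usize> = Vec::new();
    if let Some(limit) = f.limit {
        bounds.push(limit);
    }
    if let Some(ids) = &f.ids {
        bounds.push(ids.len()); // one event per id
    }
    if let (Some(authors), Some(kinds)) = (&f.authors, &f.kinds) {
        if kinds.iter().all(|k| is_replaceable(*k)) {
            bounds.push(authors.len() * kinds.len());
        }
        if let Some(d) = &f.d_tags {
            if kinds.iter().all(|k| is_addressable(*k)) {
                bounds.push(authors.len() * kinds.len() * d.len());
            }
        }
    }
    // The tightest applicable bound wins; no bounds means unbounded.
    bounds.into_iter().min()
}
```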
+
+**Patterns**: Minimalist, functional, no external dependencies for filter logic. Pure JavaScript.
+Early exit on mismatch. Kind classification integration for limit calculation.
+
+**Design**: Self-contained. No validation. `search` field defined but not used in matching logic.
+
+### rust-nostr
+
+**Types**: `Filter` struct with `Option<BTreeSet<T>>` for all set fields. Uses `BTreeSet` for
+O(log n) lookups and deterministic serialization. `generic_tags: BTreeMap<SingleLetterTag, BTreeSet<String>>`
+for dynamic tag filters.
+
+**Matching**: `match_event(&self, event, opts)` with `MatchEventOptions` controlling which fields
+to check (7 boolean flags). Individual match methods are `#[inline]`. Tag matching uses lazy-initialized
+`event.tags.indexes()` (OnceCell pattern). NIP-50 search: case-insensitive substring via `.windows()`.
+
+**Builder pattern**: Fluent chainable methods consuming `self`: `Filter::new().kind(k).author(pk)`.
+Convenience methods for common tags: `.event()`, `.pubkey()`, `.hashtag()`, `.identifier()`,
+`.coordinate()`. Generic `.custom_tag()` for arbitrary tags. Remove methods reset a field to
+`None` if its set becomes empty.
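A cut-down sketch of such a builder, assuming string pubkeys and only two fields. The method names mirror the description above, but the bodies are illustrative and `remove_kind` is a hypothetical name, not rust-nostr's API.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default, Clone, PartialEq)]
struct Filter {
    kinds: Option<BTreeSet<u16>>,
    authors: Option<BTreeSet<String>>,
}

impl Filter {
    fn new() -> Self {
        Self::default()
    }

    /// Consumes `self` and returns it, so calls can be chained.
    fn kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    fn author(mut self, pubkey: &str) -> Self {
        self.authors
            .get_or_insert_with(BTreeSet::new)
            .insert(pubkey.to_string());
        self
    }

    /// Removing the last kind resets the field to `None` (no constraint)
    /// rather than leaving `Some(empty)` (matches nothing).
    fn remove_kind(mut self, kind: u16) -> Self {
        let now_empty = match self.kinds.as_mut() {
            Some(set) => {
                set.remove(&kind);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.kinds = None;
        }
        self
    }
}
```

Resetting to `None` on removal is a deliberate choice in this sketch: a caller who adds a kind and then removes it ends up back at the unconstrained filter they started with, not at a filter that silently matches nothing.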
+
+**Option semantics**: `None` = no constraint (matches all). `Some(empty_set)` = matches nothing.
+This distinction is explicitly documented (GitHub issue #302).
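The `None` versus `Some(empty_set)` distinction is small enough to show directly. The helper below is a hypothetical illustration for a single field, not rust-nostr's code.

```rust
use std::collections::BTreeSet;

/// `None` places no constraint; `Some(set)` requires membership,
/// so `Some(empty)` can never be satisfied.
fn kind_matches(constraint: &Option<BTreeSet<u16>>, kind: u16) -> bool {
    match constraint {
        None => true,
        Some(set) => set.contains(&kind),
    }
}
```

Collapsing the two cases, for example by treating an empty set as "no constraint", would turn a filter meant to match nothing into one that matches everything.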
+
+**Utilities**: `is_empty()`, `extract_public_keys()`. No merging/combining API — multiple filters
+handled at protocol layer.
+
+**Patterns**: no_std compatible (uses `alloc`). BTreeSet for deterministic ordering. Custom serde
+with `#[serde(flatten)]` for generic tags.
+
+### welshman
+
+**Types**: Standard NIP-01 filter type. `neverFilter = {ids: []}` constant for "matches nothing."
+
+**Matching**: Delegates to nostr-tools for NIP-01 matching. Extends with search: splits by
+whitespace, case-insensitive, requires ALL terms match (AND logic).
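The search rule described here (split on whitespace, compare case-insensitively, require every term) fits in a few lines of Rust. `matches_search` is a hypothetical name, and matching against a single content string is an assumption of this sketch.

```rust
/// Returns true when every whitespace-separated term of `search`
/// occurs somewhere in `content`, ignoring case.
/// An empty search has no terms, so it matches any content.
fn matches_search(search: &str, content: &str) -> bool {
    let haystack = content.to_lowercase();
    search
        .split_whitespace()
        .all(|term| haystack.contains(&term.to_lowercase()))
}
```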
+
+**Utilities**:
+- `getFilterId(filter)` — deterministic hash for deduplication (sort keys, join, hash)
+- `calculateFilterGroup(filter)` — groups by matching space (structural fields vs temporal)
+- `unionFilters(filters)` — groups by `calculateFilterGroup`, merges arrays within groups
+- `intersectFilters(groups)` — Cartesian product across filter groups with intelligent merging (max `since`, min `until`, max `limit`, concatenate `search`)
+- `getIdFilters(idsOrAddresses)` — converts mix of IDs and addresses to filters
+- `getReplyFilters(events)` — generates filters for replies (#e for regular, #a for replaceable)
+- `addRepostFilters(filters)` — adds repost kind variants
+- `trimFilter(filter)` — caps array fields at 1000 items with random sampling
+- `getFilterGenerality()` — heuristic score 0 (specific) to 1 (general)
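The "sort keys, join, hash" recipe behind `getFilterId` can be sketched in Rust. Everything here is an assumption for illustration: the filter is modelled as a map from field name to string values, the join format is made up, and FNV-1a is swapped in as a simple deterministic hash. It is not welshman's implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical filter model: field name -> values, all as strings.
/// A BTreeMap iterates in key order, which gives the "sort keys" step for free.
type Filter = BTreeMap<String, Vec<String>>;

/// Joins the fields into one canonical string, e.g. "authors=alice|kinds=1".
fn canonical(filter: &Filter) -> String {
    filter
        .iter()
        .map(|(key, values)| format!("{}={}", key, values.join(",")))
        .collect::<Vec<_>>()
        .join("|")
}

/// 64-bit FNV-1a, chosen because it is short and gives the same result on every run.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

fn filter_id(filter: &Filter) -> u64 {
    fnv1a(canonical(filter).as_bytes())
}
```

Two filters built in a different insertion order get the same id, which is what makes the id usable for deduplication. Value order within a field still matters in this sketch, since only keys are sorted.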
+
+**Patterns**: Functional composition. Immutable transformations. Hash-based deduplication.
+Domain-driven builders for common query patterns.
+
+## Common Patterns
+
+1. **Type structure**: All implementations use optional fields. Missing = no constraint. The `#tag`
+   convention for dynamic tag filters is universal.
+
+2. **AND/OR semantics**: Universal agreement — AND within a single filter, OR across an array of
+   filters. This is fundamental to NIP-01.
+
+3. **Matching order**: Most implementations check the cheap top-level fields first (ids, kinds,
+   authors) for early exit before the more expensive tag matching.
+
+4. **Tag indexing**: Several implementations build cached indexes on events for efficient repeated
+   matching (applesauce: Symbol-based Set cache; rust-nostr: OnceCell BTreeMap).
+
+5. **No negation**: No implementation supports negative matching (NOT). This aligns with protocol
+   design — rejected for relay performance reasons.
+
+6. **Limit semantics**: `limit` is not a matching criterion; it is a result count constraint.
+   Most matching functions ignore it. `getFilterLimit` (nostr-tools) and `GetTheoreticalLimit`
+   (nostrlib) compute intrinsic upper bounds based on kind replaceability.
+
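+7. **Merging**: Most implementations provide union-style filter merging. Array fields are unioned
+   and deduplicated. Scalar fields use min/max logic, but the choices differ: applesauce takes the
+   minimum `limit` where nostr-tools takes the maximum, and ndk keeps filters with a `limit`
+   separate rather than merging them.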
+
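+8. **Option vs empty**: rust-nostr explicitly distinguishes `None` (no constraint) from
+   `Some(empty)` (matches nothing). Other implementations handle this implicitly, for example
+   welshman's `neverFilter = {ids: []}` and the empty-array case of nostr-tools' `getFilterLimit`.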
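Union-style merging (pattern 7 above) can be sketched in Rust. The scalar rules below follow the nostr-tools variant described earlier (max `limit`, min `since`, max `until`). The `Filter` type and the helper names `union`, `pick`, and `merge` are hypothetical.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default, Clone, PartialEq)]
struct Filter {
    kinds: Option<BTreeSet<u16>>,
    authors: Option<BTreeSet<String>>,
    since: Option<u64>,
    until: Option<u64>,
    limit: Option<usize>,
}

/// Unions two optional sets; the set type deduplicates for us.
/// A field present on only one side is carried over as-is.
fn union<T: Ord + Clone>(
    a: &Option<BTreeSet<T>>,
    b: &Option<BTreeSet<T>>,
) -> Option<BTreeSet<T>> {
    match (a, b) {
        (None, None) => None,
        (Some(x), None) | (None, Some(x)) => Some(x.clone()),
        (Some(x), Some(y)) => Some(x.union(y).cloned().collect()),
    }
}

/// Combines two optional scalars with `f` when both are present.
fn pick<T>(a: Option<T>, b: Option<T>, f: fn(T, T) -> T) -> Option<T> {
    match (a, b) {
        (Some(x), Some(y)) => Some(f(x, y)),
        (x, None) => x,
        (None, y) => y,
    }
}

fn merge(a: &Filter, b: &Filter) -> Filter {
    Filter {
        kinds: union(&a.kinds, &b.kinds),
        authors: union(&a.authors, &b.authors),
        since: pick(a.since, b.since, u64::min), // widest window start
        until: pick(a.until, b.until, u64::max), // widest window end
        limit: pick(a.limit, b.limit, usize::max),
    }
}
```

Carrying over a field that only one side sets keeps the sketch close to the key-by-key union described for nostr-tools, but it means the merged filter can be narrower than the OR of its inputs. An implementation that needs an exact union would have to drop such a field instead.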
+
+## Considerations for Our Implementation
+
+1. **Filter as a standalone primitive**: Frame filters independent of REQ messages. They're a
+   general-purpose matching predicate over events.
+
+2. **Struct design**: Use `Option<BTreeSet<T>>` following rust-nostr's approach — it correctly
+   models the distinction between "no constraint" and "empty constraint." BTreeSet gives
+   deterministic serialization and O(log n) lookups.
+
+3. **Matching function**: Implement `matches(&self, event: &Event) -> bool` with early exit on
+   the cheap top-level fields (ids, kinds, authors). Tag matching should use event tag indexes.
+
+4. **Builder pattern**: Fluent API for construction: `Filter::new().kind(1).author(pk)`.
+   Convenience methods for common tags (#e, #p, #d, #a, #t).
+
+5. **Generic tag filters**: Support arbitrary single-letter tag filters via
+   `BTreeMap<SingleLetterTag, BTreeSet<String>>` or similar.
+
+6. **Serialization**: Custom JSON serialization to flatten generic tags as `#tag` keys. Handle
+   `limit: 0` vs omitted limit.
+
+7. **No merging in core**: Following rust-nostr, keep the filter primitive simple. Merging and
+   combining can live in higher-level utilities if needed.
+
+8. **Limit calculation**: Consider `getFilterLimit`-style intrinsic limit computation based on
+   kind replaceability — useful for query optimization.
+
+9. **Dependencies**: Filter should depend only on existing types (Event, EventId, Pubkey, Kind,
+   Timestamp, Tags). Self-contained within coracle-lib.
+
+10. **Test strategy**: Test matching logic thoroughly — all field types, AND semantics, tag
+    matching, timestamp boundaries, empty sets vs None, edge cases.