Add filters chapter

Jon Staab
2026-04-21 12:08:55 -07:00
parent c8f6bc1652
commit a8a57a3d77
6 changed files with 1876 additions and 36 deletions
@@ -0,0 +1,241 @@
# Research: Filters
## Topic Summary
Filters are the NIP-01 data structure for matching events. They are a primitive that works
independently of the client/relay context: not just the payload of REQ messages, but a
general-purpose abstraction for matching and querying events. The chapter should cover the filter
structure, matching semantics (AND within a filter, OR across filters), tag filters, timestamp
constraints, limits, and programmatic construction/manipulation of filters.
## Philosophy
From `ref/building-nostr`:
**Filters as a conceptual category for tags**: The author identifies "filter tags" as one of
three tag categories (alongside data and behavior tags). Filter tags are "especially useful for
filtering and retrieval" and tend to be single-letter because relays only index single-letter
tags to reduce database overhead.
**Filter structure (NIP-01)**: Standard fields are `ids`, `authors`, `kinds`, `since`, `until`,
`limit`. Tag filters are created by prefixing tag names with `#` (e.g., `#p`, `#e`). Extensions
include prefix matching and NIP-50 `search`. Negative matches were proposed but rejected due to
relay performance concerns.
**Filters as a routing problem**: "Where to send a given filter" is distinct from "where to send
a given event." Filters have less information than events, making routing harder. A routing
heuristic "connects a filter that might be constructed to support a particular use case with the
relay where matching events are stored." This is analogous to database indexes.
**Design principles**:
- **Minimalism**: Filters match discrete criteria without complex negation or boolean logic
- **Decentralization**: Clients understand routing heuristics; relays don't need to understand intent
- **Extensibility**: The `#tag` convention allows arbitrary new filters without protocol changes
- **Trade-offs**: Single-letter indexing limits expressiveness but maintains relay scalability
- **Partition tolerance**: Missing a relay means missing some events, which is acceptable
## Reference Implementation Analysis
### applesauce
**Types**: `Filter` extends nostr-tools' `CoreFilter` with NIP-91 AND operator support (`&`-prefixed
tag names) and NIP-50 `search` field. All standard NIP-01 fields present.
**Matching**: `matchFilter(filter, event)` checks basic fields first for early rejection, then
uses `getIndexableTags(event)` to build a cached `Set<string>` of `"tagName:value"` pairs on the
event (via Symbol key). NIP-91 AND tags processed before OR tags. `matchFilters` implements OR
across filter arrays.
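The tag-index idea described above can be sketched roughly as follows. This is not applesauce's code: the `Event` struct, `indexable_tags`, and `matches_tag_filter` are hypothetical names for illustration, and the caller holds the index instead of caching it on the event.

```rust
use std::collections::HashSet;

/// Simplified event for this sketch: only the tags matter here.
struct Event {
    tags: Vec<Vec<String>>,
}

/// Build the set of "name:value" keys for single-letter tags.
/// Tags with longer names or without a value are skipped.
fn indexable_tags(event: &Event) -> HashSet<String> {
    event
        .tags
        .iter()
        .filter_map(|tag| match tag.as_slice() {
            [name, value, ..] if name.chars().count() == 1 => Some(format!("{name}:{value}")),
            _ => None,
        })
        .collect()
}

/// One tag filter is satisfied when any of its values is in the index.
fn matches_tag_filter(index: &HashSet<String>, name: &str, values: &[&str]) -> bool {
    values.iter().any(|v| index.contains(&format!("{name}:{v}")))
}
```

Building the index once and reusing it across many filters is what makes the approach pay off; a single match against a single filter gains little.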
**Utilities**: `mergeFilters(...filters)` unions array fields and deduplicates; takes minimum
`limit`, minimum `since`, maximum `until`. `isFilterEqual` uses `fast-deep-equal` for subscription
deduplication. `createFilterMap` distributes filters across relays by author.
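As an illustration of the union-style merge described above (minimum `limit`, minimum `since`, maximum `until`), here is a minimal sketch. The reduced `Filter` struct and the helpers `union_sets`, `pick`, and `merge` are hypothetical. Keeping a scalar that is present on only one side is an assumption of this sketch, not something the notes confirm.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default, Clone, PartialEq)]
struct Filter {
    authors: Option<BTreeSet<String>>,
    kinds: Option<BTreeSet<u16>>,
    since: Option<u64>,
    until: Option<u64>,
    limit: Option<usize>,
}

/// Union two optional sets; the set type deduplicates for us.
fn union_sets<T: Ord + Clone>(
    a: &Option<BTreeSet<T>>,
    b: &Option<BTreeSet<T>>,
) -> Option<BTreeSet<T>> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.union(y).cloned().collect()),
        (Some(x), None) | (None, Some(x)) => Some(x.clone()),
        (None, None) => None,
    }
}

/// Combine two optional scalars; a lone value is kept (assumption).
fn pick<T: Copy>(a: Option<T>, b: Option<T>, choose: fn(T, T) -> T) -> Option<T> {
    match (a, b) {
        (Some(x), Some(y)) => Some(choose(x, y)),
        (x, None) => x,
        (None, y) => y,
    }
}

fn merge(a: &Filter, b: &Filter) -> Filter {
    Filter {
        authors: union_sets(&a.authors, &b.authors),
        kinds: union_sets(&a.kinds, &b.kinds),
        since: pick(a.since, b.since, u64::min),
        until: pick(a.until, b.until, u64::max),
        limit: pick(a.limit, b.limit, usize::min),
    }
}
```

Note that a field present on only one input is carried over, so the merged filter can be narrower than the true union of the two inputs on that field.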
**SQL layer**: Separate `buildFilterConditions` translates filters to SQL WHERE clauses. AND tags
use `GROUP BY`/`HAVING COUNT` subqueries. OR tags use `IN` subqueries.
**Patterns**: Symbol-based memoization of tag indexes on event objects. Early exit optimization.
Only indexes single-letter tags. RxJS integration for streaming filter results.
### ndk
**Types**: `NDKFilter<K extends NDKKind>` — generic parameterized type with all NIP-01 fields plus
`search`. Dynamic tag properties via ``[key: `#${string}`]``.
**Matching**: `matchFilter(filter, event)` uses `indexOf()` for array membership. Special case:
`#t` (hashtag) tags are case-insensitive. Does NOT check `limit` or `search` — these are
submission constraints, not matching criteria. Short-circuits on first mismatch.
**Utilities**:
- `mergeFilters()` — unions arrays, deduplicates via Set. Preserves filters with `limit` separately (limits can't be merged). Returns array.
- `filterFingerprint()` — deterministic hash of filter structure for subscription grouping
- `compareFilter()` — checks if one filter is a subset of another (for cache-hit validation)
- `filterFromId()` — converts bech32 identifiers to filters
- `filterForEventsTaggingId()` — creates tag filters for events referencing a given ID
**Validation**: Three modes (VALIDATE/FIX/IGNORE). Checks for undefined values, type correctness,
hex format, kind range. Guardrails catch common mistakes: empty filters, bech32 in hex arrays,
`since > until`, `#t` with literal `#` prefix.
**Patterns**: Generic type parameterization. Pluggable validation modes. Readable subscription IDs
generated from filter structure.
### nostr-gadgets
Uses `nostr-tools` filter types and functions directly — no custom filter implementation.
**Construction patterns**: Mutable accumulation (`filter.authors?.push(target)`), inline literals,
spread-based composition (`{ ...f, authors: [pubkey], since: newest }`).
**Filter-based deletion**: Converts event tags to filter arrays for batch deletion operations.
**Multi-level filtering**: Filters used at query construction, relay permission checking (purgatory),
and client-side event matching.
### nostrlib
**Types**: Go struct with `IDs []ID`, `Kinds []Kind`, `Authors []PubKey`, `Tags TagMap`,
`Since Timestamp`, `Until Timestamp`, `Limit int`, `Search string`, `LimitZero bool`. Uses
fixed-size byte arrays for IDs/PubKeys.
**Matching**: Two methods:
- `Matches(event)` — full matching including timestamp constraints
- `MatchesIgnoringTimestampConstraints(event)` (`[//go:inline]` optimized), used for live events after EOSE
Tag matching via `tags.ContainsAny(tagName, values)`. Uses `slices.Contains()` for array membership.
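The split between the two matching methods might look like this when transposed to Rust. It is a sketch, not nostrlib's Go code: the reduced `Event` and `Filter` structs and the method names are hypothetical, and only `kinds` stands in for the non-temporal fields.

```rust
struct Event {
    kind: u16,
    created_at: u64,
}

#[derive(Default)]
struct Filter {
    kinds: Option<Vec<u16>>,
    since: Option<u64>,
    until: Option<u64>,
}

impl Filter {
    /// Checks every constraint except `since` and `until`.
    fn matches_ignoring_timestamps(&self, event: &Event) -> bool {
        match &self.kinds {
            Some(kinds) => kinds.contains(&event.kind),
            None => true,
        }
    }

    /// Full match: the non-temporal checks plus inclusive time bounds.
    fn matches(&self, event: &Event) -> bool {
        self.matches_ignoring_timestamps(event)
            && self.since.map_or(true, |s| event.created_at >= s)
            && self.until.map_or(true, |u| event.created_at <= u)
    }
}
```

Layering the full match on top of the timestamp-free one keeps the two methods from drifting apart as fields are added.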
**Utilities**: `Clone()` deep copies. `FilterEqual()` order-independent comparison. `GetTheoreticalLimit()`
estimates max results considering replaceability. No merging functions.
**Serialization**: Custom easyjson codec with `xhex` for fast hex encoding. `LimitZero` bool
distinguishes `"limit": 0` from omitted limit.
**Patterns**: Pure data structure with stateless methods. Subscription switches matching function
after EOSE. Query optimizer scores tags by "goodness" for index selection.
### nostr-tools
**Types**: Simple TypeScript type with all NIP-01 fields. Index signature ``[key: `#${string}`]`` for
dynamic tag filters. All properties optional.
**Matching**: `matchFilter(filter, event)` — conjunctive (AND) matching. Uses `indexOf()` for
membership. Iterates filter properties for `#`-prefixed tag filters. Both `since` and `until` are
inclusive. `matchFilters` implements OR across array.
**Utilities**:
- `mergeFilters(...filters)` — unions array properties, takes max `limit`, min `since`, max `until`
- `getFilterLimit(filter)` — computes intrinsic limit considering replaceability:
  - Empty arrays → 0
  - IDs → `ids.length`
  - Replaceable kinds → `authors.length * kinds.length`
  - Addressable kinds → `authors.length * kinds.length * #d.length`
  - Returns minimum across all applicable constraints
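The intrinsic-limit rules listed above can be sketched in Rust as follows. The `Filter` struct, the `d_tags` field (standing in for the `#d` tag filter), and all function names are hypothetical. The kind ranges are the NIP-01 ones, and `usize::MAX` plays the role of "no bound".

```rust
#[derive(Default)]
struct Filter {
    ids: Option<Vec<String>>,
    authors: Option<Vec<String>>,
    kinds: Option<Vec<u32>>,
    d_tags: Option<Vec<String>>, // stands in for the `#d` tag filter
    limit: Option<usize>,
}

/// NIP-01 replaceable kinds: 0, 3, and 10000..20000.
fn is_replaceable(kind: u32) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

/// NIP-01 addressable kinds: 30000..40000.
fn is_addressable(kind: u32) -> bool {
    (30_000..40_000).contains(&kind)
}

fn is_empty_list<T>(field: &Option<Vec<T>>) -> bool {
    field.as_ref().map_or(false, |v| v.is_empty())
}

/// Upper bound on the number of events the filter can ever return.
fn filter_limit(f: &Filter) -> usize {
    // A present-but-empty list can never match anything.
    if is_empty_list(&f.ids)
        || is_empty_list(&f.authors)
        || is_empty_list(&f.kinds)
        || is_empty_list(&f.d_tags)
    {
        return 0;
    }
    let mut bound = f.limit.unwrap_or(usize::MAX);
    if let Some(ids) = &f.ids {
        bound = bound.min(ids.len()); // one event per id
    }
    if let (Some(authors), Some(kinds)) = (&f.authors, &f.kinds) {
        if kinds.iter().all(|k| is_replaceable(*k)) {
            bound = bound.min(authors.len() * kinds.len());
        }
        if let Some(d) = &f.d_tags {
            if kinds.iter().all(|k| is_addressable(*k)) {
                bound = bound.min(authors.len() * kinds.len() * d.len());
            }
        }
    }
    bound
}
```

For example, a filter for kinds 0 and 3 from two authors can return at most four events, whatever `limit` says.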
**Patterns**: Minimalist, functional, no external dependencies for filter logic. Pure JavaScript.
Early exit on mismatch. Kind classification integration for limit calculation.
**Design**: Self-contained. No validation. `search` field defined but not used in matching logic.
### rust-nostr
**Types**: `Filter` struct with `Option<BTreeSet<T>>` for all set fields. Uses `BTreeSet` for
O(log n) lookups and deterministic serialization. `generic_tags: BTreeMap<SingleLetterTag, BTreeSet<String>>`
for dynamic tag filters.
**Matching**: `match_event(&self, event, opts)` with `MatchEventOptions` controlling which fields
to check (7 boolean flags). Individual match methods are `#[inline]`. Tag matching uses lazy-initialized
`event.tags.indexes()` (OnceCell pattern). NIP-50 search: case-insensitive substring via `.windows()`.
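A case-insensitive substring test built on `.windows()` could look like the sketch below. It is not taken from rust-nostr: the function name is hypothetical, and restricting case folding to ASCII is an assumption made to keep the example short.

```rust
/// Case-insensitive (ASCII only) substring test using fixed-size byte windows.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    // `windows(0)` panics, and an empty needle trivially matches.
    if n.is_empty() {
        return true;
    }
    // Slide a needle-sized window over the haystack and compare each one.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}
```

This avoids allocating lowercased copies of either string, at the cost of ignoring non-ASCII case rules.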
**Builder pattern**: Fluent chainable methods consuming `self`: `Filter::new().kind(k).author(pk)`.
Convenience methods for common tags: `.event()`, `.pubkey()`, `.hashtag()`, `.identifier()`,
`.coordinate()`. Generic `.custom_tag()` for arbitrary tags. Remove methods reset a field to `None` if its set
becomes empty.
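A consuming builder in the style described above might be sketched as follows. The method names echo the ones listed in these notes, but the signatures, the `char`-keyed tag map, and `remove_kind` are assumptions for illustration, not rust-nostr's actual API.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Default, Clone, PartialEq)]
struct Filter {
    kinds: Option<BTreeSet<u16>>,
    authors: Option<BTreeSet<String>>,
    tags: BTreeMap<char, BTreeSet<String>>,
    limit: Option<usize>,
}

impl Filter {
    fn new() -> Self {
        Self::default()
    }

    /// Each method takes `self` by value and returns it, so calls chain.
    fn kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    fn author(mut self, pubkey: &str) -> Self {
        self.authors.get_or_insert_with(BTreeSet::new).insert(pubkey.to_string());
        self
    }

    fn custom_tag(mut self, letter: char, value: &str) -> Self {
        self.tags.entry(letter).or_insert_with(BTreeSet::new).insert(value.to_string());
        self
    }

    /// Convenience wrapper for the `#t` tag filter.
    fn hashtag(self, value: &str) -> Self {
        self.custom_tag('t', value)
    }

    fn limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }

    /// Removing the last kind resets the field to `None` (no constraint).
    fn remove_kind(mut self, kind: u16) -> Self {
        let now_empty = match self.kinds.as_mut() {
            Some(set) => {
                set.remove(&kind);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.kinds = None;
        }
        self
    }
}
```

Usage reads as `Filter::new().kind(1).author("alice").hashtag("nostr").limit(10)`.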
**Option semantics**: `None` = no constraint (matches all). `Some(empty_set)` = matches nothing.
This distinction is explicitly documented (GitHub issue #302).
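The `None` versus `Some(empty_set)` distinction comes down to one small match. The helper below is a hypothetical sketch of that rule, not rust-nostr's code.

```rust
use std::collections::BTreeSet;

/// `None` means the field is absent (no constraint).
/// `Some(set)` means the value must be a member of `set`,
/// so an empty set can never be satisfied.
fn field_matches<T: Ord>(constraint: &Option<BTreeSet<T>>, value: &T) -> bool {
    match constraint {
        None => true,
        Some(allowed) => allowed.contains(value),
    }
}
```

With this rule, `Some(BTreeSet::new())` needs no special casing to behave as "matches nothing".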
**Utilities**: `is_empty()`, `extract_public_keys()`. No merging/combining API — multiple filters
handled at protocol layer.
**Patterns**: no_std compatible (uses `alloc`). BTreeSet for deterministic ordering. Custom serde
with `#[serde(flatten)]` for generic tags.
### welshman
**Types**: Standard NIP-01 filter type. `neverFilter = {ids: []}` constant for "matches nothing."
**Matching**: Delegates to nostr-tools for NIP-01 matching. Extends with search: splits by
whitespace, case-insensitive, requires ALL terms match (AND logic).
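The search extension described above (split on whitespace, case-insensitive, all terms required) can be sketched in a few lines. The function name is hypothetical, and matching against the event's `content` string is an assumption of this sketch.

```rust
/// True when every whitespace-separated term of `search`
/// occurs in `content`, ignoring case.
fn matches_search(search: &str, content: &str) -> bool {
    let haystack = content.to_lowercase();
    search
        .split_whitespace()
        .all(|term| haystack.contains(&term.to_lowercase()))
}
```

An empty search string has no terms, so it matches everything.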
**Utilities**:
- `getFilterId(filter)` — deterministic hash for deduplication (sort keys, join, hash)
- `calculateFilterGroup(filter)` — groups by matching space (structural fields vs temporal)
- `unionFilters(filters)` — groups by `calculateFilterGroup`, merges arrays within groups
- `intersectFilters(groups)` — Cartesian product across filter groups with intelligent merging (max `since`, min `until`, max `limit`, concatenate `search`)
- `getIdFilters(idsOrAddresses)` — converts mix of IDs and addresses to filters
- `getReplyFilters(events)` — generates filters for replies (`#e` for regular, `#a` for replaceable)
- `addRepostFilters(filters)` — adds repost kind variants
- `trimFilter(filter)` — caps array fields at 1000 items with random sampling
- `getFilterGenerality()` — heuristic score 0 (specific) to 1 (general)
**Patterns**: Functional composition. Immutable transformations. Hash-based deduplication.
Domain-driven builders for common query patterns.
## Common Patterns
1. **Type structure**: All implementations use optional fields. Missing = no constraint. The `#tag`
convention for dynamic tag filters is universal.
2. **AND/OR semantics**: Universal agreement — AND within a single filter, OR across an array of
filters. This is fundamental to NIP-01.
3. **Matching order**: Most implementations check scalar fields first (ids, kinds, authors) for
early exit before the more expensive tag matching.
4. **Tag indexing**: Several implementations build cached indexes on events for efficient repeated
matching (applesauce: Symbol-based Set cache; rust-nostr: OnceCell BTreeMap).
5. **No negation**: No implementation supports negative matching (NOT). This aligns with protocol
design — rejected for relay performance reasons.
6. **Limit semantics**: `limit` is not a matching criterion — it's a result count constraint.
Most matching functions ignore it. `getFilterLimit`/`GetTheoreticalLimit` compute intrinsic
upper bounds based on kind replaceability.
7. **Merging**: Most implementations provide union-style filter merging. Array fields are unioned
and deduplicated. Scalar fields use min/max logic.
8. **Option vs empty**: rust-nostr explicitly distinguishes `None` (no constraint) from
`Some(empty)` (matches nothing). Other implementations handle this implicitly.
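Patterns 2 and 3 (AND within a filter, OR across filters, scalar checks first) can be illustrated with a minimal sketch. The reduced `Event` and `Filter` structs and the function names are hypothetical and not taken from any of the libraries surveyed.

```rust
struct Event {
    pubkey: String,
    kind: u16,
    created_at: u64,
    tags: Vec<Vec<String>>,
}

#[derive(Default)]
struct Filter {
    authors: Option<Vec<String>>,
    kinds: Option<Vec<u16>>,
    since: Option<u64>,
    until: Option<u64>,
    tags: Vec<(char, Vec<String>)>, // each entry is one `#x` tag filter
}

/// True when the event carries tag `letter` with any of `values`.
fn has_tag(event: &Event, letter: char, values: &[String]) -> bool {
    event.tags.iter().any(|tag| match tag.as_slice() {
        [name, value, ..] => name.len() == 1 && name.starts_with(letter) && values.contains(value),
        _ => false,
    })
}

/// AND semantics: every constraint present on the filter must hold.
fn match_filter(f: &Filter, e: &Event) -> bool {
    // Cheap scalar checks first, so mismatches exit early.
    if let Some(kinds) = &f.kinds {
        if !kinds.contains(&e.kind) {
            return false;
        }
    }
    if let Some(authors) = &f.authors {
        if !authors.contains(&e.pubkey) {
            return false;
        }
    }
    if f.since.map_or(false, |s| e.created_at < s) {
        return false;
    }
    if f.until.map_or(false, |u| e.created_at > u) {
        return false;
    }
    // Every tag filter must be satisfied; any listed value suffices.
    f.tags.iter().all(|(letter, values)| has_tag(e, *letter, values))
}

/// OR semantics: one matching filter is enough.
fn match_filters(filters: &[Filter], e: &Event) -> bool {
    filters.iter().any(|f| match_filter(f, e))
}
```

An empty filter array matches nothing, while an empty filter (no constraints) matches every event.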
## Considerations for Our Implementation
1. **Filter as a standalone primitive**: Frame filters independent of REQ messages. They're a
general-purpose matching predicate over events.
2. **Struct design**: Use `Option<BTreeSet<T>>` following rust-nostr's approach — it correctly
models the distinction between "no constraint" and "empty constraint." BTreeSet gives
deterministic serialization and O(log n) lookups.
3. **Matching function**: Implement `matches(&self, event: &Event) -> bool` with early exit on
scalar fields. Tag matching should use event tag indexes.
4. **Builder pattern**: Fluent API for construction: `Filter::new().kind(1).author(pk)`.
Convenience methods for common tags (`#e`, `#p`, `#d`, `#a`, `#t`).
5. **Generic tag filters**: Support arbitrary single-letter tag filters via
`BTreeMap<SingleLetterTag, BTreeSet<String>>` or similar.
6. **Serialization**: Custom JSON serialization to flatten generic tags as `#tag` keys. Handle
`limit: 0` vs omitted limit.
7. **No merging in core**: Following rust-nostr, keep the filter primitive simple. Merging and
combining can live in higher-level utilities if needed.
8. **Limit calculation**: Consider `getFilterLimit`-style intrinsic limit computation based on
kind replaceability — useful for query optimization.
9. **Dependencies**: Filter should depend only on existing types (Event, EventId, Pubkey, Kind,
Timestamp, Tags). Self-contained within coracle-lib.
10. **Test strategy**: Test matching logic thoroughly — all field types, AND semantics, tag
matching, timestamp boundaries, empty sets vs None, edge cases.
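For the serialization point (item 6), the flattening of generic tags into `#`-prefixed keys and the `limit: 0` case can be sketched without any JSON library. Everything here is hypothetical: a real implementation would likely use a proper serializer, and this sketch assumes tag values are plain ASCII (such as hex) so that Rust's `{:?}` quoting coincides with JSON quoting.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
struct Filter {
    kinds: Option<BTreeSet<u16>>,
    limit: Option<usize>, // Some(0) is distinct from None
    tags: BTreeMap<char, BTreeSet<String>>,
}

/// Serialize to a JSON object, emitting only the fields that are set.
fn to_json(f: &Filter) -> String {
    let mut fields: Vec<String> = Vec::new();
    if let Some(kinds) = &f.kinds {
        let items: Vec<String> = kinds.iter().map(|k| k.to_string()).collect();
        fields.push(format!("\"kinds\":[{}]", items.join(",")));
    }
    // `Some(0)` is written out as `"limit":0`; `None` omits the key.
    if let Some(limit) = f.limit {
        fields.push(format!("\"limit\":{}", limit));
    }
    // Generic tags are flattened into top-level `#x` keys.
    for (letter, values) in &f.tags {
        // Assumption: values are plain ASCII, so `{:?}` yields valid JSON strings.
        let items: Vec<String> = values.iter().map(|v| format!("{:?}", v)).collect();
        fields.push(format!("\"#{}\":[{}]", letter, items.join(",")));
    }
    format!("{{{}}}", fields.join(","))
}
```

Modeling `limit` as `Option<usize>` makes a separate flag for the zero case unnecessary, since `Some(0)` and `None` already serialize differently.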