Add filters chapter
@@ -0,0 +1,241 @@
# Research: Filters

## Topic Summary

Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context: not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter should cover the filter
structure, matching semantics (AND across the fields of a filter, OR among the values of a
field, OR across filters), tag filters, timestamp constraints, limits, and programmatic
construction/manipulation of filters.

## Philosophy

From `ref/building-nostr`:

**Filters as a conceptual category for tags**: The author identifies "filter tags" as one of
three tag categories (alongside data and behavior tags). Filter tags are "especially useful for
filtering and retrieval" and tend to be single-letter because relays only index single-letter
tags to reduce database overhead.

**Filter structure (NIP-01)**: Standard fields are `ids`, `authors`, `kinds`, `since`, `until`,
`limit`. Tag filters are created by prefixing tag names with `#` (e.g., `#p`, `#e`). Extensions
include prefix matching and NIP-50 `search`. Negative matches were proposed but rejected due to
relay performance concerns.

**Filters as a routing problem**: "Where to send a given filter" is distinct from "where to send
a given event." Filters have less information than events, making routing harder. A routing
heuristic "connects a filter that might be constructed to support a particular use case with the
relay where matching events are stored." This is analogous to database indexes.

**Design principles**:
- **Minimalism**: Filters match discrete criteria without complex negation or boolean logic
- **Decentralization**: Clients understand routing heuristics; relays don't need to understand intent
- **Extensibility**: The `#tag` convention allows arbitrary new filters without protocol changes
- **Trade-offs**: Single-letter indexing limits expressiveness but maintains relay scalability
- **Partition tolerance**: Missing a relay means missing some events, which is acceptable

## Reference Implementation Analysis

### applesauce

**Types**: `Filter` extends nostr-tools' `CoreFilter` with NIP-91 AND operator support (`&`-prefixed
tag names) and NIP-50 `search` field. All standard NIP-01 fields present.

**Matching**: `matchFilter(filter, event)` checks basic fields first for early rejection, then
uses `getIndexableTags(event)` to build a cached `Set<string>` of `"tagName:value"` pairs on the
event (via Symbol key). NIP-91 AND tags processed before OR tags. `matchFilters` implements OR
across filter arrays.

**Utilities**: `mergeFilters(...filters)` unions array fields and deduplicates; takes minimum
`limit`, minimum `since`, maximum `until`. `isFilterEqual` uses `fast-deep-equal` for subscription
deduplication. `createFilterMap` distributes filters across relays by author.

**SQL layer**: Separate `buildFilterConditions` translates filters to SQL WHERE clauses. AND tags
use `GROUP BY`/`HAVING COUNT` subqueries. OR tags use `IN` subqueries.

**Patterns**: Symbol-based memoization of tag indexes on event objects. Early exit optimization.
Only indexes single-letter tags. RxJS integration for streaming filter results.
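
The caching idea described above can be sketched in Rust, the language the later considerations target. This is not applesauce's code: `Event`, `indexable_tags`, and `has_any_tag` are hypothetical names, and `std::cell::OnceCell` stands in for the Symbol-keyed cache on the event object.

```rust
use std::cell::OnceCell;
use std::collections::HashSet;

/// Hypothetical event wrapper that memoizes its tag index on first use.
pub struct Event {
    pub tags: Vec<Vec<String>>,
    index: OnceCell<HashSet<String>>,
}

impl Event {
    pub fn new(tags: Vec<Vec<String>>) -> Self {
        Event { tags, index: OnceCell::new() }
    }

    /// Set of "name:value" keys for single-letter tags that have a value.
    /// Built on the first call and reused afterwards.
    pub fn indexable_tags(&self) -> &HashSet<String> {
        self.index.get_or_init(|| {
            self.tags
                .iter()
                .filter(|t| t.len() >= 2 && t[0].chars().count() == 1)
                .map(|t| format!("{}:{}", t[0], t[1]))
                .collect()
        })
    }

    /// True if the event carries tag `name` with any of `values`
    /// (OR among the values of one tag filter).
    pub fn has_any_tag(&self, name: char, values: &[&str]) -> bool {
        let index = self.indexable_tags();
        values.iter().any(|v| index.contains(&format!("{name}:{v}")))
    }
}
```

Each lookup is then a hash probe rather than a scan over the tag list, which is where the cache would pay off when one event is tested against many filters.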

### ndk

**Types**: `NDKFilter<K extends NDKKind>` is a generic parameterized type with all NIP-01 fields plus
`search`. Dynamic tag properties via `` [key: `#${string}`] ``.

**Matching**: `matchFilter(filter, event)` uses `indexOf()` for array membership. Special case:
`#t` (hashtag) tags are case-insensitive. Does NOT check `limit` or `search`; these are
submission constraints, not matching criteria. Short-circuits on first mismatch.

**Utilities**:
- `mergeFilters()`: unions arrays, deduplicates via Set. Preserves filters with `limit` separately (limits can't be merged). Returns array.
- `filterFingerprint()`: deterministic hash of filter structure for subscription grouping
- `compareFilter()`: checks if one filter is a subset of another (for cache-hit validation)
- `filterFromId()`: converts bech32 identifiers to filters
- `filterForEventsTaggingId()`: creates tag filters for events referencing a given ID

**Validation**: Three modes (VALIDATE/FIX/IGNORE). Checks for undefined values, type correctness,
hex format, kind range. Guardrails catch common mistakes: empty filters, bech32 in hex arrays,
`since > until`, `#t` with literal `#` prefix.

**Patterns**: Generic type parameterization. Pluggable validation modes. Readable subscription IDs
generated from filter structure.
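
The guardrails listed above are simple to express as a pure function. The following Rust sketch is illustrative only and does not mirror ndk's API: `Filter`, `FilterProblem`, and `validate` are hypothetical names, and the hex check assumes 64-character lowercase hex for ids and pubkeys.

```rust
/// Hypothetical filter shape for this sketch; ids and authors are hex strings.
#[derive(Debug, Default, Clone)]
pub struct Filter {
    pub ids: Vec<String>,
    pub authors: Vec<String>,
    pub kinds: Vec<u16>,
    /// Values of the `#t` tag filter.
    pub hashtags: Vec<String>,
    pub since: Option<u64>,
    pub until: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FilterProblem {
    Empty,
    NotHex64(String),
    SinceAfterUntil,
    HashtagHasHashPrefix(String),
}

/// True for exactly 64 lowercase hex characters.
fn is_hex64(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Collects every problem found instead of stopping at the first one.
pub fn validate(filter: &Filter) -> Vec<FilterProblem> {
    let mut problems = Vec::new();

    let empty = filter.ids.is_empty()
        && filter.authors.is_empty()
        && filter.kinds.is_empty()
        && filter.hashtags.is_empty()
        && filter.since.is_none()
        && filter.until.is_none();
    if empty {
        problems.push(FilterProblem::Empty);
    }

    // Catches bech32 strings (npub..., note...) placed in hex-only arrays.
    for value in filter.ids.iter().chain(filter.authors.iter()) {
        if !is_hex64(value) {
            problems.push(FilterProblem::NotHex64(value.clone()));
        }
    }

    if let (Some(since), Some(until)) = (filter.since, filter.until) {
        if since > until {
            problems.push(FilterProblem::SinceAfterUntil);
        }
    }

    for tag in &filter.hashtags {
        if tag.starts_with('#') {
            problems.push(FilterProblem::HashtagHasHashPrefix(tag.clone()));
        }
    }

    problems
}
```

Returning a list of problems leaves the policy to the caller, which is roughly what separate validate/fix/ignore modes would need: the same checks, with a different reaction to the result.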

### nostr-gadgets

Uses `nostr-tools` filter types and functions directly, with no custom filter implementation.

**Construction patterns**: Mutable accumulation (`filter.authors?.push(target)`), inline literals,
spread-based composition (`{ ...f, authors: [pubkey], since: newest }`).

**Filter-based deletion**: Converts event tags to filter arrays for batch deletion operations.

**Multi-level filtering**: Filters used at query construction, relay permission checking (purgatory),
and client-side event matching.

### nostrlib

**Types**: Go struct with `IDs []ID`, `Kinds []Kind`, `Authors []PubKey`, `Tags TagMap`,
`Since Timestamp`, `Until Timestamp`, `Limit int`, `Search string`, `LimitZero bool`. Uses
fixed-size byte arrays for IDs/PubKeys.

**Matching**: Two methods:
- `Matches(event)`: full matching including timestamp constraints
- `MatchesIgnoringTimestampConstraints(event)`: `[//go:inline]` optimized, used for live events after EOSE

Tag matching via `tags.ContainsAny(tagName, values)`. Uses `slices.Contains()` for array membership.

**Utilities**: `Clone()` deep copies. `FilterEqual()` order-independent comparison. `GetTheoreticalLimit()`
estimates max results considering replaceability. No merging functions.

**Serialization**: Custom easyjson codec with `xhex` for fast hex encoding. `LimitZero` bool
distinguishes `"limit": 0` from omitted limit.

**Patterns**: Pure data structure with stateless methods. Subscription switches matching function
after EOSE. Query optimizer scores tags by "goodness" for index selection.
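
The two-method split and the switch after EOSE can be sketched in Rust as follows. This is an illustration of the pattern, not a port of nostrlib: `Filter`, `Subscription`, `mark_eose`, and `accepts` are hypothetical names, only `kinds` is checked to keep the example short, and an empty `kinds` vector is assumed to mean "no constraint".

```rust
pub struct Event {
    pub kind: u16,
    pub created_at: u64,
}

#[derive(Default)]
pub struct Filter {
    /// Empty means "no constraint" in this sketch.
    pub kinds: Vec<u16>,
    pub since: Option<u64>,
    pub until: Option<u64>,
}

impl Filter {
    /// Checks everything except `since` and `until`.
    #[inline]
    pub fn matches_ignoring_timestamps(&self, event: &Event) -> bool {
        self.kinds.is_empty() || self.kinds.contains(&event.kind)
    }

    /// Full match: the check above plus inclusive timestamp bounds.
    pub fn matches(&self, event: &Event) -> bool {
        self.matches_ignoring_timestamps(event)
            && self.since.map_or(true, |s| event.created_at >= s)
            && self.until.map_or(true, |u| event.created_at <= u)
    }
}

/// Hypothetical subscription that relaxes matching once EOSE has arrived.
pub struct Subscription {
    filter: Filter,
    eose_received: bool,
}

impl Subscription {
    pub fn new(filter: Filter) -> Self {
        Subscription { filter, eose_received: false }
    }

    pub fn mark_eose(&mut self) {
        self.eose_received = true;
    }

    /// Stored events are checked in full; live events skip the time window.
    pub fn accepts(&self, event: &Event) -> bool {
        if self.eose_received {
            self.filter.matches_ignoring_timestamps(event)
        } else {
            self.filter.matches(event)
        }
    }
}
```

In Rust the `limit: 0` versus omitted distinction that nostrlib tracks with `LimitZero` would more naturally be an `Option<usize>`, where `Some(0)` and `None` are already distinct values.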

### nostr-tools

**Types**: Simple TypeScript type with all NIP-01 fields. Index signature `` [key: `#${string}`] `` for
dynamic tag filters. All properties optional.

**Matching**: `matchFilter(filter, event)` performs conjunctive (AND) matching. Uses `indexOf()` for
membership. Iterates filter properties for `#`-prefixed tag filters. Both `since` and `until` are
inclusive. `matchFilters` implements OR across array.

**Utilities**:
- `mergeFilters(...filters)`: unions array properties, takes max `limit`, min `since`, max `until`
- `getFilterLimit(filter)`: computes intrinsic limit considering replaceability:
  - Empty arrays → 0
  - IDs → `ids.length`
  - Replaceable kinds → `authors.length * kinds.length`
  - Addressable kinds → `authors.length * kinds.length * #d.length`
  - Returns minimum across all applicable constraints

**Patterns**: Minimalist, functional, no external dependencies for filter logic. Pure JavaScript.
Early exit on mismatch. Kind classification integration for limit calculation.

**Design**: Self-contained. No validation. `search` field defined but not used in matching logic.
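
The intrinsic-limit rules listed under `getFilterLimit` translate fairly directly into Rust. The sketch below follows those bullet points rather than the nostr-tools source: `Filter`, `filter_limit`, and the helper names are hypothetical, `None` stands for "unbounded", and the kind ranges are the NIP-01 ones (replaceable: 0, 3, 10000 to 19999; addressable: 30000 to 39999).

```rust
#[derive(Default)]
pub struct Filter {
    pub ids: Option<Vec<String>>,
    pub authors: Option<Vec<String>>,
    pub kinds: Option<Vec<u16>>,
    /// Values of the `#d` tag filter.
    pub d_tags: Option<Vec<String>>,
    pub limit: Option<usize>,
}

fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

/// True when a field is present but lists no values, so nothing can match.
fn is_empty_set<T>(field: &Option<Vec<T>>) -> bool {
    matches!(field, Some(values) if values.is_empty())
}

/// Upper bound on the number of events the filter can return.
/// `None` means no bound can be derived.
pub fn filter_limit(f: &Filter) -> Option<usize> {
    if is_empty_set(&f.ids)
        || is_empty_set(&f.authors)
        || is_empty_set(&f.kinds)
        || is_empty_set(&f.d_tags)
    {
        return Some(0);
    }

    let mut bounds: Vec<usize> = Vec::new();
    if let Some(limit) = f.limit {
        bounds.push(limit);
    }
    // At most one event per id.
    if let Some(ids) = &f.ids {
        bounds.push(ids.len());
    }
    if let (Some(authors), Some(kinds)) = (&f.authors, &f.kinds) {
        // One replaceable event per (author, kind) pair.
        if kinds.iter().all(|k| is_replaceable(*k)) {
            bounds.push(authors.len() * kinds.len());
        }
        // One addressable event per (author, kind, d-tag) triple.
        if let Some(d_tags) = &f.d_tags {
            if kinds.iter().all(|k| is_addressable(*k)) {
                bounds.push(authors.len() * kinds.len() * d_tags.len());
            }
        }
    }
    bounds.into_iter().min()
}
```

A profile query for two authors with `limit: 10` is bounded by 2 rather than 10, which is the kind of information a query planner or a "did the relay return everything?" check could use.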

### rust-nostr

**Types**: `Filter` struct with `Option<BTreeSet<T>>` for all set fields. Uses `BTreeSet` for
O(log n) lookups and deterministic serialization. `generic_tags: BTreeMap<SingleLetterTag, BTreeSet<String>>`
for dynamic tag filters.

**Matching**: `match_event(&self, event, opts)` with `MatchEventOptions` controlling which fields
to check (7 boolean flags). Individual match methods are `#[inline]`. Tag matching uses lazy-initialized
`event.tags.indexes()` (OnceCell pattern). NIP-50 search: case-insensitive substring via `.windows()`.

**Builder pattern**: Fluent chainable methods consuming `self`: `Filter::new().kind(k).author(pk)`.
Convenience methods for common tags: `.event()`, `.pubkey()`, `.hashtag()`, `.identifier()`,
`.coordinate()`. Generic `.custom_tag()` for arbitrary tags. Remove methods reset the field to
`None` if the set becomes empty.

**Option semantics**: `None` = no constraint (matches all). `Some(empty_set)` = matches nothing.
This distinction is explicitly documented (GitHub issue #302).

**Utilities**: `is_empty()`, `extract_public_keys()`. No merging/combining API; multiple filters
are handled at the protocol layer.

**Patterns**: no_std compatible (uses `alloc`). BTreeSet for deterministic ordering. Custom serde
with `#[serde(flatten)]` for generic tags.
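
The builder and the `None` versus `Some(empty_set)` distinction are easier to see side by side. The sketch below is modeled on the description above, not copied from rust-nostr: it uses plain `u16` and `String` instead of the crate's `Kind` and `PublicKey` types, and `remove_kind` and `match_kind` are illustrative method names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub kinds: Option<BTreeSet<u16>>,
    pub authors: Option<BTreeSet<String>>,
}

impl Filter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds a kind, creating the set on first use. Consumes and returns `self`
    /// so that calls can be chained.
    pub fn kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    pub fn author(mut self, pubkey: &str) -> Self {
        self.authors
            .get_or_insert_with(BTreeSet::new)
            .insert(pubkey.to_string());
        self
    }

    /// Removes a kind; an emptied set is reset to `None` so the filter does
    /// not silently turn into "matches nothing".
    pub fn remove_kind(mut self, kind: u16) -> Self {
        let now_empty = match self.kinds.as_mut() {
            Some(set) => {
                set.remove(&kind);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.kinds = None;
        }
        self
    }

    /// `None` matches every kind; `Some(empty)` matches none.
    pub fn match_kind(&self, kind: u16) -> bool {
        self.kinds.as_ref().map_or(true, |set| set.contains(&kind))
    }
}
```

With this shape, `Filter::new().kind(1).author("alice")` reads like the fluent API described above, and adding the same kind twice is harmless because the set deduplicates.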

### welshman

**Types**: Standard NIP-01 filter type. `neverFilter = {ids: []}` constant for "matches nothing."

**Matching**: Delegates to nostr-tools for NIP-01 matching. Extends with search: splits by
whitespace, case-insensitive, requires ALL terms match (AND logic).

**Utilities**:
- `getFilterId(filter)`: deterministic hash for deduplication (sort keys, join, hash)
- `calculateFilterGroup(filter)`: groups by matching space (structural fields vs temporal)
- `unionFilters(filters)`: groups by `calculateFilterGroup`, merges arrays within groups
- `intersectFilters(groups)`: Cartesian product across filter groups with intelligent merging (max `since`, min `until`, max `limit`, concatenate `search`)
- `getIdFilters(idsOrAddresses)`: converts mix of IDs and addresses to filters
- `getReplyFilters(events)`: generates filters for replies (#e for regular, #a for replaceable)
- `addRepostFilters(filters)`: adds repost kind variants
- `trimFilter(filter)`: caps array fields at 1000 items with random sampling
- `getFilterGenerality()`: heuristic score 0 (specific) to 1 (general)

**Patterns**: Functional composition. Immutable transformations. Hash-based deduplication.
Domain-driven builders for common query patterns.
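
The search extension described above (split on whitespace, ignore case, require every term) fits in a few lines of Rust. `matches_search` is a hypothetical name, and matching against the event content only is an assumption of this sketch.

```rust
/// True if every whitespace-separated term of `search` occurs somewhere in
/// `content`, ignoring case. An empty or all-whitespace search matches anything.
pub fn matches_search(search: &str, content: &str) -> bool {
    let haystack = content.to_lowercase();
    search
        .split_whitespace()
        .all(|term| haystack.contains(&term.to_lowercase()))
}
```

Because the terms are ANDed, adding words to a search can only narrow the result set, which matches how the other filter fields compose.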

## Common Patterns

1. **Type structure**: All implementations use optional fields. Missing = no constraint. The `#tag`
   convention for dynamic tag filters is universal.

2. **AND/OR semantics**: Universal agreement: AND across the fields of a single filter (with OR
   among the values listed for one field), OR across an array of filters. This is fundamental to
   NIP-01.

3. **Matching order**: Most implementations check the cheap top-level fields first (ids, kinds,
   authors) for early exit before the more expensive tag matching.

4. **Tag indexing**: Several implementations build cached indexes on events for efficient repeated
   matching (applesauce: Symbol-based Set cache; rust-nostr: OnceCell BTreeMap).

5. **No negation**: No implementation supports negative matching (NOT). This aligns with protocol
   design, where negation was rejected for relay performance reasons.

6. **Limit semantics**: `limit` is not a matching criterion; it is a result count constraint.
   Most matching functions ignore it. `getFilterLimit`/`GetTheoreticalLimit` compute intrinsic
   upper bounds based on kind replaceability.

7. **Merging**: Most implementations provide union-style filter merging. Array fields are unioned
   and deduplicated. `since` and `until` use min/max logic, but `limit` handling differs
   (applesauce takes the minimum, nostr-tools the maximum, ndk keeps filters with a `limit`
   separate).

8. **Option vs empty**: rust-nostr explicitly distinguishes `None` (no constraint) from
   `Some(empty)` (matches nothing). Other implementations handle this implicitly.
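
To make the merging pattern concrete, here is a minimal Rust sketch of a union-style merge. It is deliberately not a port of any library's `mergeFilters`: `Filter`, `union`, and `merge` are hypothetical names, `limit` is left out because the implementations disagree on it, and a field that is unconstrained on either side is assumed to stay unconstrained so that the result covers both inputs.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub authors: Option<BTreeSet<String>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
}

/// Union of two optional sets; the set type deduplicates for us.
fn union<T: Ord + Clone>(
    a: &Option<BTreeSet<T>>,
    b: &Option<BTreeSet<T>>,
) -> Option<BTreeSet<T>> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.union(y).cloned().collect()),
        // If either side has no constraint, the merged filter has none.
        _ => None,
    }
}

/// Builds one filter that matches every event either input matches.
pub fn merge(a: &Filter, b: &Filter) -> Filter {
    Filter {
        authors: union(&a.authors, &b.authors),
        kinds: union(&a.kinds, &b.kinds),
        // Earliest lower bound and latest upper bound widen the time window.
        since: match (a.since, b.since) {
            (Some(x), Some(y)) => Some(x.min(y)),
            _ => None,
        },
        until: match (a.until, b.until) {
            (Some(x), Some(y)) => Some(x.max(y)),
            _ => None,
        },
    }
}
```

The merged filter can match more than the two inputs combined (authors from one input paired with kinds from the other), so a caller that needs exact results would still re-check events against the original filters.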

## Considerations for Our Implementation

1. **Filter as a standalone primitive**: Frame filters independent of REQ messages. They're a
   general-purpose matching predicate over events.

2. **Struct design**: Use `Option<BTreeSet<T>>` following rust-nostr's approach, since it correctly
   models the distinction between "no constraint" and "empty constraint." BTreeSet gives
   deterministic serialization and O(log n) lookups.

3. **Matching function**: Implement `matches(&self, event: &Event) -> bool` with early exit on
   the cheap top-level fields (ids, kinds, authors, timestamps). Tag matching should use event
   tag indexes.

4. **Builder pattern**: Fluent API for construction: `Filter::new().kind(1).author(pk)`.
   Convenience methods for common tags (#e, #p, #d, #a, #t).

5. **Generic tag filters**: Support arbitrary single-letter tag filters via
   `BTreeMap<SingleLetterTag, BTreeSet<String>>` or similar.

6. **Serialization**: Custom JSON serialization to flatten generic tags as `#tag` keys. Handle
   `limit: 0` vs omitted limit.

7. **No merging in core**: Following rust-nostr, keep the filter primitive simple. Merging and
   combining can live in higher-level utilities if needed.

8. **Limit calculation**: Consider `getFilterLimit`-style intrinsic limit computation based on
   kind replaceability, which is useful for query optimization.

9. **Dependencies**: Filter should depend only on existing types (Event, EventId, Pubkey, Kind,
   Timestamp, Tags). Self-contained within coracle-lib.

10. **Test strategy**: Test matching logic thoroughly: all field types, AND semantics, tag
    matching, timestamp boundaries, empty sets vs None, edge cases.
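
As a starting point for items 2, 3, and 5, the matching predicate might look roughly like the sketch below. Everything here is a placeholder: `Event` and `Filter` are simplified stand-ins for the real coracle-lib types (plain `String`, `u16`, and `char` instead of `Pubkey`, `Kind`, and `SingleLetterTag`), `ids` is omitted for brevity, and tags are scanned linearly rather than through an index.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the real event type.
pub struct Event {
    pub pubkey: String,
    pub kind: u16,
    pub created_at: u64,
    pub tags: Vec<Vec<String>>,
}

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub authors: Option<BTreeSet<String>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    /// Tag filters keyed by single-letter tag name: 'p' stands for `#p`.
    pub tags: BTreeMap<char, BTreeSet<String>>,
}

impl Filter {
    /// AND across fields, OR among the values of each field.
    /// `None` is "no constraint"; `Some(empty)` matches nothing.
    pub fn matches(&self, event: &Event) -> bool {
        // Cheap checks first so most mismatches exit early.
        if let Some(kinds) = &self.kinds {
            if !kinds.contains(&event.kind) {
                return false;
            }
        }
        if let Some(authors) = &self.authors {
            if !authors.contains(&event.pubkey) {
                return false;
            }
        }
        // Both timestamp bounds are inclusive.
        if self.since.map_or(false, |s| event.created_at < s) {
            return false;
        }
        if self.until.map_or(false, |u| event.created_at > u) {
            return false;
        }
        // Every tag filter must be satisfied by at least one event tag.
        self.tags.iter().all(|(name, wanted)| {
            event.tags.iter().any(|tag| {
                tag.len() >= 2
                    && tag[0].chars().eq(std::iter::once(*name))
                    && wanted.contains(&tag[1])
            })
        })
    }
}

/// OR across filters: one matching filter is enough.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
    filters.iter().any(|f| f.matches(event))
}
```

The cases in item 10 map directly onto this function: a boundary test with `since == until == created_at`, a `Some(empty)` set that must reject everything, and a two-filter array where only one filter matches.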