Add filters chapter

Jon Staab
2026-04-21 12:08:55 -07:00
parent c8f6bc1652
commit a8a57a3d77
6 changed files with 1876 additions and 36 deletions
# Plan: Filters
## Topic Summary
Filters are the NIP-01 data structure for matching events. They are useful well beyond REQ
messages: a filter is a general-purpose predicate for matching and querying events, independent
of any client/relay context. The chapter covers the filter structure, matching semantics (AND
within a filter, OR across filters), tag filters, timestamp constraints, limits, construction,
serialization, hashing, grouping, and cardinality estimation.
## Chapter Outline
1. **Introduction** — Filters as a general-purpose event matching primitive. Not tied to relays;
they're a predicate you can evaluate against any event. Analogy to database WHERE clauses.
2. **The Filter Struct** — Walk through the fields:
- `ids: Option<BTreeSet<[u8; 32]>>` — match event IDs
- `authors: Option<BTreeSet<PublicKey>>` — match event authors
- `kinds: Option<BTreeSet<u16>>` — match event kinds
- `tags: BTreeMap<String, BTreeSet<String>>` — match tag values by tag name
- `since: Option<u64>` — lower bound on `created_at` (inclusive)
- `until: Option<u64>` — upper bound on `created_at` (inclusive)
- `limit: Option<usize>` — result count constraint (not a matching criterion)
Explain `Option` semantics: `None` = no constraint, `Some(empty set)` = matches nothing.
`tags` needs no `Option` because an absent key already means no constraint on that tag.
Note that `limit` is metadata for consumers, not part of matching logic.
3. **Matching** — Implement `matches(&self, event: &Event) -> bool`:
- AND semantics: all present fields must match
- Early exit on scalar checks (ids, kinds, authors) before tag matching
- Tag matching: for each tag filter, event must have at least one tag with that name
whose value is in the filter's set (OR within a tag filter, AND across tag filters)
- Timestamp: `since <= created_at <= until`
- `limit` is ignored
- Implement `matches_any(filters: &[Filter], event: &Event) -> bool` as a free function
for OR-across-filters semantics
4. **Construction** — Builder pattern with fluent API:
- `Filter::new()` — empty filter (matches everything)
- `.id(id)` / `.ids(iter)` — add event IDs
- `.author(pk)` / `.authors(iter)` — add authors
- `.kind(k)` / `.kinds(iter)` — add kinds
- `.tag(name, value)` / `.tags(name, iter)` — add arbitrary tag filters
- `.since(ts)` / `.until(ts)` — set timestamp bounds
- `.limit(n)` — set result limit
- `.address(addr)` — convenience: sets kind, author, and `#d` tag from an Address
5. **Serialization** — Custom serde implementation:
- Standard fields serialize normally, skip `None` fields
- `tags` BTreeMap flattened: key `"foo"` becomes JSON key `"#foo"` with array value
- Distinguish `limit: 0` from an omitted limit: `Some(0)` serializes as `"limit": 0`, `None` omits the key
- Deserialization: any key starting with `#` collected into `tags` map
- Show round-trip example
6. **Identity and Grouping** — Utilities for deduplication and merging:
- `filter_id(filter) -> String` — deterministic hash of filter contents for dedup
- `filter_group(filter) -> String` — hash of the filter's shape only: which of `ids`, `kinds`,
and `authors` are present, plus the tag keys, excluding all values and the temporal fields.
Two filters in the same group can be merged by unioning their value sets.
7. **Cardinality** — `cardinality(&self) -> Option<usize>`:
- Returns `Some(n)` when the maximum number of matching events can be determined
- `ids` present → `ids.len()`
- `kinds` present and all replaceable + `authors` present → `authors.len() * kinds.len()`
- `kinds` present and all addressable + `authors` present + `#d` present →
`authors.len() * kinds.len() * d_values.len()`
- Otherwise → `None` (unbounded)
- If explicit `limit` is set, return `min(limit, computed)` when computed is Some,
or `Some(limit)` when computed is None
- Empty set in any field → `Some(0)`; this check takes precedence over the rules above
8. **Recap** — Summarize filter as a composable primitive. Tease usage in relay connections
chapter.
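The struct and matching rules from items 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the chapter's final code: `Event` is a simplified stand-in that carries only the fields matching needs, and `PublicKey` is aliased to a raw 32-byte array.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the key type from earlier chapters.
type PublicKey = [u8; 32];

// Hypothetical, simplified event: only the fields that matching inspects.
pub struct Event {
    pub id: [u8; 32],
    pub pubkey: PublicKey,
    pub kind: u16,
    pub created_at: u64,
    // Each tag is [name, value, ...extra].
    pub tags: Vec<Vec<String>>,
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
}

impl Filter {
    pub fn matches(&self, event: &Event) -> bool {
        // Cheap set checks first. `None` means "no constraint";
        // `Some(empty)` can never contain the event's value.
        if self.ids.as_ref().is_some_and(|ids| !ids.contains(&event.id)) {
            return false;
        }
        if self.kinds.as_ref().is_some_and(|kinds| !kinds.contains(&event.kind)) {
            return false;
        }
        if self.authors.as_ref().is_some_and(|a| !a.contains(&event.pubkey)) {
            return false;
        }
        // Both timestamp bounds are inclusive.
        if self.since.is_some_and(|since| event.created_at < since) {
            return false;
        }
        if self.until.is_some_and(|until| event.created_at > until) {
            return false;
        }
        // AND across tag filters, OR within each one. `limit` is never consulted.
        self.tags.iter().all(|(name, values)| {
            event.tags.iter().any(|tag| {
                tag.first() == Some(name) && tag.get(1).is_some_and(|v| values.contains(v))
            })
        })
    }
}

// OR across filters: the event passes if any single filter accepts it.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
    filters.iter().any(|f| f.matches(event))
}
```

In this sketch `Filter::default()` matches every event, while a filter whose `kinds` is `Some` of an empty set matches none, which is the `None` versus `Some(empty)` distinction from item 2.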
## API Design
```rust
// --- Filter struct ---

// Serialize/Deserialize are implemented by hand so that `tags` can be
// flattened into `#key` fields (see Design Decisions); they are not derived.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    // Flattened in serde as #key -> [values]
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// --- Construction (builder, consuming self) ---

impl Filter {
    pub fn new() -> Self
    pub fn id(self, id: [u8; 32]) -> Self
    pub fn ids(self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self
    pub fn author(self, author: PublicKey) -> Self
    pub fn authors(self, authors: impl IntoIterator<Item = PublicKey>) -> Self
    pub fn kind(self, kind: u16) -> Self
    pub fn kinds(self, kinds: impl IntoIterator<Item = u16>) -> Self
    pub fn tag(self, name: impl Into<String>, value: impl Into<String>) -> Self
    pub fn tags(self, name: impl Into<String>, values: impl IntoIterator<Item = impl Into<String>>) -> Self
    pub fn since(self, since: u64) -> Self
    pub fn until(self, until: u64) -> Self
    pub fn limit(self, limit: usize) -> Self
    pub fn address(self, addr: &Address) -> Self
}

// --- Matching ---

impl Filter {
    pub fn matches(&self, event: &Event) -> bool
    pub fn cardinality(&self) -> Option<usize>
}

pub fn matches_any(filters: &[Filter], event: &Event) -> bool

// --- Identity and grouping ---

pub fn filter_id(filter: &Filter) -> String
pub fn filter_group(filter: &Filter) -> String
```
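To illustrate how the consuming-`self` builder signatures above might be filled in, here is a sketch of a few of them. The method bodies are assumptions, as are the `PublicKey` alias, the shape of `Address` (fields `kind`, `pubkey`, `identifier`), and the added `Default` derive; only the signatures come from the plan.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the key type from earlier chapters.
type PublicKey = [u8; 32];

// Hypothetical shape of the Address type consumed by `.address()`.
pub struct Address {
    pub kind: u16,
    pub pubkey: PublicKey,
    pub identifier: String,
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    // An empty filter: every field is unconstrained.
    pub fn new() -> Self {
        Self::default()
    }

    // Singular methods add to the set, creating it on first use.
    pub fn kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    // Plural methods extend the set; an empty iterator yields `Some(empty)`.
    pub fn kinds(mut self, kinds: impl IntoIterator<Item = u16>) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).extend(kinds);
        self
    }

    pub fn author(mut self, author: PublicKey) -> Self {
        self.authors.get_or_insert_with(BTreeSet::new).insert(author);
        self
    }

    pub fn tag(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
        self.tags.entry(name.into()).or_default().insert(value.into());
        self
    }

    pub fn since(mut self, since: u64) -> Self {
        self.since = Some(since);
        self
    }

    pub fn limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }

    // Sets kind, author, and the `d` tag filter in one call.
    pub fn address(self, addr: &Address) -> Self {
        self.kind(addr.kind)
            .author(addr.pubkey)
            .tag("d", addr.identifier.clone())
    }
}
```

Under these assumptions, `Filter::new().kind(1).kind(7).tag("e", "abc").limit(10)` accumulates both kinds into one set. One consequence of this design is that calling `.kinds(...)` with an empty iterator produces `Some(empty)`, a filter that matches nothing, so callers building filters from dynamic lists may want to check for emptiness first.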
## Code Organization
All code in `coracle-lib/src/filters.rs`. Single file, single module. Add `pub mod filters;`
to `coracle-lib/src/lib.rs`.
## Dependencies
- `serde` / `serde_json` — already used in the events chapter for serialization
- `std::collections::BTreeSet` / `BTreeMap` — stdlib, no external crate
- `sha2` — already used in events chapter for hashing; reuse for `filter_id` and `filter_group`
No new external dependencies needed.
## Narrative Notes
- Open by framing filters as a standalone primitive. They're a predicate, not a protocol
message. The fact that relays use them in REQ is one application, but they're equally
useful for client-side filtering, local storage queries, and event routing decisions.
- The `Option` semantics deserve careful explanation. Show the difference:
`None` = "I don't care about this field" vs `Some(empty)` = "this field must match
one of these zero values (i.e., nothing matches)". This is the key insight that makes
filters composable.
- When explaining matching, walk through a concrete example: construct a filter, show an
event, trace through the matching logic field by field.
- For tag filters, emphasize that the struct accepts arbitrary string keys, not only single
letters. NIP-01 describes just the `#<single-letter>` filter keys because those are the tags
relays are expected to index; the struct does not enforce that convention.
- `limit` gets a brief note: it's not part of matching. It tells a consumer (relay, storage
engine) how many results to return. Include it in the struct because it's part of the
NIP-01 filter object, but `matches()` ignores it.
- For serialization, the interesting part is the tag flattening. Show the JSON representation
and explain how `tags: {"e": {"abc"}, "p": {"def"}}` becomes `{"#e": ["abc"], "#p": ["def"]}`.
- `filter_id` and `filter_group` are utility functions, not methods, because they serve
infrastructure concerns (dedup, subscription management) rather than core filter semantics.
- `cardinality` leverages kind classification from the kinds chapter. Connect the dots:
replaceable events have at most one per author per kind, addressable events have at most
one per author per kind per identifier.
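The cardinality rules from the outline, together with the kind classification mentioned in the last note, can be sketched as below. This is a sketch under assumptions: the `is_replaceable` and `is_addressable` helpers are hypothetical stand-ins for whatever the kinds chapter provides (the ranges follow NIP-01), `PublicKey` is aliased to a byte array, and the empty-set check is performed first.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the key type from earlier chapters.
type PublicKey = [u8; 32];

// Hypothetical helpers; the ranges follow NIP-01's kind classification.
fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    pub fn cardinality(&self) -> Option<usize> {
        // An empty set in any field means nothing can match.
        let has_empty_field = self.ids.as_ref().is_some_and(|s| s.is_empty())
            || self.authors.as_ref().is_some_and(|s| s.is_empty())
            || self.kinds.as_ref().is_some_and(|s| s.is_empty())
            || self.tags.values().any(|s| s.is_empty());
        if has_empty_field {
            return Some(0);
        }

        let computed = match (&self.ids, &self.authors, &self.kinds) {
            // Event IDs are unique, so each ID matches at most one event.
            (Some(ids), _, _) => Some(ids.len()),
            // Replaceable: at most one event per author per kind.
            (None, Some(authors), Some(kinds)) if kinds.iter().all(|k| is_replaceable(*k)) => {
                Some(authors.len() * kinds.len())
            }
            // Addressable: at most one per author per kind per `d` value,
            // so the bound only exists when a `d` tag filter is present.
            (None, Some(authors), Some(kinds)) if kinds.iter().all(|k| is_addressable(*k)) => {
                self.tags.get("d").map(|d| authors.len() * kinds.len() * d.len())
            }
            _ => None,
        };

        // An explicit limit caps the bound, or supplies one when none exists.
        match (computed, self.limit) {
            (Some(n), Some(limit)) => Some(n.min(limit)),
            (Some(n), None) => Some(n),
            (None, limit) => limit,
        }
    }
}
```

For example, two authors with kinds 0 and 10002 give a bound of 4, while the same authors with kind 30023 stay unbounded until a `d` tag filter is added.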
## Design Decisions
1. **`Option<BTreeSet<T>>` for set fields** — Preserves the None-vs-empty distinction that
NIP-01 requires. BTreeSet gives O(log n) membership checks and deterministic iteration
order for serialization/hashing. (Research: rust-nostr uses this approach.)
2. **Arbitrary string tag keys** — Not restricted to single letters. The protocol allows any
tag name; single-letter indexing is a relay optimization. Consumers can enforce restrictions.
3. **Minimal builder API** — `.id()`, `.author()`, `.kind()`, `.tag()`, `.address()` plus
plural variants. No convenience methods for every common tag (#e, #p, #t, etc.) — the
generic `.tag("e", value)` is clear enough. Keeps the chapter focused.
4. **`limit` in struct but not in matching** — NIP-01 defines it as part of the filter object,
so it belongs in the struct. But it's a result constraint, not a predicate, so `matches()`
ignores it. (Research: the implementations surveyed, including NDK and nostr-tools, agree on this.)
5. **Free functions for identity/grouping** — `filter_id` and `filter_group` are not methods
because they serve infrastructure concerns. Keeps the Filter impl block focused on
construction and matching.
6. **`cardinality` returns `Option<usize>`** — `None` means unbounded. Leverages kind
classification (replaceable, addressable) to compute tight upper bounds when possible.
(Research: nostr-tools' `getFilterLimit`, nostrlib's `GetTheoreticalLimit`.)
7. **Custom serde for tag flattening** — Tags serialize as `#name` keys at the top level of
the JSON object, matching the NIP-01 wire format. This requires custom Serialize/Deserialize
implementations rather than derive macros.
8. **`.address()` convenience** — Translates an Address into the correct combination of kind,
author, and #d tag filter. This is the one domain-aware convenience method because
address-based filtering is extremely common and error-prone to construct manually.
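The tag flattening behind decision 7 comes down to a key mapping, which can be sketched without serde. This is only an illustration of the mapping: the `WireFields` intermediate type and both function names are hypothetical, and a real implementation would presumably do the same work inside hand-written `Serialize` and `Deserialize` impls.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Tags = BTreeMap<String, BTreeSet<String>>;

// Hypothetical intermediate form: top-level JSON keys mapped to string arrays.
type WireFields = BTreeMap<String, Vec<String>>;

// Serialization direction: tag name `foo` becomes top-level key `#foo`.
pub fn flatten_tags(tags: &Tags) -> WireFields {
    tags.iter()
        .map(|(name, values)| (format!("#{name}"), values.iter().cloned().collect()))
        .collect()
}

// Deserialization direction: every key starting with `#` is collected
// into the tags map; all other keys are skipped here.
pub fn collect_tags(fields: &WireFields) -> Tags {
    fields
        .iter()
        .filter_map(|(key, values)| {
            let name = key.strip_prefix('#')?;
            Some((name.to_string(), values.iter().cloned().collect()))
        })
        .collect()
}
```

Because both directions go through `BTreeMap` and `BTreeSet`, key and value order is deterministic, which is what makes a round trip comparable with plain equality.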
## Open Questions
- Should `filter_group` include tag *names* (keys) in the group hash, or only the set of
field names that are present? Including tag names means `{#e: [...]}` and `{#p: [...]}`
are in different groups (correct for merging). Leaning toward including tag names.
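To make the trade-off in this question concrete, here is a sketch of the variant that includes tag names. It is an assumption-laden illustration: it returns a readable comma-joined key instead of a hash so the grouping is visible, and the field labels are made up for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// Hypothetical variant: the group key lists which fields are present plus
// the tag names. Values and temporal fields are deliberately left out.
pub fn filter_group(filter: &Filter) -> String {
    let mut parts: Vec<String> = Vec::new();
    if filter.ids.is_some() {
        parts.push("ids".to_string());
    }
    if filter.authors.is_some() {
        parts.push("authors".to_string());
    }
    if filter.kinds.is_some() {
        parts.push("kinds".to_string());
    }
    // BTreeMap yields keys in sorted order, so the key is deterministic.
    parts.extend(filter.tags.keys().map(|name| format!("#{name}")));
    parts.join(",")
}
```

With this variant, `{kinds, #e}` filters share a group regardless of their values, while `{kinds, #e}` and `{kinds, #p}` land in different groups and are never merged. A version that hashes the key would also need an unambiguous delimiter, since tag names are arbitrary strings.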