Add filters chapter
# Plan: Filters

## Topic Summary

Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context: not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter covers the filter structure,
matching semantics (AND within a filter, OR across filters), tag filters, timestamp constraints,
limits, construction, hashing, grouping, and cardinality estimation.

## Chapter Outline

1. **Introduction**: Filters as a general-purpose event matching primitive. Not tied to relays;
   they're a predicate you can evaluate against any event. Analogy to database WHERE clauses.

2. **The Filter Struct**: Walk through the fields:
   - `ids: Option<BTreeSet<[u8; 32]>>`: match event IDs
   - `authors: Option<BTreeSet<PublicKey>>`: match event authors
   - `kinds: Option<BTreeSet<u16>>`: match event kinds
   - `tags: BTreeMap<String, BTreeSet<String>>`: match tag values by tag name
   - `since: Option<u64>`: lower bound on `created_at` (inclusive)
   - `until: Option<u64>`: upper bound on `created_at` (inclusive)
   - `limit: Option<usize>`: result count constraint (not a matching criterion)

   Explain `Option` semantics: `None` = no constraint, `Some(empty set)` = matches nothing.
   Note that `limit` is metadata for consumers, not part of matching logic.

3. **Matching**: Implement `matches(&self, event: &Event) -> bool`:
   - AND semantics: all present fields must match
   - Early exit on scalar checks (ids, kinds, authors) before tag matching
   - Tag matching: for each tag filter, event must have at least one tag with that name
     whose value is in the filter's set (OR within a tag filter, AND across tag filters)
   - Timestamp: `since <= created_at <= until`
   - `limit` is ignored
   - Implement `matches_any(filters: &[Filter], event: &Event) -> bool` as a free function
     for OR-across-filters semantics

4. **Construction**: Builder pattern with fluent API:
   - `Filter::new()`: empty filter (matches everything)
   - `.id(id)` / `.ids(iter)`: add event IDs
   - `.author(pk)` / `.authors(iter)`: add authors
   - `.kind(k)` / `.kinds(iter)`: add kinds
   - `.tag(name, value)` / `.tags(name, iter)`: add arbitrary tag filters
   - `.since(ts)` / `.until(ts)`: set timestamp bounds
   - `.limit(n)`: set result limit
   - `.address(addr)`: convenience that sets kind, author, and `#d` tag from an Address

5. **Serialization**: Custom serde implementation:
   - Standard fields serialize normally, skip `None` fields
   - `tags` BTreeMap flattened: key `"foo"` becomes JSON key `"#foo"` with array value
   - Handle `limit: 0` vs omitted limit (`Some(0)` serializes as `"limit": 0`)
   - Deserialization: any key starting with `#` collected into `tags` map
   - Show round-trip example

6. **Identity and Grouping**: Utilities for deduplication and merging:
   - `filter_id(filter) -> String`: deterministic hash of filter contents for dedup
   - `filter_group(filter) -> String`: hash of structural fields only (ids, kinds, authors,
     tag keys) excluding values and temporal fields. Two filters in the same group can be
     merged by unioning their value sets.

7. **Cardinality**: `cardinality(&self) -> Option<usize>`:
   - Returns `Some(n)` when the maximum number of matching events can be determined
   - `ids` present → `ids.len()`
   - All kinds are replaceable + `authors` present → `authors.len() * kinds.len()`
   - All kinds are addressable + `authors` present + `#d` present →
     `authors.len() * kinds.len() * d_values.len()`
   - Otherwise → `None` (unbounded)
   - If explicit `limit` is set, return `min(limit, computed)` when computed is Some,
     or `Some(limit)` when computed is None
   - Empty set in any field → `Some(0)`

8. **Recap**: Summarize filter as a composable primitive. Tease usage in relay connections
   chapter.

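The matching rules from item 3 can be sketched roughly as follows. This is a minimal illustration rather than the chapter's final code: the `Event` struct here is a hypothetical stand-in for the type from the events chapter, and authors are raw `[u8; 32]` bytes instead of the planned `PublicKey` type so the block only needs the standard library.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for the Event type defined in the events chapter.
pub struct Event {
    pub id: [u8; 32],
    pub pubkey: [u8; 32], // assumption: raw bytes instead of PublicKey
    pub kind: u16,
    pub created_at: u64,
    pub tags: Vec<Vec<String>>, // each tag is [name, value, ...]
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    /// AND across all present fields; `limit` is deliberately ignored.
    pub fn matches(&self, event: &Event) -> bool {
        // Cheap scalar checks first. `None` means "no constraint", while
        // `Some(empty)` can never contain the value and so matches nothing.
        if let Some(ids) = &self.ids {
            if !ids.contains(&event.id) {
                return false;
            }
        }
        if let Some(kinds) = &self.kinds {
            if !kinds.contains(&event.kind) {
                return false;
            }
        }
        if let Some(authors) = &self.authors {
            if !authors.contains(&event.pubkey) {
                return false;
            }
        }
        // Both timestamp bounds are inclusive.
        if let Some(since) = self.since {
            if event.created_at < since {
                return false;
            }
        }
        if let Some(until) = self.until {
            if event.created_at > until {
                return false;
            }
        }
        // AND across tag filters, OR across the values within one tag filter.
        self.tags.iter().all(|(name, values)| {
            event
                .tags
                .iter()
                .any(|tag| tag.len() >= 2 && tag[0] == *name && values.contains(&tag[1]))
        })
    }
}

/// OR across filters: the event matches if any single filter matches.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
    filters.iter().any(|filter| filter.matches(event))
}
```

With this shape, `Filter::default()` matches every event, and `matches_any` over an empty slice matches none, which mirrors how an empty OR behaves.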
## API Design

```rust
// --- Filter struct ---

// Serialize/Deserialize are implemented by hand rather than derived, so that
// `tags` can be flattened into top-level `#name` keys (Design Decisions, item 7).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    // Flattened in serde as #key -> [values]
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// --- Construction (builder, consuming self) ---

impl Filter {
    pub fn new() -> Self
    pub fn id(self, id: [u8; 32]) -> Self
    pub fn ids(self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self
    pub fn author(self, author: PublicKey) -> Self
    pub fn authors(self, authors: impl IntoIterator<Item = PublicKey>) -> Self
    pub fn kind(self, kind: u16) -> Self
    pub fn kinds(self, kinds: impl IntoIterator<Item = u16>) -> Self
    pub fn tag(self, name: impl Into<String>, value: impl Into<String>) -> Self
    pub fn tags(self, name: impl Into<String>, values: impl IntoIterator<Item = impl Into<String>>) -> Self
    pub fn since(self, since: u64) -> Self
    pub fn until(self, until: u64) -> Self
    pub fn limit(self, limit: usize) -> Self
    pub fn address(self, addr: &Address) -> Self
}

// --- Matching ---

impl Filter {
    pub fn matches(&self, event: &Event) -> bool
    pub fn cardinality(&self) -> Option<usize>
}

pub fn matches_any(filters: &[Filter], event: &Event) -> bool

// --- Identity and grouping ---

pub fn filter_id(filter: &Filter) -> String
pub fn filter_group(filter: &Filter) -> String
```

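A few of the builder signatures above could be filled in along these lines. This is a sketch under stated assumptions, not the planned implementation: `Address` is a hypothetical struct with `kind`, `pubkey`, and `identifier` fields, and authors are raw `[u8; 32]` bytes in place of `PublicKey`. It shows the consuming-`self` style and that singular methods add to an existing set instead of replacing it.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical shape of an Address; the real type comes from another chapter.
pub struct Address {
    pub kind: u16,
    pub pubkey: [u8; 32], // assumption: raw bytes instead of PublicKey
    pub identifier: String,
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    /// An empty filter: every field is unconstrained, so it matches everything.
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds one kind, creating the set on first use.
    pub fn kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    /// Adds one author, creating the set on first use.
    pub fn author(mut self, author: [u8; 32]) -> Self {
        self.authors.get_or_insert_with(BTreeSet::new).insert(author);
        self
    }

    /// Adds one value under an arbitrary tag name (stored without the `#`).
    pub fn tag(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
        self.tags.entry(name.into()).or_default().insert(value.into());
        self
    }

    pub fn since(mut self, since: u64) -> Self {
        self.since = Some(since);
        self
    }

    pub fn limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }

    /// Convenience: kind + author + `#d` tag taken from an address.
    pub fn address(self, addr: &Address) -> Self {
        self.kind(addr.kind)
            .author(addr.pubkey)
            .tag("d", addr.identifier.clone())
    }
}
```

A call chain such as `Filter::new().kind(1).kind(30023).tag("e", "abc").limit(5)` then reads top to bottom, and the two `.kind()` calls leave both kinds in the set.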
## Code Organization

All code in `coracle-lib/src/filters.rs`. Single file, single module. Add `pub mod filters;`
to `coracle-lib/src/lib.rs`.

## Dependencies

- `serde` / `serde_json`: already used in the events chapter for serialization
- `std::collections::BTreeSet` / `BTreeMap`: stdlib, no external crate
- `sha2`: already used in events chapter for hashing; reuse for filter_id

No new external dependencies needed.

## Narrative Notes

- Open by framing filters as a standalone primitive. They're a predicate, not a protocol
  message. The fact that relays use them in REQ is one application, but they're equally
  useful for client-side filtering, local storage queries, and event routing decisions.

- The `Option` semantics deserve careful explanation. Show the difference:
  `None` = "I don't care about this field" vs `Some(empty)` = "this field must match
  one of these zero values (i.e., nothing matches)". This is the key insight that makes
  filters composable.

- When explaining matching, walk through a concrete example: construct a filter, show an
  event, trace through the matching logic field by field.

- For tag filters, emphasize that tag keys are arbitrary strings, not restricted to
  single letters. The single-letter convention is a relay indexing optimization, not a
  protocol constraint.

- `limit` gets a brief note: it's not part of matching. It tells a consumer (relay, storage
  engine) how many results to return. Include it in the struct because it's part of the
  NIP-01 filter object, but `matches()` ignores it.

- For serialization, the interesting part is the tag flattening. Show the JSON representation
  and explain how `tags: {"e": {"abc"}, "p": {"def"}}` becomes `{"#e": ["abc"], "#p": ["def"]}`.

- `filter_id` and `filter_group` are utility functions, not methods, because they serve
  infrastructure concerns (dedup, subscription management) rather than core filter semantics.

- `cardinality` leverages kind classification from the kinds chapter. Connect the dots:
  replaceable events have at most one per author per kind, addressable events have at most
  one per author per kind per identifier.

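The tag flattening described in the serialization note can be illustrated without serde. The sketch below is a standard-library-only stand-in, not the planned custom `Serialize` implementation: `to_json` is a hypothetical helper, the struct is trimmed to three fields, and it assumes tag names and values contain no characters that need JSON escaping.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Trimmed-down filter: just enough fields to show the wire shape.
#[derive(Default)]
pub struct Filter {
    pub kinds: Option<BTreeSet<u16>>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

/// Hypothetical helper that renders the NIP-01 wire shape by hand.
/// Assumption: no string here needs JSON escaping.
pub fn to_json(filter: &Filter) -> String {
    let mut fields: Vec<String> = Vec::new();

    // `None` is skipped entirely; `Some(empty)` still emits an empty array,
    // which keeps the None-vs-empty distinction on the wire.
    if let Some(kinds) = &filter.kinds {
        let items: Vec<String> = kinds.iter().map(|k| k.to_string()).collect();
        fields.push(format!("\"kinds\":[{}]", items.join(",")));
    }

    // `Some(0)` is emitted as `"limit":0`, distinct from an omitted limit.
    if let Some(limit) = filter.limit {
        fields.push(format!("\"limit\":{}", limit));
    }

    // Each tag entry becomes a top-level key with a `#` prefix.
    for (name, values) in &filter.tags {
        let items: Vec<String> = values.iter().map(|v| format!("\"{}\"", v)).collect();
        fields.push(format!("\"#{}\":[{}]", name, items.join(",")));
    }

    format!("{{{}}}", fields.join(","))
}
```

Because `BTreeMap` and `BTreeSet` iterate in sorted order, the output is deterministic, which is the same property the plan relies on for hashing in `filter_id`.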
## Design Decisions

1. **`Option<BTreeSet<T>>` for set fields**: Preserves the None-vs-empty distinction that
   NIP-01 requires. BTreeSet gives O(log n) membership checks and deterministic iteration
   order for serialization/hashing. (Research: rust-nostr uses this approach.)

2. **Arbitrary string tag keys**: Not restricted to single letters. The protocol allows any
   tag name; single-letter indexing is a relay optimization. Consumers can enforce restrictions.

3. **Minimal builder API**: `.id()`, `.author()`, `.kind()`, `.tag()`, `.address()` plus
   plural variants. No convenience methods for every common tag (#e, #p, #t, etc.), since the
   generic `.tag("e", value)` is clear enough. Keeps the chapter focused.

4. **`limit` in struct but not in matching**: NIP-01 defines it as part of the filter object,
   so it belongs in the struct. But it's a result constraint, not a predicate, so `matches()`
   ignores it. (Research: NDK, nostr-tools, all implementations agree on this.)

5. **Free functions for identity/grouping**: `filter_id` and `filter_group` are not methods
   because they serve infrastructure concerns. Keeps the Filter impl block focused on
   construction and matching.

6. **`cardinality` returns `Option<usize>`**: `None` means unbounded. Leverages kind
   classification (replaceable, addressable) to compute tight upper bounds when possible.
   (Research: nostr-tools' `getFilterLimit`, nostrlib's `GetTheoreticalLimit`.)

7. **Custom serde for tag flattening**: Tags serialize as `#name` keys at the top level of
   the JSON object, matching the NIP-01 wire format. This requires custom Serialize/Deserialize
   implementations rather than derive macros.

8. **`.address()` convenience**: Translates an Address into the correct combination of kind,
   author, and #d tag filter. This is the one domain-aware convenience method because
   address-based filtering is extremely common and error-prone to construct manually.

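The cardinality rules behind decision 6 might look roughly like this. Treat it as a sketch: `is_replaceable` and `is_addressable` are hypothetical stand-ins for the kind classification from the kinds chapter (using the NIP-01 kind ranges), authors are raw bytes instead of `PublicKey`, and the empty-set rule is checked first so it takes priority over every other case.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>, // assumption: raw bytes
    pub kinds: Option<BTreeSet<u16>>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// Hypothetical stand-ins for the kinds chapter helpers (NIP-01 ranges).
pub fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

pub fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

fn is_empty_set<T>(set: &Option<BTreeSet<T>>) -> bool {
    set.as_ref().map_or(false, |s| s.is_empty())
}

impl Filter {
    /// Upper bound on the number of matching events, or `None` if unbounded.
    pub fn cardinality(&self) -> Option<usize> {
        // An empty set in any field can never match anything.
        if is_empty_set(&self.ids)
            || is_empty_set(&self.authors)
            || is_empty_set(&self.kinds)
            || self.tags.values().any(|values| values.is_empty())
        {
            return Some(0);
        }

        let computed = if let Some(ids) = &self.ids {
            // Event IDs are unique, so at most one event per listed ID.
            Some(ids.len())
        } else if let (Some(authors), Some(kinds)) = (&self.authors, &self.kinds) {
            if kinds.iter().all(|k| is_replaceable(*k)) {
                // At most one event per author per kind.
                Some(authors.len() * kinds.len())
            } else if kinds.iter().all(|k| is_addressable(*k)) {
                // At most one per author per kind per `d` identifier;
                // without a `#d` filter the result stays unbounded.
                self.tags
                    .get("d")
                    .map(|d_values| authors.len() * kinds.len() * d_values.len())
            } else {
                None
            }
        } else {
            None
        };

        // An explicit limit caps the computed bound, or stands in for it.
        match (computed, self.limit) {
            (Some(n), Some(limit)) => Some(n.min(limit)),
            (Some(n), None) => Some(n),
            (None, limit) => limit,
        }
    }
}
```

One choice made here that the plan does not spell out: a filter mixing replaceable and addressable kinds falls through to `None`, since neither "all kinds" rule applies.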
## Open Questions

- Should `filter_group` include tag *names* (keys) in the group hash, or only the set of
  field names that are present? Including tag names means `{#e: [...]}` and `{#p: [...]}`
  are in different groups (correct for merging). Leaning toward including tag names.

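To make the option the plan leans toward concrete, here is a small sketch of a group key that includes tag names. It is only one possible reading of the question: `filter_group_key` is a hypothetical name, it returns the plain key string that a real `filter_group` would presumably hash with `sha2`, and it assumes "structural" means which fields are present plus the tag names, with all values, `since`, `until`, and `limit` left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>, // assumption: raw bytes
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

/// Hypothetical helper: the unhashed group key for a filter.
/// Only the presence of set fields and the tag names contribute.
pub fn filter_group_key(filter: &Filter) -> String {
    let mut parts: Vec<String> = Vec::new();
    if filter.ids.is_some() {
        parts.push("ids".to_string());
    }
    if filter.authors.is_some() {
        parts.push("authors".to_string());
    }
    if filter.kinds.is_some() {
        parts.push("kinds".to_string());
    }
    // BTreeMap iterates keys in sorted order, so the key is deterministic.
    for name in filter.tags.keys() {
        parts.push(format!("#{}", name));
    }
    parts.join(",")
}
```

Under this reading, two `#e` filters with different values share a key and are candidates for merging, while a `#e` filter and a `#p` filter get different keys.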