Add filters chapter

Jon Staab
2026-04-21 12:08:55 -07:00
parent c8f6bc1652
commit a8a57a3d77
6 changed files with 1876 additions and 36 deletions
+1 -1
@@ -108,7 +108,7 @@ belongs to us rather than a foreign crate.
///
/// This is the "name" half of a nostr identity. It's safe to log, share, and
/// store — it identifies an author but grants no ability to speak as them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PublicKey(secp256k1::XOnlyPublicKey);
impl PublicKey {
+820
@@ -0,0 +1,820 @@
# Filters
A filter is a predicate over events. Given an event, a filter answers one
question: does this event match? The protocol uses filters inside `REQ`
messages to tell relays what to send, but the data structure itself is
equally useful for querying a local database, routing events to the right
handler, or deciding which events to display — anywhere you need to select
a subset of events from a larger set.
A single filter expresses a conjunction: every field that is present must
match. An array of filters expresses a disjunction: the event must match
at least one. Between the two, you can describe any positive selection
over the event fields that nostr exposes — ids, authors, kinds, tags,
and time ranges. There is no negation; the protocol rejected it early on
to keep relay implementations simple.
## The module
```rust {file=coracle-lib/src/lib.rs}
pub mod filters;
```
```rust {file=coracle-lib/src/filters.rs}
//! Event filters: the [`Filter`] type for matching events by id, author,
//! kind, tags, and time range, plus utilities for hashing, grouping, and
//! merging filters.
use std::collections::{BTreeMap, BTreeSet};
use std::fmt;
use serde::de::{self, MapAccess, Visitor};
use serde::ser::SerializeMap;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use crate::addresses::Address;
use crate::events::Event;
use crate::keys::PublicKey;
```
## The `Filter` struct
Each field corresponds to one axis of selection. The set fields — `ids`,
`authors`, `kinds` — use `Option<BTreeSet<T>>`. The `Option` layer
carries meaning: `None` says "I have no constraint on this field" and
matches everything, while `Some(empty set)` says "the value must be a
member of this empty set" and matches nothing. The distinction matters
for composition — merging two filters into one relies on being able to
tell "don't care" from "impossible."
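A minimal sketch of that distinction, using a hypothetical `satisfies` helper that is not part of the chapter's code: an absent constraint accepts every value, while an empty set rejects every value.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not in `filters.rs`): does `value` satisfy an
/// optional set constraint?
fn satisfies<T: Ord>(constraint: &Option<BTreeSet<T>>, value: &T) -> bool {
    match constraint {
        // No constraint: every value passes.
        None => true,
        // A constraint, possibly empty: the value must be a member.
        Some(set) => set.contains(value),
    }
}
```

With `None` the call returns `true` for any value; with `Some(BTreeSet::new())` it returns `false` for any value.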
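`BTreeSet` rather than `HashSet` gives three things: membership checks
that need only `Ord` on the key type (O(log n), with no hasher involved),
deterministic iteration order for serialization and hashing, and a `Hash`
implementation, which `HashSet` lacks and which the `#[derive(Hash)]` on
`Filter` depends on.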
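As a rough illustration of the ordering point (the helper names `kinds_from`, `hash_of`, and `iteration_order` are made up for this sketch), two sets built in different insertion orders iterate identically and hash to the same value:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Build a set of kinds from a slice, in whatever order the slice has.
fn kinds_from(values: &[u16]) -> BTreeSet<u16> {
    values.iter().copied().collect()
}

/// Hash any `Hash` value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// The order in which a set built from `values` yields its elements.
fn iteration_order(values: &[u16]) -> Vec<u16> {
    kinds_from(values).into_iter().collect()
}
```

A `HashSet` could not be passed to `hash_of` at all, because the standard library does not implement `Hash` for it.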
The `tags` field is a `BTreeMap` from tag name to a set of acceptable
values. Tag names are arbitrary strings — not restricted to single
letters. Single-letter indexing is a relay optimization, not a protocol
constraint, and the filter type should not encode relay policy.
```rust {file=coracle-lib/src/filters.rs}
/// A predicate over nostr events.
///
/// Every present field must match for the filter to match (AND semantics).
/// An array of filters matches if any single filter matches (OR semantics).
///
/// `None` on a set field means "no constraint" — it matches any value.
/// `Some(empty set)` means "must be a member of the empty set" — it
/// matches nothing. This distinction is important for filter composition.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Filter {
/// Event IDs to match.
pub ids: Option<BTreeSet<[u8; 32]>>,
/// Author public keys to match.
pub authors: Option<BTreeSet<PublicKey>>,
/// Event kinds to match.
pub kinds: Option<BTreeSet<u16>>,
/// Tag filters: for each entry, the event must have at least one tag
/// with that name whose value appears in the set.
pub tags: BTreeMap<String, BTreeSet<String>>,
/// Lower bound on `created_at` (inclusive).
pub since: Option<u64>,
/// Upper bound on `created_at` (inclusive).
pub until: Option<u64>,
/// Maximum number of events a consumer should return. This is not a
/// matching criterion — [`matches`](Filter::matches) ignores it.
pub limit: Option<usize>,
}
```
The `limit` field is part of the NIP-01 filter object, so it belongs in
the struct. But it is a result-count constraint for consumers (relays,
storage engines), not a predicate over individual events. The `matches`
method ignores it entirely.
## Matching
Matching walks each present field and returns `false` as soon as one
fails. The set-membership checks (ids, kinds, authors) come first because
each is a single lookup and rejects most non-matching events
immediately, before the more involved tag iteration. The time bounds are
checked last.
Tag matching checks every entry in the filter's `tags` map. For each
tag name, the event must contain at least one tag with that name whose
value appears in the filter's set. Within a single tag name the values
are disjunctive (OR): any match suffices. Across tag names the
constraints are conjunctive (AND): all must be satisfied.
```rust {file=coracle-lib/src/filters.rs}
impl Filter {
/// Test whether an event satisfies this filter.
///
/// All present fields must match (AND semantics). The `limit` field
/// is ignored — it is a hint for result-set sizing, not a predicate.
pub fn matches(&self, event: &Event) -> bool {
if let Some(ids) = &self.ids {
if !ids.contains(&event.id) {
return false;
}
}
if let Some(kinds) = &self.kinds {
if !kinds.contains(&event.kind) {
return false;
}
}
if let Some(authors) = &self.authors {
if !authors.contains(&event.pubkey) {
return false;
}
}
for (name, values) in &self.tags {
let has_match = event
.tags
.find_all(name)
.any(|tag| values.contains(tag.value()));
if !has_match {
return false;
}
}
if let Some(since) = self.since {
if event.created_at < since {
return false;
}
}
if let Some(until) = self.until {
if event.created_at > until {
return false;
}
}
true
}
}
```
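The tag rule (OR within one tag name, AND across tag names) can be sketched in isolation. This is only an illustration: event tags are modelled as plain `(name, value)` pairs, and `TagFilter`, `tag_filter`, and `tags_match` are hypothetical names, not the chapter's `Event` or `Tags` API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Same shape as the `tags` field on `Filter`.
type TagFilter = BTreeMap<String, BTreeSet<String>>;

/// Build a tag filter from (name, value) pairs; repeated names accumulate.
fn tag_filter(pairs: &[(&str, &str)]) -> TagFilter {
    let mut map = TagFilter::new();
    for (name, value) in pairs {
        map.entry(name.to_string())
            .or_default()
            .insert(value.to_string());
    }
    map
}

/// Every filter entry must be satisfied (AND); an entry is satisfied by
/// any event tag with that name and an accepted value (OR).
fn tags_match(filter: &TagFilter, event_tags: &[(&str, &str)]) -> bool {
    filter.iter().all(|(name, values)| {
        event_tags
            .iter()
            .any(|(n, v)| *n == name.as_str() && values.contains(*v))
    })
}
```

An empty tag filter is satisfied by every event, since `all` over no entries is `true`.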
A convenience free function handles the common case of testing an event
against an array of filters — the OR-across-filters semantics that
NIP-01 defines for `REQ` subscriptions.
```rust {file=coracle-lib/src/filters.rs}
/// Test whether an event matches any filter in the slice.
///
/// Returns `true` if at least one filter matches. An empty slice matches
/// nothing.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
filters.iter().any(|f| f.matches(event))
}
```
## Construction
An empty filter matches everything — no constraints means no rejections.
The `add_*` methods insert values into the constraint sets, consuming and
returning `self` so calls can be chained. Matching `remove_*` methods
take values back out, and `clear_*` methods reset a field to `None` —
removing the constraint entirely.
```rust {file=coracle-lib/src/filters.rs}
impl Filter {
/// Create an empty filter that matches every event.
pub fn new() -> Self {
Filter {
ids: None,
authors: None,
kinds: None,
tags: BTreeMap::new(),
since: None,
until: None,
limit: None,
}
}
/// Add an event id to the constraint set.
pub fn add_id(mut self, id: [u8; 32]) -> Self {
self.ids.get_or_insert_with(BTreeSet::new).insert(id);
self
}
/// Add multiple event ids to the constraint set.
pub fn add_ids(mut self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self {
self.ids.get_or_insert_with(BTreeSet::new).extend(ids);
self
}
/// Remove an event id from the constraint set. If the set becomes
/// empty, it remains as `Some(empty)` — use `clear_ids` to remove
/// the constraint entirely.
pub fn remove_id(mut self, id: &[u8; 32]) -> Self {
if let Some(ids) = &mut self.ids {
ids.remove(id);
}
self
}
/// Remove the ids constraint, matching any event id.
pub fn clear_ids(mut self) -> Self {
self.ids = None;
self
}
/// Add an author to the constraint set.
pub fn add_author(mut self, author: PublicKey) -> Self {
self.authors
.get_or_insert_with(BTreeSet::new)
.insert(author);
self
}
/// Add multiple authors to the constraint set.
pub fn add_authors(mut self, authors: impl IntoIterator<Item = PublicKey>) -> Self {
self.authors
.get_or_insert_with(BTreeSet::new)
.extend(authors);
self
}
/// Remove an author from the constraint set.
pub fn remove_author(mut self, author: &PublicKey) -> Self {
if let Some(authors) = &mut self.authors {
authors.remove(author);
}
self
}
/// Remove the authors constraint, matching any author.
pub fn clear_authors(mut self) -> Self {
self.authors = None;
self
}
/// Add a kind to the constraint set.
pub fn add_kind(mut self, kind: u16) -> Self {
self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
self
}
/// Add multiple kinds to the constraint set.
pub fn add_kinds(mut self, kinds: impl IntoIterator<Item = u16>) -> Self {
self.kinds.get_or_insert_with(BTreeSet::new).extend(kinds);
self
}
/// Remove a kind from the constraint set.
pub fn remove_kind(mut self, kind: &u16) -> Self {
if let Some(kinds) = &mut self.kinds {
kinds.remove(kind);
}
self
}
/// Remove the kinds constraint, matching any kind.
pub fn clear_kinds(mut self) -> Self {
self.kinds = None;
self
}
/// Add a value to a tag filter: the event must have at least one tag
/// with this name whose value appears in the set.
pub fn add_tag(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
self.tags
.entry(name.into())
.or_default()
.insert(value.into());
self
}
/// Add multiple values to a tag filter.
pub fn add_tags(
mut self,
name: impl Into<String>,
values: impl IntoIterator<Item = impl Into<String>>,
) -> Self {
self.tags
.entry(name.into())
.or_default()
.extend(values.into_iter().map(Into::into));
self
}
/// Remove a value from a tag filter. If the value set becomes empty,
/// the tag entry is removed from the map, so unlike `remove_id` this
/// drops the constraint instead of leaving one that matches nothing.
pub fn remove_tag(mut self, name: &str, value: &str) -> Self {
if let Some(values) = self.tags.get_mut(name) {
values.remove(value);
if values.is_empty() {
self.tags.remove(name);
}
}
self
}
/// Remove an entire tag filter by name.
pub fn clear_tag(mut self, name: &str) -> Self {
self.tags.remove(name);
self
}
/// Remove all tag filters.
pub fn clear_tags(mut self) -> Self {
self.tags.clear();
self
}
/// Set the lower bound on `created_at` (inclusive).
pub fn add_since(mut self, since: u64) -> Self {
self.since = Some(since);
self
}
/// Remove the lower bound on `created_at`.
pub fn clear_since(mut self) -> Self {
self.since = None;
self
}
/// Set the upper bound on `created_at` (inclusive).
pub fn add_until(mut self, until: u64) -> Self {
self.until = Some(until);
self
}
/// Remove the upper bound on `created_at`.
pub fn clear_until(mut self) -> Self {
self.until = None;
self
}
/// Set the result-count limit.
pub fn add_limit(mut self, limit: usize) -> Self {
self.limit = Some(limit);
self
}
/// Remove the result-count limit.
pub fn clear_limit(mut self) -> Self {
self.limit = None;
self
}
}
```
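The difference between `remove_*` and `clear_*` is easy to miss, so here is a cut-down sketch. `KindFilter` is a hypothetical stand-in that keeps only the `kinds` field; its methods mirror the chapter's builder methods, and `matches_kind` is added purely for the demonstration.

```rust
use std::collections::BTreeSet;

/// Stand-in for `Filter`, reduced to the `kinds` field.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct KindFilter {
    kinds: Option<BTreeSet<u16>>,
}

impl KindFilter {
    fn add_kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    /// Removing the last value leaves `Some(empty)`, which matches nothing.
    fn remove_kind(mut self, kind: &u16) -> Self {
        if let Some(kinds) = &mut self.kinds {
            kinds.remove(kind);
        }
        self
    }

    /// Clearing resets the field to `None`, which matches everything.
    fn clear_kinds(mut self) -> Self {
        self.kinds = None;
        self
    }

    fn matches_kind(&self, kind: u16) -> bool {
        self.kinds.as_ref().map_or(true, |k| k.contains(&kind))
    }
}
```

After `add_kind(1).remove_kind(&1)` the filter rejects every kind; a further `clear_kinds()` makes it accept every kind again.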
### Address convenience
Filtering for an addressable event by its address is common enough — and
error-prone enough when done by hand — to warrant a dedicated method.
An address carries a kind, an author, and an identifier; the method
translates these into the corresponding filter fields.
```rust {file=coracle-lib/src/filters.rs}
impl Filter {
/// Add constraints that match events at the given address.
///
/// Sets the kind, author, and `d` tag filter from the address's
/// components. If the identifier is empty (plain replaceable events),
/// the `d` tag filter is still set, so the event must carry a `d` tag
/// with an empty value; a replaceable event with no `d` tag at all
/// will not match.
pub fn add_address(self, addr: &Address) -> Self {
self.add_kind(addr.kind)
.add_author(addr.pubkey)
.add_tag("d", &addr.identifier)
}
}
impl Default for Filter {
fn default() -> Self {
Filter::new()
}
}
```
A filter built with `.add_address()` looks like this:
```rust
use coracle_lib::filters::Filter;
use coracle_lib::addresses::Address;
let addr: Address = "30023:ab12...cd34:my-article".parse().unwrap();
let filter = Filter::new().add_address(&addr);
// Equivalent to:
// Filter::new().add_kind(30023).add_author(pubkey).add_tag("d", "my-article")
```
## Serialization
The NIP-01 wire format for a filter is a flat JSON object where tag
filters appear as keys prefixed with `#`:
```json
{
"kinds": [1],
"authors": ["ab12...cd34"],
"#t": ["nostr", "rust"],
"since": 1700000000,
"limit": 10
}
```
The `tags` map in our struct needs to be flattened into the top-level
object during serialization and reconstituted during deserialization.
This rules out a plain `#[derive(Serialize, Deserialize)]`, so we need a
hand-written implementation.
```rust {file=coracle-lib/src/filters.rs}
impl Serialize for Filter {
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
// Count present fields for the map size hint.
let mut count = self.tags.len();
if self.ids.is_some() {
count += 1;
}
if self.authors.is_some() {
count += 1;
}
if self.kinds.is_some() {
count += 1;
}
if self.since.is_some() {
count += 1;
}
if self.until.is_some() {
count += 1;
}
if self.limit.is_some() {
count += 1;
}
let mut map = serializer.serialize_map(Some(count))?;
if let Some(ids) = &self.ids {
let hex_ids: Vec<String> = ids.iter().map(hex::encode).collect();
map.serialize_entry("ids", &hex_ids)?;
}
if let Some(authors) = &self.authors {
let hex_authors: Vec<String> = authors.iter().map(|pk| pk.to_hex()).collect();
map.serialize_entry("authors", &hex_authors)?;
}
if let Some(kinds) = &self.kinds {
let kinds_vec: Vec<u16> = kinds.iter().copied().collect();
map.serialize_entry("kinds", &kinds_vec)?;
}
for (name, values) in &self.tags {
let key = format!("#{name}");
let vals: Vec<&str> = values.iter().map(String::as_str).collect();
map.serialize_entry(&key, &vals)?;
}
if let Some(since) = self.since {
map.serialize_entry("since", &since)?;
}
if let Some(until) = self.until {
map.serialize_entry("until", &until)?;
}
if let Some(limit) = self.limit {
map.serialize_entry("limit", &limit)?;
}
map.end()
}
}
```
Deserialization collects known keys into their fields and routes any key
starting with `#` into the `tags` map. Unknown keys are silently
ignored for forward compatibility.
```rust {file=coracle-lib/src/filters.rs}
impl<'de> Deserialize<'de> for Filter {
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
deserializer.deserialize_map(FilterVisitor)
}
}
struct FilterVisitor;
impl<'de> Visitor<'de> for FilterVisitor {
type Value = Filter;
fn expecting(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str("a nostr filter object")
}
fn visit_map<M: MapAccess<'de>>(self, mut map: M) -> Result<Filter, M::Error> {
let mut ids: Option<BTreeSet<[u8; 32]>> = None;
let mut authors: Option<BTreeSet<PublicKey>> = None;
let mut kinds: Option<BTreeSet<u16>> = None;
let mut tags: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
let mut since: Option<u64> = None;
let mut until: Option<u64> = None;
let mut limit: Option<usize> = None;
while let Some(key) = map.next_key::<String>()? {
match key.as_str() {
"ids" => {
let hex_ids: Vec<String> = map.next_value()?;
let mut set = BTreeSet::new();
for h in hex_ids {
let bytes = hex::decode(&h)
.map_err(|_| de::Error::custom("invalid hex in ids"))?;
let arr: [u8; 32] = bytes
.try_into()
.map_err(|_| de::Error::custom("id must be 32 bytes"))?;
set.insert(arr);
}
ids = Some(set);
}
"authors" => {
let hex_authors: Vec<String> = map.next_value()?;
let mut set = BTreeSet::new();
for h in hex_authors {
let pk = PublicKey::from_hex(&h)
.map_err(|_| de::Error::custom("invalid pubkey in authors"))?;
set.insert(pk);
}
authors = Some(set);
}
"kinds" => {
let kind_vec: Vec<u16> = map.next_value()?;
kinds = Some(kind_vec.into_iter().collect());
}
"since" => since = Some(map.next_value()?),
"until" => until = Some(map.next_value()?),
"limit" => limit = Some(map.next_value()?),
other if other.starts_with('#') => {
let tag_name = other[1..].to_string();
let values: Vec<String> = map.next_value()?;
tags.insert(tag_name, values.into_iter().collect());
}
_ => {
let _: de::IgnoredAny = map.next_value()?;
}
}
}
Ok(Filter {
ids,
authors,
kinds,
tags,
since,
until,
limit,
})
}
}
```
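The key-routing decision inside `visit_map` can be looked at on its own, without serde. The sketch below is illustrative only: `FilterKey` and `classify_key` are hypothetical names, and the real visitor reads the value immediately instead of returning a classification.

```rust
/// What a JSON object key means to the filter deserializer.
#[derive(Debug, PartialEq, Eq)]
enum FilterKey<'a> {
    Ids,
    Authors,
    Kinds,
    Since,
    Until,
    Limit,
    /// A `#`-prefixed key; carries the tag name without the prefix.
    Tag(&'a str),
    /// Anything else is skipped for forward compatibility.
    Unknown,
}

fn classify_key(key: &str) -> FilterKey<'_> {
    match key {
        "ids" => FilterKey::Ids,
        "authors" => FilterKey::Authors,
        "kinds" => FilterKey::Kinds,
        "since" => FilterKey::Since,
        "until" => FilterKey::Until,
        "limit" => FilterKey::Limit,
        other => match other.strip_prefix('#') {
            Some(name) => FilterKey::Tag(name),
            None => FilterKey::Unknown,
        },
    }
}
```

Because tag names are arbitrary strings, a multi-character key such as `#emoji` is routed to the tags map just like `#e`, and a bare `#` yields an empty tag name.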
A round-trip through JSON preserves all fields:
```rust
use coracle_lib::filters::Filter;
let filter = Filter::new()
.add_kind(1)
.add_tag("t", "nostr")
.add_since(1_700_000_000)
.add_limit(10);
let json = serde_json::to_string(&filter).unwrap();
let parsed: Filter = serde_json::from_str(&json).unwrap();
assert_eq!(filter, parsed);
```
## Identity and grouping
Two methods support deduplication and merging at higher layers —
subscription managers, relay pools, and storage engines that need to
detect redundant or combinable filters.
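`id` produces a hash of the entire filter. Structurally identical
filters always produce the same id; the converse holds only up to 64-bit
hash collisions, so treat equal ids as a strong hint and confirm with
`==` where correctness depends on it. Since `Filter` derives `Hash`, we
feed it through a standard `Hasher`, with no need to round-trip through
JSON. `DefaultHasher`'s algorithm is unspecified and may change between
Rust releases, so the id is suitable for in-process deduplication but
should not be persisted.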
```rust {file=coracle-lib/src/filters.rs}
use std::hash::{Hash, Hasher};
use std::collections::hash_map::DefaultHasher;
impl Filter {
/// Compute an identifier for this filter.
///
/// Structurally identical filters produce the same id. The id is
/// derived from the `Hash` implementation, which covers every field
/// including `limit`. Distinct filters can collide, and the value is
/// not stable across Rust releases, so do not persist it.
pub fn id(&self) -> u64 {
let mut hasher = DefaultHasher::new();
self.hash(&mut hasher);
hasher.finish()
}
}
```
`group` determines which filters can be merged by unioning their
set fields. Two filters can only be merged if they have the same
structural shape (which set fields are present, which tag names appear)
*and* the same scalar constraints (time windows). A filter with a
`limit` can never be merged, because combining two limited queries into
one would change the result semantics. To enforce that, `group` mixes a
fresh value from a global counter into the key on every call for a
limited filter, so even two calls on the same limited filter return
different keys.
```rust {file=coracle-lib/src/filters.rs}
static GROUP_COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);
impl Filter {
/// Compute a group key that determines merge compatibility.
///
/// Filters in the same group can be merged by unioning their set
/// fields (`ids`, `authors`, `kinds`, tag values). The group key
/// captures:
///
/// - Which set fields are present and which tag names appear
/// (structural shape)
/// - The exact `since` and `until` values (different time windows
/// cannot be combined)
///
/// A filter with a `limit` always gets a unique group key, because
/// merging limited filters would change result-count semantics.
pub fn group(&self) -> u64 {
let mut hasher = DefaultHasher::new();
self.ids.is_some().hash(&mut hasher);
self.authors.is_some().hash(&mut hasher);
self.kinds.is_some().hash(&mut hasher);
for name in self.tags.keys() {
name.hash(&mut hasher);
}
self.since.hash(&mut hasher);
self.until.hash(&mut hasher);
if self.limit.is_some() {
// Each limited filter gets a unique group — merging two
// limited queries into one would change which events are
// returned.
GROUP_COUNTER
.fetch_add(1, std::sync::atomic::Ordering::Relaxed)
.hash(&mut hasher);
}
hasher.finish()
}
}
```
Including tag names in the group key means that a filter on `#e` tags
and a filter on `#p` tags land in different groups — as they should,
since merging them by union would change the semantics. Likewise, two
filters with different `since` or `until` values land in different
groups, because a union of their sets under one time window would either
over-fetch or under-fetch relative to what was requested.
## Union and intersection
Two operations combine filters in different ways.
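`union_filters` takes a list of filters and merges those that share
the same group: same structural shape, same time window, no limit.
Within each group it unions the set fields: ids, authors, kinds, and
tag values. The result is a shorter list that matches every event the
original list matched. When the merged filters differ in only one set
field the match set is exactly the same; when they differ in several,
the merged filter also accepts the cross combinations (for example,
author A with kind 2 when the inputs were A/kind 1 and B/kind 2), so
callers that need exact results should re-check events against the
original filters.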
```rust {file=coracle-lib/src/filters.rs}
/// Merge compatible filters by unioning their set fields.
///
/// Filters with the same [`group`](Filter::group) are combined into a
/// single filter whose set fields are the union of the originals. The
/// result matches every event the input matched, with fewer filters; it
/// may also match extra events when merged filters differ in more than
/// one set field.
pub fn union_filters(filters: &[Filter]) -> Vec<Filter> {
let mut groups: BTreeMap<u64, Filter> = BTreeMap::new();
for filter in filters {
let key = filter.group();
groups
.entry(key)
.and_modify(|existing| {
merge_sets(&mut existing.ids, &filter.ids);
merge_sets(&mut existing.authors, &filter.authors);
merge_sets(&mut existing.kinds, &filter.kinds);
for (name, values) in &filter.tags {
existing
.tags
.entry(name.clone())
.or_default()
.extend(values.iter().cloned());
}
})
.or_insert_with(|| filter.clone());
}
groups.into_values().collect()
}
fn merge_sets<T: Ord + Clone>(
target: &mut Option<BTreeSet<T>>,
source: &Option<BTreeSet<T>>,
) {
match (target.as_mut(), source) {
(Some(t), Some(s)) => t.extend(s.iter().cloned()),
_ => {}
}
}
```
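One property of merging by union is worth checking by hand: when two filters differ in more than one set field, the merged filter accepts combinations that neither input accepted. The sketch below uses a hypothetical `MiniFilter` with string authors in place of the chapter's `Filter` and `PublicKey`.

```rust
use std::collections::BTreeSet;

/// Stand-in for `Filter` with two set fields; authors are plain strings.
#[derive(Clone, Debug)]
struct MiniFilter {
    authors: BTreeSet<String>,
    kinds: BTreeSet<u16>,
}

impl MiniFilter {
    fn new(authors: &[&str], kinds: &[u16]) -> Self {
        MiniFilter {
            authors: authors.iter().map(|a| a.to_string()).collect(),
            kinds: kinds.iter().copied().collect(),
        }
    }

    /// AND across fields, as in `Filter::matches`.
    fn matches(&self, author: &str, kind: u16) -> bool {
        self.authors.contains(author) && self.kinds.contains(&kind)
    }

    /// Field-wise union, as `union_filters` does within one group.
    fn union(&self, other: &Self) -> Self {
        MiniFilter {
            authors: self.authors.union(&other.authors).cloned().collect(),
            kinds: self.kinds.union(&other.kinds).cloned().collect(),
        }
    }
}
```

Merging `alice/kind 1` with `bob/kind 2` yields a filter that also accepts `alice/kind 2` and `bob/kind 1`. For fetching from a relay that is usually harmless over-fetching; for exact local matching, the original filters remain the source of truth.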
`intersect_filters` takes multiple groups of filters, each group
representing one independent query, and combines them into filters that
carry the constraints of every group. It does this by computing
the cartesian product across groups, combining each pair by unioning
their set fields and tightening their time windows: the latest `since`,
the earliest `until`. Note that unioning is a true logical AND only when
the two filters constrain different fields; when both constrain the same
field (say, both list authors), the merged set is wider than either
input, not their intersection. Finally it passes the result through
`union_filters` to collapse any redundancy.
```rust {file=coracle-lib/src/filters.rs}
/// Combine independent filter groups into filters that carry the
/// constraints of all of them.
///
/// Each inner `Vec<Filter>` represents one group of alternatives (OR).
/// When the paired filters constrain different fields, the result
/// matches events that satisfy at least one filter from *every* group
/// (AND across groups, OR within each group). When both constrain the
/// same set field, the values are unioned, so the result is wider than
/// a strict intersection.
///
/// Set fields are unioned. Time windows are tightened: the latest
/// `since` and earliest `until` win. If both filters have a `limit`,
/// the larger one is kept. The result is simplified with
/// [`union_filters`].
pub fn intersect_filters(groups: &[Vec<Filter>]) -> Vec<Filter> {
let Some(first) = groups.first() else {
return vec![];
};
let mut result: Vec<Filter> = first.clone();
for filters in &groups[1..] {
let mut combined = Vec::with_capacity(result.len() * filters.len());
for f1 in &result {
for f2 in filters {
combined.push(combine_pair(f1, f2));
}
}
result = combined;
}
union_filters(&result)
}
fn combine_pair(a: &Filter, b: &Filter) -> Filter {
let mut f = Filter::new();
f.ids = union_option_sets(&a.ids, &b.ids);
f.authors = union_option_sets(&a.authors, &b.authors);
f.kinds = union_option_sets(&a.kinds, &b.kinds);
for (name, values) in a.tags.iter().chain(b.tags.iter()) {
f.tags
.entry(name.clone())
.or_default()
.extend(values.iter().cloned());
}
f.since = match (a.since, b.since) {
(Some(a), Some(b)) => Some(a.max(b)),
(s, None) | (None, s) => s,
};
f.until = match (a.until, b.until) {
(Some(a), Some(b)) => Some(a.min(b)),
(u, None) | (None, u) => u,
};
f.limit = match (a.limit, b.limit) {
(Some(a), Some(b)) => Some(a.max(b)),
(l, None) | (None, l) => l,
};
f
}
fn union_option_sets<T: Ord + Clone>(
a: &Option<BTreeSet<T>>,
b: &Option<BTreeSet<T>>,
) -> Option<BTreeSet<T>> {
match (a, b) {
(Some(a), Some(b)) => {
let mut merged = a.clone();
merged.extend(b.iter().cloned());
Some(merged)
}
(Some(s), None) | (None, Some(s)) => Some(s.clone()),
(None, None) => None,
}
}
```
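The time-window tightening in `combine_pair` can be sketched separately. The `Window` alias and the `tighten` and `is_empty` helpers are hypothetical; they use iterator `max`/`min` over the present bounds, which gives the same results as the `match` arms above.

```rust
/// A `(since, until)` pair; `None` means unbounded on that side.
type Window = (Option<u64>, Option<u64>);

/// Tighten two windows: the latest `since` and the earliest `until` win.
fn tighten(a: Window, b: Window) -> Window {
    // Chaining two Options yields zero, one, or two bounds to compare.
    let since = a.0.into_iter().chain(b.0).max();
    let until = a.1.into_iter().chain(b.1).min();
    (since, until)
}

/// True when no timestamp can satisfy the window.
fn is_empty(window: Window) -> bool {
    matches!(window, (Some(since), Some(until)) if since > until)
}
```

Tightening can produce a window whose `since` is later than its `until`. Such a filter is still valid, but `matches` will reject every event, so a caller may want to drop it before sending it anywhere.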
## What's next
The next chapter extends filters with NIP-50 full-text search — an
optional `search` field that some relays support for content-based
queries.
+39 -35
@@ -11,53 +11,57 @@
- [Kinds](06-kinds.md)
- [Addresses](07-addresses.md)
- [Proof of Work](08-proof-of-work.md)
- [Filters](09-filters.md)
- [Expiring Events](09-expiring-events.md)
- [Protected Events](10-protected-events.md)
- [Filters](11-filters.md)
- [Search](12-search.md)
## Domain
- [Relay Selections](10-relay-selections.md)
- [Relay Metadata](11-relay-metadata.md)
- [Relay Membership](12-relay-membership.md)
- [Profiles](13-profiles.md)
- [Follows](14-follows.md)
- [Microblogging](15-microblogging.md)
- [Reactions](16-reactions.md)
- [Reports](17-reports.md)
- [Emojis](18-emojis.md)
- [Zaps](19-zaps.md)
- [Rooms](20-rooms.md)
- [Relay Selections](13-relay-selections.md)
- [Relay Metadata](14-relay-metadata.md)
- [Relay Membership](15-relay-membership.md)
- [Profiles](16-profiles.md)
- [Follows](17-follows.md)
- [Microblogging](18-microblogging.md)
- [Reactions](19-reactions.md)
- [Reports](20-reports.md)
- [Emojis](21-emojis.md)
- [Zaps](22-zaps.md)
- [Rooms](23-rooms.md)
- [Open Timestamp Attestations](24-open-timestamp-attestations.md)
## Networking
- [Relay Connections](21-relay-connections.md)
- [Relay Authentication](22-relay-authentication.md)
- [Relay Policies](23-relay-policies.md)
- [Server Authentication](24-server-authentication.md)
- [Relay Management API](25-relay-management-api.md)
- [Blossom Media Storage](26-blossom-media-storage.md)
- [Relay Connections](25-relay-connections.md)
- [Relay Authentication](26-relay-authentication.md)
- [Relay Policies](27-relay-policies.md)
- [Server Authentication](28-server-authentication.md)
- [Relay Management API](29-relay-management-api.md)
- [Blossom Media Storage](30-blossom-media-storage.md)
## Signers
- [Signer Interface](27-signer-interface.md)
- [Secret Signers](28-secret-signers.md)
- [Remote Signers](29-remote-signers.md)
- [Android Signers](30-android-signers.md)
- [Browser Signers](31-browser-signers.md)
- [Signer Interface](31-signer-interface.md)
- [Secret Signers](32-secret-signers.md)
- [Remote Signers](33-remote-signers.md)
- [Android Signers](34-android-signers.md)
- [Browser Signers](35-browser-signers.md)
## Content
- [Entities](32-entities.md)
- [Relays](33-relays.md)
- [Rooms](34-rooms.md)
- [Links](35-links.md)
- [Lightning](36-lightning.md)
- [Cashu](37-cashu.md)
- [Emojis](38-emojis.md)
- [Topics](39-topics.md)
- [Code](40-code.md)
- [Entities](36-entities.md)
- [Relays](37-relays.md)
- [Rooms](38-rooms.md)
- [Links](39-links.md)
- [Lightning](40-lightning.md)
- [Cashu](41-cashu.md)
- [Emojis](42-emojis.md)
- [Topics](43-topics.md)
- [Code](44-code.md)
## Storage
- [Event Repository](41-event-repository.md)
- [In Memory Backend](42-in-memory-backend.md)
- [Sqlite Backend](43-sqlite-backend.md)
- [Event Repository](45-event-repository.md)
- [In Memory Backend](46-in-memory-backend.md)
- [Sqlite Backend](47-sqlite-backend.md)
+207
@@ -0,0 +1,207 @@
# Plan: Filters
## Topic Summary
Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context — not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter covers the filter structure,
matching semantics (AND within a filter, OR across filters), tag filters, timestamp constraints,
limits, construction, hashing, grouping, and cardinality estimation.
## Chapter Outline
1. **Introduction** — Filters as a general-purpose event matching primitive. Not tied to relays;
they're a predicate you can evaluate against any event. Analogy to database WHERE clauses.
2. **The Filter Struct** — Walk through the fields:
- `ids: Option<BTreeSet<[u8; 32]>>` — match event IDs
- `authors: Option<BTreeSet<PublicKey>>` — match event authors
- `kinds: Option<BTreeSet<u16>>` — match event kinds
- `tags: BTreeMap<String, BTreeSet<String>>` — match tag values by tag name
- `since: Option<u64>` — lower bound on `created_at` (inclusive)
- `until: Option<u64>` — upper bound on `created_at` (inclusive)
- `limit: Option<usize>` — result count constraint (not a matching criterion)
Explain `Option` semantics: `None` = no constraint, `Some(empty set)` = matches nothing.
Note that `limit` is metadata for consumers, not part of matching logic.
3. **Matching** — Implement `matches(&self, event: &Event) -> bool`:
- AND semantics: all present fields must match
- Early exit on scalar checks (ids, kinds, authors) before tag matching
- Tag matching: for each tag filter, event must have at least one tag with that name
whose value is in the filter's set (OR within a tag filter, AND across tag filters)
- Timestamp: `since <= created_at <= until`
- `limit` is ignored
- Implement `matches_any(filters: &[Filter], event: &Event) -> bool` as a free function
for OR-across-filters semantics
4. **Construction** — Builder pattern with fluent API:
- `Filter::new()` — empty filter (matches everything)
- `.id(id)` / `.ids(iter)` — add event IDs
- `.author(pk)` / `.authors(iter)` — add authors
- `.kind(k)` / `.kinds(iter)` — add kinds
- `.tag(name, value)` / `.tags(name, iter)` — add arbitrary tag filters
- `.since(ts)` / `.until(ts)` — set timestamp bounds
- `.limit(n)` — set result limit
- `.address(addr)` — convenience: sets kind, author, and `#d` tag from an Address
5. **Serialization** — Custom serde implementation:
- Standard fields serialize normally, skip `None` fields
- `tags` BTreeMap flattened: key `"foo"` becomes JSON key `"#foo"` with array value
- Handle `limit: 0` vs omitted limit (Some(0) serializes as `"limit": 0`)
- Deserialization: any key starting with `#` collected into `tags` map
- Show round-trip example
6. **Identity and Grouping** — Utilities for deduplication and merging:
- `filter_id(filter) -> String` — deterministic hash of filter contents for dedup
- `filter_group(filter) -> String` — hash of structural fields only (ids, kinds, authors,
tag keys) excluding values and temporal fields. Two filters in the same group can be
merged by unioning their value sets.
7. **Cardinality** (`cardinality(&self) -> Option<usize>`):
- Returns `Some(n)` when the maximum number of matching events can be determined
- `ids` present → `ids.len()`
- All kinds are replaceable + `authors` present → `authors.len() * kinds.len()`
- All kinds are addressable + `authors` present + `#d` present →
`authors.len() * kinds.len() * d_values.len()`
- Otherwise → `None` (unbounded)
- If explicit `limit` is set, return `min(limit, computed)` when computed is Some,
or `Some(limit)` when computed is None
- Empty set in any field → `Some(0)`
8. **Recap** — Summarize filter as a composable primitive. Tease usage in relay connections
chapter.
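The cardinality rules in item 7 can be sketched as follows. This is a rough illustration and not the planned implementation: `Shape` is a hypothetical summary of a filter (set sizes, not the sets themselves), and the kind ranges in `is_replaceable` and `is_addressable` are assumed from NIP-01 (kinds 0, 3, and 10000 to 19999 replaceable; 30000 to 39999 addressable) in place of whatever the kinds chapter exposes.

```rust
/// Hypothetical summary of a filter: sizes of its set fields.
#[derive(Default)]
struct Shape {
    ids: Option<usize>,
    authors: Option<usize>,
    kinds: Option<Vec<u16>>,
    d_values: Option<usize>,
    limit: Option<usize>,
}

// Assumed NIP-01 kind ranges; the real code would reuse the kinds chapter.
fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

fn cardinality(f: &Shape) -> Option<usize> {
    // An empty set in any field means nothing can match.
    let kinds_len = f.kinds.as_ref().map(Vec::len);
    if [f.ids, f.authors, kinds_len, f.d_values].contains(&Some(0)) {
        return Some(0);
    }
    let computed = match (f.ids, f.authors, f.kinds.as_deref()) {
        // Each id names at most one event.
        (Some(n), _, _) => Some(n),
        // One event per author per replaceable kind.
        (None, Some(a), Some(k)) if k.iter().all(|&x| is_replaceable(x)) => {
            Some(a * k.len())
        }
        // One event per author per addressable kind per identifier.
        (None, Some(a), Some(k)) if k.iter().all(|&x| is_addressable(x)) => {
            f.d_values.map(|d| a * k.len() * d)
        }
        _ => None,
    };
    // An explicit limit caps the bound, or supplies one when unbounded.
    match (computed, f.limit) {
        (Some(c), Some(l)) => Some(c.min(l)),
        (c, None) => c,
        (None, l) => l,
    }
}
```

The empty-set check runs first so that a filter with, say, zero authors reports `Some(0)` even when it also carries a limit.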
## API Design
```rust
// --- Filter struct ---
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct Filter {
pub ids: Option<BTreeSet<[u8; 32]>>,
pub authors: Option<BTreeSet<PublicKey>>,
pub kinds: Option<BTreeSet<u16>>,
pub since: Option<u64>,
pub until: Option<u64>,
pub limit: Option<usize>,
// Flattened in serde as #key -> [values]
pub tags: BTreeMap<String, BTreeSet<String>>,
}
// --- Construction (builder, consuming self) ---
impl Filter {
pub fn new() -> Self
pub fn id(self, id: [u8; 32]) -> Self
pub fn ids(self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self
pub fn author(self, author: PublicKey) -> Self
pub fn authors(self, authors: impl IntoIterator<Item = PublicKey>) -> Self
pub fn kind(self, kind: u16) -> Self
pub fn kinds(self, kinds: impl IntoIterator<Item = u16>) -> Self
pub fn tag(self, name: impl Into<String>, value: impl Into<String>) -> Self
pub fn tags(self, name: impl Into<String>, values: impl IntoIterator<Item = impl Into<String>>) -> Self
pub fn since(self, since: u64) -> Self
pub fn until(self, until: u64) -> Self
pub fn limit(self, limit: usize) -> Self
pub fn address(self, addr: &Address) -> Self
}
// --- Matching ---
impl Filter {
pub fn matches(&self, event: &Event) -> bool
pub fn cardinality(&self) -> Option<usize>
}
pub fn matches_any(filters: &[Filter], event: &Event) -> bool
// --- Identity and grouping ---
pub fn filter_id(filter: &Filter) -> String
pub fn filter_group(filter: &Filter) -> String
```
## Code Organization
All code in `coracle-lib/src/filters.rs`. Single file, single module. Add `pub mod filters;`
to `coracle-lib/src/lib.rs`.
## Dependencies
- `serde` / `serde_json` — already used in the events chapter for serialization
- `std::collections::BTreeSet` / `BTreeMap` — stdlib, no external crate
- `sha2` — already used in events chapter for hashing; reuse for filter_id
No new external dependencies needed.
## Narrative Notes
- Open by framing filters as a standalone primitive. They're a predicate, not a protocol
message. The fact that relays use them in REQ is one application, but they're equally
useful for client-side filtering, local storage queries, and event routing decisions.
- The `Option` semantics deserve careful explanation. Show the difference:
`None` = "I don't care about this field" vs `Some(empty)` = "this field must match
one of these zero values (i.e., nothing matches)". This is the key insight that makes
filters composable.
- When explaining matching, walk through a concrete example: construct a filter, show an
event, trace through the matching logic field by field.
- For tag filters, emphasize that tag keys are arbitrary strings — not restricted to
single letters. The single-letter convention is a relay indexing optimization, not a
protocol constraint.
- `limit` gets a brief note: it's not part of matching. It tells a consumer (relay, storage
engine) how many results to return. Include it in the struct because it's part of the
NIP-01 filter object, but `matches()` ignores it.
- For serialization, the interesting part is the tag flattening. Show the JSON representation
and explain how `tags: {"e": {"abc"}, "p": {"def"}}` becomes `{"#e": ["abc"], "#p": ["def"]}`.
- `filter_id` and `filter_group` are utility functions, not methods, because they serve
infrastructure concerns (dedup, subscription management) rather than core filter semantics.
- `cardinality` leverages kind classification from the kinds chapter. Connect the dots:
replaceable events have at most one per author per kind, addressable events have at most
one per author per kind per identifier.
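The `None` versus `Some(empty)` distinction and the field-by-field trace described in these notes can be sketched as follows. This is an illustration, not the chapter's implementation: the `Event` struct is a stand-in with raw byte arrays, the helper `allows` is a hypothetical name, and treating `since` and `until` as inclusive follows the nostr-tools behavior recorded in the research notes.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in for the event type; only the fields that matching reads.
pub struct Event {
    pub id: [u8; 32],
    pub pubkey: [u8; 32],
    pub kind: u16,
    pub created_at: u64,
    pub tags: Vec<Vec<String>>,
}

#[derive(Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// `None` places no constraint. `Some(set)` requires membership, so an
// empty set can never be satisfied.
fn allows<T: Ord>(constraint: &Option<BTreeSet<T>>, value: &T) -> bool {
    constraint.as_ref().map_or(true, |set| set.contains(value))
}

impl Filter {
    pub fn matches(&self, event: &Event) -> bool {
        // Cheap scalar checks first, so a mismatch exits early.
        if !allows(&self.ids, &event.id) {
            return false;
        }
        if !allows(&self.authors, &event.pubkey) {
            return false;
        }
        if !allows(&self.kinds, &event.kind) {
            return false;
        }
        // Both time bounds are treated as inclusive (assumption).
        if self.since.map_or(false, |s| event.created_at < s) {
            return false;
        }
        if self.until.map_or(false, |u| event.created_at > u) {
            return false;
        }
        // Every tag name in the filter needs at least one event tag whose
        // first element is that name and whose second is an accepted value.
        // `limit` is deliberately never consulted.
        self.tags.iter().all(|(name, values)| {
            event.tags.iter().any(|tag| {
                tag.len() >= 2 && tag[0] == *name && values.contains(&tag[1])
            })
        })
    }
}
```

With this sketch, `Filter { kinds: None, .. }` accepts an event of any kind, while `Filter { kinds: Some(BTreeSet::new()), .. }` rejects every event, which is the composability point made above.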
## Design Decisions
1. **`Option<BTreeSet<T>>` for set fields** — Preserves the None-vs-empty distinction that
NIP-01 requires. BTreeSet gives O(log n) membership checks and deterministic iteration
order for serialization/hashing. (Research: rust-nostr uses this approach.)
2. **Arbitrary string tag keys** — Not restricted to single letters. The protocol allows any
tag name; single-letter indexing is a relay optimization. Consumers can enforce restrictions.
3. **Minimal builder API**: `.id()`, `.author()`, `.kind()`, `.tag()`, `.address()` plus
plural variants. No convenience methods for every common tag (#e, #p, #t, etc.) — the
generic `.tag("e", value)` is clear enough. Keeps the chapter focused.
4. **`limit` in struct but not in matching** — NIP-01 defines it as part of the filter object,
so it belongs in the struct. But it's a result constraint, not a predicate, so `matches()`
ignores it. (Research: NDK, nostr-tools, all implementations agree on this.)
5. **Free functions for identity/grouping**: `filter_id` and `filter_group` are not methods
because they serve infrastructure concerns. Keeps the Filter impl block focused on
construction and matching.
6. **`cardinality` returns `Option<usize>`** — `None` means unbounded. Leverages kind
classification (replaceable, addressable) to compute tight upper bounds when possible.
(Research: nostr-tools' `getFilterLimit`, nostrlib's `GetTheoreticalLimit`.)
7. **Custom serde for tag flattening** — Tags serialize as `#name` keys at the top level of
the JSON object, matching the NIP-01 wire format. This requires custom Serialize/Deserialize
implementations rather than derive macros.
8. **`.address()` convenience** — Translates an Address into the correct combination of kind,
author, and #d tag filter. This is the one domain-aware convenience method because
address-based filtering is extremely common and error-prone to construct manually.
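To make the tag flattening in decision 7 concrete, here is a rough sketch of the wire shape using only the standard library. It is not the serde implementation the plan calls for: `to_wire_json` is a hypothetical helper, the `Filter` is trimmed to three fields, and no string escaping is performed, so it assumes tag names and values contain no quotes or backslashes.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Only the fields needed to show the flattening; a stand-in for `Filter`.
#[derive(Default)]
pub struct Filter {
    pub kinds: Option<BTreeSet<u16>>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// Hand-rolled JSON for illustration only (no escaping, see lead-in).
pub fn to_wire_json(filter: &Filter) -> String {
    let mut fields: Vec<String> = Vec::new();

    // `None` fields are omitted; `Some(empty)` is written as `[]`.
    if let Some(kinds) = &filter.kinds {
        let items: Vec<String> = kinds.iter().map(|k| k.to_string()).collect();
        fields.push(format!("\"kinds\":[{}]", items.join(",")));
    }
    if let Some(limit) = filter.limit {
        fields.push(format!("\"limit\":{}", limit));
    }
    // Each tag entry becomes its own top-level key, prefixed with `#`.
    for (name, values) in &filter.tags {
        let items: Vec<String> = values.iter().map(|v| format!("\"{}\"", v)).collect();
        fields.push(format!("\"#{}\":[{}]", name, items.join(",")));
    }

    format!("{{{}}}", fields.join(","))
}
```

Because `BTreeMap` and `BTreeSet` iterate in sorted order, the output is deterministic: a filter whose `tags` hold `e -> {abc}` and `p -> {def}` is rendered as `{"#e":["abc"],"#p":["def"]}` no matter which entry was inserted first.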
## Open Questions
- Should `filter_group` include tag *names* (keys) in the group hash, or only the set of
field names that are present? Including tag names means `{#e: [...]}` and `{#p: [...]}`
are in different groups (correct for merging). Leaning toward including tag names.
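One way to picture the option being leaned toward: build the group key from the names of the fields that are present, with each tag name contributing its own `#name` entry. This sketch returns the joined key directly instead of a hash, works on a trimmed `Filter`, and leaves `since`, `until`, and `limit` out because the question above does not settle how they should participate.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Trimmed stand-in for `Filter`: set fields and tags only.
#[derive(Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// Group key from which fields are present, tag names included.
// Values are ignored: filters in one group differ only in their values.
pub fn filter_group(filter: &Filter) -> String {
    let mut parts: Vec<String> = Vec::new();
    if filter.ids.is_some() {
        parts.push("ids".to_string());
    }
    if filter.authors.is_some() {
        parts.push("authors".to_string());
    }
    if filter.kinds.is_some() {
        parts.push("kinds".to_string());
    }
    // Tag names come out sorted because `tags` is a BTreeMap.
    for name in filter.tags.keys() {
        parts.push(format!("#{}", name));
    }
    parts.join(",")
}
```

Under this scheme a filter on `#e` and a filter on `#p` land in different groups, while two `#e` filters with different values share one.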
@@ -0,0 +1,241 @@
# Research: Filters
## Topic Summary
Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context — not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter should cover the filter
structure, matching semantics (AND within a filter, OR across filters), tag filters, timestamp
constraints, limits, and programmatic construction/manipulation of filters.
## Philosophy
From `ref/building-nostr`:
**Filters as a conceptual category for tags**: The author identifies "filter tags" as one of
three tag categories (alongside data and behavior tags). Filter tags are "especially useful for
filtering and retrieval" and tend to be single-letter because relays only index single-letter
tags to reduce database overhead.
**Filter structure (NIP-01)**: Standard fields are `ids`, `authors`, `kinds`, `since`, `until`,
`limit`. Tag filters are created by prefixing tag names with `#` (e.g., `#p`, `#e`). Extensions
include prefix matching and NIP-50 `search`. Negative matches were proposed but rejected due to
relay performance concerns.
**Filters as a routing problem**: "Where to send a given filter" is distinct from "where to send
a given event." Filters have less information than events, making routing harder. A routing
heuristic "connects a filter that might be constructed to support a particular use case with the
relay where matching events are stored." This is analogous to database indexes.
**Design principles**:
- **Minimalism**: Filters match discrete criteria without complex negation or boolean logic
- **Decentralization**: Clients understand routing heuristics; relays don't need to understand intent
- **Extensibility**: The `#tag` convention allows arbitrary new filters without protocol changes
- **Trade-offs**: Single-letter indexing limits expressiveness but maintains relay scalability
- **Partition tolerance**: Missing a relay means missing some events, which is acceptable
## Reference Implementation Analysis
### applesauce
**Types**: `Filter` extends nostr-tools' `CoreFilter` with NIP-91 AND operator support (`&`-prefixed
tag names) and NIP-50 `search` field. All standard NIP-01 fields present.
**Matching**: `matchFilter(filter, event)` checks basic fields first for early rejection, then
uses `getIndexableTags(event)` to build a cached `Set<string>` of `"tagName:value"` pairs on the
event (via Symbol key). NIP-91 AND tags processed before OR tags. `matchFilters` implements OR
across filter arrays.
**Utilities**: `mergeFilters(...filters)` unions array fields and deduplicates; takes minimum
`limit`, minimum `since`, maximum `until`. `isFilterEqual` uses `fast-deep-equal` for subscription
deduplication. `createFilterMap` distributes filters across relays by author.
**SQL layer**: Separate `buildFilterConditions` translates filters to SQL WHERE clauses. AND tags
use `GROUP BY`/`HAVING COUNT` subqueries. OR tags use `IN` subqueries.
**Patterns**: Symbol-based memoization of tag indexes on event objects. Early exit optimization.
Only indexes single-letter tags. RxJS integration for streaming filter results.
### ndk
**Types**: `NDKFilter<K extends NDKKind>` — generic parameterized type with all NIP-01 fields plus
`search`. Dynamic tag properties via ``[key: `#${string}`]``.
**Matching**: `matchFilter(filter, event)` uses `indexOf()` for array membership. Special case:
`#t` (hashtag) tags are case-insensitive. Does NOT check `limit` or `search` — these are
submission constraints, not matching criteria. Short-circuits on first mismatch.
**Utilities**:
- `mergeFilters()` — unions arrays, deduplicates via Set. Preserves filters with `limit` separately (limits can't be merged). Returns array.
- `filterFingerprint()` — deterministic hash of filter structure for subscription grouping
- `compareFilter()` — checks if one filter is a subset of another (for cache-hit validation)
- `filterFromId()` — converts bech32 identifiers to filters
- `filterForEventsTaggingId()` — creates tag filters for events referencing a given ID
**Validation**: Three modes (VALIDATE/FIX/IGNORE). Checks for undefined values, type correctness,
hex format, kind range. Guardrails catch common mistakes: empty filters, bech32 in hex arrays,
`since > until`, `#t` with literal `#` prefix.
**Patterns**: Generic type parameterization. Pluggable validation modes. Readable subscription IDs
generated from filter structure.
### nostr-gadgets
Uses `nostr-tools` filter types and functions directly — no custom filter implementation.
**Construction patterns**: Mutable accumulation (`filter.authors?.push(target)`), inline literals,
spread-based composition (`{ ...f, authors: [pubkey], since: newest }`).
**Filter-based deletion**: Converts event tags to filter arrays for batch deletion operations.
**Multi-level filtering**: Filters used at query construction, relay permission checking (purgatory),
and client-side event matching.
### nostrlib
**Types**: Go struct with `IDs []ID`, `Kinds []Kind`, `Authors []PubKey`, `Tags TagMap`,
`Since Timestamp`, `Until Timestamp`, `Limit int`, `Search string`, `LimitZero bool`. Uses
fixed-size byte arrays for IDs/PubKeys.
**Matching**: Two methods:
- `Matches(event)` — full matching including timestamp constraints
- `MatchesIgnoringTimestampConstraints(event)`: `[//go:inline]` optimized, used for live events after EOSE
Tag matching via `tags.ContainsAny(tagName, values)`. Uses `slices.Contains()` for array membership.
**Utilities**: `Clone()` deep copies. `FilterEqual()` order-independent comparison. `GetTheoreticalLimit()`
estimates max results considering replaceability. No merging functions.
**Serialization**: Custom easyjson codec with `xhex` for fast hex encoding. `LimitZero` bool
distinguishes `"limit": 0` from omitted limit.
**Patterns**: Pure data structure with stateless methods. Subscription switches matching function
after EOSE. Query optimizer scores tags by "goodness" for index selection.
### nostr-tools
**Types**: Simple TypeScript type with all NIP-01 fields. Index signature ``[key: `#${string}`]`` for
dynamic tag filters. All properties optional.
**Matching**: `matchFilter(filter, event)` — conjunctive (AND) matching. Uses `indexOf()` for
membership. Iterates filter properties for `#`-prefixed tag filters. Both `since` and `until` are
inclusive. `matchFilters` implements OR across array.
**Utilities**:
- `mergeFilters(...filters)` — unions array properties, takes max `limit`, min `since`, max `until`
- `getFilterLimit(filter)` — computes intrinsic limit considering replaceability:
- Empty arrays → 0
- IDs → `ids.length`
- Replaceable kinds → `authors.length * kinds.length`
- Addressable kinds → `authors.length * kinds.length * #d.length`
- Returns minimum across all applicable constraints
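The rules listed for `getFilterLimit` translate fairly directly into Rust. The following is a sketch after those rules, not a port of the nostr-tools source: it works on a trimmed `Filter`, ignores `limit`, returns `None` for "unbounded", and uses the NIP-01 kind ranges (replaceable: 0, 3, and 10000 to 19999; addressable: 30000 to 39999). The helper names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Trimmed stand-in for `Filter`.
#[derive(Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// Kind ranges as defined in NIP-01.
fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

// Upper bound on how many events can match, or `None` when unbounded.
pub fn cardinality(filter: &Filter) -> Option<usize> {
    // Any empty set means nothing can match at all.
    let has_empty_set = filter.ids.as_ref().map_or(false, |s| s.is_empty())
        || filter.authors.as_ref().map_or(false, |s| s.is_empty())
        || filter.kinds.as_ref().map_or(false, |s| s.is_empty())
        || filter.tags.values().any(|s| s.is_empty());
    if has_empty_set {
        return Some(0);
    }

    let mut bounds: Vec<usize> = Vec::new();

    // At most one event per id.
    if let Some(ids) = &filter.ids {
        bounds.push(ids.len());
    }

    if let (Some(authors), Some(kinds)) = (&filter.authors, &filter.kinds) {
        if kinds.iter().all(|k| is_replaceable(*k)) {
            // One event per (author, kind).
            bounds.push(authors.len() * kinds.len());
        } else if kinds.iter().all(|k| is_addressable(*k)) {
            // One event per (author, kind, d), only if `#d` is pinned.
            if let Some(d) = filter.tags.get("d") {
                bounds.push(authors.len() * kinds.len() * d.len());
            }
        }
    }

    // The tightest applicable bound wins; no bounds means unbounded.
    bounds.into_iter().min()
}
```

A filter that mixes replaceable and addressable kinds is reported as unbounded here, which is conservative but never underestimates.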
**Patterns**: Minimalist, functional, no external dependencies for filter logic. Pure JavaScript.
Early exit on mismatch. Kind classification integration for limit calculation.
**Design**: Self-contained. No validation. `search` field defined but not used in matching logic.
### rust-nostr
**Types**: `Filter` struct with `Option<BTreeSet<T>>` for all set fields. Uses `BTreeSet` for
O(log n) lookups and deterministic serialization. `generic_tags: BTreeMap<SingleLetterTag, BTreeSet<String>>`
for dynamic tag filters.
**Matching**: `match_event(&self, event, opts)` with `MatchEventOptions` controlling which fields
to check (7 boolean flags). Individual match methods are `#[inline]`. Tag matching uses lazy-initialized
`event.tags.indexes()` (OnceCell pattern). NIP-50 search: case-insensitive substring via `.windows()`.
**Builder pattern**: Fluent chainable methods consuming `self`: `Filter::new().kind(k).author(pk)`.
Convenience methods for common tags: `.event()`, `.pubkey()`, `.hashtag()`, `.identifier()`,
`.coordinate()`. Generic `.custom_tag()` for arbitrary tags. Remove methods return `None` if set
becomes empty.
**Option semantics**: `None` = no constraint (matches all). `Some(empty_set)` = matches nothing.
This distinction is explicitly documented (GitHub issue #302).
**Utilities**: `is_empty()`, `extract_public_keys()`. No merging/combining API — multiple filters
handled at protocol layer.
**Patterns**: no_std compatible (uses `alloc`). BTreeSet for deterministic ordering. Custom serde
with `#[serde(flatten)]` for generic tags.
### welshman
**Types**: Standard NIP-01 filter type. `neverFilter = {ids: []}` constant for "matches nothing."
**Matching**: Delegates to nostr-tools for NIP-01 matching. Extends with search: splits by
whitespace, case-insensitive, requires ALL terms match (AND logic).
**Utilities**:
- `getFilterId(filter)` — deterministic hash for deduplication (sort keys, join, hash)
- `calculateFilterGroup(filter)` — groups by matching space (structural fields vs temporal)
- `unionFilters(filters)` — groups by `calculateFilterGroup`, merges arrays within groups
- `intersectFilters(groups)` — Cartesian product across filter groups with intelligent merging (max `since`, min `until`, max `limit`, concatenate `search`)
- `getIdFilters(idsOrAddresses)` — converts mix of IDs and addresses to filters
- `getReplyFilters(events)` — generates filters for replies (#e for regular, #a for replaceable)
- `addRepostFilters(filters)` — adds repost kind variants
- `trimFilter(filter)` — caps array fields at 1000 items with random sampling
- `getFilterGenerality()` — heuristic score 0 (specific) to 1 (general)
**Patterns**: Functional composition. Immutable transformations. Hash-based deduplication.
Domain-driven builders for common query patterns.
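The "sort keys, join, hash" recipe noted for `getFilterId` can be sketched in Rust as follows. Several things here are assumptions: the `Filter` is trimmed to four fields, the canonical string format is invented for illustration, and a hand-written 64-bit FNV-1a stands in for a real digest such as SHA-256 so the example needs no external crate.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Trimmed stand-in for `Filter`.
#[derive(Default)]
pub struct Filter {
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// 64-bit FNV-1a, used here only as a dependency-free placeholder hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

// Deterministic id: fields in a fixed order, values already sorted by
// the BTree containers, joined into one string and then hashed.
pub fn filter_id(filter: &Filter) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(kinds) = &filter.kinds {
        let items: Vec<String> = kinds.iter().map(|k| k.to_string()).collect();
        parts.push(format!("kinds={}", items.join(",")));
    }
    if let Some(since) = filter.since {
        parts.push(format!("since={}", since));
    }
    if let Some(until) = filter.until {
        parts.push(format!("until={}", until));
    }
    for (name, values) in &filter.tags {
        let items: Vec<&str> = values.iter().map(|v| v.as_str()).collect();
        parts.push(format!("#{}={}", name, items.join(",")));
    }
    format!("{:016x}", fnv1a(parts.join(";").as_bytes()))
}
```

Since the sets and the tag map are ordered, two filters built in different insertion orders yield the same id. The separators in this toy format are not escaped, so values containing `,` or `;` could collide.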
## Common Patterns
1. **Type structure**: All implementations use optional fields. Missing = no constraint. The `#tag`
convention for dynamic tag filters is universal.
2. **AND/OR semantics**: Universal agreement — AND within a single filter, OR across an array of
filters. This is fundamental to NIP-01.
3. **Matching order**: Most implementations check scalar fields first (ids, kinds, authors) for
early exit before the more expensive tag matching.
4. **Tag indexing**: Several implementations build cached indexes on events for efficient repeated
matching (applesauce: Symbol-based Set cache; rust-nostr: OnceCell BTreeMap).
5. **No negation**: No implementation supports negative matching (NOT). This aligns with protocol
design — rejected for relay performance reasons.
6. **Limit semantics**: `limit` is not a matching criterion — it's a result count constraint.
Most matching functions ignore it. `getFilterLimit`/`GetTheoreticalLimit` computes intrinsic
upper bounds based on kind replaceability.
7. **Merging**: Most implementations provide union-style filter merging. Array fields are unioned
and deduplicated. Scalar fields use min/max logic.
8. **Option vs empty**: rust-nostr explicitly distinguishes `None` (no constraint) from
`Some(empty)` (matches nothing). Other implementations handle this implicitly.
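The AND/OR rule in pattern 2 is small enough to show in full. This sketch uses a two-field `Filter` and a two-field `Event`, both stand-ins, to keep the conjunction and disjunction visible.

```rust
use std::collections::BTreeSet;

// Stand-in event with just the fields the sketch filters on.
pub struct Event {
    pub pubkey: [u8; 32],
    pub kind: u16,
}

#[derive(Default)]
pub struct Filter {
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
}

impl Filter {
    // AND: every field that is present must accept the event.
    pub fn matches(&self, event: &Event) -> bool {
        self.authors.as_ref().map_or(true, |a| a.contains(&event.pubkey))
            && self.kinds.as_ref().map_or(true, |k| k.contains(&event.kind))
    }
}

// OR: one matching filter is enough. An empty slice matches nothing.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
    filters.iter().any(|filter| filter.matches(event))
}
```

Note the asymmetry this produces: an empty `Filter` matches every event, while an empty list of filters matches none.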
## Considerations for Our Implementation
1. **Filter as a standalone primitive**: Frame filters independent of REQ messages. They're a
general-purpose matching predicate over events.
2. **Struct design**: Use `Option<BTreeSet<T>>` following rust-nostr's approach — it correctly
models the distinction between "no constraint" and "empty constraint." BTreeSet gives
deterministic serialization and O(log n) lookups.
3. **Matching function**: Implement `matches(&self, event: &Event) -> bool` with early exit on
scalar fields. Tag matching should use event tag indexes.
4. **Builder pattern**: Fluent API for construction: `Filter::new().kind(1).author(pk)`.
Convenience methods for common tags (#e, #p, #d, #a, #t).
5. **Generic tag filters**: Support arbitrary single-letter tag filters via
`BTreeMap<SingleLetterTag, BTreeSet<String>>` or similar.
6. **Serialization**: Custom JSON serialization to flatten generic tags as `#tag` keys. Handle
`limit: 0` vs omitted limit.
7. **No merging in core**: Following rust-nostr, keep the filter primitive simple. Merging and
combining can live in higher-level utilities if needed.
8. **Limit calculation**: Consider `getFilterLimit`-style intrinsic limit computation based on
kind replaceability — useful for query optimization.
9. **Dependencies**: Filter should depend only on existing types (Event, EventId, Pubkey, Kind,
Timestamp, Tags). Self-contained within coracle-lib.
10. **Test strategy**: Test matching logic thoroughly — all field types, AND semantics, tag
matching, timestamp boundaries, empty sets vs None, edge cases.