Add filters chapter
+1
-1
@@ -108,7 +108,7 @@ belongs to us rather than a foreign crate.
|
||||
///
|
||||
/// This is the "name" half of a nostr identity. It's safe to log, share, and
|
||||
/// store — it identifies an author but grants no ability to speak as them.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
|
||||
pub struct PublicKey(secp256k1::XOnlyPublicKey);
|
||||
|
||||
impl PublicKey {
|
||||
|
||||
@@ -0,0 +1,820 @@
|
||||
# Filters
|
||||
|
||||
A filter is a predicate over events. Given an event, a filter answers one
|
||||
question: does this event match? The protocol uses filters inside `REQ`
|
||||
messages to tell relays what to send, but the data structure itself is
|
||||
equally useful for querying a local database, routing events to the right
|
||||
handler, or deciding which events to display — anywhere you need to select
|
||||
a subset of events from a larger set.
|
||||
|
||||
A single filter expresses a conjunction: every field that is present must
|
||||
match. An array of filters expresses a disjunction: the event must match
|
||||
at least one. Between the two, you can describe any positive selection
|
||||
over the event fields that nostr exposes — ids, authors, kinds, tags,
|
||||
and time ranges. There is no negation; the protocol rejected it early on
|
||||
to keep relay implementations simple.
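
As a rough illustration of the two levels of logic, here is a minimal sketch using hypothetical stand-ins (`MiniEvent`, `MiniFilter`) that carry only a kind set and a lower time bound. They are assumptions made for this example, not the types defined later in the chapter.

```rust
use std::collections::BTreeSet;

/// Hypothetical, stripped-down event: only the two fields this sketch needs.
struct MiniEvent {
    kind: u16,
    created_at: u64,
}

/// Hypothetical, stripped-down filter with one set field and one bound.
struct MiniFilter {
    kinds: Option<BTreeSet<u16>>,
    since: Option<u64>,
}

impl MiniFilter {
    /// Conjunction: every present field must accept the event.
    fn matches(&self, event: &MiniEvent) -> bool {
        let kind_ok = self.kinds.as_ref().map_or(true, |k| k.contains(&event.kind));
        let since_ok = self.since.map_or(true, |s| event.created_at >= s);
        kind_ok && since_ok
    }
}

/// Disjunction: at least one filter in the slice must accept the event.
fn matches_any(filters: &[MiniFilter], event: &MiniEvent) -> bool {
    filters.iter().any(|f| f.matches(event))
}
```

With a "kind 1 since 100" filter and a "kind 0" filter in one array, a kind 1 event from time 50 is rejected by both, while a kind 0 event from time 50 is accepted by the second.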
|
||||
|
||||
## The module
|
||||
|
||||
```rust {file=coracle-lib/src/lib.rs}
|
||||
pub mod filters;
|
||||
```
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
//! Event filters: the [`Filter`] type for matching events by id, author,
|
||||
//! kind, tags, and time range, plus utilities for hashing, grouping, and
|
||||
//! estimating result cardinality.
|
||||
|
||||
use std::collections::{BTreeMap, BTreeSet};
|
||||
use std::fmt;
|
||||
|
||||
use serde::de::{self, MapAccess, Visitor};
|
||||
use serde::ser::SerializeMap;
|
||||
use serde::{Deserialize, Deserializer, Serialize, Serializer};
|
||||
|
||||
use crate::addresses::Address;
|
||||
use crate::events::Event;
|
||||
use crate::keys::PublicKey;
|
||||
```
|
||||
|
||||
## The `Filter` struct
|
||||
|
||||
Each field corresponds to one axis of selection. The set fields — `ids`,
|
||||
`authors`, `kinds` — use `Option<BTreeSet<T>>`. The `Option` layer
|
||||
carries meaning: `None` says "I have no constraint on this field" and
|
||||
matches everything, while `Some(empty set)` says "the value must be a
|
||||
member of this empty set" and matches nothing. The distinction matters
|
||||
for composition — merging two filters into one relies on being able to
|
||||
tell "don't care" from "impossible."
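
The difference between the two states can be sketched with a small helper. `field_accepts` is a hypothetical name introduced for this example only; it mirrors the per-field check described above.

```rust
use std::collections::BTreeSet;

/// Membership test for one optional set field: `None` accepts any value,
/// `Some(set)` accepts only members of `set`.
fn field_accepts<T: Ord>(constraint: &Option<BTreeSet<T>>, value: &T) -> bool {
    match constraint {
        None => true,
        Some(set) => set.contains(value),
    }
}
```

An empty `Some` set falls into the second arm, where `contains` is always `false`, so the field rejects every value.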
|
||||
|
||||
`BTreeSet` rather than `HashSet` gives two things: O(log n) membership
|
||||
checks regardless of the key type, and deterministic iteration order for
|
||||
serialization and hashing.
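
A small sketch of the ordering property, using a hypothetical helper `ordered_kinds` that exists only for this example:

```rust
use std::collections::BTreeSet;

/// Collect kinds into a `BTreeSet` and read them back in iteration order.
/// Duplicates collapse and the output is sorted, whatever the input order.
fn ordered_kinds(input: &[u16]) -> Vec<u16> {
    let set: BTreeSet<u16> = input.iter().copied().collect();
    set.into_iter().collect()
}
```

Because the output depends only on the set's contents, two filters built from the same values in a different order serialize and hash the same way.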
|
||||
|
||||
The `tags` field is a `BTreeMap` from tag name to a set of acceptable
|
||||
values. Tag names are arbitrary strings — not restricted to single
|
||||
letters. Single-letter indexing is a relay optimization, not a protocol
|
||||
constraint, and the filter type should not encode relay policy.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
/// A predicate over nostr events.
|
||||
///
|
||||
/// Every present field must match for the filter to match (AND semantics).
|
||||
/// An array of filters matches if any single filter matches (OR semantics).
|
||||
///
|
||||
/// `None` on a set field means "no constraint" — it matches any value.
|
||||
/// `Some(empty set)` means "must be a member of the empty set" — it
|
||||
/// matches nothing. This distinction is important for filter composition.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
|
||||
pub struct Filter {
|
||||
/// Event IDs to match.
|
||||
pub ids: Option<BTreeSet<[u8; 32]>>,
|
||||
/// Author public keys to match.
|
||||
pub authors: Option<BTreeSet<PublicKey>>,
|
||||
/// Event kinds to match.
|
||||
pub kinds: Option<BTreeSet<u16>>,
|
||||
/// Tag filters: for each entry, the event must have at least one tag
|
||||
/// with that name whose value appears in the set.
|
||||
pub tags: BTreeMap<String, BTreeSet<String>>,
|
||||
/// Lower bound on `created_at` (inclusive).
|
||||
pub since: Option<u64>,
|
||||
/// Upper bound on `created_at` (inclusive).
|
||||
pub until: Option<u64>,
|
||||
/// Maximum number of events a consumer should return. This is not a
|
||||
/// matching criterion — [`matches`](Filter::matches) ignores it.
|
||||
pub limit: Option<usize>,
|
||||
}
|
||||
```
|
||||
|
||||
The `limit` field is part of the NIP-01 filter object, so it belongs in
|
||||
the struct. But it is a result-count constraint for consumers (relays,
|
||||
storage engines), not a predicate over individual events. The `matches`
|
||||
method ignores it entirely.
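
To show where `limit` would apply instead, here is a minimal sketch of a consumer-side step. `apply_limit` is a hypothetical helper, and the plain timestamps stand in for events that have already passed `matches`. The newest-first ordering follows how NIP-01 describes the initial query of a subscription.

```rust
/// Hypothetical consumer-side helper: `created_ats` stands in for the
/// timestamps of events that already passed `matches`.
fn apply_limit(mut created_ats: Vec<u64>, limit: Option<usize>) -> Vec<u64> {
    // Sort newest first, then keep at most `limit` entries.
    created_ats.sort_unstable_by(|a, b| b.cmp(a));
    if let Some(n) = limit {
        created_ats.truncate(n);
    }
    created_ats
}
```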
|
||||
|
||||
## Matching
|
||||
|
||||
Matching walks each present field and returns `false` as soon as one
fails. The set-membership checks (ids, kinds, authors) come first because
each is a single lookup that rejects most non-matching events
immediately, before the more involved tag iteration.
|
||||
|
||||
Tag matching checks every entry in the filter's `tags` map. For each
|
||||
tag name, the event must contain at least one tag with that name whose
|
||||
value appears in the filter's set. Within a single tag name the values
|
||||
are disjunctive (OR): any match suffices. Across tag names the
|
||||
constraints are conjunctive (AND): all must be satisfied.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
impl Filter {
|
||||
/// Test whether an event satisfies this filter.
|
||||
///
|
||||
/// All present fields must match (AND semantics). The `limit` field
|
||||
/// is ignored — it is a hint for result-set sizing, not a predicate.
|
||||
pub fn matches(&self, event: &Event) -> bool {
|
||||
if let Some(ids) = &self.ids {
|
||||
if !ids.contains(&event.id) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(kinds) = &self.kinds {
|
||||
if !kinds.contains(&event.kind) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(authors) = &self.authors {
|
||||
if !authors.contains(&event.pubkey) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
for (name, values) in &self.tags {
|
||||
let has_match = event
|
||||
.tags
|
||||
.find_all(name)
|
||||
.any(|tag| values.contains(tag.value()));
|
||||
if !has_match {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(since) = self.since {
|
||||
if event.created_at < since {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(until) = self.until {
|
||||
if event.created_at > until {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
true
|
||||
}
|
||||
}
|
||||
```
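
The tag rule can be sketched in isolation. This example assumes a hypothetical representation of event tags as plain `(name, value)` pairs; the real `Event` type exposes tags through its own API, as used in `matches` above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tag constraints in the same shape as the `tags` field of `Filter`.
type TagFilter = BTreeMap<String, BTreeSet<String>>;

/// Hypothetical stand-in for tag matching over `(name, value)` pairs.
fn tags_match(filter: &TagFilter, event_tags: &[(&str, &str)]) -> bool {
    // AND across tag names: every entry in the filter must be satisfied.
    filter.iter().all(|(name, accepted)| {
        // OR within one name: any event tag with an accepted value suffices.
        event_tags
            .iter()
            .any(|(n, v)| *n == name.as_str() && accepted.contains(*v))
    })
}
```

A filter on `#t` in `{nostr, rust}` and `#p` in `{alice}` accepts an event tagged `t=rust, p=alice`, and rejects one that carries only `t=rust`.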
|
||||
|
||||
A convenience free function handles the common case of testing an event
|
||||
against an array of filters — the OR-across-filters semantics that
|
||||
NIP-01 defines for `REQ` subscriptions.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
/// Test whether an event matches any filter in the slice.
|
||||
///
|
||||
/// Returns `true` if at least one filter matches. An empty slice matches
|
||||
/// nothing.
|
||||
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
|
||||
filters.iter().any(|f| f.matches(event))
|
||||
}
|
||||
```
|
||||
|
||||
## Construction
|
||||
|
||||
An empty filter matches everything — no constraints means no rejections.
|
||||
The `add_*` methods insert values into the constraint sets, consuming and
|
||||
returning `self` so calls can be chained. Matching `remove_*` methods
|
||||
take values back out, and `clear_*` methods reset a field to `None` —
|
||||
removing the constraint entirely.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
impl Filter {
|
||||
/// Create an empty filter that matches every event.
|
||||
pub fn new() -> Self {
|
||||
Filter {
|
||||
ids: None,
|
||||
authors: None,
|
||||
kinds: None,
|
||||
tags: BTreeMap::new(),
|
||||
since: None,
|
||||
until: None,
|
||||
limit: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Add an event id to the constraint set.
|
||||
pub fn add_id(mut self, id: [u8; 32]) -> Self {
|
||||
self.ids.get_or_insert_with(BTreeSet::new).insert(id);
|
||||
self
|
||||
}
|
||||
|
||||
/// Add multiple event ids to the constraint set.
|
||||
pub fn add_ids(mut self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self {
|
||||
self.ids.get_or_insert_with(BTreeSet::new).extend(ids);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove an event id from the constraint set. If the set becomes
|
||||
/// empty, it remains as `Some(empty)` — use `clear_ids` to remove
|
||||
/// the constraint entirely.
|
||||
pub fn remove_id(mut self, id: &[u8; 32]) -> Self {
|
||||
if let Some(ids) = &mut self.ids {
|
||||
ids.remove(id);
|
||||
}
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the ids constraint, matching any event id.
|
||||
pub fn clear_ids(mut self) -> Self {
|
||||
self.ids = None;
|
||||
self
|
||||
}
|
||||
|
||||
/// Add an author to the constraint set.
|
||||
pub fn add_author(mut self, author: PublicKey) -> Self {
|
||||
self.authors
|
||||
.get_or_insert_with(BTreeSet::new)
|
||||
.insert(author);
|
||||
self
|
||||
}
|
||||
|
||||
/// Add multiple authors to the constraint set.
|
||||
pub fn add_authors(mut self, authors: impl IntoIterator<Item = PublicKey>) -> Self {
|
||||
self.authors
|
||||
.get_or_insert_with(BTreeSet::new)
|
||||
.extend(authors);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove an author from the constraint set.
|
||||
pub fn remove_author(mut self, author: &PublicKey) -> Self {
|
||||
if let Some(authors) = &mut self.authors {
|
||||
authors.remove(author);
|
||||
}
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the authors constraint, matching any author.
|
||||
pub fn clear_authors(mut self) -> Self {
|
||||
self.authors = None;
|
||||
self
|
||||
}
|
||||
|
||||
/// Add a kind to the constraint set.
|
||||
pub fn add_kind(mut self, kind: u16) -> Self {
|
||||
self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
|
||||
self
|
||||
}
|
||||
|
||||
/// Add multiple kinds to the constraint set.
|
||||
pub fn add_kinds(mut self, kinds: impl IntoIterator<Item = u16>) -> Self {
|
||||
self.kinds.get_or_insert_with(BTreeSet::new).extend(kinds);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove a kind from the constraint set.
|
||||
pub fn remove_kind(mut self, kind: &u16) -> Self {
|
||||
if let Some(kinds) = &mut self.kinds {
|
||||
kinds.remove(kind);
|
||||
}
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the kinds constraint, matching any kind.
|
||||
pub fn clear_kinds(mut self) -> Self {
|
||||
self.kinds = None;
|
||||
self
|
||||
}
|
||||
|
||||
/// Add a value to a tag filter: the event must have at least one tag
|
||||
/// with this name whose value appears in the set.
|
||||
pub fn add_tag(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
|
||||
self.tags
|
||||
.entry(name.into())
|
||||
.or_default()
|
||||
.insert(value.into());
|
||||
self
|
||||
}
|
||||
|
||||
/// Add multiple values to a tag filter.
|
||||
pub fn add_tags(
|
||||
mut self,
|
||||
name: impl Into<String>,
|
||||
values: impl IntoIterator<Item = impl Into<String>>,
|
||||
) -> Self {
|
||||
self.tags
|
||||
.entry(name.into())
|
||||
.or_default()
|
||||
.extend(values.into_iter().map(Into::into));
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove a value from a tag filter. If the value set becomes empty,
|
||||
/// the tag entry is removed from the map.
|
||||
pub fn remove_tag(mut self, name: &str, value: &str) -> Self {
|
||||
if let Some(values) = self.tags.get_mut(name) {
|
||||
values.remove(value);
|
||||
if values.is_empty() {
|
||||
self.tags.remove(name);
|
||||
}
|
||||
}
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove an entire tag filter by name.
|
||||
pub fn clear_tag(mut self, name: &str) -> Self {
|
||||
self.tags.remove(name);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove all tag filters.
|
||||
pub fn clear_tags(mut self) -> Self {
|
||||
self.tags.clear();
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the lower bound on `created_at` (inclusive).
|
||||
pub fn add_since(mut self, since: u64) -> Self {
|
||||
self.since = Some(since);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the lower bound on `created_at`.
|
||||
pub fn clear_since(mut self) -> Self {
|
||||
self.since = None;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the upper bound on `created_at` (inclusive).
|
||||
pub fn add_until(mut self, until: u64) -> Self {
|
||||
self.until = Some(until);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the upper bound on `created_at`.
|
||||
pub fn clear_until(mut self) -> Self {
|
||||
self.until = None;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the result-count limit.
|
||||
pub fn add_limit(mut self, limit: usize) -> Self {
|
||||
self.limit = Some(limit);
|
||||
self
|
||||
}
|
||||
|
||||
/// Remove the result-count limit.
|
||||
pub fn clear_limit(mut self) -> Self {
|
||||
self.limit = None;
|
||||
self
|
||||
}
|
||||
}
|
||||
```
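
The difference between `remove_*` and `clear_*` is easy to miss, so here is a minimal sketch with a hypothetical one-field stand-in, `KindFilter`. It copies the bodies of the `kinds` methods above and adds an `accepts` helper that is an assumption of this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical one-field stand-in for `Filter`, mirroring the `kinds` methods.
#[derive(Debug, Default, PartialEq)]
struct KindFilter {
    kinds: Option<BTreeSet<u16>>,
}

impl KindFilter {
    fn add_kind(mut self, kind: u16) -> Self {
        self.kinds.get_or_insert_with(BTreeSet::new).insert(kind);
        self
    }

    /// Removing the last value leaves `Some(empty)`, which matches nothing.
    fn remove_kind(mut self, kind: &u16) -> Self {
        if let Some(kinds) = &mut self.kinds {
            kinds.remove(kind);
        }
        self
    }

    /// Clearing drops the constraint, so every kind matches again.
    fn clear_kinds(mut self) -> Self {
        self.kinds = None;
        self
    }

    /// Illustration-only check: `None` accepts any kind.
    fn accepts(&self, kind: u16) -> bool {
        self.kinds.as_ref().map_or(true, |k| k.contains(&kind))
    }
}
```

Adding kind 1 and then removing it yields a filter that accepts nothing; calling `clear_kinds` afterwards restores the match-everything state.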
|
||||
|
||||
### Address convenience
|
||||
|
||||
Filtering for an addressable event by its address is common enough — and
|
||||
error-prone enough when done by hand — to warrant a dedicated method.
|
||||
An address carries a kind, an author, and an identifier; the method
|
||||
translates these into the corresponding filter fields.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
impl Filter {
|
||||
/// Add constraints that match events at the given address.
|
||||
///
|
||||
/// Sets the kind, author, and `d` tag filter from the address's
|
||||
/// components. If the identifier is empty (plain replaceable events),
|
||||
/// the `d` tag filter is still set — the event must have a `d` tag
|
||||
/// with an empty value.
|
||||
pub fn add_address(self, addr: &Address) -> Self {
|
||||
self.add_kind(addr.kind)
|
||||
.add_author(addr.pubkey)
|
||||
.add_tag("d", &addr.identifier)
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for Filter {
|
||||
fn default() -> Self {
|
||||
Filter::new()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
A filter built with `.add_address()` looks like this:
|
||||
|
||||
```rust
|
||||
use coracle_lib::filters::Filter;
|
||||
use coracle_lib::addresses::Address;
|
||||
|
||||
let addr: Address = "30023:ab12...cd34:my-article".parse().unwrap();
|
||||
let filter = Filter::new().add_address(&addr);
|
||||
// Equivalent to:
|
||||
// Filter::new().add_kind(30023).add_author(pubkey).add_tag("d", "my-article")
|
||||
```
|
||||
|
||||
## Serialization
|
||||
|
||||
The NIP-01 wire format for a filter is a flat JSON object where tag
|
||||
filters appear as keys prefixed with `#`:
|
||||
|
||||
```json
|
||||
{
|
||||
"kinds": [1],
|
||||
"authors": ["ab12...cd34"],
|
||||
"#t": ["nostr", "rust"],
|
||||
"since": 1700000000,
|
||||
"limit": 10
|
||||
}
|
||||
```
|
||||
|
||||
The `tags` map in our struct needs to be flattened into the top-level
|
||||
object during serialization and reconstituted during deserialization.
|
||||
This rules out a plain `#[derive(Serialize, Deserialize)]`, so we need a
hand-written implementation.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
impl Serialize for Filter {
|
||||
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
|
||||
// Count present fields for the map size hint.
|
||||
let mut count = self.tags.len();
|
||||
if self.ids.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
if self.authors.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
if self.kinds.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
if self.since.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
if self.until.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
if self.limit.is_some() {
|
||||
count += 1;
|
||||
}
|
||||
|
||||
let mut map = serializer.serialize_map(Some(count))?;
|
||||
|
||||
if let Some(ids) = &self.ids {
|
||||
let hex_ids: Vec<String> = ids.iter().map(hex::encode).collect();
|
||||
map.serialize_entry("ids", &hex_ids)?;
|
||||
}
|
||||
if let Some(authors) = &self.authors {
|
||||
let hex_authors: Vec<String> = authors.iter().map(|pk| pk.to_hex()).collect();
|
||||
map.serialize_entry("authors", &hex_authors)?;
|
||||
}
|
||||
if let Some(kinds) = &self.kinds {
|
||||
let kinds_vec: Vec<u16> = kinds.iter().copied().collect();
|
||||
map.serialize_entry("kinds", &kinds_vec)?;
|
||||
}
|
||||
|
||||
for (name, values) in &self.tags {
|
||||
let key = format!("#{name}");
|
||||
let vals: Vec<&str> = values.iter().map(String::as_str).collect();
|
||||
map.serialize_entry(&key, &vals)?;
|
||||
}
|
||||
|
||||
if let Some(since) = self.since {
|
||||
map.serialize_entry("since", &since)?;
|
||||
}
|
||||
if let Some(until) = self.until {
|
||||
map.serialize_entry("until", &until)?;
|
||||
}
|
||||
if let Some(limit) = self.limit {
|
||||
map.serialize_entry("limit", &limit)?;
|
||||
}
|
||||
|
||||
map.end()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Deserialization collects known keys into their fields and routes any key
|
||||
starting with `#` into the `tags` map. Unknown keys are silently
|
||||
ignored for forward compatibility.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
impl<'de> Deserialize<'de> for Filter {
|
||||
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
|
||||
deserializer.deserialize_map(FilterVisitor)
|
||||
}
|
||||
}
|
||||
|
||||
struct FilterVisitor;
|
||||
|
||||
impl<'de> Visitor<'de> for FilterVisitor {
|
||||
type Value = Filter;
|
||||
|
||||
fn expecting(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
f.write_str("a nostr filter object")
|
||||
}
|
||||
|
||||
fn visit_map<M: MapAccess<'de>>(self, mut map: M) -> Result<Filter, M::Error> {
|
||||
let mut ids: Option<BTreeSet<[u8; 32]>> = None;
|
||||
let mut authors: Option<BTreeSet<PublicKey>> = None;
|
||||
let mut kinds: Option<BTreeSet<u16>> = None;
|
||||
let mut tags: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
|
||||
let mut since: Option<u64> = None;
|
||||
let mut until: Option<u64> = None;
|
||||
let mut limit: Option<usize> = None;
|
||||
|
||||
while let Some(key) = map.next_key::<String>()? {
|
||||
match key.as_str() {
|
||||
"ids" => {
|
||||
let hex_ids: Vec<String> = map.next_value()?;
|
||||
let mut set = BTreeSet::new();
|
||||
for h in hex_ids {
|
||||
let bytes = hex::decode(&h)
|
||||
.map_err(|_| de::Error::custom("invalid hex in ids"))?;
|
||||
let arr: [u8; 32] = bytes
|
||||
.try_into()
|
||||
.map_err(|_| de::Error::custom("id must be 32 bytes"))?;
|
||||
set.insert(arr);
|
||||
}
|
||||
ids = Some(set);
|
||||
}
|
||||
"authors" => {
|
||||
let hex_authors: Vec<String> = map.next_value()?;
|
||||
let mut set = BTreeSet::new();
|
||||
for h in hex_authors {
|
||||
let pk = PublicKey::from_hex(&h)
|
||||
.map_err(|_| de::Error::custom("invalid pubkey in authors"))?;
|
||||
set.insert(pk);
|
||||
}
|
||||
authors = Some(set);
|
||||
}
|
||||
"kinds" => {
|
||||
let kind_vec: Vec<u16> = map.next_value()?;
|
||||
kinds = Some(kind_vec.into_iter().collect());
|
||||
}
|
||||
"since" => since = Some(map.next_value()?),
|
||||
"until" => until = Some(map.next_value()?),
|
||||
"limit" => limit = Some(map.next_value()?),
|
||||
other if other.starts_with('#') => {
|
||||
let tag_name = other[1..].to_string();
|
||||
let values: Vec<String> = map.next_value()?;
|
||||
tags.insert(tag_name, values.into_iter().collect());
|
||||
}
|
||||
_ => {
|
||||
let _: de::IgnoredAny = map.next_value()?;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(Filter {
|
||||
ids,
|
||||
authors,
|
||||
kinds,
|
||||
tags,
|
||||
since,
|
||||
until,
|
||||
limit,
|
||||
})
|
||||
}
|
||||
}
|
||||
```
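
The key-routing decision inside `visit_map` can be sketched without serde. `KeyRoute` and `route_key` are hypothetical names introduced for this example; they only classify a key and do not parse any values.

```rust
/// Where a top-level JSON key of a filter object ends up.
#[derive(Debug, PartialEq)]
enum KeyRoute<'a> {
    /// One of the fixed NIP-01 fields (`ids`, `authors`, `kinds`, ...).
    Known(&'a str),
    /// A tag filter; holds the tag name with the leading `#` removed.
    Tag(&'a str),
    /// Anything else, skipped for forward compatibility.
    Ignored,
}

/// Classify a key the same way the visitor's `match` does.
fn route_key(key: &str) -> KeyRoute<'_> {
    match key {
        "ids" | "authors" | "kinds" | "since" | "until" | "limit" => KeyRoute::Known(key),
        other => match other.strip_prefix('#') {
            Some(name) => KeyRoute::Tag(name),
            None => KeyRoute::Ignored,
        },
    }
}
```

Multi-character tag names such as `#emoji` route to the tag map just like `#t`, which matches the decision not to restrict tag names to single letters.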
|
||||
|
||||
A round-trip through JSON preserves all fields:
|
||||
|
||||
```rust
|
||||
use coracle_lib::filters::Filter;
|
||||
|
||||
let filter = Filter::new()
|
||||
.add_kind(1)
|
||||
.add_tag("t", "nostr")
|
||||
.add_since(1_700_000_000)
|
||||
.add_limit(10);
|
||||
let json = serde_json::to_string(&filter).unwrap();
|
||||
let parsed: Filter = serde_json::from_str(&json).unwrap();
|
||||
assert_eq!(filter, parsed);
|
||||
```
|
||||
|
||||
## Identity and grouping
|
||||
|
||||
Two methods support deduplication and merging at higher layers —
|
||||
subscription managers, relay pools, and storage engines that need to
|
||||
detect redundant or combinable filters.
|
||||
|
||||
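`id` produces a hash of the entire filter. Two structurally identical
filters always produce the same id; the converse holds only up to hash
collisions, which a 64-bit hash makes unlikely but not impossible. Since
`Filter` derives `Hash`, we feed it through a standard `Hasher`, with no
need to round-trip through JSON. The standard library does not specify
`DefaultHasher`'s algorithm across Rust releases, so ids are best treated
as in-process keys rather than persisted values.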
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
use std::hash::{Hash, Hasher};
|
||||
use std::collections::hash_map::DefaultHasher;
|
||||
|
||||
impl Filter {
|
||||
/// Compute a deterministic identifier for this filter.
|
||||
///
|
||||
/// Two filters with the same id are structurally identical. The id
|
||||
/// is derived from the `Hash` implementation, which covers every
|
||||
/// field including `limit`.
|
||||
pub fn id(&self) -> u64 {
|
||||
let mut hasher = DefaultHasher::new();
|
||||
self.hash(&mut hasher);
|
||||
hasher.finish()
|
||||
}
|
||||
}
|
||||
```
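
A minimal sketch of why the derived `Hash` works as an identifier, using a hypothetical two-field stand-in (`HashedFilter`) and helper names chosen for this example:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical two-field stand-in for `Filter`.
#[derive(Hash)]
struct HashedFilter {
    kinds: Option<BTreeSet<u16>>,
    limit: Option<usize>,
}

/// Same approach as `Filter::id`: feed the derived `Hash` into a hasher.
fn id_of(filter: &HashedFilter) -> u64 {
    let mut hasher = DefaultHasher::new();
    filter.hash(&mut hasher);
    hasher.finish()
}

/// Build a filter from kinds given in arbitrary order.
fn with_kinds(kinds: &[u16], limit: Option<usize>) -> HashedFilter {
    HashedFilter {
        kinds: Some(kinds.iter().copied().collect()),
        limit,
    }
}
```

Because `BTreeSet` hashes its elements in sorted order, `with_kinds(&[1, 7], None)` and `with_kinds(&[7, 1], None)` produce the same id, while adding a `limit` changes it.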
|
||||
|
||||
`group` determines which filters can be merged by unioning their
|
||||
set fields. Two filters can only be merged if they have the same
|
||||
structural shape (which set fields are present, which tag names appear)
|
||||
*and* the same scalar constraints (time windows). A filter with a
|
||||
`limit` can never be merged — combining two limited queries into one
|
||||
would change the result semantics — so each limited filter gets a
|
||||
unique group key.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
static GROUP_COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);
|
||||
|
||||
impl Filter {
|
||||
/// Compute a group key that determines merge compatibility.
|
||||
///
|
||||
/// Filters in the same group can be merged by unioning their set
|
||||
/// fields (`ids`, `authors`, `kinds`, tag values). The group key
|
||||
/// captures:
|
||||
///
|
||||
/// - Which set fields are present and which tag names appear
|
||||
/// (structural shape)
|
||||
/// - The exact `since` and `until` values (different time windows
|
||||
/// cannot be combined)
|
||||
///
|
||||
/// A filter with a `limit` always gets a unique group key, because
|
||||
/// merging limited filters would change result-count semantics.
|
||||
pub fn group(&self) -> u64 {
|
||||
let mut hasher = DefaultHasher::new();
|
||||
|
||||
self.ids.is_some().hash(&mut hasher);
|
||||
self.authors.is_some().hash(&mut hasher);
|
||||
self.kinds.is_some().hash(&mut hasher);
|
||||
for name in self.tags.keys() {
|
||||
name.hash(&mut hasher);
|
||||
}
|
||||
|
||||
self.since.hash(&mut hasher);
|
||||
self.until.hash(&mut hasher);
|
||||
|
||||
if self.limit.is_some() {
|
||||
// Each limited filter gets a unique group — merging two
|
||||
// limited queries into one would change which events are
|
||||
// returned.
|
||||
GROUP_COUNTER
|
||||
.fetch_add(1, std::sync::atomic::Ordering::Relaxed)
|
||||
.hash(&mut hasher);
|
||||
}
|
||||
|
||||
hasher.finish()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Including tag names in the group key means that a filter on `#e` tags
|
||||
and a filter on `#p` tags land in different groups — as they should,
|
||||
since merging them by union would change the semantics. Likewise, two
|
||||
filters with different `since` or `until` values land in different
|
||||
groups, because a union of their sets under one time window would either
|
||||
over-fetch or under-fetch relative to what was requested.
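
The ingredients of the group key can be made visible by keeping them as a tuple instead of hashing them. This is a sketch under that assumption: `ShapeFilter`, `GroupKey`, `group_key`, and `shape` are hypothetical names, and the `limit` handling is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for `Filter` with just enough fields to show grouping.
struct ShapeFilter {
    kinds: Option<BTreeSet<u16>>,
    tags: BTreeMap<String, BTreeSet<String>>,
    since: Option<u64>,
}

/// Readable group key: field presence, tag names, and the exact time bound.
/// Set contents and tag values are deliberately absent.
type GroupKey = (bool, Vec<String>, Option<u64>);

fn group_key(f: &ShapeFilter) -> GroupKey {
    (f.kinds.is_some(), f.tags.keys().cloned().collect(), f.since)
}

/// Convenience constructor for the example.
fn shape(kinds: &[u16], tag_name: &str, tag_value: &str, since: Option<u64>) -> ShapeFilter {
    let mut tags = BTreeMap::new();
    tags.insert(tag_name.to_string(), BTreeSet::from([tag_value.to_string()]));
    ShapeFilter {
        kinds: Some(kinds.iter().copied().collect()),
        tags,
        since,
    }
}
```

Two filters that differ only in their kind values or tag values share a key; changing the tag name or the `since` value produces a different one.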
|
||||
|
||||
## Union and intersection
|
||||
|
||||
Two operations combine filters in different ways.
|
||||
|
||||
`union_filters` takes a list of filters and merges those that share
|
||||
the same group — same structural shape, same time window, no limit.
|
||||
Within each group it unions the set fields: ids, authors, kinds, and
|
||||
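tag values. The result is a shorter list that matches every event the
original list matched, with fewer individual filters to evaluate. When
merged filters differ in more than one set field, the merged filter can
also match combinations that none of the originals matched.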
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
/// Merge compatible filters by unioning their set fields.
|
||||
///
|
||||
/// Filters with the same [`group`](Filter::group) are combined into a
|
||||
/// single filter whose set fields are the union of the originals. The
|
||||
/// result matches the same events as the input but with fewer filters.
|
||||
pub fn union_filters(filters: &[Filter]) -> Vec<Filter> {
|
||||
let mut groups: BTreeMap<u64, Filter> = BTreeMap::new();
|
||||
|
||||
for filter in filters {
|
||||
let key = filter.group();
|
||||
groups
|
||||
.entry(key)
|
||||
.and_modify(|existing| {
|
||||
merge_sets(&mut existing.ids, &filter.ids);
|
||||
merge_sets(&mut existing.authors, &filter.authors);
|
||||
merge_sets(&mut existing.kinds, &filter.kinds);
|
||||
for (name, values) in &filter.tags {
|
||||
existing
|
||||
.tags
|
||||
.entry(name.clone())
|
||||
.or_default()
|
||||
.extend(values.iter().cloned());
|
||||
}
|
||||
})
|
||||
.or_insert_with(|| filter.clone());
|
||||
}
|
||||
|
||||
groups.into_values().collect()
|
||||
}
|
||||
|
||||
fn merge_sets<T: Ord + Clone>(
|
||||
target: &mut Option<BTreeSet<T>>,
|
||||
source: &Option<BTreeSet<T>>,
|
||||
) {
|
||||
match (target.as_mut(), source) {
|
||||
(Some(t), Some(s)) => t.extend(s.iter().cloned()),
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
```
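
A worked example of what unioning set fields does to the matched set, using a hypothetical two-field stand-in (`PairFilter`) with both fields always present. When two filters differ in a single field the union is exact; when they differ in both, the merged filter is broader than the pair it replaces.

```rust
use std::collections::BTreeSet;

/// Hypothetical two-field stand-in for `Filter`.
#[derive(Clone, Debug, PartialEq)]
struct PairFilter {
    kinds: BTreeSet<u16>,
    authors: BTreeSet<&'static str>,
}

impl PairFilter {
    fn new(kinds: &[u16], authors: &[&'static str]) -> Self {
        PairFilter {
            kinds: kinds.iter().copied().collect(),
            authors: authors.iter().copied().collect(),
        }
    }

    /// Both fields must accept the event (AND semantics).
    fn matches(&self, kind: u16, author: &str) -> bool {
        self.kinds.contains(&kind) && self.authors.contains(author)
    }

    /// Union both set fields, as the merge does for filters in one group.
    fn union(&self, other: &PairFilter) -> PairFilter {
        PairFilter {
            kinds: self.kinds.union(&other.kinds).copied().collect(),
            authors: self.authors.union(&other.authors).copied().collect(),
        }
    }
}
```

Merging "kind 1 by alice" with "kind 1 by bob" gives "kind 1 by alice or bob", which is exact. Merging "kind 1 by alice" with "kind 7 by bob" also accepts "kind 1 by bob", which neither input accepted.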
|
||||
|
||||
`intersect_filters` takes multiple groups of filters, each group
representing one independent query, and combines them into a single list
of filters. It does this by computing the cartesian product across
groups, combining each pair by unioning their set fields and tightening
their time windows: the latest `since`, the earliest `until`. Constraints
on different fields narrow the result as expected; where two filters
constrain the same field, the values are unioned rather than intersected,
so the combined filter accepts either side's values for that field.
Finally it passes the result through `union_filters` to collapse any
redundancy.
|
||||
|
||||
```rust {file=coracle-lib/src/filters.rs}
|
||||
/// Combine independent filter groups into filters that satisfy all of
|
||||
/// them.
|
||||
///
|
||||
/// Each inner `Vec<Filter>` represents one group of alternatives (OR).
|
||||
/// The result matches events that satisfy at least one filter from
|
||||
/// *every* group (AND across groups, OR within each group).
|
||||
///
|
||||
/// Set fields are unioned. Time windows are tightened: the latest
|
||||
/// `since` and earliest `until` win. If both filters have a `limit`,
|
||||
/// the larger one is kept. The result is simplified with
|
||||
/// [`union_filters`].
|
||||
pub fn intersect_filters(groups: &[Vec<Filter>]) -> Vec<Filter> {
|
||||
let Some(first) = groups.first() else {
|
||||
return vec![];
|
||||
};
|
||||
|
||||
let mut result: Vec<Filter> = first.clone();
|
||||
|
||||
for filters in &groups[1..] {
|
||||
let mut combined = Vec::with_capacity(result.len() * filters.len());
|
||||
|
||||
for f1 in &result {
|
||||
for f2 in filters {
|
||||
combined.push(combine_pair(f1, f2));
|
||||
}
|
||||
}
|
||||
|
||||
result = combined;
|
||||
}
|
||||
|
||||
union_filters(&result)
|
||||
}
|
||||
|
||||
fn combine_pair(a: &Filter, b: &Filter) -> Filter {
|
||||
let mut f = Filter::new();
|
||||
|
||||
f.ids = union_option_sets(&a.ids, &b.ids);
|
||||
f.authors = union_option_sets(&a.authors, &b.authors);
|
||||
f.kinds = union_option_sets(&a.kinds, &b.kinds);
|
||||
|
||||
for (name, values) in a.tags.iter().chain(b.tags.iter()) {
|
||||
f.tags
|
||||
.entry(name.clone())
|
||||
.or_default()
|
||||
.extend(values.iter().cloned());
|
||||
}
|
||||
|
||||
f.since = match (a.since, b.since) {
|
||||
(Some(a), Some(b)) => Some(a.max(b)),
|
||||
(s, None) | (None, s) => s,
|
||||
};
|
||||
|
||||
f.until = match (a.until, b.until) {
|
||||
(Some(a), Some(b)) => Some(a.min(b)),
|
||||
(u, None) | (None, u) => u,
|
||||
};
|
||||
|
||||
f.limit = match (a.limit, b.limit) {
|
||||
(Some(a), Some(b)) => Some(a.max(b)),
|
||||
(l, None) | (None, l) => l,
|
||||
};
|
||||
|
||||
f
|
||||
}
|
||||
|
||||
fn union_option_sets<T: Ord + Clone>(
|
||||
a: &Option<BTreeSet<T>>,
|
||||
b: &Option<BTreeSet<T>>,
|
||||
) -> Option<BTreeSet<T>> {
|
||||
match (a, b) {
|
||||
(Some(a), Some(b)) => {
|
||||
let mut merged = a.clone();
|
||||
merged.extend(b.iter().cloned());
|
||||
Some(merged)
|
||||
}
|
||||
(Some(s), None) | (None, Some(s)) => Some(s.clone()),
|
||||
(None, None) => None,
|
||||
}
|
||||
}
|
||||
```
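
The time-window tightening can be sketched on its own. `tighten_since` and `tighten_until` repeat the `match` arms from `combine_pair`; `window_is_empty` is a hypothetical helper added for this example to show that tightening can produce a window no event satisfies.

```rust
/// Tighten two optional lower bounds: the later `since` wins.
fn tighten_since(a: Option<u64>, b: Option<u64>) -> Option<u64> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a.max(b)),
        (s, None) | (None, s) => s,
    }
}

/// Tighten two optional upper bounds: the earlier `until` wins.
fn tighten_until(a: Option<u64>, b: Option<u64>) -> Option<u64> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (u, None) | (None, u) => u,
    }
}

/// Hypothetical check: a window is empty when its inclusive bounds cross.
fn window_is_empty(since: Option<u64>, until: Option<u64>) -> bool {
    matches!((since, until), (Some(s), Some(u)) if s > u)
}
```

Combining a filter with `since = 400` and one with `until = 300` yields bounds that cross, so the combined filter cannot match any event. Because both bounds are inclusive, `since = 300, until = 300` still admits events created at exactly that second.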
|
||||
|
||||
## What's next
|
||||
|
||||
The next chapter extends filters with NIP-50 full-text search — an
|
||||
optional `search` field that some relays support for content-based
|
||||
queries.
|
||||
+39
-35
@@ -11,53 +11,57 @@
|
||||
- [Kinds](06-kinds.md)
|
||||
- [Addresses](07-addresses.md)
|
||||
- [Proof of Work](08-proof-of-work.md)
|
||||
- [Filters](09-filters.md)
|
||||
- [Expiring Events](09-expiring-events.md)
|
||||
- [Protected Events](10-protected-events.md)
|
||||
- [Filters](11-filters.md)
|
||||
- [Search](12-search.md)
|
||||
|
||||
## Domain
|
||||
|
||||
- [Relay Selections](10-relay-selections.md)
|
||||
- [Relay Metadata](11-relay-metadata.md)
|
||||
- [Relay Membership](12-relay-membership.md)
|
||||
- [Profiles](13-profiles.md)
|
||||
- [Follows](14-follows.md)
|
||||
- [Microblogging](15-microblogging.md)
|
||||
- [Reactions](16-reactions.md)
|
||||
- [Reports](17-reports.md)
|
||||
- [Emojis](18-emojis.md)
|
||||
- [Zaps](19-zaps.md)
|
||||
- [Rooms](20-rooms.md)
|
||||
- [Relay Selections](13-relay-selections.md)
|
||||
- [Relay Metadata](14-relay-metadata.md)
|
||||
- [Relay Membership](15-relay-membership.md)
|
||||
- [Profiles](16-profiles.md)
|
||||
- [Follows](17-follows.md)
|
||||
- [Microblogging](18-microblogging.md)
- [Reactions](19-reactions.md)
- [Reports](20-reports.md)
- [Emojis](21-emojis.md)
- [Zaps](22-zaps.md)
- [Rooms](23-rooms.md)
- [Open Timestamp Attestations](24-open-timestamp-attestations.md)

## Networking

- [Relay Connections](21-relay-connections.md)
- [Relay Authentication](22-relay-authentication.md)
- [Relay Policies](23-relay-policies.md)
- [Server Authentication](24-server-authentication.md)
- [Relay Management API](25-relay-management-api.md)
- [Blossom Media Storage](26-blossom-media-storage.md)
- [Relay Connections](25-relay-connections.md)
- [Relay Authentication](26-relay-authentication.md)
- [Relay Policies](27-relay-policies.md)
- [Server Authentication](28-server-authentication.md)
- [Relay Management API](29-relay-management-api.md)
- [Blossom Media Storage](30-blossom-media-storage.md)

## Signers

- [Signer Interface](27-signer-interface.md)
- [Secret Signers](28-secret-signers.md)
- [Remote Signers](29-remote-signers.md)
- [Android Signers](30-android-signers.md)
- [Browser Signers](31-browser-signers.md)
- [Signer Interface](31-signer-interface.md)
- [Secret Signers](32-secret-signers.md)
- [Remote Signers](33-remote-signers.md)
- [Android Signers](34-android-signers.md)
- [Browser Signers](35-browser-signers.md)

## Content

- [Entities](32-entities.md)
- [Relays](33-relays.md)
- [Rooms](34-rooms.md)
- [Links](35-links.md)
- [Lightning](36-lightning.md)
- [Cashu](37-cashu.md)
- [Emojis](38-emojis.md)
- [Topics](39-topics.md)
- [Code](40-code.md)
- [Entities](36-entities.md)
- [Relays](37-relays.md)
- [Rooms](38-rooms.md)
- [Links](39-links.md)
- [Lightning](40-lightning.md)
- [Cashu](41-cashu.md)
- [Emojis](42-emojis.md)
- [Topics](43-topics.md)
- [Code](44-code.md)

## Storage

- [Event Repository](41-event-repository.md)
- [In Memory Backend](42-in-memory-backend.md)
- [Sqlite Backend](43-sqlite-backend.md)
- [Event Repository](45-event-repository.md)
- [In Memory Backend](46-in-memory-backend.md)
- [Sqlite Backend](47-sqlite-backend.md)

@@ -0,0 +1,207 @@
# Plan: Filters

## Topic Summary

Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context — not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter covers the filter structure,
matching semantics (AND within a filter, OR across filters), tag filters, timestamp constraints,
limits, construction, hashing, grouping, and cardinality estimation.

## Chapter Outline

1. **Introduction** — Filters as a general-purpose event matching primitive. Not tied to relays;
   they're a predicate you can evaluate against any event. Analogy to database WHERE clauses.

2. **The Filter Struct** — Walk through the fields:
   - `ids: Option<BTreeSet<[u8; 32]>>` — match event IDs
   - `authors: Option<BTreeSet<PublicKey>>` — match event authors
   - `kinds: Option<BTreeSet<u16>>` — match event kinds
   - `tags: BTreeMap<String, BTreeSet<String>>` — match tag values by tag name
   - `since: Option<u64>` — lower bound on `created_at` (inclusive)
   - `until: Option<u64>` — upper bound on `created_at` (inclusive)
   - `limit: Option<usize>` — result count constraint (not a matching criterion)

   Explain `Option` semantics: `None` = no constraint, `Some(empty set)` = matches nothing.
   Note that `limit` is metadata for consumers, not part of matching logic.

3. **Matching** — Implement `matches(&self, event: &Event) -> bool`:
   - AND semantics: all present fields must match
   - Early exit on scalar checks (ids, kinds, authors) before tag matching
   - Tag matching: for each tag filter, event must have at least one tag with that name
     whose value is in the filter's set (OR within a tag filter, AND across tag filters)
   - Timestamp: `since <= created_at <= until`
   - `limit` is ignored
   - Implement `matches_any(filters: &[Filter], event: &Event) -> bool` as a free function
     for OR-across-filters semantics

4. **Construction** — Builder pattern with fluent API:
   - `Filter::new()` — empty filter (matches everything)
   - `.id(id)` / `.ids(iter)` — add event IDs
   - `.author(pk)` / `.authors(iter)` — add authors
   - `.kind(k)` / `.kinds(iter)` — add kinds
   - `.tag(name, value)` / `.tags(name, iter)` — add arbitrary tag filters
   - `.since(ts)` / `.until(ts)` — set timestamp bounds
   - `.limit(n)` — set result limit
   - `.address(addr)` — convenience: sets kind, author, and `#d` tag from an Address

5. **Serialization** — Custom serde implementation:
   - Standard fields serialize normally, skip `None` fields
   - `tags` BTreeMap flattened: key `"foo"` becomes JSON key `"#foo"` with array value
   - Handle `limit: 0` vs omitted limit (Some(0) serializes as `"limit": 0`)
   - Deserialization: any key starting with `#` collected into `tags` map
   - Show round-trip example

6. **Identity and Grouping** — Utilities for deduplication and merging:
   - `filter_id(filter) -> String` — deterministic hash of filter contents for dedup
   - `filter_group(filter) -> String` — hash of structural fields only (ids, kinds, authors,
     tag keys) excluding values and temporal fields. Two filters in the same group can be
     merged by unioning their value sets.

7. **Cardinality** — `cardinality(&self) -> Option<usize>`:
   - Returns `Some(n)` when the maximum number of matching events can be determined
   - `ids` present → `ids.len()`
   - All kinds are replaceable + `authors` present → `authors.len() * kinds.len()`
   - All kinds are addressable + `authors` present + `#d` present →
     `authors.len() * kinds.len() * d_values.len()`
   - Otherwise → `None` (unbounded)
   - If explicit `limit` is set, return `min(limit, computed)` when computed is Some,
     or `Some(limit)` when computed is None
   - Empty set in any field → `Some(0)`

8. **Recap** — Summarize filter as a composable primitive. Tease usage in relay connections
   chapter.

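The matching rules in outline item 3 can be sketched with only the standard library. This is a minimal illustration, not the chapter's implementation: the `Event` struct here is a simplified stand-in (plain `[u8; 32]` for the author instead of `PublicKey`, tags as `Vec<Vec<String>>`), and those shapes are assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the crate's `Event` (assumed shape, illustration only).
#[derive(Debug, Clone)]
pub struct Event {
    pub id: [u8; 32],
    pub pubkey: [u8; 32], // stand-in for `PublicKey`
    pub kind: u16,
    pub created_at: u64,
    pub tags: Vec<Vec<String>>, // each tag is [name, value, ...]
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    pub fn matches(&self, event: &Event) -> bool {
        // `None` means "no constraint"; `Some(empty)` can never contain the value.
        if self.ids.as_ref().is_some_and(|s| !s.contains(&event.id)) {
            return false;
        }
        if self.kinds.as_ref().is_some_and(|s| !s.contains(&event.kind)) {
            return false;
        }
        if self.authors.as_ref().is_some_and(|s| !s.contains(&event.pubkey)) {
            return false;
        }
        // Both timestamp bounds are inclusive.
        if self.since.is_some_and(|t| event.created_at < t) {
            return false;
        }
        if self.until.is_some_and(|t| event.created_at > t) {
            return false;
        }
        // AND across tag names, OR across the values listed for one name.
        // `limit` is deliberately never consulted.
        self.tags.iter().all(|(name, wanted)| {
            event.tags.iter().any(|tag| {
                tag.first() == Some(name) && tag.get(1).is_some_and(|v| wanted.contains(v))
            })
        })
    }
}

/// OR across filters: the event matches if any single filter matches.
pub fn matches_any(filters: &[Filter], event: &Event) -> bool {
    filters.iter().any(|f| f.matches(event))
}
```

With this shape, `Filter::default()` matches every event, while a filter whose `kinds` is `Some(BTreeSet::new())` matches none, which is the `None` versus `Some(empty)` distinction from outline item 2.
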
## API Design

```rust
// --- Filter struct ---

// Serialize/Deserialize are implemented by hand so that tags flatten to `#name`
// keys (see Design Decisions, item 7); they are therefore not derived here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<PublicKey>>,
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    // Flattened in serde as #key -> [values]
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

// --- Construction (builder, consuming self) ---

impl Filter {
    pub fn new() -> Self
    pub fn id(self, id: [u8; 32]) -> Self
    pub fn ids(self, ids: impl IntoIterator<Item = [u8; 32]>) -> Self
    pub fn author(self, author: PublicKey) -> Self
    pub fn authors(self, authors: impl IntoIterator<Item = PublicKey>) -> Self
    pub fn kind(self, kind: u16) -> Self
    pub fn kinds(self, kinds: impl IntoIterator<Item = u16>) -> Self
    pub fn tag(self, name: impl Into<String>, value: impl Into<String>) -> Self
    pub fn tags(self, name: impl Into<String>, values: impl IntoIterator<Item = impl Into<String>>) -> Self
    pub fn since(self, since: u64) -> Self
    pub fn until(self, until: u64) -> Self
    pub fn limit(self, limit: usize) -> Self
    pub fn address(self, addr: &Address) -> Self
}

// --- Matching ---

impl Filter {
    pub fn matches(&self, event: &Event) -> bool
    pub fn cardinality(&self) -> Option<usize>
}

pub fn matches_any(filters: &[Filter], event: &Event) -> bool

// --- Identity and grouping ---

pub fn filter_id(filter: &Filter) -> String
pub fn filter_group(filter: &Filter) -> String
```

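To make the consuming-`self` builder signatures above concrete, here is a small sketch of how a few of them could be implemented. It covers only a subset of the fields and leaves out `PublicKey` and `Address`; the bodies are assumptions chosen to satisfy the listed signatures, not the chapter's code.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    /// Empty filter: no constraints, matches everything.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn kind(self, kind: u16) -> Self {
        self.kinds([kind])
    }

    pub fn kinds(mut self, kinds: impl IntoIterator<Item = u16>) -> Self {
        // The first call turns `None` into `Some(set)`; later calls extend the set.
        self.kinds.get_or_insert_with(BTreeSet::new).extend(kinds);
        self
    }

    pub fn tag(self, name: impl Into<String>, value: impl Into<String>) -> Self {
        self.tags(name, [value])
    }

    pub fn tags(
        mut self,
        name: impl Into<String>,
        values: impl IntoIterator<Item = impl Into<String>>,
    ) -> Self {
        let set = self.tags.entry(name.into()).or_default();
        for value in values {
            set.insert(value.into());
        }
        self
    }

    pub fn since(mut self, since: u64) -> Self {
        self.since = Some(since);
        self
    }

    pub fn limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }
}
```

One consequence of this particular choice is that calling `.kinds(...)` with an empty iterator yields `Some(empty)`, a filter that matches nothing, rather than leaving the field at `None`. Whether the real builder should behave that way is a design question the plan does not settle.
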
## Code Organization

All code in `coracle-lib/src/filters.rs`. Single file, single module. Add `pub mod filters;`
to `coracle-lib/src/lib.rs`.

## Dependencies

- `serde` / `serde_json` — already used in the events chapter for serialization
- `std::collections::BTreeSet` / `BTreeMap` — stdlib, no external crate
- `sha2` — already used in events chapter for hashing; reuse for filter_id

No new external dependencies needed.

## Narrative Notes

- Open by framing filters as a standalone primitive. They're a predicate, not a protocol
  message. The fact that relays use them in REQ is one application, but they're equally
  useful for client-side filtering, local storage queries, and event routing decisions.

- The `Option` semantics deserve careful explanation. Show the difference:
  `None` = "I don't care about this field" vs `Some(empty)` = "this field must match
  one of these zero values (i.e., nothing matches)". This is the key insight that makes
  filters composable.

- When explaining matching, walk through a concrete example: construct a filter, show an
  event, trace through the matching logic field by field.

- For tag filters, emphasize that tag keys are arbitrary strings — not restricted to
  single letters. The single-letter convention is a relay indexing optimization, not a
  protocol constraint.

- `limit` gets a brief note: it's not part of matching. It tells a consumer (relay, storage
  engine) how many results to return. Include it in the struct because it's part of the
  NIP-01 filter object, but `matches()` ignores it.

- For serialization, the interesting part is the tag flattening. Show the JSON representation
  and explain how `tags: {"e": {"abc"}, "p": {"def"}}` becomes `{"#e": ["abc"], "#p": ["def"]}`.

- `filter_id` and `filter_group` are utility functions, not methods, because they serve
  infrastructure concerns (dedup, subscription management) rather than core filter semantics.

- `cardinality` leverages kind classification from the kinds chapter. Connect the dots:
  replaceable events have at most one per author per kind, addressable events have at most
  one per author per kind per identifier.

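The tag flattening described in the serialization note can be illustrated without serde. The sketch below writes the JSON by hand purely to show the key mapping and the `limit: 0` case; the actual chapter uses custom serde implementations, and the helper names `flatten_to_json` and `tag_name_from_key` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Quote a string as a JSON string literal (minimal escaping, enough for the sketch).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: render only the `limit` and `tags` parts of a filter.
pub fn flatten_to_json(
    limit: Option<usize>,
    tags: &BTreeMap<String, BTreeSet<String>>,
) -> String {
    let mut fields = Vec::new();
    if let Some(n) = limit {
        // `Some(0)` is written as `"limit":0`; only `None` is omitted.
        fields.push(format!("\"limit\":{n}"));
    }
    for (name, values) in tags {
        // Tag name `e` becomes the top-level key `#e`.
        let items: Vec<String> = values.iter().map(|v| json_string(v)).collect();
        let key = json_string(&format!("#{name}"));
        fields.push(format!("{}:[{}]", key, items.join(",")));
    }
    format!("{{{}}}", fields.join(","))
}

/// Inverse key mapping for deserialization: `"#e"` gives `Some("e")`, `"kinds"` gives `None`.
pub fn tag_name_from_key(key: &str) -> Option<&str> {
    key.strip_prefix('#')
}
```

Because `BTreeMap` and `BTreeSet` iterate in sorted order, the output is deterministic, which is the same property the plan relies on for hashing in `filter_id`.
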
## Design Decisions

1. **`Option<BTreeSet<T>>` for set fields** — Preserves the None-vs-empty distinction that
   NIP-01 requires. BTreeSet gives O(log n) membership checks and deterministic iteration
   order for serialization/hashing. (Research: rust-nostr uses this approach.)

2. **Arbitrary string tag keys** — Not restricted to single letters. The protocol allows any
   tag name; single-letter indexing is a relay optimization. Consumers can enforce restrictions.

3. **Minimal builder API** — `.id()`, `.author()`, `.kind()`, `.tag()`, `.address()` plus
   plural variants. No convenience methods for every common tag (#e, #p, #t, etc.) — the
   generic `.tag("e", value)` is clear enough. Keeps the chapter focused.

4. **`limit` in struct but not in matching** — NIP-01 defines it as part of the filter object,
   so it belongs in the struct. But it's a result constraint, not a predicate, so `matches()`
   ignores it. (Research: NDK, nostr-tools, all implementations agree on this.)

5. **Free functions for identity/grouping** — `filter_id` and `filter_group` are not methods
   because they serve infrastructure concerns. Keeps the Filter impl block focused on
   construction and matching.

6. **`cardinality` returns `Option<usize>`** — `None` means unbounded. Leverages kind
   classification (replaceable, addressable) to compute tight upper bounds when possible.
   (Research: nostr-tools' `getFilterLimit`, nostrlib's `GetTheoreticalLimit`.)

7. **Custom serde for tag flattening** — Tags serialize as `#name` keys at the top level of
   the JSON object, matching the NIP-01 wire format. This requires custom Serialize/Deserialize
   implementations rather than derive macros.

8. **`.address()` convenience** — Translates an Address into the correct combination of kind,
   author, and #d tag filter. This is the one domain-aware convenience method because
   address-based filtering is extremely common and error-prone to construct manually.

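Decision 6 and outline item 7 can be sketched as follows. The kind ranges are the ones NIP-01 defines (replaceable: 0, 3, and 10000 to 19999; addressable: 30000 to 39999), but the helper names `is_replaceable` and `is_addressable` are hypothetical stand-ins for whatever the kinds chapter exposes, and authors are plain byte arrays rather than `PublicKey`.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical helpers; the ranges follow NIP-01.
fn is_replaceable(kind: u16) -> bool {
    kind == 0 || kind == 3 || (10_000..20_000).contains(&kind)
}

fn is_addressable(kind: u16) -> bool {
    (30_000..40_000).contains(&kind)
}

#[derive(Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>, // stand-in for `PublicKey`
    pub kinds: Option<BTreeSet<u16>>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

impl Filter {
    /// Upper bound on the number of matching events, or `None` if unbounded.
    pub fn cardinality(&self) -> Option<usize> {
        // An empty set in any field can never match.
        let has_empty_set = self.ids.as_ref().is_some_and(|s| s.is_empty())
            || self.authors.as_ref().is_some_and(|s| s.is_empty())
            || self.kinds.as_ref().is_some_and(|s| s.is_empty())
            || self.tags.values().any(|s| s.is_empty());
        if has_empty_set {
            return Some(0);
        }

        // Collect every bound that applies, then take the smallest.
        let mut bounds: Vec<usize> = Vec::new();
        bounds.extend(self.limit);
        bounds.extend(self.ids.as_ref().map(|s| s.len()));
        if let (Some(authors), Some(kinds)) = (&self.authors, &self.kinds) {
            let pairs = authors.len().saturating_mul(kinds.len());
            if kinds.iter().all(|k| is_replaceable(*k)) {
                // At most one event per (author, kind).
                bounds.push(pairs);
            } else if kinds.iter().all(|k| is_addressable(*k)) {
                // At most one event per (author, kind, d-identifier).
                if let Some(d_values) = self.tags.get("d") {
                    bounds.push(pairs.saturating_mul(d_values.len()));
                }
            }
        }
        bounds.into_iter().min()
    }
}
```

A filter mixing replaceable and addressable kinds falls through to `None` here, following the "all kinds are ..." wording of the outline; a tighter bound for mixed sets would be possible but is not part of the plan.
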
## Open Questions

- Should `filter_group` include tag *names* (keys) in the group hash, or only the set of
  field names that are present? Including tag names means `{#e: [...]}` and `{#p: [...]}`
  are in different groups (correct for merging). Leaning toward including tag names.
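
To make the option being leaned toward concrete, here is a sketch of a group key that includes tag names. It returns the unhashed key for readability, whereas the planned `filter_group` returns a hash, and the name `filter_group_key` and the comma separator are assumptions for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
pub struct Filter {
    pub ids: Option<BTreeSet<[u8; 32]>>,
    pub authors: Option<BTreeSet<[u8; 32]>>, // stand-in for `PublicKey`
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
    pub limit: Option<usize>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
}

/// Hypothetical sketch: list which structural fields are present, including
/// each tag name, while ignoring all values and the temporal fields.
pub fn filter_group_key(filter: &Filter) -> String {
    let mut parts: Vec<String> = Vec::new();
    if filter.ids.is_some() {
        parts.push("ids".to_string());
    }
    if filter.authors.is_some() {
        parts.push("authors".to_string());
    }
    if filter.kinds.is_some() {
        parts.push("kinds".to_string());
    }
    // Tag names are part of the key, so `#e` and `#p` filters land in different groups.
    parts.extend(filter.tags.keys().map(|name| format!("#{name}")));
    parts.sort();
    // Assumes tag names contain no commas; a real version would hash an unambiguous encoding.
    parts.join(",")
}
```

Under this sketch, two filters with different `kinds` and different `#e` values share the key `#e,kinds`, while swapping `#e` for `#p` produces a different key.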
@@ -0,0 +1,241 @@
# Research: Filters

## Topic Summary

Filters are the NIP-01 data structure for matching events. They form an elegant primitive for
matching events independent of the client/relay context — not just for REQ messages, but as a
general-purpose event matching and querying abstraction. The chapter should cover the filter
structure, matching semantics (AND within a filter, OR across filters), tag filters, timestamp
constraints, limits, and programmatic construction/manipulation of filters.

## Philosophy

From `ref/building-nostr`:

**Filters as a conceptual category for tags**: The author identifies "filter tags" as one of
three tag categories (alongside data and behavior tags). Filter tags are "especially useful for
filtering and retrieval" and tend to be single-letter because relays only index single-letter
tags to reduce database overhead.

**Filter structure (NIP-01)**: Standard fields are `ids`, `authors`, `kinds`, `since`, `until`,
`limit`. Tag filters are created by prefixing tag names with `#` (e.g., `#p`, `#e`). Extensions
include prefix matching and NIP-50 `search`. Negative matches were proposed but rejected due to
relay performance concerns.

**Filters as a routing problem**: "Where to send a given filter" is distinct from "where to send
a given event." Filters have less information than events, making routing harder. A routing
heuristic "connects a filter that might be constructed to support a particular use case with the
relay where matching events are stored." This is analogous to database indexes.

**Design principles**:
- **Minimalism**: Filters match discrete criteria without complex negation or boolean logic
- **Decentralization**: Clients understand routing heuristics; relays don't need to understand intent
- **Extensibility**: The `#tag` convention allows arbitrary new filters without protocol changes
- **Trade-offs**: Single-letter indexing limits expressiveness but maintains relay scalability
- **Partition tolerance**: Missing a relay means missing some events, which is acceptable

## Reference Implementation Analysis

### applesauce

**Types**: `Filter` extends nostr-tools' `CoreFilter` with NIP-91 AND operator support (`&`-prefixed
tag names) and NIP-50 `search` field. All standard NIP-01 fields present.

**Matching**: `matchFilter(filter, event)` checks basic fields first for early rejection, then
uses `getIndexableTags(event)` to build a cached `Set<string>` of `"tagName:value"` pairs on the
event (via Symbol key). NIP-91 AND tags processed before OR tags. `matchFilters` implements OR
across filter arrays.

**Utilities**: `mergeFilters(...filters)` unions array fields and deduplicates; takes minimum
`limit`, minimum `since`, maximum `until`. `isFilterEqual` uses `fast-deep-equal` for subscription
deduplication. `createFilterMap` distributes filters across relays by author.

**SQL layer**: Separate `buildFilterConditions` translates filters to SQL WHERE clauses. AND tags
use `GROUP BY`/`HAVING COUNT` subqueries. OR tags use `IN` subqueries.

**Patterns**: Symbol-based memoization of tag indexes on event objects. Early exit optimization.
Only indexes single-letter tags. RxJS integration for streaming filter results.

### ndk

**Types**: `NDKFilter<K extends NDKKind>` — generic parameterized type with all NIP-01 fields plus
`search`. Dynamic tag properties via `[key: #${string}]`.

**Matching**: `matchFilter(filter, event)` uses `indexOf()` for array membership. Special case:
`#t` (hashtag) tags are case-insensitive. Does NOT check `limit` or `search` — these are
submission constraints, not matching criteria. Short-circuits on first mismatch.

**Utilities**:
- `mergeFilters()` — unions arrays, deduplicates via Set. Preserves filters with `limit` separately (limits can't be merged). Returns array.
- `filterFingerprint()` — deterministic hash of filter structure for subscription grouping
- `compareFilter()` — checks if one filter is a subset of another (for cache-hit validation)
- `filterFromId()` — converts bech32 identifiers to filters
- `filterForEventsTaggingId()` — creates tag filters for events referencing a given ID

**Validation**: Three modes (VALIDATE/FIX/IGNORE). Checks for undefined values, type correctness,
hex format, kind range. Guardrails catch common mistakes: empty filters, bech32 in hex arrays,
`since > until`, `#t` with literal `#` prefix.

**Patterns**: Generic type parameterization. Pluggable validation modes. Readable subscription IDs
generated from filter structure.

### nostr-gadgets

Uses `nostr-tools` filter types and functions directly — no custom filter implementation.

**Construction patterns**: Mutable accumulation (`filter.authors?.push(target)`), inline literals,
spread-based composition (`{ ...f, authors: [pubkey], since: newest }`).

**Filter-based deletion**: Converts event tags to filter arrays for batch deletion operations.

**Multi-level filtering**: Filters used at query construction, relay permission checking (purgatory),
and client-side event matching.

### nostrlib

**Types**: Go struct with `IDs []ID`, `Kinds []Kind`, `Authors []PubKey`, `Tags TagMap`,
`Since Timestamp`, `Until Timestamp`, `Limit int`, `Search string`, `LimitZero bool`. Uses
fixed-size byte arrays for IDs/PubKeys.

**Matching**: Two methods:
- `Matches(event)` — full matching including timestamp constraints
- `MatchesIgnoringTimestampConstraints(event)` — `[//go:inline]` optimized, used for live events after EOSE

Tag matching via `tags.ContainsAny(tagName, values)`. Uses `slices.Contains()` for array membership.

**Utilities**: `Clone()` deep copies. `FilterEqual()` order-independent comparison. `GetTheoreticalLimit()`
estimates max results considering replaceability. No merging functions.

**Serialization**: Custom easyjson codec with `xhex` for fast hex encoding. `LimitZero` bool
distinguishes `"limit": 0` from omitted limit.

**Patterns**: Pure data structure with stateless methods. Subscription switches matching function
after EOSE. Query optimizer scores tags by "goodness" for index selection.

### nostr-tools

**Types**: Simple TypeScript type with all NIP-01 fields. Index signature `[key: #${string}]` for
dynamic tag filters. All properties optional.

**Matching**: `matchFilter(filter, event)` — conjunctive (AND) matching. Uses `indexOf()` for
membership. Iterates filter properties for `#`-prefixed tag filters. Both `since` and `until` are
inclusive. `matchFilters` implements OR across array.

**Utilities**:
- `mergeFilters(...filters)` — unions array properties, takes max `limit`, min `since`, max `until`
- `getFilterLimit(filter)` — computes intrinsic limit considering replaceability:
  - Empty arrays → 0
  - IDs → `ids.length`
  - Replaceable kinds → `authors.length * kinds.length`
  - Addressable kinds → `authors.length * kinds.length * #d.length`
  - Returns minimum across all applicable constraints

**Patterns**: Minimalist, functional, no external dependencies for filter logic. Pure JavaScript.
Early exit on mismatch. Kind classification integration for limit calculation.

**Design**: Self-contained. No validation. `search` field defined but not used in matching logic.

### rust-nostr

**Types**: `Filter` struct with `Option<BTreeSet<T>>` for all set fields. Uses `BTreeSet` for
O(log n) lookups and deterministic serialization. `generic_tags: BTreeMap<SingleLetterTag, BTreeSet<String>>`
for dynamic tag filters.

**Matching**: `match_event(&self, event, opts)` with `MatchEventOptions` controlling which fields
to check (7 boolean flags). Individual match methods are `#[inline]`. Tag matching uses lazy-initialized
`event.tags.indexes()` (OnceCell pattern). NIP-50 search: case-insensitive substring via `.windows()`.

**Builder pattern**: Fluent chainable methods consuming `self`: `Filter::new().kind(k).author(pk)`.
Convenience methods for common tags: `.event()`, `.pubkey()`, `.hashtag()`, `.identifier()`,
`.coordinate()`. Generic `.custom_tag()` for arbitrary tags. Remove methods return `None` if set
becomes empty.

**Option semantics**: `None` = no constraint (matches all). `Some(empty_set)` = matches nothing.
This distinction is explicitly documented (GitHub issue #302).

**Utilities**: `is_empty()`, `extract_public_keys()`. No merging/combining API — multiple filters
handled at protocol layer.

**Patterns**: no_std compatible (uses `alloc`). BTreeSet for deterministic ordering. Custom serde
with `#[serde(flatten)]` for generic tags.

### welshman

**Types**: Standard NIP-01 filter type. `neverFilter = {ids: []}` constant for "matches nothing."

**Matching**: Delegates to nostr-tools for NIP-01 matching. Extends with search: splits by
whitespace, case-insensitive, requires ALL terms match (AND logic).

**Utilities**:
- `getFilterId(filter)` — deterministic hash for deduplication (sort keys, join, hash)
- `calculateFilterGroup(filter)` — groups by matching space (structural fields vs temporal)
- `unionFilters(filters)` — groups by `calculateFilterGroup`, merges arrays within groups
- `intersectFilters(groups)` — Cartesian product across filter groups with intelligent merging (max `since`, min `until`, max `limit`, concatenate `search`)
- `getIdFilters(idsOrAddresses)` — converts mix of IDs and addresses to filters
- `getReplyFilters(events)` — generates filters for replies (#e for regular, #a for replaceable)
- `addRepostFilters(filters)` — adds repost kind variants
- `trimFilter(filter)` — caps array fields at 1000 items with random sampling
- `getFilterGenerality()` — heuristic score 0 (specific) to 1 (general)

**Patterns**: Functional composition. Immutable transformations. Hash-based deduplication.
Domain-driven builders for common query patterns.

## Common Patterns

1. **Type structure**: All implementations use optional fields. Missing = no constraint. The `#tag`
   convention for dynamic tag filters is universal.

2. **AND/OR semantics**: Universal agreement — AND within a single filter, OR across an array of
   filters. This is fundamental to NIP-01.

3. **Matching order**: Most implementations check scalar fields first (ids, kinds, authors) for
   early exit before the more expensive tag matching.

4. **Tag indexing**: Several implementations build cached indexes on events for efficient repeated
   matching (applesauce: Symbol-based Set cache; rust-nostr: OnceCell BTreeMap).

5. **No negation**: No implementation supports negative matching (NOT). This aligns with protocol
   design — rejected for relay performance reasons.

6. **Limit semantics**: `limit` is not a matching criterion — it's a result count constraint.
   Most matching functions ignore it. `getFilterLimit`/`GetTheoreticalLimit` computes intrinsic
   upper bounds based on kind replaceability.

7. **Merging**: Most implementations provide union-style filter merging. Array fields are unioned
   and deduplicated. Scalar fields use min/max logic.

8. **Option vs empty**: rust-nostr explicitly distinguishes `None` (no constraint) from
   `Some(empty)` (matches nothing). Other implementations handle this implicitly.

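As an illustration of pattern 7 under the `Option<BTreeSet<T>>` model from pattern 8, the sketch below unions set fields and widens the time window. It is not a port of any library listed above: treating a missing field as "unconstrained" in the result is an assumption that follows from the `None` semantics, and `limit` is left out because limits do not merge cleanly.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Filter {
    pub authors: Option<BTreeSet<[u8; 32]>>, // stand-in for a public key type
    pub kinds: Option<BTreeSet<u16>>,
    pub since: Option<u64>,
    pub until: Option<u64>,
}

fn union_sets<T: Ord + Clone>(
    a: &Option<BTreeSet<T>>,
    b: &Option<BTreeSet<T>>,
) -> Option<BTreeSet<T>> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.union(y).cloned().collect()),
        // If either side has no constraint, the merged filter has none either.
        _ => None,
    }
}

/// Union-style merge: the result matches every event that `a` or `b` matches.
pub fn merge(a: &Filter, b: &Filter) -> Filter {
    Filter {
        authors: union_sets(&a.authors, &b.authors),
        kinds: union_sets(&a.kinds, &b.kinds),
        // Widest window: earliest `since`, latest `until`; an open bound stays open.
        since: a.since.zip(b.since).map(|(x, y)| x.min(y)),
        until: a.until.zip(b.until).map(|(x, y)| x.max(y)),
    }
}
```

The merged filter is a superset, not an exact union: merging `{authors: [a], kinds: [1]}` with `{authors: [b], kinds: [7]}` also admits kind 7 events from `a`. Callers that need exact results would still re-check events against the original filters.
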
## Considerations for Our Implementation

1. **Filter as a standalone primitive**: Frame filters independent of REQ messages. They're a
   general-purpose matching predicate over events.

2. **Struct design**: Use `Option<BTreeSet<T>>` following rust-nostr's approach — it correctly
   models the distinction between "no constraint" and "empty constraint." BTreeSet gives
   deterministic serialization and O(log n) lookups.

3. **Matching function**: Implement `matches(&self, event: &Event) -> bool` with early exit on
   scalar fields. Tag matching should use event tag indexes.

4. **Builder pattern**: Fluent API for construction: `Filter::new().kind(1).author(pk)`.
   Convenience methods for common tags (#e, #p, #d, #a, #t).

5. **Generic tag filters**: Support arbitrary single-letter tag filters via
   `BTreeMap<SingleLetterTag, BTreeSet<String>>` or similar.

6. **Serialization**: Custom JSON serialization to flatten generic tags as `#tag` keys. Handle
   `limit: 0` vs omitted limit.

7. **No merging in core**: Following rust-nostr, keep the filter primitive simple. Merging and
   combining can live in higher-level utilities if needed.

8. **Limit calculation**: Consider `getFilterLimit`-style intrinsic limit computation based on
   kind replaceability — useful for query optimization.

9. **Dependencies**: Filter should depend only on existing types (Event, EventId, Pubkey, Kind,
   Timestamp, Tags). Self-contained within coracle-lib.

10. **Test strategy**: Test matching logic thoroughly — all field types, AND semantics, tag
    matching, timestamp boundaries, empty sets vs None, edge cases.