Plan: Search
Topic Summary
NIP-50 adds an optional full-text search field to the subscription filter from
chapter 11. A relay that supports the capability interprets the query against
event content (and, for some kinds, other fields), returning results ordered by
relevance rather than created_at, with limit applied after ranking. The
query may carry key:value extensions — domain:, language:, sentiment:,
nsfw:, include:spam — which relays may support or ignore.
This chapter extends Filter with a search field, threads it through
serialization / grouping / set algebra, introduces a typed SearchQuery that
splits free-text terms from key:value extensions, and implements a best-effort
local relevance score in [0, 1] used to both include and rank events —
mirroring the NIP's "descending order by quality of result, limit last."
Chapter Outline
- Intro / framing: Search as a relay-defined, optional capability; content discovery is client-initiated routing, not a global index; results are partial and ranked by the relay. The local matcher is an honest best-effort fallback, not a reimplementation of relay search.
- The `search` field: Add `search: Option<String>` to `Filter`; builder methods `add_search` / `clear_search`; note it joins the derived `Hash` (so `id()` covers it for free).
- Serialization: Emit/parse a plain `"search"` key in the hand-written serde impl, present only when `Some`.
- The `SearchQuery` model: A new `search` module: terms + ordered `key:value` extensions, `parse`, `Display`, builders, and the `Filter` bridge.
- Scoring & matching: `search_score` (fraction-of-terms + diminishing frequency bonus, capped at 1.0); `matches` includes an event when score > 0; `rank_search_results` sorts by score then `created_at` and applies `limit`.
- Grouping and set algebra: `search` enters `group()` (distinct searches never merge); `union_filters` carries it through unchanged; `intersect_filters` keeps a conflicting-search pair separate instead of fabricating a combined query.
- What's next: Brief pointer to the Domain section (relay selection, discovering NIP-50-capable relays via relay metadata, is a later concern).
API Design
coracle-lib/src/filters.rs (extends existing Filter)
pub struct Filter {
    // ... existing fields ...
    /// NIP-50 full-text search query. Relay-interpreted; see `SearchQuery`.
    pub search: Option<String>,
}

impl Filter {
    pub fn add_search(self, search: impl Into<String>) -> Self; // sets Some
    pub fn clear_search(self) -> Self; // sets None

    /// Bridge to the typed model.
    pub fn add_search_query(self, query: &SearchQuery) -> Self; // = add_search(query.to_string())
    pub fn search_query(&self) -> Option<SearchQuery>; // parse the field back

    /// Best-effort local relevance score in [0.0, 1.0].
    /// Returns 1.0 when there is no search, or a search with no free-text
    /// terms (only extensions, which are unenforceable locally).
    pub fn search_score(&self, event: &Event) -> f64;
}

/// Filter `events` to those matching `filter`, sort by relevance
/// (search_score desc, then created_at desc), and apply `filter.limit`.
pub fn rank_search_results<'a>(filter: &Filter, events: &'a [Event]) -> Vec<&'a Event>;
matches gains a final check: if self.search_score(event) == 0.0 { return false }.
Because search_score returns 1.0 when there is no search (or no terms), this
only rejects when a search with terms matched none of them — i.e. "any term
present ⇒ included."
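The include-then-rank behaviour described above can be sketched in isolation. This is a minimal sketch, not the planned `rank_search_results` itself: the `Event` struct is a hypothetical two-field stand-in, `rank_by_score` is an illustrative name, and the score is passed in as a closure so the example does not depend on `Filter`. It covers only the search part (score > 0 to include, score descending, then `created_at` descending, limit last) and assumes the other filter criteria have already been checked.

```rust
/// Hypothetical minimal event; the real `Event` carries more fields.
pub struct Event {
    pub content: String,
    pub created_at: u64,
}

/// Keep events with a positive score, order them by score (descending) and
/// then by `created_at` (descending), and apply `limit` only after sorting.
pub fn rank_by_score<'a>(
    events: &'a [Event],
    limit: Option<usize>,
    score: impl Fn(&Event) -> f64,
) -> Vec<&'a Event> {
    // Score each event once and drop the ones that matched no term.
    let mut scored: Vec<(f64, &'a Event)> = events
        .iter()
        .map(|e| (score(e), e))
        .filter(|(s, _)| *s > 0.0)
        .collect();

    // Higher score first; ties are broken by the newer event.
    scored.sort_by(|(sa, a), (sb, b)| {
        sb.total_cmp(sa)
            .then_with(|| b.created_at.cmp(&a.created_at))
    });

    // The limit is applied last, mirroring "limit after ranking".
    scored
        .into_iter()
        .take(limit.unwrap_or(usize::MAX))
        .map(|(_, e)| e)
        .collect()
}
```

Applying the limit before sorting would keep whichever events happened to come first in the slice, which is why the truncation sits at the end of the pipeline.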
coracle-lib/src/search.rs (new module)
/// A parsed NIP-50 search query: free-text terms plus `key:value` extensions.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>, // ordered; repeats allowed
}

impl SearchQuery {
    pub fn new() -> Self;

    /// Total parse: split on whitespace; a token is an extension iff it is
    /// `key:value` with key in [A-Za-z0-9_-]+, non-empty value not starting
    /// with '/'. Everything else is a term. Never fails.
    pub fn parse(input: &str) -> Self;

    pub fn add_term(self, term: impl Into<String>) -> Self;
    pub fn add_extension(self, key: impl Into<String>, value: impl Into<String>) -> Self;
    pub fn is_empty(&self) -> bool;
}

impl fmt::Display for SearchQuery { /* terms first, then "key:value" exts, space-joined */ }
Filter::matches / search_score tokenize via SearchQuery::parse, using only
terms (extensions are ignored by the local matcher).
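The parse rule and the `Display` ordering above can be sketched as follows. This is one possible reading of the rule, not the final module: `split_extension` is a hypothetical helper, and splitting at the first colon is an assumption that follows from the key alphabet excluding `:`.

```rust
use std::fmt;

/// Free-text terms plus ordered `key:value` extensions.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>,
}

/// Hypothetical helper: return `(key, value)` when `token` is an extension.
/// Assumes the split happens at the first ':' in the token.
fn split_extension(token: &str) -> Option<(&str, &str)> {
    let (key, value) = token.split_once(':')?;
    let key_ok = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // A value starting with '/' is rejected so `https://...` stays a term.
    let value_ok = !value.is_empty() && !value.starts_with('/');
    if key_ok && value_ok {
        Some((key, value))
    } else {
        None
    }
}

impl SearchQuery {
    /// Total parse: every whitespace-separated token becomes either an
    /// extension or a term, so there is no error case.
    pub fn parse(input: &str) -> Self {
        let mut query = SearchQuery::default();
        for token in input.split_whitespace() {
            match split_extension(token) {
                Some((k, v)) => query.extensions.push((k.to_string(), v.to_string())),
                None => query.terms.push(token.to_string()),
            }
        }
        query
    }
}

impl fmt::Display for SearchQuery {
    /// Terms first, then `key:value` extensions, joined by single spaces.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let exts = self.extensions.iter().map(|(k, v)| format!("{k}:{v}"));
        let parts: Vec<String> = self.terms.iter().cloned().chain(exts).collect();
        write!(f, "{}", parts.join(" "))
    }
}
```

Because `Display` moves extensions after the terms, a parse followed by `to_string()` normalizes token order rather than reproducing the input byte for byte.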
Scoring formula (search_score)
For the parsed query's distinct terms (case-insensitive), against
event.content lowercased:
- total = number of distinct terms; if 0 → return 1.0.
- For each term, count = non-overlapping occurrences in content.
- matched = number of terms with count ≥ 1; extra = (Σ count) − matched (repeats beyond the first hit of each matched term).
- base = matched / total (fraction of terms present, in [0, 1]).
- bonus = (1 − 1/(1 + extra)) / total (diminishing, strictly < 1/total, so a partial match never reaches the next term's bucket).
- score = (base + bonus).min(1.0).
Properties (asserted in tests): in [0, 1]; all terms once ⇒ 1.0; missing a term
⇒ < 1.0; more occurrences ⇒ ≥ score (monotonic, never exceeds 1.0); no terms
matched ⇒ exactly 0.0.
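A minimal sketch of the formula, written as a free function over plain strings so it stands alone. The planned `search_score` is a `Filter` method taking an `Event`; the signature here, and the choice to skip empty terms, are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Score `content` against free-text `terms`, following the formula above.
pub fn search_score(terms: &[&str], content: &str) -> f64 {
    // Distinct terms, compared case-insensitively. Empty terms are skipped
    // (an assumption; whitespace splitting never produces them anyway).
    let distinct: BTreeSet<String> = terms
        .iter()
        .map(|t| t.to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();
    if distinct.is_empty() {
        return 1.0; // no terms: nothing to reject locally
    }

    let haystack = content.to_lowercase();
    let mut matched = 0usize; // terms seen at least once
    let mut occurrences = 0usize; // total hits across matched terms
    for term in &distinct {
        // `str::matches` yields non-overlapping occurrences.
        let count = haystack.matches(term.as_str()).count();
        if count > 0 {
            matched += 1;
            occurrences += count;
        }
    }

    let total = distinct.len() as f64;
    let extra = (occurrences - matched) as f64; // repeats beyond the first hit
    let base = matched as f64 / total;
    let bonus = (1.0 - 1.0 / (1.0 + extra)) / total;
    (base + bonus).min(1.0)
}
```

As a worked example, take the terms `nostr` and `zap` against the content `nostr nostr nostr`: total = 2, matched = 1, extra = 2, so base = 1/2 and bonus = (1 − 1/3) / 2 = 1/3, giving a score of 5/6. The same terms against `nostr` alone give exactly 1/2, and no amount of repetition of `nostr` can lift the score to 1.0 while `zap` is missing.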
Code Organization
- `coracle-lib/src/filters.rs`: add the `search` field, builders, the serde changes, `search_score`, the `matches` check, `rank_search_results`, and the `group()` / `intersect_filters` updates. `use crate::search::SearchQuery;`.
- `coracle-lib/src/search.rs`: the `SearchQuery` type. New `pub mod search;` in `lib.rs`, placed before `filters` (filters depends on it; the ordering is for readers, since Rust does not require modules to be declared in dependency order).
- `coracle-lib/src/prelude.rs`: add `pub use crate::search::SearchQuery;` (the prelude already re-exports commonly used items).
- `coracle-lib/tests/search.rs`: hand-written integration tests (not tangled).
Dependencies
None new. Parsing and matching use std only. No FTS engine — out of scope and
against the minimal-dependency rule.
Narrative Notes
- Open with the philosophy: search is opt-in and relay-defined; no global index; results partial and relay-ranked. Frame the local scorer as a fallback for in-memory/offline querying, and warn (per rust-nostr's SDK) that re-filtering a relay's returned results client-side can wrongly drop legitimate hits — relays rank with richer, extension-aware logic.
- Explain why extensions are parsed but ignored locally: `sentiment:`, `domain:`, etc. require data the client doesn't have, so honoring them locally is impossible; we keep them in the typed model for building/inspecting queries, not for local evaluation.
- Justify the score model concretely: NIP-50 says results SHOULD be ordered by relevance, so a boolean match is the wrong shape. A [0,1] score lets us both include (score > 0) and rank. Walk through the fraction + diminishing-bonus formula with a small worked example.
- For grouping: reuse the chapter-11 reasoning. Two filters with different searches can't be unioned without changing semantics, so `search` joins the group key. Show that `union_filters` then keeps them separate automatically.
- For `intersect_filters`: explain the one structural change. `combine_pair` returns `Option<Filter>`; a pair whose two searches differ returns `None`, and the caller emits both filters separately rather than concatenating queries.
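The `Option`-returning combine step can be sketched on a deliberately trimmed-down filter. Everything here is illustrative: the two-field `Filter`, the `intersect_two` wrapper, and the use of `since` as the only other field are assumptions chosen to keep the example short, not the real `intersect_filters` signature.

```rust
/// Hypothetical, trimmed-down filter: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Filter {
    pub since: Option<u64>,
    pub search: Option<String>,
}

/// Intersect two filters, or return `None` when both carry a search and the
/// searches differ (there is no honest way to combine two queries).
pub fn combine_pair(a: &Filter, b: &Filter) -> Option<Filter> {
    let search = match (&a.search, &b.search) {
        (Some(x), Some(y)) if x != y => return None,
        (Some(x), _) | (None, Some(x)) => Some(x.clone()),
        (None, None) => None,
    };
    // Intersecting two lower bounds keeps the later one.
    let since = match (a.since, b.since) {
        (Some(x), Some(y)) => Some(x.max(y)),
        (x, y) => x.or(y),
    };
    Some(Filter { since, search })
}

/// Caller side: a conflicting pair is emitted as two separate filters.
pub fn intersect_two(a: &Filter, b: &Filter) -> Vec<Filter> {
    match combine_pair(a, b) {
        Some(combined) => vec![combined],
        None => vec![a.clone(), b.clone()],
    }
}
```

Returning `None` rather than a concatenated query keeps the decision visible at the call site: the caller has to choose what a conflict means instead of silently sending a query neither filter asked for.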
Design Decisions
- Typed `SearchQuery`, lean/generic. Terms + a generic ordered list of `key:value` extensions, with `add_term` / `add_extension`. No per-extension helpers or typed enums, which keeps the surface small and forward-compatible with relay-specific extensions. (Every reference treats search as opaque; the typed model is our value-add.)
- Local relevance score in [0, 1], fraction-of-terms + diminishing frequency bonus, capped at 1.0. Chosen over a boolean to model NIP-50's relevance ordering. Extensions excluded from scoring.
- `matches` includes on score > 0 ("any term present"); ranking via `rank_search_results` handles relevance + `limit`-after-sort.
- `search` participates in `group()`, so `union_filters` never merges distinct searches.
- `intersect_filters` keeps a conflicting-search pair separate (combine returns `Option`, `None` ⇒ emit both) rather than concatenating, per the user's choice.
- Builder naming `add_search` / `clear_search` to match the existing `add_since` / `clear_since` vocabulary (not rust-nostr's `search` / `remove_search`).
- Unicode-aware lowercasing (`to_lowercase`) for the local matcher rather than ASCII-only, given multilingual nostr content; note the allocation trade-off. Substring counting via `str::matches`.
- Extension parse heuristic documented: the leading-'/' rule keeps `scheme://` URLs as terms, but other colon-bearing tokens (for example a `host:port` pair) may still be read as an extension; applications needing exact control build `SearchQuery` field-by-field instead of parsing.
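The lowercasing decision is easiest to see on non-ASCII input. The helper below is a hypothetical illustration (`count_occurrences` is not a name from the plan) of counting with `to_lowercase` plus `str::matches`; returning 0 for an empty term is an added assumption.

```rust
/// Count non-overlapping, case-insensitive occurrences of `term` in `content`.
pub fn count_occurrences(content: &str, term: &str) -> usize {
    if term.is_empty() {
        return 0; // assumption: an empty term never counts as a hit
    }
    // `to_lowercase` is Unicode-aware and allocates a new String for each
    // input; that allocation is the trade-off noted above.
    let haystack = content.to_lowercase();
    let needle = term.to_lowercase();
    haystack.matches(needle.as_str()).count()
}
```

With ASCII-only lowercasing, `"ÜBER".to_ascii_lowercase()` yields `"Über"`, which would not match the term `über`; the Unicode-aware version folds all three spellings in `"Über über ÜBER"` to the same form.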
Open Questions
- Exact wording of the frequency-bonus explanation — keep the formula in prose light; lean on a worked example. (Resolved during writing.)
- Whether `rank_search_results` belongs as a free function (consistent with `matches_any` / `union_filters`): yes, free function.