coracle-rust/book/plan/search.md
2026-05-20 16:07:58 -07:00


Plan: Search

Topic Summary

NIP-50 adds an optional full-text search field to the subscription filter from chapter 11. A relay that supports the capability interprets the query against event content (and, for some kinds, other fields), returning results ordered by relevance rather than created_at, with limit applied after ranking. The query may carry key:value extensions — domain:, language:, sentiment:, nsfw:, include:spam — which relays may support or ignore.

This chapter extends Filter with a search field, threads it through serialization / grouping / set algebra, introduces a typed SearchQuery that splits free-text terms from key:value extensions, and implements a best-effort local relevance score in [0, 1] used to both include and rank events — mirroring the NIP's "descending order by quality of result, limit last."

Chapter Outline

  1. Intro / framing — Search as a relay-defined, optional capability; content discovery is client-initiated routing, not a global index; results are partial and ranked by the relay. The local matcher is an honest best-effort fallback, not a reimplementation of relay search.
  2. The search field — Add search: Option<String> to Filter; builder methods add_search / clear_search; note it joins the derived Hash (so id() covers it for free).
  3. Serialization — Emit/parse a plain "search" key in the hand-written serde impl, present only when Some.
  4. The SearchQuery model — A new search module: terms + ordered key:value extensions, parse, Display, builders, and the Filter bridge.
  5. Scoring & matching — search_score (fraction-of-terms + diminishing frequency bonus, capped at 1.0); matches includes an event when score > 0; rank_search_results sorts by score then created_at and applies limit.
  6. Grouping and set algebra — search enters group() (distinct searches never merge); union_filters carries it through unchanged; intersect_filters keeps a conflicting-search pair separate instead of fabricating a combined query.
  7. What's next — Brief pointer to the Domain section (relay selection, discovering NIP-50-capable relays via relay metadata, is a later concern).
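
Outline items 2 and 6 can be sketched together. The block below is a minimal illustration, not the chapter's code: the `Filter` here is a hypothetical two-field stand-in for the real struct, and `group_by_search` is an invented helper that keys on `search` alone, whereas the real `group()` key presumably covers other fields as well.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the chapter-11 `Filter`; only two fields shown.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct Filter {
    pub kinds: Vec<u16>,
    /// Because `Hash` is derived, this field is hashed with no extra code.
    pub search: Option<String>,
}

impl Filter {
    /// Set the search string (consuming builder, like `add_since`).
    pub fn add_search(mut self, search: impl Into<String>) -> Self {
        self.search = Some(search.into());
        self
    }

    /// Remove the search string.
    pub fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}

/// Assumed simplification of `group()`: bucket filters by their search
/// string, so filters with different searches never share a bucket.
pub fn group_by_search(filters: &[Filter]) -> BTreeMap<Option<String>, Vec<Filter>> {
    let mut groups: BTreeMap<Option<String>, Vec<Filter>> = BTreeMap::new();
    for filter in filters {
        groups
            .entry(filter.search.clone())
            .or_default()
            .push(filter.clone());
    }
    groups
}
```

With this shape, two filters searching for "nostr" land in one bucket and can be merged there, while a filter searching for "bitcoin" and a filter with no search each get their own bucket.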

API Design

coracle-lib/src/filters.rs (extends existing Filter)

pub struct Filter {
    // ... existing fields ...
    /// NIP-50 full-text search query. Relay-interpreted; see `SearchQuery`.
    pub search: Option<String>,
}

impl Filter {
    pub fn add_search(self, search: impl Into<String>) -> Self;   // sets Some
    pub fn clear_search(self) -> Self;                            // sets None

    /// Bridge to the typed model.
    pub fn add_search_query(self, query: &SearchQuery) -> Self;   // = add_search(query.to_string())
    pub fn search_query(&self) -> Option<SearchQuery>;            // parse the field back

    /// Best-effort local relevance score in [0.0, 1.0].
    /// Returns 1.0 when there is no search, or a search with no free-text
    /// terms (only extensions, which are unenforceable locally).
    pub fn search_score(&self, event: &Event) -> f64;
}

/// Filter `events` to those matching `filter`, sort by relevance
/// (search_score desc, then created_at desc), and apply `filter.limit`.
pub fn rank_search_results<'a>(filter: &Filter, events: &'a [Event]) -> Vec<&'a Event>;

matches gains a final check: if self.search_score(event) == 0.0 { return false }. Because search_score returns 1.0 when there is no search (or no terms), this only rejects when a search with terms matched none of them — i.e. "any term present ⇒ included."
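
The include-then-rank behaviour described above can be sketched as follows. This is an illustration under assumptions: `Event` is a hypothetical pared-down struct, and `rank_by_score` is an invented name that takes the score as a closure and the limit as a plain `Option<usize>`, so the sketch does not depend on the real `Filter`.

```rust
/// Hypothetical, pared-down event: only the fields ranking needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub content: String,
    pub created_at: u64,
}

/// Keep events whose score is above zero, order them by score (descending)
/// and then by `created_at` (descending), and apply `limit` last.
pub fn rank_by_score<'a>(
    events: &'a [Event],
    limit: Option<usize>,
    score: impl Fn(&Event) -> f64,
) -> Vec<&'a Event> {
    // Score once per event and drop non-matches (score == 0.0).
    let mut scored: Vec<(f64, &'a Event)> = events
        .iter()
        .map(|e| (score(e), e))
        .filter(|(s, _)| *s > 0.0)
        .collect();

    // Higher score first; newer event first among equal scores.
    scored.sort_by(|a, b| {
        b.0.total_cmp(&a.0)
            .then_with(|| b.1.created_at.cmp(&a.1.created_at))
    });

    // The limit is applied only after ranking.
    if let Some(n) = limit {
        scored.truncate(n);
    }
    scored.into_iter().map(|(_, e)| e).collect()
}
```

Truncating after the sort is what keeps a high-scoring older event from being cut in favour of a low-scoring newer one.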

coracle-lib/src/search.rs (new module)

/// A parsed NIP-50 search query: free-text terms plus `key:value` extensions.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>,  // ordered; repeats allowed
}

impl SearchQuery {
    pub fn new() -> Self;
    /// Total parse: split on whitespace; a token is an extension iff it is
    /// `key:value` with key in [A-Za-z0-9_-]+, non-empty value not starting
    /// with '/'. Everything else is a term. Never fails.
    pub fn parse(input: &str) -> Self;
    pub fn add_term(self, term: impl Into<String>) -> Self;
    pub fn add_extension(self, key: impl Into<String>, value: impl Into<String>) -> Self;
    pub fn is_empty(&self) -> bool;
}

impl fmt::Display for SearchQuery { /* terms first, then "key:value" exts, space-joined */ }

Filter::matches / search_score tokenize via SearchQuery::parse, using only terms (extensions are ignored by the local matcher).
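
One way the total parse and the Display round trip might look, following the rule in the doc comment above. The helper `split_extension` is a hypothetical name, and splitting at the first colon (so `a:b:c` yields key `a`, value `b:c`) is an assumption the plan does not spell out.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>,
}

/// Hypothetical helper: return `(key, value)` when `token` has the
/// extension shape, otherwise `None`. Splits at the first ':' (assumption).
fn split_extension(token: &str) -> Option<(&str, &str)> {
    let (key, value) = token.split_once(':')?;
    let key_ok = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // A value starting with '/' is rejected, which keeps `https://...` a term.
    let value_ok = !value.is_empty() && !value.starts_with('/');
    (key_ok && value_ok).then_some((key, value))
}

impl SearchQuery {
    /// Total parse: every whitespace-separated token becomes either an
    /// extension or a term, so there is no error case.
    pub fn parse(input: &str) -> Self {
        let mut query = SearchQuery::default();
        for token in input.split_whitespace() {
            match split_extension(token) {
                Some((k, v)) => query.extensions.push((k.to_string(), v.to_string())),
                None => query.terms.push(token.to_string()),
            }
        }
        query
    }
}

impl fmt::Display for SearchQuery {
    /// Terms first, then `key:value` extensions, joined by single spaces.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let exts = self.extensions.iter().map(|(k, v)| format!("{k}:{v}"));
        let parts: Vec<String> = self.terms.iter().cloned().chain(exts).collect();
        write!(f, "{}", parts.join(" "))
    }
}
```

Under this sketch `https://example.com` stays a term because its value starts with `/`, while `localhost:8080` is read as an extension, which is the kind of misreading that building the query field-by-field avoids.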

Scoring formula (search_score)

For the parsed query's distinct terms (case-insensitive), against event.content lowercased:

  • total = number of distinct terms; if 0 → return 1.0.
  • For each term, count = non-overlapping occurrences in content.
  • matched = terms with count ≥ 1; extra = (Σ count) − matched (repeats beyond the first hit of each matched term).
  • base = matched / total (fraction of terms present, in [0, 1]).
  • bonus = (1 − 1/(1 + extra)) / total (diminishing, strictly < 1/total, so a partial match never reaches the next term's bucket).
  • score = (base + bonus).min(1.0).

Properties (asserted in tests): score is in [0, 1]; all terms present once ⇒ 1.0; missing a term ⇒ < 1.0; adding occurrences never lowers the score (monotonic, and never above 1.0); no terms matched ⇒ exactly 0.0.
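
The formula can be sketched directly over a term list and a content string. `score_terms` is a hypothetical free function standing in for `Filter::search_score`, which would obtain its terms from `SearchQuery::parse` and its content from the event; skipping empty terms is an added guard, not something the plan specifies.

```rust
use std::collections::BTreeSet;

/// Sketch of the scoring formula: fraction of distinct terms present plus a
/// diminishing bonus for repeated hits, capped at 1.0.
pub fn score_terms(terms: &[&str], content: &str) -> f64 {
    let content = content.to_lowercase();

    // Distinct terms, compared case-insensitively. Empty terms are skipped
    // (assumption: they cannot come out of a whitespace split anyway).
    let distinct: BTreeSet<String> = terms
        .iter()
        .map(|t| t.to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();

    let total = distinct.len();
    if total == 0 {
        return 1.0; // nothing to match locally
    }

    let mut matched = 0usize;
    let mut occurrences = 0usize;
    for term in &distinct {
        // `str::matches` yields non-overlapping occurrences.
        let count = content.matches(term.as_str()).count();
        if count > 0 {
            matched += 1;
            occurrences += count;
        }
    }

    // Repeats beyond the first hit of each matched term.
    let extra = (occurrences - matched) as f64;
    let base = matched as f64 / total as f64;
    let bonus = (1.0 - 1.0 / (1.0 + extra)) / total as f64;
    (base + bonus).min(1.0)
}
```

As a worked example, take the terms `nostr` and `zap` against the content "nostr nostr nostr": total = 2, matched = 1, extra = 2, so base = 0.5 and bonus = (1 − 1/3) / 2 ≈ 0.333, giving a score of about 0.833. That is above a single-hit partial match (0.5) but still below a full match (1.0).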

Code Organization

  • coracle-lib/src/filters.rs — add the search field, builders, the serde changes, search_score, the matches check, rank_search_results, and the group() / intersect_filters updates. use crate::search::SearchQuery;.
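  • coracle-lib/src/search.rs — the SearchQuery type. New pub mod search; in lib.rs, listed before filters to mirror the dependency direction (filters depends on it; Rust itself does not require mod declarations in any particular order).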
  • coracle-lib/src/prelude.rs — add pub use crate::search::SearchQuery; (the prelude already re-exports commonly used items).
  • coracle-lib/tests/search.rs — hand-written integration tests (not tangled).

Dependencies

None new. Parsing and matching use std only. No FTS engine — out of scope and against the minimal-dependency rule.

Narrative Notes

  • Open with the philosophy: search is opt-in and relay-defined; no global index; results partial and relay-ranked. Frame the local scorer as a fallback for in-memory/offline querying, and warn (per rust-nostr's SDK) that re-filtering a relay's returned results client-side can wrongly drop legitimate hits — relays rank with richer, extension-aware logic.
  • Explain why extensions are parsed but ignored locally: sentiment:, domain:, etc. require data the client doesn't have, so honoring them locally is impossible; we keep them in the typed model for building/inspecting queries, not for local evaluation.
  • Justify the score model concretely: NIP-50 mandates relevance ordering, so a boolean match is the wrong shape — a [0,1] score lets us both include (score > 0) and rank. Walk through the fraction + diminishing-bonus formula with a small worked example.
  • For grouping: reuse the chapter-11 reasoning — two filters with different searches can't be unioned without changing semantics, so search joins the group key. Show that union_filters then keeps them separate automatically.
  • For intersect_filters: explain the one structural change — combine_pair returns Option<Filter>; a pair whose two searches differ returns None, and the caller emits both filters separately rather than concatenating queries.
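
The structural change to intersect_filters might look like the sketch below. The two-field `Filter` and the wrapper `intersect_pair` are hypothetical, and two choices are assumptions rather than decisions from this plan: a filter without a search adopts the other filter's search, and `since` intersects to the later bound.

```rust
/// Hypothetical stand-in for `Filter`, reduced to two fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct Filter {
    pub since: Option<u64>,
    pub search: Option<String>,
}

/// Intersect two filters, or return `None` when both carry a search and the
/// searches differ (there is no sound way to combine two queries).
fn combine_pair(a: &Filter, b: &Filter) -> Option<Filter> {
    let search = match (&a.search, &b.search) {
        (Some(x), Some(y)) if x != y => return None,
        // Assumption: a filter with no search does not constrain the other.
        (Some(x), _) | (_, Some(x)) => Some(x.clone()),
        (None, None) => None,
    };
    // Intersecting two lower bounds keeps the later one.
    let since = match (a.since, b.since) {
        (Some(x), Some(y)) => Some(x.max(y)),
        (x, y) => x.or(y),
    };
    Some(Filter { since, search })
}

/// Caller side: a conflicting pair is emitted as two separate filters
/// instead of one filter with a concatenated query.
pub fn intersect_pair(a: &Filter, b: &Filter) -> Vec<Filter> {
    match combine_pair(a, b) {
        Some(merged) => vec![merged],
        None => vec![a.clone(), b.clone()],
    }
}
```

Returning `Option<Filter>` keeps the conflict visible in the type, so the caller has to decide what to do with an uncombinable pair.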

Design Decisions

  1. Typed SearchQuery, lean/generic. Terms + a generic ordered list of key:value extensions, with add_term/add_extension. No per-extension helpers or typed enums — keeps the surface small and forward-compatible with relay-specific extensions. (Every reference treats search as opaque; the typed model is our value-add.)
  2. Local relevance score in [0, 1], fraction-of-terms + diminishing frequency bonus, capped at 1.0. Chosen over a boolean to model NIP-50's relevance ordering. Extensions excluded from scoring.
  3. matches includes on score > 0 ("any term present"); ranking via rank_search_results handles relevance + limit-after-sort.
  4. search participates in group(), so union_filters never merges distinct searches.
  5. intersect_filters keeps a conflicting-search pair separate (combine returns Option, None ⇒ emit both) rather than concatenating, per the user's choice.
  6. Builder naming add_search/clear_search to match the existing add_since/clear_since vocabulary (not rust-nostr's search/remove_search).
  7. Unicode-aware lowercasing (to_lowercase) for the local matcher rather than ASCII-only, given multilingual nostr content; note the allocation trade-off. Substring counting via str::matches.
  8. Extension parse heuristic documented: a colon-bearing token like a URL may be read as an extension; applications needing exact control build SearchQuery field-by-field instead of parsing.
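
Decision 7 can be illustrated with two small counting helpers. Both function names are hypothetical; the only library behaviour relied on is that `str::to_lowercase` is Unicode-aware, `str::to_ascii_lowercase` leaves non-ASCII characters untouched, and `str::matches` counts non-overlapping occurrences.

```rust
/// Count non-overlapping, case-insensitive occurrences of `term` in
/// `content`, using Unicode-aware lowercasing. Allocates two new strings.
pub fn count_occurrences(content: &str, term: &str) -> usize {
    let term = term.to_lowercase();
    if term.is_empty() {
        return 0; // an empty pattern would otherwise match at every position
    }
    content.to_lowercase().matches(term.as_str()).count()
}

/// ASCII-only variant, shown for contrast: non-ASCII letters keep their
/// case, so differently cased non-ASCII text does not match.
pub fn count_occurrences_ascii(content: &str, term: &str) -> usize {
    let term = term.to_ascii_lowercase();
    if term.is_empty() {
        return 0;
    }
    content.to_ascii_lowercase().matches(term.as_str()).count()
}
```

For the content "ПРИВЕТ привет" and the term "Привет", the Unicode-aware version counts two occurrences while the ASCII-only version counts none. Non-overlapping counting means "aa" occurs twice in "aaaa", not three times.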

Open Questions

  • Exact wording of the frequency-bonus explanation — keep the formula in prose light; lean on a worked example. (Resolved during writing.)
  • Whether rank_search_results belongs as a free function (consistent with matches_any/union_filters) — yes, free function.