# Plan: Search

## Topic Summary

NIP-50 adds an optional full-text `search` field to the subscription filter from chapter 11. A relay that supports the capability interprets the query against event content (and, for some kinds, other fields), returning results ordered by relevance rather than `created_at`, with `limit` applied after ranking. The query may carry `key:value` extensions (`domain:`, `language:`, `sentiment:`, `nsfw:`, `include:spam`), which relays may support or ignore.

This chapter extends `Filter` with a `search` field, threads it through serialization / grouping / set algebra, introduces a typed `SearchQuery` that splits free-text terms from `key:value` extensions, and implements a best-effort local relevance **score in [0, 1]** used to both include and rank events, mirroring the NIP's "descending order by quality of result, limit last."

## Chapter Outline

1. **Intro / framing**: Search as a relay-defined, optional capability; content discovery is client-initiated routing, not a global index; results are partial and ranked by the relay. The local matcher is an honest best-effort fallback, not a reimplementation of relay search.
2. **The `search` field**: Add `search: Option<String>` to `Filter`; builder methods `add_search` / `clear_search`; note it joins the derived `Hash` (so `id()` covers it for free).
3. **Serialization**: Emit/parse a plain `"search"` key in the hand-written serde impl, present only when `Some`.
4. **The `SearchQuery` model**: A new `search` module: terms + ordered `key:value` extensions, `parse`, `Display`, builders, and the `Filter` bridge.
5. **Scoring & matching**: `search_score` (fraction-of-terms + diminishing frequency bonus, capped at 1.0); `matches` includes an event when score > 0; `rank_search_results` sorts by score then `created_at` and applies `limit`.
6. **Grouping and set algebra**: `search` enters `group()` (distinct searches never merge); `union_filters` carries it through unchanged; `intersect_filters` keeps a conflicting-search pair separate instead of fabricating a combined query.
7. **What's next**: Brief pointer to the Domain section (relay selection, discovering NIP-50-capable relays via relay metadata, is a later concern).

## API Design

### `coracle-lib/src/filters.rs` (extends existing `Filter`)

```rust
pub struct Filter {
    // ... existing fields ...

    /// NIP-50 full-text search query. Relay-interpreted; see `SearchQuery`.
    pub search: Option<String>,
}

impl Filter {
    pub fn add_search(self, search: impl Into<String>) -> Self; // sets Some
    pub fn clear_search(self) -> Self;                          // sets None

    /// Bridge to the typed model.
    pub fn add_search_query(self, query: &SearchQuery) -> Self; // = add_search(query.to_string())
    pub fn search_query(&self) -> Option<SearchQuery>;          // parse the field back

    /// Best-effort local relevance score in [0.0, 1.0].
    /// Returns 1.0 when there is no search, or a search with no free-text
    /// terms (only extensions, which are unenforceable locally).
    pub fn search_score(&self, event: &Event) -> f64;
}

/// Filter `events` to those matching `filter`, sort by relevance
/// (search_score desc, then created_at desc), and apply `filter.limit`.
pub fn rank_search_results<'a>(filter: &Filter, events: &'a [Event]) -> Vec<&'a Event>;
```

`matches` gains a final check: `if self.search_score(event) == 0.0 { return false }`. Because `search_score` returns 1.0 when there is no search (or no terms), this only rejects when a search *with terms* matched none of them; that is, "any term present ⇒ included."

### `coracle-lib/src/search.rs` (new module)

```rust
/// A parsed NIP-50 search query: free-text terms plus `key:value` extensions.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>, // ordered; repeats allowed
}

impl SearchQuery {
    pub fn new() -> Self;

    /// Total parse: split on whitespace; a token is an extension iff it is
    /// `key:value` with key in [A-Za-z0-9_-]+, non-empty value not starting
    /// with '/'. Everything else is a term. Never fails.
    pub fn parse(input: &str) -> Self;

    pub fn add_term(self, term: impl Into<String>) -> Self;
    pub fn add_extension(self, key: impl Into<String>, value: impl Into<String>) -> Self;
    pub fn is_empty(&self) -> bool;
}

impl fmt::Display for SearchQuery { /* terms first, then "key:value" exts, space-joined */ }
```

`Filter::matches` / `search_score` tokenize via `SearchQuery::parse`, using only `terms` (extensions are ignored by the local matcher).

### Scoring formula (`search_score`)

For the parsed query's distinct `terms` (case-insensitive), against `event.content` lowercased:

- `total` = number of distinct terms; if 0 → return 1.0.
- For each term, `count` = non-overlapping occurrences in content.
- `matched` = terms with `count ≥ 1`; `extra` = (Σ count) − matched (repeats beyond the first hit of each matched term).
- `base = matched / total` (fraction of terms present, in [0, 1]).
- `bonus = (1 − 1/(1 + extra)) / total` (diminishing, strictly `< 1/total`, so a partial match never reaches the next term's bucket).
- `score = (base + bonus).min(1.0)`.

Properties (asserted in tests): in [0, 1]; all terms once ⇒ 1.0; missing a term ⇒ `< 1.0`; more occurrences ⇒ ≥ score (monotonic, never exceeds 1.0); no terms matched ⇒ exactly 0.0.

## Code Organization

- **`coracle-lib/src/filters.rs`**: add the `search` field, builders, the serde changes, `search_score`, the `matches` check, `rank_search_results`, and the `group()` / `intersect_filters` updates. `use crate::search::SearchQuery;`.
- **`coracle-lib/src/search.rs`**: the `SearchQuery` type. New `pub mod search;` in `lib.rs`, placed before `filters` (filters depends on it).
- **`coracle-lib/src/prelude.rs`**: add `pub use crate::search::SearchQuery;` (the prelude already re-exports commonly used items).
- **`coracle-lib/tests/search.rs`**: hand-written integration tests (not tangled).

## Dependencies

None new. Parsing and matching use `std` only. No FTS engine: it is out of scope and against the minimal-dependency rule.

## Narrative Notes

- Open with the philosophy: search is opt-in and relay-defined; no global index; results partial and relay-ranked. Frame the local scorer as a fallback for in-memory/offline querying, and warn (per rust-nostr's SDK) that re-filtering a relay's returned results client-side can wrongly drop legitimate hits, because relays rank with richer, extension-aware logic.
- Explain *why* extensions are parsed but **ignored locally**: `sentiment:`, `domain:`, etc. require data the client doesn't have, so honoring them locally is impossible; we keep them in the typed model for *building/inspecting* queries, not for local evaluation.
- Justify the score model concretely: NIP-50 mandates relevance ordering, so a boolean match is the wrong shape; a [0, 1] score lets us both include (score > 0) and rank. Walk through the fraction + diminishing-bonus formula with a small worked example.
- For grouping: reuse the chapter-11 reasoning: two filters with different searches can't be unioned without changing semantics, so `search` joins the group key. Show that `union_filters` then keeps them separate automatically.
- For `intersect_filters`: explain the one structural change: `combine_pair` returns `Option`; a pair whose two searches differ returns `None`, and the caller emits both filters separately rather than concatenating queries.

## Design Decisions

1. **Typed `SearchQuery`, lean/generic.** Terms + a generic ordered list of `key:value` extensions, with `add_term`/`add_extension`. No per-extension helpers or typed enums; this keeps the surface small and forward-compatible with relay-specific extensions. (Every reference treats search as opaque; the typed model is our value-add.)
2. **Local relevance score in [0, 1]**, fraction-of-terms + diminishing frequency bonus, capped at 1.0. Chosen over a boolean to model NIP-50's relevance ordering. Extensions excluded from scoring.
3. **`matches` includes on score > 0** ("any term present"); ranking via `rank_search_results` handles relevance + `limit`-after-sort.
4. **`search` participates in `group()`**, so `union_filters` never merges distinct searches.
5. **`intersect_filters` keeps a conflicting-search pair separate** (combine returns `Option`, `None` ⇒ emit both) rather than concatenating, per the user's choice.
6. **Builder naming `add_search`/`clear_search`** to match the existing `add_since`/`clear_since` vocabulary (not rust-nostr's `search`/`remove_search`).
7. **Unicode-aware lowercasing** (`to_lowercase`) for the local matcher rather than ASCII-only, given multilingual nostr content; note the allocation trade-off. Substring counting via `str::matches`.
8. **Extension parse heuristic** documented: a colon-bearing token like a URL may be read as an extension; applications needing exact control build `SearchQuery` field-by-field instead of parsing.

## Open Questions

- Exact wording of the frequency-bonus explanation: keep the formula in prose light; lean on a worked example. (Resolved during writing.)
- Whether `rank_search_results` belongs as a free function (consistent with `matches_any`/`union_filters`): yes, free function.
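
As a worked illustration of the scoring formula and the "score desc, then `created_at` desc, then `limit`" ordering described under API Design, the following is a minimal sketch, not the planned implementation. It assumes a simplified free-function form: `search_score` here takes the terms and content directly instead of a `Filter` and an `Event`, and `Note` and `rank` are hypothetical stand-ins for `Event` and `rank_search_results`.

```rust
/// Hypothetical stand-in for `Event`; only the fields the scorer needs.
pub struct Note {
    pub content: String,
    pub created_at: u64,
}

/// Sketch of the scoring formula: fraction of distinct terms present,
/// plus a diminishing bonus for repeated hits, capped at 1.0.
pub fn search_score(terms: &[&str], content: &str) -> f64 {
    let content = content.to_lowercase();

    // Distinct, lowercased, non-empty terms.
    let mut distinct: Vec<String> = Vec::new();
    for term in terms {
        let term = term.to_lowercase();
        if !term.is_empty() && !distinct.contains(&term) {
            distinct.push(term);
        }
    }
    if distinct.is_empty() {
        return 1.0; // no free-text terms: nothing to reject on
    }

    let mut matched = 0usize;
    let mut occurrences = 0usize;
    for term in &distinct {
        // `str::matches` yields non-overlapping occurrences.
        let count = content.matches(term.as_str()).count();
        if count > 0 {
            matched += 1;
            occurrences += count;
        }
    }

    let total = distinct.len() as f64;
    let extra = (occurrences - matched) as f64; // repeats beyond the first hit
    let base = matched as f64 / total;
    let bonus = (1.0 - 1.0 / (1.0 + extra)) / total; // strictly < 1/total
    (base + bonus).min(1.0)
}

/// Sketch of ranking: keep score > 0, sort by score desc then
/// `created_at` desc, and apply the limit last.
pub fn rank<'a>(terms: &[&str], notes: &'a [Note], limit: Option<usize>) -> Vec<&'a Note> {
    let mut scored: Vec<(f64, &'a Note)> = notes
        .iter()
        .map(|note| (search_score(terms, &note.content), note))
        .filter(|(score, _)| *score > 0.0)
        .collect();
    scored.sort_by(|a, b| {
        b.0.total_cmp(&a.0)
            .then(b.1.created_at.cmp(&a.1.created_at))
    });
    if let Some(limit) = limit {
        scored.truncate(limit);
    }
    scored.into_iter().map(|(_, note)| note).collect()
}
```

For the query terms `nostr`, `relay`, `zap` against the content `nostr nostr nostr`: `total = 3`, `matched = 1`, `extra = 2`, so `base = 1/3`, `bonus = (1 − 1/3)/3 = 2/9`, and the score is `5/9`, which stays below the `2/3` that a second matched term would reach.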
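
The parse rule for `SearchQuery` (a token is an extension iff it is `key:value` with a key in `[A-Za-z0-9_-]+` and a non-empty value not starting with `/`) can be sketched as follows. This is one possible reading of the rule stated in the API Design section, using `std` only; the helper `split_extension` is a hypothetical name introduced for illustration, and splitting at the first colon is an assumption.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub extensions: Vec<(String, String)>, // ordered; repeats allowed
}

/// Hypothetical helper: returns `(key, value)` if `token` looks like an
/// extension under the documented heuristic, otherwise `None`.
fn split_extension(token: &str) -> Option<(&str, &str)> {
    // Assumption: split at the first ':' only.
    let (key, value) = token.split_once(':')?;
    let key_ok = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // A value starting with '/' keeps `https://...` tokens as plain terms.
    let value_ok = !value.is_empty() && !value.starts_with('/');
    if key_ok && value_ok {
        Some((key, value))
    } else {
        None
    }
}

impl SearchQuery {
    /// Total parse: never fails; every token becomes a term or an extension.
    pub fn parse(input: &str) -> Self {
        let mut query = SearchQuery::default();
        for token in input.split_whitespace() {
            match split_extension(token) {
                Some((key, value)) => query
                    .extensions
                    .push((key.to_string(), value.to_string())),
                None => query.terms.push(token.to_string()),
            }
        }
        query
    }
}

impl fmt::Display for SearchQuery {
    /// Terms first, then `key:value` extensions, space-joined.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let extensions = self
            .extensions
            .iter()
            .map(|(key, value)| format!("{}:{}", key, value));
        let parts: Vec<String> = self.terms.iter().cloned().chain(extensions).collect();
        write!(f, "{}", parts.join(" "))
    }
}
```

Under this sketch, `https://example.com` stays a term because its value part begins with `/`, while a token such as `localhost:8080` is read as an extension, which is the kind of false positive the extension parse heuristic in Design Decisions warns about. Because `Display` writes terms before extensions, a parse followed by `to_string()` may reorder an input that interleaves the two.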