coracle-rust/book/research/search.md
2026-05-20 16:07:58 -07:00


Research: Search

Topic Summary

NIP-50 adds an optional full-text search field to the subscription filter introduced in chapter 11. A relay that supports the capability interprets the query string against event content (and, for some kinds, other fields), returning results ordered by relevance rather than created_at. The query may carry structured extensions in the form of key:value pairs — domain:, language:, sentiment:, nsfw:, include:spam — which relays may support or ignore.

The chapter will:

  1. Add a search field to the existing Filter type, wiring it through construction, serialization, hashing, grouping, and the union/intersect utilities.
  2. Introduce a typed SearchQuery model that splits free-text terms from key:value extensions, so applications can build and inspect queries safely instead of stringly-typed concatenation. (This is a deliberate departure from every reference, which treats the query as an opaque string.)
  3. Implement a best-effort, case-insensitive local matcher over event content, while documenting that real ranking and extension semantics are relay-defined.

The code lives in coracle-lib: the search field extends filters.rs, and the query model gets a dedicated search.rs module.

Philosophy

From ref/building-nostr, the framing relevant to search is that content discovery on nostr is client-initiated routing through relay selection, not a query against a global index. Searching is "knowing where to send queries." A relay that supports NIP-50 is exercising an optional, relay-authored capability — like content curation or access control — and defines its own matching semantics, including which extensions it honors. This mirrors the NIP's own "relays SHOULD ignore extensions they don't support."

Three principles bear directly on the chapter's voice:

  • No guaranteed completeness. "No implementation will have a complete view of every heuristic that is applicable" — so search results are neither global nor exhaustive. A client queries the relays it knows support search and accepts a partial, spontaneous view. This should be stated honestly, not hidden.
  • Indexing is the curator's responsibility, not the user's. Authors publish signed events; relays (or indexing services) that want content discoverable maintain the index. Clients do nothing special beyond sending a search filter to a search-capable relay.
  • Publicity, not privacy. Full-text indexing makes content patterns discoverable and gives relay operators visibility into queries. The honest framing: search is a publicity feature.

The takeaway for our library: model search as a first-class but optional filter field, keep the query structured enough that applications can reason about it, and be candid that local matching is a best-effort approximation of a relay-defined operation.

Reference Implementation Analysis

applesauce

search is an optional string on an extended Filter type (packages/core/src/helpers/filter.ts): Filter = CoreFilter & { search?: string }, extending nostr-tools' base type. Opaque — no extension parsing.

Dual-mode: relay subscriptions pass the string through verbatim; a local SQLite backend (packages/sqlite) indexes content into an FTS5 table and runs events_search MATCH ? with the raw string double-quote-escaped. Local client-side matchFilter() ignores the search field entirely. Pluggable "search content formatters" decide what gets indexed (default: content; enhanced: kind-0 profile fields plus t/subject/title/summary/d tags). Supports order: "created_at" | "rank" for FTS5 ranking. Low coupling; SQLite is optional. No query-extension awareness anywhere.

ndk

search?: string on NDKFilter (core/src/subscription/index.ts:30). Opaque, relay-only. No parsing, no validation (filter-validation pipeline skips it), no client-side matching (delegates to nostr-tools' matchFilters, which ignores search). No helper functions for building search filters; callers construct { search: "..." } by hand. The field is serialized and sent to relays as-is. No NIP-11 capability negotiation or fallback. Minimal by design.

nostr-gadgets

Re-uses @nostr/tools' Filter type (search?: string). Opaque, relay-only. Notably its local stores reject search: the in-memory store returns an empty set if filter.search is present, and the RedEventStore docs state "any filters supported (except 'search')." Provides a hardcoded SEARCH_RELAYS constant (defaults.ts): relay.nostr.band, nostr.wine, relay.noswhere.com, relay.nos.today. No query builders, no dynamic relay capability detection.

nostrlib (Go)

Search string on the Filter struct (filter.go), (de)serialized as a plain "search" JSON key. The core Filter.Matches / MatchesIgnoringTimestampConstraints ignores search — matching is delegated to eventstore backends. Key-value backends (BoltDB, LMDB, MMM) return nothing for search queries; only the Bleve backend implements real full-text search: per-document language auto-detection (lingua-go, 22 languages), per-language analyzers, boolean query syntax (AND/OR/NOT, parens, quoted phrases), NIP-27 reference extraction with 2× boost, and case-insensitive substring validation of quoted phrases. Kind-0 profiles index name/display_name/about; reposts unpack inner events. Khatru relay policies NoSearchQueries/RemoveSearchQueries let operators disable search. SDK SearchUsers() just sends a Search filter to designated user-search relays. No NIP-50 extension parsing (treats domain:x as a regular word); a 2-char minimum query length is enforced by Bleve.

nostr-tools

search?: string on the base Filter (filter.ts). The canonical "defined-but-unused" implementation. matchFilter()/matchFilters() do not check search at all; mergeFilters() drops it entirely. No parsing, no validation, no helpers, no tests for the field. Strictly a transport-layer placeholder so applications can send search filters to relays. Minimal-deps philosophy: search is purely a relay concern.

rust-nostr

The most directly relevant reference (also Rust). In crates/nostr/src/filter.rs:

/// A string describing a query in a human-readable form, i.e. "best nostr apps"
/// <https://github.com/nostr-protocol/nips/blob/master/50.md>
#[serde(skip_serializing_if = "Option::is_none")]
#[serde(default)]
pub search: Option<String>,

Builder API: search<S: Into<String>>(self, value: S) -> Self and remove_search(self) -> Self — symmetric, generic, #[inline]. Opaque (no extension parsing).

Local matching (search_match):

fn search_match(&self, event: &Event) -> bool {
    match &self.search {
        Some(query) => event.content.as_bytes()
            .windows(query.len())
            .any(|window| window.eq_ignore_ascii_case(query.as_bytes())),
        None => true,
    }
}

Case-insensitive ASCII substring via sliding window; None matches everything. Gated by a MatchEventOptions { nip50: bool, .. } flag (default true). Notably, the SDK relay sets .nip50(false) with the comment "Skip NIP-50 matches since they may create issues and ban non-malicious relays" — i.e. client-side re-matching of a relay's search results can wrongly drop valid hits. DB backends (LMDB, SQLite) extend matching to a fixed set of searchable tags — title, description, subject, name — lowercasing the query once up front; empty search → no results. A Features { full_text_search: bool } flag declares backend capability.

Patterns worth emulating: Into<String> builder, skip_serializing_if for a clean wire format, an explicit opt-out for search matching, ASCII case folding for speed.
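As a standalone illustration of the sliding-window technique, here is a minimal sketch. The helper name ascii_contains_ignore_case is hypothetical, not part of rust-nostr. One detail worth noting: slice::windows panics when given a window size of 0, so the sketch guards the empty query explicitly. Treating an empty query as "matches everything" is an assumption of this sketch (the DB backends described above return no results instead), and these notes do not establish how rust-nostr itself handles an empty query on this path.

```rust
/// Hypothetical helper: case-insensitive ASCII substring test over bytes,
/// following the sliding-window shape of `search_match` quoted above.
fn ascii_contains_ignore_case(haystack: &str, needle: &str) -> bool {
    // `slice::windows` panics on a window size of 0, so handle the empty
    // needle first. "Empty matches everything" is an assumption here.
    if needle.is_empty() {
        return true;
    }
    haystack
        .as_bytes()
        // Yields nothing when the needle is longer than the haystack.
        .windows(needle.len())
        // Only ASCII letters are case-folded; all other bytes must be equal.
        .any(|window| window.eq_ignore_ascii_case(needle.as_bytes()))
}
```

Because only ASCII letters are folded, "ÉCOLE" does not match the query "école" under this approach. That is the trade-off against Unicode lowercasing.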

welshman

The TypeScript toolkit our library descends from. search?: string on Filter (packages/util/src/Filters.ts). It is the only reference that both matches search locally and threads the field through its filter utilities (rust-nostr also matches locally, but does not merge filters at this layer):

export const matchFilter = (filter, event) => {
  if (!nostrToolsMatchFilter(filter, event)) return false
  if (filter.search) {
    const content = event.content.toLowerCase()
    const terms = filter.search.toLowerCase().split(/\s+/g)
    for (const term of terms) {
      if (content.includes(term)) return true
      return false   // <-- bug: returns after first term
    }
  }
  return true
}

The intent is term-splitting plus case-insensitive substring matching, but the unconditional return false inside the loop body means the loop always exits on its first iteration, so only the first term is ever checked. A correct version should decide AND vs. OR across terms explicitly; this is the one place we can clearly improve on the reference.
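A corrected version of this logic might look like the following Rust sketch, which makes the AND/OR decision an explicit parameter. TermMode and content_matches_terms are hypothetical names for illustration. Treating a blank query as unconstrained is an assumption, not behavior taken from welshman.

```rust
/// How multiple search terms combine. Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TermMode {
    /// Every term must appear in the content (AND).
    All,
    /// At least one term must appear in the content (OR).
    Any,
}

/// Case-insensitive, term-split substring match over event content.
fn content_matches_terms(content: &str, search: &str, mode: TermMode) -> bool {
    let content = content.to_lowercase();
    let search = search.to_lowercase();
    let mut terms = search.split_whitespace().peekable();

    // A blank query constrains nothing (an assumption of this sketch).
    if terms.peek().is_none() {
        return true;
    }

    // Unlike the reference, every term is examined before deciding.
    match mode {
        TermMode::All => terms.all(|term| content.contains(term)),
        TermMode::Any => terms.any(|term| content.contains(term)),
    }
}
```

With content "Best Nostr apps", the query "nostr bitcoin" fails under TermMode::All and succeeds under TermMode::Any. The reference code would report a match in both cases because it only inspects "nostr".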

Filter utilities (directly parallel to our group/union_filters/intersect_filters):

  • calculateFilterGroup pushes search:${search} into the group key — a filter with a search is only mergeable with an identical search.
  • unionFilters treats search (like since/until/limit) as a scalar preserved from the first filter in the group, not merged.
  • intersectFilters concatenates differing searches with a space ([a, b].join(" ")) — modeling "must match both" as a compound query — and takes whichever is present otherwise.
  • getFilterId includes search in the deterministic hash, so different searches never dedupe.

Search-relay selection lives in the router: getSearchRelays() returns relays whose NIP-11 supported_nips includes "50". No extension parsing.

Common Patterns

  • search is universally an optional plain string. Every reference models it as Option<String> / search?: string. None parse the key:value extensions — they treat the whole query as opaque and let the relay interpret it. Our typed SearchQuery is therefore a value-add, not a port.
  • Local matching is the exception, not the rule. nostr-tools, ndk, applesauce (in matchFilter), and nostrlib's core Filter all ignore search locally; matching happens relay-side (or in a dedicated index like Bleve/FTS5). Only rust-nostr and welshman attempt local matching, both with case-insensitive substring over content.
  • Where matching exists, it's case-insensitive substring — rust-nostr does ASCII-only eq_ignore_ascii_case over byte windows (whole query as one needle); welshman lowercases and splits on whitespace into terms (intending multi-term, buggily). DB backends additionally search a small fixed set of metadata tags (title, description, subject, name).
  • Search makes filters un-mergeable. welshman encodes this in its group key, and the reasoning holds in general: two filters with different search strings can't be unioned without changing semantics. rust-nostr sidesteps merging at this layer entirely.
  • Client-side re-matching is risky. rust-nostr's SDK disables NIP-50 matching when filtering relay results, because a relay's notion of a match (ranked, fuzzy, multi-field, extension-aware) is richer than a client's substring check — re-filtering can drop legitimate hits.
  • Relay selection by NIP-11. Search-capable relays are discovered via supported_nips containing 50 (welshman) or a hardcoded allowlist (nostr-gadgets). This is an application/networking concern, out of scope for coracle-lib.

Considerations for Our Implementation

Filter field. Add pub search: Option<String> to Filter. Follow rust-nostr's generic Into<String> builder shape, but name the methods add_search<S: Into<String>>(self, S) and clear_search(self) to match the existing add_*/clear_* builder vocabulary (our methods are named add_since/clear_since, etc., so add_search/clear_search fits better than rust-nostr's search/remove_search). Once added, the field participates in the derived Hash automatically (so id() covers it for free), but serialization, group(), union_filters, intersect_filters, and matches() all need explicit updates.
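The builder shape could be sketched as follows. The Filter here is a pared-down stand-in with only two fields, not the library's real type, and hash_of is a hypothetical helper used only to show that a derived Hash picks up the new field without extra work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pared-down stand-in for the library's `Filter`; only the fields needed
/// to show the builder shape are included (an assumption of this sketch).
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct Filter {
    since: Option<u64>,
    search: Option<String>,
}

impl Filter {
    /// Set the search string; accepts `&str`, `String`, or anything else
    /// that converts into a `String`.
    fn add_search<S: Into<String>>(mut self, search: S) -> Self {
        self.search = Some(search.into());
        self
    }

    /// Remove the search string, leaving the other fields untouched.
    fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}

/// Hypothetical helper: hash a filter with the standard library's hasher.
fn hash_of(filter: &Filter) -> u64 {
    let mut hasher = DefaultHasher::new();
    filter.hash(&mut hasher);
    hasher.finish()
}
```

Two filters that differ only in search compare unequal and hash differently, which is what keeps them from deduplicating.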

Serialization. Our Filter has hand-written serde (to flatten #tag keys). Add search as a plain "search" key — emit only when Some (mirroring since/until/limit), and read it in the visitor's match arm. A round-trip test must cover it.

Grouping / union / intersect. Per welshman: include search in the group() hash so filters with different searches land in different groups (never merged). In union_filters, since group members share an identical search by construction, the search carries over via the or_insert_with(|| filter.clone()) seed — no special merge needed, but worth a comment. In combine_pair (intersect), decide how to combine two searches: welshman concatenates with a space. Concatenation is defensible ("must match both") but lossy and surprising; a cleaner rule for a typed model is to merge two SearchQuery values (union their terms and extensions) or, if we keep the field as a string at this layer, to concatenate with a space and document it. Recommend: concatenate with a space when both present and differ, matching welshman, and note the limitation.
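The recommended intersect rule can be sketched as a small function over the two search fields. combine_search is a hypothetical name; how it would be wired into combine_pair is left open here.

```rust
/// Hypothetical helper for the intersect path: combine the `search` fields
/// of two filters being intersected, following welshman's rule.
fn combine_search(a: Option<&str>, b: Option<&str>) -> Option<String> {
    match (a, b) {
        // Identical searches need no concatenation.
        (Some(a), Some(b)) if a == b => Some(a.to_string()),
        // Differing searches are joined with a space, modeling "must match both".
        (Some(a), Some(b)) => Some(format!("{a} {b}")),
        // Only one side has a search: take whichever is present.
        (Some(s), None) | (None, Some(s)) => Some(s.to_string()),
        (None, None) => None,
    }
}
```

The limitation to document is that the joined string only means "must match both" if the receiving relay treats space-separated terms as a conjunction, which NIP-50 leaves to the relay.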

Local matching. Extend Filter::matches to test search after the cheap scalar checks. Best-effort, case-insensitive. Two design choices to settle in planning:

  1. Whole-query substring (rust-nostr) vs. term-split AND/OR (welshman, fixed). A typed SearchQuery makes term-split natural: match the free-text terms (AND across terms reads as the intuitive "all words present"; document it), and treat key:value extensions as unenforceable locally — i.e. ignored by the local matcher, since we can't evaluate sentiment: or domain: without external data. This honesty matches the NIP.

  2. ASCII (eq_ignore_ascii_case) vs. Unicode lowercasing. ASCII is what rust-nostr ships and is allocation-free; Unicode to_lowercase is more correct for non-Latin content but allocates. Given nostr's multilingual content, prefer Unicode to_lowercase for the local matcher — correctness over micro-optimization, consistent with our "clarity over cleverness" rule — and note the trade-off.

    Also document, per rust-nostr's SDK, that local matching is a fallback: relay results should generally be trusted as-is rather than re-filtered.
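Taken together, the two choices above suggest a matcher along these lines: AND across free-text terms, Unicode lowercasing, and key:value tokens skipped. This is a sketch under those assumptions, operating on plain strings rather than the real Filter and Event types; is_extension and matches_search are hypothetical names.

```rust
/// True when a whitespace-delimited token has the `key:value` extension
/// shape, i.e. contains `:` with a non-empty key before it.
fn is_extension(token: &str) -> bool {
    matches!(token.split_once(':'), Some((key, _)) if !key.is_empty())
}

/// Hypothetical best-effort local matcher: every free-text term must occur
/// in the content, compared after Unicode lowercasing.
fn matches_search(content: &str, query: &str) -> bool {
    // Lowercase the content once; this allocates, unlike the ASCII approach.
    let content = content.to_lowercase();
    query
        .split_whitespace()
        // Extensions such as `language:en` cannot be evaluated locally,
        // so they are ignored rather than treated as failing terms.
        .filter(|token| !is_extension(token))
        // `all` over zero terms is true: a query with no free-text terms
        // constrains nothing locally.
        .all(|term| content.contains(&term.to_lowercase()))
}
```

Here the query "école nostr" matches the content "ÉCOLE de Nostr", which an ASCII-only comparison would miss. Note that to_lowercase is not full Unicode case folding: "ß" and "SS" still do not match each other, so the result remains an approximation.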

SearchQuery model (new search.rs). A struct splitting a query into free-text terms: Vec<String> and extensions: Vec<(String, String)> (ordered; NIP-50 doesn't forbid repeats, and order can matter to relays). Parsing: split on whitespace, treat a token containing : (with a non-empty key before it) as an extension, everything else as a term. Provide:

  • SearchQuery::parse(&str) -> SearchQuery (total, never fails — unknown shapes fall back to terms).
  • Display / to_string() that re-renders to the wire string (terms first or preserve order; planning to decide).
  • Builder helpers: term, extension, plus typed convenience for the spec-defined extensions (domain, language, sentiment, nsfw, include_spam) — optional, decide scope in planning.
  • A bridge to Filter: Filter::add_search can accept impl Into<String> so both a raw string and query.to_string() work; optionally Filter::search_query() to parse the field back out.

Keep sentiment/nsfw values as strings (or small enums) — leaning toward strings to stay forward-compatible with relay-specific values, with named constructors for the common cases.
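A minimal sketch of the model described above, using only std. Field and method names follow the bullets. Rendering terms first is one of the two options still to be decided in planning, chosen here only for illustration, and the typed convenience constructors are omitted.

```rust
use std::fmt;

/// A search query split into free-text terms and `key:value` extensions.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct SearchQuery {
    terms: Vec<String>,
    /// Ordered, and repeats are allowed.
    extensions: Vec<(String, String)>,
}

impl SearchQuery {
    /// Total parser: any token that is not `key:value` with a non-empty key
    /// falls back to being a plain term, so parsing never fails.
    fn parse(input: &str) -> Self {
        let mut query = SearchQuery::default();
        for token in input.split_whitespace() {
            match token.split_once(':') {
                Some((key, value)) if !key.is_empty() => {
                    query.extensions.push((key.to_string(), value.to_string()));
                }
                _ => query.terms.push(token.to_string()),
            }
        }
        query
    }

    /// Append a free-text term.
    fn term(mut self, term: impl Into<String>) -> Self {
        self.terms.push(term.into());
        self
    }

    /// Append a `key:value` extension.
    fn extension(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.extensions.push((key.into(), value.into()));
        self
    }
}

impl fmt::Display for SearchQuery {
    /// Re-render to the wire string, terms first (an assumption of this sketch).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let terms = self.terms.iter().cloned();
        let extensions = self.extensions.iter().map(|(k, v)| format!("{k}:{v}"));
        let parts: Vec<String> = terms.chain(extensions).collect();
        f.write_str(&parts.join(" "))
    }
}
```

One consequence of the parsing rule as stated: a token such as https://example.com contains a colon after a non-empty key, so it parses as an extension with key https. Whether to special-case URLs is a question for planning.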

Dependencies. None new. Parsing is plain string handling; matching uses std. Avoid pulling in a real FTS engine — out of scope and against the minimal-dependency rule.

Out of scope (defer / mention only). Real relevance ranking; relay-side indexing; NIP-11 search-relay discovery (a networking concern); the order hint from applesauce; multi-field/tag matching beyond content (could mention title/subject as a possible extension but keep the matcher content-only for clarity).