Research: Search
Topic Summary
NIP-50 adds an optional full-text search field to the subscription filter
introduced in chapter 11. A relay that supports the capability interprets the
query string against event content (and, for some kinds, other fields),
returning results ordered by relevance rather than created_at. The query may
carry structured extensions in the form of key:value pairs — domain:,
language:, sentiment:, nsfw:, include:spam — which relays may support or
ignore.
The chapter will:
- Add a search field to the existing Filter type, wiring it through construction, serialization, hashing, grouping, and the union/intersect utilities.
- Introduce a typed SearchQuery model that splits free-text terms from key:value extensions, so applications can build and inspect queries safely instead of stringly-typed concatenation. (This is a deliberate departure from every reference, which treats the query as an opaque string.)
- Implement a best-effort, case-insensitive local matcher over event content, while documenting that real ranking and extension semantics are relay-defined.
The code lives in coracle-lib: the search field extends filters.rs, and
the query model gets a dedicated search.rs module.
Philosophy
From ref/building-nostr, the framing relevant to search is that content
discovery on nostr is client-initiated routing through relay selection, not a
query against a global index. Searching is "knowing where to send queries." A
relay that supports NIP-50 is exercising an optional, relay-authored
capability — like content curation or access control — and defines its own
matching semantics, including which extensions it honors. This mirrors the NIP's
own "relays SHOULD ignore extensions they don't support."
Three principles bear directly on the chapter's voice:
- No guaranteed completeness. "No implementation will have a complete view of every heuristic that is applicable" — so search results are neither global nor exhaustive. A client queries the relays it knows support search and accepts a partial, spontaneous view. This should be stated honestly, not hidden.
- Indexing is the curator's responsibility, not the user's. Authors publish signed events; relays (or indexing services) that want content discoverable maintain the index. Clients do nothing special beyond sending a search filter to a search-capable relay.
- Publicity, not privacy. Full-text indexing makes content patterns discoverable and gives relay operators visibility into queries. The honest framing: search is a publicity feature.
The takeaway for our library: model search as a first-class but optional
filter field, keep the query structured enough that applications can reason
about it, and be candid that local matching is a best-effort approximation of a
relay-defined operation.
Reference Implementation Analysis
applesauce
search is an optional string on an extended Filter type
(packages/core/src/helpers/filter.ts): Filter = CoreFilter & { search?: string },
extending nostr-tools' base type. Opaque — no extension parsing.
Dual-mode: relay subscriptions pass the string through verbatim; a local SQLite
backend (packages/sqlite) indexes content into an FTS5 table and runs
events_search MATCH ? with the raw string double-quote-escaped. Local
client-side matchFilter() ignores the search field entirely. Pluggable
"search content formatters" decide what gets indexed (default: content;
enhanced: kind-0 profile fields plus t/subject/title/summary/d tags).
Supports order: "created_at" | "rank" for FTS5 ranking. Low coupling; SQLite
is optional. No query-extension awareness anywhere.
ndk
search?: string on NDKFilter (core/src/subscription/index.ts:30).
Opaque, relay-only. No parsing, no validation (filter-validation pipeline
skips it), no client-side matching (delegates to nostr-tools' matchFilters,
which ignores search). No helper functions for building search filters; callers
construct { search: "..." } by hand. The field is serialized and sent to
relays as-is. No NIP-11 capability negotiation or fallback. Minimal by design.
nostr-gadgets
Re-uses @nostr/tools' Filter type (search?: string). Opaque,
relay-only. Notably its local stores reject search: the in-memory store
returns an empty set if filter.search is present, and the RedEventStore docs
state "any filters supported (except 'search')." Provides a hardcoded
SEARCH_RELAYS constant (defaults.ts): relay.nostr.band, nostr.wine,
relay.noswhere.com, relay.nos.today. No query builders, no dynamic relay
capability detection.
nostrlib (Go)
Search string on the Filter struct (filter.go), (de)serialized as a plain
"search" JSON key. The core Filter.Matches / MatchesIgnoringTimestampConstraints
ignores search — matching is delegated to eventstore backends. Key-value
backends (BoltDB, LMDB, MMM) return nothing for search queries; only the Bleve
backend implements real full-text search: per-document language auto-detection
(lingua-go, 22 languages), per-language analyzers, boolean query syntax
(AND/OR/NOT, parens, quoted phrases), NIP-27 reference extraction with 2× boost,
and case-insensitive substring validation of quoted phrases. Kind-0 profiles index
name/display_name/about; reposts unpack inner events. Khatru relay policies
NoSearchQueries/RemoveSearchQueries let operators disable search. SDK
SearchUsers() just sends a Search filter to designated user-search relays. No
NIP-50 extension parsing (treats domain:x as a regular word); a 2-char minimum
query length is enforced by Bleve.
nostr-tools
search?: string on the base Filter (filter.ts). The canonical
"defined-but-unused" implementation. matchFilter()/matchFilters() do not
check search at all; mergeFilters() drops it entirely. No parsing, no
validation, no helpers, no tests for the field. Strictly a transport-layer
placeholder so applications can send search filters to relays. Minimal-deps
philosophy: search is purely a relay concern.
rust-nostr
The most directly relevant reference (also Rust). In
crates/nostr/src/filter.rs:
/// A string describing a query in a human-readable form, i.e. "best nostr apps"
/// <https://github.com/nostr-protocol/nips/blob/master/50.md>
#[serde(skip_serializing_if = "Option::is_none")]
#[serde(default)]
pub search: Option<String>,
Builder API: search<S: Into<String>>(self, value: S) -> Self and
remove_search(self) -> Self — symmetric, generic, #[inline]. Opaque (no
extension parsing).
Local matching (search_match):
fn search_match(&self, event: &Event) -> bool {
    match &self.search {
        Some(query) => event.content.as_bytes()
            .windows(query.len())
            .any(|window| window.eq_ignore_ascii_case(query.as_bytes())),
        None => true,
    }
}
Case-insensitive ASCII substring via sliding window; None matches
everything. Gated by a MatchEventOptions { nip50: bool, .. } flag (default
true). Notably, the SDK relay sets .nip50(false) with the comment "Skip NIP-50
matches since they may create issues and ban non-malicious relays" — i.e.
client-side re-matching of a relay's search results can wrongly drop valid hits.
DB backends (LMDB, SQLite) extend matching to a fixed set of searchable tags —
title, description, subject, name — lowercasing the query once up front;
empty search → no results. A Features { full_text_search: bool } flag declares
backend capability.
Patterns worth emulating: Into<String> builder, skip_serializing_if for a
clean wire format, an explicit opt-out for search matching, ASCII case folding
for speed.
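The sliding-window technique can be tried on plain strings. The following is a minimal sketch, not rust-nostr's code: `ascii_contains_ignore_case` is a hypothetical free function, and treating an empty query as "matches everything" is an assumption. The guard is there because `slice::windows` panics when the window size is 0.

```rust
/// Case-insensitive ASCII substring test over raw bytes (hypothetical helper).
fn ascii_contains_ignore_case(haystack: &str, needle: &str) -> bool {
    let needle = needle.as_bytes();
    // `slice::windows` panics on a zero-length window, so handle the empty
    // query first. Matching everything here is an assumption of this sketch.
    if needle.is_empty() {
        return true;
    }
    haystack
        .as_bytes()
        .windows(needle.len())
        // Only ASCII letters are case-folded; other bytes must match exactly.
        .any(|window| window.eq_ignore_ascii_case(needle))
}
```

Because only ASCII letters are folded, a query such as "école" does not match content containing "ÉCOLE". That is the cost of the allocation-free approach.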
welshman
The TypeScript toolkit our library descends from. search?: string on Filter
(packages/util/src/Filters.ts). It is the only reference that both matches
search locally and threads it through its filter utilities:
export const matchFilter = (filter, event) => {
  if (!nostrToolsMatchFilter(filter, event)) return false
  if (filter.search) {
    const content = event.content.toLowerCase()
    const terms = filter.search.toLowerCase().split(/\s+/g)
    for (const term of terms) {
      if (content.includes(term)) return true
      return false // <-- bug: returns after first term
    }
  }
  return true
}
The intent is term-splitting + case-insensitive substring, but the early
return false means only the first term is ever checked. A correct version
should decide AND vs OR across terms explicitly — this is the one place we can
clearly improve on the reference.
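A corrected version makes the AND/OR decision explicit instead of leaving it to control flow. The sketch below is illustrative only: `TermMode` and `matches_terms` are hypothetical names, and treating a blank query as unconstrained is an assumption.

```rust
/// How multiple search terms combine (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TermMode {
    /// Every term must appear in the content.
    All,
    /// At least one term must appear in the content.
    Any,
}

/// Case-insensitive, whitespace-split term matching over `content`.
fn matches_terms(content: &str, search: &str, mode: TermMode) -> bool {
    let content = content.to_lowercase();
    let query = search.to_lowercase();
    let mut terms = query.split_whitespace().peekable();
    // Assumption: a blank query places no constraint on the event.
    if terms.peek().is_none() {
        return true;
    }
    match mode {
        TermMode::All => terms.all(|term| content.contains(term)),
        TermMode::Any => terms.any(|term| content.contains(term)),
    }
}
```

With `TermMode::Any`, the query "zaps notes" matches content that contains only "notes". The early-return version misses this case because it gives up after the first term fails.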
Filter utilities (directly parallel to our group/union_filters/intersect_filters):
- calculateFilterGroup pushes search:${search} into the group key, so a filter with a search is only mergeable with an identical search.
- unionFilters treats search (like since/until/limit) as a scalar preserved from the first filter in the group, not merged.
- intersectFilters concatenates differing searches with a space ([a, b].join(" ")), modeling "must match both" as a compound query, and takes whichever is present otherwise.
- getFilterId includes search in the deterministic hash, so different searches never dedupe.
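The group-key behavior can be sketched in Rust as follows. This is an illustration under assumptions, not a port of welshman: the pared-down `Filter`, `group_key`, and `group_filters` are hypothetical, and a string key is used only for readability (a real implementation might hash the fields instead).

```rust
use std::collections::BTreeMap;

/// Pared-down filter for this sketch; the field set is an assumption.
#[derive(Clone, Debug, Default, PartialEq)]
struct Filter {
    kinds: Vec<u16>,
    since: Option<u64>,
    search: Option<String>,
}

/// Build a grouping key from the scalar fields that must not be merged.
fn group_key(filter: &Filter) -> String {
    let mut parts = Vec::new();
    if let Some(since) = filter.since {
        parts.push(format!("since:{}", since));
    }
    if let Some(search) = &filter.search {
        parts.push(format!("search:{}", search));
    }
    parts.join("|")
}

/// Bucket filters so only those with identical scalar fields share a group.
fn group_filters(filters: &[Filter]) -> BTreeMap<String, Vec<Filter>> {
    let mut groups: BTreeMap<String, Vec<Filter>> = BTreeMap::new();
    for filter in filters {
        groups.entry(group_key(filter)).or_default().push(filter.clone());
    }
    groups
}
```

Two filters that search for "nostr" over different kinds land in the same bucket and remain candidates for a union. A filter that searches for "bitcoin", or has no search at all, gets its own bucket.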
Search-relay selection lives in the router: getSearchRelays() returns relays
whose NIP-11 supported_nips includes "50". No extension parsing.
Common Patterns
- search is universally an optional plain string. Every reference models it as Option<String> / search?: string. None parse the key:value extensions; they treat the whole query as opaque and let the relay interpret it. Our typed SearchQuery is therefore a value-add, not a port.
- Local matching is the exception, not the rule. nostr-tools, ndk, applesauce (in matchFilter), and nostrlib's core Filter all ignore search locally; matching happens relay-side (or in a dedicated index like Bleve/FTS5). Only rust-nostr and welshman attempt local matching, both with case-insensitive substring over content.
- Where matching exists, it's case-insensitive substring. rust-nostr does ASCII-only eq_ignore_ascii_case over byte windows (whole query as one needle); welshman lowercases and splits on whitespace into terms (intending multi-term, buggily). DB backends additionally search a small fixed set of metadata tags (title, description, subject, name).
- Search makes filters un-mergeable. Both welshman (group key) and the general intuition agree: two filters with different search strings can't be unioned without changing semantics. rust-nostr sidesteps merging at this layer entirely.
- Client-side re-matching is risky. rust-nostr's SDK disables NIP-50 matching when filtering relay results, because a relay's notion of a match (ranked, fuzzy, multi-field, extension-aware) is richer than a client's substring check — re-filtering can drop legitimate hits.
- Relay selection by NIP-11. Search-capable relays are discovered via supported_nips containing 50 (welshman) or a hardcoded allowlist (nostr-gadgets). This is an application/networking concern, out of scope for coracle-lib.
Considerations for Our Implementation
Filter field. Add pub search: Option<String> to Filter. Follow rust-nostr's
builder shape, but name the methods add_search<S: Into<String>>(self, S) and
clear_search(self) to match the existing add_*/clear_* builder vocabulary (our
methods are named add_since/clear_since, etc., so add_search/clear_search fits
better than rust-nostr's search/remove_search). Because Hash is derived, the new
field participates in it automatically (so id() covers it for free), but
serialization, group(), union_filters, intersect_filters, and matches() all
need explicit updates.
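A minimal sketch of the builder pair, assuming a pared-down `Filter` with only two fields (the real type has more). `hash_of` is a hypothetical helper included only to show the derived `Hash` picking up the new field.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pared-down filter for this sketch; the real type has more fields.
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct Filter {
    since: Option<u64>,
    search: Option<String>,
}

impl Filter {
    /// Set the NIP-50 query; accepts `&str`, `String`, or anything `Into<String>`.
    fn add_search<S: Into<String>>(mut self, query: S) -> Self {
        self.search = Some(query.into());
        self
    }

    /// Remove the query, returning the filter to its unconstrained state.
    fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}

/// Hash a filter with the standard hasher (hypothetical stand-in for `id()`).
fn hash_of(filter: &Filter) -> u64 {
    let mut hasher = DefaultHasher::new();
    filter.hash(&mut hasher);
    hasher.finish()
}
```

Since `search` is an ordinary field of a `#[derive(Hash)]` struct, two filters that differ only in their query hash differently without further code.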
Serialization. Our Filter has hand-written serde (to flatten #tag keys).
Add search as a plain "search" key — emit only when Some (mirroring
since/until/limit), and read it in the visitor's match arm. A round-trip
test must cover it.
Grouping / union / intersect. Per welshman: include search in the
group() hash so filters with different searches land in different groups (never
merged). In union_filters, since group members share an identical search by
construction, the search carries over via the or_insert_with(|| filter.clone())
seed — no special merge needed, but worth a comment. In combine_pair
(intersect), decide how to combine two searches: welshman concatenates with a
space. Concatenation is defensible ("must match both") but lossy and surprising;
a cleaner rule for a typed model is to merge two SearchQuery values (union
their terms and extensions) or, if we keep the field as a string at this layer,
to concatenate with a space and document it. Recommend: concatenate with a space
when both present and differ, matching welshman, and note the limitation.
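The recommended rule can be sketched as a small helper. `combine_search` is a hypothetical name, and it takes the two `search` fields directly rather than whole filters so that the example stays self-contained.

```rust
/// Combine the `search` fields of two filters being intersected.
fn combine_search(a: Option<&str>, b: Option<&str>) -> Option<String> {
    match (a, b) {
        // Identical queries need no concatenation.
        (Some(a), Some(b)) if a == b => Some(a.to_string()),
        // Differing queries: join with a space, read as "must match both".
        (Some(a), Some(b)) => Some(format!("{} {}", a, b)),
        // Only one side constrains the search: keep it.
        (Some(one), None) | (None, Some(one)) => Some(one.to_string()),
        (None, None) => None,
    }
}
```

The limitation to note is that concatenation works on raw strings. If both sides carry the same extension key, for example two different language: values, the joined query contains both, and how a relay resolves that is relay-defined.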
Local matching. Extend Filter::matches to test search after the cheap
scalar checks. Best-effort, case-insensitive. Two design choices to settle in
planning:
- Whole-query substring (rust-nostr) vs. term-split AND/OR (welshman, fixed). A typed SearchQuery makes term-split natural: match the free-text terms (AND across terms reads as the intuitive "all words present"; document it), and treat key:value extensions as unenforceable locally, i.e. ignored by the local matcher, since we can't evaluate sentiment: or domain: without external data. This honesty matches the NIP.
- ASCII (eq_ignore_ascii_case) vs. Unicode lowercasing. ASCII is what rust-nostr ships and is allocation-free; Unicode to_lowercase is more correct for non-Latin content but allocates. Given nostr's multilingual content, prefer Unicode to_lowercase for the local matcher (correctness over micro-optimization, consistent with our "clarity over cleverness" rule) and note the trade-off.

Also document, per rust-nostr's SDK, that local matching is a fallback: relay results should generally be trusted as-is rather than re-filtered.
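Both choices can be seen together in a short sketch. It assumes AND across terms, Unicode lowercasing, and that any token with a non-empty key before `:` is an extension to be skipped; `matches_search` and `is_extension` are hypothetical free functions standing in for whatever `Filter::matches` ends up calling.

```rust
/// Best-effort local match: every free-text term must appear in `content`.
/// Tokens shaped like `key:value` are skipped, since they cannot be
/// evaluated without relay-side data.
fn matches_search(content: &str, search: &str) -> bool {
    // Unicode-aware lowercasing allocates, but folds non-ASCII letters too.
    let content = content.to_lowercase();
    search
        .split_whitespace()
        .filter(|token| !is_extension(token))
        .all(|term| content.contains(&term.to_lowercase()))
}

/// A token counts as an extension when it has a non-empty key before `:`.
fn is_extension(token: &str) -> bool {
    matches!(token.split_once(':'), Some((key, _)) if !key.is_empty())
}
```

A query made up only of extensions, such as "sentiment:positive", matches every event locally because there are no terms left to test. One caveat of this token rule is that a URL typed as a search term (for example https://example.com) is also classified as an extension and skipped. Planning should decide whether that is acceptable.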
SearchQuery model (new search.rs). A struct splitting a query into
free-text terms: Vec<String> and extensions: Vec<(String, String)> (ordered;
NIP-50 doesn't forbid repeats, and order can matter to relays). Parsing: split on
whitespace, treat a token containing : (with a non-empty key before it) as an
extension, everything else as a term. Provide:
- SearchQuery::parse(&str) -> SearchQuery (total, never fails; unknown shapes fall back to terms).
- Display / to_string() that re-renders to the wire string (terms first or preserve order; planning to decide).
- Builder helpers: term, extension, plus typed convenience for the spec-defined extensions (domain, language, sentiment, nsfw, include_spam). These are optional; decide scope in planning.
- A bridge to Filter: Filter::add_search can accept impl Into<String> so both a raw string and query.to_string() work; optionally Filter::search_query() to parse the field back out.
Keep sentiment/nsfw values as strings (or small enums) — leaning toward
strings to stay forward-compatible with relay-specific values, with named
constructors for the common cases.
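The model described above might look like the following sketch. It fixes two of the open choices only for illustration: `Display` renders terms first and then extensions, and values stay plain strings. The typed convenience constructors are left out.

```rust
use std::fmt;

/// Structured form of a NIP-50 query string (sketch; names are assumptions).
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct SearchQuery {
    terms: Vec<String>,
    extensions: Vec<(String, String)>,
}

impl SearchQuery {
    /// Total parser: any token that is not `key:value` becomes a term.
    fn parse(input: &str) -> Self {
        let mut query = SearchQuery::default();
        for token in input.split_whitespace() {
            match token.split_once(':') {
                Some((key, value)) if !key.is_empty() => query
                    .extensions
                    .push((key.to_string(), value.to_string())),
                _ => query.terms.push(token.to_string()),
            }
        }
        query
    }

    /// Append a free-text term.
    fn term(mut self, term: impl Into<String>) -> Self {
        self.terms.push(term.into());
        self
    }

    /// Append a `key:value` extension, preserving insertion order.
    fn extension(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.extensions.push((key.into(), value.into()));
        self
    }
}

impl fmt::Display for SearchQuery {
    /// Renders terms first, then extensions, separated by single spaces.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let terms = self.terms.iter().cloned();
        let extensions = self.extensions.iter().map(|(k, v)| format!("{}:{}", k, v));
        let parts: Vec<String> = terms.chain(extensions).collect();
        write!(f, "{}", parts.join(" "))
    }
}
```

Rendering terms first means a query that interleaves terms and extensions does not round-trip byte-for-byte, although it parses back to an equal value. Multi-word terms passed to `term` are split again on reparse. Both are consequences of the whitespace-token rule and are worth settling in planning.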
Dependencies. None new. Parsing is plain string handling; matching uses std. Avoid pulling in a real FTS engine — out of scope and against the minimal-dependency rule.
Out of scope (defer / mention only). Real relevance ranking; relay-side
indexing; NIP-11 search-relay discovery (a networking concern); the order
hint from applesauce; multi-field/tag matching beyond content (could mention
title/subject as a possible extension but keep the matcher content-only for
clarity).