Search
NIP-50 adds one field to the filter from the previous chapter: a search
string. A relay that advertises the capability reads the string as a
human-readable query — best nostr apps — matches it against event content,
and returns results ordered by relevance rather than by created_at, with
limit applied after ranking.
Search is opt-in and implementation-defined. Relays decide whether they index events
at all, what matches, and how ranking works. The query may also carry
key:value extensions — domain:, language:, sentiment:, nsfw:,
include:spam — and a relay honors only the ones it understands, ignoring the
rest. There is no global index and no guarantee of completeness: a client
queries the relays it believes support search and accepts a partial view.
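On the wire, the search string travels as one more key in the filter object of a REQ message. The sketch below builds such a message by hand to show its shape; `search_req` and `escape_json` are hypothetical helpers written for this illustration, the kind and limit values are arbitrary, and the escaping covers only quotes and backslashes, so a real client would more likely use a JSON library.

```rust
/// Escape the two characters that would break a JSON string literal here.
/// Not a full JSON encoder: control characters are left untouched.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build a REQ message asking for kind-1 events that match `query`.
/// Hypothetical helper for illustration only.
fn search_req(sub_id: &str, query: &str, limit: usize) -> String {
    format!(
        r#"["REQ","{}",{{"kinds":[1],"search":"{}","limit":{}}}]"#,
        escape_json(sub_id),
        escape_json(query),
        limit
    )
}
```

Calling `search_req("sub1", "best nostr apps language:en", 10)` yields `["REQ","sub1",{"kinds":[1],"search":"best nostr apps language:en","limit":10}]`. The extension token is simply part of the string; whether it has any effect is up to the relay.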
Search usually runs on the relay, but a client can also run it locally over events it already holds. This chapter provides utilities for parsing search strings, along with a deliberately basic scoring model for that local case. The model is decoupled from filter matching and entirely opt-in.
The module
pub mod search;
//! NIP-50 full-text search queries.
//!
//! A [`SearchQuery`] holds the terms of a search string and computes a
//! best-effort relevance score against event content — for the case where
//! search runs on the client, over events already in hand, rather than on a
//! relay.
use std::fmt;
The query model
A SearchQuery is just the query's terms: the words split out of the search
string. NIP-50 also defines key:value extensions, but their meaning is
relay-defined, and the local scorer has no way to evaluate sentiment:negative
or domain:example.com without data it doesn't have. Rather than model
extensions we can't honor, we treat every token as a term. A relay that
understands an extension still sees it verbatim in the query string; the local
scorer simply matches it as text like any other word.
/// A parsed NIP-50 search query: the terms of the query string.
///
/// NIP-50 `key:value` extensions are not modeled separately — their semantics
/// are relay-defined and cannot be evaluated locally, so each is kept as an
/// ordinary term.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    /// The query's terms, in order.
    pub terms: Vec<String>,
}
Parsing
Parsing splits the query on whitespace. Every token becomes a term, including anything that looks like an extension. There is nothing to reject, so parsing is total — it never errors.
impl SearchQuery {
    /// Create an empty query.
    pub fn new() -> Self {
        SearchQuery::default()
    }

    /// Parse a raw query string by splitting it on whitespace. Every token,
    /// extension-like or not, becomes a term. Parsing never fails.
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }

    /// True when the query has no terms.
    pub fn is_empty(&self) -> bool {
        self.terms.is_empty()
    }
}
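A quick usage sketch of the parser follows. It repeats the type so the block compiles on its own; `demo_terms` is a hypothetical name, and the input string is made up to show mixed whitespace and an extension-like token.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
}

impl SearchQuery {
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }

    pub fn is_empty(&self) -> bool {
        self.terms.is_empty()
    }
}

/// Hypothetical demo: spaces, tabs, and padding all act as separators,
/// and `language:en` is kept verbatim as an ordinary term.
pub fn demo_terms() -> Vec<String> {
    SearchQuery::parse("  best nostr\tapps   language:en ").terms
}
```

Here `demo_terms()` returns the four terms `best`, `nostr`, `apps`, and `language:en`, while a whitespace-only input parses to an empty query.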
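Rendering joins the terms back into a query string, separated by single spaces. For any query produced by parse it is the inverse of parsing: parsing the rendered string gives back an equal query. Going the other way, rendering a parsed string reproduces the input except that leading and trailing whitespace is dropped and runs of whitespace collapse to single spaces. Because terms is a public field, a hand-built term that is empty or contains whitespace will not survive the round trip.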
impl fmt::Display for SearchQuery {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.terms.join(" "))
    }
}
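The round-trip behaviour can be checked with a small sketch. It repeats the type and both impls so it stands alone; `normalize` is a hypothetical helper, not part of the chapter's module.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
}

impl SearchQuery {
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }
}

impl fmt::Display for SearchQuery {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.terms.join(" "))
    }
}

/// Hypothetical helper: parse then render, which normalizes whitespace.
pub fn normalize(input: &str) -> String {
    SearchQuery::parse(input).to_string()
}
```

With this, `normalize("  best   nostr\napps ")` gives `best nostr apps`, and parsing that output again yields the same terms. A query built by hand with a term such as `"nostr apps"` renders to the same string but parses back as two terms, so the inverse only holds for terms free of whitespace.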
Scoring
NIP-50 returns results in descending order of relevance, so a boolean "matches
or not" is the wrong shape for a local implementation. The scorer instead
returns a number in 0.0..=1.0, which can drive both inclusion (anything above
zero is a hit) and ordering.
The score has two parts. The base is the fraction of the query's terms that
appear in the content, compared case-insensitively: with three terms and two
present, the base is 2/3. On top of that, repeated occurrences add a small,
diminishing bonus, so that among events matching the same set of terms the ones
that mention them more often rank higher. The bonus is always strictly less
than 1/total, the value of one term, so it can reorder events that share a
base fraction but can never lift a partial match to the next fraction, let
alone to a full match: a missing term always costs more than any number of
repetitions can recover. An empty query (no terms) scores 1.0, since there is
no text to constrain.
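Written out, the rule looks as follows. The symbols are introduced here only for illustration: n is the number of terms in the query, m the number of terms found, and e the number of occurrences beyond the first, summed over the matched terms.

```latex
\[
\text{score} = \min\!\left(1,\; \frac{m}{n} + \frac{1}{n}\left(1 - \frac{1}{1+e}\right)\right)
\]
\[
0 \le 1 - \frac{1}{1+e} < 1
\quad\Longrightarrow\quad
\frac{m}{n} \le \text{score} < \frac{m+1}{n} \qquad (m < n)
\]
\[
n = 3,\; m = 2,\; e = 1:\qquad
\frac{2}{3} + \frac{1}{3}\cdot\frac{1}{2} = \frac{5}{6}
\]
```

The middle line is the bound from the paragraph above: with a term missing, the score stays below the next multiple of 1/n however large e grows.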
impl SearchQuery {
    /// Score `content` against this query's terms, in `0.0..=1.0`.
    ///
    /// The base score is the fraction of the query's terms found in the content
    /// (case-insensitive substring). Repeated occurrences add a diminishing
    /// bonus, strictly less than one term's worth, so a partial match never
    /// reaches `1.0`. An empty query scores `1.0`: there is no text to match.
    pub fn score(&self, content: &str) -> f64 {
        let total = self.terms.len();
        if total == 0 {
            return 1.0;
        }
        let haystack = content.to_lowercase();
        let mut matched = 0usize;
        let mut extra = 0usize;
        for term in &self.terms {
            let needle = term.to_lowercase();
            if needle.is_empty() {
                // An empty term imposes no constraint; treat it as present.
                matched += 1;
                continue;
            }
            let count = haystack.matches(needle.as_str()).count();
            if count > 0 {
                matched += 1;
                extra += count - 1;
            }
        }
        let base = matched as f64 / total as f64;
        let bonus = (1.0 - 1.0 / (1.0 + extra as f64)) / total as f64;
        (base + bonus).min(1.0)
    }
}
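To see the scorer drive both inclusion and ordering, here is a small sketch that ranks a handful of strings. It repeats the type and the scoring logic so the block compiles alone; `rank` is a hypothetical helper and the sample contents are made up.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
}

impl SearchQuery {
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }

    // Same logic as the scorer above, repeated so the block stands alone.
    pub fn score(&self, content: &str) -> f64 {
        let total = self.terms.len();
        if total == 0 {
            return 1.0;
        }
        let haystack = content.to_lowercase();
        let (mut matched, mut extra) = (0usize, 0usize);
        for term in &self.terms {
            let needle = term.to_lowercase();
            if needle.is_empty() {
                matched += 1;
                continue;
            }
            let count = haystack.matches(needle.as_str()).count();
            if count > 0 {
                matched += 1;
                extra += count - 1;
            }
        }
        let base = matched as f64 / total as f64;
        let bonus = (1.0 - 1.0 / (1.0 + extra as f64)) / total as f64;
        (base + bonus).min(1.0)
    }
}

/// Hypothetical helper: keep contents that score above zero, best first.
pub fn rank<'a>(query: &SearchQuery, contents: &[&'a str]) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&'a str, f64)> = contents
        .iter()
        .map(|c| (*c, query.score(c)))
        .filter(|(_, s)| *s > 0.0)
        .collect();
    // Descending by score; total_cmp gives a total order over f64.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

For the query `best nostr apps`, the content `The best Nostr apps` scores 1.0, `nostr apps, nostr clients` scores 5/6 (two of three terms plus one repetition), `nostr apps` scores 2/3, and `cooking recipes` scores 0.0 and is dropped. Matching is by substring, so a term such as `app` would also hit `apps`.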
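Lowercasing uses to_lowercase, which applies Unicode lowercase mappings rather
than only ASCII ones. That allocates, but nostr content is multilingual, and
correctness on non-Latin text is worth more than avoiding a copy in a
best-effort matcher. It is lowercasing rather than full case folding, so a few
pairs, such as ß and SS, still fail to match.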
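The difference between the two lowercasing strategies is easy to see in a minimal sketch. Both function names are hypothetical and exist only for this comparison.

```rust
/// Case-insensitive substring test using Unicode lowercase mappings,
/// the approach the scorer takes.
pub fn contains_unicode_ci(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

/// The ASCII-only alternative: non-ASCII letters are left as they are.
pub fn contains_ascii_ci(haystack: &str, needle: &str) -> bool {
    haystack
        .to_ascii_lowercase()
        .contains(&needle.to_ascii_lowercase())
}
```

Searching `ПРИВЕТ, Nostr` for `привет` succeeds with the Unicode version and fails with the ASCII one, while both find `nostr`. Searching `STRASSE` for `straße` fails either way, which is the case-folding gap.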
Connecting queries to filters
The previous chapter gave Filter a search field but no way to set it. The
setters follow the established add_* / clear_* vocabulary.
use crate::search::SearchQuery;

impl Filter {
    /// Set the NIP-50 search query.
    pub fn add_search(mut self, search: impl Into<String>) -> Self {
        self.search = Some(search.into());
        self
    }

    /// Remove the search query, leaving no search constraint.
    pub fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}
Scoring an event against a filter is then a matter of parsing the field and
delegating to SearchQuery::score. With no search set the method returns 1.0,
so an unsearched filter never penalizes an event. This is purely the search
dimension — it is independent of the structural matches check from the
previous chapter, and the two are meant to be composed by the caller, not folded
together. A consumer that wants search-ranked results filters with matches,
scores with search_score, and sorts as it sees fit.
impl Filter {
    /// Best-effort local relevance score for `event`, in `0.0..=1.0`.
    ///
    /// Parses the `search` field and scores it against the event's content,
    /// returning `1.0` when there is no search. This considers *only* the
    /// `search` field; it is independent of [`matches`](Filter::matches).
    pub fn search_score(&self, event: &Event) -> f64 {
        match &self.search {
            Some(query) => SearchQuery::parse(query).score(&event.content),
            None => 1.0,
        }
    }
}
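One way a caller might compose the two checks is sketched below. Everything in it is a stand-in: `Event` and `Filter` are cut down to the fields needed here, `matches` checks kinds only, `score` is a condensed copy of the scorer, and `ranked` with its tie-break on created_at is a hypothetical choice rather than anything NIP-50 prescribes.

```rust
#[derive(Debug, Clone)]
pub struct Event {
    pub kind: u32,
    pub created_at: u64,
    pub content: String,
}

#[derive(Debug, Clone, Default)]
pub struct Filter {
    pub kinds: Vec<u32>,
    pub search: Option<String>,
    pub limit: Option<usize>,
}

/// Condensed copy of the scorer: term fraction plus a bounded repeat bonus.
fn score(query: &str, content: &str) -> f64 {
    let terms: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
    if terms.is_empty() {
        return 1.0;
    }
    let haystack = content.to_lowercase();
    let (mut matched, mut extra) = (0usize, 0usize);
    for term in &terms {
        let count = haystack.matches(term.as_str()).count();
        if count > 0 {
            matched += 1;
            extra += count - 1;
        }
    }
    let n = terms.len() as f64;
    (matched as f64 / n + (1.0 - 1.0 / (1.0 + extra as f64)) / n).min(1.0)
}

impl Filter {
    pub fn add_search(mut self, search: impl Into<String>) -> Self {
        self.search = Some(search.into());
        self
    }

    /// Stand-in structural check: kinds only, empty list matches everything.
    pub fn matches(&self, event: &Event) -> bool {
        self.kinds.is_empty() || self.kinds.contains(&event.kind)
    }

    pub fn search_score(&self, event: &Event) -> f64 {
        match &self.search {
            Some(query) => score(query, &event.content),
            None => 1.0,
        }
    }
}

/// Filter structurally, score, drop zero scores, sort best-first with newer
/// events winning ties, and apply `limit` only after ranking.
pub fn ranked<'a>(filter: &Filter, events: &'a [Event]) -> Vec<&'a Event> {
    let mut hits: Vec<(f64, &'a Event)> = events
        .iter()
        .filter(|e| filter.matches(e))
        .map(|e| (filter.search_score(e), e))
        .filter(|(s, _)| *s > 0.0)
        .collect();
    hits.sort_by(|a, b| {
        b.0.total_cmp(&a.0)
            .then(b.1.created_at.cmp(&a.1.created_at))
    });
    if let Some(limit) = filter.limit {
        hits.truncate(limit);
    }
    hits.into_iter().map(|(_, e)| e).collect()
}
```

Truncating after the sort mirrors the relay-side rule that limit applies after ranking; truncating first would keep whichever events happened to come first rather than the best ones.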
What's next
Search depends on routing the query to a relay that actually supports it. Discovering which relays advertise NIP-50, and choosing among them, is a networking and relay-metadata concern — the subject of the Domain and Networking sections, where relay selection is built on top of the filter types assembled here.