Search

NIP-50 adds one field to the filter from the previous chapter: a search string. A relay that advertises the capability reads the string as a human-readable query (for example, best nostr apps), matches it against event content, and returns results ordered by relevance rather than by created_at, with limit applied after ranking.

Search is opt-in and implementation-defined. Relays decide whether they index events at all, what matches, and how ranking works. The query may also carry key:value extensions — domain:, language:, sentiment:, nsfw:, include:spam — and a relay honors only the ones it understands, ignoring the rest. There is no global index and no guarantee of completeness: a client queries the relays it believes support search and accepts a partial view.

Search usually runs on the relay, but a client can also run it locally over events it already holds. This chapter provides utilities for parsing search terms and a deliberately simple scoring model. Both are opt-in and decoupled from filter matching.

The module

pub mod search;

//! NIP-50 full-text search queries.
//!
//! A [`SearchQuery`] holds the terms of a search string and computes a
//! best-effort relevance score against event content — for the case where
//! search runs on the client, over events already in hand, rather than on a
//! relay.

use std::fmt;

The query model

A SearchQuery is just the query's terms: the words split out of the search string. NIP-50 also defines key:value extensions, but their meaning is relay-defined, and the local scorer has no way to evaluate sentiment:negative or domain:example.com without data it doesn't have. Rather than model extensions we can't honor, we treat every token as a term. A relay that understands an extension still sees it verbatim in the query string; the local scorer simply matches it as text like any other word.

/// A parsed NIP-50 search query: the terms of the query string.
///
/// NIP-50 `key:value` extensions are not modeled separately — their semantics
/// are relay-defined and cannot be evaluated locally, so each is kept as an
/// ordinary term.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    /// The query's terms, in order.
    pub terms: Vec<String>,
}

Parsing

Parsing splits the query on whitespace. Every token becomes a term, including anything that looks like an extension. There is nothing to reject, so parsing is total — it never errors.

impl SearchQuery {
    /// Create an empty query.
    pub fn new() -> Self {
        SearchQuery::default()
    }

    /// Parse a raw query string by splitting it on whitespace. Every token,
    /// extension-like or not, becomes a term. Parsing never fails.
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }

    /// True when the query has no terms.
    pub fn is_empty(&self) -> bool {
        self.terms.is_empty()
    }
}
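As a quick illustration of the behavior described above, the sketch below parses a query with irregular spacing and an extension-like token. The split_terms helper is a hypothetical stand-in that mirrors the body of SearchQuery::parse so the snippet compiles on its own; it is not part of the module.

```rust
/// Hypothetical stand-in mirroring the body of `SearchQuery::parse`, so this
/// snippet compiles without the rest of the module.
fn split_terms(input: &str) -> Vec<String> {
    input.split_whitespace().map(str::to_string).collect()
}

/// Terms for a query with leading, trailing, and repeated whitespace plus an
/// extension-like token. The token is kept verbatim as an ordinary term.
fn example_terms() -> Vec<String> {
    split_terms("  best   nostr apps\tlanguage:en ")
}
```

Here example_terms yields the four terms best, nostr, apps, and language:en, and a whitespace-only input yields no terms at all.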

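Rendering joins the terms back into a query string, separated by single spaces. For any query produced by parse it is the inverse of parsing: parsing the rendered string gives an equal query. The rendered string can differ from the original input only in whitespace, because runs of whitespace collapse to single spaces.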

impl fmt::Display for SearchQuery {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.terms.join(" "))
    }
}
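The round trip can be checked with a small sketch. The type below is a reduced stand-in copy of the chapter's SearchQuery, and round_trip is a hypothetical helper introduced only for this illustration.

```rust
use std::fmt;

// Reduced stand-in copy of the chapter's type: just enough for a round trip.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SearchQuery {
    pub terms: Vec<String>,
}

impl SearchQuery {
    pub fn parse(input: &str) -> Self {
        SearchQuery {
            terms: input.split_whitespace().map(str::to_string).collect(),
        }
    }
}

impl fmt::Display for SearchQuery {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.terms.join(" "))
    }
}

/// Hypothetical helper: parse, render, and parse again. Returns the rendered
/// string and whether the second parse equals the first.
pub fn round_trip(input: &str) -> (String, bool) {
    let first = SearchQuery::parse(input);
    let rendered = first.to_string();
    let second = SearchQuery::parse(&rendered);
    (rendered, first == second)
}
```

The guarantee covers parsed queries only. Because terms is a public field, a hand-built query whose term contains whitespace (say, a single term "two words") renders to a string that parses back as two terms.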

Scoring

NIP-50 returns results in descending order of relevance, so a boolean "matches or not" is the wrong shape for a local implementation. The scorer instead returns a number in 0.0..=1.0, which can drive both inclusion (anything above zero is a hit) and ordering.

The score has two parts. The base is the fraction of the query's terms that appear in the content as case-insensitive substrings: with three terms and two present, the base is 2/3. On top of that, repeated occurrences add a small, diminishing bonus, so that among events matching the same set of terms the ones that mention them more often rank higher. With extra counting the occurrences beyond the first of each matched term, the bonus is (1 - 1/(1 + extra)) / total, which is always strictly less than 1/total. It can therefore reorder events that match the same number of terms but can never push a partial match up to a full one: a missing term always costs more than any number of repetitions can recover. An empty query (no terms) scores 1.0, since there is nothing to constrain the match.

impl SearchQuery {
    /// Score `content` against this query's terms, in `0.0..=1.0`.
    ///
    /// The base score is the fraction of the query's terms found in the content
    /// (case-insensitive substring). Repeated occurrences add a diminishing
    /// bonus, strictly less than one term's worth, so a partial match never
    /// reaches `1.0`. An empty query scores `1.0`: there is no text to match.
    pub fn score(&self, content: &str) -> f64 {
        let total = self.terms.len();
        if total == 0 {
            return 1.0;
        }

        let haystack = content.to_lowercase();

        let mut matched = 0usize;
        let mut extra = 0usize;
        for term in &self.terms {
            let needle = term.to_lowercase();
            if needle.is_empty() {
                // An empty term imposes no constraint; treat it as present.
                matched += 1;
                continue;
            }
            let count = haystack.matches(needle.as_str()).count();
            if count > 0 {
                matched += 1;
                extra += count - 1;
            }
        }

        let base = matched as f64 / total as f64;
        let bonus = (1.0 - 1.0 / (1.0 + extra as f64)) / total as f64;
        (base + bonus).min(1.0)
    }
}
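To make the arithmetic concrete, here is a worked sketch. The free function score is a stand-in that repeats the logic of SearchQuery::score over a slice of terms so the snippet compiles on its own; the sample contents are made up for illustration.

```rust
/// Stand-in for `SearchQuery::score`, taking the terms as a slice.
fn score(terms: &[&str], content: &str) -> f64 {
    let total = terms.len();
    if total == 0 {
        return 1.0;
    }
    let haystack = content.to_lowercase();
    let mut matched = 0usize;
    let mut extra = 0usize;
    for term in terms {
        let needle = term.to_lowercase();
        if needle.is_empty() {
            matched += 1;
            continue;
        }
        // Non-overlapping substring occurrences of the term.
        let count = haystack.matches(needle.as_str()).count();
        if count > 0 {
            matched += 1;
            extra += count - 1;
        }
    }
    let base = matched as f64 / total as f64;
    let bonus = (1.0 - 1.0 / (1.0 + extra as f64)) / total as f64;
    (base + bonus).min(1.0)
}

/// Scores for three sample contents against the query "best nostr apps".
fn worked_scores() -> [f64; 3] {
    let terms = ["best", "nostr", "apps"];
    [
        // One of three terms, seen twice: base 1/3, extra 1, bonus (1/2)/3.
        score(&terms, "Nostr is great; nostr clients abound"),
        // All three terms once each: base 1, no bonus.
        score(&terms, "the best nostr apps"),
        // Two of three terms, extra 3 + 1 = 4: base 2/3, bonus (4/5)/3.
        score(&terms, "nostr nostr nostr nostr apps apps"),
    ]
}
```

The three scores come out at roughly 0.5, exactly 1.0, and roughly 0.933. The third content repeats its terms heavily yet still ranks below the full match, which is the property the bound on the bonus is meant to guarantee.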

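Lowercasing uses to_lowercase, which applies Unicode lowercase mappings rather than only ASCII ones. That allocates, but nostr content is multilingual, and correctness on non-Latin text is worth more than avoiding a copy in a best-effort matcher. It is lowercasing rather than full Unicode case folding, so a few pairs still fail to match: STRASSE lowercases to strasse, which is not a substring match for straße.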
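The difference from ASCII-only lowercasing shows up as soon as the content is not Latin. The two helpers below are hypothetical and exist only to contrast the standard library's to_lowercase with to_ascii_lowercase.

```rust
/// Case-insensitive substring test using Unicode-aware lowercasing, as the
/// scorer does.
fn contains_unicode(content: &str, term: &str) -> bool {
    content.to_lowercase().contains(term.to_lowercase().as_str())
}

/// The same test with ASCII-only lowercasing, for comparison. Non-ASCII
/// letters are left unchanged, so their case must already agree.
fn contains_ascii(content: &str, term: &str) -> bool {
    content
        .to_ascii_lowercase()
        .contains(term.to_ascii_lowercase().as_str())
}
```

With the content "ПРИВЕТ, Nostr" and the term "привет", contains_unicode finds a match while contains_ascii does not; both agree on plain ASCII input such as "HELLO nostr" and "hello".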

Connecting queries to filters

The previous chapter gave Filter a search field but no way to set it. The setters follow the established add_* / clear_* vocabulary.

use crate::search::SearchQuery;

impl Filter {
    /// Set the NIP-50 search query.
    pub fn add_search(mut self, search: impl Into<String>) -> Self {
        self.search = Some(search.into());
        self
    }

    /// Remove the search query, leaving no search constraint.
    pub fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}
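A short usage sketch of the two setters follows. The Filter here is a stand-in reduced to its search field (the real type carries the other fields from the previous chapter), and example_filters is a hypothetical helper.

```rust
// Stand-in reduced to the one field this sketch needs.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Filter {
    pub search: Option<String>,
}

impl Filter {
    pub fn add_search(mut self, search: impl Into<String>) -> Self {
        self.search = Some(search.into());
        self
    }

    pub fn clear_search(mut self) -> Self {
        self.search = None;
        self
    }
}

/// Builds a searched filter and a copy with the search removed again.
pub fn example_filters() -> (Filter, Filter) {
    let searched = Filter::default().add_search("best nostr apps");
    let cleared = searched.clone().clear_search();
    (searched, cleared)
}
```

Since search is a single optional string, calling add_search a second time replaces the earlier query rather than appending to it.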

Scoring an event against a filter is then a matter of parsing the field and delegating to SearchQuery::score. With no search set the method returns 1.0, so an unsearched filter never penalizes an event. This is purely the search dimension — it is independent of the structural matches check from the previous chapter, and the two are meant to be composed by the caller, not folded together. A consumer that wants search-ranked results filters with matches, scores with search_score, and sorts as it sees fit.

impl Filter {
    /// Best-effort local relevance score for `event`, in `0.0..=1.0`.
    ///
    /// Parses the `search` field and scores it against the event's content,
    /// returning `1.0` when there is no search. This considers *only* the
    /// `search` field; it is independent of [`matches`](Filter::matches).
    pub fn search_score(&self, event: &Event) -> f64 {
        match &self.search {
            Some(query) => SearchQuery::parse(query).score(&event.content),
            None => 1.0,
        }
    }
}
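One way a caller might compose the two checks is sketched below. Everything in it is an assumption made for illustration: Event is a stand-in with only two fields, rank is a hypothetical helper, and the structural check and the scorer are passed in as closures so the sketch does not depend on the real Filter. Dropping zero scores follows the "anything above zero is a hit" rule from the Scoring section; breaking ties by newest first is just one possible choice.

```rust
// Stand-in event with only the fields this sketch needs.
pub struct Event {
    pub content: String,
    pub created_at: u64,
}

/// Hypothetical helper: keep events that pass the structural check and score
/// above zero, order them by descending score, then apply the limit.
pub fn rank<'a>(
    events: &'a [Event],
    matches: impl Fn(&Event) -> bool,
    search_score: impl Fn(&Event) -> f64,
    limit: usize,
) -> Vec<&'a Event> {
    let mut scored: Vec<(f64, &'a Event)> = events
        .iter()
        .filter(|&e| matches(e))
        .map(|e| (search_score(e), e))
        .filter(|(score, _)| *score > 0.0)
        .collect();

    // Highest score first; ties fall back to newest first (an assumption).
    scored.sort_by(|a, b| {
        b.0.total_cmp(&a.0)
            .then(b.1.created_at.cmp(&a.1.created_at))
    });

    // The limit is applied after ranking, mirroring what NIP-50 asks of relays.
    scored.into_iter().take(limit).map(|(_, e)| e).collect()
}
```

With the real types, the closures would typically be |e| filter.matches(e) and |e| filter.search_score(e). Because search_score returns 1.0 when no search is set, an unsearched filter keeps every structurally matching event and the ordering reduces to the tie-break.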

What's next

Search depends on routing the query to a relay that actually supports it. Discovering which relays advertise NIP-50, and choosing among them, is a networking and relay-metadata concern — the subject of the Domain and Networking sections, where relay selection is built on top of the filter types assembled here.
