Phrase prefix Query in Tantivy

What is a Phrase Prefix Query?

A phrase prefix query searches for a phrase whose last word may be incomplete, which makes it a natural fit for autocomplete. Rather than requiring an exact phrase match, it finds documents in which every term but the last appears exactly and in order, and the last term is matched as a prefix. For example, “the di” matches any title that contains “the” immediately followed by a word starting with “di.”
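To make the matching rule concrete, here is a minimal sketch in plain Rust that does not use Tantivy at all. The function name `phrase_prefix_matches` is made up for illustration, and splitting on whitespace is a simplifying assumption; it only models the semantics described above, not how the library implements them.

```rust
/// Returns true if `text` contains the query terms as consecutive words,
/// where the last query term only needs to be a prefix of the word it
/// lines up with. Illustrative model only, not Tantivy's implementation.
/// Simplification: words are split on whitespace and lowercased.
fn phrase_prefix_matches(text: &str, query: &str) -> bool {
    let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();
    let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();

    // Separate the trailing prefix from the terms that must match exactly.
    let Some((prefix, exact)) = terms.split_last() else {
        return false; // an empty query matches nothing in this sketch
    };

    // Slide a window of `terms.len()` consecutive words over the text.
    words.windows(terms.len()).any(|window| {
        let (last_word, leading_words) = window.split_last().expect("window is non-empty");
        leading_words == exact && last_word.starts_with(prefix.as_str())
    })
}
```

For instance, `phrase_prefix_matches("The Diary of Muadib", "the dia")` evaluates to `true`, while the same query against "A Dairy Cow" evaluates to `false`.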

In this example, we will explore the phrase prefix functionality of Tantivy, a full-text search engine library. We’ll go through the steps of defining a schema, creating an index, indexing documents, and performing a phrase prefix query to retrieve relevant documents.

Defining the Schema

The Tantivy index requires a well-defined schema that specifies the fields in the index and their indexing behavior. In this example, we’ll define a schema with a single field named “title.” We want to enable full-text search on this field and also store the original content of the documents.

use tantivy::schema::{Schema, STORED, TEXT};

let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let schema = schema_builder.build();

Here, we use the add_text_field method to create a text field named “title” and specify that it should be tokenized and indexed (TEXT). Additionally, we indicate that the field should be stored in a compressed key-value store (STORED), allowing us to retrieve the document content later.
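To give a feel for what “tokenized” means for a TEXT field, the following plain-Rust sketch approximates the default behavior: split on non-alphanumeric characters and lowercase each piece. The `tokenize` function is a hypothetical helper written for this post, not a Tantivy API, and it leaves out details such as the default tokenizer dropping very long tokens.

```rust
/// Rough approximation of default TEXT tokenization:
/// split on characters that are not alphanumeric, drop empty pieces,
/// and lowercase what remains. Illustrative only.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|piece| !piece.is_empty())
        .map(|piece| piece.to_lowercase())
        .collect()
}
```

Under these assumptions, the title "The Name of the Wind" is indexed as the five lowercase terms the, name, of, the, wind, which is why queries built by hand need lowercase terms as well.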

Indexing Documents

Next, we create a new index and an index writer to insert documents into the index. We’ll add a few example documents with titles using the add_document method.

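use tantivy::{doc, Index};
use tempfile::TempDir; // assumption: TempDir comes from the tempfile crate

// Create the index in a temporary directory; the writer gets a 50 MB memory budget.
let index_path = TempDir::new()?;
let index = Index::create_in_dir(&index_path, schema.clone())?;
let mut index_writer = index.writer(50_000_000)?;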

index_writer.add_document(doc!(
    title => "The Name of the Wind",
))?;
index_writer.add_document(doc!(
    title => "The Diary of Muadib",
))?;
index_writer.add_document(doc!(
    title => "A Dairy Cow",
))?;
index_writer.add_document(doc!(
    title => "The Diary of a Young Girl",
))?;
index_writer.commit()?;

In this code snippet, we create a temporary directory to store the index files and an index writer with a memory budget of 50 MB (50_000_000 bytes). Then, we add documents to the index by creating Document instances using the doc! macro. Each document has a field named “title” with a corresponding value. Finally, we call commit to make the documents searchable; documents added after the last commit are not visible to searchers.
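Phrase queries depend on the index remembering where each term occurs, not just which documents contain it. The sketch below builds a tiny positional inverted index in plain Rust to show the idea; `PositionalIndex` and `build_index` are names invented for this illustration, and Tantivy's on-disk structures are far more compact than a hash map.

```rust
use std::collections::HashMap;

/// term -> list of (document id, position of the term inside the document)
type PositionalIndex = HashMap<String, Vec<(usize, usize)>>;

/// Builds a toy positional index from a list of titles.
/// Simplification: words are split on whitespace and lowercased.
fn build_index(titles: &[&str]) -> PositionalIndex {
    let mut index = PositionalIndex::new();
    for (doc_id, title) in titles.iter().enumerate() {
        for (position, word) in title.split_whitespace().enumerate() {
            index
                .entry(word.to_lowercase())
                .or_default()
                .push((doc_id, position));
        }
    }
    index
}
```

With the four titles from this post, the entry for "diary" lists position 1 in the second and fourth documents, right after a "the" at position 0. That adjacency is what a phrase match checks.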

Searching

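To perform a search, we need a searcher, which is obtained from an index reader and provides access to the committed data. We’ll use the OnCommit reload policy, which automatically reloads the reader shortly after each commit so that new searchers see the latest documents. (More recent Tantivy releases name this variant OnCommitWithDelay.)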

use tantivy::ReloadPolicy;

// Build a reader that reloads itself after each commit.
let reader = index
    .reader_builder()
    .reload_policy(ReloadPolicy::OnCommit)
    .try_into()?;
let searcher = reader.searcher();

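With the searcher in place, we can now execute a phrase prefix query. In this example, we search for the phrase “the diary” in the “title” field, with the last word treated as a prefix.

use tantivy::collector::{Count, TopDocs};
use tantivy::query::PhrasePrefixQuery;
use tantivy::Term;

let query = "the diary";
// Split the raw query into words; no tokenizer runs here, so keep them lowercase.
let query_whitespace_split: Vec<&str> = query.split_whitespace().collect();
// Turn each word into a Term bound to the "title" field.
let term_queries: Vec<Term> = query_whitespace_split
    .iter()
    .map(|term| Term::from_field_text(title, term))
    .collect();
// The last term is matched as a prefix, the others as an exact phrase.
let phrase_query = PhrasePrefixQuery::new(term_queries);

// Collect the five best hits together with the total number of matches.
let (top_docs, count) = searcher
    .search(&phrase_query, &(TopDocs::with_limit(5), Count))?;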

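Here, we split the query on whitespace and turn each word into a Term for the “title” field, collected in the term_queries vector. PhrasePrefixQuery treats the last term as a prefix and the preceding terms as an exact phrase. Because terms built this way bypass the tokenizer, the query words must already be in the lowercase form that the index stores. We then execute the query using the searcher’s search method, which, with the TopDocs and Count collectors, returns the top matching documents and the total count.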
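One way to picture the prefix step is as an expansion: the trailing prefix is replaced by the indexed terms that start with it, up to some cap. The plain-Rust sketch below shows this over a sorted set; `expand_prefix` and its `max_expansions` parameter are hypothetical names for this illustration, and the real term dictionary lookup in Tantivy works differently under the hood.

```rust
use std::collections::BTreeSet;

/// Collects up to `max_expansions` dictionary terms that start with `prefix`.
/// In a sorted set all such terms sit next to each other, so a range scan
/// starting at the prefix finds them. Illustrative only.
fn expand_prefix(
    dictionary: &BTreeSet<String>,
    prefix: &str,
    max_expansions: usize,
) -> Vec<String> {
    dictionary
        .range::<String, _>(prefix.to_string()..)
        .take_while(|term| term.starts_with(prefix))
        .take(max_expansions)
        .cloned()
        .collect()
}
```

With the terms from this post's titles, the prefix "d" expands to both "dairy" and "diary", whereas "dia" expands to "diary" only. A cap keeps very short prefixes from expanding into a huge number of terms.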

Finally, we iterate over the retrieved documents and print their scores and content.

for (score, doc_address) in top_docs {
    let retrieved_doc = searcher.doc(doc_address)?;
    println!("score {score:?} doc {}", schema.to_json(&retrieved_doc));
}

The output will display the scores and content of the matching documents:

score 1.0 doc {"title":["The Diary of Muadib"]}
score 1.0 doc {"title":["The Diary of a Young Girl"]}

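In this case, the query “the diary” matches two documents, namely “The Diary of Muadib” and “The Diary of a Young Girl.” “A Dairy Cow” is not returned: “dairy” does not start with “diary,” and it is not preceded by “the.” A shorter prefix such as “the di” would return the same two documents.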

Conclusion

In this example, we explored the phrase prefix functionality in Tantivy. We learned how to define a schema, create an index, index documents, and perform a phrase prefix query to retrieve relevant documents. Tantivy provides a powerful and flexible full-text search engine for your applications.

This post is licensed under CC BY 4.0 by the author.