The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

Introduction

MongoDB queries that filter data by searching for exact matches, using greater-than or less-than comparisons, or by using regular expressions will work well enough in many situations. However, these methods fall short when it comes to filtering against fields containing rich textual data.

Imagine you typed “coffee recipe” into a web search engine but it only returned pages that contained that exact phrase. In this case, you may not find exactly what you were looking for since most popular websites with coffee recipes may not contain the exact phrase “coffee recipe.” If you were to enter that phrase into a real search engine, though, you might find pages with titles like “Great Coffee Drinks (with Recipes!)” or “Coffee Shop Drinks and Treats You Can Make at Home.” In these examples, the word “coffee” is present but the titles contain another form of the word “recipe” or exclude it entirely.

This level of flexibility in matching text to a search query is typical for full-text search engines that specialize in searching textual data. There are multiple specialized open-source tools for such applications in use, with Elasticsearch being an especially popular choice. However, for scenarios that don’t require the robust search features found in dedicated search engines, some general-purpose database management systems offer their own full-text search capabilities.

In this tutorial, you’ll learn by example how to create a text index in MongoDB and use it to search the documents in the database with common full-text search queries and filters.

Prerequisites

To follow this tutorial, you will need:
Note: The linked tutorials on how to configure your server, install MongoDB, and secure the MongoDB installation refer to Ubuntu 20.04. This tutorial concentrates on MongoDB itself, not the underlying operating system. It will generally work with any MongoDB installation regardless of the operating system as long as authentication has been enabled.

Step 1 — Preparing the Test Data

To help you learn how to perform full-text searches in MongoDB, this step outlines how to open the MongoDB shell to connect to your locally installed MongoDB instance. It also explains how to create a sample collection and insert a few sample documents into it. This sample data will be used in commands and examples throughout this guide to help explain how to use MongoDB to search text data.

To create this sample collection, connect to the MongoDB shell as your administrative user. This tutorial follows the conventions of the prerequisite MongoDB security tutorial and assumes that this administrative user is named AdminSammy and authenticates against the database configured in that tutorial.
Enter the password you set during installation to gain access to the shell. After providing the password, your prompt will change to a greater-than sign:
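Note: On a fresh connection, the MongoDB shell connects to the test database by default, and you can run this tutorial's examples there. Alternatively, you could switch to another database to run all of the example commands given in this tutorial. To switch to another database, run the use command followed by the name of that database.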
To understand how full-text search can be applied to documents in MongoDB, you’ll need a collection of documents you can filter against. This guide will use a collection of sample documents that include names and descriptions of several different types of coffee drinks. These documents will have the same format as the following example document describing a Cuban coffee drink:

Example Cafecito document
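A document of the shape described here might look like the following sketch. The field names `name` and `description`, and the description text itself, are assumptions chosen for illustration.

```javascript
// Hypothetical example document: the field names and the text are
// illustrative assumptions, not an authoritative copy of the sample data.
const cafecito = {
  name: "Cafecito",
  description:
    "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam.",
};
```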
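This document contains two fields: the name of the coffee drink and a longer description of it. Run the following method in the MongoDB shell to insert the sample documents into a collection: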
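One way to load such data is sketched below. The collection name `recipes` and all five documents are assumptions, written to resemble the kinds of drinks this tutorial discusses rather than taken from an authoritative data set. The `db` handle only exists inside the MongoDB shell, so the call is guarded to keep the snippet harmless elsewhere.

```javascript
// Hypothetical sample data: names and descriptions are illustrative assumptions.
const sampleRecipes = [
  { name: "Cafecito", description: "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." },
  { name: "New Orleans Coffee", description: "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory." },
  { name: "Affogato", description: "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." },
  { name: "Maple Latte", description: "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup." },
  { name: "Pumpkin Spice Latte", description: "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, coffee spices, and some pumpkin puree." },
];

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  // insertMany() creates the collection if it does not exist yet.
  printjson(db.recipes.insertMany(sampleRecipes));
}
```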
This method will return a list of object identifiers assigned to the newly inserted objects:
You can verify that the documents were properly inserted by running the find() method on the collection with no arguments:
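A minimal sketch of that check, assuming the collection is named `recipes` (a name chosen here for illustration):

```javascript
// An empty filter document matches every document in the collection.
const matchEverything = {};

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(matchEverything).forEach(printjson);
}
```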
With the sample data in place, you’re ready to start learning how to use MongoDB’s full-text search features.

Step 2 — Creating a Text Index

To start using MongoDB’s full-text search capabilities, you must create a text index on a collection. Indexes are special data structures that store only a small subset of data from each document in a collection separately from the documents themselves. There are several types of indexes users can create in MongoDB, all of which help the database optimize search performance when querying the collection.

A text index, however, is a special type of index used to further facilitate searching fields containing text data. When a user creates a text index, MongoDB will automatically drop any language-specific stop words from searches. This means that MongoDB will ignore the most common words for the given language (in English, words like “a”, “an”, “the”, or “this”).

MongoDB will also implement a form of suffix-stemming in searches. This involves MongoDB identifying the root part of the search term and treating other grammar forms of that root (created by adding common suffixes like “-ing”, “-ed”, or perhaps “-er”) as equivalent to the root for the purposes of the search.

Thanks to these and other features, MongoDB can more flexibly support queries written in natural language and provide better results.

Note: This tutorial focuses on English text, but MongoDB supports multiple languages when using full-text search and text indexes. To learn more about what languages MongoDB supports, refer to the official documentation on supported languages.

You can only create one text index for any given MongoDB collection, but the index can be created using more than one field. In our example collection, there is useful text stored in both the name and the description of each coffee drink, so the index should cover both fields. Run the following method to create it:
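A minimal sketch of creating such an index, assuming a collection named `recipes` whose documents carry `name` and `description` fields (both names are assumptions for illustration):

```javascript
// Index specification: each listed field is indexed with the special "text" type.
// The field names are assumptions for illustration.
const textIndexSpec = { name: "text", description: "text" };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.recipes.createIndex(textIndexSpec));
}
```

Listing both fields in one specification produces a single compound text index, which fits the one-text-index-per-collection limit.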
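For each of the two fields, the index type is set to text, which tells MongoDB to build a text index suited to full-text search over those fields.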
Now that you’ve created the index, you can use it to issue full-text search queries to the database. In the next step, you’ll learn how to execute queries containing both single and multiple words.

Step 3 — Searching for One or More Individual Words

Perhaps the most common search problem is to look up documents containing one or more individual words.

Typically, users expect the search engine to be flexible in determining where the given search terms should appear. As an example, if you were to use any popular web search engine and type in “coffee sweet spicy”, you likely are not expecting results that will contain those three words in that exact order. It’s more likely that you’d expect a list of web pages containing the words “coffee”, “sweet”, and “spicy” but not necessarily immediately near each other.

That’s also how MongoDB approaches typical search queries when using text indexes. This step outlines how MongoDB interprets search queries with a few examples.

To begin, say you want to search for coffee drinks with spices in their recipe, so you search for a single word using the following query:
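A single-word search could be sketched as follows. The search term `spiced` and the collection name `recipes` are assumptions made for this example.

```javascript
// Filter document for a one-word full-text search.
// "spiced" is an assumed search term used for illustration.
const singleWordFilter = { $text: { $search: "spiced" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(singleWordFilter).forEach(printjson);
}
```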
Notice that the syntax when using full-text search is slightly different from regular queries.
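Individual field names do not appear in the filter document. Instead, the query uses the $text operator with a $search term, and MongoDB looks that term up in the text index, which covers every field the index was built on. After running this command, MongoDB produces the following list of documents: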
There are two documents in the result set, both of which contain words resembling the search query. Only one of them contains the search term in its exact form. Regardless, the other was still returned by this query thanks to MongoDB’s use of stemming: MongoDB reduced the search term to its root and matched other forms of that root in the documents.

Now, suppose you’re particularly fond of espresso drinks. Try looking up documents with a two-word query that also includes “espresso”:
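A two-word search might be sketched like this, assuming the terms `spiced` and `espresso` and a collection named `recipes`. The words are separated by a space inside a single search string.

```javascript
// Filter document for a two-word full-text search.
// Both terms are assumptions chosen for illustration.
const twoWordFilter = { $text: { $search: "spiced espresso" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(twoWordFilter).forEach(printjson);
}
```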
The list of results this time is longer than before:
When using multiple words in a search query, MongoDB performs a logical OR operation, so a document only has to contain one of the words to be returned.

Note: If you try to execute any full-text search query on a collection for which there is no text index defined, MongoDB will return an error message instead:
In this step, you learned how to use one or multiple words as a text search query and how MongoDB joins multiple words with a logical OR operation.

Step 4 — Searching for Full Phrases and Using Exclusions

Looking up individual words might return too many results, or the results may not be precise enough. In this step, you’ll use phrase search and exclusions to control search results more precisely.

Suppose you have a sweet tooth, it’s hot outside, and coffee topped with ice cream sounds like a nice treat. Try finding an ice cream coffee using the basic search query as outlined previously:
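Such a basic query could look like the sketch below, which treats “ice” and “cream” as two independent words. The collection name `recipes` is an assumption.

```javascript
// Two separate words: a document containing either one can match.
const iceCreamWordsFilter = { $text: { $search: "ice cream" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(iceCreamWordsFilter).forEach(printjson);
}
```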
The database will return two coffee recipes:
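While one of these documents matches your expectations, the other was returned only because the logical OR operation accepts a document that contains just one of the two words. To tell MongoDB that you are looking for “ice cream” as a full phrase rather than two separate words, use the following query: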
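A phrase search can be sketched as follows, again assuming a collection named `recipes`. The phrase is wrapped in double quotes inside the search string, so each inner quote is escaped with a backslash.

```javascript
// The escaped inner quotes mark "ice cream" as a phrase to match exactly.
const phraseFilter = { $text: { $search: "\"ice cream\"" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(phraseFilter).forEach(printjson);
}
```

The backslashes are consumed by the JavaScript string literal, so the value MongoDB receives is the phrase surrounded by plain double quotes.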
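Notice the backslashes preceding each of the double quotes surrounding the phrase. The double quotes mark a phrase that must be matched exactly, and because the whole search query is itself a string delimited by double quotes, the inner quotes have to be escaped. This time, MongoDB returns a single result: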
This document matches the search term exactly, and neither word on its own would be enough to count as a match.

Another useful full-text search feature is the exclusion modifier. To illustrate how this works, first run the following query to get a list of all the coffee drinks in the collection based on espresso:
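A sketch of that starting query, assuming a collection named `recipes`:

```javascript
// Baseline query: every document whose indexed text mentions espresso.
const espressoFilter = { $text: { $search: "espresso" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(espressoFilter).forEach(printjson);
}
```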
This query returns four documents:
Notice that two of these drinks are served with milk, but suppose you want a milk-free drink. This is a case where exclusions can come in handy. In a single query, you can join words that you want to appear in the results with those that you want to be excluded by prefixing the word or phrase you want to exclude with a minus sign (-).

As an example, say you run the following query to look up espresso coffees that do not contain milk:
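An exclusion query along those lines might be written as in this sketch, assuming a collection named `recipes`:

```javascript
// "espresso" must match; documents containing "milk" are filtered out.
const noMilkFilter = { $text: { $search: "espresso -milk" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(noMilkFilter).forEach(printjson);
}
```

The minus sign must touch the excluded word, with no space between them.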
With this query, two documents will be excluded from the previously returned results:
You can also exclude full phrases. To search for coffees without ice cream, you could include the phrase in escaped double quotes, preceded by a minus sign, in your search query:
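Excluding a phrase could be sketched as follows. Pairing the exclusion with the term `espresso` and using a collection named `recipes` are both assumptions for this example.

```javascript
// The minus sign goes directly before the opening quote of the phrase.
const noIceCreamFilter = { $text: { $search: "espresso -\"ice cream\"" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(noIceCreamFilter).forEach(printjson);
}
```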
Now that you’ve learned how to filter documents based on a phrase consisting of multiple words and how to exclude certain words and phrases from search results, you can acquaint yourself with MongoDB’s full-text search scoring.

Step 5 — Scoring the Results and Sorting By Score

When a query, especially a complex one, returns multiple results, some documents are likely to be a better match than others. For example, when you search with a multi-word query, documents that contain all of the words are a better fit than those that contain only one.

Full-text search engines typically assign a relevance score to the search results, indicating how well they match the search query. MongoDB also does this, but the search relevance is not visible by default.

Search once again with a multi-word query, but this time have MongoDB also return the relevance score of each result:
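Exposing the score could look like the sketch below. The search terms, the collection name `recipes`, and the output field name `score` are assumptions; the `$meta: "textScore"` expression is the part that asks MongoDB for the relevance value.

```javascript
// Assumed multi-word query, used here only for illustration.
const scoredFilter = { $text: { $search: "spiced espresso" } };

// Projection adding a field (named "score" here) that holds the text score.
const scoreProjection = { score: { $meta: "textScore" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes.find(scoredFilter, scoreProjection).forEach(printjson);
}
```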
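The projection uses the $meta operator with the textScore value to add each document's relevance score to the output. After executing the query, the returned documents will include a new field holding that score: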
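Notice how much higher the score is for the document that best matches the query. However, the results are not returned in order of relevance by default. To change that, you could add a sort() clause to the query: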
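Sorting by relevance might be sketched as follows, under the same assumptions as before: a collection named `recipes`, an illustrative two-word query, and an output field named `score`.

```javascript
// Assumed multi-word query, used here only for illustration.
const rankedFilter = { $text: { $search: "spiced espresso" } };

// The same $meta expression serves as both the projection and the sort key.
const scoreField = { score: { $meta: "textScore" } };

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.recipes
    .find(rankedFilter, scoreField)
    .sort(scoreField)
    .forEach(printjson);
}
```

Sorting on the text score orders documents from the highest score to the lowest.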
The syntax for the sorting document is the same as that of the projection. Now, the list of documents is the same, but their order is different:
The document with the highest relevance score now appears as the first result.
Sorting results according to their relevance score can be helpful. This is especially true with queries containing multiple words, where the most fitting documents will usually contain multiple search terms while the less relevant documents might contain only one.

Conclusion

By following this tutorial, you’ve acquainted yourself with MongoDB’s full-text search features. You created a text index and wrote text search queries using single and multiple words, full phrases, and exclusions. You’ve also assessed the relevance scores for returned documents and sorted the search results to show the most relevant results first. While MongoDB’s full-text search features may not be as robust as those of some dedicated search engines, they are capable enough for many use cases.

Note that there are more search query modifiers, such as case and diacritic sensitivity and support for multiple languages within a single text index. These can be used in more robust scenarios to support text search applications. For more information on MongoDB’s full-text search features and how they can be used, we encourage you to check out the official MongoDB documentation.