How to Search for Text

Important Note

The search engine on tatoeba.org doesn't work like other standard search engines.

You can't use ? or ! in your searches in the way you would normally expect to use them, so you need to search for sentences without using these punctuation marks.

Tatoeba.org uses Sphinx Search

These instructions tell you how to use the search bar at the top of every Tatoeba page. Our search works much like a search engine such as Google, but has some important differences.

  • To find English sentences with "live", "lives", "living" or "lived", search for the word "live". (This will also find sentences with "Live", "Living", etc., since capitalization is ignored.)

  • To match a word exactly (ignoring capitalization), put an equals sign (=) before it.

  • Leave punctuation out of your search string. Most punctuation will be ignored, but a final exclamation mark (!) or question mark (?) will actually interfere with the search. These symbols have other purposes, as described later on this page.

    • The following yields no results:

    • but this search will find How strange! among other results:

  • Put a $ after a word to find sentences ending with that word. The example finds English sentences ending with "Tom".

  • Note that because $ is a special character, if you would like to search for sentences containing the symbol $, you will need to escape the symbol with a backslash.

  • Put a ^ before a word to find sentences beginning with that word. The example finds English sentences beginning with "Tom".

  • This example finds English sentences beginning with "Tom" and ending with "Mary".

  • This example finds English sentences beginning with either "Tom" or "He".

  • To search for a phrase, put quotes (") around it. Put an equals sign in front of each word that you want to be matched exactly.

    • If you want to see phrases like "live in Boston", "living in Boston", or "lives in Boston", use the following search:

    • The following search will only find sentences with the exact phrase "live in Boston".

    • This search will only find sentences consisting of the exact words "I live in Boston".

  • This example finds English sentences that have "Tom", but don't begin with "Tom".

  • This example finds English sentences that have "Tom", but don't begin or end with "Tom".

  • The question mark (?) as part of a word is a one-letter wildcard.

    • The following will find sentences with either "whenever" or "wherever".

    • The following will find sentences with 6-letter words that have 2 letters, then "eve", and then one more letter, such as "clever", "eleven", "peeves", "uneven", ...

  • This example finds English sentences that have "Tom", then 2 words, then "Mary", then 1 word, and then "John".

  • This example finds English sentences that start with "Tom", continue with 3 words, and end with "Mary".

  • This example finds English sentences that have words beginning with "red", including the word "red". (3 letters or more are required.)

  • This example finds English sentences that have words ending with "red", including the word "red".

  • This example finds English sentences that have words containing the string "red", including the word "red" itself.

  • This example finds English sentences that have the word "French", but don't have the word "Tom".

  • This example will find sentences with "cheek" (in any form: cheeks, etc.) that don't include any of the words preceded by a minus sign (-).

  • This example finds sentences in which the word "cat" comes before the word "dog".
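
The two punctuation rules in the list above (leave off a final ? or !, and escape a literal $ with a backslash) can be sketched as a small helper. This is only an illustration: the function name to_literal_query is hypothetical, and it handles just these two cases, not the other special characters.

```python
def to_literal_query(sentence: str) -> str:
    """Prepare a sentence for use as a plain search string.

    Hypothetical helper: covers only a final ?/! and literal $ signs.
    """
    # Drop trailing whitespace, then any final ? or ! characters.
    trimmed = sentence.rstrip().rstrip("?!").rstrip()
    # Escape each literal dollar sign with a backslash.
    return trimmed.replace("$", "\\$")


print(to_literal_query("How strange!"))  # How strange
print(to_literal_query("It costs $5"))   # It costs \$5
```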

Languages without word boundaries

For languages that don't use spaces to separate words, such as Japanese and Chinese, the search engine treats each character as a separate word. For instance, searching for 逆に returns the same results as 逆 に: it matches any sentence that contains both characters, even if they are not adjacent or appear in a different order. To match the characters as a contiguous sequence, surround the keyword with quotes: "逆に".
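
The difference between the unquoted and quoted search can be modeled in a few lines. This is a simplified model of the behavior described above, not how Sphinx is implemented. The function names are hypothetical, and the two sample sentences are illustrative rather than taken from the Tatoeba corpus.

```python
def matches_unquoted(query: str, sentence: str) -> bool:
    # Model: every non-space character is its own word, and each one
    # only has to appear somewhere in the sentence.
    return all(ch in sentence for ch in query if not ch.isspace())


def matches_quoted(query: str, sentence: str) -> bool:
    # Model: a quoted phrase must appear contiguously and in order.
    return query in sentence


# Illustrative sentences (assumed, not from the corpus).
for sentence in ["逆に言えば", "風に逆らう"]:
    print(sentence,
          matches_unquoted("逆に", sentence),
          matches_quoted("逆に", sentence))
```

In this model both sentences match the unquoted query, but only the first matches the quoted one, because the second contains the two characters in the opposite order.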

More details

The search ignores capitalization and punctuation (unless the punctuation happens to match one of the special characters described elsewhere on the page). An apostrophe within a word is not treated as punctuation, so you can find such words as "don't" by including them in an ordinary search string.

In some languages, including English, the search engine stems the search words by default. This means that it removes certain trailing sequences from both search words and indexed words. Thus a search for live will also find lived and living.

The languages in which the search engine stems words are: German, English, Finnish, French, Italian, Dutch, Portuguese, Russian, Spanish, Swedish and Turkish.

If you want to find an exact match for a word, you must precede it with an equals sign, as in =live. This may come as a surprise to users who are accustomed to Google Search, where wrapping a word or phrase in double quotes forces an exact match. In Sphinx, double quotes have a different function, which only affects multiword (phrase) searches: wrapping a phrase in double quotes requires matching sentences to contain words in the specified continuous sequence. Simply placing a phrase in quotes does not suppress stemming of its individual words. To do that, you will need to place an equals sign before each word in the phrase for which you want to suppress stemming.

As an example, take the search like thing. This will find like things, likely things, and even things like. Adding quotes, as in "like thing", will prevent a match against things like (where the words appear in the wrong order), but it will continue to match like things, likely things, and so on. By contrast, "=like =thing" will only match like thing (which does not occur in the Tatoeba corpus). Removing the double quotes, =like =thing, will match What made you do a silly thing like that? Removing one of the equals signs, as in like =thing, will find Such a strange thing is not likely to happen.
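
As a rough sketch, the four query variants discussed above can be generated from the same list of words. The helper names exact and phrase are hypothetical; the code only builds query strings and does not contact Tatoeba.

```python
def exact(word: str) -> str:
    # An equals sign before a word suppresses stemming for that word.
    return "=" + word


def phrase(*words: str, exact_words: bool = False) -> str:
    # Double quotes require the words to appear in this continuous order.
    terms = [exact(w) if exact_words else w for w in words]
    return '"' + " ".join(terms) + '"'


words = ["like", "thing"]
print(" ".join(words))                    # like thing
print(phrase(*words))                     # "like thing"
print(phrase(*words, exact_words=True))   # "=like =thing"
print(" ".join(exact(w) for w in words))  # =like =thing
```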

Note that a star (*) can be placed at the beginning and/or end of a string representing a word, but if it is placed in the middle, the search will always fail. Also, a string beginning and/or ending with a star must be at least three characters long.
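
A quick check of the star rules might look like the sketch below. The function name is hypothetical. The code assumes that the three-character minimum counts the letters without the stars, which the text above does not state explicitly.

```python
def is_valid_star_term(term: str, min_letters: int = 3) -> bool:
    """Check a single search word against the star rules (a sketch)."""
    if "*" not in term:
        # The rules only concern words that use a star.
        return True
    core = term.strip("*")  # remove stars at the beginning and end
    if "*" in core:
        # A star in the middle of the word makes the search fail.
        return False
    # Assumption: the minimum length is counted without the stars.
    return len(core) >= min_letters


for term in ["red*", "*red", "*red*", "r*d", "re*"]:
    print(term, is_valid_star_term(term))
```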

Other search operators

  • A vertical bar (representing "or") finds examples where either of the words appears:

    • hate | detest will match sentences with either hate or detest (or both).
  • If you want to combine an or-expression with other terms, you need to put the or-expression in parentheses:

    • (red|blue) house will match sentences in which the word "house" appears together with either "red" or "blue" (or both).
  • A dash (or exclamation point) before a word prevents matches with sentences where the word appears: like -thing (or like !thing) will match I like ice cream but not I like that red thing.

  • Putting a caret (^) before a word will match only sentences that begin with that word: ^great will match Great people are not always wise. but not You are the great love of my life.

  • Putting a dollar sign ($) after a word will match only sentences that end with that word: life$ will match This is the best day of my life. but not Life means nothing without friends.

  • If you want to search for sentences that contain nothing other than the specified words, use double quotes, a caret, and a dollar sign in combination: "^i love you$" will find I love you. and I love you! but not I love you more than you love me. (However, it will find I loved you. To prevent this match, use "^i =love you$".)

  • The strict order operator (<<) between two words will find sentences where the first word occurs before the second but not where the second word comes before the first. Thus dog << cat will find examples where dog precedes cat, but not vice versa.
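
The operators in this list can be combined mechanically. The sketch below builds the example queries as strings. All function names are hypothetical, and the spacing around the vertical bar is a stylistic choice of the sketch (the examples above show it both with and without spaces).

```python
def any_of(*words: str) -> str:
    return " | ".join(words)          # "or" between alternatives


def group(expression: str) -> str:
    return "(" + expression + ")"     # needed when combining with other terms


def exclude(word: str) -> str:
    return "-" + word                 # a dash excludes sentences with the word


def starts_with(word: str) -> str:
    return "^" + word                 # sentence begins with the word


def ends_with(word: str) -> str:
    return word + "$"                 # sentence ends with the word


def whole_sentence(text: str) -> str:
    return '"^' + text + '$"'         # nothing other than these words


def before(first: str, second: str) -> str:
    return first + " << " + second    # strict order operator


print(any_of("hate", "detest"))                 # hate | detest
print(group(any_of("red", "blue")) + " house")  # (red | blue) house
print("like " + exclude("thing"))               # like -thing
print(starts_with("great"))                     # ^great
print(ends_with("life"))                        # life$
print(whole_sentence("i =love you"))            # "^i =love you$"
print(before("dog", "cat"))                     # dog << cat
```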

See the Sphinx documentation for other functionality. Note that the documentation mentions keywords pertaining to specific fields in a document, but these are not relevant to Tatoeba.