:: DEVELOPER ZONE
MATCH
(
col1,col2,...)
AGAINST (expr [IN BOOLEAN MODE | WITH
QUERY EXPANSION])
As of MySQL 3.23.23, MySQL has support for full-text indexing and
searching. A full-text index in MySQL is an index of type
FULLTEXT. FULLTEXT indexes are
used with MyISAM tables only and can be created
from CHAR, VARCHAR, or
TEXT columns at CREATE TABLE
time or added later with ALTER TABLE or
CREATE INDEX. For large datasets, it is much
faster to load your data into a table that has no
FULLTEXT index, then create the index with
ALTER TABLE (or CREATE INDEX).
Loading data into a table that has an existing
FULLTEXT index could be significantly slower.
Constraints on full-text searching are listed in Section 12.7.3, “Full-Text Restrictions”.
Full-text searching is performed with the MATCH()
function.
mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title,body)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO articles (title,body) VALUES
-> ('MySQL Tutorial','DBMS stands for DataBase ...'),
-> ('How To Use MySQL Well','After you went through a ...'),
-> ('Optimizing MySQL','In this tutorial we will show ...'),
-> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
-> ('MySQL vs. YourSQL','In the following database comparison ...'),
-> ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM articles
-> WHERE MATCH (title,body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title | body |
+----+-------------------+------------------------------------------+
| 5 | MySQL vs. YourSQL | In the following database comparison ... |
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)
The MATCH() function performs a natural language
search for a string against a text collection. A collection is a set
of one or more columns included in a FULLTEXT
index. The search string is given as the argument to
AGAINST(). For every row in the table,
MATCH() returns a relevance value, that is, a
similarity measure between the search string and the text in that row
in the columns named in the MATCH() list.
By default, the search is performed in case-insensitive fashion. In
MySQL 4.1 and up, you can make a full-text search by using a binary
collation for the indexed columns. For example, a column that has a
character set of latin1 can be assigned a
collation of latin1_bin to make it case sensitive
for full-text searches.
When MATCH() is used in a WHERE
clause, as in the preceding example, the rows returned are
automatically sorted with the highest relevance first. Relevance
values are non-negative floating-point numbers. Zero relevance means
no similarity. Relevance is computed based on the number of words in
the row, the number of unique words in that row, the total number of
words in the collection, and the number of documents (rows) that
contain a particular word.
For natural-language full-text searches, it is a requirement that the
columns named in the MATCH() function be the same
columns included in some FULLTEXT index in your
table. For the preceding query, note that the columns named in the
MATCH() function (title and
body) are the same as those named in the
definition of the article table's
FULLTEXT index. If you wanted to search the
title or body separately, you
would need to create FULLTEXT indexes for each
column.
It is also possible to perform a boolean search or a search with query expansion. These search types are described in Section 12.7.1, “Boolean Full-Text Searches” and Section 12.7.2, “Full-Text Searches with Query Expansion”.
The preceding example is a basic illustration showing how to use the
MATCH() function where rows are returned in order
of decreasing relevance. The next example shows how to retrieve the
relevance values explicitly. Returned rows are not ordered because
the SELECT statement includes neither
WHERE nor ORDER BY clauses:
mysql> SELECT id, MATCH (title,body) AGAINST ('Tutorial')
-> FROM articles;
+----+-----------------------------------------+
| id | MATCH (title,body) AGAINST ('Tutorial') |
+----+-----------------------------------------+
| 1 | 0.65545833110809 |
| 2 | 0 |
| 3 | 0.66266459226608 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
+----+-----------------------------------------+
6 rows in set (0.00 sec)
The following example is more complex. The query returns the
relevance values and it also sorts the rows in order of decreasing
relevance. To achieve this result, you should specify
MATCH() twice: once in the
SELECT list and once in the
WHERE clause. This causes no additional overhead,
because the MySQL optimizer notices that the two
MATCH() calls are identical and invokes the
full-text search code only once.
mysql> SELECT id, body, MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body | score |
+----+-------------------------------------+-----------------+
| 4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
| 6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
The MySQL FULLTEXT implementation regards any
sequence of true word characters (letters, digits, and underscores)
as a word. That sequence may also contain apostrophes
('), but not more than one in a row. This means
that aaa'bbb is regarded as one word, but
aaa''bbb is regarded as two words. Apostrophes at
the beginning or the end of a word are stripped by the
FULLTEXT parser; 'aaa'bbb'
would be parsed as aaa'bbb.
The FULLTEXT parser determines where words start
and end by looking for certain delimiters, for example '
' (the space character), , (the comma),
and . (the period). If words aren't separated by
delimiters, like for example Chinese words, the
FULLTEXT parser cannot determine where a word
starts and where it ends. To be able to add words or other indexed
terms in such languages to a FULLTEXT index, you'd
have to preprocess them so that they are separated by some delimiter
(for example, by '').
Some words are ignored in full-text searches:
Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters.
Words in the stopword list are ignored. A stopword is a word such as ``the'' or ``some'' that is so common that it is considered to have zero semantic value. There is a built-in stopword list.
The default minimum word length and stopword list can be changed as described in Section 12.7.4, “Fine-Tuning MySQL Full-Text Search”.
Every correct word in the collection and in the query is weighted according to its significance in the collection or query. This way, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are then combined to compute the relevance of the row.
Such a technique works best with large collections (in fact, it was
carefully tuned this way). For very small tables, word distribution
does not adequately reflect their semantic value, and this model may
sometimes produce bizarre results. For example, although the word
``MySQL'' is present in every row of the articles
table, a search for the word produces no results:
mysql> SELECT * FROM articles
-> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)
The search result is empty because the word ``MySQL'' is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large datasets, this is the most desirable behavior---a natural language query should not return every second row from a 1GB table. For small datasets, it may be less desirable.
A word that matches half of rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular dataset in which they occur. A given word may exceed the 50% threshold in one dataset but not another.
The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more.
© 1995-2005 MySQL AB. All rights reserved.

User Comments
Warning: query failed: Unknown column 'user.firstname' in 'field list' in /data0/sites/live/web-main/lib/mysql-cxn.php on line 69
Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /data0/sites/live/web-main/lib/docbook.php on line 245
Add your own comment.