3 * What is Senna MySQL binding?
5 MySQL version after 3.23.23 supports FULLTEXT index.
6 With it, MySQL can execute the full-text search for the field of VARCHAR and TEXT type.
7 But, these problems exist.
8 1) Insufficient Japanese support
12 Senna MySQL binding substitute for MySQL original fulltext index.
13 Those problems are fixed.
14 And, It enables you to use full features of Senna through MySQL.
18 You can install MySQL with Senna with either one of below two method.
21 +patch MySQL source, compile and install.
25 - Supports version 4.0, 4.1, 5.0 and 5.1
26 - MATCH AGAINST query support in BOOLEAN MODE and defaulut mode(NLQ MODE)
27 - In BOOLEAN MODE, you can use all operators like +, -, <, >, (, ), ~, *, ".
28 - Supports Japanese encoding EUC, SJIS
29 - Supports Unicode with UTF8
30 - Supports normalization. In UTF8, NFKC normalization supported
31 - Supports similar document search
32 - Supports near words search
33 - Supports MyISAM and MERGE(4.0 only) storage engines.
34 - Supports snippet(KWIC) function with MySQL user defined functions.
35 - Supports Japanese word's index(with MeCab), N-gram(bi-gram) index and space delimited index.
36 - 2ind patch enables MySQL to use FULLTEXT index and normal b-tree index bothly at one time.
37 - skipmode patch(4.0 only) accelerate queries as select count(*) and select * from table limit x,y.
41 Please read [[install_en]].
43 * Original enhanced features
45 Senna extends SQL syntax.
48 Moriyoshi Koizumi at GREE(http://gree.jp) and
49 Kazuhiro Osawa at iYappo(http://i.yappo.jp/).
51 ** enhanced CREATE TABLE / CREATE INDEX
53 CREATE TABLE / CREATE INDEX query is enhanced to be able to pass Senna parameters.
55 *** You can specify index type
57 You can select Japanese word index, NGRAM index or DELIMITED index.
59 **** Japanese word index (with MeCab)
61 If no index type specified, Senna makes Japanese word index.
62 Notice: For English users, this mode is not befitted. use NGRAM index.
67 id INTEGER AUTO_INCREMENT,
73 **** NGRAM index(bigram index for non-ascii strings and word index for ascii strings)
80 id INTEGER AUTO_INCREMENT,
83 FULLTEXT INDEX USING NGRAM (text)
86 **** DELIMITED index(space delimited index)
88 If you specify USING DELIMITED, words delimited by space are indexed.
93 id INTEGER AUTO_INCREMENT,
96 FULLTEXT INDEX USING DELIMITED (text)
99 **** MySQL original fulltext index(without Senna)
101 Specify USING NO SENNA.
106 id INTEGER AUTO_INCREMENT,
109 FULLTEXT INDEX USING NO SENNA (text)
112 **** normalize or not
114 At default, alphabet normalization is enabled
115 If you want to disable it, specify USING NO NORMALIZE.
117 ex.) without normalization with NGRAM index
120 id INTEGER AUTO_INCREMENT,
123 FULLTEXT INDEX USING SENNA, NO NORMALIZE, NGRAM (text)
126 **** INITIAL_N_SEGMENTS parameter
128 You can speficy INITIAL_N_SEGMENTS which is an initial value of an index buffer with 'USING number'
130 ex.) create index with INITIAL_N_SEGMENTS=2048
133 id INTEGER AUTO_INCREMENT,
136 FULLTEXT INDEX USING SENNA, 2048 (text)
139 ** Extensions of SHOW INDEX
141 mysql> SHOW INDEX FROM test;
142 +-------+------------+----------+- ... -+--------+------+--------------------------------+---------+
143 | Table | Non_unique | Key_name | | Packed | Null | Index_type | Comment |
144 +-------+------------+----------+- ... -+--------+------+--------------------------------+---------+
145 | test | 0 | PRIMARY | | NULL | | BTREE | |
146 | test | 1 | text | | NULL | YES | FULLTEXT,SENNA,NORMALIZE,NGRAM | |
147 +-------+------------+----------+--... -+--------+------+--------------------------------+---------+
148 2 rows in set (0.04 sec)
150 ** Extensions of SHOW TABLE STATUS
152 +-------+--------+ ... -------+----------------+----------------+--------------------+---------+
153 | Name | Type | ... k_time | Create_options | Senna_key_size | Senna_lexicon_size | Comment |
154 +-------+--------+ ... -------+----------------+----------------+--------------------+---------+
155 | test1 | MyISAM | ... | | 0 | 0 | |
156 | test2 | MyISAM | ... | | 0 | 0 | |
157 | test3 | MyISAM | ... | | 0 | 0 | |
158 | test4 | MyISAM | ... | | 0 | 0 | |
159 | test5 | MyISAM | ... | | 0 | 0 | |
160 +-------+--------+ ... -------+----------------+----------------+--------------------+---------+
162 next column is added.
166 record number of symbol table (.SEN)
168 - Senna_keys_file_size
170 file size of symbol table (.SEN)
174 record number of lexicon table (.SEN.I)
176 - Senna_lexicon_file_size
178 file size of lexicon table (.SEN.I)
182 file size of the buffer of lexicon table (.SEN.i)
184 - Senna_inv_chunk_size
186 file size of an inverted index table (.SEN.i.c)
188 ** 2ind patch(fulltext index and b-tree index bothly used at one time)
190 *** What is 2ind patch
192 In MySQL, only one index is used in executing one query.
193 Normally, that's limit is avoided with multi-column index.
194 But using fulltext index, it is very big problem like below.
196 1. slow limited query
198 select columns from table where match(a) against(b) limit 1000, 10
200 If limit offset is big, table scan occurs and response is very slow.
202 2. slow only getting count(*)
204 select count(*) from table where match(a) against(b);
206 Only getting record count, table scan occurs and response is very slow.
208 3. slow conditional search
210 select columns from table where match(a) against(b) and c like 'hoge%';
212 Although 'column c' is indexed, table scan occurs and response is very slow.
216 select columns from table where match(a) against(b) order by c;
218 Although 'column c' is indexed, table scan occurs and response is very slow.
220 2ind patch solves all of the problems.
222 *** Install 2ind patch
224 Please read [[install_en]].
227 - No additional query parameter is required to solve problem 1. 2 with 2ind-patch.
228 - Additional query parameter is required to solve problem 3. 4 with 2ind-patch like below.
229 select columns from table force index(c) where match(a) against(b) and c like 'hoge%';
231 select columns from table force index(c) where match(a) against(b) order by c;
233 (If you want to use PRIMARY key, specify 'force index(PRIMARY)'.
235 *** Notice of 2ind-patch
237 - 2ind-patch affects those four problem patterns, but no effect for other patterns.
238 - 2ind-patch is now beta version.
242 You can get snippet(or KWIC, KeyWords In Context) with a snippet UDF function.
244 *** Install snippet UDF
246 > cd {senna-package}/bindings/mysql/udf/
247 If you get Senna from Subversion repository, you have to execute autogen.sh.
250 > ./configure --prefix=/usr
255 Load udf functions in MySQL.
257 (In 'make load', 'mysql -u root -p' command is used for regist UDF.So password input is required.)
259 *** Using snippet UDF
261 snippet UDF is a function which have many parameters.
263 SELECT snippet(text or column, byte length of one snippet, number of snippets, encoding of text,
264 html encoding or not, snippet open tag, snippet close tag,
265 word1, word1 open tag, word1 close tag,
266 word2, word2 open tag, word2 close tag, ...)
268 Encoding of text is 'sjis', 'ujis', 'utf8' or 'default'.
269 If you get html encoded text, you have to specify html encoding parameter as 1. If not, 0.
273 SELECT snippet(body, 120, 3, 'utf8',
274 1, '...', '...<br>\n',
275 'search', '<span class="word1">', '</span>',
276 'test', '<span class="word2">', '</span>'
279 WHERE MATCH(body) AGAINST('*D+ search test' IN BOOLEAN MODE)
284 skipmode-patch make it's time faster that you get hit count and query which is count limited.
285 It is only available in MySQL 4.0.
287 *** Install skipmode patch
289 Please read [[install_en]].