<html>
<head>
<title>pg_bigm 1.2 Document</title>

<link rel="stylesheet" type="text/css" href="style.css">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>

<body>
  <div id="navigation">
    <ul>
      <li><a href="index_en.html">Home</a></li>
      <li><a href="http://en.osdn.jp/projects/pgbigm/releases/?package_id=13634">Download</a></li>
      <li><a href="index_en.html#document">Document</a></li>
      <li><a href="index_en.html#community">Community</a></li>
      <li><a href="index_en.html#development">Development</a></li>
      <li><a href="pg_bigm-1-2.html">日本語</a></li>
    </ul>
  </div>

<h1 id="pg_bigm">Document (Release 1.2)</h1>

<div class="index">
<ol>
<li><a href="#description">Overview</a></li>
<li><a href="#pg_trgm">Comparison with pg_trgm</a></li>
<li><a href="#requirement">Tested platforms</a></li>
<li><a href="#install">Install</a></li>
<li><a href="#uninstall">Uninstall</a></li>
<li><a href="#fulltext_search">Full text search</a></li>
<li><a href="#functions">Functions</a></li>
<li><a href="#parametares">Parameters</a></li>
<li><a href="#limitations">Limitations</a></li>
<li><a href="#release_notes">Release Notes</a></li>
</ol>
</div>

<h2 id="description">Overview</h2>
<p>The pg_bigm module provides full text search capability in <a href="http://www.postgresql.org/">PostgreSQL</a>. This module allows a user to create a <b>2-gram</b> (bigram) index for faster full text search.</p>
<p>The <a href="http://en.osdn.jp/projects/pgbigm/">pg_bigm project</a> provides the following module.</p>

<table>
<thead>
<tr>
<th>Module</th><th>Description</th><th>Source Archive File Name</th>
</tr>
</thead>
<tbody>
<tr><td>pg_bigm</td>
  <td nowrap>Module that provides full text search capability in PostgreSQL</td>
  <td>pg_bigm-x.y-YYYYMMDD.tar.gz</td></tr>
</tbody>
</table>

<p>
In the source archive file name, x.y and YYYYMMDD stand for the release version number and the release date, respectively.
For example, if version 1.1 was released on November 22, 2013, x.y is 1.1 and YYYYMMDD is 20131122.
</p>
<p>The license of pg_bigm is <a href="http://opensource.org/licenses/postgresql">The PostgreSQL License</a>, a liberal open source license similar to the BSD license.</p>

<h2 id="pg_trgm">Comparison with pg_trgm</h2>
<p>PostgreSQL includes the <a href="http://www.postgresql.jp/document/current/html/pgtrgm.html">pg_trgm</a> contrib module, which provides full text search capability using the 3-gram (trigram) model. pg_bigm was developed based on pg_trgm. The two modules differ as follows:</p>

<table>
<thead>
<tr>
<th>Feature</th><th>pg_trgm</th><th>pg_bigm</th>
</tr>
</thead>
<tbody>
<tr><td>Phrase matching method for full text search</td>
  <td nowrap>3-gram</td>
  <td>2-gram</td></tr>
<tr><td>Available index types</td>
  <td nowrap>GIN and GiST</td>
  <td>GIN only</td></tr>
<tr><td>Available text search operators</td>
  <td nowrap>LIKE (~~), ILIKE (~~*), ~, ~*</td>
  <td>LIKE only</td></tr>
<tr><td>Full text search for non-alphabetic languages<br>(e.g., Japanese)</td>
  <td nowrap>Not supported (*1)</td>
  <td>Supported</td></tr>
<tr><td>Full text search with a 1- or 2-character keyword</td>
  <td nowrap>Slow (*2)</td>
  <td>Fast</td></tr>
<tr><td>Similarity search</td>
  <td nowrap>Supported</td>
  <td>Supported (version 1.1 or later)</td></tr>
<tr><td>Maximum indexed column size</td>
  <td nowrap>238,609,291 bytes (~228MB)</td>
  <td nowrap>107,374,180 bytes (~102MB)</td></tr>
</tbody>
</table>

<ul>
<li>(*1) You can use pg_trgm for full text search in non-alphabetic languages by commenting out the KEEPONLYALNUM macro in contrib/pg_trgm/pg_trgm.h and rebuilding the pg_trgm module. However, pg_bigm provides faster non-alphabetic search than such a modified pg_trgm.</li>
<li>(*2) This is because such a search can use only a sequential scan or a full index scan, not a normal index scan.</li>
</ul>

<p>pg_bigm 1.1 or later can coexist with pg_trgm in the same database, but pg_bigm 1.0 cannot.</p>

<h2 id="requirement">Tested platforms</h2>
<p>pg_bigm has been built and tested on the following platforms:</p>
<table>
<thead>
<tr>
<th>Category</th><th>Platform</th>
</tr>
</thead>
<tbody>
<tr>
  <td>OS</td>
  <td nowrap>Linux, Mac OS X</td>
</tr>
<tr>
  <td>DBMS</td>
  <td nowrap>PostgreSQL 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15</td>
</tr>
</tbody>
</table>

<p>pg_bigm requires PostgreSQL 9.1 or later.</p>

<h2 id="install">Install</h2>

<h3 id="pg_install">Install PostgreSQL</h3>
<p>Download the PostgreSQL source archive file postgresql-X.Y.Z.tar.gz (replace X.Y.Z with the actual version number) from the <a href="http://www.postgresql.org/">official PostgreSQL site</a>, and then build and install it.</p>
<pre>
$ tar zxf postgresql-X.Y.Z.tar.gz
$ cd postgresql-X.Y.Z
$ ./configure --prefix=/opt/pgsql-X.Y.Z
$ make
$ su
# make install
# exit</pre>
<ul>
<li>--prefix : Specify the PostgreSQL installation directory (optional). By default, PostgreSQL is installed in /usr/local/pgsql.</li>
</ul>
<p>If PostgreSQL is installed from RPM packages, the postgresql-devel package must also be installed to build pg_bigm.</p>

<h3 id="bigm_install">Install pg_bigm</h3>
<p>Download the source archive file of pg_bigm from <a href="http://en.osdn.jp/projects/pgbigm/releases/?package_id=13634">here</a>, and then build and install it.</p>
<pre>
$ tar zxf pg_bigm-x.y-YYYYMMDD.tar.gz
$ cd pg_bigm-x.y-YYYYMMDD
$ make USE_PGXS=1 PG_CONFIG=/opt/pgsql-X.Y.Z/bin/pg_config
$ su
# make USE_PGXS=1 PG_CONFIG=/opt/pgsql-X.Y.Z/bin/pg_config install
# exit
</pre>
<ul>
<li>USE_PGXS : USE_PGXS=1 must always be specified when building pg_bigm.</li>
<li>PG_CONFIG : Specify the path to <a href="http://www.postgresql.org/docs/current/static/app-pgconfig.html">pg_config</a>, which is located in the bin directory of the PostgreSQL installation. If the directory containing pg_config is in the PATH environment variable, PG_CONFIG doesn't need to be specified.</li>
</ul>

<h3 id="bigm_register">Load pg_bigm</h3>
<p>Create the database cluster, modify postgresql.conf, start the PostgreSQL server, and then load pg_bigm into the database.</p>
<pre>
$ initdb -D $PGDATA --locale=C --encoding=UTF8

$ vi $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_bigm'

$ pg_ctl -D $PGDATA start
$ psql -d &lt;database name&gt;
=# CREATE EXTENSION pg_bigm;
=# \dx pg_bigm
                    List of installed extensions
  Name   | Version | Schema |              Description
---------+---------+--------+---------------------------------------
 pg_bigm | 1.2     | public | text index searching based on bigrams
(1 row)
</pre>

<ul>
<li>Replace $PGDATA with the path to the database cluster.</li>
<li>pg_bigm supports all PostgreSQL encodings and locales.</li>
<li>In postgresql.conf, <a href="http://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">shared_preload_libraries</a> or <a href="http://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-SESSION-PRELOAD-LIBRARIES">session_preload_libraries</a> (available in PostgreSQL 9.4 or later) must be set to 'pg_bigm' to preload the pg_bigm shared library into the server.
  <ul>
    <li>In PostgreSQL 9.1, <a href="http://www.postgresql.org/docs/9.1/static/runtime-config-custom.html#GUC-CUSTOM-VARIABLE-CLASSES">custom_variable_classes</a> must also be set to 'pg_bigm'.</li>
  </ul>
</li>
<li><a href="http://www.postgresql.org/docs/current/static/sql-createextension.html">CREATE EXTENSION</a> pg_bigm needs to be executed in every database in which you want to use pg_bigm.</li>
</ul>


<h2 id="uninstall">Uninstall</h2>

<h3 id="bigm_uninstall">Delete pg_bigm</h3>
<p>Unload pg_bigm from the database and then uninstall it.</p>
<pre>
$ psql -d &lt;database name&gt;
=# DROP EXTENSION pg_bigm CASCADE;
=# \q

$ pg_ctl -D $PGDATA stop
$ su

# cd &lt;pg_bigm source directory&gt;
# make USE_PGXS=1 PG_CONFIG=/opt/pgsql-X.Y.Z/bin/pg_config uninstall
# exit
</pre>

<ul>
<li>pg_bigm needs to be unloaded from every database into which it was loaded.</li>
<li><a href="http://www.postgresql.org/docs/current/static/sql-dropextension.html">DROP EXTENSION</a> pg_bigm needs to be executed with the CASCADE option to delete all the database objects that depend on pg_bigm, e.g., pg_bigm full text search indexes.</li>
</ul>

<h3 id="delete_conf">Reset postgresql.conf</h3>
<p>Delete the following pg_bigm-related settings from postgresql.conf.</p>
<ul>
<li>shared_preload_libraries or session_preload_libraries</li>
<li>custom_variable_classes (PostgreSQL 9.1 only)</li>
<li>pg_bigm.* (parameters whose names begin with pg_bigm)</li>
</ul>

<h2 id="fulltext_search">Full text search</h2>

<h3 id="create_index">Create Index</h3>
<p>You can create a full text search index by using the GIN index method.</p>
<p>The following example creates the table <i>pg_tools</i>, which stores the names and descriptions of PostgreSQL-related tools. It then inserts four records into the table and creates a full text search index on the <i>description</i> column.</p>

<pre>
=# CREATE TABLE pg_tools (tool text, description text);

=# INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
=# INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
=# INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
=# INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');

=# CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
</pre>

<ul>
<li><b>gin</b> must be used as the index method. GiST is not available for pg_bigm.</li>
<li><b>gin_bigm_ops</b> must be used as the operator class.</li>
</ul>

<p>You can also create a multicolumn pg_bigm index and specify GIN-related storage parameters, as follows.</p>

<pre>
=# CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
</pre>

<h3 id="do_fulltext_search">Execute full text search</h3>
<p>You can execute full text search by using LIKE pattern matching.</p>
<pre>
=# SELECT * FROM pg_tools WHERE description LIKE '%search%';
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)
</pre>
<ul>
<li>The search keyword must be specified as a pattern string that the LIKE operator can handle properly, as discussed in <a href="#likequery">likequery</a>.</li>
</ul>

<h3 id="similarity_search">Execute similarity search</h3>
<p>You can execute similarity search by using the =% operator.</p>
<p>The following query returns all values in the tool column that are sufficiently similar to the word 'bigm'. This similarity search is usually fast because it can use the full text search index. Two strings are regarded as sufficiently similar if their similarity is greater than or equal to the value of <a href="#similarity_limit">pg_bigm.similarity_limit</a>. In this query, therefore, only 'pg_bigm' and 'pg_trgm' in the tool column have a similarity of 0.2 or higher with the word 'bigm'.</p>
<pre>
=# SET pg_bigm.similarity_limit TO 0.2;

=# SELECT tool FROM pg_tools WHERE tool =% 'bigm';
  tool
---------
 pg_bigm
 pg_trgm
(2 rows)
</pre>
<p>
See the <a href="#bigm_similarity">bigm_similarity</a> function for details of how the similarity is calculated.
</p>
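
<p>The filtering that =% performs can be pictured with the following Python sketch. It is a toy model, not pg_bigm's implementation. It rests on assumptions that this section does not spell out: words are separated by whitespace, and each word is padded with one blank on both sides. It also assumes that the similarity is the number of shared 2-grams divided by the larger of the two 2-gram counts, a formula consistent with the bigm_similarity examples in this document. The function names bigrams, similarity and similarity_search are hypothetical.</p>
<pre>
```python
# Toy model of the =% filter; NOT pg_bigm's C implementation.
# Assumptions: words are split on whitespace only, each word is padded with
# one blank on both sides, and similarity = shared 2-grams / larger count.


def bigrams(text):
    """Return the set of 2-grams of text under the assumptions above."""
    grams = set()
    for word in text.split():
        padded = " " + word + " "
        for i in range(len(padded) - 1):
            grams.add(padded[i:i + 2])
    return grams


def similarity(a, b):
    """Shared 2-grams divided by the larger 2-gram count (assumed formula)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga.intersection(gb)) / max(len(ga), len(gb))


def similarity_search(values, keyword, similarity_limit=0.3):
    """Keep values whose similarity with keyword is >= similarity_limit."""
    return [v for v in values if similarity(v, keyword) >= similarity_limit]


tools = ["pg_hint_plan", "pg_dbms_stats", "pg_bigm", "pg_trgm"]
print(similarity_search(tools, "bigm", similarity_limit=0.2))
print(similarity_search(tools, "bigm"))  # default threshold 0.3
```
</pre>
<p>Under these assumptions, 'pg_bigm' scores 0.5 and 'pg_trgm' scores 0.25 against 'bigm'. The first call therefore returns both values, as the query above does, while the default threshold of 0.3 keeps only 'pg_bigm'. The sketch compares every value one by one; it does not model how the real operator uses the full text search index.</p>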

<h2 id="functions">Functions</h2>
<h3 id="likequery">likequery</h3>
<p>likequery is a function that converts the search keyword (argument #1) into a pattern string that the LIKE operator can handle properly.</p>
<ul>
<li>Argument #1 (text) - search keyword</li>
<li>Return value (text) - pattern string converted from argument #1 so that the LIKE operator can handle it properly</li>
</ul>

<p>If argument #1 is NULL, the return value is also NULL.</p>

<p>This function performs the conversion as follows:</p>
<ul>
<li>adds % (single-byte percent) to both the beginning and the end of the search keyword.</li>
<li>escapes the characters % (single-byte percent), _ (single-byte underscore) and \ (single-byte backslash) in the search keyword with \ (single-byte backslash).</li>
</ul>

<p>In pg_bigm, full text search is performed by using LIKE pattern matching. Therefore, the search keyword needs to be converted into a pattern string that the LIKE operator can handle properly. Usually, a client application is responsible for this conversion. However, by using the likequery function, you can save the effort of implementing such conversion logic in the application.</p>

<pre>
=# SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                             likequery
-------------------------------------------------------------------
 %pg\_bigm has improved the full text search performance by 200\%%
(1 row)
</pre>

<p>Using likequery, you can rewrite the full text search query from the example in "<a href="#do_fulltext_search">Execute full text search</a>" as follows:</p>
<pre>
=# SELECT * FROM pg_tools WHERE description LIKE likequery('search');
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)
</pre>
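
<p>The two conversion rules above can be sketched in Python as follows. This is an illustration, not pg_bigm's source code. likequery_sketch mirrors the documented rules. like_match is a hypothetical, simplified LIKE matcher (backslash as the escape character, case-sensitive, whole-string match), included only to show why the escaping matters.</p>
<pre>
```python
import re


def likequery_sketch(keyword):
    """Mirror the documented likequery rules (illustration, not pg_bigm code)."""
    if keyword is None:  # NULL in, NULL out
        return None
    # Escape %, _ and \ with a backslash so that LIKE treats them literally.
    escaped = "".join("\\" + ch if ch in "%_\\" else ch for ch in keyword)
    # Surround the keyword with % so that it matches anywhere in the text.
    return "%" + escaped + "%"


def like_match(text, pattern):
    """Hypothetical, simplified LIKE: % = any run, _ = one char, \\ escapes.

    Simplification: a trailing lone backslash is taken literally here.
    """
    regex = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            regex.append(re.escape(next(chars, "\\")))
        elif ch == "%":
            regex.append(".*")
        elif ch == "_":
            regex.append(".")
        else:
            regex.append(re.escape(ch))
    return re.fullmatch("".join(regex), text, flags=re.DOTALL) is not None


print(likequery_sketch("pg_bigm has improved the full text search performance by 200%"))
# Unescaped, the % in "200%" is a wildcard, so "2000" also matches.
print(like_match("it is 2000 times faster", "%200%%"))
print(like_match("it is 2000 times faster", likequery_sketch("200%")))
```
</pre>
<p>The first printed line matches the pattern in the likequery example above. The next two lines print True and then False. Without escaping, the % in "200%" acts as a wildcard, so the pattern also matches "2000".</p>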

<h3 id="show_bigm">show_bigm</h3>
<p>show_bigm returns an array of all the 2-grams in the given string (argument #1).</p>

<ul>
<li>Argument #1 (text) - character string</li>
<li>Return value (text[]) - array of all the 2-grams in argument #1</li>
</ul>

<p>A 2-gram returned by show_bigm is a pair of consecutive characters taken from the string after a blank character has been added to its beginning and end. For example, the 2-grams of the string "ABC" are "(blank)A", "AB", "BC" and "C(blank)".</p>

<pre>
=# SELECT show_bigm('full text search');
                            show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)
</pre>
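
<p>A Python sketch of the 2-gram extraction described above follows; it is not pg_bigm's implementation. It assumes that the input is split into words at whitespace and that each word, rather than only the whole string, is padded with one blank. For single-spaced input such as 'full text search', this yields the same 2-grams as padding the whole string. The result is sorted by code point only so that it can be compared with the psql output above; the order pg_bigm uses for non-ASCII text may differ. show_bigm_sketch is a hypothetical name.</p>
<pre>
```python
def show_bigm_sketch(text):
    """Return the sorted, de-duplicated 2-grams of text (illustration only)."""
    grams = set()
    # Assumption: words are separated by whitespace, and each word gets one
    # blank before and after it, as in the "(blank)A" example above.
    for word in text.split():
        padded = " " + word + " "
        # Every pair of consecutive characters is one 2-gram.
        for i in range(len(padded) - 1):
            grams.add(padded[i:i + 2])
    # Code-point order; for this ASCII example it matches the psql output.
    return sorted(grams)


print(show_bigm_sketch("ABC"))
print(show_bigm_sketch("full text search"))
```
</pre>
<p>For "ABC" the sketch returns the four 2-grams listed above. For 'full text search' it returns the same 17 elements as the show_bigm example.</p>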

<h3 id="bigm_similarity">bigm_similarity</h3>
<p>bigm_similarity returns a number that indicates how similar the two strings (argument #1 and #2) are.</p>

<ul>
<li>Argument #1 (text) - character string</li>
<li>Argument #2 (text) - character string</li>
<li>Return value (real) - the similarity of the two arguments</li>
</ul>

<p>
This function measures the similarity of two strings by counting the number of 2-grams they share. The similarity ranges from zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical).
</p>

<pre>
=# SELECT bigm_similarity('full text search', 'text similarity search');
 bigm_similarity
-----------------
        0.571429
(1 row)
</pre>

<p>
Note that, when the set of 2-grams in each argument is determined for the similarity calculation, each argument is considered to have one space prefixed and suffixed.
For example, although the string "ABC" contains the string "B", their similarity is 0 because they share no 2-grams (see the list below).
On the other hand, the strings "ABC" and "A" share one 2-gram, "(blank)A", so their similarity is higher than 0.
This is basically the same behavior as pg_trgm's similarity function.
</p>

<ul>
<li>The 2-grams of the string "ABC" are "(blank)A", "AB", "BC" and "C(blank)".</li>
<li>The 2-grams of the string "A" are "(blank)A" and "A(blank)".</li>
<li>The 2-grams of the string "B" are "(blank)B" and "B(blank)".</li>
</ul>

<pre>
=# SELECT bigm_similarity('ABC', 'A');
 bigm_similarity
-----------------
            0.25
(1 row)

=# SELECT bigm_similarity('ABC', 'B');
 bigm_similarity
-----------------
               0
(1 row)
</pre>

<p>
Note that bigm_similarity is case-sensitive, but pg_trgm's similarity function is not.
For example, the similarity of the strings "ABC" and "abc" is 1 in pg_trgm's similarity function but 0 in bigm_similarity.
</p>

<pre>
=# SELECT similarity('ABC', 'abc');
 similarity
------------
          1
(1 row)

=# SELECT bigm_similarity('ABC', 'abc');
 bigm_similarity
-----------------
               0
(1 row)
</pre>
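
<p>The following Python sketch reproduces the four bigm_similarity results shown in this section. This section does not state the exact formula. The sketch assumes that the similarity is the number of shared 2-grams divided by the larger of the two 2-gram counts. That gives 12/21 ≈ 0.571429 for the first example and 1/4 = 0.25 for "ABC" and "A". Splitting on whitespace and returning 0 when a string has no 2-grams are also assumptions. bigram_set and similarity_sketch are hypothetical names.</p>
<pre>
```python
def bigram_set(text):
    """2-grams of each whitespace-separated word padded with one blank (assumed)."""
    grams = set()
    for word in text.split():
        padded = " " + word + " "
        for i in range(len(padded) - 1):
            grams.add(padded[i:i + 2])
    return grams


def similarity_sketch(a, b):
    """Assumed formula: shared 2-grams / larger 2-gram count, in [0, 1]."""
    ga, gb = bigram_set(a), bigram_set(b)
    if not ga or not gb:  # assumption: no 2-grams means no similarity
        return 0.0
    shared = len(ga.intersection(gb))
    return shared / max(len(ga), len(gb))


for a, b in [("full text search", "text similarity search"),
             ("ABC", "A"), ("ABC", "B"), ("ABC", "abc")]:
    # Characters are compared exactly, so the comparison is case-sensitive.
    print(a, "|", b, "|", round(similarity_sketch(a, b), 6))
```
</pre>
<p>The loop prints similarities of 0.571429, 0.25, 0.0 and 0.0. A union-based denominator (shared 2-grams divided by the number of distinct 2-grams in either string) would give 1/5 = 0.2 for "ABC" and "A" instead of the documented 0.25, which is why the sketch divides by the larger count.</p>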

<h3 id="pg_gin_pending_stats">pg_gin_pending_stats</h3>
<p>pg_gin_pending_stats is a function that returns the number of pages and tuples in the pending list of the GIN index specified by argument #1.</p>

<ul>
<li>Argument #1 (regclass) - name or OID of the GIN index</li>
<li>Return value #1 (integer) - number of pages in the pending list</li>
<li>Return value #2 (bigint) - number of tuples in the pending list</li>
</ul>

<p>
Note that return values #1 and #2 are both 0 if argument #1 is a GIN index built with the FASTUPDATE option disabled, because such an index does not have a pending list.
See <a href="http://www.postgresql.org/docs/current/static/gin-implementation.html#GIN-FAST-UPDATE">GIN Fast Update Technique</a> for details of the pending list and the FASTUPDATE option.
</p>

<pre>
=# SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
 pages | tuples
-------+--------
     1 |      4
(1 row)
</pre>

<h2 id="parametares">Parameters</h2>
<h3 id="last_update">pg_bigm.last_update</h3>
<p>pg_bigm.last_update is a parameter that reports the last updated date of the pg_bigm module. This parameter is read-only; its value cannot be changed.</p>

<pre>
=# SHOW pg_bigm.last_update;
 pg_bigm.last_update
---------------------
 2013.11.22
(1 row)
</pre>

<h3 id="enable_recheck">pg_bigm.enable_recheck</h3>
<p>pg_bigm.enable_recheck is a parameter that specifies whether to perform Recheck, an internal process of full text search. The default value is on, i.e., Recheck is performed. Any user, not only a superuser, can change this parameter value in postgresql.conf or by using the SET command. This parameter must be enabled if you want to obtain correct search results.</p>

<p>PostgreSQL and pg_bigm internally perform the following processes to get the search results:</p>

<ul>
<li>retrieve the result candidates from the full text search index.</li>
<li>choose the correct search results from the candidates.</li>
</ul>

<p>The latter process is called Recheck. The result candidates retrieved from the full text search index may contain wrong results, and the Recheck process removes them.</p>

<p>For example, imagine that two character strings, "He is awaiting trial" and "It was a trivial mistake", are stored in a table. The correct search result for the keyword "trial" is "He is awaiting trial". However, "It was a trivial mistake" is also retrieved from the full text search index as a result candidate, because it contains all the 2-grams ("al", "ia", "ri", "tr") of the search keyword "trial". The Recheck process tests whether each candidate contains the search keyword itself and keeps only the correct results.</p>
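
<p>The two steps can be modeled in a few lines of Python. This is a toy model under stated assumptions, not pg_bigm's implementation. A row becomes a candidate when its 2-grams (computed from blank-padded, whitespace-separated words) include every 2-gram of the keyword, and Recheck is modeled as a plain substring test. The helper names and the recheck flag, which plays the role of pg_bigm.enable_recheck, are hypothetical. Only keywords of two or more characters are handled.</p>
<pre>
```python
def doc_bigrams(text):
    """2-grams of a stored value: each word padded with one blank (assumed)."""
    grams = set()
    for word in text.split():
        padded = " " + word + " "
        for i in range(len(padded) - 1):
            grams.add(padded[i:i + 2])
    return grams


def keyword_bigrams(keyword):
    """2-grams of a '%keyword%' pattern: no padding, since % surrounds it."""
    return {keyword[i:i + 2] for i in range(len(keyword) - 1)}


def search(docs, keyword, recheck=True):
    """Index step, then (optionally) Recheck; recheck mimics enable_recheck."""
    wanted = keyword_bigrams(keyword)
    # Step 1: candidates are rows whose 2-grams cover every keyword 2-gram.
    candidates = [d for d in docs if wanted.issubset(doc_bigrams(d))]
    if not recheck:
        return candidates  # wrong results are returned as well
    # Step 2 (Recheck): keep only rows that really contain the keyword.
    return [d for d in candidates if keyword in d]


tbl = ["He is awaiting trial", "It was a trivial mistake"]
print(sorted(keyword_bigrams("trial")))
print(search(tbl, "trial", recheck=False))
print(search(tbl, "trial"))
```
</pre>
<p>The model behaves like the examples that follow in this section. Both rows pass the index step because "trivial" also contains "tr", "ri", "ia" and "al". With recheck enabled, only "He is awaiting trial" remains.</p>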

<p>How Recheck narrows down the search results can be observed in the output of EXPLAIN ANALYZE.</p>

<pre>
=# CREATE TABLE tbl (doc text);
=# INSERT INTO tbl VALUES('He is awaiting trial');
=# INSERT INTO tbl VALUES('It was a trivial mistake');
=# CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
=# SET enable_seqscan TO off;
=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=12.00..16.01 rows=1 width=32) (actual time=0.041..0.044 rows=1 loops=1)
   Recheck Cond: (doc ~~ '%trial%'::text)
   Rows Removed by Index Recheck: 1
   -&gt;  Bitmap Index Scan on tbl_idx  (cost=0.00..12.00 rows=1 width=0) (actual time=0.028..0.028 rows=2 loops=1)
         Index Cond: (doc ~~ '%trial%'::text)
 Total runtime: 0.113 ms
(6 rows)
</pre>

<p>In this example, the Bitmap Index Scan retrieved two rows from the full text search index, but the Bitmap Heap Scan returned only one row after the Recheck process.</p>

<p>By disabling this parameter, you can skip the Recheck process and get the result candidates retrieved from the full text search index as the final results. In the following example, the wrong result "It was a trivial mistake" is also returned because the parameter is disabled.</p>

<pre>
=# SELECT * FROM tbl WHERE doc LIKE likequery('trial');
         doc
----------------------
 He is awaiting trial
(1 row)

=# SET pg_bigm.enable_recheck = off;
=# SELECT * FROM tbl WHERE doc LIKE likequery('trial');
           doc
--------------------------
 He is awaiting trial
 It was a trivial mistake
(2 rows)
</pre>

<p>This parameter must be enabled if you want to obtain correct search results. On the other hand, you may want to set it to off, for example, to evaluate the performance overhead of Recheck or for debugging.</p>

<h3 id="gin_key_limit">pg_bigm.gin_key_limit</h3>
<p>pg_bigm.gin_key_limit is a parameter that specifies the maximum number of 2-grams of the search keyword to be used for full text search. If it is set to zero (the default), all the 2-grams of the search keyword are used. Any user, not only a superuser, can change this parameter value in postgresql.conf or by using the SET command.</p>

<p>By default, PostgreSQL and pg_bigm use all the 2-grams of the search keyword to scan the GIN index. However, in the current implementation of GIN indexes, the more 2-grams are used, the higher the overhead of the GIN index scan becomes. In a system where long search keywords are often used, full text search is therefore likely to be slow. You can address this performance issue by using this parameter to limit the maximum number of 2-grams used.</p>

<p>On the other hand, the fewer 2-grams are used, the more wrong results are included in the result candidates retrieved from the full text search index. Note that this can increase the workload of Recheck and decrease performance.</p>
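
<p>The trade-off can be illustrated with the toy model below. It is only a sketch. This section does not describe which 2-grams pg_bigm keeps when pg_bigm.gin_key_limit is set, so the sketch simply keeps the first N in sorted order; that choice is hypothetical. The key_limit argument stands in for the parameter. The third sample row, "A total eclipse", is invented for illustration.</p>
<pre>
```python
def bigram_set(text):
    """2-grams of each whitespace-separated word padded with blanks (assumed)."""
    grams = set()
    for word in text.split():
        padded = " " + word + " "
        for i in range(len(padded) - 1):
            grams.add(padded[i:i + 2])
    return grams


def search(docs, keyword, key_limit=0):
    """Return (candidates, results); key_limit=0 means use every 2-gram."""
    keys = sorted({keyword[i:i + 2] for i in range(len(keyword) - 1)})
    if key_limit:
        keys = keys[:key_limit]  # hypothetical choice: first N in sorted order
    # Index step: rows containing every 2-gram that is still used.
    candidates = [d for d in docs if set(keys).issubset(bigram_set(d))]
    # Recheck step: rows that really contain the keyword.
    results = [d for d in candidates if keyword in d]
    return candidates, results


docs = ["He is awaiting trial", "It was a trivial mistake", "A total eclipse"]
for limit in (0, 2, 1):
    candidates, results = search(docs, "trial", key_limit=limit)
    print(limit, len(candidates), results)
```
</pre>
<p>In this model, lowering the limit from all four 2-grams to one raises the number of candidates from 2 to 3, because "total" also contains "al". The final result after Recheck stays the same. The sketch shows only the Recheck side of the trade-off; it does not model the GIN scan cost that a lower limit saves.</p>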

<h3 id="similarity_limit">pg_bigm.similarity_limit</h3>
<p>pg_bigm.similarity_limit is a parameter that specifies the threshold used by similarity search. Similarity search returns all the rows whose similarity with the search keyword is greater than or equal to this threshold. The value must be between 0 and 1 (the default is 0.3). Any user, not only a superuser, can change this parameter value in postgresql.conf or by using the SET command.</p>

<h2 id="limitations">Limitations</h2>
<h3 id="indexed_column_size">Indexed Column Size</h3>

<p>The size of a column value indexed by a pg_bigm GIN index cannot exceed 107,374,180 bytes (~102MB). Any attempt to insert a larger value results in an error.</p>

<pre>
=# CREATE TABLE t1 (description text);
=# CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops);
=# INSERT INTO t1 SELECT repeat('A', 107374181);
ERROR:  out of memory
</pre>

<p>pg_trgm has the same kind of limitation, but its maximum size for an indexed column is 238,609,291 bytes (~228MB).</p>

<h2 id="release_notes">Release Notes</h2>
<ul>
<li><a href="release-1-2_en.html">Version 1.2</a></li>
<li><a href="release-1-1_en.html">Version 1.1</a></li>
<li><a href="release-1-0_en.html">Version 1.0</a></li>
</ul>

<hr>
<div align="right">Copyright (c) 2017-2023, pg_bigm Development Group</div>
<div align="right">Copyright (c) 2012-2016, NTT DATA Corporation</div>

</body>
</html>