git.osdn.net Git - pg-rex/syncrep.git/commit

author	Tom Lane <tgl@sss.pgh.pa.us>
	Tue, 23 Oct 2007 20:46:12 +0000 (20:46 +0000)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Tue, 23 Oct 2007 20:46:12 +0000 (20:46 +0000)
commit	dbaec70c153239224c0288d865b96c2f939fbdf5
tree	a2309acc315e5d4b9f9b0cd8b2ad60dc999ba93d
parent	344d0cae64dbf398559b855806fc7338ec0a2e64

Rename and slightly redefine the default text search parser's "word"
categories, as per discussion.  asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case.  But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before.  This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages.  In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words".  The hyphenated-word categories are adjusted
similarly.
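
The redefined categories can be illustrated with a minimal sketch. This is not the parser's implementation (the real logic is a state machine in src/backend/tsearch/wparser_def.c); it only restates the rules described above. The function name classify_word and the RENAMED table are hypothetical, and treating any letter/digit mix that contains at least one letter as numword is an assumption of this sketch. Hyphenated words and all other token types are ignored.

```python
# Old category name -> new category name, as stated in the commit message.
RENAMED = {
    "lword": "asciiword",   # ASCII letters only
    "nlword": "word",       # now: letters with at least one non-ASCII letter
    "word": "numword",      # most general mixed letters-and-digits case
}


def classify_word(token):
    """Return the new-style category for a single token, or None.

    Illustrative only: a token made purely of letters is "asciiword" when
    every letter is ASCII and "word" when at least one letter is non-ASCII.
    A token mixing letters and digits is "numword" (assumption: at least one
    letter is required). Anything else is outside the scope of this sketch.
    """
    if not token:
        return None
    if token.isalpha():
        return "asciiword" if token.isascii() else "word"
    if token.isalnum() and any(ch.isalpha() for ch in token):
        return "numword"
    return None


if __name__ == "__main__":
    for sample in ["hello", "naïve", "привет", "abc123", "123"]:
        print(sample, classify_word(sample))
```

Under the old rules a token such as "naïve" (ASCII and non-ASCII letters mixed) was not all-non-ASCII, so it could only fall into the general category that also admitted digits. Under the new rules it is a plain "word", which is why no language needs to accept digit-bearing strings as regular words.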

doc/src/sgml/func.sgml
doc/src/sgml/textsearch.sgml
src/backend/snowball/Makefile
src/backend/snowball/snowball.sql.in
src/backend/tsearch/wparser_def.c
src/include/catalog/catversion.h
src/test/regress/expected/tsdicts.out
src/test/regress/expected/tsearch.out
src/test/regress/sql/tsdicts.sql
src/tools/msvc/Install.pm