.\" From Henry Spencer's regex package (as found in the apache
.\" distribution). The package carries the following copyright:
.\"
.\"  Copyright 1992, 1993, 1994 Henry Spencer.  All rights reserved.
.\"  This software is not subject to any license of the American Telephone
.\"  and Telegraph Company or of the Regents of the University of California.
.\"
.\"  Permission is granted to anyone to use this software for any purpose
.\"  on any computer system, and to alter it and redistribute it, subject
.\"  to the following restrictions:
.\"
.\"  1. The author is not responsible for the consequences of use of this
.\"     software, no matter how awful, even if they arise from flaws in it.
.\"
.\"  2. The origin of this software must not be misrepresented, either by
.\"     explicit claim or by omission.  Since few users ever read sources,
.\"     credits must appear in the documentation.
.\"
.\"  3. Altered versions must be plainly marked as such, and must not be
.\"     misrepresented as being the original software.  Since few users
.\"     ever read sources, credits must appear in the documentation.
.\"
.\"  4. This notice may not be removed or altered.
.\"
.\" In order to comply with `credits must appear in the documentation'
.\" I added an AUTHOR paragraph below - aeb.
.\"
.\" In the default nroff environment there is no dagger \(dg.
.\"
.\" 2005-05-11 Removed discussion of `[[:<:]]' and `[[:>:]]', which
.\"     appear not to be in the glibc implementation of regcomp
.\"
.ie t .ds dg \(dg
.el .ds dg (!)
.\"
.\" Japanese Version Copyright (c) 1998 NAKANO Takeo all rights reserved.
.\" Translated Wed 8 Jul 1998 by NAKANO Takeo <nakano@apm.seikei.ac.jp>
.\"
.\"WORD:        regular expression      正規表現
.\"WORD:        modern RE               新しい正規表現
.\"WORD:        obsolete RE             古い正規表現
.\"WORD:        basic RE                基本正規表現
.\"WORD:        extended RE             拡張正規表現
.\"WORD:        branch                  枝
.\"WORD:        piece                   文節
.\"WORD:        atom                    アトム
.\"WORD:        bound                   繰り返し指定
.\"WORD:        bracket expression      ブラケット表現
.\"WORD:        digit                   数字
.\"WORD:        collating sequence      照合順序
.\"WORD:        collating element       照合順序の要素
.\"WORD:        character class         文字クラス
.\"WORD:        equivalent class        等価クラス
.\"WORD:        substring               部分文字列
.\"WORD:        subexpression           部分正規表現
.\"
.TH REGEX 7 2009-01-12 "" "Linux Programmer's Manual"
.SH NAME
regex \- POSIX.2 regular expressions
.SH DESCRIPTION
Regular expressions ("RE"s),
as defined in POSIX.2, come in two forms:
modern REs (roughly those of
.IR egrep ;
POSIX.2 calls these "extended" REs)
and obsolete REs (roughly those of
.BR ed (1);
POSIX.2 "basic" REs).
Obsolete REs mostly exist for backward compatibility in some old programs;
they are discussed at the end.
POSIX.2 leaves some aspects of RE syntax and semantics open;
"\*(dg" marks decisions on these aspects that
may not be fully portable to other POSIX.2 implementations.
.PP
A (modern) RE is one\*(dg or more nonempty\*(dg \fIbranches\fR,
separated by \(aq|\(aq.
It matches anything that matches one of the branches.
.PP
A branch is one\*(dg or more \fIpieces\fR, concatenated.
It matches a match for the first piece, followed by a match for the second,
and so on.
.PP
A piece is an \fIatom\fR possibly followed
by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
An atom followed by \(aq*\(aq
matches a sequence of 0 or more matches of the atom.
An atom followed by \(aq+\(aq
matches a sequence of 1 or more matches of the atom.
An atom followed by \(aq?\(aq
matches a sequence of 0 or 1 matches of the atom.
.PP
A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
possibly followed by \(aq,\(aq
possibly followed by another unsigned decimal integer,
always followed by \(aq}\(aq.
The integers must lie between 0 and
.B RE_DUP_MAX
(255\*(dg) inclusive,
and if there are two of them, the first may not exceed the second.
An atom followed by a bound containing one integer \fIi\fR
and no comma matches
a sequence of exactly \fIi\fR matches of the atom.
An atom followed by a bound
containing one integer \fIi\fR and a comma matches
a sequence of \fIi\fR or more matches of the atom.
An atom followed by a bound
containing two integers \fIi\fR and \fIj\fR matches
a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
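.PP
The bound rules above can be tried out with the interface described in
.BR regex (3).
The following is a minimal sketch, not a definitive implementation;
the helper name
.I ere_matches
is hypothetical and exists only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the modern (extended) RE pattern
 * matches somewhere in text, 0 if it does not, and -1 if the pattern
 * fails to compile.  Callers add ^ and $ to test the whole string. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

.PP
With this helper, "^a{2}$" accepts only "aa",
"^a{2,}$" accepts two or more copies of the atom,
and "^a{2,3}$" rejects "aaaa".
A bound whose first integer exceeds the second, such as "a{3,2}",
is reported as a compilation failure.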
.PP
An atom is a regular expression enclosed in "\fI()\fP"
(matching a match for the regular expression),
an empty set of "\fI()\fP" (matching the null string)\*(dg,
a \fIbracket expression\fR (see below), \(aq.\(aq
(matching any single character), \(aq^\(aq (matching the null string at the
beginning of a line), \(aq$\(aq (matching the null string at the
end of a line), a \(aq\e\(aq followed by one of the characters
"\fI^.[$()|*+?{\e\fP"
(matching that character taken as an ordinary character),
a \(aq\e\(aq followed by any other character\*(dg
(matching that character taken as an ordinary character,
as if the \(aq\e\(aq had not been present\*(dg),
or a single character with no other significance (matching that character).
A \(aq{\(aq followed by a character other than a digit is an ordinary
character, not the beginning of a bound\*(dg.
It is illegal to end an RE with \(aq\e\(aq.
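.PP
Because a \(aq\e\(aq in front of one of the listed characters makes it
ordinary, arbitrary text can be turned into a modern RE that matches it
literally.
The sketch below shows one way to do so; the function name
.I ere_quote
is an assumption made for this example and is not part of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of text in which
 * every character from the set ^.[$()|*+?{\ is preceded by a
 * backslash.  The caller frees the result; NULL means out of memory. */
char *ere_quote(const char *text)
{
    static const char special[] = "^.[$()|*+?{\\";
    size_t len = strlen(text);
    char *out = malloc(2 * len + 1);    /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *text != '\0'; text++) {
        if (strchr(special, *text) != NULL)
            *p++ = '\\';
        *p++ = *text;
    }
    *p = '\0';
    return out;
}
```

.PP
Only the characters named in the paragraph above are escaped;
other characters are copied unchanged, since escaping them is among the
points that POSIX.2 leaves open.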
.PP
A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
It normally matches any single character from the list (but see below).
If the list begins with \(aq^\(aq,
it matches any single character
(but see below) \fInot\fR from the rest of the list.
If two characters in the list are separated by \(aq\-\(aq, this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence;
for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
It is illegal\*(dg for two ranges to share an
endpoint, for example, "\fIa\-c\-e\fP".
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.PP
[Translator's note: "collating sequence" is a term from
internationalization.
When words are sorted alphabetically, the ordering rules differ from
language to language; a collating sequence is the mechanism that absorbs
those differences.
.PP
For example, Spanish is said to treat the letter pair "ch" specially, so
that alphabetical order runs a, b, c, ch, d, e, and so on.
Such an ordering is called a collating sequence.
In it, the pair "ch" is treated as if it were a single character when
words are sorted.
The smallest units used for ordering, whether single characters such as
"a" and "b" or special sequences such as "ch", are called
collating elements.
A collating sequence is defined in terms of collating elements,
not characters.]
.PP
To include a literal \(aq]\(aq in the list, make it the first character
(following a possible \(aq^\(aq).
To include a literal \(aq\-\(aq, make it the first or last character,
or the second endpoint of a range.
To use a literal \(aq\-\(aq as the first endpoint of a range,
enclose it in "\fI[.\fP" and "\fI.]\fP"
to make it a collating element (see below).
With the exception of these and some combinations using \(aq[\(aq (see next
paragraphs), all other special characters, including \(aq\e\(aq, lose their
special significance within a bracket expression.
.PP
Within a bracket expression, a collating element (a character,
a multicharacter sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the
sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression containing a multicharacter collating element
can thus match more than one character;
for example, if the collating sequence includes a "ch" collating element,
then the RE "\fI[[.ch.]]*c\fP" matches the first five characters
of "chchcc".
.PP
Within a bracket expression, a collating element enclosed in "\fI[=\fP" and
"\fI=]\fP" is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters
were "\fI[.\fP" and "\fI.]\fP".)
For example, if o and \o'o^' are the members of an equivalence class,
then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP",
and "\fI[o\o'o^']\fP" are all synonymous.
An equivalence class may not\*(dg be an endpoint
of a range.
.PP
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
in "\fI[:\fP" and "\fI:]\fP" stands for the list
of all characters belonging to that class.
Standard character class names are:
.PP
.RS
.nf
.ta 3c 6c 9c
alnum	digit	punct
alpha	graph	space
blank	lower	upper
cntrl	print	xdigit
.fi
.RE
.PP
These stand for the character classes defined in
.BR wctype (3).
A locale may provide others.
A character class may not be used as an endpoint of a range.
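.PP
As an illustration of character classes inside bracket expressions,
the following sketch builds the RE "[[:\fIname\fP:]]+" for a given class
name and reports where its first match lies.
The function name
.I first_run
and the 64-byte pattern buffer are assumptions made for this example.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: locate the first run of characters belonging
 * to the named character class.  On a match, store its byte offset and
 * length and return 1; return 0 on no match, -1 on a bad class name. */
int first_run(const char *class_name, const char *text,
              int *start, int *length)
{
    char pattern[64];
    regex_t re;
    regmatch_t m[1];
    int n, rc;

    /* Build a bracket expression such as [[:digit:]]+ */
    n = snprintf(pattern, sizeof pattern, "[[:%s:]]+", class_name);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m[0].rm_so;
    *length = (int)(m[0].rm_eo - m[0].rm_so);
    return 1;
}
```

.PP
Which characters belong to a class depends on the current locale,
so results for non-ASCII text may differ between systems.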
.\" As per http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=295666
.\" The following does not seem to apply in the glibc implementation
.\" .PP
.\" There are two special cases\*(dg of bracket expressions:
.\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match
.\" the null string at the beginning and end of a word respectively.
.\" A word is defined as a sequence of
.\" word characters
.\" which is neither preceded nor followed by
.\" word characters.
.\" A word character is an
.\" .I alnum
.\" character (as defined by
.\" .BR wctype (3))
.\" or an underscore.
.\" This is an extension,
.\" compatible with but not specified by POSIX.2,
.\" and should be used with
.\" caution in software intended to be portable to other systems.
.PP
In the event that an RE could match more than one substring of a given
string,
the RE matches the one starting earliest in the string.
If the RE could match more than one substring starting at that point,
it matches the longest.
Subexpressions also match the longest possible substrings, subject to
the constraint that the whole match be as long as possible,
with subexpressions starting earlier in the RE taking priority over
ones starting later.
Note that higher-level subexpressions thus take priority over
their lower-level component subexpressions.
.PP
Match lengths are measured in characters, not collating elements.
A null string is considered longer than no match at all.
For example,
"\fIbb*\fP" matches the three middle characters of "abbbc",
"\fI(wee|week)(knights|nights)\fP"
matches all ten characters of "weeknights",
when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression
matches all three characters, and
when "\fI(a*)*\fP" is matched against "bc"
both the whole RE and the parenthesized
subexpression match the null string.
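.PP
The first three examples above can be checked by asking
.BR regexec (3)
for match offsets.
The following is a sketch under stated assumptions: the name
.I ere_span
and the limit of ten reported subexpressions are choices made for this
illustration only.

```c
#include <regex.h>
#include <stddef.h>

enum { MAX_GROUPS = 10 };   /* illustrative limit, not part of the API */

/* Hypothetical helper: find the first match of the modern RE pattern
 * in text and report the byte offsets of subexpression number group
 * (0 means the whole match).  Returns 1 on a match, 0 on no match,
 * and -1 on a bad pattern or group number. */
int ere_span(const char *pattern, const char *text, size_t group,
             int *start, int *end)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int rc;

    if (group >= MAX_GROUPS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (group > re.re_nsub) {       /* no such subexpression */
        regfree(&re);
        return -1;
    }
    rc = regexec(&re, text, MAX_GROUPS, m, 0);
    regfree(&re);
    if (rc != 0 || m[group].rm_so == -1)
        return 0;
    *start = (int)m[group].rm_so;
    *end = (int)m[group].rm_eo;
    return 1;
}
```

.PP
Offsets are half-open: a match of "bb*" in "abbbc" is reported as
start 1 and end 4, that is, the three middle characters.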
.PP
If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
for example, \(aqx\(aq becomes "\fI[xX]\fP".
When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP".
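.PP
With
.BR regcomp (3),
case-independent matching is requested through the
.B REG_ICASE
flag.
The sketch below is one way to observe the rule described above;
the helper name
.I icase_matches
is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the modern RE pattern matches
 * somewhere in text when case distinctions are ignored, 0 if it does
 * not, and -1 if the pattern fails to compile. */
int icase_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern,
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

.PP
Here "^x$" and "^[x]$" both accept "X",
while "^[^x]$" rejects "X" because the negated list behaves like "[^xX]".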
.PP
No particular limit is imposed on the length of REs\*(dg.
Programs intended to be portable should not employ REs longer
than 256 bytes,
as an implementation can refuse to accept such REs and remain
POSIX-compliant.
.PP
Obsolete ("basic") regular expressions differ in several respects.
\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are
ordinary characters and there is no equivalent
for their functionality.
The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
\(aq^\(aq is an ordinary character except at the beginning of the
RE or\*(dg the beginning of a parenthesized subexpression,
\(aq$\(aq is an ordinary character except at the end of the
RE or\*(dg the end of a parenthesized subexpression,
and \(aq*\(aq is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
(after a possible leading \(aq^\(aq).
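.PP
With
.BR regcomp (3),
obsolete REs are what is compiled when
.B REG_EXTENDED
is not given.
The following sketch lets the same pattern be tried in both forms;
the helper name
.I re_matches
and its
.I extended
parameter are assumptions made for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern as a modern RE if extended is
 * nonzero, otherwise as an obsolete (basic) RE, and return 1 if it
 * matches somewhere in text, 0 if not, -1 if it fails to compile. */
int re_matches(const char *pattern, int extended, const char *text)
{
    regex_t re;
    int rc;
    int cflags = REG_NOSUB | (extended ? REG_EXTENDED : 0);

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

.PP
As an obsolete RE, "^a+$" matches only the two-character string "a+",
whereas as a modern RE it matches "aa".
The bound in the obsolete RE written in C as "^a\e\e{2\e\e}$" matches "aa",
while "^a{2}$" compiled the same way matches the literal text "a{2}".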
.PP
Finally, there is one new type of atom, a \fIback reference\fR:
\(aq\e\(aq followed by a nonzero decimal digit \fId\fR
matches the same sequence of characters
matched by the \fId\fRth parenthesized subexpression
(numbering subexpressions by the positions of their opening parentheses,
left to right),
so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
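.PP
A back reference can be used, for instance, to look for a doubled
character.
The sketch below compiles an obsolete RE consisting of a one-character
subexpression followed by a reference to it; the function name
.I first_doubled
is hypothetical and chosen only for this illustration.

```c
#include <regex.h>

/* Hypothetical helper: return the byte offset of the first character
 * in text that is immediately followed by a copy of itself, -1 if
 * there is none, or -2 if the pattern fails to compile. */
long first_doubled(const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    /* Obsolete RE: a parenthesized "." captures one character and the
     * back reference 1 requires the same character again. */
    if (regcomp(&re, "\\(.\\)\\1", 0) != 0)
        return -2;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    return rc == 0 ? (long)m[0].rm_so : -1;
}
```

.PP
For "abccd" the function returns 2, the offset of the first "c";
for "abc" it returns \-1.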
.SH BUGS
Having two kinds of REs is a botch.
.PP
The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
the absence of an unmatched \(aq(\(aq;
this was an unintentional result of a wording error,
and change is likely.
Avoid relying on it.
.PP
Back references are a dreadful botch,
posing major problems for efficient implementations.
They are also somewhat vaguely defined
(does
"\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?).
Avoid using them.
.PP
POSIX.2's specification of case-independent matching is vague.
The "one case implies all cases" definition given above
is current consensus among implementors as to the right interpretation.
.\" As per http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=295666
.\" The following does not seem to apply in the glibc implementation
.\" .PP
.\" The syntax for word boundaries is incredibly ugly.
.SH AUTHOR
.\" Sigh... The page license means we must have the author's name
.\" in the formatted output.
This page was taken from Henry Spencer's regex package.
.SH SEE ALSO
.BR grep (1),
.BR regex (3)
.PP
POSIX.2, section 2.8 (Regular Expression Notation).