   1
   2 =head1 NAME
   3
   4 Bit::Vector::String - Generic string import/export for Bit::Vector
   5
   6 =head1 SYNOPSIS
   7
   8   use Bit::Vector::String;
   9
  10   to_Oct
  11       $string = $vector->to_Oct();
  12
  13   from_Oct
  14       $vector->from_Oct($string);
  15
  16   new_Oct
  17       $vector = Bit::Vector->new_Oct($bits,$string);
  18
  19   String_Export
  20       $string = $vector->String_Export($type);
  21
  22   String_Import
  23       $type = $vector->String_Import($string);
  24
  25   new_String
  26       $vector = Bit::Vector->new_String($bits,$string);
  27       ($vector,$type) = Bit::Vector->new_String($bits,$string);
  28
  29 =head1 DESCRIPTION
  30
  31 =over 2
  32
  33 =item *
  34
  35 C<$string = $vector-E<gt>to_Oct();>
  36
  37 Returns an octal string representing the given bit vector.
  38
  39 Note that this method is not particularly efficient, since it
  40 is almost completely realized in Perl, and moreover internally
  41 operates on a Perl list of individual octal digits which it
  42 concatenates into the final string using "C<join('', ...)>".
  43
  44 A benchmark reveals that this method is about 40 times slower
  45 than the method "C<to_Bin()>" (which is realized in C):
  46
  47  Benchmark: timing 10000 iterations of to_Bin, to_Hex, to_Oct...
  48      to_Bin:  1 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU)
  49      to_Hex:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU)
  50      to_Oct: 40 wallclock secs (40.16 usr +  0.05 sys = 40.21 CPU)
  51
  52 Note that since an octal digit is always worth three bits,
  53 the resulting string always represents a multiple of three
  54 bits, regardless of the true length (in bits) of the given
  55 bit vector.
  56
  57 Also note that the B<LEAST> significant octal digit is
  58 located at the B<RIGHT> end of the resulting string, and
  59 the B<MOST> significant digit at the B<LEFT> end.
  60
  61 Finally, note that this method does B<NOT> prepend any uniquely
  62 identifying format prefix (such as "0o") to the resulting string
  63 (which means that the result of this method only contains valid
  64 octal digits, i.e., [0-7]).
  65
  66 However, such a prefix can easily be prepended as needed,
  67 as follows:
  68
  69   $string = '0o' . $vector->to_Oct();
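
The digit grouping described above can be illustrated without the module. The following is only a sketch in plain Perl: the helper "bits_to_oct()" is hypothetical (it is not part of Bit::Vector::String) and works on a string of "0" and "1" characters, most significant bit first, instead of a real bit vector.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bit::Vector::String: converts a string
# of '0'/'1' characters (most significant bit first) to octal digits.
sub bits_to_oct {
    my ($bits) = @_;
    # Pad on the left so that the length becomes a multiple of three.
    my $pad = (3 - length($bits) % 3) % 3;
    $bits = ('0' x $pad) . $bits;
    my @digits;
    # Each group of three bits yields exactly one octal digit.
    while ($bits =~ /([01]{3})/g) {
        push @digits, oct("0b$1");
    }
    return join('', @digits);
}

print bits_to_oct('10111'), "\n";          # five bits become two digits
print '0o' . bits_to_oct('10111'), "\n";   # same digits with a prefix added
```

Five bits are padded to six before grouping, which is why the result always stands for a multiple of three bits.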
  70
  71 =item *
  72
  73 C<$vector-E<gt>from_Oct($string);>
  74
  75 Allows you to read in the contents of a bit vector from an octal string,
  76 such as returned by the method "C<to_Oct()>" (see above).
  77
  78 Note that this method is not particularly efficient, since it is
  79 almost completely realized in Perl, and moreover chops the input
  80 string into individual characters using "C<split(//, $string)>".
  81
  82 Remember also that the least significant bits are always to the
  83 right of an octal string, and the most significant bits to the left.
  84 Therefore, the string is actually reversed internally before storing
  85 it in the given bit vector using the method "C<Chunk_List_Store()>",
  86 which expects the least significant chunks of data at the beginning
  87 of a list.
  88
  89 A benchmark reveals that this method is about 40 times slower than
  90 the method "C<from_Bin()>" (which is realized in C):
  91
  92  Benchmark: timing 10000 iterations of from_Bin, from_Hex, from_Oct...
  93    from_Bin:  1 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU)
  94    from_Hex:  1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU)
  95    from_Oct: 46 wallclock secs (44.95 usr +  0.00 sys = 44.95 CPU)
  96
  97 If the given string contains any character which is not an octal digit
  98 (i.e., [0-7]), a fatal syntax error ensues ("unknown string type").
  99
 100 Note especially that this method does B<NOT> accept any uniquely
 101 identifying format prefix (such as "0o") in the given string; the
 102 presence of such a prefix will also lead to the fatal "unknown
 103 string type" error.
 104
 105 If the given string contains fewer octal digits than are needed to
 106 completely fill the given bit vector, the remaining (most significant)
 107 bits all remain cleared (i.e., set to zero).
 108
 109 This also means that, even if the given string does not contain
 110 enough digits to completely fill the given bit vector, the previous
 111 contents of the bit vector are erased completely.
 112
 113 If the given string is longer than needed to fill the given bit
 114 vector, the superfluous characters are simply ignored.
 115
 116 This behaviour is intentional so that you may read in the string
 117 representing one bit vector into another bit vector of different
 118 size, i.e., as much of it as will fit.
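
The rules above (validation, reversal, zero fill and truncation) can be modelled in plain Perl. This is a sketch of the documented behaviour, not the module's code: "oct_into_bits()" is a hypothetical helper that returns the resulting contents as a string of "0" and "1" characters, most significant bit first.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical model of the documented behaviour, NOT the module's code.
# Returns a '0'/'1' string of exactly $bits characters, MSB first.
sub oct_into_bits {
    my ($bits, $string) = @_;
    # Anything but octal digits (including a "0o" prefix) is rejected.
    croak("unknown string type") unless $string =~ /\A[0-7]+\z/;

    # Walk the digits from least to most significant, emitting the
    # three bits of each digit least significant bit first.
    my $lsb_first = '';
    for my $digit (reverse split(//, $string)) {
        $lsb_first .= scalar reverse(sprintf('%03b', $digit));
    }

    # Too few digits: the upper bits stay zero.
    # Too many digits: the superfluous upper bits are dropped.
    $lsb_first .= '0' x $bits;
    return scalar reverse(substr($lsb_first, 0, $bits));
}

print oct_into_bits(8, '17'),  "\n";   # short string, upper bits cleared
print oct_into_bits(4, '377'), "\n";   # long string, only the low bits fit
```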
 119
 120 =item *
 121
 122 C<$vector = Bit::Vector-E<gt>new_Oct($bits,$string);>
 123
 124 This method is an alternative constructor which allows you to create
 125 a new bit vector object (with "C<$bits>" bits) and to initialize it
 126 all in one go.
 127
 128 The method internally first calls the bit vector constructor method
 129 "C<new()>" and then stores the given string in the newly created
 130 bit vector using the same approach as the method "C<from_Oct()>"
 131 (described above).
 132
 133 Note that this approach is not particularly efficient, since it
 134 is almost completely realized in Perl, and moreover chops the input
 135 string into individual characters using "C<split(//, $string)>".
 136
 137 An exception will be raised if the necessary memory cannot be allocated
 138 (see the description of the method "C<new()>" in L<Bit::Vector(3)> for
 139 possible causes) or if the given string cannot be converted successfully
 140 (see the description of the method "C<from_Oct()>" above for details).
 141
 142 Note especially that this method does B<NOT> accept any uniquely
 143 identifying format prefix (such as "0o") in the given string and that
 144 such a prefix will lead to a fatal "unknown string type" error.
 145
 146 In case of an error, the memory occupied by the new bit vector is
 147 released again before the exception is actually thrown.
 148
 149 If the number of bits "C<$bits>" given has the value "C<undef>",
 150 the method will automatically allocate a bit vector with a size
 151 (i.e., number of bits) of three times the length of the given string
 152 (since every octal digit is worth three bits).
 153
 154 Note that this behaviour is different from that of the methods
 155 "C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>"
 156 (which are realized in C, internally); these methods will silently
 157 assume a value of 0 bits if "C<undef>" is given (and may warn
 158 about the "Use of uninitialized value" if warnings are enabled).
 159
 160 =item *
 161
 162 C<$string = $vector-E<gt>String_Export($type);>
 163
 164 Returns a string representing the given bit vector in the
 165 format specified by "C<$type>":
 166
 167   1 | b | bin      =>  binary        (using "to_Bin()")
 168   2 | o | oct      =>  octal         (using "to_Oct()")
 169   3 | d | dec      =>  decimal       (using "to_Dec()")
 170   4 | h | hex | x  =>  hexadecimal   (using "to_Hex()")
 171   5 | e | enum     =>  enumeration   (using "to_Enum()")
 172   6 | p | pack     =>  packed binary (using "Block_Read()")
 173
 174 The case (lower/upper/mixed case) of "C<$type>" is ignored.
 175
 176 If "C<$type>" is omitted or "C<undef>" or false ("0"
 177 or the empty string), a hexadecimal string is returned
 178 as the default format.
 179
 180 If "C<$type>" does not have any of the values described
 181 above, a fatal "unknown string type" error will occur.
 182
 183 Beware that in order to guarantee that the strings can
 184 be correctly parsed and read in by the methods
 185 "C<String_Import()>" and "C<new_String()>" (described
 186 below), the method "C<String_Export()>" provides
 187 uniquely identifying prefixes (and, in one case,
 188 a suffix) as follows:
 189
 190   1 | b | bin      =>  '0b' . $vector->to_Bin();
 191   2 | o | oct      =>  '0o' . $vector->to_Oct();
 192   3 | d | dec      =>         $vector->to_Dec(); # prefix is [+-]
 193   4 | h | hex | x  =>  '0x' . $vector->to_Hex();
 194   5 | e | enum     =>  '{'  . $vector->to_Enum() . '}';
 195   6 | p | pack     =>  ':'  . $vector->Size() .
 196                        ':'  . $vector->Block_Read();
 197
 198 This is necessary because certain strings can be valid
 199 representations in more than one format.
 200
 201 All strings in binary format, i.e., which only contain "0"
 202 and "1", are also valid number representations (of a different
 203 value, of course) in octal, decimal and hexadecimal.
 204
 205 Likewise, a string in octal format is also valid in decimal
 206 and hexadecimal, and a string in decimal format is also valid
 207 in hexadecimal.
 208
 209 Moreover, if the enumeration of set bits (as returned by
 210 "C<to_Enum()>") only contains one element, this element could
 211 be mistaken for a representation of the entire bit vector
 212 (instead of just one bit) in decimal.
 213
 214 Beware also that the string returned by format "6" ("packed
 215 binary") will in general B<NOT BE PRINTABLE>, because it will
 216 usually consist of many unprintable characters!
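
The type aliases and the prefix table can be combined into one small lookup. The sketch below is plain Perl and only mirrors the tables in this section; "add_prefix()" and "%FORMAT" are hypothetical names, and the caller is assumed to pass a string already produced by the matching "to_*()" method (plus the vector size for packed binary).

```perl
use strict;
use warnings;
use Carp qw(croak);

# Every accepted spelling of $type, mapped to its format number.
my %FORMAT = (
    1 => 1, b => 1, bin  => 1,
    2 => 2, o => 2, oct  => 2,
    3 => 3, d => 3, dec  => 3,
    4 => 4, h => 4, hex  => 4, x => 4,
    5 => 5, e => 5, enum => 5,
    6 => 6, p => 6, pack => 6,
);

# Hypothetical helper, NOT part of the module: wraps an already
# converted string $raw the way the table above describes.
sub add_prefix {
    my ($type, $raw, $size) = @_;
    # Omitted, undef or false types select hexadecimal; case is ignored.
    my $format = $type ? $FORMAT{lc $type} : 4;
    croak("unknown string type") unless defined $format;
    return '0b' . $raw       if $format == 1;
    return '0o' . $raw       if $format == 2;
    return $raw              if $format == 3;   # nothing is prepended
    return '0x' . $raw       if $format == 4;
    return '{' . $raw . '}'  if $format == 5;
    return ':' . $size . ':' . $raw;            # packed binary
}

print add_prefix('OCT', '17'), "\n";
print add_prefix(undef, '1F'), "\n";
```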
 217
 218 =item *
 219
 220 C<$type = $vector-E<gt>String_Import($string);>
 221
 222 Allows you to read in the contents of a bit vector from a string
 223 which has previously been produced by "C<String_Export()>",
 224 "C<to_Bin()>", "C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>",
 225 "C<to_Enum()>", "C<Block_Read()>" or manually or by another
 226 program.
 227
 228 Beware however that the string must have the correct format;
 229 otherwise a fatal "unknown string type" error will occur.
 230
 231 The correct format is the one returned by "C<String_Export()>"
 232 (see immediately above).
 233
 234 The method will also try to automatically recognize formats
 235 without an identifying prefix, such as those returned by the methods
 236 "C<to_Bin()>", "C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>"
 237 and "C<to_Enum()>".
 238
 239 However, as explained above for the method "C<String_Export()>",
 240 due to the fact that a string may be a valid representation in
 241 more than one format, this may lead to unwanted results.
 242
 243 The method will try to match the format of the given string
 244 in the following order:
 245
 246 If the string consists only of [01], it will be considered
 247 to be in binary format (although it could be in octal, decimal
 248 or hexadecimal format or even be an enumeration with only
 249 one element as well).
 250
 251 If the string consists only of [0-7], it will be considered
 252 to be in octal format (although it could be in decimal or
 253 hexadecimal format or even be an enumeration with only
 254 one element as well).
 255
 256 If the string consists only of [0-9], it will be considered
 257 to be in decimal format (although it could be in hexadecimal
 258 format or even be an enumeration with only one element as well).
 259
 260 If the string consists only of [0-9A-Fa-f], it will be considered
 261 to be in hexadecimal format.
 262
 263 If the string only contains numbers in decimal format, separated
 264 by commas (",") or dashes ("-"), it is considered to be an
 265 enumeration (a single decimal number also qualifies).
 266
 267 And if the string starts with ":[0-9]+:", the remainder of the
 268 string is read in with "C<Block_Store()>".
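
The matching order above can be approximated with a chain of regular expressions. This is a sketch under assumptions: "guess_format()" is a hypothetical helper, the patterns are written from the description in this document rather than taken from the module's source, and the optional prefixes are the ones the module is documented to produce or accept.

```perl
use strict;
use warnings;

# Hypothetical classifier, NOT the module's code. The patterns are an
# approximation derived from the prose above. Returns the format number
# (1..6) or undef if nothing matches.
sub guess_format {
    my ($string) = @_;
    return 1 if $string =~ /\A(?:0b)?[01]+\z/i;           # binary
    return 2 if $string =~ /\A(?:0o)?[0-7]+\z/i;          # octal
    return 3 if $string =~ /\A[+-]?[0-9]+\z/;             # decimal
    return 4 if $string =~ /\A(?:0[hx])?[0-9a-f]+\z/i;    # hexadecimal
    # Decimal numbers or ranges separated by commas, optionally bracketed
    # (this simple pattern does not check that the brackets match).
    return 5 if $string =~
        /\A[(\[{<]?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*[)\]}>]?\z/;
    return 6 if $string =~ /\A:[0-9]+:/;                  # packed binary
    return;
}

# "17" is claimed by the octal rule before decimal is even tried.
print guess_format('17'), "\n";
# An explicit sign removes the ambiguity.
print guess_format('+17'), "\n";
```

Because the first matching rule wins, an unprefixed "17" is read as octal 17 (decimal 15), which is exactly the kind of misinterpretation an explicit prefix prevents.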
 269
 270 To avoid misinterpretations, it is therefore advisable to
 271 always either use the method "C<String_Export()>" or to provide
 272 some uniquely identifying prefix (and suffix, in one case)
 273 yourself:
 274
 275   binary         =>  '0b' . $string;
 276   octal          =>  '0o' . $string;
 277   decimal        =>  '+'  . $string; # in case "$string"
 278                  =>  '-'  . $string; # has no sign yet
 279   hexadecimal    =>  '0x' . $string;
 280                  =>  '0h' . $string;
 281   enumeration    =>  '{'  . $string . '}';
 282                  =>  '['  . $string . ']';
 283                  =>  '<'  . $string . '>';
 284                  =>  '('  . $string . ')';
 285   packed binary  =>  ':'  . $vector->Size() .
 286                      ':'  . $vector->Block_Read();
 287
 288 Note that case (lower/upper/mixed case) is not important
 289 and will be ignored by this method.
 290
 291 Internally, the method uses the methods "C<from_Bin()>",
 292 "C<from_Oct()>", "C<from_Dec()>", "C<from_Hex()>",
 293 "C<from_Enum()>" and "C<Block_Store()>" for actually
 294 importing the contents of the string into the given
 295 bit vector. See their descriptions here in this document
 296 and in L<Bit::Vector(3)> for any further conditions that
 297 must be met and corresponding possible fatal error messages.
 298
 299 The method returns the number of the format that has been
 300 recognized:
 301
 302                 1    =>    binary
 303                 2    =>    octal
 304                 3    =>    decimal
 305                 4    =>    hexadecimal
 306                 5    =>    enumeration
 307                 6    =>    packed binary
 308
 309 =item *
 310
 311 C<$vector = Bit::Vector-E<gt>new_String($bits,$string);>
 312
 313 C<($vector,$type) = Bit::Vector-E<gt>new_String($bits,$string);>
 314
 315 This method is an alternative constructor which allows you to create
 316 a new bit vector object (with "C<$bits>" bits) and to initialize it
 317 all in one go.
 318
 319 The method internally first calls the bit vector constructor method
 320 "C<new()>" and then stores the given string in the newly created
 321 bit vector using the same approach as the method "C<String_Import()>"
 322 (described immediately above).
 323
 324 An exception will be raised if the necessary memory cannot be allocated
 325 (see the description of the method "C<new()>" in L<Bit::Vector(3)> for
 326 possible causes) or if the given string cannot be converted successfully
 327 (see the description of the method "C<String_Import()>" above for details).
 328
 329 In case of an error, the memory occupied by the new bit vector is
 330 released again before the exception is actually thrown.
 331
 332 If the number of bits "C<$bits>" given has the value "C<undef>", the
 333 method will automatically determine this value for you and allocate
 334 a bit vector of the calculated size.
 335
 336 Note that this behaviour is different from that of the methods
 337 "C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>"
 338 (which are realized in C, internally); these methods will silently
 339 assume a value of 0 bits if "C<undef>" is given (and may warn
 340 about the "Use of uninitialized value" if warnings are enabled).
 341
 342 The necessary number of bits is calculated as follows:
 343
 344   binary         =>       length($string);
 345   octal          =>   3 * length($string);
 346   decimal        =>  int( length($string) * log(10) / log(2) + 1 );
 347   hexadecimal    =>   4 * length($string);
 348   enumeration    =>  maximum of values found in $string + 1
 349   packed binary  =>  $1 after $string =~ /^:(\d+):/;
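
The table above translates directly into a small function. The following is a sketch in plain Perl, not the module's code: "bits_needed()" is a hypothetical helper, and it assumes that the string has already been stripped of its prefix or sign for formats 1 to 5, while format 6 is given the full ":size:data" string.

```perl
use strict;
use warnings;
use Carp qw(croak);
use List::Util qw(max);

# Hypothetical helper, NOT part of the module. $format is the format
# number (1..6); $payload is the string without prefix or sign, except
# for format 6, which takes the full ":size:data" string.
sub bits_needed {
    my ($format, $payload) = @_;
    return     length($payload) if $format == 1;   # one bit per digit
    return 3 * length($payload) if $format == 2;   # three bits per digit
    # Upper bound for decimal: about 3.32 bits per digit, plus one.
    return int(length($payload) * log(10) / log(2) + 1) if $format == 3;
    return 4 * length($payload) if $format == 4;   # four bits per digit
    # Enumeration: the highest index mentioned, plus one.
    return max($payload =~ /([0-9]+)/g) + 1 if $format == 5;
    return $1 if $format == 6 && $payload =~ /^:(\d+):/;
    croak("unknown string type");
}

print bits_needed(2, '377'), "\n";     # three octal digits
print bits_needed(3, '255'), "\n";     # three decimal digits
print bits_needed(5, '2,5-7'), "\n";   # highest index is 7
```

Note that the decimal formula is an estimate based on the number of digits, so it may allocate more bits than the value strictly needs: "255" fits in 8 bits but yields 10.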
 350
 351 If called in scalar context, the method returns the newly created
 352 bit vector object.
 353
 354 If called in list context, the method additionally returns the
 355 number of the format which has been recognized, as explained
 356 above for the method "C<String_Import()>".
 357
 358 =back
 359
 360 =head1 SEE ALSO
 361
 362 Bit::Vector(3), Bit::Vector::Overload(3).
 363
 364 =head1 VERSION
 365
 366 This man page documents "Bit::Vector::String" version 6.4.
 367
 368 =head1 AUTHOR
 369
 370   Steffen Beyer
 371   mailto:sb@engelschall.com
 372   http://www.engelschall.com/u/sb/download/
 373
 374 =head1 COPYRIGHT
 375
 376 Copyright (c) 2004 by Steffen Beyer. All rights reserved.
 377
 378 =head1 LICENSE
 379
 380 This package is free software; you can redistribute it and/or
 381 modify it under the same terms as Perl itself, i.e., under the
 382 terms of the "Artistic License" or the "GNU General Public License".
 383
 384 The C library at the core of this Perl module can additionally
 385 be redistributed and/or modified under the terms of the "GNU
 386 Library General Public License".
 387
 388 Please refer to the files "Artistic.txt", "GNU_GPL.txt" and
 389 "GNU_LGPL.txt" in this distribution for details!
 390
 391 =head1 DISCLAIMER
 392
 393 This package is distributed in the hope that it will be useful,
 394 but WITHOUT ANY WARRANTY; without even the implied warranty of
 395 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 396
 397 See the "GNU General Public License" for more details.
 398