Merge remote-tracking branch 'toybox/master' into HEAD

[android-x86/external-toybox.git] / www / code.html
diff --git a/www/code.html b/www/code.html

old mode 100755 (executable)

new mode 100644 (file)

index a5ffa15..c0566b4
--- a/www/code.html
+++ b/www/code.html
@@ -14,15 +14,9 @@ This helps code auditing and thus reduces bugs. That said, sometimes being
  more explicit is preferable to being clever enough to outsmart yourself:
  don't be so terse your code is unreadable.</p>
  
-<p>Toybox source uses two spaces per indentation level, and wraps at 80
-columns.</p>
-
-<p>Gotos are allowed for error handling, and for breaking out of
-nested loops.  In general, a goto should only jump forward (not back), and
-should either jump to the end of an outer loop, or to error handling code
-at the end of the function.  Goto labels are never indented: they override the
-block structure of the file.  Putting them at the left edge makes them easy
-to spot as overrides to the normal flow of control, which they are.</p>
+<p>Toybox has an actual coding style guide over on
+<a href=design.html#codestyle>the design page</a>, but in general we just
+want the code to be consistent.</p>
  
  <p><h1><a name="building" /><a href="#building">Building Toybox</a></h1></p>
  
@@ -36,6 +30,10 @@ controls which features are included when compiling toybox.</p>
  either isn't complete or is a special-purpose option (such as debugging
  code) that isn't intended for general purpose use.</p>
  
+<p>For a more compact human-editable version of .config files, you can use the
+<a href=http://landley.net/aboriginal/FAQ.html#dev_miniconfig>miniconfig</a>
+format.</p>
+
  <p>The standard build invocation is:</p>
  
  <ul>
@@ -53,8 +51,16 @@ accepts existing definitions of the environment variables, so it may be sourced
  or modified by the developer before building and the definitions exported
  to the environment will take precedence.</p>
  
-<p>(To clarify: "configure" describes the build and installation environment,
-".config" lists the features selected by defconfig/menuconfig.)</p>
+<p>(To clarify: ".config" lists the features selected by defconfig/menuconfig,
+I.E. "what to build", and "configure" describes the build and installation
+environment, I.E. "how to build it".)</p>
+
+<p>By default "make install" puts files in /usr/toybox. Adding this to the
+$PATH is up to you. The environment variable $PREFIX can change the
+install location, ala "PREFIX=/usr/local/bin make install".</p>
+
+<p>If you need an unstripped (debug) version of any of these binaries,
+look in generated/unstripped.</p>
  
  <p><h1><a name="running"><a href="#running">Running a command</a></h1></p>
  
@@ -113,13 +119,14 @@ other global infrastructure.</li>
  multiple commands:</li>
  <ul>
  <li><a href="#lib_lib">lib/lib.c</a></li>
+<li><a href="#lib_xwrap">lib/xwrap.c</a></li>
  <li><a href="#lib_llist">lib/llist.c</a></li>
  <li><a href="#lib_args">lib/args.c</a></li>
  <li><a href="#lib_dirtree">lib/dirtree.c</a></li>
  </ul>
  <li>The <a href="#toys">toys directory</a> contains the C files implementing
-each command. Currently it contains three subdirectories:
-posix, lsb, and other.</li>
+each command. Currently it contains five subdirectories categorizing the
+commands: posix, lsb, other, example, and pending.</li>
  <li>The <a href="#scripts">scripts directory</a> contains the build and
  test infrastructure.</li>
  <li>The <a href="#kconfig">kconfig directory</a> contains the configuration
@@ -130,33 +137,48 @@ files generated from other parts of the source code.</li>
  
  <a name="adding" />
  <p><h1><a href="#adding">Adding a new command</a></h1></p>
-<p>To add a new command to toybox, add a C file implementing that command under
-the toys directory.  No other files need to be modified; the build extracts
-all the information it needs (such as command line arguments) from specially
-formatted comments and macros in the C file.  (See the description of the
-<a href="#generated">"generated" directory</a> for details.)</p>
-
-<p>Currently there are three subdirectories under "toys", one for commands
+<p>To add a new command to toybox, add a C file implementing that command to
+one of the subdirectories under the toys directory.  No other files need to
+be modified; the build extracts all the information it needs (such as command
+line arguments) from specially formatted comments and macros in the C file.
+(See the description of the <a href="#generated">"generated" directory</a>
+for details.)</p>
+
+<p>Currently there are five subdirectories under "toys", one for commands
  defined by the POSIX standard, one for commands defined by the Linux Standard
-Base, and one for all other commands. (This is just for developer convenience
-sorting them, the directories are otherwise functionally identical.)</p>
-
-<p>An easy way to start a new command is copy the file "toys/other/hello.c" to
-the name of the new command, and modify this copy to implement the new command.
-This file is an example command meant to be used as a "skeleton" for
-new commands (more or less by turning every instance of "hello" into the
+Base, an "other" directory for commands not covered by an obvious standard,
+a directory of example commands (templates to use when starting new commands),
+and a "pending" directory of commands that need further review/cleanup
+before moving to one of the other directories (run these at your own risk,
+cleanup patches welcome).
+These directories are just for developer convenience sorting the commands,
+the directories are otherwise functionally identical. To add a new category,
+create the appropriate directory with a README file in it whose first line
+is the description menuconfig should use for the directory.</p>
+
+<p>An easy way to start a new command is to copy the file "toys/example/hello.c"
+to the name of the new command, and modify this copy to implement the new
+command (more or less by turning every instance of "hello" into the
  name of your command, updating the command line arguments, globals, and
-help data,  and then filling out its "main" function with code that does
-something interesting).  It provides examples of all the build infrastructure
-(including optional elements like command line argument parsing and global
-variables that a "hello world" program doesn't strictly need).</p>
+help data, and then filling out its "main" function with code that does
+something interesting).</p> 
+
+<p>You could also start with "toys/example/skeleton.c", which provides a lot
+more example code (showing several variants of command line option
+parsing, how to implement multiple commands in the same file, and so on).
+But usually it's just more stuff to delete.</p>
  
  <p>Here's a checklist of steps to turn hello.c into another command:</p>
  
  <ul>
-<li><p>First "cd toys/other" and "cp hello.c yourcommand.c".  Note that the name
-of this file is significant, it's the name of the new command you're adding
-to toybox.  Open your new file in your favorite editor.</p></li>
+<li><p>First "cp toys/example/hello.c toys/other/yourcommand.c" and open
+the new file in your preferred text editor.</p>
+<ul><li><p>Note that the
+name of the new file is significant: it's the name of the new command you're
+adding to toybox. The build includes all *.c files under toys/*/ whose
+names are a case insensitive match for an enabled config symbol. So
+toys/posix/cat.c only gets included if you have "CAT=y" in ".config".</p></li>
+</ul></p></li>
  
  <li><p>Change the one line comment at the top of the file (currently
  "hello.c - A hello world program") to describe your new file.</p></li>
@@ -174,7 +196,7 @@ structure.  The arguments to the NEWTOY macro are:</p>
  
  <ol>
  <li><p>the name used to run your command</p></li>
-<li><p>the command line argument <a href="#lib_args">option parsing string</a> (NULL if none)</p></li>
+<li><p>the command line argument <a href="#lib_args">option parsing string</a> (0 if none)</p></li>
  <li><p>a bitfield of TOYFLAG values
  (defined in toys.h) providing additional information such as where your
  command should be installed on a running system, whether to blank umask
@@ -222,7 +244,19 @@ correspond to the arguments specified in NEWTOY().
  where execution of your command starts. Your command line options are
  already sorted into this.optflags, this.optargs, this.optc, and the GLOBALS()
  as appropriate by the time this function is called. (See
-<a href="#lib_args">get_optflags()</a> for details.</p></li>
+<a href="#lib_args">get_optflags()</a> for details.)</p></li>
+
+<li><p>Switch on TOYBOX_DEBUG in menuconfig (toybox global settings menu)
+the first time you build and run your new command. If anything is wrong
+with your option string, that will give you error messages.</p>
+
+<p>Otherwise it'll just segfault without
+explanation when it falls off the end because it didn't find a matching
+end parenthesis for a longopt, or you put a nonexistent option in a square
+bracket grouping... Since these kinds of errors can only be caused by a
+developer, not by end users, we don't normally want runtime checks for
+them. Once you're happy with your option string, you can switch TOYBOX_DEBUG
+back off.</p></li>
  </ul>
  
  <a name="headers" /><h2><a href="#headers">Headers.</a></h2>
@@ -382,9 +416,6 @@ the first argument, not the command name.  Use toys.which->name for the command
  name.</p></li>
  <li><p>int <b>optc</b> - Optarg count, equivalent to argc but for
  optargs[].<p></li>
-<li><p>int <b>exithelp</b> - Whether error_exit() should print a usage message
-via help_main() before exiting.  (True during option parsing, defaults to
-false afterwards.)</p></li>
  </ul>
  
  <a name="toy_union" />
@@ -420,10 +451,15 @@ as specified by the options field off this command's toy_list entry.  See
  the get_optargs() description in lib/args.c for details.</p>
  </li>
  
-<li><b>char toybuf[4096]</b> - a common scratch space buffer so
-commands don't need to allocate their own.  Any command is free to use this,
-and it should never be directly referenced by functions in lib/ (although
-commands are free to pass toybuf in to a library function as an argument).</li>
+<li><b>char toybuf[4096]</b> - a common scratch space buffer guaranteed
+to start zeroed, so commands don't need to allocate/initialize their own.
+Any command is free to use this, and it should never be directly referenced
+by functions in lib/ (although commands are free to pass toybuf in to a
+library function as an argument).</li>
+
+<li><b>char libbuf[4096]</b> - like toybuf, but for use by common code in
+lib/*.c. Commands should never directly reference libbuf, and library
+code should never directly reference toybuf.</li>
  </ul>
  
  <p>The following functions are defined in main.c:</p>
@@ -481,8 +517,9 @@ instructions</a>.</p>
  which is for files generated at build time from other source files.</p>
  
  <ul>
-<li><p><b>generated/Config.in</b> - Included from the top level Config.in,
-contains one or more configuration entries for each command.</p>
+<li><p><b>generated/Config.in</b> - Kconfig entries for each command, included
+from the top level Config.in. The help text here is used to generate
+help.h.</p>
  
  <p>Each command has a configuration entry with an upper case version of
  the command name. Options to commands start with the command
@@ -490,16 +527,10 @@ name followed by an underscore and the option name. Global options are attached
  to the "toybox" command, and thus use the prefix "TOYBOX_".  This organization
  is used by scripts/cfg2files to select which toys/*/*.c files to compile for a
  given .config.</p>
-
-<p>A command with multiple names (or multiple similar commands implemented in
-the same .c file) should have config symbols prefixed with the name of their
-C file. I.E. config symbol prefixes are NEWTOY() names. If OLDTOY() names
-have config symbols they must be options (symbols with an underscore and
-suffix) to the NEWTOY() name. (See generated/toylist.h)</p>
  </li>
  
  <li><p><b>generated/config.h</b> - list of CFG_SYMBOL and USE_SYMBOL() macros,
-generated from .config by a sed invocation in the top level Makefile.</p>
+generated from .config by a sed invocation in scripts/make.sh.</p>
  
  <p>CFG_SYMBOL is a compile time constant set to 1 for enabled symbols and 0 for
  disabled symbols. This allows the use of normal if() statements to remove
@@ -510,12 +541,15 @@ breaks. (See the 1992 Usenix paper
  <a href=http://doc.cat-v.org/henry_spencer/ifdef_considered_harmful.pdf>#ifdef
  Considered Harmful</a> for more information.)</p>
  
-<p>USE_SYMBOL(code) evaluates to the code in parentheses when the symbol
-is enabled, and nothing when the symbol is disabled. This can be used
-for things like varargs or variable declarations which can't always be
-eliminated by a simple test on CFG_SYMBOL. Note that
-(unlike CFG_SYMBOL) this is really just a variant of #ifdef, and can
-still result in configuration dependent build breaks. Use with caution.</p>
+<p>When you can't entirely avoid an #ifdef, the USE_SYMBOL(code) macro
+provides a less intrusive alternative, evaluating to the code in parentheses
+when the symbol is enabled, and nothing when the symbol is disabled. This
+is most commonly used around NEWTOY() declarations (so only the enabled
+commands show up in toy_list), and in option strings. This can also be used
+for things like varargs or structure members which can't always be
+eliminated by a simple test on CFG_SYMBOL. Remember, unlike CFG_SYMBOL
+this is really just a variant of #ifdef, and can still result in configuration
+dependent build breaks. Use with caution.</p>
  </li>
  
  <li><p><b>generated/flags.h</b> - FLAG_? macros indicating which command
@@ -548,10 +582,10 @@ variables out of "this" as TT.variablename.</p>
  lib/args.c argument parsing code called from main.c.</p>
  </li>
  
-<li><p><b>toys/help.h</b> -
-#defines two help text strings for each command: a single line
-command_help and an additinal command_help_long.  This is used by help_main()
-in toys/help.c to display help for commands.</p>
+<li><p><b>toys/help.h</b> - Help strings for use by the "help" command and
+--help options. This file #defines a help_symbolname string for each
+symbolname, but only the symbolnames matching command names get used
+by show_help() in lib/help.c to display help for commands.</p>
  
  <p>This file is created by scripts/make.sh, which compiles scripts/config2help.c
  into the binary generated/config2help, and then runs it against the top
@@ -565,17 +599,19 @@ have their help text added to the command they depend on.</p>
  </li>
  
  <li><p><b>generated/newtoys.h</b> - 
-All the NEWTOY() and OLDTOY() macros in alphabetical order,
-each of which should be inside the appropriate USE_ macro. (Ok, not _quite_
-alphabetical orer: the "toybox" multiplexer is always the first entry.)</p>
+All the NEWTOY() and OLDTOY() macros from toys/*/*.c. The "toybox" multiplexer
+is the first entry, the rest are in alphabetical order. Each line should be
+inside an appropriate USE_ macro, so code that #includes this file only sees
+the currently enabled commands.</p>
  
  <p>By #defining NEWTOY() to various things before #including this file,
  it may be used to create function prototypes (in toys.h), initialize the
-toy_list array (in main.c, the alphabetical order lets toy_find() do a
-binary search), initialize the help_data array (in lib/help.c), and so on.
-(It's even used to initialize the NEED_OPTIONS macro, which is has a 1 or 0
-for each command using command line option parsing, ORed together.
-This allows compile-time dead code elimination to remove the whole of
+help_data array (in lib/help.c),  initialize the toy_list array (in main.c,
+the alphabetical order lets toy_find() do a binary search, the exception to
+the alphabetical order lets it use the multiplexer without searching), and so
+on.  (It's even used to initialize the NEED_OPTIONS macro: each command
+contributes a 1 or 0 depending on whether it uses command line option parsing,
+and these are ORed together so compile-time dead code elimination can remove the whole of
  lib/args.c if nothing currently enabled is using it.)<p>
  
  <p>Each NEWTOY and OLDTOY macro contains the command name, command line
@@ -600,6 +636,178 @@ having to repeat it.</p>
  strlcpy(), xexec(), xopen()/xread(), xgetcwd(), xabspath(), find_in_path(),
  itoa().</p>
  
+
+
+<a name="lib_xwrap"><h3>lib/xwrap.c</h3>
+
+<p>Functions prefixed with the letter x call perror_exit() when they hit
+errors, to eliminate common error checking. This prints an error message
+and the strerror() string for the errno encountered.</p>
+
+<p>We replaced exit(), _exit(), and atexit() with xexit(), _xexit(), and
+sigatexit(). This gives _xexit() the option to siglongjmp(toys.rebound, 1)
+instead of exiting, lets xexit() report stdout flush failures to stderr
+and change the exit code to indicate error, lets our toys.exit function
+change happen for signal exit paths and lets us remove the functions
+after we've called them.</p>
+
+<p>You can intercept our exit by assigning a setjmp/longjmp buffer to
+toys.rebound (set it back to zero to restore the default behavior).
+If you do this, cleaning up resource leaks is your problem.</p>
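The exit interception described above can be sketched in isolation. This is a minimal illustration, not toybox's code: the globals rebound and exitval are hypothetical stand-ins for toys.rebound and toys.exitval, and sketch_xexit(), failing_command(), quiet_command(), and run_intercepted() are names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <unistd.h>

/* Hypothetical stand-ins for toys.rebound and toys.exitval. */
sigjmp_buf *rebound;
int exitval;

/* Sketch of the _xexit() behavior described above: jump back to the
   interceptor if one is installed, otherwise really exit. */
void sketch_xexit(void)
{
  if (rebound) siglongjmp(*rebound, 1);
  _exit(exitval);
}

/* Example "commands": one bails out through the exit path, one returns. */
void failing_command(void)
{
  exitval = 3;
  sketch_xexit();
}

void quiet_command(void)
{
  exitval = 0;
}

/* Run fn with exits intercepted. Returns 1 if fn tried to exit, 0 if it
   returned normally. Restores the default behavior either way. */
int run_intercepted(void (*fn)(void))
{
  sigjmp_buf env;
  int intercepted = 0;

  assert(fn);
  rebound = &env;
  if (sigsetjmp(env, 1)) intercepted = 1;
  else fn();
  rebound = 0;

  return intercepted;
}
```

Calling run_intercepted(failing_command) returns 1 and leaves exitval at 3 instead of ending the process. Anything the command allocated before jumping is still allocated, which is the resource leak caveat mentioned above.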
+
+<ul>
+<li><b>void xstrncpy(char *dest, char *src, size_t size)</b></li>
+<li><p><b>void _xexit(void)</b></p>
+<p>Calls siglongjmp(toys.rebound, 1), or else _exit(toys.exitval). This
+lets you ignore errors with the NO_EXIT() macro wrapper, or intercept
+them with WOULD_EXIT().</p>
+<li><p><b>void xexit(void)</b></p>
+<p>Calls toys.xexit functions (if any) and flushes stdout/stderr (reporting
+failure to write to stdout both to stderr and in the exit code), then
+calls _xexit().</p>
+</li>
+<li><b>void *xmalloc(size_t size)</b></li>
+<li><b>void *xzalloc(size_t size)</b></li>
+<li><b>void *xrealloc(void *ptr, size_t size)</b></li>
+<li><b>char *xstrndup(char *s, size_t n)</b></li>
+<li><b>char *xstrdup(char *s)</b></li>
+<li><b>char *xmprintf(char *format, ...)</b></li>
+<li><b>void xprintf(char *format, ...)</b></li>
+<li><b>void xputs(char *s)</b></li>
+<li><b>void xputc(char c)</b></li>
+<li><b>void xflush(void)</b></li>
+<li><b>pid_t xfork(void)</b></li>
+<li><b>void xexec_optargs(int skip)</b></li>
+<li><b>void xexec(char **argv)</b></li>
+<li><b>pid_t xpopen(char **argv, int *pipes)</b></li>
+<li><b>int xpclose(pid_t pid, int *pipes)</b></li>
+<li><b>void xaccess(char *path, int flags)</b></li>
+<li><b>void xunlink(char *path)</b></li>
+<li><p><b>int xcreate(char *path, int flags, int mode)<br />
+int xopen(char *path, int flags)</b></p>
+
+<p>The xopen() and xcreate() functions respectively open an existing file
+(exiting if it's not there) and create a new file (exiting if it can't).</p>
+
+<p>They default to O_CLOEXEC so the filehandles aren't passed on to child
+processes. The flag's meaning is inverted here: pass O_CLOEXEC in flags to
+disable this.</p>
+</li>
+<li><p><b>void xclose(int fd)</b></p>
+
+<p>Because NFS is broken, and won't necessarily perform the requested
+operation (and report the error) until you close the file. Of course, this
+being NFS, it's not guaranteed to report the error there either, but it
+_can_.</p>
+
+<p>Nothing else ever reports an error on close, everywhere else it's just a
+VFS operation freeing some resources. NFS is _special_, in a way that
+other network filesystems like smbfs and v9fs aren't.</p>
+</li>
+<li><b>int xdup(int fd)</b></li>
+<li><p><b>size_t xread(int fd, void *buf, size_t len)</b></p>
+
+<p>Can return 0, but not -1.</p>
+</li>
+<li><p><b>void xreadall(int fd, void *buf, size_t len)</b></p>
+
+<p>Reads the entire len-sized buffer, retrying to complete short
+reads. Exits if it can't get enough data.</p></li>
+
+<li><p><b>void xwrite(int fd, void *buf, size_t len)</b></p>
+
+<p>Retries short writes, exits if can't write the entire buffer.</p></li>
+
+<li><b>off_t xlseek(int fd, off_t offset, int whence)</b></li>
+<li><b>char *xgetcwd(void)</b></li>
+<li><b>void xstat(char *path, struct stat *st)</b></li>
+<li><p><b>char *xabspath(char *path, int exact) </b></p>
+
+<p>After several years of
+<a href=http://landley.net/notes-2007.html#18-06-2007>wrestling</a>
+<a href=http://landley.net/notes-2008.html#19-01-2008>with</a> realpath(), 
+I broke down and <a href=http://landley.net/notes-2012.html#20-11-2012>wrote
+my own</a> implementation that doesn't use the one in libc. As I explained:
+
+<blockquote><p>If the path ends with a broken link,
+readlink -f should show where the link points to, not where the broken link
+lives. (The point of readlink -f is "if I write here, where would it attempt
+to create a file".) The problem is, realpath() returns NULL for a path ending
+with a broken link, and I can't beat different behavior out of code locked
+away in libc.</p></blockquote>
+
+<p>
+</li>
+<li><b>void xchdir(char *path)</b></li>
+<li><b>void xchroot(char *path)</b></li>
+
+<li><p><b>struct passwd *xgetpwuid(uid_t uid)<br />
+struct group *xgetgrgid(gid_t gid)<br />
+struct passwd *xgetpwnam(char *name)</b></p>
+
+<p></p>
+</li>
+
+
+
+<li><b>void xsetuser(struct passwd *pwd)</b></li>
+<li><b>char *xreadlink(char *name)</b></li>
+<li><b>char *xreadfile(char *name, char *buf, off_t len)</b></li>
+<li><b>int xioctl(int fd, int request, void *data)</b></li>
+<li><b>void xpidfile(char *name)</b></li>
+<li><b>void xsendfile(int in, int out)</b></li>
+<li><b>long xparsetime(char *arg, long units, long *fraction)</b></li>
+<li><b>void xregcomp(regex_t *preg, char *regex, int cflags)</b></li>
+</ul>
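Two of the conventions in the list above (the inverted O_CLOEXEC default, and retrying short reads until the buffer is full) can be sketched with plain POSIX calls. This is an illustration under assumptions, not the contents of lib/xwrap.c: sketch_xopen() and sketch_xreadall() are hypothetical names, the error messages are invented, and retrying on EINTR is a choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Open an existing file with close-on-exec ON by default: XOR flips the
   bit, so a caller passing O_CLOEXEC turns it back off. Exits on failure,
   following the x prefix convention. */
int sketch_xopen(const char *path, int flags)
{
  int fd;

  assert(path);
  fd = open(path, flags ^ O_CLOEXEC);
  if (fd == -1) {
    perror(path);
    exit(1);
  }

  return fd;
}

/* Read exactly len bytes, retrying short reads. Exits if the data runs
   out or read() fails. */
void sketch_xreadall(int fd, void *buf, size_t len)
{
  size_t done = 0;

  while (done < len) {
    ssize_t n = read(fd, (char *)buf + done, len - done);

    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) {
      fprintf(stderr, "short read\n");
      exit(1);
    }
    done += (size_t)n;
  }
}
```

Because the failure paths never return, callers can use the results without checking them, which is the point of the x prefix.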
+
+<a name="lib_lib"><h3>lib/lib.c</h3>
+<p>Eight gazillion common functions:</p>
+
+<ul>
+<li><b>void verror_msg(char *msg, int err, va_list va)</b></li>
+<li><b>void error_msg(char *msg, ...)</b></li>
+<li><b>void perror_msg(char *msg, ...)</b></li>
+<li><b>void error_exit(char *msg, ...)</b></li>
+<li><b>void perror_exit(char *msg, ...)</b></li>
+<li><b>ssize_t readall(int fd, void *buf, size_t len)</b></li>
+<li><b>ssize_t writeall(int fd, void *buf, size_t len)</b></li>
+<li><b>off_t lskip(int fd, off_t offset)</b></li>
+<li><b>int mkpathat(int atfd, char *dir, mode_t lastmode, int flags)</b></li>
+<li><b>struct string_list **splitpath(char *path, struct string_list **list)</b></li>
+<li><b>struct string_list *find_in_path(char *path, char *filename)</b></li>
+<li><b>long atolx(char *numstr)</b></li>
+<li><b>long atolx_range(char *numstr, long low, long high)</b></li>
+<li><b>int numlen(long l)</b></li>
+<li><b>int stridx(char *haystack, char needle)</b></li>
+<li><b>int strstart(char **a, char *b)</b></li>
+<li><b>off_t fdlength(int fd)</b></li>
+<li><b>char *readfile(char *name, char *ibuf, off_t len)</b></li>
+<li><b>void msleep(long miliseconds)</b></li>
+<li><b>int64_t peek_le(void *ptr, unsigned size)</b></li>
+<li><b>int64_t peek_be(void *ptr, unsigned size)</b></li>
+<li><b>int64_t peek(void *ptr, unsigned size)</b></li>
+<li><b>void poke(void *ptr, uint64_t val, int size)</b></li>
+<li><b>void loopfiles_rw(char **argv, int flags, int permissions, int failok,</b></li>
+<li><b>void loopfiles(char **argv, void (*function)(int fd, char *name))</b></li>
+<li><b>char *get_rawline(int fd, long *plen, char end)</b></li>
+<li><b>char *get_line(int fd)</b></li>
+<li><b>int wfchmodat(int fd, char *name, mode_t mode)</b></li>
+<li><b>static void tempfile_handler(int i)</b></li>
+<li><b>int copy_tempfile(int fdin, char *name, char **tempname)</b></li>
+<li><b>void delete_tempfile(int fdin, int fdout, char **tempname)</b></li>
+<li><b>void replace_tempfile(int fdin, int fdout, char **tempname)</b></li>
+<li><b>void crc_init(unsigned int *crc_table, int little_endian)</b></li>
+<li><b>int terminal_size(unsigned *xx, unsigned *yy)</b></li>
+<li><b>int yesno(char *prompt, int def)</b></li>
+<li><b>void generic_signal(int sig)</b></li>
+<li><b>void sigatexit(void *handler)</b></li>
+<li><b>int sig_to_num(char *pidstr)</b></li>
+<li><b>char *num_to_sig(int sig)</b></li>
+<li><b>mode_t string_to_mode(char *modestr, mode_t mode)</b></li>
+<li><b>void mode_to_string(mode_t mode, char *buf)</b></li>
+<li><b>void names_to_pid(char **names, int (*callback)(pid_t pid, char *name))</b></li>
+<li><b>int human_readable(char *buf, unsigned long long num)</b></li>
+</ul>
+
  <h3>lib/portability.h</h3>
  
  <p>This file is automatically included from the top of toys.h, and smooths
@@ -710,20 +918,21 @@ a double_list, dlist_add() your entries, and then break the circle with
  <a name="lib_args"><h3>lib/args.c</h3>
  
  <p>Toybox's main.c automatically parses command line options before calling the
-command's main function.  Option parsing starts in get_optflags(), which stores
+command's main function. Option parsing starts in get_optflags(), which stores
  results in the global structures "toys" (optflags and optargs) and "this".</p>
  
  <p>The option parsing infrastructure stores a bitfield in toys.optflags to
-indicate which options the current command line contained.  Arguments
+indicate which options the current command line contained, and defines FLAG
+macros code can use to check whether each argument's bit is set. Arguments
  attached to those options are saved into the command's global structure
-("this").  Any remaining command line arguments are collected together into
-the null-terminated array toys.optargs, with the length in toys.optc.  (Note
+("this"). Any remaining command line arguments are collected together into
+the null-terminated array toys.optargs, with the length in toys.optc. (Note
  that toys.optargs does not contain the current command name at position zero,
-use "toys.which->name" for that.)  The raw command line arguments get_optflags()
+use "toys.which->name" for that.) The raw command line arguments get_optflags()
  parsed are retained unmodified in toys.argv[].</p>
  
  <p>Toybox's option parsing logic is controlled by an "optflags" string, using
-a format reminiscent of getopt's optargs but has several important differences.
+a format reminiscent of getopt's optargs but with several important differences.
  Toybox does not use the getopt()
  function out of the C library, get_optflags() is an independent implementation
  which doesn't permute the original arguments (and thus doesn't change how the
@@ -737,14 +946,14 @@ command line arguments to look for, and what to do with them.
  If a command has no option
  definition string (I.E. the argument is NULL), option parsing is skipped
  for that command, which must look at the raw data in toys.argv to parse its
-own arguments.  (If no currently enabled command uses option parsing,
+own arguments. (If no currently enabled command uses option parsing,
  get_optflags() is optimized out of the resulting binary by the compiler's
  --gc-sections option.)</p>
  
  <p>You don't have to free the option strings, which point into the environment
-space (I.E. the string data is not copied).  A TOYFLAG_NOFORK command
+space (I.E. the string data is not copied). A TOYFLAG_NOFORK command
  that uses the linked list type "*" should free the list objects but not
-the data they point to, via "llist_free(TT.mylist, NULL);".  (If it's not
+the data they point to, via "llist_free(TT.mylist, NULL);". (If it's not
  NOFORK, exit() will free all the malloced data anyway unless you want
  to implement a CONFIG_TOYBOX_FREE cleanup for it.)</p>
  
@@ -765,7 +974,7 @@ available to command_main():
  <ul>
  <li><p>In <b>struct toys</b>:
  <ul>
-<li>toys.optflags = 13; // -a = 8 | -b = 4 | -d = 1</li>
+<li>toys.optflags = 13; // FLAG_a = 8 | FLAG_b = 4 | FLAG_d = 1</li>
  <li>toys.optargs[0] = "walrus"; // leftover argument</li>
  <li>toys.optargs[1] = NULL; // end of list</li>
  <li>toys.optc = 1; // there was 1 leftover argument</li>
@@ -791,6 +1000,7 @@ GLOBALS(
         long a;
  )
  </pre></blockquote>
+
  <p>That would mean TT.c == NULL, TT.b == "fruit", and TT.a == 42.  (Remember,
  each entry that receives an argument must be a long or pointer, to line up
  with the array position.  Right to left in the optflags string corresponds to
@@ -809,19 +1019,39 @@ toys.optflags, with the same value as a corresponding binary digit.  The
  rightmost argument is (1<<0), the next to last is (1<<1) and so on.  If
  the option isn't encountered while parsing argv[], its bit remains 0.</p>
  
+<p>Each option -x has a FLAG_x macro for the command letter. Bare --longopts
+with no corresponding short option have a FLAG_longopt macro for the long
+optionname. Commands enable these macros by #defining FOR_commandname before
+#including <toys.h>. When multiple commands are implemented in the same
+source file, you can switch flag contexts later in the file by
+#defining CLEANUP_oldcommand and #defining FOR_newcommand, then
+#including <generated/flags.h>.</p>
+
+<p>Options disabled in the current configuration (wrapped in
+a USE_BLAH() macro for a CONFIG_BLAH that's switched off) have their
+corresponding FLAG macro set to zero, so code checking them ala
+if (toys.optflags & FLAG_x) gets optimized out via dead code elimination.
+#defining FORCE_FLAGS when switching flag context disables this
+behavior: the flag is never zero even if the config is disabled. This
+allows code shared between multiple commands to use the same flag
+values, as long as the common flags match up right to left in both option
+strings.</p>
+
  <p>For example,
  the optflags string "abcd" would parse the command line argument "-c" to set
  optflags to 2, "-a" would set optflags to 8, "-bd" would set optflags to
-6 (I.E. 4|2), and "-a -c" would set optflags to 10 (2|8).</p>
+5 (I.E. 4|1), and "-a -c" would set optflags to 10 (2|8). To check if -c
+was encountered, code could test: if (toys.optflags & FLAG_c) printf("yup");
+(See the toys/example directory for more.)</p>
  
  <p>Only letters are relevant to optflags, punctuation is skipped: in the
-string "a*b:c#d", d=1, c=2, b=4, a=8.  The punctuation after a letter
+string "a*b:c#d", d=1, c=2, b=4, a=8. The punctuation after a letter
usually indicates that the option takes an argument.</p>
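The right-to-left bit assignment can be checked with a few lines of C. This is a simplified sketch, not the parser in lib/args.c: flag_bit() is a hypothetical helper that only understands letters followed by punctuation, and ignores longopts and bracket groups.

```c
#include <assert.h>
#include <ctype.h>

/* Return the optflags bit for short option `opt` in `optstr`, or 0 if the
   option isn't in the string. Punctuation is skipped, only letters count. */
unsigned flag_bit(const char *optstr, char opt)
{
  unsigned bit = 0;
  const char *s;

  assert(optstr);
  /* Walk left to right: every later letter pushes the match one bit up,
     so the rightmost letter ends up at 1<<0. */
  for (s = optstr; *s; s++) {
    if (!isalpha((unsigned char)*s)) continue;
    bit <<= 1;
    if (*s == opt) bit = 1;
  }

  return bit;
}
```

With this, flag_bit("abcd", 'c') is 2 and flag_bit("a*b:c#d", 'a') is 8, matching the values in the text.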
  
-<p>Since toys.optflags is an unsigned int, it only stores 32 bits.  (Which is
+<p>Since toys.optflags is an unsigned int, it only stores 32 bits. (Which is
  the amount a long would have on 32-bit platforms anyway; 64 bit code on
  32 bit platforms is too expensive to require in common code used by almost
-all commands.)  Bit positions beyond the 1<<31 aren't recorded, but
+all commands.) Bit positions beyond the 1<<31 aren't recorded, but
  parsing higher options can still set global variables.</p>
  
  <p><b>Automatically setting global variables from arguments (union this)</b></p>
@@ -843,15 +1073,6 @@ argument letter, indicating the option takes an additional argument:</p>
  </ul>
  </ul>
  
-<p>A note about "." and CFG_TOYBOX_FLOAT: option parsing only understands <>=
-after . when CFG_TOYBOX_FLOAT
-is enabled. (Otherwise the code to determine where floating point constants
-end drops out; it requires floating point).  When disabled, it can reserve a
-global data slot for the argument (so offsets won't change in your
-GLOBALS[] block), but will never fill it out. You can handle
-this by using the USE_BLAH() macros with C string concatenation, ala:
-"abc." USE_TOYBOX_FLOAT("<1.23>4.56=7.89") "def"</p>
-
  <p><b>GLOBALS</b></p>
  
  <p>Options which have an argument fill in the corresponding slot in the global
@@ -866,7 +1087,7 @@ in the same order they're declared, and that padding won't be inserted between
  consecutive variables of register size.  Thus the first few entries can
  be longs or pointers corresponding to the saved arguments.</p>
  
-<p>See toys/other/hello.c for a longer example of parsing options into the
+<p>See toys/example/*.c for longer examples of parsing options into the
  GLOBALS block.</p>
  
  <p><b>char *toys.optargs[]</b></p>
@@ -920,7 +1141,7 @@ optflag, but letters are never control characters.)</p>
  <p>Option parsing only understands <>= after . when CFG_TOYBOX_FLOAT
  is enabled. (Otherwise the code to determine where floating point constants
  end drops out.  When disabled, it can reserve a global data slot for the
-argument so offsets won't change, but will never fill it out.). You can handle
+argument so offsets won't change, but will never fill it out.) You can handle
  this by using the USE_BLAH() macros with C string concatenation, ala:</p>
  
  <blockquote>"abc." USE_TOYBOX_FLOAT("<1.23>4.56=7.89") "def"</blockquote>
@@ -928,13 +1149,13 @@ this by using the USE_BLAH() macros with C string concatenation, ala:</p>
  <p><b>--longopts</b></p>
  
  <p>The optflags string can contain long options, which are enclosed in
-parentheses.  They may be appended to an existing option character, in
+parentheses. They may be appended to an existing option character, in
  which case the --longopt is a synonym for that option, ala "a:(--fred)"
  which understands "-a blah" or "--fred blah" as synonyms.</p>
  
  <p>Longopts may also appear before any other options in the optflags string,
  in which case they have no corresponding short argument, but instead set
-their own bit based on position.  So for "(walrus)#(blah)xy:z" "command
+their own bit based on position. So for "(walrus)#(blah)xy:z", "command
  --walrus 42" would set toys.optflags = 16 (-z = 1, -y = 2, -x = 4, --blah = 8)
  and would assign this[1] = 42;</p>
  
@@ -942,6 +1163,17 @@ and would assign this[1] = 42;</p>
  each "bare longopt" (ala "(one)(two)abc" before any option characters)
  always sets its own bit (although you can group them with +X).</p>
  
+<p>Only bare longopts have a FLAG_ macro with the longopt name
+(ala --fred would #define FLAG_fred). Other longopts use the short
+option's FLAG macro to test the toys.optflags bit.</p>
+
+<p>Options with a semicolon ";" after their data type can only set their
+corresponding GLOBALS() entry via "--longopt=value". For example, option
+string "x(boing): y" would set TT.x if it saw "--boing=value", but would
+treat "--boing value" as setting FLAG_x in toys.optflags, leaving TT.x NULL,
+and keeping "value" in toys.optargs[]. (This lets "ls --color" and
+"ls --color=auto" both work.)</p>
+
  <p><b>[groups]</b></p>
  
  <p>At the end of the option string, square bracket groups can define
@@ -993,16 +1225,32 @@ of functions.</p>
  
  <p>These functions do not call chdir() or rely on PATH_MAX. Instead they
  use openat() and friends, using one filehandle per directory level to
-recurseinto subdirectories. (I.E. they can descend 1000 directories deep
+recurse into subdirectories. (I.E. they can descend 1000 directories deep
  if setrlimit(RLIMIT_NOFILE) allows enough open filehandles, and the default
  in /proc/self/limits is generally 1024.)</p>
  
+<p>There are two main ways to use dirtree: 1) assemble a tree of nodes
+representing a snapshot of directory state and traverse them using the
+->next and ->child pointers, or 2) traverse the tree calling a callback
+function on each entry, and freeing its node afterwards. (You can also
+combine the two, using the callback as a filter to determine which nodes
+to keep.)</p>
+
  <p>The basic dirtree functions are:</p>
  
  <ul>
-<li><p><b>dirtree_read(char *path, int (*callback)(struct dirtree node))</b> -
-recursively read directories, either applying callback() or returning
-a tree of struct dirtree if callback is NULL.</p></li>
+<li><p><b>struct dirtree *dirtree_read(char *path, int (*callback)(struct
+dirtree *node))</b> - recursively read files and directories, calling
+callback() on each, and returning a tree of saved nodes (if any).
+If path doesn't exist, returns DIRTREE_ABORTVAL. If callback is NULL,
+returns a single node at that path.</p></li>
+
+<li><p><b>dirtree_notdotdot(struct dirtree *new)</b> - standard callback
+which discards "." and ".." entries and returns DIRTREE_SAVE|DIRTREE_RECURSE
+for everything else. Used directly, this assembles a snapshot tree of
+the contents of this directory and its subdirectories
+to be processed after dirtree_read() returns (by traversing the
+struct dirtree's ->next and ->child pointers from the returned root node).</p></li>
  
  <li><p><b>dirtree_path(struct dirtree *node, int *plen)</b> - malloc() a
  string containing the path from the root of this tree to this node. If
@@ -1010,21 +1258,21 @@ plen isn't NULL then *plen is how many extra bytes to malloc at the end
  of string.</p></li>
  
  <li><p><b>dirtree_parentfd(struct dirtree *node)</b> - return fd of
-containing directory, for use with openat() and such.</p></li>
+directory containing this node, for use with openat() and such.</p></li>
  </ul>
  
-<p>The <b>dirtree_read()</b> function takes two arguments, a starting path for
-the root of the tree, and a callback function. The callback takes a
-<b>struct dirtree *</b> (from lib/lib.h) as its argument. If the callback is
-NULL, the traversal uses a default callback (dirtree_notdotdot()) which
-recursively assembles a tree of struct dirtree nodes for all files under
-this directory and subdirectories (filtering out "." and ".." entries),
-after which dirtree_read() returns the pointer to the root node of this
-snapshot tree.</p>
+<p>The <b>dirtree_read()</b> function is the standard way to start
+directory traversal. It takes two arguments: a starting path for
+the root of the tree, and a callback function. The callback() is called
+on each directory entry; its argument is a fully populated
+<b>struct dirtree *</b> (from lib/lib.h) describing the node, and its
+return value tells the dirtree infrastructure what to do next.</p>
  
-<p>Otherwise the callback() is called on each entry in the directory,
-with struct dirtree * as its argument. This includes the initial
-node created by dirtree_read() at the top of the tree.</p>
+<p>(There's also a three argument version,
+<b>dirtree_flagread(char *path, int flags, int (*callback)(struct
+dirtree *node))</b>, which lets you apply flags like DIRTREE_SYMFOLLOW and
+DIRTREE_SHUTUP to reading the top node, but this only affects the top node.
+Child nodes use the flags returned by callback().)</p>
  
  <p><b>struct dirtree</b></p>
  
@@ -1032,12 +1280,13 @@ node created by dirtree_read() at the top of the tree.</p>
  st</b> entries describing a file, plus a <b>char *symlink</b>
  which is NULL for non-symlinks.</p>
  
-<p>During a callback function, the <b>int data</b> field of directory nodes
-contains a dirfd (for use with the openat() family of functions). This is
-generally used by calling dirtree_parentfd() on the callback's node argument.
-For symlinks, data contains the length of the symlink string. On the second
-callback from DIRTREE_COMEAGAIN (depth-first traversal) data = -1 for
-all nodes (that's how you can tell it's the second callback).</p>
+<p>During a callback function, the <b>int dirfd</b> field of directory nodes
+contains a directory file descriptor (for use with the openat() family of
+functions). This isn't usually used directly; instead call dirtree_parentfd()
+on the callback's node argument. The <b>char again</b> field is 0 for the
+first callback on a node, and 1 on the second callback (triggered by returning
+DIRTREE_COMEAGAIN on a directory, made after all children have been processed).
+</p>
  
  <p>Users of this code may put anything they like into the <b>long extra</b>
  field. For example, "cp" and "mv" use this to store a dirfd for the destination
@@ -1061,15 +1310,17 @@ return DIRTREE_ABORT from parent callbacks too.)</p></li>
  <li><p><b>DIRTREE_RECURSE</b> - Examine directory contents. Ignored for
  non-directory entries. The remaining flags only take effect when
  recursing into the children of a directory.</p></li>
-<li><p><b>DIRTREE_COMEAGAIN</b> - Call the callback a second time after
-examining all directory contents, allowing depth-first traversal.
-On the second call, dirtree->data = -1.</p></li>
+<li><p><b>DIRTREE_COMEAGAIN</b> - Call the callback on this node a second time
+after examining all directory contents, allowing depth-first traversal.
+On the second call, dirtree->again is nonzero.</p></li>
  <li><p><b>DIRTREE_SYMFOLLOW</b> - follow symlinks when populating children's
  <b>struct stat st</b> (by feeding a nonzero value to the symfollow argument of
  dirtree_add_node()), which means DIRTREE_RECURSE treats symlinks to
  directories as directories. (Avoiding infinite recursion is the callback's
  problem: the non-NULL dirtree->symlink can still distinguish between
-them.)</p></li>
+them. The "find" command follows ->parent up the tree to the root node
+each time, checking to make sure that stat's dev and inode pair don't
+match any ancestors.)</p></li>
  </ul>
  
  <p>Each struct dirtree contains three pointers (next, parent, and child)
@@ -1094,15 +1345,15 @@ single malloc() (even char *symlink points to memory at the end of the node),
  so llist_free() works but its callback must descend into child nodes (freeing
  a tree, not just a linked list), plus whatever the user stored in extra.</p>
  
-<p>The <b>dirtree_read</b>() function is a simple wrapper, calling <b>dirtree_add_node</b>()
+<p>The <b>dirtree_flagread</b>() function is a simple wrapper, calling <b>dirtree_add_node</b>()
  to create a root node relative to the current directory, then calling
-<b>handle_callback</b>() on that node (which recurses as instructed by the callback
-return flags). Some commands (such as chgrp) bypass this wrapper, for example
-to control whether or not to follow symlinks to the root node; symlinks
+<b>dirtree_handle_callback</b>() on that node (which recurses as instructed by the callback
+return flags). The flags argument primarily lets you
+control whether or not to follow symlinks to the root node; symlinks
  listed on the command line are often treated differently than symlinks
-encountered during recursive directory traversal).
+encountered during recursive directory traversal.
  
-<p>The ls command not only bypasses the wrapper, but never returns
+<p>The ls command not only bypasses this wrapper, but never returns
  <b>DIRTREE_RECURSE</b> from the callback, instead calling <b>dirtree_recurse</b>() manually
  from elsewhere in the program. This gives ls -lR manual control
  of traversal order, which is neither depth first nor breadth first but
@@ -1116,16 +1367,23 @@ self-contained file. Adding a new command involves adding a single
  file, and removing a command involves removing that file. Commands use
  shared infrastructure from the lib/ and generated/ directories.</p>
  
-<p>Currently there are three subdirectories under "toys/" containing commands
-described in POSIX-2008, the Linux Standard Base 4.1, or "other". The only
-difference this makes is which menu the command shows up in during "make
-menuconfig", the directories are otherwise identical. Note that they commands
-exist within a single namespace at runtime, so you can't have the same
-command in multiple subdirectories.</p>
+<p>Currently there are five subdirectories under "toys/" containing "posix"
+commands described in POSIX-2008, "lsb" commands described in the Linux
+Standard Base 4.1, "other" commands not described by either standard,
+"pending" commands awaiting cleanup (which default to "n" in menuconfig
+because they don't necessarily work right yet), and "example" code showing
+how toybox infrastructure works and providing template/skeleton files to
+start new commands.</p>
+
+<p>The only difference directory location makes is which menu the command
+shows up in during "make menuconfig"; the directories are otherwise identical.
+Note that the commands exist within a single namespace at runtime, so you can't
+have the same command in multiple subdirectories. (The build tries to fail
+informatively when you do that.)</p>
  
-<p>(There are actually four sub-menus in "make menuconfig", the fourth
-contains global configuration options for toybox, and lives in Config.in at
-the top level.)</p>
+<p>There is one more sub-menu in "make menuconfig" containing global
+configuration options for toybox. This menu is defined in the top level
+Config.in.</p>
  
  <p>See <a href="#adding">adding a new command</a> for details on the
  layout of a command file.</p>
@@ -1152,28 +1410,15 @@ Makefile.
  <p>Menuconfig infrastructure copied from the Linux kernel.  See the
  Linux kernel's Documentation/kbuild/kconfig-language.txt</p>
  
-<a name="generated">
-<h2>Directory generated/</h2>
-
-<p>All the files in this directory except the README are generated by the
-build.  (See scripts/make.sh)</p>
-
-<ul>
-<li><p><b>config.h</b> - CFG_COMMAND and USE_COMMAND() macros set by menuconfig via .config.</p></li>
-
-<li><p><b>Config.in</b> - Kconfig entries for each command.  Included by top level Config.in.  The help text in here is used to generated help.h</p></li>
+<!-- todo
  
-<li><p><b>help.h</b> - Help text strings for use by "help" command.  Building
-this file requires python on the host system, so the prebuilt file is shipped
-in the build tarball to avoid requiring python to build toybox.</p></li>
+Better OLDTOY and multiple command explanation. From Config.in:
  
-<li><p><b>newtoys.h</b> - List of NEWTOY() or OLDTOY() macros for all available
-commands.  Associates command_main() functions with command names, provides
-option string for command line parsing (<a href="#lib_args">see lib/args.c</a>),
-specifies where to install each command and whether toysh should fork before
-calling it.</p></li>
-</ul>
+<p>A command with multiple names (or multiple similar commands implemented in
+the same .c file) should have config symbols prefixed with the name of their
+C file. I.E. config symbol prefixes are NEWTOY() names. If OLDTOY() names
+have config symbols they must be options (symbols with an underscore and
+suffix) to the NEWTOY() name. (See generated/toylist.h)</p>
+-->
  
-<p>Everything in this directory is a derivative file produced from something
-else.  The entire directory is deleted by "make distclean".</p>
  <!--#include file="footer.html" -->
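
Note on the option-string hunk earlier in this diff: the text says that for the string "a*b:c#d" the flag bits come out as d=1, c=2, b=4, a=8, meaning bits are assigned right to left. The following is a minimal sketch of that numbering only, not toybox's actual parser. The helper name short_opt_bit() is hypothetical, and it assumes a simplified option string containing just single-letter options and trailing punctuation (no longopts in parentheses, no [groups], no numeric ranges).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

// Hypothetical helper (not part of toybox): return the toys.optflags bit
// that short option `opt` would get in a simplified option string made of
// letters plus punctuation. Returns 0 if the letter is not present.
unsigned short_opt_bit(const char *optstr, char opt)
{
  unsigned bit = 1;
  const char *s;

  assert(optstr);

  // Walk right to left: the last letter gets bit 1, the one before it 2,
  // and so on. Punctuation such as * : # only describes the argument type
  // and does not consume a bit position.
  for (s = optstr + strlen(optstr); s != optstr; ) {
    s--;
    if (!isalpha((unsigned char)*s)) continue;
    if (*s == opt) return bit;
    // Unsigned shift wraps to 0 past 1<<31, so letters beyond the 32nd
    // simply report no bit in this sketch.
    bit <<= 1;
  }

  return 0;
}
```

Calling short_opt_bit("a*b:c#d", 'b') yields 4 under these assumptions; a command would test the corresponding bit in toys.optflags.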