.\"
.\" Author: Paul Jackson (http://oss.sgi.com/projects/cpusets)
.\"
+.\" %%%LICENSE_START(GPLv2_MISC)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License
.\" version 2 as published by the Free Software Foundation.
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
-.\" License along with this manual; if not, write to the Free
-.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
-.\" MA 02111, USA.
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
.\"
-.TH CPUSET 7 2008-11-12 "Linux" "Linux Programmer's Manual"
+.TH CPUSET 7 2013-02-12 "Linux" "Linux Programmer's Manual"
.SH NAME
cpuset \- confine processes to processor and memory node subsets
.SH DESCRIPTION
See the \fBNotify On Release\fR section, below.
.\" ====================== cpus ======================
.TP
-.I cpus
+.I cpuset.cpus
List of the physical numbers of the CPUs on which processes
in that cpuset are allowed to execute.
See \fBList Format\fR below for a description of the
file.
.\" ==================== cpu_exclusive ====================
.TP
-.I cpu_exclusive
+.I cpuset.cpu_exclusive
Flag (0 or 1).
If set (1), the cpuset has exclusive use of
its CPUs (no sibling or cousin cpuset may overlap CPUs).
of its parent cpuset.
.\" ====================== mems ======================
.TP
-.I mems
+.I cpuset.mems
List of memory nodes on which processes in this cpuset are
allowed to allocate memory.
See \fBList Format\fR below for a description of the
.IR mems .
.\" ==================== mem_exclusive ====================
.TP
-.I mem_exclusive
+.I cpuset.mem_exclusive
Flag (0 or 1).
If set (1), the cpuset has exclusive use of
its memory nodes (no sibling or cousin may overlap).
of that cpuset's parent cpuset.
.\" ==================== mem_hardwall ====================
.TP
-.IR mem_hardwall " (since Linux 2.6.26)"
+.IR cpuset.mem_hardwall " (since Linux 2.6.26)"
Flag (0 or 1).
If set (1), the cpuset is a \fBHardwall\fR cpuset (see below).
Unlike \fBmem_exclusive\fR,
Newly created cpusets also initially default this to off (0).
.\" ==================== memory_migrate ====================
.TP
-.IR memory_migrate " (since Linux 2.6.16)"
+.IR cpuset.memory_migrate " (since Linux 2.6.16)"
Flag (0 or 1).
If set (1), then memory migration is enabled.
By default this is off (0).
See the \fBMemory Migration\fR section, below.
.\" ==================== memory_pressure ====================
.TP
-.IR memory_pressure " (since Linux 2.6.16)"
+.IR cpuset.memory_pressure " (since Linux 2.6.16)"
A measure of how much memory pressure the processes in this
cpuset are causing.
See the \fBMemory Pressure\fR section, below.
section, below.
.\" ================= memory_pressure_enabled =================
.TP
-.IR memory_pressure_enabled " (since Linux 2.6.16)"
+.IR cpuset.memory_pressure_enabled " (since Linux 2.6.16)"
Flag (0 or 1).
This file is only present in the root cpuset, normally
.IR /dev/cpuset .
\fBMemory Pressure\fR section, below.
.\" ================== memory_spread_page ==================
.TP
-.IR memory_spread_page " (since Linux 2.6.17)"
+.IR cpuset.memory_spread_page " (since Linux 2.6.17)"
Flag (0 or 1).
If set (1), pages in the kernel page cache
(file-system buffers) are uniformly spread across the cpuset.
See the \fBMemory Spread\fR section, below.
.\" ================== memory_spread_slab ==================
.TP
-.IR memory_spread_slab " (since Linux 2.6.17)"
+.IR cpuset.memory_spread_slab " (since Linux 2.6.17)"
Flag (0 or 1).
If set (1), the kernel slab caches
for file I/O (directory and inode structures) are
See the \fBMemory Spread\fR section, below.
.\" ================== sched_load_balance ==================
.TP
-.IR sched_load_balance " (since Linux 2.6.24)"
+.IR cpuset.sched_load_balance " (since Linux 2.6.24)"
Flag (0 or 1).
If set (1, the default) the kernel will
automatically load balance processes in that cpuset over
See \fBScheduler Load Balancing\fR, below, for further details.
.\" ================== sched_relax_domain_level ==================
.TP
-.IR sched_relax_domain_level " (since Linux 2.6.26)"
+.IR cpuset.sched_relax_domain_level " (since Linux 2.6.26)"
Integer, between \-1 and a small positive value.
The
.I sched_relax_domain_level
a process is allowed to use, cpusets provide the following
extended capabilities.
.\" ================== Exclusive Cpusets ==================
-.SS Exclusive Cpusets
+.SS Exclusive cpusets
If a cpuset is marked
.I cpu_exclusive
or
.I hardwall
cpuset.
.\" ================== Notify On Release ==================
-.SS Notify On Release
+.SS Notify on release
If the
.I notify_on_release
flag is enabled (1) in a cpuset,
number 0 or 1 (with optional trailing newline)
into the file, to clear or set the flag, respectively.
.\" ================== Memory Pressure ==================
-.SS Memory Pressure
+.SS Memory pressure
The
.I memory_pressure
of a cpuset provides a simple per-cpuset running average of
what action to take if it detects signs of memory pressure.
.PP
Unless memory pressure calculation is enabled by setting the pseudo-file
-.IR /dev/cpuset/memory_pressure_enabled ,
+.IR /dev/cpuset/cpuset.memory_pressure_enabled ,
it is not computed for any cpuset, and reads from any
.I memory_pressure
always return zero, as represented by the ASCII string "0\en".
will have to be reread from disk.
.PP
The
-.I memory_pressure
+.I cpuset.memory_pressure
file provides an integer number representing the recent (half-life of
10 seconds) rate of entries to the direct reclaim code caused by any
process in the cpuset, in units of reclaims attempted per second,
times 1000.
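.PP
As a minimal sketch of how a monitoring script might interpret this value,
the Python below converts the text read from a memory pressure file back into
reclaims attempted per second by undoing the factor of 1000.
The function names and the caller-supplied path are illustrative assumptions,
not part of the cpuset interface.

```python
def reclaims_per_second(raw):
    """Convert the text of a memory pressure file to reclaims per second."""
    # The file holds an ASCII decimal integer: the rate multiplied by 1000.
    return int(raw.strip()) / 1000.0


def read_memory_pressure(path):
    """Read a memory pressure file at a caller-supplied (hypothetical) path."""
    with open(path) as f:
        return reclaims_per_second(f.read())
```

.PP
With memory pressure calculation disabled the file reads as zero,
so such a helper would report 0.0 for every cpuset.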
.\" ================== Memory Spread ==================
-.SS Memory Spread
+.SS Memory spread
There are two Boolean flag files per cpuset that control where the
kernel allocates pages for the file-system buffers and related
in-kernel data structures.
They are called
-.I memory_spread_page
+.I cpuset.memory_spread_page
and
-.IR memory_spread_slab .
+.IR cpuset.memory_spread_slab .
.PP
If the per-cpuset Boolean flag file
-.I memory_spread_page
+.I cpuset.memory_spread_page
is set, then
the kernel will spread the file-system buffers (page cache) evenly
over all the nodes that the faulting process is allowed to use, instead
of preferring to put those pages on the node where the process is running.
.PP
If the per-cpuset Boolean flag file
-.I memory_spread_slab
+.I cpuset.memory_spread_slab
is set,
then the kernel will spread some file-system-related slab caches,
such as those for inodes and directory entries, evenly over all the nodes
reapplied.
.PP
Both
-.I memory_spread_page
+.I cpuset.memory_spread_page
and
-.I memory_spread_slab
+.I cpuset.memory_spread_slab
are Boolean flag files.
By default they contain "0", meaning that the feature is off
for that cpuset.
especially for jobs that might have just a single
thread initializing or reading in the data set.
.\" ================== Memory Migration ==================
-.SS Memory Migration
+.SS Memory migration
Normally, under the default setting (disabled) of
-.IR memory_migrate ,
+.IR cpuset.memory_migrate ,
once a page is allocated (given a physical page
of main memory) then that page stays on whatever node it
was allocated, so long as it remains allocated, even if the
then the page will be placed on the second valid node of the new cpuset,
if possible.
.\" ================== Scheduler Load Balancing ==================
-.SS Scheduler Load Balancing
+.SS Scheduler load balancing
The kernel scheduler automatically load balances processes.
If one CPU is underutilized,
the kernel will look for processes on other more
.I sched_load_balance
as those processes aren't going anywhere else anyway.
.\" ================== Scheduler Relax Domain Level ==================
-.SS Scheduler Relax Domain Level
+.SS Scheduler relax domain level
The kernel scheduler performs immediate load balancing whenever
a CPU becomes free or another task becomes runnable.
This load
The following formats are used to represent sets of
CPUs and memory nodes.
.\" ================== Mask Format ==================
-.SS Mask Format
-The \fBMask Format\fR is used to represent CPU and memory-node bitmasks
+.SS Mask format
+The \fBMask Format\fR is used to represent CPU and memory-node bit masks
in the
.I /proc/<pid>/status
file.
The hex digits within a word are also in big-endian order.
.PP
The number of 32-bit words displayed is the minimum number needed to
-display all bits of the bitmask, based on the size of the bitmask.
+display all bits of the bit mask, based on the size of the bit mask.
.PP
Examples of the \fBMask Format\fR:
.PP
second for bit 32, the third for bit 16, the fourth for bit 8, the
fifth for bit 4, and the "7" is for bits 2, 1, and 0.
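.PP
The decoding described above can be sketched in a few lines of Python.
This is an illustration only: it assumes the mask is a comma-separated
sequence of 32-bit hexadecimal words with the most significant word first,
and the function name is hypothetical.

```python
def mask_to_bits(mask):
    """Return the sorted bit numbers set in a Mask Format string."""
    bits = []
    words = mask.strip().split(",")
    # Words are assumed most significant first, so walk them in reverse
    # to visit bit 0 before the higher-numbered bits.
    for index, word in enumerate(reversed(words)):
        value = int(word, 16)
        for bit in range(32):
            if value & (1 << bit):
                bits.append(index * 32 + bit)
    return bits
```

.PP
For a three-word mask such as 00000001,00000001,00010117 this sketch
returns bits 0, 1, 2, 4, 8, 16, 32, and 64.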
.\" ================== List Format ==================
-.SS List Format
+.SS List format
The \fBList Format\fR for
.I cpus
and
.SH WARNINGS
.SS Enabling memory_pressure
By default, the per-cpuset file
-.I memory_pressure
+.I cpuset.memory_pressure
always contains zero (0).
Unless this feature is enabled by writing "1" to the pseudo-file
-.IR /dev/cpuset/memory_pressure_enabled ,
+.IR /dev/cpuset/cpuset.memory_pressure_enabled ,
the kernel does
not compute per-cpuset
.IR memory_pressure .
.in +4n
.nf
-echo 19 > mems
+echo 19 > cpuset.mems
.fi
.in
.in +4n
.nf
-/bin/echo 19 > mems
+/bin/echo 19 > cpuset.mems
/bin/echo: write error: Invalid argument
.fi
.in
.B EACCES
Attempted to set, using
.BR write (2),
-.I cpu_exclusive
+.I cpuset.cpu_exclusive
or
-.I mem_exclusive
+.I cpuset.mem_exclusive
on a cpuset whose parent lacks the same setting.
.TP
.B EACCES
Attempted to
.BR write (2)
a
-.I memory_pressure
+.I cpuset.memory_pressure
file.
.TP
.B EACCES
Attempted to
.BR write (2)
an empty
-.I cpus
+.I cpuset.cpus
or
-.I mems
+.I cpuset.mems
list to a cpuset which has attached processes or child cpusets.
.TP
.B EINVAL
Attempted to
.BR write (2)
a
-.I cpus
+.I cpuset.cpus
or
-.I mems
+.I cpuset.mems
list which included a range with the second number smaller than
the first number.
.TP
.B EINVAL
Attempted to
.BR write (2)
a
-.I cpus
+.I cpuset.cpus
or
-.I mems
+.I cpuset.mems
list which included an invalid character in the string.
.TP
.B EINVAL
Attempted to
.BR write (2)
a list to a
-.I cpus
+.I cpuset.cpus
file that did not include any online CPUs.
.TP
.B EINVAL
Attempted to
.BR write (2)
a list to a
-.I mems
+.I cpuset.mems
file that did not include any online memory nodes.
.TP
.B EINVAL
Attempted to
.BR write (2)
a list to a
-.I mems
+.I cpuset.mems
file that included a node that held no memory.
.TP
.B EIO
of a process to a cpuset
.I tasks
file when the cpuset had an empty
-.I cpus
+.I cpuset.cpus
or empty
-.I mems
+.I cpuset.mems
setting.
.TP
.B ENOSPC
Attempted to
.BR write (2)
an empty
-.I cpus
+.I cpuset.cpus
or
-.I mems
+.I cpuset.mems
setting to a cpuset that
has tasks attached.
.TP
.TP
.B ERANGE
Specified a
-.I cpus
+.I cpuset.cpus
or
-.I mems
+.I cpuset.mems
list to the kernel which included a number too large for the kernel
-to set in its bitmasks.
+to set in its bit masks.
.TP
.B ESRCH
Attempted to
.IR pid .
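.PP
Several of the
.B EINVAL
cases above can be caught in user space before calling
.BR write (2).
The following Python is a rough sketch of such a pre-check, assuming a list
is a comma-separated sequence of decimal numbers and ascending ranges.
It is not the kernel's parser, the function name is hypothetical,
and it does not check which CPUs or memory nodes are online.

```python
def parse_list(text):
    """Parse a list such as "16-19" into a sorted list of numbers."""
    text = text.strip()
    if not text:
        # An empty list is syntactically valid; the kernel may still refuse it.
        return []
    numbers = set()
    for item in text.split(","):
        first, dash, last = item.partition("-")
        if not first.isdecimal() or (dash and not last.isdecimal()):
            raise ValueError("invalid character in list item: " + repr(item))
        low = int(first)
        high = int(last) if dash else low
        if high < low:
            raise ValueError("range end smaller than start: " + repr(item))
        numbers.update(range(low, high + 1))
    return sorted(numbers)
```

.PP
A caller would write the original string to the cpuset file only if this
check passes, and should still handle any error the kernel returns.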
.\" ================== BUGS ==================
.SH BUGS
-.I memory_pressure
+.I cpuset.memory_pressure
cpuset files can be opened
for writing, creation, or truncation, but then the
.BR write (2)
.RB "$" " cd /dev/cpuset"
.RB "$" " mkdir Charlie"
.RB "$" " cd Charlie"
-.RB "$" " /bin/echo 2-3 > cpus"
-.RB "$" " /bin/echo 1 > mems"
+.RB "$" " /bin/echo 2-3 > cpuset.cpus"
+.RB "$" " /bin/echo 1 > cpuset.mems"
.RB "$" " /bin/echo $$ > tasks"
# The current shell is now running in cpuset Charlie
# The next line should display '/Charlie'
.RB "$" " cd /dev/cpuset"
.RB "$" " mkdir beta"
.RB "$" " cd beta"
-.RB "$" " /bin/echo 16-19 > cpus"
-.RB "$" " /bin/echo 8-9 > mems"
-.RB "$" " /bin/echo 1 > memory_migrate"
+.RB "$" " /bin/echo 16-19 > cpuset.cpus"
+.RB "$" " /bin/echo 8-9 > cpuset.mems"
+.RB "$" " /bin/echo 1 > cpuset.memory_migrate"
.RB "$" " while read i; do /bin/echo $i; done < ../alpha/tasks > tasks"
.fi
.in
.BR migratepages (8),
.BR numactl (8)
.PP
-The kernel source file
-.IR Documentation/cpusets.txt .
+.IR Documentation/cpusets.txt
+in the Linux kernel source tree