git.osdn.net Git - android-x86/kernel.git/log

perf tools: Always try to build libtraceevent

Although perf depends on the libtraceevent, it cannot know when it needs
to be rebuilt. So just try to rebuild it always in order to make sure we
use the latest version.

While at it, silence annoying directory change messages.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337677434-4881-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Rename libparsevent to libtraceevent in Makefile

Change some variable names according to new library name.

Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337677434-4881-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf script: Rename struct event to struct event_format in perl engine

While migrating to the libtraceevent, the perl scripting engine
missed this structure rename.

This fixes:

     util/scripting-engines/trace-event-perl.c: In function "find_cache_event":
     util/scripting-engines/trace-event-perl.c:244: error: assignment from incompatible pointer type
     util/scripting-engines/trace-event-perl.c:248: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:248: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:250: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c: In function "perl_process_tracepoint":
     util/scripting-engines/trace-event-perl.c:286: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:286: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:307: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c: In function "perl_generate_script":
     util/scripting-engines/trace-event-perl.c:498: error: passing argument 1 of "trace_find_next_event" from incompatible pointer type
     util/scripting-engines/../trace-event.h:56: note: expected "struct event_format *" but argument is of type "struct event *"
     util/scripting-engines/trace-event-perl.c:498: error: assignment from incompatible pointer type
     util/scripting-engines/trace-event-perl.c:499: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:499: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:513: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:532: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:556: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:569: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:570: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:579: error: dereferencing pointer to incomplete type
     util/scripting-engines/trace-event-perl.c:580: error: dereferencing pointer to incomplete type

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/1337697049-30251-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf script: Explicitly handle known default print arg type

Handle the print argument types brought by the new libparsevent in perl
scripting engine.

PRINT_BSTRING and PRINT_DYNAMIC_ARRAY are treated just like strings
and thus don't require specific processing.

But PRINT_FUNC need specific plugins which are not yet handled, lets
warn if we meet this case.

This fixes:

     util/scripting-engines/trace-event-perl.c: In function define_event_symbol:
     util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_BSTRING not handled in switch
     util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_DYNAMIC_ARRAY not handled in switch
     util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_FUNC not handled in switch

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/1337697049-30251-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Add hardcoded name term for pmu events

Adding a new hardcoded term 'name' allowing to specify a name for the
pmu event. The term is defined along with standard pmu terms. If no
'name' term is given, the event name follows following template:

    "raw 0x<perf_event_attr::config>"

running:
    perf stat -e cpu/config=1,name=krava1/u ls

will produce following output:
    ...
    Performance counter stats for 'ls':
                 0 krava1
    ...

running:
    perf stat -e cpu/config=1/u ls

will produce following output:
    ...
    Performance counter stats for 'ls':
                 0 raw 0x1
    ...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337584373-2741-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Separate 'mem:' event scanner bits

Separating 'mem:' scanner processing, so we can parse out modifier
specifically and dont clash with other rules.

This is just precaution for the future, so we dont need to worry about
the rules clashing where we need to parse out any sub-rule of global
rules.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337584373-2741-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Use allocated list for each parsed event

Switch from using static temporary event list into dynamically allocated
one. This way we dont need to pass temp list to the parse_events_parse
which makes the interface more clear.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337584373-2741-4-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Add support for displaying event parser debug info

Adding PARSER_DEBUG Makefile variable to enable building event scanner/
parser with debug enabled. This results in verbose output right out of
the scanner/parser.

It's useful for debuging the event parser. Keeping this only for event
parser so far.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337584373-2741-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf test: Move parse event automated tests to separated object

Moving event parsing specific tests into separated file:

util/parse-events-test.c

Also changing the code a bit to ease running separate tests.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337584373-2741-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

hwmon: (it87) Make temp3 attribute conditional for IT8782F

On IT8782F, temp3 is only supported if UART6 is disabled.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>

hwmon: (it87) Convert to use devm_kzalloc and devm_request_region

This makes the code a bit simpler and smaller.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>

hwmon: INA219 and INA226 support

Add support for the Texas Instruments INA219 and INA226 power monitors.

Signed-off-by: Lothar Felten <l-felten@ti.com>
[guenter.roeck@ericsson.com: formatting cleanup; check for smbus word data;
select PGA=8 for INA219]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>

sh: intc: Kill off special reservation interface.

At present reserving the IRLs in the IRQ bitmap in addition to the
dropping of the legacy IRQ pre-allocation prevent IRL IRQs from being
allocated for the x3proto board.

The only reason to permit reservations was to lock down possible hardware
vectors prior to dynamic IRQ scanning, but this doesn't matter much given
that the hardware controller configuration is sorted before we get around
to doing any dynamic IRQ allocation anyways. Beyond that, all of the
tables are __init annotated, so quite a bit more work would need to be
done to support reconfiguring things like IRL controllers on the fly,
much more than would ever make it worth the hassle.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Merge branches 'upstream-fixes', 'wacom' and 'waltop' into for-linus

Conflicts:
drivers/hid/hid-core.c

Merge branch 'upstream' into for-linus

Conflicts:
drivers/hid/hid-core.c

Merge branches 'device-groups', 'logitech' and 'multitouch' into for-linus

edac, mips: don't change code that has been removed in edac/mips tree

This is a partial revert of

15ed103a9800 ("edac: Fix spelling errors")
6997991ab0db ("mips: Fix printk typos in arc/mips")

which change code that doesn't exist any more in edac/mips trees.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

powerpc: Fix irq distribution

setting CONFIG_IRQ_ALL_CPUS distributes IRQs to CPUs only when
the number of online CPUs equals NR_CPUS. See commit
280ff97494e0fef4124bee5c52e39b23a18dd283 "sparc64: fix and
optimize irq distribution" for more details.

Using the online mask fixes IRQ-to-CPU distribution on systems
that boot with less than NR_CPUS.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Revert "powerpc/hw-breakpoint: Use generic hw-breakpoint interfaces for new PPC ptrace flags"

This reverts commit 1b788400bbcbfe25280dc0b8000d2142bfe3be3b.

It causes oopses when passed incorrect arguments and has a
design fault using IPIs with interrupts disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

powerpc: Fixing a cputhread code documentation

--
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ktest: Add README to explain what is in the examples directory

Add a README that explains what the different example configs in the
ktest example directory are about.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ktest: Add the snowball.conf example config

I used the snowball.conf in a live demo that demonstrated how to use
ktest.pl with a snowball ARM board. I've been asked to included that
config in the ktest repository.

Here it is.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ktest: Add an example config that does cross compiling of several archs

Add the config that I use to test several archs. I downloaded several
cross compilers from:

http://kernel.org/pub/tools/crosstool/files/bin/x86_64/

and this config is an example to crosscompile several archs to make sure
that your changes do not break archs that you are not working on.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ktest: Add kvm.conf example config

Add an example config that explains how to use ktest with a virtual
guest as the target.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ktest: Add useful example configs

I've been asked several times to provide more useful example configs for
ktest.pl, as the sample.conf is too complex (because it explains all
configs). This adds configs broken up by use case, and these configs are
based on actual configs that I use on a daily basis.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ktest: Add USE_OUTPUT_MIN_CONFIG to avoid prompt on make_min_config

If the file that OUTPUT_MIN_CONFIG exists then ktest.pl will prompt the
user and ask them if the OUTPUT_MIN_CONFIG should be used as the
starting point for make_min_config instead of MIN_CONFIG.

This is usually the case, and to allow the user to do so, which is
helpful if the user is creating different min configs based on tests,
and they know one is a superset of another test, they can set
USE_OUTPUT_MIN_CONFIG to one, which will prevent kest.pl from prompting
to use the OUTPUT_MIN_CONFIG and it will just use it.

If USE_OUTPUT_MIN_CONIFG is set to zero, then ktest.pl will continue to
use MIN_CONFIG instead.

The default is that USE_OUTPUT_MIN_CONFIG is undefined.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

unicore32: if there's no handler we need to restore sigmask, syscall or no syscall

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xtensa: add handling of TIF_NOTIFY_RESUME

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

microblaze: drop 'oldset' argument of do_notify_resume()

never used...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

microblaze: handle TIF_NOTIFY_RESUME

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

score: add handling of NOTIFY_RESUME to do_notify_resume()

It's already called if TIF_NOTIFY_RESUME is set, so we only
need to add the actual work. Note that checking for RESTORE_SIGMASK
was not needed - set_restore_sigmask() also sets SIGPENDING, so
we never RESTORE_SIGMASK without SIGPENDING.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

m68k: add TIF_NOTIFY_RESUME and handle it.

TIF_NOTIFY_RESUME added (as bit 5). That way nommu glue needs no changes at
all; mmu one needs just to replace jmi do_signal_return to jne do_signal_return
There we have flags shifted up, until bit 6 (SIGPENDING) is in MSBit; instead
of checking that MSBit is set (jmi) we check that MSBit or something below it
is set (jne); bits 0..4 are never set, so that's precisely "bit 6 or bit 5 is
set".

Usual handling of NOTIFY_RESUME/SIGPENDING is done in do_notify_resume(); glue
calls it instead of do_signal().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fbdev: sh_mobile_lcdc: Don't confuse line size with pitch

When using the MERAM the LCDC line size needs to be programmed with a
MERAM-specific value different than the real frame buffer pitch. Fix it.

Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: stable@vger.kernel.org # for 3.4
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sparc: kill ancient comment in sparc_sigaction()

It used to be true, until 2.1.78 (14 years ago) when we switched to
do_sigaction()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

h8300: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

frv: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cris: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

powerpc: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sh: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sparc: missing checks of __get_user()/__put_user() return values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

avr32: struct old_sigaction is never used

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

m32r: struct old_sigaction is never used

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xtensa: xtensa_sigaction doesn't exist

... and struct old_sigaction never used

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

alpha: tidy signal delivery up

* move force_sigsegv() (from setup...frame()) and clearing RESTART_SIGMASK
(from do_signal()) into hanlde_signal()
* get rid of handle_signal() return value and oldset argument
* checking for TIF_SIGPENDING is enough; set_restart_sigmask() sets this
one as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

score: don't open-code force_sigsegv()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cris: don't open-code force_sigsegv()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

blackfin: don't open-code force_sigsegv()

... especially since we don't have the right k_sigaction here,
so resetting sa_handler won't work.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

avr32: need to clear RESTORE_SIGMASK on successful signal delivery

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

microblaze: bury sys_rt_sigsuspend_wrapper in nommu case too

It's been a dead code since commit 571202f50fad0aeb36661c79de9beed052347df8
Author: Michal Simek <monstr@monstr.eu>
Date:   Fri Dec 11 12:54:04 2009 +0100

    microblaze: Remove rt_sigsuspend wrapper

    Generic rt_sigsuspend syscalls doesn't need any asm wrapper.

but that commit has only removed it from entry.S, missing one in entry-nommu.S.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cris: SA_ONESHOT handling is done by get_signal_to_deliver()

... and resetting sa_handler in local copy filled by get_signal_to_deliver()
is obviously pointless anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

h8300: switch to saved_sigmask-based sigsuspend/rt_sigsuspend

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

h8300: don't change blocked signals' mask if setting frame up fails

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sh: switch to saved_sigmask-based sigsuspend()/rt_sigsuspend()

Complete the move of sh64 to it, trim the crap from prototypes,
tidy up a bit. Infrastructure in do_signal() had already been
there, in signal_64 as well as in signal_32 (where it was already
used).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xtensa: switch to generic rt_sigsuspend(2)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

md/bitmap: record the space available for the bitmap in the superblock.

Now that bitmaps can grow and shrink it is best if we record
how much space is available. This means that when
we reduce the size of the bitmap we won't "lose" the space
for late when we might want to increase the size of the bitmap
again.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Remove extras after reshape to smaller number of devices.

When a reshape which reduced the number of devices finishes
we must remove the extra devices.

So ensure that raid10_remove_disk won't try to keep them, and
have raid10_finish_reshape clear the 'in_sync' flag. Then
remove_and_add_spares will be able to remove them.

Reported-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: improve removal of extra devices after reshape.

After a reshape which reduced the number of devices we need
to disconnect the extra devices.
The code for this doesn't currently handle 'replacement' devices.
It is very unlikely that such devices will be present, but it is
safest to handle them anyway.

So simplify the handling. Just clear In_sync and leave it
to remove_and_add_spaces (which will be called soon) to do
the real works.

Signed-off-by: NeilBrown <neilb@suse.de>

md: check the return of mddev_find()

Check the return of mddev_find(), since it may fail due to out of
memeory or out of usable minor number.

The reason I chose -ENODEV instead of -ENOMEM or something else is
md_alloc() function chose that ;)

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD RAID1: Further conditionalize 'fullsync'

A RAID1 device does not necessarily need a fullsync if the bitmap can be used instead.

Similar to commit d6b212f4b19da5301e6b6eca562e5c7a2a6e8c8d in raid5.c, if a raid1
device can be brought back (i.e. from a transient failure) it shouldn't need a
complete resync. Provided the bitmap is not to old, it will have recorded the areas
of the disk that need recovery.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Use md_error() in place of simply setting Faulty bit

When encountering an error while reading the superblock, call md_error.

We are currently setting the 'Faulty' bit on one of the array devices when an
error is encountered while reading the superblock of a dm-raid array. We should
be calling md_error(), as it handles the error more completely.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Record and handle missing devices

Missing dm-raid devices should be recorded in the superblock

When specifying the devices that compose a DM RAID array, it is possible to denote
failed or missing devices with '-'s.  When this occurs, we must record this in the
superblock.  We do this by checking if the array position's data device is missing
and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
'raid_resume'.  If we do not cause the superblock to be rewritten by the resume
function, it is possible for a stale superblock to be written by an out-going
in-active table (during 'raid_dtr').

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Set recovery flags on resume

Properly initialize MD recovery flags when resuming device-mapper devices.

When a device-mapper device is suspended, all I/O must stop.  This is done by
calling 'md_stop_writes' and 'mddev_suspend'.  These calls in-turn manipulate
the recovery flags - including setting 'MD_RECOVERY_FROZEN'.  The DM device
may have been suspended while recovery was not yet complete, so the process
needs to pick-up where it left off.  Since 'mddev_resume' does not unset
'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
must be set outside of 'mddev_resume' due to how MD handles RAID reshaping.
(e.g.  It is possible for a user to delay reshaping a RAID5->RAID6 by purposefully
setting 'MD_RECOVERY_FROZEN'.  Clearing it in 'mddev_resume' would override the
desired behavior.)

Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.

Also clean up where  level_store calls mddev_resume() - it current
duplicates some of the funcitons of that call. - NB

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Allow reshape while a bitmap is present.

We always should have allowed this. A raid5 reshape doesn't change
the size of the bitmap, so not need to restrict it.

Also add a test to make sure we don't try to start a reshape on a
failed array.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: resize bitmap when required during reshape.

If a reshape changes the size of the array, then we can now
update the bitmap to suit - so do so.

Signed-off-by: NeilBrown <neilb@suse.de>

md: allow array to be resized while bitmap is present.

Now that bitmaps can be resized, we can allow an array to be resized
while the bitmap is present.

This only covers resizing that involves changing the effective size
of member devices, not resizing that changes the number of devices.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make sure reshape request are reflected in superblock.

As a reshape may change the sync_size and/or chunk_size, we need
to update these whenever we write out the bitmap superblock.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: add bitmap_resize function to allow bitmap resizing.

This function will allocate the new data structures and copy
bits across from old to new, allowing for the possibility that the
chunksize has changed.

Use the same function for performing the initial allocation
of the structures. This improves test coverage.

When bitmap_resize is used to resize an existing bitmap, it
only copies '1' bits in, not '0' bits.
So when allocating the bitmap, ensure everything is initialised
to ZERO.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: use DIV_ROUND_UP instead of open-code

Also take the opportunity to simplify CHUNK_BLOCK_RATIO.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap'

The new "struct bitmap_counts" contains all the fields that are
related to counting the number of active writes in each bitmap chunk.

Having this separate will make it easier to change the chunksize
or overall size of a bitmap atomically.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make bitmap bitops atomic.

This allows us to remove spinlock protection which is
more heavy-weight than simple atomics.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make _page_attr bitops atomic.

Using e.g. set_bit instead of __set_bit and using test_and_clear_bit
allow us to remove some locking and contract other locked ranges.

It is rare that we set or clear a lot of these bits, so gain should
outweigh any cost.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: merge bitmap_file_unmap and bitmap_file_put.

There functions really do one thing together: release the
'bitmap_storage'. So make them just one function.

Since we removed the locking (previous patch), we don't need to zero
any fields before freeing them, so it all becomes a bit simpler.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: remove async freeing of bitmap file.

There is no real value in freeing things the moment there is an error.
It is just as good to free the bitmap file and pages when the bitmap
is explicitly removed (and replaced?) or at shutdown.

With this gone, the bitmap will only disappear when the array is
quiescent, so we can remove some locking.

As the 'filemap' doesn't disappear now, include extra checks before
trying to write any of it out.
Also remove the check for "has it disappeared" in
bitmap_daemon_write().

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: convert some spin_lock_irqsave to spin_lock_irq

All of these sites can only be called from process context with
irqs enabled, so using irqsave/irqrestore just adds noise.
Remove it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: use set_bit, test_bit, etc for operation on bitmap->flags.

We currently use '&' and '|' which isn't the norm in the kernel
and doesn't allow easy atomicity.
So change to bit numbers and {set,clear,test}_bit.
This allows us to remove a spinlock/unlock (which was dubious anyway)
and some other simplifications.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: remove single-bit manipulation on sb->state

Just do single-bit manipulations on bitmap->flags and copy whole
value between that and sb->state.

This will allow next patch which changes how bit manipulations are
performed on bitmap->flags.

This does result in BITMAP_STALE not being set in sb by
bitmap_read_sb, however as the setting is determined by other
information in the 'sb' we do not lose information this way.
Normally, bitmap_load will be called shortly which will clear
BITMAP_STALE anyway.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: remove bitmap_mask_state

This function isn't really needed. It sets or clears a flag in both
bitmap->flags and sb->state.
However both times it is called, bitmap_update_sb is called soon
afterwards which copies bitmap->flags to sb->state.
So just make changes to bitmap->flags, and open-code those rather than
hiding in a function.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: move storage allocation from bitmap_load to bitmap_create.

We should allocate memory for the storage-bitmap at create-time, not
load time.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: separate bitmap file allocation to its own function.

This will allow allocation before swapping in a new bitmap.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: store bytes in file rather than just in last page.

This number is more generally useful, and bytes-in-last-page is
easily extracted from it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: move some fields of 'struct bitmap' into a 'storage' substruct.

This new 'struct bitmap_storage' reflects the external storage of the
bitmap.
Having this clearly defined will make it easier to change the storage
used while the array is active.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: change *_page_attr() to take a page number, not a page.

Most often we have the page number, not the page. And that is what
the *_page_attr() functions really want. So change the arguments to
take that number.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: centralise allocation of bitmap file pages.

Instead of allocating pages in read_sb_page, read_page and
bitmap_read_sb, allocate them all in bitmap_init_from disk.

Also replace the hack of calling "attach_page_buffers(page, NULL)" to
ensure that free_buffer() won't complain, by putting a test for
PagePrivate in free_buffer().

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: allow a bitmap with no backing storage.

An md bitmap comprises two parts
- internal counting of active writes per 'chunk'.
- external storage of whether there are any active writes on
each chunk

The second requires the first, but the first doesn't require the
second.

Not having backing storage means that the bitmap cannot expedite
resync after a crash, but it still allows us to expedite the recovery
of a recently-removed device.

So: allow a bitmap to exist even if there is no backing device.
In that case we default to 128M chunks.

A particular value of this is that we can remove and re-add a bitmap
(possibly of a different granularity) on a degraded array, and not
lose the information needed to fast-recover the missing device.

We don't actually activate these bitmaps yet - that will come
in a later patch.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: add new 'space' attribute for bitmaps.

If we are to allow bitmaps to be resized when the array is resized,
we need to know how much space there is.

So create an attribute to store this information and set appropriate
defaults.

It can be set more precisely via sysfs, or future metadata extensions
may allow it to be recorded.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: disentangle two different 'pending' flags.

There are two different 'pending' concepts in the handling of the
write intent bitmap.

Firstly, a 'page' from the bitmap (which container PAGE_SIZE*8 bits)
may have changes (bits cleared) that should be written in due course.
There is no hurry for these and the page will transition from
PENDING to NEEDWRITE and will then be written, though if it ever
becomes DIRTY it will be written much sooner and PENDING will be
cleared.

Secondly, a page of counters - which contains PAGE_SIZE/2 counters, one
for each bit, can usefully have a 'pending' flag which indicates if
any of the counters are low (2 or 1) and ready to be processed by
bitmap_daemon_work(). If this flag is clear we can skip the whole
page.

These two concepts are currently combined in the bitmap-file flag.
This causes a tighter connection between the counters and the bitmap
file than I would like - as I want to add some flexibility to the
bitmap file.

So introduce a new flag with the page-of-counters, and rewrite
bitmap_daemon_work() so that it handles the two different 'pending'
concepts separately.

This also allows us to clear BITMAP_PAGE_PENDING when we write out
a dirty page, which may occasionally reduce the number of times we
write a page.

Signed-off-by: NeilBrown <neilb@suse.de>

raid5: support sync request

REQ_SYNC is ignored in current raid5 code. Block layer does use it to do
policy,
for example ioscheduler. This patch adds it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid5: remove unused variables

The two variables are useless.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Fix memleak in r10buf_pool_alloc

If the allocation of rep1_bio fails, we currently don't free the 'bio'
of the same dev.

Reported by kmemleak.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: allow fix_read_error to read from recovering device.

When attempting to fix a read error, it is acceptable to read from a
device that is recovering, provided the recovery has got past the
place we are reading from. This makes the test for "can we read from
here" the same as the test in read_balance.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: move freeing of badblocks.page into md_rdev_clear

This ensures that it is always freed - there were case where
we failed to free the page.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: dm-raid should call helper function to clear rdev.

dm-raid currently open-codes the freeing of some members of
and rdev. It is more maintainable to have it call common code
from md.c which does this for all call-sites.

So remove free_disk_sb to md_rdev_clear, export it, and use it in
dm-raid.c

Signed-off-by: NeilBrown <neilb@suse.de>

lib/raid6: cleanup gen_syndrome function selection

Reorders functions in raid6_algos as well as the preference check
to reduce the number of functions tested on initialization.

Also, creates symmetry between choosing the gen_syndrome functions
and choosing the recovery functions.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

lib/raid6: update test program for recovery functions

Test each combination of recovery and syndrome generation
functions.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

lib/raid6: Add SSSE3 optimized recovery functions

Add SSSE3 optimized recovery functions, as well as a system
for selecting the most appropriate recovery functions to use.

Originally-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

lib/raid6: fix test program build

<linux/module.h> drags in headers which are not visible to userspace,
thus breaking the build for the test program.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid5: add AVX optimized RAID5 checksumming

Optimize RAID5 xor checksumming by taking advantage of
256-bit YMM registers introduced in AVX.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

crypto: disable preemption while benchmarking RAID5 xor checksumming

With CONFIG_PREEMPT=y, we need to disable preemption while benchmarking
RAID5 xor checksumming to ensure we're actually measuring what we think
we're measuring.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

crypto: wait for a full jiffy in do_xor_speed

In the existing do_xor_speed(), there is no guarantee that we actually
run do_2() for a full jiffy. We get the current jiffy, then run do_2()
until the next jiffy.

Instead, let's get the current jiffy, then wait until the next jiffy
to start our test.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: add reshape support

A 'near' or 'offset' lay RAID10 array can be reshaped to a different
'near' or 'offset' layout, a different chunk size, and a different
number of devices.
However the number of copies cannot change.

Unlike RAID5/6, we do not support having user-space backup data that
is being relocated during a 'critical section'. Rather, the
data_offset of each device must change so that when writing any block
to a new location, it will not over-write any data that is still
'live'.

This means that RAID10 reshape is not supportable on v0.90 metadata.

The different between the old data_offset and the new_offset must be
at least the larger of the chunksize multiplied by offset copies of
each of the old and new layout. (for 'near' mode, offset_copies == 1).

A larger difference of around 64M seems useful for in-place reshapes
as more data can be moved between metadata updates.
Very large differences (e.g. 512M) seem to slow the process down due
to lots of long seeks (on oldish consumer graded devices at least).

Metadata needs to be updated whenever the place we are about to write
to is considered - by the current metadata - to still contain data in
the old layout.

[unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]

Signed-off-by: NeilBrown <neilb@suse.de>