OSDN Git Service
Siarhei Siamashka [Fri, 2 Jul 2010 12:25:39 +0000 (15:25 +0300)]
sbc: ARM NEON optimizations for input permutation in SBC encoder
Using SIMD optimizations for 'sbc_enc_process_input_*' functions provides
a modest, but consistent speedup in all SBC encoding cases.
Benchmarked on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m4.389s
user 0m3.969s
sys 0m0.422s
samples  %        image name      symbol name
26234    29.9625  sbcenc          sbc_pack_frame
20057    22.9076  sbcenc          sbc_analyze_4b_8s_neon
14306    16.3393  sbcenc          sbc_calculate_bits
9866     11.2682  sbcenc          sbc_enc_process_input_8s_be
8506     9.7149   no-vmlinux      /no-vmlinux
5219     5.9608   sbcenc          sbc_calc_scalefactors_j_neon
2280     2.6040   sbcenc          sbc_encode
661      0.7549   libc-2.10.1.so  memcpy
== After: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m3.989s
user 0m3.602s
sys 0m0.391s
samples  %        image name      symbol name
26057    32.6128  sbcenc          sbc_pack_frame
20003    25.0357  sbcenc          sbc_analyze_4b_8s_neon
14220    17.7977  sbcenc          sbc_calculate_bits
8498     10.6361  no-vmlinux      /no-vmlinux
5300     6.6335   sbcenc          sbc_calc_scalefactors_j_neon
3235     4.0489   sbcenc          sbc_enc_process_input_8s_be_neon
2172     2.7185   sbcenc          sbc_encode
Siarhei Siamashka [Fri, 2 Jul 2010 12:25:38 +0000 (15:25 +0300)]
sbc: ARM NEON optimized joint stereo processing in SBC encoder
Improves SBC encoding performance when joint stereo is used, which
is a typical A2DP configuration.
Benchmarked on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m5.239s
user 0m4.805s
sys 0m0.430s
samples  %        image name      symbol name
26083    25.0856  sbcenc          sbc_pack_frame
21548    20.7240  sbcenc          sbc_calc_scalefactors_j
19910    19.1486  sbcenc          sbc_analyze_4b_8s_neon
14377    13.8272  sbcenc          sbc_calculate_bits
9990     9.6080   sbcenc          sbc_enc_process_input_8s_be
8667     8.3356   no-vmlinux      /no-vmlinux
2263     2.1765   sbcenc          sbc_encode
696      0.6694   libc-2.10.1.so  memcpy
== After: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m4.389s
user 0m3.969s
sys 0m0.422s
samples  %        image name      symbol name
26234    29.9625  sbcenc          sbc_pack_frame
20057    22.9076  sbcenc          sbc_analyze_4b_8s_neon
14306    16.3393  sbcenc          sbc_calculate_bits
9866     11.2682  sbcenc          sbc_enc_process_input_8s_be
8506     9.7149   no-vmlinux      /no-vmlinux
5219     5.9608   sbcenc          sbc_calc_scalefactors_j_neon
2280     2.6040   sbcenc          sbc_encode
661      0.7549   libc-2.10.1.so  memcpy
Johan Hedberg [Wed, 30 Jun 2010 08:55:11 +0000 (11:55 +0300)]
sbc: Fix signedness of libsbc parameters
The written parameter of sbc_encode can be negative so it should be
ssize_t instead of size_t.
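A minimal sketch of why the signed type matters. The helper 'report_written' and its error convention are hypothetical and are not the libsbc API; the point is only that a negative value stored in a size_t turns into a very large positive number, so a caller cannot recognize it as an error.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Hypothetical helper, not part of libsbc: stores the number of bytes
 * produced in *written, or -1 if the frame does not fit the output. */
static void report_written(size_t frame_len, size_t out_capacity,
							ssize_t *written)
{
	if (frame_len <= out_capacity)
		*written = (ssize_t) frame_len;
	else
		*written = -1;
}
```

With ssize_t the caller can simply test 'written < 0'. Converting the same value to size_t yields SIZE_MAX, which looks like an enormous byte count.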
Siarhei Siamashka [Tue, 29 Jun 2010 13:48:47 +0000 (16:48 +0300)]
sbc: ARM NEON optimization for scale factors calculation
Improves SBC encoding performance when joint stereo is not used.
Benchmarked on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m4.756s
user 0m4.313s
sys 0m0.438s
samples  %        image name      symbol name
2569     27.6296  sbcenc          sbc_pack_frame
1934     20.8002  sbcenc          sbc_analyze_4b_8s_neon
1386     14.9064  sbcenc          sbc_calculate_bits
1221     13.1319  sbcenc          sbc_calc_scalefactors
996      10.7120  sbcenc          sbc_enc_process_input_8s_be
878      9.4429   no-vmlinux      /no-vmlinux
204      2.1940   sbcenc          sbc_encode
56       0.6023   libc-2.10.1.so  memcpy
== After: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m4.220s
user 0m3.797s
sys 0m0.422s
samples  %        image name      symbol name
2563     31.3249  sbcenc          sbc_pack_frame
1892     23.1239  sbcenc          sbc_analyze_4b_8s_neon
1368     16.7196  sbcenc          sbc_calculate_bits
961      11.7453  sbcenc          sbc_enc_process_input_8s_be
836      10.2176  no-vmlinux      /no-vmlinux
262      3.2022   sbcenc          sbc_calc_scalefactors_neon
199      2.4322   sbcenc          sbc_encode
49       0.5989   libc-2.10.1.so  memcpy
Siarhei Siamashka [Tue, 29 Jun 2010 13:48:46 +0000 (16:48 +0300)]
sbc: MMX optimization for scale factors calculation
Improves SBC encoding performance when joint stereo is not used.
Benchmarked on Pentium-M:
== Before: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m1.439s
user 0m1.336s
sys 0m0.104s
samples  %        image name      symbol name
8642     33.7473  sbcenc          sbc_pack_frame
5873     22.9342  sbcenc          sbc_analyze_4b_8s_mmx
4435     17.3188  sbcenc          sbc_calc_scalefactors
4285     16.7331  sbcenc          sbc_calculate_bits
1942     7.5836   sbcenc          sbc_enc_process_input_8s_be
322      1.2574   sbcenc          sbc_encode
== After: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m1.319s
user 0m1.220s
sys 0m0.084s
samples  %        image name      symbol name
8706     37.9959  sbcenc          sbc_pack_frame
5740     25.0513  sbcenc          sbc_analyze_4b_8s_mmx
4307     18.7972  sbcenc          sbc_calculate_bits
1937     8.4537   sbcenc          sbc_enc_process_input_8s_be
1801     7.8602   sbcenc          sbc_calc_scalefactors_mmx
307      1.3399   sbcenc          sbc_encode
Siarhei Siamashka [Tue, 29 Jun 2010 13:48:45 +0000 (16:48 +0300)]
sbc: new 'sbc_calc_scalefactors_j' function added to sbc primitives
The code for scale factors calculation with joint stereo support has
been moved to a separate function. It can get platform-specific
SIMD optimizations later for best possible performance.
But even this change to the C code improves performance, because it
uses __builtin_clz() instead of loops, similar to what was done
to sbc_calc_scalefactors earlier. Technically it also unrolls the
loop by processing two channels at once, which might be either
good or bad for performance (bad if register pressure increases
and more data is spilled to memory). The benchmark on a 32-bit
x86 system (Pentium-M) shows that it got clearly faster:
$ time ./sbcenc.old -b53 -s8 -j test.au > /dev/null
real 0m1.868s
user 0m1.808s
sys 0m0.048s
$ time ./sbcenc.new -b53 -s8 -j test.au > /dev/null
real 0m1.742s
user 0m1.668s
sys 0m0.064s
Gustavo F. Padovan [Sat, 5 Jun 2010 10:14:28 +0000 (07:14 -0300)]
sbc: Fix redundant null check on calling free()
Issues found by smatch static check: http://smatch.sourceforge.net/
Johan Hedberg [Thu, 7 Jan 2010 09:02:51 +0000 (11:02 +0200)]
sbc: Update Nokia copyrights
Marcel Holtmann [Sat, 2 Jan 2010 01:08:17 +0000 (17:08 -0800)]
sbc: Update copyright information
Siarhei Siamashka [Fri, 17 Apr 2009 15:27:38 +0000 (18:27 +0300)]
sbc: added saturated clipping of decoder output to 16-bit
This prevents overflows and audible artefacts for audio files which
originally had their loudness maximized. Music from audio CDs is an
example of such files, see http://en.wikipedia.org/wiki/Loudness_war
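Saturated clipping can be sketched as below. The function name 'clip16' is an assumption made for illustration and is not taken from the libsbc source; it shows the general technique of clamping a wider intermediate value instead of letting it wrap around.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate sample to the signed 16-bit range.
 * Without the clamp, a plain cast would wrap and turn a loud positive
 * peak into a large negative value (an audible click). */
static inline int16_t clip16(int32_t s)
{
	if (s > INT16_MAX)
		return INT16_MAX;
	if (s < INT16_MIN)
		return INT16_MIN;
	return (int16_t) s;
}
```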
Marcel Holtmann [Thu, 16 Apr 2009 23:55:42 +0000 (01:55 +0200)]
sbc: Do some coding style cleanups
Lennart Poettering [Mon, 23 Mar 2009 15:44:11 +0000 (16:44 +0100)]
sbc: fix up sbc.h prototypes to use const/size_t wherever applicable
Luiz Augusto von Dentz [Wed, 1 Apr 2009 13:47:39 +0000 (10:47 -0300)]
sbc: Remove unused variable.
Siarhei Siamashka [Mon, 16 Mar 2009 00:27:26 +0000 (02:27 +0200)]
sbc: ensure 16-byte buffer position alignment for 4 subbands encoding
The buffer position in the X array was not always 16-byte aligned.
16-byte alignment is strictly required for the PowerPC AltiVec
SIMD optimizations because AltiVec does not support
unaligned vector loads at all.
Luiz Augusto von Dentz [Fri, 20 Mar 2009 21:40:43 +0000 (18:40 -0300)]
sbc: Fix misuse of 'frame.joint' when estimating the frame length.
'frame.joint' is not the flag for joint stereo mode; it is a set of bits which
show for which subbands channel joining was actually used.
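The distinction can be sketched as follows. The structure, the enum names and the bit order (bit 'sb' for subband 'sb') are assumptions chosen for illustration and do not reproduce the libsbc frame layout.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum channel_mode { MODE_MONO, MODE_DUAL, MODE_STEREO, MODE_JOINT_STEREO };

struct frame_info {
	enum channel_mode mode;	/* which channel mode the frame uses */
	uint8_t joint;		/* bit sb is set if subband sb was joined */
};

/* The mode decides whether the frame is a joint stereo frame. */
static int is_joint_stereo(const struct frame_info *f)
{
	return f->mode == MODE_JOINT_STEREO;
}

/* The bitmask only says what happened to one particular subband. */
static int subband_is_joined(const struct frame_info *f, int sb)
{
	return (f->joint >> sb) & 1;
}
```

A joint stereo frame in which no subband happened to be joined has a zero bitmask, so testing the bitmask instead of the mode would misclassify that frame.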
Johan Hedberg [Thu, 12 Mar 2009 19:33:14 +0000 (16:33 -0300)]
sbc: Fix a couple of other places that should use size_t and ssize_t
Marc-André Lureau [Tue, 17 Feb 2009 20:46:41 +0000 (22:46 +0200)]
sbc: don't dereference sbc pointer if NULL
Marc-André Lureau [Mon, 16 Feb 2009 13:59:51 +0000 (15:59 +0200)]
sbc: provide implementation info as a readable string
This is mainly useful for logging and debugging.
Lennart Poettering [Mon, 2 Feb 2009 00:57:14 +0000 (01:57 +0100)]
sbc: make check_mmx_support() a proper C function
Signed-off-by: Lennart Poettering <lennart@poettering.net>
Marcel Holtmann [Thu, 29 Jan 2009 23:02:58 +0000 (00:02 +0100)]
sbc: Fix SBC to compile cleanly with -Wsign-compare
Siarhei Siamashka [Thu, 29 Jan 2009 16:15:31 +0000 (18:15 +0200)]
sbc: Fix for SBC encoding with block sizes other than 16
Thanks to Christian Hoene for finding and reporting the
problem. This regression was introduced in commit
19af3c49e61aa046375497108e05a3a0605da158
Marcel Holtmann [Thu, 29 Jan 2009 16:32:58 +0000 (17:32 +0100)]
sbc: Add -Wno-sign-compare for the library and fix the other warnings
Siarhei Siamashka [Thu, 29 Jan 2009 00:17:36 +0000 (02:17 +0200)]
sbc: SBC encoder scale factors calculation optimized with __builtin_clz
A count-leading-zeros operation is often implemented with a dedicated
instruction on various architectures (at least this is true
for ARM and x86). Using the __builtin_clz gcc intrinsic makes it
possible to eliminate the innermost loop in the scale factors
calculation and improve performance. The scale factors calculation
can be optimized even further using SIMD instructions.
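A minimal sketch of the idea, not the actual libsbc code: both functions below compute the position of the highest set bit of a magnitude, once with a loop and once with __builtin_clz. The function names and the exact definition of the result are assumptions; the sketch also assumes a 32-bit 'unsigned int'.

```c
#include <stdint.h>

/* Loop version: keep increasing the result while bits remain above it. */
static int scale_factor_loop(uint32_t max_abs)
{
	int sf = 0;

	while (sf < 31 && (max_abs >> (sf + 1)) != 0)
		sf++;
	return sf;
}

/* Loop-free version. OR-ing in 1 avoids __builtin_clz(0), whose result
 * is undefined, and maps an all-zero input to 0 like the loop does. */
static int scale_factor_clz(uint32_t max_abs)
{
	return 31 - __builtin_clz(max_abs | 1);
}
```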
Siarhei Siamashka [Tue, 27 Jan 2009 16:57:35 +0000 (18:57 +0200)]
sbc: Performance optimizations for input data processing in SBC encoder
Channel deinterleaving, endian conversion and sample reordering
are done in one pass, avoiding the use of an intermediate buffer. This
code is also implemented as a new "performance primitive", which
allows further platform-specific optimizations (ARMv6 and ARM NEON
should gain quite a lot from assembly optimizations here).
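The single-pass approach might look roughly like this. The function name and buffer layout are hypothetical, and the sketch leaves out the sample reordering that the real primitive also performs; it only shows deinterleaving and big-endian conversion combined in one loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved big-endian 16-bit stereo PCM into two native-endian
 * channel buffers. Each input byte is read exactly once and no
 * intermediate buffer is needed. */
static void deinterleave_be16(const uint8_t *pcm, size_t nframes,
					int16_t *left, int16_t *right)
{
	size_t i;

	for (i = 0; i < nframes; i++) {
		const uint8_t *p = pcm + 4 * i;

		left[i] = (int16_t) (uint16_t) ((p[0] << 8) | p[1]);
		right[i] = (int16_t) (uint16_t) ((p[2] << 8) | p[3]);
	}
}
```

Assembling the value from individual bytes makes the result independent of the host byte order.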
Siarhei Siamashka [Wed, 21 Jan 2009 19:08:34 +0000 (21:08 +0200)]
sbc: Use of -funroll-loops option to improve SBC encoder performance
Added the use of the -funroll-loops gcc option for SBC. Also, to
get a better effect, the body of the 'sbc_pack_frame' function
was moved to an inline function, which gets instantiated
for 4 different subbands/channels combinations, so that the
'frame_subbands' and 'frame_channels' arguments become compile-time
constants and can be better optimized by the compiler.
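The instantiation pattern can be sketched with a much simpler body than frame packing. All names below are hypothetical; the summing loop merely stands in for real work whose loop bounds depend on the number of subbands and channels.

```c
#include <stdint.h>

/* Generic body: the loop bounds are ordinary arguments here. */
static inline __attribute__((always_inline)) int32_t
sum_frame_internal(const int16_t *samples, int subbands, int channels)
{
	int32_t sum = 0;
	int ch, sb;

	for (ch = 0; ch < channels; ch++)
		for (sb = 0; sb < subbands; sb++)
			sum += samples[ch * subbands + sb];
	return sum;
}

/* Four instantiations. Inside each one the bounds are constants, so the
 * compiler is free to unroll the loops completely. */
static int32_t sum_4s_1c(const int16_t *s) { return sum_frame_internal(s, 4, 1); }
static int32_t sum_4s_2c(const int16_t *s) { return sum_frame_internal(s, 4, 2); }
static int32_t sum_8s_1c(const int16_t *s) { return sum_frame_internal(s, 8, 1); }
static int32_t sum_8s_2c(const int16_t *s) { return sum_frame_internal(s, 8, 2); }

/* Runtime dispatch to the matching instantiation. */
static int32_t sum_frame(const int16_t *s, int subbands, int channels)
{
	if (subbands == 4)
		return channels == 1 ? sum_4s_1c(s) : sum_4s_2c(s);
	return channels == 1 ? sum_8s_1c(s) : sum_8s_2c(s);
}
```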
Siarhei Siamashka [Wed, 21 Jan 2009 22:12:40 +0000 (00:12 +0200)]
sbc: Audio quality improvement for 16-bit fixed point SBC encoder
Multiplying the first part of the analysis filter constant tables
by some coefficients and dividing the second part by the same
coefficients is a transformation which should produce the same
results if rounding errors are not taken into account. These
additional C0/C1/... coefficients can be varied in a certain
range (the requirement is that we still do not get overflows).
The 'magic' values for these coefficients are selected in such
a way that the rounding errors are minimized (rounding errors
are unavoidable when putting all the floating constants into
16-bit tables and losing some of the fractional part).
Also non-SIMD variant of the analysis filter is dropped because
keeping it would require applying a similar change to its tables,
which is a bit tricky and just increases maintenance overhead.
Siarhei Siamashka [Sun, 18 Jan 2009 21:10:00 +0000 (23:10 +0200)]
sbc: Fix sbcenc breakage when au file header size is larger than 24 bytes
Siarhei Siamashka [Fri, 16 Jan 2009 15:23:54 +0000 (17:23 +0200)]
sbc: Performance optimizations for sbcenc utility
Read and write buffer sizes increased, memmove overhead eliminated.
Nonportable cast from 'unsigned char *' to 'struct au_header *' is
now also resolved as part of the changes.
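One portable alternative to casting a byte buffer to a header structure is to read the fields byte by byte. The sketch below is not the sbcenc code and its function names are assumptions. It relies on the Sun/NeXT .au layout, in which the header starts with the magic number ".snd" followed by a big-endian 32-bit data offset, and is at least 24 bytes long.

```c
#include <stddef.h>
#include <stdint.h>

#define AU_MAGIC 0x2e736e64u	/* ".snd" */
#define AU_MIN_HEADER 24

/* Read a big-endian 32-bit field without any alignment or byte order
 * assumptions about the host. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
		((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Returns the offset of the audio data, or 0 if the buffer does not
 * start with a plausible .au header. */
static size_t au_data_offset(const unsigned char *buf, size_t len)
{
	uint32_t offset;

	if (len < AU_MIN_HEADER || read_be32(buf) != AU_MAGIC)
		return 0;
	offset = read_be32(buf + 4);
	return offset >= AU_MIN_HEADER ? offset : 0;
}
```

Using the offset field instead of a fixed 24 also handles files whose header carries an annotation and is therefore longer than the minimum.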
Siarhei Siamashka [Sat, 17 Jan 2009 18:30:40 +0000 (20:30 +0200)]
sbc: Coding style fixes
Johan Hedberg [Fri, 16 Jan 2009 18:29:43 +0000 (20:29 +0200)]
sbc: Fix indentation to use only tabs
Siarhei Siamashka [Thu, 15 Jan 2009 18:25:49 +0000 (20:25 +0200)]
sbc: MMX and ARM NEON optimized versions of analysis filter for SBC encoder
Siarhei Siamashka [Thu, 15 Jan 2009 17:45:36 +0000 (19:45 +0200)]
sbc: SBC arrays and constant tables aligned at 16 byte boundary for SIMD
Most SIMD instruction sets benefit from data being naturally aligned.
And even if it is not strictly required, performance is usually better
with the aligned data. ARM NEON and SSE2 have different instruction
variants for aligned/unaligned memory accesses.
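A small sketch of requesting and checking 16-byte alignment with the gcc 'aligned' attribute. The buffer name, its size and the helper functions are hypothetical and are not taken from the libsbc source.

```c
#include <stdint.h>

/* Hypothetical sample buffer; the attribute asks gcc to place it on a
 * 16-byte boundary. Each row is 128 bytes, so rows stay aligned too. */
static int16_t samples[2][64] __attribute__((aligned(16)));

static int is_aligned16(const void *p)
{
	return ((uintptr_t) p & 15) == 0;
}

/* Round an element index down to a multiple of 8 int16_t values, which
 * is 16 bytes, so that a vector load from that position is aligned. */
static int align_index(int idx)
{
	return idx & ~7;
}
```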
Siarhei Siamashka [Thu, 15 Jan 2009 17:11:23 +0000 (19:11 +0200)]
sbc: SIMD-friendly variant of SBC encoder analysis filter
Added SIMD-friendly C implementation of SBC analysis filter (the
structure of code had to be changed a bit and constants in the
tables reordered). This code can be used as a reference for
developing platform specific SIMD optimizations. These functions
are put into a new file 'sbc_primitives.c', which is going to
contain all the basic stuff for SBC codec.
Siarhei Siamashka [Wed, 7 Jan 2009 12:28:48 +0000 (14:28 +0200)]
sbc: Fix for big endian problems in SBC codec
Christian Hoene [Mon, 5 Jan 2009 12:26:08 +0000 (13:26 +0100)]
sbc: Fixed correct handling of frame sizes in the encoder
Siarhei Siamashka [Sun, 4 Jan 2009 01:11:12 +0000 (03:11 +0200)]
sbc: Use of constant shift in SBC quantization code to make it faster
The result of 32x32->64 unsigned long multiplication is returned
in two registers (high and low 32-bit parts) for many 32-bit
architectures. For these architectures constant right shift by
32 bits is optimized out by the compiler to just taking the high
32-bit part. Also some data needed at the quantization stage is
precalculated beforehand to improve performance.
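The constant shift can be sketched as below. The function name 'mul_hi32' is an assumption; the code only illustrates the general technique of keeping the high half of a widening multiplication.

```c
#include <stdint.h>

/* Multiply two 32-bit values and keep the high 32 bits of the 64-bit
 * product. On many 32-bit targets the product already sits in a
 * register pair, so the shift by a constant 32 costs nothing. */
static inline uint32_t mul_hi32(uint32_t a, uint32_t b)
{
	return (uint32_t) (((uint64_t) a * b) >> 32);
}
```

A shift by a variable amount, by contrast, usually has to be carried out as a real 64-bit shift sequence on such targets.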
Marcel Holtmann [Thu, 1 Jan 2009 18:33:20 +0000 (19:33 +0100)]
sbc: Update copyright information
Siarhei Siamashka [Wed, 31 Dec 2008 07:14:25 +0000 (09:14 +0200)]
sbc: Added possibility to analyze 4 blocks at once in SBC encoder
This change is needed for SIMD optimizations which will follow
shortly. And even for non-SIMD capable platforms it still may
be useful to have possibility to merge several analyzing functions
together into one for better code scheduling or reusing loaded
constants. Also analysis filter functions are now called using
function pointers, which allows the default implementation to be
overridden at runtime (with a high-precision variant or MMX/SSE2/NEON
optimized code).
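Runtime selection through function pointers might be organized as in this sketch. The structure, the function names and the 'have_fast_path' flag are hypothetical; the unrolled routine merely stands in for a platform-specific implementation.

```c
#include <stdint.h>

/* Hypothetical table of primitives; the real libsbc structure differs. */
struct codec_primitives {
	const char *name;
	int32_t (*analyze)(const int16_t *in, const int16_t *coef, int n);
};

/* Default implementation: a plain multiply-accumulate loop. */
static int32_t analyze_c(const int16_t *in, const int16_t *coef, int n)
{
	int32_t acc = 0;
	int i;

	for (i = 0; i < n; i++)
		acc += (int32_t) in[i] * coef[i];
	return acc;
}

/* Stand-in for an optimized variant: same result, two accumulators. */
static int32_t analyze_unrolled(const int16_t *in, const int16_t *coef, int n)
{
	int32_t acc0 = 0, acc1 = 0;
	int i;

	for (i = 0; i + 1 < n; i += 2) {
		acc0 += (int32_t) in[i] * coef[i];
		acc1 += (int32_t) in[i + 1] * coef[i + 1];
	}
	if (n & 1)
		acc0 += (int32_t) in[n - 1] * coef[n - 1];
	return acc0 + acc1;
}

/* Install the default first, then override it if a faster path exists. */
static void primitives_init(struct codec_primitives *p, int have_fast_path)
{
	p->name = "C";
	p->analyze = analyze_c;
	if (have_fast_path) {
		p->name = "unrolled";
		p->analyze = analyze_unrolled;
	}
}
```

Callers only ever invoke 'p->analyze(...)', so adding another variant does not touch the calling code.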
Siarhei Siamashka [Sun, 28 Dec 2008 01:22:59 +0000 (03:22 +0200)]
sbc: New SBC analysis filter function to replace current broken code
This code is heavily based on the patch submitted by Jaska Uimonen.
Additional changes include preserving extra bits in the output of
filter function for better precision, support for both 16-bit and
32-bit fixed point implementation. Sign of some table values was
changed in order to preserve a regular code structure and have
multiply-accumulate operations only. No additional optimizations
were applied as this code is intended to be some kind of "reference"
implementation. Platform specific optimizations may require
different tricks and can be branched off from this implementation.
Some extra information about this code can be found in linux-bluetooth
mailing list archive for December 2008.
Siarhei Siamashka [Sat, 27 Dec 2008 17:36:14 +0000 (19:36 +0200)]
sbc: Fixed subbands selection for joint-stereo in SBC encoder
Marcel Holtmann [Tue, 23 Dec 2008 22:56:32 +0000 (23:56 +0100)]
sbc: Add more options to control encoding methods
Marcel Holtmann [Tue, 23 Dec 2008 22:41:38 +0000 (23:41 +0100)]
sbc: Don't decode a frame if it is too small
Luiz Augusto von Dentz [Thu, 18 Dec 2008 22:22:31 +0000 (19:22 -0300)]
sbc: Remove unnecessary code and fix a coding style.
Siarhei Siamashka [Wed, 17 Dec 2008 20:32:11 +0000 (22:32 +0200)]
sbc: Fix for overflow bug in SBC quantization code
The result of multiplication does not always fit into 32-bits. Using 64-bit
calculations helps to avoid overflows and sound quality problems in encoded
audio. Overflows are more likely to show up when using high values for
bitpool setting.
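The effect can be reproduced with a small sketch. The function names, the operand values and the shift amount are made up for illustration and are not the actual quantization formula.

```c
#include <stdint.h>

/* 32-bit arithmetic: the product is reduced modulo 2^32 before the
 * shift, so the upper bits are silently lost for large operands. */
static uint32_t scale32(uint32_t sample, uint32_t levels)
{
	return (sample * levels) >> 16;
}

/* 64-bit arithmetic: the full product survives until after the shift. */
static uint32_t scale64(uint32_t sample, uint32_t levels)
{
	return (uint32_t) (((uint64_t) sample * levels) >> 16);
}
```

For small operands both functions agree. With a larger 'levels' value, which is the kind of operand a higher bitpool would produce, the product needs more than 32 bits and only the 64-bit version stays correct.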
Siarhei Siamashka [Thu, 11 Dec 2008 19:21:28 +0000 (21:21 +0200)]
sbc: Bitstream writing optimization for SBC encoder
Improves SBC encoder performance by up to 1.5x on ARM11
and almost 2x on Intel Core2 in some cases.
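The commit message does not say how the bitstream writer was changed. One common way to speed up such code, shown here purely as an illustration with hypothetical names, is to collect bits in a register-sized cache and store whole bytes, instead of updating the output buffer one bit at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bitwriter {
	uint8_t *out;	/* destination buffer */
	size_t pos;	/* next byte to write */
	uint32_t cache;	/* pending bits live in the low 'count' bits */
	int count;	/* number of pending bits, always below 8 on entry */
};

/* Append the low 'nbits' bits of 'value', most significant bit first. */
static void put_bits(struct bitwriter *bw, uint32_t value, int nbits)
{
	assert(nbits >= 1 && nbits <= 16);
	bw->cache = (bw->cache << nbits) | (value & ((1u << nbits) - 1));
	bw->count += nbits;
	while (bw->count >= 8) {
		bw->count -= 8;
		bw->out[bw->pos++] = (uint8_t) (bw->cache >> bw->count);
	}
}

/* Write out any remaining bits, padded with zeros on the right. */
static void flush_bits(struct bitwriter *bw)
{
	if (bw->count > 0) {
		bw->out[bw->pos++] = (uint8_t) (bw->cache << (8 - bw->count));
		bw->count = 0;
	}
}
```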
Marcel Holtmann [Fri, 31 Oct 2008 23:14:46 +0000 (00:14 +0100)]
sbc: Add more options to SBC encoder and decoder
Marcel Holtmann [Fri, 31 Oct 2008 22:55:13 +0000 (23:55 +0100)]
sbc: Fix SBC gain mismatch
Marcel Holtmann [Thu, 30 Oct 2008 19:01:06 +0000 (20:01 +0100)]
sbc: Fix SBC decoding handling
Marcel Holtmann [Sun, 26 Oct 2008 00:04:44 +0000 (02:04 +0200)]
sbc: Let the decoder write Sun/NeXT audio S16_BE files
Marcel Holtmann [Sat, 25 Oct 2008 23:32:52 +0000 (01:32 +0200)]
sbc: Add bitpool option to encoder
Marcel Holtmann [Sat, 25 Oct 2008 22:26:20 +0000 (00:26 +0200)]
sbc: Fix missing encoding of last frame
Marcel Holtmann [Mon, 30 Jul 2012 02:34:44 +0000 (19:34 -0700)]
sbc: Add low-complexity, subband codec support
Marcel Holtmann [Wed, 11 Jul 2012 12:51:00 +0000 (09:51 -0300)]
Initial revision