target/ppc: Moved VMLADDUHM to decodetree and use gvec
This patch moves VMLADDUHM to decodetree a creates a gvec implementation
using mul_vec and add_vec.
rept loop master patch
8 12500 0,
01810500 0,
00903100 (-50.1%)
25 4000 0,
01739400 0,
00747700 (-57.0%)
100 1000 0,
01843600 0,
00901400 (-51.1%)
500 200 0,
02574600 0,
01971000 (-23.4%)
2500 40 0,
05921600 0,
07121800 (+20.3%)
8000 12 0,
15326700 0,
21725200 (+41.7%)
The significant difference in performance when REPT is low and LOOP is
high I think is due to the fact that the new implementation has a higher
translation time, as when using a helper only 5 TCGop are used but with
the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec
equivalent so this instruction is implemented with the help of 5 others,
vmuleu, vmulou, vmrgh, vmrgl and vpkum).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <
20221019125040.48028-2-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>