summaryrefslogtreecommitdiff
path: root/tcg/i386
AgeCommit message (Collapse)Author
2019-05-13tcg/i386: Support vector absolute valueRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg: Add support for vector absolute valueRichard Henderson
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg/i386: Support vector scalar shift opcodesRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg/i386: Support vector variable shift opcodesRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg: Add INDEX_op_dupm_vecRichard Henderson
Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg/i386: Implement tcg_out_dupm_vecRichard Henderson
At the same time, improve tcg_out_dupi_vec wrt broadcast from the constant pool. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg: Add tcg_out_dupm_vec to the backend interfaceRichard Henderson
Currently stubbed out in all backends that support vectors. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg: Manually expand INDEX_op_dup_vecRichard Henderson
This case is similar to INDEX_op_mov_* in that we need to do different things depending on the current location of the source. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Added some commentary to the tcg_reg_alloc_* functions.
2019-05-13tcg: Promote tcg_out_{dup,dupi}_vec to backend interfaceRichard Henderson
The i386 backend already has these functions, and the aarch64 backend could easily split out one. Nothing is done with these functions yet, but this will aid register allocation of INDEX_op_dup_vec in a later patch. Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13tcg: Return bool success from tcg_out_movRichard Henderson
This patch merely changes the interface, aborting on all failures, of which there are currently none. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24tcg: Restart TB generation after out-of-line ldst overflowRichard Henderson
This is part c of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24tcg/i386: Support INDEX_op_extract2_{i32,i64}Richard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24tcg: Add INDEX_op_extract2_{i32,i64}Richard Henderson
This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11tcg/i386: fix unsigned vector saturating arithmeticMark Cave-Ayland
Due to a cut/paste error in the original implementation, the unsigned vector saturating arithmetic was erroneously being calculated as signed vector saturating arithmetic. Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic") Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28cputlb: Remove static tlb sizingRichard Henderson
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB, remove the define and the old code. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg/i386: enable dynamic TLB sizingEmilio G. Cota
As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% ) 9.084039051 seconds time elapsed ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% ) 8.680540350 seconds time elapsed ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% ) 55.221556378 seconds time elapsed ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% ) 55.549661409 seconds time elapsed ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB *very* frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 5.5 +------------------------------------------------------------------------+ | +-+ | 5 |-+.................+-+...............................tlb-dyn-v5.......+-| | * * | 4.5 |-+.................*.*................................................+-| | * * | 4 |-+.................*.*................................................+-| | * * | 3.5 |-+.................*.*................................................+-| | * * | 3 |-+......+-+*.......*.*................................................+-| | * * * * | 2.5 |-+......*..*.......*.*.................................+-+*...........+-| | * * * * * * | 2 |-+......*..*.......*.*.................................*..*...........+-| | * * * * * * +-+ | 1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-| | * * *+-+ * * +-+ *+-+ +-+ +-+ * * * * * * | 1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++| | * * * * * * * * * * * * * * * * * * * * * * * * * * | 0.5 +------------------------------------------------------------------------+ 400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean png: https://imgur.com/YRF90f7 That is, a 1.51x average speedup over the baseline, with a max speedup of 5.17x. Here's a different look at the SPEC06int results, using KVM as the baseline: x86_64-softmmu slowdown vs. KVM for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 25 +---------------------------------------------------------------------------+ | +-+ +-+ | | * * +-+ v3.1.0 | | * * +-+ tlb-dyn-v5 | | * * * * +-+ | 20 |-+.................*.*.............................*.+-+......*.*........+-| | * * * # # * * | | +-+ * * * # # * * | | * * * * * # # * * | 15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-| | * * * * * # # * #|# | | * * * * +-+ * # # * +-+ | | * * +-+ * * ++-+ +-+ * # # * # # +-+ | | * * +-+ * * * ## *| +-+ * # # * # # +-+ | 10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-| | * * * +-+ * * * ## +-+ *# # * # #* # # +-+ * # # * * | | * * * # # * * +-+ * ## * +-+ *# # * # #* # # * * * # # *+-+ | | * * * # # * * * +-+ * ## * # # *# # * # #* # # * * * # # * ## | 5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-| | * # #* # # * +-+* # # * ## * # # *# # * # #* # # * * * # # * ## | | * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ## | | ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ## | |+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++| 0 +---------------------------------------------------------------------------+ 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean png: https://imgur.com/YzAMNEV After this series, we bring down the average SPEC06int slowdown vs KVM from 11.47x to 7.58x. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg: introduce dynamic TLB sizingEmilio G. Cota
Disabled in all TCG backends for now. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg/i386: Implement vector minmax arithmeticRichard Henderson
The avx instruction set does not directly provide MO_64. We can still implement 64-bit with comparison and vpblendvb. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg/i386: Implement vector saturating arithmeticRichard Henderson
Only MO_8 and MO_16 are implemented, since that's all the instruction set provides. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg/i386: Split subroutines out of tcg_expand_vec_opRichard Henderson
This routine was becoming too large. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg: Add opcodes for vector minmax arithmeticRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28tcg: Add opcodes for vector saturated arithmeticRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-11avoid TABs in files that only contain a fewPaolo Bonzini
Most files that have TABs only contain a handful of them. Change them to spaces so that we don't confuse people. disas, standard-headers, linux-headers and libdecnumber are imported from other projects and probably should be exempted from the check. Outside those, after this patch the following files still contain both 8-space and TAB sequences at the beginning of the line. Many of them have a majority of TABs, or were initially committed with all tabs. bsd-user/i386/target_syscall.h bsd-user/x86_64/target_syscall.h crypto/aes.c hw/audio/fmopl.c hw/audio/fmopl.h hw/block/tc58128.c hw/display/cirrus_vga.c hw/display/xenfb.c hw/dma/etraxfs_dma.c hw/intc/sh_intc.c hw/misc/mst_fpga.c hw/net/pcnet.c hw/sh4/sh7750.c hw/timer/m48t59.c hw/timer/sh_timer.c include/crypto/aes.h include/disas/bfd.h include/hw/sh4/sh.h libdecnumber/decNumber.c linux-headers/asm-generic/unistd.h linux-headers/linux/kvm.h linux-user/alpha/target_syscall.h linux-user/arm/nwfpe/double_cpdo.c linux-user/arm/nwfpe/fpa11_cpdt.c linux-user/arm/nwfpe/fpa11_cprt.c linux-user/arm/nwfpe/fpa11.h linux-user/flat.h linux-user/flatload.c linux-user/i386/target_syscall.h linux-user/ppc/target_syscall.h linux-user/sparc/target_syscall.h linux-user/syscall.c linux-user/syscall_defs.h linux-user/x86_64/target_syscall.h slirp/cksum.c slirp/if.c slirp/ip.h slirp/ip_icmp.c slirp/ip_icmp.h slirp/ip_input.c slirp/ip_output.c slirp/mbuf.c slirp/misc.c slirp/sbuf.c slirp/socket.c slirp/socket.h slirp/tcp_input.c slirp/tcpip.h slirp/tcp_output.c slirp/tcp_subr.c slirp/tcp_timer.c slirp/tftp.c slirp/udp.c slirp/udp.h target/cris/cpu.h target/cris/mmu.c target/cris/op_helper.c target/sh4/helper.c target/sh4/op_helper.c target/sh4/translate.c tcg/sparc/tcg-target.inc.c tests/tcg/cris/check_addo.c tests/tcg/cris/check_moveq.c tests/tcg/cris/check_swap.c tests/tcg/multiarch/test-mmap.c ui/vnc-enc-hextile-template.h ui/vnc-enc-zywrle.h util/envlist.c util/readline.c The following have only TABs: bsd-user/i386/target_signal.h bsd-user/sparc64/target_signal.h bsd-user/sparc64/target_syscall.h bsd-user/sparc/target_signal.h bsd-user/sparc/target_syscall.h bsd-user/x86_64/target_signal.h crypto/desrfb.c hw/audio/intel-hda-defs.h hw/core/uboot_image.h hw/sh4/sh7750_regnames.c hw/sh4/sh7750_regs.h include/hw/cris/etraxfs_dma.h linux-user/alpha/termbits.h linux-user/arm/nwfpe/fpopcode.h linux-user/arm/nwfpe/fpsr.h linux-user/arm/syscall_nr.h linux-user/arm/target_signal.h linux-user/cris/target_signal.h linux-user/i386/target_signal.h linux-user/linux_loop.h linux-user/m68k/target_signal.h linux-user/microblaze/target_signal.h linux-user/mips64/target_signal.h linux-user/mips/target_signal.h linux-user/mips/target_syscall.h linux-user/mips/termbits.h linux-user/ppc/target_signal.h linux-user/sh4/target_signal.h linux-user/sh4/termbits.h linux-user/sparc64/target_syscall.h linux-user/sparc/target_signal.h linux-user/x86_64/target_signal.h linux-user/x86_64/termbits.h pc-bios/optionrom/optionrom.h slirp/mbuf.h slirp/misc.h slirp/sbuf.h slirp/tcp.h slirp/tcp_timer.h slirp/tcp_var.h target/i386/svm.h target/sparc/asi.h target/xtensa/core-dc232b/xtensa-modules.inc.c target/xtensa/core-dc233c/xtensa-modules.inc.c target/xtensa/core-de212/core-isa.h target/xtensa/core-de212/xtensa-modules.inc.c target/xtensa/core-fsf/xtensa-modules.inc.c target/xtensa/core-sample_controller/core-isa.h target/xtensa/core-sample_controller/xtensa-modules.inc.c target/xtensa/core-test_kc705_be/core-isa.h target/xtensa/core-test_kc705_be/xtensa-modules.inc.c tests/tcg/cris/check_abs.c tests/tcg/cris/check_addc.c tests/tcg/cris/check_addcm.c tests/tcg/cris/check_addoq.c tests/tcg/cris/check_bound.c tests/tcg/cris/check_ftag.c tests/tcg/cris/check_int64.c tests/tcg/cris/check_lz.c tests/tcg/cris/check_openpf5.c tests/tcg/cris/check_sigalrm.c tests/tcg/cris/crisutils.h tests/tcg/cris/sys.c tests/tcg/i386/test-i386-ssse3.c ui/vgafont.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20181213223737.11793-3-pbonzini@redhat.com> Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Acked-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-17tcg: Add TCG_TARGET_HAS_MEMORY_BSWAPRichard Henderson
For now, defined universally as true, since we previously required backends to implement swapped memory operations. Future patches may now remove that support where it is onerous. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Add setup_guest_base_seg for FreeBSDRichard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Precompute all guest_base parametersRichard Henderson
These values are constant between all qemu_ld/st invocations; there is no need to figure this out each time. If we cannot use a segment or an offset directly for guest_base, load the value into a register in the prologue. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Assume 32-bit values are zero-extendedRichard Henderson
We now have an invariant that all TCG_TYPE_I32 values are zero-extended, which means that we do not need to extend them again during qemu_ld/st, either explicitly via a separate tcg_out_ext32u or implicitly via P_ADDR32. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guestsRichard Henderson
This preserves the invariant that all TCG_TYPE_I32 values are zero-extended in the 64-bit host register. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_pathRichard Henderson
This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Propagate is64 to tcg_out_qemu_ld_directRichard Henderson
This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Return false on failure from patch_relocRichard Henderson
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg: Return success from patch_relocRichard Henderson
This will move the assert for success from within (subroutines of) patch_reloc into the callers. It will also let new code do something different when a relocation is out of range. For the moment, all backends are trivially converted to return true. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Move TCG_REG_CALL_STACK from define to enumRichard Henderson
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/i386: Always use %ebp for TCG_AREG0Richard Henderson
For x86_64, this can remove a REX prefix resulting in smaller code when manipulating globals of type i32, as we move them between backing store via cpu_env, aka TCG_AREG0. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26tcg/i386: fix vector operations on 32-bit hostsRoman Kapl
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This was defined as no-op for 32-bit x86, with the assumption that we have eight registers anyway. This assumption is not true once we have xmm regs. Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and have overflown into other opcode fields, wreaking havoc. To trigger these problems, you can try running the "movi d8, #0x0" AArch64 instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated, but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2". Fixes: 770c2fc7bb ("Add vector operations") Signed-off-by: Roman Kapl <rka@sysgo.com> Message-Id: <20180824131734.18557-1-rka@sysgo.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-23tcg/i386: Mark xmm registers call-clobberedRichard Henderson
When host vector registers and operations were introduced, I failed to mark the registers call clobbered as required by the ABI. Fixes: 770c2fc7bb7 Cc: qemu-stable@nongnu.org Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15tcg: Reduce max TB opcode countRichard Henderson
Also, assert that we don't overflow any of two different offsets into the TB. Both unwind and goto_tb both record a uint16_t for later use. This fixes an arm-softmmu test case utilizing NEON in which there is a TB generated that runs to 7800 opcodes, and compiles to 96k on an x86_64 host. This overflows the 16-bit offset in which we record the goto_tb reset offset. Because of that overflow, we install a jump destination that goes to neverland. Boom. With this reduced op count, the same TB compiles to about 48k for aarch64, ppc64le, and x86_64 hosts, and neither assertion fires. Cc: qemu-stable@nongnu.org Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15tcg/i386: Use byte form of xgetbv instructionJohn Arbuckle
The assembler in most versions of Mac OS X is pretty old and does not support the xgetbv instruction. To go around this problem, the raw encoding of the instruction is used instead. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-09tcg/i386: Fix dup_vec in non-AVX2 codepathPeter Maydell
The VPUNPCKLD* instructions are all "non-destructive source", indicated by "NDS" in the encoding string in the x86 ISA manual. This means that they take two source operands, one of which is encoded in the VEX.vvvv field. We were incorrectly treating them as if they were destructive-source and passing 0 as the 'v' argument of tcg_out_vex_modrm(). This meant we were always using %xmm0 as one of the source operands, causing incorrect results if the register allocator happened to want to use something else. For instance the input AArch64 insn: DUP v26.16b, w21 which becomes TCG IR ops: dup_vec v128,e8,tmp2,x21 st_vec v128,e8,tmp2,env,$0xa40 was assembled to: 0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0 0x607c5694: 00 0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1 0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1 0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1 0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14) 0x607c56aa: 00 when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1". This resulted in our incorrectly setting the output vector to q26=0000320000003200:0000320000003200 when given an input of x21 == 0000000002803200 rather than the expected all-zeroes. Pass the correct source register number to tcg_out_vex_modrm() for these insns. Fixes: 770c2fc7bb70804a Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-03-16tcg/i386: Support INDEX_op_dup2_vec for -m32Richard Henderson
Unknown why -m32 was passing with gcc but not clang; it should have failed for both. This would be used for tcg_gen_dup_i64_vec, and visible with the right TB and an aarch64 guest. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08tcg/i386: Add vector operationsRichard Henderson
The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10tcg/i386: constify tcg_target_callee_save_regsEmilio G. Cota
Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17tcg: Remove tcg_regset_set32Richard Henderson
It's not even clear what the interface REG and VAL32 were supposed to mean. All uses had REG = 0 and VAL32 was the bitset assigned to the destination. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17tcg: Remove tcg_regset_clearRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-07tcg/i386: Store out-of-range call targets in constant poolRichard Henderson
Already it saves 2 bytes per call, but also the constant pool entry may well be shared across multiple calls. Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07tcg: Rearrange ldst label trackingRichard Henderson
Dispense with TCGBackendData, as it has never been used for more than holding a single pointer. Use a define in the cpu/tcg-target.h to signal requirement for TCGLabelQemuLdst, so that we can drop the no-op tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.hRichard Henderson
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. This opens the possibility for TCG_TARGET_HAS_direct_jump to be a runtime decision -- based on host cpu capabilities, the size of code_gen_buffer, or a future debugging switch. Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-24util: Introduce include/qemu/cpuid.hRichard Henderson
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the supplied <cpuid.h> does not contain the bit_AVX2 define that we use when detecting whether the routine can be enabled. Introduce a qemu-specific header that uses the compiler's definition of __cpuid et al, but supplies any missing bit_* definitions needed. This avoids introducing any extra ifdefs to util/bufferiszero.c, and allows quite a few to be removed from tcg/i386/tcg-target.inc.c. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20170719044018.18063-1-rth@twiddle.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-05tcg/i386: implement goto_ptrEmilio G. Cota
Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org> [rth: Reuse goto_ptr epilogue for exit_tb 0.] Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptrEmilio G. Cota
Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org> [rth: Squashed 4 related commits.] Signed-off-by: Richard Henderson <rth@twiddle.net>