summaryrefslogtreecommitdiff
path: root/util
AgeCommit message (Collapse)Author
2018-11-06util/qemu-thread-posix: Fix qemu_thread_atexit* for OSXPeter Maydell
Our current implementation of qemu_thread_atexit* is broken on OSX. This is because it works by cerating a piece of thread-specific data with pthread_key_create() and using the destructor function for that data to run the notifier function passed to it by the caller of qemu_thread_atexit_add(). The expected use case is that the caller uses a __thread variable as the notifier, and uses the callback to clean up information that it is keeping per-thread in __thread variables. Unfortunately, on OSX this does not work, because on OSX a __thread variable may be destroyed (freed) before the pthread_key_create() destructor runs. (POSIX imposes no ordering constraint here; the OSX implementation happens to implement __thread variables in terms of pthread_key_create((), whereas Linux uses different mechanisms that mean the __thread variables will still be present when the pthread_key_create() destructor is run.) Fix this by switching to a scheme similar to the one qemu-thread-win32 uses for qemu_thread_atexit: keep the thread's notifiers on a __thread variable, and run the notifiers on calls to qemu_thread_exit() and on return from the start routine passed to qemu_thread_start(). We do this with the pthread_cleanup_push() API. We take advantage of the qemu_thread_atexit_add() API permission not to run thread notifiers on process exit to avoid having to special case the main thread. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181105135538.28025-3-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-06oslib-posix: Use MAP_STACK in qemu_alloc_stack() on OpenBSDBrad Smith
Use MAP_STACK in qemu_alloc_stack() on OpenBSD. Added to our 6.4 release. MAP_STACK Indicate that the mapping is used as a stack. This flag must be used in combination with MAP_ANON and MAP_PRIVATE. Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20181019125239.GA13884@humpty.home.comstyle.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-11-05option: Make option help nicer to readMax Reitz
This adds some whitespace into the option help (including indentation) and puts angle brackets around the type names. Furthermore, the list name is no longer printed as part of every line, but only once in advance, and only if the caller did not print a caption already. This patch also restores the description alignment we had before commit 9cbef9d68ee1d8d0, just at 24 instead of 16 characters like we used to. This increase is because now we have the type and two spaces of indentation before the description, and with a usual type name length of three chracters, this sums up to eight additional characters -- which means that we now need 24 characters to get the same amount of padding for most options. Also, 24 is a third of 80, which makes it kind of a round number in terminal terms. Finally, this patch amends the reference output of iotest 082 to match the changes (and thus makes it pass again). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-29bitmap: Update count after a mergeEric Blake
We need an accurate count of the number of bits set in a bitmap after a merge. In particular, since the merge operation short-circuits a merge from an empty source, if you have bitmaps A, B, and C where B started empty, then merge C into B, and B into A, an inaccurate count meant that A did not get the contents of C. In the worst case, we may falsely regard the bitmap as empty when it has had new writes merged into it. Fixes: be58721db CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181002233314.30159-1-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29dirty-bitmap: make it possible to restore bitmap after mergeVladimir Sementsov-Ogievskiy
Add backup parameter to bdrv_merge_dirty_bitmap() to be used then with bdrv_restore_dirty_bitmap() if it needed to restore the bitmap after merge operation. This is needed to implement bitmap merge transaction action in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>
2018-10-29util: aio-posix: fix a typoLi Qiang
Cc: qemu-trivial@nongnu.org Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1538964972-3223-1-git-send-email-liq3ea@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-10-23Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-10-22' ↵Peter Maydell
into staging Error reporting patches for 2018-10-22 # gpg: Signature made Mon 22 Oct 2018 13:20:23 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2018-10-22: (40 commits) error: Drop bogus "use error_setg() instead" admonitions vpc: Fail open on bad header checksum block: Clean up bdrv_img_create()'s error reporting vl: Simplify call of parse_name() vl: Fix exit status for -drive format=help blockdev: Convert drive_new() to Error vl: Assert drive_new() does not fail in default_drive() fsdev: Clean up error reporting in qemu_fsdev_add() spice: Clean up error reporting in add_channel() tpm: Clean up error reporting in tpm_init_tpmdev() numa: Clean up error reporting in parse_numa() vnc: Clean up error reporting in vnc_init_func() ui: Convert vnc_display_init(), init_keyboard_layout() to Error ui/keymaps: Fix handling of erroneous include files vl: Clean up error reporting in device_init_func() vl: Clean up error reporting in parse_fw_cfg() vl: Clean up error reporting in mon_init_func() vl: Clean up error reporting in machine_set_property() vl: Clean up error reporting in chardev_init_func() qom: Clean up error reporting in user_creatable_add_opts_foreach() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-10-19error: Drop bogus "use error_setg() instead" admonitionsMarkus Armbruster
Commit 97f40301f1d "error: Functions to report warnings and informational messages" copied the "use error_setg() instead" admonition from the error reporting functions to new functions even though it doesn't actually apply there. Drop it. Also drop it from vreport(), where it doesn't apply anymore. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181019123923.26649-1-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-19error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster
From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19qemu-timer: avoid checkpoints for virtual clock timers in external subsystemsArtem Pisarenko
Adds EXTERNAL attribute definition to qemu timers subsystem and assigns it to virtual clock timers, used in slirp (ICMP IPv6) and ui (key queue). Virtual clock processing in rr mode can use this attribute instead of a separate clock type. Fixes: 87f4fe7653baf55b5c2f2753fe6003f473c07342 Fixes: 775a412bf83f6bc0c5c02091ee06cf649b34c593 Fixes: 9888091404a702d7ec79d51b088d994b9fc121bd Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com> Message-Id: <e771f96ab94e86b54b9a783c974f2af3009fe5d1.1539764043.git.artem.k.pisarenko@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19qemu-timer: introduce timer attributesArtem Pisarenko
Attributes are simple flags, associated with individual timers for their whole lifetime. They intended to be used to mark individual timers for special handling when they fire. New/init functions family in timer interface updated and refactored (new 'attribute' argument added, timer_list replaced with timer_list_group+type combinations, comments improved to avoid info duplication). Also existing aio interface extended with attribute-enabled variants of functions, which create/initialize timers. Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com> Message-Id: <f47b81dbce734e9806f9516eba8ca588e6321c2f.1539764043.git.artem.k.pisarenko@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19Revert some patches from recent [PATCH v6] "Fixing record/replay and adding ↵Artem Pisarenko
reverse debugging" That patch series introduced new virtual clock type for use in external subsystems. It breaks desired behavior in non-record/replay usage scenarios due to a small change to existing behavior. Processing of virtual timers belonging to new clock type is kicked off to the main loop, which makes these timers asynchronous with vCPU thread and, in icount mode, with whole guest execution. This breaks expected determinism in non-record/replay icount mode of emulation where these "external subsystems" are isolated from the host (i.e. they are external only to guest core, not to the entire emulation environment). Example for slirp ("user" backend for network device): User runs qemu in icount mode with rtc clock=vm without any external communication interfaces but with "-netdev user,restrict=on". It expects deterministic execution, because network services are emulated inside qemu and isolated from host. There are no reasons to get reply from DHCP server with different delay or something like that. The next patches revert reimplements the same changes in a better way. This reverts commit 87f4fe7653baf55b5c2f2753fe6003f473c07342. This reverts commit 775a412bf83f6bc0c5c02091ee06cf649b34c593. This reverts commit 9888091404a702d7ec79d51b088d994b9fc121bd. Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com> Message-Id: <18b1e7c8f155fe26976f91be06bde98eef6f8751.1539764043.git.artem.k.pisarenko@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-05qemu-option: improve qemu_opts_print_help() outputMarc-André Lureau
Modify qemu_opts_print_help(): - to print expected argument type - skip description if not available - sort lines - prefix with the list name (like qdev, to avoid confusion) - drop 16-chars alignment, use a '-' as seperator for option name and description For ex, "-spice help" output is changed from: port No description available tls-port No description available addr No description available [...] gl No description available rendernode No description available to: spice.addr=str spice.agent-mouse=bool (on/off) spice.disable-agent-file-xfer=bool (on/off) [...] spice.x509-key-password=str spice.zlib-glz-wan-compression=str "qemu-img create -f qcow2 -o help", changed from: size Virtual disk size compat Compatibility level (0.10 or 1.1) backing_file File name of a base image [...] lazy_refcounts Postpone refcount updates refcount_bits Width of a reference count entry in bits to: backing_file=str - File name of a base image backing_fmt=str - Image format of the base image cluster_size=size - qcow2 cluster size [...] refcount_bits=num - Width of a reference count entry in bits size=size - Virtual disk size Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05qemu-option: add help fallback to print the list of optionsMarc-André Lureau
QDev options accept 'help' (or '?', but that's problematic with shell globbing) in the list of parameters, which is handy to list the available options. Unfortunately, this isn't built in QemuOpts. qemu_opts_parse_noisily() seems to be the common path for command line options, so place a fallback to print help, listing the available options. This is quite handy, for example with qemu "-spice help". Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05cutils: add qemu_pstrcmp0()Marc-André Lureau
A char** variant of g_strcmp0(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
2018-10-02timer: introduce new virtual clockPavel Dovgalyuk
Slirp and VNC modules use virtual clock for processing some events that are related to the guest execution speed. But virtual clock-related events are consideres to be deterministic and are recorded/replayed by icount mechanism. But slirp and VNC lie outside the recorded guest core (which includes CPU and peripherals). Therefore slirp and VNC are external for the guest, but should work at guest speed. This patch introduces new virtual clock which can be used for external subsystems for running timers that are synchronized with the guest. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Message-Id: <20180912082002.3228.82417.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02util: use fcntl() for qemu_write_pidfile() lockingMarc-André Lureau
Daniel Berrangé suggested to use fcntl() locks rather than lockf(). 'man lockf': On Linux, lockf() is just an interface on top of fcntl(2) locking. Many other systems implement lockf() in this way, but note that POSIX.1 leaves the relationship between lockf() and fcntl(2) locks unspecified. A portable application should probably avoid mixing calls to these interfaces. IOW, if its just a shim around fcntl() on many systems, it is clearer if we just use fcntl() directly, as we then know how fcntl() locks will behave if they're on a network filesystem like NFS. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02util: add qemu_write_pidfile()Marc-André Lureau
There are variants of qemu_create_pidfile() in qemu-pr-helper and qemu-ga. Let's have a common implementation in libqemuutil. The code is initially based from pr-helper write_pidfile(), with various improvements and suggestions from Daniel Berrangé: QEMU will leave the pidfile existing on disk when it exits which initially made me think it avoids the deletion race. The app managing QEMU, however, may well delete the pidfile after it has seen QEMU exit, and even if the app locks the pidfile before deleting it, there is still a race. eg consider the following sequence QEMU 1 libvirtd QEMU 2 1. lock(pidfile) 2. exit() 3. open(pidfile) 4. lock(pidfile) 5. open(pidfile) 6. unlink(pidfile) 7. close(pidfile) 8. lock(pidfile) IOW, at step 8 the new QEMU has successfully acquired the lock, but the pidfile no longer exists on disk because it was deleted after the original QEMU exited. While we could just say no external app should ever delete the pidfile, I don't think that is satisfactory as people don't read docs, and admins don't like stale pidfiles being left around on disk. To make this robust, I think we might want to copy libvirt's approach to pidfile acquisition which runs in a loop and checks that the file on disk /after/ acquiring the lock matches the file that was locked. Then we could in fact safely let QEMU delete its own pidfiles on clean exit.. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02hostmem-memfd: add checks before adding hostmem-memfd & propertiesMarc-André Lureau
Run some memfd-related checks before registering hostmem-memfd & various properties. This will help libvirt to figure out what the host is supposed to be capable of. qemu_memfd_check() is changed to a less optimized version, since it is used with various flags, it no longer caches the result. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180906161415.8543-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02qsp: use atomic64 accessorsEmilio G. Cota
With the seqlock, we either have to use atomics to remain within defined behaviour (and note that 64-bit atomics aren't always guaranteed to compile, irrespective of __nocheck), or drop the atomics and be in undefined behaviour territory. Fix it by dropping the seqlock and using atomic64 accessors. This will limit scalability when !CONFIG_ATOMIC64, but those machines (1) don't have many users and (2) are unlikely to have many cores. - With CONFIG_ATOMIC64: $ tests/atomic_add-bench -n 1 -m -p Throughput: 13.00 Mops/s - Forcing !CONFIG_ATOMIC64: $ tests/atomic_add-bench -n 1 -m -p Throughput: 10.89 Mops/s Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20180910232752.31565-5-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02util: add atomic64Emilio G. Cota
This introduces read/set accessors for int64_t and uint64_t. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20180910232752.31565-3-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02cacheinfo: add i/d cache_linesize_logEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20180910232752.31565-2-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-28Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into stagingPeter Maydell
Queued tcg patches # gpg: Signature made Wed 26 Sep 2018 19:27:22 BST # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20180926: tcg/i386: fix vector operations on 32-bit hosts qht-bench: add -p flag to precompute hash values qht: constify arguments to some internal functions qht: constify qht_statistics_init qht: constify qht_lookup qht: fix comment in qht_bucket_remove_entry qht: drop ht argument from qht iterators test-qht: speed up + test qht_resize test-qht: test deletion of the last entry in a bucket test-qht: test removal of non-existent entries test-qht: test qht_iter_remove qht: add qht_iter_remove qht: remove unused map param from qht_remove__locked Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-28Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into ↵Peter Maydell
staging Block and testing patches - Paolo's AIO fixes. - VMDK streamOptimized corner case fix - VM testing improvment on -cpu # gpg: Signature made Wed 26 Sep 2018 03:54:08 BST # gpg: using RSA key CA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/staging-pull-request: vmdk: align end of file to a sector boundary tests/vm: Use -cpu max rather than -cpu host aio-posix: do skip system call if ctx->notifier polling succeeds aio-posix: compute timeout before polling aio-posix: fix concurrent access to poll_disable_cnt Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-26qht: constify arguments to some internal functionsEmilio G. Cota
These functions do not modify their @ht or @bucket arguments. Constify those arguments. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: constify qht_statistics_initEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: constify qht_lookupEmilio G. Cota
seqlock_read_begin takes a const param since c04649eeea ("seqlock: constify seqlock_read_begin", 2018-08-23), so we can constify the entire lookup. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: fix comment in qht_bucket_remove_entryEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: drop ht argument from qht iteratorsEmilio G. Cota
Accessing the HT from an iterator results almost always in a deadlock. Given that only one qht-internal function uses this argument, drop it from the interface. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: add qht_iter_removeEmilio G. Cota
This currently has no users, but the use case is so common that I think we must support it. Note that without the appended we cannot safely remove a set of elements; a 2-step approach (i.e. qht_iter first, keep track of the to-be-deleted elements, and then a bunch of qht_remove calls) would be racy, since between the iteration and the removals other threads might insert additional elements. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26qht: remove unused map param from qht_remove__lockedEmilio G. Cota
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26aio-posix: do skip system call if ctx->notifier polling succeedsPaolo Bonzini
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when 2018-08-15), by not reporting progress, causes aio_poll to execute the system call when polling succeeds because of ctx->notifier. This introduces latency before the call to aio_bh_poll() and negates the advantages of polling, unfortunately. The fix builds on the previous patch, separating the effect of polling on the timeout from the progress reported to aio_poll(). ctx->notifier does zero the timeout, causing the caller to skip the system call, but it does not report progress, so that the bug fix of commit 70232b5253 still stands. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-4-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: compute timeout before pollingPaolo Bonzini
This is a preparation for the next patch, and also a very small optimization. Compute the timeout only once, before invoking try_poll_mode, and adjust it in run_poll_handlers. The adjustment is the polling time when polling fails, or zero (non-blocking) if polling succeeds. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-3-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: fix concurrent access to poll_disable_cntPaolo Bonzini
It is valid for an aio_set_fd_handler to happen concurrently with aio_poll. In that case, poll_disable_cnt can change under the heels of aio_poll, and the assertion on poll_disable_cnt can fail in run_poll_handlers. Therefore, this patch simply checks the counter on every polling iteration. There are no particular needs for ordering, since the polling loop is terminated anyway by aio_notify at the end of aio_set_fd_handler. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-2-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-25Merge remote-tracking branch ↵Peter Maydell
'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging - Deprecate the usage of a network backend via "name" instead of "id" - Deprecate the "enforce-config-section" machine parameter - Re-enable the wdt_ib700, endianness and vmxnet3 qtests - Some trivial fixes and doc update patches that crossed my way # gpg: Signature made Tue 25 Sep 2018 16:58:42 BST # gpg: using RSA key 2ED9D774FE702DB5 # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" # gpg: aka "Thomas Huth <thuth@redhat.com>" # gpg: aka "Thomas Huth <huth@tuxfamily.org>" # gpg: aka "Thomas Huth <th.huth@posteo.de>" # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/huth-gitlab/tags/pull-request-2018-09-25: Revert "check: Move VMXNET3 test to common" Revert "check: Move endianess test to common" Revert "check: Move wdt_ib700 test to common" tests/migration: Speed up the test on ppc64 hw/qdev-core: Fix description of instance_init qdev: fix a typo in comment docs: Fix some typos (most found by codespell) trivial: Make bios files and source files non-executable memfd: fix possible usage of the uninitialized file descriptor hw/core/machine: Officially deprecate the enforce-config-section parameter net/slirp: Deprecate the [hub_id name] parameter tuple net: Deprecate the "name" parameter of -net Makefile: Add missing dependency for qemu-deprecated.texi Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25memfd: fix possible usage of the uninitialized file descriptorDima Stepanov
The qemu_memfd_alloc_check() routine allocates the fd variable on stack. This variable is initialized inside the qemu_memfd_alloc() function. There are several cases when *fd will be left unintialized which can lead to the unexpected close() in the qemu_memfd_free() call. Set file descriptor to -1 before calling the qemu_memfd_alloc routine. Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25block: Use a single global AioWaitKevin Wolf
When draining a block node, we recurse to its parent and for subtree drains also to its children. A single AIO_WAIT_WHILE() is then used to wait for bdrv_drain_poll() to become true, which depends on all of the nodes we recursed to. However, if the respective child or parent becomes quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent is checked, while AIO_WAIT_WHILE() depends on the AioWait of the original node. Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE(). This may mean that the draining thread gets a few more unnecessary wakeups because an unrelated operation got completed, but we already wake it up when something _could_ have changed rather than only if it has certainly changed. Apart from that, drain is a slow path anyway. In theory it would be possible to use wakeups more selectively and still correctly, but the gains are likely not worth the additional complexity. In fact, this patch is a nice simplification for some places in the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block: Add missing locking in bdrv_co_drain_bh_cb()Kevin Wolf
bdrv_do_drained_begin/end() assume that they are called with the AioContext lock of bs held. If we call drain functions from a coroutine with the AioContext lock held, we yield and schedule a BH to move out of coroutine context. This means that the lock for the home context of the coroutine is released and must be re-acquired in the bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cbSergio Lopez
AIO Coroutines shouldn't by managed by an AioContext different than the one assigned when they are created. aio_co_enter avoids entering a coroutine from a different AioContext, calling aio_co_schedule instead. Scheduled coroutines are then entered by co_schedule_bh_cb using qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the current AioContext obtained with qemu_get_current_aio_context. Eventually, co->ctx will be set to the AioContext passed as an argument to qemu_aio_coroutine_enter. This means that, if an IO Thread's AioConext is being processed by the Main Thread (due to aio_poll being called with a BDS AioContext, as it happens in AIO_WAIT_WHILE among other places), the AioContext from some coroutines may be wrongly replaced with the one from the Main Thread. This is the root cause behind some crashes, mainly triggered by the drain code at block/io.c. The most common are these abort and failed assertion: util/async.c:aio_co_schedule 456 if (scheduled) { 457 fprintf(stderr, 458 "%s: Co-routine was already scheduled in '%s'\n", 459 __func__, scheduled); 460 abort(); 461 } util/qemu-coroutine-lock.c: 286 assert(mutex->holder == self); But it's also known to cause random errors at different locations, and even SIGSEGV with broken coroutine backtraces. By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can pass the correct AioContext as an argument, making sure co->ctx is not wrongly altered. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-24qemu-error: add {error, warn}_report_once_condCornelia Huck
Add two functions to print an error/warning report once depending on a passed-in condition variable and flip it if printed. This is useful if you want to print a message not once-globally, but e.g. once-per-device. Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced with warn_report_once_cond(). Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20180830145902.27376-2-cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Function comments reworded] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-27Merge remote-tracking branch ↵Peter Maydell
'remotes/kraxel/tags/ui-20180827-v4-pull-request' into staging ui: misc fixes which piled up during 3.0 release freeze # gpg: Signature made Mon 27 Aug 2018 09:53:07 BST # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/ui-20180827-v4-pull-request: util: promote qemu_egl_rendernode_open() to libqemuutil dmabuf: add y0_top, pass it to spice ui/vnc: Remove useless parenthesis around DIV_ROUND_UP macro ui/sdl2: Fix broken -full-screen CLI option spice-display: fix qemu_spice_cursor_refresh_bh locking spice-display: access ptr_x/ptr_y under Mutex vnc: remove support for deprecated tls, x509, x509verify options doc: switch to modern syntax for VNC TLS setup sdl2: redraw correctly when scanout_mode enabled. ui: use enum to string helpers vnc: fix memleak of the "vnc-worker-output" name ui/sdl2: Remove the obsolete SDL_INIT_NOPARACHUTE flag Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-27util: promote qemu_egl_rendernode_open() to libqemuutilMarc-André Lureau
vhost-user-gpu will share the same code to open a DRM node. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180713130916.4153-20-marcandre.lureau@redhat.com> [ kraxel: buildfix: util/drm.o must be CONFIG_OPENGL not CONFIG_LINUX ] Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-08-24json: Reject invalid UTF-8 sequencesMarkus Armbruster
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1, \xF5..\xFF in the lexer. That's insufficient; there's plenty of invalid UTF-8 not containing these bytes, as demonstrated by check-qjson: * Malformed sequences - Unexpected continuation bytes - Missing continuation bytes after start bytes other than \xC0..\xC1, \xF5..\xFD. * Overlong sequences with start bytes other than \xC0..\xC1, \xF5..\xFD. * Invalid code points Fixing this in the lexer would be bothersome. Fixing it in the parser is straightforward, so do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-23-armbru@redhat.com>
2018-08-23util/oslib-win32: indicate alignment for qemu_anon_ram_alloc()David Hildenbrand
Let's set the alignment just like for the posix variant. This will implicitly set the alignment of the underlying memory region and therefore make memory_region_get_alignment(mr) return something > 0 for all memory backends applicable to PCDIMM/NVDIMM. The allocation granularity is ususally 64k, while the page size is 4k. The documentation of VirtualAlloc is not really comprehensible in case only MEM_COMMIT is specified without an address. We'll detect the actual values and then go for the bigger one. The expection is, that it will always be 64k aligned. (The assumption is that MEM_COMMIT does an implicit MEM_RESERVE, so the address will always be aligned to the allocation granularity. And the allocation granularity is always bigger than the page size). This will allow us to drop special handling in pc.c for memory_region_get_alignment(mr) == 0, as we can then assume that it is always set (and AFAICS >= getpagesize()). For pc in pc_memory_plug(), under Windows TARGET_PAGE_SIZE == getpagesize(), therefore alignment of DIMMs will not change, and therefore also not the guest physical memory layout. For spapr in spapr_memory_plug(), an alignment of 0 would have been used until now. As QEMU_ALIGN_UP will crash with the alignment being 0, this never worked, so we don't have to care about compatibility handling. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-3-david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23module: Use QEMU_MODULE_DIR as a search pathryang
The current paths for modules are CONFIG_QEMU_MODDIR and paths relative to the executable. Qemu and its modules can be installed and executed in paths that are different from these search paths. This change allows a search path to be specified by environment variable. An example usage for this is postmarketOS[1]. This is a build environment for Alpine Linux. It sets up Alpine Linux in a chroot environment. Alpine's Qemu packages are installed in the chroot. The Alpine Linux Qemu package is used to test compiled Alpine Linux system images. This way there isn't a reliance on the which ever version of Qemu the host system / distro provides. postmarketOS executes Qemu on host system outside of the chroot The Qemu module search path needs to point to the location of the chroot relative to the host system. e.g. The root of the Alpine Linux chroot is: ~/.local/var/pmbootstrap/chroot_native/ Alpine's Qemu is installed at ~/.local/var/pmbootstrap/chroot_native/usr/bin/ The Qemu module search path needs to be: QEMU_MODULE_DIR=~/.local/var/pmbootstrap/chroot_native/usr/lib/qemu/ [1] https://postmarketos.org/ Signed-off-by: ryang <decatf@gmail.com> Message-Id: <20180704181010.GA918@computer> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23qsp: track BQL callers explicitlyEmilio G. Cota
The BQL is acquired via qemu_mutex_lock_iothread(), which makes the profiler assign the associated wait time (i.e. most of BQL wait time) entirely to that function. This loses the original call site information, which does not help diagnose BQL contention. Fix it by tracking the callers explicitly. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23qsp: support call site coalescingEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23qsp: add qsp_resetEmilio G. Cota
I first implemented this by deleting all entries in the global hash table. But doing that safely slows down profiling, since we'd need to introduce rcu_read_lock/unlock in the fast path. What's implemented here avoids messing with the thread-local data in the global hash table. It achieves this by taking a snapshot of the current state, so that subsequent reports present the delta wrt to the snapshot. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23qsp: add sort_by option to qsp_reportEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23qsp: QEMU's Synchronization ProfilerEmilio G. Cota
The goal of this module is to profile synchronization primitives (i.e. mutexes, recursive mutexes and condition variables) so that scalability issues can be quickly diagnosed. Sync primitives are profiled by QSP based on the vaddr of the object accessed as well as the call site (file:line_nr). That means the same object called from two different call sites will be tracked in separate entries, which might be reported together or separately (see subsequent commit on call site coalescing). Some perf numbers: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz Command: taskset -c 0 tests/atomic_add-bench -d 5 -m - Before: 54.80 Mops/s - After: 54.75 Mops/s That is, a negligible slowdown due to the now indirect call to qemu_mutex_lock. Note that using a branch instead of an indirect call introduces a more severe slowdown (53.65 Mops/s, i.e. 2% slowdown). Enabling the profiler (with -p, added in this series) is more interesting: - No profiling: 54.75 Mops/s - W/ profiling: 12.53 Mops/s That is, a 4.36X slowdown. We can break down this slowdown by removing the get_clock calls or the entry lookup: - No profiling: 54.75 Mops/s - W/o get_clock: 25.37 Mops/s - W/o entry lookup: 19.30 Mops/s - W/ profiling: 12.53 Mops/s Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>