summaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)Author
2021-09-15block: Clarify that @bytes is no limit on *pnumHanna Reitz
.bdrv_co_block_status() implementations are free to return a *pnum that exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp *pnum as necessary. On the other hand, if drivers' implementations return values for *pnum that are as large as possible, our recently introduced block-status cache will become more effective. So, make a note in block_int.h that @bytes is no upper limit for *pnum. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210812084148.14458-4-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2021-09-15block: block-status cache for data regionsHanna Reitz
As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15block: Drop BDS comment regarding bdrv_append()Hanna Reitz
There is a comment above the BDS definition stating care must be taken to consider handling newly added fields in bdrv_append(). Actually, this comment should have said "bdrv_swap()" as of 4ddc07cac (nine years ago), and in any case, bdrv_swap() was dropped in 8e419aefa (six years ago). So no such care is necessary anymore. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-2-hreitz@redhat.com>
2021-09-01block/block-copy: block_copy_state_new(): drop extra argumentsVladimir Sementsov-Ogievskiy
The only caller pass copy_range and compress both false. Let's just drop these arguments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-35-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/backup: move cluster size calculation to block-copyVladimir Sementsov-Ogievskiy
The main consumer of cluster-size is block-copy. Let's calculate it here instead of passing through backup-top. We are going to publish copy-before-write filter soon, so it will be created through options. But we don't want for now to make explicit option for cluster-size, let's continue to calculate it automatically. So, now is the time to get rid of cluster_size argument for bdrv_cbw_append(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-10-vsementsov@virtuozzo.com> [hreitz: Add qemu/error-report.h include to block/block-copy.c] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/block-copy: introduce block_copy_set_copy_opts()Vladimir Sementsov-Ogievskiy
We'll need a possibility to set compress and use_copy_range options after initialization of the state. So make corresponding part of block_copy_state_new() separate and public. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-8-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block-copy: move detecting fleecing scheme to block-copyVladimir Sementsov-Ogievskiy
We want to simplify initialization interface of copy-before-write filter as we are going to make it public. So, let's detect fleecing scheme exactly in block-copy code, to not pass this information through extra levels. Why not just set BDRV_REQ_SERIALISING unconditionally: because we are going to implement new more efficient fleecing scheme which will not rely on backing feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block: introduce bdrv_replace_child_bs()Vladimir Sementsov-Ogievskiy
Add function to transactionally replace bs inside BdrvChild. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-2-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-07-26hw/nvme: use symbolic names for registersKlaus Jensen
Add the NvmeBarRegs enum and use these instead of explicit register offsets. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-07-26hw/nvme: split pmrmsc register into upper and lowerKlaus Jensen
The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to make up the 64 bit logical PMRMSC register. Make it so. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-07-21iothread: add aio-max-batch parameterStefano Garzarella
The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-10Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section # gpg: Signature made Fri 09 Jul 2021 13:49:22 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) block: Make blockdev-reopen stable API iotests: Test reopening multiple devices at the same time block: Support multiple reopening with x-blockdev-reopen block: Acquire AioContexts during bdrv_reopen_multiple() block: Add bdrv_reopen_queue_free() qcow2: Fix dangling pointer after reopen for 'file' qemu-img: Improve error for rebase without backing format qemu-img: Require -F with -b backing image qcow2: Prohibit backing file changes in 'qemu-img amend' blockdev: fix drive-backup transaction endless drained section vhost-user: Fix backends without multiqueue support MAINTAINERS: add block/rbd.c reviewer block/rbd: fix type of task->complete iotests/fuse-allow-other: Test allow-other iotests/308: Test +w on read-only FUSE exports export/fuse: Let permissions be adjustable export/fuse: Give SET_ATTR_SIZE its own branch export/fuse: Add allow-other option export/fuse: Pass default_permissions for mount util/uri: do not check argument of uri_free() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-09block: Acquire AioContexts during bdrv_reopen_multiple()Kevin Wolf
As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block: Add bdrv_reopen_queue_free()Alberto Garcia
Move the code to free a BlockReopenQueue to a separate function. It will be used in a subsequent patch. [ kwolf: Also free explicit_options and options, and explicitly qobject_ref() the value when it continues to be used. This makes future memory leaks less likely. ] Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-05util/async: add a human-readable name to BHs for debuggingStefan Hajnoczi
It can be difficult to debug issues with BHs in production environments. Although BHs can usually be identified by looking up their ->cb() function pointer, this requires debug information for the program. It is also not possible to print human-readable diagnostics about BHs because they have no identifier. This patch adds a name to each BH. The name is not unique per instance but differentiates between cb() functions, which is usually enough. It's done by changing aio_bh_new() and friends to macros that stringify cb. The next patch will use the name field when reporting leaked BHs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210414200247.917496-2-stefanha@redhat.com>
2021-07-02Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups # gpg: Signature made Wed 30 Jun 2021 17:00:55 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) vhost-user-blk: Implement reconnection during realize vhost-user-blk: Factor out vhost_user_blk_realize_connect() vhost: Distinguish errors in vhost_dev_get_config() vhost-user-blk: Add Error parameter to vhost_user_blk_start() vhost: Return 0/-errno in vhost_dev_init() vhost: Distinguish errors in vhost_backend_init() vhost: Add Error parameter to vhost_dev_init() block/ssh: add support for sha256 host key fingerprints block/commit: use QEMU_AUTO_VFREE introduce QEMU_AUTO_VFREE iotests: Test replacing files with x-blockdev-reopen block: Allow changing bs->file on reopen block: BDRVReopenState: drop replace_backing_bs field block: move supports_backing check to bdrv_set_file_or_backing_noperm() block: bdrv_reopen_parse_backing(): simplify handling implicit filters block: bdrv_reopen_parse_backing(): don't check frozen child block: bdrv_reopen_parse_backing(): don't check aio context block: introduce bdrv_set_file_or_backing_noperm() block: introduce bdrv_remove_file_or_backing_child() block: comment graph-modifying function not updating permissions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-30Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into ↵Peter Maydell
staging hw/nvme patches * namespace eui64 support (Heinrich) * aiocb refactoring (Klaus) * controller parameter for auto zone transitioning (Niklas) * misc fixes and additions (Gollu, Klaus, Keith) # gpg: Signature made Tue 29 Jun 2021 19:46:55 BST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: (23 commits) hw/nvme: add 'zoned.zasl' to documentation hw/nvme: fix pin-based interrupt behavior (again) hw/nvme: fix missing check for PMR capability hw/nvme: documentation fix hw/nvme: fix endianess conversion and add controller list Partially revert "hw/block/nvme: drain namespaces on sq deletion" hw/nvme: reimplement format nvm to allow cancellation hw/nvme: reimplement zone reset to allow cancellation hw/nvme: reimplement the copy command to allow aio cancellation hw/nvme: add dw0/1 to the req completion trace event hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check hw/nvme: remove assert from nvme_get_zone_by_slba hw/nvme: save reftag when generating pi hw/nvme: reimplement dsm to allow cancellation hw/nvme: add nvme_block_status_all helper hw/nvme: reimplement flush to allow cancellation hw/nvme: default for namespace EUI-64 hw/nvme: namespace parameter for EUI-64 hw/nvme: fix csi field for cns 0x00 and 0x11 hw/nvme: add param to control auto zone transitioning to zone state closed ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-29block: Allow changing bs->file on reopenAlberto Garcia
When the x-blockdev-reopen was added it allowed reconfiguring the graph by replacing backing files, but changing the 'file' option was forbidden. Because of this restriction some operations are not possible, notably inserting and removing block filters. This patch adds support for replacing the 'file' option. This is similar to replacing the backing file and the user is likewise responsible for the correctness of the resulting graph, otherwise this can lead to data corruption. Signed-off-by: Alberto Garcia <berto@igalia.com> [vsementsov: bdrv_reopen_parse_file_or_backing() is modified a lot] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610120537.196183-9-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29block: BDRVReopenState: drop replace_backing_bs fieldVladimir Sementsov-Ogievskiy
It's used only in bdrv_reopen_commit(). "backing" is covered by the loop through all children except for case when we removed backing child during reopen. Make it more obvious and drop extra boolean field: qdict_del will not fail if there is no such entry. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610120537.196183-8-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29hw/nvme: fix endianess conversion and add controller listGollu Appalanaidu
Add the controller identifiers list CNS 0x13, available list of ctrls in NVM Subsystem that may or may not be attached to namespaces. In Identify Ctrl List of the CNS 0x12 and 0x13 no endian conversion for the nsid field. These two CNS values shows affect when there exists a Subsystem. Added condition if there is no Subsystem return invalid field in command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-06-29hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_checkKlaus Jensen
The nvme_check_prinfo() and nvme_dif_check() functions operate on the 16 bit "control" member of the NvmeCmd. These functions do not otherwise operate on an NvmeCmd or an NvmeRequest, so change them to expect the actual 4 bit PRINFO field and add constants that work on this field as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-06-29hw/nvme: add identify namespace flbas/mc enumsGollu Appalanaidu
Add enums for the Identify Namespace FLBAS and MC fields. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: squashed separate flbas/mc commits into one] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-06-28Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell
staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-25block-copy: atomic .cancelled and .finished fields in BlockCopyCallStateEmanuele Giuseppe Esposito
By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block: add max_hw_transfer to BlockLimitsPaolo Bonzini
For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18nbd/client-connection: add option for non-blocking connection attemptVladimir Sementsov-Ogievskiy
We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: return only one io channelVladimir Sementsov-Ogievskiy
block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: implement connection retryVladimir Sementsov-Ogievskiy
Add an option for a thread to retry connecting until it succeeds. We'll use nbd/client-connection both for reconnect and for initial connection in nbd_open(), so we need a possibility to use same NBDClientConnection instance to connect once in nbd_open() and then use retry semantics for reconnect. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-21-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: add possibility of negotiationVladimir Sementsov-Ogievskiy
Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd: move connection code from block/nbd to nbd/client-connectionVladimir Sementsov-Ogievskiy
We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18async: the main AioContext is only "current" if under the BQLPaolo Bonzini
If we want to wake up a coroutine from a worker thread, aio_co_wake() currently does not work. In that scenario, aio_co_wake() calls aio_co_enter(), but there is no current AioContext and therefore qemu_get_current_aio_context() returns the main thread. aio_co_wake() then attempts to call aio_context_acquire() instead of going through aio_co_schedule(). The default case of qemu_get_current_aio_context() was added to cover synchronous I/O started from the vCPU thread, but the main and vCPU threads are quite different. The main thread is an I/O thread itself, only running a more complicated event loop; the vCPU thread instead is essentially a worker thread that occasionally calls qemu_mutex_lock_iothread(). It is only in those critical sections that it acts as if it were the home thread of the main AioContext. Therefore, this patch detaches qemu_get_current_aio_context() from iothreads, which is a useless complication. The AioContext pointer is stored directly in the thread-local variable, including for the main loop. Worker threads (including vCPU threads) optionally behave as temporary home threads if they have taken the big QEMU lock, but if that is not the case they will always schedule coroutines on remote threads via aio_co_schedule(). With this change, the stub qemu_mutex_iothread_locked() must be changed from true to false. The previous value of true was needed because the main thread did not have an AioContext in the thread-local variable, but now it does have one. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210609122234.544153-1-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message per Vladimir's review] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-04vl: plumb keyval-based options into -readconfigPaolo Bonzini
Let -readconfig support parsing command line options into QDict or QemuOpts. This will be used to add back support for objects in -readconfig. Cc: Markus Armbruster <armbru@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210524105752.3318299-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-02block: drop BlockBackendRootState::read_onlyVladimir Sementsov-Ogievskiy
Instead of keeping additional boolean field, let's store the information in BDRV_O_RDWR bit of BlockBackendRootState::open_flags. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02block: drop BlockDriverState::read_onlyVladimir Sementsov-Ogievskiy
This variable is just a cache for !(bs->open_flags & BDRV_O_RDWR), which we have to synchronize everywhere. Let's just drop it and consistently use bdrv_is_read_only(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crashVladimir Sementsov-Ogievskiy
Commit 3ca1f3225727419ba573673b744edac10904276f "block: BdrvChildClass: add .get_parent_aio_context handler" introduced new handler and commit 228ca37e12f97788e05bd0c92f89b3e5e4019607 "block: drop ctx argument from bdrv_root_attach_child" made a generic use of it. But 3ca1f3225727419ba573673b744edac10904276f didn't update child_vvfat_qcow. Fix that. Before that fix the command ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: 1 bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 2 bdrv_attach_child_common (child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, perm=3, shared_perm=4, opaque=0x559d62445690, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2795 3 bdrv_attach_child_noperm (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2855 4 bdrv_attach_child (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, errp=0x7ffc74c2ae60) at ../block.c:2953 5 bdrv_open_child (filename=0x559d62464b80 "/var/tmp/vl.h3TIS4", options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target", parent=0x559d62445690, child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffc74c2ae60) at ../block.c:3351 6 enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at ../block/vvfat.c:3176 7 vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650, errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236 8 bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0 <bdrv_vvfat>, node_name=0x0, options=0x559d6244adb0, open_flags=155650, errp=0x7ffc74c2af70) at ../block.c:1557 9 bdrv_open_common (bs=0x559d62445690, file=0x0, options=0x559d6244adb0, errp=0x7ffc74c2af70) at ... (gdb) fr 1 #1 0x0000559d603ea3bf in bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 1440 return c->klass->get_parent_aio_context(c); (gdb) p c->klass $1 = (const BdrvChildClass *) 0x559d60c58d20 <child_vvfat_qcow> (gdb) p c->klass->get_parent_aio_context $2 = (AioContext *(*)(BdrvChild *)) 0x0 Fixes: 3ca1f3225727419ba573673b744edac10904276f Fixes: 228ca37e12f97788e05bd0c92f89b3e5e4019607 Reported-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-26replication: move include out of root directoryPaolo Bonzini
The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-18Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into ↵Peter Maydell
staging emulated nvme updates * various fixes (Gollu Appalanaidu) * refactoring (me) * move to hw/nvme from hw/block (me) # gpg: Signature made Mon 17 May 2021 10:16:01 BST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: hw/nvme: move nvme emulation out of hw/block hw/block/nvme: move zoned constraints checks hw/block/nvme: remove irrelevant zone resource checks hw/block/nvme: remove num_namespaces member hw/block/nvme: streamline namespace array indexing hw/block/nvme: add metadata offset helper hw/block/nvme: cache lba and ms sizes hw/block/nvme: replace nvme_ns_status hw/block/nvme: remove non-shared defines from header file hw/block/nvme: cleanup includes hw/block/nvme: consolidate header files hw/block/nvme: rename __nvme_select_ns_iocs hw/block/nvme: rename __nvme_advance_zone_wp hw/block/nvme: rename __nvme_zrm_open hw/block/nvme: align with existing style hw/block/nvme: function formatting fix hw/block/nvme: fix io-command set profile feature hw/block/nvme: consider metadata read aio return value in compare hw/block/nvme: rename reserved fields declarations hw/block/nvme: remove redundant invalid_lba_range trace Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-17hw/block/nvme: align with existing styleGollu Appalanaidu
While QEMU coding style prefers lowercase hexadecimals in constants, the NVMe subsystem uses the format from the NVMe specifications in comments, i.e. 'h' suffix instead of '0x' prefix. Fix this up across the code base. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: updated message; added conversion in a couple of missing comments] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-05-17hw/block/nvme: rename reserved fields declarationsGollu Appalanaidu
Align the 'rsvd1' reserved field declaration in NvmeBar with existing style. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: minor commit message fixup] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-05-14write-threshold: deal with includesVladimir Sementsov-Ogievskiy
"qemu/typedefs.h" is enough for include/block/write-threshold.h header with forward declaration of BlockDriverState. Also drop extra includes from block/write-threshold.c and tests/unit/test-write-threshold.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210506090621.11848-9-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14block/write-threshold: drop extra APIsVladimir Sementsov-Ogievskiy
bdrv_write_threshold_exceeded() is unused. bdrv_write_threshold_is_set() is used only to double check the value of bs->write_threshold_offset in tests. No real sense in it (both tests do check real value with help of bdrv_write_threshold_get()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted commit message as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14block: drop write notifiersVladimir Sementsov-Ogievskiy
They are unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14block/write-threshold: don't use write notifiersVladimir Sementsov-Ogievskiy
write-notifiers are used only for write-threshold. New code for such purpose should create filters. Let's better special-case write-threshold and drop write notifiers at all. (Actually, write-threshold is special-cased anyway, as the only user of write-notifiers) So, create a new direct interface for bdrv_co_write_req_prepare() and drop all write-notifier related logic from write-threshold.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted comment as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14mirror: stop cancelling in-flight requests on non-force cancel in READYVladimir Sementsov-Ogievskiy
If mirror is READY than cancel operation is not discarding the whole result of the operation, but instead it's a documented way get a point-in-time snapshot of source disk. So, we should not cancel any requests if mirror is READ and force=false. Let's fix that case. Note, that bug that we have before this commit is not critical, as the only .bdrv_cancel_in_flight implementation is nbd_cancel_in_flight() and it cancels only requests waiting for reconnection, so it should be rare case. Fixes: 521ff8b779b11c394dbdc43f02e158dd99df308a Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210421075858.40197-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-04-30block: Add BDRV_O_NO_SHARE for blk_new_open()Kevin Wolf
Normally, blk_new_open() just shares all permissions. This was fine originally when permissions only protected against uses in the same process because no other part of the code would actually get to access the block nodes opened with blk_new_open(). However, since we use it for file locking now, unsharing permissions becomes desirable. Add a new BDRV_O_NO_SHARE flag that is used in blk_new_open() to unshare any permissions that can be unshared. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210422164344.283389-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30block: refactor bdrv_child_set_perm_safe() transaction actionVladimir Sementsov-Ogievskiy
Old interfaces dropped, nobody directly calls bdrv_child_set_perm_abort() and bdrv_child_set_perm_commit(), so we can use personal state structure for the action and stop exploiting BdrvChild structure. Also, drop "_safe" suffix which is redundant now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-35-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30block: bdrv_reopen_multiple: refresh permissions on updated graphVladimir Sementsov-Ogievskiy
Move bdrv_reopen_multiple to new paradigm of permission update: first update graph relations, then do refresh the permissions. We have to modify reopen process in file-posix driver: with new scheme we don't have prepared permissions in raw_reopen_prepare(), so we should reconfigure fd in raw_check_perm(). Still this seems more native and simple anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-31-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30block: make bdrv_refresh_limits() to be a transaction actionVladimir Sementsov-Ogievskiy
To be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30block: introduce bdrv_drop_filter()Vladimir Sementsov-Ogievskiy
Using bdrv_replace_node() for removing filter is not good enough: it keeps child reference of the filter, which may conflict with original top node during permission update. Instead let's create new interface, which will do all graph modifications first and then update permissions. Let's modify bdrv_replace_node_common(), allowing it additionally drop backing chain child link pointing to new node. This is quite appropriate for bdrv_drop_intermediate() and makes possible to add new bdrv_drop_filter() as a simple wrapper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-24-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30block: make bdrv_reopen_{prepare,commit,abort} privateVladimir Sementsov-Ogievskiy
These functions are called only from bdrv_reopen_multiple() in block.c. No reason to publish them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-8-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>