Age | Commit message (Collapse) | Author |
|
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.
For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().
Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):
mirror coroutine scsi-disk coroutine
-------------------------------------------------------------
do last write
qcow2:qemu_co_mutex_lock()
...
scsi disk read
tracked request begin
qcow2:qemu_co_mutex_lock.enter
qcow2:qemu_co_mutex_unlock()
bdrv_drain
while (has tracked request)
aio_poll()
In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).
With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:
mirror coroutine scsi-disk coroutine
-------------------------------------------------------------
do last write
qcow2:qemu_co_mutex_lock()
...
scsi disk read
tracked request begin
qcow2:qemu_co_mutex_lock.enter
qcow2:qemu_co_mutex_unlock()
bdrv_drain.enter
> schedule BH
> qemu_coroutine_yield()
> qcow2:qemu_co_mutex_lock.return
> ...
tracked request end
...
(resumed from BH callback)
bdrv_drain.return
...
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.
At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.
This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
All users are converted to bdrv_parse_cache_mode() now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.
Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.
This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.
Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't
included in the flags, but returned as a separate bool.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
For a couple of releases we have been warning
Encrypted images are deprecated
Support for them will be removed in a future release.
You can use 'qemu-img convert' to convert your image to an unencrypted one.
This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.
The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.
When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.
This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info
This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
The function is unused since commit f21d96d0 ('block: Use BdrvChild in
BlockBackend').
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
|
|
The call in hmp_drive_del() is dead code because blk_remove_bs() is
called a few lines above. The only other remaining user is
bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the
named nodes list. This path inlines the list entry removal into
bdrv_delete() and removes bdrv_make_anon().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
qemu-common.h should only be included by .c files. Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."
qemu/iov.h includes qemu-common.h for QEMUIOVector stuff. Move all
that to qemu/iov.h and drop the ill-advised include. Include
qemu/iov.h where the QEMUIOVector stuff is now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Re-run scripts/clean-includes to apply the previous commit's
corrections and updates. Besides redundant qemu/typedefs.h, this only
finds a redundant config-host.h include in ui/egl-helpers.c. No idea
how that escaped the previous runs.
Some manual whitespace trimming around dropped includes squashed in.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add a function for iterating over all monitor-owned BlockDriverStates so
the generic block layer can do so.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-6-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.
Also two long lines (bdrv_get_dirty) are wrapped.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
Following patches to refactor and move block dirty bitmap code could use
this.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
This patch adds an option to the drive_add HMP command to create only a
BlockDriverState without a BlockBackend on top.
The motivation for this is that libvirt needs to specify options to a
migration target (specifically, detect-zeroes). drive-mirror doesn't
allow specifying options, and the proper way to do this is to create the
target BDS separately with blockdev-add (where you can specify options)
and then use blockdev-mirror to that BDS.
However, libvirt can't use blockdev-add as long as it is still
experimental, and we're expecting that it will still take some time, so
we need to resort to drive_add.
The problem with drive_add is that so far it always created a BB, and
BDSes with a BB can't be used as a mirroring target as long as we don't
support multiple BBs per BDS - and while we're working towards that
goal, it's another thing that will still take some time.
So to achieve the goal, the simplest solution to provide the
functionality now without adding one-off options to the mirror QMP
commands is to extend drive_add to create nodes without BBs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Since commit 91a097e, we end up with a somewhat weird cache mode
configuration with snapshot=on: The commit broke the cache mode
inheritance for the snapshot overlay so that it is opened as
writethrough instead of unsafe now. The following bdrv_append() call to
put it on top of the tree swaps the WCE flag with the snapshot's backing
file (i.e. the originally given file), so what we eventually get is
cache=writeback on the temporary overlay and
cache=writethrough,cache.no-flush=on on the real image file.
This patch changes things so that the temporary overlay gets
cache=unsafe again like it used to, and the real images get whatever the
user specified. This means that cache.direct is now respected even with
snapshot=on, and in the case of committing changes, the final flush is
no longer ignored except explicitly requested by the user.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
NB: If this commit breaks compilation for your out-of-tree
patchseries or fork, then you need to make sure you add
#include "qemu/osdep.h" to any new .c files that you have.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
|
|
This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
With a mirror job running on a virtio-blk dataplane disk, sending "q" to
HMP will cause a dead loop in block_job_finish_sync.
This is because the aio_poll() only processes the AIO context of bs
which has no more work to do, while the main loop BH that is scheduled
for setting the job->completed flag is never processed.
Fix this by adding a flag in BlockJob structure, to track which context
to poll for the block job to make progress. Its value is set to true
when block_job_coroutine_complete() is called, and is checked in
block_job_finish_sync to determine which context to poll.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1454379144-29807-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
This fixes a regression introduced with commit 3f09bfbc7. Multiple
bugs arise in conjunction with live snapshots and mirroring operations
(which include active layer commit).
After a live snapshot occurs, the active layer and the base layer both
have a non-NULL tqe_prev field in the device_list, although the base
node's tqe_prev field points to a NULL entry. This non-NULL tqe_prev
field occurs after the bdrv_append() in the external snapshot calls
change_parent_backing_link().
In change_parent_backing_link(), when the previous active layer is
removed from device_list, the device_list.tqe_prev pointer is not
set to NULL.
The operating scheme in the block layer is to indicate that a BDS belongs
in the bdrv_states device_list iff the device_list.tqe_prev pointer
is non-NULL.
This patch does two things:
1.) Introduces a new block layer helper bdrv_device_remove() to remove a
BDS from the device_list, and
2.) uses that new API, which also fixes the regression once used in
change_parent_backing_link().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.
Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.
The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
As a side effect, we can now make x-blockdev-del's check whether a BDS
is actually owned by the monitor explicit.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
We need this list so that bdrv_close_all() can keep track of which BDSs
are still open after having removed the BDSs from all of the BBs and
having released all monitor BDS references.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
There are no users of bdrv_close() left, except for one of bdrv_open()'s
failure paths, bdrv_close_all() and bdrv_delete(), and that is good.
Make bdrv_close() static so nobody makes the mistake of directly using
bdrv_close() again.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
It is unused now, so we can remove it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Pull out the check whether a block device has a tray from
blk_dev_is_tray_open() into its own function so both attributes (whether
there is a tray vs. whether that tray is open) can be queried
independently.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1454096953-31773-2-git-send-email-mreitz@redhat.com
|
|
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.
This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.
A typical migration looks like this now with respect to disk images:
1. Destination qemu is started, the image is opened with
BDRV_O_INACTIVE. The image is fully opened on the source.
2. Migration is about to complete. The source flushes the image and
inactivates it. Now both sides have the image opened with
BDRV_O_INACTIVE and are expecting the other side to still modify it.
3. One side (the destination on success) continues and calls
bdrv_invalidate_all() in order to take ownership of the image again.
This removes BDRV_O_INACTIVE on the resuming side; the flag remains
set on the other side.
This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
|
|
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
|
|
Rename the parameter "close" to "close_fn" to disambiguous with
close(2).
This unifies error handling paths of NBDClient allocation:
nbd_client_new will shutdown the socket and call the "close_fn" callback
if negotiation failed, so the caller don't need a different path than
the normal close.
The returned pointer is never used, make it void in preparation for the
next patch.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1452760863-25350-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450932306-13717-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
It's necessary to distinguish source and target before we can add
blockdev-mirror, because we would want a concrete type of operation to
check on target bs before starting.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450932306-13717-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
|
|
The maximum number of struct iovec elements depends on the
BlockDriverState. The raw-posix and iSCSI protocols have a maximum of
IOV_MAX but others could have different values.
Cc: Peter Lieven <pl@kamp.de>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
The function has gone.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
bs->options doesn't only contain options that the user explicitly
requested, but also option that were derived from flags, the filename or
inherited from the parent node.
For reopen, it is important to know the difference because reopening the
parent can change inherited values in child nodes, but it shouldn't
change any options that were explicitly specified for the child.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
Options are not actually inherited from the parent node yet, but this
commit lays the grounds for doing so.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
|
|
In order to decide whether a blkdebug: filename can be produced or a
json: one is necessary, blkdebug checked whether bs->options had more
options than just "config", "x-image" or "image" (the latter including
nested options). That doesn't work well when generic block layer options
are present.
This patch passes an option QDict to the driver that contains only
driver-specific options, i.e. the options for the general block layer as
well as child nodes are already filtered out. Works much better this
way.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
|