summaryrefslogtreecommitdiff
path: root/block/quorum.c
AgeCommit message (Collapse)Author
2014-09-22block: Rename qemu_aio_release -> qemu_aio_unrefFam Zheng
Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22quorum: Convert quorum_aiocb_info.cancel to .cancel_asyncFam Zheng
Before, we cancel all the child requests with bdrv_aio_cancel, then free the acb.. Now we just kick off asynchronous cancellation of child requests and return, we know quorum_aio_cb will be called later, so in the end quorum_aio_finalize will take care of calling the caller's cb. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22quorum: fix quorum_aio_cancel()Liu Yuan
For a fifo read pattern, we only have one running aio (possible other cases that has less number than num_children in the future), so we need to check if .acb is NULL against bdrv_aio_cancel() to avoid segfault. Cc: Eric Blake <eblake@redhat.com> Cc: Benoit Canet <benoit@irqsave.net> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29quorum: Fix leak of opts in quorum_openFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29block/quorum: add simple read pattern supportLiu Yuan
This patch adds single read pattern to quorum driver and quorum vote is default pattern. For now we do a quorum vote on all the reads, it is designed for unreliable underlying storage such as non-redundant NFS to make sure data integrity at the cost of the read performance. For some use cases as following: VM -------------- | | v v A B Both A and B has hardware raid storage to justify the data integrity on its own. So it would help performance if we do a single read instead of on all the nodes. Further, if we run VM on either of the storage node, we can make a local read request for better performance. This patch generalize the above 2 nodes case in the N nodes. That is, vm -> write to all the N nodes, read just one of them. If single read fails, we try to read next node in FIFO order specified by the startup command. The 2 nodes case is very similar to DRBD[1] though lack of auto-sync functionality in the single device/node failure for now. But compared with DRBD we still have some advantages over it: - Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed storage. And if A crashes, we need to restart all the VMs on node B. But for practice case, we can't because B might not have enough resources to setup 20 VMs at once. So if we run our 20 VMs with quorum driver, and scatter the replicated images over the data center, we can very likely restart 20 VMs without any resource problem. After all, I think we can build a more powerful replicated image functionality on quorum and block jobs(block mirror) to meet various High Availibility needs. E.g, Enable single read pattern on 2 children, -drive driver=quorum,children.0.file.filename=0.qcow2,\ children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1 [1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device [Dropped \n from an error_setg() error message --Stefan] Cc: Benoit Canet <benoit@irqsave.net> Cc: Eric Blake <eblake@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-20quorum: Implement bdrv_refresh_filename()Max Reitz
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27quorum: Add the rewrite-corrupted parameter to quorumBenoît Canet
On read operations when this parameter is set and some replicas are corrupted while quorum can be reached quorum will proceed to rewrite the correct version of the data to fix the corrupted replicas. This will shine with SSD where the FTL will remap the same block at another place on rewrite. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-23qapi event: convert QUORUM eventsWenchao Xia
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-04quorum: implement .bdrv_detach/attach_aio_context()Stefan Hajnoczi
Implement .bdrv_detach/attach_aio_context() interfaces to propagate detach/attach to BDRVQuorumState->bs[] children. The block layer takes care of ->file and ->backing_hd but doesn't know about our ->bs[] BlockDriverStates, which is also part of the graph. Reviewed-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-25Use error_is_set() only when necessary (again)Markus Armbruster
error_is_set(&var) is the same as var != NULL, but it takes whole-program analysis to figure that out. Unnecessarily hard for optimizers, static checkers, and human readers. Commit 84d18f0 dumbed it down to obvious, but a few more have crept in since, and documentation was overlooked. Dumb these down, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-19block: Add error handling to bdrv_invalidate_cache()Kevin Wolf
If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-13block: Rewrite the snapshot authorization mechanism for block filters.Benoît Canet
This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-28qmp: Make Quorum error events more palatable.Benoît Canet
Insert quorum QMP events documentation alphabetically. Also change the "ret" errno value by an optional "error" being an strerror(-ret) in the QUORUM_REPORT_BAD qmp event. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-21quorum: Simplify quorum_open()Max Reitz
Although it may not look like it, this patch simplifies quorum_open(). qdict_array_split() is now able to return QLists with different objects than only QDicts, therefore it will now do all the work and quorum_open() does not have to handle reference strings by itself. This allows mixing full option dicts and reference strings for specifying the child block devices of quorum; furthermore, it improves handling of malformed specifications. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_open() and quorum_close().Benoît Canet
Example of command line: -drive if=virtio,driver=quorum,\ children.0.file.filename=1.raw,\ children.0.node-name=1.raw,\ children.0.driver=raw,\ children.1.file.filename=2.raw,\ children.1.node-name=2.raw,\ children.1.driver=raw,\ children.2.file.filename=3.raw,\ children.2.node-name=3.raw,\ children.2.driver=raw,\ vote-threshold=2 blkverify=on with vote-threshold=2 and two files can be passed to emulate blkverify. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.Benoît Canet
This is used to activate quorum snapshot. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_co_flush().Benoît Canet
Makes a vote to select error if any. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_invalidate_cache().Benoît Canet
We really want that live migration works with quorum so implement quorum_invalidate_cache(). Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_getlength().Benoît Canet
Check that every bs file returns the same length. Otherwise, return -EIO to disable the quorum and avoid length discrepancy. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum mechanism.Benoît Canet
This patchset enables the core of the quorum mechanism. The num_children reads are compared to get the majority version and if this version exists more than threshold times the guest won't see the error at all. If a block is corrupted or if an error occurs during an IO or if the quorum cannot be established QMP events are used to report to the management. Use gnutls's SHA-256 to compare versions. --enable-quorum must be used to enable the feature. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_aio_readv.Benoît Canet
Add code to do num_children reads in parallel and cleanup the structures afterwards. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_aio_writev and its dependencies.Benoît Canet
Writes are mirrored num_children times on num_children devices. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Create BDRVQuorumState and BlkDriver and do init.Benoît Canet
Create the structure holding the quorum settings and write the minimal block driver instanciation boilerplate. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.Benoît Canet
Quorum is a block filter mirroring writes to num_children children. For reads quorum reads each children and does a vote. If more than vote_threshold versions are identical the quorum is reached and this winning version is returned to the guest. So quorum prevents bit corruption. For high availability purpose minority errors are reported via QMP but the guest does not see them. This patch creates the driver C source file and introduces the structures that will be used in asynchronous reads and writes. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>