path: root/block.h
Age | Commit message | Author
2009-09-11 | block: add aio_flush operation | Christoph Hellwig
Instead of stalling the VCPU while serving a cache flush, try to do it asynchronously. Use our good old helper thread pool to issue an asynchronous fdatasync for raw-posix. Note that while Linux AIO implements a fdatasync operation, it is not useful for us because it isn't actually implemented in an asynchronous fashion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 | block: add enable_write_cache flag | Christoph Hellwig
Add an enable_write_cache flag in the block driver state, and use it to decide if we claim to have a volatile write cache that needs controlled flushing from the guest. The flag is off if cache=writethrough is defined because O_DSYNC guarantees that every write goes to stable storage, and it is on for cache=none and cache=writeback. Both scsi-disk and ide now use the new flag, changing from their defaults of always off (ide) or always on (scsi-disk). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
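As a rough illustration of the rule in this commit message, the flag can be derived from the cache mode alone. This is only a sketch: the `CacheMode` enum and `write_cache_enabled()` are hypothetical names chosen here, not identifiers taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum mirroring the three cache= values named above. */
typedef enum {
    CACHE_WRITETHROUGH,
    CACHE_NONE,
    CACHE_WRITEBACK
} CacheMode;

/*
 * Return true when the guest should be told that a volatile write
 * cache exists, i.e. when the guest has to issue flushes itself.
 */
bool write_cache_enabled(CacheMode mode)
{
    assert(mode == CACHE_WRITETHROUGH || mode == CACHE_NONE ||
           mode == CACHE_WRITEBACK);
    /* With writethrough, O_DSYNC already sends every write to stable storage. */
    return mode != CACHE_WRITETHROUGH;
}
```

A device model would consult such a predicate when deciding whether to advertise a write cache to the guest.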
2009-09-11 | Add bdrv_aio_multiwrite | Kevin Wolf
One performance problem of qcow2 during the initial image growth is sequential writes that are not cluster aligned. In this case, when a first request needs to allocate a new cluster but writes only to the first couple of sectors in that cluster, the rest of the cluster is zeroed, just to be overwritten by the following second request that fills up the cluster. Let's try to merge sequential write requests to the same cluster, so we can avoid writing the zero padding to the disk in the first place. As a nice side effect, other formats also benefit from dealing with fewer and larger requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
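The merging idea can be sketched in a few lines of C. This is a simplified illustration, not the code from the patch: `WriteReq` and `merge_writes()` are hypothetical names, data buffers are left out, only exactly adjacent requests are merged, and the requests are assumed not to overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request descriptor: start sector and length in sectors. */
typedef struct {
    int64_t sector;
    int nb_sectors;
} WriteReq;

/* qsort comparator: order requests by start sector. */
static int cmp_req(const void *a, const void *b)
{
    const WriteReq *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/*
 * Sort the requests and fold each request that starts exactly where the
 * previous one ends into that previous request. Returns the new count.
 */
int merge_writes(WriteReq *reqs, int n)
{
    int out = 0;

    if (n <= 0) {
        return 0;
    }
    assert(reqs != NULL);
    qsort(reqs, (size_t)n, sizeof(*reqs), cmp_req);

    for (int i = 1; i < n; i++) {
        if (reqs[out].sector + reqs[out].nb_sectors == reqs[i].sector) {
            reqs[out].nb_sectors += reqs[i].nb_sectors;  /* extend the run */
        } else {
            reqs[++out] = reqs[i];                       /* start a new run */
        }
    }
    return out + 1;
}
```

With requests at sectors 0, 8 and 32 (8 sectors each), the first two collapse into one 16-sector request and the third stays separate.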
2009-08-27 | raw-posix: add Linux native AIO support | Christoph Hellwig
Now that we have a nicer interface to work against we can add Linux native AIO support. It's an extremely thin layer just setting up an iocb for the io_submit system call in the submission path, and registering an eventfd with the qemu poll handler to complete the iocbs directly from there. This started out based on Anthony's earlier AIO patch, but after an estimated 42,000 rewrites and just as many build system changes there's not much left of it. To enable native kernel aio use the aio=native sub-command on the drive command line. I have also added an option to qemu-io to test the aio support without needing a guest. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 | replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstate | Christoph Hellwig
The VM state offset is a concept internal to the image format. Replace the old bdrv_{get,put}_buffer methods, which require an index into the image file that is constructed from the VM state offset and an offset into the vmstate, with bdrv_{load,save}_vmstate, which just take an offset into the VM state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
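The difference between the two interface styles can be shown with a toy in-memory image. Everything here is a hypothetical sketch (the `Image` type and both function names are invented for illustration); it only demonstrates that the caller of the newer style never sees the VM state offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory image: guest data first, VM state after it. */
typedef struct {
    uint8_t *data;
    size_t size;
    size_t vm_state_offset;   /* known only to the "format driver" */
} Image;

/* Old style: the caller passes an absolute position in the image file. */
int image_put_buffer(Image *img, const uint8_t *buf, size_t file_pos, size_t len)
{
    assert(img != NULL && buf != NULL);
    if (file_pos > img->size || len > img->size - file_pos) {
        return -1;
    }
    memcpy(img->data + file_pos, buf, len);
    return (int)len;
}

/* New style: the caller passes only a position inside the VM state. */
int image_save_vmstate(Image *img, const uint8_t *buf, size_t pos, size_t len)
{
    assert(img != NULL);
    if (img->vm_state_offset > img->size ||
        pos > img->size - img->vm_state_offset) {
        return -1;
    }
    /* The translation to a file position stays inside the driver. */
    return image_put_buffer(img, buf, img->vm_state_offset + pos, len);
}
```

Keeping the translation in one place means callers cannot compute a wrong file index from a stale or format-specific offset.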
2009-07-09 | Revert "support colon in filenames" | Anthony Liguori
This reverts commit 707c0dbc97cddfe8d2441b8259c6c526d99f2dd8. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 | qcow2: Make cache=writethrough default | Kevin Wolf
The performance of qcow2 has improved meanwhile, so we don't need to special-case it any more. Switch the default to write-through caching like all other block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 | support colon in filenames | Ram Pai
Problem: It is impossible to feed filenames containing the colon character because qemu interprets such names as a protocol. For example, the filename scsi:0 is interpreted as a protocol by the name "scsi". This patch allows the user to escape colon characters. For example, the above filename can now be expressed either as 'scsi\:0' or as file:scsi:0; anything following the "file:" tag is interpreted verbatim. However, if the "file:" tag is omitted, then any colon characters in the string must be escaped using a backslash. Here are a couple of examples:
scsi\:0\:abc is a local file scsi:0:abc
http\://myweb is a local file by name http://myweb
file:scsi:0:abc is a local file scsi:0:abc
file:http://myweb is a local file by name http://myweb
Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
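The escaping rules listed in this message can be expressed as a small parser. The following is a sketch of those rules only, under the assumption that nothing else in the name is special; `resolve_local_filename()` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Resolve `name` to a local file path in `out`.
 * Returns false when the name is not resolved as a local file: either an
 * unescaped colon makes it look like a protocol, or `out` is too small.
 */
bool resolve_local_filename(const char *name, char *out, size_t out_size)
{
    size_t n = 0;

    assert(name != NULL && out != NULL);
    if (out_size == 0) {
        return false;
    }

    /* Everything after a leading "file:" tag is taken verbatim. */
    if (strncmp(name, "file:", 5) == 0) {
        size_t len = strlen(name + 5);
        if (len >= out_size) {
            return false;
        }
        memcpy(out, name + 5, len + 1);
        return true;
    }

    for (const char *p = name; *p != '\0'; p++) {
        char c = *p;
        if (c == '\\' && p[1] == ':') {
            c = ':';          /* escaped colon becomes a literal colon */
            p++;
        } else if (c == ':') {
            return false;     /* unescaped colon: protocol-style name */
        }
        if (n + 1 >= out_size) {
            return false;
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return true;
}
```

Note that in C source the backslash itself has to be doubled, so the name `scsi\:0` is written as the literal `"scsi\\:0"`.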
2009-06-16 | Prevent CD-ROM media eject while device is locked | Mark McLoughlin
Section 10.8.25 ("START/STOP UNIT Command") of SFF-8020i states that we should refuse to eject if the device is locked. ASC_MEDIA_REMOVAL_PREVENTED is the appropriate return in this case. In order to stop itself from ejecting the media it is running from, Fedora's installer (anaconda) requires the CDROMEJECT ioctl() to fail if the drive has been previously locked. See also https://bugzilla.redhat.com/501412 Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-22 | Convert all block drivers to new bdrv_create | Kevin Wolf
Now we can make use of the newly introduced option structures. Instead of having bdrv_create carry more and more parameters (which are format specific in most cases), just pass an option structure as defined by the driver itself. bdrv_create2() contains an emulation of the old interface to simplify the transition. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-14 | Convert block infrastructure to use new module init functionality | Anthony Liguori
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-04-21 | Introduce bdrv_check (Kevin Wolf) | aliguori
From: Kevin Wolf <kwolf@redhat.com> Introduce a new bdrv_check function pointer for block drivers. Modify qcow2 to return an error status in check_refcounts(), so it can implement bdrv_check. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7214 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-07 | remove bdrv_aio_read/bdrv_aio_write (Christoph Hellwig) | aliguori
Always use the vectored APIs to reduce code churn once we switch the BlockDriver API to be vectored. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7019 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-05 | Fix savevm after BDRV_FILE size enforcement | aliguori
We now enforce that you cannot write beyond the end of a non-growable file. qcow2 files are not growable but we rely on them being growable to do savevm/loadvm. Temporarily allow them to be growable by introducing a new API specifically for savevm read/write operations. Reported-by: malc Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6994 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-28 | block: support known backing format for image create and open (Uri Lublin) | aliguori
Added a backing_format field to BlockDriverState. Added bdrv_create2 and drv->bdrv_create2 to create an image with a known backing file format. Upon bdrv_open2 if backing format is known use it, instead of probing the (backing) image. Signed-off-by: Uri Lublin <uril@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6908 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-28 | new scsi-generic abstraction, use SG_IO (Christoph Hellwig) | aliguori
Okay, I started looking into how to handle scsi-generic I/O in the new world order. I think the best approach is to use the SG_IO ioctl instead of the read/write interface, as that allows us to support scsi passthrough on disk/cdrom devices, too. See Hannes' patch on the kvm list from August for an example. Now that we always do ioctls we don't need any abstraction other than bdrv_ioctl for the synchronous requests for now, and for asynchronous requests I've added an aio_ioctl abstraction, keeping it simple. Long-term we might want to move the ops to a higher-level abstraction and let the low-level code fill out the request header, but I'm lazy enough to leave that to the people trying to support scsi-passthrough on a non-Linux OS. Tested lightly by issuing various sg_ commands from sg3-utils in a guest to a host CDROM device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6895 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-12 | Add specialized block driver scsi generic API (Avi Kivity) | aliguori
When a scsi device is backed by a scsi generic device instead of an ordinary host block device, the block API is abused in a couple of annoying ways:
- nb_sectors is negative, and specifies a byte count instead of a sector count
- offset is ignored, since scsi-generic is essentially a packet protocol
This overloading makes hacking the block layer difficult. Remove it by introducing a new explicit API for scsi-generic devices. The new API is still backed by the old implementation, but at least the users are insulated.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6822 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-11 | Revert r6405 | aliguori
This series is broken by design as it requires expensive IO operations at open time causing very long delays when starting a virtual machine for the first time. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6815 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-11 | Revert r6407 | aliguori
This series is broken by design as it requires expensive IO operations at open time causing very long delays when starting a virtual machine for the first time. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6813 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-05 | monitor: Rework API (Jan Kiszka) | aliguori
Refactor the monitor API and prepare it for decoupled terminals: term_print functions are renamed to monitor_* and all monitor services gain a new parameter (mon) that will eventually refer to the monitor instance the output is supposed to appear on. However, the argument remains unused for now. All monitor command callbacks are also extended by a mon parameter so that command handlers are able to pass an appropriate reference to monitor output services. For the cases where monitor output so far happens without a clearly identifiable context, the global variable cur_mon is introduced, which will eventually provide a pointer either to the currently active monitor (while processing commands) or to the default one. In the medium or long term, those use cases will become obsolete so that this variable can be removed again. Due to the broad usage of the monitor interface, this patch mostly deals with converting users of the monitor API. A few of them are already extended to pass 'mon' from the command handler further down to internal functions that invoke monitor_printf. On this occasion, monitor-related prototypes are moved from console.h to a new monitor.h. The same is done for the readline API. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6711 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-05 | monitor: Rework early disk password inquiry (Jan Kiszka) | aliguori
Reading the passwords for encrypted hard disks during early startup is broken (I guess for quite a while now):
- No monitor terminal is ready for input at this point
- Forcing all mux'ed terminals into monitor mode can confuse other users of those channels
To overcome these issues and to lay the ground for a clean decoupling of monitor terminals, this patch changes the initial password inquiry as follows:
- Prevent autostart if there is some encrypted disk
- Once the user tries to resume the VM, prompt for all missing passwords
- Only resume if all passwords were accepted
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6707 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-05 | block: Introduce bdrv_get_encrypted_filename (Jan Kiszka) | aliguori
Introduce bdrv_get_encrypted_filename service to allow more informative password prompting. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6704 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-05 | block: Improve bdrv_iterate (Jan Kiszka) | aliguori
Make bdrv_iterate more useful by passing the BlockDriverState to the iterator instead of the device name. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6703 c046a42c-6fe2-441c-8c8c-71466251a162
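Why passing the state object is more useful than passing a name can be seen in a small iterator sketch. The `Device` type, `device_iterate()` and `count_encrypted()` are hypothetical stand-ins invented for this example; they are not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device state object in a linked list. */
typedef struct Device {
    const char *name;
    int encrypted;
    struct Device *next;
} Device;

typedef void DeviceIterator(void *opaque, Device *dev);

/* Call `it` once per device, handing over the object itself. */
void device_iterate(Device *first, DeviceIterator *it, void *opaque)
{
    assert(it != NULL);
    for (Device *d = first; d != NULL; d = d->next) {
        it(opaque, d);
    }
}

/* Example callback: reads a field directly, with no lookup by name. */
void count_encrypted(void *opaque, Device *dev)
{
    if (dev->encrypted) {
        ++*(int *)opaque;
    }
}
```

A callback that only received the device name would first have to look the object up again before it could inspect any of its fields.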
2009-01-22 | qcow2 format: keep 'num_free_bytes', and show it upon 'info blockstats' (Uri Lublin) | aliguori
'num_free_bytes' is the number of non-allocated bytes below highest-allocation. It's useful, together with the highest-allocation, to figure out how fragmented the image is, and how likely it is to run out of space soon. For example, when the highest allocation is high (almost end-of-disk), but many bytes (clusters) are free and can be re-allocated when needed, then we know it's probably not going to reach end-of-disk-space soon.
Added bookkeeping to block-qcow2.c
Export it using BlockDeviceInfo
Show it upon 'info blockstats' if BlockDeviceInfo exists
Signed-off-by: Uri Lublin <uril@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6407 c046a42c-6fe2-441c-8c8c-71466251a162
2009-01-22 | block-qcow2: export highest_allocated through BlockDriverInfo and get_info() (Uri Lublin) | aliguori
Signed-off-by: Uri Lublin <uril@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6405 c046a42c-6fe2-441c-8c8c-71466251a162
2009-01-22 | Vectored block device API (Avi Kivity) | aliguori
Most devices that are capable of DMA are also capable of scatter-gather. With the memory mapping API, this means that the device code needs to be able to access discontiguous host memory regions. For block devices, this translates to vectored I/O. This patch implements an asynchronous vectored interface for the qemu block devices. At the moment all I/O is bounced and submitted through the non-vectored API; in the future we will convert block devices to natively support vectored I/O wherever possible. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6397 c046a42c-6fe2-441c-8c8c-71466251a162
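"Bouncing" here means copying the scattered segments through one contiguous buffer so that a non-vectored call can be used. The sketch below shows that copy in both directions using the POSIX `struct iovec`; the helper names `iov_to_bounce()` and `bounce_to_iov()` are assumptions of this example, not functions from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Gather all segments into one freshly allocated buffer (write path).
 * Stores the total length in *total; returns NULL if allocation fails.
 * The caller frees the buffer.
 */
void *iov_to_bounce(const struct iovec *iov, int niov, size_t *total)
{
    size_t len = 0, off = 0;
    unsigned char *buf;

    assert(iov != NULL && total != NULL);
    for (int i = 0; i < niov; i++) {
        len += iov[i].iov_len;
    }
    buf = malloc(len ? len : 1);
    if (buf == NULL) {
        return NULL;
    }
    for (int i = 0; i < niov; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *total = len;
    return buf;
}

/* Scatter a contiguous buffer back into the segments (read path). */
size_t bounce_to_iov(const void *buf, size_t len,
                     const struct iovec *iov, int niov)
{
    const unsigned char *src = buf;
    size_t off = 0;

    for (int i = 0; i < niov && off < len; i++) {
        size_t n = iov[i].iov_len;
        if (n > len - off) {
            n = len - off;
        }
        memcpy(iov[i].iov_base, src + off, n);
        off += n;
    }
    return off;   /* number of bytes copied */
}
```

The extra copy is the cost the message refers to; a driver with native vectored support would pass the segment list straight through instead.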
2008-12-04 | Use writeback caching by default with qcow2 | aliguori
qcow2 writes a cluster reference count on every cluster update. This causes performance to crater when using anything but cache=writeback. This is most noticeable when using savevm. Right now, qcow2 isn't a reliable format regardless of the type of cache you're using because metadata is not updated in the correct order. Considering this, I think it's somewhat reasonable to use writeback caching by default with qcow2 files. It at least avoids the massive performance regression for users until we sort out the issues in qcow2. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5879 c046a42c-6fe2-441c-8c8c-71466251a162
2008-11-25 | Abstract out geometry detection code from IDE for reuse | aliguori
Virtio will want to use the geometry detection code. It doesn't belong in ide.c anyway. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5797 c046a42c-6fe2-441c-8c8c-71466251a162
2008-11-08 | Use an option rom instead of boot sector for -kernel | aliguori
Generate an option rom instead of using a hijacked boot sector for kernel booting. This just requires adding a small option ROM header and a few more instructions to the boot sector to take over the int19 vector and run our boot code. A disk is no longer needed when using -kernel on x86. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5650 c046a42c-6fe2-441c-8c8c-71466251a162
2008-10-14 | Expand cache= option and use write-through caching by default | aliguori
This patch changes the cache= option to accept none, writeback, or writethrough to control the host page cache behavior. By default, writethrough caching is now used, which internally is implemented by using O_DSYNC to open the disk images. When using -snapshot, writeback is used by default since data integrity is not at all an issue. cache=none has the same behavior as cache=off previously. The latter syntax is still supported but now deprecated. I also cleaned up the O_DIRECT implementation to avoid many of the #ifdefs. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5485 c046a42c-6fe2-441c-8c8c-71466251a162
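The mapping from cache= values to open(2) flags can be sketched as below. The message states that writethrough uses O_DSYNC; treating cache=none (and the deprecated cache=off) as O_DIRECT is this example's reading of the message, and `cache_mode_to_open_flags()` is a hypothetical helper rather than code from the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* no O_DIRECT here: cache=none adds no flag */
#endif

/*
 * Add the open(2) flags implied by a cache= value to *flags.
 * Returns 0 on success, -1 for an unknown mode.
 */
int cache_mode_to_open_flags(const char *mode, int *flags)
{
    assert(mode != NULL && flags != NULL);

    if (strcmp(mode, "writethrough") == 0) {
        *flags |= O_DSYNC;     /* each write reaches stable storage */
    } else if (strcmp(mode, "none") == 0 || strcmp(mode, "off") == 0) {
        *flags |= O_DIRECT;    /* bypass the host page cache */
    } else if (strcmp(mode, "writeback") == 0) {
        /* plain buffered I/O: no extra flag */
    } else {
        return -1;
    }
    return 0;
}
```

A caller would start from something like `O_RDWR`, apply the helper, and pass the resulting flags to `open()`.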
2008-10-06 | Add bdrv_flush_all() | aliguori
This patch adds a bdrv_flush_all() function. It's necessary to ensure that all IO operations have been flushed to disk before completing a live migration. N.B. we don't actually use this now. We really should flush the block drivers using a live savevm callback to avoid unnecessary guest down time. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5432 c046a42c-6fe2-441c-8c8c-71466251a162
2008-09-22 | Refactor AIO to allow multiple AIO implementations | aliguori
This patch refactors the AIO layer to allow multiple AIO implementations. It's only possible because of the recent signalfd() patch. Right now, the AIO infrastructure is pretty specific to the block raw backend. For other block devices to implement AIO, the qemu_aio_wait function must support registration. This patch introduces a new function, qemu_aio_set_fd_handler, which can be used to register a file descriptor to be called back. qemu_aio_wait() now polls a set of file descriptors registered with this function until one becomes readable or writable. This patch should allow the implementation of alternative AIO backends (via a thread pool or linux-aio) and AIO backends in non-traditional block devices (like NBD). Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5297 c046a42c-6fe2-441c-8c8c-71466251a162
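The register-then-wait pattern described in this message can be sketched with poll(2). This is an illustration under stated assumptions, not the QEMU implementation: the names `set_fd_handler()`, `wait_and_dispatch()` and `PipeWatcher` are invented here, the table is a fixed-size array, and poll() is this example's choice of multiplexing call.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_HANDLERS 16

typedef void IOHandler(void *opaque);

typedef struct {
    int fd;
    IOHandler *io_read;
    IOHandler *io_write;
    void *opaque;
} FdHandler;

static FdHandler handlers[MAX_HANDLERS];
static int nb_handlers;

/* Register, replace, or (with both callbacks NULL) remove a handler. */
int set_fd_handler(int fd, IOHandler *io_read, IOHandler *io_write, void *opaque)
{
    int i;

    for (i = 0; i < nb_handlers && handlers[i].fd != fd; i++) {
    }
    if (io_read == NULL && io_write == NULL) {
        if (i < nb_handlers) {
            handlers[i] = handlers[--nb_handlers];
        }
        return 0;
    }
    if (i == nb_handlers) {
        if (nb_handlers == MAX_HANDLERS) {
            return -1;
        }
        nb_handlers++;
    }
    handlers[i].fd = fd;
    handlers[i].io_read = io_read;
    handlers[i].io_write = io_write;
    handlers[i].opaque = opaque;
    return 0;
}

/* Poll all registered fds once; returns callbacks run, or -1 on error. */
int wait_and_dispatch(int timeout_ms)
{
    struct pollfd pfds[MAX_HANDLERS];
    FdHandler snap[MAX_HANDLERS];   /* callbacks may edit the live table */
    int n = nb_handlers, ran = 0;

    for (int i = 0; i < n; i++) {
        snap[i] = handlers[i];
        pfds[i].fd = snap[i].fd;
        pfds[i].events = (short)((snap[i].io_read ? POLLIN : 0) |
                                 (snap[i].io_write ? POLLOUT : 0));
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (snap[i].io_read && (pfds[i].revents & (POLLIN | POLLHUP | POLLERR))) {
            snap[i].io_read(snap[i].opaque);
            ran++;
        }
        if (snap[i].io_write && (pfds[i].revents & POLLOUT)) {
            snap[i].io_write(snap[i].opaque);
            ran++;
        }
    }
    return ran;
}

/* Example read callback: consume one byte from a pipe and count it. */
typedef struct {
    int fd;
    int bytes_seen;
} PipeWatcher;

void pipe_watcher_read(void *opaque)
{
    PipeWatcher *w = opaque;
    char c;

    if (read(w->fd, &c, 1) == 1) {
        w->bytes_seen++;
    }
}
```

Any backend that can expose completion as a readable or writable file descriptor (a pipe, an eventfd, a socket) can plug into such a loop, which is the property the message relies on.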
2008-09-15 | Use common objects for qemu-img and qemu-nbd | aliguori
Right now, we sprinkle #if defined(QEMU_IMG) && defined(QEMU_NBD) all over the code. It's ugly and causes us to have to build multiple object files for linking against qemu and the tools. This patch introduces a new file, qemu-tool.c which contains enough for qemu-img, qemu-nbd, and QEMU to all share the same objects. This also required getting qemu-nbd to be a bit more Windows friendly. I also changed the Windows block-raw to use normal IO instead of overlapping IO since we don't actually do AIO yet on Windows. I changed the various #if 0's to #if WIN32_AIO to make it easier for someone to eventually fix AIO on Windows. After this patch, there are no longer any #ifdef's related to qemu-img and qemu-nbd. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5226 c046a42c-6fe2-441c-8c8c-71466251a162
2008-09-10 | Use signalfd() to work around signal/select race | aliguori
This patch introduces signalfd() to work around the signal/select race in checking for AIO completions. For platforms that don't support signalfd(), we emulate it with threads. There was a long discussion about this approach. I don't believe there are any fundamental problems with this approach and I believe eliminating the use of signals is a good thing. I've tested Windows and Linux using Windows and Linux guests. I've also checked for disk IO performance regressions. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5187 c046a42c-6fe2-441c-8c8c-71466251a162
2008-07-03 | Allow QEMU to connect directly to an NBD server, by Laurent Vivier. | ths
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4838 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-05 | New qemu-img convert -B option, by Marc Bevand. | ths
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4672 c046a42c-6fe2-441c-8c8c-71466251a162
2008-03-11 | Revert fix for CVE-2008-0928. Will be fixed in a different way later. | aurel32
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4041 c046a42c-6fe2-441c-8c8c-71466251a162
2008-03-11 | Fix CVE-2008-0928 - insufficient block device address range checking | aurel32
Qemu 0.9.1 and earlier does not perform range checks for block device read or write requests, which allows guest users with root privileges to access arbitrary memory and escape the virtual machine. git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4037 c046a42c-6fe2-441c-8c8c-71466251a162
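For readers unfamiliar with this class of bug, the missing check is of the following kind. This is a generic sketch, not the patch that was committed (and later reverted); `request_in_range()` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [sector_num, sector_num + nb_sectors) lies inside a
 * device of total_sectors sectors. The comparison is arranged so that
 * no intermediate sum can overflow.
 */
bool request_in_range(int64_t total_sectors, int64_t sector_num, int nb_sectors)
{
    if (total_sectors < 0 || sector_num < 0 || nb_sectors < 0) {
        return false;
    }
    if (sector_num > total_sectors) {
        return false;
    }
    return nb_sectors <= total_sectors - sector_num;
}
```

Writing the test as `nb_sectors <= total_sectors - sector_num` instead of `sector_num + nb_sectors <= total_sectors` matters: the latter can wrap around for very large guest-supplied sector numbers and then pass the check.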
2007-12-24 | Real SCSI device passthrough (v4), by Laurent Vivier. | ths
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3851 c046a42c-6fe2-441c-8c8c-71466251a162
2007-12-24 | Add "cache" parameter to "-drive" (Laurent Vivier). | balrog
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3848 c046a42c-6fe2-441c-8c8c-71466251a162
2007-12-17 | Fix bdrv_get_geometry to return uint64_t, by Andre Przywara. | ths
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3825 c046a42c-6fe2-441c-8c8c-71466251a162
2007-12-02 | Collecting block device statistics, by Richard W.M. Jones. | ths
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3760 c046a42c-6fe2-441c-8c8c-71466251a162
2007-11-17 | Break up vl.h. | pbrook
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3674 c046a42c-6fe2-441c-8c8c-71466251a162
2007-11-11 | Split block API from vl.h. | pbrook
Remove QEMU_TOOL. Replace with QEMU_IMG and NEED_CPU_H. Avoid linking qemu-img against whole system emulator. git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3578 c046a42c-6fe2-441c-8c8c-71466251a162