summaryrefslogtreecommitdiff
path: root/hw/vfio/pci.c
AgeCommit message (Collapse)Author
2017-10-15pci: Add interface names to hybrid PCI devicesEduardo Habkost
The following devices support both PCI Express and Conventional PCI, by including special code to handle the QEMU_PCI_CAP_EXPRESS flag and/or conditional pcie_endpoint_cap_init() calls: * vfio-pci (is_express=1, but legacy PCI handled by vfio_populate_device()) * vmxnet3 (is_express=0, but PCIe handled by vmxnet3_realize()) * pvscsi (is_express=0, but PCIe handled by pvscsi_realize()) * virtio-pci (is_express=0, but PCIe handled by virtio_pci_dc_realize(), and additional legacy PCI code at virtio_pci_realize()) * base-xhci (is_express=1, but pcie_endpoint_cap_init() call is conditional on pci_bus_is_express(dev->bus) * Note that xhci does not clear QEMU_PCI_CAP_EXPRESS like the other hybrid devices Cc: Dmitry Fleytman <dmitry@daynix.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-03vfio/pci: Add NVIDIA GPUDirect Cliques supportAlex Williamson
NVIDIA has defined a specification for creating GPUDirect "cliques", where devices with the same clique ID support direct peer-to-peer DMA. When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest (part of cuda-samples) determine which GPUs can support peer-to-peer based on chipset and topology. When running in a VM, these tools have no visibility to the physical hardware support or topology. This option allows the user to specify hints via a vendor defined capability. For instance: <qemu:commandline> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/> </qemu:commandline> This enables two cliques. The first is a singleton clique with ID 0, for the first hostdev defined in the XML (note that since cliques define peer-to-peer sets, singleton clique offer no benefit). The subsequent two hostdevs are both added to clique ID 1, indicating peer-to-peer is possible between these devices. QEMU only provides validation that the clique ID is valid and applied to an NVIDIA graphics device, any validation that the resulting cliques are functional and valid is the user's responsibility. The NVIDIA specification allows a 4-bit clique ID, thus valid values are 0-15. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03vfio/pci: Add virtual capabilities quirk infrastructureAlex Williamson
If the hypervisor needs to add purely virtual capabilties, give us a hook through quirks to do that. Note that we determine the maximum size for a capability based on the physical device, if we insert a virtual capability, that can change. Therefore if maximum size is smaller after added virt capabilities, use that. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03vfio/pci: Do not unwind on errorAlex Williamson
If vfio_add_std_cap() errors then going to out prepends irrelevant errors for capabilities we haven't attempted to add as we unwind our recursive stack. Just return error. Fixes: 7ef165b9a8d9 ("vfio/pci: Pass an error object to vfio_add_capabilities") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-26vfio/pci: fix use of freed memoryPhilippe Mathieu-Daudé
hw/vfio/pci.c:308:29: warning: Use of memory after it is freed qemu_set_fd_handler(*pfd, NULL, NULL, vdev); ^~~~ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-10vfio/pci: Fixup v0 PCIe capabilitiesAlex Williamson
Intel 82599 VFs report a PCIe capability version of 0, which is invalid. The earliest version of the PCIe spec used version 1. This causes Windows to fail startup on the device and it will be disabled with error code 10. Our choices are either to drop the PCIe cap on such devices, which has the side effect of likely preventing the guest from discovering any extended capabilities, or performing a fixup to update the capability to the earliest valid version. This implements the latter. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-10vfio: Test realized when using VFIOGroup.device_list iteratorAlex Williamson
VFIOGroup.device_list is effectively our reference tracking mechanism such that we can teardown a group when all of the device references are removed. However, we also use this list from our machine reset handler for processing resets that affect multiple devices. Generally device removals are fully processed (exitfn + finalize) when this reset handler is invoked, however if the removal is triggered via another reset handler (piix4_reset->acpi_pcihp_reset) then the device exitfn may run, but not finalize. In this case we hit asserts when we start trying to access PCI helpers since much of the PCI state of the device is released. To resolve this, add a pointer to the Object DeviceState in our common base-device and skip non-realized devices as we iterate. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-03pci: Replace pci_add_capability2() with pci_add_capability()Mao Zhongyi
After the patch 'Make errp the last parameter of pci_add_capability()', pci_add_capability() and pci_add_capability2() now do exactly the same. So drop the wrapper pci_add_capability() of pci_add_capability2(), then replace the pci_add_capability2() with pci_add_capability() everywhere. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: dmitry@daynix.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03pci: Make errp the last parameter of pci_add_capability()Mao Zhongyi
Add Error argument for pci_add_capability() to leverage the errp to pass info on errors. This way is helpful for its callers to make a better error handling when moving to 'realize'. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03pci: Fix the wrong assertion.Mao Zhongyi
pci_add_capability returns a strictly positive value on success, correct asserts. Cc: dmitry@daynix.com Cc: jasowang@redhat.com Cc: kraxel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Cc: marcel@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-03vfio/pci: Fix incorrect error messageDong Jia Shi
When the "No host device provided" error occurs, the hint message that starts with "Use -vfio-pci," makes no sense, since "-vfio-pci" is not a valid command line parameter. Correct this by replacing "-vfio-pci" with "-device vfio-pci". Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-22vfio/pci: Improve extended capability comments, skip masked capsAlex Williamson
Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is reserved") removes the internal use of extended capability ID 0, the comment here becomes invalid. However, peeling back the onion, the code is still correct and we still can't seed the capability chain with ID 0, unless we want to muck with using the version number to force the header to be non-zero, which is much uglier to deal with. The comment also now covers some of the subtleties of using cap ID 0, such as transparently indicating absence of capabilities if none are added. This doesn't detract from the correctness of the referenced commit as vfio in the kernel also uses capability ID zero to mask capabilties. In fact, we should skip zero capabilities precisely because the kernel might also expose such a capability at the head position and re-introduce the problem. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu>
2017-02-22vfio/pci: Report errors from qdev_unplug() via device requestAlex Williamson
Currently we ignore this error, report it with error_reportf_err() Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-02-01pci: Convert msix_init() to Error and fix callersCao jin
msix_init() reports errors with error_report(), which is wrong when it's used in realize(). The same issue was fixed for msi_init() in commit 1108b2f. In order to make the API change as small as possible, leave the return value check to later patch. For some devices(like e1000e, vmxnet3, nvme) who won't fail because of msix_init's failure, suppress the error report by passing NULL error object. Bonus: add comment for msix_init. CC: Jiri Pirko <jiri@resnulli.us> CC: Gerd Hoffmann <kraxel@redhat.com> CC: Dmitry Fleytman <dmitry@daynix.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Hannes Reinecke <hare@suse.de> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-01-24vfio: remove a duplicated word in commentsCao jin
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-31vfio: Add support for mmapping sub-page MMIO BARsYongji Xie
Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive") allows VFIO to mmap sub-page BARs. This is the corresponding QEMU patch. With those patches applied, we could passthrough sub-page BARs to guest, which can help to improve IO performance for some devices. In this patch, we expand MemoryRegions of these sub-page MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION with a valid size. The expanding size will be recovered when the base address of sub-page BAR is changed and not page aligned any more in guest. And we also set the priority of these BARs' memory regions to zero in case of overlap with BARs which share the same page with sub-page BARs in guest. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-31vfio/pci: fix out-of-sync BAR information on resetIdo Yariv
When a PCI device is reset, pci_do_device_reset resets all BAR addresses in the relevant PCIDevice's config buffer. The VFIO configuration space stays untouched, so the guest OS may choose to skip restoring the BAR addresses as they would seem intact. The PCI device may be left non-operational. One example of such a scenario is when the guest exits S3. Fix this by resetting the BAR addresses in the VFIO configuration space as well. Signed-off-by: Ido Yariv <ido@wizery.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: fix duplicate function callCao jin
When vfio device is reset(encounter FLR, or bus reset), if need to do bus reset(vfio_pci_hot_reset_one is called), vfio_pci_pre_reset & vfio_pci_post_reset will be called twice. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Handle host oversightEric Auger
In case the end-user calls qemu with -vfio-pci option without passing either sysfsdev or host property value, the device is interpreted as 0000:00:00.0. Let's create a specific error message to guide the end-user. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_populate_device returned valueEric Auger
The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_msix_early_setup returned valueEric Auger
The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Conversion to realizeEric Auger
This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_deviceEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_groupEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_pci_igd_opregion_initEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_add_capabilitiesEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything else than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_intx_enableEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_msix_early_setupEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for - the MSIX flags - the MSIX table, - the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_deviceEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_vgaEric Auger
Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Use local error object in vfio_initfnEric Auger
To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-15vfio/pci: Fix regression in MSI routing configurationDavid Gibson
d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup of kvmchip routing configuration, that was mostly intended for x86. However, it also contains a subtle change in behaviour which breaks EEH[1] error recovery on certain VFIO passthrough devices on spapr guests. So far it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be other hardware with the same problem. It's also possible there could be circumstances where it causes a bug on x86 as well, though I don't know of any obvious candidates. Prior to d1f6af6, both vfio_msix_vector_do_use() and vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this as the "dummy" vector used to make the host hardware state sync with the guest expected hardware state in terms of MSI configuration. Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, meaning the dummy irq would always be delivered via qemu. d1f6af6 changed vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg parameter, and determines the correct message itself. The test for !msg was removed, and not replaced with anything there or in the caller. With an spapr guest which has a VFIO device, if an EEH error occurs on the host hardware, then the device will be isolated then reset. This is a combination of host and guest action, mediated by some EEH related hypercalls. I haven't fully traced the mechanics, but somehow installing the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH reset and recovery, at least some irqs are no longer delivered to the guest. In particular, the guest never gets the link up event, and so the NIC is effectively dead. [1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-* error reporting and recovery mechanism. The concept is somewhat similar to PCI-E AER, but the details are different. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802 Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: qemu-stable@nongnu.org Fixes: d1f6af6a17a6 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-07-21kvm-irqchip: do explicit commit when update irqPeter Xu
In the past, we are doing gsi route commit for each irqchip route update. This is not efficient if we are updating lots of routes in the same time. This patch removes the committing phase in kvm_irqchip_update_msi_route(). Instead, we do explicit commit after all routes updated. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21kvm-irqchip: simplify kvm_irqchip_add_msi_routePeter Xu
Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route into the vector number. Vector index provides more information than the MSIMessage, we can retrieve the MSIMessage using the vector easily. This will avoid fetching MSIMessage every time before adding MSI routes. Meanwhile, the vector info will be used in the coming patches to further enable gsi route update notifications. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-18vfio/pci: Hide ARI capabilityAlex Williamson
QEMU supports ARI on downstream ports and assigned devices may support ARI in their extended capabilities. The endpoint ARI capability specifies the next function, such that the OS doesn't need to walk each possible function, however this next function is relative to the host, not the guest. This leads to device discovery issues when we combine separate functions into virtual multi-function packages in a guest. For example, SR-IOV VFs are not enumerated by simply probing the function address space, therefore the ARI next-function field is zero. When we combine multiple VFs together as a multi-function device in the guest, the guest OS identifies ARI is enabled, relies on this next-function field, and stops looking for additional function after the first is found. Long term we should expose the ARI capability to the guest to enable configurations with more than 8 functions per slot, but this requires additional QEMU PCI infrastructure to manage the next-function field for multiple, otherwise independent devices. In the short term, hiding this capability allows equivalent functionality to what we currently have on non-express chipsets. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-07-05pci: Convert msi_init() to Error and fix callers to check itCao jin
msi_init() reports errors with error_report(), which is wrong when it's used in realize(). Fix by converting it to Error. Fix its callers to handle failure instead of ignoring it. For those callers who don't handle the failure, it might happen: when user want msi on, but he doesn't get what he want because of msi_init fails silently. cc: Gerd Hoffmann <kraxel@redhat.com> cc: John Snow <jsnow@redhat.com> cc: Dmitry Fleytman <dmitry@daynix.com> cc: Jason Wang <jasowang@redhat.com> cc: Michael S. Tsirkin <mst@redhat.com> cc: Hannes Reinecke <hare@suse.de> cc: Paolo Bonzini <pbonzini@redhat.com> cc: Alex Williamson <alex.williamson@redhat.com> cc: Markus Armbruster <armbru@redhat.com> cc: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-06-30vfio/pci: Hide SR-IOV capabilityAlex Williamson
The kernel currently exposes the SR-IOV capability as read-only through vfio-pci. This is sufficient to protect the host kernel, but has the potential to confuse guests without further virtualization. In particular, OVMF tries to size the VF BARs and comes up with absurd results, ending with an assert. There's not much point in adding virtualization to a read-only capability, so we simply hide it for now. If the kernel ever enables SR-IOV virtualization, we should easily be able to test it through VF BAR sizing or explicit flags. Testing whether we should parse extended capabilities is also pulled into the function to keep these assumptions in one place. Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-06-30vfio: add pcie extended capability supportChen Fan
For vfio pcie device, we could expose the extended capability on PCIE bus. due to add a new pcie capability at the tail of the chain, in order to avoid config space overwritten, we introduce a copy config for parsing extended caps. and rebuild the pcie extended config space. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-06-16os-posix: include sys/mman.hPaolo Bonzini
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-26vfio/pci: Add a separate option for IGD OpRegion supportAlex Williamson
The IGD OpRegion is enabled automatically when running in legacy mode, but it can sometimes be useful in universal passthrough mode as well. Without an OpRegion, output spigots don't work, and even though Intel doesn't officially support physical outputs in UPT mode, it's a useful feature. Note that if an OpRegion is enabled but a monitor is not connected, some graphics features will be disabled in the guest versus a headless system without an OpRegion, where they would work. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26vfio/pci: Intel graphics legacy mode assignmentAlex Williamson
Enable quirks to support SandyBridge and newer IGD devices as primary VM graphics. This requires new vfio-pci device specific regions added in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config space access to the PCI host bridge and LPC/ISA bridge. VM firmware support, SeaBIOS only so far, is also required for reserving memory regions for IGD specific use. In order to enable this mode, IGD must be assigned to the VM at PCI bus address 00:02.0, it must have a ROM, it must be able to enable VGA, it must have or be able to create on its own an LPC/ISA bridge of the proper type at PCI bus address 00:1f.0 (sorry, not compatible with Q35 yet), and it must have the above noted vfio-pci kernel features and BIOS. The intention is that to enable this mode, a user simply needs to assign 00:02.0 from the host to 00:02.0 in the VM: -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0 and everything either happens automatically or it doesn't. In the case that it doesn't, we leave error reports, but assume the device will operate in universal passthrough mode (UPT), which doesn't require any of this, but has a much more narrow window of supported devices, supported use cases, and supported guest drivers. When using IGD in this mode, the VM firmware is required to reserve some VM RAM for the OpRegion (on the order or several 4k pages) and stolen memory for the GTT (up to 8MB for the latest GPUs). An additional option, x-igd-gms allows the user to specify some amount of additional memory (value is number of 32MB chunks up to 512MB) that is pre-allocated for graphics use. TBH, I don't know of anything that requires this or makes use of this memory, which is why we don't allocate any by default, but the specification suggests this is not actually a valid combination, so the option exists as a workaround. Please report if it's actually necessary in some environment. See code comments for further discussion about the actual operation of the quirks necessary to assign these devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26vfio/pci: Setup BAR quirks after capabilities probingAlex Williamson
Capability probing modifies wmask, which quirks may be interested in changing themselves. Apply our BAR quirks after the capability scan to make this possible. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26vfio/pci: Consolidate VGA setupAlex Williamson
Combine VGA discovery and registration. Quirks can have dependencies on BARs, so the quirks push out until after we've scanned the BARs. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26vfio/pci: Fix return of vfio_populate_vga()Alex Williamson
This function returns success if either we setup the VGA region or the host vfio doesn't return enough regions to support the VGA index. This latter case doesn't make any sense. If we're asked to populate VGA, fail if it doesn't exist and let the caller decide if that's important. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-10vfio/pci: replace fixed string limit by g_strdup_printfNeo Jia
A trivial change to remove string limit by using g_strdup_printf Tested-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10vfio/pci: Split out VGA setupAlex Williamson
This could be setup later by device specific code, such as IGD initialization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10vfio/pci: Fixup PCI option ROMsAlex Williamson
Devices like Intel graphics are known to not only have bad checksums, but also the wrong device ID. This is not so surprising given that the video BIOS is typically part of the system firmware image rather that embedded into the device and needs to support any IGD device installed into the system. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functionsAlex Williamson
Match common vfio code with setup, exit, and finalize functions for BAR, quirk, and VGA management. VGA is also changed to dynamic allocation to match the other MemoryRegions. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10vfio: Generalize region supportAlex Williamson
Both platform and PCI vfio drivers create a "slow", I/O memory region with one or more mmap memory regions overlayed when supported by the device. Generalize this to a set of common helpers in the core that pulls the region info from vfio, fills the region data, configures slow mapping, and adds helpers for comleting the mmap, enable/disable, and teardown. This can be immediately used by the PCI MSI-X code, which needs to mmap around the MSI-X vector table. This also changes VFIORegion.mem to be dynamically allocated because otherwise we don't know how the caller has allocated VFIORegion and therefore don't know whether to unreference it to destroy the MemoryRegion or not. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10vfio: Wrap VFIO_DEVICE_GET_REGION_INFOAlex Williamson
In preparation for supporting capability chains on regions, wrap ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for each caller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>