summaryrefslogtreecommitdiff
path: root/memory.c
AgeCommit message (Collapse)Author
2020-02-25memory: batch allocate ioeventfds[] in address_space_update_ioeventfds()Stefan Hajnoczi
Reallocing the ioeventfds[] array each time an element is added is very expensive as the number of ioeventfds increases. Batch allocate instead to amortize the cost of realloc. This patch reduces Linux guest boot times from 362s to 140s when there are 2 virtio-blk devices with 1 virtqueue and 99 virtio-blk devices with 32 virtqueues. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200218182226.913977-1-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30memory.c: Use trace_event_get_state_backends()Peter Maydell
The preferred way to test whether a trace event is enabled is to use trace_event_get_state_backends(), because this will give the correct answer (allowing expensive computations to be skipped) whether the trace event is compile-time or run-time disabled. Convert the four old-style direct uses of TRACE_FOO_ENABLED in memory.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20200120151142.18954-3-peter.maydell@linaro.org Message-Id: <20200120151142.18954-3-peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-24accel: Replace current_machine->accelerator by current_accel() wrapperPhilippe Mathieu-Daudé
We actually want to access the accelerator, not the machine, so use the current_accel() wrapper instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-10-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17memory: do not look at current_machine->accelPaolo Bonzini
"info mtree -f" prints the wrong accelerator name if used with for example "-machine accel=kvm:tcg". The right thing to do is to fetch the name from the AccelClass, which will also work nicely once current_machine->accel stops existing. Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-16Memory: Enable writeback for given memory regionBeata Michalska
Add an option to trigger memory writeback to sync given memory region with the corresponding backing store, case one is available. This extends the support for persistent memory, allowing syncing on-demand. Signed-off-by: Beata Michalska <beata.michalska@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191121000843.24844-3-beata.michalska@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-11-11Remove unassigned_access CPU hookPeter Maydell
All targets have now migrated away from the old unassigned_access hook to the new do_transaction_failed hook. This means we can remove the core-code infrastructure for that hook and the code that calls it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20191108173732.11816-1-peter.maydell@linaro.org
2019-10-11rcu: Use automatic rc_read unlock in core memory/exec codeDr. David Alan Gilbert
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-6-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-04memory: allow memory_region_register_iommu_notifier() to failEric Auger
Currently, when a notifier is attempted to be registered and its flags are not supported (especially the MAP one) by the IOMMU MR, we generally abruptly exit in the IOMMU code. The failure could be handled more nicely in the caller and especially in the VFIO code. So let's allow memory_region_register_iommu_notifier() to fail as well as notify_flag_changed() callback. All sites implementing the callback are updated. This patch does not yet remove the exit(1) in the amd_iommu code. in SMMUv3 we turn the warning message into an error message saying that the assigned device would not work properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-25cputlb: Move NOTDIRTY handling from I/O path to TLB pathRichard Henderson
Pages that we want to track for NOTDIRTY are RAM. We do not really need to go through the I/O path to handle them. Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25exec: Adjust notdirty tracingRichard Henderson
The memory_region_tb_read tracepoint is unreachable, since notdirty is supposed to apply only to writes. The memory_region_tb_write tracepoint is mis-named, because notdirty is not only used for TB invalidation. It is also used for e.g. VGA RAM updates and migration. Replace memory_region_tb_write with memory_notdirty_write_access, and place it in memory_notdirty_write_prepare where it can catch all of the instances. Add memory_notdirty_set_dirty to log when we no longer intercept writes to a page. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-16memory: inline and optimize devend_memopPaolo Bonzini
devend_memop can rely on the fact that the result is always either 0 or MO_BSWAP, corresponding respectively to host endianness and the opposite. Native (target) endianness in turn can be either the host endianness, in which case MO_BSWAP is only returned for host-opposite endianness, or the opposite, in which case 0 is only returned for host endianness. With this in mind, devend_memop can be compiled as a setcond+shift for every target. Do this and, while at it, move it to include/exec/memory.h since !NEED_CPU_H files do not (and should not) need it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-04Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190903' into stagingPeter Maydell
Allow page table bit to swap endianness. Reorganize watchpoints out of i/o path. Return host address from probe_write / probe_access. # gpg: Signature made Tue 03 Sep 2019 16:47:50 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20190903: (36 commits) tcg: Factor out probe_write() logic into probe_access() tcg: Make probe_write() return a pointer to the host page s390x/tcg: Pass a size to probe_write() in do_csst() hppa/tcg: Call probe_write() also for CONFIG_USER_ONLY mips/tcg: Call probe_write() for CONFIG_USER_ONLY as well tcg: Enforce single page access in probe_write() tcg: Factor out CONFIG_USER_ONLY probe_write() from s390x code s390x/tcg: Fix length calculation in probe_write_access() s390x/tcg: Use guest_addr_valid() instead of h2g_valid() in probe_write_access() tcg: Check for watchpoints in probe_write() cputlb: Handle watchpoints via TLB_WATCHPOINT cputlb: Remove double-alignment in store_helper cputlb: Fix size operand for tlb_fill on unaligned store exec: Factor out cpu_watchpoint_address_matches cputlb: Fold TLB_RECHECK into TLB_INVALID_MASK exec: Factor out core logic of check_watchpoint() exec: Move user-only watchpoint stubs inline target/sparc: sun4u Invert Endian TTE bit target/sparc: Add TLB entry with attributes cputlb: Byte swap memory transaction attribute ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-09-03memory: Single byte swap along the I/O pathTony Nguyen
Now that MemOp has been pushed down into the memory API, and callers are encoding endianness, we can collapse byte swaps along the I/O path into the accelerator and target independent adjust_endianness. Collapsing byte swaps along the I/O path enables additional endian inversion logic, e.g. SPARC64 Invert Endian TTE bit, with redundant byte swaps cancelling out. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <911ff31af11922a9afba9b7ce128af8b8b80f316.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03memory: Access MemoryRegion with endiannessTony Nguyen
Preparation for collapsing the two byte swaps adjust_endianness and handle_bswap into the former. Call memory_region_dispatch_{read|write} with endianness encoded into the "MemOp op" operand. This patch does not change any behaviour as memory_region_dispatch_{read|write} is yet to handle the endianness. Once it does handle endianness, callers with byte swaps can collapse them into adjust_endianness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03memory: Access MemoryRegion with MemOpTony Nguyen
Convert memory_region_dispatch_{read|write} operand "unsigned size" into a "MemOp op". Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1dd82df5801866743f838f1d046475115a1d32da.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03memory: Remove unused memory_region_iommu_replay_all()Eric Auger
memory_region_iommu_replay_all is not used. Remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-id: 20190822172350.12008-2-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-21memory: Fix up memory_region_{add|del}_coalescingPeter Xu
The old memory_region_{add|clear}_coalescing() has some defects because they both changed mr->coalesced before updating the regions using memory_region_update_coalesced_range_as(). Then when the regions were updated in memory_region_update_coalesced_range_as() the mr->coalesced will always be either one more or one less. So: - For memory_region_add_coalescing: it'll always trying to remove the newly added coalesced region while it shouldn't, and, - For memory_region_clear_coalescing: when it calls the update there will be no coalesced ranges on mr->coalesced because they were all removed before hand so the update will probably do nothing for real. Let's fix this. Now we've got flat_range_coalesced_io_notify() to notify a single CoalescedMemoryRange instance change, so use it in the existing memory_region_update_coalesced_range() logic by only notify either an addition or deletion. Then we hammer both the memory_region_{add|clear}_coalescing() to use it. Fixes: 3ac7d43a6fbb5d4a3 Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190820141328.10009-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-21memory: Remove has_coalesced_range counterPeter Xu
The has_coalesced_range could potentially be problematic in that it only works for additions of coalesced mmio ranges but not deletions. The reason is that has_coalesced_range information can be lost when the FlatView updates the topology again when the updated region is not covering the coalesced regions. When that happens, due to flatrange_equal() is not checking against has_coalesced_range, the new FlatRange will be seen as the same one as the old and the new instance (whose has_coalesced_range will be zero) will replace the old instance (whose has_coalesced_range _could_ be non-zero). The counter was originally used to make sure every FlatRange will only notify once for coalesced_io_{add|del} memory listeners, because each FlatRange can be used by multiple address spaces, so logically speaking it could be called multiple times. However we should not limit that, because memory listeners should will only be registered with specific address space rather than multiple address spaces. So let's fix this up by simply removing the whole has_coalesced_range. Fixes: 3ac7d43a6fbb5d4a3 Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190820141328.10009-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-21memory: Split zones when do coalesced_io_del()Peter Xu
It is a workaround of current KVM's KVM_UNREGISTER_COALESCED_MMIO interface. The kernel interface only allows to unregister an mmio device with exactly the zone size when registered, or any smaller zone that is included in the device mmio zone. It does not support the userspace to specify a very large zone to remove all the small mmio devices within the zone covered. Logically speaking it would be nicer to fix this from KVM side, though in all cases we still need to coop with old kernels so let's do this. Fixes: 3ac7d43a6fbb5d4a3 Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190820141328.10009-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-21memory: Refactor memory_region_clear_coalescingPeter Xu
Removing the update variable and quit earlier if the memory region has no coalesced range. This prepares for the next patch. Fixes: 3ac7d43a6fbb5d4a3 Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190820141328.10009-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20memory: fix race between TCG and accesses to dirty bitmapPaolo Bonzini
There is a race between TCG and accesses to the dirty log: vCPU thread reader thread ----------------------- ----------------------- TLB check -> slow path notdirty_mem_write write to RAM set dirty flag clear dirty flag TLB check -> fast path read memory write to RAM Fortunately, in order to fix it, no change is required to the vCPU thread. However, the reader thread must delay the read after the vCPU thread has finished the write. This can be approximated conservatively by run_on_cpu, which waits for the end of the current translation block. A similar technique is used by KVM, which has to do a synchronous TLB flush after doing a test-and-clear of the dirty-page flags. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20memory: assert on out of scope notificationYan Zhao
It is wrong for an entry to have parts out of scope of notifier's range. assert this condition. Out of scope mapping/unmapping would cause problem, as in below case: 1. initially there are two notifiers with ranges 0-0xfedfffff, 0xfef00000-0xffffffffffffffff, IOVAs from 0x3c000000 - 0x3c1fffff is in shadow page table. 2. in vfio, memory_region_register_iommu_notifier() is followed by memory_region_iommu_replay(), which will first call address space unmap, and walk and add back all entries in vtd shadow page table. e.g. (1) for notifier 0-0xfedfffff, IOVAs from 0 - 0xffffffff get unmapped, and IOVAs from 0x3c000000 - 0x3c1fffff get mapped (2) for notifier 0xfef00000-0xffffffffffffffff IOVAs from 0 - 0x7fffffffff get unmapped, but IOVAs from 0x3c000000 - 0x3c1fffff cannot get mapped back. Cc: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-Id: <1561432878-13754-1-git-send-email-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-16sysemu: Split sysemu/runstate.h off sysemu/sysemu.hMarkus Armbruster
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]
2019-08-16Include hw/qdev-properties.h lessMarkus Armbruster
In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16Include qemu/main-loop.h lessMarkus Armbruster
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16memory: Fix type of IOMMUMemoryRegionClass member @parent_classMarkus Armbruster
TYPE_IOMMU_MEMORY_REGION is a direct subtype of TYPE_MEMORY_REGION. Its instance struct is IOMMUMemoryRegion, and its first member is a MemoryRegion. Correct. Its class struct is IOMMUMemoryRegionClass, and its first member is a DeviceClass. Wrong. Messed up when commit 1221a474676 introduced the QOM type. It even included hw/qdev-core.h just for that. TYPE_MEMORY_REGION doesn't bother to define a class struct. This is fine, it simply defaults to its super-type TYPE_OBJECT's class struct ObjectClass. Changing IOMMUMemoryRegionClass's first member's type to ObjectClass would be a minimal fix, if a bit brittle: if TYPE_MEMORY_REGION ever acquired own class struct, we'd have to update IOMMUMemoryRegionClass to use it. Fix it the clean and robust way instead: give TYPE_MEMORY_REGION its own class struct MemoryRegionClass now, and use it for IOMMUMemoryRegionClass's first member. Revert the include of hw/qdev-core.h, and fix the few files that have come to rely on it. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20190812052359.30071-5-armbru@redhat.com>
2019-07-19hmp: Print if memory section is registered with an acceleratorAlexey Kardashevskiy
This adds an accelerator name to the "into mtree -f" to tell the user if a particular memory section is registered with the accelerator; the primary user for this is KVM and such information is useful for debugging purposes. This adds a has_memory() callback to the accelerator class allowing any accelerator to have a label in that memory tree dump. Since memory sections are passed to memory listeners and get registered in accelerators (rather than memory regions), this only prints new labels for flatviews attached to the system address space. An example: Root memory region: system 0000000000000000-0000002fffffffff (prio 0, ram): /objects/mem0 kvm 0000003000000000-0000005fffffffff (prio 0, ram): /objects/mem1 kvm 0000200000000020-000020000000003f (prio 1, i/o): virtio-pci 0000200080000000-000020008000003f (prio 0, i/o): capabilities Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190614015237.82463-1-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-16Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* VFIO bugfix for AMD SEV (Alex) * Kconfig improvements (Julio, Philippe) * MemoryRegion reference counting bugfix (King Wang) * Build system cleanups (Marc-André, myself) * rdmacm-mux off-by-one (Marc-André) * ZBC passthrough fixes (Shinichiro, myself) * WHPX build fix (Stefan) * char-pty fix (Wei Yang) # gpg: Signature made Tue 16 Jul 2019 08:31:27 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: vl: make sure char-pty message displayed by moving setbuf to the beginning create_config: remove $(CONFIG_SOFTMMU) hack Makefile: do not repeat $(CONFIG_SOFTMMU) in hw/Makefile.objs hw/usb/Kconfig: USB_XHCI_NEC requires USB_XHCI hw/usb/Kconfig: Add CONFIG_USB_EHCI_PCI target/i386: sev: Do not unpin ram device memory region checkpatch: detect doubly-encoded UTF-8 hw/lm32/Kconfig: Milkymist One provides a USB 1.1 Controller util: merge main-loop.c and iohandler.c Fix broken build with WHPX enabled memory: unref the memory region in simplify flatview hw/i386: turn off vmport if CONFIG_VMPORT is disabled rdmacm-mux: fix strcpy string warning build-sys: remove slirp cflags from main-loop.o iscsi: base all handling of check condition on scsi_sense_to_errno iscsi: fix busy/timeout/task set full scsi: add guest-recoverable ZBC errors scsi: explicitly list guest-recoverable sense codes scsi-disk: pass sense correctly for guest-recoverable errors Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-15memory: Introduce memory listener hook log_clear()Peter Xu
Introduce a new memory region listener hook log_clear() to allow the listeners to hook onto the points where the dirty bitmap is cleared by the bitmap users. Previously log_sync() contains two operations: - dirty bitmap collection, and, - dirty bitmap clear on remote site. Let's take KVM as example - log_sync() for KVM will first copy the kernel dirty bitmap to userspace, and at the same time we'll clear the dirty bitmap there along with re-protecting all the guest pages again. We add this new log_clear() interface only to split the old log_sync() into two separated procedures: - use log_sync() to collect the collection only, and, - use log_clear() to clear the remote dirty bitmap. With the new interface, the memory listener users will still be able to decide how to implement the log synchronization procedure, e.g., they can still only provide log_sync() method only and put all the two procedures within log_sync() (that's how the old KVM works before KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is introduced). However with this new interface the memory listener users will start to have a chance to postpone the log clear operation explicitly if the module supports. That can really benefit users like KVM at least for host kernels that support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2. There are three places that can clear dirty bits in any one of the dirty bitmap in the ram_list.dirty_memory[3] array: cpu_physical_memory_snapshot_and_clear_dirty cpu_physical_memory_test_and_clear_dirty cpu_physical_memory_sync_dirty_bitmap Currently we hook directly into each of the functions to notify about the log_clear(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: Pass mr into snapshot_and_clear_dirtyPeter Xu
Also we change the 2nd parameter of it to be the relative offset within the memory region. This is to be used in follow up patches. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190603065056.25211-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: Don't set migration bitmap when without migrationPeter Xu
Similar to 9460dee4b2 ("memory: do not touch code dirty bitmap unless TCG is enabled", 2015-06-05) but for the migration bitmap - we can skip the MIGRATION bitmap update if migration not enabled. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: unref the memory region in simplify flatviewKing Wang
The memory region reference is increased when insert a range into flatview range array, then decreased by destroy flatview. If some flat range merged by flatview_simplify, the memory region reference can not be decreased by destroy flatview any more. In this case, start virtual machine by the command line: qemu-system-x86_64 -name guest=ubuntu,debug-threads=on -machine pc,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 16384 -realtime mlock=off -smp 8,sockets=2,cores=4,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8589934592 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8589934592 -numa node,nodeid=1,cpus=4-7,memdev=ram-node1 -no-user-config -nodefaults -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=ubuntu.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on And run the script in guest OS: while true do setpci -s 00:06.0 04.b=03 setpci -s 00:06.0 04.b=07 done I found the reference of node0 HostMemoryBackendFile is a big one. (gdb) p numa_info[0]->node_memdev->parent.ref $6 = 1636278 (gdb) Signed-off-by: King Wang<king.wang@huawei.com> Message-Id: <20190712065241.11784-1-king.wang@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-02spapr_pci: Unregister listeners before destroying the IOMMU address spaceGreg Kurz
Hot-unplugging a PHB with a VFIO device connected to it crashes QEMU: -device spapr-pci-host-bridge,index=1,id=phb1 \ -device vfio-pci,host=0034:01:00.3,id=vfio0 (qemu) device_del phb1 [ 357.207183] iommu: Removing device 0001:00:00.0 from group 1 [ 360.375523] rpadlpar_io: slot PHB 1 removed qemu-system-ppc64: memory.c:2742: do_address_space_destroy: Assertion `QTAILQ_EMPTY(&as->listeners)' failed. 'as' is the IOMMU address space, which indeed has a listener registered to by vfio_connect_container() when the VFIO device is realized. This listener is supposed to be unregistered by vfio_disconnect_container() when the VFIO device is finalized. Unfortunately, the VFIO device hasn't reached finalize yet at the time the PHB unrealize function is called, and address_space_destroy() gets called with the VFIO listener still being registered. All regions have just been unmapped from the address space. Listeners aren't needed anymore at this point. Remove them before destroying the address space. The VFIO code will try to remove them _again_ at device finalize, but it is okay since memory_listener_unregister() is idempotent. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156110925375.92514.11649846071216864570.stgit@bahia.lan> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Correct spelling error pointed out by aik] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-06-11qemu-common: Move tcg_enabled() etc. to sysemu/tcg.hMarkus Armbruster
Other accelerators have their own headers: sysemu/hax.h, sysemu/hvf.h, sysemu/kvm.h, sysemu/whpx.h. Only tcg_enabled() & friends sit in qemu-common.h. This necessitates inclusion of qemu-common.h into headers, which is against the rules spelled out in qemu-common.h's file comment. Move tcg_enabled() & friends into their own header sysemu/tcg.h, and adjust #include directives. Cc: Richard Henderson <rth@twiddle.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Rebased with conflicts resolved automatically, except for accel/tcg/tcg-all.c]
2019-06-03memory: Remove memory_region_get_dirty()Peter Xu
It's never used anywhere. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190520030839.6795-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-17memory: correct the comment to DIRTY_MEMORY_MIGRATIONWei Yang
The dirty bit is DIRTY_MEMORY_MIGRATION. Correct the comment. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190426020927.25470-1-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18memory: Clean up how mtree_info() printsMarkus Armbruster
mtree_info() takes an fprintf()-like callback and a FILE * to pass to it, and so do its helper functions. Passing around callback and argument is rather tiresome. Its only caller hmp_info_mtree() passes monitor_printf() cast to fprintf_function and the current monitor cast to FILE *. The type-punning is technically undefined behaviour, but works in practice. Clean up: drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-9-armbru@redhat.com>
2019-03-18memory: Fix the memory region type assignment orderSingh, Brijesh
Currently, a callback registered through the RAMBlock notifier is not able to get the memory region type (i.e callback is not able to use memory_region_is_ram_device function). This is because mr->ram assignment happens _after_ the memory is allocated whereas the callback is executed during allocation. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1667249 Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Message-Id: <20190204222322.26766-2-brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11memory: Do not update coalesced IO range in the case of NOPJagannathan Raman
Do not add/del coalesced IO ranges in the case where the same FlatRanges are present in both old and new FlatViews Fixes: 3ac7d43a6fbb ("memory: update coalesced_range on transaction_commit") Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Message-Id: <59572a7353830be4b7aa57d79ccb7ad6b72f0dda.1549406119.git.jag.raman@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11qemu/queue.h: simplify reverse access to QTAILQPaolo Bonzini
The new definition of QTAILQ does not require passing the headname, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11qemu/queue.h: leave head structs anonymous unless necessaryPaolo Bonzini
Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11memory: update coalesced_range on transaction_commitPaolo Bonzini
The e1000 driver calls memory_region_add_coalescing but kvm_coalesce_mmio_region is never called for those regions. The bug dates back to the introduction of the memory region API; to fix it, delete and re-add coalesced MMIO ranges when building the FlatViews. Because coalesced MMIO regions apply to all address spaces, the has_coalesced_range flag has to be changed into an int. Fixes: 093bc2cd885e ("Hierarchical memory region API") Reported-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11memory: avoid unnecessary coalesced_io_del operationsPaolo Bonzini
Store whether the FlatRange has had any coalesced I/O ranges applied, and if not avoid calling coalesced_io_del. This is useful in preparation for the next patch, which will call coalesced_io_del when rendering memory regions. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11memory: extract flat_range_coalesced_io_{del,add}Paolo Bonzini
Extract two new functions from memory_region_update_coalesced_range_as. To avoid duplication in the creation of the MemoryRegionSection, use MEMORY_LISTENER_UPDATE_REGION instead of MEMORY_LISTENER_CALL to invoke the listener callback. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-11-06memory: learn about non-volatile memory regionMarc-André Lureau
Add a new flag to mark memory region that are used as non-volatile, by NVDIMM for example. That bit is propagated down to the flat view, and reflected in HMP info mtree with a "nv-" prefix on the memory type. This way, guest_phys_blocks_region_add() can skip the NV memory regions for dumps and TCG memory clear in a following patch. Cc: dgilbert@redhat.com Cc: imammedo@redhat.com Cc: pbonzini@redhat.com Cc: guangrong.xiao@linux.intel.com Cc: mst@redhat.com Cc: xiaoguangrong.eric@gmail.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181003114454.5662-2-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-19target-i386 : add coalesced_pio APIPeng Hao
the primary API realization. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1539795177-21038-3-git-send-email-peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02memory: Remove old_mmio accessorsPeter Maydell
Now that all the users of old_mmio MemoryRegion accessors have been converted, we can remove the core code support. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180824170422.5783-2-peter.maydell@linaro.org> Based-on: <20180802174042.29234-1-peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02memory: Fix access_with_adjusted_size(small size) on big-endian memory regionsPhilippe Mathieu-Daudé
Memory regions configured as DEVICE_BIG_ENDIAN (or DEVICE_NATIVE_ENDIAN on big-endian guest) behave incorrectly when the memory access 'size' is smaller than the implementation 'access_size'. In the following code segment from access_with_adjusted_size(): if (memory_region_big_endian(mr)) { for (i = 0; i < size; i += access_size) { r |= access_fn(mr, addr + i, value, access_size, (size - access_size - i) * 8, access_mask, attrs); } (size - access_size - i) * 8 is the number of bits that will arithmetic shift the current value. Currently we can only 'left' shift a read() access, and 'right' shift a write(). When the access 'size' is smaller than the implementation, we get a negative number of bits to shift. For the read() case, a negative 'left' shift is a 'right' shift :) However since the 'shift' type is unsigned, there is currently no way to right shift. Fix this by changing the access_fn() prototype to handle signed shift values, and modify the memory_region_shift_read|write_access() helpers to correctly arithmetic shift the opposite direction when the 'shift' value is negative. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180927002416.1781-4-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02memory: Refactor common shifting code from accessorsPhilippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180927002416.1781-3-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02memory: Use MAKE_64BIT_MASK()Philippe Mathieu-Daudé
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180927002416.1781-2-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>