path: root/numa.c
Age | Commit message | Author
2015-03-19 | numa: Print warning if no node is assigned to a CPU | Eduardo Habkost
We need all possible CPUs (including hotplug ones) to be present in the SRAT when QEMU starts. QEMU already does that correctly today, the only problem is that when a CPU is omitted from the NUMA configuration, it is silently assigned to node 0.

Check if all CPUs up to max_cpus are present in the NUMA configuration and warn about missing CPUs. Make it just a warning, to allow management software to be updated if necessary. In the future we may make it a fatal error instead.

Command-line examples:

* Correct, no warning:

  $ qemu-system-x86_64 -smp 2,maxcpus=4
  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-3

* Incomplete, with warnings:

  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0
  qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 1 2 3
  qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config

  $ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-2
  qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 3
  qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
v1 -> v2: (no changes)
v2 -> v3:
* Use enumerate_cpus() and error_report() for error message
* Simplify logic using bitmap_full()
v3 -> v4:
* Clarify error message, mention that all CPUs up to maxcpus need to be described in NUMA config
v4 -> v5:
* Commit log update, to make problem description clearer
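The check described in this commit can be illustrated with a small standalone sketch. It is not the code from numa.c: the `NodeCpus` type, the limits, and `find_unassigned_cpus()` are hypothetical names chosen for illustration, and a plain array of booleans stands in for QEMU's bitmap helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8   /* illustrative limit, not QEMU's value */
#define MAX_CPUS  64  /* illustrative limit, not QEMU's value */

/* Hypothetical stand-in for a per-node CPU bitmap: one flag per CPU index. */
typedef struct {
    bool cpu_present[MAX_CPUS];
} NodeCpus;

/*
 * Collect the indexes of all CPUs below max_cpus that appear in no node.
 * The indexes are written to missing[] (which must hold max_cpus entries)
 * and their count is returned.
 */
static size_t find_unassigned_cpus(const NodeCpus *nodes, size_t nb_nodes,
                                   size_t max_cpus, size_t *missing)
{
    size_t n = 0;

    assert(max_cpus <= MAX_CPUS);
    for (size_t cpu = 0; cpu < max_cpus; cpu++) {
        bool seen = false;

        for (size_t node = 0; node < nb_nodes; node++) {
            if (nodes[node].cpu_present[cpu]) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            missing[n++] = cpu;
        }
    }
    return n;
}
```

A caller would presumably run this only when at least one node was configured, since the first command-line example (no -numa option at all) produces no warning. For `-smp 2,maxcpus=4 -numa node,cpus=0`, the function reports CPUs 1, 2 and 3, matching the warning text quoted in the commit message.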
2015-03-19 | numa: introduce machine callback for VCPU to node mapping | Igor Mammedov
The current default round-robin distribution of VCPUs among NUMA nodes may be wrong for multi-core/multi-thread CPUs: it can confuse guests about topology by placing cores from the same socket on different nodes. Allow a machine to override the default mapping by providing a MachineClass::cpu_index_to_socket_id() callback, which lets it group the VCPUs of a socket on the same NUMA node. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
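The difference between the two mapping policies can be sketched as follows. This is an illustration only: the function names are hypothetical, and the socket lookup assumes CPU indexes are numbered contiguously within each socket, which the commit message does not state.

```c
#include <assert.h>

/* Default policy described above: spread CPU indexes over nodes round-robin. */
static unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return cpu_index % nb_nodes;
}

/*
 * Hypothetical socket lookup, playing the role of a per-machine callback.
 * Assumes each socket holds cores * threads consecutive CPU indexes.
 */
static unsigned socket_id_for_cpu(unsigned cpu_index,
                                  unsigned cores, unsigned threads)
{
    assert(cores > 0 && threads > 0);
    return cpu_index / (cores * threads);
}

/* Socket-aware policy: every CPU of a socket lands on the same node. */
static unsigned node_by_socket(unsigned cpu_index, unsigned cores,
                               unsigned threads, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return socket_id_for_cpu(cpu_index, cores, threads) % nb_nodes;
}
```

With two cores per socket, one thread per core and two nodes, round-robin puts CPU 0 on node 0 and CPU 1 on node 1 even though both share socket 0, while the socket-aware variant keeps CPUs 0 and 1 together on node 0.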
2015-03-19 | numa: Reject configuration if CPU appears on multiple nodes | Eduardo Habkost
Each CPU can appear in only one NUMA node on the NUMA config. Reject configuration if a CPU appears in multiple nodes. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
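One way to picture this rule is an ownership table that records which node first claimed each CPU. The sketch below is an assumption-laden illustration, not the numa.c implementation: `CpuOwners`, `cpu_owners_init()` and `cpu_owners_claim()` are hypothetical names, and repeating a CPU within the same node is tolerated here by choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64    /* illustrative limit, not QEMU's value */
#define NO_NODE  (-1)  /* marks a CPU that no node has claimed yet */

/* Hypothetical table recording which node owns each CPU index. */
typedef struct {
    int owner[MAX_CPUS];
} CpuOwners;

static void cpu_owners_init(CpuOwners *o)
{
    for (size_t i = 0; i < MAX_CPUS; i++) {
        o->owner[i] = NO_NODE;
    }
}

/*
 * Try to assign a CPU to a node. Returns false if a different node
 * already owns the CPU, in which case the configuration is rejected.
 */
static bool cpu_owners_claim(CpuOwners *o, size_t cpu, int node)
{
    assert(cpu < MAX_CPUS);
    if (o->owner[cpu] != NO_NODE && o->owner[cpu] != node) {
        return false;
    }
    o->owner[cpu] = node;
    return true;
}
```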
2015-03-19 | numa: Reject CPU indexes > max_cpus | Eduardo Habkost
CPU index is always less than max_cpus, as documented at sysemu.h: > The following shall be true for all CPUs: > cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS Reject configuration which uses invalid CPU indexes. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-03-19 | numa: Fix off-by-one error at MAX_CPUMASK_BITS check | Eduardo Habkost
Fix the CPU index check to ensure we don't go beyond the size of the node_cpu bitmap. CPU index is always less than MAX_CPUMASK_BITS, as documented at sysemu.h: > The following shall be true for all CPUs: > cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
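The invariant quoted from sysemu.h, `cpu_index < max_cpus <= MAX_CPUMASK_BITS`, can be sketched as below. The value of `MAX_CPUMASK_BITS` is illustrative, the function names are hypothetical, and the "off by one" variant is only an assumed form of the older check, since the commit message does not quote it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUMASK_BITS 255  /* illustrative value for this sketch */

/*
 * Assumed shape of an off-by-one check: it accepts an index equal to
 * MAX_CPUMASK_BITS, which is one past the last bit of the bitmap.
 */
static bool index_ok_off_by_one(unsigned long idx)
{
    return !(idx > MAX_CPUMASK_BITS);
}

/* A bitmap of MAX_CPUMASK_BITS bits has valid indexes 0..MAX_CPUMASK_BITS-1. */
static bool index_fits_bitmap(unsigned long idx)
{
    return idx < MAX_CPUMASK_BITS;
}

/* Stricter check from the max_cpus commit: the index must be below max_cpus. */
static bool cpu_index_valid(unsigned long idx, unsigned long max_cpus)
{
    assert(max_cpus <= MAX_CPUMASK_BITS);
    return idx < max_cpus;
}
```

The two correct predicates differ only in the bound: the first protects the bitmap itself, the second additionally rejects indexes that no CPU can ever have for the given `max_cpus`.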
2015-03-10 | numa: remove superfluous '\n' around error_setg | Gonglei
Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-03-02 | Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging | Peter Maydell
NUMA fixes queue

# gpg: Signature made Mon Feb 23 19:28:42 2015 GMT using RSA key ID 984DC5A6
# gpg: Can't check signature: public key not found

* remotes/ehabkost/tags/numa-pull-request:
  numa: Rename set_numa_modes() to numa_post_machine_init()
  numa: Rename option parsing functions
  numa: Move QemuOpts parsing to set_numa_nodes()
  numa: Make max_numa_nodeid static
  numa: Move NUMA globals to numa.c
  vl.c: Remove unnecessary zero-initialization of NUMA globals
  numa: Move NUMA declarations from sysemu.h to numa.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-23 | numa: Rename set_numa_modes() to numa_post_machine_init() | Eduardo Habkost
This function does some initialization that needs to be done after machine init. The function may eventually be removed if we move the CPUState.numa_node initialization to the CPU init code, but while the function exists, let's give it a name that makes sense. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 | numa: Rename option parsing functions | Eduardo Habkost
Renaming set_numa_nodes() and numa_init_func() to parse_numa_opts() and parse_numa() makes the purpose of those functions clearer. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 | numa: Move QemuOpts parsing to set_numa_nodes() | Eduardo Habkost
This allows us to make numa_init_func() static. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 | numa: Make max_numa_nodeid static | Eduardo Habkost
Now the only code that uses the variable is inside numa.c. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 | numa: Move NUMA globals to numa.c | Eduardo Habkost
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-23 | numa: Move NUMA declarations from sysemu.h to numa.h | Eduardo Habkost
Not all sysemu.h users need the NUMA declarations, and keeping them in a separate file makes it easier to see what are the interfaces provided by numa.c. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-02-18 | numa: Avoid qerror_report_err() outside QMP command handlers | Markus Armbruster
qerror_report_err() is a transitional interface to help with converting existing monitor commands to QMP. It should not be used elsewhere. Replace by error_report_err() in initial startup helper numa_init_func() and board setup helper memory_region_allocate_system_memory(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2014-11-11 | numa: make 'info numa' take into account hotplugged memory | zhanghailiang
When memory is hotplugged and NUMA nodes are configured, the added memory size should be counted toward the corresponding node's memory size. This affects the output of the HMP command "info numa". Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-09 | memory: add parameter errp to memory_region_init_ram | Hu Tao
Add parameter errp to memory_region_init_ram and update all call sites to pass in &error_abort. Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-02 | hmp: fix MemdevList memory leak | Chen Fan
The memdev_list in hmp_info_memdev() is never freed, so use the existing qapi_free_MemdevList() to free it. qapi_free_MemdevList() can also replace the list loops that clean up the memdev list in the error path. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Reviewed-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 | query-memdev: fix potential memory leaks | Chen Fan
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Reviewed-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-14 | numa: show hex number in error message for consistency and prefix them with 0x | Hu Tao
The error messages before and after patch are:

before:
  qemu-system-x86_64: total memory for NUMA nodes (134217728) should equal RAM size (20000000)

after:
  qemu-system-x86_64: total memory for NUMA nodes (0x8000000) should equal RAM size (0x20000000)

Cc: qemu-stable@nongnu.org
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-07-06 | numa: check for busy memory backend | Hu Tao
Specifying the same memory backend twice leads to an assert:

  ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0
  qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion->container' failed.
  Aborted (core dumped)

Detect and exit with an error message instead.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
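The idea of detecting a busy backend before it is mapped a second time can be sketched with a simple "already in use" flag. The `Backend` type and `backend_claim()` are hypothetical and do not reflect how QEMU tracks backend usage internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory backend record: an id plus an "in use" flag. */
typedef struct {
    const char *id;
    bool mapped;
} Backend;

/*
 * Claim a backend for a NUMA node. Returns false if the backend is
 * already in use, so the caller can report an error and exit instead
 * of hitting an assertion later.
 */
static bool backend_claim(Backend *b)
{
    assert(b != NULL);
    if (b->mapped) {
        return false;
    }
    b->mapped = true;
    return true;
}
```

In the failing command line above, the first `memdev=ram0` would claim the backend successfully and the second would be rejected with an error message.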
2014-06-29 | numa: Reject configuration if not all node IDs are present | Eduardo Habkost
We don't support sparse NUMA node IDs yet, so this changes QEMU to reject configs where not all nodes are present. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
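A minimal sketch of a "no sparse node IDs" check, assuming the parser has recorded a presence flag per node ID. The function name and the flag array are hypothetical and are not taken from numa.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the node IDs marked present form the range 0..k-1
 * with no gaps (an empty configuration counts as valid).
 */
static bool node_ids_contiguous(const bool *present, size_t nb_slots)
{
    size_t highest_plus_one = 0;

    assert(present != NULL || nb_slots == 0);

    /* Find one past the highest node ID seen on the command line. */
    for (size_t i = 0; i < nb_slots; i++) {
        if (present[i]) {
            highest_plus_one = i + 1;
        }
    }
    /* Every ID below that bound must be present as well. */
    for (size_t i = 0; i < highest_plus_one; i++) {
        if (!present[i]) {
            return false;
        }
    }
    return true;
}
```

For example, configuring node IDs 0 and 2 without node 1 fails the check, while IDs 0 and 1 pass.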
2014-06-29 | numa: Reject duplicate node IDs | Eduardo Habkost
The same nodeid shouldn't appear multiple times in the command-line. In addition to detecting command-line mistakes, this will fix a bug where nb_numa_nodes may become larger than MAX_NODES (and cause out-of-bounds access on the numa_info array). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com>
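The link between duplicate IDs and the out-of-bounds risk can be sketched as follows: if every accepted ID is unique and below `MAX_NODES`, the node count can never exceed `MAX_NODES`. The `NodeTable` type, the result enum, and the `MAX_NODES` value are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8  /* illustrative limit, not QEMU's value */

/* Hypothetical table of node IDs seen so far. */
typedef struct {
    bool present[MAX_NODES];
    size_t nb_nodes;
} NodeTable;

typedef enum {
    NODE_OK,
    NODE_OUT_OF_RANGE,
    NODE_DUPLICATE
} NodeAddResult;

/*
 * Register a node ID. Rejecting duplicates guarantees that nb_nodes
 * counts distinct IDs only, so it stays within MAX_NODES.
 */
static NodeAddResult node_table_add(NodeTable *t, size_t nodeid)
{
    assert(t != NULL);
    if (nodeid >= MAX_NODES) {
        return NODE_OUT_OF_RANGE;
    }
    if (t->present[nodeid]) {
        return NODE_DUPLICATE;
    }
    t->present[nodeid] = true;
    t->nb_nodes++;
    return NODE_OK;
}
```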
2014-06-29 | numa: Keep track of NUMA nodes present on the command-line | Eduardo Habkost
Based on "enable sparse node numbering" patch from Nishanth Aravamudan, but without the code to actually support sparse node IDs. This just adds the code to keep track of present/non-present nodes on the command-line, without changing any behavior. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> [Rename max_numa_node to max_numa_nodeid -Eduardo] [Initialize max_numa_nodeid to 0 -Eduardo] [Use MAX() macro when setting max_numa_nodeid -Eduardo] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-29 | numa: fix comment | Michael S. Tsirkin
s/if given for/is given for/; Reported-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 | numa: fix comment | Michael S. Tsirkin
Fix up English in comments: s/the each/each/ Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2014-06-19 | numa: handle mmaped memory allocation failure correctly | Igor Mammedov
When memory_region_init_ram_from_file() fails, memory_region_size() still returns the size that was provided at region init time. Use errp instead to detect the error condition properly. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | qmp: add query-memdev | Hu Tao
Add qmp command query-memdev to query for information of memory devices Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | hostmem: add property to map memory with MAP_SHARED | Paolo Bonzini
A new "share" property can be used with the "memory-file" backend to map memory with MAP_SHARED instead of MAP_PRIVATE. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | memory: add error propagation to file-based RAM allocation | Paolo Bonzini
Right now, -mem-path will fall back to RAM-based allocation in some cases. This should never happen with "-object memory-file", prepare the code by adding correct error propagation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> MST: drop \n at end of error messages
2014-06-19 | memory: move mem_path handling to memory_region_allocate_system_memory | Paolo Bonzini
Like the previous patch did in exec.c, split memory_region_init_ram and memory_region_init_ram_from_file, and push mem_path one step further up. Other RAM regions than system memory will now be backed by regular RAM. Also, boards that do not use memory_region_allocate_system_memory will not support -mem-path anymore. This can be changed before the patches are merged by migrating boards to use the function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | numa: add -numa node,memdev= option | Paolo Bonzini
This option provides the infrastructure for binding guest NUMA nodes to host NUMA nodes. For example:

  -object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
  -numa node,nodeid=1,cpus=1,memdev=ram-node1

The option replaces "-numa node,mem=".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MST: conflict resolution
2014-06-19 | numa: introduce memory_region_allocate_system_memory | Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> MST: resolve conflicts
2014-06-19 | NUMA: convert -numa option to use OptsVisitor | Wanlong Gao
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | NUMA: Add numa_info structure to contain numa nodes info | Wanlong Gao
Add the numa_info structure to hold each NUMA node's memory and VCPU information and, in the future, the nodes' host memory policies. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> [Fix hw/ppc/spapr.c - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 | NUMA: check if the total numa memory size is equal to ram_size | Wanlong Gao
If the total memory assigned to the NUMA nodes is not equal to the assigned RAM size, wrong data is written to the ACPI table; the guest then ignores the faulty ACPI table and attributes all memory to one node. Check this to ensure that we write the right data to the ACPI table. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> MST: error message reworded
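The consistency check this commit introduces amounts to summing the per-node memory sizes and comparing the total with the RAM size. The following is a sketch under that reading; the function name and the plain array of sizes are hypothetical and stand in for whatever structure numa.c uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the memory assigned to all NUMA nodes adds up to
 * exactly ram_size. A caller would report an error and exit otherwise.
 */
static bool numa_memory_matches_ram(const uint64_t *node_mem, size_t nb_nodes,
                                    uint64_t ram_size)
{
    uint64_t total = 0;

    assert(node_mem != NULL || nb_nodes == 0);
    for (size_t i = 0; i < nb_nodes; i++) {
        total += node_mem[i];
    }
    return total == ram_size;
}
```

Two nodes of 0x4000000 bytes each total 0x8000000, so a guest started with 0x20000000 bytes of RAM would fail this check.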
2014-06-19 | NUMA: move numa related code to new file numa.c | Wanlong Gao
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> MST: comment tweaks