diff options
Diffstat (limited to 'docs/devel/tracing.rst')
-rw-r--r-- | docs/devel/tracing.rst | 498 |
1 files changed, 498 insertions, 0 deletions
diff --git a/docs/devel/tracing.rst b/docs/devel/tracing.rst new file mode 100644 index 0000000000..ba83954899 --- /dev/null +++ b/docs/devel/tracing.rst @@ -0,0 +1,498 @@ +======= +Tracing +======= + +Introduction +============ + +This document describes the tracing infrastructure in QEMU and how to use it +for debugging, profiling, and observing execution. + +Quickstart +========== + +Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write`` +events:: + + $ qemu --trace "memory_region_ops_*" ... + ... + 719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1 + 719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2 + +This output comes from the "log" trace backend that is enabled by default when +``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified. + +Multiple patterns can be specified by repeating the ``--trace`` option:: + + $ qemu --trace "kvm_*" --trace "virtio_*" ... + +When patterns are used frequently it is more convenient to store them in a +file to avoid long command-line options:: + + $ echo "memory_region_ops_*" >/tmp/events + $ echo "kvm_*" >>/tmp/events + $ qemu --trace events=/tmp/events ... + +Trace events +============ + +Sub-directory setup +------------------- + +Each directory in the source tree can declare a set of trace events in a local +"trace-events" file. All directories which contain "trace-events" files must be +listed in the "trace_events_subdirs" variable in the top level meson.build +file. During build, the "trace-events" file in each listed subdirectory will be +processed by the "tracetool" script to generate code for the trace events. + +The individual "trace-events" files are merged into a "trace-events-all" file, +which is also installed into "/usr/share/qemu" with the name "trace-events". +This merged file is to be used by the "simpletrace.py" script to later analyse +traces in the simpletrace data format. + +The following files are automatically generated in <builddir>/trace/ during the +build: + + - trace-<subdir>.c - the trace event state declarations + - trace-<subdir>.h - the trace event enums and probe functions + - trace-dtrace-<subdir>.h - DTrace event probe specification + - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration + - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace) + - trace-ust-<subdir>.h - UST event probe helper declarations + +Here <subdir> is the sub-directory path with '/' replaced by '_'. For example, +"accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c" +becomes "trace-accel_kvm.c". + +Source files in the source tree do not directly include generated files in +"<builddir>/trace/". Instead they #include the local "trace.h" file, without +any sub-directory path prefix. eg io/channel-buffer.c would do:: + + #include "trace.h" + +The "io/trace.h" file must be created manually with an #include of the +corresponding "trace/trace-<subdir>.h" file that will be generated in the +builddir:: + + $ echo '#include "trace/trace-io.h"' >io/trace.h + +While it is possible to include a trace.h file from outside a source file's own +sub-directory, this is discouraged in general. It is strongly preferred that +all events be declared directly in the sub-directory that uses them. The only +exception is where there are some shared trace events defined in the top level +directory trace-events file. The top level directory generates trace files +with a filename prefix of "trace/trace-root" instead of just "trace". This is +to avoid ambiguity between a trace.h in the current directory, vs the top level +directory. + +Using trace events +------------------ + +Trace events are invoked directly from source code like this:: + + #include "trace.h" /* needed for trace event prototype */ + + void *qemu_vmalloc(size_t size) + { + void *ptr; + size_t align = QEMU_VMALLOC_ALIGN; + + if (size < align) { + align = getpagesize(); + } + ptr = qemu_memalign(align, size); + trace_qemu_vmalloc(size, ptr); + return ptr; + } + +Declaring trace events +---------------------- + +The "tracetool" script produces the trace.h header file which is included by +every source file that uses trace events. Since many source files include +trace.h, it uses a minimum of types and other header files included to keep the +namespace clean and compile times and dependencies down. + +Trace events should use types as follows: + + * Use stdint.h types for fixed-size types. Most offsets and guest memory + addresses are best represented with uint32_t or uint64_t. Use fixed-size + types over primitive types whose size may change depending on the host + (32-bit versus 64-bit) so trace events don't truncate values or break + the build. + + * Use void * for pointers to structs or for arrays. The trace.h header + cannot include all user-defined struct declarations and it is therefore + necessary to use void * for pointers to structs. + + * For everything else, use primitive scalar types (char, int, long) with the + appropriate signedness. + + * Avoid floating point types (float and double) because SystemTap does not + support them. In most cases it is possible to round to an integer type + instead. This may require scaling the value first by multiplying it by 1000 + or the like when digits after the decimal point need to be preserved. + +Format strings should reflect the types defined in the trace event. Take +special care to use PRId64 and PRIu64 for int64_t and uint64_t types, +respectively. This ensures portability between 32- and 64-bit platforms. +Format strings must not end with a newline character. It is the responsibility +of backends to adapt line ending for proper logging. + +Each event declaration will start with the event name, then its arguments, +finally a format string for pretty-printing. For example:: + + qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" + qemu_vfree(void *ptr) "ptr %p" + + +Hints for adding new trace events +--------------------------------- + +1. Trace state changes in the code. Interesting points in the code usually + involve a state change like starting, stopping, allocating, freeing. State + changes are good trace events because they can be used to understand the + execution of the system. + +2. Trace guest operations. Guest I/O accesses like reading device registers + are good trace events because they can be used to understand guest + interactions. + +3. Use correlator fields so the context of an individual line of trace output + can be understood. For example, trace the pointer returned by malloc and + used as an argument to free. This way mallocs and frees can be matched up. + Trace events with no context are not very useful. + +4. Name trace events after their function. If there are multiple trace events + in one function, append a unique distinguisher at the end of the name. + +Generic interface and monitor commands +====================================== + +You can programmatically query and control the state of trace events through a +backend-agnostic interface provided by the header "trace/control.h". + +Note that some of the backends do not provide an implementation for some parts +of this interface, in which case QEMU will just print a warning (please refer to +header "trace/control.h" to see which routines are backend-dependent). + +The state of events can also be queried and modified through monitor commands: + +* ``info trace-events`` + View available trace events and their state. State 1 means enabled, state 0 + means disabled. + +* ``trace-event NAME on|off`` + Enable/disable a given trace event or a group of events (using wildcards). + +The "--trace events=<file>" command line argument can be used to enable the +events listed in <file> from the very beginning of the program. This file must +contain one event name per line. + +If a line in the "--trace events=<file>" file begins with a '-', the trace event +will be disabled instead of enabled. This is useful when a wildcard was used +to enable an entire family of events but one noisy event needs to be disabled. + +Wildcard matching is supported in both the monitor command "trace-event" and the +events list file. That means you can enable/disable the events having a common +prefix in a batch. For example, virtio-blk trace events could be enabled using +the following monitor command:: + + trace-event virtio_blk_* on + +Trace backends +============== + +The "tracetool" script automates tedious trace event code generation and also +keeps the trace event declarations independent of the trace backend. The trace +events are not tightly coupled to a specific trace backend, such as LTTng or +SystemTap. Support for trace backends can be added by extending the "tracetool" +script. + +The trace backends are chosen at configure time:: + + ./configure --enable-trace-backends=simple,dtrace + +For a list of supported trace backends, try ./configure --help or see below. +If multiple backends are enabled, the trace is sent to them all. + +If no backends are explicitly selected, configure will default to the +"log" backend. + +The following subsections describe the supported trace backends. + +Nop +--- + +The "nop" backend generates empty trace event functions so that the compiler +can optimize out trace events completely. This imposes no performance +penalty. + +Note that regardless of the selected trace backend, events with the "disable" +property will be generated with the "nop" backend. + +Log +--- + +The "log" backend sends trace events directly to standard error. This +effectively turns trace events into debug printfs. + +This is the simplest backend and can be used together with existing code that +uses DPRINTF(). + +The -msg timestamp=on|off command-line option controls whether or not to print +the tid/timestamp prefix for each trace event. + +Simpletrace +----------- + +The "simple" backend writes binary trace logs to a file from a thread, making +it lower overhead than the "log" backend. A Python API is available for writing +offline trace file analysis scripts. It may not be as powerful as +platform-specific or third-party trace backends but it is portable and has no +special library dependencies. + +Monitor commands +~~~~~~~~~~~~~~~~ + +* ``trace-file on|off|flush|set <path>`` + Enable/disable/flush the trace file or set the trace file name. + +Analyzing trace files +~~~~~~~~~~~~~~~~~~~~~ + +The "simple" backend produces binary trace files that can be formatted with the +simpletrace.py script. The script takes the "trace-events-all" file and the +binary trace:: + + ./scripts/simpletrace.py trace-events-all trace-12345 + +You must ensure that the same "trace-events-all" file was used to build QEMU, +otherwise trace event declarations may have changed and output will not be +consistent. + +Ftrace +------ + +The "ftrace" backend writes trace data to ftrace marker. This effectively +sends trace events to ftrace ring buffer, and you can compare qemu trace +data and kernel(especially kvm.ko when using KVM) trace data. + +if you use KVM, enable kvm events in ftrace:: + + # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable + +After running qemu by root user, you can get the trace:: + + # cat /sys/kernel/debug/tracing/trace + +Restriction: "ftrace" backend is restricted to Linux only. + +Syslog +------ + +The "syslog" backend sends trace events using the POSIX syslog API. The log +is opened specifying the LOG_DAEMON facility and LOG_PID option (so events +are tagged with the pid of the particular QEMU process that generated +them). All events are logged at LOG_INFO level. + +NOTE: syslog may squash duplicate consecutive trace events and apply rate + limiting. + +Restriction: "syslog" backend is restricted to POSIX compliant OS. + +LTTng Userspace Tracer +---------------------- + +The "ust" backend uses the LTTng Userspace Tracer library. There are no +monitor commands built into QEMU, instead UST utilities should be used to list, +enable/disable, and dump traces. + +Package lttng-tools is required for userspace tracing. You must ensure that the +current user belongs to the "tracing" group, or manually launch the +lttng-sessiond daemon for the current user prior to running any instance of +QEMU. + +While running an instrumented QEMU, LTTng should be able to list all available +events:: + + lttng list -u + +Create tracing session:: + + lttng create mysession + +Enable events:: + + lttng enable-event qemu:g_malloc -u + +Where the events can either be a comma-separated list of events, or "-a" to +enable all tracepoint events. Start and stop tracing as needed:: + + lttng start + lttng stop + +View the trace:: + + lttng view + +Destroy tracing session:: + + lttng destroy + +Babeltrace can be used at any later time to view the trace:: + + babeltrace $HOME/lttng-traces/mysession-<date>-<time> + +SystemTap +--------- + +The "dtrace" backend uses DTrace sdt probes but has only been tested with +SystemTap. When SystemTap support is detected a .stp file with wrapper probes +is generated to make use in scripts more convenient. This step can also be +performed manually after a build in order to change the binary name in the .stp +probes:: + + scripts/tracetool.py --backends=dtrace --format=stap \ + --binary path/to/qemu-binary \ + --target-type system \ + --target-name x86_64 \ + --group=all \ + trace-events-all \ + qemu.stp + +To facilitate simple usage of systemtap where there merely needs to be printf +logging of certain probes, a helper script "qemu-trace-stap" is provided. +Consult its manual page for guidance on its usage. + +Trace event properties +====================== + +Each event in the "trace-events-all" file can be prefixed with a space-separated +list of zero or more of the following event properties. + +"disable" +--------- + +If a specific trace event is going to be invoked a huge number of times, this +might have a noticeable performance impact even when the event is +programmatically disabled. + +In this case you should declare such event with the "disable" property. This +will effectively disable the event at compile time (by using the "nop" backend), +thus having no performance impact at all on regular builds (i.e., unless you +edit the "trace-events-all" file). + +In addition, there might be cases where relatively complex computations must be +performed to generate values that are only used as arguments for a trace +function. In these cases you can use 'trace_event_get_state_backends()' to +guard such computations, so they are skipped if the event has been either +compile-time disabled or run-time disabled. If the event is compile-time +disabled, this check will have no performance impact. + +:: + + #include "trace.h" /* needed for trace event prototype */ + + void *qemu_vmalloc(size_t size) + { + void *ptr; + size_t align = QEMU_VMALLOC_ALIGN; + + if (size < align) { + align = getpagesize(); + } + ptr = qemu_memalign(align, size); + if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) { + void *complex; + /* some complex computations to produce the 'complex' value */ + trace_qemu_vmalloc(size, ptr, complex); + } + return ptr; + } + +"tcg" +----- + +Guest code generated by TCG can be traced by defining an event with the "tcg" +event property. Internally, this property generates two events: +"<eventname>_trans" to trace the event at translation time, and +"<eventname>_exec" to trace the event at execution time. + +Instead of using these two events, you should instead use the function +"trace_<eventname>_tcg" during translation (TCG code generation). This function +will automatically call "trace_<eventname>_trans", and will generate the +necessary TCG code to call "trace_<eventname>_exec" during guest code execution. + +Events with the "tcg" property can be declared in the "trace-events" file with a +mix of native and TCG types, and "trace_<eventname>_tcg" will gracefully forward +them to the "<eventname>_trans" and "<eventname>_exec" events. Since TCG values +are not known at translation time, these are ignored by the "<eventname>_trans" +event. Because of this, the entry in the "trace-events" file needs two printing +formats (separated by a comma):: + + tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d", "a1=%d a2=%d" + +For example:: + + #include "trace-tcg.h" + + void some_disassembly_func (...) + { + uint8_t a1 = ...; + TCGv_i32 a2 = ...; + trace_foo_tcg(a1, a2); + } + +This will immediately call:: + + void trace_foo_trans(uint8_t a1); + +and will generate the TCG code to call:: + + void trace_foo(uint8_t a1, uint32_t a2); + +"vcpu" +------ + +Identifies events that trace vCPU-specific information. It implicitly adds a +"CPUState*" argument, and extends the tracing print format to show the vCPU +information. If used together with the "tcg" property, it adds a second +"TCGv_env" argument that must point to the per-target global TCG register that +points to the vCPU when guest code is executed (usually the "cpu_env" variable). + +The "tcg" and "vcpu" properties are currently only honored in the root +./trace-events file. + +The following example events:: + + foo(uint32_t a) "a=%x" + vcpu bar(uint32_t a) "a=%x" + tcg vcpu baz(uint32_t a) "a=%x", "a=%x" + +Can be used as:: + + #include "trace-tcg.h" + + CPUArchState *env; + TCGv_ptr cpu_env; + + void some_disassembly_func(...) + { + /* trace emitted at this point */ + trace_foo(0xd1); + /* trace emitted at this point */ + trace_bar(env_cpu(env), 0xd2); + /* trace emitted at this point (env) and when guest code is executed (cpu_env) */ + trace_baz_tcg(env_cpu(env), cpu_env, 0xd3); + } + +If the translating vCPU has address 0xc1 and code is later executed by vCPU +0xc2, this would be an example output:: + + // at guest code translation + foo a=0xd1 + bar cpu=0xc1 a=0xd2 + baz_trans cpu=0xc1 a=0xd3 + // at guest code execution + baz_exec cpu=0xc2 a=0xd3 |