1 files changed, 498 insertions, 0 deletions
diff --git a/docs/devel/tracing.rst b/docs/devel/tracing.rst
new file mode 100644
index 0000000000..ba83954899
--- /dev/null
+++ b/docs/devel/tracing.rst
@@ -0,0 +1,498 @@
+=======
+Tracing
+=======
+
+Introduction
+============
+
+This document describes the tracing infrastructure in QEMU and how to use it
+for debugging, profiling, and observing execution.
+
+Quickstart
+==========
+
+Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write``
+events::
+
+    $ qemu --trace "memory_region_ops_*" ...
+    ...
+    719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1
+    719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2
+
+This output comes from the "log" trace backend that is enabled by default when
+``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified.
+
+Multiple patterns can be specified by repeating the ``--trace`` option::
+
+    $ qemu --trace "kvm_*" --trace "virtio_*" ...
+
+When patterns are used frequently it is more convenient to store them in a
+file to avoid long command-line options::
+
+    $ echo "memory_region_ops_*" >/tmp/events
+    $ echo "kvm_*" >>/tmp/events
+    $ qemu --trace events=/tmp/events ...
+
+Trace events
+============
+
+Sub-directory setup
+-------------------
+
+Each directory in the source tree can declare a set of trace events in a local
+"trace-events" file. All directories which contain "trace-events" files must be
+listed in the "trace_events_subdirs" variable in the top level meson.build
+file. During build, the "trace-events" file in each listed subdirectory will be
+processed by the "tracetool" script to generate code for the trace events.
+
+The individual "trace-events" files are merged into a "trace-events-all" file,
+which is also installed into "/usr/share/qemu" with the name "trace-events".
+This merged file is to be used by the "simpletrace.py" script to later analyse
+traces in the simpletrace data format.
+
+The following files are automatically generated in <builddir>/trace/ during the
+build:
+
+ - trace-<subdir>.c - the trace event state declarations
+ - trace-<subdir>.h - the trace event enums and probe functions
+ - trace-dtrace-<subdir>.h - DTrace event probe specification
+ - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration
+ - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace)
+ - trace-ust-<subdir>.h - UST event probe helper declarations
+
+Here <subdir> is the sub-directory path with '/' replaced by '_'. For example,
+"accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c"
+becomes "trace-accel_kvm.c".
+
+Source files in the source tree do not directly include generated files in
+"<builddir>/trace/". Instead they #include the local "trace.h" file, without
+any sub-directory path prefix. eg io/channel-buffer.c would do::
+
+  #include "trace.h"
+
+The "io/trace.h" file must be created manually with an #include of the
+corresponding "trace/trace-<subdir>.h" file that will be generated in the
+builddir::
+
+  $ echo '#include "trace/trace-io.h"' >io/trace.h
+
+While it is possible to include a trace.h file from outside a source file's own
+sub-directory, this is discouraged in general. It is strongly preferred that
+all events be declared directly in the sub-directory that uses them. The only
+exception is where there are some shared trace events defined in the top level
+directory trace-events file.  The top level directory generates trace files
+with a filename prefix of "trace/trace-root" instead of just "trace". This is
+to avoid ambiguity between a trace.h in the current directory, vs the top level
+directory.
+
+Using trace events
+------------------
+
+Trace events are invoked directly from source code like this::
+
+    #include "trace.h"  /* needed for trace event prototype */
+    
+    void *qemu_vmalloc(size_t size)
+    {
+        void *ptr;
+        size_t align = QEMU_VMALLOC_ALIGN;
+     
+        if (size < align) {
+            align = getpagesize();
+        }
+        ptr = qemu_memalign(align, size);
+        trace_qemu_vmalloc(size, ptr);
+        return ptr;
+    }
+
+Declaring trace events
+----------------------
+
+The "tracetool" script produces the trace.h header file which is included by
+every source file that uses trace events.  Since many source files include
+trace.h, it uses a minimum of types and other header files included to keep the
+namespace clean and compile times and dependencies down.
+
+Trace events should use types as follows:
+
+ * Use stdint.h types for fixed-size types.  Most offsets and guest memory
+   addresses are best represented with uint32_t or uint64_t.  Use fixed-size
+   types over primitive types whose size may change depending on the host
+   (32-bit versus 64-bit) so trace events don't truncate values or break
+   the build.
+
+ * Use void * for pointers to structs or for arrays.  The trace.h header
+   cannot include all user-defined struct declarations and it is therefore
+   necessary to use void * for pointers to structs.
+
+ * For everything else, use primitive scalar types (char, int, long) with the
+   appropriate signedness.
+
+ * Avoid floating point types (float and double) because SystemTap does not
+   support them.  In most cases it is possible to round to an integer type
+   instead.  This may require scaling the value first by multiplying it by 1000
+   or the like when digits after the decimal point need to be preserved.
+
+Format strings should reflect the types defined in the trace event.  Take
+special care to use PRId64 and PRIu64 for int64_t and uint64_t types,
+respectively.  This ensures portability between 32- and 64-bit platforms.
+Format strings must not end with a newline character.  It is the responsibility
+of backends to adapt line ending for proper logging.
+
+Each event declaration will start with the event name, then its arguments,
+finally a format string for pretty-printing. For example::
+
+    qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p"
+    qemu_vfree(void *ptr) "ptr %p"
+
+
+Hints for adding new trace events
+---------------------------------
+
+1. Trace state changes in the code.  Interesting points in the code usually
+   involve a state change like starting, stopping, allocating, freeing.  State
+   changes are good trace events because they can be used to understand the
+   execution of the system.
+
+2. Trace guest operations.  Guest I/O accesses like reading device registers
+   are good trace events because they can be used to understand guest
+   interactions.
+
+3. Use correlator fields so the context of an individual line of trace output
+   can be understood.  For example, trace the pointer returned by malloc and
+   used as an argument to free.  This way mallocs and frees can be matched up.
+   Trace events with no context are not very useful.
+
+4. Name trace events after their function.  If there are multiple trace events
+   in one function, append a unique distinguisher at the end of the name.
+
+Generic interface and monitor commands
+======================================
+
+You can programmatically query and control the state of trace events through a
+backend-agnostic interface provided by the header "trace/control.h".
+
+Note that some of the backends do not provide an implementation for some parts
+of this interface, in which case QEMU will just print a warning (please refer to
+header "trace/control.h" to see which routines are backend-dependent).
+
+The state of events can also be queried and modified through monitor commands:
+
+* ``info trace-events``
+  View available trace events and their state.  State 1 means enabled, state 0
+  means disabled.
+
+* ``trace-event NAME on|off``
+  Enable/disable a given trace event or a group of events (using wildcards).
+
+The "--trace events=<file>" command line argument can be used to enable the
+events listed in <file> from the very beginning of the program. This file must
+contain one event name per line.
+
+If a line in the "--trace events=<file>" file begins with a '-', the trace event
+will be disabled instead of enabled.  This is useful when a wildcard was used
+to enable an entire family of events but one noisy event needs to be disabled.
+
+Wildcard matching is supported in both the monitor command "trace-event" and the
+events list file. That means you can enable/disable the events having a common
+prefix in a batch. For example, virtio-blk trace events could be enabled using
+the following monitor command::
+
+    trace-event virtio_blk_* on
+
+Trace backends
+==============
+
+The "tracetool" script automates tedious trace event code generation and also
+keeps the trace event declarations independent of the trace backend.  The trace
+events are not tightly coupled to a specific trace backend, such as LTTng or
+SystemTap.  Support for trace backends can be added by extending the "tracetool"
+script.
+
+The trace backends are chosen at configure time::
+
+    ./configure --enable-trace-backends=simple,dtrace
+
+For a list of supported trace backends, try ./configure --help or see below.
+If multiple backends are enabled, the trace is sent to them all.
+
+If no backends are explicitly selected, configure will default to the
+"log" backend.
+
+The following subsections describe the supported trace backends.
+
+Nop
+---
+
+The "nop" backend generates empty trace event functions so that the compiler
+can optimize out trace events completely.  This imposes no performance
+penalty.
+
+Note that regardless of the selected trace backend, events with the "disable"
+property will be generated with the "nop" backend.
+
+Log
+---
+
+The "log" backend sends trace events directly to standard error.  This
+effectively turns trace events into debug printfs.
+
+This is the simplest backend and can be used together with existing code that
+uses DPRINTF().
+
+The -msg timestamp=on|off command-line option controls whether or not to print
+the tid/timestamp prefix for each trace event.
+
+Simpletrace
+-----------
+
+The "simple" backend writes binary trace logs to a file from a thread, making
+it lower overhead than the "log" backend. A Python API is available for writing
+offline trace file analysis scripts. It may not be as powerful as
+platform-specific or third-party trace backends but it is portable and has no
+special library dependencies.
+
+Monitor commands
+~~~~~~~~~~~~~~~~
+
+* ``trace-file on|off|flush|set <path>``
+  Enable/disable/flush the trace file or set the trace file name.
+
+Analyzing trace files
+~~~~~~~~~~~~~~~~~~~~~
+
+The "simple" backend produces binary trace files that can be formatted with the
+simpletrace.py script.  The script takes the "trace-events-all" file and the
+binary trace::
+
+    ./scripts/simpletrace.py trace-events-all trace-12345
+
+You must ensure that the same "trace-events-all" file was used to build QEMU,
+otherwise trace event declarations may have changed and output will not be
+consistent.
+
+Ftrace
+------
+
+The "ftrace" backend writes trace data to ftrace marker. This effectively
+sends trace events to ftrace ring buffer, and you can compare qemu trace
+data and kernel(especially kvm.ko when using KVM) trace data.
+
+if you use KVM, enable kvm events in ftrace::
+
+   # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
+
+After running qemu by root user, you can get the trace::
+
+   # cat /sys/kernel/debug/tracing/trace
+
+Restriction: "ftrace" backend is restricted to Linux only.
+
+Syslog
+------
+
+The "syslog" backend sends trace events using the POSIX syslog API. The log
+is opened specifying the LOG_DAEMON facility and LOG_PID option (so events
+are tagged with the pid of the particular QEMU process that generated
+them). All events are logged at LOG_INFO level.
+
+NOTE: syslog may squash duplicate consecutive trace events and apply rate
+      limiting.
+
+Restriction: "syslog" backend is restricted to POSIX compliant OS.
+
+LTTng Userspace Tracer
+----------------------
+
+The "ust" backend uses the LTTng Userspace Tracer library.  There are no
+monitor commands built into QEMU, instead UST utilities should be used to list,
+enable/disable, and dump traces.
+
+Package lttng-tools is required for userspace tracing. You must ensure that the
+current user belongs to the "tracing" group, or manually launch the
+lttng-sessiond daemon for the current user prior to running any instance of
+QEMU.
+
+While running an instrumented QEMU, LTTng should be able to list all available
+events::
+
+    lttng list -u
+
+Create tracing session::
+
+    lttng create mysession
+
+Enable events::
+
+    lttng enable-event qemu:g_malloc -u
+
+Where the events can either be a comma-separated list of events, or "-a" to
+enable all tracepoint events. Start and stop tracing as needed::
+
+    lttng start
+    lttng stop
+
+View the trace::
+
+    lttng view
+
+Destroy tracing session::
+
+    lttng destroy
+
+Babeltrace can be used at any later time to view the trace::
+
+    babeltrace $HOME/lttng-traces/mysession-<date>-<time>
+
+SystemTap
+---------
+
+The "dtrace" backend uses DTrace sdt probes but has only been tested with
+SystemTap.  When SystemTap support is detected a .stp file with wrapper probes
+is generated to make use in scripts more convenient.  This step can also be
+performed manually after a build in order to change the binary name in the .stp
+probes::
+
+    scripts/tracetool.py --backends=dtrace --format=stap \
+                         --binary path/to/qemu-binary \
+                         --target-type system \
+                         --target-name x86_64 \
+                         --group=all \
+                         trace-events-all \
+                         qemu.stp
+
+To facilitate simple usage of systemtap where there merely needs to be printf
+logging of certain probes, a helper script "qemu-trace-stap" is provided.
+Consult its manual page for guidance on its usage.
+
+Trace event properties
+======================
+
+Each event in the "trace-events-all" file can be prefixed with a space-separated
+list of zero or more of the following event properties.
+
+"disable"
+---------
+
+If a specific trace event is going to be invoked a huge number of times, this
+might have a noticeable performance impact even when the event is
+programmatically disabled.
+
+In this case you should declare such event with the "disable" property. This
+will effectively disable the event at compile time (by using the "nop" backend),
+thus having no performance impact at all on regular builds (i.e., unless you
+edit the "trace-events-all" file).
+
+In addition, there might be cases where relatively complex computations must be
+performed to generate values that are only used as arguments for a trace
+function. In these cases you can use 'trace_event_get_state_backends()' to
+guard such computations, so they are skipped if the event has been either
+compile-time disabled or run-time disabled. If the event is compile-time
+disabled, this check will have no performance impact.
+
+::
+
+    #include "trace.h"  /* needed for trace event prototype */
+    
+    void *qemu_vmalloc(size_t size)
+    {
+        void *ptr;
+        size_t align = QEMU_VMALLOC_ALIGN;
+    
+        if (size < align) {
+            align = getpagesize();
+        }
+        ptr = qemu_memalign(align, size);
+        if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) {
+            void *complex;
+            /* some complex computations to produce the 'complex' value */
+            trace_qemu_vmalloc(size, ptr, complex);
+        }
+        return ptr;
+    }
+
+"tcg"
+-----
+
+Guest code generated by TCG can be traced by defining an event with the "tcg"
+event property. Internally, this property generates two events:
+"<eventname>_trans" to trace the event at translation time, and
+"<eventname>_exec" to trace the event at execution time.
+
+Instead of using these two events, you should instead use the function
+"trace_<eventname>_tcg" during translation (TCG code generation). This function
+will automatically call "trace_<eventname>_trans", and will generate the
+necessary TCG code to call "trace_<eventname>_exec" during guest code execution.
+
+Events with the "tcg" property can be declared in the "trace-events" file with a
+mix of native and TCG types, and "trace_<eventname>_tcg" will gracefully forward
+them to the "<eventname>_trans" and "<eventname>_exec" events. Since TCG values
+are not known at translation time, these are ignored by the "<eventname>_trans"
+event. Because of this, the entry in the "trace-events" file needs two printing
+formats (separated by a comma)::
+
+    tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d", "a1=%d a2=%d"
+
+For example::
+
+    #include "trace-tcg.h"
+    
+    void some_disassembly_func (...)
+    {
+        uint8_t a1 = ...;
+        TCGv_i32 a2 = ...;
+        trace_foo_tcg(a1, a2);
+    }
+
+This will immediately call::
+
+    void trace_foo_trans(uint8_t a1);
+
+and will generate the TCG code to call::
+
+    void trace_foo(uint8_t a1, uint32_t a2);
+
+"vcpu"
+------
+
+Identifies events that trace vCPU-specific information. It implicitly adds a
+"CPUState*" argument, and extends the tracing print format to show the vCPU
+information. If used together with the "tcg" property, it adds a second
+"TCGv_env" argument that must point to the per-target global TCG register that
+points to the vCPU when guest code is executed (usually the "cpu_env" variable).
+
+The "tcg" and "vcpu" properties are currently only honored in the root
+./trace-events file.
+
+The following example events::
+
+    foo(uint32_t a) "a=%x"
+    vcpu bar(uint32_t a) "a=%x"
+    tcg vcpu baz(uint32_t a) "a=%x", "a=%x"
+
+Can be used as::
+
+    #include "trace-tcg.h"
+    
+    CPUArchState *env;
+    TCGv_ptr cpu_env;
+    
+    void some_disassembly_func(...)
+    {
+        /* trace emitted at this point */
+        trace_foo(0xd1);
+        /* trace emitted at this point */
+        trace_bar(env_cpu(env), 0xd2);
+        /* trace emitted at this point (env) and when guest code is executed (cpu_env) */
+        trace_baz_tcg(env_cpu(env), cpu_env, 0xd3);
+    }
+
+If the translating vCPU has address 0xc1 and code is later executed by vCPU
+0xc2, this would be an example output::
+
+    // at guest code translation
+    foo a=0xd1
+    bar cpu=0xc1 a=0xd2
+    baz_trans cpu=0xc1 a=0xd3
+    // at guest code execution
+    baz_exec cpu=0xc2 a=0xd3