stapprobes - systemtap probe points
The following sections enumerate the variety of probe points supported by the systemtap translator, and some of the additional aliases defined by standard tapset scripts. Many are individually documented in the 3stap manual section, with the probe:: prefix.
probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
A probe declaration may list multiple comma-separated probe points in
order to attach a handler to all of the named events. Normally, the
handler statements are run whenever any of the events occur.
The syntax of a single probe point is a general dotted-symbol sequence.
This allows a breakdown of the event namespace into parts, somewhat
like the Domain Name System does on the Internet. Each component
identifier may be parametrized by a string or number literal, with a
syntax like a function call. A component may include a "*" character,
to expand to a set of matching probe points. It may also include "**"
to match multiple sequential components at once. Probe aliases
likewise expand to other probe points.
Probe aliases can be given on their own, or with a suffix. The suffix
attaches to the underlying probe point that the alias is expanded to.
For example,
syscall.read.return.maxactive(10)
expands to
kernel.function("sys_read").return.maxactive(10)
with the component maxactive(10) being recognized as a suffix.
Normally, each and every probe point resulting from wildcard- and
alias-expansion must be resolved to some low-level system
instrumentation facility (e.g., a kprobe address, marker, or a timer
configuration), otherwise the elaboration phase will fail.
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of alias/wildcard
expansion. Alternately, a probe point may be followed by a "!"
character, to indicate that it is both optional and sufficient. (Think
vaguely of the Prolog cut operator.) If it does resolve, then no
further probe points in the same comma-separated list will be resolved.
Therefore, the "!" sufficiency mark only makes sense in a list of
probe point alternatives.
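As a minimal sketch of the two marks, the script below lists two
alternatives for one event. The function names are assumptions chosen
for illustration; which of them resolves depends on the kernel version.

```systemtap
# Hypothetical alternatives: the function names are illustrative only.
# If the first point resolves, "!" stops resolution of the rest of the list.
# The second point is optional, so compilation still succeeds if it is absent.
probe kernel.function("do_sys_open") !,
      kernel.function("sys_open") ?
{
    printf("%s opened a file\n", execname())
}

# Keeps the script valid even if neither alternative resolves.
probe timer.s(10) { exit() }
```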
Additionally, a probe point may be followed by an "if (expr)" statement,
in order to enable/disable the probe point on-the-fly. With the "if"
statement, if "expr" is false when the probe point is hit, the whole
probe body, including the body of any alias, is skipped. The condition
is stacked up through all levels of alias/wildcard expansion, so the
final condition becomes the logical-and of the conditions of all
expanded aliases/wildcards. The expressions are necessarily restricted
to global variables.
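A short sketch of a probe condition follows. The global name "tracing"
and the timer intervals are arbitrary choices made for this example.

```systemtap
global tracing = 0   # illustrative global used as the probe condition

# Toggle the condition every five seconds.
probe timer.s(5) { tracing = !tracing }

# The handler body is skipped whenever tracing is 0.
probe syscall.read if (tracing) {
    printf("%s: %s(%s)\n", execname(), name, argstr)
}

probe timer.s(30) { exit() }
```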
These are all syntactically valid probe points. (They are generally
semantically valid, depending on the contents of the tapsets, and the
versions of kernel/user software installed.)
kernel.function("foo").return
process("/bin/vi").statement(0x2222)
end
syscall.*
syscall.*.return.maxactive(10)
syscall.{open,close}
sys**open
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
kprobe.function("foo")
Probes may be broadly classified into "synchronous" and "asynchronous".
A "synchronous" event is deemed to occur when any processor executes an
instruction matched by the specification. This gives these probes a
reference point (instruction address) from which more contextual data
may be available. Other families of probe points refer to
"asynchronous" events such as timers/counters rolling over, where there
is no fixed reference point that is related. Each probe point
specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.
Brace expansion is a mechanism which allows a list of probe points to
be generated. It is very similar to shell expansion. A component may be
surrounded by a pair of curly braces to indicate that the comma-
separated sequence of one or more subcomponents will each constitute a
new probe point. The braces may be arbitrarily nested. The ordering of
expanded results is based on product order.
The question mark (?), exclamation mark (!) indicators and probe point
conditions may not be placed in any expansions that are before the last
component.
The following is an example of brace expansion.
syscall.{write,read}
# Expands to
syscall.write, syscall.read
{kernel,module("nfs")}.function("nfs*")!
# Expands to
kernel.function("nfs*")!, module("nfs").function("nfs*")!
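In a complete script, a brace-expanded probe point might be used as in
the following sketch. The array name "hits" and the five-second run
time are assumptions made for illustration.

```systemtap
global hits   # per-syscall counters, indexed by system call name

# Expands to syscall.read, syscall.write; both share this handler.
probe syscall.{read,write} { hits[name]++ }

probe timer.s(5) {
    foreach (n in hits)
        printf("%s: %d\n", n, hits[n])
    exit()
}
```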
Resolving some probe points requires DWARF debuginfo or "debug symbols" for the specific program being instrumented. For some others, DWARF is automatically synthesized on the fly from source code header files. For others, it is not needed at all. Since a systemtap script may use any mixture of probe points together, the union of their DWARF requirements has to be met on the computer where script compilation occurs. (See the --use-server option and the stap-server(8) man page for information about the remote compilation facility, which allows these requirements to be met on a different machine.)

The following table lists many of the available probe point families, classified by their need for DWARF debuginfo for the specific program for that probe point.

DWARF                          NON-DWARF                    SYMBOL-TABLE

kernel.function, .statement    kernel.mark                  kernel.function*
module.function, .statement    process.mark, process.plt    module.function*
process.function, .statement   begin, end, error, never     process.function*
process.mark*                  timer
.function.callee               perf
                               procfs
AUTO-GENERATED-DWARF           kernel.statement.absolute
                               kernel.data
kernel.trace                   kprobe.function
                               process.statement.absolute
                               process.begin, .end
                               netfilter
                               java

The probe types marked with asterisks (*) are fallbacks, where systemtap can sometimes infer subset or substitute information. In general, the more symbolic / debugging information available, the higher quality probing will be available.
The following types of probe points may be armed/disarmed on-the-fly to save overheads during uninteresting times. Arming conditions may also be added to other types of probes, but will be treated as a wrapping conditional and won't benefit from overhead savings.

DISARMABLE                                exceptions
kernel.function, kernel.statement
module.function, module.statement
process.*.function, process.*.statement
process.*.plt, process.*.mark
timer.                                    timer.profile
java
BEGIN/END/ERROR
The probe points begin and end are defined by the translator to refer
to the time of session startup and shutdown. All "begin" probe
handlers are run, in some sequence, during the startup of the session.
All global variables will have been initialized prior to this point.
All "end" probes are run, in some sequence, during the normal shutdown
of a session, such as in the aftermath of an exit () function call, or
an interruption from the user. In the case of an error-triggered
shutdown, "end" probes are not run. There are no target variables
available in either context.
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are run
in increasing order, and the order between handlers with the same
sequence number is unspecified. When "begin" or "end" are given
without a sequence, they are effectively sequence zero.
The error probe point is similar to the end probe, except that each
such probe handler runs when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
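The sequencing rules above can be sketched as follows. The sequence
numbers and messages are arbitrary and serve only as illustration.

```systemtap
# begin handlers run in increasing sequence order: -1, then 0, then 1.
probe begin(-1) { println("setup, sequence -1") }
probe begin     { println("setup, sequence 0") }
probe begin(1)  { println("setup, sequence 1"); exit() }

# Runs on normal shutdown, such as after exit().
probe end(10)   { println("normal shutdown") }

# Runs instead of the end probe when the session ends after errors.
probe error     { println("shutdown after errors") }
```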
NEVER
The probe point never is specially defined by the translator to mean
"never". Its probe handler is never run, though its statements are
analyzed for symbol / type correctness as usual. This probe point may
be useful in conjunction with optional probes.
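One possible use is sketched below: the optional point is deliberately
unresolvable, so only never remains and the probe declaration still
compiles. The function name is a placeholder, not a real kernel symbol.

```systemtap
# "no_such_function" is expected not to resolve. Because it is optional,
# only the never point remains; the handler is checked but does not run.
probe kernel.function("no_such_function") ?, never {
    println("not reached")
}

probe begin { println("script compiled and started"); exit() }
```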
SYSCALL and ND_SYSCALL
The syscall.* and nd_syscall.* aliases define several hundred probes,
too many to detail here. They are of the general form:
syscall.NAME
nd_syscall.NAME
syscall.NAME.return
nd_syscall.NAME.return
Generally, a pair of probes are defined for each normal system call as
listed in the syscalls(2) manual page, one for entry and one for
return. Those system calls that never return do not have a
corresponding .return probe. The nd_* family of probes is about the
same, except that it uses non-DWARF based searching mechanisms, which
may result in a lower quality of symbolic context data (parameters),
and may miss some system calls. You may want to try them first, in case
kernel debugging information is not immediately available.
Each probe alias provides a variety of variables. Looking at the tapset
source code is the most reliable way to find them. Generally, each
variable listed
in the standard manual page is made available as a script-level
variable, so syscall.open exposes filename, flags, and mode. In
addition, a standard suite of variables is available at most aliases:
argstr A pretty-printed form of the entire argument list, without
parentheses.
name The name of the system call.
retstr For return probes, a pretty-printed form of the system-call
result.
As usual for probe aliases, these variables are all initialized once
from the underlying $context variables, so that later changes to
$context variables are not automatically reflected. Not all probe
aliases obey all of these general guidelines. Please report any
bothersome ones you encounter as a bug. Note that on some
kernel/userspace architecture combinations (e.g., 32-bit userspace on
64-bit kernel), the underlying $context variables may need explicit
sign extension / masking. When this is an issue, consider using the
tapset-provided variables instead of raw $context variables.
If debuginfo availability is a problem, you may try using the non-DWARF
syscall probe aliases instead. Use the nd_syscall. prefix instead of
syscall. The same context variables are available, as far as possible.
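A minimal sketch using the standard variables follows. Whether programs
on a given system call open(2) directly, rather than a related call
such as openat(2), is an assumption; replacing syscall with nd_syscall
gives the non-DWARF variant.

```systemtap
# Entry probe: name and argstr are set by the alias.
probe syscall.open {
    printf("%s[%d] %s(%s)\n", execname(), pid(), name, argstr)
}

# Return probe: retstr holds the pretty-printed result.
probe syscall.open.return {
    printf("%s[%d] %s = %s\n", execname(), pid(), name, retstr)
}

probe timer.s(10) { exit() }
```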
TIMERS
Intervals defined by the standard kernel "jiffies" timer may be used to
trigger probe handlers asynchronously. Two probe point variants are
supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a linearly distributed random value in the range [-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on a
multi-processor computer.
Alternatively, intervals may be specified in units of time. There are
two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options for
units are seconds (s/sec), milliseconds (ms/msec), microseconds
(us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
supported for hertz timers.
The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After
2.6.17, the implementation uses hrtimers for tighter precision, though
the actual resolution will be arch-dependent. In either case, if the
"randomize" component is given, then the random value will be added to
the interval before any rounding occurs.
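The timer variants might be combined as in this sketch. The intervals
and variable names are arbitrary choices made for the example.

```systemtap
global ticks, hz_ticks

# Fires about every 100 ms, with a random offset in [-20..+20] ms.
probe timer.ms(100).randomize(20) { ticks++ }

# Fires twice per second; hertz timers take no randomize component.
probe timer.hz(2) { hz_ticks++ }

# Reports once per second.
probe timer.s(1) { printf("ms ticks: %d, hz ticks: %d\n", ticks, hz_ticks) }

probe timer.s(10) { exit() }
```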
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick (CONFIG_HZ) or at a given
frequency (hz). On some kernels, this is a one-concurrent-user-only or
disabled facility, resulting in error -16 (EBUSY) during probe
registration.
timer.profile.tick
timer.profile.freq.hz(N)
Full context information of the interrupted process is available,
making this probe suitable for a time-based sampling profiler.
It is recommended to use the tapset probe timer.profile rather than
timer.profile.tick. This probe point behaves identically to
timer.profile.tick when the underlying functionality is available, and
falls back to using perf.sw.cpu_clock on some recent kernels which lack
the corresponding profile timer facility.
Profiling timers with specified frequencies are only accurate up to
around 100 hz. You may need to provide a larger value to achieve the
desired rate.
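A time-based sampling profiler along these lines could look like the
following sketch. The array name, the ten-second duration, and the
top-ten limit are assumptions.

```systemtap
global samples   # sample counts, indexed by executable name

# Runs on every CPU at the profiling tick rate.
probe timer.profile { samples[execname()]++ }

probe timer.s(10) {
    # Print the ten most frequently sampled executables.
    foreach (e in samples- limit 10)
        printf("%-16s %d\n", e, samples[e])
    exit()
}
```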
DWARF
This family of probe points uses symbolic debugging information for the
target kernel/module/program, as may be found in unstripped
executables, or the separate debuginfo packages. They allow placement
of probes logically into the execution path of the target program, by
specifying a set of points in the source or object code. When a
matching statement executes on any processor, the probe handler is run
in that context.
Probe points in the DWARF family can be identified by the target kernel
module (or user process), source file, line number, function name, or
some combination of these.
Here is a list of DWARF probe points currently supported:
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).callee(PATTERN)
kernel.function(PATTERN).callee(PATTERN).return
kernel.function(PATTERN).callee(PATTERN).call
kernel.function(PATTERN).callees(DEPTH)
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
kernel.function(PATTERN).label(LPATTERN)
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).callee(PATTERN)
module(MPATTERN).function(PATTERN).callee(PATTERN).return
module(MPATTERN).function(PATTERN).callee(PATTERN).call
module(MPATTERN).function(PATTERN).callees(DEPTH)
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
module(MPATTERN).function(PATTERN).label(LPATTERN)
kernel.statement(PATTERN)
kernel.statement(PATTERN).nearest
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").statement("*@FILE.c:123").nearest
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").function("foo").callee("bar").return
process("PATH").function("foo").callee("bar").call
process("PATH").function("foo").callees(DEPTH)
process(PID).function("NAME")
process(PID).function("myfun").label("foo")
process(PID).plt("NAME")
process(PID).plt("NAME").return
process(PID).statement("*@FILE.c:123")
process(PID).statement("*@FILE.c:123").nearest
process(PID).statement(ADDRESS).absolute
(See the USER-SPACE section below for more information on the process
probes.)
The list above includes multiple variants and modifiers which provide
additional functionality or filters. They are:
.function
Places a probe near the beginning of the named function,
so that parameters are available as context variables.
.return
Places a probe at the moment after the return from the
named function, so the return value is available as the
"$return" context variable.
.inline
Filters the results to include only instances of inlined
functions. Note that inlined functions do not have an
identifiable return point, so .return is not supported on
.inline probes.
.call Filters the results to include only non-inlined functions
(the opposite set of .inline)
.exported
Filters the results to include only exported functions.
.statement
Places a probe at the exact spot, exposing those local
variables that are visible there.
.statement.nearest
Places a probe at the nearest available line number for
each line number given in the statement.
.callee
Places a probe on the callee function given in the
.callee modifier, where the callee must be a function
called by the target function given in .function. The
advantage of doing this over directly probing the callee
function is that this probe point is run only when the
callee is called from the target function (add the
-DSTAP_CALLEE_MATCHALL directive to override this when
calling stap(1)).
Note that only callees that can be statically determined
are available. For example, calls through function
pointers are not available. Additionally, calls to
functions located in other objects (e.g. libraries) are
not available (instead use another probe point). This
feature will only work for code compiled with GCC 4.7+.
.callees
Shortcut for .callee("*"), which places a probe on all
callees of the function.
.callees(DEPTH)
Recursively places probes on callees. For example,
.callees(2) will probe both callees of the target
function, as well as callees of those callees. And
.callees(3) goes one level deeper, etc... A callee probe
at depth N is only triggered when the N callers in the
callstack match those that were statically determined
during analysis (this also may be overridden using
-DSTAP_CALLEE_MATCHALL).
In the above list of probe points, MPATTERN stands for a string literal
that aims to identify the loaded kernel module of interest. For in-tree
kernel modules, the name suffices (e.g. "btrfs"). The name may also
include the "*", "[]", and "?" wildcards to match multiple in-tree
modules. Out-of-tree modules are also supported by specifying the full
path to the ko file. Wildcards are not supported in that case. The file
must follow
the convention of being named <module_name>.ko (characters ',' and '-'
are replaced by '_').
LPATTERN stands for a source program label. It may also contain "*",
"[]", and "?" wildcards. PATTERN stands for a string literal that aims
to identify a point in the program. It is made up of three parts:
* The first part is the name of a function, as would appear in the nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names.
* The second part is optional and begins with the "@" character. It
is followed by the path to the source file containing the function,
which may include a wildcard pattern, such as mm/slab*. If it does
not match as is, an implicit "*/" is optionally added before the
pattern, so that a script need only name the last few components of
a possibly long source directory path.
* Finally, the third part is optional if the file name part was
given, and identifies the line number in the source file preceded
by a ":" or a "+". The line number is assumed to be an absolute
line number if preceded by a ":", or relative to the declaration
line of the function if preceded by a "+". All the lines in the
function can be matched with ":*". A range of lines x through y
can be matched with ":x-y". Ranges and specific lines can be mixed
using commas, e.g. ":x,y-z".
As an alternative, PATTERN may be a numeric constant, indicating an
address. Such an address may be found from symbol tables of the
appropriate kernel / module object file. It is verified against known
statement code boundaries, and will be relocated for use at run time.
In guru mode only, absolute kernel-space addresses may be specified
with the ".absolute" suffix. Such an address is considered already
relocated, as if it came from /proc/kallsyms, so it cannot be checked
against statement/instruction boundaries.
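The pattern forms above can be sketched as follows. All function
names, module names, and source paths are assumptions about the target
kernel, so every point is marked optional.

```systemtap
# Function name with a wildcard, restricted by source file.
probe kernel.function("vfs_*@fs/read_write.c").call ? {
    printf("%s called by %s\n", ppfunc(), execname())
}

# A line given relative to the declaration line of the function.
probe kernel.statement("vfs_read@fs/read_write.c+2").nearest ? {
    printf("at %s\n", pp())
}

# Functions in any loaded module whose name matches the MPATTERN.
probe module("ext*").function("*_readdir") ? {
    printf("%s\n", pp())
}

probe timer.s(5) { exit() }
```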
CONTEXT VARIABLES
Many of the source-level context variables, such as function
parameters, locals, globals visible in the compilation unit, may be
visible to probe handlers. They may refer to these variables by
prefixing their name with "$" within the scripts. In addition, a
special syntax allows limited traversal of structures, pointers, and
arrays. More syntax allows pretty-printing of individual variables or
their groups. See also @cast. Note that variables may be inaccessible
due to them being paged out, or for a few other reasons. See also man
error::fault(7stap).
$var refers to an in-scope variable "var". If it's an integer-like
type, it will be cast to a 64-bit int for systemtap script use.
String-like pointers (char *) may be copied to systemtap string
values using the kernel_string or user_string functions.
@var("varname")
an alternative syntax for $varname
@var("varname@src/file.c")
refers to the global (either file local or external) variable
varname defined when the file src/file.c was compiled. The CU in
which the variable is resolved is the first CU in the module of
the probe point which matches the given file name at the end and
has the shortest file name path (e.g. given
@var("foo@bar/baz.c") and CUs with file name paths
src/sub/module/bar/baz.c and src/bar/baz.c the second CU will be
chosen to resolve the (file) global variable foo).
$var->field traversal via a structure's or a pointer's field. This
generalized indirection operator may be repeated to follow more
levels. Note that the . operator is not used for plain
structure members, only -> for both purposes. (This is because
"." is reserved for string concatenation.)
$return
is available in return probes only for functions that are
declared with a return value, which can be determined using
@defined($return).
$var[N]
indexes into an array. The index may be given as a literal number
or even an arbitrary numeric expression.
A number of operators exist for such basic context variable
expressions:
$$vars expands to a character string that is equivalent to
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
parm1, ..., parmN, var1, ..., varN)
for each variable in scope at the probe point. Some values may
be printed as =? if their run-time location cannot be found.
$$locals
expands to a subset of $$vars for only local variables.
$$parms
expands to a subset of $$vars for only function parameters.
$$return
is available in return probes only. It expands to a string that
is equivalent to sprintf("return=%x", $return) if the probed
function has a return value, or else an empty string.
& $EXPR
expands to the address of the given context variable expression,
if it is addressable.
@defined($EXPR)
expands to 1 or 0 iff the given context variable expression is
resolvable, for use in conditionals such as
@defined($foo->bar) ? $foo->bar : 0
$EXPR$ expands to a string with all of $EXPR's members, equivalent to
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
$EXPR->a, $EXPR->b)
$EXPR$$
expands to a string with all of $EXPR's members and submembers,
equivalent to
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
$EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
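Several of these forms are combined in the sketch below. The probed
function and its parameter names (file, buf, count) and the f_flags
field are assumptions about the target kernel, not taken from this page.

```systemtap
probe kernel.function("vfs_read") {
    # All parameters, pretty-printed in one string.
    printf("parms: %s\n", $$parms)

    # An integer parameter and a structure printed via the $ suffix.
    printf("count=%d file=%s\n", $count, $file$)

    # Guard a field access that may not exist on every kernel.
    flags = @defined($file->f_flags) ? $file->f_flags : 0
    printf("flags=0x%x buf=%p\n", flags, $buf)

    exit()   # stop after the first hit to keep the output short
}
```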
MORE ON RETURN PROBES
For the kernel ".return" probes, only a certain fixed number of returns
may be outstanding. The default is a relatively small number, on the
order of a few times the number of physical CPUs. If many different
threads concurrently call the same blocking function, such as futex(2)
or read(2), this limit could be exceeded, and skipped "kretprobes"
would be reported by "stap -t". To work around this, specify a
probe FOO.return.maxactive(NNN)
suffix, with a large enough NNN to cover all expected concurrently
blocked threads. Alternately, use the
stap -DKRETACTIVE=NNNN
stap command line macro setting to override the default for all
".return" probes.
For ".return" probes, context variables other than the "$return" may be
accessible, as a convenience for a script programmer wishing to access
function parameters. These values are snapshots taken at the time of
function entry. Local variables within the function are not generally
accessible, since those variables did not exist in
allocated/initialized form at the snapshot moment.
In addition, arbitrary entry-time expressions can also be saved for
".return" probes using the @entry(expr) operator. For example, one can
compute the elapsed time of a function:
probe kernel.function("do_filp_open").return {
println( gettimeofday_us() - @entry(gettimeofday_us()) )
}
The following table summarizes how values related to a function
parameter context variable, a pointer named addr, may be accessed from
a .return probe.
at-entry value   past-exit value

$addr            not available
$addr->x->y      @cast(@entry($addr),"struct zz")->x->y
$addr[0]         {kernel,user}_{char,int,...}(& $addr[0])
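A return probe using entry-time snapshots might look like this sketch.
The probed function, its count parameter, and the maxactive value are
assumptions chosen for illustration.

```systemtap
probe kernel.function("vfs_read").return.maxactive(64) {
    # @entry() values are captured at function entry; $return at exit.
    printf("%s: requested %d, returned %d, took %d us\n",
           execname(), @entry($count), $return,
           gettimeofday_us() - @entry(gettimeofday_us()))
}

probe timer.s(5) { exit() }
```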
DWARFLESS
In the absence of debugging information, entry and exit points of
kernel and module functions can be probed using the "kprobe" family of
probes. However, these do not permit looking up the arguments / local
variables of the function. The following constructs are supported:
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).call
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).call
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
Probes of type function are recommended for kernel functions, whereas
probes of type module are recommended for probing functions of the
specified module. In case the absolute address of a kernel or module
function is known, statement probes can be utilized.
Note that FUNCTION and MODULE names must not contain wildcards, or the
probe will not be registered. Also, statement probes may be used only
in guru mode.
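A debuginfo-free call counter is sketched below. The function name is
an assumption about the target kernel, and no arguments are read since
kprobe probes do not provide them.

```systemtap
global calls, returns

probe kprobe.function("vfs_read")        { calls++ }
probe kprobe.function("vfs_read").return { returns++ }

probe timer.s(5) { exit() }

probe end {
    printf("vfs_read: %d calls, %d returns\n", calls, returns)
}
```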
USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or have the uprobes facility in
linux 3.5. (Various kernel build configuration options need to be
enabled; systemtap will advise if these are missing.)
There are several forms. First, a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
is analogous to kernel.statement(ADDRESS).absolute in that both use raw
(unverified) virtual addresses and provide no $variables. The target
PID parameter must identify a running process, and ADDRESS should
identify a valid instruction address. All threads of that process will
be probed.
Second, non-symbolic user-kernel interface events handled by utrace may
be probed:
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
process(PID).insn
process("FULLPATH").insn
process(PID).insn.block
process("FULLPATH").insn.block
A .begin probe gets called when a new process described by PID or
FULLPATH gets created. A .thread.begin probe gets called when a new
thread described by PID or FULLPATH gets created. A .end probe gets
called when a process described by PID or FULLPATH dies. A .thread.end
probe gets called when a thread described by PID or FULLPATH dies. A
.syscall probe gets called when a thread described by PID or FULLPATH
makes a system call. The system call number is available in the
$syscall context variable, and the first 6 arguments of the system call
are available in the $argN (e.g. $arg1, $arg2, ...) context variables. A
.syscall.return probe gets called when a thread described by PID or
FULLPATH returns from a system call. The system call number is
available in the $syscall context variable, and the return value of the
system call is available in the $return context variable. A .insn
probe gets called for every single-stepped instruction of the process
described by PID or FULLPATH. A .insn.block probe gets called for
every block-stepped instruction of the process described by PID or
FULLPATH.
If a process probe is specified without a PID or FULLPATH, all user
threads will be probed. However, if systemtap was invoked with the -c
or -x options, then process probes are restricted to the process
hierarchy associated with the target process. If a process probe is
unspecified (i.e. without a PID or FULLPATH), but with the -c option,
the PATH of the -c cmd will be heuristically filled into the process
PATH. In that case, only command parameters are allowed in the -c
command (i.e. no command substitution allowed and no occurrences of any
of these characters: '|&;<>(){}').
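The process events might be traced as in the following sketch, intended
to be run with the -c option so that the unspecified process probes are
restricted to the target command.

```systemtap
probe process.begin {
    printf("%s[%d] began\n", execname(), pid())
}

probe process.syscall {
    printf("%s[%d] syscall %d arg1=0x%x\n",
           execname(), pid(), $syscall, $arg1)
}

probe process.syscall.return {
    printf("%s[%d] syscall %d returned %d\n",
           execname(), pid(), $syscall, $return)
}

probe process.end {
    printf("%s[%d] ended\n", execname(), pid())
}
```

For example, if the script is saved under the hypothetical name
trace.stp, it could be invoked as stap -c "ls /" trace.stp.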
Third, symbolic static instrumentation compiled into programs and
shared libraries may be probed:
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
process(PID).mark("LABEL")
process(PID).provider("PROVIDER").mark("LABEL")
A .mark probe gets called via a static probe which is defined in the
application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of a family of
macros defined in sys/sdt.h. The PROVIDER is an arbitrary application
identifier, LABEL is the marker site identifier, and arg1 is the
integer-typed argument. STAP_PROBE1 is used for probes with 1
argument, STAP_PROBE2 is used for probes with 2 arguments, and so on.
The arguments of the probe are available in the context variables
$arg1, $arg2, ... An alternative to using the STAP_PROBE macros is to
use the dtrace script to create custom macros. Additionally, the
variables $$name and $$provider are available as parts of the probe
point name. The sys/sdt.h macro names DTRACE_PROBE* are available as
aliases for STAP_PROBE*.
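On the script side, a marker probe could be written as in this sketch.
The executable path, provider, and label are hypothetical; the
application is assumed to contain a matching
STAP_PROBE1(myapp, request_start, id) site.

```systemtap
# "./myapp", "myapp", and "request_start" are placeholders.
probe process("./myapp").provider("myapp").mark("request_start") {
    # $arg1 is the single integer argument passed at the marker site.
    printf("%s:%s id=%d\n", $$provider, $$name, $arg1)
}
```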
Finally, full symbolic source-level probes in user-space programs and
shared libraries are supported. These are exactly analogous to the
symbolic DWARF-based kernel/module probes described above. They expose
the same sorts of context $variables for function parameters, local
variables, and so on.
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").plt("NAME").return
process(PID).function("NAME")
process(PID).statement("*@FILE.c:123")
process(PID).plt("NAME")
Note that for all process probes, PATH names refer to executables that
are searched the same way shells do: relative to the working directory
if they contain a "/" character, otherwise in $PATH. If PATH names
refer to scripts, the actual interpreters (specified in the script in
the first line after the #! characters) are probed.
If PATH is a process component parameter referring to shared libraries
then all processes that map it at runtime would be selected for
probing. If PATH is a library component parameter referring to shared
libraries then the process specified by the process component would be
selected. Note that the PATH pattern in a library component will
always apply to libraries statically determined to be in use by the
process. However, you may also specify the full path to any library
file even if not statically needed by the process.
A .plt probe will probe functions in the program linkage table
corresponding to the rest of the probe point. .plt can be specified as
a shorthand for .plt("*"). The symbol name is available as a $$name
context variable; function arguments are not available, since PLTs are
processed without debuginfo. A .plt.return probe places a probe at the
moment after the return from the named function.
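A sketch that counts calls made through the PLT follows. The
executable path is an assumption, and the script is meant to be run
with -c so that it ends when the target command exits.

```systemtap
global plt_calls   # call counts, indexed by PLT symbol name

# .plt with no argument is shorthand for .plt("*").
probe process("/bin/ls").plt { plt_calls[$$name]++ }

probe end {
    foreach (s in plt_calls- limit 10)
        printf("%-24s %d\n", s, plt_calls[s])
}
```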
If the PATH string contains wildcards as in the MPATTERN case, then
standard globbing is performed to find all matching paths. In this
case, the $PATH environment variable is not used.
If systemtap was invoked with the -c or -x options, then process probes
are restricted to the process hierarchy associated with the target
process.
JAVA
Support for probing Java methods is available using Byteman as a
backend. Byteman is an instrumentation tool from the JBoss project
which systemtap can use to monitor invocations for a specific method or
line in a Java program.
Systemtap does so by generating a Byteman script listing the probes to
instrument and then invoking the Byteman bminstall utility.
This Java instrumentation support is currently a prototype feature with
major limitations. Moreover, Java probing currently does not work
across users; the stap script must run (with appropriate permissions)
under the same user as the Java process being probed. (Thus a stap
script running as root currently cannot probe Java methods in a Java
process owned by a non-root user.)
The first probe type refers to Java processes by the name of the Java
process:
java("PNAME").class("CLASSNAME").method("PATTERN")
java("PNAME").class("CLASSNAME").method("PATTERN").return
The PNAME argument must refer to an already-running JVM process that is
identifiable via a jps listing.
The PATTERN parameter specifies the signature of the Java method to
probe. The signature must consist of the exact name of the method,
followed by a bracketed list of the types of the arguments, for
instance "myMethod(int,double,Foo)". Wildcards are not supported.
The probe can be set to trigger at a specific line within the method by
appending a line number with colon, just as in other types of probes:
"myMethod(int,double,Foo):245".
The CLASSNAME parameter identifies the Java class the method belongs
to, either with or without the package qualification. By default, the
probe only triggers on descendants of the class that do not override
the method definition of the original class. However, CLASSNAME can
take an optional caret prefix, as in ^org.my.MyClass, which specifies
that the probe should also trigger on all descendants of MyClass that
override the original method. For instance, every method with signature
foo(int) in program org.my.MyApp can be probed at once using
java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
The second probe type works analogously, but refers to Java processes
by PID:
java(PID).class("CLASSNAME").method("PATTERN")
java(PID).class("CLASSNAME").method("PATTERN").return
(PIDs for an already running process can be obtained using the jps(1)
utility.)
Context variables defined within java probes include $arg1 through
$arg10 (for up to the first 10 arguments of a method), represented as
integers or strings.
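The following sketch shows how the name-based form might be used to count calls to one method. The names org.my.MyApp, org.my.MyClass and foo(int) are hypothetical placeholders, and the script assumes the Java process is already running under the same user as stap.

```systemtap
# Hypothetical names: org.my.MyApp, org.my.MyClass and foo(int) are
# placeholders for a real process name, class and method signature.
global calls

# Count each entry into the method.
probe java("org.my.MyApp").class("org.my.MyClass").method("foo(int)") {
  calls++
}

# Report when the method returns.
probe java("org.my.MyApp").class("org.my.MyClass").method("foo(int)").return {
  printf("foo(int) returned, %d calls so far\n", calls)
}
```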
PROCFS
These probe points allow procfs "files" in /proc/systemtap/MODNAME to
be created, read and written using a permission that may be modified
using the proper umask value. Default permissions are 0400 for read
probes, and 0200 for write probes. If both a read and write probe are
being used on the same file, a default permission of 0600 will be used.
Using procfs.umask(0040).read would result in a 0404 permission set for
the file. (MODNAME is the name of the systemtap module). The proc
filesystem is a pseudo-filesystem which is used as an interface to
kernel data structures. There are several probe point variants
supported by the translator:
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
PATH is the file name (relative to /proc/systemtap/MODNAME) to be
created. If no PATH is specified (as in the last six variants above),
PATH defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs read probe is triggered. The string data to be read should be
assigned to a variable named $value, like this:
probe procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the corresponding
procfs write probe is triggered. The data the user wrote is available
in the string variable named $value, like this:
probe procfs("PATH").write { printf("user wrote: %s", $value) }
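Read and write probes on the same file can be combined to expose a tunable value. The sketch below is one way to do this; the file name "threshold" and the global variable are hypothetical, chosen only for illustration.

```systemtap
# Hypothetical file name "threshold"; while the script runs it appears
# as /proc/systemtap/MODNAME/threshold.
global threshold = 100

# Reading the file reports the current value.
probe procfs("threshold").read {
  $value = sprintf("%d\n", threshold)
}

# Writing a decimal number to the file updates the value.
probe procfs("threshold").write {
  threshold = strtol($value, 10)
}
```

Because both a read and a write probe are attached to the same file, the default permission described above (0600) would apply.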
MAXSIZE is the size of the procfs read buffer. Specifying MAXSIZE
allows larger procfs output. If no MAXSIZE is specified, the procfs
read buffer defaults to STP_PROCFS_BUFSIZE (which defaults to
MAXSTRINGLEN, the maximum length of a string). If setting the procfs
read buffers for more than one file is needed, it may be easiest to
override the STP_PROCFS_BUFSIZE definition. Here's an example of using
MAXSIZE:
probe procfs.read.maxsize(1024) {
  $value = "long string..."
  $value .= "another long string..."
  $value .= "another long string..."
  $value .= "another long string..."
}
NETFILTER HOOKS
These probe points allow observation of network packets using the
netfilter mechanism. A netfilter probe in systemtap corresponds to a
netfilter hook function in the original netfilter probes API. It is
probably more convenient to use tapset::netfilter(3stap), which wraps
the primitive netfilter hooks and does the work of extracting useful
information from the context variables.
There are several probe point variants supported by the translator:
netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
PROTOCOL_F is the protocol family to listen for, currently one of
NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
HOOKNAME is the point, or 'hook', in the protocol stack at which to
intercept the packet. The available hook names for each protocol family
are taken from the kernel header files <linux/netfilter_ipv4.h>,
<linux/netfilter_ipv6.h>, <linux/netfilter_arp.h> and
<linux/netfilter_bridge.h>. For instance, allowable hook names for
NFPROTO_IPV4 are NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN,
NF_INET_FORWARD, NF_INET_LOCAL_OUT, and NF_INET_POST_ROUTING.
PRIORITY is an integer priority giving the order in which the probe
point should be triggered relative to any other netfilter hook
functions which trigger on the same packet. Hook functions execute on
each packet in order from smallest priority number to largest priority
number. If no PRIORITY is specified (as in the first two probe point
variants above), PRIORITY defaults to "0".
There are a number of predefined priority names of the form NF_IP_PRI_*
and NF_IP6_PRI_* which are defined in the kernel header files
<linux/netfilter_ipv4.h> and <linux/netfilter_ipv6.h> respectively. The
script is permitted to use these instead of specifying an integer
priority. (The probe points for NFPROTO_ARP and NFPROTO_BRIDGE
currently do not expose any named hook priorities to the script
writer.) Thus, allowable ways to specify the priority include:
priority("255")
priority("NF_IP_PRI_SELINUX_LAST")
A script using guru mode is permitted to specify any identifier or
number as the parameter for hook, pf, and priority. This feature should
be used with caution, as the parameter is inserted verbatim into the C
code generated by systemtap.
The netfilter probe points define the following context variables:
$hooknum
The hook number.
$skb The address of the sk_buff struct representing the packet. See
<linux/skbuff.h> for details on how to use this struct, or
alternatively use the tapset tapset::netfilter(3stap) for easy
access to key information.
$in The address of the net_device struct representing the network
device on which the packet was received (if any). May be 0 if
the device is unknown or undefined at that stage in the protocol
stack.
$out The address of the net_device struct representing the network
device on which the packet will be sent (if any). May be 0 if
the device is unknown or undefined at that stage in the protocol
stack.
$verdict
(Guru mode only.) Assigning one of the verdict values defined in
<linux/netfilter.h> to this variable alters the further progress
of the packet through the protocol stack. For instance, the
following guru mode script forces all IPv6 network packets to be
dropped:
probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
$verdict = 0 /* nf_drop */
}
For convenience, unlike the primitive probe points discussed
here, the probes defined in tapset::netfilter(3stap) export the
lowercase names of the verdict constants (e.g. NF_DROP becomes
nf_drop) as local variables.
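For comparison with the guru mode example, here is a minimal sketch of a passive netfilter probe that only counts packets and does not touch $verdict. The five-second reporting interval is an arbitrary choice for illustration.

```systemtap
global packets

# Count IPv4 packets at the pre-routing hook. No priority is given,
# so it defaults to 0.
probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING") {
  packets++
}

# Print and reset the counter every five seconds.
probe timer.s(5) {
  printf("%d IPv4 packets seen\n", packets)
  packets = 0
}
```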
KERNEL TRACEPOINTS
This family of probe points hooks up to static probing tracepoints
inserted into the kernel or modules. As with markers, these
tracepoints are special macro calls inserted by kernel developers to
make probing faster and more reliable than with DWARF-based probes, and
DWARF debugging information is not required to probe tracepoints.
Tracepoints have an extra advantage of more strongly-typed parameters
than markers.
Tracepoint probes look like: kernel.trace("name"). The tracepoint name
string, which may contain the usual wildcard characters, is matched
against the names defined by the kernel developers in the tracepoint
header files. To restrict the search to specific subsystems (e.g.
sched or ext3), the following syntax can be used:
kernel.trace("system:name"). The tracepoint system string may also
contain the usual wildcard characters.
The handler associated with a tracepoint-based probe may read the
optional parameters specified at the macro call site. These are named
according to the declaration by the tracepoint author. For example,
the tracepoint probe kernel.trace("sched:sched_switch") provides the
parameters $prev and $next. If the parameter is a complex type, as in
a struct pointer, then a script can access fields with the same syntax
as DWARF $target variables. Tracepoint parameters themselves cannot be
modified, but in guru mode a script may modify fields of parameters.
The subsystem and name of the tracepoint are available in $$system and
$$name, and a string of name=value pairs for all parameters of the
tracepoint is available in $$vars or $$parms.
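A short sketch of a tracepoint probe using the variables described above. It assumes the kernel provides the sched:sched_switch tracepoint and that $prev and $next are task_struct pointers with a pid field; the one-second time limit is arbitrary.

```systemtap
# Print each context switch seen by the scheduler tracepoint.
probe kernel.trace("sched:sched_switch") {
  # Fields of pointer parameters use the same syntax as DWARF
  # $target variables.
  printf("%s:%s pid %d -> pid %d\n",
         $$system, $$name, $prev->pid, $next->pid)
}

# Stop after one second to keep the output short.
probe timer.s(1) {
  exit()
}
```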
KERNEL MARKERS (OBSOLETE)
This family of probe points hooks up to an older style of static
probing markers inserted into older kernels or modules. These markers
are special STAP_MARK macro calls inserted by kernel developers to make
probing faster and more reliable than with DWARF-based probes.
Further, DWARF debugging information is not required to probe markers.
Marker probe points begin with kernel. The next part names the marker
itself: mark("name"). The marker name string, which may contain the
usual wildcard characters, is matched against the names given to the
marker macros when the kernel and/or module was compiled.
Optionally, you can specify format("format"). Specifying the marker
format string allows differentiation between two markers with the same
name but different marker format strings.
The handler associated with a marker-based probe may read the optional
parameters specified at the macro call site. These are named $arg1
through $argNN, where NN is the number of parameters supplied by the
macro. Number and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in
$format, and the marker name string is available in $name.
HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a given
(global) kernel symbol. The probes take three components as inputs:
1. The virtual address or name of the kernel symbol to be traced,
supplied as the argument to kernel.data. (Only data segment variables
can be probed; local variables of a function cannot.)
2. The nature of the access to be probed:
a. A .write probe is triggered when a write happens at the specified
address or symbol name.
b. A .rw probe is triggered when either a read or a write happens.
3. .length (optional). The address interval to be probed can be
specified using the "length" construct. The user-specified length is
rounded to the closest address length that the architecture supports.
If the specified length exceeds the limits imposed by the architecture,
an error is reported and probe registration fails. Where "length" is
not specified, the translator requests a hardware breakpoint probe of
length 1. Note that the "length" construct is not valid with symbol
names.
The following constructs are supported:
probe kernel.data(ADDRESS).write
probe kernel.data(ADDRESS).rw
probe kernel.data(ADDRESS).length(LEN).write
probe kernel.data(ADDRESS).length(LEN).rw
probe kernel.data("SYMBOL_NAME").write
probe kernel.data("SYMBOL_NAME").rw
This set of probes makes use of the debug registers of the processor,
which are a scarce resource (4 on x86, 1 on powerpc). The script
translator issues a warning if a user requests more hardware breakpoint
probes than the architecture supports. For example, a pass-2 warning is
issued when an input script requests 5 hardware breakpoint probes on an
x86 system, since the x86 architecture supports a maximum of 4
breakpoints. Users are cautioned to set probes judiciously.
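As an illustration, a single hardware watchpoint on a global kernel variable might look like the sketch below. It assumes the running kernel exports a global data symbol named pid_max, which may not hold on every kernel version.

```systemtap
# Report every write to the (assumed) global kernel variable pid_max.
probe kernel.data("pid_max").write {
  printf("pid_max written by %s (pid %d)\n", execname(), pid())
}
```

This uses one debug register, leaving the remainder available for other watchpoints.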
PERF
This family of probe points interfaces to the kernel "perf event"
infrastructure for controlling hardware performance counters. The
events being attached to are described by the "type" and "config" fields
of the perf_event_attr structure, and are sampled at an interval
governed by the "sample_period" and "sample_freq" fields.
These fields are made available to systemtap scripts using the
following syntax:
probe perf.type(NN).config(MM).sample(XX)
probe perf.type(NN).config(MM).hz(XX)
probe perf.type(NN).config(MM)
probe perf.type(NN).config(MM).process("PROC")
probe perf.type(NN).config(MM).counter("COUNTER")
probe perf.type(NN).config(MM).process("PROC").counter("COUNTER")
The systemtap probe handler is called once per XX increments of the
underlying performance counter when using the .sample field or at a
frequency in hertz when using the .hz field. When not specified, the
default behavior is to sample at a count of 1000000. The range of
valid type/config is described by the perf_event_open(2) system call,
and/or the linux/perf_event.h file. Invalid combinations or exhausted
hardware counter resources result in errors during systemtap script
startup. Systemtap does not sanity-check the values: it merely passes
them through to the kernel for error- and safety-checking. By default
the perf event probe is systemwide unless .process is specified, which
will bind the probe to a specific task. If the name is omitted then it
is inferred from the stap -c argument. A perf event can be read on
demand using .counter. The body of the perf probe handler will not be
invoked for a .counter probe; instead, the counter is read in a user
space probe via:
probe process("PROCESS").statement("func@file") { stat <<< @perf("NAME") }
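The sampling form can be sketched as a simple system-wide profiler. In linux/perf_event.h, type 0 is PERF_TYPE_HARDWARE and config 0 is PERF_COUNT_HW_CPU_CYCLES; the ten-second run time and the top-ten report are arbitrary choices for illustration.

```systemtap
global samples

# Sample once per 1000000 CPU cycles (type 0 = PERF_TYPE_HARDWARE,
# config 0 = PERF_COUNT_HW_CPU_CYCLES) and attribute each sample to
# the name of the currently running process.
probe perf.type(0).config(0).sample(1000000) {
  samples[execname()]++
}

# End the session after ten seconds.
probe timer.s(10) {
  exit()
}

# Print the ten most frequently sampled process names.
probe end {
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
}
```

If the hardware counter is unavailable (for example, in some virtual machines), the kernel rejects the request and the script fails at startup, as described above.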
EXAMPLES
Here are some example probe points, defining the associated events.
begin, end, end
refers to the startup and normal shutdown of the session. In
this case, the handler would run once during startup and twice
during shutdown.
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200 jiffies.
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the
name.
kernel.function("*@kernel/time.c:240")
refers to any functions within the "kernel/time.c" file that
span line 240. Note that this is not a probe at the statement
at that line number. Use the kernel.statement probe instead.
kernel.trace("sched_*")
refers to all scheduler-related (really, prefixed) tracepoints
in the kernel.
kernel.mark("getuid")
refers to an obsolete STAP_MARK(getuid, ...) macro call in the
kernel.
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync" in
the name in any of the USB drivers.
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled
instructions include the given address in the kernel.
kernel.statement("*@kernel/time.c:296")
refers to the statement of line 296 within "kernel/time.c".
kernel.statement("bio_init@fs/bio.c+3")
refers to the statement at line bio_init+3 within "fs/bio.c".
kernel.data("pid_max").write
refers to a hardware breakpoint of type "write" set on pid_max.
syscall.*.return
refers to the group of probe aliases with any name in the second
position.
SEE ALSO
stap(1), probe::*(3stap), tapset::*(3stap)
STAPPROBES(3stap)