Text file src/github.com/google/pprof/proto/README.md

This is a description of the profile.proto format.

# Overview

Profile.proto is a data representation for profile data. It is independent of
the type of data being collected and the sampling process used to collect that
data. On disk, it is represented as a gzip-compressed protocol buffer, described
at src/proto/profile.proto.

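As a rough illustration of the on-disk representation, the sketch below strips
the gzip layer from a profile file's bytes. The function name
`read_profile_bytes` is hypothetical, and accepting input that is already
uncompressed is an assumption made for convenience. Decoding the resulting
bytes into a Profile message needs bindings generated from profile.proto, which
are not shown here.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_profile_bytes(data: bytes) -> bytes:
    """Return the serialized Profile message, decompressing it if needed."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    # Assumption: anything else is treated as an uncompressed message.
    return data
```
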
A profile in this context refers to a collection of samples, each one
representing measurements performed at a certain point in the life of a job. A
sample associates a set of measurement values with a list of locations, commonly
representing the program call stack when the sample was taken.

Tools such as pprof analyze these samples and display this information in
multiple forms, such as identifying hottest locations, building graphical call
graphs or trees, etc.

# General structure of a profile

A profile is represented by a Profile message, which contains the following
fields:

* *sample*: A profile sample, with the values measured and the associated call
  stack as a list of location ids. Samples with identical call stacks can be
  merged by adding their respective values, element by element.
* *location*: A unique place in the program, commonly mapped to a single
  instruction address. It has a unique nonzero id, to be referenced from the
  samples. It contains source information in the form of lines, and a mapping id
  that points to a binary.
* *function*: A program function as defined in the program source. It has a
  unique nonzero id, referenced from the location lines. It contains a
  human-readable name for the function (e.g. a C++ demangled name), a system
  name (e.g. a C++ mangled name), the name of the corresponding source file, and
  other function attributes.
* *mapping*: A binary that is part of the program during the profile
  collection. It has a unique nonzero id, referenced from the locations. It
  includes details on how the binary was mapped during program execution. By
  convention the main program binary is the first mapping, followed by any
  shared libraries.
* *string_table*: All strings in the profile are represented as indices into
  this repeated field. The first string is empty, so index == 0 always
  represents the empty string.

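The string table convention can be sketched as a small interning helper. The
`StringTable` class and its method names are hypothetical and only illustrate
the rule that index 0 is reserved for the empty string; they are not part of
any pprof API.

```python
class StringTable:
    """Hypothetical helper that interns strings for Profile.string_table."""

    def __init__(self) -> None:
        # Index 0 is always the empty string.
        self.strings = [""]
        self._index = {"": 0}

    def intern(self, s: str) -> int:
        """Return the index of s, appending it to the table on first use."""
        idx = self._index.get(s)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(s)
            self._index[s] = idx
        return idx
```

A producer would call `intern` for every name, unit, or file path it emits and
store the returned index in the corresponding message field.
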
# Measurement values

Measurement values are represented as 64-bit integers. The profile contains an
explicit description of each value represented, using a ValueType message, with
two fields:

* *Type*: A human-readable description of the type semantics. For example “cpu”
  to represent CPU time, “wall” or “time” for wallclock time, or “memory” for
  bytes allocated.
* *Unit*: A human-readable name of the unit represented by the 64-bit integer
  values. For example, it could be “nanoseconds” or “milliseconds” for a time
  value, or “bytes” or “megabytes” for a memory size. If this is just
  representing a number of events, the recommended unit name is “count”.

A profile can represent multiple measurements per sample, but all samples must
have the same number and type of measurements. The actual values are stored in
the Sample.value fields, each one described by the corresponding
Profile.sample_type field.

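The pairing between Sample.value and Profile.sample_type can be sketched as
follows. The `ValueType` dataclass is a simplified stand-in that holds plain
strings instead of string table indices, and `describe_values` is a
hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ValueType:
    # Simplified stand-in: in profile.proto these two fields are indices
    # into string_table; plain strings are used here for readability.
    type: str
    unit: str


def describe_values(sample_types: List[ValueType],
                    values: List[int]) -> List[Tuple[str, str, int]]:
    """Pair each Sample.value entry with its Profile.sample_type entry."""
    if len(values) != len(sample_types):
        raise ValueError(
            f"sample has {len(values)} values, "
            f"profile declares {len(sample_types)} sample types")
    return [(st.type, st.unit, v) for st, v in zip(sample_types, values)]
```
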
Some profiles have a uniform period that describes the granularity of the data
collection. For example, a CPU profile may have a period of 100ms, or a memory
allocation profile may have a period of 512KB. Profiles can optionally describe
such a value in the Profile.period and Profile.period_type fields. The profile
period is meant for human consumption and does not affect the interpretation of
the profiling data.

By convention, the first value in all profiles is the number of samples
collected at this call stack, with unit “count”. Because the profile does not
describe the sampling process beyond the optional period, it must include
unsampled values for all measurements. For example, a CPU profile could have
value[0] == samples, and value[1] == time in milliseconds.

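As a small worked example of this convention, assume a hypothetical CPU
profiler that fires every 10 ms and attributes one full period of CPU time to
each collected sample. The function name and the 10 ms period are illustrative
assumptions, not values taken from any particular profiler.

```python
from typing import List


def cpu_sample_values(sample_count: int, period_ms: int) -> List[int]:
    """Return [samples, cpu time in ms] for one call stack.

    Assumes each collected sample stands for one full sampling period.
    """
    return [sample_count, sample_count * period_ms]
```

With these assumptions, a stack observed 3 times at a 10 ms period would be
stored with value[0] == 3 and value[1] == 30.
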
## Locations, functions and mappings

Each sample lists the id of each location where the sample was collected, in
bottom-up order (the leaf frame comes first). Each location has an explicit
unique nonzero integer id, independent of its position in the profile, and holds
additional information to identify the corresponding source.

The profile source is expected to perform any adjustment the locations require
so that they point to the calls in the stack. For example, if the profile source
extracts the call stack by walking back over the program stack, it must adjust
the instruction addresses to point to the actual call instruction, instead of
the instruction that each call will return to.

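One common way to make this adjustment is to subtract one from every return
address, so that the address falls inside the call instruction rather than on
the instruction after it. The sketch below assumes the stack is listed leaf
first and that only the leaf entry is a true program counter. It is an
approximation, since it lands inside the call instruction rather than on its
first byte, and the function name is hypothetical.

```python
from typing import List


def adjust_return_addresses(pcs: List[int]) -> List[int]:
    """Map return addresses back into their call instructions.

    pcs[0] is assumed to be the leaf program counter and is kept as is.
    Every other entry is assumed to be a return address.
    """
    if not pcs:
        return []
    return [pcs[0]] + [pc - 1 for pc in pcs[1:]]
```
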
Sources usually generate profiles that fall into these two categories:

* *Unsymbolized profiles*: These only contain instruction addresses, and are to
  be symbolized by a separate tool. It is critical for each location to point to
  a valid mapping, which will provide the information required for
  symbolization. These are used for profiles of compiled languages, such as C++
  and Go.

* *Symbolized profiles*: These contain all the symbol information available for
  the profile. Mappings and instruction addresses are optional for symbolized
  locations. These are used for profiles of interpreted or jitted languages,
  such as Java or Python. Also, the profile format allows the generation of
  mixed profiles, with symbolized and unsymbolized locations.

The symbol information is represented in the repeated lines field of the
Location message. A location has multiple lines if it reflects multiple program
sources, for example if representing inlined call stacks. Each line references a
function by its unique nonzero id and records the source line number within the
source file listed by that function. A Function message contains the source
attributes of a function, including its name, source file, etc. Functions
include both a user and a system form of the name, for example to include C++
demangled and mangled names. For profiles where only a single name exists, both
should be set to the same string.

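The relationship between samples, locations, lines, and functions can be
sketched with simplified stand-in types. The dataclasses below use plain strings
where the real messages store string table indices, and `expand_frames` is a
hypothetical helper. The sketch also assumes that the lines of a location are
ordered with the innermost inlined function first and the enclosing caller
last.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Function:
    id: int
    name: str         # human-readable (e.g. demangled) name
    system_name: str  # system (e.g. mangled) name
    filename: str


@dataclass
class Line:
    function_id: int  # references Function.id
    line: int         # line number within Function.filename


@dataclass
class Location:
    id: int
    lines: List[Line] = field(default_factory=list)


def expand_frames(location_ids: List[int],
                  locations: Dict[int, Location],
                  functions: Dict[int, Function]) -> List[str]:
    """Expand a sample's location ids into one entry per source frame."""
    frames = []
    for loc_id in location_ids:
        # A location with several lines represents an inlined call chain.
        for ln in locations[loc_id].lines:
            fn = functions[ln.function_id]
            frames.append(f"{fn.name} ({fn.filename}:{ln.line})")
    return frames
```
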
Mappings are also referenced from locations by their unique nonzero id, and
include all information needed to symbolize addresses within the mapping. A
mapping includes information similar to the Linux /proc/self/maps file.
Locations associated with a mapping should have addresses that land between the
mapping start and limit. Also, if available, mappings should include a build id
to uniquely identify the version of the binary being used.

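The address range rule can be sketched as a lookup over a list of mappings. The
`Mapping` dataclass is a simplified stand-in that uses plain strings for the
file name and build id, `find_mapping` is a hypothetical helper, and treating
the limit as exclusive is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    id: int
    memory_start: int
    memory_limit: int
    file_offset: int
    filename: str
    build_id: str = ""


def find_mapping(mappings: List[Mapping], address: int) -> Optional[Mapping]:
    """Return the mapping whose address range contains address, if any."""
    for m in mappings:
        # Assumption: memory_limit is one past the last mapped address.
        if m.memory_start <= address < m.memory_limit:
            return m
    return None
```
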
## Labels

Samples optionally contain labels, which are annotations to discriminate samples
with identical locations. For example, a label can be used on a malloc profile
to indicate allocation size, so two samples on the same call stack with sizes
2MB and 4MB do not get merged into a single sample with two allocations and a
size of 6MB.

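The effect of labels on merging can be sketched by keying samples on both the
call stack and the labels, then adding values element by element. The
`merge_samples` helper and its tuple-based sample representation are
hypothetical. Modeling labels as a dict with one value per key is a
simplification made for this sketch.

```python
from typing import Dict, Iterable, List, Tuple, Union

LabelValue = Union[str, int]
Sample = Tuple[List[int], Dict[str, LabelValue], List[int]]


def merge_samples(samples: Iterable[Sample]) -> Dict[tuple, List[int]]:
    """Merge samples that share both call stack and labels.

    Each sample is (location_ids, labels, values). Values of matching
    samples are added element by element.
    """
    merged: Dict[tuple, List[int]] = {}
    for location_ids, labels, values in samples:
        key = (tuple(location_ids), tuple(sorted(labels.items())))
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], values)]
        else:
            merged[key] = list(values)
    return merged
```
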
Labels can be string-based or numeric. They are represented by the Label
message, with a key identifying the label and either a string or numeric
value. For numeric labels, the measurement unit can be specified in the profile.
If no unit is specified and the key is "request" or "alignment",
then the units are assumed to be "bytes". Otherwise when no unit is specified
the key will be used as the measurement unit of the numeric value. All labels
with the same key should have the same unit.

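The unit rules for numeric labels can be written as a short function. The name
`numeric_label_unit` is hypothetical, and an empty string is assumed to mean
that no unit was specified.

```python
def numeric_label_unit(key: str, unit: str = "") -> str:
    """Resolve the measurement unit of a numeric label."""
    if unit:
        # An explicitly specified unit always wins.
        return unit
    if key in ("request", "alignment"):
        return "bytes"
    # Otherwise the key itself serves as the unit.
    return key
```
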
## Keep and drop expressions

Some profile sources may have knowledge of locations that are uninteresting or
irrelevant. However, if symbolization is needed in order to identify these
locations, the profile source may not be able to remove them when the profile is
generated. The profile format provides a mechanism to identify these frames by
name, through regular expressions.

These expressions must match the function name in its entirety. Frames that
match Profile.drop\_frames will be dropped from the profile, along with any
frames below them. Frames that match Profile.keep\_frames will be kept, even if
they match drop\_frames.

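A simplified sketch of how a consumer might apply these expressions to one call
stack is shown below. It assumes the stack is a list of function names with the
leaf first, and that "below" refers to the frames closer to the leaf. The
`prune_stack` helper is hypothetical and is not pprof's actual implementation,
which handles additional cases.

```python
import re
from typing import List, Optional


def prune_stack(frames: List[str],
                drop_frames: str,
                keep_frames: Optional[str] = None) -> List[str]:
    """Drop the first matching frame (scanning from the root) and all
    frames closer to the leaf. frames[0] is the leaf."""
    drop = re.compile(drop_frames)
    keep = re.compile(keep_frames) if keep_frames else None
    # Scan from the root (end of the list) toward the leaf.
    for i in range(len(frames) - 1, -1, -1):
        name = frames[i]
        # fullmatch enforces a match on the entire function name.
        if drop.fullmatch(name) and not (keep and keep.fullmatch(name)):
            return frames[i + 1:]
    return list(frames)
```

For a stack `["memcpy", "malloc_hook", "malloc", "work", "main"]` and a drop
expression of `malloc.*`, this sketch keeps only `work` and `main`. Adding a
keep expression of `malloc` retains the `malloc` frame as well.
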
