pcp-archive - Archive Files for Performance Co-Pilot
$PCP_LOG_DIR/pmlogger/*/*.{meta,index,0} $PCP_LOG_DIR/pmmgr/*/*.{meta,index,0}
PCP log archives store volumes of historical values of arbitrary Performance Co-Pilot metrics recorded from a single host. Archives are self-contained in the sense that they contain all the important metadata that would be required for off-line or off-site analysis. The format is intended to be stable in order to allow long-term historical storage and processing by current tools. (Compatibility in the other direction - new files, old tools - is not as fully assured.)

Archives may be read by most PCP client tools, using the -a ARCHIVE option, or dumped raw by pmdumplog(1). Archives may be created by pmlogger(1) and bulk-import tools. Archives may be merged, analyzed, and subsampled using specialized tools such as pmlogsummary(1), pmlogreduce(1), pmlogrewrite(1), and pmlogextract(1). In addition, PCP archives may be examined in sets or grouped together into PCP "archive folios", which are managed by the pmafm(1) tool.

PCP archives consist of several physical files that share a common arbitrary prefix, e.g., myarchive.

myarchive.0, myarchive.1, ...
    Metric values. May grow rapidly.

myarchive.meta
    Information for PMAPI functions such as pmLookupDesc(3) and pmGetInDom(3). May grow in fits and starts, as logged instances and instance domains vary.

myarchive.index
    A temporal index, mapping timestamps to offsets in the other files. Grows slowly.
All three types of files have a similar record-based structure, a convention of network-byte-order (big-endian) encoding, and 32-bit fields for tagging/padding for those records. Strings are stored as 8-bit characters without assuming a specific encoding, so normally ASCII. See also the __pmLog* types in include/pcp/impl.h.

RECORD FRAMING

The volume and meta files are divided into self-identifying records.

    Offset  Length  Name
    0       4       N, length of record, in bytes, including this field
    4       N-8     record payload, usually starting with a 32-bit tag
    N-4     4       N, length of record (again)

ARCHIVE LOG LABEL

All three types of files begin with a "log label" header, which identifies the host name, the time interval covered, and a time zone.

    Offset  Length  Name
    0       4       tag, PM_LOG_MAGIC | PM_LOG_VERS02=0x50052602
    4       4       pid of pmlogger process that wrote file
    8       4       log start time, seconds part (past UNIX epoch)
    12      4       log start time, microseconds part
    16      4       current log volume number (or -1=.meta, -2=.index)
    20      64      name of collection host
    84      40      time zone string ($TZ environment variable)

All fields, except for the current log volume number field, match for all archive-related files produced by a single run of the tool.
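The framing and label layouts above can be sketched in a few lines of Python. This is a minimal illustration, not the libpcp implementation: the names read_record, parse_label, and frame are hypothetical, and the sketch assumes that the label is itself wrapped in the standard length/payload/length framing.

```python
import struct

# Label payload: tag, pid, start seconds, start microseconds, volume number,
# 64-byte host name, 40-byte time zone string (124 bytes, big-endian).
LABEL = struct.Struct(">Iiiii64s40s")
LABEL_TAG = 0x50052602  # PM_LOG_MAGIC | PM_LOG_VERS02, as given in the table


def read_record(buf, offset=0):
    """Return (payload, next_offset) for the framed record at `offset`."""
    (length,) = struct.unpack_from(">I", buf, offset)
    if length < 8 or offset + length > len(buf):
        raise ValueError("bad or truncated record")
    (trailer,) = struct.unpack_from(">I", buf, offset + length - 4)
    if trailer != length:
        raise ValueError("leading and trailing lengths differ")
    return buf[offset + 4:offset + length - 4], offset + length


def parse_label(payload):
    """Decode a log label payload into a dict (field names are illustrative)."""
    tag, pid, sec, usec, vol, host, tz = LABEL.unpack_from(payload)
    if tag != LABEL_TAG:
        raise ValueError("unexpected label tag 0x%08x" % tag)
    return {
        "pid": pid,
        "start": (sec, usec),
        "volume": vol,  # -1 for .meta, -2 for .index
        "host": host.split(b"\0", 1)[0].decode("ascii", "replace"),
        "timezone": tz.split(b"\0", 1)[0].decode("ascii", "replace"),
    }


def frame(payload):
    """Wrap a payload in length/payload/length framing (demo input helper)."""
    n = len(payload) + 8
    return struct.pack(">I", n) + payload + struct.pack(">I", n)
```

To try it on synthetic data, pack a label with LABEL.pack(LABEL_TAG, 1234, 1700000000, 0, 0, b"myhost", b"UTC"), wrap it with frame(), and pass the result through read_record() and parse_label(). Checking the trailing length against the leading one is a cheap way to detect a truncated or corrupted file.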
pmResult

After the archive log label record, an archive volume file contains metric-value records. Each record corresponds to the pmResult set of one pmFetch operation, and the on-disk form is almost identical to that structure. The record size may vary according to the number of PMIDs being fetched and the number of instances in their domains. File size is limited to 2GB, due to storage of 32-bit offsets within the .index file.

    Offset   Length  Name
    0        4       timestamp, seconds part (past UNIX epoch)
    4        4       timestamp, microseconds part
    8        4       number of PMIDs with data following
    12       M       pmValueSet #0
    12+M     N       pmValueSet #1
    12+M+N   ...     ...
    NOP      X       pmValueBlock #0
    NOP+X    Y       pmValueBlock #1
    NOP+X+Y  ...     ...

Records with a number-of-PMIDs equal to zero are "markers", and may represent interruptions, missing data, or time discontinuities in logging.

pmValueSet

This subrecord represents the measurements for one metric.

    Offset  Length  Name
    0       4       PMID
    4       4       number of values
    8       4       storage mode, PM_VAL_INSITU=0 or PM_VAL_DPTR=1
    12      M       pmValue #0
    12+M    N       pmValue #1
    12+M+N  ...     ...

The metric-description metadata for PMIDs is found in the .meta files. These entries are not timestamped, so the metadata is assumed to be unchanging throughout the archiving session.

pmValue

This subrecord represents one measurement for one instance of the metric. It is a variant type, depending on the parent pmValueSet's storage mode field. This allows small numbers to be encoded compactly, while retaining the flexibility for larger or variable-length data to be stored later in the pmResult record.

    Offset  Length  Name
    0       4       number in instance-domain (or PM_IN_NULL=-1)
    4       4       value (INSITU) or offset in pmResult to our pmValueBlock (DPTR)

The instance-domain metadata for PMIDs is found in the .meta files. Since the numeric mappings may change during the lifetime of the logging session, it is important to match up the timestamp of the measurement record with the corresponding instance-domain record. That is, the instance-domain record that applies to a measurement at time T is the one with the largest timestamp T' <= T.

pmValueBlock

Instances of this subrecord are placed at the end of the pmValueSet, after all the pmValue subrecords. If needed, they are padded at the end to the next-higher 32-bit boundary.

    Offset  Length  Name
    0       1       value type (same as pmDesc.type)
    1       3       4 + N, the length of the subrecord
    4       N       bytes that make up the raw value
    4+N     0-3     padding (not included in the 4+N length field)

Note that for PM_TYPE_STRING, the length includes an explicit NUL terminator byte. For PM_TYPE_EVENT, the value bytestring is further structured.

pmEventArray

(TBD)
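The pmResult, pmValueSet, pmValue, and pmValueBlock layouts can be walked as in the following sketch. The function names are hypothetical. It assumes that every value set carries at least one value and is laid out exactly as in the tables above; empty or error value sets are not described here, so the sketch rejects them. It also leaves PM_VAL_DPTR offsets unresolved, because the text does not state the unit or origin of that offset.

```python
import struct

PM_VAL_INSITU = 0
PM_VAL_DPTR = 1


def parse_result(payload):
    """Decode one pmResult payload (record framing already removed)."""
    sec, usec, numpmid = struct.unpack_from(">iii", payload, 0)
    pos = 12
    sets = []
    for _ in range(numpmid):
        pmid, numval, mode = struct.unpack_from(">Iii", payload, pos)
        if numval < 1:
            # Layout for empty/error value sets is an open question here.
            raise ValueError("value set without values is not handled")
        pos += 12
        values = []
        for _ in range(numval):
            # `raw` is the value itself (INSITU) or an offset (DPTR).
            inst, raw = struct.unpack_from(">iI", payload, pos)
            values.append((inst, raw))
            pos += 8
        sets.append({"pmid": pmid, "mode": mode, "values": values})
    return {"time": (sec, usec), "is_marker": numpmid == 0, "sets": sets}


def parse_value_block(buf, offset):
    """Decode one pmValueBlock; return (type, raw bytes, next aligned offset)."""
    (word,) = struct.unpack_from(">I", buf, offset)
    vtype = word >> 24           # first byte: value type
    length = word & 0xFFFFFF     # next three bytes: 4 + N
    if length < 4:
        raise ValueError("value block length too small")
    data = buf[offset + 4:offset + length]
    padded = (length + 3) & ~3   # round up to the next 32-bit boundary
    return vtype, data, offset + padded
```

A marker record decodes to an empty list of sets with is_marker set to True. For a string value, the bytes returned by parse_value_block() still include the trailing NUL described above.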
After the archive log label record, the metadata file contains interleaved metric-description and timestamped instance-domain descriptors. File size is limited to 2GB, due to storage of 32-bit offsets within the .index file. Unlike the archive volumes, these records are not forced to 32-bit alignment! See also src/libpcp/src/logmeta.c.

pmDesc

Instances of this record represent the metric description, giving a type, instance-domain identifier, semantics, units, and a set of names to each PMID used in the archive volume.

    Offset  Length  Name
    0       4       tag, TYPE_DESC=1
    4       4       pmid
    8       4       type (PM_TYPE_*)
    12      4       instance domain number
    16      4       semantics of value (PM_SEM_*)
    20      4       units: bit-packed pmUnits
    24      4       number of alternative names for this PMID
    28      4       N: number of bytes in this name
    32      N       bytes of the name, no NUL terminator nor padding
    32+N    4       N2: number of bytes in next name
    36+N    N2      bytes of the name, no NUL terminator nor padding
    ...     ...     ...

pmLogIndom

Instances of this record represent the number-string mapping table of an instance domain. The instance domain number will have already been mentioned in a prior pmDesc record. Since new instances may appear over a long archiving run, these records are timestamped, and must be searched when decoding pmResult records from the main archive volumes. Instance names may be reused between instance numbers, so an offset-based string table is used that could permit sharing.

    Offset    Length  Name
    0         4       tag, TYPE_INDOM=2
    4         4       timestamp, seconds part (past UNIX epoch)
    8         4       timestamp, microseconds part
    12        4       instance domain number
    16        4       N: number of instances in domain, normally >0
    20        4       first instance number
    24        4       second instance number (if appropriate)
    ...       ...     ...
    20+4*N    4       first offset into string table (see below)
    20+4*N+4  4       second offset into string table (etc.)
    ...       ...     ...
    20+8*N    M       base of string table, containing packed, NUL-terminated instance names

Records of this form replace the existing instance-domain: prior records are not searched for resolving instance numbers in measurements after this timestamp.
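The two metadata record types, and the rule that the latest pmLogIndom record at or before a measurement's timestamp is the one in force, can be sketched as follows. All function names are hypothetical, the payloads are assumed to have had their record framing removed already, and names are decoded as ASCII purely for illustration.

```python
import struct
from bisect import bisect_right

TYPE_DESC = 1
TYPE_INDOM = 2


def parse_desc(payload):
    """Decode a pmDesc metadata payload into a dict."""
    tag, pmid, mtype, indom, sem, units, numnames = struct.unpack_from(
        ">iIiIiIi", payload, 0)
    if tag != TYPE_DESC:
        raise ValueError("not a pmDesc record")
    pos, names = 28, []
    for _ in range(numnames):
        (n,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        # Names are stored without NUL terminator or padding.
        names.append(payload[pos:pos + n].decode("ascii", "replace"))
        pos += n
    return {"pmid": pmid, "type": mtype, "indom": indom,
            "sem": sem, "units": units, "names": names}


def parse_indom(payload):
    """Decode a pmLogIndom payload into a number-to-name mapping."""
    tag, sec, usec, indom, n = struct.unpack_from(">iiiIi", payload, 0)
    if tag != TYPE_INDOM:
        raise ValueError("not a pmLogIndom record")
    insts = struct.unpack_from(">%di" % n, payload, 20)
    offsets = struct.unpack_from(">%dI" % n, payload, 20 + 4 * n)
    table = payload[20 + 8 * n:]
    names = {}
    for inst, off in zip(insts, offsets):
        end = table.index(b"\0", off)  # names are NUL-terminated
        names[inst] = table[off:end].decode("ascii", "replace")
    return {"time": (sec, usec), "indom": indom, "names": names}


def indom_at(records, when):
    """Pick the record in force at `when`, a (seconds, microseconds) pair.

    `records` must hold parse_indom() results for ONE instance domain,
    sorted by time. Returns None if no record is old enough.
    """
    times = [r["time"] for r in records]
    i = bisect_right(times, when)
    return records[i - 1] if i else None
```

Because each pmLogIndom record replaces the previous one, indom_at() returns a single record and never merges names from older records.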
After the archive log label record, the temporal index file contains a plainly concatenated, unframed group of tuples, which relate timestamps to 32-bit seek offsets in the volume and meta files. (This limits those files to 2GB in size.) These records are fixed-size, fixed-format, and are not enclosed in the standard length/payload/length wrapper: they just take up the entire remainder of the .index file. See also src/libpcp/src/logutil.c.

    Offset  Length  Name
    0       4       event time, seconds part (past UNIX epoch)
    4       4       event time, microseconds part
    8       4       archive volume number (0...N)
    12      4       byte offset in .meta file of pmDesc or pmLogIndom
    16      4       byte offset in archive volume file of pmResult

Since temporal indexes are optional, and exist only to speed up time-wise random access of metrics and their metadata, index records are emitted only intermittently. An archive reader program should not presume any particular rate of data flow into the index. However, common events that may trigger a new temporal-index record include changes in instance-domains, switching over to a new archive volume, and starting or stopping logging. One reliable invariant, however, is that for each index entry, no meta or archive-volume record with a timestamp later than the entry's may be located physically before the byte offset that the entry records.
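Reading the index and using it as a seek hint can be sketched as below. The names parse_index and seek_hint are hypothetical, and the input is assumed to be the bytes of the .index file that follow the label. The sketch deliberately picks the last entry strictly earlier than the wanted time: by the invariant above, every record located before that entry's offsets has a timestamp no later than the entry's, so any record at or after the wanted time must lie at or beyond those offsets.

```python
import struct
from bisect import bisect_left

# seconds, microseconds, volume number, .meta offset, volume-file offset
INDEX_ENTRY = struct.Struct(">iiiii")


def parse_index(body):
    """Decode the unframed 20-byte tuples that follow the label."""
    # Ignore a trailing partial tuple, e.g. from a file still being written.
    usable = len(body) - len(body) % INDEX_ENTRY.size
    return [
        {"time": (sec, usec), "volume": vol,
         "meta_offset": meta_off, "volume_offset": vol_off}
        for sec, usec, vol, meta_off, vol_off
        in INDEX_ENTRY.iter_unpack(body[:usable])
    ]


def seek_hint(entries, when):
    """Return the last entry strictly earlier than `when`, or None.

    `entries` must be in file order, which is assumed to be time order.
    A None result means the reader should start right after the label.
    """
    times = [e["time"] for e in entries]
    i = bisect_left(times, when)
    return entries[i - 1] if i else None
```

From the returned entry, a reader would open the indicated volume, seek to volume_offset, and scan forward record by record until it reaches the wanted timestamp. Since index entries are emitted only intermittently, that forward scan may cover many records.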
PCPIntro(1), PMAPI(3), pmlogger(1), pmdumplog(1), pmafm(1), pcp.conf(5), and pcp.env(5).