collectl(1)

NAME

   collectl - Collects data that describes the current system status.

SYNOPSIS

   Record  Mode  - read data from live system and write to file or display
   on terminal

   collectl [-f file] [options]

   Playback Mode - read data from one or more raw data files  and  display
   on terminal

   collectl -p file1 [file2 ...] [options]

OPTIONS

   Record Mode

   In  this  mode data is taken from a live system and either displayed on
   the terminal or written to one or more files or a socket.

   --align
          If the HiRes module is present, collectl sample monitoring will
          be aligned such that a sample will always be taken at the top of
          a minute (this does NOT mean the first sample will  occur  then)
          so  that  all instances of collectl running on any systems which
          have their clocks synchronized will all take samples at the same
          time.   Furthermore,  if  one is doing process monitoring, those
          samples will also be taken at the top of the minute and  so  can
          delay  the  start  of  sampling  up to 2 full process monitoring
          intervals.
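
          The alignment idea above can be illustrated with a small Python
          sketch.  It is not collectl's own code: it assumes the interval
          is a whole number of seconds that divides 60, so that multiples
          of the interval counted from the epoch always include the top
          of each minute.  The function names are hypothetical.

```python
import math


def next_aligned_sample(now: float, interval: int) -> int:
    """Return the first epoch second >= now that is a multiple of interval."""
    return math.ceil(now / interval) * interval


def seconds_to_wait(now: float, interval: int) -> float:
    """How long to sleep before taking the first aligned sample."""
    return next_aligned_sample(now, interval) - now
```

          Two hosts with synchronized clocks that both compute their
          schedule this way would sample at the same instants, whatever
          time each one was started.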

   --all
          Collect summary data for ALL subsystems except slabs, since slab
          monitoring  requires a different monitoring interval.  This also
          means you won't get any detail data, which also includes
          processes and environmentals.  You can use this switch
          anywhere -s can be used but not both together.  If the system
          supports lustre and/or interconnect monitoring those statistics
          will be provided, but the warnings normally produced when they
          are not available and you try to select them with -s will not
          be displayed.

   --ALL
          This is actually a superset of --all by adding detail statistics
          as well with the exception of TCP details when displaying  to  a
          terminal since those are only available with -P or -f.

   -A, --address address[:port[:timeout]] | server[:port]
          In  the  first form, one specifies an address, optional port and
          timeout (the first colon is  required  to  specify  timeout  for
          default port).  All data is then written to that socket prefaced
          with the current host name at the named address and  port  until
          the socket is closed, at which time collectl will exit.

          In  the  second  form  one enters the text "server" and optional
          port.  In this form, collectl runs as a server,  waiting  for  a
          connection and once established writes data on that socket.  The
          key difference here is that if the client exits, collectl keeps
          running and will again look for a new connection, allowing it to
          survive client restarts or crashes.

          The default port is set  at  2655  but  can  be  changed  -  see
          collectl.conf.

          In  both  forms, one can additionally request local data logging
          by specifying a combination of -P and  -f.   See  man  collectl-
          logging for more details.
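
          As an illustration of the two argument forms, the following
          Python sketch splits a -A argument into its fields.  It is only
          a reading of the syntax described above, not collectl's parser;
          the names AddressSpec and parse_address are hypothetical, and
          because no default timeout is stated here the sketch reports a
          missing timeout as None.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 2655  # default stated above; collectl.conf can change it


class AddressSpec(NamedTuple):
    server_mode: bool       # True for the "server[:port]" form
    host: Optional[str]     # None in server mode
    port: int
    timeout: Optional[int]  # None when not given


def parse_address(arg: str) -> AddressSpec:
    """Split address[:port[:timeout]] or server[:port] into its fields."""
    fields = arg.split(":")
    if len(fields) > 3 or not fields[0]:
        raise ValueError(f"malformed -A argument: {arg!r}")
    # An empty port field (as in "host::5") means "use the default port".
    port = int(fields[1]) if len(fields) > 1 and fields[1] else DEFAULT_PORT
    if fields[0] == "server":
        if len(fields) > 2:
            raise ValueError("the server form takes no timeout")
        return AddressSpec(True, None, port, None)
    timeout = int(fields[2]) if len(fields) > 2 and fields[2] else None
    return AddressSpec(False, fields[0], port, timeout)
```

          For example, parse_address("10.0.0.1::5") keeps the default
          port and sets a timeout of 5, which is why the first colon
          cannot be omitted in that case.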

   --comment string
          Add the specified string to the end of the headers in the data
          files.  If it contains embedded spaces, be sure to quote it.
          This can be very useful when doing characterizations or
          benchmarking and you're frequently changing system/application
          parameters and restarting collectl between tests.

   -C, --config filename
          Name/location  of  the  collectl  configuration  file.   If  not
          specified, collectl searches for  collectl.conf  first  in  /etc
          (the   default),   then  in  the  same  directory  the  collectl
          executable is in, and finally the current working directory.
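
          The search order can be written down as a short Python sketch.
          It only lists the candidate paths in the order described above;
          config_candidates is a hypothetical helper, not part of
          collectl.

```python
import posixpath
from typing import List, Optional


def config_candidates(explicit: Optional[str], exe_path: str, cwd: str) -> List[str]:
    """Return the configuration file paths to try, in order."""
    if explicit:
        # -C was given, so no search takes place.
        return [explicit]
    return [
        posixpath.join("/etc", "collectl.conf"),
        posixpath.join(posixpath.dirname(exe_path), "collectl.conf"),
        posixpath.join(cwd, "collectl.conf"),
    ]
```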

   -c, --count Samples
          The number of samples to record.  This is one of 3 ways of
          describing how long collectl should run (see -r and -R).  Note
          that these 3 switches are mutually exclusive.

   -D, --daemon
          Run collectl as a daemon, primarily  used  when  starting  as  a
          service.   One  caveat  about  this mode is you can only run one
          copy.

   --export file[,options]
          This requests that collectl  does  not  print  anything  on  the
          terminal   (or   send   it  to  a  socket)  using  the  standard
          brief/verbose/plot  formats.   Instead  it   executes   a   perl
          "require"  on  the  named  file, using an extension of ph if not
          specified.  It first looks in the current directory and  if  not
          there  the  directory  the  executable is in.  It then calls the
          function "file"Init(options) towards the beginning  of  collectl
          and  again  as simply  "file"(@options) to generate the exported
          formatted output.  See the  online  documentation  on  Exporting
          Custom Output and Logging for more details.

   -f, --filename Filename
          This  is the name of a file to write the output to.  For details
          on how the output files are named, see the File  Naming  section
          of    the    documentation    on   collectl.sourceforge.net   OR
          /usr/share/doc/collectl/FileNaming.html

   -F, --flush seconds
          Flush output buffers after this  number  of  seconds.   This  is
          equivalent  to issuing kill -s USR1 at the same frequency (but a
          lot easier!).  If 0, a flush will occur  every  data  collection
          interval.

   --grep pattern
          The  main  purpose  of  this  switch is for those users who have
          discovered there is some  data  in  the  raw  files  that  never
          appears   in  any  display  and  have  taken  to  displaying  it
          themselves  with  grep.   Unfortunately  this  method  does  not
          include  timestamps  and  so makes it difficult to interpret the
          results.  Even if you include the timestamp from the file it  is
          in  UTC  and  so needs to be translated to be of any real value.
          This switch does just that and then some.

          Specifically, it allows you to playback a file  and  instead  of
          processing  it  normally it simply searches for any entries that
          match the perl pattern and reports  those  lines  prefaced  with
          time stamps.  You can optionally change the time format with the
          usual -o options and can even select the timeframe  with  --from
          and --thru.

   --home
          Always  start the display for the current interval at the top of
          the screen also known as  the  home  position  (non-plot  format
          only).  This generates a real-time, continuously refreshing
          display when the data fits on a single screen.

   --import file1[,options][:file2[,options]...]
          This loads the named files and executes callbacks to them, which
          is  the  API  mechanism  for  importing  additional metrics into
          collectl.  See the webpage on the API for further detail.

          Since these files also include instructions for  how  to  report
          the  output  in  all  the  various  forms, you will also need to
          include --import during playback.  Finally, since the default is
          to   seamlessly  include  imported  data  with  everything  else
          collectl reports, if you ONLY want to display imported data you
          must explicitly deselect all other subsystems, either by
          including -s- (note the trailing minus sign) followed by all the
          subsystems that were recorded OR by simply saying -s-all.

   -i, --interval interval[:interval2[:interval3]]
          This  is  the  sampling  interval in seconds.  The default is 10
          seconds when run as  a  daemon  and  1  second  otherwise.   The
          process  subsystem  and  slabs  (-sY and -sZ) are sampled at the
          lower rate of interval2.  Environmentals (-sE), which only apply
          to  a  subset  of  hardware,  are  sampled  at  interval3.  Both
          interval2 and interval3, if specified, must be an even  multiple
          of  interval1.   The daemon default is -i10:60:300 and all other
          modes are -i1:60:300.  To sample only processes  once  every  10
          seconds use -i:10.
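
          The rules above can be sketched in Python as follows.  This is
          an illustration of the described syntax only, assuming whole
          seconds; parse_intervals is a hypothetical name and the checks
          are limited to what this entry states.

```python
from typing import Tuple


def parse_intervals(arg: str, daemon: bool = False) -> Tuple[int, int, int]:
    """Parse interval[:interval2[:interval3]], filling in the defaults."""
    values = [10 if daemon else 1, 60, 300]
    fields = arg.split(":")
    if len(fields) > 3:
        raise ValueError(f"too many fields in {arg!r}")
    # Remember which fields were actually supplied on the command line.
    given = [bool(f) for f in fields] + [False] * (3 - len(fields))
    for idx, field in enumerate(fields):
        if field:
            values[idx] = int(field)
    if values[0] <= 0:
        raise ValueError("interval must be positive")
    # Only explicitly specified secondary intervals are checked.
    for idx in (1, 2):
        if given[idx] and values[idx] % values[0] != 0:
            raise ValueError(f"interval{idx + 1} must be a multiple of interval")
    return values[0], values[1], values[2]
```

          With this reading, ":10" leaves the primary interval at its
          default and only changes interval2, matching the -i:10 example.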

   --nohup
          Whenever collectl finishes a data collection interval, it checks
          to see if the starting parent has exited.  This  is  to  prevent
          the  case  in  which  someone might start a copy of collectl and
          then the process dies and collectl keeps running.   If  that  is
          the   behavior  someone  actually  intends,  they  should  start
          collectl with --nohup.

          NOTE - when running as a daemon, --nohup is implied.

   --quiet
          Whenever collectl wants to tell the user something, it assigns a
          category  to  it such as Informational, Warning, Error or Fatal.
          When run with -m, all messages are displayed for the user and if
          logging  data to a file with -f, these messages are also sent to
          a log file which is in the data collection directory and has  an
          extension of "log".  However, if -m is not specified
          Informational messages (such as collectl starting  or  stopping)
          are not reported on the terminal but the other 3 are.  Sometimes
          the warnings can be annoying and one  can  suppress  these  with
          --quiet  though they will still be written to the message log in
          -f.  You cannot suppress Error or Fatal errors.

   -r, --rolllogs time[[,days[:months]][,minutes]]
          When selected, collectl runs indefinitely (or at least until the
          system reboots).  The maximum number of raw and/or plot files
          that will be retained (older ones are automatically deleted) is
          controlled by the days field, whose default is 7.  When -m is
          also specified to direct collectl to write messages to a log
          file in the logging directory, the number of months to retain
          those logs is controlled by the months field, whose default is
          12.  The minutes (increment) field, which is also optional but
          position dependent, specifies the duration of an individual
          collection file in minutes; its default is 1440, or 1 day.
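
          A Python sketch of how the fields and their defaults fit
          together is shown here.  It treats the time field as an opaque
          string because its format is not described in this entry, and
          it accepts an empty second field as a way to reach the third
          one, which is an assumption.  RollSpec and parse_rolllogs are
          hypothetical names.

```python
from typing import NamedTuple


class RollSpec(NamedTuple):
    time: str     # kept as given; format not interpreted here
    days: int     # raw/plot files to retain
    months: int   # months of message logs to retain
    minutes: int  # duration of one collection file


def parse_rolllogs(arg: str) -> RollSpec:
    """Parse time[[,days[:months]][,minutes]] using the stated defaults."""
    fields = arg.split(",")
    if len(fields) > 3 or not fields[0]:
        raise ValueError(f"malformed -r argument: {arg!r}")
    days, months, minutes = 7, 12, 1440
    if len(fields) > 1 and fields[1]:
        day_part, _, month_part = fields[1].partition(":")
        if day_part:
            days = int(day_part)
        if month_part:
            months = int(month_part)
    if len(fields) > 2 and fields[2]:
        minutes = int(fields[2])
    return RollSpec(fields[0], days, months, minutes)
```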

   --rawdskfilt
          This switch overrides the DiskFilter setting in collectl.conf
          and explicitly defines a perl regular expression against which
          records from /proc/diskstats are selected for processing.  When
          there are a lot of disks to process, this can be a handy way to
          reduce the amount of data collected and actually improve
          performance since there are fewer patterns to match each input
          record against.  Just remember that unlike --dskfilt, which only
          filters during display, records filtered with this switch are
          never even recorded and so are lost forever.

          You  can optionally specify your filter with a leading plus-sign
          which tells collectl to just add  your  filter  to  the  default
          specification.  Care should be taken here as longer filters will
          slightly increase overhead  and  with  a  lot  of  disks  and/or
          shorter monitoring intervals can add up.

          As  a side benefit of this switch, if you really want to look at
          partition level stats you can do so by leaving off the  trailing
          space in the default pattern.

          One must also be careful in selecting the correct pattern
          since it's easy to get it wrong and you may  end  up  collecting
          the WRONG data!  To verify you are collecting what you think you
          are, make a test run  using  -d4  to  see  the  raw  data  being
          recorded in real-time.

   --rawdskignore
          This  is  the opposite of the rawdskfilt switch.  When specified
          any disks listed are completely ignored and will not  appear  in
          the  raw file.  Typically this switch is useful when you're only
          interested in recording a subset of disk statistics.

   --rawnetfilt
          This works just like --rawdskfilt except it applies to networks.
          Unlike disk filtering which has an explicit default pattern, the
          default for network filtering is to simply  record  all  network
          data from /proc/net/dev.

          The  -d4  switch  also works here, as well as everywhere, to see
          the raw data as it is being collected.

   --rawnetignore
          This is the opposite of the rawnetfilt  switch  and  works  just
          like  the  rawdskignore  switch.   When  specified  any networks
          listed are  ignored  and  will  not  appear  in  the  raw  file.
          Typically  this  switch is useful when you're only interested in
          recording a subset of network statistics.

   --rawtoo
          Only available in conjunction with -P, this  switch  causes  the
          creation/logging  of  raw  data  in  addition to plottable data.
          While  this  may  seem  excessive,  keep  in  mind  that  unlike
          plottable  data,  raw  data  can  be  played back with different
          switches potentially providing more details.   The  overhead  to
          write  out  this  additional data is minimal, the only real cost
          being that of extra disk space.

   --runas uid[:gid]
          This switch only works when running in daemon mode and  so  must
          be  specified  in  the  DaemonCommands  line.  Its presence will
          cause collectl to write the  collectl.pid  file  into  the  same
          directory  as  its  other output files as specified by -f, since
          /var/run does not  normally  grant  non-privileged  users  write
          access.  Furthermore, the ownership of that directory must match
          the specified ownership since collectl needs to write ALL its
          files  to  that  directory  and  can  no  longer  assume  global
          permissions when run as root.

          This WILL also require manually  modifying  /etc/init.d/collectl
          to  change  the  PIDFILE variable to point to the same directory
          which the -f switch in the DaemonCommands line of  collectl.conf
          points to.

          As  a  final note of caution, since this mechanism changes where
          collectl  reads/writes  its  pid  file,  once  you  start  using
          --runas, all calls to run collectl as a daemon must use it or it
          may be confused and exhibit unpredictable behavior.

   -R, --runtime duration
          Specify the duration of data collection where the duration is  a
          number  followed  by  one  of  wdhms, indicating how many weeks,
          days, hours, minutes or seconds the collection is  to  be  taken
          for.
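
          The duration syntax maps directly onto seconds, as this small
          Python sketch shows.  It assumes a single number followed by a
          single unit letter, which is how the description above reads;
          runtime_seconds is a hypothetical helper.

```python
import re

UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}


def runtime_seconds(duration: str) -> int:
    """Convert a duration such as '2h' or '90s' into seconds."""
    match = re.fullmatch(r"(\d+)([wdhms])", duration)
    if match is None:
        raise ValueError(f"malformed duration: {duration!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]
```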

   --sep separator
          Specify the plot format separator - default is a space.  If this
          is a numeric field it is interpreted as the decimal value of
          the associated ASCII character code.  Otherwise it is
          interpreted as the character itself.  In other words, "--sep :"
          sets the separator character to a colon and "--sep 9" sets it to
          a horizontal tab.  "--sep 58" would also set it to a colon.
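
          The rule can be expressed in a few lines of Python.  This is a
          sketch of the behavior described above, not collectl's code,
          and resolve_separator is a hypothetical name.

```python
def resolve_separator(arg: str) -> str:
    """Numeric arguments are ASCII codes; anything else is used literally."""
    if arg.isascii() and arg.isdigit():
        return chr(int(arg))
    return arg
```

          Note that under this rule a digit cannot itself be used as the
          separator except through its character code.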

   --tworaw
          The switches -G and --group have been replaced by --tworaw,
          which is more descriptive of its function.  When specified, it
          tells collectl to treat process and slab data as an entirely
          separate group of raw files, named with the extension "rawp".
          These separate files can be played back and processed just like
          any other collectl raw files and in fact one can even play back
          both at the same time if that is what is desired.  The only real
          purpose of this switch is that on some systems with many
          processes, it is possible to generate huge raw files (some have
          been observed to be >250MB!) and while collectl will happily
          play back/process these files it can take a long time.  By using
          the --tworaw switch one still gets a huge rawp file, but the
          normal raw file is a much more manageable size and as a result
          will be faster to process than when all data is combined into
          the same file.

   Playback Mode

   In this mode, data is read from  one  or  more  data  files  that  were
   generated in Record Mode.

   --export Filename
          When playing back a file, use this switch to create an identical
          raw file differing only in the timeframe being covered, so
          naturally one must also include --from, --thru or both.
          Further, since the resultant file will contain the exact same
          raw data you cannot select a subset using -s.  This switch is
          actually intended as a support function for situations where
          someone is having problems playing back a file and a subset of
          the original raw file that covers the problem time has been
          requested, hopefully allowing a significantly smaller file to
          be posted or emailed.

   --extract filename
          If specified, rather than actually play back the file  specified
          with  -p, ALL raw data between the date ranges is selected and a
          subset of that raw file created.  The rules for how to interpret
          the filename are the same as used for -f.

   -f, --filename filename
          If  specified,  this is the name of a file or directory to write
          the output to (rather than the terminal).  See  the  description
          for  details  on the format of this field.  This requires the -P
          flag as well.

   --from time range
          Play back data starting with this  time,  which  may  optionally
          include  the  ending  time  as  well,  which is of the format of
          [date:]time[-[date:]time].   The  leading  0  of  the  hour   is
          optional and if the seconds field is not specified is assumed to
          be 0.  If no dates are specified the time(s) apply to each file
          specified by -p.  Otherwise the time(s) only apply to the
          first/last dates and any files between those dates will have all
          their data reported.
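
          A Python sketch of the [date:]time[-[date:]time] syntax follows.
          The date format is not spelled out in this entry, so the sketch
          assumes an 8-digit yyyymmdd date and an hh:mm[:ss] time; both
          are assumptions, as are the names TimePoint and parse_from.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Optional 8-digit date, then h:mm or hh:mm, then optional :ss.
POINT = re.compile(r"(?:(\d{8}):)?(\d{1,2}):(\d{2})(?::(\d{2}))?")


class TimePoint(NamedTuple):
    date: Optional[str]  # None when no date was given
    hour: int
    minute: int
    second: int          # 0 when the seconds field is omitted


def parse_point(text: str) -> TimePoint:
    match = POINT.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed time: {text!r}")
    date, hour, minute, second = match.groups()
    return TimePoint(date, int(hour), int(minute), int(second or 0))


def parse_from(arg: str) -> Tuple[TimePoint, Optional[TimePoint]]:
    """Return (start, end); end is None when only a start was given."""
    start, dash, end = arg.partition("-")
    return parse_point(start), (parse_point(end) if dash else None)
```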

   --full
          Full  mode  is  actually a superset of --verbose and if selected
          will force --verbose.  It will also force the RECORD separator
          to be printed for every interval, even if only a single
          subsystem was requested, and to include the actual subsystems
          that follow after the UTC timestamp, as a parsing aid for those
          who may wish to parse the text output rather than the plot data.

   --offsettime seconds
          This field originally was  used  before  collectl  reported  the
          timezone  in  the  file  headers  and allowed one to compensate.
          Since then it is rarely needed except in two possible cases, one
          in  which data on two systems is to be compared and they weren't
          synchronized with ntp.  This allows all the times to be reported
          as  shifted by some number of seconds.  The other case (and this
          is very rare) is when a clock had changed in  the  middle  of  a
          sample  and  will not be converted correctly.  When this happens
          one may have to play back the samples in pieces and manually set
          the time offset.

   --passwd filename
          When  reporting  usernames  associated with a UID, use this file
          for the mapping.  This  is  particularly  important  on  systems
          running NIS where there are no user names in /etc/passwd.

   -p, --playback Filename
          Read  data  from the specified playback file(s), noting that one
          can use wildcards in the filename if  quoted  (if  playing  back
          multiple  files  to the terminal you probably want to include -m
          to see the filenames as they are processed).  The filename  must
          either  end in raw or raw.gz.  As an added feature, since people
          sometimes automate the running of this option and don't want  to
          hard  code a date, you can specify the string YESTERDAY or TODAY
          and they  will  be  replaced  in  the  filename  string  by  the
          appropriate date.
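
          The substitution can be pictured with this Python sketch.  The
          date format used in file names is not stated in this entry; the
          sketch assumes yyyymmdd, and both the helper name and the path
          in the usage note are hypothetical.

```python
from datetime import date, timedelta


def expand_playback_name(pattern: str, today: date) -> str:
    """Replace YESTERDAY and TODAY with dates (assumed yyyymmdd)."""
    fmt = "%Y%m%d"
    yesterday = today - timedelta(days=1)
    return (pattern.replace("YESTERDAY", yesterday.strftime(fmt))
                   .replace("TODAY", today.strftime(fmt)))
```

          Called with "/var/log/collectl/*-YESTERDAY-*.raw.gz" on 1 March
          2024, the sketch yields a pattern containing 20240229, which can
          then be handed to -p in quotes.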

   --pname name
          By  default,  collectl  uses  the  file /var/run/collectl.pid to
          indicate the pid of the running instance of collectl and prevent
          multiple  copies from being run.  If you DO want to run a second
          copy, this switch will cause collectl to change its process name
          to collectl-name and use that name as the associated pid file as
          well.

   --procanalyze
          When specified and there is process data  in  the  raw  file,  a
          summary file will be generated with one entry per unique process
          containing such things as the total cpu consumed for  both  user
          and  system,  min/max utilization of various memory types, total
          page faults and several others.

   --slabanalyze
          When specified and there is slab data in the raw file, a summary
          file will be generated with one entry per unique slab containing
          data on physical memory usage by that slab.

   --thru time
          Time thru which to play back a raw file.  See --from for more
          details.

   Common Switches - both record and playback modes

   -d, --debug debug
          Control the level of debugging information, not typically  used.
          For details see the source code.

   -h, --help, -x, --helpext, -X, --helpall
          Display  standard,  extended help message (which doesn't include
          the  optional  displays  such  as  --showoptions,  --showsubsys,
          --showsubopts, --showtopopts) or everything.

   --hr, --headerrepeat num
          Sets  the  number  of  intervals  to  display  data  for  before
          repeating the header.  A value of -1 will prevent any headers from
          being displayed and a value of 0 will cause only a single header
          to be displayed and never repeated.

   --iosize
          In brief mode, include iosize with disk, infiniband and  network
          data.

   -l, --limits limit
          Override one or more default exception limits.  If more than one
          limit they must be separated by hyphens.  Current values are:

          SVC:value
                 Report partition activity with Service times >= 30 msec

          IOS:value
                 Report device activity with 10 or more  reads  or  writes
                 per second

          LusKBS:value
                 Report  client  or OSS activity greater than limit.  Only
                 applies  to  Client  Summary  or  OSS  Detail  reporting.
                 [default=100000]

          LusReints:value
                 Report  MDS activity with Reint greater than limit.  Only
                 applies to MDS Summary reporting.  [default=1000]

          AND
                 Both the IOS and SVC limits must be reached before a
                 device  is  reported.   This  is the default value and is
                 only included for completeness.

          OR
                 Report device activity if either IOS  or  SVC  thresholds
                 are reached.
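
          The way the limits combine can be sketched in Python.  This is
          an illustration of the description above rather than collectl's
          logic: it treats IOS as a single I/O rate per device, which is a
          simplification, and parse_limits and disk_is_exception are
          hypothetical names.

```python
from typing import Dict

THRESHOLDS = ("SVC", "IOS", "LusKBS", "LusReints")


def parse_limits(arg: str) -> Dict[str, object]:
    """Parse hyphen-separated limits, starting from the stated defaults."""
    limits: Dict[str, object] = {
        "SVC": 30, "IOS": 10, "LusKBS": 100000, "LusReints": 1000,
        "combine": "AND",
    }
    for item in arg.split("-"):
        if item in ("AND", "OR"):
            limits["combine"] = item
            continue
        name, colon, value = item.partition(":")
        if not colon or name not in THRESHOLDS:
            raise ValueError(f"unknown limit: {item!r}")
        limits[name] = int(value)
    return limits


def disk_is_exception(svc_msec: float, ios_per_sec: float,
                      limits: Dict[str, object]) -> bool:
    """Decide whether a device would be reported as an exception."""
    svc_hit = svc_msec >= limits["SVC"]
    ios_hit = ios_per_sec >= limits["IOS"]
    if limits["combine"] == "AND":
        return svc_hit and ios_hit
    return svc_hit or ios_hit
```

          With "SVC:50-OR" a device with a 60 msec service time and only
          2 I/Os per second is reported; with "SVC:50" alone it is not,
          because the default AND also requires the IOS limit to be met.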

   -L, --lustsvcs [c|m|o][:seconds]
          This switch limits which services lustre checks for and the
          frequency of those checks.  For more information see the man
          page collectl-lustre.

   -m, --messages
          Write  status to a monthly log file in the same directory as the
          output file (requires -f to be specified as well).  The name  of
          the  file  will  be  collectl-yyyymm.log  and will track various
          messages that may get generated during every run of collectl.

   -N, --nice
          Set priority to a nicer one of 10.

   -o, --options Options
          These apply to the way output is displayed OR written to a  plot
          file.  They do not affect the way data is selected for
          recording.  Most of these switches work in both record  as  well
          as playback mode.  If you're not sure, just try it.

          1
                 Data  in  plotting  format  should use 1 decimal point of
                 precision as appropriate.

          2
                 Data in plotting format should use 2  decimal  points  of
                 precision as appropriate.

          a
                 Always  append data to an existing plot file.  By default
                 if a plot file exists, the playback file will be  skipped
                 as  a  way  of  assuring  it  is associated with a single
                 recorded file.   This  switch  overrides  that  mechanism
                 allowing multiple recorded files to be processed and
                 written to a single plot file.

          c
                 Always open newly named plot files in create mode,
                 overwriting any old ones that may already exist.  If one
                 processes multiple files for the same day in append mode
                 multiple times, the same data will be appended to the
                 same file multiple times.  This assures a new file is
                 created at the start of the processing.

          d
                 For use with terminal output and brief mode.  Precede
                 each line with a date/time stamp, the date being in mm/dd
                 format.  This option can also be applied to plot format,
                 which will cause the date portion to also be displayed in
                 this format as opposed to D format.

          D
                 For use with terminal output and brief mode.  Precede
                 each line with a  date/time  stamp,  the  date  being  in
                 yyyymmdd format.

          g
                 For  use  with  terminal  output  and  brief mode.   When
                 displaying values of  1G  or  greater  there  is  limited
                 precision for 1 digit values.  This option provides a
                 way to display additional digits for more granularity  by
                 substituting  a "g" for the decimal point rather than the
                 trailing "G".

          G
                 For use with terminal output and  brief  mode.   This  is
                 similar   to  "g"  but  preserves  the  trailing  "G"  by
                 sacrificing a digit of granularity.

          m
                 Whenever times are reported in plot format, in the normal
                 terminal reporting format at the beginning of each
                 interval, or when one of the time reporting options
                 (d, D, T or U) is selected, append the milliseconds to
                 the time.

          n
                 Where appropriate, data such as disk KBs or transfers are
                 normalized  to units per second by taking the change in a
                 counter and dividing by the number  of  seconds  in  that
                 interval.   In  the case of CPUs, utilization (calculated
                 in  jiffies)  is  normalized  as  a  percentage  of   the
                 interval.

                 Normalization can be disabled via this option, the result
                 being the reported values are not divided by the duration
                 of the interval.  This can be particularly useful for
                 reporting values that are < 1/2 the sampling interval,
                 which would otherwise be rounded to 0.

          T
                 For use with terminal output and brief mode, precedes
                 each line with a time stamp.

          u
                 Create plot files with unique names by including the
                 starting time of a collection in the name.  This forces
                 multiple collections taken the same day to be written  to
                 multiple files.

          -U or --utc
                 In  plot  format  only,  report timestamps in Coordinated
                 Universal time which is more commonly known as UTC.

          x
                 Report only exception records  for  selected  subsystems.
                 Exception  reporting  also requires --verbose.  Currently
                 this only  applies  to  disk  detail  and  Lustre  server
                 information  so one must select at least -s D, l or L for
                 this to apply.  If writing to a detail  file,  this  data
                 will  go  into  a  separate  file  with  the  extension X
                 appended to the regular detail file name.

          X
                 Report  both  exceptions  as  well  as  all  details  for
                 selected subsystems, for -s D, l or L only.

          z
                 If the compression library has been installed, all output
                 files will be compressed by default.  This  switch  tells
                 collectl   not  to  compress  any  plottable  files.   If
                 collectl tries to compress but cannot because the library
                 hasn't  been  installed, it will generate a warning which
                 can be suppressed with this switch.
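
          The effect of the n option described above can be shown with a
          little arithmetic in Python.  This is a sketch of the idea, not
          collectl's code; report_value is a hypothetical name and the
          counter values in the usage note are made up.

```python
def report_value(previous: int, current: int, interval_secs: int,
                 normalize: bool = True) -> int:
    """Report a counter change, per second by default or as a raw delta."""
    delta = current - previous
    if not normalize:
        return delta
    # Python's round() sends exact halves to the nearest even integer.
    return round(delta / interval_secs)
```

          A counter that moves from 100 to 103 over a 10 second interval
          is a rate of 0.3 per second and is reported as 0 when
          normalized, whereas without normalization the 3 events remain
          visible.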

   -P, --plot
          Generate output in plot format.  This format is space  separated
          data  which  consists  of  a  header (prefaced with a # for easy
          identification by an analysis program as well as identifying  it
          as  a  comment  for  programs, such as gnuplot, which honor that
          convention).  When written to disk, which  is  the  typical  way
          this  option  is  used, summary data elements are written to the
          tab file and the detail elements written to one or  more  files,
          one per detail subsystem.  If -f is not specified, all output is
          sent to the terminal.  Output is always one  line  per  sampling
          interval.
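
          Because plot format is separator-delimited text with a
          '#'-prefixed header, it is straightforward to read from a
          script.  The Python sketch below assumes the last '#' line
          before the data names the columns; that assumption, the helper
          name parse_plot_text and the column names in the test data are
          illustrative and not taken from real collectl output.

```python
from typing import Dict, List, Tuple


def parse_plot_text(text: str, sep: str = " ") -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (column names, one dict per sampling interval)."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Keep the most recent comment line as the column header.
            header = line.lstrip("#").split(sep)
            continue
        rows.append(dict(zip(header, line.split(sep))))
    return header, rows
```

          If a different separator was chosen with --sep, pass the same
          character as the sep argument.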

   --stats
          This  switch will cause brief data to be reported as both totals
          and averages after processing one or more files for the same day
          or in playback mode.

   --statopts option(s)
          This  switch  controls  the  way  brief  stats are reported, the
          default is to report the totals once, at  the  end  of  a  day's
          worth of raw files, if more than one.

          a - include averages along with totals
          i - include the interval data itself, which is the equivalent of
          -oA
          s - print summary stats at the end of each file  processed  even
          if more than one per day

   -s, --subsys subsystem
          This  field  controls which subsystem data is to be collected or
          played back.  The default for collecting data  is  "cdn",  which
          stands  for  CPU,  Disk and Network summary data and the default
          for playback is everything that was collected.

          The rules for displaying results vary depending on the  type  of
          data  selected.   If  you write data for CPUs and DISKs to a raw
          file and play it back with -sc, you will only see CPU data.   If
          you  play  it  back  with  -scm you will still only see CPU data
          since memory data was not collected.  However,  when  used  with
          -P,  collectl  will  always  honor the subsystems specified with
          this switch so in the previous example you  will  see  CPU  data
          plus  memory  data of all 0s.  To see the current set of default
          subsystems, which are a subset of this full list, use -h.

          You can also use + or - to add or  subtract  subsystems  to/from
          the  default  values.   For example, "-s-cdn+N" will remove CPU,
          disk and network  monitoring  from  the  defaults  while  adding
          network detail.
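
          The add/subtract rule above can be sketched as follows.  This is
          only an illustration in Python, not collectl's own code; the
          function name apply_subsys_modifier is hypothetical and the
          defaults string "cdn" is taken from the text above.

```python
def apply_subsys_modifier(defaults, modifier):
    """Apply a +/- modifier such as '-cdn+N' to a default subsystem string.

    Hypothetical helper for illustration only; collectl's real parsing
    may differ in details such as ordering or error handling.
    """
    selected = list(defaults)
    mode = None
    for ch in modifier:
        if ch in "+-":
            mode = ch                      # switch between adding and removing
        elif mode == "+":
            if ch not in selected:
                selected.append(ch)        # add a subsystem once
        elif mode == "-":
            if ch in selected:
                selected.remove(ch)        # drop a subsystem if present
        else:
            raise ValueError("modifier must start with '+' or '-'")
    return "".join(selected)


print(apply_subsys_modifier("cdn", "-cdn+N"))  # prints: N
print(apply_subsys_modifier("cdn", "+m"))      # prints: cdnm
```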

          Refer  to  data  definitions  on  the  sourceforge website OR in
          /usr/share/collectl/doc/collectl-xxx     to     see     complete
          descriptions of the data returned.

          SUMMARY SUBSYSTEMS

          b - buddy info (memory fragmentation)
          c - CPU
          d - Disk
          f - NFS V3 Data
          i - Inode and File System
          j - Interrupts
          l - Lustre
          m - Memory
          n - Networks
          s - Sockets
          t - TCP
          x - Interconnect
          y - Slabs (system object caches)

          DETAIL SUBSYSTEMS

          This  is  the  set  of  detail data from which in most cases the
          corresponding summary data is derived.  There  are  currently  2
          types  that do not have corresponding summary data and those are
          "Environmental" and "Process".  So,  if  one  has  3  disks  and
          chooses -sd, one will only see a single total taken across all 3
          disks.  If one chooses  -sD,  individual  disk  totals  will  be
          reported but no totals.  Choosing -sdD will get you both.

          C - CPU
          D - Disk
          E - Environmental data (fan, power, temp),  via ipmitool
          F - NFS Data
          J - Interrupts
          L - Lustre OST detail OR client Filesystem detail
          M - Memory node data, which is also known as numa data
          N - Networks
          T - 65 TCP counters only available in plot format
          X - Interconnect
          Y - Slabs (system object caches)
          Z - Processes

   --showheader
          In  collectl  mode  this  command  will cause the header that is
          normally written to a data file to be displayed on the  terminal
          and  collectl  then  exits.   This  can  be a handy way to get a
          brief overview of the system configuration.

   --showoptions
          This command shows only  the  portion  of  the  help  text  that
          describes  the  -o  and  --options  switches to save the time of
          wading through the entire help screen.

   --showcolheaders
          This command shows the first set of headers that will be printed
          by  collectl  and  exits.   Doesn't really make sense for multi-
          section output like several sets  of  verbose  or  detail  data.
          Also  note  that  since  it  requires one monitoring interval to
          build up some headers which may be dynamic, it also  forces  the
          interval to 0.

   --showsubopts
          List all the subsystem specific options.

   --showtopopts
          Show  all  the  different values for the --top type field, which
          specify the field(s) by which to sort the data.

   --showrootslabs
          This command only works on systems using the new slab  allocator
          and  will  list  the  root  name  (these  are  those  entries in
          /sys/slab which are not soft links) along  with  all  its  alias
          names.   If  a name doesn't have an alias, it will not appear in
          this report.

   --showslabaliases
          This command only works on systems using the new slab allocator.
          Like  --showrootslabs,  it  will name a slab and all its aliases
          but rather than show the root slab name it will show one of  the
          aliases  to  provide  a  more meaningful name.  If there are any
          slabs that only have a single (or no) alias  they  will  not  be
          included in this report.

   --showsubopts
          Similar  to  --showoptions,  this  command  summarizes  just the
          parameters associated with -O and --subopts.

   --showsubsys
          Yet another way to summarize a portion of the  help  text,  this
          command only shows valid subsystems.

   --top [type][,num[,v]]
          Include  the  top "num" consumers by resource for this interval.
          The default number is the height of the window,  if  it  can  be
          determined, otherwise 24, and the default resource is the  total
          cpu time, which is taken as the  sum  of  SysT  and  UsrT.   See
          --showtopopts for a list of other types of data you can sort on.

          This  switch can also be used with -s in which case a portion of
          the window is reserved at the top to fill in the subsystem data,
          which  is  currently  in  verbose  mode though a brief format is
          contemplated for some time in the future.

          In interactive mode and if not specified, the process monitoring
          interval  will  be set to that for other subsystems.  The screen
          will be cleared for each interval resulting in a display similar
          to  the  "top" utility.  In playback mode the screen will NOT be
          cleared.  You cannot use this switch in "record" mode.

          Finally, if v is specified as  the  3rd  parameter,  the  output
          scrolls vertically (like playback mode) rather than clearing the
          screen between intervals.

   --umask mask
          Sets collectl's umask to control output file permissions.   Only
          root can set the umask.  See "man umask" for details.
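
          As a rough illustration of how a umask shapes the permissions of
          newly created files (standard POSIX behavior, not collectl
          code), the mask clears the matching bits of the mode a program
          requests.  The helper name mode_after_umask is hypothetical and
          the requested mode 0666 is only an example.

```python
def mode_after_umask(requested_mode, umask):
    """Return the permission bits left after a umask is applied.

    Bits set in the umask are cleared from the requested mode.
    """
    return requested_mode & ~umask & 0o777


# A file requested as rw-rw-rw- (0666) under two common masks.
print(oct(mode_after_umask(0o666, 0o022)))  # prints: 0o644
print(oct(mode_after_umask(0o666, 0o027)))  # prints: 0o640
```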

   --utime mask
          Write  periodic  micro-timestamps  into  raw  file  at different
          points in time for fine grained measurements of operation times.
          1 - write timestamps when entering major sections
          2 - write timestamps for all /proc accesses except  for  process
          data
          4  - write timestamps for /proc data for all processes including
          threads
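
          The values 1, 2 and 4 look like bit flags that can be summed,
          though that is an assumption drawn from the word "mask" and not
          something stated above.  Under that assumption, a mask could be
          decoded along these lines (Python sketch, hypothetical names):

```python
# Descriptions copied from the list above; the bit-flag reading is assumed.
UTIME_FLAGS = {
    1: "timestamps when entering major sections",
    2: "timestamps for all /proc accesses except for process data",
    4: "timestamps for /proc data for all processes including threads",
}


def decode_utime_mask(mask):
    """Return the descriptions of every flag bit set in mask."""
    return [desc for bit, desc in UTIME_FLAGS.items() if mask & bit]


for line in decode_utime_mask(5):  # 5 = 1 + 4
    print(line)
```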

   -v
          Show version and whether or  not  Compression  and/or  HiResTime
          modules have been installed and exit.

   -V
          Show default parameter and control settings, all of which can be
          changed in /etc/collectl.conf

   --verbose
          Display output in verbose mode.  This often displays  more  data
          than  in the default mode.  When displaying detail data, verbose
          mode is forced.  Furthermore,  if  summary  data  for  a  single
          subsystem  is  to  be displayed in verbose mode, the headers are
          only repeated occasionally whereas if  multiple  subsystems  are
          involved each needs its own header.

   -w
          Display data in wide mode. When displaying data on the terminal,
          some data is formatted followed by a K, M or G  as  appropriate.
          Selecting this switch will cause the full field to be displayed.
          Note that there is no attempt to  align  data  with  the  column
          headings in this mode.

SUBSYSTEM OPTIONS

   The  following options are subsystem specific and typically filter data
   for collection and/or display as well as affect the output format:

   --cpufilt [^]perl-regx[,perl-regx...]
          Works the same as --dskfilt and --netfilt, allowing  one  to
          select a subset of CPUs.  These filters are also honored  by
          interrupt reporting.

   --cpuopts
          z - only applies to cpu details, do not report any CPUs with  no
          load.  In other words all entries are zero except for IDLE.

   --dskfilt [^]perl-regx[,perl-regx...]
          NOTE - this does NOT affect data collection and  ALL  disk  data
          will always be collected, unless --rawdskfilt is specified  too.
          However, only data for disk names that match the pattern(s) will
          be included in the summary totals and displayed when details are
          requested.   Alternatively,  if you preface the first expression
          with a caret, all names that match all strings will be  excluded
          from   the  summary  totals  and  detail  displays  rather  than
          included.  If you don't know perl, a partial string will usually
          work too.
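
          The include/exclude behavior described above can be sketched in
          Python as shown below.  This is an illustration only: it uses
          Python's re module rather than Perl, the function name
          filter_names is hypothetical, and it assumes a name is selected
          when it matches ANY of the comma separated patterns, which the
          text above does not spell out.

```python
import re


def filter_names(names, filt):
    """Keep (or, with a leading caret, drop) names matching any pattern."""
    exclude = filt.startswith("^")          # caret on the first expression
    if exclude:
        filt = filt[1:]
    patterns = [re.compile(p) for p in filt.split(",") if p]

    def matches(name):
        # search() allows a partial string to match, as noted above
        return any(p.search(name) for p in patterns)

    return [n for n in names if matches(n) != exclude]


disks = ["sda", "sdb", "dm-0", "nvme0n1"]
print(filter_names(disks, "sd"))       # prints: ['sda', 'sdb']
print(filter_names(disks, "^sd,dm"))   # prints: ['nvme0n1']
```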

   --dskopts
          f  -  report  some  columns  as  fractions for more precision on
          detail output
          i - display the i/o sizes in brief mode just like with --iosize
          o - exclude unused disks from new file headers and plot data
          z - only applies to disk details, do not report any  lines  with
          values of all zeros.

   --dskremap aaa:bbb,ccc:ddd...
          This  will  cause disk names matching the perl pattern aaa to be
          replaced with the string bbb.  In some  cases,  you  may  simply
          want to remove the entire string in which case the second string
          should be left empty.  If you want to remove a string containing
          a /, be sure to escape it with a backslash.
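
          A minimal Python sketch of the aaa:bbb remapping idea follows.
          It is not collectl's implementation: the function names are
          hypothetical, Python's re module stands in for Perl patterns
          (so a / needs no escaping here), and it assumes each rule
          replaces only the first match and that rules are applied in
          the order given.

```python
import re


def parse_remap(spec):
    """Split 'aaa:bbb,ccc:ddd' into (compiled pattern, replacement) pairs."""
    rules = []
    for pair in spec.split(","):
        pattern, _, replacement = pair.partition(":")
        rules.append((re.compile(pattern), replacement))
    return rules


def remap_name(name, rules):
    """Apply each rule in order, replacing the first match only (assumed)."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name, count=1)
    return name


rules = parse_remap("cciss/:,^dm-:mapper")
print(remap_name("cciss/c0d0", rules))  # prints: c0d0
print(remap_name("dm-3", rules))        # prints: mapper3
```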

   --envopts Environmental Options
          The  default is to display ALL data but the following will cause
          a subset to be displayed

          f - display fan data
          p - display current (power) data
          t - display temperature data
          C - convert temperature to Celsius if in Fahrenheit
          F - convert temperature to Fahrenheit if in Celsius
          M - display each type of data on separate line
          T  -  display   data   truncated   to   whole   integers   (some
          implementations display them with fractional components)
          9 - any number, will tell ipmitool to read on this device number

   --envfilt regx
          If specified, this regx is evaluated against each line  of  data
          returned  by  ipmitool  and  only those that match are retained.
          All other data is lost.

   --envremap perl-regx,...
          If   specified  as  a  comma  separated  list  of  perl  regular
          substitution  expressions  without   the   =~s   portion,   each
          expression  is applied to each environmental field name, thereby
          allowing one to rename the column headers.   This  can  be  most
          useful  when  running  on  heterogeneous  systems  and  you want
          consistent column names.

   --intfilt [^]perl-regx[,perl-regx...]
          NOTE - this does NOT affect data collection,  ALL interrupt data
          will  always  be  collected.   However, only data for interrupts
          that match the pattern(s) will be included in the summary totals
          and displayed when details are requested.  Alternatively, if you
          preface the first expression with a caret, all names that  match
          all  strings will be excluded from the summary totals and detail
          displays rather than  included.   If  you  don't  know  perl,  a
          partial string will usually work too.

          NOTE - these expressions are applied to the entire line one sees
          in /proc/interrupts, including the interrupt  number,  name  and
          even  counters  so if you do want to include an interrupt number
          in the pattern be sure to include the trailing colon as well.

   --lustopts Lustre Options
          B - For clients and servers, show buffer stats
          D - For MDSs and OSTs AND running  earlier  versions  of  HPSFS,
          collect disk block iostats
          M - For clients, collect metadata
          O - For OSTs, show detail level stats
          R - For client, collect readahead stats

   --memopts Memory Options
          R - show memory values (including swap space) as rates of change
          as opposed to absolute  values.   One  can  also  show  absolute
          changes between intervals by including -on.

   --netfilt [^]perl-regx[,perl-regx...]
          NOTE - this does NOT affect data collection and ALL network data
          will always be collected, unless --rawnetfilt is specified  too.
          Also note that by default only eth, ib, em and p1p networks when
          present are included  in  the  summary.   When  this  switch  is
          specified, only data for network names that match the pattern(s)
          will be included in the summary and displayed when  details  are
          requested.   This switch therefore also gives you the ability to
          add other, possibly new, network devices to the summary totals.

          Alternatively, if you preface the first expression with a caret,
          all  names  that  match  all  strings  will be excluded from the
          summary totals and detail displays rather than included.  If you
          don't know perl, a partial string will usually work too.

   --netopts
          e  -  include  network  error counts in brief and explicit error
          types elsewhere
          E - only include lines with network errors in them
          i - include i/o sizes in brief mode
          o - exclude unused networks from new file headers and plot data
          w - set width of network device name

   --nfsfilt NFS Filters
          Specify one or more comma separated filters as a C/S followed by
          an nfs version number and only those will have data reported on.
          For example, C2 says to report data on V2 Clients.   As  a  data
          collection  performance  optimization,  if  one  or  more client
          filters are specified, data will actually be collected  for  all
          clients as is also done for servers.
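
          The filter format described above (a C or S followed by an NFS
          version number, comma separated) could be parsed roughly like
          this.  The Python function below is a hypothetical sketch for
          illustration, not part of collectl.

```python
def parse_nfs_filters(spec):
    """Turn a string such as 'C2,S3' into (role, version) tuples."""
    roles = {"C": "client", "S": "server"}
    parsed = []
    for item in spec.split(","):
        role, version = item[:1], item[1:]
        if role not in roles or not version.isdigit():
            raise ValueError("bad NFS filter: %r" % item)
        parsed.append((roles[role], int(version)))
    return parsed


print(parse_nfs_filters("C2,S3"))  # prints: [('client', 2), ('server', 3)]
```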

   --nfsopts NFS Options
          z - only display detail lines which have data

   --procfilt Process Filters
          These  filters  restrict  which  processes  are   selected   for
          collection/display.  Using this filter will significantly reduce
          the load on process data collection  since  collectl  creates  a
          blacklist  of  those  existing  processes  that  do not pass the
          filter  and  so  are  permanently  excluded  from   any   future
          processing.

          The format of a filter is a one-character type  followed  by  a
          match string.  Multiple filters may be specified  if  separated
          by commas.

          c  -  substring of the command being executed as explicitly read
          from /proc/pid/stat.  Note that this  can  actually  be  a  perl
          expression,  so  if you want a command that ends in a particular
          string all you need to do is append a \$  to  the  end  of  the
          string.  Otherwise it would match any commands containing  that
          string.
          C - any command that starts with the specified string
          f  - full path of the command, including arguments, as read from
          /proc/pid/cmdline.  Like the c modifier this too can be  a  perl
          expression.
          p - pid
          P - parent pid
          u  -  any  process  owned  by  this  user's  UID or in the range
          specified by uxxx-yyy
          U - any process owned by this username

          Caution: the process name collectl tries to match with c  and  C
          is the second field in /proc/pid/stat, which may not necessarily
          be what you think!  E.g., the name for  X  emacs  is  actually
          emacs-x

   --procopts options
          These options control the way data is  displayed  and  can  also
          improve data collection  performance

          c - include CPU time of children who have exited (same as ps -S)
          f  -  use  cumulative  totals  for  page  faults in process data
          instead of rates
          i - show process I/O counters  in  display  instead  of  default
          format
          I - disable collection of I/O counters, see note below
          k  -  remove known shells from process names, making it possible
          to see actual command
          m - show breakdown of  memory  utilization  instead  of  default
          format
          p - never look for new pids or threads during data collection
          r  -  show  root  command  name only (no directory) for narrower
          display. Note that this is applied AFTER 'k' so if arg1  becomes
          the  new  command  it will be truncated now, which is very handy
          when running in a virtual python environment
          R - show ALL process priorities  ('RT'  currently  displayed  if
          realtime)
          s - show process start time in hh:mm:ss format
          S - show process start time in mmmdd-hh:mm:ss format
          t - include ALL process threads (increases collection overhead)
          u  -  report  username as 12 chars instead of 8, noting uxx will
          cause column width to be xx but cannot be less than 8
          w - widen display  by  including  whole  argument  string,  with
          optional max width
          x  -  include  extended  process  attributes (currently only for
          context switches)
          z - exclude any processes with 0 in sort field (in --top mode)

          Process data is the  most  expensive  type  of  data  collected,
          costing  as  much  as 3 times the CPU load as all other types of
          data combined.  Collecting thread  data  makes  this  even  more
          expensive.   One  can  significantly reduce this load by over 25
          percent by disabling the collection of I/O stats.  However, keep
          in  mind  that  even  if  you don't try to optimize process data
          collection, the overall system load by collectl can still be  on
          the  order  of  about 0.2% when running as a daemon with default
          collection rates.  See the  online  documentation  on  measuring
          performance for more information.

          A  security  hole  was  identified  that  allowed non-privileged
          users to read /proc/pid/io and guess password  lengths  and  now
          many distros restrict access to the owner or root.  As a result,
          non-privileged users will see all 0  I/O  counts  for  processes
          that are not theirs when specifying --procopts i.

   --slabfilt Slab Filters
          One  can  specify  a  list of slab names separated by commas and
          only those slabs whose names start with those  strings  will  be
          listed or summarized.

   --slabopts Slab Options
          s - exclude any slabs with an allocation of 0
          S  -  only show those slabs whose allocations changed since last
          display

   --tcpfilt
          These filters actually control both what is collected as well as
          displayed.   If  one  selects  non-collected filters, 0s will be
          reported.  There is one special case and that is if one includes
          T  (tcp extended stats) in the filter string, there are no brief
          ones and therefore --verbose will be forced.
          i - ip stats
          t - tcp stats
          u - udp stats
          c - icmp stats
          I - ip extended stats
          T - tcp extended stats

   --xopts
          i - include i/o sizes in brief mode

DESCRIPTION

   The collectl utility is  a  system  monitoring  tool  that  records  or
   displays  specific  operating  system  data  for  one  or  more sets of
   subsystems. Any set of the subsystems, such as CPU,  Disks,  Memory  or
   Sockets  can be included in or excluded from data collection.  Data can
   either be displayed back  to  the  terminal,  or  stored  in  either  a
   compressed  or  uncompressed  data  file. The data files themselves can
   either be in raw format (essentially a direct copy from the  associated
   /proc structures) or in a space separated plottable format such that it
   can be easily plotted using tools such as gnuplot or excel.  Data files
   can  be  read  and manipulated from the command line, or through use of
   command scripts.

   Upon startup, collectl.conf is read, which sets  a  number  of  default
   parameters and switch values.  Collectl searches for this file first in
   /etc, then in the directory the collectl executable lives in (typically
   /usr/sbin)  and  finally the current directory.  These locations can be
   overridden with the -C switch.  Unless you're  doing  something  really
   special,  this  file  need never be touched; the only exception perhaps
   being when choosing to run collectl as a service and you wish to change
   its default behavior which is set by the DaemonCommand entry.

RESTRICTIONS/PROBLEMS

   Thread reporting currently only works with 2.6 kernels.

   The  pagesize  has been hardcoded for perl 5.6 systems to 4096 for IA32
   and 16384 for all others.  If you are running 5.6 on a  system  with  a
   different  pagesize  you  will  see incorrect SLAB allocation sizes and
   will need to scale the numbers you're seeing accordingly.

   I have recently discovered there is a bug in /proc  in  that  an  extra
   line  is  occasionally  read with the end of the previous buffer!  When
   this occurs a message is written (if -m enabled) and always written  to
   the  terminal.  Since this happens with a higher frequency with process
   data I silently ignore those as the output can get  pretty  noisy.   If
   for any reason this is a problem, be sure to let me know.

   Since  collectl  has  no  control over the frequency at which data gets
   written to /proc, one can get anomalous statistics as collectl is  only
   reporting  a  snapshot of what is being recorded.  For more information
   see http://collectl.sourceforge.net/TheMath.html.

   At least one network  card  occasionally  generates  erroneous  network
   stats  and  to  try to keep the data rational, collectl tries to detect
   this and when it does generates a message  that  bogus  data  has  been
   detected.

FILES, EXAMPLES AND MORE INFORMATION

   http://collectl.sourceforge.net OR /opt/hp/collectl/docs

ACKNOWLEDGEMENTS

   I  would  like  to  thank  Rob Urban for his creation of the Tru64 Unix
   collect tool, which collectl is based on.

AUTHOR

   This program was written by Mark Seger ([email protected]).
   Copyright 2003-2015 Hewlett-Packard Development Company, LP
   collectl may be copied only under the  terms  of  either  the  Artistic
   License  or  the  GNU General Public License, which may be found in the
   source kit


