slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file
slurmdbd.conf is an ASCII file which describes Slurm Database Daemon
(SlurmDBD) configuration information. The file location can be
modified at system build time using the DEFAULT_SLURM_CONF parameter or
at execution time by setting the SLURM_CONF environment variable.
The contents of the file are case insensitive except for the names of
nodes and files. Any text following a "#" in the configuration file is
treated as a comment through the end of that line. Changes to the
configuration file take effect upon restart of SlurmDBD or when the daemon
receives the SIGHUP signal, unless otherwise noted.
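The comment and case rules above can be illustrated with a minimal Python sketch. It is not Slurm's own parser: it assumes one Key=Value pair per line, as in the sample file at the end of this page, and the function name parse_slurmdbd_conf is hypothetical.

```python
def parse_slurmdbd_conf(text):
    """Parse Key=Value lines into a dict with lower-cased keys."""
    settings = {}
    for raw_line in text.splitlines():
        # Everything from "#" to the end of the line is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keys are case insensitive; values are kept verbatim because
        # the names of nodes and files are case sensitive.
        settings[key.strip().lower()] = value.strip()
    return settings


sample = """
# Sample fragment
AuthType=auth/munge
DBDHOST=db_host   # trailing comment
LogFile=/var/log/slurmdbd.log
"""
conf = parse_slurmdbd_conf(sample)
print(conf["dbdhost"])  # prints: db_host
```

Lower-casing only the key is a deliberate choice in this sketch, since the text above exempts node and file names from case folding.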
This file should exist only on the computer where SlurmDBD executes and
should be readable only by the user that executes SlurmDBD (e.g.
"slurm"). If the slurmdbd daemon is started as user root and changes
to another user ID, the configuration file will initially be read as
user root, but will be read as the other user ID in response to a
SIGHUP signal. This file should be protected from unauthorized access
since it contains a database password. The overall configuration
parameters available include:
ArchiveDir
If ArchiveScript is not set, slurmdbd will generate a file
that can be read back in at any time with sacctmgr load filename. This
directory is where the file will be placed after a purge event
has happened and archiving for that element is set to true.
Default is /tmp. The format for this file's name is
$ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp
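As a rough illustration of the naming pattern above, the following Python sketch assembles an archive file name from its parts. The helper name archive_file_name, the cluster name "tux", the object name "job" and the timestamp strings are all hypothetical placeholders; this page does not state how slurmdbd renders the timestamps.

```python
def archive_file_name(archive_dir, cluster, archive_object, begin, end):
    """Compose a name following the pattern documented for ArchiveDir."""
    directory = archive_dir.rstrip("/")
    return f"{directory}/{cluster}_{archive_object}_archive_{begin}_{end}"


# Placeholder values only; the real timestamp layout is an assumption.
name = archive_file_name("/tmp", "tux", "job", "BEGIN", "END")
print(name)  # prints: /tmp/tux_job_archive_BEGIN_END
```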
ArchiveEvents
When purging events also archive them. Boolean, yes to archive
event data, no otherwise. Default is no.
ArchiveJobs
When purging jobs also archive them. Boolean, yes to archive
job data, no otherwise. Default is no.
ArchiveResvs
When purging reservations also archive them. Boolean, yes to
archive reservation data, no otherwise. Default is no.
ArchiveScript
This script can be executed every time a rollup happens (every
hour, day and month), depending on the Purge*After options.
It is used to transfer accounting records out of the
database into an archive, in place of the internal
process used to archive objects. The script is executed with
no arguments. The following environment variables are set:
SLURM_ARCHIVE_EVENTS
1 for archive events 0 otherwise.
SLURM_ARCHIVE_LAST_EVENT
Time of last event start to archive.
SLURM_ARCHIVE_JOBS
1 for archive jobs 0 otherwise.
SLURM_ARCHIVE_LAST_JOB
Time of last job submit to archive.
SLURM_ARCHIVE_STEPS
1 for archive steps 0 otherwise.
SLURM_ARCHIVE_LAST_STEP
Time of last step start to archive.
SLURM_ARCHIVE_SUSPEND
1 for archive suspend data 0 otherwise.
SLURM_ARCHIVE_LAST_SUSPEND
Time of last suspend start to archive.
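A minimal sketch of an ArchiveScript written in Python follows, assuming only the environment variables listed above. The function name archive_requests is hypothetical, and because this page does not state the format of the SLURM_ARCHIVE_LAST_* values they are passed through as opaque strings. A real script would replace the print call with whatever export mechanism the site uses.

```python
import os

# Record type -> (flag variable, "last time" variable), as listed above.
RECORD_TYPES = {
    "events": ("SLURM_ARCHIVE_EVENTS", "SLURM_ARCHIVE_LAST_EVENT"),
    "jobs": ("SLURM_ARCHIVE_JOBS", "SLURM_ARCHIVE_LAST_JOB"),
    "steps": ("SLURM_ARCHIVE_STEPS", "SLURM_ARCHIVE_LAST_STEP"),
    "suspend": ("SLURM_ARCHIVE_SUSPEND", "SLURM_ARCHIVE_LAST_SUSPEND"),
}


def archive_requests(env):
    """Return {record_type: last_time} for every type flagged with "1"."""
    requests = {}
    for record_type, (flag_var, last_var) in RECORD_TYPES.items():
        if env.get(flag_var) == "1":
            # The time value is kept as an opaque string (format assumed unknown).
            requests[record_type] = env.get(last_var)
    return requests


if __name__ == "__main__":
    for record_type, last_time in archive_requests(os.environ).items():
        print(f"archive {record_type} up to {last_time}")
```

Taking the environment as a parameter, rather than reading os.environ inside the function, keeps the logic easy to exercise outside slurmdbd.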
ArchiveSteps
When purging steps also archive them. Boolean,
yes to archive step data, no otherwise. Default
is no.
ArchiveSuspend
When purging suspend data also archive it.
Boolean, yes to archive suspend data, no
otherwise. Default is no.
AuthInfo
Additional information to be used for
authentication of communications with the Slurm
control daemon (slurmctld) on each cluster. The
interpretation of this option is specific to the
configured AuthType. In the case of auth/munge,
this can be configured to use a Munge daemon
specifically configured to provide authentication
between clusters while the default Munge daemon
provides authentication within a cluster. In that
case, this will specify the pathname of the socket
to use. By default this value is left
unspecified, which results in the default
authentication mechanism being used.
AuthType
Define the authentication method for
communications between Slurm components.
Acceptable values at present include "auth/none"
and "auth/munge". The default value is
"auth/none", which means the UID included in
communication messages is not verified. This may
be fine for testing purposes, but do not use
"auth/none" if you desire any security.
"auth/munge" indicates that LLNL's Munge system is
to be used (this is the best supported
authentication mechanism for Slurm, see
"https://code.google.com/p/munge/" for more
information). SlurmDBD must be terminated prior
to changing the value of AuthType and later
restarted.
CommitDelay
How many seconds between commits on a connection
from a slurmctld. This speeds up inserts into the
database dramatically. If you are running a very
high throughput of jobs you should consider
setting this. In testing, 1 second improves the
slurmdbd performance dramatically and reduces
overhead. There is a small probability of data
loss, though, since this creates a window in which
uncommitted data could be lost if the slurmdbd seg
faults or exits abnormally for any reason. This
situation should be very rare, but setting
CommitDelay may be the only way to run in
extremely heavy environments.
DbdBackupHost
The short, or long, name of the machine where the
backup Slurm Database Daemon is executed (i.e. the
name returned by the command "hostname -s"). This
host must have access to the same underlying
database specified by the 'Storage' options
mentioned below.
DbdAddr
Name that DbdHost should be referred to in
establishing a communications path. This name will
be used as an argument to the gethostbyname()
function for identification. For example,
"elx0000" might be used to designate the Ethernet
address for node "lx0000". By default the DbdAddr
will be identical in value to DbdHost.
DbdHost
The short, or long, name of the machine where the
Slurm Database Daemon is executed (i.e. the name
returned by the command "hostname -s"). This
value must be specified.
DbdPort
The port number that the Slurm Database Daemon
(slurmdbd) listens to for work. The default value
is SLURMDBD_PORT as established at system build
time. If none is explicitly specified, it will be
set to 6819. This value must be equal to the
AccountingStoragePort parameter in the slurm.conf
file.
DebugFlags
Defines specific subsystems which should provide
more detailed event logging. Multiple subsystems
can be specified with comma separators. Most
DebugFlags will result in verbose logging for the
identified subsystems and could impact
performance. Valid subsystems available today
(with more to come) include:
DB_ARCHIVE SQL statements/queries when
dealing with archiving and
purging the database.
DB_ASSOC SQL statements/queries when
dealing with associations in the
database.
DB_EVENT SQL statements/queries when
dealing with (node) events in the
database.
DB_JOB SQL statements/queries when
dealing with jobs in the
database.
DB_QOS SQL statements/queries when
dealing with QOS in the database.
DB_QUERY SQL statements/queries when
dealing with transactions and
such in the database.
DB_RESERVATION SQL statements/queries when
dealing with reservations in the
database.
DB_RESOURCE SQL statements/queries when
dealing with resources like
licenses in the database.
DB_STEP SQL statements/queries when
dealing with steps in the
database.
DB_USAGE SQL statements/queries when
dealing with usage queries and
inserts in the database.
DB_WCKEY SQL statements/queries when
dealing with wckeys in the
database.
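The comma-separated form of a DebugFlags value can be checked against the subsystems listed above with a short Python sketch. The helper name split_debug_flags is hypothetical, and upper-casing the input is an assumption based on the statement that the file contents are case insensitive.

```python
VALID_DEBUG_FLAGS = {
    "DB_ARCHIVE", "DB_ASSOC", "DB_EVENT", "DB_JOB", "DB_QOS", "DB_QUERY",
    "DB_RESERVATION", "DB_RESOURCE", "DB_STEP", "DB_USAGE", "DB_WCKEY",
}


def split_debug_flags(value):
    """Split a DebugFlags value and reject names not listed on this page."""
    flags = [part.strip().upper() for part in value.split(",") if part.strip()]
    unknown = [flag for flag in flags if flag not in VALID_DEBUG_FLAGS]
    if unknown:
        raise ValueError(f"unknown DebugFlags: {', '.join(unknown)}")
    return flags


print(split_debug_flags("DB_JOB,db_step"))  # prints: ['DB_JOB', 'DB_STEP']
```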
DebugLevel
The level of detail to provide in the Slurm Database
Daemon's logs. The default value is info.
quiet Log nothing
fatal Log only fatal errors
error Log only errors
info Log errors and general informational
messages
verbose Log errors and verbose informational
messages
debug Log errors and verbose informational
messages and debugging messages
debug2 Log errors and verbose informational
messages and more debugging messages
debug3 Log errors and verbose informational
messages and even more debugging
messages
debug4 Log errors and verbose informational
messages and even more debugging
messages
debug5 Log errors and verbose informational
messages and even more debugging
messages
DefaultQOS
When adding a new cluster, this will be used as the
QOS for the cluster unless one is explicitly
set by the administrator when the cluster is created.
LogFile
Fully qualified pathname of a file into which the
Slurm Database Daemon's logs are written. The
default value is none (performs logging via
syslog).
See the section LOGGING in the slurm.conf man page
if a pathname is specified.
LogTimeFormat
Format of the timestamp in slurmdbd log files.
Accepted values are "iso8601", "iso8601_ms",
"rfc5424", "rfc5424_ms", "clock", "short" and "thread_id". The
values ending in "_ms" differ from the ones
without in that fractional seconds with
millisecond precision are printed. The default
value is "iso8601_ms". The "rfc5424" formats are
the same as the "iso8601" formats except that the
timezone value is also shown. The "clock" format
shows a timestamp in microseconds retrieved with
the C standard clock() function. The "short"
format is a short date and time format. The
"thread_id" format shows the timestamp in the C
standard ctime() function form without the year
but including the microseconds, the daemon's
process ID and the current thread ID.
MessageTimeout
Time permitted for a round-trip communication to
complete in seconds. Default value is 10 seconds.
PidFile
Fully qualified pathname of a file into which the
Slurm Database Daemon may write its process ID.
This may be used for automated signal processing.
The default value is "/var/run/slurmdbd.pid".
PluginDir
Identifies the places in which to look for Slurm
plugins. This is a colon-separated list of
directories, like the PATH environment variable.
The default value is "/usr/local/lib/slurm".
PrivateData
This controls what type of information is hidden
from regular users. By default, all information
is visible to all users. User SlurmUser, root,
and users with AdminLevel=Admin can always view
all information. Multiple values may be specified
with a comma separator. Acceptable values
include:
accounts
prevents users from viewing any account
definitions unless they are coordinators of
them.
jobs prevents users from viewing job records
belonging to other users unless they are
coordinators of the association running the
job when using sacct.
reservations
restricts getting reservation information
to users with operator status and above.
usage prevents users from viewing usage of any
other user. This applies to sreport.
users prevents users from viewing information of
any user other than themselves. This also
means users can only see associations
they deal with. Coordinators can see
associations of all users they are
coordinator of, but can only see themselves
when listing users.
PurgeEventAfter
Events happening on the cluster over this age are
purged from the database. This includes node down
times and such. The time is a numeric value and
is a number of months. To purge more
often, append "hours" or "days" to
the numeric value (e.g. a value of "12hours"
would purge everything older than 12 hours). The
purge takes place at the start of each purge interval.
For example, if the purge time is 2 months, the
purge would happen at the beginning of each month.
If not set (default), then event records are
never purged.
PurgeJobAfter
Individual job records over this age are purged
from the database. Aggregated information will be
preserved indefinitely. The time is a numeric
value and is a number of months. To purge more
often, append "hours" or "days" to the numeric
value (e.g. a value of "12hours" would
purge everything older than 12 hours). The purge
takes place at the start of each purge
interval. For example, if the purge time is 2
months, the purge would happen at the beginning of
each month. If not set (default), then job
records are never purged.
PurgeResvAfter
Individual reservation records over this age are
purged from the database. Aggregated information
will be preserved indefinitely. The time is a
numeric value and is a number of months. To purge
more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours"
would purge everything older than 12 hours). The
purge takes place at the start of each purge
interval. For example, if the purge time is 2
months, the purge would happen at the beginning of
each month. If not set (default), then
reservation records are never purged.
PurgeStepAfter
Individual job step records over this age are
purged from the database. Aggregated information
will be preserved indefinitely. The time is a
numeric value and is a number of months. To purge
more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours"
would purge everything older than 12 hours). The
purge takes place at the start of each purge
interval. For example, if the purge time is 2
months, the purge would happen at the beginning of
each month. If not set (default), then job step
records are never purged.
PurgeSuspendAfter
Records of individual suspend times for jobs over
this age are purged from the database. Aggregated
information will be preserved indefinitely. The
time is a numeric value and is a number of months.
To purge more often, append "hours" or "days" to
the numeric value (e.g. a value of
"12hours" would purge everything older than 12
hours). The purge takes place at the start of
each purge interval. For example, if the purge
time is 2 months, the purge would happen at the
beginning of each month. If not set (default),
then suspend records are never purged.
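The time syntax shared by the Purge*After options can be sketched in Python as follows. This is an illustration of the rules described above, not slurmdbd's own parsing: the function name parse_purge_age is hypothetical, the "month" spelling is taken from the sample file at the end of this page, and acceptance of singular and plural spellings alike is an assumption.

```python
import re

# A number optionally followed by a unit; a bare number means months.
_PURGE_RE = re.compile(r"^(\d+)(hours?|days?|months?)?$", re.IGNORECASE)


def parse_purge_age(value):
    """Return (count, unit) with unit one of "hours", "days", "months"."""
    match = _PURGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised purge age: {value!r}")
    count = int(match.group(1))
    # Normalise "hour"/"hours" and so on to a single plural spelling.
    unit = (match.group(2) or "months").lower().rstrip("s") + "s"
    return count, unit


print(parse_purge_age("12hours"))  # prints: (12, 'hours')
print(parse_purge_age("2"))        # prints: (2, 'months')
```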
SlurmUser
The name of the user that the slurmctld daemon
executes as. This user must exist on the machine
executing the Slurm Database Daemon and have the
same user ID there as on the hosts on which slurmctld
executes. For security purposes, a user other than
"root" is recommended. The default value is
"root".
StorageHost
Define the name of the host on which the database
used to store the data is running.
Ideally this should be the host on which slurmdbd
executes.
StorageBackupHost
Define the name of the backup host on which the
database used to store the data is running.
This can be viewed as a backup solution when the
StorageHost is not responding. It is up to the
backup solution to enforce the coherency of the
accounting information between the two hosts. With
clustered database solutions (active/passive HA),
you would not need to use this feature. Default
is none.
StorageLoc
Specify the name of the database as the location
where accounting records are written.
StoragePass
Define the password used to gain access to the
database to store the job accounting data.
StoragePort
The port number that the Slurm Database Daemon
(slurmdbd) uses to communicate with the database.
StorageType
Define the accounting storage mechanism type.
Acceptable values at present include
"accounting_storage/mysql". The value
"accounting_storage/mysql" indicates that
accounting records should be written to a MySQL or
MariaDB database specified by the StorageLoc
parameter. This value must be specified.
StorageUser
Define the name of the user used to connect
to the database to store the job
accounting data.
TCPTimeout
Time permitted for TCP connection to be
established. Default value is 2 seconds.
TrackWCKey
Boolean yes or no. Used to enable display and tracking
of the Workload Characterization Key. Must be set
to track wckey usage and to
generate rolled up usage tables from WCKeys.
NOTE: If TrackWCKey is set here and not in your
various slurm.conf files, all jobs will be
attributed to their default WCKey.
TrackSlurmctldDown
Boolean yes or no. If set, the slurmdbd will mark
all idle resources on the cluster as down when a
slurmctld disconnects or is no longer reachable.
The default is no.
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=db_host
DebugLevel=4
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
LogFile=/var/log/slurmdbd.log
PidFile=/var/tmp/jette/slurmdbd.pid
SlurmUser=slurm_mgr
StoragePass=shazaam
StorageType=accounting_storage/mysql
StorageUser=database_mgr
Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). Copyright (C) 2010-2014 SchedMD LLC. This file is part of Slurm, a resource management program. For details, see <http://slurm.schedmd.com/>. Slurm is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
/etc/slurmdbd.conf
slurm.conf(5), slurmctld(8), slurmdbd(8), syslog(2)