9 long_desc: if blank, ceph assumes the short hostname (hostname -s)
20 desc: cluster fsid (uuid)
21 fmt_desc: The cluster ID. One per cluster.
22 May be generated by a deployment tool if not specified.
23 note: Do not set this value if you use a deployment tool that does
35 desc: public-facing address to bind to
36 fmt_desc: The IP address for the public (front-side) network.
49 desc: public-facing address to bind to
58 - name: public_bind_addr
65 fmt_desc: In some dynamic deployments the Ceph MON daemon might bind
66 to an IP address locally that is different from the ``public_addr``
67 advertised to other peers in the network. The environment must ensure
68 that routing rules are set correctly. If ``public_bind_addr`` is set
69 the Ceph Monitor daemon will bind to it locally and use ``public_addr``
70 in the monmaps to advertise its address to peers. This behavior is limited
71 to the Monitor daemon.
76 desc: cluster-facing address to bind to
77 fmt_desc: The IP address for the cluster (back-side) network.
86 - name: public_network
89 desc: Network(s) from which to choose a public address to bind to
90 fmt_desc: The IP address and netmask of the public (front-side) network
91 (e.g., ``192.168.0.0/24``). Set in ``[global]``. You may specify
92 comma-separated subnets. The format is
93 ``{ip-address}/{netmask} [, {ip-address}/{netmask}]``
104 - name: public_network_interface
107 desc: Interface name(s) from which to choose an address from a public_network to
108 bind to; public_network must also be specified.
120 - name: cluster_network
123 desc: Network(s) from which to choose a cluster address to bind to
124 fmt_desc: The IP address and netmask of the cluster (back-side) network
125 (e.g., ``10.0.0.0/24``). Set in ``[global]``. You may specify
126 comma-separated subnets. The format is
127 ``{ip-address}/{netmask} [, {ip-address}/{netmask}]``
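# Example (an illustrative sketch, not part of the option definitions; the
# subnets are placeholder values): in ceph.conf, the public and cluster
# networks could be set as
#   [global]
#   public_network = 192.168.0.0/24, 192.168.1.0/24
#   cluster_network = 10.0.0.0/24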
135 - name: cluster_network_interface
138 desc: Interface name(s) from which to choose an address from a cluster_network to
139 bind to; cluster_network must also be specified.
154 desc: path to MonMap file
155 long_desc: This option is normally used during mkfs, but can also be used to identify
156 which monitors to connect to.
165 desc: list of hosts or addresses to search for a monitor
166 long_desc: This is a list of IP addresses or hostnames that are separated by commas, whitespace, or semicolons. Hostnames are resolved via DNS. All A and AAAA records are included in the search list.
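# Example (illustrative; assumes this is the mon_host option, and the addresses
# and hostname are placeholders): the documented separators may be mixed, e.g.
#   mon_host = 10.0.0.1, 10.0.0.2 mon3.example.com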
172 - name: mon_host_override
175 desc: monitor(s) to use overriding the MonMap
176 fmt_desc: This is the list of monitors that the Ceph process **initially** contacts when first establishing communication with the Ceph cluster. This overrides the known monitor list that is derived from MonMap updates sent to older Ceph instances (like librados cluster handles). This option is expected to be useful primarily for debugging.
182 - name: mon_dns_srv_name
185 desc: name of DNS SRV record to check for monitor addresses
186 fmt_desc: The service name used when querying DNS for the monitor hosts/addresses
196 - name: container_image
199 desc: container image (used by cephadm orchestrator)
200 default: docker.io/ceph/daemon-base:latest-master-devel
203 - name: no_config_file
206 desc: signal that we don't require a config file to be present
207 long_desc: When specified, we won't be looking for a configuration file, and will
208 instead expect that whatever options or values are required for us to work will
209 be passed as arguments.
221 desc: enable lockdep lock dependency analyzer
229 - name: lockdep_force_backtrace
232 desc: always gather current backtrace at every lock
244 desc: path for the 'run' directory for storing pid and socket files
245 default: /var/run/ceph
256 desc: path for the runtime control socket file, used by the 'ceph daemon' command
257 fmt_desc: The socket for executing administrative commands on a daemon,
258 irrespective of whether Ceph Monitors have established a quorum.
259 daemon_default: $run_dir/$cluster-$name.asok
264 # default changed by common_preinit()
266 - name: admin_socket_mode
269 desc: file mode to set for the admin socket file, e.g., '0755'
280 desc: whether to daemonize (background) after startup
296 # default changed by common_preinit()
301 desc: uid or user name to switch to on startup
302 long_desc: This is normally specified by the systemd unit file.
318 desc: gid or group name to switch to on startup
319 long_desc: This is normally specified by the systemd unit file.
332 - name: setuser_match_path
335 desc: if set, setuser/setgroup is conditional on this path matching ownership
336 long_desc: If setuser or setgroup are specified, and this option is non-empty, then
337 the uid/gid of the daemon will only be changed if the file or directory specified
338 by this option has a matching uid and/or gid. This exists primarily to allow
339 switching to user ceph for OSDs to be conditional on whether the osd data contents
340 have also been chowned after an upgrade. This is normally specified by the systemd
358 desc: path to write a pid file (if any)
359 fmt_desc: The file in which the mon, osd or mds will write its
360 PID. For instance, ``/var/run/$cluster/$type.$id.pid``
361 will create /var/run/ceph/mon.a.pid for the ``mon`` with
362 id ``a`` running in the ``ceph`` cluster. The ``pid
363 file`` is removed when the daemon stops gracefully. If
364 the process is not daemonized (i.e. runs with the ``-f``
365 or ``-d`` option), the ``pid file`` is not created.
379 desc: path to chdir(2) to after daemonizing
380 fmt_desc: The directory Ceph daemons change to once they are
381 up and running. Default ``/`` directory recommended.
395 - name: fatal_signal_handlers
398 desc: whether to register signal handlers for SIGABRT etc that dump a stack trace
399 long_desc: This is normally true for daemons and false for libraries.
400 fmt_desc: If set, we will install signal handlers for SEGV, ABRT, BUS, ILL,
401 FPE, XCPU, XFSZ, SYS signals to generate a useful log message
416 desc: Directory where crash reports are archived
417 default: /var/lib/ceph/crash
421 - name: restapi_log_level
424 desc: default set by python code
426 - name: restapi_base_url
429 desc: default set by python code
431 - name: erasure_code_dir
434 desc: directory where erasure-code plugins can be found
435 default: @CEPH_INSTALL_FULL_PKGLIBDIR@/erasure-code
445 desc: path to log file
446 fmt_desc: The location of the logging file for your cluster.
447 daemon_default: /var/log/ceph/$cluster-$name.log
454 # default changed by common_preinit()
459 desc: max unwritten log entries to allow before waiting to flush to the log
460 fmt_desc: The maximum number of new log entries.
464 # default changed by common_preinit()
466 - name: log_max_recent
469 desc: recent log entries to keep in memory to dump in the event of a crash
470 long_desc: The purpose of this option is to log at a higher debug level only to
471 the in-memory buffer, and write out the detailed log messages only if there is
472 a crash. Only log entries below the lower log level will be written unconditionally
473 to the log. For example, debug_osd=1/5 will write everything <= 1 to the log
474 unconditionally but keep entries at levels 2-5 in memory. If there is a seg fault
475 or assertion failure, all entries will be dumped to the log.
477 daemon_default: 10000
478 # default changed by common_preinit()
483 desc: send log lines to a file
484 fmt_desc: Determines if logging messages should appear in a file.
489 - name: log_to_stderr
492 desc: send log lines to stderr
493 fmt_desc: Determines if logging messages should appear in ``stderr``.
495 daemon_default: false
497 - name: err_to_stderr
500 desc: send critical error log lines to stderr
501 fmt_desc: Determines if error messages should appear in ``stderr``.
505 - name: log_stderr_prefix
508 desc: String to prefix log messages with when sent to stderr
509 long_desc: This is useful in container environments when combined with mon_cluster_log_to_stderr. The
510 mon log prefixes each line with the channel name (e.g., 'default', 'audit'), while
511 log_stderr_prefix can be set to 'debug '.
513 - mon_cluster_log_to_stderr
514 - name: log_to_syslog
517 desc: send log lines to syslog facility
518 fmt_desc: Determines if logging messages should appear in ``syslog``.
521 - name: err_to_syslog
524 desc: send critical error log lines to syslog facility
525 fmt_desc: Determines if error messages should appear in ``syslog``.
528 - name: log_flush_on_exit
531 desc: set a process exit handler to ensure the log is flushed on exit
532 fmt_desc: Determines if Ceph should flush the log files after exit.
535 - name: log_stop_at_utilization
538 desc: stop writing to the log file when device utilization reaches this ratio
545 - name: log_to_graylog
548 desc: send log lines to remote graylog server
555 - name: err_to_graylog
558 desc: send critical error log lines to remote graylog server
565 - name: log_graylog_host
568 desc: address or hostname of graylog server to log to
575 - name: log_graylog_port
578 desc: port number for the remote graylog server
583 - name: log_to_journald
586 desc: send log lines to journald
590 - name: err_to_journald
593 desc: send critical error log lines to journald
597 - name: log_coarse_timestamps
600 desc: timestamp log entries from coarse system clock to improve performance
607 # options will take k/v pairs, or single-item that will be assumed as general
608 # default for all, regardless of channel.
609 # e.g., "info" would be taken as the same as "default=info"
610 # also, "default=daemon audit=local0" would mean
611 # "default all to 'daemon', override 'audit' with 'local0'"
612 - name: clog_to_monitors
615 desc: Make daemons send cluster log messages to monitors
616 fmt_desc: Determines if ``clog`` messages should be sent to monitors.
617 default: default=true
625 - name: clog_to_syslog
628 desc: Make daemons send cluster log messages to syslog
629 fmt_desc: Determines if ``clog`` messages should be sent to syslog.
639 - name: clog_to_syslog_level
642 desc: Syslog level for cluster log messages
654 - name: clog_to_syslog_facility
657 desc: Syslog facility for cluster log messages
658 default: default=daemon audit=local0
669 - name: clog_to_graylog
672 desc: Make daemons send cluster log to graylog
681 - name: clog_to_graylog_host
684 desc: Graylog host to send cluster log messages to
696 - name: clog_to_graylog_port
699 desc: Graylog port number for cluster log messages
711 - name: enable_experimental_unrecoverable_data_corrupting_features
714 desc: Enable named (or all with '*') experimental features that may be untested,
715 dangerous, and/or cause permanent data loss
722 desc: Base directory for dynamically loaded plugins
723 default: @CEPH_INSTALL_FULL_PKGLIBDIR@
729 - name: compressor_zlib_isal
732 desc: Use Intel ISA-L accelerated zlib implementation if available
735 # regular zlib compression level, not applicable to isa-l optimized version
736 - name: compressor_zlib_level
739 desc: Zlib compression level to use
742 # regular zlib compression winsize, not applicable to isa-l optimized version
743 - name: compressor_zlib_winsize
746 desc: Zlib compression winsize to use
751 # regular zstd compression level
752 - name: compressor_zstd_level
755 desc: Zstd compression level to use
758 - name: qat_compressor_enabled
761 desc: Enable Intel QAT acceleration support for compression if available
764 - name: plugin_crypto_accelerator
767 desc: Crypto accelerator library to use
770 - name: openssl_engine_opts
773 desc: Use engine for specific openssl algorithm
774 long_desc: 'Pass opts in this way: engine_id=engine1,dynamic_path=/some/path/engine1.so,default_algorithms=DIGESTS:engine_id=engine2,dynamic_path=/some/path/engine2.so,default_algorithms=CIPHERS,other_ctrl=other_value'
778 - name: mempool_debug
788 desc: enable transparent huge page (THP) support
789 long_desc: Ceph is known to suffer from memory fragmentation due to THP use. This
790 is indicated by RSS usage above configured memory targets. Enabling THP is currently
791 discouraged until selective use of THP by Ceph is implemented.
798 desc: Authentication key
799 long_desc: A CephX authentication key, base64 encoded. It normally looks something
800 like 'AQAtut9ZdMbNJBAAHz6yBAWyJyz2yYRyeMWDag=='.
801 fmt_desc: The key (i.e., the text string of the key itself). Not recommended.
812 desc: Path to a file containing a key
813 long_desc: The file should contain a CephX authentication key and optionally a trailing
814 newline, but nothing else.
815 fmt_desc: The path to a key file (i.e., a file containing only the key).
825 desc: Path to a keyring file.
826 long_desc: A keyring file is an INI-style formatted file where the section names
827 are client or daemon names (e.g., 'osd.0') and each section contains a 'key' property
828 with CephX authentication key as the value.
829 # please note, documents are generated without access to the CMake
830 # variables, so please update the document manually with a representative
831 # default value using the ":default:" option of ".. confval::" directive.
832 default: @keyring_paths@
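# Example keyring file (illustrative; the key is the sample value quoted in
# the key option's long_desc, not a real credential):
#   [osd.0]
#       key = AQAtut9ZdMbNJBAAHz6yBAWyJyz2yYRyeMWDag==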
840 - name: heartbeat_interval
843 desc: Frequency of internal heartbeat checks (seconds)
848 - name: heartbeat_file
851 desc: File to touch on successful internal heartbeat
852 long_desc: If set, this file will be touched every time an internal heartbeat check
859 - name: heartbeat_inject_failure
867 desc: Enable internal performance metrics
868 long_desc: If enabled, collect and expose internal health metrics
874 desc: Messenger implementation to use for network communication
875 fmt_desc: Transport type used by Async Messenger. Can be ``async+posix``,
876 ``async+dpdk`` or ``async+rdma``. Posix uses standard TCP/IP networking and is
877 default. Other transports may be experimental and support may be limited.
882 - name: ms_public_type
885 desc: Messenger implementation to use for the public network
886 long_desc: If not specified, use ms_type
892 - name: ms_cluster_type
895 desc: Messenger implementation to use for the internal cluster network
896 long_desc: If not specified, use ms_type
902 - name: ms_mon_cluster_mode
905 desc: Connection modes (crc, secure) for intra-mon connections in order of preference
906 fmt_desc: the connection mode (or permitted modes) to use between monitors.
909 - ms_mon_service_mode
916 - name: ms_mon_service_mode
919 desc: Allowed connection modes (crc, secure) for connections to mons
920 fmt_desc: a list of permitted modes for clients or
921 other Ceph daemons to use when connecting to monitors.
925 - ms_mon_cluster_mode
931 - name: ms_mon_client_mode
934 desc: Connection modes (crc, secure) for connections from clients to monitors in
936 fmt_desc: a list of connection modes, in order of
937 preference, for clients or non-monitor daemons to use when
938 connecting to monitors.
941 - ms_mon_service_mode
942 - ms_mon_cluster_mode
948 - name: ms_cluster_mode
951 desc: Connection modes (crc, secure) for intra-cluster connections in order of preference
952 fmt_desc: connection mode (or permitted modes) used
953 for intra-cluster communication between Ceph daemons. If multiple
954 modes are listed, the modes listed first are preferred.
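# Example (a sketch, assuming a ceph.conf-style setting): to prefer encrypted
# connections between daemons while still permitting crc, list secure first:
#   ms_cluster_mode = secure crc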
961 - name: ms_service_mode
964 desc: Allowed connection modes (crc, secure) for connections to daemons
965 fmt_desc: a list of permitted modes for clients to use
966 when connecting to the cluster.
973 - name: ms_client_mode
976 desc: Connection modes (crc, secure) for connections from clients in order of preference
977 fmt_desc: a list of connection modes, in order of
978 preference, for clients to use (or allow) when talking to a Ceph
986 - name: ms_osd_compress_mode
989 desc: Compression policy to use in Messenger for communicating with OSD
1000 - name: ms_osd_compress_min_size
1003 desc: Minimal message size eligible for on-wire compression
1008 - ms_osd_compress_mode
1011 - name: ms_osd_compression_algorithm
1014 desc: Compression algorithm to use in Messenger when communicating with OSD
1015 long_desc: Compression algorithm for connections with OSD in order of preference
1016 default: snappy zlib zstd lz4
1020 - ms_osd_compress_mode
1023 - name: ms_compress_secure
1026 desc: Allow compression when on-wire encryption is enabled
1027 long_desc: Combining encryption with compression reduces the level of security of
1028 messages between peers. If both encryption and compression are enabled, the
1029 compression setting is ignored and messages are not compressed.
1030 This behaviour can be overridden using this setting.
1033 - ms_osd_compress_mode
1036 - name: ms_learn_addr_from_peer
1039 desc: Learn address from what IP our first peer thinks we connect from
1040 long_desc: Use the IP address our first peer (usually a monitor) sees that we are
1041 connecting from. This is useful if a client is behind some sort of NAT and we
1042 want to see it identified by its local (not NATed) address.
1045 - name: ms_tcp_nodelay
1048 desc: Disable Nagle's algorithm and send queued network traffic immediately
1049 fmt_desc: Ceph enables ``ms_tcp_nodelay`` so that each request is sent
1050 immediately (no buffering). Disabling `Nagle's algorithm`_
1051 increases network traffic, which can introduce latency. If you
1052 experience large numbers of small packets, you may try
1053 disabling ``ms_tcp_nodelay``.
1056 - name: ms_tcp_rcvbuf
1059 desc: Size of TCP socket receive buffer
1060 fmt_desc: The size of the socket buffer on the receiving end of a network
1061 connection. Disabled by default.
1064 - name: ms_tcp_prefetch_max_size
1067 desc: Maximum amount of data to prefetch out of the socket receive buffer
1070 - name: ms_initial_backoff
1073 desc: Initial backoff after a network error is detected (seconds)
1074 fmt_desc: The initial time to wait before reconnecting on a fault.
1077 - name: ms_max_backoff
1080 desc: Maximum backoff after a network error before retrying (seconds)
1081 fmt_desc: The maximum time to wait before reconnecting on a fault.
1084 - ms_initial_backoff
1089 desc: Set and/or verify crc32c checksum on data payload sent over network
1092 - name: ms_crc_header
1095 desc: Set and/or verify crc32c checksum on header payload sent over network
1098 - name: ms_die_on_bad_msg
1101 desc: Induce a daemon crash/exit when a bad network message is received
1102 fmt_desc: Debug option; do not configure.
1105 - name: ms_die_on_unhandled_msg
1108 desc: Induce a daemon crash/exit when an unrecognized message is received
1111 - name: ms_die_on_old_message
1114 desc: Induce a daemon crash/exit when an old, undecodable message is received
1117 - name: ms_die_on_skipped_message
1120 desc: Induce a daemon crash/exit if sender skips a message sequence number
1123 - name: ms_die_on_bug
1126 desc: Induce a crash/exit on various bugs (for testing purposes)
1129 - name: ms_dispatch_throttle_bytes
1132 desc: Limit messages that are read off the network but still being processed
1133 fmt_desc: Throttles total size of messages waiting to be dispatched.
1136 - name: ms_bind_ipv4
1139 desc: Bind servers to IPv4 address(es)
1140 fmt_desc: Enables Ceph daemons to bind to IPv4 addresses.
1144 - name: ms_bind_ipv6
1147 desc: Bind servers to IPv6 address(es)
1148 fmt_desc: Enables Ceph daemons to bind to IPv6 addresses.
1153 - name: ms_bind_prefer_ipv4
1156 desc: Prefer IPv4 over IPv6 address(es)
1158 - name: ms_bind_msgr1
1161 desc: Bind servers to msgr1 (legacy) protocol address(es)
1165 - name: ms_bind_msgr2
1168 desc: Bind servers to msgr2 (nautilus+) protocol address(es)
1172 - name: ms_bind_port_min
1175 desc: Lowest port number to bind daemon(s) to
1176 fmt_desc: The minimum port number to which an OSD or MDS daemon will bind.
1179 - name: ms_bind_port_max
1182 desc: Highest port number to bind daemon(s) to
1183 fmt_desc: The maximum port number to which an OSD or MDS daemon will bind.
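# Example (illustrative; the port numbers are placeholders): restrict daemons
# to a fixed range so that a matching firewall rule can be written:
#   ms_bind_port_min = 6800
#   ms_bind_port_max = 7300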
1186 # FreeBSD does not use SO_REUSEADDR so allow for a bit more time per default
1187 - name: ms_bind_retry_count
1190 desc: Number of attempts to make while bind(2)ing to a port
1191 default: @ms_bind_retry_count@
1193 # FreeBSD does not use SO_REUSEADDR so allow for a bit more time per default
1194 - name: ms_bind_retry_delay
1197 desc: Delay between bind(2) attempts (seconds)
1198 default: @ms_bind_retry_delay@
1200 - name: ms_bind_before_connect
1203 desc: Call bind(2) on client sockets
1206 - name: ms_tcp_listen_backlog
1209 desc: Size of queue of incoming connections for accept(2)
1212 - name: ms_connection_ready_timeout
1215 desc: Time before we declare a not yet ready connection as dead (seconds)
1218 - name: ms_connection_idle_timeout
1221 desc: Time before an idle connection is closed (seconds)
1224 - name: ms_pq_max_tokens_per_priority
1229 - name: ms_pq_min_cost
1234 - name: ms_inject_socket_failures
1237 desc: Inject a socket failure every Nth socket operation
1238 fmt_desc: Debug option; do not configure.
1241 - name: ms_inject_delay_type
1244 desc: Entity type to inject delays for
1248 - name: ms_inject_delay_max
1251 desc: Max delay to inject
1254 - name: ms_inject_delay_probability
1259 - name: ms_inject_internal_delays
1262 desc: Inject various internal delays to induce races (seconds)
1265 - name: ms_blackhole_osd
1270 - name: ms_blackhole_mon
1275 - name: ms_blackhole_mds
1280 - name: ms_blackhole_mgr
1285 - name: ms_blackhole_client
1290 - name: ms_dump_on_send
1293 desc: Hexdump message to debug log on message send
1296 - name: ms_dump_corrupt_message_level
1299 desc: Log level at which to hexdump corrupt messages we receive
1302 # number of worker processing threads for async messenger created on init
1303 - name: ms_async_op_threads
1306 desc: Threadpool size for AsyncMessenger (ms_type=async)
1307 fmt_desc: Initial number of worker threads used by each Async Messenger instance.
1308 Should be at least equal to the highest number of replicas, but you can
1309 decrease it if you are low on CPU core count and/or you host a lot of
1310 OSDs on a single server.
1315 - name: ms_async_reap_threshold
1318 desc: number of deleted connections before we reap
1322 - name: ms_async_rdma_device_name
1326 - name: ms_async_rdma_enable_hugepage
1331 - name: ms_async_rdma_buffer_size
1336 - name: ms_async_rdma_send_buffers
1341 # size of the receive buffer pool, 0 is unlimited
1342 - name: ms_async_rdma_receive_buffers
1347 # max number of wr in srq
1348 - name: ms_async_rdma_receive_queue_len
1354 - name: ms_async_rdma_support_srq
1359 - name: ms_async_rdma_port_num
1364 - name: ms_async_rdma_polling_us
1369 - name: ms_async_rdma_gid_idx
1372 desc: use gid_idx to select GID for choosing RoCEv1 or RoCEv2
1375 # GID format: "fe80:0000:0000:0000:7efe:90ff:fe72:6efe", no zero folding
1376 - name: ms_async_rdma_local_gid
1380 # 0=RoCEv1, 1=RoCEv2, 2=RoCEv1.5
1381 - name: ms_async_rdma_roce_ver
1386 # in RoCE, this means PCP
1387 - name: ms_async_rdma_sl
1392 # in RoCE, this means DSCP
1393 - name: ms_async_rdma_dscp
1398 # when there are enough accept failures, indicating there are unrecoverable failures,
1399 # just do ceph_abort() . Here we make it configurable.
1400 - name: ms_max_accept_failures
1403 desc: The maximum number of consecutive failed accept() calls before the daemon
1404 is considered misconfigured and aborted.
1407 # rdma connection management
1408 - name: ms_async_rdma_cm
1413 - name: ms_async_rdma_type
1418 - name: ms_dpdk_port_id
1423 # it is modified in unit tests, so SAFE_OPTION is used to declare it
1424 - name: ms_dpdk_coremask
1429 - ms_async_op_threads
1431 - name: ms_dpdk_memory_channel
1436 - name: ms_dpdk_hugepages
1444 - name: ms_dpdk_devs_allowlist
1447 desc: PCIe addresses of the NICs that are allowed to be used
1448 long_desc: for a single NIC use ms_dpdk_devs_allowlist=-a 0000:7d:010 or --allow=0000:7d:010;
1449 for bonded NICs use ms_dpdk_devs_allowlist=--allow=0000:7d:01.0 --allow=0000:7d:02.6
1450 --vdev=net_bonding0,mode=2,slave=0000:7d:01.0,slave=0000:7d:02.6.
1451 - name: ms_dpdk_host_ipv4_addr
1455 - name: ms_dpdk_gateway_ipv4_addr
1459 - name: ms_dpdk_netmask_ipv4_addr
1468 - name: ms_dpdk_enable_tso
1472 - name: ms_dpdk_hw_flow_control
1477 # Weighting of a hardware network queue relative to a software queue (0=no work, 1=equal share)
1478 - name: ms_dpdk_hw_queue_weight
1483 - name: ms_dpdk_debug_allow_loopback
1488 - name: ms_dpdk_rx_buffer_count_per_core
1493 - name: inject_early_sigterm
1496 desc: send ourselves a SIGTERM early during startup
1499 # list of initial cluster mon ids; if specified, need majority to form initial quorum and create new cluster
1500 - name: mon_initial_members
1503 fmt_desc: The IDs of initial monitors in a cluster during startup. If
1504 specified, Ceph requires an odd number of monitors to form an
1505 initial quorum (e.g., 3).
1506 note: A *majority* of monitors in your cluster must be able to reach
1507 each other in order to establish a quorum. You can decrease the initial
1508 number of monitors to establish a quorum with this setting.
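# Example (illustrative; the monitor IDs are placeholders):
#   mon_initial_members = a, b, c
# With three initial members, two of them (a majority) must be up and able to
# reach each other before the initial quorum can form.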
1515 - name: mon_max_pg_per_osd
1518 desc: Max number of PGs per OSD the cluster will allow
1519 long_desc: If the number of PGs per OSD exceeds this, a health warning will be visible
1520 in `ceph status`. This is also used in automated PG management, as the threshold
1521 at which some pools' pg_num may be shrunk in order to enable increasing the pg_num
1530 - name: mon_osd_full_ratio
1533 desc: full ratio of OSDs to be set during initial creation of the cluster
1539 - name: mon_osd_backfillfull_ratio
1547 - name: mon_osd_nearfull_ratio
1550 desc: nearfull ratio for OSDs to be set during initial creation of cluster
1556 - name: mon_osd_initial_require_min_compat_client
1564 - name: mon_allow_pool_delete
1567 desc: allow pool deletions
1568 fmt_desc: Should monitors allow pools to be removed, regardless of what the pool flags say?
1573 - name: mon_fake_pool_delete
1576 desc: fake pool deletions by renaming the rados pool
1581 - name: mon_globalid_prealloc
1584 desc: number of globalid values to preallocate
1585 long_desc: This setting caps how many new clients can authenticate with the cluster
1586 before the monitors have to perform a write to preallocate more. Large values
1587 burn through the 64-bit ID space more quickly.
1588 fmt_desc: The number of global IDs to pre-allocate for clients and daemons in the cluster.
1593 - name: mon_osd_report_timeout
1596 desc: time before OSDs that do not report to the mons are marked down (seconds)
1597 fmt_desc: The grace period in seconds before declaring
1598 unresponsive Ceph OSD Daemons ``down``.
1603 - name: mon_warn_on_insecure_global_id_reclaim
1606 desc: issue AUTH_INSECURE_GLOBAL_ID_RECLAIM health warning if any connected
1607 clients are insecurely reclaiming global_id
1612 - mon_warn_on_insecure_global_id_reclaim_allowed
1613 - auth_allow_insecure_global_id_reclaim
1614 - auth_expose_insecure_global_id_reclaim
1615 - name: mon_warn_on_insecure_global_id_reclaim_allowed
1618 desc: issue AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED health warning if insecure
1619 global_id reclaim is allowed
1624 - mon_warn_on_insecure_global_id_reclaim
1625 - auth_allow_insecure_global_id_reclaim
1626 - auth_expose_insecure_global_id_reclaim
1627 - name: mon_warn_on_msgr2_not_enabled
1630 desc: issue MON_MSGR2_NOT_ENABLED health warning if monitors are all running Nautilus
1631 but not all binding to a msgr2 port
1637 - name: mon_warn_on_slow_ping_time
1640 desc: Override mon_warn_on_slow_ping_ratio with specified threshold in milliseconds
1641 fmt_desc: Override ``mon_warn_on_slow_ping_ratio`` with a specific value.
1642 Raise ``HEALTH_WARN`` if any heartbeat between OSDs exceeds
1643 ``mon_warn_on_slow_ping_time`` milliseconds. The default is 0 (disabled).
1649 - mon_warn_on_slow_ping_ratio
1650 - name: mon_warn_on_slow_ping_ratio
1653 desc: Issue a health warning if a heartbeat ping takes longer than this percentage of osd_heartbeat_grace
1654 fmt_desc: Raise ``HEALTH_WARN`` when any heartbeat between OSDs exceeds
1655 ``mon_warn_on_slow_ping_ratio`` of ``osd_heartbeat_grace``.
1661 - osd_heartbeat_grace
1662 - mon_warn_on_slow_ping_time
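# Worked example (the values are assumptions chosen for illustration): with
# osd_heartbeat_grace = 20 (seconds) and mon_warn_on_slow_ping_ratio = 0.05,
# the warning threshold is 20 s * 0.05 = 1 s (1000 ms). Setting
# mon_warn_on_slow_ping_time = 500 would instead warn at 500 ms, regardless
# of the ratio.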
1663 - name: mon_max_snap_prune_per_epoch
1666 desc: max number of pruned snaps we will process in a single OSDMap epoch
1670 - name: mon_min_osdmap_epochs
1673 desc: min number of OSDMaps to store
1674 fmt_desc: Minimum number of OSD map epochs to keep at all times.
1679 - name: mon_max_log_epochs
1682 desc: max number of past cluster log epochs to store
1683 fmt_desc: Maximum number of Log epochs the monitor should keep.
1688 - name: mon_max_mdsmap_epochs
1691 desc: max number of FSMaps/MDSMaps to store
1692 fmt_desc: The maximum number of mdsmap epochs to trim during a single proposal.
1697 - name: mon_max_mgrmap_epochs
1700 desc: max number of MgrMaps to store
1707 desc: max number of OSDs in a cluster
1708 fmt_desc: The maximum number of OSDs allowed in the cluster.
1713 - name: mon_probe_timeout
1716 desc: timeout for querying other mons during bootstrap pre-election phase (seconds)
1717 fmt_desc: Number of seconds the monitor will wait to find peers before bootstrapping.
1722 - name: mon_client_bytes
1725 desc: max bytes of outstanding client messages mon will read off the network
1726 fmt_desc: The amount of client message data allowed in memory (in bytes).
1731 - name: mon_warn_pg_not_scrubbed_ratio
1734 desc: Percentage of the scrub max interval past the scrub max interval to warn
1737 - osd_scrub_max_interval
1740 - name: mon_warn_pg_not_deep_scrubbed_ratio
1743 desc: Percentage of the deep scrub interval past the deep scrub interval to warn
1746 - osd_deep_scrub_interval
1749 - name: mon_scrub_interval
1752 desc: frequency for scrubbing mon database
1753 fmt_desc: How often the monitor scrubs its store by comparing
1754 the stored checksums with the computed ones for all stored
1755 keys. (0 disables it. dangerous, use with care)
1759 - name: mon_scrub_timeout
1762 desc: timeout to restart scrub if mon quorum participant does not respond for the
1768 - name: mon_scrub_max_keys
1771 desc: max keys per scrub chunk/step
1772 fmt_desc: The maximum number of keys to scrub each time.
1777 # probability of injected crc mismatch [0.0, 1.0]
1778 - name: mon_scrub_inject_crc_mismatch
1781 desc: probability for injecting crc mismatches into mon scrub
1786 # probability of injected missing keys [0.0, 1.0]
1787 - name: mon_scrub_inject_missing_keys
1790 desc: probability for injecting missing keys into mon scrub
1795 - name: mon_config_key_max_entry_size
1798 desc: Defines the number of bytes allowed to be held in a single config-key entry
1799 fmt_desc: The maximum size of config-key entry (in bytes)
1804 - name: mon_sync_timeout
1807 desc: timeout before canceling sync if syncing mon does not respond
1808 fmt_desc: Number of seconds the monitor will wait for the next update
1809 message from its sync provider before it gives up and bootstrap
1815 - name: mon_sync_max_payload_size
1818 desc: target max message payload for mon sync
1819 fmt_desc: The maximum size for a sync payload (in bytes).
1824 - name: mon_sync_max_payload_keys
1827 desc: target max keys in message payload for mon sync
1832 - name: mon_sync_debug
1835 desc: enable extra debugging during mon sync
1840 - name: mon_inject_sync_get_chunk_delay
1843 desc: inject delay during sync (seconds)
1848 - name: mon_osd_min_down_reporters
1851 desc: number of OSDs from different subtrees who need to report a down OSD for it
1853 fmt_desc: The minimum number of Ceph OSD Daemons required to report a
1854 ``down`` Ceph OSD Daemon.
1859 - mon_osd_reporter_subtree_level
1860 - name: mon_osd_reporter_subtree_level
1863 desc: in which level of parent bucket the reporters are counted
1864 fmt_desc: In which level of parent bucket the reporters are counted. The OSDs
1865 send failure reports to monitors if they find a peer that is not responsive.
1866 Monitors mark the reported ``OSD`` ``down`` and then ``out`` after a grace period.
1872 - name: mon_osd_snap_trim_queue_warn_on
1875 desc: Warn when snap trim queue is that large (or larger).
1876 long_desc: Warn when snap trim queue length for at least one PG crosses this value,
1877 as this is an indicator of the snap trimmer not keeping up, wasting disk space
1882 # force mon to trim maps to this point, regardless of min_last_epoch_clean (dangerous)
1883 - name: mon_osd_force_trim_to
1886 desc: force mons to trim osdmaps through this epoch
1887 fmt_desc: Force monitor to trim osdmaps to this point, even if there are
1888 PGs not clean at the specified epoch (0 disables it. dangerous,
1894 - name: mon_debug_extra_checks
1897 desc: Enable some additional monitor checks
1898 long_desc: Enable some additional monitor checks that would be too expensive to
1899 run on production systems, or would only be relevant while testing or debugging.
1903 - name: mon_debug_block_osdmap_trim
1906 desc: Block OSDMap trimming while the option is enabled.
1907 long_desc: Blocking OSDMap trimming may be quite helpful to easily reproduce states
1908 in which the monitor keeps (hundreds of) thousands of osdmaps.
1912 - name: mon_debug_deprecated_as_obsolete
1915 desc: treat deprecated mon commands as obsolete
1920 - name: mon_debug_dump_transactions
1923 desc: dump paxos transactions to log
1928 - mon_debug_dump_location
1930 - name: mon_debug_dump_json
1933 desc: dump paxos transactions to log as json
1938 - mon_debug_dump_transactions
1940 - name: mon_debug_dump_location
1943 desc: file to dump paxos transactions to
1944 default: /var/log/ceph/$cluster-$name.tdump
1948 - mon_debug_dump_transactions
1950 - name: mon_debug_no_require_pacific
1953 desc: do not set pacific feature for new mon clusters
1959 - name: mon_debug_no_require_quincy
1962 desc: do not set quincy feature for new mon clusters
1968 - name: mon_debug_no_require_bluestore_for_ec_overwrites
1971 desc: do not require bluestore OSDs to enable EC overwrites on a rados pool
1976 - name: mon_debug_no_initial_persistent_features
1979 desc: do not set any monmap features for new mon clusters
1986 - name: mon_inject_transaction_delay_max
1989 desc: max duration of injected delay in paxos
1995 - name: mon_inject_transaction_delay_probability
1998 desc: probability of injecting a delay in paxos
2003 - name: mon_inject_pg_merge_bounce_probability
2006 desc: probability of failing and reverting a pg_num decrement
2010 # kill the sync provider at a specific point in the work flow
2011 - name: mon_sync_provider_kill_at
2014 desc: kill mon sync provider at specific point
2019 # kill the sync requester at a specific point in the work flow
2020 - name: mon_sync_requester_kill_at
2023 desc: kill mon sync requester at specific point
2028 # force monitor to join quorum even if it has been previously removed from the map
2029 - name: mon_force_quorum_join
2032 desc: force mon to rejoin quorum even though it was just removed
2033 fmt_desc: Force monitor to join quorum even if it has been previously removed from the map
2038 # type of keyvaluedb backend
2039 - name: mon_keyvaluedb
2042 desc: database backend to use for the mon database
2052 # UNSAFE -- TESTING ONLY! Allows addition of a cache tier with preexisting snaps
2053 - name: mon_debug_unsafe_allow_tier_with_nonempty_snaps
2060 # required of mon, mds, osd daemons
2061 - name: auth_cluster_required
2064 desc: authentication methods required by the cluster
2065 fmt_desc: If enabled, the Ceph Storage Cluster daemons (i.e., ``ceph-mon``,
2066 ``ceph-osd``, ``ceph-mds`` and ``ceph-mgr``) must authenticate with
2067 each other. Valid settings are ``cephx`` or ``none``.
2070 # required by daemons of clients
2071 - name: auth_service_required
2074 desc: authentication methods required by service daemons
2075 fmt_desc: If enabled, the Ceph Storage Cluster daemons require Ceph Clients
2076 to authenticate with the Ceph Storage Cluster in order to access
2077 Ceph services. Valid settings are ``cephx`` or ``none``.
2080 # what clients require of daemons
2081 - name: auth_client_required
2084 desc: authentication methods allowed by clients
2085 fmt_desc: If enabled, the Ceph Client requires the Ceph Storage Cluster to
2086 authenticate with the Ceph Client. Valid settings are ``cephx``
2088 default: cephx, none
2090 # deprecated; default value for above if they are not defined.
2091 - name: auth_supported
2094 desc: authentication methods required (deprecated)
2096 - name: max_rotating_auth_attempts
2099 desc: number of attempts to initialize rotating keys before giving up
2102 - name: rotating_keys_bootstrap_timeout
2105 desc: timeout for obtaining rotating keys during bootstrap phase (seconds)
2107 - name: rotating_keys_renewal_timeout
2110 desc: timeout for updating rotating keys (seconds)
2112 - name: cephx_require_signatures
2116 fmt_desc: If set to ``true``, Ceph requires signatures on all message
2117 traffic between the Ceph Client and the Ceph Storage Cluster, and
2118 between daemons comprising the Ceph Storage Cluster.
2120 Ceph Argonaut and Linux kernel versions prior to 3.19 do
2121 not support signatures; if such clients are in use this
2122 option can be turned off to allow them to connect.
2124 - name: cephx_require_version
2127 desc: Cephx version required (1 = pre-mimic, 2 = mimic+)
2130 - name: cephx_cluster_require_signatures
2134 fmt_desc: If set to ``true``, Ceph requires signatures on all message
2135 traffic between Ceph daemons comprising the Ceph Storage Cluster.
2137 - name: cephx_cluster_require_version
2140 desc: Cephx version required by the cluster from clients (1 = pre-mimic, 2 = mimic+)
2143 - name: cephx_service_require_signatures
2147 fmt_desc: If set to ``true``, Ceph requires signatures on all message
2148 traffic between Ceph Clients and the Ceph Storage Cluster.
2150 - name: cephx_service_require_version
2153 desc: Cephx version required from ceph services (1 = pre-mimic, 2 = mimic+)
2156 # Default to signing session messages if supported
2157 - name: cephx_sign_messages
2161 fmt_desc: If the Ceph version supports message signing, Ceph will sign
2162 all messages so they are more difficult to spoof.
2164 - name: auth_mon_ticket_ttl
2169 - name: auth_service_ticket_ttl
2173 fmt_desc: When the Ceph Storage Cluster sends a Ceph Client a ticket for
2174 authentication, the Ceph Storage Cluster assigns the ticket a
2177 - name: auth_allow_insecure_global_id_reclaim
2180 desc: Allow reclaiming global_id without presenting a valid ticket proving
2181 previous possession of that global_id
2182 long_desc: Allowing unauthorized global_id (re)use poses a security risk.
2183 Unfortunately, older clients may omit their ticket on reconnects and
2184 therefore rely on this being allowed for preserving their global_id for
2185 the lifetime of the client instance. Setting this value to false would
2186 immediately prevent new connections from those clients (assuming
2187 auth_expose_insecure_global_id_reclaim set to true) and eventually break
2188 existing sessions as well (regardless of auth_expose_insecure_global_id_reclaim
2192 - mon_warn_on_insecure_global_id_reclaim
2193 - mon_warn_on_insecure_global_id_reclaim_allowed
2194 - auth_expose_insecure_global_id_reclaim
2196 - name: auth_expose_insecure_global_id_reclaim
2199 desc: Force older clients that may omit their ticket on reconnects to
2200 reconnect as part of establishing a session
2201 long_desc: 'In permissive mode (auth_allow_insecure_global_id_reclaim set
2202 to true), this helps with identifying clients that are not patched. In
2203 enforcing mode (auth_allow_insecure_global_id_reclaim set to false), this
2204 is a fail-fast mechanism: don''t establish a session that will almost
2205 inevitably be broken later.'
2208 - mon_warn_on_insecure_global_id_reclaim
2209 - mon_warn_on_insecure_global_id_reclaim_allowed
2210 - auth_allow_insecure_global_id_reclaim
2212 # if true, assert when weird things happen
2218 # how many mons to try to connect to in parallel during hunt
2219 - name: mon_client_hunt_parallel
2224 # try new mon every N seconds until we connect
2225 - name: mon_client_hunt_interval
2229 fmt_desc: The client will try a new monitor every ``N`` seconds until it
2230 establishes a connection.
2232 # send logs every N seconds
2233 - name: mon_client_log_interval
2236 desc: How frequently we send queued cluster log messages to mon
2239 # ping every N seconds
2240 - name: mon_client_ping_interval
2244 fmt_desc: The client will ping the monitor every ``N`` seconds.
2246 # fail if we don't hear back
2247 - name: mon_client_ping_timeout
2252 - name: mon_client_hunt_interval_backoff
2257 - name: mon_client_hunt_interval_min_multiple
2262 - name: mon_client_hunt_interval_max_multiple
2267 - name: mon_client_max_log_entries_per_message
2271 fmt_desc: The maximum number of log entries a monitor will generate
2274 - name: mon_client_directed_command_retry
2277 desc: Number of times to try sending a command directed at a specific monitor
2280 # whitespace-separated list of key=value pairs describing crush location
2281 - name: crush_location
2285 - name: crush_location_hook
2289 - name: crush_location_hook_timeout
2294 - name: objecter_tick_interval
2299 # before we ask for a map
2300 - name: objecter_timeout
2303 desc: Seconds before in-flight op is considered 'laggy' and we query mon for the
2307 - name: objecter_inflight_op_bytes
2310 desc: Max in-flight data in bytes (both directions)
2313 - name: objecter_inflight_ops
2316 desc: Max in-flight operations
2319 # num of completion locks per each session, for serializing same object responses
2320 - name: objecter_completion_locks_per_session
2325 # suppress watch pings
2326 - name: objecter_inject_no_watch_ping
2331 # ignore the first reply for each write, and resend the osd op instead
2332 - name: objecter_retry_writes_after_first_reply
2337 - name: objecter_debug_inject_relock_delay
2342 - name: filer_max_purge_ops
2345 desc: Max in-flight operations for purging a striped range (e.g., MDS journal)
2348 - name: filer_max_truncate_ops
2351 desc: Max in-flight operations for truncating/deleting a striped sequence (e.g.,
2355 - name: journaler_write_head_interval
2358 desc: Interval in seconds between journal header updates (to help bound replay time)
2360 # * journal object size
2361 - name: journaler_prefetch_periods
2364 desc: Number of striping periods to prefetch while reading MDS journal
2366 # we need at least 2 periods to make progress.
2368 # * journal object size
2369 - name: journaler_prezero_periods
2372 desc: Number of striping periods to zero head of MDS journal write position
2374 # we need to zero at least two periods, minimum, to ensure that we
2375 # have a full empty object/period in front of us.
2377 - name: osd_calc_pg_upmaps_aggressively
2380 desc: try to calculate PG upmaps more aggressively, e.g., by doing a fairly exhaustive
2381 search of existing PGs that can be unmapped or upmapped
2385 - name: osd_calc_pg_upmaps_local_fallback_retries
2388 desc: 'Maximum number of PGs we can attempt to unmap or upmap for a specific overfull
2389 or underfull osd per iteration '
2394 - name: osd_crush_chooseleaf_type
2397 desc: default chooseleaf type for osdmaptool --create
2398 fmt_desc: The bucket type to use for ``chooseleaf`` in a CRUSH rule. Uses
2399 ordinal rank rather than name.
2404 # try to use gmt for hitset archive names if all osds in cluster support it
2405 - name: osd_pool_use_gmt_hitset
2408 desc: use UTC for hitset timestamps
2409 long_desc: This setting only exists for compatibility with hammer (and older) clusters.
2412 # whether turn on fast read on the pool or not
2413 - name: osd_pool_default_ec_fast_read
2416 desc: set ec_fast_read for new erasure-coded pools
2417 fmt_desc: Whether to turn on fast read on the pool or not. It will be used as
2418 the default setting of newly created erasure coded pools if ``fast_read``
2419 is not specified at create time.
2424 - name: osd_pool_default_crush_rule
2427 desc: CRUSH rule for newly created pools
2428 fmt_desc: The default CRUSH rule to use when creating a replicated pool. The
2429 default value of ``-1`` means "pick the rule with the lowest numerical ID and
2430 use that". This is to make pool creation work in the absence of rule 0.
2434 - name: osd_pool_default_size
2437 desc: the number of copies of an object for new replicated pools
2438 fmt_desc: Sets the number of replicas for objects in the pool. The default
2439 value is the same as
2440 ``ceph osd pool set {pool-name} size {size}``.
2448 - name: osd_pool_default_min_size
2451 desc: the minimal number of copies allowed to write to a degraded pool for new replicated
2453 long_desc: 0 means no specific default; ceph will use size-size/2
2454 fmt_desc: Sets the minimum number of written replicas for objects in the
2455 pool in order to acknowledge an I/O operation to the client. If
2456 minimum is not met, Ceph will not acknowledge the I/O to the
2457 client, **which may result in data loss**. This setting ensures
2458 a minimum number of replicas when operating in ``degraded`` mode.
2459 The default value is ``0`` which means no particular minimum. If ``0``,
2460 minimum is ``size - (size / 2)``.
2465 - osd_pool_default_size
2470 - name: osd_pool_default_pg_num
2473 desc: number of PGs for new pools
2474 fmt_desc: The default number of placement groups for a pool. The default
2475 value is the same as ``pg_num`` with ``mkpool``.
2476 long_desc: With default value of `osd_pool_default_pg_autoscale_mode` being
2477 `on` the number of PGs for new pools will start out with 1 pg, unless the
2478 user specifies the pg_num.
2483 - osd_pool_default_pg_autoscale_mode
2486 - name: osd_pool_default_pgp_num
2489 desc: number of PGs for placement purposes (0 to match pg_num)
2490 fmt_desc: The default number of placement groups for placement for a pool.
2491 The default value is the same as ``pgp_num`` with ``mkpool``.
2492 PG and PGP should be equal (for now).
2497 - osd_pool_default_pg_num
2500 - name: osd_pool_default_type
2503 desc: default type of pool to create
2512 - name: osd_pool_default_erasure_code_profile
2515 desc: default erasure code profile for new erasure-coded pools
2516 default: plugin=jerasure technique=reed_sol_van k=2 m=2
2521 - name: osd_erasure_code_plugins
2524 desc: erasure code plugins to load
2525 default: @osd_erasure_code_plugins@
2532 - name: osd_pool_default_flags
2535 desc: (integer) flags to set on new pools
2536 fmt_desc: The default flags for new pools.
2541 # use new pg hashing to prevent pool/pg overlap
2542 - name: osd_pool_default_flag_hashpspool
2545 desc: set hashpspool (better hashing scheme) flag on new pools
2550 # pool can't be deleted
2551 - name: osd_pool_default_flag_nodelete
2554 desc: set nodelete flag on new pools
2555 fmt_desc: Set the ``nodelete`` flag on new pools, which prevents pool removal.
2560 # pool's pg and pgp num can't be changed
2561 - name: osd_pool_default_flag_nopgchange
2564 desc: set nopgchange flag on new pools
2565 fmt_desc: Set the ``nopgchange`` flag on new pools. Does not allow the number of PGs to be changed.
2570 # pool's size and min size can't be changed
2571 - name: osd_pool_default_flag_nosizechange
2574 desc: set nosizechange flag on new pools
2575 fmt_desc: Set the ``nosizechange`` flag on new pools. Does not allow the ``size`` to be changed.
2580 - name: osd_pool_default_flag_bulk
2583 desc: set bulk flag on new pools
2584 fmt_desc: Set the ``bulk`` flag on new pools, allowing the autoscaler to use scale-down mode.
2589 - name: osd_pool_default_hit_set_bloom_fpp
2596 - osd_tier_default_cache_hit_set_type
2598 - name: osd_pool_default_cache_target_dirty_ratio
2603 - name: osd_pool_default_cache_target_dirty_high_ratio
2608 - name: osd_pool_default_cache_target_full_ratio
2614 - name: osd_pool_default_cache_min_flush_age
2620 - name: osd_pool_default_cache_min_evict_age
2625 # max size to check for eviction
2626 - name: osd_pool_default_cache_max_evict_check_size
2631 - name: osd_pool_default_pg_autoscale_mode
2634 desc: Default PG autoscaling behavior for new pools
2635 long_desc: With default value `on`, the autoscaler starts a new pool with 1
2636 pg, unless the user specifies the pg_num.
2644 - name: osd_pool_default_read_lease_ratio
2647 desc: Default read_lease_ratio for a pool, as a multiple of osd_heartbeat_grace
2648 long_desc: This should be <= 1.0 so that the read lease will have expired by the
2649 time we decide to mark a peer OSD down.
2652 - osd_heartbeat_grace
2656 # min target size for a HitSet
2657 - name: osd_hit_set_min_size
2662 # max target size for a HitSet
2663 - name: osd_hit_set_max_size
2668 # rados namespace for hit_set tracking
2669 - name: osd_hit_set_namespace
2672 default: .ceph-internal
2674 # conservative default throttling values
2675 - name: osd_tier_promote_max_objects_sec
2680 - name: osd_tier_promote_max_bytes_sec
2685 - name: osd_tier_default_cache_mode
2699 - name: osd_tier_default_cache_hit_set_count
2703 - name: osd_tier_default_cache_hit_set_period
2707 - name: osd_tier_default_cache_hit_set_type
2717 - name: osd_tier_default_cache_min_read_recency_for_promote
2720 desc: number of recent HitSets the object must appear in to be promoted (on read)
2722 - name: osd_tier_default_cache_min_write_recency_for_promote
2725 desc: number of recent HitSets the object must appear in to be promoted (on write)
2727 - name: osd_tier_default_cache_hit_set_grade_decay_rate
2731 - name: osd_tier_default_cache_hit_set_search_last_n
2735 - name: osd_objecter_finishers
2742 - name: osd_map_dedup
2746 fmt_desc: Enable removing duplicates in the OSD map.
2748 - name: osd_map_message_max
2751 desc: maximum number of OSDMaps to include in a single message
2752 fmt_desc: The maximum map entries allowed per MOSDMap message.
2758 - name: osd_map_message_max_bytes
2761 desc: maximum number of bytes worth of OSDMaps to include in a single message
2767 # do not assert on divergent_prior entries which aren't in the log and whose on-disk objects are newer
2768 - name: osd_ignore_stale_divergent_priors
2773 - name: osd_heartbeat_interval
2776 desc: Interval (in seconds) between peer pings
2777 fmt_desc: How often a Ceph OSD Daemon pings its peers (in seconds).
2782 # (seconds) how long before we decide a peer has failed
2783 # This setting is read by the MONs and OSDs and must be set to the same value in both sections of the configuration
2784 - name: osd_heartbeat_grace
2788 fmt_desc: The time without a heartbeat from a Ceph OSD Daemon after which
2789 the Ceph Storage Cluster considers it ``down``.
2790 This setting must be set in both the [mon] and [osd] or [global]
2791 sections so that it is read by both monitor and OSD daemons.
2793 - name: osd_heartbeat_stale
2796 desc: Interval (in seconds) we mark an unresponsive heartbeat peer as stale.
2797 long_desc: Automatically mark unresponsive heartbeat sessions as stale and tear
2798 them down. The primary benefit is that the OSD doesn't need to keep a flood of blocked
2799 heartbeat messages around in memory.
2801 # prioritize the heartbeat tcp socket and set dscp as CS6 on it if true
2802 - name: osd_heartbeat_use_min_delay_socket
2807 # the minimum size of OSD heartbeat messages to send
2808 - name: osd_heartbeat_min_size
2811 desc: Minimum heartbeat packet size in bytes. Will add dummy payload if heartbeat
2812 packet is smaller than this.
2815 # max number of parallel snap trims/pg
2816 - name: osd_pg_max_concurrent_snap_trims
2822 # max number of trimming pgs
2823 - name: osd_max_trimming_pgs
2828 # minimum number of peers that must be reachable to mark ourselves
2829 # back up after being wrongly marked down.
2830 - name: osd_heartbeat_min_healthy_ratio
2835 # (seconds) how often to ping monitor if no peers
2836 - name: osd_mon_heartbeat_interval
2840 fmt_desc: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
2841 Ceph OSD Daemon peers.
2843 - name: osd_mon_heartbeat_stat_stale
2846 desc: Stop reporting on heartbeat ping times not updated for this many seconds.
2847 long_desc: Stop reporting on old heartbeat information unless this is set to zero
2848 fmt_desc: Stop reporting on heartbeat ping times which haven't been updated for
2849 this many seconds. Set to zero to disable this action.
2851 # failures, up_thru, boot.
2852 - name: osd_mon_report_interval
2855 desc: Frequency of OSD reports to mon for peer failures, fullness status changes
2856 fmt_desc: The number of seconds a Ceph OSD Daemon may wait
2857 from startup or another reportable event before reporting
2861 # max updates in flight
2862 - name: osd_mon_report_max_in_flight
2867 # (seconds) how often to send beacon message to monitor
2868 - name: osd_beacon_report_interval
2873 # report pg stats for any given pg at least this often
2874 - name: osd_pg_stat_report_interval_max
2879 # Max number of snap intervals to report to mgr in pg_stat_t
2880 - name: osd_max_snap_prune_intervals_per_epoch
2883 desc: Max number of snap intervals to report to mgr in pg_stat_t
2886 - name: osd_default_data_pool_replay_window
2890 fmt_desc: The time (in seconds) for an OSD to wait for a client to replay
2892 - name: osd_auto_mark_unfound_lost
2897 - name: osd_check_for_log_corruption
2901 fmt_desc: Check log files for corruption. Can be computationally expensive.
2903 - name: osd_use_stale_snap
2908 - name: osd_rollback_to_cluster_snap
2912 - name: osd_default_notify_timeout
2915 desc: default number of seconds after which notify propagation times out. Used if
2916 a client has not specified another value
2917 fmt_desc: The OSD default notification timeout (in seconds).
2920 - name: osd_kill_backfill_at
2925 # Bounds how infrequently a new map epoch will be persisted for a pg
2926 # make this < map_cache_size!
2927 - name: osd_pg_epoch_persisted_max_stale
2932 - name: osd_target_pg_log_entries_per_osd
2935 desc: target number of PG entries total on an OSD - limited per pg by the min and
2939 - osd_max_pg_log_entries
2940 - osd_min_pg_log_entries
2942 - name: osd_min_pg_log_entries
2945 desc: minimum number of entries to maintain in the PG log
2946 fmt_desc: The minimum number of placement group log entries to maintain
2947 when trimming the log.
2952 - osd_max_pg_log_entries
2953 - osd_pg_log_dups_tracked
2954 - osd_target_pg_log_entries_per_osd
2956 - name: osd_max_pg_log_entries
2959 desc: maximum number of entries to maintain in the PG log
2960 fmt_desc: The maximum number of placement group log entries to maintain
2961 when trimming the log.
2966 - osd_min_pg_log_entries
2967 - osd_pg_log_dups_tracked
2968 - osd_target_pg_log_entries_per_osd
2970 - name: osd_pg_log_dups_tracked
2973 desc: how many versions back to track in order to detect duplicate ops; this is
2974 combined with both the regular pg log entries and additional minimal dup detection
2980 - osd_min_pg_log_entries
2981 - osd_max_pg_log_entries
2983 - name: osd_object_clean_region_max_num_intervals
2986 desc: number of intervals in clean_offsets
2987 long_desc: partial recovery uses multiple intervals to record the clean part of
2988 the object. When the number of intervals is greater than osd_object_clean_region_max_num_intervals,
2989 the minimum interval will be trimmed (0 will recover the entire object data interval)
2994 # max entries factor before force recovery
2995 - name: osd_force_recovery_pg_log_entries_factor
3000 - name: osd_pg_log_trim_min
3003 desc: Minimum number of log entries to trim at once. This lets us trim in larger
3004 batches rather than with each write.
3007 - osd_max_pg_log_entries
3008 - osd_min_pg_log_entries
3010 - name: osd_force_auth_primary_missing_objects
3013 desc: Approximate missing objects above which to force auth_log_shard to be primary
3016 - name: osd_async_recovery_min_cost
3019 desc: A combined measure of the difference in current log entries and historical
3020 missing objects, above which we switch to asynchronous recovery when appropriate
3024 - name: osd_max_pg_per_osd_hard_ratio
3027 desc: Maximum number of PGs per OSD, as a factor of 'mon_max_pg_per_osd'
3028 long_desc: OSD will refuse to instantiate a PG if the number of PGs it serves exceeds
3030 fmt_desc: The ratio of number of PGs per OSD allowed by the cluster before the
3031 OSD refuses to create new PGs. An OSD stops creating new PGs if the number
3032 of PGs it serves exceeds
3033 ``osd_max_pg_per_osd_hard_ratio`` \* ``mon_max_pg_per_osd``.
3036 - mon_max_pg_per_osd
3038 - name: osd_pg_log_trim_max
3041 desc: maximum number of entries to remove at once from the PG log
3046 - osd_min_pg_log_entries
3047 - osd_max_pg_log_entries
3049 # how many seconds old makes an op complaint-worthy
3050 - name: osd_op_complaint_time
3054 fmt_desc: An operation becomes complaint-worthy after the specified number
3055 of seconds have elapsed.
3057 - name: osd_command_max_records
3061 fmt_desc: Limits the number of lost objects to return.
3063 # max peer osds to report that are blocking our progress
3064 - name: osd_max_pg_blocked_by
3069 - name: osd_op_log_threshold
3073 fmt_desc: How many operation logs to display at once.
3075 - name: osd_backoff_on_unfound
3080 # [mainly for debug?] object unreadable/writeable
3081 - name: osd_backoff_on_degraded
3086 # [debug] pg peering
3087 - name: osd_backoff_on_peering
3092 - name: osd_debug_shutdown
3095 desc: Turn up debug levels during shutdown
3098 # crash osd if client ignores a backoff; useful for debugging
3099 - name: osd_debug_crash_on_ignored_backoff
3104 - name: osd_debug_inject_dispatch_delay_probability
3109 - name: osd_debug_inject_dispatch_delay_duration
3114 - name: osd_debug_drop_ping_probability
3120 - name: osd_debug_drop_ping_duration
3126 - name: osd_debug_op_order
3131 - name: osd_debug_verify_missing_on_start
3136 - name: osd_debug_verify_snaps
3141 - name: osd_debug_verify_stray_on_activate
3146 - name: osd_debug_skip_full_check_in_backfill_reservation
3151 - name: osd_debug_reject_backfill_probability
3156 # inject failure during copyfrom completion
3157 - name: osd_debug_inject_copyfrom_error
3162 - name: osd_debug_misdirected_ops
3167 - name: osd_debug_skip_full_check_in_recovery
3172 - name: osd_debug_random_push_read_error
3177 - name: osd_debug_verify_cached_snaps
3182 - name: osd_debug_deep_scrub_sleep
3185 desc: Inject an expensive sleep during deep scrub IO to make it easier to induce
3189 - name: osd_debug_no_acting_change
3194 - name: osd_debug_no_purge_strays
3199 - name: osd_debug_pretend_recovery_active
3204 # enable/disable OSD op tracking
3205 - name: osd_enable_op_tracker
3210 # The number of shards for holding the ops
3211 - name: osd_num_op_tracker_shard
3216 # Max number of completed ops to track
3217 - name: osd_op_history_size
3221 fmt_desc: The maximum number of completed operations to track.
3223 # Oldest completed op to track
3224 - name: osd_op_history_duration
3228 fmt_desc: The oldest completed operation to track.
3230 # Max number of slow ops to track
3231 - name: osd_op_history_slow_op_size
3236 # track the op if over this threshold
3237 - name: osd_op_history_slow_op_threshold
3242 # to adjust various transactions that batch smaller items
3243 - name: osd_target_transaction_size
3248 # what % full makes an OSD "full" (failsafe)
3249 - name: osd_failsafe_full_ratio
3254 - name: osd_fast_shutdown
3257 desc: Fast, immediate shutdown
3258 long_desc: Setting this to false makes the OSD do a slower teardown of all state
3259 when it receives a SIGINT or SIGTERM or when shutting down for any other reason. That
3260 slow shutdown is primarily useful for doing memory leak checking with valgrind.
3263 - name: osd_fast_shutdown_timeout
3266 desc: timeout in seconds for osd fast-shutdown (0 is unlimited)
3270 - name: osd_fast_shutdown_notify_mon
3273 desc: Tell mon about OSD shutdown on immediate shutdown
3274 long_desc: Tell the monitor the OSD is shutting down on immediate shutdown. This
3275 helps with cluster log messages from other OSDs reporting it immediately failed.
3279 - osd_mon_shutdown_timeout
3281 # immediately mark OSDs as down once they refuse to accept connections
3282 - name: osd_fast_fail_on_connection_refused
3286 fmt_desc: If this option is enabled, crashed OSDs are marked down
3287 immediately by connected peers and MONs (assuming that the
3288 crashed OSD host survives). Disable it to restore old
3289 behavior, at the expense of possible long I/O stalls when
3290 OSDs crash in the middle of I/O operations.
3292 - name: osd_pg_object_context_cache_count
3297 # true if LTTng-UST tracepoints should be enabled
3303 # true if function instrumentation should use LTTng
3304 - name: osd_function_tracing
3309 # use fast info attr, if we can
3310 - name: osd_fast_info
3315 # determines whether PGLog::check() compares written out log to stored log
3316 - name: osd_debug_pg_log_writeout
3321 # Max number of loops before we reset thread-pool's handle
3322 - name: osd_loop_before_reset_tphandle
3327 # default timeout while calling WaitInterval on an empty queue
3328 - name: threadpool_default_timeout
3333 # default wait time for an empty queue before pinging the hb timeout
3334 - name: threadpool_empty_queue_max_wait
3339 - name: leveldb_log_to_ceph_log
3344 - name: leveldb_write_buffer_size
3349 - name: leveldb_cache_size
3354 - name: leveldb_block_size
3359 - name: leveldb_bloom_size
3364 - name: leveldb_max_open_files
3369 - name: leveldb_compression
3374 - name: leveldb_paranoid
3384 - name: leveldb_compact_on_mount
3389 - name: rocksdb_log_to_ceph_log
3394 - name: rocksdb_cache_size
3401 # ratio of cache for row (vs block)
3402 - name: rocksdb_cache_row_ratio
3407 # rocksdb block cache shard bits, 4 bit -> 16 shards
3408 - name: rocksdb_cache_shard_bits
3414 - name: rocksdb_cache_type
3419 - name: rocksdb_block_size
3424 # Enabling this will have 5-10% impact on performance for the stats collection
3425 - name: rocksdb_perf
3430 # For rocksdb, this behavior will be an overhead of 5%~10%, collected only if rocksdb_perf is enabled.
3431 - name: rocksdb_collect_compaction_stats
3436 # For rocksdb, this behavior will be an overhead of 5%~10%, collected only if rocksdb_perf is enabled.
3437 - name: rocksdb_collect_extended_stats
3442 # For rocksdb, this behavior will be an overhead of 5%~10%, collected only if rocksdb_perf is enabled.
3443 - name: rocksdb_collect_memory_stats
3448 - name: rocksdb_delete_range_threshold
3451 desc: The number of keys required to invoke DeleteRange when deleting multiple keys.
3453 - name: rocksdb_bloom_bits_per_key
3456 desc: Number of bits per key to use for RocksDB's bloom filters.
3457 long_desc: 'RocksDB bloom filters can be used to quickly answer the question of
3458 whether or not a key may exist or definitely does not exist in a given RocksDB
3459 SST file without having to read all keys into memory. Using a higher bit value
3460 decreases the likelihood of false positives at the expense of additional disk
3461 space and memory consumption when the filter is loaded into RAM. The current
3462 default value of 20 was found to provide significant performance gains when getattr
3463 calls are made (such as during new object creation in bluestore) without significant
3464 memory overhead or cache pollution when combined with rocksdb partitioned index
3465 filters. See: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters
3466 for more information.'
3468 - name: rocksdb_cache_index_and_filter_blocks
3471 desc: Whether to cache indices and filters in block cache
3472 long_desc: By default RocksDB will load an SST file's index and bloom filters into
3473 memory when it is opened and remove them from memory when an SST file is closed. Thus,
3474 memory consumption by indices and bloom filters is directly tied to the number
3475 of concurrent SST files allowed to be kept open. This option instead stores cached
3476 indices and filters in the block cache where they directly compete with other
3477 cached data. By default we set this option to true to better account for and
3478 bound rocksdb memory usage and keep filters in memory even when an SST file is
3481 - name: rocksdb_cache_index_and_filter_blocks_with_high_priority
3484 desc: Whether to cache indices and filters in the block cache with high priority
3485 long_desc: A downside of setting rocksdb_cache_index_and_filter_blocks to true is
3486 that regular data can push indices and filters out of memory. Setting this option
3487 to true means they are cached with higher priority than other data and should
3488 typically stay in the block cache.
3490 - name: rocksdb_pin_l0_filter_and_index_blocks_in_cache
3493 desc: Whether to pin Level 0 indices and bloom filters in the block cache
3494 long_desc: A downside of setting rocksdb_cache_index_and_filter_blocks to true is
3495 that regular data can push indices and filters out of memory. Setting this option
3496 to true means that level 0 SST files will always have their indices and filters
3497 pinned in the block cache.
3499 - name: rocksdb_index_type
3502 desc: 'Type of index for SST files: binary_search, hash_search, two_level'
3503 long_desc: 'This option controls the table index type. binary_search is a space
3504 efficient index block that is optimized for binary-search-based index. hash_search
3505 may improve prefix lookup performance at the expense of higher disk and memory
3506 usage and potentially slower compactions. two_level is an experimental index
3507 type that uses two binary search indexes and works in conjunction with partition
3508 filters. See: http://rocksdb.org/blog/2017/05/12/partitioned-index-filter.html'
3509 default: binary_search
3510 - name: rocksdb_partition_filters
3513 desc: (experimental) partition SST index/filters into smaller blocks
3514 long_desc: 'This is an experimental option for rocksdb that works in conjunction
3515 with two_level indices to avoid having to keep the entire filter/index in cache
3516 when cache_index_and_filter_blocks is true. The idea is to keep a much smaller
3517 top-level index in heap/cache and then opportunistically cache the lower level
3518 indices. See: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters'
3520 - name: rocksdb_metadata_block_size
3523 desc: The block size for index partitions. (0 = rocksdb default)
3525 # osd_*_priority adjust the relative priority of client io, recovery io,
3528 # osd_*_priority determines the ratio of available io between client and
3529 # recovery. Each option may be set between
3531 - name: osd_client_op_priority
3535 fmt_desc: The priority set for client operations. This value is relative
3536 to that of ``osd_recovery_op_priority`` below. The default
3537 strongly favors client ops over recovery.
3539 - name: osd_recovery_op_priority
3542 desc: Priority to use for recovery operations if not specified for the pool
3543 fmt_desc: The priority of recovery operations vs client operations, if not specified by the
3544 pool's ``recovery_op_priority``. The default value prioritizes client
3545 ops (see above) over recovery ops. You may adjust the tradeoff of client
3546 impact against the time to restore cluster health by lowering this value
3547 for increased prioritization of client ops, or by increasing it to favor
3551 - name: osd_peering_op_priority
3556 - name: osd_snap_trim_priority
3560 fmt_desc: The priority set for the snap trim work queue.
3562 - name: osd_snap_trim_cost
3567 - name: osd_pg_delete_priority
3572 - name: osd_pg_delete_cost
3577 - name: osd_scrub_priority
3580 desc: Priority for scrub operations in work queue
3581 fmt_desc: The default work queue priority for scheduled scrubs when the
3582 pool doesn't specify a value of ``scrub_priority``. This can be
3583 boosted to the value of ``osd_client_op_priority`` when scrubs are
3584 blocking client operations.
3587 - name: osd_scrub_cost
3590 desc: Cost for scrub operations in work queue
3593 # set requested scrub priority higher than scrub priority to make the
3594 # requested scrubs jump the queue of scheduled scrubs
3595 - name: osd_requested_scrub_priority
3599 fmt_desc: The priority set for user requested scrub on the work queue. If
3600 this value were to be smaller than ``osd_client_op_priority`` it
3601 can be boosted to the value of ``osd_client_op_priority`` when
3602 scrub is blocking client operations.
3604 - name: osd_recovery_priority
3607 desc: Priority of recovery in the work queue
3608 long_desc: Not related to a pool's recovery_priority
3609 fmt_desc: The default priority set for recovery work queue. Not
3610 related to a pool's ``recovery_priority``.
3613 # set default cost equal to 20MB io
3614 - name: osd_recovery_cost
3619 # osd_recovery_op_warn_multiple scales the normal warning threshold,
3620 # osd_op_complaint_time, so that slow recovery ops won't cause noise
3621 - name: osd_recovery_op_warn_multiple
3626 # Max time to wait between notifying mon of shutdown and shutting down
3627 - name: osd_mon_shutdown_timeout
3632 # crash if the OSD has stray PG refs on shutdown
3633 - name: osd_shutdown_pgref_assert
3638 # OSD's maximum object size
3639 - name: osd_max_object_size
3643 fmt_desc: The maximum size of a RADOS object in bytes.
3645 # max rados object name len
3646 - name: osd_max_object_name_len
3651 # max rados object namespace len
3652 - name: osd_max_object_namespace_len
3657 # max rados attr name len; cannot go higher than 100 chars for file system backends
3658 - name: osd_max_attr_name_len
3663 - name: osd_max_attr_size
3668 - name: osd_max_omap_entries_per_request
3673 - name: osd_max_omap_bytes_per_request
3680 - name: osd_max_write_op_reply_len
3683 desc: Max size of the per-op payload for requests with the RETURNVEC flag set
3684 long_desc: This value caps the amount of data (per op; a request may have many ops)
3685 that will be sent back to the client and recorded in the PG log.
3688 - name: osd_objectstore
3691 desc: backend type for an OSD (like filestore or bluestore)
3703 # true if LTTng-UST tracepoints should be enabled
3704 - name: osd_objectstore_tracing
3709 - name: osd_objectstore_fuse
3714 - name: osd_bench_small_size_max_iops
3719 - name: osd_bench_large_size_max_throughput
3724 - name: osd_bench_max_block_size
3729 # duration of 'osd bench', capped at 30s to avoid triggering timeouts
3730 - name: osd_bench_duration
3735 # create a blkin trace for all osd requests
3736 - name: osd_blkin_trace_all
3741 # create a blkin trace for all objecter requests
3742 - name: osdc_blkin_trace_all
3747 - name: osd_discard_disconnected_ops
3752 - name: osd_memory_target
3755 desc: When tcmalloc and cache autotuning are enabled, try to keep this many bytes
3757 long_desc: The minimum value must be at least equal to osd_memory_base + osd_memory_cache_min.
3759 When TCMalloc is available and cache autotuning is enabled, try to
3760 keep this many bytes mapped in memory. Note: This may not exactly
3761 match the RSS memory usage of the process. While the total amount
3762 of heap memory mapped by the process should usually be close
3763 to this target, there is no guarantee that the kernel will actually
3764 reclaim memory that has been unmapped. During initial development,
3765 it was found that some kernels result in the OSD's RSS memory
3766 exceeding the mapped memory by up to 20%. It is hypothesised,
3767 however, that the kernel generally may be more aggressive about
3768 reclaiming unmapped memory when there is a high amount of memory
3769 pressure. Your mileage may vary.
3772 - bluestore_cache_autotune
3773 - osd_memory_cache_min
3775 - osd_memory_target_autotune
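# Worked example for the osd_memory_target constraint above (a sketch, not
# authoritative; the two input values are assumed for illustration and may
# differ from the defaults in your release):
#   osd_memory_base      = 805306368   (768 MiB, assumed)
#   osd_memory_cache_min = 134217728   (128 MiB, assumed)
#   smallest valid osd_memory_target = 805306368 + 134217728
#                                    = 939524096 bytes (896 MiB)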
3779 - name: osd_memory_target_autotune
3783 desc: If enabled, allow orchestrator to automatically tune osd_memory_target
3786 - name: osd_memory_target_cgroup_limit_ratio
3789 desc: Set the default value for osd_memory_target to the cgroup memory limit (if
3790 set) times this value
3791 long_desc: A value of 0 disables this feature.
3797 - name: osd_memory_base
3800 desc: When tcmalloc and cache autotuning are enabled, estimate the minimum amount
3801 of memory in bytes the OSD will need.
3802 fmt_desc: When TCMalloc and cache autotuning are enabled, estimate the minimum
3803 amount of memory in bytes the OSD will need. This is used to help
3804 the autotuner estimate the expected aggregate memory consumption of
3808 - bluestore_cache_autotune
3811 - name: osd_memory_expected_fragmentation
3814 desc: When tcmalloc and cache autotuning are enabled, estimate the percent of memory
3816 fmt_desc: When TCMalloc and cache autotuning are enabled, estimate the
3817 percentage of memory fragmentation. This is used to help the
3818 autotuner estimate the expected aggregate memory consumption
3822 - bluestore_cache_autotune
3827 - name: osd_memory_cache_min
3830 desc: When tcmalloc and cache autotuning are enabled, set the minimum amount of memory
3833 When TCMalloc and cache autotuning are enabled, set the minimum
3834 amount of memory used for caches. Note: Setting this value too
3835 low can result in significant cache thrashing.
3838 - bluestore_cache_autotune
3842 - name: osd_memory_cache_resize_interval
3845 desc: When tcmalloc and cache autotuning are enabled, wait this many seconds between
3847 fmt_desc: When TCMalloc and cache autotuning are enabled, wait this many
3848 seconds between resizing caches. This setting changes the total
3849 amount of memory available for BlueStore to use for caching. Note
3850 that setting this interval too small can result in memory allocator
3851 thrashing and lower performance.
3854 - bluestore_cache_autotune
3855 - name: memstore_device_bytes
3860 - name: memstore_page_set
3865 - name: memstore_page_size
3870 - name: memstore_debug_omit_block_device_write
3873 desc: write metadata only
3876 - bluestore_debug_omit_block_device_write
3878 - name: objectstore_blackhole
3883 - name: bdev_debug_inflight_ios
3888 # if N>0, then ~ 1/N IOs will complete before we crash on flush
3889 - name: bdev_inject_crash
3894 # wait N more seconds on flush
3895 - name: bdev_inject_crash_flush_delay
3906 - name: bdev_aio_poll_ms
3911 - name: bdev_aio_max_queue_depth
3916 - name: bdev_aio_reap_max
3921 - name: bdev_block_size
3926 - name: bdev_read_buffer_alignment
3931 - name: bdev_read_preallocated_huge_buffers
3934 desc: description of pools arrangement for huge page-based read buffers
3935 long_desc: Arrangement of preallocated, huge pages-based pools for reading
3936 from a KernelDevice. Applied to minimize size of scatter-gather lists
3937 sent to NICs. Targets really big buffers (>= 2 or 4 MBs).
3938 Keep in mind the system must be configured accordingly (see /proc/sys/vm/nr_hugepages).
3939 Otherwise the OSD will fail early.
3940 Beware that BlueStore, by default, stores large chunks across many smaller blobs.
3941 Increasing bluestore_max_blob_size changes that, and thus allows the data to
3942 be read back into a small number of huge page-backed buffers.
3943 fmt_desc: List of key=value pairs delimited by comma, semicolon or tab.
3944 key specifies the targeted read size and must be expressed in bytes.
3945 value specifies the number of preallocated buffers.
3946 For instance, to preallocate 64 buffers that will be used to serve
3947 2 MB-sized read requests and 128 for 4 MB, set
3948 "2097152=64,4194304=128".
3950 - bluestore_max_blob_size
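# Worked example for bdev_read_preallocated_huge_buffers above (a sketch; the
# 2 MiB huge page size is an assumption, check Hugepagesize in /proc/meminfo):
#   "2097152=64,4194304=128"
#      64 buffers x 2097152 bytes = 134217728 bytes (128 MiB)
#     128 buffers x 4194304 bytes = 536870912 bytes (512 MiB)
#     total preallocated          = 671088640 bytes (640 MiB)
#   With 2 MiB huge pages, this would correspond to roughly 320 pages
#   reserved through /proc/sys/vm/nr_hugepages.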
3951 - name: bdev_debug_aio
3956 - name: bdev_debug_aio_suicide_timeout
3961 - name: bdev_debug_aio_log_age
3966 # if yes, osd will unbind all NVMe devices from kernel driver and bind them
3967 # to the uio_pci_generic driver. The purpose is to prevent the case where
3968 # NVMe driver is loaded while osd is running.
3969 - name: bdev_nvme_unbind_from_kernel
3974 - name: bdev_enable_discard
3979 - name: bdev_async_discard
3984 - name: bdev_flock_retry_interval
3987 desc: interval to retry the flock
3989 - name: bdev_flock_retry
3992 desc: times to retry the flock
3993 long_desc: The number of times to retry on getting the block device lock. Programs
3994 such as systemd-udevd may compete with Ceph for this lock. 0 means 'unlimited'.
3996 - name: bluefs_alloc_size
3999 desc: Allocation unit size for DB and WAL devices
4002 - name: bluefs_shared_alloc_size
4005 desc: Allocation unit size for primary/shared device
4008 - name: bluefs_max_prefetch
4013 # alloc when we get this low
4014 - name: bluefs_min_log_runway
4019 # alloc this much at a time
4020 - name: bluefs_max_log_runway
4025 # before we consider
4026 - name: bluefs_log_compact_min_ratio
4031 # before we consider
4032 - name: bluefs_log_compact_min_size
4037 # ignore flush until it's this big
4038 - name: bluefs_min_flush_size
4043 # sync or async log compaction
4044 - name: bluefs_compact_log_sync
4049 - name: bluefs_buffered_io
4052 desc: Enable buffered IO for bluefs reads.
4053 long_desc: When this option is enabled, bluefs will in some cases perform buffered
4054 reads. This allows the kernel page cache to act as a secondary cache for things
4055 like RocksDB block reads. For example, if the rocksdb block cache isn't large
4056 enough to hold all blocks during OMAP iteration, it may be possible to read them
4057 from page cache instead of from the disk. This can dramatically improve
4058 performance when the osd_memory_target is too small to hold all entries in block
4059 cache but it does come with downsides. It has been reported to occasionally
4060 cause excessive kernel swapping (and associated stalls) under certain workloads.
4061 Currently the best and most consistent performing combination appears to be
4062 enabling bluefs_buffered_io and disabling system level swap. This
4063 recommendation may change in the future, however.
4066 - name: bluefs_sync_write
4071 - name: bluefs_allocator
4081 - name: bluefs_log_replay_check_allocations
4084 desc: Enables checks for allocations consistency during log replay
4087 - name: bluefs_replay_recovery
4090 desc: Attempt to read a bluefs log so large that it became unreadable.
4091 long_desc: If the BlueFS log grows to extreme sizes (200GB+), it is likely that it becomes
4092 unreadable. This option enables heuristics that scan devices for missing data.
4093 DO NOT ENABLE BY DEFAULT
4096 - name: bluefs_replay_recovery_disable_compact
4101 - name: bluefs_check_for_zeros
4104 desc: Check data read for suspicious pages
4105 long_desc: Looks into data read to check if there is a 4K block entirely filled
4106 with zeros. If this happens, we re-read data. If there is a difference, we print
4110 - bluestore_retry_disk_reads
4114 - name: bluefs_check_volume_selector_on_umount
4117 desc: Check validity of volume selector on umount
4118 long_desc: Checks if volume selector did not diverge from the state it should be in.
4119 Reference is constructed from bluefs inode table. Asserts on inconsistency.
4124 - name: bluefs_check_volume_selector_often
4127 desc: Periodically check validity of volume selector
4128 long_desc: Periodically checks if current volume selector does not diverge from the valid state.
4129 Reference is constructed from bluefs inode table. Asserts on inconsistency. This is a debug feature.
4132 - bluefs_check_volume_selector_on_umount
4136 - name: bluestore_bluefs
4139 desc: Use BlueFS to back rocksdb
4140 long_desc: BlueFS allows rocksdb to share the same physical device(s) as the rest
4141 of BlueStore. It should be used in all cases unless testing/developing an alternative
4142 metadata database for BlueStore.
4147 # mirror to normal Env for debug
4148 - name: bluestore_bluefs_env_mirror
4151 desc: Mirror bluefs data to file system for testing/validation
4156 - name: bluestore_bluefs_max_free
4160 desc: Maximum free space allocated to BlueFS
4161 - name: bluestore_bluefs_alloc_failure_dump_interval
4164 desc: How frequently (in seconds) to dump allocator on BlueFS space allocation failure
4167 - name: bluestore_spdk_mem
4170 desc: Amount of DPDK memory in MB
4171 long_desc: If running multiple SPDK instances per node, you must specify the amount
4172 of DPDK memory in MB each instance will use, to make sure each instance uses
4175 - name: bluestore_spdk_coremask
4178 desc: A hexadecimal bit mask of the cores to run on. Note the core numbering can
4179 change between platforms and should be determined beforehand
4181 - name: bluestore_spdk_max_io_completion
4184 desc: Maximum number of I/Os to complete in one batch while checking queue pair completions;
4185 0 means let the SPDK library determine it
4187 - name: bluestore_spdk_io_sleep
4190 desc: Time period to wait if there is no completed I/O from polling
4192 # If you want to use spdk driver, you need to specify NVMe serial number here
4193 # with "spdk:" prefix.
4194 # Users can use 'lspci -vvv -d 8086:0953 | grep "Device Serial Number"' to
4195 # get the serial number of Intel(R) Fultondale NVMe controllers.
4197 # bluestore_block_path = spdk:55cd2e404bd73932
4198 - name: bluestore_block_path
4201 desc: Path to block device/file
4205 - name: bluestore_block_size
4208 desc: Size of file to create for backing bluestore
4213 - name: bluestore_block_create
4216 desc: Create bluestore_block_path if it doesn't exist
4219 - bluestore_block_path
4220 - bluestore_block_size
4224 - name: bluestore_block_db_path
4227 desc: Path for db block device
4231 # rocksdb ssts (hot/warm)
4232 - name: bluestore_block_db_size
4235 desc: Size of file to create for bluestore_block_db_path
4240 - name: bluestore_block_db_create
4243 desc: Create bluestore_block_db_path if it doesn't exist
4246 - bluestore_block_db_path
4247 - bluestore_block_db_size
4251 - name: bluestore_block_wal_path
4254 desc: Path to block device/file backing bluefs wal
4259 - name: bluestore_block_wal_size
4262 desc: Size of file to create for bluestore_block_wal_path
4267 - name: bluestore_block_wal_create
4270 desc: Create bluestore_block_wal_path if it doesn't exist
4273 - bluestore_block_wal_path
4274 - bluestore_block_wal_size
4278 # whether to preallocate space if block/db_path/wal_path is a file rather than a block device.
4279 - name: bluestore_block_preallocate_file
4282 desc: Preallocate file created via bluestore_block*_create
4287 - name: bluestore_ignore_data_csum
4290 desc: Ignore checksum errors on read and do not generate an EIO error
4295 - name: bluestore_csum_type
4298 desc: Default checksum algorithm to use
4299 long_desc: crc32c, xxhash32, and xxhash64 are available. The _16 and _8 variants
4300 use only a subset of the bits for more compact (but less reliable) checksumming.
4301 fmt_desc: The default checksum algorithm to use.
4313 - name: bluestore_retry_disk_reads
4316 desc: Number of read retries on checksum validation error
4317 long_desc: Retry reading data from the disk this many times when checksum validation
4318 fails, in order to handle spurious read errors gracefully.
4325 - name: bluestore_min_alloc_size
4328 desc: Minimum allocation size to allocate for an object
4329 long_desc: A smaller allocation size generally means less data is read and then
4330 rewritten when a copy-on-write operation is triggered (e.g., when writing to something
4331 that was recently snapshotted). Similarly, less data is journaled before performing
4332 an overwrite (writes smaller than min_alloc_size must first pass through the BlueStore
4333 journal). Larger values of min_alloc_size reduce the amount of metadata required
4334 to describe the on-disk layout and reduce overall fragmentation.
4339 - name: bluestore_min_alloc_size_hdd
4342 desc: Default min_alloc_size value for rotational media
4345 - bluestore_min_alloc_size
4349 - name: bluestore_min_alloc_size_ssd
4352 desc: Default min_alloc_size value for non-rotational (solid state) media
4355 - bluestore_min_alloc_size
4359 - name: bluestore_use_optimal_io_size_for_min_alloc_size
4362 desc: Discover media optimal IO Size and use for min_alloc_size
4365 - bluestore_min_alloc_size
4369 - name: bluestore_max_alloc_size
4372 desc: Maximum size of a single allocation (0 for no max)
4377 - name: bluestore_prefer_deferred_size
4380 desc: Writes smaller than this size will be written to the journal and then asynchronously
4381 written to the device. This can be beneficial when using rotational media where
4382 seeks are expensive, and is helpful both with and without solid state journal/wal
4388 - name: bluestore_prefer_deferred_size_hdd
4391 desc: Default bluestore_prefer_deferred_size for rotational media
4394 - bluestore_prefer_deferred_size
4398 - name: bluestore_prefer_deferred_size_ssd
4401 desc: Default bluestore_prefer_deferred_size for non-rotational (solid state) media
4404 - bluestore_prefer_deferred_size
4408 - name: bluestore_compression_mode
4411 desc: Default policy for using compression when pool does not specify
4412 long_desc: '''none'' means never use compression. ''passive'' means use compression
4413 when clients hint that data is compressible. ''aggressive'' means use compression
4414 unless clients hint that data is not compressible. This option is used when the
4415 per-pool property for the compression mode is not present.'
4416 fmt_desc: The default policy for using compression if the per-pool property
4417 ``compression_mode`` is not set. ``none`` means never use
4418 compression. ``passive`` means use compression when
4419 :c:func:`clients hint <rados_set_alloc_hint>` that data is
4420 compressible. ``aggressive`` means use compression unless
4421 clients hint that data is not compressible. ``force`` means use
4422 compression under all circumstances even if the clients hint that
4423 the data is not compressible.
4433 - name: bluestore_compression_algorithm
4436 desc: Default compression algorithm to use when writing object data
4437 long_desc: This controls the default compressor to use (if any) if the per-pool
4438 property is not set. Note that zstd is *not* recommended for bluestore due to
4439 high CPU overhead when compressing small amounts of data.
4440 fmt_desc: The default compressor to use (if any) if the per-pool property
4441 ``compression_algorithm`` is not set. Note that ``zstd`` is *not*
4442 recommended for BlueStore due to high CPU overhead when
4443 compressing small amounts of data.
4454 - name: bluestore_compression_min_blob_size
4457 desc: Maximum chunk size to apply compression to when random access is expected
4459 long_desc: Chunks larger than this are broken into smaller chunks before being compressed
4460 fmt_desc: Chunks smaller than this are never compressed.
4461 The per-pool property ``compression_min_blob_size`` overrides
4467 - name: bluestore_compression_min_blob_size_hdd
4470 desc: Default value of bluestore_compression_min_blob_size for rotational media
4471 fmt_desc: Default value of ``bluestore_compression_min_blob_size``
4472 for rotational media.
4475 - bluestore_compression_min_blob_size
4479 - name: bluestore_compression_min_blob_size_ssd
4482 desc: Default value of bluestore_compression_min_blob_size for non-rotational (solid
4484 fmt_desc: Default value of ``bluestore_compression_min_blob_size``
4485 for non-rotational (solid state) media.
4488 - bluestore_compression_min_blob_size
4492 - name: bluestore_compression_max_blob_size
4495 desc: Maximum chunk size to apply compression to when non-random access is expected
4497 long_desc: Chunks larger than this are broken into smaller chunks before being compressed
4498 fmt_desc: Chunks larger than this value are broken into smaller blobs of at most
4499 ``bluestore_compression_max_blob_size`` bytes before being compressed.
4500 The per-pool property ``compression_max_blob_size`` overrides
4506 - name: bluestore_compression_max_blob_size_hdd
4509 desc: Default value of bluestore_compression_max_blob_size for rotational media
4510 fmt_desc: Default value of ``bluestore_compression_max_blob_size``
4511 for rotational media.
4514 - bluestore_compression_max_blob_size
4518 - name: bluestore_compression_max_blob_size_ssd
4521 desc: Default value of bluestore_compression_max_blob_size for non-rotational (solid
4523 fmt_desc: Default value of ``bluestore_compression_max_blob_size``
4524 for non-rotational (SSD, NVMe) media.
4527 - bluestore_compression_max_blob_size
4531 # Specifies minimum expected amount of saved allocation units
4532 # per single blob to enable compressed blobs garbage collection
4533 - name: bluestore_gc_enable_blob_threshold
4540 # Specifies minimum expected amount of saved allocation units
4541 # per all blobs to enable compressed blobs garbage collection
4542 - name: bluestore_gc_enable_total_threshold
4549 - name: bluestore_max_blob_size
4552 long_desc: Bluestore blobs are collections of extents (i.e., on-disk data) originating
4553 from one or more objects. Blobs can be compressed, typically have checksum data,
4554 may be overwritten, may be shared (with an extent ref map), or split. This setting
4555 controls the maximum size a blob is allowed to be.
4560 - name: bluestore_max_blob_size_hdd
4565 - bluestore_max_blob_size
4569 - name: bluestore_max_blob_size_ssd
4574 - bluestore_max_blob_size
4578 # Require the net gain of compression at least to be at this ratio,
4579 # otherwise we don't compress.
4580 # And ask for compressing at least 12.5%(1/8) off, by default.
4581 - name: bluestore_compression_required_ratio
4584 desc: Compression ratio required to store compressed data
4585 long_desc: If we compress data and get less than this we discard the result and
4586 store the original uncompressed data.
4587 fmt_desc: The ratio of the size of the data chunk after
4588 compression relative to the original size must be at
4589 least this small in order to store the compressed
4595 - name: bluestore_extent_map_shard_max_size
4598 desc: Max size (bytes) for a single extent map shard before splitting
4601 - name: bluestore_extent_map_shard_target_size
4604 desc: Target size (bytes) for a single extent map shard
4607 - name: bluestore_extent_map_shard_min_size
4610 desc: Min size (bytes) for a single extent map shard before merging
4613 - name: bluestore_extent_map_shard_target_size_slop
4616 desc: Ratio above/below target for a shard when trying to align to an existing extent
4620 - name: bluestore_extent_map_inline_shard_prealloc_size
4623 desc: Preallocated buffer for inline shards
4626 - name: bluestore_cache_trim_interval
4629 desc: How frequently we trim the bluestore cache
4632 - name: bluestore_cache_trim_max_skip_pinned
4635 desc: Max pinned cache entries we consider before giving up
4638 - name: bluestore_cache_type
4641 desc: Cache replacement algorithm
4647 - name: bluestore_2q_cache_kin_ratio
4650 desc: 2Q paper suggests .5
4653 - name: bluestore_2q_cache_kout_ratio
4656 desc: 2Q paper suggests .5
4659 - name: bluestore_cache_size
4662 desc: Cache size (in bytes) for BlueStore
4663 long_desc: This includes data and metadata cached by BlueStore as well as memory
4664 devoted to rocksdb's cache(s).
4665 fmt_desc: The amount of memory BlueStore will use for its cache. If zero,
4666 ``bluestore_cache_size_hdd`` or ``bluestore_cache_size_ssd`` will
4670 - name: bluestore_cache_size_hdd
4673 desc: Default bluestore_cache_size for rotational media
4674 fmt_desc: The default amount of memory BlueStore will use for its cache when
4678 - bluestore_cache_size
4680 - name: bluestore_cache_size_ssd
4683 desc: Default bluestore_cache_size for non-rotational (solid state) media
4684 fmt_desc: The default amount of memory BlueStore will use for its cache when
4688 - bluestore_cache_size
4690 - name: bluestore_cache_meta_ratio
4693 desc: Ratio of bluestore cache to devote to metadata
4696 - bluestore_cache_size
4698 - name: bluestore_cache_kv_ratio
4701 desc: Ratio of bluestore cache to devote to key/value database (RocksDB)
4704 - bluestore_cache_size
4706 - name: bluestore_cache_kv_onode_ratio
4709 desc: Ratio of bluestore cache to devote to kv onode column family (rocksdb)
4712 - bluestore_cache_size
4713 - name: bluestore_cache_autotune
4716 desc: Automatically tune the ratio of caches while respecting min values.
4717 fmt_desc: Automatically tune the space ratios assigned to various BlueStore
4718 caches while respecting minimum values.
4721 - bluestore_cache_size
4722 - bluestore_cache_meta_ratio
4723 - name: bluestore_cache_autotune_interval
4726 desc: The number of seconds to wait between rebalances when cache autotune is enabled.
4728 The number of seconds to wait between rebalances when cache autotune
4729 is enabled. This setting changes how quickly the allocation ratios of
4730 various caches are recomputed. Note: Setting this interval too small
4731 can result in high CPU usage and lower performance.
4734 - bluestore_cache_autotune
4735 - name: bluestore_cache_age_bin_interval
4738 desc: The duration (in seconds) represented by a single cache age bin.
4740 The caches used by bluestore will assign cache entries to an 'age bin'
4741 that represents a period of time during which that cache entry was most
4742 recently updated. By binning the caches in this way, Ceph's priority
4743 cache balancing code can make better decisions about which caches should
4744 receive priority based on the relative ages of items in the caches. By
4745 default, a single cache age bin represents 1 second of time. Note:
4746 Setting this interval too small can result in high CPU usage and lower
4750 - bluestore_cache_age_bins_kv
4751 - bluestore_cache_age_bins_kv_onode
4752 - bluestore_cache_age_bins_meta
4753 - bluestore_cache_age_bins_data
4754 - name: bluestore_cache_age_bins_kv
4757 desc: A 10 element, space separated list of age bins for kv cache
4759 A 10 element, space separated list of cache age bins grouped by
4760 priority such that PRI1=[0,n), PRI2=[n,n+1), PRI3=[n+1,n+2) ...
4761 PRI10=[n+8,n+9). Values represent the starting and ending bin for each
4762 priority level. A 0 in the 2nd term will prevent any items from being
4763 associated with that priority. Bin duration is based on the
4764 bluestore_cache_age_bin_interval value. For example,
4765 "1 5 0 0 0 0 0 0 0 0" defines bin ranges for two priority levels. PRI1
4766 contains 1 age bin. Assuming the default age bin interval of 1 second,
4767 PRI1 represents cache items that are less than 1 second old. PRI2 has 4
4768 bins representing cache items that are 1 to less than 5 seconds old. All
4769 other cache items in this example are associated with the lowest priority
4770 level as PRI3-PRI10 all have 0s in their second term.
4771 default: "1 2 6 24 120 720 0 0 0 0"
4773 - bluestore_cache_age_bin_interval
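# Reading of the default "1 2 6 24 120 720 0 0 0 0" for
# bluestore_cache_age_bins_kv above (a sketch that follows the "1 5 0 ..."
# example in its description and assumes the default 1-second
# bluestore_cache_age_bin_interval):
#   PRI1 = bins [0,1)       items less than 1 second old
#   PRI2 = bins [1,2)       1 to less than 2 seconds old
#   PRI3 = bins [2,6)       2 to less than 6 seconds old
#   PRI4 = bins [6,24)      6 to less than 24 seconds old
#   PRI5 = bins [24,120)    24 to less than 120 seconds old
#   PRI6 = bins [120,720)   120 to less than 720 seconds old
#   PRI7-PRI10 are unused (0); older items fall to the lowest priority level.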
4774 - name: bluestore_cache_age_bins_kv_onode
4777 desc: A 10 element, space separated list of age bins for kv onode cache
4779 A 10 element, space separated list of cache age bins grouped by
4780 priority such that PRI1=[0,n), PRI2=[n,n+1), PRI3=[n+1,n+2) ...
4781 PRI10=[n+8,n+9). Values represent the starting and ending bin for each
4782 priority level. A 0 in the 2nd term will prevent any items from being
4783 associated with that priority. Bin duration is based on the
4784 bluestore_cache_age_bin_interval value. For example,
4785 "1 5 0 0 0 0 0 0 0 0" defines bin ranges for two priority levels. PRI1
4786 contains 1 age bin. Assuming the default age bin interval of 1 second,
4787 PRI1 represents cache items that are less than 1 second old. PRI2 has 4
4788 bins representing cache items that are 1 to less than 5 seconds old. All
4789 other cache items in this example are associated with the lowest priority
4790 level as PRI3-PRI10 all have 0s in their second term.
4791 default: "0 0 0 0 0 0 0 0 0 720"
4793 - bluestore_cache_age_bin_interval
4794 - name: bluestore_cache_age_bins_meta
4797 desc: A 10 element, space separated list of age bins for onode cache
4799 A 10 element, space separated list of cache age bins grouped by
4800 priority such that PRI1=[0,n), PRI2=[n,n+1), PRI3=[n+1,n+2) ...
4801 PRI10=[n+8,n+9). Values represent the starting and ending bin for each
4802 priority level. A 0 in the 2nd term will prevent any items from being
4803 associated with that priority. Bin duration is based on the
4804 bluestore_cache_age_bin_interval value. For example,
4805 "1 5 0 0 0 0 0 0 0 0" defines bin ranges for two priority levels. PRI1
4806 contains 1 age bin. Assuming the default age bin interval of 1 second,
4807 PRI1 represents cache items that are less than 1 second old. PRI2 has 4
4808 bins representing cache items that are 1 to less than 5 seconds old. All
4809 other cache items in this example are associated with the lowest priority
4810 level as PRI3-PRI10 all have 0s in their second term.
4811 default: "1 2 6 24 120 720 0 0 0 0"
4813 - bluestore_cache_age_bin_interval
4814 - name: bluestore_cache_age_bins_data
4817 desc: A 10 element, space separated list of age bins for data cache
4819 A 10 element, space separated list of cache age bins grouped by
4820 priority such that PRI1=[0,n), PRI2=[n,n+1), PRI3=[n+1,n+2) ...
4821 PRI10=[n+8,n+9). Values represent the starting and ending bin for each
4822 priority level. A 0 in the 2nd term will prevent any items from being
4823 associated with that priority. Bin duration is based on the
4824 bluestore_cache_age_bin_interval value. For example,
4825 "1 5 0 0 0 0 0 0 0 0" defines bin ranges for two priority levels. PRI1
4826 contains 1 age bin. Assuming the default age bin interval of 1 second,
4827 PRI1 represents cache items that are less than 1 second old. PRI2 has 4
4828 bins representing cache items that are 1 to less than 5 seconds old. All
4829 other cache items in this example are associated with the lowest priority
4830 level as PRI3-PRI10 all have 0s in their second term.
4831 default: "1 2 6 24 120 720 0 0 0 0"
4833 - bluestore_cache_age_bin_interval
4834 - name: bluestore_alloc_stats_dump_interval
4837 desc: The period (in seconds) for logging allocation statistics.
4840 - name: bluestore_kvbackend
4843 desc: Key value database to use for bluestore
4848 - name: bluestore_allocator
4851 desc: Allocator policy
4852 long_desc: Allocator to use for bluestore. Stupid should only be used for testing.
4861 - name: bluestore_freelist_blocks_per_key
4864 desc: Blocks (and bits) per database key
4867 - name: bluestore_bitmapallocator_blocks_per_zone
4872 - name: bluestore_bitmapallocator_span_size
4877 - name: bluestore_max_deferred_txc
4880 desc: Max transactions with deferred writes that can accumulate before we force
4881 flush deferred writes
4884 - name: bluestore_max_defer_interval
4887 desc: max duration to force deferred submit
4890 - name: bluestore_rocksdb_options
4893 desc: Full set of rocksdb settings to override
4894 default: compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824
4896 - name: bluestore_rocksdb_options_annex
4899 desc: An addition to bluestore_rocksdb_options. Allows setting rocksdb options without
4900 repeating the existing defaults.
4902 - name: bluestore_rocksdb_cf
4905 desc: Enable use of rocksdb column families for bluestore metadata
4906 fmt_desc: Enables sharding of BlueStore's RocksDB.
4907 When ``true``, ``bluestore_rocksdb_cfs`` is used.
4908 Only applied when OSD is doing ``--mkfs``.
4912 // This is necessary as Seastar's allocator imposes restrictions
4913 // on the number of threads that enter malloc/free/*. Unfortunately,
4914 // RocksDB sharding in BlueStore dramatically raised the number of
4915 // threads spawned during RocksDB's init.
4916 .set_validator([](std::string *value, std::string *error_message) {
4917 if (const bool parsed_value = strict_strtob(value->c_str(), error_message);
4918 error_message->empty() && parsed_value) {
4919 *error_message = "invalid BlueStore sharding configuration."
4920 " Be aware any change takes effect only on mkfs!";
4927 - name: bluestore_rocksdb_cfs
4930 desc: Definition of column families and their sharding
4931 long_desc: 'Space separated list of elements: column_def [ ''='' rocksdb_options
4932 ]. column_def := column_name [ ''('' shard_count [ '','' hash_begin ''-'' [ hash_end
4933 ] ] '')'' ]. Example: ''I=write_buffer_size=1048576 O(6) m(7,10-)''. Interval
4934 [hash_begin..hash_end) defines characters to use for hash calculation. Recommended
4935 hash ranges: O(0-13) P(0-8) m(0-16). Sharding of S,T,C,M,B prefixes is inadvisable'
4936 fmt_desc: Definition of BlueStore's RocksDB sharding.
4937 The optimal value depends on multiple factors, and modification is inadvisable.
4938 This setting is used only when OSD is doing ``--mkfs``.
4939 Next runs of OSD retrieve sharding from disk.
4940 default: m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P
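# Reading aid (a sketch based only on the grammar in long_desc above, not an
# authoritative parse). The example 'I=write_buffer_size=1048576 O(6) m(7,10-)'
# breaks down as:
#   I=write_buffer_size=1048576   column I, no sharding, one RocksDB option
#   O(6)                          column O, shard_count 6
#   m(7,10-)                      column m, shard_count 7, hash_begin 10, no hash_end
# By the same grammar, the default appears to define m with 3 shards, p with
# 3 shards hashed over [0,12), O with 3 shards hashed over [0,13) plus a
# block_cache option, and L and P without sharding.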
4941 - name: bluestore_qfsck_on_mount
4944 desc: Run quick-fsck at mount comparing allocation-file to RocksDB allocation state
4947 - name: bluestore_fsck_on_mount
4950 desc: Run fsck at mount
4953 - name: bluestore_fsck_on_mount_deep
4956 desc: Run deep fsck at mount when bluestore_fsck_on_mount is set to true
4959 - name: bluestore_fsck_quick_fix_on_mount
4962 desc: Do quick-fix for the store at mount
4965 - name: bluestore_fsck_on_umount
4968 desc: Run fsck at umount
4971 - name: bluestore_allocation_from_file
4974 desc: Remove allocation info from RocksDB and store the info in a new allocation file
4977 - name: bluestore_fsck_on_umount_deep
4980 desc: Run deep fsck at umount when bluestore_fsck_on_umount is set to true
4983 - name: bluestore_fsck_on_mkfs
4986 desc: Run fsck after mkfs
4989 - name: bluestore_fsck_on_mkfs_deep
4992 desc: Run deep fsck after mkfs
4995 - name: bluestore_sync_submit_transaction
4998 desc: Try to submit metadata transaction to rocksdb in queuing thread context
5001 - name: bluestore_fsck_read_bytes_cap
5004 desc: Maximum bytes read at once by deep fsck
5009 - name: bluestore_fsck_quick_fix_threads
5012 desc: Number of additional threads to perform quick-fix (shallow fsck) command
5015 - name: bluestore_fsck_shared_blob_tracker_size
5018 desc: Size (a fraction of osd_memory_target, defaults to 128MB) of the hash table used to track shared blob ref counts. A larger table makes the tracker more precise, which reduces overhead during repair.
5024 - name: bluestore_throttle_bytes
5027 desc: Maximum bytes in flight before we throttle IO submission
5032 - name: bluestore_throttle_deferred_bytes
5035 desc: Maximum bytes for deferred writes before we throttle IO submission
5040 - name: bluestore_throttle_cost_per_io
5043 desc: Overhead added to transaction cost (in bytes) for each IO
5048 - name: bluestore_throttle_cost_per_io_hdd
5051 desc: Default bluestore_throttle_cost_per_io for rotational media
5054 - bluestore_throttle_cost_per_io
5058 - name: bluestore_throttle_cost_per_io_ssd
5061 desc: Default bluestore_throttle_cost_per_io for non-rotational (solid state) media
5064 - bluestore_throttle_cost_per_io
5068 - name: bluestore_deferred_batch_ops
5071 desc: Max number of deferred writes before we flush the deferred write queue
5078 - name: bluestore_deferred_batch_ops_hdd
5081 desc: Default bluestore_deferred_batch_ops for rotational media
5084 - bluestore_deferred_batch_ops
5090 - name: bluestore_deferred_batch_ops_ssd
5093 desc: Default bluestore_deferred_batch_ops for non-rotational (solid state) media
5096 - bluestore_deferred_batch_ops
5102 - name: bluestore_nid_prealloc
5105 desc: Number of unique object ids to preallocate at a time
5108 - name: bluestore_blobid_prealloc
5111 desc: Number of unique blob ids to preallocate at a time
5114 - name: bluestore_clone_cow
5117 desc: Use copy-on-write when cloning objects (versus reading and rewriting them
5123 - name: bluestore_default_buffered_read
5126 desc: Cache read results by default (unless hinted NOCACHE or WONTNEED)
5131 - name: bluestore_default_buffered_write
5134 desc: Cache writes by default (unless hinted NOCACHE or WONTNEED)
5139 - name: bluestore_debug_no_reuse_blocks
5144 - name: bluestore_debug_small_allocations
5149 - name: bluestore_debug_too_many_blobs_threshold
5154 - name: bluestore_debug_freelist
5159 - name: bluestore_debug_prefill
5162 desc: simulate fragmentation
5165 - name: bluestore_debug_prefragment_max
5170 - name: bluestore_debug_inject_read_err
5175 - name: bluestore_debug_randomize_serial_transaction
5180 - name: bluestore_debug_omit_block_device_write
5185 - name: bluestore_debug_fsck_abort
5190 - name: bluestore_debug_omit_kv_commit
5195 - name: bluestore_debug_permit_any_bdev_label
5200 - name: bluestore_debug_random_read_err
5205 - name: bluestore_debug_inject_bug21040
5210 - name: bluestore_debug_inject_csum_err_probability
5213 desc: inject crc verification errors into bluestore device reads
5216 - name: bluestore_debug_legacy_omap
5219 desc: Allows mkfs to create OSD in legacy OMAP naming mode (neither per-pool nor per-pg).
5220 This is intended primarily for developers' purposes. The resulting OSD might/would
5221 be transformed to the currently default 'per-pg' format when BlueStore's quick-fix or
5225 - name: bluestore_fsck_error_on_no_per_pool_stats
5228 desc: Make fsck error (instead of warn) when bluestore lacks per-pool stats, e.g.,
5232 - name: bluestore_warn_on_bluefs_spillover
5235 desc: Enable health indication on bluefs slow device usage
5238 - name: bluestore_warn_on_legacy_statfs
5241 desc: Enable health indication on lack of per-pool statfs reporting from bluestore
5244 - name: bluestore_warn_on_spurious_read_errors
5247 desc: Enable health indication when spurious read errors are observed by OSD
5250 - name: bluestore_fsck_error_on_no_per_pool_omap
5253 desc: Make fsck error (instead of warn) when objects without per-pool omap are found
5256 - name: bluestore_fsck_error_on_no_per_pg_omap
5259 desc: Make fsck error (instead of warn) when objects without per-pg omap are found
5262 - name: bluestore_warn_on_no_per_pool_omap
5265 desc: Enable health indication on lack of per-pool omap
5268 - name: bluestore_warn_on_no_per_pg_omap
5271 desc: Enable health indication on lack of per-pg omap
5274 - name: bluestore_log_op_age
5277 desc: log operation if it's slower than this age (seconds)
5280 - name: bluestore_log_omap_iterator_age
5283 desc: log omap iteration operation if it's slower than this age (seconds)
5286 - name: bluestore_log_collection_list_age
5289 desc: log collection list operation if it's slower than this age (seconds)
5292 - name: bluestore_debug_enforce_settings
5295 desc: Enforces specific hw profile settings
5296 long_desc: '''hdd'' enforces settings intended for BlueStore above a rotational
5297 drive. ''ssd'' enforces settings intended for BlueStore above a solid state drive. ''default''
5298 uses settings for the actual hardware.'
5305 - name: bluestore_avl_alloc_ff_max_search_count
5308 desc: Search for this many ranges in first-fit mode before switching over to
5309 best-fit mode. Set to 0 to iterate through all ranges for the required chunk.
5311 - name: bluestore_avl_alloc_ff_max_search_bytes
5314 desc: Maximum distance to search in first-fit mode before switching over to
5315 best-fit mode. Set to 0 to iterate through all ranges for the required chunk.
5317 - name: bluestore_avl_alloc_bf_threshold
5320 desc: Sets threshold at which shrinking max free chunk size triggers enabling best-fit
5322 long_desc: 'AVL allocator works in two modes: near-fit and best-fit. By default,
5323 it uses very fast near-fit mode, in which it tries to fit a new block near the
5324 last allocated block of similar size. The second mode is much slower best-fit
5325 mode, in which it tries to find an exact match for the requested allocation. This
5326 mode is used when either the device gets fragmented or when it is low on free
5327 space. When the largest free block is smaller than ''bluestore_avl_alloc_bf_threshold'',
5328 best-fit mode is used.'
5331 - bluestore_avl_alloc_bf_free_pct
5332 - name: bluestore_avl_alloc_bf_free_pct
5335 desc: Sets threshold at which shrinking free space (in %, integer) triggers enabling
5337 long_desc: 'AVL allocator works in two modes: near-fit and best-fit. By default,
5338 it uses very fast near-fit mode, in which it tries to fit a new block near the
5339 last allocated block of similar size. The second mode is much slower best-fit
5340 mode, in which it tries to find an exact match for the requested allocation. This
5341 mode is used when either the device gets fragmented or when it is low on free
5342 space. When free space is smaller than ''bluestore_avl_alloc_bf_free_pct'', best-fit
5346 - bluestore_avl_alloc_bf_threshold
5347 - name: bluestore_hybrid_alloc_mem_cap
5350 desc: Maximum RAM hybrid allocator should use before enabling bitmap supplement
5352 - name: bluestore_volume_selection_policy
5355 desc: Determines bluefs volume selection policy
5356 long_desc: Determines bluefs volume selection policy. The 'use_some_extra' policy allows
5357 overriding RocksDB level granularity and placing a higher level's data on the faster device
5358 even when the level doesn't completely fit there. The 'fit_to_fast' policy enables
5359 using 100% of the faster disk's capacity and allows the user to turn on the 'level_compaction_dynamic_level_bytes'
5360 option in RocksDB options.
5361 default: use_some_extra
5367 - name: bluestore_volume_selection_reserved_factor
5370 desc: DB level size multiplier. Determines the amount of space on the DB device that is
5371 excluded from use when the 'use_some_extra' policy is in effect. Reserved size is determined
5372 as sum(L_max_size[0], L_max_size[L-1]) + L_max_size[L] * this_factor
5377 - name: bluestore_volume_selection_reserved
5380 desc: Space reserved on the DB device that the 'use_some_extra' policy may not use.
5381 Overrides the 'bluestore_volume_selection_reserved_factor' setting and introduces
5382 a straightforward limit.
5390 desc: Enables Linux io_uring API instead of libaio
5392 - name: bdev_ioring_hipri
5395 desc: Use polled IO completions with the Linux io_uring API
5397 - name: bdev_ioring_sqthread_poll
5400 desc: Offload Linux io_uring submission/completion to a kernel thread
5402 - name: bluestore_kv_sync_util_logging_s
5405 desc: KV sync thread utilization logging period
5406 long_desc: How often (in seconds) to print KV sync thread utilization. Nothing is logged
5407 when set to 0 or when utilization is 0%
5412 - name: bluestore_fail_eio
5415 desc: fail/crash on EIO
5416 long_desc: whether bluestore osd fails on eio
5421 - name: bluestore_zero_block_detection
5424 desc: punch holes instead of writing zeros
5425 long_desc: Intended for large-scale synthetic testing. Currently this is implemented
5426 with punch hole semantics, affecting the logical extent map of the object. This does
5427 not interact well with some RBD and CephFS features.
5432 - name: kstore_max_ops
5437 - name: kstore_max_bytes
5442 - name: kstore_backend
5447 - name: kstore_rocksdb_options
5450 desc: Options to pass through when RocksDB is used as the KeyValueDB for kstore.
5451 default: compression=kNoCompression
5453 - name: kstore_fsck_on_mount
5456 desc: Whether or not to run fsck on mount for kstore.
5459 - name: kstore_fsck_on_mount_deep
5462 desc: Whether or not to run deep fsck on mount for kstore
5465 - name: kstore_nid_prealloc
5470 - name: kstore_sync_transaction
5475 - name: kstore_sync_submit_transaction
5480 - name: kstore_onode_map_size
5485 - name: kstore_default_stripe_size
5490 # rocksdb options that will be used for omap (if omap_backend is rocksdb)
5491 - name: filestore_rocksdb_options
5494 desc: Options to pass through when RocksDB is used as the KeyValueDB for filestore.
5495 default: max_background_jobs=10,compaction_readahead_size=2097152,compression=kNoCompression
5497 - name: filestore_omap_backend
5500 desc: The KeyValueDB to use for filestore metadata (i.e., omap).
5506 - name: filestore_omap_backend_path
5509 desc: The path where the filestore KeyValueDB should store its database(s).
5511 # filestore wb throttle limits
5512 - name: filestore_wbthrottle_enable
5515 desc: Enable throttling of operations to the backing file system
5518 - name: filestore_wbthrottle_btrfs_bytes_start_flusher
5521 desc: Start flushing (fsyncing) when this many bytes are written (btrfs)
5524 - name: filestore_wbthrottle_btrfs_bytes_hard_limit
5527 desc: Block writes when this many bytes haven't been flushed (fsynced) (btrfs)
5530 - name: filestore_wbthrottle_btrfs_ios_start_flusher
5533 desc: Start flushing (fsyncing) when this many IOs are written (btrfs)
5536 - name: filestore_wbthrottle_btrfs_ios_hard_limit
5539 desc: Block writes when this many IOs haven't been flushed (fsynced) (btrfs)
5542 - name: filestore_wbthrottle_btrfs_inodes_start_flusher
5545 desc: Start flushing (fsyncing) when this many distinct inodes have been modified
5549 - name: filestore_wbthrottle_xfs_bytes_start_flusher
5552 desc: Start flushing (fsyncing) when this many bytes are written (xfs)
5555 - name: filestore_wbthrottle_xfs_bytes_hard_limit
5558 desc: Block writes when this many bytes haven't been flushed (fsynced) (xfs)
5561 - name: filestore_wbthrottle_xfs_ios_start_flusher
5564 desc: Start flushing (fsyncing) when this many IOs are written (xfs)
5567 - name: filestore_wbthrottle_xfs_ios_hard_limit
5570 desc: Block writes when this many IOs haven't been flushed (fsynced) (xfs)
5573 - name: filestore_wbthrottle_xfs_inodes_start_flusher
5576 desc: Start flushing (fsyncing) when this many distinct inodes have been modified
5580 # These must be less than the fd limit
5581 - name: filestore_wbthrottle_btrfs_inodes_hard_limit
5584 desc: Block writing when this many inodes have outstanding writes (btrfs)
5587 - name: filestore_wbthrottle_xfs_inodes_hard_limit
5590 desc: Block writing when this many inodes have outstanding writes (xfs)
5593 # Introduce a O_DSYNC write in the filestore
5594 - name: filestore_odsync_write
5597 desc: Write with O_DSYNC
5600 # Tests index failure paths
5601 - name: filestore_index_retry_probability
5606 # Allow object read error injection
5607 - name: filestore_debug_inject_read_err
5612 - name: filestore_debug_random_read_err
5617 # Expensive debugging check on sync
5618 - name: filestore_debug_omap_check
5622 fmt_desc: Debugging check on synchronization. This is an expensive operation.
5625 - name: filestore_omap_header_cache_size
5630 # Use omap for xattrs for attrs over
5631 # filestore_max_inline_xattr_size or
5632 - name: filestore_max_inline_xattr_size
5637 - name: filestore_max_inline_xattr_size_xfs
5642 - name: filestore_max_inline_xattr_size_btrfs
5647 - name: filestore_max_inline_xattr_size_other
5652 # for more than filestore_max_inline_xattrs attrs
5653 - name: filestore_max_inline_xattrs
5658 - name: filestore_max_inline_xattrs_xfs
5663 - name: filestore_max_inline_xattrs_btrfs
5668 - name: filestore_max_inline_xattrs_other
5673 - name: filestore_max_xattr_value_size
5678 - name: filestore_max_xattr_value_size_xfs
5683 - name: filestore_max_xattr_value_size_btrfs
5688 # ext4 allows 4k xattrs total including some smallish extra fields and the
5689 # keys. We're allowing 2 512 inline attrs in addition to some filestore
5690 # replay attrs. After accounting for those, we still need to fit up to
5691 # two attrs of this value. That means we need this value to be around 1k
5692 # to be safe. This is hacky, but it's not worth complicating the code
5693 # to work around ext4's total xattr limit.
5694 - name: filestore_max_xattr_value_size_other
5700 - name: filestore_sloppy_crc
5705 - name: filestore_sloppy_crc_block_size
5710 - name: filestore_max_alloc_hint_size
5716 - name: filestore_max_sync_interval
5719 desc: Period between calls to syncfs(2) and journal trims (seconds)
5723 - name: filestore_min_sync_interval
5726 desc: Minimum period between calls to syncfs(2)
5729 - name: filestore_btrfs_snap
5734 - name: filestore_btrfs_clone_range
5737 desc: Use btrfs clone_range ioctl to efficiently duplicate objects
5740 # zfsonlinux is still unstable
5741 - name: filestore_zfs_snap
5746 - name: filestore_fsync_flushes_journal_data
5751 # (try to) use fiemap
5752 - name: filestore_fiemap
5755 desc: Use fiemap ioctl(2) to determine which parts of objects are sparse
5758 - name: filestore_punch_hole
5761 desc: Use fallocate(2) FALLOC_FL_PUNCH_HOLE to efficiently zero ranges of objects
5764 # (try to) use seek_data/hole
5765 - name: filestore_seek_data_hole
5768 desc: Use lseek(2) SEEK_HOLE and SEEK_DATA to determine which parts of objects are
5772 - name: filestore_splice
5775 desc: Use splice(2) to more efficiently copy data between files
5778 - name: filestore_fadvise
5781 desc: Use posix_fadvise(2) to pass hints to file system
5784 # collect device partition information for management application to use
5785 - name: filestore_collect_device_partition_information
5788 desc: Collect metadata about the backing file system on OSD startup
5791 # (try to) use extsize for alloc hint NOTE: extsize seems to trigger
5792 # data corruption in xfs prior to kernel 3.5. filestore will
5793 # implicitly disable this if it cannot confirm the kernel is newer
5795 # NOTE: This option involves a tradeoff: When disabled, fragmentation is
5796 # worse, but large sequential writes are faster. When enabled, large
5797 # sequential writes are slower, but fragmentation is reduced.
5798 - name: filestore_xfs_extsize
5801 desc: Use XFS extsize ioctl(2) to hint allocator about expected write sizes
5804 - name: filestore_journal_parallel
5809 - name: filestore_journal_writeahead
5814 - name: filestore_journal_trailing
5819 - name: filestore_queue_max_ops
5822 desc: Max IO operations in flight
5825 - name: filestore_queue_max_bytes
5828 desc: Max (written) bytes in flight
5831 - name: filestore_caller_concurrency
5836 # Expected filestore throughput in B/s
5837 - name: filestore_expected_throughput_bytes
5840 desc: Expected throughput of backend device (aids throttling calculations)
5843 # Expected filestore throughput in ops/s
5844 - name: filestore_expected_throughput_ops
5847 desc: Expected throughput of backend device in IOPS (aids throttling calculations)
5850 # Filestore max delay multiple. Defaults to 0 (disabled)
5851 - name: filestore_queue_max_delay_multiple
5856 # Filestore high delay multiple. Defaults to 0 (disabled)
5857 - name: filestore_queue_high_delay_multiple
5862 # Filestore max delay multiple bytes. Defaults to 0 (disabled)
5863 - name: filestore_queue_max_delay_multiple_bytes
5868 # Filestore high delay multiple bytes. Defaults to 0 (disabled)
5869 - name: filestore_queue_high_delay_multiple_bytes
5874 # Filestore max delay multiple ops. Defaults to 0 (disabled)
5875 - name: filestore_queue_max_delay_multiple_ops
5880 # Filestore high delay multiple ops. Defaults to 0 (disabled)
5881 - name: filestore_queue_high_delay_multiple_ops
5886 - name: filestore_queue_low_threshhold
5891 - name: filestore_queue_high_threshhold
5896 - name: filestore_op_threads
5899 desc: Threads used to apply changes to backing file system
5902 - name: filestore_op_thread_timeout
5905 desc: Seconds before a worker thread is considered stalled
5908 - name: filestore_op_thread_suicide_timeout
5911 desc: Seconds before a worker thread is considered dead
5914 - name: filestore_commit_timeout
5917 desc: Seconds before backing file system is considered hung
5920 - name: filestore_fiemap_threshold
5925 - name: filestore_merge_threshold
5930 - name: filestore_split_multiple
5935 - name: filestore_split_rand_factor
5940 - name: filestore_update_to
5945 - name: filestore_blackhole
5950 - name: filestore_fd_cache_size
5955 - name: filestore_fd_cache_shards
5960 - name: filestore_ondisk_finisher_threads
5965 - name: filestore_apply_finisher_threads
5970 # file onto which store transaction dumps
5971 - name: filestore_dump_file
5975 # inject a failure at the n'th opportunity
5976 - name: filestore_kill_at
5981 # artificially stall for N seconds in op queue thread
5982 - name: filestore_inject_stall
5988 - name: filestore_fail_eio
5993 - name: filestore_debug_verify_split
6002 fmt_desc: Enables direct i/o to the journal. Requires ``journal block
6003 align`` set to ``true``.
6009 fmt_desc: Enables using ``libaio`` for asynchronous writes to the journal.
6010 Requires ``journal dio`` set to ``true``. Version 0.61 and later, ``true``.
6011 Version 0.60 and earlier, ``false``.
6013 - name: journal_force_aio
6018 - name: journal_block_size
6023 - name: journal_block_align
6027 fmt_desc: Block aligns write operations. Required for ``dio`` and ``aio``.
6029 - name: journal_write_header_frequency
6034 - name: journal_max_write_bytes
6037 desc: Max bytes in flight to journal
6038 fmt_desc: The maximum number of bytes the journal will write at
6042 - name: journal_max_write_entries
6045 desc: Max IOs in flight to journal
6046 fmt_desc: The maximum number of entries the journal will write at
6050 # Target range for journal fullness
6051 - name: journal_throttle_low_threshhold
6056 - name: journal_throttle_high_threshhold
6061 # Multiple over expected at high_threshhold. Defaults to 0 (disabled).
6062 - name: journal_throttle_high_multiple
6067 # Multiple over expected at max. Defaults to 0 (disabled).
6068 - name: journal_throttle_max_multiple
6073 # align data payloads >= this.
6074 - name: journal_align_min_size
6078 fmt_desc: Align data payloads greater than the specified minimum.
6080 - name: journal_replay_from
6085 - name: journal_zero_on_create
6090 Causes the file store to overwrite the entire journal with
6091 ``0``'s during ``mkfs``.
6093 # assume journal is not corrupt
6094 - name: journal_ignore_corruption
6099 # when using an SSD as the journal, whether to discard unused journal data.
6100 - name: journal_discard
6105 # fio data directory for fio-objectstore
6111 - name: rados_mon_op_timeout
6114 desc: timeout for operations handled by monitors such as statfs (0 is unlimited)
6119 - name: rados_osd_op_timeout
6122 desc: timeout for operations handled by osds such as write (0 is unlimited)
6127 # true if LTTng-UST tracepoints should be enabled
6128 - name: rados_tracing
6133 - name: mgr_connect_retry_interval
6139 - name: mgr_client_service_daemon_unregister_timeout
6142 desc: Time to wait during shutdown to deregister service with mgr
6144 - name: throttler_perf_counter
6149 - name: event_tracing
6154 - name: bluestore_tracing
6157 desc: Enable bluestore event tracing.
6159 - name: bluestore_throttle_trace_rate
6162 desc: Rate at which to sample bluestore transactions (per second)
6164 - name: debug_deliberately_leak_memory
6169 - name: debug_asserts_on_shutdown
6172 desc: Enable certain asserts to check for refcounting bugs on shutdown; see http://tracker.ceph.com/issues/21738
6174 - name: debug_asok_assert_abort
6177 desc: allow commands 'assert' and 'abort' via asok for testing crash dumps etc
6180 - name: target_max_misplaced_ratio
6183 desc: Max ratio of misplaced objects to target when throttling data rebalancing
6186 - name: device_failure_prediction_mode
6189 desc: Method used to predict device failures
6190 long_desc: To disable prediction, use 'none'. 'local' uses a prediction model that
6191 runs inside the mgr daemon. 'cloud' will share metrics with a cloud service and
6192 query the service for device life expectancy.
6200 - name: gss_ktab_client_file
6203 desc: GSS/KRB5 Keytab file for client authentication
6204 long_desc: This sets the full path for the GSS/Kerberos client keytab file location.
6205 default: /var/lib/ceph/$name/gss_client_$name.ktab
6209 - name: gss_target_name
6212 long_desc: This sets the gss target service name.
6217 - name: debug_disable_randomized_ping
6220 desc: Disable heartbeat ping randomization for testing purposes
6222 - name: debug_heartbeat_testing_span
6225 desc: Override 60 second periods for testing only
6227 - name: librados_thread_count
6230 desc: Size of thread pool for Objecter
6235 - name: osd_asio_thread_count
6238 desc: Size of thread pool for ASIO completions
6243 - name: cephsqlite_lock_renewal_interval
6246 desc: number of milliseconds before lock is renewed
6251 - cephsqlite_lock_renewal_timeout
6253 - name: cephsqlite_lock_renewal_timeout
6256 desc: number of milliseconds before transaction lock times out
6257 long_desc: The amount of time a running libcephsqlite VFS connection has
6258 to renew a lock on the database before the lock is automatically lost. If the
6259 lock is lost, the VFS will abort the process to prevent database corruption.
6264 - cephsqlite_lock_renewal_interval
6266 - name: cephsqlite_blocklist_dead_locker
6269 desc: blocklist the last dead owner of the database lock
6270 long_desc: Require that the Ceph SQLite VFS blocklist the last dead owner of the
6271 database when cleanup was incomplete. DO NOT CHANGE THIS UNLESS YOU UNDERSTAND
6272 THE RAMIFICATIONS. CORRUPTION MAY RESULT.
6279 desc: Explicitly set the device type to select the driver if it's needed
6285 - name: bluestore_cleaner_sleep_interval
6288 desc: How long cleaner should sleep before re-checking utilization
6291 - name: jaeger_tracing_enable
6294 desc: Whether Ceph should use the Jaeger tracing system
6300 - name: mgr_ttl_cache_expire_seconds
6303 desc: Set the time to live in seconds - set to 0 to disable the cache.