Imported Upstream version 1.9.0+dfsg1

diff --git a/src/jemalloc/doc/jemalloc.xml.in b/src/jemalloc/doc/jemalloc.xml.in
index 1f692f78b923a53f7924bb2a481109011c669fca..bc5dbd1d7a9f60930ce92dc3c1ae0941320088d6 100644
--- a/src/jemalloc/doc/jemalloc.xml.in
+++ b/src/jemalloc/doc/jemalloc.xml.in
@@ -242,7 +242,7 @@
        relevant.  Use bitwise or (<code language="C">|</code>) operations to
        specify one or more of the following:
          <variablelist>
-          <varlistentry>
+          <varlistentry id="MALLOCX_LG_ALIGN">
              <term><constant>MALLOCX_LG_ALIGN(<parameter>la</parameter>)
              </constant></term>
  
@@ -252,7 +252,7 @@
              that <parameter>la</parameter> is within the valid
              range.</para></listitem>
            </varlistentry>
-          <varlistentry>
+          <varlistentry id="MALLOCX_ALIGN">
              <term><constant>MALLOCX_ALIGN(<parameter>a</parameter>)
              </constant></term>
  
@@ -262,7 +262,7 @@
              validate that <parameter>a</parameter> is a power of 2.
              </para></listitem>
            </varlistentry>
-          <varlistentry>
+          <varlistentry id="MALLOCX_ZERO">
              <term><constant>MALLOCX_ZERO</constant></term>
  
              <listitem><para>Initialize newly allocated memory to contain zero
@@ -271,16 +271,38 @@
              that are initialized to contain zero bytes.  If this macro is
              absent, newly allocated memory is uninitialized.</para></listitem>
            </varlistentry>
-          <varlistentry>
+          <varlistentry id="MALLOCX_TCACHE">
+            <term><constant>MALLOCX_TCACHE(<parameter>tc</parameter>)
+            </constant></term>
+
+            <listitem><para>Use the thread-specific cache (tcache) specified by
+            the identifier <parameter>tc</parameter>, which must have been
+            acquired via the <link
+            linkend="tcache.create"><mallctl>tcache.create</mallctl></link>
+            mallctl.  This macro does not validate that
+            <parameter>tc</parameter> specifies a valid
+            identifier.</para></listitem>
+          </varlistentry>
+          <varlistentry id="MALLOC_TCACHE_NONE">
+            <term><constant>MALLOCX_TCACHE_NONE</constant></term>
+
+            <listitem><para>Do not use a thread-specific cache (tcache).  Unless
+            <constant>MALLOCX_TCACHE(<parameter>tc</parameter>)</constant> or
+            <constant>MALLOCX_TCACHE_NONE</constant> is specified, an
+            automatically managed tcache will be used under many circumstances.
+            This macro cannot be used in the same <parameter>flags</parameter>
+            argument as
+            <constant>MALLOCX_TCACHE(<parameter>tc</parameter>)</constant>.</para></listitem>
+          </varlistentry>
+          <varlistentry id="MALLOCX_ARENA">
              <term><constant>MALLOCX_ARENA(<parameter>a</parameter>)
              </constant></term>
  
              <listitem><para>Use the arena specified by the index
-            <parameter>a</parameter> (and by necessity bypass the thread
-            cache).  This macro has no effect for regions that were allocated
-            via an arena other than the one specified.  This macro does not
-            validate that <parameter>a</parameter> specifies an arena index in
-            the valid range.</para></listitem>
+            <parameter>a</parameter>.  This macro has no effect for regions that
+            were allocated via an arena other than the one specified.  This
+            macro does not validate that <parameter>a</parameter> specifies an
+            arena index in the valid range.</para></listitem>
            </varlistentry>
          </variablelist>
        </para>
@@ -288,16 +310,14 @@
        <para>The <function>mallocx<parameter/></function> function allocates at
        least <parameter>size</parameter> bytes of memory, and returns a pointer
        to the base address of the allocation.  Behavior is undefined if
-      <parameter>size</parameter> is <constant>0</constant>, or if request size
-      overflows due to size class and/or alignment constraints.</para>
+      <parameter>size</parameter> is <constant>0</constant>.</para>
  
        <para>The <function>rallocx<parameter/></function> function resizes the
        allocation at <parameter>ptr</parameter> to be at least
        <parameter>size</parameter> bytes, and returns a pointer to the base
        address of the resulting allocation, which may or may not have moved from
        its original location.  Behavior is undefined if
-      <parameter>size</parameter> is <constant>0</constant>, or if request size
-      overflows due to size class and/or alignment constraints.</para>
+      <parameter>size</parameter> is <constant>0</constant>.</para>
  
        <para>The <function>xallocx<parameter/></function> function resizes the
        allocation at <parameter>ptr</parameter> in place to be at least
@@ -332,10 +352,10 @@
        memory, but it performs the same size computation as the
        <function>mallocx<parameter/></function> function, and returns the real
        size of the allocation that would result from the equivalent
-      <function>mallocx<parameter/></function> function call.  Behavior is
-      undefined if <parameter>size</parameter> is <constant>0</constant>, or if
-      request size overflows due to size class and/or alignment
-      constraints.</para>
+      <function>mallocx<parameter/></function> function call, or
+      <constant>0</constant> if the inputs exceed the maximum supported size
+      class and/or alignment.  Behavior is undefined if
+      <parameter>size</parameter> is <constant>0</constant>.</para>
  
        <para>The <function>mallctl<parameter/></function> function provides a
        general interface for introspecting the memory allocator, as well as
@@ -406,11 +426,12 @@ for (i = 0; i < nbins; i++) {
        functions simultaneously.  If <option>--enable-stats</option> is
        specified during configuration, &ldquo;m&rdquo; and &ldquo;a&rdquo; can
        be specified to omit merged arena and per arena statistics, respectively;
-      &ldquo;b&rdquo; and &ldquo;l&rdquo; can be specified to omit per size
-      class statistics for bins and large objects, respectively.  Unrecognized
-      characters are silently ignored.  Note that thread caching may prevent
-      some statistics from being completely up to date, since extra locking
-      would be required to merge counters that track thread cache operations.
+      &ldquo;b&rdquo;, &ldquo;l&rdquo;, and &ldquo;h&rdquo; can be specified to
+      omit per size class statistics for bins, large objects, and huge objects,
+      respectively.  Unrecognized characters are silently ignored.  Note that
+      thread caching may prevent some statistics from being completely up to
+      date, since extra locking would be required to merge counters that track
+      thread cache operations.
        </para>
  
        <para>The <function>malloc_usable_size<parameter/></function> function
@@ -432,19 +453,20 @@ for (i = 0; i < nbins; i++) {
      routines, the allocator initializes its internals based in part on various
      options that can be specified at compile- or run-time.</para>
  
-    <para>The string pointed to by the global variable
-    <varname>malloc_conf</varname>, the &ldquo;name&rdquo; of the file
-    referenced by the symbolic link named <filename
-    class="symlink">/etc/malloc.conf</filename>, and the value of the
+    <para>The string specified via <option>--with-malloc-conf</option>, the
+    string pointed to by the global variable <varname>malloc_conf</varname>, the
+    &ldquo;name&rdquo; of the file referenced by the symbolic link named
+    <filename class="symlink">/etc/malloc.conf</filename>, and the value of the
      environment variable <envar>MALLOC_CONF</envar>, will be interpreted, in
      that order, from left to right as options.  Note that
      <varname>malloc_conf</varname> may be read before
      <function>main<parameter/></function> is entered, so the declaration of
      <varname>malloc_conf</varname> should specify an initializer that contains
-    the final value to be read by jemalloc.  <varname>malloc_conf</varname> is
-    a compile-time setting, whereas <filename
-    class="symlink">/etc/malloc.conf</filename> and <envar>MALLOC_CONF</envar>
-    can be safely set any time prior to program invocation.</para>
+    the final value to be read by jemalloc.  <option>--with-malloc-conf</option>
+    and <varname>malloc_conf</varname> are compile-time mechanisms, whereas
+    <filename class="symlink">/etc/malloc.conf</filename> and
+    <envar>MALLOC_CONF</envar> can be safely set any time prior to program
+    invocation.</para>
  
      <para>An options string is a comma-separated list of option:value pairs.
      There is one key corresponding to each <link
@@ -494,39 +516,32 @@ for (i = 0; i < nbins; i++) {
      common case, but it increases memory usage and fragmentation, since a
      bounded number of objects can remain allocated in each thread cache.</para>
  
-    <para>Memory is conceptually broken into equal-sized chunks, where the
-    chunk size is a power of two that is greater than the page size.  Chunks
-    are always aligned to multiples of the chunk size.  This alignment makes it
-    possible to find metadata for user objects very quickly.</para>
-
-    <para>User objects are broken into three categories according to size:
-    small, large, and huge.  Small objects are smaller than one page.  Large
-    objects are smaller than the chunk size.  Huge objects are a multiple of
-    the chunk size.  Small and large objects are managed entirely by arenas;
-    huge objects are additionally aggregated in a single data structure that is
-    shared by all threads.  Huge objects are typically used by applications
-    infrequently enough that this single data structure is not a scalability
-    issue.</para>
-
-    <para>Each chunk that is managed by an arena tracks its contents as runs of
+    <para>Memory is conceptually broken into equal-sized chunks, where the chunk
+    size is a power of two that is greater than the page size.  Chunks are
+    always aligned to multiples of the chunk size.  This alignment makes it
+    possible to find metadata for user objects very quickly.  User objects are
+    broken into three categories according to size: small, large, and huge.
+    Multiple small and large objects can reside within a single chunk, whereas
+    huge objects each have one or more chunks backing them.  Each chunk that
+    contains small and/or large objects tracks its contents as runs of
      contiguous pages (unused, backing a set of small objects, or backing one
-    large object).  The combination of chunk alignment and chunk page maps
-    makes it possible to determine all metadata regarding small and large
-    allocations in constant time.</para>
+    large object).  The combination of chunk alignment and chunk page maps makes
+    it possible to determine all metadata regarding small and large allocations
+    in constant time.</para>
  
      <para>Small objects are managed in groups by page runs.  Each run maintains
-    a frontier and free list to track which regions are in use.  Allocation
-    requests that are no more than half the quantum (8 or 16, depending on
-    architecture) are rounded up to the nearest power of two that is at least
-    <code language="C">sizeof(<type>double</type>)</code>.  All other small
-    object size classes are multiples of the quantum, spaced such that internal
-    fragmentation is limited to approximately 25% for all but the smallest size
-    classes.  Allocation requests that are larger than the maximum small size
-    class, but small enough to fit in an arena-managed chunk (see the <link
-    linkend="opt.lg_chunk"><mallctl>opt.lg_chunk</mallctl></link> option), are
-    rounded up to the nearest run size.  Allocation requests that are too large
-    to fit in an arena-managed chunk are rounded up to the nearest multiple of
-    the chunk size.</para>
+    a bitmap to track which regions are in use.  Allocation requests that are no
+    more than half the quantum (8 or 16, depending on architecture) are rounded
+    up to the nearest power of two that is at least <code
+    language="C">sizeof(<type>double</type>)</code>.  All other object size
+    classes are multiples of the quantum, spaced such that there are four size
+    classes for each doubling in size, which limits internal fragmentation to
+    approximately 20% for all but the smallest size classes.  Small size classes
+    are smaller than four times the page size, large size classes are smaller
+    than the chunk size (see the <link
+    linkend="opt.lg_chunk"><mallctl>opt.lg_chunk</mallctl></link> option), and
+    huge size classes extend from the chunk size up to one size class less than
+    the full address space size.</para>
  
      <para>Allocations are packed tightly together, which can be an issue for
      multi-threaded applications.  If you need to assure that allocations do not
@@ -534,8 +549,29 @@ for (i = 0; i < nbins; i++) {
      nearest multiple of the cacheline size, or specify cacheline alignment when
      allocating.</para>
  
-    <para>Assuming 4 MiB chunks, 4 KiB pages, and a 16-byte quantum on a 64-bit
-    system, the size classes in each category are as shown in <xref
+    <para>The <function>realloc<parameter/></function>,
+    <function>rallocx<parameter/></function>, and
+    <function>xallocx<parameter/></function> functions may resize allocations
+    without moving them under limited circumstances.  Unlike the
+    <function>*allocx<parameter/></function> API, the standard API does not
+    officially round up the usable size of an allocation to the nearest size
+    class, so technically it is necessary to call
+    <function>realloc<parameter/></function> to grow e.g. a 9-byte allocation to
+    16 bytes, or shrink a 16-byte allocation to 9 bytes.  Growth and shrinkage
+    trivially succeed in place as long as the pre-size and post-size both round
+    up to the same size class.  No other API guarantees are made regarding
+    in-place resizing, but the current implementation also tries to resize large
+    and huge allocations in place, as long as the pre-size and post-size are
+    both large or both huge.  In such cases shrinkage always succeeds for large
+    size classes, but for huge size classes the chunk allocator must support
+    splitting (see <link
+    linkend="arena.i.chunk_hooks"><mallctl>arena.&lt;i&gt;.chunk_hooks</mallctl></link>).
+    Growth only succeeds if the trailing memory is currently available, and
+    additionally for huge size classes the chunk allocator must support
+    merging.</para>
+
+    <para>Assuming 2 MiB chunks, 4 KiB pages, and a 16-byte quantum on a
+    64-bit system, the size classes in each category are as shown in <xref
      linkend="size_classes" xrefstyle="template:Table %n"/>.</para>
  
      <table xml:id="size_classes" frame="all">
@@ -553,13 +589,13 @@ for (i = 0; i < nbins; i++) {
        </thead>
        <tbody>
          <row>
-          <entry morerows="6">Small</entry>
+          <entry morerows="8">Small</entry>
            <entry>lg</entry>
            <entry>[8]</entry>
          </row>
          <row>
            <entry>16</entry>
-          <entry>[16, 32, 48, ..., 128]</entry>
+          <entry>[16, 32, 48, 64, 80, 96, 112, 128]</entry>
          </row>
          <row>
            <entry>32</entry>
@@ -579,17 +615,77 @@ for (i = 0; i < nbins; i++) {
          </row>
          <row>
            <entry>512</entry>
-          <entry>[2560, 3072, 3584]</entry>
+          <entry>[2560, 3072, 3584, 4096]</entry>
+        </row>
+        <row>
+          <entry>1 KiB</entry>
+          <entry>[5 KiB, 6 KiB, 7 KiB, 8 KiB]</entry>
+        </row>
+        <row>
+          <entry>2 KiB</entry>
+          <entry>[10 KiB, 12 KiB, 14 KiB]</entry>
+        </row>
+        <row>
+          <entry morerows="7">Large</entry>
+          <entry>2 KiB</entry>
+          <entry>[16 KiB]</entry>
          </row>
          <row>
-          <entry>Large</entry>
            <entry>4 KiB</entry>
-          <entry>[4 KiB, 8 KiB, 12 KiB, ..., 4072 KiB]</entry>
+          <entry>[20 KiB, 24 KiB, 28 KiB, 32 KiB]</entry>
+        </row>
+        <row>
+          <entry>8 KiB</entry>
+          <entry>[40 KiB, 48 KiB, 56 KiB, 64 KiB]</entry>
+        </row>
+        <row>
+          <entry>16 KiB</entry>
+          <entry>[80 KiB, 96 KiB, 112 KiB, 128 KiB]</entry>
+        </row>
+        <row>
+          <entry>32 KiB</entry>
+          <entry>[160 KiB, 192 KiB, 224 KiB, 256 KiB]</entry>
+        </row>
+        <row>
+          <entry>64 KiB</entry>
+          <entry>[320 KiB, 384 KiB, 448 KiB, 512 KiB]</entry>
+        </row>
+        <row>
+          <entry>128 KiB</entry>
+          <entry>[640 KiB, 768 KiB, 896 KiB, 1 MiB]</entry>
+        </row>
+        <row>
+          <entry>256 KiB</entry>
+          <entry>[1280 KiB, 1536 KiB, 1792 KiB]</entry>
+        </row>
+        <row>
+          <entry morerows="6">Huge</entry>
+          <entry>256 KiB</entry>
+          <entry>[2 MiB]</entry>
+        </row>
+        <row>
+          <entry>512 KiB</entry>
+          <entry>[2560 KiB, 3 MiB, 3584 KiB, 4 MiB]</entry>
+        </row>
+        <row>
+          <entry>1 MiB</entry>
+          <entry>[5 MiB, 6 MiB, 7 MiB, 8 MiB]</entry>
+        </row>
+        <row>
+          <entry>2 MiB</entry>
+          <entry>[10 MiB, 12 MiB, 14 MiB, 16 MiB]</entry>
          </row>
          <row>
-          <entry>Huge</entry>
            <entry>4 MiB</entry>
-          <entry>[4 MiB, 8 MiB, 12 MiB, ...]</entry>
+          <entry>[20 MiB, 24 MiB, 28 MiB, 32 MiB]</entry>
+        </row>
+        <row>
+          <entry>8 MiB</entry>
+          <entry>[40 MiB, 48 MiB, 56 MiB, 64 MiB]</entry>
+        </row>
+        <row>
+          <entry>...</entry>
+          <entry>...</entry>
          </row>
        </tbody>
        </tgroup>
@@ -634,6 +730,16 @@ for (i = 0; i < nbins; i++) {
          detecting whether another thread caused a refresh.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="config.cache_oblivious">
+        <term>
+          <mallctl>config.cache_oblivious</mallctl>
+          (<type>bool</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para><option>--enable-cache-oblivious</option> was specified
+        during build configuration.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="config.debug">
          <term>
            <mallctl>config.debug</mallctl>
@@ -664,6 +770,17 @@ for (i = 0; i < nbins; i++) {
          during build configuration.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="config.malloc_conf">
+        <term>
+          <mallctl>config.malloc_conf</mallctl>
+          (<type>const char *</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Embedded configure-time-specified run-time options
+        string, empty unless <option>--with-malloc-conf</option> was specified
+        during build configuration.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="config.munmap">
          <term>
            <mallctl>config.munmap</mallctl>
@@ -810,14 +927,14 @@ for (i = 0; i < nbins; i++) {
          <listitem><para>Virtual memory chunk size (log base 2).  If a chunk
          size outside the supported size range is specified, the size is
          silently clipped to the minimum/maximum supported size.  The default
-        chunk size is 4 MiB (2^22).
+        chunk size is 2 MiB (2^21).
          </para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.narenas">
          <term>
            <mallctl>opt.narenas</mallctl>
-          (<type>size_t</type>)
+          (<type>unsigned</type>)
            <literal>r-</literal>
          </term>
          <listitem><para>Maximum number of arenas to use for automatic
@@ -825,6 +942,20 @@ for (i = 0; i < nbins; i++) {
          number of CPUs, or one if there is a single CPU.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="opt.purge">
+        <term>
+          <mallctl>opt.purge</mallctl>
+          (<type>const char *</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Purge mode is &ldquo;ratio&rdquo; (default) or
+        &ldquo;decay&rdquo;.  See <link
+        linkend="opt.lg_dirty_mult"><mallctl>opt.lg_dirty_mult</mallctl></link>
+        for details of the ratio mode.  See <link
+        linkend="opt.decay_time"><mallctl>opt.decay_time</mallctl></link> for
+        details of the decay mode.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="opt.lg_dirty_mult">
          <term>
            <mallctl>opt.lg_dirty_mult</mallctl>
@@ -840,7 +971,31 @@ for (i = 0; i < nbins; i++) {
          provides the kernel with sufficient information to recycle dirty pages
          if physical memory becomes scarce and the pages remain unused.  The
          default minimum ratio is 8:1 (2^3:1); an option value of -1 will
-        disable dirty page purging.</para></listitem>
+        disable dirty page purging.  See <link
+        linkend="arenas.lg_dirty_mult"><mallctl>arenas.lg_dirty_mult</mallctl></link>
+        and <link
+        linkend="arena.i.lg_dirty_mult"><mallctl>arena.&lt;i&gt;.lg_dirty_mult</mallctl></link>
+        for related dynamic control options.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="opt.decay_time">
+        <term>
+          <mallctl>opt.decay_time</mallctl>
+          (<type>ssize_t</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Approximate time in seconds from the creation of a set
+        of unused dirty pages until an equivalent set of unused dirty pages is
+        purged and/or reused.  The pages are incrementally purged according to a
+        sigmoidal decay curve that starts and ends with zero purge rate.  A
+        decay time of 0 causes all unused dirty pages to be purged immediately
+        upon creation.  A decay time of -1 disables purging.  The default decay
+        time is 10 seconds.  See <link
+        linkend="arenas.decay_time"><mallctl>arenas.decay_time</mallctl></link>
+        and <link
+        linkend="arena.i.decay_time"><mallctl>arena.&lt;i&gt;.decay_time</mallctl></link>
+        for related dynamic control options.
+        </para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.stats_print">
@@ -857,26 +1012,34 @@ for (i = 0; i < nbins; i++) {
          <option>--enable-stats</option> is specified during configuration, this
          has the potential to cause deadlock for a multi-threaded process that
          exits while one or more threads are executing in the memory allocation
-        functions.  Therefore, this option should only be used with care; it is
-        primarily intended as a performance tuning aid during application
+        functions.  Furthermore, <function>atexit<parameter/></function> may
+        allocate memory during application initialization and then deadlock
+        internally when jemalloc in turn calls
+        <function>atexit<parameter/></function>, so this option is not
+        universally usable (though the application can register its own
+        <function>atexit<parameter/></function> function with equivalent
+        functionality).  Therefore, this option should only be used with care;
+        it is primarily intended as a performance tuning aid during application
          development.  This option is disabled by default.</para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.junk">
          <term>
            <mallctl>opt.junk</mallctl>
-          (<type>bool</type>)
+          (<type>const char *</type>)
            <literal>r-</literal>
            [<option>--enable-fill</option>]
          </term>
-        <listitem><para>Junk filling enabled/disabled.  If enabled, each byte
-        of uninitialized allocated memory will be initialized to
-        <literal>0xa5</literal>.  All deallocated memory will be initialized to
-        <literal>0x5a</literal>.  This is intended for debugging and will
-        impact performance negatively.  This option is disabled by default
-        unless <option>--enable-debug</option> is specified during
-        configuration, in which case it is enabled by default unless running
-        inside <ulink
+        <listitem><para>Junk filling.  If set to "alloc", each byte of
+        uninitialized allocated memory will be initialized to
+        <literal>0xa5</literal>.  If set to "free", all deallocated memory will
+        be initialized to <literal>0x5a</literal>.  If set to "true", both
+        allocated and deallocated memory will be initialized, and if set to
+        "false", junk filling will be disabled entirely.  This is intended for
+        debugging and will impact performance negatively.  This option is
+        "false" by default unless <option>--enable-debug</option> is specified
+        during configuration, in which case it is "true" by default unless
+        running inside <ulink
          url="http://valgrind.org/">Valgrind</ulink>.</para></listitem>
        </varlistentry>
  
@@ -977,12 +1140,11 @@ malloc_conf = "xmalloc:true";]]></programlisting>
            <literal>r-</literal>
            [<option>--enable-tcache</option>]
          </term>
-        <listitem><para>Thread-specific caching enabled/disabled.  When there
-        are multiple threads, each thread uses a thread-specific cache for
-        objects up to a certain size.  Thread-specific caching allows many
-        allocations to be satisfied without performing any thread
-        synchronization, at the cost of increased memory use.  See the
-        <link
+        <listitem><para>Thread-specific caching (tcache) enabled/disabled.  When
+        there are multiple threads, each thread uses a tcache for objects up to
+        a certain size.  Thread-specific caching allows many allocations to be
+        satisfied without performing any thread synchronization, at the cost of
+        increased memory use.  See the <link
          linkend="opt.lg_tcache_max"><mallctl>opt.lg_tcache_max</mallctl></link>
          option for related tuning information.  This option is enabled by
          default unless running inside <ulink
@@ -998,8 +1160,8 @@ malloc_conf = "xmalloc:true";]]></programlisting>
            [<option>--enable-tcache</option>]
          </term>
          <listitem><para>Maximum size class (log base 2) to cache in the
-        thread-specific cache.  At a minimum, all small size classes are
-        cached, and at a maximum all large size classes are cached.  The
+        thread-specific cache (tcache).  At a minimum, all small size classes
+        are cached, and at a maximum all large size classes are cached.  The
          default maximum is 32 KiB (2^15).</para></listitem>
        </varlistentry>
  
@@ -1024,9 +1186,11 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          option for information on high-water-triggered profile dumping, and the
          <link linkend="opt.prof_final"><mallctl>opt.prof_final</mallctl></link>
          option for final profile dumping.  Profile output is compatible with
-        the included <command>pprof</command> Perl script, which originates
-        from the <ulink url="http://code.google.com/p/gperftools/">gperftools
-        package</ulink>.</para></listitem>
+        the <command>jeprof</command> command, which is based on the
+        <command>pprof</command> that is developed as part of the <ulink
+        url="http://code.google.com/p/gperftools/">gperftools
+        package</ulink>.  See <link linkend="heap_profile_format">HEAP PROFILE
+        FORMAT</link> for heap profile format documentation.</para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.prof_prefix">
@@ -1047,7 +1211,7 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          <term>
            <mallctl>opt.prof_active</mallctl>
            (<type>bool</type>)
-          <literal>rw</literal>
+          <literal>r-</literal>
            [<option>--enable-prof</option>]
          </term>
          <listitem><para>Profiling activated/deactivated.  This is a secondary
@@ -1132,13 +1296,11 @@ malloc_conf = "xmalloc:true";]]></programlisting>
            <literal>r-</literal>
            [<option>--enable-prof</option>]
          </term>
-        <listitem><para>Trigger a memory profile dump every time the total
-        virtual memory exceeds the previous maximum.  Profiles are dumped to
-        files named according to the pattern
-        <filename>&lt;prefix&gt;.&lt;pid&gt;.&lt;seq&gt;.u&lt;useq&gt;.heap</filename>,
-        where <literal>&lt;prefix&gt;</literal> is controlled by the <link
-        linkend="opt.prof_prefix"><mallctl>opt.prof_prefix</mallctl></link>
-        option.  This option is disabled by default.</para></listitem>
+        <listitem><para>Set the initial state of <link
+        linkend="prof.gdump"><mallctl>prof.gdump</mallctl></link>, which when
+        enabled triggers a memory profile dump every time the total virtual
+        memory exceeds the previous maximum.  This option is disabled by
+        default.</para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.prof_final">
@@ -1155,7 +1317,13 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          <filename>&lt;prefix&gt;.&lt;pid&gt;.&lt;seq&gt;.f.heap</filename>,
          where <literal>&lt;prefix&gt;</literal> is controlled by the <link
          linkend="opt.prof_prefix"><mallctl>opt.prof_prefix</mallctl></link>
-        option.  This option is enabled by default.</para></listitem>
+        option.  Note that <function>atexit<parameter/></function> may allocate
+        memory during application initialization and then deadlock internally
+        when jemalloc in turn calls <function>atexit<parameter/></function>, so
+        this option is not universally usable (though the application can
+        register its own <function>atexit<parameter/></function> function with
+        equivalent functionality).  This option is disabled by
+        default.</para></listitem>
        </varlistentry>
  
        <varlistentry id="opt.prof_leak">
@@ -1252,7 +1420,7 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          <listitem><para>Enable/disable calling thread's tcache.  The tcache is
          implicitly flushed as a side effect of becoming
          disabled (see <link
-        lenkend="thread.tcache.flush"><mallctl>thread.tcache.flush</mallctl></link>).
+        linkend="thread.tcache.flush"><mallctl>thread.tcache.flush</mallctl></link>).
          </para></listitem>
        </varlistentry>
  
@@ -1263,9 +1431,9 @@ malloc_conf = "xmalloc:true";]]></programlisting>
            <literal>--</literal>
            [<option>--enable-tcache</option>]
          </term>
-        <listitem><para>Flush calling thread's tcache.  This interface releases
-        all cached objects and internal data structures associated with the
-        calling thread's thread-specific cache.  Ordinarily, this interface
+        <listitem><para>Flush calling thread's thread-specific cache (tcache).
+        This interface releases all cached objects and internal data structures
+        associated with the calling thread's tcache.  Ordinarily, this interface
          need not be called, since automatic periodic incremental garbage
          collection occurs, and the thread cache is automatically discarded when
          a thread exits.  However, garbage collection is triggered by allocation
@@ -1290,8 +1458,8 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          can cause asynchronous string deallocation.  Furthermore, each
          invocation of this interface can only read or write; simultaneous
          read/write is not supported due to string lifetime limitations.  The
-        name string must nil-terminated and comprised only of characters in the
-        sets recognized
+        name string must be nil-terminated and composed only of characters in
+        the sets recognized
          by <citerefentry><refentrytitle>isgraph</refentrytitle>
          <manvolnum>3</manvolnum></citerefentry> and
          <citerefentry><refentrytitle>isblank</refentrytitle>
@@ -1312,18 +1480,76 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          default.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="tcache.create">
+        <term>
+          <mallctl>tcache.create</mallctl>
+          (<type>unsigned</type>)
+          <literal>r-</literal>
+          [<option>--enable-tcache</option>]
+        </term>
+        <listitem><para>Create an explicit thread-specific cache (tcache) and
+        return an identifier that can be passed to the <link
+        linkend="MALLOCX_TCACHE"><constant>MALLOCX_TCACHE(<parameter>tc</parameter>)</constant></link>
+        macro to explicitly use the specified cache rather than the
+        automatically managed one that is used by default.  Each explicit cache
+        can be used by only one thread at a time; the application must ensure
+        that this constraint holds.
+        </para></listitem>
+      </varlistentry>
+
+      <varlistentry id="tcache.flush">
+        <term>
+          <mallctl>tcache.flush</mallctl>
+          (<type>unsigned</type>)
+          <literal>-w</literal>
+          [<option>--enable-tcache</option>]
+        </term>
+        <listitem><para>Flush the specified thread-specific cache (tcache).  The
+        same considerations apply to this interface as to <link
+        linkend="thread.tcache.flush"><mallctl>thread.tcache.flush</mallctl></link>,
+        except that the tcache will never be automatically discarded.
+        </para></listitem>
+      </varlistentry>
+
+      <varlistentry id="tcache.destroy">
+        <term>
+          <mallctl>tcache.destroy</mallctl>
+          (<type>unsigned</type>)
+          <literal>-w</literal>
+          [<option>--enable-tcache</option>]
+        </term>
+        <listitem><para>Flush the specified thread-specific cache (tcache) and
+        make the identifier available for use during a future tcache creation.
+        </para></listitem>
+      </varlistentry>
+
        <varlistentry id="arena.i.purge">
          <term>
            <mallctl>arena.&lt;i&gt;.purge</mallctl>
            (<type>void</type>)
            <literal>--</literal>
          </term>
-        <listitem><para>Purge unused dirty pages for arena &lt;i&gt;, or for
+        <listitem><para>Purge all unused dirty pages for arena &lt;i&gt;, or for
          all arenas if &lt;i&gt; equals <link
          linkend="arenas.narenas"><mallctl>arenas.narenas</mallctl></link>.
          </para></listitem>
        </varlistentry>
  
+      <varlistentry id="arena.i.decay">
+        <term>
+          <mallctl>arena.&lt;i&gt;.decay</mallctl>
+          (<type>void</type>)
+          <literal>--</literal>
+        </term>
+        <listitem><para>Trigger decay-based purging of unused dirty pages for
+        arena &lt;i&gt;, or for all arenas if &lt;i&gt; equals <link
+        linkend="arenas.narenas"><mallctl>arenas.narenas</mallctl></link>.
+        The proportion of unused dirty pages to be purged depends on the current
+        time; see <link
+        linkend="opt.decay_time"><mallctl>opt.decay_time</mallctl></link> for
+        details.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="arena.i.dss">
          <term>
            <mallctl>arena.&lt;i&gt;.dss</mallctl>
@@ -1338,75 +1564,233 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          settings.</para></listitem>
        </varlistentry>
  
-      <varlistentry id="arena.i.chunk.alloc">
+      <varlistentry id="arena.i.lg_dirty_mult">
          <term>
-          <mallctl>arena.&lt;i&gt;.chunk.alloc</mallctl>
-          (<type>chunk_alloc_t *</type>)
+          <mallctl>arena.&lt;i&gt;.lg_dirty_mult</mallctl>
+          (<type>ssize_t</type>)
            <literal>rw</literal>
          </term>
-        <listitem><para>Get or set the chunk allocation function for arena
-        &lt;i&gt;.  If setting, the chunk deallocation function should
-        also be set via <link linkend="arena.i.chunk.dalloc">
-        <mallctl>arena.&lt;i&gt;.chunk.dalloc</mallctl></link> to a companion
-        function that knows how to deallocate the chunks.
+        <listitem><para>Current per-arena minimum ratio (log base 2) of active
+        to dirty pages for arena &lt;i&gt;.  Each time this interface is set and
+        the ratio is increased, pages are synchronously purged as necessary to
+        impose the new ratio.  See <link
+        linkend="opt.lg_dirty_mult"><mallctl>opt.lg_dirty_mult</mallctl></link>
+        for additional information.</para></listitem>
+      </varlistentry>
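The ratio above can be read as a budget. An arena with nactive active pages tolerates roughly nactive >> lg_dirty_mult unused dirty pages before it purges. The C sketch below models only that arithmetic. The helper names dirty_page_limit and needs_purge are hypothetical. Treating -1 as "purging disabled" is an assumption taken from the opt.lg_dirty_mult description. The sketch ignores any minimum threshold the allocator may apply internally.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical helper: the largest number of unused dirty pages an arena
 * with nactive active pages may keep under the given lg_dirty_mult.
 * Assumption: -1 means purging is disabled, so there is no limit. */
static size_t
dirty_page_limit(size_t nactive, ssize_t lg_dirty_mult)
{
	assert(lg_dirty_mult >= -1);
	if (lg_dirty_mult < 0)
		return SIZE_MAX;
	if ((size_t)lg_dirty_mult >= sizeof(size_t) * CHAR_BIT)
		return 0; /* avoid an undefined over-wide shift */
	return nactive >> lg_dirty_mult;
}

/* Hypothetical helper: true when the arena exceeds its dirty-page budget. */
static bool
needs_purge(size_t nactive, size_t ndirty, ssize_t lg_dirty_mult)
{
	return ndirty > dirty_page_limit(nactive, lg_dirty_mult);
}
```

With 1024 active pages, raising lg_dirty_mult from 3 to 5 shrinks the budget from 128 to 32 dirty pages. This is why increasing the ratio can trigger the synchronous purge described above.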
+
+      <varlistentry id="arena.i.decay_time">
+        <term>
+          <mallctl>arena.&lt;i&gt;.decay_time</mallctl>
+          (<type>ssize_t</type>)
+          <literal>rw</literal>
+        </term>
+        <listitem><para>Current per-arena approximate time in seconds from the
+        creation of a set of unused dirty pages until an equivalent set of
+        unused dirty pages is purged and/or reused.  Each time this interface is
+        set, all currently unused dirty pages are considered to have fully
+        decayed, which causes immediate purging of all unused dirty pages unless
+        the decay time is set to -1 (i.e. purging disabled).  See <link
+        linkend="opt.decay_time"><mallctl>opt.decay_time</mallctl></link> for
+        additional information.</para></listitem>
+      </varlistentry>
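Decay-based purging can be pictured as a curve. For a batch of pages that became dirty some time ago, the curve gives the fraction of that batch still allowed to remain unpurged. The C sketch below assumes a smootherstep sigmoid (6x^5 - 15x^4 + 10x^3), chosen only as an example of a curve that starts and ends with a zero purge rate. This section does not specify the exact curve or time discretization jemalloc uses. The name dirty_fraction_remaining is hypothetical, and treating a decay time of 0 as immediate purging is an assumption.

```c
#include <assert.h>
#include <sys/types.h> /* ssize_t */

/* Smootherstep: rises from 0 to 1 on [0, 1] with zero slope at both ends. */
static double
smootherstep(double x)
{
	if (x <= 0.0)
		return 0.0;
	if (x >= 1.0)
		return 1.0;
	return x * x * x * (x * (x * 6.0 - 15.0) + 10.0);
}

/* Hypothetical model: the fraction of a batch of unused dirty pages,
 * created `elapsed` seconds ago, that may still remain unpurged. */
static double
dirty_fraction_remaining(double elapsed, ssize_t decay_time)
{
	assert(elapsed >= 0.0);
	if (decay_time < 0)
		return 1.0; /* -1: purging disabled */
	if (decay_time == 0)
		return 0.0; /* assumption: purge immediately */
	return 1.0 - smootherstep(elapsed / (double)decay_time);
}
```

In this model, setting decay_time causes all current unused dirty pages to be treated as fully decayed. That corresponds to evaluating every existing batch at elapsed >= decay_time, which yields a remaining fraction of 0 and therefore immediate purging.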
+
+      <varlistentry id="arena.i.chunk_hooks">
+        <term>
+          <mallctl>arena.&lt;i&gt;.chunk_hooks</mallctl>
+          (<type>chunk_hooks_t</type>)
+          <literal>rw</literal>
+        </term>
+        <listitem><para>Get or set the chunk management hook functions for arena
+        &lt;i&gt;.  The functions must be capable of operating on all extant
+        chunks associated with arena &lt;i&gt;, usually by passing unknown
+        chunks to the replaced functions.  In practice, it is feasible to
+        control allocation for arenas created via <link
+        linkend="arenas.extend"><mallctl>arenas.extend</mallctl></link> such
+        that all chunks originate from an application-supplied chunk allocator
+        (by setting custom chunk hook functions just after arena creation), but
+        the automatically created arenas may have already created chunks prior
+        to the application having an opportunity to take over chunk
+        allocation.</para>
+
+        <programlisting language="C"><![CDATA[
+typedef struct {
+       chunk_alloc_t           *alloc;
+       chunk_dalloc_t          *dalloc;
+       chunk_commit_t          *commit;
+       chunk_decommit_t        *decommit;
+       chunk_purge_t           *purge;
+       chunk_split_t           *split;
+       chunk_merge_t           *merge;
+} chunk_hooks_t;]]></programlisting>
+        <para>The <type>chunk_hooks_t</type> structure comprises function
+        pointers which are described individually below.  jemalloc uses these
+        functions to manage chunk lifetime, which starts off with allocation of
+        mapped committed memory, in the simplest case followed by deallocation.
+        However, there are performance and platform reasons to retain chunks for
+        later reuse.  Cleanup attempts cascade from deallocation to decommit to
+        purging, which gives the chunk management functions opportunities to
+        reject the most permanent cleanup operations in favor of less permanent
+        (and often less costly) operations.  The chunk splitting and merging
+        operations can also be opted out of, but this is mainly intended to
+        support platforms on which virtual memory mappings provided by the
+        operating system kernel do not automatically coalesce and split, e.g.
+        Windows.</para>
+
          <funcsynopsis><funcprototype>
            <funcdef>typedef void *<function>(chunk_alloc_t)</function></funcdef>
            <paramdef>void *<parameter>chunk</parameter></paramdef>
            <paramdef>size_t <parameter>size</parameter></paramdef>
            <paramdef>size_t <parameter>alignment</parameter></paramdef>
            <paramdef>bool *<parameter>zero</parameter></paramdef>
+          <paramdef>bool *<parameter>commit</parameter></paramdef>
            <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
          </funcprototype></funcsynopsis>
-        A chunk allocation function conforms to the <type>chunk_alloc_t</type>
-        type and upon success returns a pointer to <parameter>size</parameter>
-        bytes of memory on behalf of arena <parameter>arena_ind</parameter> such
-        that the chunk's base address is a multiple of
-        <parameter>alignment</parameter>, as well as setting
-        <parameter>*zero</parameter> to indicate whether the chunk is zeroed.
-        Upon error the function returns <constant>NULL</constant> and leaves
-        <parameter>*zero</parameter> unmodified.  The
+        <literallayout></literallayout>
+        <para>A chunk allocation function conforms to the
+        <type>chunk_alloc_t</type> type and upon success returns a pointer to
+        <parameter>size</parameter> bytes of mapped memory on behalf of arena
+        <parameter>arena_ind</parameter> such that the chunk's base address is a
+        multiple of <parameter>alignment</parameter>, as well as setting
+        <parameter>*zero</parameter> to indicate whether the chunk is zeroed and
+        <parameter>*commit</parameter> to indicate whether the chunk is
+        committed.  Upon error the function returns <constant>NULL</constant>
+        and leaves <parameter>*zero</parameter> and
+        <parameter>*commit</parameter> unmodified.  The
          <parameter>size</parameter> parameter is always a multiple of the chunk
          size.  The <parameter>alignment</parameter> parameter is always a power
          of two at least as large as the chunk size.  Zeroing is mandatory if
-        <parameter>*zero</parameter> is true upon function entry.  If
-        <parameter>chunk</parameter> is not <constant>NULL</constant>, the
-        returned pointer must be <parameter>chunk</parameter> or
-        <constant>NULL</constant> if it could not be allocated.</para>
-
-        <para>Note that replacing the default chunk allocation function makes
-        the arena's <link
+        <parameter>*zero</parameter> is true upon function entry.  Committing is
+        mandatory if <parameter>*commit</parameter> is true upon function entry.
+        If <parameter>chunk</parameter> is not <constant>NULL</constant>, the
+        returned pointer must be <parameter>chunk</parameter> on success or
+        <constant>NULL</constant> on error.  Committed memory may be committed
+        in absolute terms as on a system that does not overcommit, or in
+        implicit terms as on a system that overcommits and satisfies physical
+        memory needs on demand via soft page faults.  Note that replacing the
+        default chunk allocation function makes the arena's <link
          linkend="arena.i.dss"><mallctl>arena.&lt;i&gt;.dss</mallctl></link>
-        setting irrelevant.</para></listitem>
-      </varlistentry>
+        setting irrelevant.</para>
  
-      <varlistentry id="arena.i.chunk.dalloc">
-        <term>
-          <mallctl>arena.&lt;i&gt;.chunk.dalloc</mallctl>
-          (<type>chunk_dalloc_t *</type>)
-          <literal>rw</literal>
-        </term>
-        <listitem><para>Get or set the chunk deallocation function for arena
-        &lt;i&gt;.  If setting, the chunk deallocation function must
-        be capable of deallocating all extant chunks associated with arena
-        &lt;i&gt;, usually by passing unknown chunks to the deallocation
-        function that was replaced.  In practice, it is feasible to control
-        allocation for arenas created via <link
-        linkend="arenas.extend"><mallctl>arenas.extend</mallctl></link> such
-        that all chunks originate from an application-supplied chunk allocator
-        (by setting custom chunk allocation/deallocation functions just after
-        arena creation), but the automatically created arenas may have already
-        created chunks prior to the application having an opportunity to take
-        over chunk allocation.
          <funcsynopsis><funcprototype>
-          <funcdef>typedef void <function>(chunk_dalloc_t)</function></funcdef>
+          <funcdef>typedef bool <function>(chunk_dalloc_t)</function></funcdef>
            <paramdef>void *<parameter>chunk</parameter></paramdef>
            <paramdef>size_t <parameter>size</parameter></paramdef>
+          <paramdef>bool <parameter>committed</parameter></paramdef>
            <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
          </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>
          A chunk deallocation function conforms to the
          <type>chunk_dalloc_t</type> type and deallocates a
-        <parameter>chunk</parameter> of given <parameter>size</parameter> on
-        behalf of arena <parameter>arena_ind</parameter>.</para></listitem>
+        <parameter>chunk</parameter> of given <parameter>size</parameter> with
+        <parameter>committed</parameter>/decommitted memory as indicated, on
+        behalf of arena <parameter>arena_ind</parameter>, returning false upon
+        success.  If the function returns true, this indicates opt-out from
+        deallocation; the virtual memory mapping associated with the chunk
+        remains mapped, in the same commit state, and available for future use,
+        in which case it will be automatically retained for later reuse.</para>
+
+        <funcsynopsis><funcprototype>
+          <funcdef>typedef bool <function>(chunk_commit_t)</function></funcdef>
+          <paramdef>void *<parameter>chunk</parameter></paramdef>
+          <paramdef>size_t <parameter>size</parameter></paramdef>
+          <paramdef>size_t <parameter>offset</parameter></paramdef>
+          <paramdef>size_t <parameter>length</parameter></paramdef>
+          <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
+        </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>A chunk commit function conforms to the
+        <type>chunk_commit_t</type> type and commits zeroed physical memory to
+        back pages within a <parameter>chunk</parameter> of given
+        <parameter>size</parameter> at <parameter>offset</parameter> bytes,
+        extending for <parameter>length</parameter> on behalf of arena
+        <parameter>arena_ind</parameter>, returning false upon success.
+        Committed memory may be committed in absolute terms as on a system that
+        does not overcommit, or in implicit terms as on a system that
+        overcommits and satisfies physical memory needs on demand via soft page
+        faults. If the function returns true, this indicates insufficient
+        physical memory to satisfy the request.</para>
+
+        <funcsynopsis><funcprototype>
+          <funcdef>typedef bool <function>(chunk_decommit_t)</function></funcdef>
+          <paramdef>void *<parameter>chunk</parameter></paramdef>
+          <paramdef>size_t <parameter>size</parameter></paramdef>
+          <paramdef>size_t <parameter>offset</parameter></paramdef>
+          <paramdef>size_t <parameter>length</parameter></paramdef>
+          <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
+        </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>A chunk decommit function conforms to the
+        <type>chunk_decommit_t</type> type and decommits any physical memory
+        that is backing pages within a <parameter>chunk</parameter> of given
+        <parameter>size</parameter> at <parameter>offset</parameter> bytes,
+        extending for <parameter>length</parameter> on behalf of arena
+        <parameter>arena_ind</parameter>, returning false upon success, in which
+        case the pages will be committed via the chunk commit function before
+        being reused.  If the function returns true, this indicates opt-out from
+        decommit; the memory remains committed and available for future use, in
+        which case it will be automatically retained for later reuse.</para>
+
+        <funcsynopsis><funcprototype>
+          <funcdef>typedef bool <function>(chunk_purge_t)</function></funcdef>
+          <paramdef>void *<parameter>chunk</parameter></paramdef>
+          <paramdef>size_t <parameter>size</parameter></paramdef>
+          <paramdef>size_t <parameter>offset</parameter></paramdef>
+          <paramdef>size_t <parameter>length</parameter></paramdef>
+          <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
+        </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>A chunk purge function conforms to the <type>chunk_purge_t</type>
+        type and optionally discards physical pages within the virtual memory
+        mapping associated with <parameter>chunk</parameter> of given
+        <parameter>size</parameter> at <parameter>offset</parameter> bytes,
+        extending for <parameter>length</parameter> on behalf of arena
+        <parameter>arena_ind</parameter>, returning false if pages within the
+        purged virtual memory range will be zero-filled the next time they are
+        accessed.</para>
+
+        <funcsynopsis><funcprototype>
+          <funcdef>typedef bool <function>(chunk_split_t)</function></funcdef>
+          <paramdef>void *<parameter>chunk</parameter></paramdef>
+          <paramdef>size_t <parameter>size</parameter></paramdef>
+          <paramdef>size_t <parameter>size_a</parameter></paramdef>
+          <paramdef>size_t <parameter>size_b</parameter></paramdef>
+          <paramdef>bool <parameter>committed</parameter></paramdef>
+          <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
+        </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>A chunk split function conforms to the <type>chunk_split_t</type>
+        type and optionally splits <parameter>chunk</parameter> of given
+        <parameter>size</parameter> into two adjacent chunks, the first of
+        <parameter>size_a</parameter> bytes, and the second of
+        <parameter>size_b</parameter> bytes, operating on
+        <parameter>committed</parameter>/decommitted memory as indicated, on
+        behalf of arena <parameter>arena_ind</parameter>, returning false upon
+        success.  If the function returns true, this indicates that the chunk
+        remains unsplit and therefore should continue to be operated on as a
+        whole.</para>
+
+        <funcsynopsis><funcprototype>
+          <funcdef>typedef bool <function>(chunk_merge_t)</function></funcdef>
+          <paramdef>void *<parameter>chunk_a</parameter></paramdef>
+          <paramdef>size_t <parameter>size_a</parameter></paramdef>
+          <paramdef>void *<parameter>chunk_b</parameter></paramdef>
+          <paramdef>size_t <parameter>size_b</parameter></paramdef>
+          <paramdef>bool <parameter>committed</parameter></paramdef>
+          <paramdef>unsigned <parameter>arena_ind</parameter></paramdef>
+        </funcprototype></funcsynopsis>
+        <literallayout></literallayout>
+        <para>A chunk merge function conforms to the <type>chunk_merge_t</type>
+        type and optionally merges adjacent chunks,
+        <parameter>chunk_a</parameter> of given <parameter>size_a</parameter>
+        and <parameter>chunk_b</parameter> of given
+        <parameter>size_b</parameter> into one contiguous chunk, operating on
+        <parameter>committed</parameter>/decommitted memory as indicated, on
+        behalf of arena <parameter>arena_ind</parameter>, returning false upon
+        success.  If the function returns true, this indicates that the chunks
+        remain distinct mappings and therefore should continue to be operated on
+        independently.</para>
+        </listitem>
        </varlistentry>
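To illustrate the hook contracts above, the following C sketch gives mmap-based versions of all seven chunk hooks. It assumes Linux semantics and an overcommitting kernel. On Linux, anonymous mappings start zero-filled, and MADV_DONTNEED on a private anonymous mapping makes the affected pages read back as zero. The my_chunk_* names are hypothetical. The typedefs are local copies of the documented prototypes so the example compiles without jemalloc's header; when building against jemalloc, use the header's own declarations instead. This is a sketch of the contracts, not jemalloc's default implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Local copies of the documented prototypes (assumed; jemalloc's header
 * provides the real ones). */
typedef void *(chunk_alloc_t)(void *, size_t, size_t, bool *, bool *, unsigned);
typedef bool (chunk_dalloc_t)(void *, size_t, bool, unsigned);
typedef bool (chunk_commit_t)(void *, size_t, size_t, size_t, unsigned);
typedef bool (chunk_decommit_t)(void *, size_t, size_t, size_t, unsigned);
typedef bool (chunk_purge_t)(void *, size_t, size_t, size_t, unsigned);
typedef bool (chunk_split_t)(void *, size_t, size_t, size_t, bool, unsigned);
typedef bool (chunk_merge_t)(void *, size_t, void *, size_t, bool, unsigned);

/* Map `size` bytes at an `alignment` boundary by over-mapping, then trimming. */
static void *
my_chunk_alloc(void *chunk, size_t size, size_t alignment, bool *zero,
    bool *commit, unsigned arena_ind)
{
	(void)arena_ind;
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	if (chunk != NULL)
		return NULL; /* placement at a given address is unsupported */
	size_t map_size = size + alignment;
	if (map_size < size)
		return NULL; /* overflow */
	char *p = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL; /* *zero and *commit are left unmodified */
	uintptr_t base = (uintptr_t)p;
	uintptr_t aligned = (base + alignment - 1) & ~((uintptr_t)alignment - 1);
	size_t lead = (size_t)(aligned - base);
	size_t trail = map_size - lead - size;
	if (lead != 0)
		munmap(p, lead); /* drop the unaligned head */
	if (trail != 0)
		munmap((char *)aligned + size, trail); /* drop the excess tail */
	*zero = true;   /* fresh anonymous pages read as zero */
	*commit = true; /* backed on demand by the (assumed overcommitting) kernel */
	return (void *)aligned;
}

static bool
my_chunk_dalloc(void *chunk, size_t size, bool committed, unsigned arena_ind)
{
	(void)committed; (void)arena_ind;
	return munmap(chunk, size) != 0; /* false means deallocated */
}

/* Nothing to do on an overcommitting system; report success. */
static bool
my_chunk_commit(void *chunk, size_t size, size_t offset, size_t length,
    unsigned arena_ind)
{
	(void)chunk; (void)size; (void)offset; (void)length; (void)arena_ind;
	return false;
}

/* Opt out of decommit: the pages stay committed and are retained. */
static bool
my_chunk_decommit(void *chunk, size_t size, size_t offset, size_t length,
    unsigned arena_ind)
{
	(void)chunk; (void)size; (void)offset; (void)length; (void)arena_ind;
	return true;
}

/* Returns false (zero-filled on next access) only if madvise succeeded. */
static bool
my_chunk_purge(void *chunk, size_t size, size_t offset, size_t length,
    unsigned arena_ind)
{
	(void)size; (void)arena_ind;
	return madvise((char *)chunk + offset, length, MADV_DONTNEED) != 0;
}

/* Linux mappings split and coalesce freely, so both operations succeed. */
static bool
my_chunk_split(void *chunk, size_t size, size_t size_a, size_t size_b,
    bool committed, unsigned arena_ind)
{
	(void)chunk; (void)size; (void)size_a; (void)size_b;
	(void)committed; (void)arena_ind;
	return false;
}

static bool
my_chunk_merge(void *chunk_a, size_t size_a, void *chunk_b, size_t size_b,
    bool committed, unsigned arena_ind)
{
	(void)chunk_a; (void)size_a; (void)chunk_b; (void)size_b;
	(void)committed; (void)arena_ind;
	return false;
}
```

To install hooks like these, read arena &lt;i&gt;'s current <type>chunk_hooks_t</type> through the <mallctl>arena.&lt;i&gt;.chunk_hooks</mallctl> mallctl, replace the members of interest, and write the structure back. For the reasons given above, the best time to do this is just after creating the arena with <mallctl>arenas.extend</mallctl>.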
  
        <varlistentry id="arenas.narenas">
@@ -1430,6 +1814,35 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          initialized.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="arenas.lg_dirty_mult">
+        <term>
+          <mallctl>arenas.lg_dirty_mult</mallctl>
+          (<type>ssize_t</type>)
+          <literal>rw</literal>
+        </term>
+        <listitem><para>Current default per-arena minimum ratio (log base 2) of
+        active to dirty pages, used to initialize <link
+        linkend="arena.i.lg_dirty_mult"><mallctl>arena.&lt;i&gt;.lg_dirty_mult</mallctl></link>
+        during arena creation.  See <link
+        linkend="opt.lg_dirty_mult"><mallctl>opt.lg_dirty_mult</mallctl></link>
+        for additional information.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="arenas.decay_time">
+        <term>
+          <mallctl>arenas.decay_time</mallctl>
+          (<type>ssize_t</type>)
+          <literal>rw</literal>
+        </term>
+        <listitem><para>Current default per-arena approximate time in seconds
+        from the creation of a set of unused dirty pages until an equivalent set
+        of unused dirty pages is purged and/or reused, used to initialize <link
+        linkend="arena.i.decay_time"><mallctl>arena.&lt;i&gt;.decay_time</mallctl></link>
+        during arena creation.  See <link
+        linkend="opt.decay_time"><mallctl>opt.decay_time</mallctl></link> for
+        additional information.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="arenas.quantum">
          <term>
            <mallctl>arenas.quantum</mallctl>
@@ -1508,7 +1921,7 @@ malloc_conf = "xmalloc:true";]]></programlisting>
        <varlistentry id="arenas.nlruns">
          <term>
            <mallctl>arenas.nlruns</mallctl>
-          (<type>size_t</type>)
+          (<type>unsigned</type>)
            <literal>r-</literal>
          </term>
          <listitem><para>Total number of large size classes.</para></listitem>
@@ -1524,6 +1937,25 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          class.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="arenas.nhchunks">
+        <term>
+          <mallctl>arenas.nhchunks</mallctl>
+          (<type>unsigned</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Total number of huge size classes.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="arenas.hchunk.i.size">
+        <term>
+          <mallctl>arenas.hchunk.&lt;i&gt;.size</mallctl>
+          (<type>size_t</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Maximum size supported by this huge size
+        class.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="arenas.extend">
          <term>
            <mallctl>arenas.extend</mallctl>
@@ -1579,6 +2011,22 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          option.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="prof.gdump">
+        <term>
+          <mallctl>prof.gdump</mallctl>
+          (<type>bool</type>)
+          <literal>rw</literal>
+          [<option>--enable-prof</option>]
+        </term>
+        <listitem><para>When enabled, trigger a memory profile dump every time
+        the total virtual memory exceeds the previous maximum.  Profiles are
+        dumped to files named according to the pattern
+        <filename>&lt;prefix&gt;.&lt;pid&gt;.&lt;seq&gt;.u&lt;useq&gt;.heap</filename>,
+        where <literal>&lt;prefix&gt;</literal> is controlled by the <link
+        linkend="opt.prof_prefix"><mallctl>opt.prof_prefix</mallctl></link>
+        option.</para></listitem>
+      </varlistentry>
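To make the naming pattern concrete, the hypothetical helper below formats a file name following <filename>&lt;prefix&gt;.&lt;pid&gt;.&lt;seq&gt;.u&lt;useq&gt;.heap</filename>. It only reproduces the documented pattern. This section does not describe how jemalloc assigns the seq and useq counters, and the prefix in the usage check is just an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h> /* pid_t */

/* Hypothetical helper: writes "<prefix>.<pid>.<seq>.u<useq>.heap" into buf.
 * Returns the snprintf result, so a value >= bufsize means truncation. */
static int
gdump_filename(char *buf, size_t bufsize, const char *prefix, pid_t pid,
    uint64_t seq, uint64_t useq)
{
	assert(buf != NULL && prefix != NULL);
	return snprintf(buf, bufsize, "%s.%d.%llu.u%llu.heap", prefix, (int)pid,
	    (unsigned long long)seq, (unsigned long long)useq);
}
```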
+
        <varlistentry id="prof.reset">
          <term>
            <mallctl>prof.reset</mallctl>
@@ -1629,9 +2077,8 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          </term>
          <listitem><para>Pointer to a counter that contains an approximate count
          of the current number of bytes in active pages.  The estimate may be
-        high, but never low, because each arena rounds up to the nearest
-        multiple of the chunk size when computing its contribution to the
-        counter.  Note that the <link
+        high, but never low, because each arena rounds up when computing its
+        contribution to the counter.  Note that the <link
          linkend="epoch"><mallctl>epoch</mallctl></link> mallctl has no bearing
          on this counter.  Furthermore, counter consistency is maintained via
          atomic operations, so it is necessary to use an atomic operation in
@@ -1662,55 +2109,56 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          equal to <link
          linkend="stats.allocated"><mallctl>stats.allocated</mallctl></link>.
          This does not include <link linkend="stats.arenas.i.pdirty">
-        <mallctl>stats.arenas.&lt;i&gt;.pdirty</mallctl></link> and pages
+        <mallctl>stats.arenas.&lt;i&gt;.pdirty</mallctl></link>, nor pages
          entirely devoted to allocator metadata.</para></listitem>
        </varlistentry>
  
-      <varlistentry id="stats.mapped">
+      <varlistentry id="stats.metadata">
          <term>
-          <mallctl>stats.mapped</mallctl>
+          <mallctl>stats.metadata</mallctl>
            (<type>size_t</type>)
            <literal>r-</literal>
            [<option>--enable-stats</option>]
          </term>
-        <listitem><para>Total number of bytes in chunks mapped on behalf of the
-        application.  This is a multiple of the chunk size, and is at least as
-        large as <link
-        linkend="stats.active"><mallctl>stats.active</mallctl></link>.  This
-        does not include inactive chunks.</para></listitem>
+        <listitem><para>Total number of bytes dedicated to metadata, which
+        comprise base allocations used for bootstrap-sensitive internal
+        allocator data structures, arena chunk headers (see <link
+        linkend="stats.arenas.i.metadata.mapped"><mallctl>stats.arenas.&lt;i&gt;.metadata.mapped</mallctl></link>),
+        and internal allocations (see <link
+        linkend="stats.arenas.i.metadata.allocated"><mallctl>stats.arenas.&lt;i&gt;.metadata.allocated</mallctl></link>).</para></listitem>
        </varlistentry>
  
-      <varlistentry id="stats.chunks.current">
+      <varlistentry id="stats.resident">
          <term>
-          <mallctl>stats.chunks.current</mallctl>
+          <mallctl>stats.resident</mallctl>
            (<type>size_t</type>)
            <literal>r-</literal>
            [<option>--enable-stats</option>]
          </term>
-        <listitem><para>Total number of chunks actively mapped on behalf of the
-        application.  This does not include inactive chunks.
-        </para></listitem>
+        <listitem><para>Maximum number of bytes in physically resident data
+        pages mapped by the allocator, comprising all pages dedicated to
+        allocator metadata, pages backing active allocations, and unused dirty
+        pages.  This is a maximum rather than a precise value because pages may not
+        actually be physically resident if they correspond to demand-zeroed
+        virtual memory that has not yet been touched.  This is a multiple of the
+        page size, and is larger than <link
+        linkend="stats.active"><mallctl>stats.active</mallctl></link>.</para></listitem>
        </varlistentry>
  
-      <varlistentry id="stats.chunks.total">
-        <term>
-          <mallctl>stats.chunks.total</mallctl>
-          (<type>uint64_t</type>)
-          <literal>r-</literal>
-          [<option>--enable-stats</option>]
-        </term>
-        <listitem><para>Cumulative number of chunks allocated.</para></listitem>
-      </varlistentry>
-
-      <varlistentry id="stats.chunks.high">
+      <varlistentry id="stats.mapped">
          <term>
-          <mallctl>stats.chunks.high</mallctl>
+          <mallctl>stats.mapped</mallctl>
            (<type>size_t</type>)
            <literal>r-</literal>
            [<option>--enable-stats</option>]
          </term>
-        <listitem><para>Maximum number of active chunks at any time thus far.
-        </para></listitem>
+        <listitem><para>Total number of bytes in active chunks mapped by the
+        allocator.  This is a multiple of the chunk size, and is larger than
+        <link linkend="stats.active"><mallctl>stats.active</mallctl></link>.
+        This does not include inactive chunks, even those that contain unused
+        dirty pages, which means that there is no strict ordering between this
+        and <link
+        linkend="stats.resident"><mallctl>stats.resident</mallctl></link>.</para></listitem>
        </varlistentry>
  
        <varlistentry id="stats.arenas.i.dss">
@@ -1727,6 +2175,31 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          </para></listitem>
        </varlistentry>
  
+      <varlistentry id="stats.arenas.i.lg_dirty_mult">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.lg_dirty_mult</mallctl>
+          (<type>ssize_t</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Minimum ratio (log base 2) of active to dirty pages.
+        See <link
+        linkend="opt.lg_dirty_mult"><mallctl>opt.lg_dirty_mult</mallctl></link>
+        for details.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="stats.arenas.i.decay_time">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.decay_time</mallctl>
+          (<type>ssize_t</type>)
+          <literal>r-</literal>
+        </term>
+        <listitem><para>Approximate time in seconds from the creation of a set
+        of unused dirty pages until an equivalent set of unused dirty pages is
+        purged and/or reused.  See <link
+        linkend="opt.decay_time"><mallctl>opt.decay_time</mallctl></link>
+        for details.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="stats.arenas.i.nthreads">
          <term>
            <mallctl>stats.arenas.&lt;i&gt;.nthreads</mallctl>
@@ -1768,6 +2241,38 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          <listitem><para>Number of mapped bytes.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="stats.arenas.i.metadata.mapped">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.metadata.mapped</mallctl>
+          (<type>size_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Number of mapped bytes in arena chunk headers, which
+        track the states of the non-metadata pages.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="stats.arenas.i.metadata.allocated">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.metadata.allocated</mallctl>
+          (<type>size_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Number of bytes dedicated to internal allocations.
+        Internal allocations differ from application-originated allocations in
+        that they are for internal use, and that they are omitted from heap
+        profiles.  This statistic is reported separately from <link
+        linkend="stats.metadata"><mallctl>stats.metadata</mallctl></link> and
+        <link
+        linkend="stats.arenas.i.metadata.mapped"><mallctl>stats.arenas.&lt;i&gt;.metadata.mapped</mallctl></link>
+        because it overlaps with e.g. the <link
+        linkend="stats.allocated"><mallctl>stats.allocated</mallctl></link> and
+        <link linkend="stats.active"><mallctl>stats.active</mallctl></link>
+        statistics, whereas the other metadata statistics do
+        not.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="stats.arenas.i.npurge">
          <term>
            <mallctl>stats.arenas.&lt;i&gt;.npurge</mallctl>
@@ -1933,17 +2438,6 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          </para></listitem>
        </varlistentry>
  
-      <varlistentry id="stats.arenas.i.bins.j.allocated">
-        <term>
-          <mallctl>stats.arenas.&lt;i&gt;.bins.&lt;j&gt;.allocated</mallctl>
-          (<type>size_t</type>)
-          <literal>r-</literal>
-          [<option>--enable-stats</option>]
-        </term>
-        <listitem><para>Current number of bytes allocated by
-        bin.</para></listitem>
-      </varlistentry>
-
        <varlistentry id="stats.arenas.i.bins.j.nmalloc">
          <term>
            <mallctl>stats.arenas.&lt;i&gt;.bins.&lt;j&gt;.nmalloc</mallctl>
@@ -1977,6 +2471,17 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          requests.</para></listitem>
        </varlistentry>
  
+      <varlistentry id="stats.arenas.i.bins.j.curregs">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.bins.&lt;j&gt;.curregs</mallctl>
+          (<type>size_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Current number of regions for this size
+        class.</para></listitem>
+      </varlistentry>
+
        <varlistentry id="stats.arenas.i.bins.j.nfills">
          <term>
            <mallctl>stats.arenas.&lt;i&gt;.bins.&lt;j&gt;.nfills</mallctl>
@@ -2071,8 +2576,99 @@ malloc_conf = "xmalloc:true";]]></programlisting>
          <listitem><para>Current number of runs for this size class.
          </para></listitem>
        </varlistentry>
+
+      <varlistentry id="stats.arenas.i.hchunks.j.nmalloc">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.hchunks.&lt;j&gt;.nmalloc</mallctl>
+          (<type>uint64_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Cumulative number of allocation requests for this size
+        class served directly by the arena.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="stats.arenas.i.hchunks.j.ndalloc">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.hchunks.&lt;j&gt;.ndalloc</mallctl>
+          (<type>uint64_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Cumulative number of deallocation requests for this
+        size class served directly by the arena.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="stats.arenas.i.hchunks.j.nrequests">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.hchunks.&lt;j&gt;.nrequests</mallctl>
+          (<type>uint64_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Cumulative number of allocation requests for this size
+        class.</para></listitem>
+      </varlistentry>
+
+      <varlistentry id="stats.arenas.i.hchunks.j.curhchunks">
+        <term>
+          <mallctl>stats.arenas.&lt;i&gt;.hchunks.&lt;j&gt;.curhchunks</mallctl>
+          (<type>size_t</type>)
+          <literal>r-</literal>
+          [<option>--enable-stats</option>]
+        </term>
+        <listitem><para>Current number of huge allocations for this size class.
+        </para></listitem>
+      </varlistentry>
      </variablelist>
    </refsect1>
+  <refsect1 id="heap_profile_format">
+    <title>HEAP PROFILE FORMAT</title>
+    <para>Although the heap profiling functionality was originally designed to
+    be compatible with the
+    <command>pprof</command> command that is developed as part of the <ulink
+    url="http://code.google.com/p/gperftools/">gperftools
+    package</ulink>, the addition of per-thread heap profiling functionality
+    required a different heap profile format.  The <command>jeprof</command>
+    command is derived from <command>pprof</command>, with enhancements to
+    support the heap profile format described here.</para>
+
+    <para>In the following hypothetical heap profile, <constant>[...]</constant>
+    indicates elision for the sake of compactness.  <programlisting><![CDATA[
+heap_v2/524288
+  t*: 28106: 56637512 [0: 0]
+  [...]
+  t3: 352: 16777344 [0: 0]
+  [...]
+  t99: 17754: 29341640 [0: 0]
+  [...]
+@ 0x5f86da8 0x5f5a1dc [...] 0x29e4d4e 0xa200316 0xabb2988 [...]
+  t*: 13: 6688 [0: 0]
+  t3: 12: 6496 [0: 0]
+  t99: 1: 192 [0: 0]
+[...]
+
+MAPPED_LIBRARIES:
+[...]]]></programlisting> The following matches the above heap profile, but most
+tokens are replaced with <constant>&lt;description&gt;</constant> to indicate
+descriptions of the corresponding fields.  <programlisting><![CDATA[
+<heap_profile_format_version>/<mean_sample_interval>
+  <aggregate>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+  [...]
+  <thread_3_aggregate>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+  [...]
+  <thread_99_aggregate>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+  [...]
+@ <top_frame> <frame> [...] <frame> <frame> <frame> [...]
+  <backtrace_aggregate>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+  <backtrace_thread_3>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+  <backtrace_thread_99>: <curobjs>: <curbytes> [<cumobjs>: <cumbytes>]
+[...]
+
+MAPPED_LIBRARIES:
+</proc/<pid>/maps>]]></programlisting></para>
+  </refsect1>
+
    <refsect1 id="debugging_malloc_problems">
      <title>DEBUGGING MALLOC PROBLEMS</title>
      <para>When debugging, it is a good idea to configure/build jemalloc with