From 37f1b7dd8d1f9f5b8811fe89fab9ec55ca9226a0 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Tue, 1 Dec 2020 10:28:06 +0100
Subject: [PATCH] docs: add more thoughts about chunk size

---
 README.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.rst b/README.rst
index e880a1d7..5c42ad10 100644
--- a/README.rst
+++ b/README.rst
@@ -112,3 +112,29 @@ Modern SSD are much faster, lets assume the following::
   MAX(64KB) = 354 MB/s;
   MAX(4KB) = 67 MB/s;
   MAX(1KB) = 18 MB/s;
+
+
+Also, the average chunk size directly relates to the number of chunks
+produced by a backup::
+
+  CHUNK_COUNT = BACKUP_SIZE / ACS
+
+Here are some statistics from my developer workstation::
+
+  Disk Usage:    65 GB
+  Directories:   58971
+  Files:         726314
+  Files < 64KB:  617541
+
+As you can see, there are a great many small files. If we were to do
+file-level deduplication, i.e. generate one chunk per file, we would
+end up with more than 700000 chunks.
+
+Instead, our current algorithm produces only large chunks, with an
+average chunk size of 4MB. With the above data, this results in about
+15000 chunks (a factor of 50 fewer chunks).
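+
+As a rough sanity check, plugging the numbers from above into the
+formula (taking 1 GB = 1024 MB) gives the same order of magnitude::
+
+  CHUNK_COUNT = 65 GB / 4 MB = 16640
-- 
2.39.2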