From 37f1b7dd8d1f9f5b8811fe89fab9ec55ca9226a0 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Tue, 1 Dec 2020 10:28:06 +0100
Subject: [PATCH] docs: add more thoughts about chunk size

---
 README.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.rst b/README.rst
index e880a1d7..5c42ad10 100644
--- a/README.rst
+++ b/README.rst
@@ -112,3 +112,29 @@ Modern SSD are much faster, lets assume the following::
   MAX(64KB) = 354 MB/s;
   MAX(4KB) = 67 MB/s;
   MAX(1KB) = 18 MB/s;
+
+
+Also, the average chunk size directly relates to the number of chunks
+produced by a backup::
+
+  CHUNK_COUNT = BACKUP_SIZE / ACS
+
+Here are some statistics from my developer workstation::
+
+  Disk Usage:    65 GB
+  Directories:   58971
+  Files:         726314
+  Files < 64KB:  617541
+
+As you can see, there are a great many small files. If we were to do
+file-level deduplication, i.e. generate one chunk per file, we would
+end up with more than 700000 chunks.
+
+Instead, our current algorithm produces only large chunks, with an
+average chunk size of 4MB. With the above data, this results in about
+15000 chunks (a factor of 50 fewer chunks).
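+
+As a rough sanity check, plugging the numbers from above into the
+formula (taking 1 GB = 1024 MB) gives the same order of magnitude::
+
+  CHUNK_COUNT = 65 GB / 4 MB = 16640
-- 
2.39.2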