Build & Release Notes
*********************

``rustup`` Toolchain
====================

We normally want to build with the ``rustc`` Debian package. To do that
you can set the following ``rustup`` configuration::

  # rustup toolchain link system /usr
  # rustup default system


Versioning of proxmox helper crates
===================================

To use the current git master code of the proxmox* helper crates, add::

   git = "git://git.proxmox.com/git/proxmox"

or::

   path = "../proxmox/proxmox"

to the proxmox dependency, and update the version to reflect the current,
pre-release version number (e.g., "0.1.1-dev.1" instead of "0.1.0").
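
For illustration, a complete dependency entry in ``Cargo.toml`` could then
look like this (the version number is just an example to show the shape)::

   proxmox = { version = "0.1.1-dev.1", path = "../proxmox/proxmox" }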


Local cargo config
==================

This repository ships with a ``.cargo/config`` that replaces the crates.io
registry with packaged crates located in ``/usr/share/cargo/registry``.
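
Such a registry override uses cargo's source-replacement mechanism; a minimal
sketch of what that looks like (the source name ``debian-packages`` is made up
here, not necessarily what the shipped file uses)::

   [source.crates-io]
   replace-with = "debian-packages"

   [source.debian-packages]
   directory = "/usr/share/cargo/registry"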

A similar config is also applied when building with dh_cargo. Cargo.lock needs
to be deleted when switching between packaged crates and crates.io, since the
checksums are not compatible.

To reference new dependencies (or updated versions) that are not yet packaged,
the dependency needs to point directly to a path or git source (e.g., see the
example for the proxmox crate above).


Build
=====
on Debian 12 Bookworm

Setup:
  1. # echo 'deb http://download.proxmox.com/debian/devel/ bookworm main' | sudo tee /etc/apt/sources.list.d/proxmox-devel.list
  2. # sudo wget https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg -O /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
  3. # sudo apt update
  4. # sudo apt install devscripts debcargo clang
  5. # git clone git://git.proxmox.com/git/proxmox-backup.git
  6. # cd proxmox-backup; sudo mk-build-deps -ir

Note: Step 2 may be skipped if you have already added the PVE or PBS package
repository.

You are now able to build using the Makefile or cargo itself, e.g.::

  # make deb
  # # or for a non-package build
  # cargo build --all --release


Design Notes
************

Here are some random thoughts about the software design (unless I find a
better place).


Large chunk sizes
=================

It is important to note that large chunk sizes are crucial for performance.
We have a multi-user system, where different people can run different
operations on a datastore at the same time, and most operations involve
reading a series of chunks.

So what is the maximum theoretical speed we can get when reading a series of
chunks? Reading a chunk sequence needs the following steps:

- seek to the first chunk's start location
- read the chunk data
- seek to the next chunk's start location
- read the chunk data
- ...

Let's use the following disk performance metrics:

:AST: Average Seek Time (seconds)
:MRS: Maximum sequential Read Speed (bytes/second)
:ACS: Average Chunk Size (bytes)

The maximum performance you can get is::

   MAX(ACS) = ACS / (AST + ACS/MRS)

Please note that chunk data is likely to be arranged sequentially on disk,
but this is a sort of best-case assumption.

For a typical rotational disk, we assume the following values::

   AST: 10ms
   MRS: 170MB/s

   MAX(4MB)  = 115.37 MB/s
   MAX(1MB)  =  61.85 MB/s
   MAX(64KB) =   6.02 MB/s
   MAX(4KB)  =   0.39 MB/s
   MAX(1KB)  =   0.10 MB/s

Modern SSDs are much faster; let's assume the following::

   max IOPS: 20000 => AST = 0.00005
   MRS: 500MB/s

   MAX(4MB)  = 474 MB/s
   MAX(1MB)  = 465 MB/s
   MAX(64KB) = 345 MB/s
   MAX(4KB)  =  67 MB/s
   MAX(1KB)  =  18 MB/s
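
The formula is easy to check numerically. A small sketch in Rust (assuming
chunk sizes are binary MiB while MRS is in decimal bytes/second, which is the
convention that reproduces the table values):

```rust
// Check MAX(ACS) = ACS / (AST + ACS/MRS) against the values above.
// Assumption: chunk sizes are binary (MiB), MRS is decimal (bytes/second).

const MIB: f64 = 1024.0 * 1024.0;

/// Maximum read throughput in MiB/s: `acs` in bytes, `ast` in seconds,
/// `mrs` in bytes/second.
fn max_throughput(acs: f64, ast: f64, mrs: f64) -> f64 {
    acs / (ast + acs / mrs) / MIB
}

fn main() {
    // Rotational disk: AST = 10ms, MRS = 170 MB/s
    let hdd = max_throughput(4.0 * MIB, 0.010, 170.0e6);
    // SSD: 20000 IOPS => AST = 0.00005s, MRS = 500 MB/s
    let ssd = max_throughput(4.0 * MIB, 0.00005, 500.0e6);
    println!("HDD MAX(4MB) = {hdd:.2} MB/s"); // ~115.37
    println!("SSD MAX(4MB) = {ssd:.0} MB/s"); // ~474
}
```

This also shows why large chunks are a good compromise: for small chunks the
seek term AST dominates, while for large chunks the result approaches MRS.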


Also, the average chunk size directly relates to the number of chunks
produced by a backup::

   CHUNK_COUNT = BACKUP_SIZE / ACS

Here are some statistics from my developer workstation::

   Disk Usage:   65 GB
   Directories:  58971
   Files:        726314
   Files < 64KB: 617541

As you can see, there are really many small files. If we did file-level
deduplication, i.e. generated one chunk per file, we would end up with
more than 700000 chunks.

Instead, our current algorithm only produces large chunks, with an
average chunk size of 4MB. With the above data, this produces about
15000 chunks (a factor of 50 fewer chunks).
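
As a quick cross-check of those numbers (treating 65 GB and 4 MB as decimal
units, which is an assumption on my part):

```rust
// Sanity-check CHUNK_COUNT = BACKUP_SIZE / ACS with the workstation stats.

fn main() {
    let backup_size: u64 = 65_000_000_000; // 65 GB disk usage
    let acs: u64 = 4_000_000; // 4MB average chunk size
    let files: f64 = 726_314.0; // one chunk per file with file-level dedup

    let chunk_count = backup_size / acs;
    println!("chunk count: {chunk_count}"); // 16250, i.e. "about 15000"
    println!("reduction: {:.0}x fewer chunks", files / chunk_count as f64);
}
```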