[[chapter_pmgcm]]
ifdef::manvolnum[]
pmgcm(1)
========
:pmg-toplevel:

NAME
----

pmgcm - Proxmox Mail Gateway Cluster Management Toolkit


SYNOPSIS
--------

include::pmgcm.1-synopsis.adoc[]


DESCRIPTION
-----------
endif::manvolnum[]
ifndef::manvolnum[]
Cluster Management
==================
:pmg-toplevel:
endif::manvolnum[]

We live in a world where email is becoming more and more important,
and failures in email systems are simply not acceptable. To meet these
requirements, we developed the Proxmox HA (High Availability) Cluster.

The {pmg} HA Cluster consists of a master and several slave nodes
(minimum one node). Configuration is done on the master. Configuration
and data are synchronized to all cluster nodes over a VPN tunnel. This
provides the following advantages:

* centralized configuration management

* fully redundant data storage

* high availability

* high performance

We use a unique application-level clustering scheme, which provides
extremely good performance. Special considerations were taken to make
management as easy as possible. A complete cluster setup is done within
minutes, and nodes automatically reintegrate after temporary failures
without any operator interaction.

image::images/pmg-ha-cluster.png[]


Hardware requirements
---------------------

There are no special hardware requirements, although it is highly
recommended to use fast and reliable servers with redundant disks on
all cluster nodes (hardware RAID with BBU and write cache enabled).

The HA Cluster can also run in virtualized environments.


Subscriptions
-------------

Each host in a cluster has its own subscription. If you want support
for a cluster, each cluster node needs to have a valid
subscription. All nodes must have the same subscription level.


Load balancing
--------------

You can use one of the mechanisms described in chapter 9 if you want
to distribute mail traffic among the cluster nodes. Please note that
this is not always required, because it is also reasonable to use only
one node to handle SMTP traffic. The second node can then be used as a
quarantine host (providing the web interface to the user quarantine).

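
For example, one common DNS-based approach is to publish one MX record
per cluster node, all with the same preference, so that sending servers
distribute connections across the nodes. The zone excerpt below is only
an illustration; `example.com`, `pmg1` and `pmg2` are placeholder names.

----
; illustrative DNS zone excerpt - both nodes accept inbound mail
example.com.   IN  MX  10  pmg1.example.com.
example.com.   IN  MX  10  pmg2.example.com.
----
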
Cluster administration
----------------------

Cluster administration is done with a single command line utility
called `pmgcm`, so you need to log in via SSH to manage the cluster
setup.

NOTE: Always set up the IP configuration before adding a node to the
cluster. The IP address, network mask, gateway address and hostname
can't be changed later.

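
A minimal pre-check before creating or joining a cluster could look
like the following, assuming the default Debian-based network setup
used by {pmg} (the exact files may differ in customized environments):

----
# review the settings that become fixed once the node is clustered
hostname --fqdn
cat /etc/network/interfaces
cat /etc/hosts
----
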
Creating a Cluster
~~~~~~~~~~~~~~~~~~

You can create a cluster from any existing Proxmox host. All data is
preserved.

* make sure you have the right IP configuration
  (IP/MASK/GATEWAY/HOSTNAME), because you cannot change that later

* run the cluster creation command:
+
----
pmgcm create
----

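
Afterwards, you can verify the result; the freshly created cluster
consists of the local node only, listed with the `master` role:

----
pmgcm status
----
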
List Cluster Status
~~~~~~~~~~~~~~~~~~~

----
pmgcm status
--NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
pmg5(1)              192.168.2.127    master A           1 day 21:18   0.30    80%   41%
----


Adding Cluster Nodes
~~~~~~~~~~~~~~~~~~~~

When you add a new node to a cluster (join), all data on that node is
destroyed. The whole database is initialized with the cluster data from
the master.

* make sure you have the right IP configuration

* run the cluster join command (on the new node):
+
----
pmgcm join <master_ip>
----

You need to enter the root password of the master host when asked for
a password.
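
For example, if the master is the node `pmg5` from the status output
above (IP `192.168.2.127`), you would run the following on the new
node (the IP address is only an illustration, use your master's
address):

----
pmgcm join 192.168.2.127
----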

CAUTION: Node initialization deletes all existing databases, stops and
then restarts all services accessing the database. So do not add nodes
which are already active and receiving mail.

Also, joining a cluster can take several minutes, because the new node
needs to synchronize all data from the master (although this is done
in the background).

NOTE: If you join a new node, existing quarantined items from the
other nodes are not synchronized to the new node.


Deleting Nodes
~~~~~~~~~~~~~~

Please detach nodes from the cluster network before removing them
from the cluster configuration. Then run the following command on
the master node:

----
pmgcm delete <cid>
----

Parameter `<cid>` is the unique cluster node ID, as listed with
`pmgcm status`.

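
For example, to remove a node which `pmgcm status` lists with CID `2`
(an illustrative value), run the following on the master:

----
pmgcm delete 2
----
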
Disaster Recovery
~~~~~~~~~~~~~~~~~

It is highly recommended to use redundant disks on all cluster nodes
(RAID). So in almost all circumstances, you just need to replace the
damaged hardware or disk. {pmg} uses an asynchronous
clustering algorithm, so you just need to reboot the repaired node,
and everything will work again transparently.

The following scenarios only apply when you really lose the contents
of the hard disk.


Single Node Failure
^^^^^^^^^^^^^^^^^^^

* delete failed node on master
+
----
pmgcm delete <cid>
----

* add (re-join) a new node (a combined example follows below)
+
----
pmgcm join <master_ip>
----

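
For example, if the failed node was listed with CID `2` and the master
has IP `192.168.2.127` (both values are only illustrations):

----
# on the master: remove the failed node from the cluster configuration
pmgcm delete 2

# on the freshly installed replacement node: join the cluster again
pmgcm join 192.168.2.127
----
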
Master Failure
^^^^^^^^^^^^^^

* force another node to be the master
+
----
pmgcm promote
----

* tell the other nodes that the master has changed (combined example below)
+
----
pmgcm sync --master_ip <master_ip>
----

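
Putting both steps together, an illustrative sequence could look like
this, assuming the surviving node which should take over has the IP
address `192.168.2.128` (a placeholder value):

----
# on the node that should become the new master
pmgcm promote

# on all remaining cluster nodes
pmgcm sync --master_ip 192.168.2.128
----
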
Total Cluster Failure
^^^^^^^^^^^^^^^^^^^^^

* restore the backup (cluster and node information is not restored;
  you have to recreate the master and the nodes)

* tell it to become the master
+
----
pmgcm create
----

* install new nodes

* add those new nodes to the cluster
+
----
pmgcm join <master_ip>
----


ifdef::manvolnum[]
include::pmg-copyright.adoc[]
endif::manvolnum[]