pmgcm.adoc

   1 [[chapter_pmgcm]]
   2 ifdef::manvolnum[]
   3 pmgcm(1)
   4 ========
   5 :pmg-toplevel:
   6
   7 NAME
   8 ----
   9
  10 pmgcm - Proxmox Mail Gateway Cluster Management Toolkit
  11
  12
  13 SYNOPSIS
  14 --------
  15
  16 include::pmgcm.1-synopsis.adoc[]
  17
  18
  19 DESCRIPTION
  20 -----------
  21 endif::manvolnum[]
  22 ifndef::manvolnum[]
  23 Cluster Management
  24 ==================
  25 :pmg-toplevel:
  26 endif::manvolnum[]
  27
  28 Email is becoming more and more important, and failures in email
  29 systems are not acceptable. To meet these requirements, we developed
  30 the Proxmox HA (High Availability) Cluster.
  31
  32 The {pmg} HA Cluster consists of a master and several slave nodes
  33 (minimum one slave node). Configuration is done on the master.
  34 Configuration and data are synchronized to all cluster nodes over a
  35 VPN tunnel. This provides the following advantages:
  36
  37 * centralized configuration management
  38
  39 * fully redundant data storage
  40
  41 * high availability
  42
  43 * high performance
  44
  45 We use a unique application level clustering scheme, which provides
  46 extremely good performance. Special considerations were taken to make
  47 management as easy as possible. Complete Cluster setup is done within
  48 minutes, and nodes automatically reintegrate after temporary failures
  49 without any operator interaction.
  50
  51 image::images/Proxmox_HA_cluster_final_1024.png[]
  52
  53
  54 Hardware requirements
  55 ---------------------
  56
  57 There are no special hardware requirements, although it is highly
  58 recommended to use fast and reliable servers with redundant disks on
  59 all cluster nodes (Hardware RAID with BBU and write cache enabled).
  60
  61 The HA Cluster can also run in virtualized environments.
  62
  63
  64 Subscriptions
  65 -------------
  66
  67 Each host in a cluster has its own subscription. If you want support
  68 for a cluster, each cluster node needs to have a valid
  69 subscription. All nodes must have the same subscription level.
  70
  71
  72 Load balancing
  73 --------------
  74
  75 It is usually advisable to distribute mail traffic among all cluster
  76 nodes. Please note that this is not always required, because it is
  77 also reasonable to use only one node to handle SMTP traffic. In that
  78 case, the second node is used as a quarantine host and only provides
  79 the web interface to the user quarantine.
  80
  81 The normal mail delivery process looks up DNS Mail Exchange (`MX`)
  82 records to determine the destination host. An `MX` record tells the
  83 sending system where to deliver mail for a certain domain. It is also
  84 possible to have several `MX` records for a single domain, and they can
  85 have different priorities. For example, our `MX` record looks like this:
  86
  87 ----
  88 # dig -t mx proxmox.com
  89
  90 ;; ANSWER SECTION:
  91 proxmox.com.            22879   IN      MX      10 mail.proxmox.com.
  92
  93 ;; ADDITIONAL SECTION:
  94 mail.proxmox.com.       22879   IN      A       213.129.239.114
  95 ----
  96
  97 Note that there is a single `MX` record for the domain
  98 `proxmox.com`, pointing to `mail.proxmox.com`. The `dig` command
  99 automatically prints the corresponding address record if it
 100 exists. In our case it points to `213.129.239.114`. The priority of
 101 our `MX` record is set to 10 (preferred default value).
 102
 103
 104 Hot standby with backup `MX` records
 105 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 106
 107 Many people do not want to install two redundant mail proxies. Instead,
 108 they use the mail proxy of their ISP as a fallback. This is simply done
 109 by adding an additional `MX` record with a lower priority (higher
 110 number). With the example above, the additional record looks like this:
 111
 112 ----
 113 proxmox.com.            22879   IN      MX      100 mail.provider.tld.
 114 ----
 115
 116 Of course, your provider must accept mail for your domain and forward
 117 received mail to you. Please note that such a setup is not really
 118 advisable, because spam detection needs to be done by that backup `MX`
 119 server as well, and external servers provided by ISPs usually don't do
 120 that.
 121
 122 You will never lose mail with such a setup, because the sending Mail
 123 Transport Agent (MTA) will simply deliver the mail to the backup
 124 server (mail.provider.tld) if the primary server (mail.proxmox.com) is
 125 not available.
 126
 127 NOTE: Any reasonable mail server retries mail delivery if the target
 128 server is not available. For example, {pmg} stores mail and retries delivery
 129 for up to one week. So you will not lose mail if your mail server is
 130 down, even if you run a single server setup.
 131
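As a rough illustration of the order in which a sending MTA tries the two records from this section, the following sketch sorts them by priority. It only parses sample text in zone-file format and does not query DNS. The variable `mx_records` and the function `mx_by_priority` are names made up for this example.

```shell
#!/bin/sh
# Sample MX records in zone-file format, taken from the examples above.
mx_records='proxmox.com.            22879   IN      MX      100 mail.provider.tld.
proxmox.com.            22879   IN      MX      10 mail.proxmox.com.'

# Print "<priority> <host>" for each MX record. The lowest number
# (the host that is tried first) ends up on the first line.
mx_by_priority() {
    awk '$3 == "IN" && $4 == "MX" { print $5, $6 }' | sort -n
}

printf '%s\n' "$mx_records" | mx_by_priority
```

With the sample records, `mail.proxmox.com.` is listed first and the backup server `mail.provider.tld.` second.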
 132
 133 Load balancing with `MX` records
 134 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 135
 136 Using your ISP's mail server is not always a good idea, because many
 137 ISPs do not use advanced spam prevention techniques, or do not filter
 138 spam at all. It is often better to run a second server yourself to
 139 avoid lower spam detection rates.
 140
 141 In any case, it is quite simple to set up a high performance load balanced
 142 mail cluster using `MX` records. You just need to define two `MX` records
 143 with the same priority. The following complete example makes this
 144 clearer.
 145
 146 First, you need to have at least 2 working {pmg} servers
 147 (mail1.example.com and mail2.example.com) configured as a cluster (see
 148 section xref:pmg_cluster_administration[Cluster administration]
 149 below), each having its own IP address. Let us assume the following
 150 addresses (DNS address records):
 151
 152 ----
 153 mail1.example.com.       22879   IN      A       1.2.3.4
 154 mail2.example.com.       22879   IN      A       1.2.3.5
 155 ----
 156
 157 It is always a good idea to add reverse lookup entries (PTR
 158 records) for those hosts. Many email systems nowadays reject mails
 159 from hosts without valid PTR records. Then you need to define your `MX`
 160 records:
 161
 162 ----
 163 example.com.            22879   IN      MX      10 mail1.example.com.
 164 example.com.            22879   IN      MX      10 mail2.example.com.
 165 ----
 166
 167 This is all you need. You will receive mails on both hosts, more or
 168 less load-balanced using round-robin scheduling. If one host fails, the
 169 other is used.
 170
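A quick way to confirm that a set of `MX` records is set up for load balancing rather than fallback is to check that they all share one priority. The sketch below does this for the sample records from this section. It works on pasted text only, and the names `mx_records` and `distinct_priorities` are assumptions made for this example.

```shell
#!/bin/sh
# Sample MX records from the example above (zone-file format).
mx_records='example.com.            22879   IN      MX      10 mail1.example.com.
example.com.            22879   IN      MX      10 mail2.example.com.'

# Count the distinct priority values among the MX records.
# Exactly one distinct value means all hosts are equally preferred.
distinct_priorities() {
    awk '$4 == "MX" { print $5 }' | sort -un | wc -l | tr -d ' '
}

count=$(printf '%s\n' "$mx_records" | distinct_priorities)
if [ "$count" -eq 1 ]; then
    echo "balanced: all MX records share one priority"
else
    echo "not balanced: $count different priorities"
fi
```

If the hot standby record with priority 100 were added to the input, the count would be 2 and the script would report the setup as not balanced.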
 171
 172 Other ways
 173 ~~~~~~~~~~
 174
 175 Multiple address records
 176 ^^^^^^^^^^^^^^^^^^^^^^^^
 177
 178 Using several DNS `MX` records is sometimes clumsy if you have many
 179 domains. It is also possible to use one `MX` record per domain, but
 180 multiple address records:
 181
 182 ----
 183 example.com.            22879   IN      MX      10 mail.example.com.
 184 mail.example.com.       22879   IN      A       1.2.3.4
 185 mail.example.com.       22879   IN      A       1.2.3.5
 186 ----
 187
 188
 189 Using firewall features
 190 ^^^^^^^^^^^^^^^^^^^^^^^
 191
 192 Many firewalls can do some kind of RR-Scheduling (round-robin) when
 193 using DNAT. See your firewall manual for more details.
 194
 195
 196 [[pmg_cluster_administration]]
 197 Cluster administration
 198 ----------------------
 199
 200 Cluster administration can be done on the GUI or using the command
 201 line utility `pmgcm`. The CLI tool is a bit more verbose, so we suggest
 202 using it if you run into problems.
 203
 204 NOTE: Always set up the IP configuration before adding a node to the
 205 cluster. IP address, network mask, gateway address and hostname cannot
 206 be changed later.
 207
 208 Creating a Cluster
 209 ~~~~~~~~~~~~~~~~~~
 210
 211 image::images/screenshot/pmg-gui-cluster-panel.png[]
 212
 213 You can create a cluster from any existing Proxmox host. All data is
 214 preserved.
 215
 216 * make sure you have the right IP configuration
 217   (IP/MASK/GATEWAY/HOSTNAME), because you cannot change that later
 218
 219 * press the create button on the GUI, or run the cluster creation command:
 220 +
 221 ----
 222 pmgcm create
 223 ----
 224
 225 NOTE: The node where you run the cluster create command will be the
 226 'master' node.
 227
 228
 229 Show Cluster Status
 230 ~~~~~~~~~~~~~~~~~~~
 231
 232 The GUI shows the status of all cluster nodes, and it is also possible
 233 to use the command line tool:
 234
 235 ----
 236 pmgcm status
 237 --NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
 238 pmg5(1)              192.168.2.127   master A       1 day 21:18   0.30    80%    41%
 239 ----
 240
 241
 242 [[pmgcm_join]]
 243 Adding Cluster Nodes
 244 ~~~~~~~~~~~~~~~~~~~~
 245
 246 image::images/screenshot/pmg-gui-cluster-join.png[]
 247
 248 When you add a new node to a cluster (join), all data on that node is
 249 destroyed. The whole database is initialized with cluster data from
 250 the master.
 251
 252 * make sure you have the right IP configuration
 253
 254 * run the cluster join command (on the new node):
 255 +
 256 ----
 257 pmgcm join <master_ip>
 258 ----
 259
 260 You need to enter the root password of the master host when asked for
 261 a password. When joining a cluster using the GUI, you also need to
 262 enter the 'fingerprint' of the master node. You get that information
 263 by pressing the `Add` button on the master node.
 264
 265 CAUTION: Node initialization deletes all existing databases, stops and
 266 then restarts all services accessing the database. Therefore, do not add
 267 nodes which are already active and receiving mail.
 268
 269 Also, joining a cluster can take several minutes, because the new node
 270 needs to synchronize all data from the master (although this is done
 271 in the background).
 272
 273 NOTE: If you join a new node, existing quarantined items from the other nodes are not synchronized to the new node.
 274
 275
 276 Deleting Nodes
 277 ~~~~~~~~~~~~~~
 278
 279 Please detach nodes from the cluster network before removing them
 280 from the cluster configuration. Then run the following command on
 281 the master node:
 282
 283 ----
 284 pmgcm delete <cid>
 285 ----
 286
 287 Parameter `<cid>` is the unique cluster node ID, as listed with `pmgcm status`.
 288
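To look up the `<cid>` of a node by name, the first column of the status listing can be split at the parentheses. The following sketch parses the sample output shown in the status section. It assumes that every node line starts with `name(cid)` as in that sample, and the names `status_output` and `cid_of` are made up for this example.

```shell
#!/bin/sh
# Sample `pmgcm status` output, copied from the status example above.
# On a cluster node this text would come from running `pmgcm status`.
status_output='--NAME(CID)--------------IPADDRESS----ROLE-STATE---------UPTIME---LOAD----MEM---DISK
pmg5(1)              192.168.2.127   master A       1 day 21:18   0.30    80%    41%'

# Print the CID of the node whose name is given as the first argument.
# The first column has the form "name(cid)"; the header line is skipped.
cid_of() {
    awk -v name="$1" 'NR > 1 {
        split($1, parts, /[()]/)    # "pmg5(1)" -> parts[1]="pmg5", parts[2]="1"
        if (parts[1] == name) print parts[2]
    }'
}

printf '%s\n' "$status_output" | cid_of pmg5
```

For a node name that does not appear in the listing, the function prints nothing, so an empty result should be treated as "node not found" before passing it to `pmgcm delete`.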
 289
 290 Disaster Recovery
 291 ~~~~~~~~~~~~~~~~~
 292
 293 It is highly recommended to use redundant disks on all cluster nodes
 294 (RAID). In almost all circumstances, you then just need to replace the
 295 damaged hardware or disk. {pmg} uses an asynchronous
 296 clustering algorithm, so you just need to reboot the repaired node,
 297 and everything will work again transparently.
 298
 299 The following scenarios only apply when you really lose the contents
 300 of the hard disk.
 301
 302
 303 Single Node Failure
 304 ^^^^^^^^^^^^^^^^^^^
 305
 306 * delete failed node on master
 307 +
 308 ----
 309 pmgcm delete <cid>
 310 ----
 311
 312 * add (re-join) a new node
 313 +
 314 ----
 315 pmgcm join <master_ip>
 316 ----
 317
 318
 319 Master Failure
 320 ^^^^^^^^^^^^^^
 321
 322 * force another node to be master
 323 +
 324 -----
 325 pmgcm promote
 326 -----
 327
 328 * tell other nodes that master has changed
 329 +
 330 ----
 331 pmgcm sync --master_ip <master_ip>
 332 ----
 333
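The two steps above can be written down as a small checklist generator. This sketch only prints the commands and does not run them. It assumes that `pmgcm promote` is run on the node that should become master, that `pmgcm sync` is run on each remaining node, and that `<master_ip>` is the address of the promoted node. The function name `master_failover_plan` and the IP addresses are placeholders for this example.

```shell
#!/bin/sh
# Print (do not execute) the recovery commands for a failed master.
# Arguments: IP of the node to promote, followed by the IPs of all other
# surviving nodes. The addresses used below are documentation placeholders.
master_failover_plan() {
    new_master=$1
    shift
    echo "on $new_master: pmgcm promote"
    for node in "$@"; do
        echo "on $node: pmgcm sync --master_ip $new_master"
    done
}

master_failover_plan 192.0.2.11 192.0.2.12 192.0.2.13
```

Printing the plan first makes it easy to review which node is promoted before anything is changed on the cluster.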
 334
 335 Total Cluster Failure
 336 ^^^^^^^^^^^^^^^^^^^^^
 337
 338 * restore backup (cluster and node information is not restored; you
 339   have to recreate the master and nodes)
 340
 341 * tell it to become master
 342 +
 343 ----
 344 pmgcm create
 345 ----
 346
 347 * install new nodes
 348
 349 * add those new nodes to the cluster
 350 +
 351 ----
 352 pmgcm join <master_ip>
 353 ----
 354
 355
 356 ifdef::manvolnum[]
 357 include::pmg-copyright.adoc[]
 358 endif::manvolnum[]