==================
 Network Protocol
==================

This file describes the network protocol used by Ceph. To understand how the
structures are defined, first read the introduction of
:doc:`/dev/network-encoding`.

Hello
=====

The protocol starts with a handshake that confirms that both nodes speak the
Ceph protocol and exchanges some basic information.

Banner
------

The first action is the server sending its banner to the client. The banner is
defined as ``CEPH_BANNER`` in ``src/include/msgr.h``. It is followed by the
server's address and then the client's address, each encoded as an
``entity_addr_t``.

Once the client verifies that the server's banner matches its own, it replies
with its banner and its address.
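
The banner check can be sketched as follows. This is a minimal illustration,
not Ceph's implementation: the banner value is an assumption that should be
confirmed against ``CEPH_BANNER`` in your source tree, ``check_banner`` is a
hypothetical helper, and decoding of the ``entity_addr_t`` values that follow
the banner is left out.

```python
# Assumption: the value of CEPH_BANNER for this protocol version; confirm it
# against src/include/msgr.h before relying on it.
CEPH_BANNER = b"ceph v027"


def check_banner(received: bytes) -> bytes:
    """Verify the peer's banner and return the bytes that follow it.

    Hypothetical helper, for illustration only.
    """
    if len(received) < len(CEPH_BANNER):
        raise ValueError("short read: banner incomplete")
    if received[:len(CEPH_BANNER)] != CEPH_BANNER:
        raise ValueError("peer banner does not match ours")
    # The remainder holds the encoded entity_addr_t values (not decoded here).
    return received[len(CEPH_BANNER):]
```

A mismatch is treated as fatal here because the banner is the only evidence
that the peer speaks the same protocol.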

Connect
-------

Once the banners have been verified and the addresses exchanged the connection
negotiation begins. First the client sends a ``ceph_msg_connect`` structure
with its information.

::

    // From src/include/msgr.h
    struct ceph_msg_connect {
        u64le features;            // Supported features (CEPH_FEATURE_*)
        u32le host_type;           // CEPH_ENTITY_TYPE_*
        u32le global_seq;          // Number of connections initiated by this host.
        u32le connect_seq;         // Number of connections initiated in this session.
        u32le protocol_version;
        u32le authorizer_protocol;
        u32le authorizer_len;
        u8    flags;               // CEPH_MSG_CONNECT_*
        u8    authorizer[authorizer_len];
    }
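
As a rough illustration of the layout above, the fixed-size part of
``ceph_msg_connect`` can be packed with Python's ``struct`` module. The sketch
assumes the fields are laid out little-endian with no padding, exactly in the
order listed; ``pack_connect`` is a hypothetical helper and the argument
values in the usage line are placeholders, not real feature bits or protocol
versions.

```python
import struct

# Little-endian, unpadded: features, host_type, global_seq, connect_seq,
# protocol_version, authorizer_protocol, authorizer_len, flags.
CONNECT_FMT = "<QIIIIIIB"
CONNECT_FIXED_LEN = struct.calcsize(CONNECT_FMT)  # 33 bytes


def pack_connect(features, host_type, global_seq, connect_seq,
                 protocol_version, authorizer_protocol, flags,
                 authorizer=b""):
    """Build a connect structure; authorizer_len is derived from the payload."""
    fixed = struct.pack(CONNECT_FMT, features, host_type, global_seq,
                        connect_seq, protocol_version, authorizer_protocol,
                        len(authorizer), flags)
    return fixed + authorizer


# Placeholder values, for illustration only.
example = pack_connect(0, 8, 1, 0, 1, 0, 0, authorizer=b"")
```

Deriving ``authorizer_len`` from the payload rather than passing it separately
keeps the length field and the trailing bytes from disagreeing.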

Connect Reply
-------------

Once the connect structure has been sent, the connection is effectively open;
however, the first message the server sends must be a connect reply.

::

    struct ceph_msg_connect_reply {
        u8    tag;                 // Tag indicating response code.
        u64le features;
        u32le global_seq;
        u32le connect_seq;
        u32le protocol_version;
        u32le authorizer_len;
        u8    flags;
        u8    authorizer[authorizer_len];
    }
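
Reading the reply is the mirror image: parse the fixed-size fields, then use
``authorizer_len`` to find the end of the structure. The following sketch
assumes an unpadded little-endian layout in the order shown above;
``ConnectReply`` and ``unpack_connect_reply`` are hypothetical names.

```python
import struct
from collections import namedtuple

# Little-endian, unpadded: tag, features, global_seq, connect_seq,
# protocol_version, authorizer_len, flags.
REPLY_FMT = "<BQIIIIB"
REPLY_FIXED_LEN = struct.calcsize(REPLY_FMT)  # 26 bytes

ConnectReply = namedtuple(
    "ConnectReply",
    "tag features global_seq connect_seq protocol_version "
    "authorizer_len flags authorizer",
)


def unpack_connect_reply(buf: bytes):
    """Return (reply, bytes_consumed); raise ValueError on a short buffer."""
    if len(buf) < REPLY_FIXED_LEN:
        raise ValueError("short read: fixed part incomplete")
    fields = struct.unpack_from(REPLY_FMT, buf, 0)
    end = REPLY_FIXED_LEN + fields[5]  # fields[5] is authorizer_len
    if len(buf) < end:
        raise ValueError("short read: authorizer incomplete")
    return ConnectReply(*fields, bytes(buf[REPLY_FIXED_LEN:end])), end
```

Returning the number of bytes consumed lets the caller continue parsing the
tagged messages that follow the reply on the same stream.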

MSGR Protocol
=============

This is a low-level protocol over which messages are delivered. Messages at
this level consist of a tag byte, which identifies the type of message,
followed by the message data.

::

    // Virtual structure.
    struct {
        u8 tag;    // CEPH_MSGR_TAG_*
        u8 data[]; // Length depends on tag and data.
    }

The length of ``data`` is determined by the tag byte and, depending on the
message type, by information in the ``data`` array itself.

.. note::
    There is no way to determine the length of the message if you do not
    understand the type of message.

The message tags are defined in ``src/include/msgr.h``; the current ones are
listed below along with the data they carry. Note that the ``ceph_msgr_*``
structures shown below do not exist in the source; they merely represent the
protocol.
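
One way to picture the length rule is a lookup from tag to payload size, with
``CEPH_MSGR_TAG_MSG`` as the case whose length comes from the data itself. The
sketch below uses the tag values listed in this document; the 8-byte size for
``utime_t`` (two ``u32le`` values) is an assumption, and ``fixed_data_len`` is
a hypothetical helper.

```python
# Tag values as listed in this document.
TAG_CLOSE = 0x06
TAG_MSG = 0x07
TAG_ACK = 0x08
TAG_KEEPALIVE = 0x09
TAG_KEEPALIVE2 = 0x0E
TAG_KEEPALIVE2_ACK = 0x0F

# Payload sizes in bytes for tags whose data has a fixed length.
# Assumption: utime_t is encoded as two u32le values (8 bytes).
FIXED_DATA_LEN = {
    TAG_CLOSE: 0,
    TAG_ACK: 8,            # u64le seq
    TAG_KEEPALIVE: 0,
    TAG_KEEPALIVE2: 8,     # utime_t timestamp
    TAG_KEEPALIVE2_ACK: 8, # utime_t timestamp
}


def fixed_data_len(tag: int):
    """Return the payload length for a tag, or None for TAG_MSG.

    TAG_MSG is variable-length: its size is derived from ceph_msg_header.
    An unknown tag is an error because its length cannot be determined.
    """
    if tag == TAG_MSG:
        return None
    try:
        return FIXED_DATA_LEN[tag]
    except KeyError:
        raise ValueError(f"unknown tag 0x{tag:02x}: cannot determine length")
```

Raising on an unknown tag reflects the note above: without understanding the
tag, a reader cannot skip the message and resynchronize.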

CEPH_MSGR_TAG_CLOSE (0x06)
--------------------------

::

    struct ceph_msgr_close {
        u8 tag = 0x06;
        u8 data[0]; // No data.
    }

The close message indicates that the connection is being closed.

CEPH_MSGR_TAG_MSG (0x07)
------------------------

::

    struct ceph_msgr_msg {
        u8 tag = 0x07;
        ceph_msg_header header;
        u8 front [header.front_len ];
        u8 middle[header.middle_len];
        u8 data  [header.data_len  ];
        ceph_msg_footer footer;
    }

    // From src/include/msgr.h
    struct ceph_msg_header {
        u64le seq;       // Sequence number.
        u64le tid;       // Transaction ID.
        u16le type;      // Message type (CEPH_MSG_* or MSG_*).
        u16le priority;  // Priority (higher is more important).
        u16le version;   // Version of message encoding.

        u32le front_len;  // The size of the front section.
        u32le middle_len; // The size of the middle section.
        u32le data_len;   // The size of the data section.
        u16le data_off;   // The way data should be aligned by the receiver.

        ceph_entity_name src; // Information about the sender.

        u16le compat_version; // Oldest compatible encoding version.
        u16le reserved;       // Unused.
        u32le crc;            // CRC of header.
    }

    // From src/include/msgr.h
    struct ceph_msg_footer {
        u32le front_crc;  // Checksums of the various sections.
        u32le middle_crc; //
        u32le data_crc;   //
        u64le sig;        // Cryptographic signature.
        u8    flags;
    }

Messages are the business logic of Ceph. They are used to send data and
requests between nodes. The message header contains the length of the message,
so unknown messages can be handled gracefully.

There are two families of message type constants, ``CEPH_MSG_*`` and
``MSG_*``. The only difference between the two is that the first is considered
"public" while the second is for internal use only. There is no protocol-level
difference.
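
To show how the header makes unknown message types skippable, the sketch below
computes the total size of a ``CEPH_MSGR_TAG_MSG`` frame from its header
alone. It assumes an unpadded little-endian layout and that
``ceph_entity_name`` is a ``u8`` type followed by a ``u64le`` number (9
bytes), which this document does not spell out; ``section_lengths`` and
``frame_length`` are hypothetical helpers, and no CRC or signature is checked.

```python
import struct

# seq, tid, type, priority, version, front_len, middle_len, data_len,
# data_off, src.type, src.num, compat_version, reserved, crc.
# Assumption: ceph_entity_name src is u8 type + u64le num.
HEADER_FMT = "<QQHHHIIIHBQHHI"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 53 bytes

# front_crc, middle_crc, data_crc, sig, flags.
FOOTER_FMT = "<IIIQB"
FOOTER_LEN = struct.calcsize(FOOTER_FMT)  # 21 bytes


def section_lengths(header: bytes):
    """Return (front_len, middle_len, data_len) from a packed header."""
    fields = struct.unpack(HEADER_FMT, header)
    return fields[5], fields[6], fields[7]


def frame_length(header: bytes) -> int:
    """Total bytes on the wire: tag + header + three sections + footer."""
    return 1 + HEADER_LEN + sum(section_lengths(header)) + FOOTER_LEN
```

A receiver that does not recognize ``type`` can still read ``frame_length``
bytes and move on to the next tag.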

CEPH_MSGR_TAG_ACK (0x08)
------------------------

::

    struct ceph_msgr_ack {
        u8    tag = 0x08;
        u64le seq; // The sequence number of the message being acknowledged.
    }

CEPH_MSGR_TAG_KEEPALIVE (0x09)
------------------------------

::

    struct ceph_msgr_keepalive {
        u8 tag = 0x09;
        u8 data[0]; // No data.
    }

CEPH_MSGR_TAG_KEEPALIVE2 (0x0E)
-------------------------------

::

    struct ceph_msgr_keepalive2 {
        u8      tag = 0x0E;
        utime_t timestamp;
    }

CEPH_MSGR_TAG_KEEPALIVE2_ACK (0x0F)
-----------------------------------

::

    struct ceph_msgr_keepalive2_ack {
        u8      tag = 0x0F;
        utime_t timestamp;
    }
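
The two keepalive frames share one layout, so a single format string can
serve both. The sketch below rests on two assumptions that this document does
not state: that ``utime_t`` is encoded as ``u32le`` seconds followed by
``u32le`` nanoseconds, and that the acknowledgement echoes the timestamp it
received. ``pack_keepalive2`` and ``ack_for`` are hypothetical helpers.

```python
import struct

TAG_KEEPALIVE2 = 0x0E
TAG_KEEPALIVE2_ACK = 0x0F

# Assumption: utime_t = u32le seconds + u32le nanoseconds.
KEEPALIVE2_FMT = "<BII"


def pack_keepalive2(sec: int, nsec: int) -> bytes:
    """Build a KEEPALIVE2 frame carrying the sender's timestamp."""
    return struct.pack(KEEPALIVE2_FMT, TAG_KEEPALIVE2, sec, nsec)


def ack_for(frame: bytes) -> bytes:
    """Build the KEEPALIVE2_ACK for a frame, echoing its timestamp.

    Assumption: the ack carries the same timestamp as the keepalive.
    """
    tag, sec, nsec = struct.unpack(KEEPALIVE2_FMT, frame)
    if tag != TAG_KEEPALIVE2:
        raise ValueError(f"not a KEEPALIVE2 frame: tag 0x{tag:02x}")
    return struct.pack(KEEPALIVE2_FMT, TAG_KEEPALIVE2_ACK, sec, nsec)
```

If the timestamp is echoed as assumed, the original sender can compare it with
its own clock to estimate round-trip time without keeping per-frame state.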

.. vi: textwidth=80 noexpandtab