===
OSD
===

Concepts
--------

*Messenger*
See src/msg/Messenger.h

Handles the sending and receiving of messages on behalf of the OSD. The OSD
uses two messengers:

1. cluster_messenger - handles traffic to other OSDs and monitors
2. client_messenger - handles client traffic

This division allows the OSD to be configured with separate network
interfaces for client and cluster traffic.
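
In a deployment this split is what the ``public network`` and ``cluster
network`` options in ceph.conf typically map onto. The following is a minimal
sketch, not the actual Messenger API from src/msg/Messenger.h; the addresses
are made up, and the point is only that the two messengers are independent
objects, so each can bind to its own interface:

.. code-block:: cpp

    #include <memory>
    #include <string>
    #include <utility>

    struct Messenger {                    // stand-in for the real class
      explicit Messenger(std::string addr) : bind_addr(std::move(addr)) {}
      std::string bind_addr;              // address this endpoint listens on
    };

    struct OSD {
      // client traffic on the public network
      std::unique_ptr<Messenger> client_messenger =
          std::make_unique<Messenger>("10.0.0.5:6800");     // hypothetical address
      // traffic to other OSDs and monitors on a separate cluster network
      std::unique_ptr<Messenger> cluster_messenger =
          std::make_unique<Messenger>("192.168.1.5:6802");  // hypothetical address
    };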

*Dispatcher*
See src/msg/Dispatcher.h

OSD implements the Dispatcher interface. Of particular note is ms_dispatch,
which serves as the entry point for messages received via either the client
or cluster messenger. Because there are two messengers, ms_dispatch may be
called from at least two threads. The osd_lock is always held during
ms_dispatch.
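
A schematic sketch of that arrangement, assuming a stripped-down Dispatcher
interface (the real one in src/msg/Dispatcher.h has many more hooks, and the
real osd_lock is not a bare std::mutex):

.. code-block:: cpp

    #include <mutex>

    struct Message;                              // opaque here

    struct Dispatcher {
      virtual ~Dispatcher() = default;
      virtual bool ms_dispatch(Message *m) = 0;  // entry point for received messages
    };

    class OSD : public Dispatcher {
      std::mutex osd_lock;   // serializes dispatch from both messenger threads
    public:
      bool ms_dispatch(Message *m) override {
        // May be called by the client messenger's thread and by the cluster
        // messenger's thread, so the lock is held for the whole dispatch.
        std::lock_guard<std::mutex> l(osd_lock);
        // ... route m to the appropriate handler ...
        return true;
      }
    };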

*WorkQueue*
See src/common/WorkQueue.h

The WorkQueue class abstracts the process of queueing independent tasks
for asynchronous execution. Each OSD process contains workqueues for
distinct tasks (a simplified sketch follows the list below):

1. OpWQ: handles ops (from clients) and subops (from other OSDs).
   Runs in the op_tp threadpool.
2. PeeringWQ: handles peering tasks and pg map advancement.
   Runs in the op_tp threadpool.
   See Peering
3. CommandWQ: handles commands (pg query, etc).
   Runs in the command_tp threadpool.
4. RecoveryWQ: handles recovery tasks.
   Runs in the recovery_tp threadpool.
5. SnapTrimWQ: handles snap trimming.
   Runs in the disk_tp threadpool.
   See SnapTrimmer
6. ScrubWQ: handles the primary scrub path.
   Runs in the disk_tp threadpool.
   See Scrub
7. ScrubFinalizeWQ: handles primary scrub finalize.
   Runs in the disk_tp threadpool.
   See Scrub
8. RepScrubWQ: handles the replica scrub path.
   Runs in the disk_tp threadpool.
   See Scrub
9. RemoveWQ: asynchronously removes old pg directories.
   Runs in the disk_tp threadpool.
   See PGRemoval
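
The sketch below is a toy illustration of the split between queueing and
execution, not the implementation in src/common/WorkQueue.h: items are queued
by one thread and later run asynchronously by the pool's worker threads, much
as ops queued on OpWQ are executed by op_tp. In the real code a single
ThreadPool can serve several WorkQueues (for example, op_tp serves both OpWQ
and PeeringWQ), whereas this sketch folds one queue into the pool for brevity.

.. code-block:: cpp

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
      std::mutex m;
      std::condition_variable cv;
      std::queue<std::function<void()>> q;   // the "work queue"
      std::vector<std::thread> workers;
      bool stopping = false;
    public:
      explicit ThreadPool(int nthreads) {
        for (int i = 0; i < nthreads; ++i)
          workers.emplace_back([this] {
            for (;;) {
              std::function<void()> item;
              {
                std::unique_lock<std::mutex> l(m);
                cv.wait(l, [this] { return stopping || !q.empty(); });
                if (stopping && q.empty())
                  return;                    // pool is shutting down
                item = std::move(q.front());
                q.pop();
              }
              item();                        // process one work item
            }
          });
      }
      void queue(std::function<void()> item) {
        {
          std::lock_guard<std::mutex> l(m);
          q.push(std::move(item));
        }
        cv.notify_one();
      }
      ~ThreadPool() {
        {
          std::lock_guard<std::mutex> l(m);
          stopping = true;
        }
        cv.notify_all();
        for (auto &t : workers)
          t.join();
      }
    };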

*ThreadPool*
See src/common/WorkQueue.h
See also above.

There are 4 OSD threadpools:

1. op_tp: handles ops and subops
2. recovery_tp: handles recovery tasks
3. disk_tp: handles disk-intensive tasks
4. command_tp: handles commands

*OSDMap*
See src/osd/OSDMap.h

The crush algorithm takes two inputs: a picture of the cluster
with status information about which nodes are up/down and in/out,
and the pgid to place. The former is encapsulated by the OSDMap.
Maps are numbered by *epoch* (epoch_t). These maps are passed around
within the OSD as std::tr1::shared_ptr<const OSDMap>.

See MapHandling
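
A minimal sketch of that sharing pattern, using std::shared_ptr in place of
the std::tr1 alias and a made-up OSDMap body and helper; the point is that
each epoch is an immutable snapshot, and a new epoch is produced as a fresh
object rather than by mutating the one already in use:

.. code-block:: cpp

    #include <memory>

    using epoch_t = unsigned int;

    struct OSDMap {
      epoch_t epoch = 0;
      // ... up/down and in/out state for each OSD, pool definitions, etc. ...
    };

    // const: once published, the map for a given epoch is never modified
    using OSDMapRef = std::shared_ptr<const OSDMap>;

    OSDMapRef advance_map(const OSDMapRef &cur) {
      auto next = std::make_shared<OSDMap>(*cur);  // copy the current snapshot
      next->epoch = cur->epoch + 1;                // ... and bump the epoch
      return next;                                 // converts to shared_ptr<const OSDMap>
    }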

*PG*
See src/osd/PG.* src/osd/PrimaryLogPG.*

Objects in rados are hashed into *PGs* and *PGs* are placed via crush onto
OSDs. The PG structure is responsible for handling requests pertaining to
a particular *PG* as well as for maintaining relevant metadata and controlling
recovery.
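
A deliberately simplified sketch of that two-step placement; the real code
uses Ceph's object hash and a stable modulo to pick the PG, and the full
CRUSH algorithm (with failure domains and weights) to pick the OSDs, so both
helpers below are toy stand-ins:

.. code-block:: cpp

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    using pg_t = uint32_t;

    // Step 1: object name -> PG.  std::hash stands in for the real object hash.
    pg_t object_to_pg(const std::string &oid, uint32_t pg_num) {
      return static_cast<pg_t>(std::hash<std::string>{}(oid) % pg_num);
    }

    // Step 2: PG -> ordered list of OSDs, derived from the cluster map.
    // The round-robin below is only a placeholder for CRUSH.
    std::vector<int> pg_to_osds(pg_t pgid, int num_osds, int replicas) {
      std::vector<int> acting;
      for (int r = 0; r < replicas; ++r)
        acting.push_back(static_cast<int>((pgid + r) % num_osds));
      return acting;
    }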

*OSDService*
See src/osd/OSD.cc OSDService

The OSDService acts as a broker between PG threads and OSD state, allowing
PGs to perform actions using OSD services such as workqueues and messengers.
This is still a work in progress. Future cleanups will focus on moving such
state entirely from the OSD into the OSDService.
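
A sketch of that broker arrangement, with hypothetical member names that are
not the real interfaces in src/osd/OSD.cc; the point is only that PG code
holds a handle to the service object and goes through it for shared
facilities, rather than reaching into the OSD directly:

.. code-block:: cpp

    struct Messenger;     // shared facilities owned by the OSD process
    struct WorkQueues;

    struct OSDService {
      Messenger *cluster_messenger = nullptr;
      WorkQueues *queues = nullptr;
      // PGs call methods on this object instead of touching OSD members.
    };

    struct PG {
      OSDService *osd;    // the only handle a PG needs back into the OSD
      explicit PG(OSDService *s) : osd(s) {}
    };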

Overview
--------
See src/ceph_osd.cc

The OSD process represents one leaf device in the crush hierarchy. There
might be one OSD process per physical machine, or more than one if, for
example, the user configures one OSD instance per disk.