ORANGEFS
========

OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
for large storage problems faced by HPC, BigData, Streaming Video,
Genomics, and Bioinformatics.

OrangeFS, originally called PVFS, was first developed in 1993 by
Walt Ligon and Eric Blumer as a parallel file system for the Parallel
Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
of parallel programs.

OrangeFS features include:

* Distributes file data among multiple file servers
* Supports simultaneous access by multiple clients
* Stores file data and metadata on servers using a local file system
  and access methods
* Userspace implementation is easy to install and maintain
* Direct MPI support
* Stateless


MAILING LIST ARCHIVES
=====================

http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/


MAILING LIST SUBMISSIONS
========================

devel@lists.orangefs.org


DOCUMENTATION
=============

http://www.orangefs.org/documentation/


USERSPACE FILESYSTEM SOURCE
===========================

http://www.orangefs.org/download

OrangeFS versions prior to 2.9.3 are not compatible with the
upstream version of the kernel client.


RUNNING ORANGEFS ON A SINGLE SERVER
===================================

OrangeFS is usually run in large installations with multiple servers and
clients, but a complete filesystem can be run on a single machine for
development and testing.

On Fedora, install orangefs and orangefs-server.

    dnf -y install orangefs orangefs-server

There is an example server configuration file in
/etc/orangefs/orangefs.conf. Change localhost to your hostname if
necessary.

To generate a filesystem to run xfstests against, see below.

There is an example client configuration file in /etc/pvfs2tab. It is a
single line. Uncomment it and change the hostname if necessary. This
controls clients which use libpvfs2. This does not control the
pvfs2-client-core.

Create the filesystem.

    pvfs2-server -f /etc/orangefs/orangefs.conf

Start the server.

    systemctl start orangefs-server

Test the server.

    pvfs2-ping -m /pvfsmnt

Start the client. The module must be compiled in or loaded before this
point.

    systemctl start orangefs-client

Mount the filesystem.

    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt


BUILDING ORANGEFS ON A SINGLE SERVER
====================================

Where OrangeFS cannot be installed from distribution packages, it may be
built from source.

You can omit --prefix if you don't care that things are sprinkled around
in /usr/local. As of version 2.9.6, OrangeFS uses Berkeley DB by
default; we will probably be changing the default to LMDB soon.

    ./configure --prefix=/opt/ofs --with-db-backend=lmdb

    make

    make install

Create an orangefs config file.

    /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf

Create an /etc/pvfs2tab file.

    echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
        /etc/pvfs2tab

Create the mount point you specified in the tab file if needed.

    mkdir /pvfsmnt

Bootstrap the server.

    /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf

Start the server.

    /opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf

Now the server should be running. pvfs2-ls is a simple
test to verify that the server is running.

    /opt/ofs/bin/pvfs2-ls /pvfsmnt

If stuff seems to be working, load the kernel module and
turn on the client core.

    /opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core

Mount your filesystem.

    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt


RUNNING XFSTESTS
================

It is useful to use a scratch filesystem with xfstests. This can be
done with only one server.

Make a second copy of the FileSystem section in the server configuration
file, which is /etc/orangefs/orangefs.conf. Change the Name to scratch.
Change the ID to something other than the ID of the first FileSystem
section (2 is usually a good choice).

Then there are two FileSystem sections: orangefs and scratch.
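
The result might look like the following sketch. The directives inside a
FileSystem section depend on what pvfs2-genconfig emitted, so treat this
as illustrative only; the point is that the two sections differ in Name
and ID.

```
<FileSystem>
	Name orangefs
	ID <existing id>
	...
</FileSystem>
<FileSystem>
	Name scratch
	ID 2
	...
</FileSystem>
```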

This change should be made before creating the filesystem.

    pvfs2-server -f /etc/orangefs/orangefs.conf

To run xfstests, create /etc/xfsqa.config.

    TEST_DIR=/orangefs
    TEST_DEV=tcp://localhost:3334/orangefs
    SCRATCH_MNT=/scratch
    SCRATCH_DEV=tcp://localhost:3334/scratch

Then xfstests can be run.

    ./check -pvfs2


OPTIONS
=======

The following mount options are accepted:

acl
  Allow the use of Access Control Lists on files and directories.

intr
  Some operations between the kernel client and the user space
  filesystem can be interruptible, such as changes in debug levels
  and the setting of tunable parameters.

local_lock
  Enable POSIX locking from the perspective of "this" kernel. The
  default file_operations lock action is to return ENOSYS. POSIX
  locking kicks in if the filesystem is mounted with -o local_lock.
  Distributed locking is being worked on for the future.


DEBUGGING
=========

If you want the debug (GOSSIP) statements in a particular
source file (inode.c for example) to go to syslog:

    echo inode > /sys/kernel/debug/orangefs/kernel-debug

No debugging (the default):

    echo none > /sys/kernel/debug/orangefs/kernel-debug

Debugging from several source files:

    echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug

All debugging:

    echo all > /sys/kernel/debug/orangefs/kernel-debug

Get a list of all debugging keywords:

    cat /sys/kernel/debug/orangefs/debug-help


PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
============================================

OrangeFS is a user space filesystem and an associated kernel module.
We'll just refer to the user space part of OrangeFS as "userspace"
from here on out. OrangeFS descends from PVFS, and userspace code
still uses PVFS for function and variable names. Userspace typedefs
many of the important structures. Function and variable names in
the kernel module have been transitioned to "orangefs", and the
Linux kernel coding style avoids typedefs, so kernel module structures
that correspond to userspace structures are not typedefed.

The kernel module implements a pseudo device that userspace
can read from and write to. Userspace can also manipulate the
kernel module through the pseudo device with ioctl.

THE BUFMAP:

At startup userspace allocates two page-size-aligned (posix_memalign)
mlocked memory buffers; one is used for IO and one is used for readdir
operations. The IO buffer is 41943040 bytes and the readdir buffer is
4194304 bytes. Each buffer contains logical chunks, or partitions, and
a pointer to each buffer is added to its own PVFS_dev_map_desc structure
which also describes its total size, as well as the size and number of
the partitions.

A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
mapping routine in the kernel module with an ioctl. The structure is
copied from user space to kernel space with copy_from_user and is used
to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
then contains:

* refcnt - a reference counter
* desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
  partition size, which represents the filesystem's block size and
  is used for s_blocksize in super blocks.
* desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
  partitions in the IO buffer.
* desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
* total_size - the total size of the IO buffer.
* page_count - the number of 4096 byte pages in the IO buffer.
* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
  of kcalloced memory. This memory is used as an array of pointers
  to each of the pages in the IO buffer through a call to get_user_pages.
* desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
  bytes of kcalloced memory. This memory is further initialized:

      user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
      structure. user_desc->ptr points to the IO buffer.

        pages_per_desc = bufmap->desc_size / PAGE_SIZE
        offset = 0

        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
        bufmap->desc_array[0].array_count = pages_per_desc = 1024
        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
        offset += 1024
        .
        .
        .
        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
        bufmap->desc_array[9].array_count = pages_per_desc = 1024
        bufmap->desc_array[9].uaddr = (user_desc->ptr) + (9 * 1024 * 4096)
        offset += 1024

* buffer_index_array - a desc_count sized array of ints, used to
  indicate which of the IO buffer's partitions are available to use.
* buffer_index_lock - a spinlock to protect buffer_index_array during update.
* readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
  int array used to indicate which of the readdir buffer's partitions are
  available to use.
* readdir_index_lock - a spinlock to protect readdir_index_array during
  update.

OPERATIONS:

The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
needs to communicate with userspace. Part of the op contains the "upcall"
which expresses the request to userspace. Part of the op eventually
contains the "downcall" which expresses the results of the request.

The slab allocator is used to keep a cache of op structures handy.

At init time the kernel module defines and initializes a request list
and an in_progress hash table to keep track of all the ops that are
in flight at any given time.

Ops are stateful:

* unknown  - op was just initialized
* waiting  - op is on request_list (upward bound)
* inprogr  - op is in progress (waiting for downcall)
* serviced - op has matching downcall; ok
* purged   - op has to start a timer since client-core
             exited uncleanly before servicing op
* given up - submitter has given up waiting for it

When some arbitrary userspace program needs to perform a
filesystem operation on OrangeFS (readdir, I/O, create, whatever)
an op structure is initialized and tagged with a distinguishing ID
number. The upcall part of the op is filled out, and the op is
passed to the "service_operation" function.

Service_operation changes the op's state to "waiting", puts
it on the request list, and signals the OrangeFS file_operations.poll
function through a wait queue. Userspace is polling the pseudo-device
and thus becomes aware of the upcall request that needs to be read.

When the OrangeFS file_operations.read function is triggered, the
request list is searched for an op that seems ready-to-process.
The op is removed from the request list. The tag from the op and
the filled-out upcall struct are copy_to_user'ed back to userspace.

If any of these (and some additional protocol) copy_to_users fail,
the op's state is set to "waiting" and the op is added back to
the request list. Otherwise, the op's state is changed to "in progress",
and the op is hashed on its tag and put onto the end of a list in the
in_progress hash table at the index the tag hashed to.

When userspace has assembled the response to the upcall, it
writes the response, which includes the distinguishing tag, back to
the pseudo device in a series of io_vecs. This triggers the OrangeFS
file_operations.write_iter function to find the op with the associated
tag and remove it from the in_progress hash table. As long as the op's
state is not "canceled" or "given up", its state is set to "serviced".
The file_operations.write_iter function returns to the waiting vfs,
and back to service_operation through wait_for_matching_downcall.

Service_operation returns to its caller with the op's downcall
part (the response to the upcall) filled out.

The "client-core" is the bridge between the kernel module and
userspace. The client-core is a daemon. The client-core has an
associated watchdog daemon. If the client-core is ever signaled
to die, the watchdog daemon restarts the client-core. Even though
the client-core is restarted "right away", there is a period of
time during such an event that the client-core is dead. A dead client-core
can't be triggered by the OrangeFS file_operations.poll function.
Ops that pass through service_operation during a "dead spell" can timeout
on the wait queue, and one attempt is made to recycle them. Obviously,
if the client-core stays dead too long, the arbitrary userspace processes
trying to use OrangeFS will be negatively affected. Waiting ops
that can't be serviced will be removed from the request list and
have their states set to "given up". In-progress ops that can't
be serviced will be removed from the in_progress hash table and
have their states set to "given up".

Readdir and I/O ops are atypical with respect to their payloads.

- readdir ops use the smaller of the two pre-allocated pre-partitioned
  memory buffers. The readdir buffer is only available to userspace.
  The kernel module obtains an index to a free partition before launching
  a readdir op. Userspace deposits the results into the indexed partition
  and then writes them back to the pvfs device.

- io (read and write) ops use the larger of the two pre-allocated
  pre-partitioned memory buffers. The IO buffer is accessible from
  both userspace and the kernel module. The kernel module obtains an
  index to a free partition before launching an io op. The kernel module
  deposits write data into the indexed partition, to be consumed
  directly by userspace. Userspace deposits the results of read
  requests into the indexed partition, to be consumed directly
  by the kernel module.

Responses to kernel requests are all packaged in pvfs2_downcall_t
structs. Besides a few other members, pvfs2_downcall_t contains a
union of structs, each of which is associated with a particular
response type.

The several members outside of the union are:

- int32_t type - type of operation.
- int32_t status - return code for the operation.
- int64_t trailer_size - 0 unless readdir operation.
- char *trailer_buf - initialized to NULL, used during readdir operations.

The appropriate member inside the union is filled out for any
particular response.

PVFS2_VFS_OP_FILE_IO
  fill a pvfs2_io_response_t

PVFS2_VFS_OP_LOOKUP
  fill a PVFS_object_kref

PVFS2_VFS_OP_CREATE
  fill a PVFS_object_kref

PVFS2_VFS_OP_SYMLINK
  fill a PVFS_object_kref

PVFS2_VFS_OP_GETATTR
  fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
  fill in a string with the link target when the object is a symlink.

PVFS2_VFS_OP_MKDIR
  fill a PVFS_object_kref

PVFS2_VFS_OP_STATFS
  fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
  us to know, in a timely fashion, these statistics about our
  distributed network filesystem.

PVFS2_VFS_OP_FS_MOUNT
  fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
  except its members are in a different order and "__pad1" is replaced
  with "id".

PVFS2_VFS_OP_GETXATTR
  fill a pvfs2_getxattr_response_t

PVFS2_VFS_OP_LISTXATTR
  fill a pvfs2_listxattr_response_t

PVFS2_VFS_OP_PARAM
  fill a pvfs2_param_response_t

PVFS2_VFS_OP_PERF_COUNT
  fill a pvfs2_perf_count_response_t

PVFS2_VFS_OP_FSKEY
  fill a pvfs2_fs_key_response_t

PVFS2_VFS_OP_READDIR
  jam everything needed to represent a pvfs2_readdir_response_t into
  the readdir buffer descriptor specified in the upcall.

Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
made by the kernel side.

A buffer_list containing:

- a pointer to the prepared response to the request from the
  kernel (struct pvfs2_downcall_t).
- and also, in the case of a readdir request, a pointer to a
  buffer containing descriptors for the objects in the target
  directory.

... is sent to the function (PINT_dev_write_list) which performs
the writev.

PINT_dev_write_list has a local iovec array: struct iovec io_array[10];

The first four elements of io_array are initialized like this for all
responses:

    io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
    io_array[0].iov_len = sizeof(int32_t)

    io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
    io_array[1].iov_len = sizeof(int32_t)

    io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
    io_array[2].iov_len = sizeof(int64_t)

    io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
                           of global variable vfs_request (vfs_request_t)
    io_array[3].iov_len = sizeof(pvfs2_downcall_t)

Readdir responses initialize the fifth element of io_array like this:

    io_array[4].iov_base = contents of member trailer_buf (char *)
                           from out_downcall member of global variable
                           vfs_request
    io_array[4].iov_len = contents of member trailer_size (PVFS_size)
                          from out_downcall member of global variable
                          vfs_request

OrangeFS exploits the dcache in order to avoid sending redundant
requests to userspace. We keep object inode attributes up-to-date with
orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
help it decide whether or not to update an inode: "new" and "bypass".
OrangeFS keeps private data in an object's inode that includes a short
timeout value, getattr_time, which allows any iteration of
orangefs_inode_getattr to know how long it has been since the inode was
updated. When the object is not new (new == 0) and the bypass flag is not
set (bypass == 0) orangefs_inode_getattr returns without updating the inode
if getattr_time has not timed out. Getattr_time is updated each time the
inode is updated.

Creation of a new object (file, dir, sym-link) includes the evaluation of
its pathname, resulting in a negative directory entry for the object.
A new inode is allocated and associated with the dentry, turning it from
a negative dentry into a "productive full member of society". OrangeFS
obtains the new inode from Linux with new_inode() and associates
the inode with the dentry by sending the pair back to Linux with
d_instantiate().

The evaluation of a pathname for an object resolves to its corresponding
dentry. If there is no corresponding dentry, one is created for it in
the dcache. Whenever a dentry is modified or verified OrangeFS stores a
short timeout value in the dentry's d_time, and the dentry will be trusted
for that amount of time. OrangeFS is a network filesystem, and objects
can potentially change out-of-band with any particular OrangeFS kernel module
instance, so trusting a dentry is risky. The alternative to trusting
dentries is to always obtain the needed information from userspace - at
least a trip to the client-core, maybe to the servers. Obtaining information
from a dentry is cheap; obtaining it from userspace is relatively expensive,
hence the motivation to use the dentry when possible.

The timeout values d_time and getattr_time are jiffy based, and the
code is designed to avoid the jiffy-wrap problem:

    "In general, if the clock may have wrapped around more than once, there
    is no way to tell how much time has elapsed. However, if the times t1
    and t2 are known to be fairly close, we can reliably compute the
    difference in a way that takes into account the possibility that the
    clock may have wrapped between times."

        from course notes by instructor Andy Wang