
Application best practices for distributed file systems
=======================================================

CephFS is POSIX compatible, and therefore should work with any existing
applications that expect a POSIX file system. However, because it is a
network file system (unlike e.g. XFS) and it is highly consistent (unlike
e.g. NFS), there are some consequences that application authors may
benefit from knowing about.

The following sections describe some areas where distributed file systems
may have noticeably different performance behaviours compared with
local file systems.

15
ls -l
-----

When you run ``ls -l``, the ``ls`` program
first does a directory listing, and then calls ``stat`` on every
file in the directory.

This is usually far in excess of what an application really needs, and
it can be slow for large directories. If you don't really need all
this metadata for each file, then use a plain ``ls``.
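
The difference is easy to see from application code. The sketch below uses a
throwaway temporary directory to stand in for a CephFS mount (the file names
are made up for the example): one pass reads only names, the other adds a
``stat`` per entry, which is the part that gets expensive on a network file
system.

```python
import os
import tempfile

# A throwaway directory stands in for a CephFS mount in this sketch.
d = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    open(os.path.join(d, name), "w").close()

# Cheap: one directory listing, no per-file stat (what plain ``ls`` does).
names = [entry.name for entry in os.scandir(d)]

# Expensive: an additional stat per entry (what ``ls -l`` does). On CephFS
# each stat may require up-to-date metadata from the MDS.
details = [(e.name, e.stat().st_size) for e in os.scandir(d)]
```

If all you need is the set of names, the first form is the one to reach for.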
26
ls/stat on files being extended
-------------------------------

If another client is currently extending files in the listed directory,
then an ``ls -l`` may take an exceptionally long time to complete, as
the lister must wait for the writer to flush data in order to do a valid
read of every file's size. So unless you *really* need to know the
exact size of every file in the directory, just don't do it!

This also applies to any application code that directly
issues ``stat`` system calls on files being appended from
another node.
39
Very large directories
----------------------

Do you really need that 10,000,000 file directory? While directory
fragmentation enables CephFS to handle it, it is always going to be
less efficient than splitting your files into more modest-sized directories.

Even standard userspace tools can become quite slow when operating on very
large directories. For example, the default behaviour of ``ls``
is to give an alphabetically ordered result, but ``readdir`` system
calls do not give an ordered result (this is true in general, not just
with CephFS). So when you ``ls`` on a million-file directory, it is
loading a list of a million names into memory, sorting the list, then writing
it out to the display.
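
The contrast can be sketched in a few lines. Again a temporary directory
stands in for a large CephFS directory: streaming entries in ``readdir``
order processes them one at a time, while the sorted listing must first
materialize every name in memory, which is what a default ``ls`` does.

```python
import os
import tempfile

# A throwaway directory stands in for a large CephFS directory here.
d = tempfile.mkdtemp()
for i in range(1000):
    open(os.path.join(d, f"f{i:04d}"), "w").close()

# Streaming: process entries in readdir order, one at a time.
# No sorting, and the full name list never sits in memory at once.
count = sum(1 for _ in os.scandir(d))

# What a default ``ls`` effectively does: load every name into a list,
# then sort the whole thing before producing any output.
all_sorted = sorted(e.name for e in os.scandir(d))
```

On the command line, GNU ``ls -U`` skips the sort and emits entries in
directory order, which can make a noticeable difference on huge directories.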
54
Hard links
----------

Hard links have an intrinsic cost in terms of the internal housekeeping
that a file system has to do to keep two references to the same data. In
CephFS there is a particular performance cost, because with normal files
the inode is embedded in the directory (i.e. there is no extra fetch of
the inode after looking up the path).
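
At the POSIX level, a hard link is simply a second directory entry for the
same inode, as the short sketch below shows (file names are made up for the
example). It is this second reference that takes a CephFS inode out of the
embedded-in-directory fast path described above.

```python
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "data")
with open(src, "w") as f:
    f.write("payload")

# Create a second directory entry pointing at the same inode.
os.link(src, os.path.join(d, "alias"))

st = os.stat(src)
# Both names now resolve to one inode, and its link count is 2.
same_inode = os.stat(os.path.join(d, "alias")).st_ino == st.st_ino
```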
63
Working set size
----------------

The MDS acts as a cache for the metadata stored in RADOS. Metadata
performance is very different for workloads whose metadata fits within
that cache.

If your workload has more files than fit in your cache (configured using
the ``mds_cache_memory_limit`` setting), then make sure you test it
appropriately: don't test your system with a small number of files and then
expect equivalent performance when you move to a much larger number of files.
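
One way to adjust the limit is at runtime via ``ceph config``; the 8 GiB
value below is only an example and should be sized to your metadata
working set:

```shell
# Raise the MDS metadata cache limit to 8 GiB (example value).
# This applies to all MDS daemons in the cluster.
ceph config set mds mds_cache_memory_limit 8589934592
```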
75
Do you need a file system?
--------------------------

Remember that Ceph also includes an object storage interface. If your
application needs to store huge flat collections of files where you just
read and write whole files at once, then you might well be better off
using the :ref:`Object Gateway <object-gateway>`.