===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structure like this::

    struct my_data
    {
	.
	.
	struct kref refcount;
	.
	.
    };

The kref can occur anywhere within the data structure.

Initialization
==============

You must initialize the kref after you allocate it.  To do this, call
kref_init() as follows::

     struct my_data *data;

     data = kmalloc(sizeof(*data), GFP_KERNEL);
     if (!data)
            return -ENOMEM;
     kref_init(&data->refcount);

This sets the refcount in the kref to 1.
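
To make the counting concrete, here is a minimal userspace sketch of the
lifecycle.  It is not the kernel implementation: struct ukref, ukref_init(),
ukref_put() and demo_init_count() are hypothetical stand-ins built on C11
atomics, and malloc()/free() take the place of kmalloc()/kfree().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kref (assumption, not kernel code). */
struct ukref {
	atomic_int count;
};

static void ukref_init(struct ukref *k)
{
	atomic_init(&k->count, 1);	/* the creator owns the first reference */
}

/* Returns 1 if this call dropped the last reference and ran release. */
static int ukref_put(struct ukref *k, void (*release)(struct ukref *))
{
	int old = atomic_fetch_sub(&k->count, 1);

	assert(old > 0);	/* a put without a matching reference is a bug */
	if (old == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct my_data {
	int payload;
	struct ukref refcount;	/* may sit anywhere in the structure */
};

static void data_release(struct ukref *ref)
{
	/* Userspace equivalent of container_of(ref, struct my_data, refcount). */
	free((char *)ref - offsetof(struct my_data, refcount));
}

/* Returns the count seen right after init, or -1 on failure. */
int demo_init_count(void)
{
	struct my_data *data = malloc(sizeof(*data));
	int count;

	if (!data)
		return -1;
	ukref_init(&data->refcount);
	count = atomic_load(&data->refcount.count);
	/* Dropping the only reference runs data_release(), which frees data. */
	if (!ukref_put(&data->refcount, data_release))
		return -1;
	return count;
}
```

In this sketch demo_init_count() returns 1, and the single put that follows
is enough to release the object because nobody else took a reference.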

Kref rules
==========

Once you have an initialized kref, you must follow these
rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
	struct my_data *data = container_of(ref, struct my_data, refcount);
	kfree(data);
    }

    /* kthread_run() expects a thread function that returns int */
    int more_data_handling(void *cb_data)
    {
	struct my_data *data = cb_data;
	.
	. do stuff with data here
	.
	kref_put(&data->refcount, data_release);
	return 0;
    }

    int my_data_handler(void)
    {
	int rv = 0;
	struct my_data *data;
	struct task_struct *task;
	data = kmalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return -ENOMEM;
	kref_init(&data->refcount);

	kref_get(&data->refcount);
	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		/* The thread never started, so drop the reference taken for it */
		kref_put(&data->refcount, data_release);
		goto out;
	}

	.
	. do stuff with data here
	.
    out:
	kref_put(&data->refcount, data_release);
	return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data: kref_put() knows when the data is no longer referenced and
releases it.  The kref_get() does not require a lock, since we already
have a valid pointer that we own a refcount for.  The put needs no lock
because nothing tries to get the data without already holding a
pointer.

Note that the "before" in rule 1 is very important.  You should never
do something like::

	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		goto out;
	} else
		/* BAD BAD BAD - get is after the handoff */
		kref_get(&data->refcount);

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.
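
The difference between the two orderings can be simulated in a single thread.
The sketch below is a userspace model under stated assumptions: struct ukref
and the ukref_*() helpers are hypothetical stand-ins for the kref API, worker()
stands in for a thread that finishes immediately after the handoff, and the
release routine only sets a flag so the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kref (assumption, not kernel code). */
struct ukref {
	atomic_int count;
};

static void ukref_init(struct ukref *k)
{
	atomic_init(&k->count, 1);
}

static void ukref_get(struct ukref *k)
{
	int old = atomic_fetch_add(&k->count, 1);

	assert(old > 0);	/* getting from zero means the object is gone */
}

static int ukref_put(struct ukref *k, void (*release)(struct ukref *))
{
	int old = atomic_fetch_sub(&k->count, 1);

	assert(old > 0);
	if (old == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct my_data {
	int released;	/* set instead of freeing, so the demo can inspect it */
	struct ukref refcount;
};

static void data_release(struct ukref *ref)
{
	struct my_data *data = (struct my_data *)
		((char *)ref - offsetof(struct my_data, refcount));

	data->released = 1;
}

/* Stands in for a thread that runs to completion right after the handoff. */
static void worker(struct my_data *data)
{
	ukref_put(&data->refcount, data_release);
}

/* Correct order: returns 1 if data outlived the worker and the final put released it. */
int demo_get_before_handoff(void)
{
	struct my_data data = { .released = 0 };
	int alive_after_worker;

	ukref_init(&data.refcount);
	ukref_get(&data.refcount);	/* worker's reference, taken first */
	worker(&data);
	alive_after_worker = !data.released;
	ukref_put(&data.refcount, data_release);	/* creator's reference */
	return alive_after_worker && data.released;
}

/* Wrong order: returns 1 if data was already released when the late get would run. */
int demo_get_after_handoff(void)
{
	struct my_data data = { .released = 0 };

	ukref_init(&data.refcount);
	worker(&data);		/* worker drops the only reference */
	return data.released;	/* a get at this point would be too late */
}
```

In the wrong ordering the worker's put consumes the creator's own reference,
so by the time the creator would call the get, the object has already been
handed to the release routine.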

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

	/* Silly extra get and put */
	kref_get(&obj->ref);
	enqueue(obj);
	kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

	enqueue(obj);
	/*
	 * We are done with obj, so we pass our refcount off
	 * to the queue.  DON'T TOUCH obj AFTER HERE!
	 */
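
A reference handoff of this kind can be sketched in userspace as follows.
Everything here is an assumption made for illustration: struct ukref and its
helpers are hypothetical stand-ins for the kref API, and enqueue()/dequeue()
implement a one-element queue only to show that the count never changes
during the transfer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kref (assumption, not kernel code). */
struct ukref {
	atomic_int count;
};

static void ukref_init(struct ukref *k)
{
	atomic_init(&k->count, 1);
}

static int ukref_put(struct ukref *k, void (*release)(struct ukref *))
{
	int old = atomic_fetch_sub(&k->count, 1);

	assert(old > 0);
	if (old == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct obj {
	int payload;
	struct ukref ref;
};

static struct obj *slot;	/* hypothetical one-element queue */

/* The queue takes over the caller's reference; no get is needed. */
static void enqueue(struct obj *o)
{
	slot = o;
}

static struct obj *dequeue(void)
{
	struct obj *o = slot;

	slot = NULL;
	return o;
}

static void obj_cleanup(struct ukref *k)
{
	free((char *)k - offsetof(struct obj, ref));
}

/* Returns the refcount the consumer sees after the handoff, or -1 on failure. */
int demo_handoff(void)
{
	struct obj *o = malloc(sizeof(*o));
	struct obj *got;
	int count;

	if (!o)
		return -1;
	ukref_init(&o->ref);
	enqueue(o);
	o = NULL;	/* our reference now belongs to the queue */

	got = dequeue();
	count = atomic_load(&got->ref.count);
	/* The consumer owns the reference and drops it when done. */
	if (!ukref_put(&got->ref, obj_cleanup))
		return -1;
	return count;
}
```

The count stays at 1 from initialization until the consumer's put, which is
why the extra get/put pair around the enqueue adds nothing.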

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

	static DEFINE_MUTEX(mutex);
	static LIST_HEAD(q);
	struct my_data
	{
		struct kref      refcount;
		struct list_head link;
	};

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			kref_get(&entry->refcount);
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		list_del(&entry->link);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		kref_put(&entry->refcount, release_entry);
		mutex_unlock(&mutex);
	}

The kref_put() return value is useful if you do not want to hold the
lock during the whole release operation.  Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so).  You could use kref_put() as follows::

	static void release_entry(struct kref *ref)
	{
		/* All work is done after the return from kref_put(). */
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		if (kref_put(&entry->refcount, release_entry)) {
			list_del(&entry->link);
			mutex_unlock(&mutex);
			kfree(entry);
		} else {
			mutex_unlock(&mutex);
		}
	}

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in
the following way::

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del(&entry->link);
		mutex_unlock(&mutex);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry);
	}

This removes the need to hold the mutex around kref_put() in put_entry().
However, kref_get_unless_zero() must be called within the same critical
section that finds the entry in the lookup table; otherwise
kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking its
return value.  If you are sure (because you already hold a valid pointer) that
kref_get_unless_zero() will return true, use kref_get() instead.
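
The conditional increment can be pictured as a compare-and-exchange loop.
The following is a userspace sketch only: struct ukref,
ukref_get_unless_zero() and demo_get_unless_zero() are hypothetical names,
and the kernel's real implementation may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref (assumption, not kernel code). */
struct ukref {
	atomic_int count;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ukref_get_unless_zero(struct ukref *k)
{
	int old = atomic_load(&k->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&k->count, &old, old + 1))
			return true;
	}
	return false;	/* the object is being released; do not use it */
}

/*
 * Attempts a get on a counter that starts at start.  Returns the
 * resulting count, or 0 if the get was refused.
 */
int demo_get_unless_zero(int start)
{
	struct ukref k;

	assert(start >= 0);	/* a reference count is never negative */
	atomic_init(&k.count, start);
	if (!ukref_get_unless_zero(&k))
		return 0;
	return atomic_load(&k.count);
}
```

A caller that ignored the boolean result would go on to use an object whose
count had already reached zero, which is why the return value must always be
checked.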

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

	struct my_data
	{
		struct rcu_head rhead;
		.
		struct kref refcount;
		struct list_head link;
		.
		.
	};

	static struct my_data *get_entry_rcu(void)
	{
		struct my_data *entry;

		rcu_read_lock();
		/* Read the list head once, with RCU-safe accessors */
		entry = list_first_or_null_rcu(&q, struct my_data, link);
		if (entry && !kref_get_unless_zero(&entry->refcount))
			entry = NULL;
		rcu_read_unlock();
		return entry;
	}

	static void release_entry_rcu(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del_rcu(&entry->link);
		mutex_unlock(&mutex);
		kfree_rcu(entry, rhead);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry_rcu);
	}

But note that the struct kref member needs to remain in valid memory for an
RCU grace period after release_entry_rcu() is called.  That can be accomplished
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
before kfree().  Note, however, that synchronize_rcu() may sleep for a
substantial amount of time.