Commit
333f796235a527 ("tipc: fix a race condition leading to
subscriber refcnt bug") reveals a soft lockup while acquiring
nametbl_lock.
Before commit
333f796235a527, we call tipc_conn_shutdown() from
tipc_close_conn() in the context of tipc_topsrv_stop(). In that
context, we are allowed to grab the nametbl_lock.
Commit
333f796235a527, moved tipc_conn_release (renamed from
tipc_conn_shutdown) to the connection refcount cleanup. This allows
either tipc_nametbl_withdraw() or tipc_topsrv_stop() to the cleanup.
Since tipc_exit_net() first calls tipc_topsrv_stop() and then
tipc_nametble_withdraw() increases the chances for the later to
perform the connection cleanup.
The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
when it performs the tipc_conn_kref_release() as it tries to grab
nametbl_lock again while holding it already.
tipc_nametbl_withdraw() grabs nametbl_lock
tipc_nametbl_remove_publ()
tipc_subscrp_report_overlap()
tipc_subscrp_send_event()
tipc_conn_sendmsg()
<< if (con->flags != CF_CONNECTED) we do conn_put(),
triggering the cleanup as refcount=0. >>
tipc_conn_kref_release
tipc_sock_release
tipc_conn_release
tipc_subscrb_delete
tipc_subscrp_delete
tipc_nametbl_unsubscribe << Soft Lockup >>
The previous changes in this series fixes the race conditions fixed
by commit
333f796235a527. Hence we can now revert the commit.
Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber refcnt bug")
Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
static void tipc_recv_work(struct work_struct *work);
static void tipc_send_work(struct work_struct *work);
static void tipc_clean_outqueues(struct tipc_conn *con);
-static void tipc_sock_release(struct tipc_conn *con);
static void tipc_conn_kref_release(struct kref *kref)
{
}
saddr->scope = -TIPC_NODE_SCOPE;
kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
- tipc_sock_release(con);
sock_release(sock);
con->sock = NULL;
write_unlock_bh(&sk->sk_callback_lock);
}
-static void tipc_sock_release(struct tipc_conn *con)
+static void tipc_close_conn(struct tipc_conn *con)
{
struct tipc_server *s = con->server;
- if (con->conid)
- s->tipc_conn_release(con->conid, con->usr_data);
-
- tipc_unregister_callbacks(con);
-}
-
-static void tipc_close_conn(struct tipc_conn *con)
-{
if (test_and_clear_bit(CF_CONNECTED, &con->flags)) {
+ tipc_unregister_callbacks(con);
+
+ if (con->conid)
+ s->tipc_conn_release(con->conid, con->usr_data);
/* We shouldn't flush pending works as we may be in the
* thread. In fact the races with pending rx/tx work structs