Description
Describe the bug
When sending Bluetooth L2CAP messages over a dynamic channel (with bt_l2cap_chan_send()
), the callback closure in the net_buf
userdata is never initialized (unlike with static channels). This leads to bt_conn_tx_processor()
reading out an invalid callback pointer (this manifests as 0x1 on my device) which later gets called, triggering a crash in tx_notify()
when calling the callback.
The TODO comment "add sdu_sent callback on last PDU" in conn.c:1039
seems relevant, since the exact problem is that this PDU has never had its callback set. I don't know why that is though, since l2cap_data_pull
should add the callback for dynamic channels, but this doesn't seem to happen in all cases.
The issue goes away when I manually zero the net_buf
user_data before passing the buffer to bt_l2cap_chan_send()
.
To Reproduce
- device A connects to device B (
bt_conn_le_create()
) - device A creates L2CAP channel with device B (
bt_l2cap_chan_connect()
) - device A and device B send packets (sending another one as soon as the
chan_ops.sent
callback is called) (bt_l2cap_chan_send()
) - power cycle device A
- go to 1 until the crash occurs on device B
I don't have a small enough sample yet that I can share here since this occurred in an internal code base, but if necessary I can try and strip it down.
Expected behavior
Zephyr should not crash.
Impact
Annoyance, leads to occasional power cycles needed
Logs and console output
These logs where done with warn log level and certain log messages patched to be WRN instead of DBG since with debug log level lead to a lot of log output being dropped due to buffer overruns.
receiver:bt:l2cap: accepting channel connection
receiver:bt:l2cap: channel connected
[00:00:58.524,719] <wrn> bt_conn: conn 0x20000fc0: not connected
[00:00:59.391,998] <wrn> bt_conn: tx_process: pop: cb 0x1 userdata 0 buf 0x20007488 (ref 1)
[00:00:59.392,028] <wrn> bt_conn: tx_process: conn 0x20000fc0 buf 0x20007488 (last)
[00:00:59.392,028] <wrn> bt_conn: send_buf: invalid callback, cb = 0x1
[00:00:59.434,844] <wrn> bt_conn: tx_notify: invalid cb: conn 0x20000fc0 tx 0x20001b1c cb 0x1 user_data 0
[00:00:59.434,875] <err> os: ***** MPU FAULT *****
[00:00:59.434,875] <err> os: Data Access Violation
[00:00:59.434,875] <err> os: MMFAR Address: 0xd
[00:00:59.434,906] <err> os: r0/a1: 0x00000000 r1/a2: 0xffffffff r2/a3: 0x00000000
[00:00:59.434,906] <err> os: r3/a4: 0x00000001 r12/ip: 0x20001cac r14/lr: 0x00009b0b
[00:00:59.434,936] <err> os: xpsr: 0x61000000
[00:00:59.434,936] <err> os: Faulting instruction address (r15/pc): 0x0000002c
[00:00:59.434,967] <err> os: >>> ZEPHYR FATAL ERROR 19: Unknown error on CPU 0
[00:00:59.434,997] <err> os: Current thread: 0x20001898 (unknown)
[00:01:00.841,003] <err> os: Halting system
The crash dump link register points to /home/ferdinand/Documents/Work/UWB/UDK1/zephyrproject/zephyr/subsys/bluetooth/host/conn.c:299
, which means this crash happened why trying to call the invalid cb
.
Logging patch:
diff --git a/subsys/bluetooth/host/conn.c b/subsys/bluetooth/host/conn.c
index 31d08048571..b86d65c1b56 100644
--- a/subsys/bluetooth/host/conn.c
+++ b/subsys/bluetooth/host/conn.c
@@ -278,7 +278,7 @@ static void tx_notify(struct bt_conn *conn)
return;
}
- LOG_DBG("tx %p cb %p user_data %p", tx, tx->cb, tx->user_data);
+ if (tx->cb == (void*)1) LOG_WRN("tx_notify: invalid cb: conn %p tx %p cb %p user_data %p", conn, tx, tx->cb, tx->user_data);
/* Copy over the params */
cb = tx->cb;
@@ -648,6 +648,7 @@ static int send_buf(struct bt_conn *conn, struct net_buf *buf,
return -ENOMEM;
}
+ if (cb == (void*)1) LOG_WRN("send_buf: invalid callback, cb = %p", cb);
tx->cb = cb;
tx->user_data = ud;
@@ -1041,10 +1042,10 @@ void bt_conn_tx_processor(void)
* We still allocate one TX context per-fragment though.
*/
conn->get_and_clear_cb(conn, buf, &cb, &ud);
- LOG_DBG("pop: cb %p userdata %p", cb, ud);
+ if (cb == (void*)1) LOG_WRN("tx_process: pop: cb %p userdata %p buf %p (ref %d)", cb, ud, buf, buf->ref);
}
- LOG_DBG("TX process: conn %p buf %p (%s)",
+ if (cb == (void*)1) LOG_WRN("tx_process: conn %p buf %p (%s)",
conn, buf, last_buf ? "last" : "frag");
int err = send_buf(conn, buf, buf_len, cb, ud);
Environment (please complete the following information):
- OS: Linux
- Toolchain: arm-zephyr-eabi-gcc (Zephyr SDK 0.16.6) 12.2.0
- Commit SHA or Version used: 3.7.0 (642b308)
Additional context
Target: nrf52840dk/nrf52840
Config (prj.conf
):
CONFIG_GPIO=y
CONFIG_BT=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_ATT_PREPARE_COUNT=1
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_CONN_RSSI=y
CONFIG_BT_SMP=y
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y
CONFIG_BT_L2CAP_TX_MTU=65
CONFIG_BT_BUF_ACL_RX_SIZE=71
CONFIG_CRC=y
CONFIG_STD_C23=y
CONFIG_REBOOT=y
CONFIG_LOG=y
CONFIG_BT_LOG_LEVEL_WRN=y
CONFIG_EXCEPTION_STACK_TRACE=y
CONFIG_COMPILER_OPT="-g"