| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
hwmon: (pmbus/core) Protect regulator operations with mutex
The regulator operations pmbus_regulator_get_voltage(),
pmbus_regulator_set_voltage(), and pmbus_regulator_list_voltage()
access PMBus registers and shared data but were not protected by
the update_lock mutex. This could lead to race conditions.
However, adding mutex protection directly to these functions causes
a deadlock because pmbus_regulator_notify() (which calls
regulator_notifier_call_chain()) is often called with the mutex
already held (e.g., from pmbus_fault_handler()). If a regulator
callback then calls one of the now-protected voltage functions,
it will attempt to acquire the same mutex.
Rework pmbus_regulator_notify() to utilize a worker function to
send notifications outside of the mutex protection. Events are
stored as atomics in a per-page bitmask and processed by the worker.
Initialize the worker and its associated data during regulator
registration, and ensure it is cancelled on device removal using
devm_add_action_or_reset().
While at it, remove the unnecessary include of linux/of.h. |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
btintel_hw_error() issues two __hci_cmd_sync() calls (HCI_OP_RESET
and Intel exception-info retrieval) without holding
hci_req_sync_lock(). This lets it race against
hci_dev_do_close() -> btintel_shutdown_combined(), which also runs
__hci_cmd_sync() under the same lock. When both paths manipulate
hdev->req_status/req_rsp concurrently, the close path may free the
response skb first, and the still-running hw_error path hits a
slab-use-after-free in kfree_skb().
Wrap the whole recovery sequence in hci_req_sync_lock/unlock so it
is serialized with every other synchronous HCI command issuer.
Below is the data race report and the kasan report:
BUG: data-race in __hci_cmd_sync_sk / btintel_shutdown_combined
read of hdev->req_rsp at net/bluetooth/hci_sync.c:199
by task kworker/u17:1/83:
__hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
__hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
btintel_hw_error+0x114/0x670 drivers/bluetooth/btintel.c:254
hci_error_reset+0x348/0xa30 net/bluetooth/hci_core.c:1030
write/free by task ioctl/22580:
btintel_shutdown_combined+0xd0/0x360
drivers/bluetooth/btintel.c:3648
hci_dev_close_sync+0x9ae/0x2c10 net/bluetooth/hci_sync.c:5246
hci_dev_do_close+0x232/0x460 net/bluetooth/hci_core.c:526
BUG: KASAN: slab-use-after-free in
sk_skb_reason_drop+0x43/0x380 net/core/skbuff.c:1202
Read of size 4 at addr ffff888144a738dc
by task kworker/u17:1/83:
__hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
__hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
btintel_hw_error+0x186/0x670 drivers/bluetooth/btintel.c:260 |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: SEV: Lock all vCPUs when synchronzing VMSAs for SNP launch finish
Lock all vCPUs when synchronizing and encrypting VMSAs for SNP guests, as
allowing userspace to manipulate and/or run a vCPU while its state is being
synchronized would at best corrupt vCPU state, and at worst crash the host
kernel.
Opportunistically assert that vcpu->mutex is held when synchronizing its
VMSA (the SEV-ES path already locks vCPUs). |
| In the Linux kernel, the following vulnerability has been resolved:
mtd: rawnand: serialize lock/unlock against other NAND operations
nand_lock() and nand_unlock() call into chip->ops.lock_area/unlock_area
without holding the NAND device lock. On controllers that implement
SET_FEATURES via multiple low-level PIO commands, these can race with
concurrent UBI/UBIFS background erase/write operations that hold the
device lock, resulting in cmd_pending conflicts on the NAND controller.
Add nand_get_device()/nand_release_device() around the lock/unlock
operations to serialize them against all other NAND controller access. |
| In the Linux kernel, the following vulnerability has been resolved:
blktrace: fix __this_cpu_read/write in preemptible context
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).
__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. This triggers the following
using blktests/blktrace/002:
blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed]
runtime 0.367s ... 0.437s
something found in dmesg:
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
...
(See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
[ 81.362895] check_preemption_disabled+0xce/0xe0
[ 81.362902] tracing_record_cmdline+0x10/0x40
[ 81.362923] __blk_add_trace+0x307/0x5d0
[ 81.362934] ? lock_acquire+0xe0/0x300
[ 81.362940] ? iov_iter_extract_pages+0x101/0xa30
[ 81.362959] blk_add_trace_bio+0x106/0x1e0
[ 81.362968] submit_bio_noacct_nocheck+0x24b/0x3a0
[ 81.362979] ? lockdep_init_map_type+0x58/0x260
[ 81.362988] submit_bio_wait+0x56/0x90
[ 81.363009] __blkdev_direct_IO_simple+0x16c/0x250
[ 81.363026] ? __pfx_submit_bio_wait_endio+0x10/0x10
[ 81.363038] ? rcu_read_lock_any_held+0x73/0xa0
[ 81.363051] blkdev_read_iter+0xc1/0x140
[ 81.363059] vfs_read+0x20b/0x330
[ 81.363083] ksys_read+0x67/0xe0
[ 81.363090] do_syscall_64+0xbf/0xf00
[ 81.363102] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.363106] RIP: 0033:0x7f281906029d
[ 81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[ 81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[ 81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[ 81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[ 81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[ 81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[ 81.363142] </TASK>
The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.
The purpose of tracin
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
cxl: Fix race of nvdimm_bus object when creating nvdimm objects
Found issue during running of cxl-translate.sh unit test. Adding a 3s
sleep right before the test seems to make the issue reproduce fairly
consistently. The cxl_translate module has dependency on cxl_acpi and
causes orphaned nvdimm objects to reprobe after cxl_acpi is removed.
The nvdimm_bus object is registered by the cxl_nvb object when
cxl_acpi_probe() is called. With the nvdimm_bus object missing,
__nd_device_register() will trigger NULL pointer dereference when
accessing the dev->parent that points to &nvdimm_bus->dev.
[ 192.884510] BUG: kernel NULL pointer dereference, address: 000000000000006c
[ 192.895383] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025
[ 192.897721] Workqueue: cxl_port cxl_bus_rescan_queue [cxl_core]
[ 192.899459] RIP: 0010:kobject_get+0xc/0x90
[ 192.924871] Call Trace:
[ 192.925959] <TASK>
[ 192.926976] ? pm_runtime_init+0xb9/0xe0
[ 192.929712] __nd_device_register.part.0+0x4d/0xc0 [libnvdimm]
[ 192.933314] __nvdimm_create+0x206/0x290 [libnvdimm]
[ 192.936662] cxl_nvdimm_probe+0x119/0x1d0 [cxl_pmem]
[ 192.940245] cxl_bus_probe+0x1a/0x60 [cxl_core]
[ 192.943349] really_probe+0xde/0x380
This patch also relies on the previous change where
devm_cxl_add_nvdimm_bridge() is called from drivers/cxl/pmem.c instead
of drivers/cxl/core.c to ensure the dependency of cxl_acpi on cxl_pmem.
1. Set probe_type of cxl_nvb to PROBE_FORCE_SYNCHRONOUS to ensure the
driver is probed synchronously when add_device() is called.
2. Add a check in __devm_cxl_add_nvdimm_bridge() to ensure that the
cxl_nvb driver is attached during cxl_acpi_probe().
3. Take the cxl_root uport_dev lock and the cxl_nvb->dev lock in
devm_cxl_add_nvdimm() before checking nvdimm_bus is valid.
4. Set cxl_nvdimm flag to CXL_NVD_F_INVALIDATED so cxl_nvdimm_probe()
will exit with -EBUSY.
The removal of cxl_nvdimm devices should prevent any orphaned devices
from probing once the nvdimm_bus is gone.
[ dj: Fixed 0-day reported kdoc issue. ]
[ dj: Fix cxl_nvb reference leak on error. Gregory (kreview-0811365) ] |
| Requires malware code to misuse the DDK kernel module IOCTL interface.
Such code can use the interface in an unsupported way that allows subversion of the GPU to perform writes to arbitrary physical memory pages.
The product utilises a shared resource in a concurrent manner but does not attempt to synchronise access to the resource. |
| jsPDF is a library to generate PDFs in JavaScript. Prior to 4.1.0, the addJS method in the jspdf Node.js build utilizes a shared module-scoped variable (text) to store JavaScript content. When used in a concurrent environment (e.g., a Node.js web server), this variable is shared across all requests. If multiple requests generate PDFs simultaneously, the JavaScript content intended for one user may be overwritten by a subsequent request before the document is generated. This results in Cross-User Data Leakage, where the PDF generated for User A contains the JavaScript payload (and any embedded sensitive data) intended for User B. Typically, this only affects server-side environments, although the same race conditions might occur if jsPDF runs client-side. The vulnerability has been fixed in jsPDF@4.1.0. |
| The on-endpoint Microsoft vulnerable driver blocklist is not fully synchronized with the online Microsoft recommended driver block rules. Some entries present on the online list have been excluded from the on-endpoint blocklist longer than the expected periodic monthly Windows updates. It is possible to fully synchronize the driver blocklist using WDAC policies. NOTE: The vendor explains that Windows Update provides a smaller, compatibility-focused driver blocklist for general users, while the full XML list is available for advanced users and organizations to customize at the risk of usability issues. |
| LibJS in Ladybird before f5a6704 mishandles the freeing of the vector that arguments_list references, leading to a use-after-free, and allowing remote attackers to execute arbitrary code via a crafted .js file. NOTE: the GitHub README says "Ladybird is in a pre-alpha state, and only suitable for use by developers." |
| A vulnerability exists in RTU IEC 61850 client and server functionality that could impact the availability if renegotiation of an open IEC61850 TLS connection takes place in specific timing situations, when IEC61850 communication is active.
Precondition is that IEC61850 as client or server are configured using TLS on RTU500 device. It affects the CMU the IEC61850 stack is configured on. |
| Missing synchronization in Windows Hyper-V allows an authorized attacker to deny service over an adjacent network. |
| Missing synchronization in Windows Hyper-V allows an authorized attacker to deny service over an adjacent network. |
| In the Linux kernel, the following vulnerability has been resolved:
MIPS: cevt-r4k: Don't call get_c0_compare_int if timer irq is installed
This avoids warning:
[ 0.118053] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
Caused by get_c0_compare_int on secondary CPU.
We also skipped saving IRQ number to struct clock_event_device *cd as
it's never used by clockevent core, as per comments it's only meant
for "non CPU local devices". |
| In the Linux kernel, the following vulnerability has been resolved:
netrom: Fix data-races around sysctl_net_busy_read
We need to protect the reader reading the sysctl value because the
value can be changed concurrently. |
| In the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix hang in usb_kill_urb by adding memory barriers
The syzbot fuzzer has identified a bug in which processes hang waiting
for usb_kill_urb() to return. It turns out the issue is not unlinking
the URB; that works just fine. Rather, the problem arises when the
wakeup notification that the URB has completed is not received.
The reason is memory-access ordering on SMP systems. In outline form,
usb_kill_urb() and __usb_hcd_giveback_urb() operating concurrently on
different CPUs perform the following actions:
CPU 0 CPU 1
---------------------------- ---------------------------------
usb_kill_urb(): __usb_hcd_giveback_urb():
... ...
atomic_inc(&urb->reject); atomic_dec(&urb->use_count);
... ...
wait_event(usb_kill_urb_queue,
atomic_read(&urb->use_count) == 0);
if (atomic_read(&urb->reject))
wake_up(&usb_kill_urb_queue);
Confining your attention to urb->reject and urb->use_count, you can
see that the overall pattern of accesses on CPU 0 is:
write urb->reject, then read urb->use_count;
whereas the overall pattern of accesses on CPU 1 is:
write urb->use_count, then read urb->reject.
This pattern is referred to in memory-model circles as SB (for "Store
Buffering"), and it is well known that without suitable enforcement of
the desired order of accesses -- in the form of memory barriers -- it
is entirely possible for one or both CPUs to execute their reads ahead
of their writes. The end result will be that sometimes CPU 0 sees the
old un-decremented value of urb->use_count while CPU 1 sees the old
un-incremented value of urb->reject. Consequently CPU 0 ends up on
the wait queue and never gets woken up, leading to the observed hang
in usb_kill_urb().
The same pattern of accesses occurs in usb_poison_urb() and the
failure pathway of usb_hcd_submit_urb().
The problem is fixed by adding suitable memory barriers. To provide
proper memory-access ordering in the SB pattern, a full barrier is
required on both CPUs. The atomic_inc() and atomic_dec() accesses
themselves don't provide any memory ordering, but since they are
present, we can use the optimized smp_mb__after_atomic() memory
barrier in the various routines to obtain the desired effect.
This patch adds the necessary memory barriers. |
| In the Linux kernel, the following vulnerability has been resolved:
serial: mxs-auart: add spinlock around changing cts state
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.
[ 85.119255] ------------[ cut here ]------------
[ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
[ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
[ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
[ 85.151396] Hardware name: Freescale MXS (Device Tree)
[ 85.156679] Workqueue: hci0 hci_power_on [bluetooth]
(...)
[ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
[ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
(...) |
| In the Linux kernel, the following vulnerability has been resolved:
clk: imx95-blk-ctl: Fix synchronous abort
When enabling runtime PM for clock suppliers that also belong to a power
domain, the following crash is thrown:
error: synchronous external abort: 0000000096000010 [#1] PREEMPT SMP
Workqueue: events_unbound deferred_probe_work_func
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : clk_mux_get_parent+0x60/0x90
lr : clk_core_reparent_orphans_nolock+0x58/0xd8
Call trace:
clk_mux_get_parent+0x60/0x90
clk_core_reparent_orphans_nolock+0x58/0xd8
of_clk_add_hw_provider.part.0+0x90/0x100
of_clk_add_hw_provider+0x1c/0x38
imx95_bc_probe+0x2e0/0x3f0
platform_probe+0x70/0xd8
Enabling runtime PM without explicitly resuming the device caused
the power domain cut off after clk_register() is called. As a result,
a crash happens when the clock hardware provider is added and attempts
to access the BLK_CTL register.
Fix this by using devm_pm_runtime_enable() instead of pm_runtime_enable()
and getting rid of the pm_runtime_disable() in the cleanup path. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: flowtable: fix stuck flows on cleanup due to pending work
To clear the flow table on flow table free, the following sequence
normally happens in order:
1) gc_step work is stopped to disable any further stats/del requests.
2) All flow table entries are set to teardown state.
3) Run gc_step which will queue HW del work for each flow table entry.
4) Waiting for the above del work to finish (flush).
5) Run gc_step again, deleting all entries from the flow table.
6) Flow table is freed.
But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.
To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.
Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214] dump_stack+0xbb/0x107
[47773.890818] print_address_description.constprop.0+0x18/0x140
[47773.892990] kasan_report.cold+0x7c/0xd8
[47773.894459] kasan_check_range+0x145/0x1a0
[47773.895174] down_read+0x99/0x460
[47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372] process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031] kasan_save_stack+0x1b/0x40
[47773.922730] __kasan_kmalloc+0x7a/0x90
[47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207] tcf_action_init_1+0x45b/0x700
[47773.925987] tcf_action_init+0x453/0x6b0
[47773.926692] tcf_exts_validate+0x3d0/0x600
[47773.927419] fl_change+0x757/0x4a51 [cls_flower]
[47773.928227] tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303] kasan_save_stack+0x1b/0x40
[47773.938039] kasan_set_track+0x1c/0x30
[47773.938731] kasan_set_free_info+0x20/0x30
[47773.939467] __kasan_slab_free+0xe7/0x120
[47773.940194] slab_free_freelist_hook+0x86/0x190
[47773.941038] kfree+0xce/0x3a0
[47773.941644] tcf_ct_flow_table_cleanup_work
Original patch description and stack trace by Paul Blakey. |
| In the Linux kernel, the following vulnerability has been resolved:
xhci: Handle TD clearing for multiple streams case
When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.
This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added.
It was then identified but not fixed nor made into a warning in commit
674f8438c121 ("xhci: split handling halted endpoints into two steps"),
which added a FIXME comment for the problem case (without materially
changing the behavior as far as I can tell, though the new logic made
the problem more obvious).
Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again.
[Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached
cancelled URBs.") was a targeted regression fix to the previously mentioned
patch. Users reported issues with usb stuck after unmounting/disconnecting
UAS devices. This rolled back the TD clearing of multiple streams to its
original state.]
Apparently the commit author was aware of the problem (yet still chose
to submit it): It was still mentioned as a FIXME, an xhci_dbg() was
added to log the problem condition, and the remaining issue was mentioned
in the commit description. The choice of making the log type xhci_dbg()
for what is, at this point, a completely unhandled and known broken
condition is puzzling and unfortunate, as it guarantees that no actual
users would see the log in production, thereby making it nigh
undebuggable (indeed, even if you turn on DEBUG, the message doesn't
really hint at there being a problem at all).
It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition been actually reported
with a warning, as it should have been as a bug intentionally left in
unfixed (never mind that it shouldn't have been left in at all).
> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.
3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.
Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).
On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a
---truncated--- |