| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| Tenda AX-1806 v1.0.0.1 was discovered to contain a stack overflow in the serverName parameter of the sub_65A28 function. This vulnerability allows attackers to cause a Denial of Service (DoS) via a crafted request. |
| Tenda AX-1806 v1.0.0.1 was discovered to contain a stack overflow in the serviceName parameter of the sub_65A28 function. This vulnerability allows attackers to cause a Denial of Service (DoS) via a crafted request. |
| An insecure authentication mechanism in the safe_exec.sh startup script of Blurams Flare Camera version 24.1114.151.929 and earlier allows an attacker with physical access to the device to execute arbitrary commands with root privileges, if file /opt/images/public_key.der is not present in the file system. The vulnerability can be triggered by providing a maliciously crafted auth.ini file on the device's SD card. |
| A vulnerability in the boot process of Blurams Flare Camera version 24.1114.151.929 and earlier allows a physically proximate attacker to hijack the boot mechanism and gain a bootloader shell via the UART interface. This is achieved by inducing a read error from the SPI flash memory during the boot, by shorting a data pin of the IC to ground. An attacker can then dump the entire firmware, leading to the disclosure of sensitive information including cryptographic keys and user configurations. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: remove oem i2c adapter on finish
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter
registered resulting in a null pointer dereference when applications try
to access the invalid device.
(cherry picked from commit 89923fb7ead4fdd37b78dd49962d9bb5892403e6) |
| In the Linux kernel, the following vulnerability has been resolved:
net: phylink: add lock for serializing concurrent pl->phydev writes with resolver
Currently phylink_resolve() protects itself against concurrent
phylink_bringup_phy() or phylink_disconnect_phy() calls which modify
pl->phydev by relying on pl->state_mutex.
The problem is that in phylink_resolve(), pl->state_mutex is in a lock
inversion state with pl->phydev->lock. So pl->phydev->lock needs to be
acquired prior to pl->state_mutex. But that requires dereferencing
pl->phydev in the first place, and without pl->state_mutex, that is
racy.
Hence the reason for the extra lock. Currently it is redundant, but it
will serve a functional purpose once mutex_lock(&phy->lock) will be
moved outside of the mutex_lock(&pl->state_mutex) section.
Another alternative considered would have been to let phylink_resolve()
acquire the rtnl_mutex, which is also held when phylink_bringup_phy()
and phylink_disconnect_phy() are called. But since phylink_disconnect_phy()
runs under rtnl_lock(), it would deadlock with phylink_resolve() when
calling flush_work(&pl->resolve). Additionally, it would have been
undesirable because it would have unnecessarily blocked many other call
paths as well in the entire kernel, so the smaller-scoped lock was
preferred. |
| In the Linux kernel, the following vulnerability has been resolved:
arm64: kexec: initialize kexec_buf struct in load_other_segments()
Patch series "kexec: Fix invalid field access".
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are cleanly
set, preventing future instances of uninitialized memory being used.
An initial fix was already landed for arm64[0], and this patchset fixes
the problem on the remaining arm64 code and on riscv, as raised by Mark.
Discussions about this problem could be found at[1][2].
This patch (of 3):
The kexec_buf structure was previously declared without initialization.
commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly")
added a field that is always read but not consistently populated by all
architectures. This un-initialized field will contain garbage.
This is also triggering a UBSAN warning when the uninitialized data was
accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are
cleanly set, preventing future instances of uninitialized memory being
used. |
| In the Linux kernel, the following vulnerability has been resolved:
of_numa: fix uninitialized memory nodes causing kernel panic
When there are memory-only nodes (nodes without CPUs), these nodes are not
properly initialized, causing kernel panic during boot.
of_numa_init
of_numa_parse_cpu_nodes
node_set(nid, numa_nodes_parsed);
of_numa_parse_memory_nodes
In of_numa_parse_cpu_nodes, numa_nodes_parsed gets updated only for nodes
containing CPUs. Memory-only nodes should have been updated in
of_numa_parse_memory_nodes, but they weren't.
Subsequently, when free_area_init() attempts to access NODE_DATA() for
these uninitialized memory nodes, the kernel panics due to NULL pointer
dereference.
This can be reproduced on ARM64 QEMU with 1 CPU and 2 memory nodes:
qemu-system-aarch64 \
-cpu host -nographic \
-m 4G -smp 1 \
-machine virt,accel=kvm,gic-version=3,iommu=smmuv3 \
-object memory-backend-ram,size=2G,id=mem0 \
-object memory-backend-ram,size=2G,id=mem1 \
-numa node,nodeid=0,memdev=mem0 \
-numa node,nodeid=1,memdev=mem1 \
-kernel $IMAGE \
-hda $DISK \
-append "console=ttyAMA0 root=/dev/vda rw earlycon"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010]
[ 0.000000] Linux version 6.17.0-rc1-00001-gabb4b3daf18c-dirty (yintirui@local) (gcc (GCC) 12.3.1, GNU ld (GNU Binutils) 2.41) #52 SMP PREEMPT Mon Aug 18 09:49:40 CST 2025
[ 0.000000] KASLR enabled
[ 0.000000] random: crng init done
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] efi: UEFI not found.
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory node in the DT
[ 0.000000] NODE_DATA(0) allocated [mem 0xbfffd9c0-0xbfffffff]
[ 0.000000] node 1 must be removed before remove section 23
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] node 1: [mem 0x00000000c0000000-0x000000013fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x04: level 0 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [00000000000000a0] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 0000000096000004 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc1-00001-g760c6dabf762-dirty #54 PREEMPT
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : free_area_init+0x50c/0xf9c
[ 0.000000] lr : free_area_init+0x5c0/0xf9c
[ 0.000000] sp : ffffa02ca0f33c00
[ 0.000000] x29: ffffa02ca0f33cb0 x28: 0000000000000000 x27: 0000000000000000
[ 0.000000] x26: 4ec4ec4ec4ec4ec5 x25: 00000000000c0000 x24: 00000000000c0000
[ 0.000000] x23: 0000000000040000 x22: 0000000000000000 x21: ffffa02ca0f3b368
[ 0.000000] x20: ffffa02ca14c7b98 x19: 0000000000000000 x18: 0000000000000002
[ 0.000000] x17: 000000000000cacc x16: 0000000000000001 x15: 0000000000000001
[ 0.000000] x14: 0000000080000000 x13: 0000000000000018 x12: 0000000000000002
[ 0.0
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
i40e: remove read access to debugfs files
The 'command' and 'netdev_ops' debugfs files are a legacy debugging
interface supported by the i40e driver since its early days by commit
02e9c290814c ("i40e: debugfs interface").
Both of these debugfs files provide a read handler which is mostly useless,
and which is implemented with questionable logic. They both use a static
256 byte buffer which is initialized to the empty string. In the case of
the 'command' file this buffer is literally never used and simply wastes
space. In the case of the 'netdev_ops' file, the last command written is
saved here.
On read, the files contents are presented as the name of the device
followed by a colon and then the contents of their respective static
buffer. For 'command' this will always be "<device>: ". For 'netdev_ops',
this will be "<device>: <last command written>". But note the buffer is
shared between all devices operated by this module. At best, it is mostly
meaningless information, and at worse it could be accessed simultaneously
as there doesn't appear to be any locking mechanism.
We have also recently received multiple reports for both read functions
about their use of snprintf and potential overflow that could result in
reading arbitrary kernel memory. For the 'command' file, this is definitely
impossible, since the static buffer is always zero and never written to.
For the 'netdev_ops' file, it does appear to be possible, if the user
carefully crafts the command input, it will be copied into the buffer,
which could be large enough to cause snprintf to truncate, which then
causes the copy_to_user to read beyond the length of the buffer allocated
by kzalloc.
A minimal fix would be to replace snprintf() with scnprintf() which would
cap the return to the number of bytes written, preventing an overflow. A
more involved fix would be to drop the mostly useless static buffers,
saving 512 bytes and modifying the read functions to stop needing those as
input.
Instead, lets just completely drop the read access to these files. These
are debug interfaces exposed as part of debugfs, and I don't believe that
dropping read access will break any script, as the provided output is
pretty useless. You can find the netdev name through other more standard
interfaces, and the 'netdev_ops' interface can easily result in garbage if
you issue simultaneous writes to multiple devices at once.
In order to properly remove the i40e_dbg_netdev_ops_buf, we need to
refactor its write function to avoid using the static buffer. Instead, use
the same logic as the i40e_dbg_command_write, with an allocated buffer.
Update the code to use this instead of the static buffer, and ensure we
free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on
multiple devices, and allows us to remove the now unused static buffer
along with removing the read access. |
| In the Linux kernel, the following vulnerability has been resolved:
net_sched: gen_estimator: fix est_timer() vs CONFIG_PREEMPT_RT=y
syzbot reported a WARNING in est_timer() [1]
Problem here is that with CONFIG_PREEMPT_RT=y, timer callbacks
can be preempted.
Adopt preempt_disable_nested()/preempt_enable_nested() to fix this.
[1]
WARNING: CPU: 0 PID: 16 at ./include/linux/seqlock.h:221 __seqprop_assert include/linux/seqlock.h:221 [inline]
WARNING: CPU: 0 PID: 16 at ./include/linux/seqlock.h:221 est_timer+0x6dc/0x9f0 net/core/gen_estimator.c:93
Modules linked in:
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:__seqprop_assert include/linux/seqlock.h:221 [inline]
RIP: 0010:est_timer+0x6dc/0x9f0 net/core/gen_estimator.c:93
Call Trace:
<TASK>
call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747
expire_timers kernel/time/timer.c:1798 [inline]
__run_timers kernel/time/timer.c:2372 [inline]
__run_timer_base+0x648/0x970 kernel/time/timer.c:2384
run_timer_base kernel/time/timer.c:2393 [inline]
run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403
handle_softirqs+0x22c/0x710 kernel/softirq.c:579
__do_softirq kernel/softirq.c:613 [inline]
run_ktimerd+0xcf/0x190 kernel/softirq.c:1043
smpboot_thread_fn+0x53f/0xa60 kernel/smpboot.c:160
kthread+0x70e/0x8a0 kernel/kthread.c:463
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
addr \!= __fix_to_virt(FIX_KMAP_BEGIN + idx)
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403b8 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering. |
| In the Linux kernel, the following vulnerability has been resolved:
net: xilinx: axienet: Add error handling for RX metadata pointer retrieval
Add proper error checking for dmaengine_desc_get_metadata_ptr() which
can return an error pointer and lead to potential crashes or undefined
behaviour if the pointer retrieval fails.
Properly handle the error by unmapping DMA buffer, freeing the skb and
returning early to prevent further processing with invalid data. |
| In the Linux kernel, the following vulnerability has been resolved:
accel/ivpu: Prevent recovery work from being queued during device removal
Use disable_work_sync() instead of cancel_work_sync() in ivpu_dev_fini()
to ensure that no new recovery work items can be queued after device
removal has started. Previously, recovery work could be scheduled even
after canceling existing work, potentially leading to use-after-free
bugs if recovery accessed freed resources.
Rename ivpu_pm_cancel_recovery() to ivpu_pm_disable_recovery() to better
reflect its new behavior. |
| In the Linux kernel, the following vulnerability has been resolved:
sched: Fix sched_numa_find_nth_cpu() if mask offline
sched_numa_find_nth_cpu() uses a bsearch to look for the 'closest'
CPU in sched_domains_numa_masks and given cpus mask. However they
might not intersect if all CPUs in the cpus mask are offline. bsearch
will return NULL in that case, bail out instead of dereferencing a
bogus pointer.
The previous behaviour lead to this bug when using maxcpus=4 on an
rk3399 (LLLLbb) (i.e. booting with all big CPUs offline):
[ 1.422922] Unable to handle kernel paging request at virtual address ffffff8000000000
[ 1.423635] Mem abort info:
[ 1.423889] ESR = 0x0000000096000006
[ 1.424227] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1.424715] SET = 0, FnV = 0
[ 1.424995] EA = 0, S1PTW = 0
[ 1.425279] FSC = 0x06: level 2 translation fault
[ 1.425735] Data abort info:
[ 1.425998] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 1.426499] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 1.426952] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 1.427428] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000004a9f000
[ 1.428038] [ffffff8000000000] pgd=18000000f7fff403, p4d=18000000f7fff403, pud=18000000f7fff403, pmd=0000000000000000
[ 1.429014] Internal error: Oops: 0000000096000006 [#1] SMP
[ 1.429525] Modules linked in:
[ 1.429813] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc4-dirty #343 PREEMPT
[ 1.430559] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 1.431012] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1.431634] pc : sched_numa_find_nth_cpu+0x2a0/0x488
[ 1.432094] lr : sched_numa_find_nth_cpu+0x284/0x488
[ 1.432543] sp : ffffffc084e1b960
[ 1.432843] x29: ffffffc084e1b960 x28: ffffff80078a8800 x27: ffffffc0846eb1d0
[ 1.433495] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 1.434144] x23: 0000000000000000 x22: fffffffffff7f093 x21: ffffffc081de6378
[ 1.434792] x20: 0000000000000000 x19: 0000000ffff7f093 x18: 00000000ffffffff
[ 1.435441] x17: 3030303866666666 x16: 66663d736b73616d x15: ffffffc104e1b5b7
[ 1.436091] x14: 0000000000000000 x13: ffffffc084712860 x12: 0000000000000372
[ 1.436739] x11: 0000000000000126 x10: ffffffc08476a860 x9 : ffffffc084712860
[ 1.437389] x8 : 00000000ffffefff x7 : ffffffc08476a860 x6 : 0000000000000000
[ 1.438036] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[ 1.438683] x2 : 0000000000000000 x1 : ffffffc0846eb000 x0 : ffffff8000407b68
[ 1.439332] Call trace:
[ 1.439559] sched_numa_find_nth_cpu+0x2a0/0x488 (P)
[ 1.440016] smp_call_function_any+0xc8/0xd0
[ 1.440416] armv8_pmu_init+0x58/0x27c
[ 1.440770] armv8_cortex_a72_pmu_init+0x20/0x2c
[ 1.441199] arm_pmu_device_probe+0x1e4/0x5e8
[ 1.441603] armv8_pmu_device_probe+0x1c/0x28
[ 1.442007] platform_probe+0x5c/0xac
[ 1.442347] really_probe+0xbc/0x298
[ 1.442683] __driver_probe_device+0x78/0x12c
[ 1.443087] driver_probe_device+0xdc/0x160
[ 1.443475] __driver_attach+0x94/0x19c
[ 1.443833] bus_for_each_dev+0x74/0xd4
[ 1.444190] driver_attach+0x24/0x30
[ 1.444525] bus_add_driver+0xe4/0x208
[ 1.444874] driver_register+0x60/0x128
[ 1.445233] __platform_driver_register+0x24/0x30
[ 1.445662] armv8_pmu_driver_init+0x28/0x4c
[ 1.446059] do_one_initcall+0x44/0x25c
[ 1.446416] kernel_init_freeable+0x1dc/0x3bc
[ 1.446820] kernel_init+0x20/0x1d8
[ 1.447151] ret_from_fork+0x10/0x20
[ 1.447493] Code: 90022e21 f000e5f5 910de2b5 2a1703e2 (f8767803)
[ 1.448040] ---[ end trace 0000000000000000 ]---
[ 1.448483] note: swapper/0[1] exited with preempt_count 1
[ 1.449047] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.449741] SMP: stopping secondary CPUs
[ 1.450105] Kernel Offset: disabled
[ 1.450419] CPU features: 0x000000,00080000,20002001,0400421b
[
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
spi: spi-qpic-snand: unregister ECC engine on probe error and device remove
The on-host hardware ECC engine remains registered both when
the spi_register_controller() function returns with an error
and also on device removal.
Change the qcom_spi_probe() function to unregister the engine
on the error path, and add the missing unregistering call to
qcom_spi_remove() to avoid possible use-after-free issues. |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: soc-core: care NULL dirver name on snd_soc_lookup_component_nolocked()
soc-generic-dmaengine-pcm.c uses same dev for both CPU and Platform.
In such case, CPU component driver might not have driver->name, then
snd_soc_lookup_component_nolocked() will be NULL pointer access error.
Care NULL driver name.
Call trace:
strcmp from snd_soc_lookup_component_nolocked+0x64/0xa4
snd_soc_lookup_component_nolocked from snd_soc_unregister_component_by_driver+0x2c/0x44
snd_soc_unregister_component_by_driver from snd_dmaengine_pcm_unregister+0x28/0x64
snd_dmaengine_pcm_unregister from devres_release_all+0x98/0xfc
devres_release_all from device_unbind_cleanup+0xc/0x60
device_unbind_cleanup from really_probe+0x220/0x2c8
really_probe from __driver_probe_device+0x88/0x1a0
__driver_probe_device from driver_probe_device+0x30/0x110
driver_probe_device from __driver_attach+0x90/0x178
__driver_attach from bus_for_each_dev+0x7c/0xcc
bus_for_each_dev from bus_add_driver+0xcc/0x1ec
bus_add_driver from driver_register+0x80/0x11c
driver_register from do_one_initcall+0x58/0x23c
do_one_initcall from kernel_init_freeable+0x198/0x1f4
kernel_init_freeable from kernel_init+0x1c/0x12c
kernel_init from ret_from_fork+0x14/0x28 |
| In the Linux kernel, the following vulnerability has been resolved:
fuse: Block access to folio overlimit
syz reported a slab-out-of-bounds Write in fuse_dev_do_write.
When the number of bytes to be retrieved is truncated to the upper limit
by fc->max_pages and there is an offset, the oob is triggered.
Add a loop termination condition to prevent overruns. |
| In the Linux kernel, the following vulnerability has been resolved:
tracing/osnoise: Fix null-ptr-deref in bitmap_parselist()
A crash was observed with the following output:
BUG: kernel NULL pointer dereference, address: 0000000000000010
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 92 Comm: osnoise_cpus Not tainted 6.17.0-rc4-00201-gd69eb204c255 #138 PREEMPT(voluntary)
RIP: 0010:bitmap_parselist+0x53/0x3e0
Call Trace:
<TASK>
osnoise_cpus_write+0x7a/0x190
vfs_write+0xf8/0x410
? do_sys_openat2+0x88/0xd0
ksys_write+0x60/0xd0
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
This issue can be reproduced by below code:
fd=open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
write(fd, "0-2", 0);
When user pass 'count=0' to osnoise_cpus_write(), kmalloc() will return
ZERO_SIZE_PTR (16) and cpulist_parse() treat it as a normal value, which
trigger the null pointer dereference. Add check for the parameter 'count'. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/ |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix subvolume deletion lockup caused by inodes xarray race
There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:
Thread 1 Thread 2
---------------------------------------------------------------
evict()
remove_inode_hash()
btrfs_iget_path()
btrfs_iget_locked()
btrfs_read_locked_inode()
btrfs_add_inode_to_root()
destroy_inode()
btrfs_destroy_inode()
btrfs_del_inode_from_root()
__xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()). |