| CVE | Vendors | Products | Updated | CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix linked reg delta tracking when src_reg == dst_reg
Consider the case of rX += rX where src_reg and dst_reg are pointers to
the same bpf_reg_state in adjust_reg_min_max_vals(). The latter first
modifies the dst_reg in-place, and later in the delta tracking, the
subsequent is_reg_const(src_reg)/reg_const_value(src_reg) reads the
post-{add,sub} value instead of the original source.
This is problematic since it sets an incorrect delta, which sync_linked_regs()
then propagates to linked registers, thus creating a verifier-vs-runtime
mismatch. Fix it by just skipping this corner case. |
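
The aliasing hazard in the entry above can be reproduced outside the kernel. The following is a minimal user-space sketch, not the verifier code: `struct reg`, `add_buggy()` and `add_fixed()` are hypothetical names, and dropping the link in the aliased case is an assumption about what "skipping this corner case" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a verifier register state. */
struct reg {
    int64_t value;  /* known constant value */
    int64_t delta;  /* tracked offset relative to a linked register */
    bool linked;    /* whether a delta is being tracked */
};

/* Buggy: dst is updated in place, then src is read for the delta.
 * When src == dst, that read returns the post-add value. */
static void add_buggy(struct reg *dst, const struct reg *src)
{
    assert(dst && src);
    dst->value += src->value;
    dst->delta += src->value;   /* wrong when src aliases dst */
    dst->linked = true;
}

/* Sketch of the fix: do not track a delta for rX += rX. */
static void add_fixed(struct reg *dst, const struct reg *src)
{
    bool aliased;

    assert(dst && src);
    aliased = (dst == src);
    dst->value += src->value;
    if (aliased) {
        /* Assumption: drop the link rather than record a bad delta. */
        dst->linked = false;
        dst->delta = 0;
        return;
    }
    dst->delta += src->value;
    dst->linked = true;
}
```

With a register holding 5, `add_buggy(&r, &r)` records a delta of 10 rather than 5, which is the kind of wrong value that would then be propagated to linked registers.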
| In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921: fix potential deadlock in mt7921_roc_abort_sync
roc_abort_sync() can deadlock with roc_work(). roc_work() holds
dev->mt76.mutex, while cancel_work_sync() waits for roc_work()
to finish. If the caller already owns the same mutex, both
sides block and no progress is possible.
This deadlock can occur during station removal when
mt76_sta_state() -> mt76_sta_remove() -> mt7921_mac_sta_remove() ->
mt7921_roc_abort_sync() invokes cancel_work_sync() while
roc_work() is still running and holding dev->mt76.mutex.
This avoids the mutex deadlock and preserves exactly-once
work ownership. |
| In the Linux kernel, the following vulnerability has been resolved:
powerpc/64s: Fix unmap race with PMD migration entries
The following race is possible with migration swap entries or
device-private THP entries. For example, when move_pages() is called on a PMD THP
page, there may be an intermediate state in which the PMD entry acts as
a migration swap entry (pmd_present() is false). Then if an munmap
happens at the same time, then this VM_BUG_ON() can happen in
pmdp_huge_get_and_clear_full().
This patch fixes that.
Thread A: move_pages() syscall
  add_folio_for_migration()
    mmap_read_lock(mm)
    folio_isolate_lru(folio)
    mmap_read_unlock(mm)
  do_move_pages_to_node()
    migrate_pages()
      try_to_migrate_one()
        spin_lock(ptl)
        set_pmd_migration_entry()
          pmdp_invalidate()   # PMD: _PAGE_INVALID | _PAGE_PTE | pfn
          set_pmd_at()        # PMD: migration swap entry (pmd_present=0)
        spin_unlock(ptl)
        [page copy phase]     # <--- RACE WINDOW -->
Thread B: munmap()
  mmap_write_downgrade(mm)
  unmap_vmas() -> zap_pmd_range()
    zap_huge_pmd()
      __pmd_trans_huge_lock()
        pmd_is_huge():        # !pmd_present && !pmd_none -> TRUE (swap entry)
        pmd_lock() ->         # spin_lock(ptl), waits for Thread A to release ptl
      pmdp_huge_get_and_clear_full()
        VM_BUG_ON(!pmd_present(*pmdp))  # HITS!
[ 287.738700][ T1867] ------------[ cut here ]------------
[ 287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187!
cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0]
pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c
lr: c000000000645dec: zap_huge_pmd+0xb0/0x868
sp: c00000044037f790
msr: 800000000282b033
current = 0xc0000004032c1a00
paca = 0xc000000004fe0000 irqmask: 0x03 irq_happened: 0x09
pid = 1867, comm = a.out
kernel BUG at :187!
Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026
enter ? for help
[link register ] c000000000645dec zap_huge_pmd+0xb0/0x868
[c00000044037f790] c00000044037f7d0 (unreliable)
[c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868
[c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40
[c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8
[c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128
[c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310
[c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238
[c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8
[c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40
[c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350
[c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec
---- Exception: 3000 (System Call Vectored) at 0000000010064a2c
SP (7fff9b1ee9c0) is in userspace
0:mon> zh
Commit a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages")
enabled migration for device-private PMD entries. Hence this is
another path from which this warning can be triggered.
------------[ cut here ]------------
WARNING: arch/powerpc/mm/book3s64/hash_pgtable.c:199 at hash__pmd_hugepage_update+0x48/0x284, CPU#3: hmm-tests/1905
Modules linked in: test_hmm
CPU: 3 UID: 0 PID: 1905 Comm: hmm-tests Tainted: G B W L N 7.0.0-rc1-01438-g7e2f0ee7581c #21 PREEMPT
Tainted: [B]=BAD_PAGE, [W]=WARN, [L]=SOFTLOCKUP, [N]=TEST
Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
NIP [c000000000096b70] hash__pmd_hugepage_update+0x48/0x284
LR [c000000000096e7c] hash__pmdp_huge_get_and_clear+0xd0/0xd4
Call Trace:
[c000000604707670] [c000000004e102b8] 0xc000000004e102b8 (unreliable)
[c000000604707700] [c00000000064ec3c] set_pmd_migration_entry+0x414/0x498
[c000000604707760] [c00000000063e5a4] migrate_vma_col
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: test_run: Fix the null pointer dereference issue in bpf_lwt_xmit_push_encap
The bpf_lwt_xmit_push_encap helper needs to access skb_dst(skb)->dev to
calculate the needed headroom:
err = skb_cow_head(skb,
len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
But skb->_skb_refdst may not be initialized when the skb is set up by
the bpf_prog_test_run_skb() function. Executing bpf_lwt_push_ip_encap()
in this scenario triggers a NULL pointer dereference, causing a kernel
crash as Yinhao reported:
[ 105.186365] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 105.186382] #PF: supervisor read access in kernel mode
[ 105.186388] #PF: error_code(0x0000) - not-present page
[ 105.186393] PGD 121d3d067 P4D 121d3d067 PUD 106c83067 PMD 0
[ 105.186404] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 105.186412] CPU: 3 PID: 3250 Comm: poc Kdump: loaded Not tainted 6.19.0-rc5 #1
[ 105.186423] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 105.186427] RIP: 0010:bpf_lwt_push_ip_encap+0x1eb/0x520
[ 105.186443] Code: 0f 84 de 01 00 00 0f b7 4a 04 66 85 c9 0f 85 47 01 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 48 8b 73 58 48 83 e6 fe <48> 8b 36 0f b7 be ec 00 00 00 0f b7 b6 e6 00 00 00 01 fe 83 e6 f0
[ 105.186449] RSP: 0018:ffffbb0e0387bc50 EFLAGS: 00010246
[ 105.186455] RAX: 000000000000004e RBX: ffff94c74e036500 RCX: ffff94c74874da00
[ 105.186460] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94c74e036500
[ 105.186463] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000000000
[ 105.186467] R10: ffffbb0e0387bd50 R11: 0000000000000000 R12: ffffbb0e0387bc98
[ 105.186471] R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000002
[ 105.186484] FS: 00007f166aa4d680(0000) GS:ffff94c8b7780000(0000) knlGS:0000000000000000
[ 105.186490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.186494] CR2: 0000000000000000 CR3: 000000015eade001 CR4: 0000000000770ee0
[ 105.186499] PKRU: 55555554
[ 105.186502] Call Trace:
[ 105.186507] <TASK>
[ 105.186513] bpf_lwt_xmit_push_encap+0x2b/0x40
[ 105.186522] bpf_prog_a75eaad51e517912+0x41/0x49
[ 105.186536] ? kvm_clock_get_cycles+0x18/0x30
[ 105.186547] ? ktime_get+0x3c/0xa0
[ 105.186554] bpf_test_run+0x195/0x320
[ 105.186563] ? bpf_test_run+0x10f/0x320
[ 105.186579] bpf_prog_test_run_skb+0x2f5/0x4f0
[ 105.186590] __sys_bpf+0x69c/0xa40
[ 105.186603] __x64_sys_bpf+0x1e/0x30
[ 105.186611] do_syscall_64+0x59/0x110
[ 105.186620] entry_SYSCALL_64_after_hwframe+0x76/0xe0
[ 105.186649] RIP: 0033:0x7f166a97455d
To resolve the issue, temporarily set skb->_skb_refdst before calling bpf_test_run(). |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: ath11k: fix memory leaks in beacon template setup
The functions ath11k_mac_setup_bcn_tmpl_ema() and
ath11k_mac_setup_bcn_tmpl_mbssid() allocate memory for beacon templates
but fail to free it when parameter setup returns an error.
Since beacon templates must be released during normal execution, they
must also be released in the error handling paths to prevent memory
leaks.
Fix this by using unified exit paths with proper cleanup in the respective
error paths.
Compile tested only. Issue found using a prototype static analysis tool
and code review. |
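
The "unified exit path" idea in the entry above is the usual goto-based cleanup idiom. The sketch below is a user-space illustration only: `setup_beacon()`, `tmpl_alloc()`, `tmpl_free()` and the `fail_step` parameter are hypothetical and do not correspond to ath11k functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_templates;  /* templates allocated but not yet freed */

static void *tmpl_alloc(void)
{
    void *p = malloc(64);

    if (p)
        live_templates++;
    return p;
}

static void tmpl_free(void *p)
{
    if (p) {
        free(p);
        live_templates--;
    }
}

/* Hypothetical setup routine; fail_step selects which step fails (0 = none). */
static int setup_beacon(int fail_step)
{
    void *tmpl;
    int ret = 0;

    assert(fail_step >= 0);
    tmpl = tmpl_alloc();
    if (!tmpl)
        return -ENOMEM;

    if (fail_step == 1) {       /* e.g. parameter setup fails */
        ret = -EINVAL;
        goto free_tmpl;
    }
    if (fail_step == 2) {       /* e.g. submitting the template fails */
        ret = -EIO;
        goto free_tmpl;
    }

free_tmpl:
    /* Success and error paths both release the template here. */
    tmpl_free(tmpl);
    return ret;
}
```

Because every return after the allocation goes through `free_tmpl`, adding a new failure check cannot reintroduce the leak as long as it jumps to the same label.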
| In the Linux kernel, the following vulnerability has been resolved:
perf/amd/ibs: Avoid calling perf_allow_kernel() from the IBS NMI handler
Calling perf_allow_kernel() from the NMI context is unsafe and could be
fatal. Capture the permission at event-initialization time by storing it
in event->hw.flags, and have the NMI handler rely on that cached flag
instead of making the call directly. |
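
The pattern in the entry above, checking a permission once at initialization and caching the result for a restricted context, can be sketched as follows. All names here (`struct event`, `EV_FLAG_KERNEL_ALLOWED`, `may_profile_kernel()`) are hypothetical stand-ins, not the perf or IBS API.

```c
#include <assert.h>
#include <stdbool.h>

#define EV_FLAG_KERNEL_ALLOWED 0x1u  /* hypothetical flag bit */

struct event {
    unsigned int flags;
};

static int permission_checks;  /* counts calls to the expensive check */

/* Stand-in for a check that is only safe in process context. */
static bool may_profile_kernel(bool privileged)
{
    permission_checks++;
    return privileged;
}

/* Runs at event-initialization time: do the check once and cache it. */
static void event_init(struct event *ev, bool privileged)
{
    assert(ev);
    ev->flags = 0;
    if (may_profile_kernel(privileged))
        ev->flags |= EV_FLAG_KERNEL_ALLOWED;
}

/* Runs in the restricted (NMI-like) context: reads the cached flag only. */
static bool handler_keep_kernel_sample(const struct event *ev)
{
    assert(ev);
    return (ev->flags & EV_FLAG_KERNEL_ALLOWED) != 0;
}
```

The handler never calls `may_profile_kernel()`, so however often it fires, the check count stays at one per initialized event.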
| In the Linux kernel, the following vulnerability has been resolved:
PCI: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1] |
| In the Linux kernel, the following vulnerability has been resolved:
amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()
On failure to set the epp, the function amd_pstate_epp_cpu_init()
returns with an error code without freeing the cpudata object that was
allocated at the beginning of the function.
Ensure that the cpudata object is freed before returning from the
function.
This memory leak was discovered by Claude Opus 4.6 with the aid of
Chris Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel). |
| In the Linux kernel, the following vulnerability has been resolved:
drbd: Balance RCU calls in drbd_adm_dump_devices()
Make drbd_adm_dump_devices() call rcu_read_lock() before
rcu_read_unlock() is called. This has been detected by the Clang
thread-safety analyzer. |
| This CVE ID has been rejected or withdrawn by its CVE Numbering Authority. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: avoid double drm_exec_fini() in userq validate
When new_addition is true, amdgpu_userq_vm_validate() calls
drm_exec_fini(&exec) before iterating over the collected HMM ranges and
calling amdgpu_ttm_tt_get_user_pages().
If amdgpu_ttm_tt_get_user_pages() fails in that path, the code jumps to
unlock_all and calls drm_exec_fini(&exec) a second time on the same
exec object. drm_exec_fini() is not idempotent: it frees exec->objects
and may also drop exec->contended and finalize the ww acquire context.
Route that error path directly to the range cleanup once exec has
already been finalized.
Issue found using a prototype static analysis tool
and confirmed by code review.
(cherry picked from commit 2802952e4a07306da6ebe813ff1acacc5691851a) |
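
The control-flow change in the entry above can be modelled with two cleanup labels. This is a simplified user-space sketch under stated assumptions: `struct exec_ctx`, `exec_init()`, `exec_fini()` and `validate()` are hypothetical, and the behaviour of the path where `new_addition` is false is invented purely to give the `unlock_all` label a caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct exec_ctx {
    void *objects;
};

static int fini_calls;  /* how many times exec_fini() ran */

static int exec_init(struct exec_ctx *e)
{
    assert(e);
    e->objects = malloc(32);
    return e->objects ? 0 : -ENOMEM;
}

/* Not idempotent: a second call would free e->objects again. */
static void exec_fini(struct exec_ctx *e)
{
    free(e->objects);
    fini_calls++;
}

static int validate(bool new_addition, bool get_pages_fails)
{
    struct exec_ctx exec;
    int ret;

    ret = exec_init(&exec);
    if (ret)
        return ret;

    if (!new_addition)
        goto unlock_all;

    exec_fini(&exec);       /* finalized early on this path */
    if (get_pages_fails)
        ret = -EFAULT;
    goto free_ranges;       /* not unlock_all: exec is already finalized */

unlock_all:
    exec_fini(&exec);
free_ranges:
    /* Range cleanup would go here. */
    return ret;
}
```

Each call to `validate()` finalizes the context exactly once, whichever branch it takes.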
| In the Linux kernel, the following vulnerability has been resolved:
sched/psi: fix race between file release and pressure write
A potential race condition exists between pressure write and cgroup file
release regarding the priv member of struct kernfs_open_file, which
triggers the use-after-free (UAF) reported in [1].
Consider the following scenario involving execution on two separate CPUs:
CPU0                                      CPU1
====                                      ====
                                          vfs_rmdir()
                                          kernfs_iop_rmdir()
                                          cgroup_rmdir()
                                          cgroup_kn_lock_live()
                                          cgroup_destroy_locked()
                                          cgroup_addrm_files()
                                          cgroup_rm_file()
                                          kernfs_remove_by_name()
                                          kernfs_remove_by_name_ns()
vfs_write()                               __kernfs_remove()
new_sync_write()                          kernfs_drain()
kernfs_fop_write_iter()                   kernfs_drain_open_files()
cgroup_file_write()                       kernfs_release_file()
pressure_write()                          cgroup_file_release()
ctx = of->priv;
                                          kfree(ctx);
                                          of->priv = NULL;
                                          cgroup_kn_unlock()
cgroup_kn_lock_live()
cgroup_get(cgrp)
cgroup_kn_unlock()
if (ctx->psi.trigger) // here, trigger uaf for ctx, that is of->priv
cgroup_rmdir() is protected by cgroup_mutex, which also safeguards
the memory deallocation of of->priv performed within cgroup_file_release().
However, the operations involving of->priv executed within pressure_write()
are not entirely covered by the protection of cgroup_mutex. Consequently,
if the code in pressure_write(), specifically the section handling the
ctx variable, executes after cgroup_file_release() has completed, a UAF
involving of->priv is triggered.
Therefore, the issue can be resolved by extending the scope of the
cgroup_mutex lock within pressure_write() to encompass all code paths
involving of->priv, thereby properly synchronizing the race condition
occurring between cgroup_file_release() and pressure_write().
Moreover, if a live kn lock can be successfully acquired while executing
the pressure write operation, it indicates that the cgroup deletion
process has not yet reached its final stage; consequently, the priv
pointer within open_file cannot be NULL. Therefore, the operation to
retrieve the ctx value must be moved to a point *after* the live kn
lock has been successfully acquired.
In another situation, specifically after entering cgroup_kn_lock_live()
but before acquiring cgroup_mutex, there exists a different class of
race condition:
CPU0: write memory.pressure               CPU1: write cgroup.pressure=0
===========================               =============================
kernfs_fop_write_iter()
  kernfs_get_active_of(of)
  pressure_write()
    cgroup_kn_lock_live(memory.pressure)
      cgroup_tryget(cgrp)
      kernfs_break_active_protection(kn)
      ... blocks on cgroup_mutex
                                          cgroup_pressure_write()
                                            cgroup_kn_lock_live(cgroup.pressure)
                                            cgroup_file_show(memory.pressure, false)
                                              kernfs_show(false)
                                                kernfs_drain_open_files()
                                                  cgroup_file_release(of)
                                                    kfree(ctx)
                                                    of->priv = NULL
                                            cgroup_kn_unlock()
      ... acquires cgroup_mutex
    ctx = of->priv;          // may now be NULL
    if (ctx->psi.trigger)    // NULL dereference
Consequently, of->priv may be NULL, and the pressure
write needs to check for this.
Now that the scope of cgroup_mutex has been expanded, the original
explicit cgroup_get/put operations are no longer necessary, because
acquiring/releasing the live kn lock inherently executes a
cgroup get/put operation.
[1]
BUG: KASAN: slab-use-after-free in pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011
Call Trace:
pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011
cgroup_file_write+0x36f/0x790 kernel/cgroup/cgroup.c:43
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
um: Fix potential race condition in TLB sync
During the TLB sync, we need to traverse and modify the page table,
so we should hold the page table lock. Since full SMP support for
threads within the same process is still missing, let's disable the
split page table lock for simplicity. |
| In the Linux kernel, the following vulnerability has been resolved:
greybus: raw: fix use-after-free on cdev close
This addresses a use-after-free bug when a raw bundle is disconnected
but its chardev is still opened by an application. When the application
releases the cdev, it causes the following panic when init on free is
enabled (CONFIG_INIT_ON_FREE_DEFAULT_ON=y):
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 139 at lib/refcount.c:28 refcount_warn_saturate+0xd0/0x130
...
Call Trace:
<TASK>
cdev_put+0x18/0x30
__fput+0x255/0x2a0
__x64_sys_close+0x3d/0x80
do_syscall_64+0xa4/0x290
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The cdev is contained in the "gb_raw" structure, which is freed in the
disconnect operation. When the cdev is released at a later time,
cdev_put gets an address that points to freed memory.
To fix this use-after-free, convert the struct device from a pointer to
an embedded member, which makes the lifetime of the cdev and of this device
the same. Then, use cdev_device_add(), which guarantees that the device
won't be released until all references to the cdev have been released.
Finally, delegate the freeing of the structure to the device release
function, instead of freeing immediately in the disconnect callback. |
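
The lifetime rule in the entry above, that disconnect drops a reference while the release callback does the freeing, can be sketched with a plain reference count. This is a single-threaded user-space illustration; `struct raw_dev` and the `raw_*()` helpers are hypothetical and are not the greybus or driver-core API.

```c
#include <assert.h>
#include <stdlib.h>

struct raw_dev {
    int refs;
    void (*release)(struct raw_dev *dev);
};

static int release_calls;  /* how many objects were actually freed */

/* Release callback: the only place the structure is freed. */
static void raw_release(struct raw_dev *dev)
{
    release_calls++;
    free(dev);
}

static struct raw_dev *raw_create(void)
{
    struct raw_dev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refs = 1;              /* reference owned by the driver binding */
    dev->release = raw_release;
    return dev;
}

/* Models an application opening the character device. */
static void raw_get(struct raw_dev *dev)
{
    assert(dev && dev->refs > 0);
    dev->refs++;
}

/* Models closing the character device, or any other reference drop. */
static void raw_put(struct raw_dev *dev)
{
    assert(dev && dev->refs > 0);
    if (--dev->refs == 0)
        dev->release(dev);
}

/* Disconnect drops only the driver's reference; it never frees directly. */
static void raw_disconnect(struct raw_dev *dev)
{
    raw_put(dev);
}
```

If an application still holds the device open at disconnect time, the structure survives until that last reference is dropped, so the later close does not touch freed memory.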
| In the Linux kernel, the following vulnerability has been resolved:
dm cache: fix null-deref with concurrent writes in passthrough mode
In passthrough mode, when dm-cache starts to invalidate a cache
entry and bio prison cell lock fails due to concurrent write to
the same cached block, mg->cell remains NULL. The error path in
invalidate_complete() attempts to unlock and free the cell
unconditionally, causing a NULL pointer dereference:
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 UID: 0 PID: 134 Comm: fio Not tainted 6.19.0-rc7 #3 PREEMPT
RIP: 0010:dm_cell_unlock_v2+0x3f/0x210
<snip>
Call Trace:
invalidate_complete+0xef/0x430
map_bio+0x130f/0x1a10
cache_map+0x320/0x6b0
__map_bio+0x458/0x510
dm_submit_bio+0x40e/0x16d0
__submit_bio+0x419/0x870
<snip>
Steps to reproduce:
1. Create a cache device
   dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
   dmsetup create cdata --table "0 131072 linear /dev/sdc 8192"
   dmsetup create corig --table "0 262144 linear /dev/sdc 262144"
   dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct
   dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \
     /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
2. Promote the first data block into cache
   fio --filename=/dev/mapper/cache --name=populate --rw=write --bs=4k \
     --direct=1 --size=64k
3. Reload the cache into passthrough mode
   dmsetup suspend cache
   dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \
     /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0"
   dmsetup resume cache
4. Write to the first cached block concurrently
   fio --filename=/dev/mapper/cache --name test --rw=randwrite --bs=4k \
     --randrepeat=0 --direct=1 --numjobs=2 --size 64k
Fix by checking if mg->cell is valid before attempting to unlock it. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: fix mm lifecycle in open-coded task_vma iterator
The open-coded task_vma iterator reads task->mm locklessly and acquires
mmap_read_trylock() but never calls mmget(). If the task exits
concurrently, the mm_struct can be freed as it is not
SLAB_TYPESAFE_BY_RCU, resulting in a use-after-free.
Safely read task->mm with a trylock on alloc_lock and acquire an mm
reference. Drop the reference via bpf_iter_mmput_async() in _destroy()
and error paths. bpf_iter_mmput_async() is a local wrapper around
mmput_async() with a fallback to mmput() on !CONFIG_MMU.
Reject irqs-disabled contexts (including NMI) up front. Operations used
by _next() and _destroy() (mmap_read_unlock, bpf_iter_mmput_async)
take spinlocks with IRQs disabled (pool->lock, pi_lock). Running from
NMI or from a tracepoint that fires with those locks held could
deadlock.
A trylock on alloc_lock is used instead of the blocking task_lock()
(get_task_mm) to avoid a deadlock when a softirq BPF program iterates
a task that already holds its alloc_lock on the same CPU. |
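
The reason for preferring a trylock in the entry above is that a blocking lock taken from a context that interrupted the lock holder on the same CPU can never succeed. The sketch below models this in user space with C11 atomics; `struct task`, `task_trylock()` and `iter_new()` are hypothetical, and the `-EBUSY` and `-ENOENT` return values are assumptions rather than what the kernel iterator returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task->alloc_lock and the mm reference count. */
struct task {
    atomic_flag alloc_lock;
    int has_mm;
    int mm_users;
};

static bool task_trylock(struct task *t)
{
    /* test_and_set returns the previous state: true means already held. */
    return !atomic_flag_test_and_set_explicit(&t->alloc_lock,
                                              memory_order_acquire);
}

static void task_unlock(struct task *t)
{
    atomic_flag_clear_explicit(&t->alloc_lock, memory_order_release);
}

/* Take an mm reference, or fail instead of spinning on a held lock. */
static int iter_new(struct task *t)
{
    int ret = 0;

    assert(t);
    if (!task_trylock(t))
        return -EBUSY;      /* the holder may be the context we interrupted */
    if (t->has_mm)
        t->mm_users++;      /* models taking the mm reference under the lock */
    else
        ret = -ENOENT;
    task_unlock(t);
    return ret;
}
```

When the lock is already held, `iter_new()` returns immediately and leaves the reference count untouched, where a blocking acquire would spin forever in the same-CPU case.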
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix use-after-free in offloaded map/prog info fill
When querying info for an offloaded BPF map or program,
bpf_map_offload_info_fill_ns() and bpf_prog_offload_info_fill_ns()
obtain the network namespace with get_net(dev_net(offmap->netdev)).
However, the associated netdev's netns may be racing with teardown
during netns destruction. If the netns refcount has already reached 0,
get_net() performs a refcount_t increment on 0, triggering:
refcount_t: addition on 0; use-after-free.
Although rtnl_lock and bpf_devs_lock ensure the netdev pointer remains
valid, they cannot prevent the netns refcount from reaching zero.
Fix this by using maybe_get_net() instead of get_net(). maybe_get_net()
uses refcount_inc_not_zero() and returns NULL if the refcount is already
zero, which causes ns_get_path_cb() to fail and the caller to return
-ENOENT -- the correct behavior when the netns is being destroyed. |
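
The difference between an unconditional increment and an "increment unless zero" can be shown with C11 atomics. This is a user-space sketch of the general technique, not the kernel's refcount_t implementation; `struct obj` and `ref_get_not_zero()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
    atomic_int refs;
};

/* Take a reference only if the count is still non-zero.
 * Returns false when the object is already on its way to being freed. */
static bool ref_get_not_zero(struct obj *o)
{
    int old;

    assert(o);
    old = atomic_load(&o->refs);
    while (old != 0) {
        /* On failure, old is updated to the current value and we retry. */
        if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets `false` must treat the object as gone and return an error, which mirrors how the fix lets the lookup fail when the namespace is being destroyed.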
| In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: Fix memory leak destroying device
All MT76 rx queues have an associated page_pool even if the queue is not
associated with a NAPI (e.g. WED RRO queues with WED enabled). Destroy the
page_pool when running the mt76_dma_cleanup() routine during module unload.
In addition, return pages to the page pool for WED RRO queues when WED is
not enabled. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Do not allow deleting local storage in NMI
Currently, local storage may deadlock when deferring the freeing of a selem or
local storage through kfree_rcu(), call_rcu() or call_rcu_tasks_trace()
in NMI or reentrant context. Since deleting a selem in NMI is an unlikely use
case, partially mitigate it by returning an error when the
bpf_xxx_storage_delete() helpers are called in NMI. Note that it is still possible
to deadlock through reentrancy. A full mitigation requires returning
an error when irqs_disabled() is true, which, however, is too heavy-handed
for bpf_xxx_storage_delete().
The long-term solution requires _nolock versions of call_rcu. Another
possible solution is to defer the free through irq_work [0], but it
would grow the size of selem, which is non-ideal.
The check is only needed in bpf_selem_unlink(), which is used by helpers
and syscalls. bpf_selem_unlink_nofail() is fine as it is called during
map and owner teardown, which never runs in NMI or reentrant context.
[0] https://lore.kernel.org/bpf/20260205190233.912-1-alexei.starovoitov@gmail.com/ |
| In the Linux kernel, the following vulnerability has been resolved:
powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy
powerpc uses pt_frag_refcount as a reference counter for tracking its
pte and pmd page table fragments. For PTE tables, in the case of Hash with
a 64K page size, we have 16 fragments of 4K size in one 64K page.
Patch series [1] "mm: free retracted page table by RCU"
added pte_free_defer() to defer the freeing of PTE tables when
retract_page_tables() is called for madvise MADV_COLLAPSE on shmem
range.
[1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/
pte_free_defer() sets the active flag on the corresponding fragment's
folio and calls pte_fragment_free(), which reduces the pt_frag_refcount.
When pt_frag_refcount reaches 0 (no active fragment using the folio), it
checks whether the folio active flag is set. If it is set, it calls call_rcu() to
free the folio; if the active flag is unset, it calls pte_free_now().
Now, this can lead to the following problem in a corner case...
[ 265.351553][ T183] BUG: Bad page state in process a.out pfn:20d62
[ 265.353555][ T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62
[ 265.355457][ T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff)
[ 265.358719][ T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000
[ 265.360177][ T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000
[ 265.361438][ T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 265.362572][ T183] Modules linked in:
[ 265.364622][ T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY
[ 265.364785][ T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
[ 265.364908][ T183] Call Trace:
[ 265.364955][ T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable)
[ 265.365202][ T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8
[ 265.365384][ T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08
[ 265.365554][ T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310
[ 265.365729][ T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218
[ 265.365912][ T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820
[ 265.366080][ T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300
[ 265.366244][ T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508
[ 265.366421][ T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148
[ 265.366602][ T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178
[ 265.366780][ T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0
[ 265.366958][ T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
The bad page state error occurs when such a folio gets freed (with
the active flag set) from the do_exit() path in parallel.
... this can happen when the pte fragment was allocated from this folio,
but when all the fragments get freed, the pte_frag_refcount still had some
unused fragments. Now, if this process exits with such a folio as its cached
pte_frag in mm->context, then during pte_frag_destroy(), we simply call
pagetable_dtor() and pagetable_free(), meaning it doesn't clear the
active flag. This can lead to the above bug. Since we are anyway in
the do_exit() path, if the refcount is 0, I guess it should be
ok to simply clear the folio active flag before calling pagetable_dtor()
and pagetable_free(). |