| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
mptcp: ensure snd_nxt is properly initialized on connect
Christoph reported a splat hinting at a corrupted snd_una:
WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005
Modules linked in:
CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005
Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8
8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe
<0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9
RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4
RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000
R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000
FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0
Call Trace:
<TASK>
__mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline]
mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline]
__mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615
mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767
process_one_work+0x1e0/0x560 kernel/workqueue.c:3254
process_scheduled_works kernel/workqueue.c:3335 [inline]
worker_thread+0x3c7/0x640 kernel/workqueue.c:3416
kthread+0x121/0x170 kernel/kthread.c:388
ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
When fallback to TCP happens early on a client socket, snd_nxt
is not yet initialized and any incoming ack will copy such value
into snd_una. If the mptcp worker (dumbly) tries mptcp-level
re-injection after such ack, that would unconditionally trigger a send
buffer cleanup using 'bad' snd_una values.
We could easily disable re-injection for fallback sockets, but such
dumb behavior already helped catching a few subtle issues and a very
low to zero impact in practice.
Instead address the issue always initializing snd_nxt (and write_seq,
for consistency) at connect time. |
| In the Linux kernel, the following vulnerability has been resolved:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
syzkaller reported a warning [0] triggered while destroying immature
netns.
rpc_proc_register() was called in init_nfs_fs(), but its error
has been ignored since at least the initial commit 1da177e4c3f4
("Linux-2.6.12-rc2").
Recently, commit d47151b79e32 ("nfs: expose /proc/net/sunrpc/nfs
in net namespaces") converted the procfs to per-netns and made
the problem more visible.
Even when rpc_proc_register() fails, nfs_net_init() could succeed,
and thus nfs_net_exit() will be called while destroying the netns.
Then, remove_proc_entry() will be called for non-existing proc
directory and trigger the warning below.
Let's handle the error of rpc_proc_register() properly in nfs_net_init().
[0]:
name 'nfs'
WARNING: CPU: 1 PID: 1710 at fs/proc/generic.c:711 remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711
Modules linked in:
CPU: 1 PID: 1710 Comm: syz-executor.2 Not tainted 6.8.0-12822-gcd51db110a7e #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711
Code: 41 5d 41 5e c3 e8 85 09 b5 ff 48 c7 c7 88 58 64 86 e8 09 0e 71 02 e8 74 09 b5 ff 4c 89 e6 48 c7 c7 de 1b 80 84 e8 c5 ad 97 ff <0f> 0b eb b1 e8 5c 09 b5 ff 48 c7 c7 88 58 64 86 e8 e0 0d 71 02 eb
RSP: 0018:ffffc9000c6d7ce0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8880422b8b00 RCX: ffffffff8110503c
RDX: ffff888030652f00 RSI: ffffffff81105045 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff81bb62cb R12: ffffffff84807ffc
R13: ffff88804ad6fcc0 R14: ffffffff84807ffc R15: ffffffff85741ff8
FS: 00007f30cfba8640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff51afe8000 CR3: 000000005a60a005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
rpc_proc_unregister+0x64/0x70 net/sunrpc/stats.c:310
nfs_net_exit+0x1c/0x30 fs/nfs/inode.c:2438
ops_exit_list+0x62/0xb0 net/core/net_namespace.c:170
setup_net+0x46c/0x660 net/core/net_namespace.c:372
copy_net_ns+0x244/0x590 net/core/net_namespace.c:505
create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xae/0x160 kernel/nsproxy.c:228
ksys_unshare+0x342/0x760 kernel/fork.c:3322
__do_sys_unshare kernel/fork.c:3393 [inline]
__se_sys_unshare kernel/fork.c:3391 [inline]
__x64_sys_unshare+0x1f/0x30 kernel/fork.c:3391
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0x7f30d0febe5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007f30cfba7cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f30d0febe5d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000006c020600
RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 000000000000000b R14: 00007f30d104c530 R15: 0000000000000000
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
firewire: ohci: mask bus reset interrupts between ISR and bottom half
In the FireWire OHCI interrupt handler, if a bus reset interrupt has
occurred, mask bus reset interrupts until bus_reset_work has serviced and
cleared the interrupt.
Normally, we always leave bus reset interrupts masked. We infer the bus
reset from the self-ID interrupt that happens shortly thereafter. A
scenario where we unmask bus reset interrupts was introduced in 2008 in
a007bb857e0b26f5d8b73c2ff90782d9c0972620: If
OHCI_PARAM_DEBUG_BUSRESETS (8) is set in the debug parameter bitmask, we
will unmask bus reset interrupts so we can log them.
irq_handler logs the bus reset interrupt. However, we can't clear the bus
reset event flag in irq_handler, because we won't service the event until
later. irq_handler exits with the event flag still set. If the
corresponding interrupt is still unmasked, the first bus reset will
usually freeze the system due to irq_handler being called again each
time it exits. This freeze can be reproduced by loading firewire_ohci
with "modprobe firewire_ohci debug=-1" (to enable all debugging output).
Apparently there are also some cases where bus_reset_work will get called
soon enough to clear the event, and operation will continue normally.
This freeze was first reported a few months after a007bb85 was committed,
but until now it was never fixed. The debug level could safely be set
to -1 through sysfs after the module was loaded, but this would be
ineffectual in logging bus reset interrupts since they were only
unmasked during initialization.
irq_handler will now leave the event flag set but mask bus reset
interrupts, so irq_handler won't be called again and there will be no
freeze. If OHCI_PARAM_DEBUG_BUSRESETS is enabled, bus_reset_work will
unmask the interrupt after servicing the event, so future interrupts
will be caught as desired.
As a side effect to this change, OHCI_PARAM_DEBUG_BUSRESETS can now be
enabled through sysfs in addition to during initial module loading.
However, when enabled through sysfs, logging of bus reset interrupts will
be effective only starting with the second bus reset, after
bus_reset_work has executed. |
| In the Linux kernel, the following vulnerability has been resolved:
fs/9p: only translate RWX permissions for plain 9P2000
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u. |
| In the Linux kernel, the following vulnerability has been resolved:
ipvs: fix uninit-value for saddr in do_output_route4
syzbot reports for uninit-value for the saddr argument [1].
commit 4754957f04f5 ("ipvs: do not use random local source address for
tunnels") already implies that the input value of saddr
should be ignored but the code is still reading it which can prevent
to connect the route. Fix it by changing the argument to ret_saddr.
[1]
BUG: KMSAN: uninit-value in do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147
do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147
__ip_vs_get_out_rt+0x403/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:330
ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136
ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626
nf_hook include/linux/netfilter.h:269 [inline]
__ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118
ip_local_out net/ipv4/ip_output.c:127 [inline]
ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501
udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195
udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483
inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x267/0x380 net/socket.c:727
____sys_sendmsg+0x91b/0xda0 net/socket.c:2566
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620
__sys_sendmmsg+0x41d/0x880 net/socket.c:2702
__compat_sys_sendmmsg net/compat.c:360 [inline]
__do_compat_sys_sendmmsg net/compat.c:367 [inline]
__se_compat_sys_sendmmsg net/compat.c:364 [inline]
__ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364
ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306
do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4167 [inline]
slab_alloc_node mm/slub.c:4210 [inline]
__kmalloc_cache_noprof+0x8fa/0xe00 mm/slub.c:4367
kmalloc_noprof include/linux/slab.h:905 [inline]
ip_vs_dest_dst_alloc net/netfilter/ipvs/ip_vs_xmit.c:61 [inline]
__ip_vs_get_out_rt+0x35d/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:323
ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136
ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626
nf_hook include/linux/netfilter.h:269 [inline]
__ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118
ip_local_out net/ipv4/ip_output.c:127 [inline]
ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501
udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195
udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483
inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x267/0x380 net/socket.c:727
____sys_sendmsg+0x91b/0xda0 net/socket.c:2566
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620
__sys_sendmmsg+0x41d/0x880 net/socket.c:2702
__compat_sys_sendmmsg net/compat.c:360 [inline]
__do_compat_sys_sendmmsg net/compat.c:367 [inline]
__se_compat_sys_sendmmsg net/compat.c:364 [inline]
__ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364
ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306
do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
CPU: 0 UID: 0 PID: 22408 Comm: syz.4.5165 Not tainted 6.15.0-rc3-syzkaller-00019-gbc3372351d0c #0 PREEMPT(undef)
Hardware name: Google Google Compute Engi
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Scrub packet on bpf_redirect_peer
When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.
As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.
NETNS MARK IFACE TUPLE FUNC
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm4_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 gro_cells_receive
.active_extensions = (__u8)2,
[...]
4026533547 0 eth0 10.244.3.124:35473->10.244.2.158:53 skb_do_redirect
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv_core
.active_extensions = (__u8)2,
[...]
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 udp_queue_rcv_one_skb
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_policy_check
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 security_xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
.active_extensions = (__u8)2,
In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.
This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices except skb->mark and skb->tstamp are not zeroed. |
| In the Linux kernel, the following vulnerability has been resolved:
mm/huge_memory: fix dereferencing invalid pmd migration entry
When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check the PMD migration entry and return early. In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio. Since the PMD migration entry is locked, it
cannot be served as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on
upstream. |
| In the Linux kernel, the following vulnerability has been resolved:
qibfs: fix _another_ leak
failure to allocate inode => leaked dentry...
this one had been there since the initial merge; to be fair,
if we are that far OOM, the odds of failing at that particular
allocation are low... |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: wl1251: fix memory leak in wl1251_tx_work
The skb dequeued from tx_queue is lost when wl1251_ps_elp_wakeup fails
with a -ETIMEDOUT error. Fix that by queueing the skb back to tx_queue. |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: qcom: Fix sc7280 lpass potential buffer overflow
Case values introduced in commit
5f78e1fb7a3e ("ASoC: qcom: Add driver support for audioreach solution")
cause out of bounds access in arrays of sc7280 driver data (e.g. in case
of RX_CODEC_DMA_RX_0 in sc7280_snd_hw_params()).
Redefine LPASS_MAX_PORTS to consider the maximum possible port id for
q6dsp as sc7280 driver utilizes some of those values.
Found by Linux Verification Center (linuxtesting.org) with SVACE. |
| In the Linux kernel, the following vulnerability has been resolved:
Input: mtk-pmic-keys - fix possible null pointer dereference
In mtk_pmic_keys_probe, the regs parameter is only set if the button is
parsed in the device tree. However, on hardware where the button is left
floating, that node will most likely be removed not to enable that
input. In that case the code will try to dereference a null pointer.
Let's use the regs struct instead as it is defined for all supported
platforms. Note that it is ok setting the key reg even if that latter is
disabled as the interrupt won't be enabled anyway. |
| In the Linux kernel, the following vulnerability has been resolved:
iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_fifo
Prevent st_lsm6dsx_read_fifo from falling in an infinite loop in case
pattern_len is equal to zero and the device FIFO is not empty. |
| In the Linux kernel, the following vulnerability has been resolved:
iio: imu: st_lsm6dsx: fix possible lockup in st_lsm6dsx_read_tagged_fifo
Prevent st_lsm6dsx_read_tagged_fifo from falling in an infinite loop in
case pattern_len is equal to zero and the device FIFO is not empty. |
| In the Linux kernel, the following vulnerability has been resolved:
iio: light: opt3001: fix deadlock due to concurrent flag access
The threaded IRQ function in this driver is reading the flag twice: once to
lock a mutex and once to unlock it. Even though the code setting the flag
is designed to prevent it, there are subtle cases where the flag could be
true at the mutex_lock stage and false at the mutex_unlock stage. This
results in the mutex not being unlocked, resulting in a deadlock.
Fix it by making the opt3001_irq() code generally more robust, reading the
flag into a variable and using the variable value at both stages. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: typec: ucsi: displayport: Fix deadlock
This patch introduces the ucsi_con_mutex_lock / ucsi_con_mutex_unlock
functions to the UCSI driver. ucsi_con_mutex_lock ensures the connector
mutex is only locked if a connection is established and the partner pointer
is valid. This resolves a deadlock scenario where
ucsi_displayport_remove_partner holds con->mutex waiting for
dp_altmode_work to complete while dp_altmode_work attempts to acquire it. |
| In the Linux kernel, the following vulnerability has been resolved:
x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.
Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written. |
| In the Linux kernel, the following vulnerability has been resolved:
arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users
Support for eBPF programs loaded by unprivileged users is typically
disabled. This means only cBPF programs need to be mitigated for BHB.
In addition, only mitigate cBPF programs that were loaded by an
unprivileged user. Privileged users can also load the same program
via eBPF, making the mitigation pointless. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix memory leak in parse_lease_state()
The previous patch that added bounds check for create lease context
introduced a memory leak. When the bounds check fails, the function
returns NULL without freeing the previously allocated lease_ctx_info
structure.
This patch fixes the issue by adding kfree(lreq) before returning NULL
in both boundary check cases. |
| In the Linux kernel, the following vulnerability has been resolved:
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.
This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek. |
| In the Linux kernel, the following vulnerability has been resolved:
openvswitch: Fix unsafe attribute parsing in output_userspace()
This patch replaces the manual Netlink attribute iteration in
output_userspace() with nla_for_each_nested(), which ensures that only
well-formed attributes are processed. |