| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| A flaw has been found in mathurvishal CloudClassroom-PHP-Project up to 5dadec098bfbbf3300d60c3494db3fb95b66e7be. This impacts an unknown function of the file /postquerypublic.php of the component Post Query Details Page. This manipulation of the argument gnamex causes sql injection. The attack is possible to be carried out remotely. The exploit has been published and may be used. This product adopts a rolling release strategy to maintain continuous delivery. Therefore, version details for affected or updated releases cannot be specified. The vendor was contacted early about this disclosure but did not respond in any way. |
| MuPDF versions 1.23.0 through 1.27.0 contain a double-free vulnerability in fz_fill_pixmap_from_display_list() when an exception occurs during display list rendering. The function accepts a caller-owned fz_pixmap pointer but incorrectly drops the pixmap in its error handling path before rethrowing the exception. Callers (including the barcode decoding path in fz_decode_barcode_from_display_list) also drop the same pixmap in cleanup, resulting in a double-free that can corrupt the heap and crash the process. This issue affects applications that enable and use MuPDF barcode decoding and can be triggered by processing crafted input that causes a rendering-time error while decoding barcodes. |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: core: Wake up the error handler when final completions race against each other
The fragile ordering between marking commands completed or failed so
that the error handler only wakes when the last running command
completes or times out has race conditions. These race conditions can
cause the SCSI layer to fail to wake the error handler, leaving I/O
through the SCSI host stuck as the error state cannot advance.
First, there is an memory ordering issue within scsi_dec_host_busy().
The write which clears SCMD_STATE_INFLIGHT may be reordered with reads
counting in scsi_host_busy(). While the local CPU will see its own
write, reordering can allow other CPUs in scsi_dec_host_busy() or
scsi_eh_inc_host_failed() to see a raised busy count, causing no CPU to
see a host busy equal to the host_failed count.
This race condition can be prevented with a memory barrier on the error
path to force the write to be visible before counting host busy
commands.
Second, there is a general ordering issue with scsi_eh_inc_host_failed(). By
counting busy commands before incrementing host_failed, it can race with a
final command in scsi_dec_host_busy(), such that scsi_dec_host_busy() does
not see host_failed incremented but scsi_eh_inc_host_failed() counts busy
commands before SCMD_STATE_INFLIGHT is cleared by scsi_dec_host_busy(),
resulting in neither waking the error handler task.
This needs the call to scsi_host_busy() to be moved after host_failed is
incremented to close the race condition. |
| In the Linux kernel, the following vulnerability has been resolved:
can: usb_8dev: usb_8dev_read_bulk_callback(): fix URB memory leak
Fix similar memory leak as in commit 7352e1d5932a ("can: gs_usb:
gs_usb_receive_bulk_callback(): fix URB memory leak").
In usb_8dev_open() -> usb_8dev_start(), the URBs for USB-in transfers are
allocated, added to the priv->rx_submitted anchor and submitted. In the
complete callback usb_8dev_read_bulk_callback(), the URBs are processed and
resubmitted. In usb_8dev_close() -> unlink_all_urbs() the URBs are freed by
calling usb_kill_anchored_urbs(&priv->rx_submitted).
However, this does not take into account that the USB framework unanchors
the URB before the complete function is called. This means that once an
in-URB has been completed, it is no longer anchored and is ultimately not
released in usb_kill_anchored_urbs().
Fix the memory leak by anchoring the URB in the
usb_8dev_read_bulk_callback() to the priv->rx_submitted anchor. |
| In the Linux kernel, the following vulnerability has been resolved:
arm64/fpsimd: signal: Allocate SSVE storage when restoring ZA
The code to restore a ZA context doesn't attempt to allocate the task's
sve_state before setting TIF_SME. Consequently, restoring a ZA context
can place a task into an invalid state where TIF_SME is set but the
task's sve_state is NULL.
In legitimate but uncommon cases where the ZA signal context was NOT
created by the kernel in the context of the same task (e.g. if the task
is saved/restored with something like CRIU), we have no guarantee that
sve_state had been allocated previously. In these cases, userspace can
enter streaming mode without trapping while sve_state is NULL, causing a
later NULL pointer dereference when the kernel attempts to store the
register state:
| # ./sigreturn-za
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
| Mem abort info:
| ESR = 0x0000000096000046
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x06: level 2 translation fault
| Data abort info:
| ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
| CM = 0, WnR = 1, TnD = 0, TagAccess = 0
| GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| user pgtable: 4k pages, 52-bit VAs, pgdp=0000000101f47c00
| [0000000000000000] pgd=08000001021d8403, p4d=0800000102274403, pud=0800000102275403, pmd=0000000000000000
| Internal error: Oops: 0000000096000046 [#1] SMP
| Modules linked in:
| CPU: 0 UID: 0 PID: 153 Comm: sigreturn-za Not tainted 6.19.0-rc1 #1 PREEMPT
| Hardware name: linux,dummy-virt (DT)
| pstate: 214000c9 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : sve_save_state+0x4/0xf0
| lr : fpsimd_save_user_state+0xb0/0x1c0
| sp : ffff80008070bcc0
| x29: ffff80008070bcc0 x28: fff00000c1ca4c40 x27: 63cfa172fb5cf658
| x26: fff00000c1ca5228 x25: 0000000000000000 x24: 0000000000000000
| x23: 0000000000000000 x22: fff00000c1ca4c40 x21: fff00000c1ca4c40
| x20: 0000000000000020 x19: fff00000ff6900f0 x18: 0000000000000000
| x17: fff05e8e0311f000 x16: 0000000000000000 x15: 028fca8f3bdaf21c
| x14: 0000000000000212 x13: fff00000c0209f10 x12: 0000000000000020
| x11: 0000000000200b20 x10: 0000000000000000 x9 : fff00000ff69dcc0
| x8 : 00000000000003f2 x7 : 0000000000000001 x6 : fff00000c1ca5b48
| x5 : fff05e8e0311f000 x4 : 0000000008000000 x3 : 0000000000000000
| x2 : 0000000000000001 x1 : fff00000c1ca5970 x0 : 0000000000000440
| Call trace:
| sve_save_state+0x4/0xf0 (P)
| fpsimd_thread_switch+0x48/0x198
| __switch_to+0x20/0x1c0
| __schedule+0x36c/0xce0
| schedule+0x34/0x11c
| exit_to_user_mode_loop+0x124/0x188
| el0_interrupt+0xc8/0xd8
| __el0_irq_handler_common+0x18/0x24
| el0t_64_irq_handler+0x10/0x1c
| el0t_64_irq+0x198/0x19c
| Code: 54000040 d51b4408 d65f03c0 d503245f (e5bb5800)
| ---[ end trace 0000000000000000 ]---
Fix this by having restore_za_context() ensure that the task's sve_state
is allocated, matching what we do when taking an SME trap. Any live
SVE/SSVE state (which is restored earlier from a separate signal
context) must be preserved, and hence this is not zeroed. |
| In the Linux kernel, the following vulnerability has been resolved:
net/sched: qfq: Use cl_is_active to determine whether class is active in qfq_rm_from_ag
This is more of a preventive patch to make the code more consistent and
to prevent possible exploits that employ child qlen manipulations on qfq.
use cl_is_active instead of relying on the child qdisc's qlen to determine
class activation. |
| In the Linux kernel, the following vulnerability has been resolved:
ipvlan: Make the addrs_lock be per port
Make the addrs_lock be per port, not per ipvlan dev.
Initial code seems to be written in the assumption,
that any address change must occur under RTNL.
But it is not so for the case of IPv6. So
1) Introduce per-port addrs_lock.
2) It was needed to fix places where it was forgotten
to take lock (ipvlan_open/ipvlan_close)
This appears to be a very minor problem though.
Since it's highly unlikely that ipvlan_add_addr() will
be called on 2 CPU simultaneously. But nevertheless,
this could cause:
1) False-negative of ipvlan_addr_busy(): one interface
iterated through all port->ipvlans + ipvlan->addrs
under some ipvlan spinlock, and another added IP
under its own lock. Though this is only possible
for IPv6, since looks like only ipvlan_addr6_event() can be
called without rtnl_lock.
2) Race since ipvlan_ht_addr_add(port) is called under
different ipvlan->addrs_lock locks
This should not affect performance, since add/remove IP
is a rare situation and spinlock is not taken on fast
paths. |
| In the Linux kernel, the following vulnerability has been resolved:
arm64/fpsimd: signal: Fix restoration of SVE context
When SME is supported, Restoring SVE signal context can go wrong in a
few ways, including placing the task into an invalid state where the
kernel may read from out-of-bounds memory (and may potentially take a
fatal fault) and/or may kill the task with a SIGKILL.
(1) Restoring a context with SVE_SIG_FLAG_SM set can place the task into
an invalid state where SVCR.SM is set (and sve_state is non-NULL)
but TIF_SME is clear, consequently resuting in out-of-bounds memory
reads and/or killing the task with SIGKILL.
This can only occur in unusual (but legitimate) cases where the SVE
signal context has either been modified by userspace or was saved in
the context of another task (e.g. as with CRIU), as otherwise the
presence of an SVE signal context with SVE_SIG_FLAG_SM implies that
TIF_SME is already set.
While in this state, task_fpsimd_load() will NOT configure SMCR_ELx
(leaving some arbitrary value configured in hardware) before
restoring SVCR and attempting to restore the streaming mode SVE
registers from memory via sve_load_state(). As the value of
SMCR_ELx.LEN may be larger than the task's streaming SVE vector
length, this may read memory outside of the task's allocated
sve_state, reading unrelated data and/or triggering a fault.
While this can result in secrets being loaded into streaming SVE
registers, these values are never exposed. As TIF_SME is clear,
fpsimd_bind_task_to_cpu() will configure CPACR_ELx.SMEN to trap EL0
accesses to streaming mode SVE registers, so these cannot be
accessed directly at EL0. As fpsimd_save_user_state() verifies the
live vector length before saving (S)SVE state to memory, no secret
values can be saved back to memory (and hence cannot be observed via
ptrace, signals, etc).
When the live vector length doesn't match the expected vector length
for the task, fpsimd_save_user_state() will send a fatal SIGKILL
signal to the task. Hence the task may be killed after executing
userspace for some period of time.
(2) Restoring a context with SVE_SIG_FLAG_SM clear does not clear the
task's SVCR.SM. If SVCR.SM was set prior to restoring the context,
then the task will be left in streaming mode unexpectedly, and some
register state will be combined inconsistently, though the task will
be left in legitimate state from the kernel's PoV.
This can only occur in unusual (but legitimate) cases where ptrace
has been used to set SVCR.SM after entry to the sigreturn syscall,
as syscall entry clears SVCR.SM.
In these cases, the the provided SVE register data will be loaded
into the task's sve_state using the non-streaming SVE vector length
and the FPSIMD registers will be merged into this using the
streaming SVE vector length.
Fix (1) by setting TIF_SME when setting SVCR.SM. This also requires
ensuring that the task's sme_state has been allocated, but as this could
contain live ZA state, it should not be zeroed. Fix (2) by clearing
SVCR.SM when restoring a SVE signal context with SVE_SIG_FLAG_SM clear.
For consistency, I've pulled the manipulation of SVCR, TIF_SVE, TIF_SME,
and fp_type earlier, immediately after the allocation of
sve_state/sme_state, before the restore of the actual register state.
This makes it easier to ensure that these are always modified
consistently, even if a fault is taken while reading the register data
from the signal context. I do not expect any software to depend on the
exact state restored when a fault is taken while reading the context. |
| In the Linux kernel, the following vulnerability has been resolved:
leds: led-class: Only Add LED to leds_list when it is fully ready
Before this change the LED was added to leds_list before led_init_core()
gets called adding it the list before led_classdev.set_brightness_work gets
initialized.
This leaves a window where led_trigger_register() of a LED's default
trigger will call led_trigger_set() which calls led_set_brightness()
which in turn will end up queueing the *uninitialized*
led_classdev.set_brightness_work.
This race gets hit by the lenovo-thinkpad-t14s EC driver which registers
2 LEDs with a default trigger provided by snd_ctl_led.ko in quick
succession. The first led_classdev_register() causes an async modprobe of
snd_ctl_led to run and that async modprobe manages to exactly hit
the window where the second LED is on the leds_list without led_init_core()
being called for it, resulting in:
------------[ cut here ]------------
WARNING: CPU: 11 PID: 5608 at kernel/workqueue.c:4234 __flush_work+0x344/0x390
Hardware name: LENOVO 21N2S01F0B/21N2S01F0B, BIOS N42ET93W (2.23 ) 09/01/2025
...
Call trace:
__flush_work+0x344/0x390 (P)
flush_work+0x2c/0x50
led_trigger_set+0x1c8/0x340
led_trigger_register+0x17c/0x1c0
led_trigger_register_simple+0x84/0xe8
snd_ctl_led_init+0x40/0xf88 [snd_ctl_led]
do_one_initcall+0x5c/0x318
do_init_module+0x9c/0x2b8
load_module+0x7e0/0x998
Close the race window by moving the adding of the LED to leds_list to
after the led_init_core() call. |
| In the Linux kernel, the following vulnerability has been resolved:
bonding: limit BOND_MODE_8023AD to Ethernet devices
BOND_MODE_8023AD makes sense for ARPHRD_ETHER only.
syzbot reported:
BUG: KASAN: global-out-of-bounds in __hw_addr_create net/core/dev_addr_lists.c:63 [inline]
BUG: KASAN: global-out-of-bounds in __hw_addr_add_ex+0x25d/0x760 net/core/dev_addr_lists.c:118
Read of size 16 at addr ffffffff8bf94040 by task syz.1.3580/19497
CPU: 1 UID: 0 PID: 19497 Comm: syz.1.3580 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
__asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
__hw_addr_create net/core/dev_addr_lists.c:63 [inline]
__hw_addr_add_ex+0x25d/0x760 net/core/dev_addr_lists.c:118
__dev_mc_add net/core/dev_addr_lists.c:868 [inline]
dev_mc_add+0xa1/0x120 net/core/dev_addr_lists.c:886
bond_enslave+0x2b8b/0x3ac0 drivers/net/bonding/bond_main.c:2180
do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2963
do_setlink+0xcf0/0x41c0 net/core/rtnetlink.c:3165
rtnl_changelink net/core/rtnetlink.c:3776 [inline]
__rtnl_newlink net/core/rtnetlink.c:3935 [inline]
rtnl_newlink+0x161c/0x1c90 net/core/rtnetlink.c:4072
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
____sys_sendmsg+0x505/0x820 net/socket.c:2592
___sys_sendmsg+0x21f/0x2a0 net/socket.c:2646
__sys_sendmsg+0x164/0x220 net/socket.c:2678
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0x1dc/0x560 arch/x86/entry/syscall_32.c:307
do_fast_syscall_32+0x34/0x80 arch/x86/entry/syscall_32.c:332
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
</TASK>
The buggy address belongs to the variable:
lacpdu_mcast_addr+0x0/0x40 |
| In the Linux kernel, the following vulnerability has been resolved:
netrom: fix double-free in nr_route_frame()
In nr_route_frame(), old_skb is immediately freed without checking if
nr_neigh->ax25 pointer is NULL. Therefore, if nr_neigh->ax25 is NULL,
the caller function will free old_skb again, causing a double-free bug.
Therefore, to prevent this, we need to modify it to check whether
nr_neigh->ax25 is NULL before freeing old_skb. |
| In the Linux kernel, the following vulnerability has been resolved:
migrate: correct lock ordering for hugetlb file folios
Syzbot has found a deadlock (analyzed by Lance Yang):
1) Task (5749): Holds folio_lock, then tries to acquire i_mmap_rwsem(read lock).
2) Task (5754): Holds i_mmap_rwsem(write lock), then tries to acquire
folio_lock.
migrate_pages()
-> migrate_hugetlbs()
-> unmap_and_move_huge_page() <- Takes folio_lock!
-> remove_migration_ptes()
-> __rmap_walk_file()
-> i_mmap_lock_read() <- Waits for i_mmap_rwsem(read lock)!
hugetlbfs_fallocate()
-> hugetlbfs_punch_hole() <- Takes i_mmap_rwsem(write lock)!
-> hugetlbfs_zero_partial_page()
-> filemap_lock_hugetlb_folio()
-> filemap_lock_folio()
-> __filemap_get_folio <- Waits for folio_lock!
The migration path is the one taking locks in the wrong order according to
the documentation at the top of mm/rmap.c. So expand the scope of the
existing i_mmap_lock to cover the calls to remove_migration_ptes() too.
This is (mostly) how it used to be after commit c0d0381ade79. That was
removed by 336bf30eb765 for both file & anon hugetlb pages when it should
only have been removed for anon hugetlb pages. |
| In the Linux kernel, the following vulnerability has been resolved:
uacce: fix cdev handling in the cleanup path
When cdev_device_add fails, it internally releases the cdev memory,
and if cdev_device_del is then executed, it will cause a hang error.
To fix it, we check the return value of cdev_device_add() and clear
uacce->cdev to avoid calling cdev_device_del in the uacce_remove. |
| In the Linux kernel, the following vulnerability has been resolved:
gue: Fix skb memleak with inner IP protocol 0.
syzbot reported skb memleak below. [0]
The repro generated a GUE packet with its inner protocol 0.
gue_udp_recv() returns -guehdr->proto_ctype for "resubmit"
in ip_protocol_deliver_rcu(), but this only works with
non-zero protocol number.
Let's drop such packets.
Note that 0 is a valid number (IPv6 Hop-by-Hop Option).
I think it is not practical to encap HOPOPT in GUE, so once
someone starts to complain, we could pass down a resubmit
flag pointer to distinguish two zeros from the upper layer:
* no error
* resubmit HOPOPT
[0]
BUG: memory leak
unreferenced object 0xffff888109695a00 (size 240):
comm "syz.0.17", pid 6088, jiffies 4294943096
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 40 c2 10 81 88 ff ff 00 00 00 00 00 00 00 00 .@..............
backtrace (crc a84b336f):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4958 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x3b4/0x590 mm/slub.c:5270
__build_skb+0x23/0x60 net/core/skbuff.c:474
build_skb+0x20/0x190 net/core/skbuff.c:490
__tun_build_skb drivers/net/tun.c:1541 [inline]
tun_build_skb+0x4a1/0xa40 drivers/net/tun.c:1636
tun_get_user+0xc12/0x2030 drivers/net/tun.c:1770
tun_chr_write_iter+0x71/0x120 drivers/net/tun.c:1999
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x45d/0x710 fs/read_write.c:686
ksys_write+0xa7/0x170 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: smbd: fix dma_unmap_sg() nents
The dma_unmap_sg() functions should be called with the same nents as the
dma_map_sg(), not the value the map function returned. |
| In the Linux kernel, the following vulnerability has been resolved:
intel_th: fix device leak on output open()
Make sure to drop the reference taken when looking up the th device
during output device open() on errors and on close().
Note that a recent commit fixed the leak in a couple of open() error
paths but not all of them, and the reference is still leaking on
successful open(). |
| In the Linux kernel, the following vulnerability has been resolved:
slimbus: core: fix device reference leak on report present
Slimbus devices can be allocated dynamically upon reception of
report-present messages.
Make sure to drop the reference taken when looking up already registered
devices.
Note that this requires taking an extra reference in case the device has
not yet been registered and has to be allocated. |
| In the Linux kernel, the following vulnerability has been resolved:
ALSA: usb-audio: Fix use-after-free in snd_usb_mixer_free()
When snd_usb_create_mixer() fails, snd_usb_mixer_free() frees
mixer->id_elems but the controls already added to the card still
reference the freed memory. Later when snd_card_register() runs,
the OSS mixer layer calls their callbacks and hits a use-after-free read.
Call trace:
get_ctl_value+0x63f/0x820 sound/usb/mixer.c:411
get_min_max_with_quirks.isra.0+0x240/0x1f40 sound/usb/mixer.c:1241
mixer_ctl_feature_info+0x26b/0x490 sound/usb/mixer.c:1381
snd_mixer_oss_build_test+0x174/0x3a0 sound/core/oss/mixer_oss.c:887
...
snd_card_register+0x4ed/0x6d0 sound/core/init.c:923
usb_audio_probe+0x5ef/0x2a90 sound/usb/card.c:1025
Fix by calling snd_ctl_remove() for all mixer controls before freeing
id_elems. We save the next pointer first because snd_ctl_remove()
frees the current element. |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: xen: scsiback: Fix potential memory leak in scsiback_remove()
Memory allocated for struct vscsiblk_info in scsiback_probe() is not
freed in scsiback_remove() leading to potential memory leaks on remove,
as well as in the scsiback_probe() error paths. Fix that by freeing it
in scsiback_remove(). |
| In the Linux kernel, the following vulnerability has been resolved:
vsock/virtio: cap TX credit to local buffer size
The virtio transports derives its TX credit directly from peer_buf_alloc,
which is set from the remote endpoint's SO_VM_SOCKETS_BUFFER_SIZE value.
On the host side this means that the amount of data we are willing to
queue for a connection is scaled by a guest-chosen buffer size, rather
than the host's own vsock configuration. A malicious guest can advertise
a large buffer and read slowly, causing the host to allocate a
correspondingly large amount of sk_buff memory.
The same thing would happen in the guest with a malicious host, since
virtio transports share the same code base.
Introduce a small helper, virtio_transport_tx_buf_size(), that
returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume
peer_buf_alloc.
This ensures the effective TX window is bounded by both the peer's
advertised buffer and our own buf_alloc (already clamped to
buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote peer
cannot force the other to queue more data than allowed by its own
vsock settings.
On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
32 guest vsock connections advertising 2 GiB each and reading slowly
drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB; the system only
recovered after killing the QEMU process. That said, if QEMU memory is
limited with cgroups, the maximum memory used will be limited.
With this patch applied:
Before:
MemFree: ~61.6 GiB
Slab: ~142 MiB
SUnreclaim: ~117 MiB
After 32 high-credit connections:
MemFree: ~61.5 GiB
Slab: ~178 MiB
SUnreclaim: ~152 MiB
Only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the guest
remains responsive.
Compatibility with non-virtio transports:
- VMCI uses the AF_VSOCK buffer knobs to size its queue pairs per
socket based on the local vsk->buffer_* values; the remote side
cannot enlarge those queues beyond what the local endpoint
configured.
- Hyper-V's vsock transport uses fixed-size VMBus ring buffers and
an MTU bound; there is no peer-controlled credit field comparable
to peer_buf_alloc, and the remote endpoint cannot drive in-flight
kernel memory above those ring sizes.
- The loopback path reuses virtio_transport_common.c, so it
naturally follows the same semantics as the virtio transport.
This change is limited to virtio_transport_common.c and thus affects
virtio-vsock, vhost-vsock, and loopback, bringing them in line with the
"remote window intersected with local policy" behaviour that VMCI and
Hyper-V already effectively have.
[Stefano: small adjustments after changing the previous patch]
[Stefano: tweak the commit message] |