CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
In the Linux kernel, the following vulnerability has been resolved:
mlxsw: spectrum_acl_erp: Fix object nesting warning
ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
(A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
contain more ACLs (i.e., tc filters), but the number of masks in each
region (i.e., tc chain) is limited.
In order to mitigate the effects of the above limitation, the device
allows filters to share a single mask if their masks only differ in up
to 8 consecutive bits. For example, dst_ip/25 can be represented using
dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
number of masks being used (and therefore does not support mask
aggregation), but can contain a limited number of filters.
The driver uses the "objagg" library to perform the mask aggregation by
passing it objects that consist of the filter's mask and whether the
filter is to be inserted into the A-TCAM or the C-TCAM since filters in
different TCAMs cannot share a mask.
The set of created objects is dependent on the insertion order of the
filters and is not necessarily optimal. Therefore, the driver will
periodically ask the library to compute a more optimal set ("hints") by
looking at all the existing objects.
When the library asks the driver whether two objects can be aggregated
the driver only compares the provided masks and ignores the A-TCAM /
C-TCAM indication. This is the right thing to do since the goal is to
move as many filters as possible to the A-TCAM. The driver also forbids
two identical masks from being aggregated since this can only happen if
one was intentionally put in the C-TCAM to avoid a conflict in the
A-TCAM.
The above can result in the following set of hints:
H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
After getting the hints from the library the driver will start migrating
filters from one region to another while consulting the computed hints
and instructing the device to perform a lookup in both regions during
the transition.
Assuming a filter with mask X is being migrated into the A-TCAM in the
new region, the hints lookup will return H1. Since H2 is the parent of
H1, the library will try to find the object associated with it and
create it if necessary in which case another hints lookup (recursive)
will be performed. This hints lookup for {mask Y, A-TCAM} will either
return H2 or H3 since the driver passes the library an object comparison
function that ignores the A-TCAM / C-TCAM indication.
This can eventually lead to nested objects which are not supported by
the library [1].
Fix by removing the object comparison function from both the driver and
the library as the driver was the only user. That way the lookup will
only return exact matches.
I do not have a reliable reproducer that can reproduce the issue in a
timely manner, but before the fix the issue would reproduce in several
minutes and with the fix it does not reproduce in over an hour.
Note that the current usefulness of the hints is limited because they
include the C-TCAM indication and represent aggregation that cannot
actually happen. This will be addressed in net-next.
[1]
WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
Modules linked in:
CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
[...]
Call Trace:
<TASK>
__objagg_obj_get+0x2bb/0x580
objagg_obj_get+0xe/0x80
mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370 |
In the Linux kernel, the following vulnerability has been resolved:
perf: Fix event leak upon exit
When a task is scheduled out, pending sigtrap deliveries are deferred
to the target task upon resume to userspace via task_work.
However failures while adding an event's callback to the task_work
engine are ignored. And since the last call for events exit happen
after task work is eventually closed, there is a small window during
which pending sigtrap can be queued though ignored, leaking the event
refcount addition such as in the following scenario:
TASK A
-----
do_exit()
exit_task_work(tsk);
<IRQ>
perf_event_overflow()
event->pending_sigtrap = pending_id;
irq_work_queue(&event->pending_irq);
</IRQ>
=========> PREEMPTION: TASK A -> TASK B
event_sched_out()
event->pending_sigtrap = 0;
atomic_long_inc_not_zero(&event->refcount)
// FAILS: task work has exited
task_work_add(&event->pending_task)
[...]
<IRQ WORK>
perf_pending_irq()
// early return: event->oncpu = -1
</IRQ WORK>
[...]
=========> TASK B -> TASK A
perf_event_exit_task(tsk)
perf_event_exit_event()
free_event()
WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
// leak event due to unexpected refcount == 2
As a result the event is never released while the task exits.
Fix this with appropriate task_work_add()'s error handling. |
In the Linux kernel, the following vulnerability has been resolved:
perf: Fix event leak upon exec and file release
The perf pending task work is never waited upon the matching event
release. In the case of a child event, released via free_event()
directly, this can potentially result in a leaked event, such as in the
following scenario that doesn't even require a weak IRQ work
implementation to trigger:
schedule()
prepare_task_switch()
=======> <NMI>
perf_event_overflow()
event->pending_sigtrap = ...
irq_work_queue(&event->pending_irq)
<======= </NMI>
perf_event_task_sched_out()
event_sched_out()
event->pending_sigtrap = 0;
atomic_long_inc_not_zero(&event->refcount)
task_work_add(&event->pending_task)
finish_lock_switch()
=======> <IRQ>
perf_pending_irq()
//do nothing, rely on pending task work
<======= </IRQ>
begin_new_exec()
perf_event_exit_task()
perf_event_exit_event()
// If is child event
free_event()
WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
// event is leaked
Similar scenarios can also happen with perf_event_remove_on_exec() or
simply against concurrent perf_event_release().
Fix this with synchonizing against the possibly remaining pending task
work while freeing the event, just like is done with remaining pending
IRQ work. This means that the pending task callback neither need nor
should hold a reference to the event, preventing it from ever beeing
freed. |
In the Linux kernel, the following vulnerability has been resolved:
Input: uinput - reject requests with unreasonable number of slots
When exercising uinput interface syzkaller may try setting up device
with a really large number of slots, which causes memory allocation
failure in input_mt_init_slots(). While this allocation failure is
handled properly and request is rejected, it results in syzkaller
reports. Additionally, such request may put undue burden on the
system which will try to free a lot of memory for a bogus request.
Fix it by limiting allowed number of slots to 100. This can easily
be extended if we see devices that can track more than 100 contacts. |
Any project that parses untrusted Protocol Buffers data containing an arbitrary number of nested groups / series of SGROUP tags can corrupted by exceeding the stack limit i.e. StackOverflow. Parsing nested groups as unknown fields with DiscardUnknownFieldsParser or Java Protobuf Lite parser, or against Protobuf map fields, creates unbounded recursions that can be abused by an attacker. |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: bridge: confirm multicast packets before passing them up the stack
conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.
Example:
macvlan0
|
br0
/ \
ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing
an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
-> skb->_nfct now references a unconfirmed entry
2. skb is broad/mcast packet. bridge now passes clones out on each bridge
interface.
3. skb gets passed up the stack.
4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the
original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb->_nfct already confirmed (in
hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.
This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change. |
In the Linux kernel, the following vulnerability has been resolved:
i2c: core: Run atomic i2c xfer when !preemptible
Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
disabled. However, non-atomic i2c transfers require preemption (e.g. in
wait_for_completion() while waiting for the DMA).
panic() calls preempt_disable_notrace() before calling
emergency_restart(). Therefore, if an i2c device is used for the
restart, the xfer should be atomic. This avoids warnings like:
[ 12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
[ 12.676926] Voluntary context switch within RCU read-side critical section!
...
[ 12.742376] schedule_timeout from wait_for_completion_timeout+0x90/0x114
[ 12.749179] wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
...
[ 12.994527] atomic_notifier_call_chain from machine_restart+0x34/0x58
[ 13.001050] machine_restart from panic+0x2a8/0x32c
Use !preemptible() instead, which is basically the same check as
pre-v5.2. |
In the Linux kernel, the following vulnerability has been resolved:
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call trace:
__switch_to+0x98/0xe0
__schedule+0x6c4/0xf40
schedule+0xd8/0x1b4
schedule_timeout+0x474/0x560
wait_for_common+0x368/0x4e0
wait_for_completion+0x20/0x30
wait_for_completion+0x20/0x30
test_aead_vec_cfg+0xab4/0xd50
test_aead+0x144/0x1f0
alg_test_aead+0xd8/0x1e0
alg_test+0x634/0x890
cryptomgr_test+0x40/0x70
kthread+0x1e0/0x220
ret_from_fork+0x10/0x18
Kernel panic - not syncing: hung_task: blocked tasks
For padata_do_parallel, when the return err is 0 or -EBUSY, it will call
wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal
case, aead_request_complete() will be called in pcrypt_aead_serial and the
return err is 0 for padata_do_parallel. But, when pinst->flags is
PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it
won't call aead_request_complete(). Therefore, test_aead_vec_cfg will
hung at wait_for_completion(&wait->completion), which will cause
hungtask.
The problem comes as following:
(padata_do_parallel) |
rcu_read_lock_bh(); |
err = -EINVAL; | (padata_replace)
| pinst->flags |= PADATA_RESET;
err = -EBUSY |
if (pinst->flags & PADATA_RESET) |
rcu_read_unlock_bh() |
return err
In order to resolve the problem, we replace the return err -EBUSY with
-EAGAIN, which means parallel_data is changing, and the caller should call
it again.
v3:
remove retry and just change the return err.
v2:
introduce padata_try_do_parallel() in pcrypt_aead_encrypt and
pcrypt_aead_decrypt to solve the hungtask. |
In the Linux kernel, the following vulnerability has been resolved:
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22
merely refactored the code with regards to the dst refcount so the issue
was present even before 92f1655aa2b22. The bug was introduced in
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst. |
In the Linux kernel, the following vulnerability has been resolved:
atl1c: Work around the DMA RX overflow issue
This is based on alx driver commit 881d0327db37 ("net: alx: Work around
the DMA RX overflow issue").
The alx and atl1c drivers had RX overflow error which was why a custom
allocator was created to avoid certain addresses. The simpler workaround
then created for alx driver, but not for atl1c due to lack of tester.
Instead of using a custom allocator, check the allocated skb address and
use skb_reserve() to move away from problematic 0x...fc0 address.
Tested on AR8131 on Acer 4540. |
In the Linux kernel, the following vulnerability has been resolved:
misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice on suspend/resume
When not configured for wakeup lis3lv02d_i2c_suspend() will call
lis3lv02d_poweroff() even if the device has already been turned off
by the runtime-suspend handler and if configured for wakeup and
the device is runtime-suspended at this point then it is not turned
back on to serve as a wakeup source.
Before commit b1b9f7a49440 ("misc: lis3lv02d_i2c: Add missing setting
of the reg_ctrl callback"), lis3lv02d_poweroff() failed to disable
the regulators which as a side effect made calling poweroff() twice ok.
Now that poweroff() correctly disables the regulators, doing this twice
triggers a WARN() in the regulator core:
unbalanced disables for regulator-dummy
WARNING: CPU: 1 PID: 92 at drivers/regulator/core.c:2999 _regulator_disable
...
Fix lis3lv02d_i2c_suspend() to not call poweroff() a second time if
already runtime-suspended and add a poweron() call when necessary to
make wakeup work.
lis3lv02d_i2c_resume() has similar issues, with an added weirness that
it always powers on the device if it is runtime suspended, after which
the first runtime-resume will call poweron() again, causing the enabled
count for the regulator to increase by 1 every suspend/resume. These
unbalanced regulator_enable() calls cause the regulator to never
be turned off and trigger the following WARN() on driver unbind:
WARNING: CPU: 1 PID: 1724 at drivers/regulator/core.c:2396 _regulator_put
Fix this by making lis3lv02d_i2c_resume() mirror the new suspend(). |
In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: don't set the MFP flag for the GTK
The firmware doesn't need the MFP flag for the GTK, it can even make the
firmware crash. in case the AP is configured with: group cipher TKIP and
MFPC. We would send the GTK with cipher = TKIP and MFP which is of course
not possible. |
In the Linux kernel, the following vulnerability has been resolved:
md/md-bitmap: fix incorrect usage for sb_index
Commit d7038f951828 ("md-bitmap: don't use ->index for pages backing the
bitmap file") removed page->index from bitmap code, but left wrong code
logic for clustered-md. current code never set slot offset for cluster
nodes, will sometimes cause crash in clustered env.
Call trace (partly):
md_bitmap_file_set_bit+0x110/0x1d8 [md_mod]
md_bitmap_startwrite+0x13c/0x240 [md_mod]
raid1_make_request+0x6b0/0x1c08 [raid1]
md_handle_request+0x1dc/0x368 [md_mod]
md_submit_bio+0x80/0xf8 [md_mod]
__submit_bio+0x178/0x300
submit_bio_noacct_nocheck+0x11c/0x338
submit_bio_noacct+0x134/0x614
submit_bio+0x28/0xdc
submit_bh_wbc+0x130/0x1cc
submit_bh+0x1c/0x28 |
In the Linux kernel, the following vulnerability has been resolved:
dm-raid: really frozen sync_thread during suspend
1) commit f52f5c71f3d4 ("md: fix stopping sync thread") remove
MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
dm-raid relies on __md_stop_writes() to frozen sync_thread
indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
md_stop_writes(), and since stop_sync_thread() is only used for
dm-raid in this case, also move stop_sync_thread() to
md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
it only prevent new sync_thread to start, and it can't stop the
running sync thread; In order to frozen sync_thread, after seting the
flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
it as condition for md_stop_writes() in raid_postsuspend() doesn't
look correct. Consider that reentrant stop_sync_thread() do nothing,
always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
new sync_thread can start unexpected. Fix this by disallow
raid_message() to change sync_thread status during suspend.
Note that after commit f52f5c71f3d4 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches. |
In the Linux kernel, the following vulnerability has been resolved:
pmdomain: imx8mp-blk-ctrl: add missing loop break condition
Currently imx8mp_blk_ctrl_remove() will continue the for loop
until an out-of-bounds exception occurs.
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dev_pm_domain_detach+0x8/0x48
lr : imx8mp_blk_ctrl_shutdown+0x58/0x90
sp : ffffffc084f8bbf0
x29: ffffffc084f8bbf0 x28: ffffff80daf32ac0 x27: 0000000000000000
x26: ffffffc081658d78 x25: 0000000000000001 x24: ffffffc08201b028
x23: ffffff80d0db9490 x22: ffffffc082340a78 x21: 00000000000005b0
x20: ffffff80d19bc180 x19: 000000000000000a x18: ffffffffffffffff
x17: ffffffc080a39e08 x16: ffffffc080a39c98 x15: 4f435f464f006c72
x14: 0000000000000004 x13: ffffff80d0172110 x12: 0000000000000000
x11: ffffff80d0537740 x10: ffffff80d05376c0 x9 : ffffffc0808ed2d8
x8 : ffffffc084f8bab0 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffffff80d19b9420 x4 : fffffffe03466e60 x3 : 0000000080800077
x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
Call trace:
dev_pm_domain_detach+0x8/0x48
platform_shutdown+0x2c/0x48
device_shutdown+0x158/0x268
kernel_restart_prepare+0x40/0x58
kernel_kexec+0x58/0xe8
__do_sys_reboot+0x198/0x258
__arm64_sys_reboot+0x2c/0x40
invoke_syscall+0x5c/0x138
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x38/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
Code: 8128c2d0 ffffffc0 aa1e03e9 d503201f |
In the Linux kernel, the following vulnerability has been resolved:
io_uring: Fix release of pinned pages when __io_uaddr_map fails
Looking at the error path of __io_uaddr_map, if we fail after pinning
the pages for any reasons, ret will be set to -EINVAL and the error
handler won't properly release the pinned pages.
I didn't manage to trigger it without forcing a failure, but it can
happen in real life when memory is heavily fragmented. |
In the Linux kernel, the following vulnerability has been resolved:
drm/vmwgfx: Fix the lifetime of the bo cursor memory
The cleanup can be dispatched while the atomic update is still active,
which means that the memory acquired in the atomic update needs to
not be invalidated by the cleanup. The buffer objects in vmw_plane_state
instead of using the builtin map_and_cache were trying to handle
the lifetime of the mapped memory themselves, leading to crashes.
Use the map_and_cache instead of trying to manage the lifetime of the
buffer objects held by the vmw_plane_state.
Fixes kernel oops'es in IGT's kms_cursor_legacy forked-bo. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
Otherwise after the GTT bo is released, the GTT and gart space is freed
but amdgpu_ttm_backend_unbind will not clear the gart page table entry
and leave valid mapping entry pointing to the stale system page. Then
if GPU access the gart address mistakely, it will read undefined value
instead page fault, harder to debug and reproduce the real issue. |
Malicious code was inserted into the Nx (build system) package and several related plugins. The tampered package was published to the npm software registry, via a supply-chain attack. Affected versions contain code that scans the file system, collects credentials, and posts them to GitHub as a repo under user's accounts. |
A flaw was found in the live query subscription mechanism of the database engine. This vulnerability allows record or guest users to observe unauthorized records within the same table, bypassing access controls, via crafted LIVE SELECT subscriptions when other users alter or delete records. |