Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large PR: 384 changed files, 55k changed lines #2

Open
wants to merge 282 commits into
base: branch-v6.3-rc1
Choose a base branch
from

Conversation

fahhem
Copy link

@fahhem fahhem commented Apr 1, 2023

No description provided.

Eric Dumazet and others added 30 commits March 1, 2023 08:11
Once initial skb->head has been allocated from skb_small_head_cache,
we need to make sure to use the same strategy whenever skb->head
has to be re-allocated, as found by syzbot [1]

This means kmalloc_reserve() can not fallback from using
skb_small_head_cache to generic (power-of-two) kmem caches.

It seems that we probably want to rework things in the future,
to partially revert following patch, because we no longer use
ksize() for skb allocated in TX path.

2b88cba ("net: preserve skb_end_offset() in skb_unclone_keeptruesize()")

Ideally, TCP stack should never put payload in skb->head,
this effort has to be completed.

In the mean time, add a sanity check.

[1]
BUG: KASAN: invalid-free in slab_free mm/slub.c:3787 [inline]
BUG: KASAN: invalid-free in kmem_cache_free+0xee/0x5c0 mm/slub.c:3809
Free of addr ffff88806cdee800 by task syz-executor239/5189

CPU: 0 PID: 5189 Comm: syz-executor239 Not tainted 6.2.0-rc8-syzkaller-02400-gd1fabc68f8e0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x15e/0x45d mm/kasan/report.c:417
kasan_report_invalid_free+0x9b/0x1b0 mm/kasan/report.c:482
____kasan_slab_free+0x1a5/0x1c0 mm/kasan/common.c:216
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807
slab_free mm/slub.c:3787 [inline]
kmem_cache_free+0xee/0x5c0 mm/slub.c:3809
skb_kfree_head net/core/skbuff.c:857 [inline]
skb_kfree_head net/core/skbuff.c:853 [inline]
skb_free_head+0x16f/0x1a0 net/core/skbuff.c:872
skb_release_data+0x57a/0x820 net/core/skbuff.c:901
skb_release_all net/core/skbuff.c:966 [inline]
__kfree_skb+0x4f/0x70 net/core/skbuff.c:980
tcp_wmem_free_skb include/net/tcp.h:302 [inline]
tcp_rtx_queue_purge net/ipv4/tcp.c:3061 [inline]
tcp_write_queue_purge+0x617/0xcf0 net/ipv4/tcp.c:3074
tcp_v4_destroy_sock+0x125/0x810 net/ipv4/tcp_ipv4.c:2302
inet_csk_destroy_sock+0x19a/0x440 net/ipv4/inet_connection_sock.c:1195
__tcp_close+0xb96/0xf50 net/ipv4/tcp.c:3021
tcp_close+0x2d/0xc0 net/ipv4/tcp.c:3033
inet_release+0x132/0x270 net/ipv4/af_inet.c:426
__sock_release+0xcd/0x280 net/socket.c:651
sock_close+0x1c/0x20 net/socket.c:1393
__fput+0x27c/0xa90 fs/file_table.c:320
task_work_run+0x16f/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2511f546c3
Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
RSP: 002b:00007ffef0103d48 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f2511f546c3
RDX: 0000000000000978 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000003434
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffef0103d6c
R13: 00007ffef0103d80 R14: 00007ffef0103dc0 R15: 0000000000000003
</TASK>

Allocated by task 5189:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
____kasan_kmalloc mm/kasan/common.c:333 [inline]
__kasan_kmalloc+0xa5/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slab_common.c:968 [inline]
__kmalloc_node_track_caller+0x5b/0xc0 mm/slab_common.c:988
kmalloc_reserve+0xf1/0x230 net/core/skbuff.c:539
pskb_expand_head+0x237/0x1160 net/core/skbuff.c:1995
__skb_unclone_keeptruesize+0x93/0x220 net/core/skbuff.c:2094
skb_unclone_keeptruesize include/linux/skbuff.h:1910 [inline]
skb_prepare_for_shift net/core/skbuff.c:3804 [inline]
skb_shift+0xef8/0x1e20 net/core/skbuff.c:3877
tcp_skb_shift net/ipv4/tcp_input.c:1538 [inline]
tcp_shift_skb_data net/ipv4/tcp_input.c:1646 [inline]
tcp_sacktag_walk+0x93b/0x18a0 net/ipv4/tcp_input.c:1713
tcp_sacktag_write_queue+0x1599/0x31d0 net/ipv4/tcp_input.c:1974
tcp_ack+0x2e9f/0x5a10 net/ipv4/tcp_input.c:3847
tcp_rcv_established+0x667/0x2230 net/ipv4/tcp_input.c:6006
tcp_v4_do_rcv+0x670/0x9b0 net/ipv4/tcp_ipv4.c:1721
sk_backlog_rcv include/net/sock.h:1113 [inline]
__release_sock+0x133/0x3b0 net/core/sock.c:2921
release_sock+0x58/0x1b0 net/core/sock.c:3488
tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1485
inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825
sock_sendmsg_nosec net/socket.c:722 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:745
sock_write_iter+0x295/0x3d0 net/socket.c:1136
call_write_iter include/linux/fs.h:2189 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9ed/0xdd0 fs/read_write.c:584
ksys_write+0x1ec/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff88806cdee800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 0 bytes inside of
1024-byte region [ffff88806cdee800, ffff88806cdeec00)

The buggy address belongs to the physical page:
page:ffffea0001b37a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x6cde8
head:ffffea0001b37a00 order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441dc0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1f2a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC|__GFP_HARDWALL), pid 75, tgid 75 (kworker/u4:4), ts 96369578780, free_ts 26734162530
prep_new_page mm/page_alloc.c:2531 [inline]
get_page_from_freelist+0x119c/0x2ce0 mm/page_alloc.c:4283
__alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2287
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x350 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3193
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
__kmem_cache_alloc_node+0x1a4/0x430 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node_track_caller+0x4b/0xc0 mm/slab_common.c:988
kmalloc_reserve+0xf1/0x230 net/core/skbuff.c:539
__alloc_skb+0x129/0x330 net/core/skbuff.c:608
__netdev_alloc_skb+0x74/0x410 net/core/skbuff.c:672
__netdev_alloc_skb_ip_align include/linux/skbuff.h:3203 [inline]
netdev_alloc_skb_ip_align include/linux/skbuff.h:3213 [inline]
batadv_iv_ogm_aggregate_new+0x106/0x4e0 net/batman-adv/bat_iv_ogm.c:558
batadv_iv_ogm_queue_add net/batman-adv/bat_iv_ogm.c:670 [inline]
batadv_iv_ogm_schedule_buff+0xe6b/0x1450 net/batman-adv/bat_iv_ogm.c:849
batadv_iv_ogm_schedule net/batman-adv/bat_iv_ogm.c:868 [inline]
batadv_iv_ogm_schedule net/batman-adv/bat_iv_ogm.c:861 [inline]
batadv_iv_send_outstanding_bat_ogm_packet+0x744/0x910 net/batman-adv/bat_iv_ogm.c:1712
process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
worker_thread+0x669/0x1090 kernel/workqueue.c:2436
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1446 [inline]
free_pcp_prepare+0x66a/0xc20 mm/page_alloc.c:1496
free_unref_page_prepare mm/page_alloc.c:3369 [inline]
free_unref_page+0x1d/0x490 mm/page_alloc.c:3464
free_contig_range+0xb5/0x180 mm/page_alloc.c:9488
destroy_args+0xa8/0x64c mm/debug_vm_pgtable.c:998
debug_vm_pgtable+0x28de/0x296f mm/debug_vm_pgtable.c:1318
do_one_initcall+0x141/0x790 init/main.c:1306
do_initcall_level init/main.c:1379 [inline]
do_initcalls init/main.c:1395 [inline]
do_basic_setup init/main.c:1414 [inline]
kernel_init_freeable+0x6f9/0x782 init/main.c:1634
kernel_init+0x1e/0x1d0 init/main.c:1522
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

Memory state around the buggy address:
ffff88806cdee700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88806cdee780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88806cdee800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
ffff88806cdee880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fixes: bf9f1ba ("net: add dedicated kmem_cache for typical/small skb->head")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Smatch reports that 'ci' can be used uninitialized.
The current code ignores errno coming from tcf_idr_check_alloc, which
will lead to the incorrect usage of 'ci'. Handle the errno as it should.

Fixes: 288864e ("net/sched: act_connmark: transition to percpu stats and rcu")
Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
ila_xlat_nl_cmd_get_mapping() generates an empty skb,
triggerring a recent sanity check [1].

Instead, return an error code, so that user space
can get it.

[1]
skb_assert_len
WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 skb_assert_len include/linux/skbuff.h:2527 [inline]
WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
Modules linked in:
CPU: 0 PID: 5923 Comm: syz-executor269 Not tainted 6.2.0-syzkaller-18300-g2ebd1fbb946d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : skb_assert_len include/linux/skbuff.h:2527 [inline]
pc : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
lr : skb_assert_len include/linux/skbuff.h:2527 [inline]
lr : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
sp : ffff80001e0d6c40
x29: ffff80001e0d6e60 x28: dfff800000000000 x27: ffff0000c86328c0
x26: dfff800000000000 x25: ffff0000c8632990 x24: ffff0000c8632a00
x23: 0000000000000000 x22: 1fffe000190c6542 x21: ffff0000c8632a10
x20: ffff0000c8632a00 x19: ffff80001856e000 x18: ffff80001e0d5fc0
x17: 0000000000000000 x16: ffff80001235d16c x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001
x11: ff80800008353a30 x10: 0000000000000000 x9 : 21567eaf25bfb600
x8 : 21567eaf25bfb600 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff80001e0d6558 x4 : ffff800015c74760 x3 : ffff800008596744
x2 : 0000000000000001 x1 : 0000000100000000 x0 : 000000000000000e
Call trace:
skb_assert_len include/linux/skbuff.h:2527 [inline]
__dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
dev_queue_xmit include/linux/netdevice.h:3033 [inline]
__netlink_deliver_tap_skb net/netlink/af_netlink.c:307 [inline]
__netlink_deliver_tap+0x45c/0x6f8 net/netlink/af_netlink.c:325
netlink_deliver_tap+0xf4/0x174 net/netlink/af_netlink.c:338
__netlink_sendskb net/netlink/af_netlink.c:1283 [inline]
netlink_sendskb+0x6c/0x154 net/netlink/af_netlink.c:1292
netlink_unicast+0x334/0x8d4 net/netlink/af_netlink.c:1380
nlmsg_unicast include/net/netlink.h:1099 [inline]
genlmsg_unicast include/net/genetlink.h:433 [inline]
genlmsg_reply include/net/genetlink.h:443 [inline]
ila_xlat_nl_cmd_get_mapping+0x620/0x7d0 net/ipv6/ila/ila_xlat.c:493
genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x938/0xc1c net/netlink/genetlink.c:1065
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
genl_rcv+0x38/0x50 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0x558/0x844 net/socket.c:2479
___sys_sendmsg net/socket.c:2533 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2562
__do_sys_sendmsg net/socket.c:2571 [inline]
__se_sys_sendmsg net/socket.c:2569 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2569
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
irq event stamp: 136484
hardirqs last enabled at (136483): [<ffff800008350244>] __up_console_sem+0x60/0xb4 kernel/printk/printk.c:345
hardirqs last disabled at (136484): [<ffff800012358d60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:405
softirqs last enabled at (136418): [<ffff800008020ea8>] softirq_handle_end kernel/softirq.c:414 [inline]
softirqs last enabled at (136418): [<ffff800008020ea8>] __do_softirq+0xd4c/0xfa4 kernel/softirq.c:600
softirqs last disabled at (136371): [<ffff80000802b4a4>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80
---[ end trace 0000000000000000 ]---
skb len=0 headroom=0 headlen=0 tailroom=192
mac=(0,0) net=(0,-1) trans=-1
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0010 pkttype=6 iif=0
dev name=nlmon0 feat=0x0000000000005861

Fixes: 7f00fea ("ila: Add generic ILA translation facility")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The two "goto errout;" paths in fl_change() became wrong
after cited commit.

Indeed we only must not call __fl_put() until the net pointer
has been set in tcf_exts_init_ex()

This is a minimal fix. We might in the future validate TCA_FLOWER_FLAGS
before we allocate @fnew.

BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:72 [inline]
BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
BUG: KASAN: null-ptr-deref in refcount_read include/linux/refcount.h:147 [inline]
BUG: KASAN: null-ptr-deref in __refcount_add_not_zero include/linux/refcount.h:152 [inline]
BUG: KASAN: null-ptr-deref in __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
BUG: KASAN: null-ptr-deref in refcount_inc_not_zero include/linux/refcount.h:245 [inline]
BUG: KASAN: null-ptr-deref in maybe_get_net include/net/net_namespace.h:269 [inline]
BUG: KASAN: null-ptr-deref in tcf_exts_get_net include/net/pkt_cls.h:260 [inline]
BUG: KASAN: null-ptr-deref in __fl_put net/sched/cls_flower.c:513 [inline]
BUG: KASAN: null-ptr-deref in __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508
Read of size 4 at addr 000000000000014c by task syz-executor548/5082

CPU: 0 PID: 5082 Comm: syz-executor548 Not tainted 6.2.0-syzkaller-05251-g5b7c4cabbb65 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_report mm/kasan/report.c:420 [inline]
kasan_report+0xec/0x130 mm/kasan/report.c:517
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x141/0x190 mm/kasan/generic.c:189
instrument_atomic_read include/linux/instrumented.h:72 [inline]
atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
__refcount_add_not_zero include/linux/refcount.h:152 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
maybe_get_net include/net/net_namespace.h:269 [inline]
tcf_exts_get_net include/net/pkt_cls.h:260 [inline]
__fl_put net/sched/cls_flower.c:513 [inline]
__fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508
fl_change+0x101b/0x4ab0 net/sched/cls_flower.c:2341
tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:722 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:745
____sys_sendmsg+0x334/0x900 net/socket.c:2504
___sys_sendmsg+0x110/0x1b0 net/socket.c:2558
__sys_sendmmsg+0x18f/0x460 net/socket.c:2644
__do_sys_sendmmsg net/socket.c:2673 [inline]
__se_sys_sendmmsg net/socket.c:2670 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2670

Fixes: 08a0063 ("net/sched: flower: Move filter handle initialization earlier")
Reported-by: [email protected]
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paul Blakey <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
When the police was removed from the port, then it was trying to
remove the police from the police id and not from the actual
police index.
The police id represents the id of the police and police index
represents the position in HW where the police is situated.
The port police id can be any number while the port police index
is a number based on the port chip port.
Fix this by deleting the police from HW that is situated at the
police index and not police id.

Fixes: 5390334 ("net: lan966x: Add port police support using tc-matchall")
Signed-off-by: Horatiu Vultur <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
… the client

The test_local_dnat_portonly() function initiates the client-side as
soon as it sets the listening side to the background. This could lead to
a race condition where the server may not be ready to listen. To ensure
that the server-side is up and running before initiating the
client-side, a delay is introduced to the test_local_dnat_portonly()
function.

Before the fix:
  # ./nft_nat.sh
  PASS: netns routing/connectivity: ns0-rthlYrBU can reach ns1-rthlYrBU and ns2-rthlYrBU
  PASS: ping to ns1-rthlYrBU was ip NATted to ns2-rthlYrBU
  PASS: ping to ns1-rthlYrBU OK after ip nat output chain flush
  PASS: ipv6 ping to ns1-rthlYrBU was ip6 NATted to ns2-rthlYrBU
  2023/02/27 04:11:03 socat[6055] E connect(5, AF=2 10.0.1.99:2000, 16): Connection refused
  ERROR: inet port rewrite

After the fix:
  # ./nft_nat.sh
  PASS: netns routing/connectivity: ns0-9sPJV6JJ can reach ns1-9sPJV6JJ and ns2-9sPJV6JJ
  PASS: ping to ns1-9sPJV6JJ was ip NATted to ns2-9sPJV6JJ
  PASS: ping to ns1-9sPJV6JJ OK after ip nat output chain flush
  PASS: ipv6 ping to ns1-9sPJV6JJ was ip6 NATted to ns2-9sPJV6JJ
  PASS: inet port rewrite without l3 address

Fixes: 282e5f8 ("netfilter: nat: really support inet nat without l3 address")
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
If the ruleset contains last timestamps, restore them accordingly.
Otherwise, listing after restoration shows never used items.

Fixes: 33a24de ("netfilter: nft_last: move stateful fields out of expression data")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
If the ruleset contains consumed quota, restore them accordingly.
Otherwise, listing after restoration shows never used items.

Restore the user-defined quota and flags too.

Fixes: ed0a0c6 ("netfilter: nft_quota: move stateful fields out of expression data")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Make it possible to see the distribution of size classes for block
groups. Helpful for testing and debugging the allocator w.r.t. to size
classes.

The new stats can be found at the path:

  /sys/fs/btrfs/<FSID>/allocation/<bg-type>/size_class

but they will only be non-zero for bg-type = data.

Signed-off-by: Boris Burkov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…) and do_tls_setsockopt_conf()

ctx->crypto_send.info is not protected by lock_sock in
do_tls_getsockopt_conf(). A race condition between do_tls_getsockopt_conf()
and error paths of do_tls_setsockopt_conf() may lead to a use-after-free
or null-deref.

More discussion:  https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/

Fixes: 3c4d755 ("tls: kernel TLS support")
Signed-off-by: Hangyu Hua <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
syzbot sent a hung task report and Eric explains that adversarial
receiver may keep RWIN at 0 for a long time, so we are not guaranteed
to make forward progress. Thread which took tx_lock and went to sleep
may not release tx_lock for hours. Use interruptible sleep where
possible and reschedule the work if it can't take the lock.

Testing: existing selftest passes

Reported-by: [email protected]
Fixes: 79ffe60 ("net/tls: add a TX lock")
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected] # wait 4 weeks
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix bogus error report in selftests/netfilter/nft_nat.sh,
   from Hangbin Liu.

2) Initialize last and quota expressions from template when
   expr_ops::clone is called, otherwise, states are not restored
   accordingly when loading a dynamic set with elements using
   these two expressions.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_quota: copy content when cloning expression
  netfilter: nft_last: copy content when cloning expression
  selftests: nft_nat: ensuring the listening side is up before starting the client
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Florian reported a regression and sent a patch with the following
changelog:

<quote>
 There is a noticeable tcp performance regression (loopback or cross-netns),
 seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.

 With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
 memory pressure happen much more often. For TCP indirect calls are
 used.

 We can't remove the if-set-return short-circuit check in
 tcp_enter_memory_pressure because there are callers other than
 sk_enter_memory_pressure.  Doing a check in the sk wrapper too
 reduces the indirect calls enough to recover some performance.

 Before,
 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver

 After:
 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver

 "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
</quote>

It seems we forgot to upstream this indirect call mitigation we
had for years, lets do this instead.

[edumazet] - It seems we forgot to upstream this indirect call
             mitigation we had for years, let's do this instead.
           - Changed to INDIRECT_CALL_INET_1() to avoid bots reports.

Fixes: 4890b68 ("net: keep sk->sk_forward_alloc as small as possible")
Reported-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/T/
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
This patch fixes a buffer overflow access of skb->data if
ieee802154_hdr_peek_addrs() fails.

Reported-by: lianhui tang <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>
Avoid crashing the machine by checking
info->attrs[NL802154_ATTR_SCAN_TYPE] presence before de-referencing it,
which was the primary intend of the blamed patch.

Reported-by: Sanan Hasanov <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Fixes: a0b6106 ("ieee802154: Convert scan error messages to extack")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>
To pick the changes in:

  8c29f01 ("x86/sev: Add SEV-SNP guest feature negotiation support")

That triggers:

  CC      /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
  CC      /tmp/build/perf-tools/util/header.o
  LD      /tmp/build/perf-tools/arch/x86/util/perf-in.o
  LD      /tmp/build/perf-tools/arch/x86/perf-in.o
  LD      /tmp/build/perf-tools/arch/perf-in.o
  LD      /tmp/build/perf-tools/util/perf-in.o
  LD      /tmp/build/perf-tools/perf-in.o
  LINK    /tmp/build/perf-tools/perf

But this time causes no changes in tooling results, as the introduced
SVM_VMGEXIT_TERM_REQUEST exit reason wasn't added to SVM_EXIT_REASONS,
that is used in kvm-stat.c.

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h

Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nikunj A Dadhania <[email protected]>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
When creating counters with initial delay configured, the enable_on_exec
field is not set. So we need to enable the counters later. The problem
is, when a workload is specified the target__none() is true. So we also
need to check stat_config.initial_delay.

In this change, we add a new field 'initial_delay' for struct target
which could be shared by other subcommands. And define
target__enable_on_exec() which returns whether enable_on_exec should be
set on normal cases.

Before this fix the event is not counted:

  $ ./perf stat -e instructions -D 100 sleep 2
  Events disabled
  Events enabled

   Performance counter stats for 'sleep 2':

       <not counted>      instructions

         1.901661124 seconds time elapsed

         0.001602000 seconds user
         0.000000000 seconds sys

After fix it works:

  $ ./perf stat -e instructions -D 100 sleep 2
  Events disabled
  Events enabled

   Performance counter stats for 'sleep 2':

             404,214      instructions

         1.901743475 seconds time elapsed

         0.001617000 seconds user
         0.000000000 seconds sys

Fixes: c587e77 ("perf stat: Do not delay the workload with --delay")
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Hui Wang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
…ters from the MAC driver

Move the LAN7800 internal phy (phy ID  0x0007c132) specific register
accesses to the phy driver (microchip.c).

Fix the error reported by Enguerrand de Ribaucourt in December 2022,
"Some operations during the cable switch workaround modify the register
LAN88XX_INT_MASK of the PHY. However, this register is specific to the
LAN8835 PHY. For instance, if a DP8322I PHY is connected to the LAN7801,
that register (0x19), corresponds to the LED and MAC address
configuration, resulting in unapropriate behavior."

I did not test with the DP8322I PHY, but I tested with an EVB-LAN7800
with the internal PHY.

Fixes: 14437e3 ("lan78xx: workaround of forced 100 Full/Half duplex mode error")
Signed-off-by: Yuiko Oshino <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
syzbot reported use-after-free in cfusbl_device_notify() [1].  This
causes a stack trace like below:

BUG: KASAN: use-after-free in cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
Read of size 8 at addr ffff88807ac4e6f0 by task kworker/u4:6/1214

CPU: 0 PID: 1214 Comm: kworker/u4:6 Not tainted 5.19.0-rc3-syzkaller-00146-g92f20ff72066 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
 print_report mm/kasan/report.c:429 [inline]
 kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
 cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
 notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1945
 call_netdevice_notifiers_extack net/core/dev.c:1983 [inline]
 call_netdevice_notifiers net/core/dev.c:1997 [inline]
 netdev_wait_allrefs_any net/core/dev.c:10227 [inline]
 netdev_run_todo+0xbc0/0x10f0 net/core/dev.c:10341
 default_device_exit_batch+0x44e/0x590 net/core/dev.c:11334
 ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
 </TASK>

When unregistering a net device, unregister_netdevice_many_notify()
sets the device's reg_state to NETREG_UNREGISTERING, calls notifiers
with NETDEV_UNREGISTER, and adds the device to the todo list.

Later on, devices in the todo list are processed by netdev_run_todo().
netdev_run_todo() waits devices' reference count become 1 while
rebdoadcasting NETDEV_UNREGISTER notification.

When cfusbl_device_notify() is called with NETDEV_UNREGISTER multiple
times, the parent device might be freed.  This could cause UAF.
Processing NETDEV_UNREGISTER multiple times also causes inbalance of
reference count for the module.

This patch fixes the issue by accepting only first NETDEV_UNREGISTER
notification.

Fixes: 7ad65bf ("caif: Add support for CAIF over CDC NCM USB interface")
CC: [email protected] <[email protected]>
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=c3bfd8e2450adab3bffe4d80821fbbced600407f [1]
Signed-off-by: Shigeru Yoshida <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…/scm/linux/kernel/git/wpan/wpan

Stefan Schmidt says:

====================
ieee802154 for net 2023-03-02

Two small fixes this time.

Alexander Aring fixed a potential negative array access in the ca8210
driver.

Miquel Raynal fixed a crash that could have been triggered through
the extended netlink API for 802154. This only came in this merge window.
Found by syzkaller.

* tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
  ieee802154: Prevent user from crashing the host
  ca8210: fix mac_len negative array access
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
To avoid having to repeat the entire definition of an attribute
(including the value) use the Attr object from the original set.
In fact this is already the documented expectation.

Fixes: be5bea1 ("net: add basic C code generators for Netlink")
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Pretty much all families use value: 1 or reserve as unspec
the first entry in attribute set and the first operation.
Make this the default. Update documentation (the doc for
values of operations just refers back to doc for attrs
so updating only attrs).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Now that the codegen rules had been changed we can update
the specs to reflect the new default.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski says:

====================
tools: ynl: fix subset use and change default value for attrs/ops

Fix a problem in subsetting, which will become apparent when
the devlink family comes after the merge window. Even tho none
of the existing families need this, we don't want someone to
get "inspired" by the current, incorrect code when using specs
in other languages.

Change the default value for the first attr/op. This is a slight
behavior change so needs to go in now. The diffstat of the last
patch should serve as the clearest justification there..
====================

Signed-off-by: David S. Miller <[email protected]>
ice_get_module_eeprom() is broken since commit e9c9692 ("ice:
Reimplement module reads used by ethtool") In this refactor,
ice_get_module_eeprom() reads the eeprom in blocks of size 8.
But the condition that should protect the buffer overflow
ignores the last block. The last block always contains zeros.

Bug uncovered by ethtool upstream commit 9538f384b535
("netlink: eeprom: Defer page requests to individual parsers")
After this commit, ethtool reads a block with length = 1;
to read the SFF-8024 identifier value.

unpatched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset          Values
------          ------
0x0090:         00 00 00 00 00 00 00 00
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset          Values
------          ------
0x0090:         00 00 01 a0 4d 65 6c 6c 00 00 00 00
$

$ ethtool -m enp65s0f0np0
Offset          Values
------          ------
0x0000:         11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
0x0070:         00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00

patched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset          Values
------          ------
0x0090:         00 00 01 a0 4d 65 6c 6c
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset          Values
------          ------
0x0090:         00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
$ ethtool -m enp65s0f0np0
    Identifier                                : 0x11 (QSFP28)
    Extended identifier                       : 0x00
    Extended identifier description           : 1.5W max. Power consumption
    Extended identifier description           : No CDR in TX, No CDR in RX
    Extended identifier description           : High Power Class (> 3.5 W) not enabled
    Connector                                 : 0x23 (No separable connector)
    Transceiver codes                         : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    Transceiver type                          : 40G Ethernet: 40G Base-CR4
    Transceiver type                          : 25G Ethernet: 25G Base-CR CA-N
    Encoding                                  : 0x05 (64B/66B)
    BR, Nominal                               : 25500Mbps
    Rate identifier                           : 0x00
    Length (SMF,km)                           : 0km
    Length (OM3 50um)                         : 0m
    Length (OM2 50um)                         : 0m
    Length (OM1 62.5um)                       : 0m
    Length (Copper or Active cable)           : 1m
    Transmitter technology                    : 0xa0 (Copper cable unequalized)
    Attenuation at 2.5GHz                     : 4db
    Attenuation at 5.0GHz                     : 5db
    Attenuation at 7.0GHz                     : 7db
    Attenuation at 12.9GHz                    : 10db
    ........
    ....

Fixes: e9c9692 ("ice: Reimplement module reads used by ethtool")
Signed-off-by: Petr Oros <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The csum flag of IPsec packet are set repeatedly. Therefore, the csum
flag set of IPsec and non-IPsec packet need to be distinguished.

As the ipv6 header does not have a csum field, so l3-csum flag is not
required to be set for ipv6 case.

L4-csum flag include the tcp csum flag and udp csum flag, we shouldn't
set the udp and tcp csum flag at the same time for one packet, should
set l4-csum flag according to the transport layer is tcp or udp.

Fixes: 57f273a ("nfp: add framework to support ipsec offloading")
Signed-off-by: Huanhuan Wang <[email protected]>
Reviewed-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The csum flag of IPsec packet are set repeatedly. Therefore, the csum
flag set of IPsec and non-IPsec packet need to be distinguished.

As the ipv6 header does not have a csum field, so l3-csum flag is not
required to be set for ipv6 case.

Fixes: 436396f ("nfp: support IPsec offloading for NFP3800")
Signed-off-by: Huanhuan Wang <[email protected]>
Reviewed-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
When esp-tx-csum-offload is set to on, the protocol stack shouldn't
calculate the IPsec offload packet's csum, but it does. Because the
callback `.ndo_features_check` incorrectly masked NETIF_F_CSUM_MASK bit.

Fixes: 57f273a ("nfp: add framework to support ipsec offloading")
Signed-off-by: Huanhuan Wang <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Simon Horman says:

====================
nfp: fix incorrect IPsec checksum handling

this short series resolves two problems with IPsec checksum handling
in the nfp driver.

* PATCH 1/3, 2/3: Correct setting of checksum flags.
  One patch for each of the nfd3 and nfdk datapaths.

* Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK
  so that the stack does not unecessarily calculate csums for
  IPsec offload packets.
====================

Signed-off-by: David S. Miller <[email protected]>
Add signature for the Logitech MX Master 3S mouse over Bluetooth.

Signed-off-by: Rafał Szalecki <[email protected]>
Reviewed-by: Bastien Nocera <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
torvalds and others added 26 commits March 10, 2023 09:19
…inux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - RISC-V architecture-specific ELF attributes have been disabled in the
   kernel builds

 - A fix for a locking failure while during errata patching that
   manifests on SiFive-based systems

 - A fix for a KASAN failure during stack unwinding

 - A fix for some lockdep failures during text patching

* tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Don't check text_mutex during stop_machine
  riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
  RISC-V: fix taking the text_mutex twice during sifive errata patching
  RISC-V: Stop emitting attributes
…nel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
 "Fix a recently introduced deadlock in the int340x thermal control
  driver (Srinivas Pandruvada)"

* tag 'thermal-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: processor_thermal: Fix deadlock
…it/viro/vfs

Pull misc fixes from Al Viro:
 "pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)

  The fs/file.c one is a genuine missing speculation barrier in
  pick_file() (reachable e.g. via close(2)). The alpha one is strictly
  speaking not a bug fix, but only because confusion between
  preempt_enable() and preempt_disable() is harmless on architecture
  without CONFIG_PREEMPT.

  Looks like alpha.git picked the wrong version of patch - that braino
  used to be there in early versions, but it had been fixed quite a
  while ago..."

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: prevent out-of-bounds array speculation when closing a file descriptor
  alpha: fix lazy-FPU mis(merged/applied/whatnot)
…/git/viro/vfs

Pull put_and_unmap_page() helper from Al Viro:
 "kmap_local_page() conversions in local filesystems keep running into
  kunmap_local_page()+put_page() combinations.  We can keep inventing
  names for identical inline helpers, but it's getting rather
  inconvenient. I've added a trivial helper to linux/highmem.h instead.

  I would've held that back until the merge window, if not for the mess
  it causes in tree topology - I've several branches merging from that
  one, and it's only going to get worse if e.g. ext2 stuff gets picked
  by Jan"

* tag 'pull-highmem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  new helper: put_and_unmap_page()
Pull block fixes from Jens Axboe:

 - Fix a regression in exclusive mode handling of the partition code,
   introduced in this merge windoe (Yu)

 - Fix for a use-after-free in BFQ (Yu)

 - Add sysfs documentation for the 'hidden' attribute (Sagi)

* tag 'block-6.3-2023-03-09' of git://git.kernel.dk/linux:
  block, bfq: fix uaf for 'stable_merge_bfqq'
  docs: sysfs-block: document hidden sysfs entry
  block: fix wrong mode for blkdev_put() from disk_scan_partitions()
…it/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Twenty fixes all in drivers except the one zone storage revalidation
  fix to sd.

  The megaraid_sas fixes are more on the level of a driver update
  (enabling crash dump and increasing lun number) but I thought you
  could let this slide on -rc1 and the next most extensive update is a
  load of fixes to mpi3mr"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Fix wrong zone_write_granularity value during revalidate
  scsi: storvsc: Handle BlockSize change in Hyper-V VHD/VHDX file
  scsi: megaraid_sas: Driver version update to 07.725.01.00-rc1
  scsi: megaraid_sas: Add crash dump mode capability bit in MFI capabilities
  scsi: megaraid_sas: Update max supported LD IDs to 240
  scsi: mpi3mr: Bad drive in topology results kernel crash
  scsi: mpi3mr: NVMe command size greater than 8K fails
  scsi: mpi3mr: Return proper values for failures in firmware init path
  scsi: mpi3mr: Wait for diagnostic save during controller init
  scsi: mpi3mr: Driver unload crashes host when enhanced logging is enabled
  scsi: mpi3mr: ioctl timeout when disabling/enabling interrupt
  scsi: lpfc: Avoid usage of list iterator variable after loop
  scsi: lpfc: Check kzalloc() in lpfc_sli4_cgn_params_read()
  scsi: ufs: mcq: qcom: Clean the return path of ufs_qcom_mcq_config_resource()
  scsi: ufs: mcq: qcom: Fix passing zero to PTR_ERR
  scsi: ufs: ufs-qcom: Remove impossible check
  scsi: ufs: core: Add soft dependency on governor_simpleondemand
  scsi: hisi_sas: Check devm_add_action() return value
  scsi: qla2xxx: Add option to disable FC2 Target support
  scsi: target: iscsi: Fix an error message in iscsi_check_key()
The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().

Signed-off-by: Ye Bin <[email protected]>
Cc: [email protected]
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Syzbot found the following issue:
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none.
fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni"
fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
Modules linked in:
CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246
RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000
RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248
RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220
R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40
R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c
FS:  0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __alloc_pages_node include/linux/gfp.h:237 [inline]
 alloc_pages_node include/linux/gfp.h:260 [inline]
 __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113
 __do_kmalloc_node mm/slab_common.c:956 [inline]
 __kmalloc+0xfe/0x190 mm/slab_common.c:981
 kmalloc include/linux/slab.h:584 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346
 ext4_update_inline_dir fs/ext4/inline.c:1115 [inline]
 ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307
 ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385
 ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772
 ext4_create+0x36c/0x560 fs/ext4/namei.c:2817
 lookup_open fs/namei.c:3413 [inline]
 open_last_lookups fs/namei.c:3481 [inline]
 path_openat+0x12ac/0x2dd0 fs/namei.c:3711
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_openat fs/open.c:1342 [inline]
 __se_sys_openat fs/open.c:1337 [inline]
 __x64_sys_openat+0x243/0x290 fs/open.c:1337
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue happens as follows:
ext4_iget
   ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60
ext4_try_add_inline_entry
   __ext4_mark_inode_dirty
      ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44
         ext4_xattr_shift_entries
	 ->after shift i_inline_off is incorrect, actually is change to 176
ext4_try_add_inline_entry
  ext4_update_inline_dir
    get_max_inline_xattr_value_size
      if (EXT4_I(inode)->i_inline_off)
	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
			EXT4_I(inode)->i_inline_off);
        free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
	->As entry is incorrect, then 'free' may be negative
   ext4_update_inline_data
      value = kzalloc(len, GFP_NOFS);
      -> len is unsigned int, maybe very large, then trigger warning when
         'kzalloc()'

To resolve the above issue we need to update 'i_inline_off' after
'ext4_xattr_shift_entries()'.  We do not need to set
EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty()
already sets this flag if needed.  Setting EXT4_STATE_MAY_INLINE_DATA
when it is needed may trigger a BUG_ON in ext4_writepages().

Reported-by: [email protected]
Cc: [email protected]
Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
The generic bmap() function exported by the VFS takes locks and does
checks that are not necessary for the journal inode.  So allow the
file system to set a journal-optimized bmap function in
journal->j_bmap.

Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <[email protected]>
…ut error

Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue just recover 'es->s_state' error flag after journal
replay like error info.

Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Now, jounral error number maybe cleared even though ext4_commit_super()
failed. This may lead to error flag miss, then fsck will miss to check
file system deeply.

Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024)
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initiaizing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: [email protected]
Signed-off-by: Zhihao Cheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Switching to BLK_MQ_F_BLOCKING wrongly removed the call to
blk_mq_end_request(). Add it back to have our IOs finished

Fixes: 91cc8fb ("ubi: block: set BLK_MQ_F_BLOCKING")
Analyzed-by: Linus Torvalds <[email protected]>
Reported-by: Daniel Palmer <[email protected]>
Link: https://lore.kernel.org/linux-mtd/CAHk-=wi29bbBNh3RqJKu3PxzpjDN5D5K17gEVtXrb7-6bfrnMQ@mail.gmail.com/
Signed-off-by: Richard Weinberger <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Daniel Palmer <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
…nel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
 "This marks the end of a transition to let I2C have the same probe
  semantics as other subsystems. Uwe took care that no drivers in the
  current tree nor in -next use the deprecated .probe call. So, it is a
  good time to switch to the new, standard semantics now.

  There is also a regression fix:

   - regression fix for the notifier handling of the I2C core

   - final coversions of drivers away from deprecated .probe

   - make .probe_new the standard probe and convert I2C core to use it

* tag 'i2c-for-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: dev: Fix bus callback return values
  i2c: Convert drivers to new .probe() callback
  i2c: mux: Convert all drivers to new .probe() callback
  i2c: Switch .probe() to not take an id parameter
  media: i2c: ov2685: convert to i2c's .probe_new()
  media: i2c: ov5695: convert to i2c's .probe_new()
  w1: ds2482: Convert to i2c's .probe_new()
  serial: sc16is7xx: Convert to i2c's .probe_new()
  mtd: maps: pismo: Convert to i2c's .probe_new()
  misc: ad525x_dpot-i2c: Convert to i2c's .probe_new()
The cpumask_check() was unnecessarily tight, and causes problems for the
users of cpumask_next().

We have a number of users that take the previous return value of one of
the bit scanning functions and subtract one to keep it in "range".  But
since the scanning functions end up returning up to 'small_cpumask_bits'
instead of the tighter 'nr_cpumask_bits', the range really needs to be
using that widened form.

[ This "previous-1" behavior is also the reason we have all those
  comments about /* -1 is a legal arg here. */ and separate checks for
  that being ok.  So we could have just made "small_cpumask_bits-1"
  be a similar special "don't check this" value.

  Tetsuo Handa even suggested a patch that only does that for
  cpumask_next(), since that seems to be the only actual case that
  triggers, but that all makes it even _more_ magical and special. So
  just relax the check ]

One example of this kind of pattern being the 'c_start()' function in
arch/x86/kernel/cpu/proc.c, but also duplicated in various forms on
other architectures.

Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
Reported-by: Tetsuo Handa <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: 596ff4a ("cpumask: re-introduce constant-sized cpumask optimizations")
Signed-off-by: Linus Torvalds <[email protected]>
…ux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a
  potential deadlock during directory renames that was introduced during
  the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
…ernel/git/vfs/idmapping

Pull vfs fixes from Christian Brauner:

 - When allocating pages for a watch queue failed, we didn't return an
   error causing userspace to proceed even though all subsequent
   notifcations would be lost. Make sure to return an error.

 - Fix a misformed tree entry for the idmapping maintainers entry.

 - When setting file leases from an idmapped mount via
   generic_setlease() we need to take the idmapping into account
   otherwise taking a lease would fail from an idmapped mount.

 - Remove two redundant assignments, one in splice code and the other in
   locks code, that static checkers complained about.

* tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  filelocks: use mount idmapping for setlease permission check
  fs/locks: Remove redundant assignment to cmd
  splice: Remove redundant assignment to ret
  MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS
  watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
…/kernel/git/brauner/linux

Pull clone3 fix from Christian Brauner:
 "A simple fix for the clone3() system call.

  The CLONE_NEWTIME allows the creation of time namespaces. The flag
  reuses a bit from the CSIGNAL bits that are used in the legacy clone()
  system call to set the signal that gets sent to the parent after the
  child exits.

  The clone3() system call doesn't rely on CSIGNAL anymore as it uses a
  dedicated .exit_signal field in struct clone_args. So we blocked all
  CSIGNAL bits in clone3_args_valid(). When CLONE_NEWTIME was introduced
  and reused a CSIGNAL bit we forgot to adapt clone3_args_valid()
  causing CLONE_NEWTIME with clone3() to be rejected. Fix this"

* tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  selftests/clone3: test clone3 with CLONE_NEWTIME
  fork: allow CLONE_NEWTIME in clone3 flags
…inux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single erratum fix for AMD machines:

   - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
     impact to anything as those machines will fallback to XSAVEC which
     is equivalent there"

* tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Disable XSAVES on AMD family 0x17
…nel/git/gregkh/staging

Pull staging driver fixes and removal from Greg KH:
 "Here are four small staging driver fixes, and one big staging driver
  deletion for 6.3-rc2.

  The fixes are:

   - rtl8192e driver fixes for where the driver was attempting to
     execute various programs directly from the disk for unknown reasons

   - rtl8723bs driver fixes for issues found by Hans in testing

  The deleted driver is the removal of the r8188eu wireless driver as
  now in 6.3-rc1 we have a "real" wifi driver for one that includes
  support for many many more devices than this old driver did. So it's
  time to remove it as it is no longer needed. The maintainers of this
  driver all have acked its removal. Many thanks to them over the years
  for working to clean it up and keep it working while the real driver
  was being developed.

  All of these have been in linux-next this week with no reported
  problems"

* tag 'staging-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8188eu: delete driver
  staging: rtl8723bs: Pass correct parameters to cfg80211_get_bss()
  staging: rtl8723bs: Fix key-store index handling
  staging: rtl8192e: Remove call_usermodehelper starting RadioPower.sh
  staging: rtl8192e: Remove function ..dm_check_ac_dc_power calling a script
…s-linux

Pull xfs fixes from Darrick Wong:

 - Fix a crash if mount time quotacheck fails when there are inodes
   queued for garbage collection.

 - Fix an off by one error when discarding folios after writeback
   failure.

* tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix off-by-one-block in xfs_discard_folio()
  xfs: quotacheck failure can race with background inode inactivation
tpm_read_log_acpi() should return -ENODEV when no eventlog from the ACPI
table is found. If the firmware vendor includes an invalid log address
we are unable to map from the ACPI memory and tpm_read_log() returns -EIO
which would abort discovery of the eventlog.

Change the return value from -EIO to -ENODEV when acpi_os_map_iomem()
fails to map the event log.

The following hardware was used to test this issue:
    Framework Laptop (Pre-production)
    BIOS: INSYDE Corp, Revision: 3.2
    TPM Device: NTC, Firmware Revision: 7.2

Dump of the faulty ACPI TPM2 table:
    [000h 0000   4]                    Signature : "TPM2"    [Trusted Platform Module hardware interface Table]
    [004h 0004   4]                 Table Length : 0000004C
    [008h 0008   1]                     Revision : 04
    [009h 0009   1]                     Checksum : 2B
    [00Ah 0010   6]                       Oem ID : "INSYDE"
    [010h 0016   8]                 Oem Table ID : "TGL-ULT"
    [018h 0024   4]                 Oem Revision : 00000002
    [01Ch 0028   4]              Asl Compiler ID : "ACPI"
    [020h 0032   4]        Asl Compiler Revision : 00040000

    [024h 0036   2]               Platform Class : 0000
    [026h 0038   2]                     Reserved : 0000
    [028h 0040   8]              Control Address : 0000000000000000
    [030h 0048   4]                 Start Method : 06 [Memory Mapped I/O]

    [034h 0052  12]            Method Parameters : 00 00 00 00 00 00 00 00 00 00 00 00
    [040h 0064   4]           Minimum Log Length : 00010000
    [044h 0068   8]                  Log Address : 000000004053D000

Fixes: 0cf577a ("tpm: Fix handling of missing event log")
Tested-by: Erkki Eilonen <[email protected]>
Signed-off-by: Morten Linderud <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
AMD has issued an advisory indicating that having fTPM enabled in
BIOS can cause "stuttering" in the OS.  This issue has been fixed
in newer versions of the fTPM firmware, but it's up to system
designers to decide whether to distribute it.

This issue has existed for a while, but is more prevalent starting
with kernel 6.1 because commit b006c43 ("hwrng: core - start
hwrng kthread also for untrusted sources") started to use the fTPM
for hwrng by default. However, all uses of /dev/hwrng result in
unacceptable stuttering.

So, simply disable registration of the defective hwrng when detecting
these faulty fTPM versions.  As this is caused by faulty firmware, it
is plausible that such a problem could also be reproduced by other TPM
interactions, but this hasn't been shown by any user's testing or reports.

It is hypothesized to be triggered more frequently by the use of the RNG
because userspace software will fetch random numbers regularly.

Intentionally continue to register other TPM functionality so that users
that rely upon PCR measurements or any storage of data will still have
access to it.  If it's found later that another TPM functionality is
exacerbating this problem a module parameter it can be turned off entirely
and a module parameter can be introduced to allow users who rely upon
fTPM functionality to turn it on even though this problem is present.

Link: https://www.amd.com/en/support/kb/faq/pa-410
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989
Link: https://lore.kernel.org/all/[email protected]/
Fixes: b006c43 ("hwrng: core - start hwrng kthread also for untrusted sources")
Cc: [email protected]
Cc: Jarkko Sakkinen <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: James Bottomley <[email protected]>
Tested-by: [email protected]
Tested-by: Bell <[email protected]>
Co-developed-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
…/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
 "Two additional bug fixes for v6.3"

* tag 'tpm-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: disable hwrng for fTPM on some AMD designs
  tpm/eventlog: Don't abort tpm_read_log on faulty ACPI address
…r wext"

This reverts part of commit 015b8cc ("wifi: cfg80211: Fix use after
free for wext")

This commit broke WPA offload by unconditionally clearing the crypto
modes for non-WEP connections. Drop that part of the patch.

Signed-off-by: Hector Martin <[email protected]>
Reported-by: Ilya <[email protected]>
Reported-and-tested-by: Janne Grunau <[email protected]>
Reviewed-by: Eric Curtin <[email protected]>
Fixes: 015b8cc ("wifi: cfg80211: Fix use after free for wext")
Cc: [email protected]
Link: https://lore.kernel.org/linux-wireless/[email protected]/T/#m11e6e0915ab8fa19ce8bc9695ab288c0fe018edf
Signed-off-by: Linus Torvalds <[email protected]>
fahhem pushed a commit that referenced this pull request Apr 19, 2023
…ct_cfm

This attempts to fix the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g0b93eeba4454 #4703 Not tainted
------------------------------------------------------
kworker/u3:0/46 is trying to acquire lock:
ffff888001fd9130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_connect_cfm+0x118/0x4a0

but task is already holding lock:
ffffffff831e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_sync_conn_complete_evt+0x1ad/0x3d0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       hci_sync_conn_complete_evt+0x1ad/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #1 (&hdev->lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       sco_sock_connect+0xfc/0x630
       __sys_connect+0x197/0x1b0
       __x64_sys_connect+0x37/0x50
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda

-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       __lock_acquire+0x18cc/0x3740
       lock_acquire+0x151/0x3a0
       lock_sock_nested+0x32/0x80
       sco_connect_cfm+0x118/0x4a0
       hci_sync_conn_complete_evt+0x1e6/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

other info that might help us debug this:

Chain exists of:
  sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hci_cb_list_lock);
                               lock(&hdev->lock);
                               lock(hci_cb_list_lock);
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);

 *** DEADLOCK ***

4 locks held by kworker/u3:0/46:
 #0: ffff8880028d1130 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
 process_one_work+0x4c0/0x910
 #1: ffff8880013dfde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
 at: process_one_work+0x4c0/0x910
 #2: ffff8880025d8070 (&hdev->lock){+.+.}-{3:3}, at:
 hci_sync_conn_complete_evt+0xa6/0x3d0
 #3: ffffffffb79e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
 hci_sync_conn_complete_evt+0x1ad/0x3d0

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
fahhem pushed a commit that referenced this pull request Apr 19, 2023
…sockopt

This attempts to fix the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted
------------------------------------------------------
sco-tester/31 is trying to acquire lock:
ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at:
sco_sock_getsockopt+0x1fc/0xa90

but task is already holding lock:
ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_sock_getsockopt+0x104/0xa90

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       lock_sock_nested+0x32/0x80
       sco_connect_cfm+0x118/0x4a0
       hci_sync_conn_complete_evt+0x1e6/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #1 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       hci_sync_conn_complete_evt+0x1ad/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #0 (&hdev->lock){+.+.}-{3:3}:
       __lock_acquire+0x18cc/0x3740
       lock_acquire+0x151/0x3a0
       __mutex_lock+0x13b/0xcc0
       sco_sock_getsockopt+0x1fc/0xa90
       __sys_getsockopt+0xe9/0x190
       __x64_sys_getsockopt+0x5b/0x70
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda

other info that might help us debug this:

Chain exists of:
  &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
                               lock(hci_cb_list_lock);
                               lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
  lock(&hdev->lock);

 *** DEADLOCK ***

1 lock held by sco-tester/31:
 #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0},
 at: sco_sock_getsockopt+0x104/0xa90

Fixes: 248733e ("Bluetooth: Allow querying of supported offload codecs over SCO socket")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
fahhem added a commit that referenced this pull request Apr 19, 2023
commit 76dcd734eca23168cb008912c0f69ff408905235
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 14:48:12 2022 -0800

    Linux 6.1-rc8

commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:51:59 2022 -0800

    Revert "mm: align larger anonymous mappings on THP boundaries"

    This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.

    It has been reported to cause huge performance regressions on some loads
    (will-it-scale.per_process_ops, but also building the kernel with
    clang).

    The commit did speed up gcc builds by a small amount, so it's not an
    unambiguous regression, but until the big regressions are understood,
    let's revert it.

    Reported-by: kernel test robot <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Nathan Chancellor <[email protected]>
    Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%[email protected]/
    Cc: Huang, Ying <[email protected]>
    Cc: Rik van Riel <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Yang Shi <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 23393c6461422df5bf8084a086ada9a7e17dc2ba
Author: Jan Dabros <[email protected]>
Date:   Mon Nov 28 20:56:51 2022 +0100

    char: tpm: Protect tpm_pm_suspend with locks

    Currently tpm transactions are executed unconditionally in
    tpm_pm_suspend() function, which may lead to races with other tpm
    accessors in the system.

    Specifically, the hw_random tpm driver makes use of tpm_get_random(),
    and this function is called in a loop from a kthread, which means it's
    not frozen alongside userspace, and so can race with the work done
    during system suspend:

      tpm tpm0: tpm_transmit: tpm_recv: error -52
      tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
      CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
      Call Trace:
       tpm_tis_status.cold+0x19/0x20
       tpm_transmit+0x13b/0x390
       tpm_transmit_cmd+0x20/0x80
       tpm1_pm_suspend+0xa6/0x110
       tpm_pm_suspend+0x53/0x80
       __pnp_bus_suspend+0x35/0xe0
       __device_suspend+0x10f/0x350

    Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
    tpm_chip_start(), but takes the appropriate mutex.

    Signed-off-by: Jan Dabros <[email protected]>
    Reported-by: Vlastimil Babka <[email protected]>
    Tested-by: Jason A. Donenfeld <[email protected]>
    Tested-by: Vlastimil Babka <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Cc: [email protected]
    Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x")
    [Jason: reworked commit message, added metadata]
    Signed-off-by: Jason A. Donenfeld <[email protected]>
    Reviewed-by: Jarkko Sakkinen <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 0c3b5bcb484a659dd14466f92a073b57b2d3c1a5
Merge: eea8bebd5173 517e6a301f34
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:36:23 2022 -0800

    Merge tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf fix from Borislav Petkov:

     - Fix a use-after-free case where the perf pending task callback would
       see an already freed event

    * tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf: Fix perf_pending_task() UaF

commit eea8bebd51739cc7a3bb501032ee877b4aada553
Merge: ae6bb7171171 d9f15a9de44a
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:33:44 2022 -0800

    Merge tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull timer fix from Borislav Petkov:

     - Revert a fix to RISC-V timers supposed to address an uncertainty
       whether clock events are received during S3 or not which locks up
       other RISC-V platforms. The issue will be fixed differently later.

    * tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

commit ae6bb71711713e7b6b2d9ed2af7318dc8ae5cfb2
Merge: 50f36c5aa12c 2e7ec190a0e3
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:24:58 2022 -0800

    Merge tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

    Pull powerpc fixes from Michael Ellerman:

     - Fix oops in 32-bit BPF tail call tests

     - Add missing declaration for machine_check_early_boot()

    Thanks to Christophe Leroy and Naveen N. Rao.

    * tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/64s: Add missing declaration for machine_check_early_boot()
      powerpc/bpf/32: Fix Oops on tail call tests

commit 50f36c5aa12c8d0f2adca662e0c106212ea897a8
Merge: c2bf05db6c78 8c9a59939deb
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:18:37 2022 -0800

    Merge tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

    Pull input fix from Dmitry Torokhov:

     - a fix for Raydium touchscreen driver to stop leaking memory when
       sending commands to the chip

    * tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
      Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()

commit c2bf05db6c78f53ca5cd4b48f3b9b71f78d215f1
Merge: 6085bc95797c d36678f7905c
Author: Linus Torvalds <[email protected]>
Date:   Sat Dec 3 13:51:37 2022 -0800

    Merge tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

    Pull i2c fixes from Wolfram Sang:
     "A power state fix in the core for ACPI devices, a regression fix
      regarding bus recovery for the cadence driver, a DMA handling fix for
      the imx driver, and two error path fixes (npcm7xx and qcom-geni)"

    * tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set
      i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer
      i2c: cadence: Fix regression with bus recovery
      i2c: Restore initial power state if probe fails
      i2c: npcm7xx: Fix error handling in npcm_i2c_init()

commit 6085bc95797caa55a68bc0f7dd73e8c33e91037f
Merge: 97ee9d1c1696 472faf72b33d
Author: Linus Torvalds <[email protected]>
Date:   Sat Dec 3 13:43:38 2022 -0800

    Merge tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

    Pull dax fixes from Dan Williams:
     "A few bug fixes around the handling of "Soft Reserved" memory and
      memory tiering information.

      Linux is starting to enounter more real world systems that deploy an
      ACPI HMAT to describe different performance classes of memory, as well
      the "special purpose" (Linux "Soft Reserved") designation from EFI.

      These fixes result from that testing.

      It has all appeared in -next for a while with no known issues.

       - Fix duplicate overlapping device-dax instances for HMAT described
         "Soft Reserved" Memory

       - Fix missing node targets in the sysfs representation of memory
         tiers

       - Remove a confusing variable initialization"

    * tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      device-dax: Fix duplicate 'hmem' device registration
      ACPI: HMAT: Fix initiator registration for single-initiator systems
      ACPI: HMAT: remove unnecessary variable initialization

commit 97ee9d1c16963375eefdf964c429897d27e28956
Merge: 63050a5ca130 d0f411c0b9bd
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:27:15 2022 -0800

    Merge tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux

    Pull block fixes from Jens Axboe:
     "Just a small NVMe merge for this week, fixing protection of the name
      space list, and a missing clear of a reserved field when unused"

    * tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux:
      nvme: fix SRCU protection of nvme_ns_head list
      nvme-pci: clear the prp2 field when not used

commit 63050a5ca130e76af7199c9a3fba1d175f3a1102
Merge: 0e15c3c75a28 6989ea4881c8
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:22:17 2022 -0800

    Merge tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

    Pull pin control fixes from Linus Walleij:
     "Three driver fixes. The Intel fix looks like the most important.

       - Fix a potential divide by zero in pinctrl-singe (OMAP and
         HiSilicon)

       - Disable IRQs on startup in the Mediatek driver. This is a classic,
         we should be looking out for this more.

       - Save and restore pins in 'direct IRQ' mode in the Intel driver,
         this works around firmware bugs"

    * tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
      pinctrl: intel: Save and restore pins in "direct IRQ" mode
      pinctrl: meditatek: Startup with the IRQs disabled
      pinctrl: single: Fix potential division by zero

commit 0e15c3c75a28b10bac7b3ad7627fd6b458623283
Merge: 2df2adc3e69d 39cefc5f6cd2
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:04:53 2022 -0800

    Merge tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

    Pull RISC-V fixes from Palmer Dabbelt:

     - build fix for the NR_CPUS Kconfig SBI version dependency

     - fixes to early memory initialization, to fix page permissions in EFI
       and post-initmem-free

     - build fix for the VDSO, to avoid trying to profile the VDSO functions

     - fixes for kexec crash handling, to fix multi-core and interrupt
       related initialization inside the crash kernel

     - fix for a race condition when handling multiple concurrect kernel
       stack overflows

    * tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
      riscv: kexec: Fixup crash_smp_send_stop without multi cores
      riscv: kexec: Fixup irq controller broken in kexec crash path
      riscv: mm: Proper page permissions after initmem free
      riscv: vdso: fix section overlapping under some conditions
      riscv: fix race when vmap stack overflow
      riscv: Sync efi page table's kernel mappings before switching
      riscv: Fix NR_CPUS range conditions

commit 2df2adc3e69d32751eb534ed55c591854260c4a3
Merge: f66f62f83dba dd30dcfa7a74
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:58:07 2022 -0800

    Merge tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

    Pull MMC fixes from Ulf Hansson:
     "MMC core:
       - Fix ambiguous TRIM and DISCARD args
       - Fix removal of debugfs file for mmc_test

      MMC host:
       - mtk-sd: Add missing clk_disable_unprepare() in an error path
       - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
       - sdhci-esdhc-imx: Fix CQHCI exit halt state check
       - sdhci-sprd: Fix voltage switch"

    * tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
      mmc: sdhci-sprd: Fix no reset data and command after voltage switch
      mmc: sdhci: Fix voltage switch delay
      mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
      mmc: mmc_test: Fix removal of debugfs file
      mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
      mmc: core: Fix ambiguous TRIM and DISCARD arg

commit f66f62f83dba3ec4237c3dd7a95b776824ac91a0
Merge: 66065157420c 4bedbbd782eb
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:54:12 2022 -0800

    Merge tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

    Pull iommu fixes from Joerg Roedel:
     "Intel VT-d fixes:

       - IO/TLB flush fix

       - Various pci_dev refcount fixes"

    * tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
      iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
      iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
      iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
      iommu/vt-d: Add a fix for devices need extra dtlb flush

commit 66065157420c5b9b3f078f43d313c153e1ff7f83
Author: Pawan Gupta <[email protected]>
Date:   Wed Nov 30 07:25:51 2022 -0800

    x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3

    The "force" argument to write_spec_ctrl_current() is currently ambiguous
    as it does not guarantee the MSR write. This is due to the optimization
    that writes to the MSR happen only when the new value differs from the
    cached value.

    This is fine in most cases, but breaks for S3 resume when the cached MSR
    value gets out of sync with the hardware MSR value due to S3 resetting
    it.

    When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
    is skipped. Which results in SPEC_CTRL mitigations not getting restored.

    Move the MSR write from write_spec_ctrl_current() to a new function that
    unconditionally writes to the MSR. Update the callers accordingly and
    rename functions.

      [ bp: Rework a bit. ]

    Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
    Suggested-by: Borislav Petkov <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
    Signed-off-by: Linus Torvalds <[email protected]>

commit 8c9a59939deb4bfafdc451100c03d1e848b4169b
Author: Zhang Xiaoxu <[email protected]>
Date:   Fri Dec 2 15:37:46 2022 -0800

    Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()

    There is a kmemleak when test the raydium_i2c_ts with bpf mock device:

      unreferenced object 0xffff88812d3675a0 (size 8):
        comm "python3", pid 349, jiffies 4294741067 (age 95.695s)
        hex dump (first 8 bytes):
          11 0e 10 c0 01 00 04 00                          ........
        backtrace:
          [<0000000068427125>] __kmalloc+0x46/0x1b0
          [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
          [<000000006e631aee>] raydium_i2c_initialize.cold+0xbc/0x3e4 [raydium_i2c_ts]
          [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
          [<00000000a310de16>] i2c_device_probe+0x651/0x680
          [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
          [<00000000096ba499>] __driver_probe_device+0xe3/0x170
          [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
          [<00000000264fe082>] __device_attach_driver+0xf7/0x150
          [<00000000f919423c>] bus_for_each_drv+0x114/0x180
          [<00000000e067feca>] __device_attach+0x1e5/0x2d0
          [<0000000054301fc2>] bus_probe_device+0x126/0x140
          [<00000000aad93b22>] device_add+0x810/0x1130
          [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
          [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
          [<00000000ffec4177>] of_i2c_notify+0x100/0x160
      unreferenced object 0xffff88812d3675c8 (size 8):
        comm "python3", pid 349, jiffies 4294741070 (age 95.692s)
        hex dump (first 8 bytes):
          22 00 36 2d 81 88 ff ff                          ".6-....
        backtrace:
          [<0000000068427125>] __kmalloc+0x46/0x1b0
          [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
          [<000000001d5c9620>] raydium_i2c_initialize.cold+0x223/0x3e4 [raydium_i2c_ts]
          [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
          [<00000000a310de16>] i2c_device_probe+0x651/0x680
          [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
          [<00000000096ba499>] __driver_probe_device+0xe3/0x170
          [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
          [<00000000264fe082>] __device_attach_driver+0xf7/0x150
          [<00000000f919423c>] bus_for_each_drv+0x114/0x180
          [<00000000e067feca>] __device_attach+0x1e5/0x2d0
          [<0000000054301fc2>] bus_probe_device+0x126/0x140
          [<00000000aad93b22>] device_add+0x810/0x1130
          [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
          [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
          [<00000000ffec4177>] of_i2c_notify+0x100/0x160

    After BANK_SWITCH command from i2c BUS, no matter success or error
    happened, the tx_buf should be freed.

    Fixes: 3b384bd6c3f2 ("Input: raydium_ts_i2c - do not split tx transactions")
    Signed-off-by: Zhang Xiaoxu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>

commit a1e9185d20b56af04022d2e656802254f4ea47eb
Merge: c290db013742 b47068b4aa53
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:40:35 2022 -0800

    Merge tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

    Pull sound fixes from Takashi Iwai:
     "Likely the last piece for 6.1; the only significant fixes are ASoC
      core ops fixes, while others are device-specific (rather minor) fixes
      in ASoC and FireWire drivers.

      All appear safe enough to take as a late stage material"

    * tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
      ALSA: dice: fix regression for Lexicon I-ONIX FW810S
      ASoC: cs42l51: Correct PGA Volume minimum value
      ASoC: ops: Correct bounds check for second channel on SX controls
      ASoC: tlv320adc3xxx: Fix build error for implicit function declaration
      ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx()
      ASoC: ops: Fix bounds check for _sx controls
      ASoC: fsl_micfil: explicitly clear CHnF flags
      ASoC: fsl_micfil: explicitly clear software reset bit

commit c290db013742e98fe5b64073bc2dd8c8a2ac9e4c
Merge: bdaa78c6aa86 c082fbd687ad
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:35:21 2022 -0800

    Merge tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm

    Pull drm fixes from Dave Airlie:
     "Things do seem to have finally settled down, just four i915 and one
      amdgpu this week. Probably won't have much for next week if you do
      push rc8 out.

      i915:
       - Fix dram info readout
       - Remove non-existent pipes from bigjoiner pipe mask
       - Fix negative value passed as remaining time
       - Never return 0 if not all requests retired

      amdgpu:
       - VCN fix for vangogh"

    * tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm:
      drm/amdgpu: enable Vangogh VCN indirect sram mode
      drm/i915: Never return 0 if not all requests retired
      drm/i915: Fix negative value passed as remaining time
      drm/i915: Remove non-existent pipes from bigjoiner pipe mask
      drm/i915/mtl: Fix dram info readout

commit bdaa78c6aa861f0e8c612a0b2272423d92f0071c
Merge: 6647e76ab623 1d351f189434
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 13:39:38 2022 -0800

    Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

    Pull misc hotfixes from Andrew Morton:
     "15 hotfixes,  11 marked cc:stable.

      Only three or four of the latter address post-6.0 issues, which is
      hopefully a sign that things are converging"

    * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
      revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
      Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
      drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
      mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
      mm/khugepaged: fix GUP-fast interaction by sending IPI
      mm/khugepaged: take the right locks for page table retraction
      mm: migrate: fix THP's mapcount on isolation
      mm: introduce arch_has_hw_nonleaf_pmd_young()
      mm: add dummy pmd_young() for architectures not having it
      mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
      tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
      nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
      hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
      madvise: use zap_page_range_single for madvise dontneed
      mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE

commit 6647e76ab623b2b3fb2efe03a86e9c9046c52c33
Author: Linus Torvalds <[email protected]>
Date:   Wed Nov 30 16:10:52 2022 -0800

    v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails

    The V4L2_MEMORY_USERPTR interface is long deprecated and shouldn't be
    used (and is discouraged for any modern v4l drivers).  And Seth Jenkins
    points out that the fallback to VM_PFNMAP/VM_IO is fundamentally racy
    and dangerous.

    Note that it's not even a case that should trigger, since any normal
    user pointer logic ends up just using the pin_user_pages_fast() call
    that does the proper page reference counting.  That's not the problem
    case, only if you try to use special device mappings do you have any
    issues.

    Normally I'd just remove this during the merge window, but since Seth
    pointed out the problem cases, we really want to know as soon as
    possible if there are actually any users of this odd special case of a
    legacy interface.  Neither Hans nor Mauro seem to think that such
    mis-uses of the old legacy interface should exist.  As Mauro says:

     "See, V4L2 has actually 4 streaming APIs:
            - Kernel-allocated mmap (usually referred simply as just mmap);
            - USERPTR mmap;
            - read();
            - dmabuf;

      The USERPTR is one of the oldest way to use it, coming from V4L
      version 1 times, and by far the least used one"

    And Hans chimed in on the USERPTR interface:

     "To be honest, I wouldn't mind if it goes away completely, but that's a
      bit of a pipe dream right now"

    but while removing this legacy interface entirely may be a pipe dream we
    can at least try to remove the unlikely (and actively broken) case of
    using special device mappings for USERPTR accesses.

    This replaces it with a WARN_ONCE() that we can remove once we've
    hopefully confirmed that no actual users exist.

    NOTE! Longer term, this means that a 'struct frame_vector' only ever
    contains proper page pointers, and all the games we have with converting
    them to pages can go away (grep for 'frame_vector_to_pages()' and the
    uses of 'vec->is_pfns').  But this is just the first step, to verify
    that this code really is all dead, and do so as quickly as possible.

    Reported-by: Seth Jenkins <[email protected]>
    Acked-by: Hans Verkuil <[email protected]>
    Acked-by: Mauro Carvalho Chehab <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Jan Kara <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit d0f411c0b9bdef85f647e15a2fcc790b29891f2c
Merge: 7d4a93176e01 899d2a05dc14
Author: Jens Axboe <[email protected]>
Date:   Fri Dec 2 08:01:06 2022 -0700

    Merge tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme into block-6.1

    Pull NVMe fixes from Christoph:

    "nvme fixes for Linux 6.1

     - fix SRCU protection of nvme_ns_head list (Caleb Sander)
     - clear the prp2 field when not used (Lei Rao)"

    * tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme:
      nvme: fix SRCU protection of nvme_ns_head list
      nvme-pci: clear the prp2 field when not used

commit 4bedbbd782ebbe7287231fea862c158d4f08a9e3
Author: Xiongfeng Wang <[email protected]>
Date:   Thu Dec 1 12:01:27 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()

    for_each_pci_dev() is implemented by pci_get_device(). The comment of
    pci_get_device() says that it will increase the reference count for the
    returned pci_dev and also decrease the reference count for the input
    pci_dev @from if it is not NULL.

    If we break for_each_pci_dev() loop with pdev not NULL, we need to call
    pci_dev_put() to decrease the reference count. Add the missing
    pci_dev_put() for the error path to avoid reference count leak.

    Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array")
    Signed-off-by: Xiongfeng Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit afca9e19cc720bfafc75dc5ce429c185ca93f31d
Author: Xiongfeng Wang <[email protected]>
Date:   Thu Dec 1 12:01:26 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in has_external_pci()

    for_each_pci_dev() is implemented by pci_get_device(). The comment of
    pci_get_device() says that it will increase the reference count for the
    returned pci_dev and also decrease the reference count for the input
    pci_dev @from if it is not NULL.

    If we break for_each_pci_dev() loop with pdev not NULL, we need to call
    pci_dev_put() to decrease the reference count. Add the missing
    pci_dev_put() before 'return true' to avoid reference count leak.

    Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
    Signed-off-by: Xiongfeng Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit 6927d352380797ddbee18631491ec428741696e2
Author: Yang Yingliang <[email protected]>
Date:   Thu Dec 1 12:01:25 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()

    As comment of pci_get_domain_bus_and_slot() says, it returns a pci device
    with refcount increment, when finish using it, the caller must decrease
    the reference count by calling pci_dev_put(). So call pci_dev_put() after
    using the 'pdev' to avoid refcount leak.

    Besides, if the 'pdev' is null or intel_svm_prq_report() returns error,
    there is no need to trace this fault.

    Fixes: 06f4b8d09dba ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path")
    Suggested-by: Lu Baolu <[email protected]>
    Signed-off-by: Yang Yingliang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit e65a6897be5e4939d477c4969a05e12d90b08409
Author: Jacob Pan <[email protected]>
Date:   Thu Dec 1 12:01:24 2022 +0800

    iommu/vt-d: Add a fix for devices need extra dtlb flush

    QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
    address translation service (ATS). These devices may inadvertently issue
    ATS invalidation completion before posted writes initiated with
    translated address that utilized translations matching the invalidation
    address range, violating the invalidation completion ordering.

    This patch adds an extra device TLB invalidation for the affected devices,
    it is needed to ensure no more posted writes with translated address
    following the invalidation completion. Therefore, the ordering is
    preserved and data-corruption is prevented.

    Device TLBs are invalidated under the following six conditions:
    1. Device driver does DMA API unmap IOVA
    2. Device driver unbind a PASID from a process, sva_unbind_device()
    3. PASID is torn down, after PASID cache is flushed. e.g. process
    exit_mmap() due to crash
    4. Under SVA usage, called by mmu_notifier.invalidate_range() where
    VM has to free pages that were unmapped
    5. userspace driver unmaps a DMA buffer
    6. Cache invalidation in vSVA usage (upcoming)

    For #1 and #2, device drivers are responsible for stopping DMA traffic
    before unmap/unbind. For #3, iommu driver gets mmu_notifier to
    invalidate TLB the same way as normal user unmap which will do an extra
    invalidation. The dTLB invalidation after PASID cache flush does not
    need an extra invalidation.

    Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
    covered by this patch due to common code path with #5.

    Tested-by: Yuzhang Luo <[email protected]>
    Reviewed-by: Ashok Raj <[email protected]>
    Reviewed-by: Kevin Tian <[email protected]>
    Signed-off-by: Jacob Pan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit c082fbd687ad70a92e0a8be486a7555a66f03079
Merge: 65a388250e39 9a8cc8cabc1e
Author: Dave Airlie <[email protected]>
Date:   Fri Dec 2 09:12:46 2022 +1000

    Merge tag 'amd-drm-fixes-6.1-2022-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

    amd-drm-fixes-6.1-2022-12-01:

    amdgpu:
    - VCN fix for vangogh

    Signed-off-by: Dave Airlie <[email protected]>
    From: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit d36678f7905cbd1dc55a8a96e066dafd749d4600
Author: Andrew Lunn <[email protected]>
Date:   Thu Nov 10 00:59:02 2022 +0100

    i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set

    Recent changes to the DMA code has resulting in the IMX driver failing
    I2C transfers when the buffer has been vmalloc. Only perform DMA
    transfers if the message has the I2C_M_DMA_SAFE flag set, indicating
    the client is providing a buffer which is DMA safe.

    This is a minimal fix for stable. The I2C core provides helpers to
    allocate a bounce buffer. For a fuller fix the master should make use
    of these helpers.

    Fixes: 4544b9f25e70 ("dma-mapping: Add vmap checks to dma_map_single()")
    Signed-off-by: Andrew Lunn <[email protected]>
    Acked-by: Oleksij Rempel <[email protected]>
    Signed-off-by: Wolfram Sang <[email protected]>

commit 7d8ccf4f117d082156e842d959f634efcf203cef
Author: Wang Yufen <[email protected]>
Date:   Mon Nov 21 18:17:52 2022 +0800

    i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer

    Fix to return a negative error code from the gi2c->err instead of
    0.

    Fixes: d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
    Signed-off-by: Wang Yufen <[email protected]>
    Reviewed-by: Tommaso Merciai <[email protected]>
    Signed-off-by: Wolfram Sang <[email protected]>

commit 8bfd4ec726945cec35ee5cafebc1c55b83d5a9fb
Author: Carsten Haitzler <[email protected]>
Date:   Mon Nov 28 10:51:58 2022 +0000

    i2c: cadence: Fix regression with bus recovery

    Commit "i2c: cadence: Add standard bus recovery support" breaks for i2c
    devices that have no pinctrl defined. There is no requirement for this
    to exist in the DT. This has worked perfectly well without this before in
    at least 1 real usage case on hardware (Mali Komeda DPU, Cadence i2c to
    talk to a tda99xx phy). Adding the requirement to have pinctrl set up in
    the device tree (or otherwise be found) is a regression where the whole
    i2c device is lost entirely (in this case dropping entire devices which
    then leads to the drm display stack unable to find the phy for display
    output, thus having no drm display device and so on down the chain).

    This converts the above commit to an enhancement if pinctrl can be found
    for the i2c device, providing a timeout on read with recovery, but if not,
    do what used to be done rather than a fatal loss of a device.

    This restores the mentioned display devices to their working state again.

    Fixes: 58b924241d0a ("i2c: cadence: Add standard bus recovery support")
    Signed-off-by: Carsten Haitzler <[email protected]>
    Reviewed-by: Shubhrajyoti Datta <[email protected]>
    Reviewed-by: Michael Grzeschik <[email protected]>
    Acked-by: Michal Simek <[email protected]>
    [wsa: added braces to else-branch]
    Signed-off-by: Wolfram Sang <[email protected]>

commit 65a388250e391b7127efd6751f64f3e4955e43a0
Merge: b7b275e60bcd 12b8b046e4c9
Author: Dave Airlie <[email protected]>
Date:   Fri Dec 2 07:34:27 2022 +1000

    Merge tag 'drm-intel-fixes-2022-12-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

    - Fix dram info readout (Radhakrishna Sripada)
    - Remove non-existent pipes from bigjoiner pipe mask (Ville Syrjälä)
    - Fix negative value passed as remaining time (Janusz Krzysztofik)
    - Never return 0 if not all requests retired (Janusz Krzysztofik)

    Signed-off-by: Dave Airlie <[email protected]>
    From: Tvrtko Ursulin <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/Y4hp+a3TJ13t2ZA1@tursulin-desk

commit a4412fdd49dc011bcc2c0d81ac4cab7457092650
Author: Steven Rostedt (Google) <[email protected]>
Date:   Mon Nov 21 10:44:03 2022 -0500

    error-injection: Add prompt for function error injection

    The config to be able to inject error codes into any function annotated
    with ALLOW_ERROR_INJECTION() is enabled when FUNCTION_ERROR_INJECTION is
    enabled.  But unfortunately, this is always enabled on x86 when KPROBES
    is enabled, and there's no way to turn it off.

    As kprobes is useful for observability of the kernel, it is useful to
    have it enabled in production environments.  But error injection should
    be avoided.  Add a prompt to the config to allow it to be disabled even
    when kprobes is enabled, and get rid of the "def_bool y".

    This is a kernel debug feature (it's in Kconfig.debug), and should have
    never been something enabled by default.

    Cc: [email protected]
    Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 9a8cc8cabc1e351614fd7f9e774757a5143b6fe8
Author: Leo Liu <[email protected]>
Date:   Tue Nov 29 18:53:18 2022 -0500

    drm/amdgpu: enable Vangogh VCN indirect sram mode

    So that uses PSP to initialize HW.

    Fixes: 0c2c02b66c672e ("drm/amdgpu/vcn: add firmware support for dimgrey_cavefish")
    Signed-off-by: Leo Liu <[email protected]>
    Reviewed-by: James Zhu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]

commit 39cefc5f6cd25d555e0455b24810e9aff365b8d6
Merge: d556a9aeb62a 7e1864332fbc
Author: Palmer Dabbelt <[email protected]>
Date:   Thu Dec 1 11:38:39 2022 -0800

    RISC-V: Fix a race condition during kernel stack overflow

    This fixes a concrete bug but is also the basis for some cleanup work,
    so I'm merging it based on the offending commit in order to minimize
    future conflicts.

    * commit '7e1864332fbc1b993659eab7974da9fe8bf8c128':
      riscv: fix race when vmap stack overflow

commit 355479c70a489415ef6e2624e514f8f15a40861b
Merge: e214dd935bf1 7572ac3c979d
Author: Linus Torvalds <[email protected]>
Date:   Thu Dec 1 11:25:11 2022 -0800

    Merge tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

    Pull EFI fix from Ard Biesheuvel:
     "A single revert for some code that I added during this cycle. The code
      is not wrong, but it should be a bit more careful about how to handle
      the shadow call stack pointer, so it is better to revert it for now
      and bring it back later in improved form.

      Summary:

       - Revert runtime service sync exception recovery on arm64"

    * tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
      arm64: efi: Revert "Recover from synchronous exceptions ..."

commit e214dd935bf154af4c2d5013b16a283d592ccea5
Merge: 063c0e773ab3 9bdc112be727
Author: Linus Torvalds <[email protected]>
Date:   Thu Dec 1 11:10:25 2022 -0800

    Merge tag 'hwmon-for-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

    Pull hwmon fixes from Guenter Roeck:

     - Fix refcount leak and NULL pointer access in coretemp driver

     - Fix UAF in ibmpex driver

     - Add missing pci_disable_device() to i5500_temp driver

     - Fix calculation in ina3221 driver

     - Fix temperature scaling in ltc2947 driver

     - Check result of devm_kcalloc call in asus-ec-sensors driver

    * tag 'hwmon-for-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
      hwmon: (asus-ec-sensors) Add checks for devm_kcalloc
      hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()
      hwmon: (coretemp) Check for null before removing sysfs attrs
      hwmon: (ibmpex) Fix possible UAF when ibmpex_register_bmc() fails
      hwmon: (i5500_temp) fix missing pci_disable_device()
      hwmon: (ina3221) Fix shunt sum critical calculation
      hwmon: (ltc2947) fix temperature scaling

commit 9bdc112be727cf1ba65be79541147f960c3349d8
Author: Yuan Can <[email protected]>
Date:   Fri Nov 25 01:43:29 2022 +0000

    hwmon: (asus-ec-sensors) Add checks for devm_kcalloc

    As the devm_kcalloc may return NULL, the return value needs to be checked
    to avoid NULL poineter dereference.

    Fixes: d0ddfd241e57 ("hwmon: (asus-ec-sensors) add driver for ASUS EC")
    Signed-off-by: Yuan Can <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>

commit 7dec14537c5906b8bf40fd6fd6d9c3850f8df11d
Author: Yang Yingliang <[email protected]>
Date:   Fri Nov 18 17:33:03 2022 +0800

    hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()

    As comment of pci_get_domain_bus_and_slot() says, it returns
    a pci device with refcount increment, when finish using it,
    the caller must decrement the reference count by calling
    pci_dev_put(). So call it after using to avoid refcount leak.

    Fixes: 14513ee696a0 ("hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary")
    Signed-off-by: Yang Yingliang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>

commit a89ff5f5cc64b9fe7a992cf56988fd36f56ca82a
Author: Phil Auld <[email protected]>
Date:   Thu Nov 17 11:23:13 2022 -0500

    hwmon: (coretemp) Check for null before removing sysfs attrs

    If coretemp_add_core() gets an error then pdata->core_data[indx]
    is already NULL and has been kfreed. Don't pass that to
    sysfs_remove_group() as that will crash in sysfs_remove_group().

    [Shortened for readability]
    [91854.020159] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp20_label'
    <cpu offline>
    [91855.126115] BUG: kernel NULL pointer dereference, address: 0000000000000188
    [91855.165103] #PF: supervisor read access in kernel mode
    [91855.194506] #PF: error_code(0x0000) - not-present page
    [91855.224445] PGD 0 P4D 0
    [91855.238508] Oops: 0000 [#1] PREEMPT SMP PTI
    ...
    [91855.342716] RIP: 0010:sysfs_remove_group+0xc/0x80
    ...
    [91855.796571] Call Trace:
    [91855.810524]  coretemp_cpu_offline+0x12b/0x1dd [coretemp]
    [91855.841738]  ? coretemp_cpu_online+0x180/0x180 [coretemp]
    [91855.871107]  cpuhp_invoke_callback+0x105/0x4b0
    [91855.893432]  cpuhp_thread_fun+0x8e/0x150
    ...

    Fix this by checking for NULL first.

    Signed-off-by: Phil Auld <[email protected]>
    Cc: [email protected]
    Cc: Fenghua Yu <[email protected]>
    Cc: Jean Delvare <[email protected]>
    Cc: Guenter Roeck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 199e0de7f5df3 ("hwmon: (coretemp) Merge pkgtemp with coretemp")
    Signed-off-by: Guenter Roeck <[email protected]>

commit 7572ac3c979d4d0fb42d73a72d2608656516ff4f
Author: Ard Biesheuvel <[email protected]>
Date:   Wed Nov 30 17:37:17 2022 +0100

    arm64: efi: Revert "Recover from synchronous exceptions ..."

    This reverts commit 23715a26c8d81291, which introduced some code in
    assembler that manipulates both the ordinary and the shadow call stack
    pointer in a way that could potentially be taken advantage of. So let's
    revert it, and do a better job the next time around.

    Signed-off-by: Ard Biesheuvel <[email protected]>

commit d9f15a9de44affe733e34f93bc184945ba277e6d
Author: Conor Dooley <[email protected]>
Date:   Tue Nov 22 12:16:21 2022 +0000

    Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

    This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.

    On the subject of suspend, the RISC-V SBI spec states:

      This does not cover whether any given events actually reach the hart or
      not, just what the hart will do if it receives an event. On PolarFire
      SoC, and potentially other SiFive based implementations, events from the
      RISC-V timer do reach a hart during suspend. This is not the case for the
      implementation on the Allwinner D1 - there timer events are not received
      during suspend.

    To fix this, the CLOCK_EVT_FEAT_C3STOP (mis)feature was enabled for the
    timer driver - but this has broken both RCU stall detection and timers
    generally on PolarFire SoC and potentially other SiFive based
    implementations.

    If an AXI read to the PCIe controller on PolarFire SoC times out, the
    system will stall, however, with CLOCK_EVT_FEAT_C3STOP active, the system
    just locks up without RCU stalling:

    	io scheduler mq-deadline registered
    	io scheduler kyber registered
    	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
    	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: axi read request error
    	microchip-pcie 2000000000.pcie: axi read timeout
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	Freeing initrd memory: 7332K

    Similarly issues were reported with clock_nanosleep() - with a test app
    that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
    commit in place, the sleep times are rounded up to the next jiffy:

    == CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
    Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
    Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
    Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
    Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
    Samples: 521      Samples: 521      Samples: 521      Samples: 521

    Fortunately, the D1 has a second timer, which is "currently used in
    preference to the RISC-V/SBI timer driver" so a revert here does not
    hurt operation of D1 in its current form.

    Ultimately, a DeviceTree property (or node) will be added to encode the
    behaviour of the timers, but until then revert the addition of
    CLOCK_EVT_FEAT_C3STOP.

    Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
    Signed-off-by: Conor Dooley <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Palmer Dabbelt <[email protected]>
    Acked-by: Palmer Dabbelt <[email protected]>
    Acked-by: Samuel Holland <[email protected]>
    Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
    Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
    Link: https://lore.kernel.org/linux-riscv/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]

commit dd30dcfa7a74a06f8dcdab260d8d5adf32f17333
Author: Wenchao Chen <[email protected]>
Date:   Wed Nov 30 20:13:28 2022 +0800

    mmc: sdhci-sprd: Fix no reset data and command after voltage switch

    After switching the voltage, no reset data and command will cause
    CMD2 timeout.

    Fixes: 29ca763fc26f ("mmc: sdhci-sprd: Add pin control support for voltage switch")
    Signed-off-by: Wenchao Chen <[email protected]>
    Acked-by: Adrian Hunter <[email protected]>
    Reviewed-by: Baolin Wang <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>

commit 063c0e773ab33103407a76051821777014b756fe
Merge: ef4d3ea40565 f6abcc21d943
Author: Linus Torvalds <[email protected]>
Date:   Wed Nov 30 15:46:46 2022 -0800

    Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

    Pull clk fixes from Stephen Boyd:
     "A set of clk driver fixes that resolve issues for various SoCs.

      Most of these are incorrect clk data, like bad parent descriptions.
      When the clk tree is improperly described things don't work, like USB
      and UFS controllers, because clk frequencies are wonky. Here are the
      extra details:

       - Fix the parent of UFS reference clks on Qualcomm SC8280XP so that
         UFS works properly

       - Fix the clk ID for USB on AT91 RM9200 so the USB driver continues
         to probe

       - Stop using of_device_get_match_data() on the wrong device for a
         Samsung Exynos driver so it gets the proper clk data

       - Fix ExynosAutov9 binding

       - Fix the parent of the div4 clk on Exynos7885

       - Stop calling runtime PM APIs from the Qualcomm GDSC driver directly
         as it leads to a lockdep splat and is just plain wrong because it
         violates runtime PM semantics by calling runtime PM APIs when the
         device has been runtime PM disabled"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
      clk: qcom: gcc-sc8280xp: add cxo as parent for three ufs ref clks
      ARM: at91: rm9200: fix usb device clock id
      clk: samsung: Revert "clk: samsung: exynos-clkout: Use of_device_get_match_data()"
      dt-bindings: clock: exynosautov9: fix reference to CMU_FSYS1
      clk: qcom: gdsc: Remove direct runtime PM calls
      clk: samsung: exynos7885: Correct "div4" clock parents

commit 1d351f1894342c378b96bb9ed89f8debb1e24e9f
Author: Andrew Morton <[email protected]>
Date:   Mon Nov 28 13:21:40 2022 -0800

    revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"

    It causes build failures with unusual CC/HOSTCC combinations.

    Quoting
    https://lkml.kernel.org/r/[email protected]:

      HOSTCC  scripts/mod/modpost.o - due to target missing
    In file included from include/linux/string.h:5,
                     from scripts/mod/../../include/linux/license.h:5,
                     from scripts/mod/modpost.c:24:
    include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory
      246 | #include <asm/rwonce.h>
          |          ^~~~~~~~~~~~~~
    compilation terminated.

    ...

    The problem is that HOSTCC is not necessarily the same compiler or even
    architecture as CC and pulling in <linux/compiler.h> or <asm/rwonce.h>
    files indirectly isn't a good idea then.

    My toolchain is providing HOSTCC = gcc (MacPorts) and CC = arm-linux-gnueabihf
    (built from gcc source) and all running on Darwin.

    If I change the include to <string.h> I can then "HOSTCC scripts/mod/modpost.c"
    but then it fails for "CC kernel/module/main.c" not finding <string.h>:

      CC      kernel/module/main.o - due to target missing
    In file included from kernel/module/main.c:43:0:
    ./include/linux/license.h:5:20: fatal error: string.h: No such file or directory
     #include <string.h>
                        ^
    compilation terminated.

    Reported-by: "H. Nikolaus Schaller" <[email protected]>
    Cc: Sam James <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 152fe65f300e1819d59b80477d3e0999b4d5d7d2
Author: Lee Jones <[email protected]>
Date:   Fri Nov 25 12:07:50 2022 +0000

    Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled

    When enabled, KASAN enlarges function's stack-frames.  Pushing quite a few
    over the current threshold.  This can mainly be seen on 32-bit
    architectures where the present limit (when !GCC) is a lowly 1024-Bytes.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Cc: Leo Li <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Nathan Chancellor <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: "Pan, Xinhui" <[email protected]>
    Cc: Rodrigo Siqueira <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: Tom Rix <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 6f6cb1714365a07dbc66851879538df9f6969288
Author: Lee Jones <[email protected]>
Date:   Fri Nov 25 12:07:49 2022 +0000

    drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame

    Patch series "Fix a bunch of allmodconfig errors", v2.

    Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST")
    WERROR now defaults to COMPILE_TEST meaning that it's enabled for
    allmodconfig builds.  This leads to some interesting build failures when
    using Clang, each resolved in this set.

    With this set applied, I am able to obtain a successful allmodconfig Arm
    build.

    This patch (of 2):

    calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
    ARM64) architectures built with Clang (all released versions), whereby the
    stack frame gets blown up to well over 5k.  This would cause an immediate
    kernel panic on most architectures.  We'll revert this when the following
    bug report has been resolved:
    https://github.com/llvm/llvm-project/issues/41896.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Suggested-by: Arnd Bergmann <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Cc: Lee Jones <[email protected]>
    Cc: Leo Li <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Nathan Chancellor <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: "Pan, Xinhui" <[email protected]>
    Cc: Rodrigo Siqueira <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: Tom Rix <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit f268f6cf875f3220afc77bdd0bf1bb136eb54db9
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:14 2022 +0100

    mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths

    Any codepath that zaps page table entries must invoke MMU notifiers to
    ensure that secondary MMUs (like KVM) don't keep accessing pages which
    aren't mapped anymore.  Secondary MMUs don't hold their own references to
    pages that are mirrored over, so failing to notify them can lead to page
    use-after-free.

    I'm marking this as addressing an issue introduced in commit f3f0e1d2150b
    ("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of
    the security impact of this only came in commit 27e1f8273113 ("khugepaged:
    enable collapse pmd for pte-mapped THP"), which actually omitted flushes
    for the removal of present PTEs, not just for the removal of empty page
    tables.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Jann Horn <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 2ba99c5e08812494bc57f319fb562f527d9bacd8
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:13 2022 +0100

    mm/khugepaged: fix GUP-fast interaction by sending IPI

    Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
    collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
    ensure that the page table was not removed by khugepaged in between.

    However, lockless_pages_from_mm() still requires that the page table is
    not concurrently freed.  Fix it by sending IPIs (if the architecture uses
    semi-RCU-style page table freeing) before freeing/reusing page tables.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: ba76149f47d8 ("thp: khugepaged")
    Signed-off-by: Jann Horn <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 8d3c106e19e8d251da31ff4cc7462e4565d65084
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:12 2022 +0100

    mm/khugepaged: take the right locks for page table retraction

    pagetable walks on address ranges mapped by VMAs can be done under the
    mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
    VMA's address_space.  Only one of these needs to be held, and it does not
    need to be held in exclusive mode.

    Under those circumstances, the rules for concurrent access to page table
    entries are:

     - Terminal page table entries (entries that don't point to another page
       table) can be arbitrarily changed under the page table lock, with the
       exception that they always need to be consistent for
       hardware page table walks and lockless_pages_from_mm().
       This includes that they can be changed into non-terminal entries.
     - Non-terminal page table entries (which point to another page table)
       can not be modified; readers are allowed to READ_ONCE() an entry, verify
       that it is non-terminal, and then assume that its value will stay as-is.

    Retracting a page table involves modifying a non-terminal entry, so
    page-table-level locks are insufficient to protect against concurrent page
    table traversal; it requires taking all the higher-level locks under which
    it is possible to start a page walk in the relevant range in exclusive
    mode.

    The collapse_huge_page() path for anonymous THP already follows this rule,
    but the shmem/file THP path was getting it wrong, making it possible for
    concurrent rmap-based operations to cause corruption.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
    Signed-off-by: Jann Horn <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 829ae0f81ce093d674ff2256f66a714753e9ce32
Author: Gavin Shan <[email protected]>
Date:   Thu Nov 24 17:55:23 2022 +0800

    mm: migrate: fix THP's mapcount on isolation

    The issue is reported when removing memory through virtio_mem device.  The
    transparent huge page, experienced copy-on-write fault, is wrongly
    regarded as pinned.  The transparent huge page is escaped from being
    isolated in isolate_migratepages_block().  The transparent huge page can't
    be migrated and the corresponding memory block can't be put into offline
    state.

    Fix it by replacing page_mapcount() with total_mapcount().  With this, the
    transparent huge page can be isolated and migrated, and the memory block
    can be put into offline state.  Besides, The page's refcount is increased
    a bit earlier to avoid the page is released when the check is executed.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
    Signed-off-by: Gavin Shan <[email protected]>
    Reported-by: Zhenyu Zhang <[email protected]>
    Tested-by: Zhenyu Zhang <[email protected]>
    Suggested-by: David Hildenbrand <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Alistair Popple <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Matthew Wilcox <[email protected]>
    Cc: William Kucharski <[email protected]>
    Cc: Zi Yan <[email protected]>
    Cc: <[email protected]>	[5.7+]
    Signed-off-by: Andrew Morton <[email protected]>

commit 4aaf269c768dbacd6268af73fda2ffccaa3f1d88
Author: Juergen Gross <[email protected]>
Date:   Wed Nov 23 07:45:10 2022 +0100

    mm: introduce arch_has_hw_nonleaf_pmd_young()

    When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add
    CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
    pmdp_test_and_clear_young():

     BUG: unable to handle page fault for address: ffff8880083374d0
     #PF: supervisor write access in kernel mode
     #PF: error_code(0x0003) - permissions violation
     PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
     Oops: 0003 [#1] PREEMPT SMP NOPTI
     CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
     RIP: e030:pmdp_test_and_clear_young+0x25/0x40

    This happens because the Xen hypervisor can't emulate direct writes to
    page table entries other than PTEs.

    This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
    similar to arch_has_hw_pte_young() and test that instead of
    CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
    Signed-off-by: Juergen Gross <[email protected]>
    Reported-by: Sander Eikelenboom <[email protected]>
    Acked-by: Yu Zhao <[email protected]>
    Tested-by: Sander Eikelenboom <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>	[core changes]
    Signed-off-by: Andrew Morton <[email protected]>

commit 6617da8fb565445e0be4a4885443006374943d09
Author: Juergen Gross <[email protected]>
Date:   Wed Nov 30 14:49:41 2022 -0800

    mm: add dummy pmd_young() for architectures not having it

    In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
    fallback.  This is required for the later patch "mm: introduce
    arch_has_hw_nonleaf_pmd_young()".

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Juergen Gross <[email protected]>
    Acked-by: Yu Zhao <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Geert Uytterhoeven <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Sander Eikelenboom <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 95bc35f9bee5220dad4e8567654ab3288a181639
Author: SeongJae Park <[email protected]>
Date:   Tue Nov 22 19:48:31 2022 +0000

    mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()

    Commit da87878010e5 ("mm/damon/sysfs: support online inputs update") made
    'damon_sysfs_set_schemes()' to be called for running DAMON context, which
    could have schemes.  In the case, DAMON sysfs interface is supposed to
    update, remove, or add schemes to reflect the sysfs files.  However, the
    code is assuming the DAMON context wouldn't have schemes at all, and
    therefore creates and adds new schemes.  As a result, the code doesn't
    work as intended for online schemes tuning and could have more than
    expected memory footprint.  The schemes are all in the DAMON context, so
    it doesn't leak the memory, though.

    Remove the wrong asssumption (the DAMON context wouldn't have schemes) in
    'damon_sysfs_set_schemes()' to fix the bug.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
    Signed-off-by: SeongJae Park <[email protected]>
    Cc: <[email protected]>	[5.19+]
    Signed-off-by: Andrew Morton <[email protected]>

commit a435874bf626f55d7147026b059008c8de89fbb8
Author: Tiezhu Yang <[email protected]>
Date:   Sat Nov 19 10:36:59 2022 +0800

    tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"

    The latest version of grep claims the egrep is now obsolete so the build
    now contains warnings that look like:

    	egrep: warning: egrep is obsolescent; using grep -E

    fix this up by moving the related file to use "grep -E" instead.

      sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`

    Here are the steps to install the latest grep:

      wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
      tar xf grep-3.8.tar.gz
      cd grep-3.8 && ./configure && make
      sudo make install
      export PATH=/usr/local/bin:$PATH

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Tiezhu Yang <[email protected]>
    Reviewed-by: Sergey Senozhatsky <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit f0a0ccda18d6fd826d7c7e7ad48a6ed61c20f8b4
Author: ZhangPeng <[email protected]>
Date:   Sat Nov 19 21:05:42 2022 +0900

    nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()

    Syzbot reported a null-ptr-deref bug:

     NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
     frequency < 30 seconds
     general protection fault, probably for non-canonical address
     0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
     KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
     CPU: 1…
earlAchromatic referenced this pull request in earlAchromatic/linux Oct 4, 2023
* Merge tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Fix a double unlock bug on an error path in ext4, found by smatch and
  syzkaller"

* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix possible double unlock when moving a directory

* Merge tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "There's a little bit more 'movement' in there for my taste but it
  needs to happen and should make the code better after it.

   - Check cmdline_find_option()'s return value before further
     processing

   - Clear temporary storage in the resctrl code to prevent access to an
     unexistent MSR

   - Add a simple throttling mechanism to protect the hypervisor from
     potentially malicious SEV guests issuing requests in rapid
     succession.

     In order to not jeopardize the sanity of everyone involved in
     maintaining this code, the request issuing side has received a
     cleanup, split in more or less trivial, small and digestible
     pieces. Otherwise, the code was threatening to become an
     unmaintainable mess.

     Therefore, that cleanup is marked indirectly also for stable so
     that there's no differences between the upstream code and the
     stable variant when it comes down to backporting more there"

* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use of uninitialized buffer in sme_enable()
  x86/resctrl: Clear staged_config[] before and after it is used
  virt/coco/sev-guest: Add throttling awareness
  virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
  virt/coco/sev-guest: Do some code style cleanups
  virt/coco/sev-guest: Carve out the request issuing logic into a helper
  virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
  virt/coco/sev-guest: Simplify extended guest request handling
  virt/coco/sev-guest: Check SEV_SNP attribute at probe time

* Merge tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Check whether sibling events have been deactivated before adding them
   to groups

 - Update the proper event time tracking variable depending on the event
   type

 - Fix a memory overwrite issue due to using the wrong function argument
   when outputting perf events

* tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix check before add_event_to_groups() in perf_group_detach()
  perf: fix perf_event_context->time
  perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output

* xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags

Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space.  There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.

This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled.  I observed
the following sequence as part of an xfs_dir_createname call:

0. Fragment the free space, then allocate nearly all the free space in
   all AGs except AG 0.

1. Create a directory in AG 2 and let it grow for a while.

2. Try to allocate 2 blocks to expand the dirent part of a directory.
   The space will be allocated out of AG 0, but the allocation will not
   be contiguous.  This (I think) activates the LOWMODE allocator.

3. The bmapi call decides to convert from extents to bmbt format and
   tries to allocate 1 block.  This allocation request calls
   xfs_alloc_vextent_start_ag with the inode number, which starts the
   scan at AG 2.  We ignore AG 0 (with all its free space) and instead
   scrape AG 2 and 3 for more space.  We find one block, but this now
   kicks t_highest_agno to 3.

4. The createname call decides it needs to split the dabtree.  It tries
   to allocate even more space with xfs_alloc_vextent_start_ag, but now
   we're constrained to AG 3, and we don't find the space.  The
   createname returns ENOSPC and the filesystem shuts down.

This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds.  Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* xfs: add tracepoints for each of the externally visible allocators

There are now five separate space allocator interfaces exposed to the
rest of XFS for five different strategies to find space.  Add
tracepoints for each of them so that I can tell from a trace dump
exactly which ones got called and what happened underneath them.  Add a
sixth so it's more obvious if an allocation actually happened.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* xfs: test dir/attr hash when loading module

Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
ext4 against generic/454.  The cause of this test failure was the
unfortunate combination of setting an xattr name containing UTF8 encoded
emoji, an xattr hash function that accepted a char pointer with no
explicit signedness, signed type extension of those chars to an int, and
the 6.2 build tools maintainers deciding to mandate -funsigned-char
across the board.  As a result, the ondisk extended attribute structure
written out by 6.1 and 6.2 were not the same.

This discrepancy, in fact, had been noticeable if a filesystem with such
an xattr were moved between any two architectures that don't employ the
same signedness of a raw "char" declaration.  The only reason anyone
noticed is that x86 gcc defaults to signed, and no such -funsigned-char
update was made to e2fsprogs, so e2fsck immediately started reporting
data corruption.

After a day and a half of discussing how to handle this use case (xattrs
with bit 7 set anywhere in the name) without breaking existing users,
Linus merged his own patch and didn't tell the maintainer.  None of the
ext4 developers realized this until AUTOSEL announced that the commit
had been backported to stable.

In the end, this problem could have been detected much earlier if there
had been any useful tests of hash function(s) in use inside ext4 to make
sure that they always produce the same outputs given the same inputs.

The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
vulnerable to this problem.  However, let's avoid all this drama by
adding our own self test to check that the da hash produces the same
outputs for a static pile of inputs on various platforms.  This enables
us to fix any breakage that may result in a controlled fashion.  The
buffer and test data are identical to the patches submitted to xfsprogs.

Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* Merge tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fix from Borislav Petkov:

 - Flush out logged errors immediately after MCA banks configuration
   changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached

* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make sure logged MCEs are processed after sysfs update

* cpumask: introduce for_each_cpu_or

Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* pcpcntrs: fix dying cpu summation race

In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all
interface") a race condition between a cpu dying and
percpu_counter_sum() iterating online CPUs was identified. The
solution was to iterate all possible CPUs for summation via
percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this
same race condition and it fired a debug assert because the
filesystem was unmounting and the counter *should* be zero just
before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/[email protected]/

likely as a result of running generic/648 which exercises
filesystems in the presence of CPU online/offline events.

The solution to use percpu_counter_sum_all() is an awful one. We
use percpu counters and percpu_counter_sum() for accurate and
reliable threshold detection for space management, so a summation
race condition during these operations can result in overcommit of
available space and that may result in filesystem shutdowns.

As percpu_counter_sum_all() iterates all possible CPUs rather than
just those online or even those present, the mask can include CPUs
that aren't even installed in the machine, or in the case of
machines that can hot-plug CPU capable nodes, even have physical
sockets present in the machine.

Fundamentally, this race condition is caused by the CPU being
offlined being removed from the cpu_online_mask before the notifier
that cleans up per-cpu state is run. Hence percpu_counter_sum() will
not sum the count for a cpu currently being taken offline,
regardless of whether the notifier has run or not. This is
the root cause of the bug.

The percpu counter notifier iterates all the registered counters,
locks the counter and moves the percpu count to the global sum.
This is serialised against other operations that move the percpu
counter to the global sum as well as percpu_counter_sum() operations
that sum the percpu counts while holding the counter lock.

Hence the notifier is safe to run concurrently with sum operations,
and the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do,
and when there are no CPUs dying, it has no addition overhead except
for a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in
the presence of CPU hot unplug events and makes
percpu_counter_sum_all() unnecessary. This, in turn, means that
filesystems like XFS, ext4, and btrfs don't have to work out when
they should use percpu_counter_sum() vs percpu_counter_sum_all() in
their space accounting algorithms

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* fork: remove use of percpu_counter_sum_all

This effectively reverts the change made in commit f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") as the
race condition percpu_counter_sum_all() was invented to avoid is
now handled directly in percpu_counter_sum() and nobody needs to
care about summing racing with cpu unplug anymore.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* pcpcntr: remove percpu_counter_sum_all()

percpu_counter_sum_all() is now redundant as the race condition it
was invented to handle is now dealt with by percpu_counter_sum()
directly and all users of percpu_counter_sum_all() have been
removed.

Remove it.

This effectively reverts the changes made in f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") except for
the cpumask iteration that fixes percpu_counter_sum() made earlier
in this series.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* Merge tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are a few small char/misc/other driver subsystem patches to
  resolve reported problems for 6.3-rc3.

  Included in here are:

   - Interconnect driver fixes for reported problems

   - Memory driver fixes for reported problems

   - nvmem core fix

   - firmware driver fix for reported problem

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (23 commits)
  memory: tegra30-emc: fix interconnect registration race
  memory: tegra20-emc: fix interconnect registration race
  memory: tegra124-emc: fix interconnect registration race
  memory: tegra: fix interconnect registration race
  interconnect: exynos: drop redundant link destroy
  interconnect: exynos: fix registration race
  interconnect: exynos: fix node leak in probe PM QoS error path
  interconnect: qcom: msm8974: fix registration race
  interconnect: qcom: rpmh: fix registration race
  interconnect: qcom: rpmh: fix probe child-node error handling
  interconnect: qcom: rpm: fix registration race
  nvmem: core: return -ENOENT if nvmem cell is not found
  firmware: xilinx: don't make a sleepable memory allocation from an atomic context
  interconnect: qcom: rpm: fix probe child-node error handling
  interconnect: qcom: osm-l3: fix registration race
  interconnect: imx: fix registration race
  interconnect: fix provider registration API
  interconnect: fix icc_provider_del() error handling
  interconnect: fix mem leak when freeing nodes
  interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
  ...

* Merge tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 6.3-rc3 to resolve
  some reported issues.

  They include:

   - 8250 driver Kconfig issue pointed out by you that showed up in -rc1

   - qcom-geni serial driver fixes

   - various 8250 driver fixes for reported problems

   - fsl_lpuart driver fixes

   - serdev fix for regression in -rc1

   - vt.c bugfix

  All have been in linux-next for over a week with no reported problems"

* tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: vt: protect KD_FONT_OP_GET_TALL from unbound access
  serial: qcom-geni: drop bogus uart_write_wakeup()
  serial: qcom-geni: fix mapping of empty DMA buffer
  serial: qcom-geni: fix DMA mapping leak on shutdown
  serial: qcom-geni: fix console shutdown hang
  serdev: Set fwnode for serdev devices
  tty: serial: fsl_lpuart: fix race on RX DMA shutdown
  serial: 8250_pci1xxxx: Disable SERIAL_8250_PCI1XXXX config by default
  serial: 8250_fsl: fix handle_irq locking
  serial: 8250_em: Fix UART port type
  serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it
  tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted
  Revert "tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency"

* tracing: Make splice_read available again

Since the commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.

This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: [email protected]
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* ring-buffer: remove obsolete comment for free_buffer_page()

The comment refers to mm/slob.c which is being removed. It comes from
commit ed56829cb319 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit e4c2ce82ca27 ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Reported-by: Mike Rapoport <[email protected]>
Suggested-by: Steven Rostedt (Google) <[email protected]>
Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr

There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.

Test case:

  # cd /sys/kernel/tracing
  # echo 0 > tracing_on
  # echo round-robin > hwlat_detector/mode
  # echo hwlat > current_tracer
  # unshare --fork --pid bash -c 'echo 1 > tracing_on'
  # dmesg -c

Actual behavior:

[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Masami Hiramatsu <[email protected]>
Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <[email protected]>
Acked-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* Merge tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix setting affinity of hwlat threads in containers

   Using sched_set_affinity() has unwanted side effects when being
   called within a container. Use set_cpus_allowed_ptr() instead

 - Fix per cpu thread management of the hwlat tracer:
    - Do not start per_cpu threads if one is already running for the CPU
    - When starting per_cpu threads, do not clear the kthread variable
      as it may already be set to running per cpu threads

 - Fix return value for test_gen_kprobe_cmd()

   On error the return value was overwritten by being set to the result
   of the call from kprobe_event_delete(), which would likely succeed,
   and thus have the function return success

 - Fix splice() reads from the trace file that was broken by commit
   36e2c7421f02 ("fs: don't allow splice read/write without explicit
   ops")

 - Remove obsolete and confusing comment in ring_buffer.c

   The original design of the ring buffer used struct page flags for
   tricks to optimize, which was shortly removed due to them being
   tricks. But a comment for those tricks remained

 - Set local functions and variables to static

* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
  ring-buffer: remove obsolete comment for free_buffer_page()
  tracing: Make splice_read available again
  ftrace: Set direct_ops storage-class-specifier to static
  trace/hwlat: Do not start per-cpu thread if it is already running
  trace/hwlat: Do not wipe the contents of per-cpu thread data
  tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
  tracing: Fix wrong return in kprobe_event_gen_test.c

* Linux 6.3-rc3

* thunderbolt: Use const qualifier for `ring_interrupt_index`

`ring_interrupt_index` doesn't change the data for `ring` so mark it as
const. This is needed by the following patch that disables interrupt
auto clear for rings.

Cc: Sanju Mehta <[email protected]>
Cc: [email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* thunderbolt: Disable interrupt auto clear for rings

When interrupt auto clear is programmed, any read to the interrupt
status register will clear all interrupts.  If two interrupts have
come in before one can be serviced then this will cause lost interrupts.

On AMD USB4 routers this has manifested in odd problems particularly
with long strings of control tranfers such as reading the DROM via bit
banging.

Instead of clearing interrupts automatically, clear the bit corresponding
to the given ring's interrupt in the ISR.

Fixes: 7a1808f82a37 ("thunderbolt: Handle ring interrupt by reading interrupt status register")
Cc: Sanju Mehta <[email protected]>
Cc: [email protected]
Tested-by: Anson Tsao <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* drm/i915/mtl: Fix Wa_16015201720 implementation

The commit 2357f2b271ad ("drm/i915/mtl: Initial display workarounds")
extended the workaround Wa_16015201720 to MTL. However the registers
that the original WA implemented moved for MTL.

Implement the workaround with the correct register.

v3: Skip clock gating for pipe C, D DMC's and fix the title

Fixes: 2357f2b271ad ("drm/i915/mtl: Initial display workarounds")
Cc: Matt Atwood <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Radhakrishna Sripada <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 0188be507b973e36f637ba010a369057c8cb7282)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/fbdev: lock the fbdev obj before vma pin

lock the fbdev obj before calling into
i915_vma_pin_iomap(). This helps to solve below :

<7>[   93.563308] i915 0000:00:02.0: [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
<4>[   93.581844] ------------[ cut here ]------------
<4>[   93.581855] WARNING: CPU: 12 PID: 625 at drivers/gpu/drm/i915/gem/i915_gem_pages.c:424 i915_gem_object_pin_map+0x152/0x1c0 [i915]

Fixes: f0b6b01b3efe ("drm/i915: Add ww context to intel_dpt_pin, v2.")
Cc: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Radhakrishna Sripada <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 561b31acfd65502a2cda2067513240fc57ccdbdc)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Preserve crtc_state->inherited during state clearing

intel_crtc_prepare_cleared_state() is unintentionally losing
the "inherited" flag. This will happen if intel_initial_commit()
is forced to go through the full modeset calculations for
whatever reason.

Afterwards the first real commit from userspace will not get
forced to the full modeset path, and thus eg. audio state may
not get recomputed properly. So if the monitor was already
enabled during boot audio will not work until userspace itself
does an explicit full modeset.

Cc: [email protected]
Tested-by: Lee Shawn C <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
(cherry picked from commit 2553bacaf953b48c59357f5a622282bc0c45adae)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/mtl: Disable MC6 for MTL A step

The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508

Fixes: 8f70f1ec587d ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Badal Nilawar <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 038a24835ab68f341eaa7a0e3bcc6ce0f9b22e17)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/guc: Fix missing ecodes

Error captures are tagged with an 'ecode'. This is a pseduo-unique magic
number that is meant to distinguish similar seeming bugs with
different underlying signatures. It is a combination of two ring state
registers. Unfortunately, the register state being used is only valid
in execlist mode. In GuC mode, the register state exists in a separate
list of arbitrary register address/value pairs rather than the named
entry structure. So, search through that list to find the two exciting
registers and copy them over to the structure's named members.

v2: if else if instead of if if (Alan)

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
Cc: Alan Previn <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>
Cc: Michael Cheng <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Bruce Chang <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Cc: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 9724ecdbb9ddd6da3260e4a442574b90fc75188a)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/active: Fix missing debug object activation

debug_active_activate() expected ref->count to be zero
which is not true anymore as __i915_active_activate() calls
debug_active_activate() after incrementing the count.

v2: No need to check for "ref->count == 1" as __i915_active_activate()
already make sure of that(Janusz).

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6733
Fixes: 04240e30ed06 ("drm/i915: Skip taking acquire mutex for no ref->active callback")
Cc: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: [email protected]
Cc: Janusz Krzysztofik <[email protected]>
Cc: <[email protected]> # v5.10+
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Janusz Krzysztofik <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit bfad380c542438a9b642f8190b7fd37bc77e2723)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/gt: perform uc late init after probe error injection

Probe pseudo errors should be injected only in places where real errors
can be encountered, otherwise unwinding code can be broken.
Placing intel_uc_init_late before i915_inject_probe_error violated
this rule, resulting in following bug:
__intel_gt_disable:655 GEM_BUG_ON(intel_gt_pm_is_awake(gt))

Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
Acked-by: Nirmoy Das <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c4252a11131c7f27a158294241466e2a4e7ff94e)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Fix format for perf_limit_reasons

Use hex format so that it is easier to decode.

Fixes: fe5979665f64 ("drm/i915/debugfs: Add perf_limit_reasons in debugfs")
Signed-off-by: Vinay Belgaumkar <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 5e008ba67cb80084e99b40ccd46f9029ae421632)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Update vblank timestamping stuff on seamless M/N change

When we change the M/N values seamlessly during a fastset we should
also update the vblank timestamping stuff to make sure the vblank
timestamp corrections/guesstimations come out exact.

Note that only crtc_clock and framedur_ns can actually end up
changing here during fastsets. Everything else we touch can
only change during full modesets.

Technically we should try to do this exactly at the start of
vblank, but that would require some kind of double buffering
scheme. Let's skip that for now and just update things right
after the commit has been submitted to the hardware. This
means the information will be properly up to date when the
vblank irq handler goes to work. Only if someone ends up
querying some vblanky stuff in between the commit and start
of vblank may we see a slight discrepancy.

Also this same problem really exists for the DRRS downclocking
stuff. But as that is supposed to be more or less transparent
to the user, and it only drops to low gear after a long delay
(1 sec currently) we probably don't have to worry about it.
Any time something is actively submitting updates DRRS will
remain in high gear and so the timestamping constants will
match the hardware state.

Reviewed-by: Jani Nikula <[email protected]>
Reviewed-by: Mitul Golani <[email protected]>
Fixes: e6f29923c048 ("drm/i915: Allow M/N change during fastset on bdw+")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8cb1f95cca68421b08333175719fdd3615372ca8)
Signed-off-by: Jani Nikula <[email protected]>

* net: dsa: report rx_bytes unadjusted for ETH_HLEN

We collect the software statistics counters for RX bytes (reported to
/proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when
skb->len has already been adjusted by the eth_type_trans() ->
skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header.

This means that when connecting 2 DSA interfaces back to back and
sending 1 packet with length 100, the sending interface will report
tx_bytes as incrementing by 100, and the receiving interface will report
rx_bytes as incrementing by 86.

Since accounting for that in scripts is quirky and is something that
would be DSA-specific behavior (requiring users to know that they are
running on a DSA interface in the first place), the proposal is that we
treat it as a bug and fix it.

This design bug has always existed in DSA, according to my analysis:
commit 91da11f870f0 ("net: Distributed Switch Architecture protocol
support") also updates skb->dev->stats.rx_bytes += skb->len after the
eth_type_trans() call. Technically, prior to Florian's commit
a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and
every vendor-specific tagging protocol driver open-coded the same bug,
until the buggy code was consolidated into something resembling what can
be seen now. So each and every driver should have its own Fixes: tag,
because of their different histories until the convergence point.
I'm not going to do that, for the sake of simplicity, but just blame the
oldest appearance of buggy code.

There are 2 ways to fix the problem. One is the obvious way, and the
other is how I ended up doing it. Obvious would have been to move
dev_sw_netstats_rx_add() one line above eth_type_trans(), and below
skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that.
We count the bytes after removing everything DSA-related from the
packet, to emulate what the packet's length was, on the wire, when the
user port received it.

When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet,
so in case the switch driver requests this behavior - commit
412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the
details - the obvious variant of the fix wouldn't have worked, because
the positioning there would have also counted the not-yet-stripped VLAN
header length, something which is absent from the packet as seen on the
wire (there it may be untagged, whereas software will see it as
PVID-tagged).

Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net: qcom/emac: Fix use after free bug in emac_remove due to race condition

In emac_probe, &adpt->work_thread is bound with
emac_work_thread. Then it will be started by timeout
handler emac_tx_timeout or a IRQ handler emac_isr.

If we remove the driver which will call emac_remove
  to make cleanup, there may be a unfinished work.

The possible sequence is as follows:

Fix it by finishing the work before cleanup in the emac_remove
and disable timeout response.

CPU0                  CPU1

                    |emac_work_thread
emac_remove         |
free_netdev         |
kfree(netdev);      |
                    |emac_reinit_locked
                    |emac_mac_down
                    |//use netdev
Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver")
Signed-off-by: Zheng Wang <[email protected]>

Signed-off-by: David S. Miller <[email protected]>

* net: usb: lan78xx: Limit packet length to skb->len

Packet length retrieved from descriptor may be larger than
the actual socket buffer length. In such case the cloned
skb passed up the network stack will leak kernel memory contents.

Additionally prevent integer underflow when size is less than
ETH_FCS_LEN.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Szymon Heidrich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* usb: plusb: remove unused pl_clear_QuickLink_features function

clang with W=1 reports
drivers/net/usb/plusb.c:65:1: error:
  unused function 'pl_clear_QuickLink_features' [-Werror,-Wunused-function]
pl_clear_QuickLink_features(struct usbnet *dev, int val)
^
This static function is not used, so remove it.

Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net/ps3_gelic_net: Fix RX sk_buff length

The Gelic Ethernet device needs to have the RX sk_buffs aligned to
GELIC_NET_RXBUF_ALIGN, and also the length of the RX sk_buffs must
be a multiple of GELIC_NET_RXBUF_ALIGN.

The current Gelic Ethernet driver was not allocating sk_buffs large
enough to allow for this alignment.

Also, correct the maximum and minimum MTU sizes, and add a new
preprocessor macro for the maximum frame size, GELIC_NET_MAX_FRAME.

Fixes various randomly occurring runtime network errors.

Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3")
Signed-off-by: Geoff Levand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net/ps3_gelic_net: Use dma_mapping_error

The current Gelic Etherenet driver was checking the return value of its
dma_map_single call, and not using the dma_mapping_error() routine.

Fixes runtime problems like these:

  DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error
  WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc

Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3")
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Geoff Levand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* Merge branch 'ps3_gelic_net-fixes'

Geoff Levand says:

====================
net/ps3_gelic_net: DMA related fixes

v9: Make rx_skb_size local to gelic_descr_prepare_rx.
v8: Add more cpu_to_be32 calls.
v7: Remove all cleanups, sync to spider net.
v6: Reworked and cleaned up patches.
v5: Some additional patch cleanups.
v4: More patch cleanups.
v3: Cleaned up patches as requested.
====================

Signed-off-by: David S. Miller <[email protected]>

* Revert "drm/i915/hwmon: Enable PL1 power limit"

This reverts commit ee892ea83d99610fa33bea612de058e0955eec3a.

It was accidentally picked up for backporting. Revert.

Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

* thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit

cppcheck reports
drivers/thunderbolt/nhi.c:74:7: style: Local variable 'bit' shadows outer variable [shadowVariable]
  int bit;
      ^
drivers/thunderbolt/nhi.c:66:6: note: Shadowed declaration
 int bit = ring_interrupt_index(ring) & 31;
     ^
drivers/thunderbolt/nhi.c:74:7: note: Shadow variable
  int bit;
      ^
For readablity rename the outer to interrupt_bit and the innner
to auto_clear_bit.

Fixes: 468c49f44759 ("thunderbolt: Disable interrupt auto clear for ring")
Cc: [email protected]
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Fix /proc/PID/io read_bytes accounting

 - Fix setting NLM file_lock start and end during decoding testargs

 - Fix timing for setting access cache timestamps

* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Correct timing for assigning access cache timestamp
  lockd: set file_lock start and end when decoding nlm4 testargs
  NFS: Fix /proc/PID/io read_bytes for buffered reads

* ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG

The Acer Aspire 3830TG predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.

Add a DMI quirk to use the native backlight interface which does
work properly.

Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

* gpu: host1x: fix uninitialized variable use

The error handling for platform_get_irq() failing no longer works after
a recent change, clang now points this out with a warning:

  drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized]
          if (syncpt_irq < 0)
              ^~~~~~~~~~

Fix this by removing the variable and checking the correct error status.

Fixes: 625d4ffb438c ("gpu: host1x: Rewrite syncpoint interrupt handling")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Jon Hunter <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Mikko Perttunen <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

* zonefs: Prevent uninitialized symbol 'size' warning

In zonefs_file_dio_append(), initialize the variable size to 0 to
prevent compilation and static code analizers warning such as:

New smatch warnings:
fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized
symbol 'size'.

The warning is a false positive as size is never actually used
uninitialized.

No functional change.

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>

* zonefs: Fix error message in zonefs_file_dio_append()

Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.

Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: [email protected]
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>

* Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt fix from Eric Biggers:
 "Fix a bug where when a filesystem was being unmounted, the fscrypt
  keyring was destroyed before inodes have been released by the Landlock
  LSM.

  This bug was found by syzbot"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
  fscrypt: improve fscrypt_destroy_keyring() documentation
  fscrypt: destroy keyring after security_sb_delete()

* Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:
 "Fix two significant performance issues with fsverity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
  fsverity: Remove WQ_UNBOUND from fsverity read workqueue

* block/io_uring: pass in issue_flags for uring_cmd task_work handling

io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.

Cc: [email protected]
Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <[email protected]>

* io_uring/net: avoid sending -ECONNABORTED on repeated connection requests

Since io_uring does nonblocking connect requests, if we do two repeated
ones without having a listener, the second will get -ECONNABORTED rather
than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry
condition if we're nonblocking, if we haven't already seen it.

Cc: [email protected]
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/828
Reported-by: Hui, Chunyang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

* octeontx2-vf: Add missing free for alloc_percpu

Add the free_percpu for the allocated "vf->hw.lmt_info" in order to avoid
memory leak, same as the "pf->hw.lmt_info" in
`drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c`.

Fixes: 5c0512072f65 ("octeontx2-pf: cn10k: Use runtime allocated LMTLINE region")
Signed-off-by: Jiasheng Jiang <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Acked-by: Geethasowjanya Akula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

* drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F

Like the Windows Lenovo Yoga Book X91F/L the Android Lenovo Yoga Book
X90F/L has a portrait 1200x1920 screen used in landscape mode,
add a quirk for this.

When the quirk for the X91F/L was initially added it was written to
also apply to the X90F/L but this does not work because the Android
version of the Yoga Book uses completely different DMI strings.
Also adjust the X91F/L quirk to reflect that it only applies to
the X91F/L models.

Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

* entry: Fix noinstr warning in __enter_from_user_mode()

__enter_from_user_mode() is triggering noinstr warnings with
CONFIG_DEBUG_PREEMPT due to its call of preempt_count_add() via
ct_state().

The preemption disable isn't needed as interrupts are already disabled.
And the context_tracking_enabled() check in ct_state() also isn't needed
as that's already being done by the CT_WARN_ON().

Just use __ct_state() instead.

Fixes the following warnings:

  vmlinux.o: warning: objtool: enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0xf9: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xc7: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section

Fixes: 171476775d32 ("context_tracking: Convert state to atomic_t")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/d8955fa6d68dc955dda19baf13ae014ae27926f5.1677369694.git.jpoimboe@kernel.org

* sched/fair: Sanitize vruntime of entity being migrated

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflowing bug, but ignore a case that se->exec_start is reset
after a migration.

For fixing this case, we delay the reset of se->exec_start after
placing the entity which se->exec_start to detect long sleeping task.

In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.

Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Originally-by: Zhang Qiao <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Qiao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* perf/x86/amd/core: Always clear status for idx

The variable 'status' (which contains the unhandled overflow bits) is
not being properly masked in some cases, displaying the following
warning:

  WARNING: CPU: 156 PID: 475601 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x216/0x270

This seems to be happening because the loop is being continued before
the status bit being unset, in case x86_perf_event_set_period()
returns 0. This is also causing an inconsistency because the "handled"
counter is incremented, but the status bit is not cleaned.

Move the bit cleaning together above, together when the "handled"
counter is incremented.

Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up

RCU sometimes needs to perform a delayed wake up for specific kthreads
handling offloaded callbacks (RCU_NOCB).  These wakeups are performed
by timers and upon entry to idle (also to guest and to user on nohz_full).

However the delayed wake-up on kernel exit is actually performed after
the thread flags are fetched towards the fast path check for work to
do on exit to user. As a result, and if there is no other pending work
to do upon that kernel exit, the current task will resume to userspace
with TIF_RESCHED set and the pending wake up ignored.

Fix this with fetching the thread flags _after_ the delayed RCU-nocb
kthread wake-up.

Fixes: 47b8ff194c1f ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* efi/libstub: zboot: Add compressed image to make targets

Avoid needlessly rebuilding the compressed image by adding the file
'vmlinuz' to the 'targets' Kbuild make variable.

Signed-off-by: Ard Biesheuvel <[email protected]>

* hwmon: (peci/cputemp) Fix miscalculated DTS for SKX

For Skylake, DTS temperature of the CPU is reported in S10.6 format
instead of S8.8.

Reported-by: Paul Fertser <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Iwona Winiarska <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>

* hwmon: fix potential sensor registration fail if of_node is missing

It is not sufficient to check of_node in current device.
In some cases, this would cause the sensor registration to fail.

This patch looks for device's ancestors to find a valid of_node if any.

Fixes: d560168b5d0f ("hwmon: (core) New hwmon registration API")
Signed-off-by: Phinex Hung <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>

* hwmon: (xgene) Fix ioremap and memremap leak

Smatch reports:

drivers/hwmon/xgene-hwmon.c:757 xgene_hwmon_probe() warn:
'ctx->pcc_comm_addr' from ioremap() not released on line: 757.

This is because in drivers/hwmon/xgene-hwmon.c:701 xgene_hwmon_probe(),
ioremap and memremap is not released, which may cause a leak.

To fix this, ioremap and memremap is modified to devm_ioremap and
devm_memremap.

Signed-off-by: Tianyi Jing <[email protected]>
Reviewed-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[groeck: Fixed formatting and subject]
Signed-off-by: Guenter Roeck <[email protected]>

* bootconfig: Fix testcase to increase max node

Since commit 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig
from 1024 to 8192 for DCC support") increased the max number of bootconfig
node to 8192, the bootconfig testcase of the max number of nodes fails.
To fix this issue, we can not simply increase the number in the test script
because the test bootconfig file becomes too big (>32KB). To fix that, we
can use a combination of three alphabets (26^3 = 17576). But with that,
we can not express the 8193 (just one exceed from the limitation) because
it also exceeds the max size of bootconfig. So, the first 26 nodes will just
use one alphabet.

With this fix, test-bootconfig.sh passes all tests.

Link: https://lore.kernel.org/all/167888844790.791176.670805252426835131.stgit@devnote2/

Reported-by: Heinz Wiesinger <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Fixes: 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>

* keys: Do not cache key in task struct if key is requested from kernel thread

The key which gets cached in task structure from a kernel thread does not
get invalidated even after expiry.  Due to which, a new key request from
kernel thread will be served with the cached key if it's present in task
struct irrespective of the key validity.  The change is to not cache key in
task_struct when key requested from kernel thread so that kernel thread
gets a valid key on every key request.

The problem has been seen with the cifs module doing DNS lookups from a
kernel thread and the results getting pinned by being attached to that
kernel thread's cache - and thus not something that can be easily got rid
of.  The cache would ordinarily be cleared by notify-resume, but kernel
threads don't do that.

This isn't seen with AFS because AFS is doing request_key() within the
kernel half of a user thread - which will do notify-resume.

Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: Bharath SM <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: Steve French <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/CAGypqWw951d=zYRbdgNR4snUDvJhWL=q3=WOyh7HhSJupjz2vA@mail.gmail.com/

* verify_pefile: relax wrapper length check

The PE Format Specification (section "The Attribute Certificate Table
(Image Only)") states that `dwLength` is to be rounded up to 8-byte
alignment when used for traversal.  Therefore, the field is not required
to be an 8-byte multiple in the first place.

Accordingly, pesign has not performed this alignment since version
0.110.  This causes kexec failure on pesign'd binaries with "PEFILE:
Signature wrapper len wrong".  Update the comment and relax the check.

Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Jarkko Sakkinen <[email protected]>
cc: Eric Biederman <[email protected]>
cc: Herbert Xu <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-attribute-certificate-table-image-only
Link: https://github.com/rhboot/pesign
Link: https://lore.kernel.org/r/[email protected]/ # v2

* asymmetric_keys: log on fatal failures in PE/pkcs7

These particular errors can be encountered while trying to kexec when
secureboot lockdown is in place.  Without this change, even with a
signed debug build, one still needs to reboot the machine to add the
appropriate dyndbg parameters (since lockdown blocks debugfs).

Accordingly, upgrade all pr_debug() before fatal error into pr_warn().

Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Jarkko Sakkinen <[email protected]>
cc: Eric Biederman <[email protected]>
cc: Herbert Xu <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2

* ice: fix rx buffers handling for flow director packets

Adding flow director filters stopped working correctly after
commit 2fba7dc5157b ("ice: Add support for XDP multi-buffer
on Rx side"). As a result, only first flow director filter
can be added, adding next filter leads to NULL pointer
dereference attached below.

Rx buffer handling and reallocation logic has been optimized,
however flow director specific traffic was not accounted for.
As a result driver handled those packets incorrectly since new
logic was based on ice_rx_ring::first_desc which was not set
in this case.

Fix this by setting struct ice_rx_ring::first_desc to next_to_clean
for flow director received packets.

[  438.544867] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  438.551840] #PF: supervisor read access in kernel mode
[  438.556978] #PF: error_code(0x0000) - not-present page
[  438.562115] PGD 7c953b2067 P4D 0
[  438.565436] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  438.569794] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.2.0-net-bug #1
[  438.577531] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[  438.588470] RIP: 0010:ice_clean_rx_irq+0x2b9/0xf20 [ice]
[  438.593860] Code: 45 89 f7 e9 ac 00 00 00 8b 4d 78 41 31 4e 10 41 09 d5 4d 85 f6 0f 84 82 00 00 00 49 8b 4e 08 41 8b 76
1c 65 8b 3d 47 36 4a 3f <48> 8b 11 48 c1 ea 36 39 d7 0f 85 a6 00 00 00 f6 41 08 02 0f 85 9c
[  438.612605] RSP: 0018:ff8c732640003ec8 EFLAGS: 00010082
[  438.617831] RAX: 0000000000000800 RBX: 00000000000007ff RCX: 0000000000000000
[  438.624957] RDX: 0000000000000800 RSI: 0000000000000000 RDI: 0000000000000000
[  438.632089] RBP: ff4ed275a2158200 R08: 00000000ffffffff R09: 0000000000000020
[  438.639222] R10: 0000000000000000 R11: 0000000000000020 R12: 0000000000001000
[  438.646356] R13: 0000000000000000 R14: ff4ed275d0daffe0 R15: 0000000000000000
[  438.653485] FS:  0000000000000000(0000) GS:ff4ed2738fa00000(0000) knlGS:0000000000000000
[  438.661563] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  438.667310] CR2: 0000000000000000 CR3: 0000007c9f0d6006 CR4: 0000000000771ef0
[  438.674444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  438.681573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  438.688697] PKRU: 55555554
[  438.691404] Call Trace:
[  438.693857]  <IRQ>
[  438.695877]  ? profile_tick+0x17/0x80
[  438.699542]  ice_msix_clean_ctrl_vsi+0x24/0x50 [ice]
[  438.702571] ice 0000:b1:00.0: VF 1: ctrl_vsi irq timeout
[  438.704542]  __handle_irq_event_percpu+0x43/0x1a0
[  438.704549]  handle_irq_event+0x34/0x70
[  438.704554]  handle_edge_irq+0x9f/0x240
[  438.709901] iavf 0000:b1:01.1: Failed to add Flow Director filter with status: 6
[  438.714571]  __common_interrupt+0x63/0x100
[  438.714580]  common_interrupt+0xb4/0xd0
[  438.718424] iavf 0000:b1:01.1: Rule ID: 127 dst_ip: 0.0.0.0 src_ip 0.0.0.0 UDP: dst_port 4 src_port 0
[  438.722255]  </IRQ>
[  438.722257]  <TASK>
[  438.722257]  asm_common_interrupt+0x22/0x40
[  438.722262] RIP: 0010:cpuidle_enter_state+0xc8/0x430
[  438.722267] Code: 6e e9 25 ff e8 f9 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 d7 f1 24 ff 45
84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[  438.722269] RSP: 0018:ffffffff86003e50 EFLAGS: 00000246
[  438.784108] RAX: ff4ed2738fa00000 RBX: ffbe72a64fc01020 RCX: 0000000000000000
[  438.791234] RDX: 0000000000000000 RSI: ffffffff858d84de RDI: ffffffff85893641
[  438.798365] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000003158af9d
[  438.805490] R10: 0000000000000008 R11: 0000000000000354 R12: ffffffff862365a0
[  438.812622] R13: 000000661b472a87 R14: 0000000000000002 R15: 0000000000000000
[  438.819757]  cpuidle_enter+0x29/0x40
[  438.823333]  do_idle+0x1b6/0x230
[  438.826566]  cpu_startup_entry+0x19/0x20
[  438.830492]  rest_init+0xcb/0xd0
[  438.833717]  arch_call_rest_init+0xa/0x30
[  438.837731]  start_kernel+0x776/0xb70
[  438.841396]  secondary_startup_64_no_verify+0xe5/0xeb
[  438.846449]  </TASK>

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Piotr Raczynski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

* ice: check if VF exists before mode check

Setting trust on VF should return EINVAL when there is no VF. Move
checking for switchdev mode after checking if VF exists.

Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration")
Signed-off-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Kalyan Kodamagula <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

* ice: remove filters only if VSI is deleted

Filters shouldn't be removed in VSI rebuild path. Removing them on PF
VSI results in no rule for PF MAC after changing for example queues
amount.

Remove all filters only in the VSI remove flow. As unload should also
cause the filter to be removed introduce, a new function ice_stop_eth().
It will unroll ice_start_eth(), so remove filters and close VSI.

Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

* iavf: fix hang on reboot with ice

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

* i40e: fix flow director packet filter programming

Initialize to zero structures to build a valid
Tx Packet used for the filter programming.

Fixes: a9219b332f52 ("i40e: VLAN field for flow director")
Signed-off-by: Radoslaw Tyl <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Sign…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.