Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR with 2k changed files, no reverts #4

Open
wants to merge 2,654 commits into
base: branch-v6.1-rc1
Choose a base branch
from
Open

Conversation

fahhem
Copy link

@fahhem fahhem commented Apr 19, 2023

No description provided.

fdmanana and others added 30 commits November 23, 2022 16:52
When logging an inode in full mode, or when logging xattrs or when logging
the dir index items of a directory, we are modifying the log tree while
holding a read lock on a leaf from the fs/subvolume tree. This can lead to
a deadlock in rare circumstances, but it is a real possibility, and it was
recently reported by syzbot with the following trace from lockdep:

   WARNING: possible circular locking dependency detected
   6.1.0-rc5-next-20221116-syzkaller #0 Not tainted
   ------------------------------------------------------
   syz-executor.1/16154 is trying to acquire lock:
   ffff88807e3084a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256

   but task is already holding lock:
   ffff88807df33078 (btrfs-log-00){++++}-{3:3}, at: __btrfs_tree_lock+0x32/0x3d0 fs/btrfs/locking.c:197

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #2 (btrfs-log-00){++++}-{3:3}:
          down_read_nested+0x9e/0x450 kernel/locking/rwsem.c:1634
          __btrfs_tree_read_lock+0x32/0x350 fs/btrfs/locking.c:135
          btrfs_tree_read_lock fs/btrfs/locking.c:141 [inline]
          btrfs_read_lock_root_node+0x82/0x3a0 fs/btrfs/locking.c:280
          btrfs_search_slot_get_root fs/btrfs/ctree.c:1678 [inline]
          btrfs_search_slot+0x3ca/0x2c70 fs/btrfs/ctree.c:1998
          btrfs_lookup_csum+0x116/0x3f0 fs/btrfs/file-item.c:209
          btrfs_csum_file_blocks+0x40e/0x1370 fs/btrfs/file-item.c:1021
          log_csums.isra.0+0x244/0x2d0 fs/btrfs/tree-log.c:4258
          copy_items.isra.0+0xbfb/0xed0 fs/btrfs/tree-log.c:4403
          copy_inode_items_to_log+0x13d6/0x1d90 fs/btrfs/tree-log.c:5873
          btrfs_log_inode+0xb19/0x4680 fs/btrfs/tree-log.c:6495
          btrfs_log_inode_parent+0x890/0x2a20 fs/btrfs/tree-log.c:6982
          btrfs_log_dentry_safe+0x59/0x80 fs/btrfs/tree-log.c:7083
          btrfs_sync_file+0xa41/0x13c0 fs/btrfs/file.c:1921
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   -> #1 (btrfs-tree-00){++++}-{3:3}:
          __lock_release kernel/locking/lockdep.c:5382 [inline]
          lock_release+0x371/0x810 kernel/locking/lockdep.c:5688
          up_write+0x2a/0x520 kernel/locking/rwsem.c:1614
          btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline]
          btrfs_unlock_up_safe+0x1e3/0x290 fs/btrfs/locking.c:238
          search_leaf fs/btrfs/ctree.c:1832 [inline]
          btrfs_search_slot+0x265e/0x2c70 fs/btrfs/ctree.c:2074
          btrfs_insert_empty_items+0xbd/0x1c0 fs/btrfs/ctree.c:4133
          btrfs_insert_delayed_item+0x826/0xfa0 fs/btrfs/delayed-inode.c:746
          btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline]
          __btrfs_commit_inode_delayed_items fs/btrfs/delayed-inode.c:1111 [inline]
          __btrfs_run_delayed_items+0x280/0x590 fs/btrfs/delayed-inode.c:1153
          flush_space+0x147/0xe90 fs/btrfs/space-info.c:728
          btrfs_async_reclaim_metadata_space+0x541/0xc10 fs/btrfs/space-info.c:1086
          process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
          worker_thread+0x669/0x1090 kernel/workqueue.c:2436
          kthread+0x2e8/0x3a0 kernel/kthread.c:376
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

   -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
          check_prev_add kernel/locking/lockdep.c:3097 [inline]
          check_prevs_add kernel/locking/lockdep.c:3216 [inline]
          validate_chain kernel/locking/lockdep.c:3831 [inline]
          __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
          lock_acquire kernel/locking/lockdep.c:5668 [inline]
          lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
          __mutex_lock_common kernel/locking/mutex.c:603 [inline]
          __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
          __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256
          __btrfs_release_delayed_node fs/btrfs/delayed-inode.c:251 [inline]
          btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
          btrfs_remove_delayed_node+0x52/0x60 fs/btrfs/delayed-inode.c:1285
          btrfs_evict_inode+0x511/0xf30 fs/btrfs/inode.c:5554
          evict+0x2ed/0x6b0 fs/inode.c:664
          dispose_list+0x117/0x1e0 fs/inode.c:697
          prune_icache_sb+0xeb/0x150 fs/inode.c:896
          super_cache_scan+0x391/0x590 fs/super.c:106
          do_shrink_slab+0x464/0xce0 mm/vmscan.c:843
          shrink_slab_memcg mm/vmscan.c:912 [inline]
          shrink_slab+0x388/0x660 mm/vmscan.c:991
          shrink_node_memcgs mm/vmscan.c:6088 [inline]
          shrink_node+0x93d/0x1f30 mm/vmscan.c:6117
          shrink_zones mm/vmscan.c:6355 [inline]
          do_try_to_free_pages+0x3b4/0x17a0 mm/vmscan.c:6417
          try_to_free_mem_cgroup_pages+0x3a4/0xa70 mm/vmscan.c:6732
          reclaim_high.constprop.0+0x182/0x230 mm/memcontrol.c:2393
          mem_cgroup_handle_over_high+0x190/0x520 mm/memcontrol.c:2578
          try_charge_memcg+0xe0c/0x12f0 mm/memcontrol.c:2816
          try_charge mm/memcontrol.c:2827 [inline]
          charge_memcg+0x90/0x3b0 mm/memcontrol.c:6889
          __mem_cgroup_charge+0x2b/0x90 mm/memcontrol.c:6910
          mem_cgroup_charge include/linux/memcontrol.h:667 [inline]
          __filemap_add_folio+0x615/0xf80 mm/filemap.c:852
          filemap_add_folio+0xaf/0x1e0 mm/filemap.c:934
          __filemap_get_folio+0x389/0xd80 mm/filemap.c:1976
          pagecache_get_page+0x2e/0x280 mm/folio-compat.c:104
          find_or_create_page include/linux/pagemap.h:612 [inline]
          alloc_extent_buffer+0x2b9/0x1580 fs/btrfs/extent_io.c:4588
          btrfs_init_new_buffer fs/btrfs/extent-tree.c:4869 [inline]
          btrfs_alloc_tree_block+0x2e1/0x1320 fs/btrfs/extent-tree.c:4988
          __btrfs_cow_block+0x3b2/0x1420 fs/btrfs/ctree.c:440
          btrfs_cow_block+0x2fa/0x950 fs/btrfs/ctree.c:595
          btrfs_search_slot+0x11b0/0x2c70 fs/btrfs/ctree.c:2038
          btrfs_update_root+0xdb/0x630 fs/btrfs/root-tree.c:137
          update_log_root fs/btrfs/tree-log.c:2841 [inline]
          btrfs_sync_log+0xbfb/0x2870 fs/btrfs/tree-log.c:3064
          btrfs_sync_file+0xdb9/0x13c0 fs/btrfs/file.c:1947
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   other info that might help us debug this:

   Chain exists of:
     &delayed_node->mutex --> btrfs-tree-00 --> btrfs-log-00

   Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(btrfs-log-00);
                                  lock(btrfs-tree-00);
                                  lock(btrfs-log-00);
     lock(&delayed_node->mutex);

Holding a read lock on a leaf from a fs/subvolume tree creates a nasty
lock dependency when we are COWing extent buffers for the log tree and we
have two tasks modifying the log tree, with each one in one of the
following 2 scenarios:

1) Modifying the log tree triggers an extent buffer allocation while
   holding a write lock on a parent extent buffer from the log tree.
   Allocating the pages for an extent buffer, or the extent buffer
   struct, can trigger inode eviction and finally the inode eviction
   will trigger a release/remove of a delayed node, which requires
   taking the delayed node's mutex;

2) Allocating a metadata extent for a log tree can trigger the async
   reclaim thread and make us wait for it to release enough space and
   unblock our reservation ticket. The reclaim thread can start flushing
   delayed items, and that in turn results in the need to lock delayed
   node mutexes and in the need to write lock extent buffers of a
   subvolume tree - all this while holding a write lock on the parent
   extent buffer in the log tree.

So one task in scenario 1) running in parallel with another task in
scenario 2) could lead to a deadlock, one wanting to lock a delayed node
mutex while having a read lock on a leaf from the subvolume, while the
other is holding the delayed node's mutex and wants to write lock the same
subvolume leaf for flushing delayed items.

Fix this by cloning the leaf of the fs/subvolume tree, release/unlock the
fs/subvolume leaf and use the clone leaf instead.

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 6.0+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Although kset_unregister() can eventually remove all attribute files,
explicitly rolling back with the matching function makes the code logic
look clearer.

CC: [email protected] # 5.4+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.1

A clutch of small fixes that have come in in the past week, people seem
to have been unusually active for this late in the release cycle.  The
most critical one here is the fix to renumber the SOF DAI types in order
to restore ABI compatibility which was broken by the addition of AMD
support.
ixgbevf_init_module() won't destroy the workqueue created by
create_singlethread_workqueue() when pci_register_driver() failed. Add
destroy_workqueue() in fail path to prevent the resource leak.

Similar to the handling of u132_hcd_init in commit f276e00
("usb: u132-hcd: fix resource leak")

Fixes: 40a13e2 ("ixgbevf: Use a private workqueue to avoid certain possible hangs")
Signed-off-by: Shang XiaoJing <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
i40e_init_module() won't free the debugfs directory created by
i40e_dbg_init() when pci_register_driver() failed. Add fail path to
call i40e_dbg_exit() to remove the debugfs entries to prevent the bug.

i40e: Intel(R) Ethernet Connection XL710 Network Driver
i40e: Copyright (c) 2013 - 2019 Intel Corporation.
debugfs: Directory 'i40e' with parent '/' already present!

Fixes: 41c445f ("i40e: main driver core")
Signed-off-by: Shang XiaoJing <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
A problem about modprobe fm10k failed is triggered with the following log
given:

 Intel(R) Ethernet Switch Host Interface Driver
 Copyright(c) 2013 - 2019 Intel Corporation.
 debugfs: Directory 'fm10k' with parent '/' already present!

The reason is that fm10k_init_module() returns fm10k_register_pci_driver()
directly without checking its return value, if fm10k_register_pci_driver()
failed, it returns without removing debugfs and destroy workqueue,
resulting the debugfs of fm10k can never be created later and leaks the
workqueue.

 fm10k_init_module()
   alloc_workqueue()
   fm10k_dbg_init() # create debugfs
   fm10k_register_pci_driver()
     pci_register_driver()
       driver_register()
         bus_add_driver()
           priv = kzalloc(...) # OOM happened
   # return without remove debugfs and destroy workqueue

Fix by remove debugfs and destroy workqueue when
fm10k_register_pci_driver() returns error.

Fixes: 7461fd9 ("fm10k: Add support for debugfs")
Fixes: b382bb1 ("fm10k: use separate workqueue for fm10k driver")
Signed-off-by: Yuan Can <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
The iavf_init_module() won't destroy workqueue when pci_register_driver()
failed. Call destroy_workqueue() when pci_register_driver() failed to
prevent the resource leak.

Similar to the handling of u132_hcd_init in commit f276e00
("usb: u132-hcd: fix resource leak")

Fixes: 2803b16 ("i40e/i40evf: Use private workqueue")
Signed-off-by: Yuan Can <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
In e100_xmit_prepare(), if we can't map the skb, then return -ENOMEM, so
e100_xmit_frame() will return NETDEV_TX_BUSY and the upper layer will
resend the skb. But the skb is already freed, which will cause UAF bug
when the upper layer resends the skb.

Remove the harmful free.

Fixes: 5e5d494 ("e100: Release skb when DMA mapping is failed in e100_xmit_prepare")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
32 byte are to less for important data from prefix or
other commands.
Print up to 128 byte data. This is enough for the largest
CCW data we have.

Since printk can only print up to 1024 byte at once, print the
different parts of the CCW dumps separately.

Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
For DASD devices in raw_track_access mode only full track images are
read and written.
For this purpose it is not necessary to do search operation in the
locate record extended function. The documentation even states that
this might fail if the searched record is not found on a track.

Currently the driver sets a value of 1 in the search field for the first
record after record zero. This is the default for disks not in
raw_track_access mode but record 1 might be missing on a completely
empty track.

There has not been any problem with this on IBM storage servers but it
might lead to errors with DASD devices on other vendors storage servers.

Fix this by setting the search field to 0. Record zero is always available
even on a completely empty track.

Fixes: e4dbb0f ("[S390] dasd: Add support for raw ECKD access.")
Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
dasd_copy_relation->entry[] array might be accessed out of bounds if the
loop does not break.

Fixes: a91ff09 ("s390/dasd: add copy pair setup")
Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
The type of a->key[0] is char in fscache_volume_same().  If the length
of cache volume key is greater than 127, the value of a->key[0] is less
than 0.  In this case, klen becomes much larger than 255 after type
conversion, because the type of klen is size_t.  As a result, memcmp()
is read out of bounds.

This causes a slab-out-of-bounds Read in __fscache_acquire_volume(), as
reported by Syzbot.

Fix this by changing the type of the stored key to "u8 *" rather than
"char *" (it isn't a simple string anyway).  Also put in a check that
the volume name doesn't exceed NAME_MAX.

  BUG: KASAN: slab-out-of-bounds in memcmp+0x16f/0x1c0 lib/string.c:757
  Read of size 8 at addr ffff888016f3aa90 by task syz-executor344/3613
  Call Trace:
   memcmp+0x16f/0x1c0 lib/string.c:757
   memcmp include/linux/fortify-string.h:420 [inline]
   fscache_volume_same fs/fscache/volume.c:133 [inline]
   fscache_hash_volume fs/fscache/volume.c:171 [inline]
   __fscache_acquire_volume+0x76c/0x1080 fs/fscache/volume.c:328
   fscache_acquire_volume include/linux/fscache.h:204 [inline]
   v9fs_cache_session_get_cookie+0x143/0x240 fs/9p/cache.c:34
   v9fs_session_init+0x1166/0x1810 fs/9p/v9fs.c:473
   v9fs_mount+0xba/0xc90 fs/9p/vfs_super.c:126
   legacy_get_tree+0x105/0x220 fs/fs_context.c:610
   vfs_get_tree+0x89/0x2f0 fs/super.c:1530
   do_new_mount fs/namespace.c:3040 [inline]
   path_mount+0x1326/0x1e20 fs/namespace.c:3370
   do_mount fs/namespace.c:3383 [inline]
   __do_sys_mount fs/namespace.c:3591 [inline]
   __se_sys_mount fs/namespace.c:3568 [inline]
   __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568

Fixes: 62ab633 ("fscache: Implement volume registration")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Zhang Peng <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # Zhang Peng's v1 fix
Link: https://lore.kernel.org/r/[email protected]/ # Zhang Peng's v2 fix
Link: https://lore.kernel.org/r/166869954095.3793579.8500020902371015443.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <[email protected]>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
	egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the vdso Makefile to use "grep -E" instead.

Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Pull 9p fixes from Dominique Martinet:

 - 9p now uses a variable size for its recv buffer, but every place
   hadn't been updated properly to use it and some buffer overflows have
   been found and needed fixing.

   There's still one place where msize is incorrectly used in a safety
   check (p9_check_errors), but all paths leading to it should already
   be avoiding overflows and that patch took a bit more time to get
   right for zero-copy requests so I'll send it for 6.2

 - yet another race condition in p9_conn_cancel introduced by a fix for
   a syzbot report in the same place. Maybe at some point we'll get it
   right without burning it all down...

* tag '9p-for-6.1-rc7' of https://github.com/martinetd/linux:
  9p/xen: check logical size for buffer size
  9p/fd: Use P9_HDRSZ for header size
  9p/fd: Fix write overflow in p9_read_work
  9p/fd: fix issue of list_del corruption in p9_fd_cancel()
…rnel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few fixes, all device specific.

  The most important ones are for the i.MX driver which had a couple of
  nasty data corruption inducing errors appear after the change to
  support PIO mode in the last merge window (one introduced by the
  change and one latent one which the PIO changes exposed).

  Thanks to Frieder, Fabio, Marc and Marek for jumping on that and
  resolving the issues quickly once they were found"

* tag 'spi-fix-v6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-imx: spi_imx_transfer_one(): check for DMA transfer first
  spi: tegra210-quad: Fix duplicate resource error
  spi: dw-dma: decrease reference count in dw_spi_dma_init_mfld()
  spi: spi-imx: Fix spi_bus_clk if requested clock is higher than input clock
  spi: mediatek: Fix DEVAPC Violation at KO Remove
This was found when virtual machines with nfs-mounted qcow2 disks
failed to boot properly.

Reported-by: Anders Blomdell <[email protected]>
Suggested-by: Al Viro <[email protected]>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb61 ("nfsd_splice_actor(): handle compound pages")
Signed-off-by: Chuck Lever <[email protected]>
…rnel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

 - Update MAINTAINERS to add Manivannan Sadhasivam as Qcom PCIe RC
   maintainer (replacing Stanimir Varbanov) and include DT PCI bindings
   in the "PCI native host bridge and endpoint drivers" entry.

* tag 'pci-v6.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Include PCI bindings in host bridge entry
  MAINTAINERS: Add Manivannan Sadhasivam as Qcom PCIe RC maintainer
make_mmu_pages_available() must be called with mmu_lock held for write.
However, if the TDP MMU is used, it will be called with mmu_lock held for
read.
This function does nothing unless shadow pages are used, so there is no
race unless nested TDP is used.
Since nested TDP uses shadow pages, old shadow pages may be zapped by this
function even when the TDP MMU is enabled.
Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race
condition can be avoided by not calling make_mmu_pages_available() if the
TDP MMU is currently in use.

I encountered this when repeatedly starting and stopping nested VM.
It can be artificially caused by allocating a large number of nested TDP
SPTEs.

For example, the following BUG and general protection fault are caused in
the host kernel.

pte_list_remove: 00000000cd54fc10 many->many
------------[ cut here ]------------
kernel BUG at arch/x86/kvm/mmu/mmu.c:963!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm]
Call Trace:
 <TASK>
 drop_spte+0xe0/0x180 [kvm]
 mmu_page_zap_pte+0x4f/0x140 [kvm]
 __kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm]
 kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm]
 direct_page_fault+0x3cb/0x9b0 [kvm]
 kvm_tdp_page_fault+0x2c/0xa0 [kvm]
 kvm_mmu_page_fault+0x207/0x930 [kvm]
 npf_interception+0x47/0xb0 [kvm_amd]
 svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd]
 svm_handle_exit+0xfc/0x2c0 [kvm_amd]
 kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm]
 kvm_vcpu_ioctl+0x29b/0x6f0 [kvm]
 __x64_sys_ioctl+0x95/0xd0
 do_syscall_64+0x5c/0x90

general protection fault, probably for non-canonical address
0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm]
Call Trace:
 <TASK>
 kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm]
 direct_page_fault+0x3cb/0x9b0 [kvm]
 kvm_tdp_page_fault+0x2c/0xa0 [kvm]
 kvm_mmu_page_fault+0x207/0x930 [kvm]
 npf_interception+0x47/0xb0 [kvm_amd]

CVE: CVE-2022-45869
Fixes: a2855af ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Signed-off-by: Kazuki Takiguchi <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
We shouldn't allow guests to poll on arbitrary port numbers off the end
of the event channel table.

Fixes: 1a65105 ("KVM: x86/xen: handle PV spinlocks slowpath")
[dwmw2: my bug though; the original version did check the validity as a
 side-effect of an idr_find() which I ripped out in refactoring.]
Reported-by: Michal Luczaj <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
…CPL0

There are almost no hypercalls which are valid from CPL > 0, and definitely
none which are handled by the kernel.

Fixes: 2fd6df2 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Reported-by: Michal Luczaj <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
In the case where a GPC is refreshed to a different location within the
same page, we didn't bother to update it. Mostly we don't need to, but
since the ->khva field also includes the offset within the page, that
does have to be updated.

Fixes: 3ba2c95 ("KVM: Do not incorporate page offset into gfn=>pfn cache user address")
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
This brings in a few important fixes for Xen emulation.
While nobody should be enabling it, the bug effectively
allows userspace to read arbitrary memory.

Signed-off-by: Paolo Bonzini <[email protected]>
commit 94eedf3 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.

This could be solved with adding memory barriers in the hot path, or to
make sure that all the triggers requiring a record check for NULL. The
latter was chosen.

Commit 94eedf3 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Tom Zanussi <[email protected]>
Cc: [email protected]
Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.

Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.

Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.

Link: https://lkml.kernel.org/r/[email protected]

Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
After 65536 dynamic events have been added and removed, the "type" field
of the event then uses the first type number that is available (not
currently used by other events). A type number is the identifier of the
binary blobs in the tracing ring buffer (known as events) to map them to
logic that can parse the binary blob.

The issue is that if a dynamic event (like a kprobe event) is traced and
is in the ring buffer, and then that event is removed (because it is
dynamic, which means it can be created and destroyed), if another dynamic
event is created that has the same number that new event's logic on
parsing the binary blob will be used.

To show how this can be an issue, the following can crash the kernel:

 # cd /sys/kernel/tracing
 # for i in `seq 65536`; do
     echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
 # done

For every iteration of the above, the writing to the kprobe_events will
remove the old event and create a new one (with the same format) and
increase the type number to the next available on until the type number
reaches over 65535 which is the max number for the 16 bit type. After it
reaches that number, the logic to allocate a new number simply looks for
the next available number. When an dynamic event is removed, that number
is then available to be reused by the next dynamic event created. That is,
once the above reaches the max number, the number assigned to the event in
that loop will remain the same.

Now that means deleting one dynamic event and created another will reuse
the previous events type number. This is where bad things can happen.
After the above loop finishes, the kprobes/foo event which reads the
do_sys_openat2 function call's first parameter as an integer.

 # echo 1 > kprobes/foo/enable
 # cat /etc/passwd > /dev/null
 # cat trace
             cat-2211    [005] ....  2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
 # echo 0 > kprobes/foo/enable

Now if we delete the kprobe and create a new one that reads a string:

 # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events

And now we can the trace:

 # cat trace
        sendmail-1942    [002] .....   530.136320: foo: (do_sys_openat2+0x0/0x240) arg1=             cat-2046    [004] .....   530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
            bash-1515    [007] .....   534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U

And dmesg has:

==================================================================
BUG: KASAN: use-after-free in string+0xd4/0x1c0
Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049

 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ torvalds#641
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5b/0x77
  print_report+0x17f/0x47b
  kasan_report+0xad/0x130
  string+0xd4/0x1c0
  vsnprintf+0x500/0x840
  seq_buf_vprintf+0x62/0xc0
  trace_seq_printf+0x10e/0x1e0
  print_type_string+0x90/0xa0
  print_kprobe_event+0x16b/0x290
  print_trace_line+0x451/0x8e0
  s_show+0x72/0x1f0
  seq_read_iter+0x58e/0x750
  seq_read+0x115/0x160
  vfs_read+0x11d/0x460
  ksys_read+0xa9/0x130
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x7fc2e972ade2
 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
  </TASK>

 The buggy address belongs to the physical page:
 page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
 raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                ^
  ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ==================================================================

This was found when Zheng Yejian sent a patch to convert the event type
number assignment to use IDA, which gives the next available number, and
this bug showed up in the fuzz testing by Yujie Liu and the kernel test
robot. But after further analysis, I found that this behavior is the same
as when the event type numbers go past the 16bit max (and the above shows
that).

As modules have a similar issue, but is dealt with by setting a
"WAS_ENABLED" flag when a module event is enabled, and when the module is
freed, if any of its events were enabled, the ring buffer that holds that
event is also cleared, to prevent reading stale events. The same can be
done for dynamic events.

If any dynamic event that is being removed was enabled, then make sure the
buffers they were enabled in are now cleared.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/

Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Depends-on: e18eb87 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
Depends-on: 5448d44 ("tracing: Add unified dynamic event framework")
Depends-on: 6212dd2 ("tracing/kprobes: Use dyn_event framework for kprobe events")
Depends-on: 065e63f ("tracing: Only have rmmod clear buffers that its events were active in")
Depends-on: 575380d ("tracing: Only clear trace buffer on module unload if event was traced")
Fixes: 77b44d1 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
Reported-by: Zheng Yejian <[email protected]>
Reported-by: Yujie Liu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Since commit 2df8220 ("kbuild: build init/built-in.a just once"),
the .version file is not touched at all when KBUILD_BUILD_VERSION is
given.

If KBUILD_BUILD_VERSION is specified and the .version file is missing
(for example right after 'make mrproper'), "No such file or director"
is shown. Even if the .version exists, it is irrelevant to the version
of the current build.

  $ make -j$(nproc) KBUILD_BUILD_VERSION=100 mrproper defconfig all
    [ snip ]
    BUILD   arch/x86/boot/bzImage
  cat: .version: No such file or directory
  Kernel: arch/x86/boot/bzImage is ready  (#)

Show KBUILD_BUILD_VERSION if it is given.

Fixes: 2df8220 ("kbuild: build init/built-in.a just once")
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
Add rust argument at TAR_CONTENT in
scripts/Makefile.package script with alphabetical order.

Signed-off-by: Paran Lee <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
The documentation refers to invalid web page under www.linuxfoundation.org
The patch refers to a working URL under wiki.linuxfoundation.org

Signed-off-by: Nir Levy <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix regression in ipset hash:ip with IPv4 range, from Vishwanath Pai.
   This is fixing up a bug introduced in the 6.0 release.

2) The "netfilter: ipset: enforce documented limit to prevent allocating
   huge memory" patch contained a wrong condition which makes impossible to
   add up to 64 clashing elements to a hash:net,iface type of set while it
   is the documented feature of the set type. The patch fixes the condition
   and thus makes possible to add the elements while keeps preventing
   allocating huge memory, from Jozsef Kadlecsik. This has been broken
   for several releases.

3) Missing locking when updating the flow block list which might lead
   a reader to crash. This has been broken since the introduction of the
   flowtable hardware offload support.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: flowtable_offload: add missing locking
  netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface
  netfilter: ipset: regression in ip_set_hash_ip.c
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
wangyufen316 and others added 28 commits December 1, 2022 23:55
Fix to return a negative error code from the gi2c->err instead of
0.

Fixes: d870355 ("i2c: qcom-geni: Add support for GPI DMA")
Signed-off-by: Wang Yufen <[email protected]>
Reviewed-by: Tommaso Merciai <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Recent changes to the DMA code has resulting in the IMX driver failing
I2C transfers when the buffer has been vmalloc. Only perform DMA
transfers if the message has the I2C_M_DMA_SAFE flag set, indicating
the client is providing a buffer which is DMA safe.

This is a minimal fix for stable. The I2C core provides helpers to
allocate a bounce buffer. For a fuller fix the master should make use
of these helpers.

Fixes: 4544b9f ("dma-mapping: Add vmap checks to dma_map_single()")
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
…p.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.1-2022-12-01:

amdgpu:
- VCN fix for vangogh

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

This patch adds an extra device TLB invalidation for the affected devices,
it is needed to ensure no more posted writes with translated address
following the invalidation completion. Therefore, the ordering is
preserved and data-corruption is prevented.

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, iommu driver gets mmu_notifier to
invalidate TLB the same way as normal user unmap which will do an extra
invalidation. The dTLB invalidation after PASID cache flush does not
need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.

Tested-by: Yuzhang Luo <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
As comment of pci_get_domain_bus_and_slot() says, it returns a pci device
with refcount increment, when finish using it, the caller must decrease
the reference count by calling pci_dev_put(). So call pci_dev_put() after
using the 'pdev' to avoid refcount leak.

Besides, if the 'pdev' is null or intel_svm_prq_report() returns error,
there is no need to trace this fault.

Fixes: 06f4b8d ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path")
Suggested-by: Lu Baolu <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() before 'return true' to avoid reference count leak.

Fixes: 89a6079 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
Signed-off-by: Xiongfeng Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() for the error path to avoid reference count leak.

Fixes: 2e45528 ("iommu/vt-d: Unify the way to process DMAR device scope array")
Signed-off-by: Xiongfeng Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
…block-6.1

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.1

 - fix SRCU protection of nvme_ns_head list (Caleb Sander)
 - clear the prp2 field when not used (Lei Rao)"

* tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme:
  nvme: fix SRCU protection of nvme_ns_head list
  nvme-pci: clear the prp2 field when not used
The V4L2_MEMORY_USERPTR interface is long deprecated and shouldn't be
used (and is discouraged for any modern v4l drivers).  And Seth Jenkins
points out that the fallback to VM_PFNMAP/VM_IO is fundamentally racy
and dangerous.

Note that it's not even a case that should trigger, since any normal
user pointer logic ends up just using the pin_user_pages_fast() call
that does the proper page reference counting.  That's not the problem
case, only if you try to use special device mappings do you have any
issues.

Normally I'd just remove this during the merge window, but since Seth
pointed out the problem cases, we really want to know as soon as
possible if there are actually any users of this odd special case of a
legacy interface.  Neither Hans nor Mauro seem to think that such
mis-uses of the old legacy interface should exist.  As Mauro says:

 "See, V4L2 has actually 4 streaming APIs:
        - Kernel-allocated mmap (usually referred simply as just mmap);
        - USERPTR mmap;
        - read();
        - dmabuf;

  The USERPTR is one of the oldest way to use it, coming from V4L
  version 1 times, and by far the least used one"

And Hans chimed in on the USERPTR interface:

 "To be honest, I wouldn't mind if it goes away completely, but that's a
  bit of a pipe dream right now"

but while removing this legacy interface entirely may be a pipe dream we
can at least try to remove the unlikely (and actively broken) case of
using special device mappings for USERPTR accesses.

This replaces it with a WARN_ONCE() that we can remove once we've
hopefully confirmed that no actual users exist.

NOTE! Longer term, this means that a 'struct frame_vector' only ever
contains proper page pointers, and all the games we have with converting
them to pages can go away (grep for 'frame_vector_to_pages()' and the
uses of 'vec->is_pfns').  But this is just the first step, to verify
that this code really is all dead, and do so as quickly as possible.

Reported-by: Seth Jenkins <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Acked-by: Mauro Carvalho Chehab <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
…/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "15 hotfixes,  11 marked cc:stable.

  Only three or four of the latter address post-6.0 issues, which is
  hopefully a sign that things are converging"

* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
  Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
  drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
  mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
  mm/khugepaged: fix GUP-fast interaction by sending IPI
  mm/khugepaged: take the right locks for page table retraction
  mm: migrate: fix THP's mapcount on isolation
  mm: introduce arch_has_hw_nonleaf_pmd_young()
  mm: add dummy pmd_young() for architectures not having it
  mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
  tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
  nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
  hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
  madvise: use zap_page_range_single for madvise dontneed
  mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
…/drm

Pull drm fixes from Dave Airlie:
 "Things do seem to have finally settled down, just four i915 and one
  amdgpu this week. Probably won't have much for next week if you do
  push rc8 out.

  i915:
   - Fix dram info readout
   - Remove non-existent pipes from bigjoiner pipe mask
   - Fix negative value passed as remaining time
   - Never return 0 if not all requests retired

  amdgpu:
   - VCN fix for vangogh"

* tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: enable Vangogh VCN indirect sram mode
  drm/i915: Never return 0 if not all requests retired
  drm/i915: Fix negative value passed as remaining time
  drm/i915: Remove non-existent pipes from bigjoiner pipe mask
  drm/i915/mtl: Fix dram info readout
…l/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Likely the last piece for 6.1; the only significant fixes are ASoC
  core ops fixes, while others are device-specific (rather minor) fixes
  in ASoC and FireWire drivers.

  All appear safe enough to take as a late stage material"

* tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: dice: fix regression for Lexicon I-ONIX FW810S
  ASoC: cs42l51: Correct PGA Volume minimum value
  ASoC: ops: Correct bounds check for second channel on SX controls
  ASoC: tlv320adc3xxx: Fix build error for implicit function declaration
  ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx()
  ASoC: ops: Fix bounds check for _sx controls
  ASoC: fsl_micfil: explicitly clear CHnF flags
  ASoC: fsl_micfil: explicitly clear software reset bit
There is a kmemleak when test the raydium_i2c_ts with bpf mock device:

  unreferenced object 0xffff88812d3675a0 (size 8):
    comm "python3", pid 349, jiffies 4294741067 (age 95.695s)
    hex dump (first 8 bytes):
      11 0e 10 c0 01 00 04 00                          ........
    backtrace:
      [<0000000068427125>] __kmalloc+0x46/0x1b0
      [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
      [<000000006e631aee>] raydium_i2c_initialize.cold+0xbc/0x3e4 [raydium_i2c_ts]
      [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
      [<00000000a310de16>] i2c_device_probe+0x651/0x680
      [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
      [<00000000096ba499>] __driver_probe_device+0xe3/0x170
      [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
      [<00000000264fe082>] __device_attach_driver+0xf7/0x150
      [<00000000f919423c>] bus_for_each_drv+0x114/0x180
      [<00000000e067feca>] __device_attach+0x1e5/0x2d0
      [<0000000054301fc2>] bus_probe_device+0x126/0x140
      [<00000000aad93b22>] device_add+0x810/0x1130
      [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
      [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
      [<00000000ffec4177>] of_i2c_notify+0x100/0x160
  unreferenced object 0xffff88812d3675c8 (size 8):
    comm "python3", pid 349, jiffies 4294741070 (age 95.692s)
    hex dump (first 8 bytes):
      22 00 36 2d 81 88 ff ff                          ".6-....
    backtrace:
      [<0000000068427125>] __kmalloc+0x46/0x1b0
      [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
      [<000000001d5c9620>] raydium_i2c_initialize.cold+0x223/0x3e4 [raydium_i2c_ts]
      [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
      [<00000000a310de16>] i2c_device_probe+0x651/0x680
      [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
      [<00000000096ba499>] __driver_probe_device+0xe3/0x170
      [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
      [<00000000264fe082>] __device_attach_driver+0xf7/0x150
      [<00000000f919423c>] bus_for_each_drv+0x114/0x180
      [<00000000e067feca>] __device_attach+0x1e5/0x2d0
      [<0000000054301fc2>] bus_probe_device+0x126/0x140
      [<00000000aad93b22>] device_add+0x810/0x1130
      [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
      [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
      [<00000000ffec4177>] of_i2c_notify+0x100/0x160

After BANK_SWITCH command from i2c BUS, no matter success or error
happened, the tx_buf should be freed.

Fixes: 3b384bd ("Input: raydium_ts_i2c - do not split tx transactions")
Signed-off-by: Zhang Xiaoxu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
…m S3

The "force" argument to write_spec_ctrl_current() is currently ambiguous
as it does not guarantee the MSR write. This is due to the optimization
that writes to the MSR happen only when the new value differs from the
cached value.

This is fine in most cases, but breaks for S3 resume when the cached MSR
value gets out of sync with the hardware MSR value due to S3 resetting
it.

When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
is skipped. Which results in SPEC_CTRL mitigations not getting restored.

Move the MSR write from write_spec_ctrl_current() to a new function that
unconditionally writes to the MSR. Update the callers accordingly and
rename functions.

  [ bp: Rework a bit. ]

Fixes: caa0ff2 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
…x/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "Intel VT-d fixes:

   - IO/TLB flush fix

   - Various pci_dev refcount fixes"

* tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
  iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
  iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
  iommu/vt-d: Add a fix for devices need extra dtlb flush
…el/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix ambiguous TRIM and DISCARD args
   - Fix removal of debugfs file for mmc_test

  MMC host:
   - mtk-sd: Add missing clk_disable_unprepare() in an error path
   - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
   - sdhci-esdhc-imx: Fix CQHCI exit halt state check
   - sdhci-sprd: Fix voltage switch"

* tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix no reset data and command after voltage switch
  mmc: sdhci: Fix voltage switch delay
  mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
  mmc: mmc_test: Fix removal of debugfs file
  mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
  mmc: core: Fix ambiguous TRIM and DISCARD arg
…inux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - build fix for the NR_CPUS Kconfig SBI version dependency

 - fixes to early memory initialization, to fix page permissions in EFI
   and post-initmem-free

 - build fix for the VDSO, to avoid trying to profile the VDSO functions

 - fixes for kexec crash handling, to fix multi-core and interrupt
   related initialization inside the crash kernel

 - fix for a race condition when handling multiple concurrect kernel
   stack overflows

* tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: kexec: Fixup crash_smp_send_stop without multi cores
  riscv: kexec: Fixup irq controller broken in kexec crash path
  riscv: mm: Proper page permissions after initmem free
  riscv: vdso: fix section overlapping under some conditions
  riscv: fix race when vmap stack overflow
  riscv: Sync efi page table's kernel mappings before switching
  riscv: Fix NR_CPUS range conditions
…el/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Three driver fixes. The Intel fix looks like the most important.

   - Fix a potential divide by zero in pinctrl-singe (OMAP and
     HiSilicon)

   - Disable IRQs on startup in the Mediatek driver. This is a classic,
     we should be looking out for this more.

   - Save and restore pins in 'direct IRQ' mode in the Intel driver,
     this works around firmware bugs"

* tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: intel: Save and restore pins in "direct IRQ" mode
  pinctrl: meditatek: Startup with the IRQs disabled
  pinctrl: single: Fix potential division by zero
Pull block fixes from Jens Axboe:
 "Just a small NVMe merge for this week, fixing protection of the name
  space list, and a missing clear of a reserved field when unused"

* tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux:
  nvme: fix SRCU protection of nvme_ns_head list
  nvme-pci: clear the prp2 field when not used
…ernel/git/nvdimm/nvdimm

Pull dax fixes from Dan Williams:
 "A few bug fixes around the handling of "Soft Reserved" memory and
  memory tiering information.

  Linux is starting to enounter more real world systems that deploy an
  ACPI HMAT to describe different performance classes of memory, as well
  the "special purpose" (Linux "Soft Reserved") designation from EFI.

  These fixes result from that testing.

  It has all appeared in -next for a while with no known issues.

   - Fix duplicate overlapping device-dax instances for HMAT described
     "Soft Reserved" Memory

   - Fix missing node targets in the sysfs representation of memory
     tiers

   - Remove a confusing variable initialization"

* tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: Fix duplicate 'hmem' device registration
  ACPI: HMAT: Fix initiator registration for single-initiator systems
  ACPI: HMAT: remove unnecessary variable initialization
…nel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A power state fix in the core for ACPI devices, a regression fix
  regarding bus recovery for the cadence driver, a DMA handling fix for
  the imx driver, and two error path fixes (npcm7xx and qcom-geni)"

* tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set
  i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer
  i2c: cadence: Fix regression with bus recovery
  i2c: Restore initial power state if probe fails
  i2c: npcm7xx: Fix error handling in npcm_i2c_init()
…kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:

 - a fix for Raydium touchscreen driver to stop leaking memory when
   sending commands to the chip

* tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
…l/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix oops in 32-bit BPF tail call tests

 - Add missing declaration for machine_check_early_boot()

Thanks to Christophe Leroy and Naveen N. Rao.

* tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Add missing declaration for machine_check_early_boot()
  powerpc/bpf/32: Fix Oops on tail call tests
…m/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Revert a fix to RISC-V timers supposed to address an uncertainty
   whether clock events are received during S3 or not which locks up
   other RISC-V platforms. The issue will be fixed differently later.

* tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
…linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Fix a use-after-free case where the perf pending task callback would
   see an already freed event

* tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_pending_task() UaF
Currently tpm transactions are executed unconditionally in
tpm_pm_suspend() function, which may lead to races with other tpm
accessors in the system.

Specifically, the hw_random tpm driver makes use of tpm_get_random(),
and this function is called in a loop from a kthread, which means it's
not frozen alongside userspace, and so can race with the work done
during system suspend:

  tpm tpm0: tpm_transmit: tpm_recv: error -52
  tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
  CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ torvalds#135
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
  Call Trace:
   tpm_tis_status.cold+0x19/0x20
   tpm_transmit+0x13b/0x390
   tpm_transmit_cmd+0x20/0x80
   tpm1_pm_suspend+0xa6/0x110
   tpm_pm_suspend+0x53/0x80
   __pnp_bus_suspend+0x35/0xe0
   __device_suspend+0x10f/0x350

Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
tpm_chip_start(), but takes the appropriate mutex.

Signed-off-by: Jan Dabros <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Tested-by: Jason A. Donenfeld <[email protected]>
Tested-by: Vlastimil Babka <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Fixes: e891db1 ("tpm: turn on TPM on suspend for TPM 1.x")
[Jason: reworked commit message, added metadata]
Signed-off-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
This reverts commit f35b5d7.

It has been reported to cause huge performance regressions on some loads
(will-it-scale.per_process_ops, but also building the kernel with
clang).

The commit did speed up gcc builds by a small amount, so it's not an
unambiguous regression, but until the big regressions are understood,
let's revert it.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%[email protected]/
Cc: Huang, Ying <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
fahhem added a commit that referenced this pull request Apr 19, 2023
commit 76dcd734eca23168cb008912c0f69ff408905235
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 14:48:12 2022 -0800

    Linux 6.1-rc8

commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:51:59 2022 -0800

    Revert "mm: align larger anonymous mappings on THP boundaries"

    This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.

    It has been reported to cause huge performance regressions on some loads
    (will-it-scale.per_process_ops, but also building the kernel with
    clang).

    The commit did speed up gcc builds by a small amount, so it's not an
    unambiguous regression, but until the big regressions are understood,
    let's revert it.

    Reported-by: kernel test robot <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Nathan Chancellor <[email protected]>
    Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%[email protected]/
    Cc: Huang, Ying <[email protected]>
    Cc: Rik van Riel <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Yang Shi <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 23393c6461422df5bf8084a086ada9a7e17dc2ba
Author: Jan Dabros <[email protected]>
Date:   Mon Nov 28 20:56:51 2022 +0100

    char: tpm: Protect tpm_pm_suspend with locks

    Currently tpm transactions are executed unconditionally in
    tpm_pm_suspend() function, which may lead to races with other tpm
    accessors in the system.

    Specifically, the hw_random tpm driver makes use of tpm_get_random(),
    and this function is called in a loop from a kthread, which means it's
    not frozen alongside userspace, and so can race with the work done
    during system suspend:

      tpm tpm0: tpm_transmit: tpm_recv: error -52
      tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
      CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
      Call Trace:
       tpm_tis_status.cold+0x19/0x20
       tpm_transmit+0x13b/0x390
       tpm_transmit_cmd+0x20/0x80
       tpm1_pm_suspend+0xa6/0x110
       tpm_pm_suspend+0x53/0x80
       __pnp_bus_suspend+0x35/0xe0
       __device_suspend+0x10f/0x350

    Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
    tpm_chip_start(), but takes the appropriate mutex.

    Signed-off-by: Jan Dabros <[email protected]>
    Reported-by: Vlastimil Babka <[email protected]>
    Tested-by: Jason A. Donenfeld <[email protected]>
    Tested-by: Vlastimil Babka <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Cc: [email protected]
    Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x")
    [Jason: reworked commit message, added metadata]
    Signed-off-by: Jason A. Donenfeld <[email protected]>
    Reviewed-by: Jarkko Sakkinen <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 0c3b5bcb484a659dd14466f92a073b57b2d3c1a5
Merge: eea8bebd5173 517e6a301f34
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:36:23 2022 -0800

    Merge tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf fix from Borislav Petkov:

     - Fix a use-after-free case where the perf pending task callback would
       see an already freed event

    * tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf: Fix perf_pending_task() UaF

commit eea8bebd51739cc7a3bb501032ee877b4aada553
Merge: ae6bb7171171 d9f15a9de44a
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:33:44 2022 -0800

    Merge tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull timer fix from Borislav Petkov:

     - Revert a fix to RISC-V timers supposed to address an uncertainty
       whether clock events are received during S3 or not which locks up
       other RISC-V platforms. The issue will be fixed differently later.

    * tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

commit ae6bb71711713e7b6b2d9ed2af7318dc8ae5cfb2
Merge: 50f36c5aa12c 2e7ec190a0e3
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:24:58 2022 -0800

    Merge tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

    Pull powerpc fixes from Michael Ellerman:

     - Fix oops in 32-bit BPF tail call tests

     - Add missing declaration for machine_check_early_boot()

    Thanks to Christophe Leroy and Naveen N. Rao.

    * tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/64s: Add missing declaration for machine_check_early_boot()
      powerpc/bpf/32: Fix Oops on tail call tests

commit 50f36c5aa12c8d0f2adca662e0c106212ea897a8
Merge: c2bf05db6c78 8c9a59939deb
Author: Linus Torvalds <[email protected]>
Date:   Sun Dec 4 12:18:37 2022 -0800

    Merge tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

    Pull input fix from Dmitry Torokhov:

     - a fix for Raydium touchscreen driver to stop leaking memory when
       sending commands to the chip

    * tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
      Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()

commit c2bf05db6c78f53ca5cd4b48f3b9b71f78d215f1
Merge: 6085bc95797c d36678f7905c
Author: Linus Torvalds <[email protected]>
Date:   Sat Dec 3 13:51:37 2022 -0800

    Merge tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

    Pull i2c fixes from Wolfram Sang:
     "A power state fix in the core for ACPI devices, a regression fix
      regarding bus recovery for the cadence driver, a DMA handling fix for
      the imx driver, and two error path fixes (npcm7xx and qcom-geni)"

    * tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set
      i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer
      i2c: cadence: Fix regression with bus recovery
      i2c: Restore initial power state if probe fails
      i2c: npcm7xx: Fix error handling in npcm_i2c_init()

commit 6085bc95797caa55a68bc0f7dd73e8c33e91037f
Merge: 97ee9d1c1696 472faf72b33d
Author: Linus Torvalds <[email protected]>
Date:   Sat Dec 3 13:43:38 2022 -0800

    Merge tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

    Pull dax fixes from Dan Williams:
     "A few bug fixes around the handling of "Soft Reserved" memory and
      memory tiering information.

      Linux is starting to enounter more real world systems that deploy an
      ACPI HMAT to describe different performance classes of memory, as well
      the "special purpose" (Linux "Soft Reserved") designation from EFI.

      These fixes result from that testing.

      It has all appeared in -next for a while with no known issues.

       - Fix duplicate overlapping device-dax instances for HMAT described
         "Soft Reserved" Memory

       - Fix missing node targets in the sysfs representation of memory
         tiers

       - Remove a confusing variable initialization"

    * tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      device-dax: Fix duplicate 'hmem' device registration
      ACPI: HMAT: Fix initiator registration for single-initiator systems
      ACPI: HMAT: remove unnecessary variable initialization

commit 97ee9d1c16963375eefdf964c429897d27e28956
Merge: 63050a5ca130 d0f411c0b9bd
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:27:15 2022 -0800

    Merge tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux

    Pull block fixes from Jens Axboe:
     "Just a small NVMe merge for this week, fixing protection of the name
      space list, and a missing clear of a reserved field when unused"

    * tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux:
      nvme: fix SRCU protection of nvme_ns_head list
      nvme-pci: clear the prp2 field when not used

commit 63050a5ca130e76af7199c9a3fba1d175f3a1102
Merge: 0e15c3c75a28 6989ea4881c8
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:22:17 2022 -0800

    Merge tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

    Pull pin control fixes from Linus Walleij:
     "Three driver fixes. The Intel fix looks like the most important.

       - Fix a potential divide by zero in pinctrl-singe (OMAP and
         HiSilicon)

       - Disable IRQs on startup in the Mediatek driver. This is a classic,
         we should be looking out for this more.

       - Save and restore pins in 'direct IRQ' mode in the Intel driver,
         this works around firmware bugs"

    * tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
      pinctrl: intel: Save and restore pins in "direct IRQ" mode
      pinctrl: meditatek: Startup with the IRQs disabled
      pinctrl: single: Fix potential division by zero

commit 0e15c3c75a28b10bac7b3ad7627fd6b458623283
Merge: 2df2adc3e69d 39cefc5f6cd2
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 16:04:53 2022 -0800

    Merge tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

    Pull RISC-V fixes from Palmer Dabbelt:

     - build fix for the NR_CPUS Kconfig SBI version dependency

     - fixes to early memory initialization, to fix page permissions in EFI
       and post-initmem-free

     - build fix for the VDSO, to avoid trying to profile the VDSO functions

     - fixes for kexec crash handling, to fix multi-core and interrupt
       related initialization inside the crash kernel

     - fix for a race condition when handling multiple concurrect kernel
       stack overflows

    * tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
      riscv: kexec: Fixup crash_smp_send_stop without multi cores
      riscv: kexec: Fixup irq controller broken in kexec crash path
      riscv: mm: Proper page permissions after initmem free
      riscv: vdso: fix section overlapping under some conditions
      riscv: fix race when vmap stack overflow
      riscv: Sync efi page table's kernel mappings before switching
      riscv: Fix NR_CPUS range conditions

commit 2df2adc3e69d32751eb534ed55c591854260c4a3
Merge: f66f62f83dba dd30dcfa7a74
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:58:07 2022 -0800

    Merge tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

    Pull MMC fixes from Ulf Hansson:
     "MMC core:
       - Fix ambiguous TRIM and DISCARD args
       - Fix removal of debugfs file for mmc_test

      MMC host:
       - mtk-sd: Add missing clk_disable_unprepare() in an error path
       - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
       - sdhci-esdhc-imx: Fix CQHCI exit halt state check
       - sdhci-sprd: Fix voltage switch"

    * tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
      mmc: sdhci-sprd: Fix no reset data and command after voltage switch
      mmc: sdhci: Fix voltage switch delay
      mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
      mmc: mmc_test: Fix removal of debugfs file
      mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
      mmc: core: Fix ambiguous TRIM and DISCARD arg

commit f66f62f83dba3ec4237c3dd7a95b776824ac91a0
Merge: 66065157420c 4bedbbd782eb
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:54:12 2022 -0800

    Merge tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

    Pull iommu fixes from Joerg Roedel:
     "Intel VT-d fixes:

       - IO/TLB flush fix

       - Various pci_dev refcount fixes"

    * tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
      iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
      iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
      iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
      iommu/vt-d: Add a fix for devices need extra dtlb flush

commit 66065157420c5b9b3f078f43d313c153e1ff7f83
Author: Pawan Gupta <[email protected]>
Date:   Wed Nov 30 07:25:51 2022 -0800

    x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3

    The "force" argument to write_spec_ctrl_current() is currently ambiguous
    as it does not guarantee the MSR write. This is due to the optimization
    that writes to the MSR happen only when the new value differs from the
    cached value.

    This is fine in most cases, but breaks for S3 resume when the cached MSR
    value gets out of sync with the hardware MSR value due to S3 resetting
    it.

    When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
    is skipped. Which results in SPEC_CTRL mitigations not getting restored.

    Move the MSR write from write_spec_ctrl_current() to a new function that
    unconditionally writes to the MSR. Update the callers accordingly and
    rename functions.

      [ bp: Rework a bit. ]

    Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
    Suggested-by: Borislav Petkov <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
    Signed-off-by: Linus Torvalds <[email protected]>

commit 8c9a59939deb4bfafdc451100c03d1e848b4169b
Author: Zhang Xiaoxu <[email protected]>
Date:   Fri Dec 2 15:37:46 2022 -0800

    Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()

    There is a kmemleak when test the raydium_i2c_ts with bpf mock device:

      unreferenced object 0xffff88812d3675a0 (size 8):
        comm "python3", pid 349, jiffies 4294741067 (age 95.695s)
        hex dump (first 8 bytes):
          11 0e 10 c0 01 00 04 00                          ........
        backtrace:
          [<0000000068427125>] __kmalloc+0x46/0x1b0
          [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
          [<000000006e631aee>] raydium_i2c_initialize.cold+0xbc/0x3e4 [raydium_i2c_ts]
          [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
          [<00000000a310de16>] i2c_device_probe+0x651/0x680
          [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
          [<00000000096ba499>] __driver_probe_device+0xe3/0x170
          [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
          [<00000000264fe082>] __device_attach_driver+0xf7/0x150
          [<00000000f919423c>] bus_for_each_drv+0x114/0x180
          [<00000000e067feca>] __device_attach+0x1e5/0x2d0
          [<0000000054301fc2>] bus_probe_device+0x126/0x140
          [<00000000aad93b22>] device_add+0x810/0x1130
          [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
          [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
          [<00000000ffec4177>] of_i2c_notify+0x100/0x160
      unreferenced object 0xffff88812d3675c8 (size 8):
        comm "python3", pid 349, jiffies 4294741070 (age 95.692s)
        hex dump (first 8 bytes):
          22 00 36 2d 81 88 ff ff                          ".6-....
        backtrace:
          [<0000000068427125>] __kmalloc+0x46/0x1b0
          [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
          [<000000001d5c9620>] raydium_i2c_initialize.cold+0x223/0x3e4 [raydium_i2c_ts]
          [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
          [<00000000a310de16>] i2c_device_probe+0x651/0x680
          [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
          [<00000000096ba499>] __driver_probe_device+0xe3/0x170
          [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
          [<00000000264fe082>] __device_attach_driver+0xf7/0x150
          [<00000000f919423c>] bus_for_each_drv+0x114/0x180
          [<00000000e067feca>] __device_attach+0x1e5/0x2d0
          [<0000000054301fc2>] bus_probe_device+0x126/0x140
          [<00000000aad93b22>] device_add+0x810/0x1130
          [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
          [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
          [<00000000ffec4177>] of_i2c_notify+0x100/0x160

    After BANK_SWITCH command from i2c BUS, no matter success or error
    happened, the tx_buf should be freed.

    Fixes: 3b384bd6c3f2 ("Input: raydium_ts_i2c - do not split tx transactions")
    Signed-off-by: Zhang Xiaoxu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>

commit a1e9185d20b56af04022d2e656802254f4ea47eb
Merge: c290db013742 b47068b4aa53
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:40:35 2022 -0800

    Merge tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

    Pull sound fixes from Takashi Iwai:
     "Likely the last piece for 6.1; the only significant fixes are ASoC
      core ops fixes, while others are device-specific (rather minor) fixes
      in ASoC and FireWire drivers.

      All appear safe enough to take as a late stage material"

    * tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
      ALSA: dice: fix regression for Lexicon I-ONIX FW810S
      ASoC: cs42l51: Correct PGA Volume minimum value
      ASoC: ops: Correct bounds check for second channel on SX controls
      ASoC: tlv320adc3xxx: Fix build error for implicit function declaration
      ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx()
      ASoC: ops: Fix bounds check for _sx controls
      ASoC: fsl_micfil: explicitly clear CHnF flags
      ASoC: fsl_micfil: explicitly clear software reset bit

commit c290db013742e98fe5b64073bc2dd8c8a2ac9e4c
Merge: bdaa78c6aa86 c082fbd687ad
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 15:35:21 2022 -0800

    Merge tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm

    Pull drm fixes from Dave Airlie:
     "Things do seem to have finally settled down, just four i915 and one
      amdgpu this week. Probably won't have much for next week if you do
      push rc8 out.

      i915:
       - Fix dram info readout
       - Remove non-existent pipes from bigjoiner pipe mask
       - Fix negative value passed as remaining time
       - Never return 0 if not all requests retired

      amdgpu:
       - VCN fix for vangogh"

    * tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm:
      drm/amdgpu: enable Vangogh VCN indirect sram mode
      drm/i915: Never return 0 if not all requests retired
      drm/i915: Fix negative value passed as remaining time
      drm/i915: Remove non-existent pipes from bigjoiner pipe mask
      drm/i915/mtl: Fix dram info readout

commit bdaa78c6aa861f0e8c612a0b2272423d92f0071c
Merge: 6647e76ab623 1d351f189434
Author: Linus Torvalds <[email protected]>
Date:   Fri Dec 2 13:39:38 2022 -0800

    Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

    Pull misc hotfixes from Andrew Morton:
     "15 hotfixes,  11 marked cc:stable.

      Only three or four of the latter address post-6.0 issues, which is
      hopefully a sign that things are converging"

    * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
      revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
      Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
      drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
      mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
      mm/khugepaged: fix GUP-fast interaction by sending IPI
      mm/khugepaged: take the right locks for page table retraction
      mm: migrate: fix THP's mapcount on isolation
      mm: introduce arch_has_hw_nonleaf_pmd_young()
      mm: add dummy pmd_young() for architectures not having it
      mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
      tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
      nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
      hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
      madvise: use zap_page_range_single for madvise dontneed
      mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE

commit 6647e76ab623b2b3fb2efe03a86e9c9046c52c33
Author: Linus Torvalds <[email protected]>
Date:   Wed Nov 30 16:10:52 2022 -0800

    v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails

    The V4L2_MEMORY_USERPTR interface is long deprecated and shouldn't be
    used (and is discouraged for any modern v4l drivers).  And Seth Jenkins
    points out that the fallback to VM_PFNMAP/VM_IO is fundamentally racy
    and dangerous.

    Note that it's not even a case that should trigger, since any normal
    user pointer logic ends up just using the pin_user_pages_fast() call
    that does the proper page reference counting.  That's not the problem
    case, only if you try to use special device mappings do you have any
    issues.

    Normally I'd just remove this during the merge window, but since Seth
    pointed out the problem cases, we really want to know as soon as
    possible if there are actually any users of this odd special case of a
    legacy interface.  Neither Hans nor Mauro seem to think that such
    mis-uses of the old legacy interface should exist.  As Mauro says:

     "See, V4L2 has actually 4 streaming APIs:
            - Kernel-allocated mmap (usually referred simply as just mmap);
            - USERPTR mmap;
            - read();
            - dmabuf;

      The USERPTR is one of the oldest way to use it, coming from V4L
      version 1 times, and by far the least used one"

    And Hans chimed in on the USERPTR interface:

     "To be honest, I wouldn't mind if it goes away completely, but that's a
      bit of a pipe dream right now"

    but while removing this legacy interface entirely may be a pipe dream we
    can at least try to remove the unlikely (and actively broken) case of
    using special device mappings for USERPTR accesses.

    This replaces it with a WARN_ONCE() that we can remove once we've
    hopefully confirmed that no actual users exist.

    NOTE! Longer term, this means that a 'struct frame_vector' only ever
    contains proper page pointers, and all the games we have with converting
    them to pages can go away (grep for 'frame_vector_to_pages()' and the
    uses of 'vec->is_pfns').  But this is just the first step, to verify
    that this code really is all dead, and do so as quickly as possible.

    Reported-by: Seth Jenkins <[email protected]>
    Acked-by: Hans Verkuil <[email protected]>
    Acked-by: Mauro Carvalho Chehab <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Jan Kara <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit d0f411c0b9bdef85f647e15a2fcc790b29891f2c
Merge: 7d4a93176e01 899d2a05dc14
Author: Jens Axboe <[email protected]>
Date:   Fri Dec 2 08:01:06 2022 -0700

    Merge tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme into block-6.1

    Pull NVMe fixes from Christoph:

    "nvme fixes for Linux 6.1

     - fix SRCU protection of nvme_ns_head list (Caleb Sander)
     - clear the prp2 field when not used (Lei Rao)"

    * tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme:
      nvme: fix SRCU protection of nvme_ns_head list
      nvme-pci: clear the prp2 field when not used

commit 4bedbbd782ebbe7287231fea862c158d4f08a9e3
Author: Xiongfeng Wang <[email protected]>
Date:   Thu Dec 1 12:01:27 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()

    for_each_pci_dev() is implemented by pci_get_device(). The comment of
    pci_get_device() says that it will increase the reference count for the
    returned pci_dev and also decrease the reference count for the input
    pci_dev @from if it is not NULL.

    If we break for_each_pci_dev() loop with pdev not NULL, we need to call
    pci_dev_put() to decrease the reference count. Add the missing
    pci_dev_put() for the error path to avoid reference count leak.

    Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array")
    Signed-off-by: Xiongfeng Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit afca9e19cc720bfafc75dc5ce429c185ca93f31d
Author: Xiongfeng Wang <[email protected]>
Date:   Thu Dec 1 12:01:26 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in has_external_pci()

    for_each_pci_dev() is implemented by pci_get_device(). The comment of
    pci_get_device() says that it will increase the reference count for the
    returned pci_dev and also decrease the reference count for the input
    pci_dev @from if it is not NULL.

    If we break for_each_pci_dev() loop with pdev not NULL, we need to call
    pci_dev_put() to decrease the reference count. Add the missing
    pci_dev_put() before 'return true' to avoid reference count leak.

    Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
    Signed-off-by: Xiongfeng Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit 6927d352380797ddbee18631491ec428741696e2
Author: Yang Yingliang <[email protected]>
Date:   Thu Dec 1 12:01:25 2022 +0800

    iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()

    As comment of pci_get_domain_bus_and_slot() says, it returns a pci device
    with refcount increment, when finish using it, the caller must decrease
    the reference count by calling pci_dev_put(). So call pci_dev_put() after
    using the 'pdev' to avoid refcount leak.

    Besides, if the 'pdev' is null or intel_svm_prq_report() returns error,
    there is no need to trace this fault.

    Fixes: 06f4b8d09dba ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path")
    Suggested-by: Lu Baolu <[email protected]>
    Signed-off-by: Yang Yingliang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit e65a6897be5e4939d477c4969a05e12d90b08409
Author: Jacob Pan <[email protected]>
Date:   Thu Dec 1 12:01:24 2022 +0800

    iommu/vt-d: Add a fix for devices need extra dtlb flush

    QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
    address translation service (ATS). These devices may inadvertently issue
    ATS invalidation completion before posted writes initiated with
    translated address that utilized translations matching the invalidation
    address range, violating the invalidation completion ordering.

    This patch adds an extra device TLB invalidation for the affected devices,
    it is needed to ensure no more posted writes with translated address
    following the invalidation completion. Therefore, the ordering is
    preserved and data-corruption is prevented.

    Device TLBs are invalidated under the following six conditions:
    1. Device driver does DMA API unmap IOVA
    2. Device driver unbind a PASID from a process, sva_unbind_device()
    3. PASID is torn down, after PASID cache is flushed. e.g. process
    exit_mmap() due to crash
    4. Under SVA usage, called by mmu_notifier.invalidate_range() where
    VM has to free pages that were unmapped
    5. userspace driver unmaps a DMA buffer
    6. Cache invalidation in vSVA usage (upcoming)

    For #1 and #2, device drivers are responsible for stopping DMA traffic
    before unmap/unbind. For #3, iommu driver gets mmu_notifier to
    invalidate TLB the same way as normal user unmap which will do an extra
    invalidation. The dTLB invalidation after PASID cache flush does not
    need an extra invalidation.

    Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
    covered by this patch due to common code path with #5.

    Tested-by: Yuzhang Luo <[email protected]>
    Reviewed-by: Ashok Raj <[email protected]>
    Reviewed-by: Kevin Tian <[email protected]>
    Signed-off-by: Jacob Pan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>

commit c082fbd687ad70a92e0a8be486a7555a66f03079
Merge: 65a388250e39 9a8cc8cabc1e
Author: Dave Airlie <[email protected]>
Date:   Fri Dec 2 09:12:46 2022 +1000

    Merge tag 'amd-drm-fixes-6.1-2022-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

    amd-drm-fixes-6.1-2022-12-01:

    amdgpu:
    - VCN fix for vangogh

    Signed-off-by: Dave Airlie <[email protected]>
    From: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit d36678f7905cbd1dc55a8a96e066dafd749d4600
Author: Andrew Lunn <[email protected]>
Date:   Thu Nov 10 00:59:02 2022 +0100

    i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set

    Recent changes to the DMA code has resulting in the IMX driver failing
    I2C transfers when the buffer has been vmalloc. Only perform DMA
    transfers if the message has the I2C_M_DMA_SAFE flag set, indicating
    the client is providing a buffer which is DMA safe.

    This is a minimal fix for stable. The I2C core provides helpers to
    allocate a bounce buffer. For a fuller fix the master should make use
    of these helpers.

    Fixes: 4544b9f25e70 ("dma-mapping: Add vmap checks to dma_map_single()")
    Signed-off-by: Andrew Lunn <[email protected]>
    Acked-by: Oleksij Rempel <[email protected]>
    Signed-off-by: Wolfram Sang <[email protected]>

commit 7d8ccf4f117d082156e842d959f634efcf203cef
Author: Wang Yufen <[email protected]>
Date:   Mon Nov 21 18:17:52 2022 +0800

    i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer

    Fix to return a negative error code from the gi2c->err instead of
    0.

    Fixes: d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
    Signed-off-by: Wang Yufen <[email protected]>
    Reviewed-by: Tommaso Merciai <[email protected]>
    Signed-off-by: Wolfram Sang <[email protected]>

commit 8bfd4ec726945cec35ee5cafebc1c55b83d5a9fb
Author: Carsten Haitzler <[email protected]>
Date:   Mon Nov 28 10:51:58 2022 +0000

    i2c: cadence: Fix regression with bus recovery

    Commit "i2c: cadence: Add standard bus recovery support" breaks for i2c
    devices that have no pinctrl defined. There is no requirement for this
    to exist in the DT. This has worked perfectly well without this before in
    at least 1 real usage case on hardware (Mali Komeda DPU, Cadence i2c to
    talk to a tda99xx phy). Adding the requirement to have pinctrl set up in
    the device tree (or otherwise be found) is a regression where the whole
    i2c device is lost entirely (in this case dropping entire devices which
    then leads to the drm display stack unable to find the phy for display
    output, thus having no drm display device and so on down the chain).

    This converts the above commit to an enhancement if pinctrl can be found
    for the i2c device, providing a timeout on read with recovery, but if not,
    do what used to be done rather than a fatal loss of a device.

    This restores the mentioned display devices to their working state again.

    Fixes: 58b924241d0a ("i2c: cadence: Add standard bus recovery support")
    Signed-off-by: Carsten Haitzler <[email protected]>
    Reviewed-by: Shubhrajyoti Datta <[email protected]>
    Reviewed-by: Michael Grzeschik <[email protected]>
    Acked-by: Michal Simek <[email protected]>
    [wsa: added braces to else-branch]
    Signed-off-by: Wolfram Sang <[email protected]>

commit 65a388250e391b7127efd6751f64f3e4955e43a0
Merge: b7b275e60bcd 12b8b046e4c9
Author: Dave Airlie <[email protected]>
Date:   Fri Dec 2 07:34:27 2022 +1000

    Merge tag 'drm-intel-fixes-2022-12-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

    - Fix dram info readout (Radhakrishna Sripada)
    - Remove non-existent pipes from bigjoiner pipe mask (Ville Syrjälä)
    - Fix negative value passed as remaining time (Janusz Krzysztofik)
    - Never return 0 if not all requests retired (Janusz Krzysztofik)

    Signed-off-by: Dave Airlie <[email protected]>
    From: Tvrtko Ursulin <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/Y4hp+a3TJ13t2ZA1@tursulin-desk

commit a4412fdd49dc011bcc2c0d81ac4cab7457092650
Author: Steven Rostedt (Google) <[email protected]>
Date:   Mon Nov 21 10:44:03 2022 -0500

    error-injection: Add prompt for function error injection

    The config to be able to inject error codes into any function annotated
    with ALLOW_ERROR_INJECTION() is enabled when FUNCTION_ERROR_INJECTION is
    enabled.  But unfortunately, this is always enabled on x86 when KPROBES
    is enabled, and there's no way to turn it off.

    As kprobes is useful for observability of the kernel, it is useful to
    have it enabled in production environments.  But error injection should
    be avoided.  Add a prompt to the config to allow it to be disabled even
    when kprobes is enabled, and get rid of the "def_bool y".

    This is a kernel debug feature (it's in Kconfig.debug), and should have
    never been something enabled by default.

    Cc: [email protected]
    Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 9a8cc8cabc1e351614fd7f9e774757a5143b6fe8
Author: Leo Liu <[email protected]>
Date:   Tue Nov 29 18:53:18 2022 -0500

    drm/amdgpu: enable Vangogh VCN indirect sram mode

    So that uses PSP to initialize HW.

    Fixes: 0c2c02b66c672e ("drm/amdgpu/vcn: add firmware support for dimgrey_cavefish")
    Signed-off-by: Leo Liu <[email protected]>
    Reviewed-by: James Zhu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]

commit 39cefc5f6cd25d555e0455b24810e9aff365b8d6
Merge: d556a9aeb62a 7e1864332fbc
Author: Palmer Dabbelt <[email protected]>
Date:   Thu Dec 1 11:38:39 2022 -0800

    RISC-V: Fix a race condition during kernel stack overflow

    This fixes a concrete bug but is also the basis for some cleanup work,
    so I'm merging it based on the offending commit in order to minimize
    future conflicts.

    * commit '7e1864332fbc1b993659eab7974da9fe8bf8c128':
      riscv: fix race when vmap stack overflow

commit 355479c70a489415ef6e2624e514f8f15a40861b
Merge: e214dd935bf1 7572ac3c979d
Author: Linus Torvalds <[email protected]>
Date:   Thu Dec 1 11:25:11 2022 -0800

    Merge tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

    Pull EFI fix from Ard Biesheuvel:
     "A single revert for some code that I added during this cycle. The code
      is not wrong, but it should be a bit more careful about how to handle
      the shadow call stack pointer, so it is better to revert it for now
      and bring it back later in improved form.

      Summary:

       - Revert runtime service sync exception recovery on arm64"

    * tag 'efi-fixes-for-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
      arm64: efi: Revert "Recover from synchronous exceptions ..."

commit e214dd935bf154af4c2d5013b16a283d592ccea5
Merge: 063c0e773ab3 9bdc112be727
Author: Linus Torvalds <[email protected]>
Date:   Thu Dec 1 11:10:25 2022 -0800

    Merge tag 'hwmon-for-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

    Pull hwmon fixes from Guenter Roeck:

     - Fix refcount leak and NULL pointer access in coretemp driver

     - Fix UAF in ibmpex driver

     - Add missing pci_disable_device() to i5500_temp driver

     - Fix calculation in ina3221 driver

     - Fix temperature scaling in ltc2947 driver

     - Check result of devm_kcalloc call in asus-ec-sensors driver

    * tag 'hwmon-for-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
      hwmon: (asus-ec-sensors) Add checks for devm_kcalloc
      hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()
      hwmon: (coretemp) Check for null before removing sysfs attrs
      hwmon: (ibmpex) Fix possible UAF when ibmpex_register_bmc() fails
      hwmon: (i5500_temp) fix missing pci_disable_device()
      hwmon: (ina3221) Fix shunt sum critical calculation
      hwmon: (ltc2947) fix temperature scaling

commit 9bdc112be727cf1ba65be79541147f960c3349d8
Author: Yuan Can <[email protected]>
Date:   Fri Nov 25 01:43:29 2022 +0000

    hwmon: (asus-ec-sensors) Add checks for devm_kcalloc

    As the devm_kcalloc may return NULL, the return value needs to be checked
    to avoid NULL poineter dereference.

    Fixes: d0ddfd241e57 ("hwmon: (asus-ec-sensors) add driver for ASUS EC")
    Signed-off-by: Yuan Can <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>

commit 7dec14537c5906b8bf40fd6fd6d9c3850f8df11d
Author: Yang Yingliang <[email protected]>
Date:   Fri Nov 18 17:33:03 2022 +0800

    hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()

    As comment of pci_get_domain_bus_and_slot() says, it returns
    a pci device with refcount increment, when finish using it,
    the caller must decrement the reference count by calling
    pci_dev_put(). So call it after using to avoid refcount leak.

    Fixes: 14513ee696a0 ("hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary")
    Signed-off-by: Yang Yingliang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>

commit a89ff5f5cc64b9fe7a992cf56988fd36f56ca82a
Author: Phil Auld <[email protected]>
Date:   Thu Nov 17 11:23:13 2022 -0500

    hwmon: (coretemp) Check for null before removing sysfs attrs

    If coretemp_add_core() gets an error then pdata->core_data[indx]
    is already NULL and has been kfreed. Don't pass that to
    sysfs_remove_group() as that will crash in sysfs_remove_group().

    [Shortened for readability]
    [91854.020159] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp20_label'
    <cpu offline>
    [91855.126115] BUG: kernel NULL pointer dereference, address: 0000000000000188
    [91855.165103] #PF: supervisor read access in kernel mode
    [91855.194506] #PF: error_code(0x0000) - not-present page
    [91855.224445] PGD 0 P4D 0
    [91855.238508] Oops: 0000 [#1] PREEMPT SMP PTI
    ...
    [91855.342716] RIP: 0010:sysfs_remove_group+0xc/0x80
    ...
    [91855.796571] Call Trace:
    [91855.810524]  coretemp_cpu_offline+0x12b/0x1dd [coretemp]
    [91855.841738]  ? coretemp_cpu_online+0x180/0x180 [coretemp]
    [91855.871107]  cpuhp_invoke_callback+0x105/0x4b0
    [91855.893432]  cpuhp_thread_fun+0x8e/0x150
    ...

    Fix this by checking for NULL first.

    Signed-off-by: Phil Auld <[email protected]>
    Cc: [email protected]
    Cc: Fenghua Yu <[email protected]>
    Cc: Jean Delvare <[email protected]>
    Cc: Guenter Roeck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 199e0de7f5df3 ("hwmon: (coretemp) Merge pkgtemp with coretemp")
    Signed-off-by: Guenter Roeck <[email protected]>

commit 7572ac3c979d4d0fb42d73a72d2608656516ff4f
Author: Ard Biesheuvel <[email protected]>
Date:   Wed Nov 30 17:37:17 2022 +0100

    arm64: efi: Revert "Recover from synchronous exceptions ..."

    This reverts commit 23715a26c8d81291, which introduced some code in
    assembler that manipulates both the ordinary and the shadow call stack
    pointer in a way that could potentially be taken advantage of. So let's
    revert it, and do a better job the next time around.

    Signed-off-by: Ard Biesheuvel <[email protected]>

commit d9f15a9de44affe733e34f93bc184945ba277e6d
Author: Conor Dooley <[email protected]>
Date:   Tue Nov 22 12:16:21 2022 +0000

    Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"

    This reverts commit 232ccac1bd9b5bfe73895f527c08623e7fa0752d.

    On the subject of suspend, the RISC-V SBI spec states:

      This does not cover whether any given events actually reach the hart or
      not, just what the hart will do if it receives an event. On PolarFire
      SoC, and potentially other SiFive based implementations, events from the
      RISC-V timer do reach a hart during suspend. This is not the case for the
      implementation on the Allwinner D1 - there timer events are not received
      during suspend.

    To fix this, the CLOCK_EVT_FEAT_C3STOP (mis)feature was enabled for the
    timer driver - but this has broken both RCU stall detection and timers
    generally on PolarFire SoC and potentially other SiFive based
    implementations.

    If an AXI read to the PCIe controller on PolarFire SoC times out, the
    system will stall, however, with CLOCK_EVT_FEAT_C3STOP active, the system
    just locks up without RCU stalling:

    	io scheduler mq-deadline registered
    	io scheduler kyber registered
    	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
    	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: axi read request error
    	microchip-pcie 2000000000.pcie: axi read timeout
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
    	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
    	Freeing initrd memory: 7332K

    Similarly issues were reported with clock_nanosleep() - with a test app
    that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250 & the blamed
    commit in place, the sleep times are rounded up to the next jiffy:

    == CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
    Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
    Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
    Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
    Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
    Samples: 521      Samples: 521      Samples: 521      Samples: 521

    Fortunately, the D1 has a second timer, which is "currently used in
    preference to the RISC-V/SBI timer driver" so a revert here does not
    hurt operation of D1 in its current form.

    Ultimately, a DeviceTree property (or node) will be added to encode the
    behaviour of the timers, but until then revert the addition of
    CLOCK_EVT_FEAT_C3STOP.

    Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
    Signed-off-by: Conor Dooley <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Palmer Dabbelt <[email protected]>
    Acked-by: Palmer Dabbelt <[email protected]>
    Acked-by: Samuel Holland <[email protected]>
    Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
    Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98/
    Link: https://lore.kernel.org/linux-riscv/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]

commit dd30dcfa7a74a06f8dcdab260d8d5adf32f17333
Author: Wenchao Chen <[email protected]>
Date:   Wed Nov 30 20:13:28 2022 +0800

    mmc: sdhci-sprd: Fix no reset data and command after voltage switch

    After switching the voltage, no reset data and command will cause
    CMD2 timeout.

    Fixes: 29ca763fc26f ("mmc: sdhci-sprd: Add pin control support for voltage switch")
    Signed-off-by: Wenchao Chen <[email protected]>
    Acked-by: Adrian Hunter <[email protected]>
    Reviewed-by: Baolin Wang <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>

commit 063c0e773ab33103407a76051821777014b756fe
Merge: ef4d3ea40565 f6abcc21d943
Author: Linus Torvalds <[email protected]>
Date:   Wed Nov 30 15:46:46 2022 -0800

    Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

    Pull clk fixes from Stephen Boyd:
     "A set of clk driver fixes that resolve issues for various SoCs.

      Most of these are incorrect clk data, like bad parent descriptions.
      When the clk tree is improperly described things don't work, like USB
      and UFS controllers, because clk frequencies are wonky. Here are the
      extra details:

       - Fix the parent of UFS reference clks on Qualcomm SC8280XP so that
         UFS works properly

       - Fix the clk ID for USB on AT91 RM9200 so the USB driver continues
         to probe

       - Stop using of_device_get_match_data() on the wrong device for a
         Samsung Exynos driver so it gets the proper clk data

       - Fix ExynosAutov9 binding

       - Fix the parent of the div4 clk on Exynos7885

       - Stop calling runtime PM APIs from the Qualcomm GDSC driver directly
         as it leads to a lockdep splat and is just plain wrong because it
         violates runtime PM semantics by calling runtime PM APIs when the
         device has been runtime PM disabled"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
      clk: qcom: gcc-sc8280xp: add cxo as parent for three ufs ref clks
      ARM: at91: rm9200: fix usb device clock id
      clk: samsung: Revert "clk: samsung: exynos-clkout: Use of_device_get_match_data()"
      dt-bindings: clock: exynosautov9: fix reference to CMU_FSYS1
      clk: qcom: gdsc: Remove direct runtime PM calls
      clk: samsung: exynos7885: Correct "div4" clock parents

commit 1d351f1894342c378b96bb9ed89f8debb1e24e9f
Author: Andrew Morton <[email protected]>
Date:   Mon Nov 28 13:21:40 2022 -0800

    revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"

    It causes build failures with unusual CC/HOSTCC combinations.

    Quoting
    https://lkml.kernel.org/r/[email protected]:

      HOSTCC  scripts/mod/modpost.o - due to target missing
    In file included from include/linux/string.h:5,
                     from scripts/mod/../../include/linux/license.h:5,
                     from scripts/mod/modpost.c:24:
    include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory
      246 | #include <asm/rwonce.h>
          |          ^~~~~~~~~~~~~~
    compilation terminated.

    ...

    The problem is that HOSTCC is not necessarily the same compiler or even
    architecture as CC and pulling in <linux/compiler.h> or <asm/rwonce.h>
    files indirectly isn't a good idea then.

    My toolchain is providing HOSTCC = gcc (MacPorts) and CC = arm-linux-gnueabihf
    (built from gcc source) and all running on Darwin.

    If I change the include to <string.h> I can then "HOSTCC scripts/mod/modpost.c"
    but then it fails for "CC kernel/module/main.c" not finding <string.h>:

      CC      kernel/module/main.o - due to target missing
    In file included from kernel/module/main.c:43:0:
    ./include/linux/license.h:5:20: fatal error: string.h: No such file or directory
     #include <string.h>
                        ^
    compilation terminated.

    Reported-by: "H. Nikolaus Schaller" <[email protected]>
    Cc: Sam James <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 152fe65f300e1819d59b80477d3e0999b4d5d7d2
Author: Lee Jones <[email protected]>
Date:   Fri Nov 25 12:07:50 2022 +0000

    Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled

    When enabled, KASAN enlarges function's stack-frames.  Pushing quite a few
    over the current threshold.  This can mainly be seen on 32-bit
    architectures where the present limit (when !GCC) is a lowly 1024-Bytes.

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Cc: Leo Li <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Nathan Chancellor <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: "Pan, Xinhui" <[email protected]>
    Cc: Rodrigo Siqueira <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: Tom Rix <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 6f6cb1714365a07dbc66851879538df9f6969288
Author: Lee Jones <[email protected]>
Date:   Fri Nov 25 12:07:49 2022 +0000

    drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame

    Patch series "Fix a bunch of allmodconfig errors", v2.

    Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST")
    WERROR now defaults to COMPILE_TEST meaning that it's enabled for
    allmodconfig builds.  This leads to some interesting build failures when
    using Clang, each resolved in this set.

    With this set applied, I am able to obtain a successful allmodconfig Arm
    build.

    This patch (of 2):

    calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
    ARM64) architectures built with Clang (all released versions), whereby the
    stack frame gets blown up to well over 5k.  This would cause an immediate
    kernel panic on most architectures.  We'll revert this when the following
    bug report has been resolved:
    https://github.com/llvm/llvm-project/issues/41896.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Lee Jones <[email protected]>
    Suggested-by: Arnd Bergmann <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: "Christian König" <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Cc: Lee Jones <[email protected]>
    Cc: Leo Li <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Nathan Chancellor <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: "Pan, Xinhui" <[email protected]>
    Cc: Rodrigo Siqueira <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: Tom Rix <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit f268f6cf875f3220afc77bdd0bf1bb136eb54db9
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:14 2022 +0100

    mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths

    Any codepath that zaps page table entries must invoke MMU notifiers to
    ensure that secondary MMUs (like KVM) don't keep accessing pages which
    aren't mapped anymore.  Secondary MMUs don't hold their own references to
    pages that are mirrored over, so failing to notify them can lead to page
    use-after-free.

    I'm marking this as addressing an issue introduced in commit f3f0e1d2150b
    ("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of
    the security impact of this only came in commit 27e1f8273113 ("khugepaged:
    enable collapse pmd for pte-mapped THP"), which actually omitted flushes
    for the removal of present PTEs, not just for the removal of empty page
    tables.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
    Signed-off-by: Jann Horn <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 2ba99c5e08812494bc57f319fb562f527d9bacd8
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:13 2022 +0100

    mm/khugepaged: fix GUP-fast interaction by sending IPI

    Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
    collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
    ensure that the page table was not removed by khugepaged in between.

    However, lockless_pages_from_mm() still requires that the page table is
    not concurrently freed.  Fix it by sending IPIs (if the architecture uses
    semi-RCU-style page table freeing) before freeing/reusing page tables.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: ba76149f47d8 ("thp: khugepaged")
    Signed-off-by: Jann Horn <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 8d3c106e19e8d251da31ff4cc7462e4565d65084
Author: Jann Horn <[email protected]>
Date:   Fri Nov 25 22:37:12 2022 +0100

    mm/khugepaged: take the right locks for page table retraction

    pagetable walks on address ranges mapped by VMAs can be done under the
    mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
    VMA's address_space.  Only one of these needs to be held, and it does not
    need to be held in exclusive mode.

    Under those circumstances, the rules for concurrent access to page table
    entries are:

     - Terminal page table entries (entries that don't point to another page
       table) can be arbitrarily changed under the page table lock, with the
       exception that they always need to be consistent for
       hardware page table walks and lockless_pages_from_mm().
       This includes that they can be changed into non-terminal entries.
     - Non-terminal page table entries (which point to another page table)
       can not be modified; readers are allowed to READ_ONCE() an entry, verify
       that it is non-terminal, and then assume that its value will stay as-is.

    Retracting a page table involves modifying a non-terminal entry, so
    page-table-level locks are insufficient to protect against concurrent page
    table traversal; it requires taking all the higher-level locks under which
    it is possible to start a page walk in the relevant range in exclusive
    mode.

    The collapse_huge_page() path for anonymous THP already follows this rule,
    but the shmem/file THP path was getting it wrong, making it possible for
    concurrent rmap-based operations to cause corruption.

    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
    Signed-off-by: Jann Horn <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: John Hubbard <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 829ae0f81ce093d674ff2256f66a714753e9ce32
Author: Gavin Shan <[email protected]>
Date:   Thu Nov 24 17:55:23 2022 +0800

    mm: migrate: fix THP's mapcount on isolation

    The issue is reported when removing memory through virtio_mem device.  The
    transparent huge page, experienced copy-on-write fault, is wrongly
    regarded as pinned.  The transparent huge page is escaped from being
    isolated in isolate_migratepages_block().  The transparent huge page can't
    be migrated and the corresponding memory block can't be put into offline
    state.

    Fix it by replacing page_mapcount() with total_mapcount().  With this, the
    transparent huge page can be isolated and migrated, and the memory block
    can be put into offline state.  Besides, The page's refcount is increased
    a bit earlier to avoid the page is released when the check is executed.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
    Signed-off-by: Gavin Shan <[email protected]>
    Reported-by: Zhenyu Zhang <[email protected]>
    Tested-by: Zhenyu Zhang <[email protected]>
    Suggested-by: David Hildenbrand <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Alistair Popple <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Matthew Wilcox <[email protected]>
    Cc: William Kucharski <[email protected]>
    Cc: Zi Yan <[email protected]>
    Cc: <[email protected]>	[5.7+]
    Signed-off-by: Andrew Morton <[email protected]>

commit 4aaf269c768dbacd6268af73fda2ffccaa3f1d88
Author: Juergen Gross <[email protected]>
Date:   Wed Nov 23 07:45:10 2022 +0100

    mm: introduce arch_has_hw_nonleaf_pmd_young()

    When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add
    CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
    pmdp_test_and_clear_young():

     BUG: unable to handle page fault for address: ffff8880083374d0
     #PF: supervisor write access in kernel mode
     #PF: error_code(0x0003) - permissions violation
     PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
     Oops: 0003 [#1] PREEMPT SMP NOPTI
     CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
     RIP: e030:pmdp_test_and_clear_young+0x25/0x40

    This happens because the Xen hypervisor can't emulate direct writes to
    page table entries other than PTEs.

    This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
    similar to arch_has_hw_pte_young() and test that instead of
    CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
    Signed-off-by: Juergen Gross <[email protected]>
    Reported-by: Sander Eikelenboom <[email protected]>
    Acked-by: Yu Zhao <[email protected]>
    Tested-by: Sander Eikelenboom <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>	[core changes]
    Signed-off-by: Andrew Morton <[email protected]>

commit 6617da8fb565445e0be4a4885443006374943d09
Author: Juergen Gross <[email protected]>
Date:   Wed Nov 30 14:49:41 2022 -0800

    mm: add dummy pmd_young() for architectures not having it

    In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
    fallback.  This is required for the later patch "mm: introduce
    arch_has_hw_nonleaf_pmd_young()".

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Juergen Gross <[email protected]>
    Acked-by: Yu Zhao <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Geert Uytterhoeven <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Sander Eikelenboom <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit 95bc35f9bee5220dad4e8567654ab3288a181639
Author: SeongJae Park <[email protected]>
Date:   Tue Nov 22 19:48:31 2022 +0000

    mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()

    Commit da87878010e5 ("mm/damon/sysfs: support online inputs update") made
    'damon_sysfs_set_schemes()' to be called for running DAMON context, which
    could have schemes.  In the case, DAMON sysfs interface is supposed to
    update, remove, or add schemes to reflect the sysfs files.  However, the
    code is assuming the DAMON context wouldn't have schemes at all, and
    therefore creates and adds new schemes.  As a result, the code doesn't
    work as intended for online schemes tuning and could have more than
    expected memory footprint.  The schemes are all in the DAMON context, so
    it doesn't leak the memory, though.

    Remove the wrong asssumption (the DAMON context wouldn't have schemes) in
    'damon_sysfs_set_schemes()' to fix the bug.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
    Signed-off-by: SeongJae Park <[email protected]>
    Cc: <[email protected]>	[5.19+]
    Signed-off-by: Andrew Morton <[email protected]>

commit a435874bf626f55d7147026b059008c8de89fbb8
Author: Tiezhu Yang <[email protected]>
Date:   Sat Nov 19 10:36:59 2022 +0800

    tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"

    The latest version of grep claims the egrep is now obsolete so the build
    now contains warnings that look like:

    	egrep: warning: egrep is obsolescent; using grep -E

    fix this up by moving the related file to use "grep -E" instead.

      sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`

    Here are the steps to install the latest grep:

      wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
      tar xf grep-3.8.tar.gz
      cd grep-3.8 && ./configure && make
      sudo make install
      export PATH=/usr/local/bin:$PATH

    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Tiezhu Yang <[email protected]>
    Reviewed-by: Sergey Senozhatsky <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

commit f0a0ccda18d6fd826d7c7e7ad48a6ed61c20f8b4
Author: ZhangPeng <[email protected]>
Date:   Sat Nov 19 21:05:42 2022 +0900

    nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()

    Syzbot reported a null-ptr-deref bug:

     NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
     frequency < 30 seconds
     general protection fault, probably for non-canonical address
     0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
     KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
     CPU: 1…
earlAchromatic added a commit to earlAchromatic/linux that referenced this pull request Oct 4, 2023
* Merge tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Fix a double unlock bug on an error path in ext4, found by smatch and
  syzkaller"

* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix possible double unlock when moving a directory

* Merge tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "There's a little bit more 'movement' in there for my taste but it
  needs to happen and should make the code better after it.

   - Check cmdline_find_option()'s return value before further
     processing

   - Clear temporary storage in the resctrl code to prevent access to an
     unexistent MSR

   - Add a simple throttling mechanism to protect the hypervisor from
     potentially malicious SEV guests issuing requests in rapid
     succession.

     In order to not jeopardize the sanity of everyone involved in
     maintaining this code, the request issuing side has received a
     cleanup, split in more or less trivial, small and digestible
     pieces. Otherwise, the code was threatening to become an
     unmaintainable mess.

     Therefore, that cleanup is marked indirectly also for stable so
     that there's no differences between the upstream code and the
     stable variant when it comes down to backporting more there"

* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use of uninitialized buffer in sme_enable()
  x86/resctrl: Clear staged_config[] before and after it is used
  virt/coco/sev-guest: Add throttling awareness
  virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
  virt/coco/sev-guest: Do some code style cleanups
  virt/coco/sev-guest: Carve out the request issuing logic into a helper
  virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
  virt/coco/sev-guest: Simplify extended guest request handling
  virt/coco/sev-guest: Check SEV_SNP attribute at probe time

* Merge tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Check whether sibling events have been deactivated before adding them
   to groups

 - Update the proper event time tracking variable depending on the event
   type

 - Fix a memory overwrite issue due to using the wrong function argument
   when outputting perf events

* tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix check before add_event_to_groups() in perf_group_detach()
  perf: fix perf_event_context->time
  perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output

* xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags

Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space.  There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.

This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled.  I observed
the following sequence as part of an xfs_dir_createname call:

0. Fragment the free space, then allocate nearly all the free space in
   all AGs except AG 0.

1. Create a directory in AG 2 and let it grow for a while.

2. Try to allocate 2 blocks to expand the dirent part of a directory.
   The space will be allocated out of AG 0, but the allocation will not
   be contiguous.  This (I think) activates the LOWMODE allocator.

3. The bmapi call decides to convert from extents to bmbt format and
   tries to allocate 1 block.  This allocation request calls
   xfs_alloc_vextent_start_ag with the inode number, which starts the
   scan at AG 2.  We ignore AG 0 (with all its free space) and instead
   scrape AG 2 and 3 for more space.  We find one block, but this now
   kicks t_highest_agno to 3.

4. The createname call decides it needs to split the dabtree.  It tries
   to allocate even more space with xfs_alloc_vextent_start_ag, but now
   we're constrained to AG 3, and we don't find the space.  The
   createname returns ENOSPC and the filesystem shuts down.

This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds.  Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* xfs: add tracepoints for each of the externally visible allocators

There are now five separate space allocator interfaces exposed to the
rest of XFS for five different strategies to find space.  Add
tracepoints for each of them so that I can tell from a trace dump
exactly which ones got called and what happened underneath them.  Add a
sixth so it's more obvious if an allocation actually happened.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* xfs: test dir/attr hash when loading module

Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
ext4 against generic/454.  The cause of this test failure was the
unfortunate combination of setting an xattr name containing UTF8 encoded
emoji, an xattr hash function that accepted a char pointer with no
explicit signedness, signed type extension of those chars to an int, and
the 6.2 build tools maintainers deciding to mandate -funsigned-char
across the board.  As a result, the ondisk extended attribute structure
written out by 6.1 and 6.2 were not the same.

This discrepancy, in fact, had been noticeable if a filesystem with such
an xattr were moved between any two architectures that don't employ the
same signedness of a raw "char" declaration.  The only reason anyone
noticed is that x86 gcc defaults to signed, and no such -funsigned-char
update was made to e2fsprogs, so e2fsck immediately started reporting
data corruption.

After a day and a half of discussing how to handle this use case (xattrs
with bit 7 set anywhere in the name) without breaking existing users,
Linus merged his own patch and didn't tell the maintainer.  None of the
ext4 developers realized this until AUTOSEL announced that the commit
had been backported to stable.

In the end, this problem could have been detected much earlier if there
had been any useful tests of hash function(s) in use inside ext4 to make
sure that they always produce the same outputs given the same inputs.

The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
vulnerable to this problem.  However, let's avoid all this drama by
adding our own self test to check that the da hash produces the same
outputs for a static pile of inputs on various platforms.  This enables
us to fix any breakage that may result in a controlled fashion.  The
buffer and test data are identical to the patches submitted to xfsprogs.

Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

* Merge tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fix from Borislav Petkov:

 - Flush out logged errors immediately after MCA banks configuration
   changes over sysfs have been done instead of waiting until something
   else triggers the workqueue later - another error or the polling
   interval cycle is reached

* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Make sure logged MCEs are processed after sysfs update

* cpumask: introduce for_each_cpu_or

Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* pcpcntrs: fix dying cpu summation race

In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all
interface") a race condition between a cpu dying and
percpu_counter_sum() iterating online CPUs was identified. The
solution was to iterate all possible CPUs for summation via
percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this
same race condition and it fired a debug assert because the
filesystem was unmounting and the counter *should* be zero just
before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/[email protected]/

likely as a result of running generic/648 which exercises
filesystems in the presence of CPU online/offline events.

The solution to use percpu_counter_sum_all() is an awful one. We
use percpu counters and percpu_counter_sum() for accurate and
reliable threshold detection for space management, so a summation
race condition during these operations can result in overcommit of
available space and that may result in filesystem shutdowns.

As percpu_counter_sum_all() iterates all possible CPUs rather than
just those online or even those present, the mask can include CPUs
that aren't even installed in the machine, or in the case of
machines that can hot-plug CPU capable nodes, even have physical
sockets present in the machine.

Fundamentally, this race condition is caused by the CPU being
offlined being removed from the cpu_online_mask before the notifier
that cleans up per-cpu state is run. Hence percpu_counter_sum() will
not sum the count for a cpu currently being taken offline,
regardless of whether the notifier has run or not. This is
the root cause of the bug.

The percpu counter notifier iterates all the registered counters,
locks the counter and moves the percpu count to the global sum.
This is serialised against other operations that move the percpu
counter to the global sum as well as percpu_counter_sum() operations
that sum the percpu counts while holding the counter lock.

Hence the notifier is safe to run concurrently with sum operations,
and the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do,
and when there are no CPUs dying, it has no addition overhead except
for a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in
the presence of CPU hot unplug events and makes
percpu_counter_sum_all() unnecessary. This, in turn, means that
filesystems like XFS, ext4, and btrfs don't have to work out when
they should use percpu_counter_sum() vs percpu_counter_sum_all() in
their space accounting algorithms

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* fork: remove use of percpu_counter_sum_all

This effectively reverts the change made in commit f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") as the
race condition percpu_counter_sum_all() was invented to avoid is
now handled directly in percpu_counter_sum() and nobody needs to
care about summing racing with cpu unplug anymore.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* pcpcntr: remove percpu_counter_sum_all()

percpu_counter_sum_all() is now redundant as the race condition it
was invented to handle is now dealt with by percpu_counter_sum()
directly and all users of percpu_counter_sum_all() have been
removed.

Remove it.

This effectively reverts the changes made in f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") except for
the cpumask iteration that fixes percpu_counter_sum() made earlier
in this series.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

* Merge tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are a few small char/misc/other driver subsystem patches to
  resolve reported problems for 6.3-rc3.

  Included in here are:

   - Interconnect driver fixes for reported problems

   - Memory driver fixes for reported problems

   - nvmem core fix

   - firmware driver fix for reported problem

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (23 commits)
  memory: tegra30-emc: fix interconnect registration race
  memory: tegra20-emc: fix interconnect registration race
  memory: tegra124-emc: fix interconnect registration race
  memory: tegra: fix interconnect registration race
  interconnect: exynos: drop redundant link destroy
  interconnect: exynos: fix registration race
  interconnect: exynos: fix node leak in probe PM QoS error path
  interconnect: qcom: msm8974: fix registration race
  interconnect: qcom: rpmh: fix registration race
  interconnect: qcom: rpmh: fix probe child-node error handling
  interconnect: qcom: rpm: fix registration race
  nvmem: core: return -ENOENT if nvmem cell is not found
  firmware: xilinx: don't make a sleepable memory allocation from an atomic context
  interconnect: qcom: rpm: fix probe child-node error handling
  interconnect: qcom: osm-l3: fix registration race
  interconnect: imx: fix registration race
  interconnect: fix provider registration API
  interconnect: fix icc_provider_del() error handling
  interconnect: fix mem leak when freeing nodes
  interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
  ...

* Merge tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 6.3-rc3 to resolve
  some reported issues.

  They include:

   - 8250 driver Kconfig issue pointed out by you that showed up in -rc1

   - qcom-geni serial driver fixes

   - various 8250 driver fixes for reported problems

   - fsl_lpuart driver fixes

   - serdev fix for regression in -rc1

   - vt.c bugfix

  All have been in linux-next for over a week with no reported problems"

* tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: vt: protect KD_FONT_OP_GET_TALL from unbound access
  serial: qcom-geni: drop bogus uart_write_wakeup()
  serial: qcom-geni: fix mapping of empty DMA buffer
  serial: qcom-geni: fix DMA mapping leak on shutdown
  serial: qcom-geni: fix console shutdown hang
  serdev: Set fwnode for serdev devices
  tty: serial: fsl_lpuart: fix race on RX DMA shutdown
  serial: 8250_pci1xxxx: Disable SERIAL_8250_PCI1XXXX config by default
  serial: 8250_fsl: fix handle_irq locking
  serial: 8250_em: Fix UART port type
  serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it
  tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted
  Revert "tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency"

* tracing: Make splice_read available again

Since the commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.

This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: [email protected]
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* ring-buffer: remove obsolete comment for free_buffer_page()

The comment refers to mm/slob.c which is being removed. It comes from
commit ed56829cb319 ("ring_buffer: reset buffer page when freeing") and
according to Steven the borrowed code was a page mapcount and mapping
reset, which was later removed by commit e4c2ce82ca27 ("ring_buffer:
allocate buffer page pointer"). Thus the comment is not accurate anyway,
remove it.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Reported-by: Mike Rapoport <[email protected]>
Suggested-by: Steven Rostedt (Google) <[email protected]>
Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer")
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr

There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.

Test case:

  # cd /sys/kernel/tracing
  # echo 0 > tracing_on
  # echo round-robin > hwlat_detector/mode
  # echo hwlat > current_tracer
  # unshare --fork --pid bash -c 'echo 1 > tracing_on'
  # dmesg -c

Actual behavior:

[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Masami Hiramatsu <[email protected]>
Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <[email protected]>
Acked-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

* Merge tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix setting affinity of hwlat threads in containers

   Using sched_set_affinity() has unwanted side effects when being
   called within a container. Use set_cpus_allowed_ptr() instead

 - Fix per cpu thread management of the hwlat tracer:
    - Do not start per_cpu threads if one is already running for the CPU
    - When starting per_cpu threads, do not clear the kthread variable
      as it may already be set to running per cpu threads

 - Fix return value for test_gen_kprobe_cmd()

   On error the return value was overwritten by being set to the result
   of the call from kprobe_event_delete(), which would likely succeed,
   and thus have the function return success

 - Fix splice() reads from the trace file that was broken by commit
   36e2c7421f02 ("fs: don't allow splice read/write without explicit
   ops")

 - Remove obsolete and confusing comment in ring_buffer.c

   The original design of the ring buffer used struct page flags for
   tricks to optimize, which was shortly removed due to them being
   tricks. But a comment for those tricks remained

 - Set local functions and variables to static

* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
  ring-buffer: remove obsolete comment for free_buffer_page()
  tracing: Make splice_read available again
  ftrace: Set direct_ops storage-class-specifier to static
  trace/hwlat: Do not start per-cpu thread if it is already running
  trace/hwlat: Do not wipe the contents of per-cpu thread data
  tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
  tracing: Fix wrong return in kprobe_event_gen_test.c

* Linux 6.3-rc3

* thunderbolt: Use const qualifier for `ring_interrupt_index`

`ring_interrupt_index` doesn't change the data for `ring` so mark it as
const. This is needed by the following patch that disables interrupt
auto clear for rings.

Cc: Sanju Mehta <[email protected]>
Cc: [email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* thunderbolt: Disable interrupt auto clear for rings

When interrupt auto clear is programmed, any read to the interrupt
status register will clear all interrupts.  If two interrupts have
come in before one can be serviced then this will cause lost interrupts.

On AMD USB4 routers this has manifested in odd problems particularly
with long strings of control tranfers such as reading the DROM via bit
banging.

Instead of clearing interrupts automatically, clear the bit corresponding
to the given ring's interrupt in the ISR.

Fixes: 7a1808f82a37 ("thunderbolt: Handle ring interrupt by reading interrupt status register")
Cc: Sanju Mehta <[email protected]>
Cc: [email protected]
Tested-by: Anson Tsao <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* drm/i915/mtl: Fix Wa_16015201720 implementation

The commit 2357f2b271ad ("drm/i915/mtl: Initial display workarounds")
extended the workaround Wa_16015201720 to MTL. However the registers
that the original WA implemented moved for MTL.

Implement the workaround with the correct register.

v3: Skip clock gating for pipe C, D DMC's and fix the title

Fixes: 2357f2b271ad ("drm/i915/mtl: Initial display workarounds")
Cc: Matt Atwood <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Radhakrishna Sripada <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 0188be507b973e36f637ba010a369057c8cb7282)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/fbdev: lock the fbdev obj before vma pin

lock the fbdev obj before calling into
i915_vma_pin_iomap(). This helps to solve below :

<7>[   93.563308] i915 0000:00:02.0: [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
<4>[   93.581844] ------------[ cut here ]------------
<4>[   93.581855] WARNING: CPU: 12 PID: 625 at drivers/gpu/drm/i915/gem/i915_gem_pages.c:424 i915_gem_object_pin_map+0x152/0x1c0 [i915]

Fixes: f0b6b01b3efe ("drm/i915: Add ww context to intel_dpt_pin, v2.")
Cc: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Radhakrishna Sripada <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 561b31acfd65502a2cda2067513240fc57ccdbdc)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Preserve crtc_state->inherited during state clearing

intel_crtc_prepare_cleared_state() is unintentionally losing
the "inherited" flag. This will happen if intel_initial_commit()
is forced to go through the full modeset calculations for
whatever reason.

Afterwards the first real commit from userspace will not get
forced to the full modeset path, and thus eg. audio state may
not get recomputed properly. So if the monitor was already
enabled during boot audio will not work until userspace itself
does an explicit full modeset.

Cc: [email protected]
Tested-by: Lee Shawn C <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
(cherry picked from commit 2553bacaf953b48c59357f5a622282bc0c45adae)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/mtl: Disable MC6 for MTL A step

The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508

Fixes: 8f70f1ec587d ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Badal Nilawar <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 038a24835ab68f341eaa7a0e3bcc6ce0f9b22e17)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/guc: Fix missing ecodes

Error captures are tagged with an 'ecode'. This is a pseduo-unique magic
number that is meant to distinguish similar seeming bugs with
different underlying signatures. It is a combination of two ring state
registers. Unfortunately, the register state being used is only valid
in execlist mode. In GuC mode, the register state exists in a separate
list of arbitrary register address/value pairs rather than the named
entry structure. So, search through that list to find the two exciting
registers and copy them over to the structure's named members.

v2: if else if instead of if if (Alan)

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump")
Cc: Alan Previn <[email protected]>
Cc: Umesh Nerlige Ramappa <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Aravind Iddamsetty <[email protected]>
Cc: Michael Cheng <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Bruce Chang <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Cc: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 9724ecdbb9ddd6da3260e4a442574b90fc75188a)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/active: Fix missing debug object activation

debug_active_activate() expected ref->count to be zero
which is not true anymore as __i915_active_activate() calls
debug_active_activate() after incrementing the count.

v2: No need to check for "ref->count == 1" as __i915_active_activate()
already make sure of that(Janusz).

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6733
Fixes: 04240e30ed06 ("drm/i915: Skip taking acquire mutex for no ref->active callback")
Cc: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: [email protected]
Cc: Janusz Krzysztofik <[email protected]>
Cc: <[email protected]> # v5.10+
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Janusz Krzysztofik <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit bfad380c542438a9b642f8190b7fd37bc77e2723)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915/gt: perform uc late init after probe error injection

Probe pseudo errors should be injected only in places where real errors
can be encountered, otherwise unwinding code can be broken.
Placing intel_uc_init_late before i915_inject_probe_error violated
this rule, resulting in following bug:
__intel_gt_disable:655 GEM_BUG_ON(intel_gt_pm_is_awake(gt))

Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
Acked-by: Nirmoy Das <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c4252a11131c7f27a158294241466e2a4e7ff94e)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Fix format for perf_limit_reasons

Use hex format so that it is easier to decode.

Fixes: fe5979665f64 ("drm/i915/debugfs: Add perf_limit_reasons in debugfs")
Signed-off-by: Vinay Belgaumkar <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 5e008ba67cb80084e99b40ccd46f9029ae421632)
Signed-off-by: Jani Nikula <[email protected]>

* drm/i915: Update vblank timestamping stuff on seamless M/N change

When we change the M/N values seamlessly during a fastset we should
also update the vblank timestamping stuff to make sure the vblank
timestamp corrections/guesstimations come out exact.

Note that only crtc_clock and framedur_ns can actually end up
changing here during fastsets. Everything else we touch can
only change during full modesets.

Technically we should try to do this exactly at the start of
vblank, but that would require some kind of double buffering
scheme. Let's skip that for now and just update things right
after the commit has been submitted to the hardware. This
means the information will be properly up to date when the
vblank irq handler goes to work. Only if someone ends up
querying some vblanky stuff in between the commit and start
of vblank may we see a slight discrepancy.

Also this same problem really exists for the DRRS downclocking
stuff. But as that is supposed to be more or less transparent
to the user, and it only drops to low gear after a long delay
(1 sec currently) we probably don't have to worry about it.
Any time something is actively submitting updates DRRS will
remain in high gear and so the timestamping constants will
match the hardware state.

Reviewed-by: Jani Nikula <[email protected]>
Reviewed-by: Mitul Golani <[email protected]>
Fixes: e6f29923c048 ("drm/i915: Allow M/N change during fastset on bdw+")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8cb1f95cca68421b08333175719fdd3615372ca8)
Signed-off-by: Jani Nikula <[email protected]>

* net: dsa: report rx_bytes unadjusted for ETH_HLEN

We collect the software statistics counters for RX bytes (reported to
/proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when
skb->len has already been adjusted by the eth_type_trans() ->
skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header.

This means that when connecting 2 DSA interfaces back to back and
sending 1 packet with length 100, the sending interface will report
tx_bytes as incrementing by 100, and the receiving interface will report
rx_bytes as incrementing by 86.

Since accounting for that in scripts is quirky and is something that
would be DSA-specific behavior (requiring users to know that they are
running on a DSA interface in the first place), the proposal is that we
treat it as a bug and fix it.

This design bug has always existed in DSA, according to my analysis:
commit 91da11f870f0 ("net: Distributed Switch Architecture protocol
support") also updates skb->dev->stats.rx_bytes += skb->len after the
eth_type_trans() call. Technically, prior to Florian's commit
a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and
every vendor-specific tagging protocol driver open-coded the same bug,
until the buggy code was consolidated into something resembling what can
be seen now. So each and every driver should have its own Fixes: tag,
because of their different histories until the convergence point.
I'm not going to do that, for the sake of simplicity, but just blame the
oldest appearance of buggy code.

There are 2 ways to fix the problem. One is the obvious way, and the
other is how I ended up doing it. Obvious would have been to move
dev_sw_netstats_rx_add() one line above eth_type_trans(), and below
skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that.
We count the bytes after removing everything DSA-related from the
packet, to emulate what the packet's length was, on the wire, when the
user port received it.

When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet,
so in case the switch driver requests this behavior - commit
412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the
details - the obvious variant of the fix wouldn't have worked, because
the positioning there would have also counted the not-yet-stripped VLAN
header length, something which is absent from the packet as seen on the
wire (there it may be untagged, whereas software will see it as
PVID-tagged).

Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net: qcom/emac: Fix use after free bug in emac_remove due to race condition

In emac_probe, &adpt->work_thread is bound with
emac_work_thread. Then it will be started by timeout
handler emac_tx_timeout or a IRQ handler emac_isr.

If we remove the driver which will call emac_remove
  to make cleanup, there may be a unfinished work.

The possible sequence is as follows:

Fix it by finishing the work before cleanup in the emac_remove
and disable timeout response.

CPU0                  CPU1

                    |emac_work_thread
emac_remove         |
free_netdev         |
kfree(netdev);      |
                    |emac_reinit_locked
                    |emac_mac_down
                    |//use netdev
Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver")
Signed-off-by: Zheng Wang <[email protected]>

Signed-off-by: David S. Miller <[email protected]>

* net: usb: lan78xx: Limit packet length to skb->len

Packet length retrieved from descriptor may be larger than
the actual socket buffer length. In such case the cloned
skb passed up the network stack will leak kernel memory contents.

Additionally prevent integer underflow when size is less than
ETH_FCS_LEN.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Szymon Heidrich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* usb: plusb: remove unused pl_clear_QuickLink_features function

clang with W=1 reports
drivers/net/usb/plusb.c:65:1: error:
  unused function 'pl_clear_QuickLink_features' [-Werror,-Wunused-function]
pl_clear_QuickLink_features(struct usbnet *dev, int val)
^
This static function is not used, so remove it.

Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net/ps3_gelic_net: Fix RX sk_buff length

The Gelic Ethernet device needs to have the RX sk_buffs aligned to
GELIC_NET_RXBUF_ALIGN, and also the length of the RX sk_buffs must
be a multiple of GELIC_NET_RXBUF_ALIGN.

The current Gelic Ethernet driver was not allocating sk_buffs large
enough to allow for this alignment.

Also, correct the maximum and minimum MTU sizes, and add a new
preprocessor macro for the maximum frame size, GELIC_NET_MAX_FRAME.

Fixes various randomly occurring runtime network errors.

Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3")
Signed-off-by: Geoff Levand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* net/ps3_gelic_net: Use dma_mapping_error

The current Gelic Etherenet driver was checking the return value of its
dma_map_single call, and not using the dma_mapping_error() routine.

Fixes runtime problems like these:

  DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error
  WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc

Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3")
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Geoff Levand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

* Merge branch 'ps3_gelic_net-fixes'

Geoff Levand says:

====================
net/ps3_gelic_net: DMA related fixes

v9: Make rx_skb_size local to gelic_descr_prepare_rx.
v8: Add more cpu_to_be32 calls.
v7: Remove all cleanups, sync to spider net.
v6: Reworked and cleaned up patches.
v5: Some additional patch cleanups.
v4: More patch cleanups.
v3: Cleaned up patches as requested.
====================

Signed-off-by: David S. Miller <[email protected]>

* Revert "drm/i915/hwmon: Enable PL1 power limit"

This reverts commit ee892ea83d99610fa33bea612de058e0955eec3a.

It was accidentally picked up for backporting. Revert.

Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

* thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit

cppcheck reports
drivers/thunderbolt/nhi.c:74:7: style: Local variable 'bit' shadows outer variable [shadowVariable]
  int bit;
      ^
drivers/thunderbolt/nhi.c:66:6: note: Shadowed declaration
 int bit = ring_interrupt_index(ring) & 31;
     ^
drivers/thunderbolt/nhi.c:74:7: note: Shadow variable
  int bit;
      ^
For readablity rename the outer to interrupt_bit and the innner
to auto_clear_bit.

Fixes: 468c49f44759 ("thunderbolt: Disable interrupt auto clear for ring")
Cc: [email protected]
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>

* Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Fix /proc/PID/io read_bytes accounting

 - Fix setting NLM file_lock start and end during decoding testargs

 - Fix timing for setting access cache timestamps

* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Correct timing for assigning access cache timestamp
  lockd: set file_lock start and end when decoding nlm4 testargs
  NFS: Fix /proc/PID/io read_bytes for buffered reads

* ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG

The Acer Aspire 3830TG predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.

Add a DMI quirk to use the native backlight interface which does
work properly.

Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

* gpu: host1x: fix uninitialized variable use

The error handling for platform_get_irq() failing no longer works after
a recent change, clang now points this out with a warning:

  drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized]
          if (syncpt_irq < 0)
              ^~~~~~~~~~

Fix this by removing the variable and checking the correct error status.

Fixes: 625d4ffb438c ("gpu: host1x: Rewrite syncpoint interrupt handling")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Jon Hunter <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Mikko Perttunen <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

* zonefs: Prevent uninitialized symbol 'size' warning

In zonefs_file_dio_append(), initialize the variable size to 0 to
prevent compilation and static code analizers warning such as:

New smatch warnings:
fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized
symbol 'size'.

The warning is a false positive as size is never actually used
uninitialized.

No functional change.

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>

* zonefs: Fix error message in zonefs_file_dio_append()

Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.

Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: [email protected]
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Himanshu Madhani <[email protected]>

* Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt fix from Eric Biggers:
 "Fix a bug where when a filesystem was being unmounted, the fscrypt
  keyring was destroyed before inodes have been released by the Landlock
  LSM.

  This bug was found by syzbot"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
  fscrypt: improve fscrypt_destroy_keyring() documentation
  fscrypt: destroy keyring after security_sb_delete()

* Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:
 "Fix two significant performance issues with fsverity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
  fsverity: Remove WQ_UNBOUND from fsverity read workqueue

* block/io_uring: pass in issue_flags for uring_cmd task_work handling

io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.

Cc: [email protected]
Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <[email protected]>

* io_uring/net: avoid sending -ECONNABORTED on repeated connection requests

Since io_uring does nonblocking connect requests, if we do two repeated
ones without having a listener, the second will get -ECONNABORTED rather
than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry
condition if we're nonblocking, if we haven't already seen it.

Cc: [email protected]
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/828
Reported-by: Hui, Chunyang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

* octeontx2-vf: Add missing free for alloc_percpu

Add the free_percpu for the allocated "vf->hw.lmt_info" in order to avoid
memory leak, same as the "pf->hw.lmt_info" in
`drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c`.

Fixes: 5c0512072f65 ("octeontx2-pf: cn10k: Use runtime allocated LMTLINE region")
Signed-off-by: Jiasheng Jiang <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Acked-by: Geethasowjanya Akula <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

* drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F

Like the Windows Lenovo Yoga Book X91F/L the Android Lenovo Yoga Book
X90F/L has a portrait 1200x1920 screen used in landscape mode,
add a quirk for this.

When the quirk for the X91F/L was initially added it was written to
also apply to the X90F/L but this does not work because the Android
version of the Yoga Book uses completely different DMI strings.
Also adjust the X91F/L quirk to reflect that it only applies to
the X91F/L models.

Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

* entry: Fix noinstr warning in __enter_from_user_mode()

__enter_from_user_mode() is triggering noinstr warnings with
CONFIG_DEBUG_PREEMPT due to its call of preempt_count_add() via
ct_state().

The preemption disable isn't needed as interrupts are already disabled.
And the context_tracking_enabled() check in ct_state() also isn't needed
as that's already being done by the CT_WARN_ON().

Just use __ct_state() instead.

Fixes the following warnings:

  vmlinux.o: warning: objtool: enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0xf9: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xc7: call to preempt_count_add() leaves .noinstr.text section
  vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section

Fixes: 171476775d32 ("context_tracking: Convert state to atomic_t")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/d8955fa6d68dc955dda19baf13ae014ae27926f5.1677369694.git.jpoimboe@kernel.org

* sched/fair: Sanitize vruntime of entity being migrated

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflowing bug, but ignore a case that se->exec_start is reset
after a migration.

For fixing this case, we delay the reset of se->exec_start after
placing the entity which se->exec_start to detect long sleeping task.

In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.

Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Originally-by: Zhang Qiao <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Qiao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* perf/x86/amd/core: Always clear status for idx

The variable 'status' (which contains the unhandled overflow bits) is
not being properly masked in some cases, displaying the following
warning:

  WARNING: CPU: 156 PID: 475601 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x216/0x270

This seems to be happening because the loop is being continued before
the status bit being unset, in case x86_perf_event_set_period()
returns 0. This is also causing an inconsistency because the "handled"
counter is incremented, but the status bit is not cleaned.

Move the bit cleaning together above, together when the "handled"
counter is incremented.

Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up

RCU sometimes needs to perform a delayed wake up for specific kthreads
handling offloaded callbacks (RCU_NOCB).  These wakeups are performed
by timers and upon entry to idle (also to guest and to user on nohz_full).

However the delayed wake-up on kernel exit is actually performed after
the thread flags are fetched towards the fast path check for work to
do on exit to user. As a result, and if there is no other pending work
to do upon that kernel exit, the current task will resume to userspace
with TIF_RESCHED set and the pending wake up ignored.

Fix this with fetching the thread flags _after_ the delayed RCU-nocb
kthread wake-up.

Fixes: 47b8ff194c1f ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

* efi/libstub: zboot: Add compressed image to make targets

Avoid needlessly rebuilding the compressed image by adding the file
'vmlinuz' to the 'targets' Kbuild make variable.

Signed-off-by: Ard Biesheuvel <[email protected]>

* hwmon: (peci/cputemp) Fix miscalculated DTS for SKX

For Skylake, DTS temperature of the CPU is reported in S10.6 format
instead of S8.8.

Reported-by: Paul Fertser <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Iwona Winiarska <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>

* hwmon: fix potential sensor registration fail if of_node is missing

It is not sufficient to check of_node in current device.
In some cases, this would cause the sensor registration to fail.

This patch looks for device's ancestors to find a valid of_node if any.

Fixes: d560168b5d0f ("hwmon: (core) New hwmon registration API")
Signed-off-by: Phinex Hung <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>

* hwmon: (xgene) Fix ioremap and memremap leak

Smatch reports:

drivers/hwmon/xgene-hwmon.c:757 xgene_hwmon_probe() warn:
'ctx->pcc_comm_addr' from ioremap() not released on line: 757.

This is because in drivers/hwmon/xgene-hwmon.c:701 xgene_hwmon_probe(),
ioremap and memremap is not released, which may cause a leak.

To fix this, ioremap and memremap is modified to devm_ioremap and
devm_memremap.

Signed-off-by: Tianyi Jing <[email protected]>
Reviewed-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[groeck: Fixed formatting and subject]
Signed-off-by: Guenter Roeck <[email protected]>

* bootconfig: Fix testcase to increase max node

Since commit 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig
from 1024 to 8192 for DCC support") increased the max number of bootconfig
node to 8192, the bootconfig testcase of the max number of nodes fails.
To fix this issue, we can not simply increase the number in the test script
because the test bootconfig file becomes too big (>32KB). To fix that, we
can use a combination of three alphabets (26^3 = 17576). But with that,
we can not express the 8193 (just one exceed from the limitation) because
it also exceeds the max size of bootconfig. So, the first 26 nodes will just
use one alphabet.

With this fix, test-bootconfig.sh passes all tests.

Link: https://lore.kernel.org/all/167888844790.791176.670805252426835131.stgit@devnote2/

Reported-by: Heinz Wiesinger <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Fixes: 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>

* keys: Do not cache key in task struct if key is requested from kernel thread

The key which gets cached in task structure from a kernel thread does not
get invalidated even after expiry.  Due to which, a new key request from
kernel thread will be served with the cached key if it's present in task
struct irrespective of the key validity.  The change is to not cache key in
task_struct when key requested from kernel thread so that kernel thread
gets a valid key on every key request.

The problem has been seen with the cifs module doing DNS lookups from a
kernel thread and the results getting pinned by being attached to that
kernel thread's cache - and thus not something that can be easily got rid
of.  The cache would ordinarily be cleared by notify-resume, but kernel
threads don't do that.

This isn't seen with AFS because AFS is doing request_key() within the
kernel half of a user thread - which will do notify-resume.

Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: Bharath SM <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: Steve French <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/CAGypqWw951d=zYRbdgNR4snUDvJhWL=q3=WOyh7HhSJupjz2vA@mail.gmail.com/

* verify_pefile: relax wrapper length check

The PE Format Specification (section "The Attribute Certificate Table
(Image Only)") states that `dwLength` is to be rounded up to 8-byte
alignment when used for traversal.  Therefore, the field is not required
to be an 8-byte multiple in the first place.

Accordingly, pesign has not performed this alignment since version
0.110.  This causes kexec failure on pesign'd binaries with "PEFILE:
Signature wrapper len wrong".  Update the comment and relax the check.

Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Jarkko Sakkinen <[email protected]>
cc: Eric Biederman <[email protected]>
cc: Herbert Xu <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-attribute-certificate-table-image-only
Link: https://github.com/rhboot/pesign
Link: https://lore.kernel.org/r/[email protected]/ # v2

* asymmetric_keys: log on fatal failures in PE/pkcs7

These particular errors can be encountered while trying to kexec when
secureboot lockdown is in place.  Without this change, even with a
signed debug build, one still needs to reboot the machine to add the
appropriate dyndbg parameters (since lockdown blocks debugfs).

Accordingly, upgrade all pr_debug() before fatal error into pr_warn().

Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Jarkko Sakkinen <[email protected]>
cc: Eric Biederman <[email protected]>
cc: Herbert Xu <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v2

* ice: fix rx buffers handling for flow director packets

Adding flow director filters stopped working correctly after
commit 2fba7dc5157b ("ice: Add support for XDP multi-buffer
on Rx side"). As a result, only first flow director filter
can be added, adding next filter leads to NULL pointer
dereference attached below.

Rx buffer handling and reallocation logic has been optimized,
however flow director specific traffic was not accounted for.
As a result driver handled those packets incorrectly since new
logic was based on ice_rx_ring::first_desc which was not set
in this case.

Fix this by setting struct ice_rx_ring::first_desc to next_to_clean
for flow director received packets.

[  438.544867] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  438.551840] #PF: supervisor read access in kernel mode
[  438.556978] #PF: error_code(0x0000) - not-present page
[  438.562115] PGD 7c953b2067 P4D 0
[  438.565436] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  438.569794] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.2.0-net-bug #1
[  438.577531] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[  438.588470] RIP: 0010:ice_clean_rx_irq+0x2b9/0xf20 [ice]
[  438.593860] Code: 45 89 f7 e9 ac 00 00 00 8b 4d 78 41 31 4e 10 41 09 d5 4d 85 f6 0f 84 82 00 00 00 49 8b 4e 08 41 8b 76
1c 65 8b 3d 47 36 4a 3f <48> 8b 11 48 c1 ea 36 39 d7 0f 85 a6 00 00 00 f6 41 08 02 0f 85 9c
[  438.612605] RSP: 0018:ff8c732640003ec8 EFLAGS: 00010082
[  438.617831] RAX: 0000000000000800 RBX: 00000000000007ff RCX: 0000000000000000
[  438.624957] RDX: 0000000000000800 RSI: 0000000000000000 RDI: 0000000000000000
[  438.632089] RBP: ff4ed275a2158200 R08: 00000000ffffffff R09: 0000000000000020
[  438.639222] R10: 0000000000000000 R11: 0000000000000020 R12: 0000000000001000
[  438.646356] R13: 0000000000000000 R14: ff4ed275d0daffe0 R15: 0000000000000000
[  438.653485] FS:  0000000000000000(0000) GS:ff4ed2738fa00000(0000) knlGS:0000000000000000
[  438.661563] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  438.667310] CR2: 0000000000000000 CR3: 0000007c9f0d6006 CR4: 0000000000771ef0
[  438.674444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  438.681573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  438.688697] PKRU: 55555554
[  438.691404] Call Trace:
[  438.693857]  <IRQ>
[  438.695877]  ? profile_tick+0x17/0x80
[  438.699542]  ice_msix_clean_ctrl_vsi+0x24/0x50 [ice]
[  438.702571] ice 0000:b1:00.0: VF 1: ctrl_vsi irq timeout
[  438.704542]  __handle_irq_event_percpu+0x43/0x1a0
[  438.704549]  handle_irq_event+0x34/0x70
[  438.704554]  handle_edge_irq+0x9f/0x240
[  438.709901] iavf 0000:b1:01.1: Failed to add Flow Director filter with status: 6
[  438.714571]  __common_interrupt+0x63/0x100
[  438.714580]  common_interrupt+0xb4/0xd0
[  438.718424] iavf 0000:b1:01.1: Rule ID: 127 dst_ip: 0.0.0.0 src_ip 0.0.0.0 UDP: dst_port 4 src_port 0
[  438.722255]  </IRQ>
[  438.722257]  <TASK>
[  438.722257]  asm_common_interrupt+0x22/0x40
[  438.722262] RIP: 0010:cpuidle_enter_state+0xc8/0x430
[  438.722267] Code: 6e e9 25 ff e8 f9 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 d7 f1 24 ff 45
84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[  438.722269] RSP: 0018:ffffffff86003e50 EFLAGS: 00000246
[  438.784108] RAX: ff4ed2738fa00000 RBX: ffbe72a64fc01020 RCX: 0000000000000000
[  438.791234] RDX: 0000000000000000 RSI: ffffffff858d84de RDI: ffffffff85893641
[  438.798365] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000003158af9d
[  438.805490] R10: 0000000000000008 R11: 0000000000000354 R12: ffffffff862365a0
[  438.812622] R13: 000000661b472a87 R14: 0000000000000002 R15: 0000000000000000
[  438.819757]  cpuidle_enter+0x29/0x40
[  438.823333]  do_idle+0x1b6/0x230
[  438.826566]  cpu_startup_entry+0x19/0x20
[  438.830492]  rest_init+0xcb/0xd0
[  438.833717]  arch_call_rest_init+0xa/0x30
[  438.837731]  start_kernel+0x776/0xb70
[  438.841396]  secondary_startup_64_no_verify+0xe5/0xeb
[  438.846449]  </TASK>

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Piotr Raczynski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

* ice: check if VF exists before mode check

Setting trust on VF should return EINVAL when there is no VF. Move
checking for switchdev mode after checking if VF exists.

Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration")
Signed-off-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Kalyan Kodamagula <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

* ice: remove filters only if VSI is deleted

Filters shouldn't be removed in VSI rebuild path. Removing them on PF
VSI results in no rule for PF MAC after changing for example queues
amount.

Remove all filters only in the VSI remove flow. As unload should also
cause the filter to be removed introduce, a new function ice_stop_eth().
It will unroll ice_start_eth(), so remove filters and close VSI.

Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

* iavf: fix hang on reboot with ice

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

* i40e: fix flow director packet filter programming

Initialize to zero structures to build a valid
Tx Packet used for the filter programming.

Fixes: a9219b332f52 ("i40e: VLAN field for flow director")
Signed-off-by: Radoslaw Tyl <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Sign…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.