commit 25405d5eecac69622a155752bb8b0e1ed5071e36 Author: Greg Kroah-Hartman Date: Mon Jun 6 08:49:00 2022 +0200 Linux 5.18.2 Link: https://lore.kernel.org/r/20220603173820.731531504@linuxfoundation.org Tested-by: Ronald Warsow Tested-by: Ron Economos Tested-by: Linux Kernel Functional Testing Tested-by: Guenter Roeck Tested-by: Rudi Heitbaum Tested-by: Bagas Sanjaya Signed-off-by: Greg Kroah-Hartman commit 8f4baf2c265602c396bec604503cb7c669b00461 Author: Takashi Iwai Date: Tue May 31 15:07:49 2022 +0200 ALSA: usb-audio: Optimize TEAC clock quirk commit 3753fcc22974affa26160ce1c46a6ebaaaa86758 upstream. Maris found out that the quirk for TEAC devices to work around the clock setup is needed to apply only when the base clock is changed, e.g. from 48000-based clocks (48000, 96000, 192000, 384000) to 44100-based clocks (44100, 88200, 176400, 352800), or vice versa, while switching to another clock with the same base clock doesn't need the (forcible) interface setup. This patch implements the optimization for the TEAC clock quirk to avoid the unnecessary interface re-setup. Fixes: 5ce0b06ae5e6 ("ALSA: usb-audio: Workaround for clock setup on TEAC devices") Reported-by: Maris Abele Cc: Link: https://lore.kernel.org/r/20220531130749.30357-1-tiwai@suse.de Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 4caf12e7c81d96acfa268c8e44c721a8f036f52c Author: Kumar Kartikeya Dwivedi Date: Sat Mar 19 13:38:23 2022 +0530 bpf: Do write access check for kfunc and global func commit be77354a3d7ebd4897ee18eca26dca6df9224c76 upstream. When passing pointer to some map value to kfunc or global func, in verifier we are passing meta as NULL to various functions, which uses meta->raw_mode to check whether memory is being written to. Since some kfunc or global funcs may also write to memory pointers they receive as arguments, we must check for write access to memory. E.g. in some case map may be read only and this will be missed by current checks. However meta->raw_mode allows for uninitialized memory (e.g. on stack), since there is not enough info available through BTF, we must perform one call for read access (raw_mode = false), and one for write access (raw_mode = true). Fixes: e5069b9c23b3 ("bpf: Support pointers in global func args") Fixes: d583691c47dc ("bpf: Introduce mem, size argument pair support for kfunc") Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220319080827.73251-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 717c39718dbc4f7ebcbb7b625fb11851cd9007fe Author: Kumar Kartikeya Dwivedi Date: Sat Mar 19 13:38:24 2022 +0530 bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access commit 97e6d7dab1ca4648821c790a2b7913d6d5d549db upstream. The commit being fixed was aiming to disallow users from incorrectly obtaining writable pointer to memory that is only meant to be read. This is enforced now using a MEM_RDONLY flag. For instance, in case of global percpu variables, when the BTF type is not struct (e.g. bpf_prog_active), the verifier marks register type as PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr helpers. However, when passing such pointer to kfunc, global funcs, or BPF helpers, in check_helper_mem_access, there is no expectation MEM_RDONLY flag will be set, hence it is checked as pointer to writable memory. Later, verifier sets up argument type of global func as PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around the limitations imposed by this flag. This check will also cover global non-percpu variables that may be introduced in kernel BTF in future. Also, we update the log message for PTR_TO_BUF case to be similar to PTR_TO_MEM case, so that the reason for error is clear to user. Fixes: 34d3a78c681e ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.") Reviewed-by: Hao Luo Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit a08d942ecbf46e23a192093f6983cb1d779f4fa8 Author: Kumar Kartikeya Dwivedi Date: Sat Mar 19 13:38:25 2022 +0530 bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access commit 7b3552d3f9f6897851fc453b5131a967167e43c2 upstream. It is not permitted to write to PTR_TO_MAP_KEY, but the current code in check_helper_mem_access would allow for it, reject this case as well, as helpers taking ARG_PTR_TO_UNINIT_MEM also take PTR_TO_MAP_KEY. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220319080827.73251-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit a31729c945fdcb9b3eea6fe14a1e94e7d39bd8c6 Author: Yuntao Wang Date: Thu Apr 7 21:04:23 2022 +0800 bpf: Fix excessive memory allocation in stack_map_alloc() commit b45043192b3e481304062938a6561da2ceea46a6 upstream. The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the allocated memory for 'smap' is never used after the memlock accounting was removed, thus get rid of it. [ Note, Daniel: Commit b936ca643ade ("bpf: rework memlock-based memory accounting for maps") moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))` up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later step commit c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()"), and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into bpf_map_charge_init(). And then 370868107bf6 ("bpf: Eliminate rlimit-based memory accounting for stackmap maps") finally removed the bpf_map_charge_init(). Anyway, the original code did the allocation same way as /after/ this fix. ] Fixes: b936ca643ade ("bpf: rework memlock-based memory accounting for maps") Signed-off-by: Yuntao Wang Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com Signed-off-by: Greg Kroah-Hartman commit d2d2a1b6670bfac94a8b6a1fa9ce8b786cbe972e Author: KP Singh Date: Mon Apr 18 15:51:58 2022 +0000 bpf: Fix usage of trace RCU in local storage. commit dcf456c9a095a6e71f53d6f6f004133ee851ee70 upstream. bpf_{sk,task,inode}_storage_free() do not need to use call_rcu_tasks_trace as no BPF program should be accessing the owner as it's being destroyed. The only other reader at this point is bpf_local_storage_map_free() which uses normal RCU. The only path that needs trace RCU are: * bpf_local_storage_{delete,update} helpers * map_{delete,update}_elem() syscalls Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs") Signed-off-by: KP Singh Signed-off-by: Alexei Starovoitov Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org Signed-off-by: Greg Kroah-Hartman commit 652cefc840c6814868f57c997e49e31608abb531 Author: Liu Jian Date: Sat Apr 16 18:57:59 2022 +0800 bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes commit 45969b4152c1752089351cd6836a42a566d49bcf upstream. The data length of skb frags + frag_list may be greater than 0xffff, and skb_header_pointer can not handle negative offset. So, here INT_MAX is used to check the validity of offset. Add the same change to the related function skb_store_bytes. Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper") Signed-off-by: Liu Jian Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com Signed-off-by: Greg Kroah-Hartman commit d106a3e96fca30e44081eae9c27aab28fc132a46 Author: Alexei Starovoitov Date: Thu May 12 18:10:24 2022 -0700 bpf: Fix combination of jit blinding and pointers to bpf subprogs. commit 4b6313cf99b0d51b49aeaea98ec76ca8161ecb80 upstream. The combination of jit blinding and pointers to bpf subprogs causes: [ 36.989548] BUG: unable to handle page fault for address: 0000000100000001 [ 36.990342] #PF: supervisor instruction fetch in kernel mode [ 36.990968] #PF: error_code(0x0010) - not-present page [ 36.994859] RIP: 0010:0x100000001 [ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7. [ 37.004091] Call Trace: [ 37.004351] [ 37.004576] ? bpf_loop+0x4d/0x70 [ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b The jit blinding logic didn't recognize that ld_imm64 with an address of bpf subprogram is a special instruction and proceeded to randomize it. By itself it wouldn't have been an issue, but jit_subprogs() logic relies on two step process to JIT all subprogs and then JIT them again when addresses of all subprogs are known. Blinding process in the first JIT phase caused second JIT to miss adjustment of special ld_imm64. Fix this issue by ignoring special ld_imm64 instructions that don't have user controlled constants and shouldn't be blinded. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Reported-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: Andrii Nakryiko Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220513011025.13344-1-alexei.starovoitov@gmail.com Signed-off-by: Greg Kroah-Hartman commit 4f8897bcc20b9ae44758e0572538d741ab66f0dc Author: Yuntao Wang Date: Sat Apr 30 21:08:03 2022 +0800 bpf: Fix potential array overflow in bpf_trampoline_get_progs() commit a2aa95b71c9bbec793b5c5fa50f0a80d882b3e8d upstream. The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline can exceed BPF_MAX_TRAMP_PROGS. When this happens, the assignment '*progs++ = aux->prog' in bpf_trampoline_get_progs() will cause progs array overflow as the progs field in the bpf_tramp_progs struct can only hold at most BPF_MAX_TRAMP_PROGS bpf programs. Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code") Signed-off-by: Yuntao Wang Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 8da0d2d056beaff23e12e543f6828f7c61e02d52 Author: Song Liu Date: Fri May 20 16:57:51 2022 -0700 bpf: Fill new bpf_prog_pack with illegal instructions commit d88bb5eed04ce50cc20e7f9282977841728be798 upstream. bpf_prog_pack enables sharing huge pages among multiple BPF programs. These pages are marked as executable before the JIT engine fill it with BPF programs. To make these pages safe, fill the hole bpf_prog_pack with illegal instructions before making it executable. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Linus Torvalds Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220520235758.1858153-2-song@kernel.org Signed-off-by: Greg Kroah-Hartman commit e8020d96dd5b2dcc1f6a8ee4f87a53a373002cd5 Author: Chuck Lever Date: Sat May 21 19:06:13 2022 -0400 NFSD: Fix possible sleep during nfsd4_release_lockowner() commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream. nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed. Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed. Reported-by: Dai Ngo Signed-off-by: Chuck Lever Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit d98d13a45e281c9521c1ce747eb8ad7aa16a1c69 Author: Trond Myklebust Date: Sat May 14 10:08:10 2022 -0400 NFS: Memory allocation failures are not server fatal errors commit 452284407c18d8a522c3039339b1860afa0025a8 upstream. We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because running out of memory on our client is not a server error. Reported-by: Olga Kornievskaia Fixes: 2dc23afffbca ("NFS: ENOMEM should also be a fatal error.") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman commit ca90b4a045dfabbef6c2ac16e3985b7399b68ea2 Author: Akira Yokosawa Date: Wed Apr 27 18:28:39 2022 +0900 docs: submitting-patches: Fix crossref to 'The canonical patch format' commit 6d5aa418b3bd42cdccc36e94ee199af423ef7c84 upstream. The reference to `explicit_in_reply_to` is pointless as when the reference was added in the form of "#15" [1], Section 15) was "The canonical patch format". The reference of "#15" had not been properly updated in a couple of reorganizations during the plain-text SubmittingPatches era. Fix it by using `the_canonical_patch_format`. [1]: 2ae19acaa50a ("Documentation: Add "how to write a good patch summary" to SubmittingPatches") Signed-off-by: Akira Yokosawa Fixes: 5903019b2a5e ("Documentation/SubmittingPatches: convert it to ReST markup") Fixes: 9b2c76777acc ("Documentation/SubmittingPatches: enrich the Sphinx output") Cc: Jonathan Corbet Cc: Mauro Carvalho Chehab Cc: stable@vger.kernel.org # v4.9+ Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com Signed-off-by: Jonathan Corbet Signed-off-by: Greg Kroah-Hartman commit 01e0745c39353a481b4f2e46327ab2d18e76f479 Author: Xiu Jianfeng Date: Fri Mar 18 14:02:01 2022 +0800 tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() commit d0dc1a7100f19121f6e7450f9cdda11926aa3838 upstream. Currently it returns zero when CRQ response timed out, it should return an error code instead. Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding") Signed-off-by: Xiu Jianfeng Reviewed-by: Stefan Berger Acked-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman commit f8730e16aa5054f8f416aebfe73476702082d228 Author: Stefan Mahnke-Hartmann Date: Fri May 13 15:41:51 2022 +0200 tpm: Fix buffer access in tpm2_get_tpm_pt() commit e57b2523bd37e6434f4e64c7a685e3715ad21e9a upstream. Under certain conditions uninitialized memory will be accessed. As described by TCG Trusted Platform Module Library Specification, rev. 1.59 (Part 3: Commands), if a TPM2_GetCapability is received, requesting a capability, the TPM in field upgrade mode may return a zero length list. Check the property count in tpm2_get_tpm_pt(). Fixes: 2ab3241161b3 ("tpm: migrate tpm2_get_tpm_pt() to use struct tpm_buf") Cc: stable@vger.kernel.org Signed-off-by: Stefan Mahnke-Hartmann Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman commit d4c32bd33522c36fa9c291bfbe05331ee3a9a0b5 Author: Bryan O'Donoghue Date: Fri Apr 15 13:59:52 2022 +0200 media: i2c: imx412: Fix power_off ordering commit 9a199694c6a1519522ec73a4571f68abe9f13d5d upstream. The enable path does - gpio - clock The disable path does - gpio - clock Fix the order on the power-off path so that power-off and power-on have the same ordering for clock and gpio. Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue Reviewed-by: Jacopo Mondi Reviewed-by: Daniele Alessandrelli Signed-off-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 5aada654649d9bcf6b89d7c0d1ff4b794f9295d3 Author: Bryan O'Donoghue Date: Fri Apr 15 13:59:51 2022 +0200 media: i2c: imx412: Fix reset GPIO polarity commit bb25f071fc92d3d227178a45853347c7b3b45a6b upstream. The imx412/imx577 sensor has a reset line that is active low not active high. Currently the logic for this is inverted. The right way to define the reset line is to declare it active low in the DTS and invert the logic currently contained in the driver. The DTS should represent the hardware does i.e. reset is active low. So: + reset-gpios = <&tlmm 78 GPIO_ACTIVE_LOW>; not: - reset-gpios = <&tlmm 78 GPIO_ACTIVE_HIGH>; I was a bit reticent about changing this logic since I thought it might negatively impact @intel.com users. Googling a bit though I believe this sensor is used on "Keem Bay" which is clearly a DTS based system and is not upstream yet. Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue Reviewed-by: Jacopo Mondi Reviewed-by: Daniele Alessandrelli Signed-off-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 22e83371210d2a1226ea9238957b0c30659835c3 Author: Reinette Chatre Date: Thu May 12 14:51:01 2022 -0700 x86/sgx: Ensure no data in PCMD page after truncate commit e3a3bbe3e99de73043a1d32d36cf4d211dc58c7e upstream. A PCMD (Paging Crypto MetaData) page contains the PCMD structures of enclave pages that have been encrypted and moved to the shmem backing store. When all enclave pages sharing a PCMD page are loaded in the enclave, there is no need for the PCMD page and it can be truncated from the backing store. A few issues appeared around the truncation of PCMD pages. The known issues have been addressed but the PCMD handling code could be made more robust by loudly complaining if any new issue appears in this area. Add a check that will complain with a warning if the PCMD page is not actually empty after it has been truncated. There should never be data in the PCMD page at this point since it is was just checked to be empty and truncated with enclave mutex held and is updated with the enclave mutex held. Suggested-by: Dave Hansen Signed-off-by: Reinette Chatre Signed-off-by: Dave Hansen Reviewed-by: Jarkko Sakkinen Tested-by: Haitao Huang Link: https://lkml.kernel.org/r/6495120fed43fafc1496d09dd23df922b9a32709.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman commit 0e1f976339535561aa7b3884de3a96884c7c8184 Author: Reinette Chatre Date: Thu May 12 14:51:00 2022 -0700 x86/sgx: Fix race between reclaimer and page fault handler commit af117837ceb9a78e995804ade4726ad2c2c8981f upstream. Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B. Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in. sgx_reclaim_pages() { ... /* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock); /* * Fault ENCLAVE_B */ sgx_vma_fault() { mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page(); mutex_unlock(&encl->lock); ... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...) /* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); } What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP. If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered. Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed. Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Reported-by: Haitao Huang Signed-off-by: Reinette Chatre Signed-off-by: Dave Hansen Reviewed-by: Jarkko Sakkinen Tested-by: Haitao Huang Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman commit 69432ff1809151114e1ef9203944c7f51d6a4203 Author: Reinette Chatre Date: Thu May 12 14:50:59 2022 -0700 x86/sgx: Obtain backing storage page with enclave mutex held commit 0e4e729a830c1e7f31d3b3fbf8feb355a402b117 upstream. Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. The SGX backing storage is accessed on two paths: when there are insufficient free pages in the EPC the reclaimer works to move enclave pages to the backing storage and as enclaves access pages that have been moved to the backing storage they are retrieved from there as part of page fault handling. An oversubscribed SGX system will often run the reclaimer and page fault handler concurrently and needs to ensure that the backing store is accessed safely between the reclaimer and the page fault handler. This is not the case because the reclaimer accesses the backing store without the enclave mutex while the page fault handler accesses the backing store with the enclave mutex. Consider the scenario where a page is faulted while a page sharing a PCMD page with the faulted page is being reclaimed. The consequence is a race between the reclaimer and page fault handler, the reclaimer attempting to access a PCMD at the same time it is truncated by the page fault handler. This could result in lost PCMD data. Data may still be lost if the reclaimer wins the race, this is addressed in the following patch. The reclaimer accesses pages from the backing storage without holding the enclave mutex and runs the risk of concurrently accessing the backing storage with the page fault handler that does access the backing storage with the enclave mutex held. In the scenario below a PCMD page is truncated from the backing store after all its pages have been loaded in to the enclave at the same time the PCMD page is loaded from the backing store when one of its pages are reclaimed: sgx_reclaim_pages() { sgx_vma_fault() { ... mutex_lock(&encl->lock); ... __sgx_encl_eldu() { ... if (pcmd_page_empty) { /* * EPC page being reclaimed /* * shares a PCMD page with an * PCMD page truncated * enclave page that is being * while requested from * faulted in. * reclaimer. */ */ sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() } mutex_unlock(&encl->lock); } } In this scenario there is a race between the reclaimer and the page fault handler when the reclaimer attempts to get access to the same PCMD page that is being truncated. This could result in the reclaimer writing to the PCMD page that is then truncated, causing the PCMD data to be lost, or in a new PCMD page being allocated. The lost PCMD data may still occur after protecting the backing store access with the mutex - this is fixed in the next patch. By ensuring the backing store is accessed with the mutex held the enclave page state can be made accurate with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page is in the process of being reclaimed. Consistently protect the reclaimer's backing store access with the enclave's mutex to ensure that it can safely run concurrently with the page fault handler. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Haitao Huang Signed-off-by: Reinette Chatre Signed-off-by: Dave Hansen Reviewed-by: Jarkko Sakkinen Tested-by: Jarkko Sakkinen Tested-by: Haitao Huang Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman commit 876053dd7503a0f247b999b30f62b3d440ce46e2 Author: Reinette Chatre Date: Thu May 12 14:50:58 2022 -0700 x86/sgx: Mark PCMD page as dirty when modifying contents commit 2154e1c11b7080aa19f47160bd26b6f39bbd7824 upstream. Recent commit 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") expanded __sgx_encl_eldu() to clear an enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the backing store after the enclave page is restored to the enclave. Since the PCMD page in the backing store is modified the page should be marked as dirty to ensure the modified data is retained. Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Signed-off-by: Reinette Chatre Signed-off-by: Dave Hansen Reviewed-by: Jarkko Sakkinen Tested-by: Haitao Huang Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman commit 5ded81f422580a7901559356e214352ea30f5f66 Author: Reinette Chatre Date: Thu May 12 14:50:57 2022 -0700 x86/sgx: Disconnect backing page references from dirty status commit 6bd429643cc265e94a9d19839c771bcc5d008fa8 upstream. SGX uses shmem backing storage to store encrypted enclave pages and their crypto metadata when enclave pages are moved out of enclave memory. Two shmem backing storage pages are associated with each enclave page - one backing page to contain the encrypted enclave page data and one backing page (shared by a few enclave pages) to contain the crypto metadata used by the processor to verify the enclave page when it is loaded back into the enclave. sgx_encl_put_backing() is used to release references to the backing storage and, optionally, mark both backing store pages as dirty. Managing references and dirty status together in this way results in both backing store pages marked as dirty, even if only one of the backing store pages are changed. Additionally, waiting until the page reference is dropped to set the page dirty risks a race with the page fault handler that may load outdated data into the enclave when a page is faulted right after it is reclaimed. Consider what happens if the reclaimer writes a page to the backing store and the page is immediately faulted back, before the reclaimer is able to set the dirty bit of the page: sgx_reclaim_pages() { sgx_vma_fault() { ... sgx_encl_get_backing(); ... ... sgx_reclaimer_write() { mutex_lock(&encl->lock); /* Write data to backing store */ mutex_unlock(&encl->lock); } mutex_lock(&encl->lock); __sgx_encl_eldu() { ... /* * Enclave backing store * page not released * nor marked dirty - * contents may not be * up to date. */ sgx_encl_get_backing(); ... /* * Enclave data restored * from backing store * and PCMD pages that * are not up to date. * ENCLS[ELDU] faults * because of MAC or PCMD * checking failure. */ sgx_encl_put_backing(); } ... /* set page dirty */ sgx_encl_put_backing(); ... mutex_unlock(&encl->lock); } } Remove the option to sgx_encl_put_backing() to set the backing pages as dirty and set the needed pages as dirty right after receiving important data while enclave mutex is held. This ensures that the page fault handler can get up to date data from a page and prepares the code for a following change where only one of the backing pages need to be marked as dirty. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Dave Hansen Signed-off-by: Reinette Chatre Signed-off-by: Dave Hansen Reviewed-by: Jarkko Sakkinen Tested-by: Haitao Huang Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/ Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.1652389823.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman commit 6ad9dbb202a9fbdbed69f09d718d1031742ca2ca Author: Tao Jin Date: Sun Apr 3 12:57:44 2022 -0400 HID: multitouch: add quirks to enable Lenovo X12 trackpoint commit 95cd2cdc88c755dcd0a58b951faeb77742c733a4 upstream. This applies the similar quirks used by previous generation devices such as X1 tablet for X12 tablet, so that the trackpoint and buttons can work. This patch was applied and tested working on 5.17.1 . Cc: stable@vger.kernel.org # 5.8+ given that it relies on 40d5bb87377a Signed-off-by: Tao Jin Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/CO6PR03MB6241CB276FCDC7F4CEDC34F6E1E29@CO6PR03MB6241.namprd03.prod.outlook.com Signed-off-by: Greg Kroah-Hartman commit 557b6a9ccceeec1ae13a83b4490458b92e064c0e Author: Marek Maślanka Date: Tue Apr 5 17:04:07 2022 +0200 HID: multitouch: Add support for Google Whiskers Touchpad commit 1d07cef7fd7599450b3d03e1915efc2a96e1f03f upstream. The Google Whiskers touchpad does not work properly with the default multitouch configuration. Instead, use the same configuration as Google Rose. Signed-off-by: Marek Maslanka Acked-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit a2b6986316a2d106f6951e76db70fa4b2fde64a9 Author: Randy Dunlap Date: Thu May 12 20:38:37 2022 -0700 fs/ntfs3: validate BOOT sectors_per_clusters commit a3b774342fa752a5290c0de36375289dfcf4a260 upstream. When the NTFS BOOT sectors_per_clusters field is > 0x80, it represents a shift value. Make sure that the shift value is not too large before using it (NTFS max cluster size is 2MB). Return -EVINVAL if it too large. This prevents negative shift values and shift values that are larger than the field size. Prevents this UBSAN error: UBSAN: shift-out-of-bounds in ../fs/ntfs3/super.c:673:16 shift exponent -192 is negative Link: https://lkml.kernel.org/r/20220502175342.20296-1-rdunlap@infradead.org Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Randy Dunlap Reported-by: syzbot+1631f09646bc214d2e76@syzkaller.appspotmail.com Reviewed-by: Namjae Jeon Cc: Konstantin Komarov Cc: Alexander Viro Cc: Kari Argillander Cc: Namjae Jeon Cc: Matthew Wilcox Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 8a395a219c081d902287643f3065b97d316d3baa Author: Mariusz Tkaczyk Date: Tue Mar 22 16:23:39 2022 +0100 raid5: introduce MD_BROKEN commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream. Raid456 module had allowed to achieve failed state. It was fixed by fb73b357fb9 ("raid5: block failing device if raid will be failed"). This fix introduces a bug, now if raid5 fails during IO, it may result with a hung task without completion. Faulty flag on the device is necessary to process all requests and is checked many times, mainly in analyze_stripe(). Allow to set faulty on drive again and set MD_BROKEN if raid is failed. As a result, this level is allowed to achieve failed state again, but communication with userspace (via -EBUSY status) will be preserved. This restores possibility to fail array via #mdadm --set-faulty command and will be fixed by additional verification on mdadm side. Reproduction steps: mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean mkfs.xfs /dev/md126 -f mount /dev/md126 /mnt/root/ fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 & echo 1 > /sys/block/nvme2n1/device/device/remove echo 1 > /sys/block/nvme1n1/device/device/remove [ 1475.787779] Call Trace: [ 1475.793111] __schedule+0x2a6/0x700 [ 1475.799460] schedule+0x38/0xa0 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456] [ 1475.813856] ? finish_wait+0x80/0x80 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456] [ 1475.828281] ? finish_wait+0x80/0x80 [ 1475.834727] ? finish_wait+0x80/0x80 [ 1475.841127] ? finish_wait+0x80/0x80 [ 1475.847480] md_handle_request+0x119/0x190 [ 1475.854390] md_make_request+0x8a/0x190 [ 1475.861041] generic_make_request+0xcf/0x310 [ 1475.868145] submit_bio+0x3c/0x160 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390 [ 1475.889149] iomap_apply+0xff/0x310 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.909974] iomap_dio_rw+0x2f2/0x490 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.923680] ? atime_needs_update+0x77/0xe0 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs] [ 1475.953403] aio_read+0xd5/0x180 [ 1475.959395] ? _cond_resched+0x15/0x30 [ 1475.965907] io_submit_one+0x20b/0x3c0 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180 [ 1475.979335] ? do_io_getevents+0x7c/0xc0 [ 1475.986009] do_syscall_64+0x5b/0x1a0 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 1476.000255] RIP: 0033:0x7f11fc27978d [ 1476.006631] Code: Bad RIP value. [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds. Cc: stable@vger.kernel.org Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed") Reviewd-by: Xiao Ni Signed-off-by: Mariusz Tkaczyk Signed-off-by: Song Liu Signed-off-by: Greg Kroah-Hartman commit 417c73db67ea7ad8f03dfd34c6b0bb5f54294fa9 Author: Sarthak Kukreti Date: Tue May 31 15:56:40 2022 -0400 dm verity: set DM_TARGET_IMMUTABLE feature flag commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream. The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type. Fixes: a4ffc152198e ("dm: add verity target") Cc: stable@vger.kernel.org Signed-off-by: Sarthak Kukreti Reviewed-by: Kees Cook Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit ddd5cd42bc5714e926be74b87eb1354b04f9d082 Author: Mikulas Patocka Date: Sun Apr 24 16:43:00 2022 -0400 dm stats: add cond_resched when looping over entries commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream. dm-stats can be used with a very large number of entries (it is only limited by 1/4 of total system memory), so add rescheduling points to the loops that iterate over the entries. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit eb27bd452e9f656cb7b1a6c091c2a84d520d6258 Author: Mikulas Patocka Date: Mon Apr 25 08:53:29 2022 -0400 dm crypt: make printing of the key constant-time commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream. The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to report the current key to userspace. However, this is not a constant-time operation and it may leak information about the key via timing, via cache access patterns or via the branch predictor. Change dm-crypt's key printing to use "%c" instead of "%02x". Also introduce hex2asc() that carefully avoids any branching or memory accesses when converting a number in the range 0 ... 15 to an ascii character. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka Tested-by: Milan Broz Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 7b05734915e59985e6793d5b8880c673cbb15326 Author: Dan Carpenter Date: Mon Apr 25 14:56:48 2022 +0300 dm integrity: fix error code in dm_integrity_ctr() commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream. The "r" variable shadows an earlier "r" that has function scope. It means that we accidentally return success instead of an error code. Smatch has a warning for this: drivers/md/dm-integrity.c:4503 dm_integrity_ctr() warn: missing error code 'r' Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter Reviewed-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 3ad6c173c09f9127951084db211d94810f3d7dbb Author: Jonathan Bakker Date: Sun Mar 27 11:08:51 2022 -0700 ARM: dts: s5pv210: Correct interrupt name for bluetooth in Aries commit 3f5e3d3a8b895c8a11da8b0063ba2022dd9e2045 upstream. Correct the name of the bluetooth interrupt from host-wake to host-wakeup. Fixes: 1c65b6184441b ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node") Cc: Signed-off-by: Jonathan Bakker Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB0567.namprd04.prod.outlook.com Signed-off-by: Krzysztof Kozlowski Signed-off-by: Greg Kroah-Hartman commit 2717654ae022e6ea959a4b7b762702fe1a4690c2 Author: Steven Rostedt Date: Tue Apr 5 10:02:00 2022 -0400 Bluetooth: hci_qca: Use del_timer_sync() before freeing commit 72ef98445aca568a81c2da050532500a8345ad3a upstream. While looking at a crash report on a timer list being corrupted, which usually happens when a timer is freed while still active. This is commonly triggered by code calling del_timer() instead of del_timer_sync() just before freeing. One possible culprit is the hci_qca driver, which does exactly that. Eric mentioned that wake_retrans_timer could be rearmed via the work queue, so also move the destruction of the work queue before del_timer_sync(). Cc: Eric Dumazet Cc: stable@vger.kernel.org Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART") Signed-off-by: Steven Rostedt (Google) Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman commit 03141e3fd52a764da2280a5d93e14cae21ca53ae Author: Craig McLure Date: Tue May 24 08:21:15 2022 +0200 ALSA: usb-audio: Configure sync endpoints before data commit 0e85a22d01dfe9ad9a9d9e87cd4a88acce1aad65 upstream. Devices such as the TC-Helicon GoXLR require the sync endpoint to be configured in advance of the data endpoint in order for sound output to work. This patch simply changes the ordering of EP configuration to resolve this. Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215079 Signed-off-by: Craig McLure Reviewed-by: Jaroslav Kysela Cc: Link: https://lore.kernel.org/r/20220524062115.25968-1-tiwai@suse.de Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 8a8972b987220250aa85c301194b40c796b4e6f5 Author: Takashi Iwai Date: Sat May 21 08:53:25 2022 +0200 ALSA: usb-audio: Add missing ep_idx in fixed EP quirks commit 7b0efea4baf02f5e2f89e5f9b75ef891571b45f1 upstream. The quirk entry for Focusrite Saffire 6 had no proper ep_idx for the capture endpoint, and this confused the driver, resulting in the broken sound. This patch adds the missing ep_idx in the entry. While we are at it, a couple of other entries (for Digidesign MBox and MOTU MicroBook II) seem to have the same problem, and those are covered as well. Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-by: André Kapelrud Cc: Link: https://lore.kernel.org/r/20220521065325.426-1-tiwai@suse.de Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit ff2ce1bf570633707863fae1668e91b08dcb9d40 Author: Takashi Iwai Date: Sat May 21 08:46:27 2022 +0200 ALSA: usb-audio: Workaround for clock setup on TEAC devices commit 5ce0b06ae5e69e23142e73c5c3c0260e9f2ccb4b upstream. Maris reported that TEAC UD-501 (0644:8043) doesn't work with the typical "clock source 41 is not valid, cannot use" errors on the recent kernels. The currently known workaround so far is to restore (partially) what we've done unconditionally at the clock setup; namely, re-setup the USB interface immediately after the clock is changed. This patch re-introduces the behavior conditionally for TEAC devices. Further notes: - The USB interface shall be set later in snd_usb_endpoint_configure(), but this seems to be too late. - Even calling usb_set_interface() right after sne_usb_init_sample_rate() doesn't help; so this must be related with the clock validation, too. - The device may still spew the "clock source 41 is not valid" error at the first clock setup. This seems happening at the very first try of clock setup, but it disappears at later attempts. The error is likely harmless because the driver retries the clock setup (such an error is more or less expected on some devices). Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-and-tested-by: Maris Abele Cc: Link: https://lore.kernel.org/r/20220521064627.29292-1-tiwai@suse.de Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit ae6ce35594fa4d08a4c8b90fb489fed3ca0dd315 Author: Akira Yokosawa Date: Mon May 2 21:05:09 2022 +0900 tools/memory-model/README: Update klitmus7 compat table commit 5b759db44195bb779828a188bad6b745c18dcd55 upstream. EXPORT_SYMBOL of do_exec() was removed in v5.17. Unfortunately, kernel modules from klitmus7 7.56 have do_exec() at the end of each kthread. herdtools7 7.56.1 has addressed the issue. Update the compatibility table accordingly. Signed-off-by: Akira Yokosawa Cc: Luc Maranget Cc: Jade Alglave Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Paul E. McKenney Signed-off-by: Greg Kroah-Hartman commit c5402fb5f71f1a725f1e55d9c6799c0c7bec308f Author: Sultan Alsawaf Date: Fri May 13 15:11:26 2022 -0700 zsmalloc: fix races between asynchronous zspage free and page migration commit 2505a981114dcb715f8977b8433f7540854851d8 upstream. The asynchronous zspage free worker tries to lock a zspage's entire page list without defending against page migration. Since pages which haven't yet been locked can concurrently migrate off the zspage page list while lock_zspage() churns away, lock_zspage() can suffer from a few different lethal races. It can lock a page which no longer belongs to the zspage and unsafely dereference page_private(), it can unsafely dereference a torn pointer to the next page (since there's a data race), and it can observe a spurious NULL pointer to the next page and thus not lock all of the zspage's pages (since a single page migration will reconstruct the entire page list, and create_page_chain() unconditionally zeroes out each list pointer in the process). Fix the races by using migrate_read_lock() in lock_zspage() to synchronize with page migration. Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse") Signed-off-by: Sultan Alsawaf Acked-by: Minchan Kim Cc: Nitin Gupta Cc: Sergey Senozhatsky Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 7f962066014a2b3101881faf0d1614b55e3b332d Author: Marco Chiappero Date: Thu Apr 7 17:54:51 2022 +0100 crypto: qat - rework the VF2PF interrupt handling logic commit c690c7f6312ce69b426af08ae1da2b9e48a0246f upstream. Change the VF2PF interrupt handler in the PF ISR and the definition of the internal PFVF API to correct the current implementation, which can result in missed interrupts. More specifically, current HW generations consider a write to the mask register, regardless of the value, as an acknowledge of any pending VF2PF interrupt. Therefore, if there is an interrupt between the source register read and the mask register write, such interrupt will not be delivered and silently acknowledged, resulting in a lost VF2PF message. To work around the problem, rather than disabling specific interrupts, disable all the interrupts and re-enable only the ones that we are not serving (excluding the already disabled ones too). This will force any other pending interrupt to be triggered and be serviced by a subsequent ISR. This new approach requires, however, changes to the interrupt related pfvf_ops functions. In particular, get_vf2pf_sources() has now been removed in favor of disable_pending_vf2pf_interrupts(), which not only retrieves and returns the pending (and enabled) sources, but also disables them. As a consequence, introduce the adf_disable_pending_vf2pf_interrupts() utility in place of adf_disable_vf2pf_interrupts_irq(), which is no longer needed. Cc: stable@vger.kernel.org Fixes: 993161d ("crypto: qat - fix handling of VF to PF interrupts") Signed-off-by: Marco Chiappero Co-developed-by: Giovanni Cabiddu Signed-off-by: Giovanni Cabiddu Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit c98c48e067e9b46c5bffcadece8933b790262ea3 Author: Vitaly Chikunov Date: Thu Apr 21 20:25:10 2022 +0300 crypto: ecrdsa - Fix incorrect use of vli_cmp commit 7cc7ab73f83ee6d50dc9536bc3355495d8600fad upstream. Correctly compare values that shall be greater-or-equal and not just greater. Fixes: 0d7a78643f69 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm") Cc: Signed-off-by: Vitaly Chikunov Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit 78ad61fa04a95274147840a4885e0eba8ad75c97 Author: Fabio Estevam Date: Wed Apr 20 09:06:01 2022 -0300 crypto: caam - fix i.MX6SX entropy delay value commit 4ee4cdad368a26de3967f2975806a9ee2fa245df upstream. Since commit 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") the following CAAM errors can be seen on i.MX6SX: caam_jr 2101000.jr: 20003c5b: CCB: desc idx 60: RNG: Hardware error hwrng: no data available This error is due to an incorrect entropy delay for i.MX6SX. Fix it by increasing the minimum entropy delay for i.MX6SX as done in U-Boot: https://patchwork.ozlabs.org/project/uboot/patch/20220415111049.2565744-1-gaurav.jain@nxp.com/ As explained in the U-Boot patch: "RNG self tests are run to determine the correct entropy delay. Such tests are executed with different voltages and temperatures to identify the worst case value for the entropy delay. For i.MX6SX, it was determined that after adding a margin value of 1000 the minimum entropy delay should be at least 12000." Cc: Fixes: 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") Signed-off-by: Fabio Estevam Reviewed-by: Horia Geantă Reviewed-by: Vabhav Sharma Reviewed-by: Gaurav Jain Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit 57a01725339f9d82b099102ba2751621b1caab93 Author: Ashish Kalra Date: Mon May 16 15:43:10 2022 +0000 KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak commit d22d2474e3953996f03528b84b7f52cc26a39403 upstream. For some sev ioctl interfaces, the length parameter that is passed maybe less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the allocated buffer, these sev ioctl interface may return uninitialized kernel slab memory. Reported-by: Andy Nguyen Suggested-by: David Rientjes Suggested-by: Peter Gonda Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file") Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command") Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command") Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations") Signed-off-by: Ashish Kalra Reviewed-by: Peter Gonda Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit bd6fce7d96d14d8b3804ebbc780b95e36e53275c Author: Hou Wenlong Date: Tue Mar 15 17:35:13 2022 +0800 KVM: x86/mmu: Don't rebuild page when the page is synced and no tlb flushing is required commit 8d5678a76689acbf91245a3791fe853ab773090f upstream. Before Commit c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed"), the return value of kvm_sync_page() indicates whether the page is synced, and kvm_mmu_get_page() would rebuild page when the sync fails. But now, kvm_sync_page() returns false when the page is synced and no tlb flushing is required, which leads to rebuild page in kvm_mmu_get_page(). So return the return value of mmu->sync_page() directly and check it in kvm_mmu_get_page(). If the sync fails, the page will be zapped and the invalid_list is not empty, so set flush as true is accepted in mmu_sync_children(). Cc: stable@vger.kernel.org Fixes: c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed") Signed-off-by: Hou Wenlong Acked-by: Lai Jiangshan Message-Id: <0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com> [mmu_sync_children should not flush if the page is zapped. - Paolo] Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 7de373c9b48229e428ecdb8fbde269c5a8617fd2 Author: Sean Christopherson Date: Thu Apr 7 00:23:13 2022 +0000 KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 commit 45846661d10422ce9e22da21f8277540b29eca22 upstream. Remove WARNs that sanity check that KVM never lets a triple fault for L2 escape and incorrectly end up in L1. In normal operation, the sanity check is perfectly valid, but it incorrectly assumes that it's impossible for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through KVM_RUN (which guarantees kvm_check_nested_state() will see and handle the triple fault). The WARN can currently be triggered if userspace injects a machine check while L2 is active and CR4.MCE=0. And a future fix to allow save/restore of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't lost on migration, will make it trivially easy for userspace to trigger the WARN. Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is tempting, but wrong, especially if/when the request is saved/restored, e.g. if userspace restores events (including a triple fault) and then restores nested state (which may forcibly leave guest mode). Ignoring the fact that KVM doesn't currently provide the necessary APIs, it's userspace's responsibility to manage pending events during save/restore. ------------[ cut here ]------------ WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Modules linked in: kvm_intel kvm irqbypass CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Call Trace: vmx_leave_nested+0x30/0x40 [kvm_intel] vmx_set_nested_state+0xca/0x3e0 [kvm_intel] kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm] kvm_vcpu_ioctl+0x4b9/0x660 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae ---[ end trace 0000000000000000 ]--- Fixes: cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1") Cc: stable@vger.kernel.org Cc: Chenyi Qiang Signed-off-by: Sean Christopherson Message-Id: <20220407002315.78092-2-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit f095b99780115d7e03f46c2d247b32fd6eb9990c Author: Yanfei Xu Date: Mon May 23 22:08:21 2022 +0800 KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest commit ffd1925a596ce68bed7d81c61cb64bc35f788a9d upstream. When kernel handles the vm-exit caused by external interrupts and NMI, it always sets kvm_intr_type to tell if it's dealing an IRQ or NMI. For the PMI scenario, it could be IRQ or NMI. However, intel_pt PMIs are only generated for HARDWARE perf events, and HARDWARE events are always configured to generate NMIs. Use kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI came from the guest; this avoids false positives if an intel_pt PMI/NMI arrives while the host is handling an unrelated IRQ VM-Exit. Fixes: db215756ae59 ("KVM: x86: More precisely identify NMI from guest when handling PMI") Signed-off-by: Yanfei Xu Message-Id: <20220523140821.1345605-1-yanfei.xu@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit c97c2730435954f2efd712c685421146bb56bb39 Author: Maxim Levitsky Date: Tue Mar 22 19:24:42 2022 +0200 KVM: x86: avoid loading a vCPU after .vm_destroy was called commit 6fcee03df6a1a3101a77344be37bb85c6142d56c upstream. This can cause various unexpected issues, since VM is partially destroyed at that point. For example when AVIC is enabled, this causes avic_vcpu_load to access physical id page entry which is already freed by .vm_destroy. Fixes: 8221c1370056 ("svm: Manage vcpu load/unload when enable AVIC") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky Message-Id: <20220322172449.235575-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 02ea15c02befea2539d5f0d6b60ce8df88de418b Author: Sean Christopherson Date: Fri Mar 11 03:27:41 2022 +0000 KVM: x86: avoid calling x86 emulator without a decoded instruction commit fee060cd52d69c114b62d1a2948ea9648b5131f9 upstream. Whenever x86_decode_emulated_instruction() detects a breakpoint, it returns the value that kvm_vcpu_check_breakpoint() writes into its pass-by-reference second argument. Unfortunately this is completely bogus because the expected outcome of x86_decode_emulated_instruction is an EMULATION_* value. Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK and x86_emulate_instruction() is called without having decoded the instruction. This causes various havoc from running with a stale emulation context. The fix is to move the call to kvm_vcpu_check_breakpoint() where it was before commit 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") introduced x86_decode_emulated_instruction(). The other caller of the function does not need breakpoint checks, because it is invoked as part of a vmexit and the processor has already checked those before executing the instruction that #GP'd. This fixes CVE-2022-1852. Reported-by: Qiuhao Li Reported-by: Gaoning Pan Reported-by: Yongkang Jia Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20220311032801.3467418-2-seanjc@google.com> [Rewrote commit message according to Qiuhao's report, since a patch already existed to fix the bug. - Paolo] Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 7cef7042458c52a401d30f2823ae9e9f22f3a90c Author: Maxim Levitsky Date: Thu May 12 13:14:20 2022 +0300 KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness commit 33fbe6befa622c082f7d417896832856814bdde0 upstream. This shows up as a TDP MMU leak when running nested. Non-working cmpxchg on L0 relies makes L1 install two different shadow pages under same spte, and one of them is leaked. Fixes: 1c2361f667f36 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses") Signed-off-by: Maxim Levitsky Message-Id: <20220512101420.306759-1-mlevitsk@redhat.com> Reviewed-by: Sean Christopherson Reviewed-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit e964665cc7ca13a16992b205fce63554b9efc78b Author: Sean Christopherson Date: Wed Feb 2 00:49:44 2022 +0000 KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses commit 1c2361f667f3648855ceae25f1332c18413fdb9f upstream. Use the recently introduce __try_cmpxchg_user() to emulate atomic guest accesses via the associated userspace address instead of mapping the backing pfn into kernel address space. Using kvm_vcpu_map() is unsafe as it does not coordinate with KVM's mmu_notifier to ensure the hva=>pfn translation isn't changed/unmapped in the memremap() path, i.e. when there's no struct page and thus no elevated refcount. Fixes: 42e35f8072c3 ("KVM/X86: Use kvm_vcpu_map in emulator_cmpxchg_emulated") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20220202004945.2540433-5-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 8089e5e1d18402fb8152d6b6815450a36fffa9b0 Author: Sean Christopherson Date: Wed Feb 2 00:49:43 2022 +0000 KVM: x86: Use __try_cmpxchg_user() to update guest PTE A/D bits commit f122dfe4476890d60b8c679128cd2259ec96a24c upstream. Use the recently introduced __try_cmpxchg_user() to update guest PTE A/D bits instead of mapping the PTE into kernel address space. The VM_PFNMAP path is broken as it assumes that vm_pgoff is the base pfn of the mapped VMA range, which is conceptually wrong as vm_pgoff is the offset relative to the file and has nothing to do with the pfn. The horrific hack worked for the original use case (backing guest memory with /dev/mem), but leads to accessing "random" pfns for pretty much any other VM_PFNMAP case. Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Debugged-by: Tadeusz Struk Tested-by: Tadeusz Struk Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20220202004945.2540433-4-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 256ded2dbf0edcbaf4591dca64a0886dc4610b91 Author: Peter Zijlstra Date: Wed Feb 2 00:49:42 2022 +0000 x86/uaccess: Implement macros for CMPXCHG on user addresses commit 989b5db215a2f22f89d730b607b071d964780f10 upstream. Add support for CMPXCHG loops on userspace addresses. Provide both an "unsafe" version for tight loops that do their own uaccess begin/end, as well as a "safe" version for use cases where the CMPXCHG is not buried in a loop, e.g. KVM will resume the guest instead of looping when emulation of a guest atomic accesses fails the CMPXCHG. Provide 8-byte versions for 32-bit kernels so that KVM can do CMPXCHG on guest PAE PTEs, which are accessed via userspace addresses. Guard the asm_volatile_goto() variation with CC_HAS_ASM_GOTO_TIED_OUTPUT, the "+m" constraint fails on some compilers that otherwise support CC_HAS_ASM_GOTO_OUTPUT. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Message-Id: <20220202004945.2540433-3-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 2bfcab29da15c75915f988fb7de0f84a961457c7 Author: Paolo Bonzini Date: Tue May 24 09:43:31 2022 -0400 x86, kvm: use correct GFP flags for preemption disabled commit baec4f5a018fe2d708fc1022330dba04b38b5fe3 upstream. Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of raw spinlock") leads to the following Smatch static checker warning: arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake() warn: sleeping in atomic context arch/x86/kernel/kvm.c 202 raw_spin_lock(&b->lock); 203 n = _find_apf_task(b, token); 204 if (!n) { 205 /* 206 * Async #PF not yet handled, add a dummy entry for the token. 207 * Allocating the token must be down outside of the raw lock 208 * as the allocator is preemptible on PREEMPT_RT kernels. 209 */ 210 if (!dummy) { 211 raw_spin_unlock(&b->lock); --> 212 dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); ^^^^^^^^^^ Smatch thinks the caller has preempt disabled. The `smdb.py preempt kvm_async_pf_task_wake` output call tree is: sysvec_kvm_asyncpf_interrupt() <- disables preempt -> __sysvec_kvm_asyncpf_interrupt() -> kvm_async_pf_task_wake() The caller is this: arch/x86/kernel/kvm.c 290 DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt) 291 { 292 struct pt_regs *old_regs = set_irq_regs(regs); 293 u32 token; 294 295 ack_APIC_irq(); 296 297 inc_irq_stat(irq_hv_callback_count); 298 299 if (__this_cpu_read(apf_reason.enabled)) { 300 token = __this_cpu_read(apf_reason.token); 301 kvm_async_pf_task_wake(token); 302 __this_cpu_write(apf_reason.token, 0); 303 wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1); 304 } 305 306 set_irq_regs(old_regs); 307 } The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function from the call_on_irqstack_cond(). It's inside the call_on_irqstack_cond() where preempt is disabled (unless it's already disabled). The irq_enter/exit_rcu() functions disable/enable preempt. Reported-by: Dan Carpenter Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 9fd15d9f62a12f5e0dc612fec9c106396daa9fa8 Author: Sean Christopherson Date: Thu May 19 07:57:11 2022 -0700 x86/kvm: Alloc dummy async #PF token outside of raw spinlock commit 0547758a6de3cc71a0cfdd031a3621a30db6a68b upstream. Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the the dummy async #PF token, the allocator is preemptible on PREEMPT_RT kernels and must not be called from truly atomic contexts. Opportunistically document why it's ok to loop on allocation failure, i.e. why the function won't get stuck in an infinite loop. Reported-by: Yajun Deng Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit c181acbd1a427859d5fda543b95fbae28f7f6068 Author: Sean Christopherson Date: Wed May 4 00:12:19 2022 +0000 x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) commit d187ba5312307d51818beafaad87d28a7d939adf upstream. Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave', i.e. to KVM's historical uABI size. When saving FPU state for usersapce, KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if the host doesn't support XSAVE. Setting the XSAVE header allows the VM to be migrated to a host that does support XSAVE without the new host having to handle FPU state that may or may not be compatible with XSAVE. Setting the uABI size to the host's default size results in out-of-bounds writes (setting the FP+SSE bits) and data corruption (that is thankfully caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs. WARN if the default size is larger than KVM's historical uABI size; all features that can push the FPU size beyond the historical size must be opt-in. ================================================================== BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130 Read of size 8 at addr ffff888011e33a00 by task qemu-build/681 CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1 Hardware name: /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010 Call Trace: dump_stack_lvl+0x34/0x45 print_report.cold+0x45/0x575 kasan_report+0x9b/0xd0 fpu_copy_uabi_to_guest_fpstate+0x86/0x130 kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm] kvm_vcpu_ioctl+0x47f/0x7b0 [kvm] __x64_sys_ioctl+0x5de/0xc90 do_syscall_64+0x31/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 0: (stack is not available) The buggy address belongs to the object at ffff888011e33800 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 0 bytes to the right of 512-byte region [ffff888011e33800, ffff888011e33a00) The buggy address belongs to the physical page: page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30 head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab|head|zone=1) raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: be50b2065dfa ("kvm: x86: Add support for getting/setting expanded xstate buffer") Fixes: c60427dd50ba ("x86/fpu: Add uabi_size to guest_fpu") Reported-by: Zdenek Kaspar Cc: Maciej S. Szmigiero Cc: Paolo Bonzini Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Tested-by: Zdenek Kaspar Message-Id: <20220504001219.983513-1-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit 558ecc747ccf60ce3fc7b5603a0f6a6ccfb9d9ef Author: Xiaomeng Tong Date: Thu Apr 14 14:21:03 2022 +0800 KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator commit 300981abddcb13f8f06ad58f52358b53a8096775 upstream. The bug is here: if (!p) return ret; The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, Use a new value 'iter' as the list iterator, while use the old value 'p' as a dedicated variable to point to the found element. Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Xiaomeng Tong Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com Signed-off-by: Greg Kroah-Hartman commit 04e4a11dc723c52db7a36dc58f0d69ce6426f8f0 Author: Florian Westphal Date: Fri May 20 00:02:04 2022 +0200 netfilter: conntrack: re-fetch conntrack after insertion commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream. In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry. This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger. Reported-by: Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 86c0154f4c3a56c5db8b9dd09e3ce885382c2c19 Author: Pablo Neira Ayuso Date: Mon May 30 18:24:06 2022 +0200 netfilter: nf_tables: double hook unregistration in netns path commit f9a43007d3f7ba76d5e7f9421094f00f2ef202f8 upstream. __nft_release_hooks() is called from pre_netns exit path which unregisters the hooks, then the NETDEV_UNREGISTER event is triggered which unregisters the hooks again. [ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270 [...] [ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27 [ 565.253682] Workqueue: netns cleanup_net [ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270 [...] [ 565.297120] Call Trace: [ 565.300900] [ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables] [ 565.308518] raw_notifier_call_chain+0x63/0x80 [ 565.312386] unregister_netdevice_many+0x54f/0xb50 Unregister and destroy netdev hook from netns pre_exit via kfree_rcu so the NETDEV_UNREGISTER path see unregistered hooks. Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit cc7c6e0a8e1d45955b976943dbd4ba98448ceb46 Author: Pablo Neira Ayuso Date: Mon May 30 18:24:05 2022 +0200 netfilter: nf_tables: hold mutex on netns pre_exit path commit 3923b1e4406680d57da7e873da77b1683035d83f upstream. clean_net() runs in workqueue while walking over the lists, grab mutex. Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit c9a46a3d549286861259c19af4747e12cfaeece9 Author: Pablo Neira Ayuso Date: Fri May 27 09:56:18 2022 +0200 netfilter: nf_tables: sanitize nft_set_desc_concat_parse() commit fecf31ee395b0295f2d7260aa29946b7605f7c85 upstream. Add several sanity checks for nft_set_desc_concat_parse(): - validate desc->field_count not larger than desc->field_len array. - field length cannot be larger than desc->field_len (ie. U8_MAX) - total length of the concatenation cannot be larger than register array. Joint work with Florian Westphal. Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reported-by: Reviewed-by: Stefano Brivio Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit a51c6c58ce940db65100bbfd58189b49d9da1f27 Author: Phil Sutter Date: Tue May 24 14:50:01 2022 +0200 netfilter: nft_limit: Clone packet limits' cost value commit 558254b0b602b8605d7246a10cfeb584b1fcabfc upstream. When cloning a packet-based limit expression, copy the cost value as well. Otherwise the new limit is not functional anymore. Fixes: 3b9e2ea6c11bf ("netfilter: nft_limit: move stateful fields out of expression data") Signed-off-by: Phil Sutter Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 2193286402df2d9c53294f7a858d5e6fd7346e08 Author: Tadeusz Struk Date: Tue May 17 08:13:08 2022 +0900 exfat: check if cluster num is valid commit 64ba4b15e5c045f8b746c6da5fc9be9a6b00b61d upstream. Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace: BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65 Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap(). Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767c Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk Reviewed-by: Sungjong Seo Signed-off-by: Namjae Jeon Signed-off-by: Greg Kroah-Hartman commit 01d3876bdbd058d8620d5fe8f02aa05fd7276351 Author: Gustavo A. R. Silva Date: Wed Apr 27 17:47:14 2022 -0500 drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() commit 336feb502a715909a8136eb6a62a83d7268a353b upstream. Fix the following -Wstringop-overflow warnings when building with GCC-11: drivers/gpu/drm/i915/intel_pm.c:3106:9: warning: ‘intel_read_wm_latency’ accessing 16 bytes in a region of size 10 [-Wstringop-overflow=] 3106 | intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/intel_pm.c:3106:9: note: referencing argument 2 of type ‘u16 *’ {aka ‘short unsigned int *’} drivers/gpu/drm/i915/intel_pm.c:2861:13: note: in a call to function ‘intel_read_wm_latency’ 2861 | static void intel_read_wm_latency(struct drm_i915_private *dev_priv, | ^~~~~~~~~~~~~~~~~~~~~ by removing the over-specified array size from the argument declarations. It seems that this code is actually safe because the size of the array depends on the hardware generation, and the function checks for that. Notice that wm can be an array of 5 elements: drivers/gpu/drm/i915/intel_pm.c:3109: intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); or an array of 8 elements: drivers/gpu/drm/i915/intel_pm.c:3131: intel_read_wm_latency(dev_priv, dev_priv->wm.skl_latency); and the compiler legitimately complains about that. This helps with the ongoing efforts to globally enable -Wstringop-overflow. Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva Signed-off-by: Greg Kroah-Hartman commit d3cb02694e703a7ff3de9ea81fd3b0a55a245849 Author: Alex Elder Date: Thu Apr 21 13:53:33 2022 -0500 net: ipa: compute proper aggregation limit commit c5794097b269f15961ed78f7f27b50e51766dec9 upstream. The aggregation byte limit for an endpoint is currently computed based on the endpoint's receive buffer size. However, some bytes at the front of each receive buffer are reserved on the assumption that--as with SKBs--it might be useful to insert data (such as headers) before what lands in the buffer. The aggregation byte limit currently doesn't take into account that reserved space, and as a result, aggregation could require space past that which is available in the buffer. Fix this by reducing the size used to compute the aggregation byte limit by the NET_SKB_PAD offset reserved for each receive buffer. Signed-off-by: Alex Elder Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 71c603806614c6715165eed06099e24c2e41ad58 Author: David Howells Date: Thu May 26 07:34:52 2022 +0100 pipe: Fix missing lock in pipe_resize_ring() commit 189b0ddc245139af81198d1a3637cac74f96e13a upstream. pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring is being replaced. The occupancy check must be done after the lock is taken, and the lock must be taken after the new ring is allocated. The bug can lead to an oops looking something like: BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840 Read of size 4 at addr ffff88801cc72a70 by task poc/27196 ... Call Trace: post_one_notification.isra.0+0x62e/0x840 __post_watch_notification+0x3b7/0x650 key_create_or_update+0xb8b/0xd20 __do_sys_add_key+0x175/0x340 __x64_sys_add_key+0xbe/0x140 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero Day Initiative. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291 Signed-off-by: David Howells Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 05e0caaf5133a5924636aaa7d69694f889ab58b6 Author: Kuniyuki Iwashima Date: Fri Apr 29 14:38:01 2022 -0700 pipe: make poll_usage boolean and annotate its access commit f485922d8fe4e44f6d52a5bb95a603b7c65554bb upstream. Patch series "Fix data-races around epoll reported by KCSAN." This series suppresses a false positive KCSAN's message and fixes a real data-race. This patch (of 2): pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy. BUG: KCSAN: data-race in pipe_poll / pipe_poll write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014 Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima Cc: Alexander Duyck Cc: Al Viro Cc: Davidlohr Bueso Cc: Kuniyuki Iwashima Cc: "Soheil Hassas Yeganeh" Cc: "Sridhar Samudrala" Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit fffb23ab10b437aac626567f6ed041e03a20fb22 Author: Stephen Brennan Date: Thu May 19 09:50:30 2022 +0100 assoc_array: Fix BUG_ON during garbage collect commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream. A rare BUG_ON triggered in assoc_array_gc: [3430308.818153] kernel BUG at lib/assoc_array.c:1609! Which corresponded to the statement currently at line 1593 upstream: BUG_ON(assoc_array_ptr_is_meta(p)); Using the data from the core dump, I was able to generate a userspace reproducer[1] and determine the cause of the bug. [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc After running the iterator on the entire branch, an internal tree node looked like the following: NODE (nr_leaves_on_branch: 3) SLOT [0] NODE (2 leaves) SLOT [1] NODE (1 leaf) SLOT [2..f] NODE (empty) In the userspace reproducer, the pr_devel output when compressing this node was: -- compress node 0x5607cc089380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] after: 3 At slot 0, an internal node with 2 leaves could not be folded into the node, because there was only one available slot (slot 0). Thus, the internal node was retained. At slot 1, the node had one leaf, and was able to be folded in successfully. The remaining nodes had no leaves, and so were removed. By the end of the compression stage, there were 14 free slots, and only 3 leaf nodes. The tree was ascended and then its parent node was compressed. When this node was seen, it could not be folded, due to the internal node it contained. The invariant for compression in this function is: whenever nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all leaf nodes. The compression step currently cannot guarantee this, given the corner case shown above. To fix this issue, retry compression whenever we have retained a node, and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second compression will then allow the node in slot 1 to be folded in, satisfying the invariant. Below is the output of the reproducer once the fix is applied: -- compress node 0x560e9c562380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] internal nodes remain despite enough space, retrying -- compress node 0x560e9c562380 -- free=14, leaves=1 [0] fold node 2/15 [nx 0] after: 3 Changes ======= DH: - Use false instead of 0. - Reorder the inserted lines in a couple of places to put retained before next_slot. ver #2) - Fix typo in pr_devel, correct comparison to "<=" Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Cc: Signed-off-by: Stephen Brennan Signed-off-by: David Howells cc: Andrew Morton cc: keyrings@vger.kernel.org Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1 Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2 Reviewed-by: Jarkko Sakkinen Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 2a81133304e8c10e6afa03e59f1b11beaccc7153 Author: Dan Carpenter Date: Thu Jun 2 14:02:18 2022 +0300 i2c: ismt: prevent memory corruption in ismt_access() commit 690b2549b19563ec5ad53e5c82f6a944d910086e upstream. The "data->block[0]" variable comes from the user and is a number between 0-255. It needs to be capped to prevent writing beyond the end of dma_buffer[]. Fixes: 5e9a97b1f449 ("i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL") Reported-and-tested-by: Zheyu Ma Signed-off-by: Dan Carpenter Reviewed-by: Mika Westerberg Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8f44c83e51b4ca49c815f8dd0d9c38f497cdbcb0 Author: Pablo Neira Ayuso Date: Wed May 25 10:36:38 2022 +0200 netfilter: nf_tables: disallow non-stateful expression in sets earlier commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream. Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements. cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accomodate transaction semantics. nft_expr_init() calls expr->ops->init() first, then check for NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful lookup expressions which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is non-sense from nf_tables perspective. This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called. The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use similar automated tool to locate the bug that they are reporting. For the record, this is the KASAN splat. [ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman