From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp.gentoo.org (woodpecker.gentoo.org [140.211.166.183])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 3E0991582EF
	for ; Thu, 27 Feb 2025 13:21:03 +0000 (UTC)
From: "Mike Pagano" <mpagano@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Mike Pagano" <mpagano@gentoo.org>
Message-ID: <1740662400.d8f3f9f3f0b8bb0d73ecda7a445dc34dd7752c4d.mpagano@gentoo>
Subject: [gentoo-commits] proj/linux-patches:6.13 commit in: /
X-VCS-Repository: proj/linux-patches
X-VCS-Files: 0000_README 1004_linux-6.13.5.patch
X-VCS-Directories: /
X-VCS-Committer: mpagano
X-VCS-Committer-Name: Mike Pagano
X-VCS-Revision: d8f3f9f3f0b8bb0d73ecda7a445dc34dd7752c4d
X-VCS-Branch: 6.13
Date: Thu, 27 Feb 2025 13:20:15 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
X-Archives-Salt: 71bf9cc3-7a39-4142-9f07-35e5f4437189
X-Archives-Hash: 1ab60ac6991e7e0a60223caadb18d890

commit:     d8f3f9f3f0b8bb0d73ecda7a445dc34dd7752c4d
Author:     Mike Pagano <mpagano@gentoo.org>
AuthorDate: Thu Feb 27 13:20:00 2025 +0000
Commit:     Mike Pagano <mpagano@gentoo.org>
CommitDate: Thu Feb 27 13:20:00 2025 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=d8f3f9f3

Linux patch 6.13.5

Signed-off-by: Mike Pagano <mpagano@gentoo.org>

 0000_README             |    4 +
 1004_linux-6.13.5.patch | 8352 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 8356 insertions(+)

diff --git a/0000_README b/0000_README
index 6a580881..51a3feed 100644
--- a/0000_README
+++ b/0000_README
@@ -59,6 +59,10 @@ Patch:  1003_linux-6.13.4.patch
 From:   https://www.kernel.org
 Desc:   Linux 6.13.4
 
+Patch:  1004_linux-6.13.5.patch
+From:   https://www.kernel.org
+Desc:   Linux 6.13.5
+
 Patch:  1510_fs-enable-link-security-restrictions-by-default.patch
 From:   http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
 Desc:   Enable link security restrictions by default.
diff --git a/1004_linux-6.13.5.patch b/1004_linux-6.13.5.patch
new file mode 100644
index 00000000..a9ea6c23
--- /dev/null
+++ b/1004_linux-6.13.5.patch
@@ -0,0 +1,8352 @@
+diff --git a/Makefile b/Makefile
+index c436a6e64971d7..56d5c11b6f1ec6 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,7 +1,7 @@
+ # SPDX-License-Identifier: GPL-2.0
+ VERSION = 6
+ PATCHLEVEL = 13
+-SUBLEVEL = 4
++SUBLEVEL = 5
+ EXTRAVERSION =
+ NAME = Baby Opossum Posse
+ 
+diff --git a/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts b/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts
+index e4517f47d519cc..eb9470a00e549f 100644
+--- a/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts
++++ b/arch/arm64/boot/dts/rockchip/px30-ringneck-haikou.dts
+@@ -226,7 +226,6 @@ &uart0 {
+ };
+ 
+ &uart5 {
+-	pinctrl-0 = <&uart5_xfer>;
+ 	rts-gpios = <&gpio0 RK_PB5 GPIO_ACTIVE_HIGH>;
+ 	status = "okay";
+ };
+diff --git a/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi b/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
+index ae050cc6cd050f..e80412abec081f 100644
+--- a/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
++++ b/arch/arm64/boot/dts/rockchip/px30-ringneck.dtsi
+@@ -396,6 +396,12 @@ &u2phy_host {
+ 	status = "okay";
+ };
+ 
++&uart5 {
++	/delete-property/ dmas;
++	/delete-property/ dma-names;
++	pinctrl-0 = <&uart5_xfer>;
++};
++
+ /* Mule UCAN */
+ &usb_host0_ehci {
+ 	status = "okay";
+diff --git a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts
+index 67c246ad8b8c0d..ec2ce894da1fc1 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts
++++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts
+@@ -17,8 +17,7 @@ / {
+ 
+ &gmac2io {
+ 	phy-handle = <&yt8531c>;
+-	tx_delay = <0x19>;
+-	rx_delay = <0x05>;
++	phy-mode = "rgmii-id";
+ 	status = "okay";
+ 
+ 	mdio {
+diff --git a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dts b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dts
+index 324a8e951f7e49..846b931e16d212 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dts
++++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dts
+@@ -15,6 +15,7 @@ / {
+ 
+ &gmac2io {
+ 	phy-handle = <&rtl8211e>;
++	phy-mode = "rgmii";
+ 	tx_delay = <0x24>;
+ 	rx_delay = <0x18>;
+ 	status = "okay";
+diff --git a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dtsi b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dtsi
+index 82021ffb0a49c2..381b88a912382c 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dtsi
++++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus.dtsi
+@@ -109,7 +109,6 @@ &gmac2io {
+ 	assigned-clocks = <&cru SCLK_MAC2IO>, <&cru SCLK_MAC2IO_EXT>;
+ 	assigned-clock-parents = <&gmac_clk>, <&gmac_clk>;
+ 	clock_in_out = "input";
+-	phy-mode = "rgmii";
+ 	phy-supply = <&vcc_io>;
+ 	pinctrl-0 = <&rgmiim1_pins>;
+ 	pinctrl-names = "default";
+diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi
+index 988e6ca32fac94..a9ea4b0daa04c6 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi
++++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi
+@@ -22,11 +22,11 @@ pp900_ap: regulator-pp900-ap {
+ 	};
+ 
+ 	/* EC turns on w/ pp900_usb_en */
+-	pp900_usb: pp900-ap {
++	pp900_usb: regulator-pp900-ap {
+ 	};
+ 
+ 	/* EC turns on w/ pp900_pcie_en */
+-	pp900_pcie: pp900-ap {
++	pp900_pcie: regulator-pp900-ap {
+ 	};
+ 
+ 	pp3000: regulator-pp3000 {
+@@ -126,7 +126,7 @@ pp1800_pcie: regulator-pp1800-pcie {
+ 	};
+ 
+ 	/* Always on; plain and simple */
+-	pp3000_ap: pp3000_emmc: pp3000 {
++	pp3000_ap: pp3000_emmc: regulator-pp3000 {
+ 	};
+ 
+ 	pp1500_ap_io: regulator-pp1500-ap-io {
+@@ -160,7 +160,7 @@ pp3300_disp: regulator-pp3300-disp {
+ 	};
+ 
+ 	/* EC turns on w/ pp3300_usb_en_l */
+-	pp3300_usb: pp3300 {
++	pp3300_usb: regulator-pp3300 {
+ 	};
+ 
+ 	/* gpio is shared with pp1800_pcie and pinctrl is set there */
+diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi
+index 19b23b43896583..5e068377a0a28e 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi
++++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi
+@@ -92,7 +92,7 @@ pp900_s3: regulator-pp900-s3 {
+ 	};
+ 
+ 	/* EC turns on pp1800_s3_en */
+-	pp1800_s3: pp1800 {
++	pp1800_s3: regulator-pp1800 {
+ 	};
+ 
+ 	/* pp3300 children, sorted by name */
+@@ -109,11 +109,11 @@ pp2800_cam: regulator-pp2800-avdd {
+ 	};
+ 
+ 	/* EC turns on pp3300_s0_en */
+-	pp3300_s0: pp3300 {
++	pp3300_s0: regulator-pp3300 {
+ 	};
+ 
+ 	/* EC turns on pp3300_s3_en */
+-	pp3300_s3: pp3300 {
++	pp3300_s3: regulator-pp3300 {
+ 	};
+ 
+ 	/*
+diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
+index 6d9e60b01225e5..7eca1da78cffab 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
++++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
+@@ -189,39 +189,39 @@ ppvar_gpu: ppvar-gpu {
+ 	};
+ 
+ 	/* EC turns on w/ pp900_ddrpll_en */
+-	pp900_ddrpll: pp900-ap {
++	pp900_ddrpll: regulator-pp900-ap {
+ 	};
+ 
+ 	/* EC turns on w/ pp900_pll_en */
+-	pp900_pll: pp900-ap {
++	pp900_pll: regulator-pp900-ap {
+ 	};
+ 
+ 	/* EC turns on w/ pp900_pmu_en */
+-	pp900_pmu: pp900-ap {
++	pp900_pmu: regulator-pp900-ap {
+ 	};
+ 
+ 	/* EC turns on w/ pp1800_s0_en_l */
+-	pp1800_ap_io: pp1800_emmc: pp1800_nfc: pp1800_s0: pp1800 {
++	pp1800_ap_io: pp1800_emmc: pp1800_nfc: pp1800_s0: regulator-pp1800 {
+ 	};
+ 
+ 	/* EC turns on w/ pp1800_avdd_en_l */
+-	pp1800_avdd: pp1800 {
++	pp1800_avdd: regulator-pp1800 {
+ 	};
+ 
+ 	/* EC turns on w/ pp1800_lid_en_l */
+-	pp1800_lid: pp1800_mic: pp1800 {
++	pp1800_lid: pp1800_mic: regulator-pp1800 {
+ 	};
+ 
+ 	/* EC turns on w/ lpddr_pwr_en */
+-	pp1800_lpddr: pp1800 {
++	pp1800_lpddr: regulator-pp1800 {
+ 	};
+ 
+ 	/* EC turns on w/ pp1800_pmu_en_l */
+-	pp1800_pmu: pp1800 {
++	pp1800_pmu: regulator-pp1800 {
+ 	};
+ 
+ 	/* EC turns on w/ pp1800_usb_en_l */
+-	pp1800_usb: pp1800 {
++	pp1800_usb: regulator-pp1800 {
+ 	};
+ 
+ 	pp3000_sd_slot: regulator-pp3000-sd-slot {
+@@ -259,11 +259,11 @@ ppvar_sd_card_io: ppvar-sd-card-io {
+ 	};
+ 
+ 	/* EC turns on w/ pp3300_trackpad_en_l */
+-	pp3300_trackpad: pp3300-trackpad {
++	pp3300_trackpad: regulator-pp3300-trackpad {
+ 	};
+ 
+ 	/* EC turns on w/ usb_a_en */
+-	pp5000_usb_a_vbus: pp5000 {
++	pp5000_usb_a_vbus: regulator-pp5000 {
+ 	};
+ 
+ 	ap_rtc_clk: ap-rtc-clk {
+diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+index a337f3fb8377e4..8e73c681268bbe 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
++++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+@@ -549,10 +549,10 @@ usb_host2_xhci: usb@fcd00000 {
+ 	mmu600_pcie: iommu@fc900000 {
+ 		compatible = "arm,smmu-v3";
+ 		reg = <0x0 0xfc900000 0x0 0x200000>;
+-		interrupts = ,
+-			     ,
+-			     ,
+-			     ;
++		interrupts = ,
++			     ,
++			     ,
++			     ;
+ 		interrupt-names = "eventq", "gerror", "priq", "cmdq-sync";
+ 		#iommu-cells = <1>;
+ 		status = "disabled";
+@@ -561,10 +561,10 @@ mmu600_pcie: iommu@fc900000 {
+ 	mmu600_php: iommu@fcb00000 {
+ 		compatible = "arm,smmu-v3";
+ 		reg = <0x0 0xfcb00000 0x0 0x200000>;
+-		interrupts = ,
+-			     ,
+-			     ,
+-			     ;
++		interrupts = ,
++			     ,
++			     ,
++			     ;
+ 		interrupt-names = "eventq", "gerror", "priq", "cmdq-sync";
+ 		#iommu-cells = <1>;
+ 		status = "disabled";
+@@ -2667,9 +2667,9 @@ tsadc: tsadc@fec00000 {
+ 		rockchip,hw-tshut-temp = <120000>;
+ 		rockchip,hw-tshut-mode = <0>; /* tshut mode 0:CRU 1:GPIO */
+ 		rockchip,hw-tshut-polarity = <0>; /* tshut polarity 0:LOW 1:HIGH */
+-		pinctrl-0 = <&tsadc_gpio_func>;
+-		pinctrl-1 = <&tsadc_shut>;
+-		pinctrl-names = "gpio", "otpout";
++		pinctrl-0 = <&tsadc_shut_org>;
++		pinctrl-1 = <&tsadc_gpio_func>;
++		pinctrl-names = "default", "sleep";
+ 		#thermal-sensor-cells = <1>;
+ 		status = "disabled";
+ 	};
+diff --git a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts
+index 92f0ed83c99022..bc6b43a771537b 100644
+--- a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts
++++ b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-genbook.dts
+@@ -113,7 +113,7 @@ vcc3v3_lcd: regulator-vcc3v3-lcd {
+ 		compatible = "regulator-fixed";
+ 		regulator-name = "vcc3v3_lcd";
+ 		enable-active-high;
+-		gpio = <&gpio1 RK_PC4 GPIO_ACTIVE_HIGH>;
++		gpio = <&gpio0 RK_PC4 GPIO_ACTIVE_HIGH>;
+ 		pinctrl-names = "default";
+ 		pinctrl-0 = <&lcdpwr_en>;
+ 		vin-supply = <&vcc3v3_sys>;
+@@ -241,7 +241,7 @@ &pcie3x4 {
+ &pinctrl {
+ 	lcd {
+ 		lcdpwr_en: lcdpwr-en {
+-			rockchip,pins = <1 RK_PC4 RK_FUNC_GPIO &pcfg_pull_down>;
++			rockchip,pins = <0 RK_PC4 RK_FUNC_GPIO &pcfg_pull_down>;
+ 		};
+ 
+ 		bl_en: bl-en {
+diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
+index c3efacab4b9412..aa90a048f319a3 100644
+--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
++++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
+@@ -77,9 +77,17 @@
+ /*
+  * With 4K page size the real_pte machinery is all nops.
+  */
+-#define __real_pte(e, p, o)	((real_pte_t){(e)})
++static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep, int offset)
++{
++	return (real_pte_t){pte};
++}
++
+ #define __rpte_to_pte(r)	((r).pte)
+-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT)
++
++static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
++{
++	return pte_val(__rpte_to_pte(rpte)) >> H_PAGE_F_GIX_SHIFT;
++}
+ 
+ #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)	\
+ 	do {								\
+diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
+index af97fbb3c257ef..f84e0337cc0296 100644
+--- a/arch/powerpc/lib/code-patching.c
++++ b/arch/powerpc/lib/code-patching.c
+@@ -108,7 +108,7 @@ static int text_area_cpu_up(unsigned int cpu)
+ 	unsigned long addr;
+ 	int err;
+ 
+-	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
++	area = get_vm_area(PAGE_SIZE, 0);
+ 	if (!area) {
+ 		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
+ 			cpu);
+@@ -493,7 +493,9 @@ static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool rep
+ 
+ 	orig_mm = start_using_temp_mm(patching_mm);
+ 
++	kasan_disable_current();
+ 	err = __patch_instructions(patch_addr, code, len, repeat_instr);
++	kasan_enable_current();
+ 
+ 	/* context synchronisation performed by __patch_instructions */
+ 	stop_using_temp_mm(patching_mm, orig_mm);
+diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
+index 6087d38c723512..ea56a6492c81bd 100644
+--- a/arch/s390/boot/startup.c
++++ b/arch/s390/boot/startup.c
+@@ -75,7 +75,7 @@ static int cmma_test_essa(void)
+ 		: [reg1] "=&d" (reg1),
+ 		  [reg2] "=&a" (reg2),
+ 		  [rc] "+&d" (rc),
+-		  [tmp] "=&d" (tmp),
++		  [tmp] "+&d" (tmp),
+ 		  "+Q" (get_lowcore()->program_new_psw),
+ 		  "=Q" (old)
+ 		: [psw_old] "a" (&old),
+diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
+index b1855a46b2adf6..2bba1d934efb0d 100644
+--- a/arch/x86/events/intel/core.c
++++ b/arch/x86/events/intel/core.c
+@@ -397,34 +397,28 @@ static struct event_constraint intel_lnc_event_constraints[] = {
+ 	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_FETCH_LAT, 6),
+ 	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_MEM_BOUND, 7),
+ 
++	INTEL_EVENT_CONSTRAINT(0x20, 0xf),
++
++	INTEL_UEVENT_CONSTRAINT(0x012a, 0xf),
++	INTEL_UEVENT_CONSTRAINT(0x012b, 0xf),
+ 	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4),
+ 	INTEL_UEVENT_CONSTRAINT(0x0175, 0x4),
+ 
+ 	INTEL_EVENT_CONSTRAINT(0x2e, 0x3ff),
+ 	INTEL_EVENT_CONSTRAINT(0x3c, 0x3ff),
+-	/*
+-	 * Generally event codes < 0x90 are restricted to counters 0-3.
+-	 * The 0x2E and 0x3C are exception, which has no restriction.
+-	 */
+-	INTEL_EVENT_CONSTRAINT_RANGE(0x01, 0x8f, 0xf),
+ 
+-	INTEL_UEVENT_CONSTRAINT(0x01a3, 0xf),
+-	INTEL_UEVENT_CONSTRAINT(0x02a3, 0xf),
+ 	INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4),
+ 	INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4),
+ 	INTEL_UEVENT_CONSTRAINT(0x04a4, 0x1),
+ 	INTEL_UEVENT_CONSTRAINT(0x08a4, 0x1),
+ 	INTEL_UEVENT_CONSTRAINT(0x10a4, 0x1),
+ 	INTEL_UEVENT_CONSTRAINT(0x01b1, 0x8),
++	INTEL_UEVENT_CONSTRAINT(0x01cd, 0x3fc),
+ 	INTEL_UEVENT_CONSTRAINT(0x02cd, 0x3),
+-	INTEL_EVENT_CONSTRAINT(0xce, 0x1),
+ 
+ 	INTEL_EVENT_CONSTRAINT_RANGE(0xd0, 0xdf, 0xf),
+-	/*
+-	 * Generally event codes >= 0x90 are likely to have no restrictions.
+-	 * The exception are defined as above.
+-	 */
+-	INTEL_EVENT_CONSTRAINT_RANGE(0x90, 0xfe, 0x3ff),
++
++	INTEL_UEVENT_CONSTRAINT(0x00e0, 0xf),
+ 
+ 	EVENT_CONSTRAINT_END
+ };
+diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
+index cb0eca73478995..04b83d5af4c4ba 100644
+--- a/arch/x86/events/intel/ds.c
++++ b/arch/x86/events/intel/ds.c
+@@ -1199,7 +1199,7 @@ struct event_constraint intel_lnc_pebs_event_constraints[] = {
+ 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
+ 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
+ 
+-	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0x3ff),
++	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0x3fc),
+ 	INTEL_HYBRID_STLAT_CONSTRAINT(0x2cd, 0x3),
+ 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_LOADS */
+ 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_STORES */
+diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
+index dfbbac92242a84..04d02c746ec0fd 100644
+--- a/drivers/bluetooth/btqca.c
++++ b/drivers/bluetooth/btqca.c
+@@ -272,6 +272,39 @@ int qca_send_pre_shutdown_cmd(struct hci_dev *hdev)
+ }
+ EXPORT_SYMBOL_GPL(qca_send_pre_shutdown_cmd);
+ 
++static bool qca_filename_has_extension(const char *filename)
++{
++	const char *suffix = strrchr(filename, '.');
++
++	/* File extensions require a dot, but not as the first or last character */
++	if (!suffix || suffix == filename || *(suffix + 1) == '\0')
++		return 0;
++
++	/* Avoid matching directories with names that look like files with extensions */
++	return !strchr(suffix, '/');
++}
++
++static bool qca_get_alt_nvm_file(char *filename, size_t max_size)
++{
++	char fwname[64];
++	const char *suffix;
++
++	/* nvm file name has an extension, replace with .bin */
++	if (qca_filename_has_extension(filename)) {
++		suffix = strrchr(filename, '.');
++		strscpy(fwname, filename, suffix - filename + 1);
++		snprintf(fwname + (suffix - filename),
++		       sizeof(fwname) - (suffix - filename), ".bin");
++		/* If nvm file is already the default one, return false to skip the retry. */
++		if (strcmp(fwname, filename) == 0)
++			return false;
++
++		snprintf(filename, max_size, "%s", fwname);
++		return true;
++	}
++	return false;
++}
++
+ static int qca_tlv_check_data(struct hci_dev *hdev,
+ 			       struct qca_fw_config *config,
+ 			       u8 *fw_data, size_t fw_size,
+@@ -564,6 +597,19 @@ static int qca_download_firmware(struct hci_dev *hdev,
+ 				config->fwname, ret);
+ 			return ret;
+ 		}
++	}
++	/* If the board-specific file is missing, try loading the default
++	 * one, unless that was attempted already.
++	 */
++	else if (config->type == TLV_TYPE_NVM &&
++		 qca_get_alt_nvm_file(config->fwname, sizeof(config->fwname))) {
++		bt_dev_info(hdev, "QCA Downloading %s", config->fwname);
++		ret = request_firmware(&fw, config->fwname, &hdev->dev);
++		if (ret) {
++			bt_dev_err(hdev, "QCA Failed to request file: %s (%d)",
++				   config->fwname, ret);
++			return ret;
++		}
+ 	} else {
+ 		bt_dev_err(hdev, "QCA Failed to request file: %s (%d)",
+ 			   config->fwname, ret);
+@@ -700,34 +746,38 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co
+ 	return 0;
+ }
+ 
+-static void qca_generate_hsp_nvm_name(char *fwname, size_t max_size,
++static void qca_get_nvm_name_by_board(char *fwname, size_t max_size,
++		const char *stem, enum qca_btsoc_type soc_type,
+ 		struct qca_btsoc_version ver, u8 rom_ver, u16 bid)
+ {
+ 	const char *variant;
++	const char *prefix;
+ 
+-	/* hsp gf chip */
+-	if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID)
+-		variant = "g";
+-	else
+-		variant = "";
++	/* Set the default value to variant and prefix */
++	variant = "";
++	prefix = "b";
+ 
+-	if (bid == 0x0)
+-		snprintf(fwname, max_size, "qca/hpnv%02x%s.bin", rom_ver, variant);
+-	else
+-		snprintf(fwname, max_size, "qca/hpnv%02x%s.%x", rom_ver, variant, bid);
+-}
++	if (soc_type == QCA_QCA2066)
++		prefix = "";
+ 
+-static inline void qca_get_nvm_name_generic(struct qca_fw_config *cfg,
+-					    const char *stem, u8 rom_ver, u16 bid)
+-{
+-	if (bid == 0x0)
+-		snprintf(cfg->fwname, sizeof(cfg->fwname), "qca/%snv%02x.bin", stem, rom_ver);
+-	else if (bid & 0xff00)
+-		snprintf(cfg->fwname, sizeof(cfg->fwname),
+-			 "qca/%snv%02x.b%x", stem, rom_ver, bid);
+-	else
+-		snprintf(cfg->fwname, sizeof(cfg->fwname),
+-			 "qca/%snv%02x.b%02x", stem, rom_ver, bid);
++	if (soc_type == QCA_WCN6855 || soc_type == QCA_QCA2066) {
++		/* If the chip is manufactured by GlobalFoundries */
++		if ((le32_to_cpu(ver.soc_id) & QCA_HSP_GF_SOC_MASK) == QCA_HSP_GF_SOC_ID)
++			variant = "g";
++	}
++
++	if (rom_ver != 0) {
++		if (bid == 0x0 || bid == 0xffff)
++			snprintf(fwname, max_size, "qca/%s%02x%s.bin", stem, rom_ver, variant);
++		else
++			snprintf(fwname, max_size, "qca/%s%02x%s.%s%02x", stem, rom_ver,
++				variant, prefix, bid);
++	} else {
++		if (bid == 0x0 || bid == 0xffff)
++			snprintf(fwname, max_size, "qca/%s%s.bin", stem, variant);
++		else
++			snprintf(fwname, max_size, "qca/%s%s.%s%02x", stem, variant, prefix, bid);
++	}
+ }
+ 
+ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
+@@ -816,8 +866,14 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
+ 	/* Download NVM configuration */
+ 	config.type = TLV_TYPE_NVM;
+ 	if (firmware_name) {
+-		snprintf(config.fwname, sizeof(config.fwname),
+-			 "qca/%s", firmware_name);
++		/* The firmware name has an extension, use it directly */
++		if (qca_filename_has_extension(firmware_name)) {
++			snprintf(config.fwname, sizeof(config.fwname), "qca/%s", firmware_name);
++		} else {
++			qca_read_fw_board_id(hdev, &boardid);
++			qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
++				 firmware_name, soc_type, ver, 0, boardid);
++		}
+ 	} else {
+ 		switch (soc_type) {
+ 		case QCA_WCN3990:
+@@ -836,8 +892,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
+ 				 "qca/apnv%02x.bin", rom_ver);
+ 			break;
+ 		case QCA_QCA2066:
+-			qca_generate_hsp_nvm_name(config.fwname,
+-				sizeof(config.fwname), ver, rom_ver, boardid);
++			qca_get_nvm_name_by_board(config.fwname,
++				sizeof(config.fwname), "hpnv", soc_type, ver,
++				rom_ver, boardid);
+ 			break;
+ 		case QCA_QCA6390:
+ 			snprintf(config.fwname, sizeof(config.fwname),
+@@ -848,13 +905,14 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
+ 				 "qca/msnv%02x.bin", rom_ver);
+ 			break;
+ 		case QCA_WCN6855:
+-			snprintf(config.fwname, sizeof(config.fwname),
+-				 "qca/hpnv%02x.bin", rom_ver);
++			qca_read_fw_board_id(hdev, &boardid);
++			qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
++						  "hpnv", soc_type, ver, rom_ver, boardid);
+ 			break;
+ 		case QCA_WCN7850:
+-			qca_get_nvm_name_generic(&config, "hmt", rom_ver, boardid);
++			qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
++						  "hmtnv", soc_type, ver, rom_ver, boardid);
+ 			break;
+-
+ 		default:
+ 			snprintf(config.fwname, sizeof(config.fwname),
+ 				 "qca/nvm_%08x.bin", soc_ver);
+diff --git a/drivers/clocksource/jcore-pit.c b/drivers/clocksource/jcore-pit.c
+index a3fe98cd383820..82815428f8f925 100644
+--- a/drivers/clocksource/jcore-pit.c
++++ b/drivers/clocksource/jcore-pit.c
+@@ -114,6 +114,18 @@ static int jcore_pit_local_init(unsigned cpu)
+ 	pit->periodic_delta = DIV_ROUND_CLOSEST(NSEC_PER_SEC, HZ * buspd);
+ 
+ 	clockevents_config_and_register(&pit->ced, freq, 1, ULONG_MAX);
++	enable_percpu_irq(pit->ced.irq, IRQ_TYPE_NONE);
++
++	return 0;
++}
++
++static int jcore_pit_local_teardown(unsigned cpu)
++{
++	struct jcore_pit *pit = this_cpu_ptr(jcore_pit_percpu);
++
++	pr_info("Local J-Core PIT teardown on cpu %u\n", cpu);
++
++	disable_percpu_irq(pit->ced.irq);
+ 
+ 	return 0;
+ }
+@@ -168,6 +180,7 @@ static int __init jcore_pit_init(struct device_node *node)
+ 		return -ENOMEM;
+ 	}
+ 
++	irq_set_percpu_devid(pit_irq);
+ 	err = request_percpu_irq(pit_irq, jcore_timer_interrupt,
+ 				 "jcore_pit", jcore_pit_percpu);
+ 	if (err) {
+@@ -237,7 +250,7 @@ static int __init jcore_pit_init(struct device_node *node)
+ 
+ 	cpuhp_setup_state(CPUHP_AP_JCORE_TIMER_STARTING,
+ 			  "clockevents/jcore:starting",
+-			  jcore_pit_local_init, NULL);
++			  jcore_pit_local_init, jcore_pit_local_teardown);
+ 
+ 	return 0;
+ }
+diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
+index 04c42c83a2bad5..f3da9385ca0d88 100644
+--- a/drivers/edac/qcom_edac.c
++++ b/drivers/edac/qcom_edac.c
+@@ -95,7 +95,7 @@ static int qcom_llcc_core_setup(struct llcc_drv_data *drv, struct regmap *llcc_b
+ 	 * Configure interrupt enable registers such that Tag, Data RAM related
+ 	 * interrupts are propagated to interrupt controller for servicing
+ 	 */
+-	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
++	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable,
+ 				 TRP0_INTERRUPT_ENABLE,
+ 				 TRP0_INTERRUPT_ENABLE);
+ 	if (ret)
+@@ -113,7 +113,7 @@ static int qcom_llcc_core_setup(struct llcc_drv_data *drv, struct regmap *llcc_b
+ 	if (ret)
+ 		return ret;
+ 
+-	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_2_enable,
++	ret = regmap_update_bits(llcc_bcast_regmap, drv->edac_reg_offset->cmn_interrupt_0_enable,
+ 				 DRP0_INTERRUPT_ENABLE,
+ 				 DRP0_INTERRUPT_ENABLE);
+ 	if (ret)
+diff --git a/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c b/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c
+index a86ab9b35953f7..2641faa329cdd0 100644
+--- a/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c
++++ b/drivers/firmware/arm_scmi/vendors/imx/imx-sm-misc.c
+@@ -254,8 +254,8 @@ static int scmi_imx_misc_ctrl_set(const struct scmi_protocol_handle *ph,
+ 	if (num > max_num)
+ 		return -EINVAL;
+ 
+-	ret = ph->xops->xfer_get_init(ph, SCMI_IMX_MISC_CTRL_SET, sizeof(*in),
+-				      0, &t);
++	ret = ph->xops->xfer_get_init(ph, SCMI_IMX_MISC_CTRL_SET,
++				      sizeof(*in) + num * sizeof(__le32), 0, &t);
+ 	if (ret)
+ 		return ret;
+ 
+diff --git a/drivers/firmware/imx/Kconfig b/drivers/firmware/imx/Kconfig
+index 907cd149c40a8b..c964f4924359fc 100644
+--- a/drivers/firmware/imx/Kconfig
++++ b/drivers/firmware/imx/Kconfig
+@@ -25,6 +25,7 @@ config IMX_SCU
+ 
+ config IMX_SCMI_MISC_DRV
+ 	tristate "IMX SCMI MISC Protocol driver"
++	depends on ARCH_MXC || COMPILE_TEST
+ 	default y if ARCH_MXC
+ 	help
+ 	  The System Controller Management Interface firmware (SCMI FW) is
+diff --git a/drivers/gpio/gpio-vf610.c b/drivers/gpio/gpio-vf610.c
+index c4f34a347cb6ea..c36a9dbccd4dd5 100644
+--- a/drivers/gpio/gpio-vf610.c
++++ b/drivers/gpio/gpio-vf610.c
+@@ -36,6 +36,7 @@ struct vf610_gpio_port {
+ 	struct clk *clk_port;
+ 	struct clk *clk_gpio;
+ 	int irq;
++	spinlock_t lock;	/* protect gpio direction registers */
+ };
+ 
+ #define GPIO_PDOR		0x00
+@@ -124,6 +125,7 @@ static int vf610_gpio_direction_input(struct gpio_chip *chip, unsigned int gpio)
+ 	u32 val;
+ 
+ 	if (port->sdata->have_paddr) {
++		guard(spinlock_irqsave)(&port->lock);
+ 		val = vf610_gpio_readl(port->gpio_base + GPIO_PDDR);
+ 		val &= ~mask;
+ 		vf610_gpio_writel(val, port->gpio_base + GPIO_PDDR);
+@@ -142,6 +144,7 @@ static int vf610_gpio_direction_output(struct gpio_chip *chip, unsigned int gpio
+ 	vf610_gpio_set(chip, gpio, value);
+ 
+ 	if (port->sdata->have_paddr) {
++		guard(spinlock_irqsave)(&port->lock);
+ 		val = vf610_gpio_readl(port->gpio_base + GPIO_PDDR);
+ 		val |= mask;
+ 		vf610_gpio_writel(val, port->gpio_base + GPIO_PDDR);
+@@ -297,6 +300,7 @@ static int vf610_gpio_probe(struct platform_device *pdev)
+ 		return -ENOMEM;
+ 
+ 	port->sdata = device_get_match_data(dev);
++	spin_lock_init(&port->lock);
+ 
+ 	dual_base = port->sdata->have_dual_base;
+ 
+diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
+index ca2f58a2cd45e7..19878bc75e94ca 100644
+--- a/drivers/gpio/gpiolib.c
++++ b/drivers/gpio/gpiolib.c
+@@ -3129,6 +3129,8 @@ static int gpiod_get_raw_value_commit(const struct gpio_desc *desc)
+ static int gpio_chip_get_multiple(struct gpio_chip *gc,
+ 				  unsigned long *mask, unsigned long *bits)
+ {
++	lockdep_assert_held(&gc->gpiodev->srcu);
++
+ 	if (gc->get_multiple)
+ 		return gc->get_multiple(gc, mask, bits);
+ 	if (gc->get) {
+@@ -3159,6 +3161,7 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
+ 				  struct gpio_array *array_info,
+ 				  unsigned long *value_bitmap)
+ {
++	struct gpio_chip *gc;
+ 	int ret, i = 0;
+ 
+ 	/*
+@@ -3170,10 +3173,15 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
+ 	    array_size <= array_info->size &&
+ 	    (void *)array_info == desc_array + array_info->size) {
+ 		if (!can_sleep)
+-			WARN_ON(array_info->chip->can_sleep);
++			WARN_ON(array_info->gdev->can_sleep);
++
++		guard(srcu)(&array_info->gdev->srcu);
++		gc = srcu_dereference(array_info->gdev->chip,
++				      &array_info->gdev->srcu);
++		if (!gc)
++			return -ENODEV;
+ 
+-		ret = gpio_chip_get_multiple(array_info->chip,
+-					     array_info->get_mask,
++		ret = gpio_chip_get_multiple(gc, array_info->get_mask,
+ 					     value_bitmap);
+ 		if (ret)
+ 			return ret;
+@@ -3454,6 +3462,8 @@ static void gpiod_set_raw_value_commit(struct gpio_desc *desc, bool value)
+ static void gpio_chip_set_multiple(struct gpio_chip *gc,
+ 				   unsigned long *mask, unsigned long *bits)
+ {
++	lockdep_assert_held(&gc->gpiodev->srcu);
++
+ 	if (gc->set_multiple) {
+ 		gc->set_multiple(gc, mask, bits);
+ 	} else {
+@@ -3471,6 +3481,7 @@ int gpiod_set_array_value_complex(bool raw, bool can_sleep,
+ 				  struct gpio_array *array_info,
+ 				  unsigned long *value_bitmap)
+ {
++	struct gpio_chip *gc;
+ 	int i = 0;
+ 
+ 	/*
+@@ -3482,14 +3493,19 @@ int gpiod_set_array_value_complex(bool raw, bool can_sleep,
+ 	    array_size <= array_info->size &&
+ 	    (void *)array_info == desc_array + array_info->size) {
+ 		if (!can_sleep)
+-			WARN_ON(array_info->chip->can_sleep);
++			WARN_ON(array_info->gdev->can_sleep);
++
++		guard(srcu)(&array_info->gdev->srcu);
++		gc = srcu_dereference(array_info->gdev->chip,
++				      &array_info->gdev->srcu);
++		if (!gc)
++			return -ENODEV;
+ 
+ 		if (!raw && !bitmap_empty(array_info->invert_mask, array_size))
+ 			bitmap_xor(value_bitmap, value_bitmap,
+ 				   array_info->invert_mask, array_size);
+ 
+-		gpio_chip_set_multiple(array_info->chip, array_info->set_mask,
+-				       value_bitmap);
++		gpio_chip_set_multiple(gc, array_info->set_mask, value_bitmap);
+ 
+ 		i = find_first_zero_bit(array_info->set_mask, array_size);
+ 		if (i == array_size)
+@@ -4751,9 +4767,10 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ {
+ 	struct gpio_desc *desc;
+ 	struct gpio_descs *descs;
++	struct gpio_device *gdev;
+ 	struct gpio_array *array_info = NULL;
+-	struct gpio_chip *gc;
+ 	int count, bitmap_size;
++	unsigned long dflags;
+ 	size_t descs_size;
+ 
+ 	count = gpiod_count(dev, con_id);
+@@ -4774,7 +4791,7 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 
+ 		descs->desc[descs->ndescs] = desc;
+ 
+-		gc = gpiod_to_chip(desc);
++		gdev = gpiod_to_gpio_device(desc);
+ 		/*
+ 		 * If pin hardware number of array member 0 is also 0, select
+ 		 * its chip as a candidate for fast bitmap processing path.
+@@ -4782,8 +4799,8 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 		if (descs->ndescs == 0 && gpio_chip_hwgpio(desc) == 0) {
+ 			struct gpio_descs *array;
+ 
+-			bitmap_size = BITS_TO_LONGS(gc->ngpio > count ?
+-						    gc->ngpio : count);
++			bitmap_size = BITS_TO_LONGS(gdev->ngpio > count ?
++						    gdev->ngpio : count);
+ 
+ 			array = krealloc(descs, descs_size +
+ 					 struct_size(array_info, invert_mask, 3 * bitmap_size),
+@@ -4803,7 +4820,7 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 
+ 			array_info->desc = descs->desc;
+ 			array_info->size = count;
+-			array_info->chip = gc;
++			array_info->gdev = gdev;
+ 			bitmap_set(array_info->get_mask, descs->ndescs,
+ 				   count - descs->ndescs);
+ 			bitmap_set(array_info->set_mask, descs->ndescs,
+@@ -4816,7 +4833,7 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 			continue;
+ 
+ 		/* Unmark array members which don't belong to the 'fast' chip */
+-		if (array_info->chip != gc) {
++		if (array_info->gdev != gdev) {
+ 			__clear_bit(descs->ndescs, array_info->get_mask);
+ 			__clear_bit(descs->ndescs, array_info->set_mask);
+ 		}
+@@ -4839,9 +4856,10 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 					  array_info->set_mask);
+ 			}
+ 		} else {
++			dflags = READ_ONCE(desc->flags);
+ 			/* Exclude open drain or open source from fast output */
+-			if (gpiochip_line_is_open_drain(gc, descs->ndescs) ||
+-			    gpiochip_line_is_open_source(gc, descs->ndescs))
++			if (test_bit(FLAG_OPEN_DRAIN, &dflags) ||
++			    test_bit(FLAG_OPEN_SOURCE, &dflags))
+ 				__clear_bit(descs->ndescs,
+ 					    array_info->set_mask);
+ 			/* Identify 'fast' pins which require invertion */
+@@ -4853,7 +4871,7 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
+ 	if (array_info)
+ 		dev_dbg(dev,
+ 			"GPIO array info: chip=%s, size=%d, get_mask=%lx, set_mask=%lx, invert_mask=%lx\n",
+-			array_info->chip->label, array_info->size,
++			array_info->gdev->label, array_info->size,
+ 			*array_info->get_mask, *array_info->set_mask,
+ 			*array_info->invert_mask);
+ 	return descs;
+diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h
+index 83690f72f7e5cb..147156ec502b29 100644
+--- a/drivers/gpio/gpiolib.h
++++ b/drivers/gpio/gpiolib.h
+@@ -114,7 +114,7 @@ extern const char *const gpio_suffixes[];
+  *
+  * @desc:		Array of pointers to the GPIO descriptors
+  * @size:		Number of elements in desc
+- * @chip:		Parent GPIO chip
++ * @gdev:		Parent GPIO device
+  * @get_mask:		Get mask used in fastpath
+  * @set_mask:		Set mask used in fastpath
+  * @invert_mask:	Invert mask used in fastpath
+@@ -126,7 +126,7 @@ extern const char *const gpio_suffixes[];
+ struct gpio_array {
+ 	struct gpio_desc **desc;
+ 	unsigned int size;
+-	struct gpio_chip *chip;
++	struct gpio_device *gdev;
+ 	unsigned long *get_mask;
+ 	unsigned long *set_mask;
+ 	unsigned long invert_mask[];
+diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
+index b55be8889e2ca6..5f140d4541a83c 100644
+--- a/drivers/gpu/drm/Kconfig
++++ b/drivers/gpu/drm/Kconfig
+@@ -359,6 +359,7 @@ config DRM_TTM_HELPER
+ 	tristate
+ 	depends on DRM
+ 	select DRM_TTM
++	select DRM_KMS_HELPER if DRM_FBDEV_EMULATION
+ 	select FB_CORE if DRM_FBDEV_EMULATION
+ 	select FB_SYSMEM_HELPERS_DEFERRED if DRM_FBDEV_EMULATION
+ 	help
+@@ -367,6 +368,7 @@ config DRM_TTM_HELPER
+ config DRM_GEM_DMA_HELPER
+ 	tristate
+ 	depends on DRM
++	select DRM_KMS_HELPER if DRM_FBDEV_EMULATION
+ 	select FB_CORE if DRM_FBDEV_EMULATION
+ 	select FB_DMAMEM_HELPERS_DEFERRED if DRM_FBDEV_EMULATION
+ 	help
+@@ -375,6 +377,7 @@ config DRM_GEM_DMA_HELPER
+ config DRM_GEM_SHMEM_HELPER
+ 	tristate
+ 	depends on DRM && MMU
++	select DRM_KMS_HELPER if DRM_FBDEV_EMULATION
+ 	select FB_CORE if DRM_FBDEV_EMULATION
+ 	select FB_SYSMEM_HELPERS_DEFERRED if DRM_FBDEV_EMULATION
+ 	help
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+index e63efe5c5b75a2..91a874bb0e2415 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+@@ -120,9 +120,10 @@
+  * - 3.58.0 - Add GFX12 DCC support
+  * - 3.59.0 - Cleared VRAM
+  * - 3.60.0 - Add AMDGPU_TILING_GFX12_DCC_WRITE_COMPRESS_DISABLE (Vulkan requirement)
++ * - 3.61.0 - Contains fix for RV/PCO compute queues
+  */
+ #define KMS_DRIVER_MAJOR	3
+-#define KMS_DRIVER_MINOR	60
++#define KMS_DRIVER_MINOR	61
+ #define KMS_DRIVER_PATCHLEVEL	0
+ 
+ /*
+diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+index 0b6f09f2cc9bd0..d28258bb6d2985 100644
+--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+@@ -7439,6 +7439,34 @@ static void gfx_v9_0_ring_emit_cleaner_shader(struct amdgpu_ring *ring)
+ 	amdgpu_ring_write(ring, 0); /* RESERVED field, programmed to zero */
+ }
+ 
++static void gfx_v9_0_ring_begin_use_compute(struct amdgpu_ring *ring)
++{
++	struct amdgpu_device *adev = ring->adev;
++
++	amdgpu_gfx_enforce_isolation_ring_begin_use(ring);
++
++	/* Raven and PCO APUs seem to have stability issues
++	 * with compute and gfxoff and gfx pg. Disable gfx pg during
++	 * submission and allow again afterwards.
++	 */
++	if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0))
++		gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_UNGATE);
++}
++
++static void gfx_v9_0_ring_end_use_compute(struct amdgpu_ring *ring)
++{
++	struct amdgpu_device *adev = ring->adev;
++
++	/* Raven and PCO APUs seem to have stability issues
++	 * with compute and gfxoff and gfx pg. Disable gfx pg during
++	 * submission and allow again afterwards.
++ */ ++ if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0)) ++ gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_GATE); ++ ++ amdgpu_gfx_enforce_isolation_ring_end_use(ring); ++} ++ + static const struct amd_ip_funcs gfx_v9_0_ip_funcs = { + .name = "gfx_v9_0", + .early_init = gfx_v9_0_early_init, +@@ -7615,8 +7643,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { + .emit_wave_limit = gfx_v9_0_emit_wave_limit, + .reset = gfx_v9_0_reset_kcq, + .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader, +- .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use, +- .end_use = amdgpu_gfx_enforce_isolation_ring_end_use, ++ .begin_use = gfx_v9_0_ring_begin_use_compute, ++ .end_use = gfx_v9_0_ring_end_use_compute, + }; + + static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { +diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +index 02f7ba8c93cd45..7062f12b5b7511 100644 +--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h ++++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +@@ -4117,7 +4117,8 @@ static const uint32_t cwsr_trap_gfx12_hex[] = { + 0x0000ffff, 0x8bfe7e7e, + 0x8bea6a6a, 0xb97af804, + 0xbe804ec2, 0xbf94fffe, +- 0xbe804a6c, 0xbfb10000, ++ 0xbe804a6c, 0xbe804ec2, ++ 0xbf94fffe, 0xbfb10000, + 0xbf9f0000, 0xbf9f0000, + 0xbf9f0000, 0xbf9f0000, + 0xbf9f0000, 0x00000000, +diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm +index 44772eec9ef4df..96fbb16ceb216d 100644 +--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm ++++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm +@@ -34,41 +34,24 @@ + * cpp -DASIC_FAMILY=CHIP_PLUM_BONITO cwsr_trap_handler_gfx10.asm -P -o gfx11.sp3 + * sp3 gfx11.sp3 -hex gfx11.hex + * +- * gfx12: +- * cpp -DASIC_FAMILY=CHIP_GFX12 cwsr_trap_handler_gfx10.asm -P -o gfx12.sp3 +- * sp3 gfx12.sp3 -hex gfx12.hex + */ + + #define CHIP_NAVI10 26 + 
#define CHIP_SIENNA_CICHLID 30 + #define CHIP_PLUM_BONITO 36 +-#define CHIP_GFX12 37 + + #define NO_SQC_STORE (ASIC_FAMILY >= CHIP_SIENNA_CICHLID) + #define HAVE_XNACK (ASIC_FAMILY < CHIP_SIENNA_CICHLID) + #define HAVE_SENDMSG_RTN (ASIC_FAMILY >= CHIP_PLUM_BONITO) + #define HAVE_BUFFER_LDS_LOAD (ASIC_FAMILY < CHIP_PLUM_BONITO) +-#define SW_SA_TRAP (ASIC_FAMILY >= CHIP_PLUM_BONITO && ASIC_FAMILY < CHIP_GFX12) ++#define SW_SA_TRAP (ASIC_FAMILY == CHIP_PLUM_BONITO) + #define SAVE_AFTER_XNACK_ERROR (HAVE_XNACK && !NO_SQC_STORE) // workaround for TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger + #define SINGLE_STEP_MISSED_WORKAROUND 1 //workaround for lost MODE.DEBUG_EN exception when SAVECTX raised + +-#if ASIC_FAMILY < CHIP_GFX12 + #define S_COHERENCE glc:1 + #define V_COHERENCE slc:1 glc:1 + #define S_WAITCNT_0 s_waitcnt 0 +-#else +-#define S_COHERENCE scope:SCOPE_SYS +-#define V_COHERENCE scope:SCOPE_SYS +-#define S_WAITCNT_0 s_wait_idle +- +-#define HW_REG_SHADER_FLAT_SCRATCH_LO HW_REG_WAVE_SCRATCH_BASE_LO +-#define HW_REG_SHADER_FLAT_SCRATCH_HI HW_REG_WAVE_SCRATCH_BASE_HI +-#define HW_REG_GPR_ALLOC HW_REG_WAVE_GPR_ALLOC +-#define HW_REG_LDS_ALLOC HW_REG_WAVE_LDS_ALLOC +-#define HW_REG_MODE HW_REG_WAVE_MODE +-#endif + +-#if ASIC_FAMILY < CHIP_GFX12 + var SQ_WAVE_STATUS_SPI_PRIO_MASK = 0x00000006 + var SQ_WAVE_STATUS_HALT_MASK = 0x2000 + var SQ_WAVE_STATUS_ECC_ERR_MASK = 0x20000 +@@ -81,21 +64,6 @@ var S_STATUS_ALWAYS_CLEAR_MASK = SQ_WAVE_STATUS_SPI_PRIO_MASK|SQ_WAVE_STATUS_E + var S_STATUS_HALT_MASK = SQ_WAVE_STATUS_HALT_MASK + var S_SAVE_PC_HI_TRAP_ID_MASK = 0x00FF0000 + var S_SAVE_PC_HI_HT_MASK = 0x01000000 +-#else +-var SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK = 0x4 +-var SQ_WAVE_STATE_PRIV_SCC_SHIFT = 9 +-var SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK = 0xC00 +-var SQ_WAVE_STATE_PRIV_HALT_MASK = 0x4000 +-var SQ_WAVE_STATE_PRIV_POISON_ERR_MASK = 0x8000 +-var SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT = 15 +-var SQ_WAVE_STATUS_WAVE64_SHIFT = 29 +-var 
SQ_WAVE_STATUS_WAVE64_SIZE = 1 +-var SQ_WAVE_LDS_ALLOC_GRANULARITY = 9 +-var S_STATUS_HWREG = HW_REG_WAVE_STATE_PRIV +-var S_STATUS_ALWAYS_CLEAR_MASK = SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK|SQ_WAVE_STATE_PRIV_POISON_ERR_MASK +-var S_STATUS_HALT_MASK = SQ_WAVE_STATE_PRIV_HALT_MASK +-var S_SAVE_PC_HI_TRAP_ID_MASK = 0xF0000000 +-#endif + + var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24 + var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12 +@@ -110,7 +78,6 @@ var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 8 + var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 12 + #endif + +-#if ASIC_FAMILY < CHIP_GFX12 + var SQ_WAVE_TRAPSTS_SAVECTX_MASK = 0x400 + var SQ_WAVE_TRAPSTS_EXCP_MASK = 0x1FF + var SQ_WAVE_TRAPSTS_SAVECTX_SHIFT = 10 +@@ -161,39 +128,6 @@ var S_TRAPSTS_RESTORE_PART_3_SIZE = 32 - S_TRAPSTS_RESTORE_PART_3_SHIFT + var S_TRAPSTS_HWREG = HW_REG_TRAPSTS + var S_TRAPSTS_SAVE_CONTEXT_MASK = SQ_WAVE_TRAPSTS_SAVECTX_MASK + var S_TRAPSTS_SAVE_CONTEXT_SHIFT = SQ_WAVE_TRAPSTS_SAVECTX_SHIFT +-#else +-var SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK = 0xF +-var SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK = 0x10 +-var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT = 5 +-var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK = 0x20 +-var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK = 0x40 +-var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT = 6 +-var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK = 0x80 +-var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT = 7 +-var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK = 0x100 +-var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT = 8 +-var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK = 0x200 +-var SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK = 0x800 +-var SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK = 0x80 +-var SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK = 0x200 +- +-var S_TRAPSTS_HWREG = HW_REG_WAVE_EXCP_FLAG_PRIV +-var S_TRAPSTS_SAVE_CONTEXT_MASK = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK +-var S_TRAPSTS_SAVE_CONTEXT_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT +-var S_TRAPSTS_NON_MASKABLE_EXCP_MASK = SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK |\ +- 
SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK |\ +- SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK |\ +- SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK |\ +- SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK |\ +- SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK +-var S_TRAPSTS_RESTORE_PART_1_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT +-var S_TRAPSTS_RESTORE_PART_2_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT +-var S_TRAPSTS_RESTORE_PART_2_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT +-var S_TRAPSTS_RESTORE_PART_3_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT +-var S_TRAPSTS_RESTORE_PART_3_SIZE = 32 - S_TRAPSTS_RESTORE_PART_3_SHIFT +-var BARRIER_STATE_SIGNAL_OFFSET = 16 +-var BARRIER_STATE_VALID_OFFSET = 0 +-#endif + + // bits [31:24] unused by SPI debug data + var TTMP11_SAVE_REPLAY_W64H_SHIFT = 31 +@@ -305,11 +239,7 @@ L_TRAP_NO_BARRIER: + + L_HALTED: + // Host trap may occur while wave is halted. +-#if ASIC_FAMILY < CHIP_GFX12 + s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK +-#else +- s_and_b32 ttmp2, s_save_trapsts, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK +-#endif + s_cbranch_scc1 L_FETCH_2ND_TRAP + + L_CHECK_SAVE: +@@ -336,7 +266,6 @@ L_NOT_HALTED: + // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi. + // Maskable exceptions only cause the wave to enter the trap handler if + // their respective bit in mode.excp_en is set. 
+-#if ASIC_FAMILY < CHIP_GFX12 + s_and_b32 ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_EXCP_MASK|SQ_WAVE_TRAPSTS_EXCP_HI_MASK + s_cbranch_scc0 L_CHECK_TRAP_ID + +@@ -349,17 +278,6 @@ L_NOT_ADDR_WATCH: + s_lshl_b32 ttmp2, ttmp2, SQ_WAVE_MODE_EXCP_EN_SHIFT + s_and_b32 ttmp2, ttmp2, ttmp3 + s_cbranch_scc1 L_FETCH_2ND_TRAP +-#else +- s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) +- s_and_b32 ttmp3, s_save_trapsts, SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK +- s_cbranch_scc0 L_NOT_ADDR_WATCH +- s_or_b32 ttmp2, ttmp2, SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK +- +-L_NOT_ADDR_WATCH: +- s_getreg_b32 ttmp3, hwreg(HW_REG_WAVE_TRAP_CTRL) +- s_and_b32 ttmp2, ttmp3, ttmp2 +- s_cbranch_scc1 L_FETCH_2ND_TRAP +-#endif + + L_CHECK_TRAP_ID: + // Check trap_id != 0 +@@ -369,13 +287,8 @@ L_CHECK_TRAP_ID: + #if SINGLE_STEP_MISSED_WORKAROUND + // Prioritize single step exception over context save. + // Second-level trap will halt wave and RFE, re-entering for SAVECTX. +-#if ASIC_FAMILY < CHIP_GFX12 + s_getreg_b32 ttmp2, hwreg(HW_REG_MODE) + s_and_b32 ttmp2, ttmp2, SQ_WAVE_MODE_DEBUG_EN_MASK +-#else +- // WAVE_TRAP_CTRL is already in ttmp3. +- s_and_b32 ttmp3, ttmp3, SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK +-#endif + s_cbranch_scc1 L_FETCH_2ND_TRAP + #endif + +@@ -425,12 +338,7 @@ L_NO_NEXT_TRAP: + s_cbranch_scc1 L_TRAP_CASE + + // Host trap will not cause trap re-entry. 
+-#if ASIC_FAMILY < CHIP_GFX12 + s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_HT_MASK +-#else +- s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) +- s_and_b32 ttmp2, ttmp2, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK +-#endif + s_cbranch_scc1 L_EXIT_TRAP + s_or_b32 s_save_status, s_save_status, S_STATUS_HALT_MASK + +@@ -457,16 +365,7 @@ L_EXIT_TRAP: + s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 + s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32 + +-#if ASIC_FAMILY < CHIP_GFX12 + s_setreg_b32 hwreg(S_STATUS_HWREG), s_save_status +-#else +- // STATE_PRIV.BARRIER_COMPLETE may have changed since we read it. +- // Only restore fields which the trap handler changes. +- s_lshr_b32 s_save_status, s_save_status, SQ_WAVE_STATE_PRIV_SCC_SHIFT +- s_setreg_b32 hwreg(S_STATUS_HWREG, SQ_WAVE_STATE_PRIV_SCC_SHIFT, \ +- SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT - SQ_WAVE_STATE_PRIV_SCC_SHIFT + 1), s_save_status +-#endif +- + s_rfe_b64 [ttmp0, ttmp1] + + L_SAVE: +@@ -478,14 +377,6 @@ L_SAVE: + s_endpgm + L_HAVE_VGPRS: + #endif +-#if ASIC_FAMILY >= CHIP_GFX12 +- s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) +- s_bitcmp1_b32 s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT +- s_cbranch_scc0 L_HAVE_VGPRS +- s_endpgm +-L_HAVE_VGPRS: +-#endif +- + s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff //pc[47:32] + s_mov_b32 s_save_tmp, 0 + s_setreg_b32 hwreg(S_TRAPSTS_HWREG, S_TRAPSTS_SAVE_CONTEXT_SHIFT, 1), s_save_tmp //clear saveCtx bit +@@ -671,19 +562,6 @@ L_SAVE_HWREG: + s_mov_b32 m0, 0x0 //Next lane of v2 to write to + #endif + +-#if ASIC_FAMILY >= CHIP_GFX12 +- // Ensure no further changes to barrier or LDS state. +- // STATE_PRIV.BARRIER_COMPLETE may change up to this point. +- s_barrier_signal -2 +- s_barrier_wait -2 +- +- // Re-read final state of BARRIER_COMPLETE field for save. 
+- s_getreg_b32 s_save_tmp, hwreg(S_STATUS_HWREG) +- s_and_b32 s_save_tmp, s_save_tmp, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK +- s_andn2_b32 s_save_status, s_save_status, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK +- s_or_b32 s_save_status, s_save_status, s_save_tmp +-#endif +- + write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) + write_hwreg_to_mem(s_save_pc_lo, s_save_buf_rsrc0, s_save_mem_offset) + s_andn2_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK +@@ -707,21 +585,6 @@ L_SAVE_HWREG: + s_getreg_b32 s_save_m0, hwreg(HW_REG_SHADER_FLAT_SCRATCH_HI) + write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) + +-#if ASIC_FAMILY >= CHIP_GFX12 +- s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) +- write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) +- +- s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_TRAP_CTRL) +- write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset) +- +- s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) +- write_hwreg_to_mem(s_save_tmp, s_save_buf_rsrc0, s_save_mem_offset) +- +- s_get_barrier_state s_save_tmp, -1 +- s_wait_kmcnt (0) +- write_hwreg_to_mem(s_save_tmp, s_save_buf_rsrc0, s_save_mem_offset) +-#endif +- + #if NO_SQC_STORE + // Write HWREGs with 16 VGPR lanes. TTMPs occupy space after this. + s_mov_b32 exec_lo, 0xFFFF +@@ -814,9 +677,7 @@ L_SAVE_LDS_NORMAL: + s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //lds_size is zero? + s_cbranch_scc0 L_SAVE_LDS_DONE //no lds used? jump to L_SAVE_DONE + +-#if ASIC_FAMILY < CHIP_GFX12 + s_barrier //LDS is used? wait for other waves in the same TG +-#endif + s_and_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK + s_cbranch_scc0 L_SAVE_LDS_DONE + +@@ -1081,11 +942,6 @@ L_RESTORE: + s_mov_b32 s_restore_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) + s_mov_b32 s_restore_buf_rsrc3, S_RESTORE_BUF_RSRC_WORD3_MISC + +-#if ASIC_FAMILY >= CHIP_GFX12 +- // Save s_restore_spi_init_hi for later use. 
+- s_mov_b32 s_restore_spi_init_hi_save, s_restore_spi_init_hi +-#endif +- + //determine it is wave32 or wave64 + get_wave_size2(s_restore_size) + +@@ -1320,9 +1176,7 @@ L_RESTORE_SGPR: + // s_barrier with MODE.DEBUG_EN=1, STATUS.PRIV=1 incorrectly asserts debug exception. + // Clear DEBUG_EN before and restore MODE after the barrier. + s_setreg_imm32_b32 hwreg(HW_REG_MODE), 0 +-#if ASIC_FAMILY < CHIP_GFX12 + s_barrier //barrier to ensure the readiness of LDS before access attemps from any other wave in the same TG +-#endif + + /* restore HW registers */ + L_RESTORE_HWREG: +@@ -1334,11 +1188,6 @@ L_RESTORE_HWREG: + + s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes + +-#if ASIC_FAMILY >= CHIP_GFX12 +- // Restore s_restore_spi_init_hi before the saved value gets clobbered. +- s_mov_b32 s_restore_spi_init_hi, s_restore_spi_init_hi_save +-#endif +- + read_hwreg_from_mem(s_restore_m0, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_pc_lo, s_restore_buf_rsrc0, s_restore_mem_offset) + read_hwreg_from_mem(s_restore_pc_hi, s_restore_buf_rsrc0, s_restore_mem_offset) +@@ -1358,44 +1207,6 @@ L_RESTORE_HWREG: + + s_setreg_b32 hwreg(HW_REG_SHADER_FLAT_SCRATCH_HI), s_restore_flat_scratch + +-#if ASIC_FAMILY >= CHIP_GFX12 +- read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) +- S_WAITCNT_0 +- s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_USER), s_restore_tmp +- +- read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) +- S_WAITCNT_0 +- s_setreg_b32 hwreg(HW_REG_WAVE_TRAP_CTRL), s_restore_tmp +- +- // Only the first wave needs to restore the workgroup barrier. 
+- s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK +- s_cbranch_scc0 L_SKIP_BARRIER_RESTORE +- +- // Skip over WAVE_STATUS, since there is no state to restore from it +- s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 4 +- +- read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) +- S_WAITCNT_0 +- +- s_bitcmp1_b32 s_restore_tmp, BARRIER_STATE_VALID_OFFSET +- s_cbranch_scc0 L_SKIP_BARRIER_RESTORE +- +- // extract the saved signal count from s_restore_tmp +- s_lshr_b32 s_restore_tmp, s_restore_tmp, BARRIER_STATE_SIGNAL_OFFSET +- +- // We need to call s_barrier_signal repeatedly to restore the signal +- // count of the work group barrier. The member count is already +- // initialized with the number of waves in the work group. +-L_BARRIER_RESTORE_LOOP: +- s_and_b32 s_restore_tmp, s_restore_tmp, s_restore_tmp +- s_cbranch_scc0 L_SKIP_BARRIER_RESTORE +- s_barrier_signal -1 +- s_add_i32 s_restore_tmp, s_restore_tmp, -1 +- s_branch L_BARRIER_RESTORE_LOOP +- +-L_SKIP_BARRIER_RESTORE: +-#endif +- + s_mov_b32 m0, s_restore_m0 + s_mov_b32 exec_lo, s_restore_exec_lo + s_mov_b32 exec_hi, s_restore_exec_hi +@@ -1453,13 +1264,6 @@ L_RETURN_WITHOUT_PRIV: + + s_setreg_b32 hwreg(S_STATUS_HWREG), s_restore_status // SCC is included, which is changed by previous salu + +-#if ASIC_FAMILY >= CHIP_GFX12 +- // Make barrier and LDS state visible to all waves in the group. +- // STATE_PRIV.BARRIER_COMPLETE may change after this point. 
+- s_barrier_signal -2 +- s_barrier_wait -2 +-#endif +- + s_rfe_b64 s_restore_pc_lo //Return to the main shader program and resume execution + + L_END_PGM: +@@ -1598,11 +1402,7 @@ function get_hwreg_size_bytes + end + + function get_wave_size2(s_reg) +-#if ASIC_FAMILY < CHIP_GFX12 + s_getreg_b32 s_reg, hwreg(HW_REG_IB_STS2,SQ_WAVE_IB_STS2_WAVE64_SHIFT,SQ_WAVE_IB_STS2_WAVE64_SIZE) +-#else +- s_getreg_b32 s_reg, hwreg(HW_REG_WAVE_STATUS,SQ_WAVE_STATUS_WAVE64_SHIFT,SQ_WAVE_STATUS_WAVE64_SIZE) +-#endif + s_lshl_b32 s_reg, s_reg, S_WAVE_SIZE + end + +diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm +new file mode 100644 +index 00000000000000..7b9d36e5fa4372 +--- /dev/null ++++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx12.asm +@@ -0,0 +1,1130 @@ ++/* ++ * Copyright 2018 Advanced Micro Devices, Inc. ++ * ++ * Permission is hereby granted, free of charge, to any person obtaining a ++ * copy of this software and associated documentation files (the "Software"), ++ * to deal in the Software without restriction, including without limitation ++ * the rights to use, copy, modify, merge, publish, distribute, sublicense, ++ * and/or sell copies of the Software, and to permit persons to whom the ++ * Software is furnished to do so, subject to the following conditions: ++ * ++ * The above copyright notice and this permission notice shall be included in ++ * all copies or substantial portions of the Software. ++ * ++ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR ++ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ++ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL ++ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR ++ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ++ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR ++ * OTHER DEALINGS IN THE SOFTWARE. ++ */ ++ ++/* To compile this assembly code: ++ * ++ * gfx12: ++ * cpp -DASIC_FAMILY=CHIP_GFX12 cwsr_trap_handler_gfx12.asm -P -o gfx12.sp3 ++ * sp3 gfx12.sp3 -hex gfx12.hex ++ */ ++ ++#define CHIP_GFX12 37 ++ ++#define SINGLE_STEP_MISSED_WORKAROUND 1 //workaround for lost TRAP_AFTER_INST exception when SAVECTX raised ++ ++var SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK = 0x4 ++var SQ_WAVE_STATE_PRIV_SCC_SHIFT = 9 ++var SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK = 0xC00 ++var SQ_WAVE_STATE_PRIV_HALT_MASK = 0x4000 ++var SQ_WAVE_STATE_PRIV_POISON_ERR_MASK = 0x8000 ++var SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT = 15 ++var SQ_WAVE_STATUS_WAVE64_SHIFT = 29 ++var SQ_WAVE_STATUS_WAVE64_SIZE = 1 ++var SQ_WAVE_STATUS_NO_VGPRS_SHIFT = 24 ++var SQ_WAVE_STATE_PRIV_ALWAYS_CLEAR_MASK = SQ_WAVE_STATE_PRIV_SYS_PRIO_MASK|SQ_WAVE_STATE_PRIV_POISON_ERR_MASK ++var S_SAVE_PC_HI_TRAP_ID_MASK = 0xF0000000 ++ ++var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12 ++var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE = 9 ++var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE = 8 ++var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT = 12 ++var SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT = 24 ++var SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE = 4 ++var SQ_WAVE_LDS_ALLOC_GRANULARITY = 9 ++ ++var SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK = 0xF ++var SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK = 0x10 ++var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT = 5 ++var SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK = 0x20 ++var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK = 0x40 ++var SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT = 6 ++var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK = 0x80 ++var SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT = 7 ++var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK = 0x100 ++var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT = 8 
++var SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK = 0x200 ++var SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK = 0x800 ++var SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK = 0x80 ++var SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK = 0x200 ++ ++var SQ_WAVE_EXCP_FLAG_PRIV_NON_MASKABLE_EXCP_MASK= SQ_WAVE_EXCP_FLAG_PRIV_MEM_VIOL_MASK |\ ++ SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_MASK |\ ++ SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK |\ ++ SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_MASK |\ ++ SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_MASK |\ ++ SQ_WAVE_EXCP_FLAG_PRIV_TRAP_AFTER_INST_MASK ++var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_1_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT ++var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT ++var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SIZE = SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_ILLEGAL_INST_SHIFT ++var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT = SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT ++var SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SIZE = 32 - SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT ++var BARRIER_STATE_SIGNAL_OFFSET = 16 ++var BARRIER_STATE_VALID_OFFSET = 0 ++ ++var TTMP11_DEBUG_TRAP_ENABLED_SHIFT = 23 ++var TTMP11_DEBUG_TRAP_ENABLED_MASK = 0x800000 ++ ++// SQ_SEL_X/Y/Z/W, BUF_NUM_FORMAT_FLOAT, (0 for MUBUF stride[17:14] ++// when ADD_TID_ENABLE and BUF_DATA_FORMAT_32 for MTBUF), ADD_TID_ENABLE ++var S_SAVE_BUF_RSRC_WORD1_STRIDE = 0x00040000 ++var S_SAVE_BUF_RSRC_WORD3_MISC = 0x10807FAC ++var S_SAVE_SPI_INIT_FIRST_WAVE_MASK = 0x04000000 ++var S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT = 26 ++ ++var S_SAVE_PC_HI_FIRST_WAVE_MASK = 0x80000000 ++var S_SAVE_PC_HI_FIRST_WAVE_SHIFT = 31 ++ ++var s_sgpr_save_num = 108 ++ ++var s_save_spi_init_lo = exec_lo ++var s_save_spi_init_hi = exec_hi ++var s_save_pc_lo = ttmp0 ++var s_save_pc_hi = ttmp1 ++var s_save_exec_lo = ttmp2 ++var s_save_exec_hi = ttmp3 ++var s_save_state_priv = ttmp12 ++var s_save_excp_flag_priv = ttmp15 ++var s_save_xnack_mask = s_save_excp_flag_priv ++var s_wave_size = 
ttmp7 ++var s_save_buf_rsrc0 = ttmp8 ++var s_save_buf_rsrc1 = ttmp9 ++var s_save_buf_rsrc2 = ttmp10 ++var s_save_buf_rsrc3 = ttmp11 ++var s_save_mem_offset = ttmp4 ++var s_save_alloc_size = s_save_excp_flag_priv ++var s_save_tmp = ttmp14 ++var s_save_m0 = ttmp5 ++var s_save_ttmps_lo = s_save_tmp ++var s_save_ttmps_hi = s_save_excp_flag_priv ++ ++var S_RESTORE_BUF_RSRC_WORD1_STRIDE = S_SAVE_BUF_RSRC_WORD1_STRIDE ++var S_RESTORE_BUF_RSRC_WORD3_MISC = S_SAVE_BUF_RSRC_WORD3_MISC ++ ++var S_RESTORE_SPI_INIT_FIRST_WAVE_MASK = 0x04000000 ++var S_RESTORE_SPI_INIT_FIRST_WAVE_SHIFT = 26 ++var S_WAVE_SIZE = 25 ++ ++var s_restore_spi_init_lo = exec_lo ++var s_restore_spi_init_hi = exec_hi ++var s_restore_mem_offset = ttmp12 ++var s_restore_alloc_size = ttmp3 ++var s_restore_tmp = ttmp2 ++var s_restore_mem_offset_save = s_restore_tmp ++var s_restore_m0 = s_restore_alloc_size ++var s_restore_mode = ttmp7 ++var s_restore_flat_scratch = s_restore_tmp ++var s_restore_pc_lo = ttmp0 ++var s_restore_pc_hi = ttmp1 ++var s_restore_exec_lo = ttmp4 ++var s_restore_exec_hi = ttmp5 ++var s_restore_state_priv = ttmp14 ++var s_restore_excp_flag_priv = ttmp15 ++var s_restore_xnack_mask = ttmp13 ++var s_restore_buf_rsrc0 = ttmp8 ++var s_restore_buf_rsrc1 = ttmp9 ++var s_restore_buf_rsrc2 = ttmp10 ++var s_restore_buf_rsrc3 = ttmp11 ++var s_restore_size = ttmp6 ++var s_restore_ttmps_lo = s_restore_tmp ++var s_restore_ttmps_hi = s_restore_alloc_size ++var s_restore_spi_init_hi_save = s_restore_exec_hi ++ ++shader main ++ asic(DEFAULT) ++ type(CS) ++ wave_size(32) ++ ++ s_branch L_SKIP_RESTORE //NOT restore. might be a regular trap or save ++ ++L_JUMP_TO_RESTORE: ++ s_branch L_RESTORE ++ ++L_SKIP_RESTORE: ++ s_getreg_b32 s_save_state_priv, hwreg(HW_REG_WAVE_STATE_PRIV) //save STATUS since we will change SCC ++ ++ // Clear SPI_PRIO: do not save with elevated priority. ++ // Clear ECC_ERR: prevents SQC store and triggers FATAL_HALT if setreg'd. 
++ s_andn2_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_ALWAYS_CLEAR_MASK ++ ++ s_getreg_b32 s_save_excp_flag_priv, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) ++ ++ s_and_b32 ttmp2, s_save_state_priv, SQ_WAVE_STATE_PRIV_HALT_MASK ++ s_cbranch_scc0 L_NOT_HALTED ++ ++L_HALTED: ++ // Host trap may occur while wave is halted. ++ s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK ++ s_cbranch_scc1 L_FETCH_2ND_TRAP ++ ++L_CHECK_SAVE: ++ s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK ++ s_cbranch_scc1 L_SAVE ++ ++ // Wave is halted but neither host trap nor SAVECTX is raised. ++ // Caused by instruction fetch memory violation. ++ // Spin wait until context saved to prevent interrupt storm. ++ s_sleep 0x10 ++ s_getreg_b32 s_save_excp_flag_priv, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) ++ s_branch L_CHECK_SAVE ++ ++L_NOT_HALTED: ++ // Let second-level handle non-SAVECTX exception or trap. ++ // Any concurrent SAVECTX will be handled upon re-entry once halted. ++ ++ // Check non-maskable exceptions. memory_violation, illegal_instruction ++ // and xnack_error exceptions always cause the wave to enter the trap ++ // handler. ++ s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_NON_MASKABLE_EXCP_MASK ++ s_cbranch_scc1 L_FETCH_2ND_TRAP ++ ++ // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi. ++ // Maskable exceptions only cause the wave to enter the trap handler if ++ // their respective bit in mode.excp_en is set. 
++ s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) ++ s_and_b32 ttmp3, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK ++ s_cbranch_scc0 L_NOT_ADDR_WATCH ++ s_or_b32 ttmp2, ttmp2, SQ_WAVE_TRAP_CTRL_ADDR_WATCH_MASK ++ ++L_NOT_ADDR_WATCH: ++ s_getreg_b32 ttmp3, hwreg(HW_REG_WAVE_TRAP_CTRL) ++ s_and_b32 ttmp2, ttmp3, ttmp2 ++ s_cbranch_scc1 L_FETCH_2ND_TRAP ++ ++L_CHECK_TRAP_ID: ++ // Check trap_id != 0 ++ s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK ++ s_cbranch_scc1 L_FETCH_2ND_TRAP ++ ++#if SINGLE_STEP_MISSED_WORKAROUND ++ // Prioritize single step exception over context save. ++ // Second-level trap will halt wave and RFE, re-entering for SAVECTX. ++ // WAVE_TRAP_CTRL is already in ttmp3. ++ s_and_b32 ttmp3, ttmp3, SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_MASK ++ s_cbranch_scc1 L_FETCH_2ND_TRAP ++#endif ++ ++ s_and_b32 ttmp2, s_save_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_MASK ++ s_cbranch_scc1 L_SAVE ++ ++L_FETCH_2ND_TRAP: ++ // Read second-level TBA/TMA from first-level TMA and jump if available. 
++ // ttmp[2:5] and ttmp12 can be used (others hold SPI-initialized debug data) ++ // ttmp12 holds SQ_WAVE_STATUS ++ s_sendmsg_rtn_b64 [ttmp14, ttmp15], sendmsg(MSG_RTN_GET_TMA) ++ s_wait_idle ++ s_lshl_b64 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 ++ ++ s_bitcmp1_b32 ttmp15, 0xF ++ s_cbranch_scc0 L_NO_SIGN_EXTEND_TMA ++ s_or_b32 ttmp15, ttmp15, 0xFFFF0000 ++L_NO_SIGN_EXTEND_TMA: ++ ++ s_load_dword ttmp2, [ttmp14, ttmp15], 0x10 scope:SCOPE_SYS // debug trap enabled flag ++ s_wait_idle ++ s_lshl_b32 ttmp2, ttmp2, TTMP11_DEBUG_TRAP_ENABLED_SHIFT ++ s_andn2_b32 ttmp11, ttmp11, TTMP11_DEBUG_TRAP_ENABLED_MASK ++ s_or_b32 ttmp11, ttmp11, ttmp2 ++ ++ s_load_dwordx2 [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 scope:SCOPE_SYS // second-level TBA ++ s_wait_idle ++ s_load_dwordx2 [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 scope:SCOPE_SYS // second-level TMA ++ s_wait_idle ++ ++ s_and_b64 [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3] ++ s_cbranch_scc0 L_NO_NEXT_TRAP // second-level trap handler not been set ++ s_setpc_b64 [ttmp2, ttmp3] // jump to second-level trap handler ++ ++L_NO_NEXT_TRAP: ++ // If not caused by trap then halt wave to prevent re-entry. ++ s_and_b32 ttmp2, s_save_pc_hi, S_SAVE_PC_HI_TRAP_ID_MASK ++ s_cbranch_scc1 L_TRAP_CASE ++ ++ // Host trap will not cause trap re-entry. ++ s_getreg_b32 ttmp2, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) ++ s_and_b32 ttmp2, ttmp2, SQ_WAVE_EXCP_FLAG_PRIV_HOST_TRAP_MASK ++ s_cbranch_scc1 L_EXIT_TRAP ++ s_or_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_HALT_MASK ++ ++ // If the PC points to S_ENDPGM then context save will fail if STATE_PRIV.HALT is set. ++ // Rewind the PC to prevent this from occurring. ++ s_sub_u32 ttmp0, ttmp0, 0x8 ++ s_subb_u32 ttmp1, ttmp1, 0x0 ++ ++ s_branch L_EXIT_TRAP ++ ++L_TRAP_CASE: ++ // Advance past trap instruction to prevent re-entry. ++ s_add_u32 ttmp0, ttmp0, 0x4 ++ s_addc_u32 ttmp1, ttmp1, 0x0 ++ ++L_EXIT_TRAP: ++ s_and_b32 ttmp1, ttmp1, 0xFFFF ++ ++ // Restore SQ_WAVE_STATUS. 
++ s_and_b64 exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32 ++ s_and_b64 vcc, vcc, vcc // Restore STATUS.VCCZ, not writable by s_setreg_b32 ++ ++ // STATE_PRIV.BARRIER_COMPLETE may have changed since we read it. ++ // Only restore fields which the trap handler changes. ++ s_lshr_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_SCC_SHIFT ++ s_setreg_b32 hwreg(HW_REG_WAVE_STATE_PRIV, SQ_WAVE_STATE_PRIV_SCC_SHIFT, \ ++ SQ_WAVE_STATE_PRIV_POISON_ERR_SHIFT - SQ_WAVE_STATE_PRIV_SCC_SHIFT + 1), s_save_state_priv ++ ++ s_rfe_b64 [ttmp0, ttmp1] ++ ++L_SAVE: ++ // If VGPRs have been deallocated then terminate the wavefront. ++ // It has no remaining program to run and cannot save without VGPRs. ++ s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) ++ s_bitcmp1_b32 s_save_tmp, SQ_WAVE_STATUS_NO_VGPRS_SHIFT ++ s_cbranch_scc0 L_HAVE_VGPRS ++ s_endpgm ++L_HAVE_VGPRS: ++ ++ s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff //pc[47:32] ++ s_mov_b32 s_save_tmp, 0 ++ s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT, 1), s_save_tmp //clear saveCtx bit ++ ++ /* inform SPI the readiness and wait for SPI's go signal */ ++ s_mov_b32 s_save_exec_lo, exec_lo //save EXEC and use EXEC for the go signal from SPI ++ s_mov_b32 s_save_exec_hi, exec_hi ++ s_mov_b64 exec, 0x0 //clear EXEC to get ready to receive ++ ++ s_sendmsg_rtn_b64 [exec_lo, exec_hi], sendmsg(MSG_RTN_SAVE_WAVE) ++ s_wait_idle ++ ++ // Save first_wave flag so we can clear high bits of save address. ++ s_and_b32 s_save_tmp, s_save_spi_init_hi, S_SAVE_SPI_INIT_FIRST_WAVE_MASK ++ s_lshl_b32 s_save_tmp, s_save_tmp, (S_SAVE_PC_HI_FIRST_WAVE_SHIFT - S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT) ++ s_or_b32 s_save_pc_hi, s_save_pc_hi, s_save_tmp ++ ++ // Trap temporaries must be saved via VGPR but all VGPRs are in use. ++ // There is no ttmp space to hold the resource constant for VGPR save. ++ // Save v0 by itself since it requires only two SGPRs. 
++ s_mov_b32 s_save_ttmps_lo, exec_lo ++ s_and_b32 s_save_ttmps_hi, exec_hi, 0xFFFF ++ s_mov_b32 exec_lo, 0xFFFFFFFF ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++ global_store_dword_addtid v0, [s_save_ttmps_lo, s_save_ttmps_hi] scope:SCOPE_SYS ++ v_mov_b32 v0, 0x0 ++ s_mov_b32 exec_lo, s_save_ttmps_lo ++ s_mov_b32 exec_hi, s_save_ttmps_hi ++ ++ // Save trap temporaries 4-11, 13 initialized by SPI debug dispatch logic ++ // ttmp SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR)+0x40 ++ get_wave_size2(s_save_ttmps_hi) ++ get_vgpr_size_bytes(s_save_ttmps_lo, s_save_ttmps_hi) ++ get_svgpr_size_bytes(s_save_ttmps_hi) ++ s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, s_save_ttmps_hi ++ s_and_b32 s_save_ttmps_hi, s_save_spi_init_hi, 0xFFFF ++ s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, get_sgpr_size_bytes() ++ s_add_u32 s_save_ttmps_lo, s_save_ttmps_lo, s_save_spi_init_lo ++ s_addc_u32 s_save_ttmps_hi, s_save_ttmps_hi, 0x0 ++ ++ v_writelane_b32 v0, ttmp4, 0x4 ++ v_writelane_b32 v0, ttmp5, 0x5 ++ v_writelane_b32 v0, ttmp6, 0x6 ++ v_writelane_b32 v0, ttmp7, 0x7 ++ v_writelane_b32 v0, ttmp8, 0x8 ++ v_writelane_b32 v0, ttmp9, 0x9 ++ v_writelane_b32 v0, ttmp10, 0xA ++ v_writelane_b32 v0, ttmp11, 0xB ++ v_writelane_b32 v0, ttmp13, 0xD ++ v_writelane_b32 v0, exec_lo, 0xE ++ v_writelane_b32 v0, exec_hi, 0xF ++ ++ s_mov_b32 exec_lo, 0x3FFF ++ s_mov_b32 exec_hi, 0x0 ++ global_store_dword_addtid v0, [s_save_ttmps_lo, s_save_ttmps_hi] offset:0x40 scope:SCOPE_SYS ++ v_readlane_b32 ttmp14, v0, 0xE ++ v_readlane_b32 ttmp15, v0, 0xF ++ s_mov_b32 exec_lo, ttmp14 ++ s_mov_b32 exec_hi, ttmp15 ++ ++ /* setup Resource Constants */ ++ s_mov_b32 s_save_buf_rsrc0, s_save_spi_init_lo //base_addr_lo ++ s_and_b32 s_save_buf_rsrc1, s_save_spi_init_hi, 0x0000FFFF //base_addr_hi ++ s_or_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, S_SAVE_BUF_RSRC_WORD1_STRIDE ++ s_mov_b32 s_save_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) although not necessarily initialized ++ s_mov_b32 s_save_buf_rsrc3, 
S_SAVE_BUF_RSRC_WORD3_MISC ++ ++ s_mov_b32 s_save_m0, m0 ++ ++ /* global mem offset */ ++ s_mov_b32 s_save_mem_offset, 0x0 ++ get_wave_size2(s_wave_size) ++ ++ /* save first 4 VGPRs, needed for SGPR save */ ++ s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on ++ s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_ENABLE_SAVE_4VGPR_EXEC_HI ++ s_mov_b32 exec_hi, 0x00000000 ++ s_branch L_SAVE_4VGPR_WAVE32 ++L_ENABLE_SAVE_4VGPR_EXEC_HI: ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++ s_branch L_SAVE_4VGPR_WAVE64 ++L_SAVE_4VGPR_WAVE32: ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR Allocated in 4-GPR granularity ++ ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128 ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*2 ++ buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*3 ++ s_branch L_SAVE_HWREG ++ ++L_SAVE_4VGPR_WAVE64: ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR Allocated in 4-GPR granularity ++ ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256 ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*2 ++ buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*3 ++ ++ /* save HW registers */ ++ ++L_SAVE_HWREG: ++ // HWREG SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR) ++ get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) ++ get_svgpr_size_bytes(s_save_tmp) ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, get_sgpr_size_bytes() ++ ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ v_mov_b32 v0, 0x0 //Offset[31:0] from buffer resource ++ v_mov_b32 v1, 0x0 //Offset[63:32] from buffer resource ++ v_mov_b32 v2, 0x0 //Set of SGPRs for TCP 
store ++ s_mov_b32 m0, 0x0 //Next lane of v2 to write to ++ ++ // Ensure no further changes to barrier or LDS state. ++ // STATE_PRIV.BARRIER_COMPLETE may change up to this point. ++ s_barrier_signal -2 ++ s_barrier_wait -2 ++ ++ // Re-read final state of BARRIER_COMPLETE field for save. ++ s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATE_PRIV) ++ s_and_b32 s_save_tmp, s_save_tmp, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK ++ s_andn2_b32 s_save_state_priv, s_save_state_priv, SQ_WAVE_STATE_PRIV_BARRIER_COMPLETE_MASK ++ s_or_b32 s_save_state_priv, s_save_state_priv, s_save_tmp ++ ++ write_hwreg_to_v2(s_save_m0) ++ write_hwreg_to_v2(s_save_pc_lo) ++ s_andn2_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK ++ write_hwreg_to_v2(s_save_tmp) ++ write_hwreg_to_v2(s_save_exec_lo) ++ write_hwreg_to_v2(s_save_exec_hi) ++ write_hwreg_to_v2(s_save_state_priv) ++ ++ s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV) ++ write_hwreg_to_v2(s_save_tmp) ++ ++ write_hwreg_to_v2(s_save_xnack_mask) ++ ++ s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_MODE) ++ write_hwreg_to_v2(s_save_m0) ++ ++ s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_SCRATCH_BASE_LO) ++ write_hwreg_to_v2(s_save_m0) ++ ++ s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_SCRATCH_BASE_HI) ++ write_hwreg_to_v2(s_save_m0) ++ ++ s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_EXCP_FLAG_USER) ++ write_hwreg_to_v2(s_save_m0) ++ ++ s_getreg_b32 s_save_m0, hwreg(HW_REG_WAVE_TRAP_CTRL) ++ write_hwreg_to_v2(s_save_m0) ++ ++ s_getreg_b32 s_save_tmp, hwreg(HW_REG_WAVE_STATUS) ++ write_hwreg_to_v2(s_save_tmp) ++ ++ s_get_barrier_state s_save_tmp, -1 ++ s_wait_kmcnt (0) ++ write_hwreg_to_v2(s_save_tmp) ++ ++ // Write HWREGs with 16 VGPR lanes. TTMPs occupy space after this. ++ s_mov_b32 exec_lo, 0xFFFF ++ s_mov_b32 exec_hi, 0x0 ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ ++ // Write SGPRs with 32 VGPR lanes. This works in wave32 and wave64 mode. 
++ s_mov_b32 exec_lo, 0xFFFFFFFF ++ ++ /* save SGPRs */ ++ // Save SGPR before LDS save, then the s0 to s4 can be used during LDS save... ++ ++ // SGPR SR memory offset : size(VGPR)+size(SVGPR) ++ get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) ++ get_svgpr_size_bytes(s_save_tmp) ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ s_mov_b32 ttmp13, 0x0 //next VGPR lane to copy SGPR into ++ ++ s_mov_b32 m0, 0x0 //SGPR initial index value =0 ++ s_nop 0x0 //Manually inserted wait states ++L_SAVE_SGPR_LOOP: ++ // SGPR is allocated in 16 SGPR granularity ++ s_movrels_b64 s0, s0 //s0 = s[0+m0], s1 = s[1+m0] ++ s_movrels_b64 s2, s2 //s2 = s[2+m0], s3 = s[3+m0] ++ s_movrels_b64 s4, s4 //s4 = s[4+m0], s5 = s[5+m0] ++ s_movrels_b64 s6, s6 //s6 = s[6+m0], s7 = s[7+m0] ++ s_movrels_b64 s8, s8 //s8 = s[8+m0], s9 = s[9+m0] ++ s_movrels_b64 s10, s10 //s10 = s[10+m0], s11 = s[11+m0] ++ s_movrels_b64 s12, s12 //s12 = s[12+m0], s13 = s[13+m0] ++ s_movrels_b64 s14, s14 //s14 = s[14+m0], s15 = s[15+m0] ++ ++ write_16sgpr_to_v2(s0) ++ ++ s_cmp_eq_u32 ttmp13, 0x20 //have 32 VGPR lanes filled? ++ s_cbranch_scc0 L_SAVE_SGPR_SKIP_TCP_STORE ++ ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, 0x80 ++ s_mov_b32 ttmp13, 0x0 ++ v_mov_b32 v2, 0x0 ++L_SAVE_SGPR_SKIP_TCP_STORE: ++ ++ s_add_u32 m0, m0, 16 //next sgpr index ++ s_cmp_lt_u32 m0, 96 //scc = (m0 < first 96 SGPR) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_SGPR_LOOP //first 96 SGPR save is complete? 
++ ++ //save the remaining 12 SGPRs ++ s_movrels_b64 s0, s0 //s0 = s[0+m0], s1 = s[1+m0] ++ s_movrels_b64 s2, s2 //s2 = s[2+m0], s3 = s[3+m0] ++ s_movrels_b64 s4, s4 //s4 = s[4+m0], s5 = s[5+m0] ++ s_movrels_b64 s6, s6 //s6 = s[6+m0], s7 = s[7+m0] ++ s_movrels_b64 s8, s8 //s8 = s[8+m0], s9 = s[9+m0] ++ s_movrels_b64 s10, s10 //s10 = s[10+m0], s11 = s[11+m0] ++ write_12sgpr_to_v2(s0) ++ ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ ++ /* save LDS */ ++ ++L_SAVE_LDS: ++ // Change EXEC to all threads... ++ s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on ++ s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_ENABLE_SAVE_LDS_EXEC_HI ++ s_mov_b32 exec_hi, 0x00000000 ++ s_branch L_SAVE_LDS_NORMAL ++L_ENABLE_SAVE_LDS_EXEC_HI: ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++L_SAVE_LDS_NORMAL: ++ s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE) ++ s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //lds_size is zero? ++ s_cbranch_scc0 L_SAVE_LDS_DONE //no lds used? 
jump to L_SAVE_LDS_DONE ++ ++ s_and_b32 s_save_tmp, s_save_pc_hi, S_SAVE_PC_HI_FIRST_WAVE_MASK ++ s_cbranch_scc0 L_SAVE_LDS_DONE ++ ++ // only the first wave does the LDS save ++ ++ s_lshl_b32 s_save_alloc_size, s_save_alloc_size, SQ_WAVE_LDS_ALLOC_GRANULARITY ++ s_mov_b32 s_save_buf_rsrc2, s_save_alloc_size //NUM_RECORDS in bytes ++ ++ // LDS at offset: size(VGPR)+size(SVGPR)+SIZE(SGPR)+SIZE(HWREG) ++ // ++ get_vgpr_size_bytes(s_save_mem_offset, s_wave_size) ++ get_svgpr_size_bytes(s_save_tmp) ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, get_sgpr_size_bytes() ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, get_hwreg_size_bytes() ++ ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ //load 0~63*4 (byte address) into vgpr v0 ++ v_mbcnt_lo_u32_b32 v0, -1, 0 ++ v_mbcnt_hi_u32_b32 v0, -1, v0 ++ v_mul_u32_u24 v0, 4, v0 ++ ++ s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_mov_b32 m0, 0x0 ++ s_cbranch_scc1 L_SAVE_LDS_W64 ++ ++L_SAVE_LDS_W32: ++ s_mov_b32 s3, 128 ++ s_nop 0 ++ s_nop 0 ++ s_nop 0 ++L_SAVE_LDS_LOOP_W32: ++ ds_read_b32 v1, v0 ++ s_wait_idle ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ ++ s_add_u32 m0, m0, s3 //every buffer_store_lds does 128 bytes ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, s3 ++ v_add_nc_u32 v0, v0, 128 //mem offset increased by 128 bytes ++ s_cmp_lt_u32 m0, s_save_alloc_size //scc=(m0 < s_save_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_LDS_LOOP_W32 //LDS save is complete? 
++ ++ s_branch L_SAVE_LDS_DONE ++ ++L_SAVE_LDS_W64: ++ s_mov_b32 s3, 256 ++ s_nop 0 ++ s_nop 0 ++ s_nop 0 ++L_SAVE_LDS_LOOP_W64: ++ ds_read_b32 v1, v0 ++ s_wait_idle ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ ++ s_add_u32 m0, m0, s3 //every buffer_store_lds does 256 bytes ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, s3 ++ v_add_nc_u32 v0, v0, 256 //mem offset increased by 256 bytes ++ s_cmp_lt_u32 m0, s_save_alloc_size //scc=(m0 < s_save_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_LDS_LOOP_W64 //LDS save is complete? ++ ++L_SAVE_LDS_DONE: ++ /* save VGPRs - save the remaining VGPRs */ ++L_SAVE_VGPR: ++ // VGPR SR memory offset: 0 ++ s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on ++ s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_ENABLE_SAVE_VGPR_EXEC_HI ++ s_mov_b32 s_save_mem_offset, (0+128*4) // for the remaining VGPRs ++ s_mov_b32 exec_hi, 0x00000000 ++ s_branch L_SAVE_VGPR_NORMAL ++L_ENABLE_SAVE_VGPR_EXEC_HI: ++ s_mov_b32 s_save_mem_offset, (0+256*4) // for the remaining VGPRs ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++L_SAVE_VGPR_NORMAL: ++ s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE) ++ s_add_u32 s_save_alloc_size, s_save_alloc_size, 1 ++ s_lshl_b32 s_save_alloc_size, s_save_alloc_size, 2 //Number of VGPRs = (vgpr_size + 1) * 4 (non-zero value) ++ //determine whether it is wave32 or wave64 ++ s_lshr_b32 m0, s_wave_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_SAVE_VGPR_WAVE64 ++ ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR Allocated in 4-GPR granularity ++ ++ // VGPR store using dw burst ++ s_mov_b32 m0, 0x4 //VGPR initial index value =4 ++ s_cmp_lt_u32 m0, s_save_alloc_size ++ s_cbranch_scc0 L_SAVE_VGPR_END ++ ++L_SAVE_VGPR_W32_LOOP: ++ v_movrels_b32 v0, v0 //v0 = v[0+m0] ++ v_movrels_b32 v1, v1 //v1 = v[1+m0] ++ 
v_movrels_b32 v2, v2 //v2 = v[2+m0] ++ v_movrels_b32 v3, v3 //v3 = v[3+m0] ++ ++ buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128 ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*2 ++ buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:128*3 ++ ++ s_add_u32 m0, m0, 4 //next vgpr index ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, 128*4 //every buffer_store_dword does 128 bytes ++ s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_VGPR_W32_LOOP //VGPR save is complete? ++ ++ s_branch L_SAVE_VGPR_END ++ ++L_SAVE_VGPR_WAVE64: ++ s_mov_b32 s_save_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR store using dw burst ++ s_mov_b32 m0, 0x4 //VGPR initial index value =4 ++ s_cmp_lt_u32 m0, s_save_alloc_size ++ s_cbranch_scc0 L_SAVE_SHARED_VGPR ++ ++L_SAVE_VGPR_W64_LOOP: ++ v_movrels_b32 v0, v0 //v0 = v[0+m0] ++ v_movrels_b32 v1, v1 //v1 = v[1+m0] ++ v_movrels_b32 v2, v2 //v2 = v[2+m0] ++ v_movrels_b32 v3, v3 //v3 = v[3+m0] ++ ++ buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256 ++ buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*2 ++ buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS offset:256*3 ++ ++ s_add_u32 m0, m0, 4 //next vgpr index ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, 256*4 //every buffer_store_dword does 256 bytes ++ s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_VGPR_W64_LOOP //VGPR save is complete? 
++ ++L_SAVE_SHARED_VGPR: ++ s_getreg_b32 s_save_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE) ++ s_and_b32 s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF //shared_vgpr_size is zero? ++ s_cbranch_scc0 L_SAVE_VGPR_END //no shared_vgpr used? jump to L_SAVE_VGPR_END ++ s_lshl_b32 s_save_alloc_size, s_save_alloc_size, 3 //Number of SHARED_VGPRs = shared_vgpr_size * 8 (non-zero value) ++ //m0 now holds the normal vgpr count; add the shared_vgpr count to it to get the total count. ++ //shared_vgpr save will start from the index in m0 ++ s_add_u32 s_save_alloc_size, s_save_alloc_size, m0 ++ s_mov_b32 exec_lo, 0xFFFFFFFF ++ s_mov_b32 exec_hi, 0x00000000 ++ ++L_SAVE_SHARED_VGPR_WAVE64_LOOP: ++ v_movrels_b32 v0, v0 //v0 = v[0+m0] ++ buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset scope:SCOPE_SYS ++ s_add_u32 m0, m0, 1 //next vgpr index ++ s_add_u32 s_save_mem_offset, s_save_mem_offset, 128 ++ s_cmp_lt_u32 m0, s_save_alloc_size //scc = (m0 < s_save_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_SAVE_SHARED_VGPR_WAVE64_LOOP //SHARED_VGPR save is complete? ++ ++L_SAVE_VGPR_END: ++ s_branch L_END_PGM ++ ++L_RESTORE: ++ /* Setup Resource Constants */ ++ s_mov_b32 s_restore_buf_rsrc0, s_restore_spi_init_lo //base_addr_lo ++ s_and_b32 s_restore_buf_rsrc1, s_restore_spi_init_hi, 0x0000FFFF //base_addr_hi ++ s_or_b32 s_restore_buf_rsrc1, s_restore_buf_rsrc1, S_RESTORE_BUF_RSRC_WORD1_STRIDE ++ s_mov_b32 s_restore_buf_rsrc2, 0 //NUM_RECORDS initial value = 0 (in bytes) ++ s_mov_b32 s_restore_buf_rsrc3, S_RESTORE_BUF_RSRC_WORD3_MISC ++ ++ // Save s_restore_spi_init_hi for later use. 
++ s_mov_b32 s_restore_spi_init_hi_save, s_restore_spi_init_hi ++ ++ //determine it is wave32 or wave64 ++ get_wave_size2(s_restore_size) ++ ++ s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK ++ s_cbranch_scc0 L_RESTORE_VGPR ++ ++ /* restore LDS */ ++L_RESTORE_LDS: ++ s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on ++ s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_ENABLE_RESTORE_LDS_EXEC_HI ++ s_mov_b32 exec_hi, 0x00000000 ++ s_branch L_RESTORE_LDS_NORMAL ++L_ENABLE_RESTORE_LDS_EXEC_HI: ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++L_RESTORE_LDS_NORMAL: ++ s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE) ++ s_and_b32 s_restore_alloc_size, s_restore_alloc_size, 0xFFFFFFFF //lds_size is zero? ++ s_cbranch_scc0 L_RESTORE_VGPR //no lds used? jump to L_RESTORE_VGPR ++ s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, SQ_WAVE_LDS_ALLOC_GRANULARITY ++ s_mov_b32 s_restore_buf_rsrc2, s_restore_alloc_size //NUM_RECORDS in bytes ++ ++ // LDS at offset: size(VGPR)+size(SVGPR)+SIZE(SGPR)+SIZE(HWREG) ++ // ++ get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) ++ get_svgpr_size_bytes(s_restore_tmp) ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_hwreg_size_bytes() ++ ++ s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_mov_b32 m0, 0x0 ++ s_cbranch_scc1 L_RESTORE_LDS_LOOP_W64 ++ ++L_RESTORE_LDS_LOOP_W32: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset ++ s_wait_idle ++ ds_store_addtid_b32 v0 ++ s_add_u32 m0, m0, 128 // 128 DW ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128 //mem offset 
increased by 128DW ++ s_cmp_lt_u32 m0, s_restore_alloc_size //scc=(m0 < s_restore_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_RESTORE_LDS_LOOP_W32 //LDS restore is complete? ++ s_branch L_RESTORE_VGPR ++ ++L_RESTORE_LDS_LOOP_W64: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset ++ s_wait_idle ++ ds_store_addtid_b32 v0 ++ s_add_u32 m0, m0, 256 // 256 DW ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256 //mem offset increased by 256DW ++ s_cmp_lt_u32 m0, s_restore_alloc_size //scc=(m0 < s_restore_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_RESTORE_LDS_LOOP_W64 //LDS restore is complete? ++ ++ /* restore VGPRs */ ++L_RESTORE_VGPR: ++ // VGPR SR memory offset : 0 ++ s_mov_b32 s_restore_mem_offset, 0x0 ++ s_mov_b32 exec_lo, 0xFFFFFFFF //need every thread from now on ++ s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_ENABLE_RESTORE_VGPR_EXEC_HI ++ s_mov_b32 exec_hi, 0x00000000 ++ s_branch L_RESTORE_VGPR_NORMAL ++L_ENABLE_RESTORE_VGPR_EXEC_HI: ++ s_mov_b32 exec_hi, 0xFFFFFFFF ++L_RESTORE_VGPR_NORMAL: ++ s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE) ++ s_add_u32 s_restore_alloc_size, s_restore_alloc_size, 1 ++ s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, 2 //Number of VGPRs = (vgpr_size + 1) * 4 (non-zero value) ++ //determine it is wave32 or wave64 ++ s_lshr_b32 m0, s_restore_size, S_WAVE_SIZE ++ s_and_b32 m0, m0, 1 ++ s_cmp_eq_u32 m0, 1 ++ s_cbranch_scc1 L_RESTORE_VGPR_WAVE64 ++ ++ s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR load using dw burst ++ s_mov_b32 s_restore_mem_offset_save, s_restore_mem_offset // restore start with v1, v0 will be the last ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128*4 ++ s_mov_b32 m0, 4 //VGPR initial index value = 4 ++ s_cmp_lt_u32 m0, s_restore_alloc_size ++ s_cbranch_scc0 L_RESTORE_SGPR ++ 
++L_RESTORE_VGPR_WAVE32_LOOP: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS ++ buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128 ++ buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128*2 ++ buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:128*3 ++ s_wait_idle ++ v_movreld_b32 v0, v0 //v[0+m0] = v0 ++ v_movreld_b32 v1, v1 ++ v_movreld_b32 v2, v2 ++ v_movreld_b32 v3, v3 ++ s_add_u32 m0, m0, 4 //next vgpr index ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128*4 //every buffer_load_dword does 128 bytes ++ s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_RESTORE_VGPR_WAVE32_LOOP //VGPR restore (except v0) is complete? ++ ++ /* VGPR restore on v0 */ ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS ++ buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128 ++ buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128*2 ++ buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:128*3 ++ s_wait_idle ++ ++ s_branch L_RESTORE_SGPR ++ ++L_RESTORE_VGPR_WAVE64: ++ s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // VGPR load using dw burst ++ s_mov_b32 s_restore_mem_offset_save, s_restore_mem_offset // restore start with v4, v0 will be the last ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256*4 ++ s_mov_b32 m0, 4 //VGPR initial index value = 4 ++ s_cmp_lt_u32 m0, s_restore_alloc_size ++ s_cbranch_scc0 L_RESTORE_SHARED_VGPR ++ ++L_RESTORE_VGPR_WAVE64_LOOP: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS ++ buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256 ++ 
buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256*2 ++ buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS offset:256*3 ++ s_wait_idle ++ v_movreld_b32 v0, v0 //v[0+m0] = v0 ++ v_movreld_b32 v1, v1 ++ v_movreld_b32 v2, v2 ++ v_movreld_b32 v3, v3 ++ s_add_u32 m0, m0, 4 //next vgpr index ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256*4 //every buffer_load_dword does 256 bytes ++ s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_RESTORE_VGPR_WAVE64_LOOP //VGPR restore (except v0) is complete? ++ ++L_RESTORE_SHARED_VGPR: ++ s_getreg_b32 s_restore_alloc_size, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE) //shared_vgpr_size ++ s_and_b32 s_restore_alloc_size, s_restore_alloc_size, 0xFFFFFFFF //shared_vgpr_size is zero? ++ s_cbranch_scc0 L_RESTORE_V0 //no shared_vgpr used? ++ s_lshl_b32 s_restore_alloc_size, s_restore_alloc_size, 3 //Number of SHARED_VGPRs = shared_vgpr_size * 8 (non-zero value) ++ //m0 now has the value of normal vgpr count, just add the m0 with shared_vgpr count to get the total count. ++ //restore shared_vgpr will start from the index of m0 ++ s_add_u32 s_restore_alloc_size, s_restore_alloc_size, m0 ++ s_mov_b32 exec_lo, 0xFFFFFFFF ++ s_mov_b32 exec_hi, 0x00000000 ++L_RESTORE_SHARED_VGPR_WAVE64_LOOP: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset scope:SCOPE_SYS ++ s_wait_idle ++ v_movreld_b32 v0, v0 //v[0+m0] = v0 ++ s_add_u32 m0, m0, 1 //next vgpr index ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 128 ++ s_cmp_lt_u32 m0, s_restore_alloc_size //scc = (m0 < s_restore_alloc_size) ? 1 : 0 ++ s_cbranch_scc1 L_RESTORE_SHARED_VGPR_WAVE64_LOOP //VGPR restore (except v0) is complete? ++ ++ s_mov_b32 exec_hi, 0xFFFFFFFF //restore back exec_hi before restoring V0!! 
++ ++ /* VGPR restore on v0 */ ++L_RESTORE_V0: ++ buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS ++ buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256 ++ buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256*2 ++ buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save scope:SCOPE_SYS offset:256*3 ++ s_wait_idle ++ ++ /* restore SGPRs */ ++ //will be 2+8+16*6 ++ // SGPR SR memory offset : size(VGPR)+size(SVGPR) ++L_RESTORE_SGPR: ++ get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) ++ get_svgpr_size_bytes(s_restore_tmp) ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() ++ s_sub_u32 s_restore_mem_offset, s_restore_mem_offset, 20*4 //s108~s127 is not saved ++ ++ s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ s_mov_b32 m0, s_sgpr_save_num ++ ++ read_4sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_sub_u32 m0, m0, 4 // Restore from S[0] to S[104] ++ s_nop 0 // hazard SALU M0=> S_MOVREL ++ ++ s_movreld_b64 s0, s0 //s[0+m0] = s0 ++ s_movreld_b64 s2, s2 ++ ++ read_8sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_sub_u32 m0, m0, 8 // Restore from S[0] to S[96] ++ s_nop 0 // hazard SALU M0=> S_MOVREL ++ ++ s_movreld_b64 s0, s0 //s[0+m0] = s0 ++ s_movreld_b64 s2, s2 ++ s_movreld_b64 s4, s4 ++ s_movreld_b64 s6, s6 ++ ++ L_RESTORE_SGPR_LOOP: ++ read_16sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_sub_u32 m0, m0, 16 // Restore from S[n] to S[0] ++ s_nop 0 // hazard SALU M0=> S_MOVREL ++ ++ s_movreld_b64 s0, s0 //s[0+m0] = s0 ++ s_movreld_b64 s2, s2 ++ s_movreld_b64 s4, s4 ++ s_movreld_b64 s6, s6 ++ s_movreld_b64 s8, s8 ++ s_movreld_b64 s10, s10 ++ s_movreld_b64 s12, s12 ++ s_movreld_b64 s14, 
s14 ++ ++ s_cmp_eq_u32 m0, 0 //scc = (m0 == 0) ? 1 : 0 ++ s_cbranch_scc0 L_RESTORE_SGPR_LOOP ++ ++ // s_barrier with STATE_PRIV.TRAP_AFTER_INST=1, STATUS.PRIV=1 incorrectly asserts debug exception. ++ // Clear DEBUG_EN before and restore MODE after the barrier. ++ s_setreg_imm32_b32 hwreg(HW_REG_WAVE_MODE), 0 ++ ++ /* restore HW registers */ ++L_RESTORE_HWREG: ++ // HWREG SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR) ++ get_vgpr_size_bytes(s_restore_mem_offset, s_restore_size) ++ get_svgpr_size_bytes(s_restore_tmp) ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, get_sgpr_size_bytes() ++ ++ s_mov_b32 s_restore_buf_rsrc2, 0x1000000 //NUM_RECORDS in bytes ++ ++ // Restore s_restore_spi_init_hi before the saved value gets clobbered. ++ s_mov_b32 s_restore_spi_init_hi, s_restore_spi_init_hi_save ++ ++ read_hwreg_from_mem(s_restore_m0, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_pc_lo, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_pc_hi, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_exec_lo, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_exec_hi, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_state_priv, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_excp_flag_priv, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_xnack_mask, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_mode, s_restore_buf_rsrc0, s_restore_mem_offset) ++ read_hwreg_from_mem(s_restore_flat_scratch, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_setreg_b32 hwreg(HW_REG_WAVE_SCRATCH_BASE_LO), s_restore_flat_scratch ++ ++ read_hwreg_from_mem(s_restore_flat_scratch, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_setreg_b32 
hwreg(HW_REG_WAVE_SCRATCH_BASE_HI), s_restore_flat_scratch ++ ++ read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ s_setreg_b32 hwreg(HW_REG_WAVE_EXCP_FLAG_USER), s_restore_tmp ++ ++ read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ s_setreg_b32 hwreg(HW_REG_WAVE_TRAP_CTRL), s_restore_tmp ++ ++ // Only the first wave needs to restore the workgroup barrier. ++ s_and_b32 s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK ++ s_cbranch_scc0 L_SKIP_BARRIER_RESTORE ++ ++ // Skip over WAVE_STATUS, since there is no state to restore from it ++ s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 4 ++ ++ read_hwreg_from_mem(s_restore_tmp, s_restore_buf_rsrc0, s_restore_mem_offset) ++ s_wait_idle ++ ++ s_bitcmp1_b32 s_restore_tmp, BARRIER_STATE_VALID_OFFSET ++ s_cbranch_scc0 L_SKIP_BARRIER_RESTORE ++ ++ // extract the saved signal count from s_restore_tmp ++ s_lshr_b32 s_restore_tmp, s_restore_tmp, BARRIER_STATE_SIGNAL_OFFSET ++ ++ // We need to call s_barrier_signal repeatedly to restore the signal ++ // count of the work group barrier. The member count is already ++ // initialized with the number of waves in the work group. ++L_BARRIER_RESTORE_LOOP: ++ s_and_b32 s_restore_tmp, s_restore_tmp, s_restore_tmp ++ s_cbranch_scc0 L_SKIP_BARRIER_RESTORE ++ s_barrier_signal -1 ++ s_add_i32 s_restore_tmp, s_restore_tmp, -1 ++ s_branch L_BARRIER_RESTORE_LOOP ++ ++L_SKIP_BARRIER_RESTORE: ++ ++ s_mov_b32 m0, s_restore_m0 ++ s_mov_b32 exec_lo, s_restore_exec_lo ++ s_mov_b32 exec_hi, s_restore_exec_hi ++ ++ // EXCP_FLAG_PRIV.SAVE_CONTEXT and HOST_TRAP may have changed. ++ // Only restore the other fields to avoid clobbering them. 
++	s_setreg_b32	hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, 0, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_1_SIZE), s_restore_excp_flag_priv
++	s_lshr_b32	s_restore_excp_flag_priv, s_restore_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT
++	s_setreg_b32	hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SIZE), s_restore_excp_flag_priv
++	s_lshr_b32	s_restore_excp_flag_priv, s_restore_excp_flag_priv, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT - SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_2_SHIFT
++	s_setreg_b32	hwreg(HW_REG_WAVE_EXCP_FLAG_PRIV, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SHIFT, SQ_WAVE_EXCP_FLAG_PRIV_RESTORE_PART_3_SIZE), s_restore_excp_flag_priv
++
++	s_setreg_b32	hwreg(HW_REG_WAVE_MODE), s_restore_mode
++
++	// Restore trap temporaries 4-11, 13 initialized by SPI debug dispatch logic
++	// ttmp SR memory offset : size(VGPR)+size(SVGPR)+size(SGPR)+0x40
++	get_vgpr_size_bytes(s_restore_ttmps_lo, s_restore_size)
++	get_svgpr_size_bytes(s_restore_ttmps_hi)
++	s_add_u32	s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_ttmps_hi
++	s_add_u32	s_restore_ttmps_lo, s_restore_ttmps_lo, get_sgpr_size_bytes()
++	s_add_u32	s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_buf_rsrc0
++	s_addc_u32	s_restore_ttmps_hi, s_restore_buf_rsrc1, 0x0
++	s_and_b32	s_restore_ttmps_hi, s_restore_ttmps_hi, 0xFFFF
++	s_load_dwordx4	[ttmp4, ttmp5, ttmp6, ttmp7], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x50 scope:SCOPE_SYS
++	s_load_dwordx4	[ttmp8, ttmp9, ttmp10, ttmp11], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x60 scope:SCOPE_SYS
++	s_load_dword	ttmp13, [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x74 scope:SCOPE_SYS
++	s_wait_idle
++
++	s_and_b32	s_restore_pc_hi, s_restore_pc_hi, 0x0000ffff	//pc[47:32] //Do it here in order not to affect STATUS
++	s_and_b64	exec, exec, exec	// Restore STATUS.EXECZ, not writable by s_setreg_b32
++	s_and_b64	vcc, vcc, vcc	// Restore STATUS.VCCZ, not writable by s_setreg_b32
++
++	s_setreg_b32	hwreg(HW_REG_WAVE_STATE_PRIV), s_restore_state_priv	// SCC is included, which is changed by previous salu
++
++	// Make barrier and LDS state visible to all waves in the group.
++	// STATE_PRIV.BARRIER_COMPLETE may change after this point.
++	s_barrier_signal	-2
++	s_barrier_wait	-2
++
++	s_rfe_b64	s_restore_pc_lo	//Return to the main shader program and resume execution
++
++L_END_PGM:
++	// Make sure that no wave of the workgroup can exit the trap handler
++	// before the workgroup barrier state is saved.
++	s_barrier_signal	-2
++	s_barrier_wait	-2
++	s_endpgm_saved
++end
++
++function write_hwreg_to_v2(s)
++	// Copy into VGPR for later TCP store.
++	v_writelane_b32	v2, s, m0
++	s_add_u32	m0, m0, 0x1
++end
++
++
++function write_16sgpr_to_v2(s)
++	// Copy into VGPR for later TCP store.
++	for var sgpr_idx = 0; sgpr_idx < 16; sgpr_idx ++
++		v_writelane_b32	v2, s[sgpr_idx], ttmp13
++		s_add_u32	ttmp13, ttmp13, 0x1
++	end
++end
++
++function write_12sgpr_to_v2(s)
++	// Copy into VGPR for later TCP store.
++	for var sgpr_idx = 0; sgpr_idx < 12; sgpr_idx ++
++		v_writelane_b32	v2, s[sgpr_idx], ttmp13
++		s_add_u32	ttmp13, ttmp13, 0x1
++	end
++end
++
++function read_hwreg_from_mem(s, s_rsrc, s_mem_offset)
++	s_buffer_load_dword	s, s_rsrc, s_mem_offset scope:SCOPE_SYS
++	s_add_u32	s_mem_offset, s_mem_offset, 4
++end
++
++function read_16sgpr_from_mem(s, s_rsrc, s_mem_offset)
++	s_sub_u32	s_mem_offset, s_mem_offset, 4*16
++	s_buffer_load_dwordx16	s, s_rsrc, s_mem_offset scope:SCOPE_SYS
++end
++
++function read_8sgpr_from_mem(s, s_rsrc, s_mem_offset)
++	s_sub_u32	s_mem_offset, s_mem_offset, 4*8
++	s_buffer_load_dwordx8	s, s_rsrc, s_mem_offset scope:SCOPE_SYS
++end
++
++function read_4sgpr_from_mem(s, s_rsrc, s_mem_offset)
++	s_sub_u32	s_mem_offset, s_mem_offset, 4*4
++	s_buffer_load_dwordx4	s, s_rsrc, s_mem_offset scope:SCOPE_SYS
++end
++
++function get_vgpr_size_bytes(s_vgpr_size_byte, s_size)
++	s_getreg_b32	s_vgpr_size_byte, hwreg(HW_REG_WAVE_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE)
++	s_add_u32	s_vgpr_size_byte, s_vgpr_size_byte, 1
++	s_bitcmp1_b32	s_size, S_WAVE_SIZE
++	s_cbranch_scc1	L_ENABLE_SHIFT_W64
++	s_lshl_b32	s_vgpr_size_byte, s_vgpr_size_byte, (2+7)	//Number of VGPRs = (vgpr_size + 1) * 4 * 32 * 4 (non-zero value)
++	s_branch	L_SHIFT_DONE
++L_ENABLE_SHIFT_W64:
++	s_lshl_b32	s_vgpr_size_byte, s_vgpr_size_byte, (2+8)	//Number of VGPRs = (vgpr_size + 1) * 4 * 64 * 4 (non-zero value)
++L_SHIFT_DONE:
++end
++
++function get_svgpr_size_bytes(s_svgpr_size_byte)
++	s_getreg_b32	s_svgpr_size_byte, hwreg(HW_REG_WAVE_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_VGPR_SHARED_SIZE_SIZE)
++	s_lshl_b32	s_svgpr_size_byte, s_svgpr_size_byte, (3+7)
++end
++
++function get_sgpr_size_bytes
++	return 512
++end
++
++function get_hwreg_size_bytes
++	return 128
++end
++
++function get_wave_size2(s_reg)
++	s_getreg_b32	s_reg, hwreg(HW_REG_WAVE_STATUS,SQ_WAVE_STATUS_WAVE64_SHIFT,SQ_WAVE_STATUS_WAVE64_SIZE)
++	s_lshl_b32	s_reg, s_reg, S_WAVE_SIZE
++end
+
+diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
+index ab1132bc896a32..d9955c5d2e5ed5 100644
+--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
++++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile
+@@ -174,7 +174,7 @@ AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN32)
+ ###############################################################################
+ # DCN35
+ ###############################################################################
+-CLK_MGR_DCN35 = dcn35_smu.o dcn35_clk_mgr.o
++CLK_MGR_DCN35 = dcn35_smu.o dcn351_clk_mgr.o dcn35_clk_mgr.o
+ 
+ AMD_DAL_CLK_MGR_DCN35 = $(addprefix $(AMDDALPATH)/dc/clk_mgr/dcn35/,$(CLK_MGR_DCN35))
+ 
+diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+index 0e243f4344d050..4c3e58c730b11c 100644
+--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
++++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+@@ -355,8 +355,11 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p
+ 			BREAK_TO_DEBUGGER();
+ 			return NULL;
+ 		}
++		if (ctx->dce_version == DCN_VERSION_3_51)
++			dcn351_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
++		else
++			dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
+ 
+-		dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
+ 		return &clk_mgr->base.base;
+ 	}
+ 	break;
+diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c
+new file mode 100644
+index 00000000000000..6a6ae618650b6d
+--- /dev/null
++++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c
+@@ -0,0 +1,140 @@
++/*
++ * Copyright 2024 Advanced Micro Devices, Inc.
++ *
++ * Permission is hereby granted, free of charge, to any person obtaining a
++ * copy of this software and associated documentation files (the "Software"),
++ * to deal in the Software without restriction, including without limitation
++ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
++ * and/or sell copies of the Software, and to permit persons to whom the
++ * Software is furnished to do so, subject to the following conditions:
++ *
++ * The above copyright notice and this permission notice shall be included in
++ * all copies or substantial portions of the Software.
++ *
++ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
++ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
++ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
++ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
++ * OTHER DEALINGS IN THE SOFTWARE.
++ *
++ * Authors: AMD
++ *
++ */
++
++#include "core_types.h"
++#include "dcn35_clk_mgr.h"
++
++#define DCN_BASE__INST0_SEG1 0x000000C0
++#define mmCLK1_CLK_PLL_REQ 0x16E37
++
++#define mmCLK1_CLK0_DFS_CNTL 0x16E69
++#define mmCLK1_CLK1_DFS_CNTL 0x16E6C
++#define mmCLK1_CLK2_DFS_CNTL 0x16E6F
++#define mmCLK1_CLK3_DFS_CNTL 0x16E72
++#define mmCLK1_CLK4_DFS_CNTL 0x16E75
++#define mmCLK1_CLK5_DFS_CNTL 0x16E78
++
++#define mmCLK1_CLK0_CURRENT_CNT 0x16EFC
++#define mmCLK1_CLK1_CURRENT_CNT 0x16EFD
++#define mmCLK1_CLK2_CURRENT_CNT 0x16EFE
++#define mmCLK1_CLK3_CURRENT_CNT 0x16EFF
++#define mmCLK1_CLK4_CURRENT_CNT 0x16F00
++#define mmCLK1_CLK5_CURRENT_CNT 0x16F01
++
++#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A
++#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93
++#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C
++#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5
++#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE
++#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7
++
++#define mmCLK1_CLK0_DS_CNTL 0x16E83
++#define mmCLK1_CLK1_DS_CNTL 0x16E8C
++#define mmCLK1_CLK2_DS_CNTL 0x16E95
++#define mmCLK1_CLK3_DS_CNTL 0x16E9E
++#define mmCLK1_CLK4_DS_CNTL 0x16EA7
++#define mmCLK1_CLK5_DS_CNTL 0x16EB0
++
++#define mmCLK1_CLK0_ALLOW_DS 0x16E84
++#define mmCLK1_CLK1_ALLOW_DS 0x16E8D
++#define mmCLK1_CLK2_ALLOW_DS 0x16E96
++#define mmCLK1_CLK3_ALLOW_DS 0x16E9F
++#define mmCLK1_CLK4_ALLOW_DS 0x16EA8
++#define mmCLK1_CLK5_ALLOW_DS 0x16EB1
++
++#define mmCLK5_spll_field_8 0x1B04B
++#define mmDENTIST_DISPCLK_CNTL 0x0124
++#define regDENTIST_DISPCLK_CNTL 0x0064
++#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1
++
++#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0
++#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc
++#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10
++#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL
++#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L
++#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L
++
++#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L
++
++// DENTIST_DISPCLK_CNTL
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L
++
++#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L
++
++#define REG(reg) \
++	(clk_mgr->regs->reg)
++
++#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg
++
++#define BASE(seg) BASE_INNER(seg)
++
++#define SR(reg_name)\
++	.reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \
++		reg ## reg_name
++
++#define CLK_SR_DCN35(reg_name)\
++	.reg_name = mm ## reg_name
++
++static const struct clk_mgr_registers clk_mgr_regs_dcn351 = {
++	CLK_REG_LIST_DCN35()
++};
++
++static const struct clk_mgr_shift clk_mgr_shift_dcn351 = {
++	CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT)
++};
++
++static const struct clk_mgr_mask clk_mgr_mask_dcn351 = {
++	CLK_COMMON_MASK_SH_LIST_DCN32(_MASK)
++};
++
++#define TO_CLK_MGR_DCN35(clk_mgr)\
++	container_of(clk_mgr, struct clk_mgr_dcn35, base)
++
++
++void dcn351_clk_mgr_construct(
++		struct dc_context *ctx,
++		struct clk_mgr_dcn35 *clk_mgr,
++		struct pp_smu_funcs *pp_smu,
++		struct dccg *dccg)
++{
++	/*register offset changed*/
++	clk_mgr->base.regs = &clk_mgr_regs_dcn351;
++	clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn351;
++	clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn351;
++
++	dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
++
++}
++
++
+diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
+index b77333817f1895..2e435ee363fede 100644
+--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
++++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
+@@ -36,15 +36,11 @@
+ #include "dcn20/dcn20_clk_mgr.h"
+ 
+ 
+-
+-
+ #include "reg_helper.h"
+ #include "core_types.h"
+ #include "dcn35_smu.h"
+ #include "dm_helpers.h"
+ 
+-/* TODO: remove this include once we ported over remaining clk mgr functions*/
+-#include "dcn30/dcn30_clk_mgr.h"
+ #include "dcn31/dcn31_clk_mgr.h"
+ 
+ #include "dc_dmub_srv.h"
+@@ -55,35 +51,102 @@
+ #define DC_LOGGER \
+ 	clk_mgr->base.base.ctx->logger
+ 
++#define DCN_BASE__INST0_SEG1 0x000000C0
++#define mmCLK1_CLK_PLL_REQ 0x16E37
++
++#define mmCLK1_CLK0_DFS_CNTL 0x16E69
++#define mmCLK1_CLK1_DFS_CNTL 0x16E6C
++#define mmCLK1_CLK2_DFS_CNTL 0x16E6F
++#define mmCLK1_CLK3_DFS_CNTL 0x16E72
++#define mmCLK1_CLK4_DFS_CNTL 0x16E75
++#define mmCLK1_CLK5_DFS_CNTL 0x16E78
++
++#define mmCLK1_CLK0_CURRENT_CNT 0x16EFB
++#define mmCLK1_CLK1_CURRENT_CNT 0x16EFC
++#define mmCLK1_CLK2_CURRENT_CNT 0x16EFD
++#define mmCLK1_CLK3_CURRENT_CNT 0x16EFE
++#define mmCLK1_CLK4_CURRENT_CNT 0x16EFF
++#define mmCLK1_CLK5_CURRENT_CNT 0x16F00
++
++#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A
++#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93
++#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C
++#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5
++#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE
++#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7
++
++#define mmCLK1_CLK0_DS_CNTL 0x16E83
++#define mmCLK1_CLK1_DS_CNTL 0x16E8C
++#define mmCLK1_CLK2_DS_CNTL 0x16E95
++#define mmCLK1_CLK3_DS_CNTL 0x16E9E
++#define mmCLK1_CLK4_DS_CNTL 0x16EA7
++#define mmCLK1_CLK5_DS_CNTL 0x16EB0
++
++#define mmCLK1_CLK0_ALLOW_DS 0x16E84
++#define mmCLK1_CLK1_ALLOW_DS 0x16E8D
++#define mmCLK1_CLK2_ALLOW_DS 0x16E96
++#define mmCLK1_CLK3_ALLOW_DS 0x16E9F
++#define mmCLK1_CLK4_ALLOW_DS 0x16EA8
++#define mmCLK1_CLK5_ALLOW_DS 0x16EB1
++
++#define mmCLK5_spll_field_8 0x1B24B
++#define mmDENTIST_DISPCLK_CNTL 0x0124
++#define regDENTIST_DISPCLK_CNTL 0x0064
++#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1
++
++#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0
++#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc
++#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10
++#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL
++#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L
++#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L
++
++#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L
++#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L
++// DENTIST_DISPCLK_CNTL
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L
++#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L
++
++#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L
++
++#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0
++#undef FN
++#define FN(reg_name, field_name) \
++	clk_mgr->clk_mgr_shift->field_name, clk_mgr->clk_mgr_mask->field_name
+ 
+-#define regCLK1_CLK_PLL_REQ 0x0237
+-#define regCLK1_CLK_PLL_REQ_BASE_IDX 0
++#define REG(reg) \
++	(clk_mgr->regs->reg)
+ 
+-#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0
+-#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc
+-#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10
+-#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL
+-#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L
+-#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L
++#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg
+ 
+-#define regCLK1_CLK2_BYPASS_CNTL 0x029c
+-#define regCLK1_CLK2_BYPASS_CNTL_BASE_IDX 0
++#define BASE(seg) BASE_INNER(seg)
+ 
+-#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL__SHIFT 0x0
+-#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV__SHIFT 0x10
+-#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L
+-#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L
++#define SR(reg_name)\
++	.reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \
++		reg ## reg_name
+ 
+-#define regCLK5_0_CLK5_spll_field_8 0x464b
+-#define regCLK5_0_CLK5_spll_field_8_BASE_IDX 0
++#define CLK_SR_DCN35(reg_name)\
++	.reg_name = mm ## reg_name
+ 
+-#define CLK5_0_CLK5_spll_field_8__spll_ssc_en__SHIFT 0xd
+-#define CLK5_0_CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L
++static const struct clk_mgr_registers clk_mgr_regs_dcn35 = {
++	CLK_REG_LIST_DCN35()
++};
+ 
+-#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0
++static const struct clk_mgr_shift clk_mgr_shift_dcn35 = {
++	CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT)
++};
+ 
+-#define REG(reg_name) \
+-	(ctx->clk_reg_offsets[reg ## reg_name ## _BASE_IDX] + reg ## reg_name)
++static const struct clk_mgr_mask clk_mgr_mask_dcn35 = {
++	CLK_COMMON_MASK_SH_LIST_DCN32(_MASK)
++};
+ 
+ #define TO_CLK_MGR_DCN35(clk_mgr)\
+ 	container_of(clk_mgr, struct clk_mgr_dcn35, base)
+@@ -452,7 +515,6 @@ static int get_vco_frequency_from_reg(struct clk_mgr_internal *clk_mgr)
+ 	struct fixed31_32 pll_req;
+ 	unsigned int fbmult_frac_val = 0;
+ 	unsigned int fbmult_int_val = 0;
+-	struct dc_context *ctx = clk_mgr->base.ctx;
+ 
+ 	/*
+ 	 * Register value of fbmult is in 8.16 format, we are converting to 314.32
+@@ -512,12 +574,12 @@ static void dcn35_dump_clk_registers(struct clk_state_registers_and_bypass *regs
+ static bool dcn35_is_spll_ssc_enabled(struct clk_mgr *clk_mgr_base)
+ {
+ 	struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base);
+-	struct dc_context *ctx = clk_mgr->base.ctx;
++
+ 	uint32_t ssc_enable;
+ 
+-	REG_GET(CLK5_0_CLK5_spll_field_8, spll_ssc_en, &ssc_enable);
++	ssc_enable = REG_READ(CLK5_spll_field_8) & CLK5_spll_field_8__spll_ssc_en_MASK;
+ 
+-	return ssc_enable == 1;
++	return ssc_enable != 0;
+ }
+ 
+ static void init_clk_states(struct clk_mgr *clk_mgr)
+@@ -642,10 +704,10 @@ static struct dcn35_ss_info_table ss_info_table = {
+ 
+ static void dcn35_read_ss_info_from_lut(struct clk_mgr_internal *clk_mgr)
+ {
+-	struct dc_context *ctx = clk_mgr->base.ctx;
+-	uint32_t clock_source;
++	uint32_t clock_source = 0;
++
++	clock_source = REG_READ(CLK1_CLK2_BYPASS_CNTL) & CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK;
+ 
+-	REG_GET(CLK1_CLK2_BYPASS_CNTL, CLK2_BYPASS_SEL, &clock_source);
+ 	// If it's DFS mode, clock_source is 0.
+ 	if (dcn35_is_spll_ssc_enabled(&clk_mgr->base) && (clock_source < ARRAY_SIZE(ss_info_table.ss_percentage))) {
+ 		clk_mgr->dprefclk_ss_percentage = ss_info_table.ss_percentage[clock_source];
+@@ -1112,6 +1174,12 @@ void dcn35_clk_mgr_construct(
+ 	clk_mgr->base.dprefclk_ss_divider = 1000;
+ 	clk_mgr->base.ss_on_dprefclk = false;
+ 	clk_mgr->base.dfs_ref_freq_khz = 48000;
++	if (ctx->dce_version == DCN_VERSION_3_5) {
++		clk_mgr->base.regs = &clk_mgr_regs_dcn35;
++		clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn35;
++		clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn35;
++	}
++
+ 
+ 	clk_mgr->smu_wm_set.wm_set = (struct dcn35_watermarks *)dm_helpers_allocate_gpu_mem(
+ 		clk_mgr->base.base.ctx,
+diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h
+index 1203dc605b12c4..a12a9bf90806ed 100644
+--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h
++++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h
+@@ -60,4 +60,8 @@ void dcn35_clk_mgr_construct(struct dc_context *ctx,
+ 
+ void dcn35_clk_mgr_destroy(struct clk_mgr_internal *clk_mgr_int);
+ 
++void dcn351_clk_mgr_construct(struct dc_context *ctx,
++		struct clk_mgr_dcn35 *clk_mgr,
++		struct pp_smu_funcs *pp_smu,
++		struct dccg *dccg);
+ #endif //__DCN35_CLK_MGR_H__
+diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h
+index c2dd061892f4d9..7a1ca1e98059b0 100644
+--- a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h
++++ b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h
+@@ -166,6 +166,41 @@ enum dentist_divider_range {
+ 	CLK_SR_DCN32(CLK1_CLK4_CURRENT_CNT), \
+ 	CLK_SR_DCN32(CLK4_CLK0_CURRENT_CNT)
+ 
++#define CLK_REG_LIST_DCN35() \
++	CLK_SR_DCN35(CLK1_CLK_PLL_REQ), \
++	CLK_SR_DCN35(CLK1_CLK0_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK1_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK2_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK3_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK4_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK5_DFS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK0_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK1_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK2_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK3_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK4_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK5_CURRENT_CNT), \
++	CLK_SR_DCN35(CLK1_CLK0_BYPASS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK1_BYPASS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK2_BYPASS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK3_BYPASS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK4_BYPASS_CNTL),\
++	CLK_SR_DCN35(CLK1_CLK5_BYPASS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK0_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK1_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK2_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK3_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK4_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK5_DS_CNTL), \
++	CLK_SR_DCN35(CLK1_CLK0_ALLOW_DS), \
++	CLK_SR_DCN35(CLK1_CLK1_ALLOW_DS), \
++	CLK_SR_DCN35(CLK1_CLK2_ALLOW_DS), \
++	CLK_SR_DCN35(CLK1_CLK3_ALLOW_DS), \
++	CLK_SR_DCN35(CLK1_CLK4_ALLOW_DS), \
++	CLK_SR_DCN35(CLK1_CLK5_ALLOW_DS), \
++	CLK_SR_DCN35(CLK5_spll_field_8), \
++	SR(DENTIST_DISPCLK_CNTL), \
++
+ #define CLK_COMMON_MASK_SH_LIST_DCN32(mask_sh) \
+ 	CLK_COMMON_MASK_SH_LIST_DCN20_BASE(mask_sh),\
+ 	CLK_SF(CLK1_CLK_PLL_REQ, FbMult_int, mask_sh),\
+@@ -236,6 +271,7 @@ struct clk_mgr_registers {
+ 	uint32_t CLK1_CLK2_DFS_CNTL;
+ 	uint32_t CLK1_CLK3_DFS_CNTL;
+ 	uint32_t CLK1_CLK4_DFS_CNTL;
++	uint32_t CLK1_CLK5_DFS_CNTL;
+ 	uint32_t CLK2_CLK2_DFS_CNTL;
+ 
+ 	uint32_t CLK1_CLK0_CURRENT_CNT;
+@@ -243,11 +279,34 @@ struct clk_mgr_registers {
+ 	uint32_t CLK1_CLK2_CURRENT_CNT;
+ 	uint32_t CLK1_CLK3_CURRENT_CNT;
+ 	uint32_t CLK1_CLK4_CURRENT_CNT;
++	uint32_t CLK1_CLK5_CURRENT_CNT;
+ 
+ 	uint32_t CLK0_CLK0_DFS_CNTL;
+ 	uint32_t CLK0_CLK1_DFS_CNTL;
+ 	uint32_t CLK0_CLK3_DFS_CNTL;
+ 	uint32_t CLK0_CLK4_DFS_CNTL;
++	uint32_t CLK1_CLK0_BYPASS_CNTL;
++	uint32_t CLK1_CLK1_BYPASS_CNTL;
++	uint32_t CLK1_CLK2_BYPASS_CNTL;
++	uint32_t CLK1_CLK3_BYPASS_CNTL;
++	uint32_t CLK1_CLK4_BYPASS_CNTL;
++	uint32_t CLK1_CLK5_BYPASS_CNTL;
++
++	uint32_t CLK1_CLK0_DS_CNTL;
++	uint32_t CLK1_CLK1_DS_CNTL;
++	uint32_t CLK1_CLK2_DS_CNTL;
++	uint32_t CLK1_CLK3_DS_CNTL;
++	uint32_t CLK1_CLK4_DS_CNTL;
++	uint32_t CLK1_CLK5_DS_CNTL;
++
++	uint32_t CLK1_CLK0_ALLOW_DS;
++	uint32_t CLK1_CLK1_ALLOW_DS;
++	uint32_t CLK1_CLK2_ALLOW_DS;
++	uint32_t CLK1_CLK3_ALLOW_DS;
++	uint32_t CLK1_CLK4_ALLOW_DS;
++	uint32_t CLK1_CLK5_ALLOW_DS;
++	uint32_t CLK5_spll_field_8;
++
+ };
+ 
+ struct clk_mgr_shift {
+diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
+index ef2d490965ba20..bcf248f69252c2 100644
+--- a/drivers/gpu/drm/drm_panic_qr.rs
++++ b/drivers/gpu/drm/drm_panic_qr.rs
+@@ -931,7 +931,7 @@ fn draw_all(&mut self, data: impl Iterator) {
+ /// They must remain valid for the duration of the function call.
+ #[no_mangle]
+ pub unsafe extern "C" fn drm_panic_qr_generate(
+-    url: *const i8,
++    url: *const kernel::ffi::c_char,
+     data: *mut u8,
+     data_len: usize,
+     data_size: usize,
+diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c b/drivers/gpu/drm/i915/display/icl_dsi.c
+index 8a49f499e3fb3f..b40f1398f0f822 100644
+--- a/drivers/gpu/drm/i915/display/icl_dsi.c
++++ b/drivers/gpu/drm/i915/display/icl_dsi.c
+@@ -808,8 +808,8 @@ gen11_dsi_configure_transcoder(struct intel_encoder *encoder,
+ 		/* select data lane width */
+ 		tmp = intel_de_read(display,
+ 				    TRANS_DDI_FUNC_CTL(display, dsi_trans));
+-		tmp &= ~DDI_PORT_WIDTH_MASK;
+-		tmp |= DDI_PORT_WIDTH(intel_dsi->lane_count);
++		tmp &= ~TRANS_DDI_PORT_WIDTH_MASK;
++		tmp |= TRANS_DDI_PORT_WIDTH(intel_dsi->lane_count);
+ 
+ 		/* select input pipe */
+ 		tmp &= ~TRANS_DDI_EDP_INPUT_MASK;
+diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
+index 49b5cc01ce40ad..943b57835b3a69 100644
+--- a/drivers/gpu/drm/i915/display/intel_ddi.c
++++ b/drivers/gpu/drm/i915/display/intel_ddi.c
+@@ -3399,7 +3399,7 @@ static void intel_enable_ddi_hdmi(struct intel_atomic_state *state,
+ 		intel_de_rmw(dev_priv, XELPDP_PORT_BUF_CTL1(dev_priv, port),
+ 			     XELPDP_PORT_WIDTH_MASK | XELPDP_PORT_REVERSAL, port_buf);
+ 
+-		buf_ctl |= DDI_PORT_WIDTH(lane_count);
++		buf_ctl |= DDI_PORT_WIDTH(crtc_state->lane_count);
+ 
+ 		if (DISPLAY_VER(dev_priv) >= 20)
+ 			buf_ctl |= XE2LPD_DDI_BUF_D2D_LINK_ENABLE;
+diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
+index 863927f429aa73..9d9fe11dd0557a 100644
+--- a/drivers/gpu/drm/i915/display/intel_display.c
++++ b/drivers/gpu/drm/i915/display/intel_display.c
+@@ -6641,12 +6641,30 @@ static int intel_async_flip_check_hw(struct intel_atomic_state *state, struct in
+ static int intel_joiner_add_affected_crtcs(struct intel_atomic_state *state)
+ {
+ 	struct drm_i915_private *i915 = to_i915(state->base.dev);
++	const struct intel_plane_state *plane_state;
+ 	struct intel_crtc_state *crtc_state;
++	struct intel_plane *plane;
+ 	struct intel_crtc *crtc;
+ 	u8 affected_pipes = 0;
+ 	u8 modeset_pipes = 0;
+ 	int i;
+ 
++	/*
++	 * Any plane which is in use by the joiner needs its crtc.
++	 * Pull those in first as this will not have happened yet
++	 * if the plane remains disabled according to uapi.
++	 */
++	for_each_new_intel_plane_in_state(state, plane, plane_state, i) {
++		crtc = to_intel_crtc(plane_state->hw.crtc);
++		if (!crtc)
++			continue;
++
++		crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
++		if (IS_ERR(crtc_state))
++			return PTR_ERR(crtc_state);
++	}
++
++	/* Now pull in all joined crtcs */
+ 	for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
+ 		affected_pipes |= crtc_state->joiner_pipes;
+ 		if (intel_crtc_needs_modeset(crtc_state))
+diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+index 397cc4ebae526a..bb70ba31efd9d6 100644
+--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
++++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+@@ -1565,7 +1565,7 @@ intel_dp_128b132b_link_train(struct intel_dp *intel_dp,
+ 
+ 	if (wait_for(intel_dp_128b132b_intra_hop(intel_dp, crtc_state) == 0, 500)) {
+ 		lt_err(intel_dp, DP_PHY_DPRX, "128b/132b intra-hop not clear\n");
+-		return false;
++		goto out;
+ 	}
+ 
+ 	if (intel_dp_128b132b_lane_eq(intel_dp, crtc_state) &&
+@@ -1577,6 +1577,19 @@ intel_dp_128b132b_link_train(struct intel_dp *intel_dp,
+ 	       passed ? "passed" : "failed",
+ 	       crtc_state->port_clock, crtc_state->lane_count);
+ 
++out:
++	/*
++	 * Ensure that the training pattern does get set to TPS2 even in case
++	 * of a failure, as is the case at the end of a passing link training
++	 * and what is expected by the transcoder. Leaving TPS1 set (and
++	 * disabling the link train mode in DP_TP_CTL later from TPS1 directly)
++	 * would result in a stuck transcoder HW state and flip-done timeouts
++	 * later in the modeset sequence.
++	 */
++	if (!passed)
++		intel_dp_program_link_training_pattern(intel_dp, crtc_state,
++						       DP_PHY_DPRX, DP_TRAINING_PATTERN_2);
++
+ 	return passed;
+ }
+ 
+diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+index 4b12a6c7c247bd..20b5890754aefb 100644
+--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
++++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+@@ -3425,10 +3425,10 @@ static inline int guc_lrc_desc_unpin(struct intel_context *ce)
+ 	 */
+ 	ret = deregister_context(ce, ce->guc_id.id);
+ 	if (ret) {
+-		spin_lock(&ce->guc_state.lock);
++		spin_lock_irqsave(&ce->guc_state.lock, flags);
+ 		set_context_registered(ce);
+ 		clr_context_destroyed(ce);
+-		spin_unlock(&ce->guc_state.lock);
++		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+ 		/*
+ 		 * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements
+ 		 * the wakeref immediately but per function spec usage call this after unlock.
+diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
+index 22be4a731d27e6..2f0fc0dbd48477 100644
+--- a/drivers/gpu/drm/i915/i915_reg.h
++++ b/drivers/gpu/drm/i915/i915_reg.h
+@@ -3917,7 +3917,7 @@ enum skl_power_gate {
+ #define DDI_BUF_IS_IDLE			(1 << 7)
+ #define DDI_BUF_CTL_TC_PHY_OWNERSHIP	REG_BIT(6)
+ #define DDI_A_4_LANES			(1 << 4)
+-#define DDI_PORT_WIDTH(width)		(((width) - 1) << 1)
++#define DDI_PORT_WIDTH(width)		(((width) == 3 ? 4 : ((width) - 1)) << 1)
+ #define DDI_PORT_WIDTH_MASK		(7 << 1)
+ #define DDI_PORT_WIDTH_SHIFT	1
+ #define DDI_INIT_DISPLAY_DETECTED	(1 << 0)
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h
+index 421afacb724803..36cc9dbc00b5c1 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h
++++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h
+@@ -297,7 +297,7 @@ static const struct dpu_wb_cfg sm8150_wb[] = {
+ 	{
+ 	.name = "wb_2", .id = WB_2,
+ 	.base = 0x65000, .len = 0x2c8,
+-	.features = WB_SDM845_MASK,
++	.features = WB_SM8250_MASK,
+ 	.format_list = wb2_formats_rgb,
+ 	.num_formats = ARRAY_SIZE(wb2_formats_rgb),
+ 	.clk_ctrl = DPU_CLK_CTRL_WB2,
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h
+index 641023b102bf59..e8eacdb47967a2 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h
++++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h
+@@ -304,7 +304,7 @@ static const struct dpu_wb_cfg sc8180x_wb[] = {
+ 	{
+ 	.name = "wb_2", .id = WB_2,
+ 	.base = 0x65000, .len = 0x2c8,
+-	.features = WB_SDM845_MASK,
++	.features = WB_SM8250_MASK,
+ 	.format_list = wb2_formats_rgb,
+ 	.num_formats = ARRAY_SIZE(wb2_formats_rgb),
+ 	.clk_ctrl = DPU_CLK_CTRL_WB2,
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h
+index d039b96beb97cf..76f60a2df7a890 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h
++++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h
+@@ -144,7 +144,7 @@ static const struct dpu_wb_cfg sm6125_wb[] = {
+ 	{
+ 	.name = "wb_2", .id = WB_2,
+ 	.base = 0x65000, .len = 0x2c8,
+-	.features = WB_SDM845_MASK,
++	.features = WB_SM8250_MASK,
+ 	.format_list = wb2_formats_rgb,
+ 	.num_formats = ARRAY_SIZE(wb2_formats_rgb),
+ 	.clk_ctrl = DPU_CLK_CTRL_WB2,
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+index 83de7564e2c1fe..67f5fc6fdae102 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
++++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+@@ -2281,6 +2281,9 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
+ 		}
+ 	}
+ 
++	if (phys_enc->hw_pp && phys_enc->hw_pp->ops.setup_dither)
++		phys_enc->hw_pp->ops.setup_dither(phys_enc->hw_pp, NULL);
++
+ 	/* reset the merge 3D HW block */
+ 	if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) {
+ 		phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c
+index 657200401f5763..cec6d4e8baec4d 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c
++++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c
+@@ -52,6 +52,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc,
+ 	u32 slice_last_group_size;
+ 	u32 det_thresh_flatness;
+ 	bool is_cmd_mode = !(mode & DSC_MODE_VIDEO);
++	bool input_10_bits = dsc->bits_per_component == 10;
+ 
+ 	DPU_REG_WRITE(c, DSC_COMMON_MODE, mode);
+ 
+@@ -68,7 +69,7 @@ static void dpu_hw_dsc_config(struct dpu_hw_dsc *hw_dsc,
+ 	data |= (dsc->line_buf_depth << 3);
+ 	data |= (dsc->simple_422 << 2);
+ 	data |= (dsc->convert_rgb << 1);
+-	data |= dsc->bits_per_component;
++	data |= input_10_bits;
+ 
+ 	DPU_REG_WRITE(c, DSC_ENC, data);
+ 
+diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c
+index ad19330de61abd..562a3f4c5238a3 100644
+--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c
++++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c
+@@ -272,7 +272,7 @@ static void _setup_mdp_ops(struct dpu_hw_mdp_ops *ops,
+ 
+ 	if (cap & BIT(DPU_MDP_VSYNC_SEL))
+ 		ops->setup_vsync_source = dpu_hw_setup_vsync_sel;
+-	else
++	else if (!(cap & BIT(DPU_MDP_PERIPH_0_REMOVED)))
+ 		ops->setup_vsync_source = dpu_hw_setup_wd_timer;
+ 
+ 	ops->get_safe_status = dpu_hw_get_safe_status;
+diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
+index aff51bb973ebe0..6d69598e85c573 100644
+--- a/drivers/gpu/drm/msm/dp/dp_display.c
++++ b/drivers/gpu/drm/msm/dp/dp_display.c
+@@ -937,16 +937,17 @@ enum drm_mode_status msm_dp_bridge_mode_valid(struct drm_bridge *bridge,
+ 		return -EINVAL;
+ 	}
+ 
+-	if (mode->clock > DP_MAX_PIXEL_CLK_KHZ)
+-		return MODE_CLOCK_HIGH;
+-
+ 	msm_dp_display = container_of(dp, struct msm_dp_display_private, msm_dp_display);
+ 	link_info = &msm_dp_display->panel->link_info;
+ 
+-	if (drm_mode_is_420_only(&dp->connector->display_info, mode) &&
+-	    msm_dp_display->panel->vsc_sdp_supported)
++	if ((drm_mode_is_420_only(&dp->connector->display_info, mode) &&
++	     msm_dp_display->panel->vsc_sdp_supported) ||
++	    msm_dp_wide_bus_available(dp))
+ 		mode_pclk_khz /= 2;
+ 
++	if (mode_pclk_khz > DP_MAX_PIXEL_CLK_KHZ)
++		return MODE_CLOCK_HIGH;
++
+ 	mode_bpp = dp->connector->display_info.bpc * num_components;
+ 	if (!mode_bpp)
+ 		mode_bpp = default_bpp;
+diff --git a/drivers/gpu/drm/msm/dp/dp_drm.c b/drivers/gpu/drm/msm/dp/dp_drm.c
+index d3e241ea694161..16b7913d1eefa8 100644
+--- a/drivers/gpu/drm/msm/dp/dp_drm.c
++++ b/drivers/gpu/drm/msm/dp/dp_drm.c
+@@ -257,7 +257,10 @@ static enum drm_mode_status msm_edp_bridge_mode_valid(struct drm_bridge *bridge,
+ 		return -EINVAL;
+ 	}
+ 
+-	if (mode->clock > DP_MAX_PIXEL_CLK_KHZ)
++	if (msm_dp_wide_bus_available(dp))
++		mode_pclk_khz /= 2;
++
++	if (mode_pclk_khz > DP_MAX_PIXEL_CLK_KHZ)
+ 		return MODE_CLOCK_HIGH;
+ 
+ 	/*
+diff --git a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
+index 031446c87daec0..798168180c1ab6 100644
+--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
++++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c
+@@ -83,6 +83,9 @@ struct dsi_pll_7nm {
+ 	/* protects REG_DSI_7nm_PHY_CMN_CLK_CFG0 register */
+ 	spinlock_t postdiv_lock;
+ 
++	/* protects REG_DSI_7nm_PHY_CMN_CLK_CFG1 register */
++	spinlock_t pclk_mux_lock;
++
+ 	struct pll_7nm_cached_state cached_state;
+ 
+ 	struct dsi_pll_7nm *slave;
+@@ -372,22 +375,41 @@ static void dsi_pll_enable_pll_bias(struct dsi_pll_7nm *pll)
+ 	ndelay(250);
+ }
+ 
+-static void dsi_pll_disable_global_clk(struct dsi_pll_7nm *pll)
++static void dsi_pll_cmn_clk_cfg0_write(struct dsi_pll_7nm *pll, u32 val)
+ {
++	unsigned long flags;
++
++	spin_lock_irqsave(&pll->postdiv_lock, flags);
++	writel(val, pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG0);
++	spin_unlock_irqrestore(&pll->postdiv_lock, flags);
++}
++
++static void dsi_pll_cmn_clk_cfg1_update(struct dsi_pll_7nm *pll, u32 mask,
++					u32 val)
++{
++	unsigned long flags;
+ 	u32 data;
+ 
++	spin_lock_irqsave(&pll->pclk_mux_lock, flags);
+ 	data = readl(pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
+-	writel(data & ~BIT(5), pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
++	data &= ~mask;
++	data |= val & mask;
++
++	writel(data, pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
++	spin_unlock_irqrestore(&pll->pclk_mux_lock, flags);
++}
++
++static void dsi_pll_disable_global_clk(struct dsi_pll_7nm *pll)
++{
++	dsi_pll_cmn_clk_cfg1_update(pll, DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN, 0);
+ }
+ 
+ static void dsi_pll_enable_global_clk(struct dsi_pll_7nm *pll)
+ {
+-	u32 data;
++	u32 cfg_1 = DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN | DSI_7nm_PHY_CMN_CLK_CFG1_CLK_EN_SEL;
+ 
+ 	writel(0x04, pll->phy->base + REG_DSI_7nm_PHY_CMN_CTRL_3);
+-
+-	data = readl(pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
+-	writel(data | BIT(5) | BIT(4), pll->phy->base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
++	dsi_pll_cmn_clk_cfg1_update(pll, cfg_1, cfg_1);
+ }
+ 
+ static void dsi_pll_phy_dig_reset(struct dsi_pll_7nm *pll)
+@@ -565,7 +587,6 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy)
+ {
+ 	struct dsi_pll_7nm *pll_7nm = to_pll_7nm(phy->vco_hw);
+ 	struct pll_7nm_cached_state *cached = &pll_7nm->cached_state;
+-	void __iomem *phy_base = pll_7nm->phy->base;
+ 	u32 val;
+ 	int ret;
+ 
+@@ -574,13 +595,10 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy)
+ 	val |= cached->pll_out_div;
+ 	writel(val, pll_7nm->phy->pll_base + REG_DSI_7nm_PHY_PLL_PLL_OUTDIV_RATE);
+ 
+-	writel(cached->bit_clk_div | (cached->pix_clk_div << 4),
+-	       phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG0);
+-
+-	val = readl(phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
+-	val &= ~0x3;
+-	val |= cached->pll_mux;
+-	writel(val, phy_base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
++	dsi_pll_cmn_clk_cfg0_write(pll_7nm,
++				   DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_3_0(cached->bit_clk_div) |
++				   DSI_7nm_PHY_CMN_CLK_CFG0_DIV_CTRL_7_4(cached->pix_clk_div));
++	dsi_pll_cmn_clk_cfg1_update(pll_7nm, 0x3, cached->pll_mux);
+ 
+ 	ret = dsi_pll_7nm_vco_set_rate(phy->vco_hw,
+ 			pll_7nm->vco_current_rate,
+@@ -599,7 +617,6 @@ static int dsi_7nm_pll_restore_state(struct msm_dsi_phy *phy)
+ static int dsi_7nm_set_usecase(struct msm_dsi_phy *phy)
+ {
+ 	struct dsi_pll_7nm *pll_7nm = to_pll_7nm(phy->vco_hw);
+-	void __iomem *base = phy->base;
+ 	u32 data = 0x0;	/* internal PLL */
+ 
+ 	DBG("DSI PLL%d", pll_7nm->phy->id);
+@@ -618,7 +635,8 @@ static int dsi_7nm_set_usecase(struct msm_dsi_phy *phy)
+ 	}
+ 
+ 	/* set PLL src */
+-	writel(data << 2, base + REG_DSI_7nm_PHY_CMN_CLK_CFG1);
++	dsi_pll_cmn_clk_cfg1_update(pll_7nm, DSI_7nm_PHY_CMN_CLK_CFG1_BITCLK_SEL__MASK,
++				    DSI_7nm_PHY_CMN_CLK_CFG1_BITCLK_SEL(data));
+ 
+ 	return 0;
+ }
+@@ -733,7 +751,7 @@ static int pll_7nm_register(struct dsi_pll_7nm *pll_7nm, struct clk_hw **provide
+ 				pll_by_2_bit,
+ 			}), 2, 0, pll_7nm->phy->base +
+ 					REG_DSI_7nm_PHY_CMN_CLK_CFG1,
+-			0, 1, 0, NULL);
++			0, 1, 0, &pll_7nm->pclk_mux_lock);
+ 	if (IS_ERR(hw)) {
+ 		ret = PTR_ERR(hw);
+ 		goto fail;
+@@ -778,6 +796,7 @@ static int dsi_pll_7nm_init(struct msm_dsi_phy *phy)
+ 	pll_7nm_list[phy->id] = pll_7nm;
+ 
+ 	spin_lock_init(&pll_7nm->postdiv_lock);
++	spin_lock_init(&pll_7nm->pclk_mux_lock);
+ 
+ 	pll_7nm->phy = phy;
+ 
+diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
+index d8c9a1b192632d..f15962cfb373c5 100644
+--- 
a/drivers/gpu/drm/msm/msm_drv.h ++++ b/drivers/gpu/drm/msm/msm_drv.h +@@ -530,15 +530,12 @@ static inline int align_pitch(int width, int bpp) + static inline unsigned long timeout_to_jiffies(const ktime_t *timeout) + { + ktime_t now = ktime_get(); +- s64 remaining_jiffies; + +- if (ktime_compare(*timeout, now) < 0) { +- remaining_jiffies = 0; +- } else { +- ktime_t rem = ktime_sub(*timeout, now); +- remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); +- } ++ if (ktime_compare(*timeout, now) <= 0) ++ return 0; + ++ ktime_t rem = ktime_sub(*timeout, now); ++ s64 remaining_jiffies = ktime_divns(rem, NSEC_PER_SEC / HZ); + return clamp(remaining_jiffies, 1LL, (s64)INT_MAX); + } + +diff --git a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml +index d54b72f924493b..35f7f40e405b7d 100644 +--- a/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml ++++ b/drivers/gpu/drm/msm/registers/display/dsi_phy_7nm.xml +@@ -9,8 +9,15 @@ xsi:schemaLocation="https://gitlab.freedesktop.org/freedreno/ rules-fd.xsd"> + + + +- +- ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + +diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c +index b4da82ddbb6b2f..8ea98f06d39afc 100644 +--- a/drivers/gpu/drm/nouveau/nouveau_svm.c ++++ b/drivers/gpu/drm/nouveau/nouveau_svm.c +@@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, + unsigned long timeout = + jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); + struct mm_struct *mm = svmm->notifier.mm; ++ struct folio *folio; + struct page *page; + unsigned long start = args->p.addr; + unsigned long notifier_seq; +@@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, + ret = -EINVAL; + goto out; + } ++ folio = page_folio(page); + + mutex_lock(&svmm->mutex); + if (!mmu_interval_read_retry(¬ifier->notifier, + notifier_seq)) + break; + mutex_unlock(&svmm->mutex); ++ ++ folio_unlock(folio); ++ 
folio_put(folio); + } + + /* Map the page on the GPU. */ +@@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, + ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); + mutex_unlock(&svmm->mutex); + +- unlock_page(page); +- put_page(page); ++ folio_unlock(folio); ++ folio_put(folio); + + out: + mmu_interval_notifier_remove(¬ifier->notifier); +diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +index a6f410ba60bc94..d393bc540f8628 100644 +--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c ++++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +@@ -75,7 +75,7 @@ gp10b_pmu_acr = { + .bootstrap_multiple_falcons = gp10b_pmu_acr_bootstrap_multiple_falcons, + }; + +-#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) ++#if IS_ENABLED(CONFIG_ARCH_TEGRA_186_SOC) + MODULE_FIRMWARE("nvidia/gp10b/pmu/desc.bin"); + MODULE_FIRMWARE("nvidia/gp10b/pmu/image.bin"); + MODULE_FIRMWARE("nvidia/gp10b/pmu/sig.bin"); +diff --git a/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c b/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c +index 45d09e6fa667fd..7d68a8acfe2ea4 100644 +--- a/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c ++++ b/drivers/gpu/drm/panel/panel-jadard-jd9365da-h3.c +@@ -109,13 +109,13 @@ static int jadard_prepare(struct drm_panel *panel) + if (jadard->desc->lp11_to_reset_delay_ms) + msleep(jadard->desc->lp11_to_reset_delay_ms); + +- gpiod_set_value(jadard->reset, 1); ++ gpiod_set_value(jadard->reset, 0); + msleep(5); + +- gpiod_set_value(jadard->reset, 0); ++ gpiod_set_value(jadard->reset, 1); + msleep(10); + +- gpiod_set_value(jadard->reset, 1); ++ gpiod_set_value(jadard->reset, 0); + msleep(130); + + ret = jadard->desc->init(jadard); +@@ -1130,7 +1130,7 @@ static int jadard_dsi_probe(struct mipi_dsi_device *dsi) + dsi->format = desc->format; + dsi->lanes = desc->lanes; + +- jadard->reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW); ++ jadard->reset = devm_gpiod_get(dev, 
"reset", GPIOD_OUT_HIGH); + if (IS_ERR(jadard->reset)) { + DRM_DEV_ERROR(&dsi->dev, "failed to get our reset GPIO\n"); + return PTR_ERR(jadard->reset); +diff --git a/drivers/gpu/drm/xe/display/ext/i915_irq.c b/drivers/gpu/drm/xe/display/ext/i915_irq.c +index a7dbc6554d6944..ac4cda2d81c7a1 100644 +--- a/drivers/gpu/drm/xe/display/ext/i915_irq.c ++++ b/drivers/gpu/drm/xe/display/ext/i915_irq.c +@@ -53,18 +53,7 @@ void gen2_irq_init(struct intel_uncore *uncore, struct i915_irq_regs regs, + + bool intel_irqs_enabled(struct xe_device *xe) + { +- /* +- * XXX: i915 has a racy handling of the irq.enabled, since it doesn't +- * lock its transitions. Because of that, the irq.enabled sometimes +- * is not read with the irq.lock in place. +- * However, the most critical cases like vblank and page flips are +- * properly using the locks. +- * We cannot take the lock in here or run any kind of assert because +- * of i915 inconsistency. +- * But at this point the xe irq is better protected against races, +- * although the full solution would be protecting the i915 side. 
+- */ +- return xe->irq.enabled; ++ return atomic_read(&xe->irq.enabled); + } + + void intel_synchronize_irq(struct xe_device *xe) +diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c +index 06d6db8b50f93f..7f902d50ebf696 100644 +--- a/drivers/gpu/drm/xe/xe_device.c ++++ b/drivers/gpu/drm/xe/xe_device.c +@@ -324,7 +324,9 @@ struct xe_device *xe_device_create(struct pci_dev *pdev, + xe->info.revid = pdev->revision; + xe->info.force_execlist = xe_modparam.force_execlist; + +- spin_lock_init(&xe->irq.lock); ++ err = xe_irq_init(xe); ++ if (err) ++ goto err; + + init_waitqueue_head(&xe->ufence_wq); + +diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h +index f1fbfe91686782..fc3c2af3fb7fd1 100644 +--- a/drivers/gpu/drm/xe/xe_device.h ++++ b/drivers/gpu/drm/xe/xe_device.h +@@ -157,8 +157,7 @@ static inline bool xe_device_has_sriov(struct xe_device *xe) + + static inline bool xe_device_has_msix(struct xe_device *xe) + { +- /* TODO: change this when MSI-X support is fully integrated */ +- return false; ++ return xe->irq.msix.nvec > 0; + } + + static inline bool xe_device_has_memirq(struct xe_device *xe) +diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h +index b9ea455d6f59fa..782eb224a46e7a 100644 +--- a/drivers/gpu/drm/xe/xe_device_types.h ++++ b/drivers/gpu/drm/xe/xe_device_types.h +@@ -345,7 +345,13 @@ struct xe_device { + spinlock_t lock; + + /** @irq.enabled: interrupts enabled on this device */ +- bool enabled; ++ atomic_t enabled; ++ ++ /** @irq.msix: irq info for platforms that support MSI-X */ ++ struct { ++ /** @irq.msix.nvec: number of MSI-X interrupts */ ++ u16 nvec; ++ } msix; + } irq; + + /** @ttm: ttm device */ +diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c +index b7995ebd54abde..ca04327bd6dfbd 100644 +--- a/drivers/gpu/drm/xe/xe_irq.c ++++ b/drivers/gpu/drm/xe/xe_irq.c +@@ -10,6 +10,7 @@ + #include + + #include "display/xe_display.h" ++#include 
"regs/xe_guc_regs.h" + #include "regs/xe_irq_regs.h" + #include "xe_device.h" + #include "xe_drv.h" +@@ -29,6 +30,11 @@ + #define IIR(offset) XE_REG(offset + 0x8) + #define IER(offset) XE_REG(offset + 0xc) + ++static int xe_irq_msix_init(struct xe_device *xe); ++static void xe_irq_msix_free(struct xe_device *xe); ++static int xe_irq_msix_request_irqs(struct xe_device *xe); ++static void xe_irq_msix_synchronize_irq(struct xe_device *xe); ++ + static void assert_iir_is_zero(struct xe_mmio *mmio, struct xe_reg reg) + { + u32 val = xe_mmio_read32(mmio, reg); +@@ -348,12 +354,8 @@ static irqreturn_t xelp_irq_handler(int irq, void *arg) + unsigned long intr_dw[2]; + u32 identity[32]; + +- spin_lock(&xe->irq.lock); +- if (!xe->irq.enabled) { +- spin_unlock(&xe->irq.lock); ++ if (!atomic_read(&xe->irq.enabled)) + return IRQ_NONE; +- } +- spin_unlock(&xe->irq.lock); + + master_ctl = xelp_intr_disable(xe); + if (!master_ctl) { +@@ -417,12 +419,8 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg) + + /* TODO: This really shouldn't be copied+pasted */ + +- spin_lock(&xe->irq.lock); +- if (!xe->irq.enabled) { +- spin_unlock(&xe->irq.lock); ++ if (!atomic_read(&xe->irq.enabled)) + return IRQ_NONE; +- } +- spin_unlock(&xe->irq.lock); + + master_tile_ctl = dg1_intr_disable(xe); + if (!master_tile_ctl) { +@@ -580,6 +578,11 @@ static void xe_irq_reset(struct xe_device *xe) + if (IS_SRIOV_VF(xe)) + return vf_irq_reset(xe); + ++ if (xe_device_uses_memirq(xe)) { ++ for_each_tile(tile, xe, id) ++ xe_memirq_reset(&tile->memirq); ++ } ++ + for_each_tile(tile, xe, id) { + if (GRAPHICS_VERx100(xe) >= 1210) + dg1_irq_reset(tile); +@@ -622,6 +625,14 @@ static void xe_irq_postinstall(struct xe_device *xe) + if (IS_SRIOV_VF(xe)) + return vf_irq_postinstall(xe); + ++ if (xe_device_uses_memirq(xe)) { ++ struct xe_tile *tile; ++ unsigned int id; ++ ++ for_each_tile(tile, xe, id) ++ xe_memirq_postinstall(&tile->memirq); ++ } ++ + xe_display_irq_postinstall(xe, xe_root_mmio_gt(xe)); + + /* 
+@@ -644,12 +655,8 @@ static irqreturn_t vf_mem_irq_handler(int irq, void *arg) + struct xe_tile *tile; + unsigned int id; + +- spin_lock(&xe->irq.lock); +- if (!xe->irq.enabled) { +- spin_unlock(&xe->irq.lock); ++ if (!atomic_read(&xe->irq.enabled)) + return IRQ_NONE; +- } +- spin_unlock(&xe->irq.lock); + + for_each_tile(tile, xe, id) + xe_memirq_handler(&tile->memirq); +@@ -668,87 +675,105 @@ static irq_handler_t xe_irq_handler(struct xe_device *xe) + return xelp_irq_handler; + } + +-static void irq_uninstall(void *arg) ++static int xe_irq_msi_request_irqs(struct xe_device *xe) ++{ ++ struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ++ irq_handler_t irq_handler; ++ int irq, err; ++ ++ irq_handler = xe_irq_handler(xe); ++ if (!irq_handler) { ++ drm_err(&xe->drm, "No supported interrupt handler"); ++ return -EINVAL; ++ } ++ ++ irq = pci_irq_vector(pdev, 0); ++ err = request_irq(irq, irq_handler, IRQF_SHARED, DRIVER_NAME, xe); ++ if (err < 0) { ++ drm_err(&xe->drm, "Failed to request MSI IRQ %d\n", err); ++ return err; ++ } ++ ++ return 0; ++} ++ ++static void xe_irq_msi_free(struct xe_device *xe) + { +- struct xe_device *xe = arg; + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); + int irq; + +- if (!xe->irq.enabled) ++ irq = pci_irq_vector(pdev, 0); ++ free_irq(irq, xe); ++} ++ ++static void irq_uninstall(void *arg) ++{ ++ struct xe_device *xe = arg; ++ ++ if (!atomic_xchg(&xe->irq.enabled, 0)) + return; + +- xe->irq.enabled = false; + xe_irq_reset(xe); + +- irq = pci_irq_vector(pdev, 0); +- free_irq(irq, xe); ++ if (xe_device_has_msix(xe)) ++ xe_irq_msix_free(xe); ++ else ++ xe_irq_msi_free(xe); ++} ++ ++int xe_irq_init(struct xe_device *xe) ++{ ++ spin_lock_init(&xe->irq.lock); ++ ++ return xe_irq_msix_init(xe); + } + + int xe_irq_install(struct xe_device *xe) + { + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); +- unsigned int irq_flags = PCI_IRQ_MSIX; +- irq_handler_t irq_handler; +- int err, irq, nvec; +- +- irq_handler = xe_irq_handler(xe); +- if 
(!irq_handler) { +- drm_err(&xe->drm, "No supported interrupt handler"); +- return -EINVAL; +- } ++ unsigned int irq_flags = PCI_IRQ_MSI; ++ int nvec = 1; ++ int err; + + xe_irq_reset(xe); + +- nvec = pci_msix_vec_count(pdev); +- if (nvec <= 0) { +- if (nvec == -EINVAL) { +- /* MSIX capability is not supported in the device, using MSI */ +- irq_flags = PCI_IRQ_MSI; +- nvec = 1; +- } else { +- drm_err(&xe->drm, "MSIX: Failed getting count\n"); +- return nvec; +- } ++ if (xe_device_has_msix(xe)) { ++ nvec = xe->irq.msix.nvec; ++ irq_flags = PCI_IRQ_MSIX; + } + + err = pci_alloc_irq_vectors(pdev, nvec, nvec, irq_flags); + if (err < 0) { +- drm_err(&xe->drm, "MSI/MSIX: Failed to enable support %d\n", err); ++ drm_err(&xe->drm, "Failed to allocate IRQ vectors: %d\n", err); + return err; + } + +- irq = pci_irq_vector(pdev, 0); +- err = request_irq(irq, irq_handler, IRQF_SHARED, DRIVER_NAME, xe); +- if (err < 0) { +- drm_err(&xe->drm, "Failed to request MSI/MSIX IRQ %d\n", err); ++ err = xe_device_has_msix(xe) ? 
xe_irq_msix_request_irqs(xe) : ++ xe_irq_msi_request_irqs(xe); ++ if (err) + return err; +- } + +- xe->irq.enabled = true; ++ atomic_set(&xe->irq.enabled, 1); + + xe_irq_postinstall(xe); + +- err = devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe); +- if (err) +- goto free_irq_handler; +- +- return 0; +- +-free_irq_handler: +- free_irq(irq, xe); ++ return devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe); ++} + +- return err; ++static void xe_irq_msi_synchronize_irq(struct xe_device *xe) ++{ ++ synchronize_irq(to_pci_dev(xe->drm.dev)->irq); + } + + void xe_irq_suspend(struct xe_device *xe) + { +- int irq = to_pci_dev(xe->drm.dev)->irq; +- +- spin_lock_irq(&xe->irq.lock); +- xe->irq.enabled = false; /* no new irqs */ +- spin_unlock_irq(&xe->irq.lock); ++ atomic_set(&xe->irq.enabled, 0); /* no new irqs */ + +- synchronize_irq(irq); /* flush irqs */ ++ /* flush irqs */ ++ if (xe_device_has_msix(xe)) ++ xe_irq_msix_synchronize_irq(xe); ++ else ++ xe_irq_msi_synchronize_irq(xe); + xe_irq_reset(xe); /* turn irqs off */ + } + +@@ -762,10 +787,149 @@ void xe_irq_resume(struct xe_device *xe) + * 1. no irq will arrive before the postinstall + * 2. display is not yet resumed + */ +- xe->irq.enabled = true; ++ atomic_set(&xe->irq.enabled, 1); + xe_irq_reset(xe); + xe_irq_postinstall(xe); /* turn irqs on */ + + for_each_gt(gt, xe, id) + xe_irq_enable_hwe(gt); + } ++ ++/* MSI-X related definitions and functions below. 
*/ ++ ++enum xe_irq_msix_static { ++ GUC2HOST_MSIX = 0, ++ DEFAULT_MSIX = XE_IRQ_DEFAULT_MSIX, ++ /* Must be last */ ++ NUM_OF_STATIC_MSIX, ++}; ++ ++static int xe_irq_msix_init(struct xe_device *xe) ++{ ++ struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ++ int nvec = pci_msix_vec_count(pdev); ++ ++ if (nvec == -EINVAL) ++ return 0; /* MSI */ ++ ++ if (nvec < 0) { ++ drm_err(&xe->drm, "Failed getting MSI-X vectors count: %d\n", nvec); ++ return nvec; ++ } ++ ++ xe->irq.msix.nvec = nvec; ++ return 0; ++} ++ ++static irqreturn_t guc2host_irq_handler(int irq, void *arg) ++{ ++ struct xe_device *xe = arg; ++ struct xe_tile *tile; ++ u8 id; ++ ++ if (!atomic_read(&xe->irq.enabled)) ++ return IRQ_NONE; ++ ++ for_each_tile(tile, xe, id) ++ xe_guc_irq_handler(&tile->primary_gt->uc.guc, ++ GUC_INTR_GUC2HOST); ++ ++ return IRQ_HANDLED; ++} ++ ++static irqreturn_t xe_irq_msix_default_hwe_handler(int irq, void *arg) ++{ ++ unsigned int tile_id, gt_id; ++ struct xe_device *xe = arg; ++ struct xe_memirq *memirq; ++ struct xe_hw_engine *hwe; ++ enum xe_hw_engine_id id; ++ struct xe_tile *tile; ++ struct xe_gt *gt; ++ ++ if (!atomic_read(&xe->irq.enabled)) ++ return IRQ_NONE; ++ ++ for_each_tile(tile, xe, tile_id) { ++ memirq = &tile->memirq; ++ if (!memirq->bo) ++ continue; ++ ++ for_each_gt(gt, xe, gt_id) { ++ if (gt->tile != tile) ++ continue; ++ ++ for_each_hw_engine(hwe, gt, id) ++ xe_memirq_hwe_handler(memirq, hwe); ++ } ++ } ++ ++ return IRQ_HANDLED; ++} ++ ++static int xe_irq_msix_request_irq(struct xe_device *xe, irq_handler_t handler, ++ const char *name, u16 msix) ++{ ++ struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ++ int ret, irq; ++ ++ irq = pci_irq_vector(pdev, msix); ++ if (irq < 0) ++ return irq; ++ ++ ret = request_irq(irq, handler, IRQF_SHARED, name, xe); ++ if (ret < 0) ++ return ret; ++ ++ return 0; ++} ++ ++static void xe_irq_msix_free_irq(struct xe_device *xe, u16 msix) ++{ ++ struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ++ int irq; ++ ++ irq = 
pci_irq_vector(pdev, msix); ++ if (irq < 0) { ++ drm_err(&xe->drm, "MSI-X %u can't be released, there is no matching IRQ\n", msix); ++ return; ++ } ++ ++ free_irq(irq, xe); ++} ++ ++static int xe_irq_msix_request_irqs(struct xe_device *xe) ++{ ++ int err; ++ ++ err = xe_irq_msix_request_irq(xe, guc2host_irq_handler, ++ DRIVER_NAME "-guc2host", GUC2HOST_MSIX); ++ if (err) { ++ drm_err(&xe->drm, "Failed to request MSI-X IRQ %d: %d\n", GUC2HOST_MSIX, err); ++ return err; ++ } ++ ++ err = xe_irq_msix_request_irq(xe, xe_irq_msix_default_hwe_handler, ++ DRIVER_NAME "-default-msix", DEFAULT_MSIX); ++ if (err) { ++ drm_err(&xe->drm, "Failed to request MSI-X IRQ %d: %d\n", DEFAULT_MSIX, err); ++ xe_irq_msix_free_irq(xe, GUC2HOST_MSIX); ++ return err; ++ } ++ ++ return 0; ++} ++ ++static void xe_irq_msix_free(struct xe_device *xe) ++{ ++ xe_irq_msix_free_irq(xe, GUC2HOST_MSIX); ++ xe_irq_msix_free_irq(xe, DEFAULT_MSIX); ++} ++ ++static void xe_irq_msix_synchronize_irq(struct xe_device *xe) ++{ ++ struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ++ ++ synchronize_irq(pci_irq_vector(pdev, GUC2HOST_MSIX)); ++ synchronize_irq(pci_irq_vector(pdev, DEFAULT_MSIX)); ++} +diff --git a/drivers/gpu/drm/xe/xe_irq.h b/drivers/gpu/drm/xe/xe_irq.h +index 067514e13675ba..24ff16111b9688 100644 +--- a/drivers/gpu/drm/xe/xe_irq.h ++++ b/drivers/gpu/drm/xe/xe_irq.h +@@ -6,10 +6,13 @@ + #ifndef _XE_IRQ_H_ + #define _XE_IRQ_H_ + ++#define XE_IRQ_DEFAULT_MSIX 1 ++ + struct xe_device; + struct xe_tile; + struct xe_gt; + ++int xe_irq_init(struct xe_device *xe); + int xe_irq_install(struct xe_device *xe); + void xe_irq_suspend(struct xe_device *xe); + void xe_irq_resume(struct xe_device *xe); +diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c +index bf5608a7405610..0f6cd44fff2921 100644 +--- a/drivers/hv/vmbus_drv.c ++++ b/drivers/hv/vmbus_drv.c +@@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev) + + static int vmbus_bus_resume(struct device *dev) + { ++ struct 
vmbus_channel *channel; + struct vmbus_channel_msginfo *msginfo; + size_t msgsize; + int ret; +@@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev) + + vmbus_request_offers(); + ++ mutex_lock(&vmbus_connection.channel_mutex); ++ list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) { ++ if (channel->offermsg.child_relid != INVALID_RELID) ++ continue; ++ ++ /* hvsock channels are not expected to be present. */ ++ if (is_hvsock_channel(channel)) ++ continue; ++ ++ pr_err("channel %pUl/%pUl not present after resume.\n", ++ &channel->offermsg.offer.if_type, ++ &channel->offermsg.offer.if_instance); ++ /* ToDo: Cleanup these channels here */ ++ } ++ mutex_unlock(&vmbus_connection.channel_mutex); ++ + /* Reset the event for the next suspend. */ + reinit_completion(&vmbus_connection.ready_for_suspend_event); + +diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c +index 76dce0aac24656..270d7a4d85a6d7 100644 +--- a/drivers/irqchip/irq-gic-v3.c ++++ b/drivers/irqchip/irq-gic-v3.c +@@ -44,6 +44,7 @@ static u8 dist_prio_nmi __ro_after_init = GICV3_PRIO_NMI; + #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0) + #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539 (1ULL << 1) + #define FLAGS_WORKAROUND_ASR_ERRATUM_8601001 (1ULL << 2) ++#define FLAGS_WORKAROUND_INSECURE (1ULL << 3) + + #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1) + +@@ -83,6 +84,8 @@ static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key); + #define GIC_LINE_NR min(GICD_TYPER_SPIS(gic_data.rdists.gicd_typer), 1020U) + #define GIC_ESPI_NR GICD_TYPER_ESPIS(gic_data.rdists.gicd_typer) + ++static bool nmi_support_forbidden; ++ + /* + * There are 16 SGIs, though we only actually use 8 in Linux. The other 8 SGIs + * are potentially stolen by the secure side. 
Some code, especially code dealing +@@ -163,21 +166,27 @@ static void __init gic_prio_init(void) + { + bool ds; + +- ds = gic_dist_security_disabled(); +- if (!ds) { +- u32 val; +- +- val = readl_relaxed(gic_data.dist_base + GICD_CTLR); +- val |= GICD_CTLR_DS; +- writel_relaxed(val, gic_data.dist_base + GICD_CTLR); ++ cpus_have_group0 = gic_has_group0(); + +- ds = gic_dist_security_disabled(); +- if (ds) +- pr_warn("Broken GIC integration, security disabled"); ++ ds = gic_dist_security_disabled(); ++ if ((gic_data.flags & FLAGS_WORKAROUND_INSECURE) && !ds) { ++ if (cpus_have_group0) { ++ u32 val; ++ ++ val = readl_relaxed(gic_data.dist_base + GICD_CTLR); ++ val |= GICD_CTLR_DS; ++ writel_relaxed(val, gic_data.dist_base + GICD_CTLR); ++ ++ ds = gic_dist_security_disabled(); ++ if (ds) ++ pr_warn("Broken GIC integration, security disabled\n"); ++ } else { ++ pr_warn("Broken GIC integration, pNMI forbidden\n"); ++ nmi_support_forbidden = true; ++ } + } + + cpus_have_security_disabled = ds; +- cpus_have_group0 = gic_has_group0(); + + /* + * How priority values are used by the GIC depends on two things: +@@ -209,7 +218,7 @@ static void __init gic_prio_init(void) + * be in the non-secure range, we program the non-secure values into + * the distributor to match the PMR values we want. 
+ */ +- if (cpus_have_group0 & !cpus_have_security_disabled) { ++ if (cpus_have_group0 && !cpus_have_security_disabled) { + dist_prio_irq = __gicv3_prio_to_ns(dist_prio_irq); + dist_prio_nmi = __gicv3_prio_to_ns(dist_prio_nmi); + } +@@ -1922,6 +1931,18 @@ static bool gic_enable_quirk_arm64_2941627(void *data) + return true; + } + ++static bool gic_enable_quirk_rk3399(void *data) ++{ ++ struct gic_chip_data *d = data; ++ ++ if (of_machine_is_compatible("rockchip,rk3399")) { ++ d->flags |= FLAGS_WORKAROUND_INSECURE; ++ return true; ++ } ++ ++ return false; ++} ++ + static bool rd_set_non_coherent(void *data) + { + struct gic_chip_data *d = data; +@@ -1996,6 +2017,12 @@ static const struct gic_quirk gic_quirks[] = { + .property = "dma-noncoherent", + .init = rd_set_non_coherent, + }, ++ { ++ .desc = "GICv3: Insecure RK3399 integration", ++ .iidr = 0x0000043b, ++ .mask = 0xff000fff, ++ .init = gic_enable_quirk_rk3399, ++ }, + { + } + }; +@@ -2004,7 +2031,7 @@ static void gic_enable_nmi_support(void) + { + int i; + +- if (!gic_prio_masking_enabled()) ++ if (!gic_prio_masking_enabled() || nmi_support_forbidden) + return; + + rdist_nmi_refs = kcalloc(gic_data.ppi_nr + SGI_NR, +diff --git a/drivers/irqchip/irq-jcore-aic.c b/drivers/irqchip/irq-jcore-aic.c +index b9dcc8e78c7501..1f613eb7b7f034 100644 +--- a/drivers/irqchip/irq-jcore-aic.c ++++ b/drivers/irqchip/irq-jcore-aic.c +@@ -38,7 +38,7 @@ static struct irq_chip jcore_aic; + static void handle_jcore_irq(struct irq_desc *desc) + { + if (irqd_is_per_cpu(irq_desc_get_irq_data(desc))) +- handle_percpu_irq(desc); ++ handle_percpu_devid_irq(desc); + else + handle_simple_irq(desc); + } +diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c +index 7049ec7fb8eb44..e8802309ed600b 100644 +--- a/drivers/md/raid0.c ++++ b/drivers/md/raid0.c +@@ -386,10 +386,8 @@ static int raid0_set_limits(struct mddev *mddev) + lim.io_opt = lim.io_min * mddev->raid_disks; + lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED; + err = 
mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); +- if (err) { +- queue_limits_cancel_update(mddev->gendisk->queue); ++ if (err) + return err; +- } + return queue_limits_set(mddev->gendisk->queue, &lim); + } + +diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c +index a5cd6522fc2d4d..3c75a69376f470 100644 +--- a/drivers/md/raid1.c ++++ b/drivers/md/raid1.c +@@ -3219,10 +3219,8 @@ static int raid1_set_limits(struct mddev *mddev) + lim.max_write_zeroes_sectors = 0; + lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED; + err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); +- if (err) { +- queue_limits_cancel_update(mddev->gendisk->queue); ++ if (err) + return err; +- } + return queue_limits_set(mddev->gendisk->queue, &lim); + } + +diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c +index e1e6cd7fb125e1..8b736f30ef9262 100644 +--- a/drivers/md/raid10.c ++++ b/drivers/md/raid10.c +@@ -4020,10 +4020,8 @@ static int raid10_set_queue_limits(struct mddev *mddev) + lim.io_opt = lim.io_min * raid10_nr_stripes(conf); + lim.features |= BLK_FEAT_ATOMIC_WRITES_STACKED; + err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY); +- if (err) { +- queue_limits_cancel_update(mddev->gendisk->queue); ++ if (err) + return err; +- } + return queue_limits_set(mddev->gendisk->queue, &lim); + } + +diff --git a/drivers/mtd/nand/raw/cadence-nand-controller.c b/drivers/mtd/nand/raw/cadence-nand-controller.c +index 8d1d710e439dd3..0b2db4173e7230 100644 +--- a/drivers/mtd/nand/raw/cadence-nand-controller.c ++++ b/drivers/mtd/nand/raw/cadence-nand-controller.c +@@ -471,6 +471,8 @@ struct cdns_nand_ctrl { + struct { + void __iomem *virt; + dma_addr_t dma; ++ dma_addr_t iova_dma; ++ u32 size; + } io; + + int irq; +@@ -1835,11 +1837,11 @@ static int cadence_nand_slave_dma_transfer(struct cdns_nand_ctrl *cdns_ctrl, + } + + if (dir == DMA_FROM_DEVICE) { +- src_dma = cdns_ctrl->io.dma; ++ src_dma = cdns_ctrl->io.iova_dma; + dst_dma = buf_dma; + } else { + src_dma = 
buf_dma; +- dst_dma = cdns_ctrl->io.dma; ++ dst_dma = cdns_ctrl->io.iova_dma; + } + + tx = dmaengine_prep_dma_memcpy(cdns_ctrl->dmac, dst_dma, src_dma, len, +@@ -1861,12 +1863,12 @@ static int cadence_nand_slave_dma_transfer(struct cdns_nand_ctrl *cdns_ctrl, + dma_async_issue_pending(cdns_ctrl->dmac); + wait_for_completion(&finished); + +- dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir); ++ dma_unmap_single(dma_dev->dev, buf_dma, len, dir); + + return 0; + + err_unmap: +- dma_unmap_single(cdns_ctrl->dev, buf_dma, len, dir); ++ dma_unmap_single(dma_dev->dev, buf_dma, len, dir); + + err: + dev_dbg(cdns_ctrl->dev, "Fall back to CPU I/O\n"); +@@ -2869,6 +2871,7 @@ cadence_nand_irq_cleanup(int irqnum, struct cdns_nand_ctrl *cdns_ctrl) + static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl) + { + dma_cap_mask_t mask; ++ struct dma_device *dma_dev = cdns_ctrl->dmac->device; + int ret; + + cdns_ctrl->cdma_desc = dma_alloc_coherent(cdns_ctrl->dev, +@@ -2904,15 +2907,24 @@ static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl) + dma_cap_set(DMA_MEMCPY, mask); + + if (cdns_ctrl->caps1->has_dma) { +- cdns_ctrl->dmac = dma_request_channel(mask, NULL, NULL); +- if (!cdns_ctrl->dmac) { +- dev_err(cdns_ctrl->dev, +- "Unable to get a DMA channel\n"); +- ret = -EBUSY; ++ cdns_ctrl->dmac = dma_request_chan_by_mask(&mask); ++ if (IS_ERR(cdns_ctrl->dmac)) { ++ ret = dev_err_probe(cdns_ctrl->dev, PTR_ERR(cdns_ctrl->dmac), ++ "%d: Failed to get a DMA channel\n", ret); + goto disable_irq; + } + } + ++ cdns_ctrl->io.iova_dma = dma_map_resource(dma_dev->dev, cdns_ctrl->io.dma, ++ cdns_ctrl->io.size, ++ DMA_BIDIRECTIONAL, 0); ++ ++ ret = dma_mapping_error(dma_dev->dev, cdns_ctrl->io.iova_dma); ++ if (ret) { ++ dev_err(cdns_ctrl->dev, "Failed to map I/O resource to DMA\n"); ++ goto dma_release_chnl; ++ } ++ + nand_controller_init(&cdns_ctrl->controller); + INIT_LIST_HEAD(&cdns_ctrl->chips); + +@@ -2923,18 +2935,22 @@ static int cadence_nand_init(struct cdns_nand_ctrl 
*cdns_ctrl) + if (ret) { + dev_err(cdns_ctrl->dev, "Failed to register MTD: %d\n", + ret); +- goto dma_release_chnl; ++ goto unmap_dma_resource; + } + + kfree(cdns_ctrl->buf); + cdns_ctrl->buf = kzalloc(cdns_ctrl->buf_size, GFP_KERNEL); + if (!cdns_ctrl->buf) { + ret = -ENOMEM; +- goto dma_release_chnl; ++ goto unmap_dma_resource; + } + + return 0; + ++unmap_dma_resource: ++ dma_unmap_resource(dma_dev->dev, cdns_ctrl->io.iova_dma, ++ cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0); ++ + dma_release_chnl: + if (cdns_ctrl->dmac) + dma_release_channel(cdns_ctrl->dmac); +@@ -2956,6 +2972,8 @@ static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl) + static void cadence_nand_remove(struct cdns_nand_ctrl *cdns_ctrl) + { + cadence_nand_chips_cleanup(cdns_ctrl); ++ dma_unmap_resource(cdns_ctrl->dmac->device->dev, cdns_ctrl->io.iova_dma, ++ cdns_ctrl->io.size, DMA_BIDIRECTIONAL, 0); + cadence_nand_irq_cleanup(cdns_ctrl->irq, cdns_ctrl); + kfree(cdns_ctrl->buf); + dma_free_coherent(cdns_ctrl->dev, sizeof(struct cadence_nand_cdma_desc), +@@ -3020,7 +3038,9 @@ static int cadence_nand_dt_probe(struct platform_device *ofdev) + cdns_ctrl->io.virt = devm_platform_get_and_ioremap_resource(ofdev, 1, &res); + if (IS_ERR(cdns_ctrl->io.virt)) + return PTR_ERR(cdns_ctrl->io.virt); ++ + cdns_ctrl->io.dma = res->start; ++ cdns_ctrl->io.size = resource_size(res); + + dt->clk = devm_clk_get(cdns_ctrl->dev, "nf_clk"); + if (IS_ERR(dt->clk)) +diff --git a/drivers/mtd/spi-nor/sst.c b/drivers/mtd/spi-nor/sst.c +index b5ad7118c49a2b..175211fe6a5ed2 100644 +--- a/drivers/mtd/spi-nor/sst.c ++++ b/drivers/mtd/spi-nor/sst.c +@@ -174,7 +174,7 @@ static int sst_nor_write_data(struct spi_nor *nor, loff_t to, size_t len, + int ret; + + nor->program_opcode = op; +- ret = spi_nor_write_data(nor, to, 1, buf); ++ ret = spi_nor_write_data(nor, to, len, buf); + if (ret < 0) + return ret; + WARN(ret != len, "While writing %zu byte written %i bytes\n", len, ret); +diff --git 
a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h +index 8167cc5fb0df13..78d2a19593d180 100644 +--- a/drivers/net/ethernet/google/gve/gve.h ++++ b/drivers/net/ethernet/google/gve/gve.h +@@ -1116,6 +1116,16 @@ static inline u32 gve_xdp_tx_start_queue_id(struct gve_priv *priv) + return gve_xdp_tx_queue_id(priv, 0); + } + ++static inline bool gve_supports_xdp_xmit(struct gve_priv *priv) ++{ ++ switch (priv->queue_format) { ++ case GVE_GQI_QPL_FORMAT: ++ return true; ++ default: ++ return false; ++ } ++} ++ + /* gqi napi handler defined in gve_main.c */ + int gve_napi_poll(struct napi_struct *napi, int budget); + +diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c +index 533e659b15b31c..92237fb0b60c1e 100644 +--- a/drivers/net/ethernet/google/gve/gve_main.c ++++ b/drivers/net/ethernet/google/gve/gve_main.c +@@ -1903,6 +1903,8 @@ static void gve_turndown(struct gve_priv *priv) + /* Stop tx queues */ + netif_tx_disable(priv->dev); + ++ xdp_features_clear_redirect_target(priv->dev); ++ + gve_clear_napi_enabled(priv); + gve_clear_report_stats(priv); + +@@ -1972,6 +1974,9 @@ static void gve_turnup(struct gve_priv *priv) + napi_schedule(&block->napi); + } + ++ if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv)) ++ xdp_features_set_redirect_target(priv->dev, false); ++ + gve_set_napi_enabled(priv); + } + +@@ -2246,7 +2251,6 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv) + if (priv->queue_format == GVE_GQI_QPL_FORMAT) { + xdp_features = NETDEV_XDP_ACT_BASIC; + xdp_features |= NETDEV_XDP_ACT_REDIRECT; +- xdp_features |= NETDEV_XDP_ACT_NDO_XMIT; + xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY; + } else { + xdp_features = 0; +diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c +index e95ae0d39948c8..0676fc547b6f47 100644 +--- a/drivers/net/ethernet/ibm/ibmvnic.c ++++ b/drivers/net/ethernet/ibm/ibmvnic.c +@@ -2408,6 +2408,7 @@ static netdev_tx_t 
ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) + dma_addr_t data_dma_addr; + struct netdev_queue *txq; + unsigned long lpar_rc; ++ unsigned int skblen; + union sub_crq tx_crq; + unsigned int offset; + bool use_scrq_send_direct = false; +@@ -2522,6 +2523,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) + tx_buff->skb = skb; + tx_buff->index = bufidx; + tx_buff->pool_index = queue_num; ++ skblen = skb->len; + + memset(&tx_crq, 0, sizeof(tx_crq)); + tx_crq.v1.first = IBMVNIC_CRQ_CMD; +@@ -2614,7 +2616,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) + netif_stop_subqueue(netdev, queue_num); + } + +- tx_bytes += skb->len; ++ tx_bytes += skblen; + txq_trans_cond_update(txq); + ret = NETDEV_TX_OK; + goto out; +diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c +index 2ec62c8d86e1c1..59486fe2ad18c2 100644 +--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c ++++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c +@@ -20,6 +20,8 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size) + struct sk_buff *skb; + + skb = nfp_app_ctrl_msg_alloc(bpf->app, size, GFP_KERNEL); ++ if (!skb) ++ return NULL; + skb_put(skb, size); + + return skb; +diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +index ae743991117c45..300cf7fed8bca0 100644 +--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c ++++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +@@ -2888,6 +2888,7 @@ static int axienet_probe(struct platform_device *pdev) + + lp->phylink_config.dev = &ndev->dev; + lp->phylink_config.type = PHYLINK_NETDEV; ++ lp->phylink_config.mac_managed_pm = true; + lp->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_ASYM_PAUSE | + MAC_10FD | MAC_100FD | MAC_1000FD; + +diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c +index bc658bc6088546..eea0875e4e5518 100644 +--- 
a/drivers/net/geneve.c ++++ b/drivers/net/geneve.c +@@ -1902,21 +1902,9 @@ static void geneve_destroy_tunnels(struct net *net, struct list_head *head) + { + struct geneve_net *gn = net_generic(net, geneve_net_id); + struct geneve_dev *geneve, *next; +- struct net_device *dev, *aux; + +- /* gather any geneve devices that were moved into this ns */ +- for_each_netdev_safe(net, dev, aux) +- if (dev->rtnl_link_ops == &geneve_link_ops) +- unregister_netdevice_queue(dev, head); +- +- /* now gather any other geneve devices that were created in this ns */ +- list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) { +- /* If geneve->dev is in the same netns, it was already added +- * to the list by the previous loop. +- */ +- if (!net_eq(dev_net(geneve->dev), net)) +- unregister_netdevice_queue(geneve->dev, head); +- } ++ list_for_each_entry_safe(geneve, next, &gn->geneve_list, next) ++ geneve_dellink(geneve->dev, head); + } + + static void __net_exit geneve_exit_batch_rtnl(struct list_head *net_list, +diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c +index fbabada7d3ba98..2cb13e092a856b 100644 +--- a/drivers/net/gtp.c ++++ b/drivers/net/gtp.c +@@ -2479,11 +2479,6 @@ static void __net_exit gtp_net_exit_batch_rtnl(struct list_head *net_list, + list_for_each_entry(net, net_list, exit_list) { + struct gtp_net *gn = net_generic(net, gtp_net_id); + struct gtp_dev *gtp, *gtp_next; +- struct net_device *dev; +- +- for_each_netdev(net, dev) +- if (dev->rtnl_link_ops == &gtp_link_ops) +- gtp_dellink(dev, dev_to_kill); + + list_for_each_entry_safe(gtp, gtp_next, &gn->gtp_dev_list, list) + gtp_dellink(gtp->dev, dev_to_kill); +diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c +index 0af7db80b2f883..7cfc36cadb5761 100644 +--- a/drivers/net/pse-pd/pd692x0.c ++++ b/drivers/net/pse-pd/pd692x0.c +@@ -999,13 +999,12 @@ static int pd692x0_pi_get_voltage(struct pse_controller_dev *pcdev, int id) + return (buf.sub[0] << 8 | buf.sub[1]) * 100000; + } + +-static int
pd692x0_pi_get_current_limit(struct pse_controller_dev *pcdev, +- int id) ++static int pd692x0_pi_get_pw_limit(struct pse_controller_dev *pcdev, ++ int id) + { + struct pd692x0_priv *priv = to_pd692x0_priv(pcdev); + struct pd692x0_msg msg, buf = {0}; +- int mW, uV, uA, ret; +- s64 tmp_64; ++ int ret; + + msg = pd692x0_msg_template_list[PD692X0_MSG_GET_PORT_PARAM]; + msg.sub[2] = id; +@@ -1013,48 +1012,24 @@ static int pd692x0_pi_get_current_limit(struct pse_controller_dev *pcdev, + if (ret < 0) + return ret; + +- ret = pd692x0_pi_get_pw_from_table(buf.data[2], buf.data[3]); +- if (ret < 0) +- return ret; +- mW = ret; +- +- ret = pd692x0_pi_get_voltage(pcdev, id); +- if (ret < 0) +- return ret; +- uV = ret; +- +- tmp_64 = mW; +- tmp_64 *= 1000000000ull; +- /* uA = mW * 1000000000 / uV */ +- uA = DIV_ROUND_CLOSEST_ULL(tmp_64, uV); +- return uA; ++ return pd692x0_pi_get_pw_from_table(buf.data[0], buf.data[1]); + } + +-static int pd692x0_pi_set_current_limit(struct pse_controller_dev *pcdev, +- int id, int max_uA) ++static int pd692x0_pi_set_pw_limit(struct pse_controller_dev *pcdev, ++ int id, int max_mW) + { + struct pd692x0_priv *priv = to_pd692x0_priv(pcdev); + struct device *dev = &priv->client->dev; + struct pd692x0_msg msg, buf = {0}; +- int uV, ret, mW; +- s64 tmp_64; ++ int ret; + + ret = pd692x0_fw_unavailable(priv); + if (ret) + return ret; + +- ret = pd692x0_pi_get_voltage(pcdev, id); +- if (ret < 0) +- return ret; +- uV = ret; +- + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM]; + msg.sub[2] = id; +- tmp_64 = uV; +- tmp_64 *= max_uA; +- /* mW = uV * uA / 1000000000 */ +- mW = DIV_ROUND_CLOSEST_ULL(tmp_64, 1000000000); +- ret = pd692x0_pi_set_pw_from_table(dev, &msg, mW); ++ ret = pd692x0_pi_set_pw_from_table(dev, &msg, max_mW); + if (ret) + return ret; + +@@ -1068,8 +1043,8 @@ static const struct pse_controller_ops pd692x0_ops = { + .pi_disable = pd692x0_pi_disable, + .pi_is_enabled = pd692x0_pi_is_enabled, + .pi_get_voltage = 
pd692x0_pi_get_voltage, +- .pi_get_current_limit = pd692x0_pi_get_current_limit, +- .pi_set_current_limit = pd692x0_pi_set_current_limit, ++ .pi_get_pw_limit = pd692x0_pi_get_pw_limit, ++ .pi_set_pw_limit = pd692x0_pi_set_pw_limit, + }; + + #define PD692X0_FW_LINE_MAX_SZ 0xff +diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c +index 2906ce173f66cd..bb509d973e914e 100644 +--- a/drivers/net/pse-pd/pse_core.c ++++ b/drivers/net/pse-pd/pse_core.c +@@ -291,32 +291,24 @@ static int pse_pi_get_voltage(struct regulator_dev *rdev) + return ret; + } + +-static int _pse_ethtool_get_status(struct pse_controller_dev *pcdev, +- int id, +- struct netlink_ext_ack *extack, +- struct pse_control_status *status); +- + static int pse_pi_get_current_limit(struct regulator_dev *rdev) + { + struct pse_controller_dev *pcdev = rdev_get_drvdata(rdev); + const struct pse_controller_ops *ops; +- struct netlink_ext_ack extack = {}; +- struct pse_control_status st = {}; +- int id, uV, ret; ++ int id, uV, mW, ret; + s64 tmp_64; + + ops = pcdev->ops; + id = rdev_get_id(rdev); ++ if (!ops->pi_get_pw_limit || !ops->pi_get_voltage) ++ return -EOPNOTSUPP; ++ + mutex_lock(&pcdev->lock); +- if (ops->pi_get_current_limit) { +- ret = ops->pi_get_current_limit(pcdev, id); ++ ret = ops->pi_get_pw_limit(pcdev, id); ++ if (ret < 0) + goto out; +- } ++ mW = ret; + +- /* If pi_get_current_limit() callback not populated get voltage +- * from pi_get_voltage() and power limit from ethtool_get_status() +- * to calculate current limit. 
+- */ + ret = _pse_pi_get_voltage(rdev); + if (!ret) { + dev_err(pcdev->dev, "Voltage null\n"); +@@ -327,16 +319,7 @@ static int pse_pi_get_current_limit(struct regulator_dev *rdev) + goto out; + uV = ret; + +- ret = _pse_ethtool_get_status(pcdev, id, &extack, &st); +- if (ret) +- goto out; +- +- if (!st.c33_avail_pw_limit) { +- ret = -ENODATA; +- goto out; +- } +- +- tmp_64 = st.c33_avail_pw_limit; ++ tmp_64 = mW; + tmp_64 *= 1000000000ull; + /* uA = mW * 1000000000 / uV */ + ret = DIV_ROUND_CLOSEST_ULL(tmp_64, uV); +@@ -351,15 +334,33 @@ static int pse_pi_set_current_limit(struct regulator_dev *rdev, int min_uA, + { + struct pse_controller_dev *pcdev = rdev_get_drvdata(rdev); + const struct pse_controller_ops *ops; +- int id, ret; ++ int id, mW, ret; ++ s64 tmp_64; + + ops = pcdev->ops; +- if (!ops->pi_set_current_limit) ++ if (!ops->pi_set_pw_limit || !ops->pi_get_voltage) + return -EOPNOTSUPP; + ++ if (max_uA > MAX_PI_CURRENT) ++ return -ERANGE; ++ + id = rdev_get_id(rdev); + mutex_lock(&pcdev->lock); +- ret = ops->pi_set_current_limit(pcdev, id, max_uA); ++ ret = _pse_pi_get_voltage(rdev); ++ if (!ret) { ++ dev_err(pcdev->dev, "Voltage null\n"); ++ ret = -ERANGE; ++ goto out; ++ } ++ if (ret < 0) ++ goto out; ++ ++ tmp_64 = ret; ++ tmp_64 *= max_uA; ++ /* mW = uA * uV / 1000000000 */ ++ mW = DIV_ROUND_CLOSEST_ULL(tmp_64, 1000000000); ++ ret = ops->pi_set_pw_limit(pcdev, id, mW); ++out: + mutex_unlock(&pcdev->lock); + + return ret; +@@ -403,11 +404,9 @@ devm_pse_pi_regulator_register(struct pse_controller_dev *pcdev, + + rinit_data->constraints.valid_ops_mask = REGULATOR_CHANGE_STATUS; + +- if (pcdev->ops->pi_set_current_limit) { ++ if (pcdev->ops->pi_set_pw_limit) + rinit_data->constraints.valid_ops_mask |= + REGULATOR_CHANGE_CURRENT; +- rinit_data->constraints.max_uA = MAX_PI_CURRENT; +- } + + rinit_data->supply_regulator = "vpwr"; + +@@ -736,23 +735,6 @@ struct pse_control *of_pse_control_get(struct device_node *node) + } + 
EXPORT_SYMBOL_GPL(of_pse_control_get); + +-static int _pse_ethtool_get_status(struct pse_controller_dev *pcdev, +- int id, +- struct netlink_ext_ack *extack, +- struct pse_control_status *status) +-{ +- const struct pse_controller_ops *ops; +- +- ops = pcdev->ops; +- if (!ops->ethtool_get_status) { +- NL_SET_ERR_MSG(extack, +- "PSE driver does not support status report"); +- return -EOPNOTSUPP; +- } +- +- return ops->ethtool_get_status(pcdev, id, extack, status); +-} +- + /** + * pse_ethtool_get_status - get status of PSE control + * @psec: PSE control pointer +@@ -765,11 +747,21 @@ int pse_ethtool_get_status(struct pse_control *psec, + struct netlink_ext_ack *extack, + struct pse_control_status *status) + { ++ const struct pse_controller_ops *ops; ++ struct pse_controller_dev *pcdev; + int err; + +- mutex_lock(&psec->pcdev->lock); +- err = _pse_ethtool_get_status(psec->pcdev, psec->id, extack, status); +- mutex_unlock(&psec->pcdev->lock); ++ pcdev = psec->pcdev; ++ ops = pcdev->ops; ++ if (!ops->ethtool_get_status) { ++ NL_SET_ERR_MSG(extack, ++ "PSE driver does not support status report"); ++ return -EOPNOTSUPP; ++ } ++ ++ mutex_lock(&pcdev->lock); ++ err = ops->ethtool_get_status(pcdev, psec->id, extack, status); ++ mutex_unlock(&pcdev->lock); + + return err; + } +diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c +index e8930146847af4..b1b46c2713e1cc 100644 +--- a/drivers/nvme/host/ioctl.c ++++ b/drivers/nvme/host/ioctl.c +@@ -283,8 +283,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, + { + if (ns && nsid != ns->head->ns_id) { + dev_err(ctrl->device, +- "%s: nsid (%u) in cmd does not match nsid (%u)" +- "of namespace\n", ++ "%s: nsid (%u) in cmd does not match nsid (%u) of namespace\n", + current->comm, nsid, ns->head->ns_id); + return false; + } +diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c +index 841238f38fddab..d7c193028e7c36 100644 +--- a/drivers/nvme/host/tcp.c ++++ b/drivers/nvme/host/tcp.c +@@ 
-1449,11 +1449,14 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue) + msg.msg_control = cbuf; + msg.msg_controllen = sizeof(cbuf); + } ++ msg.msg_flags = MSG_WAITALL; + ret = kernel_recvmsg(queue->sock, &msg, &iov, 1, + iov.iov_len, msg.msg_flags); +- if (ret < 0) { ++ if (ret < sizeof(*icresp)) { + pr_warn("queue %d: failed to receive icresp, error %d\n", + nvme_tcp_queue_id(queue), ret); ++ if (ret >= 0) ++ ret = -ECONNRESET; + goto free_icresp; + } + ret = -ENOTCONN; +@@ -1565,7 +1568,7 @@ static bool nvme_tcp_poll_queue(struct nvme_tcp_queue *queue) + ctrl->io_queues[HCTX_TYPE_POLL]; + } + +-/** ++/* + * Track the number of queues assigned to each cpu using a global per-cpu + * counter and select the least used cpu from the mq_map. Our goal is to spread + * different controllers I/O threads across different cpu cores. +diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c +index fde6c555af619e..56e3c870ab4c3a 100644 +--- a/drivers/nvme/target/core.c ++++ b/drivers/nvme/target/core.c +@@ -606,6 +606,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns) + goto out_dev_put; + } + ++ if (percpu_ref_init(&ns->ref, nvmet_destroy_namespace, 0, GFP_KERNEL)) ++ goto out_pr_exit; ++ + nvmet_ns_changed(subsys, ns->nsid); + ns->enabled = true; + xa_set_mark(&subsys->namespaces, ns->nsid, NVMET_NS_ENABLED); +@@ -613,6 +616,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns) + out_unlock: + mutex_unlock(&subsys->lock); + return ret; ++out_pr_exit: ++ if (ns->pr.enable) ++ nvmet_pr_exit_ns(ns); + out_dev_put: + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) + pci_dev_put(radix_tree_delete(&ctrl->p2p_ns_map, ns->nsid)); +@@ -638,6 +644,19 @@ void nvmet_ns_disable(struct nvmet_ns *ns) + + mutex_unlock(&subsys->lock); + ++ /* ++ * Now that we removed the namespaces from the lookup list, we ++ * can kill the per_cpu ref and wait for any remaining references ++ * to be dropped, as well as a RCU grace period for anyone only ++ * using the namepace under 
rcu_read_lock(). Note that we can't ++ * use call_rcu here as we need to ensure the namespaces have ++ * been fully destroyed before unloading the module. ++ */ ++ percpu_ref_kill(&ns->ref); ++ synchronize_rcu(); ++ wait_for_completion(&ns->disable_done); ++ percpu_ref_exit(&ns->ref); ++ + if (ns->pr.enable) + nvmet_pr_exit_ns(ns); + +@@ -660,22 +679,6 @@ void nvmet_ns_free(struct nvmet_ns *ns) + if (ns->nsid == subsys->max_nsid) + subsys->max_nsid = nvmet_max_nsid(subsys); + +- mutex_unlock(&subsys->lock); +- +- /* +- * Now that we removed the namespaces from the lookup list, we +- * can kill the per_cpu ref and wait for any remaining references +- * to be dropped, as well as a RCU grace period for anyone only +- * using the namepace under rcu_read_lock(). Note that we can't +- * use call_rcu here as we need to ensure the namespaces have +- * been fully destroyed before unloading the module. +- */ +- percpu_ref_kill(&ns->ref); +- synchronize_rcu(); +- wait_for_completion(&ns->disable_done); +- percpu_ref_exit(&ns->ref); +- +- mutex_lock(&subsys->lock); + subsys->nr_namespaces--; + mutex_unlock(&subsys->lock); + +@@ -705,9 +708,6 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid) + ns->nsid = nsid; + ns->subsys = subsys; + +- if (percpu_ref_init(&ns->ref, nvmet_destroy_namespace, 0, GFP_KERNEL)) +- goto out_free; +- + if (ns->nsid > subsys->max_nsid) + subsys->max_nsid = nsid; + +@@ -730,8 +730,6 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid) + return ns; + out_exit: + subsys->max_nsid = nvmet_max_nsid(subsys); +- percpu_ref_exit(&ns->ref); +-out_free: + kfree(ns); + out_unlock: + mutex_unlock(&subsys->lock); +diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c +index 3b59a86a764b11..1adebcb263bd08 100644 +--- a/drivers/pci/devres.c ++++ b/drivers/pci/devres.c +@@ -411,46 +411,20 @@ static inline bool mask_contains_bar(int mask, int bar) + return mask & BIT(bar); + } + +-/* +- * This is a copy of pci_intx() 
used to bypass the problem of recursive +- * function calls due to the hybrid nature of pci_intx(). +- */ +-static void __pcim_intx(struct pci_dev *pdev, int enable) +-{ +- u16 pci_command, new; +- +- pci_read_config_word(pdev, PCI_COMMAND, &pci_command); +- +- if (enable) +- new = pci_command & ~PCI_COMMAND_INTX_DISABLE; +- else +- new = pci_command | PCI_COMMAND_INTX_DISABLE; +- +- if (new != pci_command) +- pci_write_config_word(pdev, PCI_COMMAND, new); +-} +- + static void pcim_intx_restore(struct device *dev, void *data) + { + struct pci_dev *pdev = to_pci_dev(dev); + struct pcim_intx_devres *res = data; + +- __pcim_intx(pdev, res->orig_intx); ++ pci_intx(pdev, res->orig_intx); + } + +-static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) ++static void save_orig_intx(struct pci_dev *pdev, struct pcim_intx_devres *res) + { +- struct pcim_intx_devres *res; +- +- res = devres_find(dev, pcim_intx_restore, NULL, NULL); +- if (res) +- return res; ++ u16 pci_command; + +- res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL); +- if (res) +- devres_add(dev, res); +- +- return res; ++ pci_read_config_word(pdev, PCI_COMMAND, &pci_command); ++ res->orig_intx = !(pci_command & PCI_COMMAND_INTX_DISABLE); + } + + /** +@@ -466,16 +440,28 @@ static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev) + int pcim_intx(struct pci_dev *pdev, int enable) + { + struct pcim_intx_devres *res; ++ struct device *dev = &pdev->dev; + +- res = get_or_create_intx_devres(&pdev->dev); +- if (!res) +- return -ENOMEM; ++ /* ++ * pcim_intx() must only restore the INTx value that existed before the ++ * driver was loaded, i.e., before it called pcim_intx() for the ++ * first time. 
++ */ ++ res = devres_find(dev, pcim_intx_restore, NULL, NULL); ++ if (!res) { ++ res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL); ++ if (!res) ++ return -ENOMEM; ++ ++ save_orig_intx(pdev, res); ++ devres_add(dev, res); ++ } + +- res->orig_intx = !enable; +- __pcim_intx(pdev, enable); ++ pci_intx(pdev, enable); + + return 0; + } ++EXPORT_SYMBOL_GPL(pcim_intx); + + static void pcim_disable_device(void *pdev_raw) + { +diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c +index 661f98c6c63a39..b0ae4bc1a1bee0 100644 +--- a/drivers/pci/pci.c ++++ b/drivers/pci/pci.c +@@ -4488,11 +4488,6 @@ void pci_disable_parity(struct pci_dev *dev) + * @enable: boolean: whether to enable or disable PCI INTx + * + * Enables/disables PCI INTx for device @pdev +- * +- * NOTE: +- * This is a "hybrid" function: It's normally unmanaged, but becomes managed +- * when pcim_enable_device() has been called in advance. This hybrid feature is +- * DEPRECATED! If you want managed cleanup, use pcim_intx() instead. 
+ */ + void pci_intx(struct pci_dev *pdev, int enable) + { +@@ -4505,15 +4500,10 @@ void pci_intx(struct pci_dev *pdev, int enable) + else + new = pci_command | PCI_COMMAND_INTX_DISABLE; + +- if (new != pci_command) { +- /* Preserve the "hybrid" behavior for backwards compatibility */ +- if (pci_is_managed(pdev)) { +- WARN_ON_ONCE(pcim_intx(pdev, enable) != 0); +- return; +- } ++ if (new == pci_command) ++ return; + +- pci_write_config_word(pdev, PCI_COMMAND, new); +- } ++ pci_write_config_word(pdev, PCI_COMMAND, new); + } + EXPORT_SYMBOL_GPL(pci_intx); + +diff --git a/drivers/platform/cznic/Kconfig b/drivers/platform/cznic/Kconfig +index 49c383eb678541..13e37b49d9d01e 100644 +--- a/drivers/platform/cznic/Kconfig ++++ b/drivers/platform/cznic/Kconfig +@@ -6,6 +6,7 @@ + + menuconfig CZNIC_PLATFORMS + bool "Platform support for CZ.NIC's Turris hardware" ++ depends on ARCH_MVEBU || COMPILE_TEST + help + Say Y here to be able to choose driver support for CZ.NIC's Turris + devices. This option alone does not add any kernel code. +diff --git a/drivers/power/supply/axp20x_battery.c b/drivers/power/supply/axp20x_battery.c +index fa27195f074e7d..3c3158f31a484d 100644 +--- a/drivers/power/supply/axp20x_battery.c ++++ b/drivers/power/supply/axp20x_battery.c +@@ -466,10 +466,9 @@ static int axp717_battery_get_prop(struct power_supply *psy, + + /* + * If a fault is detected it must also be cleared; if the +- * condition persists it should reappear (This is an +- * assumption, it's actually not documented). A restart was +- * not sufficient to clear the bit in testing despite the +- * register listed as POR. ++ * condition persists it should reappear. A restart was not ++ * sufficient to clear the bit in testing despite the register ++ * listed as POR. 
+ */ + case POWER_SUPPLY_PROP_HEALTH: + ret = regmap_read(axp20x_batt->regmap, AXP717_PMU_FAULT, +@@ -480,26 +479,26 @@ static int axp717_battery_get_prop(struct power_supply *psy, + switch (reg & AXP717_BATT_PMU_FAULT_MASK) { + case AXP717_BATT_UVLO_2_5V: + val->intval = POWER_SUPPLY_HEALTH_DEAD; +- regmap_update_bits(axp20x_batt->regmap, +- AXP717_PMU_FAULT, +- AXP717_BATT_UVLO_2_5V, +- AXP717_BATT_UVLO_2_5V); ++ regmap_write_bits(axp20x_batt->regmap, ++ AXP717_PMU_FAULT, ++ AXP717_BATT_UVLO_2_5V, ++ AXP717_BATT_UVLO_2_5V); + return 0; + + case AXP717_BATT_OVER_TEMP: + val->intval = POWER_SUPPLY_HEALTH_HOT; +- regmap_update_bits(axp20x_batt->regmap, +- AXP717_PMU_FAULT, +- AXP717_BATT_OVER_TEMP, +- AXP717_BATT_OVER_TEMP); ++ regmap_write_bits(axp20x_batt->regmap, ++ AXP717_PMU_FAULT, ++ AXP717_BATT_OVER_TEMP, ++ AXP717_BATT_OVER_TEMP); + return 0; + + case AXP717_BATT_UNDER_TEMP: + val->intval = POWER_SUPPLY_HEALTH_COLD; +- regmap_update_bits(axp20x_batt->regmap, +- AXP717_PMU_FAULT, +- AXP717_BATT_UNDER_TEMP, +- AXP717_BATT_UNDER_TEMP); ++ regmap_write_bits(axp20x_batt->regmap, ++ AXP717_PMU_FAULT, ++ AXP717_BATT_UNDER_TEMP, ++ AXP717_BATT_UNDER_TEMP); + return 0; + + default: +diff --git a/drivers/power/supply/da9150-fg.c b/drivers/power/supply/da9150-fg.c +index 652c1f213af1c2..4f28ef1bba1a3c 100644 +--- a/drivers/power/supply/da9150-fg.c ++++ b/drivers/power/supply/da9150-fg.c +@@ -247,9 +247,9 @@ static int da9150_fg_current_avg(struct da9150_fg *fg, + DA9150_QIF_SD_GAIN_SIZE); + da9150_fg_read_sync_end(fg); + +- div = (u64) (sd_gain * shunt_val * 65536ULL); ++ div = 65536ULL * sd_gain * shunt_val; + do_div(div, 1000000); +- res = (u64) (iavg * 1000000ULL); ++ res = 1000000ULL * iavg; + do_div(res, div); + + val->intval = (int) res; +diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c +index e36e3ea165d3b2..2f34761e64135c 100644 +--- a/drivers/s390/net/ism_drv.c ++++ b/drivers/s390/net/ism_drv.c +@@ -588,6 +588,15 @@ static int 
ism_dev_init(struct ism_dev *ism) + return ret; + } + ++static void ism_dev_release(struct device *dev) ++{ ++ struct ism_dev *ism; ++ ++ ism = container_of(dev, struct ism_dev, dev); ++ ++ kfree(ism); ++} ++ + static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) + { + struct ism_dev *ism; +@@ -601,6 +610,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) + dev_set_drvdata(&pdev->dev, ism); + ism->pdev = pdev; + ism->dev.parent = &pdev->dev; ++ ism->dev.release = ism_dev_release; + device_initialize(&ism->dev); + dev_set_name(&ism->dev, dev_name(&pdev->dev)); + ret = device_add(&ism->dev); +@@ -637,7 +647,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) + device_del(&ism->dev); + err_dev: + dev_set_drvdata(&pdev->dev, NULL); +- kfree(ism); ++ put_device(&ism->dev); + + return ret; + } +@@ -682,7 +692,7 @@ static void ism_remove(struct pci_dev *pdev) + pci_disable_device(pdev); + device_del(&ism->dev); + dev_set_drvdata(&pdev->dev, NULL); +- kfree(ism); ++ put_device(&ism->dev); + } + + static struct pci_driver ism_driver = { +diff --git a/drivers/soc/loongson/loongson2_guts.c b/drivers/soc/loongson/loongson2_guts.c +index ae42e3a9127fc1..16913c3ef65ca4 100644 +--- a/drivers/soc/loongson/loongson2_guts.c ++++ b/drivers/soc/loongson/loongson2_guts.c +@@ -114,8 +114,11 @@ static int loongson2_guts_probe(struct platform_device *pdev) + if (of_property_read_string(root, "model", &machine)) + of_property_read_string_index(root, "compatible", 0, &machine); + of_node_put(root); +- if (machine) ++ if (machine) { + soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL); ++ if (!soc_dev_attr.machine) ++ return -ENOMEM; ++ } + + svr = loongson2_guts_get_svr(); + soc_die = loongson2_soc_die_match(svr, loongson2_soc_die); +diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c +index 322a543b8c278a..d0f397c9024201 100644 +--- a/drivers/tee/optee/supp.c ++++ 
b/drivers/tee/optee/supp.c +@@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params, + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct optee_supp *supp = &optee->supp; + struct optee_supp_req *req; +- bool interruptable; + u32 ret; + + /* +@@ -111,36 +110,18 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params, + /* + * Wait for supplicant to process and return result, once we've + * returned from wait_for_completion(&req->c) successfully we have +- * exclusive access again. ++ * exclusive access again. Allow the wait to be killable such that ++ * the wait doesn't turn into an indefinite state if the supplicant ++ * gets hung for some reason. + */ +- while (wait_for_completion_interruptible(&req->c)) { ++ if (wait_for_completion_killable(&req->c)) { + mutex_lock(&supp->mutex); +- interruptable = !supp->ctx; +- if (interruptable) { +- /* +- * There's no supplicant available and since the +- * supp->mutex currently is held none can +- * become available until the mutex released +- * again. +- * +- * Interrupting an RPC to supplicant is only +- * allowed as a way of slightly improving the user +- * experience in case the supplicant hasn't been +- * started yet. During normal operation the supplicant +- * will serve all requests in a timely manner and +- * interrupting then wouldn't make sense. 
+- */ +- if (req->in_queue) { +- list_del(&req->link); +- req->in_queue = false; +- } ++ if (req->in_queue) { ++ list_del(&req->link); ++ req->in_queue = false; + } + mutex_unlock(&supp->mutex); +- +- if (interruptable) { +- req->ret = TEEC_ERROR_COMMUNICATION; +- break; +- } ++ req->ret = TEEC_ERROR_COMMUNICATION; + } + + ret = req->ret; +diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c +index 0050d6253c05d1..1a050ec9912cb8 100644 +--- a/drivers/tty/serial/sh-sci.c ++++ b/drivers/tty/serial/sh-sci.c +@@ -166,6 +166,7 @@ static struct sci_port sci_ports[SCI_NPORTS]; + static unsigned long sci_ports_in_use; + static struct uart_driver sci_uart_driver; + static bool sci_uart_earlycon; ++static bool sci_uart_earlycon_dev_probing; + + static inline struct sci_port * + to_sci_port(struct uart_port *uart) +@@ -3057,10 +3058,6 @@ static int sci_init_single(struct platform_device *dev, + ret = sci_init_clocks(sci_port, &dev->dev); + if (ret < 0) + return ret; +- +- port->dev = &dev->dev; +- +- pm_runtime_enable(&dev->dev); + } + + port->type = p->type; +@@ -3087,11 +3084,6 @@ static int sci_init_single(struct platform_device *dev, + return 0; + } + +-static void sci_cleanup_single(struct sci_port *port) +-{ +- pm_runtime_disable(port->port.dev); +-} +- + #if defined(CONFIG_SERIAL_SH_SCI_CONSOLE) || \ + defined(CONFIG_SERIAL_SH_SCI_EARLYCON) + static void serial_console_putchar(struct uart_port *port, unsigned char ch) +@@ -3261,8 +3253,6 @@ static void sci_remove(struct platform_device *dev) + sci_ports_in_use &= ~BIT(port->port.line); + uart_remove_one_port(&sci_uart_driver, &port->port); + +- sci_cleanup_single(port); +- + if (port->port.fifosize > 1) + device_remove_file(&dev->dev, &dev_attr_rx_fifo_trigger); + if (type == PORT_SCIFA || type == PORT_SCIFB || type == PORT_HSCIF) +@@ -3397,7 +3387,8 @@ static struct plat_sci_port *sci_parse_dt(struct platform_device *pdev, + static int sci_probe_single(struct platform_device *dev, + unsigned int 
index, + struct plat_sci_port *p, +- struct sci_port *sciport) ++ struct sci_port *sciport, ++ struct resource *sci_res) + { + int ret; + +@@ -3426,6 +3417,11 @@ static int sci_probe_single(struct platform_device *dev, + if (ret) + return ret; + ++ sciport->port.dev = &dev->dev; ++ ret = devm_pm_runtime_enable(&dev->dev); ++ if (ret) ++ return ret; ++ + sciport->gpios = mctrl_gpio_init(&sciport->port, 0); + if (IS_ERR(sciport->gpios)) + return PTR_ERR(sciport->gpios); +@@ -3439,13 +3435,31 @@ static int sci_probe_single(struct platform_device *dev, + sciport->port.flags |= UPF_HARD_FLOW; + } + +- ret = uart_add_one_port(&sci_uart_driver, &sciport->port); +- if (ret) { +- sci_cleanup_single(sciport); +- return ret; ++ if (sci_uart_earlycon && sci_ports[0].port.mapbase == sci_res->start) { ++ /* ++ * In case: ++ * - this is the earlycon port (mapped on index 0 in sci_ports[]) and ++ * - it now maps to an alias other than zero and ++ * - the earlycon is still alive (e.g., "earlycon keep_bootcon" is ++ * available in bootargs) ++ * ++ * we need to avoid disabling clocks and PM domains through the runtime ++ * PM APIs called in __device_attach(). For this, increment the runtime ++ * PM reference counter (the clocks and PM domains were already enabled ++ * by the bootloader). Otherwise the earlycon may access the HW when it ++ * has no clocks enabled leading to failures (infinite loop in ++ * sci_poll_put_char()). ++ */ ++ pm_runtime_get_noresume(&dev->dev); ++ ++ /* ++ * Skip cleanup the sci_port[0] in early_console_exit(), this ++ * port is the same as the earlycon one. 
++ */ ++ sci_uart_earlycon_dev_probing = true; + } + +- return 0; ++ return uart_add_one_port(&sci_uart_driver, &sciport->port); + } + + static int sci_probe(struct platform_device *dev) +@@ -3503,7 +3517,7 @@ static int sci_probe(struct platform_device *dev) + + platform_set_drvdata(dev, sp); + +- ret = sci_probe_single(dev, dev_id, p, sp); ++ ret = sci_probe_single(dev, dev_id, p, sp, res); + if (ret) + return ret; + +@@ -3586,6 +3600,22 @@ sh_early_platform_init_buffer("earlyprintk", &sci_driver, + #ifdef CONFIG_SERIAL_SH_SCI_EARLYCON + static struct plat_sci_port port_cfg; + ++static int early_console_exit(struct console *co) ++{ ++ struct sci_port *sci_port = &sci_ports[0]; ++ ++ /* ++ * Clean the slot used by earlycon. A new SCI device might ++ * map to this slot. ++ */ ++ if (!sci_uart_earlycon_dev_probing) { ++ memset(sci_port, 0, sizeof(*sci_port)); ++ sci_uart_earlycon = false; ++ } ++ ++ return 0; ++} ++ + static int __init early_console_setup(struct earlycon_device *device, + int type) + { +@@ -3603,6 +3633,8 @@ static int __init early_console_setup(struct earlycon_device *device, + SCSCR_RE | SCSCR_TE | port_cfg.scscr); + + device->con->write = serial_console_write; ++ device->con->exit = early_console_exit; ++ + return 0; + } + static int __init sci_early_console_setup(struct earlycon_device *device, +diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c +index 47260d65066a89..da82598fcef8a8 100644 +--- a/drivers/usb/gadget/function/f_midi.c ++++ b/drivers/usb/gadget/function/f_midi.c +@@ -283,7 +283,7 @@ f_midi_complete(struct usb_ep *ep, struct usb_request *req) + /* Our transmit completed. See if there's more to go. + * f_midi_transmit eats req, don't queue it again. 
*/ + req->length = 0; +- f_midi_transmit(midi); ++ queue_work(system_highpri_wq, &midi->work); + return; + } + break; +diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c +index d14ecbe24d7754..0dd24d12898638 100644 +--- a/fs/btrfs/extent_io.c ++++ b/fs/btrfs/extent_io.c +@@ -1145,14 +1145,19 @@ static bool find_next_delalloc_bitmap(struct folio *folio, + } + + /* +- * helper for extent_writepage(), doing all of the delayed allocation setup. ++ * Do all of the delayed allocation setup. + * +- * This returns 1 if btrfs_run_delalloc_range function did all the work required +- * to write the page (copy into inline extent). In this case the IO has +- * been started and the page is already unlocked. ++ * Return >0 if all the dirty blocks are submitted async (compression) or inlined. ++ * The @folio should no longer be touched (treat it as already unlocked). + * +- * This returns 0 if all went well (page still locked) +- * This returns < 0 if there were errors (page still locked) ++ * Return 0 if there is still dirty block that needs to be submitted through ++ * extent_writepage_io(). ++ * bio_ctrl->submit_bitmap will indicate which blocks of the folio should be ++ * submitted, and @folio is still kept locked. ++ * ++ * Return <0 if there is any error hit. ++ * Any allocated ordered extent range covering this folio will be marked ++ * finished (IOERR), and @folio is still kept locked. + */ + static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, + struct folio *folio, +@@ -1170,6 +1175,16 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, + * last delalloc end. + */ + u64 last_delalloc_end = 0; ++ /* ++ * The range end (exclusive) of the last successfully finished delalloc ++ * range. ++ * Any range covered by ordered extent must either be manually marked ++ * finished (error handling), or has IO submitted (and finish the ++ * ordered extent normally). 
++ * ++ * This records the end of ordered extent cleanup if we hit an error. ++ */ ++ u64 last_finished_delalloc_end = page_start; + u64 delalloc_start = page_start; + u64 delalloc_end = page_end; + u64 delalloc_to_write = 0; +@@ -1238,11 +1253,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, + found_len = last_delalloc_end + 1 - found_start; + + if (ret >= 0) { ++ /* ++ * Some delalloc range may be created by previous folios. ++ * Thus we still need to clean up this range during error ++ * handling. ++ */ ++ last_finished_delalloc_end = found_start; + /* No errors hit so far, run the current delalloc range. */ + ret = btrfs_run_delalloc_range(inode, folio, + found_start, + found_start + found_len - 1, + wbc); ++ if (ret >= 0) ++ last_finished_delalloc_end = found_start + found_len; + } else { + /* + * We've hit an error during previous delalloc range, +@@ -1277,8 +1300,22 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode, + + delalloc_start = found_start + found_len; + } +- if (ret < 0) ++ /* ++ * It's possible we had some ordered extents created before we hit ++ * an error, cleanup non-async successfully created delalloc ranges. 
++ */ ++ if (unlikely(ret < 0)) { ++ unsigned int bitmap_size = min( ++ (last_finished_delalloc_end - page_start) >> ++ fs_info->sectorsize_bits, ++ fs_info->sectors_per_page); ++ ++ for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size) ++ btrfs_mark_ordered_io_finished(inode, folio, ++ page_start + (bit << fs_info->sectorsize_bits), ++ fs_info->sectorsize, false); + return ret; ++ } + out: + if (last_delalloc_end) + delalloc_end = last_delalloc_end; +@@ -1394,6 +1431,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, + struct btrfs_fs_info *fs_info = inode->root->fs_info; + unsigned long range_bitmap = 0; + bool submitted_io = false; ++ bool error = false; + const u64 folio_start = folio_pos(folio); + u64 cur; + int bit; +@@ -1436,11 +1474,26 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, + break; + } + ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size); +- if (ret < 0) +- goto out; ++ if (unlikely(ret < 0)) { ++ /* ++ * bio_ctrl may contain a bio crossing several folios. ++ * Submit it immediately so that the bio has a chance ++ * to finish normally, other than marked as error. ++ */ ++ submit_one_bio(bio_ctrl); ++ /* ++ * Failed to grab the extent map which should be very rare. ++ * Since there is no bio submitted to finish the ordered ++ * extent, we have to manually finish this sector. ++ */ ++ btrfs_mark_ordered_io_finished(inode, folio, cur, ++ fs_info->sectorsize, false); ++ error = true; ++ continue; ++ } + submitted_io = true; + } +-out: ++ + /* + * If we didn't submitted any sector (>= i_size), folio dirty get + * cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared +@@ -1448,8 +1501,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, + * + * Here we set writeback and clear for the range. If the full folio + * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag. 
++ * ++ * If we hit any error, the corresponding sector will still be dirty ++ * thus no need to clear PAGECACHE_TAG_DIRTY. + */ +- if (!submitted_io) { ++ if (!submitted_io && !error) { + btrfs_folio_set_writeback(fs_info, folio, start, len); + btrfs_folio_clear_writeback(fs_info, folio, start, len); + } +@@ -1467,15 +1523,14 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode, + */ + static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl) + { +- struct inode *inode = folio->mapping->host; +- struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); +- const u64 page_start = folio_pos(folio); ++ struct btrfs_inode *inode = BTRFS_I(folio->mapping->host); ++ struct btrfs_fs_info *fs_info = inode->root->fs_info; + int ret; + size_t pg_offset; +- loff_t i_size = i_size_read(inode); ++ loff_t i_size = i_size_read(&inode->vfs_inode); + unsigned long end_index = i_size >> PAGE_SHIFT; + +- trace_extent_writepage(folio, inode, bio_ctrl->wbc); ++ trace_extent_writepage(folio, &inode->vfs_inode, bio_ctrl->wbc); + + WARN_ON(!folio_test_locked(folio)); + +@@ -1499,13 +1554,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl + if (ret < 0) + goto done; + +- ret = writepage_delalloc(BTRFS_I(inode), folio, bio_ctrl); ++ ret = writepage_delalloc(inode, folio, bio_ctrl); + if (ret == 1) + return 0; + if (ret) + goto done; + +- ret = extent_writepage_io(BTRFS_I(inode), folio, folio_pos(folio), ++ ret = extent_writepage_io(inode, folio, folio_pos(folio), + PAGE_SIZE, bio_ctrl, i_size); + if (ret == 1) + return 0; +@@ -1513,12 +1568,8 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl + bio_ctrl->wbc->nr_to_write--; + + done: +- if (ret) { +- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, +- page_start, PAGE_SIZE, !ret); ++ if (ret < 0) + mapping_set_error(folio->mapping, ret); +- } +- + /* + * Only unlock ranges that are submitted. 
As there can be some async + * submitted ranges inside the folio. +@@ -2295,11 +2346,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f + if (ret == 1) + goto next_page; + +- if (ret) { +- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio, +- cur, cur_len, !ret); ++ if (ret) + mapping_set_error(mapping, ret); +- } + btrfs_folio_end_lock(fs_info, folio, cur, cur_len); + if (ret < 0) + found_error = true; +diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c +index d1c8f6730a5687..b4160b1c77573d 100644 +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -2385,8 +2385,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol + + out: + if (ret < 0) +- btrfs_cleanup_ordered_extents(inode, locked_folio, start, +- end - start + 1); ++ btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1); + return ret; + } + +diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c +index f146e06c97eb69..0d149b315a832e 100644 +--- a/fs/smb/client/inode.c ++++ b/fs/smb/client/inode.c +@@ -1403,7 +1403,7 @@ int cifs_get_inode_info(struct inode **inode, + struct cifs_fattr fattr = {}; + int rc; + +- if (is_inode_cache_good(*inode)) { ++ if (!data && is_inode_cache_good(*inode)) { + cifs_dbg(FYI, "No need to revalidate cached inode sizes\n"); + return 0; + } +@@ -1502,7 +1502,7 @@ int smb311_posix_get_inode_info(struct inode **inode, + struct cifs_fattr fattr = {}; + int rc; + +- if (is_inode_cache_good(*inode)) { ++ if (!data && is_inode_cache_good(*inode)) { + cifs_dbg(FYI, "No need to revalidate cached inode sizes\n"); + return 0; + } +diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c +index a588f6b3f3b6a5..793e9b2b79d6f9 100644 +--- a/fs/smb/client/smb2ops.c ++++ b/fs/smb/client/smb2ops.c +@@ -4964,6 +4964,10 @@ receive_encrypted_standard(struct TCP_Server_Info *server, + next_buffer = (char *)cifs_buf_get(); + else + next_buffer = (char *)cifs_small_buf_get(); ++ if (!next_buffer) { ++ 
cifs_server_dbg(VFS, "No memory for (large) SMB response\n"); ++ return -1; ++ } + memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd); + } + +diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h +index 9ff3cafd867962..1182c6fa61807d 100644 +--- a/fs/xfs/scrub/common.h ++++ b/fs/xfs/scrub/common.h +@@ -212,7 +212,6 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm) + bool xchk_dir_looks_zapped(struct xfs_inode *dp); + bool xchk_pptr_looks_zapped(struct xfs_inode *ip); + +-#ifdef CONFIG_XFS_ONLINE_REPAIR + /* Decide if a repair is required. */ + static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm) + { +@@ -232,10 +231,6 @@ static inline bool xchk_could_repair(const struct xfs_scrub *sc) + return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && + !(sc->flags & XREP_ALREADY_FIXED); + } +-#else +-# define xchk_needs_repair(sc) (false) +-# define xchk_could_repair(sc) (false) +-#endif /* CONFIG_XFS_ONLINE_REPAIR */ + + int xchk_metadata_inode_forks(struct xfs_scrub *sc); + +diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h +index b649da1a93eb8c..b3b1fe62814e7b 100644 +--- a/fs/xfs/scrub/repair.h ++++ b/fs/xfs/scrub/repair.h +@@ -173,7 +173,16 @@ bool xrep_buf_verify_struct(struct xfs_buf *bp, const struct xfs_buf_ops *ops); + #else + + #define xrep_ino_dqattach(sc) (0) +-#define xrep_will_attempt(sc) (false) ++ ++/* ++ * When online repair is not built into the kernel, we still want to attempt ++ * the repair so that the stub xrep_attempt below will return EOPNOTSUPP. 
++ */ ++static inline bool xrep_will_attempt(const struct xfs_scrub *sc) ++{ ++ return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) || ++ xchk_needs_repair(sc->sm); ++} + + static inline int + xrep_attempt( +diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c +index 950f5a58dcd967..4ba02a490eface 100644 +--- a/fs/xfs/scrub/scrub.c ++++ b/fs/xfs/scrub/scrub.c +@@ -149,6 +149,18 @@ xchk_probe( + if (xchk_should_terminate(sc, &error)) + return error; + ++ /* ++ * If the caller is probing to see if repair works but repair isn't ++ * built into the kernel, return EOPNOTSUPP because that's the signal ++ * that userspace expects. If online repair is built in, set the ++ * CORRUPT flag (without any of the usual tracing/logging) to force us ++ * into xrep_probe. ++ */ ++ if (xchk_could_repair(sc)) { ++ if (!IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)) ++ return -EOPNOTSUPP; ++ sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT; ++ } + return 0; + } + +diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h +index 332cee28566208..14fc1b39c0cf3e 100644 +--- a/include/linux/mm_types.h ++++ b/include/linux/mm_types.h +@@ -873,10 +873,11 @@ struct mm_struct { + */ + unsigned int nr_cpus_allowed; + /** +- * @max_nr_cid: Maximum number of concurrency IDs allocated. ++ * @max_nr_cid: Maximum number of allowed concurrency ++ * IDs allocated. + * +- * Track the highest number of concurrency IDs allocated for the +- * mm. ++ * Track the highest number of allowed concurrency IDs ++ * allocated for the mm. 
+ */ + atomic_t max_nr_cid; + /** +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 8268be0723eee9..bb71ad82b42ba8 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -3138,6 +3138,8 @@ static inline struct net_device *first_net_device_rcu(struct net *net) + } + + int netdev_boot_setup_check(struct net_device *dev); ++struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type, ++ const char *hwaddr); + struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type, + const char *hwaddr); + struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type); +diff --git a/include/linux/pci.h b/include/linux/pci.h +index db9b47ce3eefdc..f05903dd7695ef 100644 +--- a/include/linux/pci.h ++++ b/include/linux/pci.h +@@ -2297,6 +2297,7 @@ static inline void pci_fixup_device(enum pci_fixup_pass pass, + struct pci_dev *dev) { } + #endif + ++int pcim_intx(struct pci_dev *pdev, int enabled); + int pcim_request_all_regions(struct pci_dev *pdev, const char *name); + void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen); + void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar, +diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h +index 591a53e082e650..df1592022d938e 100644 +--- a/include/linux/pse-pd/pse.h ++++ b/include/linux/pse-pd/pse.h +@@ -75,12 +75,8 @@ struct pse_control_status { + * @pi_disable: Configure the PSE PI as disabled. + * @pi_get_voltage: Return voltage similarly to get_voltage regulator + * callback. +- * @pi_get_current_limit: Get the configured current limit similarly to +- * get_current_limit regulator callback. +- * @pi_set_current_limit: Configure the current limit similarly to +- * set_current_limit regulator callback. +- * Should not return an error in case of MAX_PI_CURRENT +- * current value set. ++ * @pi_get_pw_limit: Get the configured power limit of the PSE PI. 
++ * @pi_set_pw_limit: Configure the power limit of the PSE PI. + */ + struct pse_controller_ops { + int (*ethtool_get_status)(struct pse_controller_dev *pcdev, +@@ -91,10 +87,10 @@ struct pse_controller_ops { + int (*pi_enable)(struct pse_controller_dev *pcdev, int id); + int (*pi_disable)(struct pse_controller_dev *pcdev, int id); + int (*pi_get_voltage)(struct pse_controller_dev *pcdev, int id); +- int (*pi_get_current_limit)(struct pse_controller_dev *pcdev, +- int id); +- int (*pi_set_current_limit)(struct pse_controller_dev *pcdev, +- int id, int max_uA); ++ int (*pi_get_pw_limit)(struct pse_controller_dev *pcdev, ++ int id); ++ int (*pi_set_pw_limit)(struct pse_controller_dev *pcdev, ++ int id, int max_mW); + }; + + struct module; +diff --git a/include/net/gro.h b/include/net/gro.h +index b9b58c1f8d190b..7b548f91754bf3 100644 +--- a/include/net/gro.h ++++ b/include/net/gro.h +@@ -11,6 +11,9 @@ + #include + #include + ++/* This should be increased if a protocol with a bigger head is added. 
*/ ++#define GRO_MAX_HEAD (MAX_HEADER + 128) ++ + struct napi_gro_cb { + union { + struct { +diff --git a/include/net/tcp.h b/include/net/tcp.h +index e9b37b76e894bb..bc04599547c36d 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -41,6 +41,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -683,6 +684,19 @@ void tcp_fin(struct sock *sk); + void tcp_check_space(struct sock *sk); + void tcp_sack_compress_send_ack(struct sock *sk); + ++static inline void tcp_cleanup_skb(struct sk_buff *skb) ++{ ++ skb_dst_drop(skb); ++ secpath_reset(skb); ++} ++ ++static inline void tcp_add_receive_queue(struct sock *sk, struct sk_buff *skb) ++{ ++ DEBUG_NET_WARN_ON_ONCE(skb_dst(skb)); ++ DEBUG_NET_WARN_ON_ONCE(secpath_exists(skb)); ++ __skb_queue_tail(&sk->sk_receive_queue, skb); ++} ++ + /* tcp_timer.c */ + void tcp_init_xmit_timers(struct sock *); + static inline void tcp_clear_xmit_timers(struct sock *sk) +diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c +index d062c5c69211ba..0b0dfef9348036 100644 +--- a/io_uring/io_uring.c ++++ b/io_uring/io_uring.c +@@ -2045,6 +2045,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + req->opcode = 0; + return io_init_fail_req(req, -EINVAL); + } ++ opcode = array_index_nospec(opcode, IORING_OP_LAST); ++ + def = &io_issue_defs[opcode]; + if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) { + /* enforce forwards compatibility on users */ +diff --git a/io_uring/rw.c b/io_uring/rw.c +index 29bb3010f9c06d..64322f463c2bd4 100644 +--- a/io_uring/rw.c ++++ b/io_uring/rw.c +@@ -866,7 +866,15 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags) + if (unlikely(ret)) + return ret; + +- ret = io_iter_do_read(rw, &io->iter); ++ if (unlikely(req->opcode == IORING_OP_READ_MULTISHOT)) { ++ void *cb_copy = rw->kiocb.ki_complete; ++ ++ rw->kiocb.ki_complete = NULL; ++ ret = io_iter_do_read(rw, &io->iter); ++ rw->kiocb.ki_complete = cb_copy; ++ } else { ++ ret = io_iter_do_read(rw, 
&io->iter); ++ } + + /* + * Some file systems like to return -EOPNOTSUPP for an IOCB_NOWAIT +@@ -891,7 +899,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags) + } else if (ret == -EIOCBQUEUED) { + return IOU_ISSUE_SKIP_COMPLETE; + } else if (ret == req->cqe.res || ret <= 0 || !force_nonblock || +- (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) { ++ (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) || ++ (issue_flags & IO_URING_F_MULTISHOT)) { + /* read all, failed, already did sync or don't want to retry */ + goto done; + } +diff --git a/kernel/acct.c b/kernel/acct.c +index 179848ad33e978..d9d55fa4d01a71 100644 +--- a/kernel/acct.c ++++ b/kernel/acct.c +@@ -103,48 +103,50 @@ struct bsd_acct_struct { + atomic_long_t count; + struct rcu_head rcu; + struct mutex lock; +- int active; ++ bool active; ++ bool check_space; + unsigned long needcheck; + struct file *file; + struct pid_namespace *ns; + struct work_struct work; + struct completion done; ++ acct_t ac; + }; + +-static void do_acct_process(struct bsd_acct_struct *acct); ++static void fill_ac(struct bsd_acct_struct *acct); ++static void acct_write_process(struct bsd_acct_struct *acct); + + /* + * Check the amount of free space and suspend/resume accordingly. 
+ */ +-static int check_free_space(struct bsd_acct_struct *acct) ++static bool check_free_space(struct bsd_acct_struct *acct) + { + struct kstatfs sbuf; + +- if (time_is_after_jiffies(acct->needcheck)) +- goto out; ++ if (!acct->check_space) ++ return acct->active; + + /* May block */ + if (vfs_statfs(&acct->file->f_path, &sbuf)) +- goto out; ++ return acct->active; + + if (acct->active) { + u64 suspend = sbuf.f_blocks * SUSPEND; + do_div(suspend, 100); + if (sbuf.f_bavail <= suspend) { +- acct->active = 0; ++ acct->active = false; + pr_info("Process accounting paused\n"); + } + } else { + u64 resume = sbuf.f_blocks * RESUME; + do_div(resume, 100); + if (sbuf.f_bavail >= resume) { +- acct->active = 1; ++ acct->active = true; + pr_info("Process accounting resumed\n"); + } + } + + acct->needcheck = jiffies + ACCT_TIMEOUT*HZ; +-out: + return acct->active; + } + +@@ -189,7 +191,11 @@ static void acct_pin_kill(struct fs_pin *pin) + { + struct bsd_acct_struct *acct = to_acct(pin); + mutex_lock(&acct->lock); +- do_acct_process(acct); ++ /* ++ * Fill the accounting struct with the exiting task's info ++ * before punting to the workqueue. ++ */ ++ fill_ac(acct); + schedule_work(&acct->work); + wait_for_completion(&acct->done); + cmpxchg(&acct->ns->bacct, pin, NULL); +@@ -202,6 +208,9 @@ static void close_work(struct work_struct *work) + { + struct bsd_acct_struct *acct = container_of(work, struct bsd_acct_struct, work); + struct file *file = acct->file; ++ ++ /* We were fired by acct_pin_kill() which holds acct->lock. */ ++ acct_write_process(acct); + if (file->f_op->flush) + file->f_op->flush(file, NULL); + __fput_sync(file); +@@ -234,6 +243,20 @@ static int acct_on(struct filename *pathname) + return -EACCES; + } + ++ /* Exclude kernel kernel internal filesystems. */ ++ if (file_inode(file)->i_sb->s_flags & (SB_NOUSER | SB_KERNMOUNT)) { ++ kfree(acct); ++ filp_close(file, NULL); ++ return -EINVAL; ++ } ++ ++ /* Exclude procfs and sysfs. 
*/ ++ if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE) { ++ kfree(acct); ++ filp_close(file, NULL); ++ return -EINVAL; ++ } ++ + if (!(file->f_mode & FMODE_CAN_WRITE)) { + kfree(acct); + filp_close(file, NULL); +@@ -430,13 +453,27 @@ static u32 encode_float(u64 value) + * do_exit() or when switching to a different output file. + */ + +-static void fill_ac(acct_t *ac) ++static void fill_ac(struct bsd_acct_struct *acct) + { + struct pacct_struct *pacct = ¤t->signal->pacct; ++ struct file *file = acct->file; ++ acct_t *ac = &acct->ac; + u64 elapsed, run_time; + time64_t btime; + struct tty_struct *tty; + ++ lockdep_assert_held(&acct->lock); ++ ++ if (time_is_after_jiffies(acct->needcheck)) { ++ acct->check_space = false; ++ ++ /* Don't fill in @ac if nothing will be written. */ ++ if (!acct->active) ++ return; ++ } else { ++ acct->check_space = true; ++ } ++ + /* + * Fill the accounting struct with the needed info as recorded + * by the different kernel functions. +@@ -484,64 +521,61 @@ static void fill_ac(acct_t *ac) + ac->ac_majflt = encode_comp_t(pacct->ac_majflt); + ac->ac_exitcode = pacct->ac_exitcode; + spin_unlock_irq(¤t->sighand->siglock); +-} +-/* +- * do_acct_process does all actual work. Caller holds the reference to file. +- */ +-static void do_acct_process(struct bsd_acct_struct *acct) +-{ +- acct_t ac; +- unsigned long flim; +- const struct cred *orig_cred; +- struct file *file = acct->file; + +- /* +- * Accounting records are not subject to resource limits. +- */ +- flim = rlimit(RLIMIT_FSIZE); +- current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY; +- /* Perform file operations on behalf of whoever enabled accounting */ +- orig_cred = override_creds(file->f_cred); +- +- /* +- * First check to see if there is enough free_space to continue +- * the process accounting system. 
+- */ +- if (!check_free_space(acct)) +- goto out; +- +- fill_ac(&ac); + /* we really need to bite the bullet and change layout */ +- ac.ac_uid = from_kuid_munged(file->f_cred->user_ns, orig_cred->uid); +- ac.ac_gid = from_kgid_munged(file->f_cred->user_ns, orig_cred->gid); ++ ac->ac_uid = from_kuid_munged(file->f_cred->user_ns, current_uid()); ++ ac->ac_gid = from_kgid_munged(file->f_cred->user_ns, current_gid()); + #if ACCT_VERSION == 1 || ACCT_VERSION == 2 + /* backward-compatible 16 bit fields */ +- ac.ac_uid16 = ac.ac_uid; +- ac.ac_gid16 = ac.ac_gid; ++ ac->ac_uid16 = ac->ac_uid; ++ ac->ac_gid16 = ac->ac_gid; + #elif ACCT_VERSION == 3 + { + struct pid_namespace *ns = acct->ns; + +- ac.ac_pid = task_tgid_nr_ns(current, ns); ++ ac->ac_pid = task_tgid_nr_ns(current, ns); + rcu_read_lock(); +- ac.ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), +- ns); ++ ac->ac_ppid = task_tgid_nr_ns(rcu_dereference(current->real_parent), ns); + rcu_read_unlock(); + } + #endif ++} ++ ++static void acct_write_process(struct bsd_acct_struct *acct) ++{ ++ struct file *file = acct->file; ++ const struct cred *cred; ++ acct_t *ac = &acct->ac; ++ ++ /* Perform file operations on behalf of whoever enabled accounting */ ++ cred = override_creds(file->f_cred); ++ + /* +- * Get freeze protection. If the fs is frozen, just skip the write +- * as we could deadlock the system otherwise. ++ * First check to see if there is enough free_space to continue ++ * the process accounting system. Then get freeze protection. If ++ * the fs is frozen, just skip the write as we could deadlock ++ * the system otherwise. 
+ */ +- if (file_start_write_trylock(file)) { ++ if (check_free_space(acct) && file_start_write_trylock(file)) { + /* it's been opened O_APPEND, so position is irrelevant */ + loff_t pos = 0; +- __kernel_write(file, &ac, sizeof(acct_t), &pos); ++ __kernel_write(file, ac, sizeof(acct_t), &pos); + file_end_write(file); + } +-out: ++ ++ revert_creds(cred); ++} ++ ++static void do_acct_process(struct bsd_acct_struct *acct) ++{ ++ unsigned long flim; ++ ++ /* Accounting records are not subject to resource limits. */ ++ flim = rlimit(RLIMIT_FSIZE); ++ current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY; ++ fill_ac(acct); ++ acct_write_process(acct); + current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim; +- revert_creds(orig_cred); + } + + /** +diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c +index 8caf56a308d964..eac5d1edefe97b 100644 +--- a/kernel/bpf/arena.c ++++ b/kernel/bpf/arena.c +@@ -39,7 +39,7 @@ + */ + + /* number of bytes addressable by LDX/STX insn with 16-bit 'off' field */ +-#define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8) ++#define GUARD_SZ round_up(1ull << sizeof_field(struct bpf_insn, off) * 8, PAGE_SIZE << 1) + #define KERN_VM_SZ (SZ_4G + GUARD_SZ) + + struct bpf_arena { +diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c +index 20f05de92e9c3d..7996fcea3755ec 100644 +--- a/kernel/bpf/bpf_cgrp_storage.c ++++ b/kernel/bpf/bpf_cgrp_storage.c +@@ -154,7 +154,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr) + + static void cgroup_storage_map_free(struct bpf_map *map) + { +- bpf_local_storage_map_free(map, &cgroup_cache, NULL); ++ bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy); + } + + /* *gfp_flags* is a hidden argument provided by the verifier */ +diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c +index 10d0975deadabe..c89604e6b6aabd 100644 +--- a/kernel/bpf/btf.c ++++ b/kernel/bpf/btf.c +@@ -6507,6 +6507,8 @@ static const struct bpf_raw_tp_null_args 
raw_tp_null_args[] = { + /* rxrpc */ + { "rxrpc_recvdata", 0x1 }, + { "rxrpc_resend", 0x10 }, ++ /* skb */ ++ {"kfree_skb", 0x1000}, + /* sunrpc */ + { "xs_stream_read_data", 0x1 }, + /* ... from xprt_cong_event event class */ +diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c +index e1cfe890e0be64..1499d8caa9a351 100644 +--- a/kernel/bpf/ringbuf.c ++++ b/kernel/bpf/ringbuf.c +@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma + /* allow writable mapping for the consumer_pos only */ + if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) + return -EPERM; +- } else { +- vm_flags_clear(vma, VM_MAYWRITE); + } + /* remap_vmalloc_range() checks size and offset constraints */ + return remap_vmalloc_range(vma, rb_map->rb, +@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma + * position, and the ring buffer data itself. + */ + return -EPERM; +- } else { +- vm_flags_clear(vma, VM_MAYWRITE); + } + /* remap_vmalloc_range() checks size and offset constraints */ + return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF); +diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c +index 5684e8ce132d54..36cb18b73e7251 100644 +--- a/kernel/bpf/syscall.c ++++ b/kernel/bpf/syscall.c +@@ -1031,7 +1031,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = { + static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma) + { + struct bpf_map *map = filp->private_data; +- int err; ++ int err = 0; + + if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record)) + return -ENOTSUPP; +@@ -1055,24 +1055,33 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma) + err = -EACCES; + goto out; + } ++ bpf_map_write_active_inc(map); + } ++out: ++ mutex_unlock(&map->freeze_mutex); ++ if (err) ++ return err; + + /* set default open/close callbacks */ + vma->vm_ops = &bpf_map_default_vmops; + vma->vm_private_data = map; + 
vm_flags_clear(vma, VM_MAYEXEC); ++ /* If mapping is read-only, then disallow potentially re-mapping with ++ * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing ++ * means that as far as BPF map's memory-mapped VMAs are concerned, ++ * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set, ++ * both should be set, so we can forget about VM_MAYWRITE and always ++ * check just VM_WRITE ++ */ + if (!(vma->vm_flags & VM_WRITE)) +- /* disallow re-mapping with PROT_WRITE */ + vm_flags_clear(vma, VM_MAYWRITE); + + err = map->ops->map_mmap(map, vma); +- if (err) +- goto out; ++ if (err) { ++ if (vma->vm_flags & VM_WRITE) ++ bpf_map_write_active_dec(map); ++ } + +- if (vma->vm_flags & VM_MAYWRITE) +- bpf_map_write_active_inc(map); +-out: +- mutex_unlock(&map->freeze_mutex); + return err; + } + +@@ -1964,8 +1973,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, + return err; + } + +-#define MAP_LOOKUP_RETRIES 3 +- + int generic_map_lookup_batch(struct bpf_map *map, + const union bpf_attr *attr, + union bpf_attr __user *uattr) +@@ -1975,8 +1982,8 @@ int generic_map_lookup_batch(struct bpf_map *map, + void __user *values = u64_to_user_ptr(attr->batch.values); + void __user *keys = u64_to_user_ptr(attr->batch.keys); + void *buf, *buf_prevkey, *prev_key, *key, *value; +- int err, retry = MAP_LOOKUP_RETRIES; + u32 value_size, cp, max_count; ++ int err; + + if (attr->batch.elem_flags & ~BPF_F_LOCK) + return -EINVAL; +@@ -2022,14 +2029,8 @@ int generic_map_lookup_batch(struct bpf_map *map, + err = bpf_map_copy_value(map, key, value, + attr->batch.elem_flags); + +- if (err == -ENOENT) { +- if (retry) { +- retry--; +- continue; +- } +- err = -EINTR; +- break; +- } ++ if (err == -ENOENT) ++ goto next_key; + + if (err) + goto free_buf; +@@ -2044,12 +2045,12 @@ int generic_map_lookup_batch(struct bpf_map *map, + goto free_buf; + } + ++ cp++; ++next_key: + if (!prev_key) + prev_key = buf_prevkey; + + swap(prev_key, key); +- retry = 
MAP_LOOKUP_RETRIES; +- cp++; + cond_resched(); + } + +diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h +index 66744d60904d57..f3e121888d050f 100644 +--- a/kernel/sched/sched.h ++++ b/kernel/sched/sched.h +@@ -3666,10 +3666,28 @@ static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct *mm) + { + struct cpumask *cidmask = mm_cidmask(mm); + struct mm_cid __percpu *pcpu_cid = mm->pcpu_cid; +- int cid = __this_cpu_read(pcpu_cid->recent_cid); ++ int cid, max_nr_cid, allowed_max_nr_cid; + ++ /* ++ * After shrinking the number of threads or reducing the number ++ * of allowed cpus, reduce the value of max_nr_cid so expansion ++ * of cid allocation will preserve cache locality if the number ++ * of threads or allowed cpus increase again. ++ */ ++ max_nr_cid = atomic_read(&mm->max_nr_cid); ++ while ((allowed_max_nr_cid = min_t(int, READ_ONCE(mm->nr_cpus_allowed), ++ atomic_read(&mm->mm_users))), ++ max_nr_cid > allowed_max_nr_cid) { ++ /* atomic_try_cmpxchg loads previous mm->max_nr_cid into max_nr_cid. */ ++ if (atomic_try_cmpxchg(&mm->max_nr_cid, &max_nr_cid, allowed_max_nr_cid)) { ++ max_nr_cid = allowed_max_nr_cid; ++ break; ++ } ++ } + /* Try to re-use recent cid. This improves cache locality. */ +- if (!mm_cid_is_unset(cid) && !cpumask_test_and_set_cpu(cid, cidmask)) ++ cid = __this_cpu_read(pcpu_cid->recent_cid); ++ if (!mm_cid_is_unset(cid) && cid < max_nr_cid && ++ !cpumask_test_and_set_cpu(cid, cidmask)) + return cid; + /* + * Expand cid allocation if the maximum number of concurrency +@@ -3677,8 +3695,9 @@ static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct *mm) + * and number of threads. Expanding cid allocation as much as + * possible improves cache locality. + */ +- cid = atomic_read(&mm->max_nr_cid); ++ cid = max_nr_cid; + while (cid < READ_ONCE(mm->nr_cpus_allowed) && cid < atomic_read(&mm->mm_users)) { ++ /* atomic_try_cmpxchg loads previous mm->max_nr_cid into cid. 
*/ + if (!atomic_try_cmpxchg(&mm->max_nr_cid, &cid, cid + 1)) + continue; + if (!cpumask_test_and_set_cpu(cid, cidmask)) +diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c +index 2e113f8b13a28d..b1861a57e2b062 100644 +--- a/kernel/trace/ftrace.c ++++ b/kernel/trace/ftrace.c +@@ -3238,15 +3238,22 @@ static struct ftrace_hash *copy_hash(struct ftrace_hash *src) + * The filter_hash updates uses just the append_hash() function + * and the notrace_hash does not. + */ +-static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash) ++static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash, ++ int size_bits) + { + struct ftrace_func_entry *entry; + int size; + int i; + +- /* An empty hash does everything */ +- if (ftrace_hash_empty(*hash)) +- return 0; ++ if (*hash) { ++ /* An empty hash does everything */ ++ if (ftrace_hash_empty(*hash)) ++ return 0; ++ } else { ++ *hash = alloc_ftrace_hash(size_bits); ++ if (!*hash) ++ return -ENOMEM; ++ } + + /* If new_hash has everything make hash have everything */ + if (ftrace_hash_empty(new_hash)) { +@@ -3310,16 +3317,18 @@ static int intersect_hash(struct ftrace_hash **hash, struct ftrace_hash *new_has + /* Return a new hash that has a union of all @ops->filter_hash entries */ + static struct ftrace_hash *append_hashes(struct ftrace_ops *ops) + { +- struct ftrace_hash *new_hash; ++ struct ftrace_hash *new_hash = NULL; + struct ftrace_ops *subops; ++ int size_bits; + int ret; + +- new_hash = alloc_ftrace_hash(ops->func_hash->filter_hash->size_bits); +- if (!new_hash) +- return NULL; ++ if (ops->func_hash->filter_hash) ++ size_bits = ops->func_hash->filter_hash->size_bits; ++ else ++ size_bits = FTRACE_HASH_DEFAULT_BITS; + + list_for_each_entry(subops, &ops->subop_list, list) { +- ret = append_hash(&new_hash, subops->func_hash->filter_hash); ++ ret = append_hash(&new_hash, subops->func_hash->filter_hash, size_bits); + if (ret < 0) { + free_ftrace_hash(new_hash); + return NULL; +@@ 
-3328,7 +3337,8 @@ static struct ftrace_hash *append_hashes(struct ftrace_ops *ops) + if (ftrace_hash_empty(new_hash)) + break; + } +- return new_hash; ++ /* Can't return NULL as that means this failed */ ++ return new_hash ? : EMPTY_HASH; + } + + /* Make @ops trace evenything except what all its subops do not trace */ +@@ -3523,7 +3533,8 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int + filter_hash = alloc_and_copy_ftrace_hash(size_bits, ops->func_hash->filter_hash); + if (!filter_hash) + return -ENOMEM; +- ret = append_hash(&filter_hash, subops->func_hash->filter_hash); ++ ret = append_hash(&filter_hash, subops->func_hash->filter_hash, ++ size_bits); + if (ret < 0) { + free_ftrace_hash(filter_hash); + return ret; +@@ -5759,6 +5770,9 @@ __ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove) + return -ENOENT; + free_hash_entry(hash, entry); + return 0; ++ } else if (__ftrace_lookup_ip(hash, ip) != NULL) { ++ /* Already exists */ ++ return 0; + } + + entry = add_hash_entry(hash, ip); +diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c +index d2267b4406cd8a..13f817afba4c2d 100644 +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -26,6 +26,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -535,19 +536,16 @@ LIST_HEAD(ftrace_trace_arrays); + int trace_array_get(struct trace_array *this_tr) + { + struct trace_array *tr; +- int ret = -ENODEV; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + list_for_each_entry(tr, &ftrace_trace_arrays, list) { + if (tr == this_tr) { + tr->ref++; +- ret = 0; +- break; ++ return 0; + } + } +- mutex_unlock(&trace_types_lock); + +- return ret; ++ return -ENODEV; + } + + static void __trace_array_put(struct trace_array *this_tr) +@@ -1443,22 +1441,20 @@ EXPORT_SYMBOL_GPL(tracing_snapshot_alloc); + int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, + cond_update_fn_t update) + { +- struct 
cond_snapshot *cond_snapshot; +- int ret = 0; ++ struct cond_snapshot *cond_snapshot __free(kfree) = ++ kzalloc(sizeof(*cond_snapshot), GFP_KERNEL); ++ int ret; + +- cond_snapshot = kzalloc(sizeof(*cond_snapshot), GFP_KERNEL); + if (!cond_snapshot) + return -ENOMEM; + + cond_snapshot->cond_data = cond_data; + cond_snapshot->update = update; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + +- if (tr->current_trace->use_max_tr) { +- ret = -EBUSY; +- goto fail_unlock; +- } ++ if (tr->current_trace->use_max_tr) ++ return -EBUSY; + + /* + * The cond_snapshot can only change to NULL without the +@@ -1468,29 +1464,20 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data, + * do safely with only holding the trace_types_lock and not + * having to take the max_lock. + */ +- if (tr->cond_snapshot) { +- ret = -EBUSY; +- goto fail_unlock; +- } ++ if (tr->cond_snapshot) ++ return -EBUSY; + + ret = tracing_arm_snapshot_locked(tr); + if (ret) +- goto fail_unlock; ++ return ret; + + local_irq_disable(); + arch_spin_lock(&tr->max_lock); +- tr->cond_snapshot = cond_snapshot; ++ tr->cond_snapshot = no_free_ptr(cond_snapshot); + arch_spin_unlock(&tr->max_lock); + local_irq_enable(); + +- mutex_unlock(&trace_types_lock); +- +- return ret; +- +- fail_unlock: +- mutex_unlock(&trace_types_lock); +- kfree(cond_snapshot); +- return ret; ++ return 0; + } + EXPORT_SYMBOL_GPL(tracing_snapshot_cond_enable); + +@@ -2203,10 +2190,10 @@ static __init int init_trace_selftests(void) + + selftests_can_run = true; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + + if (list_empty(&postponed_selftests)) +- goto out; ++ return 0; + + pr_info("Running postponed tracer tests:\n"); + +@@ -2235,9 +2222,6 @@ static __init int init_trace_selftests(void) + } + tracing_selftest_running = false; + +- out: +- mutex_unlock(&trace_types_lock); +- + return 0; + } + core_initcall(init_trace_selftests); +@@ -2807,7 +2791,7 @@ int 
tracepoint_printk_sysctl(const struct ctl_table *table, int write, + int save_tracepoint_printk; + int ret; + +- mutex_lock(&tracepoint_printk_mutex); ++ guard(mutex)(&tracepoint_printk_mutex); + save_tracepoint_printk = tracepoint_printk; + + ret = proc_dointvec(table, write, buffer, lenp, ppos); +@@ -2820,16 +2804,13 @@ int tracepoint_printk_sysctl(const struct ctl_table *table, int write, + tracepoint_printk = 0; + + if (save_tracepoint_printk == tracepoint_printk) +- goto out; ++ return ret; + + if (tracepoint_printk) + static_key_enable(&tracepoint_printk_key.key); + else + static_key_disable(&tracepoint_printk_key.key); + +- out: +- mutex_unlock(&tracepoint_printk_mutex); +- + return ret; + } + +@@ -5127,7 +5108,8 @@ static int tracing_trace_options_show(struct seq_file *m, void *v) + u32 tracer_flags; + int i; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); ++ + tracer_flags = tr->current_trace->flags->val; + trace_opts = tr->current_trace->flags->opts; + +@@ -5144,7 +5126,6 @@ static int tracing_trace_options_show(struct seq_file *m, void *v) + else + seq_printf(m, "no%s\n", trace_opts[i].name); + } +- mutex_unlock(&trace_types_lock); + + return 0; + } +@@ -5809,7 +5790,7 @@ trace_insert_eval_map_file(struct module *mod, struct trace_eval_map **start, + return; + } + +- mutex_lock(&trace_eval_mutex); ++ guard(mutex)(&trace_eval_mutex); + + if (!trace_eval_maps) + trace_eval_maps = map_array; +@@ -5833,8 +5814,6 @@ trace_insert_eval_map_file(struct module *mod, struct trace_eval_map **start, + map_array++; + } + memset(map_array, 0, sizeof(*map_array)); +- +- mutex_unlock(&trace_eval_mutex); + } + + static void trace_create_eval_file(struct dentry *d_tracer) +@@ -5996,26 +5975,15 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr, + ssize_t tracing_resize_ring_buffer(struct trace_array *tr, + unsigned long size, int cpu_id) + { +- int ret; +- +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + + if 
(cpu_id != RING_BUFFER_ALL_CPUS) { + /* make sure, this cpu is enabled in the mask */ +- if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) { +- ret = -EINVAL; +- goto out; +- } ++ if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) ++ return -EINVAL; + } + +- ret = __tracing_resize_ring_buffer(tr, size, cpu_id); +- if (ret < 0) +- ret = -ENOMEM; +- +-out: +- mutex_unlock(&trace_types_lock); +- +- return ret; ++ return __tracing_resize_ring_buffer(tr, size, cpu_id); + } + + static void update_last_data(struct trace_array *tr) +@@ -6106,9 +6074,9 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) + #ifdef CONFIG_TRACER_MAX_TRACE + bool had_max_tr; + #endif +- int ret = 0; ++ int ret; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + + update_last_data(tr); + +@@ -6116,7 +6084,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) + ret = __tracing_resize_ring_buffer(tr, trace_buf_size, + RING_BUFFER_ALL_CPUS); + if (ret < 0) +- goto out; ++ return ret; + ret = 0; + } + +@@ -6124,43 +6092,37 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) + if (strcmp(t->name, buf) == 0) + break; + } +- if (!t) { +- ret = -EINVAL; +- goto out; +- } ++ if (!t) ++ return -EINVAL; ++ + if (t == tr->current_trace) +- goto out; ++ return 0; + + #ifdef CONFIG_TRACER_SNAPSHOT + if (t->use_max_tr) { + local_irq_disable(); + arch_spin_lock(&tr->max_lock); +- if (tr->cond_snapshot) +- ret = -EBUSY; ++ ret = tr->cond_snapshot ? 
-EBUSY : 0; + arch_spin_unlock(&tr->max_lock); + local_irq_enable(); + if (ret) +- goto out; ++ return ret; + } + #endif + /* Some tracers won't work on kernel command line */ + if (system_state < SYSTEM_RUNNING && t->noboot) { + pr_warn("Tracer '%s' is not allowed on command line, ignored\n", + t->name); +- goto out; ++ return 0; + } + + /* Some tracers are only allowed for the top level buffer */ +- if (!trace_ok_for_array(t, tr)) { +- ret = -EINVAL; +- goto out; +- } ++ if (!trace_ok_for_array(t, tr)) ++ return -EINVAL; + + /* If trace pipe files are being read, we can't change the tracer */ +- if (tr->trace_ref) { +- ret = -EBUSY; +- goto out; +- } ++ if (tr->trace_ref) ++ return -EBUSY; + + trace_branch_disable(); + +@@ -6191,7 +6153,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) + if (!had_max_tr && t->use_max_tr) { + ret = tracing_arm_snapshot_locked(tr); + if (ret) +- goto out; ++ return ret; + } + #else + tr->current_trace = &nop_trace; +@@ -6204,17 +6166,15 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) + if (t->use_max_tr) + tracing_disarm_snapshot(tr); + #endif +- goto out; ++ return ret; + } + } + + tr->current_trace = t; + tr->current_trace->enabled++; + trace_branch_enable(tr); +- out: +- mutex_unlock(&trace_types_lock); + +- return ret; ++ return 0; + } + + static ssize_t +@@ -6292,22 +6252,18 @@ tracing_thresh_write(struct file *filp, const char __user *ubuf, + struct trace_array *tr = filp->private_data; + int ret; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + ret = tracing_nsecs_write(&tracing_thresh, ubuf, cnt, ppos); + if (ret < 0) +- goto out; ++ return ret; + + if (tr->current_trace->update_thresh) { + ret = tr->current_trace->update_thresh(tr); + if (ret < 0) +- goto out; ++ return ret; + } + +- ret = cnt; +-out: +- mutex_unlock(&trace_types_lock); +- +- return ret; ++ return cnt; + } + + #ifdef CONFIG_TRACER_MAX_TRACE +@@ -6526,31 +6482,29 @@ tracing_read_pipe(struct file 
*filp, char __user *ubuf, + * This is just a matter of traces coherency, the ring buffer itself + * is protected. + */ +- mutex_lock(&iter->mutex); ++ guard(mutex)(&iter->mutex); + + /* return any leftover data */ + sret = trace_seq_to_user(&iter->seq, ubuf, cnt); + if (sret != -EBUSY) +- goto out; ++ return sret; + + trace_seq_init(&iter->seq); + + if (iter->trace->read) { + sret = iter->trace->read(iter, filp, ubuf, cnt, ppos); + if (sret) +- goto out; ++ return sret; + } + + waitagain: + sret = tracing_wait_pipe(filp); + if (sret <= 0) +- goto out; ++ return sret; + + /* stop when tracing is finished */ +- if (trace_empty(iter)) { +- sret = 0; +- goto out; +- } ++ if (trace_empty(iter)) ++ return 0; + + if (cnt >= TRACE_SEQ_BUFFER_SIZE) + cnt = TRACE_SEQ_BUFFER_SIZE - 1; +@@ -6614,9 +6568,6 @@ tracing_read_pipe(struct file *filp, char __user *ubuf, + if (sret == -EBUSY) + goto waitagain; + +-out: +- mutex_unlock(&iter->mutex); +- + return sret; + } + +@@ -7208,25 +7159,19 @@ u64 tracing_event_time_stamp(struct trace_buffer *buffer, struct ring_buffer_eve + */ + int tracing_set_filter_buffering(struct trace_array *tr, bool set) + { +- int ret = 0; +- +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + + if (set && tr->no_filter_buffering_ref++) +- goto out; ++ return 0; + + if (!set) { +- if (WARN_ON_ONCE(!tr->no_filter_buffering_ref)) { +- ret = -EINVAL; +- goto out; +- } ++ if (WARN_ON_ONCE(!tr->no_filter_buffering_ref)) ++ return -EINVAL; + + --tr->no_filter_buffering_ref; + } +- out: +- mutex_unlock(&trace_types_lock); + +- return ret; ++ return 0; + } + + struct ftrace_buffer_info { +@@ -7302,12 +7247,10 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, + if (ret) + return ret; + +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&trace_types_lock); + +- if (tr->current_trace->use_max_tr) { +- ret = -EBUSY; +- goto out; +- } ++ if (tr->current_trace->use_max_tr) ++ return -EBUSY; + + local_irq_disable(); + 
arch_spin_lock(&tr->max_lock); +@@ -7316,24 +7259,20 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, + arch_spin_unlock(&tr->max_lock); + local_irq_enable(); + if (ret) +- goto out; ++ return ret; + + switch (val) { + case 0: +- if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { +- ret = -EINVAL; +- break; +- } ++ if (iter->cpu_file != RING_BUFFER_ALL_CPUS) ++ return -EINVAL; + if (tr->allocated_snapshot) + free_snapshot(tr); + break; + case 1: + /* Only allow per-cpu swap if the ring buffer supports it */ + #ifndef CONFIG_RING_BUFFER_ALLOW_SWAP +- if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { +- ret = -EINVAL; +- break; +- } ++ if (iter->cpu_file != RING_BUFFER_ALL_CPUS) ++ return -EINVAL; + #endif + if (tr->allocated_snapshot) + ret = resize_buffer_duplicate_size(&tr->max_buffer, +@@ -7341,7 +7280,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, + + ret = tracing_arm_snapshot_locked(tr); + if (ret) +- break; ++ return ret; + + /* Now, we're going to swap */ + if (iter->cpu_file == RING_BUFFER_ALL_CPUS) { +@@ -7368,8 +7307,7 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, + *ppos += cnt; + ret = cnt; + } +-out: +- mutex_unlock(&trace_types_lock); ++ + return ret; + } + +@@ -7755,12 +7693,11 @@ void tracing_log_err(struct trace_array *tr, + + len += sizeof(CMD_PREFIX) + 2 * sizeof("\n") + strlen(cmd) + 1; + +- mutex_lock(&tracing_err_log_lock); ++ guard(mutex)(&tracing_err_log_lock); ++ + err = get_tracing_log_err(tr, len); +- if (PTR_ERR(err) == -ENOMEM) { +- mutex_unlock(&tracing_err_log_lock); ++ if (PTR_ERR(err) == -ENOMEM) + return; +- } + + snprintf(err->loc, TRACING_LOG_LOC_MAX, "%s: error: ", loc); + snprintf(err->cmd, len, "\n" CMD_PREFIX "%s\n", cmd); +@@ -7771,7 +7708,6 @@ void tracing_log_err(struct trace_array *tr, + err->info.ts = local_clock(); + + list_add_tail(&err->list, &tr->err_log); +- mutex_unlock(&tracing_err_log_lock); + } + + static void 
clear_tracing_err_log(struct trace_array *tr) +@@ -9519,20 +9455,17 @@ static int instance_mkdir(const char *name) + struct trace_array *tr; + int ret; + +- mutex_lock(&event_mutex); +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&event_mutex); ++ guard(mutex)(&trace_types_lock); + + ret = -EEXIST; + if (trace_array_find(name)) +- goto out_unlock; ++ return -EEXIST; + + tr = trace_array_create(name); + + ret = PTR_ERR_OR_ZERO(tr); + +-out_unlock: +- mutex_unlock(&trace_types_lock); +- mutex_unlock(&event_mutex); + return ret; + } + +@@ -9582,24 +9515,23 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system + { + struct trace_array *tr; + +- mutex_lock(&event_mutex); +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&event_mutex); ++ guard(mutex)(&trace_types_lock); + + list_for_each_entry(tr, &ftrace_trace_arrays, list) { +- if (tr->name && strcmp(tr->name, name) == 0) +- goto out_unlock; ++ if (tr->name && strcmp(tr->name, name) == 0) { ++ tr->ref++; ++ return tr; ++ } + } + + tr = trace_array_create_systems(name, systems, 0, 0); + + if (IS_ERR(tr)) + tr = NULL; +-out_unlock: +- if (tr) ++ else + tr->ref++; + +- mutex_unlock(&trace_types_lock); +- mutex_unlock(&event_mutex); + return tr; + } + EXPORT_SYMBOL_GPL(trace_array_get_by_name); +@@ -9650,48 +9582,36 @@ static int __remove_instance(struct trace_array *tr) + int trace_array_destroy(struct trace_array *this_tr) + { + struct trace_array *tr; +- int ret; + + if (!this_tr) + return -EINVAL; + +- mutex_lock(&event_mutex); +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&event_mutex); ++ guard(mutex)(&trace_types_lock); + +- ret = -ENODEV; + + /* Making sure trace array exists before destroying it. 
*/ + list_for_each_entry(tr, &ftrace_trace_arrays, list) { +- if (tr == this_tr) { +- ret = __remove_instance(tr); +- break; +- } ++ if (tr == this_tr) ++ return __remove_instance(tr); + } + +- mutex_unlock(&trace_types_lock); +- mutex_unlock(&event_mutex); +- +- return ret; ++ return -ENODEV; + } + EXPORT_SYMBOL_GPL(trace_array_destroy); + + static int instance_rmdir(const char *name) + { + struct trace_array *tr; +- int ret; + +- mutex_lock(&event_mutex); +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&event_mutex); ++ guard(mutex)(&trace_types_lock); + +- ret = -ENODEV; + tr = trace_array_find(name); +- if (tr) +- ret = __remove_instance(tr); +- +- mutex_unlock(&trace_types_lock); +- mutex_unlock(&event_mutex); ++ if (!tr) ++ return -ENODEV; + +- return ret; ++ return __remove_instance(tr); + } + + static __init void create_trace_instances(struct dentry *d_tracer) +@@ -9704,19 +9624,16 @@ static __init void create_trace_instances(struct dentry *d_tracer) + if (MEM_FAIL(!trace_instance_dir, "Failed to create instances directory\n")) + return; + +- mutex_lock(&event_mutex); +- mutex_lock(&trace_types_lock); ++ guard(mutex)(&event_mutex); ++ guard(mutex)(&trace_types_lock); + + list_for_each_entry(tr, &ftrace_trace_arrays, list) { + if (!tr->name) + continue; + if (MEM_FAIL(trace_array_create_dir(tr) < 0, + "Failed to create instance directory\n")) +- break; ++ return; + } +- +- mutex_unlock(&trace_types_lock); +- mutex_unlock(&event_mutex); + } + + static void +@@ -9930,7 +9847,7 @@ static void trace_module_remove_evals(struct module *mod) + if (!mod->num_trace_evals) + return; + +- mutex_lock(&trace_eval_mutex); ++ guard(mutex)(&trace_eval_mutex); + + map = trace_eval_maps; + +@@ -9942,12 +9859,10 @@ static void trace_module_remove_evals(struct module *mod) + map = map->tail.next; + } + if (!map) +- goto out; ++ return; + + *last = trace_eval_jmp_to_tail(map)->tail.next; + kfree(map); +- out: +- mutex_unlock(&trace_eval_mutex); + } + #else + static inline void 
trace_module_remove_evals(struct module *mod) { } +diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c +index d358c9935164de..df56f9b7601094 100644 +--- a/kernel/trace/trace_functions.c ++++ b/kernel/trace/trace_functions.c +@@ -216,7 +216,7 @@ function_trace_call(unsigned long ip, unsigned long parent_ip, + + parent_ip = function_get_true_parent_ip(parent_ip, fregs); + +- trace_ctx = tracing_gen_ctx(); ++ trace_ctx = tracing_gen_ctx_dec(); + + data = this_cpu_ptr(tr->array_buffer.data); + if (!atomic_read(&data->disabled)) +@@ -321,7 +321,6 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip, + struct trace_array *tr = op->private; + struct trace_array_cpu *data; + unsigned int trace_ctx; +- unsigned long flags; + int bit; + + if (unlikely(!tr->function_enabled)) +@@ -347,8 +346,7 @@ function_no_repeats_trace_call(unsigned long ip, unsigned long parent_ip, + if (is_repeat_check(tr, last_info, ip, parent_ip)) + goto out; + +- local_save_flags(flags); +- trace_ctx = tracing_gen_ctx_flags(flags); ++ trace_ctx = tracing_gen_ctx_dec(); + process_repeats(tr, ip, parent_ip, last_info, trace_ctx); + + trace_function(tr, ip, parent_ip, trace_ctx); +diff --git a/lib/iov_iter.c b/lib/iov_iter.c +index 9ec806f989f258..65f550cb5081b9 100644 +--- a/lib/iov_iter.c ++++ b/lib/iov_iter.c +@@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int type, const struct iovec __user *uvec, + struct iovec *iov = *iovp; + ssize_t ret; + ++ *iovp = NULL; ++ + if (compat) + ret = copy_compat_iovec_from_user(iov, uvec, 1); + else +@@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int type, const struct iovec __user *uvec, + ret = import_ubuf(type, iov->iov_base, iov->iov_len, i); + if (unlikely(ret)) + return ret; +- *iovp = NULL; + return i->count; + } + +diff --git a/mm/madvise.c b/mm/madvise.c +index 0ceae57da7dad3..dcadd5b3457e78 100644 +--- a/mm/madvise.c ++++ b/mm/madvise.c +@@ -928,7 +928,16 @@ static long 
madvise_dontneed_free(struct vm_area_struct *vma, + */ + end = vma->vm_end; + } +- VM_WARN_ON(start >= end); ++ /* ++ * If the memory region between start and end was ++ * originally backed by 4kB pages and then remapped to ++ * be backed by hugepages while mmap_lock was dropped, ++ * the adjustment for hugetlb vma above may have rounded ++ * end down to the start address. ++ */ ++ if (start == end) ++ return 0; ++ VM_WARN_ON(start > end); + } + + if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED) +diff --git a/mm/migrate_device.c b/mm/migrate_device.c +index 9cf26592ac934d..5bd888223cc8b8 100644 +--- a/mm/migrate_device.c ++++ b/mm/migrate_device.c +@@ -840,20 +840,15 @@ void migrate_device_finalize(unsigned long *src_pfns, + dst = src; + } + ++ if (!folio_is_zone_device(dst)) ++ folio_add_lru(dst); + remove_migration_ptes(src, dst, 0); + folio_unlock(src); +- +- if (folio_is_zone_device(src)) +- folio_put(src); +- else +- folio_putback_lru(src); ++ folio_put(src); + + if (dst != src) { + folio_unlock(dst); +- if (folio_is_zone_device(dst)) +- folio_put(dst); +- else +- folio_putback_lru(dst); ++ folio_put(dst); + } + } + } +diff --git a/mm/zswap.c b/mm/zswap.c +index b84c20d889b1b5..6e0c0fca583000 100644 +--- a/mm/zswap.c ++++ b/mm/zswap.c +@@ -1445,9 +1445,9 @@ static void shrink_worker(struct work_struct *w) + * main API + **********************************/ + +-static ssize_t zswap_store_page(struct page *page, +- struct obj_cgroup *objcg, +- struct zswap_pool *pool) ++static bool zswap_store_page(struct page *page, ++ struct obj_cgroup *objcg, ++ struct zswap_pool *pool) + { + swp_entry_t page_swpentry = page_swap_entry(page); + struct zswap_entry *entry, *old; +@@ -1456,7 +1456,7 @@ static ssize_t zswap_store_page(struct page *page, + entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page)); + if (!entry) { + zswap_reject_kmemcache_fail++; +- return -EINVAL; ++ return false; + } + + if (!zswap_compress(page, entry, pool)) +@@ -1483,13 
+1483,17 @@ static ssize_t zswap_store_page(struct page *page, + + /* + * The entry is successfully compressed and stored in the tree, there is +- * no further possibility of failure. Grab refs to the pool and objcg. +- * These refs will be dropped by zswap_entry_free() when the entry is +- * removed from the tree. ++ * no further possibility of failure. Grab refs to the pool and objcg, ++ * charge zswap memory, and increment zswap_stored_pages. ++ * The opposite actions will be performed by zswap_entry_free() ++ * when the entry is removed from the tree. + */ + zswap_pool_get(pool); +- if (objcg) ++ if (objcg) { + obj_cgroup_get(objcg); ++ obj_cgroup_charge_zswap(objcg, entry->length); ++ } ++ atomic_long_inc(&zswap_stored_pages); + + /* + * We finish initializing the entry while it's already in xarray. +@@ -1510,13 +1514,13 @@ static ssize_t zswap_store_page(struct page *page, + zswap_lru_add(&zswap_list_lru, entry); + } + +- return entry->length; ++ return true; + + store_failed: + zpool_free(pool->zpool, entry->handle); + compress_failed: + zswap_entry_cache_free(entry); +- return -EINVAL; ++ return false; + } + + bool zswap_store(struct folio *folio) +@@ -1526,7 +1530,6 @@ bool zswap_store(struct folio *folio) + struct obj_cgroup *objcg = NULL; + struct mem_cgroup *memcg = NULL; + struct zswap_pool *pool; +- size_t compressed_bytes = 0; + bool ret = false; + long index; + +@@ -1564,20 +1567,14 @@ bool zswap_store(struct folio *folio) + + for (index = 0; index < nr_pages; ++index) { + struct page *page = folio_page(folio, index); +- ssize_t bytes; + +- bytes = zswap_store_page(page, objcg, pool); +- if (bytes < 0) ++ if (!zswap_store_page(page, objcg, pool)) + goto put_pool; +- compressed_bytes += bytes; + } + +- if (objcg) { +- obj_cgroup_charge_zswap(objcg, compressed_bytes); ++ if (objcg) + count_objcg_events(objcg, ZSWPOUT, nr_pages); +- } + +- atomic_long_add(nr_pages, &zswap_stored_pages); + count_vm_events(ZSWPOUT, nr_pages); + + ret = true; +diff --git 
a/net/bpf/test_run.c b/net/bpf/test_run.c +index 501ec4249fedc3..8612023bec60dc 100644 +--- a/net/bpf/test_run.c ++++ b/net/bpf/test_run.c +@@ -660,12 +660,9 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, + void __user *data_in = u64_to_user_ptr(kattr->test.data_in); + void *data; + +- if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom) ++ if (user_size < ETH_HLEN || user_size > PAGE_SIZE - headroom - tailroom) + return ERR_PTR(-EINVAL); + +- if (user_size > size) +- return ERR_PTR(-EMSGSIZE); +- + size = SKB_DATA_ALIGN(size); + data = kzalloc(size + headroom + tailroom, GFP_USER); + if (!data) +diff --git a/net/core/dev.c b/net/core/dev.c +index fbb796375aa0ef..2b09714761c62a 100644 +--- a/net/core/dev.c ++++ b/net/core/dev.c +@@ -1012,6 +1012,12 @@ int netdev_get_name(struct net *net, char *name, int ifindex) + return ret; + } + ++static bool dev_addr_cmp(struct net_device *dev, unsigned short type, ++ const char *ha) ++{ ++ return dev->type == type && !memcmp(dev->dev_addr, ha, dev->addr_len); ++} ++ + /** + * dev_getbyhwaddr_rcu - find a device by its hardware address + * @net: the applicable net namespace +@@ -1020,7 +1026,7 @@ int netdev_get_name(struct net *net, char *name, int ifindex) + * + * Search for an interface by MAC address. Returns NULL if the device + * is not found or a pointer to the device. +- * The caller must hold RCU or RTNL. ++ * The caller must hold RCU. 
+ * The returned device has not had its ref count increased + * and the caller must therefore be careful about locking + * +@@ -1032,14 +1038,39 @@ struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type, + struct net_device *dev; + + for_each_netdev_rcu(net, dev) +- if (dev->type == type && +- !memcmp(dev->dev_addr, ha, dev->addr_len)) ++ if (dev_addr_cmp(dev, type, ha)) + return dev; + + return NULL; + } + EXPORT_SYMBOL(dev_getbyhwaddr_rcu); + ++/** ++ * dev_getbyhwaddr() - find a device by its hardware address ++ * @net: the applicable net namespace ++ * @type: media type of device ++ * @ha: hardware address ++ * ++ * Similar to dev_getbyhwaddr_rcu(), but the owner needs to hold ++ * rtnl_lock. ++ * ++ * Context: rtnl_lock() must be held. ++ * Return: pointer to the net_device, or NULL if not found ++ */ ++struct net_device *dev_getbyhwaddr(struct net *net, unsigned short type, ++ const char *ha) ++{ ++ struct net_device *dev; ++ ++ ASSERT_RTNL(); ++ for_each_netdev(net, dev) ++ if (dev_addr_cmp(dev, type, ha)) ++ return dev; ++ ++ return NULL; ++} ++EXPORT_SYMBOL(dev_getbyhwaddr); ++ + struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type) + { + struct net_device *dev, *ret = NULL; +diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c +index 6efd4cccc9ddd2..212f0a048cab68 100644 +--- a/net/core/drop_monitor.c ++++ b/net/core/drop_monitor.c +@@ -1734,30 +1734,30 @@ static int __init init_net_drop_monitor(void) + return -ENOSPC; + } + +- rc = genl_register_family(&net_drop_monitor_family); +- if (rc) { +- pr_err("Could not create drop monitor netlink family\n"); +- return rc; ++ for_each_possible_cpu(cpu) { ++ net_dm_cpu_data_init(cpu); ++ net_dm_hw_cpu_data_init(cpu); + } +- WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT); + + rc = register_netdevice_notifier(&dropmon_net_notifier); + if (rc < 0) { + pr_crit("Failed to register netdevice notifier\n"); ++ return rc; ++ } ++ ++ rc = 
genl_register_family(&net_drop_monitor_family); ++ if (rc) { ++ pr_err("Could not create drop monitor netlink family\n"); + goto out_unreg; + } ++ WARN_ON(net_drop_monitor_family.mcgrp_offset != NET_DM_GRP_ALERT); + + rc = 0; + +- for_each_possible_cpu(cpu) { +- net_dm_cpu_data_init(cpu); +- net_dm_hw_cpu_data_init(cpu); +- } +- + goto out; + + out_unreg: +- genl_unregister_family(&net_drop_monitor_family); ++ WARN_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); + out: + return rc; + } +@@ -1766,19 +1766,18 @@ static void exit_net_drop_monitor(void) + { + int cpu; + +- BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); +- + /* + * Because of the module_get/put we do in the trace state change path + * we are guaranteed not to have any current users when we get here + */ ++ BUG_ON(genl_unregister_family(&net_drop_monitor_family)); ++ ++ BUG_ON(unregister_netdevice_notifier(&dropmon_net_notifier)); + + for_each_possible_cpu(cpu) { + net_dm_hw_cpu_data_fini(cpu); + net_dm_cpu_data_fini(cpu); + } +- +- BUG_ON(genl_unregister_family(&net_drop_monitor_family)); + } + + module_init(init_net_drop_monitor); +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 5db41bf2ed93e0..9cd8de6bebb543 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -853,23 +853,30 @@ __skb_flow_dissect_ports(const struct sk_buff *skb, + void *target_container, const void *data, + int nhoff, u8 ip_proto, int hlen) + { +- enum flow_dissector_key_id dissector_ports = FLOW_DISSECTOR_KEY_MAX; +- struct flow_dissector_key_ports *key_ports; ++ struct flow_dissector_key_ports_range *key_ports_range = NULL; ++ struct flow_dissector_key_ports *key_ports = NULL; ++ __be32 ports; + + if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) +- dissector_ports = FLOW_DISSECTOR_KEY_PORTS; +- else if (dissector_uses_key(flow_dissector, +- FLOW_DISSECTOR_KEY_PORTS_RANGE)) +- dissector_ports = FLOW_DISSECTOR_KEY_PORTS_RANGE; ++ key_ports = 
skb_flow_dissector_target(flow_dissector, ++ FLOW_DISSECTOR_KEY_PORTS, ++ target_container); + +- if (dissector_ports == FLOW_DISSECTOR_KEY_MAX) ++ if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS_RANGE)) ++ key_ports_range = skb_flow_dissector_target(flow_dissector, ++ FLOW_DISSECTOR_KEY_PORTS_RANGE, ++ target_container); ++ ++ if (!key_ports && !key_ports_range) + return; + +- key_ports = skb_flow_dissector_target(flow_dissector, +- dissector_ports, +- target_container); +- key_ports->ports = __skb_flow_get_ports(skb, nhoff, ip_proto, +- data, hlen); ++ ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen); ++ ++ if (key_ports) ++ key_ports->ports = ports; ++ ++ if (key_ports_range) ++ key_ports_range->tp.ports = ports; + } + + static void +@@ -924,6 +931,7 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys, + struct flow_dissector *flow_dissector, + void *target_container) + { ++ struct flow_dissector_key_ports_range *key_ports_range = NULL; + struct flow_dissector_key_ports *key_ports = NULL; + struct flow_dissector_key_control *key_control; + struct flow_dissector_key_basic *key_basic; +@@ -968,20 +976,21 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys, + key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS; + } + +- if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) ++ if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) { + key_ports = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_PORTS, + target_container); +- else if (dissector_uses_key(flow_dissector, +- FLOW_DISSECTOR_KEY_PORTS_RANGE)) +- key_ports = skb_flow_dissector_target(flow_dissector, +- FLOW_DISSECTOR_KEY_PORTS_RANGE, +- target_container); +- +- if (key_ports) { + key_ports->src = flow_keys->sport; + key_ports->dst = flow_keys->dport; + } ++ if (dissector_uses_key(flow_dissector, ++ FLOW_DISSECTOR_KEY_PORTS_RANGE)) { ++ key_ports_range = 
skb_flow_dissector_target(flow_dissector, ++ FLOW_DISSECTOR_KEY_PORTS_RANGE, ++ target_container); ++ key_ports_range->tp.src = flow_keys->sport; ++ key_ports_range->tp.dst = flow_keys->dport; ++ } + + if (dissector_uses_key(flow_dissector, + FLOW_DISSECTOR_KEY_FLOW_LABEL)) { +diff --git a/net/core/gro.c b/net/core/gro.c +index d1f44084e978fb..78b320b6317445 100644 +--- a/net/core/gro.c ++++ b/net/core/gro.c +@@ -7,9 +7,6 @@ + + #define MAX_GRO_SKBS 8 + +-/* This should be increased if a protocol with a bigger head is added. */ +-#define GRO_MAX_HEAD (MAX_HEADER + 128) +- + static DEFINE_SPINLOCK(offload_lock); + + /** +diff --git a/net/core/skbuff.c b/net/core/skbuff.c +index 6841e61a6bd0b6..f251a99f8d4217 100644 +--- a/net/core/skbuff.c ++++ b/net/core/skbuff.c +@@ -69,6 +69,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -95,7 +96,9 @@ + static struct kmem_cache *skbuff_ext_cache __ro_after_init; + #endif + +-#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER) ++#define GRO_MAX_HEAD_PAD (GRO_MAX_HEAD + NET_SKB_PAD + NET_IP_ALIGN) ++#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(max(MAX_TCP_HEADER, \ ++ GRO_MAX_HEAD_PAD)) + + /* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. + * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique +@@ -736,7 +739,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, + /* If requested length is either too small or too big, + * we use kmalloc() for skb->head allocation. 
+ */ +- if (len <= SKB_WITH_OVERHEAD(1024) || ++ if (len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) || + len > SKB_WITH_OVERHEAD(PAGE_SIZE) || + (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) { + skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE); +@@ -816,7 +819,8 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len) + * When the small frag allocator is available, prefer it over kmalloc + * for small fragments + */ +- if ((!NAPI_HAS_SMALL_PAGE_FRAG && len <= SKB_WITH_OVERHEAD(1024)) || ++ if ((!NAPI_HAS_SMALL_PAGE_FRAG && ++ len <= SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)) || + len > SKB_WITH_OVERHEAD(PAGE_SIZE) || + (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) { + skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX | SKB_ALLOC_NAPI, +diff --git a/net/core/sock_map.c b/net/core/sock_map.c +index f1b9b3958792cd..82a14f131d00c6 100644 +--- a/net/core/sock_map.c ++++ b/net/core/sock_map.c +@@ -303,7 +303,10 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk) + + write_lock_bh(&sk->sk_callback_lock); + if (stream_parser && stream_verdict && !psock->saved_data_ready) { +- ret = sk_psock_init_strp(sk, psock); ++ if (sk_is_tcp(sk)) ++ ret = sk_psock_init_strp(sk, psock); ++ else ++ ret = -EOPNOTSUPP; + if (ret) { + write_unlock_bh(&sk->sk_callback_lock); + sk_psock_put(sk, psock); +@@ -541,6 +544,9 @@ static bool sock_map_sk_state_allowed(const struct sock *sk) + return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN); + if (sk_is_stream_unix(sk)) + return (1 << sk->sk_state) & TCPF_ESTABLISHED; ++ if (sk_is_vsock(sk) && ++ (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET)) ++ return (1 << sk->sk_state) & TCPF_ESTABLISHED; + return true; + } + +diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c +index f23a1ec6694cb2..814300eee39de1 100644 +--- a/net/ipv4/arp.c ++++ b/net/ipv4/arp.c +@@ -1077,7 +1077,7 @@ static int arp_req_set_public(struct net *net, struct arpreq *r, + __be32 mask = ((struct sockaddr_in 
*)&r->arp_netmask)->sin_addr.s_addr; + + if (!dev && (r->arp_flags & ATF_COM)) { +- dev = dev_getbyhwaddr_rcu(net, r->arp_ha.sa_family, ++ dev = dev_getbyhwaddr(net, r->arp_ha.sa_family, + r->arp_ha.sa_data); + if (!dev) + return -ENODEV; +diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c +index 0f523cbfe329ef..32b28fc21b63c0 100644 +--- a/net/ipv4/tcp_fastopen.c ++++ b/net/ipv4/tcp_fastopen.c +@@ -178,7 +178,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb) + if (!skb) + return; + +- skb_dst_drop(skb); ++ tcp_cleanup_skb(skb); + /* segs_in has been initialized to 1 in tcp_create_openreq_child(). + * Hence, reset segs_in to 0 before calling tcp_segs_in() + * to avoid double counting. Also, tcp_segs_in() expects +@@ -195,7 +195,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb) + TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN; + + tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; +- __skb_queue_tail(&sk->sk_receive_queue, skb); ++ tcp_add_receive_queue(sk, skb); + tp->syn_data_acked = 1; + + /* u64_stats_update_begin(&tp->syncp) not needed here, +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c +index 4811727b8a0225..0ee22e10fcfae7 100644 +--- a/net/ipv4/tcp_input.c ++++ b/net/ipv4/tcp_input.c +@@ -243,9 +243,15 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) + do_div(val, skb->truesize); + tcp_sk(sk)->scaling_ratio = val ? 
val : 1; + +- if (old_ratio != tcp_sk(sk)->scaling_ratio) +- WRITE_ONCE(tcp_sk(sk)->window_clamp, +- tcp_win_from_space(sk, sk->sk_rcvbuf)); ++ if (old_ratio != tcp_sk(sk)->scaling_ratio) { ++ struct tcp_sock *tp = tcp_sk(sk); ++ ++ val = tcp_win_from_space(sk, sk->sk_rcvbuf); ++ tcp_set_window_clamp(sk, val); ++ ++ if (tp->window_clamp < tp->rcvq_space.space) ++ tp->rcvq_space.space = tp->window_clamp; ++ } + } + icsk->icsk_ack.rcv_mss = min_t(unsigned int, len, + tcp_sk(sk)->advmss); +@@ -4964,7 +4970,7 @@ static void tcp_ofo_queue(struct sock *sk) + tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); + fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN; + if (!eaten) +- __skb_queue_tail(&sk->sk_receive_queue, skb); ++ tcp_add_receive_queue(sk, skb); + else + kfree_skb_partial(skb, fragstolen); + +@@ -5156,7 +5162,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, + skb, fragstolen)) ? 1 : 0; + tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); + if (!eaten) { +- __skb_queue_tail(&sk->sk_receive_queue, skb); ++ tcp_add_receive_queue(sk, skb); + skb_set_owner_r(skb, sk); + } + return eaten; +@@ -5239,7 +5245,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) + __kfree_skb(skb); + return; + } +- skb_dst_drop(skb); ++ tcp_cleanup_skb(skb); + __skb_pull(skb, tcp_hdr(skb)->doff * 4); + + reason = SKB_DROP_REASON_NOT_SPECIFIED; +@@ -6208,7 +6214,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS); + + /* Bulk data transfer: receiver */ +- skb_dst_drop(skb); ++ tcp_cleanup_skb(skb); + __skb_pull(skb, tcp_header_len); + eaten = tcp_queue_rcv(sk, skb, &fragstolen); + +diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c +index c26f6c4b7bb4a3..96d68f9b1bb9de 100644 +--- a/net/ipv4/tcp_ipv4.c ++++ b/net/ipv4/tcp_ipv4.c +@@ -2025,7 +2025,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb, + */ + skb_condense(skb); + +- skb_dst_drop(skb); ++ 
tcp_cleanup_skb(skb); + + if (unlikely(tcp_checksum_complete(skb))) { + bh_unlock_sock(sk); +diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c +index 8e47e5355be613..4f648af8cfaafe 100644 +--- a/net/sched/cls_api.c ++++ b/net/sched/cls_api.c +@@ -97,7 +97,7 @@ tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp, + + err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base, + n, xa_limit_32b, &next, GFP_KERNEL); +- if (err) ++ if (err < 0) + goto err_xa_alloc; + + exts->miss_cookie_node = n; +diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c +index 53a081d49d28ac..7e3db87ae4333c 100644 +--- a/net/vmw_vsock/af_vsock.c ++++ b/net/vmw_vsock/af_vsock.c +@@ -1189,6 +1189,9 @@ static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor) + { + struct vsock_sock *vsk = vsock_sk(sk); + ++ if (WARN_ON_ONCE(!vsk->transport)) ++ return -ENODEV; ++ + return vsk->transport->read_skb(vsk, read_actor); + } + +diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c +index b58c3818f284f1..f0e48e6911fc46 100644 +--- a/net/vmw_vsock/virtio_transport.c ++++ b/net/vmw_vsock/virtio_transport.c +@@ -670,6 +670,13 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock) + }; + int ret; + ++ mutex_lock(&vsock->rx_lock); ++ vsock->rx_buf_nr = 0; ++ vsock->rx_buf_max_nr = 0; ++ mutex_unlock(&vsock->rx_lock); ++ ++ atomic_set(&vsock->queued_replies, 0); ++ + ret = virtio_find_vqs(vdev, VSOCK_VQ_MAX, vsock->vqs, vqs_info, NULL); + if (ret < 0) + return ret; +@@ -779,9 +786,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev) + + vsock->vdev = vdev; + +- vsock->rx_buf_nr = 0; +- vsock->rx_buf_max_nr = 0; +- atomic_set(&vsock->queued_replies, 0); + + mutex_init(&vsock->tx_lock); + mutex_init(&vsock->rx_lock); +diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c +index f201d9eca1df2f..07b96d56f3a577 100644 +--- a/net/vmw_vsock/vsock_bpf.c ++++ b/net/vmw_vsock/vsock_bpf.c +@@ 
-87,7 +87,7 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, + lock_sock(sk); + vsk = vsock_sk(sk); + +- if (!vsk->transport) { ++ if (WARN_ON_ONCE(!vsk->transport)) { + copied = -ENODEV; + goto out; + } +diff --git a/rust/ffi.rs b/rust/ffi.rs +index be153c4d551b24..584f75b49862b3 100644 +--- a/rust/ffi.rs ++++ b/rust/ffi.rs +@@ -10,4 +10,39 @@ + + #![no_std] + +-pub use core::ffi::*; ++macro_rules! alias { ++ ($($name:ident = $ty:ty;)*) => {$( ++ #[allow(non_camel_case_types, missing_docs)] ++ pub type $name = $ty; ++ ++ // Check size compatibility with `core`. ++ const _: () = assert!( ++ core::mem::size_of::<$name>() == core::mem::size_of::() ++ ); ++ )*} ++} ++ ++alias! { ++ // `core::ffi::c_char` is either `i8` or `u8` depending on architecture. In the kernel, we use ++ // `-funsigned-char` so it's always mapped to `u8`. ++ c_char = u8; ++ ++ c_schar = i8; ++ c_uchar = u8; ++ ++ c_short = i16; ++ c_ushort = u16; ++ ++ c_int = i32; ++ c_uint = u32; ++ ++ // In the kernel, `intptr_t` is defined to be `long` in all platforms, so we can map the type to ++ // `isize`. 
++ c_long = isize; ++ c_ulong = usize; ++ ++ c_longlong = i64; ++ c_ulonglong = u64; ++} ++ ++pub use core::ffi::c_void; +diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs +index c926e0c2b8528c..d5e6a19ff6b7ba 100644 +--- a/rust/kernel/device.rs ++++ b/rust/kernel/device.rs +@@ -173,10 +173,10 @@ unsafe fn printk(&self, klevel: &[u8], msg: fmt::Arguments<'_>) { + #[cfg(CONFIG_PRINTK)] + unsafe { + bindings::_dev_printk( +- klevel as *const _ as *const core::ffi::c_char, ++ klevel as *const _ as *const crate::ffi::c_char, + self.as_raw(), + c_str!("%pA").as_char_ptr(), +- &msg as *const _ as *const core::ffi::c_void, ++ &msg as *const _ as *const crate::ffi::c_void, + ) + }; + } +diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs +index 52c5024324474f..5fece574ec023b 100644 +--- a/rust/kernel/error.rs ++++ b/rust/kernel/error.rs +@@ -153,11 +153,8 @@ pub(crate) fn to_blk_status(self) -> bindings::blk_status_t { + + /// Returns the error encoded as a pointer. + pub fn to_ptr(self) -> *mut T { +- #[cfg_attr(target_pointer_width = "32", allow(clippy::useless_conversion))] + // SAFETY: `self.0` is a valid error due to its invariant. +- unsafe { +- bindings::ERR_PTR(self.0.get().into()) as *mut _ +- } ++ unsafe { bindings::ERR_PTR(self.0.get() as _) as *mut _ } + } + + /// Returns a string representing the error, if one exists. +diff --git a/rust/kernel/firmware.rs b/rust/kernel/firmware.rs +index 13a374a5cdb743..c5162fdc95ff05 100644 +--- a/rust/kernel/firmware.rs ++++ b/rust/kernel/firmware.rs +@@ -12,7 +12,7 @@ + /// One of the following: `bindings::request_firmware`, `bindings::firmware_request_nowarn`, + /// `bindings::firmware_request_platform`, `bindings::request_firmware_direct`. 
+ struct FwFunc( +- unsafe extern "C" fn(*mut *const bindings::firmware, *const i8, *mut bindings::device) -> i32, ++ unsafe extern "C" fn(*mut *const bindings::firmware, *const u8, *mut bindings::device) -> i32, + ); + + impl FwFunc { +diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs +index 7e2a79b3ae2636..8f88891fb1d20f 100644 +--- a/rust/kernel/miscdevice.rs ++++ b/rust/kernel/miscdevice.rs +@@ -11,16 +11,12 @@ + use crate::{ + bindings, + error::{to_result, Error, Result, VTABLE_DEFAULT_ERROR}, ++ ffi::{c_int, c_long, c_uint, c_ulong}, + prelude::*, + str::CStr, + types::{ForeignOwnable, Opaque}, + }; +-use core::{ +- ffi::{c_int, c_long, c_uint, c_ulong}, +- marker::PhantomData, +- mem::MaybeUninit, +- pin::Pin, +-}; ++use core::{marker::PhantomData, mem::MaybeUninit, pin::Pin}; + + /// Options for creating a misc device. + #[derive(Copy, Clone)] +@@ -229,7 +225,7 @@ impl VtableHelper { + // SAFETY: Ioctl calls can borrow the private data of the file. + let device = unsafe { ::borrow(private) }; + +- match T::ioctl(device, cmd, arg as usize) { ++ match T::ioctl(device, cmd, arg) { + Ok(ret) => ret as c_long, + Err(err) => err.to_errno() as c_long, + } +@@ -249,7 +245,7 @@ impl VtableHelper { + // SAFETY: Ioctl calls can borrow the private data of the file. + let device = unsafe { ::borrow(private) }; + +- match T::compat_ioctl(device, cmd, arg as usize) { ++ match T::compat_ioctl(device, cmd, arg) { + Ok(ret) => ret as c_long, + Err(err) => err.to_errno() as c_long, + } +diff --git a/rust/kernel/print.rs b/rust/kernel/print.rs +index a28077a7cb3011..b19ee490be58fd 100644 +--- a/rust/kernel/print.rs ++++ b/rust/kernel/print.rs +@@ -107,7 +107,7 @@ pub unsafe fn call_printk( + // SAFETY: TODO. 
+ unsafe { + bindings::_printk( +- format_string.as_ptr() as _, ++ format_string.as_ptr(), + module_name.as_ptr(), + &args as *const _ as *const c_void, + ); +@@ -128,7 +128,7 @@ pub fn call_printk_cont(args: fmt::Arguments<'_>) { + #[cfg(CONFIG_PRINTK)] + unsafe { + bindings::_printk( +- format_strings::CONT.as_ptr() as _, ++ format_strings::CONT.as_ptr(), + &args as *const _ as *const c_void, + ); + } +diff --git a/rust/kernel/security.rs b/rust/kernel/security.rs +index 2522868862a1bf..ea4c58c8170336 100644 +--- a/rust/kernel/security.rs ++++ b/rust/kernel/security.rs +@@ -19,7 +19,7 @@ + /// successful call to `security_secid_to_secctx`, that has not yet been destroyed by calling + /// `security_release_secctx`. + pub struct SecurityCtx { +- secdata: *mut core::ffi::c_char, ++ secdata: *mut crate::ffi::c_char, + seclen: usize, + } + +diff --git a/rust/kernel/seq_file.rs b/rust/kernel/seq_file.rs +index 6ca29d576d029d..04947c6729792b 100644 +--- a/rust/kernel/seq_file.rs ++++ b/rust/kernel/seq_file.rs +@@ -36,7 +36,7 @@ pub fn call_printf(&self, args: core::fmt::Arguments<'_>) { + bindings::seq_printf( + self.inner.get(), + c_str!("%pA").as_char_ptr(), +- &args as *const _ as *const core::ffi::c_void, ++ &args as *const _ as *const crate::ffi::c_void, + ); + } + } +diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs +index d04c12a1426d1c..0f2765463dc840 100644 +--- a/rust/kernel/str.rs ++++ b/rust/kernel/str.rs +@@ -189,7 +189,7 @@ pub unsafe fn from_char_ptr<'a>(ptr: *const crate::ffi::c_char) -> &'a Self { + // to a `NUL`-terminated C string. + let len = unsafe { bindings::strlen(ptr) } + 1; + // SAFETY: Lifetime guaranteed by the safety precondition. +- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) }; ++ let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len) }; + // SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`. + // As we have added 1 to `len`, the last byte is known to be `NUL`. 
+ unsafe { Self::from_bytes_with_nul_unchecked(bytes) } +@@ -248,7 +248,7 @@ pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr { + /// Returns a C pointer to the string. + #[inline] + pub const fn as_char_ptr(&self) -> *const crate::ffi::c_char { +- self.0.as_ptr() as _ ++ self.0.as_ptr() + } + + /// Convert the string to a byte slice without the trailing `NUL` byte. +@@ -838,7 +838,7 @@ pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result { + // SAFETY: The buffer is valid for read because `f.bytes_written()` is bounded by `size` + // (which the minimum buffer size) and is non-zero (we wrote at least the `NUL` terminator) + // so `f.bytes_written() - 1` doesn't underflow. +- let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, (f.bytes_written() - 1) as _) }; ++ let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, f.bytes_written() - 1) }; + if !ptr.is_null() { + return Err(EINVAL); + } +diff --git a/rust/kernel/uaccess.rs b/rust/kernel/uaccess.rs +index 05b0b8d13b10da..cc044924867b89 100644 +--- a/rust/kernel/uaccess.rs ++++ b/rust/kernel/uaccess.rs +@@ -8,7 +8,7 @@ + alloc::Flags, + bindings, + error::Result, +- ffi::{c_ulong, c_void}, ++ ffi::c_void, + prelude::*, + transmute::{AsBytes, FromBytes}, + }; +@@ -224,13 +224,9 @@ pub fn read_raw(&mut self, out: &mut [MaybeUninit]) -> Result { + if len > self.length { + return Err(EFAULT); + } +- let Ok(len_ulong) = c_ulong::try_from(len) else { +- return Err(EFAULT); +- }; +- // SAFETY: `out_ptr` points into a mutable slice of length `len_ulong`, so we may write ++ // SAFETY: `out_ptr` points into a mutable slice of length `len`, so we may write + // that many bytes to it. 
+- let res = +- unsafe { bindings::copy_from_user(out_ptr, self.ptr as *const c_void, len_ulong) }; ++ let res = unsafe { bindings::copy_from_user(out_ptr, self.ptr as *const c_void, len) }; + if res != 0 { + return Err(EFAULT); + } +@@ -259,9 +255,6 @@ pub fn read(&mut self) -> Result { + if len > self.length { + return Err(EFAULT); + } +- let Ok(len_ulong) = c_ulong::try_from(len) else { +- return Err(EFAULT); +- }; + let mut out: MaybeUninit = MaybeUninit::uninit(); + // SAFETY: The local variable `out` is valid for writing `size_of::()` bytes. + // +@@ -272,7 +265,7 @@ pub fn read(&mut self) -> Result { + bindings::_copy_from_user( + out.as_mut_ptr().cast::(), + self.ptr as *const c_void, +- len_ulong, ++ len, + ) + }; + if res != 0 { +@@ -335,12 +328,9 @@ pub fn write_slice(&mut self, data: &[u8]) -> Result { + if len > self.length { + return Err(EFAULT); + } +- let Ok(len_ulong) = c_ulong::try_from(len) else { +- return Err(EFAULT); +- }; +- // SAFETY: `data_ptr` points into an immutable slice of length `len_ulong`, so we may read ++ // SAFETY: `data_ptr` points into an immutable slice of length `len`, so we may read + // that many bytes from it. +- let res = unsafe { bindings::copy_to_user(self.ptr as *mut c_void, data_ptr, len_ulong) }; ++ let res = unsafe { bindings::copy_to_user(self.ptr as *mut c_void, data_ptr, len) }; + if res != 0 { + return Err(EFAULT); + } +@@ -359,9 +349,6 @@ pub fn write(&mut self, value: &T) -> Result { + if len > self.length { + return Err(EFAULT); + } +- let Ok(len_ulong) = c_ulong::try_from(len) else { +- return Err(EFAULT); +- }; + // SAFETY: The reference points to a value of type `T`, so it is valid for reading + // `size_of::()` bytes. 
+ // +@@ -372,7 +359,7 @@ pub fn write(&mut self, value: &T) -> Result { + bindings::_copy_to_user( + self.ptr as *mut c_void, + (value as *const T).cast::(), +- len_ulong, ++ len, + ) + }; + if res != 0 { +diff --git a/samples/rust/rust_print_main.rs b/samples/rust/rust_print_main.rs +index aed90a6feecfa7..7935b4772ec6ce 100644 +--- a/samples/rust/rust_print_main.rs ++++ b/samples/rust/rust_print_main.rs +@@ -83,7 +83,7 @@ fn drop(&mut self) { + } + + mod trace { +- use core::ffi::c_int; ++ use kernel::ffi::c_int; + + kernel::declare_trace! { + /// # Safety +diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c +index 77b6ac9b5c11bc..9955c4d54e42a7 100644 +--- a/sound/core/seq/seq_clientmgr.c ++++ b/sound/core/seq/seq_clientmgr.c +@@ -678,12 +678,18 @@ static int snd_seq_deliver_single_event(struct snd_seq_client *client, + dest_port->time_real); + + #if IS_ENABLED(CONFIG_SND_SEQ_UMP) +- if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { +- if (snd_seq_ev_is_ump(event)) { ++ if (snd_seq_ev_is_ump(event)) { ++ if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { + result = snd_seq_deliver_from_ump(client, dest, dest_port, + event, atomic, hop); + goto __skip; +- } else if (snd_seq_client_is_ump(dest)) { ++ } else if (dest->type == USER_CLIENT && ++ !snd_seq_client_is_ump(dest)) { ++ result = 0; // drop the event ++ goto __skip; ++ } ++ } else if (snd_seq_client_is_ump(dest)) { ++ if (!(dest->filter & SNDRV_SEQ_FILTER_NO_CONVERT)) { + result = snd_seq_deliver_to_ump(client, dest, dest_port, + event, atomic, hop); + goto __skip; +diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c +index 14763c0f31ad9f..46a2204049993d 100644 +--- a/sound/pci/hda/hda_codec.c ++++ b/sound/pci/hda/hda_codec.c +@@ -2470,7 +2470,9 @@ int snd_hda_create_dig_out_ctls(struct hda_codec *codec, + break; + id = kctl->id; + id.index = spdif_index; +- snd_ctl_rename_id(codec->card, &kctl->id, &id); ++ err = snd_ctl_rename_id(codec->card, &kctl->id, &id); ++ 
if (err < 0) ++ return err; + } + bus->primary_dig_out_type = HDA_PCM_TYPE_HDMI; + } +diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c +index 538c37a78a56f7..84ab357b840d67 100644 +--- a/sound/pci/hda/patch_conexant.c ++++ b/sound/pci/hda/patch_conexant.c +@@ -1080,6 +1080,7 @@ static const struct hda_quirk cxt5066_fixups[] = { + SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO), + SND_PCI_QUIRK(0x103c, 0x8174, "HP Spectre x360", CXT_FIXUP_HP_SPECTRE), + SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO), ++ SND_PCI_QUIRK(0x103c, 0x8231, "HP ProBook 450 G4", CXT_FIXUP_MUTE_LED_GPIO), + SND_PCI_QUIRK(0x103c, 0x828c, "HP EliteBook 840 G4", CXT_FIXUP_HP_DOCK), + SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE), +diff --git a/sound/pci/hda/patch_cs8409-tables.c b/sound/pci/hda/patch_cs8409-tables.c +index 759f48038273df..621f947e38174d 100644 +--- a/sound/pci/hda/patch_cs8409-tables.c ++++ b/sound/pci/hda/patch_cs8409-tables.c +@@ -121,7 +121,7 @@ static const struct cs8409_i2c_param cs42l42_init_reg_seq[] = { + { CS42L42_MIXER_CHA_VOL, 0x3F }, + { CS42L42_MIXER_CHB_VOL, 0x3F }, + { CS42L42_MIXER_ADC_VOL, 0x3f }, +- { CS42L42_HP_CTL, 0x03 }, ++ { CS42L42_HP_CTL, 0x0D }, + { CS42L42_MIC_DET_CTL1, 0xB6 }, + { CS42L42_TIPSENSE_CTL, 0xC2 }, + { CS42L42_HS_CLAMP_DISABLE, 0x01 }, +@@ -315,7 +315,7 @@ static const struct cs8409_i2c_param dolphin_c0_init_reg_seq[] = { + { CS42L42_ASP_TX_SZ_EN, 0x01 }, + { CS42L42_PWR_CTL1, 0x0A }, + { CS42L42_PWR_CTL2, 0x84 }, +- { CS42L42_HP_CTL, 0x03 }, ++ { CS42L42_HP_CTL, 0x0D }, + { CS42L42_MIXER_CHA_VOL, 0x3F }, + { CS42L42_MIXER_CHB_VOL, 0x3F }, + { CS42L42_MIXER_ADC_VOL, 0x3f }, +@@ -371,7 +371,7 @@ static const struct cs8409_i2c_param dolphin_c1_init_reg_seq[] = { + { CS42L42_ASP_TX_SZ_EN, 0x00 }, + { CS42L42_PWR_CTL1, 0x0E }, + { CS42L42_PWR_CTL2, 0x84 
}, +- { CS42L42_HP_CTL, 0x01 }, ++ { CS42L42_HP_CTL, 0x0D }, + { CS42L42_MIXER_CHA_VOL, 0x3F }, + { CS42L42_MIXER_CHB_VOL, 0x3F }, + { CS42L42_MIXER_ADC_VOL, 0x3f }, +diff --git a/sound/pci/hda/patch_cs8409.c b/sound/pci/hda/patch_cs8409.c +index 614327218634c0..b760332a4e3577 100644 +--- a/sound/pci/hda/patch_cs8409.c ++++ b/sound/pci/hda/patch_cs8409.c +@@ -876,7 +876,7 @@ static void cs42l42_resume(struct sub_codec *cs42l42) + { CS42L42_DET_INT_STATUS2, 0x00 }, + { CS42L42_TSRS_PLUG_STATUS, 0x00 }, + }; +- int fsv_old, fsv_new; ++ unsigned int fsv; + + /* Bring CS42L42 out of Reset */ + spec->gpio_data = snd_hda_codec_read(codec, CS8409_PIN_AFG, 0, AC_VERB_GET_GPIO_DATA, 0); +@@ -893,13 +893,15 @@ static void cs42l42_resume(struct sub_codec *cs42l42) + /* Clear interrupts, by reading interrupt status registers */ + cs8409_i2c_bulk_read(cs42l42, irq_regs, ARRAY_SIZE(irq_regs)); + +- fsv_old = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL); +- if (cs42l42->full_scale_vol == CS42L42_FULL_SCALE_VOL_0DB) +- fsv_new = fsv_old & ~CS42L42_FULL_SCALE_VOL_MASK; +- else +- fsv_new = fsv_old & CS42L42_FULL_SCALE_VOL_MASK; +- if (fsv_new != fsv_old) +- cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv_new); ++ fsv = cs8409_i2c_read(cs42l42, CS42L42_HP_CTL); ++ if (cs42l42->full_scale_vol) { ++ // Set the full scale volume bit ++ fsv |= CS42L42_FULL_SCALE_VOL_MASK; ++ cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv); ++ } ++ // Unmute analog channels A and B ++ fsv = (fsv & ~CS42L42_ANA_MUTE_AB); ++ cs8409_i2c_write(cs42l42, CS42L42_HP_CTL, fsv); + + /* we have to explicitly allow unsol event handling even during the + * resume phase so that the jack event is processed properly +@@ -920,7 +922,7 @@ static void cs42l42_suspend(struct sub_codec *cs42l42) + { CS42L42_MIXER_CHA_VOL, 0x3F }, + { CS42L42_MIXER_ADC_VOL, 0x3F }, + { CS42L42_MIXER_CHB_VOL, 0x3F }, +- { CS42L42_HP_CTL, 0x0F }, ++ { CS42L42_HP_CTL, 0x0D }, + { CS42L42_ASP_RX_DAI0_EN, 0x00 }, + { CS42L42_ASP_CLK_CFG, 0x00 }, + { 
CS42L42_PWR_CTL1, 0xFE }, +diff --git a/sound/pci/hda/patch_cs8409.h b/sound/pci/hda/patch_cs8409.h +index 5e48115caf096b..14645d25e70fd2 100644 +--- a/sound/pci/hda/patch_cs8409.h ++++ b/sound/pci/hda/patch_cs8409.h +@@ -230,9 +230,10 @@ enum cs8409_coefficient_index_registers { + #define CS42L42_PDN_TIMEOUT_US (250000) + #define CS42L42_PDN_SLEEP_US (2000) + #define CS42L42_INIT_TIMEOUT_MS (45) ++#define CS42L42_ANA_MUTE_AB (0x0C) + #define CS42L42_FULL_SCALE_VOL_MASK (2) +-#define CS42L42_FULL_SCALE_VOL_0DB (1) +-#define CS42L42_FULL_SCALE_VOL_MINUS6DB (0) ++#define CS42L42_FULL_SCALE_VOL_0DB (0) ++#define CS42L42_FULL_SCALE_VOL_MINUS6DB (1) + + /* Dell BULLSEYE / WARLOCK / CYBORG Specific Definitions */ + +diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c +index 6c352602987bac..ffe3de617d5ddb 100644 +--- a/sound/pci/hda/patch_realtek.c ++++ b/sound/pci/hda/patch_realtek.c +@@ -3790,6 +3790,7 @@ static void alc225_init(struct hda_codec *codec) + AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE); + + msleep(75); ++ alc_update_coef_idx(codec, 0x4a, 3 << 10, 0); + alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x4); /* Hight power */ + } + } +diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c +index 8c15389c9a04bc..5585f4c8f455a5 100644 +--- a/sound/soc/fsl/fsl_micfil.c ++++ b/sound/soc/fsl/fsl_micfil.c +@@ -157,6 +157,8 @@ static int micfil_set_quality(struct fsl_micfil *micfil) + case QUALITY_VLOW2: + qsel = MICFIL_QSEL_VLOW2_QUALITY; + break; ++ default: ++ return -EINVAL; + } + + return regmap_update_bits(micfil->regmap, REG_MICFIL_CTRL2, +diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c +index 231400661c9060..50ecc5f51100ee 100644 +--- a/sound/soc/fsl/imx-audmix.c ++++ b/sound/soc/fsl/imx-audmix.c +@@ -23,7 +23,6 @@ struct imx_audmix { + struct snd_soc_card card; + struct platform_device *audmix_pdev; + struct platform_device *out_pdev; +- struct clk *cpu_mclk; + int num_dai; + struct snd_soc_dai_link 
*dai; + int num_dai_conf; +@@ -32,34 +31,11 @@ struct imx_audmix { + struct snd_soc_dapm_route *dapm_routes; + }; + +-static const u32 imx_audmix_rates[] = { +- 8000, 12000, 16000, 24000, 32000, 48000, 64000, 96000, +-}; +- +-static const struct snd_pcm_hw_constraint_list imx_audmix_rate_constraints = { +- .count = ARRAY_SIZE(imx_audmix_rates), +- .list = imx_audmix_rates, +-}; +- + static int imx_audmix_fe_startup(struct snd_pcm_substream *substream) + { +- struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream); +- struct imx_audmix *priv = snd_soc_card_get_drvdata(rtd->card); + struct snd_pcm_runtime *runtime = substream->runtime; +- struct device *dev = rtd->card->dev; +- unsigned long clk_rate = clk_get_rate(priv->cpu_mclk); + int ret; + +- if (clk_rate % 24576000 == 0) { +- ret = snd_pcm_hw_constraint_list(runtime, 0, +- SNDRV_PCM_HW_PARAM_RATE, +- &imx_audmix_rate_constraints); +- if (ret < 0) +- return ret; +- } else { +- dev_warn(dev, "mclk may be not supported %lu\n", clk_rate); +- } +- + ret = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_CHANNELS, + 1, 8); + if (ret < 0) +@@ -323,13 +299,6 @@ static int imx_audmix_probe(struct platform_device *pdev) + } + put_device(&cpu_pdev->dev); + +- priv->cpu_mclk = devm_clk_get(&cpu_pdev->dev, "mclk1"); +- if (IS_ERR(priv->cpu_mclk)) { +- ret = PTR_ERR(priv->cpu_mclk); +- dev_err(&cpu_pdev->dev, "failed to get DAI mclk1: %d\n", ret); +- return ret; +- } +- + priv->audmix_pdev = audmix_pdev; + priv->out_pdev = cpu_pdev; + +diff --git a/sound/soc/rockchip/rockchip_i2s_tdm.c b/sound/soc/rockchip/rockchip_i2s_tdm.c +index acd75e48851fcf..7feefeb6b876dc 100644 +--- a/sound/soc/rockchip/rockchip_i2s_tdm.c ++++ b/sound/soc/rockchip/rockchip_i2s_tdm.c +@@ -451,11 +451,11 @@ static int rockchip_i2s_tdm_set_fmt(struct snd_soc_dai *cpu_dai, + break; + case SND_SOC_DAIFMT_DSP_A: + val = I2S_TXCR_TFS_TDM_PCM; +- tdm_val = TDM_SHIFT_CTRL(0); ++ tdm_val = TDM_SHIFT_CTRL(2); + break; + case 
SND_SOC_DAIFMT_DSP_B: + val = I2S_TXCR_TFS_TDM_PCM; +- tdm_val = TDM_SHIFT_CTRL(2); ++ tdm_val = TDM_SHIFT_CTRL(4); + break; + default: + ret = -EINVAL; +diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c +index b55eb977e443d4..70b7bfb080f473 100644 +--- a/sound/soc/sof/ipc4-topology.c ++++ b/sound/soc/sof/ipc4-topology.c +@@ -765,10 +765,16 @@ static int sof_ipc4_widget_setup_comp_dai(struct snd_sof_widget *swidget) + } + + list_for_each_entry(w, &sdev->widget_list, list) { +- if (w->widget->sname && ++ struct snd_sof_dai *alh_dai; ++ ++ if (!WIDGET_IS_DAI(w->id) || !w->widget->sname || + strcmp(w->widget->sname, swidget->widget->sname)) + continue; + ++ alh_dai = w->private; ++ if (alh_dai->type != SOF_DAI_INTEL_ALH) ++ continue; ++ + blob->alh_cfg.device_count++; + } + +@@ -2061,11 +2067,13 @@ sof_ipc4_prepare_copier_module(struct snd_sof_widget *swidget, + list_for_each_entry(w, &sdev->widget_list, list) { + u32 node_type; + +- if (w->widget->sname && ++ if (!WIDGET_IS_DAI(w->id) || !w->widget->sname || + strcmp(w->widget->sname, swidget->widget->sname)) + continue; + + dai = w->private; ++ if (dai->type != SOF_DAI_INTEL_ALH) ++ continue; + alh_copier = (struct sof_ipc4_copier *)dai->private; + alh_data = &alh_copier->data; + node_type = SOF_IPC4_GET_NODE_TYPE(alh_data->gtw_cfg.node_id); +diff --git a/sound/soc/sof/pcm.c b/sound/soc/sof/pcm.c +index 35a7462d8b6938..c5c6353f18ceef 100644 +--- a/sound/soc/sof/pcm.c ++++ b/sound/soc/sof/pcm.c +@@ -511,6 +511,8 @@ static int sof_pcm_close(struct snd_soc_component *component, + */ + } + ++ spcm->stream[substream->stream].substream = NULL; ++ + return 0; + } + +diff --git a/sound/soc/sof/stream-ipc.c b/sound/soc/sof/stream-ipc.c +index 794c7bbccbaf92..8262443ac89ad1 100644 +--- a/sound/soc/sof/stream-ipc.c ++++ b/sound/soc/sof/stream-ipc.c +@@ -43,7 +43,7 @@ int sof_ipc_msg_data(struct snd_sof_dev *sdev, + return -ESTRPIPE; + + posn_offset = stream->posn_offset; +- } else { ++ } else if 
(sps->cstream) { + + struct sof_compr_stream *sstream = sps->cstream->runtime->private_data; + +@@ -51,6 +51,10 @@ int sof_ipc_msg_data(struct snd_sof_dev *sdev, + return -ESTRPIPE; + + posn_offset = sstream->posn_offset; ++ ++ } else { ++ dev_err(sdev->dev, "%s: No stream opened\n", __func__); ++ return -EINVAL; + } + + snd_sof_dsp_mailbox_read(sdev, posn_offset, p, sz);