This series reshuffles things around, moving the management of SMD RPM
bus clocks to the interconnect framework where they belong. This helps
us solve a couple of issues:
1. We can work towards unused clk cleanup of RPMCC without worrying
about it killing some NoC bus, resulting in the SoC dying.
Deasserting actually unused RPM clocks (among other things) will
let us achieve "true SoC-wide power collapse states", also known as
VDD_LOW and VDD_MIN.
2. We no longer have to keep tons of quirky bus clock ifs in the icc
driver. You either have a RPM clock and call "rpm set rate" or you
have a single non-RPM clock (like AHB_CLK_SRC) or you don't have any.
3. There's less overhead - instead of going through layers and layers of
the CCF, ratesetting comes down to calling max() and sending a single
RPM message. ICC is very very dynamic so that's a big plus.
The clocks still need to be vaguely described in the clk-smd-rpm driver,
as it gives them an initial kickoff, before actually telling RPM to
enable DVFS scaling. After RPM receives that command, all clocks that
have not been assigned a rate are considered unused and are shut down
in hardware, leading to the same issue as described in point 1.
We can consider marking them __initconst in the future, but this series
is very fat even without that..
Apart from that, it squashes a couple of bugs that really need fixing..
The series is merged through a topic branch to manage the dependencies
between interconnect, Qualcomm clocks and Qualcomm SoC.
Up until now, we've been aggregating the bandwidth values and only
dividing them by the bus width of the source node. This was completely
wrong, as different nodes on a given path may (and usually do) have
varying bus widths. That in turn, resulted in the calculated clock rates
being completely bogus - usually they ended up being much higher, as
NoC_A<->NoC_B links are very wide.
Since we're not using the aggregate bandwidth value for anything other
than clock rate calculations, remodel qcom_icc_bus_aggregate() to
calculate the per-context clock rate for a given provider, taking into
account the bus width of every individual node.
Fixes: 30c8fa3ec6 ("interconnect: qcom: Add MSM8916 interconnect provider driver")
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-22-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Up until now, for some reason we've only been setting bandwidth values
on the active-only context. That pretty much meant that RPM could lift
all votes when entering sleep mode. Or never sleep at all.
That in turn could potentially break things like USB wakeup, as the
connection between APSS and SNoC/PNoC would simply be dead.
Set the values appropriately.
Fixes: 30c8fa3ec6 ("interconnect: qcom: Add MSM8916 interconnect provider driver")
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-20-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
SMD RPM only provides two buckets, one each for the active-only and
active-sleep RPM contexts. Use the correct constant to allocate and
operate on them.
This will make the qcom,icc.h header no longer work with this driver,
mostly because.. it was never meant to! The commit that introduced
bucket support to SMD RPM was trying to shove a square into a round
hole and it did not work out very well. That said, there are no
active users of SMD RPM ICC + qcom,icc.h, so that doesn't hurt.
Fixes: dcbce7b0a7 ("interconnect: qcom: icc-rpm: Support multiple buckets")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-19-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
The interconnect bus clocks are now handled within the ICC framework.
They still however need to get a kickstart *before* we call
clk_smd_rpm_enable_scaling(), or RPM will assume that they should all
be running at 0 kHz and the system will inevitably die.
Separate them out to ensure such a kickstart can still take place.
As a happy accident, the file got smaller:
Total: Before=41951, After=41555, chg -0.94%
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-18-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
The sole purpose of bus clocks that were previously registered with
rpmcc was to convey the aggregated bandwidth to RPM. There's no good
reason to keep them outside the interconnect framework, as it only
adds to the plentiful complexity.
Add the required code to handle these clocks from within SMD RPM ICC.
RPM-owned bus clocks are no longer considered a thing, but sadly we
have to allow for the existence of HLOS-owned bus clocks, as some
(mostly older) SoCs (ab)use these for bus scaling (e.g. MSM8998 and
&mmcc AHB_CLK_SRC).
This in turn is trivially solved with a single *clk, which is filled
and used iff qp.bus_clk_desc is absent and we have a "bus" clock-names
entry in the DT node.
This change should(tm) be fully compatible with all sorts of old
Device Trees as far as the interconnect functionality goes (modulo
abusing bus clock handles or wrongly using the qcom,icc.h binding,
but that's a mistake in and of itself).
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-17-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
The downstream kernel employs the concept of "keeping the bus alive"
by voting for the minimum (XO/19.2MHz) rate at all times on certain
(well, most) buses. This is a very important thing to have, as if we
either have a lackluster/wrong DT that doesn't specify a (high enough)
vote on a certain bus, we may lose access to the entire bus altogether.
This is very apparent when we only start introducing interconnect
support on a given platform and haven't yet introduced voting on all
peripherals.
The same can happen if we only have a single driver casting a vote on
a certain bus and that driver exits/crashes/suspends.
The keepalive vote is limited to the ACTIVE bucket, as keeping a
permanent vote on the SLEEP one could prevent the platform from properly
entering low power mode states.
Introduce the very same concept, with a slight twist: the vendor
kernel checks whether the rate is zero before setting the minimum
vote, but that's rather silly, as in doing so we're at the mercy
of CCF. Instead, explicitly clamp the rates to always be >= 19.2 MHz
for providers with keep_alive=true.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-6-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
icc_smd_rpm will do bus clock votes itself rather than taking the
unnecessary detour through the clock subsystem. However, it can only
do that after the clocks have been handed off and scaling has been
enabled in the RPM in clk-smd-rpm.
Move the icc_smd_rpm registration from smd-rpm.c to clk-smd-rpm.c
to avoid any possible races. icc_smd_rpm gets the driver data from
the smd-rpm device, so still register the platform device on the
smd-rpm parent device.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
[Konrad: remove unrelated cleanups]
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-5-09c78c175546@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc848 ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>