BCA is a big set / family of devices sharing multiple hardware blocks.
It covers BCM4908, BCM63xx, BCM68xx devices and more.
Most of drivers that depend on ARCH_BCM4908 should actually depend on
ARCH_BCMBCA. To make such transition easier, cleaner and breakage-free
add a proper "select".
Later on - if we decide to keep ARCH_BCM4908 - it may be moved under
ARCH_BCMBCA menu. Keeping it may be helpful for limited compiling of DTS
files and "default" Kconfig entires. Or we may just decide to drop it.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: William Zhang <william.zhang@broadcom.com>
Acked-by: Kursad Oney <kursad.oney@broadcom.com>
Link: https://lore.kernel.org/r/20220714045858.7786-1-zajec5@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
This reverts commit 4a47c6385b.
Now that we have a proper fix for POSIX ACLs with overlayfs on top of
idmapped layers revert the temporary fix.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
During permission checking overlayfs will call
ovl_permission()
-> generic_permission()
-> acl_permission_check()
-> check_acl()
-> get_acl()
-> inode->i_op->get_acl() == ovl_get_acl()
-> get_acl() /* on the underlying filesystem */
-> inode->i_op->get_acl() == /*lower filesystem callback */
-> posix_acl_permission()
passing through the get_acl() request to the underlying filesystem.
Before returning these values to the VFS we need to take the idmapping of the
relevant layer into account and translate any ACL_{GROUP,USER} values according
to the idmapped mount.
We cannot alter the ACLs returned from the relevant layer directly as that
would alter the cached values filesystem wide for the lower filesystem. Instead
we can clone the ACLs and then apply the relevant idmapping of the layer.
This is obviously only relevant when idmapped layers are used.
Link: https://lore.kernel.org/r/20220708090134.385160-4-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.
I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.
Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.
The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:
setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:
mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
mkdir -pv /vol/contpool/ctrover/{over,work}
chown 10000000:10000000 /vol/contpool/ctrover/{over,work}
The user now creates an idmapped mount for the rootfs:
mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
/var/lib/lxc/c2/rootfs \
/vol/contpool/lowermap
This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.
Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.
mount -t overlay overlay \
-o lowerdir=/vol/contpool/lowermap, \
upperdir=/vol/contpool/overmap/over, \
workdir=/vol/contpool/overmap/work \
/vol/contpool/merge
The user can do this in two ways:
(1) Mount overlayfs in the initial user namespace and expose it to the
container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
user namespace.
Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.
Now the user tries to retrieve the POSIX ACLs using the getfacl command
getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc
and to their surprise they see:
# file: vol/contpool/merge/home/ubuntu/.bashrc
# owner: 1000
# group: 1000
user::rw-
user:4294967295:rwx
group::r--
mask::rwx
other::r--
indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
|> vfs_getxattr()
| -> __vfs_getxattr()
| -> handler->get == ovl_posix_acl_xattr_get()
| -> ovl_xattr_get()
| -> vfs_getxattr()
| -> __vfs_getxattr()
| -> handler->get() /* lower filesystem callback */
|> posix_acl_fix_xattr_to_user()
{
4 = make_kuid(&init_user_ns, 4);
4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
/* FAILURE */
-1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
}
If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
|> vfs_getxattr()
| -> __vfs_getxattr()
| -> handler->get == ovl_posix_acl_xattr_get()
| -> ovl_xattr_get()
| -> vfs_getxattr()
| -> __vfs_getxattr()
| -> handler->get() /* lower filesystem callback */
|> posix_acl_fix_xattr_to_user()
{
4 = make_kuid(&init_user_ns, 4);
4 = mapped_kuid_fs(&init_user_ns, 4);
/* FAILURE */
-1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
}
As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.
This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:
setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.
If the user chooses option (1) we get the following callchain for fscaps:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
-> vfs_getxattr()
-> xattr_getsecurity()
-> security_inode_getsecurity() ________________________________
-> cap_inode_getsecurity() | |
{ V |
10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000); |
10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); |
/* Expected result is 0 and thus that we own the fscap. */ |
0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); |
} |
-> vfs_getxattr_alloc() |
-> handler->get == ovl_other_xattr_get() |
-> vfs_getxattr() |
-> xattr_getsecurity() |
-> security_inode_getsecurity() |
-> cap_inode_getsecurity() |
{ |
0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); |
10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000); |
|____________________________________________________________________|
}
-> vfs_getxattr_alloc()
-> handler->get == /* lower filesystem callback */
And if the user chooses option (2) we get:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
-> vfs_getxattr()
-> xattr_getsecurity()
-> security_inode_getsecurity() _______________________________
-> cap_inode_getsecurity() | |
{ V |
10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0); |
10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); |
/* Expected result is 0 and thus that we own the fscap. */ |
0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); |
} |
-> vfs_getxattr_alloc() |
-> handler->get == ovl_other_xattr_get() |
|-> vfs_getxattr() |
-> xattr_getsecurity() |
-> security_inode_getsecurity() |
-> cap_inode_getsecurity() |
{ |
0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); |
10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
|____________________________________________________________________|
}
-> vfs_getxattr_alloc()
-> handler->get == /* lower filesystem callback */
We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.
For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.
The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().
To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().
Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
|> vfs_getxattr()
| |> __vfs_getxattr()
| | -> handler->get == ovl_posix_acl_xattr_get()
| | -> ovl_xattr_get()
| | -> vfs_getxattr()
| | |> __vfs_getxattr()
| | | -> handler->get() /* lower filesystem callback */
| | |> posix_acl_getxattr_idmapped_mnt()
| | {
| | 4 = make_kuid(&init_user_ns, 4);
| | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
| | 10000004 = from_kuid(&init_user_ns, 10000004);
| | |_______________________
| | } |
| | |
| |> posix_acl_getxattr_idmapped_mnt() |
| { |
| V
| 10000004 = make_kuid(&init_user_ns, 10000004);
| 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
| 10000004 = from_kuid(&init_user_ns, 10000004);
| } |_________________________________________________
| |
| |
|> posix_acl_fix_xattr_to_user() |
{ V
10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
/* SUCCESS */
4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
}
And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:
idmapped mount /vol/contpool/merge: 0:10000000:65536
caller's idmapping: 0:10000000:65536
overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr()
-> path_getxattr()
-> getxattr()
-> do_getxattr()
|> vfs_getxattr()
| |> __vfs_getxattr()
| | -> handler->get == ovl_posix_acl_xattr_get()
| | -> ovl_xattr_get()
| | -> vfs_getxattr()
| | |> __vfs_getxattr()
| | | -> handler->get() /* lower filesystem callback */
| | |> posix_acl_getxattr_idmapped_mnt()
| | {
| | 4 = make_kuid(&init_user_ns, 4);
| | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
| | 10000004 = from_kuid(&init_user_ns, 10000004);
| | |_______________________
| | } |
| | |
| |> posix_acl_getxattr_idmapped_mnt() |
| { V
| 10000004 = make_kuid(&init_user_ns, 10000004);
| 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
| 10000004 = from_kuid(0(&init_user_ns, 10000004);
| |_________________________________________________
| } |
| |
|> posix_acl_fix_xattr_to_user() |
{ V
10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
/* SUCCESS */
4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
}
The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:
ovl_permission()
-> generic_permission()
-> acl_permission_check()
-> check_acl()
-> get_acl()
-> inode->i_op->get_acl() == ovl_get_acl()
> get_acl() /* on the underlying filesystem)
->inode->i_op->get_acl() == /*lower filesystem callback */
-> posix_acl_permission()
passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.
The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.
Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Bring in Miklos' tree which contains the temporary fix for POSIX ACLs
with overlayfs on top of idmapped layers. We will add a proper fix on
top of it and then revert the temporary fix.
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
If the PNP0D80 device is present and its _DSM appears to be valid,
there is no reason to avoid using it even if ACPI_FADT_LOW_POWER_S0
is unset in the FADT, because suspend-to-idle may be the only way to
suspend the system if S3 is not supported by the platform, so do not
return early from lps0_device_attach() in that case.
However, still check ACPI_FADT_LOW_POWER_S0 when deciding whether or
not suspend-to-idle should be the default system suspend method.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Revert commit 1cdda9486f ("ACPI / PM: LPIT: Register sysfs attributes
based on FADT"), because what it did was more confusing than it would
be to allow the sysfs attributes in question to be created regardless
of whether or not the relevant flag was set in the FADT.
If ACPI_FADT_LOW_POWER_S0 is not set, it need not mean that LPIT is
invalid and low-power S0 idle is not usable. It merely means that
using S3 on the given system is more beneficial from the energy
saving perspective than using low-power S0 idle.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
This driver is currently broken, it does not show the in0_input sysfs
file and also returns the following message on startup:
hwmon_device_register() is deprecated. Please convert the driver to
use hwmon_device_register_with_info().
This patch converts the driver and also cleans up the 'read' function.
Signed-off-by: Marc Ferland <ferlandm@amotus.ca>
Link: https://lore.kernel.org/r/20220712193504.1374656-1-ferlandm@amotus.ca
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid printing a warning when modules do not exercise any
errata-dependent behavior and the SiFive errata are enabled.
- A fix to the Microchip PFSOC to attach the L2 cache to the CPU nodes.
* tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: don't warn for sifive erratas in modules
riscv: dts: microchip: hook up the mpfs' l2cache
bootgraph:
- fix parsing of /proc/version to be much more flexible
- check kernel version to disallow ftrace on anything older than 4.10
sleepgraph:
- include fix to bugzilla 212761 in case it regresses
- fix for -proc bug: https://github.com/intel/pm-graph/pull/20
- add -debugtiming arg to get timestamps on prints
- allow use of the netfix tool hosted in the github repo
- read s0ix data from pmc_core for better debug
- include more system data in the output log
- Do a better job testing input files useability
- flag more error data from dmesg in the timeline
- pre-parse the trace log to fix any ordering issues
- add new parser to process dmesg only timelines
- remove superflous sleep(5) in multitest mode
config/custom-timeline-functions.cfg:
- change some names to keep up to date
README:
- new version, small wording changes
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull KVM fixes from Paolo Bonzini:
"RISC-V:
- Fix missing PAGE_PFN_MASK
- Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
x86:
- Fix for nested virtualization when TSC scaling is active
- Estimate the size of fastcc subroutines conservatively, avoiding
disastrous underestimation when return thunks are enabled
- Avoid possible use of uninitialized fields of 'struct
kvm_lapic_irq'
Generic:
- Mark as such the boolean values available from the statistics file
descriptors
- Clarify statistics documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: emulate: do not adjust size of fastop and setcc subroutines
KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
Documentation: kvm: clarify histogram units
kvm: stats: tell userspace which values are boolean
x86/kvm: fix FASTOP_SIZE when return thunks are enabled
KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
riscv: Fix missing PAGE_PFN_MASK
Pull ceph fix from Ilya Dryomov:
"A folio locking fixup that Xiubo and David cooperated on, marked for
stable. Most of it is in netfs but I picked it up into ceph tree on
agreement with David"
* tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client:
netfs: do not unlock and put the folio twice
Pull spi fixes from Mark Brown:
"A few driver specific fixes, none especially remarkable, plus a
MAINTAINERS file update due to the previous maintainer for the NXP
FSPI driver having left the company"
* tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cadence-quadspi: Remove spi_master_put() in probe failure path
MAINTAINERS: change the NXP FSPI driver maintainer.
spi: amd: Limit max transfer and message size
spi: aspeed: Fix division by zero
spi: aspeed: Add dev_dbg() to dump the spi-mem direct mapping descriptor
With the new design in place, the show() and store() callbacks check if
the policy is active or not before proceeding any further to avoid
potential races. And in order to guarantee that cpufreq_policy_free()
must be called after clearing the policy->cpus mask, i.e. by marking the
policy inactive.
In order to avoid introducing a bug around this later, print a warning
message if we end up freeing an active policy.
Also update cpufreq_online() a bit to make sure we clear the cpus mask
for each error case before calling cpufreq_policy_free().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The SCMI v3.1 adds support for power values in micro-Watts. They are not
always in milli-Watts anymore (ignoring the bogo-Watts). Thus, the power
must be converted conditionally before sending to Energy Model. Add the
logic which handles the needed checks and conversions.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In SCMI v3.1 the power scale can be in micro-Watts. The upper layers, e.g.
cpufreq and EM should handle received power values properly (upscale when
needed). Thus, provide an interface which allows to check what is the
scale for power values. The old interface allowed to distinguish between
bogo-Watts and milli-Watts only (which was good for older SCMI spec).
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The EM now uses the micro-Watts scale for the power values. Update
related documentation to reflect that fact.
Fix also a problematic sentence in the doc "to:" which triggers test
scripts complaining about wrong email address.
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.
Update all EM users which access 'power' field and assume the value is
in milli-Watts.
Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.
Example calculation which shows the rounding error and impact:
power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz
power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18
power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21
max_freq = 2000MHz
cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000
cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203
The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM SoC fixes from Arnd Bergmann:
"Most of the contents are bugfixes for the devicetree files:
- A Qualcomm MSM8974 pin controller regression, caused by a cleanup
patch that gets partially reverted here.
- Missing properties for Broadcom BCM49xx to fix timer detection and
SMP boot.
- Fix touchscreen pinctrl for imx6ull-colibri board
- Multiple fixes for Rockchip rk3399 based machines including the vdu
clock-rate fix, otg port fix on Quartz64-A and ethernet on
Quartz64-B
- Fixes for misspelled DT contents causing minor problems on
imx6qdl-ts7970m, orangepi-zero, sama5d2, kontron-kswitch-d10, and
ls1028a
And a couple of changes elsewhere:
- Fix binding for Allwinner D1 display pipeline
- Trivial code fixes to the TEE and reset controller driver
subsystems and the rockchip platform code.
- Multiple updates to the MAINTAINERS files, marking the Palm Treo
support as orphaned, and fixing some entries for added or changed
file names"
* tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
ARM: dts: at91: sama5d2: Fix typo in i2s1 node
tee: tee_get_drvdata(): fix description of return value
optee: Remove duplicate 'of' in two places.
ARM: dts: kswitch-d10: use open drain mode for coma-mode pins
ARM: dts: colibri-imx6ull: fix snvs pinmux group
optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR()
MAINTAINERS: add polarfire rng, pci and clock drivers
MAINTAINERS: mark ARM/PALM TREO SUPPORT orphan
ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
arm64: dts: ls1028a: Update SFP node to include clock
dt-bindings: display: sun4i: Fix D1 pipeline count
ARM: dts: qcom: msm8974: re-add missing pinctrl
reset: Fix devm bulk optional exclusive control getter
MAINTAINERS: rectify entry for SYNOPSYS AXS10x RESET CONTROLLER DRIVER
ARM: rockchip: Add missing of_node_put() in rockchip_suspend_init()
arm64: dts: rockchip: Assign RK3399 VDU clock rate
arm64: dts: rockchip: Fix Quartz64-A dwc3 otg port behavior
...
This reverts commit 253bf57555.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 4076942021.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 8ee922689d.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 48b36a602a.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- Fix brightness key events getting reported twice on some Dells.
Regression caused by recent Panasonic hotkey fixes
- Fix poweroff no longer working on some devices regression caused
by recent poweroff handler rework
- Mark new (in 5.19) Intel IFS driver as broken, because of some
issues surrounding the userspace (sysfs) API which need to be
cleared up
- Some hardware-id / quirk additions"
* tag 'platform-drivers-x86-v5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
ACPI: video: Fix acpi_video_handles_brightness_key_presses()
platform/x86: intel_atomisp2_led: Also turn off the always-on camera LED on the Asus T100TAF
platform/x86/intel/ifs: Mark as BROKEN
platform/x86: asus-wmi: Add key mappings
efi: Fix efi_power_off() not being run before acpi_power_off() when necessary
platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 poweroff again
platform/x86: gigabyte-wmi: add support for B660I AORUS PRO DDR4
platform/x86/amd/pmc: Add new platform support
platform/x86/amd/pmc: Add new acpi id for PMC controller
Pull xen fix from Juergen Gross:
"Fix for the Xen gntdev driver causing inappropriate WARN() messages"
* tag 'for-linus-5.19a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
Pull drm fixes from Dave Airlie:
"This is the regular fixes pull for this week. This has a bunch of
amdgpu fixes, major one reverts the buddy allocator until it can be
tested more, otherwise just small ones, then i915 has a bunch of
fixes.
The outstanding firmware regressions reported by phoronix will
hopefully be dealt with ASAP.
amdgpu:
- revert buddy allocator support for now
- DP MST blank screen fix for specific platforms
- MEC firmware check fix for GC 10.3.7
- Deep color fix for DCE
- Fix possible divide by 0
- Coverage blend mode fix
- Fix cursor only commit timestamps
i915:
- Selftest fix
- TTM fix sg_table construction
- Error return fixes
- Fix a performance regression related to waitboost
- Fix GT resets"
* tag 'drm-fixes-2022-07-15' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Ensure valid event timestamp for cursor-only commits
drm/amd/display: correct check of coverage blend mode
drm/amd/pm: Prevent divide by zero
drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7
drm/amd/display: Ignore First MST Sideband Message Return Error
drm/i915/selftests: fix subtraction overflow bug
drm/i915/gem: Look for waitboosting across the whole object prior to individual waits
drm/i915/gt: Serialize TLB invalidates with GT resets
drm/i915/gt: Serialize GRDOM access between multiple engine resets
drm/i915/ttm: fix sg_table construction
drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
drm/i915: Fix vm use-after-free in vma destruction
drm/i915/guc: ADL-N should use the same GuC FW as ADL-S
drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
Revert "drm/amdgpu: add drm buddy support to amdgpu"
Pyll sysctl fix from Luis Chamberlain:
"Only one fix for sysctl"
* tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
06781a5026 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
though: It is calculated based on the maximum page read time, but the
timeout is also used for page write and block erase operations which
require orders of magnitude bigger timeouts.
Fix this by calculating busy_timeout_cycles from the maximum of
tBERS_max and tPROG_max.
This is for now the easiest and most obvious way to fix the driver.
There's room for improvements though: The NAND_OP_WAITRDY_INSTR tells us
the desired timeout for the current operation, so we could program the
timeout dynamically for each operation instead of setting a fixed
timeout. Also we could wire up the interrupt handler to actually detect
and forward timeouts occurred when waiting for the chip being ready.
As a sidenote I verified that the change in 06781a5026 is really
correct. I wired up the interrupt handler in my tree and measured the
time between starting the operation and the timeout interrupt handler
coming in. The time increases 41us with each step in the timeout
register which corresponds to 4096 clock cycles with the 99MHz clock
that I have.
Fixes: 06781a5026 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
Fixes: b120612206 ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Han Xu <han.xu@nxp.com>
Tested-by: Tomasz Moń <tomasz.mon@camlingroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
[Why]
Changes from "Fix for dmub outbox notification enable" need to land
in DM or DMUB outbox notification would be disabled.
[How]
Enable outbox notification only after interrupt are enabled and IRQ
handlers registered. Any pending notification will be sent by DMUB
once outbox notification is enabled.
Fixes: ed72087064 ("drm/amd/display: Fix for dmub outbox notification enable")
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull a cpufreq ARM fix for 5.19-rc7 from Viresh Kumar:
- mediatek: Handle sram regulator probe deferral
* tag 'cpufreq-arm-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek: Handle sram regulator probe deferral
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode to 16.
It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.
Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0b: x86/kvm: fix FASTOP_SIZE when return thunks are enabled
Cc: stable@vger.kernel.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Biao Huang says:
====================
stmmac: dwmac-mediatek: fix clock issue
changes in v5:
1. add reivewd-by as Matthias's comments.
2. fix "warning: unused variable 'ret' [-Wunused-variable]" as Jakub's comments
changes in v4:
1. improve commit message and test ko insertion/remove as Matthias's comments.
2. add patch "net: stmmac: fix pm runtime issue in stmmac_dvr_remove()" to
fix vlan filter deletion issue.
3. add patch "net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow"
to fix unbalanced ptp clock issue in suspend/resume flow.
changes in v3:
1. delete mediatek_dwmac_exit() since there is no operation in it,
as Matthias's comments.
changes in v2:
1. clock configuration is still needed in probe,
and invoke mediatek_dwmac_clks_config() instead.
2. update commit message.
v1:
remove duplicated clock configuration in init/exit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current stmmac driver will prepare/enable ptp_ref clock in
stmmac_init_tstamp_counter().
The stmmac_pltfr_noirq_suspend will disable it once in suspend flow.
But in resume flow,
stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter
stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter
ptp_ref clock reference counter increases twice, which leads to unbalance
ptp clock when resume back.
Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it.
Fixes: 0735e639f1 ("net: stmmac: skip only stmmac_ptp_register when resume from suspend")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If netif is running when stmmac_dvr_remove is invoked,
the unregister_netdev will call ndo_stop(stmmac_release) and
vlan_kill_rx_filter(stmmac_vlan_rx_kill_vid).
Currently, stmmac_dvr_remove() will disable pm runtime before
unregister_netdev. When stmmac_vlan_rx_kill_vid is invoked,
pm_runtime_resume_and_get in it returns EACCESS error number,
and reports:
dwmac-mediatek 11021000.ethernet eth0: stmmac_dvr_remove: removing driver
dwmac-mediatek 11021000.ethernet eth0: FPE workqueue stop
dwmac-mediatek 11021000.ethernet eth0: failed to kill vid 0081/0
Move the pm_runtime_disable to the end of stmmac_dvr_remove
to fix this issue.
Fixes: 6449520391 ("net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pm_runtime takes care of the clock handling in current
stmmac drivers, and dwmac-mediatek implement the
mediatek_dwmac_clks_config() as the callback for pm_runtime.
Then, stripping duplicated clocks handling in old init()/exit()
to fix clock issue in suspend/resume test.
As to clocks in probe/remove, vendor need symmetric handling to
ensure clocks balance.
Test pass, including suspend/resume and ko insertion/remove.
Fixes: 3186bdad97 ("stmmac: dwmac-mediatek: add platform level clocks management")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima says:
====================
sysctl: Fix data-races around ipv4_net_table (Round 2).
This series fixes data-races around 15 knobs after ip_default_ttl in
ipv4_net_table.
These two knobs are skipped.
- ip_local_port_range is safe with its own lock.
- ip_local_reserved_ports uses proc_do_large_bitmap(), which will need
an additional lock and can be fixed later.
So, the next round will start with igmp_link_local_mcast_reports.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_probe_interval, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 05cbc0db03 ("ipv4: Create probe timer for tcp PMTU as per RFC4821")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_probe_threshold, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 6b58e0a5f3 ("ipv4: Use binary search to choose tcp PMTU probe_size")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_mtu_probe_floor, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: c04b79b6cf ("tcp: add new tcp_mtu_probe_floor sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>