RPMH and drivers that use RPMH APIs need Command DB API to find the
dynamic resource information. Let's match the RPMH to match the Command
DB configuration.
This should fix undefined symbol references reported by CI :
aarch64-linux-ld: drivers/clk/qcom/clk-rpmh.o: in function `clk_rpmh_probe':
>> clk-rpmh.c:(.text+0xac): undefined reference to `cmd_db_read_addr'
>> aarch64-linux-ld: clk-rpmh.c:(.text+0xc0): undefined reference to `cmd_db_read_aux_data'
aarch64-linux-ld: drivers/soc/qcom/rpmh-rsc.o: in function `rpmh_rsc_probe':
>> rpmh-rsc.c:(.text+0x42c): undefined reference to `cmd_db_ready'
aarch64-linux-ld: drivers/regulator/qcom-rpmh-regulator.o: in function `rpmh_regulator_probe':
>> qcom-rpmh-regulator.c:(.text+0x3e0): undefined reference to `cmd_db_read_addr'
Cc: Todd Kjos <tkjos@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Maulik Shah <mkshah@codeaurora.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Link: https://lore.kernel.org/r/20201008040907.7036-1-ilina@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The functions geni_se_select_fifo_mode() and
geni_se_select_fifo_mode() are a little funny. They read/write a
bunch of memory mapped registers even if they don't change or aren't
relevant for the current protocol. Let's make them a little more
sane. We'll also add a comment explaining why we don't do some of the
operations for UART.
NOTE: there is no evidence at all that this makes any performance
difference and it fixes no bugs. However, it seems (to me) like it
makes the functions a little easier to understand. Decreasing the
amount of times we read/write memory mapped registers is also nice,
even if we are using "relaxed" variants.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20201013142448.v2.3.I646736d3969dc47de8daceb379c6ba85993de9f4@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
This reverts commit 02b9aec592.
As talked about in the patch ("soc: qcom: geni: More properly switch
to DMA mode"), swapping the order of geni_se_setup_m_cmd() and
geni_se_xx_dma_prep() can sometimes cause corrupted transfers. Thus
we traded one problem for another. Now that we've debugged the
problem further and fixed the geni helper functions to more disable
FIFO interrupts when we move to DMA mode we can revert it and end up
with (hopefully) zero problems!
To be explicit, the patch ("soc: qcom: geni: More properly switch
to DMA mode") is a prerequisite for this one.
Fixes: 02b9aec592 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20201013142448.v2.2.I7b22281453b8a18ab16ef2bfd4c641fb1cc6a92c@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
On geni-i2c transfers using DMA, it was seen that if you program the
command (I2C_READ) before calling geni_se_rx_dma_prep() that it could
cause interrupts to fire. If we get unlucky, these interrupts can
just keep firing (and not be handled) blocking further progress and
hanging the system.
In commit 02b9aec592 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
we avoided that by making sure we didn't program the command until
after geni_se_rx_dma_prep() was called. While that avoided the
problems, it also turns out to be invalid. At least in the TX case we
started seeing sporadic corrupted transfers. This is easily seen by
adding an msleep() between the DMA prep and the writing of the
command, which makes the problem worse. That means we need to revert
that commit and find another way to fix the bogus IRQs.
Specifically, after reverting commit 02b9aec592 ("i2c:
i2c-qcom-geni: Fix DMA transfer race"), I put some traces in. I found
that the when the interrupts were firing like crazy:
- "m_stat" had bits for M_RX_IRQ_EN, M_RX_FIFO_WATERMARK_EN set.
- "dma" was set.
Further debugging showed that I could make the problem happen more
reliably by adding an "msleep(1)" any time after geni_se_setup_m_cmd()
ran up until geni_se_rx_dma_prep() programmed the length.
A rather simple fix is to change geni_se_select_dma_mode() so it's a
true inverse of geni_se_select_fifo_mode() and disables all the FIFO
related interrupts. Now the problematic interrupts can't fire and we
can program things in the correct order without worrying.
As part of this, let's also change the writel_relaxed() in the prepare
function to a writel() so that our DMA is guaranteed to be prepared
now that we can't rely on geni_se_setup_m_cmd()'s writel().
NOTE: the only current user of GENI_SE_DMA in mainline is i2c.
Fixes: 37692de5d5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Fixes: 02b9aec592 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20201013142448.v2.1.Ifdb1b69fa3367b81118e16e9e4e63299980ca798@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Older SoCs like MSM8916, MSM8939, MSM8974, MSM8996, ...
use "voltage corners" instead of "voltage levels".
It seems like they all use exactly the same set of corner values,
a value from 0-6 where 6 is the maximum corner (super turbo).
In preparation to add the power domains for MSM8916, rename
MAX_8996_RPMPD_STATE to MAX_CORNER_RPMPD_STATE to make it clear
that this is the max_state to be used for all SoCs using corners. -
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20200916104135.25085-2-stephan@gerhold.net
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Commit efde2659b0 ("drivers: qcom: rpmh-rsc: Use rcuidle tracepoints
for rpmh") was written to fix a bug seen in an unmerged series that
implemented a struct generic_pm_domain::power_off() callback calling
rpmh_flush(). See stack trace below.
Call trace:
dump_backtrace+0x0/0x174
show_stack+0x20/0x2c
dump_stack+0xc8/0x124
lockdep_rcu_suspicious+0xe4/0x104
__tcs_buffer_write+0x230/0x2d0
rpmh_rsc_write_ctrl_data+0x210/0x270
rpmh_flush+0x84/0x24c
rpmh_domain_power_off+0x78/0x98
_genpd_power_off+0x40/0xc0
genpd_power_off+0x168/0x208
Later the final merged solution is to use CPU PM notification to invoke
rpmh_flush() and power_off() callback of genpd is not implemented in the
driver.
CPU PM notifiers are run with RCU enabled/watching (see cpu_pm_notify()
and how it calls rcu_irq_enter_irqson() before calling the notifiers).
Remove this change since RCU will not be idle during CPU PM notifications
hence not required to use _rcuidle tracepoint. Using _rcuidle tracepoint
prevented rpmh driver to be loadable module as these are not exported
symbols.
This reverts commit efde2659b0.
Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/1601877596-32676-2-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Older chipsets may not be allowed to configure certain LLCC registers
as that is handled by the secure side software. However, this is not
the case for newer chipsets and they must configure these registers
according to the contents of the SCT table, while keeping in mind that
older targets may not have these capabilities. So add support to allow
such configuration of registers to enable capacity based allocation
and power collapse retention for capable chipsets.
Reason for choosing capacity based allocation rather than the default
way based allocation is because capacity based allocation allows more
finer grain partition and provides more flexibility in configuration.
As for the retention through power collapse, it has an advantage where
the cache hits are more when we wake up from power collapse although
it does burn more power but the exact power numbers are not known at
the moment.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
[saiprakash.ranjan@codeaurora.org: use existing config and reword commit msg]
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/dac7e11cf654fc6d75a6b5ca062ab87b01547810.1600151951.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
smp2p_update_bits() should disable interrupts when it acquires its
spinlock. This is important because without the _irqsave, a priority
inversion can occur.
This function is called both with interrupts enabled in
qcom_q6v5_request_stop(), and with interrupts disabled in
ipa_smp2p_panic_notifier(). IRQ handling of spinlocks should be
consistent to avoid the panic notifier deadlocking because it's
sitting on the thread that's already got the lock via _request_stop().
Found via lockdep.
Cc: stable@vger.kernel.org
Fixes: 50e9964141 ("soc: qcom: smp2p: Qualcomm Shared Memory Point to Point")
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Evan Green <evgreen@chromium.org>
Link: https://lore.kernel.org/r/20200929133040.RESEND.1.Ideabf6dcdfc577cf39ce3d95b0e4aa1ac8b38f0c@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 453431a549 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.
Since then a few more instances of kzfree() have slipped in.
Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fixes from Thomas Gleixner:
"A time namespace fix and a matching selftest. The futex absolute
timeouts which are based on CLOCK_MONOTONIC require time namespace
corrected. This was missed in the original time namesapce support"
* tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/timens: Add a test for futex()
futex: Adjust absolute futex timeouts with per time namespace offset
Pull scheduler fixes from Thomas Gleixner:
"Two scheduler fixes:
- A trivial build fix for sched_feat() to compile correctly with
CONFIG_JUMP_LABEL=n
- Replace a zero lenght array with a flexible array"
* tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/features: Fix !CONFIG_JUMP_LABEL case
sched: Replace zero-length array with flexible-array
Pull perf fix from Thomas Gleixner:
"A single fix to compute the field offset of the SNOOPX bit in the data
source bitmask of perf events correctly"
* tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: correct SNOOPX field offset
Pull locking fix from Thomas Gleixner:
"Just a trivial fix for kernel-doc warnings"
* tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/seqlocks: Fix kernel-doc warnings
Pull NTB fixes from Jon Mason.
* tag 'ntb-5.10' of git://github.com/jonmason/ntb:
NTB: Use struct_size() helper in devm_kzalloc()
ntb: intel: Fix memleak in intel_ntb_pci_probe
NTB: hw: amd: fix an issue about leak system resources
Pull i2c fix from Wolfram Sang:
"Regression fix for rc1 and stable kernels as well"
* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs
Pull more cifs updates from Steve French:
"Add support for stat of various special file types (WSL reparse points
for char, block, fifo)"
* tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
smb3: add some missing definitions from MS-FSCC
smb3: remove two unused variables
smb3: add support for stat of WSL reparse points for special file types
Pull more parisc updates from Helge Deller:
- During this merge window O_NONBLOCK was changed to become 000200000,
but we missed that the syscalls timerfd_create(), signalfd4(),
eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict
bit-wise check of the flags parameter.
To provide backward compatibility with existing userspace we
introduce parisc specific wrappers for those syscalls which filter
out the old O_NONBLOCK value and replaces it with the new one.
- Prevent HIL bus driver to get stuck when keyboard or mouse isn't
attached
- Improve error return codes when setting rtc time
- Minor documentation fix in pata_ns87415.c
* 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
ata: pata_ns87415.c: Document support on parisc with superio chip
parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage
hil/parisc: Disable HIL driver when it gets stuck
parisc: Improve error return codes when setting rtc time
Pull more xen updates from Juergen Gross:
- a series for the Xen pv block drivers adding module parameters for
better control of resource usge
- a cleanup series for the Xen event driver
* tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Documentation: add xen.fifo_events kernel parameter description
xen/events: unmask a fifo event channel only if it was masked
xen/events: only register debug interrupt for 2-level events
xen/events: make struct irq_info private to events_base.c
xen: remove no longer used functions
xen-blkfront: Apply changed parameter name to the document
xen-blkfront: add a parameter for disabling of persistent grants
xen-blkback: add a parameter for disabling of persistent grants
Pull SafeSetID updates from Micah Morton:
"The changes are mostly contained to within the SafeSetID LSM, with the
exception of a few 1-line changes to change some ns_capable() calls to
ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
is examined by SafeSetID code and nothing else in the kernel.
The changes to SafeSetID internally allow for setting up GID
transition security policies, as already existed for UIDs"
* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Fix warnings reported by test bot
LSM: SafeSetID: Add GID security policy handling
LSM: Signal to SafeSetID when setting group IDs
Pull random32 updates from Willy Tarreau:
"Make prandom_u32() less predictable.
This is the cleanup of the latest series of prandom_u32
experimentations consisting in using SipHash instead of Tausworthe to
produce the randoms used by the network stack.
The changes to the files were kept minimal, and the controversial
commit that used to take noise from the fast_pool (f227e3ec3b) was
reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is
fed from various sources of activities (networking, scheduling) to
perturb the SipHash state using fast, non-trivially predictable data,
instead of keeping it fully deterministic. The goal is essentially to
make any occasional memory leakage or brute-force attempt useless.
The resulting code was verified to be very slightly faster on x86_64
than what is was with the controversial commit above, though this
remains barely above measurement noise. It was also tested on i386 and
arm, and build- tested only on arm64"
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
* tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom:
random32: add a selftest for the prandom32 code
random32: add noise from network and scheduling activity
random32: make prandom_u32() output unpredictable
Commit 21653a4181 ("i2c: core: Call i2c_acpi_install_space_handler()
before i2c_acpi_register_devices()")'s intention was to only move the
acpi_install_address_space_handler() call to the point before where
the ACPI declared i2c-children of the adapter where instantiated by
i2c_acpi_register_devices().
But i2c_acpi_install_space_handler() had a call to
acpi_walk_dep_device_list() hidden (that is I missed it) at the end
of it, so as an unwanted side-effect now acpi_walk_dep_device_list()
was also being called before i2c_acpi_register_devices().
Move the acpi_walk_dep_device_list() call to the end of
i2c_acpi_register_devices(), so that it is once again called *after*
the i2c_client-s hanging of the adapter have been created.
This fixes the Microsoft Surface Go 2 hanging at boot.
Fixes: 21653a4181 ("i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209627
Reported-by: Rainer Finke <rainer@finke.cc>
Reported-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- rdma error handling fixes (Chao Leng)
- fc error handling and reconnect fixes (James Smart)
- fix the qid displace when tracing ioctl command (Keith Busch)
- don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
- fix MTDT for passthru (Logan Gunthorpe)
- blacklist Write Same on more devices (Kai-Heng Feng)
- fix an uninitialized work struct (zhenwei pi)"
- lightnvm out-of-bounds fix (Colin)
- SG allocation leak fix (Doug)
- rnbd fixes (Gioh, Guoqing, Jack)
- zone error translation fixes (Keith)
- kerneldoc markup fix (Mauro)
- zram lockdep fix (Peter)
- Kill unused io_context members (Yufen)
- NUMA memory allocation cleanup (Xianting)
- NBD config wakeup fix (Xiubo)
* tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
block: blk-mq: fix a kernel-doc markup
nvme-fc: shorten reconnect delay if possible for FC
nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
nvme-fc: fix error loop in create_hw_io_queues
nvme-fc: fix io timeout to abort I/O
null_blk: use zone status for max active/open
nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
nvmet: cleanup nvmet_passthru_map_sg()
nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet: fix uninitialized work for zero kato
nvme-pci: disable Write Zeroes on Sandisk Skyhawk
nvme: use queuedata for nvme_req_qid
nvme-rdma: fix crash due to incorrect cqe
nvme-rdma: fix crash when connect rejected
block: remove unused members for io_context
blk-mq: remove the calling of local_memory_node()
zram: Fix __zram_bvec_{read,write}() locking order
skd_main: remove unused including <linux/version.h>
sgl_alloc_order: fix memory leak
lightnvm: fix out-of-bounds write to array devices->info[]
...
Pull io_uring fixes from Jens Axboe:
- fsize was missed in previous unification of work flags
- Few fixes cleaning up the flags unification creds cases (Pavel)
- Fix NUMA affinities for completely unplugged/replugged node for io-wq
- Two fallout fixes from the set_fs changes. One local to io_uring, one
for the splice entry point that io_uring uses.
- Linked timeout fixes (Pavel)
- Removal of ->flush() ->files work-around that we don't need anymore
with referenced files (Pavel)
- Various cleanups (Pavel)
* tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
splice: change exported internal do_splice() helper to take kernel offset
io_uring: make loop_rw_iter() use original user supplied pointers
io_uring: remove req cancel in ->flush()
io-wq: re-set NUMA node affinities if CPUs come online
io_uring: don't reuse linked_timeout
io_uring: unify fsize with def->work_flags
io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
io_uring: do poll's hash_node init in common code
io_uring: inline io_poll_task_handler()
io_uring: remove extra ->file check in poll prep
io_uring: make cached_cq_overflow non atomic_t
io_uring: inline io_fail_links()
io_uring: kill ref get/drop in personality init
io_uring: flags-based creds init in queue
Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place (the largest group here is
Christoph's stat cleanups)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove KSTAT_QUERY_FLAGS
fs: remove vfs_stat_set_lookup_flags
fs: move vfs_fstatat out of line
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: remove vfs_statx_fd
fs: omfs: use kmemdup() rather than kmalloc+memcpy
[PATCH] reduce boilerplate in fsid handling
fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
selftests: mount: add nosymfollow tests
Add a "nosymfollow" mount option.
Pull dma-mapping fixes from Christoph Hellwig:
- document the new dma_{alloc,free}_pages() API
- two fixups for the dma-mapping.h split
* tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: document dma_{alloc,free}_pages
dma-mapping: move more functions to dma-map-ops.h
ARM/sa1111: add a missing include of dma-map-ops.h
Pull KVM fixes from Paolo Bonzini:
"Two fixes for this merge window, and an unrelated bugfix for a host
hang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: ioapic: break infinite recursion on lazy EOI
KVM: vmx: rename pi_init to avoid conflict with paride
KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
Pull x86 SEV-ES fixes from Borislav Petkov:
"Three fixes to SEV-ES to correct setting up the new early pagetable on
5-level paging machines, to always map boot_params and the kernel
cmdline, and disable stack protector for ../compressed/head{32,64}.c.
(Arvind Sankar)"
* tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Explicitly map boot_params and command line
x86/head/64: Disable stack protection for head$(BITS).o
x86/boot/64: Initialize 5-level paging variables earlier
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.
This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.
The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>