linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 06:45:19 -04:00

Author	SHA1	Message	Date
Ben Gardon	b10a038e84	KVM: mmu: Add slots_arch_lock for memslot arch fields Add a new lock to protect the arch-specific fields of memslots if they need to be modified in a kvm->srcu read critical section. A future commit will use this lock to lazily allocate memslot rmaps for x86. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-5-bgardon@google.com> [Add Documentation/ hunk. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:26 -04:00
Linus Torvalds	39519f6a56	Merge tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and fanotify fixes from Jan Kara: "A fixup finishing disabling of quotactl_path() syscall (I've missed archs using different way to declare syscalls) and a fix of an fd leak in error handling path of fanotify" * tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: finish disable quotactl_path syscall fanotify: fix copy_event_to_user() fid error clean up	2021-06-17 09:49:48 -07:00
Jens Axboe	fe76421d1d	io_uring: allow user configurable IO thread CPU affinity io-wq defaults to per-node masks for IO workers. This works fine by default, but isn't particularly handy for workloads that prefer more specific affinities, for either performance or isolation reasons. This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU mask that is then applied to IO thread workers, and an IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the default of per-node. Note that no care is given to existing IO threads, they will need to go through a reschedule before the affinity is correct if they are already running or sleeping. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-17 10:25:50 -06:00
Greg Kroah-Hartman	8c51c9b59a	Merge tag 'iio-for-5.14b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Second set of Counter and IIO new device support, cleanups etc for 5.14 Counter ------ First part of general rework of counter subsystem to add a chrdev interface for event drive data capture. Most of it will hopefully land next cycle. * Consolidate documentation to avoid multiple copies of same docs in per device files. * Constify various arrays etc across subsystem. * 104-quad-8: - Annotate the module config parameter to avoid using it when kernel is locked down. - Spelling and trivial comment drops etc * Intel QEP - Follow up cleanups of trivial stuff from initial patch series. IIO --- Includes some cleanups as part of two ongoing audits - runtime pm usage in IIO. - Insufficient alignment on buffers passed to iio_push_to_buffers_with_timstamp() New device support * bosch,bmc150 - Add ID for BMA253 Minor features / cleanups / minor fixes / late breaking fixes * iio_push_to_buffers_with_timestamp() alignment fixes. This set includes those where the best option is to mark the buffer as __aligned(8). Normally this choice was made because there is too high a degree of possible variation in number of channels enabled to be able to guarantee the timestamp was always in the same location. This ruled out the more obvious structure form used in other drivers. Only one small class of related issues have patches under review and we can finally tighten up the explicit rules to reflect the hidden requirement. * dummy - Kconfig build dependency fix. * adi,ad_sigma_delta - General devm related simplifications for these devices. * adi,adf4350 - Fix some missing cleanup on error path. * adi,adis, ADC drivers. - Clean out unneeded spi_set_drvdata() * ams-taos,tcs3472 - Fix a potential free of an irq that was never allocated. * atlas,sensor - Drop unbalanced runtime pm call and use pm_runtime_resume_and_get() to reduce boilerplate. * bosch,bma180 - Fix bandwidth register values used. * bosch,bmc150 - Fix wrong pointer being dereferenced in remove. - Stop device trying unregister itself rather than the second device. - Refactor ACPI second device handing. - Add support for DUAL250E ACPI HID. - Move some stuff into the header to enable following patches to not add additional accessor functions. Drop existing accessors. - Add support for hinge angle setting with DUAL250E ACPI DSM to ensure keyboard and touchpad enabled correctly when in laptop mode and disabled otherwise. - Add label attr for the multiple sensor locations with DUAL250E ACPI HID. - Fix scale units for bma222 - Various reordering of devices supported lists to be alphabetical order. - Drop unnecessary duplicated chip_info_tbl[] entries. - Document that some devices have two interrupts, even if not currently used by the driver. - Move bma254 over to the bma255 driver. - Move to more consistent scale values, based on assumption that some datasheets use lower precision in their calculations in comparison with others. * hid-sensors - Use namespaces for exported symbols. - Update includes using manual inspection of output of the include-what-you-use tool. * invensense,icp10100 - Drop unbalanced runtime pm put. Use pm_runtime_resume_and_get() to cleanly handle potential error. * invensense,mpu6050 - Drop use of %hhx string formatting. - runtime pm boilerplate removal and drop an unbalanced call in remove. * liteon,ltr501 - Fix inaccurate volatile register list. - Fix wrong mode bit. - Add a missing leXX_to_cpu() conversion. - Mark ltr501_chip_info structure as const. * pulsed-light-lidar: - Boilerplate removal using runtime_pm_resume_and_get() * scmi-sensors - Formatting of SPDX fix. * silabs,si1133 - Fix a string format warning. - Drop remaining uses of %hhx string formatting. * silabs,si1145 - Drop use of %hhx string formatting. * ti,ads1015 - Drop unbalanced runtime pm call in remove and reduce boilerplate. * tag 'iio-for-5.14b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (76 commits) iio: light: tcs3472: do not free unallocated IRQ iio: accel: bmc150: Use more consistent and accurate scale values iio: hid-sensors: Update header includes iio: pressure: icp10100: Balance runtime pm + use pm_runtime_resume_and_get() iio: prox: pulsed-light-v2: Use pm_runtime_resume_and_get() iio: chemical: atlas-sensor: Balance runtime pm + pm_runtime_resume_and_get() iio: adc: ads1015: Balance runtime pm + pm_runtime_resume_and_get() iio: imu: mpu6050: Balance runtime pm + use pm_runtime_resume_and_get() iio: hid-sensors: lighten exported symbols by moving to IIO_HID namespace iio: prox: isl29501: Fix buffer alignment in iio_push_to_buffers_with_timestamp() iio: light: vcnl4035: Fix buffer alignment in iio_push_to_buffers_with_timestamp() iio: light: vcnl4000: Fix buffer alignment in iio_push_to_buffers_with_timestamp() iio: magn: rm3100: Fix alignment of buffer in iio_push_to_buffers_with_timestamp() iio: adc: ti-ads8688: Fix alignment of buffer in iio_push_to_buffers_with_timestamp() iio: adc: mxs-lradc: Fix buffer alignment in iio_push_to_buffers_with_timestamp() iio: adc: hx711: Fix buffer alignment in iio_push_to_buffers_with_timestamp() iio: adc: at91-sama5d2: Fix buffer alignment in iio_push_to_buffers_with_timestamp() counter: interrupt-cnt: Add const qualifier for actions_list array iio: ltr501: mark ltr501_chip_info as const iio: ltr501: ltr501_read_ps(): add missing endianness conversion ...	2021-06-17 18:20:56 +02:00
Rafael J. Wysocki	aff0dbd03d	ACPI: scan: Make acpi_walk_dep_device_list() Because acpi_walk_dep_device_list() is only called by the code in the file in which it is defined, make it static, drop the export of it and drop its header from acpi.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>	2021-06-17 15:56:03 +02:00
Rafael J. Wysocki	2d0795148a	ACPI: scan: Define acpi_bus_put_acpi_device() as static inline Since acpi_bus_put_acpi_device() is a synonym for acpi_dev_put(), define it as static inline in analogy with the latter. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2021-06-17 15:54:25 +02:00
Wesley Sheng	8cf486e131	nvme.h: add missing nvme_lba_range_type endianness annotations Signed-off-by: Wesley Sheng <wesley.sheng@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-06-17 15:51:21 +02:00
Chaitanya Kulkarni	aaf2e048af	nvmet: add ZBD over ZNS backend support NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend support for non-ZNS zoned block devices as NVMeOF targets. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier, we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-06-17 15:51:21 +02:00
Chaitanya Kulkarni	ab5d0b38c0	nvmet: add Command Set Identifier support NVMe TP 4056 allows controllers to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controllers need to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now support the default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belong to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-06-17 15:51:20 +02:00
Chaitanya Kulkarni	c28a61471c	block: export blk_next_bio() The block layer provides emulation of zone management operations targeting all zones of a zoned block device only for the zone reset operation (REQ_OP_ZONE_RESET). In order to correctly implement exporting of zoned block devices with NVMeOF, emulating zone management operations targeting all zones of a device is also necessary for the open, close and finish zone operations (REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH). Instead of duplicating the code, export the existing helper from block layer so we can use a bio chaining pattern that is present in the block layer for REQ_OP_ZONE RESET all emulation in the NVMeOF zoned block device backend. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-06-17 15:51:20 +02:00
Arnd Bergmann	55712627bf	pata: ixp4xx: split platform data to its own header Portable drivers cannot use mach/platform.h, so move the structure into its own header. With this, compile testing can be enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2021-06-17 15:31:04 +02:00
Arnd Bergmann	09aa9aabdc	soc: ixp4xx: move cpu detection to linux/soc/ixp4xx/cpu.h Generic drivers are unable to use the feature macros from mach/cpu.h or the feature bits from mach/hardware.h, so move these into a global header file along with some dummy helpers that list these features as disabled elsewhere. Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Zoltan HERPAI <wigyori@uid0.hu> Cc: Raylynn Knight <rayknight@me.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2021-06-17 15:30:54 +02:00
Lukasz Luba	8f1b971b47	sched/cpufreq: Consider reduced CPU capacity in energy calculation Energy Aware Scheduling (EAS) needs to predict the decisions made by SchedUtil. The map_util_freq() exists to do that. There are corner cases where the max allowed frequency might be reduced (due to thermal). SchedUtil as a CPUFreq governor, is aware of that but EAS is not. This patch aims to address it. SchedUtil stores the maximum allowed frequency in 'sugov_policy::next_freq' field. EAS has to predict that value, which is the real used frequency. That value is made after a call to cpufreq_driver_resolve_freq() which clamps to the CPUFreq policy limits. In the existing code EAS is not able to predict that real frequency. This leads to energy estimation errors. To avoid wrong energy estimation in EAS (due to frequency miss prediction) make sure that the step which calculates Performance Domain frequency, is also aware of the allowed CPU capacity. Furthermore, modify map_util_freq() to not extend the frequency value. Instead, use map_util_perf() to extend the util value in both places: SchedUtil and EAS, but for EAS clamp it to max allowed CPU capacity. In the end, we achieve the same desirable behavior for both subsystems and alignment in regards to the real CPU frequency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part) Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com	2021-06-17 14:11:43 +02:00
Erik Rosen	e8e00c83a2	hwmon: (pmbus) Add support for reading direct mode coefficients Add support for reading and decoding direct format coefficients to the PMBus core driver. If the new flag PMBUS_USE_COEFFICIENTS_CMD is set, the driver will use the COEFFICIENTS register together with the information in the pmbus_sensor_attr structs to initialize relevant coefficients for the direct mode format. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> [groeck: Initialize ret with -EINVAL in pmbus_init_coefficients()] Signed-off-by: Guenter Roeck <linux@roeck-us.net>	2021-06-17 04:21:46 -07:00
Erik Rosen	dbc0860f7a	hwmon: (pmbus) Add new pmbus flag NO_WRITE_PROTECT Some PMBus chips respond with invalid data when reading the WRITE_PROTECT register. For such chips, this flag should be set so that the PMBus core driver doesn't use the WRITE_PROTECT command to determine its behavior. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>	2021-06-17 04:21:46 -07:00
Erik Rosen	86c908d90f	hwmon: (pmbus) Add new flag PMBUS_READ_STATUS_AFTER_FAILED_CHECK Some PMBus chips end up in an undefined state when trying to read an unsupported register. For such chips, it is necessary to reset the chip pmbus controller to a known state after a failed register check. This can be done by reading a known register. By setting this flag the driver will try to read the STATUS register after each failed register check. This read may fail, but it will put the chip into a known state. Signed-off-by: Erik Rosen <erik.rosen@metormote.com> Link: https://lore.kernel.org/r/20210507194023.61138-2-erik.rosen@metormote.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>	2021-06-17 04:21:45 -07:00
Matti Vaittinen	7a2c4cc537	devm-helpers: Add resource managed version of work init A few drivers which need a work-queue must cancel work at driver detach. Some of those implement remove() solely for this purpose. Help drivers to avoid unnecessary remove and error-branch implementation by adding managed verision of work initialization. This will also help drivers to avoid mixing manual and devm based unwinding when other resources are handled by devm. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/94ff4175e7f2ff134ed2fa7d6e7641005cc9784b.1623146580.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2021-06-17 13:21:06 +02:00
Arnd Bergmann	0a7790be18	media: subdev: disallow ioctl for saa6588/davinci The saa6588_ioctl() function expects to get called from other kernel functions with a 'saa6588_command' pointer, but I found nothing stops it from getting called from user space instead, which seems rather dangerous. The same thing happens in the davinci vpbe driver with its VENC_GET_FLD command. As a quick fix, add a separate .command() callback pointer for this driver and change the two callers over to that. This change can easily get backported to stable kernels if necessary, but since there are only two drivers, we may want to eventually replace this with a set of more specialized callbacks in the long run. Fixes: `c3fda7f835` ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-06-17 10:18:37 +02:00
Tomi Valkeinen	0d346d2a6f	media: v4l2-subdev: add subdev-wide state struct We have 'struct v4l2_subdev_pad_config' which contains configuration for a single pad used for the TRY functionality, and an array of those structs is passed to various v4l2_subdev_pad_ops. I was working on subdev internal routing between pads, and realized that there's no way to add TRY functionality for routes, which is not pad specific configuration. Adding a separate struct for try-route config wouldn't work either, as e.g. set-fmt needs to know the try-route configuration to propagate the settings. This patch adds a new struct, 'struct v4l2_subdev_state' (which at the moment only contains the v4l2_subdev_pad_config array) and the new struct is used in most of the places where v4l2_subdev_pad_config was used. All v4l2_subdev_pad_ops functions taking v4l2_subdev_pad_config are changed to instead take v4l2_subdev_state. The changes to drivers/media/v4l2-core/v4l2-subdev.c and include/media/v4l2-subdev.h were written by hand, and all the driver changes were done with the semantic patch below. The spatch needs to be applied to a select list of directories. I used the following shell commands to apply the spatch: dirs="drivers/media/i2c drivers/media/platform drivers/media/usb drivers/media/test-drivers/vimc drivers/media/pci drivers/staging/media" for dir in $dirs; do spatch -j8 --dir --include-headers --no-show-diff --in-place --sp-file v4l2-subdev-state.cocci $dir; done Note that Coccinelle chokes on a few drivers (gcc extensions?). With minor changes we can make Coccinelle run fine, and these changes can be reverted after spatch. The diff for these changes is: For drivers/media/i2c/s5k5baf.c: @@ -1481,7 +1481,7 @@ static int s5k5baf_set_selection(struct v4l2_subdev sd, &s5k5baf_cis_rect, v4l2_subdev_get_try_crop(sd, cfg, PAD_CIS), v4l2_subdev_get_try_compose(sd, cfg, PAD_CIS), - v4l2_subdev_get_try_crop(sd, cfg, PAD_OUT) + v4l2_subdev_get_try_crop(sd, cfg, PAD_OUT), }; s5k5baf_set_rect_and_adjust(rects, rtype, &sel->r); return 0; For drivers/media/platform/s3c-camif/camif-capture.c: @@ -1230,7 +1230,7 @@ static int s3c_camif_subdev_get_fmt(struct v4l2_subdev sd, mf = camif->mbus_fmt; break; - case CAMIF_SD_PAD_SOURCE_C...CAMIF_SD_PAD_SOURCE_P: + case CAMIF_SD_PAD_SOURCE_C: / crop rectangle at camera interface input / mf->width = camif->camif_crop.width; mf->height = camif->camif_crop.height; @@ -1332,7 +1332,7 @@ static int s3c_camif_subdev_set_fmt(struct v4l2_subdev sd, } break; - case CAMIF_SD_PAD_SOURCE_C...CAMIF_SD_PAD_SOURCE_P: + case CAMIF_SD_PAD_SOURCE_C: /* Pixel format can be only changed on the sink pad. / mf->code = camif->mbus_fmt.code; mf->width = crop->width; The semantic patch is: // <smpl> // Change function parameter @@ identifier func; identifier cfg; @@ func(..., - struct v4l2_subdev_pad_config cfg + struct v4l2_subdev_state sd_state , ...) { <... - cfg + sd_state ...> } // Change function declaration parameter @@ identifier func; identifier cfg; type T; @@ T func(..., - struct v4l2_subdev_pad_config cfg + struct v4l2_subdev_state sd_state , ...); // Change function return value @@ identifier func; @@ - struct v4l2_subdev_pad_config + struct v4l2_subdev_state func(...) { ... } // Change function declaration return value @@ identifier func; @@ - struct v4l2_subdev_pad_config + struct v4l2_subdev_state func(...); // Some drivers pass a local pad_cfg for a single pad to a called function. Wrap it // inside a pad_state. @@ identifier func; identifier pad_cfg; @@ func(...) { ... struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state pad_state = { .pads = &pad_cfg }; <+... ( v4l2_subdev_call \| sensor_call \| isi_try_fse \| isc_try_fse \| saa_call_all ) (..., - &pad_cfg + &pad_state ,...) ...+> } // If the function uses fields from pad_config, access via state->pads @@ identifier func; identifier state; @@ func(..., struct v4l2_subdev_state state , ...) { <... ( - state->try_fmt + state->pads->try_fmt \| - state->try_crop + state->pads->try_crop \| - state->try_compose + state->pads->try_compose ) ...> } // If the function accesses the filehandle, use fh->state instead @@ struct v4l2_subdev_fh fh; @@ - fh->pad + fh->state @@ struct v4l2_subdev_fh fh; @@ - fh.pad + fh.state // Start of vsp1 specific @@ @@ struct vsp1_entity { ... - struct v4l2_subdev_pad_config config; + struct v4l2_subdev_state config; ... }; @@ symbol entity; @@ vsp1_entity_init(...) { ... entity->config = - v4l2_subdev_alloc_pad_config + v4l2_subdev_alloc_state (&entity->subdev); ... } @@ symbol entity; @@ vsp1_entity_destroy(...) { ... - v4l2_subdev_free_pad_config + v4l2_subdev_free_state (entity->config); ... } @exists@ identifier func =~ "(^vsp1.)\|(hsit_set_format)\|(sru_enum_frame_size)\|(sru_set_format)\|(uif_get_selection)\|(uif_set_selection)\|(uds_enum_frame_size)\|(uds_set_format)\|(brx_set_format)\|(brx_get_selection)\|(histo_get_selection)\|(histo_set_selection)\|(brx_set_selection)"; symbol config; @@ func(...) { ... - struct v4l2_subdev_pad_config config; + struct v4l2_subdev_state config; ... } // End of vsp1 specific // Start of rcar specific @@ identifier sd; identifier pad_cfg; @@ rvin_try_format(...) { ... - struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state sd_state; ... - pad_cfg = v4l2_subdev_alloc_pad_config(sd); + sd_state = v4l2_subdev_alloc_state(sd); <... - pad_cfg + sd_state ...> - v4l2_subdev_free_pad_config(pad_cfg); + v4l2_subdev_free_state(sd_state); ... } // End of rcar specific // Start of rockchip specific @@ identifier func =~ "(rkisp1_rsz_get_pad_fmt)\|(rkisp1_rsz_get_pad_crop)\|(rkisp1_rsz_register)"; symbol rsz; symbol pad_cfg; @@ func(...) { + struct v4l2_subdev_state state = { .pads = rsz->pad_cfg }; ... - rsz->pad_cfg + &state ... } @@ identifier func =~ "(rkisp1_isp_get_pad_fmt)\|(rkisp1_isp_get_pad_crop)"; symbol isp; symbol pad_cfg; @@ func(...) { + struct v4l2_subdev_state state = { .pads = isp->pad_cfg }; ... - isp->pad_cfg + &state ... } @@ symbol rkisp1; symbol isp; symbol pad_cfg; @@ rkisp1_isp_register(...) { + struct v4l2_subdev_state state = { .pads = rkisp1->isp.pad_cfg }; ... - rkisp1->isp.pad_cfg + &state ... } // End of rockchip specific // Start of tegra-video specific @@ identifier sd; identifier pad_cfg; @@ __tegra_channel_try_format(...) { ... - struct v4l2_subdev_pad_config pad_cfg; + struct v4l2_subdev_state sd_state; ... - pad_cfg = v4l2_subdev_alloc_pad_config(sd); + sd_state = v4l2_subdev_alloc_state(sd); <... - pad_cfg + sd_state ...> - v4l2_subdev_free_pad_config(pad_cfg); + v4l2_subdev_free_state(sd_state); ... } @@ identifier sd_state; @@ __tegra_channel_try_format(...) { ... struct v4l2_subdev_state *sd_state; <... - sd_state->try_crop + sd_state->pads->try_crop ...> } // End of tegra-video specific // </smpl> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2021-06-17 10:01:27 +02:00
Liu Shixin	10ff9976d0	crypto: api - remove CRYPTOA_U32 and related functions According to the advice of Eric and Herbert, type CRYPTOA_U32 has been unused for over a decade, so remove the code related to CRYPTOA_U32. After removing CRYPTOA_U32, the type of the variable attrs can be changed from union to struct. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2021-06-17 15:07:31 +08:00
Ard Biesheuvel	22ca9f4aaf	crypto: shash - avoid comparing pointers to exported functions under CFI crypto_shash_alg_has_setkey() is implemented by testing whether the .setkey() member of a struct shash_alg points to the default version, called shash_no_setkey(). As crypto_shash_alg_has_setkey() is a static inline, this requires shash_no_setkey() to be exported to modules. Unfortunately, when building with CFI, function pointers are routed via CFI stubs which are private to each module (or to the kernel proper) and so this function pointer comparison may fail spuriously. Let's fix this by turning crypto_shash_alg_has_setkey() into an out of line function. Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2021-06-17 15:07:31 +08:00
Kees Cook	1d1f6cc581	pstore/blk: Include zone in pstore_device_info Information was redundant between struct pstore_zone_info and struct pstore_device_info. Use struct pstore_zone_info, with member name "zone". Additionally untangle the logic for the "best effort" block device instance. Signed-off-by: Kees Cook <keescook@chromium.org> Fixed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/lkml/20210617005424.182305-1-pulehui@huawei.com	2021-06-16 21:09:31 -07:00
Hao Fang	e3211e414d	arm64: dts: hisilicon: use the correct HiSilicon copyright s/Hisilicon/HiSilicon/. It should use capital S, according to the official website https://www.hisilicon.com/en. Signed-off-by: Hao Fang <fanghao11@huawei.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Wei Xu <xuwei5@hisilicon.com>	2021-06-17 01:28:15 +00:00
Pablo Neira Ayuso	836382dc24	netfilter: nf_tables: add last expression Add a new optional expression that tells you when last matching on a given rule / set element element has happened. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-06-17 03:23:00 +02:00
Olof Johansson	1eb5f83ee9	Merge tag 'memory-controller-drv-tegra-5.14-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into arm/drivers Memory controller drivers for v5.14 - Tegra SoC, part two Second set of changes for Tegra SoC memory controller drivers, containing patchset from Thierry Reding: "The goal here is to avoid early identity mappings altogether and instead postpone the need for the identity mappings to when devices are attached to the SMMU. This works by making the SMMU driver coordinate with the memory controller driver on when to start enforcing SMMU translations. This makes Tegra behave in a more standard way and pushes the code to deal with the Tegra-specific programming into the NVIDIA SMMU implementation." This pulls a dependency from Will Deacon (ARM SMMU driver) and contains further ARM SMMU driver patches to resolve complex dependencies between different patchsets. The pull from Will contains only one patch ("Implement ->probe_finalize()"). Further work in Will's tree might depend on this patch, therefore patch was applied there. On the other hand, this ("Implement ->probe_finalize()") patch is also a dependency for ARM SMMU driver changes for Tegra. These changes, bringing seamless transition from the firmware framebuffer to the OS framebuffer, depend on earlier Tegra memory controller driver patches. * tag 'memory-controller-drv-tegra-5.14-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: (37 commits) iommu/arm-smmu: Use Tegra implementation on Tegra186 iommu/arm-smmu: tegra: Implement SID override programming iommu/arm-smmu: tegra: Detect number of instances at runtime dt-bindings: arm-smmu: Add Tegra186 compatible string memory: tegra: Delete dead debugfs checking code iommu/arm-smmu: Implement ->probe_finalize() memory: tegra: Implement SID override programming memory: tegra: Split Tegra194 data into separate file memory: tegra: Add memory client IDs to tables memory: tegra: Unify drivers memory: tegra: Only initialize reset controller if available memory: tegra: Make IRQ support opitonal memory: tegra: Parameterize interrupt handler memory: tegra: Extract setup code into callback memory: tegra: Make per-SoC setup more generic memory: tegra: Push suspend/resume into SoC drivers memory: tegra: Introduce struct tegra_mc_ops memory: tegra: Unify struct tegra_mc across SoC generations memory: tegra: Consolidate register fields memory: tegra30-emc: Use devm_tegra_core_dev_init_opp_table() ... Link: https://lore.kernel.org/r/20210614195200.21657-1-krzysztof.kozlowski@canonical.com Signed-off-by: Olof Johansson <olof@lixom.net>	2021-06-16 17:36:30 -07:00
Jason Gunthorpe	915e4af59f	RDMA: Remove rdma_set_device_sysfs_group() The driver's device group can be specified as part of the ops structure like the device's port group. No need for the complicated API. Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:32 -03:00
Jason Gunthorpe	d7407d1669	RDMA: Change ops->init_port to ops->port_groups init_port was only being used to register sysfs attributes against the port kobject. Now that all users are creating static attribute_group's we can simply set the attribute_group list in the ops and the core code can just handle it directly. This makes all the sysfs management quite straightforward and prevents any driver from abusing the naked port kobject in future because no driver code can access it. Link: https://lore.kernel.org/r/114f68f3d921460eafe14cea5a80ca65d81729c3.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:31 -03:00
Jason Gunthorpe	054239f45c	RDMA/core: Expose the ib port sysfs attribute machinery Other things outside the core code are creating attributes against the port. This patch exposes the basic machinery to do this. The ib_port_attribute type allows creating groups of attributes attatched to the port and comes with the usual machinery to do this. Link: https://lore.kernel.org/r/5c4aeae57f6fa7c59a1d6d1c5506069516ae9bbf.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:30 -03:00
Jason Gunthorpe	b7066b32a1	RDMA/core: Create the device hw_counters through the normal groups mechanism Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add(). This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow. Move all the memory freeing to the release function. Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:30 -03:00
Jason Gunthorpe	467f432a52	RDMA/core: Split port and device counter sysfs attributes This code creates a 'struct hw_stats_attribute' for each sysfs entry that contains a naked 'struct attribute' inside. It then proceeds to attach this same structure to a 'struct device' kobj and a 'struct ib_port' kobj. However, this violates the typing requirements. 'struct device' requires the attribute to be a 'struct device_attribute' and 'struct ib_port' requires the attribute to be 'struct port_attribute'. This happens to work because the show/store function pointers in all three structures happen to be at the same offset and happen to be nearly the same signature. This means when container_of() was used to go between the wrong two types it still managed to work. However clang CFI detection notices that the function pointers have a slightly different signature. As with show/store this was only working because the device and port struct layouts happened to have the kobj at the front. Correct this by have two independent sets of data structures for the port and device case. The two different attributes correctly include the port/device_attribute struct and everything from there up is kept split. The show/store function call chains start with device/port unique functions that invoke a common show/store function pointer. Link: https://lore.kernel.org/r/a8b3864b4e722aed3657512af6aa47dc3c5033be.1623427137.git.leonro@nvidia.com Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:29 -03:00
Jason Gunthorpe	d8a5883814	RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer It is much saner to store a pointer to the kobject structure that contains the cannonical stats pointer than to copy the stats pointers into a public structure. Future patches will require the sysfs pointer for other purposes. Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:29 -03:00
Jason Gunthorpe	4b5f4d3fb4	RDMA: Split the alloc_hw_stats() ops to port and device variants This is being used to implement both the port and device global stats, which is causing some confusion in the drivers. For instance EFA and i40iw both seem to be misusing the device stats. Split it into two ops so drivers that don't support one or the other can leave the op NULL'd, making the calling code a little simpler to understand. Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com Tested-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:58:29 -03:00
Bob Pearson	660a59369e	RDMA/rxe: Add bind MW fields to rxe_send_wr Add fields to struct rxe_send_wr in rdma_user_rxe.h to support bind MW work requests Link: https://lore.kernel.org/r/20210608042552.33275-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-16 20:51:17 -03:00
Dmytro Linkin	a5ae8fc905	net/mlx5e: Don't create devices during unload flow Running devlink reload command for port in switchdev mode cause resources to corrupt: driver can't release allocated EQ and reclaim memory pages, because "rdma" auxiliary device had add CQs which blocks EQ from deletion. Erroneous sequence happens during reload-down phase, and is following: 1. detach device - suspends auxiliary devices which support it, destroys others. During this step "eth-rep" and "rdma-rep" are destroyed, "eth" - suspended. 2. disable SRIOV - moves device to legacy mode; as part of disablement - rescans drivers. This step adds "rdma" auxiliary device. 3. destroy EQ table - <failure>. Driver shouldn't create any device during unload flows. To handle that implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset on device attach. If flag is set do no-op on drivers rescan. Fixes: `a925b5e309` ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-16 15:36:47 -07:00
Russell King	8fe55ef233	PCI: Dynamically map ECAM regions Attempting to boot 32-bit ARM kernels under QEMU's 3.x virt models fails when we have more than 512M of RAM in the model as we run out of vmalloc space for the PCI ECAM regions. This failure will be silent when running libvirt, as the console in that situation is a PCI device. In this configuration, the kernel maps the whole ECAM, which QEMU sets up for 256 buses, even when maybe only seven buses are in use. Each bus uses 1M of ECAM space, and ioremap() adds an additional guard page between allocations. The kernel vmap allocator will align these regions to 512K, resulting in each mapping eating 1.5M of vmalloc space. This means we need 384M of vmalloc space just to map all of these, which is very wasteful of resources. Fix this by only mapping the ECAM for buses we are going to be using. In my setups, this is around seven buses in most guests, which is 10.5M of vmalloc space - way smaller than the 384M that would otherwise be required. This also means that the kernel can boot without forcing extra RAM into highmem with the vmalloc= argument, or decreasing the virtual RAM available to the guest. Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/E1lhCAV-0002yb-50@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>	2021-06-16 17:20:40 -05:00
Guvenc Gulce	194730a9be	net/smc: Make SMC statistics network namespace aware Make the gathered SMC statistics network namespace aware, for each namespace collect an own set of statistic information. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-16 12:54:02 -07:00
Guvenc Gulce	f0dd7bf5e3	net/smc: Add netlink support for SMC fallback statistics Add support to collect more detailed SMC fallback reason statistics and provide these statistics to user space on the netlink interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-16 12:54:02 -07:00
Guvenc Gulce	8c40602b4b	net/smc: Add netlink support for SMC statistics Add the netlink function which collects the statistics information and delivers it to the userspace. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-16 12:54:02 -07:00
Hugh Dickins	22061a1ffa	mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped. The first attempt used try_to_unmap(page, TTU_SYNC\|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside. The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside. Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly. This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe. Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: `fc127da085` ("truncate: handle file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Hugh Dickins	732ed55823	mm/thp: try_to_unmap() use TTU_SYNC for safe splitting Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0. And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently. I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount. Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page(). When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated. Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: `fec89c109f` ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Hugh Dickins	3b77e8c8cd	mm/thp: make is_huge_zero_pmd() safe and quicker Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry. Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time. __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present. Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: `e71769ae52` ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Mike Kravetz	846be08578	mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/ Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: `96b96a96dd` ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Peter Xu	099dd6878b	mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare I found it by pure code review, that pte_same_as_swp() of unuse_vma() didn't take uffd-wp bit into account when comparing ptes. pte_same_as_swp() returning false negative could cause failure to swapoff swap ptes that was wr-protected by userfaultfd. Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com Fixes: `f45ec5ff16` ("userfaultfd: wp: support swap and page migration") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [5.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Naoya Horiguchi	25182f05ff	mm,hwpoison: fix race with hugetlb page allocation When hugetlb page fault (under overcommitting situation) and memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race: CPU0: CPU1: gather_surplus_pages() page = alloc_surplus_huge_page() memory_failure_hugetlb() get_hwpoison_page(page) __get_hwpoison_page(page) get_page_unless_zero(page) zero = put_page_testzero(page) VM_BUG_ON_PAGE(!zero, page) enqueue_huge_page(h, page) put_page(page) __get_hwpoison_page() only checks the page refcount before taking an additional one for memory error handling, which is not enough because there's a time window where compound pages have non-zero refcount during hugetlb page initialization. So make __get_hwpoison_page() check page status a bit more for hugetlb pages with get_hwpoison_huge_page(). Checking hugetlb-specific flags under hugetlb_lock makes sure that the hugetlb page is not transitive. It's notable that another new function, HWPoisonHandlable(), is helpful to prevent a race against other transitive page states (like a generic compound page just before PageHuge becomes true). Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com Fixes: `ead07f6a86` ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Hans de Goede	c8d9c3674c	Merge remote-tracking branch 'linux-pm/acpi-scan' into review-hans	2021-06-16 17:48:22 +02:00
Hans de Goede	6c8f2df3b5	Merge tag 'intel-gpio-v5.14-1' into review-hans intel-gpio for v5.14-1 * Export two functions from GPIO ACPI for wider use * Clean up Whiskey Cove and Crystal Cove GPIO drivers The following is an automated git shortlog grouped by driver: crystalcove: - remove platform_set_drvdata() + cleanup probe gpiolib: - acpi: Add acpi_gpio_get_io_resource() - acpi: Introduce acpi_get_and_request_gpiod() helper wcove: - Split error handling for CTRL and IRQ registers - Unify style of to_reg() with to_ireg() - Use IRQ hardware number getter instead of direct access	2021-06-16 17:48:18 +02:00
Maximilian Luz	e8e298a653	platform/surface: aggregator_cdev: Allow enabling of events from user-space While events can already be enabled and disabled via the generic request IOCTL, this bypasses the internal reference counting mechanism of the controller. Due to that, disabling an event will turn it off regardless of any other client having requested said event, which may break functionality of that client. To solve this, add IOCTLs wrapping the ssam_controller_event_enable() and ssam_controller_event_disable() functions, which have been previously introduced for this specific purpose. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-6-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2021-06-16 17:47:53 +02:00
Maximilian Luz	776c53c6a4	platform/surface: aggregator_cdev: Add support for forwarding events to user-space Currently, debugging unknown events requires writing a custom driver. This is somewhat difficult, slow to adapt, and not entirely user-friendly for quickly trying to figure out things on devices of some third-party user. We can do better. We already have a user-space interface intended for debugging SAM EC requests, so let's add support for receiving events to that. This commit provides support for receiving events by reading from the controller file. It additionally introduces two new IOCTLs to control which event categories will be forwarded. Specifically, a user-space client can specify which target categories it wants to receive events from by registering the corresponding notifier(s) via the IOCTLs and after that, read the received events by reading from the controller device. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-5-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2021-06-16 17:47:53 +02:00
Maximilian Luz	b2763358fe	platform/surface: aggregator: Update copyright It's 2021, update the copyright accordingly. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210604134755.535590-4-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2021-06-16 17:47:53 +02:00
Maximilian Luz	4b38a1dcf3	platform/surface: aggregator: Allow enabling of events without notifiers We can already enable and disable SAM events via one of two ways: either via a (non-observer) notifier tied to a specific event group, or a generic event enable/disable request. In some instances, however, neither method may be desirable. The first method will tie the event enable request to a specific notifier, however, when we want to receive notifications for multiple event groups of the same target category and forward this to the same notifier callback, we may receive duplicate events, i.e. one event per registered notifier. The second method will bypass the internal reference counting mechanism, meaning that a disable request will disable the event regardless of any other client driver using it, which may break the functionality of that driver. To address this problem, add new functions that allow enabling and disabling of events via the event reference counting mechanism built into the controller, without needing to register a notifier. This can then be used in combination with observer notifiers to process multiple events of the same target category without duplication in the same callback function. Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20210604134755.535590-3-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2021-06-16 17:47:53 +02:00

... 13 14 15 16 17 ...

131299 Commits