linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-13 17:45:09 -05:00

Author	SHA1	Message	Date
Boris Brezillon	d48f62b9a0	mtd: nand: move of_get_nand_xxx() helpers into nand_base.c Now that all drivers go through nand_set_flash_node() to parse the generic NAND properties, we can move all of_get_nand_xxx() helpers in to nand_base.c, make them static and remove of_mtd.c and of_mtd.h. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:52:00 +02:00
Boris Brezillon	aab616e31d	mtd: kill the nand_ecclayout struct Now that all MTD drivers have moved to the mtd_ooblayout_ops model we can safely remove the struct nand_ecclayout definition, and all the remaining places where it was still used. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:51:51 +02:00
Boris Brezillon	7f2b092c9e	mtd: nand: kill the ecc->layout field Now that all NAND drivers have switched to mtd_ooblayout_ops, we can kill the ecc->layout field. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:51:50 +02:00
Boris Brezillon	a411679fb5	mtd: onenand: switch to mtd_ooblayout_ops Implementing the mtd_ooblayout_ops interface is the new way of exposing ECC/OOB layout to MTD users. Modify the onenand drivers to switch to this approach. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:51:49 +02:00
Boris Brezillon	04a123a99f	mtd: nand: fsmc: get rid of the fsmc_nand_eccplace struct Now that mtd_ooblayout_ecc() returns the ECC byte position using the OOB free method, we can get rid of the fsmc_nand_eccplace struct. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:51:43 +02:00
Boris Brezillon	e5b2d30e42	mtd: nand: sharpsl: switch to mtd_ooblayout_ops Implementing the mtd_ooblayout_ops interface is the new way of exposing ECC/OOB layout to MTD users. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-05-05 23:51:36 +02:00
Boris Brezillon	41b207a70d	mtd: nand: implement the default mtd_ooblayout_ops Replace the default nand_ecclayout definitions for large and small page devices with the equivalent mtd_ooblayout_ops. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:05:56 +02:00
Boris Brezillon	adbbc3bc82	mtd: create an mtd_ooblayout_ops struct to ease ECC layout definition ECC layout definitions are currently exposed using the nand_ecclayout struct which embeds oobfree and eccpos arrays with predefined size. This approach was acceptable when NAND chips were providing relatively small OOB regions, but MLC and TLC now provide OOB regions of several hundreds of bytes, which implies a non negligible overhead for everybody even those who only need to support legacy NANDs. Create an mtd_ooblayout_ops interface providing the same functionality (expose the ECC and oobfree layout) without the need for this huge structure. The mtd->ecclayout is now deprecated and should be replaced by the equivalent mtd_ooblayout_ops. In the meantime we provide a wrapper around the ->ecclayout field to ease migration to this new model. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:05:55 +02:00
Boris Brezillon	036d6543f8	mtd: add mtd_set_ecclayout() helper function Add an mtd_set_ecclayout() helper function to avoid direct accesses to the mtd->ecclayout field. This will ease future reworks of ECC layout definition. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:05:53 +02:00
Boris Brezillon	75eb2cec25	mtd: add mtd_ooblayout_xxx() helper functions In order to make the ecclayout definition completely dynamic we need to rework the way the OOB layout are defined and iterated. Create a few mtd_ooblayout_xxx() helpers to ease OOB bytes manipulation and hide ecclayout internals to their users. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:05:47 +02:00
Boris Brezillon	9d02fc2a51	mtd: nand: export default read/write oob functions Export the default read/write oob functions (for the standard and syndrome scheme), so that drivers can use them for their raw implementation and implement their own functions for the normal oob operation. This is required if your ECC engine is capable of fixing some of the OOB data. In this case you have to overload the ->read_oob() and ->write_oob(), but if you don't specify the ->read/write_oob_raw() functions they are assigned to the ->read/write_oob() implementation, which is not what you want. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:05:38 +02:00
Raghav Dogra	7a65417216	mtd/ifc: Add support for IFC controller version 2.0 The new IFC controller version 2.0 has a different memory map page. Upto IFC 1.4 PAGE size is 4 KB and from IFC2.0 PAGE size is 64KB. This patch segregates the IFC global and runtime registers to appropriate PAGE sizes. Signed-off-by: Jaiprakash Singh <b44839@freescale.com> Signed-off-by: Raghav Dogra <raghav@freescale.com> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Raghav Dogra <raghav.dogra@nxp.com> Acked-by: Scott Wood <oss@buserror.net> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:04:53 +02:00
Rafał Miłecki	dd2dcc0042	of: mtd: prepare helper reading NAND ECC algo from DT NAND subsystem is being slightly reworked to store ECC details in separated fields. In future we'll want to add support for more DT properties as specifying every possible setup with a single "nand-ecc-mode" is a pretty bad idea. To allow this let's add a helper that will support something like "nand-ecc-algo" in future. Right now we use it for keeping backward compatibility. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:04:48 +02:00
Rafał Miłecki	b0fcd8ab7b	mtd: nand: add new enum for storing ECC algorithm Our nand_ecc_modes_t is already a bit abused by value NAND_ECC_SOFT_BCH. This enum should store ECC mode only and putting algorithm details there is a bad idea. It would result in too many values impossible to support in a sane way. To solve this problem let's add a new enum. We'll have to modify all drivers to set it properly but once it's done it'll be possible to drop NAND_ECC_SOFT_BCH. That will result in a cleaner design and more possibilities like setting ECC algorithm for hardware ECC mode. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2016-04-19 22:02:32 +02:00
Boris Brezillon	8de53481b4	Merge branch 'mtd-nand-trigger' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds into nand/next Pull leds-trigger changes from Jacek Anaszewski. Create a generic mtd led-trigger to replace the exisitng nand led-trigger implementation. * 'mtd-nand-trigger' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: mtd: Hook I/O activity to the MTD LED trigger mtd: nand: Remove the "nand-disk" LED trigger leds: trigger: Introduce a MTD (NAND/NOR) trigger mtd: Uninline mtd_write_oob and move it to mtdcore.c leds: trigger: Introduce a kernel panic LED trigger	2016-04-19 21:44:11 +02:00
Roger Quadros	10f22ee367	mtd: nand: omap2: Implement NAND ready using gpiolib The GPMC WAIT pin status are now available over gpiolib. Update the omap_dev_ready() function to use gpio instead of directly accessing GPMC register space. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:55:37 +03:00
Roger Quadros	9e6946215d	memory: omap-gpmc: Prevent GPMC_STATUS from being accessed via gpmc_regs GPMC_STATUS register is private to the GPMC module and must not be accessed directly by NAND driver through the gpmc_regs. They must use gpmc_omap_get_nand_ops() instead. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:55:28 +03:00
Roger Quadros	c9711ec525	mtd: nand: omap: Clean up device tree support Move NAND specific device tree parsing to NAND driver. The NAND controller node must have a compatible id, register space resource and interrupt resource. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:53:36 +03:00
Roger Quadros	c509aefd75	mtd: nand: omap: Use gpmc_omap_get_nand_ops() to get NAND registers Deprecate nand register passing via platform data and use gpmc_omap_get_nand_ops() instead. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:53:02 +03:00
Roger Quadros	384258f252	memory: omap-gpmc: Implement IRQ domain for NAND IRQs GPMC provides 2 interrupts for NAND use. i.e. fifoevent and termcount. Use IRQ domain for this. NAND device tree node can then get the necessary interrupts by using gpmc as the interrupt parent. Legacy boot uses gpmc_get_client_irq to get the NAND interrupts from the GPMC IRQ domain. Get rid of custom bitmasks and use IRQ domain for that as well. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:52:28 +03:00
Roger Quadros	f47fcad63f	memory: omap-gpmc: Introduce GPMC to NAND interface The OMAP GPMC module has certain registers dedicated for NAND access and some NAND bits mixed with other GPMC functionality. For the NAND dedicated registers we have the struct gpmc_nand_regs. The NAND driver needs to access NAND specific bits from the following non-dedicated registers - EMPTYWRITEBUFFERSTATUS from GPMC_STATUS For accessing these bits we introduce the struct gpmc_nand_ops. Add gpmc_omap_get_nand_ops() that returns the gpmc_nand_ops along with updating the gpmc_nand_regs. This API will be called by the OMAP NAND driver to access the necessary bits in GPMC register space. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:51:34 +03:00
Roger Quadros	fabe7d7756	ARM: OMAP2+: gpmc: Add gpmc timings and settings to platform data Add device_timings, gpmc_timings and gpmc_setting to gpmc platform data. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:51:02 +03:00
Roger Quadros	58bc67fc32	ARM: OMAP2+: gpmc: Add platform data Add a platform data structure for GPMC. It contains all the necessary platform information that needs to be passed from platform init code to GPMC driver. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2016-04-15 11:50:44 +03:00
Ezequiel Garcia	4b721174c7	leds: trigger: Introduce a MTD (NAND/NOR) trigger This commit introduces a MTD trigger for flash (NAND/NOR) device activity. The implementation is copied from IDE disk. This trigger deprecates the "nand-disk" LED trigger, but for backwards compatibility, we still keep the "nand-disk" trigger around. The motivation for deprecating the "nand-disk" LED trigger is that it only works for NAND drivers, whereas the "mtd" LED trigger is more generic (in fact, "nand-disk" currently only works for certain NAND drivers). Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>	2016-04-13 10:23:14 +02:00
Ezequiel Garcia	0c034fe377	mtd: Uninline mtd_write_oob and move it to mtdcore.c There's no reason for having mtd_write_oob inlined in mtd.h header. Move it to mtdcore.c where it belongs. Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>	2016-04-13 10:23:14 +02:00
Linus Torvalds	d5a38f6e46	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others" [ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits) libceph: use KMEM_CACHE macro ceph: use kmem_cache_zalloc rbd: use KMEM_CACHE macro ceph: use lookup request to revalidate dentry ceph: kill ceph_get_dentry_parent_inode() ceph: fix security xattr deadlock ceph: don't request vxattrs from MDS ceph: fix mounting same fs multiple times ceph: remove unnecessary NULL check ceph: avoid updating directory inode's i_size accidentally ceph: fix race during filling readdir cache libceph: use sizeof_footer() more ceph: kill ceph_empty_snapc ceph: fix a wrong comparison ceph: replace CURRENT_TIME by current_fs_time() ceph: scattered page writeback libceph: add helper that duplicates last extent operation libceph: enable large, variable-sized OSD requests libceph: osdc->req_mempool should be backed by a slab pool libceph: make r_request msg_size calculation clearer ...	2016-03-26 15:53:16 -07:00
Linus Torvalds	b4cec5f668	Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb Pull NTB bug fixes from Jon Mason: "NTB bug fixes for tasklet from spinning forever, link errors, translation window setup, NULL ptr dereference, and ntb-perf errors. Also, a modification to the driver API that makes _addr functions optional" * tag 'ntb-4.6' of git://github.com/jonmason/ntb: NTB: Remove _addr functions from ntb_hw_amd NTB: Make _addr functions optional in the API NTB: Fix incorrect clean up routine in ntb_perf NTB: Fix incorrect return check in ntb_perf ntb: fix possible NULL dereference ntb: add missing setup of translation window ntb: stop link work when we do not have memory ntb: stop tasklet from spinning forever during shutdown. ntb: perf test: fix address space confusion	2016-03-26 11:37:42 -07:00
Linus Torvalds	895a1067d5	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "The only new stuff which missed the first pull request is an update to the UFS driver. The rest is an assortment of bug fixes and minor tweaks which appeared recently (some are fixes for recent code and some are stuff spotted recently by the checkers or the new gcc-6 compiler [most of Arnd's stuff])" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi_common: do not clobber fixed sense information scsi: ufs: select CONFIG_NLS scsi: fc: use get/put_unaligned64 for wwn access fnic: move printk()s outside of the critical code section. qla2xxx: avoid maybe_uninitialized warning megaraid_sas: add missing curly braces in ioctl handler lpfc: fix misleading indentation scsi_transport_sas: add 'scsi_target_id' sysfs attribute scsi_dh_alua: uninitialized variable in alua_check_vpd() scsi: ufs-qcom: add printouts of testbus debug registers scsi: ufs-qcom: enable/disable the device ref clock scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup scsi: ufs: add device quirk delay before putting UFS rails in LPM scsi: ufs: fix leakage during link off state scsi: ufs: tune UniPro parameters to optimize hibern8 exit time scsi: ufs: handle non spec compliant bkops behaviour by device scsi: ufs: add retry for query descriptors scsi: ufs: add error recovery after DL NAC error scsi: ufs: make error handling bit faster scsi: ufs: disable vccq if it's not needed by UFS device ...	2016-03-26 11:31:01 -07:00
Alexander Potapenko	cd11016e5f	mm, kasan: stackdepot implementation. Enable stackdepot for SLAB Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot will allow KASAN store allocation/deallocation stack traces for memory chunks. The stack traces are stored in a hash table and referenced by handles which reside in the kasan_alloc_meta and kasan_free_meta structures in the allocated memory chunks. IRQ stack traces are cut below the IRQ entry point to avoid unnecessary duplication. Right now stackdepot support is only enabled in SLAB allocator. Once KASAN features in SLAB are on par with those in SLUB we can switch SLUB to stackdepot as well, thus removing the dependency on SLUB stack bookkeeping, which wastes a lot of memory. This patch is based on the "mm: kasan: stack depots" patch originally prepared by Dmitry Chernenkov. Joonsoo has said that he plans to reuse the stackdepot code for the mm/page_owner.c debugging facility. [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t] [aryabinin@virtuozzo.com: comment style fixes] Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Alexander Potapenko	be7635e728	arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Alexander Potapenko	505f5dcb1c	mm, kasan: add GFP flags to KASAN API Add GFP flags to KASAN hooks for future patches to use. This patch is based on the "mm: kasan: unified support for SLUB and SLAB allocators" patch originally prepared by Dmitry Chernenkov. Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Alexander Potapenko	7ed2f9e663	mm, kasan: SLAB support Add KASAN hooks to SLAB allocator. This patch is based on the "mm: kasan: unified support for SLUB and SLAB allocators" patch originally prepared by Dmitry Chernenkov. Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Tetsuo Handa	aaf4fb712b	include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill() A leftover from commit `c32b3cbe0d` ("oom, PM: make OOM detection in the freezer path raceless"). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Tetsuo Handa	bb29902a75	oom, oom_reaper: protect oom_reaper_list using simpler way "oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it by simply checking tsk->oom_reaper_list != NULL. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Vladimir Davydov	29c696e1c6	oom: make oom_reaper_list single linked Entries are only added/removed from oom_reaper_list at head so we can use a single linked list and hence save a word in task_struct. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Michal Hocko	855b018325	oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task Tetsuo has reported that oom_kill_allocating_task=1 will cause oom_reaper_list corruption because oom_kill_process doesn't follow standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue the same task multiple times - e.g. by sacrificing the same child multiple times. This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag which is set in oom_kill_process atomically and oom reaper is disabled if the flag was already set. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Michal Hocko	03049269de	mm, oom_reaper: implement OOM victims queuing wake_oom_reaper has allowed only 1 oom victim to be queued. The main reason for that was the simplicity as other solutions would require some way of queuing. The current approach is racy and that was deemed sufficient as the oom_reaper is considered a best effort approach to help with oom handling when the OOM victim cannot terminate in a reasonable time. The race could lead to missing an oom victim which can get stuck out_of_memory wake_oom_reaper cmpxchg // OK oom_reaper oom_reap_task __oom_reap_task oom_victim terminates atomic_inc_not_zero // fail out_of_memory wake_oom_reaper cmpxchg // fails task_to_reap = NULL This race requires 2 OOM invocations in a short time period which is not very likely but certainly not impossible. E.g. the original victim might have not released a lot of memory for some reason. The situation would improve considerably if wake_oom_reaper used a more robust queuing. This is what this patch implements. This means adding oom_reaper_list list_head into task_struct (eat a hole before embeded thread_struct for that purpose) and a oom_reaper_lock spinlock for queuing synchronization. wake_oom_reaper will then add the task on the queue and oom_reaper will dequeue it. Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Michal Hocko	36324a990c	oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space When oom_reaper manages to unmap all the eligible vmas there shouldn't be much of the freable memory held by the oom victim left anymore so it makes sense to clear the TIF_MEMDIE flag for the victim and allow the OOM killer to select another task. The lack of TIF_MEMDIE also means that the victim cannot access memory reserves anymore but that shouldn't be a problem because it would get the access again if it needs to allocate and hits the OOM killer again due to the fatal_signal_pending resp. PF_EXITING check. We can safely hide the task from the OOM killer because it is clearly not a good candidate anymore as everyhing reclaimable has been torn down already. This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE and thus hold off further global OOM killer actions granted the oom reaper is able to take mmap_sem for the associated mm struct. This is not guaranteed now but further steps should make sure that mmap_sem for write should be blocked killable which will help to reduce such a lock contention. This is not done by this patch. Note that exit_oom_victim might be called on a remote task from __oom_reap_task now so we have to check and clear the flag atomically otherwise we might race and underflow oom_victims or wake up waiters too early. Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Argangeli <andrea@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Michal Hocko	aac4536355	mm, oom: introduce oom reaper This patch (of 5): This is based on the idea from Mel Gorman discussed during LSFMM 2015 and independently brought up by Oleg Nesterov. The OOM killer currently allows to kill only a single task in a good hope that the task will terminate in a reasonable time and frees up its memory. Such a task (oom victim) will get an access to memory reserves via mark_oom_victim to allow a forward progress should there be a need for additional memory during exit path. It has been shown (e.g. by Tetsuo Handa) that it is not that hard to construct workloads which break the core assumption mentioned above and the OOM victim might take unbounded amount of time to exit because it might be blocked in the uninterruptible state waiting for an event (e.g. lock) which is blocked by another task looping in the page allocator. This patch reduces the probability of such a lockup by introducing a specialized kernel thread (oom_reaper) which tries to reclaim additional memory by preemptively reaping the anonymous or swapped out memory owned by the oom victim under an assumption that such a memory won't be needed when its owner is killed and kicked from the userspace anyway. There is one notable exception to this, though, if the OOM victim was in the process of coredumping the result would be incomplete. This is considered a reasonable constrain because the overall system health is more important than debugability of a particular application. A kernel thread has been chosen because we need a reliable way of invocation so workqueue context is not appropriate because all the workers might be busy (e.g. allocating memory). Kswapd which sounds like another good fit is not appropriate as well because it might get blocked on locks during reclaim as well. oom_reaper has to take mmap_sem on the target task for reading so the solution is not 100% because the semaphore might be held or blocked for write but the probability is reduced considerably wrt. basically any lock blocking forward progress as described above. In order to prevent from blocking on the lock without any forward progress we are using only a trylock and retry 10 times with a short sleep in between. Users of mmap_sem which need it for write should be carefully reviewed to use _killable waiting as much as possible and reduce allocations requests done with the lock held to absolute minimum to reduce the risk even further. The API between oom killer and oom reaper is quite trivial. wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only NULL->mm transition and oom_reaper clear this atomically once it is done with the work. This means that only a single mm_struct can be reaped at the time. As the operation is potentially disruptive we are trying to limit it to the ncessary minimum and the reaper blocks any updates while it operates on an mm. mm_struct is pinned by mm_count to allow parallel exit_mmap and a race is detected by atomic_inc_not_zero(mm_users). Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Mel Gorman <mgorman@suse.de> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Andrew Morton	69b27baf00	sched: add schedule_timeout_idle() This will be needed in the patch "mm, oom: introduce oom reaper". Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-25 16:37:42 -07:00
Yan, Zheng	315f240880	ceph: fix security xattr deadlock When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2016-03-25 18:51:55 +01:00
Yan, Zheng	2c63f49a72	libceph: add helper that duplicates last extent operation This helper duplicates last extent operation in OSD request, then adjusts the new extent operation's offset and length. The helper is for scatterd page writeback, which adds nonconsecutive dirty pages to single OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:43 +01:00
Ilya Dryomov	3f1af42ad0	libceph: enable large, variable-sized OSD requests Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:43 +01:00
Yan, Zheng	7665d85b73	libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:42 +01:00
Ilya Dryomov	de2aa102ea	libceph: rename ceph_osd_req_op::payload_len to indata_len Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:41 +01:00
Ilya Dryomov	168b9090c7	libceph: monc hunt rate is 3s with backoff up to 30s Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multipler by 50%. Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:39 +01:00
Ilya Dryomov	58d81b1294	libceph: monc ping rate is 10s Split ping interval and ping timeout: ping interval is 10s; keepalive timeout is 30s. Make monc_ping_timeout a constant while at it - it's not actually exported as a mount option (and the rest of tick-related settings won't be either), so it's got no place in ceph_options. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:39 +01:00
Ilya Dryomov	82dcabad75	libceph: revamp subs code, switch to SUBSCRIBE2 protocol It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:38 +01:00
Linus Torvalds	f98c2135f8	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Just a couple of dma-buf related fixes and some amdgpu fixes, along with a regression fix for radeon off but default feature, but makes my 30" monitor happy again" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: drm/radeon/mst: cleanup code indentation drm/radeon/mst: fix regression in lane/link handling. drm/amdgpu: add invalidate_page callback for userptrs drm/amdgpu: Revert "remove the userptr rmn->lock" drm/amdgpu: clean up path handling for powerplay drm/amd/powerplay: fix memory leak of tdp_table dma-buf/fence: fix fence_is_later v2 dma-buf: Update docs for SYNC ioctl drm: remove excess description dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access() drm/atmel-hlcdc: use helper to get crtc state drm/atomic: use helper to get crtc state	2016-03-25 08:48:31 -07:00
Linus Torvalds	11caf57f6a	Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are only three patches this time, most other changes to files in include/asm-generic tend to go through the tree of whoever depends on the change. Two patches are cleanups for stuff that is no longer needed, the main change is to adapt the generic version of BUG_ON() for CONFIG_BUG=n to make it behave consistently with BUG(). This avoids undefined behavior along with a number of warnings about that undefined behavior in randconfig builds when we keep going on after hitting a BUG_ON()" * tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: remove old nonatomic-io wrapper files asm-generic: default BUG_ON(x) to if(x)BUG() asm-generic: page.h: Remove useless get_user_page and free_user_page	2016-03-24 23:13:48 -07:00

1 2 3 4 5 ...

82351 Commits