Add devm_regulator_register_notifier, this adds the resource against the
device for the consumer supply we are registering the notifier for. There
seem to be few use-cases where this wouldn't be the users intention and
this ensures the notifiers will always be removed at the correct time.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Current sh_mobile_sdhi's platform data is set via sh_mobile_sdhi_info
and it is just copied to tmio_mmc_data.
Now, tmio mmc platform data is specified via tmio_mmc_data.
This patch replace sh_mobile_sdhi_info to tmio_mmc_data
struct sh_mobile_sdhi_info { -> struct tmio_mmc_data {
int dma_slave_tx; -> void *chan_priv_tx;
int dma_slave_rx; -> void *chan_priv_rx;
unsigned long tmio_flags; -> unsigned long flags;
unsigned long tmio_caps; -> unsigned long capabilities;
unsigned long tmio_caps2; -> unsigned long capabilities2;
u32 tmio_ocr_mask; -> u32 ocr_mask;
unsigned int cd_gpio; -> unsigned int cd_gpio;
}; unsigned int hclk;
void (*set_pwr)(...);
void (*set_clk_div)(...);
};
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
dma_request_slave_channel_compat() in tmio_mmc_dma
needs .chan_priv_tx/.chan_priv_rx. But these are copied from
sh_mobile_sdhi only, and sh_mobile_sdhi_info is now almost
same as tmio_mmc_data except .chan_priv_?x.
sh_mobile_sdhi_info can be replaced to tmio_mmc_data, but it is
used from ${LINUX}/arch/arm/mach-shmobile, ${LINUX}/arch/sh.
So, this patch adds .chan_priv_?x into tmio_mmc_data as 1st step,
and sh_mobile_sdhi driver has dummy operation for now.
It will be replaced/removed together with platform data replace.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Since commit 7bced39751 ("net_dma: simple removal") removed the net_dma
support entirely, net_dma_find_channel has no users left. Remove the function
entirely.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Commit 7bce d397 510a ("net_dma: simple removal") removed the net_dma support
entirely but left some functions and prototypes in the dmaengine header.
Remove them.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.
try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation. In that case, try_to_grab_pending() returns -ENOENT. In
this case, __cancel_work_timer() currently invokes flush_work(). The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping
Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority. Let's say task A just got woken
up from flush_work() by the completion of the target work item. If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing. This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.
task A task B worker
executing work
__cancel_work_timer()
try_to_grab_pending()
set work CANCELING
flush_work()
block for work completion
completion, wakes up A
__cancel_work_timer()
while (forever) {
try_to_grab_pending()
-ENOENT as work is being canceled
flush_work()
false as work is no longer executing
}
This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.
Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
v3: bit_waitqueue() can't be used for work items defined in vmalloc
area. Switched to custom wake function which matches the target
work item and exclusive wait and wakeup.
v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
the target bit waitqueue has wait_bit_queue's on it. Use
DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu
Vizoso.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
These functions are not exported nor used anywhere, so there is no
reason to put them in public headers.
Also drop unused bcma_chipco_(suspend|resume).
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We were providing declarations but actual code was compiled only with
CONFIG_BCMA_HOST_PCI set. This could result in:
ERROR: "bcma_host_pci_down" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined!
ERROR: "bcma_host_pci_up" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined!
ERROR: "bcma_host_pci_down" [drivers/net/wireless/b43/b43.ko] undefined!
ERROR: "bcma_host_pci_up" [drivers/net/wireless/b43/b43.ko] undefined!
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This reverts commit 5a7d2efdd9.
As per discussion on the mailing list, this is not the right
thing to do. NULL cookies are valid in the stubs.
Reported-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Add device managed variants of gpiod_get_array() / gpiod_put_array()
functions for conveniently obtaining and disposing of an entire array
of GPIOs with one function call.
Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Introduce new functions for conveniently obtaining and disposing of
an entire array of GPIOs with one function call.
ACPI parts tested by Mika Westerberg, DT parts tested by Rojhalat
Ibrahim.
Change log:
v5: move the ACPI functions to gpiolib-acpi.c
v4: - use shorter names for members of struct gpio_descs
- rename lut_gpio_count to platform_gpio_count for clarity
- add check for successful memory allocation
- use ERR_CAST()
v3: - rebase on current linux-gpio devel branch
- fix ACPI GPIO counting
- allow for zero-sized arrays
- make the flags argument mandatory for the new functions
- clarify documentation
v2: change interface
Suggested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Rojhalat Ibrahim <imr@rtschenk.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This patch fixes service discovery behaviour, when provided uuid filter
is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this
patch, empty uuid filter was unable to trigger scan restart, and that
caused inconsistent behaviour in applications.
Example: two DBus clients call BlueZ, one to find all devices with
service abcd, second to find all devices with rssi smaller than -90.
Sum of those filters, that is passed to mgmt_service_scan is empty
filter, with no rssi or uuids set.
That caused kernel not to restart scan when quirk was set.
That was inconsistent with what happen when there's only one of those
two filters set (scan is restarted and reports devices).
To fix that, new variable hdev->discovery.result_filtering was
introduced. It can indicate that filtered scan is running, no matter
what uuid or rssi filter is set.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
They're used to initialize various static fields, though static
cpumasks should generally be avoided.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use cases like rotation require these hooks to have some context so they
know how to prepare and cleanup the frame buffer correctly.
For i915 specifically, object backing pages need to be mapped differently
for different rotation modes and the driver needs to know which mapping to
instantiate and which to tear down when transitioning between them.
v2: Made passed in states const. (Daniel Vetter)
[airlied: add mdp5 and atmel fixups]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over
* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
drm/i915: Update DRIVER_DATE to 20150214
drm/i915: Remove references to previously removed UMS config option
drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
drm/i915/skl: Fix always true comparison in a revision id check
drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
drm/i915: Add process identifier to requests
drm/i915/skl: Implement WaBarrierPerformanceFixDisable
drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
drm/i915/skl: Implement WaDisableHDCInvalidation
drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
drm/i915/skl: Implement WaDisablePartialResolveInVc
drm/i915/skl: Introduce a SKL specific init_workarounds()
drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
drm/i915/skl: Make the init clock gating function skylake specific
drm/i915/skl: Provide a gen9 specific init_render_ring()
drm/i915/skl: Document the WM read latency W/A with its name
drm/i915/skl: Also detect eDRAM on SKL
...
We need to store device offsets in 64 bit as the device
address space may be larger than the CPU's.
Fixes GPU init failures on radeons with 4GB or more of
vram on 32 bit kernels. We put vram at the start of the
GPU's address space so the gart aperture starts at 4 GB
causing all GPU addresses in the gart aperture to get
truncated.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=89072
[airlied: fix warning on nouveau build]
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: thellstrom@vmware.com
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current implementation is limited by the number of addresses that
fit into an unsigned long. This causes problems on 32-bit Tegra where
unsigned long is 32-bit but drm_mm is used to manage an IOVA space of
4 GiB. Given the 32-bit limitation, the range is limited to 4 GiB - 1
(or 4 GiB - 4 KiB for page granularity).
This commit changes the start and size of the range to be an unsigned
64-bit integer, thus allowing much larger ranges to be supported.
[airlied: fix i915 warnings and coloring callback]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
fixupo
Currently HID code maps usages from telephony page into BTN_0, BTN_1, etc
keys which get interpreted by mousedev and userspace as left/right/middle
button clicks, which is not really helpful.
This change adds mappings for usages that have corresponding input event
definitions, and leaves the rest unmapped. This can be changed when
there are userspace consumers for more telephony usages.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
It currently is required that all users of NO_SUSPEND interrupt
lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
WARN_ON_ONCE() in irq_pm_install_action() will trigger. That is
done to warn about situations in which unprepared interrupt handlers
may be run unnecessarily for suspended devices and may attempt to
access those devices by mistake. However, it may cause drivers
that have no technical reasons for using IRQF_NO_SUSPEND to set
that flag just because they happen to share the interrupt line
with something like a timer.
Moreover, the generic handling of wakeup interrupts introduced by
commit 9ce7a25849 (genirq: Simplify wakeup mechanism) only works
for IRQs without any NO_SUSPEND users, so the drivers of wakeup
devices needing to use shared NO_SUSPEND interrupt lines for
signaling system wakeup generally have to detect wakeup in their
interrupt handlers. Thus if they happen to share an interrupt line
with a NO_SUSPEND user, they also need to request that their
interrupt handlers be run after suspend_device_irqs().
In both cases the reason for using IRQF_NO_SUSPEND is not because
the driver in question has a genuine need to run its interrupt
handler after suspend_device_irqs(), but because it happens to
share the line with some other NO_SUSPEND user. Otherwise, the
driver would do without IRQF_NO_SUSPEND just fine.
To make it possible to specify that condition explicitly, introduce
a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
that, when set, will indicate to the IRQ core that the interrupt
user is generally fine with suspending the IRQ, but it also can
tolerate handler invocations after suspend_device_irqs() and, in
particular, it is capable of detecting system wakeup and triggering
it as appropriate from its interrupt handler.
That will allow us to work around a problem with a shared timer
interrupt line on at91 platforms.
Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
Link: http://marc.info/?t=142252775300011&r=1&w=2
Link: https://lkml.org/lkml/2014/12/15/552
Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Merge "First fixes batch for AT91 on 4.0" from Nicolas Ferre:
- PM slowclock fixes for DDR and timeouts
- fix some DT entries
- little defconfig updates
- the removal of a harmful watchdog option + its detailed documentation
* tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
ARM: at91/dt: keep watchdog running in idle mode
dts: Documentation: AT91 Watchdog, explain what atmel,idle-halt property really do
ARM: at91/defconfig: add at91rm9200 ethernet support
ARM: at91/defconfig: remove CONFIG_SYSFS_DEPRECATED
ARM: at91/dt: at91sam9260: fix usart pinctrl
ARM: at91/dt: sama5d4: add missing alias for i2c0
ARM: at91/dt: at91sam9263: Fixup sram1 device tree node
ARM: at91: pm: fix SRAM allocation
ARM: at91: pm: fix at91rm9200 standby
pm: at91: Workaround DDRSDRC self-refresh bug with LPDDR1 memories.
pm: at91: pm_slowclock: fix suspend/resume hang up in timeouts
The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.
The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are no users of snd_soc_jack_new() left and new users should use
snd_soc_card_jack_new() instead. So remove the function.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Jacks are typically card level elements, but are currently registered with a
CODEC. When it was originally introduced snd_soc_jack_new() took a
snd_soc_card as its parameter, but at that time DAPM was only implemented at
the CODEC level and there was only one CODEC per card. This made it clear
which CODEC to use for the jack DAPM operations. But the multi-component
patchset added support for having multiple CODECs per card and with it the
API was updated to register jacks with a specific CODEC instance instead.
Subsequently DAPM support at the card level has been introduced, but the
snd_soc_jack_new() API has so remained unchanged.
This leaves us with the issue that the DAPM pins that are managed by the
jack detection logic usually are part of the card DAPM context but are
accessed through a CODEC DAPM context. Currently this works fine, but might
break in the future if we take a more hierarchical approach to DAPM
contexts.
Furthermore with componentization progressing systems that do not register
a snd_soc_codec might appear, while these system may still want to able to
register a jack.
This patch addresses these issues by adding a new function called
snd_soc_card_jack_new() that can be used to register jacks with the card
rather than a CODEC.
This new function is mostly identical to snd_soc_jack_new() except that it
additionally allows to directly specify the DAPM pins associated with the
jack. This was done since most users of snd_soc_jack_new() typically call
snd_soc_jack_add_pins() right after it, which is not necessary with the new
API and allows to reduce the amount of boiler plate code.
The old snd_soc_jack_new() is re-implemented as a wrapper around
snd_soc_card_jack_new().
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Change ->set_info to take new qc_info structure which contains all the
necessary information both for XFS and VFS. Convert Q_SETINFO handler
to use this structure.
Signed-off-by: Jan Kara <jack@suse.cz>
Create new internal interface for getting information about quota which
contains everything needed for both VFS quotas and XFS quotas. Make VFS
use this and hook it up to Q_GETINFO.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
The gpio binding document says that new code should always use named
gpios. Patch 40b73183 added support to parse a list of gpios from child
nodes, but does not make it possible to use named gpios. This patch adds
the con_id property and implements it is done in gpiolib.c, where the
old-style of using unnamed gpios still works.
Signed-off-by: Olliver Schinagl <oliver@schinagl.nl>
Acked-by: Bryan Wu <cooloney@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Commit 48a9db462d ("drivers/dma: remove unused support for MEMSET
operations") removed support for the memset operation in dmaengine, but left
the fill_aligned field that was supposed to set the buffer alignment for the
memset operations.
Remove that field too.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
This patch fix spelling typo found in alsa-driver-api.xml.
It is because this file is generated from comments in source files,
I have to fix source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Some device drivers offload part of aggregation including AddBA/DelBA
negotiations to firmware. In such scenario, the PMF configuration of
the station needs to be provided to driver to enable encryption of
AddBA/DelBA action frames.
Signed-off-by: SenthilKumar Jegadeesan <sjegadee@qti.qualcomm.com>
[fix commit log, documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Sometimes the driver might want to modify private data in interfaces
that are down. One possible use-case is cleaning up interface state
after HW recovery. Some interfaces that were up before the recovery took
place might be down now, but they might still be "dirty".
Introduce a new iterate_interfaces() API and a new ACTIVE iterator flag.
This way the internal implementation of the both active and inactive
APIs remains the same.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the device supports waking up on 'any' signal - i.e. it continues
operating as usual and wakes up the host on pretty much anything that
happens, then it makes no sense to also configure the more restricted
WoWLAN mode where the device operates more autonomously but also in a
more restricted fashion.
Currently only cw2100 supports both 'any' and other triggers, but it
seems to be broken as it doesn't configure anything to the device, so
we can't currently get into a situation where both even can correctly
be configured. This is about to change (Intel devices are going to
support both and have different behaviour depending on configuration)
so make sure the conflicting modes cannot be configured.
(It seems that cw2100 advertises 'any' and 'disconnect' as a means of
saying that's what it will always do, but that isn't really the way
this API was meant to be used nor does it actually mean anything as
'any' always implies 'disconnect' already, and the driver doesn't
change device configuration in any way depending on the settings.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch adds MAX77843 core/irq driver to support PMIC,
MUIC(Micro USB Interface Controller), Charger, Fuel Gauge,
LED and Haptic device.
Signed-off-by: Jaewon Kim <jaewon02.kim@samsung.com>
Signed-off-by: Beomho Seo <beomho.seo@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This adds support for the MediaTek MT6397 PMIC. This is a
multifunction device with the following sub modules:
- Regulator
- RTC
- Audio codec
- GPIO
- Clock
It is interfaced to the host controller using SPI interface by a proprietary
hardware called PMIC wrapper or pwrap. MT6397 MFD is a child device of the
pwrap.
Signed-off-by: Flora Fu, MediaTek
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Unlike IPv4 this code notifies on all cases where mpls routes
are added or removed and it never automatically removes routes.
Avoiding both the userspace confusion that is caused by omitting
route updates and the possibility of a flood of netlink traffic
when an interface goes doew.
For now reserved labels are handled automatically and userspace
is not notified.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.
RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to. Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table. For AF_PACKET the
destination mac_address is specified in the netlink configuration.
I think raw destination mac address support with the family AF_PACKET
will prove useful. There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor. Further
seem to be corner cases where it can be useful. At this point
I don't care much either way.
RTA_NEWDST specifies the destination address to forward the packet
with. MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.
Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.
rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.
I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sysctl gives two benefits. By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets. This prevents unpleasant surprises for users.
The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.
This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a new Kconfig option MPLS_ROUTING.
The core of this change is the code to look at an mpls packet received
from another machine. Look that packet up in a routing table and
forward the packet on.
Support of MPLS over ATM is not considered or attempted here. This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.
What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[]. What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the
label fordwarding information base (lfib) might also be valid.
Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path. In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.
Quite a few optional features are not implemented here. Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error). The traffic
class field is always set to 0. The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.
Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value). I was lazy and implemented a next hop mac
address instead. The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.
Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces. There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte. Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.
For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early. Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped. Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.
The kind of next hop we have is indicated by the neighbour table
pointer. A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.
The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref. We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.
So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function. I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.
To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq. So that the static size of the key would be
available.
I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>