Merge tag 'v4.10-rc5' into for-linus

Sync up with mainline to apply fixup to a commit that came through
power supply tree.
Dmitry Torokhov
2017-01-24 09:57:18 -08:00
17699 changed files with 1338733 additions and 566855 deletions

.gitattributes

@@ -0,0 +1,2 @@
*.c diff=cpp
*.h diff=cpp

@@ -75,6 +75,8 @@ Jean Tourrilhes <jt@hpl.hp.com>
Jeff Garzik <jgarzik@pretzel.yyz.us>
Jens Axboe <axboe@suse.de>
Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Johan Hovold <johan@kernel.org> <jhovold@gmail.com>
Johan Hovold <johan@kernel.org> <johan@hovoldconsulting.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Stultz <johnstul@us.ibm.com>
<josh@joshtriplett.org> <josh@freedesktop.org>
@@ -125,6 +127,7 @@ Peter Oruba <peter@oruba.de>
Peter Oruba <peter.oruba@amd.com>
Pratyush Anand <pratyush.anand@gmail.com> <pratyush.anand@st.com>
Praveen BP <praveenbp@ti.com>
Qais Yousef <qsyousef@gmail.com> <qais.yousef@imgtec.com>
Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
@@ -134,6 +137,7 @@ Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt>
Sachin P Sant <ssant@in.ibm.com>
Sarangdhar Joshi <spjoshi@codeaurora.org>
Sam Ravnborg <sam@mars.ravnborg.org>
Santosh Shilimkar <ssantosh@kernel.org>
Santosh Shilimkar <santosh.shilimkar@oracle.org>
@@ -147,10 +151,13 @@ Shuah Khan <shuah@kernel.org> <shuah.kh@samsung.com>
Simon Kelley <simon@thekelleys.org.uk>
Stéphane Witzmann <stephane.witzmann@ubpmes.univ-bpclermont.fr>
Stephen Hemminger <shemminger@osdl.org>
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Subhash Jadavani <subhashj@codeaurora.org>
Sudeep Holla <sudeep.holla@arm.com> Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Sumit Semwal <sumit.semwal@ti.com>
Tejun Heo <htejun@gmail.com>
Thomas Graf <tgraf@suug.ch>
Thomas Pedersen <twp@codeaurora.org>
Tony Luck <tony.luck@intel.com>
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
@@ -160,6 +167,7 @@ Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Viresh Kumar <vireshk@kernel.org> <viresh.kumar@st.com>
Viresh Kumar <vireshk@kernel.org> <viresh.linux@gmail.com>
Viresh Kumar <vireshk@kernel.org> <viresh.kumar2@arm.com>
Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@virtuozzo.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@parallels.com>
Takashi YOSHII <takashi.yoshii.zj@renesas.com>

CREDITS

@@ -9,7 +9,7 @@
Linus
----------
M: Matt Mackal
N: Matt Mackal
E: mpm@selenic.com
D: SLOB slab allocator
@@ -1090,6 +1090,10 @@ S: 6350 Stoneridge Mall Road
S: Pleasanton, CA 94588
S: USA
N: Dmitry Eremin-Solenikov
E: dbaryshkov@gmail.com
D: Power Supply Maintainer from v3.14 - v3.15
N: Doug Evans
E: dje@cygnus.com
D: Wrote Xenix FS (part of standard kernel since 0.99.15)
@@ -1860,10 +1864,11 @@ S: The Netherlands
N: Martin Kepplinger
E: martink@posteo.de
E: martin.kepplinger@theobroma-systems.com
E: martin.kepplinger@ginzinger.com
W: http://www.martinkepplinger.com
D: mma8452 accelerators iio driver
D: Kernel cleanups
D: pegasus_notetaker input driver
D: Kernel fixes and cleanups
S: Garnisonstraße 26
S: 4020 Linz
S: Austria
@@ -1905,7 +1910,7 @@ S: Ra'annana, Israel
N: Andi Kleen
E: andi@firstfloor.org
U: http://www.halobates.de
W: http://www.halobates.de
D: network, x86, NUMA, various hacks
S: Schwalbenstr. 96
S: 85551 Ottobrunn
@@ -1944,6 +1949,11 @@ E: kraxel@bytesex.org
E: kraxel@suse.de
D: video4linux, bttv, vesafb, some scsi, misc fixes
N: Hans J. Koch
D: USERSPACE I/O, MAX6650
D: Hans passed away in June 2016, and will be greatly missed.
W: https://lwn.net/Articles/691000/
N: Harald Koenig
E: koenig@tat.physik.uni-tuebingen.de
D: XFree86 (S3), DCF77, some kernel hacks and fixes
@@ -2079,8 +2089,8 @@ D: ST Microelectronics SPEAr13xx PCI host bridge driver
D: Synopsys Designware PCI host bridge driver
N: Gabor Kuti
M: seasons@falcon.sch.bme.hu
M: seasons@makosteszta.sote.hu
E: seasons@falcon.sch.bme.hu
E: seasons@makosteszta.sote.hu
D: Original author of software suspend
N: Jaroslav Kysela
@@ -2287,11 +2297,11 @@ D: Initial implementation of VC's, pty's and select()
N: Pavel Machek
E: pavel@ucw.cz
D: Softcursor for vga, hypertech cdrom support, vcsa bugfix, nbd
P: 4096R/92DFCE96 4FA7 9EEF FCD4 C44F C585 B8C7 C060 2241 92DF CE96
D: Softcursor for vga, hypertech cdrom support, vcsa bugfix, nbd,
D: sun4/330 port, capabilities for elf, speedup for rm on ext2, USB,
D: work on suspend-to-ram/disk, killing duplicates from ioctl32
S: Volkova 1131
S: 198 00 Praha 9
D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
D: Altera SoCFPGA and Nokia N900 support.
S: Czech Republic
N: Paul Mackerras
@@ -2765,6 +2775,10 @@ S: C/ Mieses 20, 9-B
S: Valladolid 47009
S: Spain
N: Peter Oruba
D: AMD Microcode loader driver
S: Germany
N: Jens Osterkamp
E: jens@de.ibm.com
D: Maintainer of Spidernet network driver for Cell
@@ -3518,6 +3532,10 @@ S: 145 Howard St.
S: Northborough, MA 01532
S: USA
N: Doug Thompson
E: dougthompson@xmission.com
D: EDAC
N: Tommy Thorn
E: Tommy.Thorn@irisa.fr
W: http://www.irisa.fr/prive/thorn/index.html
@@ -3654,6 +3672,10 @@ S: Obere Heerbergstrasse 17
S: 97078 Wuerzburg
S: Germany
N: Jason Uhlenkott
E: juhlenko@akamai.com
D: I3000 EDAC driver
N: Greg Ungerer
E: gerg@snapgear.com
D: uClinux kernel hacker
@@ -3691,7 +3713,7 @@ S: Germany
N: Geert Uytterhoeven
E: geert@linux-m68k.org
W: http://users.telenet.be/geertu/
P: 1024/862678A6 C51D 361C 0BD1 4C90 B275 C553 6EEA 11BA 8626 78A6
P: 4096R/4804B4BC3F55EEFB 750D 82B0 A781 5431 5E25 925B 4804 B4BC 3F55 EEFB
D: m68k/Amiga and PPC/CHRP Longtrail coordinator
D: Frame buffer device and XF68_FBDev maintainer
D: m68k IDE maintainer
@@ -3927,8 +3949,6 @@ E: gwingerde@gmail.com
D: Ralink rt2x00 WLAN driver
D: Minix V2 file-system
D: Misc fixes
S: Geessinkweg 177
S: 7544 TX Enschede
S: The Netherlands
N: Lars Wirzenius

@@ -14,13 +14,8 @@ Following translations are available on the WWW:
- this file.
ABI/
- info on kernel <-> userspace ABI and relative interface stability.
BUG-HUNTING
- brute force method of doing binary search of patches to find bug.
Changes
- list of changes that break older software packages.
CodingStyle
- how the maintainers expect the C code in the kernel to look.
- nothing here, just a pointer to process/coding-style.rst.
DMA-API.txt
- DMA API, pci_ API & extensions for non-consistent memory machines.
DMA-API-HOWTO.txt
@@ -33,8 +28,6 @@ DocBook/
- directory with DocBook templates etc. for kernel documentation.
EDID/
- directory with info on customizing EDID for broken gfx/displays.
HOWTO
- the process and procedures of how to do Linux kernel development.
IPMI.txt
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
IRQ-affinity.txt
@@ -46,61 +39,43 @@ IRQ.txt
Intel-IOMMU.txt
- basic info on the Intel IOMMU virtualization support.
Makefile
- some files in Documentation dir are actually sample code to build
ManagementStyle
- how to (attempt to) manage kernel hackers.
- It's not of interest for those who aren't touching the build system.
Makefile.sphinx
- It's not of interest for those who aren't touching the build system.
PCI/
- info related to PCI drivers.
RCU/
- directory with info on RCU (read-copy update).
SAK.txt
- info on Secure Attention Keys.
SM501.txt
- Silicon Motion SM501 multimedia companion chip
SecurityBugs
- procedure for reporting security bugs found in the kernel.
SubmitChecklist
- Linux kernel patch submission checklist.
SubmittingDrivers
- procedure to get a new driver source included into the kernel tree.
SubmittingPatches
- procedure to get a source patch included into the kernel tree.
VGA-softcursor.txt
- how to change your VGA cursor from a blinking underscore.
- nothing here, just a pointer to process/coding-style.rst.
accounting/
- documentation on accounting and taskstats.
acpi/
- info on ACPI-specific hooks in the kernel.
admin-guide/
- info related to Linux users and system admins.
aoe/
- description of AoE (ATA over Ethernet) along with config examples.
applying-patches.txt
- description of various trees and how to apply their patches.
arm/
- directory with info about Linux on the ARM architecture.
arm64/
- directory with info about Linux on the 64 bit ARM architecture.
assoc_array.txt
- generic associative array intro.
atomic_ops.txt
- semantics and behavior of atomic and bitmask operations.
auxdisplay/
- misc. LCD driver documentation (cfag12864b, ks0108).
backlight/
- directory with info on controlling backlights in flat panel displays
bad_memory.txt
- how to use kernel parameters to exclude bad RAM regions.
basic_profiling.txt
- basic instructions for those who want to profile the Linux kernel.
bcache.txt
- Block-layer cache on fast SSDs to improve slow (raid) I/O performance.
binfmt_misc.txt
- info on the kernel support for extra binary formats.
blackfin/
- directory with documentation for the Blackfin arch.
block/
- info on the Block I/O (BIO) layer.
blockdev/
- info on block devices & drivers
braille-console.txt
- info on how to use serial devices for Braille support.
bt8xxgpio.txt
- info on how to modify a bt8xx video card for GPIO usage.
btmrvl.txt
@@ -113,18 +88,24 @@ cachetlb.txt
- describes the cache/TLB flushing interfaces Linux uses.
cdrom/
- directory with information on the CD-ROM drivers that Linux has.
cgroups/
- cgroups features, including cpusets and memory controller.
cgroup-v1/
- cgroups v1 features, including cpusets and memory controller.
cgroup-v2.txt
- cgroups v2 features, including cpusets and memory controller.
circular-buffers.txt
- how to make use of the existing circular buffer infrastructure
clk.txt
- info on the common clock framework
coccinelle.txt
- info on how to get and use the Coccinelle code checking tool.
cma/
- Continuous Memory Area (CMA) debugfs interface.
conf.py
- It's not of interest for those who aren't touching the build system.
connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
core-api/
- documentation on kernel core components.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
@@ -149,42 +130,42 @@ debugging-via-ohci1394.txt
- how to use firewire like a hardware debugger memory reader.
dell_rbu.txt
- document demonstrating the use of the Dell Remote BIOS Update driver.
development-process/
- how to work with the mainline kernel development process.
dev-tools/
- directory with info on development tools for the kernel.
device-mapper/
- directory with info on Device Mapper.
devices.txt
- plain ASCII listing of all the nodes in /dev/ with major minor #'s.
dmaengine/
- the DMA engine and controller API guides.
devicetree/
- directory with info on device tree files used by OF/PowerPC/ARM
digsig.txt
- info on the Digital Signature Verification API
dma-buf-sharing.txt
- the DMA Buffer Sharing API Guide
docutils.conf
- nothing here. Just a configuration file for docutils.
dontdiff
- file containing a list of files that should never be diff'ed.
driver-api/
- the Linux driver implementer's API guide.
driver-model/
- directory with info about Linux driver model.
dvb/
- info on Linux Digital Video Broadcast (DVB) subsystem.
dynamic-debug-howto.txt
- how to use the dynamic debug (dyndbg) feature.
early-userspace/
- info about initramfs, klibc, and userspace early during boot.
edac.txt
- information on EDAC - Error Detection And Correction
efi-stub.txt
- How to use the EFI boot stub to bypass GRUB or elilo on EFI systems.
eisa.txt
- info on EISA bus support.
email-clients.txt
- info on how to use e-mail to send un-mangled (git) patches.
extcon/
- directory with porting guide for Android kernel switch driver.
isa.txt
- info on EISA bus support.
fault-injection/
- dir with docs about the fault injection capabilities infrastructure.
fb/
- directory with info on the frame buffer graphics abstraction layer.
features/
- status of feature implementation on different architectures.
filesystems/
- info on the vfs and the various filesystems that Linux supports.
firmware_class/
@@ -193,20 +174,22 @@ flexible-arrays.txt
- how to make use of flexible sized arrays in linux
fmc/
- information about the FMC bus abstraction
fpga/
- FPGA Manager Core.
frv/
- Fujitsu FR-V Linux documentation.
futex-requeue-pi.txt
- info on requeueing of tasks from a non-PI futex to a PI futex
gcov.txt
- use of GCC's coverage testing tool "gcov" with the Linux kernel
gcc-plugins.txt
- GCC plugin infrastructure.
gpio/
- gpio related documentation
gpu/
- directory with information on GPU driver developer's guide.
hid/
- directory with information on human interface devices
highuid.txt
- notes on the change from 16 bit to 32 bit user/group IDs.
hsi.txt
- HSI subsystem overview.
hwspinlock.txt
- hardware spinlock provides hardware assistance for synchronization
timers/
@@ -217,18 +200,18 @@ hwmon/
- directory with docs on various hardware monitoring drivers.
i2c/
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
i2o/
- directory with info about the Linux I2O subsystem.
x86/i386/
- directory with info about Linux on Intel 32 bit architecture.
ia64/
- directory with info about Linux on Intel 64 bit architecture.
ide/
- Information regarding the Enhanced IDE drive.
iio/
- info on industrial IIO configfs support.
index.rst
- main index for the documentation in ReST format.
infiniband/
- directory with documents concerning Linux InfiniBand support.
init.txt
- what to do when the kernel can't find the 1st process to run.
initrd.txt
- how to use the RAM disk as an initial/temporary root filesystem.
input/
- info on Linux input device support.
intel_txt.txt
@@ -247,28 +230,16 @@ isapnp.txt
- info on Linux ISA Plug & Play support.
isdn/
- directory with info on the Linux ISDN support, and supported cards.
java.txt
- info on the in-kernel binary support for Java(tm).
ja_JP/
- directory with Japanese translations of various documents
kbuild/
- directory with info about the kernel build process.
kernel-doc-nano-HOWTO.txt
- outdated info about kernel-doc documentation.
kdump/
- directory with mini HowTo on getting the crash dump code to work.
kernel-docs.txt
- listing of various WWW + books that document kernel internals.
kernel-documentation.rst
doc-guide/
- how to write and format reStructuredText kernel documentation
kernel-parameters.txt
- summary listing of command line / boot prompt args for the kernel.
kernel-per-CPU-kthreads.txt
- List of all per-CPU kthreads and how they introduce jitter.
kmemcheck.txt
- info on dynamic checker that detects uses of uninitialized memory.
kmemleak.txt
- info on how to make use of the kernel memory leak detection system
ko_KR/
- directory with Korean translations of various documents
kobject.txt
- info of the kobject infrastructure of the Linux kernel.
kprobes.txt
@@ -283,8 +254,8 @@ ldm.txt
- a brief description of LDM (Windows Dynamic Disks).
leds/
- directory with info about LED handling under Linux.
local_ops.txt
- semantics and behavior of local atomic operations.
livepatch/
- info on kernel live patching.
locking/
- directory with info about kernel locking primitives
lockup-watchdogs.txt
@@ -297,22 +268,24 @@ lzo.txt
- kernel LZO decompressor input formats
m68k/
- directory with info about Linux on Motorola 68k architecture.
magic-number.txt
- list of magic numbers used to mark/protect kernel data structures.
mailbox.txt
- How to write drivers for the common mailbox framework (IPC).
md.txt
- info on boot arguments for the multiple devices driver.
media-framework.txt
- info on media framework, its data structures, functions and usage.
md-cluster.txt
- info on shared-device RAID MD cluster.
media/
- info on media drivers: uAPI, kAPI and driver documentation.
memory-barriers.txt
- info on Linux kernel memory barriers.
memory-devices/
- directory with info on parts like the Texas Instruments EMIF driver
memory-hotplug.txt
- Hotpluggable memory support, how to use and current status.
men-chameleon-bus.txt
- info on MEN chameleon bus.
metag/
- directory with info about Linux on Meta architecture.
mic/
- Intel Many Integrated Core (MIC) architecture device driver.
mips/
- directory with info about Linux on MIPS architecture.
misc-devices/
@@ -321,12 +294,8 @@ mmc/
- directory with info about the MMC subsystem
mn10300/
- directory with info about the mn10300 architecture port
module-signing.txt
- Kernel module signing for increased security when loading modules.
mtd/
- directory with info about memory technology devices (flash)
mono.txt
- how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
namespaces/
- directory with various information about namespaces
netlabel/
@@ -335,30 +304,42 @@ networking/
- directory with info on various aspects of networking with Linux.
nfc/
- directory relating info about Near Field Communications support.
nios2/
- Linux on the Nios II architecture.
nommu-mmap.txt
- documentation about no-mmu memory mapping support.
numastat.txt
- info on how to read Numa policy hit/miss statistics in sysfs.
oops-tracing.txt
- how to decode those nasty internal kernel error dump messages.
ntb.txt
- info on Non-Transparent Bridge (NTB) drivers.
nvdimm/
- info on non-volatile devices.
nvmem/
- info on non volatile memory framework.
output/
- default directory where html/LaTeX/pdf files will be written.
padata.txt
- An introduction to the "padata" parallel execution API
parisc/
- directory with info on using Linux on PA-RISC architecture.
parport.txt
- how to use the parallel-port driver.
parport-lowlevel.txt
- description and usage of the low level parallel port functions.
pcmcia/
- info on the Linux PCMCIA driver.
percpu-rw-semaphore.txt
- RCU based read-write semaphore optimized for locking for reading
perf/
- info about the APM X-Gene SoC Performance Monitoring Unit (PMU).
phy/
- info on Samsung USB 2.0 PHY adaptation layer.
phy.txt
- Description of the generic PHY framework.
pi-futex.txt
- documentation on lightweight priority inheritance futexes.
pinctrl.txt
- info on pinctrl subsystem and the PINMUX/PINCONF and drivers
platform/
- List of supported hardware by compal and Dell laptop.
pnp.txt
- Linux Plug and Play documentation.
power/
@@ -371,14 +352,16 @@ preempt-locking.txt
- info on locking under a preemptive kernel.
printk-formats.txt
- how to get printk format specifiers right
process/
- how to work with the mainline kernel development process.
pps/
- directory with information on the pulse-per-second support
pti/
- directory with info on Intel MID PTI.
ptp/
- directory with info on support for IEEE 1588 PTP clocks in Linux.
pwm.txt
- info on the pulse width modulation driver subsystem
ramoops.txt
- documentation of the ramoops oops/panic logging module.
rapidio/
- directory with info on RapidIO packet-based fabric interconnect
rbtree.txt
@@ -405,8 +388,6 @@ security/
- directory that contains security-related info
serial/
- directory with info on the low level serial API.
serial-console.txt
- how to set up Linux with a serial line console as the default.
sgi-ioc4.txt
- description of the SGI IOC4 PCI (multi function) device.
sh/
@@ -415,24 +396,20 @@ smsc_ece1099.txt
- info on the smsc Keyboard Scan Expansion/GPIO Expansion device.
sound/
- directory with info on sound card support.
sparse.txt
- info on how to obtain and use the sparse tool for typechecking.
spi/
- overview of Linux kernel Serial Peripheral Interface (SPI) support.
stable_api_nonsense.txt
- info on why the kernel does not have a stable in-kernel api or abi.
stable_kernel_rules.txt
- rules and procedures for the -stable kernel releases.
sphinx/
- no documentation here, just files required by Sphinx toolchain.
sphinx-static/
- no documentation here, just files required by Sphinx toolchain.
static-keys.txt
- info on how static keys allow debug code in hotpaths via patching
svga.txt
- short guide on selecting video modes at boot via VGA BIOS.
sysfs-rules.txt
- How not to use sysfs.
sync_file.txt
- Sync file API guide.
sysctl/
- directory with info on the /proc/sys/* files.
sysrq.txt
- info on the magic SysRq key.
target/
- directory with info on generating TCM v4 fabric .ko modules
this_cpu_ops.txt
@@ -441,39 +418,29 @@ thermal/
- directory with information on managing thermal issues (CPU/temp)
trace/
- directory with info on tracing technologies within linux
translations/
- translations of this document from English to another language
unaligned-memory-access.txt
- info on how to avoid arch breaking unaligned memory access in code.
unicode.txt
- info on the Unicode character/font mapping used in Linux.
unshare.txt
- description of the Linux unshare system call.
usb/
- directory with info regarding the Universal Serial Bus.
vDSO/
- directory with info regarding virtual dynamic shared objects
vfio.txt
- info on Virtual Function I/O used in guest/hypervisor instances.
vgaarbiter.txt
- info on enable/disable the legacy decoding on different VGA devices
video-output.txt
- sysfs class driver interface to enable/disable a video output device.
video4linux/
- directory with info regarding video/TV/radio cards and linux.
virtual/
- directory with information on the various linux virtualizations.
vm/
- directory with info on the Linux vm code.
vme_api.txt
- file relating info on the VME bus API in linux
volatile-considered-harmful.txt
- Why the "volatile" type class should not be used
w1/
- directory with documents regarding the 1-wire (w1) subsystem.
watchdog/
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
wimax/
- directory with info about Intel Wireless Wimax Connections
workqueue.txt
core-api/workqueue.rst
- information on the Concurrency Managed Workqueue implementation
x86/x86_64/
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
@@ -483,7 +450,5 @@ xtensa/
- directory with documents relating to arch/xtensa port/implementation
xz.txt
- how to make use of the XZ data compression within linux kernel
zh_CN/
- directory with Chinese translations of various documents
zorro.txt
- info on writing drivers for Zorro bus devices found on Amigas.

@@ -84,4 +84,4 @@ stable:
- Kernel-internal symbols. Do not rely on the presence, absence, location, or
type of any kernel symbol, either in System.map files or the kernel binary
itself. See Documentation/stable_api_nonsense.txt.
itself. See Documentation/process/stable-api-nonsense.rst.

@@ -8,3 +8,17 @@ Description:
Any device associated with a device-tree node will have
an of_path symlink pointing to the corresponding device
node in /sys/firmware/devicetree/
What: /sys/devices/*/devspec
Date: October 2016
Contact: Device Tree mailing list <devicetree@vger.kernel.org>
Description:
If CONFIG_OF is enabled, then this file is present. When
read, it returns the full name of the device node.
What: /sys/devices/*/obppath
Date: October 2016
Contact: Device Tree mailing list <devicetree@vger.kernel.org>
Description:
If CONFIG_OF is enabled, then this file is present. When
read, it returns the full name of the device node.

@@ -235,3 +235,45 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
Date: November 2016
Contact: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Description:
Some devices support a write zeroes operation, in which a
single request can be issued to zero out a range of
contiguous blocks on storage without carrying any payload
in the request. This can be used to optimize writing zeroes
to the devices. write_zeroes_max_bytes indicates how many
bytes can be written in a single write zeroes command. If
write_zeroes_max_bytes is 0, write zeroes is not supported
by the device.
What: /sys/block/<disk>/queue/zoned
Date: September 2016
Contact: Damien Le Moal <damien.lemoal@hgst.com>
Description:
zoned indicates if the device is a zoned block device
and the zone model of the device if it is indeed zoned.
The possible values indicated by zoned are "none" for
regular block devices and "host-aware" or "host-managed"
for zoned block devices. The characteristics of
host-aware and host-managed zoned block devices are
described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards
also define the "drive-managed" zone model. However,
since drive-managed zoned block devices do not support
zone commands, they will be treated as regular block
devices and zoned will report "none".
What: /sys/block/<disk>/queue/chunk_sectors
Date: September 2016
Contact: Hannes Reinecke <hare@suse.com>
Description:
chunk_sectors has a different meaning depending on the type
of the disk. For a RAID device (dm-raid), chunk_sectors
indicates the size in 512B sectors of the RAID volume
stripe segment. For a zoned block device, either
host-aware or host-managed, chunk_sectors indicates the
size in 512B sectors of the zones of the device, with
the possible exception of the last zone of the device,
which may be smaller.
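
The three queue attributes described above can be read together from userspace. The following is a minimal sketch, not part of the commit: the function name `read_queue_attrs`, the returned dictionary keys, and the fallback values for missing files are assumptions made for illustration. Only the attribute names and their meanings come from the text above.

```python
from pathlib import Path


def read_queue_attrs(queue_dir):
    """Summarize zoned, write_zeroes_max_bytes and chunk_sectors for one disk.

    queue_dir is a directory such as /sys/block/<disk>/queue.
    """
    queue = Path(queue_dir)

    def read(name, default):
        path = queue / name
        # Assumption: a kernel without the attribute is treated like the
        # "not supported" value documented for that attribute.
        return path.read_text().strip() if path.exists() else default

    zoned = read("zoned", "none")
    write_zeroes_max = int(read("write_zeroes_max_bytes", "0"))
    chunk_sectors = int(read("chunk_sectors", "0"))
    return {
        "zoned": zoned,
        # Drive-managed devices report "none", so only these two count.
        "is_zoned": zoned in ("host-aware", "host-managed"),
        # A value of 0 means write zeroes is not supported.
        "write_zeroes_supported": write_zeroes_max > 0,
        "write_zeroes_max_bytes": write_zeroes_max,
        # chunk_sectors is expressed in 512-byte sectors.
        "chunk_bytes": chunk_sectors * 512,
    }
```

A caller would pass the queue directory of a real disk, for example `read_queue_attrs("/sys/block/sda/queue")`, where `sda` is only a placeholder device name.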

@@ -0,0 +1,21 @@
What: /sys/bus/fsl-mc/drivers/.../bind
Date: December 2016
Contact: stuart.yoder@nxp.com
Description:
Writing a device location to this file will cause
the driver to attempt to bind to the device found at
this location. The format for the location is Object.Id
and is the same as found in /sys/bus/fsl-mc/devices/.
For example:
# echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/bind
What: /sys/bus/fsl-mc/drivers/.../unbind
Date: December 2016
Contact: stuart.yoder@nxp.com
Description:
Writing a device location to this file will cause the
driver to attempt to unbind from the device found at
this location. The format for the location is Object.Id
and is the same as found in /sys/bus/fsl-mc/devices/.
For example:
# echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/unbind
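
The two `echo` examples above can also be issued from a script. This is a hedged sketch only: the helper name `write_location` and the `bus_root` parameter are hypothetical, and the regular expression is an assumption about what "Object.Id" looks like (a lowercase object type, a dot, a decimal id), inferred from the single example `dpni.2`.

```python
import re
from pathlib import Path

# Assumption: "Object.Id" is a lowercase object type, a dot and a decimal id.
LOCATION_RE = re.compile(r"[a-z]+\.[0-9]+")


def write_location(location, driver, action, bus_root="/sys/bus/fsl-mc"):
    """Write a device location to a driver's bind or unbind file."""
    if action not in ("bind", "unbind"):
        raise ValueError("action must be 'bind' or 'unbind'")
    if not LOCATION_RE.fullmatch(location):
        raise ValueError(f"not an Object.Id location: {location!r}")
    target = Path(bus_root) / "drivers" / driver / action
    # Equivalent to: echo <location> > <bus_root>/drivers/<driver>/<action>
    target.write_text(location)
    return target
```

For instance, `write_location("dpni.2", "fsl_dpaa2_eth", "bind")` mirrors the first shell example; it needs the same privileges as the `echo` command.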

@@ -329,6 +329,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_pressure_scale
What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_scale
What: /sys/bus/iio/devices/iio:deviceX/in_velocity_sqrt(x^2+y^2+z^2)_scale
What: /sys/bus/iio/devices/iio:deviceX/in_illuminance_scale
What: /sys/bus/iio/devices/iio:deviceX/in_countY_scale
KernelVersion: 2.6.35
Contact: linux-iio@vger.kernel.org
Description:
@@ -1579,3 +1580,20 @@ Contact: linux-iio@vger.kernel.org
Description:
Raw (unscaled no offset etc.) electric conductivity reading that
can be processed to siemens per meter.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_raw
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Raw counter device counts from channel Y. For quadrature
counters, multiplication by an available [Y]_scale results in
the counts of a single quadrature signal phase from channel Y.
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_raw
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Raw counter device index value from channel Y. This attribute
provides an absolute positional reference (e.g. a pulse once per
revolution) which may be used to home positional systems as
required.

@@ -0,0 +1,36 @@
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_invert
Date: October 2016
KernelVersion: 4.9
Contact: Peter Rosin <peda@axentia.se>
Description:
The DAC is used to find the peak level of an alternating
voltage input signal by a binary search using the output
of a comparator wired to an interrupt pin. Like so:
_
| \
input +------>-------|+ \
| \
.-------. | }---.
| | | / |
| dac|-->--|- / |
| | |_/ |
| | |
| | |
| irq|------<-------'
| |
'-------'
The boolean invert attribute (0/1) should be set when the
input signal is centered around the maximum value of the
dac instead of zero. The envelope detector will search
from below in this case and will also invert the result.
The edge/level of the interrupt is also switched to its
opposite value.
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_compare_interval
Date: October 2016
KernelVersion: 4.9
Contact: Peter Rosin <peda@axentia.se>
Description:
Number of milliseconds to wait for the comparator in each
step of the binary search for the input peak level. Needs
to relate to the frequency of the input signal.
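
The binary search described above can be illustrated in isolation. This is a sketch under stated assumptions, not the driver's code: `input_exceeds` is a hypothetical callable standing in for "set the DAC to this code, wait compare_interval milliseconds, and report whether the comparator interrupt fired", and the 8-bit default DAC width is chosen only for the example.

```python
def find_peak(input_exceeds, dac_bits=8):
    """Return the largest DAC code that the input signal still reaches.

    input_exceeds(code) must return True when the input peak is at or
    above the level produced by DAC code `code`.
    """
    low, high = 0, (1 << dac_bits) - 1
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if input_exceeds(mid):
            low = mid        # the peak is at or above mid
        else:
            high = mid - 1   # the peak is below mid
    return low
```

Each step of the loop corresponds to one comparator wait, so an n-bit DAC needs about n waits per measurement. That is why the interval has to be long enough for at least one peak of the input signal to occur.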

@@ -0,0 +1,125 @@
What: /sys/bus/iio/devices/iio:deviceX/in_count_count_direction_available
What: /sys/bus/iio/devices/iio:deviceX/in_count_count_mode_available
What: /sys/bus/iio/devices/iio:deviceX/in_count_noise_error_available
What: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
What: /sys/bus/iio/devices/iio:deviceX/in_index_index_polarity_available
What: /sys/bus/iio/devices/iio:deviceX/in_index_synchronous_mode_available
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Discrete set of available values for the respective counter
configuration are listed in this file.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_direction
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates whether the counter for
channel Y is counting up or down.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_mode
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Count mode for channel Y. Four count modes are available:
normal, range limit, non-recycle, and modulo-n. The preset value
for channel Y is used by the count mode where required.
Normal:
Counting is continuous in either direction.
Range Limit:
An upper or lower limit is set, mimicking limit switches
in the mechanical counterpart. The upper limit is set to
the preset value, while the lower limit is set to 0. The
counter freezes at count = preset when counting up, and
at count = 0 when counting down. At either of these
limits, the counting is resumed only when the count
direction is reversed.
Non-recycle:
Counter is disabled whenever a 24-bit count overflow or
underflow takes place. The counter is re-enabled when a
new count value is loaded to the counter via a preset
operation or write to raw.
Modulo-N:
A count boundary is set between 0 and the preset value.
The counter is reset to 0 at count = preset when
counting up, while the counter is set to the preset
value at count = 0 when counting down; the counter does
not freeze at the boundary points, but counts
continuously throughout.
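
The four count modes can be compared with a small software model. This is an illustrative sketch, not the hardware or driver behaviour: the function `step`, the mode strings, and the choice to let the count wrap when the non-recycle mode disables the counter are all assumptions. The limits follow one reading of the text above, in which the boundary action happens on the count after the boundary value is reached.

```python
MAX_COUNT = (1 << 24) - 1  # the text above refers to a 24-bit count


def step(count, direction, mode, preset=0):
    """Advance the model by one count.

    direction is +1 (up) or -1 (down). Returns (new_count, enabled).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    up = direction == 1
    if mode == "range limit":
        # Freeze at the preset going up and at 0 going down.
        if (up and count == preset) or (not up and count == 0):
            return count, True
    elif mode == "modulo-n":
        # Wrap between 0 and the preset without freezing.
        if up and count == preset:
            return 0, True
        if not up and count == 0:
            return preset, True
    elif mode == "non-recycle":
        # Overflow or underflow disables the counter until it is reloaded.
        if (up and count == MAX_COUNT) or (not up and count == 0):
            return (count + direction) & MAX_COUNT, False
    elif mode != "normal":
        raise ValueError(f"unknown mode: {mode!r}")
    # Normal counting, wrapping at 24 bits (an assumption of this model).
    return (count + direction) & MAX_COUNT, True
```

With a preset of 5, `step(5, 1, "range limit", 5)` keeps the count at 5, while `step(5, 1, "modulo-n", 5)` returns to 0.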
What: /sys/bus/iio/devices/iio:deviceX/in_countY_noise_error
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates whether excessive noise is
present at the channel Y count inputs in quadrature clock mode;
irrelevant in non-quadrature clock mode.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_preset
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
If the counter device supports preset registers, the preset
count for channel Y is provided by this attribute.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_quadrature_mode
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Configure channel Y counter for non-quadrature or quadrature
clock mode. Selecting non-quadrature clock mode will disable
synchronous load mode. In quadrature clock mode, the channel Y
scale attribute selects the encoder phase division (scale of 1
selects full-cycle, scale of 0.5 selects half-cycle, scale of
0.25 selects quarter-cycle) processed by the channel Y counter.
Non-quadrature:
The filter and decoder circuit are bypassed. Encoder A
input serves as the count input and B as the UP/DOWN
direction control input, with B = 1 selecting UP Count
mode and B = 0 selecting Down Count mode.
Quadrature:
Encoder A and B inputs are digitally filtered and
decoded for UP/DN clock.
What: /sys/bus/iio/devices/iio:deviceX/in_countY_set_to_preset_on_index
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Whether to set channel Y counter with channel Y preset value
when channel Y index input is active, or continuously count.
Valid attribute values are boolean.
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_index_polarity
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Active level of channel Y index input; irrelevant in
non-synchronous load mode.
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_synchronous_mode
KernelVersion: 4.9
Contact: linux-iio@vger.kernel.org
Description:
Configure channel Y counter for non-synchronous or synchronous
load mode. Synchronous load mode cannot be selected in
non-quadrature clock mode.
Non-synchronous:
A logic low level is the active level at this index
input. The index function (as enabled via
set_to_preset_on_index) is performed directly on the
active level of the index input.
Synchronous:
Intended for interfacing with encoder Index output in
quadrature clock mode. The active level is configured
via index_polarity. The index function (as enabled via
set_to_preset_on_index) is performed synchronously with
the quadrature clock on the active level of the index
input.


@@ -0,0 +1,18 @@
What: /sys/bus/iio/devices/iio:deviceX/calibrate
Date: July 2015
KernelVersion: 4.7
Contact: linux-iio@vger.kernel.org
Description:
Writing '1' will perform a FOC (Fast Online Calibration). The
corresponding calibration offsets can be read from *_calibbias
entries.
What: /sys/bus/iio/devices/iio:deviceX/location
Date: July 2015
KernelVersion: 4.7
Contact: linux-iio@vger.kernel.org
Description:
This attribute returns a string with the physical location where
the motion sensor is placed. For example, in a laptop a motion
sensor can be located on the base or on the lid. Current valid
values are 'base' and 'lid'.


@@ -0,0 +1,8 @@
What: /sys/bus/iio/devices/iio:deviceX/out_voltageY_raw_available
Date: October 2016
KernelVersion: 4.9
Contact: Peter Rosin <peda@axentia.se>
Description:
The range of available values represented as the minimum value,
the step and the maximum value, all enclosed in square brackets.
Example: [0 1 256]
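A userspace reader could parse the bracketed triple as sketched below. The helper names are hypothetical, and the sketch assumes all three fields are integers, as in the example above.

```python
def parse_available_range(text):
    """Parse '[min step max]' into a (minimum, step, maximum) tuple of ints."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a range enclosed in square brackets")
    fields = body[1:-1].split()
    if len(fields) != 3:
        raise ValueError("expected exactly three fields: min, step, max")
    minimum, step, maximum = (int(field) for field in fields)
    return minimum, step, maximum


def is_valid_raw(value, minimum, step, maximum):
    """Check that value lies in the range and on the step grid."""
    if step <= 0:
        raise ValueError("step must be positive")
    return minimum <= value <= maximum and (value - minimum) % step == 0
```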


@@ -0,0 +1,19 @@
What: /sys/bus/iio/devices/iio:deviceX/proximity_on_chip_ambient_infrared_suppression
Date: January 2011
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
From ISL29018 Data Sheet (FN6619.4, Oct 8, 2012) regarding the
infrared suppression:
Scheme 0, makes full n (4, 8, 12, 16) bits (unsigned) proximity
detection. The range of Scheme 0 proximity count is from 0 to
2^n. Logic 1 of this bit, Scheme 1, makes n-1 (3, 7, 11, 15)
bits (2's complementary) proximity_less_ambient detection. The
range of Scheme 1 proximity count is from -2^(n-1) to 2^(n-1).
The sign bit is extended for resolutions less than 16. While
Scheme 0 has wider dynamic range, Scheme 1 proximity detection
is less affected by the ambient IR noise variation.
0 Sensing IR from LED and ambient
1 Sensing IR from LED with ambient IR rejection
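The two count ranges quoted above can be written out as a small helper. This is only an illustration of the arithmetic in the data sheet excerpt: the function names are hypothetical, the bounds are taken literally from the text, and sign_extend shows the generic two's complement interpretation of an n-bit value rather than anything the driver is known to do.

```python
def proximity_range(scheme, n_bits):
    """Return (low, high) proximity count bounds as stated in the excerpt."""
    if n_bits not in (4, 8, 12, 16):
        raise ValueError("resolution must be 4, 8, 12 or 16 bits")
    if scheme == 0:
        # Full n-bit unsigned detection.
        return 0, 2 ** n_bits
    if scheme == 1:
        # n-1 bits plus sign, with ambient IR rejection.
        return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1)
    raise ValueError("scheme must be 0 or 1")


def sign_extend(raw, n_bits):
    """Interpret the low n_bits of raw as a two's complement number."""
    raw &= (1 << n_bits) - 1
    sign_bit = 1 << (n_bits - 1)
    return raw - (1 << n_bits) if raw & sign_bit else raw
```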


@@ -0,0 +1,20 @@
What: /sys/bus/iio/devices/device[n]/in_illuminance_calibrate
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
This property causes an internal calibration of the als gain trim
value which is later used in calculating illuminance in lux.
What: /sys/bus/iio/devices/device[n]/in_illuminance_lux_table
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
This property gets/sets the table of coefficients
used in calculating illuminance in lux.
What: /sys/bus/iio/devices/device[n]/in_illuminance_input_target
KernelVersion: 2.6.37
Contact: linux-iio@vger.kernel.org
Description:
This property is the known external illuminance (in lux).
It is used in the process of calibrating the device accuracy.


@@ -0,0 +1,8 @@
What: /sys/bus/iio/devices/iio:deviceX/out_resistance_raw_available
Date: October 2016
KernelVersion: 4.9
Contact: Peter Rosin <peda@axentia.se>
Description:
The range of available values represented as the minimum value,
the step and the maximum value, all enclosed in square brackets.
Example: [0 1 256]


@@ -294,3 +294,10 @@ Description:
a firmware bug to the system vendor. Writing to this file
taints the kernel with TAINT_FIRMWARE_WORKAROUND, which
reduces the supportability of your system.
What: /sys/bus/pci/devices/.../revision
Date: November 2016
Contact: Emil Velikov <emil.l.velikov@gmail.com>
Description:
This file contains the revision field of the PCI device.
The value comes from device config space. The file is read only.


@@ -6,7 +6,7 @@ Description:
Being used for adding and removing rbd block devices.
Usage: <mon ip addr> <options> <pool name> <rbd image name> [snap name]
Usage: <mon ip addr> <options> <pool name> <rbd image name> [<snap name>]
$ echo "192.168.0.1 name=admin rbd foo" > /sys/bus/rbd/add
@@ -14,9 +14,13 @@ The snapshot name can be "-" or omitted to map the image read/write. A <dev-id>
will be assigned for any registered block device. If snapshot is used, it will
be mapped read-only.
Removal of a device:
Usage: <dev-id> [force]
$ echo <dev-id> > /sys/bus/rbd/remove
$ echo 2 > /sys/bus/rbd/remove
Optional "force" argument which when passed will wait for running requests and
then unmap the image. Requests sent to the driver after initiating the removal
will be failed. (August 2016, since 4.9.)
What: /sys/bus/rbd/add_single_major
Date: December 2013
@@ -43,10 +47,25 @@ Description: Available only if rbd module is inserted with single_major
Entries under /sys/bus/rbd/devices/<dev-id>/
--------------------------------------------
client_addr
The ceph unique client entity_addr_t (address + nonce).
The format is <address>:<port>/<nonce>: '1.2.3.4:1234/5678' or
'[1:2:3:4:5:6:7:8]:1234/5678'. (August 2016, since 4.9.)
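The client_addr format above can be split into its parts as sketched here. The function name is hypothetical, and the sketch assumes the port and nonce are decimal integers, as in the two examples.

```python
def parse_client_addr(text):
    """Split '<address>:<port>/<nonce>' into (address, port, nonce)."""
    addr_port, slash, nonce = text.strip().rpartition("/")
    if not slash:
        raise ValueError("missing '/<nonce>' suffix")
    if addr_port.startswith("["):
        # IPv6 form: the address is enclosed in square brackets.
        end = addr_port.find("]")
        if end == -1 or addr_port[end + 1:end + 2] != ":":
            raise ValueError("malformed bracketed address")
        address = addr_port[1:end]
        port = addr_port[end + 2:]
    else:
        # IPv4 form: the last colon separates address and port.
        address, colon, port = addr_port.rpartition(":")
        if not colon:
            raise ValueError("missing ':<port>'")
    return address, int(port), int(nonce)
```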
client_id
The ceph unique client id that was assigned for this specific session.
cluster_fsid
The ceph cluster UUID. (August 2016, since 4.9.)
config_info
The string written into /sys/bus/rbd/add{,_single_major}. (August
2016, since 4.9.)
features
A hexadecimal encoding of the feature bits for this image.
@@ -92,6 +111,10 @@ current_snap
The current snapshot for which the device is mapped.
snap_id
The current snapshot's id. (August 2016, since 4.9.)
parent
Information identifying the chain of parent images in a layered rbd


@@ -0,0 +1,111 @@
What: /sys/.../<device>/mdev_supported_types/
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
This directory contains list of directories of currently
supported mediated device types and their details for
<device>. Supported type attributes are defined by the
vendor driver who registers with Mediated device framework.
Each supported type is a directory whose name is created
by adding the device driver string as a prefix to the
string provided by the vendor driver.
What: /sys/.../<device>/mdev_supported_types/<type-id>/
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
This directory gives details of supported type, like name,
description, available_instances, device_api etc.
'device_api' and 'available_instances' are mandatory
attributes to be provided by vendor driver. 'name',
'description' and other vendor driver specific attributes
are optional.
What: /sys/.../mdev_supported_types/<type-id>/create
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Writing UUID to this file will create mediated device of
type <type-id> for parent device <device>. This is a
write-only file.
For example:
# echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
/sys/devices/foo/mdev_supported_types/foo-1/create
What: /sys/.../mdev_supported_types/<type-id>/devices/
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
This directory contains symbolic links pointing to mdev
devices sysfs entries which are created of this <type-id>.
What: /sys/.../mdev_supported_types/<type-id>/available_instances
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Reading this attribute will show the number of mediated
devices of type <type-id> that can be created. This is a
readonly file.
Users:
Userspace applications interested in creating mediated
device of that type. Userspace application should check
the number of available instances that could be created before
creating mediated device of this type.
What: /sys/.../mdev_supported_types/<type-id>/device_api
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Reading this attribute will show VFIO device API supported
by this type. For example, "vfio-pci" for a PCI device,
"vfio-platform" for platform device.
What: /sys/.../mdev_supported_types/<type-id>/name
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Reading this attribute will show human readable name of the
mediated device that will get created of type <type-id>.
This is optional attribute. For example: "Grid M60-0Q"
Users:
Userspace applications interested in knowing the name of
a particular <type-id> that can help in understanding the
type of mediated device.
What: /sys/.../mdev_supported_types/<type-id>/description
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Reading this attribute will show description of the type of
mediated device that will get created of type <type-id>.
This is optional attribute. For example:
"2 heads, 512M FB, 2560x1600 maximum resolution"
Users:
Userspace applications interested in knowing the details of
a particular <type-id> that can help in understanding the
features provided by that type of mediated device.
What: /sys/.../<device>/<UUID>/
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
This directory represents device directory of mediated
device. It contains all the attributes related to mediated
device.
What: /sys/.../<device>/<UUID>/mdev_type
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
This is symbolic link pointing to supported type, <type-id>
directory of which this mediated device is created.
What: /sys/.../<device>/<UUID>/remove
Date: October 2016
Contact: Kirti Wankhede <kwankhede@nvidia.com>
Description:
Writing '1' to this file destroys the mediated device. The
vendor driver can fail the remove() callback if that device
is active and the vendor driver doesn't support hot unplug.
Example:
# echo 1 > /sys/bus/mdev/devices/<UUID>/remove
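The create flow described in the entries above (check available_instances, then write a UUID to create) might look like this from userspace. It is a sketch under assumptions: create_mdev is a hypothetical helper, the parent path is supplied by the caller, and error handling for the write itself is left out.

```python
import uuid
from pathlib import Path


def create_mdev(parent, type_id, dev_uuid):
    """Create a mediated device of type_id under the parent device directory."""
    type_dir = Path(parent) / "mdev_supported_types" / type_id
    # available_instances is read-only and reports how many more can be created.
    available = int((type_dir / "available_instances").read_text().strip())
    if available < 1:
        raise RuntimeError("no instances of type %s available" % type_id)
    # Reject strings that are not well-formed UUIDs before touching sysfs.
    uuid.UUID(dev_uuid)
    # create is write-only; writing the UUID requests the new device.
    (type_dir / "create").write_text(dev_uuid)
```

For example, `create_mdev("/sys/devices/foo", "foo-1", "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001")` mirrors the echo command shown in the create entry.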


@@ -220,8 +220,11 @@ What: /sys/class/cxl/<card>/reset
Date: October 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: write only
Writing 1 will issue a PERST to card which may cause the card
to reload the FPGA depending on load_image_on_perst.
Writing 1 will issue a PERST to card provided there are no
contexts active on any one of the card AFUs. This may cause
the card to reload the FPGA depending on load_image_on_perst.
Writing -1 will do a force PERST irrespective of any active
contexts on the card AFUs.
Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/perst_reloads_same_image (not in a guest)


@@ -0,0 +1,11 @@
What: /sys/class/fpga_bridge/<bridge>/name
Date: January 2016
KernelVersion: 4.5
Contact: Alan Tull <atull@opensource.altera.com>
Description: Name of low level FPGA bridge driver.
What: /sys/class/fpga_bridge/<bridge>/state
Date: January 2016
KernelVersion: 4.5
Contact: Alan Tull <atull@opensource.altera.com>
Description: Show bridge state as "enabled" or "disabled"


@@ -4,16 +4,24 @@ KernelVersion: 2.6.17
Contact: Richard Purdie <rpurdie@rpsys.net>
Description:
Set the brightness of the LED. Most LEDs don't
have hardware brightness support so will just be turned on for
have hardware brightness support, so will just be turned on for
non-zero brightness settings. The value is between 0 and
/sys/class/leds/<led>/max_brightness.
Writing 0 to this file clears active trigger.
Writing non-zero to this file while trigger is active changes the
top brightness trigger is going to use.
What: /sys/class/leds/<led>/max_brightness
Date: March 2006
KernelVersion: 2.6.17
Contact: Richard Purdie <rpurdie@rpsys.net>
Description:
Maximum brightness level for this led, default is 255 (LED_FULL).
Maximum brightness level for this LED, default is 255 (LED_FULL).
If the LED does not support different brightness levels, this
should be 1.
What: /sys/class/leds/<led>/trigger
Date: March 2006
@@ -21,10 +29,11 @@ KernelVersion: 2.6.17
Contact: Richard Purdie <rpurdie@rpsys.net>
Description:
Set the trigger for this LED. A trigger is a kernel based source
of led events.
of LED events.
You can change triggers in a similar manner to the way an IO
scheduler is chosen. Trigger specific parameters can appear in
/sys/class/leds/<led> once a given trigger is selected.
/sys/class/leds/<led> once a given trigger is selected. For
their documentation see sysfs-class-led-trigger-*.
What: /sys/class/leds/<led>/inverted
Date: January 2011


@@ -0,0 +1,36 @@
What: /sys/class/leds/<led>/delay_on
Date: Jun 2012
KernelVersion: 3.6
Contact: linux-leds@vger.kernel.org
Description:
Specifies for how many milliseconds the LED has to stay at
LED_FULL brightness after it has been armed.
Defaults to 100 ms.
What: /sys/class/leds/<led>/delay_off
Date: Jun 2012
KernelVersion: 3.6
Contact: linux-leds@vger.kernel.org
Description:
Specifies for how many milliseconds the LED has to stay at
LED_OFF brightness after it has been armed.
Defaults to 100 ms.
What: /sys/class/leds/<led>/invert
Date: Jun 2012
KernelVersion: 3.6
Contact: linux-leds@vger.kernel.org
Description:
Reverse the blink logic. If set to 0 (default) blink on for
delay_on ms, then blink off for delay_off ms, leaving the LED
normally off. If set to 1, blink off for delay_off ms, then
blink on for delay_on ms, leaving the LED normally on.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/shot
Date: Jun 2012
KernelVersion: 3.6
Contact: linux-leds@vger.kernel.org
Description:
Write any non-empty string to signal an event; this starts a
blink sequence if not already running.


@@ -0,0 +1,12 @@
What: /sys/class/leds/<led>/ports/<port>
Date: September 2016
KernelVersion: 4.9
Contact: linux-leds@vger.kernel.org
linux-usb@vger.kernel.org
Description:
Every dir entry represents a single USB port that can be
selected for the USB port trigger. Selecting ports makes the trigger
observe them for any connected devices and light the LED if
there are any.
Echoing "1" selects a USB port. Echoing "0" unselects it.
The current state can also be read.


@@ -29,3 +29,19 @@ Description: Display fw status registers content
Also number of registers varies between 1 and 6
depending on generation.
What: /sys/class/mei/meiN/hbm_ver
Date: Aug 2016
KernelVersion: 4.9
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display the negotiated HBM protocol version.
The HBM protocol version negotiated
between the driver and the device.
What: /sys/class/mei/meiN/hbm_ver_drv
Date: Aug 2016
KernelVersion: 4.9
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display the driver HBM protocol version.
The HBM protocol version supported by the driver.


@@ -153,7 +153,7 @@ Description:
What: /sys/class/mic/mic(x)/heartbeat_enable
Date: March 2015
KernelVersion: 3.20
KernelVersion: 4.4
Contact: Ashutosh Dixit <ashutosh.dixit@intel.com>
Description:
The MIC drivers detect and inform user space about card crashes


@@ -22,7 +22,7 @@ Description:
What: /sys/class/power_supply/max14577-charger/device/fast_charge_timer
Date: October 2014
KernelVersion: 3.18.0
Contact: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Contact: Krzysztof Kozlowski <krzk@kernel.org>
Description:
This entry shows and sets the maximum time the max14577
charger operates in fast-charge mode. When the timer expires
@@ -36,7 +36,7 @@ Description:
What: /sys/class/power_supply/max77693-charger/device/fast_charge_timer
Date: January 2015
KernelVersion: 3.19.0
Contact: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Contact: Krzysztof Kozlowski <krzk@kernel.org>
Description:
This entry shows and sets the maximum time the max77693
charger operates in fast-charge mode. When the timer expires
@@ -50,7 +50,7 @@ Description:
What: /sys/class/power_supply/max77693-charger/device/top_off_threshold_current
Date: January 2015
KernelVersion: 3.19.0
Contact: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Contact: Krzysztof Kozlowski <krzk@kernel.org>
Description:
This entry shows and sets the charging current threshold for
entering top-off charging mode. When charging current in fast
@@ -65,7 +65,7 @@ Description:
What: /sys/class/power_supply/max77693-charger/device/top_off_timer
Date: January 2015
KernelVersion: 3.19.0
Contact: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Contact: Krzysztof Kozlowski <krzk@kernel.org>
Description:
This entry shows and sets the maximum time the max77693
charger operates in top-off charge mode. When the timer expires


@@ -0,0 +1,50 @@
What: /sys/class/remoteproc/.../firmware
Date: October 2016
Contact: Matt Redfearn <matt.redfearn@imgtec.com>
Description: Remote processor firmware
Reports the name of the firmware currently loaded to the
remote processor.
To change the running firmware, ensure the remote processor is
stopped (using /sys/class/remoteproc/.../state) and write a new filename.
What: /sys/class/remoteproc/.../state
Date: October 2016
Contact: Matt Redfearn <matt.redfearn@imgtec.com>
Description: Remote processor state
Reports the state of the remote processor, which will be one of:
"offline"
"suspended"
"running"
"crashed"
"invalid"
"offline" means the remote processor is powered off.
"suspended" means that the remote processor is suspended and
must be woken to receive messages.
"running" is the normal state of an available remote processor
"crashed" indicates that a problem/crash has been detected on
the remote processor.
"invalid" is returned if the remote processor is in an
unknown state.
Writing this file controls the state of the remote processor.
The following states can be written:
"start"
"stop"
Writing "start" will attempt to start the processor running the
firmware indicated by, or written to,
/sys/class/remoteproc/.../firmware. The remote processor should
transition to "running" state.
Writing "stop" will attempt to halt the remote processor and
return it to the "offline" state.
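The firmware change sequence described above (processor stopped, new filename written, then "start") can be sketched as follows. load_and_start is a hypothetical helper, the instance directory is passed in by the caller, and the sketch refuses to proceed unless the state reads "offline" rather than stopping the processor itself.

```python
from pathlib import Path


def load_and_start(rproc_dir, firmware_name):
    """Select a new firmware for a stopped remote processor and start it."""
    rproc = Path(rproc_dir)
    state = (rproc / "state").read_text().strip()
    if state != "offline":
        # The firmware may only be changed while the processor is stopped.
        raise RuntimeError("remote processor is '%s', stop it first" % state)
    (rproc / "firmware").write_text(firmware_name)
    # Writing "start" asks the kernel to boot the firmware named above.
    (rproc / "state").write_text("start")
```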


@@ -1,8 +1,9 @@
What: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock
What: /sys/class/rtc/rtc0/device/rtc_calibration
Date: Oct 2011
KernelVersion: 3.0
Contact: Mark Godfrey <mark.godfrey@stericsson.com>
Description: The rtc_calibration attribute allows the userspace to
Description: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock
The rtc_calibration attribute allows the userspace to
calibrate the AB8500's 32KHz Real Time Clock.
Every 60 seconds the AB8500 will correct the RTC's value
by adding to it the value of this attribute.


@@ -272,6 +272,22 @@ Description: Parameters for the CPU cache attributes
the modified cache line is written to main
memory only when it is replaced
What: /sys/devices/system/cpu/cpu*/cache/index*/id
Date: September 2016
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Cache id
The id provides a unique number for a specific instance of
a cache of a particular type. E.g. there may be a level
3 unified cache on each socket in a server and we may
assign them ids 0, 1, 2, ...
Note that id value can be non-contiguous. E.g. level 1
caches typically exist per core, but there may not be a
power of two cores on a socket, so these caches may be
numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat


@@ -1,4 +1,4 @@
What: state
What: /sys/devices/system/ibm_rtl/state
Date: Sep 2010
KernelVersion: 2.6.37
Contact: Vernon Mauery <vernux@us.ibm.com>
@@ -10,7 +10,7 @@ Description: The state file allows a means by which to change in and
Users: The ibm-prtm userspace daemon uses this interface.
What: version
What: /sys/devices/system/ibm_rtl/version
Date: Sep 2010
KernelVersion: 2.6.37
Contact: Vernon Mauery <vernux@us.ibm.com>


@@ -35,6 +35,12 @@ Description: Displays a set of alternate modes supported by a wheel. Each
DF-EX <*--------> G25 <-> G27
DF-EX <*----------------> G27
G29:
DF-EX <*> DFP <-> G25 <-> G27 <-> G29
DF-EX <*--------> G25 <-> G27 <-> G29
DF-EX <*----------------> G27 <-> G29
DF-EX <*------------------------> G29
DFGT:
DF-EX <*> DFP <-> DFGT
DF-EX <*--------> DFGT
@@ -50,3 +56,12 @@ Description: Displays the real model of the wheel regardless of any
alternate mode the wheel might be switched to.
It is a read-only value.
This entry is not created for devices that have only one mode.
What: /sys/bus/hid/drivers/logitech/<dev>/combine_pedals
Date: Sep 2016
KernelVersion: 4.9
Contact: Simon Wood <simon@mungewell.org>
Description: Controls whether a combined value of accelerator and brake is
reported on the Y axis of the controller. Useful for older games
which do not work with separate accelerator/brake axes.
Off ('0') by default, enabled by setting '1'.


@@ -24,6 +24,7 @@ What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_led/status0_luminance
Date: August 2014
Contact: linux-input@vger.kernel.org
Description:
<obsoleted by the LED class API now exported by the driver>
Writing to this file sets the status LED luminance (1..127)
when the stylus does not touch the tablet surface, and no
button is pressed on the stylus. This luminance level is
@@ -33,6 +34,7 @@ What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_led/status1_luminance
Date: August 2014
Contact: linux-input@vger.kernel.org
Description:
<obsoleted by the LED class API now exported by the driver>
Writing to this file sets the status LED luminance (1..127)
when the stylus touches the tablet surface, or any button is
pressed on the stylus.
@@ -41,6 +43,7 @@ What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_led/status_led0_select
Date: August 2014
Contact: linux-input@vger.kernel.org
Description:
<obsoleted by the LED class API now exported by the driver>
Writing to this file sets which one of the four (for Intuos 4
and Intuos 5) or of the right four (for Cintiq 21UX2 and Cintiq
24HD) status LEDs is active (0..3). The other three LEDs on the
@@ -50,6 +53,7 @@ What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_led/status_led1_select
Date: August 2014
Contact: linux-input@vger.kernel.org
Description:
<obsoleted by the LED class API now exported by the driver>
Writing to this file sets which one of the left four (for Cintiq 21UX2
and Cintiq 24HD) status LEDs is active (0..3). The other three LEDs on
the left are always inactive.
@@ -91,6 +95,7 @@ What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_remote/<serial_number>/r
Date: July 2015
Contact: linux-input@vger.kernel.org
Description:
<obsoleted by the LED class API now exported by the driver>
Reading from this file reports the mode status of the
remote as indicated by the LED lights on the device. If no
reports have been received from the paired device, reading


@@ -1,31 +0,0 @@
What: /sys/bus/i2c/devices/<busnum>-<devaddr>/pressure0_input
Date: June 2010
Contact: Christoph Mair <christoph.mair@gmail.com>
Description: Start a pressure measurement and read the result. Values
represent the ambient air pressure in pascal (0.01 millibar).
Reading: returns the current air pressure.
What: /sys/bus/i2c/devices/<busnum>-<devaddr>/temp0_input
Date: June 2010
Contact: Christoph Mair <christoph.mair@gmail.com>
Description: Measure the ambient temperature. The returned value represents
the ambient temperature in units of 0.1 degree celsius.
Reading: returns the current temperature.
What: /sys/bus/i2c/devices/<busnum>-<devaddr>/oversampling
Date: June 2010
Contact: Christoph Mair <christoph.mair@gmail.com>
Description: Tell the bmp085 to use more samples to calculate a pressure
value. When writing to this file the chip will use 2^x samples
to calculate the next pressure value with x being the value
written. Using this feature will decrease RMS noise and
increase the measurement time.
Reading: returns the current oversampling setting.
Writing: sets a new oversampling setting.
Accepted values: 0..3.


@@ -0,0 +1,53 @@
What: /sys/kernel/irq
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: Directory containing information about the system's IRQs.
Specifically, data from the associated struct irq_desc.
The information here is similar to that in /proc/interrupts
but in a more machine-friendly format. This directory contains
one subdirectory for each Linux IRQ number.
What: /sys/kernel/irq/<irq>/actions
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: The IRQ action chain. A comma-separated list of zero or more
device names associated with this interrupt.
What: /sys/kernel/irq/<irq>/chip_name
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: Human-readable chip name supplied by the associated device
driver.
What: /sys/kernel/irq/<irq>/hwirq
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: When interrupt translation domains are used, this file contains
the underlying hardware IRQ number used for this Linux IRQ.
What: /sys/kernel/irq/<irq>/name
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: Human-readable flow handler name as defined by the irq chip
driver.
What: /sys/kernel/irq/<irq>/per_cpu_count
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: The number of times the interrupt has fired since boot. This
is a comma-separated list of counters; one per CPU in CPU id
order. NOTE: This file consistently shows counters for all
CPU ids. This differs from the behavior of /proc/interrupts
which only shows counters for online CPUs.
What: /sys/kernel/irq/<irq>/type
Date: September 2016
KernelVersion: 4.9
Contact: Craig Gallek <kraig@google.com>
Description: The type of the interrupt. Either the string 'level' or 'edge'.
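The comma-separated formats used by the actions and per_cpu_count files are simple to consume. A minimal sketch with hypothetical helper names, assuming the file contents end with a newline as sysfs attributes usually do:

```python
def parse_per_cpu_count(text):
    """Return one integer counter per CPU id, in CPU id order."""
    return [int(field) for field in text.strip().split(",")]


def parse_actions(text):
    """Return the list of device names; an empty file means no actions."""
    stripped = text.strip()
    return stripped.split(",") if stripped else []
```

Summing the list from parse_per_cpu_count gives the total number of times the interrupt has fired across all CPU ids.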


@@ -347,7 +347,7 @@ Description:
because of fragmentation, SLUB will retry with the minimum order
possible depending on its characteristics.
When debug_guardpage_minorder=N (N > 0) parameter is specified
(see Documentation/kernel-parameters.txt), the minimum possible
(see Documentation/admin-guide/kernel-parameters.rst), the minimum possible
order is used and this sysfs entry can not be used to change
the order at run time.


@@ -0,0 +1,15 @@
What: /sys/devices/platform/<phy-name>/role
Date: October 2016
KernelVersion: 4.10
Contact: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Description:
This file can be read and written.
The file can show/change the phy mode for role swap of usb.
Write the following strings to change the mode:
"host" - switching mode from peripheral to host.
"peripheral" - switching mode from host to peripheral.
Reading the file shows one of the following strings:
"host" - The mode is host now.
"peripheral" - The mode is peripheral now.


@@ -0,0 +1,17 @@
What: /sys/devices/platform/8086%x:00/firmware_version
Date: November 2016
KernelVersion: 4.10
Contact: "Sebastien Guiriec" <sebastien.guiriec@intel.com>
Description:
LPE Firmware version for SST driver on all atom
platforms (BYT/CHT/Merrifield/BSW).
If the FW has never been loaded it will display:
"FW not yet loaded"
If FW has been loaded it will display:
"v01.aa.bb.cc"
aa: Major version is reflecting SoC version:
0d: BYT FW
0b: BSW FW
07: Merrifield FW
bb: Minor version
cc: Build version
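The version string layout above could be decoded as sketched here. The function and table names are hypothetical, the fields are kept as strings because the text does not say how they should be interpreted numerically, and any major value not listed above is reported as "unknown".

```python
SOC_BY_MAJOR = {"0d": "BYT", "0b": "BSW", "07": "Merrifield"}


def parse_lpe_fw_version(text):
    """Decode 'v01.aa.bb.cc'; return None if the firmware is not loaded yet."""
    value = text.strip()
    if value == "FW not yet loaded":
        return None
    fields = value.split(".")
    if len(fields) != 4 or fields[0] != "v01":
        raise ValueError("unexpected firmware version format: %r" % value)
    _, major, minor, build = fields
    return {
        "soc": SOC_BY_MAJOR.get(major.lower(), "unknown"),
        "major": major,
        "minor": minor,
        "build": build,
    }
```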


@@ -7,30 +7,35 @@ Description:
subsystem.
What: /sys/power/state
Date: May 2014
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/power/state file controls system sleep states.
Reading from this file returns the available sleep state
labels, which may be "mem", "standby", "freeze" and "disk"
(hibernation). The meanings of the first three labels depend on
the relative_sleep_states command line argument as follows:
1) relative_sleep_states = 1
"mem", "standby", "freeze" represent non-hibernation sleep
states from the deepest ("mem", always present) to the
shallowest ("freeze"). "standby" and "freeze" may or may
not be present depending on the capabilities of the
platform. "freeze" can only be present if "standby" is
present.
2) relative_sleep_states = 0 (default)
"mem" - "suspend-to-RAM", present if supported.
"standby" - "power-on suspend", present if supported.
"freeze" - "suspend-to-idle", always present.
labels, which may be "mem" (suspend), "standby" (power-on
suspend), "freeze" (suspend-to-idle) and "disk" (hibernation).
Writing to this file one of these strings causes the system to
transition into the corresponding state, if available. See
Documentation/power/states.txt for a description of what
"suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
Writing one of the above strings to this file causes the system
to transition into the corresponding state, if available.
See Documentation/power/states.txt for more information.
What: /sys/power/mem_sleep
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/power/mem_sleep file controls the operating mode of
system suspend. Reading from it returns the available modes
as "s2idle" (always present), "shallow" and "deep" (present if
supported). The mode that will be used on subsequent attempts
to suspend the system (by writing "mem" to the /sys/power/state
file described above) is enclosed in square brackets.
Writing one of the above strings to this file causes the mode
represented by it to be used on subsequent attempts to suspend
the system.
See Documentation/power/states.txt for more information.
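Reading /sys/power/mem_sleep yields a space-separated list with the active mode in square brackets. A minimal parsing sketch, with a hypothetical function name:

```python
def parse_mem_sleep(text):
    """Return (available_modes, selected_mode) from the mem_sleep contents."""
    modes = []
    selected = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            # The bracketed entry is the mode used on the next suspend.
            token = token[1:-1]
            selected = token
        modes.append(token)
    return modes, selected
```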
What: /sys/power/disk
Date: September 2006


@@ -1,246 +0,0 @@
Table of contents
=================
Last updated: 20 December 2005
Contents
========
- Introduction
- Devices not appearing
- Finding patch that caused a bug
-- Finding using git-bisect
-- Finding it the old way
- Fixing the bug
Introduction
============
Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.
Finding bugs is not always easy. Have a go though. If you can't find it don't
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.
Before you submit a bug report read REPORTING-BUGS.
Devices not appearing
=====================
Often this is caused by udev. Check that first before blaming it on the
kernel.
Finding patch that caused a bug
===============================
Finding using git-bisect
------------------------
Using the provided tools with git makes finding bugs easy provided the bug is
reproducible.
Steps to do it:
- start using git for the kernel source
- read the man page for git-bisect
- have fun
Finding it the old way
----------------------
[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.
You need:
. A reproducible bug - it has to happen predictably (sorry)
. All the kernel tar files from a revision that worked to the
revision that doesn't
You will then do:
. Rebuild a revision that you believe works, install, and verify that.
. Do a binary search over the kernels to figure out which one
introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
you know that 1.3.69 does. Pick a kernel in the middle and build
that, like 1.3.50. Build & test; if it works, pick the mid point
between .50 and .69, else the mid point between .28 and .50.
. You'll narrow it down to the kernel that introduced the bug. You
can probably do better than this but it gets tricky.
. Narrow it down to a subdirectory
- Copy kernel that works into "test". Let's say that 3.62 works,
but 3.63 doesn't. So you diff -r those two kernels and come
up with a list of directories that changed. For each of those
directories:
Copy the non-working directory next to the working directory
as "dir.63".
One directory at a time, try moving the working directory to
"dir.62" and the non-working directory into its place:
mv dir dir.62
mv dir.63 dir
find dir -name '*.[oa]' -print | xargs rm -f
And then rebuild and retest. Assuming that all related
changes were contained in the sub directory, this should
isolate the change to a directory.
Problems: changes in header files may have occurred; I've
found in my case that they were self explanatory - you may
or may not want to give up when that happens.
. Narrow it down to a file
- You can apply the same technique to each file in the directory,
hoping that the changes in that file are self contained.
. Narrow it down to a routine
- You can take the old file and the new file and manually create
a merged file that has
#ifdef VER62
routine()
{
...
}
#else
routine()
{
...
}
#endif
And then walk through that file, one routine at a time and
prefix it with
#define VER62
/* both routines here */
#undef VER62
Then recompile, retest, move the ifdefs until you find the one
that makes the difference.
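The binary search over kernel revisions described in the steps above can be sketched in C. This is only an illustration under stated assumptions: first_bad_patchlevel() and works_before_63() are made-up names, and the callback stands in for the manual "build, install, boot, test" cycle. It assumes every patch level up to "good" works and every level from the first bad one onwards fails; the exact midpoint chosen does not matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if the given patch level works. */
typedef bool (*works_fn)(int patchlevel);

/*
 * Stand-in for "build & test": pretend the bug appeared in patch
 * level 63.  In real life this is you rebooting and retesting.
 */
bool works_before_63(int patchlevel)
{
	return patchlevel < 63;
}

/*
 * Return the first patch level in (good, bad] that shows the bug.
 * "good" is known to work, "bad" is known to fail.
 */
int first_bad_patchlevel(int good, int bad, works_fn works)
{
	assert(good < bad);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (works(mid))
			good = mid;	/* bug was introduced later */
		else
			bad = mid;	/* bug was introduced here or earlier */
	}
	return bad;
}
```

With 1.3.28 as the working kernel and 1.3.69 as the broken one, the call would be first_bad_patchlevel(28, 69, works_before_63). Each round halves the range, so about six builds are enough for 41 candidate kernels.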
Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.
If you get it down to a routine, you'll probably get a fix in 24 hours.
My apologies to Linus and the other kernel hackers for describing this
brute force approach, it's hardly what a kernel hacker would do. However,
it does work and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor supplied releases.
Fixing the bug
==============
Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.
To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example:
objdump -r -S -l --disassemble net/dccp/ipv4.o
NB.: you need to be at the top level of the kernel tree for this to pick up
your C files.
If you don't have access to the code you can also debug on some crash dumps
e.g. crash dump output as shown by Dave Miller.
> EIP is at ip_queue_xmit+0x14/0x4c0
> ...
> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
>
> Put the bytes into a "foo.s" file like this:
>
> .text
> .globl foo
> foo:
> .byte .... /* bytes from Code: part of OOPS dump */
>
> Compile it with "gcc -c -o foo.o foo.s" then look at the output of
> "objdump --disassemble foo.o".
>
> Output:
>
> ip_queue_xmit:
> push %ebp
> push %edi
> push %esi
> push %ebx
> sub $0xbc, %esp
> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
In addition, you can use GDB to figure out the exact file and line
number of the OOPS from the vmlinux file. If you have
CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
OOPS:
EIP: 0060:[<c021e50e>] Not tainted VLI
And use GDB to translate that to human-readable form:
gdb vmlinux
(gdb) l *0xc021e50e
If you don't have CONFIG_DEBUG_INFO enabled, you use the function
offset from the OOPS:
EIP is at vt_ioctl+0xda8/0x1482
And recompile the kernel with CONFIG_DEBUG_INFO enabled:
make vmlinux
gdb vmlinux
(gdb) p vt_ioctl
(gdb) l *(0x<address of vt_ioctl> + 0xda8)
or, as one command
(gdb) l *(vt_ioctl + 0xda8)
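The "symbol+offset/size" notation in the EIP line can also be taken apart mechanically, which is handy when scripting the gdb step. A minimal userspace sketch follows; parse_oops_location() and struct oops_location are hypothetical names, not kernel or gdb interfaces, and the 63-character symbol limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>

struct oops_location {
	char symbol[64];	/* function name, e.g. "vt_ioctl" */
	unsigned long offset;	/* bytes into the function */
	unsigned long size;	/* total length of the function */
};

/*
 * Parse a string of the form "symbol+0xOFFSET/0xSIZE".
 * Returns 0 on success, -1 if the text does not match.
 */
int parse_oops_location(const char *text, struct oops_location *loc)
{
	assert(text && loc);

	/* %lx accepts an optional "0x" prefix, like strtoul() base 16 */
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   loc->symbol, &loc->offset, &loc->size) != 3)
		return -1;
	return 0;
}
```

For "vt_ioctl+0xda8/0x1482" this yields the symbol vt_ioctl, offset 0xda8 and size 0x1482; the symbol and offset are what goes into the "l *(vt_ioctl + 0xda8)" command.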
If you have a call trace, such as :-
>Call Trace:
> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
> ...
this shows the problem in the :jbd: module. You can load that module in gdb
and list the relevant code.
gdb fs/jbd/jbd.ko
(gdb) p log_wait_commit
(gdb) l *(0x<address> + 0xa3)
or
(gdb) l *(log_wait_commit + 0xa3)
Another very useful option of the Kernel Hacking section in menuconfig is
Debug memory allocations. This will help you see whether data has been
initialised and not set before use etc. To see the values that get assigned
with this look at mm/slab.c and search for POISON_INUSE. When using this an
Oops will often show the poisoned data instead of zero which is the default.
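A rough userspace sketch of recognising poisoned data in a dump. The two byte values are assumptions taken from include/linux/poison.h (0x5a for POISON_INUSE, 0x6b for POISON_FREE); check your own tree before relying on them. poison_pattern() is a made-up helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, see include/linux/poison.h in your tree. */
#define POISON_INUSE	0x5a	/* allocated but never initialised */
#define POISON_FREE	0x6b	/* already freed */

/*
 * Return POISON_INUSE or POISON_FREE if every byte of buf carries
 * that pattern, or 0 if the data does not look poisoned.
 */
int poison_pattern(const unsigned char *buf, size_t len)
{
	unsigned char first;
	size_t i;

	assert(buf && len > 0);

	first = buf[0];
	if (first != POISON_INUSE && first != POISON_FREE)
		return 0;

	for (i = 1; i < len; i++)
		if (buf[i] != first)
			return 0;

	return first;
}
```

A register or pointer value such as 0x6b6b6b6b in an Oops would then point at a use-after-free, while 0x5a5a5a5a suggests memory that was allocated but never filled in.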
Once you have worked out a fix please submit it upstream. After all open
source is about sharing what you do and don't you want to be recognised for
your genius?
Please do read Documentation/SubmittingPatches though to help your code get
accepted.


@@ -1,411 +0,0 @@
Intro
=====
This document is designed to provide a list of the minimum levels of
software necessary to run the 3.0 kernels.
This document is originally based on my "Changes" file for 2.0.x kernels
and therefore owes credit to the same people as that file (Jared Mauch,
Axel Boldt, Alessandro Sigala, and countless other users all over the
'net).
Current Minimal Requirements
============================
Upgrade to at *least* these software revisions before thinking you've
encountered a bug! If you're unsure what version you're currently
running, the suggested command should tell you.
Again, keep in mind that this list assumes you are already functionally
running a Linux kernel. Also, not all tools are necessary on all
systems; obviously, if you don't have any ISDN hardware, for example,
you probably needn't concern yourself with isdn4k-utils.
o GNU C 3.2 # gcc --version
o GNU make 3.80 # make --version
o binutils 2.12 # ld -v
o util-linux 2.10o # fdformat --version
o module-init-tools 0.9.10 # depmod -V
o e2fsprogs 1.41.4 # e2fsck -V
o jfsutils 1.1.3 # fsck.jfs -V
o reiserfsprogs 3.6.3 # reiserfsck -V
o xfsprogs 2.6.0 # xfs_db -V
o squashfs-tools 4.0 # mksquashfs -version
o btrfs-progs 0.18 # btrfsck
o pcmciautils 004 # pccardctl -V
o quota-tools 3.09 # quota -V
o PPP 2.4.0 # pppd --version
o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version
o nfs-utils 1.0.5 # showmount --version
o procps 3.2.0 # ps --version
o oprofile 0.9 # oprofiled --version
o udev 081 # udevd --version
o grub 0.93 # grub --version || grub-install --version
o mcelog 0.6 # mcelog --version
o iptables 1.4.2 # iptables -V
o openssl & libcrypto 1.0.0 # openssl version
o bc 1.06.95 # bc --version
Kernel compilation
==================
GCC
---
The gcc version requirements may vary depending on the type of CPU in your
computer.
Make
----
You will need GNU make 3.80 or later to build the kernel.
Binutils
--------
Linux on IA-32 has recently switched from using as86 to using gas for
assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
Perl
----
You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
File::Basename, and File::Find to build the kernel.
BC
--
You will need bc to build kernels 3.10 and higher.
OpenSSL
-------
Module signing and external certificate handling use the OpenSSL program and
crypto library to do key creation and signature generation.
You will need openssl to build kernels 3.7 and higher if module signing is
enabled. You will also need openssl development packages to build kernels 4.3
and higher.
System utilities
================
Architectural changes
---------------------
DevFS has been obsoleted in favour of udev
(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
32-bit UID support is now in place. Have fun!
Linux documentation for functions is transitioning to inline
documentation via specially-formatted comments near their
definitions in the source. These comments can be combined with the
SGML templates in the Documentation/DocBook directory to make DocBook
files, which can then be converted by DocBook stylesheets to PostScript,
HTML, PDF files, and several other formats. In order to convert from
DocBook format to a format of your choice, you'll need to install Jade as
well as the desired DocBook stylesheets.
Util-linux
----------
New versions of util-linux provide *fdisk support for larger disks,
support new options to mount, recognize more supported partition
types, have a fdformat which works with 2.4 kernels, and similar goodies.
You'll probably want to upgrade.
Ksymoops
--------
If the unthinkable happens and your kernel oopses, you may need the
ksymoops tool to decode it, but in most cases you don't.
It is generally preferred to build the kernel with CONFIG_KALLSYMS so
that it produces readable dumps that can be used as-is (this also
produces better output than ksymoops). If for some reason your kernel
is not built with CONFIG_KALLSYMS and you have no way to rebuild and
reproduce the Oops with that option, then you can still decode that Oops
with ksymoops.
Module-Init-Tools
-----------------
A new module loader is now in the kernel that requires module-init-tools
to use. It is backward compatible with the 2.4.x series kernels.
Mkinitrd
--------
These changes to the /lib/modules file tree layout also require that
mkinitrd be upgraded.
E2fsprogs
---------
The latest version of e2fsprogs fixes several bugs in fsck and
debugfs. Obviously, it's a good idea to upgrade.
JFSutils
--------
The jfsutils package contains the utilities for the file system.
The following utilities are available:
o fsck.jfs - initiate replay of the transaction log, and check
and repair a JFS formatted partition.
o mkfs.jfs - create a JFS formatted partition.
o other file system utilities are also available in this package.
Reiserfsprogs
-------------
The reiserfsprogs package should be used for reiserfs-3.6.x
(Linux kernels 2.4.x). It is a combined package and contains working
versions of mkreiserfs, resize_reiserfs, debugreiserfs and
reiserfsck. These utils work on both i386 and alpha platforms.
Xfsprogs
--------
The latest version of xfsprogs contains mkfs.xfs, xfs_db, and the
xfs_repair utilities, among others, for the XFS filesystem. It is
architecture independent and any version from 2.0.0 onward should
work correctly with this version of the XFS kernel code (2.6.0 or
later is recommended, due to some significant improvements).
PCMCIAutils
-----------
PCMCIAutils replaces pcmcia-cs. It properly sets up
PCMCIA sockets at system startup and loads the appropriate modules
for 16-bit PCMCIA devices if the kernel is modularized and the hotplug
subsystem is used.
Quota-tools
-----------
Support for 32 bit uid's and gid's is required if you want to use
the newer version 2 quota format. Quota-tools version 3.07 and
newer has this support. Use the recommended version or newer
from the table above.
Intel IA32 microcode
--------------------
A driver has been added to allow updating of Intel IA32 microcode,
accessible as a normal (misc) character device. If you are not using
udev you may need to:
mkdir /dev/cpu
mknod /dev/cpu/microcode c 10 184
chmod 0644 /dev/cpu/microcode
as root before you can use this. You'll probably also want to
get the user-space microcode_ctl utility to use with this.
udev
----
udev is a userspace application for populating /dev dynamically with
only entries for devices actually present. udev replaces the basic
functionality of devfs, while allowing persistent device naming for
devices.
FUSE
----
Needs libfuse 2.4.0 or later. Absolute minimum is 2.3.0 but mount
options 'direct_io' and 'kernel_cache' won't work.
Networking
==========
General changes
---------------
If you have advanced network configuration needs, you should probably
consider using the network tools from iproute2.
Packet Filter / NAT
-------------------
The packet filtering and NAT code uses the same tools as the previous 2.4.x
kernel series (iptables). It still includes backwards-compatibility modules
for 2.2.x-style ipchains and 2.0.x-style ipfwadm.
PPP
---
The PPP driver has been restructured to support multilink and to
enable it to operate over diverse media layers. If you use PPP,
upgrade pppd to at least 2.4.0.
If you are not using udev, you must have the device file /dev/ppp
which can be made by:
mknod /dev/ppp c 108 0
as root.
Isdn4k-utils
------------
Due to changes in the length of the phone number field, isdn4k-utils
needs to be recompiled or (preferably) upgraded.
NFS-utils
---------
In ancient (2.4 and earlier) kernels, the nfs server needed to know
about any client that expected to be able to access files via NFS. This
information would be given to the kernel by "mountd" when the client
mounted the filesystem, or by "exportfs" at system startup. exportfs
would take information about active clients from /var/lib/nfs/rmtab.
This approach is quite fragile as it depends on rmtab being correct
which is not always easy, particularly when trying to implement
fail-over. Even when the system is working well, rmtab suffers from
getting lots of old entries that never get removed.
With modern kernels we have the option of having the kernel tell mountd
when it gets a request from an unknown host, and mountd can give
appropriate export information to the kernel. This removes the
dependency on rmtab and means that the kernel only needs to know about
currently active clients.
To enable this new functionality, you need to:
mount -t nfsd nfsd /proc/fs/nfsd
before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.
mcelog
------
On x86 kernels the mcelog utility is needed to process and log machine check
events when CONFIG_X86_MCE is enabled. Machine check events are errors reported
by the CPU. Processing them is strongly encouraged.
Getting updated software
========================
Kernel compilation
******************
gcc
---
o <ftp://ftp.gnu.org/gnu/gcc/>
Make
----
o <ftp://ftp.gnu.org/gnu/make/>
Binutils
--------
o <ftp://ftp.kernel.org/pub/linux/devel/binutils/>
OpenSSL
-------
o <https://www.openssl.org/>
System utilities
****************
Util-linux
----------
o <ftp://ftp.kernel.org/pub/linux/utils/util-linux/>
Ksymoops
--------
o <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/>
Module-Init-Tools
-----------------
o <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/>
Mkinitrd
--------
o <https://code.launchpad.net/initrd-tools/main>
E2fsprogs
---------
o <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz>
JFSutils
--------
o <http://jfs.sourceforge.net/>
Reiserfsprogs
-------------
o <http://www.kernel.org/pub/linux/utils/fs/reiserfs/>
Xfsprogs
--------
o <ftp://oss.sgi.com/projects/xfs/>
Pcmciautils
-----------
o <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/>
Quota-tools
-----------
o <http://sourceforge.net/projects/linuxquota/>
DocBook Stylesheets
-------------------
o <http://sourceforge.net/projects/docbook/files/docbook-dsssl/>
XMLTO XSLT Frontend
-------------------
o <http://cyberelk.net/tim/xmlto/>
Intel P6 microcode
------------------
o <https://downloadcenter.intel.com/>
udev
----
o <http://www.freedesktop.org/software/systemd/man/udev.html>
FUSE
----
o <http://sourceforge.net/projects/fuse>
mcelog
------
o <http://www.mcelog.org/>
Networking
**********
PPP
---
o <ftp://ftp.samba.org/pub/ppp/>
Isdn4k-utils
------------
o <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/>
NFS-utils
---------
o <http://sourceforge.net/project/showfiles.php?group_id=14>
Iptables
--------
o <http://www.iptables.org/downloads.html>
Ip-route2
---------
o <https://www.kernel.org/pub/linux/utils/net/iproute2/>
OProfile
--------
o <http://oprofile.sf.net/download/>
NFS-Utils
---------
o <http://nfs.sourceforge.net/>

1
Documentation/Changes Symbolic link

@@ -0,0 +1 @@
process/changes.rst


@@ -1,27 +0,0 @@
Code of Conflict
----------------
The Linux kernel development effort is a very personal process compared
to "traditional" ways of developing software. Your code and ideas
behind it will be carefully reviewed, often resulting in critique and
criticism. The review will almost always require improvements to the
code before it can be included in the kernel. Know that this happens
because everyone involved wants to see the best possible solution for
the overall success of Linux. This development process has been proven
to create the most robust operating system kernel ever, and we do not
want to do anything to cause the quality of submission and eventual
result to ever decrease.
If however, anyone feels personally abused, threatened, or otherwise
uncomfortable due to this process, that is not acceptable. If so,
please contact the Linux Foundation's Technical Advisory Board at
<tab@lists.linux-foundation.org>, or the individual members, and they
will work to resolve the issue to the best of their ability. For more
information on who is on the Technical Advisory Board and what their
role is, please see:
http://www.linuxfoundation.org/programs/advisory-councils/tab
As a reviewer of code, please strive to keep things civil and focused on
the technical issues involved. We are all humans, and frustrations can
be high on both sides of the process. Try to keep in mind the immortal
words of Bill and Ted, "Be excellent to each other."


@@ -1,956 +1 @@
Linux kernel coding style
This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.
First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.
Anyway, here goes:
Chapter 1: Indentation
Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.
Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.
The preferred way to ease multiple indentation levels in a switch statement is
to align the "switch" and its subordinate "case" labels in the same column
instead of "double-indenting" the "case" labels. E.g.:
	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}
Don't put multiple statements on a single line unless you have
something to hide:
	if (condition) do_this;
	  do_something_everytime;
Don't put multiple assignments on a single line either. Kernel coding style
is super simple. Avoid tricky expressions.
Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.
Get a decent editor and don't leave whitespace at the end of lines.
Chapter 2: Breaking long lines and strings
Coding style is all about readability and maintainability using commonly
available tools.
The limit on the length of lines is 80 columns and this is a strongly
preferred limit.
Statements longer than 80 columns will be broken into sensible chunks, unless
exceeding 80 columns significantly increases readability and does not hide
information. Descendants are always substantially shorter than the parent and
are placed substantially to the right. The same applies to function headers
with a long argument list. However, never break user-visible strings such as
printk messages, because that breaks the ability to grep for them.
Chapter 3: Placing Braces and Spaces
The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:
	if (x is true) {
		we do y
	}
This applies to all non-function statement blocks (if, switch, for,
while, do). E.g.:
	switch (action) {
	case KOBJ_ADD:
		return "add";
	case KOBJ_REMOVE:
		return "remove";
	case KOBJ_CHANGE:
		return "change";
	default:
		return NULL;
	}
However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:
	int function(int x)
	{
		body of function
	}
Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).
Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:
	do {
		body of do-loop
	} while (condition);
and
	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}
Rationale: K&R.
Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.
Do not unnecessarily use braces where a single statement will do.
	if (condition)
		action();
and
	if (condition)
		do_this();
	else
		do_that();
This does not apply if only one branch of a conditional statement is a single
statement; in the latter case use braces in both branches:
	if (condition) {
		do_this();
		do_that();
	} else {
		otherwise();
	}
3.1: Spaces
Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage. Use a space after (most) keywords. The
notable exceptions are sizeof, typeof, alignof, and __attribute__, which look
somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: "sizeof info" after
"struct fileinfo info;" is declared).
So use a space after these keywords:
if, switch, case, for, do, while
but not with sizeof, typeof, alignof, or __attribute__. E.g.,
s = sizeof(struct file);
Do not add spaces around (inside) parenthesized expressions. This example is
*bad*:
s = sizeof( struct file );
When declaring pointer data or a function that returns a pointer type, the
preferred use of '*' is adjacent to the data name or function name and not
adjacent to the type name. Examples:
char *linux_banner;
unsigned long long memparse(char *ptr, char **retptr);
char *match_strdup(substring_t *s);
Use one space around (on each side of) most binary and ternary operators,
such as any of these:
= + - < > * / % | & ^ <= >= == != ? :
but no space after unary operators:
& * + - ~ ! sizeof typeof alignof __attribute__ defined
no space before the postfix increment & decrement unary operators:
++ --
no space after the prefix increment & decrement unary operators:
++ --
and no space around the '.' and "->" structure member operators.
Do not leave trailing whitespace at the ends of lines. Some editors with
"smart" indentation will insert whitespace at the beginning of new lines as
appropriate, so you can start typing the next line of code right away.
However, some such editors do not remove the whitespace if you end up not
putting a line of code there, such as if you leave a blank line. As a result,
you end up with lines containing trailing whitespace.
Git will warn you about patches that introduce trailing whitespace, and can
optionally strip the trailing whitespace for you; however, if applying a series
of patches, this may make later patches in the series fail by changing their
context lines.
Chapter 4: Naming
C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.
HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function "foo" is a
shooting offense.
GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".
Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.
LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.
If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).
Chapter 5: Typedefs
Please don't use things like "vps_t".
It's a _mistake_ to use typedef for structures and pointers. When you see a
vps_t a;
in the source, what does it mean?
In contrast, if it says
struct virtual_container *a;
you can actually tell what "a" is.
Lots of people think that typedefs "help readability". Not so. They are
useful only for:
(a) totally opaque objects (where the typedef is actively used to _hide_
what the object is).
Example: "pte_t" etc. opaque objects that you can only access using
the proper accessor functions.
NOTE! Opaqueness and "accessor functions" are not good in themselves.
The reason we have them for things like pte_t etc. is that there
really is absolutely _zero_ portably accessible information there.
(b) Clear integer types, where the abstraction _helps_ avoid confusion
whether it is "int" or "long".
u8/u16/u32 are perfectly fine typedefs, although they fit into
category (d) better than here.
NOTE! Again - there needs to be a _reason_ for this. If something is
"unsigned long", then there's no reason to do
typedef unsigned long myflags_t;
but if there is a clear reason for why it under certain circumstances
might be an "unsigned int" and under other configurations might be
"unsigned long", then by all means go ahead and use a typedef.
(c) when you use sparse to literally create a _new_ type for
type-checking.
(d) New types which are identical to standard C99 types, in certain
exceptional circumstances.
Although it would only take a short amount of time for the eyes and
brain to become accustomed to the standard types like 'uint32_t',
some people object to their use anyway.
Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
signed equivalents which are identical to standard types are
permitted -- although they are not mandatory in new code of your
own.
When editing existing code which already uses one or the other set
of types, you should conform to the existing choices in that code.
(e) Types safe for use in userspace.
In certain structures which are visible to userspace, we cannot
require C99 types and cannot use the 'u32' form above. Thus, we
use __u32 and similar types in all structures which are shared
with userspace.
Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.
In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.
Chapter 6: Functions
Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.
The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.
However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).
Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
In source files, separate functions with one blank line. If the function is
exported, the EXPORT* macro for it should follow immediately after the closing
function brace line. E.g.:
	int system_is_up(void)
	{
		return system_state == SYSTEM_RUNNING;
	}
	EXPORT_SYMBOL(system_is_up);
In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.
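A small illustration of why the names help. copy_window() is a hypothetical function invented for this example; the commented-out prototype shows the form to avoid, since the reader cannot tell which size_t is the offset and which is the length.

```c
#include <assert.h>
#include <stddef.h>

/* Avoid: int copy_window(char *, const char *, size_t, size_t); */

/* Preferred: the names document the role of each argument. */
int copy_window(char *dst, const char *src, size_t offset, size_t len);

/* Copy len bytes starting at src[offset] into dst.  Returns 0. */
int copy_window(char *dst, const char *src, size_t offset, size_t len)
{
	size_t i;

	assert(dst && src);

	for (i = 0; i < len; i++)
		dst[i] = src[offset + i];
	return 0;
}
```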
Chapter 7: Centralized exiting of functions
Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.
The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done. If there is no
cleanup needed then just return directly.
Choose label names which say what the goto does or why the goto exists. An
example of a good name could be "out_free_buffer:" if the goto frees "buffer".
Avoid using GW-BASIC names like "err1:" and "err2:", as you would have to
renumber them if you ever add or remove exit paths, and they make correctness
difficult to verify anyway.
It is advised to indent labels with a single space (not tab), so that
"diff -p" does not confuse labels with functions.
The rationale for using gotos is:
- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
modifications are prevented
- saves the compiler work to optimize redundant code away ;)
	int fun(int a)
	{
		int result = 0;
		char *buffer;
		buffer = kmalloc(SIZE, GFP_KERNEL);
		if (!buffer)
			return -ENOMEM;
		if (condition1) {
			while (loop1) {
				...
			}
			result = 1;
			goto out_free_buffer;
		}
		...
	 out_free_buffer:
		kfree(buffer);
		return result;
	}
A common type of bug to be aware of is "one err bugs" which look like this:
	 err:
		kfree(foo->bar);
		kfree(foo);
		return ret;
The bug in this code is that on some exit paths "foo" is NULL. Normally the
fix for this is to split it up into two error labels "err_free_bar:" and
"err_free_foo:":
	 err_free_bar:
		kfree(foo->bar);
	 err_free_foo:
		kfree(foo);
		return ret;
Ideally you should simulate errors to test all exit paths.
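One way to simulate errors is sketched below as a small userspace program,
not kernel code. It assumes malloc()/free() as stand-ins for
kmalloc()/kfree(); the names test_alloc(), test_free(), simulate_failure()
and use_foo() are invented for this illustration. The Nth allocation is
forced to fail, and the free counter shows whether each exit path released
exactly what had been allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct foo {
	char *bar;
};

/* Fault injection for the example: fail the Nth allocation, 0 = never. */
int fail_at;
int alloc_count;
int free_count;

void *test_alloc(size_t size)
{
	if (++alloc_count == fail_at)
		return NULL;
	return malloc(size);
}

void test_free(void *p)
{
	free_count++;
	free(p);
}

/* Reset the counters and choose which allocation should fail. */
void simulate_failure(int nth)
{
	fail_at = nth;
	alloc_count = 0;
	free_count = 0;
}

int use_foo(int bad_input)
{
	struct foo *foo;
	int ret = 0;

	foo = test_alloc(sizeof(*foo));
	if (!foo)
		return -ENOMEM;		/* no cleanup needed, return directly */

	foo->bar = test_alloc(64);
	if (!foo->bar) {
		ret = -ENOMEM;
		goto out_free_foo;	/* foo->bar does not exist yet */
	}

	if (bad_input) {
		ret = -EINVAL;
		goto out_free_bar;
	}

	foo->bar[0] = '\0';		/* stands in for the real work */

 out_free_bar:
	test_free(foo->bar);
 out_free_foo:
	test_free(foo);
	return ret;
}

/* Walk every exit path once and check what was freed. */
void test_all_exit_paths(void)
{
	simulate_failure(0);
	assert(use_foo(0) == 0 && free_count == 2);
	simulate_failure(1);
	assert(use_foo(0) == -ENOMEM && free_count == 0);
	simulate_failure(2);
	assert(use_foo(0) == -ENOMEM && free_count == 1);
	simulate_failure(0);
	assert(use_foo(1) == -EINVAL && free_count == 2);
}
```

With a single "err:" label, the second case would dereference a NULL "foo";
the two labels let each failure unwind only what already exists.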
Chapter 8: Commenting
Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.
Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.
When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-documentation.rst and scripts/kernel-doc
for details.
The preferred style for long (multi-line) comments is:
	/*
	 * This is the preferred style for multi-line
	 * comments in the Linux kernel source code.
	 * Please use it consistently.
	 *
	 * Description:  A column of asterisks on the left side,
	 * with beginning and ending almost-blank lines.
	 */
For files in net/ and drivers/net/ the preferred style for long (multi-line)
comments is a little different.
	/* The preferred comment style for files in net/ and drivers/net
	 * looks like this.
	 *
	 * It is nearly the same as the generally preferred comment style,
	 * but there is no initial almost-blank line.
	 */
It's also important to comment data, whether they are basic types or derived
types. To this end, use just one data declaration per line (no commas for
multiple data declarations). This leaves you room for a small comment on each
item, explaining its use.
Chapter 9: You've made a mess of it
That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).
So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:
(defun c-lineup-arglist-tabs-only (ignored)
  "Line up argument lists by tabs, not spaces"
  (let* ((anchor (c-langelem-pos c-syntactic-element))
         (column (c-langelem-2nd-pos c-syntactic-element))
         (offset (- (1+ column) anchor))
         (steps (floor offset c-basic-offset)))
    (* (max steps 1)
       c-basic-offset)))
(add-hook 'c-mode-common-hook
          (lambda ()
            ;; Add kernel style
            (c-add-style
             "linux-tabs-only"
             '("linux" (c-offsets-alist
                        (arglist-cont-nonempty
                         c-lineup-gcc-asm-reg
                         c-lineup-arglist-tabs-only))))))
(add-hook 'c-mode-hook
          (lambda ()
            (let ((filename (buffer-file-name)))
              ;; Enable kernel mode for the appropriate files
              (when (and filename
                         (string-match (expand-file-name "~/src/linux-trees")
                                       filename))
                (setq indent-tabs-mode t)
                (setq show-trailing-whitespace t)
                (c-set-style "linux-tabs-only")))))
This will make emacs go better with the kernel coding style for C
files below ~/src/linux-trees.
But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".
Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents"), or use
"scripts/Lindent", which indents in the latest style.
"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: "indent" is not a fix for bad programming.
Chapter 10: Kconfig configuration files
For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different. Lines under a "config" definition
are indented with one tab, while help text is indented an additional two
spaces. Example:
config AUDIT
	bool "Auditing support"
	depends on NET
	help
	  Enable auditing infrastructure that can be used with another
	  kernel subsystem, such as SELinux (which requires this for
	  logging of avc messages output).  Does not do system-call
	  auditing without CONFIG_AUDITSYSCALL.
Seriously dangerous features (such as write support for certain
filesystems) should advertise this prominently in their prompt string:
config ADFS_FS_RW
	bool "ADFS write support (DANGEROUS)"
	depends on ADFS_FS
	...
For full documentation on the configuration files, see the file
Documentation/kbuild/kconfig-language.txt.
Chapter 11: Data structures
Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts. In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.
Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.
Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.
Many data structures can indeed have two levels of reference counting,
when there are users of different "classes". The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.
Examples of this kind of "multi-level-reference-counting" can be found in
memory management ("struct mm_struct": mm_users and mm_count), and in
filesystem code ("struct super_block": s_count and s_active).
Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.
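The get/put discipline can be sketched in plain userspace C as follows. This
is an illustration only: it assumes C11 atomics instead of the kernel's own
helpers (such as struct kref), and "struct widget" and the widget_*()
functions are made-up names. The point is that the object is freed by
whoever drops the last reference, not by whoever created it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct widget {
	atomic_int refcount;	/* number of outstanding references */
	int payload;
};

int widgets_freed;		/* instrumentation for the example only */

/* A new object starts with one reference, owned by the caller. */
struct widget *widget_alloc(int payload)
{
	struct widget *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	atomic_init(&w->refcount, 1);
	w->payload = payload;
	return w;
}

/* Take an additional reference on an object we already hold. */
struct widget *widget_get(struct widget *w)
{
	atomic_fetch_add(&w->refcount, 1);
	return w;
}

/* Drop a reference; the last one out frees the object. */
void widget_put(struct widget *w)
{
	/* atomic_fetch_sub() returns the value before the decrement */
	if (atomic_fetch_sub(&w->refcount, 1) == 1) {
		widgets_freed++;
		free(w);
	}
}

void widget_demo(void)
{
	struct widget *w = widget_alloc(7);
	struct widget *user;

	assert(w);
	user = widget_get(w);		/* second reference */
	widget_put(w);			/* creator drops its reference */
	assert(widgets_freed == 0);	/* still alive for the other user */
	assert(user->payload == 7);
	widget_put(user);		/* last reference: object is freed */
	assert(widgets_freed == 1);
}
```

Note that the count only manages the lifetime of the object. If two users
modify "payload" concurrently, they still need a lock.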
Chapter 12: Macros, Enums and RTL
Names of macros defining constants and labels in enums are capitalized.
#define CONSTANT 0x12345
Enums are preferred when defining several related constants.
CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.
Generally, inline functions are preferable to macros resembling functions.
Macros with multiple statements should be enclosed in a do - while block:
	#define macrofun(a, b, c)			\
		do {					\
			if (a == 5)			\
				do_this(b, c);		\
		} while (0)
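Why the do - while wrapper matters can be seen in a small userspace sketch.
The do_this() body, the "calls" counter and run() are invented for the
example, and the macro parameters are parenthesized here as an extra
precaution. Because the macro expands to exactly one statement, it can be
followed by a semicolon and used in an unbraced if/else without stealing
the else.

```c
#include <assert.h>

int calls;	/* records what do_this() was given, for illustration only */

void do_this(int b, int c)
{
	calls += b + c;
}

#define macrofun(a, b, c)			\
	do {					\
		if ((a) == 5)			\
			do_this((b), (c));	\
	} while (0)

int run(int flag, int a)
{
	calls = 0;
	if (flag)
		macrofun(a, 1, 2);	/* expands to a single statement */
	else
		calls = -1;		/* still pairs with if (flag) */
	return calls;
}

void macrofun_demo(void)
{
	assert(run(1, 5) == 3);		/* macro body ran: 1 + 2 */
	assert(run(1, 4) == 0);		/* a != 5, nothing happened */
	assert(run(0, 5) == -1);	/* else branch taken */
}
```

Without the wrapper, a bare "if" in the macro would capture the else, and a
bare "{ ... }" block followed by ";" would make the else a syntax error.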
Things to avoid when using macros:
1) macros that affect control flow:
	#define FOO(x)					\
		do {					\
			if (blah(x) < 0)		\
				return -EBUGGERED;	\
		} while (0)
is a _very_ bad idea. It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.
2) macros that depend on having a local variable with a magic name:
#define FOO(val) bar(index, val)
might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.
3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.
4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.
#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)
5) namespace collisions when defining local variables in macros resembling
functions:
	#define FOO(x)				\
	({					\
		typeof(x) ret;			\
		ret = calc_ret(x);		\
		(ret);				\
	})
ret is a common name for a local variable - __foo_ret is less likely
to collide with an existing variable.
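The precedence problem from item 4 is easy to reproduce. In the sketch below
the *_BAD macros and the helper functions are hypothetical, written only to
show what goes wrong when the parentheses are left out.

```c
#include <assert.h>

#define CONSTANT	0x4000
#define CONSTEXP_BAD	CONSTANT | 3	/* broken on purpose: no parentheses */
#define CONSTEXP	(CONSTANT | 3)

#define SQUARE_BAD(x)	(x * x)		/* broken on purpose: bare parameter */
#define SQUARE(x)	((x) * (x))

int double_good(void)
{
	return CONSTEXP * 2;		/* (0x4000 | 3) * 2 */
}

int double_bad(void)
{
	return CONSTEXP_BAD * 2;	/* parsed as 0x4000 | (3 * 2) */
}

int square_good(int n)
{
	return SQUARE(n + 1);		/* (n + 1) * (n + 1) */
}

int square_bad(int n)
{
	return SQUARE_BAD(n + 1);	/* parsed as n + (1 * n) + 1 */
}

void precedence_demo(void)
{
	assert(double_good() == 0x8006);
	assert(double_bad() == 0x4006);
	assert(square_good(2) == 9);
	assert(square_bad(2) == 5);
}
```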
The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.
Chapter 13: Printing kernel messages
Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like "dont"; use "do not" or "don't" instead. Make the messages
concise, clear, and unambiguous.
Kernel messages do not have to be terminated with a period.
Printing numbers in parentheses (%d) adds no value and should be avoided.
There are a number of driver model diagnostic macros in <linux/device.h>
which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level: dev_err(), dev_warn(),
dev_info(), and so forth. For messages that aren't associated with a
particular device, <linux/printk.h> defines pr_notice(), pr_info(),
pr_warn(), pr_err(), etc.
Coming up with good debugging messages can be quite a challenge; and once
you have them, they can be a huge help for remote troubleshooting. However,
debug message printing is handled differently from the printing of other,
non-debug messages. While the other pr_XXX() functions print unconditionally,
pr_debug() does not; it is compiled out by default, unless either DEBUG is
defined or CONFIG_DYNAMIC_DEBUG is set. That is true for dev_dbg() also,
and a related convention uses VERBOSE_DEBUG to add dev_vdbg() messages to
the ones already enabled by DEBUG.
Many subsystems have Kconfig debug options to turn on -DDEBUG in the
corresponding Makefile; in other cases specific files #define DEBUG. And
when a debug message should be unconditionally printed, such as if it is
already inside a debug-related #ifdef section, printk(KERN_DEBUG ...) can be
used.
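The "compiled out unless DEBUG is defined" behaviour can be imitated in
userspace. The sketch below is not the kernel's pr_debug() implementation;
my_debug(), expensive() and the "evaluations" counter are invented names.
It only shows the general idea: without DEBUG the call costs nothing at
run time, and its arguments are not even evaluated, yet the compiler still
sees the format string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for pr_debug(); not the kernel macro. */
#ifdef DEBUG
#define my_debug(...)	fprintf(stderr, __VA_ARGS__)
#else
#define my_debug(...)				\
	do {					\
		if (0)				\
			fprintf(stderr, __VA_ARGS__);	\
	} while (0)
#endif

int evaluations;	/* how often expensive() actually ran */

int expensive(void)
{
	evaluations++;
	return 42;
}

void debug_demo(void)
{
	my_debug("value=%d\n", expensive());
#ifdef DEBUG
	assert(evaluations == 1);	/* message printed, argument evaluated */
#else
	assert(evaluations == 0);	/* dead code: argument never evaluated */
#endif
}
```

Building the same file with -DDEBUG turns the message back on, which
mirrors what the Kconfig debug options do through the Makefile.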
Chapter 14: Allocating memory
The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kmalloc_array(), kcalloc(), vmalloc(), and
vzalloc(). Please refer to the API documentation for further information
about them.
The preferred form for passing a size of a struct is the following:
p = kmalloc(sizeof(*p), ...);
The alternative form where the struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.
Casting the return value, which is a void pointer, is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.
The preferred form for allocating an array is the following:
p = kmalloc_array(n, sizeof(...), ...);
The preferred form for allocating a zeroed array is the following:
p = kcalloc(n, sizeof(...), ...);
Both forms check for overflow on the allocation size n * sizeof(...),
and return NULL if that occurred.
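What such an overflow check amounts to can be sketched in userspace C. The
alloc_array() and make_ints() helpers below are hypothetical and use
malloc(); they are not the kernel implementation of kmalloc_array(). The
sketch also uses the sizeof(*p) form and omits the cast, as recommended
above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of the given size, or NULL if n * size overflows. */
void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would wrap around */
	return malloc(n * size);
}

int *make_ints(size_t n)
{
	int *p;

	p = alloc_array(n, sizeof(*p));	/* sizeof(*p) follows the pointer type */
	return p;			/* no cast needed from void * */
}

void alloc_demo(void)
{
	int *p = make_ints(4);

	assert(p != NULL);
	p[3] = 42;
	free(p);
	assert(alloc_array(SIZE_MAX, 2) == NULL);	/* overflow detected */
}
```

A plain malloc(n * size) would silently allocate a much smaller buffer in
the overflow case, which is exactly the bug the array forms avoid.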
Chapter 15: The inline disease
There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called "inline". While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
that can go into these 5 milliseconds.
A reasonable rule of thumb is not to put inline on functions that have more
than 3 lines of code in them. The exceptions to this rule are the cases where
a parameter is known to be a compile-time constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this latter case, see
the kmalloc() inline function.
Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.
Chapter 16: Function return values and names
Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed. Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
non-zero = success).
Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs. If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us... but it doesn't. To help prevent such bugs, always follow this
convention:
If the name of a function is an action or an imperative command,
the function should return an error-code integer. If the name
is a predicate, the function should return a "succeeded" boolean.
For example, "add work" is a command, and the add_work() function returns 0
for success or -EBUSY for failure. In the same way, "PCI device present" is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn't.
All EXPORTed functions must respect this convention, and so should all
public functions. Private (static) functions need not, but it is
recommended that they do.
Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule. Generally they indicate failure by returning some out-of-range
result. Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.
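The naming convention can be illustrated with a toy work queue. This is a
userspace sketch only: this add_work() is a simplified stand-in, not the
kernel function of the same name, and work_present() and the fixed-size
queue are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 2

int queue[QUEUE_LEN];
size_t queued;

/* Action name: returns 0 on success or a negative error code. */
int add_work(int item)
{
	if (queued == QUEUE_LEN)
		return -EBUSY;
	queue[queued++] = item;
	return 0;
}

/* Predicate name: returns true if the item is in the queue. */
bool work_present(int item)
{
	size_t i;

	for (i = 0; i < queued; i++)
		if (queue[i] == item)
			return true;
	return false;
}

void return_value_demo(void)
{
	assert(add_work(1) == 0);
	assert(add_work(2) == 0);
	assert(add_work(3) == -EBUSY);	/* queue is full */
	assert(work_present(2));
	assert(!work_present(3));
}
```

A caller can then write "if (add_work(x))" to catch an error and
"if (work_present(x))" to test the predicate, and both read naturally.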
Chapter 17: Don't re-invent the kernel macros
The header file include/linux/kernel.h contains a number of macros that
you should use, rather than explicitly coding some variant of them yourself.
For example, if you need to calculate the length of an array, take advantage
of the macro
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
Similarly, if you need to calculate the size of some structure member, use
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
There are also min() and max() macros that do strict type checking if you
need them. Feel free to peruse that header file to see what else is already
defined that you shouldn't reproduce in your code.
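A short userspace sketch of both macros in use follows. It copies the
simplified definitions quoted above rather than including the kernel
header, and "struct packet", "primes" and the helper functions are made up
for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

struct packet {
	unsigned char header[4];
	unsigned short length;
};

const int primes[] = { 2, 3, 5, 7, 11 };

int sum_primes(void)
{
	int sum = 0;
	size_t i;

	/* The bound tracks the initializer; no magic number to keep in sync. */
	for (i = 0; i < ARRAY_SIZE(primes); i++)
		sum += primes[i];
	return sum;
}

void macro_demo(void)
{
	assert(ARRAY_SIZE(primes) == 5);
	assert(sum_primes() == 28);
	/* The size of a member, without needing an instance of the struct. */
	assert(FIELD_SIZEOF(struct packet, header) == 4);
	assert(FIELD_SIZEOF(struct packet, length) == sizeof(unsigned short));
}
```

ARRAY_SIZE() only works on real arrays. Applied to a pointer, this
simplified form quietly computes the wrong value.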
Chapter 18: Editor modelines and other cruft
Some editors can interpret configuration information embedded in source files,
indicated with special markers. For example, emacs interprets lines marked
like this:
-*- mode: c -*-
Or like this:
/*
Local Variables:
compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c"
End:
*/
Vim interprets markers that look like this:
/* vim:set sw=8 noet */
Do not include any of these in source files. People have their own personal
editor configurations, and your source files should not override them. This
includes markers for indentation and mode configuration. People may use their
own custom mode, or may have some other magic method for making indentation
work correctly.
Chapter 19: Inline assembly
In architecture-specific code, you may need to use inline assembly to interface
with CPU or platform functionality. Don't hesitate to do so when necessary.
However, don't use inline assembly gratuitously when C can do the job. You can
and should poke hardware from C when possible.
Consider writing simple helper functions that wrap common bits of inline
assembly, rather than repeatedly writing them with slight variations. Remember
that inline assembly can use C parameters.
Large, non-trivial assembly functions should go in .S files, with corresponding
C prototypes defined in C header files. The C prototypes for assembly
functions should use "asmlinkage".
You may need to mark your asm statement as volatile, to prevent GCC from
removing it if GCC doesn't notice any side effects. You don't always need to
do so, though, and doing so unnecessarily can limit optimization.
When writing a single inline assembly statement containing multiple
instructions, put each instruction on a separate line in a separate quoted
string, and end each string except the last with \n\t to properly indent the
next instruction in the assembly output:
	asm ("magic %reg1, #42\n\t"
	     "more_magic %reg2, %reg3"
	     : /* outputs */ : /* inputs */ : /* clobbers */);
Chapter 20: Conditional Compilation
Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c
files; doing so makes code harder to read and logic harder to follow. Instead,
use such conditionals in a header file defining functions for use in those .c
files, providing no-op stub versions in the #else case, and then call those
functions unconditionally from .c files. The compiler will avoid generating
any code for the stub calls, producing identical results, but the logic will
remain easy to follow.
Prefer to compile out entire functions, rather than portions of functions or
portions of expressions. Rather than putting an ifdef in an expression, factor
out part or all of the expression into a separate helper function and apply the
conditional to that function.
If you have a function or variable which may potentially go unused in a
particular configuration, and the compiler would warn about its definition
going unused, mark the definition as __maybe_unused rather than wrapping it in
a preprocessor conditional. (However, if a function or variable *always* goes
unused, delete it.)
Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
symbol into a C boolean expression, and use it in a normal C conditional:
	if (IS_ENABLED(CONFIG_SOMETHING)) {
		...
	}
The compiler will constant-fold the conditional away, and include or exclude
the block of code just as with an #ifdef, so this will not add any runtime
overhead. However, this approach still allows the C compiler to see the code
inside the block, and check it for correctness (syntax, types, symbol
references, etc). Thus, you still have to use an #ifdef if the code inside the
block references symbols that will not exist if the condition is not met.
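Both techniques, stubs behind a header conditional and a plain C
conditional on a build-time constant, can be sketched in one userspace
file. CONFIG_TRACE_DEMO, trace_event() and process() are hypothetical
names, and a simple 0/1 macro is assumed here in place of the kernel's
IS_ENABLED(), which works on real Kconfig symbols.

```c
#include <assert.h>

/* Hypothetical build-time switch; in the kernel this comes from Kconfig. */
#ifndef CONFIG_TRACE_DEMO
#define CONFIG_TRACE_DEMO 0
#endif

int trace_events;
int checks;

/* "Header" part: the only preprocessor conditional lives here. */
#if CONFIG_TRACE_DEMO
static inline void trace_event(int id)
{
	trace_events += id;
}
#else
static inline void trace_event(int id)
{
	(void)id;	/* no-op stub */
}
#endif

/* ".c" part: no #ifdef in sight. */
int process(int id)
{
	trace_event(id);	/* called unconditionally */

	if (CONFIG_TRACE_DEMO)	/* constant-folded, but still type-checked */
		checks++;

	return id * 2;
}

void process_demo(void)
{
	assert(process(3) == 6);
	assert(trace_events == (CONFIG_TRACE_DEMO ? 3 : 0));
	assert(checks == (CONFIG_TRACE_DEMO ? 1 : 0));
}
```

Compiling with -DCONFIG_TRACE_DEMO=1 switches both the stub and the
conditional block on, without touching process() itself.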
At the end of any non-trivial #if or #ifdef block (more than a few lines),
place a comment after the #endif on the same line, noting the conditional
expression used. For instance:
#ifdef CONFIG_SOMETHING
...
#endif /* CONFIG_SOMETHING */
Appendix I: References
The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/
WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/
Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
This file has moved to process/coding-style.rst


@@ -699,7 +699,7 @@ to use the dma_sync_*() interfaces.
dma_addr_t mapping;
mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE);
-	if (dma_mapping_error(cp->dev, dma_handle)) {
+	if (dma_mapping_error(cp->dev, mapping)) {
/*
* reduce current DMA mapping usage,
* delay and try again later or


@@ -277,14 +277,26 @@ and <size> parameters are provided to do partial page mapping, it is
recommended that you never use these unless you really know what the
cache width is.
dma_addr_t
dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
void
dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
API for mapping and unmapping for MMIO resources. All the notes and
warnings for the other mapping APIs apply here. The API should only be
used to map device MMIO resources, mapping of RAM is not permitted.
int
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-In some circumstances dma_map_single() and dma_map_page() will fail to create
-a mapping. A driver can check for these errors by testing the returned
-DMA address with dma_mapping_error(). A non-zero return value means the mapping
-could not be created and the driver should take appropriate action (e.g.
-reduce current DMA mapping usage or delay and try again later).
+In some circumstances dma_map_single(), dma_map_page() and dma_map_resource()
+will fail to create a mapping. A driver can check for these errors by testing
+the returned DMA address with dma_mapping_error(). A non-zero return value
+means the mapping could not be created and the driver should take appropriate
+action (e.g. reduce current DMA mapping usage or delay and try again later).
int
dma_map_sg(struct device *dev, struct scatterlist *sg,


@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
though ARM64 patches will likely be posted soon.
DMA_ATTR_NO_WARN
----------------
This tells the DMA-mapping subsystem to suppress allocation failure reports
(similarly to __GFP_NOWARN).
On some architectures allocation failures are reported with error messages
to the system logs. Although this can help to identify and debug problems,
drivers which handle failures (e.g. retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.
So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.
NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.


@@ -1,584 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE set PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<set>
<setinfo>
<title>The 802.11 subsystems &ndash; for kernel developers</title>
<subtitle>
Explaining wireless 802.11 networking in the Linux kernel
</subtitle>
<copyright>
<year>2007-2009</year>
<holder>Johannes Berg</holder>
</copyright>
<authorgroup>
<author>
<firstname>Johannes</firstname>
<surname>Berg</surname>
<affiliation>
<address><email>johannes@sipsolutions.net</email></address>
</affiliation>
</author>
</authorgroup>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This documentation is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this documentation; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
<abstract>
<para>
These books attempt to give a description of the
various subsystems that play a role in 802.11 wireless
networking in Linux. Since these books are for kernel
developers they attempt to document the structures
and functions used in the kernel as well as giving a
higher-level overview.
</para>
<para>
The reader is expected to be familiar with the 802.11
standard as published by the IEEE in 802.11-2007 (or
possibly later versions). References to this standard
will be given as "802.11-2007 8.1.5".
</para>
</abstract>
</setinfo>
<book id="cfg80211-developers-guide">
<bookinfo>
<title>The cfg80211 subsystem</title>
<abstract>
!Pinclude/net/cfg80211.h Introduction
</abstract>
</bookinfo>
<chapter>
<title>Device registration</title>
!Pinclude/net/cfg80211.h Device registration
!Finclude/net/cfg80211.h ieee80211_channel_flags
!Finclude/net/cfg80211.h ieee80211_channel
!Finclude/net/cfg80211.h ieee80211_rate_flags
!Finclude/net/cfg80211.h ieee80211_rate
!Finclude/net/cfg80211.h ieee80211_sta_ht_cap
!Finclude/net/cfg80211.h ieee80211_supported_band
!Finclude/net/cfg80211.h cfg80211_signal_type
!Finclude/net/cfg80211.h wiphy_params_flags
!Finclude/net/cfg80211.h wiphy_flags
!Finclude/net/cfg80211.h wiphy
!Finclude/net/cfg80211.h wireless_dev
!Finclude/net/cfg80211.h wiphy_new
!Finclude/net/cfg80211.h wiphy_register
!Finclude/net/cfg80211.h wiphy_unregister
!Finclude/net/cfg80211.h wiphy_free
!Finclude/net/cfg80211.h wiphy_name
!Finclude/net/cfg80211.h wiphy_dev
!Finclude/net/cfg80211.h wiphy_priv
!Finclude/net/cfg80211.h priv_to_wiphy
!Finclude/net/cfg80211.h set_wiphy_dev
!Finclude/net/cfg80211.h wdev_priv
!Finclude/net/cfg80211.h ieee80211_iface_limit
!Finclude/net/cfg80211.h ieee80211_iface_combination
!Finclude/net/cfg80211.h cfg80211_check_combinations
</chapter>
<chapter>
<title>Actions and configuration</title>
!Pinclude/net/cfg80211.h Actions and configuration
!Finclude/net/cfg80211.h cfg80211_ops
!Finclude/net/cfg80211.h vif_params
!Finclude/net/cfg80211.h key_params
!Finclude/net/cfg80211.h survey_info_flags
!Finclude/net/cfg80211.h survey_info
!Finclude/net/cfg80211.h cfg80211_beacon_data
!Finclude/net/cfg80211.h cfg80211_ap_settings
!Finclude/net/cfg80211.h station_parameters
!Finclude/net/cfg80211.h rate_info_flags
!Finclude/net/cfg80211.h rate_info
!Finclude/net/cfg80211.h station_info
!Finclude/net/cfg80211.h monitor_flags
!Finclude/net/cfg80211.h mpath_info_flags
!Finclude/net/cfg80211.h mpath_info
!Finclude/net/cfg80211.h bss_parameters
!Finclude/net/cfg80211.h ieee80211_txq_params
!Finclude/net/cfg80211.h cfg80211_crypto_settings
!Finclude/net/cfg80211.h cfg80211_auth_request
!Finclude/net/cfg80211.h cfg80211_assoc_request
!Finclude/net/cfg80211.h cfg80211_deauth_request
!Finclude/net/cfg80211.h cfg80211_disassoc_request
!Finclude/net/cfg80211.h cfg80211_ibss_params
!Finclude/net/cfg80211.h cfg80211_connect_params
!Finclude/net/cfg80211.h cfg80211_pmksa
!Finclude/net/cfg80211.h cfg80211_rx_mlme_mgmt
!Finclude/net/cfg80211.h cfg80211_auth_timeout
!Finclude/net/cfg80211.h cfg80211_rx_assoc_resp
!Finclude/net/cfg80211.h cfg80211_assoc_timeout
!Finclude/net/cfg80211.h cfg80211_tx_mlme_mgmt
!Finclude/net/cfg80211.h cfg80211_ibss_joined
!Finclude/net/cfg80211.h cfg80211_connect_result
!Finclude/net/cfg80211.h cfg80211_connect_bss
!Finclude/net/cfg80211.h cfg80211_connect_timeout
!Finclude/net/cfg80211.h cfg80211_roamed
!Finclude/net/cfg80211.h cfg80211_disconnected
!Finclude/net/cfg80211.h cfg80211_ready_on_channel
!Finclude/net/cfg80211.h cfg80211_remain_on_channel_expired
!Finclude/net/cfg80211.h cfg80211_new_sta
!Finclude/net/cfg80211.h cfg80211_rx_mgmt
!Finclude/net/cfg80211.h cfg80211_mgmt_tx_status
!Finclude/net/cfg80211.h cfg80211_cqm_rssi_notify
!Finclude/net/cfg80211.h cfg80211_cqm_pktloss_notify
!Finclude/net/cfg80211.h cfg80211_michael_mic_failure
</chapter>
<chapter>
<title>Scanning and BSS list handling</title>
!Pinclude/net/cfg80211.h Scanning and BSS list handling
!Finclude/net/cfg80211.h cfg80211_ssid
!Finclude/net/cfg80211.h cfg80211_scan_request
!Finclude/net/cfg80211.h cfg80211_scan_done
!Finclude/net/cfg80211.h cfg80211_bss
!Finclude/net/cfg80211.h cfg80211_inform_bss
!Finclude/net/cfg80211.h cfg80211_inform_bss_frame_data
!Finclude/net/cfg80211.h cfg80211_inform_bss_data
!Finclude/net/cfg80211.h cfg80211_unlink_bss
!Finclude/net/cfg80211.h cfg80211_find_ie
!Finclude/net/cfg80211.h ieee80211_bss_get_ie
</chapter>
<chapter>
<title>Utility functions</title>
!Pinclude/net/cfg80211.h Utility functions
!Finclude/net/cfg80211.h ieee80211_channel_to_frequency
!Finclude/net/cfg80211.h ieee80211_frequency_to_channel
!Finclude/net/cfg80211.h ieee80211_get_channel
!Finclude/net/cfg80211.h ieee80211_get_response_rate
!Finclude/net/cfg80211.h ieee80211_hdrlen
!Finclude/net/cfg80211.h ieee80211_get_hdrlen_from_skb
!Finclude/net/cfg80211.h ieee80211_radiotap_iterator
</chapter>
<chapter>
<title>Data path helpers</title>
!Pinclude/net/cfg80211.h Data path helpers
!Finclude/net/cfg80211.h ieee80211_data_to_8023
!Finclude/net/cfg80211.h ieee80211_data_from_8023
!Finclude/net/cfg80211.h ieee80211_amsdu_to_8023s
!Finclude/net/cfg80211.h cfg80211_classify8021d
</chapter>
<chapter>
<title>Regulatory enforcement infrastructure</title>
!Pinclude/net/cfg80211.h Regulatory enforcement infrastructure
!Finclude/net/cfg80211.h regulatory_hint
!Finclude/net/cfg80211.h wiphy_apply_custom_regulatory
!Finclude/net/cfg80211.h freq_reg_info
</chapter>
<chapter>
<title>RFkill integration</title>
!Pinclude/net/cfg80211.h RFkill integration
!Finclude/net/cfg80211.h wiphy_rfkill_set_hw_state
!Finclude/net/cfg80211.h wiphy_rfkill_start_polling
!Finclude/net/cfg80211.h wiphy_rfkill_stop_polling
</chapter>
<chapter>
<title>Test mode</title>
!Pinclude/net/cfg80211.h Test mode
!Finclude/net/cfg80211.h cfg80211_testmode_alloc_reply_skb
!Finclude/net/cfg80211.h cfg80211_testmode_reply
!Finclude/net/cfg80211.h cfg80211_testmode_alloc_event_skb
!Finclude/net/cfg80211.h cfg80211_testmode_event
</chapter>
</book>
<book id="mac80211-developers-guide">
<bookinfo>
<title>The mac80211 subsystem</title>
<abstract>
!Pinclude/net/mac80211.h Introduction
!Pinclude/net/mac80211.h Warning
</abstract>
</bookinfo>
<toc></toc>
<!--
Generally, this document shall be ordered by increasing complexity.
It is important to note that readers should be able to read only
the first few sections to get a working driver and only advanced
usage should require reading the full document.
-->
<part>
<title>The basic mac80211 driver interface</title>
<partintro>
<para>
You should read and understand the information contained
within this part of the book while implementing a driver.
In some chapters, advanced usage is noted, that may be
skipped at first.
</para>
<para>
This part of the book only covers station and monitor mode
functionality, additional information required to implement
the other modes is covered in the second part of the book.
</para>
</partintro>
<chapter id="basics">
<title>Basic hardware handling</title>
<para>TBD</para>
<para>
This chapter shall contain information on getting a hw
struct allocated and registered with mac80211.
</para>
<para>
Since it is required to allocate rates/modes before registering
a hw struct, this chapter shall also contain information on setting
up the rate/mode structs.
</para>
<para>
Additionally, some discussion about the callbacks and
the general programming model should be in here, including
the definition of ieee80211_ops which will be referred to
a lot.
</para>
<para>
Finally, a discussion of hardware capabilities should be done
with references to other parts of the book.
</para>
<!-- intentionally multiple !F lines to get proper order -->
!Finclude/net/mac80211.h ieee80211_hw
!Finclude/net/mac80211.h ieee80211_hw_flags
!Finclude/net/mac80211.h SET_IEEE80211_DEV
!Finclude/net/mac80211.h SET_IEEE80211_PERM_ADDR
!Finclude/net/mac80211.h ieee80211_ops
!Finclude/net/mac80211.h ieee80211_alloc_hw
!Finclude/net/mac80211.h ieee80211_register_hw
!Finclude/net/mac80211.h ieee80211_unregister_hw
!Finclude/net/mac80211.h ieee80211_free_hw
</chapter>
<chapter id="phy-handling">
<title>PHY configuration</title>
<para>TBD</para>
<para>
This chapter should describe PHY handling including
start/stop callbacks and the various structures used.
</para>
!Finclude/net/mac80211.h ieee80211_conf
!Finclude/net/mac80211.h ieee80211_conf_flags
</chapter>
<chapter id="iface-handling">
<title>Virtual interfaces</title>
<para>TBD</para>
<para>
This chapter should describe virtual interface basics
that are relevant to the driver (VLANs, MGMT etc. are not).
It should explain the use of the add_iface/remove_iface
callbacks as well as the interface configuration callbacks.
</para>
<para>Things related to AP mode should be discussed there.</para>
<para>
Things related to supporting multiple interfaces should be
in the appropriate chapter. A BIG FAT note about this should be
here, though, along with the recommendation to allow only a single
interface in STA mode at first!
</para>
!Finclude/net/mac80211.h ieee80211_vif
</chapter>
<chapter id="rx-tx">
<title>Receive and transmit processing</title>
<sect1>
<title>what should be here</title>
<para>TBD</para>
<para>
This should describe the receive and transmit
paths in mac80211/the drivers as well as
transmit status handling.
</para>
</sect1>
<sect1>
<title>Frame format</title>
!Pinclude/net/mac80211.h Frame format
</sect1>
<sect1>
<title>Packet alignment</title>
!Pnet/mac80211/rx.c Packet alignment
</sect1>
<sect1>
<title>Calling into mac80211 from interrupts</title>
!Pinclude/net/mac80211.h Calling mac80211 from interrupts
</sect1>
<sect1>
<title>functions/definitions</title>
!Finclude/net/mac80211.h ieee80211_rx_status
!Finclude/net/mac80211.h mac80211_rx_flags
!Finclude/net/mac80211.h mac80211_tx_info_flags
!Finclude/net/mac80211.h mac80211_tx_control_flags
!Finclude/net/mac80211.h mac80211_rate_control_flags
!Finclude/net/mac80211.h ieee80211_tx_rate
!Finclude/net/mac80211.h ieee80211_tx_info
!Finclude/net/mac80211.h ieee80211_tx_info_clear_status
!Finclude/net/mac80211.h ieee80211_rx
!Finclude/net/mac80211.h ieee80211_rx_ni
!Finclude/net/mac80211.h ieee80211_rx_irqsafe
!Finclude/net/mac80211.h ieee80211_tx_status
!Finclude/net/mac80211.h ieee80211_tx_status_ni
!Finclude/net/mac80211.h ieee80211_tx_status_irqsafe
!Finclude/net/mac80211.h ieee80211_rts_get
!Finclude/net/mac80211.h ieee80211_rts_duration
!Finclude/net/mac80211.h ieee80211_ctstoself_get
!Finclude/net/mac80211.h ieee80211_ctstoself_duration
!Finclude/net/mac80211.h ieee80211_generic_frame_duration
!Finclude/net/mac80211.h ieee80211_wake_queue
!Finclude/net/mac80211.h ieee80211_stop_queue
!Finclude/net/mac80211.h ieee80211_wake_queues
!Finclude/net/mac80211.h ieee80211_stop_queues
!Finclude/net/mac80211.h ieee80211_queue_stopped
</sect1>
</chapter>
<chapter id="filters">
<title>Frame filtering</title>
!Pinclude/net/mac80211.h Frame filtering
!Finclude/net/mac80211.h ieee80211_filter_flags
</chapter>
<chapter id="workqueue">
<title>The mac80211 workqueue</title>
!Pinclude/net/mac80211.h mac80211 workqueue
!Finclude/net/mac80211.h ieee80211_queue_work
!Finclude/net/mac80211.h ieee80211_queue_delayed_work
</chapter>
</part>
<part id="advanced">
<title>Advanced driver interface</title>
<partintro>
<para>
Information contained within this part of the book is
of interest only for advanced interaction of mac80211
with drivers to exploit more hardware capabilities and
improve performance.
</para>
</partintro>
<chapter id="led-support">
<title>LED support</title>
<para>
Mac80211 supports various ways of blinking LEDs. Wherever possible,
device LEDs should be exposed as LED class devices and hooked up to
the appropriate trigger, which will then be triggered appropriately
by mac80211.
</para>
!Finclude/net/mac80211.h ieee80211_get_tx_led_name
!Finclude/net/mac80211.h ieee80211_get_rx_led_name
!Finclude/net/mac80211.h ieee80211_get_assoc_led_name
!Finclude/net/mac80211.h ieee80211_get_radio_led_name
!Finclude/net/mac80211.h ieee80211_tpt_blink
!Finclude/net/mac80211.h ieee80211_tpt_led_trigger_flags
!Finclude/net/mac80211.h ieee80211_create_tpt_led_trigger
</chapter>
<chapter id="hardware-crypto-offload">
<title>Hardware crypto acceleration</title>
!Pinclude/net/mac80211.h Hardware crypto acceleration
<!-- intentionally multiple !F lines to get proper order -->
!Finclude/net/mac80211.h set_key_cmd
!Finclude/net/mac80211.h ieee80211_key_conf
!Finclude/net/mac80211.h ieee80211_key_flags
!Finclude/net/mac80211.h ieee80211_get_tkip_p1k
!Finclude/net/mac80211.h ieee80211_get_tkip_p1k_iv
!Finclude/net/mac80211.h ieee80211_get_tkip_p2k
</chapter>
<chapter id="powersave">
<title>Powersave support</title>
!Pinclude/net/mac80211.h Powersave support
</chapter>
<chapter id="beacon-filter">
<title>Beacon filter support</title>
!Pinclude/net/mac80211.h Beacon filter support
!Finclude/net/mac80211.h ieee80211_beacon_loss
</chapter>
<chapter id="qos">
<title>Multiple queues and QoS support</title>
<para>TBD</para>
!Finclude/net/mac80211.h ieee80211_tx_queue_params
</chapter>
<chapter id="AP">
<title>Access point mode support</title>
<para>TBD</para>
<para>Some parts of the if_conf should be discussed here instead</para>
<para>
Insert notes about VLAN interfaces with hw crypto here or
in the hw crypto chapter.
</para>
<section id="ps-client">
<title>support for powersaving clients</title>
!Pinclude/net/mac80211.h AP support for powersaving clients
!Finclude/net/mac80211.h ieee80211_get_buffered_bc
!Finclude/net/mac80211.h ieee80211_beacon_get
!Finclude/net/mac80211.h ieee80211_sta_eosp
!Finclude/net/mac80211.h ieee80211_frame_release_type
!Finclude/net/mac80211.h ieee80211_sta_ps_transition
!Finclude/net/mac80211.h ieee80211_sta_ps_transition_ni
!Finclude/net/mac80211.h ieee80211_sta_set_buffered
!Finclude/net/mac80211.h ieee80211_sta_block_awake
</section>
</chapter>
<chapter id="multi-iface">
<title>Supporting multiple virtual interfaces</title>
<para>TBD</para>
<para>
Note: WDS with identical MAC address should almost always be OK
</para>
<para>
Insert notes about having multiple virtual interfaces with
different MAC addresses here, note which configurations are
supported by mac80211, add notes about supporting hw crypto
with it.
</para>
!Finclude/net/mac80211.h ieee80211_iterate_active_interfaces
!Finclude/net/mac80211.h ieee80211_iterate_active_interfaces_atomic
</chapter>
<chapter id="station-handling">
<title>Station handling</title>
<para>TODO</para>
!Finclude/net/mac80211.h ieee80211_sta
!Finclude/net/mac80211.h sta_notify_cmd
!Finclude/net/mac80211.h ieee80211_find_sta
!Finclude/net/mac80211.h ieee80211_find_sta_by_ifaddr
</chapter>
<chapter id="hardware-scan-offload">
<title>Hardware scan offload</title>
<para>TBD</para>
!Finclude/net/mac80211.h ieee80211_scan_completed
</chapter>
<chapter id="aggregation">
<title>Aggregation</title>
<sect1>
<title>TX A-MPDU aggregation</title>
!Pnet/mac80211/agg-tx.c TX A-MPDU aggregation
!Cnet/mac80211/agg-tx.c
</sect1>
<sect1>
<title>RX A-MPDU aggregation</title>
!Pnet/mac80211/agg-rx.c RX A-MPDU aggregation
!Cnet/mac80211/agg-rx.c
!Finclude/net/mac80211.h ieee80211_ampdu_mlme_action
</sect1>
</chapter>
<chapter id="smps">
<title>Spatial Multiplexing Powersave (SMPS)</title>
!Pinclude/net/mac80211.h Spatial multiplexing power save
!Finclude/net/mac80211.h ieee80211_request_smps
!Finclude/net/mac80211.h ieee80211_smps_mode
</chapter>
</part>
<part id="rate-control">
<title>Rate control interface</title>
<partintro>
<para>TBD</para>
<para>
This part of the book describes the rate control algorithm
interface and how it relates to mac80211 and drivers.
</para>
</partintro>
<chapter id="ratecontrol-api">
<title>Rate Control API</title>
<para>TBD</para>
!Finclude/net/mac80211.h ieee80211_start_tx_ba_session
!Finclude/net/mac80211.h ieee80211_start_tx_ba_cb_irqsafe
!Finclude/net/mac80211.h ieee80211_stop_tx_ba_session
!Finclude/net/mac80211.h ieee80211_stop_tx_ba_cb_irqsafe
!Finclude/net/mac80211.h ieee80211_rate_control_changed
!Finclude/net/mac80211.h ieee80211_tx_rate_control
!Finclude/net/mac80211.h rate_control_send_low
</chapter>
</part>
<part id="internal">
<title>Internals</title>
<partintro>
<para>TBD</para>
<para>
This part of the book describes mac80211 internals.
</para>
</partintro>
<chapter id="key-handling">
<title>Key handling</title>
<sect1>
<title>Key handling basics</title>
!Pnet/mac80211/key.c Key handling basics
</sect1>
<sect1>
<title>MORE TBD</title>
<para>TBD</para>
</sect1>
</chapter>
<chapter id="rx-processing">
<title>Receive processing</title>
<para>TBD</para>
</chapter>
<chapter id="tx-processing">
<title>Transmit processing</title>
<para>TBD</para>
</chapter>
<chapter id="sta-info">
<title>Station info handling</title>
<sect1>
<title>Programming information</title>
!Fnet/mac80211/sta_info.h sta_info
!Fnet/mac80211/sta_info.h ieee80211_sta_info_flags
</sect1>
<sect1>
<title>STA information lifetime rules</title>
!Pnet/mac80211/sta_info.c STA information lifetime rules
</sect1>
</chapter>
<chapter id="aggregation-internals">
<title>Aggregation</title>
!Fnet/mac80211/sta_info.h sta_ampdu_mlme
!Fnet/mac80211/sta_info.h tid_ampdu_tx
!Fnet/mac80211/sta_info.h tid_ampdu_rx
</chapter>
<chapter id="synchronisation">
<title>Synchronisation</title>
<para>TBD</para>
<para>Locking, lots of RCU</para>
</chapter>
</part>
</book>
</set>


@@ -9,13 +9,11 @@
DOCBOOKS := z8530book.xml \
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
writing_usb_driver.xml networking.xml \
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
kernel-api.xml filesystems.xml lsm.xml kgdb.xml \
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
80211.xml debugobjects.xml sh.xml regulator.xml \
alsa-driver-api.xml writing-an-alsa-driver.xml \
tracepoint.xml w1.xml \
writing_musb_glue_layer.xml crypto-API.xml iio.xml
sh.xml regulator.xml w1.xml \
writing_musb_glue_layer.xml iio.xml
ifeq ($(DOCBOOKS),)
@@ -264,6 +262,7 @@ clean-files := $(DOCBOOKS) \
$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml, $(DOCBOOKS)) \
$(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
$(index)
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man


@@ -1,142 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<!-- ****************************************************** -->
<!-- Header -->
<!-- ****************************************************** -->
<book id="ALSA-Driver-API">
<bookinfo>
<title>The ALSA Driver API</title>
<legalnotice>
<para>
This document is free; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
</para>
<para>
This document is distributed in the hope that it will be useful,
but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the
implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE</emphasis>. See the GNU General Public License
for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter><title>Management of Cards and Devices</title>
<sect1><title>Card Management</title>
!Esound/core/init.c
</sect1>
<sect1><title>Device Components</title>
!Esound/core/device.c
</sect1>
<sect1><title>Module requests and Device File Entries</title>
!Esound/core/sound.c
</sect1>
<sect1><title>Memory Management Helpers</title>
!Esound/core/memory.c
!Esound/core/memalloc.c
</sect1>
</chapter>
<chapter><title>PCM API</title>
<sect1><title>PCM Core</title>
!Esound/core/pcm.c
!Esound/core/pcm_lib.c
!Esound/core/pcm_native.c
!Iinclude/sound/pcm.h
</sect1>
<sect1><title>PCM Format Helpers</title>
!Esound/core/pcm_misc.c
</sect1>
<sect1><title>PCM Memory Management</title>
!Esound/core/pcm_memory.c
</sect1>
<sect1><title>PCM DMA Engine API</title>
!Esound/core/pcm_dmaengine.c
!Iinclude/sound/dmaengine_pcm.h
</sect1>
</chapter>
<chapter><title>Control/Mixer API</title>
<sect1><title>General Control Interface</title>
!Esound/core/control.c
</sect1>
<sect1><title>AC97 Codec API</title>
!Esound/pci/ac97/ac97_codec.c
!Esound/pci/ac97/ac97_pcm.c
</sect1>
<sect1><title>Virtual Master Control API</title>
!Esound/core/vmaster.c
!Iinclude/sound/control.h
</sect1>
</chapter>
<chapter><title>MIDI API</title>
<sect1><title>Raw MIDI API</title>
!Esound/core/rawmidi.c
</sect1>
<sect1><title>MPU401-UART API</title>
!Esound/drivers/mpu401/mpu401_uart.c
</sect1>
</chapter>
<chapter><title>Proc Info API</title>
<sect1><title>Proc Info Interface</title>
!Esound/core/info.c
</sect1>
</chapter>
<chapter><title>Compress Offload</title>
<sect1><title>Compress Offload API</title>
!Esound/core/compress_offload.c
!Iinclude/uapi/sound/compress_offload.h
!Iinclude/uapi/sound/compress_params.h
!Iinclude/sound/compress_driver.h
</sect1>
</chapter>
<chapter><title>ASoC</title>
<sect1><title>ASoC Core API</title>
!Iinclude/sound/soc.h
!Esound/soc/soc-core.c
<!-- !Esound/soc/soc-cache.c no docbook comments here -->
!Esound/soc/soc-devres.c
!Esound/soc/soc-io.c
!Esound/soc/soc-pcm.c
!Esound/soc/soc-ops.c
!Esound/soc/soc-compress.c
</sect1>
<sect1><title>ASoC DAPM API</title>
!Esound/soc/soc-dapm.c
</sect1>
<sect1><title>ASoC DMA Engine API</title>
!Esound/soc/soc-generic-dmaengine-pcm.c
</sect1>
</chapter>
<chapter><title>Miscellaneous Functions</title>
<sect1><title>Hardware-Dependent Devices API</title>
!Esound/core/hwdep.c
</sect1>
<sect1><title>Jack Abstraction Layer API</title>
!Iinclude/sound/jack.h
!Esound/core/jack.c
!Esound/soc/soc-jack.c
</sect1>
<sect1><title>ISA DMA Helpers</title>
!Esound/core/isadma.c
</sect1>
<sect1><title>Other Helper Macros</title>
!Iinclude/sound/core.h
</sect1>
</chapter>
</book>


@@ -1,443 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="debug-objects-guide">
<bookinfo>
<title>Debug objects life time</title>
<authorgroup>
<author>
<firstname>Thomas</firstname>
<surname>Gleixner</surname>
<affiliation>
<address>
<email>tglx@linutronix.de</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2008</year>
<holder>Thomas Gleixner</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
debugobjects is a generic infrastructure to track the life time
of kernel objects and validate the operations on those.
</para>
<para>
debugobjects is useful to check for the following error patterns:
<itemizedlist>
<listitem><para>Activation of uninitialized objects</para></listitem>
<listitem><para>Initialization of active objects</para></listitem>
<listitem><para>Usage of freed/destroyed objects</para></listitem>
</itemizedlist>
</para>
<para>
debugobjects does not change the data structure of the real
object, so it can be compiled in with minimal runtime impact
and enabled on demand with a kernel command line option.
</para>
</chapter>
<chapter id="howto">
<title>How to use debugobjects</title>
<para>
A kernel subsystem needs to provide a data structure which
describes the object type and add calls into the debug code at
appropriate places. The data structure to describe the object
type needs at minimum the name of the object type. Optional
functions can and should be provided to fix up detected problems
so the kernel can continue to work and the debug information can
be retrieved from a live system instead of hard core debugging
with serial consoles and stack trace transcripts from the
monitor.
</para>
<para>
The debug calls provided by debugobjects are:
<itemizedlist>
<listitem><para>debug_object_init</para></listitem>
<listitem><para>debug_object_init_on_stack</para></listitem>
<listitem><para>debug_object_activate</para></listitem>
<listitem><para>debug_object_deactivate</para></listitem>
<listitem><para>debug_object_destroy</para></listitem>
<listitem><para>debug_object_free</para></listitem>
<listitem><para>debug_object_assert_init</para></listitem>
</itemizedlist>
Each of these functions takes the address of the real object and
a pointer to the object type specific debug description
structure.
</para>
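<para>
The placement of these calls can be sketched in plain C. The block
below is a userspace illustration only, not kernel code:
model_debug_call(), struct model_descr and the foo_timer object are
hypothetical stand-ins that merely record which call was made, and
each stand-in takes the object address and the type description just
as the text describes.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the object type description; the name is the required minimum. */
struct model_descr {
	const char *name;
};

/* Which debug call was made; a stand-in for the debug_object_* hooks. */
enum model_call { CALL_INIT, CALL_ACTIVATE, CALL_DEACTIVATE, CALL_FREE };

#define MODEL_LOG_MAX 16
enum model_call model_log[MODEL_LOG_MAX];
size_t model_log_len;

/* Every hook receives the object address and the type description. */
void model_debug_call(enum model_call call, void *addr,
		      struct model_descr *descr)
{
	assert(addr != NULL && descr != NULL && descr->name != NULL);
	if (model_log_len < MODEL_LOG_MAX)
		model_log[model_log_len++] = call;
}

/* Hypothetical subsystem object; tracking does not change its layout. */
struct foo_timer {
	int armed;
};

struct model_descr foo_timer_descr = { .name = "foo_timer" };

void foo_timer_init(struct foo_timer *t)
{
	model_debug_call(CALL_INIT, t, &foo_timer_descr);
	t->armed = 0;
}

void foo_timer_start(struct foo_timer *t)
{
	model_debug_call(CALL_ACTIVATE, t, &foo_timer_descr);
	t->armed = 1;
}

void foo_timer_stop(struct foo_timer *t)
{
	model_debug_call(CALL_DEACTIVATE, t, &foo_timer_descr);
	t->armed = 0;
}

void foo_timer_release(struct foo_timer *t)
{
	/* Called before the memory backing the object is given back. */
	model_debug_call(CALL_FREE, t, &foo_timer_descr);
}
```

<para>
Calling foo_timer_init(), foo_timer_start(), foo_timer_stop() and
foo_timer_release() in that order leaves the four calls in model_log
in the same order, which is the sequence a tracker would expect for a
well-behaved object.
</para>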
<para>
Each detected error is reported in the statistics and a limited
number of errors are printk'ed including a full stack trace.
</para>
<para>
The statistics are available via /sys/kernel/debug/debug_objects/stats.
They provide information about the number of warnings and the
number of successful fixups along with information about the
usage of the internal tracking objects and the state of the
internal tracking objects pool.
</para>
</chapter>
<chapter id="debugfunctions">
<title>Debug functions</title>
<sect1 id="prototypes">
<title>Debug object function reference</title>
!Elib/debugobjects.c
</sect1>
<sect1 id="debug_object_init">
<title>debug_object_init</title>
<para>
This function is called whenever the initialization function
of a real object is called.
</para>
<para>
When the real object is already tracked by debugobjects, debugobjects
checks whether the object can be initialized. Initializing
is not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the fixup_init
function of the object type description structure if provided
by the caller. The fixup function can correct the problem
before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to
the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects,
debugobjects allocates a tracker object for the real object
and sets the tracker object state to ODEBUG_STATE_INIT. It
verifies that the object is not on the caller's stack. If it is
on the caller's stack then a limited number of warnings
including a full stack trace is printk'ed. The calling code
must use debug_object_init_on_stack() and remove the object
before leaving the function which allocated it. See next
section.
</para>
</sect1>
<sect1 id="debug_object_init_on_stack">
<title>debug_object_init_on_stack</title>
<para>
This function is called whenever the initialization function
of a real object which resides on the stack is called.
</para>
<para>
When the real object is already tracked by debugobjects, debugobjects
checks whether the object can be initialized. Initializing
is not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the fixup_init
function of the object type description structure if provided
by the caller. The fixup function can correct the problem
before the real initialization of the object happens. E.g. it
can deactivate an active object in order to prevent damage to
the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects,
debugobjects allocates a tracker object for the real object
and sets the tracker object state to ODEBUG_STATE_INIT. It
verifies that the object is on the caller's stack.
</para>
<para>
An object which is on the stack must be removed from the
tracker by calling debug_object_free() before the function
which allocates the object returns. Otherwise we keep track of
stale objects.
</para>
</sect1>
<sect1 id="debug_object_activate">
<title>debug_object_activate</title>
<para>
This function is called whenever the activation function of a
real object is called.
</para>
<para>
When the real object is already tracked by debugobjects, debugobjects
checks whether the object can be activated. Activating is
not allowed for active and destroyed objects. When
debugobjects detects an error, then it calls the
fixup_activate function of the object type description
structure if provided by the caller. The fixup function can
correct the problem before the real activation of the object
happens. E.g. it can deactivate an active object in order to
prevent damage to the subsystem.
</para>
<para>
When the real object is not yet tracked by debugobjects then
the fixup_activate function is called if available. This is
necessary to allow the legitimate activation of statically
allocated and initialized objects. The fixup function checks
whether the object is valid and calls the debug_object_init()
function to initialize the tracking of this object.
</para>
<para>
When the activation is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_ACTIVE.
</para>
</sect1>
<sect1 id="debug_object_deactivate">
<title>debug_object_deactivate</title>
<para>
This function is called whenever the deactivation function of
a real object is called.
</para>
<para>
When the real object is tracked by debugobjects, debugobjects checks
whether the object can be deactivated. Deactivating is not
allowed for untracked or destroyed objects.
</para>
<para>
When the deactivation is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_INACTIVE.
</para>
</sect1>
<sect1 id="debug_object_destroy">
<title>debug_object_destroy</title>
<para>
This function is called to mark an object destroyed. This is
useful to prevent the usage of invalid objects, which are
still available in memory: either statically allocated objects
or objects which are freed later.
</para>
<para>
When the real object is tracked by debugobjects, debugobjects checks
whether the object can be destroyed. Destruction is not
allowed for active and destroyed objects. When debugobjects
detects an error, then it calls the fixup_destroy function of
the object type description structure if provided by the
caller. The fixup function can correct the problem before the
real destruction of the object happens. E.g. it can deactivate
an active object in order to prevent damage to the subsystem.
</para>
<para>
When the destruction is legitimate, then the state of the
associated tracker object is set to ODEBUG_STATE_DESTROYED.
</para>
</sect1>
<sect1 id="debug_object_free">
<title>debug_object_free</title>
<para>
This function is called before an object is freed.
</para>
<para>
When the real object is tracked by debugobjects, debugobjects checks
whether the object can be freed. Freeing is not allowed for
active objects. When debugobjects detects an error, then it
calls the fixup_free function of the object type description
structure if provided by the caller. The fixup function can
correct the problem before the real free of the object
happens. E.g. it can deactivate an active object in order to
prevent damage to the subsystem.
</para>
<para>
Note that debug_object_free removes the object from the
tracker. Later usage of the object is detected by the other
debug checks.
</para>
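<para>
The rules stated in this and the preceding sections can be condensed
into a small state machine. The following is a userspace sketch under
stated assumptions, not the code in lib/debugobjects.c: the MODEL_*
names and model_apply() are hypothetical, a false return stands for
"a warning is issued and the matching fixup function would be
called", and the handling of an untracked object in the destroy case
is an assumption because the text does not spell it out.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tracker states; MODEL_NOTAVAILABLE means "not tracked". */
enum model_state {
	MODEL_NOTAVAILABLE,
	MODEL_INIT,
	MODEL_INACTIVE,
	MODEL_ACTIVE,
	MODEL_DESTROYED,
};

enum model_op {
	MODEL_OP_INIT,
	MODEL_OP_ACTIVATE,
	MODEL_OP_DEACTIVATE,
	MODEL_OP_DESTROY,
	MODEL_OP_FREE,
};

/*
 * Apply op to *state. Returns true and updates *state when the text
 * calls the operation legitimate; returns false and leaves *state
 * untouched when a warning or fixup would be triggered instead.
 */
bool model_apply(enum model_state *state, enum model_op op)
{
	assert(state != NULL);

	switch (op) {
	case MODEL_OP_INIT:
		/* Not allowed for active and destroyed objects. */
		if (*state == MODEL_ACTIVE || *state == MODEL_DESTROYED)
			return false;
		*state = MODEL_INIT;
		return true;
	case MODEL_OP_ACTIVATE:
		/* Untracked objects go through fixup_activate instead. */
		if (*state == MODEL_NOTAVAILABLE || *state == MODEL_ACTIVE ||
		    *state == MODEL_DESTROYED)
			return false;
		*state = MODEL_ACTIVE;
		return true;
	case MODEL_OP_DEACTIVATE:
		/* Not allowed for untracked or destroyed objects. */
		if (*state == MODEL_NOTAVAILABLE || *state == MODEL_DESTROYED)
			return false;
		*state = MODEL_INACTIVE;
		return true;
	case MODEL_OP_DESTROY:
		/* Not allowed for active and destroyed objects. */
		if (*state == MODEL_ACTIVE || *state == MODEL_DESTROYED)
			return false;
		/* Assumption: an untracked object is simply left alone. */
		if (*state != MODEL_NOTAVAILABLE)
			*state = MODEL_DESTROYED;
		return true;
	case MODEL_OP_FREE:
		/* Not allowed for active objects; otherwise stop tracking. */
		if (*state == MODEL_ACTIVE)
			return false;
		*state = MODEL_NOTAVAILABLE;
		return true;
	}
	return false;
}
```

<para>
Starting from MODEL_NOTAVAILABLE, the sequence init, activate,
deactivate, destroy, free is accepted at every step, while a second
init or a free issued while the object is active is rejected.
</para>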
</sect1>
<sect1 id="debug_object_assert_init">
<title>debug_object_assert_init</title>
<para>
This function is called to assert that an object has been
initialized.
</para>
<para>
When the real object is not tracked by debugobjects, it calls
fixup_assert_init of the object type description structure
provided by the caller, with the hardcoded object state
ODEBUG_STATE_NOTAVAILABLE. The fixup function can correct the problem
by calling debug_object_init and other specific initializing
functions.
</para>
<para>
When the real object is already tracked by debugobjects it is
ignored.
</para>
</sect1>
</chapter>
<chapter id="fixupfunctions">
<title>Fixup functions</title>
<sect1 id="debug_obj_descr">
<title>Debug object type description structure</title>
!Iinclude/linux/debugobjects.h
</sect1>
<sect1 id="fixup_init">
<title>fixup_init</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_init is detected. The function takes the
address of the object and the state which is currently
recorded in the tracker.
</para>
<para>
Called from debug_object_init when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that the function needs to call the debug_object_init()
function again, after the damage has been repaired in order to
keep the state consistent.
</para>
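<para>
A fixup_init callback following these rules might look like the
userspace sketch below. It is an illustration under assumptions, not
the kernel interface: struct model_tracker is a hypothetical one-slot
stand-in for the tracker, model_object_init() stands in for
debug_object_init(), and the extra tracker argument exists only
because the model has no global tracker state.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_NOTAVAILABLE, MODEL_INIT, MODEL_INACTIVE, MODEL_ACTIVE };

/* Hypothetical real object. */
struct foo {
	bool running;
};

/* One-slot stand-in for the tracker, kept outside the real object. */
struct model_tracker {
	void *addr;
	enum model_state state;
	unsigned int warnings;
	unsigned int fixups;
};

bool foo_fixup_init(struct model_tracker *trk, void *addr,
		    enum model_state state);

/* Stand-in for debug_object_init(): re-initialising an active object is an error. */
void model_object_init(struct model_tracker *trk, void *addr)
{
	assert(trk != NULL && addr != NULL);

	if (trk->addr == addr && trk->state == MODEL_ACTIVE) {
		trk->warnings++;
		/* The return value only feeds the statistics. */
		if (foo_fixup_init(trk, addr, trk->state))
			trk->fixups++;
		return;
	}
	trk->addr = addr;
	trk->state = MODEL_INIT;
}

/* Receives the object address and the state recorded in the tracker. */
bool foo_fixup_init(struct model_tracker *trk, void *addr,
		    enum model_state state)
{
	struct foo *f = addr;

	if (state != MODEL_ACTIVE)
		return false;

	/* Repair the damage: stop the object that is still active. */
	f->running = false;
	trk->state = MODEL_INACTIVE;

	/* Call the init hook again so the tracked state stays consistent. */
	model_object_init(trk, addr);
	return true;
}
```

<para>
Initializing an object that the tracker records as active therefore
counts one warning and one successful fixup, and leaves the object
stopped and tracked in the initialized state.
</para>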
</sect1>
<sect1 id="fixup_activate">
<title>fixup_activate</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_activate is detected.
</para>
<para>
Called from debug_object_activate when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_NOTAVAILABLE</para></listitem>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that the function needs to call the debug_object_activate()
function again after the damage has been repaired in order to
keep the state consistent.
</para>
<para>
The activation of statically initialized objects is a special
case. When debug_object_activate() has no tracked object for
this object address then fixup_activate() is called with
object state ODEBUG_STATE_NOTAVAILABLE. The fixup function
needs to check whether this is a legitimate case of a
statically initialized object or not. If it is, it calls
debug_object_init() and debug_object_activate() to make the
object known to the tracker and marked active. In this case
the function should return false because this is not a real
fixup.
</para>
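<para>
The special case for statically initialized objects can be sketched
as follows. This is a userspace model under assumptions, not kernel
code: FOO_STATIC_MAGIC is a hypothetical marker that a static
initializer would set, struct model_tracker is a one-slot stand-in
for the tracker, and model_object_init() and model_object_activate()
stand in for debug_object_init() and debug_object_activate().
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_NOTAVAILABLE, MODEL_INIT, MODEL_INACTIVE, MODEL_ACTIVE };

/* Hypothetical marker written by the object's static initializer. */
#define FOO_STATIC_MAGIC 0x5a5au

struct foo {
	unsigned int magic;
};

/* One-slot stand-in for the tracker. */
struct model_tracker {
	enum model_state state;
	unsigned int fixups;
};

bool foo_fixup_activate(struct model_tracker *trk, void *addr,
			enum model_state state);

void model_object_init(struct model_tracker *trk, void *addr)
{
	assert(trk != NULL && addr != NULL);
	trk->state = MODEL_INIT;
}

void model_object_activate(struct model_tracker *trk, void *addr)
{
	assert(trk != NULL && addr != NULL);

	/* Untracked and already active objects take the fixup path. */
	if (trk->state == MODEL_NOTAVAILABLE || trk->state == MODEL_ACTIVE) {
		if (foo_fixup_activate(trk, addr, trk->state))
			trk->fixups++;
		return;
	}
	trk->state = MODEL_ACTIVE;
}

bool foo_fixup_activate(struct model_tracker *trk, void *addr,
			enum model_state state)
{
	struct foo *f = addr;

	switch (state) {
	case MODEL_NOTAVAILABLE:
		if (f->magic == FOO_STATIC_MAGIC) {
			/* Legitimate static object: make it known and active. */
			model_object_init(trk, addr);
			model_object_activate(trk, addr);
		}
		/* Either way nothing was repaired, so this is not a fixup. */
		return false;
	case MODEL_ACTIVE:
		/* Double activation: deactivate, then activate again. */
		trk->state = MODEL_INACTIVE;
		model_object_activate(trk, addr);
		return true;
	default:
		return false;
	}
}
```

<para>
Activating an untracked object that carries the marker ends with the
object tracked as active and no fixup counted, whereas an untracked
object without the marker stays untracked.
</para>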
</sect1>
<sect1 id="fixup_destroy">
<title>fixup_destroy</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_destroy is detected.
</para>
<para>
Called from debug_object_destroy when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
</sect1>
<sect1 id="fixup_free">
<title>fixup_free</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_free is detected. Further it can be called
from the debug checks in kfree/vfree, when an active object is
detected from the debug_check_no_obj_freed() sanity checks.
</para>
<para>
Called from debug_object_free() or debug_check_no_obj_freed()
when the object state is:
<itemizedlist>
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
</itemizedlist>
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
</sect1>
<sect1 id="fixup_assert_init">
<title>fixup_assert_init</title>
<para>
This function is called from the debug code whenever a problem
in debug_object_assert_init is detected.
</para>
<para>
Called from debug_object_assert_init() with a hardcoded state
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the
debug bucket.
</para>
<para>
The function returns true when the fixup was successful,
otherwise false. The return value is used to update the
statistics.
</para>
<para>
Note that this function should make sure debug_object_init() is
called before returning.
</para>
<para>
The handling of statically initialized objects is a special
case. The fixup function should check whether this is a legitimate
case of a statically initialized object. If it is, only
debug_object_init() should be called to make the object known to
the tracker, and the function should return false because this
is not a real fixup.
</para>
</sect1>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None (knock on wood).
</para>
</chapter>
</book>


@@ -483,7 +483,7 @@ printk(KERN_INFO "my ip: %pI4\n", &amp;ipaddress);
<function>get_user()</function>
/
<function>put_user()</function>
<filename class="headerfile">include/asm/uaccess.h</filename>
<filename class="headerfile">include/linux/uaccess.h</filename>
</title>
<para>
@@ -1208,8 +1208,8 @@ static struct block_device_operations opt_fops = {
<listitem>
<para>
Finally, don't forget to read <filename>Documentation/SubmittingPatches</filename>
and possibly <filename>Documentation/SubmittingDrivers</filename>.
Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename>
and possibly <filename>Documentation/process/submitting-drivers.rst</filename>.
</para>
</listitem>
</itemizedlist>


@@ -1,112 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Tracepoints">
<bookinfo>
<title>The Linux Kernel Tracepoint API</title>
<authorgroup>
<author>
<firstname>Jason</firstname>
<surname>Baron</surname>
<affiliation>
<address>
<email>jbaron@redhat.com</email>
</address>
</affiliation>
</author>
<author>
<firstname>William</firstname>
<surname>Cohen</surname>
<affiliation>
<address>
<email>wcohen@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
Tracepoints are static probe points that are located in strategic points
throughout the kernel. 'Probes' register/unregister with tracepoints
via a callback mechanism. The 'probes' are strictly typed functions that
are passed a unique set of parameters defined by each tracepoint.
</para>
<para>
From this simple callback mechanism, 'probes' can be used to profile, debug,
and understand kernel behavior. There are a number of tools that provide a
framework for using 'probes'. These tools include Systemtap, ftrace, and
LTTng.
</para>
<para>
Tracepoints are defined in a number of header files via various macros. Thus,
the purpose of this document is to provide a clear accounting of the available
tracepoints. The intention is to understand not only what tracepoints are
available but also to understand where future tracepoints might be added.
</para>
<para>
The API presented has functions of the form:
<function>trace_tracepointname(function parameters)</function>. These are the
tracepoint callbacks that are found throughout the code. Registering and
unregistering probes with these callback sites is covered in the
<filename>Documentation/trace/*</filename> directory.
</para>
</chapter>
<chapter id="irq">
<title>IRQ</title>
!Iinclude/trace/events/irq.h
</chapter>
<chapter id="signal">
<title>SIGNAL</title>
!Iinclude/trace/events/signal.h
</chapter>
<chapter id="block">
<title>Block IO</title>
!Iinclude/trace/events/block.h
</chapter>
<chapter id="workqueue">
<title>Workqueue</title>
!Iinclude/trace/events/workqueue.h
</chapter>
</book>

View File

@@ -45,6 +45,13 @@ GPL version 2.
</abstract>
<revhistory>
<revision>
<revnumber>0.10</revnumber>
<date>2016-10-17</date>
<authorinitials>sch</authorinitials>
<revremark>Added generic hyperv driver
</revremark>
</revision>
<revision>
<revnumber>0.9</revnumber>
<date>2009-07-16</date>
@@ -1033,6 +1040,61 @@ int main()
</chapter>
<chapter id="uio_hv_generic" xreflabel="Using Generic driver for Hyper-V VMBUS">
<?dbhtml filename="uio_hv_generic.html"?>
<title>Generic Hyper-V UIO driver</title>
<para>
The generic driver is a kernel module named uio_hv_generic.
It supports devices on the Hyper-V VMBus similar to uio_pci_generic
on PCI bus.
</para>
<sect1 id="uio_hv_generic_binding">
<title>Making the driver recognize the device</title>
<para>
Since the driver does not declare any device GUIDs, it will not get loaded
automatically and will not automatically bind to any devices. You must load it
and allocate an ID to the driver yourself. For example, to use the network device
GUID:
<programlisting>
modprobe uio_hv_generic
echo &quot;f8615163-df3e-46c5-913f-f2d2f965ed0e&quot; &gt; /sys/bus/vmbus/drivers/uio_hv_generic/new_id
</programlisting>
</para>
<para>
If there already is a hardware-specific kernel driver for the device, the
generic driver still won't bind to it. In this case, if you want to use the
generic driver (why would you?), you'll have to manually unbind the
hardware-specific driver and bind the generic driver, like this:
<programlisting>
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 &gt; /sys/bus/vmbus/drivers/hv_netvsc/unbind
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 &gt; /sys/bus/vmbus/drivers/uio_hv_generic/bind
</programlisting>
</para>
<para>
You can verify that the device has been bound to the driver
by looking for it in sysfs, for example like the following:
<programlisting>
ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver
</programlisting>
If successful, this should print:
<programlisting>
.../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -&gt; ../../../bus/vmbus/drivers/uio_hv_generic
</programlisting>
</para>
</sect1>
<sect1 id="uio_hv_generic_internals">
<title>Things to know about uio_hv_generic</title>
<para>
On each interrupt, uio_hv_generic sets the Interrupt Disable bit.
This prevents the device from generating further interrupts
until the bit is cleared. The userspace driver should clear this
bit before blocking and waiting for more interrupts.
</para>
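<para>
As a rough sketch of the wait step described above, a userspace driver
typically blocks in read() on its UIO device file; the general UIO
framework returns a 4-byte interrupt count from that read. The helper name
<function>uio_wait_irq</function> is hypothetical, and the device-specific
step that clears the Interrupt Disable bit is an assumption left out here
because this text does not say how it is done.
<programlisting>#include &lt;stdint.h&gt;
#include &lt;unistd.h&gt;

/*
 * Block until the next interrupt on an open UIO device file.
 * Returns 0 and stores the interrupt count, or -1 on a short read or error.
 * The caller is expected to have re-enabled interrupts beforehand
 * (device specific, not shown).
 */
static int uio_wait_irq(int fd, uint32_t *count)
{
	ssize_t n = read(fd, count, sizeof(*count));

	return n == (ssize_t)sizeof(*count) ? 0 : -1;
}</programlisting>
A driver loop would re-enable interrupts, call this helper, and then
service the device before repeating.
</para>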
</sect1>
</chapter>
<appendix id="app1">
<title>Further information</title>
<itemizedlist>

View File

@@ -1,992 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Linux-USB-API">
<bookinfo>
<title>The Linux-USB Host Side API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction to USB on Linux</title>
<para>A Universal Serial Bus (USB) is used to connect a host,
such as a PC or workstation, to a number of peripheral
devices. USB uses a tree structure, with the host as the
root (the system's master), hubs as interior nodes, and
peripherals as leaves (and slaves).
Modern PCs support several such trees of USB devices, usually
one USB 2.0 tree (480 Mbit/sec each) with
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
connect a USB 1.1 device directly to the machine's "root hub".
</para>
<para>That master/slave asymmetry was designed-in for a number of
reasons, one being ease of use. It is not physically possible to
assemble (legal) USB cables incorrectly: all upstream "to the host"
connectors are the rectangular type (matching the sockets on
root hubs), and all downstream connectors are the squarish type
(or they are built into the peripheral).
Also, the host software doesn't need to deal with distributed
auto-configuration since the pre-designated master node manages all that.
And finally, at the electrical level, bus protocol overhead is reduced by
eliminating arbitration and moving scheduling into the host software.
</para>
<para>USB 1.0 was announced in January 1996 and was revised
as USB 1.1 (with improvements in hub specification and
support for interrupt-out transfers) in September 1998.
USB 2.0 was released in April 2000, adding high-speed
transfers and transaction-translating hubs (used for USB 1.1
and 1.0 backward compatibility).
</para>
<para>Kernel developers added USB support to Linux early in the 2.2 kernel
series, shortly before 2.3 development forked. Updates from 2.3 were
regularly folded back into 2.2 releases, which improved reliability and
brought <filename>/sbin/hotplug</filename> support as well as more drivers.
Such improvements were continued in the 2.5 kernel series, where they added
USB 2.0 support, improved performance, and made the host controller drivers
(HCDs) more consistent. They also simplified the API (to make bugs less
likely) and added internal "kerneldoc" documentation.
</para>
<para>Linux can run inside USB devices as well as on
the hosts that control the devices.
But USB device drivers running inside those peripherals
don't do the same things as the ones running inside hosts,
so they've been given a different name:
<emphasis>gadget drivers</emphasis>.
This document does not cover gadget drivers.
</para>
</chapter>
<chapter id="host">
<title>USB Host-Side API Model</title>
<para>Host-side drivers for USB devices talk to the "usbcore" APIs.
There are two. One is intended for
<emphasis>general-purpose</emphasis> drivers (exposed through
driver frameworks), and the other is for drivers that are
<emphasis>part of the core</emphasis>.
Such core drivers include the <emphasis>hub</emphasis> driver
(which manages trees of USB devices) and several different kinds
of <emphasis>host controller drivers</emphasis>,
which control individual busses.
</para>
<para>The device model seen by USB drivers is relatively complex.
</para>
<itemizedlist>
<listitem><para>USB supports four kinds of data transfers
(control, bulk, interrupt, and isochronous). Two of them (control
and bulk) use bandwidth as it's available,
while the other two (interrupt and isochronous)
are scheduled to provide guaranteed bandwidth.
</para></listitem>
<listitem><para>The device description model includes one or more
"configurations" per device, only one of which is active at a time.
Devices that are capable of high-speed operation must also support
full-speed configurations, along with a way to ask about the
"other speed" configurations which might be used.
</para></listitem>
<listitem><para>Configurations have one or more "interfaces", each
of which may have "alternate settings". Interfaces may be
standardized by USB "Class" specifications, or may be specific to
a vendor or device.</para>
<para>USB device drivers actually bind to interfaces, not devices.
Think of them as "interface drivers", though you
may not see many devices where the distinction is important.
<emphasis>Most USB devices are simple, with only one configuration,
one interface, and one alternate setting.</emphasis>
</para></listitem>
<listitem><para>Interfaces have one or more "endpoints", each of
which supports one type and direction of data transfer such as
"bulk out" or "interrupt in". The entire configuration may have
up to sixteen endpoints in each direction, allocated as needed
among all the interfaces.
</para></listitem>
<listitem><para>Data transfer on USB is packetized; each endpoint
has a maximum packet size.
Drivers must often be aware of conventions such as flagging the end
of bulk transfers using "short" (including zero length) packets.
</para></listitem>
<listitem><para>The Linux USB API supports synchronous calls for
control and bulk messages.
It also supports asynchronous calls for all kinds of data transfer,
using request structures called "URBs" (USB Request Blocks).
</para></listitem>
</itemizedlist>
<para>Accordingly, the USB Core API exposed to device drivers
covers quite a lot of territory. You'll probably need to consult
the USB 2.0 specification, available online from www.usb.org at
no cost, as well as class or device specifications.
</para>
<para>The only host-side drivers that actually touch hardware
(reading/writing registers, handling IRQs, and so on) are the HCDs.
In theory, all HCDs provide the same functionality through the same
API. In practice, that's becoming more true on the 2.5 kernels,
but there are still differences that crop up especially with
fault handling. Different controllers don't necessarily report
the same aspects of failures, and recovery from faults (including
software-induced ones like unlinking an URB) isn't yet fully
consistent.
Device driver authors should make a point of doing disconnect
testing (while the device is active) with each different host
controller driver, to make sure drivers don't have bugs of
their own as well as to make sure they aren't relying on some
HCD-specific behavior.
(You will need external USB 1.1 and/or
USB 2.0 hubs to perform all those tests.)
</para>
</chapter>
<chapter id="types"><title>USB-Standard Types</title>
<para>In <filename>&lt;linux/usb/ch9.h&gt;</filename> you will find
the USB data types defined in chapter 9 of the USB specification.
These data types are used throughout USB, and in APIs including
this host side API, gadget APIs, and usbfs.
</para>
!Iinclude/linux/usb/ch9.h
</chapter>
<chapter id="hostside"><title>Host-Side Data Types and Macros</title>
<para>The host side API exposes several layers to drivers, some of
which are more necessary than others.
These support lifecycle models for host side drivers
and devices, and support passing buffers through usbcore to
some HCD that performs the I/O for the device driver.
</para>
!Iinclude/linux/usb.h
</chapter>
<chapter id="usbcore"><title>USB Core APIs</title>
<para>There are two basic I/O models in the USB API.
The most elemental one is asynchronous: drivers submit requests
in the form of an URB, and the URB's completion callback
handles the next step.
All USB transfer types support that model, although there
are special cases for control URBs (which always have setup
and status stages, but may not have a data stage) and
isochronous URBs (which allow large packets and include
per-packet fault reports).
Built on top of that is synchronous API support, where a
driver calls a routine that allocates one or more URBs,
submits them, and waits until they complete.
There are synchronous wrappers for single-buffer control
and bulk transfers (which are awkward to use in some
driver disconnect scenarios), and for scatterlist based
streaming i/o (bulk or interrupt).
</para>
<para>USB drivers need to provide buffers that can be
used for DMA, although they don't necessarily need to
provide the DMA mapping themselves.
There are APIs to use when allocating DMA buffers,
which can prevent use of bounce buffers on some systems.
In some cases, drivers may be able to rely on 64bit DMA
to eliminate another kind of bounce buffer.
</para>
!Edrivers/usb/core/urb.c
!Edrivers/usb/core/message.c
!Edrivers/usb/core/file.c
!Edrivers/usb/core/driver.c
!Edrivers/usb/core/usb.c
!Edrivers/usb/core/hub.c
</chapter>
<chapter id="hcd"><title>Host Controller APIs</title>
<para>These APIs are only for use by host controller drivers,
most of which implement standard register interfaces such as
EHCI, OHCI, or UHCI.
UHCI was one of the first interfaces, designed by Intel and
also used by VIA; it doesn't do much in hardware.
OHCI was designed later, to have the hardware do more work
(bigger transfers, tracking protocol state, and so on).
EHCI was designed with USB 2.0; its design has features that
resemble OHCI (hardware does much more work) as well as
UHCI (some parts of ISO support, TD list processing).
</para>
<para>There are host controllers other than the "big three",
although most PCI based controllers (and a few non-PCI based
ones) use one of those interfaces.
Not all host controllers use DMA; some use PIO, and there
is also a simulator.
</para>
<para>The same basic APIs are available to drivers for all
those controllers.
For historical reasons they are in two layers:
<structname>struct usb_bus</structname> is a rather thin
layer that became available in the 2.2 kernels, while
<structname>struct usb_hcd</structname> is a more featureful
layer (available in later 2.4 kernels and in 2.5) that
lets HCDs share common code, to shrink driver size
and significantly reduce hcd-specific behaviors.
</para>
!Edrivers/usb/core/hcd.c
!Edrivers/usb/core/hcd-pci.c
!Idrivers/usb/core/buffer.c
</chapter>
<chapter id="usbfs">
<title>The USB Filesystem (usbfs)</title>
<para>This chapter presents the Linux <emphasis>usbfs</emphasis>.
You may prefer to avoid writing new kernel code for your
USB driver; that's the problem that usbfs set out to solve.
User mode device drivers are usually packaged as applications
or libraries, and may use usbfs through some programming library
that wraps it. Such libraries include
<ulink url="http://libusb.sourceforge.net">libusb</ulink>
for C/C++, and
<ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java.
</para>
<note><title>Unfinished</title>
<para>This particular documentation is incomplete,
especially with respect to the asynchronous mode.
As of kernel 2.5.66 the code and this (new) documentation
need to be cross-reviewed.
</para>
</note>
<para>Configure usbfs into Linux kernels by enabling the
<emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS),
and you get basic support for user mode USB device drivers.
Until relatively recently it was often (confusingly) called
<emphasis>usbdevfs</emphasis> although it wasn't solving what
<emphasis>devfs</emphasis> was.
Every USB device will appear in usbfs, regardless of whether or
not it has a kernel driver.
</para>
<sect1 id="usbfs-files">
<title>What files are in "usbfs"?</title>
<para>Conventionally mounted at
<filename>/proc/bus/usb</filename>, usbfs
features include:
<itemizedlist>
<listitem><para><filename>/proc/bus/usb/devices</filename>
... a text file
showing each of the USB devices known to the kernel,
and their configuration descriptors.
You can also poll() this to learn about new devices.
</para></listitem>
<listitem><para><filename>/proc/bus/usb/BBB/DDD</filename>
... magic files
exposing each device's configuration descriptors, and
supporting a series of ioctls for making device requests,
including I/O to devices. (Purely for access by programs.)
</para></listitem>
</itemizedlist>
</para>
<para> Each bus is given a number (BBB) based on when it was
enumerated; within each bus, each device is given a similar
number (DDD).
Those BBB/DDD paths are not "stable" identifiers;
expect them to change even if you always leave the devices
plugged in to the same hub port.
<emphasis>Don't even think of saving these in application
configuration files.</emphasis>
Stable identifiers are available, for user mode applications
that want to use them. HID and networking devices expose
these stable IDs, so that for example you can be sure that
you told the right UPS to power down its second server.
"usbfs" doesn't (yet) expose those IDs.
</para>
</sect1>
<sect1 id="usbfs-fstab">
<title>Mounting and Access Control</title>
<para>There are a number of mount options for usbfs, which will
be of most interest to you if you need to override the default
access control policy.
That policy is that only root may read or write device files
(<filename>/proc/bus/usb/BBB/DDD</filename>) although anyone may read
the <filename>devices</filename>
or <filename>drivers</filename> files.
I/O requests to the device also need the CAP_SYS_RAWIO capability.
</para>
<para>The significance of that is that by default, all user mode
device drivers need super-user privileges.
You can change modes or ownership in a driver setup
when the device hotplugs, or maybe just start the
driver right then, as a privileged server (or some activity
within one).
That's the most secure approach for multi-user systems,
but for single user systems ("trusted" by that user)
it's more convenient just to grant everyone all access
(using the <emphasis>devmode=0666</emphasis> option)
so the driver can start whenever it's needed.
</para>
<para>The mount options for usbfs, usable in /etc/fstab or
in command line invocations of <emphasis>mount</emphasis>, are:
<variablelist>
<varlistentry>
<term><emphasis>busgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>busmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB
directories. (Default: 0555)
</para></listitem></varlistentry>
<varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB
directories. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0644)</para></listitem></varlistentry>
<varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/BBB/DDD
files. (Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term>
<listitem><para>Controls the GID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listmode</emphasis>=MMM</term>
<listitem><para>Controls the file mode used for the
/proc/bus/usb/devices and drivers files.
(Default: 0444)</para></listitem></varlistentry>
<varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term>
<listitem><para>Controls the UID used for the
/proc/bus/usb/devices and drivers files.
(Default: 0)</para></listitem></varlistentry>
</variablelist>
</para>
<para>Note that many Linux distributions hard-wire the mount options
for usbfs in their init scripts, such as
<filename>/etc/rc.d/rc.sysinit</filename>,
rather than making it easy to set this per-system
policy in <filename>/etc/fstab</filename>.
</para>
</sect1>
<sect1 id="usbfs-devices">
<title>/proc/bus/usb/devices</title>
<para>This file is handy for status viewing tools in user
mode, which can scan the text format and ignore most of it.
More detailed device status (including class and vendor
status) is available from device-specific files.
For information about the current format of this file,
see the
<filename>Documentation/usb/proc_usb_info.txt</filename>
file in your Linux kernel sources.
</para>
<para>This file, in combination with the poll() system call, can
also be used to detect when devices are added or removed:
<programlisting>int fd;
struct pollfd pfd;

fd = open("/proc/bus/usb/devices", O_RDONLY);
pfd.fd = fd;
pfd.events = POLLIN;
pfd.revents = 0;
for (;;) {
	/* The first time through, this call will return immediately. */
	poll(&amp;pfd, 1, -1);
	/* To see what's changed, compare the file's previous and current
	   contents or scan the filesystem. (Scanning is more precise.) */
}</programlisting>
Note that this behavior is intended to be used for informational
and debug purposes. It would be more appropriate to use programs
such as udev or HAL to initialize a device or start a user-mode
helper program, for instance.
</para>
</sect1>
<sect1 id="usbfs-bbbddd">
<title>/proc/bus/usb/BBB/DDD</title>
<para>Use these files in one of these basic ways:
</para>
<para><emphasis>They can be read,</emphasis>
producing first the device descriptor
(18 bytes) and then the descriptors for the current configuration.
See the USB 2.0 spec for details about those binary data formats.
You'll need to convert most multibyte values from little endian
format to your native host byte order, although a few of the
fields in the device descriptor (both of the BCD-encoded fields,
and the vendor and product IDs) will be byteswapped for you.
Note that configuration descriptors include descriptors for
interfaces, altsettings, endpoints, and maybe additional
class descriptors.
</para>
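<para>
A minimal sketch of decoding what such a read returns, assuming the buffer
was filled by read() on the device file. The names
<function>get_le16</function>, <function>parse_descriptors</function> and
<structname>struct dev_ids</structname> are hypothetical helpers, not part
of any kernel header. Following the description above, the vendor and
product IDs are copied as-is because they are already in host byte order,
while wTotalLength of the configuration descriptor is decoded from little
endian.
<programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

#define DEV_DESC_LEN 18		/* size of a USB device descriptor */

struct dev_ids {		/* hypothetical helper type */
	uint16_t vendor;
	uint16_t product;
};

/* Decode a little-endian 16-bit value regardless of host byte order. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] &lt;&lt; 8));
}

/*
 * buf holds bytes read() from a usbfs device file.
 * Returns 0 on success, -1 if the buffer is too short or malformed.
 */
static int parse_descriptors(const unsigned char *buf, size_t len,
			     struct dev_ids *ids, uint16_t *cfg_total_len)
{
	if (len &lt; DEV_DESC_LEN + 4)
		return -1;
	if (buf[0] != DEV_DESC_LEN || buf[1] != 0x01)	/* bLength, DEVICE */
		return -1;
	/* idVendor (offset 8) and idProduct (offset 10): host order here. */
	memcpy(&amp;ids-&gt;vendor, buf + 8, 2);
	memcpy(&amp;ids-&gt;product, buf + 10, 2);
	/* Configuration descriptor follows: type 0x02, then wTotalLength. */
	if (buf[DEV_DESC_LEN + 1] != 0x02)
		return -1;
	*cfg_total_len = get_le16(buf + DEV_DESC_LEN + 2);
	return 0;
}</programlisting>
The returned total length tells the caller how many bytes of interface,
endpoint, and class descriptors follow the device descriptor.
</para>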
<para><emphasis>Perform USB operations</emphasis> using
<emphasis>ioctl()</emphasis> requests to make endpoint I/O
requests (synchronously or asynchronously) or manage
the device.
These requests need the CAP_SYS_RAWIO capability,
as well as filesystem access permissions.
Only one ioctl request can be made on one of these
device files at a time.
This means that if you are synchronously reading an endpoint
from one thread, you won't be able to write to a different
endpoint from another thread until the read completes.
This works for <emphasis>half duplex</emphasis> protocols,
but otherwise you'd use asynchronous i/o requests.
</para>
</sect1>
<sect1 id="usbfs-lifecycle">
<title>Life Cycle of User Mode Drivers</title>
<para>Such a driver first needs to find a device file
for a device it knows how to handle.
Maybe it was told about it because a
<filename>/sbin/hotplug</filename> event handling agent
chose that driver to handle the new device.
Or maybe it's an application that scans all the
/proc/bus/usb device files, and ignores most devices.
In either case, it should <function>read()</function> all
the descriptors from the device file,
and check them against what it knows how to handle.
It might just reject everything except a particular
vendor and product ID, or need a more complex policy.
</para>
<para>Never assume there will only be one such device
on the system at a time!
If your code can't handle more than one device at
a time, at least detect when there's more than one, and
have your users choose which device to use.
</para>
<para>Once your user mode driver knows what device to use,
it interacts with it in either of two styles.
The simple style is to make only control requests; some
devices don't need more complex interactions than those.
(An example might be software using vendor-specific control
requests for some initialization or configuration tasks,
with a kernel driver for the rest.)
</para>
<para>More likely, you need a more complex style driver:
one using non-control endpoints, reading or writing data
and claiming exclusive use of an interface.
<emphasis>Bulk</emphasis> transfers are easiest to use,
but only their sibling <emphasis>interrupt</emphasis> transfers
work with low speed devices.
Both interrupt and <emphasis>isochronous</emphasis> transfers
offer service guarantees because their bandwidth is reserved.
Such "periodic" transfers are awkward to use through usbfs,
unless you're using the asynchronous calls. However, interrupt
transfers can also be used in a synchronous "one shot" style.
</para>
<para>Your user-mode driver should never need to worry
about cleaning up request state when the device is
disconnected, although it should close its open file
descriptors as soon as it starts seeing the ENODEV
errors.
</para>
</sect1>
<sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title>
<para>To use these ioctls, you need to include the following
headers in your userspace program:
<programlisting>#include &lt;linux/usb.h&gt;
#include &lt;linux/usbdevice_fs.h&gt;
#include &lt;asm/byteorder.h&gt;</programlisting>
The standard USB device model requests, from "Chapter 9" of
the USB 2.0 specification, are automatically included from
the <filename>&lt;linux/usb/ch9.h&gt;</filename> header.
</para>
<para>Unless noted otherwise, the ioctl requests
described here will
update the modification time on the usbfs file to which
they are applied (unless they fail).
A return of zero indicates success; otherwise, a
standard USB error code is returned. (These are
documented in
<filename>Documentation/usb/error-codes.txt</filename>
in your kernel sources.)
</para>
<para>Each of these files multiplexes access to several
I/O streams, one per endpoint.
Each device has one control endpoint (endpoint zero)
which supports a limited RPC-style access.
Devices are configured
by hub_wq (in the kernel) setting a device-wide
<emphasis>configuration</emphasis> that affects things
like power consumption and basic functionality.
The endpoints are part of USB <emphasis>interfaces</emphasis>,
which may have <emphasis>altsettings</emphasis>
affecting things like which endpoints are available.
Many devices only have a single configuration and interface,
so drivers for them will ignore configurations and altsettings.
</para>
<sect2 id="usbfs-mgmt">
<title>Management/Status Requests</title>
<para>A number of usbfs requests don't deal very directly
with device I/O.
They mostly relate to device management and status.
These are all synchronous requests.
</para>
<variablelist>
<varlistentry><term>USBDEVFS_CLAIMINTERFACE</term>
<listitem><para>This is used to force usbfs to
claim a specific interface,
which has not previously been claimed by usbfs or any other
kernel driver.
The ioctl parameter is an integer holding the number of
the interface (bInterfaceNumber from descriptor).
</para><para>
Note that if your driver doesn't claim an interface
before trying to use one of its endpoints, and no
other driver has bound to it, then the interface is
automatically claimed by usbfs.
</para><para>
This claim will be released by a RELEASEINTERFACE ioctl,
or by closing the file descriptor.
File modification time is not updated by this request.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CONNECTINFO</term>
<listitem><para>Says whether the device is lowspeed.
The ioctl parameter points to a structure like this:
<programlisting>struct usbdevfs_connectinfo {
unsigned int devnum;
unsigned char slow;
}; </programlisting>
File modification time is not updated by this request.
</para><para>
<emphasis>You can't tell whether a "not slow"
device is connected at high speed (480 MBit/sec)
or just full speed (12 MBit/sec).</emphasis>
You should know the devnum value already;
it's the DDD value of the device file name.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_GETDRIVER</term>
<listitem><para>Returns the name of the kernel driver
bound to a given interface (a string). Parameter
is a pointer to this structure, which is modified:
<programlisting>struct usbdevfs_getdriver {
unsigned int interface;
char driver[USBDEVFS_MAXDRIVERNAME + 1];
};</programlisting>
File modification time is not updated by this request.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_IOCTL</term>
<listitem><para>Passes a request from userspace through
to a kernel driver that has an ioctl entry in the
<emphasis>struct usb_driver</emphasis> it registered.
<programlisting>struct usbdevfs_ioctl {
int ifno;
int ioctl_code;
void *data;
};
/* user mode call looks like this.
* 'request' becomes the driver->ioctl() 'code' parameter.
* the size of 'param' is encoded in 'request', and that data
* is copied to or from the driver->ioctl() 'buf' parameter.
*/
static int
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
{
struct usbdevfs_ioctl wrapper;
wrapper.ifno = ifno;
wrapper.ioctl_code = request;
wrapper.data = param;
return ioctl (fd, USBDEVFS_IOCTL, &amp;wrapper);
} </programlisting>
File modification time is not updated by this request.
</para><para>
This request lets kernel drivers talk to user mode code
through filesystem operations even when they don't create
a character or block special device.
It's also been used to do things like ask devices what
device special file should be used.
Two pre-defined ioctls are used
to disconnect and reconnect kernel drivers, so
that user mode code can completely manage binding
and configuration of devices.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RELEASEINTERFACE</term>
<listitem><para>This is used to release the claim usbfs
made on interface, either implicitly or because of a
USBDEVFS_CLAIMINTERFACE call, before the file
descriptor is closed.
The ioctl parameter is an integer holding the number of
the interface (bInterfaceNumber from descriptor).
File modification time is not updated by this request.
</para><warning><para>
<emphasis>No security check is made to ensure
that the task which made the claim is the one
which is releasing it.
This means that a user mode driver may interfere
with other ones. </emphasis>
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RESETEP</term>
<listitem><para>Resets the data toggle value for an endpoint
(bulk or interrupt) to DATA0.
The ioctl parameter is an integer endpoint number
(1 to 15, as identified in the endpoint descriptor),
with USB_DIR_IN added if the device's endpoint sends
data to the host.
</para><warning><para>
<emphasis>Avoid using this request.
It should probably be removed.</emphasis>
Using it typically means the device and driver will lose
toggle synchronization. If you really lost synchronization,
you likely need to completely handshake with the device,
using a request like CLEAR_HALT
or SET_INTERFACE.
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_DROP_PRIVILEGES</term>
<listitem><para>This is used to relinquish the ability
to do certain operations which are considered to be
privileged on a usbfs file descriptor.
This includes claiming arbitrary interfaces, resetting
a device on which there are currently claimed interfaces
from other users, and issuing USBDEVFS_IOCTL calls.
The ioctl parameter is a 32 bit mask of interfaces
the user is allowed to claim on this file descriptor.
You may issue this ioctl more than one time to narrow
said mask.
</para></listitem></varlistentry>
</variablelist>
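<para>
The management requests listed above can be combined along these lines.
This is only a sketch: <function>claim_and_query</function> is a
hypothetical helper, and it assumes the file descriptor was obtained by
opening a usbfs device file with sufficient privileges.
<programlisting>#include &lt;stdio.h&gt;
#include &lt;sys/ioctl.h&gt;
#include &lt;linux/usbdevice_fs.h&gt;

/*
 * Claim interface ifnum on an open usbfs descriptor, then report the
 * device number and whether the device is low speed.
 * Returns 0 on success, -1 on failure (errno is set by ioctl()).
 */
static int claim_and_query(int fd, unsigned int ifnum)
{
	struct usbdevfs_connectinfo ci;

	if (ioctl(fd, USBDEVFS_CLAIMINTERFACE, &amp;ifnum) &lt; 0)
		return -1;
	if (ioctl(fd, USBDEVFS_CONNECTINFO, &amp;ci) &lt; 0) {
		/* Undo the claim so another driver can use the interface. */
		ioctl(fd, USBDEVFS_RELEASEINTERFACE, &amp;ifnum);
		return -1;
	}
	printf("device %u is %s\n", ci.devnum,
	       ci.slow ? "low speed" : "not low speed");
	return 0;
}</programlisting>
On success the claim stays in place until a USBDEVFS_RELEASEINTERFACE
request is made or the file descriptor is closed.
</para>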
</sect2>
<sect2 id="usbfs-sync">
<title>Synchronous I/O Support</title>
<para>Synchronous requests involve the kernel blocking
until the user mode request completes, either by
finishing successfully or by reporting an error.
In most cases this is the simplest way to use usbfs,
although as noted above it does prevent performing I/O
to more than one endpoint at a time.
</para>
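<para>
As an illustration of the synchronous style, the following sketch issues a
standard GET_DESCRIPTOR control request through USBDEVFS_CONTROL, one of
the requests described in the list that follows. The two helper functions
are hypothetical; the constants are assumed to come from
<filename>&lt;linux/usb/ch9.h&gt;</filename>, and the one second timeout is
an arbitrary choice.
<programlisting>#include &lt;string.h&gt;
#include &lt;sys/ioctl.h&gt;
#include &lt;linux/usb/ch9.h&gt;
#include &lt;linux/usbdevice_fs.h&gt;

/* Fill in a GET_DESCRIPTOR(DEVICE) request; buf must hold len bytes. */
static void fill_get_device_descriptor(struct usbdevfs_ctrltransfer *ct,
				       void *buf, unsigned short len)
{
	memset(ct, 0, sizeof(*ct));
	ct-&gt;bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
	ct-&gt;bRequest = USB_REQ_GET_DESCRIPTOR;
	ct-&gt;wValue = USB_DT_DEVICE &lt;&lt; 8;	/* descriptor type in the high byte */
	ct-&gt;wIndex = 0;
	ct-&gt;wLength = len;
	ct-&gt;timeout = 1000;			/* milliseconds */
	ct-&gt;data = buf;
}

/* Blocks until the request completes; returns the ioctl() result. */
static int get_device_descriptor(int fd, void *buf, unsigned short len)
{
	struct usbdevfs_ctrltransfer ct;

	fill_get_device_descriptor(&amp;ct, buf, len);
	return ioctl(fd, USBDEVFS_CONTROL, &amp;ct);
}</programlisting>
Because the call blocks, no other ioctl can be issued on the same device
file until it returns.
</para>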
<variablelist>
<varlistentry><term>USBDEVFS_BULK</term>
<listitem><para>Issues a bulk read or write request to the
device.
The ioctl parameter is a pointer to this structure:
<programlisting>struct usbdevfs_bulktransfer {
unsigned int ep;
unsigned int len;
unsigned int timeout; /* in milliseconds */
void *data;
};</programlisting>
</para><para>The "ep" value identifies a
bulk endpoint number (1 to 15, as identified in an endpoint
descriptor),
ORed with USB_DIR_IN when referring to an endpoint which
sends data to the host from the device.
The length of the data buffer is identified by "len".
Recent kernels support requests up to about 128KBytes.
<emphasis>FIXME say how read length is returned,
and how short reads are handled.</emphasis>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CLEAR_HALT</term>
<listitem><para>Clears endpoint halt (stall) and
resets the endpoint toggle. This is only
meaningful for bulk or interrupt endpoints.
The ioctl parameter is a pointer to an integer endpoint number
(1 to 15, as identified in an endpoint descriptor),
ORed with USB_DIR_IN when referring to an endpoint which
sends data to the host from the device.
</para><para>
Use this on bulk or interrupt endpoints which have
stalled, returning <emphasis>-EPIPE</emphasis> status
to a data transfer request.
Do not issue the control request directly, since
that could invalidate the host's record of the
data toggle.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_CONTROL</term>
<listitem><para>Issues a control request to the device.
The ioctl parameter points to a structure like this:
<programlisting>struct usbdevfs_ctrltransfer {
__u8 bRequestType;
__u8 bRequest;
__u16 wValue;
__u16 wIndex;
__u16 wLength;
__u32 timeout; /* in milliseconds */
void *data;
};</programlisting>
</para><para>
The first eight bytes of this structure are the contents
of the SETUP packet to be sent to the device; see the
USB 2.0 specification for details.
The bRequestType value is composed by combining a
USB_TYPE_* value, a USB_DIR_* value, and a
USB_RECIP_* value (from
<emphasis>&lt;linux/usb.h&gt;</emphasis>).
If wLength is nonzero, it describes the length of the data
buffer, which is either written to the device
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
</para><para>
At this writing, you can't transfer more than 4 KBytes
of data to or from a device; usbfs has a limit, and
some host controller drivers have a limit.
(That's not usually a problem.)
<emphasis>Also</emphasis> there's no way to say it's
not OK to get a short read back from the device.
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_RESET</term>
<listitem><para>Does a USB level device reset.
The ioctl parameter is ignored.
After the reset, this rebinds all device interfaces.
File modification time is not updated by this request.
</para><warning><para>
<emphasis>Avoid using this call</emphasis>
until some usbcore bugs get fixed,
since it does not fully synchronize device, interface,
and driver (not just usbfs) state.
</para></warning></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SETINTERFACE</term>
<listitem><para>Sets the alternate setting for an
interface. The ioctl parameter is a pointer to a
structure like this:
<programlisting>struct usbdevfs_setinterface {
unsigned int interface;
unsigned int altsetting;
}; </programlisting>
File modification time is not updated by this request.
</para><para>
Those struct members are from some interface descriptor
applying to the current configuration.
The interface number is the bInterfaceNumber value, and
the altsetting number is the bAlternateSetting value.
(This resets each endpoint in the interface.)
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SETCONFIGURATION</term>
<listitem><para>Issues the
<function>usb_set_configuration</function> call
for the device.
The parameter is an integer holding the number of
a configuration (bConfigurationValue from descriptor).
File modification time is not updated by this request.
</para><warning><para>
<emphasis>Avoid using this call</emphasis>
until some usbcore bugs get fixed,
since it does not fully synchronize device, interface,
and driver (not just usbfs) state.
</para></warning></listitem></varlistentry>
</variablelist>
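<para>The synchronous requests above can be sketched in user mode
code roughly as follows. This is a minimal sketch, not a reference
implementation: the helper names are hypothetical, the USB_* constants
are taken from linux/usb/ch9.h (the header user space normally sees),
and the endpoint number for USBDEVFS_CLEAR_HALT is assumed to be
passed through a pointer. The bulk helper returns the raw ioctl
result and deliberately makes no claim about how short reads are
reported.
</para>

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>

/* Endpoint address as usbfs expects it: the endpoint number in the low
 * four bits, with USB_DIR_IN ORed in for device-to-host endpoints. */
static unsigned int ep_address(unsigned int number, int is_in)
{
	return (number & 0x0f) | (is_in ? USB_DIR_IN : USB_DIR_OUT);
}

/* Clear a stalled bulk or interrupt endpoint (assumed: pointer argument). */
static int clear_halt(int fd, unsigned int ep)
{
	return ioctl(fd, USBDEVFS_CLEAR_HALT, &ep);
}

/* One blocking bulk transfer.  If the endpoint reports a stall (EPIPE),
 * the halt is cleared so that a later attempt can succeed; the original
 * error is still returned to the caller. */
static int bulk_transfer(int fd, unsigned int ep, void *buf,
			 unsigned int len, unsigned int timeout_ms)
{
	struct usbdevfs_bulktransfer bt;
	int ret;

	memset(&bt, 0, sizeof(bt));
	bt.ep = ep;
	bt.len = len;
	bt.timeout = timeout_ms;
	bt.data = buf;

	ret = ioctl(fd, USBDEVFS_BULK, &bt);
	if (ret < 0 && errno == EPIPE) {
		int saved = errno;

		clear_halt(fd, ep);
		errno = saved;
	}
	return ret;
}

/* Fill in a standard GET_DESCRIPTOR(DEVICE) control read: the first
 * eight bytes of the structure form the SETUP packet. */
static void fill_get_device_descriptor(struct usbdevfs_ctrltransfer *ct,
				       void *buf, unsigned short len,
				       unsigned int timeout_ms)
{
	memset(ct, 0, sizeof(*ct));
	ct->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
	ct->bRequest = USB_REQ_GET_DESCRIPTOR;
	ct->wValue = USB_DT_DEVICE << 8;	/* descriptor type, index 0 */
	ct->wIndex = 0;
	ct->wLength = len;
	ct->timeout = timeout_ms;
	ct->data = buf;
}

/* Issue a prepared control request; blocks until it completes. */
static int control_transfer(int fd, struct usbdevfs_ctrltransfer *ct)
{
	return ioctl(fd, USBDEVFS_CONTROL, ct);
}
```

<para>A caller would open the usbfs device node, claim the interface
it needs, and then use, for example, bulk_transfer(fd, ep_address(1, 1),
buf, sizeof(buf), 1000) to read from bulk IN endpoint 1 with a one
second timeout.
</para>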
</sect2>
<sect2 id="usbfs-async">
<title>Asynchronous I/O Support</title>
<para>As mentioned above, there are situations where it may be
important to initiate concurrent operations from user mode code.
This is particularly important for periodic transfers
(interrupt and isochronous), but it can be used for other
kinds of USB requests too.
In such cases, the asynchronous requests described here
are essential. Rather than submitting one request and having
the kernel block until it completes, submitting a request and
waiting for its completion are separate steps.
</para>
<para>These requests are packaged into a structure that
resembles the URB used by kernel device drivers.
(No POSIX Async I/O support here, sorry.)
It identifies the endpoint type (USBDEVFS_URB_TYPE_*),
endpoint (number, ORed with USB_DIR_IN as appropriate),
buffer and length, and a user "context" value serving to
uniquely identify each request.
(It's usually a pointer to per-request data.)
Flags can modify requests (not as many as supported for
kernel drivers).
</para>
<para>Each request can specify a realtime signal number
(between SIGRTMIN and SIGRTMAX, inclusive) to request a
signal be sent when the request completes.
</para>
<para>When usbfs returns these urbs, the status value
is updated, and the buffer may have been modified.
Except for isochronous transfers, the actual_length is
updated to say how many bytes were transferred. If the
USBDEVFS_URB_SHORT_NOT_OK flag (formerly USBDEVFS_URB_DISABLE_SPD)
is set ("short packets are not OK") and fewer bytes were read
than were requested, you get an error report.
</para>
<programlisting>struct usbdevfs_iso_packet_desc {
unsigned int length;
unsigned int actual_length;
unsigned int status;
};
struct usbdevfs_urb {
unsigned char type;
unsigned char endpoint;
int status;
unsigned int flags;
void *buffer;
int buffer_length;
int actual_length;
int start_frame;
int number_of_packets;
int error_count;
unsigned int signr;
void *usercontext;
struct usbdevfs_iso_packet_desc iso_frame_desc[];
};</programlisting>
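<para>Because iso_frame_desc is a trailing array, an isochronous urb
has to be allocated with room for one descriptor per packet. The
following is a minimal sketch of that sizing, assuming equally sized
packets; alloc_iso_urb is a hypothetical helper, not part of usbfs.
</para>

```c
#include <stdlib.h>
#include <linux/usbdevice_fs.h>

/*
 * Hypothetical helper: allocate a zeroed isochronous urb with npackets
 * packet descriptors appended, each describing packet_len bytes of buf.
 * Returns NULL if npackets is not positive or memory is exhausted.
 * The caller releases the urb with free() once it is no longer in use.
 */
static struct usbdevfs_urb *alloc_iso_urb(unsigned char ep, void *buf,
					  int npackets,
					  unsigned int packet_len)
{
	struct usbdevfs_urb *urb;
	int i;

	if (npackets <= 0)
		return NULL;

	/* Fixed part plus one usbdevfs_iso_packet_desc per packet. */
	urb = calloc(1, sizeof(*urb) +
			(size_t)npackets * sizeof(urb->iso_frame_desc[0]));
	if (!urb)
		return NULL;

	urb->type = USBDEVFS_URB_TYPE_ISO;
	urb->endpoint = ep;
	urb->buffer = buf;
	urb->buffer_length = npackets * (int)packet_len;
	urb->number_of_packets = npackets;
	for (i = 0; i < npackets; i++)
		urb->iso_frame_desc[i].length = packet_len;
	return urb;
}
```

<para>After such an urb completes, the per-packet actual_length and
status fields in iso_frame_desc describe each packet individually.
</para>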
<para> For these asynchronous requests, the file modification
time reflects when the request was initiated.
This contrasts with the synchronous requests,
where it reflects when requests complete.
</para>
<variablelist>
<varlistentry><term>USBDEVFS_DISCARDURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_DISCSIGNAL</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_REAPURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_REAPURBNDELAY</term>
<listitem><para>
<emphasis>TBS</emphasis>
File modification time is not updated by this request.
</para><para>
</para></listitem></varlistentry>
<varlistentry><term>USBDEVFS_SUBMITURB</term>
<listitem><para>
<emphasis>TBS</emphasis>
</para><para>
</para></listitem></varlistentry>
</variablelist>
</sect2>
</sect1>
</chapter>
</book>
<!-- vim:syntax=sgml:sw=4
-->

File diff suppressed because it is too large


@@ -1,604 +0,0 @@
HOWTO do Linux kernel development
---------------------------------
This is the be-all, end-all document on this topic. It contains
instructions on how to become a Linux kernel developer and how to learn
to work with the Linux kernel development community. It tries to not
contain anything related to the technical aspects of kernel programming,
but will help point you in the right direction for that.
If anything in this document becomes out of date, please send in patches
to the maintainer of this file, who is listed at the bottom of the
document.
Introduction
------------
So, you want to learn how to become a Linux kernel developer? Or you
have been told by your manager, "Go write a Linux driver for this
device." This document's goal is to teach you everything you need to
know to achieve this by describing the process you need to go through,
and hints on how to work with the community. It will also try to
explain some of the reasons why the community works like it does.
The kernel is written mostly in C, with some architecture-dependent
parts written in assembly. A good understanding of C is required for
kernel development. Assembly (any architecture) is not required unless
you plan to do low-level development for that architecture. Though they
are not a good substitute for a solid C education and/or years of
experience, the following books are good for, if anything, reference:
- "The C Programming Language" by Kernighan and Ritchie [Prentice Hall]
- "Practical C Programming" by Steve Oualline [O'Reilly]
- "C: A Reference Manual" by Harbison and Steele [Prentice Hall]
The kernel is written using GNU C and the GNU toolchain. While it
adheres to the ISO C89 standard, it uses a number of extensions that are
not featured in the standard. The kernel is a freestanding C
environment, with no reliance on the standard C library, so some
portions of the C standard are not supported. Arbitrary long long
divisions and floating point are not allowed. It can sometimes be
difficult to understand the assumptions the kernel has on the toolchain
and the extensions that it uses, and unfortunately there is no
definitive reference for them. Please check the gcc info pages (`info
gcc`) for some information on them.
Please remember that you are trying to learn how to work with the
existing development community. It is a diverse group of people, with
high standards for coding, style and procedure. These standards have
been created over time based on what they have found to work best for
such a large and geographically dispersed team. Try to learn as much as
possible about these standards ahead of time, as they are well
documented; do not expect people to adapt to you or your company's way
of doing things.
Legal Issues
------------
The Linux kernel source code is released under the GPL. Please see the
file, COPYING, in the main directory of the source tree, for details on
the license. If you have further questions about the license, please
contact a lawyer, and do not ask on the Linux kernel mailing list. The
people on the mailing lists are not lawyers, and you should not rely on
their statements on legal matters.
For common questions and answers about the GPL, please see:
http://www.gnu.org/licenses/gpl-faq.html
Documentation
-------------
The Linux kernel source tree has a large range of documents that are
invaluable for learning how to interact with the kernel community. When
new features are added to the kernel, it is recommended that new
documentation files are also added which explain how to use the feature.
When a kernel change causes the interface that the kernel exposes to
userspace to change, it is recommended that you send the information or
a patch to the manual pages explaining the change to the manual pages
maintainer at mtk.manpages@gmail.com, and CC the list
linux-api@vger.kernel.org.
Here is a list of files that are in the kernel source tree that are
required reading:
README
This file gives a short background on the Linux kernel and describes
what is necessary to do to configure and build the kernel. People
who are new to the kernel should start here.
Documentation/Changes
This file gives a list of the minimum levels of various software
packages that are necessary to build and run the kernel
successfully.
Documentation/CodingStyle
This describes the Linux kernel coding style, and some of the
rationale behind it. All new code is expected to follow the
guidelines in this document. Most maintainers will only accept
patches if these rules are followed, and many people will only
review code if it is in the proper style.
Documentation/SubmittingPatches
Documentation/SubmittingDrivers
These files describe in explicit detail how to successfully create
and send a patch, including (but not limited to):
- Email contents
- Email format
- Who to send it to
Following these rules will not guarantee success (as all patches are
subject to scrutiny for content and style), but not following them
will almost always prevent it.
Other excellent descriptions of how to create patches properly are:
"The Perfect Patch"
http://www.ozlabs.org/~akpm/stuff/tpp.txt
"Linux kernel patch submission format"
http://linux.yyz.us/patch-format.html
Documentation/stable_api_nonsense.txt
This file describes the rationale behind the conscious decision to
not have a stable API within the kernel, including things like:
- Subsystem shim-layers (for compatibility?)
- Driver portability between Operating Systems.
- Mitigating rapid change within the kernel source tree (or
preventing rapid change)
This document is crucial for understanding the Linux development
philosophy and is very important for people moving to Linux from
development on other Operating Systems.
Documentation/SecurityBugs
If you feel you have found a security problem in the Linux kernel,
please follow the steps in this document to help notify the kernel
developers, and help solve the issue.
Documentation/ManagementStyle
This document describes how Linux kernel maintainers operate and the
shared ethos behind their methodologies. This is important reading
for anyone new to kernel development (or anyone simply curious about
it), as it resolves a lot of common misconceptions and confusion
about the unique behavior of kernel maintainers.
Documentation/stable_kernel_rules.txt
This file describes the rules on how the stable kernel releases
happen, and what to do if you want to get a change into one of these
releases.
Documentation/kernel-docs.txt
A list of external documentation that pertains to kernel
development. Please consult this list if you do not find what you
are looking for within the in-kernel documentation.
Documentation/applying-patches.txt
A good introduction describing exactly what a patch is and how to
apply it to the different development branches of the kernel.
The kernel also has a large number of documents that can be
automatically generated from the source code itself. This includes a
full description of the in-kernel API, and rules on how to handle
locking properly. The documents will be created in the
Documentation/DocBook/ directory and can be generated as PDF,
Postscript, HTML, and man pages by running:
make pdfdocs
make psdocs
make htmldocs
make mandocs
respectively from the main kernel source directory.
Becoming A Kernel Developer
---------------------------
If you do not know anything about Linux kernel development, you should
look at the Linux KernelNewbies project:
http://kernelnewbies.org
It consists of a helpful mailing list where you can ask almost any type
of basic kernel development question (make sure to search the archives
first, before asking something that has already been answered in the
past.) It also has an IRC channel that you can use to ask questions in
real-time, and a lot of helpful documentation that is useful for
learning about Linux kernel development.
The website has basic information about code organization, subsystems,
and current projects (both in-tree and out-of-tree). It also describes
some basic logistical information, like how to compile a kernel and
apply a patch.
If you do not know where you want to start, but you want to look for
some task to start doing to join into the kernel development community,
go to the Linux Kernel Janitor's project:
http://kernelnewbies.org/KernelJanitors
It is a great place to start. It describes a list of relatively simple
problems that need to be cleaned up and fixed within the Linux kernel
source tree. Working with the developers in charge of this project, you
will learn the basics of getting your patch into the Linux kernel tree,
and possibly be pointed in the direction of what to go work on next, if
you do not already have an idea.
If you already have a chunk of code that you want to put into the kernel
tree, but need some help getting it in the proper form, the
kernel-mentors project was created to help you out with this. It is a
mailing list, and can be found at:
http://selenic.com/mailman/listinfo/kernel-mentors
Before making any actual modifications to the Linux kernel code, it is
imperative to understand how the code in question works. For this
purpose, nothing is better than reading through it directly (most tricky
bits are commented well), perhaps even with the help of specialized
tools. One such tool that is particularly recommended is the Linux
Cross-Reference project, which is able to present source code in a
self-referential, indexed webpage format. An excellent up-to-date
repository of the kernel code may be found at:
http://lxr.free-electrons.com/
The development process
-----------------------
The Linux kernel development process currently consists of a few different
main kernel "branches" and lots of different subsystem-specific kernel
branches. These different branches are:
- main 4.x kernel tree
- 4.x.y -stable kernel tree
- 4.x -git kernel patches
- subsystem specific kernel trees and patches
- the 4.x -next kernel tree for integration tests
4.x kernel tree
-----------------
4.x kernels are maintained by Linus Torvalds, and can be found on
kernel.org in the pub/linux/kernel/v4.x/ directory. Its development
process is as follows:
- As soon as a new kernel is released, a two-week window opens.
During this period maintainers can submit big diffs to
Linus, usually patches that have already been included in the
-next kernel for a few weeks. The preferred way to submit big changes
is using git (the kernel's source management tool, more information
can be found at http://git-scm.com/) but plain patches are also just
fine.
- After two weeks a -rc1 kernel is released. From then on, only
patches that do not include new features that could affect the
stability of the whole kernel may be pushed. Please note that a whole new driver
(or filesystem) might be accepted after -rc1 because there is no
risk of causing regressions with such a change as long as the change
is self-contained and does not affect areas outside of the code that
is being added. git can be used to send patches to Linus after -rc1
is released, but the patches need to also be sent to a public
mailing list for review.
- A new -rc is released whenever Linus deems the current git tree to
be in a reasonably sane state adequate for testing. The goal is to
release a new -rc kernel every week.
- The process continues until the kernel is considered "ready";
this should take around 6 weeks.
It is worth mentioning what Andrew Morton wrote on the linux-kernel
mailing list about kernel releases:
"Nobody knows when a kernel will be released, because it's
released according to perceived bug status, not according to a
preconceived timeline."
4.x.y -stable kernel tree
-------------------------
Kernels with 3-part versions are -stable kernels. They contain
relatively small and critical fixes for security problems or significant
regressions discovered in a given 4.x kernel.
This is the recommended branch for users who want the most recent stable
kernel and are not interested in helping test development/experimental
versions.
If no 4.x.y kernel is available, then the highest numbered 4.x
kernel is the current stable kernel.
4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
are released as needs dictate. The normal release period is approximately
two weeks, but it can be longer if there are no pressing problems. A
security-related problem, instead, can cause a release to happen almost
instantly.
The file Documentation/stable_kernel_rules.txt in the kernel tree
documents what kinds of changes are acceptable for the -stable tree, and
how the release process works.
4.x -git patches
----------------
These are daily snapshots of Linus' kernel tree which are managed in a
git repository (hence the name.) These patches are usually released
daily and represent the current state of Linus' tree. They are more
experimental than -rc kernels since they are generated automatically
without even a cursory glance to see if they are sane.
Subsystem Specific kernel trees and patches
-------------------------------------------
The maintainers of the various kernel subsystems --- and also many
kernel subsystem developers --- expose their current state of
development in source repositories. That way, others can see what is
happening in the different areas of the kernel. In areas where
development is rapid, a developer may be asked to base his submissions
onto such a subsystem kernel tree so that conflicts between the
submission and other already ongoing work are avoided.
Most of these repositories are git trees, but there are also other SCMs
in use, or patch queues being published as quilt series. Addresses of
these subsystem repositories are listed in the MAINTAINERS file. Many
of them can be browsed at http://git.kernel.org/.
Before a proposed patch is committed to such a subsystem tree, it is
subject to review which primarily happens on mailing lists (see the
respective section below). For several kernel subsystems, this review
process is tracked with the tool patchwork. Patchwork offers a web
interface which shows patch postings, any comments on a patch or
revisions to it, and maintainers can mark patches as under review,
accepted, or rejected. Most of these patchwork sites are listed at
http://patchwork.kernel.org/.
4.x -next kernel tree for integration tests
-------------------------------------------
Before updates from subsystem trees are merged into the mainline 4.x
tree, they need to be integration-tested. For this purpose, a special
testing repository exists into which virtually all subsystem trees are
pulled on an almost daily basis:
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git
This way, the -next kernel gives a summary outlook onto what will be
expected to go into the mainline kernel at the next merge period.
Adventurous testers are very welcome to runtime-test the -next kernel.
Bug Reporting
-------------
bugzilla.kernel.org is where the Linux kernel developers track kernel
bugs. Users are encouraged to report all bugs that they find in this
tool. For details on how to use the kernel bugzilla, please see:
http://bugzilla.kernel.org/page.cgi?id=faq.html
The file REPORTING-BUGS in the main kernel source directory has a good
template for how to report a possible kernel bug, and details what kind
of information is needed by the kernel developers to help track down the
problem.
Managing bug reports
--------------------
One of the best ways to put your hacking skills into practice is by fixing
bugs reported by other people. Not only will you help to make the kernel
more stable, you will also learn to fix real world problems, improve
your skills, and make other developers aware of your presence. Fixing
bugs is one of the best ways to earn merit among other developers, because
not many people like wasting time fixing other people's bugs.
To work on already reported bugs, go to http://bugzilla.kernel.org.
If you want to be advised of future bug reports, you can subscribe to the
bugme-new mailing list (only new bug reports are mailed here) or to the
bugme-janitors mailing list (every change in the bugzilla is mailed here):
http://lists.linux-foundation.org/mailman/listinfo/bugme-new
http://lists.linux-foundation.org/mailman/listinfo/bugme-janitors
Mailing lists
-------------
As some of the above documents describe, the majority of the core kernel
developers participate on the Linux Kernel Mailing list. Details on how
to subscribe and unsubscribe from the list can be found at:
http://vger.kernel.org/vger-lists.html#linux-kernel
There are archives of the mailing list on the web in many different
places. Use a search engine to find these archives. For example:
http://dir.gmane.org/gmane.linux.kernel
It is highly recommended that you search the archives about the topic
you want to bring up, before you post it to the list. A lot of things
already discussed in detail are only recorded at the mailing list
archives.
Most of the individual kernel subsystems also have their own separate
mailing list where they do their development efforts. See the
MAINTAINERS file for a list of what these lists are for the different
groups.
Many of the lists are hosted on kernel.org. Information on them can be
found at:
http://vger.kernel.org/vger-lists.html
Please remember to follow good behavioral habits when using the lists.
Though a bit cheesy, the following URL has some simple guidelines for
interacting with the list (or any list):
http://www.albion.com/netiquette/
If multiple people respond to your mail, the CC: list of recipients may
get pretty large. Don't remove anybody from the CC: list without a good
reason, and don't reply only to the list address. Get used to receiving the
mail twice, once from the sender and once from the list, and don't try
to tune that by adding fancy mail-headers; people will not like it.
Remember to keep the context and the attribution of your replies intact,
keep the "John Kernelhacker wrote ...:" lines at the top of your reply, and
add your statements between the individual quoted sections instead of
writing at the top of the mail.
If you add patches to your mail, make sure they are plain readable text
as stated in Documentation/SubmittingPatches. Kernel developers don't
want to deal with attachments or compressed patches; they may want
to comment on individual lines of your patch, which works only that way.
Make sure you use a mail program that does not mangle spaces and tab
characters. A good first test is to send the mail to yourself and try
to apply your own patch by yourself. If that doesn't work, get your
mail program fixed or change it until it works.
Above all, please remember to show respect to other subscribers.
Working with the community
--------------------------
The goal of the kernel community is to provide the best possible kernel
there is. When you submit a patch for acceptance, it will be reviewed
on its technical merits and those alone. So, what should you be
expecting?
- criticism
- comments
- requests for change
- requests for justification
- silence
Remember, this is part of getting your patch into the kernel. You have
to be able to take criticism and comments about your patches, evaluate
them at a technical level and either rework your patches or provide
clear and concise reasoning as to why those changes should not be made.
If there are no responses to your posting, wait a few days and try
again; sometimes things get lost in the huge volume.
What should you not do?
- expect your patch to be accepted without question
- become defensive
- ignore comments
- resubmit the patch without making any of the requested changes
In a community that is looking for the best technical solution possible,
there will always be differing opinions on how beneficial a patch is.
You have to be cooperative, and willing to adapt your idea to fit within
the kernel. Or at least be willing to prove your idea is worth it.
Remember, being wrong is acceptable as long as you are willing to work
toward a solution that is right.
It is normal that the answers to your first patch might simply be a list
of a dozen things you should correct. This does _not_ imply that your
patch will not be accepted, and it is _not_ meant against you
personally. Simply correct all issues raised against your patch and
resend it.
Differences between the kernel community and corporate structures
-----------------------------------------------------------------
The kernel community works differently than most traditional corporate
development environments. Here is a list of things that you can try to
do to avoid problems:
Good things to say regarding your proposed changes:
- "This solves multiple problems."
- "This deletes 2000 lines of code."
- "Here is a patch that explains what I am trying to describe."
- "I tested it on 5 different architectures..."
- "Here is a series of small patches that..."
- "This increases performance on typical machines..."
Bad things you should avoid saying:
- "We did it this way in AIX/ptx/Solaris, so therefore it must be
good..."
- "I've been doing this for 20 years, so..."
- "This is required for my company to make money"
- "This is for our Enterprise product line."
- "Here is my 1000 page design document that describes my idea"
- "I've been working on this for 6 months..."
- "Here's a 5000 line patch that..."
- "I rewrote all of the current mess, and here it is..."
- "I have a deadline, and this patch needs to be applied now."
Another way the kernel community is different than most traditional
software engineering work environments is the faceless nature of
interaction. One benefit of using email and irc as the primary forms of
communication is the lack of discrimination based on gender or race.
The Linux kernel work environment is accepting of women and minorities
because all you are is an email address. The international aspect also
helps to level the playing field because you can't guess gender based on
a person's name. A man may be named Andrea and a woman may be named Pat.
Most women who have worked in the Linux kernel and have expressed an
opinion have had positive experiences.
The language barrier can cause problems for some people who are not
comfortable with English. A good grasp of the language can be needed in
order to get ideas across properly on mailing lists, so it is
recommended that you check your emails to make sure they make sense in
English before sending them.
Break up your changes
---------------------
The Linux kernel community does not gladly accept large chunks of code
dropped on it all at once. The changes need to be properly introduced,
discussed, and broken up into tiny, individual portions. This is almost
the exact opposite of what companies are used to doing. Your proposal
should also be introduced very early in the development process, so that
you can receive feedback on what you are doing. It also lets the
community feel that you are working with them, and not simply using them
as a dumping ground for your feature. However, don't send 50 emails at
one time to a mailing list; your patch series should be smaller than
that almost all of the time.
The reasons for breaking things up are the following:
1) Small patches increase the likelihood that your patches will be
applied, since they don't take much time or effort to verify for
correctness. A 5 line patch can be applied by a maintainer with
barely a second glance. However, a 500 line patch may take hours to
review for correctness (the time it takes is exponentially
proportional to the size of the patch, or something).
Small patches also make it very easy to debug when something goes
wrong. It's much easier to back out patches one by one than it is
to dissect a very large patch after it's been applied (and broken
something).
2) It's important not only to send small patches, but also to rewrite
and simplify (or simply re-order) patches before submitting them.
Here is an analogy from kernel developer Al Viro:
"Think of a teacher grading homework from a math student. The
teacher does not want to see the student's trials and errors
before they came up with the solution. They want to see the
cleanest, most elegant answer. A good student knows this, and
would never submit her intermediate work before the final
solution.
The same is true of kernel development. The maintainers and
reviewers do not want to see the thought process behind the
solution to the problem one is solving. They want to see a
simple and elegant solution."
It may be challenging to keep the balance between presenting an elegant
solution and working together with the community and discussing your
unfinished work. Therefore it is good to get into the process early to
get feedback to improve your work, but also to keep your changes in small
chunks so that they may already get accepted, even when your whole task is
not ready for inclusion now.
Also realize that it is not acceptable to send patches for inclusion
that are unfinished and will be "fixed up later."
Justify your change
-------------------
Along with breaking up your patches, it is very important for you to let
the Linux community know why they should add this change. New features
must be justified as being needed and useful.
Document your change
--------------------
When sending in your patches, pay special attention to what you say in
the text in your email. This information will become the ChangeLog
information for the patch, and will be preserved for everyone to see for
all time. It should describe the patch completely, containing:
- why the change is necessary
- the overall design approach in the patch
- implementation details
- testing results
For more details on what this should all look like, please see the
ChangeLog section of the document:
"The Perfect Patch"
http://www.ozlabs.org/~akpm/stuff/tpp.txt
All of these things are sometimes very hard to do. It can take years to
perfect these practices (if at all). It's a continuous process of
improvement that requires a lot of patience and determination. But
don't give up, it's possible. Many have done it before, and each had to
start exactly where you are now.
----------
Thanks to Paolo Ciarrocchi who allowed the "Development Process"
(http://lwn.net/Articles/94386/) section
to be based on text he had written, and to Randy Dunlap and Gerrit
Huizenga for some of the list of things you should and should not say.
Also thanks to Pat Mochel, Hanna Linder, Randy Dunlap, Kay Sievers,
Vojtech Pavlik, Jan Kara, Josh Boyer, Kees Cook, Andrew Morton, Andi
Kleen, Vadim Lobanov, Jesper Juhl, Adrian Bunk, Keri Harris, Frans Pop,
David A. Wheeler, Junio Hamano, Michael Kerrisk, and Alex Shepard for
their review, comments, and contributions. Without their help, this
document would not have been possible.
Maintainer: Greg Kroah-Hartman <greg@kroah.com>


@@ -111,6 +111,8 @@ ipmi_ssif - A driver for accessing BMCs on the SMBus. It uses the
I2C kernel driver's SMBus interfaces to send and receive IPMI messages
over the SMBus.
ipmi_powernv - A driver for accessing BMCs on POWERNV systems.
ipmi_watchdog - IPMI requires systems to have a very capable watchdog
timer. This driver implements the standard Linux watchdog timer
interface on top of the IPMI message handler.
@@ -118,17 +120,15 @@ interface on top of the IPMI message handler.
ipmi_poweroff - Some systems support the ability to be turned off via
IPMI commands.
These are all individually selectable via configuration options.
bt-bmc - This is not part of the main driver, but instead a driver for
accessing a BMC-side interface of a BT interface. It is used on BMCs
running Linux to provide an interface to the host.
Note that the KCS-only interface has been removed. The af_ipmi driver
is no longer supported and has been removed because it was impossible
to do 32 bit emulation on 64-bit kernels with it.
These are all individually selectable via configuration options.
Much documentation for the interface is in the include files. The
IPMI include files are:
net/af_ipmi.h - Contains the socket interface.
linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI.
linux/ipmi_smi.h - Contains the interface for system management interfaces
@@ -245,6 +245,16 @@ addressed (because some boards actually have multiple BMCs on them)
and the user should not have to care what type of SMI is below them.
Watching For Interfaces
When your code comes up, the IPMI driver may or may not have detected
if IPMI devices exist. So you might have to defer your setup until
the device is detected, or you might be able to do it immediately.
To handle this, and to allow for discovery, you register an SMI
watcher with ipmi_smi_watcher_register() to iterate over interfaces
and tell you when they come and go.
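The watcher pattern described above can be modelled in a few lines of userspace C. This is only a sketch of the idea, not the kernel API: every demo_* name below is hypothetical, the model supports a single watcher, and the real ipmi_smi_watcher_register() interface is declared in the IPMI include files. It shows why registration works whether or not detection has already happened: interfaces that already exist are replayed at registration time, and later arrivals and departures are reported through the same callbacks.

```c
#define DEMO_MAX_IF 4

/* Hypothetical stand-in for a watcher: one callback per event. */
struct demo_watcher {
	void (*new_smi)(int if_num);   /* an interface appeared */
	void (*smi_gone)(int if_num);  /* an interface went away */
};

static int demo_present[DEMO_MAX_IF];      /* 1 if the interface exists */
static struct demo_watcher *demo_current;  /* the single registered watcher */

/* Register: replay interfaces that already exist, then keep the watcher. */
static void demo_watcher_register(struct demo_watcher *w)
{
	for (int i = 0; i < DEMO_MAX_IF; i++)
		if (demo_present[i])
			w->new_smi(i);
	demo_current = w;
}

/* Called by the mock driver when an interface is detected. */
static void demo_interface_added(int if_num)
{
	if (if_num < 0 || if_num >= DEMO_MAX_IF)
		return;
	demo_present[if_num] = 1;
	if (demo_current)
		demo_current->new_smi(if_num);
}

/* Called by the mock driver when an interface disappears. */
static void demo_interface_removed(int if_num)
{
	if (if_num < 0 || if_num >= DEMO_MAX_IF)
		return;
	demo_present[if_num] = 0;
	if (demo_current)
		demo_current->smi_gone(if_num);
}

/* Example client: tracks how many interfaces it currently knows about. */
static int demo_known;
static void demo_on_new(int if_num)  { (void)if_num; demo_known++; }
static void demo_on_gone(int if_num) { (void)if_num; demo_known--; }
```

A client in this model does its per-interface setup inside the new_smi callback, so it does not need to know whether the interface was detected before or after the client came up.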
Creating the User
To use the message handler, you must first create a user using
@@ -263,7 +273,7 @@ closing the device automatically destroys the user.
Messaging
To send a message from kernel-land, the ipmi_request() call does
To send a message from kernel-land, the ipmi_request_settime() call does
pretty much all message handling. Most of the parameters are
self-explanatory. However, it takes a "msgid" parameter. This is NOT
the sequence number of messages. It is simply a long value that is
@@ -352,11 +362,12 @@ that for more details.
The SI Driver
-------------
The SI driver allows up to 4 KCS or SMIC interfaces to be configured
in the system. By default, scan the ACPI tables for interfaces, and
if it doesn't find any the driver will attempt to register one KCS
interface at the spec-specified I/O port 0xca2 without interrupts.
You can change this at module load time (for a module) with:
The SI driver allows KCS, BT, and SMIC interfaces to be configured
in the system. It discovers interfaces through a host of different
methods, depending on the system.
You can specify up to four interfaces on the module load line and
control some module parameters:
modprobe ipmi_si.o type=<type1>,<type2>....
ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
@@ -367,7 +378,7 @@ You can change this at module load time (for a module) with:
force_kipmid=<enable1>,<enable2>,...
kipmid_max_busy_us=<ustime1>,<ustime2>,...
unload_when_empty=[0|1]
trydefaults=[0|1] trydmi=[0|1] tryacpi=[0|1]
trydmi=[0|1] tryacpi=[0|1]
tryplatform=[0|1] trypci=[0|1]
Each of these except try... items is a list, the first item for the
@@ -386,10 +397,6 @@ use the I/O port given as the device address.
If you specify irqs as non-zero for an interface, the driver will
attempt to use the given interrupt for the device.
trydefaults sets whether the standard IPMI interface at 0xca2 and
any interfaces specified by ACPI are tried. By default, the driver
tries it, set this value to zero to turn this off.
The other try... items disable discovery by their corresponding
names. These are all enabled by default, set them to zero to disable
them. The tryplatform disables openfirmware.
@@ -434,7 +441,7 @@ kernel command line as:
ipmi_si.type=<type1>,<type2>...
ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1]
ipmi_si.irqs=<irq1>,<irq2>...
ipmi_si.regspacings=<sp1>,<sp2>,...
ipmi_si.regsizes=<size1>,<size2>,...
ipmi_si.regshifts=<shift1>,<shift2>,...
@@ -444,11 +451,6 @@ kernel command line as:
It works the same as the module parameters of the same names.
By default, the driver will attempt to detect any device specified by
ACPI, and if none of those then a KCS device at the spec-specified
0xca2. If you want to turn this off, set the "trydefaults" option to
false.
If your IPMI interface does not support interrupts and is a KCS or
SMIC interface, the IPMI driver will start a kernel thread for the
interface to help speed things up. This is a low-priority kernel
@@ -500,7 +502,8 @@ at module load time (for a module) with:
addr=<i2caddr1>[,<i2caddr2>[,...]]
adapter=<adapter1>[,<adapter2>[...]]
dbg=<flags1>,<flags2>...
slave_addrs=<addr1>,<addr2>,...
slave_addrs=<addr1>,<addr2>,...
tryacpi=[0|1] trydmi=[0|1]
[dbg_probe=1]
The addresses are normal I2C addresses. The adapter is the string
@@ -513,6 +516,9 @@ spaces in kernel parameters.
The debug flags are bit flags for each BMC found, they are:
IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8
The tryxxx parameters can be used to disable detecting interfaces
from various sources.
Setting dbg_probe to 1 will enable debugging of the probing and
detection process for BMCs on the SMBusses.
@@ -535,7 +541,8 @@ kernel command line as:
ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]]
ipmi_ssif.dbg=<flags1>[,<flags2>[...]]
ipmi_ssif.dbg_probe=1
ipmi_ssif.slave_addrs=<addr1>[,<addr2>[...]]
ipmi_ssif.slave_addrs=<addr1>[,<addr2>[...]]
ipmi_ssif.tryacpi=[0|1] ipmi_ssif.trydmi=[0|1]
These are the same options as on the module command line.


@@ -1,3 +1 @@
subdir-y := accounting auxdisplay blackfin \
filesystems filesystems ia64 laptops mic misc-devices \
networking pcmcia prctl ptp timers vDSO watchdog
subdir-y :=


@@ -10,6 +10,8 @@ _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(src
SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
@@ -29,7 +31,7 @@ else ifneq ($(DOCBOOKS),)
else # HAVE_SPHINX
# User-friendly check for pdflatex
HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
@@ -51,8 +53,8 @@ loop_cmd = $(echo-cmd) $(cmd_$(1))
# $5 reST source folder relative to $(srctree)/$(src),
# e.g. "media" for the linux-tv book-set at ./Documentation/media
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4);
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2;\
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(SPHINXBUILD) \
-b $2 \
@@ -67,16 +69,19 @@ htmldocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
latexdocs:
ifeq ($(HAVE_PDFLATEX),0)
$(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
endif # HAVE_PDFLATEX
ifeq ($(HAVE_PDFLATEX),0)
pdfdocs:
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
pdfdocs: latexdocs
ifneq ($(HAVE_PDFLATEX),0)
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex)
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex;)
endif # HAVE_PDFLATEX
epubdocs:
@@ -93,6 +98,7 @@ installmandocs:
cleandocs:
$(Q)rm -rf $(BUILDDIR)
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) -C Documentation/media clean
endif # HAVE_SPHINX


@@ -1,276 +0,0 @@
Linux kernel management style
This is a short document describing the preferred (or made up, depending
on who you ask) management style for the linux kernel. It's meant to
mirror the CodingStyle document to some degree, and mainly written to
avoid answering (*) the same (or similar) questions over and over again.
Management style is very personal and much harder to quantify than
simple coding style rules, so this document may or may not have anything
to do with reality. It started as a lark, but that doesn't mean that it
might not actually be true. You'll have to decide for yourself.
Btw, when talking about "kernel manager", it's all about the technical
lead persons, not the people who do traditional management inside
companies. If you sign purchase orders or you have any clue about the
budget of your group, you're almost certainly not a kernel manager.
These suggestions may or may not apply to you.
First off, I'd suggest buying "Seven Habits of Highly Effective
People", and NOT read it. Burn it, it's a great symbolic gesture.
(*) This document does so not so much by answering the question, but by
making it painfully obvious to the questioner that we don't have a clue
to what the answer is.
Anyway, here goes:
Chapter 1: Decisions
Everybody thinks managers make decisions, and that decision-making is
important. The bigger and more painful the decision, the bigger the
manager must be to make it. That's very deep and obvious, but it's not
actually true.
The name of the game is to _avoid_ having to make a decision. In
particular, if somebody tells you "choose (a) or (b), we really need you
to decide on this", you're in trouble as a manager. The people you
manage had better know the details better than you, so if they come to
you for a technical decision, you're screwed. You're clearly not
competent to make that decision for them.
(Corollary: if the people you manage don't know the details better than
you, you're also screwed, although for a totally different reason.
Namely that you are in the wrong job, and that _they_ should be managing
your brilliance instead).
So the name of the game is to _avoid_ decisions, at least the big and
painful ones. Making small and non-consequential decisions is fine, and
makes you look like you know what you're doing, so what a kernel manager
needs to do is to turn the big and painful ones into small things where
nobody really cares.
It helps to realize that the key difference between a big decision and a
small one is whether you can fix your decision afterwards. Any decision
can be made small by just always making sure that if you were wrong (and
you _will_ be wrong), you can always undo the damage later by
backtracking. Suddenly, you get to be doubly managerial for making
_two_ inconsequential decisions - the wrong one _and_ the right one.
And people will even see that as true leadership (*cough* bullshit
*cough*).
Thus the key to avoiding big decisions becomes simply not doing
things that can't be undone. Don't get ushered into a corner from which
you cannot escape. A cornered rat may be dangerous - a cornered manager
is just pitiful.
It turns out that since nobody would be stupid enough to ever really let
a kernel manager have huge fiscal responsibility _anyway_, it's usually
fairly easy to backtrack. Since you're not going to be able to waste
huge amounts of money that you might not be able to repay, the only
thing you can backtrack on is a technical decision, and there
back-tracking is very easy: just tell everybody that you were an
incompetent nincompoop, say you're sorry, and undo all the worthless
work you had people work on for the last year. Suddenly the decision
you made a year ago wasn't a big decision after all, since it could be
easily undone.
It turns out that some people have trouble with this approach, for two
reasons:
- admitting you were an idiot is harder than it looks. We all like to
maintain appearances, and coming out in public to say that you were
wrong is sometimes very hard indeed.
- having somebody tell you that what you worked on for the last year
wasn't worthwhile after all can be hard on the poor lowly engineers
too, and while the actual _work_ was easy enough to undo by just
deleting it, you may have irrevocably lost the trust of that
engineer. And remember: "irrevocable" was what we tried to avoid in
the first place, and your decision ended up being a big one after
all.
Happily, both of these reasons can be mitigated effectively by just
admitting up-front that you don't have a friggin' clue, and telling
people ahead of the fact that your decision is purely preliminary, and
might be the wrong thing. You should always reserve the right to change
your mind, and make people very _aware_ of that. And it's much easier
to admit that you are stupid when you haven't _yet_ done the really
stupid thing.
Then, when it really does turn out to be stupid, people just roll their
eyes and say "Oops, he did it again".
This preemptive admission of incompetence might also make the people who
actually do the work also think twice about whether it's worth doing or
not. After all, if _they_ aren't certain whether it's a good idea, you
sure as hell shouldn't encourage them by promising them that what they
work on will be included. Make them at least think twice before they
embark on a big endeavor.
Remember: they'd better know more about the details than you do, and
they usually already think they have the answer to everything. The best
thing you can do as a manager is not to instill confidence, but rather a
healthy dose of critical thinking on what they do.
Btw, another way to avoid a decision is to plaintively just whine "can't
we just do both?" and look pitiful. Trust me, it works. If it's not
clear which approach is better, they'll eventually figure it out. The
answer may end up being that both teams get so frustrated by the
situation that they just give up.
That may sound like a failure, but it's usually a sign that there was
something wrong with both projects, and the reason the people involved
couldn't decide was that they were both wrong. You end up coming up
smelling like roses, and you avoided yet another decision that you could
have screwed up on.
Chapter 2: People
Most people are idiots, and being a manager means you'll have to deal
with it, and perhaps more importantly, that _they_ have to deal with
_you_.
It turns out that while it's easy to undo technical mistakes, it's not
as easy to undo personality disorders. You just have to live with
theirs - and yours.
However, in order to prepare yourself as a kernel manager, it's best to
remember not to burn any bridges, bomb any innocent villagers, or
alienate too many kernel developers. It turns out that alienating people
is fairly easy, and un-alienating them is hard. Thus "alienating"
immediately falls under the heading of "not reversible", and becomes a
no-no according to Chapter 1.
There's just a few simple rules here:
(1) don't call people d*ckheads (at least not in public)
(2) learn how to apologize when you forgot rule (1)
The problem with #1 is that it's very easy to do, since you can say
"you're a d*ckhead" in millions of different ways (*), sometimes without
even realizing it, and almost always with a white-hot conviction that
you are right.
And the more convinced you are that you are right (and let's face it,
you can call just about _anybody_ a d*ckhead, and you often _will_ be
right), the harder it ends up being to apologize afterwards.
To solve this problem, you really only have two options:
- get really good at apologies
- spread the "love" out so evenly that nobody really ends up feeling
like they get unfairly targeted. Make it inventive enough, and they
might even be amused.
The option of being unfailingly polite really doesn't exist. Nobody will
trust somebody who is so clearly hiding his true character.
(*) Paul Simon sang "Fifty Ways to Leave Your Lover", because quite
frankly, "A Million Ways to Tell a Developer He Is a D*ckhead" doesn't
scan nearly as well. But I'm sure he thought about it.
Chapter 3: People II - the Good Kind
While it turns out that most people are idiots, the corollary to that is
sadly that you are one too, and that while we can all bask in the secure
knowledge that we're better than the average person (let's face it,
nobody ever believes that they're average or below-average), we should
also admit that we're not the sharpest knife around, and there will be
other people that are less of an idiot than you are.
Some people react badly to smart people. Others take advantage of them.
Make sure that you, as a kernel maintainer, are in the second group.
Suck up to them, because they are the people who will make your job
easier. In particular, they'll be able to make your decisions for you,
which is what the game is all about.
So when you find somebody smarter than you are, just coast along. Your
management responsibilities largely become ones of saying "Sounds like a
good idea - go wild", or "That sounds good, but what about xxx?". The
second version in particular is a great way to either learn something
new about "xxx" or seem _extra_ managerial by pointing out something the
smarter person hadn't thought about. In either case, you win.
One thing to look out for is to realize that greatness in one area does
not necessarily translate to other areas. So you might prod people in
specific directions, but let's face it, they might be good at what they
do, and suck at everything else. The good news is that people tend to
naturally gravitate back to what they are good at, so it's not like you
are doing something irreversible when you _do_ prod them in some
direction, just don't push too hard.
Chapter 4: Placing blame
Things will go wrong, and people want somebody to blame. Tag, you're it.
It's not actually that hard to accept the blame, especially if people
kind of realize that it wasn't _all_ your fault. Which brings us to the
best way of taking the blame: do it for another guy. You'll feel good
for taking the fall, he'll feel good about not getting blamed, and the
guy who lost his whole 36GB porn-collection because of your incompetence
will grudgingly admit that you at least didn't try to weasel out of it.
Then make the developer who really screwed up (if you can find him) know
_in_private_ that he screwed up. Not just so he can avoid it in the
future, but so that he knows he owes you one. And, perhaps even more
importantly, he's also likely the person who can fix it. Because, let's
face it, it sure ain't you.
Taking the blame is also why you get to be manager in the first place.
It's part of what makes people trust you, and allow you the potential
glory, because you're the one who gets to say "I screwed up". And if
you've followed the previous rules, you'll be pretty good at saying that
by now.
Chapter 5: Things to avoid
There's one thing people hate even more than being called "d*ckhead",
and that is being called a "d*ckhead" in a sanctimonious voice. The
first you can apologize for, the second one you won't really get the
chance. They likely will no longer be listening even if you otherwise
do a good job.
We all think we're better than anybody else, which means that when
somebody else puts on airs, it _really_ rubs us the wrong way. You may
be morally and intellectually superior to everybody around you, but
don't try to make it too obvious unless you really _intend_ to irritate
somebody (*).
Similarly, don't be too polite or subtle about things. Politeness easily
ends up going overboard and hiding the problem, and as they say, "On the
internet, nobody can hear you being subtle". Use a big blunt object to
hammer the point in, because you can't really depend on people getting
your point otherwise.
Some humor can help pad both the bluntness and the moralizing. Going
overboard to the point of being ridiculous can drive a point home
without making it painful to the recipient, who just thinks you're being
silly. It can thus help get through the personal mental block we all
have about criticism.
(*) Hint: internet newsgroups that are not directly related to your work
are great ways to take out your frustrations at other people. Write
insulting posts with a sneer just to get into a good flame every once in
a while, and you'll feel cleansed. Just don't crap too close to home.
Chapter 6: Why me?
Since your main responsibility seems to be to take the blame for other
people's mistakes, and make it painfully obvious to everybody else that
you're incompetent, the obvious question becomes one of why do it in the
first place?
First off, while you may or may not get screaming teenage girls (or
boys, let's not be judgmental or sexist here) knocking on your dressing
room door, you _will_ get an immense feeling of personal accomplishment
for being "in charge". Never mind the fact that you're really leading
by trying to keep up with everybody else and running after them as fast
as you can. Everybody will still think you're the person in charge.
It's a great job if you can hack it.


@@ -49,25 +49,17 @@ depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
CONFIG_PCIEAER = y.
2.2 Load PCI Express AER Root Driver
There is a case where a system has AER support in BIOS. Enabling the AER
Root driver and having AER support in BIOS may result unpredictable
behavior. To avoid this conflict, a successful load of the AER Root driver
requires ACPI _OSC support in the BIOS to allow the AER Root driver to
request for native control of AER. See the PCI FW 3.0 Specification for
details regarding OSC usage. Currently, lots of firmwares don't provide
_OSC support while they use PCI Express. To support such firmwares,
forceload, a parameter of type bool, could enable AER to continue to
be initiated although firmwares have no _OSC support. To enable the
walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
when booting kernel. Note that forceload=n by default.
nosourceid, another parameter of type bool, can be used when broken
hardware (mostly chipsets) has root ports that cannot obtain the reporting
source ID. nosourceid=n by default.
Some systems have AER support in firmware. Enabling Linux AER support at
the same time the firmware handles AER may result in unpredictable
behavior. Therefore, Linux does not handle AER events unless the firmware
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
Specification for details regarding _OSC usage.
2.3 AER error output
When a PCI-E AER error is captured, an error message will be outputted to
console. If it's a correctable error, it is outputted as a warning.
When a PCIe AER error is captured, an error message will be output to
console. If it's a correctable error, it is output as a warning.
Otherwise, it is printed as an error. Users can therefore choose a different
log level to filter out correctable error messages.


@@ -547,7 +547,7 @@ The <tt>rcu_access_pointer()</tt> on line&nbsp;6 is similar to
It could reuse a value formerly fetched from this same pointer.
It could also fetch the pointer from <tt>gp</tt> in a byte-at-a-time
manner, resulting in <i>load tearing</i>, in turn resulting in a bytewise
mash-up of two distince pointer values.
mash-up of two distinct pointer values.
It might even use value-speculation optimizations, where it makes
a wrong guess, but by the time it gets around to checking the
value, an update has changed the pointer to match the wrong guess.
@@ -659,6 +659,29 @@ systems with more than one CPU:
In other words, a given instance of <tt>synchronize_rcu()</tt>
can avoid waiting on a given RCU read-side critical section only
if it can prove that <tt>synchronize_rcu()</tt> started first.
<p>
A related question is &ldquo;When <tt>rcu_read_lock()</tt>
doesn't generate any code, why does it matter how it relates
to a grace period?&rdquo;
The answer is that it is not the relationship of
<tt>rcu_read_lock()</tt> itself that is important, but rather
the relationship of the code within the enclosed RCU read-side
critical section to the code preceding and following the
grace period.
If we take this viewpoint, then a given RCU read-side critical
section begins before a given grace period when some access
preceding the grace period observes the effect of some access
within the critical section, in which case none of the accesses
within the critical section may observe the effects of any
access following the grace period.
<p>
As of late 2016, mathematical models of RCU take this
viewpoint, for example, see slides&nbsp;62 and&nbsp;63
of the
<a href="http://www2.rdrop.com/users/paulmck/scalability/paper/LinuxMM.2016.10.04c.LCE.pdf">2016 LinuxCon EU</a>
presentation.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
@@ -2493,6 +2516,28 @@ or some future &ldquo;lazy&rdquo;
variant of <tt>call_rcu()</tt> that might one day be created for
energy-efficiency purposes.
<p>
That said, there are limits.
RCU requires that the <tt>rcu_head</tt> structure be aligned to a
two-byte boundary, and passing a misaligned <tt>rcu_head</tt>
structure to one of the <tt>call_rcu()</tt> family of functions
will result in a splat.
It is therefore necessary to exercise caution when packing
structures containing fields of type <tt>rcu_head</tt>.
Why not a four-byte or even eight-byte alignment requirement?
Because the m68k architecture provides only two-byte alignment,
and thus acts as alignment's least common denominator.
<p>
The reason for reserving the bottom bit of pointers to
<tt>rcu_head</tt> structures is to leave the door open to
&ldquo;lazy&rdquo; callbacks whose invocations can safely be deferred.
Deferring invocation could potentially have energy-efficiency
benefits, but only if the rate of non-lazy callbacks decreases
significantly for some important workload.
In the meantime, reserving the bottom bit keeps this option open
in case it one day becomes useful.
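The packing hazard mentioned above can be illustrated with a small userspace sketch. It is not kernel code and rests on stated assumptions: struct demo_rcu_head is a hypothetical stand-in for rcu_head, the packed attribute is the GCC/Clang extension, and the enclosing object is assumed to start at an even address. Under those assumptions, a field at an odd offset ends up at an address with its bottom bit set, which is the condition the two-byte alignment requirement rules out.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Natural layout: the compiler inserts padding after tag. */
struct natural_obj {
	char tag;
	struct demo_rcu_head rh;
};

/* Packed layout (GCC/Clang extension): no padding, so rh is at offset 1. */
struct packed_obj {
	char tag;
	struct demo_rcu_head rh;
} __attribute__((packed));

/*
 * Nonzero when a field at this offset keeps its bottom address bit clear,
 * assuming the enclosing object itself starts at an even address.
 */
static int offset_keeps_bottom_bit_clear(size_t offset)
{
	return (offset & 1) == 0;
}

static int natural_head_ok(void)
{
	return offset_keeps_bottom_bit_clear(offsetof(struct natural_obj, rh));
}

static int packed_head_ok(void)
{
	return offset_keeps_bottom_bit_clear(offsetof(struct packed_obj, rh));
}
```

In this sketch natural_head_ok() reports success while packed_head_ok() does not, which is why packing a structure that embeds such a field calls for an explicit alignment check or explicit padding.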
<h3><a name="Performance, Scalability, Response Time, and Reliability">
Performance, Scalability, Response Time, and Reliability</a></h3>


@@ -57,7 +57,7 @@ Call Trace:
[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
[<ffffffff81097510>] ? __init_kthread_worker+0x70/0x70
[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
[<ffffffff817db150>] ? gs_change+0xb/0xb
Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:


@@ -10,21 +10,6 @@ status messages via printk(), which can be examined via the dmesg
command (perhaps grepping for "torture"). The test is started
when the module is loaded, and stops when the module is unloaded.
CONFIG_RCU_TORTURE_TEST_RUNNABLE
It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
result in the tests being loaded into the base kernel. In this case,
the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
whether the RCU torture tests are to be started immediately during
boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
to enable them. This /proc file can be used to repeatedly pause and
restart the tests, regardless of the initial state specified by the
CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
You will normally -not- want to start the RCU torture tests during boot
(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
this can sometimes be useful in finding boot-time bugs.
MODULE PARAMETERS


@@ -237,7 +237,7 @@ rcu_dereference()
The reader uses rcu_dereference() to fetch an RCU-protected
pointer, which returns a value that may then be safely
dereferenced. Note that rcu_deference() does not actually
dereferenced. Note that rcu_dereference() does not actually
dereference the pointer, instead, it protects the pointer for
later dereferencing. It also executes any needed memory-barrier
instructions for a given CPU architecture. Currently, only Alpha


@@ -1,38 +0,0 @@
Linux kernel developers take security very seriously. As such, we'd
like to know when a security bug is found so that it can be fixed and
disclosed as quickly as possible. Please report security bugs to the
Linux kernel security team.
1) Contact
The Linux kernel security team can be contacted by email at
<security@kernel.org>. This is a private list of security officers
who will help verify the bug report and develop and release a fix.
It is possible that the security team will bring in extra help from
area maintainers to understand and fix the security vulnerability.
As it is with any bug, the more information provided the easier it
will be to diagnose and fix. Please review the procedure outlined in
REPORTING-BUGS if you are unclear about what information is helpful.
Any exploit code is very helpful and will not be released without
consent from the reporter unless it has already been made public.
2) Disclosure
The goal of the Linux kernel security team is to work with the
bug submitter on bug resolution as well as disclosure. We prefer
to fully disclose the bug as soon as possible. It is reasonable to
delay disclosure when the bug or the fix is not yet fully understood,
the solution is not well-tested or for vendor coordination. However, we
expect these delays to be short, measurable in days, not weeks or months.
A disclosure date is negotiated by the security team working with the
bug submitter as well as vendors. However, the kernel security team
holds the final say when setting a disclosure date. The timeframe for
disclosure is from immediate (esp. if it's already publicly known)
to a few weeks. As a basic default policy, we expect report date to
disclosure date to be on the order of 7 days.
3) Non-disclosure agreements
The Linux kernel security team is not a formal body and is therefore unable
to enter into any non-disclosure agreements.

View File

@@ -1,109 +0,0 @@
Linux Kernel patch submission checklist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here are some basic things that developers should do if they want to see their
kernel patch submissions accepted more quickly.
These are all above and beyond the documentation that is provided in
Documentation/SubmittingPatches and elsewhere regarding submitting Linux
kernel patches.
1: If you use a facility then #include the file that defines/declares
that facility. Don't depend on other header files pulling in ones
that you use.
2: Builds cleanly with applicable or modified CONFIG options =y, =m, and
=n. No gcc warnings/errors, no linker warnings/errors.
2b: Passes allnoconfig, allmodconfig
2c: Builds successfully when using O=builddir
3: Builds on multiple CPU architectures by using local cross-compile tools
or some other build farm.
4: ppc64 is a good architecture for cross-compilation checking because it
tends to use `unsigned long' for 64-bit quantities.
5: Check your patch for general style as detailed in
Documentation/CodingStyle. Check for trivial violations with the
patch style checker prior to submission (scripts/checkpatch.pl).
You should be able to justify all violations that remain in
your patch.
6: Any new or modified CONFIG options don't muck up the config menu.
7: All new Kconfig options have help text.
8: Has been carefully reviewed with respect to relevant Kconfig
combinations. This is very hard to get right with testing -- brainpower
pays off here.
9: Check cleanly with sparse.
10: Use 'make checkstack' and 'make namespacecheck' and fix any problems
that they find. Note: checkstack does not point out problems explicitly,
but any one function that uses more than 512 bytes on the stack is a
candidate for change.
11: Include kernel-doc to document global kernel APIs. (Not required for
static functions, but OK there also.) Use 'make htmldocs' or 'make
mandocs' to check the kernel-doc and fix any issues.
12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_PROVE_RCU
and CONFIG_DEBUG_OBJECTS_RCU_HEAD all simultaneously enabled.
13: Has been build- and runtime tested with and without CONFIG_SMP and
CONFIG_PREEMPT.
14: If the patch affects IO/Disk, etc: has been tested with and without
CONFIG_LBDAF.
15: All codepaths have been exercised with all lockdep features enabled.
16: All new /proc entries are documented under Documentation/
17: All new kernel boot parameters are documented in
Documentation/kernel-parameters.txt.
18: All new module parameters are documented with MODULE_PARM_DESC()
19: All new userspace interfaces are documented in Documentation/ABI/.
See Documentation/ABI/README for more information.
Patches that change userspace interfaces should be CCed to
linux-api@vger.kernel.org.
20: Check that it all passes `make headers_check'.
21: Has been checked with injection of at least slab and page-allocation
failures. See Documentation/fault-injection/.
If the new code is substantial, addition of subsystem-specific fault
injection might be appropriate.
22: Newly-added code has been compiled with `gcc -W' (use "make
EXTRA_CFLAGS=-W"). This will generate lots of noise, but is good for
finding bugs like "warning: comparison between signed and unsigned".
23: Tested after it has been merged into the -mm patchset to make sure
that it still works with all of the other queued patches and various
changes in the VM, VFS, and other subsystems.
24: All memory barriers {e.g., barrier(), rmb(), wmb()} need a comment in the
source code that explains the logic of what they are doing and why.
25: If any ioctls are added by the patch, then also update
Documentation/ioctl/ioctl-number.txt.
26: If your modified source code depends on or uses any of the kernel
APIs or features that are related to the following kconfig symbols,
then test multiple builds with the related kconfig symbols disabled
and/or =m (if that option is available) [not all of these at the
same time, just various/random combinations of them]:
CONFIG_SMP, CONFIG_SYSFS, CONFIG_PROC_FS, CONFIG_INPUT, CONFIG_PCI,
CONFIG_BLOCK, CONFIG_PM, CONFIG_MAGIC_SYSRQ,
CONFIG_NET, CONFIG_INET=n (but latter with CONFIG_NET=y)
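
Several of the build-related items above (2b, 2c, 5, 9, 10, 11 and 20) correspond to commands run from the top of a kernel tree. The following is only a print-only sketch under stated assumptions: "builddir" and "my-change.patch" are placeholders, and "make C=1" is the usual way to invoke sparse but is not named in the checklist itself. Nothing is executed; the function just lists the commands so they can be run by hand.

```shell
#!/bin/sh
# Print-only sketch: list commands behind the build-related checklist items.
# Order follows the checklist: 2b, 2b, 2c, 5, 9, 10, 10, 11, 20.
# "builddir" and "my-change.patch" are placeholders, not real paths.
checklist_commands() {
	cat <<'EOF'
make allnoconfig && make
make allmodconfig && make
make O=builddir allmodconfig && make O=builddir
scripts/checkpatch.pl my-change.patch
make C=1
make checkstack
make namespacecheck
make htmldocs
make headers_check
EOF
}

checklist_commands
```

Note that an O=builddir build generally expects a clean source tree, so the in-tree and out-of-tree builds are best done in separate checkouts.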

View File

@@ -1,163 +0,0 @@
Submitting Drivers For The Linux Kernel
---------------------------------------
This document is intended to explain how to submit device drivers to the
various kernel trees. Note that if you are interested in video card drivers
you should probably talk to XFree86 (http://www.xfree86.org/) and/or X.Org
(http://x.org/) instead.
Also read the Documentation/SubmittingPatches document.
Allocating Device Numbers
-------------------------
Major and minor numbers for block and character devices are allocated
by the Linux assigned name and number authority (currently this is
Torben Mathiasen). The site is http://www.lanana.org/. This
also deals with allocating numbers for devices that are not going to
be submitted to the mainstream kernel.
See Documentation/devices.txt for more information on this.
If you don't use assigned numbers then when your device is submitted it will
be given an assigned number even if that is different from values you may
have shipped to customers before.
Who To Submit Drivers To
------------------------
Linux 2.0:
No new drivers are accepted for this kernel tree.
Linux 2.2:
No new drivers are accepted for this kernel tree.
Linux 2.4:
If the code area has a general maintainer then please submit it to
the maintainer listed in the MAINTAINERS file in the kernel tree. If the
maintainer does not respond or you cannot find the appropriate
maintainer then please contact Willy Tarreau <w@1wt.eu>.
Linux 2.6:
The same rules apply as 2.4 except that you should follow linux-kernel
to track changes in APIs. The final contact point for Linux 2.6
submissions is Andrew Morton.
What Criteria Determine Acceptance
----------------------------------
Licensing: The code must be released to us under the
GNU General Public License. We don't insist on any kind
of exclusive GPL licensing, and if you wish the driver
to be useful to other communities such as BSD you may well
wish to release under multiple licenses.
See accepted licenses at include/linux/module.h
Copyright: The copyright owner must agree to use of GPL.
It's best if the submitter and copyright owner
are the same person/entity. If not, the name of
the person/entity authorizing use of GPL should be
listed in case it's necessary to verify the will of
the copyright owner.
Interfaces: If your driver uses existing interfaces and behaves like
other drivers in the same class it will be much more likely
to be accepted than if it invents gratuitous new ones.
If you need to implement a common API over Linux and NT
drivers do it in userspace.
Code: Please use the Linux style of code formatting as documented
in Documentation/CodingStyle. If you have sections of code
that need to be in other formats, for example because they
are shared with a Windows driver kit and you want to
maintain them just once, separate them out nicely and note
this fact.
Portability: Pointers are not always 32 bits, not all computers are little
endian, people do not all have floating point and you
shouldn't use inline x86 assembler in your driver without
careful thought. Pure x86 drivers generally are not popular.
If you only have x86 hardware it is hard to test portability
but it is easy to make sure the code can easily be made
portable.
Clarity: It helps if anyone can see how to fix the driver. It helps
you because you get patches, not bug reports. If you submit a
driver that intentionally obfuscates how the hardware works
it will go in the bitbucket.
PM support: Since Linux is used on many portable and desktop systems, your
driver is likely to be used on such a system and therefore it
should support basic power management by implementing, if
necessary, the .suspend and .resume methods used during the
system-wide suspend and resume transitions. You should verify
that your driver correctly handles the suspend and resume, but
if you are unable to ensure that, please at least define the
.suspend method returning the -ENOSYS ("Function not
implemented") error. You should also try to make sure that your
driver uses as little power as possible when it's not doing
anything. For the driver testing instructions see
Documentation/power/drivers-testing.txt and for a relatively
complete overview of the power management issues related to
drivers see Documentation/power/devices.txt .
Control: In general if there is active maintenance of a driver by
the author then patches will be redirected to them unless
they are totally obvious and without need of checking.
If you want to be the contact and update point for the
driver it is a good idea to state this in the comments,
and include an entry in MAINTAINERS for your driver.
What Criteria Do Not Determine Acceptance
-----------------------------------------
Vendor: Being the hardware vendor and maintaining the driver is
often a good thing. If there is a stable working driver from
other people already in the tree don't expect 'we are the
vendor' to get your driver chosen. Ideally work with the
existing driver author to build a single perfect driver.
Author: It doesn't matter if a large Linux company wrote the driver,
or you did. Nobody has any special access to the kernel
tree. Anyone who tells you otherwise isn't telling the
whole story.
Resources
---------
Linux kernel master tree:
ftp.??.kernel.org:/pub/linux/kernel/...
?? == your country code, such as "us", "uk", "fr", etc.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git
Linux kernel mailing list:
linux-kernel@vger.kernel.org
[mail majordomo@vger.kernel.org to subscribe]
Linux Device Drivers, Third Edition (covers 2.6.10):
http://lwn.net/Kernel/LDD3/ (free version)
LWN.net:
Weekly summary of kernel development activity - http://lwn.net/
2.6 API changes:
http://lwn.net/Articles/2.6-kernel-api/
Porting drivers from prior kernels to 2.6:
http://lwn.net/Articles/driver-porting/
KernelNewbies:
Documentation and assistance for new kernel programmers
http://kernelnewbies.org/
Linux USB project:
http://www.linux-usb.org/
How to NOT write kernel driver by Arjan van de Ven:
http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
Kernel Janitor:
http://kernelnewbies.org/KernelJanitors
GIT, Fast Version Control System:
http://git-scm.com/

View File

@@ -1,821 +1 @@
How to Get Your Change Into the Linux Kernel
or
Care And Operation Of Your Linus Torvalds
For a person or company who wishes to submit a change to the Linux
kernel, the process can sometimes be daunting if you're not familiar
with "the system." This text is a collection of suggestions which
can greatly increase the chances of your change being accepted.
This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process
works, see Documentation/development-process. Also, read
Documentation/SubmitChecklist for a list of items to check before
submitting code. If you are submitting a driver, also read
Documentation/SubmittingDrivers; for device tree binding patches, read
Documentation/devicetree/bindings/submitting-patches.txt.
Many of these steps describe the default behavior of the git version
control system; if you use git to prepare your patches, you'll find much
of the mechanical work done for you, though you'll still need to prepare
and document a sensible set of patches. In general, use of git will make
your life as a kernel developer easier.
--------------------------------------------
SECTION 1 - CREATING AND SENDING YOUR CHANGE
--------------------------------------------
0) Obtain a current source tree
-------------------------------
If you do not have a repository with the current kernel source handy, use
git to obtain one. You'll want to start with the mainline repository,
which can be grabbed with:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Note, however, that you may not want to develop against the mainline tree
directly. Most subsystem maintainers run their own trees and want to see
patches prepared against those trees. See the "T:" entry for the subsystem
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
the tree is not listed there.
It is still possible to download kernel releases via tarballs (as described
in the next section), but that is the hard way to do kernel development.
1) "diff -up"
-------------
If you must generate your patches by hand, use "diff -up" or "diff -uprN"
to create patches. Git generates patches in this form by default; if
you're using git, you can skip this section entirely.
All changes to the Linux kernel occur in the form of patches, as
generated by diff(1). When creating your patch, make sure to create it
in "unified diff" format, as supplied by the '-u' argument to diff(1).
Also, please use the '-p' argument which shows which C function each
change is in - that makes the resultant diff a lot easier to read.
Patches should be based in the root kernel source directory,
not in any lower subdirectory.
To create a patch for a single file, it is often sufficient to do:
SRCTREE=linux
MYFILE=drivers/net/mydriver.c
cd $SRCTREE
cp $MYFILE $MYFILE.orig
vi $MYFILE # make your change
cd ..
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
To create a patch for multiple files, you should unpack a "vanilla",
or unmodified kernel source tree, and generate a diff against your
own source tree. For example:
MYSRC=/devel/linux
tar xvfz linux-3.19.tar.gz
mv linux-3.19 linux-3.19-vanilla
diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
linux-3.19-vanilla $MYSRC > /tmp/patch
"dontdiff" is a list of files which are generated by the kernel during
the build process, and should be ignored in any diff(1)-generated
patch.
Make sure your patch does not include any extra files which do not
belong in a patch submission. Make sure to review your patch -after-
generating it with diff(1), to ensure accuracy.
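
As a rough illustration of the review step, the sketch below builds two tiny throwaway trees, diffs them with "diff -uprN", and lists the files the resulting patch touches. The directory names and file contents are made up for the demo; only the diff options come from the text above.

```shell
#!/bin/sh
# Scratch demo: diff two tiny trees and list what the patch touches.
# All paths and file contents here are invented for illustration.
work=$(mktemp -d)
mkdir -p "$work/linux-vanilla/drivers/net" "$work/linux/drivers/net"
printf 'int probe(void)\n{\n\treturn 0;\n}\n' \
	> "$work/linux-vanilla/drivers/net/mydriver.c"
printf 'int probe(void)\n{\n\treturn 1;\n}\n' \
	> "$work/linux/drivers/net/mydriver.c"

cd "$work" || exit 1

# diff exits with status 1 when the trees differ; that is expected here.
diff_status=0
diff -uprN linux-vanilla linux > patch.diff || diff_status=$?

# Every file the patch modifies has one "+++ " header line.
touched=$(grep -c '^+++ ' patch.diff)
echo "diff status: $diff_status, files touched: $touched"
grep '^+++ ' patch.diff
```

If the list of "+++" lines contains anything you did not intend to change (editor backups, build output), the patch needs to be regenerated.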
If your changes produce a lot of deltas, you need to split them into
individual patches which modify things in logical stages; see section
#3. This will facilitate review by other kernel developers,
which is very important if you want your patch accepted.
If you're using git, "git rebase -i" can help you with this process. If
you're not using git, quilt <http://savannah.nongnu.org/projects/quilt>
is another popular alternative.
2) Describe your changes.
-------------------------
Describe your problem. Whether your patch is a one-line bug fix or
5000 lines of a new feature, there must be an underlying problem that
motivated you to do this work. Convince the reviewer that there is a
problem worth fixing and that it makes sense for them to read past the
first paragraph.
Describe user-visible impact. Straight up crashes and lockups are
pretty convincing, but not all bugs are that blatant. Even if the
problem was spotted during code review, describe the impact you think
it can have on users. Keep in mind that the majority of Linux
installations run kernels from secondary stable trees or
vendor/product-specific trees that cherry-pick only specific patches
from upstream, so include anything that could help route your change
downstream: provoking circumstances, excerpts from dmesg, crash
descriptions, performance regressions, latency spikes, lockups, etc.
Quantify optimizations and trade-offs. If you claim improvements in
performance, memory consumption, stack footprint, or binary size,
include numbers that back them up. But also describe non-obvious
costs. Optimizations usually aren't free, but are trade-offs between CPU,
memory, and readability; or, when it comes to heuristics, between
different workloads. Describe the expected downsides of your
optimization so that the reviewer can weigh costs against benefits.
Once the problem is established, describe what you are actually doing
about it in technical detail. It's important to describe the change
in plain English for the reviewer to verify that the code is behaving
as you intend it to.
The maintainer will thank you if you write your patch description in a
form which can be easily pulled into Linux's source code management
system, git, as a "commit log". See #15, below.
Solve only one problem per patch. If your description starts to get
long, that's a sign that you probably need to split up your patch.
See #3, next.
When you submit or resubmit a patch or patch series, include the
complete patch description and justification for it. Don't just
say that this is version N of the patch (series). Don't expect the
subsystem maintainer to refer back to earlier patch versions or referenced
URLs to find the patch description and put that into the patch.
I.e., the patch (series) and its description should be self-contained.
This benefits both the maintainers and reviewers. Some reviewers
probably didn't even receive earlier versions of the patch.
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.
If the patch fixes a logged bug entry, refer to that bug entry by
number and URL. If the patch follows from a mailing list discussion,
give a URL to the mailing list archive; use the https://lkml.kernel.org/
redirector with a Message-Id, to ensure that the links cannot become
stale.
However, try to make your explanation understandable without external
resources. In addition to giving a URL to a mailing list archive or
bug, summarize the relevant points of the discussion that led to the
patch as submitted.
If you want to refer to a specific commit, don't just refer to the
SHA-1 ID of the commit. Please also include the oneline summary of
the commit, to make it easier for reviewers to know what it is about.
Example:
Commit e21d2170f36602ae2708 ("video: remove unnecessary
platform_set_drvdata()") removed the unnecessary
platform_set_drvdata(), but left the variable "dev" unused,
delete it.
You should also be sure to use at least the first twelve characters of the
SHA-1 ID. The kernel repository holds a *lot* of objects, making
collisions with shorter IDs a real possibility. Bear in mind that, even if
there is no collision with your six-character ID now, that condition may
change five years from now.
If your patch fixes a bug in a specific commit, e.g. you found an issue using
git-bisect, please use the 'Fixes:' tag with the first 12 characters of the
SHA-1 ID, and the one line summary. For example:
Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
The following git-config settings can be used to add a pretty format for
outputting the above style in the git log or git show commands:
[core]
abbrev = 12
[pretty]
fixes = Fixes: %h (\"%s\")
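
To see what that format produces, the sketch below creates a throwaway repository with a single empty commit and prints a Fixes: line for it. It passes the settings on the command line ("-c core.abbrev=12" and an explicit --format string) instead of editing a config file; the repository, identity and commit are invented for the demo.

```shell
#!/bin/sh
# Scratch demo: print a Fixes: line for the newest commit in a throwaway repo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Identity is passed per command so the demo does not depend on global config.
git -c user.name='Random J Developer' \
    -c user.email='random@developer.example.org' \
    commit -q --allow-empty \
    -m 'video: remove unnecessary platform_set_drvdata()'

# core.abbrev=12 mirrors the [core] setting; the format string mirrors
# the "fixes" pretty format.
fixes_line=$(git -c core.abbrev=12 log -1 --format='Fixes: %h ("%s")')
echo "$fixes_line"
```

With the two settings stored in your git configuration as shown above, "git log -1 --pretty=fixes <commit>" should give the same style of line for a real commit.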
3) Separate your changes.
-------------------------
Separate each _logical change_ into a separate patch.
For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more patches. If your changes include an API update, and a new
driver which uses that new API, separate those into two patches.
On the other hand, if you make a single change to numerous files,
group those changes into a single patch. Thus a single logical change
is contained within a single patch.
The point to remember is that each patch should make an easily understood
change that can be verified by reviewers. Each patch should be justifiable
on its own merits.
If one patch depends on another patch in order for a change to be
complete, that is OK. Simply note "this patch depends on patch X"
in your patch description.
When dividing your change into a series of patches, take special care to
ensure that the kernel builds and runs properly after each patch in the
series. Developers using "git bisect" to track down a problem can end up
splitting your patch series at any point; they will not thank you if you
introduce bugs in the middle.
If you cannot condense your patch set into a smaller set of patches,
then post only, say, 15 or so at a time and wait for review and integration.
4) Style-check your changes.
----------------------------
Check your patch for basic style violations, details of which can be
found in Documentation/CodingStyle. Failure to do so simply wastes
the reviewers' time and will get your patch rejected, probably
without even being read.
One significant exception is when moving code from one file to
another -- in this case you should not modify the moved code at all in
the same patch which moves it. This clearly delineates the act of
moving the code and your changes. This greatly aids review of the
actual differences and allows tools to better track the history of
the code itself.
Check your patches with the patch style checker prior to submission
(scripts/checkpatch.pl). Note, though, that the style checker should be
viewed as a guide, not as a replacement for human judgment. If your code
looks better with a violation then it's probably best left alone.
The checker reports at three levels:
- ERROR: things that are very likely to be wrong
- WARNING: things requiring careful review
- CHECK: things requiring thought
You should be able to justify all violations that remain in your
patch.
5) Select the recipients for your patch.
----------------------------------------
You should always copy the appropriate subsystem maintainer(s) on any patch
to code that they maintain; look through the MAINTAINERS file and the
source code revision history to see who those maintainers are. The
script scripts/get_maintainer.pl can be very useful at this step. If you
cannot find a maintainer for the subsystem you are working on, Andrew
Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
You should also normally choose at least one mailing list to receive a copy
of your patch set. linux-kernel@vger.kernel.org functions as a list of
last resort, but the volume on that list has caused a number of developers
to tune it out. Look in the MAINTAINERS file for a subsystem-specific
list; your patch will probably get more attention there. Please do not
spam unrelated lists, though.
Many kernel-related lists are hosted on vger.kernel.org; you can find a
list of them at http://vger.kernel.org/vger-lists.html. There are
kernel-related lists hosted elsewhere as well, though.
Do not send more than 15 patches at once to the vger mailing lists!!!
Linus Torvalds is the final arbiter of all changes accepted into the
Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
He gets a lot of e-mail, and, at this point, very few patches go through
Linus directly, so typically you should do your best to -avoid-
sending him e-mail.
If you have a patch that fixes an exploitable security bug, send that patch
to security@kernel.org. For severe bugs, a short embargo may be considered
to allow distributors to get the patch out to users; in such cases,
obviously, the patch should not be sent to any public lists.
Patches that fix a severe bug in a released kernel should be directed
toward the stable maintainers by putting a line like this:
Cc: stable@vger.kernel.org
into the sign-off area of your patch (note, NOT an email recipient). You
should also read Documentation/stable_kernel_rules.txt in addition to this
file.
Note, however, that some subsystem maintainers want to come to their own
conclusions on which patches should go to the stable trees. The networking
maintainer, in particular, would rather not see individual developers
adding lines like the above to their patches.
If changes affect userland-kernel interfaces, please send the MAN-PAGES
maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at
least a notification of the change, so that some information makes its way
into the manual pages. User-space API changes should also be copied to
linux-api@vger.kernel.org.
For small patches you may want to CC the Trivial Patch Monkey
trivial@kernel.org which collects "trivial" patches. Have a look
into the MAINTAINERS file for its current manager.
Trivial patches must qualify for one of the following rules:
Spelling fixes in documentation
Spelling fixes for errors which could break grep(1)
Warning fixes (cluttering with useless warnings is bad)
Compilation fixes (only if they are actually correct)
Runtime fixes (only if they actually fix things)
Removing use of deprecated functions/macros
Contact detail and documentation fixes
Non-portable code replaced by portable code (even in arch-specific,
since people copy, as long as it's trivial)
Any fix by the author/maintainer of the file (i.e. patch monkey
in re-transmission mode)
6) No MIME, no links, no compression, no attachments. Just plain text.
-----------------------------------------------------------------------
Linus and other kernel developers need to be able to read and comment
on the changes you are submitting. It is important for a kernel
developer to be able to "quote" your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.
For this reason, all patches should be submitted by e-mail "inline".
WARNING: Be wary of your editor's word-wrap corrupting your patch,
if you choose to cut-n-paste your patch.
Do not attach the patch as a MIME attachment, compressed or not.
Many popular e-mail applications will not always transmit a MIME
attachment as plain text, making it impossible to comment on your
code. A MIME attachment also takes Linus a bit more time to process,
decreasing the likelihood of your MIME-attached change being accepted.
Exception: If your mailer is mangling patches then someone may ask
you to re-send them using MIME.
See Documentation/email-clients.txt for hints about configuring
your e-mail client so that it sends your patches untouched.
7) E-mail size.
---------------
Large changes are not appropriate for mailing lists or for some
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
it is preferred that you store your patch on an Internet-accessible
server, and provide instead a URL (link) pointing to your patch. But note
that if your patch exceeds 300 kB, it almost certainly needs to be broken up
anyway.
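
A size check along these lines can be scripted before sending. The helper below is a hypothetical sketch, not an existing kernel script, and it assumes "300 kB" means 300 * 1024 bytes; the two files it tests are generated on the spot.

```shell
#!/bin/sh
# Hypothetical helper: is a patch file over the 300 kB guideline?
# Assumption: "300 kB" is taken as 300 * 1024 bytes.
limit=$((300 * 1024))

patch_too_large() {
	# The arithmetic expansion strips any padding wc may print.
	size=$(( $(wc -c < "$1") ))
	[ "$size" -gt "$limit" ]
}

tmp=$(mktemp -d)
printf 'small patch\n' > "$tmp/small.patch"
# Create a file exactly one byte over the limit.
head -c $((limit + 1)) /dev/zero > "$tmp/big.patch"

for p in "$tmp/small.patch" "$tmp/big.patch"; do
	if patch_too_large "$p"; then
		echo "$(basename "$p"): over the limit, split it or post a URL"
	else
		echo "$(basename "$p"): fine to send inline"
	fi
done
```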
8) Respond to review comments.
------------------------------
Your patch will almost certainly get comments from reviewers on ways in
which the patch can be improved. You must respond to those comments;
ignoring reviewers is a good way to get ignored in return. Review comments
or questions that do not lead to a code change should almost certainly
bring about a comment or changelog entry so that the next reviewer better
understands what is going on.
Be sure to tell the reviewers what changes you are making and to thank them
for their time. Code review is a tiring and time-consuming process, and
reviewers sometimes get grumpy. Even in that case, though, respond
politely and address the problems they have pointed out.
9) Don't get discouraged - or impatient.
----------------------------------------
After you have submitted your change, be patient and wait. Reviewers are
busy people and may not get to your patch right away.
Once upon a time, patches used to disappear into the void without comment,
but the development process works more smoothly than that now. You should
receive comments within a week or so; if that does not happen, make sure
that you have sent your patches to the right place. Wait for a minimum of
one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.
10) Include PATCH in the subject
--------------------------------
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH]. This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.
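
If you prepare patches with git, "git format-patch" adds this prefix for you. The sketch below demonstrates that in a throwaway repository; the file, commit messages and identity are invented for the demo.

```shell
#!/bin/sh
# Scratch demo: show the Subject: header that git format-patch generates.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Identity is passed per command so the demo does not depend on global config.
g() {
	git -c user.name='Random J Developer' \
	    -c user.email='random@developer.example.org' "$@"
}

echo 'first' > file.txt
g add file.txt
g commit -q -m 'demo: add file'
echo 'second' >> file.txt
g commit -q -a -m 'demo: extend file'

# -1 formats only the newest commit; --stdout avoids writing a .patch file.
subject=$(g format-patch -1 --stdout | grep '^Subject:')
echo "$subject"
```

The printed header should carry the [PATCH] prefix followed by the first line of the commit message.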
11) Sign your work
------------------
To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
layers of maintainers, we've introduced a "sign-off" procedure on
patches that are being emailed around.
The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below:
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
then you just add a line saying
Signed-off-by: Random J Developer <random@developer.example.org>
using your real name (sorry, no pseudonyms or anonymous contributions.)
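
With git, the line does not have to be typed by hand: "git commit -s" appends it using the configured name and e-mail address. The sketch below shows this in a throwaway repository; the -s option is standard git but is not mentioned in the text above, and the identity and commit message are placeholders.

```shell
#!/bin/sh
# Scratch demo: let git append the Signed-off-by: line.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# -s adds "Signed-off-by: <user.name> <user.email>" to the commit message.
git -c user.name='Random J Developer' \
    -c user.email='random@developer.example.org' \
    commit -q --allow-empty -s -m 'demo: show the sign-off line'

signoff=$(git log -1 --format=%B | grep '^Signed-off-by:')
echo "$signoff"
```

Setting user.name and user.email once in your git configuration means every "git commit -s" carries the same, correct sign-off.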
Some people also put extra tags at the end. They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
If you are a subsystem or branch maintainer, sometimes you need to slightly
modify patches you receive in order to merge them, because the code is not
exactly the same in your tree and the submitters'. If you stick strictly to
rule (c), you should ask the submitter to rediff, but this is a totally
counter-productive waste of time and energy. Rule (b) allows you to adjust
the code, but then it is very impolite to change one submitter's code and
make him endorse your bugs. To solve this problem, it is recommended that
you add a line between the last Signed-off-by header and yours, indicating
the nature of your changes. While there is nothing mandatory about this, it
seems like prepending the description with your mail and/or name, all
enclosed in square brackets, is noticeable enough to make it obvious that
you are responsible for last-minute changes. Example:
Signed-off-by: Random J Developer <random@developer.example.org>
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
This practice is particularly helpful if you maintain a stable branch and
want at the same time to credit the author, track changes, merge the fix,
and protect the submitter from complaints. Note that under no circumstances
can you change the author's identity (the From header), as it is the one
which appears in the changelog.
Special note to back-porters: It seems to be a common and useful practice
to insert an indication of the origin of a patch at the top of the commit
message (just after the subject line) to facilitate tracking. For instance,
here's what we see in a 3.x-stable release:
Date: Tue Oct 7 07:26:38 2014 -0400
libata: Un-break ATA blacklist
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
And here's what might appear in an older kernel once a patch is backported:
Date: Tue May 13 22:12:27 2008 +0200
wireless, airo: waitbusy() won't delay
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
Whatever the format, this information provides valuable help to people
tracking your trees, and to people trying to troubleshoot bugs in your
tree.
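Because the origin marker has a regular shape, it is easy to pick out with a
script. The sketch below recognizes only the two formats quoted above; the
patterns and the function name are assumptions made for illustration, not an
established tool.

```python
import re

# Patterns for the two marker styles shown in the examples above.
UPSTREAM_PATTERNS = (
    re.compile(r"^commit ([0-9a-f]{7,40}) upstream\.$"),
    re.compile(r"^\[backport of \S+ commit ([0-9a-f]{7,40})\]$"),
)

def upstream_commit(message):
    """Return the upstream commit ID named in a backport, or None."""
    for line in message.splitlines():
        line = line.strip()
        for pattern in UPSTREAM_PATTERNS:
            match = pattern.match(line)
            if match:
                return match.group(1)
    return None
```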
12) When to use Acked-by: and Cc:
---------------------------------
The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.
If a person was not directly involved in the preparation or handling of a
patch but wishes to signify and record their approval of it then they can
ask to have an Acked-by: line added to the patch's changelog.
Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch.
Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
has at least reviewed the patch and has indicated acceptance. Hence patch
mergers will sometimes manually convert an acker's "yep, looks good to me"
into an Acked-by: (but note that it is usually better to ask for an
explicit ack).
Acked-by: does not necessarily indicate acknowledgement of the entire patch.
For example, if a patch affects multiple subsystems and has an Acked-by: from
one subsystem maintainer then this usually indicates acknowledgement of just
the part which affects that maintainer's code. Judgement should be used here.
When in doubt people should refer to the original discussion in the mailing
list archives.
If a person has had the opportunity to comment on a patch, but has not
provided such comments, you may optionally add a "Cc:" tag to the patch.
This is the only tag which might be added without an explicit action by the
person it names - but it should indicate that this person was copied on the
patch. This tag documents that potentially interested parties
have been included in the discussion.
13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
--------------------------------------------------------------------------
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. Please note that if
the bug was reported in private, then ask for permission first before using the
Reported-by tag.
A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
some testing has been performed, provides a means to locate testers for
future patches, and ensures credit for the testers.
Reviewed-by:, instead, indicates that the patch has been reviewed and found
acceptable according to the Reviewer's Statement:
Reviewer's statement of oversight
By offering my Reviewed-by: tag, I state that:
(a) I have carried out a technical review of this patch to
evaluate its appropriateness and readiness for inclusion into
the mainline kernel.
(b) Any problems, concerns, or questions relating to the patch
have been communicated back to the submitter. I am satisfied
with the submitter's response to my comments.
(c) While there may be things that could be improved with this
submission, I believe that it is, at this time, (1) a
worthwhile modification to the kernel, and (2) free of known
issues which would argue against its inclusion.
(d) While I have reviewed the patch and believe it to be sound, I
do not (unless explicitly stated elsewhere) make any
warranties or guarantees that it will achieve its stated
purpose or function properly in any given situation.
A Reviewed-by tag is a statement of opinion that the patch is an
appropriate modification of the kernel without any remaining serious
technical issues. Any interested reviewer (who has done the work) can
offer a Reviewed-by tag for a patch. This tag serves to give credit to
reviewers and to inform maintainers of the degree of review which has been
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
increase the likelihood of your patch getting into the kernel.
A Suggested-by: tag indicates that the patch idea is suggested by the person
named and ensures credit to the person for the idea. Please note that this
tag should not be added without the reporter's permission, especially if the
idea was not posted in a public forum. That said, if we diligently credit our
idea reporters, they will, hopefully, be inspired to help us again in the
future.
A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
is used to make it easy to determine where a bug originated, which can help
review a bug fix. This tag also assists the stable kernel team in determining
which stable kernel versions should receive your fix. This is the preferred
method for indicating a bug fixed by the patch. See #2 above for more details.
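As a rough illustration, a Fixes: line can be assembled from the ID and the
one-line summary of the commit that introduced the bug. The helper below is
hypothetical; it assumes the usual form of an abbreviated SHA-1 ID (12
characters here) followed by the quoted summary. The commit used in the
example is simply one quoted elsewhere in this document, not a real culprit.

```python
def fixes_tag(sha, subject, abbrev=12):
    """Format a Fixes: line from a full commit ID and its summary."""
    return 'Fixes: %s ("%s")' % (sha[:abbrev], subject)

# Illustrative values only.
print(fixes_tag("1c40279960bcd7d52dbdf1d466b20d24b99176c8",
                "libata: Un-break ATA blacklist"))
```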
14) The canonical patch format
------------------------------
This section describes how the patch itself should be formatted. Note
that, if you have your patches stored in a git repository, proper patch
formatting can be had with "git format-patch". The tools cannot create
the necessary text, though, so read the instructions below anyway.
The canonical patch subject line is:
Subject: [PATCH 001/123] subsystem: summary phrase
The canonical patch message body contains the following:
- A "from" line specifying the patch author (only needed if the person
sending the patch is not the author).
- An empty line.
- The body of the explanation, line wrapped at 75 columns, which will
be copied to the permanent changelog to describe this patch.
- The "Signed-off-by:" lines, described above, which will
also go in the changelog.
- A marker line containing simply "---".
- Any additional comments not suitable for the changelog.
- The actual patch (diff output).
The Subject line format makes it very easy to sort the emails
alphabetically by subject line (pretty much any email reader will
support that): because the sequence number is zero-padded,
the numerical and alphabetic sort orders are the same.
The "subsystem" in the email's Subject should identify which
area or subsystem of the kernel is being patched.
The "summary phrase" in the email's Subject should concisely
describe the patch which that email contains. The "summary
phrase" should not be a filename. Do not use the same "summary
phrase" for every patch in a whole patch series (where a "patch
series" is an ordered sequence of multiple, related patches).
Bear in mind that the "summary phrase" of your email becomes a
globally-unique identifier for that patch. It propagates all the way
into the git changelog. The "summary phrase" may later be used in
developer discussions which refer to the patch. People will want to
google for the "summary phrase" to read discussion regarding that
patch. It will also be the only thing that people may quickly see
when, two or three months later, they are going through perhaps
thousands of patches using tools such as "gitk" or "git log
--oneline".
For these reasons, the "summary" must be no more than 70-75
characters, and it must describe both what the patch changes and
why the patch might be necessary. It is challenging to be both
succinct and descriptive, but that is what a well-written summary
should do.
The "summary phrase" may be prefixed by tags enclosed in square
brackets: "Subject: [PATCH <tag>...] <summary phrase>". The tags are
not considered part of the summary phrase, but describe how the patch
should be treated. Common tags might include a version descriptor if
multiple versions of the patch have been sent out in response to
comments (e.g., "v1, v2, v3"), or "RFC" to indicate a request for
comments. If there are four patches in a patch series the individual
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
that developers understand the order in which the patches should be
applied and that they have reviewed or applied all of the patches in
the patch series.
A couple of example Subjects:
Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching
Subject: [PATCH v2 01/27] x86: fix eflags tracking
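To make the structure of the subject line concrete, here is a small sketch
that splits one into its bracketed tags, subsystem and summary phrase. The
regular expression, the function name and the 75-character default are
assumptions chosen for illustration; real tooling may be more lenient.

```python
import re

# Hypothetical pattern for "Subject: [tags] subsystem: summary phrase".
SUBJECT_RE = re.compile(
    r"^Subject: \[(?P<tags>[^\]]+)\] (?P<subsystem>[^:]+): (?P<summary>.+)$"
)

def parse_subject(line, limit=75):
    """Split a canonical patch subject into its parts, or return None."""
    match = SUBJECT_RE.match(line)
    if match is None:
        return None
    tags = match.group("tags").split()
    index = total = None
    for tag in tags:
        # A tag such as "01/27" gives the position within the series.
        if re.fullmatch(r"\d+/\d+", tag):
            index, total = (int(part) for part in tag.split("/"))
    summary = match.group("summary")
    return {
        "tags": tags,
        "subsystem": match.group("subsystem"),
        "summary": summary,
        "index": index,
        "total": total,
        "short_enough": len(summary) <= limit,
    }
```

Sorting zero-padded subjects as plain strings yields the same order as
sorting them by the parsed "index" value, which is the property the
numbering scheme relies on.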
The "from" line must be the very first line in the message body,
and has the form:
From: Original Author <author@example.com>
The "from" line specifies who will be credited as the author of the
patch in the permanent changelog. If the "from" line is missing,
then the "From:" line from the email header will be used to determine
the patch author in the changelog.
The explanation body will be committed to the permanent source
changelog, so should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
have led to this patch. Including symptoms of the failure which the
patch addresses (kernel log messages, oops messages, etc.) is
especially useful for people who might be searching the commit logs
looking for the applicable patch. If a patch fixes a compile failure,
it may not be necessary to include _all_ of the compile failures; just
enough that it is likely that someone searching for the patch can find
it. As in the "summary phrase", it is important to be both succinct
and descriptive.
The "---" marker line serves the essential purpose of marking for patch
handling tools where the changelog message ends.
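A simplified view of what such a tool does with the marker is sketched
below. It is not how any particular tool is implemented: it merely assumes
that the first line consisting of exactly "---" ends the changelog, and the
function name is made up for this example.

```python
def split_patch_mail(body):
    """Split a message body into (changelog, everything after "---").

    If no marker line is present, the whole body is treated as changelog.
    """
    lines = body.split("\n")
    try:
        marker = lines.index("---")
    except ValueError:
        return body, ""
    changelog = "\n".join(lines[:marker])
    remainder = "\n".join(lines[marker + 1:])
    return changelog, remainder
```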
One good use for the additional comments after the "---" marker is for
a diffstat, to show what files have changed, and the number of
inserted and deleted lines per file. A diffstat is especially useful
on bigger patches. Other comments relevant only to the moment or the
maintainer, not suitable for the permanent changelog, should also go
here. A good example of such comments might be "patch changelogs"
which describe what has changed between the v1 and v2 version of the
patch.
If you are going to include a diffstat after the "---" marker, please
use diffstat options "-p 1 -w 70" so that filenames are listed from
the top of the kernel source tree and don't use too much horizontal
space (easily fit in 80 columns, maybe with some indentation). (git
generates appropriate diffstats by default.)
See more details on the proper patch format in the following
references.
15) Explicit In-Reply-To headers
--------------------------------
It can be helpful to manually add In-Reply-To: headers to a patch
(e.g., when using "git send-email") to associate the patch with
previous relevant discussion, e.g. to link a bug fix to the email with
the bug report. However, for a multi-patch series, it is generally
best to avoid using In-Reply-To: to link to older versions of the
series. This way multiple versions of the patch don't become an
unmanageable forest of references in email clients. If a link is
helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
the cover email text) to link to an earlier version of the patch series.
16) Sending "git pull" requests
-------------------------------
If you have a series of patches, it may be most convenient to have the
maintainer pull them directly into the subsystem repository with a
"git pull" operation. Note, however, that pulling patches from a developer
requires a higher degree of trust than taking patches from a mailing list.
As a result, many subsystem maintainers are reluctant to take pull
requests, especially from new, unknown developers. If in doubt you can use
the pull request as the cover letter for a normal posting of the patch
series, giving the maintainer the option of using either.
A pull request should have [GIT] or [PULL] in the subject line. The
request itself should include the repository name and the branch of
interest on a single line; it should look something like:
Please pull from
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
to get these changes:
A pull request should also include an overall message saying what will be
included in the request, a "git shortlog" listing of the patches
themselves, and a diffstat showing the overall effect of the patch series.
The easiest way to get all this information together is, of course, to let
git do it for you with the "git request-pull" command.
Some maintainers (including Linus) want to see pull requests from signed
commits; that increases their confidence that the request actually came
from you. Linus, in particular, will not pull from public hosting sites
like GitHub in the absence of a signed tag.
The first step toward creating such tags is to make a GNUPG key and get it
signed by one or more core kernel developers. This step can be hard for
new developers, but there is no way around it. Attending conferences can
be a good way to find developers who can sign your key.
Once you have prepared a patch series in git that you wish to have somebody
pull, create a signed tag with "git tag -s". This will create a new tag
identifying the last commit in the series and containing a signature
created with your private key. You will also have the opportunity to add a
changelog-style message to the tag; this is an ideal place to describe the
effects of the pull request as a whole.
If the tree the maintainer will be pulling from is not the repository you
are working from, don't forget to push the signed tag explicitly to the
public tree.
When generating your pull request, use the signed tag as the target. A
command like this will do the trick:
git request-pull master git://my.public.tree/linux.git my-signed-tag
----------------------
SECTION 2 - REFERENCES
----------------------
Andrew Morton, "The perfect patch" (tpp).
<http://www.ozlabs.org/~akpm/stuff/tpp.txt>
Jeff Garzik, "Linux kernel patch submission format".
<http://linux.yyz.us/patch-format.html>
Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
<http://www.kroah.com/log/linux/maintainer.html>
<http://www.kroah.com/log/linux/maintainer-02.html>
<http://www.kroah.com/log/linux/maintainer-03.html>
<http://www.kroah.com/log/linux/maintainer-04.html>
<http://www.kroah.com/log/linux/maintainer-05.html>
<http://www.kroah.com/log/linux/maintainer-06.html>
NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
<https://lkml.org/lkml/2005/7/11/336>
Kernel Documentation/CodingStyle:
<Documentation/CodingStyle>
Linus Torvalds's mail on the canonical patch format:
<http://lkml.org/lkml/2005/4/7/183>
Andi Kleen, "On submitting kernel patches"
Some strategies to get difficult or controversial changes in.
<http://halobates.de/on-submitting-patches.pdf>
--
This file has moved to process/submitting-patches.rst

View File

@@ -1,39 +0,0 @@
Software cursor for VGA by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
======================= and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Linux now has some ability to manipulate cursor appearance. Normally, you
can set the size of hardware cursor (and also work around some ugly bugs in
those miserable Trident cards--see #define TRIDENT_GLITCH in drivers/video/
vgacon.c). You can now play a few new tricks: you can make your cursor look
like a non-blinking red block, make it invert the background of the character
it's over, or highlight that character, and still choose whether the original
hardware cursor should remain visible or not. There may be other things I have
never thought of.
The cursor appearance is controlled by a "<ESC>[?1;2;3c" escape sequence
where 1, 2 and 3 are parameters described below. If you omit any of them,
they will default to zeroes.
Parameter 1 specifies cursor size (0=default, 1=invisible, 2=underline, ...,
8=full block) + 16 if you want the software cursor to be applied + 32 if you
want to always change the background color + 64 if you dislike having the
background the same as the foreground. Highlights are ignored for the last two
flags.
The second parameter selects character attribute bits you want to change
(by simply XORing them with the value of this parameter). On standard VGA,
the high four bits specify background and the low four the foreground. In both
groups, low three bits set color (as in normal color codes used by the console)
and the most significant one turns on highlight (or sometimes blinking--it
depends on the configuration of your VGA).
The third parameter consists of character attribute bits you want to set.
Bit setting takes place before bit toggling, so you can simply clear a bit by
including it in both the set mask and the toggle mask.
Examples:
=========
To get normal blinking underline, use: echo -e '\033[?2c'
To get blinking block, use: echo -e '\033[?6c'
To get red non-blinking block, use: echo -e '\033[?17;0;64c'
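The same sequences can be built programmatically. The sketch below is an
illustration with made-up function names: it formats the three parameters
into the escape sequence, and models the attribute arithmetic described
above (set first, then toggle) on a single 8-bit attribute value.

```python
def cursor_sequence(size_and_flags=0, toggle_mask=0, set_mask=0):
    """Build the "<ESC>[?1;2;3c" sequence from its three parameters."""
    return "\033[?%d;%d;%dc" % (size_and_flags, toggle_mask, set_mask)

def apply_masks(attr, toggle_mask, set_mask):
    """Model the attribute change: bits are set first, then toggled."""
    return ((attr | set_mask) ^ toggle_mask) & 0xFF

# Red non-blinking block: invisible hardware cursor (1) plus the
# software cursor flag (16), no toggled bits, red background bit set (64).
red_block = cursor_sequence(1 + 16, 0, 64)
```

Passing the same bit in both masks clears it, since the bit is first forced
to one and then flipped back to zero.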

View File

@@ -1,7 +0,0 @@
# List of programs to build
hostprogs-y := getdelays
# Tell kbuild to always build the programs
always := $(hostprogs-y)
HOSTCFLAGS_getdelays.o += -I$(objtree)/usr/include

View File

@@ -54,9 +54,9 @@ are sent to userspace without requiring a command. If it is the last exiting
task of a thread group, the per-tgid statistics are also sent. More details
are given in the taskstats interface description.
The getdelays.c userspace utility in this directory allows simple commands to
be run and the corresponding delay statistics to be displayed. It also serves
as an example of using the taskstats interface.
The getdelays.c userspace utility in the tools/accounting directory allows simple
commands to be run and the corresponding delay statistics to be displayed. It
also serves as an example of using the taskstats interface.
Usage
-----

View File

@@ -0,0 +1,97 @@
_DSD Device Properties Usage Rules
----------------------------------
Properties, Property Sets and Property Subsets
----------------------------------------------
The _DSD (Device Specific Data) configuration object, introduced in ACPI 5.1,
allows any type of device configuration data to be provided via the ACPI
namespace. In principle, the format of the data may be arbitrary, but it has to
be identified by a UUID which must be recognized by the driver processing the
_DSD output. However, there are generic UUIDs defined for _DSD recognized by
the ACPI subsystem in the Linux kernel which automatically processes the data
packages associated with them and makes those data available to device drivers
as "device properties".
A device property is a data item consisting of a string key and a value (of a
specific type) associated with it.
In the ACPI _DSD context it is an element of the sub-package following the
generic Device Properties UUID in the _DSD return package as specified in the
Device Properties UUID definition document [1].
It also may be regarded as the definition of a key and the associated data type
that can be returned by _DSD in the Device Properties UUID sub-package for a
given device.
A property set is a collection of properties applicable to a hardware entity
like a device. In the ACPI _DSD context it is the set of all properties that
can be returned in the Device Properties UUID sub-package for the device in
question.
Property subsets are nested collections of properties. Each of them is
associated with an additional key (name) allowing the subset to be referred
to as a whole (and to be treated as a separate entity). The canonical
representation of property subsets is via the mechanism specified in the
Hierarchical Properties Extension UUID definition document [2].
Property sets may be hierarchical. That is, a property set may contain
multiple property subsets that each may contain property subsets of its
own and so on.
General Validity Rule for Property Sets
---------------------------------------
Valid property sets must follow the guidance given by the Device Properties UUID
definition document [1].
_DSD properties are intended to be used in addition to, and not instead of, the
existing mechanisms defined by the ACPI specification. Therefore, as a rule,
they should only be used if the ACPI specification does not make direct
provisions for handling the underlying use case. It generally is invalid to
return property sets which do not follow that rule from _DSD in data packages
associated with the Device Properties UUID.
Additional Considerations
-------------------------
There are cases in which, even if the general rule given above is followed in
principle, the property set may still not be regarded as a valid one.
For example, that applies to device properties which may cause kernel code
(either a device driver or a library/subsystem) to access hardware in a way
possibly leading to a conflict with AML methods in the ACPI namespace. In
particular, that may happen if the kernel code uses device properties to
manipulate hardware normally controlled by ACPI methods related to power
management, like _PSx and _DSW (for device objects) or _ON and _OFF (for power
resource objects), or by ACPI device disabling/enabling methods, like _DIS and
_SRS.
In all cases in which kernel code may do something that will confuse AML as a
result of using device properties, the device properties in question are not
suitable for the ACPI environment and consequently they cannot belong to a valid
property set.
Property Sets and Device Tree Bindings
--------------------------------------
It often is useful to make _DSD return property sets that follow Device Tree
bindings.
In those cases, however, the above validity considerations must be taken into
account in the first place and returning invalid property sets from _DSD must be
avoided. For this reason, it may not be possible to make _DSD return a property
set following the given DT binding literally and completely. Still, for the
sake of code re-use, it may make sense to provide as much of the configuration
data as possible in the form of device properties and complement that with an
ACPI-specific mechanism suitable for the use case at hand.
In any case, property sets following DT bindings literally should not be
expected to automatically work in the ACPI environment regardless of their
contents.
References
----------
[1] http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
[2] http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf

View File

@@ -0,0 +1,96 @@
Special Usage Model of the ACPI Control Method Lid Device
Copyright (C) 2016, Intel Corporation
Author: Lv Zheng <lv.zheng@intel.com>
Abstract:
Platforms containing lids convey lid state (open/close) to OSPMs using a
control method lid device. To implement this, the AML tables issue
Notify(lid_device, 0x80) to notify the OSPMs whenever the lid state has
changed. The _LID control method for the lid device must be implemented to
report the "current" state of the lid as either "opened" or "closed".
For most platforms, both the _LID method and the lid notifications are
reliable. However, there are exceptions. In order to work with these
exceptional buggy platforms, special restrictions and expectations should be
taken into account. This document describes the restrictions and the
expectations of the Linux ACPI lid device driver.
1. Restrictions on the return value of the _LID control method
The _LID control method is described as returning the "current" lid state.
However, the word "current" is ambiguous: some buggy AML tables return
the lid state as of the last lid notification instead of the lid state
at the time of the _LID evaluation. This makes no difference when the
_LID control method is evaluated at runtime; the problem is its
initial return value. When the AML tables implement this control method
with a cached value, the initial return value is likely to be unreliable.
Some platforms always return "closed" as the initial lid state.
2. Restrictions on the lid state change notifications
Some buggy AML tables never send a notification when the lid device state
changes to "opened". Thus the "opened" notification is not guaranteed. But
it is guaranteed that the AML tables always notify "closed" when the lid
state changes to "closed". The "closed" notification is normally used to
trigger some system power saving operations on Windows. Since it is fully
tested, it is reliable on all AML tables.
3. Expectations for the userspace users of the ACPI lid device driver
The ACPI button driver exports the lid state to the userspace via the
following file:
/proc/acpi/button/lid/LID0/state
This file actually calls the _LID control method described above. Given
the previous explanation, it is not reliable enough on some platforms, so
userspace programs are advised not to rely solely on this file
to determine the actual lid state.
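For illustration, a userspace program might read the file as sketched
below. The assumed file contents, a single "state: <value>" line, and both
function names are assumptions made for this example; per the caveat above,
the result should be treated as a hint rather than as the truth.

```python
def parse_lid_state(text):
    """Extract the value from a line assumed to look like "state: open"."""
    key, sep, value = text.partition(":")
    if not sep or key.strip() != "state":
        return None
    return value.strip() or None

def read_lid_state(path="/proc/acpi/button/lid/LID0/state"):
    """Return the reported lid state, or None if it cannot be read."""
    try:
        with open(path) as state_file:
            return parse_lid_state(state_file.read())
    except OSError:
        return None
```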
The ACPI button driver emits the following input event to the userspace:
SW_LID
The ACPI lid device driver is implemented to try to deliver the
platform-triggered events to the userspace. However, because buggy
firmware cannot ensure that "opened"/"closed" events are paired, the ACPI
button driver uses the following 3 modes in order not to trigger issues.
If the userspace hasn't been prepared to ignore the unreliable "opened"
events and the unreliable initial state notification, Linux users can use
the following kernel parameters to handle the possible issues:
A. button.lid_init_state=method:
When this option is specified, the ACPI button driver reports the
initial lid state using the return value of the _LID control method.
Whether the "opened"/"closed" events are paired depends entirely on the
firmware implementation.
This option can be used to fix some platforms where the return value
of the _LID control method is reliable but the initial lid state
notification is missing.
This option is the default behavior while the userspace is not yet
ready to handle the buggy AML tables.
B. button.lid_init_state=open:
When this option is specified, the ACPI button driver always reports the
initial lid state as "opened". Whether the "opened"/"closed" events
are paired depends entirely on the firmware implementation.
This may fix some platforms where the return value of the _LID
control method is not reliable and the initial lid state notification is
missing.
If the userspace has been prepared to ignore the unreliable "opened" events
and the unreliable initial state notification, Linux users should always
use the following kernel parameter:
C. button.lid_init_state=ignore:
When this option is specified, the ACPI button driver never reports the
initial lid state and there is a compensation mechanism implemented to
ensure that the reliable "closed" notifications can always be delivered
to the userspace by always pairing "closed" input events with complementary
"opened" input events. But there is still no guarantee that the "opened"
notifications can be delivered to the userspace when the lid actually
opens, given that some AML tables do not send "opened" notifications
reliably.
In this mode, if everything is correctly implemented by the platform
firmware, the old userspace programs should still work. Otherwise, the
new userspace programs are required to work with the ACPI button driver.
This option will be the default behavior after the userspace is ready to
handle the buggy AML tables.
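A userspace program that wants to know which of the three modes was
requested could inspect the kernel command line, for example the contents of
/proc/cmdline. The helper below is a hypothetical sketch; it assumes that
the last valid occurrence of the parameter wins and it does not know the
driver's built-in default.

```python
LID_INIT_MODES = ("method", "open", "ignore")

def lid_init_state(cmdline):
    """Return the requested button.lid_init_state mode, or None."""
    mode = None
    for word in cmdline.split():
        name, _, value = word.partition("=")
        if name == "button.lid_init_state" and value in LID_INIT_MODES:
            mode = value
    return mode
```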

View File

@@ -415,3 +415,12 @@ the "compatible" property in the _DSD or a _CID as long as one of their
ancestors provides a _DSD with a valid "compatible" property. Such device
objects are then simply regarded as additional "blocks" providing hierarchical
configuration information to the driver of the composite ancestor device.
However, PRP0001 can only be returned from either _HID or _CID of a device
object if all of the properties returned by the _DSD associated with it (either
the _DSD of the device object itself or the _DSD of its ancestor in the
"composite device" case described above) can be used in the ACPI environment.
Otherwise, the _DSD itself is regarded as invalid and therefore the "compatible"
property returned by it is meaningless.
Refer to DSD-properties-rules.txt for more information.

View File

@@ -28,8 +28,8 @@ index, like the ASL example below shows:
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package ()
{
Package () {"reset-gpio", Package() {^BTH, 1, 1, 0 }},
Package () {"shutdown-gpio", Package() {^BTH, 0, 0, 0 }},
Package () {"reset-gpios", Package() {^BTH, 1, 1, 0 }},
Package () {"shutdown-gpios", Package() {^BTH, 0, 0, 0 }},
}
})
}
@@ -48,9 +48,71 @@ Since ACPI GpioIo() resource does not have a field saying whether it is
active low or high, the "active_low" argument can be used here. Setting
it to 1 marks the GPIO as active low.
In our Bluetooth example the "reset-gpio" refers to the second GpioIo()
In our Bluetooth example the "reset-gpios" refers to the second GpioIo()
resource, second pin in that resource with the GPIO number of 31.
It is possible to leave holes in the array of GPIOs. This is useful in
cases like with SPI host controllers where some chip selects may be
implemented as GPIOs and some as native signals. For example a SPI host
controller can have chip selects 0 and 2 implemented as GPIOs and 1 as
native:
Package () {
"cs-gpios",
Package () {
^GPIO, 19, 0, 0, // chip select 0: GPIO
0, // chip select 1: native signal
^GPIO, 20, 0, 0, // chip select 2: GPIO
}
}
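The layout of such an array, with holes, can be modelled as follows. This
is a toy Python model, not kernel code: it assumes each GPIO occupies four
elements (reference, index, pin, active_low), that a hole is a single 0, and
it represents the ASL reference as a plain string.

```python
def parse_gpio_array(package):
    """Turn a flat GPIO package into one entry per slot.

    Each slot is either a (reference, index, pin, active_low) tuple or
    None for a hole, so list positions match chip select numbers.
    """
    slots = []
    position = 0
    while position < len(package):
        if package[position] == 0:
            # A lone 0 marks a hole (e.g. a native chip select).
            slots.append(None)
            position += 1
        else:
            entry = tuple(package[position:position + 4])
            if len(entry) != 4:
                raise ValueError("truncated GPIO entry")
            slots.append(entry)
            position += 4
    return slots

cs_gpios = parse_gpio_array(["^GPIO", 19, 0, 0, 0, "^GPIO", 20, 0, 0])
```

In this model cs_gpios[1] is None, matching chip select 1 being a native
signal in the example above.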
Other supported properties
--------------------------
The following Device Tree compatible device properties are also supported by
_DSD device properties for GPIO controllers:
- gpio-hog
- output-high
- output-low
- input
- line-name
Example:
Name (_DSD, Package () {
// _DSD Hierarchical Properties Extension UUID
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
Package () {
Package () {"hog-gpio8", "G8PU"}
}
})
Name (G8PU, Package () {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () {"gpio-hog", 1},
Package () {"gpios", Package () {8, 0}},
Package () {"output-high", 1},
Package () {"line-name", "gpio8-pullup"},
}
})
- gpio-line-names
Example:
Package () {
"gpio-line-names",
Package () {
"SPI0_CS_N", "EXP2_INT", "MUX6_IO", "UART0_RXD", "MUX7_IO",
"LVL_C_A1", "MUX0_IO", "SPI1_MISO"
}
}
See Documentation/devicetree/bindings/gpio/gpio.txt for more information
about these properties.
ACPI GPIO Mappings Provided by Drivers
--------------------------------------
@@ -83,8 +145,8 @@ static const struct acpi_gpio_params reset_gpio = { 1, 1, false };
static const struct acpi_gpio_params shutdown_gpio = { 0, 0, false };
static const struct acpi_gpio_mapping bluetooth_acpi_gpios[] = {
{ "reset-gpio", &reset_gpio, 1 },
{ "shutdown-gpio", &shutdown_gpio, 1 },
{ "reset-gpios", &reset_gpio, 1 },
{ "shutdown-gpios", &shutdown_gpio, 1 },
{ },
};

187
Documentation/acpi/osi.txt Normal file
View File

@@ -0,0 +1,187 @@
ACPI _OSI and _REV methods
--------------------------
An ACPI BIOS can use the "Operating System Interfaces" method (_OSI)
to find out what the operating system supports. E.g., if BIOS
AML code includes _OSI("XYZ"), the kernel's AML interpreter
can evaluate that method, look to see if it supports 'XYZ'
and answer YES or NO to the BIOS.
The ACPI _REV method returns the "Revision of the ACPI specification
that OSPM supports".
This document explains how and why the BIOS and Linux should use these methods.
It also explains how and why they are widely misused.
How to use _OSI
---------------
Linux runs on two groups of machines -- those that are tested by the OEM
to be compatible with Linux, and those that were never tested with Linux,
but where Linux was installed to replace the original OS (Windows or OSX).
The larger group is the systems tested to run only Windows. Not only that,
but many were tested to run with just one specific version of Windows.
So even though the BIOS may use _OSI to query what version of Windows is running,
only a single path through the BIOS has actually been tested.
Experience shows that taking untested paths through the BIOS
exposes Linux to an entire category of BIOS bugs.
For this reason, Linux _OSI defaults must continue to claim compatibility
with all versions of Windows.
But Linux isn't actually compatible with Windows, and the Linux community
has also been hurt with regressions when Linux adds the latest version of
Windows to its list of _OSI strings. So it is possible that additional strings
will be more thoroughly vetted before shipping upstream in the future.
But it is likely that they will all eventually be added.
What should an OEM do if they want to support Linux and Windows
using the same BIOS image? Often they need to do something different
for Linux to deal with how Linux is different from Windows.
Here the BIOS should ask exactly what it wants to know:
_OSI("Linux-OEM-my_interface_name")
where 'OEM' is needed if this is an OEM-specific hook,
and 'my_interface_name' describes the hook, which could be a
quirk, a bug, or a bug-fix.
In addition, the OEM should send a patch to upstream Linux
via the linux-acpi@vger.kernel.org mailing list. When that patch
is checked into Linux, the OS will answer "YES" when the BIOS
on the OEM's system uses _OSI to ask if the interface is supported
by the OS. Linux distributors can back-port that patch for Linux
pre-installs, and it will be included by all distributions that
re-base to upstream. If the distribution cannot update the kernel binary,
they can also add an acpi_osi=Linux-OEM-my_interface_name
cmdline parameter to the boot loader, as needed.
If the string refers to a feature where the upstream kernel
eventually grows support, a patch should be sent to remove
the string when that support is added to the kernel.
That was easy. Read on, to find out how to do it wrong.
Before _OSI, there was _OS
--------------------------
ACPI 1.0 specified "_OS" as an
"object that evaluates to a string that identifies the operating system."
The ACPI BIOS flow would include an evaluation of _OS, and the AML
interpreter in the kernel would return to it a string identifying the OS:
Windows 98, SE: "Microsoft Windows"
Windows ME: "Microsoft WindowsME:Millenium Edition"
Windows NT: "Microsoft Windows NT"
The idea was that on a platform tasked with running multiple OSes,
the BIOS could use _OS to enable devices that an OS
might support, or enable quirks or bug workarounds
necessary to make the platform compatible with that pre-existing OS.
But _OS had fundamental problems. First, the BIOS needed to know the name
of every possible version of the OS that would run on it, and needed to know
all the quirks of those OSes. Certainly it would make more sense
for the BIOS to ask *specific* things of the OS, such as
"do you support a specific interface", and thus in ACPI 3.0,
_OSI was born to replace _OS.
_OS was abandoned, though even today, many BIOSes look for
_OS "Microsoft Windows NT", though it seems somewhat far-fetched
that anybody would install those old operating systems
over what came with the machine.
Linux answers "Microsoft Windows NT" to please that BIOS idiom.
That is the *only* viable strategy, as that is what modern Windows does,
and so doing otherwise could steer the BIOS down an untested path.
_OSI is born, and immediately misused
--------------------------------------
With _OSI, the *BIOS* provides the string describing an interface,
and asks the OS: "YES/NO, are you compatible with this interface?"
E.g., _OSI("3.0 Thermal Model") would return TRUE if the OS knows how
to deal with the thermal extensions made to the ACPI 3.0 specification.
An old OS that doesn't know about those extensions would answer FALSE,
and a new OS may be able to return TRUE.
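The YES/NO exchange described above can be sketched in C as a plain
string lookup. This is only an illustration, not the kernel's AML
interpreter: the table contents and the names supported_interfaces and
osi_query are assumptions made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of interface strings this OS claims to support. */
static const char *const supported_interfaces[] = {
	"Windows 2001",
	"3.0 Thermal Model",
	"Linux-OEM-my_interface_name",
};

/* Answer an _OSI-style query: true (YES) only on an exact string match. */
static bool osi_query(const char *interface)
{
	size_t n = sizeof(supported_interfaces) / sizeof(supported_interfaces[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(supported_interfaces[i], interface) == 0)
			return true;
	}
	return false;
}
```

With this table, osi_query("3.0 Thermal Model") answers YES, while
osi_query("Linux") answers NO because that string is not listed.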
For an OS-specific interface, the ACPI spec said that the BIOS and the OS
were to agree on a string of the form such as "Windows-interface_name".
But two bad things happened. First, the Windows ecosystem used _OSI
not as designed, but as a direct replacement for _OS -- identifying
the OS version, rather than an OS supported interface. Indeed, right
from the start, the ACPI 3.0 spec itself codified this misuse
in example code using _OSI("Windows 2001").
This misuse was adopted and continues today.
Linux had no choice but to also return TRUE to _OSI("Windows 2001")
and its successors. To do otherwise would virtually guarantee breaking
a BIOS that has been tested only with that _OSI returning TRUE.
This strategy is problematic, as Linux is never completely compatible with
the latest version of Windows, and sometimes it takes more than a year
to iron out incompatibilities.
Not to be out-done, the Linux community made things worse by returning TRUE
to _OSI("Linux"). Doing so is even worse than the Windows misuse
of _OSI, as "Linux" does not even contain any version information.
_OSI("Linux") led to some BIOSes malfunctioning due to BIOS writers
using it in untested BIOS flows. But some OEMs used _OSI("Linux")
in tested flows to support real Linux features. In 2009, Linux
removed _OSI("Linux"), and added a cmdline parameter to restore it
for legacy systems that still needed it. Further, a BIOS_BUG warning
prints for all BIOSes that invoke it.
No BIOS should use _OSI("Linux").
The result is a strategy for Linux to maximize compatibility with
ACPI BIOS that are tested on Windows machines. There is a real risk
of over-stating that compatibility; but the alternative has often been
catastrophic failure resulting from the BIOS taking paths that
were never validated under *any* OS.
Do not use _REV
---------------
Since _OSI("Linux") went away, some BIOS writers used _REV
to support Linux and Windows differences in the same BIOS.
_REV was defined in ACPI 1.0 to return the version of ACPI
supported by the OS and the OS AML interpreter.
Modern Windows returns _REV = 2. Linux used ACPI_CA_SUPPORT_LEVEL,
which would increment, based on the version of the spec supported.
Unfortunately, _REV was also misused. E.g., some BIOSes would check
for _REV = 3, and do something for Linux, but when Linux returned
_REV = 4, that support broke.
In response to this problem, Linux returns _REV = 2 always,
from mid-2015 onward. The ACPI specification will also be updated
to reflect that _REV is deprecated, and always returns 2.
Apple Mac and _OSI("Darwin")
----------------------------
On Apple's Mac platforms, the ACPI BIOS invokes _OSI("Darwin")
to determine if the machine is running Apple OSX.
Like Linux's _OSI("*Windows*") strategy, Linux defaults to
answering YES to _OSI("Darwin") to enable full access
to the hardware and validated BIOS paths seen by OSX.
Just like on Windows-tested platforms, this strategy has risks.
Starting in Linux-3.18, the kernel answered YES to _OSI("Darwin")
for the purpose of enabling Mac Thunderbolt support. Further,
if the kernel noticed _OSI("Darwin") being invoked, it additionally
disabled all _OSI("*Windows*") to keep poorly written Mac BIOS
from going down untested combinations of paths.
The Linux-3.18 change in default caused power regressions on Mac
laptops, and the 3.18 implementation did not allow changing
the default via cmdline "acpi_osi=!Darwin". Linux-4.7 fixed
the ability to use acpi_osi=!Darwin as a workaround, and
we hope to see Mac Thunderbolt power management support in Linux-4.11.


@@ -101,6 +101,6 @@ received a notification, it will set the backlight level accordingly. This does
not affect the sending of events to user space; they are always sent to user
space regardless of whether or not the video module controls the backlight level
directly. This behaviour can be controlled through the brightness_switch_enabled
module parameter as documented in kernel-parameters.txt. It is recommended to
module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
disable this behaviour once a GUI environment starts up and wants to have full
control of the backlight level.


@@ -1,527 +0,0 @@
Adding a New System Call
========================
This document describes what's involved in adding a new system call to the
Linux kernel, over and above the normal submission advice in
Documentation/SubmittingPatches.
System Call Alternatives
------------------------
The first thing to consider when adding a new system call is whether one of
the alternatives might be suitable instead. Although system calls are the
most traditional and most obvious interaction points between userspace and the
kernel, there are other possibilities -- choose what fits best for your
interface.
- If the operations involved can be made to look like a filesystem-like
object, it may make more sense to create a new filesystem or device. This
also makes it easier to encapsulate the new functionality in a kernel module
rather than requiring it to be built into the main kernel.
- If the new functionality involves operations where the kernel notifies
userspace that something has happened, then returning a new file
descriptor for the relevant object allows userspace to use
poll/select/epoll to receive that notification.
- However, operations that don't map to read(2)/write(2)-like operations
have to be implemented as ioctl(2) requests, which can lead to a
somewhat opaque API.
- If you're just exposing runtime system information, a new node in sysfs
(see Documentation/filesystems/sysfs.txt) or the /proc filesystem may be
more appropriate. However, access to these mechanisms requires that the
relevant filesystem is mounted, which might not always be the case (e.g.
in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
debugfs, as this is not considered a 'production' interface to userspace.
- If the operation is specific to a particular file or file descriptor, then
an additional fcntl(2) command option may be more appropriate. However,
fcntl(2) is a multiplexing system call that hides a lot of complexity, so
this option is best for when the new function is closely analogous to
existing fcntl(2) functionality, or the new functionality is very simple
(for example, getting/setting a simple flag related to a file descriptor).
- If the operation is specific to a particular task or process, then an
additional prctl(2) command option may be more appropriate. As with
fcntl(2), this system call is a complicated multiplexor so is best reserved
for near-analogs of existing prctl() commands or getting/setting a simple
flag related to a process.
Designing the API: Planning for Extension
-----------------------------------------
A new system call forms part of the API of the kernel, and has to be supported
indefinitely. As such, it's a very good idea to explicitly discuss the
interface on the kernel mailing list, and it's important to plan for future
extensions of the interface.
(The syscall table is littered with historical examples where this wasn't done,
together with the corresponding follow-up system calls -- eventfd/eventfd2,
dup2/dup3, inotify_init/inotify_init1, pipe/pipe2, renameat/renameat2 -- so
learn from the history of the kernel and plan for extensions from the start.)
For simpler system calls that only take a couple of arguments, the preferred
way to allow for future extensibility is to include a flags argument to the
system call. To make sure that userspace programs can safely use flags
between kernel versions, check whether the flags value holds any unknown
flags, and reject the system call (with EINVAL) if it does:
if (flags & ~(THING_FLAG1 | THING_FLAG2 | THING_FLAG3))
return -EINVAL;
(If no flags values are used yet, check that the flags argument is zero.)
For more sophisticated system calls that involve a larger number of arguments,
it's preferred to encapsulate the majority of the arguments into a structure
that is passed in by pointer. Such a structure can cope with future extension
by including a size argument in the structure:
struct xyzzy_params {
u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
u32 param_1;
u64 param_2;
u64 param_3;
};
As long as any subsequently added field, say param_4, is designed so that a
zero value gives the previous behaviour, then this allows both directions of
version mismatch:
- To cope with a later userspace program calling an older kernel, the kernel
code should check that any memory beyond the size of the structure that it
expects is zero (effectively checking that param_4 == 0).
- To cope with an older userspace program calling a newer kernel, the kernel
code can zero-extend a smaller instance of the structure (effectively
setting param_4 = 0).
See perf_event_open(2) and the perf_copy_attr() function (in
kernel/events/core.c) for an example of this approach.
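The two directions of version mismatch can be sketched in ordinary
userspace C as below. This is not perf_copy_attr(); it is a simplified
stand-in under stated assumptions: plain memcpy() takes the place of
copy_from_user(), the function name xyzzy_copy_params is hypothetical,
and returning -E2BIG for non-zero trailing bytes is one possible choice
of error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block; param_3 plays the later-added field. */
struct xyzzy_params {
	uint32_t size;
	uint32_t param_1;
	uint64_t param_2;
	uint64_t param_3;
};

/*
 * Copy a caller-supplied block of usize bytes into *dst.
 * src is ordinary memory here, standing in for a __user pointer.
 */
static int xyzzy_copy_params(struct xyzzy_params *dst, const void *src,
			     size_t usize)
{
	const unsigned char *bytes = src;
	size_t ksize = sizeof(*dst);
	size_t i;

	/* The block must at least hold the size field itself. */
	if (usize < sizeof(dst->size))
		return -EINVAL;

	/* Newer caller: everything beyond what we understand must be zero. */
	for (i = ksize; i < usize; i++) {
		if (bytes[i] != 0)
			return -E2BIG;
	}

	/* Older caller: fields it did not supply are zero-filled. */
	memset(dst, 0, ksize);
	memcpy(dst, src, usize < ksize ? usize : ksize);
	return 0;
}
```

A caller built against a 16-byte version of the structure ends up with
param_3 == 0, and a caller passing a larger structure is accepted only
if the extra bytes are all zero.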
Designing the API: Other Considerations
---------------------------------------
If your new system call allows userspace to refer to a kernel object, it
should use a file descriptor as the handle for that object -- don't invent a
new type of userspace object handle when the kernel already has mechanisms and
well-defined semantics for using file descriptors.
If your new xyzzy(2) system call does return a new file descriptor, then the
flags argument should include a value that is equivalent to setting O_CLOEXEC
on the new FD. This makes it possible for userspace to close the timing
window between xyzzy() and calling fcntl(fd, F_SETFD, FD_CLOEXEC), where an
unexpected fork() and execve() in another thread could leak a descriptor to
the exec'ed program. (However, resist the temptation to re-use the actual value
of the O_CLOEXEC constant, as it is architecture-specific and is part of a
numbering space of O_* flags that is fairly full.)
If your system call returns a new file descriptor, you should also consider
what it means to use the poll(2) family of system calls on that file
descriptor. Making a file descriptor ready for reading or writing is the
normal way for the kernel to indicate to userspace that an event has
occurred on the corresponding kernel object.
If your new xyzzy(2) system call involves a filename argument:
int sys_xyzzy(const char __user *path, ..., unsigned int flags);
you should also consider whether an xyzzyat(2) version is more appropriate:
int sys_xyzzyat(int dfd, const char __user *path, ..., unsigned int flags);
This allows more flexibility for how userspace specifies the file in question;
in particular it allows userspace to request the functionality for an
already-opened file descriptor using the AT_EMPTY_PATH flag, effectively giving
an fxyzzy(3) operation for free:
- xyzzyat(AT_FDCWD, path, ..., 0) is equivalent to xyzzy(path,...)
- xyzzyat(fd, "", ..., AT_EMPTY_PATH) is equivalent to fxyzzy(fd, ...)
(For more details on the rationale of the *at() calls, see the openat(2) man
page; for an example of AT_EMPTY_PATH, see the fstatat(2) man page.)
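Since xyzzyat() does not exist, the equivalence can be tried out with
the real fstatat(2) call instead. The sketch below assumes a Linux
system with glibc; the helper name same_file_via_empty_path is made up
for illustration, and the fallback definition of AT_EMPTY_PATH is only
there in case the headers were included without _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000	/* Linux value, used only as a fallback */
#endif

/*
 * Returns 1 if the path-based and descriptor-based lookups agree on
 * the file's identity, 0 if they differ, -1 on error.
 */
static int same_file_via_empty_path(const char *path)
{
	struct stat by_path, by_fd;
	int fd, ret = -1;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* fstatat(AT_FDCWD, path, ..., 0) behaves like stat(path, ...) */
	/* fstatat(fd, "", ..., AT_EMPTY_PATH) behaves like fstat(fd, ...) */
	if (fstatat(AT_FDCWD, path, &by_path, 0) == 0 &&
	    fstatat(fd, "", &by_fd, AT_EMPTY_PATH) == 0)
		ret = (by_path.st_dev == by_fd.st_dev &&
		       by_path.st_ino == by_fd.st_ino);

	close(fd);
	return ret;
}
```

Both lookups should report the same device and inode numbers for any
path that can be opened, which is the property an xyzzyat() design
would inherit.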
If your new xyzzy(2) system call involves a parameter describing an offset
within a file, make its type loff_t so that 64-bit offsets can be supported
even on 32-bit architectures.
If your new xyzzy(2) system call involves privileged functionality, it needs
to be governed by the appropriate Linux capability bit (checked with a call to
capable()), as described in the capabilities(7) man page. Choose an existing
capability bit that governs related functionality, but try to avoid combining
lots of only vaguely related functions together under the same bit, as this
goes against capabilities' purpose of splitting the power of root. In
particular, avoid adding new uses of the already overly-general CAP_SYS_ADMIN
capability.
If your new xyzzy(2) system call manipulates a process other than the calling
process, it should be restricted (using a call to ptrace_may_access()) so that
only a calling process with the same permissions as the target process, or
with the necessary capabilities, can manipulate the target process.
Finally, be aware that some non-x86 architectures have an easier time if
system call parameters that are explicitly 64-bit fall on odd-numbered
arguments (i.e. parameter 1, 3, 5), to allow use of contiguous pairs of 32-bit
registers. (This concern does not apply if the arguments are part of a
structure that's passed in by pointer.)
Proposing the API
-----------------
To make new system calls easy to review, it's best to divide up the patchset
into separate chunks. These should include at least the following items as
distinct commits (each of which is described further below):
- The core implementation of the system call, together with prototypes,
generic numbering, Kconfig changes and fallback stub implementation.
- Wiring up of the new system call for one particular architecture, usually
x86 (including all of x86_64, x86_32 and x32).
- A demonstration of the use of the new system call in userspace via a
selftest in tools/testing/selftests/.
- A draft man-page for the new system call, either as plain text in the
cover letter, or as a patch to the (separate) man-pages repository.
New system call proposals, like any change to the kernel's API, should always
be cc'ed to linux-api@vger.kernel.org.
Generic System Call Implementation
----------------------------------
The main entry point for your new xyzzy(2) system call will be called
sys_xyzzy(), but you add this entry point with the appropriate
SYSCALL_DEFINEn() macro rather than explicitly. The 'n' indicates the number
of arguments to the system call, and the macro takes the system call name
followed by the (type, name) pairs for the parameters as arguments. Using
this macro allows metadata about the new system call to be made available for
other tools.
The new entry point also needs a corresponding function prototype, in
include/linux/syscalls.h, marked as asmlinkage to match the way that system
calls are invoked:
asmlinkage long sys_xyzzy(...);
Some architectures (e.g. x86) have their own architecture-specific syscall
tables, but several other architectures share a generic syscall table. Add your
new system call to the generic list by adding an entry to the list in
include/uapi/asm-generic/unistd.h:
#define __NR_xyzzy 292
__SYSCALL(__NR_xyzzy, sys_xyzzy)
Also update the __NR_syscalls count to reflect the additional system call, and
note that if multiple new system calls are added in the same merge window,
your new syscall number may get adjusted to resolve conflicts.
The file kernel/sys_ni.c provides a fallback stub implementation of each system
call, returning -ENOSYS. Add your new system call here too:
cond_syscall(sys_xyzzy);
Your new kernel functionality, and the system call that controls it, should
normally be optional, so add a CONFIG option (typically to init/Kconfig) for
it. As usual for new CONFIG options:
- Include a description of the new functionality and system call controlled
by the option.
- Make the option depend on EXPERT if it should be hidden from normal users.
- Make any new source files implementing the function dependent on the CONFIG
option in the Makefile (e.g. "obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c").
- Double check that the kernel still builds with the new CONFIG option turned
off.
To summarize, you need a commit that includes:
- CONFIG option for the new function, normally in init/Kconfig
- SYSCALL_DEFINEn(xyzzy, ...) for the entry point
- corresponding prototype in include/linux/syscalls.h
- generic table entry in include/uapi/asm-generic/unistd.h
- fallback stub in kernel/sys_ni.c
x86 System Call Implementation
------------------------------
To wire up your new system call for x86 platforms, you need to update the
master syscall tables. Assuming your new system call isn't special in some
way (see below), this involves a "common" entry (for x86_64 and x32) in
arch/x86/entry/syscalls/syscall_64.tbl:
333 common xyzzy sys_xyzzy
and an "i386" entry in arch/x86/entry/syscalls/syscall_32.tbl:
380 i386 xyzzy sys_xyzzy
Again, these numbers are liable to be changed if there are conflicts in the
relevant merge window.
Compatibility System Calls (Generic)
------------------------------------
For most system calls the same 64-bit implementation can be invoked even when
the userspace program is itself 32-bit; even if the system call's parameters
include an explicit pointer, this is handled transparently.
However, there are a couple of situations where a compatibility layer is
needed to cope with size differences between 32-bit and 64-bit.
The first is if the 64-bit kernel also supports 32-bit userspace programs, and
so needs to parse areas of (__user) memory that could hold either 32-bit or
64-bit values. In particular, this is needed whenever a system call argument
is:
- a pointer to a pointer
- a pointer to a struct containing a pointer (e.g. struct iovec __user *)
- a pointer to a varying sized integral type (time_t, off_t, long, ...)
- a pointer to a struct containing a varying sized integral type.
The second situation that requires a compatibility layer is if one of the
system call's arguments has a type that is explicitly 64-bit even on a 32-bit
architecture, for example loff_t or __u64. In this case, a value that arrives
at a 64-bit kernel from a 32-bit application will be split into two 32-bit
values, which then need to be re-assembled in the compatibility layer.
(Note that a system call argument that's a pointer to an explicit 64-bit type
does *not* need a compatibility layer; for example, splice(2)'s arguments of
type loff_t __user * do not trigger the need for a compat_ system call.)
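The re-assembly step amounts to a shift and an OR. The helper below is
a minimal userspace sketch with a hypothetical name (merge_64); which
of the two incoming arguments holds the low half and which the high
half depends on the architecture, so the parameter order here is an
assumption.

```c
#include <stdint.h>

/* Rebuild a 64-bit value from the two 32-bit halves a 32-bit caller passed. */
static uint64_t merge_64(uint32_t lo, uint32_t hi)
{
	/* Widen before shifting so the high half is not truncated. */
	return ((uint64_t)hi << 32) | lo;
}
```

A compat entry point would take the two halves as separate parameters
and pass the merged value on to the common implementation.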
The compatibility version of the system call is called compat_sys_xyzzy(), and
is added with the COMPAT_SYSCALL_DEFINEn() macro, analogously to
SYSCALL_DEFINEn. This version of the implementation runs as part of a 64-bit
kernel, but expects to receive 32-bit parameter values and does whatever is
needed to deal with them. (Typically, the compat_sys_ version converts the
values to 64-bit versions and either calls on to the sys_ version, or both of
them call a common inner implementation function.)
The compat entry point also needs a corresponding function prototype, in
include/linux/compat.h, marked as asmlinkage to match the way that system
calls are invoked:
asmlinkage long compat_sys_xyzzy(...);
If the system call involves a structure that is laid out differently on 32-bit
and 64-bit systems, say struct xyzzy_args, then the include/linux/compat.h
header file should also include a compat version of the structure (struct
compat_xyzzy_args) where each variable-size field has the appropriate compat_
type that corresponds to the type in struct xyzzy_args. The
compat_sys_xyzzy() routine can then use this compat_ structure to parse the
arguments from a 32-bit invocation.
For example, if there are fields:
struct xyzzy_args {
const char __user *ptr;
__kernel_long_t varying_val;
u64 fixed_val;
/* ... */
};
in struct xyzzy_args, then struct compat_xyzzy_args would have:
struct compat_xyzzy_args {
compat_uptr_t ptr;
compat_long_t varying_val;
u64 fixed_val;
/* ... */
};
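The conversion from the compat_ structure to the native one can be
sketched as follows. This is a userspace approximation: the typedefs
are stand-ins for the kernel's compat_uptr_t and compat_long_t, the
structures omit the __user annotation, and xyzzy_args_from_compat is a
hypothetical helper name. In the kernel the pointer widening would go
through compat_ptr() rather than a cast.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's compat types (assumed 32-bit). */
typedef uint32_t compat_uptr_t;
typedef int32_t compat_long_t;

struct xyzzy_args {
	const char *ptr;
	long varying_val;
	uint64_t fixed_val;
};

struct compat_xyzzy_args {
	compat_uptr_t ptr;
	compat_long_t varying_val;
	uint64_t fixed_val;
};

/* Widen each variable-size field; fixed-size fields copy across unchanged. */
static void xyzzy_args_from_compat(struct xyzzy_args *dst,
				   const struct compat_xyzzy_args *src)
{
	dst->ptr = (const char *)(uintptr_t)src->ptr;	/* zero-extend pointer */
	dst->varying_val = src->varying_val;		/* sign-extend long */
	dst->fixed_val = src->fixed_val;
}
```

The compat_sys_xyzzy() routine would read a struct compat_xyzzy_args
from the 32-bit caller, convert it like this, and then call the common
implementation with the native structure.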
The generic system call list also needs adjusting to allow for the compat
version; the entry in include/uapi/asm-generic/unistd.h should use
__SC_COMP rather than __SYSCALL:
#define __NR_xyzzy 292
__SC_COMP(__NR_xyzzy, sys_xyzzy, compat_sys_xyzzy)
To summarize, you need:
- a COMPAT_SYSCALL_DEFINEn(xyzzy, ...) for the compat entry point
- corresponding prototype in include/linux/compat.h
- (if needed) 32-bit mapping struct in include/linux/compat.h
- instance of __SC_COMP not __SYSCALL in include/uapi/asm-generic/unistd.h
Compatibility System Calls (x86)
--------------------------------
To wire up the x86 architecture of a system call with a compatibility version,
the entries in the syscall tables need to be adjusted.
First, the entry in arch/x86/entry/syscalls/syscall_32.tbl gets an extra
column to indicate that a 32-bit userspace program running on a 64-bit kernel
should hit the compat entry point:
380 i386 xyzzy sys_xyzzy compat_sys_xyzzy
Second, you need to figure out what should happen for the x32 ABI version of
the new system call. There's a choice here: the layout of the arguments
should either match the 64-bit version or the 32-bit version.
If there's a pointer-to-a-pointer involved, the decision is easy: x32 is
ILP32, so the layout should match the 32-bit version, and the entry in
arch/x86/entry/syscalls/syscall_64.tbl is split so that x32 programs hit the
compatibility wrapper:
333 64 xyzzy sys_xyzzy
...
555 x32 xyzzy compat_sys_xyzzy
If no pointers are involved, then it is preferable to re-use the 64-bit system
call for the x32 ABI (and consequently the entry in
arch/x86/entry/syscalls/syscall_64.tbl is unchanged).
In either case, you should check that the types involved in your argument
layout do indeed map exactly from x32 (-mx32) to either the 32-bit (-m32) or
64-bit (-m64) equivalents.
System Calls Returning Elsewhere
--------------------------------
For most system calls, once the system call is complete the user program
continues exactly where it left off -- at the next instruction, with the
stack the same and most of the registers the same as before the system call,
and with the same virtual memory space.
However, a few system calls do things differently. They might return to a
different location (rt_sigreturn) or change the memory space (fork/vfork/clone)
or even architecture (execve/execveat) of the program.
To allow for this, the kernel implementation of the system call may need to
save and restore additional registers to the kernel stack, allowing complete
control of where and how execution continues after the system call.
This is arch-specific, but typically involves defining assembly entry points
that save/restore additional registers and invoke the real system call entry
point.
For x86_64, this is implemented as a stub_xyzzy entry point in
arch/x86/entry/entry_64.S, and the entry in the syscall table
(arch/x86/entry/syscalls/syscall_64.tbl) is adjusted to match:
333 common xyzzy stub_xyzzy
The equivalent for 32-bit programs running on a 64-bit kernel is normally
called stub32_xyzzy and implemented in arch/x86/entry/entry_64_compat.S,
with the corresponding syscall table adjustment in
arch/x86/entry/syscalls/syscall_32.tbl:
380 i386 xyzzy sys_xyzzy stub32_xyzzy
If the system call needs a compatibility layer (as in the previous section)
then the stub32_ version needs to call on to the compat_sys_ version of the
system call rather than the native 64-bit version. Also, if the x32 ABI
implementation is not common with the x86_64 version, then its syscall
table will also need to invoke a stub that calls on to the compat_sys_
version.
For completeness, it's also nice to set up a mapping so that user-mode Linux
still works -- its syscall table will reference stub_xyzzy, but the UML build
doesn't include arch/x86/entry/entry_64.S implementation (because UML
simulates registers etc). Fixing this is as simple as adding a #define to
arch/x86/um/sys_call_table_64.c:
#define stub_xyzzy sys_xyzzy
Other Details
-------------
Most of the kernel treats system calls in a generic way, but there is the
occasional exception that may need updating for your particular system call.
The audit subsystem is one such special case; it includes (arch-specific)
functions that classify some special types of system call -- specifically
file open (open/openat), program execution (execve/execveat) or socket
multiplexor (socketcall) operations. If your new system call is analogous to
one of these, then the audit system should be updated.
More generally, if there is an existing system call that is analogous to your
new system call, it's worth doing a kernel-wide grep for the existing system
call to check there are no other special cases.
Testing
-------
A new system call should obviously be tested; it is also useful to provide
reviewers with a demonstration of how user space programs will use the system
call. A good way to combine these aims is to include a simple self-test
program in a new directory under tools/testing/selftests/.
For a new system call, there will obviously be no libc wrapper function and so
the test will need to invoke it using syscall(); also, if the system call
involves a new userspace-visible structure, the corresponding header will need
to be installed to compile the test.
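Invoking a system call without a libc wrapper looks like the sketch
below. Because xyzzy(2) is hypothetical and has no number, the example
uses the existing getpid system call as a stand-in; a real selftest
would pass its own __NR_xyzzy value and arguments instead. The helper
name raw_getpid is made up for illustration.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Call getpid by number through syscall(), bypassing the libc wrapper. */
static long raw_getpid(void)
{
	return syscall(SYS_getpid);
}
```

The raw call should return the same value as getpid(), which gives the
selftest a simple pass/fail check of the invocation path.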
Make sure the selftest runs successfully on all supported architectures. For
example, check that it works when compiled as an x86_64 (-m64), x86_32 (-m32)
and x32 (-mx32) ABI program.
For more extensive and thorough testing of new functionality, you should also
consider adding tests to the Linux Test Project, or to the xfstests project
for filesystem-related changes.
- https://linux-test-project.github.io/
- git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
Man Page
--------
All new system calls should come with a complete man page, ideally using groff
markup, but plain text will do. If groff is used, it's helpful to include a
pre-rendered ASCII version of the man page in the cover email for the
patchset, for the convenience of reviewers.
The man page should be cc'ed to linux-man@vger.kernel.org.
For more details, see https://www.kernel.org/doc/man-pages/patches.html
References and Sources
----------------------
- LWN article from Michael Kerrisk on use of flags argument in system calls:
https://lwn.net/Articles/585415/
- LWN article from Michael Kerrisk on how to handle unknown flags in a system
call: https://lwn.net/Articles/588444/
- LWN article from Jake Edge describing constraints on 64-bit system call
arguments: https://lwn.net/Articles/311630/
- Pair of LWN articles from David Drysdale that describe the system call
implementation paths in detail for v3.14:
- https://lwn.net/Articles/604287/
- https://lwn.net/Articles/604515/
- Architecture-specific requirements for system calls are discussed in the
syscall(2) man-page:
http://man7.org/linux/man-pages/man2/syscall.2.html#NOTES
- Collated emails from Linus Torvalds discussing the problems with ioctl():
http://yarchive.net/comp/linux/ioctl.html
- "How to not invent kernel interfaces", Arnd Bergmann,
http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf
- LWN article from Michael Kerrisk on avoiding new uses of CAP_SYS_ADMIN:
https://lwn.net/Articles/486306/
- Recommendation from Andrew Morton that all related information for a new
system call should come in the same email thread:
https://lkml.org/lkml/2014/7/24/641
- Recommendation from Michael Kerrisk that a new system call should come with
a man page: https://lkml.org/lkml/2014/6/13/309
- Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
commit: https://lkml.org/lkml/2014/11/19/254
- Suggestion from Greg Kroah-Hartman that it's good for new system calls to
come with a man-page & selftest: https://lkml.org/lkml/2014/3/19/710
- Discussion from Michael Kerrisk of new system call vs. prctl(2) extension:
https://lkml.org/lkml/2014/6/3/411
- Suggestion from Ingo Molnar that system calls that involve multiple
arguments should encapsulate those arguments in a struct, which includes a
size field for future extensibility: https://lkml.org/lkml/2015/7/30/117
- Numbering oddities arising from (re-)use of O_* numbering space flags:
- commit 75069f2b5bfb ("vfs: renumber FMODE_NONOTIFY and add to uniqueness
check")
- commit 12ed2e36c98a ("fanotify: FMODE_NONOTIFY and __O_SYNC in sparc
conflict")
- commit bb458c644a59 ("Safer ABI for O_TMPFILE")
- Discussion from Matthew Wilcox about restrictions on 64-bit arguments:
https://lkml.org/lkml/2008/12/12/187
- Recommendation from Greg Kroah-Hartman that unknown flags should be
policed: https://lkml.org/lkml/2014/7/17/577
- Recommendation from Linus Torvalds that x32 system calls should prefer
compatibility with 64-bit versions rather than 32-bit versions:
https://lkml.org/lkml/2011/8/31/244


@@ -0,0 +1,411 @@
Linux kernel release 4.x <http://kernel.org/>
=============================================
These are the release notes for Linux version 4. Read them carefully,
as they tell you what this is all about, explain how to install the
kernel, and what to do if something goes wrong.
What is Linux?
--------------
Linux is a clone of the operating system Unix, written from scratch by
Linus Torvalds with assistance from a loosely-knit team of hackers across
the Net. It aims towards POSIX and Single UNIX Specification compliance.
It has all the features you would expect in a modern fully-fledged Unix,
including true multitasking, virtual memory, shared libraries, demand
loading, shared copy-on-write executables, proper memory management,
and multistack networking including IPv4 and IPv6.
It is distributed under the GNU General Public License - see the
accompanying COPYING file for more details.
On what hardware does it run?
-----------------------------
Although originally developed first for 32-bit x86-based PCs (386 or higher),
today Linux also runs on (at least) the Compaq Alpha AXP, Sun SPARC and
UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, Cell,
IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64, AXIS CRIS,
Xtensa, Tilera TILE, AVR32, ARC and Renesas M32R architectures.
Linux is easily portable to most general-purpose 32- or 64-bit architectures
as long as they have a paged memory management unit (PMMU) and a port of the
GNU C compiler (gcc) (part of The GNU Compiler Collection, GCC). Linux has
also been ported to a number of architectures without a PMMU, although
functionality is then obviously somewhat limited.
Linux has also been ported to itself. You can now run the kernel as a
userspace application - this is called UserMode Linux (UML).
Documentation
-------------
- There is a lot of documentation available both in electronic form on
the Internet and in books, both Linux-specific and pertaining to
general UNIX questions. I'd recommend looking into the documentation
subdirectories on any Linux FTP site for the LDP (Linux Documentation
Project) books. This README is not meant to be documentation on the
system: there are much better sources available.
- There are various README files in the Documentation/ subdirectory:
these typically contain kernel-specific installation notes for some
drivers for example. See Documentation/00-INDEX for a list of what
is contained in each file. Please read the
:ref:`Documentation/process/changes.rst <changes>` file, as it
contains information about the problems which may result from upgrading
your kernel.
- The Documentation/DocBook/ subdirectory contains several guides for
kernel developers and users. These guides can be rendered in a
number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
or ``make mandocs`` will render the documentation in the requested format.
Installing the kernel source
----------------------------
- If you install the full sources, put the kernel tarball in a
directory where you have permissions (e.g. your home directory) and
unpack it::
xz -cd linux-4.X.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
Do NOT use the /usr/src/linux area! This area has a (usually
incomplete) set of kernel headers that are used by the library header
files. They should match the library, and not get messed up by
whatever the kernel-du-jour happens to be.
- You can also upgrade between 4.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the
newer patch files, enter the top level directory of the kernel source
(linux-4.X) and execute::
xz -cd ../patch-4.x.xz | patch -p1
Replace "x" with each version newer than the version "X" of your current
source tree, **in_order**, and you should be ok. You may want to remove
the backup files (some-file-name~ or some-file-name.orig), and make sure
that there are no failed patches (some-file-name# or some-file-name.rej).
If there are, either you or I have made a mistake.
Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
(also known as the -stable kernels) are not incremental but instead apply
directly to the base 4.x kernel. For example, if your base kernel is 4.0
and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
patch -R) **before** applying the 4.0.3 patch. You can read more on this in
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
Alternatively, the script patch-kernel can be used to automate this
process. It determines the current kernel version and applies any
patches found::
linux/scripts/patch-kernel linux
The first argument in the command above is the location of the
kernel source. Patches are applied from the current directory, but
an alternative directory can be specified as the second argument.
- Make sure you have no stale .o files and dependencies lying around::
cd linux
make mrproper
You should now have the sources correctly installed.
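The -stable patching rule described earlier in this section (reverse the current 4.x.y patch, then apply the target 4.x.y patch to the base kernel) is easy to get wrong, so here is a minimal Python sketch of it. The helper names ``parse_version`` and ``stable_patch_plan`` are hypothetical and not part of the kernel tree. The sketch assumes the patch files sit one directory above the source tree, as in the ``xz -cd ../patch-4.x.xz | patch -p1`` example, and it only builds the command strings without running them.

```python
def parse_version(version):
    """Split '4.0.2' into ('4.0', 2); a bare '4.0' gets sublevel 0."""
    parts = version.split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError("expected 4.x or 4.x.y, got %r" % version)
    base = ".".join(parts[:2])
    sublevel = int(parts[2]) if len(parts) == 3 else 0
    return base, sublevel


def stable_patch_plan(current, target):
    """Return the shell commands that move a tree between -stable releases.

    -stable patches are not incremental: each one applies to the base
    4.x kernel, so the current one is reversed before the target is applied.
    """
    cur_base, cur_sub = parse_version(current)
    tgt_base, tgt_sub = parse_version(target)
    if cur_base != tgt_base:
        raise ValueError("-stable patches only apply to their own base kernel")
    plan = []
    if cur_sub == tgt_sub:
        return plan  # nothing to do
    if cur_sub:
        # Go back to the base kernel first.
        plan.append("xz -cd ../patch-%s.%d.xz | patch -R -p1" % (cur_base, cur_sub))
    if tgt_sub:
        # Then apply the target patch directly to the base kernel.
        plan.append("xz -cd ../patch-%s.%d.xz | patch -p1" % (tgt_base, tgt_sub))
    return plan


for command in stable_patch_plan("4.0.2", "4.0.3"):
    print(command)
```

Starting from a plain 4.0 tree, the plan contains only the single apply step, which matches the "must not first apply 4.0.1 and 4.0.2" rule.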
Software requirements
---------------------
Compiling and running the 4.x kernels requires up-to-date
versions of various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
required and how to get updates for these packages. Beware that using
excessively old versions of these packages can cause indirect
errors that are very difficult to track down, so don't assume that
you can just update packages when obvious problems arise during
build or operation.
Build directory for the kernel
------------------------------
When compiling the kernel, all output files will by default be
stored together with the kernel source code.
Using the option ``make O=output/dir`` allows you to specify an alternate
place for the output files (including .config).
Example::
kernel source code: /usr/src/linux-4.X
build directory: /home/name/build/kernel
To configure and build the kernel, use::
cd /usr/src/linux-4.X
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
Please note: If the ``O=output/dir`` option is used, then it must be
used for all invocations of make.
Configuring the kernel
----------------------
Do not skip this step even if you are only upgrading one minor
version. New configuration options are added in each release, and
odd problems will turn up if the configuration files are not set up
as expected. If you want to carry your existing configuration to a
new version with minimal work, use ``make oldconfig``, which will
only ask you for the answers to new questions.
- Alternative configuration commands are::
"make config" Plain text interface.
"make menuconfig" Text based color menus, radiolists & dialogs.
"make nconfig" Enhanced text based color menus.
"make xconfig" Qt based configuration tool.
"make gconfig" GTK+ based configuration tool.
"make oldconfig" Default all questions based on the contents of
your existing ./.config file and asking about
new config symbols.
"make silentoldconfig"
Like above, but avoids cluttering the screen
with questions already answered.
Additionally updates the dependencies.
"make olddefconfig"
Like above, but sets new symbols to their default
values without prompting.
"make defconfig" Create a ./.config file by using the default
symbol values from either arch/$ARCH/defconfig
or arch/$ARCH/configs/${PLATFORM}_defconfig,
depending on the architecture.
"make ${PLATFORM}_defconfig"
Create a ./.config file by using the default
symbol values from
arch/$ARCH/configs/${PLATFORM}_defconfig.
Use "make help" to get a list of all available
platforms of your architecture.
"make allyesconfig"
Create a ./.config file by setting symbol
values to 'y' as much as possible.
"make allmodconfig"
Create a ./.config file by setting symbol
values to 'm' as much as possible.
"make allnoconfig" Create a ./.config file by setting symbol
values to 'n' as much as possible.
"make randconfig" Create a ./.config file by setting symbol
values to random values.
"make localmodconfig" Create a config based on current config and
loaded modules (lsmod). Disables any module
option that is not needed for the loaded modules.
To create a localmodconfig for another machine,
store the lsmod of that machine into a file
and pass it in as a LSMOD parameter.
target$ lsmod > /tmp/mylsmod
target$ scp /tmp/mylsmod host:/tmp
host$ make LSMOD=/tmp/mylsmod localmodconfig
The above also works when cross compiling.
"make localyesconfig" Similar to localmodconfig, except it will convert
all module options to built in (=y) options.
You can find more information on using the Linux kernel config tools
in Documentation/kbuild/kconfig.txt.
- NOTES on ``make config``:
- Having unnecessary drivers will make the kernel bigger, and can
under some circumstances lead to problems: probing for a
nonexistent controller card may confuse your other controllers
- A kernel with math-emulation compiled in will still use the
coprocessor if one is present: the math emulation will just
never get used in that case. The kernel will be slightly larger,
but will work on different machines regardless of whether they
have a math coprocessor or not.
- The "kernel hacking" configuration details usually result in a
bigger or slower kernel (or both), and can even make the kernel
less stable by configuring some routines to actively try to
break bad code to find kernel problems (kmalloc()). Thus you
should probably answer 'n' to the questions for "development",
"experimental", or "debugging" features.
Compiling the kernel
--------------------
- Make sure you have at least gcc 3.2 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
Please note that you can still run a.out user programs with this kernel.
- Do a ``make`` to create a compressed kernel image. It is also
possible to do ``make install`` if you have lilo installed to suit the
kernel makefiles, but you may want to check your particular lilo setup first.
To do the actual install, you have to be root, but none of the normal
build should require that. Don't take the name of root in vain.
- If you configured any of the parts of the kernel as ``modules``, you
will also have to do ``make modules_install``.
- Verbose kernel compile/build output:
Normally, the kernel build system runs in a fairly quiet mode (but not
totally silent). However, sometimes you or other kernel developers need
to see compile, link, or other commands exactly as they are executed.
For this, use "verbose" build mode. This is done by passing
``V=1`` to the ``make`` command, e.g.::
make V=1 all
To have the build system also tell the reason for the rebuild of each
target, use ``V=2``. The default is ``V=0``.
- Keep a backup kernel handy in case something goes wrong. This is
especially true for the development releases, since each new release
contains new code which has not been debugged. Make sure you keep a
backup of the modules corresponding to that kernel, as well. If you
are installing a new kernel with the same version number as your
working kernel, make a backup of your modules directory before you
do a ``make modules_install``.
Alternatively, before compiling, use the kernel config option
"LOCALVERSION" to append a unique suffix to the regular kernel version.
LOCALVERSION can be set in the "General Setup" menu.
- In order to boot your new kernel, you'll need to copy the kernel
image (e.g. .../linux/arch/x86/boot/bzImage after compilation)
to the place where your regular bootable kernel is found.
- Booting a kernel directly from a floppy without the assistance of a
bootloader such as LILO, is no longer supported.
If you boot Linux from the hard drive, chances are you use LILO, which
uses the kernel image as specified in the file /etc/lilo.conf. The
kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or
/boot/bzImage. To use the new kernel, save a copy of the old image
and copy the new image over the old one. Then, you MUST RERUN LILO
to update the loading map! If you don't, you won't be able to boot
the new kernel image.
Reinstalling LILO is usually a matter of running /sbin/lilo.
You may wish to edit /etc/lilo.conf to specify an entry for your
old kernel image (say, /vmlinux.old) in case the new one does not
work. See the LILO docs for more information.
After reinstalling LILO, you should be all set. Shut down the system,
reboot, and enjoy!
If you ever need to change the default root device, video mode,
ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
alternatively the LILO boot options when appropriate). No need to
recompile the kernel to change these parameters.
- Reboot with the new kernel and enjoy.
If something goes wrong
-----------------------
- If you have problems that seem to be due to kernel bugs, please check
the file MAINTAINERS to see if there is a particular person associated
with the part of the kernel that you are having trouble with. If there
isn't anyone listed there, then the second best thing is to mail
them to me (torvalds@linux-foundation.org), and possibly to any other
relevant mailing-list or to the newsgroup.
- In all bug-reports, *please* tell what kernel you are talking about,
how to duplicate the problem, and what your setup is (use your common
sense). If the problem is new, tell me so, and if the problem is
old, please try to tell me when you first noticed it.
- If the bug results in a message like::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
or similar kernel debugging information on your screen or in your
system log, please duplicate it *exactly*. The dump may look
incomprehensible to you, but it does contain information that may
help debugging the problem. The text above the dump is also
important: it tells something about why the kernel dumped code (in
the above example, it's due to a bad kernel pointer). More information
on making sense of the dump is in Documentation/admin-guide/oops-tracing.rst.
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
as is, otherwise you will have to use the ``ksymoops`` program to make
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
This utility can be downloaded from
ftp://ftp.<country>.kernel.org/pub/linux/utils/kernel/ksymoops/ .
Alternatively, you can do the dump lookup by hand:
- In debugging dumps like the above, it helps enormously if you can
look up what the EIP value means. The hex value as such doesn't help
me or anybody else very much: it will depend on your particular
kernel setup. What you should do is take the hex value from the EIP
line (ignore the ``0010:``), and look it up in the kernel namelist to
see which kernel function contains the offending address.
To find out the kernel function name, you'll need to find the system
binary associated with the kernel that exhibited the symptom. This is
the file 'linux/vmlinux'. To extract the namelist and match it against
the EIP from the kernel crash, do::
nm vmlinux | sort | less
This will give you a list of kernel addresses sorted in ascending
order, from which it is simple to find the function that contains the
offending address. Note that the address given by the kernel
debugging messages will not necessarily match exactly with the
function addresses (in fact, that is very unlikely), so you can't
just 'grep' the list: the list will, however, give you the starting
point of each kernel function, so by looking for the function that
has a starting address lower than the one you are searching for but
is followed by a function with a higher address you will find the one
you want. In fact, it may be a good idea to include a bit of
"context" in your problem report, giving a few lines around the
interesting one.
If you for some reason cannot do the above (you have a pre-compiled
kernel image or similar), telling me as much about your setup as
possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst <reportingbugs>`
document for details.
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
cannot change values or set break points.) To do this, first compile the
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
You can now use all the usual gdb commands. The command to look up the
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
with the EIP value.)
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
disregards the starting offset for which the kernel is compiled.
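The manual EIP lookup described above (sort the ``nm`` output, then find the last function that starts at or below the offending address) can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the three-line symbol table is made up for the example rather than taken from a real ``vmlinux``.

```python
import bisect

# Hypothetical excerpt of "nm vmlinux" output: address, symbol type, name.
SAMPLE_NM = """\
c021d000 T vc_allocate
c021d766 T vt_ioctl
c021ebe8 T vt_waitactive
"""


def parse_nm(text):
    """Return a list of (address, name) pairs sorted by ascending address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines without an address (e.g. undefined symbols)
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def containing_function(symbols, address):
    """Find the symbol with the highest start address that is <= address."""
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    return name, address - start


name, offset = containing_function(parse_nm(SAMPLE_NM), 0xC021E50E)
print("%s+%#x" % (name, offset))
```

The offset returned next to the name is the distance from the start of the function, which is the same ``function+offset`` form that newer kernels print directly when CONFIG_KALLSYMS is enabled.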

View File

@@ -0,0 +1,151 @@
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
=====================================================================
This Kernel feature allows you to invoke almost (for restrictions see below)
every program by simply typing its name in the shell.
This includes for example compiled Java(TM), Python or Emacs programs.
To achieve this you must tell binfmt_misc which interpreter has to be invoked
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
at the beginning of the file with a magic byte sequence (masking out specified
bits) you have supplied. Binfmt_misc can also recognise a filename extension
such as ``.com`` or ``.exe``.
First you must mount binfmt_misc::
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
To actually register a new binary type, you have to set up a string looking like
``:name:type:offset:magic:mask:interpreter:flags`` (where you can choose the
``:`` separator to suit your needs) and echo it to ``/proc/sys/fs/binfmt_misc/register``.
Here is what the fields mean:
- ``name``
is an identifier string. A new /proc file will be created with this
name below ``/proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for
obvious reasons.
- ``type``
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
- ``offset``
is the offset of the magic/mask in the file, counted in bytes. This
defaults to 0 if you omit it (i.e. you write ``:name:type::magic...``).
Ignored when using filename extension matching.
- ``magic``
is the byte sequence binfmt_misc is matching for. The magic string
may contain hex-encoded characters like ``\x0a`` or ``\xA4``. Note that you
must escape any NUL bytes; parsing halts at the first one. In a shell
environment you might have to write ``\\x0a`` to prevent the shell from
eating your ``\``.
If you chose filename extension matching, this is the extension to be
recognised (without the ``.``, the ``\x0a`` specials are not allowed).
Extension matching is case sensitive, and slashes ``/`` are not allowed!
- ``mask``
is an (optional, defaults to all 0xff) mask. You can mask out some
bits from matching by supplying a string like magic and as long as magic.
The mask is anded with the byte sequence of the file. Note that you must
escape any NUL bytes; parsing halts at the first one. Ignored when using
filename extension matching.
- ``interpreter``
is the program that should be invoked with the binary as first
argument (specify the full path)
- ``flags``
is an optional field that controls several aspects of the invocation
of the interpreter. It is a string of capital letters, each controls a
certain aspect. The following flags are supported:
``P`` - preserve-argv[0]
Legacy behavior of binfmt_misc is to overwrite
the original argv[0] with the full path to the binary. When this
flag is included, binfmt_misc will add an argument to the argument
vector for this purpose, thus preserving the original ``argv[0]``.
e.g. If your interp is set to ``/bin/foo`` and you run ``blah``
(which is in ``/usr/local/bin``), then the kernel will execute
``/bin/foo`` with ``argv[]`` set to
``["/bin/foo", "/usr/local/bin/blah", "blah"]``. The interp has to be
aware of this so it can execute ``/usr/local/bin/blah`` with ``argv[]``
set to ``["blah"]``.
``O`` - open-binary
Legacy behavior of binfmt_misc is to pass the full path
of the binary to the interpreter as an argument. When this flag is
included, binfmt_misc will open the file for reading and pass its
descriptor as an argument, instead of the full path, thus allowing
the interpreter to execute non-readable binaries. This feature
should be used with care - the interpreter has to be trusted not to
emit the contents of the non-readable binary.
``C`` - credentials
Currently, the behavior of binfmt_misc is to calculate
the credentials and security token of the new process according to
the interpreter. When this flag is included, these attributes are
calculated according to the binary. It also implies the ``O`` flag.
This feature should be used with care as the interpreter
will run with root permissions when a setuid binary owned by root
is run with binfmt_misc.
``F`` - fix binary
The usual behaviour of binfmt_misc is to spawn the
binary lazily when the misc format file is invoked. However,
this doesn't work very well in the face of mount namespaces and
changeroots, so the ``F`` mode opens the binary as soon as the
emulation is installed and uses the opened image to spawn the
emulator, meaning it is always available once installed,
regardless of how the environment changes.
There are some restrictions:
- the whole register string may not exceed 1920 characters
- the magic must reside in the first 128 bytes of the file, i.e.
offset+size(magic) has to be less than 128
- the interpreter string may not exceed 127 characters
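The field layout and the restrictions listed above can be checked before anything is echoed to ``register``. The following Python sketch does that under stated assumptions: the function names are hypothetical, the separator is fixed to ``:``, and bytes other than ASCII letters and digits are written as ``\xNN`` escapes. It only builds the string; writing it to ``/proc/sys/fs/binfmt_misc/register`` is left to the caller.

```python
def escape_bytes(data):
    """Keep ASCII letters and digits; write every other byte as \\xNN."""
    out = []
    for byte in data:
        char = chr(byte)
        if char.isascii() and char.isalnum():
            out.append(char)
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def build_register_string(name, kind, magic, interpreter,
                          offset=0, mask=b"", flags=""):
    """Build ':name:type:offset:magic:mask:interpreter:flags'.

    magic and mask are bytes; for kind 'E', magic is the extension
    without the leading dot.
    """
    if not name or "/" in name:
        raise ValueError("name must be non-empty and must not contain '/'")
    if kind not in ("M", "E"):
        raise ValueError("type must be 'M' (magic) or 'E' (extension)")
    if kind == "M":
        if offset + len(magic) >= 128:
            raise ValueError("offset+size(magic) has to be less than 128")
        if mask and len(mask) != len(magic):
            raise ValueError("mask must be as long as magic")
        offset_field = str(offset) if offset else ""  # empty means 0
        magic_field = escape_bytes(magic)
        mask_field = escape_bytes(mask)
    else:
        if b"/" in magic:
            raise ValueError("extension must not contain '/'")
        # offset and mask are ignored for extension matching
        offset_field, mask_field = "", ""
        magic_field = magic.decode("ascii")
    if len(interpreter) > 127:
        raise ValueError("interpreter string may not exceed 127 characters")
    line = ":".join(["", name, kind, offset_field, magic_field,
                     mask_field, interpreter, flags])
    if len(line) > 1920:
        raise ValueError("register string may not exceed 1920 characters")
    return line


print(build_register_string("DOSWin", "M", b"MZ", "/usr/local/bin/wine"))
```

Escaping every non-alphanumeric byte is a conservative choice made for this sketch: it keeps NUL bytes and the separator character out of the raw string without needing a list of special cases.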
To use binfmt_misc you have to mount it first. You can mount it with
``mount -t binfmt_misc none /proc/sys/fs/binfmt_misc`` command, or you can add
a line ``none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0`` to your
``/etc/fstab`` so it auto mounts on boot.
You may want to add the binary formats in one of your ``/etc/rc`` scripts during
boot-up. Read the manual of your init program to figure out how to do this
right.
Think about the order of adding entries! Later added entries are matched first!
A few examples (assuming you are in ``/proc/sys/fs/binfmt_misc``):
- enable support for em86 (like binfmt_em86, for Alpha AXP only)::
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
- enable support for packed DOS applications (pre-configured dosemu hdimages)::
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
- enable support for Windows executables using wine::
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
For java support see Documentation/admin-guide/java.rst
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
or 1 (to enable) to ``/proc/sys/fs/binfmt_misc/status`` or
``/proc/.../the_name``.
Catting the file tells you the current status of ``binfmt_misc/the_entry``.
You can remove one entry or all entries by echoing -1 to ``/proc/.../the_name``
or ``/proc/sys/fs/binfmt_misc/status``.
Hints
-----
If you want to pass special arguments to your interpreter, you can
write a wrapper script for it. See Documentation/admin-guide/java.rst for an
example.
Your interpreter should NOT look in the PATH for the filename; the kernel
passes it the full filename (or the file descriptor) to use. Using ``$PATH`` can
cause unexpected behaviour and can be a security hazard.
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>

View File

@@ -0,0 +1,38 @@
Linux Braille Console
=====================
To get early boot messages on a braille device (before userspace screen
readers can start), you first need to compile the support for the usual serial
console (see :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), and
for braille device
(in :menuselection:`Device Drivers --> Accessibility support --> Console on braille device`).
Then you need to specify a ``console=brl`` option on the kernel command line.
The format is::
console=brl,serial_options...
where ``serial_options...`` are the same as described in
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
So for instance you can use ``console=brl,ttyS0`` if the braille device is
connected to the first serial port, and ``console=brl,ttyS0,115200`` to
override the baud rate to 115200, etc.
By default, the braille device will just show the last kernel message (console
mode). To review previous messages, press the Insert key to switch to the VT
review mode. In review mode, the arrow keys let you browse the VT content,
the :kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go to the top/bottom of the screen, and
the :kbd:`HOME` key goes back to the cursor, providing a very basic screen
reviewing facility.
Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel
parameter.
For simplicity, only one braille console can be enabled; other uses of
``console=brl,...`` will be discarded. Also note that it does not interfere with
the console selection mechanism described in
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
For now, only the VisioBraille device is supported.
Samuel Thibault <samuel.thibault@ens-lyon.org>

View File

@@ -0,0 +1,76 @@
Bisecting a bug
+++++++++++++++
Last updated: 28 October 2016
Introduction
============
Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.
Finding bugs is not always easy. Have a go though. If you can't find it don't
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.
Before you submit a bug report read
:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
Devices not appearing
=====================
Often this is caused by udev/systemd. Check that first before blaming it
on the kernel.
Finding the patch that caused a bug
===================================
Using the provided tools with ``git`` makes finding bugs easy provided the bug
is reproducible.
Steps to do it:
- build the Kernel from its git source
- start bisect with [#f1]_::
$ git bisect start
- mark the broken changeset with::
$ git bisect bad [commit]
- mark a changeset where the code is known to work with::
$ git bisect good [commit]
- rebuild the Kernel and test
- interact with git bisect by using either::
$ git bisect good
or::
$ git bisect bad
depending on whether the bug happened on the changeset you're testing
- After some interactions, git bisect will give you the changeset that
likely caused the bug.
- For example, if you know that the current version is bad, and version
4.8 is good, you could do::
$ git bisect start
$ git bisect bad # Current version is bad
$ git bisect good v4.8
.. [#f1] You can, optionally, provide both good and bad arguments when
starting the bisect, with ``git bisect start [BAD] [GOOD]``
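To give a feel for why the good/bad loop above converges quickly, here is a minimal Python sketch of the underlying binary search. It is a simplification, not how ``git bisect`` is implemented: it assumes a strictly linear history (real histories contain merges), and ``find_first_bad`` and the ``is_bad`` callback are hypothetical names standing in for "rebuild the Kernel and test".

```python
def find_first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Preconditions: commits[0] is known good, commits[-1] is known bad,
    and every commit after the first bad one is also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1
        if is_bad(commits[middle]):
            bad = middle   # like "git bisect bad"
        else:
            good = middle  # like "git bisect good"
    return commits[bad], tests


# 100 hypothetical commits, numbered 0..99; the bug appears in commit 37.
history = list(range(100))
culprit, tests = find_first_bad(history, lambda commit: commit >= 37)
print("first bad commit: %d, found after %d tests" % (culprit, tests))
```

Each test halves the remaining range, so the number of rebuilds grows with the logarithm of the number of commits between the good and bad endpoints.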
For further references, please read:
- The man page for ``git-bisect``
- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_

View File

@@ -0,0 +1,369 @@
Bug hunting
===========
Kernel bug reports often come with a stack dump like the one below::
------------[ cut here ]------------
WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
Call Trace:
[<c12ba080>] ? dump_stack+0x44/0x64
[<c103ed6a>] ? __warn+0xfa/0x120
[<c109e8a7>] ? module_put+0x57/0x70
[<c109e8a7>] ? module_put+0x57/0x70
[<c103ee33>] ? warn_slowpath_null+0x23/0x30
[<c109e8a7>] ? module_put+0x57/0x70
[<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
[<c109f617>] ? symbol_put_addr+0x27/0x50
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
[<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
[<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
[<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
[<c13d2882>] ? usb_unbind_interface+0x62/0x250
[<c136b514>] ? __pm_runtime_idle+0x44/0x70
[<c13620d8>] ? __device_release_driver+0x78/0x120
[<c1362907>] ? driver_detach+0x87/0x90
[<c1361c48>] ? bus_remove_driver+0x38/0x90
[<c13d1c18>] ? usb_deregister+0x58/0xb0
[<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
[<c1055654>] ? task_work_run+0x64/0x80
[<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
[<c10013f0>] ? do_fast_syscall_32+0x80/0x130
[<c1549f43>] ? sysenter_past_esp+0x40/0x6a
---[ end trace 6ebc60ef3981792f ]---
Such stack traces provide enough information to identify the line inside the
Kernel's source code where the bug happened. Depending on the severity of
the issue, it may also contain the word **Oops**, as on this one::
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c06969d4>] iret_exc+0x7d0/0xa59
*pdpt = 000000002258a001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
...
Whether it is an **Oops** or some other sort of stack trace, the offending
line is usually required to identify and handle the bug. Throughout this chapter,
we'll use "Oops" to refer to all kinds of stack traces that need to be analyzed.
.. note::
``ksymoops`` is useless on 2.6 or later. Please use the Oops in its original
format (from ``dmesg``, etc). Ignore any references in this or other docs to
"decoding the Oops" or "running it through ksymoops".
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
people will just tell you to repost it.
Where is the Oops message located?
-------------------------------------
Normally the Oops text is read from the kernel buffers by ``klogd`` and
handed to ``syslogd`` which writes it to a syslog file, typically
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
systemd, it may also be stored by the ``journald`` daemon, and accessed
by running the ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Or you can
``cat /proc/kmsg > file``; however, you have to break in to stop the transfer,
since ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:
(1) Hand copy the text from the screen and type it in after the machine
has restarted. Messy but it is the only option if you have not
planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you
may find that booting with a higher resolution (eg, ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
so won't help for 'early' oopses)
(2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
run a null modem to a second machine and capture the output there
using your favourite communication program. Minicom works well.
(3) Use Kdump (see Documentation/kdump/kdump.txt) and
extract the kernel ring buffer from old memory using the dmesg
gdbmacro in Documentation/kdump/gdbmacros.txt.
Finding the bug's location
--------------------------
Reporting a bug works best if you point to the location of the bug in the
Kernel source file. There are two methods for doing that. Usually, using
``gdb`` is easier, but the Kernel should be pre-compiled with debug info.
gdb
^^^
The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the OOPS::
EIP: 0060:[<c021e50e>] Not tainted VLI
And use GDB to translate that to human-readable form::
$ gdb vmlinux
(gdb) l *0xc021e50e
If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
offset from the OOPS::
EIP is at vt_ioctl+0xda8/0x1482
And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
$ make vmlinux
$ gdb vmlinux
(gdb) l *vt_ioctl+0xda8
0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
288 {
289 struct vc_data *vc = NULL;
290 int ret = 0;
291
292 console_lock();
293 if (VT_BUSY(vc_num))
294 ret = -EBUSY;
295 else if (vc_num)
296 vc = vc_deallocate(vc_num);
297 console_unlock();
or, if you want to be more verbose::
(gdb) p vt_ioctl
$1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
(gdb) l *0xae0+0xda8
You could, instead, use the object file::
$ make drivers/tty/
$ gdb drivers/tty/vt/vt_ioctl.o
(gdb) l *vt_ioctl+0xda8
If you have a call trace, such as::
Call Trace:
[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
[<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
...
this shows that the problem is likely in the :jbd: module. You can load that module
in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko
(gdb) l *log_wait_commit+0xa3
.. note::
You can also do the same for any function call at the stack trace,
like this one::
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
The position where the above call happened can be seen with::
$ gdb drivers/media/usb/dvb-usb/dvb-usb.o
(gdb) l *dvb_usb_adapter_frontend_exit+0x3a
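The gdb commands above all follow the same pattern: take the ``symbol+offset``
part of an oops line and pass it to ``l *``. As a rough sketch (the helper name
``gdb_list_commands`` and the regular expression are illustrative assumptions,
not part of any kernel tool), the commands could be generated like this:

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" as printed in EIP lines and call traces.
# Hypothetical pattern written for this example only.
SYM_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+(0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+")


def gdb_list_commands(oops_text):
    """Return one gdb 'l *symbol+offset' command per symbol+offset found."""
    return ["l *%s+%s" % (sym, off) for sym, off in SYM_RE.findall(oops_text)]


# Lines copied from the examples in this document.
sample = """EIP is at vt_ioctl+0xda8/0x1482
[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]"""

for cmd in gdb_list_commands(sample):
    print(cmd)
```

You still have to pick the right file to load into gdb (``vmlinux``, the module,
or the object file); the sketch only builds the ``l`` commands.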
objdump
^^^^^^^
To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example::
$ objdump -r -S -l --disassemble net/dccp/ipv4.o
.. note::
You need to be at the top level of the kernel tree for this to pick up
your C files.
If you don't have access to the code you can also debug on some crash dumps
e.g. crash dump output as shown by Dave Miller::
EIP is at +0x14/0x4c0
...
Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
<8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
Put the bytes into a "foo.s" file like this:
.text
.globl foo
foo:
.byte .... /* bytes from Code: part of OOPS dump */
Compile it with "gcc -c -o foo.o foo.s" then look at the output of
"objdump --disassemble foo.o".
Output:
ip_queue_xmit:
push %ebp
push %edi
push %esi
push %ebx
sub $0xbc, %esp
mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
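Typing the bytes into ``foo.s`` by hand is error-prone. The following is a
minimal sketch of how the file could be generated from the ``Code:`` line; the
function name ``code_line_to_asm`` is a hypothetical helper, and the sample is
an excerpt of the bytes shown above:

```python
def code_line_to_asm(code_line, label="foo"):
    """Build the text of a foo.s file from the 'Code:' bytes of an oops."""
    # Drop the optional "Code:" prefix, then strip the <..> marker that
    # surrounds the byte at the faulting instruction pointer.
    text = code_line.replace("Code:", "")
    tokens = [tok.strip("<>") for tok in text.split()]
    for tok in tokens:
        int(tok, 16)  # raises ValueError on anything that is not hex
    byte_list = ", ".join("0x" + tok for tok in tokens)
    return ".text\n.globl %s\n%s:\n.byte %s\n" % (label, label, byte_list)


# Excerpt of the oops above, starting at the function prologue.
sample = ("Code: 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 "
          "8b 5d 08 <8b> 83 3c 01 00 00")
print(code_line_to_asm(sample), end="")
```

The output can be saved as ``foo.s`` and assembled with the ``gcc`` command
given above.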
Reporting the bug
-----------------
Once you find where the bug happened, by inspecting its location,
you could either try to fix it yourself or report it upstream.
In order to report it upstream, you should identify the mailing list
used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script.
For example, if you find a bug in gspca's sonixj.c file, you can get
its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
linux-kernel@vger.kernel.org (open list)
Please notice that it will point to:
- The last developers that touched the source code. In the above example,
Tejun and Bhaktipriya (in this specific case, neither was really involved in
the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- The Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to the mailing
list used for the development of the code (linux-media ML), copying the driver
maintainer (Hans).
If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
linux-kernel@vger.kernel.org.
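If you want to script the choice of recipients, the output shown above can be
split into people and mailing lists. This is only a sketch: the helper name
``split_recipients`` is hypothetical, and it assumes that mailing lists are the
entries whose role in parentheses contains the word "list", as in the sample
output:

```python
import re

# "who (role text)"; the role may itself contain parentheses.
ENTRY_RE = re.compile(r"^(?P<who>.*?)\s*\((?P<role>.*)\)\s*$")


def split_recipients(output):
    """Split get_maintainer.pl output into (people, lists)."""
    people, lists = [], []
    for line in output.splitlines():
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue
        # Assumption: only mailing lists carry a role containing "list".
        target = lists if "list" in m.group("role") else people
        target.append(m.group("who"))
    return people, lists


sample = """Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
linux-kernel@vger.kernel.org (open list)"""

people, lists = split_recipients(sample)
print("To:", ", ".join(lists))
print("Cc:", ", ".join(people))
```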
Thanks for your help in making Linux as stable as humanly possible.
Fixing the bug
--------------
If you know programming, you could help us by not only reporting the bug,
but also providing us with a solution. After all, open source is about
sharing what you do, and don't you want to be recognised for your genius?
If you decide to take this way, once you have worked out a fix please submit
it upstream.
Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.
---------------------------------------------------------------------------
Notes on Oops tracing with ``klogd``
------------------------------------
In order to help Linus and the other kernel developers there has been
substantial support incorporated into ``klogd`` for processing protection
faults. In order to have full support for address resolution at least
version 1.3-pl3 of the ``sysklogd`` package should be used.
When a protection fault occurs the ``klogd`` daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents. This translated kernel message is then
forwarded through whatever reporting mechanism ``klogd`` is using. The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation. Static
translation uses the System.map file in much the same manner that
ksymoops does. In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map
files.
Dynamic address translation is important when kernel loadable modules
are being used. Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.
The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory. Using these
system calls the klogd daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.
At the very minimum klogd will provide the name of the module which
generated the protection fault. There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.
Since the kernel module environment can be dynamic there must be a
mechanism for notifying the ``klogd`` daemon when a change in module
environment occurs. There are command line options available which
allow klogd to signal the currently executing daemon that symbol
information should be refreshed. See the ``klogd`` manual page for more
information.
A patch is included with the sysklogd distribution which modifies the
``modules-2.0.0`` package to automatically signal klogd whenever a module
is loaded or unloaded. Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.
The following is an example of a protection fault in a loadable module
processed by ``klogd``::
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
Aug 29 09:51:01 blizard kernel: *pde = 00000000
Aug 29 09:51:01 blizard kernel: Oops: 0002
Aug 29 09:51:01 blizard kernel: CPU: 0
Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
---------------------------------------------------------------------------
::
Dr. G.W. Wettstein Oncology Research Div. Computing Facility
Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
820 4th St. N.
Fargo, ND 58122
Phone: 701-234-7556


@@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = 'Linux Kernel User Documentation'
tags.add("subproject")
latex_documents = [
('index', 'linux-user.tex', 'Linux Kernel User Documentation',
'The kernel development community', 'manual'),
]


@@ -0,0 +1,268 @@
Linux allocated devices (4.x+ version)
======================================
This list is the Linux Device List, the official registry of allocated
device numbers and ``/dev`` directory nodes for the Linux operating
system.
The LaTeX version of this document is no longer maintained, nor is
the document that used to reside at lanana.org. This version in the
mainline Linux kernel is the master document. Updates shall be sent
as patches to the kernel maintainers (see the
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
to involve for character and block devices.
This document is included by reference into the Filesystem Hierarchy
Standard (FHS). The FHS is available from http://www.pathname.com/fhs/.
Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
platform only. Allocations marked (68k/Atari) apply to Linux/68k on
the Atari platform only.
This document is in the public domain. The authors request, however,
that semantically altered versions are not distributed without
permission of the authors, assuming the authors can be contacted without
an unreasonable effort.
.. attention::
DEVICE DRIVERS AUTHORS PLEASE READ THIS
Linux now has extensive support for dynamic allocation of device numbering
and can use ``sysfs`` and ``udev`` (``systemd``) to handle the naming needs.
There are still some exceptions in the serial and boot device area. Before
asking for a device number make sure you actually need one.
To have a major number allocated, or a minor number in situations
where that applies (e.g. busmice), please submit a patch and send to
the authors as indicated above.
Keep the description of the device *in the same format
as this list*. The reason for this is that it is the only way we have
found to ensure we have all the requisite information to publish your
device and avoid conflicts.
Finally, sometimes we have to play "namespace police." Please don't be
offended. We often get submissions for ``/dev`` names that would be bound
to cause conflicts down the road. We are trying to avoid getting in a
situation where we would have to suffer an incompatible forward
change. Therefore, please consult with us **before** you make your
device names and numbers in any way public, at least to the point
where it would be at all difficult to get them changed.
Your cooperation is appreciated.
.. include:: devices.txt
:literal:
Additional ``/dev/`` directory entries
--------------------------------------
This section details additional entries that should or may exist in
the /dev directory. It is preferred that symbolic links use the same
form (absolute or relative) as is indicated here. Links are
classified as "hard" or "symbolic" depending on the preferred type of
link; if possible, the indicated type of link should be used.
Compulsory links
++++++++++++++++
These links should exist on all systems:
=============== =============== =============== ===============================
/dev/fd /proc/self/fd symbolic File descriptors
/dev/stdin fd/0 symbolic stdin file descriptor
/dev/stdout fd/1 symbolic stdout file descriptor
/dev/stderr fd/2 symbolic stderr file descriptor
/dev/nfsd socksys symbolic Required by iBCS-2
/dev/X0R null symbolic Required by iBCS-2
=============== =============== =============== ===============================
Note: ``/dev/X0R`` is <letter X>-<digit 0>-<letter R>.
Recommended links
+++++++++++++++++
It is recommended that these links exist on all systems:
=============== =============== =============== ===============================
/dev/core /proc/kcore symbolic Backward compatibility
/dev/ramdisk ram0 symbolic Backward compatibility
/dev/ftape qft0 symbolic Backward compatibility
/dev/bttv0 video0 symbolic Backward compatibility
/dev/radio radio0 symbolic Backward compatibility
/dev/i2o* /dev/i2o/* symbolic Backward compatibility
/dev/scd? sr? hard Alternate SCSI CD-ROM name
=============== =============== =============== ===============================
Locally defined links
+++++++++++++++++++++
The following links may be established locally to conform to the
configuration of the system. This is merely a tabulation of existing
practice, and does not constitute a recommendation. However, if they
exist, they should have the following uses.
=============== =============== =============== ===============================
/dev/mouse mouse port symbolic Current mouse device
/dev/tape tape device symbolic Current tape device
/dev/cdrom CD-ROM device symbolic Current CD-ROM device
/dev/cdwriter CD-writer symbolic Current CD-writer device
/dev/scanner scanner symbolic Current scanner device
/dev/modem modem port symbolic Current dialout device
/dev/root root device symbolic Current root filesystem
/dev/swap swap device symbolic Current swap device
=============== =============== =============== ===============================
``/dev/modem`` should not be used for a modem which supports dialin as
well as dialout, as it tends to cause lock file problems. If it
exists, ``/dev/modem`` should point to the appropriate primary TTY device
(the use of the alternate callout devices is deprecated).
For SCSI devices, ``/dev/tape`` and ``/dev/cdrom`` should point to the
*cooked* devices (``/dev/st*`` and ``/dev/sr*``, respectively), whereas
``/dev/cdwriter`` and ``/dev/scanner`` should point to the appropriate generic
SCSI devices (``/dev/sg*``).
``/dev/mouse`` may point to a primary serial TTY device, a hardware mouse
device, or a socket for a mouse driver program (e.g. ``/dev/gpmdata``).
Sockets and pipes
+++++++++++++++++
Non-transient sockets and named pipes may exist in /dev. Common entries are:
=============== =============== ===============================================
/dev/printer socket lpd local socket
/dev/log socket syslog local socket
/dev/gpmdata socket gpm mouse multiplexer
=============== =============== ===============================================
Mount points
++++++++++++
The following names are reserved for mounting special filesystems
under /dev. These special filesystems provide kernel interfaces that
cannot be provided with standard device nodes.
=============== =============== ===============================================
/dev/pts devpts PTY slave filesystem
/dev/shm tmpfs POSIX shared memory maintenance access
=============== =============== ===============================================
Terminal devices
----------------
Terminal, or TTY devices are a special class of character devices. A
terminal device is any device that could act as a controlling terminal
for a session; this includes virtual consoles, serial ports, and
pseudoterminals (PTYs).
All terminal devices share a common set of capabilities known as line
disciplines; these include the common terminal line discipline as well
as SLIP and PPP modes.
All terminal devices are named similarly; this section explains the
naming and use of the various types of TTYs. Note that the naming
conventions include several historical warts; some of these are
Linux-specific, some were inherited from other systems, and some
reflect Linux outgrowing a borrowed convention.
A hash mark (``#``) in a device name is used here to indicate a decimal
number without leading zeroes.
Virtual consoles and the console device
+++++++++++++++++++++++++++++++++++++++
Virtual consoles are full-screen terminal displays on the system video
monitor. Virtual consoles are named ``/dev/tty#``, with numbering
starting at ``/dev/tty1``; ``/dev/tty0`` is the current virtual console.
``/dev/tty0`` is the device that should be used to access the system video
card on those architectures for which the frame buffer devices
(``/dev/fb*``) are not applicable. Do not use ``/dev/console``
for this purpose.
The console device, ``/dev/console``, is the device to which system
messages should be sent, and on which logins should be permitted in
single-user mode. Starting with Linux 2.1.71, ``/dev/console`` is managed
by the kernel; for previous versions it should be a symbolic link to
either ``/dev/tty0``, a specific virtual console such as ``/dev/tty1``, or to
a serial port primary (``tty*``, not ``cu*``) device, depending on the
configuration of the system.
Serial ports
++++++++++++
Serial ports are RS-232 serial ports and any device which simulates
one, either in hardware (such as internal modems) or in software (such
as the ISDN driver). Under Linux, each serial port has two device
names, the primary or callin device and the alternate or callout one.
Each kind of device is indicated by a different letter. For any
letter X, the names of the devices are ``/dev/ttyX#`` and ``/dev/cux#``,
respectively; for historical reasons, ``/dev/ttyS#`` and ``/dev/ttyC#``
correspond to ``/dev/cua#`` and ``/dev/cub#``. In the future, it should be
expected that multiple letters will be used; all letters will be upper
case for the "tty" device (e.g. ``/dev/ttyDP#``) and lower case for the
"cu" device (e.g. ``/dev/cudp#``).
The names ``/dev/ttyQ#`` and ``/dev/cuq#`` are reserved for local use.
The alternate devices provide for kernel-based exclusion and somewhat
different defaults than the primary devices. Their main purpose is to
allow the use of serial ports with programs with no inherent or broken
support for serial ports. Their use is deprecated, and they may be
removed from a future version of Linux.
Arbitration of serial ports is provided by the use of lock files with
the names ``/var/lock/LCK..ttyX#``. The contents of the lock file should
be the PID of the locking process as an ASCII number.
It is common practice to install links such as ``/dev/modem``
which point to serial ports. In order to ensure proper locking in the
presence of these links, it is recommended that software chase
symlinks and lock all possible names; additionally, it is recommended
that a lock file be installed with the corresponding alternate
device. In order to avoid deadlocks, it is recommended that the locks
are acquired in the following order, and released in the reverse:
1. The symbolic link name, if any (``/var/lock/LCK..modem``)
2. The "tty" name (``/var/lock/LCK..ttyS2``)
3. The alternate device name (``/var/lock/LCK..cua2``)
In the case of nested symbolic links, the lock files should be
installed in the order the symlinks are resolved.
Under no circumstances should an application hold a lock while waiting
for another to be released. In addition, applications which attempt
to create lock files for the corresponding alternate device names
should take into account the possibility of being used on a non-serial
port TTY, for which no alternate device would exist.
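The locking rules above can be sketched as follows. This is an illustration
only: the function names are hypothetical, the trailing newline after the PID
is an assumption (the text only requires an ASCII number), and the demo uses a
temporary directory instead of ``/var/lock``:

```python
import os
import tempfile


def lock_names(tty, symlink=None, alternate=None):
    """Return lock file names in the acquisition order listed above."""
    return ["LCK..%s" % n for n in (symlink, tty, alternate) if n]


def write_lock(lock_dir, name):
    """Create one lock file holding our PID as an ASCII number."""
    path = os.path.join(lock_dir, name)
    # O_EXCL makes creation fail at once if the lock already exists,
    # so we never block on one lock while holding another.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write("%d\n" % os.getpid())
    return path


def release_all(held):
    """Remove lock files in the reverse of the acquisition order."""
    for path in reversed(held):
        os.unlink(path)


def acquire_all(lock_dir, names):
    """Take every lock in order; on failure release what we hold."""
    held = []
    try:
        for name in names:
            held.append(write_lock(lock_dir, name))
    except FileExistsError:
        release_all(held)
        return None
    return held


demo_dir = tempfile.mkdtemp()
names = lock_names("ttyS2", symlink="modem", alternate="cua2")
held = acquire_all(demo_dir, names)
print(names)
release_all(held)
```

A real implementation would also have to deal with stale locks left behind by
dead processes, which this sketch does not attempt.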
Pseudoterminals (PTYs)
++++++++++++++++++++++
Pseudoterminals, or PTYs, are used to create login sessions or provide
other capabilities requiring a TTY line discipline (including SLIP or
PPP capability) to arbitrary data-generation processes. Each PTY has
a master side, named ``/dev/pty[p-za-e][0-9a-f]``, and a slave side, named
``/dev/tty[p-za-e][0-9a-f]``. The kernel arbitrates the use of PTYs by
allowing each master side to be opened only once.
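To make the naming pattern concrete, the short sketch below expands
``[p-za-e][0-9a-f]`` into the full list of master/slave name pairs. The
function name ``legacy_pty_pairs`` is a hypothetical helper used only for this
illustration:

```python
def legacy_pty_pairs():
    """Yield (master, slave) names for the pattern [p-za-e][0-9a-f]."""
    letters = "pqrstuvwxyz" + "abcde"  # p-z first, then a-e
    for letter in letters:
        for digit in "0123456789abcdef":
            suffix = letter + digit
            yield "/dev/pty" + suffix, "/dev/tty" + suffix


pairs = list(legacy_pty_pairs())
print(len(pairs))  # 16 letters x 16 hex digits = 256 pairs
print(pairs[0], pairs[-1])
```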
Once the master side has been opened, the corresponding slave device
can be used in the same manner as any TTY device. The master and
slave devices are connected by the kernel, generating the equivalent
of a bidirectional pipe with TTY capabilities.
Recent versions of the Linux kernel and GNU libc contain support for
the System V/Unix98 naming scheme for PTYs, which assigns a common
device, ``/dev/ptmx``, to all the masters (opening it will automatically
give you a previously unassigned PTY) and a subdirectory, ``/dev/pts``,
for the slaves; the slaves are named with decimal integers (``/dev/pts/#``
in our notation). This removes the problem of exhausting the
namespace and enables the kernel to automatically create the device
nodes for the slaves on demand using the "devpts" filesystem.


@@ -0,0 +1,353 @@
Dynamic debug
+++++++++++++
Introduction
============
This document describes how to use the dynamic debug (dyndbg) feature.
Dynamic debug is designed to allow you to dynamically enable/disable
kernel code to obtain additional kernel information. Currently, if
``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
a shortcut for ``print_hex_dump(KERN_DEBUG)``.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, the format string is
its ``prefix_str`` argument if it is a constant string, or ``hexdump``
if ``prefix_str`` is built dynamically.
Dynamic debug has even more useful features:
* Simple query language allows turning on and off debugging
statements by matching any combination of 0 or 1 of:
- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string
* Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
which can be read to display the complete list of known debug
statements, to help guide you
Controlling dynamic debug Behaviour
===================================
The behaviour of ``pr_debug()``/``dev_dbg()`` is controlled via writing to a
control file in the 'debugfs' filesystem. Thus, you must first mount
the debugfs filesystem, in order to make use of this feature.
Subsequently, we refer to the control file as:
``<debugfs>/dynamic_debug/control``. For example, if you want to enable
printing from source file ``svcsock.c``, line 1603 you simply do::
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
If you make a mistake with the syntax, the write will fail thus::
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
<debugfs>/dynamic_debug/control
-bash: echo: write error: Invalid argument
Viewing Dynamic Debug Behaviour
===============================
You can view the currently configured behaviour of all the debug
statements via::
nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
...
You can also apply standard Unix text manipulation filters to this
data, e.g.::
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
62
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
42
The third column shows the currently enabled flags for each debug
statement callsite (see below for definitions of the flags). The
default value, with no flags enabled, is ``=_``. So you can view all
the debug statement callsites with any non-default flags::
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send =p "svc_process: st_sendto returned %d\012"
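The same filtering can be done in a script. The sketch below parses control
file lines in the ``filename:lineno [module]function flags format`` layout
shown above; the regular expression and the function names are assumptions
made for this example, not a format guarantee from the kernel:

```python
import re

# filename:lineno [module]function flags "format"
LINE_RE = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+)\s+"
    r"\[(?P<module>[^\]]*)\](?P<func>\S+)\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<format>\".*\")$"
)


def parse_control_line(line):
    """Return a dict for one callsite line, or None (e.g. for the header)."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    entry = m.groupdict()
    entry["line"] = int(entry["line"])
    return entry


def enabled_callsites(lines):
    """Keep only callsites whose flags differ from the default '=_'."""
    parsed = (parse_control_line(l) for l in lines)
    return [e for e in parsed if e and e["flags"] != "=_"]


sample = [
    "# filename:lineno [module]function flags format",
    r'net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"',
    r'net/sunrpc/svcsock.c:1603 [sunrpc]svc_send =p "svc_process: st_sendto returned %d\012"',
]
for entry in enabled_callsites(sample):
    print(entry["file"], entry["line"], entry["flags"])
```

The sample paths are shortened versions of the ones in the listing above.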
Command Language Reference
==========================
At the lexical level, a command comprises a sequence of words separated
by spaces or tabs. So these are all equivalent::
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
nullarbor:~ # echo ' file svcsock.c line 1603 +p ' >
<debugfs>/dynamic_debug/control
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
Command submissions are bounded by a write() system call.
Multiple commands can be written together, separated by ``;`` or ``\n``::
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
> <debugfs>/dynamic_debug/control
If your query set is big, you can batch them too::
~# cat query-batch-file > <debugfs>/dynamic_debug/control
Another way is to use wildcards. The match rules support ``*`` (matches
zero or more characters) and ``?`` (matches exactly one character). For
example, you can match all usb drivers::
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification::
command ::= match-spec* flags-spec
The match-spec's are used to choose a subset of the known pr_debug()
callsites to which to apply the flags-spec. Think of them as a query
with implicit ANDs between each pair. Note that an empty list of
match-specs will select all debug statement callsites.
A match specification comprises a keyword, which controls the
attribute of the callsite to be compared, and a value to compare
against. Possible keywords are::
match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
'line' line-range
line-range ::= lineno |
'-'lineno |
lineno'-' |
lineno'-'lineno
lineno ::= unsigned-int
.. note::
``line-range`` cannot contain space, e.g.
"1-30" is valid range but "1 - 30" is not.
The meanings of each keyword are:
func
The given string is compared against the function name
of each callsite. Example::
func svc_tcp_accept
file
The given string is compared against either the full pathname, the
src-root relative pathname, or the basename of the source file of
each callsite. Examples::
file svcsock.c
file kernel/freezer.c
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
module
The given string is compared against the module name
of each callsite. The module name is the string as
seen in ``lsmod``, i.e. without the directory or the ``.ko``
suffix and with ``-`` changed to ``_``. Examples::
module sunrpc
module nfsd
format
The given string is searched for in the dynamic debug format
string. Note that the string does not need to match the
entire format, only some part. Whitespace and other
special characters can be escaped using C octal character
escape ``\ooo`` notation, e.g. the space character is ``\040``.
Alternatively, the string can be enclosed in double quote
characters (``"``) or single quote characters (``'``).
Examples::
format svcrdma: // many of the NFS/RDMA server pr_debugs
format readahead // some pr_debugs in the readahead cache
format nfsd:\040SETATTR // one way to match a format with whitespace
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
line
The given line number or range of line numbers is compared
against the line number of each ``pr_debug()`` callsite. A single
line number matches the callsite line number exactly. A
range of line numbers matches any callsite between the first
and last line number inclusive. An empty first number means
the first line in the file, an empty last number means the
last line in the file. Examples::
line 1603 // exactly line 1603
line 1600-1605 // the six lines from line 1600 to line 1605
line -1605 // the 1605 lines from line 1 to line 1605
line 1600- // all lines from line 1600 to the end of the file
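The ``line-range`` grammar and the examples above can be checked with a small
parser. This is a sketch written for illustration, not the kernel's own
parsing code; the function names are hypothetical, and ``None`` is used here
to mark an open end of the range:

```python
def _uint(text):
    """Convert an unsigned decimal string, rejecting signs and blanks."""
    if not text.isdigit():
        raise ValueError("not an unsigned integer: %r" % text)
    return int(text)


def parse_line_range(spec):
    """Return (first, last); None marks an open end of the range."""
    first, sep, last = spec.partition("-")
    if not sep:
        n = _uint(first)          # a single line number
        return n, n
    if not first and not last:
        raise ValueError("empty line-range: %r" % spec)
    return (_uint(first) if first else None,
            _uint(last) if last else None)


def line_matches(spec, lineno):
    """True if lineno falls inside the inclusive range given by spec."""
    first, last = parse_line_range(spec)
    return ((first is None or lineno >= first) and
            (last is None or lineno <= last))


for spec in ("1603", "1600-1605", "-1605", "1600-"):
    print(spec, parse_line_range(spec))
```

Because ``_uint`` rejects anything but digits, a range written with spaces such
as ``1 - 30`` raises ``ValueError``, in line with the note above.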
The flags specification comprises a change operation followed
by one or more flag characters. The change operation is one
of the characters::
- remove the given flags
+ add the given flags
= set the flags to the given flags
The flags are::
p enables the pr_debug() callsite.
f Include the function name in the printed message
l Include line number in the printed message
m Include module name in the printed message
t Include thread ID in messages not generated from interrupt context
_ No flags are set. (Or'd with others on input)
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only the ``p``
flag has meaning; other flags are ignored.
For display, the flags are preceded by ``=``
(mnemonic: what the flags are currently equal to).
Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
To clear all flags at once, use ``=_`` or ``-flmpt``.
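The effect of the three change operations can be modelled with a set of flag
letters. The sketch below reuses the regexp quoted above; the function
``apply_flags`` is a hypothetical model of the semantics described in this
section, not the kernel implementation:

```python
import re

FLAGS_RE = re.compile(r"^[-+=][flmpt_]+$")


def apply_flags(current, spec):
    """Apply a flags-spec such as '+p' or '=_' to a set of flag letters."""
    if not FLAGS_RE.match(spec):
        raise ValueError("bad flags specification: %r" % spec)
    op = spec[0]
    letters = set(spec[1:]) - {"_"}   # '_' means "no flags"
    if op == "+":
        return set(current) | letters  # add the given flags
    if op == "-":
        return set(current) - letters  # remove the given flags
    return letters                     # '=' sets exactly the given flags


flags = apply_flags(set(), "+p")
flags = apply_flags(flags, "+ft")
print(sorted(flags))
print(sorted(apply_flags(flags, "=_")))
```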
Debug messages during Boot Process
==================================
To activate debug messages for core code and built-in modules during
the boot process, even before userspace and debugfs exist, use
``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"``
(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows
the syntax described above, but must not exceed 1023 characters. Your
bootloader may impose lower limits.
These ``dyndbg`` params are processed just after the ddebug tables are
processed, as part of the arch_initcall. Thus you can enable debug
messages in all code run after this arch_initcall via this boot
parameter.
On an x86 system, for example, ACPI enablement is a subsys_initcall and::
dyndbg="file ec.c +p"
will show early Embedded Controller transactions during ACPI setup if
your machine (typically a laptop) has an Embedded Controller.
PCI (or other devices) initialization also is a hot candidate for using
this boot parameter for debugging purposes.
If the ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
boot time, without effect, but will be reprocessed when the module is
loaded later. ``ddebug_query=`` and bare ``dyndbg=`` are only processed at
boot.
Debug Messages at Module Initialization Time
============================================
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
``foo.params``, strips ``foo.``, and passes them to the kernel along with
params given in modprobe args or ``/etc/modprobe.d/*.conf`` files,
in the following order:
1. parameters given via ``/etc/modprobe.d/*.conf``::
options foo dyndbg=+pt
options foo dyndbg # defaults to +p
2. ``foo.dyndbg`` as given in boot args, ``foo.`` is stripped and passed::
foo.dyndbg=" func bar +p; func buz +mp"
3. args to modprobe::
modprobe foo dyndbg==pmf # override previous settings
These ``dyndbg`` queries are applied in order, with last having final say.
This allows boot args to override or modify those from ``/etc/modprobe.d``
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
modprobe args to override both.
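The "last one wins" behaviour can be sketched by replaying the three flag settings from the list above against a single callsite. This is a simplified model, not the kernel implementation: it ignores the match-specs (``func bar``, ``func buz``) and assumes every query hits the same callsite. The function name ``apply_flags`` is hypothetical.

```python
def apply_flags(current, spec):
    """Apply one flags spec such as '+pt', '-p' or '=pmf' to a set of flags."""
    op, flags = spec[0], set(spec[1:]) - {"_"}  # '_' stands for "no flags"
    if op == "+":
        return current | flags   # add the given flags
    if op == "-":
        return current - flags   # remove the given flags
    if op == "=":
        return flags             # replace whatever was set before
    raise ValueError("bad flags spec: %r" % spec)


# Order of application: modprobe.d, then boot args, then modprobe args.
settings = ["+pt", "+mp", "=pmf"]

flags = set()
for spec in settings:
    flags = apply_flags(flags, spec)

print("".join(sorted(flags)))
```

Because the modprobe argument uses ``=``, it discards the ``t`` flag that the ``/etc/modprobe.d`` entry had added, which is why the comment in the example calls it an override.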
In the ``foo.dyndbg="QUERY"`` form, the query must exclude ``module foo``.
``foo`` is extracted from the param-name, and applied to each query in
``QUERY``, and only 1 match-spec of each type is allowed.
The ``dyndbg`` option is a "fake" module parameter, which means:
- modules do not need to define it explicitly
- every module gets it tacitly, whether they use pr_debug or not
- it doesn't appear in ``/sys/module/$module/parameters/``
To see it, grep the control file, or inspect ``/proc/cmdline``.
For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
enabled by the ``-DDEBUG`` flag during compilation) can be disabled later via
the debugfs control file if the debug messages are no longer needed::
echo "module module_name -p" > <debugfs>/dynamic_debug/control
Examples
========
::
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' >
<debugfs>/dynamic_debug/control
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' >
<debugfs>/dynamic_debug/control
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' >
<debugfs>/dynamic_debug/control
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
<debugfs>/dynamic_debug/control
// enable messages in files of which the paths include string "usb"
nullarbor:~ # echo -n 'file *usb* +p' > <debugfs>/dynamic_debug/control
// enable all messages
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
// add module, function to all enabled messages
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
// boot-args example, with newlines and comments for readability
Kernel command line: ...
// see what's going on in dyndbg=value processing
dynamic_debug.verbose=1
// enable pr_debugs in 2 builtins, #cmt is stripped
dyndbg="module params +p #cmt ; module sys +p"
// enable pr_debugs in 2 functions in a module loaded later
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"


@@ -0,0 +1,69 @@
The Linux kernel user's and administrator's guide
=================================================
The following is a collection of user-oriented documents that have been
added to the kernel over time. There is, as yet, little overall order or
organization here — this material was not written to be a single, coherent
document! With luck things will improve quickly over time.
This initial section contains overall information, including the README
file describing the kernel as a whole, documentation on kernel parameters,
etc.
.. toctree::
:maxdepth: 1
README
kernel-parameters
devices
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.
.. toctree::
:maxdepth: 1
reporting-bugs
security-bugs
bug-hunting
bug-bisect
tainted-kernels
ramoops
dynamic-debug-howto
init
This is the beginning of a section with information of interest to
application developers. Documents covering various aspects of the kernel
ABI will be found here.
.. toctree::
:maxdepth: 1
sysfs-rules
The rest of this manual consists of various unordered guides on how to
configure specific aspects of kernel behavior to your liking.
.. toctree::
:maxdepth: 1
initrd
serial-console
braille-console
parport
md
module-signing
sysrq
unicode
vga-softcursor
binfmt-misc
mono
java
ras
.. only:: subproject and html
Indices
=======
* :ref:`genindex`


@@ -0,0 +1,52 @@
Explaining the dreaded "No init found." boot hang message
=========================================================
OK, so you've got this pretty unintuitive message (currently located
in init/main.c) and are wondering what the H*** went wrong.
Some high-level reasons for failure to load the init binary (listed roughly
in order of execution) are:
A) Unable to mount root FS
B) init binary doesn't exist on rootfs
C) broken console device
D) binary exists but dependencies not available
E) binary cannot be loaded
Detailed explanations:
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
to get more detailed kernel messages.
B) make sure you have the correct root FS type
(and ``root=`` kernel parameter points to the correct partition),
required drivers such as storage hardware (such as SCSI or USB!)
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
to be pre-loaded by an initrd)
C) Possibly a conflict in ``console=`` setup --> initial console unavailable.
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
missing interrupt-based configuration).
Try using a different ``console=`` device or e.g. ``netconsole=``.
D) e.g. required library dependencies of the init binary such as
``/lib/ld-linux.so.2`` missing or broken. Use
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
E) make sure the binary's architecture matches your hardware.
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
In case you tried loading a non-binary file here (shell script?),
you should make sure that the script specifies an interpreter in its shebang
header line (``#!/...``) that is fully working (including its library
dependencies). And before tackling scripts, better first test a simple
non-script binary such as ``/bin/sh`` and confirm its successful execution.
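For cases D) and E), a first check is to look at the start of the init file and see whether it is a script or an ELF binary, and for which machine it was built. The following is a rough sketch under stated assumptions: the function name ``describe_binary`` and the small machine table are illustrative only, and the table covers just the architectures mentioned above plus AArch64.

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "i386", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def describe_binary(header):
    """Classify a file from its first bytes (at least 20 for ELF files)."""
    if header.startswith(b"#!"):
        # Script: the interpreter named here must itself be fully working.
        first_line = header[2:].split(b"\n", 1)[0]
        return "script, interpreter: " + first_line.strip().decode(errors="replace")
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF binary or script"
    bits = {1: 32, 2: 64}.get(header[4], 0)          # EI_CLASS
    endian = "<" if header[5] == 1 else ">"          # EI_DATA
    (machine,) = struct.unpack_from(endian + "H", header, 18)  # e_machine
    return "ELF %d-bit, machine: %s" % (bits, EM_NAMES.get(machine, machine))
```

In practice the header would come from something like ``open(path, "rb").read(64)`` on the init binary of the root file system; comparing the reported machine with the hardware quickly rules case E) in or out.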
To find out more, add code to ``init/main.c`` to display the ``kernel_execve()``
return values.
Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step
which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs:
- Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix).
- try to make the implementation itself more helpful in general,
e.g. by providing additional error messages at affected places.
Andreas Mohr <andi at lisas period de>


@@ -0,0 +1,383 @@
Using the initial RAM disk (initrd)
===================================
Written 1996,2000 by Werner Almesberger <werner.almesberger@epfl.ch> and
Hans Lermen <lermen@fgan.de>
initrd provides the capability to load a RAM disk by the boot loader.
This RAM disk can then be mounted as the root file system and programs
can be run from it. Afterwards, a new root file system can be mounted
from a different device. The previous root (from initrd) is then moved
to a directory and can be subsequently unmounted.
initrd is mainly designed to allow system startup to occur in two phases,
where the kernel comes up with a minimum set of compiled-in drivers, and
where additional modules are loaded from initrd.
This document gives a brief overview of the use of initrd. A more detailed
discussion of the boot process can be found in [#f1]_.
Operation
---------
When using initrd, the system typically boots as follows:
1) the boot loader loads the kernel and the initial RAM disk
2) the kernel converts initrd into a "normal" RAM disk and
frees the memory used by initrd
3) if the root device is not ``/dev/ram0``, the old (deprecated)
change_root procedure is followed. See the "Obsolete root change
mechanism" section below.
4) root device is mounted. If it is ``/dev/ram0``, the initrd image is
then mounted as root
5) /sbin/init is executed (this can be any valid executable, including
shell scripts; it is run with uid 0 and can do basically everything
init can do).
6) init mounts the "real" root file system
7) init places the root file system at the root directory using the
pivot_root system call
8) init execs the ``/sbin/init`` on the new root filesystem, performing
the usual boot sequence
9) the initrd file system is removed
Note that changing the root directory does not involve unmounting it.
It is therefore possible to leave processes running on initrd during that
procedure. Also note that file systems mounted under initrd continue to
be accessible.
Boot command-line options
-------------------------
initrd adds the following new options::
initrd=<path> (e.g. LOADLIN)
Loads the specified file as the initial RAM disk. When using LILO, you
have to specify the RAM disk image file in /etc/lilo.conf, using the
INITRD configuration variable.
noinitrd
initrd data is preserved but it is not converted to a RAM disk and
the "normal" root file system is mounted. initrd data can be read
from /dev/initrd. Note that the data in initrd can have any structure
in this case and doesn't necessarily have to be a file system image.
This option is used mainly for debugging.
Note: /dev/initrd is read-only and it can only be used once. As soon
as the last process has closed it, all data is freed and /dev/initrd
can't be opened anymore.
root=/dev/ram0
initrd is mounted as root, and the normal boot procedure is followed,
with the RAM disk mounted as root.
Compressed cpio images
----------------------
Recent kernels have support for populating a ramdisk from a compressed cpio
archive. On such systems, the creation of a ramdisk image doesn't need to
involve special block devices or loopbacks; you merely create a directory on
disk with the desired initrd content, cd to that directory, and run (as an
example)::
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
Examining the contents of an existing image file is just as simple::
mkdir /tmp/imagefile
cd /tmp/imagefile
gzip -cd /boot/imagefile.img | cpio -imd --quiet
Installation
------------
First, a directory for the initrd file system has to be created on the
"normal" root file system, e.g.::
# mkdir /initrd
The name is not relevant. More details can be found on the
:manpage:`pivot_root(2)` man page.
If the root file system is created during the boot procedure (i.e. if
you're building an install floppy), the root file system creation
procedure should create the ``/initrd`` directory.
If initrd will not be mounted in some cases, its content is still
accessible if the following device has been created::
# mknod /dev/initrd b 1 250
# chmod 400 /dev/initrd
Second, the kernel has to be compiled with RAM disk support and with
support for the initial RAM disk enabled. Also, at least all components
needed to execute programs from initrd (e.g. executable format and file
system) must be compiled into the kernel.
Third, you have to create the RAM disk image. This is done by creating a
file system on a block device, copying files to it as needed, and then
copying the content of the block device to the initrd file. With recent
kernels, at least three types of devices are suitable for that:
- a floppy disk (works everywhere but it's painfully slow)
- a RAM disk (fast, but allocates physical memory)
- a loopback device (the most elegant solution)
We'll describe the loopback device method:
1) make sure loopback block devices are configured into the kernel
2) create an empty file system of the appropriate size, e.g.::
# dd if=/dev/zero of=initrd bs=300k count=1
# mke2fs -F -m0 initrd
(if space is critical, you may want to use the Minix FS instead of Ext2)
3) mount the file system, e.g.::
# mount -t ext2 -o loop initrd /mnt
4) create the console device::
# mkdir /mnt/dev
# mknod /mnt/dev/console c 5 1
5) copy all the files that are needed to properly use the initrd
environment. Don't forget the most important file, ``/sbin/init``
.. note:: ``/sbin/init`` permissions must include "x" (execute).
6) correct operation of the initrd environment can frequently be tested
even without rebooting with the command::
# chroot /mnt /sbin/init
This is of course limited to initrds that do not interfere with the
general system state (e.g. by reconfiguring network interfaces,
overwriting mounted devices, trying to start already running daemons,
etc.). Note, however, that it is usually possible to use pivot_root in
such a chroot'ed initrd environment.
7) unmount the file system::
# umount /mnt
8) the initrd is now in the file "initrd". Optionally, it can now be
compressed::
# gzip -9 initrd
For experimenting with initrd, you may want to take a rescue floppy and
only add a symbolic link from ``/sbin/init`` to ``/bin/sh``. Alternatively, you
can try the experimental newlib environment [#f2]_ to create a small
initrd.
Finally, you have to boot the kernel and load initrd. Almost all Linux
boot loaders support initrd. Since the boot process is still compatible
with an older mechanism, the following boot command line parameters
have to be given::
root=/dev/ram0 rw
(rw is only necessary if writing to the initrd file system.)
With LOADLIN, you simply execute::
LOADLIN <kernel> initrd=<disk_image>
e.g.::
LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
With LILO, you add the option ``INITRD=<path>`` to either the global section
or to the section of the respective kernel in ``/etc/lilo.conf``, and pass
the options using APPEND, e.g.::
image = /bzImage
initrd = /boot/initrd.gz
append = "root=/dev/ram0 rw"
and run ``/sbin/lilo``
For other boot loaders, please refer to the respective documentation.
Now you can boot and enjoy using initrd.
Changing the root device
------------------------
When finished with its duties, init typically changes the root device
and proceeds with starting the Linux system on the "real" root device.
The procedure involves the following steps:
- mounting the new root file system
- turning it into the root file system
- removing all accesses to the old (initrd) root file system
- unmounting the initrd file system and de-allocating the RAM disk
Mounting the new root file system is easy: it just needs to be mounted on
a directory under the current root. Example::
# mkdir /new-root
# mount -o ro /dev/hda1 /new-root
The root change is accomplished with the pivot_root system call, which
is also available via the ``pivot_root`` utility (see :manpage:`pivot_root(8)`
man page; ``pivot_root`` is distributed with util-linux version 2.10h or higher
[#f3]_). ``pivot_root`` moves the current root to a directory under the new
root, and puts the new root at its place. The directory for the old root
must exist before calling ``pivot_root``. Example::
# cd /new-root
# mkdir initrd
# pivot_root . initrd
Now, the init process may still access the old root via its
executable, shared libraries, standard input/output/error, and its
current root directory. All these references are dropped by the
following command::
# exec chroot . what-follows <dev/console >dev/console 2>&1
Where what-follows is a program under the new root, e.g. ``/sbin/init``
If the new root file system will be used with udev and has no valid
``/dev`` directory, udev must be initialized before invoking chroot in order
to provide ``/dev/console``.
Note: implementation details of pivot_root may change with time. In order
to ensure compatibility, the following points should be observed:
- before calling pivot_root, the current directory of the invoking
process should point to the new root directory
- use . as the first argument, and the _relative_ path of the directory
for the old root as the second argument
- a chroot program must be available under the old and the new root
- chroot to the new root afterwards
- use relative paths for dev/console in the exec command
Now, the initrd can be unmounted and the memory allocated by the RAM
disk can be freed::
# umount /initrd
# blockdev --flushbufs /dev/ram0
It is also possible to use initrd with an NFS-mounted root, see the
:manpage:`pivot_root(8)` man page for details.
Usage scenarios
---------------
The main motivation for implementing initrd was to allow for modular
kernel configuration at system installation. The procedure would work
as follows:
1) system boots from floppy or other media with a minimal kernel
(e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and
loads initrd
2) ``/sbin/init`` determines what is needed to (1) mount the "real" root FS
(i.e. device type, device drivers, file system) and (2) the
distribution media (e.g. CD-ROM, network, tape, ...). This can be
done by asking the user, by auto-probing, or by using a hybrid
approach.
3) ``/sbin/init`` loads the necessary kernel modules
4) ``/sbin/init`` creates and populates the root file system (this doesn't
have to be a very usable system yet)
5) ``/sbin/init`` invokes ``pivot_root`` to change the root file system and
execs - via chroot - a program that continues the installation
6) the boot loader is installed
7) the boot loader is configured to load an initrd with the set of
modules that was used to bring up the system (e.g. ``/initrd`` can be
modified, then unmounted, and finally, the image is written from
``/dev/ram0`` or ``/dev/rd/0`` to a file)
8) now the system is bootable and additional installation tasks can be
performed
The key role of initrd here is to re-use the configuration data during
normal system operation without requiring the use of a bloated "generic"
kernel or re-compiling or re-linking the kernel.
A second scenario is for installations where Linux runs on systems with
different hardware configurations in a single administrative domain. In
such cases, it is desirable to generate only a small set of kernels
(ideally only one) and to keep the system-specific part of configuration
information as small as possible. In this case, a common initrd could be
generated with all the necessary modules. Then, only ``/sbin/init`` or a file
read by it would have to be different.
A third scenario is more convenient recovery disks, because information
like the location of the root FS partition doesn't have to be provided at
boot time, but the system loaded from initrd can invoke a user-friendly
dialog and it can also perform some sanity checks (or even some form of
auto-detection).
Last but not least, CD-ROM distributors may use it for better installation
from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk
via initrd from CD; or by booting via a loader like ``LOADLIN`` or directly
from the CD-ROM, and loading the RAM disk from CD without need of
floppies.
Obsolete root change mechanism
------------------------------
The following mechanism was used before the introduction of pivot_root.
Current kernels still support it, but you should _not_ rely on its
continued availability.
It works by mounting the "real" root device (i.e. the one set with rdev
in the kernel image or with root=... at the boot command line) as the
root file system when linuxrc exits. The initrd file system is then
unmounted, or, if it is still busy, moved to a directory ``/initrd``, if
such a directory exists on the new root file system.
In order to use this mechanism, you do not have to specify the boot
command options root, init, or rw. (If specified, they will affect
the real root file system, not the initrd environment.)
If /proc is mounted, the "real" root device can be changed from within
linuxrc by writing the number of the new root FS device to the special
file /proc/sys/kernel/real-root-dev, e.g.::
# echo 0x301 >/proc/sys/kernel/real-root-dev
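The number written here packs the major and minor device numbers into one value. The sketch below assumes the classic layout with an 8-bit major and an 8-bit minor number, which fits the small values used in this document; the helper names are hypothetical.

```python
def split_dev(number):
    """Split an old-style device number into (major, minor)."""
    return number >> 8, number & 0xFF


def make_dev(major, minor):
    """Build an old-style device number from major and minor."""
    return (major << 8) | minor


print(split_dev(0x301))
print(hex(make_dev(1, 0)))
```

Under that assumption, ``0x301`` is major 3, minor 1 (conventionally ``/dev/hda1``), and ``0x0100`` is major 1, minor 0, that is, the RAM disk ``/dev/ram0``.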
Note that the mechanism is incompatible with NFS and similar file
systems.
This old, deprecated mechanism is commonly called ``change_root``, while
the new, supported mechanism is called ``pivot_root``.
Mixed change_root and pivot_root mechanism
------------------------------------------
In case you did not want to use ``root=/dev/ram0`` to trigger the pivot_root
mechanism, you may create both ``/linuxrc`` and ``/sbin/init`` in your initrd
image.
``/linuxrc`` would contain only the following::
#! /bin/sh
mount -n -t proc proc /proc
echo 0x0100 >/proc/sys/kernel/real-root-dev
umount -n /proc
Once linuxrc exited, the kernel would again mount your initrd as root,
this time executing ``/sbin/init``. Again, it would be the duty of this init
to build the right environment (maybe using the ``root=`` device passed on
the cmdline) before the final execution of the real ``/sbin/init``.
Resources
---------
.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future"
http://www.almesberger.net/cv/papers/ols2k-9.ps.gz
.. [#f2] newlib package (experimental), with initrd example
https://www.sourceware.org/newlib/
.. [#f3] util-linux: Miscellaneous utilities for Linux
https://www.kernel.org/pub/linux/utils/util-linux/


@@ -0,0 +1,423 @@
Java(tm) Binary Kernel Support for Linux v1.03
----------------------------------------------
Linux beats them ALL! While all other OS's are TALKING about direct
support of Java Binaries in the OS, Linux is doing it!
You can execute Java applications and Java Applets just like any
other program after you have done the following:
1) You MUST FIRST install the Java Developers Kit for Linux.
The Java on Linux HOWTO gives the details on getting and
installing this. This HOWTO can be found at:
ftp://sunsite.unc.edu/pub/Linux/docs/HOWTO/Java-HOWTO
You should also set up a reasonable CLASSPATH environment
variable to use Java applications that make use of any
nonstandard classes (not included in the same directory
as the application itself).
2) You have to compile BINFMT_MISC either as a module or into
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
If you choose to compile it as a module, you will have
to insert it manually with modprobe/insmod, as kmod
cannot easily be supported with binfmt_misc.
Read the file 'binfmt_misc.txt' in this directory to know
more about the configuration process.
3) Add the following configuration items to binfmt_misc
(you should really have read ``binfmt_misc.txt`` now):
support for Java applications::
':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:'
support for executable Jar files::
':ExecutableJAR:E::jar::/usr/local/bin/jarwrapper:'
support for Java Applets::
':Applet:E::html::/usr/bin/appletviewer:'
or the following, if you want to be more selective::
':Applet:M::<!--applet::/usr/bin/appletviewer:'
Of course you have to fix the path names. The path/file names given in this
document match the Debian 2.1 system. (i.e. jdk installed in ``/usr``,
custom wrappers from this document in ``/usr/local``)
Note that for the more selective applet support you have to modify
existing html-files to contain ``<!--applet-->`` in the first line
(``<`` has to be the first character!) for this to work!
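The two kinds of rule used above differ in what they look at: an ``M`` entry compares bytes at the start of the file with a magic value, while an ``E`` entry looks at the file name extension. A minimal sketch of both checks follows; the function names are hypothetical and the extension handling is a simplification of what binfmt_misc does.

```python
# Magic bytes from the ':Java:M::\xca\xfe\xba\xbe::...' entry above.
JAVA_MAGIC = bytes.fromhex("cafebabe")


def matches_magic(header, magic=JAVA_MAGIC, offset=0):
    """True if the bytes at offset in header equal the magic value."""
    return header[offset:offset + len(magic)] == magic


def matches_extension(filename, ext):
    """True if the part of filename after the last dot equals ext."""
    if "." not in filename:
        return False
    return filename.rsplit(".", 1)[1] == ext


print(matches_magic(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))
print(matches_extension("Application.jar", "jar"))
```

This also shows why the more selective applet rule needs ``<!--applet`` at the very beginning of the file: with an offset of zero, anything before it, even whitespace, makes the comparison fail.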
For the compiled Java programs you need a wrapper script like the
following (this is because Java is broken in case of the filename
handling), again fix the path names, both in the script and in the
above given configuration string.
You also need the little program after the script. Compile it like::
gcc -O2 -o javaclassname javaclassname.c
and put it in ``/usr/local/bin``.
Both the javawrapper shellscript and the javaclassname program
were supplied by Colin J. Watson <cjw44@cam.ac.uk>.
Javawrapper shell script:
.. code-block:: sh
#!/bin/bash
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
if [ -z "$1" ]; then
exec 1>&2
echo Usage: $0 class-file
exit 1
fi
CLASS=$1
FQCLASS=`/usr/local/bin/javaclassname $1`
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
# for example:
# CLASS=Test.class
# FQCLASS=foo.bar.Test
# FQCLASSN=Test
# FQCLASSP=foo/bar
unset CLASSBASE
declare -i LINKLEVEL=0
while :; do
if [ "`basename $CLASS .class`" == "$FQCLASSN" ]; then
# See if this directory works straight off
cd -L `dirname $CLASS`
CLASSDIR=$PWD
cd $OLDPWD
if echo $CLASSDIR | grep -q "$FQCLASSP$"; then
CLASSBASE=`echo $CLASSDIR | sed -e "s.$FQCLASSP$.."`
break;
fi
# Try dereferencing the directory name
cd -P `dirname $CLASS`
CLASSDIR=$PWD
cd $OLDPWD
if echo $CLASSDIR | grep -q "$FQCLASSP$"; then
CLASSBASE=`echo $CLASSDIR | sed -e "s.$FQCLASSP$.."`
break;
fi
# If no other possible filename exists
if [ ! -L $CLASS ]; then
exec 1>&2
echo $0:
echo " $CLASS should be in a" \
"directory tree called $FQCLASSP"
exit 1
fi
fi
if [ ! -L $CLASS ]; then break; fi
# Go down one more level of symbolic links
let LINKLEVEL+=1
if [ $LINKLEVEL -gt 5 ]; then
exec 1>&2
echo $0:
echo " Too many symbolic links encountered"
exit 1
fi
CLASS=`ls --color=no -l $CLASS | sed -e 's/^.* \([^ ]*\)$/\1/'`
done
if [ -z "$CLASSBASE" ]; then
if [ -z "$FQCLASSP" ]; then
GOODNAME=$FQCLASSN.class
else
GOODNAME=$FQCLASSP/$FQCLASSN.class
fi
exec 1>&2
echo $0:
echo " $FQCLASS should be in a file called $GOODNAME"
exit 1
fi
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
# class is not in CLASSPATH, so prepend dir of class to CLASSPATH
if [ -z "${CLASSPATH}" ] ; then
export CLASSPATH=$CLASSBASE
else
export CLASSPATH=$CLASSBASE:$CLASSPATH
fi
fi
shift
/usr/bin/java $FQCLASS "$@"
javaclassname.c:
.. code-block:: c
/* javaclassname.c
*
* Extracts the class name from a Java class file; intended for use in a Java
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
*
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
#define CP_UTF8 1
#define CP_INTEGER 3
#define CP_FLOAT 4
#define CP_LONG 5
#define CP_DOUBLE 6
#define CP_CLASS 7
#define CP_STRING 8
#define CP_FIELDREF 9
#define CP_METHODREF 10
#define CP_INTERFACEMETHODREF 11
#define CP_NAMEANDTYPE 12
#define CP_METHODHANDLE 15
#define CP_METHODTYPE 16
#define CP_INVOKEDYNAMIC 18
/* Define some commonly used error messages */
#define seek_error() error("%s: Cannot seek\n", program)
#define corrupt_error() error("%s: Class file corrupt\n", program)
#define eof_error() error("%s: Unexpected end of file\n", program)
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program)
char *program;
long *pool;
u_int8_t read_8(FILE *classfile);
u_int16_t read_16(FILE *classfile);
void skip_constant(FILE *classfile, u_int16_t *cur);
void error(const char *format, ...);
int main(int argc, char **argv);
/* Reads in an unsigned 8-bit integer. */
u_int8_t read_8(FILE *classfile)
{
int b = fgetc(classfile);
if(b == EOF)
eof_error();
return (u_int8_t)b;
}
/* Reads in an unsigned 16-bit integer. */
u_int16_t read_16(FILE *classfile)
{
int b1, b2;
b1 = fgetc(classfile);
if(b1 == EOF)
eof_error();
b2 = fgetc(classfile);
if(b2 == EOF)
eof_error();
return (u_int16_t)((b1 << 8) | b2);
}
/* Reads in a value from the constant pool. */
void skip_constant(FILE *classfile, u_int16_t *cur)
{
u_int16_t len;
int seekerr = 1;
pool[*cur] = ftell(classfile);
switch(read_8(classfile))
{
case CP_UTF8:
len = read_16(classfile);
seekerr = fseek(classfile, len, SEEK_CUR);
break;
case CP_CLASS:
case CP_STRING:
case CP_METHODTYPE:
seekerr = fseek(classfile, 2, SEEK_CUR);
break;
case CP_METHODHANDLE:
seekerr = fseek(classfile, 3, SEEK_CUR);
break;
case CP_INTEGER:
case CP_FLOAT:
case CP_FIELDREF:
case CP_METHODREF:
case CP_INTERFACEMETHODREF:
case CP_NAMEANDTYPE:
case CP_INVOKEDYNAMIC:
seekerr = fseek(classfile, 4, SEEK_CUR);
break;
case CP_LONG:
case CP_DOUBLE:
seekerr = fseek(classfile, 8, SEEK_CUR);
++(*cur);
break;
default:
corrupt_error();
}
if(seekerr)
seek_error();
}
void error(const char *format, ...)
{
va_list ap;
va_start(ap, format);
vfprintf(stderr, format, ap);
va_end(ap);
exit(1);
}
int main(int argc, char **argv)
{
FILE *classfile;
u_int16_t cp_count, i, this_class, classinfo_ptr;
u_int16_t length; /* CONSTANT_Utf8 length is a 16-bit value */
program = argv[0];
if(!argv[1])
error("%s: Missing input file\n", program);
classfile = fopen(argv[1], "rb");
if(!classfile)
error("%s: Error opening %s\n", program, argv[1]);
if(fseek(classfile, 8, SEEK_SET)) /* skip magic and version numbers */
seek_error();
cp_count = read_16(classfile);
pool = calloc(cp_count, sizeof(long));
if(!pool)
error("%s: Out of memory for constant pool\n", program);
for(i = 1; i < cp_count; ++i)
skip_constant(classfile, &i);
if(fseek(classfile, 2, SEEK_CUR)) /* skip access flags */
seek_error();
this_class = read_16(classfile);
if(this_class < 1 || this_class >= cp_count)
corrupt_error();
if(!pool[this_class] || pool[this_class] == -1)
corrupt_error();
if(fseek(classfile, pool[this_class] + 1, SEEK_SET))
seek_error();
classinfo_ptr = read_16(classfile);
if(classinfo_ptr < 1 || classinfo_ptr >= cp_count)
corrupt_error();
if(!pool[classinfo_ptr] || pool[classinfo_ptr] == -1)
corrupt_error();
if(fseek(classfile, pool[classinfo_ptr] + 1, SEEK_SET))
seek_error();
length = read_16(classfile);
for(i = 0; i < length; ++i)
{
u_int8_t x = read_8(classfile);
if((x & 0x80) || !x)
{
if((x & 0xE0) == 0xC0)
{
u_int8_t y = read_8(classfile);
++i; /* the second byte also counts towards length */
if((y & 0xC0) == 0x80)
{
int c = ((x & 0x1f) << 6) + (y & 0x3f);
if(c) putchar(c);
else utf8_error();
}
else utf8_error();
}
else utf8_error();
}
else if(x == '/') putchar('.');
else putchar(x);
}
putchar('\n');
free(pool);
fclose(classfile);
return 0;
}
jarwrapper::
#!/bin/bash
# /usr/local/bin/jarwrapper - the wrapper for binfmt_misc/jar
java -jar "$1"
Now simply ``chmod +x`` the ``.class``, ``.jar`` and/or ``.html`` files you
want to execute.
To add a Java program to your path best put a symbolic link to the main
.class file into /usr/bin (or another place you like) omitting the .class
extension. The directory containing the original .class file will be
added to your CLASSPATH during execution.
To test your new setup, enter in the following simple Java app, and name
it "HelloWorld.java":
.. code-block:: java
class HelloWorld {
public static void main(String args[]) {
System.out.println("Hello World!");
}
}
Now compile the application with::
javac HelloWorld.java
Set the executable permissions of the binary file, with::
chmod 755 HelloWorld.class
And then execute it::
./HelloWorld.class
To execute Java Jar files, simply chmod the ``*.jar`` files to include
the execution bit, then just do::
./Application.jar
To execute Java Applets, simply chmod the ``*.html`` files to include
the execution bit, then just do::
./Applet.html
originally by Brian A. Lantz, brian@lantz.com
heavily edited for binfmt_misc by Richard Günther
new scripts by Colin J. Watson <cjw44@cam.ac.uk>
added executable Jar file support by Kurt Huwig <kurt@iku-netz.de>


@@ -0,0 +1,209 @@
The kernel's command-line parameters
====================================
The following is a consolidated list of the kernel parameters as
implemented by the __setup(), core_param() and module_param() macros
and sorted into English Dictionary order (defined as ignoring all
punctuation and sorting digits before letters in a case insensitive
manner), and with descriptions where known.
The kernel parses parameters from the kernel command line up to "--";
if it doesn't recognize a parameter and it doesn't contain a '.', the
parameter gets passed to init: parameters with '=' go into init's
environment, others are passed as command line arguments to init.
Everything after "--" is passed as an argument to init.
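The routing rules in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the kernel's parser: the function name ``split_cmdline`` and the ``known`` set (standing in for "parameters the kernel recognizes") are assumptions made for this example, and quoting is ignored.

```python
def split_cmdline(tokens, known):
    """Classify command-line tokens as the text above describes.

    `known` is a hypothetical set of parameter names the kernel recognizes.
    Returns (kept_by_kernel, init_environment, init_arguments).
    """
    kernel, init_env, init_args = [], {}, []
    for i, tok in enumerate(tokens):
        if tok == "--":
            # Everything after "--" is passed to init as an argument.
            init_args.extend(tokens[i + 1:])
            break
        name, sep, value = tok.partition("=")
        if name in known or "." in name:
            # Recognized, or dotted (module-style) name: not handed to init.
            kernel.append(tok)
        elif sep:
            # Unrecognized with '=': goes into init's environment.
            init_env[name] = value
        else:
            # Unrecognized without '=': becomes an init argument.
            init_args.append(tok)
    return kernel, init_env, init_args
```

For instance, with ``known = {"root"}``, the tokens ``root=/dev/sda1 TERM=linux single`` would be split into one kernel parameter, one environment variable for init, and one init argument.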
Module parameters can be specified in two ways: via the kernel command
line with a module name prefix, or via modprobe, e.g.::
(kernel command line) usbcore.blinkenlights=1
(modprobe command line) modprobe usbcore blinkenlights=1
Parameters for modules which are built into the kernel need to be
specified on the kernel command line. modprobe looks through the
kernel command line (/proc/cmdline) and collects module parameters
when it loads a module, so the kernel command line can be used for
loadable modules too.
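As a rough illustration of how module parameters could be collected from a command line, the following Python sketch picks out the ``module.param=value`` entries for one module. It is not modprobe's implementation; the function name ``module_params`` is hypothetical, and the sketch splits on whitespace only, so quoted values containing spaces are not handled.

```python
def module_params(cmdline, module):
    """Return {param: value} for entries of the form `module.param[=value]`."""
    params = {}
    for tok in cmdline.split():
        name, _, value = tok.partition("=")
        mod, dot, param = name.partition(".")
        # Only names with a "<module>." prefix belong to the module.
        if dot and mod == module:
            params[param] = value
    return params
```

Passing the contents of /proc/cmdline as ``cmdline`` would yield the parameters given for that module at boot, under the assumptions above.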
Hyphens (dashes) and underscores are equivalent in parameter names, so::
log_buf_len=1M print-fatal-signals=1
can also be entered as::
log-buf-len=1M print_fatal_signals=1
Double-quotes can be used to protect spaces in values, e.g.::
param="spaces in here"
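A tool that compares or looks up parameter names may want to normalize them first. The sketch below, in Python, rewrites hyphens to underscores in the name only and leaves the value untouched. The helper names are hypothetical, and ``shlex.split`` is used merely as an approximation of the double-quote handling: it follows shell rules, which are broader than what the kernel accepts.

```python
import shlex


def normalize(token):
    """Replace '-' with '_' in the parameter name, never in the value."""
    name, sep, value = token.partition("=")
    return name.replace("-", "_") + sep + value


def tokenize(cmdline):
    """Split a command line, keeping double-quoted values together."""
    # Assumption: shell-style splitting is close enough for illustration.
    return [normalize(tok) for tok in shlex.split(cmdline)]
```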
cpu lists:
----------
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
Note that for the special case of a range one can split the range into equal
sized groups and for each group use some amount from the beginning of that
group:
<cpu number>-<cpu number>:<used size>/<group size>
For example one can add the following parameter to the command line:
isolcpus=1,2,10-20,100-2000:2/25
where the final item represents CPUs 100,101,125,126,150,151,...
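The cpu list format can be expanded with a short Python sketch such as the one below. It follows the description in this section only; the function name ``parse_cpu_list`` and the rejection of non-positive sizes are assumptions of the example, and the kernel's own parser may apply further checks.

```python
def parse_cpu_list(spec):
    """Expand a cpu list such as "1,2,10-20,100-2000:2/25" into sorted CPU numbers."""
    cpus = set()
    for item in spec.split(","):
        rng, _, group = item.partition(":")
        start_s, dash, end_s = rng.partition("-")
        start = int(start_s)
        end = int(end_s) if dash else start
        if end < start:
            raise ValueError(f"range must be ascending: {item!r}")
        if group:
            used_s, _, size_s = group.partition("/")
            used, size = int(used_s), int(size_s)
            if used <= 0 or size <= 0:
                # Validation chosen for this sketch.
                raise ValueError(f"sizes must be positive: {item!r}")
        else:
            used, size = 1, 1  # plain range: every CPU is used
        # Walk the range group by group, taking `used` CPUs from each start.
        for base in range(start, end + 1, size):
            cpus.update(range(base, min(base + used, end + 1)))
    return sorted(cpus)
```

For the example above, the last item contributes the first two CPUs of each 25-CPU group starting at 100.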
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
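Reading the current values from sysfs can be scripted. The following Python sketch lists the parameters of a loaded module by reading the files under its ``parameters`` directory. The function name and the ``sysfs_root`` argument are assumptions added for illustration (the latter makes the function usable against a test directory).

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded module, or {} if none are exposed."""
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = None
    return params
```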
The parameters listed below are only valid if certain kernel build options were
enabled and if respective hardware is present. The text in square brackets at
the beginning of each description states the restrictions within which a
parameter is applicable::
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
ARM ARM architecture is enabled.
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/m68k/kernel-options.txt.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
RAM RAM disk support is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
TPM TPM drivers are enabled.
TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
Documentation/x86/x86_64/boot-options.txt .
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
In addition, the following text indicates that the option::
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
BOOT Is a boot loader parameter.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/x86/boot.txt>.
There are also arch-specific kernel-parameters not documented here.
See for example <Documentation/x86/x86_64/boot-options.txt>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
be entered as an environment variable, whereas its absence indicates that
it will appear as a kernel argument readable via /proc/cmdline by programs
running once the system is up.
The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/asm/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
bytes respectively. Such letter suffixes can also be entirely omitted:
.. include:: kernel-parameters.txt
:literal:
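Many of the parameters above accept the [KMG] suffix. As a minimal sketch of that convention, the Python function below converts such a value to a byte count using the binary multipliers. The name ``parse_size`` is hypothetical; accepting lowercase suffixes and ``0x`` prefixes is a convenience of this sketch rather than something stated in this document.

```python
_MULTIPLIERS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text):
    """Convert a value such as "1M" or "4096" to a number of bytes."""
    text = text.strip()
    suffix = text[-1:].upper()  # assumption: lowercase suffixes are accepted too
    if suffix in _MULTIPLIERS:
        return int(text[:-1], 0) * _MULTIPLIERS[suffix]
    # The suffix can be omitted entirely.
    return int(text, 0)
```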
Todo
----
Add more DRM drivers.


@@ -1,171 +1,3 @@
Kernel Parameters
~~~~~~~~~~~~~~~~~
The following is a consolidated list of the kernel parameters as
implemented by the __setup(), core_param() and module_param() macros
and sorted into English Dictionary order (defined as ignoring all
punctuation and sorting digits before letters in a case insensitive
manner), and with descriptions where known.
The kernel parses parameters from the kernel command line up to "--";
if it doesn't recognize a parameter and it doesn't contain a '.', the
parameter gets passed to init: parameters with '=' go into init's
environment, others are passed as command line arguments to init.
Everything after "--" is passed as an argument to init.
Module parameters can be specified in two ways: via the kernel command
line with a module name prefix, or via modprobe, e.g.:
(kernel command line) usbcore.blinkenlights=1
(modprobe command line) modprobe usbcore blinkenlights=1
Parameters for modules which are built into the kernel need to be
specified on the kernel command line. modprobe looks through the
kernel command line (/proc/cmdline) and collects module parameters
when it loads a module, so the kernel command line can be used for
loadable modules too.
Hyphens (dashes) and underscores are equivalent in parameter names, so
log_buf_len=1M print-fatal-signals=1
can also be entered as
log-buf-len=1M print_fatal_signals=1
Double-quotes can be used to protect spaces in values, e.g.:
param="spaces in here"
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
"echo -n ${value} > /sys/module/${modulename}/parameters/${parm}".
The parameters listed below are only valid if certain kernel build options were
enabled and if respective hardware is present. The text in square brackets at
the beginning of each description states the restrictions within which a
parameter is applicable:
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
ARM ARM architecture is enabled.
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
CLK Common clock infrastructure is enabled.
CMA Contiguous Memory Area support is enabled.
DRM Direct Rendering Management support is enabled.
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
Documentation/m68k/kernel-options.txt.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
PCMCIA The PCMCIA subsystem is enabled.
PNP Plug & Play support is enabled.
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
RAM RAM disk support is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
TPM TPM drivers are enabled.
TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
VMMIO Driver for memory mapped virtio devices is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
Documentation/x86/x86_64/boot-options.txt .
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
In addition, the following text indicates that the option:
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
BOOT Is a boot loader parameter.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/x86/boot.txt>.
There are also arch-specific kernel-parameters not documented here.
See for example <Documentation/x86/x86_64/boot-options.txt>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
be entered as an environment variable, whereas its absence indicates that
it will appear as a kernel argument readable via /proc/cmdline by programs
running once the system is up.
The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/asm/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
bytes respectively. Such letter suffixes can also be entirely omitted.
acpi= [HW,ACPI,X86,ARM64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
@@ -274,6 +106,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
use by PCI
Format: <irq>,<irq>...
acpi_mask_gpe= [HW,ACPI]
Due to the existence of _Lxx/_Exx, some GPEs triggered
by unsupported hardware/firmware features can result in
GPE floodings that cannot be automatically disabled by
the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE floodings.
Format: <int>
Support masking of GPEs numbered from 0x00 to 0x7f.
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
AML control methods that contain the opcodes to create
@@ -460,6 +302,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
driver will print ACPI tables for AMD IOMMU during
IOMMU initialization.
amd_iommu_intr= [HW,X86-64]
Specifies one of the following AMD IOMMU interrupt
remapping modes:
legacy - Use legacy interrupt remapping mode.
vapic - Use virtual APIC mode, which allows IOMMU
to inject interrupts directly into guest.
This mode requires kvm-amd.avic=1.
(Default when IOMMU HW support is present.)
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -698,6 +549,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
loops can be debugged more effectively on production
systems.
clocksource.arm_arch_timer.fsl-a008585=
[ARM64]
Format: <bool>
Enable/disable the workaround of Freescale/NXP
erratum A-008585. This can be useful for KVM
guests, if the guest device tree doesn't show the
erratum. If unspecified, the workaround is
enabled based on the device tree.
clearcpuid=BITNUM [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
@@ -762,7 +622,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
bits, and "f" is flow control ("r" for RTS or
omit it). Default is "9600n8".
See Documentation/serial-console.txt for more
See Documentation/admin-guide/serial-console.rst for more
information. See
Documentation/networking/netconsole.txt for an
alternative.
@@ -1013,6 +873,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
dscc4.setup= [NET]
dump_apple_properties [X86]
Dump name and content of EFI device properties on
x86 Macs. Useful for driver authors to determine
what data is available or for reverse-engineering.
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time. See
@@ -1025,12 +890,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nopku [X86] Disable Memory Protection Keys CPU feature found
in some Intel CPUs.
eagerfpu= [X86]
on enable eager fpu restore
off disable eager fpu restore
auto selects the default scheme, which automatically
enables eagerfpu restore for xsaveopt.
module.async_probe [KNL]
Enable asynchronous probe on this module.
@@ -1045,11 +904,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
determined by the stdout-path property in device
tree's chosen node.
cdns,<addr>
Start an early, polled-mode console on a cadence serial
port at the specified address. The cadence serial port
must already be setup and configured. Options are not
yet supported.
cdns,<addr>[,options]
Start an early, polled-mode console on a Cadence
(xuartps) serial port at the specified address. Only
supported option is baud rate. If baud rate is not
specified, the serial port must already be setup and
configured.
uart[8250],io,<addr>[,options]
uart[8250],mmio,<addr>[,options]
@@ -1364,6 +1224,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0.
Default: 1024
gpio-mockup.gpio_mockup_ranges
[HW] Sets the ranges of gpiochip for this device.
Format: <start1>,<end1>,<start2>,<end2>...
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
@@ -1587,6 +1451,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
The builtin appraise policy appraises all files
owned by uid=0.
ima_canonical_fmt [IMA]
Use the canonical format for the binary runtime
measurements, instead of host native format.
ima_hash= [IMA]
Format: { md5 | sha1 | rmd160 | sha256 | sha384
| sha512 | ... }
@@ -1650,6 +1518,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
initrd= [BOOT] Specify the location of the initial ramdisk
init_pkru= [x86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
override in debugfs after boot.
inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>
@@ -1701,6 +1574,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
enabling its internal governor). This mode cannot be
used along with the hardware-managed P-states (HWP)
feature.
force
Enable intel_pstate on systems that prohibit it by default
in favor of acpi-cpufreq. Forcing the intel_pstate driver
@@ -1721,6 +1600,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Description Table, specifies preferred power management
profile as "Enterprise Server" or "Performance Server",
then this feature is turned on by default.
per_cpu_perf_limits
Allow per-logical-CPU P-State performance control limits using
cpufreq sysfs interface
intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
@@ -1768,13 +1650,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
See Documentation/filesystems/nfs/nfsroot.txt.
irqaffinity= [SMP] Set the default irq affinity mask
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
The argument is a cpu list, as described above.
irqfixup [HW]
When an interrupt is not handled search all handlers
@@ -1791,13 +1667,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <RDP>,<reset>,<pci_scan>,<verbosity>
isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
The argument is a cpu list, as described above.
This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
@@ -1911,9 +1781,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
@@ -2188,7 +2055,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt
md= [HW] RAID subsystems devices and level
See Documentation/md.txt.
See Documentation/admin-guide/md.rst.
mdacon= [MDA]
Format: <first>,<last>
@@ -2278,6 +2145,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
memory contents and reserves bad memory
regions that are detected.
mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
deep - Suspend-To-RAM or equivalent (if supported)
See Documentation/power/states.txt.
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.
@@ -2354,7 +2227,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
movable_node [KNL,X86] Boot-time switch to enable the effects
movable_node [KNL] Boot-time switch to enable the effects
of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
MTD_Partition= [MTD]
@@ -2430,6 +2303,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nfsrootdebug [NFS] enable nfsroot debugging messages.
See Documentation/filesystems/nfs/nfsroot.txt.
nfs.callback_nr_threads=
[NFSv4] set the total number of threads that the
NFS client will assign to service NFSv4 callback
requests.
nfs.callback_tcpport=
[NFS] set the TCP port on which the NFSv4 callback
channel should listen.
@@ -2453,6 +2331,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
of returning the full 64-bit number.
The default is to return 64-bit inode numbers.
nfs.max_session_cb_slots=
[NFSv4.1] Sets the maximum number of session
slots the client will assign to the callback
channel. This determines the maximum number of
callbacks the client will process in parallel for
a particular server.
nfs.max_session_slots=
[NFSv4.1] Sets the maximum number of session slots
the client will attempt to negotiate with the server.
@@ -2486,7 +2371,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
will be sent.
The default is to send the implementation identification
information.
nfs.recover_lost_locks =
[NFSv4] Attempt to recover locks that were lost due
to a lease timeout on the server. Please note that
@@ -2659,6 +2544,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Default: on
nohz_full= [KNL,BOOT]
The argument is a cpu list, as described above.
In kernels built with CONFIG_NO_HZ_FULL=y, set
the specified list of CPUs whose tick will be stopped
whenever possible. The boot CPU will be forced outside
@@ -2694,6 +2580,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
no-vmw-sched-clock
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
steal time is computed, but won't influence scheduler
behaviour
@@ -3175,6 +3065,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
may be specified.
Format: <port>,<port>....
powersave=off [PPC] This option disables power saving features.
It specifically disables cpuidle and sets the
platform machine description specific power_save
function to NULL. On Idle the CPU just reduces
execution priority.
ppc_strict_facility_enable
[PPC] This option catches any kernel floating point,
Altivec, VSX and SPE outside of regions specifically
@@ -3258,12 +3154,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
r128= [HW,DRM]
raid= [HW,RAID]
See Documentation/md.txt.
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/blockdev/ramdisk.txt.
rcu_nocbs= [KNL]
The argument is a cpu list, as described above.
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will
@@ -3606,13 +3504,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroup-v1/cpusets.txt.
relative_sleep_states=
[SUSPEND] Use sleep state labeling where the deepest
state available other than hibernation is always "mem".
Format: { "0" | "1" }
0 -- Traditional sleep state labels.
1 -- Relative sleep state labels.
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
reservetop= [X86-32]
@@ -3762,12 +3653,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
shapers= [NET]
Maximal number of shapers.
show_msr= [x86] show boot-time MSR settings
Format: { <integer> }
Show boot-time (BIOS-initialized) MSR settings.
The parameter means the number of CPUs to show,
for example 1 means boot CPU only.
simeth= [IA-64]
simscsi=
@@ -3936,10 +3821,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
it if 0 is given (See Documentation/cgroup-v1/memory.txt)
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
Format: { <int> | force }
Format: { <int> | force | noforce }
<int> -- Number of I/O TLB slabs
force -- force using of bounce buffers even if they
wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
switches= [HW,M68k]
@@ -4135,7 +4021,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
See also Documentation/input/joystick-parport.txt
udbg-immortal [PPC] When debugging early kernel crashes that
happen after console_init() and before a proper
happen after console_init() and before a proper
console driver takes over, this boot option might
help "seeing" what's going on.
@@ -4249,6 +4135,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
u = IGNORE_UAS (don't bind to the uas driver);
w = NO_WP_DETECT (don't test whether the
medium is write-protected).
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE
even if the device claims no cache)
Example: quirks=0419:aaf5:rl,0421:0433:rc
user_debug= [KNL,ARM]
@@ -4500,9 +4388,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
______________________________________________________________________
TODO:
Add more DRM drivers.


@@ -0,0 +1,727 @@
RAID arrays
===========
Boot time assembly of RAID arrays
---------------------------------
Tools that manage md devices can be found at
http://www.kernel.org/pub/linux/utils/raid/
You can boot with your md device with the following kernel command
lines:
for old raid arrays without persistent superblocks::
md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn
for raid arrays with persistent superblocks::
md=<md device no.>,dev0,dev1,...,devn
or, to assemble a partitionable array::
md=d<md device no.>,dev0,dev1,...,devn
``md device no.``
+++++++++++++++++
The number of the md device
================= =========
``md device no.`` device
================= =========
0 md0
1 md1
2 md2
3 md3
4 md4
================= =========
``raid level``
++++++++++++++
level of the RAID array
=============== =============
``raid level`` level
=============== =============
-1 linear mode
0 striped mode
=============== =============
other modes are only supported with persistent super blocks
``chunk size factor``
+++++++++++++++++++++
(raid-0 and raid-1 only)
Set the chunk size as 4k << n.
``fault level``
+++++++++++++++
Totally ignored
``dev0`` to ``devn``
++++++++++++++++++++
e.g. ``/dev/hda1``, ``/dev/hdc1``, ``/dev/sda1``, ``/dev/sdb1``
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this::
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
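To make the field layout concrete, here is a small Python sketch that decodes an ``md=`` argument according to the description above. It is not the kernel's parser. The function name, the returned dictionary keys, and the rule "an integer second field means the old, non-persistent form" are assumptions made for this illustration.

```python
LEVELS = {-1: "linear", 0: "striped"}


def _is_int(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def parse_md(arg):
    """Decode md=<no>,<level>,<factor>,<fault>,devs... or md=[d]<no>,devs..."""
    value = arg[3:] if arg.startswith("md=") else arg
    fields = value.split(",")
    partitionable = fields[0].startswith("d")
    number = int(fields[0][1:] if partitionable else fields[0])
    rest = fields[1:]
    result = {"number": number, "partitionable": partitionable,
              "level": None, "chunk_size": None, "devices": rest}
    if rest and _is_int(rest[0]):
        # Old form without persistent superblocks.
        level, factor = int(rest[0]), int(rest[1])
        if level not in LEVELS:
            raise ValueError("other levels need persistent superblocks")
        # rest[2] is the fault level, which is ignored.
        result.update(level=LEVELS[level], chunk_size=4096 << factor,
                      devices=rest[3:])
    return result
```

Applied to the loadlin line above, ``md=0,0,4,0,/dev/hdb2,/dev/hdc3`` describes md device number 0 in striped mode with a chunk size of 4k << 4, that is 64k, built from the two listed partitions.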
Boot time autodetection of RAID arrays
--------------------------------------
When md is compiled into the kernel (not as module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
``raid=noautodetect``. As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.
The kernel parameter ``raid=partitionable`` (or ``raid=part``) means
that all auto-detected arrays are assembled as partitionable.
Boot time assembly of degraded/dirty arrays
-------------------------------------------
If a raid5 or raid6 array is both dirty and degraded, it could have
undetectable data corruption. This is because the fact that it is
``dirty`` means that the parity cannot be trusted, and the fact that it
is degraded means that some datablocks are missing and cannot reliably
be reconstructed (due to no parity).
For this reason, md will normally refuse to start such an array. This
requires the sysadmin to take action to explicitly start the array
despite possible corruption. This is normally done with::
mdadm --assemble --force ....
This option is not really available if the array has the root
filesystem on it. In order to support booting from such an
array, md supports a module parameter ``start_dirty_degraded`` which,
when set to 1, bypasses the checks and allows dirty degraded
arrays to be started.
So, to boot with a root filesystem of a dirty degraded raid 5 or 6, use::
md-mod.start_dirty_degraded=1
Superblock formats
------------------
The md driver can support a variety of different superblock formats.
Currently, it supports superblock formats ``0.90.0`` and the ``md-1`` format
introduced in the 2.5 development series.
The kernel will autodetect which format superblock is being used.
Superblock format ``0`` is treated differently to others for legacy
reasons - it is the original superblock format.
General Rules - apply for all superblock formats
------------------------------------------------
An array is ``created`` by writing appropriate superblocks to all
devices.
It is ``assembled`` by associating each of these devices with a
particular md virtual device. Once it is completely assembled, it can
be accessed.
An array should be created by a user-space tool. This will write
superblocks to all devices. It will usually mark the array as
``unclean``, or with some devices missing so that the kernel md driver
can create appropriate redundancy (copying in raid 1, parity
calculation in raid 4/5).
When an array is assembled, it is first initialized with the
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
version number. The major version number selects which superblock
format is to be used. The minor number might be used to tune handling
of the format, such as suggesting where on each device to look for the
superblock.
Then each device is added using the ADD_NEW_DISK ioctl. This
provides, in particular, a major and minor number identifying the
device to add.
The array is started with the RUN_ARRAY ioctl.
Once started, new devices can be added. They should have an
appropriate superblock written to them, and then be passed in with
ADD_NEW_DISK.
Devices that have failed or are not yet active can be detached from an
array using HOT_REMOVE_DISK.
Specific Rules that apply to format-0 super block arrays, and arrays with no superblock (non-persistent)
--------------------------------------------------------------------------------------------------------
An array can be ``created`` by describing the array (level, chunksize
etc) in a SET_ARRAY_INFO ioctl. This must have ``major_version==0`` and
``raid_disks != 0``.
Then uninitialized devices can be added with ADD_NEW_DISK. The
structure passed to ADD_NEW_DISK must specify the state of the device
and its role in the array.
Once started with RUN_ARRAY, uninitialized spares can be added with
HOT_ADD_DISK.
MD devices in sysfs
-------------------
md devices appear in sysfs (``/sys``) as regular block devices,
e.g.::
/sys/block/md0
Each ``md`` device will contain a subdirectory called ``md`` which
contains further md-specific information about the device.
All md devices contain:
level
a text file indicating the ``raid level``. e.g. raid0, raid1,
raid5, linear, multipath, faulty.
If no raid level has been set yet (array is still being
assembled), the value will reflect whatever has been written
to it, which may be a name like the above, or may be a number
such as ``0``, ``5``, etc.
raid_disks
a text file with a simple number indicating the number of devices
in a fully functional array. If this is not yet known, the file
will be empty. If an array is being resized this will contain
the new number of devices.
Some raid levels allow this value to be set while the array is
active. This will reconfigure the array. Otherwise it can only
be set while assembling an array.
A change to this attribute will not be permitted if it would
reduce the size of the array. To reduce the number of drives
in an e.g. raid5, the array size must first be reduced by
setting the ``array_size`` attribute.
chunk_size
This is the size in bytes for ``chunks`` and is only relevant to
raid levels that involve striping (0,4,5,6,10). The address space
of the array is conceptually divided into chunks and consecutive
chunks are striped onto neighbouring devices.
The size should be at least PAGE_SIZE (4k) and should be a power
of 2. This can only be set while assembling an array.
layout
The ``layout`` for the array for the particular level. This is
simply a number that is interpreted differently by different
levels. It can be written while assembling an array.
array_size
This can be used to artificially constrain the available space in
the array to be less than is actually available on the combined
devices. Writing a number (in Kilobytes) which is less than
the available size will set the size. Any reconfiguration of the
array (e.g. adding devices) will not cause the size to change.
Writing the word ``default`` will cause the effective size of the
array to be whatever size is actually available based on
``level``, ``chunk_size`` and ``component_size``.
This can be used to reduce the size of the array before reducing
the number of devices in a raid4/5/6, or to support external
metadata formats which mandate such clipping.
reshape_position
This is either ``none`` or a sector number within the devices of
the array where ``reshape`` is up to. If this is set, the three
attributes mentioned above (raid_disks, chunk_size, layout) can
potentially have 2 values, an old and a new value. If these
values differ, reading the attribute returns::
new (old)
and writing will effect the ``new`` value, leaving the ``old``
unchanged.
component_size
For arrays with data redundancy (i.e. not raid0, linear, faulty,
multipath), all components must be the same size - or at least
there must be a size that they all provide space for. This is a key
part of the geometry of the array. It is measured in sectors
and can be read from here. Writing to this value may resize
the array if the personality supports it (raid1, raid5, raid6),
and if the component drives are large enough.
metadata_version
This indicates the format that is being used to record metadata
about the array. It can be 0.90 (traditional format), 1.0, 1.1,
1.2 (newer format in varying locations) or ``none`` indicating that
the kernel isn't managing metadata at all.
Alternately it can be ``external:`` followed by a string which
is set by user-space. This indicates that metadata is managed
by a user-space program. Any device failure or other event that
requires a metadata update will cause array activity to be
suspended until the event is acknowledged.
resync_start
The point at which resync should start. If no resync is needed,
this will be a very large number (or ``none`` since 2.6.30-rc1). At
array creation it will default to 0, though starting the array as
``clean`` will set it much larger.
new_dev
This file can be written but not read. The value written should
be a block device number as major:minor. e.g. 8:0
This will cause that device to be attached to the array, if it is
available. It will then appear at md/dev-XXX (depending on the
name of the device) and further configuration is then possible.
safe_mode_delay
When an md array has seen no write requests for a certain period
of time, it will be marked as ``clean``. When another write
request arrives, the array is marked as ``dirty`` before the write
commences. This is known as ``safe_mode``.
The ``certain period`` is controlled by this file which stores the
period as a number of seconds. The default is 200msec (0.200).
Writing a value of 0 disables safemode.
array_state
This file contains a single word which describes the current
state of the array. In many cases, the state can be set by
writing the word for the desired state, however some states
cannot be explicitly set, and some transitions are not allowed.
Select/poll works on this file. All changes except those between
``active-idle`` and ``active`` (which can be frequent and are not
very interesting) are notified. The transition from ``active`` to
``active-idle`` is reported if the metadata is externally managed.
clear
No devices, no size, no level
Writing is equivalent to STOP_ARRAY ioctl
inactive
May have some settings, but array is not active
all IO results in error
When written, doesn't tear down array, but just stops it
suspended (not supported yet)
All IO requests will block. The array can be reconfigured.
Writing this, if accepted, will block until array is quiescent
readonly
no resync can happen. no superblocks get written.
Write requests fail
read-auto
like readonly, but behaves like ``clean`` on a write request.
clean
no pending writes, but otherwise active.
When written to inactive array, starts without resync
If a write request arrives then
if metadata is known, mark ``dirty`` and switch to ``active``.
if not known, block and switch to write-pending
If written to an active array that has pending writes, then fails.
active
fully active: IO and resync can be happening.
When written to inactive array, starts with resync
write-pending
clean, but writes are blocked waiting for ``active`` to be written.
active-idle
like active, but no writes have been seen for a while (safe_mode_delay).
bitmap/location
This indicates where the write-intent bitmap for the array is
stored.
It can be one of ``none``, ``file`` or ``[+-]N``.
``file`` may later be extended to ``file:/file/name``
``[+-]N`` means that many sectors from the start of the metadata.
This is replicated on all devices. For arrays with externally
managed metadata, the offset is from the beginning of the
device.
bitmap/chunksize
The size, in bytes, of the chunk which will be represented by a
single bit. For RAID456, it is a portion of an individual
device. For RAID10, it is a portion of the array. For RAID1, it
is both (they come to the same thing).
bitmap/time_base
The time, in seconds, between looking for bits in the bitmap to
be cleared. In the current implementation, a bit will be cleared
between 2 and 3 times ``time_base`` after all the covered blocks
are known to be in-sync.
bitmap/backlog
When write-mostly devices are active in a RAID1, write requests
to those devices proceed in the background - the filesystem (or
other user of the device) does not have to wait for them.
``backlog`` sets a limit on the number of concurrent background
writes. If there are more than this, new writes will be
synchronous.
bitmap/metadata
This can be either ``internal`` or ``external``.
``internal``
is the default and means the metadata for the bitmap
is stored in the first 256 bytes of the allocated space and is
managed by the md module.
``external``
means that bitmap metadata is managed externally to
the kernel (i.e. by some userspace program)
bitmap/can_clear
This is either ``true`` or ``false``. If ``true``, then bits in the
bitmap will be cleared when the corresponding blocks are thought
to be in-sync. If ``false``, bits will never be cleared.
This is automatically set to ``false`` if a write happens on a
degraded array, or if the array becomes degraded during a write.
When metadata is managed externally, it should be set to true
once the array becomes non-degraded, and this fact has been
recorded in the metadata.
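As a small illustration of the ``new (old)`` convention described under ``reshape_position``, the sketch below splits such a value into its two parts. The helper names are hypothetical and the code works on strings only; reading the real attribute (for example ``/sys/block/md0/md/raid_disks``) is left to the caller.

```shell
# Print the "new" value of an attribute that may read as "new (old)".
md_attr_new() {
    # Keep everything before the first space.
    printf '%s\n' "${1%% *}"
}

# Print the "old" value; falls back to the only value when no
# parenthesised part is present.
md_attr_old() {
    case $1 in
        *"("*")") old=${1#*\(}; printf '%s\n' "${old%\)}" ;;
        *)        printf '%s\n' "$1" ;;
    esac
}

md_attr_new "5 (4)"   # prints 5
md_attr_old "5 (4)"   # prints 4
md_attr_old "4"       # prints 4
```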
As component devices are added to an md array, they appear in the ``md``
directory as new directories named::
dev-XXX
where ``XXX`` is a name that the kernel knows for the device, e.g. hdb1.
Each directory contains:
block
a symlink to the block device in /sys/block, e.g.::
/sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1
super
A file containing an image of the superblock read from, or
written to, that device.
state
A file recording the current state of the device in the array
which can be a comma separated list of:
faulty
device has been kicked from active use due to
a detected fault, or it has unacknowledged bad
blocks
in_sync
device is a fully in-sync member of the array
writemostly
device will only be subject to read
requests if there are no other options.
This applies only to raid1 arrays.
blocked
device has failed, and the failure hasn't been
acknowledged yet by the metadata handler.
Writes that would write to this device if
it were not faulty are blocked.
spare
device is working, but not a full member.
This includes spares that are in the process
of being recovered to
write_error
device has ever seen a write error.
want_replacement
device is (mostly) working but probably
should be replaced, either due to errors or
due to user request.
replacement
device is a replacement for another active
device with same raid_disk.
This list may grow in future.
This can be written to.
Writing ``faulty`` simulates a failure on the device.
Writing ``remove`` removes the device from the array.
Writing ``writemostly`` sets the writemostly flag.
Writing ``-writemostly`` clears the writemostly flag.
Writing ``blocked`` sets the ``blocked`` flag.
Writing ``-blocked`` clears the ``blocked`` flag and allows writes
to complete and possibly simulates an error.
Writing ``in_sync`` sets the in_sync flag.
Writing ``write_error`` sets writeerrorseen flag.
Writing ``-write_error`` clears writeerrorseen flag.
Writing ``want_replacement`` is allowed at any time except to a
replacement device or a spare. It sets the flag.
Writing ``-want_replacement`` is allowed at any time. It clears
the flag.
Writing ``replacement`` or ``-replacement`` is only allowed before
starting the array. It sets or clears the flag.
This file responds to select/poll. Any change to ``faulty``
or ``blocked`` causes an event.
errors
An approximate count of read errors that have been detected on
this device but have not caused the device to be evicted from
the array (either because they were corrected or because they
happened while the array was read-only). When using version-1
metadata, this value persists across restarts of the array.
This value can be written while assembling an array thus
providing an ongoing count for arrays with metadata managed by
userspace.
slot
This gives the role that the device has in the array. It will
either be ``none`` if the device is not active in the array
(i.e. is a spare or has failed) or an integer less than the
``raid_disks`` number for the array indicating which position
it currently fills. This can only be set while assembling an
array. A device for which this is set is assumed to be working.
offset
This gives the location in the device (in sectors from the
start) where data from the array will be stored. Any part of
the device before this offset is not touched, unless it is
used for storing metadata (Formats 1.1 and 1.2).
size
The amount of the device, after the offset, that can be used
for storage of data. This will normally be the same as the
component_size. This can be written while assembling an
array. If a value less than the current component_size is
written, it will be rejected.
recovery_start
When the device is not ``in_sync``, this records the number of
sectors from the start of the device which are known to be
correct. This is normally zero, but during a recovery
operation it will steadily increase, and if the recovery is
interrupted, restoring this value can cause recovery to
avoid repeating the earlier blocks. With v1.x metadata, this
value is saved and restored automatically.
This can be set whenever the device is not an active member of
the array, either before the array is activated, or before
the ``slot`` is set.
Setting this to ``none`` is equivalent to setting ``in_sync``.
Setting to any other value also clears the ``in_sync`` flag.
bad_blocks
This gives the list of all known bad blocks in the form of
start address and length (in sectors respectively). If output
is too big to fit in a page, it will be truncated. Writing
``sector length`` to this file adds new acknowledged (i.e.
recorded to disk safely) bad blocks.
unacknowledged_bad_blocks
This gives the list of known-but-not-yet-saved-to-disk bad
blocks in the same form as ``bad_blocks``. If output is too big
to fit in a page, it will be truncated. Writing to this file
adds bad blocks without acknowledging them. This is largely
for testing.
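To illustrate the ``bad_blocks`` format, the sketch below sums the lengths listed in such a file. It assumes one ``start length`` pair per line, as the description above suggests; ``total_bad_sectors`` is a hypothetical helper name and ``dev-sda1`` in the comment is only an example device.

```shell
# Sum the second column (length in sectors) of "start length" lines read
# from standard input. Prints 0 for empty input.
total_bad_sectors() {
    awk 'NF >= 2 { total += $2 } END { print total + 0 }'
}

# Example with inline data; on a real system the input would come from
# something like /sys/block/md0/md/dev-sda1/bad_blocks.
printf '1024 8\n4096 16\n' | total_bad_sectors   # prints 24
```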
An active md device will also contain an entry for each active device
in the array. These are named::
rdNN
where ``NN`` is the position in the array, starting from 0.
So for a 3 drive array there will be rd0, rd1, rd2.
These are symbolic links to the appropriate ``dev-XXX`` entry.
Thus, for example::
cat /sys/block/md*/md/rd*/state
will show ``in_sync`` on every line.
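Building on that command, the following sketch reports only the active members whose state list lacks ``in_sync``. The flag test is split into a helper (a hypothetical name) so that it can be reused for other flags; the loop does nothing on a system without md arrays.

```shell
# Succeeds if the comma-separated state list $1 contains the flag $2.
has_state_flag() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

for f in /sys/block/md*/md/rd*/state; do
    [ -r "$f" ] || continue          # glob did not match anything
    state=$(cat "$f")
    has_state_flag "$state" in_sync || echo "$f: $state"
done
```

Wrapping the list in commas before matching keeps ``replacement`` from matching inside ``want_replacement``.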
Active md devices for levels that support data redundancy (1,4,5,6,10)
also have
sync_action
a text file that can be used to monitor and control the rebuild
process. It contains one word which can be one of:
resync
redundancy is being recalculated after unclean
shutdown or creation
recover
a hot spare is being built to replace a
failed/missing device
idle
nothing is happening
check
A full check of redundancy was requested and is
happening. This reads all blocks and checks
them. A repair may also happen for some raid
levels.
repair
A full check and repair is happening. This is
similar to ``resync``, but was requested by the
user, and the write-intent bitmap is NOT used to
optimise the process.
This file is writable, and each of the strings that could be
read are meaningful for writing.
``idle`` will stop an active resync/recovery etc. There is no
guarantee that another resync/recovery may not be automatically
started again, though some event will be needed to trigger
this.
``resync`` or ``recover`` can be used to restart the
corresponding operation if it was stopped with ``idle``.
``check`` and ``repair`` will start the appropriate process
providing the current state is ``idle``.
This file responds to select/poll. Any important change in the value
triggers a poll event. Sometimes the value will briefly be
``recover`` if a recovery seems to be needed, but cannot be
achieved. In that case, the transition to ``recover`` isn't
notified, but the transition away is.
degraded
This contains a count of the number of devices by which the
array is degraded. So an optimal array will show ``0``. A
single failed/missing drive will show ``1``, etc.
This file responds to select/poll, any increase or decrease
in the count of missing devices will trigger an event.
mismatch_cnt
When performing ``check`` and ``repair``, and possibly when
performing ``resync``, md will count the number of errors that are
found. The count in ``mismatch_cnt`` is the number of sectors
that were re-written, or (for ``check``) would have been
re-written. As most raid levels work in units of pages rather
than sectors, this may be larger than the number of actual errors
by a factor of the number of sectors in a page.
bitmap_set_bits
If the array has a write-intent bitmap, then writing to this
attribute can set bits in the bitmap, indicating that a resync
would need to check the corresponding blocks. Either individual
numbers or start-end pairs can be written. Multiple numbers
can be separated by a space.
Note that the numbers are ``bit`` numbers, not ``block`` numbers.
They should be scaled by the bitmap_chunksize.
sync_speed_min, sync_speed_max
These are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
however they only apply to the particular array.
If no value has been written to these, or if the word ``system``
is written, then the system-wide value is used. If a value,
in kibibytes-per-second is written, then it is used.
When the files are read, they show the currently active value
followed by ``(local)`` or ``(system)`` depending on whether it is
a locally set or system-wide value.
sync_completed
This shows the number of sectors that have been completed of
whatever the current sync_action is, followed by the number of
sectors in total that could need to be processed. The two
numbers are separated by a ``/`` thus effectively showing one
value, a fraction of the process that is complete.
A ``select`` on this attribute will return when resync completes,
when it reaches the current sync_max (below) and possibly at
other times.
sync_speed
This shows the current actual speed, in K/sec, of the current
sync_action. It is averaged over the last 30 seconds.
suspend_lo, suspend_hi
The two values, given as numbers of sectors, indicate a range
within the array where IO will be blocked. This is currently
only supported for raid4/5/6.
sync_min, sync_max
The two values, given as numbers of sectors, indicate a range
within the array where ``check``/``repair`` will operate. Must be
a multiple of chunk_size. When it reaches ``sync_max`` it will
pause, rather than complete.
You can use ``select`` or ``poll`` on ``sync_completed`` to wait for
that number to reach sync_max. Then you can either increase
``sync_max``, or can write ``idle`` to ``sync_action``.
The value of ``max`` for ``sync_max`` effectively disables the limit.
When a resync is active, the value can only ever be increased,
never decreased.
The value of ``0`` is the minimum for ``sync_min``.
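The two numbers in ``sync_completed`` can be turned into a percentage with a few lines of shell. This sketch assumes the value reads as ``done / total`` and treats anything else (such as ``none`` when nothing is running, which is an assumption about idle arrays) as having no progress to report. ``sync_percent`` is a hypothetical name.

```shell
# Print the completed percentage for a "done / total" string, or "n/a".
sync_percent() {
    printf '%s\n' "$1" | awk -F'/' '
        NF == 2 && $2 + 0 > 0 { printf "%.1f\n", 100 * $1 / $2; next }
        { print "n/a" }'
}

sync_percent "2500 / 10000"   # prints 25.0
sync_percent "none"           # prints n/a
```

On a live array the argument would be the content of ``sync_completed``, for example ``sync_percent "$(cat /sys/block/md0/md/sync_completed)"``, where ``md0`` is only an example device.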
Each active md device may also have attributes specific to the
personality module that manages it.
These are specific to the implementation of the module and could
change substantially if the implementation changes.
These currently include:
stripe_cache_size (currently raid5 only)
number of entries in the stripe cache. This is writable, but
there are upper and lower limits (32768, 17). Default is 256.
stripe_cache_active (currently raid5 only)
number of active entries in the stripe cache
preread_bypass_threshold (currently raid5 only)
number of times a stripe requiring preread will be bypassed by
a stripe that does not require preread. For fairness defaults
to 1. Setting this to 0 disables bypass accounting and
requires preread stripes to wait until all full-width stripe-
writes are complete. Valid values are 0 to stripe_cache_size.
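Since ``stripe_cache_size`` only accepts values within limits, a script may want to validate a value before writing it. The sketch below hard-codes the limits quoted above (17 and 32768), which may differ on other kernel versions; the function name is hypothetical and ``md0`` is only an example device.

```shell
# Succeeds if $1 is a plain integer within the documented limits.
valid_stripe_cache_size() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # empty or not a plain number
    esac
    [ "$1" -ge 17 ] && [ "$1" -le 32768 ]
}

size=4096
target=/sys/block/md0/md/stripe_cache_size
if valid_stripe_cache_size "$size"; then
    # Print the command rather than run it, so the sketch has no side effects.
    echo "echo $size > $target"
fi
```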


@@ -0,0 +1,285 @@
Kernel module signing facility
------------------------------
.. CONTENTS
..
.. - Overview.
.. - Configuring module signing.
.. - Generating signing keys.
.. - Public keys in the kernel.
.. - Manually signing modules.
.. - Signed modules and stripping.
.. - Loading signed modules.
.. - Non-valid signatures and unsigned modules.
.. - Administering/protecting the private key.
========
Overview
========
The kernel module signing facility cryptographically signs modules during
installation and then checks the signature upon loading the module. This
allows increased kernel security by disallowing the loading of unsigned modules
or modules signed with an invalid key. Module signing increases security by
making it harder to load a malicious module into the kernel. The module
signature checking is done by the kernel so that it is not necessary to have
trusted userspace bits.
This facility uses X.509 ITU-T standard certificates to encode the public keys
involved. The signatures are not themselves encoded in any industrial standard
type. The facility currently only supports the RSA public key encryption
standard (though it is pluggable and permits others to be used). The possible
hash algorithms that can be used are SHA-1, SHA-224, SHA-256, SHA-384, and
SHA-512 (the algorithm is selected by data in the signature).
==========================
Configuring module signing
==========================
The module signing facility is enabled by going to the
:menuselection:`Enable Loadable Module Support` section of
the kernel configuration and turning on::
CONFIG_MODULE_SIG "Module signature verification"
This has a number of options available:
(1) :menuselection:`Require modules to be validly signed`
(``CONFIG_MODULE_SIG_FORCE``)
This specifies how the kernel should deal with a module that has a
signature for which the key is not known or a module that is unsigned.
If this is off (ie. "permissive"), then modules for which the key is not
available and modules that are unsigned are permitted, but the kernel will
be marked as being tainted, and the concerned modules will be marked as
tainted, shown with the character 'E'.
If this is on (ie. "restrictive"), only modules that have a valid
signature that can be verified by a public key in the kernel's possession
will be loaded. All other modules will generate an error.
Irrespective of the setting here, if the module has a signature block that
cannot be parsed, it will be rejected out of hand.
(2) :menuselection:`Automatically sign all modules`
(``CONFIG_MODULE_SIG_ALL``)
If this is on then modules will be automatically signed during the
modules_install phase of a build. If this is off, then the modules must
be signed manually using::
scripts/sign-file
(3) :menuselection:`Which hash algorithm should modules be signed with?`
This presents a choice of which hash algorithm the installation phase will
sign the modules with:
=============================== ==========================================
``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1`
``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224`
``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
=============================== ==========================================
The algorithm selected here will also be built into the kernel (rather
than being a module) so that modules signed with that algorithm can have
their signatures checked without causing a dependency loop.
(4) :menuselection:`File name or PKCS#11 URI of module signing key`
(``CONFIG_MODULE_SIG_KEY``)
Setting this option to something other than its default of
``certs/signing_key.pem`` will disable the autogeneration of signing keys
and allow the kernel modules to be signed with a key of your choosing.
The string provided should identify a file containing both a private key
and its corresponding X.509 certificate in PEM form, or — on systems where
the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI as defined by
RFC7512. In the latter case, the PKCS#11 URI should reference both a
certificate and a private key.
If the PEM file containing the private key is encrypted, or if the
PKCS#11 token requires a PIN, this can be provided at build time by
means of the ``KBUILD_SIGN_PIN`` variable.
(5) :menuselection:`Additional X.509 keys for default system keyring`
(``CONFIG_SYSTEM_TRUSTED_KEYS``)
This option can be set to the filename of a PEM-encoded file containing
additional certificates which will be included in the system keyring by
default.
Note that enabling module signing adds a dependency on the OpenSSL devel
packages to the kernel build processes for the tool that does the signing.
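To see how a given build is configured, the options listed above can be pulled out of the kernel ``.config``. A minimal sketch; the function name is hypothetical and the path to ``.config`` is whatever your build tree uses.

```shell
# Print every module-signing related option that is set in a config file.
# Lines such as "# CONFIG_MODULE_SIG_FORCE is not set" are skipped because
# they do not start with CONFIG_.
module_sig_settings() {
    grep -E '^CONFIG_(MODULE_SIG|SYSTEM_TRUSTED_KEYS)' "$1"
}

# Example: run from the top of a configured kernel tree.
if [ -r .config ]; then
    module_sig_settings .config || echo "module signing is not configured"
fi
```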
=======================
Generating signing keys
=======================
Cryptographic keypairs are required to generate and check signatures. A
private key is used to generate a signature and the corresponding public key is
used to check it. The private key is only needed during the build, after which
it can be deleted or stored securely. The public key gets built into the
kernel so that it can be used to check the signatures as the modules are
loaded.
Under normal conditions, when ``CONFIG_MODULE_SIG_KEY`` is unchanged from its
default, the kernel build will automatically generate a new keypair using
openssl if one does not exist in the file::
certs/signing_key.pem
during the building of vmlinux (the public part of the key needs to be built
into vmlinux) using parameters in the::
certs/x509.genkey
file (which is also generated if it does not already exist).
It is strongly recommended that you provide your own x509.genkey file.
Most notably, in the x509.genkey file, the req_distinguished_name section
should be altered from the default::
[ req_distinguished_name ]
#O = Unspecified company
CN = Build time autogenerated kernel key
#emailAddress = unspecified.user@unspecified.company
The generated RSA key size can also be set with::
[ req ]
default_bits = 4096
It is also possible to manually generate the key private/public files using the
x509.genkey key generation configuration file in the root node of the Linux
kernel sources tree and the openssl command. The following is an example to
generate the public/private key files::
openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
-config x509.genkey -outform PEM -out kernel_key.pem \
-keyout kernel_key.pem
The full pathname for the resulting kernel_key.pem file can then be specified
in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key therein will
be used instead of an autogenerated keypair.
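Because ``CONFIG_MODULE_SIG_KEY`` expects a single PEM file holding both the private key and the certificate, it can be worth checking the file before starting a build. The sketch below only looks for the PEM armour lines and does not validate the key material; ``pem_has_key_and_cert`` is a hypothetical helper.

```shell
# Succeeds if the PEM file $1 contains both a certificate and a private key.
pem_has_key_and_cert() {
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$1"
}

if [ -r kernel_key.pem ]; then
    pem_has_key_and_cert kernel_key.pem || echo "kernel_key.pem is incomplete"
fi
```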
=========================
Public keys in the kernel
=========================
The kernel contains a ring of public keys that can be viewed by root. They're
in a keyring called ".system_keyring" that can be seen by::
[root@deneb ~]# cat /proc/keys
...
223c7853 I------ 1 perm 1f030000 0 0 keyring .system_keyring: 1
302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
...
Beyond the public key generated specifically for module signing, additional
trusted certificates can be provided in a PEM-encoded file referenced by the
``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option.
Further, the architecture code may take public keys from a hardware store and
add those in also (e.g. from the UEFI key database).
Finally, it is possible to add additional public keys by doing::
keyctl padd asymmetric "" [.system_keyring-ID] <[key-file]
e.g.::
keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
Note, however, that the kernel will only permit keys to be added to
``.system_keyring`` *if* the new key's X.509 wrapper is validly signed by a key
that is already resident in the ``.system_keyring`` at the time the key was added.
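The keyring ID needed by ``keyctl padd`` can be extracted from ``/proc/keys`` rather than copied by hand. A sketch, assuming the column layout shown in the listing above (ID in the first field, type in the eighth, description in the ninth); the function name is hypothetical.

```shell
# Read /proc/keys-style lines on stdin and print the ID of .system_keyring.
system_keyring_id() {
    awk '$8 == "keyring" && $9 == ".system_keyring:" { print $1; exit }'
}

# On a live system (as root) this could be used as:
#   id=$(system_keyring_id < /proc/keys)
#   keyctl padd asymmetric "" "0x$id" < my_public_key.x509
sample='223c7853 I------     1 perm 1f030000     0     0 keyring   .system_keyring: 1'
printf '%s\n' "$sample" | system_keyring_id   # prints 223c7853
```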
========================
Manually signing modules
========================
To manually sign a module, use the scripts/sign-file tool available in
the Linux kernel source tree. The script requires 4 arguments:
1. The hash algorithm (e.g., sha256)
2. The private key filename or PKCS#11 URI
3. The public key filename
4. The kernel module to be signed
The following is an example to sign a kernel module::
scripts/sign-file sha512 kernel-signkey.priv \
kernel-signkey.x509 module.ko
The hash algorithm used does not have to match the one configured, but if it
doesn't, you should make sure that hash algorithm is either built into the
kernel or can be loaded without requiring itself.
If the private key requires a passphrase or PIN, it can be provided in the
$KBUILD_SIGN_PIN environment variable.
============================
Signed modules and stripping
============================
A signed module has a digital signature simply appended at the end. The string
``~Module signature appended~.`` at the end of the module's file confirms that a
signature is present but it does not confirm that the signature is valid!
Signed modules are BRITTLE as the signature is outside of the defined ELF
container. Thus they MAY NOT be stripped once the signature is computed and
attached. Note the entire module is the signed payload, including any and all
debug information present at the time of signing.
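A quick way to see whether a module file carries the marker is to look at its final bytes. The sketch below assumes the marker is the 27-character string shown above followed by a newline (28 bytes in total); as noted, finding it says nothing about whether the signature is valid. ``has_signature_marker`` is a hypothetical helper name.

```shell
# Succeeds if the file $1 ends with the module signature marker.
# This shows that a signature is present, not that it is valid.
has_signature_marker() {
    tail -c 28 "$1" | LC_ALL=C grep -q '^~Module signature appended~$'
}

if [ -r module.ko ]; then
    if has_signature_marker module.ko; then
        echo "module.ko carries a signature"
    else
        echo "module.ko is unsigned or was stripped after signing"
    fi
fi
```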
======================
Loading signed modules
======================
Modules are loaded with insmod, modprobe, ``init_module()`` or
``finit_module()``, exactly as for unsigned modules as no processing is
done in userspace. The signature checking is all done within the kernel.
=========================================
Non-valid signatures and unsigned modules
=========================================
If ``CONFIG_MODULE_SIG_FORCE`` is enabled or module.sig_enforce=1 is supplied on
the kernel command line, the kernel will only load validly signed modules
for which it has a public key. Otherwise, it will also load modules that are
unsigned. Any module for which the kernel has a key, but which proves to have
a signature mismatch will not be permitted to load.
Any module that has an unparseable signature will be rejected.
=========================================
Administering/protecting the private key
=========================================
Since the private key is used to sign modules, viruses and malware could use
the private key to sign modules and compromise the operating system. The
private key must be either destroyed or moved to a secure location and not kept
in the root node of the kernel source tree.
If you use the same private key to sign modules for multiple kernel
configurations, you must ensure that the module version information is
sufficient to prevent loading a module into a different kernel. Either
set ``CONFIG_MODVERSIONS=y`` or ensure that each configuration has a different
kernel release string by changing ``EXTRAVERSION`` or ``CONFIG_LOCALVERSION``.


@@ -0,0 +1,70 @@
Mono(tm) Binary Kernel Support for Linux
-----------------------------------------
To configure Linux to automatically execute Mono-based .NET binaries
(in the form of .exe files) without the need to use the mono CLR
wrapper, you can use the BINFMT_MISC kernel support.
This will allow you to execute Mono-based .NET binaries just like any
other program after you have done the following:
1) You MUST FIRST install the Mono CLR support, either by downloading
a binary package, a source tarball or by installing from CVS. Binary
packages for several distributions can be found at:
http://go-mono.com/download.html
Instructions for compiling Mono can be found at:
http://www.go-mono.com/compiling.html
Once the Mono CLR support has been installed, just check that
``/usr/bin/mono`` (which could be located elsewhere, for example
``/usr/local/bin/mono``) is working.
2) You have to compile BINFMT_MISC either as a module or into
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
If you choose to compile it as a module, you will have
to insert it manually with modprobe/insmod, as kmod
cannot be easily supported with binfmt_misc.
Read the file ``binfmt_misc.txt`` in this directory to know
more about the configuration process.
3) Add the following entries to ``/etc/rc.local`` or similar script
to be run at system startup:
.. code-block:: sh

    # Insert BINFMT_MISC module into the kernel
    if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
        /sbin/modprobe binfmt_misc
        # Some distributions, like Fedora Core, perform
        # the following command automatically when the
        # binfmt_misc module is loaded into the kernel
        # or during normal boot up (systemd-based systems).
        # Thus, it is possible that the following line
        # is not needed at all.
        mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
    fi

    # Register support for .NET CLR binaries
    if [ -e /proc/sys/fs/binfmt_misc/register ]; then
        # Replace /usr/bin/mono with the correct pathname to
        # the Mono CLR runtime (usually /usr/local/bin/mono
        # when compiling from sources or CVS).
        echo ':CLR:M::MZ::/usr/bin/mono:' > /proc/sys/fs/binfmt_misc/register
    else
        echo "No binfmt_misc support"
        exit 1
    fi
4) Check that ``.exe`` binaries can be run without the need of a
wrapper script, simply by launching the ``.exe`` file directly
from a command prompt, for example::
/usr/bin/xsd.exe
.. note::
If this fails with a permission denied error, check
that the ``.exe`` file has execute permissions.
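The registration string ``:CLR:M::MZ::/usr/bin/mono:`` uses the binfmt_misc fields name (``CLR``), type (``M``, match by magic), an empty offset (start of file), the magic bytes ``MZ``, an empty mask, the interpreter path, and empty flags. A minimal sketch of the same test in userspace, useful when a file does not launch as expected, might look like this. The helper name ``starts_with_mz`` is an assumption for illustration, and note that any file starting with ``MZ`` matches, not only Mono binaries.

```shell
#!/bin/sh
# Sketch: check whether a file begins with the two bytes "MZ", which is
# what the ':CLR:M::MZ::/usr/bin/mono:' rule matches at offset 0.
# starts_with_mz is a hypothetical helper name used for illustration.
starts_with_mz() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "MZ" ]
}
```

A call such as ``starts_with_mz /usr/bin/xsd.exe && echo "matches the CLR rule"`` shows whether the kernel would hand that file to the registered interpreter.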

View File

@@ -0,0 +1,286 @@
Parport
+++++++
The ``parport`` code provides parallel-port support under Linux. This
includes the ability to share one port between multiple device
drivers.
You can pass parameters to the ``parport`` code to override its automatic
detection of your hardware. This is particularly useful if you want
to use IRQs, since in general these can't be autoprobed successfully.
By default IRQs are not used even if they **can** be probed. This is
because there are a lot of people using the same IRQ for their
parallel port and a sound card or network card.
The ``parport`` code is split into two parts: generic (which deals with
port-sharing) and architecture-dependent (which deals with actually
using the port).
Parport as modules
==================
If you load the ``parport`` code as a module, say::
# insmod parport
to load the generic ``parport`` code. You then must load the
architecture-dependent code with (for example)::
# insmod parport_pc io=0x3bc,0x378,0x278 irq=none,7,auto
to tell the ``parport`` code that you want three PC-style ports, one at
0x3bc with no IRQ, one at 0x378 using IRQ 7, and one at 0x278 with an
auto-detected IRQ. Currently, PC-style (``parport_pc``), Sun ``bpp``,
Amiga, Atari, and MFC3 hardware is supported.
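The ``io=`` and ``irq=`` lists are positional: the Nth address pairs with the Nth IRQ setting. As a small sketch of that pairing, the hypothetical helper below (``parport_pc_opts`` is not part of any tool) builds both lists from ``address:irq`` pairs so the two cannot drift out of step. It only prints the option string and loads nothing.

```shell
#!/bin/sh
# Sketch: build matching io= and irq= lists for parport_pc from
# "address:irq" pairs. parport_pc_opts is a hypothetical helper name.
parport_pc_opts() {
    io=
    irq=
    for pair in "$@"; do
        # Text before the first colon is the base address,
        # text after it is the IRQ setting (a number, none, or auto).
        io="${io:+$io,}${pair%%:*}"
        irq="${irq:+$irq,}${pair#*:}"
    done
    printf 'io=%s irq=%s\n' "$io" "$irq"
}
```

Calling ``parport_pc_opts 0x3bc:none 0x378:7 0x278:auto`` prints ``io=0x3bc,0x378,0x278 irq=none,7,auto``, the same options as in the example above.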
PCI parallel I/O card support comes from ``parport_pc``. Base I/O
addresses should not be specified for supported PCI cards since they
are automatically detected.
modprobe
--------
If you use modprobe, you will find it useful to add lines like those below to a
configuration file in the ``/etc/modprobe.d/`` directory::
alias parport_lowlevel parport_pc
options parport_pc io=0x378,0x278 irq=7,auto
modprobe will load ``parport_pc`` (with the options ``io=0x378,0x278 irq=7,auto``)
whenever a parallel port device driver (such as ``lp``) is loaded.
Note that these are example lines only! You shouldn't in general need
to specify any options to ``parport_pc`` in order to be able to use a
parallel port.
Parport probe [optional]
------------------------
In 2.2 kernels there was a module called ``parport_probe``, which was used
for collecting IEEE 1284 device ID information. This has now been
enhanced and now lives with the IEEE 1284 support. When a parallel
port is detected, the devices that are connected to it are analysed,
and information is logged like this::
parport0: Printer, BJC-210 (Canon)
The probe information is available from files in ``/proc/sys/dev/parport/``.
Parport linked into the kernel statically
=========================================
If you compile the ``parport`` code into the kernel, then you can use
kernel boot parameters to get the same effect. Add something like the
following to your LILO command line::
parport=0x3bc parport=0x378,7 parport=0x278,auto,nofifo
You can have many ``parport=...`` statements, one for each port you want
to add. Adding ``parport=0`` to the kernel command-line will disable
parport support entirely. Adding ``parport=auto`` to the kernel
command-line will make ``parport`` use any IRQ lines or DMA channels that
it auto-detects.
Files in /proc
==============
If you have configured the ``/proc`` filesystem into your kernel, you will
see a new directory entry: ``/proc/sys/dev/parport``. In there will be a
directory entry for each parallel port for which parport is
configured. In each of those directories is a collection of files
describing that parallel port.
The ``/proc/sys/dev/parport`` directory tree looks like::
    parport
    |-- default
    |   |-- spintime
    |   `-- timeslice
    |-- parport0
    |   |-- autoprobe
    |   |-- autoprobe0
    |   |-- autoprobe1
    |   |-- autoprobe2
    |   |-- autoprobe3
    |   |-- devices
    |   |   |-- active
    |   |   `-- lp
    |   |       `-- timeslice
    |   |-- base-addr
    |   |-- irq
    |   |-- dma
    |   |-- modes
    |   `-- spintime
    `-- parport1
        |-- autoprobe
        |-- autoprobe0
        |-- autoprobe1
        |-- autoprobe2
        |-- autoprobe3
        |-- devices
        |   |-- active
        |   `-- ppa
        |       `-- timeslice
        |-- base-addr
        |-- irq
        |-- dma
        |-- modes
        `-- spintime
.. tabularcolumns:: |p{4.0cm}|p{13.5cm}|
======================= =======================================================
File                    Contents
======================= =======================================================
``devices/active``      A list of the device drivers using that port. A "+"
                        will appear by the name of the device currently using
                        the port (it might not appear against any). The
                        string "none" means that there are no device drivers
                        using that port.

``base-addr``           Parallel port's base address, or addresses if the port
                        has more than one in which case they are separated
                        with tabs. These values might not have any sensible
                        meaning for some ports.

``irq``                 Parallel port's IRQ, or -1 if none is being used.

``dma``                 Parallel port's DMA channel, or -1 if none is being
                        used.

``modes``               Parallel port's hardware modes, comma-separated,
                        meaning:

                        - PCSPP
                                PC-style SPP registers are available.

                        - TRISTATE
                                Port is bidirectional.

                        - COMPAT
                                Hardware acceleration for printers is
                                available and will be used.

                        - EPP
                                Hardware acceleration for EPP protocol
                                is available and will be used.

                        - ECP
                                Hardware acceleration for ECP protocol
                                is available and will be used.

                        - DMA
                                DMA is available and will be used.

                        Note that the current implementation will only take
                        advantage of COMPAT and ECP modes if it has an IRQ
                        line to use.

``autoprobe``           Any IEEE-1284 device ID information that has been
                        acquired from the (non-IEEE 1284.3) device.

``autoprobe[0-3]``      IEEE 1284 device ID information retrieved from
                        daisy-chain devices that conform to IEEE 1284.3.

``spintime``            The number of microseconds to busy-loop while waiting
                        for the peripheral to respond. You might find that
                        adjusting this improves performance, depending on your
                        peripherals. This is a port-wide setting, i.e. it
                        applies to all devices on a particular port.

``timeslice``           The number of milliseconds that a device driver is
                        allowed to keep a port claimed for. This is advisory,
                        and a driver can ignore it if it must.

``default/*``           The defaults for spintime and timeslice. When a new
                        port is registered, it picks up the default spintime.
                        When a new device is registered, it picks up the
                        default timeslice.
======================= =======================================================
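The per-port files described in the table can be read with ordinary shell tools. The following is a minimal sketch, assuming the directory layout shown earlier; the function name ``parport_summary`` and its optional root-directory argument are inventions for this example (the argument exists so the function can be tried against a copy of the tree).

```shell
#!/bin/sh
# Sketch: print one summary line per port from the parport sysctl tree.
# parport_summary is a hypothetical helper; the optional first argument
# overrides the default root, /proc/sys/dev/parport.
parport_summary() {
    root=${1:-/proc/sys/dev/parport}
    for dir in "$root"/parport[0-9]*; do
        [ -d "$dir" ] || continue
        # base-addr may hold several tab-separated values.
        base=$(cat "$dir/base-addr" 2>/dev/null | tr '\t' ' ')
        irq=$(cat "$dir/irq" 2>/dev/null)
        dma=$(cat "$dir/dma" 2>/dev/null)
        modes=$(cat "$dir/modes" 2>/dev/null)
        # A value of -1 means no IRQ or DMA channel is in use.
        if [ "$irq" = "-1" ]; then irq=none; fi
        if [ "$dma" = "-1" ]; then dma=none; fi
        printf '%s: base=%s irq=%s dma=%s modes=%s\n' \
            "${dir##*/}" "$base" "$irq" "$dma" "$modes"
    done
}
```

Running ``parport_summary`` with no argument reads the live tree. A port reporting ``irq=none`` is a hint that COMPAT and ECP acceleration will not be used, per the note in the table.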
Device drivers
==============
Once the parport code is initialised, you can attach device drivers to
specific ports. Normally this happens automatically; if the lp driver
is loaded it will create one lp device for each port found. You can
override this, though, by using parameters either when you load the lp
driver::
# insmod lp parport=0,2
or on the LILO command line::
lp=parport0 lp=parport2
Both the above examples would inform lp that you want ``/dev/lp0`` to be
the first parallel port, and ``/dev/lp1`` to be the **third** parallel port,
with no lp device associated with the second port (parport1). Note
that this is different to the way older kernels worked; there used to
be a static association between the I/O port address and the device
name, so ``/dev/lp0`` was always the port at 0x3bc. This is no longer the
case - if you only have one port, it will default to being ``/dev/lp0``,
regardless of base address.
Also:
* If you selected the IEEE 1284 support at compile time, you can say
``lp=auto`` on the kernel command line, and lp will create devices
only for those ports that seem to have printers attached.
* If you give PLIP the ``timid`` parameter, either with ``plip=timid`` on
the command line, or with ``insmod plip timid=1`` when using modules,
it will avoid any ports that seem to be in use by other devices.
* IRQ autoprobing works only for a few port types at the moment.
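To make the ``parport=0,2`` mapping concrete: lp devices are numbered in the order the ports are listed, not by port number. The sketch below prints that mapping for a comma-separated list. The helper name ``lp_mapping`` is hypothetical, and the sketch assumes the list contains plain port numbers only.

```shell
#!/bin/sh
# Sketch: show which /dev/lpN each listed parport number would become.
# lp_mapping is a hypothetical helper; it expects plain numbers such as
# "0,2" and prints the resulting assignment without loading anything.
lp_mapping() {
    n=0
    old_ifs=$IFS
    IFS=,
    # Unquoted on purpose: split the argument on commas.
    for port in $1; do
        printf '/dev/lp%d -> parport%s\n' "$n" "$port"
        n=$((n + 1))
    done
    IFS=$old_ifs
}
```

``lp_mapping 0,2`` prints ``/dev/lp0 -> parport0`` followed by ``/dev/lp1 -> parport2``, matching the description of the ``insmod lp parport=0,2`` example.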
Reporting printer problems with parport
=======================================
If you are having problems printing, please go through these steps to
try to narrow down where the problem area is.
When reporting problems with parport, really you need to give all of
the messages that ``parport_pc`` spits out when it initialises. There are
several code paths:
- polling
- interrupt-driven, protocol in software
- interrupt-driven, protocol in hardware using PIO
- interrupt-driven, protocol in hardware using DMA
The kernel messages that ``parport_pc`` logs give an indication of which
code path is being used. (They could be a lot better actually..)
For normal printer protocol, having IEEE 1284 modes enabled or not
should not make a difference.
To turn off the 'protocol in hardware' code paths, disable
``CONFIG_PARPORT_PC_FIFO``. Note that when they are enabled they are not
necessarily **used**; it depends on whether the hardware is available,
enabled by the BIOS, and detected by the driver.
So, to start with, disable ``CONFIG_PARPORT_PC_FIFO``, and load ``parport_pc``
with ``irq=none``. See if printing works then. It really should,
because this is the simplest code path.
If that works fine, try with ``io=0x378 irq=7`` (adjust for your
hardware), to make it use interrupt-driven in-software protocol.
If **that** works fine, then one of the hardware modes isn't working
right. Enable ``CONFIG_PARPORT_PC_FIFO`` (no, it isn't a module option,
and yes, it should be), set the port to ECP mode in the BIOS and note
the DMA channel, and try with::
io=0x378 irq=7 dma=none (for PIO)
io=0x378 irq=7 dma=3 (for DMA)
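The narrowing-down procedure above amounts to four option sets tried in order. As a sketch, the hypothetical helper below (``parport_debug_opts`` is a made-up name) returns the ``parport_pc`` options for each stage, using the example address, IRQ, and DMA channel from the text; adjust them for your hardware. Stages 3 and 4 only make sense on a kernel built with ``CONFIG_PARPORT_PC_FIFO``.

```shell
#!/bin/sh
# Sketch: parport_pc options for each debugging stage described above.
# parport_debug_opts is a hypothetical helper; the values 0x378, 7 and 3
# are the example settings from the text, not detected ones.
parport_debug_opts() {
    case "$1" in
        1) echo "irq=none" ;;                # polling
        2) echo "io=0x378 irq=7" ;;          # interrupt-driven, software protocol
        3) echo "io=0x378 irq=7 dma=none" ;; # hardware protocol using PIO
        4) echo "io=0x378 irq=7 dma=3" ;;    # hardware protocol using DMA
        *) echo "unknown stage: $1" >&2; return 1 ;;
    esac
}
```

One way to use it is ``modprobe parport_pc $(parport_debug_opts 2)``, moving to the next stage only after printing works at the current one.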
----------
philb@gnu.org
tim@cyberelk.net

Some files were not shown because too many files have changed in this diff