Merge branch 'mauro' into docs-mw

Mauro says:

This changeset contains the kernel-doc.py script, which replaces the venerable
kernel-doc originally written in Perl. It supersedes both the first version and
the second series I sent on top of it.

In the first patch introducing kernel-doc.py, I tried to stay as close as
possible to the original Perl implementation, as this makes it easier to
double-check that each function was properly translated to Python. This has
been helpful for debugging problems that came up during the conversion.

I worked hard to make it bug-compatible with the original. Still, its
output differs from the original in a few ways:

- Tab expansion works better in the Python script. As a result, some
  outputs that contain tabs in kernel-doc markups are now different;

- The new script is better at stripping blank lines, so a couple of
  empty lines that the Perl script emitted are now stripped;

- kernel-doc has buggy logic for stripping empty description and
  return sections. I was not able to replicate its exact behavior, so I
  ended up adding extra logic that strips empty sections with a different
  algorithm.

Still, in my tests, the results are compatible with the venerable script's
output for all .. kernel-doc tags found in Documentation/. I double-checked
this by adding support for printing the kernel-doc commands when V=1, and
then ran a diff between the outputs of kernel-doc.pl and kernel-doc.py for
the same command lines.
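The comparison described above boils down to running both scripts with identical arguments and diffing their standard output. A minimal sketch of the diffing step using Python's `difflib`, assuming the two outputs were already captured as strings; `compare_outputs` is a hypothetical helper and the sample strings are made up:

```python
import difflib


def compare_outputs(old_out, new_out,
                    old_name="kernel-doc.pl", new_name="kernel-doc.py"):
    """Return a unified diff of two captured outputs ('' when identical)."""
    return "".join(difflib.unified_diff(
        old_out.splitlines(keepends=True),
        new_out.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


# Made-up sample outputs: the second one lacks a trailing blank line.
perl_out = ".. c:function:: int foo (void)\n\n\n"
python_out = ".. c:function:: int foo (void)\n\n"

print(compare_outputs(perl_out, perl_out) == "")  # True: identical outputs
print(compare_outputs(perl_out, python_out))      # shows one removed blank line
```

In practice each string would come from something like `subprocess.run(cmd, capture_output=True, text=True).stdout`, run once per script with the same arguments.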

The only related patch not included in this series is the one dropping
kernel-doc.pl. I opted to keep the Perl script for now, as it can help to
better test the new tool.

With these changes, building the docs with the old script only requires
setting the KERNELDOC parameter, e.g.:

	$ make KERNELDOC=scripts/kernel-doc.pl htmldocs

Jonathan Corbet
2025-04-09 12:24:51 -06:00
11 changed files with 5868 additions and 2441 deletions

.pylintrc (normal file, 2 changed lines)

@@ -0,0 +1,2 @@
[MASTER]
init-hook='import sys; sys.path += ["scripts/lib/kdoc", "scripts/lib/abi"]'
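The `init-hook` above only extends `sys.path` so that pylint can resolve bare imports such as `from kdoc_files import KernelFiles`; kernel-doc.py does the same at runtime with `sys.path.insert()`. A self-contained sketch of that mechanism, using a throwaway directory and a hypothetical module name `kdoc_demo`:

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway "lib/kdoc"-style directory holding one module
    # (kdoc_demo is a hypothetical name used only for this sketch).
    libdir = os.path.join(tmp, "lib", "kdoc")
    os.makedirs(libdir)
    with open(os.path.join(libdir, "kdoc_demo.py"), "w", encoding="utf-8") as f:
        f.write("VALUE = 42\n")

    # Same idea as the init-hook: extend sys.path, then import by bare name.
    sys.path += [libdir]
    importlib.invalidate_caches()
    mod = importlib.import_module("kdoc_demo")
    print(mod.VALUE)  # 42
```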


@@ -60,7 +60,7 @@ endif #HAVE_LATEXMK
 # Internal variables.
 PAPEROPT_a4 = -D latex_paper_size=a4
 PAPEROPT_letter = -D latex_paper_size=letter
-KERNELDOC = $(srctree)/scripts/kernel-doc
+KERNELDOC = $(srctree)/scripts/kernel-doc.py
 KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
 ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
 ifneq ($(wildcard $(srctree)/.config),)
ifneq ($(wildcard $(srctree)/.config),)


@@ -540,7 +540,7 @@ pdf_documents = [
 # kernel-doc extension configuration for running Sphinx directly (e.g. by Read
 # the Docs). In a normal build, these are supplied from the Makefile via command
 # line arguments.
-kerneldoc_bin = '../scripts/kernel-doc'
+kerneldoc_bin = '../scripts/kernel-doc.py'
 kerneldoc_srctree = '..'
 # ------------------------------------------------------------------------------


@@ -43,6 +43,29 @@ from sphinx.util import logging
 __version__ = '1.0'
+def cmd_str(cmd):
+    """
+    Helper function to output a command line that can be used to produce
+    the same records via command line. Helpful to debug troubles at the
+    script.
+    """
+    cmd_line = ""
+    for w in cmd:
+        if w == "" or " " in w:
+            esc_cmd = "'" + w + "'"
+        else:
+            esc_cmd = w
+        if cmd_line:
+            cmd_line += " " + esc_cmd
+            continue
+        else:
+            cmd_line = esc_cmd
+    return cmd_line
 class KernelDocDirective(Directive):
     """Extract kernel-doc comments from the specified file"""
     required_argument = 1
@@ -57,6 +80,7 @@ class KernelDocDirective(Directive):
     }
     has_content = False
     logger = logging.getLogger('kerneldoc')
+    verbose = 0
     def run(self):
         env = self.state.document.settings.env
@@ -65,6 +89,13 @@ class KernelDocDirective(Directive):
         filename = env.config.kerneldoc_srctree + '/' + self.arguments[0]
         export_file_patterns = []
+        verbose = os.environ.get("V")
+        if verbose:
+            try:
+                self.verbose = int(verbose)
+            except ValueError:
+                pass
         # Tell sphinx of the dependency
         env.note_dependency(os.path.abspath(filename))
@@ -87,6 +118,10 @@ class KernelDocDirective(Directive):
             identifiers = self.options.get('identifiers').split()
             if identifiers:
                 for i in identifiers:
+                    i = i.rstrip("\\").strip()
+                    if not i:
+                        continue
                     cmd += ['-function', i]
             else:
                 cmd += ['-no-doc-sections']
@@ -95,15 +130,26 @@ class KernelDocDirective(Directive):
             no_identifiers = self.options.get('no-identifiers').split()
             if no_identifiers:
                 for i in no_identifiers:
+                    i = i.rstrip("\\").strip()
+                    if not i:
+                        continue
                     cmd += ['-nosymbol', i]
         for pattern in export_file_patterns:
+            pattern = pattern.rstrip("\\").strip()
+            if not pattern:
+                continue
             for f in glob.glob(env.config.kerneldoc_srctree + '/' + pattern):
                 env.note_dependency(os.path.abspath(f))
                 cmd += ['-export-file', f]
         cmd += [filename]
+        if self.verbose >= 1:
+            print(cmd_str(cmd))
         try:
             self.logger.verbose("calling kernel-doc '%s'" % (" ".join(cmd)))
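`cmd_str()` quotes only empty words and words containing a space, which covers the identifiers and paths the directive passes to kernel-doc. For comparison, here is a sketch of the same quoting rule next to the standard library's `shlex.join()` (Python 3.8+), which also quotes shell metacharacters; this is an alternative, not what the patch uses:

```python
import shlex


def cmd_str(cmd):
    """Quote only empty words and words containing a space (same rule as the patch)."""
    return " ".join("'" + w + "'" if w == "" or " " in w else w for w in cmd)


cmd = ["scripts/kernel-doc.py", "-rst", "-function", "foo bar", ""]
print(cmd_str(cmd))     # scripts/kernel-doc.py -rst -function 'foo bar' ''
print(shlex.join(cmd))  # scripts/kernel-doc.py -rst -function 'foo bar' ''

# The two differ once a word contains a quote or another shell metacharacter.
tricky = ["kernel-doc", "it's"]
print(cmd_str(tricky))     # kernel-doc it's   (unbalanced quote in a shell)
print(shlex.join(tricky))  # kernel-doc 'it'"'"'s'
```

For a debugging aid printed under V=1 the simpler rule is usually enough; `shlex.join()` would only matter if arguments could contain quotes.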

File diff suppressed because it is too large

scripts/kernel-doc (symbolic link, 1 changed line)

@@ -0,0 +1 @@
kernel-doc.py

scripts/kernel-doc.pl (executable file, 2439 changed lines)

File diff suppressed because it is too large

scripts/kernel-doc.py (executable file, 315 changed lines)

@@ -0,0 +1,315 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
#
# pylint: disable=C0103,R0915
#
# Converted from the kernel-doc script originally written in Perl
# under GPLv2, copyrighted since 1998 by the following authors:
#
# Aditya Srivastava <yashsri421@gmail.com>
# Akira Yokosawa <akiyks@gmail.com>
# Alexander A. Klimov <grandmaster@al2klimov.de>
# Alexander Lobakin <aleksander.lobakin@intel.com>
# André Almeida <andrealmeid@igalia.com>
# Andy Shevchenko <andriy.shevchenko@linux.intel.com>
# Anna-Maria Behnsen <anna-maria@linutronix.de>
# Armin Kuster <akuster@mvista.com>
# Bart Van Assche <bart.vanassche@sandisk.com>
# Ben Hutchings <ben@decadent.org.uk>
# Borislav Petkov <bbpetkov@yahoo.de>
# Chen-Yu Tsai <wenst@chromium.org>
# Coco Li <lixiaoyan@google.com>
# Conchúr Navid <conchur@web.de>
# Daniel Santos <daniel.santos@pobox.com>
# Danilo Cesar Lemes de Paula <danilo.cesar@collabora.co.uk>
# Dan Luedtke <mail@danrl.de>
# Donald Hunter <donald.hunter@gmail.com>
# Gabriel Krisman Bertazi <krisman@collabora.co.uk>
# Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# Harvey Harrison <harvey.harrison@gmail.com>
# Horia Geanta <horia.geanta@freescale.com>
# Ilya Dryomov <idryomov@gmail.com>
# Jakub Kicinski <kuba@kernel.org>
# Jani Nikula <jani.nikula@intel.com>
# Jason Baron <jbaron@redhat.com>
# Jason Gunthorpe <jgg@nvidia.com>
# Jérémy Bobbio <lunar@debian.org>
# Johannes Berg <johannes.berg@intel.com>
# Johannes Weiner <hannes@cmpxchg.org>
# Jonathan Cameron <Jonathan.Cameron@huawei.com>
# Jonathan Corbet <corbet@lwn.net>
# Jonathan Neuschäfer <j.neuschaefer@gmx.net>
# Kamil Rytarowski <n54@gmx.com>
# Kees Cook <kees@kernel.org>
# Laurent Pinchart <laurent.pinchart@ideasonboard.com>
# Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>
# Linus Torvalds <torvalds@linux-foundation.org>
# Lucas De Marchi <lucas.demarchi@profusion.mobi>
# Mark Rutland <mark.rutland@arm.com>
# Markus Heiser <markus.heiser@darmarit.de>
# Martin Waitz <tali@admingilde.org>
# Masahiro Yamada <masahiroy@kernel.org>
# Matthew Wilcox <willy@infradead.org>
# Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
# Michal Wajdeczko <michal.wajdeczko@intel.com>
# Michael Zucchi
# Mike Rapoport <rppt@linux.ibm.com>
# Niklas Söderlund <niklas.soderlund@corigine.com>
# Nishanth Menon <nm@ti.com>
# Paolo Bonzini <pbonzini@redhat.com>
# Pavan Kumar Linga <pavan.kumar.linga@intel.com>
# Pavel Pisa <pisa@cmp.felk.cvut.cz>
# Peter Maydell <peter.maydell@linaro.org>
# Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
# Randy Dunlap <rdunlap@infradead.org>
# Richard Kennedy <richard@rsk.demon.co.uk>
# Rich Walker <rw@shadow.org.uk>
# Rolf Eike Beer <eike-kernel@sf-tec.de>
# Sakari Ailus <sakari.ailus@linux.intel.com>
# Silvio Fricke <silvio.fricke@gmail.com>
# Simon Huggins
# Tim Waugh <twaugh@redhat.com>
# Tomasz Warniełło <tomasz.warniello@gmail.com>
# Utkarsh Tripathi <utripathi2002@gmail.com>
# valdis.kletnieks@vt.edu <valdis.kletnieks@vt.edu>
# Vegard Nossum <vegard.nossum@oracle.com>
# Will Deacon <will.deacon@arm.com>
# Yacine Belkadi <yacine.belkadi.1@gmail.com>
# Yujie Liu <yujie.liu@intel.com>
"""
kernel_doc
==========
Print formatted kernel documentation to stdout
Read C language source or header FILEs, extract embedded
documentation comments, and print formatted documentation
to standard output.
The documentation comments are identified by the "/**"
opening comment mark.
See Documentation/doc-guide/kernel-doc.rst for the
documentation comment syntax.
"""
import argparse
import logging
import os
import sys
# Import Python modules
LIB_DIR = "lib/kdoc"
SRC_DIR = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(SRC_DIR, LIB_DIR))
from kdoc_files import KernelFiles # pylint: disable=C0413
from kdoc_output import RestFormat, ManFormat # pylint: disable=C0413
DESC = """
Read C language source or header FILEs, extract embedded documentation comments,
and print formatted documentation to standard output.
The documentation comments are identified by the "/**" opening comment mark.
See Documentation/doc-guide/kernel-doc.rst for the documentation comment syntax.
"""
EXPORT_FILE_DESC = """
Specify an additional FILE in which to look for EXPORT_SYMBOL information.
May be used multiple times.
"""
EXPORT_DESC = """
Only output documentation for the symbols that have been
exported using EXPORT_SYMBOL() and related macros in any input
FILE or -export-file FILE.
"""
INTERNAL_DESC = """
Only output documentation for the symbols that have NOT been
exported using EXPORT_SYMBOL() and related macros in any input
FILE or -export-file FILE.
"""
FUNCTION_DESC = """
Only output documentation for the given function or DOC: section
title. All other functions and DOC: sections are ignored.
May be used multiple times.
"""
NOSYMBOL_DESC = """
Exclude the specified symbol from the output documentation.
May be used multiple times.
"""
FILES_DESC = """
Header and C source files to be parsed.
"""
WARN_CONTENTS_BEFORE_SECTIONS_DESC = """
Warns if there are contents before sections (deprecated).
This option is kept just for backward-compatibility, but it does nothing,
neither here nor in the original Perl script.
"""
class MsgFormatter(logging.Formatter):
    """Helper class to format warnings in a similar way to kernel-doc.pl"""

    def format(self, record):
        record.levelname = record.levelname.capitalize()
        return logging.Formatter.format(self, record)


def main():
    """Main program"""

    parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
                                     description=DESC)

    # Normal arguments
    parser.add_argument("-v", "-verbose", "--verbose", action="store_true",
                        help="Verbose output, more warnings and other information.")
    parser.add_argument("-d", "-debug", "--debug", action="store_true",
                        help="Enable debug messages")
    parser.add_argument("-M", "-modulename", "--modulename",
                        default="Kernel API",
                        help="Allow setting a module name at the output.")
    parser.add_argument("-l", "-enable-lineno", "--enable_lineno",
                        action="store_true",
                        help="Enable line number output (only in ReST mode)")

    # Arguments to control the warning behavior
    parser.add_argument("-Wreturn", "--wreturn", action="store_true",
                        help="Warns about the lack of a return markup on functions.")
    parser.add_argument("-Wshort-desc", "-Wshort-description", "--wshort-desc",
                        action="store_true",
                        help="Warns if initial short description is missing")
    parser.add_argument("-Wcontents-before-sections",
                        "--wcontents-before-sections", action="store_true",
                        help=WARN_CONTENTS_BEFORE_SECTIONS_DESC)
    parser.add_argument("-Wall", "--wall", action="store_true",
                        help="Enable all types of warnings")
    parser.add_argument("-Werror", "--werror", action="store_true",
                        help="Treat warnings as errors.")
    parser.add_argument("-export-file", "--export-file", action='append',
                        help=EXPORT_FILE_DESC)

    # Output format mutually-exclusive group
    out_group = parser.add_argument_group("Output format selection (mutually exclusive)")
    out_fmt = out_group.add_mutually_exclusive_group()
    out_fmt.add_argument("-m", "-man", "--man", action="store_true",
                         help="Output troff manual page format.")
    out_fmt.add_argument("-r", "-rst", "--rst", action="store_true",
                         help="Output reStructuredText format (default).")
    out_fmt.add_argument("-N", "-none", "--none", action="store_true",
                         help="Do not output documentation, only warnings.")

    # Output selection mutually-exclusive group
    sel_group = parser.add_argument_group("Output selection (mutually exclusive)")
    sel_mut = sel_group.add_mutually_exclusive_group()
    sel_mut.add_argument("-e", "-export", "--export", action='store_true',
                         help=EXPORT_DESC)
    sel_mut.add_argument("-i", "-internal", "--internal", action='store_true',
                         help=INTERNAL_DESC)
    sel_mut.add_argument("-s", "-function", "--symbol", action='append',
                         help=FUNCTION_DESC)

    # These are valid for all three filter types
    parser.add_argument("-n", "-nosymbol", "--nosymbol", action='append',
                        help=NOSYMBOL_DESC)
    parser.add_argument("-D", "-no-doc-sections", "--no-doc-sections",
                        action='store_true', help="Don't output DOC sections")
    parser.add_argument("files", metavar="FILE",
                        nargs="+", help=FILES_DESC)

    args = parser.parse_args()

    if args.wall:
        args.wreturn = True
        args.wshort_desc = True
        args.wcontents_before_sections = True

    logger = logging.getLogger()

    if not args.debug:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.DEBUG)

    formatter = MsgFormatter('%(levelname)s: %(message)s')

    handler = logging.StreamHandler()
    handler.setFormatter(formatter)

    logger.addHandler(handler)

    if args.man:
        out_style = ManFormat(modulename=args.modulename)
    elif args.none:
        out_style = None
    else:
        out_style = RestFormat()

    kfiles = KernelFiles(verbose=args.verbose,
                         out_style=out_style, werror=args.werror,
                         wreturn=args.wreturn, wshort_desc=args.wshort_desc,
                         wcontents_before_sections=args.wcontents_before_sections)

    kfiles.parse(args.files, export_file=args.export_file)

    for t in kfiles.msg(enable_lineno=args.enable_lineno, export=args.export,
                        internal=args.internal, symbol=args.symbol,
                        nosymbol=args.nosymbol, export_file=args.export_file,
                        no_doc_sections=args.no_doc_sections):
        msg = t[1]
        if msg:
            print(msg)

    error_count = kfiles.errors
    if not error_count:
        sys.exit(0)

    if args.werror:
        print(f"{error_count} warnings as errors")
        sys.exit(error_count)

    if args.verbose:
        print(f"{error_count} errors")

    if args.none:
        sys.exit(0)

    sys.exit(error_count)


# Call main method
if __name__ == "__main__":
    main()
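`main()` keeps the Perl script's single-dash long options (`-function`, `-nosymbol`, `-Wall`, ...) by passing them to argparse as extra option strings. argparse derives each attribute name (`dest`) from the first option that starts with `--`, so `-s`, `-function` and `--symbol` all end up in `args.symbol`. A reduced sketch with only a few of the options:

```python
import argparse

parser = argparse.ArgumentParser()
# Single-dash long names are matched exactly; dest comes from the "--" option.
parser.add_argument("-s", "-function", "--symbol", action="append")
parser.add_argument("-n", "-nosymbol", "--nosymbol", action="append")
parser.add_argument("-Wall", "--wall", action="store_true")
parser.add_argument("files", metavar="FILE", nargs="+")

args = parser.parse_args(["-function", "foo", "-s", "bar",
                          "-nosymbol", "baz", "-Wall", "drivers/foo.c"])
print(args.symbol)    # ['foo', 'bar']
print(args.nosymbol)  # ['baz']
print(args.wall)      # True
print(args.files)     # ['drivers/foo.c']
```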

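`KernelFiles.__init__()` in kdoc_files.py enables `werror` when `-Werror` appears as a standalone word in `KCFLAGS`. Python regular expressions have no `/.../` delimiters (those come from Perl), so nothing should follow the closing `(\s|$)`. A small sketch of the check; `kcflags_has_werror` is a hypothetical helper name:

```python
import re

# Match "-Werror" only as a whole word, not "-Werror=foo" or "-Werrorx".
WERROR_RE = re.compile(r"(\s|^)-Werror(\s|$)")


def kcflags_has_werror(kcflags):
    """Return True if KCFLAGS contains a bare -Werror flag."""
    return kcflags is not None and WERROR_RE.search(kcflags) is not None


print(kcflags_has_werror("-O2 -Werror"))         # True
print(kcflags_has_werror("-Werror=format -O2"))  # False
print(kcflags_has_werror(None))                  # False

# A Perl-style trailing slash would make the pattern require a literal "/":
print(re.search(r"(\s|^)-Werror(\s|$)/", "-Werror") is None)  # True
```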

@@ -0,0 +1,282 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
#
# pylint: disable=R0903,R0913,R0914,R0917
"""
Parse kernel-doc tags on multiple kernel source files.
"""
import argparse
import logging
import os
import re
from kdoc_parser import KernelDoc
from kdoc_output import OutputFormat
class GlobSourceFiles:
    """
    Parse C source code file names and directories via an iterator.
    """

    def __init__(self, srctree=None, valid_extensions=None):
        """
        Initialize valid extensions with a tuple.

        If not defined, assume default C extensions (.c and .h)

        It would be possible to use python's glob function, but it is
        very slow, and it is not incremental: it would read all
        directories before actually doing anything.

        So, let's use our own implementation.
        """

        if not valid_extensions:
            self.extensions = (".c", ".h")
        else:
            self.extensions = valid_extensions

        self.srctree = srctree

    def _parse_dir(self, dirname):
        """Internal function to parse files recursively"""

        with os.scandir(dirname) as obj:
            for entry in obj:
                name = os.path.join(dirname, entry.name)

                if entry.is_dir():
                    yield from self._parse_dir(name)

                if not entry.is_file():
                    continue

                basename = os.path.basename(name)

                if not basename.endswith(self.extensions):
                    continue

                yield name

    def parse_files(self, file_list, file_not_found_cb):
        """
        Define an iterator to parse all source files from file_list,
        handling directories if any
        """

        if not file_list:
            return

        for fname in file_list:
            if self.srctree:
                f = os.path.join(self.srctree, fname)
            else:
                f = fname

            if os.path.isdir(f):
                yield from self._parse_dir(f)
            elif os.path.isfile(f):
                yield f
            elif file_not_found_cb:
                file_not_found_cb(fname)
class KernelFiles():
    """
    Parse kernel-doc tags on multiple kernel source files.

    There are two types of parsers defined here:
        - self.parse_file(): parses both kernel-doc markups and
          EXPORT_SYMBOL* macros;
        - self.process_export_file(): parses only EXPORT_SYMBOL* macros.
    """

    def warning(self, msg):
        """Ancillary routine to output a warning and increment error count"""

        self.config.log.warning(msg)
        self.errors += 1

    def error(self, msg):
        """Ancillary routine to output an error and increment error count"""

        self.config.log.error(msg)
        self.errors += 1

    def parse_file(self, fname):
        """
        Parse a single Kernel source.
        """

        # Prevent parsing the same file twice if results are cached
        if fname in self.files:
            return

        doc = KernelDoc(self.config, fname)
        export_table, entries = doc.parse_kdoc()

        self.export_table[fname] = export_table

        self.files.add(fname)
        self.export_files.add(fname)  # parse_kdoc() already checks exports

        self.results[fname] = entries

    def process_export_file(self, fname):
        """
        Parses EXPORT_SYMBOL* macros from a single Kernel source file.
        """

        # Prevent parsing the same file twice if results are cached
        if fname in self.export_files:
            return

        doc = KernelDoc(self.config, fname)
        export_table = doc.parse_export()

        if not export_table:
            self.error(f"Error: Cannot check EXPORT_SYMBOL* on {fname}")
            export_table = set()

        self.export_table[fname] = export_table
        self.export_files.add(fname)

    def file_not_found_cb(self, fname):
        """
        Callback to warn if a file was not found.
        """

        self.error(f"Cannot find file {fname}")
    def __init__(self, verbose=False, out_style=None,
                 werror=False, wreturn=False, wshort_desc=False,
                 wcontents_before_sections=False,
                 logger=None):
        """
        Initialize startup variables. Files are parsed later by parse().
        """

        if not verbose:
            # Only a KBUILD_VERBOSE value containing "1" enables verbose
            # mode; bool("0") would be True.
            verbose = "1" in os.environ.get("KBUILD_VERBOSE", "")

        if out_style is None:
            out_style = OutputFormat()

        if not werror:
            kcflags = os.environ.get("KCFLAGS", None)
            if kcflags:
                # Match -Werror as a whole word (no Perl-style trailing "/")
                match = re.search(r"(\s|^)-Werror(\s|$)", kcflags)
                if match:
                    werror = True

            # reading this variable is for backwards compat just in case
            # someone was calling it with the variable from outside the
            # kernel's build system
            kdoc_werror = os.environ.get("KDOC_WERROR", None)
            if kdoc_werror:
                werror = kdoc_werror

        # Some variables are global to the parser logic as a whole as they are
        # used to send control configuration to KernelDoc class. As such,
        # those variables are read-only inside the KernelDoc.
        self.config = argparse.Namespace()

        self.config.verbose = verbose
        self.config.werror = werror
        self.config.wreturn = wreturn
        self.config.wshort_desc = wshort_desc
        self.config.wcontents_before_sections = wcontents_before_sections

        if not logger:
            self.config.log = logging.getLogger("kernel-doc")
        else:
            self.config.log = logger

        self.config.warning = self.warning

        self.config.src_tree = os.environ.get("SRCTREE", None)

        # Initialize variables that are internal to KernelFiles
        self.out_style = out_style

        self.errors = 0
        self.results = {}

        self.files = set()
        self.export_files = set()
        self.export_table = {}
    def parse(self, file_list, export_file=None):
        """
        Parse all files
        """

        glob = GlobSourceFiles(srctree=self.config.src_tree)

        for fname in glob.parse_files(file_list, self.file_not_found_cb):
            self.parse_file(fname)

        for fname in glob.parse_files(export_file, self.file_not_found_cb):
            self.process_export_file(fname)

    def out_msg(self, fname, name, arg):
        """
        Return output messages from a file name using the output style
        filtering.

        If output type was not handled by the styler, return None.
        """

        # NOTE: we can add rules here to filter out unwanted parts,
        # although OutputFormat.msg already does that.

        return self.out_style.msg(fname, name, arg)

    def msg(self, enable_lineno=False, export=False, internal=False,
            symbol=None, nosymbol=None, no_doc_sections=False,
            filenames=None, export_file=None):
        """
        Iterates over the kernel-doc results and output messages,
        returning kernel-doc markups on each iteration
        """

        self.out_style.set_config(self.config)

        if not filenames:
            filenames = sorted(self.results.keys())

        for fname in filenames:
            function_table = set()

            if internal or export:
                if not export_file:
                    export_file = [fname]

                for f in export_file:
                    function_table |= self.export_table[f]

            if symbol:
                for s in symbol:
                    function_table.add(s)

            self.out_style.set_filter(export, internal, symbol, nosymbol,
                                      function_table, enable_lineno,
                                      no_doc_sections)

            msg = ""
            for name, arg in self.results[fname]:
                out = self.out_msg(fname, name, arg)

                # Check for None before concatenating: "msg += None"
                # would raise TypeError.
                if out is None:
                    ln = arg.get("ln", 0)
                    dtype = arg.get('type', "")

                    self.config.log.warning("%s:%d Can't handle %s",
                                            fname, ln, dtype)
                else:
                    msg += out

            if msg:
                yield fname, msg
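`GlobSourceFiles` walks directories with `os.scandir()` and `yield from`, so `KernelFiles.parse()` can start parsing the first file before the whole tree has been scanned. A standalone sketch of that generator pattern, exercised on a temporary directory (`iter_sources` is a hypothetical name, not part of kdoc_files.py):

```python
import os
import tempfile


def iter_sources(dirname, extensions=(".c", ".h")):
    """Lazily yield files under dirname whose names end with one of extensions."""
    with os.scandir(dirname) as entries:
        for entry in entries:
            path = os.path.join(dirname, entry.name)
            if entry.is_dir():
                # Recurse without building an intermediate list.
                yield from iter_sources(path, extensions)
            elif entry.is_file() and entry.name.endswith(extensions):
                yield path


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.c", "b.txt", os.path.join("sub", "c.h")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8"):
            pass
    found = sorted(os.path.relpath(p, tmp) for p in iter_sources(tmp))
    print(found)  # ['a.c', 'sub/c.h'] on POSIX systems
```

Because the function is a generator, a caller that stops early (or that fails on the first file) never pays for scanning the rest of the tree.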

scripts/lib/kdoc/kdoc_output.py (executable file, 793 changed lines)

@@ -0,0 +1,793 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
#
# pylint: disable=C0301,R0902,R0911,R0912,R0913,R0914,R0915,R0917
"""
Implement output filters to print kernel-doc documentation.

The implementation uses a virtual base class (OutputFormat) which
contains a dispatcher to virtual methods, and some code to filter
out output messages.

The actual implementation is done in one separate class per output
type. Currently, there are output classes for ReST and man/troff.
"""
import os
import re
from datetime import datetime
from kdoc_parser import KernelDoc, type_param
from kdoc_re import KernRe
function_pointer = KernRe(r"([^\(]*\(\*)\s*\)\s*\(([^\)]*)\)", cache=False)
# match expressions used to find embedded type information
type_constant = KernRe(r"\b``([^\`]+)``\b", cache=False)
type_constant2 = KernRe(r"\%([-_*\w]+)", cache=False)
type_func = KernRe(r"(\w+)\(\)", cache=False)
type_param_ref = KernRe(r"([\!~\*]?)\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
# Special RST handling for func ptr params
type_fp_param = KernRe(r"\@(\w+)\(\)", cache=False)
# Special RST handling for structs with func ptr params
type_fp_param2 = KernRe(r"\@(\w+->\S+)\(\)", cache=False)
type_env = KernRe(r"(\$\w+)", cache=False)
type_enum = KernRe(r"\&(enum\s*([_\w]+))", cache=False)
type_struct = KernRe(r"\&(struct\s*([_\w]+))", cache=False)
type_typedef = KernRe(r"\&(typedef\s*([_\w]+))", cache=False)
type_union = KernRe(r"\&(union\s*([_\w]+))", cache=False)
type_member = KernRe(r"\&([_\w]+)(\.|->)([_\w]+)", cache=False)
type_fallback = KernRe(r"\&([_\w]+)", cache=False)
type_member_func = type_member + KernRe(r"\(\)", cache=False)
class OutputFormat:
    """
    Base class for OutputFormat. If used as-is, it means that only
    warnings will be displayed.
    """

    # output mode.
    OUTPUT_ALL = 0       # output all symbols and doc sections
    OUTPUT_INCLUDE = 1   # output only specified symbols
    OUTPUT_EXPORTED = 2  # output exported symbols
    OUTPUT_INTERNAL = 3  # output non-exported symbols

    # Virtual member to be overridden at the inherited classes
    highlights = []

    def __init__(self):
        """Declare internal vars and set mode to OUTPUT_ALL"""

        self.out_mode = self.OUTPUT_ALL
        self.enable_lineno = None
        self.nosymbol = {}
        self.symbol = None
        self.function_table = None
        self.config = None
        self.no_doc_sections = False

        self.data = ""

    def set_config(self, config):
        """
        Setup global config variables used by both parser and output.
        """

        self.config = config

    def set_filter(self, export, internal, symbol, nosymbol, function_table,
                   enable_lineno, no_doc_sections):
        """
        Initialize filter variables according to the requested mode.

        Only one choice is valid between export, internal and symbol.

        The nosymbol filter can be used on all modes.
        """

        self.enable_lineno = enable_lineno
        self.no_doc_sections = no_doc_sections
        self.function_table = function_table

        if symbol:
            self.out_mode = self.OUTPUT_INCLUDE
        elif export:
            self.out_mode = self.OUTPUT_EXPORTED
        elif internal:
            self.out_mode = self.OUTPUT_INTERNAL
        else:
            self.out_mode = self.OUTPUT_ALL

        if nosymbol:
            self.nosymbol = set(nosymbol)
    def highlight_block(self, block):
        """
        Apply the RST highlights to a sub-block of text.
        """

        for r, sub in self.highlights:
            block = r.sub(sub, block)

        return block

    def out_warnings(self, args):
        """
        Output warnings for identifiers that will be displayed.
        """

        warnings = args.get('warnings', [])

        for log_msg in warnings:
            self.config.warning(log_msg)

    def check_doc(self, name, args):
        """Check if DOC should be output"""

        if self.no_doc_sections:
            return False

        if name in self.nosymbol:
            return False

        if self.out_mode == self.OUTPUT_ALL:
            self.out_warnings(args)
            return True

        if self.out_mode == self.OUTPUT_INCLUDE:
            if name in self.function_table:
                self.out_warnings(args)
                return True

        return False

    def check_declaration(self, dtype, name, args):
        """
        Checks if a declaration should be output or not based on the
        filtering criteria.
        """

        if name in self.nosymbol:
            return False

        if self.out_mode == self.OUTPUT_ALL:
            self.out_warnings(args)
            return True

        if self.out_mode in [self.OUTPUT_INCLUDE, self.OUTPUT_EXPORTED]:
            if name in self.function_table:
                return True

        if self.out_mode == self.OUTPUT_INTERNAL:
            if dtype != "function":
                self.out_warnings(args)
                return True

            if name not in self.function_table:
                self.out_warnings(args)
                return True

        return False

    def msg(self, fname, name, args):
        """
        Handles a single entry from kernel-doc parser
        """

        self.data = ""

        dtype = args.get('type', "")

        if dtype == "doc":
            self.out_doc(fname, name, args)
            return self.data

        if not self.check_declaration(dtype, name, args):
            return self.data

        if dtype == "function":
            self.out_function(fname, name, args)
            return self.data

        if dtype == "enum":
            self.out_enum(fname, name, args)
            return self.data

        if dtype == "typedef":
            self.out_typedef(fname, name, args)
            return self.data

        if dtype in ["struct", "union"]:
            self.out_struct(fname, name, args)
            return self.data

        # Warn if some type requires an output logic
        self.config.log.warning("doesn't know how to output '%s' block",
                                dtype)

        return None

    # Virtual methods to be overridden by inherited classes
    # At the base class, those do nothing.
    def out_doc(self, fname, name, args):
        """Outputs a DOC block"""

    def out_function(self, fname, name, args):
        """Outputs a function"""

    def out_enum(self, fname, name, args):
        """Outputs an enum"""

    def out_typedef(self, fname, name, args):
        """Outputs a typedef"""

    def out_struct(self, fname, name, args):
        """Outputs a struct"""
class RestFormat(OutputFormat):
    """Consts and functions used by ReST output"""

    highlights = [
        (type_constant, r"``\1``"),
        (type_constant2, r"``\1``"),

        # Note: need to escape () to avoid func matching later
        (type_member_func, r":c:type:`\1\2\3\\(\\) <\1>`"),
        (type_member, r":c:type:`\1\2\3 <\1>`"),
        (type_fp_param, r"**\1\\(\\)**"),
        (type_fp_param2, r"**\1\\(\\)**"),
        (type_func, r"\1()"),
        (type_enum, r":c:type:`\1 <\2>`"),
        (type_struct, r":c:type:`\1 <\2>`"),
        (type_typedef, r":c:type:`\1 <\2>`"),
        (type_union, r":c:type:`\1 <\2>`"),

        # in rst this can refer to any type
        (type_fallback, r":c:type:`\1`"),
        (type_param_ref, r"**\1\2**")
    ]
    blankline = "\n"

    sphinx_literal = KernRe(r'^[^.].*::$', cache=False)
    sphinx_cblock = KernRe(r'^\.\.\ +code-block::', cache=False)

    def __init__(self):
        """
        Creates class variables.

        Not really mandatory, but it is a good coding style and makes
        pylint happy.
        """

        super().__init__()
        self.lineprefix = ""

    def print_lineno(self, ln):
        """Outputs a line number"""

        if self.enable_lineno and ln is not None:
            ln += 1
            self.data += f".. LINENO {ln}\n"

    def output_highlight(self, args):
        """
        Outputs a C symbol that may require being converted to ReST using
        the self.highlights variable
        """

        input_text = args
        output = ""
        in_literal = False
        litprefix = ""
        block = ""

        for line in input_text.strip("\n").split("\n"):

            # If we're in a literal block, see if we should drop out of it.
            # Otherwise, pass the line straight through unmunged.
            if in_literal:
                if line.strip():  # If the line is not blank
                    # If this is the first non-blank line in a literal block,
                    # figure out the proper indent.
                    if not litprefix:
                        r = KernRe(r'^(\s*)')
                        if r.match(line):
                            litprefix = '^' + r.group(1)
                        else:
                            litprefix = ""

                        output += line + "\n"
                    elif not KernRe(litprefix).match(line):
                        in_literal = False
                    else:
                        output += line + "\n"
                else:
                    output += line + "\n"

            # Not in a literal block (or just dropped out)
            if not in_literal:
                block += line + "\n"
                if self.sphinx_literal.match(line) or self.sphinx_cblock.match(line):
                    in_literal = True
                    litprefix = ""
                    output += self.highlight_block(block)
                    block = ""

        # Handle any remaining block
        if block:
            output += self.highlight_block(block)

        # Print the output with the line prefix
        for line in output.strip("\n").split("\n"):
            self.data += self.lineprefix + line + "\n"
    def out_section(self, args, out_docblock=False):
        """
        Outputs a block section.

        This could use some work; it's used to output the DOC: sections, and
        starts by putting out the name of the doc section itself, but that
        tends to duplicate a header already in the template file.
        """

        sectionlist = args.get('sectionlist', [])
        sections = args.get('sections', {})
        section_start_lines = args.get('section_start_lines', {})

        for section in sectionlist:
            # Skip sections that are in the nosymbol_table
            if section in self.nosymbol:
                continue

            if out_docblock:
                if not self.out_mode == self.OUTPUT_INCLUDE:
                    self.data += f".. _{section}:\n\n"
                    self.data += f'{self.lineprefix}**{section}**\n\n'
            else:
                self.data += f'{self.lineprefix}**{section}**\n\n'

            self.print_lineno(section_start_lines.get(section, 0))
            self.output_highlight(sections[section])
            self.data += "\n"
        self.data += "\n"

    def out_doc(self, fname, name, args):
        if not self.check_doc(name, args):
            return
        self.out_section(args, out_docblock=True)

    def out_function(self, fname, name, args):

        oldprefix = self.lineprefix
        signature = ""

        func_macro = args.get('func_macro', False)
        if func_macro:
            signature = args['function']
        else:
            if args.get('functiontype'):
                signature = args['functiontype'] + " "
            signature += args['function'] + " ("

        parameterlist = args.get('parameterlist', [])
        parameterdescs = args.get('parameterdescs', {})
        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})

        ln = args.get('declaration_start_line', 0)

        count = 0
        for parameter in parameterlist:
            if count != 0:
                signature += ", "
            count += 1
            dtype = args['parametertypes'].get(parameter, "")

            if function_pointer.search(dtype):
                # Pointer-to-function: the pattern has two groups, the part
                # up to "(*" and the argument list.
                signature += (function_pointer.group(1) + parameter + ") (" +
                              function_pointer.group(2) + ")")
            else:
                signature += dtype

        if not func_macro:
            signature += ")"

        self.print_lineno(ln)
        if args.get('typedef') or not args.get('functiontype'):
            self.data += f".. c:macro:: {args['function']}\n\n"

            if args.get('typedef'):
                self.data += "   **Typedef**: "
                self.lineprefix = ""
                self.output_highlight(args.get('purpose', ""))
                self.data += "\n\n**Syntax**\n\n"
                self.data += f"  ``{signature}``\n\n"
            else:
                self.data += f"``{signature}``\n\n"
        else:
            self.data += f".. c:function:: {signature}\n\n"

        if not args.get('typedef'):
            self.print_lineno(ln)
            self.lineprefix = "   "
            self.output_highlight(args.get('purpose', ""))
            self.data += "\n"

        # Put descriptive text into a container (HTML <div>) to help set
        # function prototypes apart
        self.lineprefix = "  "

        if parameterlist:
            self.data += ".. container:: kernelindent\n\n"
            self.data += f"{self.lineprefix}**Parameters**\n\n"

        for parameter in parameterlist:
            parameter_name = KernRe(r'\[.*').sub('', parameter)
            dtype = args['parametertypes'].get(parameter, "")

            if dtype:
                self.data += f"{self.lineprefix}``{dtype}``\n"
            else:
                self.data += f"{self.lineprefix}``{parameter}``\n"

            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))

            self.lineprefix = "    "
            if parameter_name in parameterdescs and \
               parameterdescs[parameter_name] != KernelDoc.undescribed:

                self.output_highlight(parameterdescs[parameter_name])
                self.data += "\n"
            else:
                self.data += f"{self.lineprefix}*undescribed*\n\n"
            self.lineprefix = "  "

        self.out_section(args)
        self.lineprefix = oldprefix

    def out_enum(self, fname, name, args):
        oldprefix = self.lineprefix
        name = args.get('enum', '')
        parameterlist = args.get('parameterlist', [])
        parameterdescs = args.get('parameterdescs', {})
        ln = args.get('declaration_start_line', 0)
        self.data += f"\n\n.. c:enum:: {name}\n\n"
        self.print_lineno(ln)
        self.lineprefix = " "
        self.output_highlight(args.get('purpose', ''))
        self.data += "\n"
        self.data += ".. container:: kernelindent\n\n"
        outer = self.lineprefix + " "
        self.lineprefix = outer + " "
        self.data += f"{outer}**Constants**\n\n"
        for parameter in parameterlist:
            self.data += f"{outer}``{parameter}``\n"
            if parameterdescs.get(parameter, '') != KernelDoc.undescribed:
                self.output_highlight(parameterdescs.get(parameter, ''))
            else:
                self.data += f"{self.lineprefix}*undescribed*\n\n"
            self.data += "\n"
        self.lineprefix = oldprefix
        self.out_section(args)

    def out_typedef(self, fname, name, args):
        oldprefix = self.lineprefix
        name = args.get('typedef', '')
        ln = args.get('declaration_start_line', 0)
        self.data += f"\n\n.. c:type:: {name}\n\n"
        self.print_lineno(ln)
        self.lineprefix = " "
        self.output_highlight(args.get('purpose', ''))
        self.data += "\n"
        self.lineprefix = oldprefix
        self.out_section(args)

    def out_struct(self, fname, name, args):
        name = args.get('struct', "")
        purpose = args.get('purpose', "")
        declaration = args.get('definition', "")
        dtype = args.get('type', "struct")
        ln = args.get('declaration_start_line', 0)
        parameterlist = args.get('parameterlist', [])
        parameterdescs = args.get('parameterdescs', {})
        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})
        self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
        self.print_lineno(ln)
        oldprefix = self.lineprefix
        self.lineprefix += " "
        self.output_highlight(purpose)
        self.data += "\n"
        self.data += ".. container:: kernelindent\n\n"
        self.data += f"{self.lineprefix}**Definition**::\n\n"
        self.lineprefix = self.lineprefix + " "
        declaration = declaration.replace("\t", self.lineprefix)
        self.data += f"{self.lineprefix}{dtype} {name}" + ' {' + "\n"
        self.data += f"{declaration}{self.lineprefix}" + "};\n\n"
        self.lineprefix = " "
        self.data += f"{self.lineprefix}**Members**\n\n"
        for parameter in parameterlist:
            if not parameter or parameter.startswith("#"):
                continue
            parameter_name = parameter.split("[", maxsplit=1)[0]
            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
                continue
            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))
            self.data += f"{self.lineprefix}``{parameter}``\n"
            self.lineprefix = " "
            self.output_highlight(parameterdescs.get(parameter_name, ""))
            self.lineprefix = " "
            self.data += "\n"
        self.data += "\n"
        self.lineprefix = oldprefix
        self.out_section(args)


class ManFormat(OutputFormat):
    """Constants and functions used by man page output"""

    highlights = (
        (type_constant, r"\1"),
        (type_constant2, r"\1"),
        (type_func, r"\\fB\1\\fP"),
        (type_enum, r"\\fI\1\\fP"),
        (type_struct, r"\\fI\1\\fP"),
        (type_typedef, r"\\fI\1\\fP"),
        (type_union, r"\\fI\1\\fP"),
        (type_param, r"\\fI\1\\fP"),
        (type_param_ref, r"\\fI\1\2\\fP"),
        (type_member, r"\\fI\1\2\3\\fP"),
        (type_fallback, r"\\fI\1\\fP")
    )
    blankline = ""

    date_formats = [
        "%a %b %d %H:%M:%S %Z %Y",
        "%a %b %d %H:%M:%S %Y",
        "%Y-%m-%d",
        "%b %d %Y",
        "%B %d %Y",
        "%m %d %Y",
    ]

    def __init__(self, modulename):
        """
        Creates class variables.

        Not really mandatory, but it is good coding style and makes
        pylint happy.
        """
        super().__init__()
        self.modulename = modulename
        dt = None
        tstamp = os.environ.get("KBUILD_BUILD_TIMESTAMP")
        if tstamp:
            for fmt in self.date_formats:
                try:
                    dt = datetime.strptime(tstamp, fmt)
                    break
                except ValueError:
                    pass
        if not dt:
            dt = datetime.now()
        self.man_date = dt.strftime("%B %Y")

    def output_highlight(self, block):
        """
        Outputs a text block, highlighting C symbols with the
        self.highlights patterns using troff syntax.

        Leading whitespace is stripped, empty lines are dropped, and lines
        starting with "." are escaped with \\& so that troff does not read
        them as requests.
        """
        contents = self.highlight_block(block)
        if isinstance(contents, list):
            contents = "\n".join(contents)
        for line in contents.strip("\n").split("\n"):
            line = KernRe(r"^\s*").sub("", line)
            if not line:
                continue
            if line[0] == ".":
                self.data += "\\&" + line + "\n"
            else:
                self.data += line + "\n"

    def out_doc(self, fname, name, args):
        sectionlist = args.get('sectionlist', [])
        sections = args.get('sections', {})
        if not self.check_doc(name, args):
            return
        self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
        for section in sectionlist:
            self.data += f'.SH "{section}"' + "\n"
            self.output_highlight(sections.get(section))

    def out_function(self, fname, name, args):
        """Output a function in man page format"""
        parameterlist = args.get('parameterlist', [])
        parameterdescs = args.get('parameterdescs', {})
        sectionlist = args.get('sectionlist', [])
        sections = args.get('sections', {})
        self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
        self.data += ".SH NAME\n"
        self.data += f"{args['function']} \\- {args['purpose']}\n"
        self.data += ".SH SYNOPSIS\n"
        if args.get('functiontype', ''):
            self.data += f'.B "{args["functiontype"]}" {args["function"]}' + "\n"
        else:
            self.data += f'.B "{args["function"]}' + "\n"
        count = 0
        parenth = "("
        post = ","
        for parameter in parameterlist:
            if count == len(parameterlist) - 1:
                post = ");"
            dtype = args['parametertypes'].get(parameter, "")
            if function_pointer.match(dtype):
                # Pointer-to-function: the line must start with the .BI request
                self.data += f'.BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n"
            else:
                dtype = KernRe(r'([^\*])$').sub(r'\1 ', dtype)
                self.data += f'.BI "{parenth}{dtype}" "{post}"' + "\n"
            count += 1
            parenth = ""
        if parameterlist:
            self.data += ".SH ARGUMENTS\n"
        for parameter in parameterlist:
            parameter_name = re.sub(r'\[.*', '', parameter)
            self.data += f'.IP "{parameter}" 12' + "\n"
            self.output_highlight(parameterdescs.get(parameter_name, ""))
        for section in sectionlist:
            self.data += f'.SH "{section.upper()}"' + "\n"
            self.output_highlight(sections[section])

    def out_enum(self, fname, name, args):
        name = args.get('enum', '')
        parameterlist = args.get('parameterlist', [])
        sectionlist = args.get('sectionlist', [])
        sections = args.get('sections', {})
        self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
        self.data += ".SH NAME\n"
        self.data += f"enum {args['enum']} \\- {args['purpose']}\n"
        self.data += ".SH SYNOPSIS\n"
        self.data += f"enum {args['enum']}" + " {\n"
        count = 0
        for parameter in parameterlist:
            self.data += f'.br\n.BI " {parameter}"' + "\n"
            if count == len(parameterlist) - 1:
                self.data += "\n};\n"
            else:
                self.data += ", \n.br\n"
            count += 1
        self.data += ".SH Constants\n"
        for parameter in parameterlist:
            parameter_name = KernRe(r'\[.*').sub('', parameter)
            self.data += f'.IP "{parameter}" 12' + "\n"
            self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
        for section in sectionlist:
            self.data += f'.SH "{section}"' + "\n"
            self.output_highlight(sections[section])

    def out_typedef(self, fname, name, args):
        module = self.modulename
        typedef = args.get('typedef')
        purpose = args.get('purpose')
        sectionlist = args.get('sectionlist', [])
        sections = args.get('sections', {})
        self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
        self.data += ".SH NAME\n"
        self.data += f"typedef {typedef} \\- {purpose}\n"
        for section in sectionlist:
            self.data += f'.SH "{section}"' + "\n"
            self.output_highlight(sections.get(section))

    def out_struct(self, fname, name, args):
        module = self.modulename
        struct_type = args.get('type')
        struct_name = args.get('struct')
        purpose = args.get('purpose')
        definition = args.get('definition')
        sectionlist = args.get('sectionlist', [])
        parameterlist = args.get('parameterlist', [])
        sections = args.get('sections', {})
        parameterdescs = args.get('parameterdescs', {})
        self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
        self.data += ".SH NAME\n"
        self.data += f"{struct_type} {struct_name} \\- {purpose}\n"
        # Replace tabs with two spaces and handle newlines
        declaration = definition.replace("\t", "  ")
        declaration = KernRe(r"\n").sub('"\n.br\n.BI "', declaration)
        self.data += ".SH SYNOPSIS\n"
        self.data += f"{struct_type} {struct_name} " + "{" + "\n.br\n"
        self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
        self.data += ".SH Members\n"
        for parameter in parameterlist:
            if parameter.startswith("#"):
                continue
            parameter_name = re.sub(r"\[.*", "", parameter)
            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
                continue
            self.data += f'.IP "{parameter}" 12' + "\n"
            self.output_highlight(parameterdescs.get(parameter_name))
        for section in sectionlist:
            self.data += f'.SH "{section}"' + "\n"
            self.output_highlight(sections.get(section))
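
ManFormat.__init__() and ManFormat.output_highlight() above encode two troff-specific details: the .TH date comes from KBUILD_BUILD_TIMESTAMP, trying each entry of date_formats in turn and falling back to the current date, and any output line that starts with "." is prefixed with the zero-width escape \& so troff does not treat it as a request. The sketch below restates both as plain functions for experimentation. It is not part of kdoc_output.py: man_date() and escape_troff_lines() are hypothetical helpers, and the symbol highlighting done by highlight_block() is deliberately left out.

```python
import os
from datetime import datetime

# Candidate formats, tried in order (same list as ManFormat.date_formats).
DATE_FORMATS = [
    "%a %b %d %H:%M:%S %Z %Y",
    "%a %b %d %H:%M:%S %Y",
    "%Y-%m-%d",
    "%b %d %Y",
    "%B %d %Y",
    "%m %d %Y",
]


def man_date(tstamp=None):
    """Return the "Month Year" string used in a .TH header.

    tstamp defaults to the KBUILD_BUILD_TIMESTAMP environment variable.
    The first format in DATE_FORMATS that parses it wins; if none does,
    or no timestamp is set, the current date is used instead.
    """
    if tstamp is None:
        tstamp = os.environ.get("KBUILD_BUILD_TIMESTAMP")
    dt = None
    if tstamp:
        for fmt in DATE_FORMATS:
            try:
                dt = datetime.strptime(tstamp, fmt)
                break
            except ValueError:
                continue
    if dt is None:
        dt = datetime.now()
    return dt.strftime("%B %Y")


def escape_troff_lines(text):
    """Prepare already-highlighted text for a man page.

    Leading whitespace is stripped from each line and empty lines are
    dropped. A line starting with "." gets the zero-width escape \\&
    in front, so troff prints it instead of treating it as a request.
    """
    out = []
    for line in text.strip("\n").split("\n"):
        line = line.lstrip()
        if not line:
            continue
        if line.startswith("."):
            line = "\\&" + line
        out.append(line)
    return "".join(item + "\n" for item in out)
```

With Python's default C locale for month names, man_date("2025-04-09") returns "April 2025"; an unparsable timestamp silently falls back to the current month, mirroring the constructor above.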

1715
scripts/lib/kdoc/kdoc_parser.py Executable file

File diff suppressed because it is too large

273
scripts/lib/kdoc/kdoc_re.py Executable file

@@ -0,0 +1,273 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
"""
Regular expression ancillary classes.
These help cache regular expressions and perform matching for kernel-doc.
"""
import re
# Local cache for regular expressions
re_cache = {}


class KernRe:
    """
    Helper class to simplify regex declaration and usage.

    It calls re.compile for a given pattern. It also allows adding
    regular expressions and defining sub at class init time.

    Regular expressions can be cached via an argument, helping to speed up
    searches.
    """

    def _add_regex(self, string, flags):
        """
        Adds a new regex or reuses it from the cache.

        The cache is keyed on both the pattern and its flags.
        """
        key = (string, flags)
        if key in re_cache:
            self.regex = re_cache[key]
        else:
            self.regex = re.compile(string, flags=flags)
            if self.cache:
                re_cache[key] = self.regex

    def __init__(self, string, cache=True, flags=0):
        """
        Compile a regular expression and initialize internal vars.
        """
        self.cache = cache
        self.last_match = None
        self._add_regex(string, flags)

    def __str__(self):
        """
        Return the regular expression pattern.
        """
        return self.regex.pattern

    def __add__(self, other):
        """
        Allows adding two regular expressions into one.
        """
        return KernRe(str(self) + str(other), cache=self.cache or other.cache,
                      flags=self.regex.flags | other.regex.flags)

    def match(self, string):
        """
        Handles a re.match, storing its results.
        """
        self.last_match = self.regex.match(string)
        return self.last_match

    def search(self, string):
        """
        Handles a re.search, storing its results.
        """
        self.last_match = self.regex.search(string)
        return self.last_match

    def findall(self, string):
        """
        Alias to re.findall
        """
        return self.regex.findall(string)

    def split(self, string):
        """
        Alias to re.split
        """
        return self.regex.split(string)

    def sub(self, sub, string, count=0):
        """
        Alias to re.sub
        """
        return self.regex.sub(sub, string, count=count)

    def group(self, num):
        """
        Returns the group results of the last match
        """
        return self.last_match.group(num)


class NestedMatch:
    """
    Finding nested delimiters is hard with regular expressions. It is
    even harder in Python with its standard re module, which lacks
    several advanced regular expression features, such as recursive
    patterns.

    This is the case of this pattern:

            '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;'

    which is used to properly match the open/close parentheses of a
    STRUCT_GROUP() search string.

    This class counts pairs of delimiters, using them to match and
    replace nested expressions.

    The original approach was suggested by:
        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex

    I re-implemented it, though, to make it more generic and to match
    three types of delimiters. The logic checks if delimiters are paired.
    If not, it will ignore the search string.
    """

    # TODO: make NestedMatch handle multiple match groups
    #
    # Right now, regular expressions to match it are defined only up to
    # the start delimiter, e.g.:
    #
    #       \bSTRUCT_GROUP\(
    #
    # is similar to: STRUCT_GROUP\((.*)\)
    # except that the content inside the match group is delimiter-aligned.
    #
    # The content inside the parentheses is converted into a single replace
    # group (e.g. r'\1').
    #
    # It would be nice to change such definition to support multiple
    # match groups, allowing a regex equivalent to:
    #
    #       FOO\((.*), (.*), (.*)\)
    #
    # It is probably easier to define it not as a regular expression, but
    # with some lexical definition like:
    #
    #       FOO(arg1, arg2, arg3)

    DELIMITER_PAIRS = {
        '{': '}',
        '(': ')',
        '[': ']',
    }

    RE_DELIM = re.compile(r'[\{\}\[\]\(\)]')

    def _search(self, regex, line):
        """
        Finds paired blocks for a regex that ends with a delimiter.

        The suggestion of using finditer to match pairs came from:
        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
        but I ended up using a different implementation to align all three
        types of delimiters and to seek an initial regular expression.

        The algorithm seeks open/close paired delimiters and places them
        onto a stack. When the stack is emptied, it yields the match start,
        the position just past the opening delimiter and the position just
        past the closing delimiter.

        The algorithm should work fine for properly paired lines, but will
        silently ignore end delimiters that precede a start delimiter.
        This should be OK for the kernel-doc parser, as unaligned delimiters
        would cause compilation errors. So, we don't need to raise exceptions
        to cover such issues.
        """
        stack = []
        for match_re in regex.finditer(line):
            start = match_re.start()
            offset = match_re.end()
            d = line[offset - 1]
            if d not in self.DELIMITER_PAIRS:
                continue
            end = self.DELIMITER_PAIRS[d]
            stack.append(end)
            for match in self.RE_DELIM.finditer(line[offset:]):
                pos = match.start() + offset
                d = line[pos]
                if d in self.DELIMITER_PAIRS:
                    end = self.DELIMITER_PAIRS[d]
                    stack.append(end)
                    continue
                # Does the end delimiter match what is expected?
                if stack and d == stack[-1]:
                    stack.pop()
                    if not stack:
                        yield start, offset, pos + 1
                        break

    def search(self, regex, line):
        """
        This is similar to re.search:

        It matches a regex that is followed by a delimiter,
        returning occurrences only if all delimiters are paired.
        """
        for t in self._search(regex, line):
            yield line[t[0]:t[2]]

    def sub(self, regex, sub, line, count=0):
        """
        This is similar to re.sub:

        It matches a regex that is followed by a delimiter,
        replacing occurrences only if all delimiters are paired.

        If r'\1' is used, it works just like re: it places there the
        matched paired data with the delimiter stripped.

        If count is nonzero, it will replace at most count items.
        """
        out = ""
        cur_pos = 0
        n = 0
        for start, end, pos in self._search(regex, line):
            out += line[cur_pos:start]
            # Value, ignoring start/end delimiters
            value = line[end:pos - 1]
            # Replaces \1 in the sub string, if \1 is used there
            new_sub = sub
            new_sub = new_sub.replace(r'\1', value)
            out += new_sub
            # Drop the trailing ';', if any (the match may end the line)
            if pos < len(line) and line[pos] == ';':
                pos += 1
            cur_pos = pos
            n += 1
            if count and n >= count:
                break
        # Append the remaining string
        l = len(line)
        out += line[cur_pos:l]
        return out
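
The following is a condensed, standalone restatement of the stack-based technique NestedMatch implements, meant for experimenting with patterns outside the kernel tree; it is a sketch, not the kernel's code, and find_paired() and nested_sub() are hypothetical names. Like NestedMatch, it assumes each regex ends with an opening delimiter and treats a literal \1 in the replacement as the text between the delimiters. Two choices differ on purpose from NestedMatch._search above: a fresh stack is started for every regex match, so one unbalanced occurrence does not hide later ones, and matches that begin inside an already-replaced span are skipped.

```python
import re

# Opening delimiter -> closing delimiter it must be paired with.
PAIRS = {"{": "}", "(": ")", "[": "]"}
DELIM = re.compile(r"[{}\[\]()]")


def find_paired(regex, line):
    """Yield (start, body_start, end) for each match of regex that ends in
    an opening delimiter whose partner appears later in line.

    start:      index where the regex match begins
    body_start: index just after the opening delimiter
    end:        index just after the matching closing delimiter

    Matches whose delimiters never balance are skipped. Closing delimiters
    that do not match the innermost open one are ignored.
    """
    for m in regex.finditer(line):
        opener = line[m.end() - 1]
        if opener not in PAIRS:
            continue
        stack = [PAIRS[opener]]  # fresh stack for every regex match
        for d in DELIM.finditer(line, m.end()):
            ch = d.group()
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif ch == stack[-1]:
                stack.pop()
                if not stack:
                    yield m.start(), m.end(), d.end()
                    break


def nested_sub(regex, repl, line, count=0):
    """Replace every balanced match with repl.

    A literal \\1 in repl becomes the text between the delimiters, and a
    ';' right after the closing delimiter is dropped. With count > 0, at
    most count replacements are made.
    """
    out, cur, n = [], 0, 0
    for start, body_start, end in find_paired(regex, line):
        if start < cur:  # begins inside a span that was already replaced
            continue
        out.append(line[cur:start])
        out.append(repl.replace(r"\1", line[body_start:end - 1]))
        if end < len(line) and line[end] == ";":
            end += 1
        cur = end
        n += 1
        if count and n >= count:
            break
    out.append(line[cur:])
    return "".join(out)
```

For example, nested_sub(re.compile(r"\bSTRUCT_GROUP\("), r"\1", "int a; STRUCT_GROUP(int b; int c[2];); int d;") returns "int a; int b; int c[2]; int d;": the group wrapper and its trailing ';' disappear while the nested brackets inside it are left intact.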