libt3highlight
The syntax highlighting of libt3highlight is highly configurable. In the following sections the syntax of the highlighting description files is detailed. libt3highlight uses the PCRE2 library for regular expression matching. See the documentation of the PCRE2 library (either the local pcre2pattern manpage, or the online documentation) for details on the regular expression syntax. All features of the PCRE2 library are available, with the exception of the \G assertion.
libt3highlight uses the libt3config library for storing the highlighting description files. For the most part, the syntax of the files will be self-explanatory, but if you need more details, you can find them in the libt3config documentation.
A complete highlighting description file for libt3highlight consists of a file format specifier, which must have the value 1 or 2, an optional list of named highlight definitions which can be used elsewhere, and a list of highlight definitions constituting the highlighting. A simple example, which marks any text from a hash sign (#) up to the end of the line as a comment, looks like this:
    format = 1

    %highlight {
        start = "#"
        end = "$"
        style = "comment"
    }
From the libt3config documentation: to include the delimiting quote character in a string, repeat it (i.e. 'foo''bar' encodes the string foo'bar). Multiple strings may be concatenated by using a plus sign (+). To split a string across multiple lines, use string concatenation using the plus sign, where a plus sign must appear before the newline after each substring.
To make it easier to reuse (parts of) highlighting description files, other files can be included. To include a file, use %include = "file.lang". Either absolute path names may be used, or paths relative to the include directories. The include directories are the per user data directory (see above) and the default libt3highlight data directory (usually /usr/share/libt3highlight-VERSION or /usr/local/share/libt3highlight-VERSION). Files meant to be included by other files should not contain a format key. Only files intended to be used as complete language definitions should include the format key.
A highlight definition can have three forms: a single matching item using the regex key, a state definition using the start and end keys, and a reference to a named highlight using the use key.
To define items like keywords and other simple items which can be described using a single regular expression, a highlight can be defined using the regex key. The style can be selected using the style key. For example:
    %highlight {
        regex = '\b(?:int|float|bool)\b'
        style = "keyword"
    }
will ensure that the words int, float and bool will be styled as keywords.
A state definition uses the start and end regular-expression keys. Once the start regular expression is matched, everything up to and including the first text matching the (optional) end regular expression is styled using the style selected with the style key. If the text matching the start and end regexes must be styled differently from the rest of the text, the delim-style key can be used.
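As a minimal sketch of the delim-style key, a double-quoted string whose quote characters are styled differently from its contents might look as follows. The choice of the misc style for the delimiters is only an illustration and is not taken from an existing language file, and the # comment line assumes the libt3config comment syntax:

```
# Illustrative only: the contents get "string", the two quote characters get "misc".
%highlight {
    start = '"'
    end = '"'
    style = "string"
    delim-style = "misc"
}
```

With such a definition, the text between the quotes would be styled as string, while the text matched by the start and end regexes would be styled as misc.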
In format 2 files, the start regex is allowed to match the empty string. However, there may not be cycles of states with empty-matching start patterns. In format 1 files, or files which have the allow-empty-start top-level boolean set to false (only valid in format 2 files), the start regex is not allowed to match the empty string. In that case it is still legal to write regexes which could match the empty string, but only the first non-empty match is considered.
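The following fragment sketches how the top of a format 2 file that disallows empty-matching start patterns might look. It assumes that the boolean is written as a plain top-level key next to format, as the description above suggests:

```
# Assumed layout: both keys at the top level of the file.
format = 2
allow-empty-start = false
```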
A state definition can also have sub-highlights. This is done by simply adding %highlight sections inside the highlight definition. If the sub-highlights are to be matched before trying to match the end regex, make sure that the first %highlight definition occurs before the end definition.
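The ordering matters when a sub-highlight and the end regex can match at the same position. As a sketch (an assumed example, not taken from a shipped language file), consider single-quoted strings in which a doubled quote stands for a literal quote, as in libt3config strings. The sub-highlight for the doubled quote is listed before end, so it is tried first and the string is not terminated halfway:

```
%highlight {
    start = "'"
    # Listed before "end", so '' is tried before the closing quote.
    %highlight {
        regex = "''"
        style = "string-escape"
    }
    end = "'"
    style = "string"
}
```

If the sub-highlight were listed after end instead, the first quote of a doubled pair would presumably be taken as the end of the string.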
Finally, a state may be defined as nested, which means that when the start regex occurs while the state is already active, it will match again and the state will be entered again. This means that to return to the initial state, the end regex will have to match twice or more, depending on the nesting level. As is the case with the end regex, if the start regex is to be tried before the sub-highlights, it must be included before the first sub-highlight definition.
As an example, which includes nesting, look at the following definition for a Bourne-shell variable. Shell variables start with ${, and end with }. However, if the } is preceded by a backslash (\), it is not considered to end the variable reference. Furthermore, a dollar sign preceded by a backslash is not considered to start a nested variable reference. Therefore, a sub-highlight is defined that matches all occurrences of a backslash and another character. Because the search for the next match is started from the end of the last match, a backslash followed by a dollar sign or a closing curly brace will never match the start or end regex, unless there are two (or any even number of) backslashes before it.
    %highlight {
        start = '\$\{'
        %highlight {
            regex = '\\.'
        }
        end = '\}'
        style = "variable"
        nested = yes
    }
Sometimes a state is delimited by a symbol that is not known ahead of time. Examples of these are Shell here-docs, Perl strings using the q/qq/m/s etc. operators, and Lua comments. To accommodate these situations, it is possible to use a named subpattern in the start pattern, which can be extracted for use in the end pattern. To make use of this, the state definition should contain the key extract, to tell libt3highlight the name of the substring to be extracted. For example, here is a section of the here-doc definition for the Shell language:
    %highlight {
        start = '<<\s*(?<delim>\w+)'
        extract = "delim"
        end = '^(?&delim)$'
        style = "string"
    }
This uses the PCRE2 named sub-pattern syntax, as described in the pcre2pattern(3) man page. Note that this is a relatively expensive operation, because the end pattern has to be created on the fly. It is therefore inadvisable to use this for patterns which can also be written using fixed patterns.
Sometimes it is desirable to exit from more than one state, or to have more than one end pattern. To this end, each highlight is allowed to have an exit key, which specifies how many states to exit. The default for end patterns is one, and for non-state highlights it is zero. By setting the exit key to one for a non-state highlight, you effectively create an extra end pattern.
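A minimal sketch of an extra end pattern, using a made-up construct that is opened by a parenthesis and closed either by the matching parenthesis or by a semicolon. The construct and the style names are assumptions chosen for illustration only:

```
%highlight {
    start = '\('
    # Non-state highlight with exit = 1: acts as a second end pattern.
    %highlight {
        regex = ';'
        exit = 1
    }
    end = '\)'
    style = "misc"
}
```

Setting exit to a larger value would leave correspondingly more states at once, which can be useful when a sub-state and its enclosing state are terminated by the same text.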
To match complex state-based elements, libt3highlight provides an extra feature. When a start pattern is matched, additional states can be put on the stack. These additional states can then be used, for example, to allow an item to be matched once without leaving the state that was started. An example of where this is useful is the Perl s operator. The s operator allows any character to be used as a delimiter, although commonly the '/' character is used. However, this character is used three times, to delimit two different strings, for example s/abc/def/. To match this, an extra state can be used:
    %highlight {
        start = '\bs(?<delim>.)'
        extract = "delim"
        %on-entry {
            end = '(?&delim)'
        }
        end = '(?&delim)'
        style = 'string'
    }
Note that the on-entry key is a list of states, which will be pushed onto the stack. Thus the last element in the on-entry list will be active after the start pattern matched.
In an on-entry element, the end, highlight, style, delim-style, exit and use entries are valid. Their meaning is the same as for normal state definitions. The end pattern may be a dynamic pattern, using the named sub-pattern that was extracted from the start pattern that caused the on-entry state to be created.
It is possible to create named highlights. These must be defined by creating one or more %define sections. The %define sections must contain named sections which contain %highlight definitions. For example:
    %define {
        types {
            %highlight {
                regex = '\b(?:int|float|bool)\b'
                style = "keyword"
            }
        }
        hash-comment {
            %highlight {
                start = '#'
                end = '$'
                style = "comment"
            }
        }
    }
will define a named highlight types and a highlight named hash-comment, which can be used as follows:
    %highlight {
        use = "types"
    }
    %highlight {
        use = "hash-comment"
    }
There is no check for multiple highlights with the same name, and only the first defined highlight with a certain name is used.
As shown in the previous section, the style to be used for highlighting items in the text is determined by a string value. Although the names are not strictly standardized, it is important for the proper functioning of programs using libt3highlight to use the same names for styling across different highlighting description files. Therefore, this section lists the names of styles to be used, with a short description of what they are intended for.
normal
Standard text that is not highlighted.
keyword
Keywords in the language, and items that are perceived by the user as keywords. An example of the latter is the NULL keyword in the C language, which is not a keyword but a constant defined in a header file. However, it is used so pervasively that it is perceived as a keyword by many.
string
String and character constants.
string-escape
Escape sequences in string and character constants, where appropriate.
comment
Comments.
comment-keyword
For highlighting items within comments. This is mainly to be used when the comments themselves have a specified structure. Examples of this are C++ Doxygen comments and Javadoc comments.
number
Numerical constants.
variable
Variable references in languages in which they are recognisable as such. Examples are Shell and Perl scripts, in which variable references are introduced by special characters.
error
Explicitly highlight syntax errors. Use sparingly, and only when it is absolutely certain that the syntax is incorrect.
addition
Used in diff output for additions.
deletion
Used in diff output for deletions.
misc
Highlighting of items not covered by the above. An example of where this is used is C-preprocessor directives.
This list may be extended in the future. However, because libt3highlight is also used for highlighting in environments where the display possibilities are limited, the number of styles will remain small.
This section lists useful tips and tricks for writing highlight files.
To make it easier to embed a complete language into another, it is useful to write the whole language definition as a named highlight definition. This definition should be put in a separate file, and a new file, which simply includes the definition file and a single highlight definition to use the named highlight, should be created. See the definition of the C language in c.lang as an example.
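The two-file layout described above could be sketched as follows. The file names mylang-def.lang and mylang.lang, the highlight name mylang and the keyword list are hypothetical and only serve as an illustration; the # comment lines assume the libt3config comment syntax:

```
# File: mylang-def.lang (hypothetical). Meant to be included, so it has no format key.
%define {
    mylang {
        %highlight {
            regex = '\b(?:if|else|while)\b'
            style = "keyword"
        }
        %highlight {
            start = '#'
            end = '$'
            style = "comment"
        }
    }
}

# File: mylang.lang (hypothetical). The complete language definition.
format = 1
%include = "mylang-def.lang"
%highlight {
    use = "mylang"
}
```

Another language that wants to embed this one would then include mylang-def.lang as well and refer to the mylang highlight with the use key inside one of its own states.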
The difficulty with C-style strings is that they can be continued on the next line by including a backslash as the last character on the line. However, the backslash is also used to escape characters in the string, such as the double-quote character which would otherwise terminate the string. The final difficulty is that the highlighting should stop at the end of the line if it is not preceded by a backslash.
The first step is to create a state started by a double-quote character. In this state we define a highlight to match escape sequences. We also have to create an end regex. This consists of either a double-quote, or the end of line. However, the end of line must only match if the last character before the end of the line is not a backslash. But we must also take into account the fact that there may not be any character left on the line. We could use a lookbehind assertion, but that would also match a backslash we have already matched previously using the sub-highlight.
Instead, we create an extra state, started by a backslash followed by the end of the line. This state is then exited when the new line is started:
    %highlight {
        start = '"'
        %highlight {
            regex = '\\.'
            style = "string-escape"
        }
        %highlight {
            start = '\\$'
            end = '^'
        }
        end = '"|$'
        style = "string"
    }
By entering a new sub-state, we avoid matching the end pattern. Thus the string is continued on the next line.