Changes to the tags file format

F kind usage

You cannot use F (file) kind in your .ctags because Universal Ctags reserves it. See ctags-incompatibilities(7).

Reference tags

Traditionally ctags collects the information for locating where a language object is DEFINED.

In addition Universal Ctags supports reference tags. If the extra-tag r is enabled, Universal Ctags also collects the information for locating where a language object is REFERENCED. This feature was proposed by @shigio in #569 for GNU GLOBAL.

The following examples use an input file named reftag.c:

#include <stdio.h>
#include "foo.h"
#define TYPE point
struct TYPE { int x, y; };
TYPE p;
#undef TYPE

Traditional output:

$ ./ctags -o - reftag.c
TYPE        reftag.c        /^#define TYPE /;"      d       file:
TYPE        reftag.c        /^struct TYPE { int x, y; };$/;"        s       file:
p   reftag.c        /^TYPE p;$/;"   v       typeref:typename:TYPE
x   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:
y   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:

Output with the extra-tag r enabled:

$ ./ctags --list-extras | grep ^r
r   Include reference tags  off
$ ./ctags -o - --extras=+r reftag.c
TYPE        reftag.c        /^#define TYPE /;"      d       file:
TYPE        reftag.c        /^#undef TYPE$/;"       d       file:
TYPE        reftag.c        /^struct TYPE { int x, y; };$/;"        s       file:
foo.h       reftag.c        /^#include "foo.h"/;"   h
p   reftag.c        /^TYPE p;$/;"   v       typeref:typename:TYPE
stdio.h     reftag.c        /^#include <stdio.h>/;" h
x   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:
y   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:

The #undef TYPE line and the two #include lines are newly collected.

“roles” is a newly introduced field in Universal Ctags. The field records how a tag is referenced. If a tag is a definition tag, the roles field has “def” as its value.

Universal Ctags prints the role information when the r field is enabled with --fields=+r.

$  ./ctags -o - --extras=+r --fields=+r reftag.c
TYPE        reftag.c        /^#define TYPE /;"      d       file:
TYPE        reftag.c        /^#undef TYPE$/;"       d       file:   roles:undef
TYPE        reftag.c        /^struct TYPE { int x, y; };$/;"        s       file:   roles:def
foo.h       reftag.c        /^#include "foo.h"/;"   h       roles:local
p   reftag.c        /^TYPE p;$/;"   v       typeref:typename:TYPE   roles:def
stdio.h     reftag.c        /^#include <stdio.h>/;" h       roles:system
x   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:   roles:def
y   reftag.c        /^struct TYPE { int x, y; };$/;"        m       struct:TYPE     typeref:typename:int    file:   roles:def
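
A client tool reading this output could separate definition tags from reference tags by looking at the roles field. The following is a minimal Python sketch, not part of ctags. The helper names parse_tag_line and is_reference are hypothetical. It assumes tab-separated fields, a pattern that does not itself contain `;"` followed by a tab, and that a tag without a roles field is a definition tag.

```python
def parse_tag_line(line):
    """Split one tags-file line into name, file, pattern and extension fields."""
    # The extension fields start after the ;" marker that closes the pattern.
    head, marker, tail = line.rstrip("\n").partition(';"\t')
    name, path, pattern = head.split("\t", 2)
    fields = {}
    if marker:
        for item in tail.split("\t"):
            if ":" in item:
                key, value = item.split(":", 1)
                fields[key] = value
            else:
                fields["kind"] = item  # a bare item is taken as the kind letter
    return {"name": name, "file": path, "pattern": pattern, "fields": fields}


def is_reference(tag):
    """Assumption: a missing roles field or the value "def" means a definition."""
    return tag["fields"].get("roles", "def") != "def"


lines = [
    'TYPE\treftag.c\t/^#undef TYPE$/;"\td\tfile:\troles:undef',
    'p\treftag.c\t/^TYPE p;$/;"\tv\ttyperef:typename:TYPE\troles:def',
    'stdio.h\treftag.c\t/^#include <stdio.h>/;"\th\troles:system',
]
for tag in map(parse_tag_line, lines):
    label = "reference" if is_reference(tag) else "definition"
    print(tag["name"], tag["fields"].get("roles"), label)
```

Splitting at the `;"` marker first keeps spaces inside the pattern intact, because only the trailing extension fields are split on tabs.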

The reference tag marker field, R, exists specifically to meet a GNU GLOBAL requirement; D is used for the traditional definition tags, and R is used for the new reference tags. The field can be used only with --_xformat.

$ ./ctags -x --_xformat="%R %-16N %4n %-16F %C" --extras=+r reftag.c
D TYPE                3 reftag.c         #define TYPE point
D TYPE                4 reftag.c         struct TYPE { int x, y; };
D p                   5 reftag.c         TYPE p;
D x                   4 reftag.c         struct TYPE { int x, y; };
D y                   4 reftag.c         struct TYPE { int x, y; };
R TYPE                6 reftag.c         #undef TYPE
R foo.h               2 reftag.c         #include "foo.h"
R stdio.h             1 reftag.c         #include <stdio.h>

See Customizing xref output for more details about --_xformat.

Although the facility for collecting reference tags is implemented, only a few parsers currently utilize it. All available roles can be listed with --list-roles:

$ ./ctags --list-roles
#LANGUAGE      KIND(L/N)         NAME                ENABLED DESCRIPTION
SystemdUnit    u/unit            Requires            on      referred in Requires key
SystemdUnit    u/unit            Wants               on      referred in Wants key
SystemdUnit    u/unit            After               on      referred in After key
SystemdUnit    u/unit            Before              on      referred in Before key
SystemdUnit    u/unit            RequiredBy          on      referred in RequiredBy key
SystemdUnit    u/unit            WantedBy            on      referred in WantedBy key
Yaml           a/anchor          alias               on      alias
DTD            e/element         attOwner            on      attributes owner
Automake       c/condition       branched            on      used for branching
Cobol          S/sourcefile      copied              on      copied in source file
Maven2         g/groupId         dependency          on      dependency
DTD            p/parameterEntity elementName         on      element names
DTD            p/parameterEntity condition           on      conditions
LdScript       s/symbol          entrypoint          on      entry points
LdScript       i/inputSection    discarded           on      discarded when linking
...

The first column shows the name of the parser. The second column shows the letter/name of the kind. The third column shows the name of the role. The fourth column shows whether the role is enabled or not. The fifth column shows the description of the role.
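
The column layout described above can be read mechanically. The following Python sketch is only an illustration: parse_roles_listing is a hypothetical helper, and it assumes that columns are separated by runs of whitespace and that only the description column may contain spaces.

```python
def parse_roles_listing(text):
    """Turn --list-roles output into a list of dicts, one per role."""
    roles = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header line and elision markers.
        if not stripped or stripped.startswith("#") or stripped == "...":
            continue
        # The description may contain spaces, so split at most four times.
        language, kind, name, enabled, description = line.split(None, 4)
        letter, kind_name = kind.split("/", 1)
        roles.append({
            "language": language,
            "kind_letter": letter,
            "kind_name": kind_name,
            "role": name,
            "enabled": enabled == "on",
            "description": description,
        })
    return roles


listing = """\
#LANGUAGE      KIND(L/N)         NAME                ENABLED DESCRIPTION
SystemdUnit    u/unit            Requires            on      referred in Requires key
Yaml           a/anchor          alias               on      alias
LdScript       i/inputSection    discarded           on      discarded when linking
"""
for role in parse_roles_listing(listing):
    print(role["language"], role["kind_letter"], role["role"], role["enabled"])
```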

You can define a role in an optlib parser for capturing reference tags. See Capturing reference tags for more details.

--roles-<LANG>.<KIND> is the option for enabling/disabling specified roles.

Pseudo-tags

See ctags-client-tools(7) about the concept of the pseudo-tags.

TAG_KIND_DESCRIPTION

This is a newly introduced pseudo-tag. It is not emitted by default. It is emitted only when --pseudo-tags=+TAG_KIND_DESCRIPTION is given.

This is for describing kinds; their letter, name, and description are enumerated in the tag.

ctags emits TAG_KIND_DESCRIPTION with the following format:

!_TAG_KIND_DESCRIPTION!{parser}   {letter},{name} /{description}/

A backslash and a slash in {description} are each escaped with a backslash.
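
A client tool could decode such a pseudo-tag as sketched below. This is a hedged Python illustration, not code from ctags: parse_kind_description is a hypothetical helper, the two sample lines are made up for the example, and the sketch assumes tab-separated columns, a !_TAG_KIND_DESCRIPTION! prefix, and a description enclosed in exactly one pair of slashes.

```python
import re

PREFIX = "!_TAG_KIND_DESCRIPTION!"


def parse_kind_description(line):
    """Decode one TAG_KIND_DESCRIPTION pseudo-tag line."""
    tag, kind, pattern = line.rstrip("\n").split("\t")[:3]
    if not tag.startswith(PREFIX):
        raise ValueError("not a TAG_KIND_DESCRIPTION pseudo-tag")
    letter, name = kind.split(",", 1)
    # Drop the enclosing slashes, then undo the backslash escaping:
    # a backslash followed by any character stands for that character.
    description = re.sub(r"\\(.)", r"\1", pattern[1:-1])
    return {
        "parser": tag[len(PREFIX):],
        "letter": letter,
        "name": name,
        "description": description,
    }


# Made-up sample lines; the second one contains an escaped slash.
samples = [
    "!_TAG_KIND_DESCRIPTION!C\td,macro\t/macro definitions/",
    "!_TAG_KIND_DESCRIPTION!Demo\tx,example\t/read\\/write paths/",
]
for sample in samples:
    print(parse_kind_description(sample))
```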

TAG_KIND_SEPARATOR

This is a newly introduced pseudo-tag. It is not emitted by default. It is emitted only when --pseudo-tags=+TAG_KIND_SEPARATOR is given.

This is for describing separators placed between two kinds in a language.

Tag entries including the separators are emitted when --extras=+q is given; fully qualified tags contain the separators. The separators are used in scope information, too.

ctags emits TAG_KIND_SEPARATOR with the following format:

!_TAG_KIND_SEPARATOR!{parser}   {sep}   /{upper}{lower}/

or

!_TAG_KIND_SEPARATOR!{parser}   {sep}   /{lower}/

Here {parser} is the name of the language, e.g. PHP. {lower} is the letter representing the kind of the lower item. {upper} is the letter representing the kind of the upper item. {sep} is the separator placed between the upper item and the lower item.

The format without {upper} is for representing a root separator. The root separator is used as prefix for an item which has no upper scope.

* given as {upper} is a fallback wild card; if it is given, the {sep} is used in combination with any upper item and the item specified with {lower}.

Each backslash character used in {sep} is escaped with an extra backslash character.

Example output:

$ ./ctags -o - --extras=+p --pseudo-tags=  --pseudo-tags=+TAG_KIND_SEPARATOR input.php
!_TAG_KIND_SEPARATOR!PHP    ::      /*c/
...
!_TAG_KIND_SEPARATOR!PHP    \\      /c/
...
!_TAG_KIND_SEPARATOR!PHP    \\      /nc/
...

The first line means :: is used when combining something with an item of the class kind.

The second line means \\ is used when a class item is at the top level; no upper item is specified.

The third line means \\ is used for combining a namespace item (upper) and a class item (lower).

Of course, ctags uses the more specific line when choosing a separator; the third line has higher priority than the first.
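
The selection rule can be sketched from the client side as follows. This Python example is an illustration under stated assumptions, not the ctags implementation: parse_separator_line and choose_separator are hypothetical helpers, columns are assumed to be tab-separated, and all lines are assumed to belong to one parser.

```python
PREFIX = "!_TAG_KIND_SEPARATOR!"


def parse_separator_line(line):
    """Return (parser, upper, lower, sep) for one TAG_KIND_SEPARATOR line."""
    tag, sep, pattern = line.rstrip("\n").split("\t")[:3]
    parser = tag[len(PREFIX):]
    sep = sep.replace("\\\\", "\\")  # undo the doubling of backslashes
    kinds = pattern.strip("/")
    if len(kinds) == 2:
        upper, lower = kinds[0], kinds[1]
    else:
        upper, lower = None, kinds  # no upper kind: a root separator
    return parser, upper, lower, sep


def choose_separator(table, upper, lower):
    """Prefer the exact (upper, lower) entry, then the "*" wild card."""
    if upper is None:
        return table.get((None, lower))
    for key in ((upper, lower), ("*", lower)):
        if key in table:
            return table[key]
    return None


lines = [
    "!_TAG_KIND_SEPARATOR!PHP\t::\t/*c/",
    "!_TAG_KIND_SEPARATOR!PHP\t\\\\\t/c/",
    "!_TAG_KIND_SEPARATOR!PHP\t\\\\\t/nc/",
]
separators = {}
for line in lines:
    _parser, upper, lower, sep = parse_separator_line(line)
    separators[(upper, lower)] = sep

print(choose_separator(separators, "n", "c"))   # namespace above a class
print(choose_separator(separators, "f", "c"))   # falls back to the wild card
print(choose_separator(separators, None, "c"))  # root separator
```

Keying the table by the (upper, lower) pair makes the priority rule a plain ordered lookup: the exact pair is tried before the wild-card entry.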

TAG_OUTPUT_FILESEP

This pseudo-tag represents the separator used in file names: slash or backslash. This is always ‘slash’ on Unix-like environments. This is also ‘slash’ by default on Windows; however, when --output-format=e-ctags or --use-slash-as-filename-separator=no is specified, it becomes ‘backslash’.

TAG_OUTPUT_MODE

This pseudo-tag represents output mode: u-ctags or e-ctags. This is controlled by --output-format option.

See also Compatible output and weakness.

Truncating the pattern for long input lines

See --pattern-length-limit=N option in ctags(1).

Parser specific fields

A tag has a name, an input file name, and a pattern as basic information. Some fields like language:, signature:, etc. are attached to the tag as optional information.

In Exuberant Ctags, fields are common to all languages. Universal Ctags extends the concept of fields; a parser can define its specific field. This extension was proposed by @pragmaware in #857.

For implementing the parser specific fields, the options for listing and enabling/disabling fields are also extended.

In the output of --list-fields, the owner of the field is printed in the LANGUAGE column:

$ ./ctags --list-fields
#LETTER NAME            ENABLED LANGUAGE         XFMT  DESCRIPTION
...
-       end             off     C                TRUE   end lines of various constructs
-       properties      off     C                TRUE   properties (static, inline, mutable,...)
-       end             off     C++              TRUE   end lines of various constructs
-       template        off     C++              TRUE   template parameters
-       captures        off     C++              TRUE   lambda capture list
-       properties      off     C++              TRUE   properties (static, virtual, inline, mutable,...)
-       sectionMarker   off     reStructuredText TRUE   character used for declaring section
-       version         off     Maven2           TRUE   version of artifact

e.g. reStructuredText is the owner of the sectionMarker field and both C and C++ own the end field.

--list-fields takes one optional argument, LANGUAGE. If it is given, --list-fields prints only the fields for that parser:

$ ./ctags --list-fields=Maven2
#LETTER NAME            ENABLED LANGUAGE        XFMT  DESCRIPTION
-       version         off     Maven2          TRUE  version of artifact

A parser specific field only has a long name, no letter. For enabling/disabling such fields, the name must be passed to --fields-<LANG>.

e.g. for enabling the sectionMarker field owned by the reStructuredText parser, use the following command line:

$ ./ctags --fields-reStructuredText=+{sectionMarker} ...

The wild card notation can be used for enabling/disabling parser specific fields, too. The following example enables all fields owned by the C++ parser.

$ ./ctags --fields-C++='*' ...

* can also be used for specifying languages.

The next example is for enabling end fields for all languages which have such a field.

$ ./ctags --fields-'*'=+'{end}' ...
...

When the wild card notation is used to specify the language, as in this case, not only fields owned by parsers but also common fields having the specified name (end in this example) are enabled/disabled.

Using the wild card notation to specify the language is helpful to avoid incompatibilities between versions of Universal Ctags itself (SELF INCOMPATIBLY).

In Universal Ctags development, a parser developer may add a new parser specific field for a certain language. Sometimes other developers then recognize it is meaningful not only for the original language but also other languages. In this case the field may be promoted to a common field. Such a promotion will break the command line compatibility for --fields-<LANG> usage. The wild card for <LANG> will help in avoiding this unwanted effect of the promotion.

With respect to the tags file format, nothing is changed when introducing parser specific fields; <fieldname>:<value> is used as before and the name of field owner is never prefixed. The language: field of the tag identifies the owner.

Parser specific extras

As the man page of Exuberant Ctags says, the --extras option specifies whether to include extra tag entries for certain kinds of information. This option is available in Universal Ctags, too.

In Universal Ctags it is extended; a parser can define its specific extra flags. They can be controlled with --extras-<LANG>=[+|-]{...}.

See some examples:

$ ./ctags --list-extras
#LETTER NAME                   ENABLED LANGUAGE         DESCRIPTION
F       fileScope              TRUE    NONE             Include tags ...
f       inputFile              FALSE   NONE             Include an entry ...
p       pseudo                 FALSE   NONE             Include pseudo tags
q       qualified              FALSE   NONE             Include an extra ...
r       reference              FALSE   NONE             Include reference tags
g       guest                  FALSE   NONE             Include tags ...
-       whitespaceSwapped      TRUE    Robot            Include tags swapping ...

See the LANGUAGE column. NONE means the extra flags are language independent (common). They can be enabled or disabled with --extras= as before.

Look at whitespaceSwapped. Its language is Robot. This flag is enabled by default but can be disabled with --extras-Robot=-{whitespaceSwapped}.

$ cat input.robot
*** Keywords ***
it's ok to be correct
    Python_keyword_2

$ ./ctags -o - input.robot
it's ok to be correct       input.robot     /^it's ok to be correct$/;"     k
it's_ok_to_be_correct       input.robot     /^it's ok to be correct$/;"     k

$ ./ctags -o - --extras-Robot=-'{whitespaceSwapped}' input.robot
it's ok to be correct       input.robot     /^it's ok to be correct$/;"     k

When disabled, the name it's_ok_to_be_correct is not included in the tags output. In other words, the name it's_ok_to_be_correct is derived from the name it's ok to be correct when the extra flag is enabled.
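
The derivation can be pictured with a few lines of Python. This is only a sketch of the idea shown in the example output: whitespace_swapped is a hypothetical helper, and the assumption that the swap is limited to replacing spaces with underscores may not cover everything the Robot parser actually does.

```python
def whitespace_swapped(name):
    """Return the derived extra name, or None when swapping changes nothing."""
    swapped = name.replace(" ", "_")
    return swapped if swapped != name else None


original = "it's ok to be correct"
names = [original]
extra = whitespace_swapped(original)
if extra is not None:
    names.append(extra)  # this second entry is the extra tag
print(names)
```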

Discussion

(This subsection should move to somewhere for developers.)

The question is what extra tag entries are. As far as I know, no one has answered this explicitly. I have two ideas for Universal Ctags. I write “ideas”, not “definitions”, here because existing parsers don’t follow the ideas. They are kept as is for various reasons, but the ideas may be a good guide for people who want to write a new parser or extend an existing parser.

The first idea is that a tag entry whose name appears in the input file as is, is NOT an extra. (If you want to control the inclusion of such entries, the classical --kinds-<LANG>=[+|-]... is what you want.)

Qualified tags, whose inclusion is controlled by --extras=+q, are explained well by this idea. Let’s see an example:

$ cat input.py
class Foo:
    def func (self):
        pass

$ ./ctags -o - --extras=+q --fields=+E input.py
Foo input.py        /^class Foo:$/;"        c
Foo.func    input.py        /^    def func (self):$/;"      m       class:Foo       extra:qualified
func        input.py        /^    def func (self):$/;"      m       class:Foo

Foo and func are in input.py, so they are not extra tags. On the other hand, Foo.func is not in input.py as is. The name is generated by ctags as a qualified extra tag entry. The whitespaceSwapped extra flag of the Robot parser also fits this idea well.

I don’t say all parsers follow this idea.

$ cat input.cc
class A
{
  A operator+ (int);
};

$ ./ctags --kinds-all='*' --fields= -o - input.cc
A   input.cc        /^class A$/
operator +  input.cc        /^  A operator+ (int);$/

In this example, operator+ is in input.cc. On the other hand, operator + is in the ctags output as a non-extra tag entry. Note the whitespace between the keyword operator and the + operator. This is an exception to the first idea.

The second idea is that if the inclusion of a tag cannot be controlled well with --kinds-<LANG>=[+|-]..., the tag may be an extra.

$ cat input.c
static int foo (void)
{
        return 0;
}
int bar (void)
{
        return 1;
}

$ ./ctags --sort=no -o - --extras=+F input.c
foo input.c /^static int foo (void)$/;"     f       typeref:typename:int    file:
bar input.c /^int bar (void)$/;"    f       typeref:typename:int

$ ./ctags -o - --extras=-F input.c
bar input.c /^int bar (void)$/;"    f       typeref:typename:int

$

The function foo is included only when the F extra flag is enabled. Both foo and bar are functions. Their inclusion can be controlled with the f kind of the C language: --kinds-C=[+|-]f.

The difference between the static modifier and the implicit extern modifier in a function definition is handled by the F extra flag.

Basically the concept kind is for handling the kinds of language objects: functions, variables, macros, types, etc. The concept extra can handle the other aspects like scope (static or extern).

However, a parser developer can take another approach instead of introducing a parser specific extra; one can prepare staticFunction and exportedFunction as kinds of one’s parser. The second idea is just a guide; the parser developer must decide on a suitable approach for the target language.

Anyway, in the second idea, --extras is for controlling inclusion of tags. If what you want is not about inclusion, --param-<LANG> can be used as the last resort.

Parser specific parameter

To control the details of a parser, the --param-<LANG> option is introduced. --kinds-<LANG>, --fields-<LANG>, and --extras-<LANG> can be used for customizing the behavior of a parser specified with <LANG>.

--param-<LANG> should be used for aspects of the parser that those options (kinds, fields, extras) cannot handle well.

A parser defines a set of parameters. Each parameter has a name and takes an argument. A user can set a parameter with the following notation:

--param-<LANG>:name=arg

An example of specifying a parameter:

--param-CPreProcessor:if0=true

Here if0 is the name of a parameter of the CPreProcessor parser and true is its value.
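
The notation breaks down into three parts, which the following Python sketch makes explicit. It is an illustration only: parse_param_option is a hypothetical helper, and it follows the colon-separated form shown in this section rather than anything taken from the ctags sources.

```python
def parse_param_option(option):
    """Split --param-<LANG>:name=arg into (language, name, argument)."""
    prefix = "--param-"
    if not option.startswith(prefix):
        raise ValueError("not a --param-<LANG> option: " + option)
    # Everything after the first "=" is the argument.
    target, _, arg = option[len(prefix):].partition("=")
    # The language and the parameter name are separated by a colon.
    lang, _, name = target.partition(":")
    if not lang or not name:
        raise ValueError("expected --param-<LANG>:name=arg")
    return lang, name, arg


print(parse_param_option("--param-CPreProcessor:if0=true"))
```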

All available parameters can be listed with --list-params option.

$ ./ctags --list-params
#PARSER         NAME     DESCRIPTION
CPreProcessor   if0      examine code within "#if 0" branch (true or [false])
CPreProcessor   ignore   a token to be specially handled

(At this time only CPreProcessor parser has parameters.)