Anglr VSIX extension

Extension description

Anglr VSIX is the official Visual Studio extension for the Anglr language and toolchain. It integrates tightly with the Anglr compiler, MSBuild tasks, and project system to deliver a unified editing and build experience.

Basically, this is a Language Server Protocol implementation for the Anglr language (*.anglr) with some additions.

It is best to use it together with the NuGet package AnglrMSBuildTasks.

How it Works

VSIX Installation — Install the extension from the Visual Studio Marketplace. If you obtained the file AnglrMSBuildTasks.vsix in some other way, install it with VSIXInstaller from the Developer Command Prompt.
Build Integration — When you create a C# project in which you want to use *.anglr files, first install the AnglrMSBuildTasks NuGet package in that project so that the Anglr compiler integrates into the project's build system.
VSIX Activation — The VSIX extension is activated as soon as you open or add the first *.anglr file in your C# project.
Build Invocation — At build time, all *.anglr files are compiled automatically along with the other files in the project. The files that the Anglr compiler generates from *.anglr files are compiled in the same build.

The extension provides:

Language awareness — content types, file associations, syntax classification
Editor tooling — semantic coloring, lightweight IntelliSense, classifier‑based structure hints
Build integration — automatic discovery of Anglr files, MSBuild item registration, and invocation of the Anglr compiler during project builds
Developer workflow enhancements — error surfacing, output routing, and optional integration with Visual Studio’s Output Window

Anglr VSIX is designed to be lightweight, dependency‑free, and safe for the Visual Studio environment, making it suitable for both standalone Anglr projects and mixed‑language solutions.

Anglr description

What is Anglr

Anglr means several things:

Anglr is a language
Anglr is a compiler
Anglr is a Visual Studio extension
Anglr is a NuGet package

What is Anglr used for

Simply put, Anglr is used for processing structured text:

Language — The syntactic rules for the structure of the text we want to analyze (basically, a CFG grammar for this text) are written as EBNF rules and stored in a .anglr file. The Anglr language also has some additional rules that can greatly improve the readability of syntax rules. The Anglr language is therefore an extension of the EBNF language.
Compiler — At build time, each *.anglr file is translated by the Anglr compiler. The result is a set of files containing the implementation of the parser (and much else) for the structured text.
Editor — The VSIX extension improves the editing experience for *.anglr files with syntax classification, semantic coloring, error detection, a structured view, etc.
Text Complexity — Structured text can represent something as simple as a telephone directory or as complex as the source file of a programming language.
Syntax Complexity — With the Anglr language, one can describe the syntax of things as simple as a telephone directory or as complex as the source file of a programming language.

What Anglr Generates (Core Capabilities Overview)

Anglr is, popularly speaking, just another compiler-compiler: a compiler or parser generator for context‑free grammars. From a single .anglr grammar file, Anglr produces a complete, strongly‑typed parsing and analysis pipeline:

Generalized LR parser — For every CFG grammar contained in a .anglr file, the Anglr compiler generates a parser capable of performing syntactic analysis of any sentence (in the grammatical sense) of the language associated with that grammar. Example: for an arbitrary CFG grammar of the C# programming language (for every context-free language there are, in theory, infinitely many grammars), the Anglr compiler creates a generalized LR parser capable of performing syntactic analysis of any C# program.
Scanners and Lexers — In .anglr files, not only are CFG grammars for structured text defined, but also scanners and lexical analyzers for the given text. Scanners extract pieces of text from the source text, while lexical analyzers use them as intended. Lexical analyzers are used by parsers for syntactic analysis of text.
Incremental parsers — Efficient parsers that update only the affected parts of the syntax tree when the source changes, which makes them ideal for IDE scenarios and real‑time tooling.
Precedence grammars — Syntactic rules that are composed in an appropriate manner or marked with special attributes can be translated into precedence rules.
Iteration preference — For syntactic rules that are composed in an appropriate way, iterative visitors are generated. This prevents stack overflow when handling long, recursively composed data such as long lists.
Parse Tree — The most important product of the parser is certainly the parse tree. For each successfully analyzed structured text, the parser builds a concrete parse tree, which can be used for semantic analysis, code generation, and other transformations of the source text.
Typed parse‑tree objects — Each syntactic rule produces a dedicated node type for every node of the generated parse tree, giving you a structured, navigable representation of the grammar.
Semantic analysis infrastructure — Anglr generates objects that support semantic processing through an event‑driven model. Each syntactic rule has an associated semantic event, triggered when the visitor processes the corresponding node.
Visitor pattern support — Anglr emits a complete visitor API for traversing syntax trees, enabling transformations, validations, and code generation.
Source regeneration — Any syntax tree produced by the parser can be converted back into source code, preserving formatting rules defined by the grammar.
Example Parser — For each successfully translated CFG grammar, the Anglr compiler prepares the smallest possible example program that can translate text belonging to the language generated by that grammar. For a CFG grammar of C#, for example, the Anglr compiler prepares a short program that can translate any C# program. This program also includes a visitor with a test semantic object for the syntax tree of the compiled text; in the C# case, the visitor walks the syntax tree of each successfully compiled C# program.

This makes Anglr suitable not only for language design, but also for DSLs, code generators, configuration languages, and domain‑specific modeling tools.
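
The typed nodes, semantic events, and iterative visitors listed above can be pictured with a small Python sketch. It is not the code that Anglr generates (that code is C#); the names Node, EventVisitor, on, and long_list are hypothetical and exist only for illustration. The sketch fires one event per rule name while walking the tree with an explicit stack, so a very deep tree does not exhaust the call stack.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical parse-tree node; Anglr's generated node types will differ."""
    rule: str                       # name of the syntactic rule that produced the node
    text: str = ""                  # token text for leaf nodes
    children: List["Node"] = field(default_factory=list)


class EventVisitor:
    """Walks a tree with an explicit stack and fires one event per rule name."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[Node], None]]] = {}

    def on(self, rule: str, handler: Callable[[Node], None]) -> None:
        """Registers a semantic handler for nodes produced by `rule`."""
        self.handlers.setdefault(rule, []).append(handler)

    def visit(self, root: Node) -> None:
        stack = [root]              # explicit stack instead of recursion
        while stack:
            node = stack.pop()
            for handler in self.handlers.get(node.rule, []):
                handler(node)
            # push children reversed so they are visited left to right
            stack.extend(reversed(node.children))


def long_list(n: int) -> Node:
    """Builds a right-nested list of n items, which is n levels deep."""
    root = Node("list")
    current = root
    for i in range(n):
        rest = Node("list")
        current.children = [Node("item", str(i)), rest]
        current = rest
    return root


seen: List[str] = []
visitor = EventVisitor()
visitor.on("item", lambda node: seen.append(node.text))
tree = long_list(50_000)            # far deeper than Python's default recursion limit
visitor.visit(tree)
print(len(seen), seen[:3])
```

A recursive visitor over the same tree would raise RecursionError in Python, which is the kind of failure the iteration preference is meant to avoid.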

❗Since the Anglr compiler generates a generalized LR parser, there is no need to struggle with making the grammar definition conflict-free. However, it is a good idea to try to do this. Due to conflict resolution, generalized LR parsers are slower. Therefore, it is good to write a grammar that contains as few conflicting syntactic rules as possible.
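
To get a feeling for why conflicting rules cost time, the following Python sketch counts the parse trees that an ambiguous rule of the form E : E '+' E | NUMBER admits for a chain of operands. The function name parse_count is hypothetical, and the sketch says nothing about how Anglr resolves conflicts internally: a generalized LR parser shares work between alternatives instead of enumerating trees one by one. It only shows how quickly the number of alternatives grows.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_count(operands: int) -> int:
    """Number of parse trees for `operands` numbers joined by '+'
    under the ambiguous rule  E : E '+' E | NUMBER."""
    if operands == 1:
        return 1                    # a single NUMBER has exactly one tree
    # pick the '+' that becomes the root: k operands go left, the rest go right
    return sum(parse_count(k) * parse_count(operands - k)
               for k in range(1, operands))


for n in (2, 3, 5, 10):
    print(n, parse_count(n))
```

A left-recursive rule such as E : T | E '+' T admits exactly one tree for the same input, so the parser has no alternatives to track.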

Structure of *.anglr File

Every *.anglr file is composed of an arbitrary number of Anglr file parts:

General Part — In this section, things that are important for the entire *.anglr file are defined.
Declaration Part — In this section, the terminal symbols and regular expressions that are used in other parts of the *.anglr file are defined.
Scanner Part — In this part, the scanner is defined. It discovers terminal symbols in the source text using regular expressions.
Lexer Part — In this part, a lexical analyzer is defined. It consists of an arbitrary number of scanners. One of them is the initial one, namely the one that is marked as such or the one that is listed first if there is no scanner marked as initial one.
Parser Part — In this part, the grammar of the language containing instances of structured text is described. The grammar is defined by the set of syntactic rules contained in this part. If the grammar describes the structure of a telephone directory, for example, the language contains all possible text representations of telephone directories, and the syntax rules are used to check the validity of a specific telephone directory.

Each *.anglr file can contain any number of the mentioned parts: there can be multiple general, declaration, scanner, lexer, and parser parts. Usually, however, the general part is not needed, and a file contains one declaration part, several scanner parts, one lexer part, and one parser part.

For each part, the Anglr compiler generates one or more files with source code that is associated with that part.

❗In *.anglr files, all the symbols that we use must be defined, even string and character constants used in syntax rules to denote terminal symbols.

Structure of *.anglr File Part

Each part of an .anglr file is composed similarly:

Attributes — Each part begins with any number, even zero, of attributes. Attributes are composed similarly to those in C#. Some attributes are mandatory, while others can be added by the user according to their own wishes and needs. The only important thing is that they are composed in accordance with the syntactic rules of *.anglr files.
Part Body — The body of a part follows the attribute declarations. It consists of a declarator, a name, and the content of the part, which appears between %{ and %}. The declarators are %general, %declarations, %scanner, %lexer, and %parser. Names of parts of the same type must be different.

Examples

Basic Example of *.anglr File

Here’s an example that demonstrates how to declare tokens and regular expressions, scanner definitions, lexical analyzer definition and parser definition for a simple calculator that allows basic arithmetic operations: addition, subtraction, multiplication, and division:


[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[
	CompilationInfo
		ClassName='MathDecls'
		NameSpace='Math.Declarations'
		Access='public'
]
%declarations mathDecls
%{
	%regex
	{
		decimal-digit [0-9]
		number {decimal-digit}+
		add \+
		sub \-
		mul \*
		div \/
		lb \(
		rb \)
	}

	%terminal
	{
		NUMBER
		add '+'
		sub '-'
		mul '*'
		div '/'
		lb '('
		rb ')'
	}
%}

[ Description Text='definition of scanner, which extracts comments from input string']
[ Declarations Id='mathDecls' ]
[
	CompilationInfo
		ClassName='CommentRegex'
		NameSpace='Math.ScannerLib'
		Access='public'
]
%scanner commentScanner
%{
[\*]+\/
	pop
[\n\r]
	skip
[^\*]+
	skip
[\*]+
	skip
%}

[ Description Text='definition of scanner, which extracts terminal symbols from input string']
[ Declarations Id='mathDecls' ]
[
	CompilationInfo
		ClassName='MathRegex'
		NameSpace='Math.ScannerLib'
		Access='public'
]
%scanner mathScanner
%{
\/\*
	push commentScanner
{number}
	terminal NUMBER
{add}
	terminal add
{sub}
	terminal sub
{mul}
	terminal mul
{div}
	terminal div
{lb}
	terminal lb
{rb}
	terminal rb
[ \t]+
	skip
[\n\r]
	skip
.
	skip
%}

[ Description Text='Lexer for anglr file' Hover='true' ]
[
	UseScanner
		ScannerId='commentScanner'
		InitialScanner='mathScanner'
		Hover='true'
]
[
	CompilationInfo
		ClassName='MathLexer'
		NameSpace='Math.Lexer'
		Access='public'
		Hover='true'
		CodeDir='.'
]
%lexer mathLexer
%{

%}

[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[
	CompilationInfo
		ClassName='MathParser'
		NameSpace='Math.Parser'
		Access='public'
		CodeDir='mathParser'
]
%parser mathParser
%{

[ Start ]
expression
	:	additive-expression
	;

additive-expression
	:	multiplicative-expression
	|	additive-expression '+' multiplicative-expression
	|	additive-expression '-' multiplicative-expression
	;

multiplicative-expression
	:	unary-expression
	|	multiplicative-expression '*' unary-expression
	|	multiplicative-expression '/' unary-expression
	;

unary-expression
	:	number
	|	'(' expression ')'
	;

number
	:	NUMBER
	|	'+' number
	|	'-' number
	;

%}

As we can see, the file is composed of the following parts:

one declaration part: %declarations mathDecls
two scanner parts: %scanner commentScanner, %scanner mathScanner
one lexer part: %lexer mathLexer
and one parser part: %parser mathParser

It has no general part.
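
How the two scanner parts cooperate can be sketched in Python. The rule tables below mirror mathScanner and commentScanner from the example; everything else is an assumption made for illustration: the names SCANNERS and tokenize are hypothetical, and the matching policy (longest match, with the earlier rule winning ties) is the usual lex convention, not something the example confirms for Anglr.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str, Optional[str]]   # (regular expression, action, argument)

# Rule tables mirroring the two scanner parts of the example
SCANNERS = {
    "mathScanner": [
        (r"/\*", "push", "commentScanner"),
        (r"[0-9]+", "terminal", "NUMBER"),
        (r"\+", "terminal", "add"),
        (r"-", "terminal", "sub"),
        (r"\*", "terminal", "mul"),
        (r"/", "terminal", "div"),
        (r"\(", "terminal", "lb"),
        (r"\)", "terminal", "rb"),
        (r"[ \t]+", "skip", None),
        (r"[\n\r]", "skip", None),
        (r".", "skip", None),
    ],
    "commentScanner": [
        (r"\*+/", "pop", None),
        (r"[\n\r]", "skip", None),
        (r"[^*]+", "skip", None),
        (r"\*+", "skip", None),
    ],
}


def tokenize(text: str, initial: str = "mathScanner") -> List[Tuple[str, str]]:
    """Returns (terminal, lexeme) pairs; the scanner stack starts with `initial`."""
    stack = [initial]
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best = None
        for pattern, action, arg in SCANNERS[stack[-1]]:
            match = re.compile(pattern).match(text, pos)
            # assumed policy: longest match wins, earlier rule wins ties
            if match and (best is None or match.end() > best[0].end()):
                best = (match, action, arg)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        match, action, arg = best
        if action == "terminal":
            tokens.append((arg, match.group()))
        elif action == "push":
            stack.append(arg)       # switch to the pushed scanner
        elif action == "pop":
            stack.pop()             # return to the previous scanner
        pos = match.end()           # "skip" simply consumes the text
    return tokens


print(tokenize("1 + /* two */ 2"))
```

The comment text never reaches the token list: while commentScanner is on top of the stack, every rule either skips input or pops back to mathScanner.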

We can also observe that the syntax rules are written so that they take the priority of arithmetic operators into account. This is an example of a grammar that can be converted into a precedence grammar.
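
The way the rule layering encodes operator priority can be checked with a hand-written Python evaluator that follows the same rules one to one. This is a plain recursive-descent sketch, not the generalized LR parser that Anglr generates, and the function evaluate and its helpers are hypothetical names. Left recursion in additive-expression and multiplicative-expression is rendered as a loop, which keeps the grouping left-associative.

```python
import re
from typing import List


def evaluate(text: str) -> float:
    """Evaluates an arithmetic expression using the rule layering of the example."""
    tokens: List[str] = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def additive() -> float:        # additive-expression
        value = multiplicative()
        while peek() in ("+", "-"):
            if take() == "+":
                value += multiplicative()
            else:
                value -= multiplicative()
        return value

    def multiplicative() -> float:  # multiplicative-expression
        value = unary()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= unary()
            else:
                value /= unary()
        return value

    def unary() -> float:           # unary-expression
        if peek() == "(":
            take()
            value = additive()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return number()

    def number() -> float:          # number : NUMBER | '+' number | '-' number
        token = take()
        if token == "+":
            return number()
        if token == "-":
            return -number()
        if not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

    result = additive()             # expression : additive-expression
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("1 + 2 * 3"), evaluate("(1 + 2) * 3"), evaluate("8 - 3 - 2"))
```

Because multiplicative-expression sits below additive-expression, multiplication binds tighter than addition without any explicit priority table.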

Based on this information, the Anglr compiler generates a set of C# source files that implement the lexical and syntactic analyzers for the arithmetic expressions defined in the .anglr file. In addition to these files, it generates files that can be used to implement the semantic analysis of arithmetic expressions.

Unlike other similar tools, Anglr does not allow source code of the target language to be mixed into *.anglr files. As a result, the generated code can easily be reused in multiple applications.

The form of syntactic rules used in this example is called the canonical form. All syntactic rules in the Anglr language can be expressed in canonical form, regardless of how complicated the language we want to describe is.

Advanced Example

We can also write syntactic rules in a more compact, though more complex, form, e.g.:


expression
	:
		( : basic-operand :
			NUMBER
			|	'(' expression ')'
			|	'+' basic-operand
			|	'-' basic-operand
		)
		+ [ '<<' | '>>' ]
		+ [ '&' ]
		+ [ '^' ]
		+ [ '|' ]
		+ [ '*' | '/' ]
		+ [ '+' | '-' ]
		~+ [ '=' | '+=' | '-=' | '*=' | '/=' | '|=' | '^=' | '&=' | '<<=' | '>>=' ]
		+ [ '<' | '>' | '<=' | '>=' ]
		+ [ '==' | '!=' ]
		+ [ '&&' ]
		+ [ '||' ]
		+ [ ';' ]
	;

This is an arithmetic expression similar to the previous one, except that it defines many more operators. The order in which the operators are listed (essentially, their nesting) determines their priority. Most modern programming languages define a comparable precedence ladder for their operators, although the exact order varies between languages. It may not look the most natural, but that is how it is.

It is an example of a precedence grammar in which some operators are left-associative and others are right-associative.

This example also shows some syntactic specifics of the Anglr language:

Nested Syntax Rules — Example: basic-operand is defined as a named nested syntax rule. If the name (basic-operand) were not specified, this would be an unnamed nested syntax rule.
Cardinality Operators — They define lists of elements, the associativity rules between the elements of a list, and the delimiters between them. For example, + [ '<<' | '>>' ] means: take any number of basic-operands, delimit them with the operators '<<' or '>>', and group them according to left-associativity rules. The text between [ and ] is an unnamed syntax rule. This nesting can be recursive; an example is the definition of the Anglr grammar itself, which consists of a single large, recursively nested syntax rule.
Precedence Grammars — The associativity rules of cardinality operators in the Anglr language determine the priority of the operators listed in their unnamed nested syntax rules. For example, the operators '<<' and '>>' have the highest priority, while the operator '||' has the lowest. All of them are left-associative, except the assignment operators ('=', '+=', etc.), which are right-associative.
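
The difference between left and right associativity can be made concrete with a short Python sketch. It assumes, following the description above, that + groups a delimited list from the left and that ~+ groups it from the right. The functions fold_left and fold_right are hypothetical helpers, not part of any Anglr API, and the nested tuples stand in for parse-tree nodes.

```python
from typing import List, Union

Tree = Union[str, tuple]            # a leaf name or a (left, operator, right) triple


def fold_left(operands: List[Tree], operators: List[str]) -> Tree:
    """Groups a op b op c as ((a op b) op c)."""
    tree = operands[0]
    for op, rhs in zip(operators, operands[1:]):
        tree = (tree, op, rhs)
    return tree


def fold_right(operands: List[Tree], operators: List[str]) -> Tree:
    """Groups a op b op c as (a op (b op c))."""
    tree = operands[-1]
    for op, lhs in zip(reversed(operators), reversed(operands[:-1])):
        tree = (lhs, op, tree)
    return tree


# shift operators, assumed left-associative:  (a << b) >> c
print(fold_left(["a", "b", "c"], ["<<", ">>"]))
# assignment operators, assumed right-associative:  x = (y += z)
print(fold_right(["x", "y", "z"], ["=", "+="]))
```

Nesting one such list inside another, as the advanced example does, is what turns the order of the cardinality operators into a priority order.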

As noted for the previous example, this case can also be expressed in canonical form, but the notation would be considerably longer (and perhaps less clear). Before Anglr translates syntax rules, it quietly converts them into canonical form, since it can only translate syntax rules in canonical form.

Resources

The official website for Anglr is linked below. It is still under construction.

AnglrLangExtension

Angstlr