Friday, August 23, 2013

Syntax Modification using Microsoft Roslyn

In the previous two posts we just used the analysis capabilities of Microsoft Roslyn. In other words, we only get information about the code under inspection, we never changed it. In this post we going to change the code through Roslyn.

Before we start we need to highlight that data structures that represent code under inspection in Roslyn APIs are immutable. This means that it cannot be changed after they are created (in order to safely share them between tools). This applies to syntax trees, compilations, symbols, semantic models, and every other data structure in the Roslyn API. Instead of modification, new objects are created based on specified differences to the old ones.

Creating and Modifying Trees

We used SyntaxTree before to create syntax trees from a code snippet. For creating a syntax tree bit-by-bit, you have to compose syntax nodes hierarchically in a bottom-up fashion. To create SyntaxNodes you must use the Syntax class factory methods. For each kind of node, token, or trivia there is a factory method which can be used to create an instance of that type.

In the following example we going to replace a using directive by a new one we create.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;
using Roslyn.Services;
using Roslyn.Services.CSharp;

namespace RoslynDemo4
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections;
using System.Linq;
using System.Text;

namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");

var root = (CompilationUnitSyntax)tree.GetRoot();
// Get the old Using directive [using System.Collections;]
var oldUsing = root.Usings[1];

// Construct the new Using directive
NameSyntax name = Syntax.IdentifierName("System");
name = Syntax.QualifiedName(name, Syntax.IdentifierName("Collections"));
name = Syntax.QualifiedName(name, Syntax.IdentifierName("Generic"));
var newUsing = oldUsing.WithName(name);

// Replace the old Using directive by the new Using directive
root = root.ReplaceNode(oldUsing, newUsing);

SyntaxTree newTree = SyntaxTree.Create(root);
Console.WriteLine(newTree.ToString());
Console.ReadLine();
}
}
}

NameSyntax is the base class for the following names that can appear in C# code:


  • IdentifierNameSyntax which represents simple single identifier names like System and Roslyn
  • GenericNameSyntax which represents a generic type or method name such as List<int>
  • QualifiedNameSyntax which represents a qualified name of the form <left-name>.<right-identifier-or-generic-name> such as System.IO

In our example we constructed a NameSyntax to represent System.Collections.Generic incrementally. Then we used the ReplaceNode method to replace the using System.Collections; directive with the using System.Collections.Generic; directive. We then constructed a new SyntaxTree using the modified root and print it. You should see the modified HelloWorld program:fileFProjectsRoslynDemo4RoslynDemo4binDebugRoslynDemo4.EXE_2013-08-23_09-38-56


Modifying Syntax Trees using SyntaxRewriters


In a previous post we implemented our SyntaxWalker. Now we going to use the same technique and implement our SyntaxRewriter. Both SyntaxWalker and SyntaxRewriter are subclasses of SyntaxVisitor which recursively visit every node in the syntax tree starting from the passed node. The SyntaxRewriter class can be used to apply a transformation to a specific type of SyntaxNode. It is also possible to apply a set of transformations to multiple types of SyntaxNode wherever they appear in a syntax tree. It uses depth-first search algorithm for SyntaxTree traversal.


In the following example we will implement our own SyntaxRewriter which will replace all variable declaration types with the var type. Create a new Console application and add a new class myRewriter to it. Edit the myRewriter.cs to look like:

using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo5
{
class myRewriter : SyntaxRewriter
{
private readonly SemanticModel SemanticModel;

public myRewriter(SemanticModel semanticModel)
{
this.SemanticModel = semanticModel;
}

public override SyntaxNode VisitLocalDeclarationStatement(LocalDeclarationStatementSyntax node)
{
// To avoid the case of multiple variables in a single declaration like the following one
// Type variable1 = expression1,
// variable2 = expression2;
if (node.Declaration.Variables.Count > 1)
{
return node;
}

// To avoid the case of case of a declation without an initializer like the following
// Type variable;
if (node.Declaration.Variables[0].Initializer == null)
{
return node;
}

// Get the first varaible in the declation
VariableDeclaratorSyntax declarator = node.Declaration.Variables.First();
// Get the Type used in the declaration
TypeSyntax variableTypeName = node.Declaration.Type;
// Bind the Type used in the declaration to a Symbol
TypeSymbol variableType = (TypeSymbol)SemanticModel.GetSymbolInfo(variableTypeName).Symbol;
// Get the whole intialization expression
TypeInfo initializerInfo = SemanticModel.GetTypeInfo(declarator.Initializer.Value);
if (variableType == initializerInfo.Type)
{
// Constructing a new Type with the same leading/trailing in the old type to maintain vertical whitespace and indentation
TypeSyntax varTypeName = Syntax.IdentifierName("var").
WithLeadingTrivia(
variableTypeName.GetLeadingTrivia()
).
WithTrailingTrivia(
variableTypeName.GetTrailingTrivia()
);
// Do the replacement and return the new node
return node.ReplaceNode(variableTypeName, varTypeName);
}
else
{
return node;
}
}
}
}


myRewriter class is simply inheriting from SyntaxRewriter class and implments the VisitLocalDeclarationStatment() method. This method is called by the Visit method on SyntaxRewriter, which recursively visits a node and each of its children. The VisitLocalDeclarationStatment method is called on local declaration statements only. It tokes the starting SyntaxNode, if you passed the tree root (as we will do) it will visit the the local declaration statements that exists in the whole program.


Our rewriter takes the semantic model under inspection in the constructor.


In the VisitLocalDeclarationStatment method we check the declaration statement to filter out the cases that are not within our interest. For these cases we return the node unchanged.


For the cases we interested in, we retrieve the variable and the type used for its declaration. Get a symbol for the type and the whole initialization expression.


Then we check that the type used in the declaration is the type used in the initialization to avoid the situations where the declaration may be casting the initializer expression to a base class or interface. Removing the explicit type in these cases would change the semantics of a program.


Then we construct a new TypeSyntax representing the var type with any leading or trailing tivia (white spaces and tabs) to conserve the indentation. Then we replace the node and return the new node.


Let’s go back to our Main method in the Program.cs file. In order to use the myRewriter class, edit the Main method to look like:

 static void Main(string[] args)
{
// Create a test Compilation of our current project
Compilation test = CreateTestCompilation();

// Loop on each SyntaxTree in the Compilation
foreach (SyntaxTree sourceTree in test.SyntaxTrees)
{
// Get the tree SemanticModel
SemanticModel model = test.GetSemanticModel(sourceTree);
// intialize myRewriter using the tree SemanticModel
myRewriter rewriter = new myRewriter(model);
// put our rewriter into action
SyntaxNode newSource = rewriter.Visit(sourceTree.GetRoot());

if (newSource != sourceTree.GetRoot())
Console.WriteLine(newSource.GetText().ToString());
Console.ReadLine();
}
}

Here we create a test Compilation using CreateTestCompilation method, which we will add soon. We then loop for each SyntaxTree in this test Compilation. For each SyntaxTree we get the SemanticModel and pass it to our myRewriter constructor. Then we call the Visit() method on the SyntaxTree root. This will cause our VisitLocalDeclarationStatement method to be called and do its syntax transformation task. If the new SyntaxTree is different than the old SyntaxTree, this means that we did syntax modifications and we print the new code.


Now it add the following method to your code inside Program.cs.

private static Compilation CreateTestCompilation()
{
SyntaxTree programTree = SyntaxTree.ParseFile(@"..\..\Program.cs");
SyntaxTree rewriterTree = SyntaxTree.ParseFile(@"..\..\myRewriter.cs");

List sourceTrees = new List();
sourceTrees.Add(programTree);
sourceTrees.Add(rewriterTree);

MetadataReference mscorlib = MetadataReference.CreateAssemblyReference("mscorlib");
MetadataReference roslynCompilers = MetadataReference.CreateAssemblyReference("Roslyn.Compilers");
MetadataReference csCompiler = MetadataReference.CreateAssemblyReference("Roslyn.Compilers.CSharp");

List references = new List();
references.Add(mscorlib);
references.Add(roslynCompilers);
references.Add(csCompiler);

return Compilation.Create("TransformationCS", null, sourceTrees, references);

}

This method basically creates two SyntaxTrees of our project code. Then make a list of the required references for these source code files. Then it sum it all in the factory method Compilation.Create() and return the resulted Compilation object.


When you run the project, you will see a modified version of the Program.cs file. Notice that our myRewriter class replaced the type used in all variables declarations to be of type var.


fileFProjectsRoslynDemo5RoslynDemo5binDebugRoslynDemo5.EXE_2013-08-23_13-28-40


pressing Enter will bring the modified version of myRewriter.cs file, which have the same kind of change.


fileFProjectsRoslynDemo5RoslynDemo5binDebugRoslynDemo5.EXE_2013-08-23_17-33-15



We have used the Roslyn Compiler APIs to write our own SyntaxRewriter that searches all files in a C# project for certain syntactic pattern, analyzes the semantics of source code that matches that pattern, and transforms it.


Now you can grab a paper and pencil, and start designing the next big thing in refactoring tools industry :)

Thursday, August 22, 2013

Semantic Analysis using Microsoft Roslyn

In a previous post we talked about using Microsoft Rosyln Syntax API to deal with syntax text in terms of SyntaxTrees and SyntaxNodes. But as we we all know, a single source code or code snippet can’t make a useful program. 99% of the time we end up with many source code files that depend on so many externals: assembly references, namespace imports, or other code files. The meaning (semantic) of SyntaxNodes depend heavily on these externals and may change due changes in these externals even if its enclosing source code file have not been changed.

The Compilation class help us deal with source text in the context of its dependents and externals. A Compilation is analogous to a single project as seen by the compiler and represents everything needed to compile a Visual Basic or C# program such as assembly references, compiler options, and the set of source files to be compiled. With this context you can reason about the meaning of code. Compilations allow you to find Symbols – entities such as types, namespaces, members, and variables which names and other expressions refer to. The process of associating names and expressions with Symbols is called Binding.

Creating a Compilation

In this following example we will create a SyntaxTree for our traditional HelloWorld program. Then we created a Compilation out of this SyntaxTree and added a reference to MS Core Library.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;
using Roslyn.Services;
using Roslyn.Services.CSharp;

namespace RoslynDemo3
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Text;

namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");

var root = (CompilationUnitSyntax)tree.GetRoot();

Compilation compilation = Compilation.Create("HelloWorld")
.AddReferences(MetadataReference.CreateAssemblyReference("mscorlib"))
.AddSyntaxTrees(tree);

foreach (MetadataReference reference in compilation.ExternalReferences)
Console.WriteLine(reference.Display);

Console.ReadLine();
}
}
}

You can supply all the syntax trees, assembly references, and options in one call or you can spread them out over multiple calls. To add the reference we just used the metadata name, which is the same name you find when you add a reference through Visual Studio’s Reference Manager.Reference Manager - RoslynDemo3_2013-08-22_09-54-14


Now when you run your program, you should get an output similar to the following:FProjectsRoslynDemo3RoslynDemo3binDebugRoslynDemo3.exe_2013-08-22_10-00-45


The SemanticModel


Once we have a Compilation we can get a SemanticModel for any SyntaxTree in that Compilation. SemanticModels can be queried to answer questions like “What names are in scope at this location?” “What members are accessible from this method?” “What variables are used in this block of text?” and “What does this name/expression refer to?”


In the following example we going to modify our example a little bit. We will get the semantic model of our HelloWorld program. Then use this model to get semantic Symbol that represents the first using statement (of type NameSpaceSymbol). Then we use that symbol to get a list of namespaces inside the System namespace.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;
using Roslyn.Services;
using Roslyn.Services.CSharp;

namespace RoslynDemo3
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Text;

namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");

var root = (CompilationUnitSyntax)tree.GetRoot();

Compilation compilation = Compilation.Create("HelloWorld")
.AddReferences(MetadataReference.CreateAssemblyReference("mscorlib"))
.AddSyntaxTrees(tree);

var model = compilation.GetSemanticModel(tree);
var nameInfo = model.GetSymbolInfo(root.Usings[0].Name);
var systemSymbol = (NamespaceSymbol)nameInfo.Symbol;

foreach (var ns in systemSymbol.GetNamespaceMembers())
Console.WriteLine(ns.Name);
Console.ReadLine();
}
}
}

When you run the above code you should get an output like the following:


FProjectsRoslynDemo3RoslynDemo3binDebugRoslynDemo3.exe_2013-08-22_13-23-45


You can also use the query methods we showed in the previous post to retrieve a certain node and then use the semantic model to get more information about that node. You could use the code below to get the node that represent the “Hello, World!” string in our code snippet, get information about its symbol type (System.String) , and even get information about the assembly that contains this type.

            var helloWorldString = root.DescendantNodes()
.OfType()
.First();
var literalInfo = model.GetTypeInfo(helloWorldString);
Console.WriteLine(literalInfo.Type.ContainingAssembly.BaseName);

or enumerate the public methods of the System.String class

            var stringTypeSymbol = (NamedTypeSymbol)literalInfo.Type;

Console.Clear();
foreach (var name in (from method in stringTypeSymbol.GetMembers()
.OfType()
where method.ReturnType == stringTypeSymbol &&
method.DeclaredAccessibility ==
Accessibility.Public
select method.Name).Distinct())
{
Console.WriteLine(name);
}

Somehow, we just scratched the surface of sematic analysis in this post. using both syntax and semantic analysis we can do more advanced and meaningful code focused tasks (which we will do in the next post).

Tuesday, August 20, 2013

Syntax Analysis using Microsoft Roslyn

In a previous post we talked about Microsoft Roslyn. In this post will get our hands dirty with Roslyn syntax analysis in order to develop our first code-focused program. We will see how the SyntaxTree of HelloWorld program look like and how we can traverse and query it.

The Syntax API exposes the syntax trees the compilers use to understand Visual Basic and C# programs. They are produced by the same parser that runs when a project is built or a developer hits F5. The four primary building blocks of syntax trees are:

  • The SyntaxTree class, an instance of which represents an entire parse tree.
  • The SyntaxNode class, instances of which represent syntactic constructs such as declarations, statements, clauses, and expressions.
  • The SyntaxToken structure, which represents an individual keyword, identifier, operator, or punctuation.
  • And lastly the SyntaxTrivia structure, which represents syntactically insignificant bits of information such as the whitespace between tokens, preprocessor directives, and comments.

Traversing Syntax Trees

First, download Microsoft Roslyn CTP from here. and Install it.You will need Visual Studio 2012 and Visual Studio 2012 SDK in order to install Roslyn.

Now open your Visual Studio 2012 and create a new Console Application project. Add the following to your project references

Reference Manager - RoslynDemo1_2013-08-19_15-25-47

Now edit your code to look like:

using System;
using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo1
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = SyntaxTree.ParseText(
@"using System;

namespace HelloWorld
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(""Hello, World!"");
}
}
}");

foreach (SyntaxNode node in tree.GetRoot().ChildNodesAndTokens())
PrintSyntaxTree(node);

Console.ReadLine();
}
static void PrintSyntaxTree(SyntaxNode node)
{
if (node != null)
{
Console.WriteLine(node.Kind + " | " + node.ToString());
foreach (SyntaxNode child in node.ChildNodes())
PrintSyntaxTree(child);
}
}
}
}

Now, run our program. You should see an output similar to the following:


fileFProjectsRoslynDemo1RoslynDemo1binDebugRoslynDemo1.EXE_2013-08-20_15-24-06


Now let’s explain what we already did. First, we included the needed namespaces. The common Syntax APIs are found in the Roslyn.Compilers and the Roslyn.Compilers.Common namespace, while the language specific Syntax APIs are found in Roslyn.Compilers.CSharp and Roslyn.Compilers.VisualBasic.


Second, we tried to get a syntax tree corresponding to our HelloWorld program using the SyntaxTree.ParseText function, which produces a syntax tree by parsing the passed source code.


Third, in order to traverse the syntax tree we get its root, then get its child nodes. Then loop on these child nodes and their children recursively to print its kind and the string representation. C# node kind is on of the kinds in enum Roslyn.Compilers.CSharp.SymbolKind.


Inside the PrintSyntaxTree method, by playing in the order of node printing and the recursive call, you can apply your tree traversal logic. You can also inspect specific elements of the tree:

var root = (CompilationUnitSyntax)tree.GetRoot(); 
var helloWorldDeclaration = (NamespaceDeclarationSyntax)root.Members[0];
var programDeclaration = (TypeDeclarationSyntax)helloWorldDeclaration.Members[0];
var mainDeclaration = (MethodDeclarationSyntax)programDeclaration.Members[0];
var argsParameter = mainDeclaration.ParameterList.Parameters[0];
Console.WriteLine(argsParameter.ToString() );

Query Methods


In addition to traversing syntax trees you can also explore the syntax tree using the query methods defined on SyntaxNode. These methods are similar to XPath. You can also use these methods with LINQ to quickly find things in a tree. For example, the following statements uses a LINQ expression and the DescendantNodes method to locate the same parameter as in the previous example:

            var firstParameters = from methodDeclaration in root.DescendantNodes()
.OfType()
where methodDeclaration.Identifier.ValueText == "Main"
select methodDeclaration.ParameterList.Parameters.First();

var argsParameter2 = firstParameters.Single();
Console.WriteLine(argsParameter2.ToString());

SyntaxWalkers


If you want to find all nodes of a specific type in a syntax tree, for example, every property declaration in a file, you can do that by extending the SyntaxWalker class and overriding the VisitPropertyDeclaration method, you can process every property declaration in a syntax tree without knowing its structure beforehand.


Now let’s implement, as an example, a SyntaxWalker that examines the entire syntax tree and collects any using directives it find which aren’t importing a System namespace.


Create a new C# Console application and edit it to look like:

using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo2
{
class Program
{
static void Main(string[] args)
{
SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace TopLevel
{
using Microsoft;
using System.ComponentModel;

namespace Child1
{
using Microsoft.Win32;
using System.Runtime.InteropServices;

class Foo { }
}

namespace Child2
{
using System.CodeDom;
using Microsoft.CSharp;

class Bar { }
}
}");

var root = (CompilationUnitSyntax)tree.GetRoot();
}
}
}

As you noticed, this source text contains using directives scattered across four different locations: the file-level, in the top-level namespace, and in the two nested namespaces.


Now, add a new class to your project and edit it to look like the next code. Basically we inherit from the SyntaxWalker class, and over the VisitUsingDirective method. This method is called by Visit method, which recursively visits a node and each of its children. VisitUsingDirective method is called for using directives only, inside it we collect the namespaces that starts with “System.” but not equal to “System”.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo2
{
class myWalker : SyntaxWalker
{
public readonly List Usings = new List();

public override void VisitUsingDirective(UsingDirectiveSyntax node)
{
if (node.Name.GetText().ToString() != "System" &&
!node.Name.GetText().ToString().StartsWith("System."))
{
this.Usings.Add(node);
}
}
}
}

Now let’s go back to the Main method and use our walker by adding the following code snippet to the end of Main method. Basically it passes the syntax tree root to myWalker.Visit method, which internally call the VisitUsingDirective method. Then we loop and print the collected usings.


When you run your program, you should see an output similar to the following:


fileFProjectsRoslynDemo2RoslynDemo2binDebugRoslynDemo2.EXE_2013-08-21_14-30-03


In this post we gave a quick example about obtaining a SyntaxTree of HelloWorld program and explored its nodes. We also built a SyntaxWalker that walk the syntax tree and collect the nodes of our interest only. Later we will see how to fully manipulate SyntaxNodes and SyntaxTrees.

Monday, August 19, 2013

Overview of Microsoft Roslyn – The first compiler-as-service product.

Introduction

From developer perspective, compilers are black boxes -- source code goes in one end, magic happens in the middle, and object files or assemblies come out the other end. During their job, compilers build up a deep understanding of our code. For decades this valuable information was unreachable.

The Roslyn project aims to open compilers for developers and offer it as services. Through Roslyn, compilers become services with APIs that can be used for code related tasks in your tools and applications.

In order to understand the APIs that Roslyn offers, we need to understand the general compiler phases:

  1. The parse phase where source is tokenized and parsed into syntax that follows the language grammar.
  2. The declaration phase where declarations from source and imported metadata are analyzed to form named symbols.
  3. The binding phase where identifiers in the code are matched to symbols.
  4. The emitting phase where all the information built up by the compiler is emitted as an assembly.

For each phase of these phases, Roslyn provides a set of APIs to access its services and an object model that allow access to the information at that phase.

  1. The parsing phase is exposed as a syntax tree
  2. The declaration phase as a hierarchical symbol table
  3. The binding phase as a model that exposes the result of the compiler’s semantic analysis
  4. The emit phase as an API that produces IL byte codes.

image

Rosyln API Layers

image

  1. Compiler APIs contains a representation of a single invocation of a compiler (including assembly references, compiler options, and source code files) which provides the object models that corresponds to the compiler pahases.
    1. Scripting APIs represents a runtime execution context for C# or VB piece of code. It contains a scripting engine that allows evaluation of expressions and statements as top-level constructs in programs.
  2. Services APIs represents the starting point for doing code analysis and refactoring over entire solutions by organizing all the information about the projects in a solution into single object model, offering you direct access to the compiler layer object models without needing to parse files, configure options or manage project to project dependencies. In addition, the services layer surfaces a set of commonly used APIs used when implementing code analysis and refactoring tools that function within a host environment like the Visual Studio IDE.
  3. Editor Services APIs allows a user to easily extend Visual Studio IDE features (such as IntelliSense, refactorings, and code formatting). This layer has a dependency on the Visual Studio text editor and the Visual Studio software development kit (SDK).

Working with Syntax

The fundamental data structure exposed by the Compiler APIs is the syntax tree. It represents the lexical and syntactic structure of source code, No part of the source code is understood without it first being identified. These trees enable tools to manipulate source code without editing it as text.

Syntax Trees Attributes

Syntax trees have three key attributes:

  1. Full fidelity which means that the syntax tree contains every piece of information found in the source text, every grammatical construct, every lexical token, and everything else in between including whitespace, comments, and preprocessor directives.
  2. Round-trippable which means that it can be traced back to the text it was parsed from and get the text representation of the sub-tree rooted at that node. This means that syntax trees can be used as a way to construct and edit source text. By creating a tree you have by implication created the equivalent source code, and by editing a syntax tree, making a new tree out of changes to an existing tree, you have effectively edited the source code.
  3. Immutable and thread-safe which means that after a tree is obtained, it is a snapshot of the current state of the code, and never changes. This allows multiple users to interact with the same syntax tree at the same time in different threads without locking or duplication.

Syntax Tree Elements

Each syntax tree is made up of nodes, tokens, and trivia.

  • Syntax Nodes: the primary elements of syntax trees. It represent syntactic constructs such as declarations, statements, clauses, and expressions. They are non-terminal nodes in the syntax tree, which means they always have other nodes and tokens as children. Each category of syntax nodes is represented by a separate class derived from SyntaxNode.
    • Parent property returns the node parent.
    • ChildNodes() returns a list of child nodes in sequential order based on its position in the source text. This list does not contain tokens.
    • DescendantNodes(), DescendantTokens(), and DescendantTrivia() returns a list of child nodes in sequential order based on its position in the source text.
    • Each syntax node subclass exposes all the same children through strongly typed properties.
      • BinaryExpressionSyntax node class
        • Property Left/Right of type ExpressionSyntax
        • Property OperatorToken of type SyntaxToken
      • IfStatementSyntax node class
        • Optional property ElseClauseSyntax which returns null if the child is not present.
  • Syntax Tokens are the terminals of the language grammar, representing the smallest syntactic fragments of the code. They are never parents of other nodes or tokens. Syntax tokens consist of keywords, identifiers, literals, and punctuation.
  • Syntax Trivia represent the parts of the source text that are largely insignificant for normal understanding of the code, such as whitespace, comments, and preprocessor directives. Because trivia are not part of the normal language syntax and can appear anywhere between any two tokens, they are not included in the syntax tree as a child of a node. you can access them through token’s LeadingTrivia or TrailingTrivia collections.
  • Spans In order to make each node knows its position within the source text, each node has two properties of type TextSpan. A TextSpan object is the beginning position and a count of characters, both represented as integers. Each node has two TextSpan properties:
    • The Span property is the text span from the start of the first token in the node’s sub-tree to the end of the last token. This span does not include any leading or trailing trivia.
    • The FullSpan property is the text span that includes the node’s normal span, plus the span of any leading or trailing trivia.
  • Kinds Each node, token, or trivia has a Kind property, of type SyntaxKind, that identifies the exact syntax element represented. The Kind property allows for easy disambiguation of syntax node types that share the same node class.
  • Errors When the parser encounters code that does not conform to the defined syntax of the language, it uses one of two techniques to create a syntax tree.
    • If the parser expects a particular kind of token, but does not find it, it may insert a missing token into the syntax tree in the location that the token was expected. A missing token represents the actual token that was expected, but it has an empty span, and its IsMissing property returns true.
    • The parser may skip tokens until it finds one where it can continue parsing. In this case, the skipped tokens that were skipped are attached as a trivia node with the kind SkippedTokens.

In this post we explored Microsoft Roslyn, an interesting product that offer the compiler as a service and opens a lot of doors in front of developers to develop code-focused tools and applications. In later posts we will gradually show Roslyn in action.

Tuesday, August 13, 2013

How to debug your Pex parameterized unit tests

We talked before about Microsoft Pex, and how it choose the shown inputs. We then talked more about building blocks of parameterized unit tests and its patterns.

What we missed till now is how to debug your parameterized unit tests. In order to debug your parameterized unit tests, do the following:

  • Go to Tools >> Options >> Pex >> Diagnostic.
  • Change the BreakOnStart option to True.

Options_2013-08-13_13-01-05

  • Now set a break point at any line of code inside your parameterized unit test.
  • Right click inside your parameterized unit test and select Run Pex Explorations.
  • You will see a dialog popping asking to attach a debugger.
  • Select your current Visual Studio instance.

Visual Studio Just-In-Time Debugger_2013-08-13_13-17-20

  • Once attached, press F5.
  • Now Visual Studio debugger will jump to your break point in the parameterized unit test and stop there. You can now step through you parameterized unit test code.

You will notice that parameter’s values are those selected by Pex. The debugger will run the parameterized unit test for each input value(s) chosen by Pex, So you can inspect the PUT behavior against a specific input value.

Thursday, August 1, 2013

Convert Microsoft Pex path conditions from Z3 native format to SMT-LIB

We talked before about getting the code path conditions from Microsoft Pex as in Z3 native format(.z3 file).

Sometimes you may need to convert from Z3 native format to SMT-LIB standard. You can do that using the Z3 C# APIs. The only trick here is: you have to use the Microsoft.Z3.dll that come with your current Microsoft Pex installation (C:\Program Files\Microsoft Pex\bin\Microsoft.Z3.dll).

To show how it works, let’s create a new C# console application. Add refeerence to Microsoft.Z3.dll that come with your Pex installation.

Edit your code to look like:

using System;
using System.IO;
using Microsoft.Z3;

namespace Z3ToSMTLIB
{
class Program
{
static void Main(string[] args)
{
if(args.Length == 0)
Console.WriteLine("Please provide input file name.");
if (!File.Exists(args[0]))
Console.WriteLine("The specified file doesn't exist.");
using (Context ctx = new Context(new Config()))
{
string SMTLIB= ctx.ParseZ3File(args[0]).ToString();
StreamWriter writer = new StreamWriter(args[0].TrimEnd("z3".ToCharArray()) + "SMTLIB");
writer.Write(SMTLIB);
writer.Close();
}
}
}
}

Right click your project in the solution explorer and select Add >> New Item >> select Application Configuration File >> click Add. Edit the newly created App.config file to look like: Z3ToSMTLIB - Microsoft Visual Studio_2013-08-13_12-58-55Now you can run your code and pass a file in Z3 native format and get a file in SMT-LIB format. This SMT-LIB format opens a lot of possibilities that we will explore later.

Tuesday, July 30, 2013

The Z3 Constraint Solver, a developer perspective

Z3 is a high-performance SMT solver developed at Microsoft Research. It have been integrated with many many tools that came out from Microsoft for program analysis, testing, and verification.

What SAT means ?

SAT refers to Boolean satisfiability problem where we want to determine if there exists an interpretation that satisfies a given Boolean formula. In other words, it establishes if the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to true.

What SMT means ?

SMT stands for Satiability Modulo Theories. SMT instance is a formula in first-order logic, where some functions and predicates have additional interpretations. SMT problem is a decision problem of determining whether such a formula is satisfiable or not.

An SMT instance is a generalization of a Boolean SAT instance in which various sets of variables are replaced by predicates over a suitable set of non-binary variables.

These predicates are classified according to the theory they belong to. For instance, linear inequalities over real variables are evaluated using the rules of the theory of linear real arithmetic.

What SMT solver means ?

The goal of an SMT solver is to determine whether an SMT instance can evaluate to true or not. The same analog applies for SAT solvers.

SMT solvers

There is a lot of SMT solvers available but there is only one SMT solver with C# APIs, it is Z3. For a list of available SMT solvers, refer to this page.

Download Z3

You can download Z3 from http://z3.codeplex.com/ . I downloaded Z3 4.3.0 and extracted it to C:\z3.

C# Example

In the following example we going to let Z3 solve the following equation system:

  • x > 0
  • y = x + 1 
  • y < 3

Solving the equations means finding values for x and y that make the whole formula evaluates to true.

  • Let’s create a new console application project in Visual Studio.
  • Add reference to Microsoft.Z3.dll which is located in the bin directory of the Z3 installation directory.
  • Copy the file libz3.dll from the bin directory of the Z3 installation directory to your project build directory.
  • Now edit your code to look like the following:
using System;
using Microsoft.Z3;

namespace Z3Demo1
{
class Program
{
static void Main(string[] args)
{
using (Context ctx = new Context())
{
Expr x = ctx.MkConst("x", ctx.MkIntSort());
Expr y = ctx.MkConst("y", ctx.MkIntSort());
Expr zero = ctx.MkNumeral(0, ctx.MkIntSort());
Expr one = ctx.MkNumeral(1, ctx.MkIntSort());
Expr three = ctx.MkNumeral(3, ctx.MkIntSort());

Solver s = ctx.MkSolver();
s.Assert(ctx.MkAnd(ctx.MkGt((ArithExpr)x, (ArithExpr)zero), ctx.MkEq((ArithExpr)y,
ctx.MkAdd((ArithExpr)x, (ArithExpr)one)), ctx.MkLt((ArithExpr)y, (ArithExpr)three)));
Console.WriteLine(s.Check());

Model m = s.Model;
foreach (FuncDecl d in m.Decls)
Console.WriteLine(d.Name + " -> " + m.ConstInterp(d));

Console.ReadLine();
}
}
}
}


Now let’s run the code above and see the output. The solver says the equation system is satisfiable and then gives us the x and y values that satisfy.



fileFProjectsZ3Demo1Z3Demo1binDebugZ3Demo1.EXE_2013-07-29_08-51-07



How it works ?



To interact with Z3 through C#, you need a Context object. Variables and numerals in your equations are modeled as Expr objects. You get these objects using member functions in the Context object (MkConts(), MkNumeral(),….). You construct your operand using member functions in the Context object (MkGt(), MkAdd(), MkLt(),…). To solve all the equations together you need to hock them up using AND operator, which is implemented using Context.MkAnd(). After hocking everything in one AND, you pass that the solver through Solver.Assert(). And as you may guessed, you obtain this Solver using Context.MkSolver().



Solver.Check() will tell you whether this equation system can be solved or not. To get the variables’ assignments that the come up with, get a Model object Solver.Model. Then use the Delcs collection to get all symbols that have an interpretation in the model. Model.ConstInterp() will get the symbol assigned value.



In this post we briefly introduced SAT, SMT, and their solvers. Then we explored the only SMT solver that written in C# and had a C# API. Now you can play with as many equations as you want and check them for satisfiability and even got the solution values.