Thursday, August 22, 2013

Semantic Analysis using Microsoft Roslyn

In a previous post we talked about using the Microsoft Roslyn Syntax API to work with source text in terms of SyntaxTrees and SyntaxNodes. But as we all know, a single source file or code snippet rarely makes a useful program. Almost always we end up with many source files that depend on a number of externals: assembly references, namespace imports, or other code files. The meaning (semantics) of a SyntaxNode depends heavily on these externals and may change when they change, even if the enclosing source file has not been touched.

The Compilation class helps us deal with source text in the context of its dependencies and externals. A Compilation is analogous to a single project as seen by the compiler and represents everything needed to compile a Visual Basic or C# program, such as assembly references, compiler options, and the set of source files to be compiled. With this context you can reason about the meaning of code. Compilations allow you to find Symbols: entities such as types, namespaces, members, and variables that names and other expressions refer to. The process of associating names and expressions with Symbols is called Binding.

Creating a Compilation

In the following example we create a SyntaxTree for our traditional HelloWorld program. Then we create a Compilation from this SyntaxTree and add a reference to the MS Core Library (mscorlib).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;
using Roslyn.Services;
using Roslyn.Services.CSharp;

namespace RoslynDemo3
{
    class Program
    {
        static void Main(string[] args)
        {
            // Parse the HelloWorld source text into a syntax tree.
            SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Text;

namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(""Hello, World!"");
        }
    }
}");

            var root = (CompilationUnitSyntax)tree.GetRoot();

            // Build a compilation from the tree and reference mscorlib.
            Compilation compilation = Compilation.Create("HelloWorld")
                .AddReferences(MetadataReference.CreateAssemblyReference("mscorlib"))
                .AddSyntaxTrees(tree);

            // List the references the compilation was given.
            foreach (MetadataReference reference in compilation.ExternalReferences)
                Console.WriteLine(reference.Display);

            Console.ReadLine();
        }
    }
}

You can supply all the syntax trees, assembly references, and options in one call or you can spread them out over multiple calls. To add the reference we just used the metadata name, which is the same name you find when you add a reference through Visual Studio’s Reference Manager.
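To illustrate the "one call" form, here is a minimal sketch. It assumes the Roslyn CTP's Compilation.Create accepts optional syntaxTrees and references parameters; the class name OneCallExample and the tiny source snippet are made up for illustration.

```csharp
using System;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo3
{
    // Hypothetical class name, used only for this sketch.
    class OneCallExample
    {
        static void Main(string[] args)
        {
            SyntaxTree tree = SyntaxTree.ParseText(
                @"class C { static void Main() { System.Console.WriteLine(""Hi""); } }");

            // Assumption: Compilation.Create exposes optional 'syntaxTrees' and
            // 'references' parameters in the Roslyn CTP, so the trees and
            // references can be passed up front instead of via Add* calls.
            Compilation compilation = Compilation.Create(
                "HelloWorld",
                syntaxTrees: new SyntaxTree[] { tree },
                references: new MetadataReference[]
                {
                    MetadataReference.CreateAssemblyReference("mscorlib")
                });

            // Same check as before: list the references the compilation holds.
            foreach (MetadataReference reference in compilation.ExternalReferences)
                Console.WriteLine(reference.Display);
        }
    }
}
```

Both forms should give an equivalent compilation. The fluent Add* calls are handy when references or trees are discovered incrementally, since each call returns a new Compilation.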


Now when you run your program, it prints the display string of each reference in the compilation (here, only the mscorlib reference):

[Screenshot: console output of RoslynDemo3.exe]


The SemanticModel


Once we have a Compilation we can get a SemanticModel for any SyntaxTree in that Compilation. SemanticModels can be queried to answer questions like “What names are in scope at this location?” “What members are accessible from this method?” “What variables are used in this block of text?” and “What does this name/expression refer to?”
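As a rough sketch of the first question, "What names are in scope at this location?", the snippet below asks the model for the symbols visible at the start of the Console.WriteLine call. It assumes the CTP's SemanticModel offers a LookupSymbols method that takes a text position; the class name ScopeLookupExample is hypothetical.

```csharp
using System;
using System.Linq;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;

namespace RoslynDemo3
{
    // Hypothetical class name, used only for this sketch.
    class ScopeLookupExample
    {
        static void Main(string[] args)
        {
            SyntaxTree tree = SyntaxTree.ParseText(
@"using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(""Hello, World!"");
    }
}");

            var root = (CompilationUnitSyntax)tree.GetRoot();

            Compilation compilation = Compilation.Create("HelloWorld")
                .AddReferences(MetadataReference.CreateAssemblyReference("mscorlib"))
                .AddSyntaxTrees(tree);

            var model = compilation.GetSemanticModel(tree);

            // Use the start of the first invocation as the location to query.
            var invocation = root.DescendantNodes()
                                 .OfType<InvocationExpressionSyntax>()
                                 .First();
            int position = invocation.Span.Start;

            // Assumption: LookupSymbols(position) returns every symbol that is
            // in scope at that position (locals, parameters, members, types,
            // and namespaces), so the list can be long.
            foreach (var symbol in model.LookupSymbols(position))
                Console.WriteLine(symbol.Kind + " " + symbol.Name);
        }
    }
}
```

Under this assumption the parameter args and the method Main should appear in the list, along with everything brought in by the using directive.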


In the following example we modify our program a little. We get the semantic model of our HelloWorld program, then use this model to get the Symbol that the name in the first using directive refers to (of type NamespaceSymbol). Then we use that symbol to list the namespaces inside the System namespace.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers;
using Roslyn.Compilers.CSharp;
using Roslyn.Services;
using Roslyn.Services.CSharp;

namespace RoslynDemo3
{
    class Program
    {
        static void Main(string[] args)
        {
            SyntaxTree tree = SyntaxTree.ParseText(
@"using System;
using System.Collections.Generic;
using System.Text;

namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(""Hello, World!"");
        }
    }
}");

            var root = (CompilationUnitSyntax)tree.GetRoot();

            Compilation compilation = Compilation.Create("HelloWorld")
                .AddReferences(MetadataReference.CreateAssemblyReference("mscorlib"))
                .AddSyntaxTrees(tree);

            // Bind the name in the first using directive ("System") to its symbol.
            var model = compilation.GetSemanticModel(tree);
            var nameInfo = model.GetSymbolInfo(root.Usings[0].Name);
            var systemSymbol = (NamespaceSymbol)nameInfo.Symbol;

            // Print the namespaces nested directly inside System.
            foreach (var ns in systemSymbol.GetNamespaceMembers())
                Console.WriteLine(ns.Name);

            Console.ReadLine();
        }
    }
}

When you run the above code, it prints the names of the namespaces nested directly inside System:

[Screenshot: console output of RoslynDemo3.exe]


You can also use the query methods we showed in the previous post to retrieve a particular node and then use the semantic model to get more information about that node. You could use the code below to get the node that represents the “Hello, World!” string in our code snippet, get its type (System.String), and even get information about the assembly that contains this type.

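// Find the first literal expression: the "Hello, World!" string.
var helloWorldString = root.DescendantNodes()
                           .OfType<LiteralExpressionSyntax>()
                           .First();

// Ask the model for the literal's type, then print the name of its assembly.
var literalInfo = model.GetTypeInfo(helloWorldString);
Console.WriteLine(literalInfo.Type.ContainingAssembly.BaseName);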

Or enumerate the names of the public methods of the System.String class that return a string:

var stringTypeSymbol = (NamedTypeSymbol)literalInfo.Type;

Console.Clear();

// Select public methods of System.String whose return type is string,
// and print each distinct method name once.
foreach (var name in (from method in stringTypeSymbol.GetMembers()
                                                      .OfType<MethodSymbol>()
                      where method.ReturnType == stringTypeSymbol &&
                            method.DeclaredAccessibility == Accessibility.Public
                      select method.Name).Distinct())
{
    Console.WriteLine(name);
}

We have only scratched the surface of semantic analysis in this post. Using syntax and semantic analysis together, we can tackle more advanced and meaningful code-focused tasks, which we will do in the next post.