Critical Development

Language design, framework development, UI design, robotics and more.

Archive for May, 2010

Reimagining the IDE

Posted by Dan Vanderboom on May 31, 2010

Overview

After working in Visual Studio for the past decade, I’ve accumulated a broad spectrum of ideas on how the experience could be better.  From microscopic features like “I want to filter Intellisense member lists by member type” to recognition of larger patterns of conceptual organization and comprehension, there aren’t many corners of the IDE that couldn’t be improved with additional features—or in some cases—a redesign.

To put things in perspective, consider how the Windows Mobile platform languished for years and became stale (or “good enough”) until the iPhone changed the game and raised the bar on quality to a whole new level.  It wasn’t until fierce competition stole significant market share that Microsoft completely scrapped the Windows Mobile division and started fresh with a complete redesign called Windows Phone 7.  This is one of the smartest things Microsoft has done in a long time.

After many years of incremental evolution, it’s often necessary to rethink, reimagine, and occasionally even start from scratch in order to make the next revolutionary jump forward.

Visual Studio Focus

Integrated Development Environments have been with us for at least the past decade.  Whether you work in Visual Studio, Eclipse, NetBeans, or another tool, there is tremendous overlap in the set of panels available, the flexible layout of those panels, saved workspaces, and add-in infrastructure to make as much as possible extensible.  I’ll focus on Visual Studio for my examples and explanations since that’s the IDE I’m most familiar with, but there are parallels to other IDEs for much of what I’m going to cover.

Visual Components & Flexible Layout

Visual layout is one thing that IDEs do right.  Instead of a monolithic UI, it’s broken down into individual components such as panels, toolbars, toolboxes, main menus and context menus, code editors, designers, and more.  These components can be laid out at runtime with intuitive drag-and-drop operations that visually suggest the end result.

The panels of an IDE can be docked to any edge of another panel, they can be laid on top of another panel to create tab controls, and adjacent panels can be relatively resized with splitters that appear between panels.  After many years of refinement, it’s hard to imagine a better layout system than this.

The ability to save layouts as workspaces in Expression Blend is a particularly nice feature.  It would be nicer still if the user could define triggers for these workspaces, such as “change layout to the UI Designer workspace when the XAML or Windows Forms designers are opened”.

IDE Hosting

Visual Studio and other development tools have traditionally been desktop applications.  In Silverlight 4, however, we now have a framework sufficiently powerful to build a respectable cross-platform IDE.

With features such as off-line, out-of-browser execution, full screen mode, custom context menus, and trusted access to the local file system, it’s now possible for a great IDE to be built and run on Windows, Mac OS X, or Linux, and to allow a developer to access the IDE and their solutions from any computer with a browser (and the Silverlight plug-in).

There are already programming editors and compilers in the cloud.  In episode 562 of .NET Rocks, about teaching programming to kids, the guests point out that a subset of the Small Basic IDE is available in Silverlight.  For those looking to build programming editors, ActiPro has a SyntaxEditor control in WPF that they’re currently porting to Silverlight (for which they report seeing a lot of demand).

Ideally such an IDE would be free, or would have a free version available, but for those of us who need high-end tools and professional-level feature sets, imagine how nice it would be to pay a monthly fee for access to an ever-evolving IDE service instead of having to cough up $1,100 or $5,500 (or more) every couple of years.  Not only would costs be conveniently amortized over the span of the tool’s use, but all of your personal preferences would be easily synchronized across all computers that you use to work in that IDE.

With cloud computing services such as Windows Azure, it would even be possible to off-load compilation of large solutions to the cloud.  Builds that took 30 minutes could be cut down to a few minutes or less by parallelizing build tasks across multiple cores and servers.

The era of cloud development tools is upon us.

Solution Explorer & The Project System

Solution Explorer is one of the most useful and important panels in Visual Studio.  It provides us with an organizational tool for all the assets in our solution, and provides a window into the project system on which core behaviors such as builds are based.  It is through the Solution Explorer that we typically add or remove files, and gain access to visual designers and the ever-present code editor.

In many ways, however, Solution Explorer and the project system it represents are built on an old and tired design that hasn’t evolved much since its introduction over ten years ago.

For example, it still isn’t possible to “add existing folder” and have that folder and all of its contents pulled into a project.  If you’ve ever had to rebuild a project file and pull in a large number of files organized in many nested folders, you have a good idea of how painful an effort this can be.

If you’ve ever tried sharing the same code across multiple incompatible platforms, between Full and Compact Framework, or between Silverlight 3 and Full Framework, you’ve likely run into kludgey workarounds like placing multiple project files in the same folder and including the same set of files, or using a tool like Project Linker.

Reference management can also be unwieldy when you have many projects and references.  How do you ensure you’re not accidentally referencing two different versions of the same assembly from two different projects?  My article on Project Reference Oddness in VS2008, which explores the mysterious and indirect ways references work, is by far one of my most popular articles.  I’m guessing that’s because so many people can relate to the complexity and confusion of managing these dependencies.

“Projects” Are Conceptually Overloaded: Violating the Single Responsibility Principle

In perhaps the most important example, consider how multiple projects are packaged for deployment, such as what happens for the sake of debugging.  Which assemblies and other files are copied to the output directory before the program is executed?  The answer, discussed in my Project Reference Oddness article, is that it depends.  Files that are added to a project as “Content” don’t even become part of the assembly: they’re just passed through as a deployment command.

So what exactly is a Visual Studio “project”?  It’s all of these things:

  • A set of source code files that will get compiled, producing an assembly.
  • A set of files that get embedded in the resulting assembly as resources.
  • A set of deployment commands for loose files.
  • A set of deployment commands for referenced assemblies.

If a Visual Studio project were a class definition, we’d say it violated the Single Responsibility Principle.  It’s trying to be too many things: both a definition for an assembly as well as a set of deployment commands.  It’s this last goal that leads to all the confusion over references and deployment.

Let’s examine the reason for this.

A deployment definition is something that can span not only multiple assemblies, but also additional loose files.  In order to debug my application, I need assemblies A, B, and C, as well as some loose files, to be copied to the output directory.  Because there is no room for the deployment definition in the hierarchy visualized by Solution Explorer, however, I must somehow encode that information within the project definitions themselves.

If assembly A references B, then Visual Studio infers that the output of B needs to be copied to A’s output directory when A is built.  Since B references C, we can infer that the output of C needs to be copied to B’s output directory when B is built.  Indirectly, then, C’s output will get dumped in A’s output directory, along with B’s output.

What you end up with is a pipeline of files that shuffles things along from C to B to A.  Hopefully, if all the reference properties are set correctly, this works as intended and the result is good.  But the logic behind all of this is an implicit black box.  There’s no transparency, so when things get complicated and something goes wrong, it can become impossible to figure it out in a reasonable amount of time (try reading through verbose build output sometime).

At one point, just before writing the article on references mentioned above, I was spending 10 hours or more a week just fighting with reference dependencies.  It was a huge mess, and a very expensive way to accomplish absolutely nothing in terms of providing value to customers.

Deployments & Assemblies

Considering our new perspective on the importance of representing deployments as first-class organizational items in solutions, let’s take a look at what that might look like in an IDE.  Focus on the top-left of the screenshot below.

[screenshot: the proposed IDE, with the solution hierarchy and its deployment definitions in the top-left]

The first level of darker text (“Silverlight Client” and “Cloud Services”) is equivalent to “solution folders” in Visual Studio.  These are labels that can be nested like folders for organizational purposes.  Within each of these areas is a collection of Deployment definitions.  The expanded deployment is for the Shell of our Silverlight application.  The only child of this deployment is a location.

In a desktop application, you might have multiple deployment locations, such as $AppDir$, $AppDir$\Data, or $UserDir$\AppName, each with child nodes representing content to be deployed to those locations.  In Silverlight, however, it doesn’t make sense to deploy to a specific folder since that’s abstracted away from you.  So for this example, the destination is Shell.XAP.

You’ll notice that multiple assemblies are listed.  If this were a web application, you might have a number of loose files as well, such as default.aspx or web.config.  If such files were listed under that deployment, you could double-click one to open and edit in the editor on the right-hand side of the screen.

The nice thing about this setup is the complete transparency: if a file is listed in a deployment path, you know it will be copied to the output directory before debugging begins.  If it’s not listed, it won’t get deployed.  It’s that simple.

The next question you might have is: doesn’t this mean that I have a lot of extra work to manually add each of these assembly files?  Especially when it comes to including the necessary references, nobody wants the additional burden of having to manually drag every needed reference into a deployment definition.

This is pretty easy to deal with.  When you add a reference to an assembly, and that referenced assembly isn’t in the .NET Framework (those are accessed via the GAC and therefore don’t need to be included), the IDE can add that assembly to the deployment definition for you.  Additionally, it would be helpful if all referenced assemblies lit up (with a secondary highlight color) when a referencing assembly was selected in the list.  That way, you’d be able to quickly figure out why each assembly was included in that deployment.  And if you select an assembly that requires a missing assembly, the name of any missing assemblies should appear in a general status area.

What we end up with is a more explicit and transparent way of dealing with deployment definitions separately from assembly definitions, a clean separation of concepts, and direct control over deployment behavior.  Because deployment intent is specified explicitly, this would be a great starting point for installer technologies to plug into the IDE.

In Visual Studio, a project maps many inputs to many outputs, and confuses deployment and assembly definitions.  A Visual Studio “project” is essentially an “input” concept.  In the approach I’ve outlined here, all definitions are “output” concepts; in other words, items in the proposed solution hierarchy are defined in terms of intended results.  It’s always a good idea to “begin with the end in mind” this way.

Multiple Solution Views

In the screenshot above, you’ll notice there’s a dropdown list called Solution View.  The current view is Deployment; the other option is Assembly.  The reason I’ve included two views is because the same assembly may appear in multiple deployments.  If what you want is a list of unique assemblies, that alternative view should be available.

A New Template System

The other redesign required is around the idea of Visual Studio templates.  Instead of solution, project, and project item templates in Visual Studio, you would have four template types: solution, deployment, assembly, and file.  Consider these examples:

Deployment Template: ASP.NET Web Application

  • $AppDir$
    • Assembly: MyWebApp.dll
      • App.xaml.cs
      • App.xaml    (embedded resource)
      • Main.xaml.cs
      • Main.xaml   (embedded resource)
    • File: Default.aspx
    • File: Web.config
    • Folder: App_Data
      • File: SampleData.dat

Solution Template: Silverlight Solution

  • Deployment: Silverlight Client
    • MySLApp.XAP
      • Assembly: MyClient.dll
        • App.xaml.cs
        • App.xaml    (embedded resource)
        • Main.xaml.cs
        • Main.xaml   (embedded resource)
  • Deployment: ASP.NET Web Application
    • $AppDir$
      • Assembly: MyWebApp.dll
        • YouGetTheIdea.cs
      • Folder: ClientBin
        • MySLApp.XAP (auto-copied from Deployment above)
      • File: Default.aspx
      • File: Web.config

Summary

In this article, we explored several features in modern IDEs (Visual Studio specifically), and some of the ways in which imaginative rethinking could bring substantial improvements to the developer experience.  I have to wonder how quickly a large ship like Visual Studio (with 1.5 million lines of mostly C++ code) could turn and adapt to new ideas like this, or whether it makes sense to start fresh without all the burden of legacy.

Though I have many more ideas to share, especially regarding the build system, multiple-language name resolution and refactoring, and IDE REPL tools, I will save all of that for future articles.

Posted in Cloud Computing, Development Environment, Silverlight, User Interface Design, Visual Studio, Windows Azure | Leave a Comment »

The Archetype Language (Part 5)

Posted by Dan Vanderboom on May 24, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Type Extensions

If you’re unfamiliar with extension methods in C# or other languages, this section might blow your mind a little bit.  If you love extension methods, use them all the time, and don’t know what you’d do without them, my hope is that you enjoy the power that Archetype unleashes with robust type extensibility.

If you’re in the unfamiliar group, type extensions are a way of adding new members to existing types, regardless of whether those types were defined in the same assembly as the extensions or in a different assembly.  Contrary to what the name may suggest, no modification of the original type actually occurs; instead, the extensions are fed into Visual Studio’s language services, and Intellisense is updated to make it appear that those additional members are available for an instance of that type.  Are there methods you’d like to call on any string object?  With extension methods, you can add methods and use them as if they belonged to that class.  Here’s how it looks in C#:

public static class MyExtensions
{
    public static void ShuffleLetters(this string Text)
    {
        // …
    }
}

Here we can write “Hello there”.ShuffleLetters(); and the dot triggers Intellisense to pop up and show our ShuffleLetters method.

Extension methods are great.  But if you really embrace them and start thinking in terms of opportunities for extensions, you’ll run into a few brick walls.  You see, extension methods are just a tease; they’re merely the tip of the iceberg, one isolated fragment of a larger (and seemingly happier) family.

First you begin to wish you could add a property instead of a method so you don’t have to put up with parentheses or a Get- prefix to force a property to look and behave like a method.  You might see distasteful expressions like cust.HasChanges(), and there’s nothing you can do about it.

Then you’ll be working with a static class, and you’ll wish you could add a static method, but you can’t add static members.  Eventually, you’ll run into that scenario where an additional operator or constructor would be the perfectly elegant way to solve the current problem.  But you’ll resign yourself to something kludgy instead.

So having gone down a similar road, I was more than a little frustrated when C# 4.0 was released with no new type extensibility at all.  This was one of the catalysts for starting the Archetype language project: the realization that no other language team was likely to evolve in the direction, or with the priorities, that I’ve been developing in my head over the years.  I’m beyond the point where I’m willing to just wait and see what happens.

C# has a clever approach to designating a method as an extension.  However, it’s somewhat indirect.  Instead of saying something like “extend” or “extension” near a class definition, they impose several syntactically-unrelated requirements:

  1. The method must be both public and static.
  2. The first method parameter’s type is the type to be extended.
  3. The first method parameter must be prefixed by the “this” keyword.  This hints that we’re adding an instance member.
  4. The class the method is defined in must be both public and static.

You would never guess these requirements, and it’s easy to get one of them wrong and waste time figuring out which one you missed.  Now try to add extensions for properties, operators, constructors and finalizers, indexers, and possibly fields, and then throw in static members.  How will you designate all of these different things?  Lots more cleverness, I’d say, but it’s not likely to be syntactically scalable.

When we design syntax, it’s helpful to design a whole family of related capabilities together.  You can see Archetype’s approach in the following example:

Customer object
{
    FirstName bindable string;
    LastName bindable string;

    this (FirstName, LastName) set all;
}

Customer extension
{
    BirthDate bindable DateTime;

    FullName bindable string
        get composite FirstName " " LastName;

    this (FirstName, LastName, BirthDate) set all;

    static BuildCustomer Customer (FullName string)
    {
        var i = FullName.IndexOf(" ");
        assert i >= 0;
        return new Customer(FullName.Substring(0, i), FullName.Substring(i + 1));
    }
}

There’s a lot happening here, so let’s go over it one step at a time.

  • The original class we’re going to extend, Customer, is defined first.  Two bindable properties and a constructor, nothing more.
  • Creating a wrapper for a class extension doesn’t require that you remember several clever tricks.  Instead, you simply write “ClassTarget extension”, and everything within the structure is considered an extension of that type.
  • BirthDate is an extension property.
  • FullName is also an extension property; however, the composite keyword, combined with references to FirstName and LastName, requires that the Archetype compiler look at the target class as well as the extension class to resolve identifiers.  The compiler must also wire up the binding infrastructure so the target object triggers an update of the FullName property when either composite part changes.  Members referenced on the target class must be visible to an external class: private and protected members can’t be referenced from an extension, for example.
  • The this method is an instance constructor.  set all sets FirstName and LastName on the target object and BirthDate on the extension object.  Constructor methods don’t require a return type, as it is assumed to be the same as the type they’re defined in.
  • BuildCustomer is a static factory method.  The assert keyword defines checkpoints to ensure that conditions are what they’re expected to be, which is especially valuable at the beginning and end of methods.  The basic idea is that you’d be able to define what happens when they’re violated (throw an exception, log a message, or whatever you like), in debug mode or in production code.  More about this construct in a future article.
  • Operators are also supported, but are not shown in this example.  To see how custom operators are created, see Part 7 of this series for a detailed explanation.

While extension methods are simple to implement in a compiler, extension fields are a little more complicated.  The compiler transforms an extension field into a dictionary keyed by the target object (shown below), and on top of that transformation, entries that refer to garbage-collected objects must be removed to prevent these dictionaries from growing uncontrollably.  When one or more extension properties are used in an assembly, a background worker thread might occasionally check the keys in these dictionaries to see if any refer to garbage-collected objects, and remove them.

The BirthDate property (with its internal storage field) in the above extension class is converted into something like this:

_BirthDates field Dictionary<Customer, DateTime>;

GetBirthDate DateTime (cust Customer)
{
    if (_BirthDates.ContainsKey(cust))
        return _BirthDates[cust];

    return default(DateTime);
}

SetBirthDate void (cust Customer, BirthDate DateTime)
{
    if (BirthDate == default(DateTime))
    {
        if (_BirthDates.ContainsKey(cust))
            _BirthDates.Remove(cust);
    }
    else
    {
        if (!_BirthDates.ContainsKey(cust))
            _BirthDates.Add(cust, BirthDate);
        else
            _BirthDates[cust] = BirthDate;
    }
}

The BirthDate property would handle data binding and then call one of these two methods (or simply include their logic within the property get and set methods).  Another possibility is to instantiate the internally-named extension class and use that as the value in a dictionary.

A runtime mechanism tracks instances of extension objects and removes them from the dictionary periodically.  System.WeakReference provides the means to do this.  It involves two WeakReferences per object: a short weak reference that becomes null and signals the need to clean up, and a long weak reference to use as the key into the dictionary being cleaned up.  This mechanism would be loaded only when necessary, and some configuration of its cleanup behavior would be made available.
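To make the storage and cleanup idea concrete, here is a minimal C# sketch of what such a mechanism might look like.  This is not Archetype’s actual generated code: it ignores thread safety, and it uses a simple linear scan rather than the two-WeakReference scheme described above, but it shows the essential shape of weakly-keyed extension storage with a periodic sweep.

using System;
using System.Collections.Generic;

public static class BirthDateExtensionStore
{
    // Values are keyed by weak references to the target objects, so the
    // extension storage never keeps a Customer alive on its own.
    private static readonly Dictionary<WeakReference, DateTime> _store =
        new Dictionary<WeakReference, DateTime>();

    public static void Set(object target, DateTime value)
    {
        var key = Find(target);
        if (key == null)
            _store.Add(new WeakReference(target), value);
        else
            _store[key] = value;
    }

    public static DateTime Get(object target)
    {
        var key = Find(target);
        return key == null ? default(DateTime) : _store[key];
    }

    // Linear scan for simplicity; a real implementation would hash on the
    // target's identity instead.
    private static WeakReference Find(object target)
    {
        foreach (var key in _store.Keys)
            if (ReferenceEquals(key.Target, target))
                return key;
        return null;
    }

    // Called periodically (for example, from a background thread) to drop
    // entries whose targets have been garbage collected.
    public static void Sweep()
    {
        var dead = new List<WeakReference>();
        foreach (var key in _store.Keys)
            if (!key.IsAlive)
                dead.Add(key);

        foreach (var key in dead)
            _store.Remove(key);
    }
}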

Static field and property storage would be easier to implement and wouldn’t require any cleanup.

There are a few members that may not make sense to provide as extensions.  Static constructors may provide some value to the extension class, but it wouldn’t be able to reach into the target object at static constructor runtime.

Once the wrinkles of implementation detail are worked out, this rich ecosystem of type extensions will open up clean and elegant solutions for adding missing parts from types that you know belong there.

Custom Control Structures

Every so often, I end up writing a function that behaves like a control structure with the block of statements passed in as a lambda function.  I’ve done this to spin up a thread to run code in (described in this article), and discussed what Parallel.For would look like as a custom control structure in my article discussing the future of programming languages.  My idea there was to define an extension method in such a way, with the delegate at the end, that the compiler would treat the delegate as a separate closure:

public static class Parallel
{
    public static void For(long Start, long Count, Action Action)
    {
        // …
    }
}

This is how you’d use it currently in C#:

Parallel.For(0, 10, () =>
{
    // add code here for the Action delegate parameter
});

My proposal was to use it like this instead:

Parallel.For(0, 10)
{
    // add code here for the Action delegate parameter
}

The point I made in that article is worth repeating.  First, a word from Anders at PDC08:

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute.  And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming.  And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before.  I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0.  And he’s right: if you squint, it almost looks like new syntax.  The problem is that programmers don’t want to squint at their code.  As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look.  This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language [now called M].

Ruby and Groovy also support closures that are supplied external to the method arguments.

Where I originally suggested that a first or final delegate parameter should be automatically supported as an external closure, there are a couple of reasons to be more explicit.  Consider the following Archetype syntax:

[Keyword]
fork<T>(items IEnumerable<T>, action Action<T> closure)
{
    // create Task for each item
}

With this global function, I can write parallelized forking code as though it were a part of the Archetype language.  With the Keyword attribute, I can even colorize the function name as if it were a built-in keyword.

fork(customers)
{
    // work with each customer
    it.DoSomething();
}

But the main reason to explicitly mark it as a closure is so it can be treated as an embedded statement, such that keywords like return and break behave within the context of the containing scope.  In other words, if I use return within one of these closures, my intention is to return from the method that closure lives in, not to return out of the closure itself: that’s what the break keyword is for.
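For contrast, here is how a plain C# lambda behaves today: a return inside the lambda exits only the lambda, never the method that contains it.  This little snippet (ordinary C#, not Archetype) shows the behavior Archetype’s closures would deliberately change.

using System;
using System.Collections.Generic;

class LambdaReturnDemo
{
    static void Process(IEnumerable<string> items, Action<string> action)
    {
        foreach (var item in items)
            action(item);
    }

    static void Main()
    {
        Process(new[] { "a", "b", "c" }, item =>
        {
            if (item == "b")
                return;   // exits only this lambda call, like Archetype's break

            Console.WriteLine(item);
        });

        // Still reached: the lambda has no way to return from Main.
        Console.WriteLine("done");
    }
}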

The it keyword above assumes the role of the single object in the collection specified (customers).

This is starting to look pretty good, but the it keyword is a crutch if you think about it.  We only need it because we don’t have something nicer like the expression I introduced in Part 3 of this series when I talked about iterating with loop.  You may recall there were basically two formats for specifying the bounds and behavior of the loop.

// loop through and reference each object in an IEnumerable
loop (var cust in Customers)
{
}

// i starts at 11, decrements by 2, until it reaches (or passes) 1
loop (var i in 11..1 skip 2)
{
}

To make a custom control structure look and behave like one of the built-in variety, there must be a way to indicate in the parameter list that such an argument is required.  So let’s say that we introduce an iterator keyword to indicate we want to support either of the syntaxes shown above.

[Keyword]
fork<T>(items IEnumerable<T> iterator, action Action<T>)
{
    // create Task for each item
}

This allows the function’s invocation to define an item identifier which is exposed to the following closure.  We could then very naturally write:

fork (var cust in customers)
{
    // work with each customer
    cust.DoSomething();
}

Now we can reference an identifier that makes sense to us, cust, and the whole thing looks like it’s baked into the language.  Voilà!

There’s another kind of iterator that pertains to coroutines that I haven’t discussed yet, but in Archetype I call them streams, so there shouldn’t be too much confusion between them.

Named Closures

Let’s take this to the next level.  What if we wanted to extend our control structure with a second closure that would execute when all of the tasks that were forked had been completed or canceled?  This would complete the fork-join concurrency pattern.  Consider the following syntax:

[Keyword]
fork<T> void (items IEnumerable<T> iterator, ForkAction Action<T> closure,
              JoinAction Action<TaskList<T>> closure as join)
{
    // 1. schedule Task for each item
    // 2. when all Tasks have been completed or canceled,
    // 3. invoke JoinAction
}

Here is how we use it, taken from Part 3 in the series:

// fork out a bunch of parallel tasks and join when all are done
fork (var cust in Customers)
{
    // this code is encapsulated in a task in the TPL
    // and scheduled for execution
}
join (tasks)
{
    // this code block is executed when all of the tasks
    // are either completed or canceled
}

One thing I haven’t mentioned yet is Archetype’s support for optional parameters.  They work the same as in C# or VB.NET, and come in handy here.  By making the second closure parameter optional (by adding “= null” after “closure as join”), we can now use fork alone, or fork-join together, in a single function definition.  If we don’t enclose it in a specific namespace or class, it will look exactly like a language keyword.

Another example was brought to my attention (in the comments for article 2).  The idea, inspired by the Groovy language, is that a predicate is defined in a closure, syntactically separate from the argument list.

print("trying some new syntax") { it.Length > 5 };

But let’s convert this one step at a time.  If we use Archetype’s concept of a named closure, we could insert if to make the intent more obvious:

print void (Text string, Predicate bool(string) closure as if)
{
    if (Predicate(Text))
        Console.WriteLine(Text);
}

print("trying some new syntax") { return it.Length > 5 };

The curly braces, though fantastic for multiple-statement blocks of logic, are overkill for simple expressions (and so is the return keyword).  Unless I uncover a good reason not to, I’m inclined to allow Archetype to trade the curly braces for parentheses for simple expressions.  We could then write this, which is what I was ultimately looking for.

print("trying some new syntax") if (ShouldPrint && it.Length > 5);

I introduced the ShouldPrint boolean variable here to illustrate that the conditions provided here don’t have to relate to it (and in most cases, probably would not).

You may be wondering if using if as the closure’s name would cause a problem with the if keyword in the Archetype language.  The reason it’s not a problem is that it appears between the method name on the left and the semicolon on the right.  After the semicolon, the appearance of the if token would be mapped to the language keyword.

The first closure parameter of a method can be named or nameless, but all subsequent closures must be named.  I can’t think of a reason to limit the number of closure parameters other than too many is ridiculous, but I’ll leave that to the discretion of the Archetype developer.  Closure parameters can be defined with params for chaining together arbitrary numbers of named closures.  When calling a method with no parameters except closures, the parentheses after the function name can be omitted.  These methods will be exposed to other languages as having delegate types as well as Iterator and Closure attributes to identify those parameters.
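As a rough sketch of that last point, here is how such a method might surface to a C# caller, with hypothetical IteratorAttribute and ClosureAttribute markers on the special parameters.  All of the names here are invented for illustration (the article doesn’t specify them), and the body is a sequential stand-in rather than a real Task-scheduling implementation.

using System;
using System.Collections.Generic;

// Hypothetical marker attributes an Archetype compiler might emit.
[AttributeUsage(AttributeTargets.Parameter)]
class IteratorAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Parameter)]
class ClosureAttribute : Attribute { }

static class ForkInterop
{
    // How the fork "keyword" could look from C#: an ordinary method taking
    // an enumerable plus a delegate for each closure parameter.
    public static void Fork<T>(
        [Iterator] IEnumerable<T> items,
        [Closure] Action<T> forkAction,
        [Closure] Action<IList<T>> joinAction = null)
    {
        var processed = new List<T>();

        foreach (var item in items)   // sequential stand-in for Task scheduling
        {
            forkAction(item);
            processed.Add(item);
        }

        if (joinAction != null)
            joinAction(processed);
    }
}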

Closure Extension Clauses

Another possibility arises when you consider that the if clause added to the print method could be defined in such a way that it could be applied to any method invocation.  I’m not advocating this particular pattern of placing the if condition after the statement to execute, but it will do for the sake of illustration.  This feature is purely speculative and I’d be curious to hear some feedback.

I could imagine writing a closure extension clause to solve the print problem more generally, something like this:

any (Predicate bool() closure as if)
{
    if (Predicate())
        any();
}

Then as long as it’s in scope, I could write code like this:

Start void ()
{
    import System.Console;

    var debug = true;

    WriteLine("Debugging started") if (debug);

    var name = ReadLine() if (!debug);
    WriteLine("Hello " name "!") if (!debug);
}

One line deserves explanation: name is defined as a string variable regardless of the if clause; it is the call to ReadLine that is executed or not, based on the condition specified.
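In rough C# terms, my reading of that line is something like the following (an illustrative translation, not compiler output): the declaration always happens, and only the call is guarded by the condition.

using System;

class ConditionalCallDemo
{
    static void Main()
    {
        var debug = true;

        // Rough equivalent of:  var name = ReadLine() if (!debug);
        string name = default(string);    // the variable is declared either way
        if (!debug)
            name = Console.ReadLine();    // only the call is conditional

        if (!debug)
            Console.WriteLine("Hello " + name + "!");
    }
}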

More interesting examples are possible if we leave the world of expression closures and consider multiple-statement closures.  One possibility involves adding an async closure extension clause, with the callback handled by the supplied closure.

Start void ()
{
    import System.Console;

    async FetchData()
    {
        // respond to callback
    }
}

There are many possibilities here, as this opens up the doors to syntax experimentation without actually having to modify the language grammar itself.  This is very similar to macros in languages like Lisp and Nemerle.  On one hand, it’s more constrained to ensure that each extensibility point always conforms to a common set of structural principles, but the variety of structural extensibility points make it extremely versatile.

I have a few more ideas for language extensions (or “syntactic sugar shaping”) for other types of clauses (we only covered method invocation here), but I’m going to save that for another article.

Next Steps

This article took some long strides toward defining how Archetype handles type and language extensibility, positioning Archetype as an incredibly flexible and malleable tool with which to define syntactic patterns for solving entire classes of problems more intuitively and elegantly.

I created a CodePlex project for Archetype to give it a home.  Over the past few weeks, I’ve created a C# 4.0 parser using the M language to prepare me for the construction of Archetype’s parser and compiler.  The C# parser definition is available for download on the CodePlex site.  If you’re curious about how languages are parsed and projected into Abstract Syntax Trees (ASTs), download this and open it in Intellipad.  You can download Intellipad for free at this Microsoft site.

Once you’ve opened it in Intellipad, find where it says “M Mode” in the upper-right corner.  Click that, and select DSL Grammar Mode.  Then open the DSL menu and select “Split New Input and Output Views”.  You’ll see three window panes.  On the left, you can enter or paste in some C# code.  The middle contains the C# grammar.  On the right, you’ll see a graph of objects that represents how the parser views your code.  It’s pretty interesting to see what it comes up with!

[screenshot: Intellipad’s three-pane view, with C# input on the left, the grammar in the middle, and the output graph on the right]

My next steps involve cleaning up the Unicode character class issues in the grammar, and then getting it to build an object-graph in memory based on the M Graph (the contents of the right window pane).  After that, I can work on generating C# code from that AST.  I’ll end up with a C# to C# converter, which seems silly, but I’ll eventually fork the M grammar into its own Archetype grammar and start to change the parser to accommodate the new language.  The output will remain C#, since that’s easy to compile to assemblies with the csc.exe compiler, but the input will be the new goodness of Archetype.

Future articles will detail this work, as well as defining new corners of Archetype language syntax.

[Part 6 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation | 14 Comments »

The Archetype Language (Part 4)

Posted by Dan Vanderboom on May 8, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Conditional Selection

The if statement has been a classic across so many languages.  In Archetype it is almost identical to C# syntax.

if (expression)
    statement;

if (expression)
{
    // expression is true
}
else
{
    // expression is false
}

When conditions become complicated, reversing all of the boolean logic can be tricky.  A common way of reversing it is to surround the expression in parentheses and placing a unary not operator before it.  With the required parentheses around the if statement’s condition, it looks like this:

 

if (!(expression))
    statement;

In Archetype, the exclamation point can be placed before the parentheses.  It is the only part of the condition that can appear outside the parentheses.

if !(expression)
    statement;

Pattern Matching

C-style languages have long had a construct, switch-case, that provides access to simple jump tables combined with a syntax better suited than if for matching many conditions.  Unfortunately, it has been limited to matching against value types (and string in C#), and against constant values at that.  That’s unfortunate because the more concise syntax for multiple matching values is valuable in itself, not only when the values being matched are constants.  The constraint is due to the way those compilers build jump tables; it’s a performance optimization technique designed during a time when 8 MHz processors were considered fast.

Pattern matching is one area where functional languages have been very strong.  Archetype has a match keyword that serves the purpose.

match (text) "BEGIN" -> HandleBegin();

This first example matches a simple string to a constant value, and calls HandleBegin if there is a match.  You could write this using an if statement as well.  Here is the equivalent code:

if (text == "BEGIN") HandleBegin();

This next example illustrates several ideas.  There are multiple conditions, and some of the conditions are grouped together (with the or operator, |) to share the same reaction code.  The condition or conditions are listed first, followed by the -> operator, and a statement or code block of statements on the right specifies the reaction code.  Also note the numeric range 6..10 and the use of non-constant values (such as x).  Any valid expression is allowed here as long as its type matches (or can implicitly cast from) the type of the term being evaluated (number).  It’s also worth mentioning that the | operator isn’t necessary before each set of conditions as it is in other functional languages.  (I’d rather align the left edge of code to the beginning of each condition.)

var x = 1;
var number = 4;

match (number)
{
    x -> number += 3;
    3 | 5 -> number++;
    2 | 4 | 6..10 ->
    {
        number--;
        Log("Numbers are getting too big");
    }
}
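For comparison, here is roughly what the same matching logic requires in today’s C#, where switch can’t be used at all because x and the range 6..10 aren’t constants.  This is just an illustrative hand-translation (Log is a stand-in for a logging call), not anything the compiler generates.

using System;

class MatchComparison
{
    static void Log(string message) { Console.WriteLine(message); }

    static void Main()
    {
        var x = 1;
        var number = 4;

        if (number == x)
        {
            number += 3;
        }
        else if (number == 3 || number == 5)
        {
            number++;
        }
        else if (number == 2 || number == 4 || (number >= 6 && number <= 10))
        {
            number--;
            Log("Numbers are getting too big");
        }
    }
}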

Unlike switch, Archetype’s match doesn’t automatically fall through from one match to the next.  This feature is rarely used with switch and is a significant source of programming defects.  For the few scenarios where you’d like to execute every branch that matches, I’m considering a match all construct which would look like this:

match all (text)
{
    "BEGIN" -> LetsBegin();
    (Letter | Digit)* -> AddIdentifier(text);
}

In this example, both LetsBegin and AddIdentifier would be called.

Regular Expression Literals

Archetype also supports regular expression literals, with syntax based on Microsoft’s M language, made possible by list syntax and operator overloads defined in a library.  In many scenarios, Archetype can determine the difference between string and regular expression literals.  However, in simple cases, such as matching against a single character or a simple string of characters, the variable being defined will require the regex type to be specified.

var BeginToken regex = "BEGIN";

Without this qualifier, BeginToken would look like a string to the compiler.  To ease this problem, Archetype will convert a string to a regex object if the string participates in a regex-typed expression.  Here’s an example:

var BeginToken = "BEGIN";

var MyRegEx = BeginToken | ("A".."Z")*;

The range of letters and the * Kleene operator (which means repeat 0 or more times) identifies MyRegEx as a regex identifier.

Let’s take a look at how regular expressions and regular expression literals work with the match construct.  First we see the literal embedded directly in the match statement, as the only value to match against.

match (text) ("A".."Z" | "a".."z")* -> DoSomethingUseful();

Next, we’ll look at how we can define the regex objects and use their identifiers.  In this way, we can build up libraries of interdependent regular expressions and go even so far as to write sophisticated parsers.  This is an important tool for fulfilling the goal of language-oriented development.

 

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* -> AddIdentifier(text);

}

Finally, you can apply a when clause as a pattern matching guard similar to F#.

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* when (text.Length < 15) -> AddIdentifier(text);

}

This isn’t the last word on pattern matching or regular expressions in Archetype.  This is one area I expect to evolve and grow, and to appear in future articles.

Agents, Classes, and Traits

Archetype is a multi-paradigm language, as most commonly-used languages are today.  While it has many features which are functional, it’s heavily influenced by object-oriented design ideas.  Most object-oriented languages are largely imperative rather than functional.  That is to say, it is “programming by side-effects” rather than the goal in functional programming of “no side-effects” (or as few as possible).

Functional programming has grown in popularity relatively recently, considering it has been around since the beginning of high-level language design.  However, it suffers in some areas, such as representing stateful behavior in user interfaces.  Some clever solutions have been devised (such as the use of monads to trick or fake the logic into representing state in a purely functional way), but the theory and application of these patterns are far from intuitive.  I believe this is largely the reason why functional programming languages have remained the niche specialty of scientists and mathematicians rather than the everyday developer.

Because of these contentious forces, Archetype is aimed at being a transitional language: urging us forward in the use of functional patterns, but without abandoning the imperative style of “programming by side-effects”, and striving to look familiar to programmers of imperative languages such as C#, Visual Basic, etc.

Software Agents and the Actor Model

There are some built-in Archetype constructs that will help to make object-oriented programming safer.  While it doesn’t propose, like Axum (previously code-named Maestro), to prevent any logic that is unsafe in a concurrent execution environment, it does provide some simple but powerful tools that can be used to reduce the risk considerably.

I’m referring primarily to agents, which support the Software Agent or Actor Model of parallel program design.  Agents are special classes that run independently (in parallel) of each other, and can only communicate with other agents through messages (introduced in part 3).  Specifically, agents are not allowed to call the methods or subscribe to the delegate members of other agents.  Since each agent runs without the ability to receive from or give execution control to other agents, there is a much smaller chance of coordination problems while executing concurrently.
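To ground the idea before looking at Archetype’s syntax, here is a minimal C# sketch of the kind of isolation an agent implies: a class whose only public surface is a mailbox, with messages processed one at a time on the agent’s own task.  The class and message names are mine, purely for illustration; this is not what an Archetype compiler would generate.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// A message is just data; agents never hand each other execution control directly.
class PublishRequest
{
    public string Topic;
    public string Text;
}

class WebDataAgentSketch
{
    private readonly BlockingCollection<PublishRequest> _mailbox =
        new BlockingCollection<PublishRequest>();

    public WebDataAgentSketch()
    {
        // Each agent drains its mailbox on its own task, one message at a time,
        // so its internal state is never touched by two threads at once.
        Task.Factory.StartNew(() =>
        {
            foreach (var message in _mailbox.GetConsumingEnumerable())
                Handle(message);
        }, TaskCreationOptions.LongRunning);
    }

    // The only way in: other agents post messages; they can't call Handle directly.
    public void Post(PublishRequest message)
    {
        _mailbox.Add(message);
    }

    private void Handle(PublishRequest message)
    {
        Console.WriteLine("Publishing '{0}' to topic {1}", message.Text, message.Topic);
    }
}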

In every other way, however, agents are defined and composed just like classes.  First, we’ll take a look at a simple Customer class.

Customer object, IDisposable
{
    FirstName string;
    LastName string;

    this(FirstName, LastName)
        set all;

    Dispose()
    {
        // clean up
    }
}

The class name appears first, followed by a required base type (object here), and a list of interfaces separated by commas.

Archetype supports single-class inheritance with the ability to implement any number of interfaces (or traits, more about that later in this article), as well as any generic type parameters and generic type constraints that you’re used to.

We see some new things here, however.  The instance constructor is called this, and the set keyword is used to set the values of class members where they match constructor parameters of the same name.  This is the same as writing these lines:

this.FirstName = FirstName;
this.LastName = LastName;

With many parameters in a constructor (or another method), this can save many lines of typing.  If you only want to store some of the parameters in properties, you can use a comma separated list: set FirstName, LastName.  If your intent is to set all parameters, you can use the abbreviated set all instead, as shown in the example.  When set all is used, specifying parameter types is optional.

The following code provides an example of two agents that cooperate with each other.

WebDataAgent agent
{
    Subscriptions Dictionary<string, List<guid>>;

    this()
    {
        // initialize the agent
        Subscriptions = Dictionary<string, List<guid>>();
    }

    Subscribe in message(Topic string)
    {
        if !(Topic in Subscriptions.Keys)
            Subscriptions.Add(Topic, new List<guid>());

        Subscriptions[Topic].Add(me.Client);

        // confirm the subscription
        SubscriptionConfirmed(me.Message);
    }

    SubscriptionConfirmed out message(RequestID guid);

    PublishMessage in message(Topic string, Message string)
    {
        loop (var sub in Subscriptions[Topic])
        {
            MessagePublished(Topic, Message);
        }
    }

    MessagePublished out message(Topic string, Message string);
}

UserInterfaceAgent agent
{
    CurrentView IView;

    this(StartView IView)
        CurrentView = StartView;

    RequestData out message(RequestID guid, Method string);

    DataReceived in message(RequestID guid, Result List<double>)
    {
        // handle incoming message…

        // unhook this message handler
        DataReceived -= me;
    }
}

I have more ideas for actor-based programming (such as a built-in RequestID: me.id), but I want to start simple and force myself to work hard to justify any overhead.

Traits

Composing classes together to obtain maximum reuse of code has been a goal of object-oriented programming for a long time, but it usually falls short of the ideal.  Languages like C++ that support multiple inheritance are unwieldy due to the additional complexity (see the Diamond Problem), and single inheritance—though sufficient in most scenarios—suffers from limitations that have bothered OOP programmers from the beginning of programming time.  Other languages have introduced constructs like Flavors and Mixins, and each has had to deal with its own peculiarities and workarounds.  While an in-depth discussion of traits and their advantages over other approaches is beyond the scope of this article, a smart group at the OGI School of Science & Engineering published a paper that illustrates the issues clearly.  In it, they explain how traits solve many of the problems while avoiding the pitfalls of other approaches.

I found this characterization to be particularly lucid (the bold emphasis is mine):

Although multiple inheritance makes it possible to reuse any desired set of classes, a class is frequently not the most appropriate element to reuse.  This is because classes play two competing roles.  A class has a primary role as a generator of instances: it must therefore be complete.  But as a unit of reuse, a class should be small.  These properties often conflict.  Furthermore, the role of classes as instance generators requires that each class have a unique place in the class hierarchy, whereas units of reuse should be applicable at arbitrary places.

– Nathanael Schärli et al., in their 2003 paper entitled “Traits: Composable Units of Behavior”

The basic idea is that a trait defines a set of functions but no state.  Multiple traits are pulled into a class, where they are “flattened”.  This means that each trait’s functions are added to the class as if those functions were defined directly in the class.  That is, you don’t need to use a member access operator (.) to navigate from the class to the trait and then to the trait’s function.  In doing this, it’s possible for function names and signatures to overlap among traits.  If the name is the same but the signature is different, they’re applied as overloads.  When there’s an actual clash, a conflict-resolution expression is defined to specify the function to use (or ignore).
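As a point of reference, the closest approximation C# 3.0 offers to a stateless trait is a marker interface plus extension methods defined against it.  The sketch below is only an analogy (members still live in a separate static class, and there is no flattening or conflict resolution), but it shows why behavior-without-state is the part that already maps onto today’s tools.

using System;

// The "trait": no state, just behavior attached to whichever classes opt in.
interface ISerializableTrait { }

static class SerializableTraitMethods
{
    public static string Serialize(this ISerializableTrait self)
    {
        // … real serialization would go here
        return self.GetType().Name;
    }
}

class Customer : ISerializableTrait
{
    public string FirstName;
    public string LastName;
}

class TraitDemo
{
    static void Main()
    {
        var cust = new Customer { FirstName = "Dan", LastName = "Vanderboom" };

        // Reads as though Serialize were defined on Customer itself.
        Console.WriteLine(cust.Serialize());
    }
}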

Although the design of traits involves the lack of any state, Archetype may attempt to include trait-local state.  That is, variables that are visible to the functions of that trait, but which can’t be seen from the hosting class or any other trait.  (This corresponds to the idea of extension properties, which I’ll discuss in the next article.)

This is an experimental area of the Archetype language, one that will likely change several times before getting right.  Here’s an example of what it will probably look like to compose classes out of traits.

Serializable<T> trait
{
    provide Serialize string (obj T) { … }
    provide Deserialize T (input string) { … }
}

Persistent<T> trait
    where T : ref
{
    // by function
    require Serialize string ();
    require Deserialize T (input string);

    // or by trait
    require Serializable<T>;

    provide HasChanges bool
        get, private set;

    provide static Load T (id uid) { … }
    provide Save void () { … }
}

Customer object, Serializable, Persistent
{
    FirstName string;
    LastName string;

    FullName string
        get FirstName " " LastName;
}

Start void ()
{
    var cust = Customer.Load(123);

    cust.FirstName = "Dan";
    cust.LastName = "Vanderboom";

    cust.Save();
}

A few notes about the code:

  • The type parameter on the trait allows you to constrain the types of object to which the trait can be applied.
  • The where T : ref is the same as where T : class in C#.
  • The require and provide keywords specify the methods that trait requires to be present, or provides to the host class.  Archetype may also support a require of an entire trait, which would act analogously to subtyping (or rather, more like an #include).
  • Conflict resolution expressions aren’t shown because their syntax hasn’t yet been decided.
  • Code formatting applies an italic font to traits, but maintains the color of a user-defined type.  This should help to avoid confusing traits with classes.  This is only one possible solution, but it suggests the use of multiple ways of categorizing identifiers to apply a mixture of formatting and colorizing behaviors.
  • This is not a great example of the strength of traits.

I’ve also had some design ideas for runtime mixing of traits into classes, while still accessing everything through strongly-typed variables (think of traits as interfaces), but this will require much more exploration.

Another idea is to support some Aspect-Oriented features.  Imagine if you could define a trait called Bindable that, when added to a class, would add the bindable type extension modifier to all of its properties.

For better examples of traits and class composition using traits, I recommend reading the above-mentioned paper.

Next Steps

In this article, I covered simple conditional statements as well as functional-style pattern matching.  We also looked at agent-based programming built on loosely-coupled messages, which provides greater safety in parallel programming scenarios, and at traits as a way to compose features at a more granular level and solve the composition problems that plague single-inheritance languages.

My next article will cover extension of types (extension methods, properties, events, indexers, constructors, and operators), as well as the first of language extensibility options (defining new control structures).  I will probably dip out of sight for a few weeks as I get further along in building the parser and compiler, and learn about Visual Studio’s Managed Language Services.

If you don’t already follow me on twitter (@danvanderboom), I do a lot of tweeting about what I’m reading, researching, or considering during the language design process, so this is a good way to get an inside look at that process.

[Part 5 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation | 7 Comments »