Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 5)

Posted by Dan Vanderboom on May 24, 2010


This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Type Extensions

If you’re unfamiliar with extension methods in C# or other languages, this section might blow your mind a little bit.  If you love and use extension methods all the time, and don’t know what you’d do without them, my hope is that you enjoy the power that Archetype unleashes with robust type extensibility.

If you’re in the unfamiliar group, type extensions are a way of adding new members to existing types, regardless of whether those types were defined in the same assembly as the extensions or in a different assembly.  Contrary to what the name may suggest, no modification of the original type actually occurs; instead, the compiler translates each call to an extension member into a call to an ordinary static method, and Intellisense is updated to make it appear that those additional members are available for an instance of that type.  Are there methods you’d like to call on any string object?  With extension methods, you can add methods and use them as if they belonged to that class.  Here’s how it looks in C#:

public static class MyExtensions
{
    // returns string (not void) so the result can be assigned by the caller
    public static string ShuffleLetters(this string Text)
    {
        // …
    }
}

Here we can say var result = "Hello there".ShuffleLetters(); and the dot triggers Intellisense to pop up and show our ShuffleLetters method.
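To make the elided body concrete, here is a minimal sketch of what ShuffleLetters could look like. The Fisher-Yates shuffle and the optional Random parameter are assumptions added for illustration; they are not part of the original example. The demo also shows that the extension call is only another spelling of an ordinary static call.

```csharp
using System;

public static class MyExtensions
{
    // Returns a new string with the same letters in a random order
    // (Fisher-Yates shuffle). The optional Random parameter is an
    // addition for this sketch so callers can make results repeatable.
    public static string ShuffleLetters(this string text, Random random = null)
    {
        if (text == null) throw new ArgumentNullException("text");
        random = random ?? new Random();

        char[] letters = text.ToCharArray();
        for (int i = letters.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);   // 0 <= j <= i
            char tmp = letters[i];
            letters[i] = letters[j];
            letters[j] = tmp;
        }
        return new string(letters);
    }
}

public static class ShuffleDemo
{
    public static bool SameResultEitherWay()
    {
        // Extension-call syntax and the equivalent static call,
        // seeded identically so the two results can be compared.
        string a = "Hello there".ShuffleLetters(new Random(42));
        string b = MyExtensions.ShuffleLetters("Hello there", new Random(42));
        return a == b;
    }
}
```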

Extension methods are great.  But if you really embrace them and start thinking in terms of opportunities for extensions, you’ll run into a few brick walls.  You see, extension methods are just a tease; they’re merely the tip of the iceberg, one isolated fragment of a larger (and seemingly happier) family.

First you begin to wish you could add a property instead of a method, so you don’t have to put up with parentheses or a Get- prefix on a method that is standing in for a property.  You might see distasteful expressions like cust.HasChanges(), and there’s nothing you can do about it.

Then you’ll be working with a static class, and you’ll wish you could add a static method, but you can’t add static members.  Eventually, you’ll run into that scenario where an additional operator or constructor would be the perfectly elegant way to solve the current problem.  But you’ll resign yourself to something kludgy instead.

So having gone down a similar road, I was more than a little frustrated when C# 4.0 was released with no new type extensibility at all.  This is one of the stimulants to my starting the Archetype language project: the crystallization of knowledge that no other language team was likely to evolve in the direction and with the priorities that I’ve been developing in my head over the years.  I’m beyond the point where I’m willing to just wait and see what happens.

C# has a clever approach to designating a method as an extension.  However, it’s somewhat indirect.  Instead of saying something like “extend” or “extension” near a class definition, they impose several syntactically-unrelated requirements:

  1. The method must be static (and public if it is to be used from other assemblies).
  2. The first method parameter’s type is the type to be extended.
  3. The first method parameter must be prefixed by the “this” keyword.  This hints that we’re adding an instance member.
  4. The class the method is defined in must be static, non-generic, and not nested (and public if it is to be used from other assemblies).

You would never guess these requirements, and it’s easy to get one of them wrong and skip a beat fixing it.  Now try to add extensions for properties, operators, constructors and finalizers, indexers, and possibly fields, and then throw in static members.  How will you designate all of these different things?  Lots more cleverness, I’d say, but it’s not likely to be syntactically scalable.

When we design syntax, it’s helpful to design a whole family of related capabilities together.  You can see Archetype’s approach in the following example:

Customer object
{
    FirstName bindable string;
    LastName bindable string;

    this (FirstName, LastName) set all;
}

Customer extension
{
    BirthDate bindable DateTime;

    FullName bindable string
    {
        get composite FirstName " " LastName;
    }

    this (FirstName, LastName, BirthDate) set all;

    static BuildCustomer Customer (FullName string)
    {
        var i = FullName.IndexOf(" ");
        assert i >= 0;
        return new Customer(FullName.Substring(0, i), FullName.Substring(i + 1));
    }
}

There’s a lot happening here, so let’s go over it one step at a time.

  • The original class we’re going to extend, Customer, is defined first.  Two bindable properties and a constructor, nothing more.
  • Creating a wrapper for a class extension doesn’t require that you remember several clever tricks.  Instead, you simply write “ClassTarget extension”, and everything within the structure is considered an extension of that type.
  • BirthDate is an extension property.
  • FullName is also an extension property; however, the composite keyword, combined with references to FirstName and LastName, requires that the Archetype compiler look at the target class as well as the extension class to resolve identifiers.  The compiler must also wire the binding infrastructure so the target object stimulates the FullName property to update when either composite part does.  Members referenced on the target class must be visible to an external class: private and protected members can’t be referenced from an extension, for example.
  • The this method is an instance constructor.  set all sets FirstName and LastName on the target object and BirthDate on the extension object.  Constructor methods don’t require a return type, as they are assumed to be the same as the type they’re defined in.
  • BuildCustomer is a static factory method.  The assert keyword is a way to define checkpoints to ensure that conditions are what they’re expected to be, which are especially valuable at the beginning and end of methods.  The basic idea is that you’d be able to define their behavior to throw an exception, log a message, or whatever you like when they’re violated: in debug mode, or in production code.  More about this construct in a future article.
  • Operators are also supported, but are not shown in this example.  To see how custom operators are created, see Part 7 of this series for a detailed explanation.
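For comparison, here is a rough sketch of how close C# 4.0 gets to the same extension. The class name CustomerExtensions is hypothetical, and the plain auto-properties stand in for Archetype’s bindable properties (no data binding is wired up here). FullName has to become a method, and the factory has to be called through the helper class rather than through Customer itself.

```csharp
using System;

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public Customer(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

public static class CustomerExtensions
{
    // C# 4.0 has no extension properties, so FullName must be a method.
    public static string FullName(this Customer customer)
    {
        return customer.FirstName + " " + customer.LastName;
    }

    // C# 4.0 has no static extension members either: callers write
    // CustomerExtensions.BuildCustomer(...), not Customer.BuildCustomer(...).
    public static Customer BuildCustomer(string fullName)
    {
        int i = fullName.IndexOf(" ");
        if (i < 0) throw new ArgumentException("Expected \"First Last\".", "fullName");
        return new Customer(fullName.Substring(0, i), fullName.Substring(i + 1));
    }
}
```

Extension constructors and extension fields such as BirthDate have no direct C# counterpart at all, which is the gap the Archetype syntax is meant to close.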

While extension methods are simple to implement in a compiler, extension fields are a little bit more complicated.  Each one is transformed into a dictionary keyed by the target object, as shown below.  In addition to this transformation, it is necessary to remove Dictionary entries that refer to objects that have been garbage collected, to prevent these dictionaries from growing uncontrollably.  When one or more extension properties are used in an assembly, a background worker thread might occasionally check the keys in these dictionaries to see if any refer to garbage collected objects, and remove them.

The BirthDate property (with its internal storage field) in the above extension class is converted into something like this:

_BirthDates field Dictionary<Customer, DateTime>;

GetBirthDate DateTime (cust Customer)
{
    if (_BirthDates.ContainsKey(cust))
        return _BirthDates[cust];

    // no value has been stored for this customer
    return default(DateTime);
}

SetBirthDate void (cust Customer, BirthDate DateTime)
{
    // storing the default value is the same as storing nothing,
    // so drop the entry instead of keeping it
    if (BirthDate == default(DateTime))
    {
        if (_BirthDates.ContainsKey(cust))
            _BirthDates.Remove(cust);

        return;
    }

    if (!_BirthDates.ContainsKey(cust))
        _BirthDates.Add(cust, BirthDate);
    else
        _BirthDates[cust] = BirthDate;
}

The BirthDate property would handle data binding and then call one of these two methods (or simply include their logic within the property get and set methods).  Another possibility is to instantiate the internally-named extension class and use that as the value in a dictionary.

A runtime mechanism tracks instances of extension objects and removes them from the dictionary periodically.  System.WeakReference provides the means to do this.  It involves two WeakReferences per object: a short weak reference whose target becomes null and signals the need to clean up, and a long weak reference to use as the dictionary key to clean up.  This mechanism would be loaded only when necessary, and some configuration of its cleanup behavior will be made available.
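As a point of comparison, and not the WeakReference scheme described above, .NET 4.0 ships System.Runtime.CompilerServices.ConditionalWeakTable, which attaches state to an object without keeping that object alive and drops the entry once the key is collected. The sketch below uses it to back a BirthDate extension value in C#; the class names are hypothetical, and the holder class follows the idea of storing an extension object as the dictionary value.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Customer { }

public static class CustomerBirthDateExtension
{
    // Holder for the extension state. ConditionalWeakTable requires
    // a reference type as its value, so DateTime is wrapped in a class.
    private sealed class ExtensionState
    {
        public DateTime BirthDate;
    }

    // Keys are held weakly: an entry disappears after its Customer is
    // garbage collected, so no background cleanup thread is needed.
    private static readonly ConditionalWeakTable<Customer, ExtensionState> State =
        new ConditionalWeakTable<Customer, ExtensionState>();

    public static DateTime GetBirthDate(this Customer customer)
    {
        ExtensionState state;
        return State.TryGetValue(customer, out state)
            ? state.BirthDate
            : default(DateTime);
    }

    public static void SetBirthDate(this Customer customer, DateTime birthDate)
    {
        // GetValue creates the holder the first time a customer is seen.
        State.GetValue(customer, key => new ExtensionState()).BirthDate = birthDate;
    }
}
```

The trade-off is that the table compares keys by reference identity, which is the behavior wanted for per-instance extension fields but differs from a Dictionary that honors an overridden Equals.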

Static field and property storage would be easier to implement and wouldn’t require any cleanup.

There are a few members that may not make sense to provide as extensions.  Static constructors may provide some value to the extension class, but they wouldn’t be able to reach into the target object at static constructor runtime.

Once the wrinkles of implementation detail are worked out, this rich ecosystem of type extensions will open up clean and elegant solutions for adding missing parts from types that you know belong there.

Custom Control Structures

Every so often, I end up writing a function that behaves like a control structure with the block of statements passed in as a lambda function.  I’ve done this to spin up a thread to run code in (described in this article), and discussed what Parallel.For would look like as a custom control structure in my article discussing the future of programming languages.  My idea there was to define an extension method in such a way, with the delegate at the end, that the compiler would treat the delegate as a separate closure:

public static class Parallel
{
    public static void For(long Start, long Count, Action Action)
    {
        // …
    }
}

This is how you’d use it currently in C#:

Parallel.For(0, 10, () =>
{
    // add code here for the Action delegate parameter
});

My proposal was to use it like this instead:

Parallel.For(0, 10)
{
    // add code here for the Action delegate parameter
}

The point I made in that article is worth repeating.  First, a word from Anders at PDC08:

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute.  And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming.  And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before.  I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0.  And he’s right: if you squint, it almost looks like new syntax.  The problem is that programmers don’t want to squint at their code.  As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look.  This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language [now called M].
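To make the “function that behaves like a control structure” pattern concrete, here is a minimal sketch of a For method built on the Task Parallel Library. It is named MyParallel to avoid clashing with the real System.Threading.Tasks.Parallel class, and it passes the index to the body (Action<long> rather than the parameterless Action shown earlier); both choices are assumptions made for this example.

```csharp
using System;
using System.Threading.Tasks;

public static class MyParallel
{
    // Runs body once per index in [start, start + count), each call as
    // its own Task, and blocks until all of them have finished.
    public static void For(long start, long count, Action<long> body)
    {
        var tasks = new Task[count];
        for (long n = 0; n < count; n++)
        {
            long index = start + n;   // fresh copy per iteration for the lambda to capture
            tasks[n] = Task.Factory.StartNew(() => body(index));
        }
        Task.WaitAll(tasks);
    }
}
```

A call such as MyParallel.For(0, 10, i => Work(i)); still carries the lambda arrow and the closing parenthesis after the block, which is exactly the residue that has to be squinted away.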

Ruby and Groovy also support closures that are supplied external to the method arguments.

Where I originally suggested that a first or final delegate parameter should be automatically supported as an external closure, there are a couple reasons to be more explicit.  Consider the following Archetype syntax:


fork<T>(items IEnumerable<T>, action Action<T> closure)
{
    // create Task for each item
}

With this global function, I can write parallelized forking code as though it were a part of the Archetype language.  With the Keyword attribute, I can even colorize the function name as if it were a built-in keyword.



fork (customers)
{
    // work with each customer
}

But the main reason to explicitly mark it as a closure is so it can be treated as an embedded statement, such that keywords like return and break behave within the context of the containing scope.  In other words, if I use return within one of these closures, my intention is to return from the method that closure lives in, not to return out of the closure itself: that’s what the break keyword is for.
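The difference is easy to see in C# today, where a lambda is always a separate function. In the sketch below (all names are hypothetical), the built-in foreach lets return leave the enclosing method, while the lambda-based ForEach stand-in cannot: return only leaves the lambda, so the result has to be passed out through a captured local and the loop keeps running.

```csharp
using System;
using System.Collections.Generic;

public static class ReturnDemo
{
    // A lambda-based stand-in for a control structure.
    public static void ForEach<T>(IEnumerable<T> items, Action<T> body)
    {
        foreach (T item in items) body(item);
    }

    // Built-in foreach: return leaves FirstNegativeBuiltIn immediately.
    public static int? FirstNegativeBuiltIn(IEnumerable<int> numbers)
    {
        foreach (int n in numbers)
        {
            if (n < 0) return n;
        }
        return null;
    }

    // Lambda version: return only leaves the lambda, so every element is
    // still visited and the answer is carried out in a captured variable.
    public static int? FirstNegativeLambda(IEnumerable<int> numbers)
    {
        int? found = null;
        ForEach(numbers, n =>
        {
            if (found != null) return;   // behaves like "continue"
            if (n < 0) found = n;
        });
        return found;
    }
}
```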

Inside the closure, the it keyword assumes the role of the current object from the collection specified (customers).

This is starting to look pretty good, but the it keyword is a crutch if you think about it.  We only need it because we don’t have something nicer like the expression I introduced in Part 3 of this series when I talked about iterating with loop.  You may recall there were basically two formats for specifying the bounds and behavior of the loop.

// loop through and reference each object in an IEnumerable
loop (var cust in Customers)
{
}

// i starts at 11, decrements by 2, until it reaches (or passes) 1
loop (var i in 11..1 skip 2)
{
}

To make a custom control structure look and behave like one of the built-in variety, there must be a way to indicate in the parameter list that such an argument is required.  So let’s say that we introduce an iterator keyword to indicate we want to support either of the syntaxes shown above.


fork<T>(items IEnumerable<T> iterator, action Action<T>)
{
    // create Task for each item
}

This allows the function’s invocation to define an item identifier which is exposed to the following closure.  We could then very naturally write:

fork (var cust in customers)
{
    // work with each customer
}


Now we can reference an identifier that makes sense to us, cust, and the whole thing looks like it’s baked into the language.  Voilà!

There’s another kind of iterator that pertains to coroutines that I haven’t discussed yet, but in Archetype I call them streams, so there shouldn’t be too much confusion between them.

Named Closures

Let’s take this to the next level.  What if we wanted to extend our control structure with a second closure that would execute when all of the tasks that were forked had been completed or canceled?  This would complete the fork-join concurrency pattern.  Consider the following syntax:


fork<T> void (items IEnumerable<T> iterator, ForkAction Action<T> closure,
    JoinAction Action<TaskList<T>> closure as join)
{
    // 1. schedule Task for each item
    // 2. when all Tasks have been completed or canceled,
    // 3. invoke JoinAction
}

Here is how we use it, taken from Part 3 in the series:

// fork out a bunch of parallel tasks and join when all are done
fork (var cust in Customers)
{
    // this code is encapsulated in a task in the TPL
    // and scheduled for execution
}
join (tasks)
{
    // this code block is executed when all of the tasks
    // are either completed or canceled
}

One thing I haven’t mentioned yet is Archetype’s support for optional parameters.  They work the same as in C# or VB.NET, and come in handy here.  By making the second closure parameter optional (by adding “ = null” after “closure as join”), we can now use fork alone, or fork-join together, in a single function definition.  If we don’t enclose it in a specific namespace or class, it will look exactly like a language keyword.
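The three numbered steps in the fork definition, plus the optional join parameter, can be sketched in C# on top of the Task Parallel Library. This is only an approximation under stated assumptions: the class name Concurrency is hypothetical, a plain Task[] stands in for the TaskList<T> type mentioned above, and the method returns a Task so callers can wait for the join step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Concurrency
{
    // 1. schedules forkAction as a Task for each item,
    // 2. waits (without blocking the caller) until all Tasks have finished,
    // 3. invokes joinAction, if one was supplied.
    public static Task Fork<T>(IEnumerable<T> items, Action<T> forkAction,
                               Action<Task[]> joinAction = null)
    {
        Task[] tasks = items
            .Select(item => Task.Factory.StartNew(() => forkAction(item)))
            .ToArray();

        Action<Task[]> join = finished =>
        {
            if (joinAction != null) joinAction(finished);
        };

        // ContinueWhenAll rejects an empty array, so handle that case directly.
        if (tasks.Length == 0)
            return Task.Factory.StartNew(() => join(tasks));

        return Task.Factory.ContinueWhenAll(tasks, join);
    }
}
```

Because joinAction defaults to null, Concurrency.Fork(customers, cust => ...) covers the fork-only case and adding a third argument covers fork-join, mirroring the single Archetype definition.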

Another example was brought to my attention (in the comments for article 2).  The idea is that a predicate is defined in a closure, syntactically separate from the argument list, inspired by the language Groovy.

print("trying some new syntax") { it.Length > 5 };

But let’s convert this one step at a time.  If we use Archetype’s concept of a named closure, we could insert if to make the intent more obvious:

print void (Text string, Predicate bool(string) closure as if)
{
    if (Predicate(Text))
    {
        // …
    }
}

print("trying some new syntax") { return it.Length > 5 };

The curly braces, though fantastic for multiple-statement blocks of logic, are overkill for simple expressions (and so is the return keyword).  Unless I uncover a good reason not to, I’m inclined to allow Archetype to trade the curly braces for parentheses for simple expressions.  We could then write this, which is what I was ultimately looking for.

print("trying some new syntax") if (ShouldPrint && it.Length > 5);

I introduced the ShouldPrint boolean variable here to illustrate that the conditions provided here don’t have to relate to it (and in most cases, probably would not).
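For reference, the closest C# equivalent passes the predicate as an ordinary lambda argument. The sketch below is an assumption-laden illustration: the Printer class and the optional TextWriter parameter (added so the output can be captured) are hypothetical, and the lambda parameter is named it only to echo the Archetype keyword.

```csharp
using System;
using System.IO;

public static class Printer
{
    // Writes text only when the predicate accepts it.
    public static void Print(string text, Func<string, bool> predicate,
                             TextWriter output = null)
    {
        if (predicate(text))
            (output ?? Console.Out).WriteLine(text);
    }
}
```

A call then reads Printer.Print("trying some new syntax", it => shouldPrint && it.Length > 5); where shouldPrint is a captured local, just as ShouldPrint is in the Archetype version.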

You may be wondering if using if as the closure’s name would cause a problem with the if keyword in the Archetype language.  It’s not a problem because the closure name appears between the method name on the left and the semicolon on the right.  After the semicolon, an if token would be mapped to the language keyword.

The first closure parameter of a method can be named or nameless, but all subsequent closures must be named.  I can’t think of a reason to limit the number of closure parameters other than that too many would be ridiculous, but I’ll leave that to the discretion of the Archetype developer.  Closure parameters can be defined with params for chaining together arbitrary numbers of named closures.  When calling a method with no parameters except closures, the parentheses after the function name can be omitted.  These methods will be exposed to other languages as having delegate types as well as Iterator and Closure attributes to identify those parameters.

Closure Extension Clauses

Another possibility arises when you consider that the if clause added to the print method could be defined in such a way that it could be applied to any method invocation.  I’m not advocating this particular pattern of placing the if condition after the statement to execute, but it will do for the sake of illustration.  This feature is purely speculative and I’d be curious to hear some feedback.

I could imagine writing a closure extension clause to solve the print problem more generally, something like this:

any (Predicate bool() closure as if)
{
    if (Predicate())
    {
        // …
    }
}

Then as long as it’s in scope, I could write code like this:

Start void ()
{
    import System.Console;

    var debug = true;

    WriteLine("Debugging started") if (debug);

    var name = ReadLine() if (!debug);
    WriteLine("Hello " name "!") if (!debug);
}

One line deserves explanation: name is defined as a string variable regardless of the if clause; it is ReadLine which is executed or not based on the condition specified.
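The semantics described here (the variable is always declared, the call is conditional) can be imitated in C# with a pair of helper methods. This is a sketch under the assumption that a skipped call yields the default value of its type; the Clauses class and its If methods are hypothetical names.

```csharp
using System;

public static class Clauses
{
    // Value-returning calls: the call runs only if the predicate holds;
    // otherwise the result is default(T) (null for a string).
    public static T If<T>(Func<bool> predicate, Func<T> call)
    {
        return predicate() ? call() : default(T);
    }

    // Void calls: the call is simply skipped when the predicate fails.
    public static void If(Func<bool> predicate, Action call)
    {
        if (predicate()) call();
    }
}
```

With these in scope, string name = Clauses.If(() => !debug, () => Console.ReadLine()); declares name either way and leaves it null when debug is true, which also shows why the following WriteLine needs the same guard.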

More interesting examples are possible if we leave the world of expression closures and consider multiple-statement closures.  One possibility involves adding an async closure extension clause, with the callback handled by the supplied closure.

Start void ()
{
    import System.Console;

    async FetchData()
    {
        // respond to callback
    }
}


There are many possibilities here, as this opens the door to syntax experimentation without actually having to modify the language grammar itself.  This is very similar to macros in languages like Lisp and Nemerle.  It’s more constrained, to ensure that each extensibility point always conforms to a common set of structural principles, but the variety of structural extensibility points makes it extremely versatile.
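As one illustration of what an async clause could expand to, here is a minimal C# sketch using the Task Parallel Library: the call runs on a background task and the supplied closure receives its result as a callback. The AsyncClause class, the Async method, and the choice of returning a Task are all assumptions for this example, not a description of how Archetype would implement it.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncClause
{
    // Runs call on a background task, then hands its result to callback.
    // The returned Task completes after the callback has run.
    public static Task Async<T>(Func<T> call, Action<T> callback)
    {
        return Task.Factory.StartNew(call)
                   .ContinueWith(finished => callback(finished.Result));
    }
}
```

The Archetype form async FetchData() { ... } would then correspond to AsyncClause.Async(() => FetchData(), data => { ... }); in C#.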

I have a few more ideas for language extensions (or “syntactic sugar shaping”) for other types of clauses (we only covered method invocation here), but I’m going to save that for another article.

Next Steps

This article took some long strides toward defining how Archetype handles type and language extensibility, positioning Archetype as an incredibly flexible and malleable tool with which to define syntactic patterns for solving entire classes of problems more intuitively and elegantly.

I created a CodePlex project for Archetype to give it a home.  Over the past few weeks, I’ve created a C# 4.0 parser using the M language to prepare me for the construction of Archetype’s parser and compiler.  The C# parser definition is available for download on the CodePlex site.  If you’re curious about how languages are parsed and projected into Abstract Syntax Trees (ASTs), download this and open it in Intellipad.  You can download Intellipad for free at this Microsoft site.

Once you’ve opened it in Intellipad, find where it says “M Mode” in the upper-right corner.  Click that, and select DSL Grammar Mode.  Then open the DSL menu and select “Split New Input and Output Views”.  You’ll see three window panes.  On the left, you can enter or paste in some C# code.  The middle contains the C# grammar.  On the right, you’ll see a graph of objects that represents how the parser views your code.  It’s pretty interesting to see what it comes up with!


My next steps involve cleaning up the Unicode character class issues in the grammar, and then getting it to build an object-graph in memory based on the M Graph (the contents of the right window pane).  After that, I can work on generating C# code from that AST.  I’ll end up with a C# to C# converter, which seems silly, but I’ll eventually fork the M grammar into its own Archetype grammar and start to change the parser to accommodate the new language.  The output will remain C#, since that’s easy to compile to assemblies with the csc.exe compiler, but the input will be the new goodness of Archetype.

Future articles will detail this work, as well as defining new corners of Archetype language syntax.

[Part 6 of this series can be found here.]

14 Responses to “The Archetype Language (Part 5)”

  1. Continuing discussion from part 2…

    I like the curlies instead of yet another layer of parentheses for single-statement functions, but what about scenarios where multiple statements are needed, as in:

    (name, age) =>
    Console.WriteLine(name " is " age);

    name, age =>
    Console.WriteLine(name " is " age);

    You’re right. Multiline closures are ugly with that syntax. I think they are still better though. They help avoid this: ));})};


    Console.WriteLine(name " is " age);

    Actually, I do not like ASCII art like “=>”. On the other hand a lambda is a projection and therefore an arrow fits, just not in case of () => …, but that you said you’ll omit anyway.

    • Dan Vanderboom said

      The => operator has grown on me in C#, but technically I don’t think it’s needed. Defining an anonymous method without it is an interesting idea, and doesn’t seem to hurt readability (or parse-ability).

      Multiple statements:

      (name, age)
      {
          Console.WriteLine(name " is " age ".");
          Console.WriteLine("Happy birthday!");
      }

      Or a single-statement:

      (name, age) { Console.WriteLine(name " is " age "."); }

      I could get used to this as well. I’m wondering if this wouldn’t actually make more sense, since it’s more consistent with normal function syntax.

      With no parameters, () could be omitted:

      { Console.WriteLine("Hello!"); }

      I’m really liking this. Time to make some adjustments to my parser.

      • C# allows {} as a statement just for scoping, right? but who needs that? 🙂
        {..} is just an expression of .. with implicit conversion to a closure.. ?
        and (){..} is a function expression hence convertible to a closure…

      • Dan Vanderboom said


        There are differences, though. In a function closure, return exits the closure. In embedded statement closures, which appear after control structures like for, foreach, switch, etc., return exits from the containing function and not from the immediate closure.

  2. just reminding about “names first”
    extension class Xy
    ‘Xy’ is far from first here 🙂

    What about class Xy+{...} 🙂

    • Dan Vanderboom said

      I intended the name-first policy to apply to all type members, not all elements of the language. It would look odd indeed to see this:

      System import;
      System.Diagnostics import;

      How does it look for a type definition, though?

      Customer : class

      That’s actually pretty good.

      An + extension symbol attached to the class name is a little cryptic. The fact that the following type definition is an extension and not a standalone class is a big deal, and deserves a stronger visual cue.

      CustomerEx : Customer extension

      This looks decent. But you have to consider how it will look with modifiers and added interfaces at the same time.

      public class StringExtension : IEncryptedString
      public Customer : class inherits ModelObject, INotifyPropertyChanged
      Customer : public class of ModelObject, INotifyPropertyChanged

      The first option is familiar and comfortable. The second is a little painful with a verbose “inherits” keyword (like Java “extends”). In the third, the “of” keyword is nice and short, but “public” is probably better off preceding the Customer identifier than fitting into an already-crowded right-hand side (of the : colon).

      This is what I like best so far:

      public Item : class, INotifyPropertyChanged
      public Customer : ModelObject class, INotifyPropertyChanged
      public CustomerEx : Customer extension, IPersistent

      • Does an extension-class need a name?

        Customer : extension class {} would be enough, wouldn’t it?

        What do you need “class” for, when you derive from a class?
        public Customer : class, INotifiedPropertyChanged
        public Customer : ModelObject, INotifiedPropertyChanged
        public Customer : extension, IPersistent

        agree with ‘public’ preceding the class name…

      • Dan Vanderboom said


        As soon as I finished writing my reply favoring this syntax:

        public Customer : class, INotifiedPropertyChanged

        … I realized that the word “class” could be eliminated from the language altogether. Since it sits in the position of the base class identifier, “object” could be used instead. Then I came up with the idea of using “value” in the same position to define value types (structs).

        public Customer : object, INotifyPropertyChanged
        public Vector2D : value

        All of the “base types” would then be: value, object, variant, trait, and agent.

        I’d actually rather not name extension classes, though I wonder if an inability to reference the extension type will cause problems elsewhere. I’m going to go with your suggestion unless I come across a contraindication down the road.

      • Agree, ‘object’ is fine, had a thought on it, too.
        ‘class’ and ‘struct’ are again just better known.

        Just a thought: ‘mutable’ and ‘immutable’

      • Dan Vanderboom said

        @lars, I gave serious consideration to making state members immutable by default, but I thought that might be going too far in pushing unfamiliar approaches. There are many situations where immutability is a powerful tool for those who know how to use it, so I want support for it in the language, just not as a default. Virtual is default, public will likely be default, and instance of course is default. I even considered making all value types nullable by default unless overridden, with the justification that it would help to align .NET types with database types, but decided it wasn’t worth the additional runtime cost for all state.

  3. I don’t like the “static” before names…

    For static classes, the static could just be omitted, right?

    And for other classes, what about a static-block? Mixing static and instance members is unreadable anyway.

    MyClass : class {
      static {
        field : int;
        Method : BuildCustomer(name:String) {..}
      }
    }

    What do you think?

    • Dan Vanderboom said


      What I think I hear you saying is that once the static keyword has been applied to a class, it shouldn’t be necessary to also mark every member of the class as static, as is the case with C#. If I’m interpreting that correctly, then I agree with you completely. And a static block is a fantastic idea! That seems like a great way to keep the static portion of a class, which has additional constraints and behaves like a different animal, from mixing equally with instance members. This is precisely the kind of visual cue that I think will help human brains more rapidly decipher the code they’re working with.

      The question now is where to put constants. Should they be forced into static blocks or allowed to mingle with instance members?

      Thanks for all the suggestions!

      • Exactly! 🙂

        You could do an optional public-block, too. But this might get hairy 😦

        Customer : object {
          static {
            public InstanceCount : int get _lastInstanceNo;
          }
          public {
            _instanceId = _lastInstanceNo++;
          }
        }

        Is a protected/public member virtual by default, btw?

      • Consts are ugly 🙂 And they have problems people forget – I think they are inlined into referencing assemblies at compile time, right?

        Since consts are static by default (although it doesn’t hurt to put static on them, right?), I think I would have them put in the static block. A const block is too much. So it would be:

          static {
            const ID_PREFIX : string = "cst-";
          }

        Or, if there is a static block, they have to be written inside it. Else they can be written at instance member level.
