Critical Development

Language design, framework development, UI design, robotics and more.

Archive for the ‘Archetype Language’ Category

The Archetype Language (Part 9)

Posted by Dan Vanderboom on October 3, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on Twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Params & Fluent Syntax

C# has a parameter modifier called params that allows you to supply additional function arguments to populate a single array parameter.

void Display(params string[] Names)
{
    // …
}

Without the params modifier, we’d have to call it like this:

Display(new string[] { "Dan", "Josa", "Sarah" });

Because params is declared, we can do this instead:

Display("Dan", "Josa", "Sarah");

If there’s one thing you can take away from Archetype’s design, it’s that syntactic sugar is everything.  After examining my own procedural animation library (Animate.NET) to see how it could be used best in Archetype, I came to the conclusion that the argument lists passed to these params parameters can be substantial.  When they are, they create syntactic unpleasantries, especially when nested structures are involved.

Consider the following C# example.

var anim =
    Animate.Wait(0.2.seconds(),
        RedChip.MoveBy(0.4.seconds(), -40, 0),
        RedChip.FadeIn(0.2.seconds()),

        BlackChip.MoveBy(0.4.seconds(), 0, 40),
        BlackChip.FadeOut(0.4.seconds())
    )
    .WhenComplete(a =>
    {
        MainStage.Children.Remove(RedChip);
        MainStage.Children.Remove(BlackChip);
    })
    .Begin();


First, a quick explanation of the code.  Animate is a static class, and the Wait function returns an object called GroupAnimation that inherits from Animation.  After a 0.2 second wait, the following params list of Animation objects will execute.  RedChip and BlackChip are FrameworkElements (Silverlight/WPF objects), and animation commands such as MoveBy and FadeOut are extension methods on FrameworkElement.  Each of these animation commands returns an Animation-derived object.  The seconds() extension method on int and float types converts those values to TimeSpan objects.
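As a rough illustration of how such a helper could be written, here is a minimal sketch. It is not the Animate.NET source: the class name TimeSpanExtensions is made up for this example, and the second overload takes double rather than float because a literal like 0.2 is a double in C#.

```csharp
using System;

// Hypothetical reconstruction; the real Animate.NET helpers may differ.
public static class TimeSpanExtensions
{
    // Allows 2.seconds() on an integer value.
    public static TimeSpan seconds(this int value)
    {
        return TimeSpan.FromSeconds(value);
    }

    // Allows 0.2.seconds() on a floating-point value.
    public static TimeSpan seconds(this double value)
    {
        return TimeSpan.FromSeconds(value);
    }
}

public static class Program
{
    public static void Main()
    {
        TimeSpan wait = 0.2.seconds();
        Console.WriteLine(wait.TotalMilliseconds);
    }
}
```

Lower-casing the method name goes against the usual .NET naming conventions, but it makes the call site read more like a unit suffix.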


The ultimate goal of this first Wait section of code is to define a set of animations—nested sets are possible, which form a tree of animations.  These trees can get more complicated than this, but we’ll keep the example simple for now.


Now for the criticism.  Look at the matching parentheses of the Wait function.  The normal TimeSpan parameter is listed as an equal along with the Animation parameter list, and what is being used as a complex, nested structure is holding up the closing parenthesis and dragging it down to the end of the entire list.  If only there were a cleaner way of treating this nested structure like constructor initializers (see Part 8).  These correspond, in terms of visual layout, to the attributes and the child elements of an XML node.


What else is wrong with this picture?  The .WhenComplete and .Begin functions are being invoked on the result of the previous expression.  It’s characteristic of fluent-style APIs to define functions (or extension methods) that operate on the result of the previous operation so they can be strung together into sentence-like patterns.  The dot before both WhenComplete and Begin looks odd when appearing on a line by itself, and the lambda expression would be better promoted to a proper code block.
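For readers unfamiliar with the fluent pattern, the sketch below shows the usual mechanism in C#: each method returns the object it was called on, so the next call can be chained. The Animation class here is a hypothetical stand-in, not the Animate.NET type, and it completes immediately instead of running over time.

```csharp
using System;

// Hypothetical stand-in for the Animation type described above.
public class Animation
{
    private Action<Animation> _onComplete;
    public bool Started { get; private set; }

    // Each fluent method returns the same instance so calls can be chained.
    public Animation WhenComplete(Action<Animation> callback)
    {
        _onComplete = callback;
        return this;
    }

    public Animation Begin()
    {
        Started = true;
        // A real animation would run over time; this sketch completes at once.
        if (_onComplete != null)
            _onComplete(this);
        return this;
    }
}

public static class Program
{
    public static void Main()
    {
        var anim = new Animation()
            .WhenComplete(a => Console.WriteLine("done"))
            .Begin();                       // prints "done"
        Console.WriteLine(anim.Started);    // prints "True"
    }
}
```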


Finally, it’s unfortunate that in declaring a new local variable, we have to indent the whole animation block this way.

Here’s what the same code looks like in Archetype:


Animate.Wait (0.2 seconds) -> anim
{
    RedChip.MoveBy(0.4 seconds, -40, 0),
    RedChip.FadeIn(0.2 seconds),

    BlackChip.MoveBy(0.4 seconds, 0, 40),
    BlackChip.FadeOut(0.4 seconds)
}
WhenComplete (a)
{
    MainStage.Children.Remove(RedChip),
    MainStage.Children.Remove(BlackChip)
}
Begin();


This is more like it.  Notice the declarative assignment (declaration + assignment) with -> anim on the first line, and the way the parentheses can be closed after the TimeSpan object (see Part 7 on custom operators for an explanation of the syntax “0.2 seconds”).  There’s no more need to indent the whole structure to make it line up nicely in an assignment.  The following initializer code block (in curly braces) supplies Animation object values to the params parameter in the Wait function, and the WhenComplete and Begin functions don’t require a leading dot to operate on the previous expression (Intellisense would reflect these options).

The Archetype code is much cleaner.  It’s easier to see where groups of constructs begin and end, enabling fluent-style APIs with arbitrarily-complicated nested structures to be easily constructed.  Let’s take a look at one more example with a more deeply nested structure:

 

Animate.Group -> anim
{
    RedChip.MoveBy(0.4 seconds, -40, 0),
    RedChip.FadeIn(0.2 seconds),

    BlackChip.MoveBy(0.4 seconds, 0, 40),
    BlackChip.FadeOut(0.4 seconds),

    Animate.Wait (0.4 seconds)
    {
        Animate.CrossFade(1.5 seconds, RedChip, BlackChip),
        BlackChip.MoveTo(0.2 seconds, 20, 150)
    }
}


Here, a GroupAnimation is defined that contains, as one of its child Animations, another GroupAnimation (created with the Wait function).  The animation isn’t started in this case, so anim.Begin() can be called later, or anim could be composed into a larger animation somewhere.  A peek at the function headers for Group and Wait functions should make the ease and power of this design clear.

static Animate object
{
    // a stream is the only parameter
    Group GroupAnimation (Animations Animation* params)
    {
    }

    // a stream is the last parameter, so [ list ] syntax can still be used
    Wait GroupAnimation (WaitTime TimeSpan, Animations Animation* params)
    {
    }
}

Because the class is static, individual members are assumed to be static as well.

 

The easiest way to support this would be to allow this initializer block to be used with a params parameter that’s declared last.

         

Null Coalescing Operators

The null coalescing operator in C# allows you to compare a value to null, and to supply a default value to use in its place.  This is handy in scenarios like this:

var location = (cust.Address.City ?? "Unknown") + ", " + (cust.Address.State ?? "Unknown");

 

Chris Eargle makes a good point in his article suggesting a “null coalescing assignment operator” when making assignments such as:

cust.Address.City = cust.Address.City ?? "Unknown";

 

There should be a way to eliminate this redundancy.  By combining null coalescing with assignment, we can do this:

cust.Address.City ??= "Unknown";

Groovy’s Elvis Operator serves a similar role, but operates on a value of false in addition to null.
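For comparison, C# itself later gained this operator: ??= is available from C# 8.0 onward. The sketch below shows it side by side with the longer form; the Address class and the EnsureCity helper are made up for the illustration.

```csharp
using System;

public class Address
{
    public string City { get; set; }
}

public static class Program
{
    // Fills in a default only when City is currently null.
    public static string EnsureCity(Address address)
    {
        // Same effect as: address.City = address.City ?? "Unknown";
        address.City ??= "Unknown";
        return address.City;
    }

    public static void Main()
    {
        Console.WriteLine(EnsureCity(new Address()));                       // prints "Unknown"
        Console.WriteLine(EnsureCity(new Address { City = "Milwaukee" }));  // prints "Milwaukee"
    }
}
```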

Safe Navigation Operator

There are many situations where we find ourselves needing to check the value of a deeply nested member, but if we access it directly without first checking whether each part of the path is null, we get a NullReferenceException.

var city = cust.Address.City;

 

If either cust or Address is null, an exception will be thrown.  To get around this problem, we have to do something like this in C#:

string city = null;

if (cust != null && cust.Address != null)
    city = cust.Address.City;

 

The && operator is short-circuiting, which means that if the first boolean expression evaluates to false, the rest of the expression—which would produce a NullReferenceException—never gets executed.  As tedious as this is, without short circuiting operators, our error-prevention code would be even longer.

Jeff Handley wrote a clever safe navigation operator of sorts for C#, using an extension method called _ that takes a delegate (supplied as a lambda).  You can find that code here.  In his code, he does return a null value when the path short-circuits.  However, the limitations of C# cause even this simple example to get confusing quickly, which becomes apparent if we make City a non-primitive object as well:

var city = cust._(c => c.Address._(a => a.City.Name));
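To make the mechanism concrete, here is a minimal sketch of how an extension method of this kind can be written. It is not Jeff Handley's actual code: the method is named Maybe instead of _, and the Customer, Address, and City classes are assumptions made for the example.

```csharp
using System;

public static class SafeNavigation
{
    // Hypothetical helper: returns default (null for reference types)
    // when the source is null, otherwise applies the selector.
    public static TResult Maybe<TSource, TResult>(
        this TSource source, Func<TSource, TResult> selector)
        where TSource : class
    {
        return source == null ? default(TResult) : selector(source);
    }
}

public class City { public string Name { get; set; } }
public class Address { public City City { get; set; } }
public class Customer { public Address Address { get; set; } }

public static class Program
{
    public static string CityName(Customer cust)
    {
        // Each step is skipped once an earlier one has produced null.
        return cust.Maybe(c => c.Address).Maybe(a => a.City).Maybe(city => city.Name);
    }

    public static void Main()
    {
        Console.WriteLine(CityName(null) == null);            // prints "True"
        Console.WriteLine(CityName(new Customer()) == null);  // prints "True"
    }
}
```

Chaining the calls rather than nesting the lambdas keeps the parentheses from piling up, but each step still has to be wrapped by hand, which is the noise a language-level operator removes.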

Groovy implements a Safe Navigation Operator in the language itself, which is cleaner:
 

var city = cust?.Address?.City;

 

This is equivalent to the more verbose code above.  Archetype takes a similar approach:

var city = cust..Address.City;

Because of the .. operator in this member access expression, the type of the city variable is Option<string> (more on Option types).  If the path leading up to City is invalid (because Address is null), the value of city will be None.  This works the same as Nullable<T>, except that None means “doesn’t have a value; not even null”.

I like to think of None as the “mu constant”.  What is mu?  It’s the Japanese word that variously means “not”, “doesn’t exist”, etc., and is illustrated by the well-known Zen Buddhist koan:

A monk asked Zhaozhou Congshen, a Chinese Zen master (known as Jōshū in Japanese), "Has a dog Buddha-nature or not?" Zhaozhou answered, "Wú" (in Japanese, Mu)

The Gateless Gate, koan 1, translation by Robert Aitken

Yasutani Haku’un of the Sanbo Kyodan maintained that "the koan is not about whether a dog does or does not have a Buddha-nature because everything is Buddha-nature, and either a positive or negative answer is absurd because there is no particular thing called Buddha-nature."

In other words, Mu has often been used to mean “I disagree with the presuppositions of the question.”

There are a few basic patterns around options, nullable objects, and safe navigation that occur frequently, so I’ll outline them here with examples:

// if Address is null, this evaluates to false
if (cust..Address.City == "Milwaukee")
    WorkHarder();

// if City is None because Address is null, set to "Address Missing"; otherwise, get the city text
var city = cust..Address.City ?! "Address Missing";

// if City is Some<string> and City == null, set to empty string
var city = cust..Address.City ?? string.Empty;

// if Address is null (City is None), set to "Address Not Found";
// but if City == null, set to empty string
var city = cust..Address.City ?! "Address Not Found" ?? string.Empty;

// if Address points to an object, leave it alone; otherwise, create a new object
cust.Address ??= new Address(City="Milwaukee");

// an assertion
cust..Address ?! new Exception("Address missing");

// set the city if possible, throw a specific exception if not
var city = cust..Address.City ?! new Exception("Address missing");
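For comparison, later versions of C# cover part of this ground: the null-conditional operator ?. arrived in C# 6.0 and throw expressions in C# 7.0. The sketch below shows rough C# counterparts of three of the patterns above. One caveat is that C# has no equivalent of None here, so a missing Address and a null City collapse into the same null. The Customer and Address classes and the method names are assumptions for the example.

```csharp
using System;

public class Address { public string City { get; set; } }
public class Customer { public Address Address { get; set; } }

public static class Program
{
    // A missing customer, a missing address, and a null city
    // all collapse to the same default in C#.
    public static string CityOrDefault(Customer cust)
    {
        return cust?.Address?.City ?? "Address Missing";
    }

    // Comparing a null path against a string is simply false.
    public static bool IsMilwaukee(Customer cust)
    {
        return cust?.Address?.City == "Milwaukee";
    }

    // A throw expression stands in for the assertion pattern.
    public static string CityOrThrow(Customer cust)
    {
        return cust?.Address?.City
            ?? throw new InvalidOperationException("Address missing");
    }

    public static void Main()
    {
        Console.WriteLine(CityOrDefault(new Customer()));  // prints "Address Missing"
        Console.WriteLine(IsMilwaukee(null));              // prints "False"
    }
}
```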

 

Summary

By now it should be obvious that Archetype aims to liberate the developer from the constraints and inefficiencies of ordinary programming languages.  It is designed with modern practices in mind such as fluent-style development and declarative object graph construction.

This article wraps up the material started in Part 8 on declarative programming in Archetype.  In addition, I introduced the safe navigation and null coalescing operators.  These are simple but powerful language elements for cleanly and succinctly specifying common idioms that come up in daily coding.

Posted in Animation, Archetype Language, Composability, Design Patterns, Fluent API, Language Innovation

The Archetype Language (Part 8)

Posted by Dan Vanderboom on October 1, 2010


Constructors

A constructor in Archetype is a recommended, predefined prototype for instantiating an object correctly.

 

The default parameterless constructor is defined implicitly (it’s defined even if it isn’t written), even if other constructors are defined explicitly. This last part is unlike other languages that hide the parameterless constructor when others are defined.  This will make classes with these default constructors common in Archetype, to more easily support behaviors like serialization and dynamic construction.  When it needs to be hidden, it can be defined with reduced visibility, such as private.

 

A constructor is defined with the name new, consistent with how it’s invoked.

 

Let’s start with a very basic class, and build up to more complicated examples.

 

Customer object
{
    FirstName string;
    LastName string;
}

 

Despite the lack of an explicit constructor, it’s important for Archetype to define constructs that are useful in their default configurations.  You couldn’t get more basic than the Customer class above.  If we want to define the constructor explicitly, we can do so.

 

Customer object
{
    FirstName string;
    LastName string;

    new ()
    {
        // do nothing
    }
}

 

Instantiating a Customer object is easy. With the parameterless constructor, parentheses are optional.

 

var dilbert = new Customer;

 

Archetype, like C#, supports constructor initializers:

 

var dilbert = new Customer
{
    FirstName = "Dilbert",
    LastName = "Smith"
};

 

When you have few parameters and want to compress this call to a single line, the curly braces end up feeling a little too much (too formal?).

 

var dilbert = new Customer { FirstName = "Dilbert", LastName = "Smith" };

 

Archetype supports passing these assignment statements as final arguments of the constructor parameter list, like this:

 

var dilbert = new Customer(FirstName = "Dilbert", LastName = "Smith");

 

As a result, there isn’t much need to define constructors that only set fields and properties to the value of constructor parameters.  Because Archetype has this mechanism for fluidly initializing objects at construction, the only time constructors really need to be defined is when construction of the object is complicated or unintuitive, in which case a supplied construction pattern is a sure way to make sure it’s done correctly.  Our Customer example doesn’t meet those criteria, but if it did, this is one way we could write it:

 

Customer object
{
    FirstName string;
    LastName string;

    new (FirstName string, LastName string)
    {
        this.FirstName = FirstName;
        this.LastName = LastName;
    }
}

 

To avoid having to qualify FirstName with the this keyword, many people prefer naming their parameters with the first character lower-cased.  That’s an unfortunate compromise.  When viewing at least the public members of a type, in a sense you’re creating an outward-facing API, and I think Pascal casing more naturally respects English grammar, not downplaying the significance of the most-important first word in an identifier by lower-casing it to get around some unfortunate syntax limitation.

 

But instead of taking sides in a naming convention war, we can solve the problem in the language and remove the need to make any compromise.

 

new (FirstName string, LastName string)
{
    set FirstName, LastName;
}

 

This lets us set individual properties named the same as constructor parameters.  It’s flexible enough to set some and consume other parameters differently, but when you want to set all parameters with matching member names, you can use the shortcut set all.  If that’s all the constructor needs to do, we can do away with the curly braces:

 

new (FirstName string, LastName string) set all;

 

If our Customer class contained a BirthDate property, we could use this constructor and pass in an initializer statement as a final parameter.

 

var dilbert = new Customer("Dilbert", "Smith", BirthDate = DateTime.Parse("7/4/1970"));

 

This works with multiple initializers.  Alternatively, we could use an initializer body after the parameter list:

 

var dilbert = new Customer("Dilbert", "Smith")
{
    BirthDate = DateTime.Parse("7/4/1970")
};

 

Note how we have two places to supply data to a new object, if needed: the parameter list for simple, short values, and the initializer body for much larger assignments.


Another common construction pattern is for one or more constructors to call another constructor with a default set of properties.  Typically the constructor with the full list of parameters performs the actual work, while the shorter constructors call into the main one, passing in some default values and passing the others through.

 

new (EvaluateFunc Func<T>) new(null, null, EvaluateFunc);

new (BaseObject object, EvaluateFunc Func<T>) new(null, BaseObject, EvaluateFunc);

new (Name string, BaseObject object, EvaluateFunc Func<T>)
{
    set all;

    // do all the real work…
    // …
}
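For readers more familiar with C#, the same chaining pattern is written there with a this(...) constructor initializer. The sketch below mirrors the three Archetype constructors above; the class name LazyValue<T> is hypothetical, and the explicit assignments in the last constructor are what set all would replace.

```csharp
using System;

// Hypothetical class name; the parameter lists mirror the Archetype constructors above.
public class LazyValue<T>
{
    public string Name { get; }
    public object BaseObject { get; }
    public Func<T> EvaluateFunc { get; }

    // Shorter constructors forward to the full one, filling in defaults.
    public LazyValue(Func<T> evaluateFunc)
        : this(null, null, evaluateFunc) { }

    public LazyValue(object baseObject, Func<T> evaluateFunc)
        : this(null, baseObject, evaluateFunc) { }

    // The full constructor does the real work.
    public LazyValue(string name, object baseObject, Func<T> evaluateFunc)
    {
        Name = name;
        BaseObject = baseObject;
        EvaluateFunc = evaluateFunc;
    }
}

public static class Program
{
    public static void Main()
    {
        var lazy = new LazyValue<int>(() => 42);
        Console.WriteLine(lazy.Name == null);    // prints "True"
        Console.WriteLine(lazy.EvaluateFunc());  // prints "42"
    }
}
```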

 

Declarative Archetype: The Initializer Body

 

The initializer body mentioned above has a special structure in Archetype.  Member assignment statements can appear side-by-side with value expressions that are processed by a special function called value.  This can be used, among other things, to add items to a collection.  It’s best to see in an example:

 

var dilbert = new Customer
{
    FirstName = "Dilbert",
    LastName = "Smith",
    BirthDate = DateTime.Parse("7/4/1970"),

    new SalesOrder(OrderCode = "ORD012940"),

    new SalesOrder
    {
        OrderCode = "ORD012941",

        new SalesOrderLine(ItemCode = "S0139", Quantity = 3),
        new SalesOrderLine(ItemCode = "S0142", Quantity = 1)
    }
};

 

The first three lines of the initializer set members with assignment statements.  The next expression (new SalesOrder …) in the list creates an object, but there’s no assignment.  It returns a value, but where does it go?  Take a look at the value functions below for the answer:

 

Customer object
{
    FirstName string;
    LastName string;
    Orders SalesOrder* = new;
    Invoices Invoice* = new;

    // formatted inline
    value (Order SalesOrder) Orders += Order;

    // formatted with full code block
    value (Invoice Invoice)
    {
        Invoices += Invoice;
    }
}

 

A Customer has several collections of things–Orders and Invoices here–and because there are two value functions in the class, any expressions of type SalesOrder or Invoice will be evaluated and their values passed to the appropriate value function. Expressions of other types will trigger a compile-time error.
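The closest existing C# mechanism is the collection initializer, which dispatches each element to a matching Add overload in much the same way value functions dispatch by type. The sketch below is only an analogy: the CustomerDocuments class and the InvoiceCode property are invented for the example, and C# does not allow member assignments and bare elements to be mixed in one initializer the way Archetype's initializer body does.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class SalesOrder { public string OrderCode { get; set; } }
public class Invoice { public string InvoiceCode { get; set; } }

// Collection initializers require IEnumerable plus an accessible Add method.
public class CustomerDocuments : IEnumerable
{
    public List<SalesOrder> Orders { get; } = new List<SalesOrder>();
    public List<Invoice> Invoices { get; } = new List<Invoice>();

    // One Add overload per accepted element type, like one value function per type.
    public void Add(SalesOrder order) { Orders.Add(order); }
    public void Add(Invoice invoice) { Invoices.Add(invoice); }

    public IEnumerator GetEnumerator()
    {
        foreach (var order in Orders) yield return order;
        foreach (var invoice in Invoices) yield return invoice;
    }
}

public static class Program
{
    public static CustomerDocuments Build()
    {
        return new CustomerDocuments
        {
            new SalesOrder { OrderCode = "ORD012940" },
            new Invoice { InvoiceCode = "INV000001" },
            new SalesOrder { OrderCode = "ORD012941" }
        };
    }

    public static void Main()
    {
        var docs = Build();
        Console.WriteLine(docs.Orders.Count);    // prints "2"
        Console.WriteLine(docs.Invoices.Count);  // prints "1"
    }
}
```

As with value functions, an element whose type matches no Add overload is rejected at compile time.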

 

The += and -= operators haven’t been shown before.  Their use is a very natural fit for stream and list types.  The += operator appends an object to a stream, and -= removes the first occurrence of that object.

 

This simple addition of a value function in types (classes and structs) gives Archetype the ability to represent hierarchical structures in a clean, declarative way.  Sure it’s always been possible to format expressions similarly, but the syntactic trappings of imperative languages have made this difficult and unattractive at best, and in most real-world cases impractical.

 

When I experimented in creating a Future class, I came up with a pattern in C# to nest structures in a tree for large future expressions, but the need to match parentheses gets in the way and consumes too much attention that’s better focused on the logic itself:

 

Future<string> FuturePi = null, FutureOmega = null, FutureConcat = null, FutureParen = null;

var result = new Future<string>("bracket",
    () => Bracket(FutureParen),
    (FutureParen = new Future<string>("parenthesize",
        () => Parenthesize(FutureConcat),
        (FutureConcat = new Future<string>("concat",
            () => FuturePi + " < " + FutureOmega,
            (FuturePi = new Future<string>("pi", () => CalculatePi(10))),
            (FutureOmega = new Future<string>("omega", () => CalculateOmega()))
        ))
    ))
);

 

The difference finally occurred to me between the need to set few simple members and the definition of larger, more structured content–including nested structures–that begged for a way to supply them without carrying the end parenthesis down multiple lines or letting them build up into parentheses knots that must be carefully counted.  One gets to fidgeting with where to put them, and sometimes there’s no good answer to that.


Another feature we need to make this declarative notation ability robust is inline variable declaration and assignment.  Notice in the last example how several intermediary structures have variable names defined for them ahead of time, outside the expression. Writing that Future code, I felt it was unfortunate these variables couldn’t be defined inline as part of the expression.  Doing so would allow us to define any kind of structure we might see in XML or JSON, such as this XAML UI code.

 

new Canvas -> LayoutRoot
{
    Height = Auto,
    Width = Auto,

    new StackPanel -> sp
    {
        Orientation = Vertical,
        Height = 150,
        Width = Auto,

        Canvas.Top = 10,
        Canvas.Left = 20,

        with Canvas
        {
            Top = 10,
            Left = 20,
        },

        with Canvas { Top = 10, Left = 20 },

        Loaded += (sender, e)
        {
            Debug.WriteLine("StackPanel sp.Loaded running");
            sp.ResizeTo(0.5 seconds, Auto, 200).Begin();
        },

        LayoutUpdated += HandleLayoutUpdated,

        new TextBlock
        {
            FontSize = 18,
            Text = "Title"
        },
        new TextBlock { Text = "Paragraph 1" },
        new TextBlock { Text = "Paragraph 2" },
        new TextBlock(Text = "Paragraph 3")
    }
};

 

A few notes are needed here:

 

·   Wow, this looks a lot like XAML, but much friendlier to developers who have to actually read and edit it!  Yes, good observation.

·  Unlike XAML, every identifier here works with the all-important Rename refactoring, go to definition, find all references to, etc. This is great for reducing the amount of work to find relationships among things and manually update related files.

·  Also unlike XAML, code for event handlers can be defined here. I’m not saying you should cram all of your event handler logic here, but it could come in quite handy at times and I can’t see any reason to disable it. 

·  The with token is a custom operator (see Part 7) that provides access to attached properties through an initializer body. Custom extensions allow you to access these properties with a natural member-access style.

·  It hasn’t been possible to use generic classes in XAML. Specifying UI in Archetype, this would be trivial, and I suspect they could be used to good effect in many ways. Of course, in doing this you’d lose support for the designers in VS and Blend, which would be awful.

·  Auto is simply an alias for double.NaN.

·  The -> custom operator in these expressions defines a variable and sets it to the value of the new object. The order of execution is:

1. Evaluate constructor parameters, if any are supplied.

2. Assign the object to the variable defined with ->, if supplied.

3. Set any fields or properties with assignment statements.

4. Evaluate value expressions, if supplied, and call the class’s value function with each one, if a value function has been defined.

5. Invoke any matching value function defined in class extensions.

 

By following this design, the example above can be translated into this C# code by the Archetype compiler:

 

var LayoutRoot = new Canvas()
{
    Height = double.NaN,
    Width = double.NaN
};

var sp = new StackPanel()
{
    Orientation = Orientation.Vertical,
    Height = 150.0,
    Width = double.NaN
};

LayoutRoot.Children.Add(sp);

sp.SetValue(Canvas.TopProperty, 10.0);
sp.SetValue(Canvas.LeftProperty, 20.0);

sp.Loaded += (sender, e) =>
{
    Debug.WriteLine("StackPanel sp.Loaded running");
    sp.ResizeTo(0.5.seconds(), double.NaN, 200.0).Begin();
};

sp.LayoutUpdated += HandleLayoutUpdated;

sp.Children.Add(new TextBlock() { FontSize = 18, Text = "Title" });
sp.Children.Add(new TextBlock() { Text = "Paragraph 1" });
sp.Children.Add(new TextBlock() { Text = "Paragraph 2" });
sp.Children.Add(new TextBlock() { Text = "Paragraph 3" });

var VisualTree = LayoutRoot;

 

Compare the two approaches. The C# code is a typical example of imperative structure building, while the Archetype code is arguably as declarative as XAML, and with many advantages over XAML for developers.


Going back to the Future example, we could rewrite this in Archetype a few different ways.  I’ll present two.  In the first one, value functions are used to receive the future’s evaluation function as well as any Future objects the expression depends on.


new Future<string>("bracket") -> result
{
    () => Bracket(FutureParen),
    new Future<string>("parenthesize") -> FutureParen
    {
        () => Parenthesize(FutureConcat),
        new Future<string>("concat") -> FutureConcat
        {
            () => FuturePi + " < " + FutureOmega,
            new Future<string>("pi") -> FuturePi
            {
                () => CalculatePi(10)
            },
            new Future<string>("omega") -> FutureOmega
            {
                () => CalculateOmega()
            }
        }
    }
}


The shorter approach passes an evaluation delegate in as a parameter.


new Future<string>(() => Bracket(FutureParen)) -> result
{
    new Future<string>(() => Parenthesize(FutureConcat)) -> FutureParen
    {
        new Future<string>(() => FuturePi + " < " + FutureOmega) -> FutureConcat
        {
            new Future<string>(() => CalculatePi(10)) -> FuturePi,
            new Future<string>(() => CalculateOmega()) -> FutureOmega
        }
    }
}

 

The name string parameter is missing from the last example.  This was only for use during debugging.  Now what we have is a very direct description of futures that are dependent on other futures in a dependency graph.

Summary

Object construction is a crucial part of an object-oriented language, and Archetype is advanced with its options for constructing arbitrary object graphs and initializing even complicated state in a single expression.  These fluent declarative syntax features are ideal for representing structures such as XAML UI, state machines, dependency graphs, and much more.

XAML is a language.  The question this work has me asking is: do we really need a separate language if our general purpose language supports highly declarative syntax? It’s a provocative question without an easy answer, but it seems clear that many DSLs could emerge within a language that so richly supports composition.

With the ability to define arbitrarily complex structures in code—from declarative object graphs to rich functional expressions—it’s hard to think of a situation that would be too difficult to model and build an API or application around.

Posted in Archetype Language, Data Structures, Design Patterns, Language Innovation, Silverlight, User Interface Design, WPF

The Archetype Language (Part 7)

Posted by Dan Vanderboom on September 27, 2010


Semantic Density

As an avid reader growing up, I noticed that my knowledge and understanding of a topic grew more easily the faster I read.  Instead of going through a chapter every day or two, which puts weeks or months between the front and back covers, I devoured 200-300 pages in a night, getting through the largest books in a couple days.  And in reading multiple books on a subject back-to-back, it was easier to find relationships and tie together concepts for things that were still so fresh in my memory.

In my study of linguistics, I learned that legends like Noam Chomsky could learn hundreds of languages; the previous librarian at the Vatican could read 97.  Bodmer’s excellent book The Loom of Language attempts to teach 10 languages at once, and it seems that the more languages you learn, the easier it is to pick up others.

What these examples have in common is semantic density.  It might seem from what I’ve said that this would be like drinking from a firehose which only the most gifted could endure, but I would argue that intensely-focused learning puts our minds in a highly alert and receptive condition.  In such a state, being able to draw more connections between statements and ideas, we are better able to comprehend the whole in a holistic, intuitive way.

Code Example

Semantic density is important in code, too.  With a pattern like INotifyPropertyChanged, formatted as I have it below, it’s 12 lines of code, 13 if you separate your fields and properties with a blank line, but in the ballpark of a dozen lines of code.  (This is additional explanation for a feature described in part 2 of this series.)

string _Display;
public string Display
{
    get { return _Display; }
    set
    {
        _Display = value;

        if (PropertyChanged != null)
            PropertyChanged(this, new PropertyChangedEventArgs("Display"));
    }
}

Does the ability to inform external code of changes seem like it should take a dozen lines to get there?  This can be somewhat compressed by defining a SetProperty method:

string _Display;
public string Display
{
    get { return _Display; }
    set { SetProperty("Display", ref _Display, value); }
}


This chops the line count in half, bringing it down to six lines, or seven if you include a blank line above or below to separate it from other members.  On my monitor, that means I can see about eight property definitions at a time.  Now that’s usually enough, but I’ve written a few custom controls that have upwards of 30 properties.  For a new pair of eyes, getting the gist of that class is going to involve a lot of scrolling, never seeing more than a small slice at a time of a much larger picture.  The frame of time I mentioned earlier in regard to studying a subject is analogous here to the frame of space.  Seeing 20% of a class at any time lends itself to faster grokking than seeing a 2% sliver at a time.  Our minds, marvelous as they are, do have limits.  Lowering semantic density, such as by spreading meaning over large distances or time spans, makes us work harder to accomplish the same task, trying to put all the pieces together, and the differences are often dramatic.
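The SetProperty method used in the shorter version is not shown in this article.  A minimal C# sketch of what such a helper might look like follows.  The BindableBase and Person class names are hypothetical, and the equality check (which skips the notification when the value is unchanged) is an assumption that goes slightly beyond the 12-line version, which raises the event on every assignment.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical base class: the article names SetProperty but does not define it.
public abstract class BindableBase : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Stores the new value and raises PropertyChanged only when the value differs.
    // Returns true when a change was made (an assumption, handy for callers).
    protected bool SetProperty<T>(string name, ref T field, T value)
    {
        if (EqualityComparer<T>.Default.Equals(field, value))
            return false;

        field = value;

        // Copy the delegate so a concurrent unsubscribe cannot null it mid-call.
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(name));

        return true;
    }
}

// Hypothetical class using the six-line property form from the article.
public class Person : BindableBase
{
    string _Display;
    public string Display
    {
        get { return _Display; }
        set { SetProperty("Display", ref _Display, value); }
    }
}
```

Each further property still costs six lines in C#, which is the overhead the bindable type modifier is meant to remove.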

Just as nouns can be modified by adjectives in natural languages, types in Archetype support user-defined type modifiers.  By defining a new type modifier called bindable to encapsulate the INotifyPropertyChanged pattern, we can collapse the above example into a single line:

Age bindable int;

I have no problem stacking these properties right on top of one another.  Although expressed in a highly dense form, it’s actually easier to understand at a glance in these three simple tokens than in the half-dozen lines above, which beg for interpretation to assemble their meaning.  Even if we have 30 of these, they’d all fit on one screen, and the purpose of the class as a whole is quickly gathered.

 

One thing I noticed in developing the animation library Animate.NET is how much code I saved: not having to worry about the details of storyboard creation, key frames, and so on.  It allows you to get right to the point of stating your intention.  Often a library like this is enough, but once in a while language extensibility is a much better approach; and when it is, not having the option can be painful and time consuming.

 

Custom Operators & Operator Overloading

As in most languages, Archetype supports two forms of syntax for operations: functions and operators.  Functions are invoked by including a pair of parentheses after their name that contain any arguments to pass in, whereas operators appear adjacent to or between sub-expressions. 

In C#, some operators are available for overloading.  Archetype supports these operator overloads by using the same names for them.  This allows Archetype to use operators defined in C# and to expose supported operators to C# consumers.

However, Archetype goes one step further and allows you to define custom operators.  There are three basic kinds of custom operator:

  1. unary prefix
  2. unary suffix
  3. binary

If we wanted an easy way to duplicate strings in C#, we might define an extension method called Dup, but in Archetype we also have this option:

// "ABC" dup 3 == "ABCABCABC"
binary dup string (left string, right int)
    return string.Repeat(left, right);


The expression parser sees “ABC”, identifies it as a string value, and then looks at the next token.  If the dot operator were found (.), it would look for a member of string or an extension member on string, but because the next token isn’t the dot operator, it looks up the token in the operator table.  An operator called dup is defined with a left string argument and a right int argument, matching the expression.  If the operator were more complicated, it would have a curly-brace code block, but because it’s a single return statement, that’s optional.
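For comparison, here is a sketch of the C# extension method alternative mentioned above.  The class name StringDupExtensions is hypothetical, and the loop stands in for the string.Repeat call in the Archetype snippet; the guard against negative counts is an added assumption.

```csharp
using System;
using System.Text;

public static class StringDupExtensions
{
    // Returns source repeated count times, so "ABC".Dup(3) gives "ABCABCABC".
    public static string Dup(this string source, int count)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count", count, "count must not be negative.");

        var builder = new StringBuilder(source.Length * count);
        for (int i = 0; i < count; i++)
            builder.Append(source);

        return builder.ToString();
    }
}
```

The call site reads "ABC".Dup(3), which still needs the dot and the parentheses that the Archetype operator form drops.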

Archetype operators aren’t limited to letters, though.  We can also use symbols (but not numbers) in our operator names.  Here’s a “long forward arrow” (compiled with name DashDashGreaterThan) that allows us to write a single function parameter before the function name itself:

// "Hey" --> Console.WriteLine;
binary<T> --> void (left T, right Action<T>)
    right(left);


Note that the generic type parameter is attached to the binary keyword.  I arrived at this placement through much experimentation.  Names like --><T> are difficult to read and can be trickier to parse.
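The closest C# gets to this operator is a generic extension method.  The sketch below uses the hypothetical name Pipe; it is only an approximation, since C# cannot define a new infix symbol.

```csharp
using System;

public static class PipeExtensions
{
    // Passes the value on the left to the action on the right,
    // mirroring the body of the Archetype operator: right(left).
    public static void Pipe<T>(this T left, Action<T> right)
    {
        if (right == null)
            throw new ArgumentNullException("right");

        right(left);
    }
}
```

With this in scope, "Hey".Pipe(Console.WriteLine); should behave like the Archetype expression in the comment above.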

There is a special binary operator called adjacent which you can think of as an “invisible operator” capable of inserting an operation between two sub-expressions.  In the following example, two adjacent strings are interpreted as a concatenation of the two.

// "123" "45" == "12345"
binary adjacent string (left string, right string)
    return left + right;

With custom operators, what was originally part of the language can now be defined in a library instead.  This greatly simplifies the language.  Just as methods can be shadowed to override them, so too will some ability be needed in Archetype to block or override operators that would otherwise be imported along with a namespace.

The next operator we’ll look at is the unary suffix.  The example consists of units of time: minutes and seconds.

// 12 minutes == TimeSpan.FromMinutes(12)
unary suffix minutes TimeSpan (short, int)
    return TimeSpan.FromMinutes((int)value);

// 3 seconds == TimeSpan.FromSeconds(3)
unary suffix seconds TimeSpan (short, int)
    return TimeSpan.FromSeconds((int)value);


With support for extension properties, we could have also said 12.minutes or 3.seconds, which is already better than C#’s 12.minutes() and 3.seconds(), but by defining these tokens as unary suffix operators, we can eliminate even the dot operator and make it that much more fluent and natural to type (without losing any syntactic precision).  Notice how a list of types is provided instead of a parameter list.  Unary operators by definition have only a single argument, but we often want them to operate on several different types.

Here’s a floating point operator for seconds.

// 2.5 seconds == (2 seconds).Add(0.5 * 1000 milliseconds)
// WholeNumber and Fraction are extension properties on float, double, and decimal
unary suffix seconds TimeSpan (float, double, decimal)
    return value.WholeNumber seconds + value.Fraction * 1000 milliseconds;

We can use the adjacent operator on TimeSpans the same way that we did for strings above.

// 4 minutes 10 seconds == (4 minutes).Add(10 seconds)
binary adjacent TimeSpan (left TimeSpan, right TimeSpan)
    return left.Add(right);
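To make the comparison with C# concrete, the suffix operators can be approximated with extension methods, and the adjacent operator with TimeSpan's existing + operator.  This is a sketch only; the TimeSpanSuffixExtensions class and its method names are hypothetical.

```csharp
using System;

public static class TimeSpanSuffixExtensions
{
    // 12.Minutes() plays the role of "12 minutes".
    public static TimeSpan Minutes(this int value)
    {
        return TimeSpan.FromMinutes((double)value);
    }

    // 3.Seconds() plays the role of "3 seconds".
    public static TimeSpan Seconds(this int value)
    {
        return TimeSpan.FromSeconds((double)value);
    }

    // 2.5.Seconds() plays the role of the floating point "2.5 seconds".
    public static TimeSpan Seconds(this double value)
    {
        return TimeSpan.FromSeconds(value);
    }
}
```

The Archetype expression 4 minutes 10 seconds then becomes 4.Minutes() + 10.Seconds() in C#: the meaning is the same, but the dots, parentheses, and explicit plus sign all remain.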

Now let’s combine the use of a few of these operators into a single example.

alias min = minutes;
alias s = seconds;

var later = DateTime.Now + 2 min 15 s;

// Schedule is an extension method on DateTime
later.Schedule
{
    // schedule this to run later
}

We can also define our DateTime without assigning its value to a variable.

(DateTime.Now + 10 seconds).Schedule
{
}

(DateTime.Now + 10 seconds).Schedule (Repeat=10 seconds)
{
}

Repeat is an optional parameter of Schedule, defaulting to TimeSpan.Zero, meaning “don’t repeat”.

Additional Notes

When more than one operator is valid in a given position, the most specific operator (in terms of its parameter types) is used.  If there’s any ambiguity or overlap remaining, a compiler error is issued.

Unary operators will take precedence over binary operators, but it hasn’t been determined yet what precedence either one will actually have in relation to all of the other operators, or whether this will be specified in the operator definition.

Because of this design for custom operators, I’ve been able to remove things from the language itself and include them as operator definitions in a library.

Summary

This article provided some deeper explanation of previously covered material and introduced Archetype’s very powerful custom operator declaration syntax.  The next article will cover some special operators built into Archetype, as well as property path syntax, which is something I came up with a while back to safely reference identifiers in a way that would be impervious to both refactoring and obfuscation.

I’m curious to read your feedback on custom operators in particular, so keep the great comments coming!

Posted in Archetype Language, Composability, Data Structures, Design Patterns, Language Innovation

Language Design: Complexity, Extensibility, and Intention

Posted by Dan Vanderboom on June 14, 2010

Introduction

The object-oriented approach to software is great, and that greatness draws from the power of extensibility.  That we can create our own types, our own abstractions, has opened up worlds of possibilities.  System design is largely focused on this element of development: observing and repeating object-oriented patterns, analyzing their qualities, and adding to our mental toolbox the ones that serve us best.  We also focus on collecting libraries and controls because they encapsulate the patterns we need.

This article explores computer languages as a human-machine interface, the purpose and efficacy of languages, complexity of syntactic structure, and the connection between human and computer languages.  The Archetype project is an on-going effort to incorporate these ideas into language design.  In the same way that some furniture is designed ergonomically, Archetype is an attempt to design a powerful programming language with an ergonomic focus; in other words, with the human element always in mind.

Programming Language as Human-Machine Interface

A programming language is the interface between the human mind and executable code.  The point isn’t to turn human programmers into pure mathematical or machine thinkers, but to leverage the talent that people are born with to manipulate abstract symbols in language.  There is an elite class of computer language experts who have trained themselves to think in terms of purely functional approaches, low-level assembly instructions, or regular, monotonous expression structures (and this is necessary for researchers pushing themselves to understand ever more), but for the everyday developer, a more practical approach is required.

Archetype is a series of experiments to build the perfect bridge between the human mind and synthetic computation.  As such, it is based as much as possible on a small core of extensible syntax and maintains a uniformity of expression within each facet of syntax that the human mind can easily keep separate.  At the same time, it honors syntactic variety and is being designed to shift us closer to a balance where all of the elements, blocks, clauses and operation types in a language can be extended or modified equally.  These represent the two most important design tenets of Archetype: the intuitive, natural connection to the human mind, and the maximization of its expressive power.

These forces often seem at odds with each other—at first glance seemingly impossible to resolve—and yet experience has shown that the languages we use are limited in ways we’re often surprised by, indicating that processes such as analogical extension are at work in our minds but not fully leveraged by those languages.

Syntactic Complexity & Extensibility

Most of a programming language’s syntax is highly static, and just a few areas (such as types, members, and sometimes operators) can be extended.  Lisp is the most famous example of a highly extensible language with support for macros which allow the developer to manipulate code as if it were data, and to extend the language to encode data in the form of state machines.  The highly regular, parenthesized syntax is very simple to parse and therefore to extend… so long as you don’t deviate from the parenthesized form.  Therefore Lisp gets away with powerful extensibility at the cost of artificially limiting its structural syntax.

In Lisp we write (+ 4 5) to add two numbers, or (foo 1 2) to call a function with two parameters.  Very uniform.  In C we write 4 + 5 because the infix operator is what we grew up seeing in school, and we vary the syntax for calling the function foo(1, 2) to provide visual cues to the viewer’s brain that the function is qualitatively something different from a basic math operation, and that its name is somehow different from its parameters.

Think about syntax features as visual manifestations of the abstract logical concepts that provide the foundation for all algorithmic expression.  A rich set of fundamental operations can be obscured by a monotony of syntax or confused by a poorly chosen syntactic style.  Archetype involves a lot of research in finding the best features across many existing languages, and exploring the limits, benefits, problems, and other details of each feature and syntactic representation of it.

Syntactic complexity provides greater flexibility, and wider channels with which to convey intent.  This is why people color code file folders and add graphic icons to public signage.  More cues enable faster recognition.  It’s possible to push complexity too far, of course, but we often underestimate what our minds are capable of when augmented by a system of external cues which is carefully designed and supported by good tools.

Imagine if your natural spoken language followed such simple and regular rules as Lisp: although everyone would learn to read and write easily, conversation would be monotonous.  Extend this to semantics, for example with a constructed spoken language like Lojban which is logically pure and provably unambiguous, and it becomes obvious that our human minds aren’t well suited to communicating this way.

Now consider a language like C with its 15 levels of operator precedence, which were designed to match programmers’ expectations (although the authors admitted to getting some of this “wrong”, which further proves the point).  This language has given rise to very popular derivatives (C++, C#, Java), all of which are easily learned despite their syntactic complexity.

Natural languages and old world cities have grown with civilization organically, creating winding roads and wonderful linguistic variation.  These complicated structures have been etched into our collective unconscious, stirring within us and giving rise to awareness, thought, and creativity.  Although computers are excellent at processing regular, predictable patterns, it’s the complex interplay of external forces and inner voices that we’re most comfortable with.

Risk, Challenge & Opportunity

There are always trade-offs.  By focusing almost all extensibility in one or two small parts of a language, semantic analysis and code improvement optimizations are easier to develop and faster to execute.  Making other syntactical constructs extensible, if one isn’t careful, can create complexity that quickly spirals out of control, resulting in unverifiable, unpredictable and unsafe logic.

The way this is being managed in Archetype so far isn’t to allow any piece of the syntax tree to be modified, but rather to design regions of syntax with extensibility points built-in.  Outputting C# code as an intermediary (for now) lays a lot of burden on the C# compiler to ensure safety.  It’s also possible to mitigate more computationally expensive semantic analysis and code generation by taking advantage of both multicore and cloud-based processing.  What helps keep things in check is that potential extensibility points are being considered in the context of specific code scenarios and desired outcomes, based on over 25 years of real-world experience, not a disconnected sense of language purity or design ideals.

Creating a language that caters to the irregular texture of thought, while supporting a system of extensions that are both useful and safe, is not a trivial undertaking, but at the same time holds the greatest potential.  The more that computers can accommodate people instead of forcing people to make the effort to cater to machines, the better.  At least to the extent that it enables us to specify our designs unambiguously, which is somewhat unnatural for the human mind and will always require some training.

Summary

So much of the code we write is driven by a set of rituals that, while they achieve their purpose, often beg to be abstracted further away.  Even when good object models exist, they often require intricate or tedious participation to apply (see INotifyPropertyChanged).  Having the ability to incorporate the most common and solid of those patterns into language syntax (or extensions which appear to modify the language) is the ultimate mechanism for abstraction, and goes furthest in minimizing development effort.  By obviating the need to write convoluted yet routine boilerplate code, Archetype aims to filter out the noise and bring one’s intent more clearly into focus.

Posted in Archetype Language, Composability, Design Patterns, Language Extensions, Language Innovation, Linguistics, Metaprogramming, Object Oriented Design, Software Architecture

The Archetype Language (Part 6)

Posted by Dan Vanderboom on June 14, 2010

If Expressions

In Archetype, an if expression can be provided for any value.  The expression variant of the if statement, instead of taking embedded statement clauses, takes value expressions for its "consequence" clauses.

The if expression serves the same purpose as the ternary conditional operator in C#:

TextField.PasswordChar = DisplayStar ? '*' else ' ';

Enumerations

Enumerations are represented syntactically like lists in Archetype, using the square brackets to enclose values.  The idea is that an enumeration type is simply a list of possible values.

enum RainbowColor [ Red, Orange, Yellow, Green, Blue, Indigo, Violet ];

Enumerations are often formatted to display one value on each line.  The following example demonstrates this, and defines a variable of the enumeration’s type.

enum RainbowColor
[
    Red,
    Orange,
    Yellow,
    Green,
    Blue,
    Indigo,
    Violet
];

Anonymous Enumeration Types

It normally makes sense for an enumeration type to be named so it can be referenced and used elsewhere.  But in cases where an enumeration is only needed privately within a single class or method, an anonymous enumeration type can be defined this way:

ForegroundColor enum [ Black, Gray, DarkBlue ];

Enumeration Assignment

Regardless of whether you’re working with named or anonymous enumerations, assignment is the same.  The enumeration type is not used in the assignment, which works well with anonymous enumerations since they don’t have a discoverable name.

// no need for an enumeration name

// so it also works great with anonymous enumeration types

BackgroundColor = Green;

Language services (Intellisense) can still inform the user of the possible values after the equals sign and space are entered.

Nullable Types

The nullable type operator converts a type T to Nullable<T>, the same as in C#.  Consider the following examples of normal and bindable nullable properties, and a local variable with an initializer.

Age int?;

HighScore bindable int?;

var Age int? = null;

Additionally, we can define a local variable and infer its type from an assignment, using the nullable type operator to force type inference to use a nullable type.

// give me a nullable type, even though I’m not setting it to null now

var Age ? = 4;

From this point on, we can assign values (including null) to the Age variable without using the nullable type operator.  In fact, including the ? operator would be invalid.

// update the value of Age; notice we don’t use the nullable ? symbol after the definition

Age = 9;

Tuples

A tuple is an anonymous type consisting of an ordered set of heterogeneous fields. In Archetype, their fields can be named for Intellisense hinting when used as return types, or left unnamed. In local variable definitions, their individual members must either be named or use the anonymous member symbol, the underscore.

The following example shows the syntax for defining a tuple as a return type for a function.  In this case, a pair of int values will be returned.

GetMouseLocation (int, int) ()
{
    return (100, 50);
}

The members of a tuple in a return type can be named as a hint to the caller of the function.

GetMouseLocation (x int, y int) ()
{
    return (100, 50);
}

The function is called and its return tuple value stored like this:

var (x, y) = GetMouseLocation();

Here we’re defining a new tuple type, Tuple<int, int>, which is not named as a whole.  We might call it an anonymous tuple.  Instead of naming the whole, we’re naming the individual members.

Using the .NET Tuple type, we could also write this:

var loc = GetMouseLocation();

We would then have to reference loc.Item1 and loc.Item2 to access the individual members.  Naming the members instead of the whole, however, makes more sense because it provides greater code readability.
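In C# terms, working with the whole tuple through the .NET Tuple type might look like the sketch below.  The Mouse class and the Describe helper are hypothetical, and the coordinates are the fixed values from the example function.

```csharp
using System;

public static class Mouse
{
    // Hypothetical stand-in for GetMouseLocation; returns fixed coordinates.
    public static Tuple<int, int> GetMouseLocation()
    {
        return Tuple.Create(100, 50);
    }

    // The members are reachable only through the positional names Item1 and Item2.
    public static string Describe()
    {
        var loc = GetMouseLocation();
        return "x=" + loc.Item1 + ", y=" + loc.Item2;
    }
}
```

Nothing in loc.Item1 says that it holds the x coordinate, which is the readability gap that naming the members closes.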

This next example demonstrates how tuples can be defined using type inference.

var (a, b) = (1, 2);
var (c, d) = (a, b);
var x = b;

On the first line, a and b are defined as accessors into a new tuple: a is assigned the value 1, and b the value 2.  On the second line, another tuple is defined; its member c is assigned the value of a, while d is assigned the value of b.  The third line demonstrates how you can use the tuple members independently of each other.  In this case, the value of b is assigned to x.

If we don’t care about all of the members of a tuple, we can use the underscore character to ignore that member.  The next example shows how to extract the x value from our GetMouseLocation function while ignoring the y value.

// we don’t care about the y value here
var (x, _) = GetMouseLocation();

Finally, we have a handy way of swapping values without the need to introduce a third variable.

(a, b) = (b, a);

Archetype is not limited to two-member tuples.  The .NET Framework defines tuples up to seven members, so Archetype will handle at least that many.  If that proves inadequate, it should be relatively easy to extend this to any number of members.

Streams

I first read about streams (or lazy lists, as in Haskell) in a C Omega document on a Microsoft Research site.  They’re analogous to sequences in XQuery and XPath, and are implemented using the IEnumerable<T> type in an iterator.  I liked C Omega’s * operator to define a stream because of the way it sets that type apart from a normal type.  In C#, it’s not obvious that a function with a return type of IEnumerable<T> should behave any differently from another function until you notice the yield keyword.

If I want a stream defined as a property in C#, I’d have to write something like this:

IEnumerable<int> Numbers
{
    get
    {
        yield return 1;
        yield return 3;
        yield return 5;
    }
}

In Archetype, the syntax is more succinct and direct:

Numbers int*
{
    yield 1;
    yield 2;
    yield 3;
}

We can be even more terse in such cases by using a comma in the yield list.

Numbers int*
{
    yield 1, 2, 3;
}

Using list comprehensions, which we’ll explore in more detail later in this article, we can do this as well:

Numbers int*
{
    yield 0..100 skip 5;
}

One note about the list comprehension here: we don’t use square brackets around the numeric range because they are implied in the yield statement.  Including them here would cause the yield statement to return the list as a single yielded value.

These examples produce streams with a fixed number of elements, but streams can be infinite as well.  This example returns all positive odd numbers starting with one.

OddNumbers int*
{
    def i = 1;
    loop
    {
        yield i;
        i += 2;
    }
}

Streams are lazy, so while it looks at first glance like an infinite loop from which you’ll never escape, in reality control is driven by the loop that accesses the stream.

loop (var n in OddNumbers)
{
    Console.WriteLine(n);
    if (n > 100)
        break;
}

When other type operators are used, such as the nullable type operator, the stream operator must appear last.

Ages int?*
{
    yield 35;
    yield null;
}

List Comprehensions

Archetype provides some special syntax for constructing lists called list comprehensions.  This is syntactic sugar that provides shortcuts for building lists.

Consider the following syntax in C# and Archetype for constructing a list from 1 to 100.

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) select x;

// Archetype
var FirstHundred = [ 1..100 ];

The square brackets in Archetype specify the construction of a list.  Now consider a more complicated list construction.  In this case, Linq is employed in both languages:

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) where x*x > 3 select x*2;

// Archetype
var FirstHundred = from x in [ 1..100 ] where x*x > 3 select x*2;

Here you can see how a list can be used as the source of a query.

[Image: more list comprehension examples]

Subrange Types

One of the gems in Pascal is subrange types.  This allows a developer to define a new type that is structurally the same as another type, but whose values are constrained in some way.  I’m often bothered by the disparity between database and .NET types.  In a database, a string type (such as varchar) has a definite and usually small limit.  In .NET, strings can grow to roughly 2 GB, but there hasn’t been a good way in languages like C# and Visual Basic to constrain the length.  In various object-relational mappers, a Size attribute is often employed, but this is only metadata and does nothing to prevent the string from becoming too large, so additional work must be carefully performed to constrain the input using control properties and validation logic.

Archetype answers this with subrange types and type constraints.  Consider the following:

// an int that can only have a value from 0 to 105
type ValidAge int in [0..105];

We can now use this ValidAge type to define our class properties:

Age ValidAge;

If a type is unlikely to be reused, we can also define subrange types anonymously.

Age int in [0..105];

In fact, any list comprehension can be used in a subrange type expression, including multiple ranges, as long as a single base type is used.  This example shows an age property that is valid for underage and retiring age people, but is invalid for any ages in between.

Age int in [0..17, 65..105];

We can limit the length of a string simply:

LastName string#30;

Although in actual practice, it might make more sense to create several named types for various string lengths represented in a database:

type Code string#10;
type Name string#20;
type Summary string#100;
type Description string;

LastName Name;

By using a limited number of named string types, both in your code as well as in the database, it’s much easier to update the lengths as needed with a lot less effort.  Archetype adds attributes to the members using these types as well, so this data can be queried and used to inform user interface controls and validation logic, enabling a stronger model-driven approach.

The length of strings doesn’t need to be a single number representing the maximum, however.  We can also specify a range of lengths.

Name string#2..3 = "ZZZ";

Notice in this example how an initializer must be used.  This is because a value of null is actually invalid for the Name property.  The minimum allowed length is 2.  Not providing an initializer with a valid value produces a compiler error.  If we wanted to also allow a null value, we would do so like this:

Name string?#2..3 = "ZZZ";

As with type constraint expressions—discussed in the next section—Archetype injects the appropriate runtime checks in the property setter before any explicitly specified setter code, and throws an OutOfRangeException if the value doesn’t match the specified type criteria.
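To picture the injected checks, here is a hand-written C# sketch of what two of the declarations above might expand to.  This is an assumption about the generated shape, not Archetype's actual output: the Customer class is hypothetical, and .NET's ArgumentOutOfRangeException stands in for the OutOfRangeException named in the text.

```csharp
using System;

// Hypothetical class showing hand-written equivalents of two Archetype declarations.
public class Customer
{
    int _Age;
    string _Name = "ZZZ";

    // Roughly what "Age int in [0..105];" might expand to.
    public int Age
    {
        get { return _Age; }
        set
        {
            // The range check runs before any other setter code.
            if (value < 0 || value > 105)
                throw new ArgumentOutOfRangeException("value", value, "Age must be between 0 and 105.");
            _Age = value;
        }
    }

    // Roughly what a non-nullable "Name string#2..3" with a "ZZZ" initializer might expand to.
    public string Name
    {
        get { return _Name; }
        set
        {
            if (value == null)
                throw new ArgumentNullException("value");
            if (value.Length < 2 || value.Length > 3)
                throw new ArgumentOutOfRangeException("value", value, "Name must be 2 to 3 characters long.");
            _Name = value;
        }
    }
}
```

Each constrained property costs a dozen or so lines when written by hand, which is the boilerplate the one-line subrange declarations replace.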

Type Constraint Expressions

Related to subrange types, type constraints can be applied equally to named or anonymous types.  They allow you to specify a Linq-like where clause that will be used to check values being assigned to properties at runtime.  Because they rely on property setter methods, type constraints cannot be used on fields.  Fortunately, local variables within methods are also implemented like properties by default, so type constraints are also valid on local variables.

I’ve noticed that for some brands, or some stores carrying those brands, only even-numbered sizes of pants are stocked.  This example defines a type representing pants size, using both a subrange and a type constraint expression.

// an int that can only be even
type PantSize int in [0..60] where value % 2 == 0;


Using the modulus operator to obtain the remainder of division, we can be certain now that values of this type will only be even numbers.  Type constraint expressions are allowed to call static or global functions, properties, and fields, but they cannot reference instance members.
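For comparison, a named constrained type like this can be imitated in C# with a small wrapper struct.  The sketch below is an assumption about how one might write it by hand; the struct layout and the choice of ArgumentOutOfRangeException are illustrative, not something Archetype is stated to generate.

```csharp
using System;

// Hand-written approximation of: type PantSize int in [0..60] where value % 2 == 0;
public struct PantSize
{
    public readonly int Value;

    public PantSize(int value)
    {
        // Subrange check first, then the type constraint expression.
        if (value < 0 || value > 60)
            throw new ArgumentOutOfRangeException("value", value, "PantSize must be between 0 and 60.");
        if (value % 2 != 0)
            throw new ArgumentOutOfRangeException("value", value, "PantSize must be even.");

        Value = value;
    }
}
```

A default(PantSize) holds 0, which happens to satisfy both rules; a constraint that excluded zero would need extra care in a struct-based approach.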

Summary

This article covered a lot of type fundamentals.  It should be obvious at this point how a common thread is being woven into the Archetype language.  You’ve probably noticed how almost every construct has named and anonymous counterparts.  Another important theme is the ability to extend types with syntax designed to shape them, and the use of Linq-like expressions throughout the language.

There is still some type content to cover, such as variant or tagged union types and duck typing, which I’ll save for a future article.  Also coming soon is my work on defining custom query comprehensions for a Linq-like query language which can be easily extended with a simple language feature, as well as operators for higher-order functions like fold, map, and others.

Work on my goal of getting a basic Archetype compiler into everyone’s hands is going slowly but steadily.  I have a simple Silverlight IDE running in the cloud that parses Archetype code and will return a .NET assembly.  I got the Oslo tools to work on the server, and I’m partially building the AST I need to perform the semantic analysis and use to report compile errors to the user and to generate the C# code which I’ll then compile with csc.exe.  I’m using a WCF publish-subscribe pattern to initiate a build from the client and report progress as messages going back to the client.  In the next few weeks for sure, and possibly sooner, I’ll post a link to that so you can give Archetype a test drive yourself.

[Part 7 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation

The Archetype Language (Part 5)

Posted by Dan Vanderboom on May 24, 2010

Type Extensions

If you’re unfamiliar with extension methods in C# or other languages, this section might blow your mind a little bit.  If you love and use extension methods all the time, and don’t know what you’d do without them, my hope is that you enjoy the power that Archetype unleashes with robust type extensibility.

If you’re in the unfamiliar group, type extensions are a way of adding new members to existing types, regardless of whether those types were defined in the same assembly as the extensions or in a different assembly.  Contrary to what the name may suggest, no modification of the original type actually occurs; instead, the compiler rewrites each call to an extension member as a call to an ordinary static method, and Visual Studio’s IntelliSense lists the extensions so that they appear to be available on any instance of that type.  Are there methods you’d like to call on any string object?  With extension methods, you can add methods and use them as if they belonged to that class.  Here’s how it looks in C#:

public static class MyExtensions

{

public static string ShuffleLetters(this string Text)

{

// …

}

}

Here we can say var result = "Hello there".ShuffleLetters(); and the dot triggers IntelliSense to pop up and show our ShuffleLetters method.

Extension methods are great.  But if you really embrace them and start thinking in terms of opportunities for extensions, you’ll run into a few brick walls.  You see, extension methods are just a tease; they’re merely the tip of the iceberg, one isolated fragment of a larger (and seemingly happier) family.

First you begin to wish you could add a property instead of a method, so you don’t have to put up with parentheses or a Get- prefix that force a method to stand in for a property.  You might see distasteful expressions like cust.HasChanges(), and there’s nothing you can do about it.

Then you’ll be working with a static class, and you’ll wish you could add a static method, but you can’t add static members.  Eventually, you’ll run into that scenario where an additional operator or constructor would be the perfectly elegant way to solve the current problem.  But you’ll resign yourself to something kludgy instead.

So having gone down a similar road, I was more than a little frustrated when C# 4.0 was released with no new type extensibility at all.  This is one of the stimulants to my starting the Archetype language project: the crystallization of knowledge that no other language team was likely to evolve in the direction and with the priorities that I’ve been developing in my head over the years.  I’m beyond the point where I’m willing to just wait and see what happens.

C# has a clever approach to designating a method as an extension.  However, it’s somewhat indirect.  Instead of saying something like “extend” or “extension” near a class definition, they impose several syntactically-unrelated requirements:

  1. The method must be static, and accessible from the code that calls it.
  2. The first method parameter’s type is the type to be extended.
  3. The first method parameter must be prefixed by the “this” keyword.  This hints that we’re adding an instance member.
  4. The class the method is defined in must be static, and it can be neither nested nor generic.

You would never guess these requirements, and it’s easy to get one of them wrong and skip a beat fixing it.  Now try to add extensions for properties, operators, constructors and finalizers, indexers, and possibly fields, and then throw in static members.  How will you designate all of these different things?  Lots more cleverness, I’d say, but it’s not likely to be syntactically scalable.

When we design syntax, it’s helpful to design a whole family of related capabilities together.  You can see Archetype’s approach in the following example:

Customer object

{

FirstName bindable string;

LastName bindable string;

 

this (FirstName, LastName) set all;

}

 

Customer extension

{

BirthDate bindable DateTime;

 

FullName bindable string

get composite FirstName " " LastName;

 

this (FirstName, LastName, BirthDate) set all;

 

static BuildCustomer Customer (FullName string)

{

var i = FullName.IndexOf(" ");

assert i >= 0;

return new Customer(FullName.Substring(0, i), FullName.Substring(i + 1));

}

}

 

There’s a lot happening here, so let’s go over it one step at a time.

  • The original class we’re going to extend, Customer, is defined first.  Two bindable properties and a constructor, nothing more.
  • Creating a wrapper for a class extension doesn’t require that you remember several clever tricks.  Instead, you simply write “ClassTarget extension”, and everything within the structure is considered an extension of that type.
  • BirthDate is an extension property.
  • FullName is also an extension property.  However, the composite keyword, combined with references to FirstName and LastName, requires that the Archetype compiler look at the target class as well as the extension class to resolve identifiers.  The compiler must also wire the binding infrastructure so the target object stimulates the FullName property to update when either composite part does.  Members referenced on the target class must be visible to an external class: private and protected members can’t be referenced from an extension, for example.
  • The this method is an instance constructor.  set all sets FirstName and LastName on the target object and BirthDate on the extension object.  Constructor methods don’t require a return type, as they are assumed to be the same as the type they’re defined in.
  • BuildCustomer is a static factory method.  The assert keyword is a way to define checkpoints to ensure that conditions are what they’re expected to be, which are especially valuable at the beginning and end of methods.  The basic idea is that you’d be able to define their behavior to throw an exception, log a message, or whatever you like when they’re violated: in debug mode, or in production code.  More about this construct in a future article.
  • Operators are also supported, but are not shown in this example.  To see how custom operators are created, see Part 7 of this series for a detailed explanation.

While extension methods are simple to implement in a compiler, extension fields are a little bit more complicated.  The target object has no room for the new field, so its value has to be stored somewhere else, such as in a dictionary keyed by the target object; the transformation shown below does exactly that.  In addition to this transformation, it is necessary to remove Dictionary entries that refer to objects that have been garbage collected, to prevent these dictionaries from growing uncontrollably.  When one or more extension properties are used in an assembly, a background worker thread might occasionally check the keys in these dictionaries to see if any refer to garbage collected objects, and remove them.

The BirthDate property (with its internal storage field) in the above extension class is converted into something like this:

_BirthDates field Dictionary<Customer, DateTime>;

 

GetBirthDate DateTime (cust Customer)

{

if (_BirthDates.ContainsKey(cust))

return _BirthDates[cust];

 

return default(DateTime);

}

 

SetBirthDate void (cust Customer, BirthDate DateTime)

{

if (BirthDate == default(DateTime))

{

if (_BirthDates.ContainsKey(cust))

_BirthDates.Remove(cust);

}

else

{

if (!_BirthDates.ContainsKey(cust))

_BirthDates.Add(cust, BirthDate);

else

_BirthDates[cust] = BirthDate;

}

}

 

The BirthDate property would handle data binding and then call one of these two methods (or simply include their logic within the property get and set methods).  Another possibility is to instantiate the internally-named extension class and use that as the value in a dictionary.

A runtime mechanism tracks instances of extension objects and removes them from the dictionary periodically.  System.WeakReference provides the mechanism to do this.  It involves two WeakReferences per object: a short weak reference that becomes null and signals the need to clean up, and a long weak reference to use as a key to the dictionary to clean up.  This mechanism would be loaded only when necessary, and some configuration on its cleanup behavior will be made available.

Static field and property storage would be easier to implement and wouldn’t require any cleanup.

There are a few members that may not make sense to provide as extensions.  Static constructors may provide some value to the extension class, but they wouldn’t be able to reach into the target object at static constructor runtime.

Once the wrinkles of implementation detail are worked out, this rich ecosystem of type extensions will open up clean and elegant solutions for adding missing parts from types that you know belong there.

Custom Control Structures

Every so often, I end up writing a function that behaves like a control structure with the block of statements passed in as a lambda function.  I’ve done this to spin up a thread to run code in (described in this article), and discussed what Parallel.For would look like as a custom control structure in my article discussing the future of programming languages.  My idea there was to define a method in such a way, with the delegate at the end, that the compiler would treat the delegate as a separate closure:

public static class Parallel

{

public static void For(long Start, long Count, Action Action)

{

// …

}

}

This is how you’d use it currently in C#:

Parallel.For(0, 10, () =>

{

// add code here for the Action delegate parameter

});

My proposal was to use it like this instead:

Parallel.For(0, 10)

{

// add code here for the Action delegate parameter

}

 

The point I made in that article is worth repeating.  First, a word from Anders at PDC08:

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute.  And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming.  And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before.  I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0.  And he’s right: if you squint, it almost looks like new syntax.  The problem is that programmers don’t want to squint at their code.  As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look.  This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language [now called M].

Ruby and Groovy also support closures that are supplied external to the method arguments.

Where I originally suggested that a first or final delegate parameter should be automatically supported as an external closure, there are a couple of reasons to be more explicit.  Consider the following Archetype syntax:

[Keyword]

fork<T>(items IEnumerable<T>, action Action<T> closure)

{

// create Task for each item

}

With this global function, I can write parallelized forking code as though it were a part of the Archetype language.  With the Keyword attribute, I can even colorize the function name as if it were a built-in keyword.

fork(customers)

{

// work with each customer

it.DoSomething();

}

But the main reason to explicitly mark it as a closure is so it can be treated as an embedded statement, such that keywords like return and break behave within the context of the containing scope.  In other words, if I use return within one of these closures, my intention is to return from the method that closure lives in, not to return out of the closure itself: that’s what the break keyword is for.

The it keyword above assumes the role of the single object in the collection specified (customers).

This is starting to look pretty good, but the it keyword is a crutch if you think about it.  We only need it because we don’t have something nicer like the expression I introduced in Part 3 of this series when I talked about iterating with loop.  You may recall there were basically two formats for specifying the bounds and behavior of the loop.

// loop through and reference each object in an IEnumerable

loop (var cust in Customers)

{

}

// i starts at 11, decrements by 2, until it reaches (or passes) 1

loop (var i in 11..1 skip 2)

{

}

To make a custom control structure look and behave like one of the built-in variety, there must be a way to indicate in the parameter list that such an argument is required.  So let’s say that we introduce an iterator keyword to indicate we want to support either of the syntaxes shown above.

[Keyword]

fork<T>(items IEnumerable<T> iterator, action Action<T> closure)

{

// create Task for each item

}

This allows the function’s invocation to define an item identifier which is exposed to the following closure.  We could then very naturally write:

fork (var cust in customers)

{

// work with each customer

cust.DoSomething();

}

Now we can reference an identifier that makes sense to us, cust, and the whole thing looks like it’s baked into the language.  Voilà!

There’s another kind of iterator that pertains to coroutines that I haven’t discussed yet, but in Archetype I call them streams, so there shouldn’t be too much confusion between them.

Named Closures

Let’s take this to the next level.  What if we wanted to extend our control structure with a second closure that would execute when all of the tasks that were forked had been completed or canceled?  This would complete the fork-join concurrency pattern.  Consider the following syntax:

[Keyword]

fork<T> void (items IEnumerable<T> iterator, ForkAction Action<T> closure,

JoinAction Action<TaskList<T>> closure as join)

{

// 1. schedule Task for each item

// 2. when all Tasks have been completed or canceled,

// 3. invoke JoinAction

}

Here is how we use it, taken from Part 3 in the series:

// fork out a bunch of parallel tasks and join when all are done

fork (var cust in Customers)

{

// this code is encapsulated in a task in the TPL

// and scheduled for execution

}

join (tasks)

{

// this code block is executed when all of the tasks

// are either completed or canceled

}

One thing I haven’t mentioned yet is Archetype’s support for optional parameters.  They work the same as in C# or VB.NET, and come in handy here.  By making the second closure parameter optional (by adding “ = null” after “closure as join”), we can now use fork alone, or fork-join together, in a single function definition.  If we don’t enclose it in a specific namespace or class, it will look exactly like a language keyword.
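The three numbered steps in the fork body, together with the optional join closure, can be sketched as an ordinary higher-order function.  This is a Python approximation under my own assumptions, not Archetype’s implementation: it uses a thread pool from concurrent.futures instead of the TPL, a plain list of Future objects instead of TaskList<T>, and a keyword argument defaulting to None in place of the optional closure parameter.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def fork(items, action, join=None):
    """Run action(item) for every item in parallel, then optionally call join(tasks)."""
    with ThreadPoolExecutor() as pool:
        # 1. schedule a task for each item
        tasks = [pool.submit(action, item) for item in items]
        # 2. block until every task has completed or been cancelled
        wait(tasks)
    # 3. invoke the join closure, if one was supplied
    if join is not None:
        join(tasks)
```

Calling fork(customers, process) uses fork alone, while fork(customers, process, join=report) gives the full fork-join pattern from a single definition (process and report being whatever callables you supply).  What the sketch can’t show is the syntax: the closures are still passed as arguments rather than written as trailing blocks.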

Another example was brought to my attention (in the comments for article 2).  The idea is that a predicate is defined in a closure, syntactically separate from the argument list, inspired by the language Groovy.

print("trying some new syntax") { it.Length > 5 };

But let’s convert this one step at a time.  If we use Archetype’s concept of a named closure, we could insert if to make the intent more obvious:

print void (Text string, Predicate bool(string) closure as if)

{

if (Predicate(Text))

Console.WriteLine(Text);

}

print("trying some new syntax") if { return it.Length > 5; };

The curly braces, though fantastic for multiple-statement blocks of logic, are overkill for simple expressions (and so is the return keyword).  Unless I uncover a good reason not to, I’m inclined to allow Archetype to trade the curly braces for parentheses for simple expressions.  We could then write this, which is what I was ultimately looking for.

print("trying some new syntax") if (ShouldPrint && it.Length > 5);

I introduced the ShouldPrint boolean variable here to illustrate that the conditions provided here don’t have to relate to it (and in most cases, probably would not).

You may be wondering if using if as the closure’s name would cause a problem with the if keyword in the Archetype language.  The reason it’s not a problem is that it appears between the method name on the left and the semicolon on the right.  After the semicolon, the appearance of the if token would be mapped to the language keyword.

The first closure parameter of a method can be named or nameless, but all subsequent closures must be named.  I can’t think of a reason to limit the number of closure parameters other than too many is ridiculous, but I’ll leave that to the discretion of the Archetype developer.  Closure parameters can be defined with params for chaining together arbitrary numbers of named closures.  When calling a method with no parameters except closures, the parentheses after the function name can be omitted.  These methods will be exposed to other languages as having delegate types as well as Iterator and Closure attributes to identify those parameters.

Closure Extension Clauses

Another possibility arises when you consider that the if clause added to the print method could be defined in such a way that it could be applied to any method invocation.  I’m not advocating this particular pattern of placing the if condition after the statement to execute, but it will do for the sake of illustration.  This feature is purely speculative and I’d be curious to hear some feedback.

I could imagine writing a closure extension clause to solve the print problem more generally, something like this:

any (Predicate bool() closure as if)

{

if (Predicate())

any();

}

Then as long as it’s in scope, I could write code like this:

Start void ()

{

import System.Console;

 

var debug = true;

 

WriteLine("Debugging started") if (debug);

 

var name = ReadLine() if (!debug);

WriteLine("Hello " name "!") if (!debug);

}

One line deserves explanation: name is defined as a string variable regardless of the if clause; it is ReadLine which is executed or not based on the condition specified.
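The behavior of that line can be imitated with a small helper that wraps any call in a condition.  This is a Python sketch of the semantics only; call_if is a hypothetical name, and a lambda returning a fixed string stands in for ReadLine so the example doesn’t wait for input.

```python
def call_if(predicate, func, *args):
    """Invoke func(*args) only when predicate() is true; otherwise return None."""
    if predicate():
        return func(*args)
    return None


debug = True
log = []

# WriteLine("Debugging started") if (debug);
call_if(lambda: debug, log.append, "Debugging started")

# var name = ReadLine() if (!debug);
# name is always bound; only the call that would produce its value is skipped.
name = call_if(lambda: not debug, lambda: "World")
```

With debug set to true, the message is logged and name ends up holding None, the closest Python equivalent of a string variable that was declared but never assigned by the skipped call.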

More interesting examples are possible if we leave the world of expression closures and consider multiple-statement closures.  One possibility involves adding an async closure extension clause, with the callback handled by the supplied closure.

Start void ()

{

import System.Console;

 

async FetchData()

{

// respond to callback

}

}

There are many possibilities here, as this opens up the doors to syntax experimentation without actually having to modify the language grammar itself.  This is very similar to macros in languages like Lisp and Nemerle.  On one hand, it’s more constrained to ensure that each extensibility point always conforms to a common set of structural principles, but the variety of structural extensibility points makes it extremely versatile.

I have a few more ideas for language extensions (or “syntactic sugar shaping”) for other types of clauses (we only covered method invocation here), but I’m going to save that for another article.

Next Steps

This article took some long strides toward defining how Archetype handles type and language extensibility, positioning Archetype as an incredibly flexible and malleable tool with which to define syntactic patterns for solving entire classes of problems more intuitively and elegantly.

I created a CodePlex project for Archetype to give it a home.  Over the past few weeks, I’ve created a C# 4.0 parser using the M language to prepare me for the construction of Archetype’s parser and compiler.  The C# parser definition is available for download on the CodePlex site.  If you’re curious about how languages are parsed and projected into Abstract Syntax Trees (ASTs), download this and open it in Intellipad.  You can download Intellipad for free at this Microsoft site.

Once you’ve opened it in Intellipad, find where it says “M Mode” in the upper-right corner.  Click that, and select DSL Grammar Mode.  Then open the DSL menu and select “Split New Input and Output Views”.  You’ll see three window panes.  On the left, you can enter or paste in some C# code.  The middle contains the C# grammar.  On the right, you’ll see a graph of objects that represents how the parser views your code.  It’s pretty interesting to see what it comes up with!

My next steps involve cleaning up the Unicode character class issues in the grammar, and then getting it to build an object-graph in memory based on the M Graph (the contents of the right window pane).  After that, I can work on generating C# code from that AST.  I’ll end up with a C# to C# converter, which seems silly, but I’ll eventually fork the M grammar into its own Archetype grammar and start to change the parser to accommodate the new language.  The output will remain C#, since that’s easy to compile to assemblies with the csc.exe compiler, but the input will be the new goodness of Archetype.

Future articles will detail this work, as well as defining new corners of Archetype language syntax.

[Part 6 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation

The Archetype Language (Part 4)

Posted by Dan Vanderboom on May 8, 2010

Conditional Selection

The if statement has been a classic across so many languages.  In Archetype it is almost identical to C# syntax.

if (expression)

statement;

 

if (expression)

{

// expression is true

}

else

{

// expression is false

}

When conditions become complicated, reversing all of the boolean logic can be tricky.  A common way of reversing it is to surround the expression in parentheses and place a unary not operator before it.  With the required parentheses around the if statement’s condition, it looks like this:

 

if (!(expression))

statement;

In Archetype, the exclamation point can be placed before the parentheses.  It is the only part of the condition that can appear outside the parentheses.

if !(expression)

statement;

Pattern Matching

C-style languages have supported a language construct, switch-case, for providing access to simple jump tables combined with a syntax that is better suited than if for matching many conditions.  This has been unfortunately limited to matching against value types (and string in C#) and against constant values at that.  It’s unfortunate because the more concise syntax for multiple matching values is good in itself, not only when the matching values are constant values.  This constraint is due to the way those compilers build jump tables; it’s a performance optimization technique designed during a time when 8 MHz processors were considered fast.

Pattern matching is one area where functional languages have been very strong.  Archetype has a match keyword that serves the purpose.

match (text) "BEGIN" -> HandleBegin();

This first example matches a simple string to a constant value, and calls HandleBegin if there is a match.  You could write this using an if statement as well.  Here is the equivalent code:

if (text == "BEGIN") HandleBegin();

This next example illustrates several ideas.  There are multiple conditions and some of the conditions are grouped together (with the or operator, |) to share the same reaction code.  The condition or conditions are listed first, followed by the -> operator, and a statement or code block of statements on the right specifies the reaction code.  Also note the numeric range 6..10 and the use of non-constant values (such as x).  Any valid expression is allowed here as long as its type matches (or can be implicitly converted to) the type of the term being evaluated (number).  It’s also worth mentioning that the | operator isn’t necessary before each set of conditions as it is in other functional languages.  (I’d rather align the left edge of code to the beginning of each condition.)

var x = 1;

var number = 4;

 

match (number)

{

x -> number += 3;

3 | 5 -> number++;

2 | 4 | 6..10 ->

{

number--;

Log("Numbers are getting too big");

}

}

Unlike switch, Archetype’s match doesn’t automatically fall through from one match to the next.  This feature is rarely used with switch and is a significant source of programming defects.  For the few scenarios where you’d like to execute every branch that matches, I’m considering a match all construct which would look like this:

match all (text)

{

"BEGIN" -> LetsBegin();

(Letter | Digit)* -> AddIdentifier(text);

}

In this example, both LetsBegin and AddIdentifier would be called.
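The difference between match and match all comes down to whether evaluation stops at the first matching case.  Here is a Python sketch of both under my own assumptions: the helper names are hypothetical, a tuple stands in for the | operator, 6..10 is taken to be inclusive at both ends, and the Log call from the earlier example is left out.

```python
def matches(pattern, value):
    # A pattern is a range (like 6..10), a tuple of alternatives (like 3 | 5),
    # or a plain value compared with ==.
    if isinstance(pattern, range):
        return value in pattern
    if isinstance(pattern, tuple):
        return any(matches(p, value) for p in pattern)
    return pattern == value


def match(value, cases):
    """Run the action of the first matching case only (no fall-through)."""
    for pattern, action in cases:
        if matches(pattern, value):
            return action(value)
    return None


def match_all(value, cases):
    """Run the action of every matching case, in order."""
    return [action(value) for pattern, action in cases if matches(pattern, value)]


x = 1
cases = [
    (x, lambda n: n + 3),                      # x -> number += 3
    ((3, 5), lambda n: n + 1),                 # 3 | 5 -> number++
    ((2, 4, range(6, 11)), lambda n: n - 1),   # 2 | 4 | 6..10 -> number--
]
```

Because the patterns are ordinary values evaluated at run time, nothing restricts them to constants, which is the limitation of switch that match is meant to lift.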

Regular Expression Literals

Archetype also supports regular expression literals based on the syntax from Microsoft’s M language.  This support comes from list syntax and operator overloads defined as a library.  In many scenarios, Archetype can determine the difference between string and regular expression literals.  However, in simple cases such as matching against a single character or simple string of characters, the variable defined will require the specification of the regex type.

var BeginToken regex = "BEGIN";

Without this qualifier, BeginToken would look like a string to the compiler.  To ease this problem, Archetype will convert a string to a regex object if the string participates in a regex-typed expression.  Here’s an example:

var BeginToken = "BEGIN";

var MyRegEx = BeginToken | ("A".."Z")*;

The range of letters and the * Kleene operator (which means repeat 0 or more times) identifies MyRegEx as a regex identifier.

Let’s take a look at how regular expressions and regular expression literals work with the match construct.  First we see the literal embedded directly in the match statement, as the only value to match against.

match (text) ("A".."Z" | "a".."z")* -> DoSomethingUseful();

Next, we’ll look at how we can define the regex objects and use their identifiers.  In this way, we can build up libraries of interdependent regular expressions and go even so far as to write sophisticated parsers.  This is an important tool for fulfilling the goal of language-oriented development.

 

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* -> AddIdentifier(text);

}

Finally, you can apply a when clause as a pattern matching guard similar to F#.

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* when (text.Length < 15) -> AddIdentifier(text);

}
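As a point of comparison, the same guarded match might be approximated in Python like this. It is only a sketch: LETTER, DIGIT, IDENTIFIER and classify are hypothetical names, the regex fragments are composed by plain string concatenation instead of Archetype’s operators, and the return values stand in for the calls made in each branch.

```python
import re

# Small named fragments composed into a larger pattern, mirroring
# the Letter and Digit definitions above.
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"
IDENTIFIER = re.compile("(?:" + LETTER + "|" + DIGIT + ")*")

def classify(text):
    """First matching branch wins; the guard is an extra condition on the branch."""
    if text == "BEGIN":
        return "begin"
    # Pattern plus guard, like: (Letter | Digit)* when (text.Length < 15)
    if IDENTIFIER.fullmatch(text) and len(text) < 15:
        return "identifier"
    return "no match"
```

A 15-character run of letters satisfies the pattern but fails the guard, so it falls through to "no match".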

This isn’t the last word on pattern matching or regular expressions in Archetype.  This is one area I expect to evolve and grow, and to appear in future articles.

Agents, Classes, and Traits

Archetype is a multi-paradigm language, as most commonly-used languages are today.  While it has many features which are functional, it’s heavily influenced by object-oriented design ideas.  Most object-oriented languages are largely imperative rather than functional.  That is to say, it is “programming by side-effects” rather than the goal in functional programming of “no side-effects” (or as few as possible).

Functional programming has grown in popularity relatively recently, considering it’s been around from the beginning of high-level language design.  However, it suffers in some areas such as representing stateful behavior in user interfaces.  Some clever solutions have been devised (such as the use of monads to trick or fake the logic into representing state in a purely functional way), but the theory and application of these patterns are far from intuitive.  I believe this is largely the reason why functional programming languages have been the niche speciality of scientists and mathematicians and not your everyday developer.

Because of these contentious forces, Archetype is aimed at being a transitional language: urging us forward in the use of functional patterns, but without abandoning the imperative style of “programming by side-effects”, and striving to look familiar to programmers of imperative languages such as C#, Visual Basic, etc.

Software Agents and the Actor Model

There are some built-in Archetype constructs that will help to make object-oriented programming safer.  While it doesn’t propose, like Axum (previously code-named Maestro), to prevent any logic that is unsafe in a concurrent execution environment, it does provide some simple but powerful tools that can be used to reduce the risk considerably.

I’m referring primarily to agents which support the Software Agent or Actor Model of parallel program design.  Agents are special classes that run independently (in parallel) of each other, and can only communicate with other agents through messages (introduced in part 3).  Specifically, agents are not allowed to call the methods or subscribe to the delegate members of other agents.  Since each agent runs without the ability to receive from or give execution control to other agents, there is a much smaller chance of coordination problems while executing concurrently.

In every other way, however, agents are defined and composed just like classes.  First, we’ll take a look at a simple Customer class.

Customer object, IDisposable

{

FirstName string;

LastName string;

 

this(FirstName, LastName)

set all;

 

Dispose()

{

// clean up

}

}

The class name appears first, followed by a required base type (object here), and a list of interfaces separated by commas.

Archetype supports single-class inheritance with the ability to implement any number of interfaces (or traits, more about that later in this article), as well as any generic type parameters and generic type constraints that you’re used to.

We see some new things here, however.  The instance constructor is called this, and the set keyword is used to set the values of class members where they match constructor parameters of the same name.  This is the same as writing these lines:

this.FirstName = FirstName;

this.LastName = LastName;

With many parameters in a constructor (or another method), this can save many lines of typing.  If you only want to store some of the parameters in properties, you can use a comma separated list: set FirstName, LastName.  If your intent is to set all parameters, you can use the abbreviated set all instead, as shown in the example.  When set all is used, specifying parameter types is optional.
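For readers who think better in a language they can run, here is a small Python sketch of what set does, under my own assumptions: set_members is a hypothetical helper, and the third constructor parameter (audit_note) is invented purely to show the partial form.

```python
def set_members(instance, params, names=None):
    """Copy constructor parameters onto the instance under the same names.

    names=None mimics "set all"; a list of names mimics
    "set FirstName, LastName".
    """
    for name in (params if names is None else names):
        setattr(instance, name, params[name])

class Customer:
    def __init__(self, first_name, last_name, audit_note):
        params = {
            "first_name": first_name,
            "last_name": last_name,
            "audit_note": audit_note,
        }
        # Store only two of the three parameters.
        set_members(self, params, names=["first_name", "last_name"])

cust = Customer("Dan", "Vanderboom", "not stored")
print(cust.first_name, cust.last_name)
```

In Archetype the compiler would generate these assignments, so nothing like the params dictionary would appear in source code.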

The following code provides an example of two agents that cooperate with each other.

WebDataAgent agent

{

Subscriptions Dictionary<string, List<guid>>;

 

this()

{

// initialize the agent

Subscriptions = Dictionary<string, List<guid>>();

}

 

Subscribe in message(Topic string)

{

if !(Topic in Subscriptions.Keys)

Subscriptions.Add(Topic, new List<guid>);

 

Subscriptions[Topic].Add(me.Client);

 

// confirm the subscription

SubscriptionConfirmed(me.Message);

}

 

SubscriptionConfirmed out message(RequestID guid);

 

PublishMessage in message(Topic string, Message string)

{

loop (var sub in Subscriptions[Topic])

{

MessagePublished(Topic, Message);

}

}

 

MessagePublished out message(Topic string, Message string);

}

UserInterfaceAgent agent

{

CurrentView IView;

 

this(StartView IView)

CurrentView = StartView;

 

RequestData out message(RequestID guid, Method string);

 

DataReceived in message(RequestID guid, Result List<double>)

{

// handle incoming message…

 

// unhook this message handler

DataReceived -= me;

}

}

I have more ideas for actor-based programming (such as a built-in RequestID: me.id), but I want to start simple and force myself to work hard to justify any overhead.
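The message-passing discipline described above can be approximated in Python with one thread and one queue per agent. This is a sketch under assumptions, not Archetype’s runtime design: the Agent base class, the on_ naming convention for incoming messages, and ClientAgent are all hypothetical.

```python
import queue
import threading

class Agent:
    """Minimal agent: owns an inbox and handles one message at a time."""
    def __init__(self):
        self.inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, name, *args):
        # Senders only enqueue data; they never run the receiver's code.
        self.inbox.put((name, args))

    def stop(self):
        self.inbox.put(None)      # sentinel: handled after pending messages
        self._thread.join()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:
                break
            name, args = item
            getattr(self, "on_" + name)(*args)

class WebDataAgent(Agent):
    def __init__(self):
        super().__init__()
        self.subscriptions = {}   # topic -> list of subscriber agents

    def on_subscribe(self, topic, client):
        self.subscriptions.setdefault(topic, []).append(client)
        client.send("subscription_confirmed", topic)

    def on_publish(self, topic, message):
        for client in self.subscriptions.get(topic, []):
            client.send("message_published", topic, message)

class ClientAgent(Agent):
    def __init__(self):
        super().__init__()
        self.log = []

    def on_subscription_confirmed(self, topic):
        self.log.append(("confirmed", topic))

    def on_message_published(self, topic, message):
        self.log.append((topic, message))

def demo():
    server, client = WebDataAgent(), ClientAgent()
    server.start()
    client.start()
    server.send("subscribe", "news", client)
    server.send("publish", "news", "hello")
    server.stop()   # returns once the server has handled both messages
    client.stop()   # returns once the client has handled its replies
    return client.log

print(demo())
```

Each agent touches only its own state from its own thread, which is the property that makes this style safer than sharing objects across threads.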

Traits

Composing classes together to obtain maximum reuse of code has been a goal of object-oriented programming for a long time, but it usually falls short of the ideal.  Languages like C++ that support multiple inheritance are unwieldy due to the additional complexity (see the Diamond Problem), and single inheritance—though sufficient in most scenarios—suffers from limitations that have bothered OOP programmers from the beginning of programming time.  Other languages have introduced constructs like Flavors and Mixins, and each has had to deal with its own peculiarities and workarounds.  While an in-depth discussion of traits and their advantages over other approaches is beyond the scope of this article, a smart group at the OGI School of Science & Engineering published a paper that illustrates the issues clearly.  In it, they explain how traits solve many of the problems while avoiding the pitfalls of other approaches.

I found this characterization to be particularly lucid (the bold emphasis is mine):

Although multiple inheritance makes it possible to reuse any desired set of classes, a class is frequently not the most appropriate element to reuse.  This is because classes play two competing roles.  A class has a primary role as a generator of instances: it must therefore be complete.  But as a unit of reuse, a class should be small.  These properties often conflict.  Furthermore, the role of classes as instance generators requires that each class have a unique place in the class hierarchy, whereas units of reuse should be applicable at arbitrary places.

– Nathanael Schärli et al., in their 2003 paper entitled “Traits: Composable Units of Behavior”

The basic idea is that a trait defines a set of functions but no state.  Multiple traits are pulled into a class, where they are “flattened”.  This means that each trait’s functions are added to the class as if those functions were defined directly in the class.  That is, you don’t need to use a member access operator (.) to navigate from the class to the trait and then to the trait’s function.  In doing this, it’s possible for function names and signatures to overlap among traits.  If the name is the same but the signature is different, they’re applied as overloads.  When there’s an actual clash, a conflict-resolution expression is defined to specify the function to use (or ignore).

Although the design of traits involves the lack of any state, Archetype may attempt to include trait-local state.  That is, variables that are visible to the functions of that trait, but which can’t be seen from the hosting class or any other trait.  (This corresponds to the idea of extension properties, which I’ll discuss in the next article.)

This is an experimental area of the Archetype language, one that will likely change several times before I get it right.  Here’s an example of what it will probably look like to compose classes out of traits.

Serializable<T> trait

{

provide Serialize string (obj T) { … }

provide Deserialize T (input string) { … }

}

 

Persistent<T> trait

where T : ref

{

// by function

require Serialize string ();

require Deserialize T (input string);

// or by trait

require Serializable<T>;

 

provide HasChanges bool

get, private set;

 

provide static Load T (id: uid) { … }

provide Save void () { … }

}

 

Customer object, Serializable, Persistent

{

FirstName string;

LastName string;

 

FullName string

get FirstName " " LastName;

}

 

Start void ()

{

var cust = Customer.Load(123);

 

cust.FirstName = "Dan";

cust.LastName = "Vanderboom";

 

cust.Save();

}

A few notes about the code:

  • The type parameter on the trait allows you to constrain the types of object to which the trait can be applied.
  • The where T : ref is the same as where T : class in C#.
  • The require and provide keywords specify the methods that the trait requires to be present, or provides to the host class.  Archetype may also support a require of an entire trait, which would act analogously to subtyping (or rather, more like an #include).
  • Conflict resolution expressions aren’t shown because their syntax hasn’t yet been decided.
  • Code formatting applies an italic font to traits, but maintains the color of a user-defined type.  This should help to avoid confusing traits with classes.  This is only one possible solution, but it suggests the use of multiple ways of categorizing identifiers to apply a mixture of formatting and colorizing behaviors.
  • This is not a great example of the strength of traits.
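Since the conflict-resolution syntax is undecided, the following Python sketch shows only the flattening idea itself. Everything here is an assumption made for illustration: the compose function, its resolve argument, and the Loggable trait are hypothetical, and Python cannot overload by signature, so any shared name counts as a clash.

```python
def compose(name, traits, resolve=None):
    """Flatten the public functions of several traits into one new class.

    A name provided by more than one trait is an error unless `resolve`
    maps that name to the trait whose version should be used.
    """
    resolve = resolve or {}
    members, owners = {}, {}
    for trait in traits:
        for attr, fn in vars(trait).items():
            if attr.startswith("_") or not callable(fn):
                continue
            if attr in owners and attr not in resolve:
                raise TypeError(attr + " is provided by both "
                                + owners[attr].__name__ + " and "
                                + trait.__name__)
            if attr not in owners or resolve[attr] is trait:
                members[attr] = fn
                owners[attr] = trait
    # The functions become direct members of the new class.
    return type(name, (object,), members)

class Serializable:
    def serialize(self):
        return repr(vars(self))

class Persistent:
    def save(self):
        # Relies on the host class also providing serialize().
        return "saved " + self.serialize()

class Loggable:
    def save(self):
        return "logged"

Customer = compose("Customer", [Serializable, Persistent])
cust = Customer()
cust.first_name = "Dan"
print(cust.save())
```

Note that cust.save() calls serialize directly on the instance: after flattening there is no trait object to navigate through.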

I’ve also had some design ideas for runtime mixing of traits into classes, while still accessing everything through strongly-typed variables (think of traits as interfaces), but this will require much more exploration.

Another idea is to support some aspect-oriented features.  Imagine if you could define a trait called Bindable, that when added to a class, would add the bindable type extension modifier to all of its properties.

For better examples of traits and class composition using traits, I recommend reading the above-mentioned paper.

Next Steps

In this article, I covered simple conditional statements as well as functional-style pattern matching.  We also looked at agent-based programming based on loosely-coupled messages, which provides greater safety in parallel programming scenarios, and at traits as a way to compose features on a more granular level and to solve the composition problems that plague single-inheritance languages.

My next article will cover extension of types (extension methods, properties, events, indexers, constructors, and operators), as well as the first of language extensibility options (defining new control structures).  I will probably dip out of sight for a few weeks as I get further along in building the parser and compiler, and learn about Visual Studio’s Managed Language Services.

If you don’t already follow me on twitter (@danvanderboom), I do a lot of tweeting about what I’m reading, researching, or considering during the language design process, so this is a good way to get an inside look at that process.

[Part 5 of this series can be found here.]


The Archetype Language (Part 3)

Posted by Dan Vanderboom on April 27, 2010

Exception Handling

The try keyword can be used within any code block.

ProcessItem void (Item Item)

{

try

{

// throw a runtime exception

}

catch (ex Exception)

{

// handle exception

}

finally

{

// cleanup

}

}

Additionally, every function can specify its own inline catch and/or finally blocks like this:

 

ProcessItem void (Item Item)

{

// throw a runtime exception

}

catch (ex Exception)

{

// handle exception

}

finally

{

     // cleanup

}

You’ll see this pattern appear in other constructs in Archetype (such as async blocks).

 

A try block can exist with one or more catch clauses only, a finally clause only, or both. If both are included, the catch clauses must come first. This is true whether they follow an explicit try block (the first example above) or are defined as part of the function (the second example).  Since curly braces are optional for single statement code blocks, we can write this:

try Work();

catch (x Exception) Log(x);

finally Finish();

The exception type can be specified by itself (without a name), and the default variable "ex" will be used. If a catch block doesn’t specify an exception type, Exception is presumed.

try Work();

catch (ArgumentException) Log(ex);

catch (NullReferenceException) Log(ex);

catch Log(ex);

finally Finish();

The order of catch blocks must be from most derived to most base (Exception itself must always be last, if present). Incorrect ordering will result in a compiler error.
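The ordering rule amounts to a simple check: no catch clause may be preceded by one that already covers it. Here is a Python sketch of that check, assuming a hypothetical check_catch_order function that receives the exception types in source order (Python’s built-in exception hierarchy stands in for .NET’s).

```python
def check_catch_order(handlers):
    """Raise if an earlier handler's type already covers a later one."""
    for i, earlier in enumerate(handlers):
        for later in handlers[i + 1:]:
            if issubclass(later, earlier):
                raise SyntaxError(
                    "catch (" + later.__name__ + ") is unreachable after "
                    "catch (" + earlier.__name__ + ")")

# Most derived to most base: accepted without complaint.
check_catch_order([KeyError, LookupError, Exception])
```

Passing [Exception, ValueError] instead raises the error, because the base type would swallow everything before the more specific clause could run.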

Catch and finally blocks are scoped as nested within the try block. This enables catch and finally blocks to reference identifiers defined in the try block.

try

{

var answer = 42;

}

catch

{

// valid reference to answer

var a = answer;

}

As with C#, the throw keyword can be used with an Exception variable to wrap and rethrow a caught exception, or throw can be used in a statement by itself to rethrow the original exception.

Namespace Imports

In Part 2, we first saw the import keyword used to import namespaces.

import System;

 

Start void ()

Console.WriteLine("Hello world!");

Like Nemerle, we can also apply the import keyword with classes to access static members without specifying the class name.

import System.Console;

 

Start void ()

WriteLine("Hello world!");

Another option is to import a namespace or class into a nested function or class scope.

Start void ()

{

import System;

Console.WriteLine("Starting up…");

}

 

Employee object

{

import System;

 

Work void ()

{

Console.WriteLine("Working hard!");

}

}

Similar to the with keyword in Pascal, import can be used to import a namespace or class for a specific code block. This restricts the import to one section of a function.

Start void ()

{

System.Console.WriteLine("The following import doesn’t apply here.");

 

import System.Console

{

WriteLine("hello");

WriteLine("goodbye");

}

}

One final thing you can do with import is to specify a namespace alias.

Start void ()

{

import sys = System;

sys.Console.WriteLine("Example of a namespace alias");

}

Aliases

The section on namespace imports above introduced namespace aliasing.  In this section, we’ll see how to use the alias keyword to provide additional identifiers to classes, functions, properties, and fields.

The class alias is similar to the import class pattern, except that a new identifier is introduced and must be used to reference its members.  (With import, that class’s static members are implicitly accessible.)

DemonstrateAlias void ()

{

alias kid = Geneology.Child;

var Josa = new kid;

Josa.FirstName = "Josa";

Josa.Age = 4;

}

In addition to the alias statement, I’m introducing object instantiation syntax here.  The constructor of the Geneology.Child class is being called (via its alias, kid) with the new keyword but no parentheses, which are optional when calling a parameterless constructor.

The var keyword is also new here.  It is similar to the var keyword in C#, except that it’s required for local variable definitions.

This example demonstrates alias used for a local variable.  The syntax is identical for class fields and properties, except that those aliases can be specified at the class level as well as within functions.

DemonstrateAlias void ()

{

var SocialSecurityNumber = "123-456-7890";

alias SSN = SocialSecurityNumber;

System.Console.WriteLine(SSN);

}

This last alias example shows how it can be applied to functions.

DoWork void () { … }

Test void ()

{

alias work = DoWork;

work();

}

Control Flow

We’ve already discussed exception handling, which is a very fundamental kind of control flow structure.  In this section, we’ll explore several other constructs that are familiar to every programmer.

In Programming Language Pragmatics, the author (Michael L. Scott) enumerates six essential types of control flow: sequencing, iteration, selection, exception handling, recursion, and concurrency.  We’ll cover most of them in this article.

Sequencing is merely the scheduling of one statement to be executed after another.  This is the standard model of interpreting source code statements in a code block, so there’s not much more to say about it.

Iteration

Iteration is much more interesting.  In Archetype, there are four iteration constructs.

Loop

The first is the incredibly versatile loop.  It can take a simple integer expression to loop a specific number of times.  It can alternatively take an expression that introduces a variable, a range of values, and an optional skip value (similar to the for keyword in Visual Basic).  Finally, it can act like the foreach keyword in C# and iterate over IEnumerable and IEnumerable<T> collections such as streams, lists, etc.

// loop 10 times

loop (10)

{

}

 

// i starts at 3, increments by 1, until it reaches 9

loop (var i in 3..9)

{

}

 

// i starts at 11, decrements by 2, until it reaches (or passes) 1

loop (var i in 11..1 skip 2)

{

}

 

// define cust, then loop through and reference each object in an IEnumerable

loop (var cust in Customers)

{

}

This replaces the archaic syntax of the for loop in C# and older C-style languages, and provides a construct which is much easier to read and write.  Each of the integer constants in the examples above can be replaced with expressions (variables, function calls returning integers, etc).
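To pin down what the range forms are expected to produce, here is a Python generator that mimics them. It is a sketch under two assumptions of mine: both bounds are inclusive, and the direction is inferred from the order of the bounds (which is how the 11..1 example reads). The name loop_range is hypothetical.

```python
def loop_range(start, end, skip=1):
    """Yield start..end inclusive, stepping by `skip` toward `end`."""
    if skip <= 0:
        raise ValueError("skip must be positive")
    step = skip if start <= end else -skip
    value = start
    # Stop once the value has passed the end bound in either direction.
    while (value <= end) if step > 0 else (value >= end):
        yield value
        value += step

print(list(loop_range(3, 9)))
print(list(loop_range(11, 1, 2)))
```

When the skip value doesn’t land exactly on the end bound, the loop simply stops once it would pass it, matching the “reaches (or passes)” wording in the comment above.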

When writing a loop, it’s often necessary to skip the remainder of the current iteration and continue with the next one.  In C#, the ambiguous-sounding continue keyword is used.  I remember seeing this for the first time and thinking that it meant to continue executing after the loop, which wasn’t the case.  So in Archetype, I’m resurrecting the venerable old keyword next, as in “go to the next iteration in this loop”.  To break out of a loop altogether and continue executing after the loop, the break keyword is used.

// define the int i and loop from 0 to 9

loop (var i in 0..9)

{

// do some work

 

if (DoneWithThisIteration)

next;

 

if (DoneWithLoop)

break;

 

// continue with more work

}

Fork-Join

A potent addition to the loop construct is the fork-join pattern.  The fork and join words are actually defined in a library, not in the language itself.  In a later article, you will see how to create patterns like this.

// fork out a bunch of parallel tasks and join when all are done

fork (var cust in Customers)

{

// this code is encapsulated in a task in the TPL

// and scheduled for execution

}

join (tasks)

{

// this code block is executed when all of the tasks

// are either completed or canceled

}

The join clause’s parameter, named tasks in the example, is a reference to a list of Task Parallel Library (TPL) tasks.  This is a handy way to execute code in parallel without having to restructure your code (similar to the Parallel.ForEach method in the TPL).  Most of the difficulties of concurrent programming are matters of coordination, however, and so they are often best handled by parallel libraries such as the TPL or the Concurrency & Coordination Runtime (CCR).
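The shape of the pattern can be sketched in Python with the standard concurrent.futures module standing in for the TPL. The fork_join function and its body and join parameters are hypothetical names for the two code blocks; this is not how Archetype’s library would be implemented.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def fork_join(items, body, join):
    """Run body(item) for each item in parallel, then call join(tasks)."""
    with ThreadPoolExecutor() as pool:
        # Fork: one task per item, scheduled on the pool.
        tasks = [pool.submit(body, item) for item in items]
        # Join: block until every task has finished or been cancelled.
        wait(tasks)
    return join(tasks)

customers = ["Ann", "Bob", "Cy"]
greetings = fork_join(
    customers,
    lambda cust: "Hello " + cust,
    lambda tasks: [task.result() for task in tasks],
)
print(greetings)
```

The join block receives the list of tasks, just as the tasks parameter does in the Archetype example, so it can inspect results or failures one by one.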

While and Until

The next example is the familiar while loop.  It tests a condition at the beginning of each iteration and only executes the following code block if the expression provided is true.

// repeat while condition is true

while (a == 10)

{

}

The until loop works similarly, but tests its condition after the following code.

// repeat until the condition is true

until (str.Length == 0)

{

}

Although it makes sense that the condition of a post-test loop should appear at the end, where C# places it in its do-while statement, the reality is that the C# syntax is awkward: I don’t know whether to keep my curly braces aligned and put the do and while on their own lines so they don’t crowd the embedded code block, or what.  With the naming difference in Archetype, and with the debugger’s help in stepping through in the correct way, I’m betting this feature will not only be easy to grasp, but hopefully will be seen to clean up certain coding situations and help make looping syntax more structurally consistent.
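The practical difference between the two loops is whether the body can run zero times. Python has no post-test loop, so this sketch spells one out with while True and a break; the function names are hypothetical and each returns the number of passes through the body.

```python
def drain_until(chars):
    """Post-test loop: the body always runs once before the test."""
    passes = 0
    while True:
        passes += 1
        if chars:
            chars.pop()
        if len(chars) == 0:      # plays the role of: until (chars.Length == 0)
            break
    return passes

def drain_while(chars):
    """Pre-test loop: the body is skipped if the test fails up front."""
    passes = 0
    while len(chars) != 0:
        passes += 1
        chars.pop()
    return passes

print(drain_until([]), drain_while([]))
```

For an empty list the post-test version still makes one pass while the pre-test version makes none; for non-empty input the two behave identically.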

The break and next keywords apply to while and until loops as they do for loop constructs (see the Loop section above).

Asynchronous Programming

It’s becoming more common now to make asynchronous calls to web services and other long-running processes that we don’t want our code to sit around waiting for.  In Silverlight, for example, the only network communication options we have are asynchronous.  But the Asynchronous Programming Model (APM) has a way of confusing and tripping up developers as they try to wrap their heads around it.

Archetype introduces a few language constructs to make asynchronous programming easy.

Calling Methods and Delegates Asynchronously

The async keyword allows you to call any method or delegate asynchronously with the same syntax.

GetData string (Index int)

{

return (Index + 1).ToString();

}

 

async GetData(42)

{

var result = value;

}

All of the details of dealing with AsyncCallback and IAsyncResult are abstracted away.  The value keyword represents the return value of the asynchronously-called method (if applicable).  The first access of value may result in an exception in the event that the target method failed when it ran.  This can be caught with a standard try-catch block, or the following syntax can be used.

async DoWork()

{

// success

}

catch (ex ArgumentException)

{

// failure

}

finally

{

// final logic

}

All of the standard rules apply regarding the syntax of catch and finally blocks (see the Exception Handling section above in this article).
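For comparison, here is roughly how the same flow could be written by hand in Python with concurrent.futures. It is a sketch with hypothetical names (async_call, on_success, on_error, on_finally, run_demo), and get_data has been given an invented failure case so the catch path has something to handle.

```python
from concurrent.futures import ThreadPoolExecutor

def get_data(index):
    if index < 0:
        raise ValueError("index must be non-negative")
    return str(index + 1)

def async_call(pool, fn, *args, on_success, on_error, on_finally):
    """Run fn in the background, then the matching continuation block."""
    def job():
        try:
            value = fn(*args)     # plays the role of the value keyword
            on_success(value)
        except Exception as ex:
            on_error(ex)
        finally:
            on_finally()
    # The returned future is the analogue of the IAsyncResult handle.
    return pool.submit(job)

def run_demo(index):
    events = []
    with ThreadPoolExecutor() as pool:
        handle = async_call(
            pool, get_data, index,
            on_success=events.append,
            on_error=lambda ex: events.append("error: " + str(ex)),
            on_finally=lambda: events.append("finally"),
        )
        handle.result()           # wait here only to keep the demo simple
    return events

print(run_demo(42))
print(run_demo(-1))
```

The three callbacks correspond to the async body, the catch block and the finally block; the Archetype syntax exists precisely so that this plumbing doesn’t have to be written out.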

As another reminder of the optional curly braces for single statements, here is a short and simple async call example.

async DoWork()

NextStep();

catch HandleError(ex);

finally Cleanup();

The ability to specify your intent to call methods and delegates asynchronously without mucking around in the implementation details should go a long way to making developers more productive in high-latency scenarios.  In a final example of async, I’ll demonstrate how we can still obtain access to the IAsyncResult variable that is returned by the APM, which is useful if you need to occasionally check if it’s completed.

var ar = async DoWork()

NextStep();

catch HandleError(ex);

finally Cleanup();

Messages

Messages offer an alternative to delegates.  As useful and simple as delegates are, the problem with them is that they pass along not only data, but also execution control.  Often what we need, in particular for applications that must take advantage of parallel execution, is a way to pass along data without giving up control over execution.  This is usually done by pushing a message onto a queue which can be picked up at the receiver’s convenience without holding up the sender.

In Archetype, messages fulfill this need.  Like delegates, they can be named or anonymous.  Here is an example of each.

message EmptySignal();

SomeAgent agent

{

// using a named message

Started out EmptySignal;

 

// using an anonymous message

Completed out message(string);

}

Unlike their delegate counterparts, messages don’t have return values.  Return values only make sense when you make a synchronous call and give up execution control.  Though we can get around this with the async construct and the Asynchronous Programming Model, this is something of a hack.

In the example above, the anonymous message provides the keyword message in place of a return type.  In both cases, the out keyword is used to indicate that it is an outgoing message.  As you might expect, there is an in keyword to indicate incoming messages.

Messages are one-way communications.  If you need a response, you need to specify a corresponding incoming message.  This applies to the communication of error information as well.  If you’re interested in learning more about asynchronous message-based communication, you can refer to my article on the subject.

The out messages act like delegates and can be called like functions, and you can think of in messages as event handlers, although they can also be invoked like methods within the agent.

I am in the process of evaluating Axum and other agent-based languages.  I’m particularly interested in the way Axum defines protocol contracts, which are compile-time checks that message A is responded to by message B, and so on.  It’s likely that this type of constraint will be implemented using the language extensibility capabilities of Archetype, and that it will be deferred until I understand the issues and challenges better.  Until that time, I believe that having the option to define messages as first-class members will provide developers with a greatly-needed tool for safer concurrency programming.  It will make the execution control model explicit and obvious at member definition instead of being bolted-on later as a set of imperative instructions in an often-misunderstood corner of the .NET Framework.

Another avenue I’m exploring is a syntax for exposing messages as WCF endpoints.  Stay tuned for more information on this subject.

Next Steps

We covered a lot of ground in this article.  In these first three articles, many of Archetype’s most fundamental and important constructs have been explained and demonstrated.  In the next article, I’ll introduce Archetype’s capabilities and syntax for conditional selection (if), pattern matching (including a much-needed replacement for the switch statement), and the relationship between traits and classes.

There’s a lot more to see, but since this article turned out to be so large, I’ll stop here with promises about the next one.

[Part 4 of this series can be found here.]


The Archetype Language (Part 2)

Posted by Dan Vanderboom on April 27, 2010

The Purpose of Archetype

You may wonder why I’m designing a new language.  As I explained to Vlad in the comments of my introductory article:

I don’t need it. There are plenty of perfectly usable languages out there.

That being said, I want it. I want to spend a lot less time with ceremony and more with substance. I want greater expressive power without sacrificing readability. I want to extend the language syntax and hook into the compiler at certain key points to experiment with new ideas without having to version the base compiler. I want to define traits as composable types and reserve classes for engines of instantiation when it makes sense to do so. I want common concurrency patterns to feel like first-class citizens so that behavior guarantees can be made at compile time. I also want to see if all the language ideas I’ve come up with over the years will really be as valuable as I think they will be.

I have a lot more reasons, too. They’ll be the subject of continued articles in the series.

Hello World!

I would be remiss if I didn’t include a Hello World example.

Start void ()

{

System.Console.WriteLine("Hello world!");

}

This is a complete program.  It defines a Start function as a top-level construct (not inside of a class), with a single call to Console.WriteLine, a normal .NET method.

Because any code block can be either a single statement or a pair of curly braces containing zero or more statements, we can shorten our example to look like this:

Start void ()

System.Console.WriteLine("Hello world!");

C# already allows this with constructs like if, while, and for.  Archetype takes it to the next level by making it universal.

Requiring the start method of a program to be hosted in a class seems like a kludge to me; and in the spirit of enabling the language to be used for more functional programming, I thought it appropriate to allow this type of functional composition without the ceremony of an enclosing class.

The startup function name will be Start by default, and a modifiable setting on the project options page will let you use a different name for the entry point function.  It will also support using a static function in a class.

Delegates

Delegate definitions in Archetype are very close to function definition syntax.  Consider this example:

import System; 

Start void ()

{

ShowInfo void (); // define a delegate

 

ShowInfo(); // invoke the delegate

 

ShowInfo = () => { Console.WriteLine("Name: " me.Name) };  // Name: Start

ShowInfo += { Console.WriteLine("Type: " me.Type) };    // Type: void

 

ShowInfo();

}

First, we’re introducing the import statement.  Our use of it here is identical to the using keyword in C#.  The import statement does some other interesting things, which we’ll see in a future article.

The first statement in our Start method defines a delegate called ShowInfo.  Note that the only difference between this and a function definition is its lack of a trailing code block.  Instead, a semicolon appears after the (empty) parameter list.

The next line invokes the delegate.  In C# this would throw a NullReferenceException, which I’ve always found annoying.  In Archetype, as with Visual Basic, this gets converted by the compiler into a check for null followed by an invocation if it’s not null.  I’ve gone this route because of how rare it is that I actually want to throw an exception in these cases; in C#, I’m constantly writing the null check for delegates to avoid it, wrapping that check and the invocation in OnDoWhatever methods, and that seems wasteful.  In Archetype, if you want to throw an exception when a delegate is null, then write the code to throw one explicitly.

The following two lines point the delegate to a specific function (expressed as a lambda, similar to C#) and add a second lambda function to the first.  The =, +=, and -= operators work as expected with delegates.

Notice that the first lambda function supplies an empty parameter list, but the second one omits it.  The parameter list can be omitted when there are no parameters, or when you don’t need to reference the arguments that are passed in.

It’s possible to be even more terse if we have a single non-assignment statement we’re assigning to our delegate variable:

ShowInfo = Console.WriteLine("Name: " me.Name);

Assignment statements cause a problem with parsing because of the right-to-left interpretation of assignments.  Consider these statements:

ShowInfo = Console.WriteLine("Name: " me.Name);

 

ShowInfo = Age = 1;

The first line is a valid delegate assignment.  The intention of the second is: “each time ShowInfo is invoked, set Age to 1.”  However, the parser reads this as “Set Age to 1, then set ShowInfo to Age,” which is not what we want.  As a result, a delegate consisting of a single assignment statement must be surrounded by curly braces in Archetype.

Finally in our Start method above, the last line invokes the delegate again, which in turn calls both lambda functions.

Additional Notes:

  • The me keyword refers to the current function, Start.  As the comments suggest, me.Name returns “Start” and me.Type returns a System.Void Type object.  Calling me() as a function would call Start recursively.
  • String concatenation doesn’t use the + operator.  Multiple strings separated by spaces are concatenated automatically.  In the case of Console.WriteLine above, where a string literal ("Type: ") is followed by a non-string value (me.Type), the non-string value is converted to a string with ToString.  This can occur because the non-string value is listed where a string is expected.

Delegate Parameters

Defining parameters for delegates is easy. If an anonymous function won’t use any of the parameters passed in, it can omit the parentheses entirely. Individual parameter names can be omitted with an underscore character. Otherwise, argument names are supplied as usual. All three variations can be seen in the following example:

Start void ()

{

ShowInfo void (Info string, Priority int);

 

ShowInfo = (info, priority) { Console.WriteLine("Name: " me.Name) };

ShowInfo += (info, _) { Console.WriteLine("Type: " me.Type.Name) };

ShowInfo += { Console.WriteLine("Info: " info) };

 

ShowInfo("Fake Info", 10);

}

Named (Non-Anonymous) Delegate Types

So far we’ve only seen anonymous delegate types.  The ShowInfo delegate above has a type, but we can’t refer to it by name, and so we can’t share that type with other code.  This is fine in many cases.  In fact, many times I’m annoyed by the need to go to another file to add a delegate that will never be used elsewhere.  But there’s also occasionally a need to expose that type, especially for use by a library or framework consumer.

The following code defines a delegate type, a function that uses that delegate type for its parameter, and a call to the function. That call contains a lambda expression that creates an anonymous function and a delegate object pointing to it, and passes that to the ShowInfo function.

type ShowPersonDelegate void (Name string, Age int);

 

ShowInfo void (ShowPerson ShowPersonDelegate)

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

 

Start void ()

{

ShowInfo((name, age) { Console.WriteLine(name " is " age) });

}

Another way to think about this is that the type keyword simply defines a name (ShowPersonDelegate), which then points to an otherwise-anonymous delegate type: void(Name string, Age int).

The final statement calls ShowInfo, passing in a lambda function, which has the same syntax as C#.

For the sake of comparison, here is the same program using an anonymous delegate.  Note that the delegate’s parameter names are optional.

ShowInfo void (ShowPerson void(string,int))

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

 

Start void ()

{

ShowInfo((name, age) { Console.WriteLine(name " is " age) });

}

As in C#, a delegate object will be created automatically if a method name is provided where a matching delegate is requested:

SendText void (Name string, Age int)

{

Console.WriteLine(Name " is " Age);

}

 

ShowInfo void (ShowPerson void(string,int))

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

 

Start void ()

{

ShowInfo(SendText);

}

Delegate Duck Typing

To simplify interoperating between named and anonymous delegates, a form of compile-time duck typing is used.  Consider the following code:

Predicate1 Func<int, bool>;

Predicate2 bool(int);

 

Predicate1 = p => p % 2 == 0;

Predicate2 = Predicate1;

Predicate1 and Predicate2 are technically two different delegate types.  The anonymous bool(int) delegate will be named something like __anon_bool_int by the compiler.  However, the last two lines are valid because bool(int) and Func<int, bool> are structurally equivalent. The final assignment is effectively transformed by the compiler into:

Predicate2 = p => Predicate1(p);
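
The same wrapping trick can be sketched in Python.  This is only an analogy: FuncIntBool, AnonBoolInt, and convert are hypothetical names standing in for Func<int, bool>, the compiler-generated anonymous type, and the rewrite the compiler would perform.  The two classes are deliberately unrelated nominal types, so the only way to get from one to the other is to wrap the call.

```python
from typing import Callable


class FuncIntBool:
    """Hypothetical nominal type standing in for Func<int, bool>."""

    def __init__(self, target: Callable[[int], bool]) -> None:
        self.target = target

    def __call__(self, value: int) -> bool:
        return self.target(value)


class AnonBoolInt:
    """Hypothetical nominal type standing in for the anonymous bool(int)."""

    def __init__(self, target: Callable[[int], bool]) -> None:
        self.target = target

    def __call__(self, value: int) -> bool:
        return self.target(value)


def convert(source: FuncIntBool) -> AnonBoolInt:
    # Mirrors "Predicate2 = p => Predicate1(p)": wrap the call, don't cast.
    return AnonBoolInt(lambda p: source(p))


predicate1 = FuncIntBool(lambda p: p % 2 == 0)
predicate2 = convert(predicate1)
```

Here predicate2 is a different type from predicate1 but forwards every call to it, which is the behavior the compiler transformation aims for.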

Bindable Properties

The bindable keyword can be used to enable data binding support for user interface controls.  It’s always bothered me how much boilerplate code must be written in .NET languages for bindable properties.  In C#, this is typical:

private int _Age;

public int Age

{

get { return _Age; }

set

{

_Age = value;

PropertyChanged("Age", value);

}

}

That’s ten lines of code for a simple integer property!  And this is a simple scenario.  Compare that to Archetype’s bindable property:

Age bindable int;

Much better!  With this, we can define many bindable properties in a small space. This is expanded by the compiler into something like this:

_Age field int;

Age int

{

get me.Value;

set PropertyChanged(me.Name, me.Value = value);

}

After being warned about the potential dangers of INotifyPropertyChanged by Michael in the comments of the previous article, I am exploring alternative implementations.  Regardless of how it’s implemented (see Part 7 for more details), bindable will be a powerful addition for Archetype developers.
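
For readers who want to see the idea running, here is a small Python sketch of a property that owns its backing storage and raises a change notification on every set.  It is an analogy, not Archetype’s implementation: Bindable, ViewModel, property_changed, and listeners are names made up for this example, and the notification scheme is the simplest one I could write down.

```python
from typing import Any, Callable, List


class Bindable:
    """Descriptor sketch of a property with its own backing field."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name        # plays the role of me.Name
        self.slot = "_" + name  # plays the role of me.Value

    def __get__(self, obj: Any, objtype: Any = None) -> Any:
        if obj is None:
            return self
        return getattr(obj, self.slot, None)

    def __set__(self, obj: Any, value: Any) -> None:
        setattr(obj, self.slot, value)
        # Every assignment announces the property name and new value.
        obj.property_changed(self.name, value)


class ViewModel:
    """Hypothetical base class that fans notifications out to listeners."""

    def __init__(self) -> None:
        self.listeners: List[Callable[[str, Any], None]] = []

    def property_changed(self, name: str, value: Any) -> None:
        for listener in list(self.listeners):
            listener(name, value)


class Person(ViewModel):
    age = Bindable()  # one line per bindable property


events = []
person = Person()
person.listeners.append(lambda name, value: events.append((name, value)))
person.age = 4
person.age = 5
```

Each assignment to person.age records one (name, value) pair in events, and the class body stays at one line per property, which is the point of the bindable keyword.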

Composite Bindings

Occasionally I need a property which is composed of two or more other properties, and I want to ensure that the proper data binding machinery is notified whenever each constituent property is updated.  In C#, I would need to make multiple PropertyChanged calls in each of the individual properties to signal that the composite binding is changing as well.  In Archetype, we can use the composite keyword within the composite property itself.  Syntactically this is a pull model whereas otherwise we’d be forced to implement a push model.  The Archetype syntax looks like this:

FirstName bindable string;

LastName bindable string;

 

FullName bindable string

get composite FirstName " " LastName;

When the compiler sees the composite keyword after get, it scans the following expression tree.  When it finds property references and those properties are marked as bindable, it makes the appropriate transformations to notify of changes.  In the underlying implementation, it is a push model, but the developer writing Archetype code is spared those details.  Multiple-statement get functions are supported, and set functions are also supported when using composite.
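
The push model hiding underneath can be sketched in Python as follows.  Everything here is an assumption made for illustration: the _dependents table is a hand-written stand-in for what the compiler would derive by scanning the composite getter, and the listener mechanism and sample names are invented.

```python
from typing import Any, Callable, Dict, List


class Person:
    # Hypothetical dependency table: each source property lists the
    # composite properties that are built from it.
    _dependents: Dict[str, List[str]] = {
        "first_name": ["full_name"],
        "last_name": ["full_name"],
    }

    def __init__(self, first_name: str, last_name: str) -> None:
        self.listeners: List[Callable[[str, Any], None]] = []
        self._first_name = first_name
        self._last_name = last_name

    def _notify(self, name: str) -> None:
        for listener in list(self.listeners):
            listener(name, getattr(self, name))
        # Push step: also announce every composite built on this property.
        for dependent in self._dependents.get(name, []):
            self._notify(dependent)

    @property
    def first_name(self) -> str:
        return self._first_name

    @first_name.setter
    def first_name(self, value: str) -> None:
        self._first_name = value
        self._notify("first_name")

    @property
    def last_name(self) -> str:
        return self._last_name

    @last_name.setter
    def last_name(self, value: str) -> None:
        self._last_name = value
        self._notify("last_name")

    @property
    def full_name(self) -> str:
        # The composite getter only reads; it contains no notification code.
        return self._first_name + " " + self._last_name


events = []
person = Person("Ava", "Smith")
person.listeners.append(lambda name, value: events.append((name, value)))
person.last_name = "Jones"
```

Setting last_name produces two notifications, one for last_name and one for full_name, even though the full_name getter itself knows nothing about notification.  That is the sense in which the syntax reads as a pull model while the machinery pushes.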

Bindable Collection Properties

Binding to collections is a little different from binding to single properties.  In WPF, Silverlight, and now more broadly in .NET 4.0, types such as ObservableCollection provide several notifications to user interface controls.

Archetype provides special binding expressions to specify common scenarios such as “bind x to the selected item of this collection”.

Here is an example of a collection and its current single selection, bound together in the view model:

Alternatives ObservableCollection<Alternative>;

 

SelectedAlternative bindable Alternative

bind to Alternatives.SelectedItem;

The following example demonstrates binding a collection to another collection with a discriminating expression (subselection):

Options ObservableCollection<Option>;

 

SelectedOptions ObservableCollection<Option>

bind to Options.SelectedItems

     where item.OptionName.StartsWith("L");

SelectedOptions is an ObservableCollection so that its subselection of contents can itself be bound to a user interface control.  The bind to expression sets the binding source, and the where expression specifies a predicate (a function taking “item”, in this case an Option object, as a parameter, and returning a bool) to include only the objects we want.  The where expression is optional.
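
A bind to ... where expression can be pictured as a filtered view that refreshes whenever its source changes.  The Python sketch below is a simplification under stated assumptions: ObservableList is a tiny stand-in (not the .NET ObservableCollection), FilteredView is a hypothetical name, plain strings stand in for Option objects, and the view binds to the source list directly rather than to a selection.

```python
from typing import Any, Callable, Iterable, Iterator, List


class ObservableList:
    """Tiny stand-in for an observable collection."""

    def __init__(self, items: Iterable[Any] = ()) -> None:
        self._items: List[Any] = list(items)
        self.listeners: List[Callable[[], None]] = []

    def append(self, item: Any) -> None:
        self._items.append(item)
        self._changed()

    def remove(self, item: Any) -> None:
        self._items.remove(item)
        self._changed()

    def _changed(self) -> None:
        for listener in list(self.listeners):
            listener()

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)


class FilteredView(ObservableList):
    """Models "bind to <source> where <predicate>"."""

    def __init__(self, source: ObservableList,
                 predicate: Callable[[Any], bool] = lambda item: True) -> None:
        super().__init__()
        self._source = source
        self._predicate = predicate  # the optional "where" clause
        source.listeners.append(self._refresh)
        self._refresh()

    def _refresh(self) -> None:
        # Recompute the subselection, then notify anything bound to the view.
        self._items = [i for i in self._source if self._predicate(i)]
        self._changed()


options = ObservableList(["Large", "Medium"])
selected = FilteredView(options, lambda item: item.startswith("L"))
options.append("Light")
options.append("Small")
```

Because the view is itself observable, a control could bind to selected and see only the items that pass the predicate.  Recomputing the whole list on every change is the naive choice here; a real implementation would more likely apply incremental updates.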

You may notice that SelectedItem and SelectedItems are not valid properties of ObservableCollection.  This is because they are extension properties.  Archetype supports extension methods just like C#, but it goes further to provide extension properties, indexers, constructors, and operators.  I’ll discuss type extensions in a future article.

What this doesn’t address is the possibility of binding a collection to more than one user interface control, and allowing independent selection in each.  Because of this, the specifics of binding expressions in Archetype will very likely change before being finalized, but this should give you a taste of the possibilities of language-aware binding.

Next Steps

In the next article, we’ll take a closer look at the import keyword and its special abilities, exception handling, local variable definition, and control flow structures such as if, loop, while, and until.  I’ll also introduce one of my favorite Archetype features, the async construct, which provides an intuitive way to call delegates asynchronously.

[Part 3 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation | 8 Comments »

The Archetype Language (Part 1)

Posted by Dan Vanderboom on April 26, 2010

Experiments in Language Design

After 25 years of computer programming in many different languages and a more-than-casual interest in linguistic analysis, I’ve developed a keen appreciation of the best features among them.  I also have a relatively steady stream of new language feature ideas.  Many years ago I began tinkering with interpreters and compilers.  At PDC 2009, I was surprised and delighted to hear the news of the language M (part of Oslo), with which it is possible to write parsers for other languages.  Parser generators have been around for a long time, but with such strong support in .NET, it was close enough to home for me to sit up and pay close attention.  The desire to create my own language to address the shortcomings I’ve experienced has been perpetually in the back of my mind.

After that PDC, I bought several more books on language and compiler design and began diving in.  I’ve been somewhat obsessed with it recently, and the language specification I’m writing is starting to look legitimate, so I think it’s time to start sharing what I’ve come up with and (hopefully) get some good feedback.

The Language

The code name for this language is Archetype.  I don’t know if I’ll use this for the final name, but this will do for now.  It’s a multi-paradigm language instead of attempting to be pure in any one way, and if you’re familiar with C#, you should be pretty comfortable with the syntax.  Yet, if you enjoy the functional programming power of languages like ML, Haskell, OCaml, F#, or Nemerle, or are interested in language constructs to simplify asynchronous and concurrent workflows, you’ll probably like Archetype.  While it supports functional programming, one of my goals is to make it appealing and even obvious to developers without a strong functional programming background.  It targets the .NET CLR and will therefore run on many platforms and devices, as well as interoperating well with existing .NET assemblies.

To place it in a set of buckets, as languages are classified by paradigm on Wikipedia, Archetype would be considered: imperative, declarative, generic, functional, object-oriented (class-based), language-oriented, reflective, and meta-programming-based.

Current Status

The parser is under development using M (in the Intellipad editor).  Though the language design and specification itself is about 70-80% complete, the parser is only about 10% done.  Once the parser is a little further along and some interesting samples can be written and parsed, I’ll start building the semantic analyzer and code generation pieces.

The first versions of the compiler will generate C# code instead of IL instructions.  It will be a lot faster for me to translate Archetype constructs to C#.  The C# compiler, though not concurrent or incremental, is highly optimized and produces great output.  This does limit my ability to depart radically from C# semantics, but this is okay: C# is a wonderful language and I plan to keep Archetype pretty closely aligned with it.  For example: all of the same operators and precedence rules are borrowed from C#.  Archetype does introduce a number of new operators, keywords, and syntactical constructs, but it aims to be close to a superset as far as semantics go.

Disclaimer

Everything is subject to change.  Some features are stolen directly from specific languages which I will do my best to identify as I go.  Your mileage may vary.  Available while quantities last.  Batteries not included.

This is a set of experiments.  Hopefully it will also be a fun conversation among language enthusiasts.

A Taste of Features

Since this article is already getting long and I have a ton designed already, I’m going to keep the language design part short and present a mere taste of language features.

Properties and Fields

We’ll start with something basic: how to define properties and fields.

Age int;

This first example is a property.  The name comes first, which you’ll see everywhere in Archetype.  Also notice that the int type is the same as in C#.  This is true of all the built-in C# types.

To define a field, the “field” keyword is added before the type.  This encourages property definition by default.

Age field int;

The property definition above is short for, but equivalent to, this:

Age int

{

get { return me.Value; }

set { me.Value = value; }

}

When defining properties, it’s so common to require a private “backing field” that I thought it warranted something in the language.  C# also does this, but only if you use implicit get and set functions.  As soon as you need custom logic for one or the other, you lose this.  In Archetype, the “me” keyword refers to the current function or property.  In the case of properties, me.Value is the backing field which saves you a line of code for every property that needs one.  Reducing code clutter and maximizing information density and conciseness are major design goals in Archetype.

Other “me” properties are available as well, such as Name and Type, which are useful for general-purpose code generation and debugging.  In functions, invoking “me” is recursive.  C# has the keyword “this” (which Archetype shares), which very usefully refers to the current object.  The “me” keyword is roughly analogous to this.GetType().

The curly braces surrounding get and set are optional.  If a get or set method is a single statement, the curly braces around it are also optional.  We could then write this:

Age int

get me.Value,

set me.Value = value;

Note the lack of a "return" keyword in the get method: “get return” would be redundant.  Also, the get and set clauses are separated by a comma.  This is a common pattern for multiple clauses in Archetype.  The semicolon triggers the end of the statement (in this case, a property declaration statement).

Public class variables are properties by default to promote consistent and forward-looking design techniques (such as compatibility with interfaces), and the field keyword is there to opt out when there is a need (such as performance).

Value types in this language are not nullable by default. The question mark can be used after the value type’s name to indicate it is nullable.

Age int?;

Functions

I’ll have much more to say about functions and delegates in my next article.  Here, I’ll just briefly sketch an outline of what they look like and hint at what’s to come.

Save<T> void (Entity T)

{

// …

}

As with properties, the identifier is listed first (along with a generic type parameter), followed by the return type, and finally the parameters in parentheses.

I’ve been comfortable with the type-first definitions in C# for years, but I’ve often begun writing a function whose name came to mind instantly, but whose return type required further thought.  However, my fingers would hesitate to type anything until I could determine the return type.  After seeing the name come first in Nemerle, it struck me how nice it would be to define functions name-first.  The problem that Nemerle has (and Visual Basic, for that matter) is that the parameter list comes next, and the return type is listed last.  This has the advantage that it’s easier to write, but suffers from being more difficult to read.  When doing a quick scan of code, eyes scanning down through a class, the return types will be all over the place on the screen.  In the case of long parameter lists where poorly-formatted code puts the return type off the screen too far to the right, you’d actually have to scroll right to see the return type.  This is decidedly worse than type-first.

Then I thought: why not put the two most important parts of a function header first: name and return type, with parameters after them?  Then you’d have the best of both worlds: faster to write, and easy to read and understand.  Each parameter then follows the “name type” order, consistent with properties and functions.

Next Steps

In my next article on Archetype, I’ll go into much more detail about functions and delegates, where I think Archetype makes some original contributions (at least in terms of syntactical convenience and elegance).  I’ll talk about creating basic console applications, the simplest program possible to write, anonymous functions and anonymous delegates, a keyword (actually a custom type extension) to drastically simplify data-bindable property definitions for UI view models, and more.

[Part 2 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation | 13 Comments »