Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 8)

Posted by Dan Vanderboom on October 1, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Constructors

A constructor in Archetype is a recommended, predefined prototype for instantiating an object correctly.

 

The default parameterless constructor is defined implicitly (it’s defined even if it isn’t written), even if other constructors are defined explicitly. This last part is unlike other languages that hide the parameterless constructor when others are defined.  This will make classes with these default constructors common in Archetype, to more easily support behaviors like serialization and dynamic construction.  When it needs to be hidden, it can be defined with reduced visibility, such as private.

 

A constructor is defined with the name new, consistent with how it’s invoked.

 

Let’s start with a very basic class, and build up to more complicated examples.

 

Customer object

{

FirstName string;

LastName string;

}

 

Despite the lack of an explicit constructor, it’s important for Archetype to define constructs that are useful in their default configurations.  You couldn’t get more basic that the Customer class above.  If we want to define the constructor explicitly, we can do so.

 

Customer object

{

FirstName string;

LastName string;

 

new ()

{

// do nothing

}

}

 

Instantiating a Customer object is easy. With the parameterless constructor, parentheses are optional.

 

var dilbert = new Customer;

 

Archetype, like C#, supports constructor initializers:

 

var dilbert = new Customer

{

FirstName = "Dilbert",

LastName = "Smith"

};

 

When you have few parameters and want to compress this call to a single line, the curly braces end up feeling a little too much (too formal?).

 

var dilbert = new Customer { FirstName = "Dilbert", LastName = "Smith" };

 

Archetype supports passing these assignment statements as final arguments of the constructor parameter list, like this:

 

var dilbert = new Customer(FirstName = "Dilbert", LastName = "Smith");

 

As a result, there isn’t much need to define constructors that only set fields and properties to the value of constructor parameters.  Because Archetype has this mechanism for fluidly initializing objects at construction, the only time constructors really need to be defined is when construction of the object is complicated or unintuitive, in which case a supplied construction pattern is a sure way to make sure it’s done correctly.  Our Customer example doesn’t meet those criteria, but if it did, this is one way we could write it:

 

Customer object

{

FirstName string;

LastName string;

 

new (FirstName string, LastName string)

{

this.FirstName = FirstName;

this.LastName = LastName;

}

}

 

To avoid having to qualify FirstName with the this keyword, many people prefer naming their parameters with the first character lower-cased.  That’s an unfortunate compromise.  When viewing at least the public members of a type, in a sense you’re creating an outward-facing API, and I think Pascal casing more naturally respects English grammar, not downplaying the signficance of the most-important first word in an identifier by lower-casing it to get around some unfortunate syntax limitation.

 

But instead of taking sides in a naming convention war, we can solve the problem in the language and remove the need to make any compromise.

 

new (FirstName string, LastName string)

{

set FirstName, LastName;

}

 

This lets us set individual properties named the same as constructor parameters.  It’s flexible enough to set some and consume other parameters differently, but when you want to set all parameters with matching member names, you can use the shortcut set all.  If that’s all the constructor needs to do, we can do away with the curly braces:

 

new (FirstName string, LastName string) set all;

 

If our Customer class contained a BirthDate property, we could use this constructor and pass in an initializer statement as a final parameter.

 

var dilbert = new Customer("Dilbert", "Smith", BirthDate = DateTime.Parse("7/4/1970");

 

This works with multiple initializers.  Alternatively, we could use an initializer body after the parameter list:

 

var dilbert = new Customer("Dilbert", "Smith")

{

BirthDate = DateTime.Parse("7/4/1970")

};

 

Note how we have two places to supply data to a new object, if needed: the parameter list for simple, short values, and the initializer body for much larger assignments.


Another common construction pattern is for one or more constructors to call another constructor with a default set of properties.  Typically the constructor with the full list of parameters performs the actual work, while the shorter constructors call into the main one, passing in some default values and passing the others through.

 

new (EvaluateFunc sFunc<T>) new(null, null, EvaluateFunc);

 

new (BaseObject object, EvaluateFuncsFunc<T>) new(null, BaseObject, EvaluateFunc);

 

new (Name string, BaseObject object, EvaluateFunc Func<T>)

{

set all;

 

// do all the real work…

// …

}

 

Declarative Archetype: The Initializer Body

 

The initializer body mentioned above has a special structure in Archetype.  Member assignment statements can appear side-by-side with value expressions that are processed by a special function called value.  This can be used, among other things, to add items to a collection.  It’s best to see in an example:

 

var dilbert = new Customer

{

FirstName = "Dilbert",

LastName = "Smith",

BirthDate = DateTime.Parse("7/4/1970"),

 

new SalesOrder(OrderCode = "ORD012940"),

 

new SalesOrder

{

OrderCode = "ORD012941",

 

new SalesOrderLine(ItemCode = "S0139", Quantity = 3),

new SalesOrderLine(ItemCode = "S0142", Quantity = 1)

}

};

 

The first three lines of the initializer set members with assignment statements.  The next expression (new SalesOrder …) in the list creates an object, but there’s no assignment.  It returns a value, but where does it go?  Take a look at the value functions below for the answer:

 

Customer object

{

FirstName string;

LastName string;

Orders SalesOrder* = new;

Invoices Invoice* = new;

 

// formatted inline

value (Order SalesOrder) Orders += Order;

 

// formatted with full code block

value (Invoice Invoice)

{

Invoices += Invoice;

}

}

 

A Customer has several collections of things–Orders and Invoices here–and because there are two value functions in the class, any expressions of type SalesOrder or Invoice will be evaluated and their values passed to the appropriate value function. Expressions of other types will trigger a compile-time error.

 

The += and -= operators haven’t been shown before.  Their use is a very natural fit for stream and list types.  The += operator appends an object to a stream, and -= removes the first occurrence of that object.

 

This simple addition of a value function in types (classes and structs) gives Archetype the ability to represent hierarchical structures in a clean, declarative way.  Sure it’s always been possible to format expressions similarly, but the syntactic trappings of imperative languages have made this difficult and unattractive at best, and in most real-world cases impractical.

 

When I experimented in creating a Future class, I came up with a pattern in C# to nest structures in a tree for large future expressions, but the need to match parentheses gets in the way and consumes too much attention that’s better focused on the logic itself:

 

Future<string> FuturePi = null, FutureOmega = null, FutureConcat = null, FutureParen = null;

 

var result = new Future<string>("bracket",
    () => Bracket(FutureParen),
    (FutureParen =
new Future<string>("parenthesize",
        () => Parenthesize(FutureConcat),
        (FutureConcat =
new Future<String>("concat",
            () => FuturePi +
" < " + FutureOmega,
                (FuturePi =
new Future<string>("pi", () => CalculatePi(10))),
                (FutureOmega =
new Future<string>("omega", () => CalculateOmega()))
            ))
        ))
    );

 

The difference finally occurred to me between the need to set few simple members and the definition of larger, more structured content–including nested structures–that begged for a way to supply them without carrying the end parenthesis down multiple lines or letting them build up into parentheses knots that must be carefully counted.  One gets to fidgeting with where to put them, and sometimes there’s no good answer to that.


Another feature we need to make this declarative notation ability robust is inline variable declaration and assignment.  Notice in the last example how several intermediary structures have variable names defined for them ahead of time, outside the expression. Writing that Future code, I felt it was unfortunate these variables couldn’t be defined inline as part of the expression.  Doing so would allow us to define any kind of structure we might see in XML or JSON, such as this XAML UI code.

 

new Canvas -> LayoutRoot

{

Height = Auto,

Width = Auto,

 

new StackPanel -> sp

{

Orientation = Vertical,

Height = 150,

Width = Auto,

 

Canvas.Top = 10,

Canvas.Left = 20,

 

with Canvas

{

Top = 10,

Left = 20,

},

 

with Canvas { Top = 10, Left = 20 },

 

Loaded += (sender, e)

{

Debug.WriteLine("StackPanel sp.Loaded running");

sp.ResizeTo(0.5 seconds, Auto, 200).Begin();

},

 

LayoutUpdated += HandleLayoutUpdated,

 

new TextBlock

{

FontSize = 18,

Text = "Title"

},

new TextBlock { Text = "Paragraph 1" },

new TextBlock { Text = "Paragraph 2" },

new TextBlock(Text = "Paragraph 3")

}

};

 

A few notes are needed here:

 

·   Wow, this looks a lot like XAML, but much friendlier to developers who have to actually read and edit it!  Yes, good observation.

·  Unlike XAML, every identifier here works with the all-important Rename refactoring, go to definition, find all references to, etc. This is great for reducing the amount of work to find relationships among things and manually update related files.

·  Also unlike XAML, code for event handlers can be defined here. I’m not saying you should cram all of your event handler logic here, but it could come in quite handy at times and I can’t see any reason to disable it. 

·  The with token is a custom operator (see Part 7) that provides access to attached properties through an initializer body. Custom extensions allow you to access these properties with a natural member-access style.

·  It hasn’t been possible to use generic classes in XAML. Specifying UI in Archetype, this would be trivial, and I suspect they could be used to good effect in many ways. Of course, in doing this you’d lose support for the designers in VS and Blend, which would be awfui.

·  Auto is simply an alias for double.NaN.

·  The -> custom operator in these expressions defines a variable and sets it to the value of the new object. The order of execution is:

1. Evaluate constructor parameters, if any are supplied.

2. Assign the object to the variable defined with ->, if supplied.

3. Set any fields or properties with assignment statements.

4. Evaluate value expressions, if supplied, and call the class’s value function with each one, if a value function has been defined.

5. Invoke any matching value function defined in class extensions.

 

By following this design, the example above can be translated into this C# code by the Archetype compiler:

 

var LayoutRoot = new Canvas()

{

Height = double.NaN,

Width = double.NaN

};

 

var sp = new StackPanel()

{

Orientation = Orientation.Vertical,

Height = 150.0,

Width = double.NaN

};

 

LayoutRoot.Children.Add(sp);

 

sp.SetValue(Canvas.TopProperty, 10.0);

sp.SetValue(Canvas.LeftProperty, 20.0);

 

sp.Loaded += (sender, e)

{

Debug.WriteLine("StackPanel sp.Loaded running");

sp.ResizeTo(0.5.seconds(), double.NaN, 200.0).Begin();

},

 

sp.LayoutUpdated += HandleLayoutUpdated;

 

sp.Children.Add(new TextBlock() { FontSize = 18, Text = "Title"});

sp.Children.Add(new TextBlock() { Text = "Paragraph 1"});

sp.Children.Add(new TextBlock() { Text = "Paragraph 2"});

sp.Children.Add(new TextBlock() { Text = "Paragraph 3"});

 

var VisualTree = LayoutRoot;

 

Compare the two approaches. The C# code is a typical example of imperative structure building, while the Archetype code is arguably as declarative as XAML, and with many advantages over XAML for developers.


Going back to the Future example, we could rewrite this in Archetype a few different ways.  I’ll present two.  In the first one, value functions are used to receive the future’s evaluation function as well as any Future objects the expression depends on.


new
Future<string>("bracket") -> result

{

() => Bracket(FutureParen),
new Future<string>("parenthesize") -> FutureParen

{

() => Parenthesize(FutureConcat),
new Future<string>("concat") -> FutureConcat

{

() => FuturePi + " < " + FutureOmega,
new Future<string>("pi") -> FuturePi

{

() => CalculatePi(10)

},

new Future<string>("omega") -> FutureOmega

{

() => CalculateOmega()

}

}

}

}


The shorter approach passes an evaluation delegate in as a parameter.


new
Future<string>(() => Bracket(FutureParen)) -> result

{

new Future<string>(() => Parenthesize(FutureConcat)) -> FutureParen

{

new Future<string>(() => FuturePi + " < " + FutureOmega) -> FutureConcat

{

new Future<string>(() => CalculatePi(10)) -> FuturePi,

new Future<string>(() => CalculateOmega()) -> FutureOmega

}

}

}

 

The name string parameter is missing from the last example.  This was only for use during debugging.  Now what we have is a very direct description of futures that are dependent on other futures in a dependency graph.

Summary

Object construction is a crucial part of an object-oriented language, and Archetype is advanced with its options for constructing arbitrary object graphs and initializing even complicated state in a single expression.  These fluent declarative syntax features are ideal for representing structures such as XAML UI, state machines, dependency graphs, and much more.

XAML is a language.  The question this work has me asking is: do we really need a separate language if our general purpose language supports highly declarative syntax? It’s a provocative question without an easy answer, but it seems clear that many DSLs could emerge within a language that so richly supports composition.

With the ability to define arbitrarily complex structures in code—from declarative object graphs to rich functional expressions—it’s hard to think of a situation that would be too difficult to model and build an API or application around.

Advertisements

2 Responses to “The Archetype Language (Part 8)”

  1. Dan,

    Nice series. The language has quite a few niceties that may have been part of C# were it to be redesigned now, with our knowledge and experience to guide it.

    I have a comment about one of your statements:

    To avoid having to qualify FirstName with the this keyword, many people prefer naming their parameters with the first character lower-cased. That’s an unfortunate compromise. When viewing at least the public members of a type, in a sense you’re creating an outward-facing API, and I think Pascal casing more naturally respects English grammar, not downplaying the signficance of the most-important first word in an identifier by lower-casing it to get around some unfortunate syntax limitation.

    This gave me pause, because I have always had a different intuitive understanding of casing as it relates to English grammar. In English, proper names are upper-cased, such as Dan and Bryan, whereas referring names (pronouns) are lower-cased, such as he, she, and it.

    In C#, proper names, such as those for classes, methods, and properties, are Pascal-cased, whereas referring names (variables) are camel-cased.

    This seems like a very concrete and semantic analogy which does not require incorporation of external visibility into casing rules.

    How would you respond to that view of English/code grammar?

    • Dan Vanderboom said

      @Bryan Watts,

      Hi Bryan. The difference in opinion among developers on issues like this are no doubt related to our variety of perspectives. As you’ve framed it, lower-casing seems appropriate, and I do often name one-word local variables lower-cased.

      However, when we’re talking about parameter names, it’s part of the header for a method, and I view that entire header as a title. Titles in English do upper-case the first letter of each word, except for certain common articles and conjunctions. We upper-case the first letter in each word of an identifier to make the named phrase easier to read, so these connecting words are usually upper-cased as well: “DateOfBirth” instead of “DateofBirth”. Because identifiers don’t allow spaces, the upper-cased letters make the words stand out and therefore make code easier to read.

      Titles in English always have the the first letter of the first word upper-cased, even if it is a connecting word like “The” or “With”. You don’t see an article titled “medicine at Home” or “buyer Beware”, and so parameters like “firstName” look equally unbalanced and awkward from an English grammar point of view.

      So I would argue that method parameters should also upper-case the first letter of the first word.

      I’m not mandating this in my language, of course. But what I am doing is designing the language in such a way that parameters can be Pascal Cased (or Title Cased) without requiring qualification by a keyword like “this”, as is the case in C#.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: