Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 2)

Posted by Dan Vanderboom on April 27, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

The Purpose of Archetype

You may wonder why I’m designing a new language.  As I explained to Vlad in the comments of my introductory article:

I don’t need it. There are plenty of perfectly usable languages out there.

That being said, I want it. I want to spend a lot less time with ceremony and more with substance. I want greater expressive power without sacrificing readability. I want to extend the language syntax and hook into the compiler at certain key points to experiment with new ideas without having to version the base compiler. I want to define traits as composable types and reserve classes for engines of instantiation when it makes sense to do so. I want common concurrency patterns to feel like first-class citizens so that behavior guarantees can be made at compile time. I also want to see if all the language ideas I’ve come up with over the years will really be as valuable as I think they will be.

I have a lot more reasons, too. They’ll be the subject of continued articles in the series.

Hello World!

I would be remiss if I didn’t include a Hello World example.

Start void ()
{
    System.Console.WriteLine("Hello world!");
}

This is a complete program.  It defines a Start function as a top-level construct (not inside of a class), with a single call to Console.WriteLine, a normal .NET method.

Because any code block can be either a single statement or a pair of curly braces containing zero or more statements, we can shorten our example to look like this:

Start void ()
    System.Console.WriteLine("Hello world!");

C# already allows this with constructs like if, while, and for.  Archetype takes it to the next level by making it universal.

Requiring the start method of a program to be hosted in a class seems like a kludge to me; and in the spirit of enabling the language to be used for more functional programming, I thought it appropriate to allow this type of functional composition without the ceremony of an enclosing class.

The startup function name will be Start by default, and a modifiable setting on the project options page will let you use a different name for the entry point function.  It will also support using a static function in a class.

Delegates

Delegate definitions in Archetype are very close to function definition syntax.  Consider this example:

import System;

Start void ()
{
    ShowInfo void (); // define a delegate

    ShowInfo(); // invoke the delegate

    ShowInfo = () => { Console.WriteLine("Name: " me.Name) };  // Name: Start
    ShowInfo += { Console.WriteLine("Type: " me.Type) };       // Type: void

    ShowInfo();
}

First, we’re introducing the import statement.  Our use of it here is identical to the using keyword in C#.  The import statement does some other interesting things, which we’ll see in a future article.

The first statement in our Start method defines a delegate called ShowInfo.  Note that the only difference between this and a function definition is its lack of a trailing code block.  Instead, a semicolon appears after the (empty) parameter list.

The next line invokes the delegate.  In C# this would throw a NullReferenceException, which I’ve always found annoying.  In Archetype, as with Visual Basic, this gets converted by the compiler into a check for null followed by an invocation if it’s not null.  I’ve gone this route because of how rare it is that I actually want to throw an exception in these cases; in C#, I’m constantly writing the null check for delegates to avoid it, wrapping that check and the invocation in OnDoWhatever methods, and that seems wasteful.  In Archetype, if you want to throw an exception when a delegate is null, then write the code to throw one explicitly.

The following two lines point the delegate to a specific function (expressed as a lambda, similar to C#) and then add a second lambda function to the first.  The =, +=, and -= operators work as expected with delegates.

Notice that the first lambda function supplies an empty parameter list, but the second one omits it.  The parameter list can be omitted when there are no parameters, or when you don’t need to reference the arguments that are passed in.
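Since Archetype is still under development, here is a rough Python sketch of the behavior described above: a delegate that is a no-op when nothing is attached, and that supports assignment, += and -=. The Delegate class and its assign method are hypothetical names invented for this illustration, not part of Archetype or .NET.

```python
class Delegate:
    """A multicast delegate that is safe to invoke while empty."""

    def __init__(self):
        self._handlers = []

    def __iadd__(self, handler):
        # Mirrors `ShowInfo += { ... }`: append another handler.
        self._handlers.append(handler)
        return self

    def __isub__(self, handler):
        # Mirrors `ShowInfo -= ...`: remove a handler if it is attached.
        if handler in self._handlers:
            self._handlers.remove(handler)
        return self

    def assign(self, handler):
        # Mirrors `ShowInfo = ...`: replace all handlers with one.
        self._handlers = [handler]

    def __call__(self, *args):
        # An empty delegate does nothing instead of raising an error.
        for handler in list(self._handlers):
            handler(*args)


calls = []
show_info = Delegate()
show_info()                                    # nothing attached: no exception
show_info.assign(lambda: calls.append("Name: Start"))
show_info += lambda: calls.append("Type: void")
show_info()                                    # runs both handlers in order
print(calls)
```

The null check lives in one place (the call operator), which is the same trade the compiler makes for you in Archetype: callers never write the guard themselves.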

It’s possible to be even more terse if we have a single non-assignment statement we’re assigning to our delegate variable:

ShowInfo = Console.WriteLine("Name: " me.Name);

Assignment statements cause a problem with parsing because of the right-to-left interpretation of assignments.  Consider these statements:

ShowInfo = Console.WriteLine("Name: " me.Name);

ShowInfo = Age = 1;

The first line is a valid delegate assignment.  The intention of the second is: “each time ShowInfo is invoked, set Age to 1.”  However, the parser reads this as “set Age to 1, then set ShowInfo to Age,” which is not what we want.  As a result, a delegate consisting of a single assignment statement must be surrounded by curly braces in Archetype.
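To make the grouping concrete, here is a toy Python sketch of how a right-associative parser folds a chain of assignments. The parse_assignment function is a made-up helper for illustration only; it splits on every = and is nothing like a real Archetype parser.

```python
def parse_assignment(source):
    """Group a chain of `=` the way a right-associative parser would."""
    parts = [part.strip() for part in source.rstrip(";").split("=")]
    tree = parts[-1]
    # Fold from the right: a = b = c becomes ("=", a, ("=", b, c)).
    for target in reversed(parts[:-1]):
        tree = ("=", target, tree)
    return tree


tree = parse_assignment("ShowInfo = Age = 1;")
print(tree)
```

The innermost node is the assignment to Age, so it is evaluated first and its result is what ShowInfo receives. Wrapping the statement in curly braces gives the parser an unambiguous block to treat as the delegate body instead.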

Finally in our Start method above, the last line invokes the delegate again, which in turn calls both lambda functions.

Additional Notes:

  • The me keyword refers to the current function, Start.  As the comments suggest, me.Name returns “Start” and me.Type returns a System.Void Type object.  Calling me() as a function would call Start recursively.
  • String concatenation doesn’t use the + operator.  Multiple strings separated by spaces are concatenated automatically.  In the case of Console.WriteLine above, where a string literal ("Type: ") is followed by a non-string value (me.Type), the non-string value is converted to a string with ToString.  This can occur because the non-string value is listed where a string is expected.
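The concatenation rule in the second note can be approximated in Python with a small helper. This is only a sketch of the semantics: concat is a hypothetical function, and str() stands in for .NET’s ToString.

```python
def concat(*parts):
    """Join adjacent values, converting non-strings with str()."""
    return "".join(part if isinstance(part, str) else str(part) for part in parts)


message = concat("Josa", " is ", 4)
print(message)
```

Python itself joins adjacent string literals ("a" "b"), but only literals; Archetype’s rule as described here extends the idea to arbitrary expressions.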

Delegate Parameters

Defining parameters for delegates is easy. If an anonymous function won’t use any of the parameters passed in, it can omit the parentheses entirely. Individual parameter names can be replaced with an underscore character. Otherwise, argument names are supplied as usual. All three variations can be seen in the following example:

Start void ()
{
    ShowInfo void (Info string, Priority int);

    ShowInfo = (info, priority) { Console.WriteLine("Name: " me.Name) };
    ShowInfo += (info, _) { Console.WriteLine("Type: " me.Type.Name) };
    ShowInfo += { Console.WriteLine("Info: " info) };

    ShowInfo("Fake Info", 10);
}
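One way to picture the omitted parameter list is as a wrapper that throws the arguments away. The Python sketch below assumes that interpretation; adapt is a hypothetical helper, and Archetype would presumably do the equivalent at compile time rather than by inspecting signatures at run time.

```python
import inspect


def adapt(handler):
    """Wrap a zero-parameter handler so it can be called with any arguments."""
    if len(inspect.signature(handler).parameters) == 0:
        return lambda *args: handler()
    return handler


log = []
handlers = [
    adapt(lambda info, priority: log.append(f"{priority}: {info}")),  # names both
    adapt(lambda info, _: log.append(info.upper())),                  # ignores one
    adapt(lambda: log.append("called")),                              # ignores all
]
for handler in handlers:
    handler("Fake Info", 10)
print(log)
```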

Named (Non-Anonymous) Delegate Types

So far we’ve only seen anonymous delegate types.  The ShowInfo delegate above has a type, but we can’t refer to it by name, and so we can’t share that type with other code.  This is fine in many cases.  In fact, many times I’m annoyed by the need to go to another file to add a delegate that will never be used elsewhere.  But there’s also occasionally a need to expose that type, especially for use by a library or framework consumer.

The following code defines a delegate type, a function that uses that delegate type for its parameter, and a call to the function. That call contains a lambda expression that creates an anonymous function and a delegate object pointing to it, and passes that to the ShowInfo function.

type ShowPersonDelegate void (Name string, Age int);

ShowInfo void (ShowPerson ShowPersonDelegate)
{
    ShowPerson("Josa", 4);
    ShowPerson("Ava", 1);
}

Start void ()
{
    ShowInfo((name, age) { Console.WriteLine(name " is " age) });
}

Another way to think about this is that the type keyword simply defines a name (ShowPersonDelegate), which then points to an otherwise-anonymous delegate type: void(Name string, Age int).

The final statement calls ShowInfo, passing in a lambda function, which has the same syntax as C#.

For the sake of comparison, here is the same program using an anonymous delegate.  Note that the delegate’s parameter names are optional.

ShowInfo void (ShowPerson void(string,int))
{
    ShowPerson("Josa", 4);
    ShowPerson("Ava", 1);
}

Start void ()
{
    ShowInfo((name, age) { Console.WriteLine(name " is " age) });
}
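The named and anonymous forms have a loose analogue in Python type hints, sketched below. The function names show_info_named and show_info_inline are invented for the comparison, and note that Python does not enforce these hints at run time, so this shows only the naming idea, not the type checking.

```python
from typing import Callable

# Named type: an alias that other code can refer to.
ShowPersonDelegate = Callable[[str, int], None]


def show_info_named(show_person: ShowPersonDelegate) -> None:
    show_person("Josa", 4)
    show_person("Ava", 1)


# Anonymous type: the same shape spelled inline at the point of use.
def show_info_inline(show_person: Callable[[str, int], None]) -> None:
    show_person("Josa", 4)
    show_person("Ava", 1)


lines = []
show_info_named(lambda name, age: lines.append(f"{name} is {age}"))
show_info_inline(lambda name, age: lines.append(f"{name} is {age}"))
print(lines)
```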

As in C#, a delegate object will be created automatically if a method name is provided where a matching delegate is requested:

SendText void (Name string, Age int)
{
    Console.WriteLine(Name " is " Age);
}

ShowInfo void (ShowPerson void(string,int))
{
    ShowPerson("Josa", 4);
    ShowPerson("Ava", 1);
}

Start void ()
{
    ShowInfo(SendText);
}

Delegate Duck Typing

To simplify interoperating between named and anonymous delegates, a form of compile-time duck typing is used.  Consider the following code:

Predicate1 Func<int, bool>;
Predicate2 bool(int);

Predicate1 = p => p % 2 == 0;
Predicate2 = Predicate1;

Predicate1 and Predicate2 are technically two different delegate types.  The anonymous bool(int) delegate will be named something like __anon_bool_int by the compiler.  However, the last two lines are valid because bool(int) and Func<int, bool> are structurally equivalent.  The final assignment is effectively transformed by the compiler into:

Predicate2 = p => Predicate1(p);
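Here is a hedged Python sketch of that transformation. The classes NominalDelegate, FuncIntBool and AnonBoolInt are stand-ins invented for this example, and the signature comparison happens at run time here, whereas the article describes a compile-time check.

```python
class NominalDelegate:
    """Base for delegate types that are distinct by name."""
    signature = ()          # (parameter types..., return type)

    def __init__(self, target):
        self._target = target

    def __call__(self, *args):
        return self._target(*args)

    @classmethod
    def convert(cls, other):
        # Allowed only when the shapes match; the result wraps the source.
        if other.signature != cls.signature:
            raise TypeError("delegate signatures differ")
        return cls(lambda *args: other(*args))


class FuncIntBool(NominalDelegate):      # stands in for Func<int, bool>
    signature = (int, bool)


class AnonBoolInt(NominalDelegate):      # stands in for bool(int)
    signature = (int, bool)


predicate1 = FuncIntBool(lambda p: p % 2 == 0)
predicate2 = AnonBoolInt.convert(predicate1)   # Predicate2 = Predicate1
print(type(predicate2).__name__, predicate2(4), predicate2(7))
```

The two objects keep their distinct nominal types; only the call is forwarded, which is why no hard-coded mapping between specific delegate types is needed.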

Bindable Properties

The bindable keyword can be used to enable data binding support for user interface controls.  It’s always bothered me how much boilerplate code must be written in .NET languages for bindable properties.  In C#, this is typical:

private int _Age;
public int Age
{
    get { return _Age; }
    set
    {
        _Age = value;
        PropertyChanged("Age", value);
    }
}

That’s ten lines of code for a simple integer property!  And this is a simple scenario.  Compare that to Archetype’s bindable property:

Age bindable int;

Much better!  With this, we can define many bindable properties in a small space. This is expanded by the compiler into something like this:

_Age field int;
Age int
{
    get me.Value;
    set PropertyChanged(me.Name, me.Value = value);
}
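For readers who want to see the same expansion in a language they can run today, a Python descriptor gets close. This is a sketch under assumptions: Bindable, Person and property_changed are hypothetical names, and the notification hook just records changes instead of talking to a UI framework.

```python
class Bindable:
    """Descriptor that stores a value and reports every assignment."""

    def __set_name__(self, owner, name):
        self.name = name                 # plays the role of me.Name
        self.storage = "_" + name        # backing field, like _Age

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage, None)

    def __set__(self, instance, value):
        setattr(instance, self.storage, value)
        instance.property_changed(self.name, value)


class Person:
    age = Bindable()                     # compare: Age bindable int;

    def __init__(self):
        self.changes = []

    def property_changed(self, name, value):
        # Hypothetical notification hook; a UI layer would subscribe here.
        self.changes.append((name, value))


person = Person()
person.age = 4
person.age = 5
print(person.age, person.changes)
```

As in the Archetype version, the property name is captured once by the language machinery rather than repeated as a string literal in every setter.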

After being warned about the potential dangers of INotifyPropertyChanged by Michael in the comments of the previous article, I am exploring alternative implementations.  Regardless of how it’s implemented (see Part 7 for more details), bindable will be a powerful addition for Archetype developers.

Composite Bindings

Occasionally I need a property which is composed of two or more other properties, and I want to ensure that the proper data binding machinery is notified whenever each constituent property is updated.  In C#, I would need to make multiple PropertyChanged calls in each of the individual properties to signal that the composite binding is changing as well.  In Archetype, we can use the composite keyword within the composite property itself.  Syntactically this is a pull model whereas otherwise we’d be forced to implement a push model.  The Archetype syntax looks like this:

FirstName bindable string;
LastName bindable string;

FullName bindable string
    get composite FirstName " " LastName;

When the compiler sees the composite keyword after get, it scans the following expression tree.  When it finds property references and those properties are marked as bindable, it makes the appropriate transformations to notify of changes.  In the underlying implementation, it is a push model, but developers writing Archetype code are spared those details.  Multiple-statement get functions are supported, and set functions are also supported when using composite.
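The scan-then-push idea can be imitated in Python by inspecting the names a getter refers to. Treat this strictly as an illustration: Bindable, Composite and notify are hypothetical, the sample names are made up, and reading co_names from the getter’s code object is a much cruder stand-in for walking a real expression tree.

```python
class Bindable:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        instance.notify(self.name)


class Composite:
    """Read-only property whose dependencies are found by scanning its getter."""

    def __init__(self, getter):
        self.getter = getter

    def __set_name__(self, owner, name):
        self.name = name
        # co_names lists the attribute names the getter's code refers to;
        # keep only those that are Bindable on the owning class.
        self.depends_on = [
            n for n in self.getter.__code__.co_names
            if isinstance(vars(owner).get(n), Bindable)
        ]

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.getter(instance)


class Person:
    first_name = Bindable()
    last_name = Bindable()

    @Composite
    def full_name(self):
        return f"{self.first_name} {self.last_name}"

    def __init__(self):
        self.changes = []

    def notify(self, name):
        self.changes.append(name)
        # Push the change on to any composite that reads this property.
        for attr in vars(type(self)).values():
            if isinstance(attr, Composite) and name in attr.depends_on:
                self.changes.append(attr.name)


person = Person()
person.first_name = "Ada"
person.last_name = "Lovelace"
print(person.full_name, person.changes)
```

The getter reads like a pull model, yet every write to a constituent property also announces a change to full_name, which is the push model hiding underneath.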

Bindable Collection Properties

Binding to collections is a little different from binding to single properties.  In WPF, Silverlight, and now more broadly in .NET 4.0, types such as ObservableCollection provide several notifications to user interface controls.

Archetype provides special binding expressions to specify common scenarios such as “bind x to the selected item of this collection”.

Here is an example of a collection and its current single selection, bound together in the view model:

Alternatives ObservableCollection<Alternative>;

SelectedAlternative bindable Alternative
    bind to Alternatives.SelectedItem;

The following example demonstrates binding a collection to another collection with a discriminating expression (subselection):

Options ObservableCollection<Option>;

SelectedOptions ObservableCollection<Option>
    bind to Options.SelectedItems
    where item.OptionName.StartsWith("L");

SelectedOptions is an ObservableCollection so that its subselection of contents can itself be bound to a user interface control.  The bind to expression sets the binding source, and the where expression specifies a predicate (a function taking “item”, in this case an Option object, as a parameter, and returning a bool) to include only the objects we want.  The where expression is optional.
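A minimal Python sketch of the bind to ... where behavior might look like the following. Everything here is an assumption made for illustration: ObservableList and bind_to_selected are invented names, plain strings stand in for Option objects, and the bound target is an ordinary list rather than something a UI control could observe in turn.

```python
class ObservableList:
    """Minimal list that tracks a selection and reports changes."""

    def __init__(self, items=()):
        self.items = list(items)
        self.selected = []
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def select(self, item):
        if item in self.items and item not in self.selected:
            self.selected.append(item)
            self._changed()

    def deselect(self, item):
        if item in self.selected:
            self.selected.remove(item)
            self._changed()

    def _changed(self):
        for listener in self._listeners:
            listener()


def bind_to_selected(source, where=lambda item: True):
    """Return a list kept equal to source's selection, filtered by `where`."""
    target = []

    def refresh():
        target[:] = [item for item in source.selected if where(item)]

    source.subscribe(refresh)
    refresh()
    return target


options = ObservableList(["Large", "Light", "Medium"])
selected_options = bind_to_selected(options, where=lambda item: item.startswith("L"))
options.select("Large")
options.select("Medium")
options.select("Light")
options.deselect("Large")
print(selected_options)
```

The where predicate defaults to accepting everything, matching the note that the where expression is optional.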

You may notice that SelectedItem and SelectedItems are not valid properties of ObservableCollection.  This is because they are extension properties.  Archetype supports extension methods just like C#, but it goes further to provide extension properties, indexers, constructors, and operators.  I’ll discuss type extensions in a future article.

What this doesn’t address is the possibility of binding a collection to more than one user interface control, and allowing independent selection in each.  Because of this, the specifics of binding expressions in Archetype will very likely change before being finalized, but this should give you a taste of the possibilities of language-aware binding.

Next Steps

In the next article, we’ll take a closer look at the import keyword and its special abilities, exception handling, local variable definition, and control flow structures such as if, loop, while, and until.  I’ll also introduce one of my favorite Archetype features, the async construct which is used to intuitively call delegates asynchronously.

[Part 3 of this series can be found here.]


8 Responses to “The Archetype Language (Part 2)”

  1. Boris Letocha said

    If string could mean System.String, why can’t bool(int) automatically be Func(int, bool)? I think it could be useful.

    • Dan Vanderboom said

      There will be potentially many delegates that match bool(int), but are considered distinct types by the CLR. The problem is in having a general, universal algorithm for matching anonymous delegates with named delegates. If you hard-code specific anonymous delegates like bool(int) to specific named delegates such as Func(int, bool), you’d be preventing that code from working with other delegates whose type parameters match. The solution I came up with, described in Delegate Duck Typing, allows the interoperation of delegate types that is desirable without closing any doors or making other scenarios difficult.

  2. You’ve definitely done more with bindable properties than I expected. I haven’t seen the application of static analysis to this problem before. And the linq-like syntax to filter bound collections is much simpler than the typical ICollectionView solution.

    Static dependency analysis will help Archetype solve a lot of problems, but it will eventually hit a wall. Sometimes the thing that you depend upon is in another class. If that implementation is hidden behind an interface, static analysis won’t find it.

    This is one place where run-time analysis actually works better. Rather than walking the expression tree, just let the code run. Keep track of everything that it touches. Those are the things that the property depends upon.

    Using run-time analysis, you could trace dependencies from the View Model (where your bindable properties are) into the Data Model. Interfaces, polymorphism, and complex logic would not be a problem.

    • Dan Vanderboom said

      The disadvantage to the LINQ-like binding expression is that, in its current form, it doesn’t provide an identifier to dynamically modify the filter, sort, or other properties. It’s fine for simple scenarios, but ICollectionView is good for dynamic scenarios. I’m definitely going to revisit and revise this, and see if I can come up with something that provides the best of both approaches.

  3. Names first, right?

    Why then “delegate ShowPersonDelegate”?

    • Dan Vanderboom said

      @lars,

      Good point. Names should be first. In fact, there’s no reason to include a “delegate” keyword at all, since the type is specified with parameters, which designates a function or delegate. The difference between delegate and function is determined by the absence or presence of a following code block. In fact, in the rest of the examples I gave, I didn’t use “delegate”. I’m so used to writing C#, that just snuck in there!

      Thanks for pointing this out!

  4. I love how Closures/Delegates are handled in Groovy.

    As you do here, the parameter list is omittable. Default param is “it”. Scala uses “_” for the first, and “__” for the second, and so on… A closure in groovy is defined between { and }

    def out = {println it}

    If you want to define named parameters, you do like this:

    def out = {text -> println text}

    Another thing is that parentheses are omittable when you call functions. And as an exception, if the last parameter is a closure, you can put it outside of the parameters.

    This enables for really cool constructs:

    def printif = {text, Closure predicate -> if (predicate()) println text}

    Can then be called like this:

    printif ("Hello World"){
        it.startsWith("H")
    }

    So my suggestion to consider:

    If a Lambda in Archetype has parameters, place the curlys around the whole thing. Your example:
    ShowInfo((name,age) => Console.WriteLine(name + " is " + age));

    Would become:
    ShowInfo({name,age => Console.WriteLine(name + " is " + age)});

    Which for my eyes is more readable. And if you then add optional parentheses in method-calls, it becomes:
    ShowInfo {name,age => Console.WriteLine(name + " is " + age)}

    What do you think?

    • Dan Vanderboom said

      @lars

      My fifth article is about custom control structures, which does almost exactly what you’re talking about here regarding closures. I first came up with the idea while writing this article. I’ve been told that Ruby does something similar as well.

      I like the curlies instead of yet another layer of parentheses for single-statement functions, but what about scenarios where multiple statements are needed, as in:

      ShowInfo(
          (name, age) =>
          {
              Console.WriteLine(name " is " age);
              Console.WriteLine("hooray!");
          });

      ShowInfo(
      {
          name, age =>
          Console.WriteLine(name " is " age);
          Console.WriteLine("hooray!");
      });

      In this case, I still lean toward the first option. The parameters don’t feel like they belong within a { } block of statements.

      Your printif example is very interesting. Here’s how I would approach it in Archetype (taking a few steps to get there):

      printif : void(Text:string, Predicate:bool(string) closure)
      {
          if (Predicate())
              Console.WriteLine(Text);
      }

      printif("hello there") { it.Length > 5 };

      I agree: this is very cool. But the syntax could be better. What if Archetype introduced an “after” clause that could introduce a quasi-keyword in the context of this method call?

      print : void(Text:string, Predicate:bool(string) closure after "if")
      {
          if (Predicate())
              Console.WriteLine(Text);
      }

      print("hello there") if { it.Length > 5 };

      Now it’s a little more self-documenting and the intention of the closure’s contents are clear. But it’s still not perfect, IMO. Curly braces are great for multiple-statement blocks, but in scenarios where you only need a single expression, what would be perfect is something like this:

      print : void(Text:string, Predicate:bool(string) expression after "if")
      {
          if (Predicate())
              Console.WriteLine(Text);
      }

      print("hello there") if (it.Length > 5);

      This is ultimately where I’d like to go: controlled points where the language can be extended. Intellisense could be fed to know about this “if” ‘keyword’ to make writing that code easier.

      I don’t think any one of the previous three examples is a silver bullet. Instead, I think they each have a place for different circumstances.
