Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 4)

Posted by Dan Vanderboom on May 8, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Conditional Selection

The if statement has been a classic across so many languages.  In Archetype it is almost identical to C# syntax.

if (expression)

statement;

 

if (expression)

{

// expression is true

}

else

{

// expression is false

}

When conditions become complicated, reversing all of the boolean logic can be tricky.  A common way of reversing it is to surround the expression in parentheses and place a unary not operator before it.  With the required parentheses around the if statement’s condition, it looks like this:

 

if (!(expression))

statement;

In Archetype, the exclamation point can be placed before the parentheses.  It is the only part of the condition that can appear outside the parentheses.

if !(expression)

statement;

Pattern Matching

C-style languages have supported a language construct, switch-case, for providing access to simple jump tables combined with a syntax that is better suited than if for matching many conditions.  This has been unfortunately limited to matching against value types (and string in C#) and against constant values at that.  It’s unfortunate because the more concise syntax for multiple matching values is good in itself, not only when the matching values are constant values.  This constraint is due to the way those compilers build jump tables; it’s a performance optimization technique designed during a time when 8 MHz processors were considered fast.

Pattern matching is one area where functional languages have been very strong.  Archetype has a match keyword that serves the purpose.

match (text) "BEGIN" -> HandleBegin();

This first example matches a simple string to a constant value, and calls HandleBegin if there is a match.  You could write this using an if statement as well.  Here is the equivalent code:

if (text == "BEGIN") HandleBegin();

This next example illustrates several ideas.  There are multiple conditions and some of the conditions are grouped together (with the or operator, |) to share the same reaction code.  The condition or conditions are listed first, followed by the -> operator, and a statement or code block of statements on the right specifies the reaction code.  Also note the numeric range 6..10 and the use of non-constant values (such as x).  Any valid expression is allowed here as long as its type matches (or can be implicitly cast from) the type of the term being evaluated (number).  It’s also worth mentioning that the | operator isn’t necessary before each set of conditions as it is in some functional languages.  (I’d rather align the left edge of code to the beginning of each condition.)

var x = 1;

var number = 4;

 

match (number)

{

x -> number += 3;

3 | 5 -> number++;

2 | 4 | 6..10 ->

{

number--;

Log("Numbers are getting too big");

}

}
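For readers who want to experiment with these semantics today, here is a rough Python approximation of the match block above. It is only a sketch under two assumptions that the article does not state outright: that branches are tested from top to bottom, and that the range 6..10 includes both endpoints. The function name react is made up for the illustration.

```python
def react(number: int, x: int) -> int:
    """Approximate the Archetype match: the first matching branch wins."""
    if number == x:                            # x -> number += 3;
        return number + 3
    if number in (3, 5):                       # 3 | 5 -> number++;
        return number + 1
    if number in (2, 4) or 6 <= number <= 10:  # 2 | 4 | 6..10 (assumed inclusive)
        print("Numbers are getting too big")
        return number - 1
    return number                              # no branch matched, value unchanged


result = react(4, x=1)  # 4 is not x and not 3 or 5, so the third branch runs
```

Because each branch returns immediately, nothing falls through to the next one, which is the behavior the match construct is meant to guarantee.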

Unlike switch, Archetype’s match doesn’t automatically fall through from one match to the next.  This feature is rarely used with switch and is a significant source of programming defects.  For the few scenarios where you’d like to execute every branch that matches, I’m considering a match all construct which would look like this:

match all (text)

{

"BEGIN" -> LetsBegin();

(Letter | Digit)* -> AddIdentifier(text);

}

In this example, both LetsBegin and AddIdentifier would be called.
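The difference between match and match all can be sketched in Python as a list of (condition, reaction) pairs where every pair whose condition holds is executed. The helper match_all and the recorded call names are hypothetical, and the Letter | Digit pattern is approximated with a standard character class.

```python
import re


def match_all(value, branches):
    """Run the reaction of every branch whose condition accepts value."""
    for condition, reaction in branches:
        if condition(value):
            reaction(value)


calls = []  # records which reactions ran, in order

branches = [
    # "BEGIN" -> LetsBegin();
    (lambda t: t == "BEGIN", lambda t: calls.append("LetsBegin")),
    # (Letter | Digit)* -> AddIdentifier(text);
    (lambda t: re.fullmatch(r"[A-Za-z0-9]*", t) is not None,
     lambda t: calls.append(f"AddIdentifier({t})")),
]

match_all("BEGIN", branches)
```

After the call, calls holds both entries, since "BEGIN" satisfies the literal comparison and the identifier pattern.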

Regular Expression Literals

Archetype also supports regular expression literals based on the syntax from Microsoft’s M language as a result of list syntax and operator overloads defined as a library.  In many scenarios, Archetype can determine the difference between string and regular expression literals.  However, in simple cases such as matching against a single character or simple string of characters, the variable defined will require the specification of the regex type.

var BeginToken regex = "BEGIN";

Without this qualifier, BeginToken would look like a string to the compiler.  To ease this problem, Archetype will convert a string to a regex object if the string participates in a regex-typed expression.  Here’s an example:

var BeginToken = "BEGIN";

var MyRegEx = BeginToken | ("A".."Z")*;

The range of letters and the * Kleene operator (which means repeat 0 or more times) identifies MyRegEx as a regex identifier.
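To give a feel for how regular expressions can fall out of operator overloads in a library, here is a small Python sketch. The Rx class and its lit, span, and star helpers are invented for this illustration and are not Archetype's actual design; Python has no postfix * or .. operator, so methods stand in for them. The part worth noticing is __ror__, which promotes a plain string to a regex when it appears in a regex-typed expression.

```python
import re


class Rx:
    """A tiny regex builder; a hypothetical stand-in for a regex type."""

    def __init__(self, pattern: str):
        self.pattern = pattern

    @staticmethod
    def lit(text: str) -> "Rx":
        return Rx(re.escape(text))          # literal text, escaped

    @staticmethod
    def span(lo: str, hi: str) -> "Rx":
        return Rx(f"[{re.escape(lo)}-{re.escape(hi)}]")  # like "A".."Z"

    @staticmethod
    def _lift(other) -> "Rx":
        return other if isinstance(other, Rx) else Rx.lit(other)

    def __or__(self, other) -> "Rx":        # alternation
        return Rx(f"(?:{self.pattern}|{Rx._lift(other).pattern})")

    def __ror__(self, other) -> "Rx":       # plain string on the left side
        return Rx._lift(other) | self

    def star(self) -> "Rx":                 # Kleene star: zero or more
        return Rx(f"(?:{self.pattern})*")

    def matches(self, text: str) -> bool:
        return re.fullmatch(self.pattern, text) is not None


begin_token = "BEGIN"                                # still just a string
my_regex = begin_token | Rx.span("A", "Z").star()    # string is promoted here
```

Here begin_token stays an ordinary string until it takes part in the | expression, at which point the result is an Rx, which mirrors the conversion rule described above.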

Let’s take a look at how regular expressions and regular expression literals work with the match construct.  First we see the literal embedded directly in the match statement, as the only value to match against.

match (text) ("A".."Z" | "a".."z")* -> DoSomethingUseful();

Next, we’ll look at how we can define the regex objects and use their identifiers.  In this way, we can build up libraries of interdependent regular expressions and go even so far as to write sophisticated parsers.  This is an important tool for fulfilling the goal of language-oriented development.

 

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* -> AddIdentifier(text);

}

Finally, you can apply a when clause as a pattern matching guard similar to F#.

var Letter = "A".."Z" | "a".."z";

var Digit = "0".."9";

 

match (text)

{

"BEGIN" ->

{

MarkBeginning();

NewTransaction();

}

 

(Letter | Digit)* when (text.Length < 15) -> AddIdentifier(text);

}
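A guard simply adds a second test that must also pass before the branch is taken. A minimal Python rendering of the guarded match above might look like this; the handle function and the log list are illustrative stand-ins for the real reaction code.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z0-9]*")  # approximates (Letter | Digit)*


def handle(text: str, log: list) -> list:
    """Append the names of the reactions that would run for text."""
    if text == "BEGIN":
        log.append("MarkBeginning")
        log.append("NewTransaction")
    elif IDENTIFIER.fullmatch(text) and len(text) < 15:   # pattern plus guard
        log.append(f"AddIdentifier({text})")
    return log


short_log = handle("abc123", [])
long_log = handle("a" * 15, [])   # matches the pattern but fails the guard
```

A 15-character identifier matches the pattern yet is rejected by the guard, so nothing is logged for it.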

This isn’t the last word on pattern matching or regular expressions in Archetype.  This is one area I expect to evolve and grow, and to appear in future articles.

Agents, Classes, and Traits

Archetype is a multi-paradigm language, as most commonly-used languages are today.  While it has many features which are functional, it’s heavily influenced by object-oriented design ideas.  Most object-oriented languages are largely imperative rather than functional.  That is to say, it is “programming by side-effects” rather than the goal in functional programming of “no side-effects” (or as few as possible).

Functional programming has grown in popularity relatively recently, considering it’s been around from the beginning of high-level language design.  However, it suffers in some areas such as representing stateful behavior in user interfaces.  Some clever solutions have been devised (such as the use of monads to trick or fake the logic into representing state in a purely functional way), but the theory and application of these patterns are far from intuitive.  I believe this is largely the reason why functional programming languages have been the niche speciality of scientists and mathematicians and not your everyday developer.

Because of these contentious forces, Archetype is aimed at being a transitional language: urging us forward in the use of functional patterns, but without abandoning the imperative style of “programming by side-effects”, and striving to look familiar to programmers of imperative languages such as C#, Visual Basic, etc.

Software Agents and the Actor Model

There are some built-in Archetype constructs that will help to make object-oriented programming safer.  While it doesn’t propose, like Axum (previously code-named Maestro), to prevent any logic that is unsafe in a concurrent execution environment, it does provide some simple but powerful tools that can be used to reduce the risk considerably.

I’m referring primarily to agents which support the Software Agent or Actor Model of parallel program design.  Agents are special classes that run independently of (in parallel with) each other, and can only communicate with other agents through messages (introduced in part 3).  Specifically, agents are not allowed to call the methods or subscribe to the delegate members of other agents.  Since each agent runs without the ability to receive execution control from or give it to other agents, there is a much smaller chance of coordination problems while executing concurrently.

In every other way, however, agents are defined and composed just like classes.  First, we’ll take a look at a simple Customer class.

Customer object, IDisposable

{

FirstName string;

LastName string;

 

this(FirstName, LastName)

set all;

 

Dispose()

{

// clean up

}

}

The class name appears first, followed by a required base type (object here), and a list of interfaces separated by commas.

Archetype supports single-class inheritance with the ability to implement any number of interfaces (or traits, more about that later in this article), as well as any generic type parameters and generic type constraints that you’re used to.

We see some new things here, however.  The instance constructor is called this, and the set keyword is used to set the values of class members where they match constructor parameters of the same name.  This is the same as writing these lines:

this.FirstName = FirstName;

this.LastName = LastName;

With many parameters in a constructor (or another method), this can save many lines of typing.  If you only want to store some of the parameters in properties, you can use a comma separated list: set FirstName, LastName.  If your intent is to set all parameters, you can use the abbreviated set all instead, as shown in the example.  When set all is used, specifying parameter types is optional.
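As a point of comparison from another language, Python's dataclasses take a similar shortcut from the opposite direction: the constructor and its member assignments are generated from the field declarations. This is only an analogy for set all, not a description of how Archetype implements it.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # The generated __init__ takes first_name and last_name
    # and assigns each one to the attribute of the same name.
    first_name: str
    last_name: str


c = Customer("Dan", "Vanderboom")
```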

The following code provides an example of two agents that cooperate with each other.

WebDataAgent agent

{

Subscriptions Dictionary<string, List<guid>>;

 

this()

{

// initialize the agent

Subscriptions = Dictionary<string, List<guid>>();

}

 

Subscribe in message(Topic string)

{

if !(Topic in Subscriptions.Keys)

Subscriptions.Add(Topic, new List<guid>);

 

Subscriptions[Topic].Add(me.Client);

 

// confirm the subscription

SubscriptionConfirmed(me.Message);

}

 

SubscriptionConfirmed out message(RequestID guid);

 

PublishMessage in message(Topic string, Message string)

{

loop (var sub in Subscriptions[Topic])

{

MessagePublished(Topic, Message);

}

}

 

MessagePublished out message(Topic string, Message string);

}

UserInterfaceAgent agent

{

CurrentView IView;

 

this(StartView IView)

CurrentView = StartView;

 

RequestData out message(RequestID guid, Method string);

 

DataReceived in message(RequestID guid, Result List<double>)

{

// handle incoming message…

 

// unhook this message handler

DataReceived -= me;

}

}
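The essence of the model (each agent owns a mailbox and a thread of control, and the only way in is a message) can be sketched in Python with the standard queue and threading modules. Everything here is an assumption made for illustration: the Agent base class, the on_ naming convention for handlers, and the simplified WebData and Ui agents. It is not how Archetype agents are implemented.

```python
import queue
import threading


class Agent:
    """Hypothetical base: one mailbox, one thread, no direct method calls."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, name, *args):
        self.mailbox.put((name, args))       # the only way to reach an agent

    def stop(self):
        self.mailbox.put(None)               # sentinel: finish queued work, then exit
        self._thread.join()

    def _run(self):
        while True:
            item = self.mailbox.get()
            if item is None:
                break
            name, args = item
            getattr(self, "on_" + name)(*args)   # dispatch to the "in" handler


class WebData(Agent):
    def __init__(self):
        super().__init__()
        self.subscriptions = {}              # topic -> list of subscriber agents

    def on_subscribe(self, topic, client):
        self.subscriptions.setdefault(topic, []).append(client)
        client.send("subscription_confirmed", topic)

    def on_publish(self, topic, message):
        for client in self.subscriptions.get(topic, []):
            client.send("message_published", topic, message)


class Ui(Agent):
    def __init__(self):
        super().__init__()
        self.received = []

    def on_subscription_confirmed(self, topic):
        self.received.append(("confirmed", topic))

    def on_message_published(self, topic, message):
        self.received.append((topic, message))


data, ui = WebData(), Ui()
data.start()
ui.start()
data.send("subscribe", "news", ui)
data.send("publish", "news", "hello")
data.stop()   # data drains its mailbox first, so both replies are queued for ui
ui.stop()     # ui handles the queued replies, then exits
```

Each agent's state is only ever touched by its own thread, so no locks are needed around subscriptions or received; the queues are the single point of synchronization.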

I have more ideas for actor-based programming (such as a built-in RequestID: me.id), but I want to start simple and force myself to work hard to justify any overhead.

Traits

Composing classes together to obtain maximum reuse of code has been a goal of object-oriented programming for a long time, but it usually falls short of the ideal.  Languages like C++ that support multiple inheritance are unwieldy due to the additional complexity (see the Diamond Problem), and single inheritance—though sufficient in most scenarios—suffers from limitations that have bothered OOP programmers from the beginning of programming time.  Other languages have introduced constructs like Flavors and Mixins, and each has had to deal with its own peculiarities and workarounds.  While an in-depth discussion of traits and their advantages over other approaches is beyond the scope of this article, a smart group at the OGI School of Science & Engineering published a paper that illustrates the issues clearly.  In it, they explain how traits solve many of the problems while avoiding the pitfalls of other approaches.

I found this characterization to be particularly lucid (the bold emphasis is mine):

Although multiple inheritance makes it possible to reuse any desired set of classes, a class is frequently not the most appropriate element to reuse.  This is because classes play two competing roles.  A class has a primary role as a generator of instances: it must therefore be complete.  But as a unit of reuse, a class should be small.  These properties often conflict.  Furthermore, the role of classes as instance generators requires that each class have a unique place in the class hierarchy, whereas units of reuse should be applicable at arbitrary places.

– Nathanael Schärli et al., in their 2003 paper entitled “Traits: Composable Units of Behavior”

The basic idea is that a trait defines a set of functions but no state.  Multiple traits are pulled into a class, where they are “flattened”.  This means that each trait’s functions are added to the class as if those functions were defined directly in the class.  That is, you don’t need to use a member access operator (.) to navigate from the class to the trait and then to the trait’s function.  In doing this, it’s possible for function names and signatures to overlap among traits.  If the name is the same but the signature is different, they’re applied as overloads.  When there’s an actual clash, a conflict-resolution expression is defined to specify the function to use (or ignore).
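Flattening and conflict resolution can be made concrete with a short Python sketch. The compose helper, its resolve argument, and the Greeter and Logger traits are all hypothetical; Python has no overloading by signature, so only the name-clash case is modeled, and the dictionary passed as resolve merely stands in for whatever conflict-resolution syntax Archetype ends up with.

```python
def compose(name, traits, resolve=None):
    """Build a class whose methods are copied ("flattened") from the traits."""
    resolve = resolve or {}
    namespace, origin = {}, {}
    for trait in traits:
        for attr, fn in vars(trait).items():
            if attr.startswith("__") or not callable(fn):
                continue                      # skip dunders and non-functions
            if attr in namespace:             # two traits provide the same name
                if attr not in resolve:
                    raise TypeError(
                        f"{attr} is provided by both {origin[attr]} "
                        f"and {trait.__name__}"
                    )
                if resolve[attr] is not trait:
                    continue                  # keep the version chosen earlier
            namespace[attr] = fn
            origin[attr] = trait.__name__
    return type(name, (object,), namespace)


class Greeter:                                # a stateless trait
    def describe(self):
        return "greeter"

    def greet(self):
        return "hello"


class Logger:                                 # another stateless trait
    def describe(self):
        return "logger"

    def log(self):
        return "logged"


# describe clashes, so the composition must say which one wins.
Customer = compose("Customer", [Greeter, Logger], resolve={"describe": Logger})
c = Customer()
```

The resulting class calls greet, log, and describe directly, with no hop through a trait object, and composing the same two traits without a resolution raises a TypeError rather than silently picking one.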

Although the design of traits involves the lack of any state, Archetype may attempt to include trait-local state.  That is, variables that are visible to the functions of that trait, but which can’t be seen from the hosting class or any other trait.  (This corresponds to the idea of extension properties, which I’ll discuss in the next article.)

This is an experimental area of the Archetype language, one that will likely change several times before I get it right.  Here’s an example of what it will probably look like to compose classes out of traits.

Serializable<T> trait

{

provide Serialize string (obj T) { … }

provide Deserialize T (input string) { … }

}

 

Persistent<T> trait

where T : ref

{

// by function

require Serialize string ();

require Deserialize T (input string);

// or by trait

require Serializable<T>;

 

provide HasChanges bool

get, private set;

 

provide static Load T (id uid) { … }

provide Save void () { … }

}

 

Customer object, Serializable, Persistent

{

FirstName string;

LastName string;

 

FullName string

get FirstName " " LastName;

}

 

Start void ()

{

var cust = Customer.Load(123);

 

cust.FirstName = "Dan";

cust.LastName = "Vanderboom";

 

cust.Save();

}

A few notes about the code:

  • The type parameter on the trait allows you to constrain the types of object to which the trait can be applied.
  • The where T : ref is the same as where T : class in C#.
  • The require and provide keywords specify the methods that the trait requires to be present, or provides to the host class.  Archetype may also support a require of an entire trait, which would act analogously to subtyping (or rather, more like an #include).
  • Conflict resolution expressions aren’t shown because their syntax hasn’t yet been decided.
  • Code formatting applies an italic font to traits, but maintains the color of a user-defined type.  This should help to avoid confusing traits with classes.  This is only one possible solution, but it suggests the use of multiple ways of categorizing identifiers to apply a mixture of formatting and colorizing behaviors.
  • This is not a great example of the strength of traits.
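The require and provide contract can be approximated in Python by checking the host class before any functions are added to it. The apply_trait helper below is hypothetical, and the save function returns a string instead of touching real storage, so treat it as a sketch of the idea rather than of Archetype's mechanics.

```python
def apply_trait(cls, provides, requires=()):
    """Add provided functions to cls, but only if every requirement is met."""
    missing = [n for n in requires if not callable(getattr(cls, n, None))]
    if missing:
        raise TypeError(f"{cls.__name__} lacks required: {', '.join(missing)}")
    for name, fn in provides.items():
        setattr(cls, name, fn)
    return cls


class Customer:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def serialize(self):                      # satisfies the requirement
        return f"{self.first_name}|{self.last_name}"


def save(self):
    # Stand-in for persistence: report what would be written.
    return "saved:" + self.serialize()


# "Persistent" requires serialize and provides save.
apply_trait(Customer, provides={"save": save}, requires=["serialize"])
saved = Customer("Dan", "Vanderboom").save()
```

Applying the same trait to a class without serialize fails at composition time, which is the point of stating requirements up front.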

I’ve also had some design ideas for runtime mixing of traits into classes, while still accessing everything through strongly-typed variables (think of traits as interfaces), but this will require much more exploration.

Another idea is to support some aspect-oriented features.  Imagine if you could define a trait called Bindable that, when added to a class, would add the bindable type extension modifier to all of its properties.

For better examples of traits and class composition using traits, I recommend reading the above-mentioned paper.

Next Steps

In this article, I covered simple conditional statements as well as functional-style pattern matching.  We also looked at agent-based programming based on loosely-coupled messages, which provides greater safety in parallel programming scenarios, and at traits as a way to compose features on a more granular level and to solve the composition problems that plague single-inheritance languages.

My next article will cover extension of types (extension methods, properties, events, indexers, constructors, and operators), as well as the first of language extensibility options (defining new control structures).  I will probably dip out of sight for a few weeks as I get further along in building the parser and compiler, and learn about Visual Studio’s Managed Language Services.

If you don’t already follow me on twitter (@danvanderboom), I do a lot of tweeting about what I’m reading, researching, or considering during the language design process, so this is a good way to get an inside look at that process.

[Part 5 of this series can be found here.]


7 Responses to “The Archetype Language (Part 4)”

  1. I love the trait syntax. It’s reminiscent of the mixin pattern used in C++ templates, but much more explicit and constrained.

    I find the first-class treatment of regular expressions to be an interesting choice. Do you find that you use regular expressions often enough to warrant language support? I myself do not, but maybe that’s because I don’t have language support.

    • Dan Vanderboom said

      The C-style languages, which I feel compelled to honor aesthetically, have enjoyed a rich set of prefixes and suffixes to specify various numeric formats: hexadecimal 0x2C10, float 1.501f, etc. Whether regular expression literals should require a textual cue or can be recognized by their unique combination of strings and specific overloaded operators, it’s hard to argue that regular expressions don’t play an important role in logic, modeling, and reasoning. They’re well understood and powerful in many scenarios. The inclusion of them in the parser doesn’t mandate developers use them, but it’s there for language-oriented approaches to problem solving, and for powerful transformation systems in general: converting a source file to an executable, allowing powerful search capabilities of data in a database, building a scripting DSL to create characters and levels in a video game, and so on.

      Though I’m not a typical CRUD app developer, I’ve used regular expressions many times, and with my favorite tool (RegexBuddy), I’ve gotten good at it. The better I get, the more often I see opportunities to solve problems using that tool. In a lot of ways, it opens doors that would otherwise be closed. My first work with them was a static analyzer of VBScript ASP files. It would read in the whole site, organize the assets, and parse and analyze the huge VBScript website I had inherited. It gave me organized views (TreeViews) and let me see where identifiers were being used and what dependencies existed in the code, which made understanding and updating the code much more comfortable. (And a ton of major problems were uncovered in the process.)

      Archetype will actually go one step further than regular expressions, and that’s context free grammars. Why stop half way, right? This will involve the “syntax” keyword, which I haven’t talked about yet. It will play a role not only in defining grammars and parsers to match against them, but also in providing meta language constructs to make Archetype its own meta-language.

  2. Dan Vanderboom said

    Regular expressions can be very tricky to write, primarily because they’re so densely packed with information. If you work with a language that DOES have support (M, lex, yacc), the first thing you’ll notice is how nice and easy those expressions can be to write, and that makes you more likely to use them as a tool. Suddenly they’re not so esoteric or unapproachable.

  3. Rob Grainger said

    Personally, I’d avoid regular expressions, wherever possible. While regular languages are useful for certain recognition tasks, regular expressions as they stand are an abomination. They’re unreadable, unmaintainable, and error-prone for any but the simplest tasks.

    PEG integration should obviate all need for them.

  4. Rob Grainger said

    ps. You may note that M does not use the standard regular expression syntax – that, to my mind, is what made it readable.

    • Dan Vanderboom said

      M uses a much more readable form of regular expressions and I find that to be much better, and more along the lines of what I’m hoping to incorporate. I agree that normal regex syntax is an abomination. Too terse for humans.

  5. John Cowan said

    On braces:

    I think allowing braces to be omitted is a design mistake, because it leads naturally and easily to a programming mistake. People write

    if (x == y)
      z();
    

    and then change it to

    if (x == y)
      z1();
      z2();
    

    without noticing what they have done. Generally, I am all for scrapping boilerplate, but this particular piece of boilerplate is a good defense against this nasty error.

    In my own code, I make it a point to always use braces around blocks unless the block consists of a single return, throw or (in C) goto statement. Perl actually enforces the use of braces in all cases. Another advantage of requiring braces is that you no longer need parentheses in if statements (though Perl does not allow this).

    On traits with state:

    If you allow these, then you can actually get the best of both worlds by separating extends X (as Java puts it) from incorporates X. Extension becomes a matter of type only (and therefore can safely involve multiple inheritance, subsuming interfaces), whereas incorporation is what actually brings in methods. Rather than extending a class from its superclass, we incorporate the superclass as a trait of the subclass, since there is no reason to distinguish classes from traits. Indeed, we do not have to bring in all the traits from which a superclass is built: we can pick and choose just the traits we want.

    I was brought to this line of thinking by considering what you need to do to override a method in a subclass. It’s often not enough to override one method; in fact, what you need to do in the general case is to override all of the group of methods that have access to a given set of state variables. This group can be seen as a trait-with-state.

    If you go for all-out traits like this, you then need to change how constructors work: rather than a constructor calling its superclass constructor, it needs to call one constructor from each of its traits (or if not, then to the parameterless constructor, I guess).
