Archive for April, 2010

The Archetype Language (Part 3)

Posted by Dan Vanderboom on April 27, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype. Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs. A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, until), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Exception Handling

The try keyword can be used within any code block.

ProcessItem void (Item Item)

{

try

{

// throw a runtime exception

}

catch (ex Exception)

{

// handle exception

}

finally

{

// cleanup

}

}

Additionally, every function can specify its own inline catch and/or finally blocks like this:

ProcessItem void (Item Item)

{

// throw a runtime exception

}

catch (ex Exception)

{

// handle exception

}

finally

{

// cleanup

}

You’ll see this pattern appear in other constructs in Archetype (such as async blocks).

A try block can be followed by one or more catch clauses only, a finally clause only, or both. If both are included, the catch clauses must come first. This is true whether they follow an explicit try block (the first example above) or are attached directly to a function (the second example). Since curly braces are optional for single-statement code blocks, we can write this:

try Work();

catch (x Exception) Log(x);

finally Finish();

The exception type can be specified by itself (without a name), and the default variable "ex" will be used. If a catch block doesn’t specify an exception type, Exception is presumed.

try Work();

catch (ArgumentException) Log(ex);

catch (NullReferenceException) Log(ex);

catch Log(ex);

finally Finish();

The order of catch blocks must be from most derived to most base (Exception itself must always be last, if present). Incorrect ordering will result in a compiler error.

Catch and finally blocks are scoped as nested within the try block. This enables catch and finally blocks to reference identifiers defined in the try block.

try

{

var answer = 42;

}

catch

{

// valid reference to answer

var a = answer;

}

As with C#, the throw keyword can be used with an Exception variable to wrap and rethrow a caught exception, or throw can be used in a statement by itself to rethrow the original exception.
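For readers less familiar with the C# behavior referred to here, a minimal C# sketch of the two forms follows. The class and method names (RethrowSketch, Fail, and so on) are invented for illustration and are not part of Archetype.

```csharp
using System;

// Illustrative names only; this is plain C#, not Archetype.
public static class RethrowSketch
{
    static void Fail()
    {
        throw new InvalidOperationException("original failure");
    }

    // A bare "throw;" rethrows the caught exception unchanged.
    public static void RethrowOriginal()
    {
        try
        {
            Fail();
        }
        catch (InvalidOperationException)
        {
            throw;
        }
    }

    // Wrapping: the caught exception becomes the InnerException of a new one.
    public static void WrapAndRethrow()
    {
        try
        {
            Fail();
        }
        catch (InvalidOperationException ex)
        {
            throw new Exception("ProcessItem failed", ex);
        }
    }
}
```

A caller of RethrowOriginal sees the InvalidOperationException itself, while a caller of WrapAndRethrow sees a new exception whose InnerException is the original.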

Namespace Imports

In Part 2, we saw the first use of the import keyword to import namespaces.

import System;

Start void ()

Console.WriteLine("Hello world!");

Like Nemerle, we can also apply the import keyword with classes to access static members without specifying the class name.

import System.Console;

Start void ()

WriteLine("Hello world!");

Another option is to import a namespace or class into a nested function or class scope.

Start void ()

{

import System;

Console.WriteLine("Starting up…");

}

Employee object

{

import System;

Work void ()

{

Console.WriteLine("Working hard!");

}

}

Similar to the with keyword in Pascal, import can be used to import a namespace or class for a specific code block. This restricts the import to one section of a function.

Start void ()

{

System.Console.WriteLine("The following import doesn’t apply here.");

import System.Console

{

WriteLine("hello");

WriteLine("goodbye");

}

}

One final thing you can do with import is to specify a namespace alias.

Start void ()

{

import sys = System;

sys.Console.WriteLine("Example of a namespace alias");

}
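For comparison, the closest C# counterparts to the class import and the namespace alias are sketched below. This assumes C# 6 or later for using static; the ImportSketch class is a made-up name. C# has no equivalent of importing into a nested function scope or a single code block, so those forms are not shown.

```csharp
using static System.Console;   // roughly: import System.Console;
using Sys = System;            // roughly: import sys = System;

// Illustrative class name; not from the article.
public static class ImportSketch
{
    public static void Greet()
    {
        // Static member reached without naming the Console class.
        WriteLine("Hello world!");

        // Namespace reached through its alias.
        Sys.Console.WriteLine("Example of a namespace alias");
    }
}
```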

Aliases

The section on namespace imports above introduced namespace aliasing. In this section, we’ll see how to use the alias keyword to provide additional identifiers to classes, functions, properties, and fields.

The class alias is similar to the import class pattern, except that a new identifier is introduced and must be used to reference its members. (With import, that class’s static members are implicitly accessible.)

DemonstrateAlias void ()

{

alias kid = Geneology.Child;

var Josa = new kid;

Josa.FirstName = "Josa";

Josa.Age = 4;

}

In addition to the alias statement, this example introduces Archetype’s object instantiation syntax. The constructor of the Geneology.Child class is being called (via its alias, kid) with the new keyword but no parentheses, which are optional when calling a parameterless constructor.

The var keyword is also new here. It is similar to the var keyword in C#, except that it’s required for local variable definitions.

The next example demonstrates alias used for a local variable. The syntax is identical for class fields and properties, except that those aliases can be specified at the class level as well as within functions.

DemonstrateAlias void ()

{

var SocialSecurityNumber = "123-456-7890";

alias SSN = SocialSecurityNumber;

System.Console.WriteLine(SSN);

}

This last alias example shows how it can be applied to functions.

DoWork void () { … }

Test void ()

{

alias work = DoWork;

work();

}

Control Flow

We’ve already discussed exception handling, which is a very fundamental kind of control flow structure. In this section, we’ll explore several other constructs that are familiar to every programmer.

In Programming Language Pragmatics, the author (Michael L. Scott) enumerates six essential types of control flow: sequencing, iteration, selection, exception handling, recursion, and concurrency. We’ll cover most of them in this article.

Sequencing is merely the scheduling of one statement to be executed after another. This is the standard model of interpreting source code statements in a code block, so there’s not much more to say about it.

Iteration

Iteration is much more interesting. In Archetype, there are four iteration constructs.

Loop

The first is the incredibly versatile loop. It can take a simple integer expression to loop a specific number of times. It can alternatively take an expression that introduces a variable, a range of values, and an optional skip value (similar to the for keyword in Visual Basic). Finally, it can act like the foreach keyword in C# and iterate over IEnumerable and IEnumerable<T> collections such as streams, lists, etc.

// loop 10 times

loop (10)

{

}

// i starts at 3, increments by 1, until it reaches 9

loop (var i in 3..9)

{

}

// i starts at 11, decrements by 2, until it reaches (or passes) 1

loop (var i in 11..1 skip 2)

{

}

// define cust, then loop through and reference each object in an IEnumerable

loop (var cust in Customers)

{

}

This replaces the archaic syntax of the for loop in C# and older C-style languages, and provides a construct which is much easier to read and write. Each of the integer constants in the examples above can be replaced with expressions (variables, function calls returning integers, etc).
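To make the range forms concrete, here is a rough C# approximation. The Range helper is hypothetical, and it assumes both endpoints are inclusive and that a range counts down when the first value is larger than the last, which is how I read the comments in the examples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper approximating Archetype's "a..b skip n" ranges in C#.
public static class LoopSketch
{
    public static IEnumerable<int> Range(int first, int last, int skip = 1)
    {
        if (skip <= 0) throw new ArgumentOutOfRangeException(nameof(skip));

        if (first <= last)
        {
            // Ascending range, e.g. 3..9
            for (int i = first; i <= last; i += skip) yield return i;
        }
        else
        {
            // Descending range, e.g. 11..1 skip 2
            for (int i = first; i >= last; i -= skip) yield return i;
        }
    }
}
```

With this helper, foreach (var i in LoopSketch.Range(11, 1, 2)) visits 11, 9, 7, 5, 3, 1. The simple loop (10) form corresponds to an ordinary for (int n = 0; n < 10; n++) loop, and the collection form to foreach.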

When writing a loop, it’s often necessary to skip the remainder of the current iteration and continue with the next one. In C#, the ambiguous-sounding continue keyword is used. I remember seeing this for the first time and thinking that it meant to continue executing after the loop, which wasn’t the case. So in Archetype, I’m resurrecting the venerable old keyword next, as in “go to the next iteration in this loop”. To break out of a loop altogether and continue executing after the loop, the break keyword is used.

// define the int i and loop from 0 to 9

loop (var i in 0..9)

{

// do some work

if (DoneWithThisIteration)

next;

if (DoneWithLoop)

break;

// continue with more work

}

Fork-Join

A potent addition to the loop construct is the fork-join pattern. The fork and join words are actually defined in a library, not in the language itself. In a later article, you will see how to create patterns like this.

// fork out a bunch of parallel tasks and join when all are done

fork (var cust in Customers)

{

// this code is encapsulated in a task in the TPL

// and scheduled for execution

}

join (tasks)

{

// this code block is executed when all of the tasks

// are either completed or canceled

}

The join clause’s parameter, named tasks in the example, is a reference to a list of Task Parallel Library (TPL) tasks. This is a handy way to execute code in parallel without having to restructure your code (similar to the Parallel.ForEach method in the TPL). Most of the difficulties of concurrent programming are matters of coordination, however, and so they are often best handled by parallel libraries such as the TPL or the Concurrency & Coordination Runtime (CCR).
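A sketch of what fork and join might map to on top of the TPL is shown below. The ForkJoin helper and its parameter names are my own invention for illustration; the real library definition may look quite different. Task.Run assumes .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; not the actual Archetype library code.
public static class ForkJoinSketch
{
    // "fork": start one task per item.
    // "join": run a continuation once every task has finished.
    public static Task ForkJoin<T>(IEnumerable<T> items, Action<T> body, Action<Task[]> join)
    {
        Task[] tasks = items.Select(item => Task.Run(() => body(item))).ToArray();

        // ContinueWhenAll fires whether the tasks ran to completion, faulted, or were canceled.
        return Task.Factory.ContinueWhenAll(tasks, join);
    }
}
```

The array handed to the join delegate plays the role of the tasks parameter in the Archetype example, so the join block can inspect each task's status.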

While and Until

The next example is the familiar while loop. It tests a condition at the beginning of each iteration and executes the following code block only if the condition is true.

// repeat while condition is true

while (a == 10)

{

}

The until loop works similarly, but tests its condition after the following code.

// repeat until the condition is true

until (str.Length == 0)

{

}

Although it makes sense that C# places the condition of its do-while loop at the end, where it is evaluated, the syntax is awkward: I never know whether to keep my curly braces aligned and put the do and while on their own lines so they don’t crowd the embedded code block, or what. With the naming difference in Archetype, and with the debugger’s help in stepping through in the correct way, I’m betting this feature will not only be easy to grasp, but will also clean up certain coding situations and help make looping syntax more structurally consistent.

The break and next keywords apply to while and until loops just as they do to loop constructs (see the Loop section above).
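Since until tests its condition after the body, its closest C# relative is a do-while loop with the condition negated. The helper below is only a sketch of that reading; the Until name and its delegate parameters are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper showing the assumed semantics of Archetype's until loop.
public static class UntilSketch
{
    // Runs body at least once, then repeats until condition() returns true.
    public static void Until(Func<bool> condition, Action body)
    {
        do
        {
            body();
        }
        while (!condition());
    }
}
```

One consequence worth noting: because the test comes last, the body always runs at least once, even if the condition is already true on entry.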

Asynchronous Programming

It’s becoming more common now to make asynchronous calls to web services and other long-running processes that we don’t want our code to sit around waiting for. In Silverlight, for example, the only network communication options we have are asynchronous. But the Asynchronous Programming Model (APM) has a way of confusing and tripping up developers as they try to wrap their heads around it.

Archetype introduces a few language constructs to make asynchronous programming easy.

Calling Methods and Delegates Asynchronously

The async keyword allows you to call any method or delegate asynchronously with the same syntax.

GetData string (Index int)

{

return (Index + 1).ToString();

}

async GetData(42)

{

var result = value;

}

All of the details of dealing with AsyncCallback and IAsyncResult are abstracted away. The value keyword represents the return value of the asynchronously-called method (if applicable). The first access of value may result in an exception in the event that the target method failed when it ran. This can be caught with a standard try-catch block, or the following syntax can be used.

async DoWork()

{

// success

}

catch (ex ArgumentException)

{

// failure

}

finally

{

// final logic

}

All of the standard rules apply regarding the syntax of catch and finally blocks (see the Exception Handling section above in this article).
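To give a feel for the plumbing being hidden, here is a C# sketch of a similar call. It uses TPL tasks rather than the AsyncCallback and IAsyncResult machinery mentioned above, simply because tasks are shorter to show; the CallAsync helper and its callback parameters are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; Archetype's actual expansion is not specified in this article.
public static class AsyncSketch
{
    public static string GetData(int index)
    {
        return (index + 1).ToString();
    }

    // Rough analogue of: async GetData(index) { ... } catch { ... } finally { ... }
    public static Task CallAsync(int index, Action<string> onSuccess,
                                 Action<Exception> onError, Action onFinally)
    {
        return Task.Run(() => GetData(index)).ContinueWith(t =>
        {
            try
            {
                // Reading Result rethrows (wrapped in AggregateException) if GetData failed,
                // much like the first access of value in the Archetype version.
                onSuccess(t.Result);
            }
            catch (AggregateException ex)
            {
                onError(ex.InnerException);
            }
            finally
            {
                onFinally();
            }
        });
    }
}
```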

As another reminder of the optional curly braces for single statements, here is a short and simple async call example.

async DoWork()

NextStep();

catch HandleError(ex);

finally Cleanup();

The ability to specify your intent to call methods and delegates asynchronously without mucking around in the implementation details should go a long way to making developers more productive in high-latency scenarios. In a final example of async, I’ll demonstrate how we can still obtain access to the IAsyncResult variable that is returned by the APM, which is useful if you need to occasionally check if it’s completed.

var ar = async DoWork()

NextStep();

catch HandleError(ex);

finally Cleanup();

Messages

Messages offer an alternative to delegates. As useful and simple as delegates are, the problem with them is that they pass along not only data, but also execution control. Often what we need, in particular for applications that must take advantage of parallel execution, is a way to pass along data without giving up control over execution. This is usually done by pushing a message onto a queue which can be picked up at the receiver’s convenience without holding up the sender.
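The queue-based hand-off described here can be sketched in C# with BlockingCollection from System.Collections.Concurrent. The MessageQueueSketch class and its method are illustrative names only, and the receiver here does nothing more than count what it picks up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Illustrative only: the sender posts data and carries on,
// while the receiver drains the queue on its own schedule.
public static class MessageQueueSketch
{
    public static int SendAndReceive(string[] messages)
    {
        using (var queue = new BlockingCollection<string>())
        {
            int received = 0;

            Task receiver = Task.Run(() =>
            {
                foreach (string message in queue.GetConsumingEnumerable())
                {
                    received++; // a real receiver would act on the message here
                }
            });

            foreach (string message in messages)
            {
                queue.Add(message); // hands over data without handing over control
            }

            queue.CompleteAdding(); // lets the receiver's loop end
            receiver.Wait();
            return received;
        }
    }
}
```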

In Archetype, messages fulfill this need. Like delegates, they can be named or anonymous. Here is an example of each.

message EmptySignal();

SomeAgent agent

{

// using a named message

Started out EmptySignal;

// using an anonymous message

Completed out message(string);

}

Unlike their delegate counterparts, messages don’t have return values. Return values only make sense when you make a synchronous call and give up execution control. Though we can get around this with the async construct and the Asynchronous Programming Model, this is something of a hack.

In the example above, the anonymous message provides the keyword message in place of a return type. In both cases, the out keyword is used to indicate that it is an outgoing message. As you might expect, there is an in keyword to indicate incoming messages.

Messages are one-way communications. If you need a response, you need to specify a corresponding incoming message. This applies to the communication of error information as well. If you’re interested in learning more about asynchronous message-based communication, you can refer to my article on the subject.

The out messages act like delegates and can be called like functions, and you can think of in messages as event handlers, although they can also be invoked like methods within the agent.

I am in the process of evaluating Axum and other agent-based languages. I’m particularly interested in the way Axum defines protocol contracts, which are compile-time checks that message A is responded to by message B, and so on. It’s likely that this type of constraint will be implemented using the language extensibility capabilities of Archetype, and that it will be deferred until I understand the issues and challenges better. Until that time, I believe that having the option to define messages as first-class members will provide developers with a much-needed tool for safer concurrent programming. It will make the execution control model explicit and obvious at member definition instead of being bolted on later as a set of imperative instructions in an often-misunderstood corner of the .NET Framework.

Another avenue I’m exploring is a syntax for exposing messages as WCF endpoints. Stay tuned for more information on this subject.

Next Steps

We covered a lot of ground in this article. In these first three articles, many of Archetype’s most fundamental and important constructs have been explained and demonstrated. In the next article, I’ll introduce Archetype’s capabilities and syntax for conditional selection (if), pattern matching (including a much-needed replacement for the switch statement), and the relationship between traits and classes.

There’s a lot more to see, but since this article turned out to be so large, I’ll stop here with promises about the next one.

[Part 4 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation

The Archetype Language (Part 2)

Posted by Dan Vanderboom on April 27, 2010

The Purpose of Archetype

You may wonder why I’m designing a new language. As I explained to Vlad in the comments of my introductory article:

I don’t need it. There are plenty of perfectly usable languages out there.

That being said, I want it. I want to spend a lot less time with ceremony and more with substance. I want greater expressive power without sacrificing readability. I want to extend the language syntax and hook into the compiler at certain key points to experiment with new ideas without having to version the base compiler. I want to define traits as composable types and reserve classes for engines of instantiation when it makes sense to do so. I want common concurrency patterns to feel like first-class citizens so that behavior guarantees can be made at compile time. I also want to see if all the language ideas I’ve come up with over the years will really be as valuable as I think they will be.

I have a lot more reasons, too. They’ll be the subject of continued articles in the series.

Hello World!

I would be remiss if I didn’t include a Hello World example.

Start void ()

{

System.Console.WriteLine("Hello world!");

}

This is a complete program. It defines a Start function as a top-level construct (not inside of a class), with a single call to Console.WriteLine, a normal .NET method.

Because any code block can be either a single statement or a pair of curly braces containing zero or more statements, we can shorten our example to look like this:

Start void ()

System.Console.WriteLine("Hello world!");

C# already allows this with constructs like if, while, and for. Archetype takes it to the next level by making it universal.

Requiring the start method of a program to be hosted in a class seems like a kludge to me; and in the spirit of enabling the language to be used for more functional programming, I thought it appropriate to allow this type of functional composition without the ceremony of an enclosing class.

The startup function name will be Start by default, and a modifiable setting on the project options page will let you use a different name for the entry point function. It will also support using a static function in a class.

Delegates

Delegate definitions in Archetype are very close to function definition syntax. Consider this example:

import System;

Start void ()

{

ShowInfo void (); // define a delegate

ShowInfo(); // invoke the delegate

ShowInfo = () => { Console.WriteLine("Name: " me.Name) }; // Name: Start

ShowInfo += { Console.WriteLine("Type: " me.Type) }; // Type: void

ShowInfo();

}

First, we’re introducing the import statement. Our use of it here is identical to the using keyword in C#. The import statement does some other interesting things, which we’ll see in a future article.

The first statement in our Start method defines a delegate called ShowInfo. Note that the only difference between this and a function definition is its lack of a trailing code block. Instead, a semicolon appears after the (empty) parameter list.

The next line invokes the delegate. In C# this would throw a NullReferenceException, which I’ve always found annoying. In Archetype, as with Visual Basic, this gets converted by the compiler into a check for null followed by an invocation if it’s not null. I’ve gone this route because of how rare it is that I actually want to throw an exception in these cases; in C#, I’m constantly writing the null check for delegates to avoid it, wrapping that check and the invocation in OnDoWhatever methods, and that seems wasteful. In Archetype, if you want to throw an exception when a delegate is null, then write the code to throw one explicitly.
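The check-then-invoke boilerplate mentioned above looks roughly like this in C#. The class and member names are placeholders chosen to mirror the example; this is the hand-written pattern, not actual compiler output.

```csharp
using System;

// Placeholder class illustrating the C# null-check idiom for delegates.
public class NullSafeInvokeSketch
{
    public Action ShowInfo; // null until something is assigned

    public void OnShowInfo()
    {
        // Copy to a local first so another thread cannot null the field
        // between the check and the call.
        Action handler = ShowInfo;
        if (handler != null)
        {
            handler();
        }
    }
}
```

Later versions of C# (6 and up) shorten the body to ShowInfo?.Invoke(), which is close in spirit to what Archetype does implicitly.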

The following two lines point the delegate to a specific function (expressed as a lambda, similar to C#) and add a second lambda function to the first. The =, +=, and -= operators work as expected with delegates.

Notice that the first lambda function supplies an empty parameter list, but the second one omits it. The parameter list can be omitted when there are no parameters, or when you don’t need to reference the arguments that are passed in.

It’s possible to be even more terse if we have a single non-assignment statement we’re assigning to our delegate variable:

ShowInfo = Console.WriteLine("Name: " me.Name);

Assignment statements cause a problem with parsing because of the right-to-left interpretation of assignments. Consider these statements:

ShowInfo = Console.WriteLine("Name: " me.Name);

ShowInfo = Age = 1;

The first line is a valid delegate assignment. The intention of the second is: “each time ShowInfo is invoked, set Age to 1.” However, the parser reads this as “Set Age to 1, then set ShowInfo to Age,” which is not what we want. As a result, single assignment statement delegates in Archetype must be surrounded by curly braces.

Finally in our Start method above, the last line invokes the delegate again, which in turn calls both lambda functions.

Additional Notes:

The me keyword refers to the current function, Start. As the comments suggest, me.Name returns “Start” and me.Type returns a System.Void Type object. Calling me() as a function would call Start recursively.

String concatenation doesn’t use the + operator. Multiple strings separated by spaces are concatenated automatically. In the case of Console.WriteLine above, where a string literal (“Type: ”) is followed by a non-string value (me.Type), the non-string value is converted to a string with ToString. This can occur because the non-string value is listed where a string is expected.

Delegate Parameters

Defining parameters for delegates is easy. If an anonymous function won’t use any of the parameters passed in, it can omit the parentheses entirely. Individual parameter names can be replaced with an underscore character. Otherwise, argument names are supplied as usual. All three variations can be seen in the following example:

Start void ()

{

ShowInfo void (Info string, Priority int);

ShowInfo = (info, priority) { Console.WriteLine("Info: " info) };

ShowInfo += (info, _) { Console.WriteLine("Type: " me.Type.Name) };

ShowInfo += { Console.WriteLine("Name: " me.Name) };

ShowInfo("Fake Info", 10);

}

Named (Non-Anonymous) Delegate Types

So far we’ve only seen anonymous delegate types. The ShowInfo delegate above has a type, but we can’t refer to it by name, and so we can’t share that type with other code. This is fine in many cases. In fact, many times I’m annoyed by the need to go to another file to add a delegate that will never be used elsewhere. But there’s also occasionally a need to expose that type, especially for use by a library or framework consumer.

The following code defines a delegate type, a function that uses that delegate type for its parameter, and a call to the function. That call contains a lambda expression that creates an anonymous function and a delegate object pointing to it, and passes that to the ShowInfo function.

type ShowPersonDelegate void (Name string, Age int);

ShowInfo void (ShowPerson ShowPersonDelegate)

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

Start void ()

{

ShowInfo((name, age) { Console.WriteLine(name " is " age) });

}

Another way to think about this is that the type keyword simply defines a name (ShowPersonDelegate), which then points to an otherwise-anonymous delegate type: void(Name string, Age int).

The final statement calls ShowInfo, passing in a lambda function, which has the same syntax as C#.

For the sake of comparison, here is the same program using an anonymous delegate. Note that the delegate’s parameter names are optional.

ShowInfo void (ShowPerson void(string,int))

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

Start void ()

{

ShowInfo((name, age) { Console.WriteLine(name " is " age) });

}

As in C#, a delegate object will be created automatically if a method name is provided where a matching delegate is requested:

SendText void (Name string, Age int)

{

Console.WriteLine(Name " is " Age);

}

ShowInfo void (ShowPerson void(string,int))

{

ShowPerson("Josa", 4);

ShowPerson("Ava", 1);

}

Start void ()

{

ShowInfo(SendText);

}

Delegate Duck Typing

To simplify interoperating between named and anonymous delegates, a form of compile-time duck typing is used. Consider the following code:

Predicate1 Func<int, bool>;

Predicate2 bool(int);

Predicate1 = p => p % 2 == 0;

Predicate2 = Predicate1;

Predicate1 and Predicate2 are technically two different delegate types. The anonymous bool(int) delegate will be named something like __anon_bool_int by the compiler. However, the final assignment is valid because bool(int) and Func<int, bool> are structurally equivalent. It is effectively transformed by the compiler into:

Predicate2 = p => Predicate1(p);
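The same situation exists in C# today, where it has to be handled by hand. In the sketch below, Predicate<int> stands in for the anonymous bool(int) type (an assumption made purely for illustration), and the ToPredicate helper is a hypothetical name.

```csharp
using System;

// Hypothetical helper; Predicate<int> stands in for Archetype's anonymous bool(int).
public static class DelegateAdapterSketch
{
    // Predicate<int> and Func<int, bool> have the same shape but are distinct types in C#,
    // so a direct assignment does not compile. A wrapping lambda bridges them.
    public static Predicate<int> ToPredicate(Func<int, bool> source)
    {
        return p => source(p);
    }
}
```

Archetype's duck typing would, in effect, write this wrapper for you at compile time.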

Bindable Properties

The bindable keyword can be used to enable data binding support for user interface controls. It’s always bothered me how much boilerplate code must be written in .NET languages for bindable properties. In C#, this is typical:

private int _Age;

public int Age

{

get { return _Age; }

set

{

_Age = value;

PropertyChanged("Age", value);

}

}

That’s ten lines of code for a simple integer property! And this is a simple scenario. Compare that to Archetype’s bindable property:

Age bindable int;

Much better! With this, we can define many bindable properties in a small space. This is expanded by the compiler into something like this:

_Age field int;

Age int

{

get me.Value;

set PropertyChanged(me.Name, me.Value = value);

}

After being warned about the potential dangers of INotifyPropertyChanged by Michael in the comments of the previous article, I am exploring alternative implementations. Regardless of how it’s implemented (see Part 7 for more details), bindable will be a powerful addition for Archetype developers.
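For reference, the conventional INotifyPropertyChanged pattern in C# looks roughly like the following when written out in full. The AgeViewModel class name is made up, and this shows only the standard hand-written approach, not what the Archetype compiler will necessarily generate.

```csharp
using System.ComponentModel;

// Illustrative view model using the standard .NET change-notification interface.
public class AgeViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private int _age;

    public int Age
    {
        get { return _age; }
        set
        {
            if (_age == value) return; // skip redundant notifications
            _age = value;
            OnPropertyChanged("Age");
        }
    }

    protected void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```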

Composite Bindings

Occasionally I need a property which is composed of two or more other properties, and I want to ensure that the proper data binding machinery is notified whenever each constituent property is updated. In C#, I would need to make multiple PropertyChanged calls in each of the individual properties to signal that the composite binding is changing as well. In Archetype, we can use the composite keyword within the composite property itself. Syntactically this is a pull model whereas otherwise we’d be forced to implement a push model. The Archetype syntax looks like this:

FirstName bindable string;

LastName bindable string;

FullName bindable string

get composite FirstName " " LastName;

When the compiler sees the composite keyword after get, it scans the following expression tree. When it finds property references and those properties are marked as bindable, it makes the appropriate transformations to notify of changes. In the underlying implementation, it is a push model, but the developer of Archetype is spared those details. Multiple-statement get functions are supported, and set functions are also supported when using composite.
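The push model that the composite keyword hides might look something like this in hand-written C#. The NameViewModel class and its Notify helper are illustrative names, and the real compiler transformation may differ.

```csharp
using System.ComponentModel;

// Illustrative only: each constituent property announces the composite one as well.
public class NameViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string _firstName = "";
    private string _lastName = "";

    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; Notify("FirstName"); Notify("FullName"); }
    }

    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; Notify("LastName"); Notify("FullName"); }
    }

    // The composite property: recomputed on read, announced by its constituents.
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }

    private void Notify(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

Every new composite property means revisiting the setters it depends on, which is exactly the bookkeeping the composite keyword is meant to remove.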

Bindable Collection Properties

Binding to collections is a little different from binding to single properties. In WPF, Silverlight, and now more broadly in .NET 4.0, types such as ObservableCollection provide several notifications to user interface controls.

Archetype provides special binding expressions to specify common scenarios such as “bind x to the selected item of this collection”.

Here is an example of a collection and its current single selection, bound together in the view model:

Alternatives ObservableCollection<Alternative>;

SelectedAlternative bindable Alternative

bind to Alternatives.SelectedItem;

The following example demonstrates binding a collection to another collection with a discriminating expression (subselection):

Options ObservableCollection<Option>;

SelectedOptions ObservableCollection<Option>

bind to Options.SelectedItems

where item.OptionName.StartsWith("L");

SelectedOptions is an ObservableCollection so that its subselection of contents can itself be bound to a user interface control. The bind to expression sets the binding source, and the where expression specifies a predicate (a function taking “item”, in this case an Option object, as a parameter, and returning a bool) to include only the objects we want. The where expression is optional.

You may notice that SelectedItem and SelectedItems are not valid properties of ObservableCollection. This is because they are extension properties. Archetype supports extension methods just like C#, but it goes further to provide extension properties, indexers, constructors, and operators. I’ll discuss type extensions in a future article.

What this doesn’t address is the possibility of binding a collection to more than one user interface control, and allowing independent selection in each. Because of this, the specifics of binding expressions in Archetype will very likely change before being finalized, but this should give you a taste of the possibilities of language-aware binding.

Next Steps

In the next article, we’ll take a closer look at the import keyword and its special abilities, exception handling, local variable definition, and control flow structures such as if, loop, while, and until. I’ll also introduce one of my favorite Archetype features, the async construct which is used to intuitively call delegates asynchronously.

[Part 3 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation

The Archetype Language (Part 1)

Posted by Dan Vanderboom on April 26, 2010

Experiments in Language Design

After 25 years of computer programming in many different languages and a more-than-casual interest in linguistic analysis, I’ve developed a keen appreciation of the best features among them. I also have a relatively steady stream of new language feature ideas. Many years ago I began tinkering with interpreters and compilers. At PDC 2009, I was surprised and delighted to hear the news of the language M (part of Oslo), with which it is possible to write parsers for other languages. Parser generators have been around for a long time, but with such strong support in .NET, it was close enough to home for me to sit up and pay close attention. The desire to create my own language to address the shortcomings I’ve experienced has been perpetually in the back of my mind.

After that PDC, I bought several more books on language and compiler design and began diving in. I’ve been somewhat obsessed with it recently, and the language specification I’m writing is starting to look legitimate, so I think it’s time to start sharing what I’ve come up with and (hopefully) get some good feedback.

The Language

The code name for this language is Archetype. I don’t know if I’ll use this for the final name, but this will do for now. It’s a multi-paradigm language instead of attempting to be pure in any one way, and if you’re familiar with C#, you should be pretty comfortable with the syntax. Yet, if you enjoy the functional programming power of languages like ML, Haskell, OCaml, F#, or Nemerle, or are interested in language constructs to simplify asynchronous and concurrent workflows, you’ll probably like Archetype. While it supports functional programming, one of my goals is to make it appealing and even obvious to developers without a strong functional programming background. It targets the .NET CLR and will therefore run on many platforms and devices, as well as interoperating well with existing .NET assemblies.

To place it in a set of buckets, as languages are classified by paradigm on Wikipedia, Archetype would be considered: imperative, declarative, generic, functional, object-oriented (class-based), language-oriented, reflective, and meta-programming-based.

Current Status

The parser is under development using M (in the Intellipad editor). Though the language design and specification itself is about 70-80% complete, the parser is only about 10% done. Once the parser is a little further along and some interesting samples can be written and parsed, I’ll start building the semantic analyzer and code generation pieces.

The first versions of the compiler will generate C# code instead of IL instructions. It will be a lot faster for me to translate Archetype constructs to C#. The C# compiler, though not concurrent or incremental, is highly optimized and produces great output. This does limit my ability to depart radically from C# semantics, but this is okay: C# is a wonderful language and I plan to keep Archetype pretty closely aligned with it. For example, the operators and precedence rules are all borrowed from C#. Archetype does introduce a number of new operators, keywords, and syntactical constructs, but it aims to be close to a superset as far as semantics go.

Disclaimer

Everything is subject to change. Some features are stolen directly from specific languages which I will do my best to identify as I go. Your mileage may vary. Available while quantities last. Batteries not included.

This is a set of experiments. Hopefully it will also be a fun conversation among language enthusiasts.

A Taste of Features

Since this article is already getting long and I have a ton designed already, I’m going to keep the language design part short and present a mere taste of language features.

Properties and Fields

We’ll start with something basic: how to define properties and fields.

Age int;

This first example is a property. The name comes first, which you’ll see everywhere in Archetype. Also notice that the int type is the same as in C#. This is true of all the built-in C# types.

To define a field, the “field” keyword is added before the type. This encourages property definition by default.

Age field int;

The property definition above is short for, but equivalent to, this:

Age int
{
    get { return me.Value; }
    set { me.Value = value; }
}

When defining properties, it’s so common to require a private “backing field” that I thought it warranted something in the language. C# also does this, but only if you use implicit get and set functions. As soon as you need custom logic for one or the other, you lose this. In Archetype, the “me” keyword refers to the current function or property. In the case of properties, me.Value is the backing field which saves you a line of code for every property that needs one. Reducing code clutter and maximizing information density and conciseness are major design goals in Archetype.
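To make the comparison with C# concrete, here is a sketch of the boilerplate that me.Value is meant to remove. The Person class, the _age field name, and the validation rule are hypothetical; the point is only that once a C# accessor needs custom logic, the backing field has to be declared by hand.

```csharp
using System;

public class Person
{
    // The explicit backing field that Archetype's me.Value would stand in for.
    private int _age;

    public int Age
    {
        get { return _age; }
        set
        {
            // Custom logic in the setter is what rules out a C# auto-property.
            if (value < 0)
                throw new ArgumentOutOfRangeException(nameof(value));
            _age = value;
        }
    }
}
```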

Other “me” properties are available as well, such as Name and Type, which are useful for general-purpose code generation and debugging. In functions, invoking “me” is recursive. C# has the keyword “this” (which Archetype shares), which very usefully refers to the current object. The “me” keyword is roughly analogous to this.GetType().

The curly braces surrounding get and set are optional. If a get or set method is a single statement, the curly braces around it are also optional. We could then write this:

Age int
    get me.Value,
    set me.Value = value;

Note the lack of a "return" keyword in the get method: “get return” would be redundant. Also, the get and set clauses are separated by a comma. This is a common pattern for multiple clauses in Archetype. The semicolon triggers the end of the statement (in this case, a property declaration statement).

Public class variables are properties by default to promote consistent and forward-looking design techniques (such as compatibility with interfaces), and the field keyword is there to opt out when there is a need (such as performance).

Value types in this language are not nullable by default. The question mark can be used after the value type’s name to indicate it is nullable.

Age int?;
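Since Archetype borrows the built-in C# types, the nullable form presumably behaves like C#'s int? (an assumption on my part). A small C# sketch of working with such a value, using a hypothetical Describe helper:

```csharp
using System;

public static class NullableSketch
{
    // An int? either holds a value or is null; HasValue tells the two apart.
    public static string Describe(int? age)
    {
        return age.HasValue ? "Age is " + age.Value : "Age is unknown";
    }
}
```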

Functions

I’ll have much more to say about functions and delegates in my next article. Here, I’ll just briefly sketch an outline of what they look like and hint at what’s to come.

Save<T> void (Entity T)
{
    // …
}

As with properties, the identifier is listed first (along with a generic type parameter), followed by the return type, and finally the parameters in parentheses.
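For readers who want to see the two orderings side by side, this is roughly how the same signature reads in C#. The Repository class and the in-memory Saved list are hypothetical, added only so the sketch compiles and does something observable; this is not necessarily what the Archetype compiler would emit.

```csharp
using System.Collections.Generic;

public static class Repository
{
    // Hypothetical in-memory store standing in for real persistence.
    public static readonly List<object> Saved = new List<object>();

    // C# order: return type, then name and generic parameter,
    // then parameters written "type name" rather than "name type".
    public static void Save<T>(T entity)
    {
        Saved.Add(entity);
    }
}
```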

I’ve been comfortable with the type-first definitions in C# for years, but I’ve often begun writing a function whose name came to mind instantly, but whose return type required further thought. However, my fingers would hesitate to type anything until I could determine the return type. After seeing the name come first in Nemerle, it struck me how nice it would be to define functions name-first. The problem that Nemerle has (and Visual Basic, for that matter) is that the parameter list comes next, and the return type is listed last. This has the advantage that it’s easier to write, but suffers from being more difficult to read. When doing a quick scan of code, eyes scanning down through a class, the return types will be all over the place on the screen. In the case of long parameter lists where poorly-formatted code puts the return type off the screen too far to the right, you’d actually have to scroll right to see the return type. This is decidedly worse than type-first.

Then I thought: why not put the two most important parts of a function header first: name and return type, with parameters after them? Then you’d have the best of both worlds: faster to write, and easy to read and understand. Each parameter then follows the “name type” order, consistent with properties and functions.

Next Steps

In my next article on Archetype, I’ll go into much more detail about functions and delegates, where I think Archetype makes some original contributions (at least in terms of syntactical convenience and elegance). I’ll talk about creating basic console applications, the simplest program possible to write, anonymous functions and anonymous delegates, a keyword (actually a custom type extension) to drastically simplify data-bindable property definitions for UI view models, and more.

[Part 2 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation | 13 Comments »

Four Traversal Patterns for Tree<T>

Posted by Dan Vanderboom on April 5, 2010

[Updated 8/14/2014] The source code for this library can be found here at GitHub. Also check out my blog post announcing it.

This is an update to my original article on a non-binary tree data structure. After receiving multiple requests to complete this, here it is. The updated source code can be downloaded here.

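To illustrate the four possible traversal types, the same tree data structure will be used in all of my examples: a root node a with two children, b and c, where b has the children d and e, and c has the children f and g.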

The enum TreeTraversalType has two possible values: DepthFirst and BreadthFirst. The other enum involved, TreeTraversalDirection, determines which end of the tree it starts from: TopDown or BottomUp. There are four possible combinations of these two values, which will each be presented separately.

Depth-First, Top-Down

This traversal strategy involves starting at the root and digging as deep into the tree as possible before moving on to the next branch. The order of nodes yielded by our example tree structure would be a-b-d-e-c-f-g.

Depth-First, Bottom-Up

Because we’re moving bottom-up in this case, the first thing we have to do is dive to the bottom of the first branch we find. We skip past a and b, and find d with no children. D is therefore the first node we’ll yield. E is its peer, so that comes next. From e, we move up the tree to b. A is the root and has to be the last node yielded, so we dive into the c branch next, yield f and g, then c, and finally a. The order is: d-e-b-f-g-c-a.
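The two depth-first orders can be sketched with a pair of recursive iterators. This is a minimal illustration, not the library's actual code: the Node class and the DepthFirst helper are hypothetical, and the example tree is the one implied by the orders quoted in the text (a with children b and c, b with d and e, c with f and g).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type; not the library's SimpleTreeNode<T>.
public class Node
{
    public string Value { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string value, params Node[] children)
    {
        Value = value;
        Children.AddRange(children);
    }
}

public static class DepthFirst
{
    // Top-down: yield each node before its descendants.
    public static IEnumerable<Node> TopDown(Node node)
    {
        yield return node;
        foreach (Node child in node.Children)
            foreach (Node descendant in TopDown(child))
                yield return descendant;
    }

    // Bottom-up: yield each node only after all of its descendants.
    public static IEnumerable<Node> BottomUp(Node node)
    {
        foreach (Node child in node.Children)
            foreach (Node descendant in BottomUp(child))
                yield return descendant;
        yield return node;
    }

    // The example tree assumed from the traversal orders in the text.
    public static Node ExampleTree()
    {
        return new Node("a",
            new Node("b", new Node("d"), new Node("e")),
            new Node("c", new Node("f"), new Node("g")));
    }

    // Joins node values with dashes, e.g. "a-b-d".
    public static string Join(IEnumerable<Node> nodes)
    {
        var values = new List<string>();
        foreach (Node n in nodes)
            values.Add(n.Value);
        return string.Join("-", values);
    }
}
```

Calling DepthFirst.Join(DepthFirst.TopDown(DepthFirst.ExampleTree())) gives a-b-d-e-c-f-g, and the BottomUp version gives d-e-b-f-g-c-a, matching the orders described above. The only difference between the two is whether the node is yielded before or after the loop over its children.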

Breadth-First, Top-Down

When we traverse breadth-first, we’re moving through the tree in levels. With Top-Down, we start with the root a, then move to the second level with b and c, and then finally move to the third level to yield d-e-f-g. The order is: a-b-c-d-e-f-g.

Breadth-First, Bottom-Up

The final traversal is the one I had the most trouble with, and I ended up cheating a little by reversing the Breadth-First, Top-Down traversal.
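Both breadth-first orders can be sketched with a queue, with the bottom-up order obtained by reversing the top-down one as described. This is an illustration under assumptions, not the library's code: the TreeNode class and BreadthFirst helper are hypothetical, and the tree is the one implied by the orders in the text (a with children b and c, b with d and e, c with f and g).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type; not the library's SimpleTreeNode<T>.
public class TreeNode
{
    public string Value { get; }
    public List<TreeNode> Children { get; } = new List<TreeNode>();

    public TreeNode(string value, params TreeNode[] children)
    {
        Value = value;
        Children.AddRange(children);
    }
}

public static class BreadthFirst
{
    // Top-down: visit level by level, using a queue of nodes still to visit.
    public static IEnumerable<TreeNode> TopDown(TreeNode root)
    {
        var queue = new Queue<TreeNode>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            TreeNode node = queue.Dequeue();
            yield return node;
            foreach (TreeNode child in node.Children)
                queue.Enqueue(child);
        }
    }

    // Bottom-up: materialize the top-down order, then reverse it.
    public static IEnumerable<TreeNode> BottomUp(TreeNode root)
    {
        var nodes = new List<TreeNode>(TopDown(root));
        nodes.Reverse();
        return nodes;
    }

    // The example tree assumed from the traversal orders in the text.
    public static TreeNode ExampleTree()
    {
        return new TreeNode("a",
            new TreeNode("b", new TreeNode("d"), new TreeNode("e")),
            new TreeNode("c", new TreeNode("f"), new TreeNode("g")));
    }

    // Joins node values with dashes, e.g. "a-b-c".
    public static string Join(IEnumerable<TreeNode> nodes)
    {
        var values = new List<string>();
        foreach (TreeNode n in nodes)
            values.Add(n.Value);
        return string.Join("-", values);
    }
}
```

On the example tree, TopDown yields a-b-c-d-e-f-g and this BottomUp yields g-f-e-d-c-b-a. A plain reversal walks each level right to left and has to buffer the whole tree before yielding anything, which is the cost of the shortcut.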

Conclusion

Now, with a call to GetEnumerable on a node object, you can specify which of these four traversal patterns to use. Here is an example of how that code looks:

foreach (SimpleTreeNode<string> node in 
    root.GetEnumerable(TreeTraversalType.DepthFirst, TreeTraversalDirection.TopDown))
{
    // ...
}

Posted in Algorithms, Data Structures | Tagged: non-binary tree | 4 Comments »

Critical Development

Language design, framework development, UI design, robotics and more.
