The Archetype Language (Part 6)
Posted by Dan Vanderboom on June 14, 2010
This is part of a continuing series of articles about a new .NET language under development called Archetype. Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs. A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.
You can follow the news and progress on the Archetype compiler on twitter @archetypelang.
Links to the individual articles:
Part 1 – Properties and fields, function syntax, the me keyword
Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation
Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages
Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits
Part 5 – Type extensions, custom control structures
Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions
Part 7 – Semantic density, operator overloading, custom operators
Part 8 – Constructors, declarative Archetype: the initializer body
Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators
In Archetype, an if expression can be provided for any value. The expression variant of the if statement, instead of taking embedded statement clauses, takes value expressions for its "consequence" clauses.
The if expression serves the same purpose as the ternary conditional operator in C#:
TextField.PasswordChar = DisplayStar ? '*' : ' ';
Enumerations are represented syntactically like lists in Archetype, using the square brackets to enclose values. The idea is that an enumeration type is simply a list of possible values.
enum RainbowColor [ Red, Orange, Yellow, Green, Blue, Indigo, Violet ];
Enumerations are often formatted to display one value on each line. The following example demonstrates this, and defines a variable of the enumeration’s type.
Anonymous Enumeration Types
It normally makes sense for an enumeration type to be named so it can be referenced and used elsewhere. But in cases where an enumeration is only needed privately within a single class or method, an anonymous enumeration type can be defined this way:
ForegroundColor enum [ Black, Gray, DarkBlue ];
Regardless of whether you’re working with named or anonymous enumerations, assignment is the same. The enumeration type is not used in the assignment, which works well with anonymous enumerations since they don’t have a discoverable name.
// no need for an enumeration name
// so it also works great with anonymous enumeration types
BackgroundColor = Green;
Language services (Intellisense) can still inform the user of the possible values after the equals sign and space are entered.
The nullable type operator converts a type T to Nullable<T>, the same as in C#. Consider the following examples of normal and bindable nullable properties, and a local variable with an initializer.
HighScore bindable int?;
var Age int? = null;
Additionally, we can define a local variable and infer its type from an assignment, using the nullable type operator to force type inference to use a nullable type.
// give me a nullable type, even though I’m not setting it to null no?
var Age ? = 4;
From this point on, we can assign values (including null) to the Age variable without using the nullable type operator. In fact, including the ? operator would be invalid.
// update the value of Age; notice we don’t use the nullable ? symbol after the definition
Age = 9;
A tuple is an anonymous type consisting of an ordered set of heterogeneous fields. In Archetype, a tuple’s fields can be named for Intellisense hinting when the tuple is used as a return type, or left unnamed. In local variable definitions, the individual members must either be named or use the anonymous member symbol, the underscore.
The following example shows the syntax for defining a tuple as a return type for a function. In this case, a pair of int values will be returned.
GetMouseLocation (int, int) ()
return (100, 50);
The members of a tuple in a return type can be named as a hint to the caller of the function.
GetMouseLocation (x int, y int) ()
return (100, 50);
The function is called and its return tuple value stored like this:
var (x, y) = GetMouseLocation();
Here we’re defining a new tuple type, Tuple<int, int>, which is not named as a whole. We might call it an anonymous tuple. Instead of naming the whole, we’re naming the individual members.
Using the .NET Tuple type, we could also write this:
var loc = GetMouseLocation();
We would then have to reference loc.Item1 and loc.Item2 to access the individual members. Naming the members instead of the whole, however, makes more sense because it provides greater code readability.
This next example demonstrates how tuples can be defined using type inference.
var (a, b) = (1, 2);
var (c, d) = (a, b);
var x = b;
On the first line, a and b are defined as accessors into a new tuple: a is assigned a value of 1, and b a value of 2. On the second line, another tuple is defined; its member c is assigned the value of a, while d is assigned the value of b. The third line demonstrates how you can use the tuple members independently of each other. In this case, the value of b is assigned to x.
If we don’t care about all of the members of a tuple, we can use the underscore character to ignore that member. The next example shows how to extract the x value from our GetMouseLocation function while ignoring the y value.
// we don’t care about the y value here
var (x, _) = GetMouseLocation();
Finally, we have a handy way of swapping values without the need to introduce a third variable.
(a, b) = (b, a);
Archetype is not limited to two-member tuples. The .NET Framework defines tuples up to seven members, so Archetype will handle at least that many. If that proves inadequate, it should be relatively easy to extend this to any number of members.
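For readers who want to try these tuple forms in a language they can run today, Python has close analogues for destructuring, discarding a member, and swapping. This is only a rough comparison sketch, not Archetype; get_mouse_location is a hypothetical stand-in that returns the same (100, 50) pair used above.

```python
def get_mouse_location():
    # Hypothetical stand-in for the article's GetMouseLocation function
    return (100, 50)

# Destructure the returned pair into two named members
x, y = get_mouse_location()

# Ignore a member we don't care about; the underscore is a Python convention
x_only, _ = get_mouse_location()

# Swap two values without introducing a third variable
a, b = 1, 2
a, b = b, a
```

As in the Archetype examples, the names belong to the individual members rather than to the tuple as a whole.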
I first read about streams (or lazy lists, as in Haskell) in a C Omega document on a Microsoft Research site. They’re analogous to sequences in XQuery and XPath, and are implemented using the IEnumerable<T> type in an iterator. I liked C Omega’s * operator to define a stream because of the way it sets that type apart from a normal type. In C#, it’s not obvious that a function with a return type of IEnumerable<T> should behave any differently from another function until you notice the yield keyword.
If I want a stream defined as a property in C#, I’d have to write something like this:
yield return 1;
yield return 3;
yield return 5;
In Archetype, the syntax is more succinct and direct:
We can be even more terse in such cases by using a comma in the yield list.
yield 1, 2, 3;
Using list comprehensions, which we’ll explore in more detail later in this article, we can do this as well:
yield 0..100 skip 5;
One note about the list comprehension here: we don’t use square brackets around the numeric range because they are implied in the yield statement. Including them here would cause the yield statement to return the list as a single yielded value.
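The difference between yielding the elements of a range and yielding the range as one value can be seen in a Python generator as well. The sketch below is an analogy only: the range 0 to 4 is arbitrary, the skip clause is not modeled, and the function names are made up for illustration.

```python
def flattened():
    # Like yielding the range without brackets: each number is its own element
    yield from range(0, 5)

def single_value():
    # Like wrapping the range in brackets: the whole list is one yielded value
    yield list(range(0, 5))

elements = list(flattened())
wrapped = list(single_value())
```

The first stream has five elements; the second has a single element that happens to be a list.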
These examples produce streams with a fixed number of elements, but streams can be infinite as well. This example returns all positive odd numbers starting with one.
def i = 1;
i += 2;
Streams are lazy, so while it looks at first glance like an infinite loop from which you’ll never escape, in reality control is driven by the loop that accesses the stream.
loop (var n in OddNumbers)
if (n > 100)
When other type operators are used, such as the nullable type operator, the stream operator must appear last.
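The consumer-driven behavior of an infinite stream is easy to observe with a Python generator, which is lazy in the same sense. This is a comparison sketch rather than Archetype code, and it assumes the consuming loop stops as soon as it sees a value greater than 100.

```python
def odd_numbers():
    # Infinite stream of positive odd numbers, starting with one
    i = 1
    while True:
        yield i   # execution pauses here until the consumer asks for more
        i += 2

seen = []
for n in odd_numbers():
    if n > 100:
        break     # assumption: the consumer stops here, so the producer never resumes
    seen.append(n)
```

The while True loop never runs ahead of the consumer, so breaking out of the for loop is all it takes to end the iteration.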
Archetype provides some special syntax for constructing lists called list comprehensions. This is syntactic sugar that provides shortcuts for building lists.
Consider the following syntax in C# and Archetype for constructing a list from 1 to 100.
var FirstHundred = from x in Enumerable.Range(1, 100) select x;
var FirstHundred = [ 1..100 ];
The square brackets in Archetype specify the construction of a list. Now consider a more complicated list construction. In this case, Linq is employed in both languages:
var FirstHundred = from x in Enumerable.Range(1, 100) where x*x > 3 select x*2;
var FirstHundred = from x in [ 1..100 ] where x*x > 3 select x*2;
Here you can see how a list can be used as the source of a query.
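For comparison, the same two lists can be built with Python's own list comprehensions. This sketch is not Archetype; it assumes that 1..100 is inclusive at both ends, which matches Enumerable.Range(1, 100), and it uses a separate name for the second list.

```python
# The list from 1 to 100 inclusive (Python's range excludes the upper bound)
first_hundred = list(range(1, 101))

# Equivalent of: from x in [ 1..100 ] where x*x > 3 select x*2
doubled = [x * 2 for x in range(1, 101) if x * x > 3]
```

Only x = 1 fails the x*x > 3 filter, so the second list holds the doubled values of 2 through 100.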
Here are some more list comprehension examples:
One of the gems in Pascal is subrange types. They allow a developer to define a new type that is structurally the same as another type, but whose values are constrained in some way. I’m often bothered by the disparity between database and .NET types. In a database, a string type (such as varchar) has a definite and usually small limit. In .NET, strings can be up to 2 GB, but there hasn’t been a good way in languages like C# and Visual Basic to constrain the length. In various object-relational mappers, a Size attribute is often employed, but this is only metadata and does nothing to prevent the string from becoming too large, so additional work must be carefully performed to constrain the input using control properties and validation logic.
Archetype answers this with subrange types and type constraints. Consider the following:
// an int that can only have a value from 0 to 105
type ValidAge int in [0..105];
We can now use this ValidAge type to define our class properties:
If a type is unlikely to be reused, we can also define subrange types anonymously.
Age int in [0..105];
In fact, any list comprehension can be used in a subrange type expression, including multiple ranges, as long as a single base type is used. This example shows an age property that is valid for underage and retiring age people, but is invalid for any ages in between.
Age int in [0..17, 65..105];
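To get a feel for what a multi-range subrange implies at runtime, here is a rough Python sketch of a property whose setter rejects values outside the allowed ranges. It is an illustration under assumptions, not how Archetype is implemented: the Person class is hypothetical, and ValueError is simply this sketch's choice of exception.

```python
class Person:
    # Mirrors "Age int in [0..17, 65..105]"; Python ranges exclude the upper bound
    _ALLOWED_AGES = (range(0, 18), range(65, 106))

    def __init__(self, age):
        self.age = age  # assignment goes through the checked setter below

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        # Reject the value unless it falls inside at least one allowed range
        if not any(value in r for r in self._ALLOWED_AGES):
            raise ValueError(f"age {value} is outside the allowed ranges")
        self._age = value
```

Assigning 17 or 65 succeeds, while assigning 30 raises and leaves the previous value in place.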
We can limit the length of a string simply:
Although in actual practice, it might make more sense to create several named types for various string lengths represented in a database:
type Code string#10;
type Name string#20;
type Summary string#100;
type Description string;
By using a limited number of named string types, both in your code as well as in the database, it’s much easier to update the lengths as needed with a lot less effort. Archetype adds attributes to the members using these types as well, so this data can be queried and used to inform user interface controls and validation logic, enabling a stronger model-driven approach.
The length of strings doesn’t need to be a single number representing the maximum, however. We can also specify a range of lengths.
Name string#2..3 = "ZZZ";
Notice in this example how an initializer must be used. This is because a value of null is actually invalid for the Name property. The minimum allowed length is 2. Not providing an initializer with a valid value produces a compiler error. If we wanted to also allow a null value, we would do so like this:
Name string?#2..3 = "ZZZ";
As with type constraint expressions, discussed in the next section, Archetype injects the appropriate runtime checks in the property setter before any explicitly specified setter code, and throws an OutOfRangeException if the value doesn’t match the specified type criteria.
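The idea of a check that runs on every assignment, before any other setter logic, can be approximated in Python with a descriptor. The sketch below is a loose analogy under stated assumptions: BoundedString and Record are hypothetical names, the nullable flag stands in for the ? operator, and ValueError replaces the exception Archetype would throw.

```python
class BoundedString:
    """Descriptor that checks length (and optionally None) on every assignment."""

    def __init__(self, min_len, max_len, nullable=False):
        self.min_len = min_len
        self.max_len = max_len
        self.nullable = nullable

    def __set_name__(self, owner, name):
        # Store the real value under a private attribute on the instance
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value is None:
            if not self.nullable:
                raise ValueError("None is not allowed for this member")
        elif not (self.min_len <= len(value) <= self.max_len):
            raise ValueError(f"length {len(value)} is outside "
                             f"{self.min_len}..{self.max_len}")
        setattr(obj, self.private_name, value)


class Record:
    name = BoundedString(2, 3)                     # like: Name string#2..3
    nickname = BoundedString(2, 3, nullable=True)  # like: Name string?#2..3

    def __init__(self):
        self.name = "ZZZ"     # a valid initial value is required here
        self.nickname = None  # allowed only because nickname is nullable
```

Because the check lives in the descriptor, every assignment to name or nickname is validated without repeating the logic in each class.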
Type Constraint Expressions
Related to subrange types, type constraints can be applied equally to named or anonymous types. They allow you to specify a Linq-like where clause that will be used to check values being assigned to properties at runtime. Because they rely on property setter methods, type constraints cannot be used on fields. Fortunately, local variables within methods are also implemented like properties by default, so type constraints are also valid on local variables.
I’ve noticed that for some brands, or some stores carrying those brands, only even-numbered sizes of pants are stocked. This example shows a type representing pants size that combines a subrange with a type constraint expression.
// an int that can only be even
type PantSize int in [0..60] where value % 2 == 0;
Using the modulus operator to obtain the remainder of division, we can be certain now that values of this type will only be even numbers. Type constraint expressions are allowed to call static or global functions, properties, and fields, but they cannot reference instance members.
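A subrange combined with a where clause amounts to two tests on each assigned value: membership in the range, then the predicate. A minimal Python sketch of that combination follows; the constrained helper is hypothetical, and raising ValueError is this sketch's assumption rather than Archetype's behavior.

```python
def constrained(allowed, predicate):
    """Return a checker that validates a value against a range and a predicate."""
    def check(value):
        # Both conditions must hold, as with "int in [0..60] where value % 2 == 0"
        if value not in allowed or not predicate(value):
            raise ValueError(f"{value} violates the type constraint")
        return value
    return check

# Even numbers from 0 to 60 inclusive (Python's range excludes the upper bound)
pant_size = constrained(range(0, 61), lambda value: value % 2 == 0)
```

Calling pant_size(32) returns 32, while pant_size(33) and pant_size(62) both raise.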
This article covered a lot of type fundamentals. It should be obvious at this point how a common thread is being woven into the Archetype language. You’ve probably noticed how almost every construct has named and anonymous counterparts. Another important theme is the ability to extend types with syntax designed to shape them, and the use of Linq-like expressions throughout the language.
There is still some type content to cover, such as variant or tagged union types and duck typing, which I’ll save for a future article. Also coming soon is my work on defining custom query comprehensions for a Linq-like query language which can be easily extended with a simple language feature, as well as operators for higher-order functions like fold, map, and others.
Work on my goal of getting a basic Archetype compiler into everyone’s hands is going slowly but steadily. I have a simple Silverlight IDE running in the cloud that parses Archetype code and will return a .NET assembly. I got the Oslo tools to work on the server, and I’m partially building the AST I need to perform semantic analysis, report compile errors to the user, and generate the C# code, which I’ll then compile with csc.exe. I’m using a WCF publish-subscribe pattern to initiate a build from the client and report progress as messages going back to the client. In the next few weeks for sure, and possibly sooner, I’ll post a link to that so you can give Archetype a test drive yourself.