Overview
This is part of a continuing series of articles about a new .NET language under development called Archetype. Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs. A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.
You can follow the news and progress on the Archetype compiler on Twitter @archetypelang.
Links to the individual articles:
Part 1 – Properties and fields, function syntax, the me keyword
Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation
Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages
Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits
Part 5 – Type extensions, custom control structures
Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions
Part 7 – Semantic density, operator overloading, custom operators
Part 8 – Constructors, declarative Archetype: the initializer body
Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators
Conceptual articles about language design and development tools:
Semantic Density
As an avid reader growing up, I noticed that my knowledge and understanding of a topic grew more easily the faster I read. Instead of going through a chapter every day or two, which puts weeks or months between the front and back covers, I devoured 200-300 pages in a night, getting through the largest books in a couple days. And in reading multiple books on a subject back-to-back, it was easier to find relationships and tie together concepts for things that were still so fresh in my memory.
In my study of linguistics, I learned that legends like Noam Chomsky could learn hundreds of languages; the previous librarian at the Vatican could read 97. Bodmer’s excellent book The Loom of Language attempts to teach 10 languages at once, and it seems that the more languages you learn, the easier it is to pick up others.
What these examples have in common is semantic density. It might seem from what I’ve said that this would be like drinking from a firehose which only the most gifted could endure, but I would argue that intensely-focused learning puts our minds in a highly alert and receptive condition. In such a state, being able to draw more connections between statements and ideas, we are better able to comprehend the whole in a holistic, intuitive way.
Code Example
Semantic density is important in code, too. With a pattern like INotifyPropertyChanged, formatted as I have it below, it’s 12 lines of code, 13 if you separate your fields and properties with a blank line, but in the ballpark of a dozen lines of code. (This is additional explanation for a feature described in part 2 of this series.)
string _Display;
public string Display
{
    get { return _Display; }
    set
    {
        _Display = value;

        if (PropertyChanged != null)
            PropertyChanged(this, new PropertyChangedEventArgs("Display"));
    }
}
Does the ability to inform external code of changes seem like it should take a dozen lines to get there? This can be somewhat compressed by defining a SetProperty method:
string _Display;
public string Display
{
    get { return _Display; }
    set { SetProperty("Display", ref _Display, value); }
}
This chops the line count in half, bringing it down to six lines, or seven if you include a space above or below to separate it from other members. On my monitor, that means I can see about eight property definitions at a time. Now that’s usually enough, but I’ve written a few custom controls that have upwards of 30 properties. For a new pair of eyes, getting the gist of that class is going to involve a lot of scrolling, never seeing more than a small slice at a time of a much larger picture. The frame of time I mentioned earlier in regard to studying a subject is analogous here to the frame of space. Seeing 20% of a class at any time lends itself to faster grokking than seeing a 2% sliver at a time. Our minds, marvelous as they are, do have limits. Lowering semantic density, such as by spreading meaning over large distances or time spans, makes us work harder to accomplish the same task, trying to put all the pieces together, and the differences are often dramatic.
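The SetProperty method itself isn't shown in this article. A minimal C# sketch of such a helper might look like the following; the BindableBase and Person class names are made up for illustration, and unlike the longhand setter above, this version only raises PropertyChanged when the value actually changes.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical base class; the article does not show how SetProperty is defined.
public abstract class BindableBase : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Stores the new value and raises PropertyChanged, but only when the value differs.
    // Returns true when a change was recorded (an assumption, not from the article).
    protected bool SetProperty<T>(string propertyName, ref T field, T value)
    {
        if (EqualityComparer<T>.Default.Equals(field, value))
            return false;

        field = value;

        // Copy the delegate first so a concurrent unsubscribe can't null it mid-call.
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));

        return true;
    }
}

// Hypothetical consumer showing the six-line property from the text.
public class Person : BindableBase
{
    string _Display;
    public string Display
    {
        get { return _Display; }
        set { SetProperty("Display", ref _Display, value); }
    }
}
```

The helper moves the repeated part of the pattern into one place, but each property still needs its backing field, its accessor pair, and its name repeated as a string.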
Just as nouns can be modified by adjectives in natural languages, types in Archetype support user-defined type modifiers. By defining a new type modifier called bindable to encapsulate the INotifyPropertyChanged pattern, we can collapse the above example into a single line:
Age bindable int;
I have no problem stacking these properties right on top of one another. Although expressed in a highly dense form, it’s actually easier to understand at a glance in these three simple tokens than in the half-dozen lines above, which beg for interpretation to assemble their meaning. Even if we have 30 of these, they’d all fit on one screen, and the purpose of the class as a whole is quickly gathered.
One thing I noticed in developing the animation library Animate.NET is how much code I saved: not having to worry about the details of storyboard creation, key frames, and so on. It allows you to get right to the point of stating your intention. Often a library like this is enough, but once in a while language extensibility is a much better approach; and when it is, not having the option can be painful and time consuming.
Custom Operators & Operator Overloading
As in most languages, Archetype supports two forms of syntax for operations: functions and operators. Functions are invoked by including a pair of parentheses after their name that contain any arguments to pass in, whereas operators appear adjacent to or between sub-expressions.
In C#, some operators are available for overloading. Archetype supports these operator overloads by using the same names for them. This allows Archetype to use operators defined in C# and to expose supported operators to C# consumers.
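For reference, here is what "the same names" means on the C# side: the C# compiler emits an overloaded + as a static method named op_Addition, and any language that emits or looks up methods by those names can interoperate. The Money type below is invented for illustration, and how Archetype spells such a declaration is not shown here.

```csharp
using System;
using System.Reflection;

// Hypothetical value type used only to demonstrate the compiled operator name.
public struct Money
{
    public readonly decimal Amount;

    public Money(decimal amount) { Amount = amount; }

    // The C# compiler emits this as a public static method named op_Addition.
    public static Money operator +(Money left, Money right)
    {
        return new Money(left.Amount + right.Amount);
    }
}

public static class OperatorNameDemo
{
    // Looks the operator up by its compiled name via reflection.
    // Returns null if no such method exists.
    public static string AdditionMethodName()
    {
        MethodInfo method = typeof(Money).GetMethod(
            "op_Addition", BindingFlags.Public | BindingFlags.Static);
        return method == null ? null : method.Name;
    }
}
```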
However, Archetype goes one step further and allows you to define custom operators. There are three basic kinds of custom operator:
- unary prefix
- unary suffix
- binary
If we wanted an easy way to duplicate strings in C#, we might define an extension method called Dup, but in Archetype we also have this option:
// "ABC" dup 3 == "ABCABCABC"
binary dup string (left string, right int)
return string.Repeat(left, right);
The expression parser sees “ABC”, identifies it as a string value, and then looks at the next token. If the dot operator were found (.), it would look for a member of string or an extension member on string, but because the next token isn’t the dot operator, it looks up the token in the operator table. An operator called dup is defined with a left string argument and a right int argument, matching the expression. If the operator were more complicated, it would have a curly-brace code block, but because it’s a single return statement, that’s optional.
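For comparison, the C# extension method mentioned earlier could be sketched like this. As far as I know, .NET's string type has no built-in Repeat method, so the sketch builds the result with a StringBuilder; the StringExtensions class name and the argument checks are my own choices.

```csharp
using System;
using System.Text;

public static class StringExtensions
{
    // C# counterpart of the Archetype "dup" operator: repeats text count times.
    public static string Dup(this string text, int count)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        var builder = new StringBuilder(text.Length * count);
        for (int i = 0; i < count; i++)
            builder.Append(text);

        return builder.ToString();
    }
}
```

With this in scope, "ABC".Dup(3) produces "ABCABCABC", at the cost of the dot and the parentheses that the operator form avoids.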
Archetype operators aren’t limited to letters, though. We can also use symbols (but not numbers) in our operator names. Here’s a “long forward arrow” (compiled with name DashDashGreaterThan) that allows us to write a single function parameter before the function name itself:
// "Hey" --> Console.WriteLine;
binary<T> --> void (left T, right Action<T>)
right(left);
Note that the generic type parameter is attached to the binary keyword. I arrived at this placement through much experimentation. Names like --><T> are difficult to read and can be trickier to parse.
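C# can't declare an operator spelled with arbitrary symbols, so the closest equivalent there is a generic extension method. The PipeTo name below is made up for this sketch.

```csharp
using System;

public static class PipeExtensions
{
    // Hypothetical C# stand-in for the Archetype arrow operator:
    // passes the value on the left to the action on the right.
    public static void PipeTo<T>(this T left, Action<T> right)
    {
        if (right == null) throw new ArgumentNullException("right");
        right(left);
    }
}
```

A call such as "Hey".PipeTo(Console.WriteLine); reads in the same left-to-right order as the Archetype expression, but the method name and parentheses remain.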
There is a special binary operator called adjacent which you can think of as an “invisible operator” capable of inserting an operation between two sub-expressions. In the following example, two adjacent strings are interpreted as a concatenation of the two.
// "123" "45" == "12345"
binary adjacent string (left string, right string)
return left + right;
With custom operators, what was originally part of the language can now be defined in a library instead. This greatly simplifies the language. Just as methods can be shadowed to override them, so too will some ability be needed in Archetype to block or override operators that would otherwise be imported along with a namespace.
The next operator we’ll look at is the unary suffix. The example consists of units of time: minutes and seconds.
// 12 minutes == TimeSpan.FromMinutes(12)
unary suffix minutes TimeSpan (short, int)
return TimeSpan.FromMinutes((int)value);
// 3 seconds == TimeSpan.FromSeconds(3)
unary suffix seconds TimeSpan (short, int)
return TimeSpan.FromSeconds((int)value);
With support for extension properties, we could have also said 12.minutes or 3.seconds, which is already better than C#’s 12.minutes() and 3.seconds(), but by defining these tokens as unary suffix operators, we can eliminate even the dot operator and make it that much more fluent and natural to type (without losing any syntactic precision). Notice how a list of types is provided instead of a parameter list. Unary operators by definition have only a single argument, but we often want them to operate on several different types.
Here’s a floating point operator for seconds.
// 2.5 seconds == (2 seconds).Add(0.5 * 1000 milliseconds)
// WholeNumber and Fraction are extension properties on float, double, and decimal
unary suffix seconds TimeSpan (float, double, decimal)
return value.WholeNumber seconds + value.Fraction * 1000 milliseconds;
We can use the adjacent operator on TimeSpans the same way that we did for strings above.
// 4 minutes 10 seconds == (4 minutes).Add(10 seconds)
binary adjacent TimeSpan (left TimeSpan, right TimeSpan)
return left.Add(right);
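To make the comparison with C# concrete, here is roughly what the method-call version of these time units looks like. This is a sketch with names of my choosing (TimeSpanExtensions, Minutes, Seconds); the explicit + stands in for the adjacent operator, which C# has no way to express.

```csharp
using System;

public static class TimeSpanExtensions
{
    // Integer minutes, e.g. 4.Minutes()
    public static TimeSpan Minutes(this int value)
    {
        return TimeSpan.FromMinutes((double)value);
    }

    // Integer seconds, e.g. 10.Seconds()
    public static TimeSpan Seconds(this int value)
    {
        return TimeSpan.FromSeconds((double)value);
    }

    // Fractional seconds, e.g. 2.5.Seconds()
    public static TimeSpan Seconds(this double value)
    {
        return TimeSpan.FromSeconds(value);
    }
}
```

The expression 4.Minutes() + 10.Seconds() yields the same TimeSpan as 4 minutes 10 seconds, with two dots, two pairs of parentheses, and a plus sign that the Archetype form does without.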
Now let’s combine the use of a few of these operators into a single example.
alias min = minutes;
alias s = seconds;
var later = DateTime.Now + 2 min 15 s;
// Schedule is an extension method on DateTime
later.Schedule
{
// schedule this to run later
}
We can also define our DateTime without assigning its value to a variable.
(DateTime.Now + 10 seconds).Schedule
{
}
(DateTime.Now + 10 seconds).Schedule (Repeat=10 seconds)
{
}
Repeat is an optional parameter of Schedule, defaulting to TimeSpan.Zero, meaning “don’t repeat”.
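Only the call sites for Schedule appear above, so the following C# sketch is a guess at one way it could be implemented, here on top of System.Threading.Timer. The signature, the Timer return value, and the treatment of past dates (run immediately) are all assumptions, and the trailing code block becomes an ordinary lambda argument.

```csharp
using System;
using System.Threading;

public static class ScheduleExtensions
{
    // Hypothetical implementation: runs action at the given local time,
    // then again every repeat interval if one is supplied.
    public static Timer Schedule(this DateTime when, Action action,
                                 TimeSpan repeat = default(TimeSpan))
    {
        if (action == null) throw new ArgumentNullException("action");

        // Assumes when is expressed in local time; past times fire immediately.
        TimeSpan delay = when - DateTime.Now;
        if (delay < TimeSpan.Zero)
            delay = TimeSpan.Zero;

        // TimeSpan.Zero means "don't repeat"; Timer expresses that as an infinite period.
        TimeSpan period = repeat == TimeSpan.Zero ? Timeout.InfiniteTimeSpan : repeat;

        return new Timer(_ => action(), null, delay, period);
    }
}
```

The caller should hold on to the returned Timer (and dispose it when done), since a timer that is no longer referenced can be garbage collected before it fires.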
Additional Notes
When more than one operator is valid in a given position, the most specific operator (in terms of its parameter types) is used. If there’s any ambiguity or overlap remaining, a compiler error is issued.
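The rule above can be sketched as a lookup over a table of operator signatures. This is not the Archetype compiler's code: the type names are invented, "most specific" is approximated with Type.IsAssignableFrom (which ignores implicit numeric conversions such as short to int), and the compiler error is modeled as an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one binary operator overload.
public sealed class BinaryOperatorSignature
{
    public readonly string Name;
    public readonly Type Left;
    public readonly Type Right;

    public BinaryOperatorSignature(string name, Type left, Type right)
    {
        Name = name; Left = left; Right = right;
    }

    // True when operands of the given types can be passed to this overload.
    public bool Accepts(Type left, Type right)
    {
        return Left.IsAssignableFrom(left) && Right.IsAssignableFrom(right);
    }

    // True when both parameters are at least as specific as the other's
    // and the two signatures are not identical.
    public bool IsMoreSpecificThan(BinaryOperatorSignature other)
    {
        bool same = Left == other.Left && Right == other.Right;
        return !same
            && other.Left.IsAssignableFrom(Left)
            && other.Right.IsAssignableFrom(Right);
    }
}

public static class OperatorResolver
{
    // Picks the single most specific applicable overload, or throws if
    // nothing applies or no candidate beats all of the others.
    public static BinaryOperatorSignature Resolve(
        IEnumerable<BinaryOperatorSignature> table, string name, Type left, Type right)
    {
        var applicable = table
            .Where(op => op.Name == name && op.Accepts(left, right))
            .ToList();
        if (applicable.Count == 0)
            throw new InvalidOperationException("No operator '" + name + "' matches these operand types.");

        var best = applicable
            .Where(candidate => applicable.All(other =>
                ReferenceEquals(candidate, other) || candidate.IsMoreSpecificThan(other)))
            .ToList();
        if (best.Count != 1)
            throw new InvalidOperationException("Operator '" + name + "' is ambiguous for these operand types.");

        return best[0];
    }
}
```

Given overloads of dup taking (string, int) and (object, int), a string operand resolves to the first; given (object, string) and (string, object), two string operands are ambiguous because neither overload is more specific than the other.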
Unary operators will take precedence over binary operators, but it hasn’t been determined yet what precedence either one will actually have in relation to all of the other operators, or whether this will be specified in the operator definition.
Because of this design for custom operators, I’ve been able to remove things from the language itself and include them as operator definitions in a library.
Summary
This article provided a deeper explanation of previously covered material and introduced Archetype’s very powerful custom operator declaration syntax. The next article will cover some special operators built into Archetype, along with property path syntax, which is something I came up with a while back to safely reference identifiers in a way that would be impervious to both refactoring and obfuscation.
I’m curious to read your feedback on custom operators in particular, so keep the great comments coming!