Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 1)

Posted by Dan Vanderboom on April 26, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

Experiments in Language Design

After 25 years of computer programming in many different languages and a more-than-casual interest in linguistic analysis, I’ve developed a keen appreciation of the best features among them.  I also have a relatively steady stream of new language feature ideas.  Many years ago I began tinkering with interpreters and compilers.  At PDC 2009, I was surprised and delighted to hear the news of the language M (part of Oslo), with which it is possible to write parsers for other languages.  Parser generators have been around for a long time, but with such strong support in .NET, it was close enough to home for me to sit up and pay close attention.  The desire to create my own language to address the shortcomings I’ve experienced has been perpetually in the back of my mind.

After that PDC, I bought several more books on language and compiler design and began diving in.  I’ve been somewhat obsessed with it recently, and the language specification I’m writing is starting to look legitimate, so I think it’s time to start sharing what I’ve come up with and (hopefully) get some good feedback.

The Language

The code name for this language is Archetype.  I don’t know if I’ll use this for the final name, but it will do for now.  It’s a multi-paradigm language rather than one that attempts to be pure in any one way, and if you’re familiar with C#, you should be pretty comfortable with the syntax.  Yet, if you enjoy the functional programming power of languages like ML, Haskell, OCaml, F#, or Nemerle, or are interested in language constructs to simplify asynchronous and concurrent workflows, you’ll probably like Archetype.  While it supports functional programming, one of my goals is to make it appealing and even obvious to developers without a strong functional programming background.  It targets the .NET CLR, so it will run on many platforms and devices and interoperate well with existing .NET assemblies.

To place it in a set of buckets, as languages are classified by paradigm on Wikipedia, Archetype would be considered: imperative, declarative, generic, functional, object-oriented (class-based), language-oriented, reflective, and meta-programming-based.

Current Status

The parser is under development using M (in the Intellipad editor).  Though the language design and specification itself is about 70-80% complete, the parser is only about 10% done.  Once the parser is a little further along and some interesting samples can be written and parsed, I’ll start building the semantic analyzer and code generation pieces.

The first versions of the compiler will generate C# code instead of IL instructions.  It will be a lot faster for me to translate Archetype constructs to C#.  The C# compiler, though not concurrent or incremental, is highly optimized and produces great output.  This does limit my ability to depart radically from C# semantics, but this is okay: C# is a wonderful language and I plan to keep Archetype pretty closely aligned with it.  For example, all of C#’s operators and precedence rules are carried over unchanged.  Archetype does introduce a number of new operators, keywords, and syntactical constructs, but it aims to be close to a superset of C# as far as semantics go.

Disclaimer

Everything is subject to change.  Some features are stolen directly from specific languages which I will do my best to identify as I go.  Your mileage may vary.  Available while quantities last.  Batteries not included.

This is a set of experiments.  Hopefully it will also be a fun conversation among language enthusiasts.

A Taste of Features

Since this article is already getting long and I have a ton designed already, I’m going to keep the language design part short and present a mere taste of language features.

Properties and Fields

We’ll start with something basic: how to define properties and fields.

Age int;

This first example is a property.  The name comes first, which you’ll see everywhere in Archetype.  Also notice that the int type is the same as in C#.  This is true of all the built-in C# types.

To define a field, the “field” keyword is added before the type.  This encourages property definition by default.

Age field int;

The property definition above is short for, but equivalent to, this:

Age int
{
    get { return me.Value; }
    set { me.Value = value; }
}

When defining properties, it’s so common to require a private “backing field” that I thought it warranted something in the language.  C# also does this with auto-implemented properties, but only if you use implicit get and set functions.  As soon as you need custom logic for one or the other, you lose this.  In Archetype, the “me” keyword refers to the current function or property.  In the case of properties, me.Value is the backing field, which saves you a line of code for every property that needs one.  Reducing code clutter and maximizing information density and conciseness are major design goals in Archetype.
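To make the saved boilerplate concrete, here is a rough sketch of the hand-written alternative.  Archetype can’t be run yet, so it is written in TypeScript, chosen only because its get/set accessors resemble C#’s.  The class name Person, the private field _age, and the range check are all hypothetical.  It shows the explicit backing field an accessor needs once the setter carries custom logic, which is the declaration me.Value is meant to make unnecessary.

```typescript
// Hypothetical example: a property whose setter needs custom logic,
// so the backing field has to be declared by hand.
class Person {
    // The explicit backing field that Archetype's me.Value would replace.
    private _age: number = 0;

    get age(): number {
        return this._age;
    }

    set age(value: number) {
        // Custom logic (an assumed validation rule, for illustration only).
        if (value < 0) {
            throw new RangeError("age must not be negative");
        }
        this._age = value;
    }
}

const p = new Person();
p.age = 42;
console.log(p.age); // 42
```

Every property written this way repeats the same pattern: one extra field declaration plus two accessors that refer to it by name.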

Other “me” properties are available as well, such as Name and Type, which are useful for general-purpose code generation and debugging.  In functions, invoking “me” calls the current function recursively.  C# has the keyword “this” (which Archetype shares), which very usefully refers to the current object.  The “me” keyword is roughly analogous to this.GetType(), in that it exposes information about the current function or property much as GetType() does for the current object.

The curly braces surrounding get and set are optional.  If a get or set method is a single statement, the curly braces around it are also optional.  We could then write this:

Age int
    get me.Value,
    set me.Value = value;

Note the lack of a “return” keyword in the get method: “get return” would be redundant.  Also, the get and set clauses are separated by a comma.  This is a common pattern for multiple clauses in Archetype.  The semicolon triggers the end of the statement (in this case, a property declaration statement).

Public class variables are properties by default to promote consistent and forward-looking design techniques (such as compatibility with interfaces), and the field keyword is there to opt out when there is a need (such as performance).

Value types in this language are not nullable by default. The question mark can be used after the value type’s name to indicate it is nullable.

Age int?;

Functions

I’ll have much more to say about functions and delegates in my next article.  Here, I’ll just briefly sketch an outline of what they look like and hint at what’s to come.

Save<T> void (Entity T)
{
    // …
}

As with properties, the identifier is listed first (along with a generic type parameter), followed by the return type, and finally the parameters in parentheses.

I’ve been comfortable with the type-first definitions in C# for years, but I’ve often begun writing a function whose name came to mind instantly, but whose return type required further thought.  However, my fingers would hesitate to type anything until I could determine the return type.  After seeing the name come first in Nemerle, it struck me how nice it would be to define functions name-first.  The problem that Nemerle has (and Visual Basic, for that matter) is that the parameter list comes next, and the return type is listed last.  This has the advantage that it’s easier to write, but suffers from being more difficult to read.  When doing a quick scan of code, eyes scanning down through a class, the return types will be all over the place on the screen.  In the case of long parameter lists where poorly-formatted code puts the return type off the screen too far to the right, you’d actually have to scroll right to see the return type.  This is decidedly worse than type-first.
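As a rough illustration of that readability problem, TypeScript happens to use the same name, parameters, return type order described above for Nemerle and Visual Basic.  The two functions below are hypothetical and exist only to show how the return type lands in a different column on every header.

```typescript
// Hypothetical functions; only the position of the return type matters here.
function findName(id: number): string {
    return "user" + id;
}

function countMatches(haystack: string, needle: string, ignoreCase: boolean): number {
    const h = ignoreCase ? haystack.toLowerCase() : haystack;
    const n = ignoreCase ? needle.toLowerCase() : needle;
    // Number of non-overlapping occurrences of n in h.
    return n.length === 0 ? 0 : h.split(n).length - 1;
}

console.log(findName(7));                        // user7
console.log(countMatches("Banana", "AN", true)); // 2
```

Scanning down the left edge shows the names but not the types: string sits near column 30 on the first header and number sits near column 80 on the second.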

Then I thought: why not put the two most important parts of a function header first: name and return type, with parameters after them?  Then you’d have the best of both worlds: faster to write, and easy to read and understand.  Each parameter then follows the “name type” order, consistent with properties and functions.

Next Steps

In my next article on Archetype, I’ll go into much more detail about functions and delegates, where I think Archetype makes some original contributions (at least in terms of syntactical convenience and elegance).  I’ll talk about creating basic console applications, the simplest program possible to write, anonymous functions and anonymous delegates, a keyword (actually a custom type extension) to drastically simplify data-bindable property definitions for UI view models, and more.

[Part 2 of this series can be found here.]


13 Responses to “The Archetype Language (Part 1)”

  1. […] more here: New Language Code-named “Archetype” « Critical Development If you enjoyed this article please consider sharing […]

  2. I practice language design myself. I’m interested in how you will integrate with the IDE. Do you plan on working within the project build system? Supporting syntax highlighting? Intellisense?

    I’m also intrigued by your teaser on supporting data binding in the language. I would encourage you to think carefully about simply implementing INotifyPropertyChanged. Proper data binding requires dependency management. It is deeper than just firing an event when a property is set. The code has to understand indirect dependencies.

    I’ve solved the problem at the library level, but you have even better opportunities at the language level. Spend some time with Update Controls (http://updatecontrols.codeplex.com) to see what’s possible.

    • Dan Vanderboom said

      I’m planning on using the Managed Language Service (MLS). Yes, I plan to use the project build system, syntax highlighting, code completion, list members, and all the other language services that make Visual Studio such a productive environment.

      Thanks for the warning about bindable properties. I was beginning with an INotifyPropertyChanged implementation, but now I’ll take a closer look at your library before making that commitment. Ultimately what I care about is the syntax and behavior. I’m not attached to any particular implementation at this point.

      Great feedback!

  3. Adam Salvo said

    This is looking pretty good so far. I like the similarity to c# which would make it easier to try this out eventually (spend more time evaluating the new features, less time figuring out how to do CS101 stuff).

    I like the property by default. Will there be shorthand notation for doing a public getter, but a private/protected setter?

  4. Vlad said

    Why do you need this language?
    What is its goal?

    • Dan Vanderboom said

      I don’t need it. There are plenty of perfectly usable languages out there.

      That being said, I want it. I want to spend a lot less time with ceremony and more with substance. I want greater expressive power without sacrificing readability. I want to extend the language syntax and hook into the compiler at certain key points to experiment with new ideas without having to version the base compiler. I want to define traits as composable types and reserve classes for engines of instantiation. I want common concurrency patterns to feel like first-class citizens so behavior guarantees can be made at compile time. I also want to see if all the language ideas I’ve come up with over the years will really be as valuable as I think they would be.

      I have a lot more reasons, too. They’ll be the subject of continued articles in the series.

      • Vlad said

        Maybe it would be better to join the Nemerle developers?

      • Dan Vanderboom said

        Vlad, I actually started down the path of learning and extending Nemerle, but I ultimately decided that I need complete control of all the language primitives to pull off some of my ideas.

      • Vlad said

        We (the Nemerle developers) plan to start development of a new version of Nemerle. In this version we plan to remove all restrictions on changing the syntax.

        We plan to use PEG (http://en.wikipedia.org/wiki/Parsing_expression_grammar) for the new-generation macro system.

      • Dan Vanderboom said

        Parsing Expression Grammars look very cool. Are you going to implement a packrat parser for performance? I’m assuming the base grammar will be a PEG as well, and that modifications (additions, etc.) will simply modify the PEG tree? I read Bryan Ford’s (from MIT) paper, “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation”, which makes a heck of a lot of sense. Especially where he explains that a CFG is a rule system for generating language strings, whereas what we really want is a rule system for recognizing language strings.

        When I was reading this, I imagined a master language that was expressive enough to let you make any modifications to the syntax tree, as well as to supply compiler hooks to make semantic checks and to generate code (in the same core language). It would also allow you to “turn off” its meta-language capabilities within a language definition file on a granular level. The result is really a language workbench, with which it would be much easier to define and experiment with new languages as incremental sets of features.

        What are the ultimate goals for Nemerle?

  5. Hi Dan!

    Love it so far! C#, while a nice language, is full of shortcomings.

    BTW, “M” will probably have a Language Service-Adapter when it releases: https://connect.microsoft.com/oslo/feedback/details/524879/custom-m-langauges-should-be-exposed-to-visual-studio-language-services

    -Lars

  6. Cyril said

    Hi Dan,

    Archetype’s design path looks pretty nice to me so far. Keep it up 🙂 Just saw that PEGs were mentioned several times. So, just thought you might want to have a glance at my own experiments with them:

    http://code.google.com/p/ysharp/

    (PEG implementation via a fluent interface in C#, with grammar patterns that are mutable… at parse time (sisi!), for dynamic grammar refinements depending on the input being parsed… 🙂)

    My motivations are not quite the same as yours, but, thus, just in case you’d find it helpful/inspiring for your needs.

    Cheers!
    CJ
