Critical Development

Language design, framework development, UI design, robotics and more.

The Archetype Language (Part 6)

Posted by Dan Vanderboom on June 14, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow the news and progress on the Archetype compiler on Twitter @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

If Expressions

In Archetype, an if expression can be used anywhere a value is expected.  The expression variant of the if statement, instead of taking embedded statement clauses, takes value expressions for its "consequence" clauses.

The if expression serves the same purpose as the ternary conditional operator in C#:

TextField.PasswordChar = DisplayStar ? '*' else ' ';
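For readers who want something runnable today, here is a rough analogue in Python (a stand-in only, since Archetype has no public compiler yet). The function and parameter names are made up for illustration.

```python
# Python analogue of the Archetype if expression above.
# password_char and display_star are illustrative names, not Archetype API.
def password_char(display_star: bool) -> str:
    # The whole conditional is an expression, so it can be returned directly.
    return "*" if display_star else " "


print(repr(password_char(True)))
print(repr(password_char(False)))
```

As in Archetype, both branches are value expressions rather than statements, so the result can be assigned, returned, or passed as an argument.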

Enumerations

Enumerations are represented syntactically like lists in Archetype, using the square brackets to enclose values.  The idea is that an enumeration type is simply a list of possible values.

enum RainbowColor [ Red, Orange, Yellow, Green, Blue, Indigo, Violet ];

Enumerations are often formatted to display one value on each line, as the following example demonstrates.

enum RainbowColor
[
    Red,
    Orange,
    Yellow,
    Green,
    Blue,
    Indigo,
    Violet
];

Anonymous Enumeration Types

It normally makes sense for an enumeration type to be named so it can be referenced and used elsewhere.  But in cases where an enumeration is only needed privately within a single class or method, an anonymous enumeration type can be defined this way:

ForegroundColor enum [ Black, Gray, DarkBlue ];

Enumeration Assignment

Regardless of whether you’re working with named or anonymous enumerations, assignment is the same.  The enumeration type is not used in the assignment, which works well with anonymous enumerations since they don’t have a discoverable name.

// no need for an enumeration name,
// so it also works great with anonymous enumeration types
BackgroundColor = Green;

Language services (Intellisense) can still inform the user of the possible values after the equals sign and space are entered.
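The idea that an enumeration is just a list of possible values can be sketched in Python (used here only as a runnable stand-in for Archetype). The variable names are illustrative. Unlike Archetype, Python still requires the type name on assignment.

```python
from enum import Enum

# An enumeration built from a plain list of names, mirroring the idea that
# an enumeration type is simply a list of possible values.
RainbowColor = Enum(
    "RainbowColor",
    ["Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet"],
)

# Unlike Archetype, Python needs the type name on assignment.
background_color = RainbowColor.Green

# The members can be enumerated, which is roughly what a language service
# would do to offer completions after the equals sign.
possible_values = [color.name for color in RainbowColor]
print(background_color.name, possible_values)
```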

Nullable Types

The nullable type operator converts a type T to Nullable<T>, the same as in C#.  Consider the following examples of normal and bindable nullable properties, and a local variable with an initializer.

Age int?;

HighScore bindable int?;

var Age int? = null;

Additionally, we can define a local variable and infer its type from an assignment, using the nullable type operator to force type inference to use a nullable type.

// give me a nullable type, even though I’m not setting it to null
var Age ? = 4;

From this point on, we can assign values (including null) to the Age variable without using the nullable type operator.  In fact, including the ? operator would be invalid.

// update the value of Age; notice we don’t use the nullable ? symbol after the definition
Age = 9;
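A loose Python analogue (a stand-in, not Archetype) is to annotate the variable as optional when it is first defined and then assign to it freely. One caveat: Python only checks the annotation with an external type checker, whereas Archetype is described as enforcing the type itself. The describe function is a hypothetical helper.

```python
from typing import Optional

# Annotating with Optional[int] plays the role of "var Age ? = 4": the
# variable is declared nullable even though it starts with a non-null value.
age: Optional[int] = 4

# Later assignments carry no annotation, much like omitting ? in Archetype.
age = 9
age = None


def describe(value: Optional[int]) -> str:
    # Callers have to handle the null case explicitly.
    return "unknown" if value is None else str(value)


print(describe(age))
```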

Tuples

A tuple is an anonymous type consisting of an ordered set of heterogeneous fields. In Archetype, their fields can be named for Intellisense hinting when used as return types, or left unnamed. In local variable definitions, their individual members must either be named or use the anonymous member symbol, the underscore.

The following example shows the syntax for defining a tuple as a return type for a function.  In this case, a pair of int values will be returned.

GetMouseLocation (int, int) ()
{
    return (100, 50);
}

The members of a tuple in a return type can be named as a hint to the caller of the function.

GetMouseLocation (x int, y int) ()
{
    return (100, 50);
}

The function is called and its return tuple value stored like this:

var (x, y) = GetMouseLocation();

Here we’re defining a new tuple type, Tuple<int, int>, which is not named as a whole.  We might call it an anonymous tuple.  Instead of naming the whole, we’re naming the individual members.

Using the .NET Tuple type, we could also write this:

var loc = GetMouseLocation();

We would then have to reference loc.Item1 and loc.Item2 to access the individual members.  Naming the members instead of the whole, however, makes more sense because it provides greater code readability.

This next example demonstrates how tuples can be defined using type inference.

var (a, b) = (1, 2);

var (c, d) = (a, b);

var x = b;

On the first line, a and b are defined as accessors into a new tuple: a is assigned a value of 1, and b a value of 2.  On the second line, another tuple is defined; its member c is assigned the value of a, while d is assigned the value of b.  The third line demonstrates how you can use the tuple members independently of each other.  In this case, the value of b is assigned to x.

If we don’t care about all of the members of a tuple, we can use the underscore character to ignore that member.  The next example shows how to extract the x value from our GetMouseLocation function while ignoring the y value.

// we don’t care about the y value here

var (x, _) = GetMouseLocation();

Finally, we have a handy way of swapping values without the need to introduce a third variable.

(a, b) = (b, a);

Archetype is not limited to two-member tuples.  The .NET Framework defines tuples up to seven members, so Archetype will handle at least that many.  If that proves inadequate, it should be relatively easy to extend this to any number of members.
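The tuple operations in this section map fairly directly onto Python, so here is a small runnable sketch (Python as a stand-in, not Archetype). MouseLocation and get_mouse_location are hypothetical names, and the fixed return value mirrors the examples above.

```python
from typing import NamedTuple


class MouseLocation(NamedTuple):
    # Named members act as a hint to the caller, like (x int, y int).
    x: int
    y: int


def get_mouse_location() -> MouseLocation:
    # Hypothetical stand-in that returns a fixed position.
    return MouseLocation(100, 50)


x, y = get_mouse_location()        # name both members
x_only, _ = get_mouse_location()   # ignore the y value
loc = get_mouse_location()         # keep the tuple whole: loc.x or loc[0]

a, b = 1, 2
a, b = b, a                        # swap without a third variable
print(x, y, x_only, loc.x, loc[1], a, b)
```

Keeping the tuple whole and indexing it positionally works, but naming the members at the point of use reads better, which is the same trade-off described above for loc.Item1 and loc.Item2.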

Streams

I first read about streams (or lazy lists, as in Haskell) in a C Omega document on a Microsoft Research site.  They’re analogous to sequences in XQuery and XPath, and are implemented using the IEnumerable<T> type in an iterator.  I liked C Omega’s * operator to define a stream because of the way it sets that type apart from a normal type.  In C#, it’s not obvious that a function with a return type of IEnumerable<T> should behave any differently from another function until you notice the yield keyword.

If I wanted a stream defined as a property in C#, I’d have to write something like this:

IEnumerable<int> Numbers
{
    get
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }
}

In Archetype, the syntax is more succinct and direct:

Numbers int*
{
    yield 1;
    yield 2;
    yield 3;
}

We can be even more terse in such cases by using a comma in the yield list.

Numbers int*
{
    yield 1, 2, 3;
}

Using list comprehensions, which we’ll explore in more detail later in this article, we can do this as well:

Numbers int*
{
    yield 0..100 skip 5;
}

One note about the list comprehension here: we don’t use square brackets around the numeric range because they are implied in the yield statement.  Including them here would cause the yield statement to return the list as a single yielded value.

These examples produce streams with a fixed number of elements, but streams can be infinite as well.  This example returns all positive odd numbers starting with one.

OddNumbers int*
{
    def i = 1;

    loop
    {
        yield i;
        i += 2;
    }
}

Streams are lazy, so while it looks at first glance like an infinite loop from which you’ll never escape, in reality control is driven by the loop that accesses the stream.

loop (var n in OddNumbers)
{
    Console.WriteLine(n);

    if (n > 100)
        break;
}
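The same laziness can be observed with a Python generator, shown here as a runnable stand-in for the Archetype stream. The names are illustrative. The body of odd_numbers only advances when the consuming loop asks for the next value.

```python
from itertools import islice
from typing import Iterator


def odd_numbers() -> Iterator[int]:
    # Infinite stream: nothing runs until a consumer asks for a value.
    i = 1
    while True:
        yield i
        i += 2


# The consuming loop decides when to stop, as in the Archetype loop above.
seen = []
for n in odd_numbers():
    seen.append(n)
    if n > 100:
        break

# Taking a fixed number of items from the infinite stream is also safe.
first_five = list(islice(odd_numbers(), 5))
print(seen[-1], first_five)
```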

When other type operators are used, such as the nullable type operator, the stream operator must appear last.

Ages int?*
{
    yield 35;
    yield null;
}

List Comprehensions

Archetype provides some special syntax for constructing lists called list comprehensions.  This is syntactic sugar that provides shortcuts for building lists.

Consider the following syntax in C# and Archetype for constructing a list from 1 to 100.

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) select x;

// Archetype
var FirstHundred = [ 1..100 ];

The square brackets in Archetype specify the construction of a list.  Now consider a more complicated list construction.  In this case, Linq is employed in both languages:

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) where x*x > 3 select x*2;

// Archetype
var FirstHundred = from x in [ 1..100 ] where x*x > 3 select x*2;

Here you can see how a list can be used as the source of a query.
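For comparison, the two constructions above can be written as follows in Python (a runnable stand-in, not Archetype). This assumes [ 1..100 ] is inclusive at both ends, as the equivalent Enumerable.Range(1, 100) call suggests.

```python
# [ 1..100 ] is treated as an inclusive range, so Python's range needs 101.
first_hundred = list(range(1, 101))

# from x in [ 1..100 ] where x*x > 3 select x*2
doubled = [x * 2 for x in first_hundred if x * x > 3]

print(len(first_hundred), doubled[:3], len(doubled))
```

Only x = 1 fails the where clause, so the filtered list starts at 4 and has 99 elements.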

Here are some more list comprehension examples:

[Image: additional list comprehension examples]

Subrange Types

One of the gems in Pascal is subrange types.  These allow a developer to define a new type that is structurally the same as another type, but whose values are constrained in some way.  I’m often bothered by the disparity between database and .NET types.  In a database, a string type (such as varchar) has a definite and usually small limit.  In .NET, a string can grow to roughly 2 GB, but there hasn’t been a good way in languages like C# and Visual Basic to constrain the length.  In various object-relational mappers, a Size attribute is often employed, but this is only metadata and does nothing to prevent the string from becoming too large, so additional work must be carefully performed to constrain the input using control properties and validation logic.

Archetype answers this with subrange types and type constraints.  Consider the following:

// an int that can only have a value from 0 to 105

type ValidAge int in [0..105];

We can now use this ValidAge type to define our class properties:

Age ValidAge;

If a type is unlikely to be reused, we can also define subrange types anonymously.

Age int in [0..105];

In fact, any list comprehension can be used in a subrange type expression, including multiple ranges, as long as a single base type is used.  This example shows an age property that is valid for underage and retiring age people, but is invalid for any ages in between.

Age int in [0..17, 65..105];
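To make the multiple-range case concrete, here is a minimal Python sketch of a property that accepts only values inside one of several inclusive ranges. Python is a stand-in, and this is not how the Archetype compiler is implemented. Subrange and Person are hypothetical names, and ValueError is an assumed substitute for whatever exception Archetype raises.

```python
class Subrange:
    """Descriptor that only accepts values inside one of the given inclusive ranges."""

    def __init__(self, *ranges):
        self.ranges = ranges  # e.g. (0, 17), (65, 105)

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        # The check runs before the value is stored.
        if not any(low <= value <= high for low, high in self.ranges):
            raise ValueError(f"{value} is outside {self.ranges}")
        setattr(obj, self.private_name, value)


class Person:
    # Rough analogue of: Age int in [0..17, 65..105];
    age = Subrange((0, 17), (65, 105))

    def __init__(self, age):
        self.age = age


person = Person(16)
person.age = 70       # accepted
try:
    person.age = 30   # rejected: falls between the two ranges
except ValueError as error:
    print(error)
```

Because the rejected assignment raises before anything is stored, the property keeps its last valid value.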

We can limit the length of a string simply:

LastName string#30;

Although in actual practice, it might make more sense to create several named types for various string lengths represented in a database:

type Code string#10;
type Name string#20;
type Summary string#100;
type Description string;

LastName Name;

By using a limited number of named string types, both in your code as well as in the database, it’s much easier to update the lengths as needed with a lot less effort.  Archetype adds attributes to the members using these types as well, so this data can be queried and used to inform user interface controls and validation logic, enabling a stronger model-driven approach.

The length of strings doesn’t need to be a single number representing the maximum, however.  We can also specify a range of lengths.

Name string#2..3 = "ZZZ";

Notice in this example how an initializer must be used.  This is because a value of null is actually invalid for the Name property.  The minimum allowed length is 2.  Not providing an initializer with a valid value produces a compiler error.  If we wanted to also allow a null value, we would do so like this:

Name string?#2..3 = "ZZZ";

As with type constraint expressions—discussed in the next section—Archetype injects the appropriate runtime checks in the property setter before any explicitly specified setter code, and throws an OutOfRangeException if the value doesn’t match the specified type criteria.
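The injected setter check described here might look roughly like the following Python sketch. Python is a stand-in, and the shape of the generated code is my assumption rather than actual compiler output. check_length and Airport are hypothetical names, and ValueError stands in for the OutOfRangeException mentioned above.

```python
from typing import Optional


def check_length(value: Optional[str], low: int, high: int,
                 nullable: bool = False) -> Optional[str]:
    """Return value unchanged if it satisfies the length range, else raise."""
    if value is None:
        if nullable:
            return None
        raise ValueError("null is not allowed for this type")
    if not low <= len(value) <= high:
        raise ValueError(f"length {len(value)} is outside {low}..{high}")
    return value


class Airport:
    def __init__(self) -> None:
        # Analogue of: Name string#2..3 = "ZZZ"; the initializer must be valid.
        self._name = check_length("ZZZ", 2, 3)

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        # The injected check runs first; hand-written setter code would follow.
        self._name = check_length(value, 2, 3)


airport = Airport()
airport.name = "LA"
try:
    airport.name = "LOSANGELES"
except ValueError as error:
    print(error)
```

Passing nullable=True corresponds to the string?#2..3 form, where null is accepted but any non-null value must still satisfy the length range.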

Type Constraint Expressions

Related to subrange types, type constraints can be applied equally to named or anonymous types.  They allow you to specify a Linq-like where clause that will be used to check values being assigned to properties at runtime.  Because they rely on property setter methods, type constraints cannot be used on fields.  Fortunately, local variables within methods are also implemented like properties by default, so type constraints are also valid on local variables.

I’ve noticed that for some brands, or some stores carrying those brands, only even-numbered sizes of pants are stocked.  This example shows a type representing pants size that combines a subrange with a type constraint expression.

// an int that can only be even

type PantSize int in [0..60] where value % 2 == 0;


Using the modulus operator to obtain the remainder of division, we can be certain now that values of this type will only be even numbers.  Type constraint expressions are allowed to call static or global functions, properties, and fields, but they cannot reference instance members.
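Combining a subrange with a where clause amounts to two checks applied in sequence. The Python sketch below (a stand-in, not Archetype) builds such a checker from a range and a predicate. make_checker and pant_size are hypothetical names, and ValueError is an assumed exception type.

```python
from typing import Callable


def make_checker(low: int, high: int,
                 predicate: Callable[[int], bool]) -> Callable[[int], int]:
    """Build a validator for: int in [low..high] where predicate(value)."""

    def check(value: int) -> int:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}..{high}")
        if not predicate(value):
            raise ValueError(f"{value} fails the type constraint")
        return value

    return check


# Rough analogue of: type PantSize int in [0..60] where value % 2 == 0;
pant_size = make_checker(0, 60, lambda value: value % 2 == 0)

print(pant_size(32))
for bad in (33, 62):
    try:
        pant_size(bad)
    except ValueError as error:
        print(error)
```

The predicate here depends only on the value being assigned, which matches the rule that constraint expressions cannot reference instance members.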

Summary

This article covered a lot of type fundamentals.  It should be obvious at this point how a common thread is being woven into the Archetype language.  You’ve probably noticed how almost every construct has named and anonymous counterparts.  Another important theme is the ability to extend types with syntax designed to shape them, and the use of Linq-like expressions throughout the language.

There is still some type content to cover, such as variant or tagged union types and duck typing, which I’ll save for a future article.  Also coming soon is my work on defining custom query comprehensions for a Linq-like query language which can be easily extended with a simple language feature, as well as operators for higher-order functions like fold, map, and others.

Work on my goal of getting a basic Archetype compiler into everyone’s hands is going slowly but steadily.  I have a simple Silverlight IDE running in the cloud that parses Archetype code and will return a .NET assembly.  I got the Oslo tools to work on the server, and I’m part of the way through building the AST I need to perform semantic analysis, report compile errors to the user, and generate the C# code, which I’ll then compile with csc.exe.  I’m using a WCF publish-subscribe pattern to initiate a build from the client and report progress as messages going back to the client.  In the next few weeks for sure, and possibly sooner, I’ll post a link to that so you can give Archetype a test drive yourself.

[Part 7 of this series can be found here.]


14 Responses to “The Archetype Language (Part 6)”

  1. Hi again 🙂

    like the enums-thing.

    Conditional-Operator:
    You don’t like “?:”? agree that it is often very unreadable – but when the expressions are short, it’s nice.

    What about “??”?
    Have you thought about a spread-operator? In groovy you do: list*.name, which in c# would be the same as list.Select(e=>e.Name)

    Actually I like int? … why do you change it to int'?

    List comprehension is great! And I was about to suggest a shorthand for IEnumerable 🙂

    string[30] looks like an array of strings… consider “M”-syntax here: string#30

    Also for type-constraints you could support the “M”-way, which actually will change.

    Maybe also think of sets = unsorted lists. {14,15,16}=={15,16,14}

    Thought of overloading operators for lists?
    [1,2,3] == [1,2]+3
    [1,2] == [1,2,3]-3

    Do you yet have a parser for this stuff?

    • Dan Vanderboom said

      I use ? : all the time in C# but I notice that others avoid it; it does look a little cryptic. Actually, I buy that the ? represents a conditional, but I don’t like : as else.

      Even this would be better, I think, which is something of a compromise.
      def plan = IsEnabled ? PlanA else PlanB;

      ?? – I like the null coalescing operator, and an assignment version of it as well: ??=

      I chose ‘ instead of ? for nullable types because I was using ? as a Kleene operator in regular expressions, and was planning to use ? : at the time for conditional expressions (and may switch to using ? else now). My instinct was to avoid too much overloading of symbolic operators to avoid confusion. I’m somewhat on the fence about these decisions. In typing out a bunch of Archetype code, I’ve come to like the ‘ symbol on nullables, but I could go either way at this point. These decisions probably won’t be finalized until I’m writing and compiling a bunch of real code and get a better feel for how it works together.

      I was, in fact, just considering a spread operator! Though I didn’t know to call it that. C Omega overloads the . member access operator on streams to do the same thing in this paper. If you have a stream like this:

      virtual int* FromTo(int b, int e)
      {
      for (i = b; i <= e; i++) yield return i;
      }

      The following expression converts all 100 elements in the stream to a string, pads each string, and then returns a new stream of these padded strings.

      FromTo(1, 100).ToString().PadLeft(5)

      This is really cool, but I don't like the idea of giving up the ability to access members of the stream object itself. I had sketched in a : colon in my notes, which didn't seem too bad, but I really like the *. combination: the * asterisk suggests the operation's multiplicity, while the . dot allows me to continue building a path.

      You're right about string[30] overlapping with array syntax. I had a nagging feeling to write [..30] or better yet, to force a full range for string lengths like [0..30].

      Aesthetically, I'm not crazy about the M syntax of string#30. And there should definitely be some visual differentiation between arrays and length ranges. The base type, string, and our type constraint, the length range, are closely bound together and I'm used to assuming they should appear first, followed by array notation.

      string#30[5]
      string[30](5)
      string(30)[5]
      string{30}[5]
      string.30[5]
      string:30[5]

      However, when I brainstorm ideas, the array subscript looks more like a modifier of the 30 than the string. What happens when we indicate the array-ness of a type first?

      // "five string-thirties"
      LastName : [5] string:30;

      This makes even more sense when you consider type constraint expressions, such as this array/list of 10 peoples' pant sizes:

      PantSizes : [10] int where value % 2 == 0;

      • *. is from Groovy: http://groovy.codehaus.org/Operators#Operators-SpreadOperator%28.%29

        agree, true ? x else y is better… but ?: is used in every curly-style language 🙂

        Actually ? in int? also acts like a kleene-operator, right? 🙂

        Right, type-constraint belongs to the type, array comes after. But I don’t like the two “:” in LastName : [5] string:30;

        Is a short-hand really needed here? What about using “#” as count-operator, and imply value..:
        LastName : [5] string where #<30

      • Dan Vanderboom said

        @lars

        Yes, ?: is used in every curly-style language, but so is for(;;), which is an abomination. I’m okay with deviating a little for an improvement.

        Ha, int? is like a Kleene operator in a way! Good point! I think that’s enough justification for me to change it back. int? it is.

        I also don’t care for the second : in the type expression clause. I spent a good while staring at all the symbols on my keyboard and found little to help. The most promising alternatives (in order of preference) are * and then ‘, but I’m going to wear Oslo’s # for a few days and see if it doesn’t grow on me. Of course it could easily be done with the type constraint expression (where value.Length < 30), but that's such a long construction for a very simple and commonly-used string length constraint, which deserves its own super-shorthand. And using the # symbol as a shortcut in the where clause is too anomalous an exception for a specific, otherwise normal property.

        LastName : [5] string#30;

    • Dan Vanderboom said

      I’m also not sure what the relationship will be between arrays and lists. I obviously need to support arrays for .NET interoperability, but linked lists used by languages like F# are very powerful tools. Perhaps I’ll be able to convert between them for various bits of syntax and functionality.

      I have also been looking at set notation, especially in Pascal, and I’m likely to use the curly-brace syntax you mention with the appropriate operators.

      I like your operator overload for lists. Taking it a little further, nested lists could build matrices and matrix math operations could be performed on them.

      There is a very early prototype of the parser written in M on the Archetype CodePlex site that you can download and play with.

      • Right, thought of the array vs. lists thing, too. Actually, who needs arrays? But I see the problem.

        []int could be a list of ints, int* is an IEnumerable (or should it then be *int)?

        {}int would be a set.. ?? hmm … 🙂

        And the array had to be made explicit: arrayof(int)

      • Dan Vanderboom said

        @lars,

        I’m totally on board with you with the syntax for lists, sets, and streams (IEnumerables).

        Ages: []int;
        ColorSet : {}Color;
        OddNumbers : *int where value % 2 == 1
        { … }

        To provide room for type constraint expressions on the right, it makes sense to shift the stream * operator to the left, as with lists and sets.

        This brings up another issue. So far, I’ve talked about type expressions that constrain the property named in the definition. But within the current conversation, we’re making the assumption that the type constraint applies to the items within the collection. I’m tempted to go with this instinct since it came so automatically, but I’ll have to think about the consequences. After all, if what you want is a stream of even int values, you could write this:

        type EvenInt : int where value % 2 == 0;
        PantSizes : *EvenInt { … };

        The question I have now is: are there any useful constraints we might place on a collection type such as a list, set, or stream? Already, subrange types apply only to value types. Any examples I try to come up with involve constraining the value of one of the object’s members, but that doesn’t make sense because that constraint belongs on the member itself. If I reference a .NET assembly written in C#, I have no way of intercepting its property settings and the whole idea falls apart.

        I like the idea of being explicit with arrays. Array(T) is perfect! … (substitute parentheses for angle brackets, which annoyingly can’t appear in WordPress comments)

      • I only find an C#4.0-grammar… where is the parser for archetype?

      • Dan Vanderboom said

        @lars, my mistake: I haven’t included an Archetype parser yet in the download. I have a bunch of them in various states of experimentation, and wanted to clean one of them up with a combination of syntax before sharing it. Expect to see something in the next week or two.

      • You could integrate code-contracts using invariants for class member validations.

        Under these considerations I understand why “M” has the brackets around the types: {int*} and [int*]

        Hm, although I don’t like the M-syntax with the kleene operator inside the brackets, this seems quite nice:

        EvenNumbers: [int where value %2==0]#10;
        ColorSet : {Color};
        Ages : int*

        If you want to have a value ascribed to the type, you could use “as” or just a colon:

        [1,2] as [object]
        [1,2] : [object]

        In the initializer a equal-sign is ok:

        EvenNumbers: [int where value %2==0]#10 = [2,4,6];

      • Dan Vanderboom said

        Code contracts are really what I’ve been getting at with the type expressions and subrange types. If a property can only have a value that matches a given expression, it’s an invariant. If a function parameter type is likewise restricted, it’s an entry requirement. The only thing I don’t have in Archetype so far is a way to ensure the return value of a function. The question I’m considering with parameters is: will complex type expressions on parameters be a distraction, such that those requirements would be better expressed after the parameter list? And if so, should it always be expressed there for functions, or should that be up to the developer? I also considered writing type invariants near the type name, but you could have a lot of invariants, and the type header can already get pretty crowded as it is. It seems best to keep constraining expressions as close as possible to their targets.

        Whatever syntax emerges the victor, implementing them with the Code Contracts built into mscorlib.dll in .NET 4 (and in a separate assembly for previous versions) makes a lot of sense. There are tooling hooks that will read and act on that information, and developers will be better off having that.

  2. Actually, when using code-contracts you could remove some of the “NullReferenceExceptions” by making all types (not just values) non-nullable if they have no “?”… NullReferenceException should just not exist!

    For instance members the rule had to run after constructor has been called.

    • Dan Vanderboom said

      See, I would go the other way and make all types nullable by default, if I could get away with not imposing huge runtime penalties. As much as I hate running into a NullReferenceException, I still always want to say that a property’s value is unknown or as-yet-undetermined. In many scenarios, we can determine these values in the constructor; but equally often, objects can’t be constructed until later in the parent object’s life cycle.

      What’s called for is a less taxing way of dealing with potential null values. Considering the ubiquity of the null ref exception, I think it warrants consideration in the design of a language. Instead of this:

      if (Puzzle != null && Puzzle.Pieces.Count > 0)
      Scatter(Puzzle.Pieces);

      We could write something like this:

      if (Puzzle?.Pieces.Count > 0)
      Scatter(Puzzle.Pieces);

      Similar to Groovy’s safe navigation operator. The second example could be translated to the first example. You could move the ? to the right to indicate null checking should happen for all parts of the property path up to that point. Adding it to the end would add checking for every part of the path.

      if (Game.Puzzle.Pieces[0].Image.Height? == 50)
      … would be translated into:
      if (Game != null && Game.Puzzle != null && Game.Puzzle.Pieces != null && Game.Puzzle.Pieces[0] != null && Game.Puzzle.Pieces[0].Image != null && Game.Puzzle.Pieces[0].Image.Height == 50)

      You could get crazy and allow arguments to be annotated with ?, and only execute the function call if all such annotated arguments will not throw a null ref exception.
      Scatter(Puzzle.Pieces?, 15);

      That starts to get a little cryptic, and I worry about readability, but it’s something to think about. Maybe adding ? to the method name in the call makes it stand out enough.
      Scatter?(Puzzle.Pieces?, 15);

      The way you could read this last call is… Should I Scatter? That depends on whether Puzzle.Pieces is okay to evaluate.

