Critical Development

Language design, framework development, UI design, robotics and more.


Language Design: Complexity, Extensibility, and Intention

Posted by Dan Vanderboom on June 14, 2010

Introduction

The object-oriented approach to software is great, and that greatness draws from the power of extensibility.  That we can create our own types, our own abstractions, has opened up worlds of possibilities.  System design is largely focused on this element of development: observing and repeating object-oriented patterns, analyzing their qualities, and adding to our mental toolbox the ones that serve us best.  We also focus on collecting libraries and controls because they encapsulate the patterns we need.

This article explores computer languages as a human-machine interface, the purpose and efficacy of languages, complexity of syntactic structure, and the connection between human and computer languages.  The Archetype project is an ongoing effort to incorporate these ideas into language design.  In the same way that some furniture is designed ergonomically, Archetype is an attempt to design a powerful programming language with an ergonomic focus; in other words, with the human element always in mind.

Programming Language as Human-Machine Interface

A programming language is the interface between the human mind and executable code.  The point isn’t to turn human programmers into pure mathematical or machine thinkers, but to leverage the talent that people are born with to manipulate abstract symbols in language.  There is an elite class of computer language experts who have trained themselves to think in terms of purely functional approaches, low-level assembly instructions, or regular, monotonous expression structures, and this is necessary for researchers pushing themselves to understand ever more.  For the everyday developer, however, a more practical approach is required.

Archetype is a series of experiments to build the perfect bridge between the human mind and synthetic computation.  As such, it is based as much as possible on a small core of extensible syntax and maintains a uniformity of expression within each facet of syntax that the human mind can easily keep separate.  At the same time, it honors syntactic variety and is being designed to shift us closer to a balance where all of the elements, blocks, clauses and operation types in a language can be extended or modified equally.  These represent the two most important design tenets of Archetype: the intuitive, natural connection to the human mind, and the maximization of its expressive power.

These forces often seem at odds with each other, and at first glance impossible to reconcile.  Yet experience has shown that the languages we use are limited in ways that often surprise us, which suggests that processes such as analogical extension are at work in our minds but are not fully leveraged by those languages.

Syntactic Complexity & Extensibility

Most of a programming language’s syntax is highly static, and just a few areas (such as types, members, and sometimes operators) can be extended.  Lisp is the most famous example of a highly extensible language with support for macros which allow the developer to manipulate code as if it were data, and to extend the language to encode data in the form of state machines.  The highly regular, parenthesized syntax is very simple to parse and therefore to extend… so long as you don’t deviate from the parenthesized form.  Therefore Lisp gets away with powerful extensibility at the cost of artificially limiting its structural syntax.

In Lisp we write (+ 4 5) to add two numbers, or (foo 1 2) to call a function with two parameters.  Very uniform.  In C we write 4 + 5, because the infix operator is what we grew up seeing in school, and we use a different syntax for calling a function, foo(1, 2), which gives the reader’s brain visual cues that the function is qualitatively different from a basic math operation and that its name is distinct from its parameters.

Think about syntax features as visual manifestations of the abstract logical concepts that provide the foundation for all algorithmic expression.  A rich set of fundamental operations can be obscured by a monotony of syntax or confused by a poorly chosen syntactic style.  Archetype involves a lot of research in finding the best features across many existing languages, and exploring the limits, benefits, problems, and other details of each feature and syntactic representation of it.

Syntactic complexity provides greater flexibility, and wider channels with which to convey intent.  This is why people color code file folders and add graphic icons to public signage.  More cues enable faster recognition.  It’s possible to push complexity too far, of course, but we often underestimate what our minds are capable of when augmented by a system of external cues which is carefully designed and supported by good tools.

Imagine if your natural spoken language followed such simple and regular rules as Lisp: although everyone would learn to read and write easily, conversation would be monotonous.  Extend this to semantics, for example with a constructed spoken language like Lojban which is logically pure and provably unambiguous, and it becomes obvious that our human minds aren’t well suited to communicating this way.

Now consider a language like C, with its 15 levels of operator precedence designed to match programmers’ expectations (although its authors admitted to getting some of this “wrong”, which reinforces the point).  C has given rise to very popular derivatives (C++, C#, Java), all of which are easily learned despite their syntactic complexity.

Natural languages and old world cities have grown with civilization organically, creating winding roads and wonderful linguistic variation.  These complicated structures have been etched into our collective unconscious, stirring within us and giving rise to awareness, thought, and creativity.  Although computers are excellent at processing regular, predictable patterns, it’s the complex interplay of external forces and inner voices that we’re most comfortable with.

Risk, Challenge & Opportunity

There are always trade-offs.  By focusing almost all extensibility in one or two small parts of a language, semantic analysis and optimization are easier to develop and faster to execute.  Making other syntactic constructs extensible, if one isn’t careful, can create complexity that quickly spirals out of control, resulting in unverifiable, unpredictable and unsafe logic.

The way this is being managed in Archetype so far isn’t to allow any piece of the syntax tree to be modified, but rather to design regions of syntax with extensibility points built in.  Outputting C# code as an intermediary (for now) places much of the burden of ensuring safety on the C# compiler.  The cost of more computationally expensive semantic analysis and code generation can also be offset by taking advantage of both multicore and cloud-based processing.  What helps keep things in check is that potential extensibility points are being considered in the context of specific code scenarios and desired outcomes, based on over 25 years of real-world experience, not a disconnected sense of language purity or design ideals.

Creating a language that caters to the irregular texture of thought, while supporting a system of extensions that are both useful and safe, is not a trivial undertaking, but it also holds the greatest potential.  The more that computers can accommodate people, instead of forcing people to cater to machines, the better, at least to the extent that this still lets us specify our designs unambiguously, which is somewhat unnatural for the human mind and will always require some training.

Summary

So much of the code we write is driven by a set of rituals that, while they achieve their purpose, often beg to be abstracted further away.  Even when good object models exist, they often require intricate or tedious participation to apply (see INotifyPropertyChanged).  Having the ability to incorporate the most common and solid of those patterns into language syntax (or extensions which appear to modify the language) is the ultimate mechanism for abstraction, and goes furthest in minimizing development effort.  By obviating the need to write convoluted yet routine boilerplate code, Archetype aims to filter out the noise and bring one’s intent more clearly into focus.
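To make the ritual concrete, here is a small Python sketch (Python is used purely for illustration; this is not Archetype, and it only loosely mirrors .NET’s INotifyPropertyChanged).  ManualPerson repeats the compare, assign and notify steps by hand for its property, while the hypothetical notifying descriptor captures that pattern once so each property becomes a one-line declaration.  Observable, notifying and on_changed are illustrative names, not an existing API.

```python
from typing import Any, Callable, List


class Observable:
    """Hypothetical base class that keeps a list of change listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def on_changed(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def _notify(self, name: str) -> None:
        for listener in self._listeners:
            listener(name)


class ManualPerson(Observable):
    """The ritual: every property repeats the compare/assign/notify steps."""

    def __init__(self) -> None:
        super().__init__()
        self._name = ""

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        if value != self._name:
            self._name = value
            self._notify("name")


class notifying:
    """Descriptor that writes the ritual once and reuses it for any property."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name
        self.slot = "_" + name

    def __get__(self, obj: Any, objtype: Any = None) -> Any:
        if obj is None:
            return self
        return getattr(obj, self.slot, None)

    def __set__(self, obj: Any, value: Any) -> None:
        # Notify only when the stored value actually changes.
        if getattr(obj, self.slot, None) != value:
            setattr(obj, self.slot, value)
            obj._notify(self.name)


class Person(Observable):
    # Each notifying property is now a single declaration.
    name = notifying()
    age = notifying()


p = Person()
changes: List[str] = []
p.on_changed(changes.append)
p.name = "Ada"
p.name = "Ada"  # same value: no notification
p.age = 36
print(changes)  # ['name', 'age']
```

The descriptor is a library-level abstraction; the article’s point is that moving such a pattern into the language itself removes even the per-class setup.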

Posted in Archetype Language, Composability, Design Patterns, Language Extensions, Language Innovation, Linguistics, Metaprogramming, Object Oriented Design, Software Architecture

The Archetype Language (Part 6)

Posted by Dan Vanderboom on June 14, 2010

Overview

This is part of a continuing series of articles about a new .NET language under development called Archetype.  Archetype is a C-style (curly brace) functional, object-oriented (class-based), metaprogramming-capable language with features and syntax borrowed from many languages, as well as some new constructs.  A major design goal is to succinctly and elegantly implement common patterns that normally require a lot of boilerplate code which can be difficult, error-prone, or just plain onerous to write.

You can follow news and progress on the Archetype compiler on Twitter: @archetypelang.

Links to the individual articles:

Part 1 – Properties and fields, function syntax, the me keyword

Part 2 – Start function, named and anonymous delegates, delegate duck typing, bindable properties, composite bindings, binding expressions, namespace imports, string concatenation

Part 3 – Exception handling, local variable definition, namespace imports, aliases, iteration (loop, fork-join, while, unless), calling functions and delegates asynchronously, messages

Part 4 – Conditional selection (if), pattern matching, regular expression literals, agents, classes and traits

Part 5 – Type extensions, custom control structures

Part 6 – If expressions, enumerations, nullable types, tuples, streams, list comprehensions, subrange types, type constraint expressions

Part 7 – Semantic density, operator overloading, custom operators

Part 8 – Constructors, declarative Archetype: the initializer body

Part 9 – Params & fluent syntax, safe navigation operator, null coalescing operators

Conceptual articles about language design and development tools:

Language Design: Complexity, Extensibility, and Intention

Reimagining the IDE

Better Tool Support for .NET

If Expressions

In Archetype, an if expression can be used anywhere a value is expected.  Instead of taking embedded statements for its "consequence" clauses, as the if statement does, this expression variant takes value expressions.

The if expression serves the same purpose as the ternary conditional operator in C#:

TextField.PasswordChar = if (DisplayStar) '*' else ' ';
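For comparison, Python’s conditional expression shows the same idea in a language you can run today: if/else used as an expression that produces a value, rather than as a statement.  This is an illustrative sketch; the function password_char is a hypothetical stand-in for the assignment above.

```python
def password_char(display_star: bool) -> str:
    # A conditional *expression*: both branches are values, not statements,
    # so the whole thing can sit on the right-hand side of an assignment.
    return "*" if display_star else " "


print(repr(password_char(True)))   # '*'
print(repr(password_char(False)))  # ' '
```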

Enumerations

Enumerations are represented syntactically like lists in Archetype, using the square brackets to enclose values.  The idea is that an enumeration type is simply a list of possible values.

enum RainbowColor [ Red, Orange, Yellow, Green, Blue, Indigo, Violet ];

Enumerations are often formatted to display one value on each line, as the following example demonstrates.

enum RainbowColor
[
    Red,
    Orange,
    Yellow,
    Green,
    Blue,
    Indigo,
    Violet
];
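Python’s standard enum module offers a comparable "list of possible values" declaration through its functional API, which builds an enumeration directly from a list of member names.  This Python sketch is for comparison only: member values are assigned automatically starting at 1, and unlike the Archetype assignment syntax described in this article, Python requires the qualified name RainbowColor.Green.

```python
from enum import Enum

# Functional API: the enumeration is built from a plain list of names.
RainbowColor = Enum(
    "RainbowColor",
    ["Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet"],
)

background = RainbowColor.Green          # qualified name required in Python
print(background.name)                   # Green
print([c.name for c in RainbowColor][:3])  # ['Red', 'Orange', 'Yellow']
```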

Anonymous Enumeration Types

It normally makes sense for an enumeration type to be named so it can be referenced and used elsewhere.  But in cases where an enumeration is only needed privately within a single class or method, an anonymous enumeration type can be defined this way:

ForegroundColor enum [ Black, Gray, DarkBlue ];

Enumeration Assignment

Regardless of whether you’re working with named or anonymous enumerations, assignment is the same.  The enumeration type is not used in the assignment, which works well with anonymous enumerations since they don’t have a discoverable name.

// no need for an enumeration name,
// so it also works great with anonymous enumeration types
BackgroundColor = Green;

Language services (IntelliSense) can still show the user the possible values once the equals sign and a space are entered.

Nullable Types

The nullable type operator converts a type T to Nullable<T>, the same as in C#.  Consider the following examples of normal and bindable nullable properties, and a local variable with an initializer.

Age int?;

HighScore bindable int?;

var Age int? = null;

Additionally, we can define a local variable and infer its type from an assignment, using the nullable type operator to force type inference to use a nullable type.

// give me a nullable type, even though I'm not setting it to null now
var Age ? = 4;

From this point on, we can assign values (including null) to the Age variable without using the nullable type operator.  In fact, including the ? operator would be invalid.

// update the value of Age; notice we don't use the nullable ? symbol after the definition
Age = 9;

Tuples

A tuple is an anonymous type consisting of an ordered set of heterogeneous fields. In Archetype, their fields can be named for IntelliSense hinting when used as return types, or left unnamed. In local variable definitions, their individual members must either be named or use the anonymous member symbol, the underscore.

The following example shows the syntax for defining a tuple as a return type for a function.  In this case, a pair of int values will be returned.

GetMouseLocation (int, int) ()
{
    return (100, 50);
}

The members of a tuple in a return type can be named as a hint to the caller of the function.

GetMouseLocation (x int, y int) ()
{
    return (100, 50);
}

The function is called and its return tuple value stored like this:

var (x, y) = GetMouseLocation();

Here we’re defining a new tuple type, Tuple<int, int>, which is not named as a whole.  We might call it an anonymous tuple.  Instead of naming the whole, we’re naming the individual members.

Using the .NET Tuple type, we could also write this:

var loc = GetMouseLocation();

We would then have to reference loc.Item1 and loc.Item2 to access the individual members.  Naming the members instead of the whole, however, makes more sense because it provides greater code readability.
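The readability argument can be illustrated with Python’s typing.NamedTuple: positional access (loc[0], comparable to loc.Item1) versus named members.  In this sketch, get_mouse_location, get_named_mouse_location and MouseLocation are hypothetical stand-ins for GetMouseLocation, returning the same fixed values as the example.

```python
from typing import NamedTuple, Tuple


def get_mouse_location() -> Tuple[int, int]:
    # Unnamed members: callers must remember what each position means.
    return (100, 50)


class MouseLocation(NamedTuple):
    # Named members document intent, like (x int, y int) in the return type.
    x: int
    y: int


def get_named_mouse_location() -> MouseLocation:
    return MouseLocation(100, 50)


loc = get_mouse_location()
print(loc[0], loc[1])      # positional access, analogous to Item1/Item2

named = get_named_mouse_location()
print(named.x, named.y)    # readable member names

x, y = get_named_mouse_location()  # destructuring still works
```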

This next example demonstrates how tuples can be defined using type inference.

var (a, b) = (1, 2);
var (c, d) = (a, b);
var x = b;

On the first line, a and b are defined as accessors into a new tuple: a is assigned the value 1, and b the value 2.  On the second line, another tuple is defined whose member c is assigned the value of a and whose member d is assigned the value of b.  The third line demonstrates how you can use tuple members independently of each other: here, the value of b is assigned to x.

If we don’t care about all of the members of a tuple, we can use the underscore character to ignore that member.  The next example shows how to extract the x value from our GetMouseLocation function while ignoring the y value.

// we don't care about the y value here
var (x, _) = GetMouseLocation();

Finally, we have a handy way of swapping values without the need to introduce a third variable.

(a, b) = (b, a);
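Python’s tuple unpacking supports the same three patterns: defining several variables from a tuple, ignoring a member with an underscore, and swapping without a temporary variable.  A Python sketch for comparison; get_mouse_location is a hypothetical stand-in, and in Python _ is only a naming convention (it still binds a variable), whereas the text describes a true anonymous member symbol in Archetype.

```python
def get_mouse_location():
    # Hypothetical stand-in returning a fixed (x, y) pair.
    return (100, 50)


# Define a and b from a tuple, then c and d from the first pair.
a, b = (1, 2)
c, d = (a, b)
x = b  # the members are ordinary, independent variables

# Ignore the y member by convention.
mouse_x, _ = get_mouse_location()

# Swap: the right-hand tuple is built before either name is reassigned.
a, b = b, a
print(a, b, c, d, x, mouse_x)  # 2 1 1 2 2 100
```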

Archetype is not limited to two-member tuples.  The .NET Framework’s generic Tuple classes hold up to seven members directly (an eighth type parameter, TRest, nests a further tuple), so Archetype will handle at least that many.  If that proves inadequate, it should be relatively easy to extend this to any number of members.

Streams

I first read about streams (or lazy lists, as in Haskell) in a C Omega document on a Microsoft Research site.  They’re analogous to sequences in XQuery and XPath, and are implemented using the IEnumerable<T> type in an iterator.  I liked C Omega’s * operator to define a stream because of the way it sets that type apart from a normal type.  In C#, it’s not obvious that a function with a return type of IEnumerable<T> should behave any differently from another function until you notice the yield keyword.

If I want a stream defined as a property in C#, I’d have to write something like this:

IEnumerable<int> Numbers
{
    get
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }
}

In Archetype, the syntax is more succinct and direct:

Numbers int*
{
    yield 1;
    yield 2;
    yield 3;
}

We can be even more terse in such cases by listing the values, separated by commas, in a single yield statement.

Numbers int*
{
    yield 1, 2, 3;
}
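Python generators use the same yield keyword and are likewise lazy.  This Python sketch, for comparison only, mirrors the two Numbers examples; yield from delegates to another iterable and is used here as a rough analogue of the comma-separated yield list, not an exact equivalent.

```python
from typing import Iterator


def numbers() -> Iterator[int]:
    # One value per yield, like the multi-line Archetype example.
    yield 1
    yield 2
    yield 3


def numbers_terse() -> Iterator[int]:
    # Delegate to a tuple: each element is yielded in turn.
    yield from (1, 2, 3)


print(list(numbers()))        # [1, 2, 3]
print(list(numbers_terse()))  # [1, 2, 3]
```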

Using list comprehensions, which we’ll explore in more detail later in this article, we can do this as well:

Numbers int*
{
    yield 0..100 skip 5;
}

One note about the list comprehension here: we don’t use square brackets around the numeric range because they are implied in the yield statement.  Including them here would cause the yield statement to return the list as a single yielded value.

These examples produce streams with a fixed number of elements, but streams can be infinite as well.  This example returns all positive odd numbers starting with one.

OddNumbers int*
{
    def i = 1;
    loop
    {
        yield i;
        i += 2;
    }
}

Streams are lazy, so while it looks at first glance like an infinite loop from which you’ll never escape, in reality control is driven by the loop that accesses the stream.

loop (var n in OddNumbers)
{
    Console.WriteLine(n);
    if (n > 100)
        break;
}
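The laziness argument can be checked in Python, where generators behave the same way: execution pauses at each yield until the consumer asks for the next value.  A Python sketch; odd_numbers and consume_until_over_100 are hypothetical names mirroring OddNumbers and the consuming loop.

```python
from itertools import islice
from typing import Iterator, List


def odd_numbers() -> Iterator[int]:
    """Infinite, lazy stream of positive odd numbers."""
    i = 1
    while True:  # never exits on its own; the consumer decides when to stop
        yield i
        i += 2


def consume_until_over_100() -> List[int]:
    """Mirror of the consuming loop: print values, stop after passing 100."""
    seen: List[int] = []
    for n in odd_numbers():
        print(n)
        seen.append(n)
        if n > 100:
            break
    return seen


# Only five values are ever computed here.
print(list(islice(odd_numbers(), 5)))  # [1, 3, 5, 7, 9]
```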

When other type operators are used, such as the nullable type operator, the stream operator must appear last.

Ages int?*
{
    yield 35;
    yield null;
}
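Python’s type hints raise the same ordering question: Iterator[Optional[int]] describes a stream whose elements may be None (the analogue of int?*), while Optional[Iterator[int]] would describe a stream that may itself be None.  A Python sketch; the hints document intent but are not enforced at runtime.

```python
from typing import Iterator, Optional


def ages() -> Iterator[Optional[int]]:
    # Each element may be an int or None; the stream itself is never None.
    yield 35
    yield None


print(list(ages()))  # [35, None]
```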

List Comprehensions

Archetype provides special syntax for constructing lists, called list comprehensions.  This syntactic sugar provides shortcuts for building lists.

Consider the following syntax in C# and Archetype for constructing a list from 1 to 100.

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) select x;

// Archetype
var FirstHundred = [ 1..100 ];

The square brackets in Archetype specify the construction of a list.  Now consider a more complicated list construction.  In this case, Linq is employed in both languages:

// C#
var FirstHundred = from x in Enumerable.Range(1, 100) where x*x > 3 select x*2;

// Archetype
var FirstHundred = from x in [ 1..100 ] where x*x > 3 select x*2;

Here you can see how a list can be used as the source of a query.
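Python’s built-in list comprehensions express both constructions, with a trailing if clause playing the role of where and the leading expression the role of select.  An illustrative Python sketch; note that range excludes its upper bound, so range(1, 101) corresponds to 1..100.

```python
# Equivalent of [ 1..100 ]: range stops before 101, so 1 through 100.
first_hundred = [x for x in range(1, 101)]

# Filter (where x*x > 3), then project (select x*2).
doubled = [x * 2 for x in range(1, 101) if x * x > 3]

print(first_hundred[:3], len(first_hundred))  # [1, 2, 3] 100
print(doubled[:3], len(doubled))              # [4, 6, 8] 99
```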

[Image: more list comprehension examples]

Subrange Types

One of the gems in Pascal is subrange types.  They allow a developer to define a new type that is structurally the same as another type, but whose values are constrained in some way.  I’m often bothered by the disparity between database and .NET types.  In a database, a string type (such as varchar) has a definite and usually small limit.  In .NET, a string can grow to roughly a billion characters (the runtime’s 2 GB object size limit), but there hasn’t been a good way in languages like C# and Visual Basic to constrain the length.  In various object-relational mappers, a Size attribute is often employed, but this is only metadata and does nothing to prevent the string from becoming too large, so additional work must be carefully performed to constrain the input using control properties and validation logic.

Archetype answers this with subrange types and type constraints.  Consider the following:

// an int that can only have a value from 0 to 105
type ValidAge int in [0..105];

We can now use this ValidAge type to define our class properties:

Age ValidAge;

If a type is unlikely to be reused, we can also define subrange types anonymously.

Age int in [0..105];

In fact, any list comprehension can be used in a subrange type expression, including multiple ranges, as long as a single base type is used.  This example shows an age property that is valid for underage and retiring age people, but is invalid for any ages in between.

Age int in [0..17, 65..105];
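The runtime check that subrange types imply can be sketched in Python with a descriptor that validates in the setter.  This is an illustration under stated assumptions, not how Archetype generates code: Subrange, Person and DiscountEligible are hypothetical names, ranges are inclusive pairs, and Python’s ValueError stands in for the exception Archetype throws.

```python
from typing import Any, Tuple


class Subrange:
    """Hypothetical descriptor: an int restricted to one or more inclusive ranges."""

    def __init__(self, *ranges: Tuple[int, int]) -> None:
        self.ranges = ranges

    def __set_name__(self, owner: type, name: str) -> None:
        self.slot = "_" + name

    def __get__(self, obj: Any, objtype: Any = None) -> Any:
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj: Any, value: int) -> None:
        # The check runs in the setter, before the value is stored.
        if not any(lo <= value <= hi for lo, hi in self.ranges):
            raise ValueError(f"{value} is outside {list(self.ranges)}")
        setattr(obj, self.slot, value)


class Person:
    age = Subrange((0, 105))  # like: Age int in [0..105];


class DiscountEligible:
    age = Subrange((0, 17), (65, 105))  # like: Age int in [0..17, 65..105];


d = DiscountEligible()
d.age = 70  # accepted
try:
    d.age = 40  # rejected: falls between the two ranges
except ValueError as err:
    print(err)  # 40 is outside [(0, 17), (65, 105)]
```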

We can limit the length of a string simply:

LastName string#30;

Although in actual practice, it might make more sense to create several named types for various string lengths represented in a database:

type Code string#10;
type Name string#20;
type Summary string#100;
type Description string;

LastName Name;

By using a limited number of named string types, both in your code and in the database, it’s much easier to update the lengths as needed.  Archetype also adds attributes to the members that use these types, so this data can be queried and used to inform user interface controls and validation logic, enabling a stronger model-driven approach.

The length of strings doesn’t need to be a single number representing the maximum, however.  We can also specify a range of lengths.

Name string#2..3 = "ZZZ";

Notice in this example how an initializer must be used.  This is because a value of null is actually invalid for the Name property.  The minimum allowed length is 2.  Not providing an initializer with a valid value produces a compiler error.  If we wanted to also allow a null value, we would do so like this:

Name string?#2..3 = "ZZZ";
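The length and null rules can be sketched the same way in Python.  Python has no compile-time check, so this hypothetical BoundedString descriptor can only reject invalid values at runtime, and the class initializer supplies the valid default; Record, name and nickname are illustrative names.

```python
from typing import Any, Optional


class BoundedString:
    """Hypothetical descriptor: a string whose length must lie in min_len..max_len."""

    def __init__(self, min_len: int, max_len: int, nullable: bool = False) -> None:
        self.min_len = min_len
        self.max_len = max_len
        self.nullable = nullable

    def __set_name__(self, owner: type, name: str) -> None:
        self.slot = "_" + name

    def __get__(self, obj: Any, objtype: Any = None) -> Any:
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj: Any, value: Optional[str]) -> None:
        if value is None:
            # None is only valid when the type was declared nullable.
            if not self.nullable:
                raise ValueError("None is not valid for a non-nullable string")
        elif not self.min_len <= len(value) <= self.max_len:
            raise ValueError(
                f"length {len(value)} is outside {self.min_len}..{self.max_len}"
            )
        setattr(obj, self.slot, value)


class Record:
    name = BoundedString(2, 3)                     # like: string#2..3
    nickname = BoundedString(2, 3, nullable=True)  # like: string?#2..3

    def __init__(self) -> None:
        # No compile-time check here, so the initializer supplies valid values.
        self.name = "ZZZ"
        self.nickname = None
```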

As with type constraint expressions (discussed in the next section), Archetype injects the appropriate runtime checks into the property setter before any explicitly specified setter code, and throws an OutOfRangeException if the value doesn’t match the specified type criteria.

Type Constraint Expressions

Related to subrange types, type constraints can be applied equally to named or anonymous types.  They allow you to specify a Linq-like where clause that will be used to check values being assigned to properties at runtime.  Because they rely on property setter methods, type constraints cannot be used on fields.  Fortunately, local variables within methods are also implemented like properties by default, so type constraints are also valid on local variables.

I’ve noticed that some brands, or some stores carrying those brands, stock only even-numbered pant sizes.  This example defines a type representing pant size using both a subrange and a type constraint expression.

// an even int from 0 to 60
type PantSize int in [0..60] where value % 2 == 0;

Using the modulus operator to obtain the remainder of division, we can be certain now that values of this type will only be even numbers.  Type constraint expressions are allowed to call static or global functions, properties, and fields, but they cannot reference instance members.
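A where-style constraint is a predicate checked alongside the range.  In this Python sketch, the hypothetical Constrained descriptor takes an inclusive range and a where function; in keeping with the restriction described above, the predicate sees only the incoming value, not other instance members.  Pants and size are illustrative names.

```python
from typing import Any, Callable


class Constrained:
    """Hypothetical descriptor: inclusive int range plus a where-style predicate."""

    def __init__(self, lo: int, hi: int, where: Callable[[int], bool]) -> None:
        self.lo = lo
        self.hi = hi
        self.where = where

    def __set_name__(self, owner: type, name: str) -> None:
        self.slot = "_" + name

    def __get__(self, obj: Any, objtype: Any = None) -> Any:
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj: Any, value: int) -> None:
        # Both the subrange and the predicate must hold before storing.
        if not (self.lo <= value <= self.hi and self.where(value)):
            raise ValueError(f"{value} is not a valid value")
        setattr(obj, self.slot, value)


class Pants:
    # like: type PantSize int in [0..60] where value % 2 == 0;
    size = Constrained(0, 60, where=lambda value: value % 2 == 0)


pants = Pants()
pants.size = 32  # even and in range: accepted
```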

Summary

This article covered a lot of type fundamentals.  It should be obvious at this point how a common thread is being woven into the Archetype language.  You’ve probably noticed how almost every construct has named and anonymous counterparts.  Another important theme is the ability to extend types with syntax designed to shape them, and the use of Linq-like expressions throughout the language.

There is still some type content to cover, such as variant or tagged union types and duck typing, which I’ll save for a future article.  Also coming soon is my work on defining custom query comprehensions for a Linq-like query language which can be easily extended with a simple language feature, as well as operators for higher-order functions like fold, map, and others.

Work on my goal of getting a basic Archetype compiler into everyone’s hands is going slowly but steadily.  I have a simple Silverlight IDE running in the cloud that parses Archetype code and will return a .NET assembly.  I got the Oslo tools to work on the server, and I’m partway through building the AST I need to perform semantic analysis, report compile errors to the user, and generate the C# code that I’ll then compile with csc.exe.  I’m using a WCF publish-subscribe pattern to initiate a build from the client and report progress back to it as messages.  Within the next few weeks, and possibly sooner, I’ll post a link so you can give Archetype a test drive yourself.

[Part 7 of this series can be found here.]

Posted in Archetype Language, Functional Programming, Language Innovation