Archive for the ‘Metaprogramming’ Category

Language Design: Complexity, Extensibility, and Intention

Posted by Dan Vanderboom on June 14, 2010

Introduction

The object-oriented approach to software is great, and that greatness draws from the power of extensibility. That we can create our own types, our own abstractions, has opened up worlds of possibilities. System design is largely focused on this element of development: observing and repeating object-oriented patterns, analyzing their qualities, and adding to our mental toolbox the ones that serve us best. We also focus on collecting libraries and controls because they encapsulate the patterns we need.

This article explores computer languages as a human-machine interface, the purpose and efficacy of languages, complexity of syntactic structure, and the connection between human and computer languages. The Archetype project is an on-going effort to incorporate these ideas into language design. In the same way that some furniture is designed ergonomically, Archetype is an attempt to design a powerful programming language with an ergonomic focus; in other words, with the human element always in mind.

Programming Language as Human-Machine Interface

A programming language is the interface between the human mind and executable code. The point isn’t to turn human programmers into pure mathematical or machine thinkers, but to leverage the talent that people are born with to manipulate abstract symbols in language. There is an elite class of computer language experts who have trained themselves to think in terms of purely functional approaches, low-level assembly instructions, or regular, monotonous expression structures—and this is necessary for researchers pushing themselves to understand ever more—but for the every day developer, a more practical approach is required.

Archetype is a series of experiments to build the perfect bridge between the human mind and synthetic computation. As such, it is based as much as possible on a small core of extensible syntax and maintains a uniformity of expression within each facet of syntax that the human mind can easily keep separate. At the same time, it honors syntactic variety and is being designed to shift us closer to a balance where all of the elements, blocks, clauses and operation types in a language can be extended or modified equally. These represent the two most important design tenets of Archetype: the intuitive, natural connection to the human mind, and the maximization of its expressive power.

These forces often seem at odds with each other—at first glance seemingly impossible to resolve—and yet experience has shown that the languages we use are limited in ways we’re often surprised by, indicating that processes such as analogical extension are at work in our minds but not fully leveraged by those languages.

Syntactic Complexity & Extensibility

Most of a programming language’s syntax is highly static, and just a few areas (such as types, members, and sometimes operators) can be extended. Lisp is the most famous example of a highly extensible language with support for macros which allow the developer to manipulate code as if it were data, and to extend the language to encode data in the form of state machines. The highly regular, parenthesized syntax is very simple to parse and therefore to extend… so long as you don’t deviate from the parenthesized form. Therefore Lisp gets away with powerful extensibility at the cost of artificially limiting its structural syntax.

In Lisp we write (+ 4 5) to add two numbers, or (foo 1 2) to call a function with two parameters. Very uniform. In C we write 4 + 5 because the infix operator is what we grew up seeing in school, and we vary the syntax for calling the function foo(1, 2) to provide visual cues to the viewer’s brain that the function is qualitatively something different from a basic math operation, and that its name is somehow different from its parameters.

Think about syntax features as visual manifestations of the abstract logical concepts that provide the foundation for all algorithmic expression. A rich set of fundamental operations can be obscured by a monotony of syntax or confused by a poorly chosen syntactic style. Archetype involves a lot of research in finding the best features across many existing languages, and exploring the limits, benefits, problems, and other details of each feature and syntactic representation of it.

Syntactic complexity provides greater flexibility, and wider channels with which to convey intent. This is why people color code file folders and add graphic icons to public signage. More cues enable faster recognition. It’s possible to push complexity too far, of course, but we often underestimate what our minds are capable of when augmented by a system of external cues which is carefully designed and supported by good tools.

Imagine if your natural spoken language followed such simple and regular rules as Lisp: although everyone would learn to read and write easily, conversation would be monotonous. Extend this to semantics, for example with a constructed spoken language like Lojban which is logically pure and provably unambiguous, and it becomes obvious that our human minds aren’t well suited to communicating this way.

Now consider a language like C with its 15 levels of operator precedence which were designed to match programmers’ expectations (although the authors admitted to getting some of this “wrong”, which further proves the point). This language has given rise to very popular derivatives (C++, C#, Java) and are all easily learned, despite their syntactic complexity.

Natural languages and old world cities have grown with civilization organically, creating winding roads and wonderful linguistic variation. These complicated structures have been etched into our collective unconscious, stirring within us and giving rise to awareness, thought, and creativity. Although computers are excellent at processing regular, predictable patterns, it’s the complex interplay of external forces and inner voices that we’re most comfortable with.

Risk, Challenge & Opportunity

There are always trade-offs. By focusing almost all extensibility in one or two small parts of a language, semantic analysis and code improvement optimizations are easier to develop and faster to execute. Making other syntactical constructs extensible, if one isn’t careful, can create complexity that quickly spirals out of control, resulting in unverifiable, unpredictable and unsafe logic.

The way this is being managed in Archetype so far isn’t to allow any piece of the syntax tree to be modified, but rather to design regions of syntax with extensibility points built-in. Outputting C# code as an intermediary (for now) lays a lot of burden on the C# compiler to ensure safety. It’s also possible to mitigate more computationally expensive semantic analysis and code generation by taking advantage of both multicore and cloud-based processing. What helps keep things in check is that potential extensibility points are being considered in the context of specific code scenarios and desired outcomes, based on over 25 years of real-world experience, not a disconnected sense of language purity or design ideals.

Creating a language that caters to the irregular texture of thought, while supporting a system of extensions that are both useful and safe, is not a trivial undertaking, but at the same time holds the greatest potential. The more that computers can accommodate people instead of forcing people to make the effort to cater to machines, the better. At least to the extent that it enables us to specify our designs unambiguously, which is somewhat unnatural for the human mind and will always require some training.

Summary

So much of the code we write is driven by a set of rituals that, while they achieve their purpose, often beg to be abstracted further away. Even when good object models exist, they often require intricate or tedious participation to apply (see INotifyPropertyChanged). Having the ability to incorporate the most common and solid of those patterns into language syntax (or extensions which appear to modify the language) is the ultimate mechanism for abstraction, and goes furthest in minimizing development effort. By obviating the need to write convoluted yet routine boilerplate code, Archetype aims to filter out the noise and bring one’s intent more clearly into focus.

Posted in Archetype Language, Composability, Design Patterns, Language Extensions, Language Innovation, Linguistics, Metaprogramming, Object Oriented Design, Software Architecture | 2 Comments »

Oslo: Misconceptions and Fallacies

Posted by Dan Vanderboom on February 1, 2009

In the many conversations and debates I’ve had about Oslo recently–in person, on the phone, through email, on blogs, and in the Oslo forum–I’ve encountered a good amount of resistance to the goals of Oslo. Some of this is due to misconception and general confusion, some is due to an attachment to one’s current methodology, and some I believe is simply due to a fear of anything new and unknown. In the course of these conversations, I’ve run across a common set of thoughts or themes which I have attempted to represent faithfully here.

My first article, Why Oslo is Important, attempted to elucidate the high-level collection of concepts, technologies, and tools flying under the Oslo banner as it exists today and how I imagine it evolving in the future. Though I’ve received a lot of interest and appreciation, it also managed to spark some feedback from those who were still confused or concerned, leading me to believe that I had failed to deliver a fully satisfying explanation.

Understandably, Oslo and its target domain are too large to explain or digest in a single article, even a long one. It’s also too early in the development cycle to be very specific. So it’s reasonable to suggest that one can’t be totally satisfied until substantial examples and reference applications are built using the Oslo tools. Fair enough.

This article isn’t going to provide that reference application. It has the more modest goal of trying to dispel some of the common misconceptions and fallacies that I’ve encountered, and my responses to them. In future articles, as I design and develop my newest software system, I’ll be documenting and publishing the how and why of my use of Oslo tools and technologies to provide more specific evidence of its usefulness. I’ll also be providing references to much of the good work that is being done to provide solutions to various problems.

As always, I encourage you to participate and provide feedback: to me, and especially to the Oslo team. The more brain power we can bring to bear on this problem in the community, the better off the Oslo team will be, and the faster Oslo will evolve to become precisely the set of tools we need to improve our overall development experience, and developer productivity in particular.

“Oslo doesn’t solve any problems that can’t already be solved with existing tools or technologies.”

When the first steam-powered tractors were sold to farmers in the 1860’s, traditional ox and horse farmers might have said the same thing. “This tractor won’t do anything that my ox-plowing method can’t already do.” This is true, but it’s not an effective argument against the use of the new technology, which was a much faster and more cost-effective method of farming. The same farmer could harvest much more of his crop in the same amount of time, and as the new technology matured and gas-powered engines became available (in the 1880’s), so did the benefit increase. The same goes for any high level language above assembly language, the use of a relational database over a loose collection of files, display of text on a CRT instead of punched tape to communicate with a user, or any other great technological leap forward that “doesn’t accomplish anything new”.

Then again, it all depends what kind of problems you’re talking about. If you get specific enough, I’m sure you’ll find plenty in Oslo that’s new, even so early in its development, such as a shared repository for interoperable models and the ability to define new parsers that provide tooling support such as keyword colorization. Sometimes it’s these little details that act as the glue to pull components together and create substantial value through the synergy that results. Visual Studio and Intellisense weren’t strictly needed (you can still use Notepad and cs.exe), but it can quickly answer dozens of questions a day without having to jump out of context and spend a lot of time looking through disconnected documentation.

“We don’t need to know about Oslo or model-driven development because the applications I build are small and specific, or otherwise don’t need to be so general and flexible.”

This may or may not be true for you, but saying that the industry doesn’t need to advance because you don’t perceive a direct benefit to your own development isn’t valid. The reason your applications are able to be so simple is because of the wealth of tools, languages, platforms, frameworks, and libraries that your applications leverage. Standing on the shoulder of giants, you might say.

Many of these systems and components can benefit tremendously from a model-driven approach, and if it improves productivity for Microsoft and other third-party developers, that means they’ll be able to spend less time on plumbing and more time building the framework features you care about. It’s also likely to make all of those APIs cleaner and more consistent, resulting in easier discoverability and fewer headaches for you, the API consumer. As the .NET Framework and other frameworks and libraries grow ever larger, this will be critical to keeping things organized and under control.

“Oslo is going to force me to model all kinds of things that really aren’t needed in my software.”

The existence of Oslo will not force you to model any characteristics that you aren’t already modeling through other means. What it will do is provide more options and tools for modeling your software more effectively and more productively. It will also significantly ease the burden of creating more heavily model-driven software if you decide that’s right for your application or service.

For more information and a clearer definition of what a model is, see this article on the MSDN Oslo Development Center.

“Oslo will impose a workflow on me that doesn’t make sense for my methodology or business.”

Where Oslo fits into your specific workflow will be ultimately determined by you. This isn’t any different from Entity Framework. In v1 of EF, the tooling supports the reading of database structure and the generation of entity classes, but there is work being done to support a workflow going in the other direction: that of starting with classes and generating the database. The Entity Framework itself doesn’t actually care in which direction you want to work; the issue is primarily one of tool support. Other initiatives such as adding support for POCO indicate that the EF team is listening to feedback from the community and making the necessary changes to achieve broad support of their framework. I would expect the same from the Oslo team.

Early releases of Oslo will have similar limitations; currently it seems that M can only be used to generate a database structure from MSchema, and that database structure can be read by Entity Framework to generate your entity classes. Because Microsoft has such a broad audience to satisfy, other workflows will have to be accommodated, such as starting with class files and generating M files and database schemas. In fact, I’ve submitted feedback to Microsoft’s Connect site to ensure this kind of multi-master synchronization of model representations is considered.

“Putting everything in a database is overkill for my application, so Oslo isn’t relevant to me.”

While the Repository is an important aspect of Oslo, it isn’t required. Command line tools exist to transform textual input (specified as MGraph, or in your own custom-defined format using a Domain Specific Language) into MGraph output. There is a separate step to convert this into SQL, or to optionally inject this data directly into the Oslo Repository.

If you don’t want to use the Repository, there are already multiple methods available for instantiating objects directly from this text data, whether it’s read from a file, embedded as a resource, or sent as data over a network. Josh Williams (SpankyJ) has published an example showing how to convert DSL text into XAML, and instantiate the object graph using an MGraphXamlReader, and Torkel Ödegaard of Coding Instinct wrote an article demonstrating how to write a generic deserializer without using XAML.

Model formats such as CSDL, MSL, and SSDL for EF, or configuration data currently specified for WCF and WF, will all very likely be expressed in some DSL specified with M (there has been talk about these efforts already). Since applications without database access will need these technologies, it will be impossible to force developers to read this model data from a SQL Server database.

“We already have XML, XSD, and XSLT, so there’s no benefit to having yet another language to specify the same things.”

XSD is used to define formats and languages (such as XAML), and XML is used as a poor man’s one-size-fits-all meta-format for specifying hierarchical data. While XML is friendly enough to open in text editors, it’s designed more for tools than for human eyes.

Having different languages and formats to represent different kinds of data actually eases human comprehension and authoring ability. As Chris Anderson said in his Oslo session at PDC, when you’re looking at XML in an editor, what stands out are not what’s important to your domain, but rather what’s important to XML: elements and attributes.

People are using XML to define their DSLs and formats, not because XML is the best representation, but because writing parsers for new formats and languages is just too hard. Customers had been asking Microsoft for the ability to write these DSLs easily, so it was out of conversations and customer feedback that Microsoft decided to expose these services.

So it’s not a matter of absolutely needing M and the ability to define new languages because of some inability to get work done without them. Rather, it’s about reducing the amount of mental work required to author our models and increasing our productivity as a result. It’s also about having powerful transformational tools available to convert all formats and languages into a common representation so that the models can all interoperate despite their differences, in the same way that .NET languages all compile to a common CIL/MSIL language so that many different programming languages can interoperate. Without this ability, we’d have a different community of developers for each language instead of one broad group of “.NET developers” who can all share and benefit from each other’s knowledge and libraries. This has been recognized as such an important advantage that there are efforts underway to compile languages other than Java to JVM byte cote.

The larger the community, the larger our collective pool of knowledge, and the greater reuse we actually achieve.

“Oslo is too general and abstract to be useful to real developers building real systems.”

The idea that generalization can get out of control for a specific problem is valid, in the same way that a problem can be over-analyzed. But that doesn’t mean that we should stigmatize all general-purpose software, or that we should ignore the growing trend for enterprise software systems to require greater flexibility, user customization support, extensibility, and so on.

The fact is that life on Earth evolves towards greater complexity, and as supporting hardware resources increase and business demands grow, so does software. Taming that complexity will require rethinking how we approach every aspect of software design and development, including how to model it.

The software development industry is stratified into many layers, from platform development to one-off, command-line utilities. Some organizations write software to support millions of users, while others deploy specialized applications in-house, but most of us fall somewhere in between. Oslo seems to be most applicable to enterprise software offered to many customers, including cloud services, but there are subsets of Oslo that will have an impact on a great majority of .NET developers sooner or later.

There’s a lot of thought and work that goes on in our world (and billions of dollars spent) on “pure research” in the sciences (including computer science) that isn’t directly applicable to every John Doe, but without which we wouldn’t have things like nuclear power plants, microwaves, radio, television, satellite communication, or many pharmaceuticals. The Nobel laureates of the world who have spent their lives studying something so abstract and remote from every day life have contributed massively to the technological progress of our world, and quite often contribute to a better, more sanitary, healthy, and productive society. Despite the risks and dangers each technology enables, we somehow still make steady progress in terms of reducing chaos and violence.

Without abstract and general technology like general purpose language compilers, which can specify any logic we dream up for any type of application we care to build, we’d be back in the stone age. The Internet itself is based on communication standards that are so general, they are applicable to any application protocol or service traffic we can devise.

So before dismissing software (or any technology) due to its abstract or general nature, think about where we’d be without them. Someone has to approach the colossal, abstract, general problems with enough foresight to deliver solutions before they’re too desperately needed; and who better than a huge organization like Microsoft with deep pockets?

Ironically, our ability to define Domain Specific Languages with Oslo give us the converse power: the ability to define languages and formats that are extremely specific to our purposes and problem domains, and therefore enable us to write our specifications with less ceremony and noise that accompanies a general purpose language. This allows us to specify our intentions more easily by saying only what we need to say to get the point across. So the only reason Oslo must be so general is to provide that interoperability and translation layer across a set of specific formats that we define… without us having to work so hard for it.

For different reasons, it reminds me of generics, another general and abstract tool: it’s a complicated feature to implement in a language, but it provides great expressive power. Generics also don’t provide anything we couldn’t manage to do before in other ways, but I certainly wouldn’t want to go back to programming without them! In fact, you might say it’s an effective modeling tool.

Posted in Development Environment, Dynamic Programming, Language Innovation, Metaprogramming, Oslo, Problem Modeling | 7 Comments »

Why Oslo is Important

Posted by Dan Vanderboom on January 17, 2009

Contrary to common misunderstanding and speculation, the point of Oslo is not to put programming in the hands of business analysts who want to write their own business rules. Do I think some of that will happen? Architects and engineers will try everything they can imagine. Some of them will succeed in specific niches or scenarios, but it won’t replace application or system design, and it will probably be very limited for the forseeable future. Oslo is more about dramatically improving the productivity of designers and developers by generalizing common solution patterns and generating more adaptable tools.

Much of the confusion around Oslo occurs for two reasons:

Oslo is designed at a higher level of abstraction than most systems today, so its scope is broad and it will have an impact on virtually every product, solution and service across Microsoft. It’s difficult to get your head around something that big.
Because of its abstract nature, core concepts are defined in terms that are heavily overloaded, like "Model", "Repository", and "Language". Once you’ve picked up the lingo and can translate Oslo terminology into language you’re already familiar with, both the concept and magnitude of it will become obvious.

Oslo isn’t something completely new; in fact, Oslo borrows from a lot of previous research and even existing model-driven development tools. Oslo focuses existing technologies and techniques into a coherent and mature vision of development, combining all parts into a more powerful whole, and promises to deliver a supremely adaptable and efficient platform to develop on.

What Is Oslo?

Oslo is a software factory for generating first-class, tool-supported languages out of your declarative specifications.

A factory is a highly organized production facility
that produces members of a product line
using standardized parts, tools and production processes.

-from a review of Software Factories

The product line is analogous to Oslo’s parsers, transform tools, and IDE plugins for new data models and languages (both textual and visual) that you define. The standardized parts are Oslo’s library components; the tools are the M languages and the Quadrant/Intellipad application; and the processes are shaped by the flow of data through the Oslo tool chain (see the diagram near the end of this article).

With Oslo, you build the custom tools you need to rapidly build or generate software systems. It’s all about using the right tool for the job, and having a say in how those tools are shaped to obtain the greatest leverage.

As stated at the home page of softwarefactories.com:

We see a capacity crisis looming. The industry continues to hand-stitch applications distributed over multiple platforms housed by multiple businesses located around the planet, automating business processes like health insurance claim processing and international currency arbitrage, using strings, integers and line by line conditional logic. Most developers build every application as though it is the first of its kind anywhere.

In other words, there’s already a huge shortage of experienced, highly-qualified professionals capable of ensuring the success of these increasingly complex systems, and with the need (and complexity) growing exponentially, our current development practices increasingly fall short of the total demand.

Books like Greenfield’s Software Factories have been advocating building at a higher level of abstraction for years, and my initial reaction was to see it as a natural, evolutionary milestone for a highly mature software system. However, it’s an awful lot of focused development effort to attain such a level of maturity, and not many organizations are able to pull it off given the state of our current development platforms.

It’s therefore fortuitous that Microsoft teams have taken up the challenge of building these abilities into their .NET platform. After all, that’s where it really belongs: in the framework.

Unexpected Awesomeness

Oslo of course contains a lot of expected awesomeness, but where it will probably have the most impact in terms of developer productivity is with new first-class languages and language tools. Why? It first helps to understand the world of data formats and languages.

We’ve had an explosion of data formats–these mini Domain Specific Languages, if you will (especially in the form of complex configuration files). As systems evolve and scale, and the ways we can configure and compose our application’s behavior continues to grow, at what point do we perceive that configuration graph as the rich language that it becomes? Or when our user interfaces evolve from Monolithic to Modular to Composite to Granular Composite (or User Composable), at what point does that persistent object graph become our UX DSL (as with XAML in WPF).

Sometimes we set our standards too low, or are slow to raise them when the time has come to do so. With XML we get extensibility in defining languages and we think, "If we can parse it, then we can build a tool over it." I don’t know about you, but I’d much rather work with rich client software–some kind of designer–over a textual data format any day.

But you know how things go: some company like Microsoft builds a whole bunch of cool stuff, driven off some XML configuration, or they unleash something like XAML on which WPF, WF, and more are built. XAML is great for tools to read and write, and although XML and XAML are textual and not binary and therefore human readable in a text editor (the original intention behind that term), it’s simply not as easy to read as C# or VB.NET. That’s why we aren’t all rushing to program everything in XAML.

Companies like Microsoft, building from the bottom up, release their platforms well in advance of the thick client user experiences that make them enjoyable to use and which encourages mass adoption. Their models, frameworks, and applications are so large now that they’re released in massively differentiated stages, producing a technology adoption gap.

By giving that language a syntax other than XML, however, we can approach it in the same way we approach our program logic: in the most human readable and aesthetically-pleasant way we can devise, resembling our programming languages of choice.

Sometimes, the density of data and its structure in our model is such that a visual editor fails to represent that model well. Source code is a case in point. You could create a visual designer to visualize flow control, branching logic, and even complex expression building (like the iTunes Smart Playlist), but code in text format is more appropriate in this kind of scenario, and ends up being more efficient with the existing tooling available. Especially with an IDE like Visual Studio, we’re working with human-millenia of effort that have gone into the great code editing tools we use today. Oslo respects this need for choice by offering support for building both visual and textual DSLs, and recognizes the fluent definition of new formats and languages as the bridge to the next quantum leap in productivity.

If we had an easy way of defining languages in formats that we developers felt comfortable working with–as we’re comfortable with our general purpose languages and their rich tool support–then we’d be much more productive in the transition between a technology first being released and later having rich tool support over it. WPF has taken quite a while to be adopted as much as it has, partly due to tool availability and maturity. Before Expression Blend or Cider designers were released and hand-coding XAML was the only way, those who braved the angle brackets struggled with it. As I play with Silverlight, I realize how much must still be done in XAML, and how we still struggle. It’s simply not as nice to work with as my C# code. Not as rich, and not as strongly tool-supported.

That’s one place Oslo provides value. With the ability to define new textual and visual DSLs, rigorous verification and validation in a rich set of tools, the promise of Intellisense, colorization of keywords, operators, constants, and more, the Oslo architects recognize the ability to enhance our development experience in a language-agnostic way, raising the level of abstraction because, as they say, the way to solve any technical problem is to approach it at one higher level of indirection. Unfortunately, this makes Oslo so generalized and abstract that it’s difficult to grasp and therefore to appreciate its immensity. Once you can take a step back and see how it fits in holistically, you’ll see that it has the potential to dramatically transform the landscape of software development.

Currently, it’s a lot of work to implement all the language services in Visual Studio to give them as rich an experience as we’ve come to expect with C#, VB.NET, and others. This is a serious impediment to doing this kind of work, so solving the problem at the level of Oslo drastically lowers the barrier to entry for implementing tool-supported languages. The Oslo bits I’ve seen and played with are very early in the lifecycle for this massive scope of technology, but the more I think about its potential, the more impressed I am with the fundamental concept. As Chris Anderson explained in his PDC session on MGrammar, MGrammar was an implementation detail, but sometime around June 2007, that feature team realized just how much customers wanted direct access to it and decided to release MGrammar to the world.

Modeling & The Repository

That’s all well and good for DSLs and language enthusiasts/geeks, but primarily perhaps, Oslo is about the creation, exploration, relation, and execution of models in an interoperable way. In other words, all of the models that are currently used to describe a software system, or an entire IT environment, are either not encoded formally enough to verify or execute, or they’re encoded or stored in proprietary ways that don’t allow interoperability with other models. A diagram in Visio or PowerPoint documenting network topology, for example, knows nothing about the component architecture or deployment model of the software systems installed and running on that network.

When people usually talk about models, they imagine high-level architecture documents, overviews used to visually summarize work that is much more granular in nature. These models aren’t detailed, and they normally aren’t kept up to date and in sync with the current design as changes are made. But modeling in Oslo is not an attempt to make these visual models contain all of the necessary detail, or to develop software with visual tools exclusively. Oslo simply provides the tools, both graphical and textual, to define and relate many models. It will be up to the development community to decide how all these tools are ultimately used, which parts of our systems will be specified in a mix of general purpose, domain specific, and visual languages. Ultimately, Oslo will provide the material and glue to fill the gaps between the high and low level specifications, and unite them into a common, connected, and much more useful set of data.

To grasp what Oslo modeling is really all about requires that we expand our definition of "model", to see the models expressed in our configuration and XAML files, in our applications’ database schemas, in our entity classes, and so on. As software grows in complexity and becomes more composable, we can use various languages to model its behavior, store that in the repository for runtime execution, inspection, or reuse by other systems.

This funny and clever Oslo video (reminiscent of The Hitchhiker’s Guide to the Galaxy) explains modeling in the broader sense alluded to here.

If we had some universal container for the storage of all different kinds of models, and a standardized way of relating entities across models, we’d be able to do things like impact analysis, where we could see the effect on software systems if someone were to alter the network it was running on; or powerful data mining on the IT execution environment of a business.

Many different tools, with different audiences, will be able to connect into this repository to manipulate aspects of the models that they understand and have access to. This is just the tip of the iceberg. We already model so much of what we do in the IT and software worlds, and as we begin adopting business process middleware and orchestration software like BizTalk, there’s a huge amount of value in those models converging and connecting. That’s where the Oslo Repository comes in.

Oslo provides interoperability among models in the same way that SOA provides interoperability among services. Not unlike the interoperability we have now among many different languages all sharing the same CLR specification.

Bridging data models across repositories or in shared repository is a major step forward. With Windows Azure and Microsoft’s commitment to their online services platform (and considering the momentum of the SaaS movement with Amazon, Google, and others), shared storage and data sets are the future. (Check out SQL Data Services if you haven’t already, and watch for some exciting announcements coming later this year!)

The Dichotomy of Data vs. Metadata

Jeff Pinkston from the Oslo team aptly reflects the attitude of the group when he scoffs at the categorical difference between data and metadata. In terms of storing and querying it, serializing and communicating it, and everything else that matters in enterprise software, data is data and there’s no reason not to treat it the same when it comes to architecting a system. We have our primary models and our secondary models, our shared models and our protected models, but they’re still just models that shape our software’s behavior, and they share all of the same characteristics when it comes to manipulation and access. It’s their ultimate effect that differs.

It’s worth noting, I think, the line that’s been drawn between code and data in some programming languages and not in others (C# vs. LISP). A division has been made for the sake of security rather than necessity. Machine instruction codes are represented in the same sort of binary data and realized in the same digital circuitry as traditional user data. It’s tempting to keep things locked down and divided, but as languages evolve to become more late bound and dynamic (and as the tools evolve to make this feasible), there will be more need for the manipulation of expression trees and ASTs. I strongly suspect the lines will blur until they disappear.

Schema and Object Instance Languages

In order to define models, we need a tool. In Oslo, this is a textual language called MShema and an editor called Intellipad. I personally think it’s odd to talk people’s ears off about "model, model, model", and then to use the synonym "schema" to name the language, but all of these names could change before they’re shipped for all we know.

This is a simple example of an MSchema document:

module MyModel
{
    type Person
    {
        LastName : Text;
        FirstName : Text;
    }

    People : Person*;
}

By running this through the "M Compiler", a SQL script is generated that will create the appropriate database objects. Intellipad is able to verify the correctness of your schema, and what’s really nice is that you don’t even have to specify data types when you start sketching out your model. Defaults are assumed, and you can get more specific as your model evolves.

MGraph is a language for defining instances of objects, constrained by an MSchema and similar in format. So MSchema is to MGraph what XSD is to XML.

In this article, Lars Corneliussen explains Microsoft’s vision to make MGraph as common as XML is today. Take a look at his article to see a side-by-side comparison of the same object represented as XML (POX), JSON, and MGraph, and decide for yourself which you like best (or see below).

MSchema and MGraph are easier and more efficient to read and write than XML. Their message format resembles typical structured programming languages, and developers are already familiar with these formats. XML is a fine format for a tool; it’s human readable but not human-friendly. A C-style language, on the other hand, is much more human-friendly than all of the angle brackets and the redundancy (and verbosity) of tag text. That narrows down our choice to JSON and MGraph.

In JSON, the property/field/attribute names are delimited by quotation marks, suggesting that the whole structure is a dumb property bag.

{
"LastName" : "Vanderboom",
"FirstName" : "Dan"
}

MGraph has a very similar syntax, but its attribute property names are recognized and validated by the parser generated from MSchema, so the quotation marks are unnecessary. It ends up looking more natural, and a little more concise.

{
LastName : "Vanderboom",
FirstName : "Dan"
}

Because MGraph is just a message format, and Microsoft’s service offerings already support multiple message formats (SOAP/POX/JSON/etc.), it wouldn’t disrupt any of their architecture to add an MGraph adapter, and I’ll be shocked if I don’t hear about one in their next release.

Meta-Languages and MGrammar

In the same way that Oslo includes a meta-model because it allows us to define models, it also includes a meta-language because it allows us to define languages (as YACC and ANTLR have done). However, just as Pinkston doesn’t think data and metadata should be treated different, it makes sense to think of a language that defines languages as just another language. There is something Zen about that, where the tools somehow seem to bend back upon themselves like one of Escher‘s drawings.

Here is an example language defined by MGrammar in a great article on MSDN called MGrammar in a Nutshell:

module SongSample
{
    language Song
    {
        // Notes
        token Rest = "-";
        token Note = "A".."G";
        token Sharp = "#";
        token Flat = "b";
        token RestOrNote = Rest | Note (Sharp | Flat)?;

        syntax Bar = RestOrNote RestOrNote RestOrNote RestOrNote;
        syntax List(element)
          = e:element => [e]
          | es:List(element) e:element => [valuesof(es), e];

        // One or more bars (recursive technique)
        syntax Bars = bs:List(Bar) => Bars[valuesof(bs)];
        syntax ASong = Music bs:Bars => Song[Bars[valuesof(bs)]];
        syntax Songs = ss:List(ASong) => Songs[valuesof(ss)];

        // Main rule
        syntax Main = Album ss:Songs => Album[ss];

        // Keywords
        syntax Music = "Music";
        syntax Album = "Album";

        // Ignore whitespace
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";

        interleave Whitespace = LF | CR | Space;
    }
}

This is a pretty straight forward way to define a language and generate a parser. Aside from the obvious keywords to define syntax rules and token patterns (with an alternative and more readable format for regular expressions), the => projection operator allows you to shape the MGraph output according to your needs.

I created two simple languages with MGrammar on the plane trip back to Milwaukee from the PDC in November. The majority of my time was spent fussing with the editor, Intellipad, and for the last half hour I found it very easy to create a language on the fly, extending and changing it through experimentation quickly and easily. Projections, which are functional expressions in MGrammar used to shape MGraph output, are the most challenging part. There are a number of techniques that shape the output graph, so it will be good to see how this is approached in future reference examples.

Surreptitiously announced just before I wrote this, Mike Weinhardt at Microsoft indicated that a gallery of example grammars for MGrammar is being put together, to point to the sample grammars for various languages in addition to grammars that the community develops, and it should be available by the end of this month. These examples demonstrating how to define languages and write sensible projections, coming from the developers who are putting MGrammar together, will be an invaluable tool for teaching you how to use common patterns (just as 101 LINQ Samples did for LINQ).

As Doug Purdy explained on .NET Rocks: "People who are building a domain specific language, and they don’t want to understand how to build a parser, or they’re not language designers. Actually, they are language designers. They design a language, but they actually don’t do the whole thing. They don’t build a parser. What they do, they just leverage the XML parser. And what we’re trying to do is provide a toolset for folks where they don’t have to resort to XML in order to do DSLs."

From the same episode, Don Box said of the DSL session at PDC: "I’ve never seen a session with more geek porn in it."

Don: "It’s like crack for developers. It’s kind of addictive; it takes over your life."

Doug: "If you want the power of Anders in your hand…"

The Tool Chain

Now that we have a better sense of what’s included in Oslo in terms of languages, editors, and the shared repository, we can look at the relationship among the other pieces, which are manifested in the CTP as a set of command-line tools. In the future, these will integrate into an IDE, most likely Visual Studio. (I’d expect Intellipad and Quadrant to merge with Visual Studio, but there’s no guaranty this will happen.)

When you create your model with MSchema, you’ll use m to validate that model and generate a SQL script to create a SQL Server 2008 database schema (yes, it only works right now with SQL Server 2008). You’ll also use the m command to validate your object graph (written in MGraph) against your schema, and translate that into a set of SQL commands to perform inserts and updates against tables.

With enough models, there’ll be huge value in adding yours to the repository. If you don’t mind writing MGraph or you generate it automatically with something like an MGraphSerializer class in your code, this may be all you need.

If, on the other hand, you decide you could really benefit by defining your own textual language to use instead of MGraph, you can use MGrammar to define a new language. This language gets compiled by the mg compiler to create your parser, and the mgx command translates code in your new language into an MGraph, which can then be pulled into your database using m.

This diagram depicts the process:

Other than these command-line tools, Quadrant is the highly extensible visual tool for exploring models graphically, and Intellipad is a different face on the same shell for defining DSLs with MGrammar and writing DSL code, as well as writing and verifying MSchema and MGraph code.

We should see fairly soon the convergence of these three languages (MGraph, MSchema, and MGrammar) into a single M language. This makes sense, since what you want to project in your DSL should be something within your model, verified by your schema. This may ultimately make these projections much easier to write.

We’ll also see this tool chain absorbed into multiple development environments, eventually with rich binding across multiple representations of our model, although this will take longer in Visual Studio.

Languages and Nested Languages

I looked at some MService examples, and I can understand Damon’s concern that although it’s nice to have "operation" as a keyword in a service-oriented language, with more keywords giving you the ability to specify aspects of each endpoint and the communications patterns required, that enclosing the business logic within that service language is probably not a good idea. I took this from Dennis van der Stelt’s blog:

service Service
{
operation PhotoUpload(stream : Stream) : Text
{
.PostUriTemplate = "upload";

    index : Text = invoke DateTime.Now.Ticks.ToString();
    filename : Text = "d:\\demo\\photo\\" + index + ".jpg";
    invoke MService.ServiceHelper.StoreInFile(stream, filename);

return index;
}
}

Why not? You’re defining a general purpose language within the curley braces, one capable of defining variables, assigning values, referencing .NET objects, and calling methods. But why do you want to learn a new language to write services when the language you’re using right now is already supremely capable of that? Don’t you already know a good syntax for invoking methods (other than "invoke %mehthod%")? If instead you simply referenced an assembly, type, and method from an MService script, you could externally turn any .NET method with serializable parameters and return value into a service operation by feeding it this kind of file, without having to recompile, and without having to reinvent the wheel.

The possible exception would be if MGrammar adds the ability (as discussed by speakers at the PDC) of supporting multiple layers of enclosing languages within other languages. In other words, you could use MService to define operations and their attributes using its own syntax, and within the curly braces that follow, use the C# or VB.NET parsers to process the logic with the comprehension of a separate language. There are some neat possibilities here, but I expect the development community to be conservative and hesitent about mixing layers of semantics, as there is an awful lot of room for confusion and complexity. It may be better to leave different language blocks in separate files or containers, and to allow them to reference each other as .NET assemblies and XML files reference each other today.

However, I wouldn’t get too hung up on the early versions of these new languages, or any one language specifically. The useful, sensible ones that take real developer needs into account and provide the most value will be adopted, and many more will quickly fall into disuse. But the overall pattern will be for the emergence of an amazing amount of leverage in terms of improving human comprehension and taking advantage of our ability to manipulate structured, symbolic object graphs to build and verify software systems.

Resources

After a few months of research and many hours of writing, I don’t feel like I’ve even scratched the surface. But instead of giving you an absolutely comprehensive picture, I’m going to stop here and continue in future articles. In the meantime, check out the following resources.

For an overview of the development paradigm, look for information on language-oriented programming, including an article I wrote that alludes to how "we will have to raise the level of abstraction to a point that may be hard for us to imagine with our existing tools and languages" due to the "precipitious growth of software complexity". The "community of abstractions" is the model in Oslo-speak.

For Microsoft specific content: there were some great sessions at the PDC (watch the recorded videos). It was covered (with much confusion) on the .NET Rocks! podcast (here and here) as well as on Software Engineering Radio; and there are lots of bloggers talking about their initial experiences with it, such as Shawn Wildermuth, Lars Corneliussen, and of course Chris Sells and Jeff Pinkston. The most clear and coherent explanation I’ve heard was from an interview with Ron Jacobs and David Chappell (Ron gave the keynote at MSDN Dev Con, hosted the ARCast podcast for years). MSDN has at least 29 videos on the Oslo Developer Center, where there’s a good amount of information. including a FAQ. There’s also the online guide for MGrammar, MGrammar in a Nutshell, and the Oslo team blog.

If you’re interested in creating DSLs, make sure to keep a look out for details about the upcoming DSL Developers Conference, which is tentatively planned for April 16-17, immediately following the Lang.NET conference (on general purpose languages) on April 14-16. I’m hoping to be at both this year. And in case you haven’t heard, Microsoft is planning another PDC Conference for 2009, the first time ever these conferences have run for two consecutive years! There will no doubt be much more Oslo news and conference material to cover it at the PDC in November.

Pluralsight, an instructor-led training company, now teaches a two-day "Oslo" Fundamentals course (and Don Box’s blog is hosted there).

The best way to learn about Oslo, however, is to dive in and use it. That’s what I’m doing with my newest system, which needs to be modeled from scratch. So if you haven’t done so already, download the Oslo SDK (link updated to January 2009 SDK) and introduce yourself to the future of modeling and development!

[Click here for the next article in this Oslo series, on common misconceptions and fallacies about Oslo.]

Posted in Data Structures, Development Environment, Distributed Architecture, Language Extensions, Language Innovation, Metaprogramming, Oslo, Problem Modeling, Service Oriented Architecture, Software Architecture, SQL Data Services, Visual Studio, Windows Azure | 44 Comments »

The Future of Programming Languages

Posted by Dan Vanderboom on November 6, 2008

Two of the best sessions at the PDC this year were Anders Hejlsberg’s The Future of C# and a panel on The Future of Programming.

A lot has been said and written about dynamic programming, metaprogramming, and language syntax extensions–not just academically over the past few decades, but also as a recently growing buzz among the designers and users of mainstream object-oriented languages.

Dynamic Programming

After a scene-setting tour through the history and evolution of C#, Anders addressed how C# 4.0 would allow much simpler interoperation between C# and dynamic languages. I’ve been following Charlie Calvert’s Language Futures website, where they’ve been discussing these features early on with the development community. It’s nice to see how seriously they take the feedback they’re getting, and I really think it’s going to have a positive impact on the language as a whole. Initial thoughts revolved around creating a new block of code with code like dynamic { DynamicStuff.SomeUndefinedProperty = “whatever”; }.

But at the PDC we saw that instead dynamic will be a type for our dynamic objects, and so dynamic lookup of members will only be allowed for those variables. Anders’ demo showed off interactions with JavaScript and Python, as well as Office via COM, all without the ugly Type.Missing parameters (optional parameter support also played a part in that). Other ideas revolved around easing Reflection access, and XML document access for Xml nodes dynamically.

Meta-Programming

At the end of his talk, Anders showed a stunning demo of metaprogramming working within C#. It was an early prototype, so all language features were not supported, but it worked similar to Eval where the code was constructed inside a string and then compiled at runtime. But it was flexible and powerful enough that he could create delegates to functions that he Eval’ed up into existence. Someone in the audience asked how this was different from Lisp macros, to which Anders replied: “This is basically Lisp macros.”

Before you get too excited (or worried) about this significant bit of news, Anders made no promises about when metaprogramming would be available, and he subtly suggested that it may very well be a post-4.0 feature. As he said in the Future of Programming Panel, however: “We’re rewriting the compiler in managed code, and I’d say one of the big motivators there is to make it a better metaprogramming system, sort of open up the black box and allow people to actually use the compiler as a service…”

Regardless of when it arrives, I hope they will give serious consideration to providing syntax checking of this macro or meta code, instead of treating it blindly at compile-time as a “magic string”, as has so long plagued the realm of data access. After all, one of the primary advantages of Linq is to enable compile-time checking of queries, to enforce not only strict type checking, but to also more fundamentally ensure that data sources and their members are valid. The irregularity of C#’s syntax, as opposed to Lisp, will make that more difficult (thanks to Paul for pointing this out), but I think most developers will eventually agree it’s a worthwhile cause. Perhaps support for nested grammars in the generic sense will set the stage for enabling this feature.

Language Syntax Extensions

If metaprogramming is about making the compiler available as a service, language extensions are about making the compiler service transparent and extensible.

The majority (but not all) of the language design panel stressed caution in evolving and customizing language syntax and discussed the importance of syntax at length, but they’ve been considering the demands of the development community seriously. At times Anders vacillated between trying to offer alternatives and admitting that, in the end, customization of language syntax by developers would prevail; and that what’s important is how we go about enabling those scenarios without destroying our ability to evolve languages usefully, avoiding their collapse from an excess of ambiguity and inconsistency in the grammar.

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute. And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming. And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before. I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0. And he’s right: if you squint, it almost looks like new syntax. The problem is that programmers don’t want to squint at their code. As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look. This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language.

One idea that came up several times (and which I alluded to above) is the idea of allowing nested languages, in a similar way that Linq comprehensions live inside an isolated syntactic context. C++ developers can redefine many operators in flexible ways, and this can lead to code that’s very difficult to read. This can perhaps be blamed on the inability of the C++ language to provide alternative and more comprehensive syntactic extensibility points. Operators are what they have to work with, so operators are what get used for all kinds of things, which change per type. But their meaning gets so overloaded, literally, that they lose any obvious (context-free) meaning.

But operators don’t have to be non-alphabetic tokens, and the addition of new keywords or symbols could be introduced in limited contexts, such as a modifier for a member definition in a type (to appear alongside visibility, overload, override, and shadowing keywords), or within a delimited block of code such as an r-value, or a curly-brace block for new flow control constructs (one of my favorite ideas and an area most in need of extensions). Language extensions might also be limited in scope to specific assemblies, only importing extensions explicitly, giving library authors the ability to customize their own syntax without imposing a mess on consumers of the library.

Another idea would be to allow the final Action delegate parameter of a function to be expressed as a curly-brace-delimited code block following the function call, in lieu of specifying the parameter within parentheses, and removing the need for a semicolon. For example, with a method defined like this:

public static class Parallel
{
    // Action delegate defined last, to take advantage of C# syntactic sugar
    public static void For(long Start, long Count, Action Action)
    {
        // TODO: implement
    }
}

…a future C# compiler might allow you to write code like this:

Parallel.For(0, 10)
{
    // add code here for the Action delegate parameter
}

As Dr. T points out to me, however, the tricky part will consist of supporting local returns: in other words, when you call return inside that delegate’s code block, you really expect it to return from the enclosing method, not the one defined by the delegate parameter. Support for continue or break would also make for a more intuitive fit. If there’s one thing Microsoft does right, it’s language design, and I have a lot of confidence that issues like this will continue to be recognized and ultimately implemented correctly. In reading their blogs and occasionally sharing ideas with them, it’s obvious they’re as passionate about the language and syntax as I am.

The key for language extensions, I believe, will be to provide more structured extensibility points for syntax (such as control flow blocks), instead of opening up the entire language for arbitrary modification. As each language opens up some new aspect of its syntax for extension, a number of challenges will surface that will need to be dealt with, and it will be critical to solve these problems before continuing on with further evolution of the language. Think of all we’ve gained from generics, and the challenges of dealing with a more complex type system we’ve incurred as a result. We’re still getting updates in C# 4.0 to address shortcomings of generics, such as issues regarding covariance and contravariance. Ultimately, though, generics were well worth it, and I believe the same will be said of metaprogramming and language extensions.

Looking Forward

I’ll have much more to say on this topic when I talk about Oslo and MGrammar. The important points to take away from this are that mainstream language designers are taking these ideas to heart now, and there are so many ideas and options out there that we can and will experiment to find the right combination (or combinations) of both techniques and limitations to make metaprogramming and language syntax extensions useful, viable, and sustainable.

Posted in Conferences, Design Patterns, Dynamic Programming, Functional Programming, Language Extensions, LINQ, Metaprogramming, Reflection, Software Architecture | 1 Comment »

Critical Development

Language design, framework development, UI design, robotics and more.

Categories

Archives

Subscribe

Top Posts

.NET Links

Blogroll