Critical Development

Language design, framework development, UI design, robotics and more.

Archive for the ‘Windows Azure’ Category

Reimagining the IDE

Posted by Dan Vanderboom on May 31, 2010

Overview

After working in Visual Studio for the past decade, I’ve accumulated a broad spectrum of ideas on how the experience could be better.  From microscopic features like “I want to filter Intellisense member lists by member type” to recognition of larger patterns of conceptual organization and comprehension, there aren’t many corners of the IDE that couldn’t be improved with additional features or, in some cases, a redesign.

To put things in perspective, consider how the Windows Mobile platform languished for years and became stale (or “good enough”) until the iPhone changed the game and raised the bar on quality to a whole new level.  It wasn’t until fierce competition stole significant market share that Microsoft completely scrapped the Windows Mobile division and started fresh with a complete redesign called Windows Phone 7.  This is one of the smartest things Microsoft has done in a long time.

After many years of incremental evolution, it’s often necessary to rethink, reimagine, and occasionally even start from scratch in order to make the next revolutionary jump forward.

Visual Studio Focus

Integrated Development Environments have been with us for at least the past decade.  Whether you work in Visual Studio, Eclipse, NetBeans, or another tool, there is tremendous overlap in the set of panels available, the flexible layout of those panels, saved workspaces, and add-in infrastructure to make as much as possible extensible.  I’ll focus on Visual Studio for my examples and explanations since that’s the IDE I’m most familiar with, but there are parallels to other IDEs for much of what I’m going to cover.

Visual Components & Flexible Layout

Visual layout is one thing that IDEs do right.  Instead of a monolithic UI, it’s broken down into individual components such as panels, toolbars, toolboxes, main menus and context menus, code editors, designers, and more.  These components can be laid out at runtime with intuitive drag-and-drop operations that visually suggest the end result.

The panels of an IDE can be docked to any edge of another panel, they can be laid on top of another panel to create tab controls, and adjacent panels can be relatively resized with splitters that appear between panels.  After many years of refinement, it’s hard to imagine a better layout system than this.

The ability to save layouts as workspaces in Expression Blend is a particularly nice feature.  It would be nicer still if the user could define triggers for these workspaces, such as “change layout to the UI Designer workspace when the XAML or Windows Forms designers are opened”.

IDE Hosting

Visual Studio and other development tools have traditionally been desktop applications.  In Silverlight 4, however, we now have a framework sufficiently powerful to build a respectable cross-platform IDE.

With features such as off-line, out-of-browser execution, full screen mode, custom context menus, and trusted access to the local file system, it’s now possible for a great IDE to be built and run on Windows, Mac OS X, or Linux, and to allow a developer to access the IDE and their solutions from any computer with a browser (and the Silverlight plug-in).

There are already programming editors and compilers in the cloud.  In episode 562 of .NET Rocks, on teaching programming to kids, the guests point out that a subset of the Small Basic IDE is available in Silverlight.  For those looking to build programming editors, ActiPro has a SyntaxEditor control in WPF that they’re currently porting to Silverlight (for which they report seeing a lot of demand).

Ideally such an IDE would be free, or would have a free version available, but for those of us who need high-end tools and professional-level feature sets, imagine how nice it would be to pay a monthly fee for access to an ever-evolving IDE service instead of having to cough up $1,100 or $5,500 (or more) every couple of years.  Not only would costs be conveniently amortized over the span of the tool’s use, but all of your personal preferences would be easily synchronized across all computers that you use to work on that IDE.

With cloud computing services such as Windows Azure, it would even be possible to off-load compilation of large solutions to the cloud.  Builds that took 30 minutes could be cut down to a few minutes or less by parallelizing build tasks across multiple cores and servers.
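
To make the idea concrete, here’s a minimal local sketch of the same principle, using nothing but the .NET 4 task library and msbuild from the command line.  The project paths are made up, and it assumes the listed projects have no build-order dependencies on one another; MSBuild’s /m switch already parallelizes a single build across local cores, while a cloud build service would spread the same work across many machines.

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class ParallelBuild
{
    static void Main()
    {
        // Illustrative paths; in practice this list would come from the
        // solution's dependency graph, with only independent projects
        // grouped into the same parallel batch.
        string[] projects =
        {
            @"C:\Source\Solution\ProjectA\ProjectA.csproj",
            @"C:\Source\Solution\ProjectB\ProjectB.csproj",
            @"C:\Source\Solution\ProjectC\ProjectC.csproj"
        };

        Parallel.ForEach(projects, project =>
        {
            var info = new ProcessStartInfo("msbuild.exe",
                "\"" + project + "\" /nologo /verbosity:minimal")
            {
                UseShellExecute = false
            };

            using (Process build = Process.Start(info))
            {
                build.WaitForExit();
                Console.WriteLine("{0} exited with code {1}", project, build.ExitCode);
            }
        });
    }
}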

The era of cloud development tools is upon us.

Solution Explorer & The Project System

Solution Explorer is one of the most useful and important panels in Visual Studio.  It provides us with an organizational tool for all the assets in our solution, and provides a window into the project system on which core behaviors such as builds are based.  It is through the Solution Explorer that we typically add or remove files, and gain access to visual designers and the ever-present code editor.

In many ways, however, Solution Explorer and the project system it represents are built on an old and tired design that hasn’t evolved much since its introduction over ten years ago.

For example, it still isn’t possible to “add existing folder” and have that folder and all of its contents pulled into a project.  If you’ve ever had to rebuild a project file and pull in a large number of files organized in many nested folders, you have a good idea of how painful an effort this can be.
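
Until the project system supports it directly, the workaround is to edit the project file itself.  Here’s a rough sketch (the paths are illustrative) that appends a <Compile> item for every .cs file under a folder; MSBuild will also accept a wildcard include such as Imported\**\*.cs if you’d rather add a single item by hand.

using System;
using System.IO;
using System.Xml.Linq;

class AddExistingFolder
{
    static void Main()
    {
        // The standard MSBuild namespace used by .csproj files.
        XNamespace msb = "http://schemas.microsoft.com/developer/msbuild/2003";

        string projectPath = @"C:\Source\MyProject\MyProject.csproj";
        string folderToAdd = @"C:\Source\MyProject\Imported";
        string projectDir = Path.GetDirectoryName(projectPath);

        XDocument project = XDocument.Load(projectPath);
        XElement itemGroup = new XElement(msb + "ItemGroup");
        project.Root.Add(itemGroup);

        foreach (string file in Directory.GetFiles(folderToAdd, "*.cs", SearchOption.AllDirectories))
        {
            // Project items are stored as paths relative to the project file.
            string relative = file.Substring(projectDir.Length + 1);
            itemGroup.Add(new XElement(msb + "Compile", new XAttribute("Include", relative)));
        }

        project.Save(projectPath);   // back up the project file before running this
    }
}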

If you’ve ever tried sharing the same code across multiple incompatible platforms, between Full and Compact Framework, or between Silverlight 3 and Full Framework, you’ve likely run into kludgey workarounds like placing multiple project files in the same folder and including the same set of files, or using a tool like Project Linker.

Reference management can also be unwieldy when you have many projects and references.  How do you ensure you’re not accidentally referencing two different versions of the same assembly from two different projects?  My article on Project Reference Oddness in VS2008, which explores the mysterious and indirect ways references work, is by far one of my most popular articles.  I’m guessing that’s because so many people can relate to the complexity and confusion of managing these dependencies.
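
A small utility can at least surface the problem.  The sketch below (the solution folder path is illustrative) scans every project file under a folder and reports any assembly that is referenced with more than one version string.

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

class FindReferenceConflicts
{
    static void Main()
    {
        XNamespace msb = "http://schemas.microsoft.com/developer/msbuild/2003";

        // Collect every <Reference Include="Name, Version=..."> from every project.
        var references =
            from projectFile in Directory.GetFiles(@"C:\Source\Solution", "*.csproj", SearchOption.AllDirectories)
            from reference in XDocument.Load(projectFile).Descendants(msb + "Reference")
            let parts = ((string)reference.Attribute("Include")).Split(',')
            select new
            {
                Project = Path.GetFileName(projectFile),
                Assembly = parts[0].Trim(),
                Version = parts.Select(p => p.Trim())
                               .FirstOrDefault(p => p.StartsWith("Version=")) ?? "(unspecified)"
            };

        // Report any assembly referenced at more than one version.
        foreach (var conflict in references.GroupBy(r => r.Assembly)
                                           .Where(g => g.Select(r => r.Version).Distinct().Count() > 1))
        {
            Console.WriteLine(conflict.Key);
            foreach (var r in conflict)
                Console.WriteLine("  {0} -> {1}", r.Project, r.Version);
        }
    }
}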

“Projects” Are Conceptually Overloaded: Violating the Single Responsibility Principle

In perhaps the most important example, consider how multiple projects are packaged for deployment, such as what happens for the sake of debugging.  Which assemblies and other files are copied to the output directory before the program is executed?  The answer, discussed in my Project Reference Oddness article, is that it depends.  Files that are added to a project as “Content” don’t even become part of the assembly: they’re just passed through as a deployment command.

So what exactly is a Visual Studio “project”?  It’s all of these things:

  • A set of source code files that will get compiled, producing an assembly.
  • A set of files that get embedded in the resulting assembly as resources.
  • A set of deployment commands for loose files.
  • A set of deployment commands for referenced assemblies.

If a Visual Studio project were a class definition, we’d say it violated the Single Responsibility Principle.  It’s trying to be too many things: both a definition for an assembly as well as a set of deployment commands.  It’s this last goal that leads to all the confusion over references and deployment.

Let’s examine the reason for this.

A deployment definition is something that can span not only multiple assemblies, but also additional loose files.  In order to debug my application, I need assemblies A, B, and C, as well as some loose files, to be copied to the output directory.  Because there is no room for the deployment definition in the hierarchy visualized by Solution Explorer, however, I must somehow encode that information within the project definitions themselves.

If assembly A references B, then Visual Studio infers that the output of B needs to be copied to A’s output directory when A is built.  Since B references C, we can infer that the output of C needs to be copied to B’s output directory when B is built.  Indirectly, then, C’s output will get dumped in A’s output directory, along with B’s output.

What you end up with is a pipeline of files that shuffles things along from C to B to A.  Hopefully, if all the reference properties are set correctly, this works as intended and the result is good.  But the logic behind all of this is an implicit black box.  There’s no transparency, so when things get complicated and something goes wrong, it can become impossible to figure it out in a reasonable amount of time (try reading through verbose build output sometime).

At one point, just before writing the article on references mentioned above, I was spending 10 hours or more a week just fighting with reference dependencies.  It was a huge mess, and a very expensive way to accomplish absolutely nothing in terms of providing value to customers.

Deployments & Assemblies

Considering our new perspective on the importance of representing deployments as first-class organizational items in solutions, let’s take a look at what that might look like in an IDE.  Focus on the top-left of the screenshot below.

[Screenshot: a mock-up of the proposed Solution Explorer, showing deployment definitions grouped under “Silverlight Client” and “Cloud Services”]

The first level of darker text (“Silverlight Client” and “Cloud Services”) corresponds to “solution folders” in Visual Studio.  These are labels that can be nested like folders for organizational purposes.  Within each of these areas is a collection of Deployment definitions.  The expanded deployment is for the Shell of our Silverlight application.  The only child of this deployment is a location.

In a desktop application, you might have multiple deployment locations, such as $AppDir$, $AppDir$\Data, or $UserDir$\AppName, each with child nodes representing content to be deployed to those locations.  In Silverlight, however, it doesn’t make sense to deploy to a specific folder since that’s abstracted away from you.  So for this example, the destination is Shell.XAP.

You’ll notice that multiple assemblies are listed.  If this were a web application, you might have a number of loose files as well, such as default.aspx or web.config.  If such files were listed under that deployment, you could double-click one to open and edit in the editor on the right-hand side of the screen.

The nice thing about this setup is the complete transparency: if a file is listed in a deployment path, you know it will be copied to the output directory before debugging begins.  If it’s not listed, it won’t get deployed.  It’s that simple.

The next question you might have is: doesn’t this mean that I have a lot of extra work to manually add each of these assembly files?  Especially when it comes to including the necessary references, nobody wants the additional burden of having to manually drag every needed reference into a deployment definition.

This is pretty easy to deal with.  When you add a reference to an assembly, and that referenced assembly isn’t in the .NET Framework (those are accessed via the GAC and therefore don’t need to be included), the IDE can add that assembly to the deployment definition for you.  Additionally, it would be helpful if all referenced assemblies lit up (with a secondary highlight color) when a referencing assembly was selected in the list.  That way, you’d be able to quickly figure out why each assembly was included in that deployment.  And if you select an assembly that requires a missing assembly, the name of any missing assemblies should appear in a general status area.
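
That auto-population isn’t hard to imagine in code, either.  The sketch below uses plain reflection to walk an assembly’s references and collect everything that doesn’t come from the GAC, which is roughly the set a deployment definition would need.  It assumes the referenced assemblies sit next to the root assembly, and the path is made up.

using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

class DeploymentClosure
{
    static void Main()
    {
        string root = @"C:\Source\MyApp\bin\Debug\MyApp.exe";
        string folder = Path.GetDirectoryName(root);

        var toDeploy = new HashSet<string>();
        var pending = new Queue<Assembly>();
        pending.Enqueue(Assembly.LoadFrom(root));

        while (pending.Count > 0)
        {
            foreach (AssemblyName name in pending.Dequeue().GetReferencedAssemblies())
            {
                // Anything that doesn't resolve from the local folder (framework
                // assemblies, for instance) is assumed to come from the GAC.
                string candidate = Path.Combine(folder, name.Name + ".dll");
                if (!File.Exists(candidate))
                    continue;

                Assembly assembly = Assembly.LoadFrom(candidate);
                if (!assembly.GlobalAssemblyCache && toDeploy.Add(candidate))
                    pending.Enqueue(assembly);
            }
        }

        foreach (string path in toDeploy)
            Console.WriteLine(path);
    }
}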

What we end up with is a more explicit and transparent way of dealing with deployment definitions separately from assembly definitions, a clean separation of concepts, and direct control over deployment behavior.  Because deployment intent is specified explicitly, this would be a great starting point for installer technologies to plug into the IDE.

In Visual Studio, a project maps many inputs to many outputs, and confuses deployment and assembly definitions.  A Visual Studio “project” is essentially an “input” concept.  In the approach I’ve outlined here, all definitions are “output” concepts; in other words, items in the proposed solution hierarchy are defined in terms of intended results.  It’s always a good idea to “begin with the end in mind” this way.

Multiple Solution Views

In the screenshot above, you’ll notice there’s a dropdown list called Solution View.  The current view is Deployment; the other option is Assembly.  The reason I’ve included two views is because the same assembly may appear in multiple deployments.  If what you want is a list of unique assemblies, that alternative view should be available.

A New Template System

The other redesign required is around the idea of Visual Studio templates.  Instead of solution, project, and project item templates in Visual Studio, you would have four template types: solution, deployment, assembly, and file.  Consider these examples:

Deployment Template: ASP.NET Web Application

  • $AppDir$
    • Assembly: MyWebApp.dll
      • App.xaml.cs
      • App.xaml    (embedded resource)
      • Main.xaml.cs
      • Main.xaml   (embedded resource)
    • File: Default.aspx
    • File: Web.config
    • Folder: App_Data
      • File: SampleData.dat

Solution Template: Silverlight Solution

  • Deployment: Silverlight Client
    • MySLApp.XAP
      • Assembly: MyClient.dll
        • App.xaml.cs
        • App.xaml    (embedded resource)
        • Main.xaml.cs
        • Main.xaml   (embedded resource)
  • Deployment: ASP.NET Web Application
    • $AppDir$
      • Assembly: MyWebApp.dll
        • YouGetTheIdea.cs
      • Folder: ClientBin
        • MySLApp.XAP (auto-copied from Deployment above)
      • File: Default.aspx
      • File: Web.config

Summary

In this article, we explored several features in modern IDEs (Visual Studio specifically), and some of the ways in which imaginative rethinking could bring substantial improvements to the developer experience.  I have to wonder how quickly a large ship like Visual Studio (with 1.5 million lines of mostly C++ code) could turn and adapt to new ideas like this, or whether it makes sense to start fresh without all the burden of legacy.

Though I have many more ideas to share, especially regarding the build system, multiple-language name resolution and refactoring, and IDE REPL tools, I will save all of that for future articles.

Posted in Cloud Computing, Development Environment, Silverlight, User Interface Design, Visual Studio, Windows Azure | Leave a Comment »

Windows Azure: Blobs and Blocks

Posted by Dan Vanderboom on February 21, 2009

I’ve been busy building a new cloud-based service for the past few weeks, using Windows Azure on the back end and Silverlight for the client.  One of the requirements of my service is to allow users to upload files to a highly scalable Internet storage system.  I’m experimenting with Azure’s blob storage for this, and I have a need to upload these blobs (Binary Large OBjects) in separate blocks.  There are two reasons I can see why you’d want to do this:

  1. Although blobs can be as large as 2 GB in the current technical preview, the largest blob you can put in one operation is 4 MB.  If your file is larger, you have to store separate blocks, and then put a block list to assemble them together and commit them as a blob.
  2. If you want different users to upload different portions of a file, each user will have to upload individual blocks, and you’ll have to put the block list when all blocks are present.  This is something like a reverse BitTorrent or other P2P protocol.

My service needs to deal with separate blocks for the second reason, though the first is likely to be much more common.
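
Either way, the client has to do the same chunking.  Here’s a small sketch of that side of it, independent of the storage API: cut a file into blocks of at most 4 MB and remember the block IDs in order, so the block list can be committed afterward.  The file path and block ID scheme are just for illustration; the actual puts would use the PutBlobBlock and PutBlobBlockList methods shown below.

using System;
using System.Collections.Generic;
using System.IO;

class BlockSplitter
{
    const int MaxBlockSize = 4 * 1024 * 1024;   // 4 MB per put

    static void Main()
    {
        var blockIds = new List<string>();

        using (FileStream file = File.OpenRead(@"C:\Uploads\LargeFile.bin"))
        {
            byte[] buffer = new byte[MaxBlockSize];
            int blockNumber = 0;
            int bytesRead;

            while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                string blockId = "block " + blockNumber++;
                blockIds.Add(blockId);

                using (var block = new MemoryStream(buffer, 0, bytesRead))
                {
                    // Hand each block to storage here (e.g. PutBlobBlock),
                    // then commit blockIds afterward (e.g. PutBlobBlockList).
                }
            }
        }

        Console.WriteLine("Prepared {0} blocks.", blockIds.Count);
    }
}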

Although there’s a good deal of information about blocks and blobs in the REST API for Azure Storage Services, piecing together code to make REST calls with all the appropriate headers (including authentication signatures) isn’t very fun.  Where is the .NET library to make it easy?

There is one, in fact.  If you’ve downloaded and installed the Azure SDK (Jan 2009), you’ll find a samples.zip file that needs to be unzipped, and the solutions built within it.  Particularly, you’ll need the StorageClient solution.  In it, you’ll find that you can save and load blobs (as well as use queues and table storage), but there’s nothing in the API that suggests it supports putting individual blocks, let alone putting block lists to combine all of those blocks into a blob.  The raw state of this API is unfortunate, but the Azure platform is in an early tech preview stage, so we can expect vast improvements in the future.

Until then, however, I dug into it and discovered that there actually was code to put blocks and commit block lists, but it wasn’t exposed in the API (in BlobContainerRest.PutLargeBlobImpl).  Rather, it was called only when the blob you tried to put was over the 4 MB limit.  Taking this code and hacking it a bit, I extended the StorageClient library to provide this needed functionality.

First, add these abstract method definitions to the BlobContainer class (in BlobStorage.cs):

public abstract bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag);

public abstract bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag);

Next, you’ll need to add the implementations to the BlobContainerRest class (in RestBlobStorage.cs):

public override bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag)
{
    NameValueCollection nvc = new NameValueCollection();
    nvc.Add(QueryParams.QueryParamComp, CompConstants.Block);
    nvc.Add(QueryParams.QueryParamBlockId, 
        Convert.ToBase64String(Encoding.Unicode.GetBytes(BlockID)));
    return UploadData(blobProperties, stream, BlockSize, overwrite, eTag, nvc);
}

public override bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag)
{
    bool retval = false;

    using (MemoryStream buffer = new MemoryStream())
    {
        XmlTextWriter writer = new XmlTextWriter(buffer, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement(XmlElementNames.BlockList);
        foreach (string id in BlockIDs)
        {
            writer.WriteElementString(XmlElementNames.Block, 
                Convert.ToBase64String(Encoding.Unicode.GetBytes(id)));
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        buffer.Position = 0; //Rewind

        NameValueCollection nvc = new NameValueCollection();
        nvc.Add(QueryParams.QueryParamComp, CompConstants.BlockList);

        retval = UploadData(blobProperties, buffer, buffer.Length, overwrite, eTag, nvc);
    }

    return retval;
}

In order to test this, I added two buttons to an ASP.NET page, one to upload the blocks and put the block list, and a second to read the blob back to verify the write operations worked:

protected void btnUploadBlobBlocks_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1", 
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    if (!container.DoesContainerExist())
        container.CreateContainer();

    var properties = new BlobProperties("TestBlob");

    // put block 0

    var ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.Write("This is block 0.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock0Success = container.PutBlobBlock(properties, "block 0", ms, ms.Length, true, null);
    }

    // put block 1

    ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.WriteLine("... and this is block 1.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock1Success = container.PutBlobBlock(properties, "block 1", ms, ms.Length, true, null);
    }

    // put block list

    List<string> BlockIDs = new List<string>();
    BlockIDs.Add("block 0");
    BlockIDs.Add("block 1");

    var PutBlockListSuccess = container.PutBlobBlockList(properties, BlockIDs, true, null);
}

protected void btnTestReadBlob_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1",
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    MemoryStream ms = new MemoryStream();
    BlobContents contents = new BlobContents(ms);
    container.GetBlob("TestBlob", contents, false);
    ms.Position = 0;

    using (var sr = new StreamReader(ms))
    {
        string x = sr.ReadToEnd();
        sr.Close();
    }
}

It’s nothing fancy, but if you put a breakpoint on the final sr.Close() call, you’ll see that the value of x contains both blocks of data, equal to “This is block 0…. and this is block 1.”

Posted in Cloud Computing, Design Patterns, Windows Azure | 5 Comments »

Why Oslo is Important

Posted by Dan Vanderboom on January 17, 2009

Contrary to common misunderstanding and speculation, the point of Oslo is not to put programming in the hands of business analysts who want to write their own business rules.  Do I think some of that will happen?  Architects and engineers will try everything they can imagine.  Some of them will succeed in specific niches or scenarios, but it won’t replace application or system design, and it will probably be very limited for the foreseeable future.  Oslo is more about dramatically improving the productivity of designers and developers by generalizing common solution patterns and generating more adaptable tools.

PDC Keynote

Much of the confusion around Oslo occurs for two reasons:

  1. Oslo is designed at a higher level of abstraction than most systems today, so its scope is broad and it will have an impact on virtually every product, solution and service across Microsoft.  It’s difficult to get your head around something that big.
  2. Because of its abstract nature, core concepts are defined in terms that are heavily overloaded, like "Model", "Repository", and "Language".  Once you’ve picked up the lingo and can translate Oslo terminology into language you’re already familiar with, both the concept and magnitude of it will become obvious.

Oslo isn’t something completely new; in fact, Oslo borrows from a lot of previous research and even existing model-driven development tools.  Oslo focuses existing technologies and techniques into a coherent and mature vision of development, combining all parts into a more powerful whole, and promises to deliver a supremely adaptable and efficient platform to develop on.

What Is Oslo?

Oslo is a software factory for generating first-class, tool-supported languages out of your declarative specifications.

A factory is a highly organized production facility
that produces members of a product line
using standardized parts, tools and production processes.

-from a review of Software Factories

The product line is analogous to Oslo’s parsers, transform tools, and IDE plugins for new data models and languages (both textual and visual) that you define.  The standardized parts are Oslo’s library components; the tools are the M languages and the Quadrant/Intellipad application; and the processes are shaped by the flow of data through the Oslo tool chain (see the diagram near the end of this article).

With Oslo, you build the custom tools you need to rapidly build or generate software systems.  It’s all about using the right tool for the job, and having a say in how those tools are shaped to obtain the greatest leverage.

As stated at the home page of softwarefactories.com:

We see a capacity crisis looming. The industry continues to hand-stitch applications distributed over multiple platforms housed by multiple businesses located around the planet, automating business processes like health insurance claim processing and international currency arbitrage, using strings, integers and line by line conditional logic. Most developers build every application as though it is the first of its kind anywhere.

In other words, there’s already a huge shortage of experienced, highly-qualified professionals capable of ensuring the success of these increasingly complex systems, and with the need (and complexity) growing exponentially, our current development practices increasingly fall short of the total demand.

Books like Greenfield’s Software Factories have been advocating building at a higher level of abstraction for years, and my initial reaction was to see it as a natural, evolutionary milestone for a highly mature software system.  However, it’s an awful lot of focused development effort to attain such a level of maturity, and not many organizations are able to pull it off given the state of our current development platforms.

It’s therefore fortuitous that Microsoft teams have taken up the challenge of building these abilities into their .NET platform.  After all, that’s where it really belongs: in the framework.

Unexpected Awesomeness

Oslo of course contains a lot of expected awesomeness, but where it will probably have the most impact in terms of developer productivity is with new first-class languages and language tools.  Why?  It first helps to understand the world of data formats and languages.

We’ve had an explosion of data formats–these mini Domain Specific Languages, if you will (especially in the form of complex configuration files).  As systems evolve and scale, and the ways we can configure and compose our application’s behavior continue to grow, at what point do we perceive that configuration graph as the rich language that it becomes?  Or when our user interfaces evolve from Monolithic to Modular to Composite to Granular Composite (or User Composable), at what point does that persistent object graph become our UX DSL (as with XAML in WPF)?

Sometimes we set our standards too low, or are slow to raise them when the time has come to do so.  With XML we get extensibility in defining languages and we think, "If we can parse it, then we can build a tool over it."  I don’t know about you, but I’d much rather work with rich client software–some kind of designer–over a textual data format any day.

But you know how things go: some company like Microsoft builds a whole bunch of cool stuff, driven off some XML configuration, or they unleash something like XAML on which WPF, WF, and more are built.  XAML is great for tools to read and write, and although XML and XAML are textual and not binary and therefore human readable in a text editor (the original intention behind that term), it’s simply not as easy to read as C# or VB.NET.  That’s why we aren’t all rushing to program everything in XAML.

Companies like Microsoft, building from the bottom up, release their platforms well in advance of the thick client user experiences that make them enjoyable to use and that encourage mass adoption.  Their models, frameworks, and applications are so large now that they’re released in massively differentiated stages, producing a technology adoption gap.

By giving that language a syntax other than XML, however, we can approach it in the same way we approach our program logic: in the most human readable and aesthetically-pleasant way we can devise, resembling our programming languages of choice.

Sometimes, the density of data and its structure in our model is such that a visual editor fails to represent that model well.  Source code is a case in point.  You could create a visual designer to visualize flow control, branching logic, and even complex expression building (like the iTunes Smart Playlist), but code in text format is more appropriate in this kind of scenario, and ends up being more efficient with the existing tooling available.  Especially with an IDE like Visual Studio, we’re working with human-millenia of effort that have gone into the great code editing tools we use today.  Oslo respects this need for choice by offering support for building both visual and textual DSLs, and recognizes the fluent definition of new formats and languages as the bridge to the next quantum leap in productivity.

If we had an easy way of defining languages in formats that we developers felt comfortable working with–as we’re comfortable with our general purpose languages and their rich tool support–then we’d be much more productive in the transition between a technology first being released and later having rich tool support over it.  WPF has taken quite a while to be adopted as much as it has, partly due to tool availability and maturity.  Before Expression Blend or Cider designers were released and hand-coding XAML was the only way, those who braved the angle brackets struggled with it.  As I play with Silverlight, I realize how much must still be done in XAML, and how we still struggle.  It’s simply not as nice to work with as my C# code.  Not as rich, and not as strongly tool-supported.

That’s one place Oslo provides value.  With the ability to define new textual and visual DSLs, with rigorous verification and validation in a rich set of tools, and with the promise of Intellisense, colorization of keywords, operators, constants, and more, the Oslo architects see an opportunity to enhance our development experience in a language-agnostic way, raising the level of abstraction because, as they say, the way to solve any technical problem is to approach it at one higher level of indirection.  Unfortunately, this makes Oslo so generalized and abstract that it’s difficult to grasp and therefore to appreciate its immensity.  Once you can take a step back and see how it fits in holistically, you’ll see that it has the potential to dramatically transform the landscape of software development.

Currently, it’s a lot of work to implement all the language services in Visual Studio to give them as rich an experience as we’ve come to expect with C#, VB.NET, and others.  This is a serious impediment to doing this kind of work, so solving the problem at the level of Oslo drastically lowers the barrier to entry for implementing tool-supported languages.  The Oslo bits I’ve seen and played with are very early in the lifecycle for this massive scope of technology, but the more I think about its potential, the more impressed I am with the fundamental concept.  As Chris Anderson explained in his PDC session on MGrammar, MGrammar was an implementation detail, but sometime around June 2007, that feature team realized just how much customers wanted direct access to it and decided to release MGrammar to the world.

Modeling & The Repository

That’s all well and good for DSLs and language enthusiasts/geeks, but primarily perhaps, Oslo is about the creation, exploration, relation, and execution of models in an interoperable way.  In other words, all of the models that are currently used to describe a software system, or an entire IT environment, are either not encoded formally enough to verify or execute, or they’re encoded or stored in proprietary ways that don’t allow interoperability with other models.  A diagram in Visio or PowerPoint documenting network topology, for example, knows nothing about the component architecture or deployment model of the software systems installed and running on that network.

When people usually talk about models, they imagine high-level architecture documents, overviews used to visually summarize work that is much more granular in nature.  These models aren’t detailed, and they normally aren’t kept up to date and in sync with the current design as changes are made.  But modeling in Oslo is not an attempt to make these visual models contain all of the necessary detail, or to develop software with visual tools exclusively.  Oslo simply provides the tools, both graphical and textual, to define and relate many models.  It will be up to the development community to decide how all these tools are ultimately used, which parts of our systems will be specified in a mix of general purpose, domain specific, and visual languages.  Ultimately, Oslo will provide the material and glue to fill the gaps between the high and low level specifications, and unite them into a common, connected, and much more useful set of data.

To grasp what Oslo modeling is really all about requires that we expand our definition of "model", to see the models expressed in our configuration and XAML files, in our applications’ database schemas, in our entity classes, and so on.  As software grows in complexity and becomes more composable, we can use various languages to model its behavior and store that in the repository for runtime execution, inspection, or reuse by other systems.

This funny and clever Oslo video (reminiscent of The Hitchhiker’s Guide to the Galaxy) explains modeling in the broader sense alluded to here.

If we had some universal container for the storage of all different kinds of models, and a standardized way of relating entities across models, we’d be able to do things like impact analysis, where we could see the effect on software systems if someone were to alter the network it was running on; or powerful data mining on the IT execution environment of a business.

Many different tools, with different audiences, will be able to connect into this repository to manipulate aspects of the models that they understand and have access to.  This is just the tip of the iceberg.  We already model so much of what we do in the IT and software worlds, and as we begin adopting business process middleware and orchestration software like BizTalk, there’s a huge amount of value in those models converging and connecting.  That’s where the Oslo Repository comes in.

Oslo provides interoperability among models in the same way that SOA provides interoperability among services.  Not unlike the interoperability we have now among many different languages all sharing the same CLR specification.

Bridging data models across repositories or in a shared repository is a major step forward.  With Windows Azure and Microsoft’s commitment to their online services platform (and considering the momentum of the SaaS movement with Amazon, Google, and others), shared storage and data sets are the future.  (Check out SQL Data Services if you haven’t already, and watch for some exciting announcements coming later this year!)

The Dichotomy of Data vs. Metadata

Jeff Pinkston from the Oslo team aptly reflects the attitude of the group when he scoffs at the categorical difference between data and metadata.  In terms of storing and querying it, serializing and communicating it, and everything else that matters in enterprise software, data is data and there’s no reason not to treat it the same when it comes to architecting a system.  We have our primary models and our secondary models, our shared models and our protected models, but they’re still just models that shape our software’s behavior, and they share all of the same characteristics when it comes to manipulation and access.  It’s their ultimate effect that differs.

It’s worth noting, I think, the line that’s been drawn between code and data in some programming languages and not in others (C# vs. LISP).  A division has been made for the sake of security rather than necessity.  Machine instruction codes are represented in the same sort of binary data and realized in the same digital circuitry as traditional user data.  It’s tempting to keep things locked down and divided, but as languages evolve to become more late bound and dynamic (and as the tools evolve to make this feasible), there will be more need for the manipulation of expression trees and ASTs.  I strongly suspect the lines will blur until they disappear.
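
C# already gives us a small taste of this blurring with expression trees, where a lambda is captured as a data structure that can be inspected, rewritten, and only then compiled.  A minimal example:

using System;
using System.Linq.Expressions;

class CodeAsData
{
    static void Main()
    {
        // An expression tree is code represented as data: an object graph we
        // can inspect and manipulate before compiling it into a delegate.
        Expression<Func<int, int>> doubleIt = x => x * 2;

        var multiply = (BinaryExpression)doubleIt.Body;
        Console.WriteLine(multiply.NodeType);        // Multiply
        Console.WriteLine(multiply.Right);           // 2

        // Rebuild the tree with a different constant, treating the "code"
        // like any other object graph.
        var tripleIt = Expression.Lambda<Func<int, int>>(
            Expression.Multiply(multiply.Left, Expression.Constant(3)),
            doubleIt.Parameters);

        Console.WriteLine(tripleIt.Compile()(10));   // 30
    }
}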

Schema and Object Instance Languages

In order to define models, we need a tool.  In Oslo, this is a textual language called MSchema and an editor called Intellipad.  I personally think it’s odd to talk people’s ears off about "model, model, model", and then to use the synonym "schema" to name the language, but all of these names could change before they’re shipped for all we know.

This is a simple example of an MSchema document:

module MyModel
{
    type Person
    {
        LastName : Text;
        FirstName : Text;
    }

    People : Person*;
}

By running this through the "M Compiler", a SQL script is generated that will create the appropriate database objects.  Intellipad is able to verify the correctness of your schema, and what’s really nice is that you don’t even have to specify data types when you start sketching out your model.  Defaults are assumed, and you can get more specific as your model evolves.

MGraph is a language for defining instances of objects, constrained by an MSchema and similar in format.  So MSchema is to MGraph what XSD is to XML.

In this article, Lars Corneliussen explains Microsoft’s vision to make MGraph as common as XML is today.  Take a look at his article to see a side-by-side comparison of the same object represented as XML (POX), JSON, and MGraph, and decide for yourself which you like best (or see below).

MSchema and MGraph are easier and more efficient to read and write than XML.  Their message format resembles typical structured programming languages, and developers are already familiar with these formats.  XML is a fine format for a tool; it’s human readable but not human-friendly.  A C-style language, on the other hand, is much more human-friendly than all of the angle brackets and the redundancy (and verbosity) of tag text.  That narrows down our choice to JSON and MGraph.

In JSON, the property/field/attribute names are delimited by quotation marks, suggesting that the whole structure is a dumb property bag.

{
    "LastName" : "Vanderboom",
    "FirstName" : "Dan"
}

MGraph has a very similar syntax, but its attribute property names are recognized and validated by the parser generated from MSchema, so the quotation marks are unnecessary.  It ends up looking more natural, and a little more concise.

{
    LastName : "Vanderboom",
    FirstName : "Dan"
}

Because MGraph is just a message format, and Microsoft’s service offerings already support multiple message formats (SOAP/POX/JSON/etc.), it wouldn’t disrupt any of their architecture to add an MGraph adapter, and I’ll be shocked if I don’t hear about one in their next release.

Meta-Languages and MGrammar

In the same way that Oslo includes a meta-model because it allows us to define models, it also includes a meta-language because it allows us to define languages (as YACC and ANTLR have done).  However, just as Pinkston doesn’t think data and metadata should be treated differently, it makes sense to think of a language that defines languages as just another language.  There is something Zen about that, where the tools somehow seem to bend back upon themselves like one of Escher’s drawings.

[Image: M.C. Escher’s “Drawing Hands”]

Here is an example language defined by MGrammar in a great article on MSDN called MGrammar in a Nutshell:

module SongSample
{
    language Song
    {
        // Notes
        token Rest = "-";
        token Note = "A".."G";
        token Sharp = "#";
        token Flat = "b";
        token RestOrNote = Rest | Note (Sharp | Flat)?;

        syntax Bar = RestOrNote RestOrNote RestOrNote RestOrNote;
        syntax List(element)
          = e:element => [e]
          | es:List(element) e:element => [valuesof(es), e];

        // One or more bars (recursive technique)
        syntax Bars = bs:List(Bar) => Bars[valuesof(bs)];
        syntax ASong = Music bs:Bars => Song[Bars[valuesof(bs)]];
        syntax Songs = ss:List(ASong) => Songs[valuesof(ss)];

        // Main rule
        syntax Main = Album ss:Songs => Album[ss];

        // Keywords
        syntax Music = "Music";
        syntax Album = "Album";

        // Ignore whitespace
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";

        interleave Whitespace = LF | CR | Space;
    }
}

This is a pretty straightforward way to define a language and generate a parser.  Aside from the obvious keywords to define syntax rules and token patterns (with an alternative and more readable format for regular expressions), the => projection operator allows you to shape the MGraph output according to your needs.

I created two simple languages with MGrammar on the plane trip back to Milwaukee from the PDC in November.  The majority of my time was spent fussing with the editor, Intellipad, and for the last half hour I found it very easy to create a language on the fly, extending and changing it through experimentation quickly and easily.  Projections, which are functional expressions in MGrammar used to shape MGraph output, are the most challenging part.  There are a number of techniques that shape the output graph, so it will be good to see how this is approached in future reference examples.

Just before I wrote this, Mike Weinhardt at Microsoft quietly announced that a gallery of example grammars for MGrammar is being put together, pointing to sample grammars for various languages as well as grammars that the community develops, and that it should be available by the end of this month.  These examples, which demonstrate how to define languages and write sensible projections and which come from the developers building MGrammar itself, will be an invaluable tool for teaching you how to use common patterns (just as 101 LINQ Samples did for LINQ).

As Doug Purdy explained on .NET Rocks: "People who are building a domain specific language, and they don’t want to understand how to build a parser, or they’re not language designers.  Actually, they are language designers.  They design a language, but they actually don’t do the whole thing.  They don’t build a parser.  What they do, they just leverage the XML parser.  And what we’re trying to do is provide a toolset for folks where they don’t have to resort to XML in order to do DSLs."

From the same episode, Don Box said of the DSL session at PDC: "I’ve never seen a session with more geek porn in it."

Don: "It’s like crack for developers.  It’s kind of addictive; it takes over your life."

Doug: "If you want the power of Anders in your hand…"

The Tool Chain

Now that we have a better sense of what’s included in Oslo in terms of languages, editors, and the shared repository, we can look at the relationship among the other pieces, which are manifested in the CTP as a set of command-line tools.  In the future, these will integrate into an IDE, most likely Visual Studio.  (I’d expect Intellipad and Quadrant to merge with Visual Studio, but there’s no guarantee this will happen.)

When you create your model with MSchema, you’ll use m to validate that model and generate a SQL script to create a SQL Server 2008 database schema (yes, it only works right now with SQL Server 2008).  You’ll also use the m command to validate your object graph (written in MGraph) against your schema, and translate that into a set of SQL commands to perform inserts and updates against tables.

With enough models, there’ll be huge value in adding yours to the repository.  If you don’t mind writing MGraph or you generate it automatically with something like an MGraphSerializer class in your code, this may be all you need.

If, on the other hand, you decide you could really benefit by defining your own textual language to use instead of MGraph, you can use MGrammar to define a new language.  This language gets compiled by the mg compiler to create your parser, and the mgx command translates code in your new language into an MGraph, which can then be pulled into your database using m.

This diagram depicts the process:

[Diagram: the Oslo tool chain, showing models and DSL code flowing through the m, mg, and mgx tools into the repository]

Other than these command-line tools, Quadrant is the highly extensible visual tool for exploring models graphically, and Intellipad is a different face on the same shell for defining DSLs with MGrammar and writing DSL code, as well as writing and verifying MSchema and MGraph code.

We should see fairly soon the convergence of these three languages (MGraph, MSchema, and MGrammar) into a single M language.  This makes sense, since what you want to project in your DSL should be something within your model, verified by your schema.  This may ultimately make these projections much easier to write.

We’ll also see this tool chain absorbed into multiple development environments, eventually with rich binding across multiple representations of our model, although this will take longer in Visual Studio.

Languages and Nested Languages

I looked at some MService examples, and I can understand Damon’s concern: although it’s nice to have "operation" as a keyword in a service-oriented language, with more keywords giving you the ability to specify aspects of each endpoint and the required communication patterns, enclosing the business logic within that service language is probably not a good idea.  I took this from Dennis van der Stelt’s blog:

service Service
{
  operation PhotoUpload(stream : Stream) : Text
  {
    .PostUriTemplate = "upload";

    index : Text = invoke DateTime.Now.Ticks.ToString();
    filename : Text = "d:\\demo\\photo\\" + index + ".jpg";
    invoke MService.ServiceHelper.StoreInFile(stream, filename);

    return index;
  }
}

Why not?  You’re defining a general purpose language within the curly braces, one capable of defining variables, assigning values, referencing .NET objects, and calling methods.  But why do you want to learn a new language to write services when the language you’re using right now is already supremely capable of that?  Don’t you already know a good syntax for invoking methods (other than "invoke %method%")?  If instead you simply referenced an assembly, type, and method from an MService script, you could externally turn any .NET method with serializable parameters and return value into a service operation by feeding it this kind of file, without having to recompile, and without having to reinvent the wheel.
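
That kind of external binding is trivial to sketch with ordinary reflection.  The assembly, type, and method names below are just placeholders for whatever a declarative service description might name, and the parameter values would really come from the deserialized request.

using System;
using System.IO;
using System.Reflection;

class ExternalOperationBinding
{
    static void Main()
    {
        // Illustrative names a service description file might declare.
        string assemblyPath = @"C:\Services\MService.ServiceHelper.dll";
        string typeName = "MService.ServiceHelper";
        string methodName = "StoreInFile";

        Assembly assembly = Assembly.LoadFrom(assemblyPath);
        Type type = assembly.GetType(typeName, true);        // throw if the type is missing
        MethodInfo operation = type.GetMethod(methodName);

        // Arguments would come from the deserialized request; they're
        // hard-coded here only to keep the sketch short.
        object[] arguments = { Stream.Null, @"d:\demo\photo\1.jpg" };
        object result = operation.Invoke(null, arguments);   // null target: static method

        Console.WriteLine(result ?? "(void)");
    }
}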

The possible exception would be if MGrammar adds the ability (as discussed by speakers at the PDC) of supporting multiple layers of enclosing languages within other languages.  In other words, you could use MService to define operations and their attributes using its own syntax, and within the curly braces that follow, use the C# or VB.NET parsers to process the logic with the comprehension of a separate language.  There are some neat possibilities here, but I expect the development community to be conservative and hesitant about mixing layers of semantics, as there is an awful lot of room for confusion and complexity.  It may be better to leave different language blocks in separate files or containers, and to allow them to reference each other as .NET assemblies and XML files reference each other today.

However, I wouldn’t get too hung up on the early versions of these new languages, or any one language specifically.  The useful, sensible ones that take real developer needs into account and provide the most value will be adopted, and many more will quickly fall into disuse.  But the overall pattern will be for the emergence of an amazing amount of leverage in terms of improving human comprehension and taking advantage of our ability to manipulate structured, symbolic object graphs to build and verify software systems.

Resources

After a few months of research and many hours of writing, I don’t feel like I’ve even scratched the surface.  But instead of giving you an absolutely comprehensive picture, I’m going to stop here and continue in future articles.  In the meantime, check out the following resources.

For an overview of the development paradigm, look for information on language-oriented programming, including an article I wrote that alludes to how "we will have to raise the level of abstraction to a point that may be hard for us to imagine with our existing tools and languages" due to the "precipitous growth of software complexity".  The "community of abstractions" is the model in Oslo-speak.

For Microsoft specific content: there were some great sessions at the PDC (watch the recorded videos).  It was covered (with much confusion) on the .NET Rocks! podcast (here and here) as well as on Software Engineering Radio; and there are lots of bloggers talking about their initial experiences with it, such as Shawn Wildermuth, Lars Corneliussen, and of course Chris Sells and Jeff Pinkston.  The most clear and coherent explanation I’ve heard was from an interview with Ron Jacobs and David Chappell (Ron gave the keynote at MSDN Dev Con and hosted the ARCast podcast for years).  MSDN has at least 29 videos on the Oslo Developer Center, where there’s a good amount of information, including a FAQ.  There’s also the online guide for MGrammar, MGrammar in a Nutshell, and the Oslo team blog.

If you’re interested in creating DSLs, make sure to keep a lookout for details about the upcoming DSL Developers Conference, which is tentatively planned for April 16-17, immediately following the Lang.NET conference (on general purpose languages) on April 14-16.  I’m hoping to be at both this year.  And in case you haven’t heard, Microsoft is planning another PDC Conference for 2009, the first time ever these conferences have run for two consecutive years!  There will no doubt be much more Oslo news and conference material to cover at the PDC in November.

Pluralsight, an instructor-led training company, now teaches a two-day "Oslo" Fundamentals course (and Don Box’s blog is hosted there).

The best way to learn about Oslo, however, is to dive in and use it.  That’s what I’m doing with my newest system, which needs to be modeled from scratch.  So if you haven’t done so already, download the Oslo SDK (link updated to January 2009 SDK) and introduce yourself to the future of modeling and development!

[Click here for the next article in this Oslo series, on common misconceptions and fallacies about Oslo.]

Posted in Data Structures, Development Environment, Distributed Architecture, Language Extensions, Language Innovation, Metaprogramming, Oslo, Problem Modeling, Service Oriented Architecture, Software Architecture, SQL Data Services, Visual Studio, Windows Azure | 44 Comments »

MSDN Developer Conference in Chicago

Posted by Dan Vanderboom on January 13, 2009

I just got home to Milwaukee from the MSDN Developer Conference in Chicago, about a two-hour drive.  I knew that it would be a rehash of the major technologies revealed at the PDC, which I was at in November, so I wasn’t sure how much value I’d get out of it, but I had a bunch of questions about their new technologies (Azure, Oslo, Geneva, VS2010, .NET 4.0, new language stuff), and it just sounded like fun to go out to Fogo de Chao for dinner (a wonderful Brazilian steakhouse, with great company).

So despite my reservations, I’m glad I went.  I think it also helped that I’ve had since November to research and digest all of this new stuff, so that I could be ready with good questions to ask.  There’ve been so many new announcements, it’s been a little overwhelming.  I’m still picking up the basics of Silverlight/WPF and WCF/WF, which have been out for a while now.  But that’s part of the fun and the challenge of the software industry.

Sessions

With some last minute changes to my original plan, I ended up watching all four Azure sessions.  All of the speakers did a great job.  That being said, “A Lap Around Azure” was my least favorite content because it was so introductory and general.  But the opportunity to drill speakers for information, clarification, or hints of ship dates made it worth going.

I was wondering, for example, if the ADO.NET Data Services Client Library, which talks to a SQL Server back end, can also be used to point to a SQL Data Services endpoint in the cloud.  And I’m really excited knowing now that it can, because that means we can use real LINQ (not weird LINQ-like syntax in a URI).  And don’t forget Entities!
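
For those who haven’t used the client library, here’s roughly what that looks like.  The service URI and entity type are invented for illustration, and the LINQ query gets translated into the service’s URI query syntax for you.

using System;
using System.Data.Services.Client;   // ADO.NET Data Services client library
using System.Linq;

// A hypothetical entity type, just for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}

class DataServicesQuery
{
    static void Main()
    {
        // The endpoint URI is made up; point it at whatever service you're testing.
        var context = new DataServiceContext(new Uri("https://example.com/MyService.svc"));

        var customers =
            from c in context.CreateQuery<Customer>("Customers")
            where c.City == "Milwaukee"
            select c;

        foreach (Customer c in customers)
            Console.WriteLine(c.CustomerID);
    }
}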

I also learned that though my Mesh account (which I love and use every day) is beta, there’s a CTP available for developers that includes new features like tracking of Mesh Applications.  I’ve been thinking about Mesh a lot, not only because I use it, but because I wanted to determine if I could use the synchronization abilities in the Mesh API to sync records in a database.

<speculation Mode=”RunOnSentence”>
If Microsoft is building this entire ecosystem of interoperable services, and one of them does data storage and querying (SQL Data Services), and another does synchronization and conflict resolution (Mesh Services)–and considering how Microsoft is making a point of borrowing and building on existing knowledge (REST/JSON/etc.) instead of creating a new proprietary stack–isn’t it at least conceivable that these two technologies would at some point converge in the future into a cloud data services replication technology?
</speculation>

I’m a little disappointed that Ori Amiga’s Mesh Mobile wasn’t mentioned.  It’s a very compelling use of the Mesh API.

The other concern I’ve had lately is the apparent immaturity of SQL Data Services.  As far as what’s there in the beta, it’s tables without enforceable schemas (so far), basic joins, no grouping, no aggregates, and a need to manually partition across virtual instances (and therefore to also deal with the consequences of that partitioning, which affects querying, storage, etc.).  How can I build a serious enterprise, Internet-scale system without grouping or aggregates in the database tier?  But as several folks suggested and speculated, Data Services will most likely have these things figured out by the time it’s released, which will probably be the second half of 2009 (sooner than I thought).

Unfortunately, if you’re using Mesh to synchronize a list of structured things, you don’t get the rich querying power of a relational data store; and if you use SQL Data Services, you don’t get the ability to easily and automatically synchronize data with other devices.  At some point, we’ll need to have both of these capabilities working together.

When you stand back and look at where things are going, you have to admit that the future of SQL Data Services looks amazing.  And I’m told this team is much further ahead than some of the other teams in terms of robustness and readiness to roll out.  In the future (post 2009), we should have analytics and reporting in the cloud, providing Internet-scale analogues to their SQL Analysis Server and SQL Reporting Services products, and then I think there’ll be no stopping it as a mass adopted cloud services building block.

Looking Forward

The thought that keeps repeating in my head is: after we evolve this technology to a point where rapid UX and service development is possible and limitless scaling is reached in terms of software architecture, network load balancing, and hardware virtualization, where does the development industry go from there?  If there are no more rungs of the scalability ladder we have to climb, what future milestones will we reach?  Will we have removed the ceiling of potential for software and what it can accomplish?  What kind of impact will that have on business?

Sometimes I suspect the questions are as valuable as the answers.

Posted in ADO.NET Data Services, Conferences, Distributed Architecture, LINQ, Mesh, Oslo, Service Oriented Architecture, SQL Analysis Services, SQL Data Services, SQL Reporting Services, SQL Server, Virtualization, Windows Azure | 1 Comment »