Critical Development

Enterprise modeling, design, development, languages, and tools.

Archive for the ‘Functional Programming’ Category

Better Tool Support for .NET

Posted by Dan Vanderboom on September 7, 2009

Productivity Enhancing Tools

Visual Studio has come a long way since Visual Studio .NET debuted in 2002.  With the imminent release of Visual Studio 2010, we’ll see a desperately needed overhaul of the archaic COM extensibility mechanisms (to support the Managed Package Framework, as well as MEF, the Managed Extensibility Framework) and a WPF-based redesign of the user interface, which I’ve been pushing for and predicted as inevitable quite some time ago.

For many alpha geeks, the Visual Studio environment has been extended with excellent third-party, productivity-enhancing tools such as CodeRush and Resharper.  I personally feel that the Visual Studio IDE team has been slacking in this area, providing only weak support for refactoring, code navigation, and Intellisense improvements.  While I understand their desire to avoid stepping on partners’ toes, this is one area where I think it makes sense for them to invest deeply.  In fact, I think a new charter for a Developer Productivity Team is warranted (or an expansion of their team if it already exists).

Unfortunately, only a minority of .NET developers know about and use these third-party tools, and the .NET community as a whole would without a doubt be significantly more productive if these tools were installed in the IDE from day one.  It would also help to overcome resistance from development departments in larger organizations that are wary of third-party plug-ins, perhaps because so many of them are unstable.  Microsoft should consider purchasing one or both of these products, or paying a licensing fee to include them in every copy of Visual Studio.  Doing so, in my opinion, would make them heroes in the eyes of the overwhelming majority of .NET developers around the world.

It’s not that I mind paying a few hundred dollars for these tools.  Far from it!  The tools pay for themselves very quickly in time saved.  The point is to make them ubiquitous: to make high-productivity coding a standard of .NET development instead of a nice add-on that is only sometimes accepted.

Consider this just from the perspective of watching conference speakers code up samples.  How many of them don’t use such a tool in their demonstrations simply because they don’t want to confuse their audience with an unfamiliar development interface?  How many more demonstrations could they complete in the limited time they have available if they felt comfortable using these tools in front of the masses?  You know you pay good money to attend these conferences.  Wouldn’t you like to cover significantly more ground while you’re there?  This is likely to happen only when the tool’s delivery vehicle is Visual Studio itself.  Damon Payne makes a similar case for the inclusion of the Managed Extensibility Framework in .NET Framework 4.0: build it into the core and people will accept it.

The Gorillas in the Room

CodeRush and Resharper have both received recent mention in the Hanselminutes podcast (episode 196 with Mark Miller) and in the Deep Fried Bytes podcast (episode 35 with Corey Haines).  If you haven’t heard of these tools, I recommend watching these videos on their use.

For secondary information on CodeRush, DXCore, and the principles with which they were designed, I recommend these episodes of DotNetRocks:

I don’t mean to be so biased toward CodeRush, but it is the tool I’m personally familiar with, it has a broader range of functionality, and it seems to get the majority of press coverage.  However, those who talk about Resharper speak highly of it, so I recommend you check out both of them to see which one works best for you.  But above all: go check them out!

Refactor – Rename

Refactoring code is something we should all be doing constantly to avoid the accumulation of technical debt as software projects and the requirements on which they are based evolve.  There are several refactorings built into Visual Studio for C#, and many more in third-party tools for several languages, but I’m going to focus here on what I consider to be the most important refactoring of them all: Rename.

Why is Rename so important?  Because it’s so commonly used, and it has such far-reaching effects.  It is frequently the case that we give poor names to identifiers before we clearly understand their role in the “finished” system, and even more frequent that an item’s role changes as the software evolves.  Failure to rename items to accurately reflect their current purpose is a recipe for code rot and greater code maintenance costs, developer confusion, and therefore buggy logic (with its associated support costs).

When I rename an identifier with a refactoring tool, all of the references to that identifier are also updated.  There might be hundreds of references.  In the days before refactoring tools, one would accomplish this with Find-and-Replace, but this is dangerous.  Even with options like “match case” and “match whole word”, it’s easy to rename the wrong identifiers, rename pieces of string literals, and so on; and if you forget to set these options, it’s worse.  You can go through each change individually, but that can take a very long time with hundreds of potential updates and is a far cry from a truly intelligent update.
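To make the Find-and-Replace risk concrete, here is a minimal C# sketch of a text-based rename that honors both “match case” and “match whole word” by using a case-sensitive regular expression with word boundaries.  The `NaiveRename` class is hypothetical, not any tool’s API.  It still rewrites the word inside a string literal, because a regular expression has no idea what a string literal is.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration of a purely textual rename.
public static class NaiveRename
{
    // \b anchors approximate "match whole word"; Regex is case-sensitive by default,
    // which approximates "match case". Nothing here understands C# syntax.
    public static string ReplaceWholeWord(string source, string oldName, string newName)
    {
        return Regex.Replace(source, @"\b" + Regex.Escape(oldName) + @"\b", newName);
    }
}
```

For example, renaming `Count` to `Total` in `int Count = 0; Log("Count reset"); Count++;` produces `int Total = 0; Log("Total reset"); Total++;`.  The two real references are updated, but so is the log message, which is exactly the kind of collateral damage a semantic Rename avoids.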

Ultimately, the intelligence of the Rename refactoring provides safety and confidence for making far-reaching changes, encouraging more aggressive refactoring practices on a more regular basis.

Abolishing Magic Strings

I am intensely passionate about any tool or coding practice that encourages refactoring and better code hygiene.  One example of such a coding practice is the use of lambda expressions to select identifiers instead of evil “magic strings”.  Taking an example from my article on dynamically sorting Linq queries, the use of “magic strings” would force me to write something like this to sort a Linq query dynamically:

Customers = Customers.Order("LastName").Order("FirstName", SortDirection.Descending);

The problem here is that “LastName” and “FirstName” are oblivious to the Rename refactoring.  Using the refactoring tool might give me a false sense of security in thinking that all of my references to those two fields have been renamed, leading me to The Pit of Despair.  Instead, I can define an overload that accepts a lambda expression and use it like this:

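public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> Source,
    Expression<Func<T, object>> Selector, SortDirection SortDirection = SortDirection.Ascending)
{
    // Value-type members are boxed to object, which wraps the member access in a Convert node
    Expression body = Selector.Body is UnaryExpression unary ? unary.Operand : Selector.Body;
    MemberExpression member = body as MemberExpression
        ?? throw new ArgumentException("Selector must be a member access, such as c => c.LastName");
    return Order(Source, member.Member.Name, SortDirection);
}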

Customers = Customers.Order(c => c.LastName).Order(c => c.FirstName, SortDirection.Descending);

This requires a little understanding of the structure of expression trees to implement, but the benefit is huge: I can now use the refactoring tool with much greater confidence that I’m not introducing subtle reference bugs into my code.  For a single call like this one the benefit may look marginal, but multiply it by hundreds or thousands of magic string references, and the effort involved in refactoring them by hand quickly becomes overwhelming.
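Since the string-based Order method from that earlier article isn’t shown here, the following self-contained sketch fills it in with a simple reflection-based stand-in, so the lambda overload has something to delegate to.  The `SortDirection` enum, the `Customer` class, and the `MemberNames` helper are assumptions for illustration.  Chaining a second Order call onto an already-sorted sequence is treated as a secondary sort key (ThenBy), which is one reasonable reading of how the original chains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-ins for the article's types.
public enum SortDirection { Ascending, Descending }

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

public static class MemberNames
{
    // Returns "LastName" for c => c.LastName. Value-type members such as Age are boxed
    // to object, which wraps the member access in a Convert node, so unwrap that first.
    public static string Of<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body is UnaryExpression unary && unary.NodeType == ExpressionType.Convert
            ? unary.Operand
            : selector.Body;
        if (body is MemberExpression member) return member.Member.Name;
        throw new ArgumentException("Expected a member access such as c => c.LastName", nameof(selector));
    }
}

public static class SortExtensions
{
    // "Magic string" version: the property is looked up by name at runtime,
    // so a renamed property is only noticed when this throws.
    public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> source, string propertyName,
        SortDirection direction = SortDirection.Ascending)
    {
        PropertyInfo property = typeof(T).GetProperty(propertyName)
            ?? throw new ArgumentException($"{typeof(T).Name} has no property '{propertyName}'");
        Func<T, object> key = item => property.GetValue(item);

        // A second Order call on an already-ordered sequence becomes a secondary sort key.
        if (source is IOrderedEnumerable<T> ordered)
            return direction == SortDirection.Ascending ? ordered.ThenBy(key) : ordered.ThenByDescending(key);
        return direction == SortDirection.Ascending ? source.OrderBy(key) : source.OrderByDescending(key);
    }

    // Refactoring-friendly overload: the name comes from the lambda, so Rename keeps it in sync.
    public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> source,
        Expression<Func<T, object>> selector, SortDirection direction = SortDirection.Ascending)
        => source.Order(MemberNames.Of(selector), direction);
}
```

With these definitions, `customers.Order(c => c.LastName).Order(c => c.FirstName, SortDirection.Descending)` sorts by last name and then by first name in descending order, and renaming `LastName` with the refactoring tool updates the lambda along with the property.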

Coding in this style is most valuable when it’s a solution-wide convention.  So long as you have code that strays from this design philosophy, you’ll find yourself grumbling and reaching for the inefficient and inelegant Find-and-Replace tool.  The only time it really becomes an issue, then, is when accessing libraries that you have no control over, such as Linq-to-Entities and the Entity Framework, which make extensive use of magic strings.  In the case of EF, this is mitigated somewhat by your ability to regenerate the code it uses.  In other libraries, it may be possible to write extension methods like the Order method shown above.

It’s my earnest hope that library and framework authors such as the .NET Framework team will seriously consider alternatives to, and an abolition of, “magic strings” and other coding practices that frustrate otherwise-powerful refactoring tools.

Refactoring Across Languages

A tool is only as valuable as it is practical.  The Rename refactoring is more valuable when coding practices don’t frustrate it, as explained above.  Another barrier to the practical use of this tool is the prevalence of multiple languages within and across projects in a Visual Studio solution.  The definition of a project as a single-language container is dubious when you consider that a C# or VB.NET project may also contain HTML, ASP.NET, XAML, or configuration XML markup.  These are all languages with their own parsers and other language services.

So what happens when identifiers are shared across languages and a Rename refactoring is executed?  It depends on the languages involved, unfortunately.

When I rename a C# class in Visual Studio, the x:Class value in the corresponding XAML is also updated.  What we’re seeing here is cross-language refactoring, but unfortunately it only works in one direction.  There is no refactor command to update the x:Class value from the XAML editor, so changing it manually leaves my C# class sadly out of sync.  Furthermore, this seems to be XAML-specific.  If I rename an .aspx.cs class, the Inherits attribute of the Page directive in the .aspx file isn’t updated.

How often do you think someone would want to rename the class in an ASP.NET page’s code-behind file and yet not want to change the Inherits attribute?  Probably not very often (okay, probably NEVER).  This is a matter of having sensible defaults.  When you change an identifier name in this way, the development environment does not respond sensibly by default, forcing the developer to do extra work and waste time.  This is a failure in UI design for the same reason that Intellisense has been such a resounding success: Intellisense anticipates our needs and works with us; the failure to keep identifiers in sync by default is diametrically opposed to this intelligence.  It makes for a fragmented and inconsistent IDE design, and I hope it will be addressed in the near future.

The problem should be recognized as systemic, however, and addressed in a generalized way.  Making individual improvements in the relationships between pairs of languages has been almost adequate, but I think it would behoove us to take a step back and look at the future family of languages supported by the IDE, and at the circumstances that will soon be upon us with Microsoft’s Oslo platform, which enables developers to more easily build tool-supported languages (especially DSLs, Domain Specific Languages).

Even without Oslo, we have seen a proliferation of languages: IronRuby, IronPython, F#, and the list goes on.  A refactoring tool that is hard-coded for specific languages will be unable to keep pace with the growing family of .NET and markup languages, and certainly unable to deal with the demands of every DSL that emerges in the next few years.  If instead we had a way to identify our code identifiers to the refactoring tool, and indicate how they should be bound to identifiers in other languages in other files, or even other projects or solutions, the tools would be able to make some intelligent decisions without understanding each language ahead of time.  Each language’s language service could supply this information.  For more information on Microsoft Oslo and its relationship to a world of many languages, see my article on Why Oslo Is Important.
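To make the idea a little more concrete, here is a minimal sketch of what such a contract might look like.  Every type in it (`IdentifierOccurrence`, `IIdentifierProvider`, `FixedProvider`, `IdentifierBindingRegistry`) is hypothetical; no such API exists in Visual Studio or Oslo.  Each language service reports the identifiers it knows about, tagged with a shared binding key, and the registry renames every occurrence bound to that key without parsing any language itself.  A real implementation would also have to rewrite the text of each file.

```csharp
using System.Collections.Generic;

// Hypothetical: one place an identifier appears, in any language, tagged with a
// binding key that says which logical symbol it refers to.
public sealed class IdentifierOccurrence
{
    public IdentifierOccurrence(string bindingKey, string language, string file, string name)
    {
        BindingKey = bindingKey; Language = language; File = file; Name = name;
    }

    public string BindingKey { get; }
    public string Language { get; }
    public string File { get; }
    public string Name { get; internal set; }
}

// Hypothetical contract each language service would implement.
public interface IIdentifierProvider
{
    IEnumerable<IdentifierOccurrence> GetOccurrences();
}

// Hypothetical in-memory provider standing in for a real language service.
public sealed class FixedProvider : IIdentifierProvider
{
    private readonly IdentifierOccurrence[] occurrences;
    public FixedProvider(params IdentifierOccurrence[] occurrences) => this.occurrences = occurrences;
    public IEnumerable<IdentifierOccurrence> GetOccurrences() => occurrences;
}

// Groups occurrences from all providers by binding key, so a rename started
// from any language reaches every bound occurrence.
public sealed class IdentifierBindingRegistry
{
    private readonly Dictionary<string, List<IdentifierOccurrence>> byKey =
        new Dictionary<string, List<IdentifierOccurrence>>();

    public void Register(IIdentifierProvider provider)
    {
        foreach (IdentifierOccurrence occurrence in provider.GetOccurrences())
        {
            if (!byKey.TryGetValue(occurrence.BindingKey, out List<IdentifierOccurrence> bound))
            {
                bound = new List<IdentifierOccurrence>();
                byKey.Add(occurrence.BindingKey, bound);
            }
            bound.Add(occurrence);
        }
    }

    // Renames every occurrence bound to the edited one and reports how many were updated.
    public int Rename(IdentifierOccurrence edited, string newName)
    {
        List<IdentifierOccurrence> bound = byKey[edited.BindingKey];
        foreach (IdentifierOccurrence occurrence in bound) occurrence.Name = newName;
        return bound.Count;
    }
}
```

Because the registry only compares binding keys, a rename started from the XAML side reaches the C# class just as easily as the reverse, which is the direction Visual Studio doesn’t handle.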

Without this cross-language identifier binding feature, we’ll remain in refactoring hell.  I suggested this kind of multi-master synchronization of a model across languages to the Oslo team as a feature, and, much to my dismay, the suggestion was rejected.  I’m not sure whether the Oslo team is the right group to address this, or whether it’s more appropriate for the Visual Studio IDE team, so I’m not willing to give up on it yet.

A Default of Refactor-Rename

The next idea I’d like to propose here is that the Rename refactoring is, in fact, a sensible default behavior.  In other words, when I edit an identifier in my code, I more often than not want all of the references to that identifier to change as well.  This is based on my experience in invoking the refactoring explicitly countless times, compared to the relatively few times I want to “break away” that identifier from all the code that references it.

Think about it: if you have 150 references to variable Foo, and you change Foo to FooBar, you’re going to have 150 broken references.  Are you going to create a new Foo variable to replace them?  That workflow doesn’t make any sense.  Why not just start editing the identifier and have the references update themselves implicitly?  If you want to be aware of the change, it would be trivial for the IDE to indicate the number of references that were updated behind the scenes.  Then, if for some reason you really did want to break the references, you could explicitly launch a refactoring tool to “break references”, allowing you to edit that identifier definition separately.

The challenge that comes to mind with this default behavior concerns code that spans solutions that aren’t loaded into the IDE at the same time.  In principle, this could be dealt with by logging the refactoring in a location that all of the solutions involved can access and that gets checked into source control.  The next time the other solutions are loaded, the log is read and the identifiers are renamed as specified.
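A rename log along these lines could be as simple as an ordered list of entries.  This is a hypothetical sketch: `RenameLog` and `RenameEntry` are illustrative names, a solution’s identifiers are modeled as a plain list of strings rather than a real symbol table, and persisting the log to a file in source control is left out.  Each solution remembers the last sequence number it applied, and replaying entries in order handles chains of renames such as Foo to FooBar to Widget.

```csharp
using System.Collections.Generic;

// Hypothetical entry in a shared, source-controlled rename log.
public sealed class RenameEntry
{
    public RenameEntry(int sequence, string oldName, string newName)
    {
        Sequence = sequence; OldName = oldName; NewName = newName;
    }

    public int Sequence { get; }
    public string OldName { get; }
    public string NewName { get; }
}

public sealed class RenameLog
{
    private readonly List<RenameEntry> entries = new List<RenameEntry>();

    // Called by the solution in which the rename happened.
    public void Record(string oldName, string newName) =>
        entries.Add(new RenameEntry(entries.Count + 1, oldName, newName));

    // Called when another solution is loaded: applies, in order, every rename recorded
    // after 'lastApplied' to that solution's identifiers, and returns the sequence
    // number the solution should remember for next time.
    public int Replay(IList<string> identifiers, int lastApplied)
    {
        int applied = lastApplied;
        foreach (RenameEntry entry in entries)
        {
            if (entry.Sequence <= lastApplied) continue;
            for (int i = 0; i < identifiers.Count; i++)
            {
                if (identifiers[i] == entry.OldName) identifiers[i] = entry.NewName;
            }
            applied = entry.Sequence;
        }
        return applied;
    }
}
```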

Language Property Paths

If you’ve done much development with Silverlight or WPF, you’ve probably run into the PropertyPath class when using data binding or animation.  PropertyPath objects represent a traversal path to a property such as “Company.CompanyName.Text”.  The travesty is that they’re always “magic strings”.
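To see why string paths are fragile, here is a minimal sketch of how a path such as “Company.CompanyName” can be resolved with reflection, one property at a time.  This is not WPF’s or Silverlight’s actual PropertyPath implementation, and the `PropertyPathResolver`, `Contact`, and `Company` types are hypothetical; the point is that a misspelled segment is discovered only when the code runs.

```csharp
using System;
using System.Reflection;

// Hypothetical sample types.
public class Company { public string CompanyName { get; set; } }
public class Contact { public Company Company { get; set; } }

public static class PropertyPathResolver
{
    // Walks a dotted path one property at a time. Because the path is just a string,
    // a typo such as "CompnayName" only surfaces here, at runtime.
    public static object Resolve(object source, string path)
    {
        object current = source;
        foreach (string segment in path.Split('.'))
        {
            if (current == null) return null;
            PropertyInfo property = current.GetType().GetProperty(segment)
                ?? throw new InvalidOperationException(
                    $"'{segment}' is not a property of {current.GetType().Name}");
            current = property.GetValue(current);
        }
        return current;
    }
}
```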

My argument is that the property path is such an important construct that it deserves to be a core part of language syntax instead of just a type in some UI-platform-specific library.  When I built a data binding library for Windows Forms, I created my own property path syntax and type, and there are countless non-UI scenarios in which this construct would also be incredibly useful.

The advantage of having a language like C# understand property path syntax is that you avoid a whole class of problems that developers have used “magic strings” to solve.  The compiler can then make intelligent decisions about the correctness of paths, and errors can be identified very early in the cycle.

Imagine being able to pass property paths to methods or return them from functions as first-class citizens.  Instead of writing this:

Binding NameTextBinding = new Binding("Name") { Source = customer1 };

… we could write something like this, have access to the Rename refactoring, and even get Intellisense support when hitting the dot (.) operator:

Binding NameTextBinding = new Binding(@Customer.Name) { Source = customer1 };

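In this code example, I use the fictitious @ operator to inform the compiler that I’m specifying a property path and not trying to reference a static property called Name on the Customer class.  (In today’s C#, a leading @ already marks a verbatim identifier, so @Customer simply means Customer; a real design would need a different token.)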
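Until something like this exists, the lambda trick from the sorting example can approximate it.  The sketch below walks a lambda’s chain of member accesses and produces the dotted path string, so `(Customer c) => c.Address.City` yields “Address.City” and the Rename refactoring keeps it up to date.  The `PropertyPaths` helper and the `Customer` and `Address` types are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical sample types.
public class Address { public string City { get; set; } }
public class Customer
{
    public string Name { get; set; }
    public Address Address { get; set; }
}

public static class PropertyPaths
{
    // Turns c => c.Address.City into "Address.City" by walking the member accesses
    // from the outermost member back to the lambda's parameter.
    public static string Of<T, TResult>(Expression<Func<T, TResult>> selector)
    {
        var segments = new List<string>();
        Expression current = selector.Body;
        while (current is MemberExpression member)
        {
            segments.Insert(0, member.Member.Name);
            current = member.Expression;
        }
        if (!(current is ParameterExpression))
            throw new ArgumentException("Expected only member accesses on the lambda parameter", nameof(selector));
        return string.Join(".", segments);
    }
}
```

In WPF or Silverlight, the resulting string could be passed to the `Binding(string path)` constructor, as in `new Binding(PropertyPaths.Of((Customer c) => c.Name)) { Source = customer1 };`, at the cost of noisier syntax than a language-level property path would offer.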

With property paths in the language, we could solve our dynamic Linq sort problem cleanly, without using lambda expressions to hack around the problem:

Customers = Customers.Order(@Customer.LastName).Order(@Customer.FirstName, SortDirection.Descending);

That looks and feels right to me.  How about you?

Summary

There are many factors of developer productivity, and I’ve established refactoring as one of them.  In this article I discussed tooling and coding practices that support or frustrate refactoring.  We took a deep look into the most important refactoring we have at our disposal, Rename, and examined how to get the greatest value out of it in terms of personal habits, as well as long-term tooling vision and language innovation.  I proposed including property paths in language syntax because of their general usefulness and their ability to solve a whole class of problems that have traditionally been solved using problematic “magic strings”.

It gives me hope to see the growing popularity of Fluent Interfaces and the use of lambda expressions to provide coding conventions that can be verified by the compiler, and a growing community of bloggers (such as here and here) writing about the abolition of “magic strings” in their code.  We can only hope that Microsoft program managers, architects, and developers on the Visual Studio and .NET Framework teams are listening.

Posted in Data Binding, Data Structures, Design Patterns, Development Environment, Dynamic Programming, Functional Programming, LINQ, Language Innovation, Oslo, Silverlight, Software Architecture, User Interface Design, Visual Studio, Visual Studio Extensibility, Windows Forms

The Future of Programming Languages

Posted by Dan Vanderboom on November 6, 2008

Two of the best sessions at the PDC this year were Anders Hejlsberg’s The Future of C# and a panel on The Future of Programming.

A lot has been said and written about dynamic programming, metaprogramming, and language syntax extensions–not just academically over the past few decades, but also as a recently growing buzz among the designers and users of mainstream object-oriented languages.

Anders Hejlsberg

Dynamic Programming

After a scene-setting tour through the history and evolution of C#, Anders addressed how C# 4.0 would allow much simpler interoperation between C# and dynamic languages.  I’ve been following Charlie Calvert’s Language Futures website, where they’ve been discussing these features early on with the development community.  It’s nice to see how seriously they take the feedback they’re getting, and I really think it’s going to have a positive impact on the language as a whole.  Initial ideas revolved around a new kind of code block, written something like dynamic { DynamicStuff.SomeUndefinedProperty = "whatever"; }.

But at the PDC we saw that dynamic will instead be a type for our dynamic objects, so dynamic member lookup will be allowed only on variables of that type.  Anders’ demo showed off interactions with JavaScript and Python, as well as Office via COM, all without the ugly Type.Missing parameters (optional parameter support also played a part in that).  Other ideas revolved around easier Reflection access and dynamic access to the nodes of XML documents.
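For readers who haven’t tried these features yet, here is a small sketch of the two C# 4.0 features mentioned above, using `ExpandoObject` (from System.Dynamic) as the dynamic object.  The `DynamicDemo` class and its methods are illustrative only.  It shows member lookups on a `dynamic` parameter being bound at runtime, and optional and named parameters letting a caller omit trailing arguments.

```csharp
using System;
using System.Dynamic;

public static class DynamicDemo
{
    // Member lookups on a 'dynamic' parameter are resolved at runtime, so any object
    // with Name and Age members works, and a missing member fails only at runtime.
    public static string Describe(dynamic item)
    {
        string name = item.Name;
        int age = item.Age;
        return $"{name} ({age})";
    }

    // Optional parameters: callers may omit trailing arguments or name the ones they pass.
    public static string Greet(string name, string greeting = "Hello", string punctuation = "!")
        => greeting + ", " + name + punctuation;
}
```

Calling `Describe` with an `ExpandoObject` whose `Name` is “Ada” and `Age` is 36 returns “Ada (36)”, while `Greet("Ada", punctuation: ".")` returns “Hello, Ada.” because the omitted `greeting` argument takes its default.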

Meta-Programming

At the end of his talk, Anders showed a stunning demo of metaprogramming working within C#.  It was an early prototype, so not all language features were supported, but it worked similarly to Eval: the code was constructed inside a string and then compiled at runtime.  It was flexible and powerful enough that he could create delegates to functions that he Eval’ed into existence.  Someone in the audience asked how this was different from Lisp macros, to which Anders replied: “This is basically Lisp macros.”
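The prototype compiled C# source from a string, which isn’t something I can reproduce here.  A related but more limited technique that already ships in .NET 3.5 is the expression tree API in System.Linq.Expressions: code is built as a data structure at runtime and compiled into a delegate.  The sketch below (`RuntimeFunctions` is a hypothetical name) builds `(x, y) => x * x + y` that way; it illustrates creating functions at runtime, not Lisp-style macros.

```csharp
using System;
using System.Linq.Expressions;

public static class RuntimeFunctions
{
    // Builds (x, y) => x * x + y as an expression tree at runtime,
    // then compiles the tree into an ordinary delegate.
    public static Func<int, int, int> BuildSquarePlus()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Parameter(typeof(int), "y");
        Expression body = Expression.Add(Expression.Multiply(x, x), y);
        return Expression.Lambda<Func<int, int, int>>(body, x, y).Compile();
    }
}
```

Calling `RuntimeFunctions.BuildSquarePlus()(3, 4)` returns 13.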

Before you get too excited (or worried) about this significant bit of news, Anders made no promises about when metaprogramming would be available, and he subtly suggested that it may very well be a post-4.0 feature.  As he said in the Future of Programming Panel, however: “We’re rewriting the compiler in managed code, and I’d say one of the big motivators there is to make it a better metaprogramming system, sort of open up the black box and allow people to actually use the compiler as a service…”

Regardless of when it arrives, I hope they will give serious consideration to providing syntax checking of this macro or meta code, instead of treating it blindly at compile-time as a “magic string”, as has so long plagued the realm of data access.  After all, one of the primary advantages of Linq is to enable compile-time checking of queries, to enforce not only strict type checking, but to also more fundamentally ensure that data sources and their members are valid.  The irregularity of C#’s syntax, as opposed to Lisp, will make that more difficult (thanks to Paul for pointing this out), but I think most developers will eventually agree it’s a worthwhile cause.  Perhaps support for nested grammars in the generic sense will set the stage for enabling this feature.

Language Syntax Extensions

If metaprogramming is about making the compiler available as a service, language extensions are about making the compiler service transparent and extensible.

The majority (but not all) of the language design panel stressed caution in evolving and customizing language syntax and discussed the importance of syntax at length, but they’ve been considering the demands of the development community seriously.  At times Anders vacillated between trying to offer alternatives and admitting that, in the end, customization of language syntax by developers would prevail; and that what’s important is how we go about enabling those scenarios without destroying our ability to evolve languages usefully, avoiding their collapse from an excess of ambiguity and inconsistency in the grammar.

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute.  And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming.  And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg
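For reference, here is how the loop Anders describes looks with the Task Parallel Library in .NET Framework 4.0 (System.Threading.Tasks).  The shipped `Parallel.For` takes an inclusive start index and an exclusive end index rather than a count, plus a lambda that receives the current index.  The `ParallelDemo` class is just an illustration; because iterations may run concurrently, the running total is updated with `Interlocked.Add`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // The loop body is a lambda; iterations 0 .. count-1 may run on several threads,
    // so the shared total is updated atomically.
    public static long SumOfSquares(int count)
    {
        long total = 0;
        Parallel.For(0, count, i => Interlocked.Add(ref total, (long)i * i));
        return total;
    }
}
```

`ParallelDemo.SumOfSquares(10)` returns 285 (0 + 1 + 4 + ... + 81).  Squint, and `Parallel.For(0, count, i => ...)` does read like a loop statement.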

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before.  I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0.  And he’s right: if you squint, it almost looks like new syntax.  The problem is that programmers don’t want to squint at their code.  As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look.  This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language.
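As a small illustration of this squint-and-it’s-syntax style, here is a hypothetical sketch built only from an extension method, lambdas, and a fluent builder.  Neither `Times` nor `Retry` is a .NET API; they are invented names showing how close these features get to new control-flow constructs while remaining recognizably method calls.  (The `when` exception filter used in `Run` requires C# 6 or later.)

```csharp
using System;
using System.Collections.Generic;

public static class IntExtensions
{
    // Hypothetical: lets 3.Times(i => ...) read almost like a loop statement.
    public static void Times(this int count, Action<int> body)
    {
        for (int i = 0; i < count; i++) body(i);
    }
}

// Hypothetical fluent builder: settings are dotted together, and nothing runs until Run.
public sealed class Retry
{
    private int attempts = 1;
    private Func<Exception, bool> shouldRetry = ex => true;

    public static Retry Operation() => new Retry();

    public Retry Times(int count) { attempts = count; return this; }

    public Retry When(Func<Exception, bool> predicate) { shouldRetry = predicate; return this; }

    public T Run<T>(Func<T> action)
    {
        for (int attempt = 1; ; attempt++)
        {
            try { return action(); }
            // Swallow the failure and try again only while attempts remain and the filter agrees.
            catch (Exception ex) when (attempt < attempts && shouldRetry(ex)) { }
        }
    }
}
```

For example, `Retry.Operation().Times(3).When(e => e is InvalidOperationException).Run(() => FetchData())` would call a (hypothetical) `FetchData` up to three times; nothing executes until `Run` is called.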

One idea that came up several times (and which I alluded to above) is the idea of allowing nested languages, in a similar way that Linq comprehensions live inside an isolated syntactic context.  C++ developers can redefine many operators in flexible ways, and this can lead to code that’s very difficult to read.  This can perhaps be blamed on the inability of the C++ language to provide alternative and more comprehensive syntactic extensibility points.  Operators are what they have to work with, so operators are what get used for all kinds of things, which change per type.  But their meaning gets so overloaded, literally, that they lose any obvious (context-free) meaning.

But operators don’t have to be non-alphabetic tokens, and the addition of new keywords or symbols could be introduced in limited contexts, such as a modifier for a member definition in a type (to appear alongside visibility, overload, override, and shadowing keywords), or within a delimited block of code such as an r-value, or a curly-brace block for new flow control constructs (one of my favorite ideas and an area most in need of extensions).  Language extensions might also be limited in scope to specific assemblies, only importing extensions explicitly, giving library authors the ability to customize their own syntax without imposing a mess on consumers of the library.

Another idea would be to allow the final Action delegate parameter of a function to be expressed as a curly-brace-delimited code block following the function call, in lieu of specifying the parameter within parentheses, and removing the need for a semicolon.  For example, with a method defined like this:

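using System;

public static class Parallel
{
    // Action delegate defined last, to take advantage of C# syntactic sugar
    public static void For(long Start, long Count, Action Action)
    {
        // Sketch: delegate to the TPL's Parallel.For (fully qualified, since this class
        // has the same name), invoking Action once per iteration, possibly concurrently
        System.Threading.Tasks.Parallel.For(Start, Start + Count, i => Action());
    }
}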

…a future C# compiler might allow you to write code like this:

Parallel.For(0, 10)
{
    // add code here for the Action delegate parameter
}

As Dr. T points out to me, however, the tricky part will be supporting non-local returns: when you write return inside that delegate’s code block, you really expect it to return from the enclosing method, not from the delegate.  Support for continue or break would also make for a more intuitive fit.  If there’s one thing Microsoft does right, it’s language design, and I have a lot of confidence that issues like this will continue to be recognized and ultimately implemented correctly.  In reading their blogs and occasionally sharing ideas with them, it’s obvious they’re as passionate about the language and syntax as I am.
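The difference is easy to demonstrate in today’s C#.  In the sketch below (the `ReturnDemo` class is illustrative), the `return` inside the lambda passed to `List<T>.ForEach` ends only the current delegate invocation, so iteration continues; the same `return` inside a `foreach` loop leaves the enclosing method immediately.  A trailing-block syntax would have to decide which of these two behaviors `return` gets.

```csharp
using System.Collections.Generic;

public static class ReturnDemo
{
    // 'return' inside the lambda ends only the current delegate call,
    // so ForEach keeps going and every item is visited.
    public static int VisitedWithLambda(List<int> items, int target)
    {
        int visited = 0;
        items.ForEach(item =>
        {
            visited++;
            if (item == target) return; // leaves the lambda, not VisitedWithLambda
        });
        return visited;
    }

    // 'return' inside a foreach body leaves the enclosing method immediately.
    public static int VisitedWithLoop(List<int> items, int target)
    {
        int visited = 0;
        foreach (int item in items)
        {
            visited++;
            if (item == target) return visited;
        }
        return visited;
    }
}
```

Given the list 1 through 5 and a target of 2, the lambda version visits all five items while the loop version stops after two.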

The key for language extensions, I believe, will be to provide more structured extensibility points for syntax (such as control flow blocks), instead of opening up the entire language for arbitrary modification.  As each language opens up some new aspect of its syntax for extension, a number of challenges will surface that will need to be dealt with, and it will be critical to solve these problems before continuing on with further evolution of the language.  Think of all we’ve gained from generics, and the challenges of dealing with a more complex type system we’ve incurred as a result.  We’re still getting updates in C# 4.0 to address shortcomings of generics, such as issues regarding covariance and contravariance.  Ultimately, though, generics were well worth it, and I believe the same will be said of metaprogramming and language extensions.
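For a concrete sense of the variance changes, here is a minimal sketch of what C# 4.0 allows through the `out` and `in` annotations on `IEnumerable<out T>` and `Action<in T>`.  Before C# 4.0, neither conversion compiled without copying or wrapping.  `VarianceDemo` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;

public static class VarianceDemo
{
    // Covariance: IEnumerable<out T> lets a sequence of strings be used as a sequence of objects.
    public static IEnumerable<object> AsObjects(IEnumerable<string> strings) => strings;

    // Contravariance: Action<in T> lets a handler that accepts any object stand in for a string handler.
    public static Action<string> AsStringHandler(Action<object> handler) => handler;
}
```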

Looking Forward

I’ll have much more to say on this topic when I talk about Oslo and MGrammar.  The important points to take away from this are that mainstream language designers are taking these ideas to heart now, and there are so many ideas and options out there that we can and will experiment to find the right combination (or combinations) of both techniques and limitations to make metaprogramming and language syntax extensions useful, viable, and sustainable.

Posted in Conferences, Design Patterns, Dynamic Programming, Functional Programming, LINQ, Language Extensions, Metaprogramming, Reflection, Software Architecture

Functional Programming as Intensity of Expression

Posted by Dan Vanderboom on September 20, 2008

On my long drive home last night, I was thinking about the .NET Rocks episode with Ted Neward and Amanda Laucher on F# and functional programming.  Though they’re writing a book on F# together, it seems even they have a hard time clearly articulating what functional programming is all about, and where it’s all headed in terms of mainstream commercial use… aside from scientific and data transformation algorithms, that is (as with the canonical logging example when people explain AOP).

I think the basic error is in thinking that Functional is a Style of programming.  To say that so-called Imperative languages are non-functional is ridiculous, not in the sense that they “don’t work”, but in the sense that they’re based on Objects “instead of” Functions.

This isn’t much different from the chicken-and-egg problem.  Though the chicken-and-egg conundrum has a simple (but unobvious) answer, it doesn’t really matter whether the root of program logic is a type or a function.  If I write a C# program with a Program class, the Main static function gets called.  Some action is the beginning of a program, so one might argue that functions should be the root-most logical construct.  However, you’d then have to deal with functions containing types as well as types containing functions, and as types can get very large (especially with deep inheritance relationships), you’d have to account for functions being huge, spanning multiple code files, and so on.  There’s also the issue of types being organizational containers for functions (and other members).  Just as we use namespaces to organize our types, so we use types to organize functions.  This doesn’t prevent us from starting execution with a function or thinking of the program’s purpose functionally; it just means that we organize it inside a logical container that we think of as a “thing”.

Does this prevent us from thinking of business processes as functional units?  Ted Neward suggests that we’ve been trained to look for the objects in a system, and base our whole design process on that. But this isn’t our only option for how to think about design, even in our so-called imperative languages.  If we’re thinking about it wrong, we can and should change the process; we don’t need to blame our design deficiencies on the trivial fact of which programming construct is the root one.  In fact, there’s no reason we should use any one design principle to the exclusion of others.  Looking for the things in the system is and will remain a valuable approach for discovering and defining database schemas and object models.  The very fact that “functional languages” aren’t perceived as especially useful for stateful components isn’t a fault of a style of programming, but is rather a natural consequence of functions being an incomplete aspect of a general purpose programming language.  Functional is a subset of expressive capability.

Where “functional languages” have demonstrated real value is not in considering functions as root-level constructs (this may ultimately be a mistake), but rather in increasing the flexibility of a language to be much more expressive when defining functions.  Making functions first-class citizens that can be passed as parameters, returned as function values, and stitched together with metaprogramming techniques, is a huge step in the right direction.  The use of simple constructs such as operators to match patterns, reverse the evaluation of functions and the flow of values with piping, and perform complex set- and list-based operations, all increase the expressive intensity and density of the functions in a language.  This can only add to the richness of our existing object models.
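To ground this in C#, here is a small sketch of functions as values: a function returned from another function, two functions composed into a new one, and a left-to-right pipe in the spirit of F#’s `|>` operator.  `Functional`, `Adder`, `Then`, and `Pipe` are hypothetical helper names, not part of the .NET Framework.

```csharp
using System;

public static class Functional
{
    // Functions as return values: builds an adder on demand.
    public static Func<int, int> Adder(int amount) => x => x + amount;

    // Composition: returns a new function that applies 'first' and then 'second'.
    public static Func<T, TResult> Then<T, TMiddle, TResult>(
        this Func<T, TMiddle> first, Func<TMiddle, TResult> second)
        => x => second(first(x));

    // Piping: feeds a value into a function, so calls read left to right.
    public static TResult Pipe<T, TResult>(this T value, Func<T, TResult> function) => function(value);
}
```

With `addTen = Functional.Adder(10)` and `doubleIt = x => x * 2`, `addTen.Then(doubleIt)(1)` returns 22, and `3.Pipe(addTen).Pipe(doubleIt)` returns 26.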

Sticking objects together in extensible and arbitrarily complex structures is routine for us, but now we’re seeing a trend toward the same kind of composability in functions.  Of course, even this isn’t new, per se; the environmental forces that demand this power just haven’t been strong enough to bring it into mainstream languages, because technology evolution (like evolution in general) tends to work by adapting solutions that are “good enough”.

It’s common to hear how F# is successfully incorporating “both functional and imperative” styles into one language, and this is important because what we need is not so much the transition to a functional style, as I’ve mentioned already, but a growth of greater functional expressiveness and power in existing, successful, object-oriented languages.

So let our best and favorite languages grow, and add greater expressive powers to them, not only for defining functions, but also in declaring data structures, compile-time constraints and guarantees, and anything else that will help to raise the level of abstraction and therefore the productivity with which we can naturally express and fulfill our business needs.

Ultimately, “functional programming” is not a revolutionary idea, but rather an evolutionary step forward.  Even though its impact is great, there’s no need to start from scratch and throw out our old models.  The incompatibility between functional and imperative styles is an illusion perpetuated by an unclear understanding of their relationship and of each one’s purpose.

Posted in Design Patterns, Functional Programming, Object Oriented Design, Problem Modeling, Software Architecture

Observations on the Evolution of Software Development

Posted by Dan Vanderboom on September 18, 2008

Neoteny in the Growth of Software Flexibility and Power

Neoteny is a biological phenomenon of an organism’s development observed across multiple generations of a species.  According to Wikipedia, neoteny is “the retention, by adults in a species, of traits previously seen only in juveniles”, and accounts for many evolutionary shifts, including the human brain’s ability to remain elastic and malleable later in life than the brains of our distant ancestors could.

So how does this relate to software?  Software is a great deal like an organic species.  The species emerged (not long ago), incubated in a more or less fragile state for a number of decades, and continues to evolve today.  Each software application or system built is a new member of the species, and over the generations they have become more robust, intelligent, and useful.  We’ve even formed a symbiotic relationship with software.

Consider the fact that software running on computers was at one time compiled to machine language code for a specific processor.  With the invention of platform-independent instruction sets and their associated runtimes performing just-in-time compilation (Java’s JVM and the .NET Framework’s CLR), we’ve delayed the production of machine language code until it’s actually needed on the target machine.  The compiler produces a slightly more abstract representation of the program logic, and an extra translation step at installation or runtime completes the process and makes the software usable.

With the growing popularity of dynamic languages such as Lisp and Python, and the upcoming release of the .NET Framework’s Dynamic Language Runtime (DLR), we’re taking another step of neoteny.  Instead of generating instruction byte codes, a “compiler for any dynamic language implemented on top of the DLR has to generate DLR abstract trees, and hand it over to the DLR libraries” (per Wikipedia).  These abstract syntax trees (ASTs), normally an intermediate artifact created deep within the bowels of a traditional compiler (and eventually discarded), are now persisted as compiler output.
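The DLR’s trees later surfaced in .NET 4.0 as an expanded System.Linq.Expressions API (the LINQ expression trees of .NET 3.5 are a smaller, expression-only subset). As a rough illustration of a program carried as a tree and only turned into executable code at runtime, this sketch builds the tree for `(a, b) => a * b + 1` node by node. The class and method names are illustrative.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Builds the tree for (a, b) => a * b + 1; nothing executes at this point.
    public static Expression<Func<int, int, int>> BuildTree()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(int), "b");

        Expression body = Expression.Add(
            Expression.Multiply(a, b),   // a * b
            Expression.Constant(1));     // ... + 1

        return Expression.Lambda<Func<int, int, int>>(body, a, b);
    }
}
```

The returned tree is plain data: its `Body.NodeType` is `ExpressionType.Add` and can be inspected or rewritten, and calling `Compile()` on it produces an executable `Func<int, int, int>` at runtime.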

Traits previously seen only in juveniles… now retained by adults.  Not too much of a metaphorical stretch!  The question is: how far can we go?  And I think the answer depends on the ability of hardware to support the additional “just in time” processing that needs to occur, executing more of the compiler’s tail-end tasks within the execution runtime itself, providing programming languages with greater flexibility and power until the compilation stages we currently execute at design-time almost entirely disappear (to be replaced, perhaps, by new pre-processing tasks).

I remember my Turbo Pascal compiler running on a 33 MHz processor with 1 MB of RAM, and now my cell phone runs at 620 MHz (with a graphics accelerator) and has gigabytes of memory and storage.  And yet, as things stand today, including language-specific compilers within the runtime is still quite infeasible.  In the .NET Framework, there are too many potential languages that people might want to include in such a runtime: C#, F#, VB, Boo, IronPython, etc.  Cramming all of those compilers into a universal runtime that would fit (and perform well) on a cell phone or other mobile device isn’t practical yet, which is why we have technologies like System.Reflection.Emit (on the full .NET Framework) and Mono.Cecil (which works on the Compact Framework as well).  These work at the platform-independent CIL level, and so can inspect and generate programs generically, interact with each others’ components, and so on.  One metaprogramming mechanism can therefore be reused across all .NET languages, and this metalinguistic programming trend is being discussed by the C# and other language design teams.
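To give a feel for working at the CIL level, here is a minimal sketch using `DynamicMethod` from System.Reflection.Emit to emit the instructions for an add method at runtime and wrap them in a delegate. The method and class names are illustrative; the sketch targets the full framework (or modern .NET), and assumes nothing about Compact Framework support.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Emits CIL equivalent to: static int Add(int x, int y) { return x + y; }
    public static Func<int, int, int> CreateAdder()
    {
        var method = new DynamicMethod(
            "Add",                                // name, used for debugging only
            typeof(int),                          // return type
            new[] { typeof(int), typeof(int) },   // parameter types
            typeof(EmitDemo).Module);             // module to associate the method with

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);  // push x onto the evaluation stack
        il.Emit(OpCodes.Ldarg_1);  // push y
        il.Emit(OpCodes.Add);      // pop both, push x + y
        il.Emit(OpCodes.Ret);      // return the value on top of the stack

        // The runtime JIT-compiles this CIL to machine code before it executes.
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Because the instructions are language-neutral CIL, the resulting delegate is callable from C#, VB, or any other .NET language: `EmitDemo.CreateAdder()(2, 40)` returns 42.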

I’ve just started using Mono.Cecil, chosen because it is cross-platform friendly (and open source).  The API isn’t very intuitive, but because the source is available, and because extension methods can go a long way toward making it more accessible, it’s a great option.  The documentation is sparse, and assembly generation has some performance issues, but it’s a work-in-progress with tremendous potential.  If you’re doing any kind of static analysis or have any need to dynamically generate and consume types and assemblies (to get around language limitations, for example), I’d encourage you to check it out.  A comparison of Mono.Cecil to System.Reflection can be found here.  Another library called LinFu, which performs lots of mind-bending magic and actually uses Mono.Cecil, is also worth exploring.

VB10 will supposedly be moving to the DLR to become a truly dynamic language, which, considering VB’s history of support for late binding, makes a lot of sense.  With a dynamic language person on the C# 4.0 team (Jim Hugunin from IronPython), one wonders whether C# will eventually go the same route while keeping its strongly-typed feel and IDE feedback mechanisms.  You might laugh at the idea of C# supporting late binding (dynamic lookup), but it is being planned regardless of whether the language as a whole remains static or becomes dynamic.
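For a concrete picture of dynamic lookup, here is a small sketch using the `dynamic` keyword as it eventually shipped in C# 4.0: member access on a `dynamic` value is resolved at runtime instead of compile time, and a failed lookup surfaces as a `RuntimeBinderException`. It relies on the Microsoft.CSharp runtime binder, which modern .NET projects reference by default. The class and method names are illustrative.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class LateBindingDemo
{
    // Compiles for any argument; the member named Length is looked up at runtime.
    public static int LengthOf(dynamic value)
    {
        return value.Length;
    }

    // Returns false instead of throwing when the runtime lookup fails.
    public static bool TryLengthOf(dynamic value, out int length)
    {
        try
        {
            length = value.Length;
            return true;
        }
        catch (RuntimeBinderException)
        {
            length = 0;
            return false;
        }
    }
}
```

`LengthOf("hello")` finds `String.Length` and `LengthOf(new[] { 1, 2, 3 })` finds `Array.Length`, even though the two types share no interface declaring that property; passing an `int` compiles but fails at runtime.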

As the DLR evolves, performance optimizations are being discovered and implemented that may close the gap between pre-compiled and dynamically interpreted languages.  Combine this with manageable concurrent execution, and the advantages we normally attribute to static languages may soon disappear altogether.

The Precipitous Growth of Software System Complexity

We’re truly on the cusp of a precipitous period of growth for software complexity, as an exploding array of devices and diverse platforms around the world connect in an ever-more immersive Internet.  Taking full advantage of parallel and distributed computing environments by solving the challenges of concurrency and coordination, as well as following the trend toward increased integration among software components, is pushing software complexity into new orders of magnitude.  The strategies we come up with for organizing these systems will have to take several key factors into consideration, and we will have to raise the level of abstraction to a point that may be hard for us to imagine with our existing tools and languages.

One aspect that’s clear is the rise of declarative or intention-based syntax, whether represented as XML, Domain Specific Languages (DSLs), attribute decoration, or a suite of new visual modeling editors.  This is in part a consequence of raising the abstraction level, as lower-level libraries are entrusted to solve common problems and take advantage of common opportunities.
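As one small example of attribute decoration as declarative syntax, this sketch marks properties with an intent and leaves a reusable lower-level routine to honor it via reflection. `NotEmptyAttribute`, `Customer`, and `DeclarativeValidator` are hypothetical names for illustration (System.ComponentModel.DataAnnotations offers a real `[Required]` attribute along similar lines).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Declares intent: this string property must not be null or empty.
[AttributeUsage(AttributeTargets.Property)]
public class NotEmptyAttribute : Attribute { }

public class Customer
{
    [NotEmpty] public string Name { get; set; }
    [NotEmpty] public string Email { get; set; }
    public string Notes { get; set; }   // no declared constraint
}

public static class DeclarativeValidator
{
    // Generic, lower-level logic that fulfills the declared intent for any object.
    public static List<string> FindViolations(object target)
    {
        var violations = new List<string>();
        foreach (PropertyInfo property in target.GetType().GetProperties())
        {
            bool declaredNotEmpty = property.GetCustomAttribute<NotEmptyAttribute>() != null;
            if (declaredNotEmpty && string.IsNullOrEmpty(property.GetValue(target) as string))
                violations.Add(property.Name);
        }
        return violations;
    }
}
```

The `Customer` class states *what* must hold and contains no validation logic; the *how* lives once in the validator and applies to every decorated type.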

Another is the use of Inversion of Control (IoC) containers and dependency injection in component-based architectures, which standardize the lifecycle of the application and its components, provide a common environment or ecosystem for all of its components, and introduce a common protocol for component location, creation, access, and disposal.  This level of consistency is valuable for sharing a common understanding of how to troubleshoot software components.  The more predictable a component’s interaction with the rest of the system, the easier it is to debug and modify; conversely, the more unique a component and its communication system are, the more disparity there is among components, and the more difficult they are to understand and modify without introducing errors.  If software is a species and applications are individuals, then components are the cells of a system.
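A deliberately tiny sketch of the idea, assuming nothing beyond the base class library: a container that owns component creation, location, and disposal, while components receive their dependencies through their constructors and depend only on abstractions. `MiniContainer`, `ILogger`, `MemoryLogger`, and `OrderService` are hypothetical; real containers add auto-wiring, scoped lifetimes, thread safety, and configuration.

```csharp
using System;
using System.Collections.Generic;

public interface ILogger { void Log(string message); }

public class MemoryLogger : ILogger, IDisposable
{
    public List<string> Lines { get; } = new List<string>();
    public bool Disposed { get; private set; }
    public void Log(string message) => Lines.Add(message);
    public void Dispose() => Disposed = true;
}

// Depends on the ILogger abstraction, never on a concrete logger.
public class OrderService
{
    private readonly ILogger logger;
    public OrderService(ILogger logger) { this.logger = logger; }
    public void PlaceOrder(string item) => logger.Log("Ordered " + item);
}

// Hypothetical minimal container (not thread-safe); every registration is a singleton.
public class MiniContainer : IDisposable
{
    private readonly Dictionary<Type, Func<MiniContainer, object>> factories =
        new Dictionary<Type, Func<MiniContainer, object>>();
    private readonly Dictionary<Type, object> instances = new Dictionary<Type, object>();

    // Creation: how to build a service is registered once, against its abstraction.
    public void Register<TService>(Func<MiniContainer, TService> factory) where TService : class
    {
        factories[typeof(TService)] = c => factory(c);
    }

    // Location and access: the first request creates the instance; later requests share it.
    public TService Resolve<TService>() where TService : class
    {
        if (instances.TryGetValue(typeof(TService), out object existing))
            return (TService)existing;

        if (!factories.TryGetValue(typeof(TService), out Func<MiniContainer, object> factory))
            throw new InvalidOperationException("No registration for " + typeof(TService).Name);

        var instance = (TService)factory(this);
        instances[typeof(TService)] = instance;
        return instance;
    }

    // Disposal: the container, not each component, decides when components are torn down.
    public void Dispose()
    {
        foreach (object instance in instances.Values)
            (instance as IDisposable)?.Dispose();
        instances.Clear();
    }
}
```

Swapping `MemoryLogger` for another `ILogger` touches only the registration line; `OrderService` never changes.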

Even the introduction of functional programming languages into the mainstream over the past couple years is due, in part, to the ability of those languages to provide more declarative support, more syntactic flexibility, and new ways of dealing with concurrency and coordination issues (such as immutable values) and light-weight, ad hoc data structures (tuples).

Balancing the Forces of Coupling, Cohesion, and Modularity

On a fundamental level, the more independent components are, the less coupled and the more modular and flexible they are.  But the more they can communicate with and benefit from each other, the more interdependent they become.  This adds cohesiveness and synergy, but it also creates stronger coupling to a community of abstractions.

A composition of services has layers and segments of interdependence, and while there are dependencies, these should be dependencies on abstractions (interfaces and not implementations).  Since there will be at least one implementation of each service, and the extensibility exists to build others as needed, dependency is only a liability when the means for fulfilling it are not extensible.  Both sides of a contract need to be fulfilled regardless; service-oriented or component-based designs merely provide a mechanism for each side to implement and fulfill its part of the contract, and ideally the system also provides a discovery mechanism for the service provider to publish its availability for other components to discover and consume it.

If you think about software components as a hierarchy or tree of services, with services in one layer depending on services closer to the root, it’s easy to see how this simplifies the perpetual task of adding new functionality and revising existing functionality.  You’re essentially editing an outline: you can move services around, reorganize dependencies easily, and let this easy-to-use outline structure (and its supporting infrastructure) absorb many of the details of the software’s complexity.  Systems of arbitrary complexity become feasible, and then relatively routine.  There’s a somewhat steep learning curve to get to this point, but once you’ve crossed it, your opportunities extend endlessly for no additional mental cost, at least not in terms of how to compose your system out of individual parts.

Absorbing Complexity into Frameworks

The final thing I want to mention is that a rise in overall complexity doesn’t mean that the job of software developers necessarily has to become more difficult than it is currently.  With the proper design of components that abstract away the complexity into reusable frameworks with intuitive interfaces, developers at the business logic level don’t need to be aware of the inner complexity, in the same way that software developers are largely absolved of the responsibility of thinking about the processor’s inner workings.  As we build our technology stack higher and higher, like the famed Tower of Babel, we must make sure that it’s organized and structured in a way to support that upward growth and the load imposed upon it… so it doesn’t come crashing down.

The requirements for building components tomorrow will not be the same as they were yesterday.  As illustrated in this account of the effort involved in a feature change at Microsoft, we will also need to consider issues such as tool-assisted refactorability (and patterns that frustrate it, such as “magic strings”) and, given the explosion of component libraries, the discoverability of types, members, and their usage.
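To make the “magic strings” point concrete: a property name written as a string literal silently goes stale when a refactoring tool renames the property, while a name derived from code is renamed along with it. The sketch shows two refactor-friendly options: an expression-tree helper of the kind common in .NET 3.5-era code, and the `nameof` operator that arrived later in C# 6. `PropertyNames` and `Person` are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

public class Person
{
    public string Name { get; set; }
}

public static class PropertyNames
{
    // Extracts "Name" from () => person.Name; a rename refactoring updates the lambda too.
    public static string Of<T>(Expression<Func<T>> property)
    {
        if (property.Body is MemberExpression member)
            return member.Member.Name;
        throw new ArgumentException("Expected a property or field access", nameof(property));
    }

    // A magic string: compiles fine, but nothing ties it to Person.Name.
    public const string MagicName = "Name";

    // Checked by the compiler, and renamed along with the property.
    public const string CompilerCheckedName = nameof(Person.Name);
}
```

If `Person.Name` were renamed to `FullName`, the lambda and the `nameof` expression would follow the rename (or fail to compile), while `MagicName` would keep saying `"Name"` with no warning.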

A processor can handle any complexity of instruction and data flow.  The trick is in organizing all of this in a way that other developers can understand and work with.

Posted in Compact Framework, Component Based Engineering, Concurrency, Design Patterns, Development Environment, Distributed Architecture, Functional Programming, Mobile Devices, Object Oriented Design, Problem Modeling, Reflection, Service Oriented Architecture, Software Architecture, Visual Studio

Concurrency & Coordination With Futures in C#

Posted by Dan Vanderboom on July 3, 2008

A future is a proxy or placeholder for a value that may not yet be known, usually because the calculation is time consuming.  It is used as a synchronization construct, and it is an effective way to define dependencies among computations that will execute when all of their factors have been calculated, or in other words, to construct an expression tree with each node potentially computing in parallel.  According to the Wikipedia article on futures and promises, using them can dramatically reduce latency in distributed systems.

Damon pointed out that the Parallel Extensions library contains a Future<T> class, so I started looking around for examples and explanations of how they work and what the syntax is like, and I ran across a frightening example implementing the asynchronous programming model (APM) with Future<T>, as well as one going in the other direction, wrapping an APM implementation with Future<T>.  Other articles give pretty good explanations but trivial examples.  From what I gathered briefly, the ContinueWith method for specifying the next step of calculation doesn’t seem to provide an intuitive way to indicate that several calculations may depend upon the current one (unless it can be called multiple times?).  Using ContinueWith, you’re always specifying, in a forward direction, the calculation task that depends on the current future object.  It also surprised me a little that Future inherits from Task, because my understanding of a future is that it’s primarily a value-holding object.  But considering that a future really holds an expression that needs to be calculated, making Future a Task doesn’t seem so odd.
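In the Task Parallel Library that eventually shipped with .NET 4.0, Future<T> was folded into Task<TResult>, which remains the closest built-in analogue to the futures discussed here. A minimal sketch, using `Task.Run` (added in .NET 4.5) and a `Thread.Sleep` standing in for expensive work: ContinueWith can in fact be called more than once on the same task, so several calculations can each depend on one future. The names are illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskFutureDemo
{
    static string CalculatePi()
    {
        Thread.Sleep(100);   // simulate expensive work
        return "3.1415926535";
    }

    // Returns the results of two continuations that both depend on one future.
    public static string[] Run()
    {
        // Starts immediately; pi acts as a future for the string it will produce.
        Task<string> pi = Task.Run(() => CalculatePi());

        // Two independent continuations registered on the same antecedent task.
        Task<string> parenthesized = pi.ContinueWith(t => "( " + t.Result + " )");
        Task<int> length = pi.ContinueWith(t => t.Result.Length);

        // Result blocks until each value is available.
        return new[] { parenthesized.Result, length.Result.ToString() };
    }
}
```

Both continuations are scheduled when `pi` completes; neither knows about the other.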

So I decided to implement my own Future<T> class before looking at the parallel extensions library too deeply.  I didn’t want to prejudice my solution, because I wanted to make an exercise of it and see what I would naturally come up with.

Though I tried to avoid prejudice, I still wound up characterizing it in my head as a task, and thought that a future would simply be a pair of Action and Reaction methods (both of the Action delegate type).  The Action would execute and could do whatever it liked, including evaluating some expression and storing the result in a variable.  If the Action completed, the Reaction method (a continuation) would run, and both could be specified using lambdas.  Because I was storing the results in a local variable (result) captured by a closure, I didn’t see a need for a Value property in the future, and therefore no need to make the type generic.  Ultimately I thought it silly to have a Reaction method, since anything you needed to happen sequentially after a successful Action could simply go at the end of the Action method itself.

FutureTask task = new FutureTask(
    () => result = CalculatePi(10),
    () => new FutureTask(
        () => result += "...",
        () => Console.WriteLine(result),
        ex2 => Console.WriteLine("ex2: " + ex2.Message)),
    ex1 => Console.WriteLine("ex1: " + ex1.Message));

The syntax is what I was most concerned with, and as I started playing around with nesting of futures to compose my calculation, I started to feel like I was onto something.  After all, it was almost starting to resemble some of the F# code I’ve been looking at, and I took that style of functional composition to be a good sign.  As you can see from the code above, I also include a constructor parameter of type Action<Exception> for compensation logic to run in the event that the original action fails.  (The result variable is a string, CalculatePi returns a string, and so the concatenation of the ellipsis really does make sense here.)
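The FutureTask class itself isn’t shown, so here is one hypothetical way it might look, reconstructed only from the description above: the action runs on a thread-pool thread, the reaction runs if the action succeeds, and the compensation receives the exception if it fails. The `Wait` method is an addition of this sketch, not something the text describes.

```csharp
using System;
using System.Threading;

// Hypothetical reconstruction of FutureTask from its described constructor parameters.
public class FutureTask
{
    private readonly ManualResetEventSlim finished = new ManualResetEventSlim(false);

    public FutureTask(Action action, Action reaction, Action<Exception> compensation)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                bool succeeded;
                try
                {
                    action();
                    succeeded = true;
                }
                catch (Exception ex)
                {
                    // compensation logic runs only when the original action fails
                    compensation(ex);
                    succeeded = false;
                }

                // the continuation runs only after a successful action;
                // exceptions thrown by reaction or compensation are not handled in this sketch
                if (succeeded)
                    reaction();
            }
            finally
            {
                finished.Set();
            }
        });
    }

    // Sketch-only addition: blocks until the action and its reaction or compensation have run.
    public void Wait()
    {
        finished.Wait();
    }
}
```

Note that `Wait` covers a single step: a FutureTask created inside a reaction, as in the nested example above, runs on its own and is not waited for.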

The problem that started nagging me was that a composite computation of futures might not always be definable in a single statement like this, at least not by building the dependency tree from the bottom up.  You can really only define the most basic factors (the leaf nodes of a dependency tree) at the beginning this way, then the expressions that depend directly upon those leaf nodes, and so on.  What if you have 50 different starting future values, and you can only proceed with the next step in the calculation once 5 of those specific futures have completed evaluation?  How would you express those dependencies with this approach?
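For comparison, the fan-in case posed above (50 starting futures, with the next step waiting on 5 specific ones) is something `Task.WhenAll` (.NET 4.5; .NET 4.0 has `Task.Factory.ContinueWhenAll`) expresses directly. A minimal sketch; which five tasks are chosen and the squaring stand-in work are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FanInDemo
{
    public static int Run()
    {
        // 50 independent starting futures, each computing a stand-in value.
        Task<int>[] starts = Enumerable.Range(0, 50)
            .Select(i => Task.Run(() => i * i))
            .ToArray();

        // The next step depends on 5 specific futures only (indices chosen arbitrarily).
        Task<int>[] needed = { starts[1], starts[2], starts[3], starts[10], starts[20] };

        // WhenAll completes once all five have completed; the other 45 may still be running.
        Task<int> next = Task.WhenAll(needed).ContinueWith(t => t.Result.Sum());

        return next.Result;
    }
}
```

Here the dependency is declared from the consumer’s side, listing the factors it needs, which is the same top-down shape the Future<T> class below takes.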

That’s when I started to think about futures as top-down hierarchical data container objects, instead of tasks that have pointers to some next task in a sequence.  I created a Future<T> class whose constructor takes an optional name (to aid debugging), a method of type Func<T> (which is a simple expression, supplied as a lambda in my examples), and finally an optional params list of other Future<T> objects on which that future value depends.

The first two futures in the code below start calculating pi (3.1415926535) and omega (which I made up to be a string of 9s).  They have no dependencies, so they can start calculating right away.  The paren future has two dependencies, supplied as two parameters at the end of the argument list: pi and omega.  You can see that the values pi.Value and omega.Value are used in the expression, which will simply surround the concatenated string value with parentheses and spaces.

var pi = new Future<string>("pi", () => CalculatePi(10));
var omega = new Future<string>("omega", () => CalculateOmega());

var paren = new Future<string>("parenthesize", () => Parenthesize(pi.Value + " < " + omega.Value), pi, omega);

var result = new Future<string>("bracket", () => Bracket(paren.Value), paren);

Finally, the result future has a dependency on the paren future.  This surrounds the result of paren.Value with brackets and spaces.  Because the operations here are trivial, I’ve added Thread.Sleep statements to all of these methods to simulate more computationally expensive work.

Dependencies Among Futures

The program starts calculating pi and omega concurrently, and then immediately builds the paren future, which, because of its dependencies, waits for the pi and omega futures to complete.  But it doesn’t block the thread.  Execution continues immediately to build the result future, and then moves on to the next part of the program.  When each future in the expression completes, it sets its Complete boolean property to true and raises a Completed event.  Any attempt to access the Value property of one of these futures will block the thread until it (and all of the futures it depends on) has completed evaluation.

Furthermore, if an exception occurs, all of the futures that depend on it will no longer attempt to evaluate, and the exceptions will be thrown as part of an AggregateException when accessing the Value property.  This AggregateException contains all of the individual exceptions that were thrown as part of evaluating each future expression.  If both pi and omega fail, result should be able to hand me a list of all Exceptions below it in the tree structure that automatically gets formed.

There are two bits of code I added as icing on this cake.  The first is the use of the implicit operator to convert a variable of type Future<T> to type T.  In other words, if you have a Future<string> called result, you can now pass result into methods where a string parameter is expected, etc.  In the code listing at the end of the article, you’ll notice that I reference pi and omega instead of pi.Value and omega.Value (as in the code snippet above).

public static implicit operator T(Future<T> Future)
{
    return Future.Value;
}

The other helpful bit is an override of ToString, which allows you to hover over a future variable in debug mode and see its name (if you named it), whether it’s Complete or Incomplete, and any errors encountered during evaluation.

public override string ToString()
{
    return Name + ", " + (Complete ? "Complete" : "Incomplete") + (Error != null ? ", Error=" + Error.Message : string.Empty);
}

Debug Experience of Future

What I’d really like to do is have the ability to construct this composite expression in a hierarchical form in the language, with a functionally composed syntax, replacing any parameter T with a Future<T>, something like this:

var result = new Future<string>("bracket", () => Bracket(
    new Future<string>("parenthesize", () => Parenthesize(
        new Future<string>("pi", () => CalculatePi(10))
        + " < " +
        new Future<string>("omega", () => CalculateOmega())
    ))
));

The Bracket and Parenthesize methods both require a string, but I give them an object that will at some point (“in the future”) evaluate to a string.  Another term for a future is a promise (some languages that support both draw a distinction between the two), and you can think of it as giving those methods the promise that they’ll get a string later, at which time they can proceed with their own evaluation.  This effectively creates lazy evaluation, sometimes referred to as normal-order evaluation.

There are a few problems with this code, however.  First of all, though it’s composed functionally from the top down and returns the correct answer, it takes too long to do it: about 8 seconds instead of 4.  That means it’s processing all of the steps sequentially.  This happens because the future objects we’re handing to the Parenthesize and Bracket methods have to be converted from Future<string> to string before they can be evaluated in the expression, and doing that activates the implicit operator, which executes the Value property getter.  This completely destroys the asynchronous behavior we’re going for, by insisting on resolving it immediately with the wait built-into the Value property.  The string concatenation expression evaluates sequentially one piece at a time, and when that’s done, the next level up evaluates, and so on.
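The underlying rule is that C# evaluates every argument, including any user-defined implicit conversion, before the called method’s body begins. A small sketch makes the order visible; `Promise` is a hypothetical stand-in whose conversion operator logs when it runs, not the Future<T> class from this article.

```csharp
using System;
using System.Collections.Generic;

public class Promise
{
    public static readonly List<string> Log = new List<string>();
    private readonly string value;
    public Promise(string value) { this.value = value; }

    // Plays the role of Future<T>'s implicit operator, where the blocking wait would happen.
    public static implicit operator string(Promise p)
    {
        Log.Add("converted");
        return p.value;
    }
}

public static class ConversionOrderDemo
{
    static string Bracket(string text)
    {
        Promise.Log.Add("Bracket entered");
        return "[ " + text + " ]";
    }

    public static string Run()
    {
        Promise.Log.Clear();
        // The conversion runs at the call site, before Bracket's body starts.
        return Bracket(new Promise("pi"));
    }
}
```

After `Run()`, the log reads `converted` and then `Bracket entered`: the callee never sees the promise itself, only the already-resolved string.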

The solution is to declare our futures as factors we depend on at each level, which start them executing right away due to C#’s order of evaluation, and declare the operations we want to perform in terms of those predefined futures.  After a few hours of rearranging definitions, declaration order, and experimenting with many other details (including a brief foray into being more indirect with Func<Future<T>>), this is the working code I came up with:

Future<string> FuturePi = null, FutureOmega = null, FutureConcat = null, FutureParen = null;

var result = new Future<string>(
    () => Bracket(FutureParen),
    (FutureParen = new Future<string>(
        () => Parenthesize(FutureConcat),
        (FutureConcat = new Future<String>(
            () => FuturePi + " < " + FutureOmega,
            (FuturePi = new Future<string>(() => CalculatePi(10))),
            (FutureOmega = new Future<string>(() => CalculateOmega()))
        ))
    ))
);

In F# and other more functional languages, I imagine we could use let statements to define and assign these variables as part of the overall expression, instead of having to define the variables in a separate statement as shown here.

The Future<T> class I wrote works fairly well for exploration and study of futures and the possible syntax to define them and access their values, and I’ll share it so that you can experiment with it if you like, but understand that this is (even more so than usual) not production ready code.  I’m making some very naive assumptions, not taking advantage of any task managers or thread pools, there is no intelligent scheduling going on, and I haven’t tested this in any real world applications.  With that disclaimer out of the way, here it is, complete with the consuming test code.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Linq;

namespace FutureExpressionExample
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime StartTime = DateTime.Now;

            Future<string> FuturePi = null, FutureOmega = null, FutureConcat = null, FutureParen = null;

            var result = new Future<string>("bracket",
                () => Bracket(FutureParen),
                (FutureParen = new Future<string>("parenthesize",
                    () => Parenthesize(FutureConcat),
                    (FutureConcat = new Future<String>("concat",
                        () => FuturePi + " < " + FutureOmega,
                        (FuturePi = new Future<string>("pi", () => CalculatePi(10))),
                        (FutureOmega = new Future<string>("omega", () => CalculateOmega()))
                    ))
                ))
            );

            /* Alternative

            // first group of expressions evaluating in parallel
            var pi = new Future<string>("pi", () => CalculatePi(10));
            var omega = new Future<string>("omega", () => CalculateOmega());

            // a single future expression dependent on all of the futures in the first group
            var paren = new Future<string>("parenthesize", () => Parenthesize(pi + " < " + omega), pi, omega);

            // another single future expression dependent on the paren future
            var result = new Future<string>("bracket", () => Bracket(paren), paren);

            */

            Console.WriteLine("Do other stuff while calculation occurs...");

            try
            {
                Console.WriteLine("\n" + result);
            }
            catch (AggregateException ex)
            {
                Console.WriteLine("\n" + ex.Message);
            }

            TimeSpan ts = DateTime.Now - StartTime;
            Console.WriteLine("\n" + ts.TotalSeconds.ToString() + " seconds");

            Console.ReadKey();
        }

        static string CalculatePi(int NumberDigits)
        {
            //throw new ApplicationException("Failed to calculate Pi");
            Thread.Sleep(3000);
            return "3.1415926535";
        }

        static string CalculateOmega()
        {
            //throw new ApplicationException("Failed to calculate Omega");
            Thread.Sleep(3000);
            return "999999999999999";
        }

        static string Parenthesize(string Text)
        {
            Thread.Sleep(500);
            return "( " + Text + " )";
        }

        static string Bracket(string Text)
        {
            Thread.Sleep(500);
            return "[ " + Text + " ]";
        }
    }

    public class Future<T> : IDisposable
    {
        public string Name { get; set; }
        public bool Complete { get; protected set; }
        public Exception Error { get; protected set; }

        protected Func<T> Expression { get; set; }

        protected List<Future<T>> Factors;
        protected List<Future<T>> FactorsCompleted;
        protected List<Future<T>> FactorsFailed;

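        public event Action<Future<T>> Completed;
        protected void OnCompleted()
        {
            Complete = true;

            // copy the delegate first: another thread may unsubscribe between the null check and the call
            var handler = Completed;
            if (handler != null)
                handler(this);
        }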

        private T _Value;
        public T Value
        {
            get
            {
                // block until complete
                while (!Complete)
                {
                    Thread.Sleep(1);
                }

                if (Exceptions.Count > 0)
                    throw new AggregateException(Exceptions);

                return _Value;
            }
            private set { _Value = value; }
        }

        public List<Exception> Exceptions
        {
            get
            {
                var list = new List<Exception>();

                foreach (Future<T> Factor in Factors)
                {
                    list.AddRange(Factor.Exceptions);
                }

                if (Error != null)
                    list.Add(Error);

                return list;
            }
        }

        public static implicit operator T(Future<T> Future)
        {
            return Future.Value;
        }

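        // guards the factor bookkeeping, since factors may complete concurrently on different threads
        private readonly object SyncRoot = new object();
        private bool Started;

        // naming a Future is optional
        public Future(Func<T> Expression, params Future<T>[] Factors) : this("<not named>", Expression, Factors) { }

        public Future(string Name, Func<T> Expression, params Future<T>[] Factors)
        {
            this.Name = Name;
            this.Expression = Expression;
            this.Factors = new List<Future<T>>(Factors);

            FactorsCompleted = new List<Future<T>>();
            FactorsFailed = new List<Future<T>>();

            lock (SyncRoot)
            {
                foreach (Future<T> Factor in this.Factors)
                {
                    // subscribe before checking Complete, so a factor that finishes in between isn't missed
                    Factor.Completed += new Action<Future<T>>(Factor_Completed);

                    if (Factor.Complete && !FactorsCompleted.Contains(Factor))
                        FactorsCompleted.Add(Factor);
                }
            }

            // there may not be any factors, or they may all be complete
            if (TryStart())
                Expression.BeginInvoke(ReceiveCallback, null);
        }

        // returns true exactly once: for the first caller that sees every factor completed
        private bool TryStart()
        {
            lock (SyncRoot)
            {
                if (Started || FactorsCompleted.Count != Factors.Count)
                    return false;

                Started = true;
                return true;
            }
        }

        private void Factor_Completed(Future<T> Factor)
        {
            Factor.Completed -= new Action<Future<T>>(Factor_Completed);

            bool failed;

            lock (SyncRoot)
            {
                if (!FactorsCompleted.Contains(Factor))
                    FactorsCompleted.Add(Factor);

                if (Factor.Error != null && !FactorsFailed.Contains(Factor))
                    FactorsFailed.Add(Factor);

                // a failed factor completes this future early, but only once
                failed = !Started && Exceptions.Count > 0;
                if (failed)
                    Started = true;
            }

            if (failed)
            {
                Dispose();
                OnCompleted();
                return;
            }

            if (TryStart())
                Expression.BeginInvoke(ReceiveCallback, null);
        }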

        private void ReceiveCallback(IAsyncResult AsyncResult)
        {
            try
            {
                Value = Expression.EndInvoke(AsyncResult);
            }
            catch (Exception ex)
            {
                Error = ex;
            }

            Dispose();

            // computation is completed, regardless of whether it succeeded or failed
            OnCompleted();
        }

        public void Dispose()
        {
            foreach (Future<T> Factor in Factors)
            {
                Factor.Completed -= new Action<Future<T>>(Factor_Completed);
            }
        }

        // helpful for debugging
        public override string ToString()
        {
            return Name + ", " + (Complete ? "Complete" : "Incomplete") + (Error != null ? ", Error=" + Error.Message : string.Empty);
        }
    }

    public class AggregateException : Exception
    {
        public List<Exception> Exceptions;

        public AggregateException(IEnumerable<Exception> Exceptions)
        {
            this.Exceptions = new List<Exception>(Exceptions);
        }

        public override string Message
        {
            get
            {
                string message = string.Empty;
                foreach (Exception ex in Exceptions)
                {
                    message += ex.Message + "\n";
                }
                return message;
            }
        }
    }
}

Posted in Algorithms, Concurrency, Data Structures, Design Patterns, Functional Programming