Critical Development

Language design, framework development, UI design, robotics and more.

Archive for the ‘LINQ’ Category

Problems with RIA Services (Feedback for July 2009 CTP)

Posted by Dan Vanderboom on November 9, 2009

RIA Services (new home page) is a collection of tools and libraries for making Rich Internet Applications, especially line of business applications, easier to develop.  Brad Abrams did a great presentation of RIA Services at MIX 2009 that touches on querying, validation, authentication, and how to share logic between the server and client sides.  Brad also has a huge series of articles (26 as I write this) on using Silverlight and RIA Services to build a realistic application.

I love the concept of RIA Services.  Brad and his team have done a fantastic job of identifying the critical issues for LOB systems and have the right idea to simplify those common data access tasks through the whole pipeline from database to UI controls, using libraries, Visual Studio tooling, or whatever it takes to get the job done.

So before I lay down some heavy criticism of RIA Services, take into consideration that it’s still a CTP and that my scenario pushes the boundaries of what was likely conceived of for this product, at least for such an early stage.

Shared Data Model with WPF & Silverlight Clients

The cause of so much of my grief with RIA Services has been my need to share a data model, and access to a shared database, across WPF as well as Silverlight client applications.  Within the constraints of this situation, I keep running into problem after problem while trying to use RIA Services productively.

The intuitive thing to do is: define a single data model project that compiles to a single assembly, and then reference that in my Silverlight and non-Silverlight projects.  This would be a 100% full-fidelity shared data model.  As long as the code I wrote stayed within the intersection of the Silverlight and full .NET Frameworks, we could share identical types and write complex validation and model manipulation logic, all without having to constrain ourselves to work within the limitations of a convoluted code generation scheme.  Back when I wrote Compact Framework applications, I did this with great success despite the platform gap, and I didn’t have anything like RIA Services to help.

Incompatible Assemblies

Part of the problem arises because Silverlight assemblies are incompatible with non-Silverlight assemblies.  A lot of what RIA Services is doing is trying to find a way around this limitation: picking up attributes and code files from one project and inserting that code into the Silverlight project with a build action.  This Visual Studio “magic” has been criticized for its weakness in dealing with multiple-solution systems where Visual Studio can’t update the client because it’s not loaded, and I’ve heard there’s work being done to address this, but for my current needs, this magic aspect of it isn’t a problem.  The specifics of how it works, however, are.

Different Data Access APIs

Accessing entities requires a different API in Silverlight via RiaContextBase versus ObjectContext elsewhere.  Complex logic in the model (for validation and other actions against the model) requires access to other entities and therefore access to the current object context, but the context APIs for Silverlight and WPF are very different.  Part of this has to do with Silverlight’s inability to make synchronous calls to the server.

In significantly large systems that I build, I use validation logic such as “this entity is valid if it’s pointing to an entity of a different type that contains a PropertyX value of Y”.  One of my tables stores a tree of data, so I have methods for loading entire subtrees and ensuring that no circular references exist.  For these kinds of tasks, I need access to the data context in basic validation methods.  When I delete nodes from a tree, I need to delete child nodes, so update logic is part of the model that needs to be the same in every client.  I don’t want to define that multiple times for multiple clients.  I like to program very DRY.  In other words, I find myself in need of a shared model.
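
As a concrete sketch of that first kind of rule, here is roughly what it looks like against a plain Entity Framework ObjectContext.  The context, entity, and member names (ProjectEntities, Task, Technician, Status) are invented purely for illustration:

// Hypothetical model: a Task is only valid if the Technician it points to is active.
// The rule can't be evaluated without querying the context for the *other* entity.
// (Requires using System.Linq for the Any operator.)
public partial class Task
{
    public bool IsValid(ProjectEntities context)   // ProjectEntities: a hypothetical ObjectContext
    {
        return context.Technicians.Any(t =>
            t.TechnicianID == this.TechnicianID && t.Status == "Active");
    }
}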

RIA Services doesn’t provide anything like type equivalence for a shared model, however.  Data model classes in Silverlight inherit from Entity, but EntityObject in WPF.  In the RIA Services domain context, we RaiseDataMemberChanged, but in a normal EF object context, we need to ReportPropertyChanged.  In WPF, I can call MyEntity.Load(MergeOption.PreserveChanges), but in Silverlight there’s no Load method on the entity and no MergeOption enum.  In WPF I can query against context.SomeEntitySet, but in Silverlight you would query against context.GetSomeEntitySetQuery() and then execute the query with another method call.
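
To make the disparity concrete, here is a rough side-by-side sketch.  It is simplified (the real generated property setters do more than this), the two Task classes live in separate client projects, and Task and Title are placeholder names, but the calls shown are the ones discussed above:

// WPF client: entities derive from EntityObject and are queried through an EF ObjectContext.
public partial class Task : EntityObject
{
    private string _title;
    public string Title
    {
        get { return _title; }
        set { _title = value; ReportPropertyChanged("Title"); }   // EF change notification
    }
}
// Querying (synchronous): var openTasks = context.Tasks.Where(t => t.IsOpen).ToList();

// Silverlight client: generated entities derive from Entity and are queried through a DomainContext.
public partial class Task : Entity
{
    private string _title;
    public string Title
    {
        get { return _title; }
        set { _title = value; RaiseDataMemberChanged("Title"); }  // RIA Services change notification
    }
}
// Querying (asynchronous only): context.Load(context.GetTasksQuery());  // results arrive later via the context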

This chasm of disparity makes all but the simplest shared model logic impractical and frustrating.  The code generation technique, though good in principle, keeps getting in the way.  For example, I have both parameterless and parameterized constructors in my entity classes.  This works great in my WPF client, but when this code is synchronized to my Silverlight client, I get an error because the Silverlight-side entity class is generated in two parts: in the hidden partial class, a parameterless constructor is generated which calls partial method OnCreated; and in the visible partial class, the constructor method I defined on the server is dumped into another file, so I have duplicate constructors.  If I remove the parameterless constructor from the server side, I get an error because my entity class requires a parameterless constructor (and defining a non-default constructor effectively removes the default one from the resulting type unless it’s explicitly defined).  I thought I could define the partial method OnCreated and put my construction logic in there, but the partial method is only defined on the client side.  That means sharing construction logic consists of copying and pasting the OnCreated method across the various clients—far from an ideal solution.
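
The shape of the generated Silverlight code is what creates the collision.  Roughly (the Task name and member bodies here are illustrative only):

// Generated (hidden) partial class on the Silverlight side:
public partial class Task : Entity
{
    public Task()
    {
        OnCreated();            // the intended hook for construction logic...
    }

    partial void OnCreated();   // ...but this partial method is declared only in the client-side generated code
}

// Shared code file copied over from the server project:
public partial class Task
{
    public Task() { }                          // duplicates the generated parameterless constructor: compile error
    public Task(string title) { /* ... */ }    // the constructor I actually wanted to share
}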

Entity Data Model Required to be in Web Project

Another strategy I attempted was to define the .edmx file and my partial class extensions in a class library, and then reference that from the web project.  I could define the LinqToEntitiesDomainService<MyDataContext>, but sharing entity class code (by generating code in the Silverlight project) isn’t possible unless the .edmx file and partial class extensions are defined in the web project itself.  This would mean that my WPF client would have to reference a web project for data access, which by itself seems wrong.  (Or making a copy of the data model, which is worse.)  It would be better for the WPF client to talk to the same domain service as the Silverlight client, but RIA Services doesn’t give you an option to link that web project to a non-Silverlight project, so again I ran into a brick wall.

So Don’t Do That

The kind of advice I’m getting for this is, “so don’t do that”.  In other words, don’t write complex validation logic in the model or otherwise try to access the data context; don’t write parameterized constructors; don’t aim for 100% type fidelity across all endpoints of a system; don’t try to share data models with Silverlight and non-Silverlight projects, etc.  But I see the potential for RIA Services, so I have to push for these things unless I hear really convincing arguments against them (or compelling alternatives).

Conclusion

The fact that there are different data contexts and data item definitions within those contexts imposes a burden on the developer to use different techniques for each environment, and creates challenges for centralizing data model logic and reusing equivalent logic across different kinds of clients.  My gut feeling is that RIA Services in its current form has some fundamental design flaws that will need to be addressed, taking into consideration systems with a mix of Silverlight, WPF, and other clients, before it becomes a truly robust data access platform.

Posted in Data Structures, Design Patterns, Distributed Architecture, LINQ, RIA Services, Software Architecture | 3 Comments »

Better Tool Support for .NET

Posted by Dan Vanderboom on September 7, 2009

Productivity Enhancing Tools

Visual Studio has come a long way since its debut in 2002.  With the imminent release of 2010, we’ll see a desperately-needed overhauling of the archaic COM extensibility mechanisms (to support the Managed Package Framework, as well as MEF, the Managed Extensibility Framework) and a redesign of the user interface in WPF that I’ve been pushing for and predicted as inevitable quite some time ago.

For many alpha geeks, the Visual Studio environment has been extended with excellent third-party, productivity-enhancing tools such as CodeRush and Resharper.  I personally feel that the Visual Studio IDE team has been slacking in this area, providing only very weak support for refactorings, code navigation, and better Intellisense.  While I understand their desire to avoid stepping on partners’ toes, this is one area I think makes sense for them to be deeply invested in.  In fact, I think a new charter for a Developer Productivity Team is warranted (or an expansion of their team if it already exists).

It’s unfortunately a minority of .NET developers who know about and use these third-party tools, and the .NET community as a whole would without a doubt be significantly more productive if these tools were installed in the IDE from day one.  It would also help to overcome resistance from development departments in larger organizations that are wary of third-party plug-ins, due perhaps to the unstable nature of many of them.  Microsoft should consider purchasing one or both of them, or paying a licensing fee to include them in every copy of Visual Studio.  Doing so, in my opinion, would make them heroes in the eyes of the overwhelming majority of .NET developers around the world.

It’s not that I mind paying a few hundred dollars for these tools.  Far from it!  The tools pay for themselves very quickly in time saved.  The point is to make them ubiquitous: to make high-productivity coding a standard of .NET development instead of a nice add-on that is only sometimes accepted.

Consider just from the perspective of watching speakers at conferences coding up samples.  How many of them don’t use such a tool in their demonstration simply because they don’t want to confuse their audience with an unfamiliar development interface?  How many more demonstrations could they be completing in the limited time they have available if they felt more comfortable using these tools in front of the masses?  You know you pay good money to attend these conferences.  Wouldn’t you like to cover significantly more ground while you’re there?  This is only likely to happen when the tool’s delivery vehicle is Visual Studio itself.  Damon Payne makes a similar case for the inclusion of the Managed Extensibility Framework in .NET Framework 4.0: build it into the core and people will accept it.

The Gorillas in the Room

CodeRush and Resharper have both received recent mention in the Hanselminutes podcast (episode 196 with Mark Miller) and in the Deep Fried Bytes podcast (episode 35 with Corey Haines).  If you haven’t heard of CodeRush, I recommend watching these videos on their use.

For secondary information on CodeRush, DXCore, and the principles with which they were designed, I recommend these episodes of DotNetRocks:

I don’t mean to be so biased toward CodeRush, but it’s the tool I’m personally familiar with; it has a broader range of functionality, and it seems to get the majority of press coverage.  However, those who do talk about Resharper speak highly of it, so I recommend you check out both of them to see which one works best for you.  But above all: go check them out!

Refactor – Rename

Refactoring code is something we should all be doing constantly to avoid the accumulation of technical debt as software projects and the requirements on which they are based evolve.  There are many refactorings in Visual Studio for C#, and many more in third-party tools for several languages, but I’m going to focus here on what I consider to be the most important refactoring of them all: Rename.

Why is Rename so important?  Because it’s so commonly used, and it has such far-reaching effects.  It is frequently the case that we give poor names to identifiers before we clearly understand their role in the “finished” system, and even more frequent that an item’s role changes as the software evolves.  Failure to rename items to accurately reflect their current purpose is a recipe for code rot and greater code maintenance costs, developer confusion, and therefore buggy logic (with its associated support costs).

When I rename an identifier with a refactoring tool, all of the references to that identifier are also updated.  There might be hundreds of references.  In the days before refactoring tools, one would accomplish this with Find-and-Replace, but this is dangerous.  Even with options like “match case” and “match whole word”, it’s easy to rename the wrong identifiers, rename pieces of string literals, and so on; and if you forget to set these options, it’s worse.  You can go through each change individually, but that can take a very long time with hundreds of potential updates and is a far cry from a truly intelligent update.

Ultimately, the intelligence of the Rename refactoring provides safety and confidence for making far-reaching changes, encouraging more aggressive refactoring practices on a more regular basis.

Abolishing Magic Strings

I am intensely passionate about any tool or coding practice that encourages refactoring and better code hygiene.  One example of such a coding practice is the use of lambda expressions to select identifiers instead of using evil “magical strings”.  From my article on dynamically sorting Linq queries, the use of “magic strings” would force me to write something like this to dynamically sort a Linq query:

Customers = Customers.Order("LastName", SortDirection.Ascending).Order("FirstName", SortDirection.Descending);

The problem here is that “LastName” and “FirstName” are oblivious to the Rename refactoring.  Using the refactoring tool might give me a false sense of security in thinking that all of my references to those two fields have been renamed, leading me to The Pit of Despair.  Instead, I can define a function and use it like the following:

// Extracts the member name from a simple selector like c => c.LastName and forwards it to the string-based Order overload.
public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> Source, 
    Expression<Func<T, object>> Selector, SortDirection SortDirection)
{
    return Order(Source, (Selector.Body as MemberExpression).Member.Name, SortDirection);
}

Customers = Customers.Order(c => c.LastName, SortDirection.Ascending).Order(c => c.FirstName, SortDirection.Descending);

This requires a little understanding of the structure of expressions to implement, but the benefit is huge: I can now use the refactoring tool with much greater confidence that I’m not introducing subtle reference bugs into my code.  For such a simple example, the benefit is dubious, but multiply this by hundreds or thousands of magic string references, and the effort involved in refactoring quickly becomes overwhelming.

Coding in this style is most valuable when it’s a solution-wide convention.  So long as you have code that strays from this design philosophy, you’ll find yourself grumbling and reaching for the inefficient and inelegant Find-and-Replace tool.  The only time it really becomes an issue, then, is when accessing libraries that you have no control over, such as Linq-to-Entities in the Entity Framework, which makes extensive use of magic strings.  In the case of EF, this is mitigated somewhat by your ability to regenerate the code it uses.  In other libraries, it may be possible to write extension methods like the Order method shown above.

It’s my earnest hope that library and framework authors such as the .NET Framework team will seriously consider alternatives to, and an abolition of, “magic strings” and other coding practices that frustrate otherwise-powerful refactoring tools.

Refactoring Across Languages

A tool is only as valuable as it is practical.  The Rename refactoring is more valuable when coding practices don’t frustrate it, as explained above.  Another barrier to the practical use of this tool is the prevalence of multiple languages within and across projects in a Visual Studio solution.  The definition of a project as a single-language container is dubious when you consider that a C# or VB.NET project may also contain HTML, ASP.NET, XAML, or configuration XML markup.  These are all languages with their own parsers and other language services.

So what happens when identifiers are shared across languages and a Rename refactoring is executed?  It depends on the languages involved, unfortunately.

When I rename a C# class in Visual Studio using the Rename refactoring, the XAML’s x:Class value is also updated.  What we’re seeing here is cross-language refactoring, but unfortunately it only works in one direction.  There is no refactor command to update the x:Class value from the XAML editor, so manually changing it causes my C# class to become sadly out of sync.  Furthermore, this seems to be XAML specific.  If I refactor the name of an .aspx.cs class, the Inherits attribute of the Page directive in the .aspx file doesn’t update.

How frequent do you think it is that someone would want to change a code-behind file for an ASP.NET page, and yet would not want to change the Inherits attribute?  Probably not very common (okay, probably NEVER).  This is a matter of having sensible defaults.  When you change an identifier name in this way, the development environment does not respond in a sensible way by default, forcing the developer to do extra work and waste time.  This is a failure in UI design for the same reason that Intellisense has been such a resounding success: Intellisense anticipates our needs and works with us; the failure to keep identifiers in sync by default is diametrically opposed to this intelligence.  This represents a fragmented and inconsistent design for an IDE to possess, thus my hope that it will be addressed in the near future.

The problem should be recognized as systemic, however, and addressed in a generalized way.  Making individual improvements in the relationships between pairs of languages has been almost adequate, but I think it would behoove us to take a step back and take a look at the future family of languages supported by the IDE, and the circumstances that will quickly be upon us with Microsoft’s Oslo platform, which enables developers to more easily build tool-supported languages (especially DSLs, Domain Specific Languages). 

Even without Oslo, we have seen a proliferation of languages: IronRuby, IronPython, F#, and the list goes on.  A refactoring tool that is hard-coded for specific languages will be unable to keep pace with the growing family of .NET and markup languages, and certainly unable to deal with the demands of every DSL that emerges in the next few years.  If instead we had a way to identify our code identifiers to the refactoring tool, and indicate how they should be bound to identifiers in other languages in other files, or even other projects or solutions, the tools would be able to make some intelligent decisions without understanding each language ahead of time.  Each language’s language service could supply this information.  For more information on Microsoft Oslo and its relationship to a world of many languages, see my article on Why Oslo Is Important.

Without this cross-language identifier binding feature, we’ll remain in refactoring hell.  I offered a feature suggestion to the Oslo team regarding this multi-master synchronization of a model across languages that was rejected, much to my dismay.  I’m not sure if the Oslo team is the right group to address this, or if it’s more appropriate for the Visual Studio IDE team, so I’m not willing to give up on this yet.

A Default of Refactor-Rename

The next idea I’d like to propose here is that the Rename refactoring is, in fact, a sensible default behavior.  In other words, when I edit an identifier in my code, I more often than not want all of the references to that identifier to change as well.  This is based on my experience in invoking the refactoring explicitly countless times, compared to the relatively few times I want to “break away” that identifier from all the code that references it.

Think about it: if you have 150 references to variable Foo, and you change Foo to FooBar, you’re going to have 150 broken references.  Are you going to create a new Foo variable to replace them?  That workflow doesn’t make any sense.  Why not just start editing the identifier and have the references update themselves implicitly?  If you want to be aware of the change, it would be trivial for the IDE to indicate the number of references that were updated behind the scenes.  Then, if for some reason you really did want to break the references, you could explicitly launch a refactoring tool to “break references”, allowing you to edit that identifier definition separately.

The challenge that comes to mind with this default behavior concerns code that spans across solutions that aren’t loaded into the IDE at the same time.  In principle, this could be dealt with by logging the refactoring somewhere accessible to all solutions involved, in a location they can all access and which gets checked into source control.  The next time the other solutions are loaded, the log is loaded and the identifiers are renamed as specified.

Language Property Paths

If you’ve done much development with Silverlight or WPF, you’ve probably run into the PropertyPath class when using data binding or animation.  PropertyPath objects represent a traversal path to a property such as “Company.CompanyName.Text”.  The travesty is that they’re always “magic strings”.

My argument is that the property path is such an important construct that it deserves to be a core part of language syntax instead of just a type in some UI-platform-specific library.  I created a data binding library for Windows Forms for which I created my own property path syntax and type, and there are countless non-UI scenarios in which this construct would also be incredibly useful.

The advantage of having a language like C# understand property path syntax is that you avoid a whole class of problems that developers have used “magic strings” to solve.  The compiler can then make intelligent decisions about the correctness of paths, and errors can be identified very early in the cycle.
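
Until the language understands property paths, the closest refactor-friendly approximation is to accept a lambda and walk its member chain to rebuild the dotted string.  The BuildPath helper below is a sketch of that idea (my own invention, not part of any shipping library), reusing the Company.CompanyName.Text example from above:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class PropertyPaths
{
    // Rebuilds a dotted path such as "CompanyName.Text" from a selector like c => c.CompanyName.Text.
    public static string BuildPath<T>(Expression<Func<T, object>> Selector)
    {
        var segments = new List<string>();
        Expression body = Selector.Body;

        // Selectors for value-type members arrive wrapped in a Convert node; unwrap it first.
        if (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;

        var member = body as MemberExpression;
        while (member != null)
        {
            segments.Insert(0, member.Member.Name);
            member = member.Expression as MemberExpression;
        }

        return string.Join(".", segments.ToArray());
    }
}

// Usage: a refactor-safe stand-in for new Binding("CompanyName.Text")
// Binding binding = new Binding(PropertyPaths.BuildPath<Company>(c => c.CompanyName.Text)) { Source = company1 };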

Imagine being able to pass property paths to methods or return them from functions as first-class citizens.  Instead of writing this:

Binding NameTextBinding = new Binding("Name") { Source = customer1 };

… we could write something like this, have access to the Rename refactoring, and even get Intellisense support when hitting the dot (.) operator:

Binding NameTextBinding = new Binding(@Customer.Name) { Source = customer1 };

In this code example, I use the fictitious @ operator to inform the compiler that I’m specifying a property path and not trying to reference a static property called Name on the Customer class.

With property paths in the language, we could solve our dynamic Linq sort problem cleanly, without using lambda expressions to hack around the problem:

Customers = Customers.Order(@Customer.LastName).Order(@Customer.FirstName, SortDirection.Descending);

That looks and feels right to me.  How about you?

Summary

There are many factors of developer productivity, and I’ve established refactoring as one of them.  In this article I discussed tooling and coding practices that support or frustrate refactoring.  We took a deep look into the most important refactoring we have at our disposal, Rename, and examined how to get the greatest value out of it in terms of personal habits, as well as long-term tooling vision and language innovation.  I proposed including property paths in language syntax due to their general usefulness and their ability to solve a whole class of problems that have traditionally been solved using problematic “magic strings”.

It gives me hope to see the growing popularity of Fluent Interfaces and the use of lambda expressions to provide coding conventions that can be verified by the compiler, and a growing community of bloggers (such as here and here) writing about the abolition of “magic strings” in their code.  We can only hope that Microsoft program managers, architects, and developers on the Visual Studio and .NET Framework teams are listening.

Posted in Data Binding, Data Structures, Design Patterns, Development Environment, Dynamic Programming, Functional Programming, Language Innovation, LINQ, Oslo, Silverlight, Software Architecture, User Interface Design, Visual Studio, Visual Studio Extensibility, Windows Forms | Leave a Comment »

Strongly-Typed, Dynamic Linq Order Operator

Posted by Dan Vanderboom on August 20, 2009

A Community Solution

I love social technologies like Stack Overflow, where people can collaborate loosely to share knowledge and help get things done.  Stack Overflow does on a large scale what developer blogs like mine have been doing on a smaller scale: creating a community around the sharing of ideas and methods.

Every once in a while, I get some great feedback that includes a fix for one of my bugs, a performance tweak I didn’t realize was possible, or an extension to some library I left unfinished.

This morning I ran into a question about my dynamic Linq sort, solved and answered by “Ch00k”, allowing one to get compile-time checking of identifier names.  Well done!

(It’s too bad Stack Overflow doesn’t promote the use of real names for professional developers to better market themselves with skill and reputation.)

My original article (with source code) is here.  All I added to the library was this:

// Extracts the member name from a simple selector like c => c.LastName and forwards it to the
// string-based Order overload from the original article.  (A selector for a value-type member would
// arrive wrapped in a Convert node rather than as a MemberExpression, so this assumes reference-type members.)
public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> Source, 
    Expression<Func<T, object>> Selector, SortDirection SortDirection)
{
    return Order(Source, (Selector.Body as MemberExpression).Member.Name, SortDirection);
}

To test it, I used this code:

IEnumerable<Customer> Customers = new Customer[] { new Customer("Dan", "Vanderboom"), new Customer("Steve", "Vanderboom"), 
    new Customer("Tracey", "Vanderboom"), new Customer("Sarah", "Barkelew") };

Customers = Customers.Order(c => c.LastName, SortDirection.Ascending);
Customers = Customers.Order(c => c.FirstName, SortDirection.Descending);

foreach (var cust in Customers)
{
    Console.WriteLine(cust.FirstName + " " + cust.LastName);
}

Now I can refactor these data model classes with a tool and all my dynamic sorting queries will stay in sync!

Posted in Collaboration, Design Patterns, Dynamic Programming, Language Extensions, LINQ, Object Oriented Design, Open Source, Social Networking | Tagged: , , , | 3 Comments »

MSDN Developer Conference in Chicago

Posted by Dan Vanderboom on January 13, 2009

I just got home to Milwaukee from the MSDN Developer Conference in Chicago, about a two-hour drive.  I knew that it would be a rehash of the major technologies revealed at the PDC, which I attended in November, so I wasn’t sure how much value I’d get out of it, but I had a bunch of questions about their new technologies (Azure, Oslo, Geneva, VS2010, .NET 4.0, new language stuff), and it just sounded like fun to go out to Fogo de Chao for dinner (a wonderful Brazilian steakhouse, with great company).

So despite my reservations, I’m glad I went.  I think it also helped that I’ve had since November to research and digest all of this new stuff, so that I could be ready with good questions to ask.  There’ve been so many new announcements, it’s been a little overwhelming.  I’m still picking up the basics of Silverlight/WPF and WCF/WF, which have been out for a while now.  But that’s part of the fun and the challenge of the software industry.

Sessions

With some last minute changes to my original plan, I ended up watching all four Azure sessions.  All of the speakers did a great job.  That being said, “A Lap Around Azure” was my least favorite content because it was so introductory and general.  But the opportunity to drill speakers for information, clarification, or hints of ship dates made it worth going.

I was wondering, for example, if the ADO.NET Data Services Client Library, which talks to a SQL Server back end, can also be used to point to a SQL Data Services endpoint in the cloud.  And I’m really excited knowing now that it can, because that means we can use real LINQ (not weird LINQ-like syntax in a URI).  And don’t forget Entities!

I also learned that though my Mesh account (which I love and use every day) is beta, there’s a CTP available for developers that includes new features like tracking of Mesh Applications.  I’ve been thinking about Mesh a lot, not only because I use it, but because I wanted to determine if I could use the synchronization abilities in the Mesh API to sync records in a database.

<speculation Mode=”RunOnSentence”>
If Microsoft is building this entire ecosystem of interoperable services, and one of them does data storage and querying (SQL Data Services), and another does synchronization and conflict resolution (Mesh Services)–and considering how Microsoft is making a point of borrowing and building on existing knowledge (REST/JSON/etc.) instead of creating a new proprietary stack–isn’t it at least conceivable that these two technologies would at some point converge in the future into a cloud data services replication technology?
</speculation>

I’m a little disappointed that Ori Amiga’s Mesh Mobile wasn’t mentioned.  It’s a very compelling use of the Mesh API.

The other concern I’ve had lately is the apparent immaturity of SQL Data Services.  As far as what’s there in the beta, it’s tables without enforceable schemas (so far), basic joins, no grouping, no aggregates, and a need to manually partition across virtual instances (and therefore to also deal with the consequences of that partitioning, which affects querying, storage, etc.).  How can I build a serious enterprise, Internet-scale system without grouping or aggregates in the database tier?  But as several folks suggested and speculated, Data Services will most likely have these things figured out by the time it’s released, which will probably be the second half of 2009 (sooner than I thought).

Unfortunately, if you’re using Mesh to synchronize a list of structured things, you don’t get the rich querying power of a relational data store; and if you use SQL Data Services, you don’t get the ability to easily and automatically synchronize data with other devices.  At some point, we’ll need to have both of these capabilities working together.

When you stand back and look at where things are going, you have to admit that the future of SQL Data Services looks amazing.  And I’m told this team is much further ahead than some of the other teams in terms of robustness and readiness to roll out.  In the future (post 2009), we should have analytics and reporting in the cloud, providing Internet-scale analogues to their SQL Analysis Server and SQL Reporting Services products, and then I think there’ll be no stopping it as a mass adopted cloud services building block.

Looking Forward

The thought that keeps repeating in my head is: after we evolve this technology to a point where rapid UX and service development is possible and limitless scaling is reached in terms of software architecture, network load balancing, and hardware virtualization, where does the development industry go from there?  If there are no more rungs of the scalability ladder we have to climb, what future milestones will we reach?  Will we have removed the ceiling of potential for software and what it can accomplish?  What kind of impact will that have on business?

Sometimes I suspect the questions are as valuable as the answers.

Posted in ADO.NET Data Services, Conferences, Distributed Architecture, LINQ, Mesh, Oslo, Service Oriented Architecture, SQL Analysis Services, SQL Data Services, SQL Reporting Services, SQL Server, Virtualization, Windows Azure | 1 Comment »

Dynamically Composing LINQ OrderBy Clauses in Compact Framework

Posted by Dan Vanderboom on December 19, 2008

[There is a follow-up article to this which makes search properties strongly-typed.]

I’ve received several questions recently about dynamically composing LINQ queries.  Those of us who come from a SQL background and are familiar with embedding queries in strings have had it easy when it comes to composing query logic, although it has always come with a price: those queries are not strongly checked for syntax errors, the existence of members, or type agreement, and there is always SQL injection to be cautious of.

Still, faced with a scenario such as the need to customize the sort order of a query window in an application, sometimes it’s necessary to give up a little validation, or at least push it to a different part of the application.  Imagine an administrative view where a user selects a list, selects the fields to order by, in which direction (ascending or descending), and can reorder which fields are ordered first, second, and so on.  Validation could happen during that interaction.

To keep the example simple, I’ll be using a basic Customer class:

class Customer
{
    public string FirstName, LastName;

    public Customer(string FirstName, string LastName)
    {
        this.FirstName = FirstName;
        this.LastName = LastName;
    }
}

The test method will determine the preferred shape of the API.

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<Customer> Customers = new Customer[] { new Customer("Dan", "Vanderboom"), new Customer("Steve", "Vanderboom"), 
            new Customer("Tracey", "Vanderboom"), new Customer("Sarah", "Barkelew") };

        // these would be run as various iterations of a loop,
        // based on some kind of configuration data
        Customers = Customers.Order("LastName", SortDirection.Ascending);
        Customers = Customers.Order("FirstName", SortDirection.Descending);

        // you can also string them together like this, if you know what they are ahead of time
        // but you'd be better off writing a normal, compiler-checked OrderBy clause, using the standard LINQ API
        //Customers = Customers.Order("LastName", SortDirection.Ascending).Order("FirstName", SortDirection.Descending);
        
        foreach (var cust in Customers)
        {
            Console.WriteLine(cust.FirstName + " " + cust.LastName);
        }

        Console.ReadKey();
    }
}

Curiously, when Customers is defined with the var keyword instead of explicitly as IEnumerable<Customer>, this code doesn’t compile.  The reason is that var infers the array type Customer[], and the IOrderedEnumerable<Customer> returned by Order can’t be assigned back to an array variable; declaring the variable as IEnumerable<Customer> makes the reassignment legal.
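
A minimal illustration of the difference (the variable names here are mine):

var inferred = new Customer[] { new Customer("Dan", "Vanderboom") };        // compile-time type: Customer[]
//inferred = inferred.Order("LastName", SortDirection.Ascending);           // error: IOrderedEnumerable<Customer> can't be assigned to Customer[]

IEnumerable<Customer> explicitType = new Customer[] { new Customer("Dan", "Vanderboom") };
explicitType = explicitType.Order("LastName", SortDirection.Ascending);     // compiles: the result is still an IEnumerable<Customer>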

As you can tell from the code, the same Order extension method is called regardless of whether the collection is already ordered or not.  It seems odd that LINQ requires us to call OrderBy followed by ThenBy on all subsequent sorting, and has even more methods for ascending and descending sorts.  By combining all four Enumerable ordering methods into a single Order method, we make it much easier to loop through some kind of data to determine the qualities of the ordering clause.

I found this article on StackOverflow that manipulates expression trees, and the Dynamic LINQ Library–which is pretty nice–does the same sort of thing with some Reflection.Emit thrown in.  The problem with this is that it assumes you have support for Expressions or Reflection.Emit, which isn’t the case in Compact Framework.  So I thought I’d roll up my sleeves and see what I could do to help the poor souls stuck working with LINQ in Compact Framework.  The approach below works for in-memory IEnumerables (LINQ-to-Objects), but it won’t work with query providers that translate expression trees into queries for a remote storage engine (such as LINQ-to-SQL or LINQ-to-Entities targeting SQL Server).

So with that caveat, if you query some data, and find it reasonable or necessary to sort it in memory, this implementation will do the trick:

public enum SortDirection
{
    Ascending, Descending
}

public static class IEnumerableOrderExtension
{
    public static IOrderedEnumerable<T> Order<T>(this IEnumerable<T> Source, string MemberName, SortDirection SortDirection)
    {
        MemberInfo mi = typeof(T).GetField(MemberName);
        if (mi == null)
            mi = typeof(T).GetProperty(MemberName);
        if (mi == null)
            throw new InvalidOperationException("Field or property '" + MemberName + "' not found");

        // get the field or property's type, and make a delegate type that takes a T and returns this member's type
        Type MemberType = mi is FieldInfo ? (mi as FieldInfo).FieldType : (mi as PropertyInfo).PropertyType;
        Type DelegateType = typeof(Func<,>).MakeGenericType(typeof(T), MemberType);

        // we need to call ValueProxy.GetValue, which is a generic method
        MethodInfo GenericReturnValueMethod = typeof(ValueProxy).GetMethod("GetValue");
        // its type parameters are MemberType and T, so we "make" a method to bake in those specific types
        MethodInfo GetValueMethod = GenericReturnValueMethod.MakeGenericMethod(MemberType, typeof(T));

        var proxy = new ValueProxy(mi);

        // now create a delegate using the delegate type and method we just constructed
        Delegate GetMethodDelegate = Delegate.CreateDelegate(DelegateType, proxy, GetValueMethod);

        // method name on IEnumerable/IOrderedEnumerable to call later
        string MethodName = null;

        // do we already have at least one sort on this collection?
        if (Source is IOrderedEnumerable<T>)
        {
            if (SortDirection == SortDirection.Ascending)
                MethodName = "ThenBy";
            else
                MethodName = "ThenByDescending";
        }
        else // first sort on this collection
        {
            if (SortDirection == SortDirection.Ascending)
                MethodName = "OrderBy";
            else
                MethodName = "OrderByDescending";
        }

        MethodInfo method = typeof(Enumerable).GetMethods()
            .Single(m => m.Name == MethodName && m.MakeGenericMethod(typeof(int), typeof(int)).GetParameters().Length == 2);

        return method.MakeGenericMethod(typeof(T), MemberType)
            .Invoke(Source, new object[] { Source, GetMethodDelegate }) as IOrderedEnumerable<T>;
    }
}

public class ValueProxy
{
    private MemberInfo Member;

    public T GetValue<T, TKey>(TKey obj)
    {
        if (Member is FieldInfo)
            return (T)(Member as FieldInfo).GetValue(obj);
        else if (Member is PropertyInfo)
            return (T)(Member as PropertyInfo).GetValue(obj, null);

        return default(T);
    }

    public ValueProxy(MemberInfo Member)
    {
        this.Member = Member;
    }
}


There are a few things worth mentioning about this approach.  In a nutshell, what we need to do is construct a delegate to a function that will get called later, when the query is actually executed.  (In other words, we’re using Reflection to get a Reflection method that gets values.)  First we need the function to point to, so that’s defined in the ValueProxy class as GetValue, whose job it is to remember which member is being selected, and to query that member’s value when requested.

It’s defined with generics because I want to use it as a generalized mechanism for sorting all types of fields or properties.  Since we’re dealing with generics, we need to know that Func<,> is different from Func<Customer,string>.  We call MakeGenericType on the former, pass in type parameters, and end up with the latter.  The same is true of the difference between GetValue<,> and GetValue<string,Customer> with MakeGenericMethod.

Type DelegateType = typeof(Func<,>).MakeGenericType(typeof(T), MemberType);
MethodInfo GetValueMethod = GenericReturnValueMethod.MakeGenericMethod(MemberType, typeof(T));

Once we have handles to these, we can create our delegate object.

Delegate GetMethodDelegate = Delegate.CreateDelegate(DelegateType, proxy, GetValueMethod);

The ValueProxy objects bind together the method to get the value and the member to get the value from, the latter not being supplied by the OrderBy family of extension methods (which will ultimately call our GetValue method).  Abstracting this away into a proxy is what gives us the necessary indirection to inject the generalization mechanism.

The end of the Order method presents a challenge for running in Compact Framework.  Consider how to select a MethodInfo object from the following options:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this IEnumerable<TSource> source, 
    Func<TSource, TKey> keySelector);
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this IEnumerable<TSource> source, 
    Func<TSource, TKey> keySelector, IComparer<TKey> comparer);

We have two overloads, both generic methods, one with two parameters and the other with three.  You might expect, as I did, that one of the Type.GetMethod overloads would include a way to get a MethodInfo object for a specific method, but in the case of generic methods, it’s actually not possible.  Trying to be clever by supplying open generics for the parameters, I attempted this:

MethodInfo mi = typeof(Enumerable).GetMethod("OrderBy", BindingFlags.Public | BindingFlags.Static, null,
    new Type[] { typeof(IEnumerable<>), typeof(Func<,>) }, null);

…but alas, this doesn’t work.  Instead, you have to query all methods with Type.GetMethods, and then somehow filter them out.  In the full .NET Framework, we could do this:

MethodInfo method = typeof(Enumerable).GetMethods()
    .Single(m => m.Name == MethodName && m.GetParameters().Length == 2);

However, in Compact Framework we get a NotSupportedException: "GetParameters for Open Generic methods is not supported."  This suggests that method definitions in the type system are not exactly the same between CF and FF in terms of how generics are defined.  We can get around this problem by calling MakeGenericMethod, but it’s unfortunate that we have to incur the extra overhead just to get a parameter count.  Still, this happens once per sort field instead of something horrible like for once per object in the collection, and with the && being a short-circuit operator, the only time GetParameters will be called is when the method name matches, which brings us to two matches maximum, so it’s really not so bad.  I pass in typeof(int) for both type parameters (not to be confused with the method’s value parameters).  The type parameters used here don’t actually matter; we won’t actually use the OrderBy<int,int> method for anything other than getting a count of its value parameters, which is used to select the overload we want.

MethodInfo method = typeof(Enumerable).GetMethods()
    .Single(m => m.Name == MethodName && m.MakeGenericMethod(typeof(int), typeof(int)).GetParameters().Length == 2);

return method.MakeGenericMethod(typeof(T), MemberType)
    .Invoke(Source, new object[] { Source, GetMethodDelegate }) as IOrderedEnumerable<T>;

So there it is, an unintuitive and round-about way of selecting a method to call, but it works on both Full and Compact Framework.

Happy querying!

Posted in LINQ | 8 Comments »

The Future of Programming Languages

Posted by Dan Vanderboom on November 6, 2008

Two of the best sessions at the PDC this year were Anders Hejlsberg’s The Future of C# and a panel on The Future of Programming.

A lot has been said and written about dynamic programming, metaprogramming, and language syntax extensions–not just academically over the past few decades, but also as a recently growing buzz among the designers and users of mainstream object-oriented languages.

Anders Hejlsberg

Dynamic Programming

After a scene-setting tour through the history and evolution of C#, Anders addressed how C# 4.0 would allow much simpler interoperation between C# and dynamic languages.  I’ve been following Charlie Calvert’s Language Futures website, where they’ve been discussing these features early on with the development community.  It’s nice to see how seriously they take the feedback they’re getting, and I really think it’s going to have a positive impact on the language as a whole.  Initial thoughts revolved around creating a new block of code with code like dynamic { DynamicStuff.SomeUndefinedProperty = "whatever"; }.

But at the PDC we saw that dynamic will instead be a type for our dynamic objects, so dynamic lookup of members will only be allowed on those variables.  Anders’ demo showed off interactions with JavaScript and Python, as well as Office via COM, all without the ugly Type.Missing parameters (optional parameter support also played a part in that).  Other ideas revolved around easing Reflection access and allowing dynamic access to XML document nodes.
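
Based on the demo, usage looks roughly like the sketch below.  This is a trivial CLR example rather than the COM/Python interop Anders showed, and details could still shift before C# 4.0 ships:

using System;

class DynamicDemo
{
    static void Main()
    {
        // 'dynamic' is a static type whose member lookups are deferred to runtime.
        dynamic value = "hello";
        Console.WriteLine(value.Length);       // bound at runtime: prints 5

        value = 3.14;                          // the same variable can hold anything
        Console.WriteLine(value.GetType());    // still resolved at runtime: System.Double

        // The real payoff is calling into COM, IronPython, or JavaScript objects this way,
        // without Type.Missing noise or manual Reflection plumbing.
    }
}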

Meta-Programming

At the end of his talk, Anders showed a stunning demo of metaprogramming working within C#.  It was an early prototype, so not all language features were supported, but it worked much like Eval: the code was constructed inside a string and then compiled at runtime.  It was flexible and powerful enough that he could create delegates to functions that he Eval’ed up into existence.  Someone in the audience asked how this was different from Lisp macros, to which Anders replied: “This is basically Lisp macros.”
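
Anders’ prototype isn’t something we can download, but the basic workflow he showed (build code in a string, compile it at runtime, bind a delegate to the result) can be approximated today with CodeDOM.  The class and method names below are my own, purely for illustration:

using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

class EvalSketch
{
    static void Main()
    {
        string source = @"
            public static class Generated
            {
                public static int Add(int a, int b) { return a + b; }
            }";

        // Compile the string into an in-memory assembly.
        var provider = new CSharpCodeProvider();
        var options = new CompilerParameters { GenerateInMemory = true };
        CompilerResults results = provider.CompileAssemblyFromSource(options, source);

        // Bind a delegate to the method we just "Eval'ed" into existence.
        MethodInfo add = results.CompiledAssembly.GetType("Generated").GetMethod("Add");
        var addDelegate = (Func<int, int, int>)Delegate.CreateDelegate(typeof(Func<int, int, int>), add);

        Console.WriteLine(addDelegate(2, 3));  // 5
    }
}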

Before you get too excited (or worried) about this significant bit of news, Anders made no promises about when metaprogramming would be available, and he subtly suggested that it may very well be a post-4.0 feature.  As he said in the Future of Programming Panel, however: “We’re rewriting the compiler in managed code, and I’d say one of the big motivators there is to make it a better metaprogramming system, sort of open up the black box and allow people to actually use the compiler as a service…”

Regardless of when it arrives, I hope they will give serious consideration to providing syntax checking of this macro or meta code, instead of treating it blindly at compile-time as a “magic string”, as has so long plagued the realm of data access.  After all, one of the primary advantages of Linq is to enable compile-time checking of queries, to enforce not only strict type checking, but to also more fundamentally ensure that data sources and their members are valid.  The irregularity of C#’s syntax, as opposed to Lisp, will make that more difficult (thanks to Paul for pointing this out), but I think most developers will eventually agree it’s a worthwhile cause.  Perhaps support for nested grammars in the generic sense will set the stage for enabling this feature.

Language Syntax Extensions

If metaprogramming is about making the compiler available as a service, language extensions are about making the compiler service transparent and extensible.

The majority (but not all) of the language design panel stressed caution in evolving and customizing language syntax and discussed the importance of syntax at length, but they’ve been considering the demands of the development community seriously.  At times Anders vacillated between trying to offer alternatives and admitting that, in the end, customization of language syntax by developers would prevail; and that what’s important is how we go about enabling those scenarios without destroying our ability to evolve languages usefully, avoiding their collapse from an excess of ambiguity and inconsistency in the grammar.

“Another interesting pattern that I’m very fond of right now in terms of language evolution is this notion that our static languages, and our programming languages in general, are getting to be powerful enough, that with all of these things we’re picking up from functional programming languages and metaprogramming, that you can–in the language itself–build these little internal DSLs, where you use fluent interface style, and you dot together operators, and you have deferred execution… where you can, in a sense, create little mini languages, except for the syntax.

If you look at parallel extensions for .NET, they have a Parallel.For, where you give the start and how many times you want to go around, and a lambda which is the body you want to execute.  And boy, if you squint, that looks like a Parallel For statement.

But it allows API designers to experiment with different styles of programming.  And then, as they become popular, we can pick them up and put syntactic veneers on top of them, or we can work to make languages maybe even richer and have extensible syntax like we talked about, but I’m encouraged by the fact that our languages have gotten rich enough that you do a lot of these things without even having to have syntax.” – Anders Hejlsberg

On one hand, I agree with him: the introduction of lambda expressions and extension methods can create some startling new syntax-like patterns of coding that simply weren’t feasible before.  I’ve written articles demonstrating some of this, such as New Spin on Spawning Threads and especially The Visitor Design Pattern in C# 3.0.  And he’s right: if you squint, it almost looks like new syntax.  The problem is that programmers don’t want to squint at their code.  As Chris Anderson has noted at the PDC and elsewhere, developers are very particular about how they want their code to look.  This is one of the big reasons behind Oslo’s support for authoring textual DSLs with the new MGrammar language.

One idea that came up several times (and which I alluded to above) is the idea of allowing nested languages, in a similar way that Linq comprehensions live inside an isolated syntactic context.  C++ developers can redefine many operators in flexible ways, and this can lead to code that’s very difficult to read.  This can perhaps be blamed on the inability of the C++ language to provide alternative and more comprehensive syntactic extensibility points.  Operators are what they have to work with, so operators are what get used for all kinds of things, which change per type.  But their meaning gets so overloaded, literally, that they lose any obvious (context-free) meaning.

But operators don’t have to be non-alphabetic tokens, and the addition of new keywords or symbols could be introduced in limited contexts, such as a modifier for a member definition in a type (to appear alongside visibility, overload, override, and shadowing keywords), or within a delimited block of code such as an r-value, or a curly-brace block for new flow control constructs (one of my favorite ideas and an area most in need of extensions).  Language extensions might also be limited in scope to specific assemblies, only importing extensions explicitly, giving library authors the ability to customize their own syntax without imposing a mess on consumers of the library.

Another idea would be to allow the final Action delegate parameter of a function to be expressed as a curly-brace-delimited code block following the function call, in lieu of specifying the parameter within parentheses, and removing the need for a semicolon.  For example, with a method defined like this:

public static class Parallel
{
    // Action delegate defined last, to take advantage of C# syntactic sugar
    public static void For(long Start, long Count, Action Action)
    {
        // TODO: implement
    }
}

…a future C# compiler might allow you to write code like this:

Parallel.For(0, 10)
{
    // add code here for the Action delegate parameter
}
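
For comparison, with today’s syntax the same call against the stub above has to pass the delegate inside the argument list:

Parallel.For(0, 10, () =>
{
    // add code here for the Action delegate parameter
});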

As Dr. T points out to me, however, the tricky part will consist of supporting local returns: in other words, when you call return inside that delegate’s code block, you really expect it to return from the enclosing method, not the one defined by the delegate parameter.  Support for continue or break would also make for a more intuitive fit.  If there’s one thing Microsoft does right, it’s language design, and I have a lot of confidence that issues like this will continue to be recognized and ultimately implemented correctly.  In reading their blogs and occasionally sharing ideas with them, it’s obvious they’re as passionate about the language and syntax as I am.

The key for language extensions, I believe, will be to provide more structured extensibility points for syntax (such as control flow blocks), instead of opening up the entire language for arbitrary modification.  As each language opens up some new aspect of its syntax for extension, a number of challenges will surface that will need to be dealt with, and it will be critical to solve these problems before continuing on with further evolution of the language.  Think of all we’ve gained from generics, and the challenges of dealing with a more complex type system we’ve incurred as a result.  We’re still getting updates in C# 4.0 to address shortcomings of generics, such as issues regarding covariance and contravariance.  Ultimately, though, generics were well worth it, and I believe the same will be said of metaprogramming and language extensions.

Looking Forward

I’ll have much more to say on this topic when I talk about Oslo and MGrammar.  The important points to take away from this are that mainstream language designers are taking these ideas to heart now, and there are so many ideas and options out there that we can and will experiment to find the right combination (or combinations) of both techniques and limitations to make metaprogramming and language syntax extensions useful, viable, and sustainable.

Posted in Conferences, Design Patterns, Dynamic Programming, Functional Programming, Language Extensions, LINQ, Metaprogramming, Reflection, Software Architecture | 1 Comment »

 