Technology Research Highlights: C5, YIIHAW, and MemJet
Posted by Dan Vanderboom on October 7, 2008
When I’m not designing or writing code, I’m usually doing some kind of research. Looking for new gadgets, toys, technologies, groundbreaking research, or following my favorite blogs and podcasts. Because I haven’t been highly focused on any single project lately, I thought I’d share some of the more interesting things I’ve come across.
Peter Sestoft, C5 Generics Collection Library, and YIIHAW
Peter Sestoft is a brilliant professor at the very old (founded 1479) Copenhagen IT University in Denmark, which actually has a student body consisting of a female majority (59%)! I discovered Sestoft’s works while doing research for some new, more powerful collection class I was working on (Tree<T> and Set<T>). He was featured on Channel 9 back in January on a show primarily about the C5 Collection Library.
C5 Generics Collection Library
The C5 Collection Library is an extremely powerful and well-designed library, based on earlier Java and Smalltalk collection library designs, and completely blows away the standard collection classes delivered as part of the .NET Framework. An earlier version was covered well in a Dr. Dobb’s article, but to summarize: C5 is interface-based, provides convenient but hard-to-implement functionality (such as persistent sorted collections and sublist views), provides useful abstractions such as lists, sets, bags, priority queues, double-ended queues, stacks, dictionaries, and optimizes access of these data structures for asymptotic complexity.
Unless you’re an expert—as with encryption—if you find yourself creating your own complex data structures, chances are you’re going to do it wrong. So it’s nice to know that the mathematical experts have taken the time to very carefully design and construct a wide assortment of useful data structures for the rest of us to use, complete with unit tests. And the best part, in the spirit of true science, C5 has a very liberal open source license, so it can be used in open source and commercial software alike. It’s reportedly being used by the military, video game shops, web server software, and more.
Sestoft and his colleagues have the excellent habit of providing copious documentation: a book’s worth, in fact! It can be found online here.
I think it’s an excellent candidate for being included in the .NET Framework itself, and I highly recommend that you check it out the next time you need a fancy data structure and are tempted to write it yourself, and to consider for optimizing performance of existing code that relies heavily on data structures (which is almost everywhere).
Getting the fundamentals right is probably the hardest thing to do. It’s easy (and tempting) to build as tall as we can, but getting a firm foundation in place is much more important. That’s why the C5 Collections Library is so amazing. So what are you waiting for? Go download it!
YIIHAW Is an Intelligent and High-performing Aspect Weaver
What really gets me excited is Sestoft’s YIIHAW, a high-performance static aspect weaver for .NET.
Aspect Oriented Programming (AOP) is a paradigm for addressing cross-cutting concerns in software. That is, logic that operates orthogonally to business logic, such as logging, security, configuration, and more. There are a number of open source aspect weavers available today, and they generally fall into two camps: static or compile-time weavers, and dynamic or runtime weavers.
While incredibly useful, several factors have kept AOP from entering the mainstream. For one, a good set of tools hasn’t yet emerged to enable easy debugging of aspects. When we step through lines of code in Visual Studio, the debugger has no way of knowing if or when some other injected piece of code should pick up the thread of execution, since weaving isn’t a natively recognized and supported activity within the debugger.
Another hindrance is performance: aspect weavers typically inject logic (called advices) in the form of method calls at the beginning and/or ending of existing methods (the locations are called join points) which are selected with special selector statements (called pointcuts). The typical trick, as I just alluded to, is to include a new method in the class (called an introduction) and then insert method invocation statements at the beginning and end of the methods to be modified. This isn’t a big deal in many cases, but in performance-critical sections of calls, the additional method calls and the stack manipulations that accompany them prevent these techniques from being practical in performance-critical scenarios.
Data structures play a pivotal role in high-performance algorithms, of course, and Sestoft’s experience in building the C5 Collections Library provided the perfect context in which to consider how some of the specific features of his data structures might adversely affect performance. For example: his collections fire events to inform any interested parties that a collection had an item inserted, deleted, or modified in some other way, eliminating the need for defining specialized collections that trigger specific actions outside of that collection (see my article on a Tree<T> structure I created that makes use of this).
If the C5 library could start with the basic, slimmed down, super high-performant data structures, and then “weave in” aspects to include features such as event notifications only when needed, the problem might be solved. The problem with the typical approach, however, is that insertion of method calls provides as much or more overhead than the simple checks to activate those features to begin with (if not null, then invoke event), so he knew he’d have to go with a different approach.
After evaluating the capabilities and measuring the performance of existing open source aspect weavers, they were all found to be inadequate, and the YIIHAW thesis project was born. You can find the fascinating thesis (which I just finished reading) here. In a nutshell, YIIHAW employs an advanced technique called advice inlining, in which CIL instructions are woven directly into their target methods. There are a surprising number of challenges to this approach, including the need to convert the original methods RET (return) instructions to jumps, remapping jump addresses, updating the method stack size, and much more. The students who worked on this thesis (Rasmus Johansen and Stephan Spangenberg), under Sestoft’s direction, did a masterful job of overcoming all of these challenges, the result of which is a highly-capable static aspect weaver with no performance penalty!
This is truly amazing, and it enables aspects to be woven into an assembly repeatedly: one for feature A, another for feature B, etc., all with the assurance of type safety and performance equivalent to writing the code by hand with all features included in the first place.
Another positive of the YIIHAW thesis is the light it shines on CIL instructions, their organization and function. The advice inlining techniques provide a great tutorial for those wanting a deeper understanding of how to manipulate things at the CIL level. Many who have attempted AOP have either not considered this approach, or else turned away from this path due to the complexity and challenge. Not so with the YIIHAW team, who faced the problem head on and accomplished something fundamentally great as a result. The fantastic Mono.Cecil library, despite being poorly documented, is used for analysis and manipulation of CIL code, and several other ideas came to mind as I saw how this tool was put to use.
My only criticism of their application of this technology for the generation of specialized data structures at compile time is that you may find yourself needing different versions of the same data structure in different parts of your application. In a class that performs analysis of visual data for extracting spatial cues, you may need a Tree with as little overhead as possible (i.e., no event notifications, no serialization capability), and in the user interface, you may need another Tree with those additional characteristics. By making a single decision at compile/weave time, your options will be limited at runtime.
There’s also the issue of supporting the debugging process, which can’t be dismissed lightly. But YIIHAW has gone to extreme measures to overcome some of AOP’s largest obstacles; and in addition to interceptions (weaving logic into existing methods), YIIHAW also supports introductions (adding new types and members), and typestructure modifications (changing a class’s base class, or making a class implement an interface).
Overall, YIIHAW’s potential is huge, and I believe that it won’t be long before the tooling catches up to the point where AOP becomes a mainstream approach to complex cross-cutting systems problems, simplifying designs and the engineering process overall.
Color printing is about to get a whole lot faster. At 60 pages per minute (ppm), the Memjet printer, developed by Kia Silverbrook, a prolific inventor with over 1400 patents. Check out the YouTube video to see it in action.
I’ve seen a lot of speculation that Memjet is a hoax, and a technological leap forward this large certainly warrants skepticism. However, the technology was written about in a very respectable science magazine, one of my favorites: Scientific American, in June 2007. It’s also been covered in several articles by PC Magazine, for example in May 2007, which mentions the Memjet appearing at the 2007 Global Ink Jet Printing Conference. (Talk about specific conferences!)
The print heads are page-width, so they don’t have to move back and forth across the page. The “MEM” in Memjet stands for microelectromechanical, and refers to the fabrication process that involves CMOS chips on a silicon wafer for precision layout.
With ink spraying at 900 million 1-2 picoliter drops per second, drop size small enough to dry within a second, and an estimated price tag under $200 (compared to the comparable HP Edgeline at $16,000), waiting around for long documents to print will soon be a thing of the past for everyone. The first devices are expected to appear in 2009.