Critical Development

Language design, framework development, UI design, robotics and more.

Archive for February, 2009

Windows Azure: Blobs and Blocks

Posted by Dan Vanderboom on February 21, 2009

I’ve been busy building a new cloud-based service for the past few weeks, using Windows Azure on the back end and Silverlight for the client.  One of the requirements of my service is to allow users to upload files to a highly scalable Internet storage system.  I’m experimenting with Azure’s blob storage for this, and I need to upload these blobs (Binary Large OBjects) in separate blocks.  There are two reasons I can see why you’d want to do this:

  1. Although blobs can be as large as 2 GB in the current technical preview, the largest blob you can put in one operation is 4 MB.  If your file is larger, you have to store separate blocks, and then put a block list to assemble and commit them as a single blob.
  2. If you want different users to upload different portions of a file, each user will have to upload individual blocks, and you’ll have to put the block list when all blocks are present.  This is something like a reverse BitTorrent or other P2P protocol.

My service needs to deal with separate blocks for the second reason, though the first is likely to be much more common.

Although there’s a good deal of information about blocks and blobs in the REST API documentation for Azure Storage Services, piecing together code to make REST calls with all the appropriate headers (including authentication signatures) isn’t much fun.  Where is the .NET library to make it easy?

There is one, in fact.  If you’ve downloaded and installed the Azure SDK (Jan 2009), you’ll find a samples.zip file that needs to be unzipped and its solutions built.  In particular, you’ll need the StorageClient solution.  In it, you’ll find that you can save and load blobs (as well as use queues and table storage), but there’s nothing in the API that suggests it supports putting individual blocks, let alone putting block lists to combine those blocks into a blob.  The raw state of this API is unfortunate, but the Azure platform is in an early tech preview stage, so we can expect vast improvements in the future.

Until then, however, I dug into it and discovered that there actually was code to put blocks and commit block lists, but it wasn’t exposed in the API.  Rather, it lived in BlobContainerRest.PutLargeBlobImpl and was called only when the blob you tried to put exceeded the 4 MB limit.  Taking this code and hacking it a bit, I extended the StorageClient library to provide the needed functionality.

First, add these abstract method definitions to the BlobContainer class (in BlobStorage.cs):

public abstract bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag);

public abstract bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag);
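
(The definitions are abstract because BlobContainer is the abstract base class; the REST-based implementation belongs in BlobContainerRest, which is why the next step takes us to a different file.)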

Next, you’ll need to add the implementations to the BlobContainerRest class (in RestBlobStorage.cs):

public override bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag)
{
    NameValueCollection nvc = new NameValueCollection();
    nvc.Add(QueryParams.QueryParamComp, CompConstants.Block);
    nvc.Add(QueryParams.QueryParamBlockId, 
        Convert.ToBase64String(Encoding.Unicode.GetBytes(BlockID)));
    return UploadData(blobProperties, stream, BlockSize, overwrite, eTag, nvc);
}
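
It’s worth tracing what this translates to on the wire: UploadData ends up issuing a PUT against the blob’s URI with comp=block and blockid query parameters appended, where the block ID is the base64 encoding of the Unicode bytes of whatever string you pass in.  Keep that encoding in mind if you ever need to interoperate with blocks uploaded through the raw REST API.  PutBlobBlockList, shown next, is a little more involved: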

public override bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag)
{
    bool retval = false;

    using (MemoryStream buffer = new MemoryStream())
    {
        XmlTextWriter writer = new XmlTextWriter(buffer, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement(XmlElementNames.BlockList);
        foreach (string id in BlockIDs)
        {
            writer.WriteElementString(XmlElementNames.Block, 
                Convert.ToBase64String(Encoding.Unicode.GetBytes(id)));
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        buffer.Position = 0; //Rewind

        NameValueCollection nvc = new NameValueCollection();
        nvc.Add(QueryParams.QueryParamComp, CompConstants.BlockList);

        retval = UploadData(blobProperties, buffer, buffer.Length, overwrite, eTag, nvc);
    }

    return retval;
}
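
For reference, assuming XmlElementNames maps to the obvious element names, the block list document this method builds in the buffer and uploads looks like this (formatted here for readability; each ID is the base64 encoding of the Unicode bytes of the block ID string, matching the encoding used in PutBlobBlock):

<?xml version="1.0" encoding="utf-8"?>
<BlockList>
  <Block>YgBsAG8AYwBrACAAMAA=</Block>
  <Block>YgBsAG8AYwBrACAAMQA=</Block>
</BlockList>

(Those two IDs happen to be “block 0” and “block 1” from the test code below.)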

In order to test this, I added two buttons to an ASP.NET page, one to upload the blocks and put the block list, and a second to read the blob back to verify the write operations worked:

protected void btnUploadBlobBlocks_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1", 
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    if (!container.DoesContainerExist())
        container.CreateContainer();

    var properties = new BlobProperties("TestBlob");

    // put block 0

    var ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.Write("This is block 0.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock0Success = container.PutBlobBlock(properties, "block 0", ms, ms.Length, true, null);
    }

    // put block 1

    ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.WriteLine("... and this is block 1.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock1Success = container.PutBlobBlock(properties, "block 1", ms, ms.Length, true, null);
    }

    // put block list

    List<string> BlockIDs = new List<string>();
    BlockIDs.Add("block 0");
    BlockIDs.Add("block 1");

    var PutBlockListSuccess = container.PutBlobBlockList(properties, BlockIDs, true, null);
}

protected void btnTestReadBlob_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1",
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    MemoryStream ms = new MemoryStream();
    BlobContents contents = new BlobContents(ms);
    container.GetBlob("TestBlob", contents, false);
    ms.Position = 0;

    using (var sr = new StreamReader(ms))
    {
        string x = sr.ReadToEnd();
        sr.Close();
    }
}

It’s nothing fancy, but if you put a breakpoint on the sr.Close() call, you’ll see that the value of x contains both blocks of data, equal to “This is block 0…. and this is block 1.”
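
To tie this back to reason #1 at the top of the post, here’s a minimal sketch (mine, and untested) of how the two new methods could be combined to upload a stream of arbitrary size in 4 MB blocks.  The UploadLargeBlob helper and its block-naming scheme are hypothetical; the container setup is the same as in btnUploadBlobBlocks_Click above.

void UploadLargeBlob(BlobContainer container, string blobName, Stream source)
{
    const int MaxBlockSize = 4 * 1024 * 1024;   // the 4 MB per-operation limit

    var properties = new BlobProperties(blobName);
    var blockIDs = new List<string>();
    var buffer = new byte[MaxBlockSize];
    int blockNumber = 0;
    int bytesRead;

    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Any string unique within the blob will do as an ID;
        // PutBlobBlock base64-encodes it before sending.
        string blockID = string.Format("block {0:D6}", blockNumber++);
        blockIDs.Add(blockID);

        using (var ms = new MemoryStream(buffer, 0, bytesRead))
        {
            container.PutBlobBlock(properties, blockID, ms, bytesRead, true, null);
        }
    }

    // Committing the block list assembles the blocks, in order, into one blob.
    container.PutBlobBlockList(properties, blockIDs, true, null);
}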

Posted in Cloud Computing, Design Patterns, Windows Azure

Oslo: Misconceptions and Fallacies

Posted by Dan Vanderboom on February 1, 2009

In the many conversations and debates I’ve had about Oslo recently–in person, on the phone, through email, on blogs, and in the Oslo forum–I’ve encountered a good amount of resistance to the goals of Oslo.  Some of this is due to misconception and general confusion, some is due to an attachment to one’s current methodology, and some I believe is simply due to a fear of anything new and unknown.  In the course of these conversations, I’ve run across a common set of thoughts or themes which I have attempted to represent faithfully here.

My first article, Why Oslo is Important, attempted to elucidate the high-level collection of concepts, technologies, and tools flying under the Oslo banner as it exists today, and how I imagine it evolving in the future.  Though it received a lot of interest and appreciation, it also sparked feedback from those who were still confused or concerned, leading me to believe that I had failed to deliver a fully satisfying explanation.

Understandably, Oslo and its target domain are too large to explain or digest in a single article, even a long one.  It’s also too early in the development cycle to be very specific.  So it’s reasonable to suggest that one can’t be totally satisfied until substantial examples and reference applications are built using the Oslo tools.  Fair enough.

This article isn’t going to provide that reference application.  It has the more modest goal of presenting some of the common misconceptions and fallacies I’ve encountered, along with my responses to them.  In future articles, as I design and develop my newest software system, I’ll be documenting and publishing the how and why of my use of Oslo tools and technologies to provide more specific evidence of their usefulness.  I’ll also be providing references to much of the good work being done to solve various problems.

As always, I encourage you to participate and provide feedback: to me, and especially to the Oslo team.  The more brain power we can bring to bear on this problem in the community, the better off the Oslo team will be, and the faster Oslo will evolve to become precisely the set of tools we need to improve our overall development experience, and developer productivity in particular.

“Oslo doesn’t solve any problems that can’t already be solved with existing tools or technologies.”

When the first steam-powered tractors were sold to farmers in the 1860s, traditional ox and horse farmers might have said the same thing.  “This tractor won’t do anything that my ox-plowing method can’t already do.”  This is true, but it’s not an effective argument against adopting the new technology, which was a much faster and more cost-effective way to farm.  The same farmer could harvest much more of his crop in the same amount of time, and as the technology matured and gas-powered engines became available (in the 1880s), the benefit only grew.  The same goes for any high-level language above assembly language, the use of a relational database over a loose collection of files, the display of text on a CRT instead of punched tape to communicate with a user, or any other great technological leap forward that “doesn’t accomplish anything new”.

Then again, it all depends what kind of problems you’re talking about.  If you get specific enough, I’m sure you’ll find plenty in Oslo that’s new, even so early in its development, such as a shared repository for interoperable models and the ability to define new parsers that provide tooling support such as keyword colorization.  Sometimes it’s these little details that act as the glue to pull components together and create substantial value through the synergy that results.  Visual Studio and Intellisense weren’t strictly needed (you can still use Notepad and csc.exe), but they can quickly answer dozens of questions a day without forcing you to jump out of context and spend a lot of time looking through disconnected documentation.

“We don’t need to know about Oslo or model-driven development because the applications I build are small and specific, or otherwise don’t need to be so general and flexible.”

This may or may not be true for you, but saying that the industry doesn’t need to advance because you don’t perceive a direct benefit to your own development isn’t valid.  The reason your applications are able to be so simple is the wealth of tools, languages, platforms, frameworks, and libraries that they leverage.  Standing on the shoulders of giants, you might say.

Many of these systems and components can benefit tremendously from a model-driven approach, and if it improves productivity for Microsoft and other third-party developers, that means they’ll be able to spend less time on plumbing and more time building the framework features you care about.  It’s also likely to make all of those APIs cleaner and more consistent, resulting in easier discoverability and fewer headaches for you, the API consumer.  As the .NET Framework and other frameworks and libraries grow ever larger, this will be critical to keeping things organized and under control.

“Oslo is going to force me to model all kinds of things that really aren’t needed in my software.”

The existence of Oslo will not force you to model any characteristics that you aren’t already modeling through other means.  What it will do is provide more options and tools for modeling your software more effectively and more productively.  It will also significantly ease the burden of creating more heavily model-driven software if you decide that’s right for your application or service.

For more information and a clearer definition of what a model is, see this article on the MSDN Oslo Development Center.

“Oslo will impose a workflow on me that doesn’t make sense for my methodology or business.”

Where Oslo fits into your specific workflow will be ultimately determined by you.  This isn’t any different from Entity Framework.  In v1 of EF, the tooling supports the reading of database structure and the generation of entity classes, but there is work being done to support a workflow going in the other direction: that of starting with classes and generating the database.  The Entity Framework itself doesn’t actually care in which direction you want to work; the issue is primarily one of tool support.  Other initiatives such as adding support for POCO indicate that the EF team is listening to feedback from the community and making the necessary changes to achieve broad support of their framework.  I would expect the same from the Oslo team.

Early releases of Oslo will have similar limitations; currently it seems that M can only be used to generate a database structure from MSchema, and that database structure can be read by Entity Framework to generate your entity classes.  Because Microsoft has such a broad audience to satisfy, other workflows will have to be accommodated, such as starting with class files and generating M files and database schemas.  In fact, I’ve submitted feedback to Microsoft’s Connect site to ensure this kind of multi-master synchronization of model representations is considered.

“Putting everything in a database is overkill for my application, so Oslo isn’t relevant to me.”

While the Repository is an important aspect of Oslo, it isn’t required.  Command line tools exist to transform textual input (specified as MGraph, or in your own custom-defined format using a Domain Specific Language) into MGraph output.  There is a separate step to convert this into SQL, or to optionally inject this data directly into the Oslo Repository.

If you don’t want to use the Repository, there are already multiple methods available for instantiating objects directly from this text data, whether it’s read from a file, embedded as a resource, or sent as data over a network.  Josh Williams (SpankyJ) has published an example showing how to convert DSL text into XAML, and instantiate the object graph using an MGraphXamlReader, and Torkel Ödegaard of Coding Instinct wrote an article demonstrating how to write a generic deserializer without using XAML.

Model formats such as CSDL, MSL, and SSDL for EF, or configuration data currently specified for WCF and WF, will all very likely be expressed in some DSL specified with M (there has been talk about these efforts already).  Since applications without database access will need these technologies, it will be impossible to force developers to read this model data from a SQL Server database.

“We already have XML, XSD, and XSLT, so there’s no benefit to having yet another language to specify the same things.”

XSD is used to define formats and languages (such as XAML), and XML is used as a poor man’s one-size-fits-all meta-format for specifying hierarchical data.  While XML is friendly enough to open in text editors, it’s designed more for tools than for human eyes.

Having different languages and formats to represent different kinds of data actually eases human comprehension and authoring ability.  As Chris Anderson said in his Oslo session at PDC, when you’re looking at XML in an editor, what stands out are not what’s important to your domain, but rather what’s important to XML: elements and attributes.
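
To make that concrete, here’s a contrived illustration of my own (not from any Oslo sample): the same record as XML, and as it might read in a small hypothetical DSL defined with M.

<!-- XML: the elements and attributes dominate what you see -->
<Contact FirstName="Dan" LastName="Vanderboom">
  <Phone Type="Mobile">555-0100</Phone>
</Contact>

// a hypothetical DSL: mostly domain, very little syntax
contact Dan Vanderboom { mobile 555-0100 }

Both say the same thing, but in the second, nothing competes with the domain for your attention.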

People are using XML to define their DSLs and formats, not because XML is the best representation, but because writing parsers for new formats and languages is just too hard.  Customers had been asking Microsoft for the ability to write these DSLs easily, so it was out of conversations and customer feedback that Microsoft decided to expose these services.

So it’s not a matter of absolutely needing M and the ability to define new languages because of some inability to get work done without them.  Rather, it’s about reducing the amount of mental work required to author our models and increasing our productivity as a result.  It’s also about having powerful transformational tools available to convert all formats and languages into a common representation so that the models can all interoperate despite their differences, in the same way that .NET languages all compile to a common CIL/MSIL language so that many different programming languages can interoperate.  Without this ability, we’d have a different community of developers for each language instead of one broad group of “.NET developers” who can all share and benefit from each other’s knowledge and libraries.  This has been recognized as such an important advantage that there are efforts underway to compile languages other than Java to JVM bytecode.

The larger the community, the larger our collective pool of knowledge, and the greater reuse we actually achieve.

“Oslo is too general and abstract to be useful to real developers building real systems.”

The idea that generalization can get out of control for a specific problem is valid, in the same way that a problem can be over-analyzed.  But that doesn’t mean that we should stigmatize all general-purpose software, or that we should ignore the growing trend for enterprise software systems to require greater flexibility, user customization support, extensibility, and so on.

The fact is that life on Earth evolves towards greater complexity, and as supporting hardware resources increase and business demands grow, so does software.  Taming that complexity will require rethinking how we approach every aspect of software design and development, including how to model it.

The software development industry is stratified into many layers, from platform development to one-off, command-line utilities.  Some organizations write software to support millions of users, while others deploy specialized applications in-house, but most of us fall somewhere in between.  Oslo seems to be most applicable to enterprise software offered to many customers, including cloud services, but there are subsets of Oslo that will have an impact on a great majority of .NET developers sooner or later.

There’s a lot of thought and work that goes on in our world (and billions of dollars spent) on “pure research” in the sciences (including computer science) that isn’t directly applicable to every John Doe, but without which we wouldn’t have things like nuclear power plants, microwaves, radio, television, satellite communication, or many pharmaceuticals.  The Nobel laureates of the world who have spent their lives studying something so abstract and remote from everyday life have contributed massively to the technological progress of our world, and quite often contribute to a better, more sanitary, healthy, and productive society.  Despite the risks and dangers each technology enables, we somehow still make steady progress in terms of reducing chaos and violence.

Without abstract and general technology like general purpose language compilers, which can specify any logic we dream up for any type of application we care to build, we’d be back in the stone age.  The Internet itself is based on communication standards that are so general, they are applicable to any application protocol or service traffic we can devise.

So before dismissing software (or any technology) due to its abstract or general nature, think about where we’d be without them.  Someone has to approach the colossal, abstract, general problems with enough foresight to deliver solutions before they’re too desperately needed; and who better than a huge organization like Microsoft with deep pockets?

Ironically, our ability to define Domain Specific Languages with Oslo gives us the converse power: the ability to define languages and formats that are extremely specific to our purposes and problem domains, and which therefore enable us to write our specifications with less of the ceremony and noise that accompany a general purpose language.  This allows us to specify our intentions more easily by saying only what we need to say to get the point across.  So the only reason Oslo must be so general is to provide that interoperability and translation layer across a set of specific formats that we define… without us having to work so hard for it.

For different reasons, it reminds me of generics, another general and abstract tool: generics are a complicated feature to implement in a language, but they provide great expressive power.  They also don’t provide anything we couldn’t manage to do before in other ways, but I certainly wouldn’t want to go back to programming without them!  In fact, you might say they’re an effective modeling tool.

Posted in Development Environment, Dynamic Programming, Language Innovation, Metaprogramming, Oslo, Problem Modeling