Critical Development

Language design, framework development, UI design, robotics and more.

Windows Azure: Blobs and Blocks

Posted by Dan Vanderboom on February 21, 2009

I’ve been busy building a new cloud-based service for the past few weeks, using Windows Azure on the back end and Silverlight for the client.  One of the requirements of my service is to allow users to upload files to a highly scalable Internet storage system.  I’m experimenting with Azure’s blob storage for this, and I need to upload these blobs (Binary Large OBjects) in separate blocks.  I can think of two reasons why you’d want to do this:

  1. Although blobs can be as large as 2 GB in the current technical preview, the largest blob you can put in one operation is 4 MB.  If your file is larger, you have to store separate blocks, and then put a block list to assemble them together and commit them as a blob.
  2. If you want different users to upload different portions of a file, each user will have to upload individual blocks, and you’ll have to put the block list when all blocks are present.  This is something like a reverse BitTorrent or other P2P protocol.
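For the first case, the client has to work out where each block starts and ends before it can put any blocks. Here’s a minimal sketch of that planning step, assuming the 4 MB per-operation limit described above. BlockPlanner, PlanBlocks and the blk-000000 ID format are hypothetical names, not part of the StorageClient sample. The IDs are zero-padded so they all have the same length, because the Blob service REST documentation asks that every block ID within one blob be the same length.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the StorageClient sample): plans how a
// file of a given length is cut into blocks for separate block uploads.
public static class BlockPlanner
{
    // 4 MB, the single-operation limit in the current technical preview.
    public const int MaxBlockSize = 4 * 1024 * 1024;

    public struct BlockRange
    {
        public string Id;    // fixed-width block ID, e.g. "blk-000000"
        public long Offset;  // where the block starts in the source file
        public int Count;    // number of bytes in the block
    }

    public static List<BlockRange> PlanBlocks(long length, int blockSize)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (blockSize <= 0 || blockSize > MaxBlockSize)
            throw new ArgumentOutOfRangeException("blockSize");

        var blocks = new List<BlockRange>();
        long offset = 0;
        int index = 0;
        while (offset < length)
        {
            // The last block holds whatever is left over.
            int count = (int)Math.Min(blockSize, length - offset);
            blocks.Add(new BlockRange
            {
                // Zero-padding keeps every ID the same length.
                Id = "blk-" + index.ToString("D6"),
                Offset = offset,
                Count = count
            });
            offset += count;
            index++;
        }
        return blocks;
    }
}
```

Each planned range would then be read from the file and uploaded as one block, and the IDs, kept in the same order, would form the block list that commits the blob.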

My service needs to deal with separate blocks for the second reason, though the first is likely to be much more common.

Although there’s a good deal of information about blocks and blobs in the REST API documentation for Azure Storage Services, piecing together code to make REST calls with all the appropriate headers (including authentication signatures) isn’t very fun.  Where is the .NET library to make it easy?

There is one, in fact.  If you’ve downloaded and installed the Azure SDK (Jan 2009), you’ll find a samples.zip file; unzip it and build the solutions inside.  In particular, you’ll need the StorageClient solution.  It lets you save and load blobs (as well as use queues and table storage), but nothing in the API suggests it supports putting individual blocks, let alone putting a block list to combine those blocks into a blob.  The API is still rough, but the Azure platform is in an early technical preview, so we can expect vast improvements in the future.

Until then, however, I dug into it and discovered that there actually was code to put blocks and commit block lists (in BlobContainerRest.PutLargeBlobImpl), but it wasn’t exposed in the API.  Instead, it was called only when the blob you tried to put exceeded the 4 MB limit.  Taking this code and hacking it a bit, I extended the StorageClient library to provide the functionality I needed.

First, add these abstract method definitions to the BlobContainer class (in BlobStorage.cs):

public abstract bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag);

public abstract bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag);

Next, you’ll need to add the implementations to the BlobContainerRest class (in RestBlobStorage.cs):

public override bool PutBlobBlock(BlobProperties blobProperties, string BlockID, 
    Stream stream, long BlockSize, bool overwrite, string eTag)
{
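    // comp=block marks this PUT as a single uncommitted block, not a whole blob
    NameValueCollection nvc = new NameValueCollection();
    nvc.Add(QueryParams.QueryParamComp, CompConstants.Block);
    // The block ID is sent as Base64 of the ID string's UTF-16 bytes
    nvc.Add(QueryParams.QueryParamBlockId, 
        Convert.ToBase64String(Encoding.Unicode.GetBytes(BlockID)));
    return UploadData(blobProperties, stream, BlockSize, overwrite, eTag, nvc);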
}
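PutBlobBlock doesn’t send the block ID as you typed it: it takes the ID string’s UTF-16 bytes (Encoding.Unicode) and Base64-encodes them. The standalone sketch below reproduces that encoding so you can see the actual value. BlockIdEncoding is a hypothetical helper, not part of the StorageClient sample. Base64 output can contain +, / and =, which need percent-escaping in a query string; it isn’t clear from this code whether UploadData escapes query parameters itself, so ToQueryValue is only an assumption about what a hand-built request would need.

```csharp
using System;
using System.Text;

// Hypothetical standalone helper mirroring the ID encoding in PutBlobBlock.
public static class BlockIdEncoding
{
    // Same transformation as PutBlobBlock: UTF-16 bytes, then Base64.
    public static string Encode(string blockId)
    {
        return Convert.ToBase64String(Encoding.Unicode.GetBytes(blockId));
    }

    // Reverses Encode, turning a Base64 block ID back into the original string.
    public static string Decode(string encoded)
    {
        return Encoding.Unicode.GetString(Convert.FromBase64String(encoded));
    }

    // Percent-escapes the Base64 text (+, /, =) for a blockid query parameter.
    public static string ToQueryValue(string blockId)
    {
        return Uri.EscapeDataString(Encode(blockId));
    }
}
```

For the ID "block 0", Encode returns YgBsAG8AYwBrACAAMAA= and ToQueryValue returns YgBsAG8AYwBrACAAMAA%3D.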

public override bool PutBlobBlockList(BlobProperties blobProperties, 
    IEnumerable<string> BlockIDs, bool overwrite, string eTag)
{
    bool retval = false;

    using (MemoryStream buffer = new MemoryStream())
    {
        XmlTextWriter writer = new XmlTextWriter(buffer, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement(XmlElementNames.BlockList);
        foreach (string id in BlockIDs)
        {
            writer.WriteElementString(XmlElementNames.Block, 
                Convert.ToBase64String(Encoding.Unicode.GetBytes(id)));
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        buffer.Position = 0; //Rewind

        NameValueCollection nvc = new NameValueCollection();
        nvc.Add(QueryParams.QueryParamComp, CompConstants.BlockList);

        retval = UploadData(blobProperties, buffer, buffer.Length, overwrite, eTag, nvc);
    }

    return retval;
}
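The body PutBlobBlockList uploads is a short XML document listing the encoded IDs in commit order. The standalone sketch below builds the same shape so you can inspect it as a string. It assumes XmlElementNames.BlockList and XmlElementNames.Block hold "BlockList" and "Block", and it uses a UTF-8 encoding without a byte-order mark so the output is easy to compare. BlockListXml is a hypothetical name, not part of the StorageClient sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical standalone builder for the block list XML body.
public static class BlockListXml
{
    // Builds one <Block> element per ID, in the order the blocks should be committed.
    public static string Build(IEnumerable<string> blockIds)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // no byte-order mark
            Indent = false
        };

        using (var buffer = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(buffer, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("BlockList");
                foreach (string id in blockIds)
                {
                    // Same ID encoding as PutBlobBlock: UTF-16 bytes, then Base64.
                    writer.WriteElementString("Block",
                        Convert.ToBase64String(Encoding.Unicode.GetBytes(id)));
                }
                writer.WriteEndElement();
                writer.WriteEndDocument();
            } // disposing the writer flushes it; the buffer itself stays open
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}
```

For the two IDs used in the test below, the root element comes out as <BlockList><Block>YgBsAG8AYwBrACAAMAA=</Block><Block>YgBsAG8AYwBrACAAMQA=</Block></BlockList>, preceded by the XML declaration.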

To test this, I added two buttons to an ASP.NET page: one to upload the blocks and put the block list, and a second to read the blob back and verify that the write operations worked:

protected void btnUploadBlobBlocks_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1", 
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    if (!container.DoesContainerExist())
        container.CreateContainer();

    var properties = new BlobProperties("TestBlob");

    // put block 0

    var ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.Write("This is block 0.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock0Success = container.PutBlobBlock(properties, "block 0", ms, ms.Length, true, null);
    }

    // put block 1

    ms = new MemoryStream();
    using (StreamWriter sw = new StreamWriter(ms))
    {
        sw.WriteLine("... and this is block 1.");
        sw.Flush();
        ms.Position = 0;

        var PutBlock1Success = container.PutBlobBlock(properties, "block 1", ms, ms.Length, true, null);
    }

    // put block list

    List<string> BlockIDs = new List<string>();
    BlockIDs.Add("block 0");
    BlockIDs.Add("block 1");

    var PutBlockListSuccess = container.PutBlobBlockList(properties, BlockIDs, true, null);
}

protected void btnTestReadBlob_Click(object sender, EventArgs e)
{
    var account = new StorageAccountInfo(new Uri("http://127.0.0.1:10000/"), null, "devstoreaccount1",
        "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==");
    var storage = BlobStorage.Create(account);
    var container = storage.GetBlobContainer("testfiles");

    MemoryStream ms = new MemoryStream();
    BlobContents contents = new BlobContents(ms);
    container.GetBlob("TestBlob", contents, false);
    ms.Position = 0;

    using (var sr = new StreamReader(ms))
    {
        string x = sr.ReadToEnd();
        sr.Close();
    }
}

It’s nothing fancy, but if you put a breakpoint on the sr.Close() call, you’ll see that x contains both blocks of data in block-list order: “This is block 0.... and this is block 1.” followed by the line break written by WriteLine.
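To make the commit semantics concrete without a storage account, here’s a toy in-memory model: uploaded blocks wait in an uncommitted set keyed by ID, and only putting a block list produces blob content, concatenated in list order. FakeBlockBlob is purely hypothetical and ignores ETags, the overwrite flag, size limits, reuse of already-committed blocks, and expiry of uncommitted ones. It illustrates the idea, not how the service is implemented. It also shows that the order of the list, not the upload order, determines the result, which is what makes the multi-user scenario possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Toy in-memory stand-in for a block blob (hypothetical; not an Azure API).
public class FakeBlockBlob
{
    private readonly Dictionary<string, byte[]> uncommitted = new Dictionary<string, byte[]>();
    private byte[] committed = new byte[0];

    // Like putting a block: stage data under an ID; re-putting an ID replaces it.
    public void PutBlock(string blockId, byte[] data)
    {
        uncommitted[blockId] = data;
    }

    // Like putting a block list: assemble the blob from staged blocks in list order.
    public void PutBlockList(IEnumerable<string> blockIds)
    {
        var ids = blockIds.ToList();
        foreach (var id in ids)
            if (!uncommitted.ContainsKey(id))
                throw new InvalidOperationException("Unknown block: " + id);

        committed = ids.SelectMany(id => uncommitted[id]).ToArray();
        uncommitted.Clear(); // in this toy, staged blocks are discarded after a commit
    }

    // Returns the committed blob content.
    public byte[] Read()
    {
        return committed;
    }
}
```

Uploading block 1 before block 0 and then committing the list { "block 0", "block 1" } still yields “This is block 0.... and this is block 1.”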


5 Responses to “Windows Azure: Blobs and Blocks”

  1. [...] Windows Azure: Blobs and Blocks – Dan Vanderboom [...]

  2. I have to agree and found this very interesting :-)

  3. I have been looking around dvanderboom.wordpress.com and really am impressed by the good content material here. I work the nightshift at my job and it really gets boring. I’ve been coming right here for the past couple nights and reading. I just needed to let you know that I’ve been enjoying what I have seen and I look ahead to reading more.

  4. Lester said

    Hello. Anybody has tried to use the blobstream object. seems a bit buggy
