Critical Development

Language design, framework development, UI design, robotics and more.

Archive for the ‘Linguistics’ Category

Language Design: Complexity, Extensibility, and Intention

Posted by Dan Vanderboom on June 14, 2010


The object-oriented approach to software is great, and that greatness draws from the power of extensibility.  That we can create our own types, our own abstractions, has opened up worlds of possibilities.  System design is largely focused on this element of development: observing and repeating object-oriented patterns, analyzing their qualities, and adding to our mental toolbox the ones that serve us best.  We also focus on collecting libraries and controls because they encapsulate the patterns we need.

This article explores computer languages as a human-machine interface, the purpose and efficacy of languages, complexity of syntactic structure, and the connection between human and computer languages.  The Archetype project is an ongoing effort to incorporate these ideas into language design.  In the same way that some furniture is designed ergonomically, Archetype is an attempt to design a powerful programming language with an ergonomic focus; in other words, with the human element always in mind.

Programming Language as Human-Machine Interface

A programming language is the interface between the human mind and executable code.  The point isn’t to turn human programmers into pure mathematical or machine thinkers, but to leverage the talent that people are born with to manipulate abstract symbols in language.  There is an elite class of computer language experts who have trained themselves to think in terms of purely functional approaches, low-level assembly instructions, or regular, monotonous expression structures (and this is necessary for researchers pushing themselves to understand ever more), but for the everyday developer, a more practical approach is required.

Archetype is a series of experiments to build the perfect bridge between the human mind and synthetic computation.  As such, it is based as much as possible on a small core of extensible syntax and maintains a uniformity of expression within each facet of syntax that the human mind can easily keep separate.  At the same time, it honors syntactic variety and is being designed to shift us closer to a balance where all of the elements, blocks, clauses and operation types in a language can be extended or modified equally.  These represent the two most important design tenets of Archetype: the intuitive, natural connection to the human mind, and the maximization of its expressive power.

These forces often seem at odds with each other—at first glance seemingly impossible to resolve—and yet experience has shown that the languages we use are limited in ways we’re often surprised by, indicating that processes such as analogical extension are at work in our minds but not fully leveraged by those languages.

Syntactic Complexity & Extensibility

Most of a programming language’s syntax is highly static, and just a few areas (such as types, members, and sometimes operators) can be extended.  Lisp is the most famous example of a highly extensible language with support for macros which allow the developer to manipulate code as if it were data, and to extend the language to encode data in the form of state machines.  The highly regular, parenthesized syntax is very simple to parse and therefore to extend… so long as you don’t deviate from the parenthesized form.  Therefore Lisp gets away with powerful extensibility at the cost of artificially limiting its structural syntax.

In Lisp we write (+ 4 5) to add two numbers, or (foo 1 2) to call a function with two parameters.  Very uniform.  In C we write 4 + 5 because the infix operator is what we grew up seeing in school, and we vary the syntax for calling the function foo(1, 2) to provide visual cues to the viewer’s brain that the function is qualitatively something different from a basic math operation, and that its name is somehow different from its parameters.

Think about syntax features as visual manifestations of the abstract logical concepts that provide the foundation for all algorithmic expression.  A rich set of fundamental operations can be obscured by a monotony of syntax or confused by a poorly chosen syntactic style.  Archetype involves a lot of research in finding the best features across many existing languages, and exploring the limits, benefits, problems, and other details of each feature and syntactic representation of it.

Syntactic complexity provides greater flexibility, and wider channels with which to convey intent.  This is why people color code file folders and add graphic icons to public signage.  More cues enable faster recognition.  It’s possible to push complexity too far, of course, but we often underestimate what our minds are capable of when augmented by a system of external cues which is carefully designed and supported by good tools.

Imagine if your natural spoken language followed such simple and regular rules as Lisp: although everyone would learn to read and write easily, conversation would be monotonous.  Extend this to semantics, for example with a constructed spoken language like Lojban which is logically pure and provably unambiguous, and it becomes obvious that our human minds aren’t well suited to communicating this way.

Now consider a language like C, with its 15 levels of operator precedence, which were designed to match programmers’ expectations (although the authors admitted to getting some of this “wrong”, which further proves the point).  This language has given rise to very popular derivatives (C++, C#, Java), all of which are easily learned despite their syntactic complexity.

Natural languages and old world cities have grown with civilization organically, creating winding roads and wonderful linguistic variation.  These complicated structures have been etched into our collective unconscious, stirring within us and giving rise to awareness, thought, and creativity.  Although computers are excellent at processing regular, predictable patterns, it’s the complex interplay of external forces and inner voices that we’re most comfortable with.

Risk, Challenge & Opportunity

There are always trade-offs.  By focusing almost all extensibility in one or two small parts of a language, semantic analysis and code improvement optimizations are easier to develop and faster to execute.  Making other syntactical constructs extensible, if one isn’t careful, can create complexity that quickly spirals out of control, resulting in unverifiable, unpredictable and unsafe logic.

The way this is being managed in Archetype so far isn’t to allow any piece of the syntax tree to be modified, but rather to design regions of syntax with extensibility points built-in.  Outputting C# code as an intermediary (for now) lays a lot of burden on the C# compiler to ensure safety.  It’s also possible to mitigate more computationally expensive semantic analysis and code generation by taking advantage of both multicore and cloud-based processing.  What helps keep things in check is that potential extensibility points are being considered in the context of specific code scenarios and desired outcomes, based on over 25 years of real-world experience, not a disconnected sense of language purity or design ideals.

Creating a language that caters to the irregular texture of thought, while supporting a system of extensions that are both useful and safe, is not a trivial undertaking, but at the same time holds the greatest potential.  The more that computers can accommodate people instead of forcing people to make the effort to cater to machines, the better.  At least to the extent that it enables us to specify our designs unambiguously, which is somewhat unnatural for the human mind and will always require some training.


So much of the code we write is driven by a set of rituals that, while they achieve their purpose, often beg to be abstracted further away.  Even when good object models exist, they often require intricate or tedious participation to apply (see INotifyPropertyChanged).  Having the ability to incorporate the most common and solid of those patterns into language syntax (or extensions which appear to modify the language) is the ultimate mechanism for abstraction, and goes furthest in minimizing development effort.  By obviating the need to write convoluted yet routine boilerplate code, Archetype aims to filter out the noise and bring one’s intent more clearly into focus.
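As a rough illustration of the ritual that INotifyPropertyChanged demands, here is a minimal sketch.  The Customer class and its SetField helper are hypothetical names invented for this example, and the code uses C# features ([CallerMemberName] and the ?. operator) that arrived after this post was written.  Even with the helper, every property repeats the same backing field, getter, and setter, which is the kind of noise a language-level construct could absorb.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical view-model class, used only to illustrate the boilerplate.
public class Customer : INotifyPropertyChanged
{
    private string _name = "";
    private int _age;

    public event PropertyChangedEventHandler PropertyChanged;

    // Each property repeats the same pattern: backing field, getter,
    // and a setter that routes through the change-notification helper.
    public string Name
    {
        get { return _name; }
        set { SetField(ref _name, value); }
    }

    public int Age
    {
        get { return _age; }
        set { SetField(ref _age, value); }
    }

    // Assigns the field and raises PropertyChanged only when the value
    // actually changes.  The compiler supplies the calling property's name.
    private void SetField<T>(ref T field, T value,
                             [CallerMemberName] string propertyName = null)
    {
        if (EqualityComparer<T>.Default.Equals(field, value)) return;
        field = value;
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

What the developer means is simply "these two properties are observable"; everything else in the class is ceremony.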

Posted in Archetype Language, Composability, Design Patterns, Language Extensions, Language Innovation, Linguistics, Metaprogramming, Object Oriented Design, Software Architecture

Project: Code-Named “SQL Mobile Bridge”

Posted by Dan Vanderboom on December 26, 2007


This is not the final name.  But it will be a useful product.  Given how much I’ve been working with rapidly evolving mobile database schemas lately, I expect to save from 30 minutes to an hour a day in my frequent build-deploy-test cycles.  The lack of a good tool for mobile device database queries causes me a lot of grief.  I know Visual Studio 2008 has something built in to connect to mobile devices over ActiveSync, but let’s face it: ActiveSync has been a real pain in the arse, and more often fails than works (my next blog post will detail some of those errors).  I can only connect to one device at a time, and I lose that connection frequently (meanwhile, SOTI Pocket Controller continues to work and communicate effectively).  Plus I have a window constantly bugging me to create an ActiveSync association.

I work on enterprise systems that sometimes use hundreds of Windows Mobile devices on a network.  So I don’t want to create an association on each one of those, and getting ActiveSync to work over wireless requires an association, as far as I know.

Pocket Controller or other screen-sharing tools can be used to view the mobile device, and run Query Analyzer in QVGA from the desktop, but my queries get big and ugly, and even the normal-looking ones don’t fit very well on such a small screen.  Plus Query Analyzer on PDAs is very sparse, with few of the features that most of us have grown accustomed to in our tools.  Is Pocket Query Analyzer where you want to be doing some hardcore query building or troubleshooting?

So what would a convenient, time-saving, full-featured mobile database query tool look like?  How could it save us time?  First, all of the basics would have to be there: loading and saving query files, syntax color coding, executing queries, and displaying the response in a familiar “Query Analyzer”/“Management Studio” UI design.  I want to highlight a few lines of SQL and press F5 to run it, and I expect others have that instinct as well.  I also want to be able to view connected devices, and to use several tabs for queries, and to know exactly which device and database the active query window corresponds to.  No hunting and searching for this information.  It should also make it easy to write new providers for different database engines (or different versions of them).

Second, integration.  It should integrate into my development environments, Visual Studio 2005 & 2008.  It should also give an integrated list of databases, working with normal SQL Server databases as well as mobile servers.  If we have a nice extensible tool for querying our data, why limit it to Windows Mobile databases?

Sometimes it’s the little details, the micro-behaviors and features, the nuances of the API and data model, that define the style and usefulness of a product.  I’ve been paying a lot of attention to these little gestures, features, and semantics, and I’m aiming for a very smooth experience.  I’m curious to know what happens when we remove all the unnecessary friction in our development workflows (when our brains are free to define solutions as fast as we can envision them).

Third, an appreciation for and focus on performance.  Instead of waiting for the entire result to return before marshalling the data back to the client, why not stream it across as it’s read, several rows at a time?  Users could get nearly instantaneous feedback to their queries, even if the query takes a while to come fully across the wire.  Binary serialization should be used for best performance, and is on the roadmap, but that’s coming after v1.0, once I decide whether to build or buy that piece.
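The "several rows at a time" idea can be sketched with a C# iterator.  This is only an illustration under assumed names (RowStreaming and InBatches are hypothetical, not part of the tool): it groups a lazily read sequence into small batches, so each batch could be sent to the client as soon as it fills, without waiting for the full result set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class RowStreaming
{
    // Lazily groups rows into batches of up to batchSize items.  Because
    // this is an iterator, no row is read until the caller asks for the
    // next batch, so the first batch is available almost immediately.
    public static IEnumerable<IReadOnlyList<T>> InBatches<T>(
        IEnumerable<T> rows, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var row in rows)
        {
            batch.Add(row);
            if (batch.Count == batchSize)
            {
                yield return batch;              // hand off a full batch
                batch = new List<T>(batchSize);  // start a fresh one
            }
        }

        if (batch.Count > 0) yield return batch; // final partial batch
    }
}
```

A caller would loop over InBatches(reader, 50) and serialize each batch to the wire as it arrives; the batch size is a tuning knob that trades latency against per-message overhead.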

Finally, a highly-extensible architecture that creates the opportunity for additional functionality (and therefore product longevity).  The most exciting part of this project is probably not the query tool itself, but the Device Explorer window, the auto-discovering composite assets it visualizes, and the ability to remotely fetch asset objects and execute commands on them.

The Device identifies (broadcasts) itself and can be interrogated for its assets, which are hierarchically composed to represent what is visualized as an asset tree.  One Device might have some Database assets and some Folder and File assets.  The Database assets will contain a collection of Table assets, which will contain DatabaseRow and DatabaseColumn assets, etc.  In this way, the whole inventory of objects on the device can be interrogated, discovered, and manipulated in a standard way that makes inherent sense to the human brain.  RegistryEntry, VideoCamera, whatever you want a handle to.

This involves writing “wrapper” classes (facades or proxies) for each kind of asset, along with the code to manipulate it locally.  Because the asset classes are proxies or pointers to the actual thing, and because they inherit from a base class that handles serialization, persistence, data binding, etc., they automatically support being remoted across the network, from any node to any other node.  Asset objects are retrieved in a lazy-load fashion: when a client interrogates the device, it actually interrogates the Device object.  From there it can request child assets, which may fetch them from the remote device at that time, or use its locally-cached copies.  If a client already knows about a remote asset, it can connect to and manipulate it directly (as long as the remote device is online).
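The lazy-load behavior might look something like the following sketch.  Everything here is an assumption made for illustration: the Asset constructor, the fetchChildren delegate (standing in for a remote call), and the Refresh method are hypothetical, and serialization, persistence, and data binding are left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: each asset is a proxy that fetches its
// children on first access and then serves them from a local cache.
public class Asset
{
    private readonly Func<Asset, IReadOnlyList<Asset>> _fetchChildren;
    private IReadOnlyList<Asset> _children; // null until first requested

    public string Name { get; }

    // fetchChildren stands in for the (possibly remote) lookup; leaf
    // assets can omit it and simply report no children.
    public Asset(string name, Func<Asset, IReadOnlyList<Asset>> fetchChildren = null)
    {
        Name = name;
        _fetchChildren = fetchChildren ?? (_ => new List<Asset>());
    }

    // Lazy load: the fetch runs on first access, and later reads reuse
    // the cached list until Refresh is called.
    public IReadOnlyList<Asset> Children
    {
        get { return _children ?? (_children = _fetchChildren(this)); }
    }

    // Discards the cached children so the next access fetches again.
    public void Refresh()
    {
        _children = null;
    }
}
```

A Device, Database, or Table would then be a subclass (or an instance) whose fetch delegate knows how to ask the remote node for the next level of the tree.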

With a remoting framework that makes shuffling objects around natural, much less message parsing and interpretation code needs to be written.  Normal validation and replication collision logic can be written in the same classes that define the persistent schema.

So what about services?  Where are the protocols defined?  Assets and Services have an orthogonal relationship, so I think that Services should still exist as Service classes, but each service could provide a set of extension methods to extend the Asset classes.  That way, if you add a reference to ServiceX, you will have the ability to access a member Asset.ServiceXMember (like Device.Databases, which would call a method in MobileQueryService).  If this works out the way I expect, this will be my first real use of extension methods.  (I have ideas to extend string and other simple classes for parsing, etc., of course, but not as an extension to something else I own the code for.)  In the linguistic way that I’m using to visualize this: Services = Verbs, Assets = Nouns.  Extension methods are the sticky tape between Nouns and Verbs.

public static AssetCollection<Database> Databases(this Device device) { }
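To make that signature concrete, here is one way the pieces could fit together.  This is a sketch under assumptions: the class bodies, the GetDatabases method, and the MobileQueryExtensions class name are all invented for illustration, and the service returns fixed data where a real one would query the device.

```csharp
using System.Collections.Generic;

// Assets (nouns): hypothetical stand-ins for the classes named in the post.
public class Database { public string Name { get; set; } }
public class AssetCollection<T> : List<T> { }
public class Device { public string Name { get; set; } }

// Service (verb): a placeholder for MobileQueryService.  A real
// implementation would query the remote device instead.
public static class MobileQueryService
{
    public static AssetCollection<Database> GetDatabases(Device device)
    {
        return new AssetCollection<Database>
        {
            new Database { Name = device.Name + ".sdf" }
        };
    }
}

// Extension methods must be declared in a static class.  Referencing
// this class adds Databases() to Device without modifying Device itself.
public static class MobileQueryExtensions
{
    public static AssetCollection<Database> Databases(this Device device)
    {
        return MobileQueryService.GetDatabases(device);
    }
}
```

With this in scope, a caller writes device.Databases() as though it were a member of Device.  Note that an extension method is invoked with parentheses rather than accessed like a property.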

With an ability to effortlessly and remotely drill into the assets in a mobile device (or any computer, for that matter), and the ability to manipulate them through a simple object model, I expect this to be a significantly productive platform on which to build.  Commands executed against those assets could be scripted for automatic software updates, they could be queued for guaranteed delivery, or they could be supplemented with new commands in plug-in modules that aid in debugging, diagnostics, runtime statistics gathering, monitoring, synchronizing the device time with a server, capturing video or images, delivering software updates, etc.

And if the collection of assets can grow, so can UI components such as context menu items, document windows, and so on, extending and adding to the usefulness of the Device Explorer window.  By defining UI components as UserControls and defining my own Command invocation mechanism, they can be hosted in Visual Studio or used outside of that with just a few adjustments.

More details to come.

Posted in Compact Framework, Linguistics, My Software, Object Oriented Design, Problem Modeling, Software Architecture, SQL Server Compact, User Interface Design

Linguistics, Language Creation, and Old Papers

Posted by Dan Vanderboom on December 10, 2007

I’ve always been interested in linguistics and languages, and have spent several years studying them as a hobby (a very time-consuming hobby).  When I was younger, I worked very hard to invent my own language.  First it was word lists, then it was a collection of grammar ideas, and I even designed my own glyphs for a writing system that was phonetic (and occasionally syllabic).

While I was still in high school (going back to 1991), I wrote a short paper on the subject of creating a language.  Way back then, the Internet was mainly a curiosity accessible to college students.  The rest of the world, myself included, went online by dialing up Bulletin Board Systems (BBSs) with 2400 bps modems (or worse).  I published this paper on some of the BBSs at the time, and haven’t thought about it since.

To my amusement, a friend of mine Googled me the other day and found it.  It’s survived to this day, and can be found on websites all over, especially in non-English-speaking countries.  Here’s an English site that hosts it:

Apparently, it’s being used for purposes other than creating new languages.  On Russian websites, people are referencing it to learn simple English grammar.

I even found a mention of my paper on a site where someone had created their own language.

How neat is that!

That paper was very simple.  I was young and hadn’t yet learned about the crazy complexities of grammar.  But simple is good, and I’m glad it’s found a use.

Posted in Linguistics