History of IDLs

Before we dive into the WSDL discussion, a little background might be helpful. Every distributed computing approach has a mechanism for describing components. Let's examine a brief history of IDLs.

Interface definition languages (IDLs) have a long history in distributed computing. A major early use of IDL came as part of the Open Software Foundation's Distributed Computing Environment (DCE), in its 1994 specification on RPC. DCE IDL was a breakthrough concept that quickly spread to other distributed computing initiatives, such as the Object Management Group's (OMG) CORBA IDL and Microsoft's COM IDL and COM ODL (Object Definition Language). As with most such technologies, the various flavors of IDL differ slightly and are therefore more or less incompatible. All hopes are now on WSDL to bring unity back to this crucial area of distributed computing, at least for Web services.

Most people used to developing simple software frown when they first hear about IDL. They say, "Why bother defining the interfaces of any software operations? Just get a pointer/reference to an object or a function and make the call." The reason is that, if a software system has even the slightest amount of heterogeneity, this simple approach won't work. Let's consider some possibilities:

It can be difficult to obtain a reference to the target that implements the operation you want to call. For example, the target could be in another executable on the same machine or on another machine.
The function/method calling convention varies significantly between programming languages or even based on compilation parameters, such as the level of optimization. If even the slightest difference exists between the invoker's and the target's environments, it is likely that the call will fail.
Data encoding rules vary considerably between programming languages (strings in Pascal are length-prefixed while they are null-terminated in C/C++) and platforms (numbers can be represented in little- vs. big-endian format).
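The encoding differences in the last item are easy to see at the byte level. The following is a minimal Python sketch, not tied to any particular middleware; the value 1025 and the string "IDL" are arbitrary choices made for illustration.

```python
import struct

value = 1025  # 0x00000401, an arbitrary example value

# The same 32-bit unsigned integer in the two byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 01040000
print(big.hex())     # 00000401

# Decoding with the wrong byte order does not fail; it silently
# produces a different number.
misread = struct.unpack(">I", little)[0]
print(misread)  # 17039360

# The same text in the two string conventions mentioned above.
text = "IDL".encode("ascii")
pascal_style = bytes([len(text)]) + text  # one length byte, then the characters
c_style = text + b"\x00"                  # the characters, then a zero terminator

print(pascal_style)  # b'\x03IDL'
print(c_style)       # b'IDL\x00'
```

Neither side is wrong here. The two ends simply need an agreed wire format, which is what the bridging strategy described next provides.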

The best way to approach these problems, given that software implementations and deployments are, and will forever be, highly heterogeneous, is to agree on a bridging strategy. This strategy establishes common ground in the middle (the bridge) without worrying about how the roads at the endpoints are constructed. In distributed computing, a bridging strategy involves two parts:

Agreeing on how to make an invocation: the mechanics of naming, activation, data encoding, error handling, and so on. This is what distributed computing standards such as DCE, CORBA, and COM do.
Specifying what to call: the operation names, their signatures, return types, and any exceptions that they might generate. This is the job of IDL.

In a typical distributed computing architecture, a tool called an IDL compiler combines the information in an IDL file together with the conventions on how to make invocations to code-generate the pieces that make the bridge work. The client that wants to invoke operations will use a client proxy (sometimes called a client stub). The proxy has the same interface as the operation provider. It can be used as a local object on the client. The proxy implementation knows how to encode and marshal the invocation data to the operation provider and how to capture the operation result and return it to the client. The operation provider will wrap its implementation inside a skeleton (sometimes called a server stub) implementation that is code-generated by the IDL compiler. The skeleton knows how to capture the data sent by the proxy and pass it to the actual implementation. It also knows how to package the result of operations and send it back to the client. Proxies and skeletons are helped by a lot of sophisticated distributed computing middleware. A key part of the story is that proxies and skeletons need not be generated by the same IDL compilers, as long as these compilers are following the same distributed computing conventions. This is the power of IDL—it describes everything that is necessary to make invocation possible in a distributed environment.
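The proxy and skeleton roles can be sketched in a few lines of Python. This is a hand-written illustration of what an IDL compiler would normally generate, under several assumptions: the class names AccountImpl, Skeleton, and AccountProxy are hypothetical, JSON stands in for a real wire encoding, and a direct function call stands in for the network.

```python
import json


class AccountImpl:
    """The real implementation, living on the operation provider's side."""

    def __init__(self, balance):
        self.balance = balance

    def credit(self, amount):
        self.balance += amount
        return self.balance


class Skeleton:
    """Provider side: decodes a request, calls the implementation, encodes the reply."""

    def __init__(self, target):
        self.target = target

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self.target, request["op"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class AccountProxy:
    """Client side: offers the same operation as AccountImpl but forwards the call."""

    def __init__(self, transport):
        # Any callable that takes request bytes and returns reply bytes.
        self.transport = transport

    def credit(self, amount):
        request = json.dumps({"op": "credit", "args": [amount]}).encode("utf-8")
        reply = self.transport(request)
        return json.loads(reply.decode("utf-8"))["result"]


# Wire the two halves together in-process; real middleware would sit in between.
skeleton = Skeleton(AccountImpl(balance=100.0))
account = AccountProxy(transport=skeleton.handle)

new_balance = account.credit(25.0)
print(new_balance)  # 125.0
```

The client code only ever sees AccountProxy, which it uses like a local object. Because the two halves share nothing but the request and reply format, they could equally be produced by different tools, as long as both follow the same conventions.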

DCE IDL specified flat function interfaces. There was no notion of object instance context when making calls. CORBA IDL changed that by adding many important extensions to IDL. CORBA IDL is the de facto IDL standard in non-Microsoft environments. It is also standardized internationally as ISO/IEC 14750.

CORBA IDL is purely declarative; it provides no implementation details. It defines a remote object API concisely (the spec is fewer than 40 pages long) and covers key issues such as naming, complex type definition, in/out/in-out parameters, and exceptions. The syntax is reminiscent of C++, with some additional keywords to cover additional concepts. The following is a brief example of a CORBA IDL specification for an account and an interest account:

module Accounts
{
   interface Account
   {
      readonly attribute string number;
      readonly attribute float balance;

      // Raised when a debit would overdraw the account.
      exception InsufficientFunds { string detail; };

      float debit(in float amount) raises (InsufficientFunds);
      float credit(in float amount);
   };

   interface InterestAccount : Account
   {
      readonly attribute float rate;
   };
};

The information in the IDL file is self-describing and very readable. You can also see that IDL supports the notion of inheritance, which makes it convenient to describe object-oriented distributed systems. In fact, CORBA IDL even supports multiple interface inheritance, as in a MyPetTurtle interface deriving from both Pet and Animal.

In a CORBA-enabled environment, the IDL above can be used in several ways: to invoke the CORBA object via dynamic invocation from a scripting language; to generate proxies for client access to the object; to generate skeletons that plug the actual account implementation into the CORBA middleware, regardless of its implementation language; and to store information about the implementation in an interface repository (a central store of metadata about CORBA components' interfaces).

For various historical reasons, Microsoft used to have two versions of IDL. COM IDL was closely based on DCE IDL, although it lacked some of the advanced features supported by COM. COM ODL had support for these features but was incompatible with COM IDL. Clearly, that was an odd state of affairs, and Microsoft fixed things a few years ago when it merged the two languages. The following example shows a somewhat simplified version of the Account and InterestAccount interfaces in Microsoft's IDL:

[
   object, uuid(E9FF28F4-B79F-469A-B2D9-477FF19873A0), dual
]
interface IAccount : IDispatch
{
   [propget, id(1)] HRESULT balance([out, retval] float *pVal);
   [id(2)] HRESULT debit([in] float amount, [out, retval] float *pVal);
   [id(3)] HRESULT credit([in] float amount, [out, retval] float *pVal);
} ;

[
   object, uuid(5B93E296-4FF5-4A6C-A64E-51A7B6C20B6C), dual
]
interface IInterestAccount : IAccount
{
   [propget, id(4)] HRESULT rate([out, retval] float *pVal);
} ;

[
   uuid(22AE9E3F-DC3C-478E-BC00-13A735D57167), version(1.0)
]
library AccountLib
{
   [
      uuid(0D1630F9-C4E0-46A6-BCDF-A9B752DBDD94)
   ]
   coclass Account
   {
      [default] interface IAccount;
   } ;

   [
      uuid(F949F2A1-18C1-4010-A062-6B8DF49D4BCE)
   ]
   coclass InterestAccount
   {
      [default] interface IInterestAccount;
   } ;
} ;

Needless to say, Microsoft's IDL is not syntax-compatible, or even concept-compatible, with CORBA IDL. For example, metadata about elements is provided via attributes. Attributes prefix elements using the format [ name(value), ... ]. The convention is to prefix interface names with a capital I. To support dynamic invocation from scripting languages, interfaces must inherit from IDispatch, and methods need to identify their location in dynamic dispatch tables with the id attribute. COM does not support exceptions. Error information is communicated via the return value of a method, which must be an HRESULT (a 32-bit integer with well-defined error codes). Therefore, the real return value of a method is identified as [out, retval] in the IDL. Finally, COM supports a true separation of interfaces (IAccount and IInterestAccount) from implementations (Account and InterestAccount). The latter are identified by the coclass keyword in the account library. COM uses UUIDs exclusively to register interfaces and implementation classes under unique names. Despite these differences, COM objects play much the same role as CORBA objects: clients program against an interface and never touch the implementation directly.
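The HRESULT and [out, retval] convention is easier to follow next to the exception style used in the CORBA example. The sketch below imitates both styles in plain Python and calls no COM API. The function names and the InsufficientFunds class are hypothetical, and a list stands in for the output pointer. S_OK and E_FAIL carry their standard COM values, and the failure check tests the high (severity) bit, which is how HRESULT failures are distinguished.

```python
S_OK = 0x00000000
E_FAIL = 0x80004005  # standard generic failure code


class InsufficientFunds(Exception):
    """Exception-style error, as a CORBA-like binding would surface it."""


def failed(hr):
    # An HRESULT signals failure when its high (severity) bit is set.
    return bool(hr & 0x80000000)


def debit_hresult_style(balance, amount, out):
    """COM style: return a status code and write the real result into `out`.

    `out` is a list standing in for the [out, retval] pointer parameter.
    """
    if amount > balance:
        return E_FAIL
    out.append(balance - amount)
    return S_OK


def debit_exception_style(balance, amount):
    """Wrapper that turns the status code back into a return value or an exception."""
    out = []
    hr = debit_hresult_style(balance, amount, out)
    if failed(hr):
        raise InsufficientFunds(f"HRESULT 0x{hr:08X}")
    return out[0]


print(debit_exception_style(100.0, 30.0))  # 70.0
```

Language bindings for COM typically perform a translation of this kind, which is why a method marked [out, retval] can look like an ordinary function returning a value in a scripting language.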

Programmers working with modern languages, such as Java, C#, and the other .NET Common Language Runtime (CLR) languages, have the luxury of being able to build distributed computing applications, typically without having to worry much about IDL. Has IDL become irrelevant in these cases? Not at all! IDL is not present on the surface, but IDL concepts are working behind the scenes.

Both Java and the CLR languages are fully introspectable. This means that a compiled component carries complete metadata about itself, such as information about its parent class, properties, methods, and any supported interfaces. This information is sufficient to replace the need for an explicit IDL description of the component. That is why, for example, Java developers can run the RMI compiler (rmic) directly on their compiled classes without having to generate IDL first.
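The same idea can be shown in any introspectable language. The sketch below uses Python's standard inspect module rather than Java reflection or CLR metadata, so it illustrates the concept only. The Account class and the describe_interface helper are hypothetical.

```python
import inspect


class Account:
    """Hypothetical local class standing in for a compiled component."""

    def __init__(self, balance: float = 0.0):
        self.balance = balance

    def debit(self, amount: float) -> float:
        self.balance -= amount
        return self.balance

    def credit(self, amount: float) -> float:
        self.balance += amount
        return self.balance


def describe_interface(cls):
    """Build an IDL-like description of a class from its own metadata."""
    operations = {}
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip the constructor and private helpers
        signature = inspect.signature(member)
        params = {
            pname: p.annotation.__name__
            for pname, p in signature.parameters.items()
            if pname != "self"
        }
        operations[name] = {
            "params": params,
            "returns": signature.return_annotation.__name__,
        }
    return operations


description = describe_interface(Account)
for op_name, op in description.items():
    print(op_name, op["params"], "->", op["returns"])
```

The resulting dictionary holds the operation names, parameter types, and return types, which is the core of what an IDL file would otherwise have to state by hand.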

However, when these languages need to interoperate with components built using other programming languages, there is no substitute for IDL. In short, separating interfaces from implementations is the only dependable way to preserve interoperability across programming languages, platforms, machines, address spaces, memory models, and object versions. On the Web, where heterogeneity is the rule, this is more important than ever.