Why Do We Need a Web Services Approach?

The beginning of this chapter explained the motivation for application-to-application communication over the Internet to address the current challenges of distributed computing and B2B integration in particular. Since 1999, the software industry has been rapidly evolving XML-based Web services technologies as the approach to these problems. In the maelstrom of press hype, product releases, and standards announcements, many people have been left wondering whether this is a good direction in which to go. After all, we already have many different mechanisms for distributed computing. Surely, some of them would be able to rise to meet the challenges of e-business. Why build a completely new distributed computing stack based on Web services?

This is a very good question and one that is hard to give a short answer to. "Because Web services use XML" is not the right answer. It is a correct observation, but it doesn't answer the crucial question as to why using XML makes such a big difference. At a basic level, there are three key reasons why existing distributed computing approaches are inferior to Web services for solving the problems of e-business:

The scope of problems they try to address
The choice of available technology
Industry dynamics around standards control and innovation

Scoping the Problem

Traditional distributed computing mechanisms have typically evolved around technical architectures rather than broader problems of application integration. For example, CORBA evolved as a solution to the problem of implementing rich distributed object architectures. At the time, it was implicitly assumed that this was the right approach to getting applications to communicate with one another. As we discussed earlier, experience has shown that RPCs are not always the best architecture for this requirement. The need for loosely coupled applications and business process automation has clearly shown the benefits of simply exchanging messages containing data (typically a business document) between the participants of e-business interactions, a so-called document-centric approach. Distributed computing specifications address messaging as a computing architecture; however, there has been no unifying approach that brings RPCs and messaging to the same level of importance—until Web services, that is.

Web services have evolved not around pre-defined architectures but around the problem of application integration. This is a very important distinction. The choice of problem scope defines the focus of a technology initiative. Web services technologies have been designed from the ground up to focus on the problems of application integration. As a result, we are able to do things outside the scope of traditional distributed computing approaches:

Support both document-centric messaging and RPCs
Transport encoded data from both applications and business documents
Work over open Internet protocols such as HTTP and SMTP

In other words, Web services are better suited for the task than what we have so far because we have specifically built them with this in mind. COM/CORBA/RMI are still great technologies for tying together distributed objects on the corporate network. However, the e-business application integration problem is best tackled by Web services.

Core Technologies

Because Web services address a much more broadly scoped problem, they use much more flexible technologies than traditional distributed computing approaches. Further, with Web services we can leverage all that we have learned about connecting and integrating applications since we first started doing distributed computing. These two factors put Web services on a better technology foundation for solving the problems of e-business than traditional distributed computing approaches.

Later, in the "Web Services Interoperability Stacks" section, we introduce the notion of Web services interoperability stacks. These interoperability stacks organize a layering of technologies that define the capabilities of Web services. It is possible to compare the Web services approach to traditional distributed computing approaches level-by-level to see why the technical foundation of Web services is more appropriate for the problems it needs to solve. Rather than going through this lengthy process, let's focus on two key capabilities: the ability to represent data structures and the ability to describe these data structures.

Data encoding is a key weakness for traditional distributed computing approaches, particularly those that are programming language independent. Sure, they typically have a mechanism to represent simple data (numbers, strings, booleans, date-time values, and so on), basic arrays, and structures with properties. However, mapping existing complex datatypes in applications to the underlying data encoding mechanisms was very difficult. Adding new native datatypes was practically impossible (doing so required a complete update of specifications). The fact that data was encoded in binary formats further complicated matters. For example, processing code had to worry about little- vs. big-endian issues when reading and writing numbers.
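The byte-ordering concern can be made concrete with a small sketch. It uses Python's standard struct module as a stand-in for a binary wire format; the value 1025 and the variable names are illustrative assumptions, not taken from any particular distributed computing specification.

```python
import struct

value = 1025  # 0x00000401

# The same 32-bit integer serialized with two different byte orders.
big_endian = struct.pack(">i", value)     # b"\x00\x00\x04\x01"
little_endian = struct.pack("<i", value)  # b"\x01\x04\x00\x00"

# A reader that assumes the wrong byte order silently gets a different number.
misread = struct.unpack("<i", big_endian)[0]

# A text encoding has no byte order for sender and receiver to agree on.
as_text = str(value)
round_tripped = int(as_text)

print(big_endian.hex(), little_endian.hex(), misread, round_tripped)
```

The misread value is not an error the reader can detect on its own; both sides must agree on the byte order out of band, which is exactly the kind of coordination a text-based encoding avoids.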

Web services address these issues by using XML to represent information. XML's text-based form eliminates byte ordering concerns. The wide availability of XML processing tools makes participation in the world of Web services relatively easy. XML's hierarchical structure (achieved by the nesting of XML elements) allows changes at some level of nesting in an XML document to be made with ease without worrying about the effect on other parts of the document. Also, the expressive nature of attributes and nested elements makes it considerably easier to represent complex data structures in XML than in the pure binary formats traditionally used by COM and CORBA, for example. In short, XML makes working with arbitrary data easier.
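As a rough illustration of the point about nesting, the sketch below parses two versions of a hypothetical order document with Python's standard xml.etree.ElementTree module. The element names, attribute names, and the summarize function are assumptions made for this example only. The second version adds a nested address element, and the reader written against the first version keeps working unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document, first version.
ORDER_V1 = """<order id="42">
  <customer>
    <name>Acme Corp</name>
  </customer>
  <item sku="A-100" quantity="3"/>
</order>"""

# Second version: a nested <address> element was added under <customer>.
ORDER_V2 = """<order id="42">
  <customer>
    <name>Acme Corp</name>
    <address>
      <city>Springfield</city>
    </address>
  </customer>
  <item sku="A-100" quantity="3"/>
</order>"""


def summarize(xml_text):
    """Read only the parts of the document this application cares about."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer/name"),
        "items": [(i.get("sku"), int(i.get("quantity")))
                  for i in root.findall("item")],
    }


print(summarize(ORDER_V1) == summarize(ORDER_V2))
```

Because the reader addresses elements by name and path rather than by position in a byte stream, the added element is simply not visited.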

The choice of XML brought another advantage to Web services—the ability to describe datatypes and validate whether data coming on the wire complies with its specification. This happens through the use of XML meta-languages such as XML Schema. Binary data encodings typically used for distributed computing offered no such mechanism and thus pushed data validation into application logic, considerably complicating applications dealing with non-trivial data.

Industry Dynamics

Momentum is a very important aspect of the dynamics of software innovation. Great problems create great opportunities. The desire to capitalize on the opportunities generates momentum around a set of initiatives targeted at solving the problem. This momentum is the binding force of our industry. This is how major innovation takes place on a broad scale. The challenge of e-business application integration is great; this is why all the key players in the industry are focused on it (see the sidebar "Web Services Market Dynamics"). Customer need, market pressure, and the desire to be part of the frontier-defining elite have pushed many companies to become deeply engaged with Web services. Good things are bound to happen. Consider this: The last time every one of the key infrastructure vendors was focused on the same set of issues was during the early days of e-business when the industry was trying to address the challenges of building Web applications. The net result was a new model for application development that leveraged the Web browser as a universal client and the Web application server as a universal backend. In short, trust that some of the very best minds in the industry working together under the aegis of organizations such as the W3C and OASIS will be able to come up with a good solution to the problems of e-business integration.

To the veterans of the software industry, momentum sometimes equals hype. So, are we trying to say that Web services will succeed because there is so much hype around them? Absolutely not! The momentum around Web services is real and different from what we have experienced so far with other distributed computing fads. The fundamental difference is the ability of many industry players to engage in complementary standardization in parallel.

Parallelism is key to building real momentum and increasing the bandwidth of innovation. Traditional distributed computing efforts could not achieve this kind of parallelism because they were either driven by a single vendor—Microsoft promoting COM, for example—or they were driven by a large, slow organization such as the Object Management Group (OMG), which owns the CORBA standards. In both cases, the key barrier to fast progress was the centralized management of standards. Any change had to be approved by the body owning the standard. And Microsoft and OMG owned all of COM and CORBA, respectively. This is no way to gain real momentum, regardless of the size of the marketing budgets to promote any given technology. Vendors that feel they have very little control over the evolution of a technology will likely spend very little time investing in its evolution. In other words, you might use COM, but if you think you have no chance of influencing Microsoft's direction on COM you will probably not spend much time thinking about and prototyping ways to improve COM. Open-source efforts such as the Linux operating system and projects of the Apache Software Foundation fundamentally generate momentum because people working on them can have a direct influence on the end product. The momentum of Web services is real because standardization work is going on in parallel at the W3C, OASIS, UDDI, and many other horizontal and vertical industry standards organizations. Further, the major players so far have shown a commitment to do a lot of innovation out in the open.

The interesting thing from a technical perspective is that XML actually has something to do with the ability of Web service standardization to be parallelized. XML has facilities (namespaces and schema) that enable the decentralized evolution of XML-based standards without preventing the later composition of these standards in the context of a single solution. For example, if group A owns some standard and group B is trying to build an extension to the standard, then with some careful use of XML, group B can design the extensions such that:

Its extension can be published independently of the standard.
Its extension can be present in cases where the standard is used.
Applications that do not understand the extension will not break if the extension is present.
Applications that need the extension will only work if the extension is present.
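The four properties above can be sketched with XML namespaces. In the example below, the two namespace URIs, the element names, and both reader functions are hypothetical, chosen only to stand in for "group A's standard" and "group B's extension". Parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one owned by group A, one by group B.
STD_NS = "urn:example:group-a:standard"
EXT_NS = "urn:example:group-b:extension"

# A message that uses the standard and also carries group B's extension.
EXTENDED = f"""<msg:message xmlns:msg="{STD_NS}" xmlns:ext="{EXT_NS}">
  <msg:body>Hello</msg:body>
  <ext:priority>high</ext:priority>
</msg:message>"""

# A message that uses only the standard.
PLAIN = f"""<msg:message xmlns:msg="{STD_NS}">
  <msg:body>Hello</msg:body>
</msg:message>"""


def read_standard(xml_text):
    """An application that knows only the standard ignores other namespaces."""
    root = ET.fromstring(xml_text)
    return root.findtext(f"{{{STD_NS}}}body")


def read_priority(xml_text):
    """An application that needs the extension fails if it is absent."""
    root = ET.fromstring(xml_text)
    priority = root.findtext(f"{{{EXT_NS}}}priority")
    if priority is None:
        raise ValueError("required extension element is missing")
    return priority


print(read_standard(EXTENDED), read_standard(PLAIN), read_priority(EXTENDED))
```

Because each element is qualified by a namespace its owner controls, group B never has to edit group A's vocabulary, and a reader that looks up only group A's names is unaffected by whatever else appears in the document.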

The industry's focus on Web services combines the right scope (e-business application integration) with the right technologies (XML-based standards) with the potential for significant parallelism and high-bandwidth innovation. This is why Web services will be successful.

Distributed Computing History

Historically, distributed computing has been focused on the problem of distributing computation between several systems that are jointly working on a problem. The most often used distributed computing abstraction is the RPC. RPCs allow a remote function to be invoked as if it were a local one. Distributed object-oriented systems require object-based RPCs (ORPCs). ORPCs need some additional context to be able to invoke methods on specific object instances. The history of RPC-style distributed computing and distributed objects is fairly complicated. The following timeline illustrates some of the key events:

1987

Sun Microsystems developed the Open Network Computing (ONC) RPC system as the basic communication mechanism for its Network File System (NFS).

Apollo Computer developed the Network Computing System (NCS) RPC system for its Domain operating system.

1989

The Open Software Foundation (OSF, now The Open Group) issued a Request for Technology (RFT) for an RPC system. OSF received two key submissions. The first submission came from HP/DEC based on NCS (HP had acquired Apollo). The other submission came from Sun based on ONC. OSF selected NCS as the RPC mechanism for its Distributed Computing Environment (DCE).

The Object Management Group (OMG) was formed to deliver language- and platform-neutral specifications for distributed computing. (The consortium includes about 650 members as of the time of this writing.) The OMG began development of specifications for Common Object Request Broker Architecture (CORBA), a distributed objects platform.

1990

Microsoft based its RPC initiatives on a modified version of DCE/RPC.

1991

DCE 1.0 was released by OSF.

CORBA 1.0 shipped with a single language mapping for the C language. The term Object Request Broker (ORB) gained popularity to denote the infrastructure software that enables distributed objects.

1996

Microsoft shipped the Distributed Component Object Model (DCOM), which was closely tied to previous Microsoft component efforts such as Object Linking and Embedding (OLE), non-distributed COM (a.k.a. OLE2), and ActiveX (lightweight components for Web applications). The core DCOM capabilities are based on Microsoft's RPC technologies. DCOM is an ORPC protocol.

CORBA 2.0 shipped with major enhancements in the core distributed computing model as well as higher-level services that distributed objects could use. The Internet Inter-ORB Protocol (IIOP) was part of the specification. IIOP allows multiple ORBs to interoperate in a vendor-agnostic manner. IIOP is an ORPC protocol.

1997

Sun shipped JDK 1.1, which included Remote Method Invocation (RMI). RMI defines a model for distributed computing using Java objects. RMI is similar to CORBA and DCOM but works only with Java objects. RMI has an ORPC protocol called Java Remote Method Protocol (JRMP).

Microsoft announced the first iteration of COM+, the successor of DCOM. The capabilities of COM+ brought it much closer to the CORBA model for distributed computing.

1999

Sun shipped J2EE (Java 2 Platform Enterprise Edition). The Java 2 platform integrated RMI with IIOP, making it easy to interoperate between Java and CORBA systems.

Simple Object Access Protocol (SOAP) appeared for the first time. The era of Web services was born.

Although RPCs and distributed objects have been the traditional approaches for building distributed systems, they are by no means the only ones. Another very important approach is that of data-oriented or document-centric messaging. Rather than being focused on distributing computation by specifically invoking remote code, messaging takes a different approach. Applications that communicate via messaging run their own independent computations and communicate via messages that contain pure data. Messaging was popularized via the efforts of system integrators who were trying to get highly heterogeneous systems to interoperate. In most cases, the systems were so different that the requirement to perform fine-grained integration via RPCs was impossible to satisfy. Instead, system integrators were happy to be able to reliably move pure data between the systems. Commercially, the importance of messaging applications has been steadily growing since IBM released its messaging product MQSeries in 1993. Microsoft's messaging product is the Microsoft Message Queuing Server (MSMQ). J2EE defines a set of APIs for messaging through the Java Message Service (JMS). There has been no attempt to define a standard interoperability protocol for messaging servers.
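A minimal sketch of the messaging style follows. It assumes an in-process queue.Queue as a stand-in for a real messaging product, and the order document, the channel name, and both functions are hypothetical. The sender does not invoke any code in the receiver; it only places a document containing pure data on the channel.

```python
import queue
import xml.etree.ElementTree as ET

# Hypothetical in-process stand-in for a message queue between two systems.
channel = queue.Queue()


def send_order(order_id, sku, quantity):
    """Sender: build a document and hand it to the channel; call no remote code."""
    doc = ET.Element("order", id=str(order_id))
    ET.SubElement(doc, "item", sku=sku, quantity=str(quantity))
    channel.put(ET.tostring(doc, encoding="unicode"))


def process_next():
    """Receiver: take the next document and decide independently what to do."""
    doc = ET.fromstring(channel.get())
    item = doc.find("item")
    return f"order {doc.get('id')}: {item.get('quantity')} x {item.get('sku')}"


send_order(42, "A-100", 3)
result = process_next()
print(result)
```

The only contract between the two sides is the shape of the document, which is why this style suits systems too different for fine-grained RPC integration.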

One of the key benefits of Web services is that the core Web service protocols can support RPCs and messaging with equal ease. Chapter 3, "Simple Object Access Protocol (SOAP)," has a section that addresses this topic in detail.