Quality of Service

Sophisticated e-Business Web services should be able to provide various levels of quality according to customer requirements and choices. The Quality of Service (QoS) concept originated in the communications industry, where it is based on the proportion of sent packets that are received at the target without corruption. There is a similar concept in e-Business, although applications such as banking or airline ticket reservation still cannot be allowed to perform erroneous operations or to stop operating. This section describes an architecture for enterprise SOAP servers, paying special attention to QoS for e-Business applications. We provide an overview of the architecture for the enterprise SOAP server, followed by a detailed treatment of the various aspects of QoS.

Enterprise SOAP Server

In the section "Enterprise Application Integration," we reviewed how to integrate existing applications with the J2EE architecture. The typical configuration comprises a Web server, a servlet container, some EJB containers, databases, and backend applications. The Web server, servlet container, and EJB containers form the portion of this configuration that is related to integration. This portion is termed the Web Service Container (WSC) for the sake of discussion in this section.

What happens if a WSC receives a large number of requests, beyond its capacity to process them? It is likely that the WSC will take itself offline or, worse yet, one component (such as the Web server) will crash the computer that it is running on. A known technique to avoid such a situation is to prepare clones of the original WSC for load-balancing. Once you introduce clones, the configuration becomes fairly complicated. Therefore, you need a solid architecture to manage various aspects of the whole system, such as the status of each component and security.

Figure 5.21 illustrates an architectural overview of an enterprise SOAP server. At the heart of this architecture is a collection of Web Service Containers (WSCs). Each WSC invokes backend systems, such as the order management system, and databases. For the sake of simplification, assume that the backend systems and databases are at least as scalable as the WSCs. For example, databases are configured into a Data Area Network in which data replication is automatically performed in preparation for a possible fail-over.

Figure 5.21. An architecture for an enterprise SOAP server.

Load-balancing, for instance, would be performed by a Round-Robin Domain Name Service (RR-DNS) and a TCP router, which together dispatch each incoming SOAP message to one of the WSCs. A system monitor checks the status of each component in each WSC and might, for example, start up another WSC clone when the load becomes too high. In this complex configuration, managing end-to-end security is a difficult task. In the enterprise SOAP server architecture, there is a single secure domain where all security information is managed; this information is the basis for all security decisions.

Note that the architecture for the enterprise SOAP server shown in Figure 5.21 is not necessarily the best architecture. To come up with the best architecture, the business and system requirements must be considered. However, the architecture in Figure 5.21 does clearly show what aspects should be considered for the purpose of developing an e-Business QoS. Each aspect is discussed in detail in the following sections.

High Availability

Availability is an obvious QoS requirement because users generally expect that services are always, or very nearly always, available. If a service is frequently unavailable, users stop relying on its service provider. An airline ticket reservation system, for example, might require 99% availability; the service can then be unavailable for less than two hours per week (1% of 168 hours is about 1.7 hours).
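The downtime allowed by an availability target follows from simple arithmetic. The following is a minimal sketch of that calculation; the class and method names are illustrative assumptions, not part of any product discussed here.

```java
// Illustrative helper; the class and method names are assumptions for this sketch.
class AvailabilityBudget {
    static final double HOURS_PER_WEEK = 7 * 24; // 168 hours

    // Returns the hours per week a service may be down while still
    // meeting the given availability target (for example, 0.99 for 99%).
    static double allowedDowntimeHoursPerWeek(double availability) {
        if (availability < 0.0 || availability > 1.0) {
            throw new IllegalArgumentException("availability must be between 0 and 1");
        }
        return (1.0 - availability) * HOURS_PER_WEEK;
    }
}
```

Calling AvailabilityBudget.allowedDowntimeHoursPerWeek(0.99) yields roughly 1.68 hours, while a target of 0.999 leaves only about ten minutes per week.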

Load-balancing is a means of achieving high availability, especially in terms of increasing the scalability of the system. As shown in Figure 5.21, it combines two techniques: RR-DNS and the TCP router. A Domain Name Service (DNS) maps domain names (hostnames) to IP addresses, and the round-robin variety of DNS allows a hostname to be mapped to one of several IP addresses. An IP address is chosen in round-robin manner, meaning that consecutive requests are assigned to the available IP addresses in a predetermined sequence that repeats when the end of the sequence is reached. HTTP redirect is often used instead of RR-DNS for the same purpose: When a client accesses a server, the server responds with a redirect instruction to one of the available hosts. One disadvantage of the redirect approach is that the URLs of the target hosts are visible to the client and might be stored on the client side (such as in a browser's recent URL list or a bookmark), so the client might access one of these URLs directly in subsequent requests. Thus, the purpose of load-balancing might be defeated.
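The round-robin selection described above can be sketched in a few lines of Java. This is only an illustration of the selection rule, not how any particular DNS server is implemented; the class name RoundRobinSelector and the addresses used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of any DNS product.
class RoundRobinSelector {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinSelector(List<String> addresses) {
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = new ArrayList<>(addresses); // defensive copy
    }

    // Returns the next address in the fixed sequence, wrapping at the end.
    String nextAddress() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }
}
```

With three addresses configured, the fourth call to nextAddress() returns the first address again.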

Using TCP routers, you can configure clusters to scale up processing power. A TCP router forwards incoming requests to WSCs. Although the router's name and IP address are public, the addresses of WSCs are hidden from the client. The TCP router could use a load-based algorithm to select a target WSC, or simply adopt the round-robin process. There are many commercially available TCP routers, such as IBM Network Dispatcher (ND). ND can run on several operating systems, such as Windows, AIX, and Solaris, and can forward up to 10,000 messages per second when it runs on an embedded OS. In addition to the TCP router approach, you can use a technique called Network Address Translation (NAT). NAT dynamically edits the header of an IP packet to change the destination address, and edits the return packet in the same manner. Cisco Local Director is an example of a commercial product for NAT.
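As one possible reading of a "load-based algorithm," the sketch below picks the WSC with the fewest requests currently in progress. It is an assumption made for illustration and does not describe how Network Dispatcher or any other product selects a target; the class LeastLoadedRouter and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical load-based selector; real TCP routers may use other metrics.
class LeastLoadedRouter {
    // WSC address -> number of requests currently being processed there.
    private final Map<String, Integer> activeRequests = new LinkedHashMap<>();

    LeastLoadedRouter(List<String> wscAddresses) {
        if (wscAddresses == null || wscAddresses.isEmpty()) {
            throw new IllegalArgumentException("at least one WSC is required");
        }
        for (String address : wscAddresses) {
            activeRequests.put(address, 0);
        }
    }

    // Chooses the WSC with the fewest active requests (first one wins ties)
    // and counts the new request against it.
    synchronized String dispatch() {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : activeRequests.entrySet()) {
            if (entry.getValue() < bestLoad) {
                best = entry.getKey();
                bestLoad = entry.getValue();
            }
        }
        activeRequests.put(best, bestLoad + 1);
        return best;
    }

    // Called when a WSC has finished processing a request.
    synchronized void complete(String wscAddress) {
        Integer load = activeRequests.get(wscAddress);
        if (load != null && load > 0) {
            activeRequests.put(wscAddress, load - 1);
        }
    }
}
```

The client only ever sees the router's address; the WSC addresses held in the map stay internal to the router.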

Scaling up the system with load-balancing techniques might decrease the risk of system failure, but system failure can still occur. However, WSC clones, in addition to being used to increase scalability, can also serve as backups of the original WSC, taking over the processing when the WSC fails. This introduces the system properties of redundancy (using clones for possible backup) and fault tolerance (performing an automated procedure in case of fail-over).
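The fail-over behavior can be illustrated with a small dispatcher that tries the original WSC first and falls back to its clones. This is a simplified sketch under the assumption that each node can be modeled as a function from request to response; the class FailoverDispatcher is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: each node is modeled as a function from request to response.
class FailoverDispatcher {
    // The original WSC comes first, followed by its clones.
    private final List<Function<String, String>> nodes;

    FailoverDispatcher(List<Function<String, String>> nodes) {
        if (nodes == null || nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    // Tries each node in order and returns the first successful response.
    String dispatch(String request) {
        RuntimeException lastFailure = null;
        for (Function<String, String> node : nodes) {
            try {
                return node.apply(request);
            } catch (RuntimeException e) {
                lastFailure = e; // this node failed; fall over to the next clone
            }
        }
        throw new IllegalStateException("all WSC nodes failed", lastFailure);
    }
}
```

Note that blindly resending a request is safe only if the failed node did not partially complete the work, which is one reason the transaction processing discussed next matters.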

Another issue related to system failure is recovery. Generally speaking, recovery requires the system to perform a complicated recovery sequence. For example, some kind of status indicator might be recorded at certain checkpoints, and the recovery process might rely on the status history. However, because the SOAP server uses transaction processing, there need not be such a complicated recovery sequence in the enterprise server architecture. By definition, operations within a transaction context are committed or aborted. Therefore, in the event of a system failure, the EJB containers will perform most of the recovery sequence on behalf of applications.

System Management

In order to completely fulfill the requirement for high availability, system management must also be considered. Because the configuration of real e-Business systems is fairly complex, as shown in Figure 5.21, there needs to be a systematic means to manage the whole system. Ideally, there would be a single point of management, from which even system resources at other sites can be managed. System management might include the following tasks:

Monitoring the status of system resources
Sending alert information to system administrators
Remotely configuring, deploying, and controlling system resources
Automatically resolving system resource issues if possible

Typically, system management requires agents, each of which monitors one or more system resources and continuously sends information to the system monitor. Because sending too much information unduly consumes network bandwidth, a proper filtering mechanism is also required.
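One simple form of filtering is to report a reading only when it has changed noticeably since the last report. The sketch below assumes this change-threshold policy purely for illustration; real management agents may filter on other criteria, and the class ThresholdFilter is hypothetical.

```java
// Hypothetical agent-side filter: suppresses readings that differ from the
// last reported value by less than a configured amount.
class ThresholdFilter {
    private final double minChange;
    private Double lastReported; // null until the first reading is reported

    ThresholdFilter(double minChange) {
        if (minChange < 0) {
            throw new IllegalArgumentException("minChange must not be negative");
        }
        this.minChange = minChange;
    }

    // Returns true if the agent should send this reading to the system monitor.
    boolean shouldReport(double value) {
        if (lastReported == null || Math.abs(value - lastReported) >= minChange) {
            lastReported = value;
            return true;
        }
        return false;
    }
}
```

With a threshold of 5.0, a CPU load sequence of 50, 52, 56, 58 would result in only the readings 50 and 56 being sent.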

The system monitor in Figure 5.21 is a central controller to perform system management. On the basis of the information sent by agents, the system monitor controls system resources: It starts/stops a resource, adds a resource, or changes the configuration of a resource. Generally, system administrators interact with the system monitor to remotely control system resources. Optionally, the system monitor could adjust automatically without consulting the system administrator when certain known problems are detected. Of course, this is possible only when you provide the system monitor with a remedy procedure in advance.
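The idea of registering remedy procedures in advance can be sketched as a lookup table from known problems to actions. The names used here (RemedyRegistry, the problem codes) are hypothetical and serve only to illustrate the control flow: known problems are handled automatically, and everything else is left for the administrator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a system monitor's remedy table.
class RemedyRegistry {
    private final Map<String, Runnable> remedies = new HashMap<>();
    private final List<String> escalated = new ArrayList<>();

    // Registers, in advance, the procedure to run when a known problem occurs.
    void register(String problemCode, Runnable remedy) {
        remedies.put(problemCode, remedy);
    }

    // Runs the remedy if one is known; otherwise records the problem for the
    // system administrator. Returns true if the problem was handled automatically.
    boolean handle(String problemCode) {
        Runnable remedy = remedies.get(problemCode);
        if (remedy != null) {
            remedy.run();
            return true;
        }
        escalated.add(problemCode);
        return false;
    }

    // Problems that still require the administrator's attention.
    List<String> escalatedProblems() {
        return new ArrayList<>(escalated);
    }
}
```

A remedy registered for an overload condition might, for instance, start another WSC clone.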

Several types of system resources need to be monitored: network devices, operating systems, databases, application servers, applications, and so on. Network device management is easy to implement in the sense that there is a standard specification for network management, called the Simple Network Management Protocol (SNMP), and a large number of devices support SNMP.

On the other hand, system software and middleware packages each have their own management tools. For example, operating systems might collect values of key system parameters, such as application statuses, number of users logged in, jobs in a print queue, system load, free disk space, and other properties. In this case, vendors of total system management products should provide an agent to retrieve these OS parameters and report them to their system monitor.

Application monitoring is the hardest task because application-specific agents have to be developed. In the case of the enterprise SOAP server, a system monitor agent must be placed in the Axis engine. WSTK 2.4 provides a beta implementation of system management for Apache SOAP. It can collect the following information:

Total number of SOAP services deployed
Total number of RPC calls to all services combined
Total number of successful invocations of each service
Average response/transaction time for successful requests

The implementation of this SOAP system management is based on the Java Management Extensions (JMX). With JMX, you can construct and maintain resource objects that contain key parameters, and embed agents to monitor the values of those parameters. WSTK 2.4 will be extended to include additional features, such as operations to change the behavior of a service and system administrator notification when certain criteria are met.
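To give a feel for what a JMX resource object looks like, the following sketch exposes a few SOAP counters as a standard MBean using the javax.management API. It is not the WSTK implementation; the names SoapMonitoring, SoapStats, and the exposed attributes are assumptions chosen to mirror the statistics listed above.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch; not the WSTK code.
class SoapMonitoring {
    // A standard MBean interface must be public and named <ImplClass>MBean.
    public interface SoapStatsMBean {
        long getTotalCalls();
        long getSuccessfulCalls();
        double getAverageResponseMillis();
    }

    // The resource object holding the key parameters.
    public static class SoapStats implements SoapStatsMBean {
        private final AtomicLong totalCalls = new AtomicLong();
        private final AtomicLong successfulCalls = new AtomicLong();
        private final AtomicLong successfulMillis = new AtomicLong();

        // Would be called by a handler in the SOAP engine after each invocation.
        public void recordCall(boolean success, long elapsedMillis) {
            totalCalls.incrementAndGet();
            if (success) {
                successfulCalls.incrementAndGet();
                successfulMillis.addAndGet(elapsedMillis);
            }
        }

        @Override public long getTotalCalls() { return totalCalls.get(); }
        @Override public long getSuccessfulCalls() { return successfulCalls.get(); }
        @Override public double getAverageResponseMillis() {
            long n = successfulCalls.get();
            return n == 0 ? 0.0 : (double) successfulMillis.get() / n;
        }
    }

    // Registers a new SoapStats object with the platform MBean server.
    static SoapStats register(String objectName) {
        try {
            SoapStats stats = new SoapStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName(objectName));
            return stats;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MBean", e);
        }
    }

    // Reads an attribute the way a management agent or console would.
    static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read attribute", e);
        }
    }
}
```

Once registered under a name such as example.soap:type=SoapStats, the attributes TotalCalls, SuccessfulCalls, and AverageResponseMillis become visible to any JMX-aware management tool connected to the same MBean server.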

In summary, system management is a complex task because different types of system resources, located at potentially many different sites, need to be monitored and controlled from one point. Although some aspects, such as network management, already have well-developed solutions, emerging areas have not been organized for system management. One immediate problem is that there is no standard way to manage J2EE platforms. However, there is ongoing discussion toward the goal of standardizing J2EE Management in JSR 77.

Enterprise Security

A complex system configuration requires an integrated security architecture. Middleware applications, such as Web servers, application servers, and databases, provide their own respective security functions. However, the security functions should be integrated in a centralized manner, as shown in the previous section. In addition, highly confidential information, such as private keys, must be protected. This centralized approach also contributes to the protection of the single secure domain.

The security server in Figure 5.21 is an aggregation of commercial security products. It has a user registry in an LDAP server that manages user IDs, passwords, certificates, security attributes, roles, and so on. In this architecture, the Web server neither stores nor checks user IDs and passwords, but rather delegates user authentication to the security server. In addition to the user registry, the security server can manage access control rules for authorization and a table for mapping user IDs between different security domains. For example, when users need to access a backend legacy system, their Internet IDs might be mapped to IDs for the legacy system.
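The ID mapping and access control functions of the security server can be pictured with the small in-memory sketch below. It stands in for what would really be an LDAP-backed commercial product; the class SecurityServerSketch, its methods, and the sample IDs are all hypothetical, and authentication itself is deliberately left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory stand-in for a security server; a real one would be
// backed by an LDAP user registry.
class SecurityServerSketch {
    // target security domain -> (Internet ID -> ID used in that domain)
    private final Map<String, Map<String, String>> idMappings = new HashMap<>();
    // protected resource -> roles allowed to access it
    private final Map<String, Set<String>> allowedRoles = new HashMap<>();

    void mapId(String internetId, String targetDomain, String domainId) {
        idMappings.computeIfAbsent(targetDomain, d -> new HashMap<>())
                  .put(internetId, domainId);
    }

    // Looks up the ID a user should be known by in the target domain.
    Optional<String> lookupId(String internetId, String targetDomain) {
        Map<String, String> forDomain = idMappings.get(targetDomain);
        if (forDomain == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(forDomain.get(internetId));
    }

    void allow(String role, String resource) {
        allowedRoles.computeIfAbsent(resource, r -> new HashSet<>()).add(role);
    }

    // Access is denied unless a rule explicitly allows the role.
    boolean isAuthorized(String role, String resource) {
        Set<String> roles = allowedRoles.get(resource);
        return roles != null && roles.contains(role);
    }
}
```

Because every WSC asks the same server, a mapping or rule changed here takes effect for the original node and all of its clones at once.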

On the other hand, the security Web services shown in Figure 5.21 address XML- and SOAP-level security issues. As described earlier, SOAP digital signature and encryption handlers are provided in the form of Java classes and embedded in the Axis engine. However, this configuration requires private keys to be distributed to the original and all clone nodes.

XKMS suggests managing public and private keys in a single repository and accessing that repository via SOAP messages. You can apply this Web services approach to the signature and encryption handlers. First, manageability of security handlers is improved because they are simply invoked from various Axis nodes. Second, manageability of security information is improved because it is protected in a secure domain. In future development, a more comprehensive collection of security services, such as timer and authorization handlers, might be specified and provided in addition to signature and encryption. Thus, some of the functions in the security server could become security Web services.