Integrating Front-End Web Systems with Back-End Systems
Mitchell COHEN <firstname.lastname@example.org>
This paper presents the implications of different storage and replication techniques for the key business objects being used by both front-end and back-end systems simultaneously. Much work has been done on data synchronization and replication in the database area as thoroughly discussed by Bernstein, Hadzilacos, and Goodman. This paper will concentrate on the implications that the different data synchronization and replication paradigms have on common business objects needed for both a merchant server running on the Web and internal business systems. First, the need for integrating front-end and back-end systems from a business-object standpoint is established. Second, five paradigms of data storage and replication are defined. Finally, what each of the paradigms means for the different business objects is discussed.
The Electronic Commerce Revolution has created many new computing needs among which is back-end integration. One way to conduct business on the Internet is to have a front-end system running on a Web server that handles the interaction with the outside world. An example of such a system is a Web merchant server enabling customers (and potential customers) to browse and search product catalogs and descriptions in order to price and purchase items.
Internal business processes such as tracking inventory, processing orders, and accounting are already being handled by back-end systems for many companies. A front-end system requires much of the data already stored in an existing back-end system (for example, product availability) while at the same time, new data coming in from the front-end system (such as new orders) must be propagated to the back-end system for internal business processing. How can we connect the front-end systems to the information already in the back-end systems and have the two work together coherently, efficiently, and quickly? What should be considered for the different types of business objects?
Figure 1. Where replication and storage strategies fit within a front-end and back-end
There are several different basic approaches to storing data when a front-end system (such as a Web merchant server) works in conjunction with a back-end system (such as an Enterprise Resource Planning system). Different methods can be chosen for the different object types. For example, it may be best to have the customer information fully synchronized and replicated while only the back-end stores inventory information.
The different data storage and replication paradigms are based on two basic considerations:
There are many factors affecting the decisions. For each data type, one needs to consider desired or needed response times for data reads at each end of the integration as well as response times for data insertions or updates.
A requirement for having the front-end up during back-end downtime plays a role in the paradigm decision-making process. Many companies bring down their back-ends during nonbusiness hours. However, the Web eliminates the notion of "business hours" when dealing with consumers or possibly even other businesses. The ability for the front-end to run when the back-end is down is called front-end independence. In certain situations, perhaps for some smaller companies or in some business-to-business environments, it may be acceptable if the front-end is not fully functional periodically. In all the paradigms, the back-end has no reliance on the front-end being up, a key integration issue as companies do not want their normal (i.e., existing) business to suffer due to Web presence.
Simplicity of solution from development, maintenance, and cost perspectives is also a factor. Of course, for some data, part of the decision is predetermined as particular back-end systems require specific data items to be stored in and accessible directly from their systems. The same holds for many front-end systems.
Some of the paradigms require user exits or triggers in one or both of the ends. For instance, in order for a change in a back-end system to be synchronously applied in a front-end system, the back-end system needs to inform the front-end of the change via a call to the front-end (or some middleware sitting between the two ends). To make this call, the back-end must be enabled to make external procedure calls inside the business processes at the correct parts of the process. Many front-end and back-end packages come with this capability, and homegrown software can be enhanced to make these calls. Additionally, many of the front-ends and back-ends are based on databases that allow triggers.
For an asynchronous or batch update that needs to be sent from one end to the other, no additional processing is needed by the sender as long as an external process is able to poll (that is, check at a regular interval) for such updates. Some back-ends allow users to check for changes since a particular date and time.
Regardless of the storage scenario choice and implementation, for each type of object a decision must be made as to where updates of the object type are allowed. For instance, perhaps it makes sense for new customers to be added to both front-ends and back-ends while keeping pricing as only updateable via the back-end. Finer update granularity may be needed. Perhaps a customer address can be updated from the front-end, but only the back-end would be allowed to update the customer's credit terms.
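One way to picture update decisions at this granularity is a small policy table keyed by object type and field. The table contents mirror the examples in the preceding paragraph; the names `UPDATE_POLICY` and `may_update` are illustrative assumptions, not part of any existing system.

```python
FRONT_END = "front-end"
BACK_END = "back-end"

# Hypothetical policy: which end may update each (object type, field) pair.
UPDATE_POLICY = {
    ("customer", "address"): {FRONT_END, BACK_END},
    ("customer", "credit_terms"): {BACK_END},
    ("price", "amount"): {BACK_END},
}


def may_update(end, object_type, field_name):
    """Return True if the given end is allowed to update the field.

    Fields missing from the policy are treated as not updateable anywhere.
    """
    return end in UPDATE_POLICY.get((object_type, field_name), set())
```

With this table, `may_update(FRONT_END, "customer", "address")` is allowed while `may_update(FRONT_END, "customer", "credit_terms")` is refused.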
Combining the storage and update decisions gives us several paradigms. For ease of discussion, the different scenarios of storage locations and replication timing will be described in five different paradigms. While in reality a hybrid of two or more of these paradigms can be used for each object type, only the chosen paradigms will be defined and analyzed.
In synchronous replication, the data is stored in both the front-end and back-end systems with immediate replication. Any data changes (inserts, updates, deletes) made at either end are propagated to the other end as they occur, meaning that data at both ends are always consistent. To ensure consistency, changes at one end should wait for successful propagation to the other end. On the other hand, for data reads there is no need to go to the other end; that is, there is quick read access on both ends.
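The write and read behavior just described can be sketched as below. The `End` class and its methods are hypothetical, and the sketch deliberately omits real concerns such as transactions and concurrent writers; it only shows that a write waits on the other end while a read stays local.

```python
class ReplicationError(Exception):
    """Raised when a change cannot be propagated to the other end."""


class End:
    """Hypothetical front-end or back-end holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.reachable = True

    def apply_remote(self, key, value):
        if not self.reachable:
            raise ReplicationError(f"{self.name} is unreachable")
        self.data[key] = value

    def write(self, key, value):
        # Propagate first; commit locally only once the peer has accepted.
        self.peer.apply_remote(key, value)
        self.data[key] = value

    def read(self, key):
        # Reads never leave the local end.
        return self.data[key]


def connect(a, b):
    a.peer, b.peer = b, a
```

If the peer is unreachable, `write` raises before the local copy is touched, so the two ends do not diverge.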
With this paradigm, careful thought must be given to handling data changes at one end, say end X, when the other end, say end Y, cannot be communicated with. If it is known that end Y is not running, data changes at end X can be logged and processed at Y when Y is started up. If end X is unable to communicate with end Y and it is unknown whether or not Y is up and running, two different potential solutions are
Depending on the intricacy of this reconciliation, implementation of this paradigm can be complicated and quite involved.
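For the simpler case in which the other end is known to be down, logging changes and replaying them on restart might look like the following sketch. `LoggingEnd`, `write`, and `replay` are hypothetical names, and the sketch assumes the end that was down made no changes of its own, so no reconciliation is needed.

```python
from collections import deque


class LoggingEnd:
    """Hypothetical end that logs changes while its peer is known to be down."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet seen by the peer, in order

    def write(self, key, value, peer, peer_is_up):
        self.data[key] = value
        if peer_is_up:
            peer.data[key] = value
        else:
            self.pending.append((key, value))

    def replay(self, peer):
        """Apply logged changes to the restarted peer in their original order."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            peer.data[key] = value
            applied += 1
        return applied
```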
In periodic replication, the data is stored in both the front-end and back-end systems, but data changes are propagated periodically in batch. As with the noncommunication periods in synchronous replication, additions, modifications, and removals of a given business object should occur at one end only; otherwise, the changes again need to be reconciled.
Because each end stores data locally, both data access and changes are quick because there is no need to access a remote system, and each end is fully functional regardless of the status of the other end. However, there will be periods of data inconsistencies with the potential existence of outdated data.
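A single batch exchange can be sketched as follows. The function name `replicate_batch` and the representation of each end's changes as a dictionary of key to new value are assumptions made for illustration; keys changed at both ends during the period are set aside for reconciliation rather than resolved automatically.

```python
def replicate_batch(front_changes, back_changes):
    """Split one period's changes into safe transfers and conflicts.

    Returns (to_back, to_front, conflicts): changes to send to the back-end,
    changes to send to the front-end, and keys modified at both ends.
    """
    conflicts = sorted(front_changes.keys() & back_changes.keys())
    to_back = {k: v for k, v in front_changes.items() if k not in conflicts}
    to_front = {k: v for k, v in back_changes.items() if k not in conflicts}
    return to_back, to_front, conflicts
```

When each object type is updated at one end only, the conflict list is always empty and the exchange reduces to two one-way copies.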
With real-time access and update, only the back-end stores a copy of the data. The front-end accesses and changes data by going directly to the back-end on the fly (via a back-end API, ODBC, or some other similar mechanism). While it is possible to have it set up in the opposite manner, with the back-end accessing data stored at the front-end, this is atypical as most of today's existing back-end systems use internal data stores lacking flexibility, and hence, back-ends accessing front-end data will not be considered.
Without replication, accessed data is always current and there are no consistency issues. The two main disadvantages here are that a front-end relying on a back-end's data will not be functional (or at least fully functional) when the back-end is not operating or communicating, and data retrieval and modification may be time-consuming on the front-end as they are done remotely. Of course, data update on the back-end is quicker than in the above-mentioned replication paradigms because there is no replicating.
With real-time access/batch update, again only the back-end stores a copy of the data and the front-end accesses the data by going to the back-end. Data changes are done locally at the front-end and sent to the back-end periodically in batch. There is no need to wait for the back-end on these changes, but the back-end does not always have a current view of the data as changes may be lingering on the front-end side. With intelligent processing, the front-end can get an up-to-date view of the data by going to the back-end and factoring in its unforwarded data changes. So, the data changes at the front-end are quick, but data accesses are not. Additionally, the front-end cannot fully function during back-end downtime as it will be unable to access the data, but front-end functionality requiring only data changes (and not reads) will continue to be available.
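The "intelligent processing" mentioned above can be illustrated with stock quantities. In this sketch the class `BatchUpdateFrontEnd` is hypothetical, a plain dictionary stands in for the remote back-end, and changes are modeled as quantity deltas so that unforwarded changes can be added to the value read from the back-end.

```python
class BatchUpdateFrontEnd:
    """Hypothetical front-end that reads remotely and queues changes locally."""

    def __init__(self, back_end_stock):
        self.back_end_stock = back_end_stock  # stands in for the remote back-end
        self.unforwarded = []  # (item, quantity_delta) pairs not yet sent

    def order(self, item, quantity):
        # The change is recorded locally; no wait on the back-end.
        self.unforwarded.append((item, -quantity))

    def current_stock(self, item):
        # Read from the back-end, then factor in unforwarded changes.
        remote = self.back_end_stock[item]
        local = sum(delta for name, delta in self.unforwarded if name == item)
        return remote + local

    def flush(self):
        """Send the queued changes to the back-end in one batch."""
        for item, delta in self.unforwarded:
            self.back_end_stock[item] += delta
        self.unforwarded.clear()
```

Until `flush` runs, the back-end's own view is stale, while `current_stock` on the front-end already reflects the queued orders.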
The disparate paradigm, the least useful in integrating the two ends as it is actually not integration at all, has both the front-end and back-end systems keeping their own copies of the data without any synchronization. The lack of any data consistency limits the usefulness of this option, but it deserves mention as a distinct paradigm that can be chosen at least for certain business objects.
There is one situation where the disparate paradigm can be useful. Some companies choose to separate the Web customers and their existing sales methods (telephone, mail, store). For them, while the same products (at least some of them) need to be accessible at both ends, customers may be separated. All the customer information for shoppers on the Web is stored at the front-end only. The back-end may have one generic place holder for a Web customer. To get the correct ship-to address, the information is passed from the front-end to the back-end as part of the order. Some back-ends allow an order to contain a "ship to" different from that which appears with the customer data.
Although generally impractical, this paradigm is not without advantages as there is no need to wait at either end for data accesses and changes or for the other system to be up, and it has merit when used as part of a hybrid method. One example of such a hybrid method deals with inventory. During periods of high inventory levels, we can use the disparate paradigm with inventory stored without any synchronization at both ends; when either end recognizes inventory becoming low, it warns the other end to switch over to a synchronous replication mode -- the rationale being that when inventory is high, we do not need to check inventory availability, affording a quick response. Availability needs to be questioned only when the inventory is low.
There are many business objects that can be analyzed. The main front-end application being investigated here is a merchant Web server. Customer, Catalog Item, Price, Inventory, Order, and Payment typify the business objects needed for Web shopping that are also stored in typical internal business process back-ends. Two types of users will be considered: external customers on the front-end and internal employees using a system for internal business processes on the back-end.
A Web merchant server front-end allows new customers to register and may allow for customer data to be imported from a back-end. The customer object includes obvious data such as name and address, but may also include profile information used to gear what kinds of marketing strategies are used to sell to the customer. There are two different patterns for access of customer data by the front-end (not including a customer userID and password required at each login):
The first option requires no customer information to be permanently stored at the front-end. Only the second option will be considered, as it enables more functionality for a merchant server such as specialized marketing to the customers based on their information.
Catalog item information is the center of a business. A catalog item gives the internal identifier for the product, product descriptions and attributes, and a price or its associated pricing object for more complicated pricing mechanisms.
A front-end views inventory only for availability. The back-end may have a much more elaborate view of inventory such as how much is at a particular warehouse, expected incoming and outgoing shipments, etc. The only inventory relevant to integration is product as opposed to materials used for product creation.
In most business-to-business environments, as well as some consumer environments, prices can be very complicated and become business objects of their own (as opposed to just being a single-value attribute of a catalog item).
The three actions performed on an order are price, place, and check status. An order consists of the purchaser and the items and quantities as well as other additional information such as ship-to addresses. Shoppers order from the front-end and the back-end. The back-end needs to know about all orders while a front-end needs to know only about its own orders unless customers are allowed to check the order status of non-Web orders via the Web. Typically, merchant servers need not know about any orders placed in the back-end. However, other front-ends (besides merchant servers) and possibly a merchant server may allow order status checks on back-end orders.
Similar to orders, payments are generally only tracked in a merchant server when their corresponding orders were placed there. Consumers pay at the time of purchase on the merchant server using a credit card or some standard payment protocol such as Secure Electronic Transaction (SET). In the business-to-business environment, payments are usually made in traditional manners and entered in the back-end internally. Computerized standard payment schemes will change how payments are made here, too.
Because customer data is vital in so many parts of the front-end, including marketing, product display, and even shopper log-in, independence and, therefore, replication are needed. Changes made in the front-end need not be seen in the back-end immediately, but those made in the back-end that affect pricing or terms given need to appear quickly in the front-end. So, if periodic replication is used, this period needs to be short in the back-end to front-end direction.
When the front-end has a tool to design Web pages and shopping scenarios based on the catalog, periodic replication with a relatively long period is the clear choice. Having the data at both the front-end and back-end is needed, but the absolute latest changes are not. An exception to this recommendation arises when there is no catalog tool and the business requires constantly up-to-date information. In this case synchronous replication is generally appropriate, with real-time access being acceptable for back-ends with quick response times and high availability.
When inventory is above some danger level (where the largest typical order of an item is able to be filled), periodic replication fits well. The period of replication need not be a fixed time interval. Back-ends can send updates to front-ends when levels get low, while front-ends update back-ends at fixed item order counts. Most merchant server systems display availability to shoppers on the Web browser. So, periods of low inventory (below the danger level) necessitate synchronized replication. Back-ends control the switching between synchronized and periodic replication by comparing the inventory level with the danger level at each inventory change they process (including updates sent from front-ends).
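The switching rule controlled by the back-end can be sketched as below. The names `replication_mode` and `BackEndInventory` are illustrative, and treating a level exactly equal to the danger level as still safe is an assumption; the text only distinguishes "above" from "below".

```python
def replication_mode(inventory_level, danger_level):
    """Pick the paradigm for an item from its current inventory level."""
    # Assumption: a level exactly at the danger level still uses periodic.
    return "synchronous" if inventory_level < danger_level else "periodic"


class BackEndInventory:
    """Hypothetical back-end record that re-checks the mode on every change."""

    def __init__(self, level, danger_level):
        self.level = level
        self.danger_level = danger_level
        self.mode = replication_mode(level, danger_level)

    def apply_change(self, delta):
        """Apply an inventory change; return True if the mode switched."""
        self.level += delta
        new_mode = replication_mode(self.level, self.danger_level)
        switched = new_mode != self.mode
        self.mode = new_mode
        return switched
```

A `True` return value is the point at which the back-end would tell the front-end to change how it replicates that item.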
Enterprise Resource Planning systems provide the ability to create extremely complicated pricing schemes based on a number of factors: the product, the customer, the quantity, the payment method, and many others. When such complicated methods are being used, it is unreasonable to attempt to duplicate this logic in a merchant server. In this case, real-time access of pricing in the back-end must be used by the front-end. The merchant server will not need to send updates, as pricing changes will have to be made in the ERP system. On the other hand, when pricing is straightforward, based on just the product or just the product and a customer classification, then replication makes sense (in a fashion similar to that described in Catalog Item).
For integration with a back-end that provides quick response time to order status checks, a form of the real-time access and update works well. Incoming orders are fed to the back-end during order placement at the front-end. Order status requests can go to the back-end. On the other hand, when back-ends cannot be relied on for quick turnaround, replication needs to be used -- synchronous when highly accurate order status checks are required, periodic when not.
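The recommendation for orders reduces to a two-question decision, sketched here. The function name and its boolean parameters are hypothetical; what counts as a "quick" back-end or a "highly accurate" status check is left to the business.

```python
def order_status_paradigm(back_end_is_quick, needs_accurate_status):
    """Map the two considerations for orders to a paradigm name."""
    if back_end_is_quick:
        return "real-time access and update"
    if needs_accurate_status:
        return "synchronous replication"
    return "periodic replication"
```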
For merchant servers handling consumer-type purchases (via credit card or a protocol such as SET), payments are typically replicated in a synchronous fashion during the order processing. The payment data on the front-end is a subset of that on the back-end, representing only the payments for front-end orders. In business-to-business environments where credit is used, periodic replication makes sense. All payments for businesses allowed access to the merchant server would need to be replicated. Payment data need not be immediate, allowing for the replication to be periodic.