Alexandre R. SILVA <firstname.lastname@example.org>
Military Institute of Engineering
Michael A. STANTON <email@example.com>
Universidade Federal Fluminense
Although the need for public key infrastructure (PKI) has long been recognized for the widespread use of secure communication, the lack of easily available implementations has severely limited the use of public key cryptography to a small number of specific applications, with specialized solutions. The article reviews the PKI standards and discusses several proposed solutions to the problem of certificate path processing, before presenting a new algorithm, called dynamic path determination, which is designed to address a number of realistic wide-scale applications of this technology. The new algorithm is included in Pequi, a PKI implementation, based on the integration of publicly available software. Pequi provides a network of certification authorities, which use the lightweight directory access protocol to maintain a distributed repository of certificates. The current prototype retrieves all certificates needed to construct a certification path, which it then proceeds to validate.
[Note: Pequi, pronounced "pickee," is a fruit-bearing tree of the scrublands of the Central Highlands of Brazil.]
The most efficient way to provide security for communication between parties within a communication network is the use of cryptography. However, in order for this mechanism to keep information secure, safe ways of distributing keys must exist. Basically, two kinds of cryptography are employed: symmetric and asymmetric.
With symmetric cryptography, as the number of users, N, increases, the overall number of keys, corresponding to each pair of those users, increases quadratically in N. Besides that, the keys have to be distributed one by one. This means there is no easy way to distribute secret keys to a large number of users, using only symmetric cryptography.
If asymmetric cryptography is used, then the number of keys increases linearly with the number of users; that is, each user has just one pair of keys, no matter how many communication partners he or she has. The private key is kept secret, while the public one is propagated to all interested users, by automatic, electronic means. This kind of cryptography can also be used to distribute secret keys automatically by means of the partners' public keys.
Despite the advantages of using asymmetric keys, the binding between a public key and a given user cannot be guaranteed. On the other hand, a trusted third party (TTP), called a certification authority (CA), can be used to sign public keys digitally, providing confidence in this binding. The document containing this binding information is known as a digital certificate, or simply as a certificate.
To verify the validity of a partner's public key, the user checks the CA's signature on the certificate with the CA's own public key. If this CA is not trusted by the user, then he or she must check a chain of certificates, and their associated signatures, which provides a certification path between the trusted CA and the communication partner.
Public key infrastructures (PKIs) -- which are responsible for managing the above process, as well as for the creation, publishing, and revocation of certificates, among other functions -- have been recognized as efficient, and are being widely employed to assist in securing communications. Nevertheless, the lack of easily available implementations in the public domain limits the use of PKIs to a small number of limited and specific applications.
This paper addresses the problem of using PKIs in large-scale distributed systems. A comparative study of PKI standards is presented, and emphasis is given to the discussion of certification path processing methods. The main result of this study is a proposal for a new path processing method and a prototype implementation to demonstrate its viability. This experimental tool is based on the integration of publicly available implementations of relevant server software and a client program, which retrieves the required certificates from distributed repositories in order to construct a path and perform its validation.
PKIs are defined in  as the set of hardware, software, people, policies, and procedures needed for the creation, management, storage, distribution, and revocation of certificates, based on public key cryptography. The main role of a PKI is the management of keys and certificates in a trusted and efficient way for its users.
To take advantage of certificates within large distributed systems, must exist efficient ways to make them available to all system users. One solution is the use of directory services as repositories, to store information about system users and entities and allow easy access to this information by different applications. Usually, directory services follow the International Telecommunications Union-Telecommunications (ITU-T) X.500 series of recommendations . The lightweight directory access protocol (LDAP) , based on X.500, is also often used to access X.500 directory services used as PKI repositories.
The X.500 directory service model is based on entries, which correspond to information about objects. An entry is a set of attributes and has a characteristic name known as distinguished name (DN) that provides unique identification.
The goal of PKI standardization is to provide interoperability, therefore allowing its use in distributed and heterogeneous environments. There are currently two open PKI standards: ITU-T Recommendation X.509 v3 and Internet Engineering Task Force (IETF) Public Key Infrastructure X.509 (PKIX). Both these standards will be discussed in what follows.
Recommendation X.509 defines a framework for authentication services to directory users . Both applications and services can be considered to be users. This recommendation provides a way for communication partners to retrieve mutual authentication information, when and as needed. Two authentication levels are described: simple and strong. The latter, which is more secure, relies on asymmetric cryptography. Its main advantage is the use of certificates, issued by CAs, which may be stored as attributes in directories and made available to the public.
The certificate defined by X.509 is composed of basic fields, like version, serial number, issuer DN, subject DN, validity period, and public key information, among others, and extensions. The main role of these extensions is to allow the addition of new fields to certificates, without any need to modify their coding structure, which uses the distinguished encoding rules, a subset of basic encoding rules . This eases design and implementation of security policies in large-scale distributed environments. Recommendation X.509 defines extensions to perform the following functions: provision of information about public keys and policies; provision of alternative names to issuers and subjects; certification path constraining; inclusion of reason codes and sequence numbers to certificate revocation lists (CRLs); and splitting revocation information into a number of CRLs or merging it from several sources into one CRL.
To describe the notation defined by X.509 to represent certificates, the following relationships between CAs G, H, and J are initially assumed (see figure 1): G is directly subordinated to J in hierarchy; H is directly subordinated to J in the same hierarchy; G and H are not directly subordinated to each other. Therefore, the notation is:
Certification paths are sets of certificates constituting a chain of trust between two entities, one of which needs to authenticate the other though the former is able to validate the public key of the latter. The method of determining a path can vary. An easy way to do this is to make available a hierarchy of CAs, so that users of this hierarchy can establish certification paths, by accessing only the directory, without any previous information. Figure 1 shows an example of a CA hierarchy.
Figure 1. Example of CA hierarchy
In the above example, for A to establish a certification path to B, the following certificates are retrieved: D<<G>>, G<<J>>, J<<H>>, H<<F>>, F<<B>>.
Usually the number of certificates to be retrieved can be reduced in a number of ways:
The Internet X.509 PKIX is an effort of IETF PKIX Working Group  to provide deterministic and automatic identification, authentication, access control, and authorization functions on the Internet. Its establishment was motivated by the fact that X.509 defines a very general framework, which allows the coexistence of several X.509-compliant implementations incompatible between themselves.
This proposed standard defines certificate and CRL protocols and profiles, as well as imposes a number of restrictions designed to enhance PKI application interoperability and management, without the need for large bandwidth, real-time IP (Internet protocol) connectivity, or high connection availability . Although PKIX is based on X.509, it does not require the use of X.500 directory systems; neither does it forbid them. Therefore, certificate and CRL distribution methods, other than directories, may also be used.
PKIX defines certificate and CRL format and semantics for Internet use, determining a common base for applications that require both high interoperability and limited requirements for special purposes, such as e-mail, World Wide Web, and IPSec. The basic certificate fields and extensions are those defined in X.509, with the authority information access extension added to guide applications to online validation services. To enforce interoperability, the following extensions are required to be handled by PKIX-compliant certificate processing applications: authority key identifier, basic constraints, key usage, certificate policies, policy constraints, name constraints, subject alternative name, and extended key usage.
In order to apply LDAPv2 as certificate and CRL repositories, PKIX defines an operational protocol, describing the access operations to repositories via LDAP , and an information storage schema, describing attributes and object classes to be used by LDAP servers acting as PKIX repositories .
No method to retrieve certificates constituting a certification path is specified by PKIX, although an algorithm to validate such a path is described in . A PKIX-compliant implementation does not necessarily need to adopt this particular algorithm, but any alternative must be functionally equivalent in its external behavior.
The basic validation algorithm defined assumes that all valid paths begin with certificates issued by only one CA, called the most trusted CA, which is defined by policy. This can be the root CA in the PKI hierarchy (self-signed certificate), the issuing CA of the verifying user's certificate, or any other suitable choice. The path validation procedure is the same, independent of the choice of the most trusted CA.
The processing of a certification path in order to verify its validity is composed of two steps: determination -- when certificates are retrieved from a repository and the path is constructed -- and validation -- when each certificate has its integrity, validity period, and information related to semantics verified, in order to validate the constructed path.
From an examination of the definitions of X.509  and PKIX  certification path issues, one can conclude that the first approach is too general and no specific method to determine or validate paths is described. The second defines in detail only a basic path validation algorithm, and path determination is not subject to standardization.
In what follows, four existing proposals for path processing methods are presented and their applicability to large distributed corporate environments is discussed. Then, a new method is proposed and a prototype, implemented to show its viability, is described.
This method is widely implemented by commercial CAs and most utilized Web browsers for secure socket layer (SSL)  authentication, as well as secure multipurpose Internet mail extensions (S/MIME)  compliant e-mail clients.
A certificate chain is a structure composed of a set of certificates, from the CA immediately below the root CA to the user, signed and enveloped by the root CA of the hierarchy. To be able to authenticate users or entities of different hierarchies, the client stores all self-signed root CA certificates needed. The path validation usually consists of the basic algorithm extended to handle multiple self-signed certificates, as suggested in PKIX . Some of the methods applied to publish or retrieve certificate chains are S/MIME e-mail messages and access to distributed repositories via LDAP, among others.
Path processing with certificate chains and multiple self-signed root CA certificates is simple and easy to implement, since only the validation mechanism is necessary, as the determination process is performed at the time of certificate-chain signing and enveloping. Nevertheless, in large intra- and intercorporate environments, the maintenance and management of root CA certificate local lists for each and every user, in conformance with the security policy of the company, becomes extremely unwieldy. Users must be frequently updated about inclusion and exclusion requests for self-signed certificates from the system manager. If one of the users is not logged in or does not pay attention to a request at a given time, he or she risks applying unauthorized root CA certificates and causing serious financial or other kinds of losses.
The implementation of an additional server to manage automatically every client's local list could eliminate that burden, but the more servers a system uses, the more it is subject to unavailability caused by system failure or denial-of-service attacks. Besides, if an attacker succeeds in spoofing that server, all its clients get compromised, because he or she may impersonate the server and distribute to the system users any self-signed certificate.
In this method, proposed in , the client program previously checks and stores all local CA hierarchy to which its user belongs. This hierarchy is handled as a directed graph, where nodes represent CAs and arcs, the certificates that interrelate those CAs. According to the proposed algorithm, called HI, nodes are inserted in a new graph, beginning with the CA that issued the user's certificate subject to verification, until an intersection node with the local hierarchy graph is found. Each certificate has its integrity verified as it is inserted into this new graph. After the end of the determination task, the path validation is then performed.
As there is no restriction imposed on certification policies by this method, there can be multiple, alternative paths. Thus, the proposed algorithm is designed to determine the shortest of these. If one of these paths is invalid, then the process verifies another, in a sequence from the shortest to the longest.
This graph approach provides a unique, general, and flexible way of certification path determination, independent of the hierarchy format resulting from the certification policies. Besides, there is only one self-signed certificate to rely on, making the hierarchy graph approach less vulnerable to spoofing attacks directly against users. The confidence in certificates from other PKIs is established by means of mutual cross-certification with any CAs in the local hierarchy.
However, as the number of DITs increases, the tasks of certificate retrieval, graph construction, and determination of the shortest path may become extremely onerous. In cases where several graphs are linked to each other at a large number of nodes, direct or indirectly, there may be a need to retrieve almost all certificates from such hierarchies. Thus, client implementations of the hierarchy graph method might consume more computational resources than their main functions.
A solution to make such a method practical for large distributed systems is to reduce traffic and the demand on processing resources by storing the most utilized PKIs in a local cache. Nevertheless, the effort required to manage and maintain the cache also compromises the practicality of such an implementation.
Another alternative might be the use of a cache together with a PKI information server, in order to keep clients updated. The disadvantage of such an approach is an increase in the probability of system unavailability due to accidental failure or denial-of-service attacks, as well as vulnerability to spoofing attacks against the server, compromising all its users, as was already pointed out in the previous method.
A service to perform all path processing tasks in order to authenticate users and entities is proposed in , where certificate paths are processed by a data certification server (DCS) -- a TTP, defined by PKIX in , whose main functions are signature validation and the provision of updated certificate status information.
To authenticate a PKI user, the client sends an authentication request to the DCS, together with the user's certificate. The DCS retrieves all needed certificates from distributed repositories, determines and validates the path, and sends the results back to the client with a token that allows several transactions during a certain time interval, without the need to repeat such authentication.
The client implementation is extremely simple, as it performs effectively neither determination, nor processing of certification paths. On the other hand, the whole security of the system practically relies on one server. Consequently, if failure, denial-of-service, or spoofing attacks occur, all users and entities are compromised. To protect such a server, strong additional security measures must be provided. In addition, for an implementation of a DCS to provide such a service means an increase in cost of an already complex high-end server.
The determination of certification paths, as proposed and discussed in , is done by the repository access server. After receiving a request, such a server determines the path and sends back to the user all the certificates that comprise it. The client remains responsible for validating the path found. The protocol employed for repository access operations is LDAP.
In large corporate environments, the implementation of such a service in a LDAP server could lead to a processing overload and its main activity -- to provide access to repositories -- might be compromised. Besides, the modification of an already consolidated and widely employed protocol such as LDAP would compromise its portability, scalability, and interoperability, although no security vulnerability is added, because validation is done by the client.
The method presented below aims at producing a solution for corporate environments that could be implemented simply, without the need for additional servers. An experimental implementation of the method is also described.
Before the definition of such a method, an essential step is to outline a certification policy and the environment in which it will be employed -- large organizations with geographically distributed branch offices, needing to communicate with other similar organizations, each with its own CA hierarchy. The policy outlined for such an environment, illustrated in figure 2, is:
Figure 2. Certification policy application example.
The method of dynamic path determination constructs a path as each certificate is retrieved from a repository via LDAP, without algorithms to handle graphs and determine shortest paths. The simplicity of such method resides in the fact that only one certification path is possible, because only root CAs are authorized by certification policy to cross certificate root CAs of other hierarchies and there is no transitivity of trust between such root CAs, as described above.
The process of certificate path verification needs a local list stored in the client environment, containing all certificates that constitute the path between the user's issuing CA and its hierarchy root CA, including the latter. Such a local list is called a local trust chain. PKIX path validation algorithm is applied to this method after path determination, defined as follows:
The resulting certification path to be validated begins with the certificates of the local trust chain and ends with the certificate of the user to be verified.
As only the local trust chain is stored at the client, there is no need for additional servers to manage local PKI information for each client. Thus, implementation costs are reduced. In addition, interoperability is improved, as there is no need to define a new server standard or modify some existing protocol. In order to reduce the traffic generated by the proposed verification method, a local cache could be used to store the most recently or the most frequently used certification paths.
A prototype called Pequi was implemented in C for the Red Hat 5.1 Linux environment in order to demonstrate the practicality of the proposed dynamic path determination method. This prototype performs the whole verification process automatically, accessing distributed repositories via LDAP and making use of referrals. Strong authentication to control access to the repository is provided by Kerberos, and the functions of certificate handling and certification path validation are performed by Oscar. The libraries employed by Pequi are described below.
The Application Program Interface (API) of KTH-Kerberos, compatible with Kerberos v4 and developed at the Royal Institute of Technology, Sweden, allows the implementation of several authenticated clients and servers  such as telnet, FTP, popper, rsh, and login. This API is used directly by LDAP to provide strong authentication for repository access.
The LDAP API, implemented in C, defines synchronous and asynchronous interfaces to access directories, and tools to handle entry attributes and values. OpenLDAP 1.0.2  was the implementation used because of its greater portability and the larger number of bug fixes. This API is based on the original implementation from the University of Michigan .
Version 3.0b6 of the Open Secure Certificate ARchitecture (Oscar), implemented in C++ at the Distributed Systems Technology Centre of the Queensland University of Technology, Australia, provides a large number of PKIX functions required to manage PKIs, manipulate and validate certificates, and validate certification paths .
A limitation of Oscar's certification path validation function restricts the validation of paths to those that begin with self-signed certificates. Therefore, the local trust chain described earlier could not be implemented. Only the local root CA certificate is used as the beginning of certification paths.
It is worth noting that each repository access performed by Pequi retrieves all certificates and cross certificate pairs contained in the entry found. Such a feature reduces network traffic and allows the use of multiple certificates by users, one for each desired usage.
The prototype does not implement CRL queries to verify certificate status, but this may easily be inserted in the future. The PKI management functions, such as certificate publishing, key pair generation, certification request and issuance, and repository entry maintenance are provided by command tools already implemented by OpenLDAP and Oscar.
This article discussed issues related to the establishment of trust chains between communication partners by means of certification paths. Since the current standards, X.509 and PKIX, do not completely cover such issues, allowing the possibility of noninteroperable, albeit compliant, implementations, and only PKIX imposes restrictions on path validation, we identified the need to study certification path verification methods, and especially the path determination process.
Four path verification methods were described and discussed: certificate chains, hierarchy graphs, certification path validation service, and a modified LDAP server. The proposed method, dynamic path determination, was shown to have advantages over the other methods discussed in large distributed corporate systems. Some of these advantages were:
Also, some drawbacks were identified, and a few solutions suggested:
Some features still need to be implemented for this prototype to be useful in large-scale applications: a local trust chain to store local hierarchy CA certificates; a valid path cache to store frequently or recently verified paths; a personal secure environment for the secure storage of the user's private key, the local trust chain, and the cache of valid paths; and finally the inclusion of CRLs stored in repositories and used to provide information about certificate status.