A Proposal on a Privacy Control Method for Geographical Location Information Systems

Michiko IZUMI <michi-i@is.aist-nara.ac.jp>
Nara Institute of Science and Technology
Japan

Sohgo TAKEUCHI <sohgo@csl.sony.co.jp>
Sony Computer Science Laboratory
Japan

Yasuhito WATANABE <riho-m@sfc.wide.ad.jp>
Keisuke UEHARA <kei@wide.ad.jp>
Keio University
Japan

Hideki SUNAHARA <suna@wide.ad.jp>
Nara Institute of Science and Technology
Japan

Jun MURAI <jun@wide.ad.jp>
Keio University
Japan

Abstract

In the mobile computing environment, binding the cyberworld to the real world is an important service. We have developed a geographical location information (GLI) system that binds the logical location of network entities to their geographical location information. Because location can reveal a person's activities, privacy control in the GLI system is important. In this paper, we propose a privacy control scheme for our GLI system. The scheme provides the following capabilities:

  1. manage public and private information together
  2. protect private information from unauthorized access
  3. access private information without communicating directly with its owner

We are implementing a prototype GLI system with a privacy control mechanism. A preliminary evaluation of the prototype system is also given.


1. Introduction

With the advance of computing and internetworking technology, the public today has fairly wide knowledge of the Internet, and the mobile computing environment, including wireless communications and navigation systems, has become popular. As fixed and mobile computers become omnipresent on the Internet, their physical location information is valuable for many kinds of Internet applications.

We have developed the GLI system to make such location information available on the Internet[1]. The GLI system provides a way to map between the GLI of entities and their identifiers on the Internet. We are currently using the system in the InternetCAR Project[2] to manage the location information of moving cars. However, the current GLI system does not have any privacy control mechanism.

Many people do not like the prospect of having their every move tracked. This concern is legitimate, given that somebody's activities can often be inferred from where that person is. Hence, mechanisms for protecting privacy are necessary. Through experiments in the InternetCAR Project, we recognized that privacy control in the GLI system is very important and that the integrity of the GLI is also essential.

Thus, we concluded that GLI needs to be protected against unauthorized disclosure and modification. In this paper, after we outline our GLI system, we consider the security requirements of the GLI system. Then we propose a concrete privacy-protecting security model for the system. We have implemented this model over a prototype of the GLI system. A brief description of our prototype implementation is also given.

2. GLI System

Considering the intrinsic requirements for location service on the Internet (e.g., global scalability, reciprocity, and serviceability), we have proposed the GLI system to bind a logical location of network entities to the GLI.

In this section, we describe the structure of our GLI system and how to register the data and provide the location information to the client.

2.1 Brief outline of the system

To access entities in a practical way, whether fixed and mobile computers or noncomputer entities with network connectivity, we need to obtain the GLI of each entity connected to the Internet. Put another way, these different deployment environments require a correspondence between spatial location information and identifiers on the Internet. Thus, we have developed the GLI system to bind a logical location of network entities to the GLI. In the first prototype, this system was composed of the following three elements:

  1. servers, which manage the GLI in databases
  2. agents, which obtain the geographical location data from a device, such as a global positioning system (GPS), and register them with servers
  3. clients, which make use of the location data by sending queries to servers

Table 1 lists the GLI parameters.

Table 1: GLI Parameters

Parameter   Description
Location    Latitude, longitude, altitude
Velocity    North, east, and up components
Device      GPS, etc.
Datum       WGS-84, OSGB-36, NAD-27, etc.
Data type   C/A, P, RDGPS, Kinematic, etc.
Time        Time of fix

Basic GLI parameters include location, velocity, device type, datum, data type, and time. Location is expressed as latitude, longitude, and altitude coordinates. Velocity is expressed as a combination of speeds in three directions: north, east, and up. Device represents the type of device used to determine the location. Datum identifies the geodetic reference system of the coordinates and is used to translate the GLI into coordinates based on the local datum of an area. Data type represents the accuracy class of the fix. Time represents the time when the GLI was fixed.
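To make Table 1 concrete, the sketch below holds one GLI fix in a Python record and derives the (direction, speed) vector of Section 4.1 from the north and east velocity components. The field names, the units (degrees, metres, metres per second), and the epoch-based time of fix are our assumptions, not the GLI system's actual data format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GLIRecord:
    """One GLI fix with the parameters of Table 1 (field names are illustrative)."""
    latitude: float    # degrees
    longitude: float   # degrees
    altitude: float    # metres (assumed unit)
    v_north: float     # velocity component towards north, m/s (assumed unit)
    v_east: float      # velocity component towards east, m/s
    v_up: float        # vertical velocity component, m/s
    device: str        # e.g. "GPS"
    datum: str         # e.g. "WGS-84"
    data_type: str     # e.g. "C/A", "RDGPS"
    fix_time: float    # time of fix, seconds since the Unix epoch (assumed)

    def speed(self) -> float:
        """Horizontal ground speed computed from the north and east components."""
        return math.hypot(self.v_north, self.v_east)

    def direction_deg(self) -> float:
        """Direction of travel in degrees clockwise from north, in [0, 360)."""
        return math.degrees(math.atan2(self.v_east, self.v_north)) % 360.0


fix = GLIRecord(34.73, 135.73, 100.0, 3.0, 4.0, 0.0,
                "GPS", "WGS-84", "C/A", 0.0)
print(fix.speed(), round(fix.direction_deg(), 2))  # 5.0 53.13
```

The up component is kept in the record but ignored when computing direction, since a ground vehicle's heading is normally taken in the horizontal plane.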

For scalability, in the second prototype system we separated the servers into area servers and home servers: an area server gathers data from the entities in the area it covers, and a home server maintains the personal data of each entity.

This system makes it possible to look up the identifier of an entity on the Internet from its physical location information and to look up the location of an entity from its identifier. In other words, a client can look up entity identifiers by specifying a location as the key (the query "who is there?") and vice versa (the query "where are you?"). As a result, our system allows users to search for various entities in the real world through the Internet and to obtain up-to-date location information about specified entities in real time. The current system uses an IP address or FQDN (fully qualified domain name) as the identifier in the server database.
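The two query directions can be illustrated with a toy in-memory index keyed by IP address or FQDN, as in the current system. The class and method names are hypothetical, the bounding-box search is a linear scan (a real server would presumably use a spatial index), and boxes that cross the 180° meridian are not handled.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees


class LocationIndex:
    """Toy in-memory index serving both query directions of the GLI system.

    Keys are IP addresses or FQDNs, as in the current system.
    """

    def __init__(self) -> None:
        self._where: Dict[str, Position] = {}

    def update(self, identifier: str, position: Position) -> None:
        """Store the latest reported position of an entity."""
        self._where[identifier] = position

    def where_are_you(self, identifier: str) -> Optional[Position]:
        """Identifier -> location; None if the entity is unknown."""
        return self._where.get(identifier)

    def who_is_there(self, south: float, west: float,
                     north: float, east: float) -> List[str]:
        """Location -> identifiers: every entity inside the bounding box."""
        return sorted(
            ident for ident, (lat, lon) in self._where.items()
            if south <= lat <= north and west <= lon <= east
        )


index = LocationIndex()
index.update("car1.example", (35.0, 135.0))
index.update("car2.example", (36.0, 139.0))
print(index.who_is_there(34.5, 134.5, 35.5, 135.5))  # ['car1.example']
print(index.where_are_you("car2.example"))           # (36.0, 139.0)
```

Because the raw identifier is the key in both directions, anyone who can query this index can link a person's machine to its location, which is exactly the exposure the privacy control scheme addresses.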

The effectiveness of the GLI system was shown through experiments in the InternetCAR Project[2]. The system is independent of machine architecture and has been tested on BSD/OS, SunOS, and NEWS-OS.

2.2 The role of each element

The current GLI system is composed of four elements: home server, area server, agent, and client. In this section, we describe the role and the behavior of each element.

3. Requirements of Privacy Control Method

Some GLI information is valuable and can be made available to everybody on the Internet. For example, the geographical location and speed of entities can be converted into traffic jam or parking lot information. At the same time, our system also holds private information, such as the user name of the owner of each entity. This private information must be protected, but the current GLI system does not have a privacy control mechanism.

This section discusses topics related to secrecy and integrity in the GLI system: public or private information, the identifiers on this system, how to protect the personal data, and communication among system elements. We are especially concerned with finding a way to balance the need for privacy control against the need to maintain processing speeds and other measures of performance.

Public or private?

We basically define the following data stored in our system as public: geographical location (latitude, longitude, altitude), movement of entities (direction, speed), and several attributes of entities (switch position of lights, outside temperature, and so on). Such public information is useful for everybody on the Internet; for example, the geographical location and speed of entities can be converted into traffic jam and parking information.

However, many people do not like the idea of having their moves tracked. Thus, private information (e.g., the user name of the owner of each entity) should not be available to everybody, and our scheme controls access to it. The GLI servers store encrypted private information but cannot decrypt it; they release these data only according to the access control lists (ACLs) attached to them. Decrypting private information is the responsibility of the receiving clients. In this way, we keep the private information secure.

Identifiers on this system

As stated above, the current system uses the IP address or FQDN as identifiers. However, not only raw personal data such as these but also any data from which they can be inferred are insecure. For example, if you frequently sent your GLI with an unprotected identifier, an intruder could easily impersonate you. Identifiers should therefore be created only by their legitimate owners, and only after those owners have been authenticated.

This system requires that identifiers meet the following requirements:

How to protect personal data

Leaving private information unprotected, for everybody to see, is clearly undesirable. Processing (that is, protecting) the information is a simple solution. But where should the information be processed? Here we consider three places:

  1. On servers: This is the most common approach, and users can be authenticated at the same time. However, it fails if the server is not trustworthy.
  2. In transit between servers and agents: This depends on network conditions, and failures occur easily.
  3. On agents: This increases the amount of processing on each agent, whereas a thin client (agent) is desirable in the mobile and ubiquitous computing environment[3].

We should determine when communication takes place, that is, the protocol, taking disconnected operation into account.

Communication among system elements

We have to be concerned with the behavior of each component on the system, especially the communication and data protection components. The best systems will work in the following ways:

Unfortunately, a system that satisfies these requirements cannot be built in a practical way with ordinary computers. After some discussion, we therefore propose the following realistic security mechanism for our GLI system.

4. Design of privacy control scheme

We propose the following privacy control model, which uses several encryption protocols to make privacy control in the GLI system practical. Data encryption is a common way to protect information and to ensure its integrity. Moreover, combining several encryption protocols allows applications to authenticate users who hold the appropriate information, such as keys, or who can generate the appropriate data using the information and tools they have.

4.1 Notation for our system structure

First, we describe some terms that are employed in our GLI system:

GLI
GLI includes latitude, longitude, altitude, and vector (direction, speed). Attribute information includes the switch position of lights, the outside temperature, and so on.
t
The time when the GLI was fixed
pseudo ID
Identifier on our system that is generated uniquely on area servers, carries no meaning there, and is managed on home servers
ID
Identifier on the Internet or in the real world
HS
Home server information such as the host name
AS
Area server information such as the name of a place.
E(M)
The data M encrypted with a shared secret key (the group key described in Section 4.4)
ACL
Access control list for personal information
E(ID)+ACL
Encrypted ID with access control list

In the following sections, we explain the privacy control scheme on the latest GLI system using these terms.

4.2 Pseudo ID for the identifiers

As discussed in Section 3, this system needs protected identifiers to run on the Internet, so we cannot continue to use the IP address or FQDN as identifiers. Thus, the latest GLI system prototype introduces the pseudo ID, which cannot be guessed from public information.

What our system needs is a unique identifier that appears meaningless to unauthorized users. Such an identifier, the pseudo ID, is generated on the area server from a string that includes t (time) and some area server information. The pseudo ID and the real identifier are bound together and managed on home servers.

We considered two possible methods for generating identifiers: (1) hash algorithms such as SHA[4][5] or HMAC[6] and (2) an encryption chaining mode. We concluded that a hash function was the best choice for generating identifiers in this prototype.
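A minimal sketch of hash-based pseudo-ID generation on an area server follows. It assumes HMAC[6] with SHA-256 (a substitution for the SHA-1 of [4]) keyed with a secret known only to the area server. Keying is our assumption: t and the area server information are close to public, so an unkeyed hash of them alone could be recomputed by an outsider. The serial counter that keeps IDs unique within one time tick is also hypothetical.

```python
import hashlib
import hmac
import itertools


class PseudoIdGenerator:
    """Issues pseudo IDs on an area server from t and area server information."""

    def __init__(self, area_server_info: str, secret: bytes) -> None:
        self._info = area_server_info.encode("utf-8")
        self._secret = secret               # hypothetical per-server key, never sent out
        self._serial = itertools.count()    # separates IDs issued at the same t

    def issue(self, t: int) -> str:
        """Return a 64-hex-digit pseudo ID for a registration at time t."""
        message = b"|".join([
            str(t).encode("ascii"),
            self._info,
            str(next(self._serial)).encode("ascii"),
        ])
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()


gen = PseudoIdGenerator("Ikoma area server", secret=b"area-server-secret")
print(gen.issue(948326400))
```

Two registrations at the same t receive different pseudo IDs, and without the secret an observer cannot check a guessed (t, area server) pair against an ID.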

4.3 Not storing the plain text (raw data) on the server

An ideal security mechanism for our GLI system ensures that protected identifiers cannot be linked to GLI. It is even better if the GLI is rendered meaningless to intruders while legitimate users can still obtain the correct GLI. To meet these requirements, we use cryptography to process the GLI.

In this section, we outline the state of data on the area server and the home server. Area servers store GLI, t, the pseudo ID, and E(HS), where E(HS) is the home server information encrypted with a shared secret key. Home servers maintain another data set: the pseudo ID, AS, t, and E(ID)+ACL. Since neither server has a decryption routine (i.e., the servers do not understand what they store), they cannot communicate with each other directly: an area server cannot tell which home server an entry belongs to. Even if intruders succeed in capturing all of the data on these servers, they cannot read the private information without the secret key.

We must carefully balance this fine-grained security mechanism against realistic performance on the Internet. As a result, in this prototype we decided to use symmetric cryptography to protect information and guarantee its integrity, and asymmetric cryptography for authentication.
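The prototype used OpenSSL's EVP ciphers, which are not available in the Python standard library. As an illustration only, the sketch below builds a stand-in symmetric scheme from HMAC-SHA-256: a synthetic IV in the spirit of SIV mode (RFC 5297) followed by an HMAC counter-mode keystream. Encryption is deterministic, so the same ID under the same group key always yields the same E(ID), which would let a home server match a client's encrypted query without decrypting it; the trade-off is that equal plaintexts are recognizable as equal. The key-derivation labels and the 16-byte IV are our choices, and this construction is not a substitute for a vetted cipher.

```python
import hashlib
import hmac

IV_LEN = 16  # bytes of synthetic IV prepended to each ciphertext (our choice)


def _subkeys(group_key: bytes) -> tuple:
    """Derive independent encryption and MAC keys from the group key."""
    k_enc = hmac.new(group_key, b"gli-enc", hashlib.sha256).digest()
    k_mac = hmac.new(group_key, b"gli-mac", hashlib.sha256).digest()
    return k_enc, k_mac


def _keystream(k_enc: bytes, iv: bytes, length: int) -> bytes:
    """HMAC-SHA-256 in counter mode: block i = HMAC(k_enc, iv || i)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = iv + counter.to_bytes(8, "big")
        out += hmac.new(k_enc, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(group_key: bytes, plaintext: bytes) -> bytes:
    """Deterministic authenticated encryption: E(M) = IV || (M xor keystream)."""
    k_enc, k_mac = _subkeys(group_key)
    # Synthetic IV: a MAC over the plaintext, so equal inputs give equal outputs.
    iv = hmac.new(k_mac, plaintext, hashlib.sha256).digest()[:IV_LEN]
    stream = _keystream(k_enc, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(group_key: bytes, blob: bytes) -> bytes:
    """Invert encrypt(); raise ValueError on a wrong key or modified data."""
    if len(blob) < IV_LEN:
        raise ValueError("ciphertext too short")
    k_enc, k_mac = _subkeys(group_key)
    iv, body = blob[:IV_LEN], blob[IV_LEN:]
    stream = _keystream(k_enc, iv, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    expected_iv = hmac.new(k_mac, plaintext, hashlib.sha256).digest()[:IV_LEN]
    if not hmac.compare_digest(iv, expected_iv):
        raise ValueError("wrong group key or tampered data")
    return plaintext


e_id = encrypt(b"group-A key", b"alice@example")
assert e_id == encrypt(b"group-A key", b"alice@example")  # deterministic
print(decrypt(b"group-A key", e_id))  # b'alice@example'
```

Decrypting with another group's key raises ValueError instead of returning garbage, which matches the requirement that a client cannot read the private information of an agent in a different group; the recomputed IV also acts as an integrity check.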

4.4 The behavior of each element

In this section, we assume that each agent holds a secret key that is shared by a group of agents for encryption and decryption and is distributed in advance. We call this shared secret key the "group key." Groups are organized in a hierarchy, like IP address domains. A home server manages the agents of one or more groups that share the same secret key.

It is doubtful that a shared secret key can be distributed safely to every member of a huge group on the Internet, but we expect key distribution to work well for small groups. That is why groups are organized hierarchically: a huge group is divided into several small groups.
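One hypothetical way to realize a hierarchy "like IP address domains" is to model each group as an address prefix and let an agent use the key of the most specific group that contains it, so that each key only has to be distributed within a small group. The prefixes, keys, and longest-prefix rule are assumptions for illustration; key selection for agents that belong to multiple groups is still under investigation.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class GroupKeyTable:
    """Maps hierarchical agent groups (modelled as address prefixes) to group keys."""

    def __init__(self) -> None:
        self._keys: Dict[Network, bytes] = {}

    def add_group(self, prefix: str, key: bytes) -> None:
        self._keys[ipaddress.ip_network(prefix)] = key

    def key_for(self, address: str) -> Optional[bytes]:
        """Return the key of the most specific group containing the address."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._keys if addr in net]
        if not matches:
            return None
        return self._keys[max(matches, key=lambda net: net.prefixlen)]


table = GroupKeyTable()
table.add_group("10.0.0.0/8", b"key-for-large-group")
table.add_group("10.1.0.0/16", b"key-for-small-group")
print(table.key_for("10.1.2.3"))   # b'key-for-small-group'
print(table.key_for("10.2.0.1"))   # b'key-for-large-group'
print(table.key_for("192.0.2.1"))  # None
```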

4.4.1 The data on each element

As stated above, the GLI system is composed of four elements: the area server, the home server, the agent, and the client.

Area Server

The area server needs the routine for generating pseudo IDs, that is, the hash function. Area servers manage the pseudo ID together with the agent's encrypted home server information, the GLI, and the time. This server maintains public location information, and clients obtain valuable information from it, such as traffic jam information, through the query "who is there?"
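The area server's role can be sketched as follows, assuming a keyed SHA-256 hash for pseudo IDs and an in-memory table. E(HS) is kept as opaque bytes and never returned by "who is there?", which exposes only pseudo IDs and positions. The lookup() method, which returns the GLI for a pseudo ID obtained from a home server, is our assumption about how a "where are you?" query completes; all names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AreaEntry:
    latitude: float
    longitude: float
    t: int
    enc_home_server: bytes  # E(HS): stored as opaque bytes, never decrypted here


class AreaServer:
    """Public side of the GLI system: answers "who is there?" with pseudo IDs only."""

    def __init__(self, name: str, secret: bytes) -> None:
        self.name = name                # AS: area server information
        self._secret = secret           # hypothetical key for the pseudo-ID hash
        self._serial = 0
        self._entries: Dict[str, AreaEntry] = {}

    def _new_pseudo_id(self, t: int) -> str:
        """Keyed hash over t and the area server information."""
        self._serial += 1
        message = f"{t}|{self.name}|{self._serial}".encode("utf-8")
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def register(self, latitude: float, longitude: float, t: int,
                 enc_home_server: bytes) -> str:
        """Store a GLI report and return the pseudo ID issued for it."""
        pseudo_id = self._new_pseudo_id(t)
        self._entries[pseudo_id] = AreaEntry(latitude, longitude, t, enc_home_server)
        return pseudo_id

    def who_is_there(self, south: float, west: float,
                     north: float, east: float) -> List[Tuple[str, float, float]]:
        """Pseudo IDs and positions inside the box; no private data leaves."""
        return sorted(
            (pid, e.latitude, e.longitude)
            for pid, e in self._entries.items()
            if south <= e.latitude <= north and west <= e.longitude <= east
        )

    def lookup(self, pseudo_id: str) -> Optional[Tuple[float, float, int]]:
        """Latest GLI for a pseudo ID obtained from a home server."""
        entry = self._entries.get(pseudo_id)
        return None if entry is None else (entry.latitude, entry.longitude, entry.t)


area = AreaServer("Ikoma", secret=b"area-secret")
pid = area.register(34.73, 135.73, 1000, enc_home_server=b"\x8f\x02opaque")
print(area.who_is_there(34.0, 135.0, 35.0, 136.0))
print(area.lookup(pid))  # (34.73, 135.73, 1000)
```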

Home Server

Home servers manage the following data: time, the encrypted real identifier with ACL, a corresponding pseudo ID of the agent, and the latest area server that generated the pseudo ID. This server maintains the latest pseudo ID and area server information of each agent, thus answering the query "where are you?" from legal clients.

Neither server should have a decryption routine.
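A sketch of the home server follows. It assumes that an agent can be indexed by its E(ID), since the encrypted identifier is reused until the group key changes (see the agent description below), and that requester names have already been authenticated, a task for which Section 4.3 assigns asymmetric cryptography. Class, field, and requester names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass
class HomeEntry:
    pseudo_id: str
    area_server: str        # AS that issued the pseudo ID
    t: int
    acl: FrozenSet[str]     # requesters allowed to learn this agent's location


class HomeServer:
    """Keeps the latest pseudo ID and AS per agent, keyed by the opaque E(ID)."""

    def __init__(self) -> None:
        self._latest: Dict[bytes, HomeEntry] = {}

    def update(self, enc_id: bytes, pseudo_id: str, area_server: str,
               t: int, acl: FrozenSet[str]) -> None:
        """Record a registration; reports older than the stored one are ignored."""
        current = self._latest.get(enc_id)
        if current is None or t >= current.t:
            self._latest[enc_id] = HomeEntry(pseudo_id, area_server, t, acl)

    def where_are_you(self, enc_id: bytes,
                      requester: str) -> Optional[Tuple[str, str]]:
        """Return (pseudo ID, AS) if the requester is on the ACL, else None."""
        entry = self._latest.get(enc_id)
        if entry is None or requester not in entry.acl:
            return None
        return entry.pseudo_id, entry.area_server


home = HomeServer()
acl = frozenset({"bob"})
home.update(b"E(alice)", "pid-1", "Ikoma", 100, acl)
home.update(b"E(alice)", "pid-2", "Nara", 200, acl)
home.update(b"E(alice)", "pid-0", "Osaka", 50, acl)   # stale, ignored
print(home.where_are_you(b"E(alice)", "bob"))      # ('pid-2', 'Nara')
print(home.where_are_you(b"E(alice)", "mallory"))  # None
```

Because the ACL is checked on the server, a requester who still holds the group key but has been removed from the ACL learns nothing. After a key change the agent's E(ID) changes as well, so the old entry simply stops being updated.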

Agent

The agent needs the routine to encrypt the home server information and identifiers. Once encrypted, these values do not change and are reused until the key is altered. If the agent belongs to only one group, key selection is unambiguous; key selection for agents that belong to multiple groups is under investigation.

The data that are sent to servers are the following:

When the shared key is modified or the group membership changes, an unauthorized agent (for example, one that has just been removed from the group) may still be able to use the shared key for a while, depending on how quickly the new key is distributed. This is why we decided to attach the ACL to the data encrypted with the shared key.
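The agent's side can be sketched as below. The encryption function is passed in as a parameter (any deterministic symmetric cipher would do), and a hypothetical integer key version tells the agent when the group key has been redistributed, so E(HS) and E(ID) are computed once and reused until then. The message layouts follow the notation of Section 4.1 but are otherwise our assumptions.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# (group_key, plaintext) -> ciphertext; any deterministic symmetric cipher
Cipher = Callable[[bytes, bytes], bytes]


class Agent:
    """Builds registration messages and caches E(HS) and E(ID) per key version."""

    def __init__(self, real_id: str, home_server: str,
                 acl: FrozenSet[str], encrypt: Cipher) -> None:
        self._real_id = real_id.encode("utf-8")
        self._home_server = home_server.encode("utf-8")
        self._acl = acl
        self._encrypt = encrypt
        self._key_version: Optional[int] = None
        self._sealed: Tuple[bytes, bytes] = (b"", b"")

    def _seal(self, group_key: bytes, key_version: int) -> Tuple[bytes, bytes]:
        """Return (E(HS), E(ID)); re-encrypt only when the group key changes."""
        if key_version != self._key_version:
            self._sealed = (self._encrypt(group_key, self._home_server),
                            self._encrypt(group_key, self._real_id))
            self._key_version = key_version
        return self._sealed

    def to_area_server(self, lat: float, lon: float, t: int,
                       group_key: bytes, key_version: int) -> Dict[str, object]:
        """GLI, t and E(HS) for the area server."""
        enc_hs, _ = self._seal(group_key, key_version)
        return {"GLI": (lat, lon), "t": t, "E(HS)": enc_hs}

    def to_home_server(self, pseudo_id: str, area_server: str, t: int,
                       group_key: bytes, key_version: int) -> Dict[str, object]:
        """Pseudo ID, AS, t and E(ID)+ACL for the home server."""
        _, enc_id = self._seal(group_key, key_version)
        return {"pseudo ID": pseudo_id, "AS": area_server, "t": t,
                "E(ID)": enc_id, "ACL": self._acl}


calls = []


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """NOT encryption: a stand-in that records calls for this demo."""
    calls.append(data)
    return key + b":" + data


agent = Agent("alice", "home.example", frozenset({"bob"}), toy_cipher)
agent.to_area_server(34.73, 135.73, 100, b"k1", key_version=1)
agent.to_home_server("pid-1", "Ikoma", 100, b"k1", key_version=1)
print(len(calls))  # 2: E(HS) and E(ID) computed once for key version 1
agent.to_area_server(34.74, 135.74, 160, b"k2", key_version=2)
print(len(calls))  # 4: the key changed, so both values were re-encrypted
```

Caching keeps per-report work on the agent small, in line with the thin-client goal of Section 3.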

Client

The client requires both a decryption routine and an encryption routine; the latter is needed to request the location of a specified person ("where are you?"). The client sends requests to the applicable servers. If the client obtains the encrypted data successfully, it decrypts them and displays the result. Thus, the client can obtain all the public information provided by all agents but cannot see the private information of an agent that belongs to a different group.

4.4.2 GLI registration from agents to servers

Here is a flowchart of GLI registration from agents to servers.

4.4.3 Queries from client to servers

The query "who is there?" would be treated in the following way:

The query "where are you?" would be treated in the following way:

Our proposed model, that is, the latest prototype, can be used to address the requirements mentioned above. The next section gives a brief description of our prototype implementation.

5. Prototype implementation

We implemented a GLI system prototype with the privacy control mechanism described in Section 4. We used functions from the EVP library of OpenSSL-0.9.4[7] to encrypt the data and to generate pseudo IDs.

Our proposed model can be used to address the requirements of privacy control mentioned above. This prototype allowed us to verify the consistency of our location system and its security model.

6. Future work

We will continue to research the following issues and discuss improving the GLI system. We want to develop a more precise information security scheme considering the balance between theory and practice.

The term of validity: Pseudo ID expiration

Pseudo IDs should not remain fixed forever. We will consider having one-time pseudo IDs or setting a certain validity period, such as a day or a week, after which pseudo IDs expire.
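Both options can be sketched with a small registry that records when each pseudo ID was issued. Which server would hold such a registry, and the exact periods, are still open; the names and the use of integer seconds are assumptions.

```python
from typing import Dict, Optional

DAY = 24 * 60 * 60  # seconds


class PseudoIdRegistry:
    """Decides whether a pseudo ID is still valid.

    period=None models one-time pseudo IDs; otherwise an ID expires
    `period` seconds after it was issued (e.g. DAY or 7 * DAY).
    """

    def __init__(self, period: Optional[int]) -> None:
        self._period = period
        self._issued: Dict[str, int] = {}

    def issue(self, pseudo_id: str, now: int) -> None:
        self._issued[pseudo_id] = now

    def accept(self, pseudo_id: str, now: int) -> bool:
        """Return True if the ID may be used at time `now`."""
        issued_at = self._issued.get(pseudo_id)
        if issued_at is None:
            return False
        if self._period is None:
            del self._issued[pseudo_id]    # one-time: consumed on first use
            return True
        if now - issued_at < self._period:
            return True
        del self._issued[pseudo_id]        # expired: forget it
        return False


daily = PseudoIdRegistry(period=DAY)
daily.issue("pid-1", now=0)
print(daily.accept("pid-1", now=DAY - 1), daily.accept("pid-1", now=DAY))  # True False
```

Shorter periods make it harder to link successive reports of one agent, at the cost of more frequent re-registration.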

Illegal agent

We expect to address the problem of illegal agents using an authentication technique such as digital signatures.

Key distribution

We are now investigating solutions to the key distribution problem that use a KDC (Key Distribution Center), a CA (Certificate Authority), and KPS (Key Predistribution System)[9] or IDKMS (ID-based Key Management System)[10].

Investigation into the use of IPsec for data registration

When an agent registers data with the two servers, the area server and the home server, using this privacy control scheme, wiretapping is possible. In short, an intruder might obtain pairs of pseudo IDs and source addresses of the agent from IP packets. This is not always a grave issue, since the source address of the agent can vary because of its mobility. However, for an even more secure system architecture, we are investigating the use of IPsec as a transport for data registration, in addition to limiting the validity period of pseudo IDs.

7. Conclusion

Location information is valuable to many kinds of computing and networking applications on the Internet. However, there are fears that the inclusion of personal information, such as the names of mobile computer owners, could lead to new security risks, namely invasion of privacy.

In this paper, we have discussed the security requirements faced by location information services on the Internet, especially our GLI system. To enhance our GLI system's privacy control scheme, we considered several points, such as the kind of information, the disconnected operation, and the use of an encryption protocol.

We discussed a concrete privacy control model in the GLI system and concluded with a brief description of our prototype implementation, along with a discussion of related work.

Acknowledgments

The authors thank the members of Minato Lab, ITC, Nara Institute of Science and Technology, especially Ms. Mika Ito for providing support. We also thank the members of the WIDE Project, especially the rover working group.

References

[1] The Design and Implementation of the Geographical Location Information System
Proc. INET'96
Yasuhito Watanabe, Atsushi Shinozaki, Fumio Teraoka, and Jun Murai
[2] http://www.sfc.wide.ad.jp/InternetCAR/
[3] Virtual Network Computing
IEEE Internet Computing, vol. 2, no. 1, Jan./Feb. 1998
Tristan Richardson, Quentin Stafford-Fraser, Kenneth R. Wood, and Andy Hopper
[4] Secure Hash Standard
Federal Information Processing Standards Publication 180-1
(Supersedes FIPS PUB 180 -- 11 May 1993)
U.S. Department of Commerce/National Institute of Standards and Technology
http://csrc.nist.gov/fips/fip180-1.txt
[5] IP Authentication using Keyed SHA
RFC 1852 (September 1995)
[6] Keyed-Hashing for Message Authentication
RFC 2104 (February 1997)
[7] http://www.openssl.org
[8] Data Encryption Standard (DES)
Federal Information Processing Standards Publication 46-2
(Supersedes FIPS PUB 46-1 -- 22 January 1988)
U.S. Department of Commerce/National Institute of Standards and Technology
http://www.itl.nist.gov/fipspubs/fip46-2.htm
[9] Key Sharing without Communication: Key Predistribution System
Journal of the Institute of Electronics, Information, and Communication Engineers, vol. J71, no. 11, Nov. 1988, pp. 2046-2053
Tsutomu Matsumoto and Hideki Imai
(in Japanese)
[10] The ID-based Key Management System (IDKMS)
Internet draft, expires Jan 1999