A Proposal on a Privacy Control Method for Geographical Location Information Systems

Michiko IZUMI <michi-i@is.aist-nara.ac.jp>
Nara Institute of Science and Technology
Japan

Sohgo TAKEUCHI <sohgo@csl.sony.co.jp>
Sony Computer Science Laboratory
Japan

Yasuhito WATANABE <riho-m@sfc.wide.ad.jp>
Keisuke UEHARA <kei@wide.ad.jp>
Keio University
Japan

Hideki SUNAHARA <suna@wide.ad.jp>
Nara Institute of Science and Technology
Japan

Jun MURAI <jun@wide.ad.jp>
Keio University
Japan

Abstract

For the mobile computing environment, binding the cyberworld to the real world is an important service. We have developed a geographical location information (GLI) system that binds the logical location of network entities to their geographical location information. Privacy control is important in such a system, and in this paper we propose a privacy control scheme for our GLI system. The scheme provides the following capabilities:

manage public and private information together
protect private information from access by unauthorized users
let authorized users access private information without communicating directly with the owner

We have implemented a prototype GLI system with this privacy control mechanism. A brief description of the prototype system is also given.

1. Introduction
2. GLI System
- 2.1 Brief outline of the system
- 2.2 The role of each element
3. Requirements of Privacy Control Method
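4. Design of privacy control scheme
- 4.1 Notation for our system structure
- 4.2 Pseudo ID for the identifiers
- 4.3 Not storing the plain text (raw data) on the server
- 4.4 The behavior of each element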
5. Prototype implementation
6. Future work
7. Conclusion
Acknowledgments
References

1. Introduction

As computing and internetworking technology have advanced, the Internet has become widely known to the public, and mobile computing, including wireless communications and navigation systems, has become popular. With fixed and mobile computers now ubiquitous on the Internet, their physical location is valuable information for many kinds of Internet applications.

We have developed the GLI system to make such location information available on the Internet[1]. The GLI system maps the GLI of entities to their identifiers on the Internet. We are currently using the system in the InternetCAR Project[2] to manage the location information of moving cars. However, the current GLI system has no privacy control mechanism.

Many people do not like the prospect of having their every move tracked. This concern is legitimate, because a person's activities can often be inferred from where that person is. Hence, mechanisms for protecting privacy are necessary. Through our experiments in the InternetCAR Project, we recognized that privacy control in the GLI system is quite important, and that the integrity of GLI is also essential.

Thus, we concluded that GLI needs to be protected against unauthorized disclosure and modification. In this paper, after outlining our GLI system, we consider its security requirements. We then propose a concrete privacy-protecting security model for the system, which we have implemented in a prototype of the GLI system. A brief description of our prototype implementation is also given.

2. GLI System

Considering the intrinsic requirements for location service on the Internet (e.g., global scalability, reciprocity, and serviceability), we have proposed the GLI system to bind a logical location of network entities to the GLI.

In this section, we describe the structure of our GLI system and how to register the data and provide the location information to the client.

2.1 Brief outline of the system

To access entities in a practical way, including fixed and mobile computers and non-computer entities with network connectivity, we need to obtain the GLI of each entity connected to the Internet. In other words, such diverse deployment environments require a correspondence between spatial location information and identifiers on the Internet. We therefore developed the GLI system to bind the logical location of network entities to the GLI. In the first prototype, the system was composed of the following three elements:

servers, which manage the GLI in databases
agents, which obtain geographical location data from a device such as a global positioning system (GPS) receiver and register the data with servers
clients, which use the location data by sending queries to servers

Table 1 lists the GLI parameters.

Table 1: GLI Parameters

Parameter	Representation or examples
Location	Latitude-longitude-altitude
Velocity	North-east-up
Device	GPS, etc.
Datum	WGS-84, OSGB-36, NAD-27, etc.
Data type	C/A, P, RDGPS, Kinematic, etc.
Time	Time of fix

The basic GLI parameters are location, velocity, device type, datum, data type, and time. Location is expressed as a latitude, longitude, and altitude coordinate. Velocity is expressed as a combination of speeds in three directions: north, east, and up. Device indicates the type of device used to determine the location. Datum identifies the geodetic reference system of the coordinates and is used to convert GLI into coordinates based on the local datum of an area. Data type indicates the positioning method and hence the order of accuracy. Time is the time at which the GLI was fixed.
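As a minimal sketch of one GLI record, the Python dataclass below mirrors the parameters in Table 1. The field names, the units (meters, meters per second, seconds since the epoch), and the helper methods are assumptions for illustration, not the system's actual record format. The helpers show how the north-east-up velocity yields the direction and speed that section 4.1 calls the movement vector.

```python
import math
from dataclasses import dataclass


@dataclass
class GLI:
    """One GLI report; fields follow Table 1 (names and units are assumed)."""
    lat_deg: float   # location: latitude in degrees
    lon_deg: float   # location: longitude in degrees
    alt_m: float     # location: altitude in meters
    v_north: float   # velocity: north component, m/s
    v_east: float    # velocity: east component, m/s
    v_up: float      # velocity: up component, m/s
    device: str      # e.g. "GPS"
    datum: str       # e.g. "WGS-84"
    data_type: str   # e.g. "C/A", "RDGPS"
    fix_time: float  # time of fix, seconds since the epoch

    def ground_speed(self) -> float:
        """Horizontal speed from the north and east components."""
        return math.hypot(self.v_north, self.v_east)

    def heading_deg(self) -> float:
        """Direction of travel in degrees clockwise from north, in [0, 360)."""
        return math.degrees(math.atan2(self.v_east, self.v_north)) % 360.0


report = GLI(34.7, 135.7, 100.0, 3.0, 4.0, 0.0, "GPS", "WGS-84", "C/A", 0.0)
print(report.ground_speed())  # 5.0
```

For example, a velocity of 3 m/s north and 4 m/s east gives a ground speed of 5 m/s; the up component is kept but does not enter the horizontal speed.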

For scalability, the second prototype separates the servers into area servers and home servers: an area server gathers data from the entities in the area it covers, and a home server maintains the personal data of each entity.

This system makes it possible to look up the identifiers of entities on the Internet from physical location information, and to look up the location of an entity from its identifier. In other words, a client can look up entity identifiers by specifying a location as the key (the query "who is there?") and vice versa (the query "where are you?"). As a result, users can search for various entities in the real world through the Internet and obtain up-to-date location information about specified entities in real time. The current system uses an IP address or FQDN (fully qualified domain name) as the identifier in the server database.
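The two query directions can be pictured as two lookups over one table. The sketch below assumes a single in-memory table keyed by plaintext identifiers (as in the system before privacy control) and a rectangular latitude/longitude area for "who is there?"; the class and method names are hypothetical, and a real server would use a database and a spatial index.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Latitude/longitude rectangle used as the key of a "who is there?" query."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


class LocationTable:
    """Identifier <-> location bindings with plaintext identifiers (IP address or FQDN)."""

    def __init__(self) -> None:
        self._pos: dict[str, tuple[float, float]] = {}

    def register(self, ident: str, lat: float, lon: float) -> None:
        self._pos[ident] = (lat, lon)  # a newer report replaces the older one

    def where_are_you(self, ident: str) -> tuple[float, float] | None:
        return self._pos.get(ident)

    def who_is_there(self, area: Area) -> list[str]:
        return sorted(i for i, (lat, lon) in self._pos.items() if area.contains(lat, lon))


table = LocationTable()
table.register("car1.example.net", 35.0, 139.0)
table.register("car2.example.net", 34.7, 135.5)
print(table.who_is_there(Area(34.0, 135.0, 35.5, 136.0)))  # ['car2.example.net']
```

Because anyone who can query this table can answer both questions for any identifier, it also illustrates why the plaintext binding needs the protection discussed in section 3.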

The effectiveness of the GLI system was shown in experiments of the InternetCAR Project[2]. The system is independent of machine architecture and has been tested on BSD/OS, SunOS, and NEWS-OS.

2.2 The role of each element

The current GLI system is composed of four elements: home server, area server, agent, and client. In this section, we describe the role and the behavior of each element.

Home server: This server maintains the personal data of each entity in the current GLI system. It receives the query "where are you?" from a client and returns the result to the client.
Area server: This server stores in its database the GLI sent from agents located in the area it covers. It receives the query "who is there?" from a client and returns the results to the client. In this sense, the area server acts like a base station in a cellular phone network.
Agent: An agent runs on each entity, collects the GLI, and registers it with the servers. An agent also receives requests for the GLI from other entities and returns the GLI to the appropriate entities.
Client: Clients provide an interface between the user and the GLI system. They send the user's queries ("where are you?" and "who is there?") to the servers and return the replies to the user. A client may also request the latest GLI of a specified entity by accessing the entity directly.

3. Requirements of Privacy Control Method

Some GLI data are valuable and can be made available to everybody on the Internet. For example, the geographical location and speed of entities can be converted into traffic jam or parking lot information. At the same time, the system also holds private information, such as the user name of the owner of each entity. This private information must be protected, but the current GLI system has no privacy control mechanism.

This section discusses topics related to secrecy and integrity in the GLI system: public or private information, the identifiers on this system, how to protect the personal data, and communication among system elements. We are especially concerned with finding a way to balance the need for privacy control against the need to maintain processing speeds and other measures of performance.

Public or private?

By default, we define the following data stored in our system as public: geographical location (latitude, longitude, altitude), movement of entities (direction, speed), and various attributes of entities (switch position of lights, outside temperature, and so on). Such public information is useful to everybody on the Internet; for example, the location and speed of entities can be converted into traffic jam and parking information.

However, many people do not like the idea of having their movements tracked, so private information (e.g., the user name of the owner of each entity) should not be available to everybody. In our scheme, we control access to such private information. The GLI servers store private information only in encrypted form and cannot decrypt it; access to these data is controlled by access control lists (ACLs) attached to them. Private information is decrypted only by the receiving clients. In this way, private information is kept secure even from the servers themselves.

Identifiers on this system

As stated above, the current system uses the IP address or FQDN as the identifier. However, not only raw personal data such as these but also data from which they can be inferred are quite insecure. For example, if you frequently sent your GLI with an unprotected identifier, an intruder could easily impersonate you. Identifiers should therefore be generated only by their legitimate owners, and only when those owners are authenticated.

Identifiers in this system must meet the following requirements:

Identifiers cannot be inferred from other information.
Identifiers are valid only for a controlled period of time.
Only legitimate owners can generate identifiers.

How to protect personal data

Leaving private information unprotected, for everybody to see, is clearly undesirable. Processing the information, for example by encrypting it, is a simple solution. But where should the information be processed? We consider three places:

On servers: This is the most common approach, and users can be authenticated at the same time. However, it breaks down if the server is not trustworthy.
In transit between agents and servers: This depends on network conditions, and failures occur easily.
On agents: This inevitably increases the amount of processing on each agent, whereas thin clients (agents) are desirable in mobile and ubiquitous computing environments[3].

The communication protocol, that is, when and how the elements communicate, should be designed with disconnected operation in mind.

Communication among system elements

We must consider the behavior of each component of the system, especially its communication and data protection components. Ideally, the system would work as follows:

The identifiers and their data are stored separately.
Users (clients) can search and use the public location data (in area servers) whenever they need it.
Unauthorized users cannot search the private user data (in home servers).
There is a clear mechanism for determining whether a client is an authorized user.
Intruders cannot obtain meaningful private data.
The system behaves correctly even if communication is suddenly disconnected.

Unfortunately, a system that satisfies all of these requirements cannot be built in a practical way on ordinary computers. After some discussion, we therefore propose the following realistic security mechanism for our GLI system.

4. Design of privacy control scheme

We propose the following privacy control model for the GLI system, which relies on encryption protocols to remain practical. Cryptography is a common way to protect information and to guarantee its integrity. Moreover, combining encryption protocols allows applications to authenticate users who hold the appropriate information, such as keys, or who can generate the appropriate data with the information and tools they have.

4.1 Notation for our system structure

First, we define the terms used in our GLI system:

GLI: Latitude, longitude, altitude, and movement vector (direction, speed), together with attribute information such as the switch position of lights and the outside temperature
t: The time when the GLI was fixed
pseudo ID: A unique identifier on our system; it is generated on area servers, carries no meaning there, and is managed on home servers
ID: Identifier on the Internet or in the real world
HS: Home server information such as the host name
AS: Area server information such as the name of a place
E(M): Data M encrypted with a shared secret key (the key is described in section 4.4)
ACL: Access control list for personal information
E(ID)+ACL: Encrypted ID with access control list

In the following sections, we explain the privacy control scheme of the latest GLI system using these terms.

4.2 Pseudo ID for the identifiers

As discussed in section 3, identifiers must be protected for this system to run on the Internet, so we cannot continue to use the IP address or FQDN as identifiers. We have therefore introduced the pseudo ID, which cannot be guessed from public information, into the latest GLI system prototype.

What our system needs is a unique identifier that appears meaningless to unauthorized users. Such an identifier, the pseudo ID, is generated on the area server from a string that includes t (the time) and some area server information. The binding between a pseudo ID and the real identifier is managed on home servers.

We considered two methods for generating identifiers: (1) one-way hash algorithms such as SHA[4][5] or HMAC[6], and (2) encryption in a chaining mode. We concluded that a hash function was the best choice for generating identifiers in this prototype.
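A minimal sketch of hash-based pseudo ID generation follows. It assumes the input is the fix time, the area server information, and a random salt joined into one byte string; that encoding and the function name are hypothetical. It uses Python's hashlib.sha1 (SHA-1, FIPS 180-1[4]), whose 20-byte output matches the size reported for the prototype in section 5.

```python
from __future__ import annotations

import hashlib
import os


def make_pseudo_id(fix_time: float, area_server: str, salt: bytes | None = None) -> bytes:
    """Hash (t, area-server information, salt) into a 20-byte pseudo ID.

    The input encoding below is a hypothetical choice for this sketch.
    """
    if salt is None:
        # A fresh random salt keeps the ID unguessable from the public t and AS data.
        salt = os.urandom(16)
    material = f"{fix_time:.3f}|{area_server}|".encode("utf-8") + salt
    return hashlib.sha1(material).digest()  # SHA-1 digests are always 20 bytes


pid = make_pseudo_id(941500800.0, "as.example.net")
print(len(pid))  # 20
```

The salt is what makes the ID unpredictable: t and the area name alone are public, so hashing only them would let anyone recompute the pseudo ID. A present-day design would likely prefer SHA-256 over SHA-1.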

4.3 Not storing the plain text (raw data) on the server

Ideally, our security mechanism should ensure that protected identifiers cannot be linked to GLI. It is even better if GLI is meaningless to intruders while legitimate users can still obtain the correct GLI. To meet these requirements, we use cryptography to process GLI.

Here we outline the data held on the area server and the home server. Area servers store GLI, t, the pseudo ID, and E(HS), where E(HS) is the home server information encrypted with a shared secret key. Home servers store another data set: the pseudo ID, AS, t, and E(ID)+ACL. Because neither server has a decryption routine (i.e., the servers cannot interpret what they store), they cannot communicate with each other. Even if intruders succeed in capturing all of the data on both servers, they cannot decrypt the private data without the secret key.

We must carefully balance this fine-grained security mechanism against realistic performance on the Internet. As a result, we decided to use symmetric cryptography in this prototype to protect information and guarantee its integrity, and asymmetric cryptography for authentication.
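The prototype uses Triple-DES in CBC mode through OpenSSL (section 5). To stay self-contained, the Python sketch below instead builds a toy symmetric cipher from SHA-256 and HMAC; it is not Triple-DES and is not meant for real use. It illustrates two properties that the protocol in section 4.4 appears to rely on: E(M) is deterministic, so a client can regenerate an E(ID) that matches what a home server stored, and a wrong group key is detected by a failed integrity tag, which is one way a client can tell whether a decryption "goes well."

```python
from __future__ import annotations

import hashlib
import hmac

TAG_LEN = 16  # bytes of HMAC-SHA256 kept as an integrity tag


def _subkey(group_key: bytes, label: bytes) -> bytes:
    # Separate keys for the keystream and the tag, both derived from the group key.
    return hmac.new(group_key, label, hashlib.sha256).digest()


def _keystream(key: bytes, n: int) -> bytes:
    # SHA-256 in counter mode with a fixed start, so E is deterministic (toy only).
    blocks = (hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(n // 32 + 1))
    return b"".join(blocks)[:n]


def encrypt(group_key: bytes, m: bytes) -> bytes:
    """Toy E(M): keystream XOR, then a truncated HMAC tag over the ciphertext."""
    ks = _keystream(_subkey(group_key, b"enc"), len(m))
    ct = bytes(a ^ b for a, b in zip(m, ks))
    tag = hmac.new(_subkey(group_key, b"mac"), ct, hashlib.sha256).digest()[:TAG_LEN]
    return ct + tag


def decrypt(group_key: bytes, blob: bytes) -> bytes | None:
    """Return M, or None if the tag fails (wrong group key or modified data)."""
    if len(blob) < TAG_LEN:
        return None
    ct, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(group_key, b"mac"), ct, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    ks = _keystream(_subkey(group_key, b"enc"), len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))


key_a, key_b = b"group-A key", b"group-B key"
blob = encrypt(key_a, b"hs.example.net")
print(decrypt(key_a, blob))  # b'hs.example.net'
print(decrypt(key_b, blob))  # None
```

Deterministic encryption with a fixed keystream reveals when two plaintexts are equal and leaks their XOR, so a real deployment would need a proper cipher and mode; the sketch only demonstrates the two properties named above.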

4.4 The behavior of each element

In this section, we assume that each agent holds a secret key that is shared by a group of agents for encryption and decryption and is distributed in advance. We call this shared secret key the "group key." Groups are organized hierarchically, like IP address domains. A home server manages the agents of one or more groups, and the agents within a group share the same secret key.

It is doubtful whether a shared secret key could be distributed safely to every member of a very large group on the Internet, but we expect key distribution to work well for small groups. That is why groups are hierarchical: a large group is divided into several small ones.

4.4.1 The data on each element

As stated above, the GLI system is composed of four elements: the area server, the home server, the agent, and the client.

Area Server

The area server needs a routine for generating pseudo IDs, that is, a hash function. The data managed by an area server are the pseudo ID, the agent's encrypted home server information, the GLI, and the time. This server maintains public location information, from which clients obtain valuable information, such as traffic jam information, through the query "who is there?"

Home Server

Home servers manage the following data: the time, the encrypted real identifier with ACL, the agent's corresponding pseudo ID, and the latest area server that generated the pseudo ID. A home server thus keeps the latest pseudo ID and area server information of each agent and answers the query "where are you?" from authorized clients.

Neither server should have a decryption routine.

Agent

The agent needs a routine to encrypt its home server information and identifier. Once encrypted, these values do not change and can be reused until the key is changed. If the agent belongs to only one group, key selection is straightforward; key selection for agents that belong to multiple groups is under investigation.

The agent sends the following data to the servers:

To the area server: the encrypted home server information, the GLI, and the time the GLI was fixed. After sending these data, the agent requests its pseudo ID.
To the home server: the time, the encrypted identifier with ACL, the pseudo ID received from the area server, and information about the area server that generated the pseudo ID.

When the shared key is changed or the group membership changes, an agent that should not have access might still hold a usable shared key for some time, depending on how quickly the new key is distributed. This is why we attach an ACL to the data encrypted with the shared key.

Client

The client needs both a decryption routine and an encryption routine; the encryption routine is needed to request the location of a specified person with the query "where are you?" The client sends requests to the applicable servers. If it obtains the encrypted data successfully, it decrypts the data and displays the result. Thus, a client can obtain all the public information provided by all agents but cannot see the private information of agents that belong to a different group.

4.4.2 GLI registration from agents to servers

The steps of GLI registration from agents to servers are as follows.

Setup
1. The agent generates the encrypted data E(HS) and E(ID) using its group's shared key.
2. The agent creates an ACL listing the authorized clients, which hold the same shared key.
3. The agent obtains the GLI from the device, such as GPS, periodically.
Interaction
1. The agent sends the data set -- E(HS), GLI, time -- to the nearest AS. The mechanism for discovering the nearest AS relies on the original GLI system.
2. The agent requests a pseudo ID corresponding to the data set.
3. The AS accepts the data set and the request.
4. The AS generates a pseudo ID corresponding to the data set and stores the data and pseudo ID.
5. If the data set and its pseudo ID are stored successfully, the AS sends the agent the pseudo ID and its server information, such as the server name or the name of the location covered by the server.
6. If the pseudo ID and the AS information are received successfully, the agent creates the second data set -- time, E(ID)+ACL, pseudo ID, AS -- and sends it to its HS.
7. If the agent is authorized to register the data set in the HS database, the HS registers it and sends an acknowledgment to the agent.
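The registration steps can be sketched with in-memory servers as follows. Assumptions: E(M) is replaced by a keyed HMAC, which is enough here because neither server ever decrypts during registration (it would not support the decryption steps of section 4.4.3); the agent's authorization in step 7 is modeled as a hypothetical set of authorized agent names standing in for the asymmetric authentication of section 4.3; and all class and function names are illustrative.

```python
from __future__ import annotations

import hashlib
import hmac
import os
from dataclasses import dataclass, field


def opaque(group_key: bytes, m: bytes) -> bytes:
    """Stand-in for E(M): keyed and deterministic, but NOT decryptable."""
    return hmac.new(group_key, m, hashlib.sha256).digest()


@dataclass
class AreaServer:
    name: str                                  # AS information, e.g. a place name
    table: dict = field(default_factory=dict)  # pseudo ID -> (E(HS), GLI, t)

    def register(self, e_hs: bytes, gli: tuple, t: float) -> tuple[bytes, str]:
        # Steps 3-5: accept the data set, derive a pseudo ID, store it, reply.
        pid = hashlib.sha1(f"{t}|{self.name}|".encode() + os.urandom(16)).digest()
        self.table[pid] = (e_hs, gli, t)
        return pid, self.name


@dataclass
class HomeServer:
    authorized: set                            # hypothetical: agents allowed to register
    table: dict = field(default_factory=dict)  # E(ID) -> (pseudo ID, AS, t, ACL)

    def register(self, agent: str, t: float, e_id: bytes, acl: frozenset,
                 pid: bytes, as_name: str) -> bool:
        # Step 7: store the data set only for an authorized agent, then acknowledge.
        if agent not in self.authorized:
            return False
        self.table[e_id] = (pid, as_name, t, acl)  # keeps the latest entry per agent
        return True


def register_gli(agent: str, real_id: str, hs_name: str, key: bytes, acl: frozenset,
                 gli: tuple, t: float, area: AreaServer, home: HomeServer) -> bool:
    # Setup: encrypt the home server information and the real identifier.
    e_hs, e_id = opaque(key, hs_name.encode()), opaque(key, real_id.encode())
    # Steps 1-5: send (E(HS), GLI, t) to the nearest AS and get a pseudo ID back.
    pid, as_name = area.register(e_hs, gli, t)
    # Step 6: send (t, E(ID)+ACL, pseudo ID, AS) to the home server.
    return home.register(agent, t, e_id, acl, pid, as_name)


area = AreaServer("as-1")
home = HomeServer({"agent-1"})
ok = register_gli("agent-1", "car1.example.net", "hs.example.net", b"group-A key",
                  frozenset({"client-1"}), (34.7, 135.7, 100.0), 10.0, area, home)
print(ok, len(area.table), len(home.table))  # True 1 1
```

Note that the area server never sees the real identifier and the home server never sees the GLI; only the pseudo ID links the two tables.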

4.4.3 Queries from client to servers

The query "who is there?" would be treated in the following way:

Setup
1. A client displays a map of the area covered by the AS.
Interaction
1. The client sends the query "who is there?" with the area information to the appropriate AS.
2. Using the stored GLI, the AS finds the pseudo IDs in the specified area and replies to the client with the data set -- pseudo IDs, E(HS), AS.
3. The client accepts the pseudo IDs and E(HS). At this point, the client knows the number of agents in the area, which can serve as traffic jam information. If the client wants to know who the agents are, it proceeds with the following steps.
4. The client tries to decrypt each E(HS). If decryption succeeds, the agent belongs to the client's own group, that is, it uses the same shared key.
5. Having obtained the HS information for the pseudo ID by decryption, the client sends the data set (AS, pseudo ID) to that HS.
6. The HS looks up the pseudo ID and checks the corresponding ACL to verify whether the sender is an authorized client.
7. If the HS finds the sender (client) in the ACL, it sends the corresponding E(ID) to the client.
8. The client tries to decrypt E(ID). If it succeeds, the client learns the real identifier (ID) corresponding to the requested pseudo ID.
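The "who is there?" steps above can be sketched as the following client-side loop. It assumes a toy SHA-256/HMAC cipher in place of Triple-DES, plain dictionaries in place of the area server database, a hypothetical HomeServer class, and that the client name passed to the home server has already been authenticated; none of these come from the prototype.

```python
from __future__ import annotations

import hashlib
import hmac


def _xor(key: bytes, data: bytes) -> bytes:
    # Toy keystream (SHA-256 in counter mode) standing in for Triple-DES.
    stream = b"".join(hashlib.sha256(b"enc" + key + i.to_bytes(8, "big")).digest()
                      for i in range(len(data) // 32 + 1))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(key: bytes, m: bytes) -> bytes:
    ct = _xor(key, m)
    return ct + hmac.new(b"mac" + key, ct, hashlib.sha256).digest()[:16]


def decrypt(key: bytes, blob: bytes) -> bytes | None:
    ct, tag = blob[:-16], blob[-16:]
    good = hmac.new(b"mac" + key, ct, hashlib.sha256).digest()[:16]
    return _xor(key, ct) if hmac.compare_digest(tag, good) else None  # None: wrong key


class HomeServer:
    def __init__(self) -> None:
        self.records: dict[bytes, tuple[bytes, str, frozenset]] = {}  # E(ID) -> (pseudo ID, AS, ACL)

    def e_id_for(self, as_name: str, pid: bytes, client: str) -> bytes | None:
        # Steps 6-7: find the pseudo ID and release E(ID) only to clients on the ACL.
        for e_id, (p, a, acl) in self.records.items():
            if p == pid and a == as_name and client in acl:
                return e_id
        return None


def who_is_there(client: str, key: bytes, as_name: str, as_table: dict,
                 homes: dict, in_area) -> dict:
    """Return {pseudo ID: real ID} for agents in the area that the client may identify."""
    result = {}
    for pid, (e_hs, gli) in as_table.items():   # step 2: AS answers with pseudo IDs and E(HS)
        if not in_area(gli):
            continue
        hs_name = decrypt(key, e_hs)             # step 4: succeeds only within the same group
        home = homes.get(hs_name.decode()) if hs_name is not None else None
        if home is None:
            continue
        e_id = home.e_id_for(as_name, pid, client)  # steps 5-7
        real_id = decrypt(key, e_id) if e_id is not None else None  # step 8
        if real_id is not None:
            result[pid] = real_id.decode()
    return result


key_a, key_b = b"group-A key", b"group-B key"
hs_a = HomeServer()
hs_a.records[encrypt(key_a, b"car1.example.net")] = (b"pid-1", "as-1", frozenset({"alice"}))
as_table = {b"pid-1": (encrypt(key_a, b"hs-a"), (34.7, 135.7)),
            b"pid-2": (encrypt(key_b, b"hs-b"), (34.7, 135.8))}
inside = lambda gli: 34.0 <= gli[0] <= 35.0
print(who_is_there("alice", key_a, "as-1", as_table, {"hs-a": hs_a}, inside))
# {b'pid-1': 'car1.example.net'}
```

A client that holds the group key but is missing from the ACL gets nothing back, which is the situation the ACL was added for in section 4.4.1.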

The query "where are you?" would be treated in the following way:

Setup
1. A client generates the E(ID) of the target agent using the shared key. If the client does not belong to the target's group, it cannot generate the correct E(ID).
Interaction
1. The client sends the query "where are you?" with the E(ID) to the applicable HS. Here we assume that the client already knows which HS manages the target entity.
2. The HS looks up the E(ID) in its database and checks the sender against the corresponding ACL.
3. If the sender (client) is found in the ACL, the HS sends the data set (AS, pseudo ID) to the client. At this point, the client knows roughly where the target agent is. If the client wants a more precise location, it proceeds with the following steps.
4. The client sends the query "where is the pseudo ID?" with the pseudo ID to the specified AS.
5. The AS looks up the pseudo ID in its database and answers the client by sending the GLI and time.
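The "where are you?" steps can be sketched in the same style. Because the home server matches the client's E(ID) against the stored value, E must be deterministic for this lookup to work; the sketch assumes a keyed HMAC as a stand-in for E, which matches correctly but, unlike the real E, cannot be decrypted. The dictionary layouts and names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def e(key: bytes, m: bytes) -> bytes:
    """Deterministic stand-in for E(M): the same key and M always give the same bytes."""
    return hmac.new(key, m, hashlib.sha256).digest()


def where_are_you(client: str, key: bytes, target_id: str,
                  hs_table: dict, as_tables: dict) -> tuple | None:
    # Setup: only a member of the target's group can compute the matching E(ID).
    e_id = e(key, target_id.encode())
    # Steps 1-2: the HS looks up E(ID) and checks the sender against the ACL.
    rec = hs_table.get(e_id)
    if rec is None or client not in rec["acl"]:
        return None
    # Step 3: the client now knows which AS covers the target (a rough location).
    as_name, pid = rec["as"], rec["pid"]
    # Steps 4-5: ask that AS "where is the pseudo ID?" and get (GLI, time) back.
    return as_tables.get(as_name, {}).get(pid)


key_a = b"group-A key"
hs_table = {e(key_a, b"car1.example.net"): {"as": "as-1", "pid": b"pid-1", "acl": {"alice"}}}
as_tables = {"as-1": {b"pid-1": ((34.7, 135.7, 100.0), 42.0)}}
print(where_are_you("alice", key_a, "car1.example.net", hs_table, as_tables))
# ((34.7, 135.7, 100.0), 42.0)
```

A client from another group computes a different E(ID) and finds no record, while a same-group client missing from the ACL is refused at step 2.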

Our proposed model, as realized in the latest prototype, can address the requirements discussed above. The next section gives a brief description of our prototype implementation.

5. Prototype implementation

We implemented a GLI system prototype with the privacy control mechanism described in section 4. We used functions from OpenSSL 0.9.4[7], including its EVP library, to encrypt data and to generate pseudo IDs.

Data encryption/decryption: We use the OpenSSL function "EVP_des_ede3_cbc()", which provides Triple-DES[8] encryption in CBC mode.
Generation of pseudo ID: This prototype uses the SHA function of OpenSSL 0.9.4, which generates 20 bytes of binary data from the input value, the FQDN, and a salt.
Communication: The communication components were implemented with TCP sockets.

Our proposed model can address the privacy control requirements discussed above, and this prototype allowed us to verify the consistency of our location system and its security model.

6. Future work

We will continue to research the following issues and discuss improving the GLI system. We want to develop a more precise information security scheme considering the balance between theory and practice.

The term of validity: Pseudo ID expiration

As mentioned in section 3, pseudo IDs should not remain valid forever. We will consider one-time pseudo IDs or an expiration period, such as a day or a week.
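One way to enforce the expiration policies mentioned above is sketched below: the table records when each pseudo ID was issued, rejects IDs older than a lifetime (one day by default here), and can optionally treat an ID as single-use. The class, its methods, and the default lifetime are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DAY = 24 * 60 * 60  # seconds


@dataclass
class PseudoIdTable:
    lifetime: float = DAY                       # hypothetical policy: one day (a week also fits)
    issued: dict = field(default_factory=dict)  # pseudo ID -> time it was issued

    def add(self, pid: bytes, now: float) -> None:
        self.issued[pid] = now

    def is_valid(self, pid: bytes, now: float) -> bool:
        t0 = self.issued.get(pid)
        return t0 is not None and now - t0 < self.lifetime

    def consume(self, pid: bytes, now: float) -> bool:
        """One-time use: report validity once, then forget the pseudo ID."""
        ok = self.is_valid(pid, now)
        self.issued.pop(pid, None)
        return ok

    def purge(self, now: float) -> int:
        """Remove expired pseudo IDs; their agents must obtain new ones."""
        expired = [p for p, t0 in self.issued.items() if now - t0 >= self.lifetime]
        for p in expired:
            del self.issued[p]
        return len(expired)


table = PseudoIdTable()
table.add(b"pid-1", 0.0)
print(table.is_valid(b"pid-1", DAY - 1), table.is_valid(b"pid-1", DAY))  # True False
```

A shorter lifetime limits how long an eavesdropper can link a pseudo ID to an agent, at the cost of more frequent re-registration.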

Illegal agents

An illegitimate agent might register false data or impersonate a legitimate agent. We expect to solve this problem using an authentication technique such as digital signatures.

Key distribution

We are now considering how to solve the key distribution problem using a KDC (Key Distribution Center), a CA (Certificate Authority), and KPS (Key Predistribution System)[9] or IDKMS (ID-based Key Management System)[10].

Investigation into the use of IPsec for data registration

When an agent registers data with the two servers, the area server and the home server, under this privacy control scheme, an intruder might be able to wiretap the traffic. In short, the intruder might obtain pairs of a pseudo ID and the agent's source address from IP packets. This is not always a serious issue, because the agent's source address can change as the agent moves. However, to make the system architecture even more secure, we are investigating the use of IPsec as the transport for data registration, in addition to limiting pseudo ID validity.

7. Conclusion

Location information is valuable to many kinds of computing and networking applications on the Internet. However, there are concerns that including personal information, such as the names of the owners of mobile computers, could create new security risks, namely invasion of privacy.

In this paper, we have discussed the security requirements of location information services on the Internet, especially our GLI system. In designing a privacy control scheme for our GLI system, we considered several points, such as the kinds of information involved, disconnected operation, and the use of encryption protocols.

We proposed a concrete privacy control model for the GLI system and gave a brief description of our prototype implementation, along with directions for future work.

Acknowledgments

The authors thank the members of Minato Lab, ITC, Nara Institute of Science and Technology, especially Ms. Mika Ito for providing support. We also thank the members of the WIDE Project, especially the rover working group.

References

[1] The Design and Implementation of the Geographical Location Information System: Proc. INET'96
Yasuhito Watanabe, Atsushi Shinozaki, Fumio Teraoka, and Jun Murai
[2] http://www.sfc.wide.ad.jp/InternetCAR/
[3] Virtual Network Computing: IEEE Internet Computing, vol. 2, no. 1, Jan./Feb. 1998
Tristan Richardson, Quentin Stafford-Fraser, Kenneth R. Wood, and Andy Hopper
[4] Secure Hash Standard: Federal Information Processing Standards Publication 180-1
(Supersedes FIPS PUB 180 -- 11 May 1993)
U.S. Department of Commerce/National Institute of Standards and Technology
http://csrc.nist.gov/fips/fip180-1.txt
[5] IP Authentication using Keyed SHA: RFC 1852 (September 1995)
[6] Keyed-Hashing for Message Authentication: RFC 2104 (February 1997)
[7] http://www.openssl.org
[8] Data Encryption Standard (DES): Federal Information Processing Standards Publication 46-2
(Supersedes FIPS PUB 46-1 -- 22 January 1988)
U.S. Department of Commerce/National Institute of Standards and Technology
http://www.itl.nist.gov/fipspubs/fip46-2.htm
[9] Key Sharing without Communication: Key Predistribution System: Journal of the Institute of Electronics, Information, and Communication Engineers, vol. J71, no.11, Nov. 1988, pp. 2046-2053
Tsutomu Matsumoto and Hideki Imai
(in Japanese)
[10] The ID-based Key Management System (IDKMS): Internet draft, expires Jan 1999
