The "Simple Internet Phone" has an architecture tuned for a future situation in which non-Internet networks, such as IP-based private telephone networks, will disappear. While the "Simple Internet Phone" is a form of voice over Internet Protocol (VoIP), most, if not all, VoIP protocols are designed to give priority to affinity with the telephone network. However, it is obvious that the telephone network will be replaced by the Internet, and will eventually disappear. At that time, most of the features of VoIP protocols will become obsolete. In contrast, the "Simple Internet Phone" is designed to give priority to affinity with the Internet and its architectural principles as an "end-to-end," "globally connected" and "scalable" IP network. As a result, most features of VoIP are replaced by existing Internet protocols. With Internet phones, callees are required to have a persistent connection to the Internet with globally unique addresses, which helps to promote the healthy development of the Internet.
In theory, an Internet phone is a form of voice over Internet Protocol (VoIP). In reality, however, Internet phone and VoIP are different, often contradictory, concepts.
VoIP is often intended to be used as a part of public switched telephone networks (PSTNs) or private (that is, isolated from the Internet and with no global connectivity) IP networks. As such, gateway functionality to other parts of PSTNs, including VoIP PSTNs, is an important requirement. Routing over various gateways, too, is an important issue.
On the other hand, Internet phones work over the Internet, where various functionality is already available. To make the Internet phone smoothly operate with the existing protocols of the Internet, it is essential for the Internet phone protocols to observe the architectural principles of the Internet: "end to end," "globally connected" and "scalable" [ARCH].
The "Simple Internet Phone" has been designed to be an Internet phone that utilizes the existing protocols of the Internet as much as possible.
The architecture of a pure Internet phone is discussed in section 2. As a form of pure Internet phone, the "Simple Internet Phone" uses NOTASIP (Nothing Other Than A Simple Internet Phone) as a basic signaling protocol on end systems and uses uniform resource locators (URLs) for distributing session information to end systems. NOTASIP reuses ICMP (Internet Control Message Protocol) for session establishment. No attempt is made to support complex features of intelligent telephone systems because scarcely anyone uses the intelligence of the PSTN beyond that of POTS (plain old telephone service). URLs are the standard way to describe resources, including telephone devices, on the Internet and they are distributed solely by the World Wide Web (WWW). There are no intelligent intermediate entities corresponding to H.323's Gatekeepers [H323] or the Call Agents of SGCP/MGCP (Simple Gateway Control Protocol/Media Gateway Control Protocol) [SGCP, MGCP], because end systems are assumed to be globally interconnected over the Internet and responsible for their communications.
The telephone network will disappear. However, the fact that the telephone network will disappear does not mean that telephone devices or telephone numbers will also disappear. At least for the time being, and maybe forever, telephone devices with 12 keys (digits, "*" and "#") and without an ASCII keyboard will survive even over the Internet. Telephone devices may have their own Internet connectivity, or analog telephone devices may be connected to the Internet through TAs (Terminal Adapters). With such telephone devices, telephone numbers will be the way for callers to specify the callees. Thus, some mechanism is necessary to map telephone numbers to session URLs. The "Simple Internet Phone" uses HyperText Transfer Protocol (HTTP) for the mapping, which is discussed in section 3.
Also discussed in section 3 are interoperability issues between the Internet and the telephone network. Although telephone networks will disappear, this will take some time. To replace the telephone network with the Internet as quickly as possible, it is first necessary that the Internet phone interoperate with PSTNs so that telephone devices in the Internet and in PSTNs can call each other. Thus, there should be gateways between the Internet and PSTNs with an appropriate mechanism for AAA (Authentication, Authorization and Accounting). In this paper, we propose to use RADIUS [RADIUS, RADIUSA] for AAA, with a security level equivalent to that of current telephone networks.
In section 4 our implementation is briefly described and section 5 provides the conclusions of this work.
In this section we discuss the architecture of Internet phones to be used purely within the Internet.
A telephone-network-centric view of VoIP is illustrated in Figure 1. From this point of view, IP networks including the Internet are only a part of a large telephone network system. Telephone devices and exchangers are the gateways (Media Gateways, MGs) connecting various components of the telephone network.
Within that architecture, the SGCP/MGCP approach is reasonable. The communication model of SGCP/MGCP is shown in Figure 2. There are call agents controlling MGs using SGCP/MGCP. Protocols between call agents are not defined by SGCP/MGCP. This model is complete if all the MGs are controlled by a single call agent (which may be composed of several physical entities). However, given the scale of the Internet, there must be multiple players operating call agents with their own policies. As a result, some coordination framework is necessary between multiple call agents. For example, a private VoIP network, which contains its own call agent, forms a part of the telephone network and must be connected, as telephone networks are, using the standard protocol of the ITU: SS7. Alternatively, several VoIP or other networks may use a new protocol like TBGP (Telephone Border Gateway Protocol; [TBGP]) between them.
On the other hand, an Internet-centric view of VoIP is illustrated in Figure 3. From this point of view, there is only one network worth considering -- the Internet. All communications are performed over the Internet. Telephone devices and exchangers are just end systems attached to the Internet and no different from usual Internet hosts.
From this perspective, most of the protocols used in telephone networks are redundant, unimportant, useless or even harmful. For example, the end-to-end principle of the Internet requires that end systems should collect information and make decisions by themselves. It is totally against the end-to-end principle to separate the functionality of call agents from MGs. An end system acting as an MG should also have the functionality of a call agent (it is not an agent, anymore). Then, protocols like SGCP/MGCP are totally unnecessary, because the interaction between an MG and a call agent occurs within an end system and needs no network protocol. Then, the MG and the call agent can be tightly coupled and fully synchronized. If an MG and a call agent are forcibly separated on different end systems, all the usual problems arising when breaking the end-to-end principle will appear. For example, a protocol between them must be complex to express almost all the detailed interaction and will still be insufficient, which may result in the development of a more complex protocol [MEGACO]. Worse, the protocol has inherent race conditions from the lack of synchronization, and the protocol interface is a possible security hole.
It may be argued that the call agent can increase security by properly controlling MGs to disallow unauthorized access. However, this is a familiar but wrong argument that fragments the Internet into a collection of private IP networks separated by firewalls. End systems can be equipped with filters to reject certain calls, and real security can be obtained only on end systems. Unlike firewalls, call agents do not topologically enclose MGs, so MGs are easy victims of attackers who assume the IP address of the call agents and/or snoop traffic between MGs and the call agents, unless strong security is provided on MGs, which makes security by call agents obsolete. Finally, it should not be forgotten that the telephone, like e-mail, is useful partly because users can receive unsolicited calls from anyone in the world.
As interoperation with the telephone network is not necessary, the pure Internet phone should offer the minimum functionality of existing phones: globally connected voice communication. Considering that the fancy features of modern phone systems are not utilized by most users, those features not only can but also should be ignored.
As the pure Internet phone can use all the existing protocols of the Internet, it is not necessary to develop some completely new protocol.
Due to the global connectivity principle of the Internet, IP routing protocols can provide the route between any pair of end systems, making it unnecessary to develop any new routing protocol. In the Internet, VoIP over UDP (User Datagram Protocol) is no different from, and indistinguishable from, say, DNS (Domain Name System) over UDP. The same routing policy is applied to VoIP and DNS.
Mobility is an important property of telephone devices today. In Japan, there are more cellular phones than POTS. Mobility can (and, considering the consistency of the Internet, should) be provided by IP mobility.
"Directory Service" means a "telephone database of Internet users." As the most successful global database of the Internet is the World Wide Web (WWW), users can include telephone URLs in their home pages if a URL format to represent users' capabilities is defined. With that URL, yellow pages can just be WWW search engines (in this paper, yellow pages are systems used to query telephone URLs' databases based on non-unique keys, while white pages are systems used to query telephone URLs' databases based on a unique numeric index, the phone number).
What is lacking is a signaling protocol, or a protocol to establish a connection with proper port number assignment between two hosts using UDP, and the definition of the URL format. In this paper, only the best-effort communication is discussed. Given the current and expected speed of the Internet, bandwidth available for best-effort communication seems to offer more than enough bandwidth for voice communication. Resource reserved communication may also be performed if extremely high quality of service is required, in which case, signaling messages for the reservation may be able to carry information for port number assignment.
NOTASIP [NOTASIP] is a signaling protocol built on end systems. Its purpose is to establish a UDP connection between two end systems and to negotiate the UDP port numbers for the connection. In short, NOTASIP is unidirectional TCP (Transmission Control Protocol) without retransmission or rate control.
NOTASIP has been designed based on a careful analysis of today's PSTN service and porting the essence of that service to the Internet with simplicity and with the objective of quick response even in the presence of packet losses. It has also been considered that bandwidth consumed by voice traffic is negligible. The essential telephone service provided nowadays by POTS is very simple: if a call succeeds, voice communication is established; if a call does not succeed, a busy tone is heard. An Internet phone can be even simpler because no accounting is necessary in the best-effort flatly rated Internet. In the future, even if resource reserved communication is enabled and users are charged based on the amount of the resource reserved, no special care about accounting is necessary merely because the payload is voice. An accounting mechanism for the reservation will be included in the resource reservation protocol.
NOTASIP can briefly be described as follows: the caller first chooses an appropriate source UDP port (P0) and sends a voice stream of UDP packets to a well-known UDP port (P1) of the callee; then, the callee returns a UDP stream consisting of a ringing tone from an appropriate UDP port (P2) to the originating UDP port (P0) of the caller and the connection is established. P1 and P2 may or may not be the same. When the callee's handset is picked up, the callee starts to send a voice stream.
If the callee's UDP port is occupied or the callee does not want to accept a connection, an ICMP PORT UNREACHABLE message is returned and the connection fails. If no packet is received for more than 30 seconds, the caller or the callee closes the connection. An appropriate tone (a busy tone) will be locally generated upon receiving ICMP PORT UNREACHABLE or after the connection is closed.
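The caller's side of the exchange described above can be sketched as a small state machine. This is only an illustration in Python, not the NOTASIP reference implementation: the class, state and method names are assumptions made for this example, and time is passed in explicitly so that no real sockets are needed. Only the 30-second timeout and the meaning of ICMP PORT UNREACHABLE are taken from the text.

```python
from enum import Enum

IDLE_TIMEOUT = 30.0  # seconds without a received packet before closing (from the text)


class State(Enum):
    CALLING = "calling"      # sending voice packets from P0 to the callee's well-known port P1
    CONNECTED = "connected"  # a packet from the callee has arrived at P0
    CLOSED = "closed"        # a busy tone would be generated locally


class Caller:
    """Hypothetical caller-side state machine for illustration only."""

    def __init__(self, now: float) -> None:
        self.state = State.CALLING
        self.peer_port = None      # P2, learned from the first returned packet
        self.last_received = now   # start of the 30-second window

    def on_udp_packet(self, source_port: int, now: float) -> None:
        """A UDP packet (ringing tone or voice) arrived from the callee."""
        if self.state is State.CLOSED:
            return
        if self.peer_port is None:
            self.peer_port = source_port  # P2 may or may not equal P1
        self.state = State.CONNECTED
        self.last_received = now

    def on_port_unreachable(self) -> None:
        """ICMP PORT UNREACHABLE: the port is occupied or the call is refused."""
        self.state = State.CLOSED

    def on_tick(self, now: float) -> None:
        """Close the connection if nothing was received for more than 30 seconds."""
        if self.state is not State.CLOSED and now - self.last_received > IDLE_TIMEOUT:
            self.state = State.CLOSED
```

In this sketch a single returned packet is enough to move the caller to the connected state, which reflects the point that establishment does not depend on any particular packet surviving.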
To prevent the NOTASIP protocol from being used as an amplifier of a denial-of-service attack with a forged source address, the callee should, before a connection is established, return at most one packet for every packet received from the caller.
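One way a callee might enforce this rule is a per-caller counter that never lets the number of returned packets exceed the number of received packets. The sketch below is an assumption about how the rule could be implemented, not part of the NOTASIP specification; the class and method names are hypothetical.

```python
class ReplyBudget:
    """Hypothetical per-caller counter used before a connection is established."""

    def __init__(self) -> None:
        self.received = 0  # packets received from the (possibly forged) source address
        self.sent = 0      # packets returned to that address

    def packet_received(self) -> None:
        self.received += 1

    def may_reply(self) -> bool:
        """True if one more reply keeps the total sent within the total received."""
        return self.sent < self.received

    def reply_sent(self) -> None:
        if not self.may_reply():
            raise RuntimeError("reply would exceed one packet per packet received")
        self.sent += 1
```

With this counter the callee never sends more packets to an unverified address than it received from it, so a forged source address cannot be used to multiply the attacker's packet count.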
The NOTASIP protocol tolerates any number of lost packets as long as at least one packet is sent every 30 seconds. With NOTASIP, a connection can be established immediately after two packets are exchanged, regardless of the number of lost packets.
With [H323] or [SGCP, MGCP], control information is exchanged as separate packets much more infrequently than voice packets, which means detection of packet losses and retransmission take a long time (the duration is not specified and may be infinite). It may be argued that, in slow networks, the separation reduces the number of control packets. However, in the networks where a stream of voice packets is not a problem, such reduction is meaningless.
NOTASIP can work with any voice encoding scheme as long as a packet is generated and reaches its destination at least once every 30 seconds.
In order to use the WWW (the human-friendly database of the Internet) as a directory service for the Internet phone, URLs must be defined to describe the capabilities of end systems. According to the common syntax of URLs for IP-based protocols [URL], the scheme-specific part of a URL has the following structure: //<user>:<password>@<host>:<port>/<url-path>
In our case, we do not need the part given by "<user>:<password>@". "<url-path>", which, in general, specifies how the resource should be accessed, should describe how the stream from <host>:<port> can be decoded. The URL we propose to use has the following syntax:
For example, a URL for phone systems with RTP (Real-time Transport Protocol) [RTP] 8 kHz L16 monaural encoding with a dynamic payload type of 97 can be represented as:
There exists an IETF standard protocol to describe a session: SDP (Session Description Protocol) [SDP]. However, SDP is overloaded with a lot of information suitable for human users and better supplied by, say, hypertext markup language (HTML). We have simplified the SDP notation, removing most of the information and leaving the minimum amount of data necessary to decode an audio stream. The simplified notation needs ASCII characters only and satisfies a "fully internationalized" architectural principle [ARCH]. The simplification is necessary in order to put the Internet phone URL itself in SDP. As SDP already contains e-mail addresses and home pages of the session administrator, it is unreasonable not to include URLs to call the administrator over the Internet phone.
As an alternative approach, rather than defining a new URL, it is possible to have a de facto suffix for an HTTP file containing the session information. However, first accessing an HTTP file and then accessing the session violates the URL semantics, which define the default handling as "to access the resource." Moreover, the indirection unnecessarily increases the response time.
There are several requirements to operate the Internet phone over the Internet.
First of all, assignment of a global address is essential. End systems operate as callers, or clients, to initiate calls. However, end systems also operate as callees, or servers, to receive calls. For a caller to learn the IP address of a callee through DNS, the address must be registered in DNS and must be reachable globally from any caller located anywhere in the Internet. That is, end systems for Internet phones must have globally reachable addresses registered in DNS or a similar database. The assigned address must be static, partly because DNS scalability relies on proper caching. Note also that on-demand address assignment to callees is meaningless, because callees may be called at any time at the demand of callers, which is not known to callees in advance.
In short, callees must be assigned a globally unique static address. Even in that case, a callee may be placed behind a NAT (network address translation) box and multiple callees may share a single IP address at different UDP ports. However, considering that IP mobility means "mobility of a host" and that the UDP ports of a host cannot move independently, the sharing of an address by potentially mobile end systems is impossible. Port-wise mobility also destroys the semantics of some ICMP messages, such as ICMP HOST UNREACHABLE. So, address translation mechanisms do not save address space, and callees had better use the assigned globally unique static addresses directly.
The other network requirement is persistent connectivity. A user may initiate a local PSTN call for dial-up Internet connectivity in order to make a long-distance Internet phone call. However, a user behaving as a callee cannot, in general, initiate the local PSTN call in order to receive Internet phone calls. Callees must be connected to the Internet 24 hours a day, 7 days a week.
Finally, as every person can use a telephone application, once a certain percentage of the population starts using an Internet phone, it will soon become the killer application with an explosive number of users, which will make it necessary to move to the large IP address space of IPv6.
As the Internet phone prevails, it is expected that NATs will be removed, dial-up access to the Internet will disappear, and IPv6 will be common.
The pure Internet phone is a straightforward concept with clean protocols and implementations. However, there are two important factors to motivate people to use the Internet phone.
One is a human interface factor. Many people just do not use a keyboard to call someone else. So, it is necessary to provide a conventional human interface similar to that of analog telephone devices, which means that callees must be identified by digits, that is, telephone numbers. This implies the need for a directory service to map a telephone number to a session URL. In this paper, the service is called white pages service.
The other factor is global connectivity to all the telephones in the world. Until the Internet phone prevails, there will be some (or a lot of) telephone devices in the telephone network. The Internet phone enables people on the Internet to call each other with no extra charge. However, as people still need to call and be called from the telephone network, it is necessary to provide a gateway service from the Internet to the telephone network and from the telephone network to the Internet.
On the Internet, a simple directory service is offered by DNS and a complex one is offered by HTTP. We choose to offer white pages services through HTTP.
With DNS, information on a Japanese domestic phone number "03-5734-3299" can be placed in a domain, say, "9.9.2.3.4.3.7.5.3.0.phone.co.jp". However, depending on the context, a telephone number can have different meanings, and a global database like DNS is not good at representing complex relationships between personalized mappings. For example, the previous number may be represented as "81-3-5734-3299" in an international context, or as "5734-3299" for a local call within Tokyo. Moreover, telephones behind private branch exchanges (PBXs) have their own numbering systems. For example, the previous number may be dialed as "3299" within a local PBX system or as "00357343299" with another PBX system.
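The DNS-style mapping can be sketched as follows, assuming the name is formed from the dialed digits in reverse order, one label per digit, in the manner of reverse DNS zones. The function name is hypothetical and the suffix is the one used in the example above. The sketch also illustrates the problem raised in the text: different dialed forms of the same telephone produce unrelated names.

```python
def phone_to_domain(number: str, suffix: str = "phone.co.jp") -> str:
    """Map a dialed number to a DNS name: reversed digits, one label per digit.

    Assumption for illustration: separators such as "-" are ignored.
    """
    digits = [ch for ch in number if ch in "0123456789"]
    if not digits:
        raise ValueError("no digits in phone number")
    return ".".join(reversed(digits)) + "." + suffix


print(phone_to_domain("03-5734-3299"))    # domestic form
print(phone_to_domain("81-3-5734-3299"))  # international form gives a different name
```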
Instead of DNS, we use HTTP and allow local configuration of white pages servers. If a white pages server is placed at "http://phone.co.jp/whitepages/", a look-up for "03-5734-3299" can be made with the URL given by http://phone.co.jp/whitepages/?0357343299, and the corresponding session URL will be returned.
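Building the look-up URL from what the user dialed can be sketched as below. The function name is hypothetical; the server location is the one from the example, and passing a different server corresponds to the local configuration mentioned above (for instance, a PBX-specific white pages server that understands 4-digit extensions). Only digits are kept in this sketch; how "*" and "#" should be handled is left open.

```python
def whitepages_query_url(dialed: str,
                         server: str = "http://phone.co.jp/whitepages/") -> str:
    """Return the white pages look-up URL for a dialed number.

    Assumption for illustration: the query string is simply the dialed digits.
    """
    digits = "".join(ch for ch in dialed if ch in "0123456789")
    if not digits:
        raise ValueError("no digits dialed")
    return server + "?" + digits


print(whitepages_query_url("03-5734-3299"))
```

An HTTP GET on the resulting URL would then return the session URL; the response format is not specified here and is not assumed by the sketch.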
For the gateway service from the Internet, it is necessary that callers in the Internet pay the telephone charge from the gateway to the callee in the telephone network, and this implies that some form of authentication, authorization and accounting functionality must be provided. It is also necessary to let white and yellow pages servers point to the appropriate gateways.
Authentication of the caller means to know the identity of the caller or the identity of someone who pays the requested charge for the caller. Authentication in a public environment can be a very complicated problem. However, as a simple authentication method is effectively working in the real-world telephone network, we can reuse that mechanism. First, if a call is requested from an IP address belonging to a friendly and reliable ISP, the IP address is considered to be reliable and a bill will be sent to the person who is assigned that IP address. This is the mechanism being used for POTS. Otherwise, a user (or a user agent) should dial an identification number and a personal identification number (PIN), which is the mechanism used for calling cards.
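The two-step decision described above can be sketched as follows. This is a minimal illustration, not the gateway's actual logic: the prefix list, the account table and the function name are hypothetical, the addresses are documentation ranges, and a real gateway would delegate the check to a RADIUS server rather than to in-memory tables.

```python
import ipaddress
from typing import Optional

# Hypothetical prefixes assigned to ISPs that the gateway operator trusts.
TRUSTED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Hypothetical calling-card accounts: identification number -> PIN.
CALLING_CARDS = {"1234567890": "4321"}


def authenticate(source_ip: str,
                 card_id: Optional[str] = None,
                 pin: Optional[str] = None) -> Optional[str]:
    """Return the identity to be billed, or None if the call is not authenticated."""
    addr = ipaddress.ip_address(source_ip)
    # POTS-like case: the address itself identifies the subscriber.
    if any(addr in net for net in TRUSTED_PREFIXES):
        return f"ip:{addr}"
    # Calling-card case: identification number plus PIN.
    if card_id is not None and pin is not None and CALLING_CARDS.get(card_id) == pin:
        return f"card:{card_id}"
    return None
```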
Once a call is authenticated and the caller is identified, it is necessary to judge whether the caller has authority to make the call (Authorization). If the caller is authorized, an appropriate amount of money (depending on the called phone number, the duration of the call, special arrangement between the caller and the gateway service provider) should be requested from the caller (Accounting).
We have found that RADIUS [RADIUS, RADIUSA], which is designed for authentication, authorization and accounting when using the Internet over the telephone network, is also useful for authentication, authorization and accounting when using the telephone network over the Internet.
To find the gateway server and provide proper signaling, we introduce some modifications in the session URL to convert it into a phone URL. The phone URL has a form identical to the session URL except that an optional dialing property "&dial=..." can be added. After a NOTASIP connection to a gateway server is established, the string in the dialing property is sent as a DTMF (Dual Tone Multi-Frequency) tone sequence.
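Handling of the dialing property can be sketched as follows. The helper names are hypothetical, and the sketch assumes that "&dial=" is the last property in the phone URL; the string "<session-url>" in the usage lines is a placeholder, not real session URL syntax. The frequency table is the standard DTMF assignment for the 12 telephone keys.

```python
from typing import List, Optional, Tuple

# Standard DTMF frequency pairs (low group, high group) in Hz.
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}


def split_dial(phone_url: str) -> Tuple[str, Optional[str]]:
    """Split a phone URL into (session URL, dial string or None).

    Assumption for illustration: the dialing property, if present, comes last.
    """
    head, sep, tail = phone_url.partition("&dial=")
    if not sep:
        return phone_url, None
    return head, tail


def dtmf_sequence(dial: str) -> List[Tuple[int, int]]:
    """Return the tone pairs to send after the connection to the gateway is up."""
    return [DTMF[key] for key in dial]  # raises KeyError for keys not on the keypad


session_url, dial = split_dial("<session-url>&dial=57343299")
print(session_url, dial, dtmf_sequence(dial or ""))
```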
White pages servers owned by a gateway service provider will point to the service provider's gateway when the queried phone number is not assigned to an end system in the Internet.
For example, a query "http://phone.co.jp/whitepages/?0357343299" may return the URL
In this case, the white pages server has wild-card matching capability. The initial string "03" in the phone number means that the phone is in the Tokyo area, and the server synthesizes the answer by combining a template of a gateway in Tokyo with the remaining phone number "5734-3299".
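The wild-card behaviour can be sketched as an exact look-up followed by a longest-prefix match against gateway templates. Everything named here is hypothetical: the rule table, the function name, and in particular the template string, where "<tokyo-gateway-session-url>" is a placeholder standing in for the gateway's real session URL.

```python
from typing import Dict, Optional

# Hypothetical rules: dialing prefix -> phone URL template of a gateway.
PREFIX_TEMPLATES: Dict[str, str] = {
    "03": "<tokyo-gateway-session-url>&dial={rest}",
}


def resolve(number: str,
            exact: Dict[str, str],
            templates: Dict[str, str] = PREFIX_TEMPLATES) -> Optional[str]:
    """Return the URL for a number: a registered end system first, else a gateway."""
    digits = "".join(ch for ch in number if ch in "0123456789")
    # Numbers assigned to end systems in the Internet are answered directly.
    if digits in exact:
        return exact[digits]
    # Otherwise try the longest matching prefix and fill in the remaining digits.
    for prefix in sorted(templates, key=len, reverse=True):
        if digits.startswith(prefix):
            return templates[prefix].format(rest=digits[len(prefix):])
    return None


print(resolve("03-5734-3299", exact={}))
```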
For the gateway service from the telephone network, callers in the telephone network already pay the telephone charge from the caller to the gateway. So the only services of the gateway are media conversion and directory service.
Users in the Internet should ask a gateway service provider to register their phone numbers in the directory server of the gateway, which is likely to be flatly rated. Then, the gateway can relay calls from the telephone network to the registered users in the Internet.
A more advanced service consisting of a yellow pages directory service with DTMF or voice recognition may also be possible.
The "Simple Internet Phone" is implemented as software on PC UNIX. It is also implemented as standalone terminal adapter (TA) hardware for analog phone devices.
On PC UNIX with FreeBSD, caller and callee functions are implemented. A CGI (Common Gateway Interface)-based white pages server is also implemented as a separate program. The caller and callee modules have a RADIUS interface, which is useful for accounting and authentication when acting as a gateway to the PSTN.
A TA has an SH3 central processing unit (CPU) (66MHz), 2MB of flash memory, 16MB of dynamic RAM, an Ethernet interface and an RJ11 jack to an analog telephone device. IP stack, caller and callee functions are implemented over a real-time OS (operating system). It also acts as an HTTP server for system configuration and a small white pages service.
We have implemented a TA rather than implementing a telephone device directly connected to the Internet, in order to allow users to continue using their analog telephone devices through the TA.
In this paper we have presented the "Simple Internet Phone": a simple architecture for an Internet phone and its implementation as software for PC UNIX and as dedicated hardware.
The architecture is complete and includes not only the pure Internet phone but also gateways between the Internet and the telephone network, and authentication/authorization/accounting through the gateways.
The Internet phone may be able to promote the IPv6 Internet without NAT and with persistent connectivity.
This research project is funded by a "support system for R&D [research and development] activities contributing to international standards" of TAO (Telecommunications Advancement Organization of Japan) as "research and development of audio/video phone fully utilizing the Internet."
[ARCH] B. Carpenter, "Architectural Principles of the Internet", RFC 1958, June 1996.
[H323] "Packet Based Multimedia Communication Systems", ITU-T Recommendation H.323, 1998.
[MEGACO] F. Cuervo, B. Hill, N. Greene, C. Huitema, A. Rayhan, B. Rosen, J. Segers, "Megaco Protocol", Internet Draft (work in progress as <draft-ietf-megaco-protocol-05.txt>), January 2000.
[MGCP] M. Arango, A. Dugan, I. Elliott, C. Huitema, S. Pickett, "Media Gateway Control Protocol (MGCP)", Internet Draft (work in progress as <draft-huitema-megaco-mgcp-v0r1-05.txt>), February 1999.
[NOTASIP] M. Ohta, K. Fujikawa, "Nothing Other Than A Simple Internet Phone (NOTASIP)", Internet Draft (work in progress as <draft-ohta-notasip-02.txt>), August 1998.
[RADIUS] C. Rigney, A. Rubens, W. Simpson, S. Willens, "Remote Authentication Dial In User Service (RADIUS)", RFC 2138, April 1997.
[RADIUSA] C. Rigney, "RADIUS Accounting", RFC 2139, April 1997.
[RTP] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.
[SDP] M. Handley, V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.
[SGCP] M. Arango, C. Huitema, "Simple Gateway Control Protocol (SGCP)", Internet Draft (work in progress as <draft-huitema-sgcp-v1-02.txt>), July 1998.
[TBGP] D. Hampton, D. Oran, H. Salama, D. Shah, "The IP Telephony Border Gateway Protocol (TBGP)", Internet Draft (work in progress as <draft-ietf-iptel-glp-tbgp-01.txt>), June 1999.
[URL] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994.