H.323 is a recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, multimedia transport and control, and bandwidth control for point-to-point and multi-point conferences.
It is widely implemented by voice and videoconferencing equipment manufacturers, is used within various Internet real-time applications such as GnuGK and NetMeeting and is widely deployed worldwide by service providers and enterprises for both voice and video services over IP networks.
H.323 call signaling is based on the ITU-T Recommendation Q.931 protocol and is suited for transmitting calls across networks using a mixture of IP, PSTN, ISDN, and QSIG over ISDN. A call model, similar to the ISDN call model, eases the introduction of IP telephony into existing networks of ISDN-based PBX systems, including transitions to IP-based PBXs.
Within the context of H.323, an IP-based PBX might be a gatekeeper or other call control element which provides service to telephones or videophones. Such a device may provide or facilitate both basic services and supplementary services, such as call transfer, park, pick-up, and hold.
The first version of H.323 was published by the ITU in November 1996 with an emphasis of enabling videoconferencing capabilities over a local area network (LAN), but was quickly adopted by the industry as a means of transmitting voice communication over a variety of IP networks, including WANs and the Internet (see VoIP).
Over the years, H.323 has been revised and re-published with enhancements necessary to better-enable both voice and video functionality over packet-switched networks, with each version being backward-compatible with the previous version. Recognizing that H.323 was being used for communication, not only on LANs, but over WANs and within large carrier networks, the title of H.323 was changed when published in 1998. The title, which has since remained unchanged, is "Packet-Based Multimedia Communications Systems." The current version of H.323 was approved in 2009.
One strength of H.323 was the relatively early availability of a set of standards, not only defining the basic call model, but also the supplementary services needed to address business communication expectations.
H.323 is a system specification that describes the use of several ITU-T and IETF protocols. The protocols that comprise the core of almost any H.323 system are:
- H.225.0 Registration, Admission and Status (RAS), which is used between an H.323 endpoint and a Gatekeeper to provide address resolution and admission control services.
- H.225.0 Call Signaling, which is used between any two H.323 entities in order to establish communication. (Based on Q.931)
- H.245 control protocol for multimedia communication, which describes the messages and procedures used for capability exchange, opening and closing logical channels for audio, video and data, control and indications.
- Real-time Transport Protocol (RTP), which is used for sending or receiving multimedia information (voice, video, or text) between any two entities.
Many H.323 systems also implement other protocols that are defined in various ITU-T Recommendations to provide supplementary services support or deliver other functionality to the user. Some of those Recommendations are:
- H.235 series describes security within H.323, including security for both signaling and media.
- H.239 describes dual stream use in videoconferencing, usually one for live video, the other for still images.
- H.450 series describes various supplementary services.
- H.460 series defines optional extensions that might be implemented by an endpoint or a Gatekeeper, including ITU-T Recommendations H.460.17, H.460.18, and H.460.19 for Network address translation (NAT) / Firewall (FW) traversal.
In addition to those ITU-T Recommendations, H.323 implements various IETF Request for Comments (RFCs) for media transport and media packetization, including the Real-time Transport Protocol (RTP).
H.323 utilizes both ITU-defined codecs and codecs defined outside the ITU. Codecs that are widely implemented by H.323 equipment include:
- Audio codecs: G.711, G.729 (including G.729a), G.723.1, G.726, G.722, G.728, Speex, AAC-LD
- Text codecs: T.140
- Video codecs: H.261, H.263, H.264
All H.323 terminals providing video communications shall be capable of encoding and decoding video according to H.261 QCIF. All H.323 terminals shall have an audio codec and shall be capable of encoding and decoding speech according to ITU-T Rec. G.711. All terminals shall be capable of transmitting and receiving A-law and ?-law. Support for other audio and video codecs is optional.
The H.323 system defines several network elements that work together in order to deliver rich multimedia communication capabilities. Those elements are Terminals, Multipoint Control Units (MCUs), Gateways, Gatekeepers, and Border Elements. Collectively, terminals, multipoint control units and gateways are often referred to as endpoints.
While not all elements are required, at least two terminals are required in order to enable communication between two people. In most H.323 deployments, a gatekeeper is employed in order to, among other things, facilitate address resolution.
H.323 Network Elements
Figure 1 - A complete, sophisticated protocol stack
Terminals in an H.323 network are the most fundamental elements in any H.323 system, as those are the devices that users would normally encounter. They might exist in the form of a simple IP phone or a powerful high-definition videoconferencing system.
Inside an H.323 terminal is something referred to as a Protocol stack, which implements the functionality defined by the H.323 system. The protocol stack would include an implementation of the basic protocol defined in ITU-T Recommendation H.225.0 and H.245, as well as RTP or other protocols described above.
The diagram, figure 1, depicts a complete, sophisticated stack that provides support for voice, video, and various forms of data communication. In reality, most H.323 systems do not implement such a wide array of capabilities, but the logical arrangement is useful in understanding the relationships.
Multipoint Control Units
A Multipoint Control Unit (MCU) is responsible for managing multipoint conferences and is composed of two logical entities referred to as the Multipoint Controller (MC) and the Multipoint Processor (MP). In more practical terms, an MCU is a conference bridge not unlike the conference bridges used in the PSTN today. The most significant difference, however, is that H.323 MCUs might be capable of mixing or switching video, in addition to the normal audio mixing done by a traditional conference bridge. Some MCUs also provide multipoint data collaboration capabilities. What this means to the end user is that, by placing a video call into an H.323 MCU, the user might be able to see all of the other participants in the conference, not only hear their voices.
Gateways are devices that enable communication between H.323 networks and other networks, such as PSTN or ISDN networks. If one party in a conversation is utilizing a terminal that is not an H.323 terminal, then the call must pass through a gateway in order to enable both parties to communicate.
Gateways are widely used today in order to enable the legacy PSTN phones to interconnect with the large, international H.323 networks that are presently deployed by services providers. Gateways are also used within the enterprise in order to enable enterprise IP phones to communicate through the service provider to users on the PSTN.
Gateways are also used in order to enable videoconferencing devices based on H.320 and H.324 to communicate with H.323 systems. Most of the third generation (3G) mobile networks deployed today utilize the H.324 protocol and are able to communicate with H.323-based terminals in corporate networks through such gateway devices.
A Gatekeeper is an optional component in the H.323 network that provides a number of services to terminals, gateways, and MCU devices. Those services include endpoint registration, address resolution, admission control, user authentication, and so forth. Of the various functions performed by the gatekeeper, address resolution is the most important as it enables two endpoints to contact each other without either endpoint having to know the IP address of the other endpoint.
Gatekeepers may be designed to operate in one of two signaling modes, namely "direct routed" and "gatekeeper routed" mode. Direct routed mode is the most efficient and most widely deployed mode. In this mode, endpoints utilize the RAS protocol in order to learn the IP address of the remote endpoint and a call is established directly with the remote device. In the gatekeeper routed mode, call signaling always passes through the gatekeeper. While the latter requires the gatekeeper to have more processing power, it also gives the gatekeeper complete control over the call and the ability to provide supplementary services on behalf of the endpoints.
H.323 endpoints use the RAS protocol to communicate with a gatekeeper. Likewise, gatekeepers use RAS to communicate with other gatekeepers.
A collection of endpoints that are registered to a single Gatekeeper in H.323 is referred to as a “zone”. This collection of devices does not necessarily have to have an associated physical topology. Rather, a zone may be entirely logical and is arbitrarily defined by the network administrator.
Gatekeepers have the ability to neighbor together so that call resolution can happen between zones. Neighboring facilitates the use of dial plans such as the Global Dialing Scheme. Dial plans facilitate “inter-zone” dialing so that two endpoints in separate zones can still communicate with each other.
Border Elements and Peer Elements
Figure 2 - An illustration of an administrative domain with border elements, peer elements, and gatekeepers
Border Elements and Peer Elements are optional entities similar to a Gatekeeper, but that do not manage endpoints directly and provide some services that are not described in the RAS protocol. The role of a border or peer element is understood via the definition of an "administrative domain".
An administrative domain is the collection of all zones that are under the control of a single person or organization, such as a service provider. Within a service provider network there may be hundreds or thousands of gateway devices, telephones, video terminals, or other H.323 network elements. The service provider might arrange devices into "zones" that enable the service provider to best manage all of the devices under its control, such as logical arrangement by city. Taken together, all of the zones within the service provider network would appear to another service provider as an "administrative domain".
The border element is a signaling entity that generally sits at the edge of the administrative domain and communicates with another administrative domain. This communication might include such things as access authorization information; call pricing information; or other important data necessary to enable communication between the two administrative domains.
Peer elements are entities within the administrative domain that, more or less, help to propagate information learned from the border elements throughout the administrative domain. Such architecture is intended to enable large-scale deployments within carrier networks and to enable services such as clearinghouses.
The diagram, figure 2, provides an illustration of an administrative domain with border elements, peer elements, and gatekeepers.
H.323 Network Signaling
H.323 is defined as a binary protocol, which allows for efficient message processing in network elements. The syntax of the protocol is defined in ASN.1 and uses the Packed Encoding Rules (PER) form of message encoding for efficient message encoding on the wire. Below is an overview of the various communication flows in H.323 systems.
H.225.0 Call Signaling
Figure 3 - Establishment of an H.323 call
Once the address of the remote endpoint is resolved, the endpoint will use H.225.0 Call Signaling in order to establish communication with the remote entity. H.225.0 messages are:
- Setup and Setup acknowledge
- Call Proceeding
- Release Complete
- Status and Status Inquiry
In the simplest form, an H.323 call may be established as follows (figure 3).
In this example, the endpoint (EP) on the left initiated communication with the gateway on the right and the gateway connected the call with the called party. In reality, call flows are often more complex than the one shown, but most calls that utilize the Fast Connect procedures defined within H.323 can be established with as few as 2 or 3 messages. Endpoints must notify their gatekeeper (if gatekeepers are used) that they are in a call.
Once a call has concluded, a device will send a Release Complete message. Endpoints are then required to notify their gatekeeper (if gatekeepers are used) that the call has ended.
Figure 4 - A high-level communication exchange between two endpoints (EP) and two gatekeepers (GK)
Endpoints use the RAS protocol in order to communicate with a gatekeeper. Likewise, gatekeepers use RAS to communicate with peer gatekeepers. RAS is a fairly simple protocol composed of just a few messages. Namely:
- Gatekeeper request, reject and confirm messages (GRx)
- Registration request, reject and confirm messages (RRx)
- Unregister request, reject and confirm messages (URx)
- Admission request, reject and confirm messages (ARx)
- Bandwidth request, reject and confirm message (BRx)
- Disengage request, reject and confirm (DRx)
- Location request, reject and confirm messages (LRx)
- Info request, ack, nack and response (IRx)
- Nonstandard message
- Unknown message response
- Request in progress (RIP)
- Resource availability indication and confirm (RAx)
- Service control indication and response (SCx)
When an endpoint is powered on, it will generally send a gatekeeper request (GRQ) message to "discover" gatekeepers that are willing to provide service. Gatekeepers will then respond with a gatekeeper confirm (GCF) and the endpoint will then select a gatekeeper to work with. Alternatively, it is possible that a gatekeeper has been predefined in the system’s administrative setup so there is no need for the endpoint to discover one.
Once the endpoint determines the gatekeeper to work with, it will try to register with the gatekeeper by sending a registration request (RRQ), to which the gatekeeper responds with a registration confirm (RCF). At this point, the endpoint is known to the network and can make and place calls.
When an endpoint wishes to place a call, it will send an admission request (ARQ) to the gatekeeper. The gatekeeper will then resolve the address (either locally, by consulting another gatekeeper, or by querying some other network service) and return the address of the remote endpoint in the admission confirm message (ACF). The endpoint can then place the call.
Upon receiving a call, a remote endpoint will also send an ARQ and receive an ACF in order to get permission to accept the incoming call. This is necessary, for example, to authenticate the calling device or to ensure that there is available bandwidth for the call.
Figure 4 depicts a high-level communication exchange between two endpoints (EP) and two gatekeepers (GK).
H.245 Call Control
Once a call has initiated (but not necessarily fully connected) endpoints may initiate H.245 call control signaling in order to provide more extensive control over the conference. H.245 is a rather voluminous specification with many procedures that fully enable multipoint communication, though in practice most implementations only implement the minimum necessary in order to enable point-to-point voice and video communication.
H.245 provides capabilities such as capability negotiation, master/slave determination, opening and closing of "logical channels" (i.e., audio and video flows), flow control, and conference control. It has support for both unicast and multicast communication, allowing the size of a conference to theoretically grow without bound.
Of the functionality provided by H.245, capability negotiation is arguably the most important, as it enables devices to communicate without having prior knowledge of the capabilities of the remote entity. H.245 enables rich multimedia capabilities, including audio, video, text, and data communication. For transmission of audio, video, or text, H.323 devices utilize both ITU-defined codecs and codecs defined outside the ITU. Codecs that are widely implemented by H.323 equipment include:
- Video codecs: H.261, H.263, H.264
- Audio codecs: G.711, G.729, G.729a, G.723.1, G.726
- Text codecs: T.140
H.245 also enables real-time data conferencing capability through protocols like T.120. T.120-based applications generally operate in parallel with the H.323 system, but are integrated to provide the user with a seamless multimedia experience. T.120 provides such capabilities as application sharing T.128, electronic whiteboard T.126, file transfer T.127, and text chat T.134 within the context of the conference.
When an H.323 device initiates communication with a remote H.323 device and when H.245 communication is established between the two entities, the Terminal Capability Set (TCS) message is the first message transmitted to the other side.
After sending a TCS message, H.323 entities (through H.245 exchanges) will attempt to determine which device is the "master" and which is the "slave." This process, referred to as Master/Slave Determination (MSD), is important, as the master in a call settles all "disputes" between the two devices. For example, if both endpoints attempt to open incompatible media flows, it is the master who takes the action to reject the incompatible flow.
Logical Channel Signaling
Once capabilities are exchanged and master/slave determination steps have completed, devices may then open "logical channels" or media flows. This is done by simply sending an Open Logical Channel (OLC) message and receiving an acknowledgement message. Upon receipt of the acknowledgement message, an endpoint may then transmit audio or video to the remote endpoint.
Figure 5 - A typical H.245 exchange
A typical H.245 exchange looks similar to figure 5:
After this exchange of messages, the two endpoints (EP) in this figure would be transmitting audio in each direction. The number of message exchanges is numerous, each has an important purpose, but nonetheless takes time.
For this reason, H.323 version 2 (published in 1998) introduced a concept called Fast Connect, which enables a device to establish bi-directional media flows as part of the H.225.0 call establishment procedures. With Fast Connect, it is possible to establish a call with bi-directional media flowing with no more than two messages, like in figure 3.
Fast Connect is widely supported in the industry. Even so, most devices still implement the complete H.245 exchange as shown above and perform that message exchange in parallel to other activities, so there is no noticeable delay to the calling or called party.
H.323 and Voice over IP services
Voice over Internet Protocol (VoIP) describes the transmission of voice using the Internet or other packet switched networks. ITU-T Recommendation H.323 is one of the standards used in VoIP. VoIP requires a connection to the Internet or another packet switched network, a subscription to a VoIP service provider and a client (an analogue telephone adapter (ATA), VoIP Phone or "soft phone"). The service provider offers the connection to other VoIP services or to the PSTN. Most service providers charge a monthly fee, then additional costs when calls are made. Using VoIP between two enterprise locations would not necessarily require a VoIP service provider, for example. H.323 has been widely deployed by companies who wish to interconnect remote locations over IP using a number of various wired and wireless technologies.
H.323 and Videoconference services
A videoconference, or videoteleconference (VTC), is a set of telecommunication technologies allowing two or more locations to interact via two-way video and audio transmissions simultaneously. There are basically two types of videoconferencing; dedicated VTC systems have all required components packaged into a single piece of equipment while desktop VTC systems are add-ons to normal PC's, transforming them into VTC devices. Simultaneous videoconferencing among three or more remote points is possible by means of a Multipoint Control Unit (MCU). There are MCU bridges for IP and ISDN-based videoconferencing. Due to the price point and proliferation of the Internet, and broadband in particular, there has been a strong spurt of growth and use of H.323-based IP videoconferencing. H.323 is accessible to anyone with a high speed Internet connection, such as DSL. Videoconferencing is utilized in various situations, for example; distance education, telemedicine, Video Relay Service, and business.
H.323 has been used in the industry to enable large-scale international video conferences that are significantly larger than the typical video conference. One of the most widely attended is an annual event called Megaconference.
- IAX2 - Inter-Asterisk eXchange, a binary protocol, designed to reduce overhead especially in regard to voice streams. Defined in RFC 5456.
- The IETF produced a standard called the Session Initiation Protocol (SIP) that also enables voice and video communication over IP.
- There are also other ITU-T recommendations used for videoconferencing and videophone services - H.320 (using ISDN) and H.324 (using regular analog phone lines and 3G mobile phones).
- Jingle (Jabber/XMPP extension) also enables video and voice over IP.
- Some providers (such as Skype) also use their own closed, proprietary formats.
- Access Grid provides broadly similar functionality, with more emphasis on open-source and utilizing multicast.
- EVO also provides relatively open functionality via Java, and includes H.323 support.https://en.wikipedia.org/wiki/H.323#cite_note-8