SIP — Session Initiation Protocol


SIP is the signalling protocol that sets up, manages, and tears down voice and video calls over IP networks. It handles the phone ringing and the 'answer' — the actual audio flows separately over RTP. Every VoIP phone, every Teams call, every WebRTC session builds on the concepts SIP defined.


Overview

SIP (Session Initiation Protocol) is defined in RFC 3261 (2002) and is the dominant VoIP signalling protocol. It is a text-based request-response protocol deliberately modelled on HTTP: its methods, headers, and response codes follow HTTP's patterns.
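To make the HTTP resemblance concrete, here is a minimal sketch that splits a SIP message into its start line, headers, and body. The function name `parse_sip_message` is hypothetical, and the parser is deliberately simplified: it ignores header folding and compact header forms, and it keeps only the last value of a repeated header such as Via.

```python
def parse_sip_message(raw: str) -> dict:
    """Split a SIP message into start line, headers, and body (simplified)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)

    # Header names are case-insensitive, so normalise them to lower case.
    # Simplification: a repeated header (e.g. Via) keeps only its last value.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if parts[0].startswith("SIP/"):
        # Status line: SIP/2.0 <status> <reason>
        info = {
            "kind": "response",
            "status": int(parts[1]),
            "reason": parts[2] if len(parts) > 2 else "",
        }
    else:
        # Request line: <method> <request-uri> SIP/2.0
        info = {"kind": "request", "method": parts[0], "uri": parts[1]}
    return {**info, "headers": headers, "body": body}
```

Feeding it a request such as `"OPTIONS sip:bob@example.com SIP/2.0\r\nCSeq: 1 OPTIONS\r\n\r\n"` (an illustrative address) yields a dict with `kind` set to `"request"`, much as an HTTP parser would treat `GET /path HTTP/1.1`.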

SIP’s responsibility is session control: establishing, modifying, and terminating communication sessions. Like RTSP, SIP does not carry media — it negotiates the parameters of the call (codecs, IP addresses, ports) using SDP (Session Description Protocol), and the actual voice/video flows via RTP on separate ports.

SIP is used in:

Ports: UDP 5060 (standard, preferred for speed), TCP 5060 (reliable delivery), TLS 5061 (SIPS — SIP over TLS).


SIP Components

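User Agent (UA): The endpoint, such as a SIP phone, softphone, or PBX. It has two roles: User Agent Client (UAC), which sends requests, and User Agent Server (UAS), which receives requests and returns responses.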

SIP Proxy Server: Routes SIP requests toward their destination. Stateless (just forwards) or stateful (tracks transactions). Analogous to an HTTP proxy.

SIP Registrar: Accepts REGISTER requests and stores the current location (IP:port) of each SIP address. When a call arrives for Alice’s SIP address, the proxy queries the registrar to find Alice’s current IP.
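The registrar's role can be sketched as a small location table. This is an illustration rather than how any particular PBX implements it: the `Registrar` class and the address `sip:alice@example.com` are hypothetical, and a real registrar would allow several contacts per address and would authenticate the REGISTER.

```python
import time


class Registrar:
    """Toy location service: one contact per SIP address, with expiry."""

    def __init__(self):
        self._bindings = {}  # SIP address -> (contact "ip:port", expiry time)

    def register(self, address, contact, expires=3600, now=None):
        now = time.time() if now is None else now
        if expires == 0:
            # A REGISTER with an expiry of 0 removes the binding.
            self._bindings.pop(address, None)
            return
        self._bindings[address] = (contact, now + expires)

    def lookup(self, address, now=None):
        """Return the current contact, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(address)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]
```

A proxy would call `lookup("sip:alice@example.com")` when an INVITE arrives and forward the request to the returned contact, or answer with a 4xx response if nothing is registered.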

SIP Redirect Server: Responds with a redirect (3xx) pointing the UAC to the next-hop address rather than forwarding the request itself.

In practice, PBX systems (Asterisk, FreePBX) act as Registrar, Proxy, and B2BUA (Back-to-Back User Agent) simultaneously.


SIP Methods

| Method | Purpose |
|---|---|
| INVITE | Initiate a call session |
| ACK | Acknowledge receipt of a final response to INVITE |
| BYE | Terminate an established session |
| CANCEL | Cancel a pending INVITE (before it is answered) |
| REGISTER | Register current location with a SIP registrar |
| OPTIONS | Query capabilities of a UA or proxy |
| SUBSCRIBE | Subscribe to event notifications (presence, voicemail) |
| NOTIFY | Deliver event notifications to a subscriber |
| REFER | Ask a UA to transfer a call (call transfer) |
| MESSAGE | Instant message (SMS over SIP) |
| INFO | Mid-session information (DTMF tones) |
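Whatever the method, a request has the same outline: a request line followed by a handful of mandatory headers. The sketch below assembles a body-less request as text. `build_request` and all example values are hypothetical; a real UA would also generate unique branch, tag, and Call-ID values and usually add a Contact header.

```python
def build_request(method, request_uri, from_uri, to_uri,
                  call_id, cseq, via_host, branch, from_tag):
    """Assemble a minimal SIP request with no body (illustrative only)."""
    lines = [
        f"{method} {request_uri} SIP/2.0",
        # The branch parameter must start with the magic prefix "z9hG4bK".
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={from_tag}",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        # CSeq repeats the method name after the sequence number.
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    # Headers end with an empty line, exactly as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


register = build_request(
    "REGISTER", "sip:example.com",
    "sip:alice@example.com", "sip:alice@example.com",
    call_id="hypothetical-call-id-1", cseq=1,
    via_host="192.0.2.10:5060", branch="z9hG4bKexample1", from_tag="a1b2c3",
)
```

Swapping `"REGISTER"` for `"OPTIONS"` or `"BYE"` changes the request line and CSeq but leaves the structure intact, which is why the method table reads like a list of verbs over one message format.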

SIP Response Codes

Deliberately similar to HTTP:

| Range | Meaning | Example |
|---|---|---|
| 1xx | Provisional (informational) | 100 Trying, 180 Ringing, 183 Session Progress |
| 2xx | Success | 200 OK (call answered) |
| 3xx | Redirection | 302 Moved Temporarily |
| 4xx | Client error | 404 Not Found, 401 Unauthorized, 486 Busy Here, 487 Request Terminated |
| 5xx | Server error | 500 Internal Server Error, 503 Service Unavailable |
| 6xx | Global failure | 600 Busy Everywhere, 603 Decline |
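Because the class is carried by the first digit, code that handles responses usually branches on the hundreds digit rather than on individual codes. A minimal sketch, with hypothetical helper names:

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify(status: int) -> str:
    """Map a SIP status code to its response class."""
    if not 100 <= status <= 699:
        raise ValueError(f"not a SIP status code: {status}")
    return RESPONSE_CLASSES[status // 100]


def is_final(status: int) -> bool:
    """1xx responses are provisional; everything from 200 up ends the transaction."""
    return status >= 200
```

For example, `classify(486)` gives `"client error"` and `is_final(180)` is `False`, which is why a caller keeps waiting after 180 Ringing.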

A Complete SIP Call Flow

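Participants: Alice (UAC), SIP Proxy, Bob (UAS)

1. Alice -> Proxy: INVITE (SDP offer: Alice's RTP IP:port + codec list)
2. Proxy -> Bob: INVITE (proxied; Via header records path)
3. Proxy -> Alice: 100 Trying
4. Bob -> Proxy: 100 Trying
5. Bob -> Proxy: 180 Ringing (Bob's phone is ringing)
6. Proxy -> Alice: 180 Ringing (Alice hears ringback tone)
7. Bob -> Proxy: 200 OK (SDP answer: Bob's RTP IP:port + selected codec)
8. Proxy -> Alice: 200 OK
9. Alice -> Proxy: ACK (three-way handshake complete)
10. Proxy -> Bob: ACK
11. Alice <-> Bob: RTP audio stream (direct). Voice flows directly; the SIP proxy is not in the media path.
12. Alice -> Proxy: BYE (Alice hangs up)
13. Proxy -> Bob: BYE
14. Bob -> Proxy: 200 OK
15. Proxy -> Alice: 200 OK (session terminated; RTP stops)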

Notice: once the call is established, RTP flows directly between Alice and Bob — the SIP proxy is no longer in the path. This is different from older circuit-switched telephony where a central switch always processed the audio.
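Seen from Alice's side, the flow above is a small state machine driven by the responses she receives. The sketch below is a simplification under stated assumptions: `CallState` and `CallerSession` are hypothetical names, timers and retransmissions are ignored, and a 3xx response simply ends the call instead of triggering a retry.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"          # INVITE sent, nothing but 100 Trying so far
    RINGING = "ringing"          # 180 Ringing received
    ESTABLISHED = "established"  # 200 OK received and ACKed; RTP flows
    HANGING_UP = "hanging_up"    # BYE sent, waiting for its 200 OK
    TERMINATED = "terminated"


class CallerSession:
    """Caller-side (UAC) view of one call; records what it would send."""

    def __init__(self):
        self.state = CallState.IDLE
        self.sent = []

    def dial(self):
        if self.state is not CallState.IDLE:
            raise RuntimeError("call already started")
        self.sent.append("INVITE")
        self.state = CallState.CALLING

    def on_response(self, status: int):
        if self.state in (CallState.CALLING, CallState.RINGING):
            if status < 200:
                if status == 180:
                    self.state = CallState.RINGING
            else:
                # Every final response to an INVITE is acknowledged with ACK.
                self.sent.append("ACK")
                self.state = (CallState.ESTABLISHED if status < 300
                              else CallState.TERMINATED)
        elif self.state is CallState.HANGING_UP and status >= 200:
            self.state = CallState.TERMINATED

    def hang_up(self):
        if self.state is not CallState.ESTABLISHED:
            raise RuntimeError("no established call to end")
        self.sent.append("BYE")
        self.state = CallState.HANGING_UP


call = CallerSession()
call.dial()
for status in (100, 180, 200):
    call.on_response(status)
call.hang_up()
call.on_response(200)
```

After this run, `call.sent` holds INVITE, ACK, and BYE in that order and the state is `TERMINATED`. Note that no state involves the proxy once the call is established, matching the direct RTP path.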


SDP — The Negotiation Language

The INVITE body contains an SDP payload describing what Alice can send and receive:

v=0
o=alice 1234567890 1234567890 IN IP4 192.168.1.10
s=Call
c=IN IP4 192.168.1.10
t=0 0
m=audio 12340 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15

Bob’s 200 OK contains an SDP answer selecting the codec and providing his RTP endpoint. Both sides then send RTP to each other’s declared IP:port.
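The answer side of this exchange can be sketched as follows: parse the audio part of the offer, then pick the first offered codec that the answerer also supports. The function names are hypothetical, the offer is the one shown above, and the parser handles only a single audio stream with a session-level `c=` line.

```python
OFFER = "\r\n".join([
    "v=0",
    "o=alice 1234567890 1234567890 IN IP4 192.168.1.10",
    "s=Call",
    "c=IN IP4 192.168.1.10",
    "t=0 0",
    "m=audio 12340 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=fmtp:101 0-15",
]) + "\r\n"


def parse_audio_offer(sdp: str) -> dict:
    """Extract the RTP address, port, and codec list from an SDP offer."""
    ip, port, payloads, rtpmap = None, None, [], {}
    for line in sdp.splitlines():
        if line.startswith("c="):
            ip = line.split()[-1]               # c=IN IP4 <address>
        elif line.startswith("m=audio"):
            fields = line[2:].split()           # audio <port> RTP/AVP <pt> ...
            port = int(fields[1])
            payloads = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            rtpmap[int(pt)] = encoding
    return {"ip": ip, "port": port, "payloads": payloads, "rtpmap": rtpmap}


def choose_codec(offer: dict, supported: set):
    """Return (payload type, encoding) for the first offered codec we support."""
    for pt in offer["payloads"]:                # offer order = preference order
        encoding = offer["rtpmap"].get(pt)
        if encoding in supported:
            return pt, encoding
    return None


offer = parse_audio_offer(OFFER)
selected = choose_codec(offer, {"PCMA/8000", "telephone-event/8000"})
```

Here Bob is assumed to support only PCMA and DTMF events, so `selected` is payload type 8 (PCMA/8000), and Bob would send his RTP to 192.168.1.10 port 12340 as declared in the offer.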


NAT Traversal — The Hard Problem

SIP was designed assuming end-to-end IP connectivity. NAT breaks this because the SIP UA inside NAT puts its private IP in the SDP c= line — the remote party cannot reach a private IP. Solutions:

STUN (Session Traversal Utilities for NAT): The UA discovers its public IP/port by querying a STUN server. It then inserts the public IP in SDP.

TURN (Traversal Using Relays around NAT): When STUN fails (symmetric NAT), media is relayed through a TURN server. Higher latency, but works in all NAT scenarios.

ALG (Application Layer Gateway): The NAT device understands SIP and rewrites the private IP in the SIP headers and SDP body. This frequently causes problems, because SIP ALGs on consumer routers are notoriously buggy. The standard advice: disable SIP ALG.

SIP Proxy in the media path (B2BUA): The proxy terminates both legs of the call and relays media. Adds latency but eliminates all NAT problems since both UAs communicate with the proxy’s public IP.
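Both the problem and the STUN fix can be sketched in a few lines. The first helper flags an SDP whose `c=` line carries a non-routable address; the other two build a STUN Binding Request header and decode the XOR-MAPPED-ADDRESS attribute from the reply. The function names are hypothetical, only IPv4 is handled, and no packet is actually sent: a real client would send the request over UDP to a STUN server and walk the attributes in the response.

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value in every STUN message header
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020    # attribute type carrying the public address


def sdp_address_is_private(sdp: str) -> bool:
    """True if the SDP c= line advertises an address peers cannot route to."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # is_private covers RFC 1918 space plus other special-use ranges.
            return ipaddress.ip_address(line.split()[-1]).is_private
    return False


def build_binding_request(transaction_id: bytes = None) -> bytes:
    """20-byte STUN header: type, length 0, magic cookie, 12-byte transaction ID."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_address(value: bytes):
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)        # port XOR top 16 cookie bits
    addr = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
    return str(addr), port
```

A UA that finds `sdp_address_is_private(...)` true would run the STUN exchange first and write the decoded address and port into its `c=` and `m=` lines before sending the INVITE.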


SRTP — Encrypted Voice

Plain RTP carries voice as unencrypted packets, so anyone with network access can capture and replay calls. SRTP (Secure RTP, RFC 3711) adds encryption, message authentication, and replay protection to RTP streams.

SRTP keys are negotiated via SDES (SDP Security Descriptions) in the SIP INVITE — embedded in the SDP body, which is why SIP itself must also be encrypted (SIPS/TLS) when SDES is used. The modern alternative, DTLS-SRTP, negotiates keys directly in the DTLS handshake without exposing them in SDP.
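To see why SDES depends on an encrypted signalling channel, consider what the SDP actually carries: an `a=crypto` attribute whose `inline:` parameter is the base64 of the master key followed by the master salt. The sketch below parses one such line. `parse_crypto_line` is a hypothetical helper, the key is dummy data, and the 16-byte key plus 14-byte salt split assumes the `AES_CM_128_HMAC_SHA1_80` suite.

```python
import base64


def parse_crypto_line(line: str) -> dict:
    """Parse a=crypto:<tag> <suite> inline:<base64 key||salt>[|lifetime][|MKI]."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    tag, suite, key_params = line[len(prefix):].split()[:3]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled")
    # Optional lifetime and MKI fields follow the key, separated by '|'.
    key_salt = base64.b64decode(key_info.split("|")[0])
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": key_salt[:16],    # 128-bit AES key for this suite
        "master_salt": key_salt[16:],   # 112-bit salt for this suite
    }


# Dummy 30-byte key material, for illustration only.
dummy = base64.b64encode(bytes(range(30))).decode()
parsed = parse_crypto_line(f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{dummy}")
```

Anyone who can read the INVITE can run the same few lines and recover the key, which is the exposure that TLS on the signalling path, or DTLS-SRTP, is meant to close.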

