Recently I had to use WebRTC for a simple project. The technology itself has many advantages and is being developed as an open standard, with no plugins required. However, I was quite new to WebRTC and had some problems getting my head around the basic concepts, as well as creating a working solution. There are many tutorials available (like this one, which inspired my solution), but most of them are incomplete, obsolete, or forced me to use third-party services (e.g. Google Firebase) that only made the whole process more complicated to set up and harder to understand.

I decided to put together the information from all those resources and create a simple, working example of a WebRTC application. It does not require any third-party services, unless you want to use it over a public network (in which case owning a server really helps). I hope it provides a good starting point for everyone interested in exploring WebRTC.

This is not going to be a full tutorial on WebRTC. You can find plenty of tutorials and detailed explanations all over the internet, for example here. You can also check the WebRTC API if you want more information. This post is just going to show you one possible working example of WebRTC and explain how it works.

General description

Full source code of this example is available on GitHub. The program consists of three parts:

  • web application
  • signaling server
  • TURN server

The web application is very simple: one HTML file and one JavaScript file (plus one dependency: socket.io.js, which is included in the repository). It is designed to work with only two clients (two web browsers or two tabs of the same browser). When you open it in your browser (tested on Firefox 74), it will ask for permission to use your camera and microphone. Once permission is granted, the video and audio from each of the tabs will be streamed to the other one.

WebRTC application in action

Note: you might experience some problems if you try to access the same camera from both tabs. While testing on my machine, I used two devices (a built-in laptop camera and a USB webcam).

The signaling server is used by WebRTC applications to exchange the information required to create a direct connection between peers. You can choose any technology you want for this. This example uses websockets (python-socketio on the backend and socket.io-client on the frontend).

The TURN server is required if you want to use this example over a public network. The process is described later in this post. For local network testing, you will not need it.

Signaling

The signaling server is written in Python3 and looks like this:

from aiohttp import web
import socketio

ROOM = 'room'

sio = socketio.AsyncServer(cors_allowed_origins='*')
app = web.Application()
sio.attach(app)


@sio.event
async def connect(sid, environ):
    print('Connected', sid)
    # Notify clients already in the room that a new peer has arrived,
    # then add the new connection to the room.
    await sio.emit('ready', room=ROOM, skip_sid=sid)
    sio.enter_room(sid, ROOM)


@sio.event
def disconnect(sid):
    sio.leave_room(sid, ROOM)
    print('Disconnected', sid)


@sio.event
async def data(sid, data):
    # Relay signaling payloads (offers, answers, ICE candidates)
    # to every other client in the room.
    print('Message from {}: {}'.format(sid, data))
    await sio.emit('data', data, room=ROOM, skip_sid=sid)


if __name__ == '__main__':
    web.run_app(app, port=9999)

Every client joins the same room. Before entering the room, a ready event is sent to all clients currently in the room. That means that the first websocket connection will not get any message (the room is empty), but when the second connection is established, the first one will receive a ready event, signaling that there are at least two clients in the room and the WebRTC connection can start. Other than that, this server will forward any data (data event) that is sent by one websocket to the other one.

Setup is quite simple:

cd signaling
pip install aiohttp python-socketio
python server.py

This will start the signaling server at localhost:9999.
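
If you want to sanity-check the server before wiring up the browser code, a quick way is a small Node.js script. This is just a hypothetical helper (not part of the repository), assuming you have Node.js and a socket.io-client version compatible with your python-socketio installed:

// check_signaling.js -- hypothetical helper, not part of the repository.
// Run it in two terminals: the first process should log 'ready'
// as soon as the second one connects, mirroring the room logic above.
const io = require('socket.io-client');

const socket = io('http://localhost:9999');

socket.on('connect', () => console.log('Connected to signaling server'));
socket.on('ready', () => console.log('Another client joined: ready'));
socket.on('data', (data) => console.log('Relayed data:', data));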

WebRTC

The simplified process of using WebRTC in this example looks like this:

  • both clients obtain their local media streams
  • once the stream is obtained, each client connects to the signaling server
  • once the second client connects, the first one receives a ready event, which means that the WebRTC connection can be negotiated
  • the first client creates an RTCPeerConnection object and sends an offer to the second client
  • the second client receives the offer, creates an RTCPeerConnection object, and sends an answer
  • more information is also exchanged, like ICE candidates
  • once the connection is negotiated, a callback for receiving a remote stream is called, and that stream is used as the source of the video element
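
Before diving into the code, it may help to see this flow as a message sequence (a simplified sketch; A is the first client to connect, B the second):

// A → server: websocket connect            (room is empty, nothing happens)
// B → server: websocket connect            → server emits 'ready' to A
// A → B (via 'data'): { type: 'offer', ... }
// B → A (via 'data'): { type: 'answer', ... }
// A ↔ B (via 'data'): { type: 'candidate', ... }   (repeated per candidate)
// A ↔ B: media flows directly once the connection is established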

If you want to run this example on localhost, the signaling server and the web application are all you need. The main part of the HTML file is a single video element (whose source is going to be set later by the script):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>WebRTC working example</title>
</head>
<body>
    <video id="remoteStream" autoplay playsinline></video>
    <script src="socket.io.js"></script>
    <script src="main.js"></script>
</body>
</html>

The JavaScript part is a bit more complicated, and I'll explain it step by step. First, there are the config variables:

// Config variables
const SIGNALING_SERVER_URL = 'http://localhost:9999';
const PC_CONFIG = {};

For localhost, PC_CONFIG can stay empty, and SIGNALING_SERVER_URL should point to the signaling server you started in the previous step.
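
As a side note, PC_CONFIG is where ICE servers will eventually go. Just as an illustration (not needed on localhost), a configuration using Google's well-known public STUN server would look like this; the TURN variant used later in this post follows the same pattern:

// Example only: a STUN server lets peers discover their public addresses.
// Not needed for localhost; see the "Going global" section for TURN.
const PC_CONFIG = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' }
  ]
};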

Next, we have the signaling methods:

let socket = io(SIGNALING_SERVER_URL, { autoConnect: false });

socket.on('data', (data) => {
  console.log('Data received: ', data);
  handleSignalingData(data);
});

socket.on('ready', () => {
  console.log('Ready');
  createPeerConnection();
  sendOffer();
});

let sendData = (data) => {
  socket.emit('data', data);
};

In this example, we want to connect to the signaling server only after we obtain the local media stream, so we need to set { autoConnect: false }. Other than that, we have a sendData method that emits a data event, and we react to incoming data events by handling the information appropriately (more on that later). Receiving a ready event means that both clients have obtained their local media streams and connected to the signaling server, so we can create a connection on our side and negotiate an offer with the remote side.

Next, we have the WebRTC related variables:

let pc;
let localStream;
let remoteStreamElement = document.querySelector('#remoteStream');

The pc will hold our peer connection, localStream is the stream we obtain from the browser, and remoteStreamElement is the video element that we will use to display the remote stream.

To get the media stream from the browser, we will use getLocalStream method:

let getLocalStream = () => {
  navigator.mediaDevices.getUserMedia({ audio: true, video: true })
    .then((stream) => {
      console.log('Stream found');
      localStream = stream;
      // Connect only after the local stream is available
      socket.connect();
    })
    .catch(error => {
      console.error('Stream not found: ', error);
    });
};

As you can see, we are going to connect to the signaling server only after the stream (audio and video) is obtained. Please note that all of the WebRTC-related APIs used here (navigator.mediaDevices, RTCPeerConnection, etc.) are provided by the browser and do not require you to install anything.
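
If both tabs end up competing for the same camera (the problem mentioned at the beginning), one workaround is to request a specific device. Here is a rough sketch using the standard enumerateDevices API; the device index is just an example and depends on your hardware:

// Sketch: list the available video inputs and request one by deviceId.
// Note: device labels may be empty until camera permission is granted.
navigator.mediaDevices.enumerateDevices()
  .then((devices) => {
    const cameras = devices.filter((device) => device.kind === 'videoinput');
    console.log(cameras.map((camera) => camera.label));
    // Pick a specific camera (here: the second one) for this tab.
    return navigator.mediaDevices.getUserMedia({
      audio: true,
      video: { deviceId: { exact: cameras[1].deviceId } }
    });
  })
  .then((stream) => {
    // Use the stream exactly as in getLocalStream above.
    localStream = stream;
    socket.connect();
  });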

Creating a peer connection is relatively easy:

let createPeerConnection = () => {
  try {
    pc = new RTCPeerConnection(PC_CONFIG);
    pc.onicecandidate = onIceCandidate;
    pc.onaddstream = onAddStream;
    pc.addStream(localStream);
    console.log('PeerConnection created');
  } catch (error) {
    console.error('PeerConnection failed: ', error);
  }
};

The two callbacks we are going to use are onicecandidate (called when the local ICE agent gathers a candidate, which we then forward to the other client), and onaddstream (called when the remote peer's media stream becomes available on our connection).
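
A small caveat: onaddstream and addStream are deprecated (they still work in Firefox, which this example was tested on). If you prefer the modern, track-based API, a sketch of the equivalent wiring using ontrack and addTrack would look like this:

// Modern, track-based equivalent of addStream/onaddstream.
pc = new RTCPeerConnection(PC_CONFIG);
pc.onicecandidate = onIceCandidate;
pc.ontrack = (event) => {
  // event.streams[0] is the remote MediaStream the track belongs to.
  remoteStreamElement.srcObject = event.streams[0];
};
localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));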

Next we have the offer and answer logic:

let sendOffer = () => {
  console.log('Send offer');
  pc.createOffer().then(
    setAndSendLocalDescription,
    (error) => { console.error('Send offer failed: ', error); }
  );
};

let sendAnswer = () => {
  console.log('Send answer');
  pc.createAnswer().then(
    setAndSendLocalDescription,
    (error) => { console.error('Send answer failed: ', error); }
  );
};

let setAndSendLocalDescription = (sessionDescription) => {
  pc.setLocalDescription(sessionDescription);
  console.log('Local description set');
  sendData(sessionDescription);
};

The details of WebRTC offer-answer negotiation are not part of this post (please check the WebRTC documentation if you want to know more about the process). It's enough to know that one side sends an offer, the other reacts to it by sending an answer, and both sides set the resulting descriptions on their corresponding peer connections.
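
For reference, the payloads travelling through the signaling server are plain JSON-serialized objects, roughly shaped like this (values truncated for brevity):

// Shape of the messages relayed by the signaling server:
//   { type: 'offer',  sdp: 'v=0\r\n...' }   -- produced by createOffer
//   { type: 'answer', sdp: 'v=0\r\n...' }   -- produced by createAnswer
//   { type: 'candidate', candidate: { candidate: '...', sdpMid: ..., sdpMLineIndex: ... } }
// handleSignalingData (shown below) dispatches on the type field.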

The WebRTC callbacks look like this:

let onIceCandidate = (event) => {
  if (event.candidate) {
    console.log('ICE candidate');
    sendData({
      type: 'candidate',
      candidate: event.candidate
    });
  }
};

let onAddStream = (event) => {
  console.log('Add stream');
  remoteStreamElement.srcObject = event.stream;
};

Locally gathered ICE candidates are sent to the other client, and when the remote media stream becomes available, we use it as the source for our video element.

The last method is used to handle incoming data:

let handleSignalingData = (data) => {
  switch (data.type) {
    case 'offer':
      createPeerConnection();
      pc.setRemoteDescription(new RTCSessionDescription(data));
      sendAnswer();
      break;
    case 'answer':
      pc.setRemoteDescription(new RTCSessionDescription(data));
      break;
    case 'candidate':
      pc.addIceCandidate(new RTCIceCandidate(data.candidate));
      break;
  }
};

When we receive an offer, we create our own peer connection (the remote one is ready at that point). Then, we set the remote description and send an answer. When we receive the answer, we just set the remote description of our peer connection. Finally, when an ICE candidate is sent by the other client, we add it to our peer connection.
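
One detail worth knowing: setRemoteDescription returns a promise. The code above generally works because browsers queue peer connection operations internally, but a slightly more defensive variant of the offer branch waits explicitly and surfaces errors:

// A more defensive variant of the 'offer' branch: wait for
// setRemoteDescription to resolve before creating the answer.
case 'offer':
  createPeerConnection();
  pc.setRemoteDescription(new RTCSessionDescription(data))
    .then(() => sendAnswer())
    .catch((error) => console.error('Set remote description failed: ', error));
  break;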

And finally, to actually start the WebRTC connection, we just need to call getLocalStream:

// Start connection
getLocalStream();

Running on localhost

If you started the signaling server in the previous step, you just need to host the HTML and JavaScript files, for example like this:

cd web
python -m http.server 7000

Then, open two tabs in your browser (or in two different browsers), and enter localhost:7000. As mentioned before, it is best to have two cameras available for this example to work. If everything goes well, you should see one video feed in each of the tabs.

Beyond localhost

You might be tempted to use this example on two different computers in your local network, replacing localhost with your machine's IP address, e.g. 192.168.0.11. You will quickly notice that it doesn't work, and your browser claims that navigator.mediaDevices is undefined.

That happens because WebRTC is designed to be secure, which means it requires a secure context to work. Simply put: all of the resources (in our case the HTTP server and the signaling server) have to be hosted either on localhost or over HTTPS. If the context is not secure, navigator.mediaDevices will be undefined, and you will not be allowed to access user media.

If you want to test this example on different machines, using localhost is obviously not an option. Setting up certificates is not a part of this post, and it is not an easy task at all. If you just want to quickly check this example on two different computers, you can use a simple trick. Instead of hosting the resources over HTTPS, you can enable insecure context in Firefox. Go to about:config, accept the risk, and change the values of these two variables to true:

  • media.devices.insecure.enabled
  • media.getusermedia.insecure.enabled

It should look like this:

Firefox insecure context enabled

Now you should be able to access the web application on two different computers, and the WebRTC connection should be properly established.

Going global

You can use this example over a public network, but it's going to require a bit more work. First, you need to set up a TURN server. Simply put, a TURN server relays traffic between peers when a direct connection cannot be established, which is often the case over a public network (due to NATs and firewalls). Unfortunately, for this step you will need a publicly visible server. The good news is that once you have your own server, the setup process is quite easy (at least for an Ubuntu-based OS). I've found a lot of useful information in this discussion on Stack Overflow, and I'm just going to copy the most important bits here:

sudo apt install coturn
turnserver -a -o -v -n --no-dtls --no-tls -u username:credential -r realmName

This will start a TURN server using port 3478. The flags mean:

  • -a: use the long-term credential mechanism
  • -o: start process as daemon (detach from current shell)
  • -v: ‘Moderate’ verbose mode
  • -n: do not use configuration file, take all parameters from the command line only
  • --no-dtls: do not start DTLS client listeners
  • --no-tls: do not start TLS client listeners
  • -u: user account, in form ‘username:password’, for long-term credentials
  • -r: the default realm to be used for the users

EDIT: To check if your TURN server setup is correct, you can use this validator. To test the example above, input the username and password that were passed to the -u flag when starting the server:

  • STUN or TURN URI: turn:{YOUR_SERVER_IP}:3478
  • TURN username: username
  • TURN password: credential

Click “Add Server”, remove other servers, and select “Gather candidates”. If you get a component of type relay, that means your setup is working.

Next, you need to change the peer connection configuration a bit. Edit main.js, replacing {PUBLIC_IP} with the actual IP of your server:

const TURN_SERVER_URL = '{PUBLIC_IP}:3478';
const TURN_SERVER_USERNAME = 'username';
const TURN_SERVER_CREDENTIAL = 'credential';

const PC_CONFIG = {
  iceServers: [
    {
      urls: 'turn:' + TURN_SERVER_URL + '?transport=tcp',
      username: TURN_SERVER_USERNAME,
      credential: TURN_SERVER_CREDENTIAL
    },
    {
      urls: 'turn:' + TURN_SERVER_URL + '?transport=udp',
      username: TURN_SERVER_USERNAME,
      credential: TURN_SERVER_CREDENTIAL
    }
  ]
};

Of course, you will also have to host your signaling server and the web application itself on a publicly visible server, and you need to change SIGNALING_SERVER_URL appropriately. Once that is done, this example should work for any two machines connected to the internet.
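
For example, assuming the signaling server runs on the same machine as the TURN server (and keeping in mind the secure-context caveats from the previous section), the config would change to something like:

// {PUBLIC_IP} is a placeholder, as before.
const SIGNALING_SERVER_URL = 'http://{PUBLIC_IP}:9999';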

Conclusion

This is just one example of what you can do with WebRTC. The technology is not limited to audio and video; it can also be used to exchange arbitrary data. I hope this post will help you get started and work on your own ideas. And, of course, if you have any questions or find any errors, don't hesitate to contact me!