wget - segfault: Summary

Hi everyone, winw recently showed me something pretty cool. In a terminal, type the command:

$ wget -r %3a
Segmentation fault

You’ll get a segfault.

This is all the more interesting since wget is a very widely used binary. Bugs like this are rare! We then wondered what we could do with it, and whether we could fix it. So we didn’t stop there, and we looked for the cause of the problem.

Debug environment

For that, we armed ourselves with the trusty old gdb, as well as the sources of the latest version of wget (1.16.3) available here:

http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz

First, we recompiled the binary to have a non-stripped version and thus have access to the symbols. In the wget sources folder:

$ ./configure --prefix=/home/hackndo/wget
$ make && sudo make install

Reproducing the bug

Next we triggered the segfault in gdb and displayed the backtrace to find where the problem is:

gdb$ r -r %3a
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
-----------------------------------------------------------------------------------------------------------------------[regs]
RAX: 0x0000000000000006  RBX: 0x0000000000000000  RBP: 0x000000000065FFE0  RSP: 0x00007FFFFFFFDF10  o d I t s z a p c
RDI: 0x00000000FFFFFFFF  RSI: 0x00007FFFF7FF7000  RDX: 0x00007FFFF799CDF0  RCX: 0x00007FFFF76E59D0  RIP: 0x0000000000421ADB
R8 : 0x00007FFFF7FF7001  R9 : 0x00007FFFF7FE9700  R10: 0x0000000000000000  R11: 0x0000000000000246  R12: 0x000000000065FFB0
R13: 0x000000000065F950  R14: 0x00007FFFFFFFE05A  R15: 0x00007FFFFFFFE060
CS: 0033  DS: 0000  ES: 0000  FS: 0000  GS: 0000  SS: 002B
-----------------------------------------------------------------------------------------------------------------------

=> 0x421adb <getproxy+27>:  mov    esi,DWORD PTR [rbx+0x18]
   0x421ade <getproxy+30>:  mov    edi,0x44af12
   0x421ae3 <getproxy+35>:  xor    eax,eax
   0x421ae5 <getproxy+37>:  call   0x402f10 <printf@plt>
   0x421aea <getproxy+42>:  mov    rdi,QWORD PTR [rip+0x23b2bf]        # 0x65cdb0 <opt+304>
   0x421af1 <getproxy+49>:  mov    rsi,QWORD PTR [rbx+0x10]
   0x421af5 <getproxy+53>:  test   rdi,rdi
   0x421af8 <getproxy+56>:  je     0x421b03 <getproxy+67>
-----------------------------------------------------------------------------------------------------------------------------
0x0000000000421adb in getproxy ()
gdb$ bt
#0  0x0000000000421adb in getproxy ()
#1  0x00000000004226fa in retrieve_url ()
#2  0x00000000004204a0 in retrieve_tree ()
#3  0x0000000000404168 in main ()

Finding the cause

The segfault occurs in the getproxy function located in retr.c:

getproxy (struct url *u)

After some digging, we noticed that u, the pointer to a url structure, is null, and therefore at the line:

if (no_proxy_match (u->host, (const char **)opt.no_proxy))

the attempt to access the host field of the structure causes the segfault.
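To make the link with the register dump a bit more concrete, here is a minimal sketch of how a field access through a struct pointer turns into "base address plus offset". The struct toy_url layout and the field_address helper are assumptions made up for illustration; they are not copied from wget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not wget's definition). */
struct toy_url {
    char *url;
    int   scheme;
    char *host;
    int   port;
};

/* Address the CPU would read for a field located `off` bytes into a
   structure that starts at `base`. */
static uintptr_t field_address(uintptr_t base, size_t off)
{
    assert(base <= UINTPTR_MAX - off);   /* no wraparound */
    return base + off;
}
```

With a null base, the address read is simply the field's offset. That matches the shape of the faulting instruction above, which reads [rbx+0x18] while RBX is 0: a very low address that is not mapped in the process, hence the SIGSEGV.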

Great, we have isolated the cause of the segfault. However, how is it that the u pointer passed to getproxy is null? Let’s go back up the backtrace a bit.

In retrieve_url, still in the same file:

uerr_t retrieve_url (struct url * orig_parsed, const char *origurl, char **file,
char **newloc, const char *refurl, int *dt, bool recursive,
struct iri *iri, bool register_status)

We see the call to getproxy:

proxy = getproxy (u);

And we see above that u is defined like this:

struct url *u = orig_parsed;

By placing a breakpoint at the entry of the retrieve_url function, we realize that the orig_parsed parameter is already a null pointer. We continue and go one more level up the backtrace, to look at the retrieve_tree function located in the file recur.c:

uerr_t retrieve_tree (struct url *start_url_parsed, struct iri *pi)

We see the call to the retrieve_url function here:

status = retrieve_url (url_parsed, url, &file, &redirected, referer,
&dt, false, i, true);

We established that orig_parsed is null inside retrieve_url, so the url_parsed argument passed here is null. This pointer is defined one line above:

struct url *url_parsed = url_parse (url, &url_err, i, true);

This time, none of the parameters passed to url_parse are null, so it is url_parse itself that returns a null pointer. By placing a breakpoint right after the call to this function, we can see what is in url_err: the number 8.

Error code 8 is defined in the file url.c (which contains the url_parse function):

#define PE_INVALID_IPV6_ADDRESS         8

Indeed, in the url_parse function, we have the following check:

/* Check if the IPv6 address is valid. */
if (!is_valid_ipv6_address(host_b, host_e))
{
    error_code = PE_INVALID_IPV6_ADDRESS;
    goto error;
}

/* Continue parsing after the closing ']'. */

Recall that the argument we passed to wget was -r %3a, where %3a is the percent-encoded (URL-encoded) form of the character : (ASCII code 0x3a). Upstream, wget detected our : and therefore treated the argument as an IPv6 address. Since it is invalid, is_valid_ipv6_address() returns false, and we get the error code. Up to this point, everything is working as the developers intended.
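As a small aside, the mapping from %3a to : can be sketched in a few lines of C. The percent_decode function below is a hypothetical helper written for this post, not wget's own decoding routine, and it assumes the destination buffer is at least as large as the source string plus its terminator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not wget code): copy src into dst, replacing each
   %XX escape with the byte whose hexadecimal value is XX. A '%' that is
   not followed by two hex digits is copied unchanged.
   dst must have room for strlen(src) + 1 bytes.
   Returns the length of the decoded string. */
static size_t percent_decode(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        if (src[0] == '%'
            && isxdigit((unsigned char)src[1])
            && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            dst[n++] = (char)strtol(hex, NULL, 16);
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Calling percent_decode("%3a", out) leaves a single : in out, which is the character that ends up being interpreted as the start of an IPv6 address.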

The mistake is in the file recur.c with these lines:

struct url *url_parsed = url_parse (url, &url_err, i, true);
status = retrieve_url (url_parsed, url, &file, &redirected, referer,
&dt, false, i, true);

There is no check made on the return of the url_parse function, and the url_parsed pointer is used without checking whether it is null or not.
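The general pattern can be reproduced in a toy model: a parser that reports failure by returning NULL and writing an error code through an out parameter, a function that dereferences its argument without checking it, and a caller that has to make the check itself. Every name below (toy_url, toy_parse, toy_host_length, toy_retrieve, the TOY_* codes) is invented for this sketch, and the rule "reject anything containing a colon" is an arbitrary stand-in for wget's real validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins; none of these are wget's real types or functions. */
enum { TOY_OK = 0, TOY_NOMEM = 1, TOY_INVALID = 8 };

struct toy_url { char *host; };

/* Returns NULL and sets *err when the input is rejected. */
static struct toy_url *toy_parse(const char *s, int *err)
{
    struct toy_url *u;

    assert(s != NULL && err != NULL);
    if (s[0] == '\0' || strchr(s, ':') != NULL) {
        *err = TOY_INVALID;
        return NULL;
    }
    u = malloc(sizeof *u);
    if (u == NULL) { *err = TOY_NOMEM; return NULL; }
    u->host = malloc(strlen(s) + 1);
    if (u->host == NULL) { free(u); *err = TOY_NOMEM; return NULL; }
    strcpy(u->host, s);
    *err = TOY_OK;
    return u;
}

static void toy_free(struct toy_url *u)
{
    if (u != NULL) { free(u->host); free(u); }
}

/* Dereferences its argument without checking it, so callers must never
   pass NULL. */
static size_t toy_host_length(const struct toy_url *u)
{
    return strlen(u->host);
}

/* Caller that checks the parse result before using it.
   Returns the host length, or -1 if parsing failed (see *err). */
static long toy_retrieve(const char *s, int *err)
{
    struct toy_url *u = toy_parse(s, err);
    long len;

    if (u == NULL)
        return -1;              /* the check that was missing */
    len = (long)toy_host_length(u);
    toy_free(u);
    return len;
}
```

Removing the u == NULL test in toy_retrieve would send a null pointer into toy_host_length for the input ":", which is the same shape as the crash analysed here.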

So, logically, we get a segfault. From our point of view, this oversight does not allow any exploitation, but it was an interesting analysis. A fix is to check that the url_parse function returned a non-null pointer, in the following way:

struct url *url_parsed = url_parse (url, &url_err, i, true);

if (!url_parsed)
{
    char *error = url_error (url, url_err);
    logprintf (LOG_NOTQUIET, "%s: %s.\n", url, error);
    xfree (error);
    inform_exit_status (URLERROR);
}
else
{
    status = retrieve_url (url_parsed, url, &file, &redirected, referer,
    &dt, false, i, true);
    // [...]

We also submitted a fix to GNU. We’ll see if it gets accepted!

This issue does not exist if the -r parameter is omitted, since this missing check is only in the recur.c file, and nowhere else.

Fixing the bug

We sent a fix that was accepted and has been merged into the master branch! There you go, a little contribution to the free software world, it feels good :)