Format String Attacks
Tim Newsham
Guardent, Inc.
September 2000
Copyright (c) 2000. All Rights Reserved
Translation: xuzq@chinasafer.com
2000/9/13
Contents:
Introduction
What Is a Format String Attack?
Printf: What School Forgot to Teach You
A Simple Example
Format Me!
X MARKS THE SPOT (X is the variable in this paper's sample program that we try to overwrite)
So What?
Abstract
This paper discusses the causes and implications of format string vulnerabilities and gives practical examples to illustrate the principles.
Introduction
I know it has happened to you. It happens to all of us at one point or another. You are at a trendy dinner party, and amid the raised voices of your colleagues you catch the words "format string attack." "Format string attack? What is a format string attack?" you ask yourself. Afraid of exposing your ignorance in front of your peers, you put on an uneasy smile and nod along as if you knew the subject inside out. If all goes well, a few cocktails will go by, the conversation will move on, and no one will be the wiser. Well, fear no more! This paper covers everything you wanted to know about format string attacks but were too embarrassed to ask.
What Is a Format String Attack?
Format string bugs, like many other security holes, come from programmer laziness. As you read this, there is probably a programmer somewhere writing code whose task is to print a string or copy it into some buffer. He could write:
printf("%s", str);
But to save time and effort, and six bytes of source code, he writes this instead:
printf(str);
Why not? Why bother with the extra printf argument and the time it takes to parse that silly format? The first argument to printf gets printed anyway! Because the programmer has just unknowingly opened a security hole that lets an attacker control the execution of the program, that's why.
What did the programmer do wrong? He passed in a string that he wanted printed verbatim. Instead, printf interprets that string as a format string and scans it for special format characters such as "%d". Each time a format is encountered, an argument value is fetched from the stack, so the number of values fetched depends on the string, not on what the caller actually passed. At the very least, it should be obvious that an attacker can peek into the program's memory by printing out the values stored on the stack. What is less obvious is that this simple mistake also allows arbitrary values to be written into the memory of the running program.
Printf: What School Forgot to Teach You
Before getting into how printf can be abused for our own purposes, we should take a closer look at the features printf provides. The reader is assumed to have used printf before and to know the ordinary formatting features, such as how to print integers and strings and how to specify minimum and maximum string widths. Beyond these ordinary features there are some esoteric, little-known ones. Of these, the following are the most useful to us:
* The number of characters output so far can be obtained at any point in a format string. When "%n" is encountered in the format string, the number of characters output before the %n field is stored at the address passed as the next argument. For example, to get the offset just past the space between two formatted numbers:
int pos, x = 235, y = 93;
printf("%d %n%d\n", x, &pos, y);
printf("The offset was %d\n", pos);
* The %n format reports the number of characters that should have been output, not the number actually output. When a string is formatted into a fixed-size buffer, the output may be truncated. Regardless of truncation, %n stores the offset (output character count) the string would have reached had it not been truncated. To illustrate, the following code prints 100, not 20:
char buf[20];
int pos, x = 0;
snprintf(buf, sizeof buf, "%.100d%n", x, &pos);
printf("position: %d\n", pos);
A Simple Example
Rather than dwell on abstract and complicated theory, we will use a concrete example to illustrate the principles just discussed. The following simple program serves the purpose:
/*
 * fmtme.c
 *       Format a value into a fixed-size buffer
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
    char buf[100];
    int x;

    if(argc != 2)
        exit(1);
    x = 1;
    /* the deliberate bug: argv[1] is used as the format string */
    snprintf(buf, sizeof buf, argv[1]);
    buf[sizeof buf - 1] = 0;
    printf("buffer (%d): %s\n", (int)strlen(buf), buf);
    printf("x is %d/%#x (@ %p)\n", x, x, (void *)&x);
    return 0;
}
A few notes about this program are in order. First, its purpose is simple: a value passed on the command line is formatted into a fixed-length buffer, with care taken that the buffer's size limit is not exceeded. After the buffer is formatted, it is output. In addition to formatting the argument, the program sets an integer variable and later outputs it. This variable will be the target of our attacks. For now, note that its value should always be one.
All the examples in this paper were run on an x86 BSD/OS 4.1 machine. If you have been on a mission in Mozambique for the last 20 years and are unfamiliar with the x86, it is a little-endian machine. This determines how multi-byte numbers are written out as byte sequences in the examples. The specific values used here will vary from system to system with differences in architecture, operating system, environment and even command-line length. With simple adjustments, the examples should work on other x86 platforms. With some effort, they can be made to work on other architectures as well.
Format Me!
It is now time to put on our black hats and start thinking like attackers. We have a test program in hand. We know it has a vulnerability, and we know where the programmer made the mistake (passing the user-supplied command-line argument directly to snprintf as the format string). We also have in-depth knowledge of printf and know how to apply it. Let's start tinkering with our program.
Starting simple, we invoke the program with a plain argument:
% ./fmtme "hello world"
buffer (11): hello world
x is 1/0x1 (@ 0x804745c)
Nothing special has happened yet. The program formatted our string into the buffer and then printed its length and contents. It also told us that the variable x has the value 1 (shown in decimal and in hex) and that it is stored at address 0x804745c.
Next, let's try some format directives. In the following example we print the integers that sit on the stack above the format string:
% ./fmtme "%x %x %x %x"
buffer (15): 1 f31 1031 3133
x is 1/0x1 (@ 0x804745c)
A quick analysis of the program reveals the layout of the stack at the time of the snprintf call:
Address   Contents         Description
fp+8      Buffer pointer   4-byte address
fp+12     Buffer length    4-byte integer
fp+16     Format string    4-byte address
fp+20     Variable x       4-byte integer
fp+24     Variable buf     100 characters
(Translator's note: I only understood the table above after consulting the article "缓冲区溢出机理分析" (an analysis of buffer overflow mechanisms). Briefly, when a function call occurs, the machine does the following: first, the arguments are pushed onto the stack; next, the contents of the instruction pointer (IP) are saved as the return address (RET); the third thing pushed is the base register (FP); the current stack pointer (SP) is then copied into FP to serve as the new base address; finally, space is reserved for local variables by subtracting an appropriate amount from SP.
----------------------------------------------------------------------
When snprintf() is called, the stack looks like this:
low memory                                                      high memory
  locals   sfp   ret   buf   sizeof(buf)   argv[1]   x and buf
<- [   ]   [ ]   [ ]   [ ]   [         ]   [     ]   data area
top of stack                                            bottom of stack
)
The four values output in the previous test run (1 f31 1031 3133) are the next four arguments on the stack after the format string: the variable x, followed by three 4-byte integers (uninitialized).
Now for the main event. As attackers, we control the values stored in the buffer, and these values are also passed as arguments to the snprintf call! Let's check this with a test:
% ./fmtme "aaaa %x %x"
buffer (15): aaaa 1 61616161
x is 1/0x1 (@ 0x804745c)
Yup! The four 'a' characters we supplied were copied to the start of the buffer and then interpreted by snprintf as an integer argument with the value 0x61616161 ('a' is 0x61 in ASCII).
X MARKS THE SPOT
Everything is now in place, and it is time to move from passively probing the program to actively changing its state. Remember the variable "x"? Let's try to change its value. To do this, we have to step past the first snprintf argument, which is the variable x, and then use a %n format to write to an address that we specify. This sounds more complicated than it is. An example should make it clear.
[Note: we use Perl here to run the program, which makes it easy to place arbitrary characters in the command-line argument]:
% perl -e 'system "./fmtme", "\x58\x74\x04\x08%d%n"'
buffer (5): X1
x is 5/0x5 (@ 0x8047458)
The value of x changed, but what exactly happened? The arguments passed to snprintf look something like this:
snprintf(buf, sizeof buf, "\x58\x74\x04\x08%d%n", x, 4 bytes from buf)
First, snprintf copies the first four bytes into buf. Next, it scans the %d format and prints the value of x. Finally, it reaches the %n directive. This directive pulls the next value off the stack, and that value comes from the first four bytes of buf. Those four bytes were just filled in with "\x58\x74\x04\x08", which read as an integer is 0x08047458. Snprintf then writes the number of bytes output so far, five, to that address (0x08047458). That address is the address of the variable x. (It differs from the 0x804745c seen earlier because the command-line argument has a different length.) This is no coincidence: we chose the value 0x08047458 carefully from our earlier examination of the program. In this case the program was helpful enough to print the address we were interested in. More typically, this value has to be found with the help of a debugger.
Well, great! We can pick an arbitrary address (well, almost arbitrary: the address must not contain any NUL bytes) and write a value to it. But can we write a useful value? Snprintf only writes the number of characters output so far. If we want to write a small value greater than four, the solution is simple: pad the format string until the count reaches the value we need. But what about large values? Here we can take advantage of the fact that %n counts the characters that should have been output, ignoring truncation:
% perl -e 'system "./fmtme", "\x54\x74\x04\x08%.500d%n"'
buffer (99): 00000 ... 0000
x is 504/0x1f8 (@ 0x8047454)
The value that %n wrote to x is 504 (the four address bytes plus 500 digits), far more than the 99 characters that fit in buf. We can supply arbitrarily large values simply by specifying a large field width (here, a precision of 500) [1]. But what about small values? We can construct arbitrary values (even zero) by combining several writes. If we write four numbers at one-byte offsets, we can build an arbitrary integer out of the least-significant bytes of the four writes. To illustrate, consider the following four writes:
Address         A      A+1    A+2    A+3    A+4    A+5    A+6
Write to A:     0x11   0x11   0x11   0x11
Write to A+1:          0x22   0x22   0x22   0x22
Write to A+2:                 0x33   0x33   0x33   0x33
Write to A+3:                        0x44   0x44   0x44   0x44
Memory:         0x11   0x22   0x33   0x44   0x44   0x44   0x44
After the four writes have completed, the integer value 0x44332211 is left in memory at address A, made up of the least-significant byte of each of the four writes. This technique gives us more flexibility in choosing the value to write, but it has drawbacks: setting one value takes four writes, and the three bytes following the target are overwritten as well. It also performs three unaligned writes. Since some architectures do not support unaligned writes, the technique is not universally applicable.
So What?
So what? So what!? SO WHAT!#@?? So you can write (almost) arbitrary values to (almost) arbitrary addresses in memory!!! Surely you can think of a good use for this. Let's see:
* Overwrite a stored UID in a program that drops and elevates privileges
* Overwrite a command that is about to be executed
* Overwrite a return address so that it points to a buffer containing shell code
Put more simply: you OWN the program.
So what did we learn today?
* printf is more powerful than you previously thought
* Cutting corners never pays off (raphaelzl (小飞熊))
* An innocent-looking mistake can give an attacker just enough leverage to ruin your life (raphaelzl (小飞熊))
* With enough time, effort and a complicated input string, you can turn someone's simple mistake into a national news story
[1] The printf implementation in certain versions of glibc is flawed. When a large field width is specified, printf underflows an internal buffer and crashes the program. As a result, field widths larger than a few thousand cannot be used to attack programs on certain versions of Linux. For example, the following code causes a segmentation fault on systems with this flaw:
printf("%.9999d", 1);