Commit | Line | Data |
---|---|---|
953ab835 MCC |
1 | Bug hunting |
2 | +++++++++++ | |
43019a56 IM |
3 | |
4 | Last updated: 20 December 2005 | |
5 | ||
43019a56 IM |
6 | Introduction |
7 | ============ | |
8 | ||
9 | Always try the latest kernel from kernel.org and build from source. If you are | |
10 | not confident in doing that please report the bug to your distribution vendor | |
11 | instead of to a kernel developer. | |
12 | ||
13 | Finding bugs is not always easy. Have a go though. If you can't find it, don't | |
14 | give up. Report as much as you have found to the relevant maintainer. See | |
15 | MAINTAINERS for who that is for the subsystem you have worked on. | |
16 | ||
953ab835 | 17 | Before you submit a bug report read |
8c27ceff | 18 | :ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`. |
43019a56 IM |
19 | |
20 | Devices not appearing | |
21 | ===================== | |
22 | ||
23 | Often this is caused by udev. Check that first before blaming it on the | |
24 | kernel. | |
25 | ||
26 | Finding the patch that caused a bug | |
27 | =================================== | |
28 | ||
29 | ||
30 | ||
953ab835 MCC |
31 | Finding using ``git-bisect`` |
32 | ---------------------------- | |
43019a56 | 33 | |
953ab835 MCC |
34 | The tools that come with ``git`` make finding bugs easy, provided the bug |
35 | is reproducible. | |
43019a56 IM |
36 | |
37 | Steps to do it: | |
953ab835 | 38 | |
43019a56 | 39 | - start using git for the kernel source |
953ab835 | 40 | - read the man page for ``git-bisect`` |
43019a56 IM |
41 | - have fun |
42 | ||
43 | Finding it the old way | |
44 | ---------------------- | |
45 | ||
1da177e4 LT |
46 | [Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)] |
47 | ||
d81919c9 | 48 | This is how to track down a bug if you know nothing about kernel hacking. |
1da177e4 LT |
49 | It's a brute force approach but it works pretty well. |
50 | ||
51 | You need: | |
52 | ||
953ab835 MCC |
53 | - A reproducible bug - it has to happen predictably (sorry) |
54 | - All the kernel tar files from a revision that worked to the | |
1da177e4 LT |
55 | revision that doesn't |
56 | ||
57 | You will then do: | |
58 | ||
953ab835 MCC |
59 | - Rebuild a revision that you believe works, install, and verify that. |
60 | - Do a binary search over the kernels to figure out which one | |
d81919c9 | 61 | introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but |
1da177e4 LT |
62 | you know that 1.3.69 does. Pick a kernel in the middle and build |
63 | that, like 1.3.50. Build & test; if it works, pick the mid point | |
64 | between .50 and .69, else the mid point between .28 and .50. | |
953ab835 | 65 | - You'll narrow it down to the kernel that introduced the bug. You |
d81919c9 | 66 | can probably do better than this but it gets tricky. |
1da177e4 | 67 | |
953ab835 | 68 | - Narrow it down to a subdirectory |
1da177e4 LT |
69 | |
70 | - Copy a kernel that works into "test". Let's say that 1.3.62 works, | |
71 | but 1.3.63 doesn't. So you diff -r those two kernels and come | |
72 | up with a list of directories that changed. For each of those | |
73 | directories: | |
74 | ||
75 | Copy the non-working directory next to the working directory | |
d81919c9 | 76 | as "dir.63". |
1da177e4 | 77 | One directory at a time, try moving the working directory to |
953ab835 | 78 | "dir.62" and moving "dir.63" into its place:: |
1da177e4 LT |
79 | |
80 | mv dir dir.62 | |
81 | mv dir.63 dir | |
82 | find dir -name '*.[oa]' -print | xargs rm -f | |
83 | ||
84 | And then rebuild and retest. Assuming that all related | |
d81919c9 CK |
85 | changes were contained in the sub directory, this should |
86 | isolate the change to a directory. | |
1da177e4 LT |
87 | |
88 | Problems: changes in header files may have occurred; I've | |
d81919c9 | 89 | found in my case that they were self explanatory - you may |
1da177e4 LT |
90 | or may not want to give up when that happens. |
91 | ||
953ab835 | 92 | - Narrow it down to a file |
1da177e4 LT |
93 | |
94 | - You can apply the same technique to each file in the directory, | |
d81919c9 CK |
95 | hoping that the changes in that file are self contained. |
96 | ||
953ab835 | 97 | - Narrow it down to a routine |
1da177e4 LT |
98 | |
99 | - You can take the old file and the new file and manually create | |
953ab835 | 100 | a merged file that has:: |
1da177e4 LT |
101 | |
102 | #ifdef VER62 | |
103 | routine() | |
104 | { | |
105 | ... | |
106 | } | |
107 | #else | |
108 | routine() | |
109 | { | |
110 | ... | |
111 | } | |
112 | #endif | |
113 | ||
114 | And then walk through that file, one routine at a time and | |
953ab835 | 115 | prefix it with:: |
1da177e4 LT |
116 | |
117 | #define VER62 | |
118 | /* both routines here */ | |
119 | #undef VER62 | |
120 | ||
121 | Then recompile, retest, move the ifdefs until you find the one | |
122 | that makes the difference. | |
123 | ||
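The binary search over kernel revisions described in the steps above can be sketched in a few lines of Python. This is only an illustration: ``find_first_bad`` and ``is_bad`` are hypothetical names, and ``is_bad`` stands in for the manual work of building, booting and testing a kernel. The assumption that the bug first appeared in 1.3.63 is invented for the demonstration.

```python
def find_first_bad(versions, is_bad):
    """Return (first_bad_version, versions_tested).

    versions must be ordered oldest to newest, versions[0] must be known
    good and versions[-1] known bad, as in the 1.3.28 / 1.3.69 example.
    """
    good, bad = 0, len(versions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(versions[mid])
        if is_bad(versions[mid]):
            bad = mid      # bug already present: look at older kernels
        else:
            good = mid     # still works: look at newer kernels
    return versions[bad], tested


if __name__ == "__main__":
    kernels = [f"1.3.{n}" for n in range(28, 70)]

    # Hypothetical stand-in for "build, boot and try to reproduce the bug".
    # Here we simply pretend the bug first appeared in 1.3.63.
    def is_bad(version):
        return int(version.split(".")[2]) >= 63

    first_bad, tested = find_first_bad(kernels, is_bad)
    print("first bad kernel:", first_bad)
    print("kernels built:", ", ".join(tested))
```

Each build halves the remaining range, so even a few dozen candidate kernels need only a handful of builds. The search is only valid if the bug is reproducible, because a single wrong "works" or "fails" verdict sends it into the wrong half.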
124 | Finally, you take all the info that you have, kernel revisions, bug | |
d81919c9 | 125 | description, the extent to which you have narrowed it down, and pass |
1da177e4 LT |
126 | that off to whomever you believe is the maintainer of that section. |
127 | A post to linux.dev.kernel isn't such a bad idea if you've done some | |
128 | work to narrow it down. | |
129 | ||
130 | If you get it down to a routine, you'll probably get a fix in 24 hours. | |
131 | ||
132 | My apologies to Linus and the other kernel hackers for describing this | |
133 | brute force approach, it's hardly what a kernel hacker would do. However, | |
134 | it does work and it lets non-hackers help fix bugs. And it is cool | |
135 | because Linux snapshots will let you do this - something that you can't | |
136 | do with vendor supplied releases. | |
137 | ||
43019a56 IM |
138 | Fixing the bug |
139 | ============== | |
140 | ||
141 | Nobody is going to tell you how to fix bugs. Seriously. You need to work it | |
142 | out. But below are some hints on how to use the tools. | |
143 | ||
144 | To debug a kernel, use objdump and look for the hex offset from the crash | |
145 | output to find the valid line of code/assembler. Without debug symbols, you | |
146 | will see the assembler code for the routine shown, but if your kernel has | |
147 | debug symbols the C code will also be available. (Debug symbols can be enabled | |
953ab835 | 148 | in the kernel hacking menu of the menu configuration.) For example:: |
43019a56 IM |
149 | |
150 | objdump -r -S -l --disassemble net/dccp/ipv4.o | |
151 | ||
953ab835 MCC |
152 | .. note:: |
153 | ||
154 | You need to be at the top level of the kernel tree for this to pick up | |
155 | your C files. | |
43019a56 IM |
156 | |
157 | If you don't have access to the code, you can also debug from some crash dumps, | |
953ab835 MCC |
158 | e.g. crash dump output as shown by Dave Miller:: |
159 | ||
160 | EIP is at ip_queue_xmit+0x14/0x4c0 | |
161 | ... | |
162 | Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00 | |
163 | 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08 | |
164 | <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85 | |
165 | ||
166 | Put the bytes into a "foo.s" file like this:: | |
167 | ||
168 | .text | |
169 | .globl foo | |
170 | foo: | |
171 | .byte .... /* bytes from Code: part of OOPS dump */ | |
172 | ||
173 | Compile it with "gcc -c -o foo.o foo.s" then look at the output of | |
174 | "objdump --disassemble foo.o". | |
175 | ||
176 | Output:: | |
177 | ||
178 | ip_queue_xmit: | |
179 | push %ebp | |
180 | push %edi | |
181 | push %esi | |
182 | push %ebx | |
183 | sub $0xbc, %esp | |
184 | mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb) | |
185 | mov 0x8(%ebp), %ebx ! %ebx = skb->sk | |
186 | mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt | |
43019a56 | 187 | |
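Copying the bytes from the ``Code:`` line into "foo.s" by hand is error-prone, so a small script can generate the file instead. The following is a minimal sketch, not an existing kernel tool: ``code_line_to_asm`` is a hypothetical helper, and it assumes the oops marks the faulting byte with ``<..>`` as in the dump above.

```python
import re


def code_line_to_asm(code_line, label="foo"):
    """Turn the hex bytes of an oops "Code:" line into assembler source.

    Returns (source, fault_index). fault_index is the position of the
    byte the oops marked with <..>, or None if no byte was marked.
    """
    text = code_line.strip()
    if text.startswith("Code:"):
        text = text[len("Code:"):]

    byte_values = []
    fault_index = None
    # split() also copes with a Code: dump wrapped over several lines.
    for token in text.split():
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>|([0-9a-fA-F]{2})", token)
        if match is None:
            raise ValueError(f"unexpected token in Code: line: {token!r}")
        if match.group(1) is not None:
            fault_index = len(byte_values)
            byte_values.append(match.group(1))
        else:
            byte_values.append(match.group(2))

    directives = ", ".join("0x" + value.lower() for value in byte_values)
    source = f"\t.text\n\t.globl {label}\n{label}:\n\t.byte {directives}\n"
    return source, fault_index


if __name__ == "__main__":
    # Shortened sample; paste the full Code: text from your own oops here.
    source, fault_index = code_line_to_asm("Code: 55 57 56 53 <8b> 83 3c 01")
    print(source, end="")
    print("faulting byte is at offset", fault_index, "from the label")
```

Write the returned source to "foo.s", then compile and disassemble it as described above. The reported offset tells you which instruction in the disassembly is the one that faulted.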
926b2898 | 188 | In addition, you can use GDB to figure out the exact file and line |
953ab835 MCC |
189 | number of the OOPS from the ``vmlinux`` file. If you have |
190 | ``CONFIG_DEBUG_INFO`` enabled, you can simply copy the EIP value from the | |
191 | OOPS:: | |
926b2898 PE |
192 | |
193 | EIP: 0060:[<c021e50e>] Not tainted VLI | |
194 | ||
953ab835 | 195 | And use GDB to translate that to human-readable form:: |
926b2898 PE |
196 | |
197 | gdb vmlinux | |
198 | (gdb) l *0xc021e50e | |
199 | ||
953ab835 MCC |
200 | If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function |
201 | offset from the OOPS:: | |
926b2898 PE |
202 | |
203 | EIP is at vt_ioctl+0xda8/0x1482 | |
204 | ||
953ab835 | 205 | And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled:: |
926b2898 PE |
206 | |
207 | make vmlinux | |
208 | gdb vmlinux | |
209 | (gdb) p vt_ioctl | |
210 | (gdb) l *(0x<address of vt_ioctl> + 0xda8) | |
953ab835 MCC |
211 | |
212 | or, as one command:: | |
213 | ||
dcc85cb6 RK |
214 | (gdb) l *(vt_ioctl + 0xda8) |
215 | ||
953ab835 MCC |
216 | If you have a call trace, such as:: |
217 | ||
218 | Call Trace: | |
219 | [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5 | |
220 | [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e | |
221 | [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee | |
222 | ... | |
223 | ||
dcc85cb6 | 224 | this shows the problem is likely in the :jbd: module. You can load that module in gdb |
953ab835 MCC |
225 | and list the relevant code:: |
226 | ||
dcc85cb6 RK |
227 | gdb fs/jbd/jbd.ko |
228 | (gdb) p log_wait_commit | |
229 | (gdb) l *(0x<address> + 0xa3) | |
953ab835 MCC |
230 | |
231 | or:: | |
232 | ||
dcc85cb6 RK |
233 | (gdb) l *(log_wait_commit + 0xa3) |
234 | ||
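The ``symbol+offset/size`` notation is the same in the EIP line and in the call trace entries, so the matching GDB command can be derived mechanically. The sketch below is illustrative only: ``parse_location`` and ``gdb_list_command`` are hypothetical helpers, and the regular expression assumes the oops format shown in the examples above.

```python
import re

# Matches e.g. "vt_ioctl+0xda8/0x1482" or ":jbd:log_wait_commit+0xa3/0xf5".
_LOCATION_RE = re.compile(
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[A-Za-z_]\w*)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_location(line):
    """Extract module, symbol, offset and size from one oops line.

    Returns a dict, or None if the line holds no symbol+offset/size.
    """
    match = _LOCATION_RE.search(line)
    if match is None:
        return None
    return {
        "module": match.group("module"),   # None for built-in code
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
    }


def gdb_list_command(location):
    """Build the "l *(symbol + offset)" command used in the text above."""
    return f"l *({location['symbol']} + {location['offset']:#x})"


if __name__ == "__main__":
    for line in (
        "EIP is at vt_ioctl+0xda8/0x1482",
        "[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5",
    ):
        location = parse_location(line)
        where = location["module"] or "vmlinux"
        print(f"{where}: (gdb) {gdb_list_command(location)}")
```

The module name only tells you which object to load into gdb (``vmlinux`` or the module's ``.ko`` file). The path to that file still has to be found in your own build tree.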
926b2898 | 235 | |
43019a56 IM |
236 | Another very useful option of the Kernel Hacking section in menuconfig is |
237 | Debug memory allocations. This helps you see whether data has been used before | |
238 | being initialised. To see the poison values that get assigned, | |
953ab835 MCC |
239 | look at ``mm/slab.c`` and search for ``POISON_INUSE``. With this option enabled, |
240 | an Oops will often show the poisoned data instead of zero, which is the | |
241 | default. | |
43019a56 IM |
242 | |
243 | Once you have worked out a fix, please submit it upstream. After all, open | |
244 | source is about sharing what you do, and don't you want to be recognised for | |
245 | your genius? | |
246 | ||
8c27ceff MCC |
247 | Please do read |
248 | :ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though | |
249 | to help your code get accepted. |