{"id":7058,"date":"2005-02-19T23:00:00","date_gmt":"2005-02-19T13:00:00","guid":{"rendered":"http:\/\/michael.ellerman.id.au\/blog\/2005\/02\/19\/kernel-monkey\/"},"modified":"2007-10-25T14:03:28","modified_gmt":"2007-10-25T04:03:28","slug":"kernel-monkey","status":"publish","type":"post","link":"https:\/\/michael.ellerman.id.au\/blog\/2005\/02\/19\/kernel-monkey\/","title":{"rendered":"Kernel Monkey"},"content":{"rendered":"<p>I spent most of last week chasing a bug in the as yet unreleased 2.6.11 kernel. I hit it originally while testing some code I&#8217;ve been writing to implement a mem=X boot-time option. After 2-3 hours of running <a href=\"http:\/\/ltp.sourceforge.net\">LTP<\/a> the box would drop into xmon.<\/p>\n<p>Just for fun it would rarely crash in the same spot; the only commonality was that we&#8217;d generally have some registers full of random bollocks, and on further investigation we&#8217;d have a page or two of bollocks as well.<\/p>\n<p>Although we had our suspicions as to which patch might have introduced the bug, we still needed to tie it down. 
So I found myself running the test on everything from 2.6.10-bk1 to 2.6.11-rc4. I haven&#8217;t counted, but that&#8217;s something like 30 different kernels.<\/p>\n<p>I&#8217;m sure anyone who&#8217;s done any sort of decent testing knows all of what I&#8217;m about to say, but for me it was new, and so I&#8217;m gonna write it down here so Google can keep track of it for <strong>me<\/strong>.<\/p>\n<ul>\n<li>Compile all your kernels on one box, not one of the boxes you&#8217;re trying to crash.<\/li>\n<li>Make a directory where all your kernels go.<\/li>\n<li><strong>Always<\/strong> name the directory a kernel&#8217;s in the same as the kernel&#8217;s name.<\/li>\n<li>If you patch a kernel, change its name, e.g.: <code>2.6.11-rc4-with-bens-fixes<\/code><\/li>\n<li>Keep a record of which kernel is running on which box; when it crashes you may not be able to check.<\/li>\n<li>Having said that, if you&#8217;re in xmon you can usually check with:\n<pre>\r\n1:mon&gt; ls linux_banner\r\nlinux_banner: c000000000443d20\r\n1:mon&gt; dm c000000000443d20\r\nc000000000443d20 4c696e7578207665 7273696f6e20322e  |Linux version 2.|\r\nc000000000443d30 362e31312d726334 2d6d69636861656c  |6.11-rc4-michael|\r\nc000000000443d40 20286d6963686165 6c40737570657265  | (michael@supere|\r\nc000000000443d50 676f292028676363 2076657273696f6e  |go) (gcc version|<\/pre>\n<p>Although this bug had a habit of corrupting the page holding the banner, in which case you&#8217;re stuffed.<\/li>\n<li>Keep a test matrix. It doesn&#8217;t have to be tied into your project schedule, or have key milestones and review points; just keep track of which kernel worked\/broke on which machine. It&#8217;ll keep you sane.<\/li>\n<li>It&#8217;s also handy to record what you expect each kernel to do. Otherwise you might find yourself inappropriately excited when a kernel doesn&#8217;t crash &#8211; i.e. when it doesn&#8217;t have the suspect code and therefore shouldn&#8217;t crash.<\/li>\n<li>Script it, within reason. 
You don&#8217;t want to spend 3 hours testing the wrong kernel &#8217;cause you copied the wrong zImage into \/tftpboot or something.<\/li>\n<li>If you&#8217;re applying more than one or two patches you need quilt or something similar, otherwise you <strong>will<\/strong> get confused (well I did!)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>I spent most of last week chasing a bug in the as yet unreleased 2.6.11 kernel. I hit it originally while testing some code I&#8217;ve been writing to implement a mem=X boot-time option. After 2-3 hours of running LTP the box would drop into xmon. Just for fun it would rarely crash in the same [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[12],"tags":[53,17],"_links":{"self":[{"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/posts\/7058"}],"collection":[{"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/comments?post=7058"}],"version-history":[{"count":0,"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/posts\/7058\/revisions"}],"wp:attachment":[{"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/media?parent=7058"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/categories?post=7058"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michael.ellerman.id.au\/blog\/wp-json\/wp\/v2\/tags?post=7058"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}