tag:blogger.com,1999:blog-30000679192432374802024-03-12T17:47:19.697-07:00Reflections on Technology and Societynygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.comBlogger59125tag:blogger.com,1999:blog-3000067919243237480.post-71359790311604200802022-08-01T08:52:00.002-07:002023-04-10T09:10:46.755-07:00Quora Greatest Hits - What are common stages that PhD student researchers go through with their thesis project?<p>I have been posting on <a href="https://www.quora.com/profile/Marc-Donner-1" target="_blank">Quora</a> since April of 2014, earning top writer status in 2017 and 2018 and running up, as of this writing, 5.6 million views by Quora readers.</p><p>While many of my posts are of limited interest, I'm inordinately proud of some of them. With this post I will begin retrieving some of my particular favorites from Quora and reposting them here on my blog.</p><p>There is some fun history behind this particular post. Back when I was a grad student at CMU in the 1980s, I was friendly with Jeff Schrager, a fellow grad student at the time, and he posted a hilarious item in, as I recall, rec.humor.funny, an early netnews group. The item was titled "How Many AI People Does It Take To Change A Lightbulb" and I admired it so much that I tracked it down and put it up on this blog some years ago (https://nygeek-blog.blogspot.com/2010/05/how-many-ai-people-does-it-take-to.html).</p><p>Several years ago someone posted the question, "What are common stages that PhD student researchers go through with their thesis project?"</p><p>In a fey mood I dashed off a silly tongue-in-cheek answer built around the classic <a href="https://en.wikipedia.org/wiki/Five_stages_of_grief" target="_blank">five stages of grief</a> augmented with two PhD-specific additions. To my surprise and delight, it became my most popular post, attracting, as of this writing, 145K views and over 800 upvotes. 
If I had realized how popular it would become I would have put more care into the writing. In this blog post you can see the post as I should have written it. If you want to see what I actually posted, take a look at the <a href="https://www.quora.com/What-are-common-stages-that-PhD-student-researchers-go-through-with-their-thesis-project/answer/Marc-Donner-1" target="_blank">original</a> on Quora.</p><h2 style="text-align: left;"><b>What are common stages that PhD student researchers go through with their thesis project?</b></h2><p>[1] Enthusiasm,</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“This is the greatest thesis topic ever. My thesis advisor is brilliant, and I’m even more brilliant. I will receive a Nobel Prize for this work soon after I defend my thesis.”</p></blockquote><p>[2] Disillusion,</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“This problem is impossible. Worse yet, it’s uninteresting and unimportant. My advisor is a moron. I’m a complete fraud. I wonder when they’ll take me to the guillotine.”</p></blockquote><p>[3] Denial,</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“What do you know about anything? Your criticism of my thesis proposal is completely wrong.”</p></blockquote><p>[4] Anger,</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“How dare you criticize my research plan! What are you even doing at this university?”</p></blockquote><p>[5] Bargaining,</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“Look, can we revise my thesis proposal so that I don’t actually have to show any results?”</p></blockquote><p>[6] Depression, and</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“My head hurts. 
I’m so tired all the time. Don’t look at me that way.”</p></blockquote><p>[7] Acceptance.</p><blockquote style="border: none; margin: 0px 0px 0px 40px; padding: 0px;"><p style="text-align: left;">“I am making steady progress. I have a few preliminary results that I like, and I have submitted a working paper to a conference where I hope that I will be able to get some advice from some of the leaders in the field. I expect to finish my thesis within five or ten more years.”</p></blockquote>nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-78534112265636880042021-10-11T16:55:00.005-07:002022-05-22T17:32:36.303-07:00Facebook Outage (a haiku)<div style="text-align: left;"><span style="font-family: times;"><span style="font-size: large;"><span>BGP reroute</span></span></span></div><div style="text-align: left;"><span style="font-family: times;"><span style="font-size: large;"><span>Blackholes Facebook, Instagram</span></span></span></div><div style="text-align: left;"><span style="font-family: times;"><span style="font-size: large;"><span>The sound of silence.</span></span></span></div><p><span style="font-family: times;"><span style="font-size: large;"><br
/></span></span></p>NYGeekhttp://www.blogger.com/profile/17479861418342688890noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-26088074244324856052021-07-08T16:34:00.013-07:002023-03-04T05:54:05.596-08:00HP 35 calculator 200 trick<p><span style="font-size: large;">In 1972 I bought the first HP 35 calculator sold on the Caltech campus. It was not by far the first one <b>on</b> the campus – HP had distributed pre-release copies to numerous faculty members. Max Delbrück, my next-door neighbor, had given me my first experience with the calculator one evening while hosting my landlady, her family, and me for dinner. I was smitten.<br /><br />When the Caltech Bookstore posted the impending availability of the calculator I was the first on the list. The arrival date was not known, so I haunted the bookstore.</span></p><p><span style="font-size: large;"></span></p><div class="separator" style="clear: both; text-align: center;"><span style="font-size: large;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7Qs3a127ifmyyyzoeMHAm0ThBxWX9oy0tAekeHzywLl2VRlLjyZ0T4xgVeOeoboSRnXcDN3JMMqi3FxA-WVR76JqREM5OOqmpsqN-qDyerQ4Ldvziayo4j5VXI12DwdIAggytuXx5FVhZdTd4MsT-4MtlIz4LKap2P8KAN60qXf8tQARUcrOOTD__Qg/s2550/HP-35_Red_Dot.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2550" data-original-width="1530" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7Qs3a127ifmyyyzoeMHAm0ThBxWX9oy0tAekeHzywLl2VRlLjyZ0T4xgVeOeoboSRnXcDN3JMMqi3FxA-WVR76JqREM5OOqmpsqN-qDyerQ4Ldvziayo4j5VXI12DwdIAggytuXx5FVhZdTd4MsT-4MtlIz4LKap2P8KAN60qXf8tQARUcrOOTD__Qg/s320/HP-35_Red_Dot.jpg" width="192" /></a></span></div><span style="font-size: large;"><br />HP-35 (Source: https://commons.wikimedia.org/wiki/User:Mister_rf)<br /><br />They finally came in sometime in November of 1972, if I remember correctly, and I happily paid the $395 price (about a quarter of my life savings at the time). 
It was everything I had dreamed of and more. It transformed my Physics lab performance from C (great on execution and writeup, not so hot on accuracy of calculation) to A. It made me popular as a member of study groups.</span><p></p><h2 style="text-align: left;"><span style="font-size: large;">Games with HP 35</span><br /></h2><p style="text-align: left;"><span style="font-size: large;">One of the things that we did was play games using the calculator. Geek games, of course. Typically this involved clever ways to calculate interesting results using the calculator’s capabilities in unusual or non-obvious ways.<br /><br />For example, how to get as close to the value of pi as possible without pressing the π key on the keyboard. (Hint: try 355 / 113).<br /><br />One of my favorites, however, was how to get the number 200 into the display from a cleared calculator (zero in all four of the registers) without using any of the digit keys. I had discovered a remarkable six-stroke way to do this that has not, so far as I know, been surpassed.<br /></span></p><h2 style="text-align: left;"><span style="font-size: large;">Getting to 200</span><br /></h2><p><span style="font-size: large;">The rest of this note will explain how this worked and what interesting things it reveals about the HP 35 implementation.<br /><br />Here is the six-stroke trick:<br /></span></p><ol style="text-align: left;"><li><span style="font-size: large;"> arc</span></li><li><span style="font-size: large;"> cos</span></li><li><span style="font-size: large;"> tan</span></li><li><span style="font-size: large;"> log</span></li><li><span style="font-size: large;"> ENTER</span></li><li><span style="font-size: large;"> +</span><br /></li></ol><p><span style="font-size: large;">If you try this on a modern scientific calculator, and even on one of the HP 35 simulators you can get for your smartphone or web browser, you will discover that the result is generally not 200.<br /><br />What’s going on here?<br /><br
/>The key to this is that the HP 35 was implemented on an early four-bit microprocessor. One of the decisions that the HP engineers made was to perform all calculations in decimal rather than in binary. While they sacrificed efficiency for the primary calculations, they also eliminated the need for expensive decimal-binary conversions on every input and a corresponding binary-decimal conversion on every result display. And since the calculator's performance was ultimately limited by the speed at which a person pressed buttons, keeping things in decimal made a lot of sense.<br /></span></p><p style="text-align: left;"><span style="font-size: large;">This led to some other interesting decisions. One of them was their representation of infinity, or overflow. Their simplification was to use the largest possible number, <span style="font-family: courier;">9.999999999 x <span style="font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;">10</span><span style="font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">99</span></span></span> (<span style="font-family: courier;">9.999999999 E 99</span>) or, basically, </span><span id="docs-internal-guid-94ee72da-7fff-8f9f-76ab-8cddb7c7764c"><span style="font-family: courier; font-size: large;"><span style="font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;">10</span><span style="font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">100</span></span></span></span><span style="font-size: large;">.</span></p><p></p><p style="text-align: left;"><span style="font-family: times; font-size: x-large;">For those perhaps less familiar with this particular terminology, arc is a prefix key for the trigonometric functions (sin, cos, tan) 
and invokes the inverse function. So arc cos is the inverse cosine, more commonly rendered </span><span style="font-family: times; font-size: x-large;"><span id="docs-internal-guid-e964351a-7fff-c10c-2bf8-dfb9e6f3532d" style="color: black; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;">cos</span><span style="color: black; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">-1</span></span> (not 1/cos).</span><span id="docs-internal-guid-b60fc51e-7fff-1921-b1a2-8e71b7de970e"></span></p><p><span style="font-size: large;"><span style="font-family: times;">Those who recall trigonometry know that <span id="docs-internal-guid-e964351a-7fff-c10c-2bf8-dfb9e6f3532d" style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cos</span><span style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">-1</span></span>(0) yields 90 (degrees) or pi<span id="docs-internal-guid-c82c3ace-7fff-e0ed-6966-883906eeea80"></span>/2 (radians).</span><br /><br />Now we apply the tangent function: tan(90). 
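Before walking through the remaining steps, the whole six-keystroke sequence can be mimicked with Python's decimal module. This is a sketch of my own, not an HP emulator: it assumes the two behaviors described above, namely ten significant decimal digits of arithmetic and overflow clamped to 9.999999999 E 99, and the arc-cos and tan steps are written out by hand since the point is the rounding, not the trigonometry.

```python
# Sketch of the 200 trick, assuming the HP-35's ten-digit decimal
# arithmetic and its clamp-to-9.999999999e99 overflow behavior.
from decimal import Decimal, getcontext

getcontext().prec = 10                 # HP-35: 10 significant digits
OVERFLOW = Decimal("9.999999999E+99")  # the HP-35's stand-in for infinity

x = Decimal(0)         # cleared display
x = Decimal(90)        # arc cos: cos^-1(0) is 90 (degrees)
x = OVERFLOW           # tan: tan(90 degrees) overflows and is clamped
x = x.log10()          # log: 99.9999999999566 rounds to 100 at 10 digits
x = x + x              # ENTER, +: doubles the display
print(x)               # -> 200.0000000
```

Without the ten-digit decimal rounding, log10 of 9.999999999 x 10^99 stays at 99.9999999999566 and the final doubling falls just short of 200, which matches the modern-calculator behavior noted above.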
We know that the tangent function goes to infinity at pi/2, so what’s the point of this?<br /><br />Aha – the largest number that the HP 35 could display is <span style="font-size: large;"><span style="font-family: times;"><span id="docs-internal-guid-0c53946e-7fff-ec93-0afe-a4ee3fa30e69" style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">10</span><span style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">100</span></span></span></span> and the tangent function dutifully returns <span style="font-family: courier;">9.999999999E99</span> (as close as I can render the calculator display).<br /><br />That explains why the log function, the decimal logarithm, appears next in the algorithm. Because log(</span><span style="font-size: large;"><span style="font-size: large;"><span style="font-size: large;"><span style="font-family: times;"><span id="docs-internal-guid-0c53946e-7fff-ec93-0afe-a4ee3fa30e69" style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">10</span><span style="background-color: transparent; color: black; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span style="vertical-align: super;">100</span></span></span></span></span>) returns the exponent, it produces 100.<br /><br />And ENTER + simply doubles that result.<br /><br />Voila! 
200 in six keystrokes.</span><br /></p>NYGeekhttp://www.blogger.com/profile/17479861418342688890noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-65207122433242347972017-07-01T04:36:00.000-07:002021-04-14T11:17:09.076-07:00Monthly Payment Tracker spreadsheet<span style="font-weight: 400;">Increasingly the various businesses that I pay monthly (rent, garage, credit cards, phone bill, newspaper bill, and utility bill) are only notifying me that I have a payment due by email or worse, not notifying me at all. Combined with a shrinking window between when the bill is issued and when late charges start to accrue, there is increasing pressure on me to track my bills.</span><br/><br/><span style="font-weight: 400;">In the past I would accumulate unpaid bills on a designated spot on my desk at home. The presence of paper bills there told me that I had bills to pay. When a paycheck rendered me sufficiently liquid to write checks, I would pay them.</span><br/><br/><span style="font-weight: 400;">As the transition to email notification and non-notification has progressed, many of these bill issuers have offered to help me out with this scheduling problem. 
All I had to do was give them the ability to directly take money from my bank account in whatever amount they desired and at whatever time they liked.</span><br/><br/><span style="font-weight: 400;">You’re kidding, right?</span><br/><br/><span style="font-weight: 400;">Not all of these institutions are crooked, and perhaps none of them are crooked today, but the documented behavior of some of their </span><a href="http://www.nydailynews.com/news/national/wells-fargo-ceo-john-stumpf-resigns-customer-account-scam-article-1.2828601"><span style="font-weight: 400;">peers</span></a><span style="font-weight: 400;"> has convinced me that I should not trust any of them to take only what they are owed from my account.</span><br/><br/><span style="font-weight: 400;">So, how to keep on top of these due dates?</span><br/><br/><span style="font-weight: 400;">The first step was to create a spreadsheet. Each column was a payee. Each row was a date. When I paid bills I would insert a row and put that day’s date in the leftmost column and put dollar amounts in each column corresponding to a bill I had paid that day.</span><br/><br/><span style="font-weight: 400;">It ended up looking something like this:</span><br/><table><br/><tbody><br/><tr><br/><td><span style="font-weight: 400;">Date</span></td><br/><td><span style="font-weight: 400;">Shady Bank</span></td><br/><td><span style="font-weight: 400;">Weird Bank</span></td><br/><td><span style="font-weight: 400;">Odd Bank</span></td><br/><td><span style="font-weight: 400;">Rent</span></td><br/><td><span style="font-weight: 400;">Parking</span></td><br/><td><span style="font-weight: 400;">Pa Bell</span></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-01</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">XXX</span></td><br/><td><span style="font-weight: 400;">YYY</span></td><br/><td></td><br/></tr><br/><tr><br/><td><span style="font-weight: 
400;">2017-02-13</span></td><br/><td><span style="font-weight: 400;">ZZZ</span></td><br/><td><span style="font-weight: 400;">WWW</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-25</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">QQQ</span></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-26</span></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">VVV</span></td><br/><td></td><br/><td></td><br/><td></td><br/></tr><br/></tbody><br/></table><br/><span style="font-weight: 400;">It helped me keep an eye on things, particularly after I set up some color coding so that the background of odd-numbered months was different from that of even-numbered months.</span><br/><br/><span style="font-weight: 400;">But I was still not happy, since the payment windows established by the banks were becoming smaller and smaller.</span><br/><br/><span style="font-weight: 400;">I decided to figure out a formula that I could put in the top of each column, right below the column headings, that would tell me how many days had elapsed between the most recent payment and today.</span><br/><br/><span style="font-weight: 400;">How to do that?</span><br/><br/><span style="font-weight: 400;">The top level structure of the formula was “</span><b>=today()-(something)</b><span style="font-weight: 400;">” where something translated to the date in column A of the row with the most recent payment.</span><br/><br/><span style="font-weight: 400;">How to find the row number of the most recent payment?</span><br/><br/><span style="font-weight: 400;">With some research, I finally came up with this formula - in this case for Column E: </span><br/><br/><b>=today()-index($A:$A,max((E3:E<>"")*row(E3:E)))</b><br/><br/><span style="font-weight: 400;">This is pretty weird … here’s how it 
works:</span><br/><br/><b>max((E3:E<>"")*row(E3:E))</b> <span style="font-weight: 400;">finds the highest row number in column E on or after row 3 that contains a nonempty value.</span><br/><br/><b>index($A:$A, max(...))</b><span style="font-weight: 400;"> finds the value in column A at the row identified in the max(...) formula above.</span><br/><br/><span style="font-weight: 400;">How did I figure out the idiom </span><b>index(column, max(...))</b><span style="font-weight: 400;">? I’d like to claim that I’m a master of Google Sheets and read relevant documentation, but it ain’t so. I did some searching and found </span><a href="https://stackoverflow.com/questions/8116043/get-the-last-non-empty-cell-in-a-column-in-google-sheets/29808305#29808305"><span style="font-weight: 400;">this StackOverflow page</span></a><span style="font-weight: 400;"> that illustrated the idiom. I had to adapt it significantly for it to work properly, but now I have a row of values at the top of the columns, just below the headings and in the frozen part of the sheet. Each cell reports the number of days since the last payment in that column. 
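The same last-nonempty-row logic can be sketched outside the spreadsheet. Here is a rough Python equivalent of the formula (the function name and the sample column data are made up for illustration):

```python
# Rough Python analogue of =today()-index($A:$A,max((E3:E<>"")*row(E3:E))):
# find the last row with a payment in a column, look up its date in
# column A, and subtract it from today.
from datetime import date

def days_since_last_payment(dates, payments, today=None):
    """dates[i] is the column-A date for row i; payments[i] is the cell."""
    today = today or date.today()
    last = max(i for i, cell in enumerate(payments) if cell not in (None, ""))
    return (today - dates[last]).days

# Hypothetical data mirroring the table above: this column's last
# payment was entered on the 2017-02-13 row.
dates = [date(2017, 2, 1), date(2017, 2, 13), date(2017, 2, 25)]
shady_bank = ["", "ZZZ", ""]
print(days_since_last_payment(dates, shady_bank, today=date(2017, 3, 1)))  # -> 16
```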
Here’s the complete formula for column G:</span><br/><br/><b>=today()-index($A:$A,max((G3:G<>"")*row(G3:G)))</b><br/><br/><span style="font-weight: 400;">I have also established conditional formatting for that row so that when more than 20 days have passed since that bill was paid, the count is highlighted in red to draw my attention to it.</span><br/><br/>So now it looks something like this:<br/><table><br/><tbody><br/><tr><br/><td><span style="font-weight: 400;">Date</span></td><br/><td><span style="font-weight: 400;">Shady Bank</span></td><br/><td><span style="font-weight: 400;">Weird Bank</span></td><br/><td><span style="font-weight: 400;">Odd Bank</span></td><br/><td><span style="font-weight: 400;">Rent</span></td><br/><td><span style="font-weight: 400;">Parking</span></td><br/><td><span style="font-weight: 400;">Pa Bell</span></td><br/></tr><br/><tr><br/><td></td><br/><td style="text-align: center;"><span style="font-weight: 400;">16</span></td><br/><td style="text-align: center;"><span style="font-weight: 400;">16</span></td><br/><td style="text-align: center;"><span style="font-weight: 400;">3</span></td><br/><td style="text-align: center;" bgcolor="red"><span style="font-weight: 400;">29</span></td><br/><td style="text-align: center;" bgcolor="red"><span style="font-weight: 400;">29</span></td><br/><td style="text-align: center;"><span style="font-weight: 400;">4</span></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-01</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">XXX</span></td><br/><td><span style="font-weight: 400;">YYY</span></td><br/><td></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-13</span></td><br/><td><span style="font-weight: 400;">ZZZ</span></td><br/><td><span style="font-weight: 400;">WWW</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/></tr><br/><tr><br/><td><span style="font-weight: 
400;">2017-02-25</span></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">QQQ</span></td><br/></tr><br/><tr><br/><td><span style="font-weight: 400;">2017-02-26</span></td><br/><td></td><br/><td></td><br/><td><span style="font-weight: 400;">VVV</span></td><br/><td></td><br/><td></td><br/><td></td><br/></tr><br/></tbody><br/></table>nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-69201286700749295012017-02-28T19:23:00.000-08:002021-04-14T11:17:08.846-07:00Now that's funnyA friend of mine told me that she had logged in to the Social Security Administration's website recently and was able to review her account and get a forecast of her Social Security retirement income, should she make it to retirement.<br/><br/>Intrigued, I decided to do the same thing.<br/><br/>I went to the website (<a href="https://www.ssa.gov/">www.ssa.gov</a>) and tried to create myself an account. I entered my Social Security number, my name, my address, my date of birth, all sorts of information.<br/><br/>After a while it told me that it could not create an account for my social security number. It suggested that I call the help desk and gave me a toll-free number.<br/><br/>So I called the number and followed the instructions to get to the help desk.<br/><br/>The help desk person was very nice. She asked me all of the same information that the website had asked.<br/><br/>She confirmed that the site would not activate access for me. She asked me if I had bad credit. I said no, I think I have excellent credit. She asked me about my mortgage, and I told her that my wife and I had paid off our mortgage a few years ago.<br/><br/>The woman told me that that explained the problem. It seems that the Social Security Administration outsources the identity verification function to one of the big credit bureaus. 
The credit bureau cannot verify identity for folks without significant debt, it seems.<br/><br/>The only way for me to get an online account with the Social Security website is to take my passport or other identification to a Social Security field office and identify myself to them.<br/><br/>How absurd.<br/><br/>If I have paid off my mortgage, I can't get access to the Social Security website without a personal visit to a field office?<br/><br/>Well, I work for a living and I cannot afford to take time off from work to go visit the Social Security field office in order to get access to the website. Too bad that the identity verification service doesn't work for folks without debt.<br/><br/> nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-12893656224847556822016-05-20T03:30:00.000-07:002021-04-14T11:17:08.787-07:00Buying notebooks ...Well, I'm trained as a scientist and engineer, so I keep a notebook. This is something I have done religiously since I was in grad school, much to my wife's dismay.<br/><br/>Since 1991 I have loved the National brand Chemistry Notebook (number 43-571), but National was bought a few years ago and the new owners cut a stupid corner by reducing the notebook from 128 pages to 120. Worse yet, this notebook has become rather expensive to buy, costing upward of $10 per book. The pages are still numbered for me, but the reduction from 128 to 120 remains an irritant. <img class="size-medium wp-image-796 aligncenter" src="http://nygeek.net/wp-content/uploads/2016/05/national-brand-43-571-chemistry-notebook.jpg?w=300" alt="national-brand-43-571-chemistry-notebook" width="300" height="300" /><br/><br/>So, when I recently changed jobs and, at the same time, ran out of notebooks, I decided to switch to the Clairefontaine 9542C. 
This is a smaller notebook with paper that is slightly more opaque and quadrille ruled 5x5 to the inch.<br/><br/><img class="alignnone size-medium wp-image-795" src="http://nygeek.net/wp-content/uploads/2016/05/9542c_3.jpg?w=300" alt="9542C_3" width="300" height="300" /><br/><br/>Oddly, despite the fact that it is made in France and described with metric dimensions (14.8 cm x 21 cm), the ruling is specified as 5x5 to the inch. I agree that this is a convenient grid size for technical notebooks, but is there no metric ruling that matches? 0.5 cm comes to mind, since that would come very close to 5x5 to the inch: 5 x 0.5 cm is 2.5 cm, and an inch is 2.54 cm. Perhaps it is marketed as 0.5 cm square grid in Europe but as 5x5 to the inch in the US?<br/><br/>Anyway, I needed to buy some more of these notebooks. Normally I pick them up from a stationery store near my apartment, but that is inconvenient and expensive.<br/><br/>I tried looking for them on Amazon (amazon.com, to be precise). While I can find them, it's hard to tell which product is being sold because Amazon's product information for these Clairefontaine notebooks is dreadful. And they're expensive.<br/><br/>After being frustrated by the unusually low quality of Amazon's offerings, I tried searching Google for "clairefontaine 9542c". To my surprise, I found an amazon.de page near the top of the organic results. Even more of a surprise was the fact that it was offering five of these lovely notebooks for about 10 euros, or only a little bit more than I was paying for one in the US.<br/><br/>Not reading German, I decided to try amazon.co.uk. There I found these notebooks, again better described, priced at ten pounds for a package of five. I ordered two packages. 
Even with shipping to the US these notebooks come out at about half the price that I pay for them in the US.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-89215243835251734932016-04-03T05:09:00.000-07:002021-04-14T11:17:08.690-07:00Toy Data Center updateIt's been a while since I've written about my toy data center. I started with <a href="https://nygeek.net/2014/05/12/two-intel-nuc-servers-running-ubuntu/">two Intel NUCs</a> and shortly thereafter expanded to four. Each of the first pair has a 240 G SSD and the second pair each sports a 480 G SSD.<br/><br/><img class="alignnone size-full wp-image-782" src="http://nygeek.net/wp-content/uploads/2016/04/2016-04-03-toy-data-center-mug-shot.jpg" alt="2016-04-03-toy-data-center-mug-shot" width="2670" height="2046" /><br/><br/>All running Ubuntu 14.04.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com2tag:blogger.com,1999:blog-3000067919243237480.post-44066709924173557672015-07-18T01:17:00.000-07:002021-04-14T11:17:08.629-07:00Simple Python __str__(self) method for use during developmentFor my development work I want a simple way to display the data in an object instance without having to modify the <code>__str__(self)</code> method every time I add, delete, or rename members. Here's a technique I've adopted that relies on the fact that every object stores all of its members in a dictionary called <code>self.__dict__</code>. Making a string representation of the object is just a matter of returning a string representation of <code>__dict__</code>. This can be achieved in several ways. 
One of them is simply <code>str(self.__dict__)</code> and the other uses the JSON serializer <code>json.dumps()</code>, which lets you prettyprint the result.<br/><br/>Here's a little Python demonstrator program:<br/><pre><code>#!/usr/bin/python<br/><br/>""" demo - demonstrate a simple technique to display text representations<br/>    of Python objects using the __dict__ member and a json serializer.<br/><br/>    $Id: demo.py,v 1.3 2015/07/18 13:07:15 marc Exp marc $<br/>"""<br/><br/>import json<br/><br/>class something(object):<br/>    """ This is just a demonstration class. """<br/>    def __init__(self, id, name):<br/>        self.id = id<br/>        self.name = name<br/><br/>    def rename(self, name):<br/>        self.name = name<br/><br/>    def __str__(self):<br/>        return json.dumps(self.__dict__, indent=2, separators=(',', ': '))<br/>        # return str(self.__dict__)<br/><br/>def main():<br/>    o1 = something(1, "first object")<br/>    o2 = something(2, "second object")<br/><br/>    print str(o1)<br/>    print str(o2)<br/><br/>    o1.rename("dba third object")<br/><br/>    print str(o1)<br/><br/>if __name__ == '__main__':<br/>    main()</code></pre><br/>Running it produces this output:<br/><pre><code>$ python demo.py<br/>{<br/>  "id": 1,<br/>  "name": "first object"<br/>}<br/>{<br/>  "id": 2,<br/>  "name": "second object"<br/>}<br/>{<br/>  "id": 1,<br/>  "name": "dba third object"<br/>}<br/></code></pre><br/>Nice and easy for testing and debugging. Once I'm ready for production and no longer want the JSON representations I can introduce a DEBUG flag so that the non-DEBUG behavior of <code>__str__(self)</code> is appropriate to the production use.<br/><br/>[update]<br/><br/>What's wrong with this? If I have a member that is itself an object, then the json.dumps() call fails. 
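One workaround (a sketch of my own, not something from the original post, with hypothetical class names) is to give json.dumps() a default= hook that falls back to a member's __dict__ when it hits a value it cannot serialize:

```python
# Sketch: json.dumps calls the default= hook for any value it cannot
# serialize; here the hook substitutes the member's __dict__, or str()
# of the value as a last resort. Inner/Outer are made-up demo classes.
import json

class Inner(object):
    def __init__(self, label):
        self.label = label

class Outer(object):
    def __init__(self):
        self.id = 1
        self.inner = Inner("nested")

    def __str__(self):
        return json.dumps(self.__dict__, indent=2,
                          default=lambda o: getattr(o, "__dict__", str(o)))

print(Outer())  # the nested member serializes instead of raising TypeError
```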
Ideally Python would call __str__() on a member if __str__() was called on the object.<br/><br/>On reading some more goodies, it's clear that what I should be using is repr() and not str().nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-41262924687605022532014-11-08T02:51:00.000-08:002021-04-14T11:17:08.566-07:00Economical NUC desktop running UbuntuThe TV in the kitchen has long had a Mac Mini attached to one of its inputs. We used it to watch YouTube videos, to listen to music from iTunes and Google Music, to browse the web, to show photographs from our trips, and so on.<br/><br/>Sadly, the little Mini passed away earlier this year, refusing to power up. When we priced out replacement machines we discovered that the new Minis were a lot more expensive, even if at the same time more capable.<br/><br/><a href="http://nygeek.net/wp-content/uploads/2014/11/2014-11-08-nuc-desktop.jpg"><img class="alignnone size-medium wp-image-709" src="http://nygeek.net/wp-content/uploads/2014/11/2014-11-08-nuc-desktop.jpg?w=300" alt="2014-11-08-nuc-desktop" width="300" height="200" /></a><br/><br/>Given that we were not planning to store lots of data on the machine, we decided to leverage the lessons we had learned from building our little collection of NUC servers and design and build a small desktop on one of the NUC engines. We conducted some research and selected a machine sporting an i3 processor.
The parts list we ended up with was:<br/><ul><br/> <li>Intel NUC DCCP847DYE [1 @ $ 146.22]<br/><ul><br/> <li>Intel Core i3 Processor</li><br/></ul><br/></li><br/> <li>Crucial CT120M500SSD3 [1 @ $ 72.09]<br/><ul><br/> <li>120GB mSATA SSD</li><br/></ul><br/></li><br/> <li>Crucial CT25664BF160B [2 @ $ 20.97]<br/><ul><br/> <li>2GB DDR3 1600 SODIMM 204-Pin 1.35V/1.5V Memory Module</li><br/></ul><br/></li><br/> <li>Intel Network 7260.HMWG [1 @ $30.95]<br/><ul><br/> <li>WiFi and Bluetooth HMC</li><br/></ul><br/></li><br/> <li>Belkin 6ft / 3 Prong Notebook Power Cord [1 @ $6.53]</li><br/></ul><br/>Which brought the total expense to $ 297.73, substantially cheaper than the more highly configured i5-based servers that we <a title="Two Intel NUC servers running Ubuntu" href="http://nygeek.net/2014/05/12/two-intel-nuc-servers-running-ubuntu/" target="_blank" rel="noopener">described in a previous post</a>.<br/><br/>We ordered the parts from Amazon and they arrived a few days later.<br/><br/>The next step was to get the BIOS patches needed for the machine and an install image.<br/><br/>The new BIOS image came from the Intel site. Note that the BIOS for the DYE line is different from that in the i5-based WYK line that we used for the servers. The BIOS patch that we downloaded is named gk0054.bio and we found it on an <a title="BIOS Update [GKPPT10H.86A]" href="https://downloadcenter.intel.com/confirm.aspx?httpDown=http://downloadmirror.intel.com/24358/eng/GK0054.BIO&Lang=eng&Dwnldid=24358" target="_blank" rel="noopener">Intel page</a> (easier to find with a search engine than with the Intel site navigation tools, but easy either way).<br/><br/>The Ubuntu desktop image is on the <a href="http://www.ubuntu.com/download/desktop" target="_blank" rel="noopener">Ubuntu site</a> ... 
they ask you for a donation (give one if you can afford it, please).<br/><br/>The, by now familiar, steps to create an installable image on a USB flash drive are:<br/><pre>> diskutil list<br/>> hdiutil convert -format UDRW -o ubuntu-14.04.1-desktop-amd64.img ubuntu-14.04.1-desktop-amd64.iso <br/>> diskutil unmountDisk /dev/disk2<br/>> sudo dd if=ubuntu-14.04.1-desktop-amd64.img.dmg of=/dev/rdisk2 bs=1m<br/></pre><br/>Where /dev/disk2 and /dev/rdisk2 are identified from examination of the output of the diskutil list call.<br/><br/>That done, we recorded the MAC address from the NUC packaging and updated our DHCP and DNS configurations so that the machine would get its host name and IP address from our infrastructure.<br/><br/>A couple of important differences between building a desktop and a server:<br/><ul><br/> <li>We added the WiFi and Bluetooth network card to the machine. We did not use the WiFi capability, since we were installing the machine in a location with good hard-wired Ethernet connectivity, but we did plan to use a Bluetooth keyboard and mouse on the machine.</li><br/> <li>The desktop install image for Ubuntu 14.04 is big, about 1/3 larger than the server image. The first device we used for the install was the same 1G drive that I had used for my initial server installs, before I got the network install working. What we didn't realize, and dd did not tell us, is that the image was too big for the 1G drive. When we tried to do the install the first time we got a cryptic error message from the BIOS. It took us a while, stumbling around in the dark, to realize that the install image was too big for the drive we were using. After we rebuilt the install image on a 32G drive we had in a drawer, the install proceeded without error.</li><br/></ul><br/>After the installation completed we had trouble getting the Bluetooth keyboard and mouse to work well. 
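Incidentally, the image-too-big surprise described above is easy to guard against: compare the image size to the target size before running dd. A sketch in Python (the file names are stand-ins, with two ordinary files playing the image and the USB drive; for a real device you would ask the OS for the device size, e.g. blockdev --getsize64 on Linux or diskutil info on OS X):

```python
import os

# Stand-in files: a 2 MiB "image" and a 1 MiB "drive" (names illustrative).
with open("image.img", "wb") as f:
    f.truncate(2 * 1024 * 1024)
with open("target.dev", "wb") as f:
    f.truncate(1 * 1024 * 1024)

img_bytes = os.path.getsize("image.img")
dev_bytes = os.path.getsize("target.dev")

# Refuse to dd when the image would not fit on the target.
if img_bytes > dev_bytes:
    print("image (%d bytes) too big for target (%d bytes)" % (img_bytes, dev_bytes))
else:
    print("ok to dd")
```

Had we run a check like this first, the cryptic BIOS error would never have had a chance to confuse us.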
The machine ultimately paired with the keyboard, but we could not get input to it.<br/><br/>We then thought back on some of the information we'd seen for our earlier NUC research and verified that the machine actually has an integrated antenna. We opened up the case and found the antenna wires, which we connected to the wireless card as shown in this picture:<br/><br/><a href="http://nygeek.net/wp-content/uploads/2014/11/nuc-antenna-wires-connected.jpg"><img class="alignnone size-medium wp-image-708" src="http://nygeek.net/wp-content/uploads/2014/11/nuc-antenna-wires-connected.jpg?w=300" alt="nuc-antenna-wires-connected" width="300" height="225" /></a><br/><br/>Shortly after we were logged on to the machine. We installed Chrome and connected up to a Google Music library and were playing music as background to a photo slide show within a few minutes.<br/><br/>The only remaining problem is that the Apple Wireless Trackpad that we're using seems to regularly stop talking to the machine. The pointer freezes and we're left using the tab key to navigate the fields of the active window.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-77811909615013121082014-11-06T05:32:00.000-08:002021-04-14T11:17:08.507-07:00Adding CPUInfo to SysinfoThere is a lot of interesting information about the processor hardware in /proc/cpuinfo. 
Here is a little bit from one of my NUC servers:<br/><pre>processor : 0<br/>vendor_id : GenuineIntel<br/>cpu family : 6<br/>model : 69<br/>model name : Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz<br/>stepping : 1<br/>microcode : 0x16<br/>cpu MHz : 779.000<br/>cache size : 3072 KB<br/>physical id : 0<br/>siblings : 4<br/>core id : 0<br/>cpu cores : 2<br/>apicid : 0<br/>initial apicid : 0<br/>fpu : yes<br/>fpu_exception : yes<br/>cpuid level : 13<br/>wp : yes<br/>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid<br/>bogomips : 3791.14<br/>clflush size : 64<br/>cache_alignment : 64<br/>address sizes : 39 bits physical, 48 bits virtual<br/>power management:<br/></pre><br/>The content of "cat /proc/cpuinfo" is actually four copies of this, with small variations in core id (ranging between 0 and 1), the processor (ranging between 0 and 3), and the apicid (ranging from 0 to 3).<br/><br/>In order to add this information to my sysinfo.py I wrote a new module, cpuinfo.py, modeled on the df.py module that I used to add filesystem information.<br/><pre>""" Parse the content of /proc/cpuinfo and create JSON objects for each cpu<br/><br/>Written by Marc Donner<br/>$Id: cpuinfo.py,v 1.7 2014/11/06 18:25:30 marc Exp marc $<br/><br/>"""<br/><br/>import subprocess<br/>import json<br/>import re<br/><br/>def main():<br/> """Main routine"""<br/> print CPUInfo().to_json()<br/> return<br/><br/># Utility routine ...<br/>#<br/># The /proc/cpuinfo content is a set of (attribute, value) records<br/># the
separator between attribute and value is "\t+: "<br/>#<br/># When there are multiple CPUs, there's a blank line between sets<br/># of lines.<br/>#<br/><br/>class CPUInfo(object):<br/> """ An object with key data from the content of the /proc/cpuinfo file """<br/><br/> def __init__(self):<br/> self.cpus = {}<br/> self.populated = False<br/><br/> def to_json(self):<br/> """ Display the object as a JSON string (prettyprinted) """<br/> if not self.populated:<br/> self.populate()<br/> return json.dumps(self.cpus, sort_keys=True, indent=2)<br/><br/> def get_array(self):<br/> """ return the array of cpus """<br/> if not self.populated:<br/> self.populate()<br/> return self.cpus["processors"]<br/><br/> def populate(self):<br/> """ get the content of /proc/cpuinfo and populate the arrays """<br/> self.cpus["processors"] = []<br/> cpu = {}<br/> cpu["processor"] = {}<br/> text = str(subprocess.check_output(["cat", "/proc/cpuinfo"])).rstrip()<br/> lines = text.split('\n')<br/> # Use re.split because there's a varying number of tabs :-(<br/> array = [re.split('\t+: ', x) for x in lines]<br/> # cpuinfo is structured as n blocks of data, one per logical processor<br/> # o each block has the processor id (0, 1, ...) as its first row.<br/> # o each block ends with a blank row<br/> # o some of the rows have attributes but no values<br/> # (e.g.
power_management)<br/> for row in range(0, len(array[:])):<br/> # New processor detected - attach this one to the output, then<br/> if len(lines[row]) == 0:<br/> # create a new processor<br/> self.cpus["processors"].append(cpu)<br/> cpu = {}<br/> cpu["processor"] = {}<br/> if len(array[row]) == 2:<br/> (attribute, value) = array[row]<br/> attribute = attribute.replace(" ", "_")<br/> cpu["processor"][attribute] = value<br/> self.cpus["processors"].append(cpu)<br/> self.populated = True<br/><br/>if __name__ == '__main__':<br/> main()<br/></pre><br/>The state machine implicit in the main loop of populate() is plausibly efficient, though there remains something about it that annoys me. I need to think about edge cases and failure modes to see whether I can make it better.<br/><br/>The result is an augmented json object including info on the logical processors:<br/><pre>cat crepe.sysinfo <br/>{<br/> "boot_time": "system boot 2014-09-14 16:03", <br/> "bufferram": 193994752, <br/> "distro_codename": "trusty", <br/> "distro_description": "Ubuntu 14.04.1 LTS", <br/> "distro_distributor": "Ubuntu", <br/> "distro_release": "14.04", <br/> "filesystems": [<br/> {<br/> "filesystem": {<br/> "mount_point": "/", <br/> "name": "/dev/sda1", <br/> "size": "444919888", <br/> "used": "3038660"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/sys/fs/cgroup", <br/> "name": "none", <br/> "size": "4", <br/> "used": "0"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/dev", <br/> "name": "udev", <br/> "size": "8169708", <br/> "used": "4"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run", <br/> "name": "tmpfs", <br/> "size": "1636112", <br/> "used": "564"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run/lock", <br/> "name": "none", <br/> "size": "5120", <br/> "used": "0"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run/shm", <br/> "name": "none", <br/> "size": "8180548", <br/> "used": "4"<br/> 
}<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run/user", <br/> "name": "none", <br/> "size": "102400", <br/> "used": "0"<br/> }<br/> }<br/> ], <br/> "freeram": 12954943488, <br/> "freeswap": 17103319040, <br/> "hardware_platform": "x86_64", <br/> "kernel_name": "Linux", <br/> "kernel_release": "3.13.0-35-generic", <br/> "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014", <br/> "machine": "x86_64", <br/> "mem_unit": 1, <br/> "nodename": "crepe", <br/> "operating_system": "GNU/Linux", <br/> "processor": "x86_64", <br/> "processors": [<br/> {<br/> "processor": {<br/> "address_sizes": "39 bits physical, 48 bits virtual", <br/> "apicid": "0", <br/> "bogomips": "3791.14", <br/> "cache_alignment": "64", <br/> "cache_size": "3072 KB", <br/> "clflush_size": "64", <br/> "core_id": "0", <br/> "cpu_MHz": "779.000", <br/> "cpu_cores": "2", <br/> "cpu_family": "6", <br/> "cpuid_level": "13", <br/> "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", <br/> "fpu": "yes", <br/> "fpu_exception": "yes", <br/> "initial_apicid": "0", <br/> "microcode": "0x16", <br/> "model": "69", <br/> "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", <br/> "physical_id": "0", <br/> "processor": "0", <br/> "siblings": "4", <br/> "stepping": "1", <br/> "vendor_id": "GenuineIntel", <br/> "wp": "yes"<br/> }<br/> }, <br/> {<br/> "processor": {<br/> "address_sizes": "39 bits physical, 48 bits virtual", <br/> "apicid": "2", <br/> "bogomips": "3791.14", <br/> "cache_alignment": "64", 
<br/> "cache_size": "3072 KB", <br/> "clflush_size": "64", <br/> "core_id": "1", <br/> "cpu_MHz": "779.000", <br/> "cpu_cores": "2", <br/> "cpu_family": "6", <br/> "cpuid_level": "13", <br/> "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", <br/> "fpu": "yes", <br/> "fpu_exception": "yes", <br/> "initial_apicid": "2", <br/> "microcode": "0x16", <br/> "model": "69", <br/> "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", <br/> "physical_id": "0", <br/> "processor": "1", <br/> "siblings": "4", <br/> "stepping": "1", <br/> "vendor_id": "GenuineIntel", <br/> "wp": "yes"<br/> }<br/> }, <br/> {<br/> "processor": {<br/> "address_sizes": "39 bits physical, 48 bits virtual", <br/> "apicid": "1", <br/> "bogomips": "3791.14", <br/> "cache_alignment": "64", <br/> "cache_size": "3072 KB", <br/> "clflush_size": "64", <br/> "core_id": "0", <br/> "cpu_MHz": "779.000", <br/> "cpu_cores": "2", <br/> "cpu_family": "6", <br/> "cpuid_level": "13", <br/> "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms 
invpcid", <br/> "fpu": "yes", <br/> "fpu_exception": "yes", <br/> "initial_apicid": "1", <br/> "microcode": "0x16", <br/> "model": "69", <br/> "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", <br/> "physical_id": "0", <br/> "processor": "2", <br/> "siblings": "4", <br/> "stepping": "1", <br/> "vendor_id": "GenuineIntel", <br/> "wp": "yes"<br/> }<br/> }, <br/> {<br/> "processor": {<br/> "address_sizes": "39 bits physical, 48 bits virtual", <br/> "apicid": "3", <br/> "bogomips": "3791.14", <br/> "cache_alignment": "64", <br/> "cache_size": "3072 KB", <br/> "clflush_size": "64", <br/> "core_id": "1", <br/> "cpu_MHz": "1000.000", <br/> "cpu_cores": "2", <br/> "cpu_family": "6", <br/> "cpuid_level": "13", <br/> "flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid", <br/> "fpu": "yes", <br/> "fpu_exception": "yes", <br/> "initial_apicid": "3", <br/> "microcode": "0x16", <br/> "model": "69", <br/> "model_name": "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz", <br/> "physical_id": "0", <br/> "processor": "3", <br/> "siblings": "4", <br/> "stepping": "1", <br/> "vendor_id": "GenuineIntel", <br/> "wp": "yes"<br/> }<br/> }<br/> ], <br/> "report_date": "2014-11-06 13:27:06", <br/> "sharedram": 0, <br/> "totalhigh": 0, <br/> "totalram": 16753766400, <br/> "totalswap": 17103319040, <br/> "uptime": 4573401<br/>}<br/></pre><br/>I am tempted to augment the module with a configuration capability that would let me set sysinfo up to restrict the set of data from /dev/cpuinfo that I actually 
include in the sysinfo structure? Do I need "fpu" and "fpu_exception" or "clflush_size" for the things that I will be using the sysinfo stuff for? I'm skeptical. If I make it a configurable filter I can always incorporate data elements after I decide they're interesting.<br/><br/>Decisions, decisions.<br/><br/>Moreover, the multiple repetition of the CPU information is annoying. The four attributes that vary are processor, core id, apicid, and initial apicid. The values are structured thus (initial apicid seems never to vary from apicid):<br/><br/><table><br/><tr><th>processor</th><th>core id</th><th>apicid</th></tr><br/><tr><td>0</td><td>0</td><td>0</td></tr><br/><tr><td>1</td><td>1</td><td>2</td></tr><br/><tr><td>2</td><td>0</td><td>1</td></tr><br/><tr><td>3</td><td>1</td><td>3</td></tr><br/></table><br/><br/>It would be much more sensible to reduce the size and complexity of the processors section by consolidating the common parts and displaying the variant sections in some sensible subsidiary fashion.<br/><br/>These items are discussed in <a href="https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration" title="Intel 64 Architecture Processor Topology Enumeration" target="_blank">this Intel web page</a>.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-73096785191650123172014-09-02T13:03:00.000-07:002021-04-14T11:17:08.425-07:00JSON output from DFSo I'm adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from <code>df</code>.
This is a function whose description, from the man page, says "report file system disk space usage".<br/><br/>Here is a sample of the output of df for one of my systems:<br/><br/><pre><br/>Filesystem 1K-blocks Used Available Use% Mounted on<br/>/dev/mapper/flapjack-root 959088096 3802732 906566516 1% /<br/>udev 1011376 4 1011372 1% /dev<br/>tmpfs 204092 288 203804 1% /run<br/>none 5120 0 5120 0% /run/lock<br/>none 1020452 0 1020452 0% /run/shm<br/>/dev/sda1 233191 50734 170016 23% /boot<br/></pre><br/><br/>So I started by writing a little Python program that used the <code>subprocess.check_output()</code> method to capture the output of <code>df</code>.<br/><br/>This went through various iterations and ended up with this single line of Python code, which requires eleven lines of comments to explain it:<br/><br/><pre><br/>#<br/># this next line of code is pretty tense ... let me explain what<br/># it does:<br/># subprocess.check_output(["df"]) runs the df command and returns<br/># the output as a string<br/># rstrip() trims off the last whitespace character, which is a '\n'<br/># split('\n') breaks the string at the newline characters ... the<br/># result is an array of strings<br/># the list comprehension then applies shlex.split() to each string,<br/># breaking each into tokens<br/># when we're done, we have a two-dimensional array with rows of<br/># tokens and we're ready to make objects out of them<br/>#<br/>df_array = [shlex.split(x) for x in<br/> subprocess.check_output(["df"]).rstrip().split('\n')]<br/></pre><br/><br/>My original <code>df.py</code> code constructed the JSON result manually, a painfully finicky process.
After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization.<br/><br/>So I did some digging around and discovered that the Python <code>json</code> library includes a fairly sensible serialization method that supports prettyprinting of the result. The result was a much cleaner piece of code:<br/><br/><pre><br/># df.py<br/>#<br/># parse the output of df and create JSON objects for each filesystem.<br/>#<br/># $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $<br/>#<br/><br/># now let's parse the output of df to get filesystem information<br/>#<br/># Filesystem 1K-blocks Used Available Use% Mounted on<br/># /dev/mapper/flapjack-root 959088096 3799548 906569700 1% /<br/># udev 1011376 4 1011372 1% /dev<br/># tmpfs 204092 288 203804 1% /run<br/># none 5120 0 5120 0% /run/lock<br/># none 1020452 0 1020452 0% /run/shm<br/># /dev/sda1 233191 50734 170016 23% /boot<br/><br/>import subprocess<br/>import shlex<br/>import json<br/><br/>def main():<br/> """Main routine - call the df utility and return a json structure."""<br/><br/> # this next line of code is pretty tense ... let me explain what<br/> # it does:<br/> # subprocess.check_output(["df"]) runs the df command and returns<br/> # the output as a string<br/> # rstrip() trims off the last whitespace character, which is a '\n'<br/> # split('\n') breaks the string at the newline characters ...
the<br/> # result is an array of strings<br/> # the list comprehension then applies shlex.split() to each string,<br/> # breaking each into tokens<br/> # when we're done, we have a two-dimensional array with rows of<br/> # tokens and we're ready to make objects out of them<br/> df_array = [shlex.split(x) for x in<br/> subprocess.check_output(["df"]).rstrip().split('\n')]<br/> df_num_lines = df_array[:].__len__()<br/><br/> df_json = {}<br/> df_json["filesystems"] = []<br/> for row in range(1, df_num_lines):<br/> df_json["filesystems"].append(df_to_json(df_array[row]))<br/> print json.dumps(df_json, sort_keys=True, indent=2)<br/> return<br/><br/>def df_to_json(tokenList):<br/> """Take a list of tokens from df and return a python object."""<br/> # If df's output format changes, we'll be in trouble, of course.<br/> # the 0 token is the name of the filesystem<br/> # the 1 token is the size of the filesystem in 1K blocks<br/> # the 2 token is the amount used of the filesystem<br/> # the 5 token is the mount point<br/> result = {}<br/> fsName = tokenList[0]<br/> fsSize = tokenList[1]<br/> fsUsed = tokenList[2]<br/> fsMountPoint = tokenList[5]<br/> result["filesystem"] = {}<br/> result["filesystem"]["name"] = fsName<br/> result["filesystem"]["size"] = fsSize<br/> result["filesystem"]["used"] = fsUsed<br/> result["filesystem"]["mount_point"] = fsMountPoint<br/> return result<br/><br/>if __name__ == '__main__':<br/> main()<br/></pre><br/><br/>which, in turn, produces a rather nice df output in JSON.<br/><br/><pre><br/>{<br/> "filesystems": [<br/> {<br/> "filesystem": {<br/> "mount_point": "/", <br/> "name": "/dev/mapper/flapjack-root", <br/> "size": "959088096", <br/> "used": "3802632"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/dev", <br/> "name": "udev", <br/> "size": "1011376", <br/> "used": "4"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run", <br/> "name": "tmpfs", <br/> "size": "204092", <br/> "used": "288"<br/> }<br/> }, <br/> 
{<br/> "filesystem": {<br/> "mount_point": "/run/lock", <br/> "name": "none", <br/> "size": "5120", <br/> "used": "0"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/run/shm", <br/> "name": "none", <br/> "size": "1020452", <br/> "used": "0"<br/> }<br/> }, <br/> {<br/> "filesystem": {<br/> "mount_point": "/boot", <br/> "name": "/dev/sda1", <br/> "size": "233191", <br/> "used": "50734"<br/> }<br/> }<br/> ]<br/>}<br/></pre><br/><br/>Quite a lot of fun, really.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com1tag:blogger.com,1999:blog-3000067919243237480.post-88512610664934120422014-08-31T12:59:00.000-07:002021-04-14T11:17:08.361-07:00Automatic InventoryNow I have four machines. Keeping them in sync is the challenge. Worse yet, knowing whether they are in sync or out of sync is a challenge.<br/><br/>So the first step is to make a tool to inventory each machine. In order to use the inventory utility in a scalable way, I want to design it to produce machine-readable results so that I can easily incorporate them into whatever I need.<br/><br/>What I want is a representation that is both friendly to humans and to computers. This suggests a self-describing text representation like XML or JSON. After a little thought I picked JSON.<br/><br/>What sorts of things do I want to know about the machine? Well, let's start with the hardware and the operating system software plus things like the quantity of RAM and other system resources. 
Some of that information is available from uname and other is available from the sysinfo(2) function.<br/><br/>To get the information from the sysinfo(2) function I had to do several things:<br/><ul><br/> <li>Install sysinfo on each machine<br/><ul><br/> <li>sudo apt-get install sysinfo</li><br/></ul><br/></li><br/> <li>Write a little program to call sysinfo(2) and report out the results<br/><ul><br/> <li>getSysinfo.c</li><br/></ul><br/></li><br/></ul><br/>Of course this program, getSysinfo.c, is a quick-and-dirty hack - the error handling is almost nonexistent and I ought to have generalized the mechanism to work from a data structure that includes the name of the flag and the attribute name and doesn't have the clumsy sequence of if statements.<br/><pre width="400"><br/>/*<br/> * getSysinfo.c<br/> *<br/> * $Id: getSysinfo.c,v 1.4 2014/08/31 17:29:43 marc Exp $<br/> *<br/> * Started 2014-08-31 by Marc Donner<br/> *<br/> * Using the sysinfo(2) call to report on system information<br/> *<br/> */<br/><br/>#include <stdio.h> /* for printf */<br/>#include <stdlib.h> /* for exit */<br/>#include <unistd.h> /* for getopt */<br/>#include <sys/sysinfo.h> /* for sysinfo */<br/><br/>int showHelp(); /* forward declaration */<br/><br/>int main(int argc, char **argv) {<br/><br/> /* Call the sysinfo(2) system call with a pointer to a structure */<br/> /* and then display the results */<br/> struct sysinfo toDisplay;<br/> int rc;<br/><br/> if ( rc = sysinfo(&toDisplay) ) {<br/> printf(" rc: %d\n", rc);<br/> exit(rc);<br/> }<br/><br/> int c;<br/> int opt_a = 0;<br/> int opt_b = 0;<br/> int opt_f = 0;<br/> int opt_g = 0;<br/> int opt_h = 0;<br/> int opt_m = 0;<br/> int opt_r = 0;<br/> int opt_s = 0;<br/> int opt_u = 0;<br/> int opt_w = 0;<br/> int opt_help = 0;<br/> int opt_none = 1;<br/><br/> while ( (c = getopt(argc, argv, "abfghmrsuw?")) != -1) {<br/> opt_none = 0;<br/> switch (c) {<br/> case 'a':<br/> opt_a = 1;<br/> break;<br/> case 'b':<br/> opt_b = 1;<br/> break;<br/> case 'f':<br/> opt_f = 1;<br/> break;<br/> case 'g':<br/> opt_g = 
1;<br/> break;<br/> case 'h':<br/> opt_h = 1;<br/> break;<br/> case 'm':<br/> opt_m = 1;<br/> break;<br/> case 'r':<br/> opt_r = 1;<br/> break;<br/> case 's':<br/> opt_s = 1;<br/> break;<br/> case 'u':<br/> opt_u = 1;<br/> break;<br/> case 'w':<br/> opt_w = 1;<br/> break;<br/> case '?':<br/> opt_help = 1;<br/> break;<br/> }<br/> }<br/><br/> if ( opt_none || opt_help ) {<br/> showHelp();<br/> return 100;<br/> } else {<br/> if ( opt_u || opt_a ) { printf(" \"uptime\": %lu\n", toDisplay.uptime); }<br/> if ( opt_r || opt_a ) { printf(" \"totalram\": %lu\n", toDisplay.totalram); }<br/> if ( opt_f || opt_a ) { printf(" \"freeram\": %lu\n", toDisplay.freeram); }<br/> if ( opt_b || opt_a ) { printf(" \"bufferram\": %lu\n", toDisplay.bufferram); }<br/> if ( opt_s || opt_a ) { printf(" \"sharedram\": %lu\n", toDisplay.sharedram); }<br/> if ( opt_w || opt_a ) { printf(" \"totalswap\": %lu\n", toDisplay.totalswap); }<br/> if ( opt_g || opt_a ) { printf(" \"freeswap\": %lu\n", toDisplay.freeswap); }<br/> if ( opt_h || opt_a ) { printf(" \"totalhigh\": %lu\n", toDisplay.totalhigh); }<br/> if ( opt_m || opt_a ) { printf(" \"mem_unit\": %d\n", toDisplay.mem_unit); }<br/> return 0;<br/> }<br/>}<br/><br/>int showHelp() {<br/> printf( "Syntax: getSysinfo [options]\n" );<br/> printf( "\nDisplay results from the sysinfo(2) result structure\n\n" );<br/> printf( "Options:\n" );<br/> printf( " -b : bufferram\n" );<br/> printf( " -f : freeram\n" );<br/> printf( " -g : freeswap\n" );<br/> printf( " -h : totalhigh\n" );<br/> printf( " -m : mem_unit\n" );<br/> printf( " -r : totalram\n" );<br/> printf( " -s : sharedram\n" );<br/> printf( " -u : uptime\n" );<br/> printf( " -w : totalswap\n\n" );<br/> printf( "getSysinfo also accepts arbitrary combinations of permitted options.\n" 
);<br/> return 100;<br/>}<br/></pre><br/><br/>And with this in place, the python program sysinfo.py required to pull together various other bits and pieces becomes possible:<br/><br/><pre><br/>#<br/># sysinfo<br/>#<br/># report a JSON object describing the current system<br/>#<br/># $Id: sysinfo.py,v 1.8 2014/08/31 21:04:30 marc Exp $<br/>#<br/><br/>from subprocess import call<br/>from subprocess import check_output<br/>import time<br/><br/># First we get the uname information<br/>#<br/># kernel_name : -s<br/># nodename : -n<br/># kernel_release : -r<br/># kernel_version : -v<br/># machine : -m<br/># processor : -p<br/># hardware_platform : -i<br/># operating_system : -o<br/>#<br/><br/>operating_system = check_output( ["uname", "-o"] ).rstrip()<br/>kernel_name = check_output( ["uname", "-s"] ).rstrip()<br/>kernel_release = check_output( ["uname", "-r"] ).rstrip()<br/>kernel_version = check_output( ["uname", "-v"] ).rstrip()<br/>nodename = check_output( ["uname", "-n"] ).rstrip()<br/>machine = check_output( ["uname", "-m"] ).rstrip()<br/>processor = check_output( ["uname", "-p"] ).rstrip()<br/>hardware_platform = check_output( ["uname", "-i"] ).rstrip()<br/><br/># now we get the boot time using who -b<br/>boot_time = check_output( ["who", "-b"]).rstrip().lstrip()<br/><br/># now we get information from our handy-dandy getSysinfo program<br/>GETSYSINFO = "/home/marc/projects/s/sysinfo/getSysinfo"<br/>getsysinfo_uptime = check_output( [GETSYSINFO, "-u"] ).rstrip().lstrip()<br/>getsysinfo_totalram = check_output( [GETSYSINFO, "-r"] ).rstrip().lstrip()<br/>getsysinfo_freeram = check_output( [GETSYSINFO, "-f"] ).rstrip().lstrip()<br/>getsysinfo_bufferrram = check_output( [GETSYSINFO, "-b"] ).rstrip().lstrip()<br/>getsysinfo_sharedram = check_output( [GETSYSINFO, "-s"] ).rstrip().lstrip()<br/>getsysinfo_totalswap = check_output( [GETSYSINFO, "-w"] ).rstrip().lstrip()<br/>getsysinfo_freeswap = check_output( [GETSYSINFO, "-g"] ).rstrip().lstrip()<br/>getsysinfo_totalhigh = 
check_output( [GETSYSINFO, "-h"] ).rstrip().lstrip()<br/>getsysinfo_mem_unit = check_output( [GETSYSINFO, "-m"] ).rstrip().lstrip()<br/><br/>print "{"<br/>print " \"report_date\": \"" + time.strftime("%Y-%m-%d %H:%M:%S") + "\","<br/>print " \"operating_system\": " + "\"" + operating_system + "\","<br/>print " \"kernel_name\": " + "\"" + kernel_name + "\","<br/>print " \"kernel_release\": " + "\"" + kernel_release + "\","<br/>print " \"kernel_version\": " + "\"" + kernel_version + "\","<br/>print " \"nodename\": " + "\"" + nodename + "\","<br/>print " \"machine\": " + "\"" + machine + "\","<br/>print " \"processor\": " + "\"" + processor + "\","<br/>print " \"hardware_platform\": " + "\"" + hardware_platform + "\","<br/>print " \"boot_time\": " + "\"" + boot_time + "\","<br/>print " " + getsysinfo_uptime + ","<br/>print " " + getsysinfo_totalram + ","<br/>print " " + getsysinfo_freeram + ","<br/>print " " + getsysinfo_sharedram + ","<br/>print " " + getsysinfo_totalswap + ","<br/>print " " + getsysinfo_totalhigh + ","<br/>print " " + getsysinfo_freeswap + ","<br/>print " " + getsysinfo_mem_unit<br/>print "}"<br/></pre><br/><br/>which in turn enables the Makefile:<br/><br/><pre><br/>#<br/># Makefile for sysinfo<br/>#<br/># $Id: Makefile,v 1.9 2014/08/31 21:27:35 marc Exp $<br/>#<br/><br/>FORCE := force<br/><br/>HOST := $(shell hostname)<br/>HOSTS := flapjack waffle pancake frenchtoast<br/>SSH_FILES := $(HOSTS:%=.%_ssh)<br/>PUSH_HOSTS := $(filter-out ${HOST}, ${HOSTS})<br/>PUSH_FILES := $(PUSH_HOSTS:%=.%_push)<br/><br/>help: ${FORCE}<br/> cat Makefile<br/><br/>FILES := Makefile sysinfo.py sysinfo.bash getSysinfo.c<br/><br/>checkin: ${FILES}<br/> ci -l ${FILES}<br/><br/>install: ~/bin/sysinfo<br/><br/>~/bin/sysinfo: ./sysinfo.bash<br/> cp $< $@<br/> chmod +x $@<br/><br/>getSysinfo: getSysinfo.c<br/> cc $< -o $@<br/><br/>ssh: ${SSH_FILES}<br/><br/>.%_ssh: ${FORCE}<br/> ssh $* sysinfo > $*.sysinfo<br/> touch $@<br/><br/>test: ${FORCE}<br/> time python sysinfo.py<br/><br/>force:<br/></pre><br/><br/>Notice the little trick with the Makefile variables HOST, HOSTS, SSH_FILES, PUSH_HOSTS, and
PUSH_FILES that lets one host push to the others for distributing the code but lets it call on all of the hosts when gathering data.<br/><br/>With all of this machinery in place and distributed to all of the UNIX machines in my little network, I was now able to type 'make ssh' and get the resulting output:<br/><br/><pre><br/>marc@flapjack:~/projects/s/sysinfo$ more *.sysinfo<br/>::::::::::::::<br/>flapjack.sysinfo<br/>::::::::::::::<br/>{<br/> "report_date": "2014-09-01 10:37:30",<br/> "operating_system": "GNU/Linux",<br/> "kernel_name": "Linux",<br/> "kernel_release": "3.2.0-52-generic",<br/> "kernel_version": "#78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013",<br/> "nodename": "flapjack",<br/> "machine": "x86_64",<br/> "processor": "x86_64",<br/> "hardware_platform": "x86_64",<br/> "boot_time": "system boot 2014-08-07 22:01",<br/> "uptime": 2118958,<br/> "totalram": 2089889792,<br/> "freeram": 145928192,<br/> "sharedram": 0,<br/> "totalswap": 2134896640,<br/> "totalhigh": 0,<br/> "freeswap": 2062192640,<br/> "mem_unit": 1<br/>}<br/>::::::::::::::<br/>frenchtoast.sysinfo<br/>::::::::::::::<br/>{<br/> "report_date": "2014-09-01 10:37:31",<br/> "operating_system": "GNU/Linux",<br/> "kernel_name": "Linux",<br/> "kernel_release": "3.13.0-32-generic",<br/> "kernel_version": "#57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014",<br/> "nodename": "frenchtoast",<br/> "machine": "x86_64",<br/> "processor": "x86_64",<br/> "hardware_platform": "x86_64",<br/> "boot_time": "system boot 2014-07-19 14:58",<br/> "uptime": 3785970,<br/> "totalram": 16753840128,<br/> "freeram": 14150377472,<br/> "sharedram": 0,<br/> "totalswap": 17103319040,<br/> "totalhigh": 0,<br/> "freeswap": 17103319040,<br/> "mem_unit": 1<br/>}<br/>::::::::::::::<br/>pancake.sysinfo<br/>::::::::::::::<br/>{<br/> "report_date": "2014-09-01 10:37:31",<br/> "operating_system": "GNU/Linux",<br/> "kernel_name": "Linux",<br/> "kernel_release": "3.13.0-35-generic",<br/> "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 
2014",<br/> "nodename": "pancake",<br/> "machine": "x86_64",<br/> "processor": "x86_64",<br/> "hardware_platform": "x86_64",<br/> "boot_time": "system boot 2014-08-31 09:06",<br/> "uptime": 91840,<br/> "totalram": 16753819648,<br/> "freeram": 15609884672,<br/> "sharedram": 0,<br/> "totalswap": 17104367616,<br/> "totalhigh": 0,<br/> "freeswap": 17104367616,<br/> "mem_unit": 1<br/>}<br/>::::::::::::::<br/>waffle.sysinfo<br/>::::::::::::::<br/>{<br/> "report_date": "2014-09-01 10:37:30",<br/> "operating_system": "GNU/Linux",<br/> "kernel_name": "Linux",<br/> "kernel_release": "3.13.0-35-generic",<br/> "kernel_version": "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014",<br/> "nodename": "waffle",<br/> "machine": "x86_64",<br/> "processor": "x86_64",<br/> "hardware_platform": "x86_64",<br/> "boot_time": "system boot 2014-08-31 09:07",<br/> "uptime": 91784,<br/> "totalram": 16752275456,<br/> "freeram": 15594139648,<br/> "sharedram": 0,<br/> "totalswap": 17104367616,<br/> "totalhigh": 0,<br/> "freeswap": 17104367616,<br/> "mem_unit": 1<br/>}<br/></pre><br/><br/>So now I have the beginning of a structured inventory of all of my machines, and an easy way to scale it up.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-71650820880533201652014-08-04T10:48:00.000-07:002021-04-14T11:17:08.293-07:00Log consolidationWell, my nice DNS service with two secondaries and a primary is all well and good, but my logs are now scattered across three machines. If I want to play with the stats or diagnose a problem or see when something went wrong, I now have to grep around on three different machines.<br/><br/>Obviously I could consolidate the logs using syslog. That's what it's designed for, so why don't I do that? 
Let's see what I have to do to make that work properly:<br/><ol><br/> <li>Set up rsyslogd on flapjack to properly stash the DNS messages</li><br/> <li>Set up DNS on flapjack to log to syslog</li><br/> <li>Set up the rsyslogd service on flapjack to receive syslog messages over the network</li><br/> <li>Set up rsyslog on waffle to forward DNS log messages to flapjack</li><br/> <li>Set up rsyslog on pancake to forward DNS log messages to flapjack</li><br/> <li>Set up the DNS secondary configurations to use syslog instead of local logs</li><br/> <li>Distribute the updates and restart the secondaries</li><br/> <li>Test everything</li><br/></ol><br/>A side benefit of using syslog to accumulate my DNS logs is that they'll now be timestamped so I can do more sophisticated data analysis if I ever get a Round Tuit.<br/><br/>Here's the architecture of the setup I'm going to pursue:<br/><br/><a href="http://nygeek.net/wp-content/uploads/2014/08/2014-08-04-dns-syslog-architecture.jpg"><img src="http://nygeek.net/wp-content/uploads/2014/08/2014-08-04-dns-syslog-architecture.jpg" alt="2014-08-04-dns-syslog-architecture" /></a><br/><br/>So the first step is to set up the primary DNS server on flapjack to write to syslog. This has several parts:<br/><ul><br/> <li>Declare a "facility" in syslog that DNS can write to. For historical reasons (Hi, Eric!) syslog has a limited number of separate facilities that can accumulate logs. The configuration file links sources to facilities, allowing the configuration master to do various clever filtering of the log messages that come in.</li><br/> <li>Tell DNS to log to the "facility"</li><br/> <li>Restart both bind9 and rsyslogd to get everything working.</li><br/></ul><br/>The logging for Bind9 is specified in a file called /etc/bind/named.conf.local. 
The default setup involves appending log records to a file named /var/log/named/query.log.<br/><br/>We'll keep using that file for our logs going forward, since some other housekeeping knows about that location and no one else is intent on interfering with it.<br/><br/>The old logging stanza was:<br/><pre>logging {<br/> channel query.log {<br/><strong> file "/var/log/named/query.log";<br/></strong> severity debug 3;<br/> };<br/> category queries { query.log; };<br/>};<br/></pre><br/>What I want is this:<br/><pre>logging {<br/> channel query.log {<br/><strong> syslog local6;<br/></strong> severity debug 3;<br/> };<br/> category queries { query.log; };<br/>};<br/></pre><br/>That is because I have decided to use the facility named <em><strong>local6</strong></em> for DNS.<br/><br/>In order to make the rsyslogd daemon on flapjack listen to messages from DNS, I have to declare the facility active.<br/><br/>The syslog service on flapjack is provided by a server called rsyslogd. It's an alternative to the other two mainstream syslog products - syslog-ng and sysklogd. I picked rsyslogd because it comes as the standard logging service on Ubuntu 12.04 and 14.04, the distros I am using in my house. You might call me lazy, you might call me pragmatic, but don't call me late for happy hour.<br/><br/>In order to make rsyslogd do what I need, I have to take control of the management of two configuration files: /etc/rsyslog.conf and /etc/rsyslog.d/50-default.conf. As is my wont, I do this by creating a project directory ~/projects/r/rsyslog/ with a Makefile and the editable versions of the two files under RCS control. 
Here's the Makefile:<br/><pre>cat Makefile<br/>#<br/># rsyslog setup file<br/>#<br/># As of 2014-08-01 syslog host is flapjack<br/>#<br/># $Id: Makefile,v 1.4 2014/08/02 12:11:52 marc Exp $<br/>#<br/><br/>FORCE = force<br/><br/>TARGETS = /etc/rsyslog.conf /etc/rsyslog.d/50-default.conf<br/><br/>FILES = Makefile rsyslog.conf 50-default.conf<br/><br/>help: ${FORCE}<br/> cat Makefile<br/><br/># sudo<br/>/etc/rsyslog.conf: rsyslog.conf<br/> cp $< $@ <br/><br/>/etc/rsyslog.d/50-default.conf: 50-default.conf<br/> cp $< $@ <br/><br/># sudo<br/>push: ${TARGETS}<br/><br/># sudo<br/>restart: ${FORCE}<br/> service rsyslog restart<br/><br/>verify: ${FORCE}<br/> rsyslogd -c5 -N1<br/><br/>compare: ${FORCE}<br/> diff /etc/rsyslog.conf rsyslog.conf<br/> diff /etc/rsyslog.d/50-default.conf 50-default.conf<br/><br/>checkin: ${FORCE}<br/> ci -l ${FILES}<br/><br/>force:<br/></pre><br/>Actually, this Makefile ends up in ~/projects/r/rsyslog/flapjack, since waffle and pancake will end up with different rsyslogd configurations and I separate the different control directories this way.<br/><br/>In order to log using syslog I need to define a facility, local6, in the 50-default.conf file. The new assertion looks like this:<br/><pre>local6.* -/var/log/named/query.log<br/></pre><br/>With a restart of each of the appropriate daemons, we're off to the races and the new logs appear in the log file. I needed to change the ownership of /var/log/named/query.log from bind to syslog in order for the new writer to be able to write, but that was the work of a moment.<br/><br/>Now comes the task of making the logs from the two secondary DNS servers go across the network to flapjack. This involved a lot of little bits and pieces.<br/><br/>First of all, I had to tell the rsyslogd daemon on flapjack to listen to the rsyslog UDP port. I could have turned on the more reliable TCP logging facility or the even more reliable queueing facility, but let's get real. These are DNS query logs we're talking about. 
I don't really care if some of them fall on the floor. And anyway, the traffic levels on donner.lan are so low that I'd be very surprised if the loss rate is significant.<br/><br/>To turn on UDP listening on flapjack all I had to do was uncomment two lines in the /etc/rsyslog.conf file:<br/><pre># provides UDP syslog reception<br/>$ModLoad imudp<br/>$UDPServerRun 514<br/></pre><br/>One more restart of rsyslogd on flapjack and we're good to go.<br/><br/>The next step is to make the DNS name service on waffle and pancake send their logs to the local6 facility. In addition, I had to set up rsyslog on waffle and pancake with a local6 facility, though this time the facility has to know to send the logs across to flapjack by UDP rather than writing locally.<br/><br/>The change to the named.conf.local file for waffle and pancake's DNS secondary service was identical to the change to flapjack's primary service, so kudos to the designers of bind9 and syslogd for good modularization.<br/><br/>To make waffle and pancake forward their logs over to flapjack required that the /etc/rsyslog.d/50-default.conf file define local6 in this way:<br/><pre>local6.* @syslog<br/></pre><br/>Notice that the @ tells rsyslogd to forward the local6 logs via UDP. I could have put the IP address of flapjack right after the @, or the hostname flapjack. Instead, I created a DNS listing for a service host named syslog ... it happens to have the same IP address as flapjack, but it gives me a level of indirection if I should desire to relocate the syslog service to another host.<br/><br/>With a restart of rsyslogd and bind9 on both waffle and pancake, we are up and running. 
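For the curious, there is not much inside one of these forwarded messages. A syslog datagram is a single UDP packet whose text payload begins with a priority number computed as facility * 8 + severity (RFC 3164), so <code>local6.info</code> works out to 182. Here is a rough Python sketch of what rsyslogd does on the wire; the function names are mine, not rsyslog's:

```python
import socket

# Standard syslog encoding (RFC 3164): each message is one UDP datagram
# whose payload begins with "<PRI>", where PRI = facility * 8 + severity.
LOCAL6 = 22   # numeric code for the local6 facility
INFO = 6      # numeric code for the info severity

def syslog_pri(facility, severity):
    return facility * 8 + severity

def send_syslog(host, message, facility=LOCAL6, severity=INFO, port=514):
    # Best-effort delivery, one datagram per log line - the same
    # trade-off accepted above for low-volume DNS query logs.
    datagram = "<%d>%s" % (syslog_pri(facility, severity), message)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(datagram.encode("ascii"), (host, port))
    finally:
        sock.close()
```

Calling something like <code>send_syslog("syslog", "test message")</code> from waffle or pancake should make a line appear in flapjack's query.log, which makes for a quick end-to-end check.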
All DNS logs are now consolidated on a single host, namely flapjack.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-86591909716664322692014-07-02T14:30:00.000-07:002021-04-14T11:17:08.231-07:00Waiting for the File ServerWell, I now have four different UNIX machines and I've been doing sysadmin tasks on all of them. As a result I now have four home directories that are out of sync.<br/><br/>How annoying.<br/><br/>Ultimately I plan to create a file server on one of my machines and provide the same home directory on all of them, but I haven't done that yet, so I need some temporary crutches to tide me over until I get the file server built. In particular, I need to find out what is where.<br/><br/>The first thing I did was establish trust among the machines, making flapjack, the oldest, into the 'master' trusted by the others. This I did by creating an SSH private key using <code>ssh-keygen</code> on the master and putting the matching public key in <code>.ssh/authorized_keys</code> on the other machines.<br/><br/>Then I decided to automate the discovery of what directories were on which machine. This is made easier because of my personal trick for organizing files, namely to have a set of top level subdirectories named <code>org/</code>, <code>people/</code>, and <code>projects/</code> in my home directory. Each of these has twenty-six subdirectories named <code>a</code> through <code>z</code>, with appropriately named subdirectories under them. This I find helps me put related things together. It is not an alternative to search but rather a complement.<br/><br/>Anyway, the result is that I could build a Makefile that automates reaching out to all of my machines and gathering information. 
Here's the Makefile:<br/><br/><pre><br/># $Id: Makefile,v 1.7 2014/07/04 18:57:44 marc Exp marc $<br/><br/>FORCE = force<br/><br/>HOSTS = flapjack frenchtoast pancake waffle<br/><br/>FILES = Makefile<br/><br/>checkin: ${FORCE}<br/> ci -l ${FILES}<br/><br/>uname: ${FORCE}<br/> for h in ${HOSTS}; \<br/> do ssh $$h uname -a \<br/> | sed -e 's/^/'$$h': /'; \<br/> done<br/><br/>host_find: ${FORCE}<br/> echo > host_find.txt<br/> for h in ${HOSTS}; \<br/> do ssh $$h find -print \<br/> | sed -e 's/^/'$$h': /' \<br/> >> host_find.txt; done<br/><br/>clusters.txt: host_find.txt<br/> sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' host_find.txt \<br/> | uniq -c \<br/> | grep -v '^ *1 ' \<br/> > clusters.txt<br/><br/>force:<br/></pre><br/><br/>Ideally, of course, I'd get the list of host names in the variable <code>HOSTS</code> from my configuration database, but having neglected to build one yet, I am just listing my machines by name there.<br/><br/>The first important target <code>host_find</code> does an ssh to all of the machines, including itself, and runs find, prefixing the host name on each line so that I can determine which files exist on which machine. This creates a file named <code>host_find.txt</code> which I can probably dispense with now that the machinery is working.<br/><br/>The second important target, <code>clusters.txt</code>, passes the host_find.txt output through a SED script. This SED script does a rather careful substitution of patterns like <code>/org/z/zodiac/blah-blah-blah</code> with <code>/org/z/zodiac</code>. Then the pipe through <code>uniq -c</code> counts up the number of identical path prefixes. 
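If the sed backreference syntax is hard on the eyes, the pipeline amounts to this: trim each path to its first three components and count how many lines share each prefix. The same idea as a Python sketch, assuming the hostname prefix has already been stripped (the sample paths are invented):

```python
import re
from collections import Counter

def cluster(find_lines, min_count=2):
    # Trim each path like ./org/z/zodiac/a/b.txt down to its
    # ./org/z/zodiac prefix, then count files per prefix, dropping
    # prefixes seen fewer than min_count times (empty-directory noise).
    prefix = re.compile(r'^(\./[^/]+/[a-z]/[^/]+)')
    counts = Counter()
    for line in find_lines:
        m = prefix.match(line)
        if m:
            counts[m.group(1)] += 1
    return {p: n for p, n in counts.items() if n >= min_count}

sample = ["./org/z/zodiac/notes.txt",
          "./org/z/zodiac/ideas.txt",
          "./org/f/empty-dir"]
# cluster(sample) -> {'./org/z/zodiac': 2}
```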
That's fine, but there are lots of subdirectories like <code>/org/f</code> that are empty and I don't want them cluttering up my result, so the <code>grep -v '^ *1 '</code> pipe segment excludes the lines with a count of 1.<br/><br/>The result of running that tonight is the following report:<br/><br/><pre><br/> 8 flapjack: ./org/c/coursera<br/> 351 flapjack: ./org/s/studiopress<br/> 3119 flapjack: ./org/g/gnu<br/> 1312 flapjack: ./org/f/freedesktop<br/> 293 flapjack: ./org/m/minecraft<br/> 9 flapjack: ./org/b/brother<br/> 2 flapjack: ./org/n/national_center_for_access_to_justice<br/> 1168 flapjack: ./org/w/wordpress<br/> 4 flapjack: ./projects/c/cron<br/> 10 flapjack: ./projects/c/cups<br/> 6 flapjack: ./projects/d/dhcp<br/> 33 flapjack: ./projects/d/dns<br/> 15 flapjack: ./projects/s/sysadmin<br/> 5 flapjack: ./projects/f/ftp<br/> 3 flapjack: ./projects/p/printcap<br/> 8 flapjack: ./projects/p/programming<br/> 8 flapjack: ./projects/t/tftpd<br/> 35 flapjack: ./projects/n/netboot<br/> 7 flapjack: ./projects/l/logrotate<br/> 8 flapjack: ./projects/r/rolodex<br/> 189 flapjack: ./projects/h/html5reset<br/> 6 frenchtoast: ./projects/p/printcap<br/> 5 frenchtoast: ./projects/c/cups<br/> 380 pancake: ./org/m/minecraft<br/> 3 pancake: ./projects/l/logrotate<br/> 15 pancake: ./projects/d/dns<br/> 9 pancake: ./projects/s/sysadmin<br/> 11 waffle: ./projects/s/sysadmin<br/> 8 waffle: ./projects/t/tftpd<br/> 15 waffle: ./projects/d/dns<br/> 3 waffle: ./projects/l/logrotate<br/> 375 waffle: ./org/m/minecraft<br/></pre><br/><br/>And ... voila! I have a map that I can use to figure out how to consolidate the many scattered parts of my home directory.<br/><br/>[2014-07-04 - updated the Makefile so that it is more friendly to web browsers.]<br/><br/>[2014-07-29 - a friend of mine critiqued my Makefile code and pointed out that gmake has powerful iteration functions of its own, eliminating the need for me to incorporate shell code in my targets. 
The result is quite elegant, I must say!]<br/><br/><pre><br/>#<br/># Find out what files exist on all of the hosts on donner.lan<br/># Started in June 2014 by Marc Donner<br/>#<br/># $Id: Makefile,v 1.12 2014/07/30 02:07:07 marc Exp $<br/>#<br/><br/>FORCE = force<br/><br/># This ought to be the result of a call to the CMDB<br/>HOSTS = flapjack frenchtoast pancake waffle<br/><br/>FILES = Makefile host_find.txt clusters.txt<br/><br/>#<br/># This provides us with the ISO 8601 date (YYYY-MM-DD)<br/>#<br/>DATE := $(shell /bin/date +"%Y-%m-%d")<br/><br/>help: ${FORCE}<br/> cat Makefile<br/><br/>checkin: ${FORCE}<br/> ci -l ${FILES}<br/><br/># A finger exercise to ensure that we can see the base info on the hosts<br/>HOSTS_UNAME := $(HOSTS:%=.%_uname.txt)<br/><br/>uname: ${HOSTS_UNAME}<br/> cat ${HOSTS_UNAME}<br/><br/>.%_uname.txt: ${FORCE}<br/> ssh $* uname -a | sed -e 's/^/:'$*': /' > $@<br/><br/>HOSTS_UPTIME := $(HOSTS:%=.%_uptime.txt)<br/><br/>uptime: ${HOSTS_UPTIME}<br/> cat ${HOSTS_UPTIME}<br/><br/>.%_uptime.txt: ${FORCE}<br/> ssh $* uptime | sed -e 's/^/:'$*': /' > $@<br/><br/># Another finger exercise to verify the location of the ssh landing<br/># point home directory<br/><br/>HOSTS_PWD := $(HOSTS:%=.%_pwd.txt)<br/><br/>pwd: ${HOSTS_PWD}<br/> cat ${HOSTS_PWD}<br/><br/>.%_pwd.txt: ${FORCE}<br/> ssh $* pwd | sed -e 's/^/:'$*': /' > $@<br/><br/># Run find on all of the ${HOSTS} and prefix-mark all of the results,<br/># accumulating them all in host_find.txt<br/><br/>HOSTS_FIND := $(HOSTS:%=.%_find.txt)<br/><br/>find: ${HOSTS_FIND}<br/><br/>.%_find.txt: ${FORCE}<br/> echo '# ' ${DATE} > $@<br/> ssh $* find -print | sed -e 's/^/:'$*': /' >> $@<br/><br/># Get rid of the empty directories and report the number of files in each<br/># non-empty directory<br/>clusters.txt: ${HOSTS_FIND}<br/> cat ${HOSTS_FIND} \<br/> | sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' \<br/> | uniq -c \<br/> | grep -v '^ *1 ' \<br/> | sort -t ':' -k 3 \<br/> > 
clusters.txt<br/><br/>force:<br/></pre>nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-66608609143114857612014-05-12T11:39:00.000-07:002022-04-04T12:24:28.956-07:00Two Intel NUC servers running Ubuntu<img alt="Two Intel NUC servers running Ubuntu" class="size-full" src="http://nygeek.net/wp-content/uploads/2014/05/2014-05-11-two-nuc-servers.jpg" /><br /><br />A week or two ago I took the plunge and ordered a pair of Intel NUC systems. Here's what happened next as I worked to build a pair of Ubuntu servers out of the hardware:<br /><br />I ordered the components for two Linux servers from Amazon:<ul> <li>Intel NUC D54250WYK [$364.99 each]</li><br /> <li>Crucial M500 240 GB mSATA [$119.99 each]</li><br /> <li>Crucial 16GB Kit [$134.99 each]</li><br /> <li>Cables Unlimited 6-Foot Mickey Mouse Power Cord [$5.99 each]</li></ul>for a total of $625.96 per machine. Because I have a structured wiring system in my apartment I didn't bother with the wifi card.<br />...<br /><br />Assembly was fast, taking ten or fifteen minutes to open the bottom cover, snap in the RAM and the SSD, and button the machine up again.<br /><br />Getting Ubuntu installed was rather more work (on an iMac):<br /><br />Download the Ubuntu image from the Ubuntu site.<br /><br />Prepare a bootable USB with the server image (used diskutil to learn that my USB stick was on /dev/disk4):<ul> <li><pre>hdiutil convert -format UDRW -o ubuntu-14.04-server-amd64.img ubuntu-14.04-server-amd64.iso</pre></li> <li><pre>diskutil unmountDisk /dev/disk4</pre></li> <li><pre>sudo dd if=ubuntu-14.04-server-amd64.img.dmg of=/dev/rdisk4 bs=1m</pre></li> <li><pre>diskutil eject /dev/disk4</pre></li></ul>This then booted on the NUC, and the install went relatively smoothly.<br /><br />However, after the installation was complete the system would not boot - it did not recognize the SSD as a boot device.<br /><br />Did a little searching around and 
learned that I needed to update the BIOS on the NUC. Downloaded the updated firmware from the Intel site, following a YouTube video from Intel, and applied the new firmware.<br /><br />Redid the install, which ultimately worked, after one more glitch. The second machine went more smoothly.<br /><br />Two little Linux boxes now working quite nicely - completely silent, 16G of RAM on each, 240G SSD on each.<br /><br />They are physically tiny ... hard to overemphasize how tiny, but really tiny. They sit on top of my Airport Extreme access point and make it look big.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com10tag:blogger.com,1999:blog-3000067919243237480.post-424796426557517532014-04-06T13:57:00.000-07:002021-04-14T11:17:07.814-07:002014 Five Borough Bike Tour - I'm ridingThe Five Borough Bike Tour is an annual event in which tens of thousands of New Yorkers ride 40 or 50 miles from lower Manhattan up through the Bronx, Queens, Brooklyn, and over the Verrazano Narrows Bridge to Staten Island. For the last three years I've supported a wonderful organization called Bronxworks (<a href="http://bronxworks.org/" target="_blank" rel="noopener">http://bronxworks.org/</a>) that helps families in need in The Bronx. I ride with a number of friends, some of whom live in the Bronx, and all of whom have adopted this wonderful group.<br/><br/>I rode with the Bronxworks team in <a title="2011 Five Boro Bike Tour" href="http://nygeek.net/2011/05/04/five-borough-bike-tour-2011-may-1/" target="_blank" rel="noopener">2011</a> and <a title="2012 Five Boro Bike Tour" href="http://nygeek.net/2012/05/07/2012-five-borough-bike-tour-6-may-2012/" target="_blank" rel="noopener">2012</a> but a conflict prevented me from riding in 2013, though I donated to support the rest of the team. Fortunately for me I will be riding again this year. 
If you want to contribute to Bronxworks in support of my ride you may visit my fundraising page <a href="http://www.crowdrise.com/BronxWorks2014BikeTour/fundraiser/marcdonner" target="_blank" rel="noopener">http://www.crowdrise.com/BronxWorks2014BikeTour/fundraiser/marcdonner</a>. If you do so, I will be eternally grateful!<br/><br/> nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-34185125001833249872013-10-22T10:48:00.000-07:002021-04-14T11:17:07.753-07:00From the Editors: The Invisible Computers[Originally published in the <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2011/06/msp2011060003.html" target="_blank">November/December 2011 issue (Volume 9 number 6)</a> of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>Just over a decade ago, shortly before we launched IEEE Security & Privacy, <a title="MIT Press" href="http://mitpress.mit.edu/" target="_blank">MIT Press</a> published <a title="Donald Norman's site" href="http://www.jnd.org/" target="_blank">Donald Norman</a>'s book <a title="Amazon.com page for the book" href="http://www.amazon.com/The-Invisible-Computer-Information-Appliances/dp/0262640414" target="_blank">The Invisible Computer</a>. At the time, conversations about the book focused on the opportunities exposed by his powerful analogies between computers and small electric motors as system components.<br/><br/>Today, almost everything we use has one or more computers, and a surprising number have so many that they require internal networks. For instance, a new automobile has so many computers in it that it has at least two local area networks, separated by a firewall, to connect them, along with interconnects to external systems. There's probably even a computer in the key!<br/><br/>Medical device makers have also embraced computers as components. 
Implantable defibrillators and pacemakers have computers and control APIs. If it's a computer, it must have some test facilities, and these, if misused, could threaten a patient's health. Doctors who have driven these designs, focused entirely on saving lives, are shocked when asked about safeguards to prevent unauthorized abuse. It's probably good that their minds don't go that way, but someone (that's you) should definitely be thinking that way.<br/><br/>In 2007, the convergence battle in the mobile telephone world was resolved with the iPhone. iPhone's launch ended the mad competition to add more surfaces and smaller buttons to attach more "features" to each phone. Ever after, a mobile phone would be primarily a piece of software. One button was enough. After that, it was software all the rest of the way down, and control of the technology's evolution shifted from mechanical to software engineers.<br/><br/>By now, the shape of the computer systems world is beginning to emerge. No longer is the familiar computer body plan of a screen, keyboard, and pointing device recognizable. Now computers lurk inside the most innocuous physical objects, specialized in function but increasingly sophisticated in behavior. Beyond the computer's presence, however, is the ubiquity of interconnection. The new generation of computers is highly connected, and this is driving a revolution in both security and privacy issues.<br/><br/>It isn't always obvious what threats to security and privacy this new reality will present. For example, it's now possible to track stolen cameras using Web-based services that scan published photographs and index them by metadata included in JPEG or TIFF files. 
Although this is a boon for theft victims, the privacy risks have yet to be understood.<br/><br/>The computer cluster that is a contemporary automobile presents tremendous improvements in safety, performance, and functionality, but it also presents security challenges that are only now being studied and understood. Researchers have identified major vulnerabilities and, encouragingly, report engagement from the automobile industry in acting to mitigate the documented risks.<br/><br/>Security and privacy practitioners and researchers have become comfortable working in the well-lit neighborhood of the standard computer system lamppost. However, the computing world will continue to change rapidly. We should focus more effort on the challenges of the next generations of embedded and interconnected systems.<br/><br/>This is my valedictory editor-in-chief message. I helped George Cybenko, Carl Landwehr, and Fred Schneider launch this magazine and have served as associate EIC ever since. In recent years, my primary work moved into other areas, and lately I have felt that I was gaining more than I was contributing. Thus, at the beginning of 2011, I suggested to EIC John Viega that I would like to step down as associate EIC and give him an opportunity to bring some fresh blood to the team. The two new associate EICs -- Shari Lawrence Pfleeger and Jeremy Epstein -- are both impressive experts and wonderful additions. 
The magazine, and the community it serves, are in excellent hands.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-52325237363546485782013-10-22T10:41:00.000-07:002021-04-14T11:17:07.693-07:00From the Editors: Privacy and the System Life Cycle[Originally published in the <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2011/02/msp2011020003.html" target="_blank">March/April 2011 issue (Volume 9 number 2) </a>of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>Engineering long-lived systems is hard, and adding privacy considerations to such systems makes the work harder.<br/><br/>Who may look at private data that I put online? Certainly I may look at it, plus any person I explicitly authorize. When may the online system's operators look at it? Certainly when customer service representatives are assisting me in resolving a problem, they might look at the data, though I would expect them to get my permission before doing so. I would also expect my permission to extend only for the duration of the support transaction and to cover just enough data elements to allow the problem's analysis and resolution.<br/><br/>When may developers responsible for the software's evolution and maintenance look at my data? Well, pretty much never. The exception is when they're called in during escalation of a customer service transaction. Yes, that's right: developers may not, in general, look at private data contained in the systems that they have written and continue to support. In practice, it's probably infeasible to make developer access impossible, but we should make it highly visible.<br/><br/>Doesn't the code have a role in this? Of course it does, but the code isn't generally created by the consumer and isn't private. 
Insofar as consumers create code—and they do when they write macros, filters, and configurations for the system—it's part of this analysis. The system life cycle and privacy implications of user-created code are beyond the current state of the art and merit significant attention in their own right.<br/><br/>So what happens when an online system is forced to migrate data from one version of the software to another version? This happens periodically in the evolution of most long-lived systems, and it often involves a change to the underlying data model. How do software engineers ensure that the migration is executed correctly? They may not spot-check the data, of course, because it's private. Instead, they build test datasets and run them through the migration system and carefully check the results. But experienced software engineers know very well that test datasets are generally way too clean and don't exercise the worst of the system. Remember, no system can ever be foolproof because fools are way too clever. So we must develop tests that let us verify that data migration has been executed properly without being able to examine the result and spot-check it by eye. Ouch.<br/><br/>What's the state of the art with respect to this topic? Our community has produced several documents that represent a start for dealing with private data in computer systems. By and large, these documents focus on foundational issues such as what is and isn't private data, how to notify consumers that private data will be gathered and held, requirements of laws and regulations governing private data, and protecting private data from unauthorized agents and uses.<br/><br/>Rules and regulations concerning privacy fall along a spectrum. At one end are regulations that attempt to specify behavior to a high level of detail. These rules are well intended, but it's sometimes unclear to engineers whether compliance is actually possible. 
At the other end are rules such as HIPAA (Health Insurance Portability and Accountability Act) that simply draw a bright line around a community of data users that comprise doctors, pharmacies, labs, insurers, and their agents and forbid any data flow across that line. HIPAA provides few restrictions on the handling or use of this data within that line. Of course, one irony with HIPAA is that the consumer is outside the line.<br/><br/>Given the current state of engineering systems for online privacy, regulations like HIPAA are probably better than heavy-handed attempts to rush solutions faster than the engineering community can figure out feasibility limits.<br/><br/>This is an important area of work, and some promising research is emerging, such as Craig Gentry's recent PhD thesis on homomorphic encryption ( http://crypto.stanford.edu/craig/craig-thesis.pdf), but full rescue looks to be years off. We welcome reports from practitioners and researchers on approaches to the problem of maintaining data that may not be examined.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-82865674755947564382013-10-22T06:51:00.000-07:002021-04-14T11:17:07.634-07:00From the Editors: Phagocytes in Cyberspace[Originally published in the <a title="Security & Privacy original" href="https://www.computer.org/csdl/mags/sp/2010/02/msp2010020003.html" target="_blank">March/April 2010 issue (Volume 8 number 2)</a> of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>Let us reflect on the evolution of malware as our industry has progressed during the 30-plus years since computers moved out of the mainframe datacenter cathedrals and into the personal computer bazaars. 
We might be moving back to cathedrals these days with the expansion of cloud computing, but the personal computer is here to stay in one form or another—whether it's desktop or laptop or PDA or smartphone, and whether it's a stand-alone system with fat client software or a network device with thinner clients.<br/><br/>In the early days of computing, malware was transmitted by infected floppy disks. Authors were amateurs, virulence was low, and the risk was relatively minor—mostly an inconvenience. Later, the computing universe got larger and more densely connected as PCs became cheaper and the Internet and the Web made distributing software cheaper and easier. The software industry in turn made the installation of software easier, accommodating the needs of non-hobbyist users who had little tolerance for technical complexity. Malware authors did likewise, though perhaps for different reasons.<br/><br/>If we look at the history of disease, we see similar changes as biological communities evolved. The higher population densities of towns and cities sped disease propagation. Adding injury to injury, sharing critical resources like water wells and food markets made propagation an easier problem for bacteria to solve, thus creating a challenge for us. The economic benefits of clustering in cities were in increasing tension with the hygiene problems that emerged from higher population density and the speedups in disease propagation that resulted.<br/><br/><strong>Malware Propagation</strong><br/><br/>Today we see a world in which malware has become a lucrative global industry, for both the offense and the defense. Organized criminals tend a complex interdependent ecosystem in which bot herders supervise vast arrays of zombie PCs. These herders pay malware distributors anywhere from a few cents to a dollar or more for each new machine infected.
These botnets are hired out by the hour through professionally designed and implemented websites that accept credit cards and offer online support. What does one do with hired botnet hours? Why, one distributes spam for a fee, or attacks the websites of small- and medium-sized businesses to support the income of a protection racket, or distributes malware to accumulate zombies for another botnet. Malware development is so lucrative that the producers have established companies complete with human resources departments and paintball outings for employees.<br/><br/>The number of zombie PCs is huge. Reliable numbers for total zombies aren't available, but McAfee claimed to have measured in the first quarter of 2009 an increase of 12 million IP addresses behaving like zombies. 0-day exploits are likewise growing in number. Signature-based antimalware software has fallen further and further behind the bad guys, who use tools that enable them to custom design malware by checking boxes on a GUI. The new malware is polymorphic, allowing hundreds or thousands of versions, each with a different signature for a single virus. Enterprising malware knows how to thwart defensive software. In the early days, it would simply halt the antivirus software. Later, it would uninstall the software. The best modern malware surreptitiously alters the defensive software to blind it to the malware, such as by tampering with the signature files, defeating its responses.<br/><br/>Grandma in Iowa might very well have a PC that's running zombie software from two different botnets, but she doesn't notice that her machine is infected or that it has participated in dozens of DDoS attacks and sourced thousands of pieces of spam. 
The bot software is pretty savvy these days—it lies low when grandma's using the machine and avoids contending for critical resources so as not to attract grandma's attention.<br/><br/><strong>Defensive Strategies</strong><br/><br/>Our industry continues to design and implement systems as if each will operate in a malware-free environment forever. A process running an application in a contemporary operating system trusts the services provided to it by the kernel. When developers build distributed systems that orchestrate several processes to cooperate in a larger task, the good ones might cross-authenticate to ensure that they're talking to the appropriate process, and the better ones might secure the traffic between nodes, but it's pretty rare for a process to verify that its correspondent is running the right software version, and almost unknown for the process to check on the operating system kernel and the services that it provides.<br/><br/>In the biological world, by contrast, virtually every organism survives with significant numbers of hostile bacteria and viruses in and around its body. Studies show hundreds of distinct bacterial species living on the skin of typical human subjects, and we know that the digestive tract is home to thousands of bacteria, many of which could cause lethal sickness if they got out of the gut and into more vulnerable parts of the body. Despite our intimate proximity to dangerous bio-malware, we are generally oblivious. The body keeps the bacteria and viruses in check.<br/><br/>The body has a sophisticated IFF (identify friend or foe) system that helps it distinguish between "thee" and "me" and attack the "thee." The odd bacterial or viral illness and even the occasional pandemic represent the exceptions that prove the rule. By and large, we survive as individuals and even thrive in the presence of some pretty bad stuff. Most of our body's defensive actions take place below the threshold of awareness.
Sometimes the basic defenses fail to keep the malware in check, so you develop a fever indicating that something, perhaps an infection, is amiss. If the defenses fail further, you have a funeral.<br/><br/>Maybe it's time for the good guys (that's us, if you aren't following along in the script) to reconsider our defensive strategies. The designers of the Kerberos authentication system explicitly assumed that the bad guys were going to be on the network and set themselves the task of designing an authentication system that didn't rely on the network's sterility. Of course, the Kerberos designers' conception of bad guys was limited to mischievous undergraduates, not organized criminal gangs, but the key insight was correct.<br/><br/>Can we further weaken the trust assumptions underlying our system designs? What would software look like if the applications didn't trust the file system, or if the file system didn't trust the operating system? We've made some progress on this front, with TPM (trusted platform module) hardware deployed in a number of industries, but we haven't yet established an adequate level of paranoia in system designers.<br/><br/>The bacteria and viruses that threaten our bodies evolved over time, whereas the malware that threatens our computers has been designed by clever software engineers. Our antimalware defenses don't adapt to their threat environment locally; at present, they depend on a small number of managers working at antivirus companies. The signature-based antimalware systems are increasingly challenged by scale and quality control problems. (A misbehaving antimalware system is sort of like an immune system under the influence of HIV—a threat that started life as a defensive system.)<br/><br/>Can we build defensive systems that analyze the behavior of malware and react by disabling it? 
Is there a graduated response mechanism that we can articulate that will allow our defenses to slow malware down while they study it and decide whether to shut it down? Would it be enough to cripple the malware and reduce its virulence?<br/><br/>Work is already under way with some of these assumptions at places like the University of New Mexico and Microsoft Research, but not nearly enough. We've clearly reached the end of the line with classical approaches and assumptions. Now is the time for radical thinking.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-7850823849239030732013-10-22T06:42:00.000-07:002021-04-14T11:17:07.575-07:00From the Editors: International Blues[Originally published in the <a title="Security & Privacy original" href="https://www.computer.org/csdl/mags/sp/2010/02/msp2010020003.html" target="_blank">March/April 2010 issue (Volume 8 number 2)</a> of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>IEEE Security & Privacy could be a lot more international in its focus and content. Reflecting on its content and tone over the past seven years, it's hard to tell that we think of either privacy or security in a broad international context. There are examples of taking a broader view, but they're more notable as exceptions than as standards. This is bad for several reasons. First, privacy and security have different levels of importance in different places in the world. Second, by largely ignoring the non-Western world, we risk dangerous blind spots. Third, we might be failing to take simple steps that would make our magazine more valuable worldwide.<br/><br/>Although the purely technical aspects of our work are universal and generic, engineering is all about making trade-offs informed by economic and cultural judgments. 
Moreover, our subject matter firmly straddles the boundary between technology and policy—something we deliberately set out to do when we created the magazine in 2002/2003. Policy topics are generally more complex and tend to vary across jurisdictions, not to mention industries and institutions. Let's begin to focus our attention on ensuring that our international relevance increases going forward.<br/><br/>We have seen far too few articles on the challenges of dealing with cybersecurity issues across jurisdictions. Definitions of criminal violations differ across the world—let's see some examples of issues raised by these distinctions. Cultural standards vary globally, leading to differences in attitudes toward security, privacy, and the role of security services.<br/><br/>Maybe we can't address generic technical questions yet, so perhaps we should be examining a range of case studies on how these subjects manifest themselves in different countries. After we've seen enough case studies, perhaps we'll be able to abstract away from the details and get our heads around a new set of important questions. How have these variations affected security systems' design and implementation and operational responses to incidents?<br/><br/>"Made in <insert country here>" has become meaningless as industries have globalized and the movement of physical and virtual goods has become ever easier, making accountability for product quality ever more diffuse—and assurance ever more difficult. Views of personal responsibility toward the community, the employer, the nation, and the world vary widely. An employer's power to enforce behavior on the part of its employees varies widely across the world, so a vendor might well intend to deliver a high-integrity product, only to be undermined by one or more employees whose cultural views don't require that they comply. 
One consequence of this is that products might have "features" that their operators never wanted, features that compromise the security and privacy guarantees that their operators seek to meet.<br/><br/>Can we begin a discussion of techniques for making networks robust in the face of components that are unreliable or even potentially hostile to our usage? Back in the 1980s, the MIT Project Athena folks argued that a security system's design should presuppose that the network is held by hostile adversaries. Maybe it's time to go back to that sort of design principle.<br/><br/>This topic isn't brand new. For example, the United Nations Commission on International Trade Law has been working on cross-border computer crimes, trying to harmonize international agreements on things like rules of evidence, law enforcement cooperation, and definition of crimes. Numerous other international groups are now or can be expected to soon begin working on these and related issues. Cybersecurity is an area in which the balance of power between attackers and defenders is tipping very strongly toward the attackers. This situation presents challenges both to law enforcement and to national security institutions across the world, something that our community should begin to consider and address. 
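The Project Athena design principle mentioned above—presuppose that the network is held by hostile adversaries—can be sketched as a toy challenge-response exchange in which the shared secret never crosses the wire. This is a simplified illustration under assumed conditions (a pre-shared key established out of band), not Kerberos itself:

```python
import hmac, hashlib, os

# Toy challenge-response under the "hostile network" assumption:
# an eavesdropper sees only a random challenge and a MAC over it;
# the shared secret itself is never transmitted.

secret = b"shared-key-established-out-of-band"  # hypothetical pre-shared key

def make_challenge() -> bytes:
    return os.urandom(16)  # a fresh nonce per exchange defeats replay

def respond(key: bytes, challenge: bytes) -> bytes:
    return hmac.new(key, challenge, hashlib.sha256).digest()

def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)  # constant-time comparison

challenge = make_challenge()           # verifier -> claimant, in the clear
response = respond(secret, challenge)  # claimant -> verifier, in the clear
print(verify(secret, challenge, response))
```

Nothing an adversary observes on the wire lets it answer the next challenge, which is the essence of designing for a network you do not trust.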
S&P has been a leader in discourse throughout its life, and we will adapt ourselves to this emerging trend to best serve our community.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-19379649618323535572013-10-18T04:27:00.000-07:002021-04-14T11:17:07.515-07:00From the Editors: New Models for Old[Originally published in the <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2009/04/msp2009040003.html" target="_blank">July/August 2009 issue (Volume 7 number 4)</a> of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>When faced with a new thing, human beings do something very sensible. They try to harness previous experience and intuition in service of the new thing. How is this new thing like something that I already know and understand?<br/><br/>Trying to model the new thing on some old thing can be efficient, making it easier to reason about the new thing by using analogies adopted from previous experience. The late Claude Shannon did this at least twice in his illustrious career.<br/><br/>The 1930s were an intense time for digital circuits, with engineers busily designing and building ever more complex machines out of electromechanical relays. Design principles for relay systems were vague and imprecise, with engineers employing rules of thumb and heuristics whose efficacy was limited.
The result was a world in which tremendous potential was hampered by a real lack of powerful tools for reasoning about the artifacts that engineers were creating.<br/><br/>In 1937, Shannon wrote his master's dissertation at MIT entitled, <a href="http://dx.doi.org/10.1109%2FT-AIEE.1938.5057767" target="_blank">"A Symbolic Analysis of Relay and Switching Circuits."</a> In this paper, which has been called "possibly the most important, and also the most famous, master's thesis of the [twentieth] century," he observed that if one limited the interconnection topology very slightly, one could prove that relay circuits obeyed the mathematical rules George Boole formalized in <a href="http://www.gutenberg.org/ebooks/15114" target="_blank">"An Investigation of the Laws of Thought"</a> in 1854. Suddenly, engineers had in their hands powerful tools to help them analyze designs, predict their performance, and determine whether the designs could be made smaller or simpler. It's because of this work that today we refer to digital circuitry as "logic."<br/><br/>If he had done no more in his career, Shannon would have been a major contributor, but he couldn't leave well enough alone. In 1948, he released <a href="http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf">"A Mathematical Theory of Communication,"</a> a paper that established the field of information theory. The basic concept introduced was that information could be modelled effectively using the mathematics of probability theory, particularly using the specific notations common to thermodynamics. The importance of the information theory work was so great that his earlier work on digital circuit theory has faded to comparative unimportance.<br/><br/>The ability to reuse a model when it fits, even if only approximately, is a powerful tool for speeding the adoption of new technologies. 
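Shannon's mapping from relay circuits to Boolean algebra can be illustrated with a minimal sketch (the function names here are purely illustrative): contacts wired in series conduct only when both conduct, computing AND; contacts wired in parallel compute OR; and a normally closed contact, opened by energizing its coil, computes NOT.

```python
# Toy illustration of Shannon's relay-to-Boolean mapping.

def series(a: bool, b: bool) -> bool:
    return a and b   # current flows only if both contacts conduct

def parallel(a: bool, b: bool) -> bool:
    return a or b    # current flows through either branch

def normally_closed(a: bool) -> bool:
    return not a     # energizing the coil opens the contact

# A small circuit: (x AND y) OR (NOT z), checked against Boole's rules
# by exhaustive enumeration of all eight input combinations.
for x in (False, True):
    for y in (False, True):
        for z in (False, True):
            circuit = parallel(series(x, y), normally_closed(z))
            assert circuit == ((x and y) or (not z))
```

Once a relay network is expressed this way, simplifying the circuit becomes an exercise in Boolean algebra rather than trial and error—the analytical power Shannon handed to the relay engineers.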
The desktop metaphor is credited with helping the Macintosh rapidly reach a user community that had previously found computing inaccessible, becoming the common metaphor across essentially all computing environments. Although the metaphor has its roots in the work of Douglas Engelbart and was refined at Xerox PARC, it's forever associated with the Macintosh.<br/><br/>Analogic and metaphoric reasoning doesn't always work, however. For each of the brilliant examples cited here, there's at least one counterexample in which such approaches fail.<br/><br/>Some years ago, I led a project at an investment bank to replace its use of microfiche with an online system. In designing the system, we referred to some SEC regulations governing the storage and retention of records by institutions such as ours. The regulations specified that only optical disks were permitted in these record retention systems. The provision of the regulation that gave the engineers working on the design effort the most entertainment was the requirement that they provide a facility for "projecting" images of the stored documents. It was clear from the rule's wording that the regulation's authors had a mental model in which an optical disk was very much like microfiche, containing very highly miniaturized photographic images of the documents stored there. In a microfiche system, a document is optically enlarged using what amounts to a slide projector. The intent of the regulation was obviously not that we provide a facility to project retrieved documents on a screen but rather that our system be able to display an essentially unaltered rendition of the original document, allowing investigators to see such documents as they were seen by the bank's staff when they were first used.<br/><br/>As the system's designers, we felt compelled to write an extensive interpretive document that extracted the original intent from the regulations and get the lawyers to sign off on that interpretation.
Then, we could ensure that each of those more appropriately posed requirements was met and document how that had been done. In this case, we'd inverted the overly specific regulation to get at the true underlying functional requirements. Of course, if the requirements had been written properly to start with, we could have avoided the time-consuming and expensive process of writing the interpretive document and getting it reviewed and approved by the compliance department. Moreover, we would have avoided the risk that the SEC might disagree with our interpretation and restatement of the requirements.<br/><br/>Why is this important? As technical professionals, we often bemoan the challenge of communicating technology's potential to laypeople and of their often painful errors in attempting to pierce the complexities and grasp the essential concepts and values on offer. This challenge is manifested in rules and regulations written to "fight the last war" and interpreted by auditors, reporters, and analysts who sometimes miss the essential point. Our frustration is that it's often these laypeople, rather than our technical leaders and visionaries, who establish public understanding of our contributions.<br/><br/>As an industry, we're now faced with a wide range of circumstances in which the security and privacy protection provisions of systems are specified in laws and regulations. For instance, we have regulations like SEC rules, HIPAA, and SOX that enshrine paper-based information storage and retrieval models in their security and control models. If you have a paper record, how do you ensure its immunity from destruction, theft, or alteration? Why, you put it in a room with thick walls and strong locked doors. You check the backgrounds of everyone requesting access to the room, including the executives and the janitors.
You implement careful processes to ensure that every transaction involving one of the documents is recorded in a log book somewhere.<br/><br/>Unfortunately, when you replace the file cabinets in the room with racks full of disks connected by networks, you discover that the thick walls are now as effective as a similar volume of air at securing the documents. But a literal audit might well give a clean bill of health to the roomful of disks. It's secured within a strong wall. The doors are locked. Everyone with access to the keys is known. A+.<br/><br/>What can we, the security and privacy technical community, do to improve things? Rules and regulations are unfortunately static documents that, in a dynamic technology world, will somehow always manage to find themselves out of date. We're in the midst of a huge society-wide change to move record keeping from paper systems to digital ones. In consequence, a vast number of existing rules can and should be rethought and revised. No better time than now, and no one better to do it than we.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-83291917396407476192013-10-18T01:26:00.000-07:002021-04-14T11:17:07.455-07:00From the Editors: Reading (with) the Enemy[Originally published in the <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2009/01/msp2009010003.html" target="_blank">January/February 2009 issue (Volume 7 number 1)</a> of IEEE <a title="Security & Privacy" href="http://www.computer.org/security" target="_blank">Security & Privacy</a> magazine.]<br/><br/>Back in the July/August 2006 issue of IEEE Security & Privacy, the editors of the Book Reviews department wrote an essay entitled, "<a title="Why We Won't Review Books by Hackers" href="http://www.computer.org/csdl/mags/sp/2006/04/j4009.html" target="_blank">Why We Won't Review Books by Hackers</a>."
They argued that to review such books would be to "tacitly endorse a convicted criminal who now wants to pass himself off as a consultant." We published two letters to the editor in the subsequent issue, and that was the end of the topic. Or so you thought.<br/><br/>In this issue, I argue that whether or not S&P reviews them, you should read the writings of bad guys, with the usual caveat that you should do so if they have something useful to say and are well written. This topic has been debated for many years, and the positions boil down to one of four basic arguments:<br/><ul><br/> <li>The writings of bad guys are morally tainted.</li><br/> <li>We should not reward bad guys for bad behavior.</li><br/> <li>The writings of bad guys provide "how to" information for the next generation of bad guys.</li><br/> <li>The writings of bad guys glamorize bad behavior and should be eschewed along with other attractive nuisances (to steal a term from the legal community).</li><br/></ul><br/>If the moral taint disqualification fails for Mein Kampf, then there's no reason we should let it stop us reading the works of lesser criminals. Fundamentally, any writing that gives the good guys an insight into the behavior of the bad guys is useful.<br/><br/>In the case of black hat computer adventurers, there's no legitimate employment, so a book's economic importance to the bad guy might be quite significant. On balance, however, this is a red herring. Negligibly few books are so popular that they change the fortunes of their authors. Most books have no more than modest success that, in the best case, produces a few hundred or perhaps a few thousand dollars for the author. This isn't enough to make a real behavioral difference. Moreover, if a book becomes incredibly successful, it's likely that the book's value to society outweighs the harm that comes from rewarding the bad guy. A more subtle argument is that bad guys write books to market their skills for later employment as security experts.
This argument is similarly bogus because it's really "moral taint" in disguise. Without getting into an imponderable debate on ethics, this argument comes down to the assertion that a bad guy can never be reformed and that skills learned from bad behavior should never be used for gain.<br/><br/>The third argument—that bad-guy writing passes evil skills on to future bad guys—falls apart similarly on deeper analysis. It reduces to the old security through obscurity chestnut, which our community has been at the forefront of rebutting. Besides, cybercrime is a fast-paced arms race, and most of last week's tools and techniques are ineffective and irrelevant this week. Of course, the more general techniques that bad guys use to develop attacks are as valuable to defenders as they are to attackers.<br/><br/>The last argument (about attractive nuisance) is an interesting one. The world of cybercriminal-authored books clearly breaks into two parts—those whose authors have been caught and convicted and those whose authors have not. All the bad-guy books I can think of have been written by convicted criminals. Books written by unconvicted criminals lack a certain—to put it delicately—credibility, wouldn't you say? After all, it's hard to believe that an uncaught and unconvicted bad guy would reveal all the vulnerabilities he knew. And if you want to trade time in jail and the permanent status of a convicted criminal for the dubious chance at fame that writing a true cybercrime book brings, then you probably already have severe problems.<br/><br/>Most fundamentally, however, the department editors noted that the book they were refusing to review was uninformative and badly written. This makes the book a waste of time by violating my rule that bad-guy books should be "useful and well written" to be worth reading.
So if you hear about a good book by a bad guy, by all means read it.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-35090710982298049362013-10-17T00:53:00.000-07:002021-04-14T11:17:07.394-07:00From the Editors: Cyberassault on Estonia[This editorial was published originally in "<a href="http://www.computer.org/portal/web/security/home">Security & Privacy</a>" <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2007/04/j4004.html" target="_blank">Volume 5 Number 4 July/August 2007</a>]<br/><br/>Estonia recently survived a massive distributed denial-of-service (DDoS) attack that came on the heels of the Estonian government's relocation of a statue commemorating Russia's 1940s wartime role. This action inflamed the feelings of the substantial Russian population in Estonia, as well as those of various elements in Russia itself.<br/>Purple prose then boiled over worldwide, with apocalyptic announcements that a "cyberwar" had been unleashed on the Estonians. Were the attacks initiated by hot-headed nationalists or by a nation state? Accusations and denials have flown, but no nation state has claimed authorship.<br/><br/>It's not really difficult to decide if this was cyberwarfare or simple criminality. Current concepts of war require people in uniforms or a public declaration. There's no evidence that such was the case. In addition, there's no reason to believe that national resources were required to mount the attack. Michael Lesk's piece on the Estonia attacks in this issue (see the Digital Protection department on p. 76) includes estimates that, at current botnet leasing prices, the entire attack could have been accomplished for US$100,000, a sum so small that any member of the upper middle class in Russia, or elsewhere, could have sponsored it.<br/><br/>Was there national agency?
It's highly doubtful that Russian President Vladimir Putin or anyone connected to him authorized the attacks. If any Russian leader had anything to say about the Estonians, it was more likely an intemperate outburst like Henry II's exclamation about Thomas Becket, "Will no one rid me of this troublesome priest?"<br/><br/>We can learn from this, however: security matters, even for trivial computers. A few tens of thousands of even fairly negligible PCs, when attached by broadband connections to the Internet and commanded in concert, can overwhelm all modestly configured systems—and most substantial ones.<br/><br/>Engineering personal systems so that they can't be turned into zombies is a task that requires real attention. In the meantime, the lack of quality-of-service facilities in our network infrastructure will leave them vulnerable to future botnet attacks. Several avenues are available to address the weaknesses in our current systems, and we should be exploring all of them. Faced with epidemic disease, financial panic, and other mass threats to the common good, we're jointly and severally at risk and have a definite and legitimate interest in seeing to it that the lower limits of good behavior aren't violated.<br/><br/>From the Estonia attacks, we've also learned that some national military institutions are, at present, hard-pressed to defend their countries' critical infrastructures and services. Historically, military responses to attacks have involved applying kinetic energy to the attacking forces or to the attackers' infrastructure. But when the attacking force is tens or hundreds of thousands of civilian PCs hijacked by criminals, what is the appropriate response? Defense is left to the operators of the services and of the infrastructure, with the military relegated to an advisory role—something that both civilians and military must find uncomfortable. 
Of course, given the murky situations involved in cyberwar, we'll probably never fully learn what the defense establishments could or did do.<br/><br/>Pundits have dismissed this incident, arguing that this is a cry of "wolf!" that should be ignored (see www.nytimes.com/2007/06/24/weekinreview/24schwartz.html). Although it's true that we're unlikely to be blinded to an invasion by the rebooting of our PCs, it's naïve to suggest that our vulnerability to Internet disruptions has passed its peak. Cyberwar attacks, as demonstrated in 2003 by Slammer, have the potential to disable key infrastructures. To ignore that danger is criminally naïve. Nevertheless, all is not lost.<br/><strong>Conclusion</strong><br/><br/>Events like this have been forecast for several years, and as of the latest reports, there were no surprises in this attack. The mobilization of global expertise to support Estonia's network defense was heartening and will probably be instructive to study. Planners of information defenses and drafters of future cyberdefense treaties should be contemplating these events very carefully. This wasn't the first such attack—and it won't be the last.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-46719437981683702172013-10-17T00:45:00.000-07:002021-04-14T11:17:07.334-07:00From the Editors: Insecurity through Obscurity[This editorial was published originally in "<a href="http://www.computer.org/portal/web/security/home">Security & Privacy</a>" <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2006/05/j5004.html" target="_blank">Volume 4 Number 5 September/October 2006</a>]<br/><br/>Settling on a design for a system of any sort involves finding a workable compromise among functionality, feasibility, and finance. Does it do enough of what the sponsor wants? Can it be implemented using understood and practical techniques? 
Is the projected cost reasonable when set against the anticipated revenue or savings?<br/>In the case of security projects, functionality is generally stated in terms of immunity or resistance to attacks that seek to exploit known vulnerabilities. The first step in deciding whether to fund a security project is to assess whether its benefits outweigh the costs. This is easy to state but hard to achieve.<br/><br/>What are the benefits? Some set of exploits will be thwarted. But how likely would they be to occur if we did nothing? And how likely will they be to occur if we implement the proposed remedy? What is the cost incurred per incident to repair the damage if we do nothing? Armed with the answers to these often unanswerable questions, we can get some sort of quantitative handle on the benefits of implementation in dollars-and-cents terms.<br/><br/>What are the costs? Specification, design, implementation, deployment, and operation of the solution represent the most visible costs. What about the efficiency penalty that stems from the increased operational complexity the solution imposes? This represents an opportunity cost in production that you might have achieved if you hadn't implemented the solution.<br/><br/>In the current world of security practice, it's far too common, when faced with vast unknowns about benefits, to fall back on one of two strategies: either spend extravagantly to protect against all possible threats or ignore threats too expensive to fix. Protection against all possible threats is an appropriate goal when securing nuclear weapons or similar assets for which failure is unacceptable, but for most other situations, a more pragmatic approach is indicated.<br/><br/>Unfortunately, as an industry, we're afflicted with a near complete lack of quantitative information about risks. 
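The quantitative handle described above amounts to an annualized loss expectancy (ALE) comparison: estimate expected losses with and without the remedy, and fund the project only if the expected reduction in loss exceeds its cost. A minimal sketch, with every figure hypothetical:

```python
# Hypothetical security cost-benefit sketch.
# ALE (annualized loss expectancy) = incidents per year * cost per incident.

incidents_per_year_no_action = 4.0   # assumed incident rate if we do nothing
incidents_per_year_with_fix = 0.5    # assumed residual rate after the remedy
cost_per_incident = 50_000.0         # assumed cleanup cost per incident, in dollars
annualized_project_cost = 120_000.0  # build, deploy, and operate the remedy

ale_no_action = incidents_per_year_no_action * cost_per_incident  # $200,000/yr
ale_with_fix = incidents_per_year_with_fix * cost_per_incident    # $25,000/yr

annual_benefit = ale_no_action - ale_with_fix          # $175,000/yr avoided loss
net_benefit = annual_benefit - annualized_project_cost # $55,000/yr

# Fund the project only if the expected benefit exceeds its cost.
print(net_benefit > 0)  # True for these hypothetical numbers
```

The arithmetic is trivial; the hard part, as the column argues, is that the incident rates and per-incident costs plugged in above are exactly the numbers our industry lacks.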
Most of the entities that experience attacks and deal with the resultant losses are commercial enterprises concerned with maintaining their reputation for care and caution. This leads them to the observation that disclosing factual data can assist their attackers and provoke anxiety in their clients. The lack of data-sharing arrangements has resulted in a near-complete absence of incident documentation standards; as such, even if organizations want to compare notes, they face a painful exercise in converting apples to oranges.<br/><br/>If our commercial entities have failed, is there a role for foundations or governments to act? Can we parse the problem into smaller pieces, solve them separately, and make progress that way? Other fields, notably medicine and public health, have addressed this issue more successfully than we have. What can we learn from their experiences? Doctors almost everywhere in the world are required to report the incidence of certain diseases and have been for many years. California's SB 1386, which requires disclosure of computer security breaches, is a fascinating first step, but it's just that—a first step. Has anyone looked closely at the public health incidence reporting standards and attempted to map them to the computer security domain? The US Federal Communications Commission (FCC) implemented telephone outage reporting requirements in 1991 after serious incidents and in 2004 increased their scope to include all the communications platforms it regulates. What did it learn from those efforts, and how can we apply them to our field?<br/><br/>The US Census Bureau, because it's required to share much of the data that it gathers, has developed a relatively mature practice in anonymizing data. What can we learn from the Census Bureau that we can apply to security incident data sharing? Who is working on this? 
Is there adequate funding?<br/><br/><strong>Conclusion</strong><br/><br/>These are all encouraging steps, but they're long in coming and limited in scope. Figuring out how to gather and share data might not be as glamorous as cracking a tough cipher or thwarting an exploit, but it does have great leverage.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0tag:blogger.com,1999:blog-3000067919243237480.post-11304642025469201962013-10-17T00:38:00.000-07:002021-04-14T11:17:07.271-07:00From the Editors: The Impending Debate[This editorial was published originally in "<a href="http://www.computer.org/portal/web/security/home">Security & Privacy</a>" <a title="Security & Privacy original" href="http://www.computer.org/csdl/mags/sp/2006/02/j2004.html" target="_blank">Volume 4 Number 2 March/April 2006</a>]<br/><br/>There's some scary stuff going on in the US right now. President Bush says that he has the authority to order, without a warrant, eavesdropping on telephone calls and emails from and to people who have been identified as terrorists. The question of whether the president has this authority will be resolved by a vigorous debate among the government's legislative, executive, and judicial branches, accompanied, if history is any guide, by copious quantities of impassioned rhetoric and perhaps even the rending of garments and tearing of hair. This is as it should be.<br/><br/>The president's assertion is not very far, in some ways, from Google's claims that although its Gmail product examines users' email for the purpose of presenting to them targeted advertisements, user privacy isn't violated because no natural person will examine your email. The ability of systems to mine vast troves of data for information has now arrived, but policy has necessarily lagged behind.
The clobbering of Darpa's Total Information Awareness initiative (now renamed Terrorism Information Awareness; http://searchsecurity.techtarget.com/sDefinition/0,,sid14_gci874056,00.html) in 2004 was a lost opportunity to explore these topics in a policy debate, an opportunity we may now regain. Eavesdropping policy conceived in an era when leaf-node monitoring was the only thing possible isn't necessarily the right one in this era of global terrorism. What the correct policy should be, however, requires deep thought and vigorous debate lest the law of unintended consequences take over.<br/><br/>Although our concerns in IEEE Security & Privacy are perhaps slightly less momentous, we are, by dint of our involvement with and expertise in the secure transmission and storage of information, particularly qualified to advise the participants in the political debate about the realities and the risks associated with specific assumptions such as what risks are presented by data mining. As individuals, we'll be called on to inform and advise both the senior policymakers who will engage in this battle and our friends and neighbors who will watch it and worry about the outcome. It behooves us to do two things to prepare for this role. One, we should take the time now to inform ourselves of the technical facts, and two, we should analyze the architectural options and their implications.<br/><br/>Unlike classical law enforcement wiretapping technology (covered in depth in S&P's November/December 2005 issue), which operates at the leaves of the communication interconnection tree, this surveillance involves operations at or close to the root. When monitoring information at the leaves, only information directed to the specific leaf node is subject to scrutiny. 
It's difficult when monitoring at the root to see only communications involving specific players—monitoring at the root necessarily involves filtering out the communications not being monitored, something that involves looking at them. When examining a vast amount of irrelevant information, we haven't yet demonstrated a clear ability to separate signal (terrorist communication, in this case) from noise (innocuous communication). By tracking down false leads, we waste expensive skilled labor, and might even taint innocent people with suspicion that could feed hysteria in some unfortunate future circumstance.<br/><br/>Who's involved in the process of examining communications and what are the possible and likely outcomes of engaging in this activity? The security and privacy community has historically developed scenario analysis techniques in which we hypothesize several actors, both well- and ill-intentioned, and contemplate their actions toward one another as if they were playing a game. Assume your adversary makes his best possible move. Now assume you make your best possible response. And so on. In the case of examining communications at the root, we have at least four actors to consider.<br/><br/>One is the innocent communicator whom we're trying to protect; another is the terrorist whom we're trying to thwart. The third is the legitimate authority working to protect the innocent from the terrorist, and the fourth, whom we ignore at our peril, is the corrupted authority who, for some unknown reason, is tempted to abuse the information available to him to the detriment of the innocent. We could choose, in recognition of the exigencies of a time of conflict, to reduce our vigilance toward the corrupted authority, but history has taught us that to ignore the concept puts us and our posterity in mortal peril.<br/><br/><strong>Conclusion</strong><br/><br/>Our community's challenge in the coming debate is to participate effectively, for we occupy two roles at once.
We are technical experts to whom participants turn for unbiased fact-based guidance and insight, and we are simultaneously concerned global citizens for whom this debate is meaningful and important. We must avoid the temptation to use our expertise to bias the debate, but we must also avoid being passive bystanders. We must engage thoughtfully and creatively. We owe this to our many countries, our colleagues, our neighbors, our friends, our families, and ourselves.nygeekhttp://www.blogger.com/profile/08058861127802416012noreply@blogger.com0