<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fred Ross</title>
	<atom:link href="http://madhadron.com/feed" rel="self" type="application/rss+xml" />
	<link>http://madhadron.com</link>
	<description></description>
	<lastBuildDate>Fri, 24 May 2013 00:46:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>An extremely short course on fractals</title>
		<link>http://madhadron.com/an-extremely-short-course-on-fractals</link>
		<comments>http://madhadron.com/an-extremely-short-course-on-fractals#comments</comments>
		<pubDate>Fri, 24 May 2013 00:46:49 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=327</guid>
		<description><![CDATA[[This was an email to a mailing list I'm on to provide background for another discussion.] Say you have a curve that you&#8217;re looking at under a microscope. You do your best to measure the length of the line given that your scope doesn&#8217;t have perfect resolution, so any tiny details get washed out and [...]]]></description>
			<content:encoded><![CDATA[<p><em>[This was an email to a mailing list I'm on to provide background for another discussion.]</em></p>
<p>Say you have a curve that you&#8217;re looking at under a microscope. You do your best to measure the length of the line given that your scope doesn&#8217;t have perfect resolution, so any tiny details get washed out and replaced by you going &#8220;well, it looks like a line&#8221;.</p>
<p>For a straight line, &#8212;-, as you increase the resolution of your scope, the length you measure doesn&#8217;t change. The same is true of a half circle, or most other such smooth things. Now suppose that when you double your resolution, what looked like a little bit of line turns out to be a squiggle, rather longer than the line it was blurred to, though with the same starting and ending points. And then for each part of the squiggle, when you double the resolution, each bit of line on it turns out to be a squiggle itself. And so on and so on.</p>
<p>If the length you measure grows in a regular fashion as you increase your resolution, so that the length measured when the smallest detail you can see has size r is proportional to r to the power (1 - alpha), then we say that alpha is the Hausdorff (or fractal) dimension. For a straight line, the length is always the same, and alpha=1. It&#8217;s a one dimensional object. There are ones like the Hilbert curve where alpha=2 (that is, if you take the infinitely fine version, it fills a whole square of two dimensional space), and things like the Koch curve, for which alpha is about 1.262. Wikipedia has a <a href="https://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension">nice list</a>, ordered by Hausdorff dimension.</p>
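<p>As a rough numerical check of the Koch curve figure, here is a minimal Python sketch. It assumes the standard construction, in which each refinement replaces every segment with four segments one third as long, and it computes the similarity dimension, which for this curve agrees with the Hausdorff dimension. The function names are mine, chosen for illustration.</p>
<pre><code>import math

def koch_length(level, base_length=1.0):
    """Length of the Koch curve after `level` refinements.

    Each refinement replaces every segment with 4 segments,
    each one third as long, so the length grows by 4/3.
    """
    return base_length * (4.0 / 3.0) ** level

def koch_dimension():
    """Similarity dimension: log(pieces) / log(shrink factor)."""
    return math.log(4) / math.log(3)

# Finer rulers (higher levels) measure an ever longer curve.
for level in range(5):
    ruler = 3.0 ** -level
    print(level, ruler, koch_length(level))

print(round(koch_dimension(), 3))  # 1.262
</code></pre>
<p>The measured length never settles down as the ruler shrinks, which is exactly the behavior a straight line does not have.</p>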
<p>Yet all of these things are lines. You could grab them and straighten them out into a straight line with Hausdorff dimension 1. The notion of dimension we&#8217;re used to is defined by what you can straighten something out into, not by the resolution game above. If I can straighten it out into a line, it&#8217;s one dimensional. If I can flatten it into a surface, it&#8217;s two dimensional. This notion of dimension we call topological dimension.</p>
<p>Any time the dimension from how length grows with resolution exceeds the dimension of what you can straighten it out into, you have something weird and spiky with infinitely small noise.</p>
<p>That&#8217;s all there is. They&#8217;re cute. But why does anyone care? Because they showed up as the shape of a bunch of weird things in dynamical systems that people didn&#8217;t realize existed until well into the 20th century.</p>
<p>So, a brief digression on dynamical systems: Say I have a recipe that takes each point in a space and maps it to another point. If I repeat the recipe again and again, the points start moving along paths, point -&gt; recipe applied to point -&gt; recipe applied to recipe applied to point.</p>
<p>One question to ask about the behavior is where the points go. Do all the points in some region stay in that region, that is, are there basins of attraction? Up until the mid 20th century, everyone thought that the only basins that weren&#8217;t really fragile mathematical artifacts were simple things with smooth boundaries with equal Hausdorff and topological dimension. Anything that wasn&#8217;t would just go away if you distorted the recipe slightly, so wasn&#8217;t important for modeling anything real.</p>
<p>Turns out that that&#8217;s not true. There&#8217;s a whole class of basins which behave in all kinds of new and strange ways that basically required a full rewrite of dynamical systems theory. Some of these systems and basins have boundaries which are fractals.</p>
<p>So the fractals aren&#8217;t the interesting part. The interesting part is the behavior and classification of dynamical systems. The fractals are just an easy part to see that everyone&#8217;s latched onto.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/an-extremely-short-course-on-fractals/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The problem with Silicon Valley&#8217;s libertarians</title>
		<link>http://madhadron.com/the-problem-with-silicon-valleys-libertarians</link>
		<comments>http://madhadron.com/the-problem-with-silicon-valleys-libertarians#comments</comments>
		<pubDate>Sun, 19 May 2013 16:44:48 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=322</guid>
		<description><![CDATA[I&#8217;m tired of hearing Silicon Valley techheads bitch and moan about laws and government&#8211;laws are slowing us down! Government is getting in the way of the advance of technology! Leaving aside the fact that computing and Silicon Valley were built almost single handedly by DARPA, it shows their ignorance of technology in general and computer [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m tired of hearing Silicon Valley techheads bitch and moan about laws and government&#8211;laws are slowing us down! Government is getting in the way of the advance of technology! Leaving aside the fact that computing and Silicon Valley were built almost single handedly by DARPA, it shows their ignorance of technology in general and computer science in particular.</p>
<p>Laws and government <em>are</em> technology. Our ancestors roaming the African savannah did not know how to organize and run a state of three hundred people, much less one of three hundred million. That technology has developed over the past few thousand years. Law is similar. How many names from early history are associated with a code of law, from Hammurabi to Justinian?</p>
<p>And even in the supposed specialty of Silicon Valley, computing, it&#8217;s soft headed. Computing is only incidentally about the digital computing machines we use today. The first computers were rooms full of women with adding machines, passing their results around in fixed patterns in order to perform complicated calculations. This is different only in scale from a government.</p>
<p>The clamor for no government by those enamored with software as a cure for all ills can&#8217;t be taken seriously. To a man, they&#8217;re all either sociopaths or incapable of systematic thought.</p>
<p>A sociopath wants everything going his way, and objects to anyone stopping him for any reason. There&#8217;s not much to be done with sociopaths besides kill them. But for those who aren&#8217;t, this obsession with avoiding compulsion isn&#8217;t reasoned. It seems more like a holdover from being beaten up and having their lunch money stolen on the playground.</p>
<p>If they really wanted to minimize the compulsion on people, they wouldn&#8217;t be asking for no government. Most will quickly backtrack and accept some government. After all, you have to have enough force available to compel those who aren&#8217;t playing nicely to behave. If Mark Zuckerberg decides that he wants me dead with no retaliation on himself, there has to be someone with even more teeth than he can hire. And that is the point of a government: we give it a monopoly on legitimate force in our society. After Mark Zuckerberg sends his death squads after me, there needs to be a government powerful enough to go pry him out of his mercenary guarded compound in Montana.</p>
<p>But why is that compulsion worse than the compulsions imposed on those suffering from obsessive compulsive disorder or schizophrenia by their disorders? These are severe compulsions, ones that don&#8217;t go away after someone has taken your lunch money. We are today capable of at least partially lifting these compulsions. Anyone who has allowed that enough government to protect them from playground bullies is good, but won&#8217;t also accept this, isn&#8217;t reasoning; they&#8217;ve just never gotten over having someone give them a wedgie on the playground.</p>
<p>And if such suffering caused by such continuous compulsions is to be prevented, what about forcing people to come to an environment which is both physically and socially unpleasant to do mindnumbing work, day in and day out, or face the threat of starvation, ruined lives, or possibly being allowed to die of a major medical condition? That seems like compulsion worth preventing to me.</p>
<p>So someone really interested in the freedom from compulsion, the liberty, of his fellow man, would agitate for honest and fair law enforcement, health care for all, and strong labor laws. That should sound familiar: it&#8217;s socialism.</p>
<p>That&#8217;s right, a libertarian capable of systematic thought is a socialist. Or a sociopath.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/the-problem-with-silicon-valleys-libertarians/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why we are afflicted with data science degrees</title>
		<link>http://madhadron.com/why-we-are-afflicted-with-data-science-degrees</link>
		<comments>http://madhadron.com/why-we-are-afflicted-with-data-science-degrees#comments</comments>
		<pubDate>Sun, 14 Apr 2013 01:14:33 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=316</guid>
		<description><![CDATA[A friend sent me an article about the masters and graduate certificate programs in data science springing up around the country. I think it was meant solely to stir me up. He knows me well. We&#8217;ll come back to the peculiar thing that is &#8220;data science&#8221; later. Let&#8217;s look at these programs first. They&#8217;re teaching [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.huffingtonpost.com/allen-frances/">friend</a> sent me an <a href="http://t.co/ewyfnTgEYK">article</a> about the masters and graduate certificate programs in data science springing up around the country. I think it was meant solely to stir me up. He knows me well.</p>
<p>We&#8217;ll come back to the peculiar thing that is &#8220;data science&#8221; later. Let&#8217;s look at these programs first. They&#8217;re teaching basic probability and descriptive statistics, how to design a study and analyze it, how to make decent plots, linear regression in its various forms, and enough understanding of programming to get some work done. Some add on some domain knowledge on business.</p>
<p>That&#8217;s excellent material, and the undergraduate students at University of Washington or Northwestern or Columbia or the various schools offering these programs should be screaming bloody murder, or at least demanding their tuition back. Those aren&#8217;t graduate topics! Those are the basics you should expect from anyone with a technical degree! Okay, if you hired someone who studied a basic science like physics or chemistry you might not expect the business knowledge, but an engineer had better have it. This isn&#8217;t a subject. It&#8217;s part of numeracy.</p>
<p>They are very much teachable to the undergraduates. A good hunk of the data science certificate gets taught to physics majors in one semester of their second or third year at University of Virginia as &#8220;Fundamentals of scientific computing&#8221;. I single out University of Virginia&#8217;s class as an example because I happened to be there when it started in 2005, and remember talking about what should be in it with Bob Hirosky, its creator. My friends were the teaching assistants.</p>
<p>And the topics in these certificates are the basics, not the advanced material. Not that there aren&#8217;t legions of professional analysts out there with less statistical skill and no knowledge of programming, but no one would dream of giving them a title other than &#8220;Excel grunt&#8221;—sure, gussied up somehow to stroke their ego, but that&#8217;s what it comes down to.</p>
<p>So, we have a failure of the academy. Nothing new there. The rise of data science itself is a peculiar one, though. It was reified into existence by a <a href="http://www.forbes.com/pictures/lmm45emkh/2-jeff-hammerbacher-chief-scientist-cloudera-and-dj-patil-entrepreneur-in-residence-greylock-ventures/">couple of guys</a> doing the dismal work of mathematically stalking people at Facebook and LinkedIn, though Google got in on the name game pretty quickly. Who can blame them? If your job is doing something as puerile as getting people to click on ads, I don&#8217;t begrudge you any vestige of self respect you may try to grasp. Yes, your life would be better spent getting your plumber&#8217;s certificate and doing something constructive, but I recognize that it&#8217;s hard to make big changes. But let&#8217;s not pretend that anyone would let you anywhere near the census or running a clinical trial.</p>
<p>Once data science was reified, the fight was on for who got to decide what it was. I know a few of the conceptions:</p>
<ul>
<li>The adsmen of Silicon Valley</li>
<li>People trying to repackage Taylor&#8217;s scientific management</li>
<li>Programmers with some machine learning training who found themselves analyzing data, but without knowing what had been done in statistics proper</li>
<li>Academics trying to teach something useful, who hope to use the label to make an end run around the lethargy and uselessness of the statistics department</li>
<li>Journalists looking for the next big thing to report on</li>
<li>Companies trying to sell tools for data analysis who had found you can&#8217;t make money selling data analysis tools to statisticians unless you&#8217;re SAS or IBM</li>
</ul>
<p>There are more. The full study of the factions and their interplay would make a very interesting sociology or history thesis.</p>
<p>But the fact that we&#8217;re arguing over this at all is a symptom of the failure of technical education today. That&#8217;s right: if you&#8217;re a professor in a technical department, it is <em>your fault</em> that these certificates exist. You have failed your students, and the world is paying a price in buzzwords.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/why-we-are-afflicted-with-data-science-degrees/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A criticism of Ruby</title>
		<link>http://madhadron.com/a-criticism-of-ruby</link>
		<comments>http://madhadron.com/a-criticism-of-ruby#comments</comments>
		<pubDate>Mon, 25 Feb 2013 19:15:18 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=307</guid>
		<description><![CDATA[Introduction This is a criticism of the Ruby language and its community. Some of the criticisms point out fundamental errors in the language design, or poor choices in what historical examples to follow. Others are about errors in the process of developing the language that have reduced its usability. I have intentionally avoided all criticisms [...]]]></description>
			<content:encoded><![CDATA[<h1>Introduction</h1>
<p>This is a criticism of the Ruby language and its community. Some of the criticisms point out fundamental errors in the language design, or poor choices in what historical examples to follow. Others are about errors in the process of developing the language that have reduced its usability.</p>
<p>I have intentionally avoided all criticisms based on my own familiarity with some particular language. This means I have had to lay out a clear statement of the criteria a given language construct is supposed to address in practice, with examples of how various languages have handled it, and then turn to how Ruby has failed in this area.</p>
<p>There are things about Ruby that I think are poor choices: using symbols like <code>@</code> and <code>$</code> instead of spelling out names, or Perlisms such as <code>$/</code> for newline separator. I can make cases for my tastes here, but I acknowledge these as questions of taste. I am concerned here with substantive errors in Ruby&#8217;s language design, often cases where an infelicitous combination of small choices led to a cascade of complexities.</p>
<p>This document is necessarily negative about Ruby. Every section culminates with a criticism of Ruby. I have used what I found frustrating about Ruby as a lens for examining issues of language design. If I had used C++ or PL/I as a lens, then this document would be a sequence of negative statements about those languages instead. I must admit that the choice of Ruby was no accident, and I felt a certain gleeful sadism in the dissection, perhaps in revenge for my own frustrations in programming in the language.</p>
<h1>Documentation</h1>
<p>The degree and form of documentation is nearly uniform within a programming language community. Look at the language, its standard library, and those libraries and tools in common use, and the organization, the amount of detail, the structure of the reference material, and the form in which the documentation is presented and distributed is nearly uniform. For those who work in a single community this may seem unremarkable, but for those of us who wander (more or less uncomfortably) among many communities, it is a cause for astonishment.</p>
<p>The <a href="#appendix">appendix</a> goes into detail about the documentation cultures of several programming communities, but there are some defining questions:</p>
<ul>
<li>How is expository and reference information organized? Is API documentation completely separate from tutorials, as in Java, or does every API reference begin with a short introduction and some examples of use, as in Python?</li>
<li>How closely intertwined are code and documentation? They may be completely separate, as in C, or one and the same, as in Knuth&#8217;s literate programming.</li>
<li>How is the documentation accessed? Is it from static manuals, even hard copy books, is it found by querying a running instance of the language runtime, or both?</li>
<li>How much detail is typical in the expository and reference information for the language? This can range from scanty vignettes as in BioConductor to Common Lisp&#8217;s closely defined standard.</li>
</ul>
<p>Each of these decisions has tradeoffs. If I use docstrings extracted by a running image, then a piece of code that defines a large number of functions on the fly, none of which have an explicit representation in the source code, can easily give them documentation accessible like any other function. On the other hand, it is easy to lose the docstring on an object when transforming it in the instance (as with Python&#8217;s decorators)—a problem that static comments a la Java do not have. Other criteria have some choices that are clearly superior. The lack of both complete exposition and complete reference documentation in BioConductor&#8217;s vignettes, for example, is inferior to Common Lisp&#8217;s standard by all criteria except saving the implementor the time and effort to write documentation.</p>
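<p>The loss of a docstring under a Python decorator is easy to reproduce. The following is a minimal sketch: the decorators <code>logged</code> and <code>logged_keeping_doc</code> and the functions they wrap are hypothetical names for illustration, while <code>functools.wraps</code> is the standard library&#8217;s usual remedy.</p>
<pre><code>import functools

def logged(func):
    """Naive decorator: the wrapper replaces func, docstring and all."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def logged_keeping_doc(func):
    """Same decorator, but functools.wraps copies the metadata across."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@logged
def area(r):
    """Return the area of a circle of radius r."""
    return 3.14159 * r * r

@logged_keeping_doc
def perimeter(r):
    """Return the perimeter of a circle of radius r."""
    return 2 * 3.14159 * r

print(area.__doc__)       # None: the docstring was lost
print(perimeter.__doc__)  # Return the perimeter of a circle of radius r.
</code></pre>
<p>The documentation only survives the transformation if the author of the decorator remembers to carry it along.</p>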
<p>What are Ruby&#8217;s characteristics in this area?</p>
<ul>
<li>Expository information is separate from reference information. Reference information is provided in HTML format. Expository information is scattered among books, blog posts, and tutorials.</li>
<li>The reference documentation is extracted from the source code, though there are at least three separate tools—RDoc, Tomdoc, and Yard, each with separate formatting conventions—for doing so.</li>
<li>The documentation is accessed via static manuals. Ruby does not use docstrings or provide documentation in the runtime.</li>
<li>The reference documentation is sketchy. It is typical for a Ruby programmer to refer to the source code to figure out the semantics of a function.</li>
</ul>
<p><a name="interfaces"></a></p>
<h1>Interfaces, protocols, and abstractions</h1>
<p>What are the methods on a socket object? Most programmers will immediately respond: read, write, and close. There may be more—whether the socket has data ready to read, is it closed already, and various others depending on the exact semantics a programmer learned for sockets—but those three will be universal. What methods will a stream have? Read and close at least. Such fixed sets of methods on a group of types we refer to as an interface.</p>
<p>All languages have interfaces. In Java, they are explicitly called interfaces and are distinct entities in the language:</p>
<pre><code>public interface Stream {
    public String read(int n);
    public void close();
}
</code></pre>
<p>Clojure and Haskell give them their own existence under the names &#8220;protocol&#8221; and &#8220;typeclass&#8221;, respectively. Common Lisp, Python, and (weirdly) C don&#8217;t give interfaces a language entity, but make heavy use of them in practice. Any object in Python with a <code>read</code> method and a <code>close</code> method, obeying some basic semantics, is usable as a stream, whether it is a file or a network socket.</p>
<p>In C, we can write a generic function over streams by passing, in addition to the stream to work with, a function to read from that stream and a function to close that stream. Such interfaces are essentially untyped, but common practice among good practitioners of the language, and are found even in the C standard library. For example, <code>qsort</code> (a quicksort function) has the signature</p>
<pre><code>void qsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *)); </code></pre>
<p>The first three arguments give a pointer to an array of memory, the number of elements in the array, and the number of bytes per element. The last argument is a function to compare two elements. The actual types and widths of the elements, and the types and widths the function expects to operate on, are completely unavailable to the compiler.</p>
<p>Interfaces extend beyond simple things like streams. Something as complicated as a SAX parser has an interface. There may be multiple SAX parsers in a language, with different tradeoffs of speed, memory use, ease of installation, etc., but there is no reason that they should not all share an API for SAX parsing. Python has codified this. Many elements of its standard library will have a version written in Python (for portability) and one written in C (for speed). The C version has its library name prefixed with a lowercase &#8216;c&#8217;, but the API within the libraries is identical. So whether you use <code>ElementTree</code> or <code>cElementTree</code>, your code should produce identical results, but it will run much faster with the latter.</p>
<p>The most obvious argument for strict interfaces is reducing how much a programmer must memorize. You avoid having distinct blocks of code to handle <em>this</em> kind of socket versus <em>that</em> kind of socket. But the real argument for interfaces is not what they save you from having to do, but what they make possible. For example, you can define a function that takes two streams, and returns a stream which concatenates them, or that returns a stream with all XML doctype definitions removed, or returns a stream that allows you to peek an arbitrary number of characters ahead into the stream it is transforming. These are all still streams. The glory of interfaces is not that they save you work, but that they make disparate types with common behavior into something that can be combined and transformed. Any one of these transformed streams can be used as an argument to any other of them. They have become an algebra. In the absence of an interface, none of these transformed streams are usable by existing code.</p>
<p>This is one of Ruby&#8217;s weaknesses, possibly because the documentation is sparse and it is impossible to satisfy an undocumented interface. A few (by no means comprehensive) examples:</p>
<ul>
<li><code>TcpSocket</code> and <code>SslSocket</code> do not name their read and write methods the same thing.</li>
<li>The SAX parsers in Rexml (which is pure Ruby) and Nokogiri (which is a binding to the C library <code>libxml2</code>) differ only cosmetically. Their constructors are slightly different, and the names of the event handling functions they expect are different, though they do precisely the same thing. There is no reason they could not have identical APIs.</li>
<li>The XML libraries, and everything else that uses a stream, don&#8217;t use any well defined interface, so writing stream transformers is an exercise in frustration: run, see what method the library tried to use that was missing, implement it, repeat.</li>
</ul>
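<p>To make the idea of streams as an algebra concrete, here is a minimal Python sketch of two of the stream transformers described earlier in this section: concatenation and peeking ahead. It assumes a stream is any object with <code>read(n)</code> and <code>close()</code> methods, where <code>read</code> returns an empty string at the end. The class names <code>ConcatStream</code> and <code>PeekStream</code> are hypothetical.</p>
<pre><code>import io

class ConcatStream:
    """A stream that reads from `first` until exhausted, then `second`."""
    def __init__(self, first, second):
        self._streams = [first, second]

    def read(self, n):
        if not n:
            return ""
        # Skip over exhausted streams until one yields data.
        while self._streams:
            data = self._streams[0].read(n)
            if data:
                return data
            self._streams.pop(0).close()
        return ""

    def close(self):
        for stream in self._streams:
            stream.close()
        self._streams = []

class PeekStream:
    """A stream that lets the caller look ahead without consuming."""
    def __init__(self, inner):
        self._inner = inner
        self._buffer = ""

    def peek(self, n):
        # Fill the buffer until it holds n characters or the source ends.
        missing = max(0, n - len(self._buffer))
        while missing:
            data = self._inner.read(missing)
            if not data:
                break
            self._buffer += data
            missing = max(0, n - len(self._buffer))
        return self._buffer[:n]

    def read(self, n):
        self.peek(n)
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data

    def close(self):
        self._inner.close()

# A transformed stream is still a stream, so transformers compose.
combined = PeekStream(ConcatStream(io.StringIO("abc"), io.StringIO("def")))
print(combined.peek(4))  # abcd
print(combined.read(6))  # abcdef
combined.close()
</code></pre>
<p>Neither class knows or cares what kind of stream it wraps. That only works because every participant agrees on the same two methods.</p>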
<h1>Namespaces, compilation units, and modules</h1>
<p>I know of three concepts in computer science for organizing large amounts of code. The three are orthogonal. That is, if we write out the algebraic properties satisfied by the operations on them, there are no properties relating operations of one to operations of another (though there are optimizations in compilation that can be made that interact among them).</p>
<p>The three concepts are:</p>
<ul>
<li><strong>Compilation units</strong> to control how much work a compiler must do to recompile a program when parts of it have changed.</li>
<li><strong>Namespaces</strong> to make the bindings of symbols predictable.</li>
<li><strong>Modules</strong> to define interfaces among components of the system.</li>
</ul>
<p>All languages in general use today provide at least a partial implementation of all three, at least by social convention if not by actual language support.</p>
<p>Many languages provide a notion of library, package, or assembly, but these can be seen as recursion in these three notions: a library as a namespace containing namespaces, versioning of packages as a compilation unit containing compilation units.</p>
<h2>Compilation units</h2>
<p>When recompiling a program, the simplest way is to compile all of the code from scratch, as if this were the first time it had been compiled.</p>
<p>In practice, this is often impractical. Some systems require hours or even days to build. In systems where the build time is long, the code going into the present build usually differs only slightly from the code that went into previous builds. To take advantage of this, we can divide a program into pieces and draw a directed, acyclic graph between the pieces, with one piece linked to another if its code, in the course of its life, will transfer control to a piece of code in the other. When we change code in one piece, we only need to recompile it and any pieces with a path to it in the graph. For systems with thousands or tens of thousands of pieces, this can be a remarkable speedup. These pieces are what we refer to as compilation units.</p>
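<p>The rule for what to rebuild can be sketched in a few lines of Python. This is an illustration rather than how any particular build tool works: the function <code>units_to_recompile</code>, the <code>depends_on</code> mapping, and the unit names are all hypothetical.</p>
<pre><code>def units_to_recompile(depends_on, changed):
    """Return every unit that must be rebuilt when `changed` units change.

    `depends_on` maps each unit to the units it transfers control to.
    A unit is rebuilt if it changed or has a path to a changed unit.
    """
    # Invert the edges: for each unit, which units depend on it directly?
    dependents = {}
    for unit, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(unit)

    # Walk outward from the changed units along the inverted edges.
    dirty = set()
    pending = list(changed)
    while pending:
        unit = pending.pop()
        if unit in dirty:
            continue
        dirty.add(unit)
        pending.extend(dependents.get(unit, ()))
    return dirty

graph = {
    "main": ["parser", "network"],
    "parser": ["strings"],
    "network": ["strings"],
    "strings": [],
    "logging": [],
}
print(sorted(units_to_recompile(graph, ["parser"])))
# ['main', 'parser']
print(sorted(units_to_recompile(graph, ["strings"])))
# ['main', 'network', 'parser', 'strings']
</code></pre>
<p>A change to <code>parser</code> leaves three of the five units untouched, which is where the speedup comes from.</p>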
<p>In most languages today, compilation units are some combination of files. In Python, the compilation unit is a single file. In C and C++, it is a source file and one or more headers.</p>
<p>For a compilation unit we expect to be able to compile it, to be able to measure if it has changed since the last compilation, and, for any pair of compilation units, whether changes to one will also require the other to be recompiled.</p>
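<p>One way to measure whether a compilation unit has changed is to compare a fingerprint of its source against the one recorded at the last build. A minimal Python sketch, assuming the source is available as a string; the function names are hypothetical, and real build tools may rely on file timestamps instead.</p>
<pre><code>import hashlib

def fingerprint(source_text):
    """Return a stable digest of a unit's source code."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

def has_changed(source_text, recorded):
    """True if the source no longer matches the recorded digest."""
    return fingerprint(source_text) != recorded

recorded = fingerprint("int f() { return 1; }")
print(has_changed("int f() { return 1; }", recorded))  # False
print(has_changed("int f() { return 2; }", recorded))  # True
</code></pre>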
<h2>Namespaces</h2>
<p>Say you decide to use a third party library when writing your program. You don&#8217;t want to worry about what every binding in that library is, and whether you are going to collide with it when making your own bindings. Further, you want to be able to write your code to be interpreted in the context of a known set of bindings. Yet you also want to be able to override existing bindings, or attach bindings from other libraries to your current context. Handling these cases leads us to namespaces.</p>
<p>A namespace encapsulates a set of bindings—functions, classes, constants, macros, or whatever other constructs the language allows a name to be assigned to—so they are not impacted by bindings in other namespaces. To make namespaces useful they must have what I call the &#8220;relocatability property&#8221;: if I move some code from one namespace to another, then attach that namespace, there should be no change in the behavior of the program.</p>
<p>In languages without explicit namespace support, such as C, Smalltalk, and Emacs Lisp, developers usually prefix their bindings with a library name. Every binding in GLib in C is prefixed with <code>g_</code>. All of <code>org-mode</code>&#8217;s bindings are prefixed with <code>org-</code>. If everyone adheres to this convention, leaving unprefixed symbols to the default language and its standard library, then there need be no namespace collisions.</p>
<p>Continuously prefixing everything gets awkward quickly, so languages with more explicit namespace support, such as C++ and Python, allow you to attach part or all of a namespace, possibly qualified or renamed in some systematic way, to another namespace. In Python, where namespaces are files (which are also compilation units and modules) you can write</p>
<pre><code>import something
something.f()

import something as new_name
new_name.f()

from something import f
f()

from something import *
f()
</code></pre>
<p>In C++, where namespaces are separate from compilation units, you can do the same thing:</p>
<pre><code>namespace something {
    void f() { … }
}

something::f();

namespace new_name = something;
new_name::f();

// A using-declaration brings in a single name.
using something::f;
f();

// A using-directive brings in the whole namespace.
using namespace something;
f();
</code></pre>
<p>There is another notion of namespace which is similar enough to be justifiably called a namespace, but different enough to be confusing: different syntactic usages of a symbol may refer to different bindings. That&#8217;s obscure, but a bit of obfuscated Java will make all clear:</p>
<pre><code>public class Main {
    public static class T&lt;T&gt; {
        public T T;
        public T(T value) {
            this.T = value;
        }
    }

    public static String T() {
        return "Hello, World!";
    }

    public static void main(String[] argv) {
        String value = T();
        T&lt;String&gt; T = new T&lt;String&gt;(value);
        System.out.println(T.T);
    }
}
</code></pre>
<p>Everything in sight is called <code>T</code>, but almost every <code>T</code> refers to something different. In Java, the same symbol can refer, based on its syntactic position, to</p>
<ul>
<li>a local variable</li>
<li>a function or method</li>
<li>a type or package</li>
<li>a generic type parameter</li>
</ul>
<p>The extreme cases of this kind of namespace are Common Lisp, where you can add your own namespaces of this kind to the language (and Common Lisp already has five or six of its own built in), and, at the other end of the spectrum, Scheme, which has one namespace for everything.</p>
<p>These are also namespaces, but the operations on namespaces that we define next make no sense on them.</p>
<p>Returning to our namespaces for encapsulating bindings in code, there is a clear set of operations to support. We must be able to import the bindings from one namespace into another, we must be able to map a namespace into another with all its bindings qualified, typically by a prefix, and we must be able to extract a subset of a namespace.</p>
<p>These operations don&#8217;t always map directly to a language&#8217;s primitive constructs. I have chosen them because they are easy to write as functions with clear algebraic laws relating them. In Python they correspond to:</p>
<pre><code># Attach all bindings in X to the current namespace.
from X import *

# Attach a subset {a, b, c} of the bindings in X
# to the current namespace.
from X import a, b, c

# Qualify namespace X with prefix 'X.' and attach it
# to the current namespace.
import X

# Qualify namespace X with prefix 'Y.' and attach it
# to the current namespace.
import X as Y
</code></pre>
<p>In C++, any namespace in scope is attached to the current namespace, qualified by the name under which it is in scope. The other operations correspond to:</p>
<pre><code>// Attach an in scope namespace X qualified by the prefix Y.
namespace Y = X;

// Attach namespace X to the current namespace.
using namespace X;

// Extract a subset {a, b, c} from the namespace X and attach
// its elements to the current namespace.
using X::a;
using X::b;
using X::c;
</code></pre>
<h2>Modules</h2>
<p>Modules, strictly speaking, are aspects of program design, not programming language design. A modular program is one made of distinct parts that can be reasoned about, manipulated, tested, and replaced without touching the rest of the program. It&#8217;s a fascinating problem, and the best advice I have yet seen on it is from David Parnas&#8217;s 1972 paper &#8220;On the Criteria To Be Used in Decomposing Systems into Modules&#8221;: &#8220;&#8230;one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing. To achieve an efficient implementation we must abandon the assumption that a module is one or more subroutines, and instead allow subroutines and programs to be assembled collections of code from various modules.&#8221;</p>
<p>That being said, a programming language and its community can have tools to declare and enforce modules once they have been designed. Since we are talking about tools to support design, there isn&#8217;t a clean, mathematical formulation here as there is for namespaces or compilation units. There are certain properties that have shown up in tooling to support modular design in various languages, and I cannot offer much more than an enumeration of those I have recognized:</p>
<h3>Visibility</h3>
<p>One of the simplest ways to enforce a module&#8217;s boundaries is to make its internals unreferencable from outside. For example, the method <code>getInput</code> in the Java class</p>
<pre><code>class ReaderModule {
    ...
    public Stream getInput() { ... }
    ...
}
</code></pre>
<p>could be reading from a file, a network stream, or generating random data without reference to the outside world. If there is no other information available than that its return type is a <code>Stream</code>, any other code using this class cannot depend on the module&#8217;s internals simply because there is no way to refer to them. Similarly, the internal functions to manipulate or examine data structures may be hidden. If the right bindings are hidden, it makes the values and behavior of a module inscrutable from the outside.</p>
<p>Most languages in use today have some constructs to control visibility, such as scoping of local variables, namespaces, and private/public declarations on class fields and methods. In C, top level bindings in a compilation unit that are declared <code>static</code> are visible only in that compilation unit. C++ inherits this ability, though in some circles it is eschewed in favor of anonymous namespaces, the contents of which are visible outside of the namespace in the same compilation unit, but not in other compilation units. In Python, any binding prefixed with an underscore is (by unenforced convention) private. Common Lisp also lets any binding be declared private, though it uses a distinct syntax to override the private declaration and access the binding instead of a naming convention.</p>
<h3>Parameterization</h3>
<p>It is a common pattern for one module to be parameterized over another. A stream transformer may be parameterized over a stream type. A queue may be parameterized over the type of its contents. So we might have functions on a stack in Haskell with the types</p>
<pre><code>push :: Stack a -&gt; a -&gt; Stack a
pop :: Stack a -&gt; a
empty :: Stack a -&gt; Bool
</code></pre>
<p>Looking at these functions, it is clear that the stack is parameterized over the type of its contents. A few languages allow that parameterization to be declared once and for all, as in SML, where the module declaration for the stack might be written</p>
<pre><code>signature STACK =
sig
    type 'a stack
    val push : 'a stack -&gt; 'a -&gt; 'a stack
    val pop : 'a stack -&gt; 'a
    val empty : 'a stack -&gt; bool
end
</code></pre>
<p>Other parts of the program refer to a parameterization of the module. Most of the common uses of parameterized modules and the manipulations of them available in SML are handled more simply with constructs like Haskell&#8217;s typeclasses, but the notion of declaring parameterizations at this level is worth knowing about.</p>
<h3>Contracts</h3>
<p>The simplest case of a contract is compile time type checking, as in the C function</p>
<pre><code>double square(double x) { ... } </code></pre>
<p>This function always takes a <code>double</code> and returns a <code>double</code>. The compiler can check that this is true in most cases (though not when pointers are involved, or the signature of <code>qsort</code>,</p>
<pre><code>void qsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *)); </code></pre>
<p>would be useless). Modern type systems have pushed this much further, until in recent languages like Agda, the compiler can assert the type of every expression in the program at compile time, without the gaps that C has around pointers, and the types can express details such as the lengths of lists or the dimensions of matrices. Actually, Agda&#8217;s type declarations are themselves a Turing complete language.</p>
<p>Compile time isn&#8217;t the only time to check assertions. For decades, the mathematics wasn&#8217;t in place to do very sophisticated contracts in the type system, so some languages, beginning with Eiffel, added run-time contracts. Here is an example of a contract in PLT Racket for an absolute value function:</p>
<pre><code>(-&gt; number?                                 ; Constraint on the input
    (and/c number? (or/c positive? zero?))) ; Constraint on the output
</code></pre>
<h2>How Ruby does it</h2>
<p>Ruby&#8217;s compilation unit is a single file. The language&#8217;s support for modular programming is restricted to providing the keywords <code>public</code>, <code>private</code>, and <code>protected</code> to control visibility of methods defined on modules and classes. We saw in the section on <a href="#interfaces">interfaces</a> that the libraries in the language make using visibility and parameterization to enforce module boundaries unnecessarily difficult.</p>
<p>The Ruby community uses a language construct called <code>module</code> for namespaces, as well as for mixins. Like C++, any Ruby <code>module</code> in scope is attached to the current namespace qualified by the module&#8217;s name. Ruby <code>module</code>s can be attached to other modules, and qualified by assigning them to a different variable, but they cannot be subsetted, nor do they have the relocatability property, since</p>
<pre><code>def f()
  puts "Hello"
end

def g()
  f()
end

g()
</code></pre>
<p>must be changed to</p>
<pre><code>module Something
  def self.f()
    puts "Hello"
  end

  def self.g()
    f()
  end

  g()
end
</code></pre>
<p>and</p>
<pre><code>def f()
  puts "Hello"
end

class A
  def g()
    f()
  end
end

def g()
  f()
end

A.new().g()
g()
</code></pre>
<p>cannot be put in a Ruby <code>module</code> at all.</p>
<h1>Multiplication of like things</h1>
<p>The phrase &#8220;orthogonal&#8221; is often bandied about in praise of programming languages, but what does it mean and why is it desirable? Consider pointers and references in C++. They are similar in that both are a way of passing parameters by reference, so</p>
<pre><code>void increment(int *n) { *n += 1; } </code></pre>
<p>and</p>
<pre><code>void increment(int &amp;n) { n += 1; } </code></pre>
<p>do exactly the same thing. They differ in that pointers may be assigned to point to new memory locations, may be incremented and decremented to shift the memory they refer to, and they must be dereferenced in order to access the values they refer to. References are used like local variables, and the memory they refer to is fixed at their creation. The semantics of pointers and references overlap, though they have their differences, so we say that they are not orthogonal.</p>
<p>Another example is the distinction between superclass and interface in Java. Both are used to provide polymorphism (any subclass of <code>A</code> can be used in a function that takes an argument of type <code>A</code>, and the same is true of interfaces). But superclasses may provide implementations of methods that their subclasses will inherit, while interfaces may only declare that an implementing class must define a given method with a given signature. Though the two concepts are not orthogonal, they let Java retain the simplicity of single inheritance (since inherited methods can only be inherited from the superclass), but interfaces also give it the polymorphism of multiple inheritance while avoiding its complexities (which are principally how to order calls to superclass methods).</p>
<p>Similarly, having both pointers and references in C++ is a tradeoff. C++ inherited pointers from C. The language was originally conceived of as a superset of C, so pointers had to stay. Yet pointers are a source of a disproportionate number of the errors in C programs. References fill one of the most common uses for pointers while avoiding all the errors that were possible with pointers, and so they were incorporated.</p>
<p>Now that we have established what it is, what makes orthogonality desirable? Simply that humans are good at memorizing how very distinct things work, but bad at keeping the details of similar things straight. No one confuses <code>for</code> loops and variable assignment, though both create a binding of a certain value to a name. They are as hard to confuse in your memory as a small Chinese woman with a giant black man.</p>
<p>Beyond that, nonorthogonal concepts are not intrinsically bad. C++ references and Java interfaces are both clever, useful solutions. Nonorthogonal concepts become a problem when they become a significant mental task for a programmer to disentangle. They can also be a symptom of problems that arose in the course of language design. Language designers don&#8217;t set out to incorporate nonorthogonal constructs in their language. Once the outline of the language is established, problems will rear their head in the details. It is resolving these details that leads to the addition of nonorthogonal concepts.</p>
<p>Ruby has accumulated a number of nonorthogonal concepts which significantly burden the programmer.</p>
<p>There are two methods to attach a module to the current context. <code>extend</code> adds the methods in a module to the current object; <code>include</code> adds them to whatever will be created by the object&#8217;s <code>new</code> method. So in a class, <code>include</code> adds a mixin&#8217;s methods as instance methods of the objects the class creates, while <code>extend</code> adds them as class methods. For a non-class object, <code>extend</code> adds methods to it, and <code>include</code> is not available at all.</p>
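<p>A minimal sketch of the difference, using a hypothetical mixin <code>Greeter</code> and two throwaway classes that are not part of any library. It is only meant to show where the mixin&#8217;s methods end up in each case:</p>

```ruby
# Hypothetical mixin, defined only for this illustration.
module Greeter
  def greet
    "hello from #{self.class}"
  end
end

class WithInclude
  include Greeter   # methods land on the instances this class creates
end

class WithExtend
  extend Greeter    # methods land on the class object itself
end

plain = Object.new
plain.extend(Greeter)   # methods land on this one object only

puts WithInclude.new.greet                 # hello from WithInclude
puts WithExtend.greet                      # hello from Class
puts plain.greet                           # hello from Object
puts WithInclude.respond_to?(:greet)       # false
puts WithExtend.new.respond_to?(:greet)    # false
```

<p>The same module behaves as a source of instance methods, class methods, or per-object methods depending on which of the two calls is used and on what kind of object receives it.</p>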
<p>Ruby has four notions that resemble a function: methods, blocks, procs, and lambdas. Methods are hunks of code that can be executed by sending messages to objects, and that terminate their execution and return to the message sender when the <code>return</code> statement is called. A block is a hunk of code, derived in analogy with Smalltalk, but unlike Smalltalk, where a block is a function and is the only form of function in the language, a block in Ruby is not usable directly. It has to be wrapped in a proc or a lambda, which differ in how they handle omitted arguments and how <code>return</code> behaves in them. Procs fill in default values of <code>nil</code> for omitted positional arguments (so if I call a three argument proc with two arguments, the third will be bound to <code>nil</code> in the body of the proc). Lambdas do not. The <code>return</code> statement of a proc returns from the next enclosing method. The <code>return</code> statement of a lambda returns from the lambda itself. And methods turn out to be a different type equivalent to lambdas plus names.</p>
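<p>The two differences between procs and lambdas described above can be seen in a few lines. The method names here are made up for the illustration:</p>

```ruby
# Procs pad omitted positional arguments with nil; lambdas check arity.
as_proc   = Proc.new { |a, b, c| [a, b, c] }
as_lambda = lambda   { |a, b, c| [a, b, c] }

p as_proc.call(1, 2)    # prints [1, 2, nil]

begin
  as_lambda.call(1, 2)
rescue ArgumentError => e
  puts "lambda refused: #{e.class}"   # lambda refused: ArgumentError
end

# return inside a proc leaves the enclosing method.
def return_inside_proc
  Proc.new { return :from_proc }.call
  :after_proc   # never reached
end

# return inside a lambda leaves only the lambda.
def return_inside_lambda
  lambda { return :from_lambda }.call
  :after_lambda
end

p return_inside_proc     # prints :from_proc
p return_inside_lambda   # prints :after_lambda
```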
<p>Ruby also provides <em>two</em> exception handling systems, identical except for their intended purpose. <code>raise</code>/<code>rescue</code> is meant for normal exception handling, but it has become a proverb not to use exception handling for control flow, since it is hard to understand and reason about code that does so. In Ruby this reason was apparently forgotten, but the letter of the proverb was obeyed: a separate system of <code>throw</code>/<code>catch</code> was created for control flow.</p>
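<p>For comparison, here is a short sketch of both mechanisms side by side. The helper methods <code>parse_positive</code>, <code>describe</code>, and <code>first_over</code> are hypothetical names invented for this example:</p>

```ruby
# raise/rescue: signalling and handling an error.
def parse_positive(text)
  n = Integer(text)   # raises ArgumentError on malformed input
  raise ArgumentError, "not positive: #{n}" unless n > 0
  n
end

def describe(text)
  "ok: #{parse_positive(text)}"
rescue ArgumentError => e
  "error: #{e.message}"
end

# throw/catch: a non-local exit out of nested iteration.
def first_over(rows, limit)
  catch(:found) do
    rows.each do |row|
      row.each { |x| throw :found, x if x > limit }
    end
    nil   # value of the catch block when nothing was thrown
  end
end

puts describe("7")                   # ok: 7
puts describe("-3")                  # error: not positive: -3
p first_over([[1, 2], [3, 4]], 2)    # prints 3
p first_over([[1, 2]], 5)            # prints nil
```

<p>Both unwind the stack to a handler chosen at runtime; the only structural difference is that one matches on an exception class and the other on a symbol.</p>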
<p>None of these justifies the mental burden they place on the programmer.</p>
<h1><em>MN</em> vs. <em>M+N</em> and what it did to the language</h1>
<p>Smalltalk was one of the first object oriented languages. It was built around the notion of objects which received messages and executed blocks of code in response. That is the underpinning of most object oriented languages to this day. However, there is a basic problem with it in practice: how do you write a polymorphic max function? That is, a function that takes two arguments of the same type and returns the larger of the two, ordered according to whatever ordering the type defines. In Smalltalk, you must define it on every class that you want it to have, which leads to an enormous amount of repeated code. This is true of any other algorithm you want to work on a given type, so for <em>M</em> algorithms and <em>N</em> types, you end up writing <em>MN</em> methods.</p>
<p>This has been solved in various ways. Common Lisp&#8217;s CLOS, and its descendants such as Dylan, removed messages and instead defined generic functions. Each generic function could have multiple implementations, and which implementation was used was chosen at runtime based on the types of the arguments passed to it. This led in turn to the key insight of Stepanov&#8217;s Standard Template Library: you can separate iteration strategies from algorithms. For each of the <em>N</em> types you implement its iteration strategy, and you implement each of the <em>M</em> algorithms in terms of that strategy. Result: you write <em>M+N</em> methods.</p>
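<p>The <em>M+N</em> idea can be sketched in Ruby itself, where each type supplies only an <code>each</code> method and algorithms are written once against it. This is an illustration of the principle, not of the STL; the classes <code>Countdown</code> and <code>Pair</code> and the function <code>largest</code> are hypothetical:</p>

```ruby
# Two container types; each supplies only its iteration strategy.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |i| yield i }
  end
end

class Pair
  include Enumerable

  def initialize(a, b)
    @a, @b = a, b
  end

  def each
    yield @a
    yield @b
  end
end

# One algorithm, written once in terms of each.
def largest(collection)
  best = nil
  collection.each { |x| best = x if best.nil? || x > best }
  best
end

p largest(Countdown.new(5))              # prints 5
p largest(Pair.new("a", "b"))            # prints "b"
p Countdown.new(3).map { |i| i * 2 }     # prints [6, 4, 2]
```

<p>Adding a third container costs one more <code>each</code>, and adding another algorithm costs one more function, rather than one method per pairing.</p>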
<p>The other major solution was to keep message passing and add multiple inheritance. This let programmers inherit from both the natural superclass of a class and also from a class carrying implementations of the various methods, though it adds its own complexities over how to order calls to superclass methods.</p>
<p>Ruby took the multiple inheritance route, but not openly. Instead it retained single inheritance from a superclass, introduced a new inheritance hierarchy of Ruby <code>module</code>s, and provided two separate mechanisms, <code>include</code> and <code>extend</code>, to have a class inherit from them. Now you must memorize how inheritance and Ruby&#8217;s two mixin expressions, <code>include</code> and <code>extend</code>, interact as well as how they order calls to superclass methods.</p>
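<p>A small sketch of the ordering that has to be memorized, assuming two made-up mixins <code>Loud</code> and <code>Polite</code> and a made-up class hierarchy. The module included last is searched first, ahead of the superclass:</p>

```ruby
# Hypothetical mixins; each wraps whatever comes next in the lookup chain.
module Loud
  def speak
    super.upcase
  end
end

module Polite
  def speak
    "#{super}, please"
  end
end

class Base
  def speak
    "hello"
  end
end

class Child < Base
  include Loud
  include Polite   # included last, so searched first
end

p Child.ancestors.first(4)   # prints [Child, Polite, Loud, Base]
puts Child.new.speak         # HELLO, please
```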
<h1>Tooling and reasoning</h1>
<p>Most programming communities have their expectations about what tools are necessary and what are frivolous, and there is no core that every community would agree on as necessary. Turbo Pascal programmers assumed that an integrated debugger was a basic tool of a programmer, but the term unit testing did not exist yet. Python programmers today regard integrated debuggers as a luxury, but a unit testing library as a necessity.</p>
<p>There are a number of language agnostic tools—version control systems, build systems, literate programming tools—but beyond that the tooling which can be straightforwardly built in a language depends on two things: how easy the language is to parse, and how easy code in the language is to reason about.</p>
<p>Parsing is an old and largely solved problem. We know how to define unambiguous grammars that are easy and fast to parse. ALGOL 60 already had a mature, precise specification of its grammar, and most of the imperative languages that followed it were at least straightforward to parse. C, for example, is harder to parse than Lisp, but not terribly onerous. C++, unfortunately, is a nightmare to parse.</p>
<p>For many languages, such as Lisp and recent versions of Python, the live instance will parse code for you and return an abstract syntax tree, further reducing this burden.</p>
<p>How easy a language is to reason about is roughly equivalent to how rich a set of program transformations it supports. Some of these transformations are program preserving, ranging from renaming a variable in the source code, to converting programs to continuation passing style and all the other tricks of writing optimizing compilers.</p>
<p>Others transform the program to other useful forms. Transform the code into a list of all the entities defined in the program and what regions of code they correspond to and you have the underpinnings of a code browser. Transform it to a simplified, decidable execution model where common errors can be detected without running the program and you have a static analysis tool. Alter the code to track what parts are run and what are not when exercised by a test suite and you have a code coverage tool. Mutate the code randomly when it is exercised by a test suite and you have a measure of how incisive the tests really are.</p>
<p>These transformations are also what a programmer does in his head when he reasons about code, so how easy code in a language is to reason about is equivalent to how easy it is to write tools in the language.</p>
<p>Ruby has no well defined grammar. All the Ruby implementations today reuse Matz&#8217;s original parsing code. There are various BNF grammars people have written for the language, but they may or may not match the actual implementation. Nor does Ruby provide a mechanism to turn code into an abstract syntax tree for you as Python and Lisp do. Anyone writing tools beyond a unit testing library must solve this problem first, before ever doing any real work.</p>
<p>Reasoning about Ruby is not much better, and the tools reflect this. There are a plethora of unit test libraries (all incompatible). Beyond that there is a single code coverage tool providing line coverage, but not branch or instruction coverage, and which will ignore large hunks of code if not configured perfectly, including having the order of <code>require</code> statements in a file be just right. There are a handful of static analysis tools which have the insight you would expect from the first prototype of <code>lint</code> written over a weekend in the 1970s. There is a debugger that may or may not skip the body of a loop when single stepping through, depending on how the loop is written, and may or may not crash, and usually fills the console of any frontend it is hooked to with garbage so that any console output of the program itself is obscured. And that&#8217;s it. The Ruby tool ecosystem.</p>
<h1>Summary</h1>
<p>The criticisms above are not matters of taste. They are errors in language design. Unrelocatable namespaces are an error. Introducing three separate methods for inheritance and a separate inheritance hierarchy when multiple inheritance was well understood before Ruby&#8217;s initial creation is an error. Documentation so sparse that its reader must turn to reading the source code instead is an error.</p>
<p>If Ruby were the only language in its niche, these errors might be tolerable, but it is not. So let me make my position on the language clear: Ruby is deprecated. Let it follow Perl into the dustbin of language history.</p>
<p><a name="appendix"></a></p>
<h1>Appendix: Documentation conventions</h1>
<h2>C programmers on Unix-like systems</h2>
<p>There are many communities of C programmers—C programmers in Microsoft&#8217;s ecosystem, in the Macintosh ecosystem, those who worked in Borland&#8217;s Turbo C, those who write for Unix-like systems—and these communities are distinct and have their own conventions for documentation. For those who work in the only community around their language of choice, such as PHP, or in the presence of slightly fragmented subcommunities, such as the scientific community centered around NumPy and SciPy in Python, this will seem strange, but a Windows C programmer would be quite lost on a Unix-like system and in the surrounding community, and vice versa. This section is concerned with the community of C programmers writing for Unix-like systems.</p>
<p>The community writing for Unix-like systems in C has memorized their core language, which is quite small, and almost never refer to a language reference, though such programmers usually have a copy of some C textbook, and one or more books on Unix-like systems (Stevens and Rago&#8217;s <em>Advanced Programming in the Unix Environment</em> or its ilk).</p>
<p>The community&#8217;s reference material is divided into <code>man</code> pages, which are read on text mode terminals. Each <code>man</code> page describes a small set of related functions, such as all the variants of <code>printf</code> or <code>fork</code>, and the pages are organized in a fixed way. The <code>man</code> page of <code>fork</code> is typical:</p>
<pre><code>FORK(2)                     BSD System Calls Manual                    FORK(2)

NAME
     fork -- create a new process

SYNOPSIS
     #include &lt;unistd.h&gt;

     pid_t
     fork(void);

DESCRIPTION
     Fork() causes creation of a new process.  The new process (child process)
     is an exact copy of the calling process (parent process) except for the
     following:

           o   The child process has a unique process ID.

           o   The child process has a different parent process ID (i.e., the
               process ID of the parent process).

           o   The child process has its own copy of the parent's descriptors.
               These descriptors reference the same underlying objects, so
               that, for instance, file pointers in file objects are shared
               between the child and the parent, so that an lseek(2) on a
               descriptor in the child process can affect a subsequent read or
               write by the parent.  This descriptor copying is also used by
               the shell to establish standard input and output for newly
               created processes as well as to set up pipes.

           o   The child processes resource utilizations are set to 0; see
               setrlimit(2).

RETURN VALUES
     Upon successful completion, fork() returns a value of 0 to the child
     process and returns the process ID of the child process to the parent
     process.  Otherwise, a value of -1 is returned to the parent process, no
     child process is created, and the global variable errno is set to
     indicate the error.

ERRORS
     Fork() will fail and no child process will be created if:

     [EAGAIN]           The system-imposed limit on the total number of
                        processes under execution would be exceeded.  This
                        limit is configuration-dependent.

     [EAGAIN]           The system-imposed limit MAXUPRC (&lt;sys/param.h&gt;) on the
                        total number of processes under execution by a single
                        user would be exceeded.

     [ENOMEM]           There is insufficient swap space for the new process.

LEGACY SYNOPSIS
     #include &lt;sys/types.h&gt;
     #include &lt;unistd.h&gt;

     The include file &lt;sys/types.h&gt; is necessary.

SEE ALSO
     execve(2), sigaction(2), wait(2), compat(5)

HISTORY
     A fork() function call appeared in Version 6 AT&amp;T UNIX.

CAVEATS
     There are limits to what you can do in the child process.  To be totally
     safe you should restrict yourself to only executing async-signal safe
     operations until such time as one of the exec functions is called.  All
     APIs, including global data symbols, in any framework or library should
     be assumed to be unsafe after a fork() unless explicitly documented to be
     safe or async-signal safe.  If you need to use these frameworks in the
     child process, you must exec.  In this situation it is reasonable to exec
     yourself.

4th Berkeley Distribution        June 4, 1993        4th Berkeley Distribution
</code></pre>
<p>A skilled C programmer in this community can find what he needs in these strictly formatted pages with great speed, and documents his own libraries in the same way. The Perl community—Perl began life as a normalization of the diverging scripting languages associated with shells across various Unix-like systems—inherited this tradition.</p>
<p>The exception to this pattern is the GNU project. Richard Stallman came from a Lisp background, which had a very different tradition, and brought that tradition with him. Thus GNU software in this community tends to have both <code>man</code> pages and the long form manuals more typical of Lisp.</p>
<p>Both the <code>man</code> pages and the GNU manuals are independent documents. They are usually kept in a separate directory from the code, and compiled for viewing with entirely separate tools. The correspondence between the man pages and the source code is maintained by hand. Comments in the code are to understand its workings. Its purpose and intended behavior are recorded in the (separate) documentation.</p>
<h2>Common Lisp</h2>
<p>Common Lisp is a language with a single community. Indeed, the language was a political compromise meant to unify a number of divergent Lisp communities with shared interests. The compromise defined the language in great detail, from the branch cuts of the numerical functions over complex values, to the interface to the debugger, to standard ways of controlling whether code was to be compiled or interpreted, and the Common Lisp community is nearly unique in that their standard is their primary reference documentation while they work. If their particular implementation does not match the standard, it is expected that the vendor will fix the implementation rather than the programmer work around it.</p>
<p>Third party libraries in Common Lisp have similar manuals. Vendors provide such manuals for their extensions. The culture thinks in terms of coherent, book-like documentation, in contrast to the <code>man</code> pages of the C community on Unix-like systems described above. Indeed, a few still use a hard copy of <em>Common Lisp, the Language, 2nd ed.</em> as their reference, though most of the community uses the HTML based <em>Common Lisp Hyperspec</em>.</p>
<p>The manuals in Common Lisp are entirely separate from the code, but the language defines &#8220;docstrings&#8221;: the first expression in a Common Lisp definition, if it is a string, will be taken by Common Lisp systems to be documentation for the definition. A docstring is accessible by asking a running Common Lisp system for the docstring of a definition loaded into it:</p>
<pre><code>(defun square (x)
  "Return the square of x"
  (* x x))

(documentation #'square 'function) ; Evaluates to "Return the square of x"
</code></pre>
<p>Docstrings in the Common Lisp community don&#8217;t have as fixed a structure as <code>man</code> pages, and often have a much narrower scope, since there tend to be separate manuals describing how to use the system.</p>
<h2>Python</h2>
<p>Python occupies a middle ground between the Common Lisp community and the C programmers described above. The core language and standard library are documented in a series of HTML pages similar to the <em>Common Lisp Hyperspec</em>. The documentation usually begins with enough exposition to understand the topic at hand, followed by reference documentation for the public functions and classes provided. Unlike Common Lisp, the documentation is specific to each version of Python.</p>
<p>Also like Common Lisp, Python has docstrings, which are again a string as the first expression of a definition, as in</p>
<pre><code>def square(x):
    "Return the square of x."
    return x*x

print(square.__doc__)  # Prints "Return the square of x."
</code></pre>
<p>Unlike in Common Lisp, the docstrings are not only accessible in the running Python instance, but are extracted into the HTML manuals, so there are conventions governing their form and how they will be formatted for extraction. The most basic is that the docstring should begin with a quick, one line description, followed by a more comprehensive one (this convention was inherited from the Lisp community via Emacs Lisp).</p>
<h2>Java</h2>
<p>Like Python, Java puts its documentation into its source and uses tools to extract it into manuals. Unlike Python, it uses comments prefacing definitions to document them, as in</p>
<pre><code>/**
 * Return the square of x.
 */
public static double square(double x) {
    return x*x;
}
</code></pre>
<p>This documentation is lost in compilation, so the extracted manual is the only reference. There is nothing equivalent to looking up a docstring in a running Python or Common Lisp instance. Like Python, there are strict conventions for organization and formatting the documentation.</p>
<p>Unlike Python, the manuals tend to be only API documentation, with very little exposition. Java libraries tend to have separately written tutorials to teach a programmer enough about the library that he can hopefully figure out whatever else he needs from the API reference.</p>
<h2>R</h2>
<p>R is an interactive language with a lineage going back to Bell Labs, the home of Unix, so it is no surprise that the documentation of its functions is nearly identical to <code>man</code> pages (though typeset in <code>Rd</code>, a language resembling LaTeX, instead of in <code>groff</code>), but accessed from within R itself. R itself also has a number of long form manuals introducing the language, specifying its grammar, and covering certain major areas such as importing data or writing extensions.</p>
<p>The general statistical community around R documents its libraries in the same way (man pages plus, sometimes, longform manuals covering particular areas), but there is a second, increasingly separate subcommunity centered in bioinformatics around the BioConductor libraries, which has a completely different documentation tradition. BioConductor eschews R&#8217;s online references and instead produces &#8220;vignettes&#8221; for its various packages. A vignette is a PDF file with a few paragraphs explaining the library, some annotated code examples to quickly get started, and some terse reference documentation of the most commonly used functions in the package.</p>
<h2>TeX</h2>
<p>Donald Knuth&#8217;s TeX language is meant for typesetting documents, so it is no surprise that it has an interesting documentation tradition. Indeed, I mention it to describe the logical conclusion of automatically extracting documentation from programs: literate programming. Knuth, who is concerned with producing code to be used and read for decades to come rather than years to come, proposed writing a document that happened to contain a program in it that a tool could extract to a compilable form, and this is exactly what he did with TeX.</p>
<p>The code need not be organized linearly in the document, nor kept together in any particular way. Blocks of it may be defined anywhere and glued together elsewhere. Here is an example taken from the <a href="http://www.cs.tufts.edu/~nr/noweb/examples/wc.html">port</a> of the <code>wc</code> command to the <code>noweb</code> literate programming system:</p>
<pre><code>Here, then, is an overview of the file &lt;tt&gt;wc.c&lt;/tt&gt; that is defined by the &lt;tt&gt;noweb&lt;/tt&gt; program &lt;tt&gt;wc.nw&lt;/tt&gt;:
&lt;&lt;*&gt;&gt;=
&lt;&lt;Header files to include&gt;&gt;
&lt;&lt;Definitions&gt;&gt;
&lt;&lt;Global variables&gt;&gt;
&lt;&lt;Functions&gt;&gt;
&lt;&lt;The main program&gt;&gt;
@
We must include the standard I/O definitions, since we want to send formatted output to [[stdout]] and [[stderr]].
&lt;&lt;Header files to include&gt;&gt;=
#include &lt;stdio.h&gt;
@
</code></pre>
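<p>To make the extraction step concrete, here is a minimal sketch in Python of how a tool might pull the program out of such a document. It is not how <code>noweb</code> itself is implemented; the function names <code>parse_chunks</code> and <code>tangle</code> and the tiny <code>EXAMPLE</code> document are my own assumptions, and it only handles references that sit alone on a line.</p>

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # a line that starts a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # a line that only references a chunk


def parse_chunks(source):
    """Collect code chunks by name; everything else is documentation."""
    chunks = {}
    current = None
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])  # repeated definitions append
        elif line == "@" or line.startswith("@ "):
            current = None                  # back to documentation text
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name="*", indent=""):
    """Expand the chunk called `name` recursively into a list of code lines."""
    lines = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            lines.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            lines.append(indent + line)
    return lines


# A hypothetical, much shortened document in the same style as the example above.
EXAMPLE = """\
Overview of the program.
<<*>>=
<<Header files to include>>
<<The main program>>
@
We need standard I/O.
<<Header files to include>>=
#include <stdio.h>
@
<<The main program>>=
int main(void) { return 0; }
@
"""

if __name__ == "__main__":
    print("\n".join(tangle(parse_chunks(EXAMPLE))))
```

<p>The point of the sketch is that chunks can be defined in any order in the document; the root chunk <code>*</code> decides the order in which they end up in the compilable file.</p>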
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-criticism-of-ruby/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Public comments considered harmful</title>
		<link>http://madhadron.com/public-comments-considered-harmful</link>
		<comments>http://madhadron.com/public-comments-considered-harmful#comments</comments>
		<pubDate>Tue, 29 Jan 2013 23:34:17 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=299</guid>
		<description><![CDATA[Around the time I left academia, I wrote a rant saying what I thought of bioinformatics. I send it around to my old national consortium in Switzerland, which was used to receiving my rants. My rant was well received. A number of my colleagues have been referring people to it over the last nine months [...]]]></description>
			<content:encoded><![CDATA[<p>Around the time I left academia, I wrote a <a href="http://madhadron.com/?p=263">rant</a> saying what I thought of bioinformatics. I sent it around to my old national consortium in Switzerland, which was used to receiving my rants. My rant was well received. A number of my colleagues have been referring people to it over the last nine months or so. How do I know? These are friends of mine and we chat.</p>
<p>Then, Friday morning, one of my pals from Switzerland messaged me, saying that someone had posted it to reddit&#8217;s <a href="http://www.reddit.com/r/bioinformatics/comments/179e9k/a_farewell_to_bioinformatics_since_i_am_about_to/">bioinformatics subgroup</a>, asking what people think about it. It was the first item for a couple days.</p>
<p>From there it went to <a href="http://news.ycombinator.com/item?id=5123022">Hacker News</a> and sat at the top of the front page for hours. <a href="http://www.biostars.org/p/62023/">Other</a> <a href="http://hubski.com/pub?id=65639">fora</a> picked it up. I got emails from their moderators asking me to come take part in the discussion, and I got lots of email from people who just wanted to talk to me directly.</p>
<p>If you look at the fora, you will see lots of negative comments. I didn&#8217;t get a single negative email. This struck me as strange, so I went back and counted up all the positive, negative, and neutral comments and emails.</p>
<table>
<tr>
<th>Sentiment</th>
<th>Email</th>
<th>reddit</th>
<th>Hacker News</th>
<th>Biostars</th>
<th>Hubski</th>
</tr>
<tr>
<th>Positive</th>
<td>20</td>
<td>1</td>
<td>21</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<th>Neutral</th>
<td>3</td>
<td>16</td>
<td>119</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<th>Negative</th>
<td>0</td>
<td>24</td>
<td>21</td>
<td>5</td>
<td>2</td>
</tr>
</table>
<p>Of the three neutral emails, two were from people inviting me to take part in discussions (which is how I found the discussions on Biostars and Hubski), and one contained only the line &#8220;sent from my iPad&#8221;. The neutral comments on forums were largely side discussion. Interestingly, the mixed or primarily programming fora (Hacker News and Hubski) had about equal numbers of positive and negative comments. The bioinformatics-specific fora (Biostars and the reddit subgroup) were very negative.</p>
<p>The negatives all followed certain themes. Many were ad hominems: &#8220;From looking into this guy a bit (who I&#8217;ve never heard of before today in my 10+ years in the field)&#8230;it does not appear that he completed his PhD after several years of work&#8221;; &#8220;Sounds like a fed up academic with a stick up his backside.&#8221; There were lots of implications that I didn&#8217;t know biology, didn&#8217;t know the state of programming in the non-academic world, didn&#8217;t know bioinformatics, etc.</p>
<p>A number were strawmen. One did try to claim I was wrong based on my own words (&#8220;There are only two computationally difficult problems in bioinformatics, sequence alignment and phylogenetic tree construction.&#8221;), but when asked for another, all he could offer was genome assembly, which is a special case of sequence alignment. One amusing strawman took umbrage with my use of the word &#8220;ept&#8221;, claiming it didn&#8217;t exist. Someone did eventually post the reference to the OED entry.</p>
<p>Others took issue with my tone, saying that it was unacceptable to address people this way, but there was no substantive criticism in public, and no criticism at all in private. That means these people weren&#8217;t concerned that I was wrong, or at least had no stomach to send me a criticism without some kind of public setting where they would be part of a group. Indeed, the original poster on reddit, who was worried as he was about to start a PhD in bioinformatics, didn&#8217;t receive an actual answer.<sup><a href="#footnote1">1</a></sup></p>
<p>There&#8217;s a name for this in circles that study human behavior: <a href="http://ymaa.com/articles/violence-dynamics">group monkey dance</a>. You should follow that link and read Rory&#8217;s article on it, and probably Rory&#8217;s books, too, but here&#8217;s a quick summary: human violence follows patterns. Most fist fights occur in the same way. Married couples will have the same arguments year after year. And social groups will turn on an outsider or perceived betrayer with a brutality that most of the group members would never display individually.</p>
<p>In this setting, a group monkey dance would have emotional outbursts against the transgressor (me), with repeated themes and short on rational argument, which is exactly what we find. Most of these people are folks I could sit down with and have a sensible conversation about bioinformatics. However, I attacked the group which they have made a part of their identity and triggered a group monkey dance.</p>
<p>What this whole debacle has taught me is that public comment fora encourage group monkey dances, and thus reduce the quality of the discourse on the Internet. For the moment I am setting a policy for myself: I am not participating in public, unmoderated fora. I encourage everyone else to do likewise.</p>
<hr />
<p><a name="footnote1"></a><sup>1</sup> I tried to send him something useful by private message, which I&#8217;ll reproduce here for anyone else who may be similarly disturbed:</p>
<blockquote><p>
Hi, I&#8217;m the author of the piece. A colleague of mine still in the<br />
field pointed out that someone had posted it to reddit. I have no<br />
intention of engaging with the comment thread, but I thought I&#8217;d drop<br />
you a private message.</p>
<p>If you notice, no one provided any substantive criticism of what I<br />
said, no refutation of my points. There were a few strawmen, a few ad<br />
hominems, but no one addressed my actual words. If they&#8217;re words that<br />
would make you want to not do a PhD, then you need to address that.<br />
Figure out what the parts are that unsettle you (aside from the tone,<br />
which was intentionally strident), and go independently find an<br />
answer for yourself. The exercise will, at the very least, give you a<br />
useful overview of some of biology. (As a similar exercise, try<br />
writing a history of the future of the field over the next 50 years.)</p>
<p>Whether you decide to do your PhD or not, this is useful. If it leads<br />
you to do something else&#8211;and you should plan on what you&#8217;re going to<br />
do when you leave academia, since the data says you will, like almost<br />
everyone else&#8211;fine. If it leads you to do your PhD, you&#8217;ll have a<br />
perspective that you can use to choose what you&#8217;ll specialize in, as<br />
opposed to randomly fall into it.</p>
<p>If you do the PhD, though, I warn you: use the perspective to cut<br />
areas out that don&#8217;t interest you, but choose based on the professor.<br />
Your advisor, whether you trust his scientific taste, his<br />
personality, and his skill as a mentor, should be almost the only<br />
criterion in your selection of your research in a PhD. Look at his<br />
students. Are they happy, healthy, making progress? Do they respect<br />
him? What about his former students?</p>
<p>Good luck to you either way.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/public-comments-considered-harmful/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A rant about &#8220;what programming language should I learn?&#8221;</title>
		<link>http://madhadron.com/a-rant-about-what-programming-language-should-i-learn</link>
		<comments>http://madhadron.com/a-rant-about-what-programming-language-should-i-learn#comments</comments>
		<pubDate>Sun, 04 Nov 2012 22:06:21 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>
		<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=278</guid>
		<description><![CDATA[Someone asked on a forum what language they should learn for bioinformatics. I admit it, I went on a rant. Here it is, for your enjoyment: You&#8217;re asking the wrong question. This is understandable, since the skill level of the bioinformatics community is so low that most of them ask the same wrong question. I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>Someone asked on a forum what language they should learn for bioinformatics. I admit it, I went on a rant. Here it is, for your enjoyment:</p>
<p>You&#8217;re asking the wrong question. This is understandable, since the skill level of the bioinformatics community is so low that most of them ask the same wrong question. I&#8217;ll even answer it: <a href="http://www.racket-lang.org/">PLT Racket</a> is the best source to learn to program today. But it&#8217;s still the wrong question.</p>
<p>Here&#8217;s the right question: &#8220;What do I need to learn to be able to effectively use the computer as a tool to do biology?&#8221;</p>
<p>Part of the answer will depend on what kind of science you&#8217;re trying to do, but some topics will be absolutely universal.</p>
<p>You need to learn a general purpose programming language. Here, I&#8217;ll teach you Scheme: (function argument argument argument &#8230;), and that form can go anywhere in each of those slots. For example, <tt>(+ 2 2), (+ (* 3 3) 1), ((if (> 2 3) + -) 1 1)</tt>, <tt>(define (square x) (* x x))</tt>, <tt>(square 4)</tt>. Congratulations. You can learn other languages when you need them. Languages come and languages go (well, except Common Lisp and FORTRAN), and you use what you want.</p>
<p>You need a basic knowledge of data structures and algorithms: big-O notation, singly and doubly linked lists, arrays, binary and n-ary trees, and hash tables. You need to know what a hash function is and why they work. You need to know the general operations for manipulating these data structures, and what they&#8217;re called in your language. You need to know how sorting works (though you needn&#8217;t implement it yourself) and searching on the various data structures. You need to know about the vagaries of floating point, and how to do basic root finding and minimization (Acton&#8217;s &#8216;Real Computing Made Real&#8217; is the best source I know of for this), and how to design and write these algorithms by hand. You must know how pseudorandom number generation works, and have a good generator on hand. The Mersenne Twister is the day-to-day state of the art at this point. You need to know how Monte Carlo methods work, and how to generate random data (a.k.a., simulation).</p>
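<p>As a small illustration of the last two points, here is a sketch of a Monte Carlo estimate in Python. The function name <code>estimate_pi</code> and the default seed are my own choices for the example; the only fact relied on is that Python&#8217;s standard <code>random</code> module is built on the Mersenne Twister.</p>

```python
import random


def estimate_pi(samples, seed=12345):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded generator, so the run is reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()  # a uniform point in the unit square
        if x * x + y * y <= 1.0:           # does it fall inside the quarter circle?
            inside += 1
    # The quarter circle has area pi/4 and the square has area 1.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi(100000))
```

<p>Seeding the generator explicitly is the design choice worth copying: a simulation you cannot rerun with the same random stream is very hard to debug.</p>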
<p>You need to know how data is represented in the computer. What are bytes and words? How are characters represented? What are the different kinds of integer representations and floating point representations? How are enumerations and symbols represented? How are more complicated data structures like structs laid out in memory? How are the representations laid out in binary file formats? (Hint: binary files are not black magic, they&#8217;re just more data as represented in memory). You need to know the difference between machine code and byte code, compilers and interpreters, and what the relative benefits of each are (note that compilers can be interactive and interpreters batch only &#8212; ignore any assertions to the contrary).</p>
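<p>To see that binary formats are only bytes laid out by convention, here is a short sketch using Python&#8217;s standard <code>struct</code> module. The &#8220;DEMO&#8221; format (a four byte tag, a 16-bit count, then 32-bit integers) is invented purely for illustration, as are the names <code>encode</code> and <code>decode</code>.</p>

```python
import struct

# The same 32-bit integer in little-endian and big-endian byte order.
little = struct.pack("<i", 1)
big = struct.pack(">i", 1)

# A 64-bit IEEE 754 double; 0.1 has no exact binary representation.
double_bytes = struct.pack(">d", 0.1)

HEADER = "<4sH"  # hypothetical layout: 4-byte tag, unsigned 16-bit record count


def encode(values):
    """Serialize a list of ints as header + little-endian 32-bit integers."""
    header = struct.pack(HEADER, b"DEMO", len(values))
    body = struct.pack("<%di" % len(values), *values)
    return header + body


def decode(blob):
    """Read back what encode() wrote."""
    tag, count = struct.unpack_from(HEADER, blob, 0)
    if tag != b"DEMO":
        raise ValueError("not a DEMO blob")
    offset = struct.calcsize(HEADER)  # the body starts right after the header
    return list(struct.unpack_from("<%di" % count, blob, offset))


if __name__ == "__main__":
    print(little, big, double_bytes.hex())
    print(decode(encode([1, -2, 3])))
```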
<p>You need to understand recursion and the design of loops via preconditions, postconditions, and loop invariants.</p>
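<p>A standard small example of this style is binary search. The sketch below is in Python rather than Scheme, and simply writes the precondition, invariant, and postcondition as comments next to the loop they govern.</p>

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent."""
    # Precondition: items is sorted in ascending order.
    lo, hi = 0, len(items)
    # Invariant: if target occurs in items, its index lies in [lo, hi).
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1  # everything at or before mid is too small
        elif items[mid] > target:
            hi = mid      # everything at or after mid is too large
        else:
            return mid
    # Postcondition: lo == hi, so the candidate range is empty.
    return -1


if __name__ == "__main__":
    print(binary_search([1, 3, 5, 7, 9], 7))
```

<p>Each branch either returns or shrinks the range while keeping the invariant true, which is the whole argument for both correctness and termination.</p>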
<p>You need to understand relational algebra and be able to manipulate relational databases (SQLite is a good place to start). You need to know what memoization is, and how to implement various forms of it. You need to know how to produce 2D graphics in a clean, composable way, such as recognizing that the data area of a chart represents a new set of coordinates that you&#8217;re transforming to. You need to be able to send and receive HTTP requests, that is, opening a port and sending and receiving messages according to a fixed protocol. You need to be able to write a parser for a file format that isn&#8217;t a bunch of hacked-together regular expressions (go look at Haskell&#8217;s Parsec &#8212; write one for your language). You should understand what Prolog is, how to write in it, and how to <a href="http://okmij.org/ftp/Scheme/sokuza-kanren.scm">implement</a> a simple one yourself.</p>
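<p>For the relational part, a sketch with Python&#8217;s built-in <code>sqlite3</code> module shows how little is needed to start. The two tables, their columns, and the numbers in them are made up for the example; the query is a join followed by a grouped aggregate.</p>

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, invented relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE expression (gene_id INTEGER REFERENCES gene(id),
                                 sample TEXT, level REAL);
        INSERT INTO gene VALUES (1, 'geneA'), (2, 'geneB');
        INSERT INTO expression VALUES
            (1, 's1', 2.0), (1, 's2', 4.0), (2, 's1', 10.0);
    """)
    return conn


def mean_levels(conn):
    """Join the two relations and average the level per gene."""
    return conn.execute("""
        SELECT gene.name, AVG(expression.level)
        FROM gene JOIN expression ON expression.gene_id = gene.id
        GROUP BY gene.name
        ORDER BY gene.name
    """).fetchall()


if __name__ == "__main__":
    print(mean_levels(build_demo_db()))
```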
<p>You need to be able to produce correct programs. This means knowing what each part of your program is supposed to produce for some cases, being able to check that easily (best is stating invariants that another program checks by generating increasingly huge random cases &#8212; see QuickCheck), and being able to reason your way to where the error is in your program rather than trying things at random.</p>
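<p>The idea behind QuickCheck fits in a few lines. What follows is a hand-rolled sketch in Python, not QuickCheck&#8217;s API: <code>check_property</code>, <code>random_int_list</code>, and the two properties are names I made up, and a real library would also shrink failing cases to minimal ones.</p>

```python
import random
from collections import Counter


def check_property(prop, generate, trials=200, seed=0):
    """Run prop on random cases of growing size; return a failing case or None."""
    rng = random.Random(seed)
    for size in range(trials):
        case = generate(rng, size)
        if not prop(case):
            return case
    return None


def random_int_list(rng, size):
    """A list of `size` random integers drawn from a small range."""
    return [rng.randint(-10, 10) for _ in range(size)]


def sort_is_ordered_permutation(xs):
    """Invariant for sorting: output is ordered and has the same items."""
    ys = sorted(xs)
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    return ordered and Counter(ys) == Counter(xs)


def has_no_duplicates(xs):
    """A deliberately false claim, to show a counterexample being found."""
    return len(set(xs)) == len(xs)


if __name__ == "__main__":
    print(check_property(sort_is_ordered_permutation, random_int_list))
    print(check_property(has_no_duplicates, random_int_list))
```

<p>The first check should find nothing, while the second is bound to fail once the lists grow past 21 elements, since only 21 distinct values are available.</p>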
<p>Oh, and learn a modern version control system: git or mercurial. If someone around you already uses one of those two, use what they&#8217;re using. Otherwise, flip a coin.</p>
<p>Those are the universals that will make the computer into a tool for you. Seem like a daunting list? It&#8217;s actually not nearly as bad as it looks, trust me. But what about <em>your science</em>? That&#8217;s the goal, remember: use the computer as a tool to do science. Not as a tool to move data from one file format to another (after you learn about representing data in the machine, you&#8217;ll understand that all file formats are arbitrary). Not as a tool for connecting to NCBI or EMBL or anywhere else. A tool to do science. Don&#8217;t lose sight of that fact. Most bioinformaticists spend between 90% and 100% of their time just messing with file formats. It&#8217;s not science.</p>
<p>Now, to recommend where you go next, you&#8217;ll need to talk about what kind of science you want to do.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-rant-about-what-programming-language-should-i-learn/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts from Strange Loop 2012</title>
		<link>http://madhadron.com/thoughts-from-strange-loop-2012</link>
		<comments>http://madhadron.com/thoughts-from-strange-loop-2012#comments</comments>
		<pubDate>Tue, 02 Oct 2012 18:48:31 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=287</guid>
		<description><![CDATA[I. I&#8217;ve talked with more people than I usually do in a month. It&#8217;s run me thin. II. The most visible lack among attendees is historical ignorance of our not-very-old field: rumblings among the NoSQL adherents during Stonebraker&#8217;s talk on VoltDB, not realizing that he has experience implementing databases comparable to the whole NoSQL community [...]]]></description>
			<content:encoded><![CDATA[<h2>I.</h2>
<p>I&#8217;ve talked with more people than I usually do in a month. It&#8217;s run me thin.</p>
<h2>II.</h2>
<p>The most visible lack among attendees is historical ignorance of our not-very-old field: rumblings among the NoSQL adherents during Stonebraker&#8217;s talk on VoltDB, not realizing that he has experience implementing databases comparable to the whole NoSQL community combined; Orenstein and Hernstadt reinventing the tree database with a relational adaption layer, not realizing that they were recapitulating something abandoned in the 1980&#8217;s, and looking surprised when part of the audience told them openly that ORMs are a terrible idea; the Bandicoot team during the emerging languages camp, reimplementing capabilities already in PostgreSQL, apparently driven by no more than a need for a C-like syntax.</p>
<p>Codd&#8217;s relational model was an upstart in a market of graph, object, tree, and various hybrids and other exotic databases. I do not claim that the relational model is superior because it survived. Most entrenched artifacts of computing are fixed by social factors, not technical ones. The relational model is superior because it described the semantics a data store needed to specify for most real world use; made the translation of declarative queries into operations a finite, approachable problem; and unified all the expressive power of the other kinds of database in a single mathematical model. SQL, the language, was the result of a natural language project at IBM. It is not intrinsic. The underlying model, as Codd defined it, <em>is</em> intrinsic.</p>
<p>Puzzlingly, while we regard a programmer ignorant of the difference between an array and a linked list as incompetent, one ignorant of the relational model can bluster his way into respect from his colleagues.</p>
<h2>III.</h2>
<p>When I was a teenager, I suffered from an overabundance of generalizations. Years of experience have beaten them out of me. I still look back on them with some chagrin. I&#8217;m not unusual. So why do the young spout generalizations? They imitate what they perceive their elders doing.</p>
<p>The problem is the perception. Their elders have either become snake oil salesmen or are chagrined at their own youth and focused on technical material. However, a mind with ten additional years of training and ten additional years of experience can make much more general statements while remaining, in its own conception, precise, technical, and on firm ground. Ten years, twenty years, thirty years, and at some point the generalizations start to seem like magic to those starting out. They are perceived as generalizations, though they are precise statements.</p>
<p>My advice to the young: present hard, technical material that you understand intimately. My advice to the old is the same.</p>
<h2>IV.</h2>
<p>Rich Hickey is engaged in the semantic terraforming of the JVM. The language has fallen beneath his juggernaut, and he has rolled on to the database. Go Rich!</p>
<h2>V.</h2>
<p>Spend half an hour with Elm for the sheer pleasure of seeing a functional reactive programming environment in the wild. Spend half an hour with Julia and mourn for the pain of physicists and engineers who have suffered MATLAB and its kin down through the years.</p>
<h2>VI.</h2>
<p>What do you mean you don&#8217;t know about unification? Go learn to write a Prolog!</p>
<h2>VII.</h2>
<p>Please stop recommending anything besides PLT Racket and &#8216;How to Design Programs&#8217; for new programmers. Just because you wasted time when you were learning doesn&#8217;t mean everyone else has to. And if you prototype new languages in anything else, you&#8217;re a fool.</p>
<p>Oddly, it&#8217;s also the easiest environment to create stand alone, cross platform GUI applications in that&#8217;s available today.</p>
<h2>VIII.</h2>
<p>The 800lb gorilla in the room: JavaScript. We all know that it has to die. It speaks to our ignorance of the social processes governing our field that we have no idea how to go about killing it.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/thoughts-from-strange-loop-2012/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A better model of programmer temperament</title>
		<link>http://madhadron.com/a-better-model-of-programmer-temperment</link>
		<comments>http://madhadron.com/a-better-model-of-programmer-temperment#comments</comments>
		<pubDate>Wed, 22 Aug 2012 23:31:32 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=280</guid>
		<description><![CDATA[Steve Yegge stirred up a bunch of controversy recently with a simple model of programmer attitudes: programming has a political axis ranging from liberal to conservative. He declares static typing to be conservative, dynamic typing to be liberal, and a bunch of other such things. Of course, the model is wrong. To paraphrase Box, &#8220;all [...]]]></description>
			<content:encoded><![CDATA[<p>Steve Yegge stirred up a bunch of controversy recently with a <a href="https://plus.google.com/u/0/110981030061712822816/posts/KaSKeg4vQtz">simple model of programmer attitudes</a>: programming has a political axis ranging from liberal to conservative. He declares static typing to be conservative, dynamic typing to be liberal, and a bunch of other such things.</p>
<p>Of course, the model is wrong. To paraphrase Box, &#8220;all models are wrong; some are useful&#8221;. This one is useful in a limited way: it makes explicit the vast gaps between how different programmers think about programming. The observations that led him to this model are extremely valuable.</p>
<p>But given those observations, I think we can have a much more useful model, one that has both better descriptive power and eschews the loaded terms &#8220;conservative&#8221; and &#8220;liberal&#8221;.</p>
<p><b>Digression</b>: Don&#8217;t build models in analogy to political spectra. The notion of a political spectrum is actually quite recent, dating from the French Revolution. In the meeting of the estates general, the third estate sat to the left of the entrance; the first and second estates sat to the right. Thus we have the political left and the political right. Prior to that, the closest thing to a spectrum we had was the <a href="https://en.wikipedia.org/wiki/Guelphs_and_Ghibellines">Guelphs versus the Ghibellines</a> in Renaissance Italy. The political spectrum in France led to one side killing everyone in the other side. No use of it or analogy to it since then has been noticeably more productive. Using it is a good heuristic that your model could be profitably reformulated. <b>End Digression.</b></p>
<p>I&#8217;m going to shamelessly steal Scott McCloud&#8217;s <a href="http://scottmccloud.com/4-inventions/triangle/index.html">Big Triangle</a> model. You should go read his presentation. It&#8217;s a brilliant piece of work and will forever change how you think about graphic arts. Then go buy his book <em>Understanding Comics</em>. Trust me. It&#8217;s worth every penny.</p>
<p>The model space is, appropriately, a big triangle:</p>
<p><center><img src="http://madhadron.com/wp-content/uploads/2012/08/intro-triangle1.png" alt="" title="intro-triangle" width="411" height="206" class="aligncenter size-full wp-image-284" /></center></p>
<p>The programmers who fall along the left side of the triangle think in terms of specification. They are writing instructions in a language with semantics. Those instructions result in actions when executed by something that obeys those semantics.</p>
<p>Programmers along the right side think in terms of behavior. There is some real system with which they interact to cause certain behaviors. They have a mental model of the system which guides them, but in the end the units of their thought and action are behavioral actions.</p>
<p>Languages are designed to support certain areas of the triangle. Haskell lies not too far from the specificational side, and theorem provers like Coq and the dependently typed languages like Cayenne approach it. Assembly is on the behavioral side by definition (unless you&#8217;re an electrical engineer, in which case it&#8217;s pretty far towards specification).</p>
<p>However, there are very different levels within this. Lambda calculus is an assembly language for a very particular kind of machine, but placing it near the machine instructions for an Intel processor is somewhat ludicrous. This leads to our vertical axis: the farther towards the upper point someone lies, the more abstractly mathematical their thought about computing.</p>
<p>This may be better phrased as how much they regard entities in the machine as immutable artifacts. A non-programmer, on the bottom side of the triangle, must accept the specified interaction modes of his machine as artifacts which he cannot change. Systems administrators live a few steps up, still primarily in the world of artifacts. Most programmers live in a world where some things are artifacts and some aren&#8217;t. For many, languages are artifacts they regard as immutable, but go talk to a compiler specialist or a smug Lisp weeny, and languages become mutable thoughtstuff. Go farther up and the machine, and even what logic you work in, become mutable things at the mercy of the programmer&#8217;s mind.</p>
<p>And now we come to why it&#8217;s a triangle. The gap between a specificational systems administrator and a behavioral one is vast. A specificational one will work entirely through Chef and Puppet, Fabric to bring up machines, a whole layer of programs meant to provide a semantics in which he can specify his systems. A behavioral one logs into each machine and runs shell commands on it, possibly collecting his commands into scripts. At the top of the triangle, a computational theorist working on higher order logic has no difference between specification and behavior. The question makes no sense for him.</p>
<p>So to summarize, </p>
<p><center><img src="http://madhadron.com/wp-content/uploads/2012/08/intro-triangle-11.png" alt="" title="intro-triangle-1" width="411" height="206" class="aligncenter size-full wp-image-283" /></center></p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-better-model-of-programmer-temperment/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculating pagination bounds for display</title>
		<link>http://madhadron.com/calculating-pagination-bounds-for-display</link>
		<comments>http://madhadron.com/calculating-pagination-bounds-for-display#comments</comments>
		<pubDate>Wed, 15 Aug 2012 22:34:15 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=267</guid>
		<description><![CDATA[tl;dr: How do you calculate the limits for a pagination widget like &#8220;« 4 5 6 7 »&#8221;? Number the pages starting from 0. Then the first page to display startPage and one page after the last page to display endPage (4 and 8 in the example above) are given by $startPage = [ (currentPage [...]]]></description>
			<content:encoded><![CDATA[<p><b>tl;dr</b>: How do you calculate the limits for a pagination widget like &#8220;« 4 5 6 7 »&#8221;? Number the pages starting from 0. Then the first page to display <i>startPage</i> and one page after the last page to display <i>endPage</i> (4 and 8 in the example above) are given by</p>
<blockquote><p>
$startPage = [ (currentPage - \lceil window/2 \rceil + 1) \  \downarrow \  (Npages - window) ] \uparrow 0$<br />
$endPage = [ (currentPage + \lfloor window/2 \rfloor + 1) \  \uparrow \  window] \downarrow Npages$
</p></blockquote>
<p>where</p>
<ul>
<li><i>currentPage</i> is the currently displayed page.</li>
<li><i>window</i> is the number of page numbers to display in the widget.</li>
<li><i>Npages</i> is the total number of pages.</li>
<li>$\lceil \cdot \rceil$ is the ceiling function.</li>
<li>$\lfloor \cdot \rfloor$ is the floor function.</li>
<li>$a \uparrow b$ is the maximum of $a$ and $b$.</li>
<li>$a \downarrow b$ is the minimum of $a$ and $b$.</li>
</ul>
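<p>In code, the two bounds might look like the following Python sketch. The function name <code>pagination_bounds</code> is my own, and I am assuming the start is clamped from below with a maximum against 0 (since page numbers cannot be negative) and from above with a minimum against $Npages - window$.</p>

```python
def pagination_bounds(current_page, window, n_pages):
    """Return (start_page, end_page) as a half-open range of page numbers."""
    half_up = -(-window // 2)  # ceiling of window / 2, in integer arithmetic
    half_down = window // 2    # floor of window / 2
    # Center on the current page, then keep the start inside [0, n_pages - window].
    start = max(min(current_page - half_up + 1, n_pages - window), 0)
    # Make the range at least `window` wide, but never run past the last page.
    end = min(max(current_page + half_down + 1, window), n_pages)
    return start, end


if __name__ == "__main__":
    print(pagination_bounds(5, 4, 20))
```

<p>With 20 pages, a window of 4, and page 5 current, this gives the range 4 up to but not including 8, matching the widget in the example. When there are fewer pages than the window, the range simply covers all of them.</p>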
<p>That&#8217;s the short version. Now how do we derive that? </p>
<p>First, a warning. Don&#8217;t use page numbers as primitive objects unless you&#8217;ll only have one item per page. If you&#8217;re going to list multiple items per page, tracking page numbers instead of the first item to display on a page will make your code unbelievably complicated when you have to change the number of items displayed per page, or when you want to show a list of items beginning with a specific index, rather than a particular page.</p>
<p>So label your items from 0 to $N-1$, and let $currentItem$ be the index of the first item to be displayed on the current page. If each page is to show at most $pageSize$ entries (&#8220;at most&#8221; because there may not be enough items to fill the last page to exactly $pageSize$), then $currentPage = \lfloor currentItem / pageSize \rfloor$, and the $i$th page will start at $i \cdot pageSize$. There will be $Npages = \lceil N / pageSize \rceil$ pages in all.</p>
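<p>These three conversions are one-liners with integer division. A quick Python sketch, with function names of my own choosing:</p>

```python
def page_of_item(current_item, page_size):
    """Page number (from 0) that contains the item with this index."""
    return current_item // page_size  # floor division


def first_item_of_page(page, page_size):
    """Index of the first item shown on the given page."""
    return page * page_size


def page_count(n_items, page_size):
    """Total number of pages needed for n_items items."""
    return -(-n_items // page_size)   # ceiling division without floats


if __name__ == "__main__":
    print(page_of_item(25, 10), first_item_of_page(2, 10), page_count(25, 10))
```

<p>Keeping the item index as the primitive and deriving the page from it means a change of page size only changes the arguments, not the stored state.</p>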
<p>Now let&#8217;s move on to deriving the expressions for the actual pagination range. What, precisely, is the problem? For a particular integer between $0$ and $Npages$, we want to find a range of numbers of a fixed size which also fall in $[0, Npages]$ and are as centered as possible around $currentPage$. As usual, we&#8217;ll describe the range as a half open interval by the first page in the range $startPage$ and one after the last page in the range $endPage$. For example, in “« 4 5 6 7 »”, $startPage = 4$ and $endPage = 8$. Let the size of the desired range be $window$.</p>
<p>We need $endPage - startPage = window$, but we also need a second constraint to center $currentPage$ as much as possible in this range. If $window$ is odd, then we want the same number of entries on both sides of the current page number. For example, if we are on page 3 and $window = 5$, we should display “« 1 2 3 4 5 »”. If $window$ is even, we must make a choice: do we want to display more page numbers after the current one, or more before? I chose to show more numbers after on the theory that a user is more interested in stuff he has not yet reached. This constraint translates to $currentPage - startPage = endPage - currentPage - 1 - (1 - window\ \mathbf{mod}\ 2)$.</p>
<p>So, centering the current page in the control, $currentPage - startPage$ should be one less than $endPage - currentPage$ since $startPage$ is the first element and $endPage$ is one after the last element. So if $window$ is odd, $currentPage - startPage = endPage - currentPage - 1$. If $window$ is even, since I chose to show more following pages, it is $currentPage - startPage = endPage - currentPage - 2$. We can combine all these expressions to get</p>
<blockquote><p>
$currentPage - startPage = endPage - currentPage - 1 - (1 - window\ \mathbf{mod}\ 2)$<br />
$window = endPage - startPage$
</p></blockquote>
<p>which we&#8217;ll solve for $startPage$ and $endPage$. We&#8217;ll be using three properties of the floor and ceiling functions in the calculations below, which I&#8217;ll summarize here:</p>
<blockquote><p>
$x = \lfloor x/2 \rfloor + \lceil x/2 \rceil$<br />
$\lfloor x/2 \rfloor = x/2 &#8211; (x\ \mathbf{mod}\ 2)/2$<br />
$\lceil x/2 \rceil = x/2 + (x\ \mathbf{mod}\ 2)/2$
</p></blockquote>
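<p>These three properties hold for integer $x$, which is all we need here. A quick brute force check, as a sketch with helper names of my own choosing, multiplies the last two through by 2 so everything stays in integers:</p>

```python
def floor_half(x):
    """floor(x / 2) for an integer x."""
    return x // 2


def ceil_half(x):
    """ceil(x / 2) for an integer x, as a negated floor division."""
    return -(-x // 2)


def identities_hold(x):
    """Check the three floor/ceiling properties for one integer x."""
    r = x % 2  # 0 or 1 in Python, also for negative x
    return (
        floor_half(x) + ceil_half(x) == x
        and 2 * floor_half(x) == x - r
        and 2 * ceil_half(x) == x + r
    )
```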
<p>We calculate thus:</p>
<blockquote><p>
$currentPage &#8211; startPage = endPage &#8211; currentPage &#8211; 1 &#8211; (1 &#8211; window \mathbf{mod} 2)$<br />
$\ \wedge\ window = endPage &#8211; startPage$
</p></blockquote>
<p>$\equiv$ { solve first equation for $startPage$ and second equation for $endPage$ }</p>
<blockquote><p>
$startPage = -endPage + 2 \cdot currentPage + 2 &#8211; window\ \mathbf{mod}\ 2$<br />
$\ \wedge\ endPage = startPage + window$
</p></blockquote>
<p>$\equiv$ { substitute for $endPage$ in first equation }</p>
<blockquote><p>
$startPage = -startPage &#8211; window + 2 \cdot currentPage + 2 &#8211; window\ \mathbf{mod}\ 2$<br />
$\ \wedge\ endPage = startPage + window$
</p></blockquote>
<p>$\equiv$ { group $startPage$ and simplify first equation }</p>
<blockquote><p>
$startPage = currentPage &#8211; (window/2 + (window\ \mathbf{mod}\ 2)/2) + 1$<br />
$\ \wedge\ endPage = startPage + window$
</p></blockquote>
<p>$\equiv$ { $x/2 + (x\ \mathbf{mod}\ 2)/2 = \lceil x/2 \rceil$ }</p>
<blockquote><p>
$startPage = currentPage &#8211; \lceil window/2 \rceil + 1$<br />
$\ \wedge\ endPage = startPage + window$
</p></blockquote>
<p>$\equiv$ { substitute for $startPage$ in second equation }</p>
<blockquote><p>
$startPage = currentPage &#8211; \lceil window/2 \rceil + 1$<br />
$\ \wedge\ endPage = currentPage &#8211; \lceil window/2 \rceil + 1 + window$
</p></blockquote>
<p>$\equiv$ { $x = \lfloor x/2 \rfloor + \lceil x/2 \rceil$ }</p>
<blockquote><p>
$startPage = currentPage &#8211; \lceil window/2 \rceil + 1$<br />
$\ \wedge\ endPage = currentPage + \lfloor window/2 \rfloor + 1$
</p></blockquote>
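<p>To convince myself the algebra came out right, here is a small Python sketch of the solved expressions, ignoring the edges of the range for now. The name <code>centered_window</code> is mine, not anything standard.</p>

```python
def centered_window(current_page, window):
    """Half open range [start_page, end_page) centered on current_page.

    No clamping to the valid page range is done here.
    """
    ceil_half = -(-window // 2)   # ceil(window / 2)
    floor_half = window // 2      # floor(window / 2)
    start_page = current_page - ceil_half + 1
    end_page = current_page + floor_half + 1
    return start_page, end_page
```

<p>On page 3 with a window of 5 this gives the pair (1, 6), which is “« 1 2 3 4 5 »”. On page 5 with a window of 4 it gives (4, 8), one page before the current one and two after.</p>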
<p>This works in the middle of the range. At the edges, we have to calculate a different window. The first page is 0 and can never be less. The last page is $Npages-1$ and can never be more. At the left end, $startPage=0$ and $endPage=window$. At the right end, $startPage = Npages &#8211; window$ and $endPage = Npages$. We augment the expressions for $startPage$ and $endPage$ to handle these cases, writing $a \uparrow b$ for the maximum and $a \downarrow b$ for the minimum of $a$ and $b$ (note that the order of operations is important):</p>
<blockquote><p>
$startPage = [ (currentPage - \lceil window/2 \rceil + 1) \  \downarrow \  (Npages - window) ] \uparrow 0$<br />
$endPage = [ (currentPage + \lfloor window/2 \rfloor + 1) \  \uparrow \  window] \downarrow Npages$
</p></blockquote>
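<p>Put together, the whole calculation might look like the following Python sketch. I am assuming the usual reading of the arrows, minimum for $\downarrow$ and maximum for $\uparrow$, with $startPage$ bounded below by 0 and $endPage$ bounded above by $Npages$. The function name is my own.</p>

```python
def pagination_range(current_page, n_pages, window):
    """Half open range [start_page, end_page) of page numbers to display."""
    # Centered window, as derived for the middle of the range.
    start_page = current_page - (-(-window // 2)) + 1
    end_page = current_page + window // 2 + 1
    # Clamp at the edges. The inner bound is applied first so that the
    # outer bound wins when there are fewer pages than the window holds.
    start_page = max(min(start_page, n_pages - window), 0)
    end_page = min(max(end_page, window), n_pages)
    return start_page, end_page
```

<p>With 20 pages and a window of 5, page 0 gives (0, 5), page 3 gives (1, 6), and page 19 gives (15, 20). With only 3 pages and a window of 5 the result is (0, 3), which is why the order of the two clamps matters.</p>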
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/calculating-pagination-bounds-for-display/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Empirical evidence</title>
		<link>http://madhadron.com/empirical-evidence</link>
		<comments>http://madhadron.com/empirical-evidence#comments</comments>
		<pubDate>Sat, 26 May 2012 00:35:47 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>
		<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[physics]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=266</guid>
		<description><![CDATA[On a mailing list, someone asked: &#8220;Isn&#8217;t empirical evidence found anywhere? From anybody?&#8221; No. Absolutely not. Empirical evidence comes from doing the experiment yourself, or as close as you can get to it. If I take the average of the guesses of the peasantry as to the length of the emperor of China&#8217;s nose, why [...]]]></description>
			<content:encoded><![CDATA[<p>On a mailing list, someone asked: &#8220;Isn&#8217;t empirical evidence found anywhere? From anybody?&#8221;</p>
<p>No. Absolutely not. Empirical evidence comes from doing the experiment yourself, or as close as you can get to it. If I take the average of the guesses of the peasantry as to the length of the emperor of China&#8217;s nose, why should I expect it to be at all related to the actual length? I should go and measure it myself. Or at least get his valet drunk and ask him if I can&#8217;t actually abduct the emperor, cut off his nose, and lay it next to a ruler myself.</p>
<p>The only empirical evidence that you can get from asking a peasant how long the emperor of China&#8217;s nose is, is what that peasant is willing to publicly state as his guess as to the length when asked by some random person he doesn&#8217;t know who&#8217;s bothering him out of the blue&#8230;and even that&#8217;s biased to peasants that will answer you as opposed to chasing you out of their field with a hoe.</p>
<p>As for the links you sent, unless you&#8217;ve actually read and evaluated the original papers, you can&#8217;t have a prayer of knowing what the truth of the situation is. Even if you read the original papers you may not, since they may all be unusable and broken. Actually, the majority of the published literature is. It&#8217;s all too easy to cherry-pick a few papers to support your position, just like you can round up all the folks who&#8217;ve actually measured the length of the emperor&#8217;s nose into a stockade and demand under a hot light that they tell you the result of their measurement, but it&#8217;s so easy to assume that those guys with the overly long measurements were just trying to tell you what you wanted to hear since you were applying the light pretty close to their pasty, white faces.</p>
<p>Want to have an informed opinion? Go cut off the nose yourself. Or go actually run your own experiment.</p>
<p>Obesity rates? Set up a survey sample. Hint: the proper way to do this is to intern the population of your test area, assign them numbers, pick at random from them (and make sure you weigh any who were shot trying to escape before disposing of the bodies, in case their numbers are picked), and then strip, hogtie, and weigh those selected. Unfortunately, this is illegal in the USA unless you&#8217;re only interested in numbers from those who are Japanese or black.</p>
<p>Autism-vaccine link? Go vaccinate a bunch of kids, and inject the others with water. Stick &#8217;em in isolation bubbles so they don&#8217;t die of anything, or you&#8217;re confounded by kids killed by stuff they weren&#8217;t vaccinated with. Be sure to have a lime pit on hand to dispose of the bodies of parents trying to rescue their kids. Warning: this kind of longitudinal study takes a great deal of time. A decade or so. Again, there are legal problems with this.</p>
<p>That is what I know about empirical evidence.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/empirical-evidence/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Style in technical writing</title>
		<link>http://madhadron.com/style-in-technical-writing</link>
		<comments>http://madhadron.com/style-in-technical-writing#comments</comments>
		<pubDate>Tue, 17 Apr 2012 06:04:05 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=265</guid>
		<description><![CDATA[Several discussions with technical writers have shown me that I hold a very peculiar view: there is a uniform good prose style for technical writing in English. There are distinct voices within that style&#8212;no one would confuse Feynman&#8217;s writing with Dijkstra&#8217;s, nor the section by section layout of Landau and Lifshitz with Unix man pages&#8212;but [...]]]></description>
			<content:encoded><![CDATA[<p>Several discussions with technical writers have shown me that I hold a very peculiar view: there is a uniform good prose style for technical writing in English. There are distinct voices within that style&#8212;no one would confuse Feynman&#8217;s writing with Dijkstra&#8217;s, nor the section by section layout of Landau and Lifshitz with Unix man pages&#8212;but the variations are relatively small.</p>
<p>Let me begin with a series of exhibits of small scale prose style, taken from documents that I have read over the years and that I recalled as clear and informative. Each exhibit is a paragraph or two long, chosen more for its ability to stand alone than for any stylistic distinction.</p>
<p><b>Edsger W. Dijkstra, <a href="http://www.cs.utexas.edu/users/EWD/transcriptions/EWD12xx/EWD1213.html">EWD1213: Introducing a course on calculi</a></b>:</p>
<blockquote>
<p>&#8220;The functions I grew up with, such as the sine, the cosine, the square root, and the logarithm were almost exclusively real functions of a real argument. Sometimes, but not very explicitly, the argument could be allowed to be a little bit more complicated, such as the maximum of two values, a real function defined on a pair of real values, or on two values: whether the maximum was a function of one argument (which had to be a pair) or was a function of two arguments (each of which was a real number) was a question that was avoided. We didn&#8217;t talk about the types of the function arguments, we did not talk about the types of the function values either: we did not need to, for to all intents and purposes, all our functions were of type real. (Later this was extended to the type complex, but that was about it.) The net effect was that I was extremely ill-equipped to appreciate functional programming when I encountered it: I was, for instance, totally baffled by the shocking suggestion that the value of a function could be another function.&#8221;</p>
</blockquote>
<p><b>Richard Feynman, <a href="http://www.lhup.edu/~DSIMANEK/cargocul.htm">Cargo Cult Science</a></b>:</p>
<blockquote>
<p>&#8220;I think the educational and psychological studies I mentioned are examples of what I would like to call cargo cult science. In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they&#8217;ve arranged to imitate things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas&#8211;he&#8217;s the controller&#8211;and they wait for the airplanes to land. They&#8217;re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn&#8217;t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they&#8217;re missing something essential, because the planes don&#8217;t land.&#8221;</p>
</blockquote>
<p><b>Rob Pike, <a href="http://www.lysator.liu.se/c/pikestyle.html">Notes on Programming in C</a></b>:</p>
<blockquote>
<p>&#8220;I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The OO languages give you more of course&#8212;prettier syntax, derived types and so on&#8212;but conceptually they provide little extra.</p>
<p>Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special OO language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development&#8212;far better than with less disciplined approaches. Maybe that&#8217;s it: the discipline it forces pays off handsomely in the long run.&#8221;</p>
</blockquote>
<p><a href="http://www.openbsd.org/cgi-bin/man.cgi?query=ln"><b>OpenBSD man page for ln</b></a>:</p>
<blockquote>
<p>&#8220;The ln utility creates a new directory entry (linked file) which has the same modes as the original file. It is useful for maintaining multiple copies of a file in many places at once without using up storage for the copies; instead, a link &#8220;points&#8221; to the original copy. There are two types of links: hard links and symbolic links. How a link points to a file is one of the differences between a hard and symbolic link.&#8221;</p>
</blockquote>
<p>The four passages are obviously written by different people, but examine them again. The similarity of the OpenBSD man page to Rob Pike&#8217;s style is unsurprising, since Pike was deeply involved in the early days of Unix. Feynman, though, is as much of an outlier as I can think of. His passage is a transcription of a speech, with &#8216;So&#8217;s starting sentences as connectors, but examine the feel of it crossing your mind. It is technical prose uttered in a New York accent, but still English technical prose in technical prose&#8217;s singular style.</p>
<p>I encourage you to read the full texts from which my four excerpts are taken. They aren&#8217;t long. I believe one characteristic imposes this prose style more than any other: throughout each text, its author knew exactly what pattern he wished to conjure in his reader&#8217;s mind. Each eschewed jargon except where well buttressed by a couple of well chosen examples, since free floating jargon abdicates all idea of what the reader shall make of your text. The thought has been arranged into a set of sentences that take the reader nearly as directly as possible through it. The sentences are direct. At the end we are in no doubt what each passage says, though any paraphrase we make is likely to be less direct.</p>
<p>This uniformity in style applies to the larger structure of a piece as well. I mentioned in the first paragraph Landau and Lifshitz&#8217;s course of theoretical physics and the OpenBSD man pages. The ten volumes of Landau and Lifshitz are organized into sections of a few pages in length, each covering a particular topic or calculation. Each section represents about forty five minutes of study. They are organized into a handful of chapters for convenience, but there are no further levels of subdivision. OpenBSD&#8217;s man pages have one page per command, and in each page a uniform succession of sections describing its use. Feynman&#8217;s lectures on physics are divided into chapters and sections, though of slightly different proportions than Landau and Lifshitz.</p>
<p>Why two levels, again and again? Readers loathe deeply nested organizations. If you don&#8217;t believe me, examine your reaction to &#8220;Paragraph 3(c) of section 12, volume 324.&#8221; Compare it to &#8220;section 325, about two thirds of the way through.&#8221; Yet we avoid the other extreme. Novels, for example, still use chapters, which are a holdover from the days when books were written on a series of scrolls (chapter derives from the Latin &#8220;capitulum&#8221;, the word for a single such scroll). Longer novels even add a larger division into &#8220;books&#8221; or &#8220;volumes&#8221;, again largely vestigial since most novels are published in one volume today. Chapters, and books for larger masses of material, are guideposts in the books that have escaped extinction by being useful. We may take the degree of organization in novels as an evolved optimum for human use. The breaks serve only as guideposts. There is no reason to expect them to behave differently in technical prose, particularly given the ubiquity of find commands on today&#8217;s computers.</p>
<p>The OpenBSD man page shows another important distinction: instruction is separated from reference. The paragraph quoted is followed by a formatted list specifying in detail the arguments the command will take. This occurs again and again: the precise grammar of Algol 60 in section 2 of the <a href="http://www.masswerk.at/algol60/report.htm">Algol 60 report</a> is separated from a purely instructive introduction in section 1; the <a href="http://pdg.lbl.gov/2002/contents_sports.html">Particle Data Group</a>&#8217;s handbooks are vital references, but make no attempt to explain what the numbers they contain are, merely capture all the details necessary to use them. Attempting to instruct while providing reference material either swamps the neophyte with detail he cannot yet absorb, or denies the expert the details he doubtless needs if he has turned to a text at all.</p>
<p>All this can be summarized thus:</p>
<ul>
<li>Know precisely the effect you wish to engender in the reader, and keep to it.</li>
<li>Make all jargon that you cannot eschew appear linked to a set of examples.</li>
<li>Clearly separate instruction from reference.</li>
<li>Organize your material in the shallowest structure which allows readers to easily recover some remembered place in the text.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/style-in-technical-writing/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting organized, a letter to my girlfriend</title>
		<link>http://madhadron.com/getting-organized-a-letter-to-my-girlfriend</link>
		<comments>http://madhadron.com/getting-organized-a-letter-to-my-girlfriend#comments</comments>
		<pubDate>Mon, 26 Mar 2012 17:55:28 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=264</guid>
		<description><![CDATA[My dear, I think I haven&#8217;t explained what you&#8217;re trying to accomplish well. You&#8217;ve been reading the books and watching the videos, and no one has told you what all this machinery is for. It&#8217;s a way for your future self to promise something to your current self. Whenever something occurs to you, you can [...]]]></description>
			<content:encoded><![CDATA[<p>My dear,</p>
<p>I think I haven&#8217;t explained what you&#8217;re trying to accomplish well. You&#8217;ve been reading the books and watching the videos, and no one has told you what all this machinery is for. It&#8217;s a way for your future self to promise something to your current self.</p>
<p>Whenever something occurs to you, you can do one of three things:</p>
<ol>
<li>You can do something about it immediately.</li>
<li>You can let it sit in your head and worry about it.</li>
<li>You can receive a promise <i>that you believe</i> from your future self that it will get taken care of.</li>
</ol>
<p>All this machinery, all this work, is purely so you can replace (2) with (3) for everything in your life. That&#8217;s it.</p>
<p>What makes you believe a promise someone has made you? Here&#8217;s another list of three things:</p>
<ol>
<li>It&#8217;s in their interest to keep the promise.</li>
<li>If they can&#8217;t keep their promise, they will talk to you about it so you can make other arrangements.</li>
<li>In the past they have kept their promises.</li>
</ol>
<p>This is what you need from your future self. (1) should be easy. It&#8217;s you on both ends. I&#8217;ll talk about (2) for the rest of this text, but I need to get (3) out of the way. (3) is what most often goes wrong. You can only arrive at (3) from experience. It&#8217;s the basic problem of any organizational system. It doesn&#8217;t matter what tools it&#8217;s in, or what fancy name it has, or even if it works or not. You cannot trust it until you have worked in it long enough for the ancient, subconscious part of your brain to believe that it will work. Before that, all you can hope for is a theoretical understanding of why it will work and enough immediate value to get through the trust building period. That is a period of months or years, not days or weeks.</p>
<p>Now back to (2). How do you make promises to yourself so that they will get done or you will be notified? A written record somewhere is a start. Put a note on your refrigerator, or whatever other surface you look at on a regular basis. It will catch your attention later and you will deal with it later.</p>
<p>Now try that with a hundred promises. You start running out of space, and any individual promise won&#8217;t attract your attention, just the wall of promises. You won&#8217;t believe those promises anymore unless you regularly go through each one individually, look at it, and do it or remake the promise that it will get done later.</p>
<p>That&#8217;s it: reviewing your big list of promises on a regular basis. Everything else is details.</p>
<p>However, the details are important. A promise to keep a dentist appointment must be handled on a specific day, so you would have to review the whole mass of promises every day to reliably keep that promise to yourself. If you have a bunch of meetings, you would have to review it many times a day. It will eat all your time.</p>
<p>Worse, you probably won&#8217;t have your refrigerator with you when you&#8217;re at the grocery store, so you can&#8217;t review any promises about buying dog food or milk. You need a more articulated system to hold the promises.</p>
<p>Corral the promises that have specific times associated with them off to the side and sort them. There&#8217;s a calendar. Put the promises about buying dog food or milk at the store on a piece of paper and put it in your wallet. There&#8217;s a grocery list. Some of the promises you can immediately do (&#8220;Throw that plant off balcony&#8221;) and others aren&#8217;t so straightforward (&#8220;See the Pont-de-Gard and die&#8221;). Separate the two kinds, and then when you&#8217;re figuring out what to do next you can review just the doable ones. The others, well, you may as well make some small promise to yourself that will get you closer to those big promises, and you might as well make those small promises immediately doable and put them in the other category.</p>
<p>It goes on like this. It&#8217;s all detail, there to make the sheer bulk of it all manageable. It&#8217;s important detail, but it&#8217;s detail. If you&#8217;re a monk with an utterly fixed schedule and only a few responsibilities, you would probably do just fine with notes on a refrigerator.</p>
<p>It&#8217;s easy to get lost in the details, to try to get it perfect. It doesn&#8217;t have to be perfect. The goal isn&#8217;t to be organized. It&#8217;s to be able to keep promises to yourself over the long term. Start, and make a promise to yourself: &#8220;Improve my system.&#8221; Scribble any annoyances of your system there. They&#8217;re promises, too. Fix it all a little bit at a time.</p>
<p>I recommend implementing it on a combination of plain paper and plain text files at first. The goal is to keep promises to yourself long enough to have some trust in your system. Until you keep promises, you won&#8217;t get anywhere. You don&#8217;t need pretty tools, or expensive tools, or just the right pen. You just need to be able to get all your promises to yourself in one place and review them. Are your stacks of paper getting unmanageable? Get some file folders and binders and paperclips and solve the worst annoyances. Is there something annoying about your text files? Try organizing them a little differently or switch to a more powerful text editor. They&#8217;re just text files. Switching is trivial. Sharing text files is easy. Put them in a shared folder in Dropbox or Sparkleshare and you&#8217;re done.</p>
<p>Choose a basic convention for how you format your text files. Tweak it over time as you find annoyances. Maybe you start with todo items listed as</p>
<pre>
- Do this
- And do this
</pre>
<p>and you delete a line when you do the item. Perhaps you want to review them later, or you need to see what you already accomplished on a list. Try</p>
<pre>
- [ ] Do this
- [ ] And do this
</pre>
<p>and when you finish something, put an X between the brackets. A lot of text editors will let you set up coloring so that any line starting with &#8220;- [X]&#8221; will be light and those beginning with &#8220;- [ ]&#8221; will be black. Maybe you have too many files to comfortably navigate in your current text editor. Mess with your tools until it&#8217;s comfortable. Don&#8217;t try to make it perfect, just try to fix any nuisances that are slowing you down. A long line of little changes leads to a system so smooth that you won&#8217;t remember it&#8217;s there.</p>
<p>Sometimes you can&#8217;t use text files. Your job uses a Microsoft Outlook calendar, and you want to share your calendar with me. The constraints have just forced something more complicated than a text file, so take that one piece and make it work. Leave the rest as simple as possible, in a format that depends on no special piece of software, no particular operating system or computer.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/getting-organized-a-letter-to-my-girlfriend/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A farewell to bioinformatics</title>
		<link>http://madhadron.com/a-farewell-to-bioinformatics</link>
		<comments>http://madhadron.com/a-farewell-to-bioinformatics#comments</comments>
		<pubDate>Mon, 26 Mar 2012 05:22:00 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>
		<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=263</guid>
		<description><![CDATA[I&#8217;m leaving bioinformatics to go work at a software company with more technically ept people and for a lot more money. This seems like an opportune time to set forth my accumulated wisdom and thoughts on bioinformatics. My attitude towards the subject after all my work in it can probably be best summarized thus: &#8220;Fuck [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m leaving bioinformatics to go work at a software company with more technically ept people and for a lot more money. This seems like an opportune time to set forth my accumulated wisdom and thoughts on bioinformatics.</p>
<p>My attitude towards the subject after all my work in it can probably be best summarized thus: &#8220;Fuck you, bioinformatics. Eat shit and die.&#8221;</p>
<p>Bioinformatics is an attempt to make molecular biology relevant to reality. All the molecular biologists, devoid of skills beyond those of a laboratory technician, cried out for the mathematicians and programmers to magically extract science from their mountain of shitty results.</p>
<p>And so the programmers descended and built giant databases where huge numbers of shitty results could be searched quickly. They wrote algorithms to organize shitty results into trees and make pretty graphs of them, and the molecular biologists carefully avoided telling the programmers the actual quality of the results. When it became obvious to everyone involved that a class of results was worthless, such as microarray data, there was a rush of handwaving about &#8220;not really quantitative, but we can draw qualitative conclusions&#8221; followed by a hasty switch to a new technique that had not yet been proved worthless.</p>
<p>And the databases grew, and everyone annotated their data by searching the databases, then submitted in turn. No one seems to have pointed out that this makes your database a reflection of your database, not a reflection of reality. Pull out an annotation in GenBank today and it&#8217;s not very long odds that it&#8217;s completely wrong.</p>
<p>Compare this with the most important result obtained by sequencing to date: Woese et al&#8217;s discovery of the archaea. (Did you think I was going to say the human genome? Fuck off. That was a monument to the vanity of that god-bobbering asshole Francis Collins, not a science project.) They didn&#8217;t sequence whole genomes, or even whole genes. They sequenced a small region of the 16S rRNA, and it was chosen after pilot experiments and careful thought. The conclusions didn&#8217;t require giant computers, and they didn&#8217;t require precise counting of the number of templates. They knew the limitations of their tools.</p>
<p>Then came clinical identification, done in combination with other assays, where a judicious bit of sequencing could resolve many ambiguities. Similarly, small scale sequencing has been an incredible boon to epidemiology. Indeed, its primary scientific use is in ecology. But how many molecular biologists do you know who know anything about ecology? I can count the ones I know on one hand.</p>
<p>And sequencing outside of ecology? Irene Pepperberg&#8217;s work with Alex the parrot dwarfs the scientific contributions of all other sequencing to date put together.</p>
<p>This all seems an inauspicious beginning for a field. Anything so worthless should quickly shrivel up and die, right? Well, intentionally or not, bioinformatics found a way to survive: obfuscation. By making the tools unusable, by inventing file format after file format, by seeking out the most brittle techniques and the slowest languages, by not publishing their algorithms and making their results impossible to replicate, the field managed to reduce its productivity by at least 90%, probably closer to 99%. Thus the thread of failures can be stretched out from years to decades, hidden by the cloak of incompetence.</p>
<p>And the rhetoric! The call for computational capacity, most of which is wasted! There are only two computationally difficult problems in bioinformatics, sequence alignment and phylogenetic tree construction. Most people would spend a few minutes thinking about what was really important before feeding data to an NP complete algorithm. I ran a full set of alignments using the exact algorithms, not heuristic approximations, in a virtual machine on my underpowered laptop yesterday afternoon, so we&#8217;re not talking about truly hard problems. But no, the software is written to be inefficient, to use memory poorly, and the cry goes up for bigger, faster machines! When the machines are procured, even larger hunks of data are indiscriminately shoved through black box implementations of algorithms in hopes that meaning will emerge on the far side. It never does, but maybe with a bigger machine&#8230;</p>
<p>Fortunately for you, no one takes me seriously. The funding of molecular biology and bioinformatics is safe, protected by a wall of inbreeding, pointless jargon, and lies. So you all can rot in your computational shit heap. I&#8217;m gone.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-farewell-to-bioinformatics/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Book review: Clay Johnson&#8217;s &#8216;The Information Diet&#8217;</title>
		<link>http://madhadron.com/book-review-clay-johnsons-the-information-diet</link>
		<comments>http://madhadron.com/book-review-clay-johnsons-the-information-diet#comments</comments>
		<pubDate>Fri, 17 Feb 2012 00:08:56 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=262</guid>
		<description><![CDATA[Let&#8217;s get the worst out of the way: Clay Johnson&#8217;s &#8216;The Information Diet&#8217; isn&#8217;t worth reading. There. Since you&#8217;re still reading, I imagine you&#8217;d like to know why. It&#8217;s not because of what&#8217;s in it. Actually there&#8217;s a fair amount of interesting information in it. There&#8217;s an attempt to explain why our public discourse is [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s get the worst out of the way: Clay Johnson&#8217;s &#8216;The Information Diet&#8217; isn&#8217;t worth reading. There.</p>
<p>Since you&#8217;re still reading, I imagine you&#8217;d like to know why. It&#8217;s not because of what&#8217;s in it. Actually there&#8217;s a fair amount of interesting information in it. There&#8217;s an attempt to explain why our public discourse is so miserable, and a clear statement of why it&#8217;s not really anyone&#8217;s fault. There&#8217;s a potshot at the little hits of dopamine your cellphone and email client give you. There&#8217;s the true statement that the information most folks pass through their heads is far from messy reality.</p>
<p>But then he tries to make a diet from these, coining the term &#8216;infovegan&#8217;. His diet&#8217;s what you would expect from someone who would choose such a term: from a limited scope, limit yourself even further. It&#8217;s all based on an idea of trophic hierarchy, which works fine in ecology, and if all you&#8217;re talking about is the successive filtering of news. But where in the trophic hierarchy are Henry James&#8217;s collected prefaces? Sure, they&#8217;re commentary on his novels, but they&#8217;re the word from the horse&#8217;s mouth on actually crafting novels. Or how about Carroll&#8217;s Jabberwocky or the works of Rabelais? They&#8217;re pure fantasy, but both men clung close to the wellspring of language.</p>
<p>All I can think is that Johnson&#8217;s been in politics too long, and it&#8217;s blinkered him. I don&#8217;t need an information diet, I need an information cuisine. Robert Atkins had a diet. Julia Child had a cuisine. The funny thing is, them as have cuisines tend to get along just fine, while those with diets are in constant need of a new one.</p>
<p>So the major reason not to read &#8216;The Information Diet&#8217; is lack of vision. Johnson doesn&#8217;t have it, because he ain&#8217;t got a cuisine. I&#8217;m sure we&#8217;ll see infodiet crazes once the metaphor&#8217;s firmly cemented, but I don&#8217;t want to be an &#8216;infovegan&#8217;. I don&#8217;t want to deny myself McDonald&#8217;s hamburgers in favor of overcooked tofu. I made a stew of salmon, lentils, and beets tonight better than either. And I&#8217;ve got a pile of books at my elbow that make up a cuisine, and after this dud cake recipe, I&#8217;m off to stew up something proper.</p>
<p>I&#8217;ll leave you with a book about the same size as Johnson&#8217;s but chock full of cuisine: Ray Bradbury&#8217;s &#8216;Zen in the Art of Writing&#8217;. When you finish that, you can go try Ezra Pound&#8217;s &#8216;ABC of Reading&#8217;, Kenneth Clark&#8217;s &#8216;Civilisation&#8217;, Garrison Keillor&#8217;s &#8216;Good Poems&#8217;, and Christopher Alexander&#8217;s &#8216;The Nature of Order&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/book-review-clay-johnsons-the-information-diet/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to have a credible scientific literature</title>
		<link>http://madhadron.com/how-to-have-a-credible-scientific-literature</link>
		<comments>http://madhadron.com/how-to-have-a-credible-scientific-literature#comments</comments>
		<pubDate>Sun, 05 Feb 2012 07:47:53 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[physics]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=260</guid>
		<description><![CDATA[My girlfriend took me to task after I wrote my rant on peer review: &#8220;You have to have some assurance that the scientific literature is believable.&#8221; I agree. In my rant, I publicly promised that what I produced would be believable, and staked my reputation on it. But she persevered, and I eventually disgorged my core [...]]]></description>
			<content:encoded><![CDATA[<p>My girlfriend took me to task after I wrote my <a href="http://madhadron.com/?p=259">rant on peer review</a>: &#8220;You have to have some assurance that the scientific literature is believable.&#8221; I agree. In my rant, I publicly promised that what I produced would be believable, and staked my reputation on it. But she persevered, and I eventually disgorged my core belief on this subject: no intervention, whether it be peer review or something else, at the level of publication, can produce credibility.</p>
<p>Shocking, yes? But consider: all a peer reviewer has to go on is what the author wants him to have. They both know the same background material, and both know that unknown factors can make seemingly identical experiments done by different people contradict. All a reviewer can do is ascertain that, if there was fraud committed, it was skillfully done. Our potential fraudsters are scientists, highly trained and intelligent. I believe them capable of producing a skillful fraud if they wish. Peer review spreads the responsibility for journals&#8217; editorial decisions around on the theory that, with enough heads, someone will be able to decide well. It does no more.</p>
<p>How can we have credibility, then? Remove the social forces that reward fraud, and then punish the act of fraud with shame and ostracism. If there is no benefit and only risk, only a few pathological minds will bother. What are the social forces? Essentially, temporal rewards: promotion, tenure, prizes, funding, minions, and not being fired.</p>
<p>I want to digress on the firing of academic scientists. At this point in time, all the funding comes from the scientist. Universities provide at best partial salary support and some money to get set up, and thereafter demand rent from the scientist, in exchange for teaching and committee duties in perpetuity. The university has only prestige and facilities to offer in return. Facilities are cheap, so why does the university hold power in this relationship? It has to be the prestige, which is a fine inducement to fraud. The power universities hold, beyond what any other landlord renting out facilities would have, needs to end. As for the rest of it, promotion and prizes and minions, get rid of it all. Tenure, in the civilized world, should go. Civilized countries today have freedom of speech built into their legal foundations. The rights tenure originally guaranteed are now guaranteed to <i>all</i> citizens.</p>
<p>What does the resulting scientific establishment look like? It should be relatively easy to get enough funding for a single person to run a frugal lab for four or five years at a stretch, and horribly difficult to get more. There is no official recognition. There are no prizes to compete for, no plum positions or sinecures unless you convince private citizens to endow you luxuriously. There are no armies of graduate students being driven through programs to support the prestige of their professors.</p>
<p>Inevitably someone will protest that without these incentives, why try to do important science? I feel slightly ridiculous even addressing the point, but it must be done: I have trouble imagining the mind that gets up every morning and says to itself, &#8220;I think I&#8217;ll do something humdrum and unimportant today.&#8221; Perhaps they exist, and for those minds we can install basic checks. At the end of your funding period your work has to be reviewed for renewal. If you haven&#8217;t produced anything at all, not even interesting, intermediate results, then you&#8217;re on probation. If you don&#8217;t produce anything twice around, you&#8217;re not funded. The standards need not be that high. We&#8217;re not talking about a large amount of money, no more than $80,000 to $100,000 a year to pay for salary and all research expenses. There&#8217;s plenty of room at the bottom.</p>
<p>Scientists positioned like this, with no hope of official recognition, prestige, or promotion, have no reason to commit fraud. Add to that a strong discouragement&#8212;being barred for life from any government research funding if convicted of fraud&#8212;and you have a system where the literature will probably be credible. There will be crackpots, but that&#8217;s a whole different problem, and one that <i>can</i> be handled gracefully with minimal measures at the level of publishing.</p>
<p>I have spoken here only of individual scientists, not of large projects. What of major initiatives like CERN or USGS surveys? Projects of sufficient size have their own means of avoiding fraud. Papers coming from the experiments on CERN&#8217;s LHC are not peer reviewed by the journals. They have already been checked and rechecked internally, along with the data and the analyses themselves. I&#8217;m not worried about projects on this scale. They function just fine. The worry about credibility of the scientific literature is purely directed at the independent researcher working on a small project, and for that I can see no other solution than a reengineering of the scientific community to remove the inducements to fraud.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/how-to-have-a-credible-scientific-literature/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why I don&#8217;t publish in peer reviewed journals</title>
		<link>http://madhadron.com/why-i-dont-publish-in-peer-reviewed-journals</link>
		<comments>http://madhadron.com/why-i-dont-publish-in-peer-reviewed-journals#comments</comments>
		<pubDate>Thu, 02 Feb 2012 08:01:03 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[physics]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=259</guid>
		<description><![CDATA[I don&#8217;t publish in peer reviewed journals. Oh, my name is on a few papers where I&#8217;ve contributed, and I&#8217;ve written a few columns that appeared in otherwise peer reviewed journals, but I have never submitted, and have no intention of ever submitting, work that is primarily mine to a peer reviewed journal. This astonishes [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t publish in peer reviewed journals. Oh, my name is on a few papers where I&#8217;ve contributed, and I&#8217;ve written a few columns that appeared in otherwise peer reviewed journals, but I have never submitted, and have no intention of ever submitting, work that is primarily mine to a peer reviewed journal. This astonishes most scientists, who feel that they must submit their work to these places. There is even a feeling that you would only choose not to if you couldn&#8217;t manage to be accepted, though given the uneven level of rigor in papers published today, this attitude can only be described as deluded.</p>
<p>But why don&#8217;t I publish in these journals?</p>
<p>I do science because I really enjoy the cycle of reexamining what we know for holes, trying to interpolate into them, and then testing the interpolations, and because I value the scientific legacy left to me and I wish to pass it on enriched. Only the second has anything to do with publishing, and it puts the onus on me to communicate my work in the clearest, most generally accessible form possible.</p>
<p>Peer reviewed journals are not a generally accessible form. The papers in them are read only by a few specialists. The writing of such papers is an act of verbal contortion: editors demand extensive results before they will allot one of their competitive slots to a paper, but the slot is sufficiently short that there is no space to describe that work comprehensively, much less clearly. Most papers today don&#8217;t include a description of their methods adequate to reproduce the results. Once published, the paper becomes the property of a company with no interest in its general distribution, or even in its preservation, and is thereafter inaccessible to all but a few privileged individuals. This ignores the amount of work, frustration, and political wrangling that goes into forcing a paper through the peer review process.</p>
<p>Given this, why does anyone publish in these journals? It&#8217;s part of the game: if you want to advance in academia, you have to publish peer reviewed papers that your peers think highly of. That&#8217;s irrelevant to me. I decided before I finished my bachelor&#8217;s degree that I didn&#8217;t want to be an academic. Graduate school for me was an offer by a university to pay me while I amused myself with the fun parts of science for a few more years. Unfortunately, the professors where I went didn&#8217;t understand this and insisted on treating my presence there as somehow a privilege bestowed on me, but this is beside the point. We&#8217;re talking about publishing.</p>
<p>The academics can spend their time fighting to change the peer review system, to curb its excesses and make it function better. They have to. It&#8217;s a game they&#8217;re forced to play. I don&#8217;t have to, and I have no reason to want to. Yes, we need a system of evaluating research to prevent utter crackpottery, to &#8220;preserve standards&#8221;, but <i>I&#8217;m not bringing them down</i>. I required higher standards of myself than most of my colleagues were willing to meet.</p>
<p>I&#8217;ll make a promise that&#8217;s better than limiting my work to peer reviewed journals: I will train and work as rigorously as possible, and ask scientists I respect to tell me honestly where I have failed and need to improve. I will write my results up as clearly as I can, and make them publicly available as widely as possible.</p>
<p>If you&#8217;re required to play the peer review game by your career ambitions, go ahead. I understand. I ask you to understand that my career ambitions require me not to.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/why-i-dont-publish-in-peer-reviewed-journals/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Measuring your energy levels</title>
		<link>http://madhadron.com/measuring-your-energy-levels</link>
		<comments>http://madhadron.com/measuring-your-energy-levels#comments</comments>
		<pubDate>Mon, 30 Jan 2012 05:31:30 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=254</guid>
		<description><![CDATA[I recently ran a brief experiment on myself. After becoming ornery about not having enough energy in the late afternoon, I finally decided to measure my energy level throughout the day. I formulated the simplest experiment that I thought would give me adequate data: once an hour writing down L, M, or H (for low, [...]]]></description>
			<content:encoded><![CDATA[<p>I recently ran a brief experiment on myself. After becoming ornery about not having enough energy in the late afternoon, I finally decided to measure my energy level throughout the day. I formulated the simplest experiment that I thought would give me adequate data: once an hour writing down L, M, or H (for low, medium, or high) to indicate my energy level. If it were more than once an hour, then when I inevitably forgot for two or even three hours, it would be hard to remember roughly what my energy level was. Similarly, I wouldn&#8217;t be able to accurately record anything more precise than a three level measurement when I forgot.</p>
<p>I did it on a little pocket notebook part of the time, and in a text file the rest of the time. I started in the early afternoon the day I thought it up, missed a couple hours on one day, and then ended the experiment after four days since the results were so clear. I could have run it a few days more, but more data wouldn&#8217;t have enabled me to make better decisions.</p>
<p>I encourage you to repeat my experiment with yourself. I&#8217;ll explain what I did with the data below to give you an idea of what you might try. If you want to send your results to me, I&#8217;d love to see them.</p>
<p>To start with, here is the raw data. The columns begin when I woke up in the morning, and end when I went to bed at night, except for Wednesday afternoon when I began the experiment.</p>
<table>
<colgroup>
<col class="right" />
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
  </colgroup>
<thead>
<tr>
<th scope="col" class="right">Hour</th>
<th scope="col" class="left">Wed</th>
<th scope="col" class="left">Thurs</th>
<th scope="col" class="left">Fri</th>
<th scope="col" class="left">Sat</th>
<th scope="col" class="left">Average</th>
</tr>
</thead>
<tbody>
<tr>
<td class="right">8</td>
<td class="left"></td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left"></td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">9</td>
<td class="left"></td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left"></td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">10</td>
<td class="left"></td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
</tr>
<tr>
<td class="right">11</td>
<td class="left"></td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
</tr>
<tr>
<td class="right">12</td>
<td class="left"></td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">H</td>
</tr>
<tr>
<td class="right">13</td>
<td class="left">M</td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">14</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">15</td>
<td class="left">M</td>
<td class="left">L</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">16</td>
<td class="left">M</td>
<td class="left">L</td>
<td class="left">L</td>
<td class="left">M</td>
<td class="left">M/L</td>
</tr>
<tr>
<td class="right">17</td>
<td class="left">L</td>
<td class="left">L</td>
<td class="left">L</td>
<td class="left">L</td>
<td class="left">L</td>
</tr>
<tr>
<td class="right">18</td>
<td class="left">L</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">L</td>
<td class="left">M/L</td>
</tr>
<tr>
<td class="right">19</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
<td class="left">M</td>
</tr>
<tr>
<td class="right">20</td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">M/H</td>
</tr>
<tr>
<td class="right">21</td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
</tr>
<tr>
<td class="right">22</td>
<td class="left">M</td>
<td class="left"></td>
<td class="left">H</td>
<td class="left">H</td>
<td class="left">H</td>
</tr>
<tr>
<td class="right">23</td>
<td class="left"></td>
<td class="left"></td>
<td class="left">H</td>
<td class="left">M</td>
<td class="left">M/H</td>
</tr>
<tr>
<td class="right">24</td>
<td class="left"></td>
<td class="left"></td>
<td class="left">M</td>
<td class="left"></td>
<td class="left">M</td>
</tr>
</tbody>
</table>
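<p>For anyone repeating the experiment, here is one way the Average column can be reproduced. It is only a sketch: it assumes the average is simply the most common level recorded for each hour, with both levels written out when two tie. The helper name <code>typical_level</code> is made up for illustration.</p>

```python
from collections import Counter

# Readings copied from the table above, one letter per recorded day
# (Wed, Thurs, Fri, Sat in order, with blank cells left out).
readings = {
    8: "MM", 9: "MM", 10: "HHH", 11: "HHH", 12: "HHM",
    13: "MHMM", 14: "MMMM", 15: "MLMM", 16: "MLLM", 17: "LLLL",
    18: "LMML", 19: "MMMM", 20: "HMHM", 21: "HMHH", 22: "MHH",
    23: "HM", 24: "M",
}


def typical_level(levels):
    """Return the most common level, or both levels joined by '/' on a tie."""
    counts = Counter(levels)
    top = max(counts.values())
    tied = [level for level in counts if counts[level] == top]
    # List M first in a tie, matching how the table writes M/L and M/H.
    tied.sort(key="MLH".index)
    return "/".join(tied)


averages = {hour: typical_level(levels) for hour, levels in readings.items()}

for hour in sorted(averages):
    print(hour, averages[hour])
```

<p>With only three levels and a handful of days, counting is about all the arithmetic the data can support, which is part of why the experiment is so cheap to run.</p>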
<p>The first thing to note about the data is how regular it is. My energy levels have a regular pattern that varies by at most an hour or so per day. What does that tell me about how I should organize my time?</p>
<p>It would be a shame to ignore the structure and improvise my day as I go, as I would have to if the variation were two or three hours, as long as or longer than the periods of high or low energy themselves. However, a rigidly defined schedule won&#8217;t work for me, since there is still variation by up to an hour. Such a schedule would be wonderful for someone with variation of less than half an hour.</p>
<p>What I <b>can</b> do is make a rough flow of events, and their rough length. Society suggests just such a flow to us: have breakfast when you get up, work until midday, have lunch, work until five, go home and do what must be done to keep your life from imploding until dinner, then be with your family until it&#8217;s time to sleep. Dinner varies with your background. For many Americans it falls at 17h00 or 18h00. For me it has always been around 20h00. For a Spaniard, it would be closer to 22h00.</p>
<p>Compare this with my energy levels:</p>
<table>
<colgroup>
<col class="right" />
<col class="left" />
<col class="left" />
  </colgroup>
<thead>
<tr>
<th scope="col" class="right">Hour</th>
<th scope="col" class="left">Average energy</th>
<th scope="col" class="left">Societal event</th>
</tr>
</thead>
<tbody>
<tr>
<td class="right">8</td>
<td class="left">M</td>
<td class="left">breakfast</td>
</tr>
<tr>
<td class="right">9</td>
<td class="left">M</td>
<td class="left">start work</td>
</tr>
<tr>
<td class="right">10</td>
<td class="left">H</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">11</td>
<td class="left">H</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">12</td>
<td class="left">H</td>
<td class="left">lunch</td>
</tr>
<tr>
<td class="right">13</td>
<td class="left">M</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">14</td>
<td class="left">M</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">15</td>
<td class="left">M</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">16</td>
<td class="left">M/L</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">17</td>
<td class="left">L</td>
<td class="left">end work</td>
</tr>
<tr>
<td class="right">18</td>
<td class="left">M/L</td>
<td class="left">maintain life</td>
</tr>
<tr>
<td class="right">19</td>
<td class="left">M</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">20</td>
<td class="left">M/H</td>
<td class="left">dinner</td>
</tr>
<tr>
<td class="right">21</td>
<td class="left">H</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">22</td>
<td class="left">H</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">23</td>
<td class="left">M/H</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">24</td>
<td class="left">M</td>
<td class="left"></td>
</tr>
</tbody>
</table>
<p>If I did this, I would be starting work before my energy really rose for the day. I would stop working in the middle of my highest energy period of the day and have lunch. Then I would work through my medium and low energy period, and have to take care of everything else in my life when my energy is at its lowest in the day. Then, when my energy is finally coming back, I would sit down to dinner and relax for the night. This is almost the worst possible arrangement of events for me.</p>
<p>Instead, I&#8217;m going to make a few observations and build my own sequence of events. First, those two periods of high energy are sacrosanct. Any sequence of events must put my work into those slots. Fortunately, as long as I make an appearance somewhere in the work day at my day job, no one much cares exactly what hours I work.</p>
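<p>The high and low stretches can be picked out of the averages mechanically. This is a minimal sketch, assuming the averages from the table above, and assuming that a tie such as M/L counts as low while a tie such as M/H does not yet count as high. The helper name <code>runs</code> is made up for illustration.</p>

```python
from itertools import groupby

# Average energy per hour, copied from the table above.
average = {
    8: "M", 9: "M", 10: "H", 11: "H", 12: "H", 13: "M", 14: "M",
    15: "M", 16: "M/L", 17: "L", 18: "M/L", 19: "M", 20: "M/H",
    21: "H", 22: "H", 23: "M/H", 24: "M",
}


def runs(levels, wanted):
    """Return (first_hour, last_hour) for each unbroken stretch of wanted levels."""
    found = []
    hours = sorted(levels)
    for matches, group in groupby(hours, key=lambda hour: levels[hour] in wanted):
        if matches:
            block = list(group)
            found.append((block[0], block[-1]))
    return found


high = runs(average, {"H"})
low = runs(average, {"L", "M/L"})

print(high)  # [(10, 12), (21, 22)]
print(low)   # [(16, 18)]
```

<p>Each pair gives the first and last hour recorded at that level, so (10, 12) means the hours marked 10, 11, and 12.</p>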
<p>I want to take half an hour beforehand to start work and get my mind in the right place before my energy peaks so as not to waste any of it. I also want half an hour to an hour after the peak to put the results in order and plan the work for the next period of high energy. So in the morning 9h30&#8211;10h00 is a warmup, 10h00&#8211;13h00 is work, and 13h00&#8211;13h30 or 14h00 is cleaning up and planning the next session. Looking over this, I also know that I&#8217;m going to need a light meal, something with plenty of protein and fruit and vegetables (tuna salad and carrots and an apple, perhaps), about 11h30, to be eaten while I work. Similarly, at night 20h00&#8211;20h30 is warmup, 20h30&#8211;22h30 is work, with a light meal about 22h00, and 22h30&#8211;23h00 or 23h30 is putting my work in order for the next session.</p>
<p>Those low hours from 16h00 to 18h00 are the bane of my existence. I hate them, and I&#8217;m going to deal with them before I do anything else. I&#8217;m going to sleep through them. 16h00&#8211;18h00 is hereby my siesta.</p>
<p>Those are the points that I feel really strongly about. Now for the essentials: sleep, food, and exercise. I like to eat my breakfast calmly, but I don&#8217;t feel like doing much in the morning before that first big peak of energy, so waking up at 8h00 will work fine for me. I need time to wind down before I can go to sleep at night, so after my last work period I&#8217;m going to get ready for bed and go to sleep. It may look like I&#8217;ve scheduled ten hours of sleep a day, but I&#8217;m something of an insomniac. If I find I&#8217;m getting more sleep than my body wants, I&#8217;ll just get up earlier and do a little reading in the morning.</p>
<p>I&#8217;ve already scheduled two light meals during my peak energy times, to be eaten as I work. Knowing my metabolism, I need another light meal, and two full meals. The light meal is to get me through until dinner. I&#8217;ll have it at the start of my siesta. The two full meals are breakfast and dinner. Breakfast is before 9h30, probably whenever I get up, and it needs to be rather more than a bowl of cereal. Dinner needs to be done before my evening energy peak, so I&#8217;ll say that it happens between 18h30 and 19h30.</p>
<p>Finally, exercise. I need to have at least a medium energy level, and I must not have eaten heavily in the last hour or so. The obvious time is about 14h00, and I don&#8217;t need hours. I can work myself to collapse in about twenty minutes if I want. That still leaves nice hunks of time in the mid to late afternoon for dealing with the miscellanea of my life and spending time with people.</p>
<p>In summary, here is my proposed rough order of events for my day:</p>
<table>
<colgroup>
<col class="right" />
<col class="left" />
  </colgroup>
<thead>
<tr>
<th scope="col" class="right">Hour</th>
<th scope="col" class="left">Event</th>
</tr>
</thead>
<tbody>
<tr>
<td class="right">8</td>
<td class="left">breakfast</td>
</tr>
<tr>
<td class="right">9</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">10</td>
<td class="left">first work period</td>
</tr>
<tr>
<td class="right">11</td>
<td class="left">light meal</td>
</tr>
<tr>
<td class="right">12</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">13</td>
<td class="left">end first work period</td>
</tr>
<tr>
<td class="right">14</td>
<td class="left">exercise</td>
</tr>
<tr>
<td class="right">15</td>
<td class="left">miscellanea</td>
</tr>
<tr>
<td class="right">16</td>
<td class="left">light meal, siesta</td>
</tr>
<tr>
<td class="right">17</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">18</td>
<td class="left">end siesta</td>
</tr>
<tr>
<td class="right">19</td>
<td class="left">dinner</td>
</tr>
<tr>
<td class="right">20</td>
<td class="left">second work period</td>
</tr>
<tr>
<td class="right">21</td>
<td class="left"></td>
</tr>
<tr>
<td class="right">22</td>
<td class="left">light meal</td>
</tr>
<tr>
<td class="right">23</td>
<td class="left">end second work period</td>
</tr>
</tbody>
</table>
<p>There, a sequence of events that is utterly abnormal in our society, but which is calculated to fit me.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/measuring-your-energy-levels/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The sun and the moon</title>
		<link>http://madhadron.com/the-sun-and-the-moon</link>
		<comments>http://madhadron.com/the-sun-and-the-moon#comments</comments>
		<pubDate>Tue, 03 Jan 2012 23:01:33 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=244</guid>
		<description><![CDATA[I once saw the sun and the moon in the sky, with the moon&#8217;s dark side shadowed just so. The way they lay, though I couldn&#8217;t perceive the distances involved, I could grasp the shape of the triangle of sun, moon, and earth. The distance from earth to sun is roughly 1 AU. From the [...]]]></description>
			<content:encoded><![CDATA[<p>I once saw the sun and the moon in the sky, with the moon&#8217;s dark side shadowed just so. The way they lay, though I couldn&#8217;t perceive the distances involved, I could grasp the shape of the triangle of sun, moon, and earth.</p>
<p>The distance from earth to sun is roughly 1 AU. From the Earth to the moon is about 2.5 &#215; 10<sup>-3</sup> AU. How often do you deal with an aspect ratio of three orders of magnitude in your life?</p>
<p>The tallest building in the world, the <a href="http://en.wikipedia.org/wiki/Burj_Khalifa">Burj Khalifa</a> in Dubai, is 828m. What is the smallest object you can see if you have the entirety of the tower in view? The vertical <a href="http://en.wikipedia.org/wiki/Human_eye#Field_of_view">field of view</a> of the human eye is about 0.75&#960; radians. Its angular resolution is about 10<sup>-4</sup>&#960; radians. If we had the entire Burj Khalifa in view, the smallest object we could see next to it is theoretically 10cm, but to get any effect we would have to perceive it as an object, not a speck, so it would actually have to be several meters in size. That&#8217;s at most two orders of magnitude.</p>
<p>The great cathedrals, the Grand Canyon, the pyramids, don&#8217;t offer anything like two orders of magnitude of visible difference. You cannot draw a triangle with an aspect ratio of 1000 on a piece of normal paper with a pencil that you don&#8217;t perceive as a sloppy line. A4 paper is 210 x 297 mm, and standard pencil leads are 0.7mm in diameter. The width of a triangle 297mm long with the same aspect ratio as the sun, moon, earth triangle I saw would be 0.7mm.</p>
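<p>The two back-of-envelope numbers above can be checked in a few lines. This is a sketch under the small-angle approximation, and it reads the 10<sup>-4</sup>&#960; radians figure as the eye&#8217;s angular resolution, which is an assumption. The variable names are made up for illustration.</p>

```python
import math

# Burj Khalifa filling the whole vertical field of view.
tower_height_m = 828.0
field_of_view_rad = 0.75 * math.pi
resolution_rad = 1e-4 * math.pi  # assumed to be the eye's angular resolution

# Sizes scale in proportion to the angle they subtend.
smallest_object_m = tower_height_m * resolution_rad / field_of_view_rad

# Sun, moon, earth triangle drawn along the long side of an A4 sheet.
paper_length_mm = 297.0
moon_distance_au = 2.5e-3  # earth to moon, with earth to sun taken as 1 AU
triangle_width_mm = paper_length_mm * moon_distance_au

print(f"smallest visible object: {smallest_object_m:.2f} m")  # 0.11 m
print(f"triangle width on A4: {triangle_width_mm:.2f} mm")    # 0.74 mm
```

<p>Both agree with the rounded figures in the text: roughly 10cm next to the tower, and a triangle about as wide as a single pencil stroke.</p>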
<p>We live in a world of very limited aspect ratio. What happens when we are suddenly faced with something as far beyond our experience as the Burj Khalifa is beyond a house in the suburb?</p>
<p><a href="http://en.wikipedia.org/wiki/Satori">Satori</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/the-sun-and-the-moon/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I cry shame&#8230;</title>
		<link>http://madhadron.com/i-cry-shame</link>
		<comments>http://madhadron.com/i-cry-shame#comments</comments>
		<pubDate>Tue, 13 Dec 2011 23:44:06 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=243</guid>
		<description><![CDATA[Shame on these people: &#8220;One limitation of this modeling approach is that for some genes, there is a low degree of similarity between the observed expression profile and the one predicted by the most appropriate model. &#8230;It is possible that other curve-fitting methods&#8230;might also be applicable to this kind of data. However these methods do [...]]]></description>
			<content:encoded><![CDATA[<p>Shame on these people:</p>
<blockquote><p>
  &#8220;One limitation of this modeling approach is that for some genes, there is a low degree of similarity between the observed expression profile and the one predicted by the most appropriate model. &#8230;It is possible that other curve-fitting methods&#8230;might also be applicable to this kind of data. However these methods do not provide the statistical machinery that comes with the regression modeling approach that we have taken. The main advantage of our method is being able to easily apply valid statistical tests to determine which one of two models is more likely in light of the data. We are also able to explicitly define statistical significance in a meaningful way that protects against a specified false discovery rate.&#8221;
</p></blockquote>
<p>It&#8217;s from the paper <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000626"><em>Decomposition of Gene Expression State Space Trajectories</em></a>. Summarized, it says, &#8220;We don&#8217;t know how to use any appropriate tools, so we&#8217;ll use inappropriate ones we found in our introductory statistics book. Everyone else calculates <em>p</em>-values for lots of things with it, so it must be okay here, too.&#8221;</p>
<p>Let us review what you must have if you are not to drive a statistically inclined mad scientist to sic his ants upon you:</p>
<ol>
<li>Your procedure must measure something which is demonstrably relevant to what you are investigating.</li>
<li>You must know the assumptions under which a procedure is valid.</li>
<li>You must have tested your data and apparatus against the assumptions and found that they hold.</li>
</ol>
<p>That&#8217;s it. Really. But if my giant mechanical ants were to truly rend those who fail to accomplish these three simple goals, the majority of inhabitants of the groves of academe would be ant food.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/i-cry-shame/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A letter on monads</title>
		<link>http://madhadron.com/a-letter-on-monads</link>
		<comments>http://madhadron.com/a-letter-on-monads#comments</comments>
		<pubDate>Mon, 14 Nov 2011 02:42:32 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=241</guid>
		<description><![CDATA[Sekhar wrote: I mentioned this article in the retreat I seem to remember reading something very similar a while back. Unfortunately, he doesn&#8217;t understand monads, as far as I can tell. They were a solution to a syntactic problem, not a semantic problem. Almost every language out there attaches a continuation to each statement, that [...]]]></description>
			<content:encoded><![CDATA[<p>Sekhar wrote:</p>
<blockquote>
<p>I mentioned <a href="http://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html">this article</a> in the retreat</p>
</blockquote>
<p>I seem to remember reading something very similar a while back.</p>
<p>Unfortunately, he doesn&#8217;t understand monads, as far as I can tell. They were a solution to a syntactic problem, not a semantic problem.</p>
<p>Almost every language out there attaches a continuation to each statement, that is, a &#8220;next in time will be this&#8221; pointer. Some languages, like Scheme, give you access to the continuation. Most don&#8217;t. They all have it, though. Even Prolog and ML have them.</p>
<p>The pure functional languages can be regarded as an experiment in cutting off the continuations. All you are guaranteed is that any value you use was evaluated before you used it. So how do you do sequential computation? Use a value which was produced by the thing you want to have happened before you in time. If you look at Mercury, which is a really fast, pure, strongly typed Prolog, you find a lot of statements are rather different from traditional Prolog. <tt>print</tt>, for instance, takes a value of type <tt>IO ()</tt> as well as the string to print, and returns a new value of type <tt>IO ()</tt>. The <tt>IO ()</tt> value you pass it is what was supposed to come before it in time. The <tt>IO ()</tt> it returns is a value that lets another function inject itself into print&#8217;s continuation.</p>
<p>Remember how your thinking inverted when you learned Lisp, and suddenly you pulled values through functions instead of pushing them into variables? Same thing here. You construct the continuation as a sequence of nested statements instead of constructing statements as a sequence of nested continuations.</p>
<p>The thing is, passing that extra state variable around is bloody annoying. It makes code ugly. And if there&#8217;s anything a programming language researcher hates more than ugly code, I don&#8217;t want to see it.</p>
<p>Then Phil Wadler figured out that what he really wanted to do was to reprogram the boundaries between statements, and then found that a category theoretical construct called a monad nicely described the IO case. And a bunch of others. Though it turned out not to be right for a bunch more, which led to the propagation in the community of applicative functors (a.k.a. idioms) and arrows.</p>
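<p>To make the state threading concrete, here is a minimal sketch in Python rather than Mercury or Haskell. The names <tt>World</tt>, <tt>print_line</tt>, <tt>say</tt>, and <tt>then</tt> are invented for illustration, the &#8220;world&#8221; is just a tuple of printed lines, and actions return no values, so this shows only the sequencing half of the idea rather than a full monad.</p>

```python
from typing import Callable, Tuple

# Hypothetical stand-in for the state token: an immutable log of printed lines.
World = Tuple[str, ...]


def print_line(world: World, text: str) -> World:
    """Return a new world in which `text` was printed after everything in `world`."""
    return world + (text,)


def greet_explicit(world: World) -> World:
    """Explicit threading: every call must be handed the previous world by hand."""
    w1 = print_line(world, "hello")
    w2 = print_line(w1, "world")
    return w2


# An action is a function from a world to the world that follows it.
Action = Callable[[World], World]


def say(text: str) -> Action:
    """Package a print as an action that still waits for its world."""
    return lambda world: print_line(world, text)


def then(first: Action, second: Action) -> Action:
    """Sequence two actions; the threading is written here once and reused."""
    return lambda world: second(first(world))


# The same program with the world hidden behind the boundary between statements.
greet_sequenced = then(say("hello"), say("world"))
```

<p>Both <tt>greet_explicit</tt> and <tt>greet_sequenced</tt> turn an empty world into the same log. The difference is only in who writes the plumbing, which is the syntactic point made above.</p>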
<p>He mentions actors as a better idea than monads. Actors have their own problems, though. Yes, they are more intuitive initially, but reasoning behaviorally about concurrent systems doesn&#8217;t actually pan out. For instance, see Lamport&#8217;s <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#garbag">commentary</a> on a paper he ended up on the author list of with Dijkstra.</p>
<p>The reason monads took off is <i>not</i> because they were the undeniably best way of handling state threading. Clean, for instance, uses a completely different mechanism that works just fine. Monads took off because they compose. You can construct domains specialized for what you want to do.</p>
<p>The tools for doing this composition work well enough for production use, but are anything but elegant. As soon as anyone has a good idea on how to do it right, it will sweep through the Haskell community like wildfire.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-letter-on-monads/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My thoughts on the BIO2010 report</title>
		<link>http://madhadron.com/my-thoughts-on-the-bio2010-report</link>
		<comments>http://madhadron.com/my-thoughts-on-the-bio2010-report#comments</comments>
		<pubDate>Sun, 30 Oct 2011 18:50:26 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[biology]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=240</guid>
		<description><![CDATA[The NIH and HHMI organized a much trumpeted report entitled BIO2010: Transforming Undergraduate Education for Future Research Biologists. It came out in 2003 to much acclaim. I read the crux of it (chapter 2, &#8220;A New Biology Curriculum&#8221;) last night. I don&#8217;t like it. I have some basic philosophical objections to their approach, and a [...]]]></description>
			<content:encoded><![CDATA[<p>The NIH and HHMI organized a much trumpeted report entitled <em><a href="http://www.nap.edu/openbook.php?isbn=0309085357">BIO2010: Transforming Undergraduate Education for Future Research Biologists</a></em>. It came out in 2003 to much acclaim. I read the crux of it (chapter 2, &#8220;A New Biology Curriculum&#8221;) last night. I don&#8217;t like it. I have some basic philosophical objections to their approach, and a lot of quibbles with the details as they stand. I&#8217;ll start with my quibbles, then I&#8217;ll attack the underlying philosophy, and propose a curriculum in my turn.</p>
<p>The chapter is organized into a list of core concepts to be taught from biology, physics, chemistry, engineering, and math and computer science, then a set of suggested curricula. I&#8217;ve broken my gripes down by section:</p>
<p><b>Biology</b>: Some of the concepts listed are important (&#8220;Biological systems obey the laws of chemistry and physics.&#8221; and &#8220;Lipids assemble with proteins to form membranes, which surround cells to separate them from their environment. Membranes also form distinct compartments within eukaryotic cells.&#8221;) and I dearly hope they are being taught already. Others are sloppy, tautological, or espouse absurdities such as &#8220;holistic&#8221; science.</p>
<p><b>Chemistry</b>: I know almost no chemistry and am not willing to comment on this section except for the fact that &#8220;computational methods and modeling&#8221; is a bullet point right alongside Lewis structures.</p>
<p><b>Physics</b>: The physics recommendations largely consist of a standard intro physics course for hard science majors with a few topics moved here or there. This isn&#8217;t a bad thing. I think that the best approach would be to take a 1960s edition of Halliday and Resnick and go through that in a year (for those who don&#8217;t know it, the early editions of Halliday and Resnick were clear, comprehensive, and had a doable number of extremely well selected problems).</p>
<p>The report also wants the students to learn by interacting with simulations. I can tell you from being on the receiving end of such an attempt and from anecdotal evidence in the community that such approaches are dismal failures. Let the course be analytical and experimental. Leave the computer out of it.</p>
<p><b>Engineering</b>: Aside from the incredibly broad heading, this has about the only interesting, concrete recommendation in the report. The introduction was written by the biologists who are convinced that systems engineering is all that they might really want. Then someone wrote a curriculum outline for the first third of a really solid neurobiology course. Whoever wrote that, bravo!</p>
<p><b>Mathematics and Computer Science</b>: Some of the topics confuse me (computability theory? why?). The gist of this section is to give the biology students special math classes where the computer can do all the work for them. Hamming said, &#8220;The computer is an extension of the body, not the mind.&#8221; (I found this quote in the preface to McNeil&#8217;s <em>Interactive Data Analysis</em>.) I agree that biologists should be able to program, and should really understand what BLAST does. This would be better taught as a one- or two-semester course on numerical analysis and relevant search methods, plus data structures such as ropes to handle long sequences. And it should be taught in <a href="http://www.teach-scheme.org/">Scheme</a> (or <a href="http://mitpress.mit.edu/sicp/">this</a> for the ambitious), not in &#8220;higher-level languages such as Matlab, Perl, or C&#8221; as the report recommends. Everyone who knows something about programming languages just stopped taking the computing recommendations of the report seriously with that list.</p>
<p>Apparently students should be taught data analysis. I completely agree. Astonishingly, <a href="http://en.wikipedia.org/wiki/John_Tukey">John Tukey</a> wrote some great books for this back in the &#8217;70s because he was teaching people data analysis. And it&#8217;s not just biologists who need this. Every scientist would benefit. Call up your local statistics department and get them to institute an &#8220;interdisciplinary&#8221; course based on John Tukey&#8217;s two classics <em><a href="http://www.amazon.com/Exploratory-Data-Analysis-Wilder-Tukey/dp/0201076160/ref=pd_bbs_sr_1/104-9810363-1018362?ie=UTF8&amp;s=books&amp;qid=1186940657&amp;sr=8-1">Exploratory Data Analysis</a></em> and <em><a href="http://www.amazon.com/Data-Analysis-Regression-Statistics-Addison-Wesley/dp/020104854X/ref=sr_1_1/104-9810363-1018362?ie=UTF8&amp;s=books&amp;qid=1186940671&amp;sr=8-1">Data Analysis and Regression</a></em>, or whatever modernized version they want to teach.</p>
<p>The recommended math comes down to the following classes in the math department: single and multivariable calculus, linear algebra, differential equations (more like <a href="http://www.amazon.com/Ordinary-Differential-Equations-V-Arnold/dp/0262510189">Arnol&#8217;d&#8217;s book</a> than what the engineers are taught), and probability and stochastic processes (this would be a great place to use Nelson&#8217;s <em><a href="http://www.math.princeton.edu/~nelson/books.html">Radically Elementary Probability Theory</a></em>). There, three years of math, a semester of computer science, and a semester of data analysis. That&#8217;s not so bad, is it?</p>
<p>The report also mentions that medical school admissions requirements govern a lot of what biology departments cover. My approach: ignore them. Physics departments do. If the biologists ignore the MCATs completely, then they will change to follow the biology.</p>
<p>The recommended solutions for dealing with not having the expertise to teach a course are frightening: &#8220;&#8230;taught&#8230;by a collaborating team of faculty from multiple departments&#8221; or &#8220;A mathematician or computer scientist might also be invited to give a guest lecture or two.&#8221; Two lectures isn&#8217;t long enough to have any real effect on students&#8217; mental processes, and team teaching makes the course scattered and disorderly. I speak from experience. Biology courses are often taught this way, and it&#8217;s absolutely useless.</p>
<p>There is an obsession throughout with having a course on modeling and simulation. I believe the rationale is that all the ugly mathematics stuff can be put into this, taught by one of those egg-headed math people, and the biology professors can go on doing exactly as they have been.</p>
<p>At one point it says, &#8220;Opportunities to learn mathematical skills in a rich content context will enhance conceptual understanding and procedural fluency.&#8221; No it won&#8217;t. Math professors should (and usually do not) ask students, &#8220;Why is this theorem interesting?&#8221; If a student has not reached a level of mental abstraction sufficient to answer such a question in the context of pure mathematics, the mathematics education has failed. Later it even recommends remedial math courses before calculus! What are you doing in college if you need that?</p>
<p>In an <a href="http://madhadron.auditblogs.com/2007/08/12/bio2010-part-1/">earlier post</a> I gave my quibbles with the language of the BIO2010 report. I promised to lay out an alternative curriculum proposal, but first I&#8217;m going to set forth my underlying philosophy. First a few principles:</p>
<dl>
<dt>Repeat a canon several times.</dt>
<dd>
<p>Choose a core set of techniques and ideas which will properly shape the students&#8217; minds. The shape of their mind when they come out is far more important than any particular collection of facts.</p>
<p>My sister once noted in surprise, the night before a calculus exam, &#8220;I can&#8217;t study for this class! I can only practice.&#8221; There is no body of things that a science student should know, only a body of things he should be able to do. Skills and mindsets take much longer to teach than facts, so it is important to be stingy with what gets space in the curriculum. For example, given familiarity with the simple SIR epidemiology model and with partial differential equations from mechanics, there is no conceptual difficulty with adding spatial effects to the SIR model. Unless it is a step to some other compelling mental skill, ditch it.</p>
</dd>
<dt>If math is your language, don&#8217;t work in translation.</dt>
<dd>You would not expect students of Italian literature to have everything taught in English translation except for one course where they played with Italian translation. Just so, if mathematics isn&#8217;t the language in which you teach your classes, adding a random course will not fix the situation. Having students take carefully designed courses in math, computer science, and physics is not the right approach. They should only take those courses which impart tools they will use in every biology class from there on out. If no biology class in your department uses complex analysis, then don&#8217;t require the class of your students. If almost every class needs stochastic processes, then abstract that out and require a course in it. If no biology class in your department uses math, then if you find this state of affairs unacceptable, it falls upon you to correct it, not to delegate the task to a mathematician who spends his days worrying about category theory.</dd>
<dt>Prerequisite courses are for the universal, not the potentially useful.</dt>
<dd>
<p>Physicists have separate calculus courses not because it&#8217;s a good idea for students to know calculus, but because the tools are used in every physics course the students take, so it is more efficient to deal with the tools once, uniformly, and assume them from there on out. Topics like Laguerre polynomials which show up in only some of the following physics classes are simply taught alongside the physics, not abstracted out.</p>
<p>Does every course require the students to know what a calcium ion is, what it does in solution, and how it combines with things? Then general chemistry seems like a compelling prerequisite.</p>
</dd>
<dt>A course is the minimal mental pattern of a practitioner.</dt>
<dd>Students taking a course shouldn&#8217;t have to memorize facts. They are there to pick up a core of tools and structures which shapes their mind for research in a field. I propose a rule of thumb: if a practitioner uses something weekly, it should be in the introductory course; monthly, in the advanced undergraduate course; annually, in the graduate course.</dd>
</dl>
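<p>For readers who have not met it, the SIR model mentioned under the first principle fits in a few lines. This is a sketch under assumptions of my own choosing: forward Euler integration, population fractions rather than head counts, and hypothetical parameter names <tt>beta</tt> (infection rate) and <tt>gamma</tt> (recovery rate).</p>

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance susceptible, infected, and recovered fractions by one Euler step."""
    new_infections = beta * s * i * dt   # susceptibles meeting infecteds
    new_recoveries = gamma * i * dt      # infecteds recovering at a constant rate
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate(s0, i0, r0, beta, gamma, dt, steps):
    """Return the whole trajectory as a list of (s, i, r) tuples."""
    trajectory = [(s0, i0, r0)]
    for _ in range(steps):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma, dt))
    return trajectory


# Illustrative values only: 1% initially infected, beta / gamma = 3.
trajectory = simulate(0.99, 0.01, 0.0, beta=0.3, gamma=0.1, dt=0.1, steps=1000)
peak_infected = max(i for _, i, _ in trajectory)
```

<p>Adding spatial effects means letting the three fractions depend on position and adding a diffusion term to each update, which is the step that needs no new concept once partial differential equations are familiar.</p>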
<p>Finally, the report wants physics and chemistry departments to modify their introductory courses for the needs of the biology department, or to offer special introductory courses. No. Many students are undecided as to their major in their first year, and take the introductory courses in all the sciences. I think each of these courses should be taught as if to a class full of majors in the department teaching the course. For instance, why should physics departments throw out relativity? And if an ostensible biology student seeing relativity decides that they like that more, shouldn&#8217;t they become a physics major?</p>
<p>I&#8217;ll post the curriculum proposal next.</p>
<p>(<b>Update</b>: Talking to my adviser, his comment was that the single most important thing that could be done would be to cut the administrative burden (not including teaching) of principal investigators from half to two thirds of their time to something much, much smaller. I think perhaps five to ten percent should be the upper limit.)</p>
<p>Having given some principles to guide curriculum design, I&#8217;ll propose some content. Remember, the goal is a mature and sophisticated mind, not a knowledge of any particular subfield. I&#8217;m not qualified to select the content for such a curriculum, so please suggest changes. We&#8217;ll begin with a survey of the advanced undergraduate courses.</p>
<p><strong>Neurobiology</strong>: I give neurobiology its own place because certain aspects of it are very mature and therefore a prime field for training minds. By the same criterion, immunology makes no appearance in this list, as the field is too contorted, at least as represented in the textbooks, to usefully train anyone.</p>
<p>The first milestone of the course is a physical understanding of the Hodgkin-Huxley model, and how it came to be. This will probably take half a semester.</p>
<p>After Hodgkin-Huxley, I am out of my depth. The two obvious directions are the essential features of the synapse, and the basics of neural architecture, probably using vision and hearing as the models. <a href="http://www.his.sunderland.ac.uk/ps/worksh2/denham.pdf">This brief discussion</a> (warning: PDF) has some interesting references, but I have not had time to follow them up.</p>
<p><strong>Molecular biology</strong> has two aspects. The first is structural understanding of the three major biopolymers (DNA, RNA, and protein); the second is designing protocols to manipulate them. Students should not only understand the structural differences between GC rich vs. AT rich helices (at the level of something like Calladine&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0121550893%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0121550893%253FSubscriptionId=02ZH6J1W0649DTNS6002">Understanding DNA</a></em>), but also why each step in a miniprep is there, and how to calculate what it should be in the absence of a protocol. This means they need quantitative descriptions of sedimentation and separation by centrifugation, chromatography, and amplification by various kinds of PCR. Then move on to models of transcription and translation, and mechanics of molecular motors. This course will probably take about a year.</p>
<p><strong>Evolution and ecology</strong> begins with population genetics (Gillespie&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0801880092%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0801880092%253FSubscriptionId=02ZH6J1W0649DTNS6002">Population Genetics</a></em> seems about right), epidemiology (some selection from Diekmann and Heesterbeek&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0471492418%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0471492418%253FSubscriptionId=02ZH6J1W0649DTNS6002">Mathematical Epidemiology of Infectious Diseases</a></em>), something about species interactions and evolution of behavior (a couple bits and pieces of Gintis&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0691009430%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0691009430%253FSubscriptionId=02ZH6J1W0649DTNS6002">Game Theory Evolving</a></em>), bioenergetics and flows of energy carrying molecules both in organisms, which gives a chance to discuss metabolism, and in ecosystems. These selected topics constitute a semester.</p>
<p>The next semester is genetic mapping and screening (design of screens and selections, RFLP and other techniques) and phylogeny (constructing the universal phylogenetic tree, with visits to representative areas such as evolution of horses, gene transfer in bacteria, and parallel evolution of fluorescent proteins in corals).</p>
<p><strong>Biomechanics</strong> is the successor to anatomy, merged with mechanical engineering and developmental biology. It begins with a couple weeks of statics such as Galileo&#8217;s argument on the scaling of bones, pressures on bacterial membranes and cell walls, DNA pressure and packing in viral capsids.</p>
<p>Then it moves to dynamics: swimming of microscopic and macroscopic creatures, motility of cells by actin, circulation in the body, and basic models of cell migration and patterning in development.</p>
<p>What about supporting courses?</p>
<p><strong>Single and multivariable calculus</strong> are necessary in all the courses, as are <strong>ordinary differential equations</strong>, with an emphasis on phase space analysis and qualitative features, but definitely including the Fourier transform. Partial differential equations are used only in biomechanics and evolution and ecology, and in neither place in really intricate ways, so they can be handled in those classes. <strong>Probability and stochastic processes</strong> appear in all of the courses except biomechanics, and possibly there as well depending on the exact selection of topics. They are also necessary for statistics.</p>
<p><strong>Chemistry</strong> of ions in solutions is necessary for neurobiology, chemical thermodynamics for evolution and ecology and molecular biology. Organic chemistry is necessary to make sense of the pieces in molecular biology. A general chemistry course with a really physical bent is probably enough as a starting point.</p>
<p><strong>Physics</strong> of point particles and rigid bodies, thermodynamics, and basic electromagnetism are all that need to be covered outside of the courses. I expect most science students to take introductory physics, chemistry, and biology as part of choosing their field of study, so these courses cannot be altered too much.</p>
<p>All of the classes can provide opportunities for students to dig into real data, so a separate <strong>data analysis and statistics</strong> course makes sense. I think a selection from Tukey&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0201076160%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0201076160%253FSubscriptionId=02ZH6J1W0649DTNS6002">Exploratory Data Analysis</a></em> and <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=020104854X%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/020104854X%253FSubscriptionId=02ZH6J1W0649DTNS6002">Data Analysis and Regression</a></em> and Kiefer&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0387964207%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0387964207%253FSubscriptionId=02ZH6J1W0649DTNS6002">Introduction to Statistical Inference</a></em> would form a good basis for such a course. All science students should have such a course, so there is no reason to specialize it for biology.</p>
<p>The data analysis course and the evolution and ecology course both involve an understanding of programming and computational cost of algorithms. A computer science course on <strong>basic numerical analysis</strong> and <strong>search, sorting, and alignment algorithms</strong> would be a useful companion to both. Students should write all their own code for the assignments in this class, and use almost no libraries and certainly no algorithmic black boxes. About the only way to actually do this in a semester is to teach the course in <a href="http://www.teach-scheme.org/">Scheme</a> or a similar language.</p>
<p>What goes in the introductory course? My inclination is the first two lectures of material from each major section of the four courses, rearranged into a more unified year-long presentation.</p>
<p>What laboratory classes should students take? I would propose a separate introductory laboratory class of experiments that can be done in one or two sessions, separate from the introductory biology course, but covering what of that material is checkable in the lab in reasonable time.</p>
<p>Molecular biology is an obvious candidate for a lab: learning how to prepare DNA, digest and religate it, run western blots, and do genetic engineering in <em>E. coli</em>.</p>
<p>Almost everyone uses a microscope, so a one-semester laboratory on optical microscopy covering material something like Shinya Inou&#233;&#8217;s <em><a href="http://www.amazon.com/gp/redirect.html%3FASIN=0306455315%26tag=ws%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0306455315%253FSubscriptionId=02ZH6J1W0649DTNS6002">Video Microscopy</a></em> would be a good time investment. It would be followed by an intermediate lab in which students carry out three or four longer experiments in small groups over the semester.</p>
<p>The BIO2010 report recommends a research seminar, which I think is a good idea. The <a href="http://www.rockefeller.edu">Rockefeller University</a> requires such a course of its graduate students. The class reads two of the classic papers in biology each week and meets with two faculty members over lunch to go over them.</p>
<p>In the end we have a schedule that looks like this:</p>
<p><strong>First year</strong>:</p>
<ul>
<li>Introductory biology (all year)</li>
<li>Introductory chemistry (all year)</li>
<li>Introductory physics (all year)</li>
<li>Single and multivariable calculus (all year)</li>
</ul>
<p><strong>Second year</strong>:</p>
<ul>
<li>Introductory biology laboratory (all year)</li>
<li>Differential equations (fall) / Probability and stochastic processes (spring)</li>
<li>Neurobiology (fall) / Biomechanics (spring)</li>
<li>Numerical analysis (fall) / Data analysis and statistics (spring)</li>
</ul>
<p><strong>Third year</strong>:</p>
<ul>
<li>Evolution and Ecology (all year)</li>
<li>Molecular biology (all year)</li>
<li>Microscopy lab (fall) / Intermediate lab (spring)</li>
<li>Research seminar (all year)</li>
</ul>
<p><strong>Fourth year</strong>:</p>
<ul>
<li>Molecular biology laboratory (fall)</li>
</ul>
<p>This leaves plenty of space for electives and general requirements. Now everyone have at it and rip this to shreds.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/my-thoughts-on-the-bio2010-report/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Astonishing things in informatics</title>
		<link>http://madhadron.com/astonishing-things-in-informatics</link>
		<comments>http://madhadron.com/astonishing-things-in-informatics#comments</comments>
		<pubDate>Tue, 18 Oct 2011 22:30:32 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=239</guid>
		<description><![CDATA[Some things astonish me more the longer I know them. Some examples from informatics: Reliable and complex from simple, brittle pieces TCP guarantees the arrival of intact data, short of the entire network fragmenting. Some telecom systems claim nine nines of uptime. Large data centers expect disk failure (PDF). No single person understands the [...]]]></description>
			<content:encoded><![CDATA[<p>Some things astonish me more the longer I know them. Some examples from informatics:</p>
<h2 id="reliable-and-complex-from-simple-brittle-pieces">Reliable and complex from simple, brittle pieces</h2>
<p><a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</a> guarantees the arrival of intact data, short of the entire network fragmenting. Some <a href="http://en.wikipedia.org/wiki/Erlang_(programming_language)">telecom systems</a> claim nine nines of uptime. Large data centers expect <a href="http://labs.google.com/papers/disk_failures.pdf">disk failure</a> (PDF).</p>
<p>No single person understands the whole of one of Intel&#8217;s chips, or those from AMD, or any other desktop microprocessor today. This is <a href="http://en.wikipedia.org/wiki/Conway's_Law">Conway&#8217;s law</a> at work: &#8220;&#8230;organizations which design systems &#8230; are constrained to produce designs which are copies of the communication structures of these organizations.&#8221; Turn it around: a system beyond the grasp of an individual human can be built by organizing people along its lines, and designing the organization <em>can</em> be done by one human.</p>
<p>In groups we can build things beyond the grasp of the individual, that are more robust than any part we have available to put into it. If that doesn&#8217;t give you hope for our species, what does?</p>
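<p>The trick of getting reliability out of unreliable parts can be sketched in a few lines. This is not TCP, only a toy stop-and-wait scheme in Python; the channel <tt>lossy_send</tt>, the function <tt>reliable_transfer</tt>, and the <tt>max_tries</tt> cutoff are hypothetical names and choices made for illustration.</p>

```python
import random


def lossy_send(packet, rng, loss_rate):
    """Hypothetical unreliable channel: deliver the packet or drop it (None)."""
    return packet if rng.random() >= loss_rate else None


def reliable_transfer(chunks, rng, loss_rate, max_tries=1000):
    """Stop-and-wait: resend each chunk until its acknowledgement arrives."""
    received = []   # what the receiver has accepted so far, in order
    attempts = 0    # how many data packets the sender put on the wire
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            attempts += 1
            if lossy_send((seq, chunk), rng, loss_rate) is None:
                continue  # data packet dropped; the sender times out and resends
            if len(received) == seq:
                received.append(chunk)  # first arrival; later duplicates are ignored
            if lossy_send(seq, rng, loss_rate) is not None:
                break  # acknowledgement got through; move on to the next chunk
        else:
            raise RuntimeError("gave up after max_tries")
    return received, attempts


# Half of all packets vanish, yet the data arrives intact and in order.
received, attempts = reliable_transfer(list("reliable"), random.Random(0), 0.5)
```

<p>The price of the guarantee is paid in repeated sends, not in better wires: <tt>attempts</tt> grows as the channel gets worse, but <tt>received</tt> stays the same.</p>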
<h2 id="abstractions">Abstractions</h2>
<p>The <a href="http://okmij.org/ftp/Scheme/misc.html#sokuza-kanren">simplest mini-Prolog</a> is almost trivial. It&#8217;s a few functions, none more than a few lines of code. Seen in the host language, the functions are inscrutable, as opaque as looking up at the surface of the water.</p>
<p>On the far side, unification over Horn clauses (i.e., Prolog) is a consistent and self contained view of computing. You can peer down at the surface where Prolog begins and lambda calculus ends, but everything on the far side is hazy and distorted.</p>
<p>I always feel a sense of vertigo when I cross such a boundary, and even more when I erect one.</p>
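<p>To give a feel for how small the pieces are, here is a sketch of unification, the operation at the heart of such a mini-Prolog. It is written in Python and is not the code linked above; <tt>Var</tt>, <tt>walk</tt>, and <tt>unify</tt> are names chosen for illustration, terms are plain tuples, and there is no occurs check.</p>

```python
class Var:
    """A logic variable; two variables are equal only if they are the same object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "?" + self.name


def walk(term, subst):
    """Follow variable bindings in `subst` until reaching a value or a free variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return a substitution extending `subst` that makes a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        return subst if a is b else {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None  # one pair of components clashed
        return subst
    return subst if a == b else None  # atoms unify only when equal


# Matching parent(X, bob) against parent(alice, Y) binds X and Y.
x, y = Var("x"), Var("y")
bindings = unify(("parent", x, "bob"), ("parent", "alice", y), {})
```

<p>Read from the Python side, <tt>unify</tt> is a function that shuffles dictionaries. Read from the other side, it is what it means for two facts to agree.</p>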
<h2 id="the-infinite-horizon-of-homoiconic-languages">The infinite horizon of homoiconic languages</h2>
<p>In Lisp, all is Lisp. In Smalltalk, all is Smalltalk. In Prolog, all is Prolog. These languages are easily defined in themselves. Their interpreters and compilers live as automata in the world which they define.</p>
<p>This imparts a feeling of space. In Haskell or C you can see the boundary clearly. Your semantics, your type system, the hard wall of the compiler, sit in plain view. In the homoiconic languages, the compiler is just another function, another object, another rule. The boundary that defines the edge of your semantics recedes into the distance and vanishes.</p>
<p>Practically, such languages are incredibly easy to bootstrap. If you can just breathe life into some kind of interpreter, you open a door to wonderland, and may step through and forget whence you came.</p>
<h2 id="intrinsically-hard-problems">Intrinsically hard problems</h2>
<p>The implicit trust of mathematicians for millennia was that if we were only smart enough, if our reason were just powerful enough, we could calculate anything in the universe.</p>
<p>Complexity theory ended that. There are problems in our universe on which reason may break itself in vain, and they are named NP-complete. In a universe of subatomic particles colliding and appearing and disappearing, the seeming abstraction of how hard a problem is to solve is bound by universal laws.</p>
<h2 id="artificial-intelligence-isnt-human-intelligence">Artificial intelligence isn&#8217;t human intelligence</h2>
<p>Computers bear no resemblance to human cognition. We did not make these machines in our image. Indeed, one of the successes of the machines was to make us realize that we are not computers. Our thought processes do not work like that.</p>
<p>We are things capable of taking decisions based on the world around us. We created other things likewise capable, but they do so in a manner entirely removed from our own. After living with our creations, I can only wonder at the naivete of those who could think a being that created us would do so in his own image.</p>
<h2 id="updating-technical-consensus-is-very-very-hard">Updating technical consensus is very, very hard</h2>
<p>JavaScript is broken. Everyone knows and acknowledges this. Books are written about how to avoid its jagged edges.</p>
<p>The Internet was not designed for web applications or streaming media or commerce or identification of its users.</p>
<p>Unicode is one of the most remarkable consensuses of mankind.</p>
<p>Technical consensus is necessary for complicated artifacts, but no one knows how to change it. There are many obvious things that could be fixed in JavaScript, things that no one would dispute, but socially we don&#8217;t know how.</p>
<h2 id="publicprivate-key-cryptography-and-webs-of-trust">Public/private key cryptography and webs of trust</h2>
<p>You can publish a key for all the world to see, I can encrypt a message with it, and we can both be sure that no one but you, the holder of the matching private key, can read it. <a href="http://en.wikipedia.org/wiki/Public-key_cryptography">Public/private key cryptography</a>, like building reliable systems from unreliable components, seems too good to be true. Of course, there&#8217;s a catch. How do I know someone didn&#8217;t replace your public key with his own?</p>
<p>This is where <a href="http://en.wikipedia.org/wiki/Web_of_trust">webs of trust</a> come in. I may not trust the key I got from you, but I may trust that a mutual friend says it&#8217;s the right key. That may not scale very well &#8211; why should I trust the friend of a friend of a friend? &#8211; but it needn&#8217;t scale very far. The <a href="http://en.wikipedia.org/wiki/Six_degrees_of_separation">small world effect</a> says that if the web is sufficiently comprehensive, I am unlikely to need very long chains of trust.</p>
<p>In practice, such cryptography hasn&#8217;t become ubiquitous enough for the small world effect to really kick in, which is a shame.</p>
<h2 id="self-destruction-of-communities">Self destruction of communities</h2>
<p>In any sufficiently large and general online community, cliques arise that try to destroy the community, along with lots of <a href="http://www.shirky.com/writings/group_enemy.html">other odd behavior</a>.</p>
<p>If the interactions are limited enough, then this kind of abuse tends not to arise because the technical limitations make it too much work. It is hard to imagine how to go on a rampage in Twitter. In a small community, the social pressure to behave is high, and the probability of someone resistant or hostile to such pressure being a part of the community is small.</p>
<p>When the interactions are general enough and the community large enough, groups arise who view the community around them as their intellectual prey.</p>
<p>And then a system of controls and enforcers is put in place if there wasn&#8217;t already one. The <a href="http://daniel.haxx.se/irchistory.html">history of IRC</a> is one of networks splitting off and adding controls and enforcer powers. Usenet continually suffered from flamewars. Spam appeared in email, not to mention mail bombs.</p>
<p>We have built communities again and again from scratch on the Internet. It appears that a certain fraction of the human race is naturally destructive and must be held in check, and that some part of every community&#8217;s resources must be devoted to that.</p>
<p>It&#8217;s a strange thing to learn from a collection of wires and transistors.</p>
<h2 id="cryptographic-hash-functions">Cryptographic hash functions</h2>
<p>Imagine a function that is easy to compute but almost impossible to run backwards: given an output, there is no practical way to find an input that produces it. This is another of those pieces of magic that makes modern computing possible.</p>
<p>Add to that two more properties: small changes in input cause large changes in output, and no one can feasibly find two inputs that yield the same output. Now you have a tool to reduce anything to an identifying integer. At some level, computer security is controlling who has the data and who only has the identifying integer.</p>
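<p>A small illustration, assuming SHA-256 from Python&#8217;s standard <code>hashlib</code> as the hash function (the text above does not name one) and two made-up input strings that differ in a single character:</p>
<pre>
```python
import hashlib

def digest(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between two hex digests of equal length."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")

# Two hypothetical inputs; only the final character differs.
first = digest("astonishing things in informatics")
second = digest("astonishing things in informatic5")

print(first)
print(second)
print(differing_bits(first, second), "of 256 bits differ")
```
</pre>
<p>The two digests share no visible pattern. For a good hash function, about half of the 256 bits flip on average, however small the change to the input.</p>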
<h2 id="accept-liberally-produce-conservatively-and-go-mad">Accept liberally, produce conservatively, and go mad</h2>
<p><a href="http://www.ietf.org/rfc/rfc791.txt">RFC 791</a> says &#8220;Be liberal in what you accept, and conservative in what you send.&#8221; Since RFC 791 defines IP, the foundation of the Internet, this is, for all practical purposes, the Internet&#8217;s motto. And the Internet grew insanely fast.</p>
<p>The World Wide Web did the same thing, resulting in broken HTML, distorted HTTP requests, and growth more usually associated with bacteria.</p>
<p>You must accept that many people will have to spend a lot of time cleaning the mess to the best of their ability at some point in the future. On the World Wide Web, at least 40% of HTML was malformed in 1996, and probably much, much more (it&#8217;s hard to tell from the way the <a href="http://www.paulaoki.com/papers/www5-color.pdf">data are presented</a>). In 2001, there were at least 140 invalid HTML documents on the web for every valid one (<a href="http://www.ub.uib.no/elpub/2001/h/413001/">source</a>).</p>
<p>And a <a href="http://www.w3.org/">lot</a> <a href="http://www.alistapart.com/">of</a> <a href="http://www.webstandards.org/">people</a> have spent the last fifteen years trying to fix this. The situation has actually improved, but think how much time and acrimony has been spent.</p>
<p>On the other hand, without this path, the web would have had only a brief history as a medium for <a href="http://en.wikipedia.org/wiki/Les_Horribles_Cernettes">pictures of women</a> as opposed to its multidecade run as the largest source of images of the female of the species ever created by man.</p>
<p>It&#8217;s apparently better to let everyone in, let it go to hell, and try to clean up later. To the mathematical mind, this is a boggling principle.</p>
<h2 id="concurrent-is-different">Concurrent is different</h2>
<p>Concurrency, or multiprogramming as it used to be called, has long been the bugbear waiting to snatch the unsuspecting programmer. Most programmers developed an implicit view of their craft based on some variation of a Turing machine. They work by setting up and running that machine in their head. This has crippled more programmers than I care to think about. (Dijkstra had many unkind things to say about <a href="http://userweb.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD917.html">such issues</a>).</p>
<p>The human brain can keep track of one machine going back and forth on the tape. At each stage, there is only one possibility. The programmer develops an intuition based on how he would move chess pieces around on a board. But as soon as there is more than one machine on the tape, this fails, completely and utterly. There are now two chess players, and even when we only allow them to move their own pieces, that game has absorbed more of man&#8217;s intellectual energies over the course of history than programming has.</p>
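<p>A minimal sketch of why the single-machine intuition fails, in Python: two hypothetical threads each read a shared counter and then write back the value plus one. The step names and the exhaustive scheduler are assumptions made for illustration, not any real threading API; the code enumerates every possible interleaving instead of running real threads.</p>
<pre>
```python
from itertools import combinations

# Each hypothetical thread performs two steps: tmp = shared; shared = tmp + 1
STEPS = ["read", "write"]

def run(schedule):
    """Execute one interleaving; schedule lists which thread takes each step."""
    shared = 0
    tmp = {0: None, 1: None}
    position = {0: 0, 1: 0}
    for thread in schedule:
        step = STEPS[position[thread]]
        if step == "read":
            tmp[thread] = shared
        else:
            shared = tmp[thread] + 1
        position[thread] += 1
    return shared

def all_schedules():
    """Yield every ordering of four steps in which each thread takes two."""
    for slots in combinations(range(4), 2):
        yield tuple(0 if i in slots else 1 for i in range(4))

# Group the schedules by the final value they leave in the counter.
outcomes = {}
for schedule in all_schedules():
    outcomes.setdefault(run(schedule), []).append(schedule)

for result, schedules in sorted(outcomes.items()):
    print(result, len(schedules))
```
</pre>
<p>Of the six possible schedules, only the two in which one thread finishes before the other starts leave the counter at 2. The other four lose an update, and nothing in either thread&#8217;s own sequence of steps reveals it.</p>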
<p>Concurrent programming is more difficult than single threaded programming, but more importantly, it&#8217;s just different. Once you have trashed the little Turing machine in your head, there is a lot <a href="http://www.amazon.com/Method-Multiprogramming-Monographs-Computer-Science/dp/038798870X">known</a> <a href="http://portal.acm.org/citation.cfm?doid2=1454456.1454462">about</a> <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html">multiprogramming</a>. But what is astonishing in this? It&#8217;s not concurrency, though there are gems to be found there. It&#8217;s that we spend endless work trying to pander to programmers crippled by a curable mental affliction.</p>
<h2 id="the-creation-of-a-network">The creation of a network</h2>
<p>The Internet went from 213 hosts in 1981 to 29 million in 1998 to 681 million last year (<a href="http://www.isc.org/solutions/survey/history">source</a>). I remember when the only contact information on business cards was telephone numbers and mailing addresses, before FAX numbers were added. Today the telephone number is secondary to the email address or web site.</p>
<p>My mother was 17 when the first ARPAnet link was established. TCP/IP became the sole accepted protocol on the Internet in 1983, the year I was born.</p>
<p>The Internet is something new and different, though it has become pass&#233; to say so. Pundits and bloggers and talking heads pontificate about the transformative powers of this or that trend. They have no more idea of what&#8217;s happening than the neurons in our brains understand our thoughts. (I should add that the Internet is not a brain, and that analogies between the Internet and brains are roughly as useful as analogies between the Internet and cottage cheese; see my astonishment, above, that we created machines not in the image of ourselves).</p>
<p>How outlandish does the phrase &#8220;The Internet thinks you&#8217;re cute&#8221; sound? We&#8217;re not far from it seeming almost normal. Now go read <a href="http://project-apollo.net/mos/">A Miracle of Science</a>, where Mars is a group mind and at some point says to someone, &#8220;Mars likes you.&#8221;</p>
<h2 id="epilogue">Epilogue</h2>
<p>Now that the list is made, why did I choose things from informatics? There are things in math and physics and music that astonish me more over the years, but they tend to feel eternal. They are truths the world may reach towards, but they will not meet it halfway.</p>
<p>The things I listed in informatics are visceral, low-brow, as though we were playing in a mud puddle from which the Loch Ness monster emerged. And they are, in so many cases, about people, about how people interact and how people fail and how people end up in horrible messes.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/astonishing-things-in-informatics/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A novelist&#8217;s yak shaving</title>
		<link>http://madhadron.com/a-novelists-yak-shaving</link>
		<comments>http://madhadron.com/a-novelists-yak-shaving#comments</comments>
		<pubDate>Mon, 29 Aug 2011 04:00:21 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[Writer's Journal]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=236</guid>
		<description><![CDATA[Yak shaving is a term for the seemingly unrelated things you end up doing while trying to accomplish some other task. I found myself with a delightful example while writing today. A character has been slippery. Very well, who is this person? I&#8217;ll digress and figure her out, tell her story to myself, so she [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://projects.csail.mit.edu/gsb/old-archive/gsb-archive/gsb2000-02-11.html">Yak shaving</a> is a term for the seemingly unrelated things you end up doing while trying to accomplish some other task. I found myself with a delightful example while writing today.</p>
<p>A character has been slippery. Very well, who is this person? I&#8217;ll digress and figure her out, tell her story to myself, so she can walk into my novel a breathing creature.</p>
<p>This implies significant yak shaving even when writing about Earth, unless you&#8217;re writing about the town you live in, but as soon as you create a fictional world, you must digress from the character to construct the society that produced her, her family and upbringing. They, in turn, require a setting to live in. This is how worlds grow vastly beyond the small number of settings a novel actually traverses.</p>
<p>On a planet&#8217;s surface, the construction of areas, the effects of geography and history, are fairly familiar to most reasonably read and traveled people, but this book isn&#8217;t on a planet. Suddenly I find myself reading about orbital velocities and composition of trans-Neptunian objects.</p>
<p>The trick is minimizing it. The physicist in me wants to calculate transfer orbits and slingshots and velocity matching and the transfer orbits to send material insystem and the probable system of volatile materials futures used to hedge materials being sent in this way. That way lies madness and no character. I&#8217;m writing a novel, not a mathematics text (though there is that idea I have for a series of love letters involving Bessel functions). To quote Henry James, &#8220;Up to what point is such and such a development <i>indispensable</i> to the interest [of the subject]?&#8221;[preface to Roderick Hudson] and he was talking about the book itself, not the yak shaving of character predevelopment.</p>
<p>I chose to shave just enough yak fur off the astrophysics to give the character a setting, and I&#8217;ll adjust my narrative to eschew any details. After all, what reader wants to hear about the numerical calculation of chaotic transfer orbits from a position leaving a trans-Neptunian dwarf planet&#8217;s transfer orbit to a cometary core in the inner Oort cloud?</p>
<p>Well, given my friends, probably quite a few, but they will have to be disappointed.</p>
<p>It was all worth it, though, when I traced the chain back out to the character herself, and as I told myself her story there came a moment when she sat up and drew breath in my mind, no longer a role in a narrative but a conception of a fellow human being.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-novelists-yak-shaving/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The iniquities of the Unix shell</title>
		<link>http://madhadron.com/the-iniquities-of-the-unix-shell</link>
		<comments>http://madhadron.com/the-iniquities-of-the-unix-shell#comments</comments>
		<pubDate>Fri, 29 Jul 2011 20:52:46 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=227</guid>
		<description><![CDATA[Most bioinformaticists that I know regard the Unix shell as their interface to their computer. There are other things floating around out there in the depths of whatever Unix-like operating system they run, but in their minds everything passes through the shell as the milieu in which computing takes place. This is not only wrong, [...]]]></description>
			<content:encoded><![CDATA[<p>Most bioinformaticists that I know regard the Unix shell as their interface to their computer.  There are other things floating around out there in the depths of whatever Unix-like operating system they run, but in their minds everything passes through the shell as the milieu in which computing takes place.</p>
<p>This is not only wrong, it is dangerously wrong.  I&#8217;ll explain why it&#8217;s wrong, then come back to the dangerous part afterwards.</p>
<p>Everyone has learned some variation of the idealized von Neumann computing machine: a processor that executes one instruction at a time and some contiguous bank of memory which it is connected to. Instructions in the processor can take a memory address in a register and read the data found at that address in memory into another register, or write the contents of that other register to that address in memory.  Perhaps the registers are replaced by a stack.  The memory might be layered, with some parts being faster to access and smaller in size, and others increasingly large and slow.</p>
<p>A program specifies the behavior of this machine.  Typically they start with data at one place in memory and produce other data in another place in memory.  This should all sound very familiar.  A way of specifying programs is an interface to the computer.</p>
<p>So does a shell script constitute a program for this machine?</p>
<p>Many of my readers doubtless said yes.  After all, it begins with data at one address (which happens to be a file in the filesystem, but that&#8217;s a memory address) and generally leaves data at another address (again in the file system, or perhaps stdout).  It specifies the behavior of the machine.  How could it not be a program for this machine?</p>
<p>It specifies the behavior of a machine, but not of this machine.  When you run <code>ls | tail -n 7</code>, you do not have a coherent piece of memory.  The operating system establishes the appearance of a contiguous chunk of memory for <code>ls</code>, and it establishes a second, completely disjoint chunk of memory for <code>tail</code>.  In effect, there are two machines.  The operating system goes through contortions to hand off a starting pattern of data to <code>tail</code> and to do something with the final pattern of data from <code>ls</code>.  The shell script describes a machine more like a prototyping board in electrical engineering.  The Unix shell is a more convenient way of starting programs than keying them in by hand with switches on the front of the machine.</p>
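<p>To make the two machines visible, here is a hedged sketch in Python that wires up the same pipeline by hand with the standard <code>subprocess</code> module. The function name <code>last_lines_of_listing</code> is made up for illustration, and the sketch assumes a Unix-like system with <code>ls</code> and <code>tail</code> on the path.</p>
<pre>
```python
import subprocess

def last_lines_of_listing(directory: str, count: int) -> list:
    """Run ls and tail as two separate processes joined by a pipe."""
    ls = subprocess.Popen(["ls", directory], stdout=subprocess.PIPE)
    tail = subprocess.Popen(
        ["tail", "-n", str(count)], stdin=ls.stdout, stdout=subprocess.PIPE
    )
    ls.stdout.close()  # our copy of the pipe end; tail keeps its own
    output, _ = tail.communicate()
    ls.wait()
    # Two children, two process ids, two disjoint address spaces.
    assert ls.pid != tail.pid
    return output.decode().splitlines()

print(last_lines_of_listing("/", 7))
```
</pre>
<p>Neither process can see the other&#8217;s memory. The only thing they share is the pipe the operating system strings between them, and everything that crosses it must be flattened into bytes first.</p>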
<p>Why is this error dangerous?</p>
<p>If you regard the Unix shell as the interface to the machine, then writing a program consists of linking up pieces in the shell.  The programmer may have to venture into some other world in order to create pieces, but those pieces are thought of as the atomic units.  Among such programmers, all projects will end up producing shell commands.  Each command will have its own file format, or ape the format of another command.  There will be no publication of algorithms, only publication of shell commands.  After a certain point the programmers realize that they are running out of descriptive names that are short enough to type comfortably.  They begin creating namespaces by having a single command accept subcommands: foobar create, foobar delete, foobar check, etc.</p>
<p>Among such programmers much of the &#8220;work&#8221; of the field will consist of dealing with file formats, resolving naming conflicts, and trying to find bigger, better ways to connect pieces, never realizing that all of these are irrelevant.</p>
<p>A program consists of algorithms that run on data structures.  Anything that interferes with running the algorithms on the data structures is part of the problem set, not the solution set.  File formats to get data from one program to another are a problem.  Leave the result of an algorithm in memory for the next algorithm to operate on.  Naming every possible operation and adding subcommands to avoid conflicts and reuse names is a problem.  Use a language with namespace support, like anything designed since the 1970s.  Bigger and better ways to connect shell commands &#8212; these go under the moniker of &#8220;workflow managers&#8221; &#8212; are a problem.  Calling conventions ceased to be a topic of controversy in programming languages in the 1970s, and instrumenting programs to monitor and log their execution has spawned that exotic tool, the &#8220;debugger&#8221;.</p>
<p>The shell mindset results in two dangers.  First, as the cruft wedged between algorithms increases, more and more of the field spends more and more of its time fighting through it instead of real problems. Estimating from my own experience and observation of my colleagues, most bioinformaticists today spend between 90% and 100% of their time stuck in cruft.  Methods are chosen because the file formats are compatible, not because of any underlying suitability.  Second, algorithms vanish from the field.  They are not published.  They are not analyzed.  They are not seen.  I am not worried about a lack of understanding of the heuristics for solving NP-complete problems like sequence matching, though I would love to be.  I&#8217;m worried about the number of bioinformaticists who don&#8217;t understand the difference between an O(n) and an O(n<sup>2</sup>) algorithm, and don&#8217;t realize that it matters.</p>
<p>I don&#8217;t know how to halt this process.  It may be too late.  Biologists are pouring into the field and being taught that the irrelevancies and trivialities of file formats and workflow managers are legitimate work.  They go back and tell their colleagues that it&#8217;s all very complicated and difficult without ever seeing the really difficult parts of programming.  We may be doomed.</p>
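<p>For the record, the difference looks like this. The sketch below, in Python, checks a list for duplicates two ways and counts the steps each takes. The function names and the made-up list of read identifiers are assumptions for illustration, and the linear version leans on set lookups taking constant time on average.</p>
<pre>
```python
def has_duplicate_quadratic(items):
    """Compare every pair: about n*n/2 comparisons for n items."""
    steps = 0
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            steps += 1
            if a == b:
                return True, steps
    return False, steps

def has_duplicate_linear(items):
    """Remember what has been seen in a set: one step per item."""
    steps = 0
    seen = set()
    for a in items:
        steps += 1
        if a in seen:
            return True, steps
        seen.add(a)
    return False, steps

# Hypothetical input: 2000 distinct identifiers, the worst case for both.
reads = [f"read{i}" for i in range(2000)]
print(has_duplicate_quadratic(reads))  # (False, 1999000)
print(has_duplicate_linear(reads))     # (False, 2000)
```
</pre>
<p>At two thousand items the quadratic version does about a thousand times the work, and the gap widens in proportion to the size of the input. At the scale of a sequencing run, that is the difference between an answer and no answer.</p>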
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/the-iniquities-of-the-unix-shell/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A final rant for SyBit: the education of scientists</title>
		<link>http://madhadron.com/a-final-rant-for-sybit-the-education-of-scientists</link>
		<comments>http://madhadron.com/a-final-rant-for-sybit-the-education-of-scientists#comments</comments>
		<pubDate>Wed, 15 Jun 2011 18:09:09 +0000</pubDate>
		<dc:creator>Fred Ross</dc:creator>
				<category><![CDATA[nontechnical]]></category>

		<guid isPermaLink="false">http://madhadron.com/?p=217</guid>
		<description><![CDATA[I promised you all a final rant. Well, here it is: university science education doesn&#8217;t produce scientists. What&#8217;s a scientist anyway? Someone who pipettes all day, or stares through a telescope? That could just as well be a technician, and often is, even if hidden behind a title of &#34;postdoc&#34; or &#34;professor&#34;. Let&#8217;s take some [...]]]></description>
			<content:encoded><![CDATA[<p>I promised you all a final rant. Well, here it is: university science education doesn&#8217;t produce scientists.</p>
<p>What&#8217;s a scientist anyway? Someone who pipettes all day, or stares through a telescope? That could just as well be a technician, and often is, even if hidden behind a title of &quot;postdoc&quot; or &quot;professor&quot;. Let&#8217;s take some folks who are scientists by any estimation: Newton, Linnaeus, Mendele&#8217;ev, Darwin, Boyle, Einstein. We could easily add an astoundingly diverse range of names. On the flip side we can name many who were appalling scientists, such as Watson and Crick, Thomas Aquinas, Lysenko, or any of the proponents of &quot;intelligent design&quot; that plague us today.</p>
<p>What separates these two groups? It&#8217;s not that one worked on a particular thing or in a particular way. The day to day methods of Darwin were much closer to Lysenko than to Newton. It&#8217;s not intelligence. Newton was indubitably a genius, but so was Thomas Aquinas.</p>
<p>It&#8217;s a question of virtue.<sup><a href="#fn1" class="footnoteRef" id="fnref1">1</a></sup> We utter &quot;Lysenko contaminated his science with ideology&quot; and &quot;Watson and Crick stole data&quot; and &quot;Intelligent design begs the question&quot; in tones of moral outrage. These men are epistemically evil, just as Boyle and Darwin are epistemically virtuous. The virtues vary &#8212; objectivity was not one of Mendel&#8217;s great virtues, nor generosity with ideas one of Newton&#8217;s &#8212; but all acted in a manner which embodied some range of epistemic virtues.</p>
<p>So: a scientist is an epistemically virtuous individual.</p>
<p>Now look at the graduates of science departments around the world. They aren&#8217;t particularly epistemically evil, nor very epistemically virtuous either. Virtue is a question of habit. Decisions make ruts in our minds and repeated action, virtuous or evil, digs ruts a man can&#8217;t easily escape from. There are no such ruts impressed into the minds of most of the children emerging from university with their degrees. They have memorized some facts, perhaps learned some technical skills, but we cannot call them scientists.</p>
<p>Some of them go on to graduate school, where they may be shaped by an advisor, but it is just as likely that they will be ignored, and who knows what ruts will develop? In some fields, such as molecular biology, this has gone on for generations, and it is only by chance that you may happen upon a professor who is a scientist.</p>
<p>This is all very depressing. If lecturing the children and making them do laboratory experiments does not produce scientists, how can it be managed? We must drive some ruts through their minds in desirable directions, which can only be accomplished by making them <em>do</em> some science, and do it in such a way that they are not merely technicians but actively make and take responsibility for decisions with epistemic consequences.</p>
<p>They still need technical skills, or they won&#8217;t be in a position to act at all, virtuously or not. Some degree of the following are needed for a scientist:</p>
<ul>
<li>
<p>Mastery of both her native language and English to the point where she doesn&#8217;t create impressions in her audience by accident. Her palette of intentionally created impressions may be limited, but she will not inadvertently lead her audience astray.</p>
</li>
<li>
<p>Technical fluency in some means of experimenting, whether carpentry and metalwork, manipulating liquids, mathematics, or programming. Whatever the medium, the scientist must turn to her tools without mental hesitation.</p>
</li>
<li>
<p>Explicit thought. Anyone who has taught programming knows most humans have never had anything but vague thoughts. Faced with the computer, they struggle. Much epistemic vice is due to an inability to recognize vagueness. Programming is the best way to learn this explicitness.</p>
</li>
<li>
<p>Organizing definitions and categories. Careful choice of definitions and their arrangement vastly changes the effectiveness of a scientist&#8217;s thoughts<sup><a href="#fn2" class="footnoteRef" id="fnref2">2</a></sup>. The best way to learn this is to solve problems in a dependently typed programming language like Agda in such a way that errors are semantically impossible.</p>
</li>
<li>
<p>Expressing ideas directly in mathematics, without passing through English or any other spoken language. Spoken languages are poor vehicles for new thoughts. There have been great scientists who couldn&#8217;t express their ideas directly in mathematics, and they have, one and all, lamented their inability to do so. There is no such thing as a non-mathematical science, just fields which awkwardly embed their mathematics in English. It is also impossible to understand statistics in any useful way without this fluency.</p>
</li>
<li>
<p>Break the human habit of thinking in terms of &quot;A acts on B&quot;. A mind that has not replaced this with basic notions of relations and mappings is crippled. Essentially, every scientist must internalize the essentials of category theory.</p>
</li>
</ul>
<p>Once we have a student with these prerequisites, how do we drive ruts through her mind? We must give her role models and guidance, and then put her in situations where she must apply epistemic virtues.</p>
<p>The role models needn&#8217;t all be present, or even alive. A scientist who hasn&#8217;t read the founders and great investigators of her field, and understood how and why they approached the problems as they did, is at best a dilettante. A physicist who has not read Newton and a neuroscientist who has not studied Cajal are both pitiful creatures.</p>
<p>Someone does have to guide the student, though. Someone has to choose problems big enough to challenge her but not so large as to swamp her or make it too difficult to act virtuously (remember, we must be sure we lay ruts in the right places). Someone has to be a colleague to the student as she tackles the problem, and critic when she has finished. The written works of dead men don&#8217;t suffice.</p>
<p>And what kind of problems should the student get? I&#8217;ll offer some ideas which cover epistemic issues I think all scientists should address:</p>
<ul>
<li>
<p>Measure something out in the wild that cannot be found via controlled experiment in the laboratory. It could be the population of fish in local waterways, or the incidence of a disease in the area, or anything in astronomy. Observational studies offer scope for epistemic problems which the laboratory scientist can blissfully ignore.</p>
</li>
<li>
<p>Plan an experiment limited by available resources. It could be a small clinical trial or an industrial project, but the outcome must matter and the resources to run the experiment must be sufficiently hard to come by that the student must do the experimental design and analysis correctly. All scientists today must come to terms with the foundations and practice of statistics.</p>
</li>
<li>
<p>Measure a parameter with a precise, fixed value, such as the speed of light or the fracturing stress of a material. The student needs to know the worry of hunting for systematic errors in an apparatus while trying to produce a correct value.</p>
</li>
<li>
<p>Choose the objects of study in an area without an established theory. Scientists who have always had clearly delineated definitions &#8212; the mass of an object is important, its color is irrelevant, etc. &#8212; are often at a loss when faced with nature in her raw form. The student should be faced with raw nature with no guiding theory and forced to impose mental order on some corner of the world.</p>
</li>
<li>
<p>Explore a solid theory, such as classical mechanics or chemical thermodynamics. This is a chance to know what the outcome of successful science looks like: a solid, predictive theory on which you can base a discipline of engineering. The student also learns how much there is yet to be done in even the best established fields.</p>
</li>
<li>
<p>Engineer a tool. The student must learn the difference between hacking something together for herself and producing a tool for others. There is only one way: she designs the tool, then is forced to watch silently while someone else tries to use it. Then she redesigns it, watches again, and on and on.</p>
</li>
</ul>
<p>This list is by no means comprehensive, but it provides the student with a chance to exercise many epistemic virtues, and to find those which suit her best.</p>
<p>Unfortunately, this kind of education would require restructuring the universities and firing a large number of professors. Since that is unlikely to happen, however desirable it might be, I&#8217;m afraid it&#8217;s just going to be a studiously ignored ideal.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1">
<p>This idea isn&#8217;t mine. I stole it from Daston&#8217;s book <em>Objectivity</em>. <a href="#fnref1" class="footnoteBackLink" title="Jump back to footnote 1">&#8617;</a></p>
</li>
<li id="fn2">
<p>For an example, see http://apocalisp.wordpress.com/2009/08/21/structural-pattern-matching-in-java/ and http://blog.tmorris.net/understanding-practical-api-design-static-typing-and-functional-programming/ <a href="#fnref2" class="footnoteBackLink" title="Jump back to footnote 2">&#8617;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://madhadron.com/a-final-rant-for-sybit-the-education-of-scientists/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
