sean cassidyhttps://www.seancassidy.me/2020-05-26T13:23:00-07:00Phishing simulations considered harmful2020-05-26T13:23:00-07:002020-05-26T13:23:00-07:00Sean Cassidytag:www.seancassidy.me,2020-05-26:/phishing-simulations-considered-harmful.html<p>Phishing awareness training done via simulated phishing emails that your Security team sends does more harm than good.</p>
<p>These simulations are often unrealistic, bypass real phishing controls, and raise more resentment than awareness.</p>
<p>The way a typical phishing simulation works is this:</p>
<ol>
<li>Someone on Security makes a new phishing email, optionally using insider info that few attackers would have (like knowing when or where a company holiday party is).</li>
<li>They send it out to some segment of their employee base. Often, this bypasses the organization's anti-phishing controls via an allowlist.</li>
<li>Most employees ignore it, a few report it, and a few fall for it.</li>
<li>The few that fall for it get signed up for additional phishing awareness training, or, worse, get negative performance feedback.</li>
<li>Security reports the number to their steering committees or board of directors to argue for more investment in security.</li>
</ol>
<p>I think that this is, at best, not useful, and at worst, harmful to your employees.</p>
<h1 id="why-is-this-harmful">Why is this harmful?</h1>
<p><em>It puts the onus on the employee</em>, rather than on the Security team, to stop phishing attacks. Your employees have work to do and you are interrupting them. You are adding to their cognitive load for every single email they read. You are increasing their anxiety about doing their job. The Security team should focus on stopping phishing emails and making it easy to report, investigate, and remediate them.</p>
<p><em>It breeds resentment of the Security team that tricks them</em>. Most organizations want a Security team that is open, friendly, and approachable. People will tend to avoid the Security team after they realize they failed a phishing email test. This is the opposite of what you want.</p>
<p><em>Bypassing your controls is not an accurate simulation</em>. If you are using a tool that bypasses G-Suite's (or Office 365 or whatever) phishing detection, this is not an accurate simulation no matter what the email says. You are likely training your users to spot your phishing emails, rather than what's actually hitting your users, even if you are attempting to base them on real phishing emails.</p>
<p><em>We already know that people fall for phishing emails</em>. <a href="https://gitlab.com/gitlab-com/gl-security/gl-redteam/red-team-tech-notes/-/tree/master/RT-011%20-%20Phishing%20Campaign">GitLab recently found that 34% of its employees fell for a phishing simulation</a>. This is not a surprise. Even at companies that don't use email much, a consistent proportion of people will fall for phishing attempts.</p>
<p><em>Repeated phishing awareness training is not effective</em>. Most employees nowadays have been through at least one phishing training, and are aware of the basics. Getting re-trained every three months is not a useful exercise for anyone.</p>
<h1 id="what-should-you-do-instead">What should you do instead?</h1>
<p><em>Focus on overall security awareness, and offer an easy-to-use way to report phishing emails</em>. Measure the number of true and false positive reports and give awards to the best reporters in your company. Offer opt-in phishing training and let your employees decide what they need.</p>
<p><em>Reduce the impact of phishing emails</em> by implementing a single sign-on solution with phishing-resistant 2FA (like WebAuthn), and require non-email approvals for financial transactions like wire transfers. Flag replies to new email domains and encrypted attachments for review by the Security team.</p>
<p>To measure phishing susceptibility, <em>defang real phishing emails and leave them in inboxes</em>. This means rewriting URLs and email addresses and replacing attachments with duds. One advanced strategy is to deliver real but defanged phishing emails to more employees than originally received them. This is not a simulation: it's the real thing! It bypassed your phishing prevention controls. Another alternative is to run red team exercises that mimic real phishing attacks, but crucially they should have to go through your anti-phishing controls rather than bypassing them.</p>
<p><em>Avoid in-line user shaming</em>. If a user clicks on a real but defanged phishing email, it should not scold them or automatically sign them up for training. The best practice here is to silently do nothing, or give them a gentle reminder to report phishing emails and attend opt-in phishing training if they want help.</p>
<h1 id="your-users-are-not-the-problem">Your users are not the problem</h1>
<p>If I were to summarize this, it's that often Security treats the users as the problem or the weak link. This is not the case and creates an adversarial us-versus-them attitude. Instead, treat your users as an important asset that can be coached to be better, but not if you train them to spot unrealistic emails or scold them.</p>What's the length of shortest bit sequence that's never been sent over the Internet?2017-12-13T15:27:00-08:002017-12-13T15:27:00-08:00Sean Cassidytag:www.seancassidy.me,2017-12-13:/whats-the-length-of-shortest-bit-sequence-thats-never-been-sent-over-the-internet.html<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML'></script>
<p>A friend of mine posed this brain teaser to me recently: </p>
<blockquote>What's the length of shortest bit sequence that's never been sent over the Internet?</blockquote>
<p>We can never know for sure because we don't have a comprehensive list of all the data. But what can we say probabilistically? Restating it like so:</p>
<blockquote>At what value for X is there a 50% chance there's a sequence X bits in length that hasn't been transmitted yet?</blockquote>
<hr />
<p>What does your intuition say? Obviously every 8-bit sequence has been sent since there are only 256 values. By downloading this HTML page over TLS you've probably used up every 8-bit value. Has every 100-byte message been sent?</p>
<p>This is how my intuition went: it's probably less than 128 bits because UUIDs are 128 bits, and they're universally unique. It's probably greater than 48 bits because of how common collisions are at that end for hashes and CRCs, and the Internet has generated a lot of traffic.</p>
<p>How would we determine the right value?</p>
<p>I decided to model each bit sent as a coin flip. This isn't strictly true, of course, but with encryption becoming more prevalent, it's getting to be close.</p>
<p>So how many flips of a coin does it take to expect to get n heads in a row?</p>
<p>I found <a href="https://www.cs.cornell.edu/~ginsparg/physics/INFO295/mh.pdf">this neat little paper</a> deriving the following formula, where $n$ is number of heads in a row, and $E$ is the expected number of flips:</p>
<p>$$ E = 2^{n+1} - 2 $$</p>
<p>We're looking for a specific sequence, though, not a specific number of heads
in a row. We don't even know what the sequence is since it hasn't been sent yet.
Is that a problem? Not at all! We're looking for some sequence of length $n$,
and given that both 0 and 1 are equally likely, the sequence 00110 is just as
likely as 11111.</p>
<p>(Of course, different sequences on the Internet are not all equally likely, but
we're simplifying to make this calculable.)</p>
<p>We're looking for $n$, however, and not the number of flips. What should the number of flips be set to? We need to estimate the total amount of data ever sent over the Internet. I found a <a href="https://en.wikipedia.org/wiki/Internet_traffic#Global_Internet_traffic">nice table estimating how many petabytes per month are sent</a> for each year.</p>
<p>Adding them up gets you $3.4067 \cdot 10^{22}$ bits, which is in the same rough neighborhood as the number of grains of sand on Earth! Neat.</p>
<p>To solve for $n$:</p>
<p>\begin{equation}
\begin{aligned}
E &= 2^{n + 1} - 2 \\
3.4067 \cdot 10^{22} &= 2^{n + 1} - 2 \\
\log_2 (3.4067 \cdot 10^{22} + 2) - 1 &= n \\
n &= 73.85
\end{aligned}
\end{equation}</p>
<p>So there's a 50% chance a message of length $73.85$ bits has not been sent yet. This matched my intuition nicely!</p>
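<p>As a quick check of the arithmetic, here is a minimal Python sketch that plugs the traffic total quoted above into the rearranged formula. The variable names are my own; the only input is the $3.4067 \cdot 10^{22}$ bits estimate.</p>

```python
import math

# Estimated total bits ever sent over the Internet (the figure quoted above).
total_bits_sent = 3.4067e22

# Rearranged from E = 2^(n + 1) - 2, with E set to the number of bits sent.
n = math.log2(total_bits_sent + 2) - 1

print(f"n = {n:.2f}")
```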
<p>Using some <a href="https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.pdf">forecasting estimates from Cisco</a>, here's how $n$ changes over the next few years:</p>
<table>
<tr><td>2017</td><td>$n = 74.28$</td></tr>
<tr><td>2018</td><td>$n = 74.67$</td></tr>
<tr><td>2019</td><td>$n = 75.05$</td></tr>
<tr><td>2020</td><td>$n = 75.41$</td></tr>
<tr><td>2021</td><td>$n = 75.75$</td></tr>
</table>
<p>What do you think? Am I right? Is there a different way to solve this problem? Let me know via <a href="mailto:sean@seancassidy.me">email</a> or <a href="https://twitter.com/sean_a_cassidy">Twitter</a>.</p>Browser Extension Password Managers Should Not Be Used2017-03-21T09:13:00-07:002017-03-21T09:13:00-07:00Sean Cassidytag:www.seancassidy.me,2017-03-21:/browser-extension-password-managers-should-not-be-used.html<p>It's been over a year since I presented <a href="https://www.seancassidy.me/lostpass.html">LostPass</a> at
ShmooCon, and in that time, many more bugs have been found in password
managers. The most severe are in browser-based password manager
extensions, like LastPass.</p>
<p>This should be obvious to everyone who has been paying attention:
<em>browser-based password manager extensions should no longer be used</em> as they
are fundamentally risky and have the potential to have all of your credentials
stolen without your knowledge, by a random malicious website you visit, or by
malvertising.</p>
<p>Here's the thing: when you use a browser extension password manager, you're
giving attackers an API to interact with your password manager via JavaScript
or the DOM. That's how LostPass worked, and it's how many of the new attacks
work, too. Desktop-based password managers have no such access, as they require
compromising the local machine first, which is much harder than visiting a
webpage.</p>
<p>Your password manager du jour might not be as bug-ridden as LastPass, but it
suffers from the same risk vector if it's a browser extension. If you're using
it in a corporate environment to share passwords, now only one user of many
needs to be attacked to steal all of your passwords via a previously
undisclosed bug. If you think criminals aren't mining LastPass and others for
bugs right now, you're naive.</p>
<h2 id="alternatives">Alternatives</h2>
<p>What should you use instead?</p>
<h3 id="desktop-based-password-managers">Desktop-based Password Managers</h3>
<p>Any program that is not resident in your browser is safer than one that is.
There are many options in this category, and none of them suffer
from the direct-access-via-JavaScript risk category. If you do use one, do not
install the "form filler" browser extensions. Copy and paste the passwords. I
use <a href="https://www.passwordstore.org/">pass</a> because it's simple to understand for technical folks, but I
have many friends who use <a href="http://keepass.info/">KeePass</a>.</p>
<p>Copying and pasting passwords into the wrong place is not a large enough risk
to use an even riskier browser password manager extension. If you accidentally
paste one password in the wrong place, it is easy to change. If you get all
your passwords stolen by a new bug, you'll never even know, and you'll have
little to no recourse.</p>
<h3 id="built-in-browser-password-managers">Built-in Browser Password Managers</h3>
<p>Every major browser now has a well-designed built-in password manager that is
easy to use. These are a nice choice if you dislike copying and pasting
passwords into websites. All of them also offer mobile sync so you can have
your passwords on the go. Since two-factor authentication is not available for
these, use a very strong and unique passphrase.</p>
<ul>
<li><a href="https://support.google.com/chrome/answer/95606">Chrome's Password Manager</a> along with a good <a href="https://support.google.com/chrome/answer/1181035">sync password</a></li>
<li><a href="https://support.mozilla.org/t5/Firefox-Display-and-Appearance/Password-Manager-Remember-delete-change-and-import-saved/td-p/2254">Firefox's Password Manager</a> along with a good <a href="https://support.mozilla.org/t5/Firefox/Use-a-Master-Password-to-protect-stored-logins-and-passwords/td-p/1437">master password</a></li>
<li><a href="https://support.apple.com/en-us/HT204085">iCloud Keychain for Safari</a></li>
</ul>
<p>I recommend that non-technical users use the built-in password managers because
they're easy to use and plenty secure.</p>
<h3 id="literally-anything-else">Literally anything else</h3>
<p>An encrypted text file on your computer is safer than a browser extension
password manager. Think of how it would be compromised: someone would need to
get at least user-level access to your computer, and then either read it when
it's temporarily decrypted, or wait for you to decrypt it. That cannot be
done efficiently by attackers at scale. And if they've compromised your machine,
you have bigger things to worry about.</p>
<h1 id="the-future">The future</h1>
<p>I don't know if these browser extension password managers will ever improve
enough for me to recommend them. The risk of having an attacker be able to
directly interact with them is just too high. Many of them are made by for-profit
companies that evidently have not invested much in an in-depth
audit of their source code, given the trivial bugs that researchers find
in an hour.</p>
<p>We need less of the "military grade encryption" marketing from them and more
"here's the full source code audit report by a well known security firm". Maybe
then it'll get better.</p>How to Implement Crypto Poorly2016-10-07T08:37:00-07:002016-10-07T08:37:00-07:00Sean Cassidytag:www.seancassidy.me,2016-10-07:/how-to-implement-crypto-poorly.html<p><em>This is a summary of <a href="https://www.youtube.com/watch?v=Qi2NfZL7xpU">the talk I gave at GrrCon '16</a>.</em></p>
<p>We're always told: don't roll your own crypto!</p>
<p>This has always felt like a kind of abstinence-only education to me. Of course,
the advice is correct: if you decide to use your own encryption
mechanism instead of, say, TLS, you'll almost certainly do a worse job than
the IETF. You'll certainly fail at making a better block cipher than Daemen and
Rijmen did. But to me there was always a sort of "don't even learn about it" tone to
this recommendation.</p>
<p>Is this recommendation effective? That is, do people or companies actually roll
their own crypto? Are the crypto systems they made horribly broken? I decided
to find out.</p>
<h1 id="the-survey">The Survey</h1>
<p>I needed an area to survey to answer this question. Where can I find lots of
examples of custom cryptography? Can I find a lot of common issues in those
implementations?</p>
<p>It turns out that there is a lot of custom cryptography in one particular
place: custom single sign-on implementations. I found 21 implementations from
companies that offer some kind of custom single sign-on for their product. </p>
<h2 id="custom-single-sign-on">Custom Single Sign-on</h2>
<p>Single sign-on is any system which grants access to other systems by virtue of
being authenticated against it. For instance, Facebook Connect is a popular
single sign-on mechanism for many websites. Instead of registering with every
website you use, you can sign in with Facebook and the website will get the
user information it needs from Facebook directly on your behalf. OAuth2 and
SAML 2.0 are examples of open standards that provide single sign-on.</p>
<p>But what if that's not quite what you want?</p>
<p>What if you want "a few lines of PHP" in order to have users be authenticated
against your site? Best if it works with WordPress and whatever weird Java 6
system some of your enterprise customers use. No need to worry about what a
bearer token is and why you'd want to refresh it.</p>
<p>What if instead you made your own little crypto function that combined some
secret and gave it to your customers, who could then authenticate their users
to your service?</p>
<p>For instance, say Alice has a TODO list service that her customers buy. Alice
buys Bob's helpdesk software so that her customers can file support tickets
when they have a problem.</p>
<p>When one of Alice's customers wants to file a support ticket because their
TODOs were missing, Alice computes something like this:</p>
<div class="codehilite"><pre><span></span><span class="err">H(user's email, shared secret)</span>
</pre></div>
<p>Where <code>H</code> is some kind of HMAC or hash function (or even something terrible
only dreamt of in nightmares), and shared secret is a secret shared by Alice
and Bob.</p>
<p>Alice then redirects the user to Bob's website, with the result of that
computation and the user's email, like:</p>
<div class="codehilite"><pre><span></span><span class="c">https://bob.example/alice/login?email=user@example.com&hash=59bcc3ad6775562f845953cf01624225</span>
</pre></div>
<p>Bob then uses the same email address and the same shared secret, and hopefully
comes up with the same hash value, <code>59bcc3ad6775562f845953cf01624225</code>. If so,
the user is successfully authenticated to Alice's support site, hosted by Bob.
The user didn't need to register on Bob's website, so, to the user, it was
seamless.</p>
<p>Since the user doesn't know the shared secret, the user can't compute the hash
value themselves.</p>
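<p>The flow above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not any vendor's code: it picks HMAC-SHA256 for <code>H</code>, and the function names <code>sign_email</code> and <code>verify_email</code> and the randomly generated secret are hypothetical.</p>

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; in practice Alice and Bob agree on it out of band.
SHARED_SECRET = secrets.token_bytes(32)

def sign_email(email, secret=SHARED_SECRET):
    """Alice's side: compute H(user's email, shared secret) as an HMAC-SHA256 tag."""
    return hmac.new(secret, email.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_email(email, tag, secret=SHARED_SECRET):
    """Bob's side: recompute the tag and compare in constant time."""
    expected = sign_email(email, secret)
    return hmac.compare_digest(expected, tag)

tag = sign_email("user@example.com")
print(verify_email("user@example.com", tag))   # same email, same secret
print(verify_email("admin@example.com", tag))  # tag does not cover this email
```

<p>Note that this sketch still has no expiration, so it shares the replay weakness discussed later in this post.</p>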
<h1 id="common-flaws">Common Flaws</h1>
<p>The good thing about these custom single sign-on implementations is that they're
simple. The bad thing is that they're often dangerously insecure. For example,
<a href="https://www.wordfence.com/blog/2016/05/freshdesk-vulnerability-red-team-exercise/">this bug reported in Freshdesk</a> resulted from the name and the
email being concatenated. There are plenty of tricky little bugs that can
impact these systems.</p>
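<p>To see why plain concatenation is risky, here is a small Python sketch. It is not Freshdesk's code, just a generic illustration: the secret and the function names <code>naive_tag</code> and <code>framed_tag</code> are made up. Two different (name, email) pairs concatenate to the same string, so they get the same tag unless each field is framed, here with a length prefix.</p>

```python
import hashlib
import hmac

SECRET = b"example-shared-secret"  # placeholder value for illustration only

def naive_tag(name, email):
    """Authenticates name + email with no separator between the fields."""
    message = (name + email).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def framed_tag(name, email):
    """Length-prefixes each field so the field boundaries are unambiguous."""
    fields = [name.encode("utf-8"), email.encode("utf-8")]
    message = b"".join(len(f).to_bytes(4, "big") + f for f in fields)
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

# Both pairs concatenate to "bobadmin@example.com".
collides = naive_tag("bob", "admin@example.com") == naive_tag("bobadmin", "@example.com")
still_collides = framed_tag("bob", "admin@example.com") == framed_tag("bobadmin", "@example.com")
print(collides, still_collides)
```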
<p>For this study, I picked seven flaws that I thought would be common problems
with these custom SSO solutions, and examined each solution's publicly
available documentation and example code for the problems. I didn't do a deep
inspection of each implementation, but rather just enough to determine if the
flaw was present or not. </p>
<h2 id="no-hmac">No HMAC</h2>
<p>Essentially, these single sign-on implementations are trying to pass an
authenticated message through an untrusted third party, the user. The best way to do
that is with a message authentication code (MAC).</p>
<p>An <a href="https://en.wikipedia.org/wiki/Hash-based_message_authentication_code">HMAC</a> combines a hash function, a secret key, and a message in a
secure way that resists <a href="https://en.wikipedia.org/wiki/Length_extension_attack">length extension attacks</a> and
provides <a href="https://en.wikipedia.org/wiki/Preimage_attack">preimage resistance</a>. Not using an HMAC or any kind of real message authentication opens
up the SSO implementation to many different kinds of attacks.</p>
<h2 id="uses-obsolete-crypto-primitives">Uses Obsolete Crypto Primitives</h2>
<p>Does the implementation use known bad crypto primitives? For this, I counted
HMAC-MD5 as bad, as MD5 is known bad, even though there are no known attacks
against HMAC-MD5 specifically. As with some other flaws I studied, not all of
the problems I wanted to identify were critical. I also wanted to study less
important flaws to understand how fast or slow the adoption of new crypto
primitives, like SHA-3, was.</p>
<p><em>Spoiler alert</em>: no one used SHA-3.</p>
<h2 id="short-keys">Short Keys</h2>
<p>Shared secret keys are often distributed in hexadecimal, like this: </p>
<div class="codehilite"><pre><span></span><span class="err">35f7c022a53662e813952e4a7425533a</span>
</pre></div>
<p>If you're not paying attention, you might do something like this, in Java:</p>
<div class="codehilite"><pre><span></span><span class="n">String</span> <span class="n">secretKey</span> <span class="o">=</span> <span class="s">"35f7c022a53662e813952e4a7425533a"</span><span class="p">;</span>
<span class="kt">byte</span><span class="o">[]</span> <span class="n">keyBytes</span> <span class="o">=</span> <span class="n">secretKey</span><span class="p">.</span><span class="na">getBytes</span><span class="p">();</span>
</pre></div>
<p>This does not give you a 16-byte array like <code>[0x35, 0xf7, … 0x3a]</code>. Instead, it
gives you a 32-byte array of the UTF-8 representation of the string, like
this: <code>[0x33, 0x35, … 0x61]</code>, which is almost certainly not what you meant.</p>
<p>If your cipher takes only the number of bytes it needs, it will leave some of
the key material out! This means that if you're using a 128-bit key, it could
be using only a 64-bit key. That's a massive reduction in the number of
available keys.</p>
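<p>The same mistake is easy to reproduce in Python. The sketch below uses the example key from above and contrasts encoding the hex string as text with actually decoding it; the variable names are mine.</p>

```python
hex_key = "35f7c022a53662e813952e4a7425533a"

# Wrong: the ASCII/UTF-8 bytes of the hex *text*, one byte per hex digit.
text_bytes = hex_key.encode("utf-8")

# Right: decode each pair of hex digits into one byte.
key_bytes = bytes.fromhex(hex_key)

print(len(text_bytes), len(key_bytes))

# A cipher that takes only the first 16 bytes of text_bytes sees 16 hex
# digits, which carry 4 bits each: 64 bits of key material, not 128.
truncated = text_bytes[:16]
effective_bits = len(truncated) * 4
print(truncated, effective_bits)
```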
<h2 id="replay-attacks">Replay Attacks</h2>
<p>Since the "authenticated message" is being passed to a potentially untrusted
user, it's important to make sure that the message has some kind of expiration.
One simple way is just to attach a timestamp that expires soon after the
message is generated. Another way would be to use a <a href="https://en.wikipedia.org/wiki/Cryptographic_nonce">nonce</a>, which
ensures that the message cannot be used more than once. </p>
<p>If a user can execute a replay attack, they could use another user's
compromised SSO URL, or use an older URL of their own to stay logged in.</p>
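<p>A minimal sketch of the timestamp approach, in Python, might look like this. It assumes HMAC-SHA256 and a five-minute validity window; the names <code>sign</code>, <code>verify</code>, and <code>MAX_AGE_SECONDS</code> are hypothetical. The timestamp is covered by the MAC, so the user cannot simply change it.</p>

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 300  # assumed validity window

def sign(email, timestamp, secret):
    """Authenticate the timestamp together with the email."""
    # The timestamp is digits only, so the first "|" always ends it.
    message = f"{timestamp}|{email}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(email, timestamp, tag, secret, now=None):
    """Accept only a valid tag whose timestamp is recent."""
    now = time.time() if now is None else now
    age = now - timestamp
    fresh = MAX_AGE_SECONDS >= age >= 0
    valid = hmac.compare_digest(sign(email, timestamp, secret), tag)
    return fresh and valid

secret = b"k" * 32  # placeholder key for illustration only
tag = sign("user@example.com", 1000, secret)
print(verify("user@example.com", 1000, tag, secret, now=1100))  # within the window
print(verify("user@example.com", 1000, tag, secret, now=2000))  # replayed too late
```

<p>A timestamp only narrows the replay window. Rejecting reuse outright requires the server to remember which nonces it has already seen.</p>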
<h2 id="static-initialization-vector">Static Initialization Vector</h2>
<p>Block ciphers have modes. These modes make it possible to use block ciphers on
more than one block of data. These modes typically require an initialization
vector (IV) that's random. Some modes, like CTR and CBC, require that the IV
isn't reused, otherwise it will leak information. In CTR mode, IV reuse is
particularly catastrophic, so much so that some crypto experts are
recommending against CTR mode.</p>
<p>For SSO implementations that used a block cipher, I wanted to see if they made
this classic error.</p>
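<p>Here is a rough Python illustration of why IV reuse is so damaging in a counter-style mode. To stay within the standard library it does not use AES: the <code>keystream</code> function is a toy stand-in built from SHA-256, and all names and values are made up. The structure is what matters: the ciphertext is the plaintext XORed with a keystream derived from the key and IV, so a repeated IV means a repeated keystream.</p>

```python
import hashlib

def keystream(key, iv, length):
    """Toy counter-mode keystream (SHA-256 stand-in, NOT AES-CTR)."""
    blocks = (length + 31) // 32
    stream = b"".join(
        hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        for counter in range(blocks)
    )
    return stream[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key, iv, plaintext):
    return xor(plaintext, keystream(key, iv, len(plaintext)))

key = b"k" * 16
static_iv = b"\x00" * 16  # the bug: the same IV for every message

p1 = b"email=alice@example.com"
p2 = b"email=admin@example.com"
c1 = encrypt(key, static_iv, p1)
c2 = encrypt(key, static_iv, p2)

# The keystream cancels out: c1 XOR c2 equals p1 XOR p2, no key required.
leak = xor(c1, c2)
# Anyone who knows p1 can now read p2.
recovered_p2 = xor(leak, p1)
print(recovered_p2)
```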
<h2 id="known-plaintext">Known Plaintext</h2>
<p>Usually it's best to limit what the attacker knows. Like the secret key. Best
not to share that with your attacker. But sometimes even knowing (or
controlling) the plaintext can help the attacker. With well-designed crypto
systems, this shouldn't matter at all. Attackers could encode whatever messages
they want and not learn anything about other messages or the key. But many
crypto systems are not well designed, so I kept track of which implementations
had plaintexts that the attacker knew or controlled.</p>
<h2 id="random-crap">Random Crap</h2>
<p>This category is a bit tongue-in-cheek, and actually came about after reviewing
the implementations. I noticed a lot of weird stuff that absolutely has no
effect on the crypto guarantees (or lack thereof) of the system. Twiddling
bits, reversing strings, taking the MD5 of the SHA-1 of the MD5 of the SHA-1
of the key, and so on. </p>
<h1 id="survey-results">Survey Results</h1>
<p>Here are the aggregate problems found across the 21 custom single sign-on
implementations studied:</p>
<canvas id="overall-chart" width="400" height="300"></canvas>
<p>Several of the implementations that had the short-key problem used an HMAC
that did not truncate the key, so those aren't so much vulnerabilities
as sloppy programming. Similarly, using obsolete primitives is not always
immediately exploitable, but it is no longer best practice. </p>
<p>One implementation used a block cipher (AES) in a mode that requires the IV to
be used only once, and it failed to do so.</p>
<p>Only one implementation was free from all problems studied.</p>
<p>The response from vendors was disappointing. Of the 20 implementations that had
problems, nearly half did not acknowledge my vulnerability report. Two claimed
that the problems I found were not bugs. Only one implementation fixed the bugs
I reported.</p>
<div style="padding-left: 20%; padding-right: 20%">
<canvas id="vendor-chart" width="100" height="100"></canvas>
</div>
<h2 id="custom-cipher">Custom cipher</h2>
<p>Interestingly, one implementation decided that even traditional cryptographic
primitives like MD5, SHA-1, or AES were too fancy for them, and made
their own. Here it is, edited for clarity:</p>
<div class="codehilite"><pre><span></span><span class="k">def</span> <span class="nf">encrypt</span><span class="p">(</span><span class="n">plaintext</span><span class="p">,</span> <span class="n">input_key</span><span class="p">):</span>
    <span class="n">key</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">(</span><span class="n">input_key</span><span class="p">)</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s1">''</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">character</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">plaintext</span><span class="p">):</span>
        <span class="n">val</span> <span class="o">=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">character</span><span class="p">)</span>
        <span class="n">adder</span> <span class="o">=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">key</span><span class="p">[</span><span class="n">i</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">key</span><span class="p">)])</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">+</span> <span class="n">base36encode</span><span class="p">(</span><span class="n">val</span> <span class="o">+</span> <span class="n">adder</span><span class="p">)[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p>This code has several obvious flaws. It operates on a per-character basis,
which means there's no <a href="https://en.wikipedia.org/wiki/Avalanche_effect">avalanche effect</a>. It naively adds
the ASCII value of a key character (a hex digit, 0-9a-f) to that of the plaintext
character, and then base 36 encodes the sum. For some reason it reverses the
resulting two characters, probably to add some mystery.</p>
<p>Here's a table of what a few iterations of this function do for a plaintext
of ASCII zeroes and the key "hello":</p>
<table>
<thead>
<tr>
<th>Plaintext</th>
<th>Key</th>
<th>Val</th>
<th>Adder</th>
<th>Addition</th>
<th>Base 36</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>a</td>
<td>48</td>
<td>97</td>
<td>145</td>
<td>41</td>
</tr>
<tr>
<td>0</td>
<td>a</td>
<td>48</td>
<td>97</td>
<td>145</td>
<td>41</td>
</tr>
<tr>
<td>0</td>
<td>f</td>
<td>48</td>
<td>102</td>
<td>150</td>
<td>46</td>
</tr>
<tr>
<td>0</td>
<td>4</td>
<td>48</td>
<td>52</td>
<td>100</td>
<td>2S</td>
</tr>
</tbody>
</table>
<p>To reverse this, it's a simple matter of taking the ciphertext and the plaintext and undoing the operations to recover the secret key. "Val" is the ASCII value of the plaintext, "Decimal" is the decimal value of the base 36 number, "Subtract" is what you get when you subtract "Val" from "Decimal", and the key is the ASCII character for that result.</p>
<table>
<thead>
<tr>
<th>Plaintext</th>
<th>Base 36</th>
<th>Val</th>
<th>Decimal</th>
<th>Subtract</th>
<th>Key</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>41</td>
<td>48</td>
<td>145</td>
<td>97</td>
<td>a</td>
</tr>
<tr>
<td>0</td>
<td>41</td>
<td>48</td>
<td>145</td>
<td>97</td>
<td>a</td>
</tr>
<tr>
<td>0</td>
<td>46</td>
<td>48</td>
<td>150</td>
<td>102</td>
<td>f</td>
</tr>
<tr>
<td>0</td>
<td>2S</td>
<td>48</td>
<td>100</td>
<td>52</td>
<td>4</td>
</tr>
</tbody>
</table>
<p>Here's the code for exactly that:</p>
<div class="codehilite"><pre><span></span><span class="k">def</span> <span class="nf">get_key</span><span class="p">(</span><span class="n">plaintext</span><span class="p">,</span> <span class="n">ciphertext</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">ciphertext</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">2</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">plaintext</span><span class="p">):</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">key</span> <span class="o">=</span> <span class="s1">''</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">character</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">plaintext</span><span class="p">):</span>
        <span class="n">base36</span> <span class="o">=</span> <span class="n">ciphertext</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span><span class="p">:</span> <span class="n">i</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span><span class="p">][::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">base36</span><span class="p">,</span> <span class="mi">36</span><span class="p">)</span>
        <span class="n">key</span> <span class="o">=</span> <span class="n">key</span> <span class="o">+</span> <span class="nb">chr</span><span class="p">(</span><span class="n">value</span> <span class="o">-</span> <span class="nb">ord</span><span class="p">(</span><span class="n">character</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">key</span>
</pre></div>
<p>There's probably a more clever ciphertext-only attack that's possible because of how bad this cipher is, but I didn't pursue it because the attacker already has access to both the plaintext and the ciphertext in this attack. For anyone who has done any cryptanalysis or CTFs, this "encryption" function is a joke.</p>
<p>When the attacker gains the shared secret key, the attacker can then impersonate any user, including admins. This is a classic privilege escalation attack, done over single sign-on.</p>
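<p>To see the key recovery work end to end, here is a minimal sketch. The <code>toy_encrypt</code> function is hypothetical: it is reconstructed only from what <code>get_key</code> inverts (each plaintext character code added to the matching key character code, written as two reversed base-36 digits), and the vendor's real cipher may differ in its details. The secret and the known plaintext are made-up values.</p>

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # base-36 alphabet: 0-9 then a-z


def to_base36_pair(value):
    """Encode an integer below 36**2 as exactly two base-36 digits."""
    high, low = divmod(value, 36)
    return DIGITS[high] + DIGITS[low]


def toy_encrypt(plaintext, key):
    """Hypothetical cipher, inferred from get_key; not the vendor's code."""
    out = []
    for p, k in zip(plaintext, key):
        pair = to_base36_pair(ord(p) + ord(k))
        out.append(pair[::-1])  # the two digits are stored reversed
    return ''.join(out)


def get_key(plaintext, ciphertext):
    """Known-plaintext key recovery, same logic as the function above."""
    if len(ciphertext) != 2 * len(plaintext):
        return None
    key = ''
    for i, character in enumerate(plaintext):
        base36 = ciphertext[i * 2: i * 2 + 2][::-1]
        value = int(base36, 36)
        key = key + chr(value - ord(character))
    return key


secret = 'correct-horse-battery'   # made-up shared secret
known = 'admin@example.com'        # plaintext the attacker already knows
ciphertext = toy_encrypt(known, secret)
recovered = get_key(known, ciphertext)
print(recovered)
```

<p>Under these assumptions, one known plaintext recovers as many key characters as the plaintext is long, so a longer known value exposes more of the secret.</p>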
<h1 id="takeaways">Takeaways</h1>
<p>Should you roll your own crypto? No.</p>
<p>Do people roll their own anyway? Yes.</p>
<p>The standard recommendation from the security community about learning and
implementing your own cryptography has been to avoid it. Let cryptographers do
cryptography. However, it's clear from these results that people will implement
their own crypto regardless.</p>
<p>This strikes me as similar to abstinence-only education. We've tried telling
everyone not to write custom crypto code, but because of product demands,
ignorant developers, or hubris, that hasn't been successful. I think it's time
we try a different tactic: teach everyone crypto. Make it a standard part of
being a software engineer. Everyone should at least know what an HMAC is and
why it should be used, what an initialization vector is and how to handle one,
and how to securely generate random numbers (hint: use /dev/urandom).</p>
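<p>As a rough illustration of those three basics, here is a minimal sketch using only Python's standard library. The function names (<code>fresh_iv</code>, <code>tag</code>, <code>verify</code>) and the sample messages are made up for this example, and it deliberately stops short of encryption, which needs a vetted library.</p>

```python
import hashlib
import hmac
import secrets

# Keys and IVs come from the operating system's CSPRNG
# (the same kind of source as /dev/urandom), never from random.random().
mac_key = secrets.token_bytes(32)


def fresh_iv(size=16):
    """Return a new random IV; generate one per message and never reuse it."""
    return secrets.token_bytes(size)


def tag(key, message):
    """Compute an HMAC-SHA256 tag so tampering with the message is detectable."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, received_tag):
    """Compare tags in constant time so timing does not leak the correct tag."""
    return hmac.compare_digest(tag(key, message), received_tag)


message = b'user=alice;role=user'
t = tag(mac_key, message)
ok = verify(mac_key, message, t)
forged_ok = verify(mac_key, b'user=alice;role=admin', t)
print(ok, forged_ok)
```

<p>The forged message fails verification because the attacker cannot compute a valid tag without the key, which is exactly the check the "No HMAC" implementations were missing.</p>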
<p>There should be simple, imitable implementations of various real world problems
that cryptography solves. No one wants to make insecure systems. We need to
make it harder to mess up.</p>
<p>And, should you learn cryptography? Yes.</p>
<h2 id="resources-for-learning-cryptography">Resources for learning cryptography</h2>
<p>I covered this in an older blog post of mine,
"<a href="https://www.seancassidy.me/so-you-want-to-crypto.html">So, you want to crypto</a>", but it needs updating.
So here's the updated version:</p>
<h3 id="courses">Courses</h3>
<p>A good place to start is <a href="https://www.coursera.org/learn/crypto">Cryptography I at Coursera</a>. Your local
university's cryptography course is also handy. Formal courses are good for a
foundation in some of the mathematics behind cryptography.</p>
<h3 id="books">Books</h3>
<p>My favorite practical cryptography book is far and away
<a href="https://www.schneier.com/books/cryptography_engineering/">Cryptography Engineering by Ferguson et al</a>. It combines the best
aspects of theoretical knowledge and practical experience. However, there are
many good cryptography books. Some, like
<a href="http://www.math.umd.edu/~lcw/book.html">Introduction to Cryptography with Coding Theory</a>, are completely
on the theoretical/mathematical side, but also cover some historical ciphers,
which sometimes pop up in CTFs.</p>
<h3 id="learn-to-break-it">Learn to break it</h3>
<p>Learning to break cryptography is perhaps the most effective way to learn it. After you
get a solid foundation from an introductory course or book, I'd recommend doing
the <a href="http://cryptopals.com/">Cryptopals Crypto Challenges</a>. This will get you thinking
about how crypto systems break, which is arguably the most important skill for
designing and examining them.</p>
<h3 id="learn-by-imitation">Learn by imitation</h3>
<p>Look at pre-existing solutions to problems. Most crypto protocols have had
several versions where they fixed critical bugs. That's interesting because you
can learn from their mistakes. Implementations that I'd recommend are:
<a href="https://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html">AWS Authentication</a>, <a href="https://oauth.net/2/">OAuth2</a>, and <a href="https://en.wikipedia.org/wiki/Double_Ratchet_Algorithm">Double Ratchet</a>. Why were
they designed the way they were? What flaws do they have? Is their complexity
necessary? Are they too simple?</p>
<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.3.0/Chart.min.js"></script>
<script>
var overallCtx = document.getElementById("overall-chart");
var overallChart = new Chart(overallCtx, {
type: 'bar',
data: {
labels: ['No HMAC', 'Obsolete Primitives', 'Custom Cipher',
'Short Keys', 'Always the same', 'Static IV',
'Known Plaintext', 'Random Crap'],
datasets: [{
label: '# of implementations',
data: [15, 17, 1, 7, 4, 1, 5, 16],
backgroundColor: [
'rgba(255, 99, 132, 0.2)',
'rgba(54, 162, 235, 0.2)',
'rgba(255, 206, 86, 0.2)',
'rgba(75, 192, 192, 0.2)',
'rgba(153, 102, 255, 0.2)',
'rgba(255, 159, 64, 0.2)',
'rgba(255, 99, 132, 0.2)',
'rgba(54, 162, 235, 0.2)'
],
borderColor: [
'rgba(255,99,132,1)',
'rgba(54, 162, 235, 1)',
'rgba(255, 206, 86, 1)',
'rgba(75, 192, 192, 1)',
'rgba(153, 102, 255, 1)',
'rgba(255, 159, 64, 1)',
'rgba(255,99,132,1)',
'rgba(54, 162, 235, 1)'
],
borderWidth: 1
}]
}
});
var vendorCtx = document.getElementById("vendor-chart");
var vendorChart = new Chart(vendorCtx, {
type: 'doughnut',
data: {
labels: ['No bugs', 'No ack', 'No fix', 'Not a bug', 'Fixed'],
datasets: [{
data: [1, 9, 8, 2, 1],
backgroundColor: [
'rgba(255, 99, 132, 0.2)',
'rgba(54, 162, 235, 0.2)',
'rgba(255, 206, 86, 0.2)',
'rgba(75, 192, 192, 0.2)',
'rgba(153, 102, 255, 0.2)'
],
borderColor: [
'rgba(255,99,132,1)',
'rgba(54, 162, 235, 1)',
'rgba(255, 206, 86, 1)',
'rgba(75, 192, 192, 1)',
'rgba(153, 102, 255, 1)'
],
borderWidth: 1
}]
}
});
</script>Meditations Redux2016-06-03T09:44:00-07:002016-06-03T09:44:00-07:00Sean Cassidytag:www.seancassidy.me,2016-06-03:/meditations-redux.html<p>Two years ago, I wrote a post called
"<a href="https://www.seancassidy.me/meditations.html">Meditations</a>". I had just read
<a href="https://www.amazon.com/Meditations-New-Translation-Modern-Library-ebook/dp/B000FC1JAI/ref=as_li_ss_tl?ie=UTF8&redirect=true&ref_=as_li_tl&linkCode=ll1&tag=reamorpap-20&linkId=9f8292febae51ef911786e400ad60e78">Meditations by Marcus Aurelius</a> and was inspired to create a short log
of everything I had learned at my last job. I noticed that writing down what I
learned as I learned it made it stick. I would read over the list every so
often to remind myself not to make the same mistakes.</p>
<p>The company I helped start, <a href="https://www.defensestorm.com">DefenseStorm</a>, just celebrated its second
year, acquired loads more customers, changed its name, and we're just about to
announce another round of fundraising. I have learned more in the past two
years than I thought possible. Many of the lessons were not about technology,
as I expected, but instead were about people. That surprised me, but in
retrospect, it seems obvious. Companies are functions of who they are.</p>
<p>I'm posting the lessons I've learned because I think they might be useful to
others. They're broken into seven sections:</p>
<ul>
<li><a href="#engineering">Engineering</a></li>
<li><a href="#product">Product</a></li>
<li><a href="#people">People</a></li>
<li><a href="#business">Business</a></li>
<li><a href="#marketing">Marketing</a></li>
<li><a href="#sales">Sales</a></li>
<li><a href="#investors">Investors</a></li>
</ul>
<p>If you like this, <a href="https://www.seancassidy.me/pages/about.html">let me know</a>. </p>
<h1 id="engineering">Engineering</h1>
<h3 id="pick-one-risky-component">Pick one risky component</h3>
<p>One aspect of your technology should be bleeding edge and risky so that you're
using the best-in-class software to build your business. But everything else
should be as boring as possible. If you start mixing new frameworks with new
languages and fancy new database software, you're in for trouble.</p>
<h3 id="simple-simple-simple">Simple, simple, simple</h3>
<p>Do whatever it takes to keep systems simple. Avoid complications unless required. Designs should be reviewed by the whole team to make sure this is watched. Ask: "Could it be simpler?"</p>
<h3 id="its-not-done-until-its-monitored">It's not done until it's monitored</h3>
<p>Monitoring is an essential part of application development. An application should not be released to production without monitoring and alerting.</p>
<h3 id="central-log-management-is-the-new-unit-testing">Central log management is the new unit testing</h3>
<p>Logs should be searchable and alertable. You should monitor each log level, each server, each service, each environment independently and combined and alert on anomalies.</p>
<h3 id="homogeneity-scales">Homogeneity scales</h3>
<p>While it's not as much fun for engineers, picking one language/tool set allows engineers to move between projects quickly and easily, and speeds up new-hire training.</p>
<h3 id="break-down-tasks">Break down tasks</h3>
<p>Yeah, it's not fun, but do it anyway. Break big tasks into releasable pieces and then work on them as a team. Avoid silos.</p>
<h3 id="estimates-are-important">Estimates are important</h3>
<p>Yes, sometimes they're more guesswork than you'd like, but stick to them and try to get better. Don't be swayed by "It's done when it's done." Instead, cut features and deliver an MVP. Iterate and collect feedback.</p>
<h3 id="process-over-gut-feelings">Process over gut feelings</h3>
<p>It is important to have a repeatable, easy software release process. If you instead test only the software that feels risky, you will be hit by bugs on seemingly low-risk releases. Your gut is often wrong.</p>
<h3 id="culture-is-important">Culture is important</h3>
<p>You can have bugs in your culture. You can fix them.</p>
<h3 id="deadlines">Deadlines</h3>
<p>You can either have a calendar deadline and cut requirements to meet it, or
push back the ship date to keep the requirements intact. You can't have both.</p>
<h3 id="set-targets-not-deadlines">Set targets, not deadlines</h3>
<p>Deadlines imply that bad things will happen if you don't hit them. Instead, set
targets and get everyone on board with them. If you miss the target, do a
retrospective on what could have gone better.</p>
<h3 id="its-just-bits">It's just bits</h3>
<p>Don't say "we can't do that." You can, it's just hard or annoying.</p>
<h3 id="operations-is-important">Operations is important</h3>
<p>You have ops even if you think you don't. Heroku is fine for some companies, but it's still operations. Making devs and ops the same team is a good move.</p>
<h1 id="product">Product</h1>
<h3 id="start-at-a-mockup">Start at a mockup</h3>
<p>Communicating product ideas in the abstract is difficult and time consuming. Make a mockup first and then start talking about it.</p>
<h3 id="think-beyond-the-technical-details">Think beyond the technical details</h3>
<p>You focus on the details, but miss the big picture. Use cases are important.</p>
<h3 id="themes">Themes</h3>
<p>Sometimes a product roadmap can feel like chaos management. Plan out the themes in detail for the next two quarters, and then in light detail for one or two more. Use the themes whenever you feel lost or conflicted, but not as a crutch.</p>
<h3 id="demo-often">Demo often</h3>
<p>You should do demos as much as you can. Go off script. Every time that you feel less-than-confident or embarrassed is an improvement that should go on the roadmap to be fixed. The goal is to build something that you're proud to demo.</p>
<h3 id="hone-your-explanations">Hone your explanations</h3>
<p>You tend to explain things well to technical audiences, but you need to keep practicing conveying large swaths of information to non-technical people who are busy. If you could tell people only one thing about this new feature, what would it be?</p>
<h3 id="delegate">Delegate</h3>
<p>Let other people drive certain features. Give them authority explicitly. Tell them you want their vision to shine through, and let them do it.</p>
<h3 id="sell">Sell</h3>
<p>You need to sell your product ideas. Know your audience. Do not brainstorm in large, unprepared groups, as you tend to do sometimes. Think before you speak.</p>
<h3 id="the-little-things">The Little Things</h3>
<p>Organization is the product manager's secret weapon. Don't forget about all the little bugs that pile up.</p>
<h3 id="feedback">Feedback</h3>
<p>Get feedback on use cases and requirements early and often. However, don't design features by committee.</p>
<h3 id="feedback-requires-a-response">Feedback requires a response</h3>
<p>You can't ask for opinions and then do nothing with them. You either have to say that you won't do it, or do it. Never tell someone that you will include a feature if that's not true.</p>
<h3 id="dont-get-offended">Don't get offended</h3>
<p>Product feedback is just about the most useful thing anyone can give you. If
they don't like it, don't take offense. Improve your product.</p>
<h1 id="people">People</h1>
<h3 id="dont-be-dismissive">Don't be dismissive</h3>
<p>You sound dismissive. You don't mean to be, but you are. Go out of your way to watch for people's feelings.</p>
<h3 id="honesty">Honesty</h3>
<p>You're better off explaining why someone's idea isn't good than trying to placate them and lying to them. It might be hard in the short term, but it's easier than the alternative.</p>
<h3 id="talk-more">Talk more</h3>
<p>Your biggest failure to date was because you didn't talk often enough with one of your coworkers. Don't repeat that mistake.</p>
<h3 id="pressure-is-a-choice">Pressure is a choice</h3>
<p>People will try to pressure you, but that is their failing, not yours. Pressuring others is rarely helpful. Choose not to be pressured, and you won't be.</p>
<h3 id="trust-your-gut-with-hiring">Trust your gut with hiring</h3>
<p>Hemming and hawing is rarely productive. If you're not convinced by a candidate, don't hire them.</p>
<h3 id="dont-worry-about-negative-people">Don't worry about negative people</h3>
<p>Some people are just negative. They don't care about you. You can't make everyone like you. Spend time and energy elsewhere.</p>
<h3 id="understand-before-dismissing">Understand before dismissing</h3>
<p>You have to deal with people you don't like. Why are they acting the way they are? Listen to them and try to understand them. You can help make them productive. You will both benefit.</p>
<h3 id="give-positive-feedback">Give positive feedback</h3>
<p>You tend to focus on the negative feedback because it's actionable. Your team
needs to hear about the positive feedback too. Encouragement is important.</p>
<h3 id="interact">Interact</h3>
<p>It doesn't matter if your "process" means that you put stickies on a board or if you use a spreadsheet. You should be talking to the people that matter. This personal touch makes people feel like they're valued and important, which, of course, they are.</p>
<h3 id="build-a-team-youre-proud-of">Build a team you're proud of</h3>
<p>Every time you talk to investors, prospects, customers, your parents, your friends, your colleagues, anyone, you should want to brag about your team. They're the smartest, the hardest working, the fastest, the most agile, or however you would describe them. You should be proud.</p>
<h3 id="dont-tweet-that">Don't tweet that</h3>
<p>Twitter has a larger reach than you think. Negativity will get back to you. Hating looks bad on you.</p>
<h3 id="forget-about-it">Forget about it</h3>
<p>Don't obsess when you make a mistake. Move on. But write it down here first.</p>
<h3 id="deal-with-it">Deal with it</h3>
<p>You have to understand where your colleagues are coming from, even if they refuse to understand your point of view. Be better.</p>
<h3 id="stress">Stress</h3>
<p>Your stress creeps up on you. You don't even notice it. Take some time and slow down. How are you feeling?</p>
<h3 id="time">Time</h3>
<p>How to spend your time is a complicated decision. Try to take each day one hour at a time, each week one day at a time, and each month one week at a time.</p>
<h3 id="reach-out">Reach out</h3>
<p>Reconnect with your old colleagues more. Call up old friends. You're not that
busy.</p>
<h3 id="network-more">Network more</h3>
<p>It's okay to dislike networking, but it's good for you and the business. Slog through it.</p>
<h3 id="write-more">Write more</h3>
<p>It's therapeutic and fun. You don't do it enough.</p>
<h3 id="listen">Listen</h3>
<p>Actually listen to people when they talk. Sometimes, that's all they need.</p>
<h1 id="business">Business</h1>
<h3 id="choose-a-vertical">Choose a vertical</h3>
<p>Many startups want to make solutions that fit everyone or every business because the potential market size is so large. Make your products generic but make your marketing and sales specific, at least at first.</p>
<h3 id="test-your-vertical">Test your vertical</h3>
<p>Have clear goals for the vertical you've chosen. If they aren't met, adjust. Do not keep moving the goal posts.</p>
<h3 id="listen-to-your-vision">Listen to your vision</h3>
<p>Feedback from customers is to fill in the details about your product, not to <a href="https://en.wikipedia.org/wiki/Gal%C3%A1pagos_syndrome">drive the vision</a>. Vision should come from your experience, your understanding of the market, and what you want to see.</p>
<h3 id="continually-check-your-assumptions">Continually check your assumptions</h3>
<p>Is our product for laypeople or technical users? How customizable should it be? How should we talk about our product? What you've done in the past is not always what you should do in the future. Reevaluate and adjust. Don't be afraid to change.</p>
<h3 id="try-to-ignore-the-competition">Try to ignore the competition</h3>
<p>It's not always possible or practical to ignore everything about your competitors, but don't fret about what they're doing too much. Focus on your customers and your prospects.</p>
<h3 id="innovation">Innovation</h3>
<p>Not all innovation needs to be technical. You can innovate on pricing, on how you talk about your product, and on how you portray your business. Think outside of tech more.</p>
<h3 id="differentiation-is-critical">Differentiation is critical</h3>
<p>No matter how many competitors you have, you have to differentiate yourself. What's unique about your product? Better and easier-to-use isn't good enough.</p>
<h1 id="marketing">Marketing</h1>
<h3 id="marketing-is-important">Marketing is important</h3>
<p>You need someone in charge of your company's public face. You don't have enough time to do it yourself, you don't. Hire someone for it.</p>
<h3 id="youre-not-marketing-if-youre-not-targeting">You're not marketing if you're not targeting</h3>
<p>There's really no such thing as general purpose marketing. The best marketing <a href="http://www.npr.org/sections/money/2015/05/29/410589806/episode-628-this-ads-for-you">targets</a> and segments.</p>
<h3 id="conferences-arent-worth-it-if-youre-not-speaking">Conferences aren't worth it if you're not speaking</h3>
<p>Speaking is a great way to position yourself and your company in your industry. Unless you are learning a lot from a conference, it's not worth it to go if you're not speaking. Hustling at conferences is too difficult to be worthwhile.</p>
<h3 id="avoid-marketing-speak">Avoid marketing-speak</h3>
<p>Nothing turns prospects off faster than marketing babble. Speak and write plainly.</p>
<h3 id="tooling-matters">Tooling matters</h3>
<p>There are some great marketing tools nowadays. The privacy implications of some of them are worrisome, but they are extremely useful.</p>
<h1 id="sales">Sales</h1>
<h3 id="its-not-what-you-say-its-how-you-say-it">It's not what you say, it's how you say it</h3>
<p>People remember how you speak. Ask how a salesperson is doing and the answer is always the same. They're doing great. And they'll make you believe it.</p>
<h3 id="energy">Energy</h3>
<p>Good salespeople are very excited. They bring momentum and energy to every room they walk into.</p>
<h3 id="find-unspoken-objections">Find unspoken objections</h3>
<p>Even if a prospect doesn't mention something (like that they would prefer a month-to-month deal to a multi-year deal), you can lose the deal because of it. Find what they haven't mentioned and test your sales assumptions.</p>
<h3 id="dont-raise-objections-yourself">Don't raise objections yourself</h3>
<p>Unless a piece of information is vital for them to know, don't bring it up. You don't know how they'll react to superfluous information.</p>
<h3 id="sell-to-sales">Sell to Sales</h3>
<p>Salespeople live on confidence. They need to know that the product they're selling is the best. Don't lead with all the problems that still exist. Talk about how it's great and will improve constantly.</p>
<h3 id="selling-starts-at-no">Selling starts at "No."</h3>
<p>Once a prospective customer tells you that they won't buy, that's when the selling starts. Don't give up.</p>
<h3 id="sales-objections-are-part-of-the-job">Sales objections are part of the job</h3>
<p>Get a list of them and have talking points for each one. Don't argue, but instead show them that there is more than one right answer.</p>
<h3 id="when-hiring-look-for-the-light-bulb">When hiring, look for the light bulb</h3>
<p>You're not great at hiring salespeople, but there's one thing you can look for: when they really understand what you've told them. Skip salespeople that don't get it. They won't be good at listening to customers, either.</p>
<h3 id="selling-face-to-face-is-way-more-effective-than-over-the-phone">Selling face-to-face is way more effective than over the phone</h3>
<p>Especially early on, it's worth the cost. Go meet your prospects, shake their hand, and build a relationship.</p>
<h3 id="check-your-ego">Check your ego</h3>
<p>Do not get offended by prospects. Don't take it personally.</p>
<h3 id="salespeople-need-leadership">Salespeople need leadership</h3>
<p>More so than any other group of people at your company, they need to have a leader to follow. They need a well-defined path and to see that you're making progress on it.</p>
<h3 id="get-their-feedback">Get their feedback</h3>
<p>You will offend people if you don't ask for their opinion on important matters.</p>
<h1 id="investors">Investors</h1>
<h3 id="what-matters-most-is-the-narrative">What matters most is the narrative</h3>
<p>How good your slide deck looks, your demo, even the customers you have aren't as important as showing investors the path to a successful company.</p>
<h3 id="beware-of-the-question-how-big-is-the-market">Beware of the question "How big is the market?"</h3>
<p>There are two types of market sizes for early stage companies: the total addressable market and the target market. Founders assume that if they conflate these two and present the larger number as an answer to this question, the investors will be so impressed by how big that number is that they'll invest. The total addressable market for most good early stage startups will be almost completely unknown, so the number given is likely to be off by at least an order of magnitude. Instead, say that while you think the total addressable market is quite large, the target market is $X/year and you can hit this market easily because of your sales channels, marketing, and so on.</p>
<h3 id="casual-meetings-and-dinners-arent">Casual meetings and dinners aren't</h3>
<p>Investors are constantly evaluating, even if they aren't fully conscious of it. Their opinion of you and your company can change faster than you think. You need to be on your game. Don't drink too much around them. Your office should be ready to show.</p>
<h3 id="dont-hedge">Don't hedge</h3>
<p>Investors aren't interested in nuance. Don't tell them about all your doubts. Don't hedge against your own success.</p>
<h3 id="dont-talk-to-them-like-customers">Don't talk to them like customers</h3>
<p>Don't make the mistake of telling investors the same exact things as customers. They might assume the wrong things. Do explain what resonates with customers and why.</p>
<h3 id="remember-that-vcs-always-use-video-conferencing">Remember that VCs always use video conferencing</h3>
<p>So put some nice clothes on for meetings.</p>
<h3 id="make-a-product-for-your-users-not-your-investors">Make a product for your users, not your investors</h3>
<p>While investors have good ideas, they don't know your business like you do. Don't be afraid to stand up to them when it's important.</p>
<h3 id="dont-talk-down-to-them">Don't talk down to them</h3>
<p>Most investors will surprise you with how technical they are. Never assume
they're unaware of some popular tech. Like everyone else, they appreciate it
when you acknowledge their intelligence.</p>Genius Blocker2016-04-19T18:39:00-07:002016-04-19T18:39:00-07:00Sean Cassidytag:www.seancassidy.me,2016-04-19:/genius-blocker.html<p>I was listening to <a href="https://gimletmedia.com/episode/61-baby-king/">Reply All</a> today and they were talking about the
context of <a href="https://twitter.com/SaraMorrison/status/718831347259846656">this Sara Morrison tweet</a> about <a href="http://genius.com/web-annotator">Genius's web annotation
tool</a>. Apparently, a lot of people are mad about Genius essentially
adding a comments section to people's websites and having each word and
sentence criticized when they didn't agree to it<sup id="fnref:footnote"><a class="footnote-ref" href="#fn:footnote">1</a></sup>. I hate comments on
sites so I totally understand.</p>
<p>Naturally, I wanted to look into it to see if I could put control back in
the hands of the content creators. Turns out it's real easy. Here's the code:</p>
<div class="codehilite"><pre><span></span><span class="kd">var</span> <span class="nx">geniusBlockerObserver</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MutationObserver</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">mutations</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">mutations</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">mutation</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="nx">mutation</span><span class="p">.</span><span class="nx">addedNodes</span><span class="p">.</span><span class="nx">length</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">mutation</span><span class="p">.</span><span class="nx">addedNodes</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
                <span class="kd">var</span> <span class="nx">item</span> <span class="o">=</span> <span class="nx">mutation</span><span class="p">.</span><span class="nx">addedNodes</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
                <span class="k">if</span> <span class="p">(</span><span class="nx">item</span><span class="p">.</span><span class="nx">tagName</span> <span class="o">!==</span> <span class="kc">undefined</span><span class="p">)</span> <span class="p">{</span>
                    <span class="k">if</span> <span class="p">(</span><span class="nx">item</span><span class="p">.</span><span class="nx">tagName</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s2">"GENIUS"</span><span class="p">))</span> <span class="p">{</span>
                        <span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">removeChild</span><span class="p">(</span><span class="nx">item</span><span class="p">);</span>
                    <span class="p">}</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">});</span>
<span class="p">});</span>
<span class="nx">geniusBlockerObserver</span><span class="p">.</span><span class="nx">observe</span><span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">,</span> <span class="p">{</span><span class="nx">childList</span><span class="o">:</span> <span class="kc">true</span><span class="p">});</span>
</pre></div>
<p>Simply put this in a JavaScript file on your site, or put it between two
&lt;script&gt; tags and that's it! No more philistines commenting on your
beautifully purple prose. Here's a simple copy-paste solution:</p>
<div class="codehilite"><pre><span></span><span class="p">&lt;</span><span class="nt">script</span>
    <span class="na">src</span><span class="o">=</span><span class="s">"https://raw.githubusercontent.com/cxxr/GeniusBlocker/master/genius-blocker.js"</span><span class="p">&gt;&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
</pre></div>
<p>You can also grab the code for <a href="https://github.com/cxxr/GeniusBlocker">GeniusBlocker on GitHub</a>.</p>
<p><a href="https://github.com/cxxr/GeniusBlocker"><img style="position: absolute; top: 0; left: 0; border: 0;" src="https://camo.githubusercontent.com/567c3a48d796e2fc06ea80409cc9dd82bf714434/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f6c6566745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_left_darkblue_121621.png"></a></p>
<div class="footnote">
<hr />
<ol>
<li id="fn:footnote">
<p>Whoa! Annotate this sentence as "Sloppy writing", please. <a class="footnote-backref" href="#fnref:footnote" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>LostPass2016-01-16T08:19:00-08:002016-01-16T08:19:00-08:00Sean Cassidytag:www.seancassidy.me,2016-01-16:/lostpass.html<p><strong>Updated 2016-02-04</strong>: LastPass has
<a href="/images/lastpass_notification_nobutton.png">removed the button from
notifications</a>
and now requires <a href="https://lastpass.com/support.php?cmd=showfaq&id=10072">email confirmation for all logins from new IPs</a>.
This substantially mitigates LostPass, but does not eliminate it.</p>
<p>I have discovered a phishing attack against LastPass that allows an attacker to
steal a LastPass user's email, password, and even two-factor auth code, giving
full access to all passwords and documents stored in LastPass. </p>
<p>I call this attack LostPass. The code is available <a href="https://github.com/cxxr/lostpass">via GitHub</a>.</p>
<p>LostPass works because LastPass displays messages in the browser that attackers
can fake. Users can't tell the difference between a LostPass message and
the real LastPass one because there is no difference. It's pixel-for-pixel the same
notification and login screen.</p>
<p>I discussed LostPass at <a href="https://shmoocon.org">ShmooCon 2016</a>. You can <a href="https://raw.githubusercontent.com/cxxr/lostpass/master/lostpass_shmoocon_slides.pdf">read my slides
(PDF)</a>, or you can <a href="https://archive.org/details/Lostpass">watch the video</a>.</p>
<h1 id="pixel-perfect-phishing">Pixel-perfect Phishing</h1>
<p>A few months ago, <a href="https://en.wikipedia.org/wiki/LastPass">LastPass</a> displayed a message on my browser that
my session had expired and I needed to log in again. I hadn't used LastPass in
a few hours, and hadn't done anything that would have caused me to be logged
out. When I went to click the notification, I realized something: it was
displaying this in the browser viewport. An attacker could have drawn this
notification.</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lastpass_notification.png" width="100%" alt="LastPass error notification"></a></p>
<p>Any malicious website could have drawn that notification. Because
LastPass trained users to expect notifications in the browser viewport, they
would be none the wiser. The LastPass login screen and two-factor prompt are
drawn in the viewport as well.</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lastpass_login.png" width="100%" alt="LastPass login screen"></a>
<a href="https://www.seancassidy.me/static/images/"><img src="/images/lastpass_2fa.png" width="100%" alt="LastPass two-factor screen"></a></p>
<p>Since LastPass has an API that can be accessed remotely, an attack materialized
in my mind.</p>
<h2 id="the-attack">The Attack</h2>
<p>Here are the steps for LostPass, in order.</p>
<h4 id="visit-the-malicious-site">Visit the malicious site</h4>
<p>Get the victim to go to a malicious website that looks benign, or a real
website that is vulnerable to XSS. This is where we'll deploy lostpass.js.
Unlike most phishing attacks, users won't be on their guard because this isn't
supposed to be a secure website. It could be a funny video or image, even.</p>
<h4 id="check-for-lastpass-and-show-the-notification">Check for LastPass and show the notification</h4>
<p>If they have LastPass installed, show the login expired notification and log
the user out of LastPass. LastPass is vulnerable to a logout <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a>, so
any website can log any user out of LastPass. This will make it appear to the
user that they are truly logged out.</p>
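<p>For readers who haven't run into CSRF before, here is a minimal sketch of the
usual server-side defense: a state-changing request such as a logout is only
honored when it carries an unguessable token tied to the user's session. This
is a generic illustration, not LastPass's code; the function names are
hypothetical.</p>

```python
import hmac
import secrets


def new_csrf_token() -> str:
    """Create an unguessable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def logout_allowed(session_token: str, submitted_token: str) -> bool:
    """Honor the logout only if the request carries the session's token.

    A page on another origin can trigger the request, but it cannot read
    the token, so it cannot submit the right value.
    """
    if not session_token or not submitted_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())


if __name__ == "__main__":
    token = new_csrf_token()
    print(logout_allowed(token, token))    # legitimate request
    print(logout_allowed(token, "guess"))  # forged request
```

<p>Without a check like this, any website can make the victim's browser send the
logout request on their behalf.</p>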
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lostpass_notification.png" width="100%" alt="LostPass notification screen"></a></p>
<h4 id="direct-the-victim-to-the-login-page">Direct the victim to the login page</h4>
<p>Once the victim clicks on the fake banner, direct them to an
attacker-controlled login page that looks identical to the LastPass one. This
is the login page for Chrome.</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lostpass_login.png" width="100%" alt="LostPass login screen"></a></p>
<p>Notice the domain, "chrome-extension.pw". This looks similar to the Chrome
protocol for real extensions "chrome-extension". There is an <a href="https://code.google.com/p/chromium/issues/detail?id=453093">open issue in
Chromium</a> to address this.</p>
<h4 id="get-the-credentials">Get the credentials</h4>
<p>The victim will enter their password and send the credentials to the
attacker's server. The attacker's server will check if the credentials are
correct by calling LastPass's API. The API will inform us if two-factor
authentication is required.</p>
<p>If the username and password are incorrect, we'll redirect the user back to the
malicious website, but this time, the LostPass notification bar will say
"Invalid Password".</p>
<p>If the user has two-factor authentication, redirect them to a two-factor
prompt, like so:</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lostpass_2fa.png" width="100%" alt="LostPass 2fa screen"></a></p>
<h4 id="download-the-vault">Download the vault</h4>
<p>Once the attacker has the correct username and password (and two-factor
token), download all of the victim's information from the LastPass API. We can
install a backdoor in their account via the emergency contact feature, disable
two-factor authentication, add the attacker's server as a "trusted device".
Anything we want, really.</p>
<h2 id="implications">Implications</h2>
<p>These steps mirror the exact path that LastPass follows when a user is logged
out remotely. LostPass mimics steps 2 through 7.</p>
<p>Some things to note about why this is so effective:</p>
<ul>
<li>Many responses to the phishing problem are "Train the users", as if it was
their fault that they were phished. Training is not effective at combating
LostPass because there is little to no difference in what is shown to the
user</li>
<li>LastPass's login workflow is complex and somewhat buggy. Sometimes it shows
in-viewport login pages, and sometimes it shows them as popup windows</li>
<li>It is easy to detect LastPass and it was even easier to find the exact HTML
and CSS that LastPass uses to show notifications and login pages</li>
<li>It even phishes for the two-factor auth code, so 2FA is no help</li>
</ul>
<p>See <a href="https://github.com/cxxr/lostpass">the GitHub repository</a> for the code itself.</p>
<h1 id="faq">FAQ</h1>
<p>Here I've collected a list of questions that I've been asked about this.</p>
<h2 id="what-browsers-and-operating-systems-does-it-work-on">What browsers and operating systems does it work on?</h2>
<p>The attack works best against the Chrome browser because LastPass uses an HTML
login page there. In Firefox, LastPass pops up a window for its login page, so it
looks like whatever operating system you're on. I have experimental support for
Firefox on OS X and Windows 8 in LostPass, but it is not enabled by default.</p>
<h2 id="does-this-work-against-lastpass-40">Does this work against LastPass 4.0?</h2>
<p>Yes, I developed it specifically to work against LastPass 4.0. I did not
include any version detection information. </p>
<h2 id="what-can-i-do-to-safeguard-myself-or-my-company">What can I do to safeguard myself or my company?</h2>
<p>Here is a list of suggestions in no particular order:</p>
<ul>
<li>Ignore notifications in the browser window</li>
<li>Enable IP restriction (only available to paid plans)</li>
<li>Disable mobile login (although other attacks could use non-mobile API)</li>
<li>Log all logins and failures</li>
<li>Inform your employees of this potential attack</li>
</ul>
<h2 id="does-two-factor-authentication-help">Does two-factor authentication help?</h2>
<p><strong>Update</strong>:
<a href="https://lastpass.com/support.php?cmd=showfaq&id=10072">LastPass now requires email confirmation for all new logins</a>,
regardless of two-factor auth. The original answer to this question remains
below.</p>
<p>No. In fact, two-factor authentication makes this attack significantly
<em>easier</em>.</p>
<p>By default, LastPass sends an email confirmation when a new IP address attempts
to log in to LastPass. This should stop the attack almost entirely, but it
doesn't. According to <a href="https://lastpass.com/support.php?cmd=showfaq&id=9222">LastPass's documentation</a>, the confirmation email
is only sent if you <em>don't</em> have two-factor authentication enabled.</p>
<p>Since LostPass also phishes for the two-factor auth code, it bypasses the email
confirmation step.</p>
<p>It is possible to make LostPass more effective against the case where it is
blocked by confirmation email (something like, "Please confirm your login via
email to continue"), but the attack was already potent enough.</p>
<h2 id="what-about-yubikeyu2fduo">What about Yubikey/U2F/Duo?</h2>
<p>I only checked Google Authenticator because that's what I had, but here's how
you can figure out if another two-factor authentication method would have helped: if
you can tell the attacker what they need to know, then it won't help. So if you
type in a token, it won't help. If you get a push notification that is approved
and you let the attacker in, it won't help.</p>
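<p>To see why a typed-in token doesn't help, it's worth looking at how such codes
are computed. Google Authenticator uses TOTP (RFC 6238). Below is a minimal
sketch of that algorithm with the default SHA-1 setting; the secret in the usage
example is the published RFC test key, not a real one. The code depends only on
the shared secret and the current time. Nothing in it is tied to the website
where you type it, so whoever receives it can replay it within the same time
window.</p>

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password as described in RFC 6238."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte
    # selects which 4 bytes of the MAC become the code.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    rfc_test_secret = b"12345678901234567890"
    # Any two moments inside the same 30-second window give the same code,
    # no matter who asks for it or where it is submitted.
    print(totp(rfc_test_secret, 30) == totp(rfc_test_secret, 59))
```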
<h2 id="how-can-i-check-if-ive-been-attacked">How can I check if I've been attacked?</h2>
<p>View your <a href="https://helpdesk.lastpass.com/your-lastpass-vault/account-history/">LastPass Account History</a> to inspect every login
attempt and the IP address it came from.</p>
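<p>If you have many login entries to go through, a small script can do the first
pass. This is only a sketch: it assumes you have already copied the entries into
a list of dictionaries with an <em>ip</em> field, which is a format I made up for
the example and not something LastPass exports. The addresses are from the
reserved documentation ranges.</p>

```python
import ipaddress


def unfamiliar_logins(events, known_networks):
    """Return the login events whose IP is outside every known network."""
    networks = [ipaddress.ip_network(n) for n in known_networks]
    flagged = []
    for event in events:
        address = ipaddress.ip_address(event["ip"])
        if not any(address in network for network in networks):
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    # Hypothetical history entries; replace with your own data.
    history = [
        {"time": "2016-01-10 09:12", "ip": "192.0.2.44"},
        {"time": "2016-01-11 03:57", "ip": "203.0.113.9"},
    ]
    office_and_home = ["192.0.2.0/24", "198.51.100.7/32"]
    for event in unfamiliar_logins(history, office_and_home):
        print("Check this login:", event["time"], event["ip"])
```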
<h2 id="what-are-some-alternatives-to-lastpass">What are some alternatives to LastPass?</h2>
<p><a href="http://notlastpass.rockettech.net/">Here are some alternatives to LastPass</a>. I have not researched
any of these alternatives and cannot guarantee that they're safer than LastPass.</p>
<p>Things to look at:</p>
<ul>
<li>Browser extensions are riskier than native applications</li>
<li>An API makes it easier to steal a lot of data</li>
<li>Store only frequently used and low risk data in a password manager</li>
</ul>
<h2 id="how-is-this-related-to-the-attack-from-2015-by-garcia-and-vigo">How is this related to the attack from 2015 by Garcia and Vigo?</h2>
<p>Garcia and Vigo published an attack called
"<a href="http://www.martinvigo.com/even-the-lastpass-will-be-stolen-deal-with-it/">Even the LastPass Will be Stolen, Deal with It!</a>". Their work is a
sophisticated client-side attack that relies on bad design choices that
LastPass made that make it vulnerable to compromised machines.</p>
<p>My work comes at LastPass from a different angle: you don't have access to a
LastPass user's machine. Instead, you trick the user into giving you their
credentials.</p>
<h2 id="did-you-hack-lastpass">Did you hack LastPass?</h2>
<p>No.</p>
<h2 id="why-did-you-develop-this-attack">Why did you develop this attack?</h2>
<p>I think that the security industry's view of phishing is naive at best,
negligent at worst. Phishing is the most dominant attack vector and is used
by everyone from run-of-the-mill cryptolocker types to APTs. Don't just take
it from me, though. Take it from <a href="https://twitter.com/thegrugq/status/649164150858321921">the grugq</a>:</p>
<blockquote>
<p>It's surprising how critical good phishing technique is with these APT
attacks. Effective phishing is more important than 0day.</p>
</blockquote>
<p>The standard refrain is that we need better user training. That is simply not
good enough.</p>
<p>The real solution is designing software to be phishing resistant. Just like we
have anti-exploitation techniques, we need anti-phishing techniques built into
more software. Software security evaluations should also include how easy it is
to phish said software.</p>
<h2 id="why-are-you-releasing-this-as-a-tool-wont-bad-people-use-it-against-me">Why are you releasing this as a tool? Won't bad people use it against me?</h2>
<p>Unlike most exploits, this attack requires no sophisticated knowledge. A
simple right-click will get you the HTML. A tiny bit of JavaScript will glue
the pieces together. As soon as I published details of this attack, criminals
could make their own version in less than a day. I am publishing this tool so
that companies can pen-test themselves to make an informed decision about this
attack and respond appropriately.</p>
<p>This is backwards for most vulnerability disclosures. Most vulnerabilities are
easy-to-fix and hard-to-exploit. This is hard-to-fix and easy-to-exploit, so I
felt that a tool release was appropriate. There is also precedent for LastPass
attacks: Garcia and Vigo released a <a href="https://github.com/rapid7/metasploit-framework/blob/master/modules/post/multi/gather/lastpass_creds.rb">metasploit module</a> for their
attack.</p>
<h2 id="did-you-tell-lastpass">Did you tell LastPass?</h2>
<p>Yes. I informed them in November, and they acknowledged the bug in December.</p>
<p>This has been a long and confusing issue. At first LastPass understood this bug
to be mainly a result of the logout CSRF. Then they suggested it wouldn't
work because of the email confirmation step. The GM of LastPass said that
LastPass, "can confirm this is a phishing attack, not a vulnerability in
LastPass." I obviously disagree.</p>
<p>One of the fixes they implemented for LostPass was to warn users when they
type their master password into a website. However, they display a
warning message in the browser viewport, like all of their messages. On an
attacker-controlled website, it is trivial to detect when this notification is
added. Then the attacker can do whatever. In LostPass, I suppress the
notification and fire off a request to an attacker server to log the master
password.</p>
<p>We as an industry do not respond to phishing attacks well. I do not blame
LastPass for this; they are like everyone else. We need to take a long look at
phishing and figure out what to do about it. In my view, it's just as bad as, if
not worse than, many remote code execution vulnerabilities, and should be
treated as such.</p>
<h2 id="is-what-youre-doing-right">Is what you're doing right?</h2>
<p>I think informing users about security concerns in the products they use is
important. Too often security researchers kowtow to corporations by not telling
users about vulnerabilities they should know about. I think of security
researchers (a group I do not identify with) as having a similar ethical code
to journalists: the public has a right to know. Your interviewee (target)
does not get to dictate how the interview (research) is published or disclosed.</p>
<p>Your own judgement is paramount.</p>
<h2 id="can-operating-systems-or-browsers-do-something-to-address-this-class-of-bugs">Can operating systems or browsers do something to address this class of bugs?</h2>
<p>Yes. </p>
<p>To spoof the "chrome-extension" protocol, I bought the domain
"chrome-extension.pw", which looks close enough. Connecting to
chrome-extension.pw over HTTP makes it look pretty similar to the built-in
protocol. There is an <a href="https://code.google.com/p/chromium/issues/detail?id=453093">open issue in Chromium</a> to address this.</p>
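<p>The difference that the address bar hides is easy to see once the URL is
parsed. Here is a small sketch using Python's standard URL parser; the extension
ID in the example is made up. A real extension page has
<em>chrome-extension</em> as its scheme, while the lookalike merely has it as
part of an ordinary hostname served over HTTP.</p>

```python
from urllib.parse import urlsplit


def describe_origin(url: str) -> str:
    """Say whether a URL is a browser extension page or an ordinary web page."""
    parts = urlsplit(url)
    if parts.scheme == "chrome-extension":
        return f"extension page (id {parts.hostname})"
    return f"web page served by {parts.hostname} over {parts.scheme}"


if __name__ == "__main__":
    # Hypothetical extension ID, for illustration only.
    print(describe_origin("chrome-extension://abcdefghijklmnop/login.html"))
    print(describe_origin("http://chrome-extension.pw/login.html"))
```

<p>A person glancing at the address bar sees nearly the same string in both
cases, which is the whole problem.</p>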
<p>It is harder to spoof in Firefox, where I had to draw each OS's native widget
manually using HTML and CSS. They're not perfect, either, but it's pretty
close. Here's an image of LastPass and LostPass for Firefox on Windows 8
side-by-side. Which one is which?</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/lastpass_firefox.png" width="100%" alt="LastPass Firefox login"></a></p>
<p>Since the browser viewport can draw anything with pixels, we need to think
about how we authenticate native windows visually. UX is a very important
security concern. UAC's dimming of the screen in Windows is a step in the right
direction.</p>
<h1 id="more-information">More information</h1>
<p>For more information, read <a href="https://raw.githubusercontent.com/cxxr/lostpass/master/lostpass_shmoocon_slides.pdf">my ShmooCon slides</a>,
<a href="https://archive.org/details/Lostpass">watch the video</a>, or browse <a href="https://github.com/cxxr/lostpass">the source code to LostPass itself</a>. You
can also <a href="mailto:sean@seancassidy.me">email me</a> or
<a href="https://twitter.com/sean_a_cassidy">tweet at me</a>.</p>
<p><a href="https://github.com/cxxr/lostpass"><img style="position: absolute; top: 0; left: 0; border: 0;" src="https://camo.githubusercontent.com/567c3a48d796e2fc06ea80409cc9dd82bf714434/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f6c6566745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_left_darkblue_121621.png"></a></p>Privilege2015-09-17T06:12:00-07:002015-09-17T06:12:00-07:00Sean Cassidytag:www.seancassidy.me,2015-09-17:/privilege.html<p>January 2006: a friend and I were attending <a href="http://shmoocon.org/">ShmooCon</a>, a hacker
convention in DC. It was our first one ever. Being high school students, we
had to take the LIRR to Penn Station, and then a subway to the Port Authority
to grab a bus to DC, since a plane ticket was too expensive. It took forever.</p>
<p>Every year, ShmooCon holds a charity event that's hacker-themed. That year's
event was the hacker arcade: drop in a quarter, play a game, earn points
towards a raffle. All proceeds to charity. My friend and I built a machine
that would let you play <a href="http://www.nethack.org/">Nethack</a> for 25¢. To accept money and
play the game, we built a container that would hold the money and sort coins.
It had a switch inside that, when depressed, pressed a key on an attached
keyboard. The game code would wait for that keypress before starting. It was
simple. It accepted correctly-sized <a href="https://en.wikipedia.org/wiki/Slug_%28coin%29">slugs</a>, but we figured that was
pretty unlikely at a charity event.</p>
<p>This is what it looked like:</p>
<p><a href="https://www.seancassidy.me/static/images/"><img alt="Picture of our Nethack arcade machine coin
mechanism" src="https://www.seancassidy.me/static/images/" /></a></p>
<p>You can actually see that the function keys at the top are disabled. We
disabled them because that's what the coin-op mechanism used to register a
successful coin deposit in my horrible Perl ncurses program<sup id="fnref:shmoocon2006"><a class="footnote-ref" href="#fn:shmoocon2006">1</a></sup>.
<a href="https://www.seancassidy.me/static/images/">Here's some detail of the inside</a>; you can
see that there are different slots for different-sized coins, and
<a href="https://www.seancassidy.me/static/images/">here's a picture of the instructions</a>.
It was to my great satisfaction that <a href="http://druid.caughq.org/presentations/turbo/My-Handle/img8.html">someone hacked our game</a>, despite
our "security precautions". It was my first lesson in designing secure
software.</p>
<p>In 2006, 9/11 was still burned into our minds. There were soldiers with rifles
and dressed in fatigues in airports and on subway platforms. This was normal.
We carried this metal-tube-with-wires onto the LIRR, and then walked through
the streets of New York City, and onto a crowded subway platform, past
armed soldiers. One holding an M4A1 even nodded at me as we passed.</p>
<p>I was carrying a metal cylinder with wires poking out of it. No one gave us a
second glance. What were they thinking? We nervously joked about it afterwards.
That was the first time we even thought that our Nethack machine might look like
a bomb. Maybe they could tell it wasn't a bomb from 20 feet away, but I doubt
it.</p>
<p>My friend and I are white. We don't have olive skin, nor do we look Muslim. We
would never have made it to ShmooCon if we did. I didn't realize it at the
time, but we were very privileged to be white. It gave us a freedom we didn't
even know others lacked. When I hear about <a href="http://www.bbc.com/news/world-us-canada-34266389">Ahmed Mohamed</a> being
arrested for bringing a breadboard clock into school, I think about when I
carried a metal cylinder with wires poking out past armed soldiers onto a
crowded subway train, and I worry.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:shmoocon2006">
<p>If anyone has any pictures of our setup from the ShmooCon
2006 Hacker Arcade, please let me know! I'd love to see them. <a class="footnote-backref" href="#fnref:shmoocon2006" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Your Own Verifiable Hardware RNG with bladeRF SDR2015-07-09T07:35:00-07:002015-07-09T07:35:00-07:00Sean Cassidytag:www.seancassidy.me,2015-07-09:/your-own-verifiable-hardware-rng-with-bladerf-sdr.html<p>You need randomness. A lot of it. Good quality and fast. </p>
<p>If you run any servers which use SSL, you need somewhere around 108 bytes of
randomness for each connection. If you don't have enough, or you use a biased
source, your private RSA keys might be <a href="https://factorable.net/">trivially factorable</a>.
<a href="http://www.iacr.org/conferences/crypto2012/slides/11-2-Hughes.pdf">Thousands of private RSA keys have been recovered</a> by researchers.
If you let others generate your RSA keys they can be <a href="http://kukuruku.co/hub/infosec/backdoor-in-a-public-rsa-key">easily backdoored and
there is no way for you to know</a>.</p>
<p>Where does this randomness come from? On Linux, it comes from a pseudorandom
number generator (PRNG) called <em>/dev/urandom</em>. PRNGs are just
algorithms that have an internal state and produce numbers indistinguishable
from randomness to those unaware of that state. If you knew the state, you
could figure out what random numbers were next in the stream.</p>
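<p>Here is a small demonstration of that last point. It uses Python's built-in
generator (a Mersenne Twister, which is not cryptographic and is not what
<em>/dev/urandom</em> uses), so treat it as an illustration of the idea only:
anyone who obtains a copy of the internal state can reproduce every number that
follows.</p>

```python
import random


def predict_from_leaked_state(count=5):
    """Show that a copy of a PRNG's state predicts its future output."""
    victim = random.Random()  # seeded from the operating system
    for _ in range(10):
        victim.getrandbits(32)  # the generator has been in use for a while

    leaked_state = victim.getstate()  # the attacker learns the state...

    attacker = random.Random()
    attacker.setstate(leaked_state)  # ...and loads it into their own copy

    predicted = [attacker.getrandbits(32) for _ in range(count)]
    actual = [victim.getrandbits(32) for _ in range(count)]
    return predicted, actual


if __name__ == "__main__":
    predicted, actual = predict_from_leaked_state()
    print(predicted == actual)  # prints True
```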
<p>There's another PRNG on Linux and other Unix-like systems that some people
mistakenly think is a true RNG, <em>/dev/random</em>. It's the exact same as
<em>/dev/urandom</em> except for one key difference: it measures how much entropy it
has remaining, and it will block until it has more. The primary entropy pool
will be empty when its drained faster than its restored.</p>
<p>Restoring entropy to the primary pool is done through events like mouse
activity or network activity. The more events, the more entropy is in the
pool. Events must be generated externally and must be able to be measured.
For instance, let's say your server receives the following packets:</p>
<div class="codehilite"><pre><span></span><span class="err">Timestamp (ms) Port Size</span>
<span class="err">------------------------------------------</span>
<span class="err">1410966238020 TCP port 23415 4522 bytes</span>
<span class="err">1410966238021 TCP port 80 40 bytes</span>
<span class="err">1410966238193 TCP port 80 9291 bytes</span>
<span class="err">1410966238261 TCP port 23415 4522 bytes</span>
<span class="err">1410966238311 UDP port 3241 243 bytes</span>
</pre></div>
<p>And now you want to turn this information into entropy. You can't just take the
size of the packets or the port number and feed them in as random numbers.
They're certainly not random, and attackers can control them by just sending
you packets. You can't use the data from the packets for the same reason.</p>
<p>You could, however, use the timestamp. But you can't use the entire timestamp
as a random number, because the bits of 1410966238020 are not all equally
random. The more significant digits, for instance the leading 1, change on
the order of decades. The digits at the end change every millisecond. It's
hard for attackers to predict the least significant bit, whether the time will
be even or odd.</p>
<div class="codehilite"><pre><span></span><span class="err">|---- MSB, least random</span>
<span class="err">v </span>
<span class="err">000000010100100010000100001001000011001101000100</span>
<span class="err"> ^</span>
<span class="err"> LSB, most random ------|</span>
</pre></div>
<p>And this is what the Linux kernel and other operating systems do to seed their
entropy pool: they take the LSB of mouse movement timing, packet arrival times,
disk read times, and so on. No attacker could predict something so minute.
Thermal randomness is enough to throw off such measurements, especially when
they're measured in nanoseconds.</p>
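<p>As a rough sketch of the bit selection described above, here are the five
packet timestamps from the earlier table with their least significant bits
pulled out. A real kernel does much more than this (it mixes many timing sources
together and estimates entropy conservatively), so this shows only the idea.</p>

```python
# Arrival times in milliseconds, taken from the packet table above.
PACKET_TIMESTAMPS_MS = [
    1410966238020,
    1410966238021,
    1410966238193,
    1410966238261,
    1410966238311,
]


def to_bits(timestamp: int, width: int = 48) -> str:
    """Render a timestamp as a fixed-width binary string, MSB first."""
    return format(timestamp, f"0{width}b")


def least_significant_bits(timestamps):
    """Keep only the lowest bit of each timestamp (even or odd)."""
    return [t & 1 for t in timestamps]


if __name__ == "__main__":
    print(to_bits(PACKET_TIMESTAMPS_MS[0]))
    print(least_significant_bits(PACKET_TIMESTAMPS_MS))
```

<p>The leading bits are identical for every packet, while the last bit is the one
an attacker would have to guess.</p>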
<p>If we want to have a continual source of good randomness, we need a way to
get some LSBs fast. A popular method in the past has been sound cards with no
input, but there are a few problems with this method:</p>
<ol>
<li>The sampling rate is quite low, so there's no way to get a very large amount
of data.</li>
<li>If you plug in a strong, known source into the sound card port you can
force the RNG to known values. Use epoxy or else.</li>
<li>Servers don't have sound cards anymore.</li>
</ol>
<p><a href="https://en.wikipedia.org/wiki/RdRand">RdRand</a>, available on Intel Ivy Bridge CPUs, is another hardware
random number generator. Originally it was used as the only source of
randomness in <em>/dev/random</em> in FreeBSD, while in Linux it was used in
conjunction with a PRNG. The FreeBSD developers said that <a href="http://arstechnica.com/security/2013/12/we-cannot-trust-intel-and-vias-chip-based-crypto-freebsd-developers-say/">because of the high
likelihood of backdoors in hardware RNGs, they could not continue using it
without a PRNG</a>.</p>
<p>We can do better. The perfect hardware RNG would be cheap, fast, and
verifiable. I think that I've found just that.</p>
<h1 id="the-setup">The setup</h1>
<p>I bought a <a href="http://www.nuand.com/blog/product/bladerf-x40/">bladeRF</a> back when it was a Kickstarter project because I
wanted to experiment with RF, specifically GSM. My experiments with that
will probably be in another blog post, but I really liked how everything with
the bladeRF is open source, including the FPGA HDL code. Its sampling rate is
miles ahead of other SDRs, so I knew it could do everything I wanted it to.</p>
<p>After reading about the <a href="https://en.wikipedia.org/wiki/Dual_EC_DRBG">backdoors in Dual_EC_DRBG</a>, I wanted to
know more about hardware RNGs. I stumbled upon <a href="https://github.com/pwarren/rtl-entropy/">rtl-entropy</a>, a
project that uses RF sources for randomness. I plugged in my
<a href="http://www.amazon.com/gp/product/B0018PS4O0/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B0018PS4O0&linkCode=as2&tag=reamorpap-20&linkId=JV6WTNASGD3ESPBW">handy antenna</a>, compiled <em>brf_entropy</em>, and set to work.</p>
<p>If you don't want to buy a bladeRF, rtl-entropy also supports the
<a href="http://www.amazon.com/gp/product/B00MWGV16M/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00MWGV16M&linkCode=as2&tag=reamorpap-20&linkId=MOC4VDDZCPIXARSR">RTL-SDR</a>, a cheap software-defined radio that can be used for this
purpose. You'll need to change the <em>brf_entropy</em> command to <em>rtl_entropy</em> in
the following sections and your frequency range might be different.</p>
<h1 id="using-it">Using it</h1>
<p>First, we need to start up the entropy collector:</p>
<div class="codehilite"><pre><span></span><span class="err"># brf_entropy -f 850MHz -b</span>
</pre></div>
<p>Then we need to start up rngd which will sample from the randomness and add it
to the <em>/dev/random</em> pool.</p>
<div class="codehilite"><pre><span></span><span class="err"># rngd -r /var/run/rtl_entropy.fifo -W95%</span>
</pre></div>
<p>Now our bladeRF is connected to <em>/dev/random</em> directly. It won't block as much
because we're adding a lot of randomness to it. Here's a simple test:</p>
<div class="codehilite"><pre><span></span><span class="err"># timeout 60s /bin/bash -c "cat /dev/random > dev.random.brf"</span>
<span class="err"># killall brf_entropy rngd</span>
<span class="err"># timeout 60s /bin/bash -c "cat /dev/random > dev.random.no.brf"</span>
<span class="err"># timeout 60s /bin/bash -c "cat /dev/urandom > dev.urandom"</span>
</pre></div>
<p>This will create three files: one that's sourced from <em>/dev/random</em> for 60
seconds, with rngd feeding randomness from the bladeRF, another that's sourced
from <em>/dev/random</em> for 60 seconds but without the bladeRF, and a third from
<em>/dev/urandom</em> for comparison. Let's see how many bytes could be read in this
time.</p>
<div class="codehilite"><pre><span></span><span class="err"># ls -Al</span>
<span class="err">-rw-r--r-- 1 root root 21016277 Sep 6 18:29 dev.random.brf</span>
<span class="err">-rw-r--r-- 1 root root 22 Sep 6 18:31 dev.random.no.brf</span>
<span class="err">-rw-r--r-- 1 root root 769130496 Sep 6 18:35 dev.urandom</span>
</pre></div>
<p>We only got a measly 22 bytes from <em>/dev/random</em> with no hardware RNG.</p>
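<p>To put those file sizes in perspective, here is a quick calculation of the
throughput over the 60-second runs, and how many SSL connections per second each
source could feed at the roughly 108 bytes per connection mentioned earlier. The
numbers are the ones from the listing above; the per-connection figure is only
the rough estimate from the introduction.</p>

```python
SECONDS = 60
BYTES_PER_CONNECTION = 108  # rough figure from the introduction

# File sizes in bytes, copied from the directory listing above.
SIZES = {
    "dev.random.brf": 21016277,
    "dev.random.no.brf": 22,
    "dev.urandom": 769130496,
}


def rates(sizes, seconds=SECONDS, per_connection=BYTES_PER_CONNECTION):
    """Return (bytes per second, connections per second) for each file."""
    result = {}
    for name, size in sizes.items():
        bytes_per_second = size / seconds
        result[name] = (bytes_per_second, bytes_per_second / per_connection)
    return result


if __name__ == "__main__":
    for name, (bps, cps) in rates(SIZES).items():
        print(f"{name}: {bps:,.1f} bytes/s, about {cps:,.2f} connections/s")
```

<p>With the bladeRF feeding it, <em>/dev/random</em> delivered roughly 350 KB per
second, enough for a few thousand connections per second by that estimate.
Without it, a full minute did not yield enough bytes for even one.</p>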
<p>The CPU usage in my one core virtual machine while reading from <em>/dev/random</em>
with <em>cat</em>:</p>
<div class="codehilite"><pre><span></span><span class="err">PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND</span>
<span class="err">16109 root 20 0 130512 9700 1060 S 62.5 0.5 0:23.66 brf_entropy</span>
<span class="err">16124 root 20 0 11412 612 516 S 16.3 0.0 0:06.47 cat</span>
<span class="err">16113 root 20 0 8964 336 224 S 13.6 0.0 0:05.43 rngd</span>
</pre></div>
<p>And from <em>/dev/urandom</em>:</p>
<div class="codehilite"><pre><span></span><span class="err">PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND</span>
<span class="err">16149 root 20 0 11412 612 516 R 94.8 0.0 0:12.87 cat</span>
</pre></div>
<h1 id="whats-next">What's next</h1>
<p>I think having an easy-to-verify hardware random number generator is important.
<a href="http://cr.yp.to/talks/2014.10.18/slides-djb-20141018-a4.pdf">DJB points out that we need verifiable RNGs</a>, but he
cautions that just adding <a href="http://blog.cr.yp.to/20140205-entropy.html">more entropy won't necessarily help</a>. I
think hardware RNGs will be important in a small but critical number of
applications. </p>
<p>I'm hoping that the rtl-entropy project will add more randomness tests and
maybe implement something like frequency sampling so that if it seems like
someone is manipulating a particular frequency, it can find a frequency more
amenable for its purpose. We'll also need to audit both it and bladeRF,
which is fortunately possible because both are open source.</p>
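<p>As an example of the kind of randomness test I have in mind, here is a sketch
of the frequency (monobit) test from NIST SP 800-22, which checks whether ones
and zeros appear in roughly equal numbers. This is my own illustration, not code
from rtl-entropy. A very small p-value (the NIST document uses 0.01 as the
cutoff) suggests the data is biased.</p>

```python
import math
import os


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test: p-value for 'ones and zeros are balanced'."""
    n = len(data) * 8
    if n == 0:
        raise ValueError("need at least one byte of data")
    ones = sum(bin(byte).count("1") for byte in data)
    # Count +1 for every 1 bit and -1 for every 0 bit.
    s = ones - (n - ones)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    print("os.urandom:", monobit_p_value(os.urandom(4096)))
    print("all zeros: ", monobit_p_value(bytes(4096)))
```

<p>This test alone is weak. A stream that simply alternates ones and zeros passes
it perfectly, which is why a real test suite runs many different tests.</p>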
<p>Random number generators form the cornerstone of secure cryptography. In light
of backdoored software RNGs and closed hardware RNGs, we need open, verifiable
ways to generate random numbers. Yarrow and Fortuna are what we need in
software. The bladeRF and rtl-entropy may be what we need in hardware.</p>We, the Weapons2015-06-05T08:38:00-07:002015-06-05T08:38:00-07:00Sean Cassidytag:www.seancassidy.me,2015-06-05:/we-the-weapons.html<p><em>This is a story I wrote for the <a href="https://twitter.com/DCShortstory">DEFCON Short Story Contest</a>. <a href="https://forum.defcon.org/forum/defcon/dc23-official-unofficial-parties-social-gatherings-events-contests/dc23-official-and-unofficial-contests/creative-writing-defcon-short-story-contest/220594-all-stories-submitted-for-short-story-contest-2015-at-def-con-23?p=220612#post220612">Here's my entry</a> on the official forum. Be sure to <a href="mailto:sean@seancassidy.me">let me know</a> what you think.</em></p>
<h1 id="we-the-weapons">We, the Weapons</h1>
<p>Everything is proscribed. There are no more adventures. Not in this town, not anywhere. Not anymore. The only thing left is money, deceit, and booze. You're in or you're out. On the take or taking it on. After working this job in this town for this long, I should know. Trust me. What I'm about to tell you is just more of the same.</p>
<p>Before the incidents in question, before all hell broke loose, I sat in my usual spot at a dive in North Vegas. In the back, near the disused jukebox. Back to the wall, a view of the whole place. I was an inquirer of sorts back then, a detective, maybe, but I would never have used that word. I was curious is all, and curiosity could pay well in this town, or it could end you. I walked that line like a well-practiced sobriety test. I worked for an agency, nominally. Nominally: a ten-dollar word which means whenever the fuck I felt like it, which, at this point, wasn't often.</p>
<p>On that day, I sat there and watched the regulars while I nursed my gin. One of the regulars, Lee, had just walked in and found a table near the front. Here was a man I wouldn't trust to hold a door open for me without wondering what was in it for him. It amazed me that he could manage to fleece enough poor saps with his "business ventures" to keep him and his coterie of female admirers partying most of the week. He frequented this dive and the Anchor Club. The Anchor Club was the antithesis to this place: expensive, on the strip, well kept, filled with tourists, bottle service, and loud music. Find the marks in the Anchor Club, plan the cross here.</p>
<p>Sitting alone was Howie. He was a drunk and a thief and a liar. Used to be a boxer. Well, more accurately, he was a professional loser. His job was to get into the ring and put on a good show. Make it seem like he was giving the other boxer a run for his money. He could take a hit and not get too banged up. It paid well. But you can't get your head smashed in on a regular basis and come out the other side all normal. His modus operandi nowadays was to get enough money to pay for whatever he needed to get away from the here and now.</p>
<p>The bartender, Linda, had stopped off with his usual, a rotten Old Fashioned. Made with the best bottom shelf stuff available, no doubt. Linda was a friend of mine. A source of information, humor, booze, and therapy.</p>
<p>"What's up, hun-bun? Need another?" she asked.</p>
<p>"Good for now, Lind. Meeting a friend."</p>
<p>"Sure thing. Holler if you need anything," she said as she sauntered back behind the bar.</p>
<p>I looked at the old security camera, a habit of mine. Gets a good look at the scruffs coming in the entrance. Well, usually it did. But now it was pointed at the back. At me.</p>
<p>I stared at it until my boss opened the door of the bar. He was standing next to a man so handsome and well dressed I thought someone was putting me on. Is this for real? He saw me, fuck. I was half-hoping he'd take one look at me, decide that there were sorrier-looking saps in the world, but he couldn't think of any, and leave. He walked up to Linda, placed an order, pointed at me, probably because he was putting the drinks on my tab, and came over.</p>
<p>"Why, hello!" he said. His teeth glistened like a recently cleaned window. That slimy fuck.</p>
<p>"How's it going, Horace?" I tried my best to be professional, but the booze in me was creeping up and I knew I'd sock him if he pushed me.</p>
<p>"Good, now that I know you'll be pulling a job for me," he paused to smile at Linda as she brought over two beers. "Now, don't even try to tell me you're too busy or it's not interesting enough. I've vetted this one myself. Besides, I think you need some looking after. You don't look like you've been doing well lately and frankly, I'm concerned."</p>
<p>"How unlike you," I said.</p>
<p>He deflated a bit.</p>
<p>"I'm hurt, you know. Here I come in with a paycheck for you, already signed," he took a check out of his breast pocket and put it on the table and paused. "I know, not an insignificant sum. As I was saying: here I come in with a paycheck and a simple assignment, and all you need to do is say yes and this check and another like it will be yours in no time at all."</p>
<p>I took a pull from my drink, finished it, and put it back down harder than I intended. The beer I left untouched.</p>
<p>"What, no response?"</p>
<p>"What's the job?"</p>
<p>"I'm glad you asked. Simple, really. Someone is coming to Vegas next week, for a conference. She has something one of our clients wants, something that will help them very much." The handsome assistant-or-whatever of his was meandering near the entrance of the bar. Looking outside every once in a while, checking his expensive watch, checking his phone.</p>
<p>I stared at Horace, eyes half lidded. I've heard this a thousand times.</p>
<p>"And what is it that your client wants?"</p>
<p>"Our client," he emphasized, "wants a program this person has written. But unfortunately, she won't sell. Not even for a lot of money. And they want it rather badly." He drank his beer and wiped his mouth. "I'm sure you've heard of the Great Artillery."</p>
<p>I had. It was the first major cyberweapon: a tool that could remove a site or group of sites from the Internet completely. It used a distributed attack that hijacked normal computers. It was very difficult to defend against, apparently. I read about it in an email I had stolen a year or so back.</p>
<p>I nodded.</p>
<p>"Well, this program exploits a vulnerability of the Great Artillery. It could be what our client needs to finally defend against it. Allegedly, it can confuse the Great Artillery's algorithm to point at random sites or something like that and greatly mitigates the damage."</p>
<p>"So you want me to either get the target to go to the bargaining table, or, even better, to get the code directly and give it to you."</p>
<p>"Précisément! So, what do you think? Want the job?"</p>
<p>"Who is the target?"</p>
<p>Always good to know who I'll be stealing from for money. Maybe it's someone I know and I'll be able to fuck up another relationship I once had.</p>
<p>"We don't know her real name, only her alias, approximate age, and the fact that she's a woman. She goes by <tt>persephone</tt> and she's approximately 25-35, probably of American birth and upbringing."</p>
<p>"Photos? Last known location? Contacts? This sounds like a missing person case, not an info job."</p>
<p>Horace took an envelope out of his jacket pocket and handed it to me. I opened it and there was a printout of IRC logs:</p>
<div class="codehilite"><pre><span></span><span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="n">e27</span> <span class="p">[</span><span class="n">e27</span><span class="p">@</span><span class="n">xnet</span><span class="o">-</span><span class="mf">2F20F</span><span class="mi">7</span><span class="n">B8</span><span class="p">]</span> <span class="n">has</span> <span class="n">joined</span> <span class="err">#</span><span class="n">dohi</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="p">[</span><span class="n">Users</span> <span class="err">#</span><span class="n">dohi</span><span class="p">]</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="p">[</span> <span class="n">august</span> <span class="p">]</span> <span class="p">[</span> <span class="n">pe_x</span> <span class="p">]</span> <span class="p">[</span> <span class="n">halfalive</span><span class="p">]</span> <span class="p">[</span> <span class="n">manfield</span> <span class="p">]</span> <span class="p">[</span> <span class="n">e27</span> <span class="p">]</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="p">[</span> <span class="n">persephone</span> <span class="p">]</span> <span class="p">[</span> <span class="n">ty</span><span class="err">—</span><span class="o">-</span> <span class="p">]</span> <span class="p">[</span> <span class="n">z</span> <span class="p">]</span> <span class="p">[</span> <span class="n">sp3rt</span> <span class="p">]</span> <span class="p">[</span> <span class="n">mad</span> <span class="p">]</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="nl">Yrssi</span><span class="p">:</span> <span class="err">#</span><span class="nl">dohi</span><span class="p">:</span> <span class="n">Total</span> <span class="n">of</span> <span class="mi">10</span> <span class="n">nicks</span> <span class="p">[</span><span class="mi">0</span> <span class="n">ops</span><span class="p">,</span> <span class="mi">0</span> <span class="n">halfops</span><span class="p">,</span> <span class="mi">0</span> <span class="n">voices</span><span class="p">,</span> <span class="mi">10</span> <span class="n">normal</span><span class="p">]</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="nl">Yrssi</span><span class="p">:</span> <span class="n">Join</span> <span class="n">to</span> <span class="err">#</span><span class="n">dohi</span> <span class="n">was</span> <span class="n">synced</span> <span class="k">in</span> <span class="mi">1</span> <span class="n">secs</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">41</span> <span class="o">-!-</span> <span class="n">mad</span> <span class="n">is</span> <span class="n">now</span> <span class="n">known</span> <span class="n">as</span> <span class="n">dflv</span>
<span class="mi">08</span><span class="o">:</span><span class="mi">42</span> <span class="o"><</span> <span class="n">dflv</span><span class="o">></span> <span class="n">hmm</span><span class="p">.</span>
<span class="mi">12</span><span class="o">:</span><span class="mi">34</span> <span class="o"><</span> <span class="n">persephone</span><span class="o">></span> <span class="n">anyone</span> <span class="n">going</span> <span class="n">to</span> <span class="n">defcon</span>
<span class="mi">14</span><span class="o">:</span><span class="mi">11</span> <span class="o"><</span> <span class="n">z</span><span class="o">></span> <span class="n">probably</span><span class="p">,</span> <span class="n">you</span><span class="o">?</span>
<span class="mi">14</span><span class="o">:</span><span class="mi">12</span> <span class="o"><</span> <span class="n">august</span><span class="o">></span> <span class="n">yeah</span><span class="p">,</span> <span class="n">last</span> <span class="n">time</span><span class="p">,</span> <span class="n">tho</span>
<span class="mi">23</span><span class="o">:</span><span class="mo">00</span> <span class="o"><</span> <span class="n">persephone</span><span class="o">></span> <span class="n">are</span> <span class="n">you</span> <span class="n">guys</span> <span class="n">going</span> <span class="n">to</span> <span class="n">any</span> <span class="n">good</span> <span class="n">parties</span><span class="o">?</span>
<span class="mi">23</span><span class="o">:</span><span class="mo">00</span> <span class="o"><</span> <span class="n">august</span><span class="o">></span> <span class="n">i</span> <span class="n">can</span> <span class="n">get</span> <span class="n">you</span> <span class="n">an</span> <span class="n">invite</span> <span class="n">to</span> <span class="o">/</span><span class="n">the</span><span class="o">/</span> <span class="n">party</span><span class="p">.</span>
<span class="mi">23</span><span class="o">:</span><span class="mi">10</span> <span class="o"><</span> <span class="n">august</span><span class="o">></span> <span class="n">hit</span> <span class="n">me</span> <span class="n">up</span> <span class="n">when</span> <span class="n">you</span> <span class="n">get</span> <span class="n">to</span> <span class="n">vegas</span> <span class="n">and</span> <span class="n">i</span><span class="err">'</span><span class="n">ll</span> <span class="n">show</span> <span class="n">you</span><span class="p">.</span>
<span class="mi">23</span><span class="o">:</span><span class="mi">46</span> <span class="o">-!-</span> <span class="n">persephone</span> <span class="p">[</span><span class="n">ptrace</span><span class="p">@</span><span class="n">xnet</span><span class="o">-</span><span class="n">F03AD8CF</span><span class="p">]</span> <span class="n">has</span> <span class="n">quit</span> <span class="p">[</span><span class="n">Ping</span> <span class="nl">timeout</span><span class="p">:</span> <span class="mi">184</span> <span class="n">seconds</span><span class="p">]</span>
</pre></div>
<p>There was another piece of paper describing a few more details of <tt>persephone</tt>'s published writing and how they determined her age and gender. Another page on what DOHI was. There was something missing, though. No logs or description of <tt>persephone</tt>'s tool.</p>
<p>"Where's the info about what <tt>persephone</tt> wrote? I don't see it in any of this. Where does that lead come from?"</p>
<p>"Ah. Well, it was a bit of clandestine work. An audio call was intercepted from one member of DOHI to another and they mentioned the tool and how effective it could be and who wrote it."</p>
<p>"Do you have a recording I can listen to? Or a transcript?"</p>
<p>He frowned for the first time since he entered.</p>
<p>"No."</p>
<p>"Well, what's DOHI?"</p>
<p>"There's an entire page in there on that group. Read it for yourself on your own time." He drained his beer. "So, what do you say?"</p>
<p>I don't like taking jobs where the target is someone like me. Someone down and out, or some lowlife, or someone who just isn't that important to the powers that be. But the jobs offered by those powers pay the best. This job paid the best. I don't like to admit it, but I was strapped for cash at that moment. I should have known it was crooked from the start. The lack of details, the money, the Great Artillery for fuck's sake. If I had thought about it, I would have known who I was working for.</p>
<p>"Alright. I'll do it."</p>
<p>Horace stood and shielded his eyes from the light entering from the street. A car pulled in. His assistant waved for him to leave.</p>
<p>"Looks like my ride is here. Take care of yourself."</p>
<p>I looked around. Another job, another month of rent. But I should have known better. Once a chump, always a chump. I know that now. I felt sick. The smell of the cigarette ash and the ancient furniture, the lights, the booze, the job. I stumbled to the ladies room and got myself a place where I could be myself, truly myself, and vomited it all out. Nausea was comforting for me. It would wake me in the middle of the night, like a needy friend. If I went long enough without it I felt a little less special, like a part of me was missing. I didn't want it; it wanted me. After I was done, I slumped on the floor and spent the next half hour feeling sorry for myself.</p>
<p>I opened the stall door with a slam. A flourish, I told myself. I put my hands on the sink counter and stared back at what was in the mirror. I tried to find something there but found nothing. Everything grows less and less clear here. Who was she? The author of a countermeasure to the Great Artillery had to know what she was doing. She had to know what it was worth. What was she worth?</p>
<hr />
<p>I did some research into DOHI. They were a prominent hacking group. They liked to post exploits for the software the Internet ran on–the stuff billion-dollar businesses depend on–to public mailing lists and laugh at the freakout. Merry pranksters for the digital age. People loved to write op-eds about how immoral they were and about how there were responsible ways to disclose bugs. That's all bullshit, though. Rich assholes love to tell you that you're immoral and a criminal when you shit on their parade, but what they're doing is "just business."</p>
<p>DOHI would have none of that. They didn't have an agenda like the other groups: they fed off of the chaos. I couldn't find out what DOHI stood for, so I made up a name from a book I once read. The Dark of Human Ignorance. They're teaching us how ignorance is the worst crime of all. Ignorance eats away at the ground we stand on until there's nothing left. Security is the same way. Your ignorance eats away at your security while you stand unaware and tell everyone that the situation is normal.</p>
<p>Couldn't find much specifically on the <tt>persephone</tt> nick. Time for a phone call to a hacker friend of mine, Jaundice. He was mighty useful when there was a business convention in town and I needed some code to rip out someone's information from an unattended laptop or phone. He might know DOHI.</p>
<p>"Jaundice, it's me."</p>
<p>"Oh shit, hold on I'm still waking up."</p>
<p>Jaundice was a motherfucker at times. He was a better info broker than I was for sure, but didn't like to leave his pad. He lived somewhere in Santa Monica, or, at least, that's what he told me and that's what I chose to believe. He had a lot of ins into the tech industry, especially infosec. He said he got his alias from his legendary drinking skills.</p>
<p>"I thought I was bad: it's two. What the fuck were you doing last night?"</p>
<p>"None of your goddamn business. Now, what you want?"</p>
<p>"I've got a few hackers I'd like some contact with. I only have a few nicks–"</p>
<p>"This shit again? Why don't you pay me for once?"</p>
<p>"Professional courtesy?"</p>
<p>"Bzzt."</p>
<p>"Because I think you'll be interested in this. The target is the one who wrote the code that will break the Great Artillery."</p>
<p>He didn't say anything for a moment, but I could hear him sitting up quickly.</p>
<p>"DOHI? I heard DOHI had something, but I figured it was just talk. Who wants to know?"</p>
<p>"It's one of these pay-us-enough-not-to-ask jobs. Looking for <tt>persephone</tt>, she is allegedly the one who wrote it."</p>
<p>"I've heard the nick. I think I know someone who knows her. Wait a tick."</p>
<p>I could hear frantic typing in the background and a squeaky chair swivel. I knew he'd be useful.</p>
<p>"Yeah, a contact of mine says she's one of the core members of DOHI. What do you want to know? I don't like giving info away on fellow hackers, at least, not for free."</p>
<p>"Well she's supposed to be at DEFCON and I want to run into her so I can have a chat. I don't know what she looks like or her real name or anything."</p>
<p>"Hmm, I don't think I can find a picture. OPSEC and all. Let me check one thing." More furious typing. "Yeah, okay. Apparently she's going to be at DEFCON like you said. And there's lots of parties there every year and–"</p>
<p>"So?"</p>
<p>"Fucking let me finish god damnit. I'm helping you! There is one party in particular that is frequented by hackers of DOHI's type. It's hard to get into. Invite-only."</p>
<p>"But, you can get me in, right?"</p>
<p>"This will cost you. I'll have to call in a favor and those are worth cold hard cash."</p>
<p>"Fine, whatever. What do I need to do?"</p>
<p>"You'll get a token in the mail, like a large metal coin. That's your invite. It'll have the time and place and maybe some instructions with it. That's all I can do."</p>
<p>Better than nothing. I hung up.</p>
<hr />
<p>I drove to the strip in my old red convertible. It used to be lively, fast, and showy. Now, well, it wasn't. Overuse and disinterest changed that.</p>
<p>DEFCON was at one of the hotels on the strip. They love putting conferences like this in the heat of the summer, the low season for this fiery pit. They'll come no matter what. I stood in line and paid in cash like all the rest of the poor saps without special access. I was now officially in. I walked through the entrance and onto the convention floor. Like every conference, there were the people selling. Everyone's got something to sell you. Radios to hack wireless networks. Mini-computers to hide in server racks. Selling the promise of an amazing job if I only joined their startup. A kilt with pockets so that your fashion reflected your practicality.</p>
<p>I stood out like a trumpet in a string quartet. All of these people were highly sought-after professionals, hackers, breakers, thinkers, doers, programmers, and I wasn't. I let the crowd push me left and right, past the dance floor to the competition room. Here they needed to prove themselves even further. And they loved it. For years, everyone told me to do what I wanted with my life. So I did. And it paid fuck all. What they had meant was follow my dreams as long as it paid. I would have been better off counting beans in an air-conditioned office than following deadbeats around this hell hole. If you dislike what you do for work, if you naturally have a distaste for it, you realize that you're doing it for the money and nothing else. There's a hollowness in you because of it. There's no mistaking it. Unless, by some miracle, you enjoy your work. Then you can deceive yourself into thinking that you're not doing this for the money. At least I wasn't being fooled.</p>
<p>I had received my token for the party in the mail a few days prior. It told me to wear a mask. Something to disguise myself. The party was tonight, so I had a few hours to kill.</p>
<p>I saw a talk that seemed interesting and a good way to pass the time. "New Techniques in Covert Information Gathering: Social Engineering for the 21st Century" was the overly loquacious title. I knew a thing or two about this.</p>
<p>The speaker was a few minutes late. He had a backpack full of goodies and trinkets to show off and some to give away. It looked like he had just grabbed all of it from his car. The keys were still in his hand. He was sweaty and bald, mid-thirties. He looked like his natural habitat was a swivel chair and a steady supply of Diet Coke. The talk started off alright, with a few jokes and war stories. But the techniques were mostly amateurish and obvious. The jokes weren't much better. At the Q&A I decided to have a go.</p>
<p>"Excuse me, I don't have a question, but someone told me to tell you your lights are on, a BMW, right?"</p>
<p>A sudden stricken look appeared over his face. He mumbled something into his mic and stood to leave.</p>
<p>"Got you," I said.</p>
<p>Laughter and some applause. Social engineering, my ass.</p>
<hr />
<p>After wasting my time in the bar, asking random people if they knew such-and-such member of DOHI and finding zilch, I rode the elevator up to the penthouse as I donned my mask. The bouncer outside was an enormous truck of a man, nearly half as wide as he was tall, and he stood over a foot taller than me. A solitary camera stared down at us. He let me past without a word when I gave him the token. I opened the door and stepped inside.</p>
<p>Everywhere there were masks. Gas masks, Venetian masks like mine, Halloween masks, even face paint. Seeing through my mask was difficult. I had no peripheral vision. A few turned to look at me as I walked in but they slowly turned back to whatever it was they were involved in. There was a band playing some modern baroque/techno music. A projector was set up and it was showing images of buildings collapsing and cars crashing. On every other wall there were large mirrors, giving the room the appearance of being an infinite regression of itself. The mirrors were bent slightly so it was difficult to view oneself but easy to view others.</p>
<p>I ordered an Old Fashioned at the bar. The bartender wore a mask and white tuxedo. There were a few more lookalikes walking around the party filling drink orders and dishing out hors d'oeuvres.</p>
<p>Two large figures approached me from behind. They nearly spilled my drink.</p>
<p>"Who are you?" one asked.</p>
<p>"What are you doing here?" the other said.</p>
<p>I smiled beneath.</p>
<p>"I could ask the same of you two," I responded.</p>
<p>They looked at each other and then back at me. My hand rolled into a fist. I was ready.</p>
<p>"I'm sure some nobody invited you. We know the guy running this show. Personal invitations."</p>
<p>I didn't say anything in response. I made my apathy clear. They laughed in condescending derision and left. All bark and no bite.</p>
<p>Drink in hand, I was ready to find <tt>persephone</tt>. I was sure she was here. Call it instinct or insight or intelligence, but I knew it. I listened. I listened to so many self-indulgent, dull conversations. The masks, which were to give the partygoers more freedom of expression, instead made them all alike. What was meant to celebrate uniqueness instead enforced conformity.</p>
<p>Occasionally, I would hear a specific mention of something DOHI had done, but when I joined the group who were talking, they would soon disperse. Forever an outsider. Plan B was to start the fire myself. I found the band at the back of the party. After a song was over I took the mic. I didn't bother asking permission. This was my party, they just didn't know it yet.</p>
<p>"Hey, everyone, thanks for coming. Just a few notes and then we're right back to the music. One: please don't insult the wait staff, they're here for us." Polite applause. Good to start with something that makes you seem like you know what's what. "Two: know your limits otherwise we'll throw you out. Three," here I glanced at my hand to pantomime reading something. "We're looking for two people for something special later. Looks like: <tt>persephone</tt> and <tt>august</tt> are the lucky ones. If you're here, find me!"</p>
<p>Someone in a bloody clown mask came rushing up to me and even through the mask I could tell they were pissed. "What do you think–" they started. I walked away as if I hadn't heard them. No need to explain. I did what I wanted. I floated around for a while, mesmerized by the masks and what they meant. It was an odd feeling. I was lost amongst them. No one would remember you, nor could you remember them. You didn't matter here. No one did.</p>
<p>I made myself visible, near the center, but not so central as to be lost in the main crowd. I saw two people approaching me from across the room. I knew it was her. It had to be. She had an owl mask that left the lower half of her face exposed. She was wearing deep purple lipstick and I could smell a strange perfume. They were in the middle of a conversation.</p>
<p>"–there's no difference at all. Watching someone is judging them. We are all judges. Everyone would go watch the condemned die at the scaffold. Sure, they might shout or be angry about it, but they'd watch. It was a spectacle. They judged them."</p>
<p>"And so what's this about the Pan-something?"</p>
<p>"The Panopticon, a prison where every inmate is always visible to the guards. It has the obvious benefit of being able to see the prisoners and stop unwanted behavior fast. But there's another benefit. Foucault says that," she took out her phone and read aloud, "'the major effect of the Panopticon: to induce in the inmate a state of conscious and permanent visibility that assures the automatic functioning of power.' That's what I'm talking about. This whole thing isn't about just stopping bad people, it's about changing natural behavior. 'Who is watching me?' becomes a routine question you ask yourself."</p>
<p>"We're in this Panopticon?" I joined in.</p>
<p>"It's called the Internet."</p>
<p>"Is that why you wrote the fix to the Great Artillery?"</p>
<p>If she was surprised that I knew about her code, she didn't show it. Maybe it was common knowledge around here.</p>
<p>"It's not only a solution to this problem. The Antidote is much more than that. I built it because I could. Before, it took an army to hurt a nation. Now it's just knowledge. A single person can do it. And it terrifies them. There will be no security. There isn't any right now, we just haven't realized it."</p>
<p>The Antidote. A fitting name. I could see past the eyeholes in her mask to her face underneath. There was flesh and blood and history and weakness and strength beneath that mask. Her eyes reflected like the mirrors in the room and I could see myself. She didn't blink. Her smile, if you could call it that, had an edge like a cliff.</p>
<p>"So you're <tt>persephone</tt>, obviously. But who is this?" I gestured to the man who walked over with her. He was swaying like a tree in a storm.</p>
<p>"This is <tt>august</tt>."</p>
<p>I could see dark eyes beneath his long-nosed mask. It looked like a plague doctor's mask.</p>
<p>"Nice to meet you," he managed. "I'm going to grab another one. Be back in a tick."</p>
<p>"So why did you want us? What's this surprise or secret or whatever?"</p>
<p>"To be honest, I wanted to meet you. I've heard a lot about you."</p>
<p>She scoffed. "No, really."</p>
<p>"Well, I was wondering if we could chat about the Antidote. I'm so curious as to why you said you wouldn't sell it. It could be a lot of money."</p>
<p>"Have you seen a movie called Last Year at Marienbad? French new wave."</p>
<p>"No."</p>
<p>"A shame. It's the perfect film, actually. It's not a movie like most others. Some movies are about plot or characters or a theme the writer or director wanted to shove down your throat. This one is about you and how you understand the world. Who is telling the truth? What is the significance of the contradictory images on screen? Your answers reveal more about yourself than about the movie." She paused, giving me a chance to interrupt, but I didn't. "One of the things that sticks out most about that movie is the background characters, the extras. When the main characters are engaged in dialogue, everyone else in the movie is frozen in place. Waiters bent setting food on tables, people walking stopped in mid-stride, background conversations paused. It's because they don't matter. That's how we view everyone outside of our own lives. Look around right now, do you care about any of these people?"</p>
<p>I didn't know what to say.</p>
<p>"Well, I care in a general way," I hedged. "I don't want anyone to get hurt or anything like that."</p>
<p>"No, no. I mean do you care about how successful they are or how their day is going or if their marriage is happy? Of course not, not even enough to ask them. They're just noise. This is what we deny: that we're all background noise to the powers that be. You barely control your own destiny. And whom do you trust? Everyone has an agenda but will deny it to the end. These people you ignore are your enemies, but you don't know it yet. People treat me like an ingénue or someone's girlfriend. Some background character."</p>
<p>"Even at DEFCON?" I asked. "Even here among your people?"</p>
<p>"My people? That's a laugh. When you are good enough at anything, you stand alone. Anyway to answer your question, it's none of your fucking business you fucking background character. You and everyone else needs to get off my case. You don't know what you're talking about."</p>
<p><tt>august</tt> had rejoined us with two new drinks for them.</p>
<p>"Look, I'm sorry. I was just curious."</p>
<p>I watched them leave.</p>
<p>Time to be a detective again. I followed them at a distance through the open floor and watched the people around me. They seemed very still. The hum of the conversation was steady and words were indistinguishable from one another. It got loud, the way silence does. The hum vibrated elastic and pulsed dark. Fog rolled up from the floor. It was all washing over me and I was drowning. I could suddenly see faces behind the masks and voices through the din. My boss was here. And that assistant of his. They were getting drinks. I could sense it. Lee was probably around, maybe he was a bartender. Howie was a waiter. What was their angle? Were they watching me? Was I supposed to know they were here?</p>
<p>The door opened and <tt>persephone</tt> left. I followed. Around the corner she waited for an elevator, and I knew if I joined her dressed as I was, they would know that they were being followed. I took off my mask and cloak and stashed them under the table the bouncer stood in front of. When she was turned away I quietly slipped behind them and into the stairwell and then back out into the hallway. I shivered and said, "Actually sort of cold on the roof!"</p>
<p>They turned to look at me, and I said in a deeper voice, "Heh, what's with the getup?" They said nothing and turned back towards the elevator. When it came I made sure to let them go first, so I could see what floor they were going to. I made fun of how we were all going to the same floor. The oldest trick in the book. They got off and walked down a hallway and I made it clear that I went the other way. I crept back and followed their footsteps. It was a simple matter to figure out which room they went into.</p>
<p>I walked around the halls for a while, biding my time and thinking about how to get into her room. I thought of next week and the week after and my old friends. Maybe I'd buy a plane ticket and go see them. I wasn't sure where they were. We had lost contact years ago. So much the better. Vegas makes humanity's true nature clear. The ritzy veneer is a mirage and what's underneath is cheap and poorly made and desperate, like us. We lie to ourselves and to everyone else: this is nice. We're having fun. Is this what you wanted? To have fun? Let's write it on your tombstone how much you had fun. Your obituary will be filled with how much fun you had here or there. How vapid. How useless. There must be more than this.</p>
<p>Walking back towards the elevator, I saw someone who looked like <tt>persephone</tt> get on. I could tell by the shadow that there was someone else in the elevator with her, but I didn't see who. The doors closed. Her room would be empty.</p>
<p>A maid was texting outside of <tt>persephone</tt>'s room.</p>
<p>"I locked myself out of my room. Could you let me in?"</p>
<p>She looked at me with wide eyes, as if terrified. "Oh," she started. "Well, the policy is to go to the main desk and get a replacement, after showing your ID."</p>
<p>"You see, that's just it. I locked my ID in the room along with my key," I smiled in what I hoped was a disarming way.</p>
<p>She looked around and then back at me. I tried to act the part.</p>
<p>"Alright, but don't tell no one I let you in."</p>
<p>She took a ring of keys out of her pocket.</p>
<p>"Maintenance," she said into the door as she opened it.</p>
<p>The room was dark. I was ready to make a break for it if there was someone in there already. I thanked the maid and she left. Lights on. No one in the bed. In fact, the bed was made. No suitcases. Was this the right room? I checked the bathroom: clean, but not done up.</p>
<p>I sat on the bed and thought. I was assigned to get the code, the Antidote, or at least convince <tt>persephone</tt> to talk to my client. I had a phone number she was to call. Maybe I tipped my hand when I followed her and she got spooked and left. But would she make up her room? Was this even her room? What happened? Could she have been kidnapped? She didn't seem in distress on the elevator, but I couldn't be sure if that was her. I've never seen her face. But if so, they would want someone to blame it on. Me. They had me asking the maid for the room key and she was sure to fold if anyone pressed her. Had they paid off the maid to let me in? What was a maid doing outside of the one room I wanted to get into at two o'clock in the morning? Why was there a security camera outside of her room?</p>
<p>Time to split.</p>
<hr />
<p>The following day I returned to find someone who knew something. Maybe <tt>august</tt>, or maybe someone who knew him or <tt>persephone</tt>. I looked around the convention floor to see if I could find any familiar faces. If only that party hadn't been masked, I could find someone I knew there. I sat down next to a trio of cybergoths. Long dreads intertwined with lights and other knickknacks. Platform shoes. Black dusters and pleated skirts. Goggles. I liked their style.</p>
<p>I scanned Twitter for people talking about DEFCON. There were a few people running competitions. I figured that they would be in the know. One guy, Mark Owens, seemed to be running the lock picking competition. His Twitter handle, sp3rt, was familiar. I searched his history to see if he ever talked about DOHI, <tt>august</tt>, or <tt>persephone</tt>. And he did. It was a while ago, to be sure, but he knew them. He was in the IRC logs I had.</p>
<p>I followed the crowd back to the main convention floor and, like a tourist, consulted the map for longer than I'd like to admit trying to find the lock picking village. Five minutes later I was there. I used to pick locks as part of my job, but after an arrest I figured it'd be easier to knock and lie my way in. Harder to go to jail for lying than it is for breaking and entering.</p>
<p>Mark was judging a speed lock picking tournament at the back. I watched the competition. These people were fast, much better than I ever was. Locks are a strange part of our culture. They have almost nothing to do with security. They're a little sign that says, "Please don't open this." Most locks are easy to pick if you've practiced for an hour. It's like driving: the only thing that prevents someone from driving at 100 miles per hour into your front bumper is some yellow paint on asphalt. It works most of the time. Same thing with locks.</p>
<p>After the competition I flagged him down.</p>
<p>"Sorry, I'm wiped out. Can't answer your questions now, but Tom over there is running the lockpicking workshop, he can help you."</p>
<p>"I don't want to ask you about locks. I want to find <tt>august</tt> or <tt>persephone</tt> from DOHI. Know where I can find them?"</p>
<p>He stared at me, sizing me up.</p>
<p>"I don't know them. Why do you think I'd know where they are?"</p>
<p>"Don't play me, sp3rt. I know you know them. I have logs."</p>
<p>"How do you know that? Look, I don't know you, and I don't have time for this," he said and turned away.</p>
<p>I put my hand on his shoulder and turned him around.</p>
<p>"I'm not the fucking police. I met <tt>persephone</tt> last night at the party and she's missing. She's not in her room and people can't find her and I'd like to help her if I can. Do you know how I can get ahold of them? Phone number, email, IRC server credentials?"</p>
<p>He softened like a marshmallow over a fire.</p>
<p>"Alright, I know their IRC server because I used to idle there, but I haven't been in a while. They might have changed the password." He wrote it down on my phone. "Let me know how it goes. I used to hang out with them and I don't want anything bad to happen to them. If you're a fed–"</p>
<p>"If I was a fed, it wouldn't be wise to threaten me."</p>
<p>I left. I booted up an IRC client on my phone and connected. I entered the cryptic, random password that Mark had given me. I chose my nick: demeter. My phone was my torch, and I began my search.</p>
<div class="codehilite"><pre><span></span><span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="n">demeter</span> <span class="o">[</span><span class="n">demeter</span><span class="err">@</span><span class="n">xnet</span><span class="o">-</span><span class="mi">2</span><span class="n">F20F7B8</span><span class="o">]</span> <span class="n">has</span> <span class="n">joined</span> <span class="err">#</span><span class="n">dohi</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">[</span><span class="n">Users</span> <span class="err">#</span><span class="n">dohi</span><span class="o">]</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">[</span> <span class="err">@</span><span class="n">august</span> <span class="o">]</span> <span class="o">[</span> <span class="n">pe_x</span> <span class="o">]</span> <span class="o">[</span> <span class="n">halfalive</span><span class="o">]</span> <span class="o">[</span> <span class="n">manfield</span> <span class="o">]</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">[</span> <span class="n">demeter</span> <span class="o">]</span> <span class="o">[</span> <span class="n">ty</span><span class="o">--</span> <span class="o">]</span> <span class="o">[</span> <span class="n">z</span> <span class="o">]</span> <span class="o">[</span> <span class="n">e27</span> <span class="o">]</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="n">Yrssi</span><span class="o">:</span> <span class="err">#</span><span class="n">dohi</span><span class="o">:</span> <span class="n">Total</span> <span class="n">of</span> <span class="mi">8</span> <span class="n">nicks</span> <span class="o">[</span><span class="mi">1</span> <span class="n">ops</span><span class="o">,</span> <span class="mi">0</span> <span class="n">halfops</span><span class="o">,</span> <span class="mi">0</span> <span class="n">voices</span><span class="o">,</span> <span class="mi">7</span> <span class="n">normal</span><span class="o">]</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">40</span> <span class="o">-!-</span> <span class="n">Yrssi</span><span class="o">:</span> <span class="n">Join</span> <span class="n">to</span> <span class="err">#</span><span class="n">dohi</span> <span class="n">was</span> <span class="n">synced</span> <span class="k">in</span> <span class="mi">2</span> <span class="n">secs</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">42</span> <span class="o"><</span> <span class="n">demeter</span><span class="o">></span> <span class="n">looking</span> <span class="k">for</span> <span class="n">persephone</span><span class="o">.</span> <span class="n">couldn</span><span class="err">'</span><span class="n">t</span> <span class="n">find</span> <span class="n">her</span> <span class="n">after</span> <span class="n">the</span> <span class="n">party</span> <span class="n">last</span> <span class="n">night</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">43</span> <span class="o"><</span><span class="err">@</span><span class="n">august</span><span class="o">></span> <span class="n">and</span> <span class="n">who</span> <span class="n">are</span> <span class="n">you</span><span class="o">?</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">43</span> <span class="o"><</span> <span class="n">demeter</span><span class="o">></span> <span class="n">we</span> <span class="n">actually</span> <span class="n">met</span> <span class="n">last</span> <span class="n">night</span><span class="o">.</span> <span class="n">i</span> <span class="n">was</span> <span class="n">the</span> <span class="n">one</span> <span class="n">who</span> <span class="n">was</span> <span class="n">on</span> <span class="n">the</span> <span class="n">mic</span> <span class="n">and</span> <span class="n">asked</span> <span class="n">to</span> <span class="n">meet</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">50</span> <span class="o"><</span><span class="err">@</span><span class="n">august</span><span class="o">></span> <span class="n">i</span> <span class="n">never</span> <span class="n">made</span> <span class="n">it</span> <span class="n">to</span> <span class="n">the</span> <span class="n">party</span><span class="o">,</span> <span class="n">was</span> <span class="n">she</span> <span class="n">there</span><span class="o">?</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">50</span> <span class="o"><</span> <span class="n">halfalive</span><span class="o">></span> <span class="n">demeter</span><span class="o">:</span> <span class="n">and</span> <span class="n">who</span> <span class="n">are</span> <span class="n">you</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">50</span> <span class="o"><</span> <span class="n">demeter</span><span class="o">></span> <span class="n">august</span><span class="o">:</span> <span class="n">what</span> <span class="k">do</span> <span class="n">you</span> <span class="n">mean</span> <span class="n">you</span> <span class="n">never</span> <span class="n">made</span> <span class="n">it</span><span class="o">?</span> <span class="n">we</span> <span class="n">met</span> <span class="n">last</span> <span class="n">night</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">51</span> <span class="o"><</span> <span class="n">ty</span><span class="o">--></span> <span class="n">loooool</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">51</span> <span class="o"><</span><span class="err">@</span><span class="n">august</span><span class="o">></span> <span class="n">alright</span><span class="o">,</span> <span class="n">f</span> <span class="k">this</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">51</span> <span class="o">-!-</span> <span class="n">mode</span><span class="o">/</span><span class="err">#</span><span class="n">dohi</span> <span class="o">[+</span><span class="n">b</span> <span class="o">*!*</span><span class="n">demeter</span><span class="err">@</span><span class="n">xnet</span><span class="o">-</span><span class="mi">2</span><span class="n">F20F7B8</span><span class="o">]</span> <span class="n">by</span> <span class="n">august</span>
<span class="mi">10</span><span class="o">:</span><span class="mi">51</span> <span class="o">-!-</span> <span class="n">demeter</span> <span class="n">was</span> <span class="n">kicked</span> <span class="n">from</span> <span class="err">#</span><span class="n">dohi</span> <span class="n">by</span> <span class="n">august</span>
</pre></div>
<p>What the fuck? Was <tt>august</tt> lying to me? What does he stand to gain? Are there two of them? I put my phone away and the room began to spin. I found a quiet corner of the loud room and sat down. I might have passed out, I can't remember. The only thing I remember thinking was that I was out of my league. What was I doing on this assignment? Had I failed already? Too many questions. I heard the whir of a security camera rotating and focusing on its target.</p>
<p>I dreamt, either as a daydream or half asleep. I don't remember. I was in a room with <tt>persephone</tt> and Horace and a few other people that I knew. We were talking about how glad they were I had returned. I told them that I hadn't left and it was <tt>persephone</tt> that was missing. They got angry. Slowly, fog rolled into the room. At first it was just around our feet, but it soon billowed up around us. No one seemed to notice it, the mire. They left me and I tried to get out of the fog. But I was too tired and fell down. I was enveloped. I could hear them laughing.</p>
<p>Someone kicked me. "Hey."</p>
<p>I was slumped over. Again, a kick. "Wake up! Time to go to school!" Laughter.</p>
<p>I sat up, bleary-eyed. A young man, late twenties, Converse sneakers, black jeans, and a dark t-shirt.</p>
<p>"You're demeter? Not what I expected."</p>
<p>"Wait–what?"</p>
<p>"It wasn't too hard. I wanted to know who had joined our IRC channel and I was in the lockpicking village. I asked my good friend Mark. He said he had told you because you were looking for <tt>persephone</tt> and that you were probably still around. And you were!" He seemed proud of himself. "Just so you know we're changing our password."</p>
<p>"I only–"</p>
<p>"I'm on your side," he interrupted. "I want to find <tt>persephone</tt>. I was the last one to see her, I think. I'm <tt>z</tt>."</p>
<p>"Oh?" I tried to sound cool, calm, and collected.</p>
<p>"We met up in her hotel room after her and <tt>august</tt> left the party."</p>
<p>"But <tt>august</tt> said–"</p>
<p>"I saw. I don't know the story yet. I didn't know that was him at the party. I had never met him before." He looked around. "Why don't we grab a drink and finish this there?"</p>
<p>We went to the hotel bar and ordered stiff drinks to keep the daylight at bay.</p>
<p>"She had told me where she was staying, so I went there and she let me in."</p>
<p>"When was this?"</p>
<p>"I can't remember. I was all hopped up last night."</p>
<p>"What did you do?"</p>
<p>"Well, <tt>august</tt>, or whoever he was, was there. She called me <tt>z</tt> and I figured me and him didn't know each other. He was on the bed laying down on top of the blankets. Me and her talked about random stuff, and then his phone rang. He answered it like he was expecting it, mumbled a few words and handed the phone to <tt>persephone</tt>. She talked to whoever on the phone, maybe 30 seconds."</p>
<p>"What did she say?"</p>
<p>"She did a lot of listening and mm-hmms. The one thing she did say, and I won't forget it, was, 'In the desert? Why not here? Will he be there?' And after that, she asked me to leave. She was upset."</p>
<p>"The fuck?"</p>
<p>"I know, right? So maybe she was meeting someone there?"</p>
<p>I considered the situation and rolled the dice. I wanted to see how much <tt>z</tt> knew.</p>
<p>"How much do you know about the Antidote?" I asked.</p>
<p>He stared down at me. He seemed to realize that we weren't on the same team. Betrayal was written on his face and I was ashamed. The room slowed. The waiter busing a table on the other side of the room spent an hour cleaning one spot. Someone drank their water for an eternity. He stopped blinking. I stared back at him.</p>
<p>"You had best not get involved." He stood to leave. "You have no idea what you're asking, that's obvious."</p>
<p>"I think I know. I want to help her, that's all."</p>
<p>"Bullshit. I'm tired of this, why don't you fuck off?"</p>
<p>My turn to not say anything. I let it hang in the air there. I tasted his fury. There was a lot here and it was hard to believe it was all on <tt>persephone</tt>'s behalf. What was his angle?</p>
<p>"Why not tell me what your play is in this?"</p>
<p>He put his hands on the table and leaned towards me.</p>
<p>"Look, us hackers, we're weapons now. Sentient weapons. We're used by the powerful to do good or evil. This is what the Great Artillery is: one of us was told to make a weapon to destroy. Most of my former friends work for companies that sell weaponized code to governments and corporations to wage a fucking cyberwar or oppress people. The rest work directly for them. There was a time when it was about discovery. I guess we're at the end of that road. <tt>persephone</tt> wanted something else. She wanted and now this has happened."</p>
<p>I took a cigarette out and offered him one, but he refused. I lit it and blew the smoke up and away. It hung in the air between us. The fire. His fists clenched.</p>
<p>"I think I've figured out what's going on," he said. "And you should stay the fuck away. You're in over your head."</p>
<p>"Maybe."</p>
<p>He left.</p>
<p>I stayed at the bar. I thought it through: I was asked by persons unknown to find code that <tt>persephone</tt> wrote. When I got into her room at last, she was missing. The last person to see her was <tt>z</tt> and he thinks she was going into the desert. Wait, that's not right. That other guy, <tt>august</tt>, was there. But he said he didn't make it to the party. Then <tt>z</tt> had figured it out.</p>
<hr />
<p>I went outside to get some fresh air and some summer Vegas heat. The sky burned. Great flames were burning down the strip and I was glad. The camera which watched the entrance melted from the heat. There was a man in the penthouse of a hotel nearby and he seemed unaware that he was engulfed. He thought of the days ahead and looked out over Vegas. His room was slowly building in pressure. The glass bent outwards. There was a song playing on the radio and he wondered why they didn't make music like this anymore. When the window exploded outward, raining sharp glass onto the crowd below, he shivered at the cool breeze and turned away from the window.</p>
<p>I took out another cigarette and tried to light it with the pack of bar matches this place gave out. The sound of a lighter made me look up. A man extended a lighter. I lit up.</p>
<p>"I've been looking for you," he said. He was well dressed, business casual. He had groomed salt and pepper hair and an attractive beard. His eyes were wide for his face, but he held it well.</p>
<p>"Well, here I am."</p>
<p>He smiled sardonically.</p>
<p>"How's DOHI?"</p>
<p>"How would I know?"</p>
<p>He laughed politely.</p>
<p>"Don't fuck around. You've met with every member of DOHI that's at DEFCON. You've gotta be one of them. What's your handle?"</p>
<p>A long drag. I looked around. Make him wait. I know how this game is played. I've been here before. The flames seemed to be dying down. They needed rekindling.</p>
<p>"I'm <tt>persephone</tt>."</p>
<p>He raised his eyebrows.</p>
<p>"We know you're not <tt>persephone</tt>. That's ridiculous. Unless–" he trailed off and looked around. He took out his phone and typed something. He looked dissatisfied and put it away. "Who are you, really? If you're <tt>persephone</tt> then you're in DOHI."</p>
<p>Time for me to laugh sardonically. "How about fuck off?"</p>
<p>He took out his own smokes and lit up. Yellow flames whipped around the sky.</p>
<p>I asked, "Who are you? You a fed? What do you want?"</p>
<p>No more laughter or smiles.</p>
<p>"Not really. I just want to express to you the danger you're in."</p>
<p>He turned back to me and looked serious.</p>
<p>"What are you talking about?" The clock was ticking on this ruse and I needed some information.</p>
<p>"The Antidote is a national security concern. Some people, bad people, will stop at nothing to get it. There's a bounty out for it, and it runs into the hundreds of thousands of dollars and grows every day. If you keep telling people you're <tt>persephone</tt>, well, who knows what might happen?"</p>
<p>Another fucking smile.</p>
<p>"I am looking for a friend of mine, maybe you know something about it. Her friends say she got a phone call and had to meet someone in the desert. Know anything about that?" I asked.</p>
<p>His waxy face seemed to sag all over, as if his skin's elasticity had given up. He took a drag. He was older than I first noticed. Almost geriatric. His hair had turned a bright white, and lost its sheen.</p>
<p>"That looks like <tt>persephone</tt> over there, doesn't it, <tt>persephone</tt>?" he asked.</p>
<p>I looked back into the hotel and caught out of the corner of my eye an owl mask leaving the bar. I could catch up to her, but I was sloshed and sloppy. She pressed the button for an elevator and if I didn't sprint I'd miss her. <tt>persephone</tt>, with the owl mask on the back of her head, stepped onto the elevator. I jumped through the closing door at the last minute. I grabbed her and threw the mask aside and looked into her face. It wasn't her. It was some teenage girl, years younger than <tt>persephone</tt> was. There was a man standing next to her, obviously her father. He started and said something angry at me. She probably picked up the discarded mask somewhere. I mumbled something and got off at the first floor I could.</p>
<p>When I went back outside, my fed was gone. My leads were cold. Everyone told me that this case was too much for me, and they were right. I gathered my things and drove home. The flames died down.</p>
<p>A few days later I called Horace to tell him I had finished with the job. I lost <tt>persephone</tt> and didn't get the code. He didn't pick up so I left a message with his secretary. The next day Horace had left a message on my voicemail, "You did good. Thanks for the hard work. If you hear anything–I want you to know that you're not responsible. You did what you were told. Anyway, we probably won't be needing you for awhile, so if you need to join another agency or something, we'll understand. Let me know if you want a reference."</p>
<p>A reference for what? What was I responsible for? What happened? The money was in the bank. I needed to be among my kind at that dive in North Vegas. I drove there in my beat up red convertible. It was a long ways and I tried to think of nothing at all. My phone buzzed while I waited at a long light a block or so away from the bar.</p>
<p>I checked my phone's messages and saw I had an email. It was from <tt>persephone</tt>. This is what it said:</p>
<blockquote>
<p>This is my dead woman's switch.</p>
<p>The Antidote can control the Great Artillery. I never told anyone how it
manages to do so. It doesn't stop it, that would be difficult to do and
easy to fix. Instead, the Antidote can redirect its force. I can make it
point anywhere I want. Hijack nearly anything and make it destroy anything
else. And this is what I've done.</p>
<p>Why?</p>
<p>Because visibility is a trap. This whole thing is a trap. I'm disarming it
for a little while, so take advantage of that. Bentham's Panopticon has
been created anew and we're in it and we keep building it up. Time to start
over.</p>
<p>Is the gunsmith responsible for people killed by their guns? Everyone tells
me no. Is the engineer who makes a missile guidance system responsible for
those charred, burnt corpses? Is the hacker responsible for what the
government does with the exploit they sold? All I know is: if there were no
engineers willing, there would be no missile guidance systems. But no
individual raindrop feels responsible for the flood.</p>
<p>This is probably the last email you'll receive for a little while, but that's
just a side effect.</p>
<p>yours,
persephone</p>
</blockquote>
<p>The light turned green and I took my foot up off the brake. My airbags deployed. The seatbelt strained against my chest. My head hit the airbag like an egg thrown against a wall. A deafening bang. Some asteroid or missile or bomb must have obliterated my car. I passed out. When I came to I was still in my car and someone was talking to me. Apparently some asshole hit the back of my car at 30mph. My car had rolled through the intersection and hit a pole. The paramedics were surprised I only had a few scratches. The miracle of modern technology I guess. The car was toast, though. Good riddance. Lost my phone too. They couldn't find it.</p>
<p>After getting a thorough examination I left and walked to the bar. Only a half a block away thanks to the push. The interior was quiet and cool. My eyes adjusted painfully.</p>
<p>Howie was there, of course. What else was he going to do? I looked at him slumped over the bar. A professional loser. Always did what he was told. And it paid well. I finally understood. It was easier to be told than to decide. A life of decisions is a tired life. Decision fatigue. The quality of decisions goes down over time.</p>
<p>I sat in the back, near the jukebox. I can't stand when people tell me that you need to fail. Failing isn't some necessary component to success. To succeed, you need to succeed. The people who tell you that you need to fail to learn think that failing is only getting into Brown instead of Harvard. To fail, to really fail, is to be in a personal, self-made hell. Fail often enough and the fabric which holds the narrative of your life together starts to unwind. We can't look at our lives objectively because of the difference between where we are and where we think we should be. That's failing. You can see the destination, but the path is winding and obscure. There's no practice and no redoes.</p>
<p>You had no choice in which country you were born in or which parents you were born to. You had no choice in how attractive you are or how smart you are or how privileged you are. Yet they are the prime determinants of you. Personal responsibility and ambition! The clarion call of the well-off and the lucky. Fuck that. All you need is one look at Howie to see what a lifetime of personal responsibility and failure will do to a man.</p>
<p>I don't know what happened to <tt>persephone</tt> but I knew she tried to buck the powerful and the rich and I know where it got her. Her last resort was some nasty shit. I bet they were going to spend months fixing it all. I hope that made her happy, wherever she was.</p>
<p>I looked again at Howie. He lifted his head and looked around, probably for Linda so he could get another drink and be done with it all. He turned and saw me. I nodded. He looked at me and we understood. I ordered us a round and we waited it out.</p>The Practice Startup2015-04-12T14:20:00-07:002015-04-12T14:20:00-07:00Sean Cassidytag:www.seancassidy.me,2015-04-12:/the-practice-startup.html<p>You want to found a startup. You're not exceptionally well connected, you don't have the best idea in the world, your programming skills aren't world class. That's okay. You don't know how to build a product, market it, or how to sell it. That's okay, too. Because you're going to learn. You're going to learn the best way: by doing.</p>
<p>A few years ago, I, like many people, wanted to found my own startup. I had ideas – great ideas, so I thought – but I didn't know what to do with them. I knew how to program, and I naively thought that was enough. If only those venture capitalists would give me money I could finally have a chance! I read books about startups and businesses, read countless blog posts that focused on how to grow my startup, how to hire salespeople, how to run an email marketing campaign, and so on. But they didn't focus on improving me.</p>
<p>I started a company with a friend of mine. We built some software, called some people, and eventually realized it couldn't be a sustainable business. So we decided to solve a problem we had run into while doing the previous startup: deploying software is hard, and we disliked Chef/Puppet/Ansible. So we made <a href="http://www.gosquadron.com">Squadron</a>, actually managed to find customers willing to pay for support, and even got a YC interview. We failed for a few reasons I might write about later, but I learned more than I ever thought I would about products, about sales, and about marketing. That knowledge is still serving me well at <a href="https://www.praesidio.com">Praesidio</a>, where I joined as the first employee. It was invaluable practice that is still paying dividends.</p>
<p>Here's the deal: if you want to found a startup, if you want to build a great product, if you want your business to have a chance at surviving, you need to have the skills. You can and should read as many books and blog posts as you can. But there's no substitute for doing it yourself.</p>
<h1 id="the-plan">The Plan</h1>
<p>You're going to practice.</p>
<p>For your practice startup, you'll need an idea. It's going to be a simple idea. You're going to build a minimum viable product (MVP) and start selling it. You'll build a marketing campaign for it, call potential customers on the phone, go to meetups and pitch your idea to people, and maybe even pitch it to investors. Nobody but you will know it's just practice.</p>
<p>You probably have a full-time job now. Most people in the startup community will tell you to quit. I won't. You should keep your job while you do this. Yes, investors don't like that, but that doesn't matter right now. You're going to learn the skills it takes by doing. It's also helpful if you get a partner, but it's not essential yet. Don't listen to stupid advice like dump your significant other or move across the country. You can do this right now.</p>
<p>You'll need to pick your idea. Don't get too hung up on this stage. This doesn't have to be the best idea in the world, just something you think some people will pay money for. One easy way to do this is to pick a technology that you use, but don't love. Then, change it so that you love it. Tell your friends and family you're in the market for ideas. One thing to note: since you're purposefully doing this for practice, it's best not to pick an idea people or businesses would absolutely depend on. Make it clear to your potential customers that you're still in the research phase.</p>
<p>Once you have your idea, try to get a customer or two. That's right, before you build anything, try to sell it for actual money, or get people to use it if it's not a salable product. Don't ask if they would buy it or would use it, ask them if they will buy it or will use it. Pricing products is difficult even for more mature companies, so try out a few different prices. Listen to them and try to understand their problems and what they would want you to do. Your idea shouldn't be shoehorned into fixing their problems, but you can modify your idea to better suit their needs. If you can't find anyone willing to pay actual money or actually start using it, start over. Remember: it's just practice.</p>
<p>Now that you have a customer or two ready to go, build your MVP. There are lots of blog posts and books written about MVPs, but I'll just say that I prefer working prototypes to screenshots. You should do whatever you think is best. If you're not a programmer, learn or find a partner who knows how. Don't pay thousands of dollars to a contracting firm yet. Avoid building mobile apps from the get-go unless you are particularly proficient at writing them. They take up a huge amount of time.</p>
<p>Once you have your MVP, find beta customers. Does your target customer typically have a blog? Or tweet? Or use Pinterest or LinkedIn or reddit? Contact them and send friendly, personalized messages to them. Keep it short and to the point. If you can't find their emails, find out where they go. Meetups, conferences, libraries, the climbing gym, schools, various Internet communities. Then start talking to them. You can't skip this step. Find at least a few more people or companies willing to pay actual money for, or at least use, what you've built (or said you were going to build). If you can't find any more customers, you should start over or modify your idea or MVP. It's just practice, so don't sweat it.</p>
<p>Armed with your customers, build your marketing website, complete with testimonials. Get a mailing list going. Solicit feedback. Build out some more features on your MVP.</p>
<p>Find local startup incubators and pitch to them, but don't get carried away with this step. You don't want to get too focused on finding funding, as that will warp your practice startup. Without even knowing it, you'll enlarge the idea to “attract” investors and start to talk about how your total addressable market is really everyone in the world. Instead, tell them the facts. If they don't fund you, it's no big deal, as this was just practice. It's like you're David Attenborough and you're going to make a documentary on the elusive venture capitalist in its natural habitat: you're learning about them and the process.</p>
<p>You might want to know at what point you should incorporate, buy an expensive domain name, or shell out a lot of money for a nice logo or t-shirts. These things give the illusion of traction and of having a product-market fit. I recommend you wait on all of them until the moment where it would be silly not to have them. Don't pay too much money for any of them.</p>
<h1 id="what-youve-learned">What you've learned</h1>
<p>Write down everything you learn as you go. Blog about it and <a href="mailto:sean@seancassidy.me">email</a> or <a href="https://twitter.com/sean_a_cassidy">tweet me</a> the link; I'd love to read it. Share what you've learned and encourage people to practice, too. Write about the most surprising thing you learned about sales, or marketing, or product management, or building a prototype, or user testing.</p>
<p>It turns out that there's really no difference between doing the practice and doing the real thing. If you've done all of these steps, you may have actually built a fledgling startup. All you needed was permission. Keep going.</p>
<p>We as a community tend to get too focused on some details, like quitting your job first, or how to select the perfect idea, or how to build the perfect MVP. Some people even think succeeding wildly on your first try is important. None of that matters as much as your being a good product person, a good marketer, and a good salesperson. Growing as a person is how you will best know how to grow your business.</p>
<p>Try it out. You have nothing to lose. It's just practice.</p>Code as Risk2015-02-11T07:59:00-08:002015-02-11T07:59:00-08:00Sean Cassidytag:www.seancassidy.me,2015-02-11:/code-as-risk.html<p>There are many analogies for technical debt, but the basic one is the most
clear: the more technical debt you have and the longer you leave it alone,
the harder it is to pay it off. And a little technical debt is okay, just
like a manageable amount of financial debt is okay if it means you can get
more done faster.</p>
<p>But why is there a distinction between technical debt – thought to be bad
code or partially implemented systems – and the rest of your code? Is the
rest of your code bug-free? Instead, consider the idea that <strong>all code adds
risk</strong>. The tradeoffs of implementing features become more clear because you
can ask the question: how risky will this code be?</p>
<p>Investors and financial institutions are very concerned with risk. They have
a good incentive to study it, as they'll lose all their money if they don't.
There are two types of financial risk: systematic and unsystematic (or
diversifiable) risk. Systematic risk is the risk of the underlying market, the
risk that you can't make go away. Unsystematic risk is controlled by you. If
you put all your investments in the stock of one company, you could lose your
money because you are not diversified. If, instead, you diversify, your total
risk begins to approach the baseline systematic risk.</p>
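<p>A toy model makes the diversification point concrete. The sketch below assumes an equally weighted portfolio of <code>n</code> holdings that all share the same systematic risk and have independent unsystematic risk of equal size, so the portfolio variance is the systematic variance plus the unsystematic variance divided by <code>n</code>. The function name and the sample numbers are illustrative assumptions, not market data.</p>

```python
import math


def portfolio_risk(systematic_sd, unsystematic_sd, n):
    """Standard deviation of an equally weighted portfolio of n holdings.

    Simplifying assumption (for illustration only): every holding carries
    the same systematic risk, and the unsystematic risks are independent
    and equal in size, so they average out as n grows.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    variance = systematic_sd ** 2 + (unsystematic_sd ** 2) / n
    return math.sqrt(variance)


if __name__ == "__main__":
    # Total risk shrinks toward the systematic baseline (0.15 here)
    # as the number of holdings grows, but never drops below it.
    for n in (1, 5, 25, 100):
        print(n, round(portfolio_risk(0.15, 0.30, n), 3))
```

<p>Under these assumptions the unsystematic term can be driven toward zero, while the systematic term is a floor that no amount of diversification removes.</p>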
<p>Products are the same way. There is a baseline risk – after all, you are
creating a new technology – and there are risks you assume by your choices.
One goal I have is to reduce my unsystematic risk so that the business and the
products I make are stable and reliable. So how do you reduce your unsystematic
risk?</p>
<h2 id="delete-code">Delete code</h2>
<p>If code is risk, deleting code should be a goal. Remove dead code. Don't keep
it around because you might need it; that's what source control is for.
If you think all code is risky, refactoring becomes more important because it
focuses the code base and removes wasteful code.</p>
<h2 id="choose-libraries-wisely">Choose libraries wisely</h2>
<p>With modern dependency management tools, it's trivial to add new dependencies.
Remember, if you're using someone else's code, you're responsible for the usage
of it in your own software. Libraries you use should be non-trivial, actively
developed, well tested, and well documented. They should also be as small as is
reasonable: less code means less risk, and a small library is easier to
understand if you need to read the entire codebase. Large frameworks can increase risk.</p>
<h2 id="code-contracts">Code contracts</h2>
<p>When using other people's code, it is critical to understand what their
invariants are, and what their guaranteed output is.
<a href="http://www.seancassidy.me/your-interface-is-what-matters.html">Code contracts</a> are a critical part of making quality,
reliable software. Understand others' contracts and write clear ones for your
own software. The risk of misusing code and having your code misused will be
much reduced.</p>
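<p>As a minimal sketch of what a written-down contract can look like, the function below states its precondition and its guarantees in the docstring and checks them in code. The function <code>split_into_batches</code> is a hypothetical example invented for illustration, not something from a particular library.</p>

```python
def split_into_batches(items, batch_size):
    """Split items into consecutive batches.

    Contract:
      requires: batch_size >= 1
      ensures:  every batch holds between 1 and batch_size items, and
                concatenating the batches gives back items in order
    """
    # Precondition: reject misuse loudly instead of looping forever
    # or returning something surprising.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    # Postconditions: these document the guarantee and catch regressions.
    assert all(1 <= len(batch) <= batch_size for batch in batches)
    assert [x for batch in batches for x in batch] == list(items)
    return batches
```

<p>A caller who reads only the docstring knows what they must supply and what they can rely on, which is the point of a contract.</p>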
<h2 id="write-your-core-code">Write your core code</h2>
<p>If there is one thing your product does that's important, you should have
people on staff who have either written that code, or have studied the third
party library that implements it. Not knowing how a critical part of your
system works is a ticking time bomb.</p>
<h2 id="minimize-new-technology">Minimize new technology</h2>
<p>You shouldn't use the new database with a new programming language on the new
web framework on the new AWS cloud replacement using the brand new automation
toolkit. Pick the smallest number of new, risky technologies that will make your
product better. Be explicit about this choice. Everything else should be
boring.</p>
<h2 id="open-source">Open source</h2>
<p>If you open source code that is likely to be of use to others, they will help
test and improve it. The mere process of open sourcing code can improve the
quality of the code and reduce risk because you don't want the code to
embarrass you or your company. </p>
<h2 id="say-no-to-complicated-features">Say no to complicated features</h2>
<p>Some product features are just complex. They want a business process that does
all the things and reads in a PDF and then outputs a spreadsheet with some nice
graphics automatically summarizing the–. No. Not unless that is a huge win.
Complicated processes make for bad software. Bad software is risky.</p>
<h2 id="ensure-correctness-in-many-situations">Ensure correctness in many situations</h2>
<p>Tests and type checking reduce risk. Make sure they cover error paths in
addition to the successful paths that are normally tested. What happens when
the database stops responding, or the authentication server is down? Does your
system keep working, or does it fail catastrophically?</p>
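<p>Here is a small sketch of testing an error path alongside the success path. Everything in it is hypothetical: <code>DatabaseUnavailable</code>, <code>load_greeting</code>, and the two stand-in database classes exist only to show the shape of such a test.</p>

```python
import unittest


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) database client when it cannot respond."""


def load_greeting(db, user_id):
    """Return a personalised greeting, degrading gracefully if the DB is down."""
    try:
        name = db.get_name(user_id)
    except DatabaseUnavailable:
        return "Hello!"  # fallback keeps the page rendering
    return f"Hello, {name}!"


class WorkingDb:
    """Stand-in for a healthy database."""

    def get_name(self, user_id):
        return "Ada"


class BrokenDb:
    """Stand-in for a database that has stopped responding."""

    def get_name(self, user_id):
        raise DatabaseUnavailable("timed out")


class LoadGreetingTest(unittest.TestCase):
    def test_success_path(self):
        self.assertEqual(load_greeting(WorkingDb(), 1), "Hello, Ada!")

    def test_database_down(self):
        # The error path gets its own test, not just the happy path.
        self.assertEqual(load_greeting(BrokenDb(), 1), "Hello!")
```

<p>Saved to a file, this can be run with <code>python -m unittest</code>. The second test is the one that usually goes unwritten.</p>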
<h2 id="product-continuity-plan">Product continuity plan</h2>
<p>Financial institutions aren't just obsessed with risk of their financial
instruments. They study the risk involved with their computer systems, their
employees, and even the companies they buy services from. They ask for a
<a href="https://en.wikipedia.org/wiki/Business_continuity_planning">business continuity plan</a> so that they can plan for when your company's
datacenter goes offline or you go bankrupt. They also want a business
continuity plan for your datacenter. You can do the same thing for your
product: what happens if S3 goes offline for an hour? If your Azure
availability sets go offline? If someone drops your database accidentally? A
little preparation goes a long way to mitigating risk.</p>
<h2 id="high-bus-factor">High bus factor</h2>
<p>If you have a few people that are the only go-to people on certain systems,
you're in trouble if they cannot work or if they change jobs suddenly. Code
reviews are a good way to spread knowledge; pair programming is even better.
<a href="http://www.seancassidy.me/bus-factors-and-walk-score.html">Improving your software's walk score</a> helps people get into the
code base so that you can raise your bus factor.</p>
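<p>One rough way to put a number on this, assuming you can list who understands each system, is to take the size of the smallest group of experts. The mapping and the function names below are made up for illustration.</p>

```python
def bus_factor(knowledge):
    """Smallest number of people whose absence leaves some system uncovered.

    knowledge maps a system name to the set of people who can work on it.
    """
    if not knowledge:
        raise ValueError("knowledge must list at least one system")
    return min(len(people) for people in knowledge.values())


def systems_at_risk(knowledge, threshold=1):
    """Names of systems known by no more than `threshold` people, sorted."""
    return sorted(
        system for system, people in knowledge.items() if len(people) <= threshold
    )


if __name__ == "__main__":
    # Hypothetical team data.
    team = {
        "billing": {"alice"},
        "search": {"alice", "bob", "carol"},
        "deploys": {"bob", "carol"},
    }
    print(bus_factor(team))
    print(systems_at_risk(team))
```

<p>A system with a single expert drags the whole number down, which is where code review or pairing time is best spent.</p>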
<h2 id="automation-and-devops">Automation and DevOps</h2>
<p>Despite DevOps being almost a buzzword at this point, the core idea is still
sound: have an experienced operations team write and manage the automation,
deployment, and monitoring of your software. And remember: no silos, they
should work in tandem with the development team, and report to the same people.
Huge ops organizations that are distinct from R&D are a recipe for
trouble.</p>
<h2 id="have-a-process">Have a process</h2>
<p>There are two reasonable release processes: fast and continuous (or nearly), or
slow with a long QA cycle. Both of these are acceptable in different
circumstances, but it's important to not fall in the middle. Large changes with
many releases increase risk. So either release more frequently, or test those
big releases carefully.</p>
<p><br><br>
Some risk is acceptable. But it's important to make that a conscious choice. By
choosing your risk carefully, you can plan around it. If you are unaware of the
risk of the choices you made, it is more difficult to plan around failure. The
goal is that when a system fails, your first thought is: it's okay, we have
failure countermeasures in place.</p>
<p>Failure should not be a surprise. Be explicit about the risks you have and what
you're doing about them.</p>Sherlock Holmes Debugging2014-11-09T12:33:00-08:002014-11-09T12:33:00-08:00Sean Cassidytag:www.seancassidy.me,2014-11-09:/sherlock-holmes-debugging.html<p>How do you debug?</p>
<blockquote>
Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were which led up to that result. This power is what I mean when I talk of reasoning backwards, or analytically.
<cite class="character">Sherlock Holmes</cite>
<cite><a href="http://www.amazon.com/gp/product/1492351008/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=1492351008&linkCode=as2&tag=reamorpap-20&linkId=FHQDIPHI7XLUXIZ4">A Study in Scarlet, by Sir Arthur Conan Doyle</a></cite>
</blockquote>
<p>I use the Sherlock Holmes method of debugging software. This means working backwards from a symptom to the series of causes which tell a story of the entire bug. It works for production outages and during development. Something that recently happened at my startup should be instructive.</p>
<blockquote>
<p>There is nothing more deceptive than an obvious fact.</p>
</blockquote>
<p>A few months ago, my coworker was using Mandrill to send email from one of our AWS EC2 boxes. We had been using Mandrill for a few months for most of our email sending needs. However, he was having trouble getting his code to send the email. It would fail for no apparent reason when it had been working the week before.</p>
<p>First, we examined the Mandrill API client that we were using. It was somewhat confusingly written and sparsely tested, so we assumed the bug was in this library. We spent an hour or two going through it, and we found a major problem: it was swallowing exceptions and making it look like it succeeded.</p>
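<p>To illustrate the kind of problem this is, here is a sketch of exception swallowing and one way to avoid it. It is not the code of the client library we were using; every name in it is hypothetical.</p>

```python
class SendError(Exception):
    """Raised when a message could not be handed to the mail service."""


def send_swallowing(transport, message):
    """Anti-pattern: any failure is hidden and the caller sees success."""
    try:
        transport(message)
    except Exception:
        pass  # the error disappears here
    return True


def send_reporting(transport, message):
    """Safer: wrap the failure with context and let it propagate."""
    try:
        transport(message)
    except Exception as exc:
        raise SendError(f"sending failed: {exc}") from exc
    return True


def broken_transport(message):
    """Stand-in for a network call that fails."""
    raise ConnectionError("connection reset")
```

<p>With <code>broken_transport</code>, the first function returns <code>True</code> and the second raises <code>SendError</code>. Only the second gives a debugger anything to work backwards from.</p>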
<p>We needed more information. We needed to remove variables from the equation and gather data. "Try cURL directly," I said. So we set up a cURL command:</p>
<div class="codehilite"><pre><span></span>$ curl -H <span class="s1">'Content-Type: application/json'</span> -X POST -v -d <span class="s1">'{</span>
<span class="s1"> "key": "ASJFSDKfkekjjsdkfiEJKFJ",</span>
<span class="s1"> "message": {</span>
<span class="s1"> "text":"test",</span>
<span class="s1"> "subject":"test",</span>
<span class="s1"> "to":[{</span>
<span class="s1"> "email":"my@email.story"</span>
<span class="s1"> }]</span>
<span class="s1"> }</span>
<span class="s1">}'</span> https://mandrillapp.com/api/1.0/messages/send.json
</pre></div>
<p>And it worked! No problem at all. The email was sitting in my inbox, and we were confused.</p>
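<p>For reference, the same kind of minimal request can be built in Python with only the standard library. This is a sketch, not what we ran: the helper name and the placeholder values are assumptions, and only the endpoint URL and the payload fields come from the cURL command above. It builds the request without sending it.</p>

```python
import json
import urllib.request

MANDRILL_SEND_URL = "https://mandrillapp.com/api/1.0/messages/send.json"


def build_send_request(api_key, to_email, subject, text):
    """Build (but do not send) a POST request for a plain-text message."""
    payload = {
        "key": api_key,
        "message": {
            "text": text,
            "subject": subject,
            "to": [{"email": to_email}],
        },
    }
    # json.dumps always emits valid JSON, unlike a hand-typed string.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        MANDRILL_SEND_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_send_request("YOUR_API_KEY", "you@example.com", "test", "test")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

<p>Passing the result to <code>urllib.request.urlopen</code> would actually send it. Keeping construction separate from sending makes it easy to inspect exactly what goes on the wire.</p>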
<p>"So the problem must lie in my software," my coworker reasoned. We had spent all morning looking at his code, and we had found no problems. I didn’t want to do that again.</p>
<blockquote>
<p>It is a capital mistake to theorize before you have all the evidence. It biases the judgment.</p>
</blockquote>
<p>We needed still more data. "Try sending your exact email payload to Mandrill," I suggested.</p>
<p>There were many differences between the simple test email we tried and the email he needed to send. His email was more complicated, with tables, headers, and a few people in the "to" and "cc" lists. It was probably ten times larger. So we copied it exactly:</p>
<div class="codehilite"><pre><span></span>$ curl -H <span class="s1">'Content-Type: application/json'</span> -X POST -v -d <span class="s1">'{</span>
<span class="s1"> "key": "ASJFSDKfkekjjsdkfiEJKFJ",</span>
<span class="s1"> "message": {</span>
<span class="s1"> // SNIP, large message here</span>
<span class="s1"> }</span>
<span class="s1">}'</span> https://mandrillapp.com/api/1.0/messages/send.json
</pre></div>
<p>And it didn’t work. No email.</p>
<p>How is this possible? A Mandrill bug? Or maybe we have our message format wrong. Maybe we’re using cURL wrong. So many possible theories, but we still didn’t have enough data. We booted up my favorite tool, tcpdump. First, we checked the small message that worked:</p>
<div class="codehilite"><pre><span></span><span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">SYN</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">SYN</span><span class="o">+</span><span class="n">ACK</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">ACK</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">212</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">Data</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">ACK</span><span class="w"> </span><span class="k">Data</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">26</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">390</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data2</span><span class="w"></span>
<span class="o">//</span><span class="p">..</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">27</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">FP.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">37</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">FIN</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">27</span><span class="err">:</span><span class="mi">12</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.49493</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">R</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">closed</span><span class="w"></span>
</pre></div>
<p>Everything looks fine. Connection opened and acknowledged. Data is sent after the TLS setup and then acknowledged just like you’d expect.</p>
<p>We then sent the larger message to Mandrill and observed its results:</p>
<div class="codehilite"><pre><span></span><span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">05</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">SYN</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">05</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">SYN</span><span class="o">+</span><span class="n">ACK</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">05</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">ACK</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">06</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">212</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">Data</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">06</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">ACK</span><span class="w"> </span><span class="k">Data</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">06</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="n">https</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">390</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data2</span><span class="w"></span>
<span class="o">//</span><span class="p">..</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">06</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">1707</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data3</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">06</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">1707</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data3</span><span class="w"> </span><span class="n">retransmit</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">07</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">1707</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data3</span><span class="w"> </span><span class="n">retransmit</span><span class="w"></span>
<span class="mi">23</span><span class="err">:</span><span class="mi">32</span><span class="err">:</span><span class="mi">07</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="n">ip</span><span class="o">-</span><span class="mf">172.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">internal</span><span class="mf">.58542</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="mf">54.</span><span class="n">us</span><span class="o">-</span><span class="n">west</span><span class="o">-</span><span class="mf">2.</span><span class="k">compute</span><span class="p">.</span><span class="n">amazonaws</span><span class="p">.</span><span class="n">com</span><span class="p">.</span><span class="nl">https</span><span class="p">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">1707</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Data3</span><span class="w"> </span><span class="n">retransmit</span><span class="w"></span>
</pre></div>
<p>So, the packets were just getting lost. To recap the facts:</p>
<ol>
<li>It wasn’t our software: we were testing with cURL</li>
<li>Small messages sent to Mandrill always worked</li>
<li>Large messages never worked</li>
<li>Large messages had worked the previous week</li>
</ol>
<p>We then came up with a few theories which fit the facts:</p>
<ol>
<li>Mandrill’s API was overloaded and dropping data</li>
<li>There was a bad route between our AWS and Mandrill, dropping larger packets</li>
<li>There was something wrong in our VPC’s layer 2 that was dropping jumbo frames</li>
</ol>
<p>The first theory was plausible, but it didn’t match any patterns of API problems I had ever experienced. Why would the size of the packets determine which were dropped on an overloaded REST service? That theory was unfulfilling.</p>
<p>I was a fan of the last theory, as it seemed to fit all of the facts, but we needed to verify that it was just our VPC. I logged into one of our Linode machines and tried to send the larger email command. It worked! The message was there, sitting in his inbox.</p>
<p>Now we thought we could solve our issue. If we set our <a href="https://en.wikipedia.org/wiki/Maximum_transmission_unit">MTU</a> to 1500 bytes, the standard Ethernet limit above which frames count as jumbo, it should work. First, we double-checked our settings:</p>
<div class="codehilite"><pre><span></span>$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 0a:02:7b:a2:78:AF
inet addr:172.32.2.151 Bcast:172.32.2.255 Mask:255.255.240.0
inet6 addr: fe80::832:780f:fba2:7005/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1
RX packets:220362102 errors:0 dropped:0 overruns:0 frame:0
TX packets:34530020 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:315440534468 <span class="o">(</span><span class="m">315</span>.4 GB<span class="o">)</span> TX bytes:3097949824 <span class="o">(</span><span class="m">3</span>.0 GB<span class="o">)</span>
</pre></div>
<p>The MTU was 9001, typical for gigabit hosts. We reduced it to 1500:</p>
<div class="codehilite"><pre><span></span>$ ifconfig eth0 mtu <span class="m">1500</span>
</pre></div>
<p>And then sending Mandrill a larger message worked. Linux would then automatically split our data into packets of at most 1500 bytes, and the bad router in our VPC would pass those packets just fine. We double-checked with our other VPC and found that it didn’t experience this jumbo frame issue. This meant that it was almost certainly AWS’s problem.</p>
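<p>To make the arithmetic concrete, here is a rough sketch of how the MTU bounds the amount of TCP data per packet. It assumes IPv4 and TCP headers of 20 bytes each with no options (real connections often carry options such as timestamps, which shrink the payload a little further), and the class and method names are made up for illustration.</p>

```java
final class SegmentMath {
    // Assumption: IPv4 and TCP headers without options, 20 bytes each.
    static final int IPV4_HEADER = 20;
    static final int TCP_HEADER = 20;

    /** Largest TCP payload that fits in a single packet for the given MTU. */
    static int maxSegmentSize(int mtu) {
        return mtu - IPV4_HEADER - TCP_HEADER;
    }

    /** Number of packets needed to carry payloadBytes of TCP data. */
    static int packetsNeeded(int payloadBytes, int mtu) {
        int mss = maxSegmentSize(mtu);
        return (payloadBytes + mss - 1) / mss; // ceiling division
    }

    public static void main(String[] args) {
        int payload = 1707; // length of the segment that kept being retransmitted
        for (int mtu : new int[] {9001, 1500}) {
            System.out.printf("MTU %d: max payload %d, packets for %d bytes: %d%n",
                    mtu, maxSegmentSize(mtu), payload, packetsNeeded(payload, mtu));
        }
    }
}
```

<p>Under these assumptions the 1707-byte write goes out as a single oversized packet at MTU 9001, but as two packets at MTU 1500, each small enough to cross a 1500-byte link.</p>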
<p>We summarized the issue to AWS with good examples they could replicate and sent a support ticket. They responded:</p>
<blockquote>
<p>A regression in our instance firewall meant that Path-MTU discovery messages were not correctly relayed back to the instance, meaning that affected instances (in a single AZ) communicating with >1500MTU instances/hosts across the EC2 border (including when you contact region-local instances on their public IP) would have connectivity issues any time the affected instance's OS attempted to send packets larger than 1500 bytes. This regression has been rolled back, and we have updated our testing procedures to ensure this regression does not recur.</p>
</blockquote>
<p>Validation! We got a service credit for our trouble.</p>
<p><img alt="Jeremy Brett as Sherlock Holmes" src="https://www.seancassidy.me/static/images/" /></p>
<p>There is no sin in software engineering more serious than thinking some behavior of a computer system is magical or beyond our understanding. It may be difficult to understand given the time or resource constraints we face, but given enough time and persistence, all phenomena are reducible to logic. We are engineers and it’s our job to understand the issues we face. Too often we abstract away the details and either assume it’s someone else’s fault or just take it for granted. Knowing enough about your entire software stack (including networking and operating systems) is invaluable even if you work in high-level programming languages.</p>
<p>Good engineers debug like Sherlock Holmes investigates: we gather facts, generate theories from those facts, and then test them. Coincidence is to be met with the fiercest skepticism. You changed the production database yesterday and now customers are reporting data loss issues? If your knee-jerk reaction is to blame the customers, you need to <a href="https://www.seancassidy.me/hackers-and-engineering-school.html">reevaluate how you approach engineering</a>. That should be the absolute last theory you consider and it should be backed by mountains of evidence.</p>
<p>The Holmes method of debugging is superior, I think, to <a href="http://yellerapp.com/posts/2014-08-11-scientific-debugging.html">the scientific method of debugging</a> because debugging isn’t just a science. There’s an art to knowing where to look and what data is needed. This comes from experience and is as much intuition as it is logic. Practice debugging and you will be a better debugger. If a production outage happens, <a href="https://www.seancassidy.me/meditations.html">first do no harm</a>. Instead, consider it from another perspective:</p>
<p>What would Sherlock Holmes do?</p>
<blockquote>
<p>Nothing clears up a case so much as stating it to another person.</p>
</blockquote>Plural gTLDs are evil2014-09-03T08:45:00-07:002014-09-03T08:45:00-07:00Sean Cassidytag:www.seancassidy.me,2014-09-03:/plural-gtlds-are-evil.html<p>Domain names translate semantic meaning into internet addresses. They're for
people. They're designed to be read and understood by people. If they're hard
to understand, they're not doing what they're intended to do.</p>
<p>Like many people, I had mixed feelings about the new generic top-level domain
(gTLD) offering. While some domains like <em>.guru</em> or <em>.club</em> seemed like a good
way to get pressure off of the toxic <em>.com</em> market, there are many bizarre new
gTLDs like <em>.whoswho</em>. But on the whole, I was mildly in favor of the new gTLDs
despite persuasive arguments from the
<a href="http://www.ana.net/content/show/id/crido">Coalition for Responsible Internet Domain Oversight</a>.</p>
<p>That is, until I found out that many come in pairs. I recently saw the gTLD
<em>.pet</em> in a list, and I thought it was an odd choice. Who would make a domain for
one single pet<sup id="fnref:mallory"><a class="footnote-ref" href="#fn:mallory">1</a></sup>? But ICANN thought of this and added <em>.pets</em>.</p>
<p>So now we have two gTLDs: <em>.pet</em> and <em>.pets</em>. I am absolutely sure that if
there's a popular pet website I want to go to in the future, I won't be able to
remember if they pluralized their gTLD or not. And this goes against what
domain names were meant to do: translate meaning. If I can't remember it, then
it's not doing the right thing.</p>
<p>To those who say search engines have obviated the need for remembering domain
names, that is only partially true. Physical and television advertisements have
URLs in them. Phishing uses the fact that people don't understand domains to
extract money from them. People email URLs all the time.</p>
<p>If you are part of a company or group that in any way needs to protect your users
– really, any website where you can log in – you should not use one of the new
pluralizable gTLDs. They will confuse your users and you will be worse off.
Here's a list of gTLDs that users may mistake for one another<sup id="fnref:list"><a class="footnote-ref" href="#fn:list">2</a></sup>:</p>
<div class="codehilite"><pre><span></span><span class="na">.car</span><span class="err">/</span><span class="no">.cars</span>
<span class="na">.deal</span><span class="err">/</span><span class="no">.deals</span>
<span class="na">.fan</span><span class="err">/</span><span class="no">.fans</span>
<span class="na">.home</span><span class="err">/</span><span class="no">.homes</span>
<span class="na">.hotel</span><span class="err">/</span><span class="no">.hotels</span>
<span class="na">.loan</span><span class="err">/</span><span class="no">.loans</span>
<span class="na">.market</span><span class="err">/</span><span class="no">.marketing</span><span class="err">/</span><span class="no">.markets</span>
<span class="na">.mov</span><span class="err">/</span><span class="no">.movie</span>
<span class="na">.new</span><span class="err">/</span><span class="no">.news</span>
<span class="na">.pet</span><span class="err">/</span><span class="no">.pets</span>
<span class="na">.photo</span><span class="err">/</span><span class="no">.photos</span>
<span class="na">.property</span><span class="err">/</span><span class="no">.properties</span>
<span class="na">.realestate</span><span class="err">/</span><span class="no">.realtor</span><span class="err">/</span><span class="no">.realty</span>
<span class="na">.review</span><span class="err">/</span><span class="no">.reviews</span>
<span class="na">.supply</span><span class="err">/</span><span class="no">.supplies</span>
<span class="na">.tour</span><span class="err">/</span><span class="no">.tours</span>
<span class="na">.work</span><span class="err">/</span><span class="no">.works</span>
</pre></div>
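<p>A list like this can be partly generated. The following is a minimal sketch, assuming you already have the set of delegated gTLDs as plain strings without the leading dot; the class name and the sample data are made up for illustration. It only catches the naive "add an s" plural, so pairs like <em>.property</em>/<em>.properties</em> would still need a human eye.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PluralTlds {
    /** Returns ".x/.xs" for every TLD x whose naive plural x + "s" is also in the set. */
    static List<String> pluralPairs(Set<String> tlds) {
        List<String> pairs = new ArrayList<>();
        for (String tld : new TreeSet<>(tlds)) { // sorted copy for stable output
            if (tlds.contains(tld + "s")) {
                pairs.add("." + tld + "/." + tld + "s");
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical sample, not the full list of delegated gTLDs.
        Set<String> sample = Set.of("pet", "pets", "car", "cars", "guru", "club");
        System.out.println(pluralPairs(sample));
    }
}
```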
<p>You owe it to your users to be as easy to use as possible. That means not
making them remember whether your gTLD is plural or which form of the noun
it uses. This means you'll have to buy more than one, and redirect, which just
creates more work and more expense.</p>
<p>Stick to a simple domain for now, and avoid these plural gTLDs.</p>
<p><em>Update</em>: I was informed that the owners of <em>.sport</em> successfully objected to
the <em>.sports</em> domain. So ICANN, in principle, agrees. Read
<a href="http://newgtlds.icann.org/sites/default/files/drsp/25sep13/determination-2-1-1614-27785-en.pdf">the objection to .sports</a> for yourself.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:mallory">
<p>Maybe <a href="http://archer.wikia.com/wiki/Duchess">Mallory Archer
would</a>. <a class="footnote-backref" href="#fnref:mallory" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:list">
<p>If I've missed any, please contact me and I'll add it. <a class="footnote-backref" href="#fnref:list" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Your Interface is what Matters2014-07-21T07:59:00-07:002014-07-21T07:59:00-07:00Sean Cassidytag:www.seancassidy.me,2014-07-21:/your-interface-is-what-matters.html<p>Programming is a deeply humbling activity.</p>
<p>What else could you say of an activity and profession where it's common
knowledge that you'll never write bug-free code? That regardless of how hard
one tries, it's not enough. And it's not because we just haven't yet found the
right abstraction or designed the right programming language. Software is
surprisingly complex.</p>
<blockquote>
Software entities are more complex for their size than perhaps any other
human construct.
<cite>Frederick P. Brooks, Jr.</cite>
</blockquote>
<p>In his essay, "<a href="http://www.cgl.ucsf.edu/Outreach/pc204/NoSilverBullet">No Silver Bullet</a>", Frederick P.
Brooks (writer of <a href="https://en.wikipedia.org/wiki/The_Mythical_Man-Month">The Mythical Man-Month</a>) describes how
software is complex. So complex that it actually cannot be made simpler: its
complexity is essential. To deal with this non-reducible complexity we create
abstractions in layers.</p>
<p>Yet each time we build one abstraction layer atop another, we hide details.
In many scenarios these details are unimportant. Sometimes, we hide bugs. And
sometimes bugs are emergent properties of complex components interacting in
unforeseen ways.</p>
<p>What can we do to reduce software complexity when so many of our software systems
depend upon open source software we didn't write? Software that we don't fully
understand. Software that we've bet the business on.</p>
<p>To figure this out, we need to find out why software is intrinsically
complicated.</p>
<h1 id="why-is-programming-complicated">Why is programming complicated?</h1>
<p>I recently read an interesting paper, "<a href="https://raw.githubusercontent.com/papers-we-love/papers-we-love/master/design/out-of-the-tar-pit.pdf">Out of the Tar Pit</a>" which
discusses why programming is complicated. When you start trying to understand
a large codebase there are two main ways you can understand it: </p>
<ol>
<li>Model as much of it as you can in your head</li>
<li>Use tests to demonstrate the correctness of the pieces that you can't fit
 in your mental model</li>
</ol>
<p>So the goal of making your codebase less complicated is two-fold: make your
tests good and make it fit in the average programmer's brain better. You can
accomplish the latter in a number of ways: the <a href="https://en.wikipedia.org/wiki/Principle_of_least_astonishment">principle of least
astonishment</a>, using standard patterns that are familiar to
other programmers, naming your variables sensibly, and, as the paper argues,
limiting the amount of <a href="https://en.wikipedia.org/wiki/Program_state#Program_state">state</a> you need to keep track of. State
increases the <a href="https://en.wikipedia.org/wiki/Arity">arity</a> of every single class or method that can touch
that state, so it's essential to keep global state to a minimum.</p>
<p>What this paper leaves out, in my view, is how modern software is constructed:
you depend on dozens of third-party libraries and abstractions to accomplish
every task. How do you fit them into your mental model? Does that API throw an
exception when the network is down or does it block? Can you configure that?
These libraries complicate your software as they simplify some other aspects of
it. We need a way to understand this complexity in order to measure and reduce
it.</p>
<p>The best way to deal with this complexity is via <a href="https://en.wikipedia.org/wiki/Design_by_contract">code contracts</a>.
Each time you call a class or method, what does it require? What does it
produce? Most importantly, what are its error conditions? How does it fail? How
should you handle this failure?</p>
<p>Code contracts are written in three ways: the type and arity of the method
or class's inputs and outputs, the documentation, and implicit, hidden details
of the contract that you can only find out if you read the code.</p>
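<p>Here is a small sketch of those three layers on one method. The class and method are hypothetical, chosen only because the behavior is easy to check: the signature carries the types, the Javadoc carries the documented contract, and the comment inside points at a detail you would only learn by reading the code.</p>

```java
import java.util.Objects;

final class PortParser {
    /**
     * Parses a TCP port number.
     *
     * @param text decimal digits with no surrounding whitespace; must not be null
     * @return the port, always in the range 1..65535
     * @throws NullPointerException if text is null
     * @throws IllegalArgumentException if text is not a number or is out of range
     */
    static int parsePort(String text) {
        Objects.requireNonNull(text, "text");
        final int port;
        try {
            // Implicit detail: Integer.parseInt also accepts a leading '+',
            // so "+80" parses to 80 even though the Javadoc says "decimal digits".
            port = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```

<p>A caller who reads only the signature learns that a String goes in and an int comes out. The failure modes live in the documentation, and the leading plus sign lives nowhere but the implementation.</p>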
<p>The better you understand the contracts of the code you're using, the better
you can fit it into your mental model. You will properly handle its inputs and
outputs and error conditions if you understand its code contract. You will have
fewer bugs and your software will be more robust.</p>
<p>Software without a well-specified code contract is complicated. Nobody wants to
use complicated software where a simpler substitute would do.</p>
<h1 id="code-contracts">Code Contracts</h1>
<p>So, what's a code contract?</p>
<p>It's an interface to your code. It specifies what your code expects as input,
what it will output in what circumstances, and how and why it will fail. Let's
look at an example.</p>
<p><a href="https://github.com/elasticsearch/elasticsearch">Elasticsearch</a> is a distributed searchable database. You
put your data in, and then you can search it in various ways. Let's see how
you would use the Java client to search the database.</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="n">SearchResponse</span> <span class="nf">search</span><span class="p">(</span><span class="n">Client</span> <span class="n">esClient</span><span class="p">,</span> <span class="n">String</span> <span class="n">query</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">final</span> <span class="n">ListenableActionFuture</span><span class="o"><</span><span class="n">SearchResponse</span><span class="o">></span> <span class="n">future</span> <span class="o">=</span>
<span class="n">esClient</span><span class="p">.</span><span class="na">prepareSearch</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="na">indexName</span><span class="p">)</span>
<span class="p">.</span><span class="na">setQuery</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
<span class="p">.</span><span class="na">execute</span><span class="p">();</span>
<span class="k">return</span> <span class="n">future</span><span class="p">.</span><span class="na">actionGet</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>This takes an Elasticsearch client <em>esClient</em>, and a JSON query. It searches
the index name (much like a database name in a traditional database) that was
configured. The builder returns a future, but we just call <em>actionGet</em> on it because
we want our method to return the SearchResponse.</p>
<p>How does this fail? What if the <em>esClient</em> can't talk to the Elasticsearch
cluster? What if the query takes hours? What if the JSON is malformed? Our
search method is nearly undocumented and difficult to use properly. It doesn't
even specify that query is supposed to be JSON.</p>
<p>Let's make it better.</p>
<div class="codehilite"><pre><span></span><span class="cm">/**</span>
<span class="cm"> * Searches the Elasticsearch cluster pointed at by 'esClient'. The JSON</span>
<span class="cm"> * query specified by 'query' is run and the SearchResponse is returned. If</span>
<span class="cm"> * the query takes longer than the configured timeout, it is aborted.</span>
<span class="cm"> */</span>
<span class="kd">public</span> <span class="n">SearchResponse</span> <span class="nf">search</span><span class="p">(</span><span class="n">Client</span> <span class="n">esClient</span><span class="p">,</span> <span class="n">String</span> <span class="n">query</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">final</span> <span class="n">ListenableActionFuture</span><span class="o"><</span><span class="n">SearchResponse</span><span class="o">></span> <span class="n">future</span> <span class="o">=</span>
<span class="n">esClient</span><span class="p">.</span><span class="na">prepareSearch</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="na">indexName</span><span class="p">)</span>
<span class="p">.</span><span class="na">setQuery</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
<span class="p">.</span><span class="na">setTimeout</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="na">timeout</span><span class="p">)</span> <span class="c1">// ten seconds by default</span>
<span class="p">.</span><span class="na">execute</span><span class="p">();</span>
<span class="k">return</span> <span class="n">future</span><span class="p">.</span><span class="na">actionGet</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="na">timeout</span> <span class="o">+</span> <span class="mi">100</span><span class="p">);</span> <span class="c1">// a little extra time</span>
<span class="p">}</span>
</pre></div>
<p>This is a little better, but we need to know what Elasticsearch will do when it
gets a bad query or times out. We need to consult the <a href="http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/search.html">Elasticsearch
documentation</a>!</p>
<p>Well, it looks like the docs don't mention it. That's frustrating. From a quick
glance at the code, it looks like an ElasticsearchException is thrown, but
that's probably thrown for every problem, and not just bad queries. I guess
we'll need to go read the code.</p>
<p>The <em>execute</em> method is in a class called <a href="https://github.com/elasticsearch/elasticsearch/blob/adb5c198491fc3dce97778ed935a0c2b1efc12ea/src/main/java/org/elasticsearch/action/ActionRequestBuilder.java">ActionRequestBuilder</a>.
This doesn't mention anything about timeouts at all. But this is an abstract
class so we probably should go to <a href="https://github.com/elasticsearch/elasticsearch/blob/e79b7086de26ece61edaca74fcf7dc99a11de486/src/main/java/org/elasticsearch/action/search/SearchRequestBuilder.java#L1093">SearchRequestBuilder</a> since we're
making a search query. This points us to <a href="https://github.com/elasticsearch/elasticsearch/blob/e79b7086de26ece61edaca74fcf7dc99a11de486/src/main/java/org/elasticsearch/client/Requests.java#L171">Requests.searchRequest</a>
which has basically no information.</p>
<p>This is an example of a poorly specified code contract. Because it is difficult
to find out how it fails (other than that it throws some generic exception), you need
to run the code to find out. Worse, since that's an implied code contract
rather than an explicit one, it could change at any time.</p>
<p>The final solution to searching Elasticsearch properly with a good code
contract is <a href="https://gist.github.com/cxxr/5dc1977acd9e10d174bc">in this gist</a>. It has good documentation, obvious failure
conditions, and specifies JSON via a type instead of a string. This code sample
is obviously a toy problem and could be improved, but it's better than
Elasticsearch's code contract by a long shot. It lists RuntimeExceptions in
its throws declaration even though that isn't required, because doing so is clearer.</p>
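<p>To give a feel for that shape without reproducing the gist, here is a self-contained sketch that uses no Elasticsearch classes. Every name in it (<em>JsonQuery</em>, <em>Searcher</em>, the two exceptions) is hypothetical, and the JSON check is a toy stand-in for a real parser.</p>

```java
import java.time.Duration;
import java.util.List;
import java.util.Objects;

/** A query that has already been checked, so callers cannot pass arbitrary strings. */
final class JsonQuery {
    private final String json;

    private JsonQuery(String json) {
        this.json = json;
    }

    /**
     * @throws NullPointerException if text is null
     * @throws IllegalArgumentException if text does not look like a JSON object
     */
    static JsonQuery parse(String text) {
        Objects.requireNonNull(text, "text");
        String trimmed = text.trim();
        // Toy check only: a real implementation would run a JSON parser here.
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("query must be a JSON object");
        }
        return new JsonQuery(trimmed);
    }

    String asString() {
        return json;
    }
}

/** Thrown when no node of the cluster can be reached. */
final class ClusterUnavailableException extends RuntimeException {
    ClusterUnavailableException(String message) {
        super(message);
    }
}

/** Thrown when the query does not finish within the timeout. */
final class SearchTimeoutException extends RuntimeException {
    SearchTimeoutException(String message) {
        super(message);
    }
}

interface Searcher {
    /**
     * Runs the query and returns the ids of the matching documents.
     *
     * @param timeout upper bound on how long the call may block; must be positive
     * @throws ClusterUnavailableException if no node can be reached
     * @throws SearchTimeoutException if the query takes longer than timeout
     */
    List<String> search(JsonQuery query, Duration timeout)
            throws ClusterUnavailableException, SearchTimeoutException;
}
```

<p>Both exceptions are unchecked, so the compiler ignores the throws clause. It is there for the reader: the ways this call can fail are visible in the signature instead of being buried in the implementation.</p>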
<h2 id="magic">Magic</h2>
<p>With abstractions you can reduce the "accidental" complexity of a software
system. The urge to simplify everything – even essential complexity
– is dangerous. Abstractions that seek to simplify but actually
complicate the system are called <a href="https://en.wikipedia.org/wiki/Magic_%28programming%29">magic</a>.</p>
<blockquote>
The complexity of software is an essential property, not an accidental one.
Hence, descriptions of a software entity that abstract away its complexity
often abstract away its essence.
<cite>Frederick P. Brooks, Jr.</cite>
</blockquote>
<p>I define magic as software which does something impressive, but has a weak
code contract. Maybe it's poorly specified. Or maybe the code contract is just
complicated or violates some commonly held practice or standard. Maybe it
tries to "figure out the right thing to do" with whatever arguments it
gets. Once you dive into the details of how it works, the curtain is pulled
back and the magic is revealed. </p>
<p>Magical abstractions are not useful because they add complexity rather than
removing it. Avoid creating magical software and libraries by specifying code
contracts, and avoid using them if you can.</p>
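<p>A tiny, made-up illustration of the difference: the first method below tries to "figure out the right thing to do" with a bare number, while the second makes the caller say what they mean. Both methods are hypothetical and exist only to show the contrast.</p>

```java
import java.util.concurrent.TimeUnit;

final class Timeouts {
    /** "Magical": guesses the unit from the magnitude of the number. */
    static long magicToMillis(long value) {
        // Hidden rule: values under 1000 are treated as seconds,
        // anything larger is assumed to already be in milliseconds.
        return value < 1000 ? value * 1000 : value;
    }

    /** Explicit: the caller states the unit, so there is nothing to guess. */
    static long toMillis(long value, TimeUnit unit) {
        return unit.toMillis(value);
    }
}
```

<p>The magical version looks convenient until someone passes 1000 expecting seconds and gets a one-second timeout, shorter than the 999000 milliseconds they would have gotten for 999. The explicit version has a contract you can state in one sentence.</p>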
<h1 id="write-contracts">Write Contracts</h1>
<p>The static versus dynamic typing argument in programming languages is really
about code contracts. Statically typed programming languages allow you to
specify much of your contract in code. Dynamically typed programming languages
require you to specify your contract in documentation. Middle-of-the-road
languages like C++, Java, and C# require both: the worst of both worlds. Using
a <a href="http://ejenk.com/blog/why-dependently-typed-programming-will-one-day-rock-your-world.html">dependently typed</a> programming language, you might be able to put
your entire code contract in the code and have no documentation at all.</p>
<p>The only way to write complex software systems today is to produce strict code
contracts and to understand other software's code contracts. The more difficult
your code contract is to understand, the more difficult your software will be
to use.</p>
<p>The goal is simplicity. The way to achieve simplicity is by a well-specified,
easy to digest contract. Prefer libraries that have little to no magic that are
easy to understand, for both you and your colleagues.</p>
<p>Code contracts aren't a silver bullet, but they're a solid step towards one.</p>Write in the Margins2014-07-19T11:47:00-07:002014-07-19T11:47:00-07:00Sean Cassidytag:www.seancassidy.me,2014-07-19:/write-in-the-margins.html<blockquote>
<p>We have all seized the white perimeter as our own<br>
and reached for a pen if only to show<br>
we did not just laze in an armchair turning pages;<br>
we pressed a thought into the wayside,<br>
planted an impression along the verge.</p>
<p><a href="http://www.billy-collins.com/2005/06/marginalia.html">Billy Collins - Marginalia</a></p>
</blockquote>
<p>In my senior year of …</p><blockquote>
<p>We have all seized the white perimeter as our own<br>
and reached for a pen if only to show<br>
we did not just laze in an armchair turning pages;<br>
we pressed a thought into the wayside,<br>
planted an impression along the verge.</p>
<p><a href="http://www.billy-collins.com/2005/06/marginalia.html">Billy Collins - Marginalia</a></p>
</blockquote>
<p>In my senior year of high school, my English teacher finally convinced us to write in our books. Most of us were avid readers and kept our books safe from graffiti and egg salad stains. But we were wrong not to take notes in them. We were missing out on the most important aspect of reading: how a book changes you.</p>
<p>Novels are not stodgy monologues. Non-fiction is meant to widen your mind. Poetry should touch you at a deeper level. They all exist in context, where the author's thoughts mix with your own to create something new - a thought, a memory, a realization. When you read a book, you bring your cumulative life experience together with the author’s words. Writing in the margins captures this forever. Art doesn’t exist in a vacuum, so why pretend like you can’t be a part of it? </p>
<p>While some famous marginalia is <a href="http://marktwainhouse.blogspot.com/2010/01/mark-twains-marginalia.html">funny</a> or <a href="http://www.hrc.utexas.edu/press/releases/2010/dfw/books/">witty</a> or <a href="https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem">groundbreaking</a>, I don’t expect mine to be read by anyone but me. They’re reminders of how I felt or viewed a passage or phrase. Looking through an old book with notes helps me understand who I used to be, and who I am now. My notes in my copy of <a href="http://www.amazon.com/gp/product/0451524934/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0451524934&linkCode=as2&tag=reamorpap-20&linkId=CCMM6CM5JBCNM7R7">1984</a>, written when I was a teenager, sometimes make me cringe, but other times there are glimpses of opinions I still hold, thoughts I still believe. The books I read are a part of me, and the notes in the margins help me to not forget.</p>
<p>Sometimes the notes are just underlines or brackets to mark something for later. If a page is particularly important, I'll stick a <a href="http://www.amazon.com/gp/product/B000MK4RAM/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B000MK4RAM&linkCode=as2&tag=reamorpap-20&linkId=Q5WXAXM3RV7P6GJS">book tab</a> on it so I can get to it later. Other times, it’s a short comment or observation. If I have a long thought, I’ll put it in <a href="https://www.evernote.com/referral/Registration.action?uid=17811933&sig=9d546463f186d03da9bcbc454f6c975e">Evernote</a>. Each book gets its own note in Evernote where I can take detailed notes and jot down reminders to read other books. I used to use a <a href="http://thoughtcatalog.com/ryan-holiday/2013/08/how-and-why-to-keep-a-commonplace-book/">commonplace book</a>, but I prefer keeping the notes digitally so I can search them. For academic papers I do the same, but attach the paper to the note so I can search that too. A collaborative web site where people could share notes and highlights for academic papers is something that I hope someone will build soon.</p>
<p>When I tell people that I write in my books, they are horrified that I would tarnish them. If you collect rare books, then don’t write in them. But I buy the pulpiest copies of books I can find: the more used and worn the copy, the better. I write in them, and occasionally they’re already written in, which is a treat. I hope that ebooks will help people get past their philistine preconceptions of what is proper to do in a book. The public highlighting feature for ebooks is a great way to feel like you’re part of a community. Television and movies have plenty of chatter about them on the Internet, but books are often read by too few people to connect. Public highlighting is the marginalia of today.</p>
<p>Write in the margins. Be a part of what you read. Remember who you were so you can know who you are.</p>Meditations2014-05-21T15:50:00-07:002014-05-21T15:50:00-07:00Sean Cassidytag:www.seancassidy.me,2014-05-21:/meditations.html<p>Last year I started reading <a href="http://www.amazon.com/gp/product/B000FC1JAI/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B000FC1JAI&linkCode=as2&tag=reamorpap-20&linkId=BSAQSOA3BZ32QS25">Meditations</a> by Marcus Aurelius<sup id="fnref:meditations"><a class="footnote-ref" href="#fn:meditations">1</a></sup>. While I was reading it, I was struck at how many of the entries were just simple reminders to himself. Don't get mad at people unnecessarily. Remember that you are just one of many. Don't get distracted.</p>
<p>He was making …</p><p>Last year I started reading <a href="http://www.amazon.com/gp/product/B000FC1JAI/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B000FC1JAI&linkCode=as2&tag=reamorpap-20&linkId=BSAQSOA3BZ32QS25">Meditations</a> by Marcus Aurelius<sup id="fnref:meditations"><a class="footnote-ref" href="#fn:meditations">1</a></sup>. While I was reading it, I was struck by how many of the entries were just simple reminders to himself. Don't get mad at people unnecessarily. Remember that you are just one of many. Don't get distracted.</p>
<p>He was making the same mistakes over and over, just like I was.</p>
<p>I thought that writing down what I learned would help it stick. Every time I wanted to add a new tidbit, I would review all of them. It worked much better than I thought it would: it brought issues to the forefront of my mind that I wasn't considering. This was especially helpful when I was in a rush or under stress. Staying calm during a production outage and thinking of the lesson I learned, "First do no harm," kept me from making dumb mistakes.</p>
<p>Now that I'm leaving my job for another opportunity, I decided to collect them and share them so that they might be useful to a wider audience. They follow verbatim. Quotes are from Meditations.</p>
<h1 id="engineering">Engineering</h1>
<h3 id="failover-that-you-havent-tested-isnt-a-failover">Failover that you haven’t tested isn’t a failover</h3>
<p><a href="http://en.wikipedia.org/wiki/Failover">Failover</a> needs testing and monitoring. Backups need testing and monitoring. Include these in your work estimates.</p>
<h3 id="make-it-hard-to-use-wrong">Make it hard to use wrong</h3>
<p>Libraries and APIs should be easy to use correctly and difficult to use incorrectly.</p>
<h3 id="documentation-driven-design">Documentation-driven design</h3>
<p>Write the documentation first. As you develop, change the documentation before changing the code. Focus on your user: they are the only reason you're doing this.</p>
<h3 id="fail-fast">Fail fast</h3>
<p>For code that’s going to fail, make it fail earlier rather than later. Failing at object construction time is better than while it’s being used. Failing at compile time is better still.</p>
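<p>A minimal sketch of failing at construction time. The <em>RetryPolicy</em> class and its fields are hypothetical and exist only for illustration; the point is that a bad argument is rejected by the constructor rather than surfacing later when the object is used.</p>

```java
import java.util.Objects;

// Hypothetical example class: all validation happens in the constructor,
// so an invalid RetryPolicy can never exist.
final class RetryPolicy {
    final String name;
    final int maxAttempts;

    RetryPolicy(String name, int maxAttempts) {
        // Throws NullPointerException immediately if name is null.
        this.name = Objects.requireNonNull(name, "name must not be null");
        if (maxAttempts < 1) {
            // Reject nonsensical values here instead of looping zero times later.
            throw new IllegalArgumentException(
                "maxAttempts must be >= 1, got " + maxAttempts);
        }
        this.maxAttempts = maxAttempts;
    }
}
```

<p>Because both fields are final and checked up front, any code holding a <em>RetryPolicy</em> can rely on it being valid without re-checking.</p>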
<h3 id="make-it-monitorable">Make it monitorable</h3>
<p>If you can’t monitor it, how do you know if it even exists? Take the time to get monitoring right.</p>
<h3 id="monitor-it-from-a-customers-perspective">Monitor it from a customer’s perspective</h3>
<p>Write a program which continually uses your APIs or user interfaces. Make it send emails to everyone when it doesn’t work as expected. This can catch subtle errors and monitoring gaps.</p>
<h3 id="test-your-hypotheses">Test your hypotheses</h3>
<p>If you think you improved performance, you need to have enough monitoring and diagnostics in place to show that you did. Otherwise you didn’t improve anything.</p>
<h3 id="make-it-easy-to-operate">Make it easy to operate</h3>
<p>Making products easy to operate and fix in production is a necessary and oft-unspoken feature of every product. There should be tools which can fix issues, and these tools should be tested like every other product.</p>
<h3 id="think-about-what-to-log">Think about what to log</h3>
<p>Logging exactly what you need to debug without logging too much is tough. Take some time to get it at least close to correct.</p>
<h3 id="make-it-easy-to-deploy">Make it easy to deploy</h3>
<p>The goal should be continuous deployment. Everything that gets in the way should be automated.</p>
<h3 id="dont-hide-logic">Don’t hide logic</h3>
<p>Keep blocks of code that are related near one another. Moving them to different places makes the logic hard to follow.</p>
<h3 id="beware-expiration-times">Beware expiration times</h3>
<p>Expiration times are the devil. If you see one, it must go on some calendar somewhere. Think before you set an expiration time to five years. What if you set it to a month from now? There are only two valid values for expiration times: finite and on a calendar, or eternity.</p>
<h3 id="negative-test-everything-access-control-related">Negative test everything access-control related</h3>
<p>If you have a signature authentication scheme, you must have tests that verify both that it accepts valid signatures and that it rejects bad ones. You really should have both unit tests and integration tests for this. The same goes for any access controls.</p>
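<p>As a rough sketch of what that looks like, assume a hypothetical <em>RequestSigner</em> built on the JDK's HMAC-SHA256 support. The class, the message format, and the check method are illustrative assumptions, not a recommended scheme. Note that the negative cases outnumber the positive one.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signer used only to illustrate positive and negative tests.
final class RequestSigner {
    private final byte[] key;

    RequestSigner(byte[] key) {
        this.key = key.clone();
    }

    byte[] sign(String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    boolean verify(String message, byte[] signature) {
        // MessageDigest.isEqual is used instead of Arrays.equals for the comparison.
        return MessageDigest.isEqual(sign(message), signature);
    }
}

final class RequestSignerChecks {
    static void run() {
        RequestSigner signer = new RequestSigner("secret-key".getBytes(StandardCharsets.UTF_8));
        String message = "GET /accounts/42";
        byte[] good = signer.sign(message);

        // Positive test: a valid signature is accepted.
        check(signer.verify(message, good), "valid signature rejected");

        // Negative tests: each of these must be rejected.
        check(!signer.verify("GET /accounts/43", good), "tampered message accepted");
        byte[] flipped = good.clone();
        flipped[0] ^= 0x01; // flip a single bit
        check(!signer.verify(message, flipped), "corrupted signature accepted");
        check(!signer.verify(message, new byte[0]), "empty signature accepted");
        RequestSigner other = new RequestSigner("other-key".getBytes(StandardCharsets.UTF_8));
        check(!other.verify(message, good), "signature from wrong key accepted");
    }

    private static void check(boolean condition, String failure) {
        if (!condition) {
            throw new AssertionError(failure);
        }
    }
}
```

<p>In a real project these checks would live in your test framework of choice; the plain <em>run</em> method just keeps the sketch self-contained.</p>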
<h1 id="operations">Operations</h1>
<h3 id="first-do-no-harm">First do no harm</h3>
<p>When there is an outage or a problem in production, before you take any steps to correct it, remember not to make the problem worse. Often decisions are made in a panic without fully considering the implications.</p>
<h3 id="analyze-theories-before-acting-on-them">Analyze theories before acting on them</h3>
<p>It looks like machines in data center A are corrupting files and machines in data center B are working fine. Solution if you stop your analysis there: stop using data center A. Actual problem: network latency triggering a bug. Analyze your theory more and you’ll find the real issue.</p>
<h3 id="logs-need-to-be-searchable">Logs need to be searchable</h3>
<p>Put all the logs in some centralized, searchable location. This is critical for debugging production issues fast.</p>
<h3 id="dont-blame-others">Don't blame others</h3>
<blockquote>
<p>A good doctor isn’t surprised when his patients have fevers, or a helmsman when the wind blows against him.</p>
</blockquote>
<p>Bugs happen. It's important not to focus on who made the mistake, but why our process failed. Resist people who wish to blame anyone.</p>
<h1 id="communication">Communication</h1>
<h3 id="no-one-reads-long-emails">No one reads long emails</h3>
<p>So make ‘em concise.</p>
<h3 id="distance-makes-communication-hard">Distance makes communication hard</h3>
<p>Most problems we had stemmed from some lack of teamwork or communication. Phone calls and frequent messages or emails help. Regular meetings, not so much.</p>
<h3 id="questions-not-statements">Questions not statements</h3>
<blockquote>
<p>If they’ve made a mistake, correct them gently and show them where they went
wrong. If you can’t do that, then the blame lies with you.</p>
</blockquote>
<p>When criticizing another’s work, ask questions rather than make statements. This helps people think through their own work and assess it objectively.</p>
<h3 id="the-principle-of-charity">The Principle of Charity</h3>
<p>Assume people’s work is well intentioned from the start. No one sets out to do poor work or make (what seems like in hindsight) bad decisions.</p>
<h3 id="turn-criticism-into-better-products">Turn criticism into better products</h3>
<blockquote>
<p>Beautiful things of any kind are beautiful in themselves and sufficient to themselves. Praise is extraneous.</p>
</blockquote>
<p>When someone criticizes a product or feature you wrote, see if that criticism can be turned into a bug that can be fixed or an improvement to be added later. Challenge people who criticize often to always include a way to address their criticism. Don't be offended.</p>
<h3 id="keep-overhead-positive">Keep overhead positive</h3>
<p>For every hour of overhead (meetings, process, travelling), there should be at least that many hours of saved time. Travelling for two days is worth it if it saves 40 hours' worth of work from being needlessly done.</p>
<h3 id="convincing-others-is-delicate-work">Convincing others is delicate work</h3>
<p>Start with the most important point only. Once they begin to see that the current situation might not be perfect, introduce more ideas slowly. Act unemotionally. Allow them to think through the idea on their own. Watch 12 Angry Men again.</p>
<h3 id="vertical-visibility-is-a-double-edged-sword">Vertical visibility is a double-edged sword</h3>
<p>It’s great when higher-ups notice your great work. It’s bad when they’re informed of every little bug and freak out about it.</p>
<h3 id="take-unsolicited-advice-from-experience-with-a-grain-of-salt">Take unsolicited advice from experience with a grain of salt</h3>
<p>Some people will try and give you unsolicited advice. If their main argument is they’ve been doing this for N years which is X years more than you, you should be skeptical.</p>
<h3 id="have-infodump-sessions">Have infodump sessions</h3>
<p>Communicating is hard. If you want to learn about a system, put it in the infodump spreadsheet and pick a person who knows about it. If enough people vote for it (small, 2 or 3 required), it will be given. <a href="https://www.seancassidy.me/host-an-infodump-session.html">No slides. Hands-on.</a></p>
<h3 id="dont-feign-surprise">Don’t feign surprise</h3>
<blockquote>
<p>Leave other people’s mistakes where they lie.</p>
</blockquote>
<p>You know when you act surprised that someone did something stupid, or didn't know something? <a href="http://xkcd.com/1053/">People learn things every day</a>. Acting surprised is rude. <a href="http://brooklynoptimist.com/2014/04/10/hacker-school-banning-feigned-surprise-is-absolutely-brilliant/">So don’t do it</a>.</p>
<h3 id="watch-where-your-company-is-going">Watch where your company is going</h3>
<p>Where are you headed? Where are they headed? If you're not aligned, it's time to look elsewhere. Ask yourself this every few months.</p>
<h1 id="innovation">Innovation</h1>
<h3 id="the-red-queen-hypothesis">The Red Queen Hypothesis</h3>
<p>It’s not enough to merely improve. Everyone is improving constantly, so linearly improving your own offerings merely keeps you in the same place. You must go above and beyond current technology to make any real progress.</p>
<h3 id="take-time-to-think">Take time to think</h3>
<p>If every moment of every day is spent adding features, removing bugs, or coding, it’s hard to innovate. <a href="https://en.wikipedia.org/wiki/Cognitive_Surplus">Make sure there’s time</a> in the development cycle for lower intensity work to allow reflection and creative thinking.</p>
<h3 id="solve-problems-dont-just-create-features">Solve problems, don’t just create features</h3>
<p>The goal of software engineering is to <a href="https://www.youtube.com/watch?v=f84n5oFoZBc">solve people’s problems</a>. Adding features for completeness wastes time. Beware product feature checkboxes and customers who don’t want to buy because you don’t have feature X even though they don’t need it.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:meditations">
<p>If you're going to read it too (which I strongly suggest you do, as it's a great book), I highly recommend the <a href="http://www.amazon.com/gp/product/B000FC1JAI/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B000FC1JAI&linkCode=as2&tag=reamorpap-20&linkId=BSAQSOA3BZ32QS25">Gregory Hays translation of Meditations</a>. It's seriously worth the money. <a class="footnote-backref" href="#fnref:meditations" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Better Java2014-05-19T12:23:00-07:002014-05-19T12:23:00-07:00Sean Cassidytag:www.seancassidy.me,2014-05-19:/better-java.html<p>Java is one of the most popular programming languages around, but no one seems
to enjoy using it. Well, Java is actually an alright programming language, and
since Java 8 came out recently, I decided to compile a list of libraries,
practices, and tools to make using Java better. </p>
<p>This …</p><p>Java is one of the most popular programming languages around, but no one seems
to enjoy using it. Well, Java is actually an alright programming language, and
since Java 8 came out recently, I decided to compile a list of libraries,
practices, and tools to make using Java better. </p>
<p>This article is <a href="https://github.com/cxxr/better-java">on Github</a>. Feel free to
contribute and add your own Java tips and best practices.</p>
<div class="toc">
<ul>
<li><a href="#style">Style</a><ul>
<li><a href="#structs">Structs</a><ul>
<li><a href="#the-builder-pattern">The Builder Pattern</a></li>
</ul>
</li>
<li><a href="#dependency-injection">Dependency injection</a></li>
<li><a href="#avoid-nulls">Avoid Nulls</a></li>
<li><a href="#immutable-by-default">Immutable-by-default</a></li>
<li><a href="#avoid-lots-of-util-classes">Avoid lots of Util classes</a></li>
<li><a href="#formatting">Formatting</a><ul>
<li><a href="#javadoc">Javadoc</a></li>
</ul>
</li>
<li><a href="#streams">Streams</a></li>
</ul>
</li>
<li><a href="#deploying">Deploying</a><ul>
<li><a href="#frameworks">Frameworks</a></li>
<li><a href="#maven">Maven</a><ul>
<li><a href="#dependency-convergence">Dependency Convergence</a></li>
</ul>
</li>
<li><a href="#continuous-integration">Continuous Integration</a></li>
<li><a href="#maven-repository">Maven repository</a></li>
<li><a href="#configuration-management">Configuration management</a></li>
</ul>
</li>
<li><a href="#libraries">Libraries</a><ul>
<li><a href="#missing-features">Missing Features</a><ul>
<li><a href="#apache-commons">Apache Commons</a></li>
<li><a href="#guava">Guava</a></li>
<li><a href="#gson">Gson</a></li>
<li><a href="#java-tuples">Java Tuples</a></li>
<li><a href="#joda-time">Joda-Time</a></li>
<li><a href="#lombok">Lombok</a></li>
<li><a href="#play-framework">Play framework</a></li>
<li><a href="#slf4j">SLF4J</a></li>
<li><a href="#jooq">jOOQ</a></li>
</ul>
</li>
<li><a href="#testing">Testing</a><ul>
<li><a href="#junit-4">jUnit 4</a></li>
<li><a href="#jmock">jMock</a></li>
<li><a href="#assertj">AssertJ</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#tools">Tools</a><ul>
<li><a href="#intellij-idea">IntelliJ IDEA</a><ul>
<li><a href="#chronon">Chronon</a></li>
</ul>
</li>
<li><a href="#jrebel">JRebel</a></li>
<li><a href="#the-checker-framework">The Checker Framework</a></li>
<li><a href="#eclipse-memory-analyzer">Eclipse Memory Analyzer</a></li>
</ul>
</li>
<li><a href="#resources">Resources</a><ul>
<li><a href="#books">Books</a></li>
<li><a href="#podcasts">Podcasts</a></li>
</ul>
</li>
</ul>
</div>
<h1 id="style">Style</h1>
<p>Traditionally, Java was programmed in a very verbose enterprise JavaBean style.
The new style is much cleaner, more correct, and easier on the eyes.</p>
<h2 id="structs">Structs</h2>
<p>One of the simplest things we as programmers do is pass around data. The
traditional way to do this is to define a JavaBean:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">DataHolder</span> <span class="p">{</span>
<span class="kd">private</span> <span class="n">String</span> <span class="n">data</span><span class="p">;</span>
<span class="kd">public</span> <span class="nf">DataHolder</span><span class="p">()</span> <span class="p">{</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setData</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="na">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="n">String</span> <span class="nf">getData</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">this</span><span class="p">.</span><span class="na">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This is verbose and wasteful. Even if your IDE automatically generated this
code, it's a waste. So, <a href="http://www.javapractices.com/topic/TopicAction.do?Id=84">don't do this</a>.</p>
<p>Instead, I prefer the C struct style of writing classes that merely hold data:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">DataHolder</span> <span class="p">{</span>
<span class="kd">public</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">data</span><span class="p">;</span>
<span class="kd">public</span> <span class="nf">DataHolder</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="na">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This cuts the number of lines of code in half. Further, this class
is immutable unless you extend it, so we can reason about it more easily as we
know that it can't be changed.</p>
<p>If you're storing objects like Map or List that can be modified easily, you
should instead use ImmutableMap or ImmutableList, which are discussed in the
section about immutability.</p>
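<p>If you don't have Guava on the classpath, a rough standard-library stand-in is a defensive copy wrapped in an unmodifiable view. This is a sketch under that assumption, not a replacement for ImmutableList, and the <em>TagHolder</em> class is hypothetical:</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical struct-style class that holds a list.
final class TagHolder {
    public final List<String> tags;

    TagHolder(List<String> tags) {
        // Copy first so later changes to the caller's list are not visible here,
        // then wrap so nobody can mutate the copy through this field.
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }
}
```

<p>Without the copy, the caller could still change the contents by modifying the list it passed in; without the wrapper, anyone reading the public field could.</p>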
<h3 id="the-builder-pattern">The Builder Pattern</h3>
<p>If you have a rather complicated object that you want to build a struct for,
consider the Builder pattern.</p>
<p>You make a static nested class in your object which will construct your
object. It uses mutable state, but as soon as you call build, it will emit an
immutable object.</p>
<p>Imagine we had a more complicated <em>DataHolder</em>. The builder for it might look
like:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">ComplicatedDataHolder</span> <span class="p">{</span>
<span class="kd">public</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">data</span><span class="p">;</span>
<span class="kd">public</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">num</span><span class="p">;</span>
<span class="c1">// lots more fields and a constructor</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">Builder</span> <span class="p">{</span>
<span class="kd">private</span> <span class="n">String</span> <span class="n">data</span><span class="p">;</span>
<span class="kd">private</span> <span class="kt">int</span> <span class="n">num</span><span class="p">;</span>
<span class="kd">public</span> <span class="n">Builder</span> <span class="nf">data</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="na">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="n">Builder</span> <span class="nf">num</span><span class="p">(</span><span class="kt">int</span> <span class="n">num</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="na">num</span> <span class="o">=</span> <span class="n">num</span><span class="p">;</span>
<span class="k">return</span> <span class="k">this</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="n">ComplicatedDataHolder</span> <span class="nf">build</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">ComplicatedDataHolder</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num</span><span class="p">);</span> <span class="c1">// etc</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Then to use it:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">ComplicatedDataHolder</span> <span class="n">cdh</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ComplicatedDataHolder</span><span class="p">.</span><span class="na">Builder</span><span class="p">()</span>
<span class="p">.</span><span class="na">data</span><span class="p">(</span><span class="s">"set this"</span><span class="p">)</span>
<span class="p">.</span><span class="na">num</span><span class="p">(</span><span class="mi">523</span><span class="p">)</span>
<span class="p">.</span><span class="na">build</span><span class="p">();</span>
</pre></div>
<p>There are <a href="http://jlordiales.wordpress.com/2012/12/13/the-builder-pattern-in-practice/">better examples of Builders elsewhere</a> but this should
give you a taste for what it's like. This ends up with a lot of the boilerplate
we were trying to avoid, but it gets you immutable objects and a very fluent
interface.</p>
<h2 id="dependency-injection">Dependency injection</h2>
<p>This is more of a software engineering section than a Java section, but one of
the best ways to write testable software is to use <a href="http://en.wikipedia.org/wiki/Dependency_injection">dependency injection</a>
(DI). Because Java strongly encourages OO design, to make testable software,
you need to use DI.</p>
<p>In Java, this is typically done with the <a href="http://projects.spring.io/spring-framework/">Spring Framework</a>. It offers
either code-based wiring or XML configuration-based wiring. If you use the XML
configuration, it's important that you <a href="http://programmers.stackexchange.com/questions/92393/what-does-the-spring-framework-do-should-i-use-it-why-or-why-not">don't overuse Spring</a> because
of its XML-based configuration format. There should be absolutely no logic or
control structures in XML. It should only inject dependencies.</p>
<p>Good alternatives to using Spring are Google and Square's <a href="http://square.github.io/dagger/">Dagger</a>
library or Google's <a href="https://code.google.com/p/google-guice/">Guice</a>. They don't use Spring's XML
configuration file format, and instead they put the injection logic in
annotations and in code.</p>
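<p>Whichever framework you choose, the underlying idea is plain constructor injection, which needs no framework at all. Here is a minimal sketch with hypothetical names (<em>TimeSource</em>, <em>SessionChecker</em>): the class receives its dependency instead of creating it, so a test can pass in a fixed value.</p>

```java
// Hypothetical dependency: something that can tell the current time.
interface TimeSource {
    long nowMillis();
}

// Hypothetical class under test. It never calls System.currentTimeMillis()
// directly; the time source is injected through the constructor.
final class SessionChecker {
    private final TimeSource timeSource;
    private final long ttlMillis;

    SessionChecker(TimeSource timeSource, long ttlMillis) {
        this.timeSource = timeSource;
        this.ttlMillis = ttlMillis;
    }

    boolean isExpired(long createdAtMillis) {
        return timeSource.nowMillis() - createdAtMillis >= ttlMillis;
    }
}
```

<p>Production code would wire it up with <em>new SessionChecker(System::currentTimeMillis, ttl)</em>, while a test can supply a lambda that returns a constant. A DI framework automates this wiring; it does not change the design.</p>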
<h2 id="avoid-nulls">Avoid Nulls</h2>
<p>Try to avoid using nulls when you can. Do not return null collections when you
should have instead returned an empty collection. If you're going to use null,
consider the <a href="http://code.google.com/p/google-guice/wiki/UseNullable">@Nullable</a> annotation. <a href="http://www.jetbrains.com/idea/">IntelliJ IDEA</a> has
built-in support for the @Nullable annotation.</p>
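<p>To illustrate the empty-collection advice, here is a small sketch using a hypothetical <em>OrderIndex</em> class. A lookup for an unknown customer returns an empty list, so callers can iterate without a null check.</p>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup class used only for illustration.
final class OrderIndex {
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();

    void add(String customer, List<String> orders) {
        ordersByCustomer.put(customer, orders);
    }

    // Never returns null: unknown customers get an empty list.
    List<String> ordersFor(String customer) {
        return ordersByCustomer.getOrDefault(customer, Collections.emptyList());
    }
}
```

<p>A caller can then write <em>for (String order : index.ordersFor(name))</em> and have the loop simply not execute when there is nothing to process.</p>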
<p>If you're using <a href="http://www.java8.org/">Java 8</a>, you can use the excellent new
<a href="http://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html">Optional</a> type. If a value may or may not be present, wrap it in
an <em>Optional</em> class like this:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">FooWidget</span> <span class="p">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">data</span><span class="p">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">Optional</span><span class="o"><</span><span class="n">Bar</span><span class="o">></span> <span class="n">bar</span><span class="p">;</span>
<span class="kd">public</span> <span class="nf">FooWidget</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">Optional</span><span class="p">.</span><span class="na">empty</span><span class="p">());</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="nf">FooWidget</span><span class="p">(</span><span class="n">String</span> <span class="n">data</span><span class="p">,</span> <span class="n">Optional</span><span class="o"><</span><span class="n">Bar</span><span class="o">></span> <span class="n">bar</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="na">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="na">bar</span> <span class="o">=</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="n">Optional</span><span class="o"><</span><span class="n">Bar</span><span class="o">></span> <span class="nf">getBar</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>So now it's clear that <em>data</em> will never be null, but <em>bar</em> may or may not be
present. <em>Optional</em> has methods like <em>isPresent</em>, which may make it feel like
not a lot is different from just checking <em>null</em>. But it allows you to write
statements like:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">Optional</span><span class="o"><</span><span class="n">FooWidget</span><span class="o">></span> <span class="n">fooWidget</span> <span class="o">=</span> <span class="n">maybeGetFooWidget</span><span class="p">();</span>
<span class="kd">final</span> <span class="n">Baz</span> <span class="n">baz</span> <span class="o">=</span> <span class="n">fooWidget</span><span class="p">.</span><span class="na">flatMap</span><span class="p">(</span><span class="n">FooWidget</span><span class="p">::</span><span class="n">getBar</span><span class="p">)</span>
<span class="p">.</span><span class="na">flatMap</span><span class="p">(</span><span class="n">Bar</span><span class="p">::</span><span class="n">getBaz</span><span class="p">)</span>
<span class="p">.</span><span class="na">orElse</span><span class="p">(</span><span class="n">defaultBaz</span><span class="p">);</span>
</pre></div>
<p>This is much better than a chain of nested null checks. The main downside of
Optional is that most of the standard library predates it and still returns
null, so dealing with nulls is still required there.</p>
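<p>To make the contrast concrete, here is a minimal, self-contained sketch of the same kind of chain next to its null-checking equivalent. The nested <em>Foo</em>, <em>Bar</em>, and <em>Baz</em> classes are hypothetical stand-ins for the types above, not part of any library.</p>

```java
import java.util.Optional;

class OptionalChainSketch {
    // Hypothetical stand-in types, for illustration only.
    static final class Baz {
        final String name;
        Baz(String name) { this.name = name; }
    }

    static final class Bar {
        private final Baz baz; // may be null internally, never exposed as null
        Bar(Baz baz) { this.baz = baz; }
        Optional<Baz> getBaz() { return Optional.ofNullable(baz); }
    }

    static final class Foo {
        private final Bar bar; // may be null internally, never exposed as null
        Foo(Bar bar) { this.bar = bar; }
        Optional<Bar> getBar() { return Optional.ofNullable(bar); }
    }

    static final Baz DEFAULT_BAZ = new Baz("default");

    // Optional style: every later step is skipped once a value is absent.
    static Baz resolve(Optional<Foo> foo) {
        return foo.flatMap(Foo::getBar)
                  .flatMap(Bar::getBaz)
                  .orElse(DEFAULT_BAZ);
    }

    // The equivalent with raw nulls: one nested check per step.
    static Baz resolveWithNulls(Foo foo) {
        if (foo != null) {
            Bar bar = foo.bar;
            if (bar != null) {
                Baz baz = bar.baz;
                if (baz != null) {
                    return baz;
                }
            }
        }
        return DEFAULT_BAZ;
    }

    public static void main(String[] args) {
        Foo full = new Foo(new Bar(new Baz("real")));
        Foo partial = new Foo(null);
        System.out.println(resolve(Optional.of(full)).name);    // real
        System.out.println(resolve(Optional.of(partial)).name); // default
        System.out.println(resolve(Optional.empty()).name);     // default
    }
}
```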
<h2 id="immutable-by-default">Immutable-by-default</h2>
<p>Unless you have a good reason to make them otherwise, variables, classes, and
collections should be immutable.</p>
<p>Variable references can be made immutable with <em>final</em>:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">FooWidget</span> <span class="n">fooWidget</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">condition</span><span class="p">())</span> <span class="p">{</span>
<span class="n">fooWidget</span> <span class="o">=</span> <span class="n">getWidget</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">try</span> <span class="p">{</span>
<span class="n">fooWidget</span> <span class="o">=</span> <span class="n">cachedFooWidget</span><span class="p">.</span><span class="na">get</span><span class="p">();</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">CachingException</span> <span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="n">log</span><span class="p">.</span><span class="na">error</span><span class="p">(</span><span class="s">"Couldn't get cached value"</span><span class="p">,</span> <span class="n">e</span><span class="p">);</span>
<span class="k">throw</span> <span class="n">e</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// fooWidget is guaranteed to be set here</span>
</pre></div>
<p>Now you can be sure that fooWidget won't be accidentally reassigned. The <em>final</em>
keyword works with if/else blocks and with try/catch blocks. Of course, if the
<em>fooWidget</em> itself isn't immutable you could easily mutate it.</p>
<p>Collections should, whenever possible, use the Guava <a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/ImmutableMap.html">ImmutableMap</a>,
<a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/ImmutableList.html">ImmutableList</a>, or <a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/ImmutableSet.html">ImmutableSet</a> classes. These
have builders so that you can build them up dynamically and then mark them
immutable by calling the build method.</p>
<p>Classes should be made immutable by declaring fields immutable (via <em>final</em>)
and by using immutable collections. Optionally, you can make the class itself
<em>final</em> so that it can't be extended and made mutable.</p>
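<p>A minimal sketch of such a class follows. The <em>ImmutableOrder</em> name and its fields are hypothetical, and it uses the JDK's <em>Collections.unmodifiableList</em> in place of Guava's ImmutableList only so that the example has no dependencies.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A final class cannot be subclassed and made mutable.
final class ImmutableOrder {
    private final String id;
    private final List<String> items;

    ImmutableOrder(String id, List<String> items) {
        this.id = id;
        // Defensive copy, then wrap: later changes to the caller's list
        // cannot leak in, and callers cannot modify our copy.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    String getId() { return id; }

    List<String> getItems() { return items; }

    // "Mutation" returns a new instance instead of changing this one.
    ImmutableOrder withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new ImmutableOrder(id, copy);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("widget");
        ImmutableOrder order = new ImmutableOrder("o-1", source);
        source.add("sneaky"); // does not affect the order
        ImmutableOrder bigger = order.withItem("gadget");
        System.out.println(order.getItems());  // [widget]
        System.out.println(bigger.getItems()); // [widget, gadget]
    }
}
```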
<h2 id="avoid-lots-of-util-classes">Avoid lots of Util classes</h2>
<p>Be careful if you find yourself adding a lot of methods to a Util class.</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MiscUtil</span> <span class="p">{</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="n">String</span> <span class="nf">frobnicateString</span><span class="p">(</span><span class="n">String</span> <span class="n">base</span><span class="p">,</span> <span class="kt">int</span> <span class="n">times</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ... etc</span>
<span class="p">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">throwIfCondition</span><span class="p">(</span><span class="kt">boolean</span> <span class="n">condition</span><span class="p">,</span> <span class="n">String</span> <span class="n">msg</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ... etc</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>These classes, at first, seem attractive because the methods that go in them
don't really belong in any one place. So you throw them all in here in the
name of code reuse.</p>
<p>The cure is worse than the disease. Put these methods where they belong, or
if you must have common methods like this, consider <a href="http://www.java8.org/">Java 8</a>'s default
methods on interfaces. Then you can group common actions into interfaces.
And, since they're interfaces, a class can implement several of them.</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">Thrower</span> <span class="p">{</span>
<span class="k">default</span> <span class="kt">void</span> <span class="nf">throwIfCondition</span><span class="p">(</span><span class="kt">boolean</span> <span class="n">condition</span><span class="p">,</span> <span class="n">String</span> <span class="n">msg</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
<span class="k">default</span> <span class="kt">void</span> <span class="nf">throwAorB</span><span class="p">(</span><span class="n">Throwable</span> <span class="n">a</span><span class="p">,</span> <span class="n">Throwable</span> <span class="n">b</span><span class="p">,</span> <span class="kt">boolean</span> <span class="n">throwA</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Then every class which needs it can simply implement this interface.</p>
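<p>As a rough sketch of how that looks in practice, the hypothetical <em>Account</em> class below mixes in two interfaces with default methods. <em>Describer</em> and <em>Account</em> are invented for this example, and the body of <em>throwIfCondition</em> is an assumption about what such a helper might do.</p>

```java
class DefaultMethodSketch {
    // Same shape as the Thrower interface above; the body is an assumption.
    interface Thrower {
        default void throwIfCondition(boolean condition, String msg) {
            if (condition) {
                throw new IllegalStateException(msg);
            }
        }
    }

    // A second hypothetical mix-in, to show that several can be combined.
    interface Describer {
        default String describe() {
            return getClass().getSimpleName();
        }
    }

    // The class inherits both default implementations without a Util class.
    static class Account implements Thrower, Describer {
        private int balance;

        void deposit(int amount) {
            throwIfCondition(amount <= 0, describe() + ": deposit must be positive");
            balance += amount;
        }

        void withdraw(int amount) {
            throwIfCondition(amount > balance, describe() + ": insufficient funds");
            balance -= amount;
        }

        int getBalance() { return balance; }
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(10);
        account.withdraw(4);
        System.out.println(account.getBalance()); // 6
    }
}
```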
<h2 id="formatting">Formatting</h2>
<p>Formatting is so much less important than most programmers make it out to be.
Does consistency show that you care about your craft and does it help others
read? Absolutely. But let's not waste a day adding spaces to if blocks so that
it "matches".</p>
<p>If you absolutely need a code formatting guide, I highly recommend
<a href="http://google-styleguide.googlecode.com/svn/trunk/javaguide.html">Google's Java Style</a> guide. The best part of that guide is the
<a href="http://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s6-programming-practices">Programming Practices</a> section. Definitely worth a read.</p>
<h3 id="javadoc">Javadoc</h3>
<p>Documenting your user-facing code is important. And this means
<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/ImmutableMap.Builder.html">using examples</a> and using sensible descriptions of variables,
methods, and classes.</p>
<p>The corollary of this is to not document what doesn't need documenting. If you
don't have anything to say about what an argument is, or if it's obvious,
don't document it. Boilerplate documentation is worse than no documentation at
all, as it tricks your users into thinking that there is documentation.</p>
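<p>Here is one way that advice might look, using a hypothetical <em>Retry.backoffMillis</em> helper. The comment describes behavior, gives an example, and documents only the parameter whose meaning is not obvious from its name.</p>

```java
class Retry {
    /**
     * Computes how long to wait before retry number {@code attempt},
     * doubling the delay each time and capping it at {@code maxMillis}.
     *
     * For example, {@code backoffMillis(3, 100, 5_000)} returns {@code 800},
     * and {@code backoffMillis(10, 100, 5_000)} returns {@code 5000} (capped).
     *
     * @param attempt zero-based retry count; 0 means the first retry
     * @throws IllegalArgumentException if {@code attempt} is negative
     */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Cap the shift so the doubling factor stays at most 2^30.
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(maxMillis, baseMillis * factor);
    }

    public static void main(String[] args) {
        System.out.println(backoffMillis(0, 100, 5_000));  // 100
        System.out.println(backoffMillis(3, 100, 5_000));  // 800
        System.out.println(backoffMillis(10, 100, 5_000)); // 5000
    }
}
```

<p>Note that <em>baseMillis</em> and <em>maxMillis</em> are deliberately left without <em>@param</em> tags: the summary sentence already says everything there is to say about them.</p>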
<h2 id="streams">Streams</h2>
<p><a href="http://www.java8.org/">Java 8</a> has a nice <a href="http://blog.hartveld.com/2013/03/jdk-8-33-stream-api.html">stream</a> and lambda syntax. You could
write code like this:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">filtered</span> <span class="o">=</span> <span class="n">list</span><span class="p">.</span><span class="na">stream</span><span class="p">()</span>
<span class="p">.</span><span class="na">filter</span><span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">s</span><span class="p">.</span><span class="na">startsWith</span><span class="p">(</span><span class="s">"s"</span><span class="p">))</span>
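<span class="p">.</span><span class="na">map</span><span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">s</span><span class="p">.</span><span class="na">toUpperCase</span><span class="p">())</span>
<span class="p">.</span><span class="na">collect</span><span class="p">(</span><span class="n">Collectors</span><span class="p">.</span><span class="na">toList</span><span class="p">());</span>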
</pre></div>
<p>Instead of this:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">filtered</span> <span class="o">=</span> <span class="n">Lists</span><span class="p">.</span><span class="na">newArrayList</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="n">String</span> <span class="n">str</span> <span class="p">:</span> <span class="n">list</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">str</span><span class="p">.</span><span class="na">startsWith</span><span class="p">(</span><span class="s">"s"</span><span class="p">))</span> <span class="p">{</span>
<span class="n">filtered</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">str</span><span class="p">.</span><span class="na">toUpperCase</span><span class="p">());</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This allows you to write more fluent code, which is more readable.</p>
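<p>A runnable sketch comparing the two forms, using only the JDK. The <em>StreamSketch</em> class and its method names are made up for this example. One detail worth knowing: a stream pipeline is lazy, so nothing is filtered or mapped until a terminal operation such as <em>collect</em> runs.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSketch {
    // Stream version: filter, transform, then collect into a List.
    static List<String> withStreams(List<String> list) {
        return list.stream()
                .filter(s -> s.startsWith("s"))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Loop version with the same behavior.
    static List<String> withLoop(List<String> list) {
        final List<String> filtered = new ArrayList<>();
        for (String str : list) {
            if (str.startsWith("s")) {
                filtered.add(str.toUpperCase());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("stream", "lambda", "sort", "map");
        System.out.println(withStreams(input)); // [STREAM, SORT]
        System.out.println(withLoop(input));    // [STREAM, SORT]
    }
}
```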
<h1 id="deploying">Deploying</h1>
<p>Deploying Java properly can be a bit tricky. There are two main ways to deploy
Java nowadays: use a framework or use a home grown solution that is more
flexible.</p>
<h2 id="frameworks">Frameworks</h2>
<p>Because deploying Java isn't easy, frameworks have been made which can help.
Two of the best are <a href="https://dropwizard.github.io/dropwizard/">Dropwizard</a> and <a href="http://projects.spring.io/spring-boot/">Spring Boot</a>.
The <a href="http://www.playframework.com/">Play framework</a> can also be considered one of these deployment
frameworks.</p>
<p>All of them try to lower the barrier to getting your code out the door.
They're especially helpful if you're new to Java or if you need to get things
done fast. Single JAR deployments are just easier than complicated WAR or EAR
deployments.</p>
<p>However, they can be somewhat inflexible and are rather opinionated, so if
your project doesn't fit with the choices the developers of your framework
made, you'll have to migrate to a more hand-rolled configuration.</p>
<h2 id="maven">Maven</h2>
<p><strong>Good alternative</strong>: <a href="http://www.gradle.org/">Gradle</a>.</p>
<p>Maven is still the standard tool for building, packaging, and testing your code. There
are alternatives, like Gradle, but they don't have the same adoption that Maven
has. If you're new to Maven, you should start with
<a href="http://books.sonatype.com/mvnex-book/reference/index.html">Maven by Example</a>.</p>
<p>I like to have a root POM with all of the external dependencies you want to
use. It will look something <a href="https://gist.github.com/cxxr/10787344">like this</a>. This root POM has only one
external dependency, but if your product is big enough, you'll have dozens.
Your root POM should be a project on its own: in version control and released
like any other Java project.</p>
<p>If you think that tagging your root POM for every external dependency change
is too much, you haven't wasted a week tracking down cross project dependency
errors.</p>
<p>All of your Maven projects will include your root POM and all of its version
information. This way, you get your company's selected version of each
external dependency, and all of the correct Maven plugins. If you need to pull
in external dependencies, it works just like this:</p>
<div class="codehilite"><pre><span></span><span class="nt"><dependencies></span>
<span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.third.party<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>some-artifact<span class="nt"></artifactId></span>
<span class="nt"></dependency></span>
<span class="nt"></dependencies></span>
</pre></div>
<p>Internal dependencies should be managed in each individual
project's <strong>&lt;dependencyManagement&gt;</strong> section. Otherwise it would be difficult
to keep the root POM version number sane.</p>
<h3 id="dependency-convergence">Dependency Convergence</h3>
<p>One of the best parts about Java is the massive amount of third party
libraries which do everything. Essentially every API or toolkit has a Java SDK
and it's easy to pull it in with Maven.</p>
<p>And those Java libraries themselves depend on specific versions of other
libraries. If you pull in enough libraries, you'll get version conflicts, that
is, something like this:</p>
<div class="codehilite"><pre><span></span><span class="err">Foo library depends on Bar library v1.0</span>
<span class="err">Widget library depends on Bar library v0.9</span>
</pre></div>
<p>Which version will get pulled into your project?</p>
<p>With the <a href="https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html">Maven Enforcer plugin's dependency convergence rule</a>, the build will
fail if your dependencies don't converge on the same version. Then, you have two
options for solving the conflict:</p>
<ol>
<li>Explicitly pick a version for Bar in your <em>dependencyManagement</em> section</li>
<li>Exclude Bar from either Foo or Widget</li>
</ol>
<p>Which option to choose depends on your situation: if you want to track
one project's version, then excluding makes sense. On the other hand, if you
want to be explicit about it, you can pick a version, although you'll need to
update it when you update the other dependencies.</p>
<h2 id="continuous-integration">Continuous Integration</h2>
<p>Obviously you need some kind of continuous integration server which is going
to continuously build your SNAPSHOT versions and tag builds based on git tags.</p>
<p><a href="http://jenkins-ci.org/">Jenkins</a> and <a href="https://travis-ci.org/">Travis-CI</a> are natural choices.</p>
<p>Code coverage is useful, and <a href="http://cobertura.github.io/cobertura/">Cobertura</a> has
<a href="http://mojo.codehaus.org/cobertura-maven-plugin/usage.html">a good Maven plugin</a> and CI support. There are other code
coverage tools for Java, but I've used Cobertura.</p>
<h2 id="maven-repository">Maven repository</h2>
<p>You need a place to put your JARs, WARs, and EARs that you make, so you'll
need a repository.</p>
<p>Common choices are <a href="http://www.jfrog.com/">Artifactory</a> and <a href="http://www.sonatype.com/nexus">Nexus</a>. Both work,
and have their own <a href="http://stackoverflow.com/questions/364775/should-we-use-nexus-or-artifactory-for-a-maven-repo">pros and cons</a>.</p>
<p>You should have your own Artifactory/Nexus installation and
<a href="http://www.jfrog.com/confluence/display/RTF/Configuring+Artifacts+Resolution">mirror your dependencies</a> onto it. This will stop your
build from breaking because some upstream Maven repository went down.</p>
<h2 id="configuration-management">Configuration management</h2>
<p>So now you've got your code compiled, your repository set up, and you need to
get your code out in your development environment and eventually push it to
production. Don't skimp here, because automating this will pay dividends for a
long time.</p>
<p><a href="http://www.getchef.com/chef/">Chef</a>, <a href="http://puppetlabs.com/">Puppet</a>, and <a href="http://www.ansible.com/home">Ansible</a> are typical choices.
I've written an alternative called <a href="http://www.gosquadron.com">Squadron</a>, which I, of course,
think you should check out because it's easier to get right than the
alternatives.</p>
<p>Regardless of what tool you choose, don't forget to automate your deployments.</p>
<h1 id="libraries">Libraries</h1>
<p>Probably the best feature of Java is the extensive number of libraries
available for it. This is a small collection of libraries that are likely to
be applicable to the largest group of people.</p>
<h2 id="missing-features">Missing Features</h2>
<p>Java's standard library, once an amazing step forward, now looks like it's
missing several key features.</p>
<h3 id="apache-commons">Apache Commons</h3>
<p><a href="http://commons.apache.org/">The Apache Commons project</a> has a bunch of useful libraries.</p>
<p><strong>Commons Codec</strong> has many useful encoding/decoding methods for Base64 and hex
strings. Don't waste your time rewriting those.</p>
<p><strong>Commons Lang</strong> is the go-to library for String manipulation and creation,
character sets, and a bunch of miscellaneous utility methods.</p>
<p><strong>Commons IO</strong> has all the File related methods you could ever want. It has
<a href="http://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/FileUtils.html#copyDirectory(java.io.File,%20java.io.File)">FileUtils.copyDirectory</a>, <a href="http://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/FileUtils.html#writeStringToFile(java.io.File,%20java.lang.String)">FileUtils.writeStringToFile</a>,
<a href="http://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/IOUtils.html#readLines(java.io.InputStream)">IOUtils.readLines</a> and much more.</p>
<h3 id="guava">Guava</h3>
<p><a href="http://code.google.com/p/guava-libraries/">Guava</a> is Google's excellent here's-what-Java-is-missing library. It's
almost hard to distill everything that I like about this library, but I'm
going to try.</p>
<p><strong>Cache</strong> is a simple way to get an in-memory cache that can be used to cache
network access, disk access, memoize functions, or anything really. Just
configure a <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/cache/CacheBuilder.html">CacheBuilder</a>, which tells Guava how to build your
cache, and you're all set!</p>
<p><strong>Immutable</strong> collections. There's a bunch of these: <a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/ImmutableMap.html">ImmutableMap</a>,
<a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/ImmutableList.html">ImmutableList</a>, or even <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/ImmutableSortedMultiset.html">ImmutableSortedMultiset</a>
if that's your style.</p>
<p>I also like writing mutable collections the Guava way:</p>
<div class="codehilite"><pre><span></span><span class="c1">// Instead of</span>
<span class="kd">final</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Widget</span><span class="o">></span> <span class="n">map</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Widget</span><span class="o">></span><span class="p">();</span>
<span class="c1">// You can use</span>
<span class="kd">final</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Widget</span><span class="o">></span> <span class="n">map</span> <span class="o">=</span> <span class="n">Maps</span><span class="p">.</span><span class="na">newHashMap</span><span class="p">();</span>
</pre></div>
<p>There are static utility classes for <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Lists.html">Lists</a>, <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Maps.html">Maps</a>, <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Sets.html">Sets</a> and
more. They're cleaner and easier to read.</p>
<p>If you're stuck with Java 6 or 7, you can use the <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Collections2.html">Collections2</a>
class, which has methods like filter and transform. They allow you to write
fluent code without <a href="http://www.java8.org/">Java 8</a>'s stream support.</p>
<p>Guava has simple things too, like a <strong>Joiner</strong> that joins strings on
separators and a <a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/util/concurrent/Uninterruptibles.html">class to handle interrupts</a> by ignoring them.</p>
<h3 id="gson">Gson</h3>
<p>Google's <a href="https://code.google.com/p/google-gson/">Gson</a> library is a simple and fast JSON parsing library. It
works like this:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">Gson</span> <span class="n">gson</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Gson</span><span class="p">();</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">json</span> <span class="o">=</span> <span class="n">gson</span><span class="p">.</span><span class="na">toJson</span><span class="p">(</span><span class="n">fooWidget</span><span class="p">);</span>
<span class="kd">final</span> <span class="n">FooWidget</span> <span class="n">newFooWidget</span> <span class="o">=</span> <span class="n">gson</span><span class="p">.</span><span class="na">fromJson</span><span class="p">(</span><span class="n">json</span><span class="p">,</span> <span class="n">FooWidget</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
</pre></div>
<p>It's really easy and a pleasure to work with. The <a href="https://sites.google.com/site/gson/gson-user-guide">Gson user guide</a>
has many more examples.</p>
<h3 id="java-tuples">Java Tuples</h3>
<p>One of my ongoing annoyances with Java is that it doesn't have tuples built
into the standard library. Luckily, the <a href="http://www.javatuples.org/">Java tuples</a> project fixes
that.</p>
<p>It's simple to use and works great:</p>
<div class="codehilite"><pre><span></span><span class="n">Pair</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Integer</span><span class="o">></span> <span class="nf">func</span><span class="p">(</span><span class="n">String</span> <span class="n">input</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// something...</span>
<span class="k">return</span> <span class="n">Pair</span><span class="p">.</span><span class="na">with</span><span class="p">(</span><span class="n">stringResult</span><span class="p">,</span> <span class="n">intResult</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<h3 id="joda-time">Joda-Time</h3>
<p><a href="http://www.joda.org/joda-time/">Joda-Time</a> is easily the best time library I've ever used. Simple,
straightforward, easy to test. What else can you ask for? </p>
<p>You only need this if you're not yet on Java 8, as that has its own new
<a href="http://www.oracle.com/technetwork/articles/java/jf14-date-time-2125367.html">date time</a> library that doesn't suck.</p>
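<p>For a taste of the Java 8 API, here is a small sketch using <em>java.time</em>. The <em>DateTimeSketch</em> class and <em>daysUntil</em> method are hypothetical. Passing in a <em>Clock</em> rather than reading the system time directly is one way to keep date logic easy to test.</p>

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

class DateTimeSketch {
    // Taking a Clock instead of calling LocalDate.now() keeps this testable.
    static long daysUntil(Clock clock, LocalDate target) {
        return ChronoUnit.DAYS.between(LocalDate.now(clock), target);
    }

    public static void main(String[] args) {
        // A fixed clock pins "now" to 2014-05-01 in UTC.
        Clock fixed = Clock.fixed(Instant.parse("2014-05-01T00:00:00Z"), ZoneOffset.UTC);
        LocalDate release = LocalDate.of(2014, 5, 31);
        System.out.println(daysUntil(fixed, release)); // 30

        // Values are immutable: plusDays returns a new LocalDate.
        LocalDate nextWeek = release.plusDays(7);
        System.out.println(nextWeek); // 2014-06-07
    }
}
```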
<h3 id="lombok">Lombok</h3>
<p><a href="http://projectlombok.org/">Lombok</a> is an interesting library. Through annotations, it allows you
to reduce the boilerplate that Java suffers from so badly.</p>
<p>Want setters and getters for your class variables? Simple:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Foo</span> <span class="p">{</span>
<span class="nd">@Getter</span> <span class="nd">@Setter</span> <span class="kd">private</span> <span class="kt">int</span> <span class="n">var</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Now you can do this:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">Foo</span> <span class="n">foo</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Foo</span><span class="p">();</span>
<span class="n">foo</span><span class="p">.</span><span class="na">setVar</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</pre></div>
<p>And there's <a href="http://jnb.ociweb.com/jnb/jnbJan2010.html">so much more</a>. I haven't used Lombok in production
yet, but I can't wait to.</p>
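<p>For comparison, this is roughly the hand-written code that <em>@Getter</em> and <em>@Setter</em> stand in for on the <em>Foo</em> class above. It is a plain-Java sketch with no Lombok dependency, so the accessors are spelled out by hand.</p>

```java
class Foo {
    private int var;

    // Roughly what @Getter generates for the field.
    public int getVar() {
        return var;
    }

    // Roughly what @Setter generates for the field.
    public void setVar(int var) {
        this.var = var;
    }

    public static void main(String[] args) {
        final Foo foo = new Foo();
        foo.setVar(5);
        System.out.println(foo.getVar()); // 5
    }
}
```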
<h3 id="play-framework">Play framework</h3>
<p><strong>Good alternatives</strong>: <a href="https://jersey.java.net/">Jersey</a> or <a href="http://www.sparkjava.com/">Spark</a></p>
<p>There are two main camps for doing RESTful web services in Java:
<a href="http://en.wikipedia.org/wiki/Java_API_for_RESTful_Web_Services">JAX-RS</a> and everything else.</p>
<p>JAX-RS is the traditional way. You combine annotations with interfaces and
implementations to form the web service using something like <a href="https://jersey.java.net/">Jersey</a>.
What's nice about this is you can easily make clients out of just the
interface class.</p>
<p>The <a href="http://www.playframework.com/">Play framework</a> is a radically different take on web services on
the JVM: you have a routes file and then you write the classes referenced in
those routes. It's actually an <a href="http://www.playframework.com/documentation/2.3.x/Anatomy">entire MVC framework</a>, but you can
easily use it for just REST web services.</p>
<p>It's available for both Java and Scala. It suffers slightly from being
Scala-first, but it's still good to use in Java.</p>
<p>If you're used to micro-frameworks like Flask in Python, <a href="http://www.sparkjava.com/">Spark</a> will
be very familiar. It works especially well with Java 8.</p>
<h3 id="slf4j">SLF4J</h3>
<p>There are a lot of Java logging solutions out there. My favorite is
<a href="http://www.slf4j.org/">SLF4J</a> because it's extremely pluggable and can combine logs from many
different logging frameworks at the same time. Have a weird project that uses
java.util.logging, JCL, and log4j? SLF4J is for you.</p>
<p>The <a href="http://www.slf4j.org/manual.html">two-page manual</a> is pretty much all you'll need to get
started.</p>
<h3 id="jooq">jOOQ</h3>
<p>I dislike heavy ORM frameworks because I like SQL. So I wrote a lot of
<a href="http://docs.spring.io/spring/docs/4.0.3.RELEASE/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html">JDBC templates</a> and it was sort of hard to maintain. <a href="http://www.jooq.org/">jOOQ</a> is a
much better solution.</p>
<p>It lets you write SQL in Java in a type safe way:</p>
<div class="codehilite"><pre><span></span><span class="c1">// Typesafely execute the SQL statement directly with jOOQ</span>
<span class="n">Result</span><span class="o"><</span><span class="n">Record3</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="p">,</span> <span class="n">String</span><span class="o">>></span> <span class="n">result</span> <span class="o">=</span>
<span class="n">create</span><span class="p">.</span><span class="na">select</span><span class="p">(</span><span class="n">BOOK</span><span class="p">.</span><span class="na">TITLE</span><span class="p">,</span> <span class="n">AUTHOR</span><span class="p">.</span><span class="na">FIRST_NAME</span><span class="p">,</span> <span class="n">AUTHOR</span><span class="p">.</span><span class="na">LAST_NAME</span><span class="p">)</span>
<span class="p">.</span><span class="na">from</span><span class="p">(</span><span class="n">BOOK</span><span class="p">)</span>
<span class="p">.</span><span class="na">join</span><span class="p">(</span><span class="n">AUTHOR</span><span class="p">)</span>
<span class="p">.</span><span class="na">on</span><span class="p">(</span><span class="n">BOOK</span><span class="p">.</span><span class="na">AUTHOR_ID</span><span class="p">.</span><span class="na">equal</span><span class="p">(</span><span class="n">AUTHOR</span><span class="p">.</span><span class="na">ID</span><span class="p">))</span>
<span class="p">.</span><span class="na">where</span><span class="p">(</span><span class="n">BOOK</span><span class="p">.</span><span class="na">PUBLISHED_IN</span><span class="p">.</span><span class="na">equal</span><span class="p">(</span><span class="mi">1948</span><span class="p">))</span>
<span class="p">.</span><span class="na">fetch</span><span class="p">();</span>
</pre></div>
<p>Using this and the <a href="http://www.javapractices.com/topic/TopicAction.do?Id=66">DAO</a> pattern, you can make database access a breeze.</p>
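<p>The DAO pattern itself is independent of jOOQ. Here is a minimal sketch, assuming a hypothetical <em>Book</em> type and <em>BookDao</em> interface: callers depend only on the interface, so a SQL-backed implementation could sit behind it in production while tests use an in-memory one like this.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class DaoSketch {
    // Hypothetical domain type.
    static final class Book {
        final String title;
        final int publishedIn;

        Book(String title, int publishedIn) {
            this.title = title;
            this.publishedIn = publishedIn;
        }
    }

    // The DAO interface hides how and where books are stored.
    interface BookDao {
        void save(Book book);

        List<Book> findByYear(int year);
    }

    // In-memory implementation; a SQL-backed one would implement the same interface.
    static final class InMemoryBookDao implements BookDao {
        private final List<Book> books = new ArrayList<>();

        @Override
        public void save(Book book) {
            books.add(book);
        }

        @Override
        public List<Book> findByYear(int year) {
            return books.stream()
                    .filter(b -> b.publishedIn == year)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        BookDao dao = new InMemoryBookDao();
        dao.save(new Book("Book A", 1948));
        dao.save(new Book("Book B", 1945));
        System.out.println(dao.findByYear(1948).size()); // 1
    }
}
```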
<h2 id="testing">Testing</h2>
<p>Testing is critical to your software. These packages help make it easier.</p>
<h3 id="junit-4">jUnit 4</h3>
<p><a href="http://junit.org/">jUnit</a> needs no introduction. It's the standard tool for unit testing
in Java.</p>
<p>But you're probably not using jUnit to its full potential. jUnit supports
<a href="https://github.com/junit-team/junit/wiki/Parameterized-tests">parametrized tests</a>, <a href="https://github.com/junit-team/junit/wiki/Rules">rules</a> to stop you from writing
so much boilerplate, <a href="https://github.com/junit-team/junit/wiki/Theories">theories</a> to randomly test certain code,
and <a href="https://github.com/junit-team/junit/wiki/Assumptions-with-assume">assumptions</a>.</p>
<h3 id="jmock">jMock</h3>
<p>If you've done your dependency injection, this is where it pays off: mocking
out code which has side effects (like talking to a REST server) and still
asserting behavior of code that calls it.</p>
<p><a href="http://jmock.org/">jMock</a> is the standard mocking tool for Java. It looks like this:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">FooWidgetTest</span> <span class="p">{</span>
<span class="kd">private</span> <span class="n">Mockery</span> <span class="n">context</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Mockery</span><span class="p">();</span>
<span class="nd">@Test</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">basicTest</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">final</span> <span class="n">FooWidgetDependency</span> <span class="n">dep</span> <span class="o">=</span> <span class="n">context</span><span class="p">.</span><span class="na">mock</span><span class="p">(</span><span class="n">FooWidgetDependency</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="n">context</span><span class="p">.</span><span class="na">checking</span><span class="p">(</span><span class="k">new</span> <span class="n">Expectations</span><span class="p">()</span> <span class="p">{{</span>
<span class="n">oneOf</span><span class="p">(</span><span class="n">dep</span><span class="p">).</span><span class="na">call</span><span class="p">(</span><span class="n">with</span><span class="p">(</span><span class="n">any</span><span class="p">(</span><span class="n">String</span><span class="p">.</span><span class="na">class</span><span class="p">)));</span>
<span class="n">atLeast</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="na">of</span><span class="p">(</span><span class="n">dep</span><span class="p">).</span><span class="na">optionalCall</span><span class="p">();</span>
<span class="p">}});</span>
<span class="kd">final</span> <span class="n">FooWidget</span> <span class="n">foo</span> <span class="o">=</span> <span class="k">new</span> <span class="n">FooWidget</span><span class="p">(</span><span class="n">dep</span><span class="p">);</span>
<span class="n">Assert</span><span class="p">.</span><span class="na">assertTrue</span><span class="p">(</span><span class="n">foo</span><span class="p">.</span><span class="na">doThing</span><span class="p">());</span>
<span class="n">context</span><span class="p">.</span><span class="na">assertIsSatisfied</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This sets up a <em>FooWidgetDependency</em> via jMock and then adds expectations. We
expect that <em>dep</em>'s <em>call</em> method will be called once with some String and that
<em>dep</em>'s <em>optionalCall</em> method will be called zero or more times.</p>
<p>If you have to set up the same dependency over and over, you should probably
put that in a <a href="https://github.com/junit-team/junit/wiki/Test-fixtures">test fixture</a> and put <em>assertIsSatisfied</em> in an
<em>@After</em> fixture.</p>
<h3 id="assertj">AssertJ</h3>
<p>Do you ever do this with JUnit?</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">result</span> <span class="o">=</span> <span class="n">some</span><span class="p">.</span><span class="na">testMethod</span><span class="p">();</span>
<span class="n">assertEquals</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="na">size</span><span class="p">());</span>
<span class="n">assertTrue</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="na">contains</span><span class="p">(</span><span class="s">"some result"</span><span class="p">));</span>
<span class="n">assertTrue</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="na">contains</span><span class="p">(</span><span class="s">"some other result"</span><span class="p">));</span>
<span class="n">assertFalse</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="na">contains</span><span class="p">(</span><span class="s">"shouldn't be here"</span><span class="p">));</span>
</pre></div>
<p>This is just annoying boilerplate. <a href="http://joel-costigliola.github.io/assertj/index.html">AssertJ</a> solves this. You can
transform the same code into this:</p>
<div class="codehilite"><pre><span></span><span class="n">assertThat</span><span class="p">(</span><span class="n">some</span><span class="p">.</span><span class="na">testMethod</span><span class="p">()).</span><span class="na">hasSize</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="p">.</span><span class="na">contains</span><span class="p">(</span><span class="s">"some result"</span><span class="p">,</span> <span class="s">"some other result"</span><span class="p">)</span>
<span class="p">.</span><span class="na">doesNotContain</span><span class="p">(</span><span class="s">"shouldn't be here"</span><span class="p">);</span>
</pre></div>
<p>This fluent interface makes your tests more readable. What more could you want?</p>
<h1 id="tools">Tools</h1>
<h2 id="intellij-idea">IntelliJ IDEA</h2>
<p><strong>Good alternatives</strong>: <a href="https://www.eclipse.org/">Eclipse</a> and <a href="https://netbeans.org/">NetBeans</a></p>
<p>The best Java IDE is <a href="http://www.jetbrains.com/idea/">IntelliJ IDEA</a>. It has a ton of awesome
features, and is really the main thing that makes the verbosity of Java
bearable. Autocomplete is great,
<a href="http://i.imgur.com/92ztcCd.png">the inspections are top notch</a>, and the refactoring
tools are really helpful.</p>
<p>The free community edition is good enough for me, but there are loads of great
features in the Ultimate edition like database tools, Spring Framework support
and Chronon.</p>
<h3 id="chronon">Chronon</h3>
<p>One of my favorite features of GDB 7 was the ability to travel back in time
when debugging. This is possible with the <a href="http://blog.jetbrains.com/idea/2014/03/try-chronon-debugger-with-intellij-idea-13-1-eap/">Chronon IntelliJ plugin</a>
when you get the Ultimate edition.</p>
<p>You get variable history, step backwards, method history and more. It's a
little strange to use the first time, but it can help debug some really
intricate bugs, Heisenbugs and the like.</p>
<h2 id="jrebel">JRebel</h2>
<p>Continuous integration is often a goal of software-as-a-service products. What
if you didn't even need to wait for the build to finish to see code changes
live?</p>
<p>That's what <a href="http://zeroturnaround.com/software/jrebel/">JRebel</a> does. Once you hook up your server to your JRebel
client, you can see changes on your server instantly. It's a huge time savings
when you want to experiment quickly.</p>
<h2 id="the-checker-framework">The Checker Framework</h2>
<p>Java's type system is pretty weak. It doesn't differentiate between Strings
and Strings that are actually regular expressions, nor does it do any
<a href="http://en.wikipedia.org/wiki/Taint_checking">taint checking</a>. However, <a href="http://types.cs.washington.edu/checker-framework/">the Checker Framework</a>
does this and more.</p>
<p>It uses annotations like <em>@Nullable</em> to check types. You can even define
<a href="http://types.cs.washington.edu/checker-framework/tutorial/webpages/encryption-checker-cmd.html">your own annotations</a> to make the static analysis even
more powerful.</p>
<h2 id="eclipse-memory-analyzer">Eclipse Memory Analyzer</h2>
<p>Memory leaks happen, even in Java. Luckily, there are tools for that. The best
tool I've used to fix these is the <a href="http://www.eclipse.org/mat/">Eclipse Memory Analyzer</a>. It takes a
heap dump and lets you find the problem.</p>
<p>There are a few ways to get a heap dump for a JVM process, but I use
<a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html">jmap</a>:</p>
<div class="codehilite"><pre><span></span>$ jmap -dump:live,format<span class="o">=</span>b,file<span class="o">=</span>heapdump.hprof -F <span class="m">8152</span>
Attaching to process ID <span class="m">8152</span>, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is <span class="m">23</span>.25-b01
Dumping heap to heapdump.hprof ...
... snip ...
Heap dump file created
</pre></div>
<p>Then you can open the <em>heapdump.hprof</em> file with the Memory Analyzer and see
what's going on fast.</p>
<h1 id="resources">Resources</h1>
<p>Resources to help you become a Java master.</p>
<h2 id="books">Books</h2>
<ul>
<li><a href="http://www.amazon.com/Effective-Java-Edition-Joshua-Bloch/dp/0321356683">Effective Java</a></li>
<li><a href="http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601">Java Concurrency in Practice</a></li>
</ul>
<h2 id="podcasts">Podcasts</h2>
<ul>
<li><a href="http://www.javaposse.com/">The Java Posse</a></li>
</ul>
<p><a href="https://github.com/cxxr/better-java"><img style="position: absolute; top: 0; left: 0; border: 0;" src="https://camo.githubusercontent.com/567c3a48d796e2fc06ea80409cc9dd82bf714434/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f6c6566745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_left_darkblue_121621.png"></a></p>When names outlive their usefulness2014-04-21T16:30:00-07:002014-04-21T16:30:00-07:00Sean Cassidytag:www.seancassidy.me,2014-04-21:/when-names-outlive-their-usefulness.html<p>For years I've been using <a href="http://www.unix.com/man-page/linux/1/mkpasswd/">mkpasswd</a> in Linux to generate
the occasional password. It seemed like the perfect tool and it was
installed nearly everywhere.</p>
<p>The one weird bit was that it prompts you for a password<sup id="fnref:other"><a class="footnote-ref" href="#fn:other">1</a></sup>. Like this:</p>
<div class="codehilite"><pre><span></span>$ mkpasswd
Password:
</pre></div>
<p>I just always banged on my keyboard, and I got a random looking password.
Certainly good enough for my purposes, right?</p>
<h1 id="it-doesnt-generate-passwords">It doesn't generate passwords</h1>
<p>My friend was recently talking about how he used openssl to generate passwords
for databases and things where you don't need to memorize the password. Like
this:</p>
<div class="codehilite"><pre><span></span>$ openssl rand -base64 <span class="m">10</span>
<span class="nv">IQR5X92MB3wz6zSKfLw</span><span class="o">=</span>
</pre></div>
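<p>For comparison, here is a minimal Python sketch of the same idea, assuming Python 3.6 or later and only the standard library. The function name <em>random_password</em> is my own invention for illustration; it is not part of openssl or mkpasswd.</p>

```python
import base64
import secrets


def random_password(num_bytes: int = 10) -> str:
    """Return num_bytes of OS-provided randomness, Base64-encoded."""
    # secrets.token_bytes draws from the operating system's CSPRNG.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# The output is different on every run, so none is shown here.
print(random_password())
```

<p>The point is the same as with openssl: every character of the result comes from a random source, not from whatever you happened to type.</p>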
<p>I laughed and asked him why he wasn't using the included utility which does
exactly that: make passwords. He told me that mkpasswd actually <em>doesn't</em>
generate passwords, it actually generates the hashed and salted password that
goes in /etc/shadow.</p>
<p>And he's right.</p>
<p>"So what," I said. The salt it uses is probably random enough. Probably
32 bits or more, certainly enough for a password. So, I checked:</p>
<div class="codehilite"><pre><span></span>$ apt-get <span class="nb">source</span> whois
$ <span class="nb">cd</span> whois*/
$ vim mkpasswd.c
</pre></div>
<p>I found this:</p>
<div class="codehilite"><pre><span></span><span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">crypt_method</span> <span class="n">methods</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="cm">/* method prefix minlen, maxlen rounds description */</span>
<span class="p">{</span> <span class="s">"des"</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">N_</span><span class="p">(</span><span class="s">"standard 56 bit DES-based crypt(3)"</span><span class="p">)</span> <span class="p">},</span>
<span class="p">{</span> <span class="s">"md5"</span><span class="p">,</span> <span class="s">"$1$"</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"MD5"</span> <span class="p">},</span>
<span class="cp">#if defined HAVE_SHA_CRYPT</span>
<span class="cm">/* http://people.redhat.com/drepper/SHA-crypt.txt */</span>
<span class="p">{</span> <span class="s">"sha-256"</span><span class="p">,</span> <span class="s">"$5$"</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"SHA-256"</span> <span class="p">},</span>
<span class="p">{</span> <span class="s">"sha-512"</span><span class="p">,</span> <span class="s">"$6$"</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"SHA-512"</span> <span class="p">},</span>
<span class="cp">#endif</span>
</pre></div>
<p>So there are at least two methods to generate hashed passwords: the old DES
style and the newer MD5-based style. If you have it enabled, the SHA-256 or
SHA-512 versions could be used. I figured that of course mkpasswd used a newer
style, so I was safe. The random salt would be good enough.</p>
<p>Right?</p>
<h1 id="what-it-does">What it does</h1>
<p>Does it pick MD5 by default? Or even better, pick SHA-256 if it's available?</p>
<div class="codehilite"><pre><span></span><span class="cm">/* default: DES password */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">salt_prefix</span><span class="p">)</span> <span class="p">{</span>
<span class="n">salt_minlen</span> <span class="o">=</span> <span class="n">methods</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">minlen</span><span class="p">;</span>
<span class="n">salt_maxlen</span> <span class="o">=</span> <span class="n">methods</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">maxlen</span><span class="p">;</span>
<span class="n">salt_prefix</span> <span class="o">=</span> <span class="n">methods</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">prefix</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>I guess not. It picks DES by default, as that's first in the array. That
means the salt length is exactly two characters. How many possibilities does that leave
us?</p>
<div class="codehilite"><pre><span></span><span class="k">static</span> <span class="k">const</span> <span class="kt">char</span> <span class="n">valid_salts</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"abcdefghijklmnopqrstuvwxyz"</span>
<span class="s">"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"</span><span class="p">;</span>
</pre></div>
<p>64 for each of the two bytes. This leaves 4096 possibilities for each password
you type. And if you don't type any password, like I did at least once, your
password is only one of 4096 possibilities.</p>
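<p>The arithmetic is easy to check. This is a small Python sketch; the alphabet is copied from the <em>valid_salts</em> array above, while the constant names <em>VALID_SALTS</em> and <em>DES_SALT_LEN</em> are just mine.</p>

```python
import math

# Copied from the valid_salts[] array in mkpasswd.c
VALID_SALTS = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"
)
DES_SALT_LEN = 2  # minlen == maxlen == 2 for the "des" method

salt_count = len(VALID_SALTS) ** DES_SALT_LEN
salt_bits = math.log2(salt_count)

print(len(VALID_SALTS))  # 64
print(salt_count)        # 4096
print(salt_bits)         # 12.0
```

<p>That is 12 bits of randomness, a long way from the 32 bits I had assumed.</p>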
<p><a href="https://www.seancassidy.me/etc/passwords.txt">Here's the list of the 4096 "passwords"</a> if
you just press enter at the mkpasswd "Password:" prompt. If these aren't
already in a password dictionary I'd be surprised.</p>
<p>Lesson learned: read the manpage. And be careful when choosing a name for
your project<sup id="fnref:names"><a class="footnote-ref" href="#fn:names">2</a></sup>. It might outlive its usefulness.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:other">
<p>Sometimes. It depends on which version is installed. See the other
<a href="http://linux.die.net/man/1/mkpasswd">mkpasswd's manpage</a>. <a class="footnote-backref" href="#fnref:other" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:names">
<p>The origin of this simply comes from the fact that it was used to
make the /etc/passwd file, which used to contain the hashed passwords before
/etc/shadow was created. So it wasn't named poorly then, but it is now. <a class="footnote-backref" href="#fnref:names" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Diagnosis of the OpenSSL Heartbleed Bug2014-04-07T16:30:00-07:002014-04-07T16:30:00-07:00Sean Cassidytag:www.seancassidy.me,2014-04-07:/diagnosis-of-the-openssl-heartbleed-bug.html<p>When I wrote about the <a href="https://www.seancassidy.me/the-story-of-the-gnutls-bug.html">GnuTLS bug</a>, I said that this isn't the last severe TLS stack bug we'd see. I didn't expect it to be quite this bad, however.</p>
<p><a href="http://heartbleed.com/">The Heartbleed bug</a> is a particularly nasty bug. It allows an attacker to read up to 64KB of memory, and the security researchers have said:</p>
<blockquote>
<p>Without using any privileged information or credentials we were able steal
from ourselves the secret keys used for our X.509 certificates, user names
and passwords, instant messages, emails and business critical documents and
communication.</p>
</blockquote>
<p>How could this happen? Let's read the code and find out.</p>
<h1 id="the-bug">The bug</h1>
<p><a href="http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=96db9023b881d7cd9f379b0c154650d6c108e9a3">The fix</a> starts here, in <em>ssl/d1_both.c</em>:</p>
<div class="codehilite"><pre><span></span><span class="kt">int</span>
<span class="nf">dtls1_process_heartbeat</span><span class="p">(</span><span class="n">SSL</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="o">&</span><span class="n">s</span><span class="o">-></span><span class="n">s3</span><span class="o">-></span><span class="n">rrec</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">*</span><span class="n">pl</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">hbtype</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">payload</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">padding</span> <span class="o">=</span> <span class="mi">16</span><span class="p">;</span> <span class="cm">/* Use minimum padding */</span>
</pre></div>
<p>So, first we get a pointer to the data within an SSLv3 record. That looks like this:</p>
<div class="codehilite"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">ssl3_record_st</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">type</span><span class="p">;</span> <span class="cm">/* type of record */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">length</span><span class="p">;</span> <span class="cm">/* How many bytes available */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">off</span><span class="p">;</span> <span class="cm">/* read/write offset into 'buf' */</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">data</span><span class="p">;</span> <span class="cm">/* pointer to the record data */</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">input</span><span class="p">;</span> <span class="cm">/* where the decode bytes are */</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">comp</span><span class="p">;</span> <span class="cm">/* only used with decompression - malloc()ed */</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">epoch</span><span class="p">;</span> <span class="cm">/* epoch number, needed by DTLS1 */</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">seq_num</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span> <span class="cm">/* sequence number, needed by DTLS1 */</span>
<span class="p">}</span> <span class="n">SSL3_RECORD</span><span class="p">;</span>
</pre></div>
<p>Records have a type, a length, and data. Back to dtls1_process_heartbeat:</p>
<div class="codehilite"><pre><span></span><span class="cm">/* Read type and payload length first */</span>
<span class="n">hbtype</span> <span class="o">=</span> <span class="o">*</span><span class="n">p</span><span class="o">++</span><span class="p">;</span>
<span class="n">n2s</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">payload</span><span class="p">);</span>
<span class="n">pl</span> <span class="o">=</span> <span class="n">p</span><span class="p">;</span>
</pre></div>
<p>The first byte of the SSLv3 record is the heartbeat type. The macro <em>n2s</em> takes
the next two bytes from <em>p</em> and puts them in <em>payload</em> as a 16-bit value. This is
actually the <em>length</em> of the payload, as claimed by the requester. Note that it is
never compared against the actual length in the SSLv3 record.</p>
<p>The variable <em>pl</em> is then the resulting heartbeat data, supplied by the requester.</p>
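<p>To make the parsing concrete, here is a rough Python model of those three lines. It is a sketch, not OpenSSL code: <em>read_heartbeat_header</em> is a name I made up, and a <em>bytes</em> object stands in for the record data that <em>p</em> points to.</p>

```python
def read_heartbeat_header(record: bytes):
    """Model the unchecked parse: 1 type byte, then a 2-byte big-endian length."""
    hbtype = record[0]                            # hbtype = *p++;
    payload = int.from_bytes(record[1:3], "big")  # n2s(p, payload);
    pl_offset = 3                                 # pl = p; (offset of the payload)
    return hbtype, payload, pl_offset


# A request that claims 65535 payload bytes but carries only one ("A").
request = bytes([1, 0xFF, 0xFF, 0x41])
hbtype, payload, pl_offset = read_heartbeat_header(request)

print(hbtype)                    # 1
print(payload)                   # 65535
print(len(request) - pl_offset)  # 1 byte actually present
```

<p>Nothing in this parse notices that the claimed length and the number of bytes actually present disagree.</p>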
<p>Later in the function, it does this:</p>
<div class="codehilite"><pre><span></span><span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="o">*</span><span class="n">bp</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">r</span><span class="p">;</span>
<span class="cm">/* Allocate memory for the response, size is 1 byte</span>
<span class="cm"> * message type, plus 2 bytes payload length, plus</span>
<span class="cm"> * payload, plus padding</span>
<span class="cm"> */</span>
<span class="n">buffer</span> <span class="o">=</span> <span class="n">OPENSSL_malloc</span><span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">payload</span> <span class="o">+</span> <span class="n">padding</span><span class="p">);</span>
<span class="n">bp</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">;</span>
</pre></div>
<p>So we're allocating as much memory as the requester asked for: up to
65535+1+2+16, to be precise. The variable <em>bp</em> is going to be the pointer used
for accessing this memory. Then:</p>
<div class="codehilite"><pre><span></span><span class="cm">/* Enter response type, length and copy payload */</span>
<span class="o">*</span><span class="n">bp</span><span class="o">++</span> <span class="o">=</span> <span class="n">TLS1_HB_RESPONSE</span><span class="p">;</span>
<span class="n">s2n</span><span class="p">(</span><span class="n">payload</span><span class="p">,</span> <span class="n">bp</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">bp</span><span class="p">,</span> <span class="n">pl</span><span class="p">,</span> <span class="n">payload</span><span class="p">);</span>
</pre></div>
<p>The macro <em>s2n</em> does the inverse of <em>n2s</em>: it takes a 16-bit value and puts it
into two bytes. So it puts the same payload length requested.</p>
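<p>In Python terms the two macros are a big-endian (network byte order) encode and decode of a 16-bit value. The functions below borrow the macro names for readability, but they are only a sketch: the real macros also advance the pointer, which I leave out.</p>

```python
def n2s(data: bytes) -> int:
    """Two network-order bytes -> 16-bit integer."""
    return int.from_bytes(data[:2], "big")


def s2n(value: int) -> bytes:
    """16-bit integer -> two network-order bytes."""
    return value.to_bytes(2, "big")


print(s2n(0x1234))       # b'\x124' (the bytes 0x12 0x34)
print(n2s(s2n(0x1234)))  # 4660, which is 0x1234
print(n2s(b"\xff\xff"))  # 65535, the largest length a requester can claim
```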
<p>Then it copies <em>payload</em> bytes from <em>pl</em>, the user supplied data, to the newly
allocated <em>bp</em> array. After this, it sends this all back to the user. So
where's the bug?</p>
<h2 id="the-user-controls-payload-and-pl">The user controls payload and pl</h2>
<p>What if the requester didn't actually supply <em>payload</em> bytes, like she said she
did? What if <em>pl</em> really is only one byte? Then the read from <em>memcpy</em> is going
to read whatever memory was near the SSLv3 record and within the same process.</p>
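<p>Here is a toy Python simulation of that over-read. It rests on loud assumptions: a flat <em>bytes</em> object stands in for process memory, the "secret" is placed directly after the record on purpose, and the padding is left out. The real layout depends on the allocator, and <em>vulnerable_heartbeat</em> is a hypothetical name, not an OpenSSL function.</p>

```python
TLS1_HB_REQUEST = 1
TLS1_HB_RESPONSE = 2


def vulnerable_heartbeat(memory: bytes, record_start: int) -> bytes:
    """Build a response the way the unpatched code does: trust the claimed length."""
    p = record_start
    p += 1                                            # hbtype = *p++;
    payload = int.from_bytes(memory[p:p + 2], "big")  # n2s(p, payload);
    p += 2
    pl = p
    # memcpy(bp, pl, payload): copy `payload` bytes starting at pl,
    # no matter how many bytes the requester really sent.
    return (bytes([TLS1_HB_RESPONSE])
            + payload.to_bytes(2, "big")
            + memory[pl:pl + payload])


SECRET = b"not-a-real-private-key"
# The request carries one payload byte ("A") but claims 1 + len(SECRET) bytes.
record = (bytes([TLS1_HB_REQUEST])
          + (1 + len(SECRET)).to_bytes(2, "big")
          + b"A")
memory = record + SECRET  # assumption: the secret sits right after the record

response = vulnerable_heartbeat(memory, 0)
print(response[3:])  # b'Anot-a-real-private-key'
```

<p>One honest byte goes in, and the neighbouring bytes come back with it.</p>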
<p>And apparently, there's a lot of stuff nearby.</p>
<p>There are two ways memory is dynamically allocated with <em>malloc</em> (at least on
Linux): using <a href="http://linux.die.net/man/2/sbrk">sbrk(2)</a> and using <a href="http://man7.org/linux/man-pages/man2/mmap.2.html">mmap(2)</a>. If the memory is
allocated with <em>sbrk</em>, then it uses the old heap-grows-up rules and limits
what can be found with this, although multiple requests (especially
simultaneously) could still find some fun stuff<sup id="fnref:update"><a class="footnote-ref" href="#fn:update">1</a></sup>.</p>
<p>The allocations for <em>bp</em> don't matter at all, actually. The allocation for
<em>pl</em>, however, matters a great deal. It's almost certainly allocated with
<em>sbrk</em> because of the <em>mmap</em> threshold in <em>malloc</em>. However, interesting
stuff (like documents or user info) is very likely to be allocated with
<em>mmap</em> and might be reachable from <em>pl</em>. Multiple simultaneous requests will
also make some interesting data available.</p>
<p>And your secret keys will probably be available:</p>
<div style="margin-left:20%">
<blockquote class="twitter-tweet" lang="en"><p>Just cracked <a href="https://twitter.com/CloudFlare">@CloudFlare</a> ’s challenge: <a href="https://t.co/8ZPSxyKF4D">https://t.co/8ZPSxyKF4D</a> . I wonder when they’ll update the page.</p>— Fedor Indutny (@indutny) <a href="https://twitter.com/indutny/statuses/454761620259225600">April 11, 2014</a></blockquote></div>
<h2 id="the-fix">The fix</h2>
<p>The most important part of the fix was this:</p>
<div class="codehilite"><pre><span></span><span class="cm">/* Read type and payload length first */</span>
<span class="k">if</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">16</span> <span class="o">></span> <span class="n">s</span><span class="o">-></span><span class="n">s3</span><span class="o">-></span><span class="n">rrec</span><span class="p">.</span><span class="n">length</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="cm">/* silently discard */</span>
<span class="n">hbtype</span> <span class="o">=</span> <span class="o">*</span><span class="n">p</span><span class="o">++</span><span class="p">;</span>
<span class="n">n2s</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">payload</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">payload</span> <span class="o">+</span> <span class="mi">16</span> <span class="o">></span> <span class="n">s</span><span class="o">-></span><span class="n">s3</span><span class="o">-></span><span class="n">rrec</span><span class="p">.</span><span class="n">length</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="cm">/* silently discard per RFC 6520 sec. 4 */</span>
<span class="n">pl</span> <span class="o">=</span> <span class="n">p</span><span class="p">;</span>
</pre></div>
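<p>This does two things: the first check discards records that are too short to
hold even an empty heartbeat (type, length, and minimum padding). The second
check makes sure that the actual record length is long enough for the payload
length the requester claimed. That's it.</p>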
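<p>The same two checks, modeled in Python as a sketch. Here <em>len(record)</em> plays the role of <em>s->s3->rrec.length</em>, and <em>process_heartbeat</em> is a made-up name that returns <em>None</em> where the C code silently discards the message.</p>

```python
PADDING = 16  # minimum padding, as in the C code


def process_heartbeat(record: bytes):
    """Return the payload to echo back, or None if the record is discarded."""
    # Check 1: too short to hold type + length + minimum padding.
    if 1 + 2 + PADDING > len(record):
        return None
    payload = int.from_bytes(record[1:3], "big")
    # Check 2: the claimed payload length must fit inside the real record.
    if 1 + 2 + payload + PADDING > len(record):
        return None
    return record[3:3 + payload]


honest = bytes([1]) + (4).to_bytes(2, "big") + b"ping" + bytes(PADDING)
spoofed = bytes([1]) + (65535).to_bytes(2, "big") + b"A" + bytes(PADDING)

print(process_heartbeat(honest))        # b'ping'
print(process_heartbeat(spoofed))       # None
print(process_heartbeat(bytes([1])))    # None
```

<p>With the length validated up front, the later copy can no longer run past the end of the record.</p>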
<h1 id="lessons">Lessons</h1>
<p>What can we learn from this?</p>
<p>I'm a fan of C. It was my first programming language and it was the first
language I felt comfortable using professionally. But I see its limitations
more clearly now than I have ever before.</p>
<p>Between this and the <a href="https://www.seancassidy.me/the-story-of-the-gnutls-bug.html">GnuTLS bug</a>, I think that we
need to do three things:</p>
<ol>
<li><a href="https://www.openssl.org/support/donations.html">Pay money for security audits</a> of critical security infrastructure
like OpenSSL</li>
<li>Write lots of unit and integration tests for these libraries</li>
<li>Start writing alternatives in safer languages</li>
</ol>
<p>Given how difficult it is to write safe C, I don't see any other options. I
would donate to this effort. Would you?</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:update">
<p>This section originally contained my skepticism about the feasibility
of a PoC due to the nature of how the heap works via <em>sbrk</em>. Neel Mehta has
validated some of my concerns, but there are many reports of secret key discovery
out there. <a class="footnote-backref" href="#fnref:update" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>The Intuition Trap2014-03-28T08:05:00-07:002014-03-28T08:05:00-07:00Sean Cassidytag:www.seancassidy.me,2014-03-28:/the-intuition-trap.html<p>Software cannot be made to be intuitive to every person.</p>
<p>Intuition is but one of three aspects of ease-of-use in software. They are, in
no particular order:</p>
<ol>
<li>Intuition</li>
<li>Least surprise</li>
<li>Time to resolution</li>
</ol>
<p>Intuition is the measure of how little documentation is needed to accomplish a
particular task. One leans on expected behavior of similar software. If your
software is not intuitive, users would need to consult the manual often to use
your software effectively. They might even need to read a book on it.</p>
<p>Least surprise (also known as <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment">the Principle of Least
Astonishment</a>)
is a measure of how internally consistent your software is. When your users go
do a task, how often does something surprising happen?</p>
<p>Time to resolution is the measure of how long it takes a user of your software
to complete a given task. For easy-to-use software, it should not take too long
to accomplish most tasks.</p>
<p>These three aspects are not entirely independent variables. </p>
<p>Improving the consistency of your application's UI, for instance, will increase
all three aspects to varying degrees. Least surprise will be the most impacted, as
users will be used to how your software works, which will, in turn, improve the
time to resolution.</p>
<p>Intuition and least surprise are also very closely connected, but they are
distinct. Intuition is a measure of how consistent your software is with the
external world. Least surprise is about internal consistency.</p>
<h1 id="intuition">Intuition</h1>
<p>Often people will overestimate the importance of intuition to the detriment of
the other two aspects. If your software is used repeatedly, least surprise and
time to resolution will be more important. Intuition is merely a nice-to-have
in that case<sup id="fnref:onetime"><a class="footnote-ref" href="#fn:onetime">1</a></sup>.</p>
<p>Why? Because what is intuitive to me is likely different from what is
intuitive to you. We both come from different backgrounds and it will be
impossible to offer both me and you an ideally intuitive experience. I could
offer a modal editing feature for my WYSIWYG editor, but if you haven't used
vi or vim, you won't find it intuitive at all.</p>
<p>As software grows in breadth and depth, it becomes more and more difficult to
offer intuitive software. This is because there is yet another aspect of
software that relates to ease-of-use: power. Powerful software by its very
nature is often not intuitive.</p>
<p>The perfect example of this is git.</p>
<p>Using git, especially at first, is an exercise in how much bewilderment and
confusion you can stand. Even tasks that seem as if they should be simple
involve commands with strange names and stranger arguments.</p>
<p>So you Google for solutions. Luckily StackOverflow exists, otherwise you'd
have to go back to using Subversion.</p>
<p>But the git maintainers aren't concerned with intuition. They want you to take
the time to learn it. They want you to read the documentation and Pro Git.</p>
<p>Why? Because the design of git enables truly powerful actions. It lowers the
time to resolution of previously difficult tasks to low levels. It enables you
to do things that were previously impossible. And it's all due to git's
design, which may be unintuitive to you at first.</p>
<p>Once you have taken the time to understand git's design, git becomes
consistent<sup id="fnref:consistent"><a class="footnote-ref" href="#fn:consistent">2</a></sup> and doesn't surprise you anymore.</p>
<p>If intuition were the primary goal of git, many of the powerful actions would
be neutered. It would likely not have rebase, nor the index. What's intuitive
about the reflog?</p>
<p>You use your VCS every single day as a programmer. It's worth some loss
of intuition if you can get time to resolution way down.</p>
<h2 id="the-trap">The trap</h2>
<p>The intuition trap is the unfortunate situation that occurs when the desire for
intuitive software drives out the other aspects of ease-of-use.</p>
<p>Your users have grown accustomed to how intuitive your software is. When
something surprising or unintuitive happens, they are blindsided. Your software
has little documentation because of how intuitive it is or your users don't
need it. Now it's difficult or impossible to inform them of this surprising
behavior.</p>
<p>What do you do when your intuitive software has a necessary unintuitive aspect?
This is the intuition trap: you lose or frustrate users because of how
intuitive your software was.</p>
<p>Notable examples are called
<a href="http://en.wikipedia.org/wiki/Gotcha_%28programming%29">gotchas</a>. Everyone can
think of a few of these. You might think git is complicated, but when you're
merging branches with deleted files, it does the right thing. Subversion
doesn't.</p>
<h1 id="designing-your-software">Designing your software</h1>
<p>So, what aspect of easy-to-use software is most important to your product?</p>
<p>Is your product to be used without a manual, by non-technical users, or only
occasionally? Is getting as many users to use your product as possible critical
for its success? Then you better make it as intuitive as possible. Talk to
your users and find out how to do this and relentlessly iterate. Avoid the trap
if you can.</p>
<p>Is your product used by professionals or for complicated tasks? Is it likely
that users will read a manual or a book if they need to? Are the switching
costs for your software high? Or is your software novel<sup id="fnref:www"><a class="footnote-ref" href="#fn:www">3</a></sup>?</p>
<p>Then making your product internally consistent (least surprise) or powerful
(time to resolution) will be most important. Documentation driven
development<sup id="fnref:doc"><a class="footnote-ref" href="#fn:doc">4</a></sup> is the name of the game, so do it early and get it right.
Write blog posts and make videos. Write a book if you need to and release it
for free online.</p>
<p>Intuition is merely one aspect of easy-to-use software. As software gets more
complicated and more powerful, it will be the first to go. Get used to reading
books and documentation. They're here to stay.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:onetime">
<p>Part of the reason for this is that intuition helps users start
using your software and doesn't pay dividends after that. I find recurring
benefits (like time to resolution and least surprise) more attractive.
Although, if your software is so alien that no one uses it, time to resolution
won't be very important at all. <a class="footnote-backref" href="#fnref:onetime" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:consistent">
<p>Yes, many of the git command names are inconsistent and doing
similar tasks requires different commands. The design of git is consistent even
if on the surface it seems scattered. Take the time to learn git, really. You
probably use it every day. <a class="footnote-backref" href="#fnref:consistent" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:www">
<p>Have you ever read
<a href="http://www.w3.org/History/1989/proposal.html">Tim Berners-Lee's proposal
for the World Wide Web</a>? It sounds foreign nowadays. Berners-Lee spends a
lot of time talking about categorization, keywords, and other things that seem,
now, tangential to the issue at hand. This is because the World Wide Web,
by the fact of its radical nature, was not intuitive. If he had been restricted to
making his software intuitive, he could never have developed the World Wide Web. <a class="footnote-backref" href="#fnref:www" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:doc">
<p>This is the development methodology we follow at <a
href="http://www.gosquadron.com">Squadron</a>. First, we write the
documentation for the feature. Then we make sure it's easy to understand and
that it's consistent with the rest of Squadron. Only then do we implement it.
Documentation is central to our workflow because we think it's critical to our
success. <a class="footnote-backref" href="#fnref:doc" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Ambition2014-03-11T13:50:00-07:002014-03-11T13:50:00-07:00Sean Cassidytag:www.seancassidy.me,2014-03-11:/ambition.html<blockquote>
This field of glory is harvested, and the crop is already appropriated. But
new reapers will arise, and they, too, will seek a field.
<cite class="character"><a href="http://en.wikisource.org/wiki/The_Lyceum_Address">The Lyceum Address</a></cite>
<cite>Abraham Lincoln</cite>
</blockquote>
<p>I, like many people, have lofty ambitions. I would like to found a startup.
I want to make something that a …</p><blockquote>
This field of glory is harvested, and the crop is already appropriated. But
new reapers will arise, and they, too, will seek a field.
<cite class="character"><a href="http://en.wikisource.org/wiki/The_Lyceum_Address">The Lyceum Address</a></cite>
<cite>Abraham Lincoln</cite>
</blockquote>
<p>I, like many people, have lofty ambitions. I would like to found a startup.
I want to make something that a lot of people love to use. I want to make a
fundamental contribution to my field. I want to write well and be well read.
I want to have distinction. I want to be able to play a mean game of go and
maybe even learn Chinese.</p>
<p>Sometimes it seems like everyone else has succeeded and I'm failing.</p>
<p>That's nonsense, of course. I haven't failed. I have a Master's degree, and a
fine career in a booming field. But when I see someone else who wildly succeeds
where I am still trying, I get down. I feel as if an opportunity was taken from
me. I feel the gentle indifference of the world<sup id="fnref:camus"><a class="footnote-ref" href="#fn:camus">1</a></sup> washing over my meager
accomplishments.</p>
<p>I think a lot of people feel the same way. You see someone succeed who you don't
think has put in their dues. It happens every week<sup id="fnref:heuristic"><a class="footnote-ref" href="#fn:heuristic">2</a></sup>. Perhaps it's a guy on
YouTube who inexplicably has over a million viewers a month. Or your friend who
landed their dream job. Or someone on the news who sells their crappy startup for
millions or billions.</p>
<p>Julius Caesar probably felt something similar when <a href="http://penelope.uchicago.edu/Thayer/E/Roman/Texts/Suetonius/12Caesars/Julius*.html">he saw a statue of Alexander the Great</a>. He was deeply dissatisfied with his accomplishments at that point in his life, which seems ridiculous to us looking back on Caesar. It seems like his object was, for him, too difficult to reach.</p>
<p>There are some ambitions that seem to defy any reasonable end goal. Empire building is one. Can you really ever be finished conquering the world? I imagine that Caesar wasn't finished with his life's work when he was assassinated. </p>
<p>I feel like the desire to make good software is actually one of these lofty ambitions. Software sucks, so much, that even great software engineers still make mediocre software. Can you ever be satisfied with your life's work when it's so obviously full of bugs and shortcomings? When in ten years it'll be laughed at and you will be ridiculed for having been so stupid? Software engineering is a profession of mediocrity.</p>
<p>But still we try.</p>
<p>And when we've written our software, and it's full of bugs, weeks late with fewer features than we promised, we sit back and tell ourselves that billion dollar startup acquisitions are a fluke. They're not real. It's a bubble. They were lucky.</p>
<p>They used the same tools we have available, and they built something that had immense value to someone. We finished some features that no one really wanted anyway, but we say that our sales engineers said they wanted it and they probably know what they're talking about. We fail to apply the same critical eye to our own projects, businesses, and ambitions. Who are we to say that what they are working on has little or no value when we barely know the value of what we create?</p>
<p>But these startup successes are <a href="http://en.wikipedia.org/wiki/Black_swan_theory">black swans</a>. There is little to be learned from such monumental events. Rational people cannot start ventures with the idea that they'll sell them for billions of dollars. The <a href="http://davidcummings.org/2011/02/24/the-expected-value-for-entrepreneurial-risk/">expected value of a startup</a> is <a href="http://online.wsj.com/news/articles/SB10000872396390443720204578004980476429190">closer to zero than to a billion dollars</a>. <a href="http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124">Most research is false</a>. Almost everyone is forgotten within a few generations<sup id="fnref:prestige"><a class="footnote-ref" href="#fn:prestige">3</a></sup>.</p>
<p>This doesn't mean that life is pointless or that we should stop writing software. Quite the opposite. If we can instead focus our energy on soluble problems and less on success, we can do some real good. It doesn't matter if the crop was already harvested if you aren't a farmer.</p>
<p>Imagine the world you want to live in. What's the difference between where we are now and where you want to be? Is the primary difference that you are better off than you are now? Or is it instead something more noble, where the lives of people around you are enriched?</p>
<p>I want to live in a better place, and I'm going to try to get us there. </p>
<p>My own prestige or lack thereof has nothing to do with it.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:camus">
<p>"I opened myself to the gentle indifference of the world." from one of my all-time favorite books, The Stranger by Albert Camus. Worth a read if you want a different outlook on life. <a class="footnote-backref" href="#fnref:camus" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:heuristic">
<p>Part of this is surely the <a href="http://en.wikipedia.org/wiki/Availability_heuristic">availability heuristic</a> and the <a href="http://en.wikipedia.org/wiki/Affect_heuristic">affect heuristic</a>. News like this is more available than ever before, so it seems more common than ever. The magnitude of their success is so surprising that your own accomplishments seem inconsequential. <a class="footnote-backref" href="#fnref:heuristic" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:prestige">
<p>Which makes desiring prestige that much more strange. "It never ceases to amaze me: we all love ourselves more than other people, but care more about their opinion than our own." from Meditations by Marcus Aurelius <a class="footnote-backref" href="#fnref:prestige" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>The Story of the GnuTLS Bug2014-03-04T14:23:45-08:002014-03-04T14:23:45-08:00Sean Cassidytag:www.seancassidy.me,2014-03-04:/the-story-of-the-gnutls-bug.html<p>You might have heard about the <a href="http://arstechnica.com/security/2014/03/critical-crypto-bug-leaves-linux-hundreds-of-apps-open-to-eavesdropping/">critical GnuTLS bug</a> that was <a href="https://www.gitorious.org/gnutls/gnutls/commit/6aa26f78150ccbdf0aec1878a41c17c41d358a3b">recently fixed</a>. What's the deal with it? Why is it a big deal? What happened?</p>
<p>Here's the bug, in essence, in <a href="https://www.gitorious.org/gnutls/gnutls/source/1832e0be467d63c089cdebe3fb1158fc0be32e44:lib/x509/verify.c">lib/x509/verify.c</a>:</p>
<div class="codehilite"><pre><span></span><span class="cm">/* Checks if the issuer of a certificate is a</span>
<span class="cm"> * Certificate Authority, or if …</span></pre></div><p>You might have heard about the <a href="http://arstechnica.com/security/2014/03/critical-crypto-bug-leaves-linux-hundreds-of-apps-open-to-eavesdropping/">critical GnuTLS bug</a> that was <a href="https://www.gitorious.org/gnutls/gnutls/commit/6aa26f78150ccbdf0aec1878a41c17c41d358a3b">recently fixed</a>. What's the deal with it? Why is it a big deal? What happened?</p>
<p>Here's the bug, in essence, in <a href="https://www.gitorious.org/gnutls/gnutls/source/1832e0be467d63c089cdebe3fb1158fc0be32e44:lib/x509/verify.c">lib/x509/verify.c</a>:</p>
<div class="codehilite"><pre><span></span><span class="cm">/* Checks if the issuer of a certificate is a</span>
<span class="cm"> * Certificate Authority, or if the certificate is the same</span>
<span class="cm"> * as the issuer (and therefore it doesn't need to be a CA).</span>
<span class="cm"> *</span>
<span class="cm"> * Returns true or false, if the issuer is a CA,</span>
<span class="cm"> * or not.</span>
<span class="cm"> */</span>
<span class="k">static</span> <span class="kt">int</span>
<span class="nf">check_if_ca</span> <span class="p">(</span><span class="n">gnutls_x509_crt_t</span> <span class="n">cert</span><span class="p">,</span> <span class="n">gnutls_x509_crt_t</span> <span class="n">issuer</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">result</span><span class="p">;</span>
<span class="n">result</span> <span class="o">=</span>
<span class="n">_gnutls_x509_get_signed_data</span> <span class="p">(</span><span class="n">issuer</span><span class="o">-></span><span class="n">cert</span><span class="p">,</span> <span class="s">"tbsCertificate"</span><span class="p">,</span>
<span class="o">&</span><span class="n">issuer_signed_data</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">gnutls_assert</span> <span class="p">();</span>
<span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">result</span> <span class="o">=</span>
<span class="n">_gnutls_x509_get_signed_data</span> <span class="p">(</span><span class="n">cert</span><span class="o">-></span><span class="n">cert</span><span class="p">,</span> <span class="s">"tbsCertificate"</span><span class="p">,</span>
<span class="o">&</span><span class="n">cert_signed_data</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">gnutls_assert</span> <span class="p">();</span>
<span class="k">goto</span> <span class="n">cleanup</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// snip</span>
<span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="nl">cleanup</span><span class="p">:</span>
<span class="c1">// cleanup type stuff</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Can you spot the bug?</p>
<p>Here's the (abridged) fixed version:</p>
<div class="codehilite"><pre><span></span> <span class="kt">int</span> <span class="n">result</span><span class="p">;</span>
<span class="n">result</span> <span class="o">=</span>
<span class="n">_gnutls_x509_get_signed_data</span> <span class="p">(</span><span class="n">issuer</span><span class="o">-></span><span class="n">cert</span><span class="p">,</span> <span class="s">"tbsCertificate"</span><span class="p">,</span>
<span class="o">&</span><span class="n">issuer_signed_data</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">gnutls_assert</span> <span class="p">();</span>
<span class="k">goto</span> <span class="n">fail</span><span class="p">;</span> <span class="c1">// CHANGED</span>
<span class="p">}</span>
<span class="c1">// snip</span>
<span class="nl">fail</span><span class="p">:</span> <span class="c1">// ADDED</span>
<span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="nl">cleanup</span><span class="p">:</span>
<span class="c1">// cleanup type stuff</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>The bug was a disagreement between return value meanings. The function <em>check_if_ca</em> returns "true" or rather 1, when the certificate is a CA, and zero otherwise. However the other functions used return negative when they fail. In C, any integer value other than zero is regarded as a true value. So if the certificate is invalid, it's actually marked as a CA certificate.</p>
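<p>A minimal sketch of that mismatch, assuming nothing about GnuTLS internals: <em>helper_c_style</em>, <em>is_ca_buggy</em>, <em>is_ca_fixed</em>, and <em>verdict</em> are hypothetical names made up for illustration. The helper follows the C tradition (negative on failure), while the callers treat the outer function's result as a boolean.</p>

```c
/* Hypothetical C-style helper: 0 on success, negative on failure. */
static int helper_c_style(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Buggy boolean function: leaks the helper's negative error code. */
static int is_ca_buggy(int helper_fails)
{
    int result = helper_c_style(helper_fails);
    if (result < 0)
        goto cleanup;   /* result is still -1, which callers read as "true" */

    result = 1;         /* pretend the real checks passed */
cleanup:
    return result;
}

/* Fixed boolean function: every failure path is forced to 0 ("false"). */
static int is_ca_fixed(int helper_fails)
{
    int result = helper_c_style(helper_fails);
    if (result < 0)
        goto fail;

    result = 1;         /* pretend the real checks passed */
    goto cleanup;
fail:
    result = 0;
cleanup:
    return result;
}

/* How a caller typically consumes a "returns true or false" function. */
static const char *verdict(int is_ca)
{
    return is_ca ? "accepted" : "rejected";
}
```

<p>With a failing helper, <em>verdict(is_ca_buggy(1))</em> takes the "accepted" branch because -1 is non-zero, while <em>verdict(is_ca_fixed(1))</em> takes the "rejected" branch.</p>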
<p>So what's the implication of this? This function is used by <em>gnutls_x509_crt_verify</em>, which verifies x509 certificates. An invalid certificate can therefore be passed off as genuine.</p>
<p>Based on this and the previous Apple bug, I don't think we've seen the last serious TLS stack bug. Testing TLS is notoriously difficult, as it's <a href="http://tools.ietf.org/html/rfc5246">pretty complicated</a>. Of course, <a href="http://osdir.com/ml/help-gnutls-gnu/2012-10/msg00039.html">GnuTLS doesn't have a great track record for correctness</a> in general, so this specifically isn't that surprising.</p>
<h1 id="return-values-in-c">Return values in C</h1>
<p>The bug is the disagreement about return values and true and false. In C, the situation about what to return for success versus failure is sort of complicated.</p>
<p>Let's take <a href="http://linux.die.net/man/2/socket">socket(2)</a> and <a href="http://linux.die.net/man/2/connect">connect(2)</a> as examples. To get an IPv4 socket in C, you need to do this:</p>
<div class="codehilite"><pre><span></span><span class="cp">#include</span> <span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><sys/socket.h></span><span class="cp"></span>
<span class="c1">// in some function</span>
<span class="kt">int</span> <span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</pre></div>
<p>How do you know if this socket is valid? That is, how do you know you got a socket that you can pass to connect?</p>
<p>You test <em>s</em>. The man page says that on error, -1 is returned. So a common way to test this would be:</p>
<div class="codehilite"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">s</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">perror</span><span class="p">(</span><span class="s">"Couldn't get socket"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span> <span class="c1">// or whatever you want to do</span>
<span class="p">}</span>
</pre></div>
<p>Easy enough. Let's connect with our socket now:</p>
<div class="codehilite"><pre><span></span><span class="err">struct sockaddr_in addr; // set this up somehow</span>
<span class="err">if (connect(s, (struct sockaddr *)&addr, sizeof(addr))) {</span>
<span class="err">    perror("Couldn't connect");</span>
<span class="err">    exit(EXIT_FAILURE);</span>
<span class="err">}</span>
</pre></div>
<p>Because connect returns <em>zero</em> on success, the error check for connect looks backwards. If we've connected, connect will return 0, which evaluates to false. So to avoid this confusion, most C programmers would add an explicit check for less than zero.</p>
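<p>A sketch of that explicit check, assuming a POSIX system. The helper name <em>connect_loopback</em> and the choice of the loopback address are made up for illustration; only the socket calls themselves are standard.</p>

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to 127.0.0.1 on the given port.
 * Returns a connected socket (>= 0) on success, -1 on failure. */
static int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Explicit comparison: the failure branch now reads like a failure. */
    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);       /* don't leak the descriptor on the error path */
        return -1;
    }
    return s;
}
```

<p>Writing <em>&lt; 0</em> costs a few characters and removes the need to remember that zero means success here.</p>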
<p>Due to this historical standard, you have essentially two options:</p>
<ol>
<li>Follow the C tradition and return zero for success and non-zero (or less than zero, it depends) for failure.</li>
<li><a href="http://stackoverflow.com/questions/385975/error-handling-in-c-code">Return explicit error codes</a> that should be checked.</li>
</ol>
<p>GnuTLS used a third option, which is the opposite of the first one, returning 1 for success and 0 for failure, and then mixed that with code that used the C traditional method.</p>
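<p>One way to keep a yes/no answer from colliding with an error code is to return only the error code and hand the answer back through an out-parameter. This is a sketch under stated assumptions, not how GnuTLS does it: <em>cert_err</em>, <em>struct cert</em>, and <em>cert_is_ca</em> are hypothetical names.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes; zero means success, as in the C tradition. */
enum cert_err {
    CERT_OK = 0,
    CERT_ERR_NULL_ARG = -1,
    CERT_ERR_PARSE = -2
};

/* Hypothetical stand-in for a parsed certificate. */
struct cert {
    bool parsed_ok;
    bool ca_flag;
};

/* The return value only reports errors; the yes/no answer goes in *is_ca. */
static enum cert_err cert_is_ca(const struct cert *c, bool *is_ca)
{
    if (c == NULL || is_ca == NULL)
        return CERT_ERR_NULL_ARG;

    *is_ca = false;             /* default to the safe answer */
    if (!c->parsed_ok)
        return CERT_ERR_PARSE;

    *is_ca = c->ca_flag;
    return CERT_OK;
}
```

<p>Because the boolean lives in its own variable and defaults to false, a caller that forgets to check the error code still does not see a failure as "yes".</p>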
<p>If you think this is all just nonsense and that you should use exceptions instead, it's not really clear that that's better in all cases. Martin Sústrik, the author of ZeroMQ, <a href="http://250bpm.com/blog:4">wishes he wrote ZeroMQ in C</a> rather than C++ with exceptions.</p>
<h1 id="git-blame">git blame</h1>
<p>Let's find out how this bug got in here in the first place.</p>
<div class="codehilite"><pre><span></span>$ git clone https://git.gitorious.org/gnutls/gnutls.git
$ cd gnutls
$ git checkout 6aa26f78150ccbdf^
</pre></div>
<p>Now we're at the commit before the bug fix. Let's run <a href="https://www.gitorious.org/gnutls/gnutls/blame/895102b77dabad95e1fe82fdae5109fe4cb83179:lib/x509/verify.c">git blame</a> and see who edited what when:</p>
<div class="codehilite"><pre><span></span>$ git blame lib/x509/verify.c
</pre></div>
<p>The problem lines around 141-145 were edited by Simon Josefsson, one of the two maintainers of GnuTLS, in 2005. Wow this bug is old! </p>
<p>If you keep going, to <a href="https://www.gitorious.org/gnutls/gnutls/blame/af21484a8daf15da48f7d00c4fe632708a757a64:lib/x509/verify.c">a5891d7^</a>, then to <a href="https://www.gitorious.org/gnutls/gnutls/blame/e0781d87ea58ff1ac1b6439d60510a8a26ea9f54:lib/x509/verify.c">802e1ed^</a> you'll finally arrive at <a href="https://www.gitorious.org/gnutls/gnutls/blame/4a288531e874f10a0c250ca52d1cd102bce4ffa6:lib/x509/verify.c">0fba2d9^</a> the first version without the bug.</p>
<p>So it was <a href="https://www.gitorious.org/gnutls/gnutls/commit/0fba2d908da6d0df821991ea5fdbeeda0f4ff089">0fba2d9</a> that caused the issue. Why?</p>
<p>This commit was a large refactor of several parts of the certificate code. Many new functions were written which follow the old C-style error handling (return less than zero for failure, zero for success), such as <em>_gnutls_x509_get_signature</em>.</p>
<p>In the same commit, Nikos Mavrogiannopoulos, the other GnuTLS maintainer, refactors <em>check_if_ca</em>, and the result looks remarkably similar to the traditional C-style error handling in the other functions he was adding and refactoring. It looks like he just forgot that this wasn't a C-style error handling function at all, but a true-means-true one.</p>
<h1 id="lessons">Lessons</h1>
<p>What can we learn from this?</p>
<p>If you're writing in C, you need to pick one of the two C error handling models and stick with it. Aggressively refactor any code that doesn't match your chosen model.</p>
<p><a href="http://sethrobertson.github.io/GitBestPractices/#commit">Use smaller commits</a>. It's more likely that Nikos or someone reviewing his commit would have seen the error in his diff if it was smaller. He might even have not made the mistake in the first place, as the refactor would have been different for <em>check_if_ca</em>.</p>
<p>Perhaps most important of all, test your crypto code. It's <a href="https://www.seancassidy.me/wrong-solutions.html">the biggest win</a> for quality. The public facing method, <em>gnutls_x509_crt_verify</em> actually still doesn't have any unit tests for it. This should change.</p>
<p>C can be hard to get right, so it's important to be strict in how you write it and it's important to test it well. We haven't seen the last critical TLS stack bug, so don't be surprised when the next one comes. Maybe we should pool our money and pay some security auditors to audit common TLS implementations that we all use daily.</p>Wrong Solutions2014-02-22T19:32:00-08:002014-02-22T19:32:00-08:00Sean Cassidytag:www.seancassidy.me,2014-02-22:/wrong-solutions.html<p>The reactions to <a href="http://support.apple.com/kb/HT6147">the latest major security hole</a> in a popular
operating system have been amusing to watch. If you haven't yet read
<a href="https://www.imperialviolet.org/2014/02/22/applebug.html">Adam Langley's analysis</a>, do it now; it's very good.</p>
<p>For reference, the bug was that in non-TLS v1.2 SSL connections, the signature was
not checked, allowing impersonations …</p><p>The reactions to <a href="http://support.apple.com/kb/HT6147">the latest major security hole</a> in a popular
operating system have been amusing to watch. If you haven't yet read
<a href="https://www.imperialviolet.org/2014/02/22/applebug.html">Adam Langley's analysis</a>, do it now; it's very good.</p>
<p>For reference, the bug was that in non-TLS v1.2 SSL connections, the signature was
not checked, allowing impersonations. This was because the code skipped to the
end of the method always, without actually checking the hash result was
correct.</p>
<p>What's the root cause of such a bug? I've been looking around today and here's
a short list of reasons people gave:</p>
<ul>
<li>It's hard to write correct code in C</li>
<li>Apple engineers are bad at their jobs</li>
<li><a href="https://news.ycombinator.com/item?id=7282026">The coding style allows omitting curly braces</a></li>
<li>No formal code reviews at Apple</li>
<li>Use of gotos</li>
<li>Automated merge artifact</li>
<li>Not making the compiler warn about dead code</li>
<li>Not using static analysis tools</li>
<li>Lack of testing</li>
</ul>
<p>All of these reasons<sup id="fnref:goto"><a class="footnote-ref" href="#fn:goto">1</a></sup> likely played a role. Is there a single, dominant root
cause?</p>
<p><a href="https://gist.github.com/alexyakoubian/9151610/revisions#L631">The diff of the code</a> isn't really helpful for figuring this out, as the
bug on line 631 seems to arise from nowhere. Perhaps a merge artifact or a
stupid copy/paste error.</p>
<p>Given that it's difficult or impossible to give a single root cause for this
bug, what can we do about it? Lots of people are saying that it's the
lack of curly braces around the code in question. People like Zed:</p>
<div style="margin-left:10%"><blockquote class="twitter-tweet" lang="en"><p>Clearly the dude who wrote this
Apple SSL C code didn't read my C book <a
href="https://t.co/dZodpR6Aox">https://t.co/dZodpR6Aox</a> ALWAYS USE
BRACES!</p>— zedshaw (@zedshaw) <a
href="https://twitter.com/zedshaw/statuses/437384411789553664">February 23,
2014</a></blockquote></div>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This is <a href="http://en.wikipedia.org/wiki/5_Whys">the 5 Whys</a> version of <a href="http://en.wikipedia.org/wiki/Parkinson%27s_law_of_triviality">bike shedding</a>. The
root cause of this problem was not a lack of curly braces. It could not
possibly have been as the insertion of the extra goto makes no sense with or
without braces.</p>
<p>Why is it that we, as programmers, often complain most about coding style and
not about correctness during code reviews? While bad coding style can hide
bugs, it does not in and of itself cause problems. We overvalue visual
consistency and undervalue correctness. If we valued correctness higher, we
would not have critical code that was almost completely untested.</p>
<p>Using a correct coding style will not prevent most bugs, although it will catch
some. Using an easier language will reduce some types of bugs. Code
reviews will catch more bugs than both of those. Static analysis will catch
lots of bugs.</p>
<p>Good testing will catch many and prevent more bugs. This is why it's critical
that certain projects, such as the new Python <a href="https://github.com/pyca/cryptography#cryptography">cryptography
library</a>, have 100% code coverage. This type of glaring error could
not happen there.</p>
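<p>To make that concrete, here is a small sketch of a verifier with a stray duplicated goto, next to a negative test that catches it. It is not Apple's code: <em>check_step</em>, <em>verify_buggy</em>, <em>verify_fixed</em>, and <em>rejects_forged_signature</em> are hypothetical names, and the string comparison merely stands in for real hash and signature checks. The buggy version uses braces on purpose, to show that the test, not the style, is what flags the problem.</p>

```c
#include <string.h>

/* Hypothetical check step: 0 when the value matches, non-zero otherwise. */
static int check_step(const char *got, const char *expected)
{
    return strcmp(got, expected) != 0;
}

/* Buggy verifier: a stray goto skips the signature check entirely. */
static int verify_buggy(const char *hash, const char *sig)
{
    int err;

    if ((err = check_step(hash, "good-hash")) != 0) {
        goto fail;
    }
    goto fail;  /* stray duplicate: always taken, with err == 0 */

    if ((err = check_step(sig, "good-sig")) != 0) {
        goto fail;
    }
fail:
    return err; /* 0 means "verified" */
}

/* Fixed verifier: both checks run before success is reported. */
static int verify_fixed(const char *hash, const char *sig)
{
    int err;

    if ((err = check_step(hash, "good-hash")) != 0) {
        goto fail;
    }
    if ((err = check_step(sig, "good-sig")) != 0) {
        goto fail;
    }
fail:
    return err;
}

/* Negative test: returns 1 if the verifier rejects a forged signature. */
static int rejects_forged_signature(int (*verify)(const char *, const char *))
{
    return verify("good-hash", "forged-sig") != 0;
}
```

<p>Running <em>rejects_forged_signature</em> against the buggy verifier returns 0, so a single "must reject bad input" test is enough to expose this class of mistake.</p>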
<p>Untested cryptography code is broken cryptography code.</p>
<p>This is why it's silly to be complaining about braces. In this case, braces did
make the problem harder to detect, but that wasn't the root cause, nor is using
braces a complete solution. It's just bike shedding.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:goto">
<p>Except gotos. Using gotos is perfectly acceptable when using them
properly, which is to use them in error handling scenarios. <a class="footnote-backref" href="#fnref:goto" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Host an infodump session2014-02-12T15:32:00-08:002014-02-12T15:32:00-08:00Sean Cassidytag:www.seancassidy.me,2014-02-12:/host-an-infodump-session.html<p>The people who know the most about the systems at your company are very likely
the busiest. They don't have the time to prep a long presentation or write a
lot of documentation.</p>
<p>How can you get them to spread the knowledge while not taking up too much of
their …</p><p>The people who know the most about the systems at your company are very likely
the busiest. They don't have the time to prep a long presentation or write a
lot of documentation.</p>
<p>How can you get them to spread the knowledge while not taking up too much of
their valuable time?</p>
<h1 id="shift-the-work-to-the-viewers">Shift the work to the viewers</h1>
<p>If you're interested in learning something about one of the systems at your
company, list it on the infodump spreadsheet like so:</p>
<p><img alt="Infodump session example" src="https://www.seancassidy.me/static/images/" /></p>
<p>Start the votes column at one, because you're one person. Then try and get a few
other people interested in learning about whatever topic you want to learn
about. They should add votes if they're also interested.</p>
<p>Pick someone who knows a lot about this and put them in the presenter column.
Once it hits a threshold (this will differ in places according to resources,
demand, size of company, etc.), ask the presenter to present if they have an
hour or so free.</p>
<p>Send out an invite to everyone who you think would be interested.</p>
<h1 id="the-format">The format</h1>
<p>The format of an infodump session is simple. The presenter isn't really
presenting anything. They might want to talk for a few minutes at the beginning
to give a very broad overview of the topic, but that's it.</p>
<p>Then, the viewers ask questions. If the topic was HAProxy, for instance, a good
starting question would be, "Could you walk us through a simple HAProxy setup
with a health check?"</p>
<p>The presenter would then do that, live. If she has anything interesting to add
or show the viewers, she could do that at any time.</p>
<p>It's essentially distilling down presentations to just Q&A, often the most
interesting part of the presentation.</p>
<p>And record it if you can. There's a lot of great desktop video capture software
nowadays, I'm sure you can find some. This will help avoid duplicating effort.</p>
<h1 id="the-result">The result</h1>
<p>So the busiest people at your company don't have to do too much work. They
field questions for an hour or so, and give live demos of cool stuff.</p>
<p>No slides.</p>
<p>No presentations.</p>
<p>No prep.</p>
<p>Just information.</p>
<p>Communication is really tough. Infodump sessions should be just one of many
tools you use to facilitate communication. Try it and <a href="https://twitter.com/sean_a_cassidy">let me know what you
think</a>.</p>So, you want to crypto2013-12-24T10:35:00-08:002013-12-24T10:35:00-08:00Sean Cassidytag:www.seancassidy.me,2013-12-24:/so-you-want-to-crypto.html<p>I've been following the <a href="http://unhandledexpression.com/2013/12/17/telegram-stand-back-we-know-maths/">Telegram</a> <a href="https://news.ycombinator.com/threads?id=TelegramApp">story</a> over the past week.</p>
<p>I couldn't get past how the team at Telegram made such odd decisions. Presumably
they are a group of smart people who want to help people communicate. So how did
they manage to piss off the entire crypto community?</p>
<p>They did it by disregarding best practices and mountains of advice. They did it
by not consulting professional cryptographers. They did it by assuming they were
smart enough to figure it out as they went along.</p>
<p>Don't make the same mistakes they did.</p>
<h1 id="how-to-learn-cryptography">How to learn cryptography</h1>
<p>So, you want to implement some sort of cryptography in your software or
hardware project. Great. If you fuck this up people aren't going to be just
mad like they might be with other bugs. They might be in prison or they might
have been assassinated. </p>
<p>Cryptography in practice is
<a href="https://www.seancassidy.me/hackers-and-engineering-school.html">engineering, not hacking</a>, and it comes with
serious responsibility. So get it right. Ask for help. Do not let users use
your product until it's been vetted. <a href="http://paulmillr.com/posts/the-story-of-telegram/">Don't listen to idiots</a> who
tell you otherwise.</p>
<h2 id="learn-the-theoretical-background">Learn the theoretical background</h2>
<p>So, if you don't know much about cryptography, you should probably take a
course. The <a href="https://www.coursera.org/course/crypto">Cryptography I course at Coursera</a> is a good start, as
is your local university's cryptography course.</p>
<p>Both <a href="https://www.schneier.com/book-applied.html">Applied Cryptography</a> and the <a href="http://cacr.uwaterloo.ca/hac/">Handbook of Applied
Cryptography</a> are great resources, although they're a little dated
now.</p>
<p>Matthew Green also has a <a href="http://blog.cryptographyengineering.com/p/useful-cryptography-resources.html">great list of cryptography resources</a>.</p>
<h2 id="learn-how-to-implement-it">Learn how to implement it</h2>
<p>More important than knowing how to use the Chinese remainder theorem is knowing
how to use cryptography in practice.</p>
<p>Step one is to read <a href="https://www.schneier.com/book-ce.html">Cryptography Engineering</a>. This is
not optional. Read it. It is a fantastic book that details how to use
cryptographic primitives. You'll be able to say to your crypto-ignorant
friends: Yes, you encrypted your message with AES, but you used ECB! Or,
you used Encrypt-and-MAC instead of Encrypt-then-MAC, you dummy!</p>
<p>Step two is to study crypto in practice. <a href="https://otr.cypherpunks.ca/">OTR</a>, the standard for secure
messaging, is a very well-studied implementation of several key crypto features:
key exchange, socialist millionaire's protocol, perfect forward secrecy, and
more. Read <a href="https://otr.cypherpunks.ca/otr-codecon.pdf">this presentation</a> and maybe even <a href="https://otr.cypherpunks.ca/Protocol-v3-4.0.0.html">the protocol
spec</a>, if for nothing other than seeing what a well-written
cryptographic protocol looks like.</p>
<p>Other good pieces of software to study are <a href="http://www.tarsnap.com/">Tarsnap the backup
utility</a> and its <a href="http://www.tarsnap.com/crypto.html">crypto choices</a>, and WhisperSystems's
<a href="https://whispersystems.org/">TextSecure</a> app. <a href="https://whispersystems.org/blog/advanced-ratcheting/">Their blog posts</a> are typically
excellent.</p>
<h2 id="keep-up-to-date">Keep up-to-date</h2>
<p>Cryptography is something you'll need to keep learning for the rest of your
career.</p>
<p>The way I do it is via blogs. Here are some to get you started:</p>
<ul>
<li><a href="https://www.schneier.com/">Schneier on Security</a></li>
<li><a href="http://blog.cryptographyengineering.com/">A Few Thoughts on Cryptographic Engineering</a></li>
<li><a href="http://bristolcrypto.blogspot.com/">Bristol Cryptography Blog</a></li>
<li><a href="http://www.daemonology.net/blog/">Daemonic Dispatches</a></li>
<li><a href="http://rdist.root.org/">rdist</a></li>
<li><a href="http://outsourcedbits.org/">Outsourced Bits</a></li>
</ul>
<p>There are also a few good mailing lists.</p>
<ul>
<li><a href="http://lists.randombit.net/mailman/listinfo/cryptography">Randombits Cryptography</a> (from the creator of Botan)</li>
<li><a href="https://mailman.stanford.edu/mailman/listinfo/liberationtech">Liberation Tech</a></li>
<li><a href="http://www.metzdowd.com/mailman/listinfo/cryptography">Metzdowd Cryptography</a></li>
</ul>
<h1 id="follow-best-practices">Follow best practices</h1>
<p>The most important thing to do is to follow crypto best practices. Since you're
not a professional cryptographer, you aren't really aware of the security
trade-offs of, say, AES-IGE versus AES-CTR and a SHA256-HMAC.</p>
<p>So what are the best practices?</p>
<h2 id="steal-first">Steal first</h2>
<p>Most applications of cryptography are not secure messaging or anonymity
networks. Instead, they're "authenticate this REST API" or "encrypt this gossip
protocol".</p>
<p>If your application fits in this area, try to steal someone else's design
first.</p>
<p>So if you need a way to authenticate your REST API, don't roll your own. Adapt
<a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html">the AWS authentication scheme</a> to your own purposes. This is how you
can turn Amazon's years of experience <a href="http://www.nds.rub.de/media/nds/veroeffentlichungen/2011/10/22/AmazonSignatureWrapping.pdf">getting it wrong</a>
to your own benefit.</p>
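<p>To give a rough idea of the shape of such a scheme, here is a minimal Python sketch of HMAC-based request signing. It is not the AWS algorithm: the canonical string layout, the <code>sign_request</code> and <code>verify_request</code> names, and the five-minute freshness window are all assumptions made for illustration.</p>

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed freshness window; tune for your service


def canonical_string(method: str, path: str, body: bytes, timestamp: int) -> bytes:
    """Join every signed field with newlines into one byte string."""
    body_digest = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, str(timestamp), body_digest]).encode("utf-8")


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical string."""
    message = canonical_string(method, path, body, timestamp)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature, now=None) -> bool:
    """Recompute the signature server-side and compare it in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated request: reject to limit replay
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

<p>Signing the method, path, timestamp, and a digest of the body together means an attacker can't reuse a valid signature on a different request. A real design would also need to cover headers and query parameters, which this sketch leaves out.</p>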
<p>If you go this route, you will still need to vet your design.</p>
<h2 id="design-and-vetting">Design and vetting</h2>
<p>So, if your application is fairly unique, and you can't just borrow someone
else's design, what do you do?</p>
<p>Design your own. </p>
<p>Cryptography isn't something you can iterate on until you get
it right, because you'll never know if you do. It's best if you design your
protocol up front (before you write <em>any</em> code) and then ask people who are in
the know what they think.</p>
<p>If you're a company, hire a professional cryptographer. She can audit your
design, or (better yet) design one for you. This isn't something you can afford
to get wrong.</p>
<p>If you're just an individual, try emailing your design to the <a href="https://mailman.stanford.edu/mailman/listinfo/liberationtech">Liberation Tech
mailing list</a>. I <a href="https://mailman.stanford.edu/pipermail/liberationtech/2013-June/008924.html">did this</a> for a project and was told
(rightly so) that my design wasn't good enough. Ask for feedback early and
often.</p>
<p>You'll need to pick cryptographic primitives too.</p>
<p>Colin Percival has a great blog post, "<a href="http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html">Cryptographic Right
Answers</a>" which details what ciphers/modes/hashes to use. Even
though it was written in 2009, it's still valid today.</p>
<p>Do not pick wacky modes or unknown ciphers. There is little reason to be
creative when you can be correct. Choosing <a href="http://core.telegram.org/techfaq#q-do-you-use-ige-ige-is-broken">AES-IGE</a> is suspicious and
there's no reason to pick that when you can instead use CFB or CTR.</p>
<h3 id="explanations-are-paramount">Explanations are paramount</h3>
<blockquote>
You know the old saying: “Every ‘why’ has a ‘wherefore.’”
<cite class="character">Dromio of Syracuse</cite>
<cite>The Comedy of Errors by William Shakespeare</cite>
</blockquote>
<p>Cryptographers have <a href="http://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number">nothing up my sleeve numbers</a>. These
numbers are typically values that are static in the cipher, and could be
anything. However, what if you chose your numbers such that they dramatically
reduced the effort an attacker needs?</p>
<p>Sound implausible? <a href="http://blog.cryptographyengineering.com/2013/09/the-many-flaws-of-dualecdrbg.html">It's happened</a>.</p>
<p>So, to show that they have nothing to hide, they use famous numbers such as
π or <em>e</em>.</p>
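<p>SHA-256 is a concrete case: its eight initial hash values are defined as the first 32 bits of the fractional parts of the square roots of the first eight primes, so anyone can recompute them. A small Python sketch of that check (the function and variable names are mine, chosen for illustration):</p>

```python
import math

FIRST_EIGHT_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def sqrt_fraction_bits(p: int) -> int:
    """First 32 bits of the fractional part of sqrt(p), using exact integer math."""
    # isqrt(p << 64) == floor(sqrt(p) * 2**32); its low 32 bits are the fraction.
    return math.isqrt(p << 64) & 0xFFFFFFFF


initial_hash_values = [sqrt_fraction_bits(p) for p in FIRST_EIGHT_PRIMES]
print([f"{h:08x}" for h in initial_hash_values])
```

<p>Because the derivation is this simple and public, there is no room for the designer to have picked the constants to weaken the hash.</p>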
<p>In your design, you similarly need to show that you chose the right cipher
modes, the right IV generation tactic and so on. You'll need to explain <em>why</em>
you chose what you did.</p>
<p>If you've chosen standard algorithms and implementations recommended by
cryptographers, your job is easy. Similarly, if you've copied a design from
another service, your job is easier.</p>
<h2 id="implementation">Implementation</h2>
<p>Often, attacks are easier against <a href="https://www.schneier.com/essay-028.html">the implementation of the
cryptography</a> than against the cryptography itself. These
are known as side channel attacks.</p>
<p><a href="http://en.wikipedia.org/wiki/Timing_attack">Timing attacks</a> are attacks you should especially watch out for.</p>
<p>Say you've signed your REST API with an HMAC and want to compare it against the
value you computed on your server.</p>
<p>Easy, right?</p>
<div class="codehilite"><pre><span></span><span class="nd">@post</span><span class="p">(</span><span class="s1">'/update/<apikey>/<hash_value>'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="n">apikey</span><span class="p">,</span> <span class="n">hash_value</span><span class="p">):</span>
    <span class="n">server_hash_value</span> <span class="o">=</span> <span class="n">compute_hash</span><span class="p">(</span><span class="n">apikey</span><span class="p">,</span> <span class="n">request</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">server_hash_value</span> <span class="o">!=</span> <span class="n">hash_value</span><span class="p">:</span>
        <span class="k">raise</span> <span class="n">HTTPError</span><span class="p">()</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="c1"># Process</span>
</pre></div>
<p>This is wrong.</p>
<p>Because strings are compared one byte at a time (not always true, I know), the
comparison stops at the first difference. Attackers can then work out how many
leading bytes matched by timing the comparison.</p>
<p>Instead, you need to use a <a href="http://rdist.root.org/2010/01/07/timing-independent-array-comparison/">timing attack resistant</a> comparison
function.</p>
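<p>In Python, the standard library's <code>hmac.compare_digest</code> is one such function. The sketch below reworks the check around it. Note that <code>compute_hash</code> here is a hypothetical stand-in (HMAC-SHA256 over the request body with a per-key secret), not the helper from the snippet above, and <code>signature_is_valid</code> is a name invented for this example.</p>

```python
import hashlib
import hmac


def compute_hash(secret: bytes, body: bytes) -> str:
    """Hypothetical stand-in for the server-side HMAC computation."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def signature_is_valid(secret: bytes, body: bytes, client_hash: str) -> bool:
    """Check a client-supplied signature without an early-exit comparison."""
    server_hash = compute_hash(secret, body)
    # compare_digest is designed so that its running time does not depend on
    # where the two values first differ. Encoding to bytes also avoids a
    # TypeError if the client sends non-ASCII text.
    return hmac.compare_digest(server_hash.encode("ascii"), client_hash.encode("utf-8"))
```

<p>The request handler would then raise its error whenever <code>signature_is_valid</code> returns <code>False</code>, instead of using <code>!=</code>.</p>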
<p>Cryptography is littered with seemingly minor implementation gotchas. It's a
long road to writing great cryptography software. The best thing you can do is
embrace the crypto community and ask for help.</p>
<p><a href="http://blog.cryptographyengineering.com/2013/03/here-come-encryption-apps.html">People who aren't aware</a> get this wrong.</p>
<h2 id="the-community">The Community</h2>
<p>After you've designed your software and implemented it, you need to open source
your code and have people review it. Even if you're a company selling software.
If it's not open source, it's not safe. </p>
<p>You'll also need to encourage people to look at your software. Everyone is busy
and not everyone cares about your project or company.</p>
<p>Offer a bug bounty program. <a href="https://www.schneier.com/crypto-gram-9902.html#snakeoil">Not a contest</a>. Actually pay people who
find problems, even minor problems.</p>
<p>And don't make your cryptography project sound like snake oil. Saying "military
grade encryption" or "N bits of security" makes you sound like you don't know what
you're talking about.</p>
<h1 id="cryptography-is-different">Cryptography is different</h1>
<p>You can't just learn cryptography like you learn CSS or Erlang or MySQL. You
need to study it first and then implement it. Otherwise you're
<a href="http://happybearsoftware.com/you-are-dangerously-bad-at-cryptography.html">dangerous</a>. </p>
<p>But don't let that stop you from learning cryptography. Some people will say
<a href="https://web.archive.org/web/20130121031415/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html">you shouldn't be doing crypto</a> at all unless you're a
cryptographer. That's nonsense.</p>
<p>More people should learn cryptography. But you should realize you're no longer
shooting yourself in the foot if you mess up. You'll be hurting other people.</p>
<p>And you'll be responsible.</p>Hackers and Engineering School2013-12-11T08:37:00-08:002013-12-11T08:37:00-08:00Sean Cassidytag:www.seancassidy.me,2013-12-11:/hackers-and-engineering-school.html<p>I'm a hacker and a software engineer. You can be both. It's not <a href="http://dandreamsofcoding.com/2013/09/16/hackers-and-software-engineers/">mutually exclusive</a>. This is how I did it.</p>
<p>I realized I wanted to be an engineer when my parents and I watched Apollo 13. The scene that sticks out in my mind is when there's a problem with carbon dioxide filters. The engineers at NASA are tasked with <a href="http://www.youtube.com/watch?v=C2YZnTL596Q">fitting a square peg into a round hole</a>. The stakes were high, and the problem was exciting.</p>
<div style="text-align: center;"><iframe width="560" height="315" src="//www.youtube.com/embed/C2YZnTL596Q" frameborder="0" allowfullscreen></iframe></div>
<p>This scene is the epitome of hacking: innovative problem solving with constraints. The NASA missions in general were my idea of the epitome of engineering.</p>
<p>I learned everything I could about programming, my favorite aspect of engineering. My first programming language was C. I joined a FIRST Robotics team and took every computing class my high school offered.</p>
<p>I considered myself a hacker. Solved interesting problems on the robotics team, got to level 5 or 6 of pulltheplug.org, and wrote some cool Perl scripts. Installed Slackware, broke my parents' computer and then fixed it. Code fast, learn fast.</p>
<p>Naturally, I decided to go to school for computer engineering. Why not do what I love and get paid for it?</p>
<hr />
<p>During the first week of college, I sat in a large auditorium with other engineers-to-be and listened to a lecturer telling us about great engineering disasters.</p>
<p>It was a strange way to begin. He told us that we carried with us a great responsibility. If we were to be engineers, he said, we would have to act like it. <em>Why was that important?</em> I wondered.</p>
<p>He continued: every time someone turned on the wrong burner on the stove, we were to blame for the fire it caused. It should have been easier to discern which knob was correct. <em>But wasn't it the fault of stupid users? PEBKAC?</em></p>
<p>Every time we wasted a user's time with needless dialog boxes, people would be harmed in some way. Not everyone makes <a href="http://en.wikipedia.org/wiki/Therac-25">medical equipment</a>, but it's unacceptable to be lazy with your user interface. <em>Well, I can do better.</em></p>
<p>Every time a manager pushed to release a product that wasn't ready, we were to blame if we acquiesced. Maybe Challenger wouldn't have exploded if the engineers stood up for what was right, and against <a href="http://en.wikipedia.org/wiki/Go_fever">go fever</a>. <em>Yeah, you're right.</em></p>
<p>We had a moral obligation as professional engineers, he said, to do what was in the best interest of the safety and well-being of our customers. The hacker in me was stunned; what if my hacky Perl scripts or untested robotics code hurt someone? My hacker state of mind was get-shit-done and not worry about the consequences. And that's what I was doing.</p>
<p>When I was taking my differential equations class, the <a href="http://en.wikipedia.org/wiki/I-35W_Mississippi_River_bridge">Mississippi River bridge</a> collapsed. We had just received our midterm exam results. My professor, usually a very positive man who <a href="http://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/lutzer243.pdf">enjoyed juggling hammers</a>, was very upset. He said to us that every red mark on the page was another dead body. I looked at my exam, riddled with evidence of laziness and half effort, and I was ashamed.</p>
<p>A few years later a professor I admired suggested that I join the <a href="http://en.wikipedia.org/wiki/Order_of_the_Engineer">Order of the Engineer</a>. It is the American version of the Canadian <a href="http://en.wikipedia.org/wiki/The_Ritual_of_the_Calling_of_an_Engineer">Ritual of the Calling of an Engineer</a>. The Canadian version was developed after the Quebec bridge collapsed. Their <a href="http://en.wikipedia.org/wiki/Iron_Ring">Iron Ring</a> symbolizes the failed bridge. It is worn on the little finger of your writing hand, such that it is felt and makes a noise while you are working. The ring reminds you of your obligation.</p>
<p>I joined and accepted the obligation. I considered myself, upon receiving it, an engineer who moonlights as a hacker.</p>
<p>I graduated and began to work as a software engineer. We were a software-as-a-service shop, which was new to me, as all my previous jobs were embedded software.</p>
<p>It was apparent to me that what I was doing wasn't precisely what engineering school had prepared me for. We released code fast, and sometimes we tested our code. We broke stuff and fixed it as fast as we could.</p>
<p>We were certainly hackers and programmers, but I didn't think we were being engineers. We lacked rigor.</p>
<p>Maybe this is why so many people say that college isn't required for software. <a href="http://michaelochurch.wordpress.com/2012/12/14/the-unbearable-b-ness-of-software/">For the way much of software is done</a>, school is overkill.</p>
<hr />
<p>Peter Thiel famously <a href="http://www.nytimes.com/2012/09/16/business/the-thiel-fellows-forgoing-college-to-pursue-dreams.html?pagewanted=all&_r=0">says that college is a waste of time</a>. For hacking up some quick scripts or an MVP, it probably is. </p>
<p>Plenty of people get by without going to engineering school. Steve Corona wrote about <a href="http://stevecorona.com/college-was-my-biggest-mistake/">how school wasn't for him</a>. He went to my alma mater. In fact, we were in the same freshman class. He would probably agree with <a href="http://tobi.lutke.com/the-apprentice-programmer">Tobi Lütke</a> when he said, "Not that degrees matter anymore. They do not. Experience does."</p>
<p>I completely agree. However, I feel that engineering school is something that more hackers should experience. It is an experience in and of itself. Being an engineer is different from being a hacker, and you can be both. They are complementary.</p>
<p>Engineering is a mindset, one that carries responsibility for your customers and users. Engineers work to not release <a href="http://www.theguardian.com/technology/2012/dec/10/apple-maps-life-threatening-australian-police">dangerous software</a> and are fanatical about quality.</p>
<p>Hacking is also a mindset, but it is closer to <a href="http://en.wikipedia.org/wiki/Epistemological_anarchism">Feyerabend's methodological anarchy</a> than engineering is. It's about making significant innovations, and less about making great products.</p>
<p>To make great, innovative products, you need to be a hacker and an engineer. Responsible and adventurous. Fast and analytical. Well-read and clever. </p>
<p>You need to hack and to engineer.</p>Strings are untyped2013-12-05T13:05:00-08:002013-12-05T13:05:00-08:00Sean Cassidytag:www.seancassidy.me,2013-12-05:/strings-are-untyped.html<p>There has been a lot of discussion recently about whether or not <a href="http://mortoray.com/2013/11/27/the-string-type-is-broken/">strings are broken</a> or if we even <a href="http://mortoray.com/2013/08/13/we-dont-need-a-string-type/">need them</a>. This misses what I believe to be a more significant issue with strings.</p>
<p>Strings are essentially untyped, like a bare <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html">Object</a> or <a href="http://www.scala-lang.org/api/2.7.2/scala/AnyRef.html">Any</a>. You wouldn't use an Object unless you had to, right? So why do we use strings in the same way?</p>
<p>Has this happened to you?</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="n">String</span> <span class="nf">createPath</span><span class="p">(</span><span class="n">String</span> <span class="n">domain</span><span class="p">,</span> <span class="n">String</span> <span class="n">fileExt</span><span class="p">,</span> <span class="n">String</span> <span class="n">customerId</span><span class="p">,</span> <span class="cm">/* etc */</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// combine all of these in a complicated way</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>And then you switch two of the arguments around?</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">String</span> <span class="n">result</span> <span class="o">=</span> <span class="n">createPath</span><span class="p">(</span><span class="n">domain</span><span class="p">,</span> <span class="n">customerId</span><span class="p">,</span> <span class="n">fileExt</span> <span class="p">...);</span> <span class="c1">// Whoops!</span>
</pre></div>
<p>No compiler error here. Maybe no run time error either, depending on what you use that for. This has bitten me harder than I'd like to admit.</p>
<p>Java, C, C#, and most other popular languages have no way of representing that there is a difference between a "domain" and a "customerId" without making an entire object to distinguish it and boxing it up. Talk about overhead. </p>
<p>This affects even very strongly typed languages like Scala. If you try and fix it in Scala like this (maybe taking inspiration from Haskell's <a href="http://www.haskell.org/haskellwiki/Newtype">newtype</a> and hoping it'll work), it won't work:</p>
<div class="codehilite"><pre><span></span><span class="k">type</span> <span class="kt">CustId</span> <span class="o">=</span> <span class="nc">String</span>
<span class="k">type</span> <span class="kt">FileExt</span> <span class="o">=</span> <span class="nc">String</span>
<span class="k">def</span> <span class="n">func</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">CustId</span><span class="o">)</span> <span class="k">=</span> <span class="o">???</span>
<span class="k">def</span> <span class="n">otherFunction</span><span class="o">(</span><span class="n">b</span><span class="k">:</span> <span class="kt">FileExt</span><span class="o">)</span> <span class="k">=</span> <span class="n">func</span><span class="o">(</span><span class="n">b</span><span class="o">)</span> <span class="c1">// Compiles!</span>
</pre></div>
<p>Using 'type' merely saves you from typing (on a keyboard), and doesn't actually introduce new restrictions.</p>
<p>How can we make it fail without writing Haskell and using newtype?</p>
<h1 id="type-tags">Type tags</h1>
<p>We can use <a href="https://github.com/scalaz/scalaz">Scalaz</a> to get tagged types, which lets us add ancillary types to other types:</p>
<div class="codehilite"><pre><span></span><span class="k">import</span> <span class="nn">scalaz._</span>
<span class="k">trait</span> <span class="nc">CustomerId</span>
<span class="k">def</span> <span class="n">func</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span> <span class="kt">@@</span> <span class="kt">CustomerId</span><span class="o">)</span> <span class="k">=</span> <span class="n">a</span> <span class="o">+</span> <span class="s">" is a CustomerId"</span>
<span class="k">def</span> <span class="n">hello</span><span class="o">()</span> <span class="k">=</span> <span class="n">func</span><span class="o">(</span><span class="s">"Hello"</span><span class="o">)</span> <span class="c1">// Compilation error!</span>
</pre></div>
<p>Because "Hello" is not of the type String @@ CustomerId, it's a compilation error. To use it, we need to be able to construct CustomerId tagged strings easily, like this:</p>
<div class="codehilite"><pre><span></span><span class="k">trait</span> <span class="nc">CustomerId</span>
<span class="k">def</span> <span class="nc">CustomerId</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="kt">@@</span> <span class="kt">CustomerId</span> <span class="o">=</span> <span class="nc">Tag</span><span class="o">[</span><span class="kt">String</span><span class="p">,</span> <span class="kt">CustomerId</span><span class="o">](</span><span class="n">a</span><span class="o">)</span>
<span class="k">def</span> <span class="n">func</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span> <span class="kt">@@</span> <span class="kt">CustomerId</span><span class="o">)</span> <span class="k">=</span> <span class="n">a</span> <span class="o">+</span> <span class="s">" is a CustomerId"</span>
<span class="k">def</span> <span class="n">hello</span><span class="o">()</span> <span class="k">=</span> <span class="n">func</span><span class="o">(</span><span class="nc">CustomerId</span><span class="o">(</span><span class="s">"Hello"</span><span class="o">))</span> <span class="c1">// Works!</span>
</pre></div>
<p>Now we have a function called CustomerId which constructs String @@ CustomerId from a String.</p>
<p>This seems verbose, though. What was that keyword we used when we needed to type less? Oh yeah! 'type'!</p>
<div class="codehilite"><pre><span></span><span class="k">trait</span> <span class="nc">CustomerIdTag</span>
<span class="k">type</span> <span class="kt">CustomerId</span> <span class="o">=</span> <span class="nc">String</span> <span class="o">@@</span> <span class="nc">CustomerIdTag</span>
<span class="k">def</span> <span class="nc">CustomerId</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">CustomerId</span> <span class="o">=</span> <span class="nc">Tag</span><span class="o">[</span><span class="kt">String</span><span class="p">,</span> <span class="kt">CustomerIdTag</span><span class="o">](</span><span class="n">a</span><span class="o">)</span>
</pre></div>
<p>These three lines make dealing with strings safer. Haskell does it in one line, but three isn't that bad. <a href="https://groups.google.com/forum/#!topic/golang-nuts/v0F0_Fy6-hM">Go also solves this effectively</a>.</p>
<p>Java/C++/C# can't do this without boxing up the string into another object. Let me know if there are any languages that can do something similar to this; I'd love to know.</p>
<h2 id="why-is-this-better">Why is this better?</h2>
<p>Why should we preserve the underlying String type? Because it is very useful.</p>
<p>We can form paths with it, URLs, log properly, format emails properly, and so on. Strings are useful, and this keeps that.</p>
<p>Boxing a String up is less useful because you constantly have to unbox it via .get() or similar. And as soon as you unbox it, it loses its type protection.</p>
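<p>Python's <code>typing.NewType</code> is one more take on the same idea, with the caveat that it is enforced only by a static checker such as mypy and not at run time. A minimal sketch, with <code>CustomerId</code>, <code>FileExt</code>, and <code>create_path</code> as illustrative names:</p>

```python
from typing import NewType

CustomerId = NewType("CustomerId", str)
FileExt = NewType("FileExt", str)


def create_path(customer_id: CustomerId, file_ext: FileExt) -> str:
    # Both values are still plain str objects, so string operations just work.
    return f"/customers/{customer_id}/export.{file_ext}"


cust = CustomerId("42")
ext = FileExt("csv")
print(create_path(cust, ext))
# A static checker flags the swapped call below; the interpreter would not:
# create_path(ext, cust)
```

<p>As with tagged types, nothing is boxed: the value stays an ordinary string, and only the type checker sees the distinction.</p>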
<h1 id="strings-are-untyped-so-add-typing-information">Strings are untyped, so add typing information</h1>
<p>Don't use multiple different types of plain strings near each other. It's like passing around Objects.</p>
<p>Instead, seek out your favorite programming language's solution to this problem. Haskell, Scala, and other very strongly typed languages offer solutions. If you need to, box them up in another object.</p>Don't Pipe to your Shell2013-10-31T16:30:00-07:002013-10-31T16:30:00-07:00Sean Cassidytag:www.seancassidy.me,2013-10-31:/dont-pipe-to-your-shell.html<p>Piping wget or curl to bash or sh is stupid. Like this:</p>
<div class="codehilite"><pre><span></span>wget -O - http://example.com/install.sh <span class="p">|</span> sudo sh
</pre></div>
<p><a href="https://github.com/saltstack/salt-bootstrap">It's</a> <a href="https://github.com/isaacs/npm/issues/1641">everywhere</a>. Sometimes they tell you to ignore certificates as well (looking at you, Salt). That's dumb.</p>
<p>The main reason I think it's dumb (other than running arbitrary commands on your machine that could change based on user agent to trick you) is its failure mode.</p>
<p>What happens if the connection closes mid stream? Let's find out.</p>
<div class="codehilite"><pre><span></span><span class="o">(</span><span class="nb">echo</span> -n <span class="s2">"echo \"Hello\""</span><span class="p">;</span> cat<span class="o">)</span> <span class="p">|</span> nc -l -p <span class="m">5555</span>
</pre></div>
<p>This will send a command to whoever connects, but won't send the newline. Then, it'll hang. Let's connect the client:</p>
<div class="codehilite"><pre><span></span>nc localhost <span class="m">5555</span> <span class="p">|</span> sh
</pre></div>
<p>At first, nothing happens. Great. What will happen if we kill -9 the listening netcat? Will sh execute the partial command in its buffer?</p>
<p>Yes.</p>
<div class="codehilite"><pre><span></span>nc localhost <span class="m">5555</span> <span class="p">|</span> sh
Hello
</pre></div>
<p>But what about wget, or curl?</p>
<div class="codehilite"><pre><span></span>wget -O - http://localhost:5555 <span class="p">|</span> sh
--2013-10-31 <span class="m">16</span>:22:38-- http://localhost:5555/
Resolving localhost <span class="o">(</span>localhost<span class="o">)</span>... <span class="m">127</span>.0.0.1
Connecting to localhost <span class="o">(</span>localhost<span class="o">)</span><span class="p">|</span><span class="m">127</span>.0.0.1<span class="p">|</span>:5555... connected.
HTTP request sent, awaiting response... <span class="m">200</span> No headers, assuming HTTP/0.9
Length: unspecified
Saving to: <span class="sb">`</span>STDOUT<span class="err">'</span>
<span class="o">[</span> <<span class="o">=</span>> <span class="o">]</span> <span class="m">12</span> --.-K/s in <span class="m">8</span>.6s
<span class="m">2013</span>-10-31 <span class="m">16</span>:22:47 <span class="o">(</span><span class="m">1</span>.40 B/s<span class="o">)</span> - written to stdout <span class="o">[</span><span class="m">12</span><span class="o">]</span>
Hello
</pre></div>
<p>What if that partial command wasn't a harmless echo but instead one of these:</p>
<div class="codehilite"><pre><span></span><span class="nv">TMP</span><span class="o">=</span>/tmp
<span class="nv">TMP_DIR</span><span class="o">=</span><span class="sb">`</span>mktemp<span class="sb">`</span>
rm -rf <span class="nv">$TMP_DIR</span>
</pre></div>
<p>Harmless, right? And what if the connection closes immediately after 'rm -rf $TMP' is sent? It'll delete everything in the temp directory, which is certainly harmful.</p>
<p>This might be unlikely, but the results of this happening, even once, could be catastrophic.</p>
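<p>To make the failure mode concrete, here is a minimal Python sketch (not part of the original experiment) that models which lines <code>sh</code> would run if the stream ended after a given number of characters. The function name <code>commands_seen_by_shell</code> is hypothetical, and the model assumes, as the netcat test above showed, that the shell runs a trailing partial line when its input ends.</p>

```python
# The three-line script from the example above.
SCRIPT = "TMP=/tmp\nTMP_DIR=`mktemp`\nrm -rf $TMP_DIR\n"


def commands_seen_by_shell(script, chars_received):
    """Return the lines a shell reading from a pipe would execute if the
    stream ended after `chars_received` characters.

    Assumption: a trailing partial line is executed at end of input,
    which is what the netcat experiment demonstrated.
    """
    received = script[:chars_received]
    return [line for line in received.split("\n") if line]


# Cut the stream right after "rm -rf $TMP", before "_DIR" arrives.
cut = SCRIPT.index("rm -rf $TMP") + len("rm -rf $TMP")

for line in commands_seen_by_shell(SCRIPT, cut):
    print(line)
```

<p>The last line printed is the truncated command, which now names a different variable than the author intended. Downloading the whole script to a file before running it avoids this, because a short download can be detected first.</p>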
<p>Friends don't let friends pipe to sh.</p>
<p><em>Update</em>: I updated the last example because it really made no sense. Thanks to <a href="http://www.reddit.com/r/programming/comments/1pnkxs/dont_pipe_to_your_shell/cd442fz">player2</a> and <a href="http://www.reddit.com/r/programming/comments/1pnkxs/dont_pipe_to_your_shell/cd44pjw">ZackMcAck</a> on reddit.</p>How to Organize Your Brain with Bookmark Tags2013-10-14T14:41:00-07:002013-10-14T14:41:00-07:00Sean Cassidytag:www.seancassidy.me,2013-10-14:/how-to-organize-your-brain-with-bookmark-tags.html<blockquote>
Odd as it may seem I am my remembering self, and the experiencing self, who
does my living, is like a stranger to me.
<cite class="character"><a href="http://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow">Thinking, Fast and Slow</a></cite>
<cite>Daniel Kahneman</cite>
</blockquote>
<p>I keep track of my reading online with a useful method: I bookmark most interesting things I read, and then I tag them with every word I can think of. This scales much better than the alternative, which is <a href="http://www.theatlantic.com/magazine/archive/2008/07/is-google-making-us-stupid/306868/">trying to Google everything</a>.</p>
<p>Let me share an example.</p>
<p>I was reading <a href="http://ericlippert.com/2013/05/20/what-is-lexical-scoping/">a blog post Eric Lippert wrote about lexical scoping</a> when I remembered that this is one of the things that annoyed me about Chef recipes: resource names aren't lexically scoped in all cases.</p>
<p>Not wanting to forget this, I tagged it:</p>
<blockquote>
<p>chef
dynamic
knife
lexical
opscode
programming
scoping
static
typing</p>
</blockquote>
<p>So, now when I'm looking for articles about lexical scoping, or Chef, or programming in general, this article will come up.</p>
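<p>Under the hood, this kind of lookup is just an inverted index from tags to bookmarks. The sketch below is a hypothetical illustration in Python, not how Firefox stores tags; the <code>TagIndex</code> class and its method names are made up for the example.</p>

```python
from collections import defaultdict


class TagIndex:
    """A tiny inverted index: each tag maps to the set of URLs carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, url, tags):
        """File one bookmark under every tag given."""
        for tag in tags:
            self._by_tag[tag.lower()].add(url)

    def search(self, *tags):
        """Return the bookmarks that carry all of the given tags."""
        matches = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches) if matches else set()


index = TagIndex()
index.add(
    "http://ericlippert.com/2013/05/20/what-is-lexical-scoping/",
    ["chef", "dynamic", "knife", "lexical", "opscode",
     "programming", "scoping", "static", "typing"],
)

print(index.search("chef"))
print(index.search("lexical", "scoping"))
```

<p>Because every tag points at the bookmark, any of the words you thought of at tagging time is enough to find it again.</p>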
<h1 id="how-to-do-it">How to do it</h1>
<p>The method here is to sit and think for a few seconds about every single thing that comes to mind when you think of an article or discussion you read.</p>
<p>It's helpful to have a few category tags, like "programming", or "politics", but don't worry about strictly categorizing anything. If it's both politics and programming, put it in both!</p>
<p>You should put between 5 and 10 tags on each bookmark. You'll surely get some false positives when searching, but that's better than the alternative.</p>
<p>If you make the investment to do this, you'll need a browser that supports it, like Firefox. <a href="http://code.google.com/p/chromium/issues/detail?id=17536">Chrome still doesn't support bookmark tags</a> although there are extensions that do. Some people use del.icio.us or Xmarks to do the same thing. I don't like social bookmarks as much, so I use Firefox.</p>
<p>I find it useful to recall information this way. Maybe you will too.</p>You are not a 10x Developer2013-09-23T21:50:00-07:002013-09-23T21:50:00-07:00Sean Cassidytag:www.seancassidy.me,2013-09-23:/you-are-not-a-10x-developer.html<p>There has been a lot of <a href="http://priceonomics.com/whats-so-special-about-star-engineers/">discussion</a> about <a href="https://medium.com/about-work/6aedba30ecfe">10x engineers</a> lately. Do they exist? Are you one?</p>
<h1 id="whats-a-10x-developer">What's a 10x Developer?</h1>
<p>10x developers are quasi-mythical programmers whose ability to design complex systems and hammer out production-ready code is legendary. Their impact on their project or team is felt for years after they stop contributing, and their contributions don't stop there.</p>
<p>10x what, you ask? The name is vague. Perhaps it's 10 times more productive than a median developer.</p>
<h1 id="do-10x-developers-exist">Do 10x developers exist?</h1>
<p>Are there developers who are immensely better than others?</p>
<p>Let's ask this in a slightly different way: what is the probability distribution of human ability? Most people assume it's normally distributed:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg"><img src="/images/500px-Standard_deviation_diagram.svg.png" alt="Normal Distribution"></a></p>
<p>If you're lucky, then, you're one or two standard deviations above the mean. This means that there aren't really many great developers, nor are there many terrible ones. Perhaps they find another career.</p>
<p>Assuming it's normally distributed isn't that bad of an assumption. The IQ test, for instance, is normally distributed. But there's a major problem with that: the IQ test is normalized. Is intelligence really normally distributed? <a href="http://www.ncurproceedings.org/ojs/index.php/NCUR2012/article/view/159">Maybe</a>.</p>
<h2 id="hiring-developers">Hiring Developers</h2>
<p>Have you hired developers? How many applicants fail at each stage of the process? Many friends I've quizzed about this report that each step of the process tends to remove a majority of the remaining candidates.</p>
<p>Would this be the case if developer ability were normally distributed?</p>
<p>Perhaps it's something else.</p>
<h2 id="evidence">Evidence</h2>
<p>But, instead of using anecdotes and feelings, let's consult some studies.</p>
<p>Two professors of scientific management sought to answer this question: is individual performance normally distributed?</p>
<p>They answered this question in a paper called "<a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2011.01239.x/full">The Best and the Rest: Revisiting the Norm of Normality of Individual Performance</a>." Unfortunately, this paper is behind a paywall, so I'll summarize.</p>
<p>If you're an academic, your life motto becomes, "Publish or perish." If we measure the total number of papers each scientist publishes and then lump them into groups, we can measure the size of each group.</p>
<p>What does that group distribution look like<sup id="fnref:powerfoot"><a class="footnote-ref" href="#fn:powerfoot">1</a></sup>?</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/powergroup.png" width="600" alt="Power Law Distribution"></a></p>
<p>This is a power law or a Pareto distribution<sup id="fnref:zipf"><a class="footnote-ref" href="#fn:zipf">2</a></sup>. Well, science is a tough job. What about another field? The authors found that these distinct fields had a similar distribution:</p>
<ul>
<li>Emmy nominations</li>
<li>Nominations to the US House of Representatives</li>
<li>Major League Baseball player errors</li>
<li>NBA career points</li>
</ul>
<p>This is not an accident or a fluke of statistics. <a href="http://programmers.stackexchange.com/a/181297">Many studies show similar results</a>. Human ability is not distributed normally, but instead according to a power law distribution.</p>
<p>Further, the power law distribution indicates that people who are truly exceptional are not just 10 times better than the median; they are 100 times better or more. These people are extraordinarily rare.</p>
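<p>A rough way to see the difference is to compare how far the top 0.1% sits above the median under each distribution. The Python sketch below is an illustration only: the normal scale (mean 100, standard deviation 15) and the Pareto shape parameter (<code>alpha=1.16</code>, roughly the "80/20" shape) are assumptions chosen for the example, not values taken from the paper, and the function names are hypothetical.</p>

```python
from statistics import NormalDist


def normal_top_to_median(mean=100.0, sd=15.0, top=0.999):
    """Ratio of the `top` quantile to the median for a normal distribution."""
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(top) / dist.inv_cdf(0.5)


def pareto_quantile(p, alpha, x_min=1.0):
    """Quantile function of a Pareto distribution with shape `alpha`."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def pareto_top_to_median(alpha=1.16, top=0.999):
    """Ratio of the `top` quantile to the median for a Pareto distribution."""
    return pareto_quantile(top, alpha) / pareto_quantile(0.5, alpha)


print("normal:", round(normal_top_to_median(), 2))
print("pareto:", round(pareto_top_to_median(), 1))
```

<p>Under these assumptions the normal ratio stays below 2, while the Pareto ratio is in the hundreds. The exact multiple depends heavily on the shape parameter, so treat it as an order-of-magnitude illustration.</p>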
<p>Most people are below average. Why would you think software engineering is an exception to this rule?</p>
<h2 id="back-to-hiring">Back to Hiring</h2>
<p>If developer ability were distributed in a similar way, so that most developers aren't good and very few are highly skilled, what would we expect to happen when we try to hire engineers?</p>
<p>We would expect the vast majority to fail even <a href="http://www.joelonsoftware.com/items/2005/01/27.html">simple</a> <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">tests</a>. And that's what happens.</p>
<p>Obviously, un- and under-employed developers are over-represented among applicants, because the good developers already have very rewarding, high-paying jobs. So this isn't definitive proof by anyone's metric, but it is telling.</p>
<p>How do you handle these giants among men if you do happen to hire one? I have no idea. <a href="http://michaelochurch.wordpress.com/2012/11/25/programmer-autonomy-is-a-1-trillion-issue/">Autonomy</a> is probably a good place to start. There is another paper by the same authors called, "<a href="http://onlinelibrary.wiley.com/doi/10.1111/peps.12054/abstract">Star Performers in Twenty-First-Century Organizations</a>", which deals with management techniques for power law employees. Good luck!</p>
<h1 id="who-are-these-10x-developers">Who are these 10x developers?</h1>
<p><a href="http://bellard.org/">Fabrice Bellard</a> <a href="http://blog.smartbear.com/careers/fabrice-bellard-portrait-of-a-super-productive-programmer/">would count</a> in my book. As would <a href="http://research.google.com/people/jeff/index.html">Jeff Dean</a>, and maybe even <a href="http://en.wikipedia.org/wiki/Doug_Lea">Doug Lea</a>. They certainly exist.</p>
<p>What sets them apart from the rest of us?</p>
<p>They solve not only difficult computer science or software engineering problems, but they solve ones that need solving. Often before the rest of us realize we have or will have that problem. They also solve <em>lots</em> of them.</p>
<p>Fabrice Bellard, for instance, has written a 4G LTE base station, QEMU, FFmpeg, JSLinux, and two C compilers, and has broken the record for most digits of pi calculated, among other things. Writing even one of those projects would make you an awesome developer. Having all of them on your resume makes you phenomenally good.</p>
<h2 id="am-i-a-10x-developer">Am I a 10x Developer?</h2>
<p>Probably not.</p>
<p>The flip side of this Pareto coin is that they are extraordinarily rare. You probably are not <a href="http://www.10xmanagement.com/">hiring 10x developers</a>, and the idea that you only hire them is just ludicrous.</p>
<p>Focus on hiring solid employees you can train and less on rockstar ninja super coders. </p>
<h2 id="can-i-become-a-10x-developer">Can I become a 10x Developer?</h2>
<p>Maybe.</p>
<p><a href="https://medium.com/about-work/6aedba30ecfe">Others have spoken on this topic</a> and declared that the drive to be phenomenally good will likely be detrimental to your well-being. I tend to agree.</p>
<p>Being phenomenally good at programming isn't about <a href="http://bradmilne.tumblr.com/post/45829792502/the-one-tip-that-will-help-you-learn-to-code-10x-faster">coding 10x faster</a> (horrific advice in that post, by the way), or about following <a href="http://adamloving.com/internet-programming/10x-developers">"tips" from dubious sources</a>. </p>
<p>Will <a href="http://blog.geekli.st/post/38424333039/the-10x-developer-in-you">working at night help you be 10x</a>? Of course not.</p>
<p>Instead, you can work much more effectively in a different way.</p>
<h1 id="become-a-multiplier">Become a multiplier</h1>
<p>What I am trying to do instead of being, on my own, phenomenally good, is to <a href="http://michaelochurch.wordpress.com/2012/01/26/the-trajectory-of-a-software-engineer-and-where-it-all-goes-wrong/">become a multiplier</a> for the teams I work on. Naturally, Bellard and his ilk are so good on their own that they are multipliers. </p>
<p>If I can write a library to help my fellow engineers, or debug a build process gone awry, I've not only done some cool stuff, but I've helped them be more effective. </p>
<p>Linus Torvalds is the most well-known multiplier out there. How many people are more productive today than they would be without Linux or git? Knowing which problems to solve is just as important as knowing how to solve them.</p>
<p>If you focus <a href="https://www.seancassidy.me/on-being-nice.html">on helping others</a>, by writing useful software that solves real problems, rather than on how many lines of code you can type, your ability and career will show that effort. <a href="http://c2.com/cgi/wiki?LoneWolf">Lone wolves</a> are just that: alone, and they struggle to influence their environment and others.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:powerfoot">
<p>This is a reproduction of the actual graph with similar but fake data. I'm not allowed to reproduce the actual graph here. <a class="footnote-backref" href="#fnref:powerfoot" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:zipf">
<p>Zipf's law is another example of a power law. I <a href="https://www.seancassidy.me/zipf-your-variable-names.html">discussed this distribution and programming</a> a while back. <a class="footnote-backref" href="#fnref:zipf" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Windows ruins everything2013-09-12T17:45:00-07:002013-09-12T17:45:00-07:00Sean Cassidytag:www.seancassidy.me,2013-09-12:/windows-ruins-everything.html<p>Recently, at work, we had an annoying bug in our code which came about from a <em>seemingly</em> harmless refactor.</p>
<h1 id="the-bug">The Bug</h1>
<p>We have an API that was recently refactored from this:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kt">void</span> <span class="nf">upload</span><span class="p">(</span><span class="n">File</span> <span class="n">file</span><span class="p">,</span> <span class="n">String</span> <span class="n">remoteDir</span><span class="p">,</span> <span class="n">String</span> <span class="n">remoteName</span><span class="p">)</span> <span class="p">{</span>
<span class="n">client</span><span class="p">.</span><span class="na">upload</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">remoteDir</span><span class="p">,</span> <span class="n">remoteName</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>To this using Apache Commons-IO's <a href="http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FilenameUtils.html">FilenameUtils</a>:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kt">void</span> <span class="nf">upload</span><span class="p">(</span><span class="n">File</span> <span class="n">file</span><span class="p">,</span> <span class="n">String</span> <span class="n">remotePath</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">remoteDir</span> <span class="o">=</span> <span class="n">FilenameUtils</span><span class="p">.</span><span class="na">getPath</span><span class="p">(</span><span class="n">remotePath</span><span class="p">);</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">remoteName</span> <span class="o">=</span> <span class="n">FilenameUtils</span><span class="p">.</span><span class="na">getName</span><span class="p">(</span><span class="n">remotePath</span><span class="p">);</span>
<span class="n">client</span><span class="p">.</span><span class="na">upload</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">remoteDir</span><span class="p">,</span> <span class="n">remoteName</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>This API worked better with our general use case and we were all happy.</p>
<p>Until we had a bug.</p>
<h2 id="the-differences">The differences</h2>
<p>Can you see what the differences are between these two snippets of code? Obviously the use of FilenameUtils, but what problem would that cause?</p>
<p>Well, it turns out some code that was using this API generated a superfluous slash for the remotePath, so that it looked like this:</p>
<div class="codehilite"><pre><span></span><span class="err">//directory1/directory2/file.txt</span>
</pre></div>
<p>But we all know how UNIX paths work! Extra slashes are harmless. And the client object worked with the extra slash before. So what was the issue?</p>
<h2 id="windows-and-cross-platform-support">Windows and cross-platform support</h2>
<p>In Windows, you can easily reference files on a networked server with what's called <a href="http://en.wikipedia.org/wiki/Path_%28computing%29#UNC_in_Windows">UNC paths</a>:</p>
<div class="codehilite"><pre><span></span><span class="err">\\MainServer1\rootdir\seconddir\file.txt</span>
</pre></div>
<p>It's pretty useful. And, <a href="http://superuser.com/a/176395">since you can use forward slashes instead of back slashes</a>, the following is equivalent:</p>
<div class="codehilite"><pre><span></span><span class="err">//MainServer1/rootdir/seconddir/file.txt</span>
</pre></div>
<p>So, when FilenameUtils wants to get the path from a given String, it needs to first get rid of the server name.</p>
<div class="codehilite"><pre><span></span><span class="n">FilenameUtils</span><span class="p">.</span><span class="na">getPath</span><span class="p">(</span><span class="s">"//MainServer1/rootdir/seconddir/file.txt"</span><span class="p">)</span> <span class="o">==</span> <span class="s">"rootdir/seconddir/"</span>
</pre></div>
<p>Which is not what we wanted.</p>
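<p>The same surprise is easy to reproduce outside Java. As a hedged illustration, Python's standard library ships both Windows-style (<code>ntpath</code>) and POSIX-style (<code>posixpath</code>) path handling. This is not Commons IO and the details differ, but it shows how reading a double leading slash as a UNC prefix swallows the first directories of our path.</p>

```python
import ntpath
import posixpath

# The path our buggy caller generated, with a superfluous leading slash.
path = "//directory1/directory2/file.txt"

# Windows rules: "//directory1/directory2" is read as a UNC server and
# share, so it is split off as the "drive".
windows_drive, windows_rest = ntpath.splitdrive(path)

# POSIX rules: there is no drive concept, so no directory is stripped.
posix_dir, posix_name = posixpath.split(path)

print(windows_drive, windows_rest)
print(posix_dir, posix_name)
```

<p>Under Windows rules, only <code>/file.txt</code> is left as the path proper. Under POSIX rules, both directories survive.</p>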
<h1 id="the-fix">The fix</h1>
<p>Well our paths were not being cleanly normalized, so I set out to do so. My first instinct was this:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="n">String</span> <span class="n">remoteDir</span> <span class="o">=</span> <span class="n">FilenameUtils</span><span class="p">.</span><span class="na">getPath</span><span class="p">(</span><span class="n">remotePath</span><span class="p">.</span><span class="na">replaceAll</span><span class="p">(</span><span class="s">"/+"</span><span class="p">,</span><span class="s">"/"</span><span class="p">));</span>
</pre></div>
<p>But then I thought, well certainly it would be better to completely normalize our paths. And hey! FilenameUtils has that as well.</p>
<p>Unfortunately, even normalization of UNIX style paths is not possible with FilenameUtils. From the documentation:</p>
<div class="codehilite"><pre><span></span><span class="err">//server/foo/../bar --> //server/bar</span>
<span class="err">//server/../bar --> null</span>
</pre></div>
<p>This is the case even if you set the boolean unixSeparator to true. You can't disable UNC behavior in FilenameUtils.</p>
<p>We can use getFullPath instead of getPath, but that doesn't strip the extraneous starting slash.</p>
<p>So, instead, I used my replaceAll function. Perhaps replaceAll and then normalization is even better.</p>
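<p>For comparison, here is the same collapse-then-split idea as a small Python sketch. It is an illustration under assumptions, not a port of our Java code: <code>split_remote_path</code> is a hypothetical helper, and unlike <code>FilenameUtils.getPath</code> it keeps the leading slash on the directory part.</p>

```python
import posixpath
import re


def split_remote_path(remote_path):
    """Collapse runs of slashes, then split into (directory, file name).

    Collapsing first means a stray leading "//" can never be mistaken
    for a UNC server prefix by later path handling.
    """
    collapsed = re.sub(r"/+", "/", remote_path)
    return posixpath.split(collapsed)


print(split_remote_path("//directory1/directory2/file.txt"))
```

<p>With the extra slash removed up front, the directory and file name come out the same whether or not the caller generated a superfluous slash.</p>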
<h1 id="why-windows-ruins-everything">Why Windows ruins everything</h1>
<p>Getting cross platform behavior right is tricky. It's cool that there is support for UNC paths in Commons IO, but not being able to normalize UNIX paths properly sucks.</p>
<p>And this <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment">surprising behavior</a> caused a bug in our code. Well, generating an extra starting slash was a small bug, but since we knew that it was "harmless" we left it alone.</p>
<p>Yeah, we should have read the documentation more closely. But, since we're not even running on Windows, UNC paths were not on our minds.</p>
<p>This is why Windows ruined my day: we weren't using it, and yet we still had to know about it. It makes people (like the Apache folks) create surprising behavior because of the large number of different path formats needed to support it.</p>Don't Give Up and Die2013-08-20T18:55:00-07:002013-08-20T18:55:00-07:00Sean Cassidytag:www.seancassidy.me,2013-08-20:/dont-give-up-and-die.html<p>Today, <a href="http://www.groklaw.net/article.php?story=20130818120421175">Groklaw shuttered its doors</a> due to concerns about ongoing privacy invasions. I understand PJ's point of view and sympathize deeply, but I disagree. It is more important than ever to not give up, to not stop writing, to not stop writing privacy software.</p>
<p>When your government is tyrannical, you should not give up. You must right this injustice. When Jefferson wrote the Declaration of Independence, he said that when any government does not provide for the safety of its people and protect their innate human rights it must be abolished<sup id="fnref:govt"><a class="footnote-ref" href="#fn:govt">1</a></sup>.</p>
<p>I think that's unobtainable and undesirable<sup id="fnref:better"><a class="footnote-ref" href="#fn:better">2</a></sup> in our modern world. Instead, we should join the EFF and the ACLU. We should protest openly and write our representatives messages. We should also solve the problem of communicating securely.</p>
<p>We need to give people the tools they need to fight back.</p>
<p>I wrote the <a href="https://bitbucket.org/scassidy/dinet">Diluvian Network</a> to provide a <a href="https://www.seancassidy.me/the-origins-of-the-diluvian-network.html">safe and anonymous version of CryptoCat</a>. It wasn't particularly well received when I <a href="http://www.mail-archive.com/liberationtech@lists.stanford.edu/msg04914.html">sent it to LiberationTech</a>, but that's the point of peer review in security software. Software gets reviewed, and then fixed or dropped if it's bad enough. If more people were thinking about this problem, we could solve it.</p>
<p>The best open source project you could help with would be <a href="https://bitmessage.org/wiki/Main_Page">BitMessage</a>. I plan on reviewing BitMessage in depth as I was trying to solve a similar problem, and you should too. Particularly, if we could make it as easy to use as CryptoCat is, we would be a great deal closer to solving anonymous and secure communication.</p>
<p>We must make sure that in a few years' time, no one will say this again<sup id="fnref:freespeech"><a class="footnote-ref" href="#fn:freespeech">3</a></sup>:</p>
<blockquote>
<p>There is now no shield from forced exposure. Nothing in that parenthetical
thought list is terrorism-related, but no one can feel protected enough from
forced exposure any more to say anything the least bit like that to anyone in
an email, particularly from the US out or to the US in, but really anywhere.
You don't expect a stranger to read your private communications to a friend.
And once you know they can, what is there to say? Constricted and distracted.
That's it exactly. That's how I feel. </p>
</blockquote>
<div class="footnote">
<hr />
<ol>
<li id="fn:govt">
<p>It was telling for me personally that I debated publishing this paragraph, for fear that some governmental official would read it and not the paragraph that followed it. <a class="footnote-backref" href="#fnref:govt" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:better">
<p>For a long reason why I feel this way, read <a href="http://www.amazon.com/The-Better-Angels-Our-Nature/dp/0143122010/">The Better Angels of Our Nature</a> by Steven Pinker. <a class="footnote-backref" href="#fnref:better" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:freespeech">
<p>I think NSA snooping is particularly troubling because of the chilling effects it has on free speech. I think that because of the negative effect it has on free speech, the US government will eventually stop doing this. They've certainly done much worse before. Check out <a href="http://www.amazon.com/From-Palmer-Raids-Patriot-Act/dp/0807044288/">From the Palmer Raids to the Patriot Act</a> by Christopher Finan for some excellent history on free speech in America. Did you know there were actually two red scares? <a class="footnote-backref" href="#fnref:freespeech" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>On Being Nice2013-07-22T11:53:00-07:002013-07-22T11:53:00-07:00Sean Cassidytag:www.seancassidy.me,2013-07-22:/on-being-nice.html<h1 id="or-why-linus-is-wrong-about-being-a-jerk">Or, why Linus is wrong about being a jerk</h1>
<p>Linus and co. made a lot of news recently about how <a href="https://lkml.org/lkml/2013/7/15/329">Greg KH should be tougher on people</a> who contribute patches to -stable:</p>
<blockquote>
<p>So Greg, if you want it all to change, create some real threat: be frank with contributors and sometimes swear a bit.</p>
</blockquote>
<p>and</p>
<blockquote>
<p>You may need to learn to shout at people.</p>
</blockquote>
<p>I disagree that being a jerk is required. In fact, I believe it to be harmful.</p>
<p>Now, I understand that Ingo Molnar and Linus Torvalds have a lot more experience in open source development than I do, but we can learn about this from scientific studies and statistics, rather than depending on unreliable personal experience.</p>
<h1 id="how-to-motivate-your-subordinates-to-be-better">How to motivate your subordinates to be better</h1>
<p>A friend of mine told me recently that every time he messed up a landing while parachuting, his instructors really chewed him out. He said that this was beneficial, as making an error while landing can be extremely dangerous. Each time after he was yelled at, he did better. But is such tough love actually helpful?</p>
<p>In general, which works better: reward or punishment?</p>
<p>In <a href="http://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow">Thinking, Fast and Slow</a>, Daniel Kahneman relates a story about when he was consulting with the Israeli air force. He gave a lecture on how rewarding good behavior produces better results than punishing bad behavior. An instructor, having listened to the lecture, responded:</p>
<blockquote>
<p>"On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver. The next time they try the same maneuver they usually do worse. On the other hand, I have often screamed into a cadet's earphone for bad execution, and in general he does better on his next try."</p>
</blockquote>
<p>The flight instructor, like Linus and Ingo, correctly observes what seems to follow from being nice and being hard: if someone is detailed and meticulous in their patch submissions and Linus et al. are positive, the submitter is likely to be less careful next time. But is this a result of the compliment or of something else?</p>
<h1 id="regression-to-the-mean">Regression to the Mean</h1>
<p>In fact, it is not due to the positivity but rather in spite of it. <a href="http://www.ncbi.nlm.nih.gov/pubmed/21419628">Studies</a> <a href="http://amj.aom.org/content/25/4/810.short">have</a> <a href="http://www.sciencedirect.com/science/article/pii/S0749597805001184">shown</a> that rewards outperform punishment, so why do people think the opposite?</p>
<p>What Linus and the flight instructor take to be the effectiveness of being harsh is actually <a href="http://en.wikipedia.org/wiki/Regression_toward_the_mean">regression to the mean</a>. How well you land your parachute jump follows a probability distribution. Most of your jumps are average, some are really great, and some are pretty bad. When you land extremely well, it is likely due to luck rather than an immense, sudden increase in your abilities. Therefore, your next jump is very likely to be worse.</p>
<p>If your instructor compliments you on your great jump and then you do worse, he may draw the false inference that it was his compliment, rather than regression to the mean. The same is true for your crappy landings: you're very likely to do better next time regardless of how much your instructor yells at you.</p>
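<p>A short simulation makes the effect visible. The Python sketch below is a hypothetical model, not data from any study: it assumes every jump is an independent draw from the same distribution, so no praise or yelling can influence the next jump at all. The function name and parameters are made up for the example.</p>

```python
import random


def average_change_after(kind, n_jumps=100_000, seed=0):
    """Average change in score on the jump following an extreme jump.

    kind="bad" looks at jumps in the worst tenth, kind="good" at jumps
    in the best tenth. Scores are independent draws (an assumption), so
    any change is pure regression to the mean.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_jumps)]

    ordered = sorted(scores)
    tenth = n_jumps // 10
    low_cut, high_cut = ordered[tenth], ordered[-tenth]

    changes = []
    for previous, following in zip(scores, scores[1:]):
        is_bad = previous <= low_cut
        is_good = previous >= high_cut
        if (kind == "bad" and is_bad) or (kind == "good" and is_good):
            changes.append(following - previous)
    return sum(changes) / len(changes)


print("after a bad jump: ", average_change_after("bad"))
print("after a good jump:", average_change_after("good"))
```

<p>Even though nothing in the model reacts to feedback, the jump after a bad one improves on average and the jump after a good one gets worse. An instructor who yells after bad jumps and praises after good ones would see exactly this pattern.</p>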
<p>This, in my estimation, is what Linus and Ingo have missed. Yelling at people isn't effective, but it can seem that way.</p>
<h1 id="a-robust-personality">A Robust Personality</h1>
<p><a href="http://en.wikipedia.org/wiki/Robustness_principle">The robustness principle</a> is an important aspect of writing quality software. It states:</p>
<blockquote>
<p>Be conservative in what you do, be liberal in what you accept from others.</p>
</blockquote>
<p>This should be applied to how we, as programmers, interact with each other, regardless of how idiotic the question or contributor may be.</p>
<blockquote>
<p>Accept impoliteness from others, but always respond with courtesy.</p>
</blockquote>
<p>The world would be a better place for it.</p>
<p><img alt="Linus being nice" src="https://www.seancassidy.me/static/images/" /></p>
<p><em>Update</em>: discuss on <a href="https://news.ycombinator.com/item?id=6086229">Hacker News</a></p>Bus Factors and Walk Score2013-06-18T13:50:00-07:002013-06-18T13:50:00-07:00Sean Cassidytag:www.seancassidy.me,2013-06-18:/bus-factors-and-walk-score.html<p>The <a href="http://en.wikipedia.org/wiki/Bus_factor">bus factor</a> is an important thing to keep track of for both commercial and open source software development. If your bus factor is low (below two), you'll be in trouble if that developer leaves or cannot contribute any longer.</p>
<p>It gets worse. Not only is your bus factor probably too low for critical parts of your project, but it's in the nature of software that a small group of people make the largest impact<sup id="fnref:pareto"><a class="footnote-ref" href="#fn:pareto">1</a></sup>. In general, human performance follows a power law<sup id="fnref:power"><a class="footnote-ref" href="#fn:power">2</a></sup>, so your first few contributors are the <em>most</em> important.</p>
<p>Losing them is painful.</p>
<h1 id="the-walk-score">The Walk Score</h1>
<p>A colleague and I were discussing the bus factor and how we could improve it. We recently improved it on several key projects from 1 to about 1.5, and we obviously wanted to increase it further.</p>
<p>I noticed, however, that I was significantly more concerned about projects that were a lot harder to work on, and less concerned about projects that were easier to work on. In fact, one of our older systems has a bus factor of 3 or 4, but is so difficult to start working on that it is as big a business risk as an easy project with a bus factor of 1.5.</p>
<p>There is another element to the bus factor that is important to consider which I call The Walk Score.</p>
<h2 id="how-nice-is-your-software-neighborhood">How nice is your software neighborhood?</h2>
<p>Imagine your software as a neighborhood. Is it a friendly, welcoming place? Or is it full of dark alleys and crimes against software development?</p>
<p>The higher your walk score, the more enjoyable it is to "walk around" your code. The higher your walk score, the easier it is for your colleagues to work on the project and the less dire your low bus factor becomes.</p>
<p>This is as important as, if not more important than, your bus factor. You can get away with a low bus factor if your walk score is high.</p>
<p>A nice software neighborhood should include maps (tutorials, examples, and overviews), be easy to get to (getting, compiling and deploying the code), and be a nice place to hang around (well written and with good tests).</p>
<p>After all, don't you want to live in a nice neighborhood?</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:pareto">
<p>Goeminne, Mathieu, and Tom Mens. "<a href="http://ceur-ws.org/Vol-708/fuhr-et-al-11-proceedings-mdsm2011-sqm2011.pdf">Evidence for the pareto principle in open source software activity.</a>" Magiel Bruntink et Kostas Kontogiannis, éditeurs: CSMR 2011 Workshop on Software Quality and Maintainability (SQM). Vol. 701. 2011. <a class="footnote-backref" href="#fnref:pareto" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:power">
<p>AGUINIS, HERMAN. "<a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2011.01239.x/abstract">The best and the rest: Revisiting the norm of normality of individual performance.</a>" Personnel Psychology 65.1 (2012): 79-119. <a class="footnote-backref" href="#fnref:power" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Wiggle the mouse to fix the test2013-05-16T17:50:37-07:002013-05-16T17:50:37-07:00Sean Cassidytag:www.seancassidy.me,2013-05-16:/wiggle-the-mouse-to-fix-the-test.html<p>At my current job, we needed to move an aging web service into our job management system for reliability reasons.</p>
<p>After implementing the code and merging it into our master branch, the unit tests were failing for some people but not others. Particularly, if you wiggled your mouse during these …</p><p>At my current job, we needed to move an aging web service into our job management system for reliability reasons.</p>
<p>After implementing the code and merging it into our master branch, the unit tests were failing for some people but not others. Particularly, if you wiggled your mouse during these non-interactive backend tests, the unit tests would pass!</p>
<h1 id="the-tests">The Tests</h1>
<p>I modified our extensive unit test suite to add a new workflow using the third-party JARs that the old web service used, and it was working great. I was working in a small branch because my coworkers were working on a bigger feature in master. When they checked the tests and did a code review, they pulled my change into master.</p>
<p>And then the tests broke.</p>
<p>Alright, we thought, there is some sort of test conflict. But the tests always worked on my machine. And worked most of the time on my coworkers' machines. And never worked on Jenkins. But always worked when actually deployed to our dev environment.</p>
<p>We tried loading our machines to see if there was a CPU or disk load issue, but there wasn't. Running tests individually would work, but not when run all together with Maven.</p>
<p>After extensive analysis and debugging, my coworker found that if you were actively using your computer while Maven was running, the test would work. Otherwise it would time out.</p>
<p>How is that possible?</p>
<h2 id="a-short-detour-to-explain-our-workflow-system">A short detour to explain our workflow system</h2>
<p>To understand how this bug will break the tests, I figured I would explain how our workflow system works at a very high level. We have tasks that do something like so:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">DownloadSomething</span> <span class="kd">extends</span> <span class="n">Task</span> <span class="p">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">Result</span> <span class="nf">do</span><span class="p">(</span><span class="n">TaskInfo</span> <span class="n">info</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Downloader</span> <span class="n">downloader</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Downloader</span><span class="p">(</span><span class="n">info</span><span class="p">.</span><span class="na">getInput</span><span class="p">());</span>
<span class="k">while</span> <span class="p">(</span><span class="n">downloader</span><span class="p">.</span><span class="na">notDone</span><span class="p">())</span> <span class="p">{</span>
<span class="n">Chunk</span> <span class="n">chunk</span> <span class="o">=</span> <span class="n">downloader</span><span class="p">.</span><span class="na">getNextChunk</span><span class="p">();</span>
<span class="n">appendToFile</span><span class="p">(</span><span class="n">chunk</span><span class="p">);</span>
<span class="n">heartbeat</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">success</span><span class="p">(</span><span class="s">"Downloaded"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>A workflow is made up of tasks like this one, which have input and produce output, such as a file. However, each task might fail either because the download URL is incorrect, or the network connection is down. </p>
<p>If your implementation of do() was merely</p>
<div class="codehilite"><pre><span></span><span class="k">while</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Thread</span><span class="p">.</span><span class="na">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>we would interrupt the task and the test would fail.</p>
<p>So we have logic that will time out a task if it does not heartbeat early and often. We have special InputStream classes which wrap other InputStreams and heartbeat as the consumer is reading them.</p>
<p>This is a relatively common pattern.</p>
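<p>As a rough illustration of the heartbeating stream idea, here is a minimal sketch built on <code>FilterInputStream</code>. It is not our actual class: the name <code>HeartbeatingInputStream</code>, the <code>Runnable</code> callback, and the <code>countHeartbeats</code> helper are all assumptions made for the example.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that signals liveness every time a consumer reads from it. */
public class HeartbeatingInputStream extends FilterInputStream {
    private final Runnable heartbeat;

    public HeartbeatingInputStream(InputStream in, Runnable heartbeat) {
        super(in);
        this.heartbeat = heartbeat;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        heartbeat.run(); // beat after every read attempt, including end of stream
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        heartbeat.run();
        return n;
    }

    /** Drains data in chunks of chunkSize and reports how many heartbeats fired. */
    static int countHeartbeats(byte[] data, int chunkSize) {
        AtomicInteger beats = new AtomicInteger();
        try (InputStream in = new HeartbeatingInputStream(
                new ByteArrayInputStream(data), beats::incrementAndGet)) {
            byte[] buf = new byte[chunkSize];
            while (in.read(buf, 0, buf.length) != -1) {
                // A real task would process the chunk here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return beats.get();
    }
}
```

<p>The catch is that a wrapper like this only beats while somebody is actually reading from it. If the code holding the stream is blocked elsewhere, no heartbeat is sent.</p>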
<h2 id="what-was-making-this-task-timeout">What was making this task timeout?</h2>
<p>This particular task read some <a href="http://en.wikipedia.org/wiki/X.509">X.509 certificates</a> from disk and then called the proprietary JARs with their proprietary logic. Reading the certificates was taking <em>minutes</em>. Literally five or more minutes on occasion. Why?</p>
<p>Our special heartbeating InputStream wasn't even helping, so the code that was allegedly reading certificates was doing something else entirely. </p>
<p>Something that only took a while when you <em>weren't</em> using your computer.</p>
<h1 id="the-cause">The Cause</h1>
<p>If you wanted some secure random numbers, for encryption purposes, how would you do that? In Java, the way is <a href="http://docs.oracle.com/javase/7/docs/api/java/security/SecureRandom.html">SecureRandom</a>. How does SecureRandom work?</p>
<p>If you look at the <a href="http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/java/security/SecureRandom.java">source code for SecureRandom</a>, the default constructor calls getDefaultPRNG, which gets a provider, such as <a href="http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/security/provider/SecureRandom.java">sun.security.provider.SecureRandom</a> to provide actual secure random numbers.</p>
<p>To get the random data, the user calls SecureRandom.nextBytes:</p>
<div class="codehilite"><pre><span></span><span class="kd">synchronized</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">nextBytes</span><span class="p">(</span><span class="kt">byte</span><span class="o">[]</span> <span class="n">bytes</span><span class="p">)</span> <span class="p">{</span>
<span class="n">secureRandomSpi</span><span class="p">.</span><span class="na">engineNextBytes</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>which calls the provider-specific implementation of random number generation, such as the one in sun.security.provider.SecureRandom. If there was no seed specified (as there often isn't), the implementation will call a seed generator class such as <a href="http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/security/provider/SeedGenerator.java">sun.security.provider.SeedGenerator</a>.</p>
<p>SeedGenerator has a few implementations, one of which is an interesting threading randomness generator. The other is URLSeedGenerator, which uses /dev/random:</p>
<div class="codehilite"><pre><span></span><span class="kd">final</span> <span class="kd">static</span> <span class="n">String</span> <span class="n">URL_DEV_RANDOM</span> <span class="o">=</span> <span class="n">SunEntries</span><span class="p">.</span><span class="na">URL_DEV_RANDOM</span><span class="p">;</span> <span class="c1">// /dev/random</span>
<span class="c1">// snip</span>
<span class="kd">static</span> <span class="kd">class</span> <span class="nc">URLSeedGenerator</span> <span class="kd">extends</span> <span class="n">SeedGenerator</span> <span class="p">{</span>
<span class="n">URLSeedGenerator</span><span class="p">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="p">{</span>
<span class="k">this</span><span class="p">(</span><span class="n">SeedGenerator</span><span class="p">.</span><span class="na">URL_DEV_RANDOM</span><span class="p">);</span>
<span class="p">}</span>
<span class="nd">@Override</span>
<span class="kt">void</span> <span class="nf">getSeedBytes</span><span class="p">(</span><span class="kt">byte</span><span class="o">[]</span> <span class="n">result</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="na">length</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">read</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">try</span> <span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">read</span> <span class="o"><</span> <span class="n">len</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">count</span> <span class="o">=</span> <span class="n">devRandom</span><span class="p">.</span><span class="na">read</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">read</span><span class="p">,</span> <span class="n">len</span> <span class="o">-</span> <span class="n">read</span><span class="p">);</span>
<span class="c1">// /dev/random blocks - should never have EOF</span>
<span class="k">if</span> <span class="p">(</span><span class="n">count</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">throw</span> <span class="k">new</span> <span class="n">InternalError</span><span class="p">(</span><span class="s">"URLSeedGenerator "</span> <span class="o">+</span> <span class="n">deviceName</span> <span class="o">+</span>
<span class="s">" reached end of file"</span><span class="p">);</span>
<span class="n">read</span> <span class="o">+=</span> <span class="n">count</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// snip</span>
<span class="p">}</span>
</pre></div>
<p>So it reads from <a href="http://en.wikipedia.org/wiki//dev/random">/dev/random</a> where appropriate. What's an important difference between /dev/random and its counterpart /dev/urandom? Well, /dev/random can and will block on Linux when there isn't enough randomness to go around.</p>
<p>To test this theory (that the certificate loading code was, for whatever reason, consuming a lot of entropy) we did this:</p>
<div class="codehilite"><pre><span></span><span class="c1"># mv /dev/random /dev/random.bkup</span>
<span class="c1"># ln -s /dev/urandom /dev/random</span>
</pre></div>
<p>And the problem went away! Tests were completely consistent now.</p>
<h2 id="why-did-wiggling-the-mouse-fix-the-tests">Why did wiggling the mouse fix the tests?</h2>
<p><a href="http://en.wikipedia.org/wiki/Entropy_%28computing%29#Linux_kernel">Linux uses multiple sources</a> to generate entropy for /dev/random. On the Jenkins build server and when we weren't using our computers, /dev/random would run out of entropy quickly and block.</p>
<p>Using the computer (which I almost always do while the build is running) kept it working, which is why I didn't notice any failures.</p>
<p>Since SecureRandom is a pseudo-random number generator that uses a cryptographically secure seed, it should use /dev/urandom instead of /dev/random, in my opinion.</p>
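<p>If you control the calling code, you can ask for a non-blocking source explicitly. The sketch below assumes Java 8 or later on a Unix-like system, where the <code>NativePRNGNonBlocking</code> algorithm reads from /dev/urandom. It does not exist in the Java 7 sources linked above, so the code falls back to the platform default when it is missing. The class and method names are made up for the example.</p>

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class NonBlockingRandom {
    /** Prefers the /dev/urandom-backed generator, else the platform default. */
    static SecureRandom create() {
        try {
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            // Not available on this JVM or platform; the default may still block.
            return new SecureRandom();
        }
    }

    /** Returns n random bytes from the generator chosen by create(). */
    static byte[] randomBytes(int n) {
        byte[] out = new byte[n];
        create().nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(create().getAlgorithm());
        System.out.println(randomBytes(16).length + " random bytes generated");
    }
}
```

<p>This only helps for code you write yourself. A third-party JAR that constructs its own SecureRandom is unaffected, which is why we had to look elsewhere for a fix.</p>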
<h2 id="why-was-loading-an-x509-certificate-using-random-numbers">Why was loading an X.509 certificate using random numbers?</h2>
<p>Ah, the magic of mystery third-party libraries. Using the awesome Java decompiler <a href="http://java.decompiler.free.fr/">JD</a>, I decompiled the JARs we were given and poked around a little. In the load certificate method, I found this:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">static</span> <span class="n">X509Certificate</span> <span class="nf">loadCert</span><span class="p">(</span><span class="n">InputStream</span> <span class="n">paramInputStream</span><span class="p">)</span> <span class="kd">throws</span> <span class="n">CertificateException</span>
<span class="p">{</span>
<span class="n">CertificateFactory</span> <span class="n">localCertificateFactory</span> <span class="o">=</span> <span class="n">CertificateFactory</span><span class="p">.</span><span class="na">getInstance</span><span class="p">(</span><span class="s">"X.509"</span><span class="p">,</span> <span class="n">ProviderInit</span><span class="p">.</span><span class="na">getProvider</span><span class="p">());</span>
<span class="n">X509Certificate</span> <span class="n">localX509Certificate</span> <span class="o">=</span> <span class="p">(</span><span class="n">X509Certificate</span><span class="p">)</span><span class="n">localCertificateFactory</span><span class="p">.</span><span class="na">generateCertificate</span><span class="p">(</span><span class="n">paramInputStream</span><span class="p">);</span>
<span class="c1">// snip</span>
</pre></div>
<p>Isn't Java wonderfully concise? But wait, what's that ProviderInit business? More JD magic gives us:</p>
<div class="codehilite"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">ProviderInit</span> <span class="p">{</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="n">Provider</span> <span class="nf">getProvider</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">initDone</span><span class="p">)</span>
<span class="n">init</span><span class="p">();</span>
<span class="k">return</span> <span class="n">provider</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">init</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">initDone</span><span class="p">)</span>
<span class="k">return</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">CryptoJ</span><span class="p">.</span><span class="na">isFIPS140Compliant</span><span class="p">())</span> <span class="p">{</span>
<span class="n">CryptoJ</span><span class="p">.</span><span class="na">setMode</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">CryptoJ</span><span class="p">.</span><span class="na">selfTestPassed</span><span class="p">())</span> <span class="p">{</span>
<span class="k">throw</span> <span class="k">new</span> <span class="n">RuntimeException</span><span class="p">(</span><span class="s">"Crypto-J is disabled"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// snip</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>So, when ProviderInit is first run, it does some self tests using CryptoJ, which is from an ancient Java crypto library made by RSA called JSAFE. These CryptoJ self tests use SecureRandom liberally, and thus take a long time if there is little entropy.</p>
<h1 id="the-fix">The Fix</h1>
<p>All we need to do to fix our tests is to call ProviderInit.init() when the JVM loads, rather than when our time-sensitive code is being run. Easy enough!</p>
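<p>The shape of that fix can be sketched as follows. <code>SlowProvider</code> is a hypothetical stand-in for the third-party class, and <code>warmUp</code> and <code>timeSensitiveTask</code> are invented names. Because the decompiled <code>init()</code> shown earlier is private, the sketch triggers initialization through the public <code>getProvider()</code> instead.</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

public class EagerInit {
    /** Hypothetical stand-in for a library class that runs slow self tests on first use. */
    static class SlowProvider {
        private static boolean initDone = false;
        static final AtomicInteger selfTestRuns = new AtomicInteger();

        static synchronized String getProvider() {
            if (!initDone) {
                selfTestRuns.incrementAndGet(); // stands in for the slow self tests
                initDone = true;
            }
            return "provider";
        }
    }

    /** Call once at startup, before any deadline-bound task is scheduled. */
    static void warmUp() {
        SlowProvider.getProvider();
    }

    /** Stand-in for the time-sensitive task; the slow path has already run by now. */
    static String timeSensitiveTask() {
        return SlowProvider.getProvider();
    }

    /** Number of times the simulated self tests have executed. */
    static int selfTestRuns() {
        return SlowProvider.selfTestRuns.get();
    }

    public static void main(String[] args) {
        warmUp();
        System.out.println(timeSensitiveTask());
        System.out.println("self tests ran " + selfTestRuns() + " time(s)");
    }
}
```

<p>The self tests still run and still consume entropy. They just do it at startup, where no heartbeat deadline is ticking.</p>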
<p>The lesson: use /dev/urandom when appropriate, and don't run self tests in non-debug code.</p>A Difficult Bug2013-05-08T14:32:00-07:002013-05-08T14:32:00-07:00Sean Cassidytag:www.seancassidy.me,2013-05-08:/a-difficult-bug.html<p>I recently had a tough bug to deal with, and I think it makes for a good story.</p>
<h1 id="background">Background</h1>
<p>As part of my work on <a href="https://www.seancassidy.me/the-origins-of-the-diluvian-network.html">DiNet</a>, I needed to write a <a href="http://en.wikipedia.org/wiki/Radix_tree">Radix trie</a> for reasons that should be obvious if you understand how important prefixes are to DiNet and its …</p><p>I recently had a tough bug to deal with, and I think it makes for a good story.</p>
<h1 id="background">Background</h1>
<p>As part of my work on <a href="https://www.seancassidy.me/the-origins-of-the-diluvian-network.html">DiNet</a>, I needed to write a <a href="http://en.wikipedia.org/wiki/Radix_tree">Radix trie</a> for reasons that should be obvious if you understand how important prefixes are to DiNet and its HTTP API.</p>
<p>I wrote a <a href="https://bitbucket.org/scassidy/dinet/src/cf44bedb796246056824c9cd3222169916bef8c1/radix/radix.c?at=master">first draft of a radix trie</a> and it was awful. It was slow, difficult to reason about, and almost certainly buggy in surprising ways. But as this was a pet project and I was having fun, I just soldiered on.</p>
<p>Until my program started crashing, that is.</p>
<p>After about a day of running, processing only a little bit of data - one packet every ten seconds - the server would crash. The stacktrace was always in the radix trie code.</p>
<p>This buggy, poorly written code had to go. I <a href="https://bitbucket.org/scassidy/data-structures/src/master/radixtrie/radix.c">rewrote the radix trie</a>, added tests, and ran it through Valgrind. The new version was fast and correct. It had no memory leaks, invalid reads or writes, or other memory issues.</p>
<p>However, when running the same test, it segfaulted in the same place! With a similar stacktrace! How could this be possible?</p>
<h1 id="debugging-with-gdb">Debugging with GDB</h1>
<p>The problem was that after a while, nodes went missing. A parent node kept track of its children with a list of pointers, which was apparently being overwritten by <em>something</em>.</p>
<p>Further, while I stepped through the code, nothing was wrong! The logic was there, and actually relatively well tested. Static analysis was also failing me.</p>
<p>One thing jumped out at me though as a glaring red flag: if I tracked which node had the issue, it was always at the same address. Curious, I set a <a href="http://beej.us/guide/bggdb/#hardwatch">hardware watchpoint</a> for writes on that address. For those who are unfamiliar, a watchpoint will stop program execution whenever the address is accessed, and can be configured to stop for read or write access.</p>
<p>So, my program would break on that address. Since this address was a pointer to a node in the radix trie, I expected that only the radix trie code would access it. I was wrong.</p>
<p>Something was calling free on that very address! But what? It looked like it was a <a href="https://bitbucket.org/scassidy/dinet/src/ca68f145b19fb6837c0455672fcc7c50a7e5cf60/cache.c?at=master#cl-23">completely different section of code, the LRU cache</a>:</p>
<div class="codehilite"><pre><span></span><span class="kt">int</span> <span class="nf">add_to_cache</span><span class="p">(</span><span class="k">struct</span> <span class="n">lru_cache</span> <span class="o">*</span><span class="n">cache</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">cache_key</span><span class="p">,</span> <span class="kt">int</span> <span class="n">key_len</span><span class="p">,</span>
<span class="kt">void</span> <span class="o">**</span><span class="n">deleted_ptr</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">free_deleted</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">hash_struct</span> <span class="o">*</span><span class="n">tmp</span><span class="p">,</span> <span class="o">*</span><span class="n">tmp_entry</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">deleted</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="c1">// Find any entry in the LRU with key cache_key of length</span>
<span class="c1">// key_len, if any, and store it in tmp </span>
<span class="n">HASH_FIND</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">,</span> <span class="n">tmp</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tmp</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// If there was no entry, see if it will fit</span>
<span class="k">if</span> <span class="p">(</span><span class="n">HASH_COUNT</span><span class="p">(</span><span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">)</span> <span class="o">>=</span> <span class="n">cache</span><span class="o">-></span><span class="n">max_items</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Cache is full, so remove the eldest node</span>
<span class="n">HASH_ITER</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="n">tmp_entry</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Delete the uthash data</span>
<span class="n">HASH_DELETE</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">tmp</span><span class="p">);</span>
<span class="c1">// Set the pointer we will delete here so we can do</span>
<span class="c1">// other stuff with it later</span>
<span class="k">if</span> <span class="p">(</span><span class="n">deleted_ptr</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="n">deleted_ptr</span> <span class="o">=</span> <span class="n">tmp</span><span class="o">-></span><span class="n">data</span><span class="p">;</span>
<span class="n">deleted</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// If we're freeing the data for the caller, do so now</span>
<span class="k">if</span> <span class="p">(</span><span class="n">free_deleted</span><span class="p">)</span>
<span class="n">free</span><span class="p">(</span><span class="n">tmp</span><span class="o">-></span><span class="n">data</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">tmp</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Insert the new node into the LRU</span>
<span class="k">struct</span> <span class="n">hash_struct</span> <span class="o">*</span><span class="n">hs</span> <span class="o">=</span> <span class="n">dmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">hash_struct</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">hs</span><span class="o">-></span><span class="n">key</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">);</span>
<span class="n">HASH_ADD</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">,</span> <span class="n">hs</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// if we deleted anything, return non-zero</span>
<span class="k">return</span> <span class="n">deleted</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>I added some extra comments so that it's easier to read for someone unfamiliar with the code. But let's take it step by step.</p>
<div class="codehilite"><pre><span></span><span class="kt">int</span> <span class="nf">add_to_cache</span><span class="p">(</span><span class="k">struct</span> <span class="n">lru_cache</span> <span class="o">*</span><span class="n">cache</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">cache_key</span><span class="p">,</span> <span class="kt">int</span> <span class="n">key_len</span><span class="p">,</span>
<span class="kt">void</span> <span class="o">**</span><span class="n">deleted_ptr</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">free_deleted</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">hash_struct</span> <span class="o">*</span><span class="n">tmp</span><span class="p">,</span> <span class="o">*</span><span class="n">tmp_entry</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">deleted</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="c1">// Find any entry in the LRU with key cache_key of length</span>
<span class="c1">// key_len, if any, and store it in tmp </span>
<span class="n">HASH_FIND</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">,</span> <span class="n">tmp</span><span class="p">);</span>
</pre></div>
<p>So, what the <em>add_to_cache</em> function does is take an <em>lru_cache</em>, some data, the lookup key, and the length of that key, and place the data in the LRU if it's not already there. The other two parameters are optional features: when it removes something from the cache, it can place the data pointer that was there in <em>deleted_ptr</em>, and it can optionally free that pointer if <em>free_deleted</em> is true.</p>
<p>I used the excellent <a href="http://troydhanson.github.io/uthash/">uthash</a> library to implement a hash table that was size limited and would remove the eldest member, which is always a <a href="https://bitbucket.org/scassidy/dinet/src/ca68f145b19fb6837c0455672fcc7c50a7e5cf60/cache.h?at=master#cl-8">struct hash_struct</a>. The <em>HASH_FIND</em> call here will return the value stored at <em>cache_key</em>, if any, and NULL otherwise.</p>
<div class="codehilite"><pre><span></span> <span class="k">if</span> <span class="p">(</span><span class="n">tmp</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// If there was no entry, see if it will fit</span>
<span class="k">if</span> <span class="p">(</span><span class="n">HASH_COUNT</span><span class="p">(</span><span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">)</span> <span class="o">>=</span> <span class="n">cache</span><span class="o">-></span><span class="n">max_items</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Cache is full, so remove the eldest node</span>
<span class="n">HASH_ITER</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="n">tmp_entry</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Delete the uthash data</span>
<span class="n">HASH_DELETE</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">tmp</span><span class="p">);</span>
</pre></div>
<p>If there was no key at this location, we need to put it into the LRU. So we count how many items are currently in the cache and compare that against the precomputed max. If we're full, iterate through the cache with <em>HASH_ITER</em>, which visits entries in eldest-first order. Then, remove that first (eldest) entry from the cache with <em>HASH_DELETE</em>.</p>
<div class="codehilite"><pre><span></span> <span class="c1">// Set the pointer we will delete here so we can do</span>
<span class="c1">// other stuff with it later</span>
<span class="k">if</span> <span class="p">(</span><span class="n">deleted_ptr</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="n">deleted_ptr</span> <span class="o">=</span> <span class="n">tmp</span><span class="o">-></span><span class="n">data</span><span class="p">;</span>
<span class="n">deleted</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// If we're freeing the data for the caller, do so now</span>
<span class="k">if</span> <span class="p">(</span><span class="n">free_deleted</span><span class="p">)</span>
<span class="n">free</span><span class="p">(</span><span class="n">tmp</span><span class="o">-></span><span class="n">data</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">tmp</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Using <em>HASH_DELETE</em> removes it from the cache, but it doesn't handle the rest of the memory allocation. So, if the user of <em>add_to_cache</em> asked to be told which data was deleted, store it. If we were told to free the data, free it here. Then free the metadata associated with the cache and break from the loop. We only ever want to remove one node, so we don't need to loop.</p>
<p>And to finish up the function, we add the data to the cache.</p>
<div class="codehilite"><pre><span></span> <span class="c1">// Insert the new node into the LRU</span>
<span class="k">struct</span> <span class="n">hash_struct</span> <span class="o">*</span><span class="n">hs</span> <span class="o">=</span> <span class="n">dmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">hash_struct</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">hs</span><span class="o">-></span><span class="n">key</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">);</span>
<span class="n">HASH_ADD</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">,</span> <span class="n">hs</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// if we deleted anything, return non-zero</span>
<span class="k">return</span> <span class="n">deleted</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>But why was the LRU code trying to delete a radix trie node? It should never do that, as the data pointer in the code didn't have anything to do with radix trie nodes.</p>
<p>Did you spot the bug? Try and find it before continuing on to the answer.</p>
<h2 id="the-fix">The Fix</h2>
<p>To solve this bug, I went back to where it was crashing: in the LRU, on a free of a data pointer supplied by the consumer of the LRU and radix trie.</p>
<div class="codehilite"><pre><span></span><span class="c1">// If we're freeing the data for the caller, do so now</span>
<span class="k">if</span> <span class="p">(</span><span class="n">free_deleted</span><span class="p">)</span>
<span class="n">free</span><span class="p">(</span><span class="n">tmp</span><span class="o">-></span><span class="n">data</span><span class="p">);</span>
</pre></div>
<p>And how was this data pointer set? It was passed into the <em>add_to_cache</em> function, as we saw, and then completely forgotten about. We were calling <em>free</em> on an uninitialized pointer. That pointer just so happened to always point to a radix trie node.</p>
<p>So, <a href="https://bitbucket.org/scassidy/dinet/commits/0b4f9494d4fee0222d8248690898111fa2684a39#Lcache.cT50">the fix</a> was straightforward. Just keep track of the data pointer. Here is the diff:</p>
<div class="codehilite"><pre><span></span> <span class="c1">// Insert the new node into the LRU</span>
<span class="k">struct</span> <span class="n">hash_struct</span> <span class="o">*</span><span class="n">hs</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">hash_struct</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">hs</span><span class="o">-></span><span class="n">key</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">);</span>
<span class="o">+</span> <span class="n">hs</span><span class="o">-></span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="n">HASH_ADD</span><span class="p">(</span><span class="n">hh</span><span class="p">,</span> <span class="n">cache</span><span class="o">-></span><span class="n">cache</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">key_len</span><span class="p">,</span> <span class="n">hs</span><span class="p">);</span>
</pre></div>
<p>Such a simple mistake.</p>
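<p>As a minimal sketch of one way to make this class of mistake harder, allocation and initialization of a node can live in a single helper that sets every field. The names <em>entry</em>, <em>entry_new</em>, and <em>KEY_LEN</em> are hypothetical and not taken from dinet, and the uthash handle is left out to keep the example self-contained:</p>

```c
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 16 /* hypothetical key size, for illustration only */

/* Hypothetical stand-in for the cache's node type. */
struct entry {
    unsigned char key[KEY_LEN];
    void *data;
};

/* Allocate a node and initialize every field in one place.
 * calloc zeroes the allocation, so a field that is forgotten here
 * starts out as zero bytes instead of leftover heap contents. */
static struct entry *entry_new(const void *key, void *data) {
    struct entry *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    memcpy(e->key, key, KEY_LEN);
    e->data = data; /* the assignment that was missing in the bug */
    return e;
}
```

<p>With a helper like this, every caller gets a fully initialized node, and there is only one place to check when a field is added to the struct.</p>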
<h1 id="preventing-this-bug">Preventing this bug</h1>
<p>This bug had multiple causes, and debugging it would have been easier if I had used best practices.</p>
<p>Some of the causes were:</p>
<ol>
<li>Making the code do too much (why does <em>add_to_cache</em> do so much for the user, such as freeing memory?)</li>
<li>Weak separation of concerns (whose job is it to free the memory?)</li>
<li>Poorly documented code contracts</li>
<li>Not enough unit testing</li>
</ol>
<p>To make the debugging faster, two things should have been done: writing more and better unit tests, and using Valgrind (specifically <em>--track-origins=yes</em>) earlier. I had used Valgrind to verify my radix trie, but then I was convinced I could find this bug without it. Valgrind would have pointed me to the invalid free much more quickly than static analysis or GDB did.</p>
<p>It's important to try and understand why the bugs we had manifested themselves. Understanding how they occurred is an important step in preventing them. Especially when using a language that lets you get away with a lot, such as C in this case, it's paramount to have good testing and to use all of your debugging tools.</p>
<p><em>Update</em>: Other practices that were suggested to me that would have helped to avoid and debug this scenario:</p>
<p>Setting a pointer to NULL after using it like so:</p>
<div class="codehilite"><pre><span></span><span class="cp">#define dfree(ptr) do { free(ptr); ptr = NULL; } while(0)</span>
</pre></div>
<p>This may have made the problem easier to debug.</p>
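<p>A short sketch of the macro in use, assuming a hypothetical <em>entry</em> struct that owns its <em>data</em> pointer. The macro argument is parenthesized here as a defensive habit, which is a small change from the version above:</p>

```c
#include <stdlib.h>

/* Free a pointer and reset it so stale uses show up as NULL. */
#define dfree(ptr) do { free(ptr); (ptr) = NULL; } while (0)

/* Hypothetical owner of a heap-allocated payload. */
struct entry {
    void *data;
};

/* Release the payload. Calling this twice is harmless, because the
 * second call passes NULL to free, which does nothing. */
static void entry_release(struct entry *e) {
    dfree(e->data);
}
```

<p>After the first release the pointer reads as NULL, so a later dereference tends to fail immediately and visibly rather than touching whatever happens to occupy the freed memory.</p>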
<p>Another idea that would have made it obvious that I was freeing uninitialized memory would have been to use <em>calloc</em> instead of <em>malloc</em>. I used <em>calloc</em> elsewhere and I'm not sure why I decided to use <em>malloc</em> in this case.</p>
The Origins of the Diluvian Network2013-04-22T18:55:00-07:002013-04-22T18:55:00-07:00Sean Cassidytag:www.seancassidy.me,2013-04-22:/the-origins-of-the-diluvian-network.html
<p>Last year, sometime around June, I heard of <a href="https://crypto.cat/">Cryptocat</a> and thought it was a pretty cool idea. Someone actually making cryptography available for the masses in an intuitive way without needing complicated key exchanges or a web of trust! Finally.</p>
<p>However, it was not, at the time, very secure. Bruce Schneier <a href="http://www.schneier.com/blog/archives/2012/08/cryptocat.html">commented on Cryptocat</a> that it was similar to Hushmail in its security: you have to trust them to keep your data safe. That's not really very secure for anything worth hiding. Matthew Green did <a href="http://blog.cryptographyengineering.com/2013/03/here-come-encryption-apps.html">some analysis on Cryptocat</a>, which wasn't very positive from a security perspective.</p>
<p>There was also a big gap in Cryptocat from my perspective: a lack of anonymity. That's not important for many people, but if you want to "topple an oppressive government" as Dr. Green puts it, it is important. So what to do?</p>
<p>While I'm interested in cryptography and have taken several university-level classes in it, I am by no means an expert in cryptography. How could I build a system that was more secure and more anonymous? By solving a different problem.</p>
<p>Imagine a new version of Cryptocat where all the messages on the network are sent to every member who is online. As they are all encrypted, the conversations are still private. However, you can no longer easily tell who is communicating with whom<sup id="fnref:messages"><a class="footnote-ref" href="#fn:messages">1</a></sup>. Your encryption will need to be top-notch, though, as it would be trivial to get all encrypted messages.</p>
<p>That sounds like a lot of data, though. It won't scale to thousands of users. Why don't we limit the amount of data delivered to each user? Let's put an ID on each message, and have each user take only messages whose ID starts with a particular prefix. Then, before communicating, interested parties would share their prefix, so that we'd all be communicating on the same channel.</p>
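<p>The prefix idea can be sketched as a small filter function. This is an illustration rather than dinet's actual code: the name <em>id_has_prefix</em> is made up, the 16-byte ID length comes from the message format given in this post, and matching on whole bytes (rather than bits) is an assumption:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DINET_ID_LEN 16 /* size of the id field in the message format */

/* Return 1 if the message ID begins with the given prefix, else 0.
 * An empty prefix matches every message; a prefix longer than the
 * ID matches none. */
static int id_has_prefix(const uint8_t id[DINET_ID_LEN],
                         const uint8_t *prefix, size_t prefix_len) {
    if (prefix_len > DINET_ID_LEN)
        return 0;
    return memcmp(id, prefix, prefix_len) == 0;
}
```

<p>A shorter prefix means more traffic to sift through but a larger crowd to hide in, so the prefix length is effectively the knob between bandwidth and anonymity.</p>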
<p>This is just one of many possible applications which can be built on top of the <a href="https://bitbucket.org/scassidy/dinet">Diluvian Network</a>, also known as dinet. It is a simple idea, really: each router will deliver every message it hasn't seen recently to every router that's listening. Router links are not automatically bidirectional; each router must specify who it is subscribing to and where other routers can listen for updates. The message format is exactly this:</p>
<div class="codehilite"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">id</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
<span class="kt">uint8_t</span> <span class="n">data</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="kt">uint8_t</span> <span class="n">checksum</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="p">}</span> <span class="n">__attribute__</span> <span class="p">((</span><span class="n">packed</span><span class="p">))</span> <span class="n">dpacket</span><span class="p">;</span>
</pre></div>
<p>That's it. It's a lot simpler than <a href="https://bitmessage.org/wiki/Protocol_specification">other protocols</a> and designed to be easy to implement<sup id="fnref:other"><a class="footnote-ref" href="#fn:other">2</a></sup>. There are essentially no message parsing, decoding, or endianness issues to consider. The two ways a packet is invalid are if its SHA-256 checksum does not match and if the packet is of an incorrect length. That's it!</p>
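<p>The length rule can be sketched directly from the struct. The helper name <em>dpacket_length_ok</em> is hypothetical, and the checksum test is deliberately left out because this post does not say which bytes the SHA-256 covers:</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t id[16];
    uint8_t data[1024];
    uint8_t checksum[32];
} __attribute__ ((packed)) dpacket;

/* With packing there is no padding, so the wire size is fixed. */
_Static_assert(sizeof(dpacket) == 16 + 1024 + 32,
               "dpacket must be exactly 1072 bytes");

/* A received buffer can only be a packet if it has exactly that size. */
static int dpacket_length_ok(size_t received_len) {
    return received_len == sizeof(dpacket);
}
```

<p>Because every field is a byte array at a fixed offset, a buffer that passes this check can be read as a <em>dpacket</em> without any further decoding.</p>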
<p>The fixed message size was chosen to limit plaintext length attacks (see <a href="http://www.iacr.org/cryptodb/archive/2002/FSE/3091/3091.pdf">Kelsey</a> or <a href="http://cihangir.forgottenlance.com/papers/lengthhiding-corrected.pdf">Tezcan-Vaudenay</a>). As the network was designed for short text messages (unlike Tor or I2P), this was an important consideration.</p>
<p>By building a network instead of an application I hope that people with clever ideas can use this as an effective tool for creating messaging programs. There is a <a href="http://scassidy.bitbucket.org/dinet/client.html">basic dinet client</a> up now, but there is only one node running, which isn't much of a network at all. It uses the built-in HTTP REST API. Also, my JavaScript skills aren't great (yet) so the client is pretty rough.</p>
<p>Let me know what you think. Obviously dinet is still (at best) beta quality software, so do not use it for anything that requires serious security.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:messages">
<p>Yes, you might be able to tell from timing, but there are ways to fix that too. <a class="footnote-backref" href="#fnref:messages" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:other">
<p>There are also no restrictions on how to deliver these messages to other routers. Currently ZeroMQ 3 is used, but you could make an implementation that used a completely different technology. The only important aspect to remember is that since there are no routing tables, all messages must be delivered to all routers that are subscribed to you. <a class="footnote-backref" href="#fnref:other" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Zipf your variable names2013-03-24T11:42:00-07:002013-03-24T11:42:00-07:00Sean Cassidytag:www.seancassidy.me,2013-03-24:/zipf-your-variable-names.html<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script type="text/javascript" src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<p>I recently found a rather lengthy variable name in some code, and mentioned it to a friend. It looked like this:</p>
<div class="codehilite"><pre><span></span><span class="kd">private</span> <span class="kd">final</span> <span class="kd">static</span> <span class="kt">double</span> <span class="n">THREE_LETTER_ACRONYM_RATIO_FOR_THING</span> <span class="o">=</span> <span class="mf">4.321</span><span class="p">;</span>
</pre></div>
<p>This was used about six times in several functions. This lengthy variable name, which annoyingly followed the Java standard of using capital letters for constant values (a blatant misunderstanding of why C preprocessor macros are capitalized is probably why this is used in Java), also spelled out a common TLA<sup id="fnref:longtitle"><a class="footnote-ref" href="#fn:longtitle">1</a></sup>.</p>
<p>My friend, however, knew my penchant for short variable names like:</p>
<div class="codehilite"><pre><span></span><span class="n">FooBar</span> <span class="n">fb</span><span class="p">;</span> <span class="c1">// instead of fooBar or even worse, fooBarForWidget</span>
</pre></div>
<p>So, he said that he liked the long name and would have preferred it to some incomprehensible shortening of it. With modern IDEs, why must we remember a cryptic shortening of a name? I remembered <a href="http://en.wikipedia.org/wiki/Zipf%27s_law">Zipf's law</a>, and decided to spin it like this, "If you fail to account for Zipf's law, your code will be hard to read due to its constant repetition of contextual information."</p>
<p>What I meant was essentially this: if you're working with, say, an image, you have a few different ways to name the height and width variables you're using. You could use "currentImageWidth" or maybe "imageWidth" if you're only working with one image, which could even be shortened to "width" or, gasp, "w". What is the difference between these four variable names? How much contextual information you are embedding within the variable name, rather than relying on the reader to remember that we are talking about images.</p>
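<p>To make the trade-off concrete, here is a small sketch of the same computation written twice. Both functions and the <em>image</em> struct are made up for illustration. The first repeats the context in every name, while the second lets the enclosing type carry it:</p>

```c
#include <stddef.h>

/* Every name restates that we are talking about the current image. */
static size_t current_image_pixel_count(size_t current_image_width,
                                        size_t current_image_height) {
    return current_image_width * current_image_height;
}

/* The type supplies the context, so the fields can be short. */
struct image {
    size_t w;
    size_t h;
};

static size_t image_pixels(const struct image *img) {
    return img->w * img->h;
}
```

<p>Neither version is right in every situation. The short names only work because the struct and the function name already tell the reader that an image is involved.</p>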
<p>The use of a particular one of these four then depends on a few things:</p>
<ol>
<li>How likely is it that the reader is aware we are talking about images?</li>
<li>Are we using only one image or are we dealing with multiple images?</li>
<li>How likely is it that there is only one width that the reader thinks we are talking about? (Perhaps there are other widths as well, such as a page width.)</li>
<li>How likely is it that the reader would associate the letter "w" with an image width?</li>
<li>Would the shortening violate a common idiom, such as using "i" or "j" for something other than a loop counter?</li>
</ol>
<p>Depending on the answers to these questions, one would pick the correct variable name. My friend and I agreed that this is indeed the heart of the issue, but couldn't agree on this particular circumstance. If this were the Internet, we would merely start raising our voices and start shouting nonsense.
I decided to go a different route. I found a paper on the topic.</p>
<h1 id="background">Background</h1>
<p>But first, some background. What exactly is Zipf's law and why did I cite it?</p>
<p>Let's say you count the number of times a word is used in a block of text. Now, sort by the count of each word, such that the most common word comes first. What will the distribution of numbers look like? A Zipf distribution, of course.</p>
<p>But what does that look like? Zipf's law appears linear if you plot the count and the rank on a log-log graph because it is a <a href="http://en.wikipedia.org/wiki/Power_law">Power law</a> distribution. This is the same type of function as the <a href="http://en.wikipedia.org/wiki/Pareto_principle">Pareto principle</a> and inverse square laws like gravity. A power law distribution is a density function defined as:</p>
<p>$$ p(x) \propto x^{-\alpha} $$</p>
<p>Zipf's law is the special case where $\alpha \approx 1$. Here is a helpful plot from Wikipedia of Zipf's law at four scaling factors:</p>
<p><a href="http://en.wikipedia.org/wiki/File:Zipf_distribution_PMF.png"><img alt="Zipf's law plotted on log-log" src="https://www.seancassidy.me/static/images/" /></a></p>
<p>You can generalize Zipf's law into the <a href="http://en.wikipedia.org/wiki/Zipf%E2%80%93Mandelbrot_law">Zipf-Mandelbrot law</a> if that's your fancy too. But what does Zipf's law mean in a more general sense?</p>
<p>It means that speakers tend to choose words that are more readily available to them. It's likely part of the same memory recall that the <a href="http://en.wikipedia.org/wiki/Availability_heuristic">availability heuristic</a> is based on. This, in turn, makes listeners require that speakers embed more information in their messages to supply additional contextual information.</p>
<p>Words that are used more frequently almost certainly have less specific meanings than words that are used less frequently. And what of variable names? I think that if you violate Zipf's law in the length of your variable or function or class names - that is, if your most frequently used names are also long ones - your readers will not thank you, but instead have a harder time reading your code.</p>
<h1 id="least-effort-and-the-origins-of-scaling-in-human-language">Least effort and the origins of scaling in human language</h1>
<p>Knowing Zipf's law, I set out to find a modern paper that discussed its implications with regards to how much effort the reader and writer of language must put forth. Given Guido's rule that "code is read much more often than it is written," it is important for writers to exert more effort than readers, so that their extra effort is amortized over many readers.</p>
<p>I found a nice paper from PNAS, called, <a href="http://www.pnas.org/content/100/3/788.abstract">"Least effort and the origins of scaling in human language"</a> by Ramon Ferrer i Cancho and Ricard V. Solé<sup id="fnref:cite"><a class="footnote-ref" href="#fn:cite">3</a></sup>. Let's read it together. The right place to start is the abstract to see if it is what we want. Make sure to grab the paper and follow along. Even better is if you read the section before I discuss it.</p>
<h2 id="the-abstract-and-introduction">The Abstract and Introduction</h2>
<p>Quote from the abstract:</p>
<p>"<em>In this article, the early hypothesis of Zipf of a principle of least effort for explaining the law is shown to be sound. Simultaneous minimization in the effort of both hearer and speaker is formalized with a simple optimization process operating on a binary matrix of signal-object associations. Zipf's law is found in the transition between referentially useless systems and indexical reference systems. Our finding strongly suggests that Zipf's law is a hallmark of symbolic reference and not a meaningless feature.</em>"</p>
<p>So, Zipf's law is mapped to a concept of least effort through a formalization of a process. They do this through a binary matrix - us programmers know all about that, so that's alright - of signal-object associations. So we're going to be mapping signals (symbols) and objects through a matrix. Zipf's law is apparently a transition between two types of communications systems that suck, probably ones associated with full effort on either the speaker or the hearer's part. So in communication systems (and hopefully computer programs apply) Zipf's law is a fundamental aspect. Hopefully we can apply it to variable name-length.</p>
<p>This sounds exactly like what we want, so let's continue.</p>
<p>The introduction discusses the origin of human language and related research. The authors also lay out some of the foundations for the model they will be using. Necessarily, they say that the more meanings for a symbol or word, the more effort the reader needs to expend to decode the word. You can imagine how much more difficult reading a program would be if it were composed of only a few variable names, reused all over the place.</p>
<p>However, Zipf's law also states that speakers or writers will tend to choose the most frequent and most ambiguous words. There seems to be a conflict between ease of speaking (choose highly available words) and ease of listening (listen to unambiguous words).</p>
<h2 id="the-model">The Model</h2>
<p>Now the authors seek to take what they're saying out of the realm of hand waving and into something more concrete: a mathematical model. In the paper they start from symbols and objects and work towards a cost function, but I think it's slightly easier to work the other way around.</p>
<p>We need a way to represent the cost of both the hearer and the speaker. Naturally it makes sense that the more effort the hearer needs to expend, the less effort the speaker needs to expend and vice versa. So we can write the cost function, $\Omega$, this way, where $E_h$ is the effort spent by the hearer, and $E_s$ is the effort spent by the speaker:</p>
<p>$$ \Omega(\lambda) = \lambda E_h + (1 - \lambda) E_s $$</p>
<p>Since $\Omega(\lambda)$ is a cost function, it stands to reason that we want to lower it. The input to this function, $\lambda$, is merely the weighting between hearer and speaker.</p>
<h3 id="shannon-entropy">Shannon Entropy</h3>
<p>Sounds good, but how can we represent $E_h$ and $E_s$? One way is through <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">the Shannon entropy</a> of a message, which is represented as the expected value of the amount of information in a message:</p>
<p>$$ H(X) = E(I(X)) $$</p>
<p>Do you remember <a href="http://en.wikipedia.org/wiki/Expected_value">expected value</a> of a <a href="http://en.wikipedia.org/wiki/Random_variable">random variable</a>? Given a set of outcomes and their probabilities, the expected value is the sum of values times their probability. In the case of all items having equal probability, it reduces to a simple average.</p>
<p>What is $I$? This is the <a href="http://en.wikipedia.org/wiki/Self-information">self-information</a> of a random variable $X$. What's that? Well, the information content of a message is the number of bits it needs to represent it; the message 0101 takes four bits to represent. Is that the self-information? Well, it depends on the possible messages that can be communicated.</p>
<p>Say you have this program:</p>
<div class="codehilite"><pre><span></span><span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"0101</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>How many different messages can this program give you? Just one. So, how much information is this program telling you? Nothing, it always gives you the same message, 0101. It doesn't <em>tell</em> you anything, and thus conveys no information.</p>
<p>However, if the messages could change, they can convey information. The more unlikely the messages, the more <em>information</em> they contain<sup id="fnref:information"><a class="footnote-ref" href="#fn:information">2</a></sup>.</p>
<p>If each bit is equally likely to be 0 or 1, every four-bit message has probability $1/2^4$, and the self-information of getting any message other than 0101 is:</p>
<p>$$ I(b_i) = - \log_2 p(b_i) = - \log_2 \left( \frac{2^4 - 1}{2^4} \right) \approx 0.093 \text{ bits} $$</p>
<p>That is not very much information, because getting some other message is very likely. Under the same assumption, the information content of 0101 itself is $-\log_2 (1 / 16) = 4$ bits, an expected result.</p>
<p>Put simply, the Shannon entropy (the function $H(X)$) is the expected information content of a random message $X$. The function $H(X)$ can also be measured in bits.</p>
<p>This can be rewritten in this more useful way, using the normal notation for Shannon entropy, which is the way the authors used in the paper:</p>
<p>$$ H_n(\mathcal{S}) = - \sum_{i=1}^n p(s_i) \log_n p(s_i) $$</p>
<p>Here $\mathcal{S}$ denotes the whole set of $n$ symbols, and the function $p(s_i)$ represents the probability of symbol $s_i$ being used. This equation works well for the speaker, and is a good candidate for $E_s$. It says this: the effort level of the speaker is the sum, over all symbols, of each symbol's self-information weighted by its likelihood.</p>
<p>If all symbols are equally likely, then $H_n$ reaches one, the maximum effort level for the speaker. Obviously contextual hints to lower some symbols' likelihood will be necessary to lower speaker costs.</p>
<h3 id="object-matrices-and-bats">Object matrices and bats</h3>
<p>But what are $s_i$ and $n$? Beginning the section on the model, the authors define an n-by-m matrix, call it $\mathbf{A}$, which translates n symbols (words) into m reference objects. </p>
<p>A symbol, such as the word "baseball", would map to an actual baseball in the room in which you are speaking. Actually it needn't be a physical baseball, but could reference a concept or be a metaphor.</p>
<p>It is also possible to have a symbol reference multiple objects (such as "bat" matching both a baseball bat and the animal) and for multiple symbols to match the same object (synonyms). </p>
<p>The set of symbols is denoted $\mathcal{S}$ and the set of reference objects is denoted $\mathcal{R}$.</p>
<h3 id="but-what-about-the-hearer">But what about the hearer?</h3>
<p>The same function doesn't work for the hearer, though. Since they are not choosing the words, they cannot describe their effort function in exactly the same way. We need a way to say: how much effort does the hearer expend when they hear a given symbol $s_i$? We need the conditional version of the speaker's effort function.</p>
<p>The authors provide this logical extension:</p>
<p>$$ H_m( \mathcal{R} | s_i ) = - \sum_{j=1}^m p(r_j | s_i) \log_m p(r_j | s_i) $$</p>
<p>If you're not familiar with the notation this is <a href="http://en.wikipedia.org/wiki/Conditional_probability">conditional probability</a> notation applied to the effort function. It reads, the effort level of the listener decoding a symbol $s_i$ with set of reference objects denoted by $\mathcal{R}$, is the expected information content of a decoded message. Decoded, because the reference object $r_j$ depends on $s_i$, the symbol in question.</p>
<p>Given what we know about $\mathcal{R}$ and $\mathcal{S}$, how can we describe $p(r_j | s_i)$?</p>
<p>To simplify things, the authors define all reference objects as equally likely, that is $p(r_j) = 1/m$, where $m$ is the size (cardinality) of $\mathcal{R}$. But what should $p(s_i)$ be?</p>
<p>Well, the probability of a symbol $s_i$ is the probability of that symbol appearing associated with all reference objects it maps to. Simple enough. But wait, we actually haven't finished with $H_m$. We have the effort level given a particular symbol, but we need the effort level for all symbols. We can use <a href="http://en.wikipedia.org/wiki/Information_theory#Conditional_entropy_.28equivocation.29">conditional entropy</a> to achieve this. I believe this formula (8 in the paper) has a typo. It should be the following:</p>
<p>$$ H_m( \mathcal{R} | \mathcal{S} ) = \sum_{i=1}^n p(s_i) H_m (\mathcal{R} | s_i) $$</p>
<p>They used the joint probability instead of the conditional probability. If you use the joint probability, this function will be off by $H_m(\mathcal{R})$ by the chain rule. Regardless, we now have candidates for $E_h$ and $E_s$, namely $H_m(\mathcal{R} | \mathcal{S})$ and $H_n(\mathcal{S})$.</p>
<p>$$ \Omega(\lambda) = \lambda H_m(\mathcal{R} | \mathcal{S}) + (1 - \lambda) H_n(\mathcal{S}) $$</p>
<p>So what this says is that the total cost is weighted by $\lambda$: the higher it is, the more the hearer's effort counts toward the cost, so minimizing $\Omega$ favors the hearer and leaves more of the work to the speaker. The lower it is, the more the minimization favors the speaker and leaves the work to the hearer. This is also dependent on the size of $\mathcal{S}$ and its relationship to $\mathcal{R}$, which the authors explore more fully in the next section.</p>
<h2 id="methods-and-results">Methods and Results</h2>
<p>How do you find the minimum of such a function? I don't know of any analytical methods. The authors do something pretty simple. They start with the matrix $\mathbf{A}$, and then compute the cost function $\Omega(\mathbf{A})$, and check if this is the lowest cost they've found yet. If so, it is stored as the new minimum.</p>
<p>In either case, they flip some elements of the matrix and try again. If the same matrix is the lowest for $2nm$ iterations, they consider it the global minimum. I don't think this is the most efficient way to find the minimum, and there's no guarantee that they will find the global minimum. This method is inherently random, so it doesn't need the usual randomness that a hill climbing algorithm needs.</p>
<p>With this information, they compute two important values: the mutual information shared by the two parties, and the size of lexicon. The size of the lexicon is defined as the number of symbols that refer to at least one reference object. What happens when you plot mutual information and lexicon size as a function of $\lambda$?</p>
<p><a href="https://www.seancassidy.me/static/images/"><img src="/images/least_result.png" width="820" alt="Information and Lexicon size as a function of lambda"/></a></p>
<p>There is a sharp transition at $\lambda^* = 0.41$, around which the symbol frequencies resemble a Zipf distribution. Here $\lambda^*$ denotes the value of $\lambda$ at which this transition occurs.</p>
<p>So, it seems that human language resembles a cost function where the speaker bears slightly more of the cost than the listener. Further, it seems that Zipf's law follows naturally from a compromise on effort level by speakers and hearers.</p>
<p>But what of the other two extremes? If $\lambda$ were to decrease, mutual information decreases along with lexicon size. This means that the speaker must use fewer unique words that have more meanings. This increases the work for the hearer, who must decode from contextual information what each word means. I imagine <a href="http://en.wikipedia.org/wiki/Talking_drum">African talking drums</a> are the logical extreme of this idea.</p>
<p>If $\lambda$ were to increase, however, notice that the mutual information and lexicon size increase. This means that contextual information decreases as the message itself contains more and more information and less and less redundant hints. Each symbol has fewer synonyms. Imagine a program where across millions of lines of code, each variable name had to be unique. Frightening and wasteful.</p>
<h1 id="conclusions">Conclusions</h1>
<p>What can we take away from this article? Zipf's law is not some random manifestation of language or communication but an arbitration between communicating parties. It lands at a logical conclusion of compromise that is amenable to programmers: writers do more work than readers but not so much that the lexicon grows to an untenable size and contextual information is kept at a reasonable level.</p>
<p>I found another article that I thought would be relevant, "<a href="http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4308973">The Emergence of Zipf's Law: Spontaneous Encoding Optimization by Users of a Command Language</a>," by Steven R. Ellis and Robert J. Hitchcock<sup id="fnref:cite2"><a class="footnote-ref" href="#fn:cite2">4</a></sup>. It discusses plotting the length of Unix commands and their frequency with the hypothesis that the more expert the user the more closely their usage follows Zipf's Law.</p>
<p>This paper has a few issues, not the least of which are its lack of statistical documentation and its extremely small sample size of only ten users. I do not think it is worth the cost IEEE asks for. However, the authors had some interesting recommendations: design your language to make the most often used things easy to use, and allow both new and experienced users an optimal experience.</p>
<p>How do you do both? You need to make a trade-off between ambiguity and cost. But always pay attention to the frequency of use. If it's more frequently used, it deserves a short, efficient name.</p>
<p>The variable name currentImageWidth might be just fine if you use it only twice, but if you use it two dozen times in a short time frame, make it short.</p>
<p>Make it follow Zipf's law.</p>
<h1 id="examples">Examples</h1>
<p>The most frequently used word in this article was "the" with 193 mentions, followed by "of" with 105 and then "a" with 89. Interestingly, the word "Zipf" makes the top ten with 35 mentions.</p>
<p>But how can you check what you use in your source code? Well I nabbed <a href="http://unix.stackexchange.com/a/41480">this snippet</a> and modified it slightly to work better with programs thusly:</p>
<div class="codehilite"><pre><span></span>tr -c <span class="s1">'[:alnum:]'</span> <span class="s1">'[\n*]'</span> &lt; test.c <span class="p">|</span> grep -E <span class="s1">'^[[:alpha:]]'</span> <span class="p">|</span> sort <span class="p">|</span> uniq -c <span class="p">|</span> sort -nr <span class="p">|</span> head -12 <span class="p">|</span> nl
</pre></div>
<p>This will output a list like this (from my project <a href="https://bitbucket.org/scassidy/livestats/">LiveStats</a>):</p>
<div class="codehilite"><pre><span></span><span class="err"> 1 89 self</span>
<span class="err"> 2 37 i</span>
<span class="err"> 3 21 x</span>
<span class="err"> 4 18 tiles</span>
<span class="err"> 5 17 heights</span>
<span class="err"> 6 17 for</span>
<span class="err"> 7 16 in</span>
<span class="err"> 8 16 d</span>
<span class="err"> 9 16 count</span>
<span class="err">10 15 median</span>
<span class="err">11 15 item</span>
<span class="err">12 15 def</span>
</pre></div>
<p>This includes Python keywords (for, in, and def), which should be excluded. We can plot this via <a href="https://www.seancassidy.me/etc/test.gnuplot">this small gnuplot snippet</a> to generate this graph:</p>
<p><img alt="Zipf plot of LiveStats" src="https://www.seancassidy.me/static/images/" /></p>
<p>This plot is the same log-log style plot we saw earlier, but of the data in my program. It seems to have a Zipf-like distribution in its frequency. The exponent is near -1, which would be appropriate.</p>
<p>Try it out on your programs and see what you get.</p>
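<p>If you would rather stay in Python, here is a rough sketch of the same idea. It is not part of LiveStats. The function names are made up for illustration, the tokenizer is a crude regular expression that also picks up words inside strings and comments, and the exponent is estimated with a plain least-squares fit on the log-log data, which is only one of several ways to do it.</p>

```python
import keyword
import math
import re
from collections import Counter

# Crude identifier pattern: a letter or underscore, then letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifier_counts(source):
    """Count identifier-like tokens in source text, skipping Python keywords."""
    tokens = IDENTIFIER.findall(source)
    return Counter(t for t in tokens if not keyword.iskeyword(t))


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(frequencies, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

<p>Feeding <code>identifier_counts(open(path).read()).values()</code> into <code>zipf_slope</code> gives an estimate of the exponent for a file; a value near -1 would match the plot above.</p>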
<div class="footnote">
<hr />
<ol>
<li id="fn:longtitle">
<p>I realize the apparent contradiction here in writing such a lengthy post and complaining about lengthy variable names. <a class="footnote-backref" href="#fnref:longtitle" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:information">
<p>If you're interested in learning more about information theory and the history thereof, I recommend <a href="http://www.amazon.com/The-Information-History-Theory-Flood/dp/1400096235/">The Information</a> by James Gleick. It's a really fun read with lots of interesting tidbits and histories. <a class="footnote-backref" href="#fnref:information" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:cite">
<p>Ferrer i Cancho, Ramon, and Ricard V. Solé. "Least effort and the origins of scaling in human language." Proceedings of the National Academy of Sciences 100.3 (2003): 788-791. <a class="footnote-backref" href="#fnref:cite" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:cite2">
<p>Ellis, Steven R., and Robert J. Hitchcock. "The Emergence of Zipf's Law: Spontaneous Encoding Optimization by Users of a Command Language." IEEE Transactions on Systems, Man and Cybernetics 16.3 (1986): 423-427. <a class="footnote-backref" href="#fnref:cite2" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>H.264 and VP8, compared2013-03-14T22:40:00-07:002013-03-14T22:40:00-07:00Sean Cassidytag:www.seancassidy.me,2013-03-14:/h264-and-vp8-compared.html<p>Google has recently published <a href="http://www.ietf.org/mail-archive/web/rtcweb/current/msg06787.html">test results comparing VP8 to H.264</a>. It created somewhat of a stir on the <a href="http://mailman.videolan.org/pipermail/x264-devel/2013-March/009913.html">x264-devel mailing list</a>. I thought I would add what I found two years ago as a result of my thesis work. That is, these results are not fully up-to-date, but still …</p><p>Google has recently published <a href="http://www.ietf.org/mail-archive/web/rtcweb/current/msg06787.html">test results comparing VP8 to H.264</a>. It created somewhat of a stir on the <a href="http://mailman.videolan.org/pipermail/x264-devel/2013-March/009913.html">x264-devel mailing list</a>. I thought I would add what I found two years ago as a result of my thesis work. That is, these results are not fully up-to-date, but still interesting.</p>
<h1 id="a-basic-comparison">A Basic Comparison</h1>
<p>My thesis involved comparing H.264 and VP8 using some new methods I developed from scratch, and others that I applied from other fields. The basic way that video codecs are compared today is measuring bitrate versus average quality of the encoded video, as seen in <a href="http://downloads.webmproject.org/ietf_tests/vp8_vs_h264_quality.html">Google's results</a>. I found this an insufficient and possibly an inaccurate way to compare videos, which are composed of still frames.</p>
<p>Instead, I decided to compare results frame-by-frame, and then statistically analyze the results using box plots and Student's <em>t</em>-test for statistical significance testing, to see if there was actually any difference. Here is one example:</p>
<p><img alt="PSNR Comparison of H.264 Baseline, VP8, and JM" src="https://www.seancassidy.me/static/images/" /></p>
<p>This is a comparison of H.264 Baseline, VP8 at the "good" deadline, and the <a href="http://iphome.hhi.de/suehring/tml/">JM reference encoder</a> for several well known reference videos. As this is a box plot, it shows the minimum <a href="http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio">PSNR</a> value (the bottom part of the whisker), the 1st quartile (the start of the box), the median (the line in the box), the average (the single point, usually a plus, x, or an asterisk), the 3rd quartile (the top of the box), and the maximum (the top of the whisker). All encoders were configured for PSNR maximization, with settings as equivalent as possible. This graph in particular used two-pass encoding at 150 kbps.</p>
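<p>For reference, the six numbers each box summarizes can be computed from a list of per-frame scores along these lines. This is a sketch rather than the code used in the thesis: the function name is invented, and the quartile convention (Python's default "exclusive" method) is an assumption, since plotting tools differ on how they interpolate quartiles.</p>

```python
import statistics


def box_plot_summary(scores):
    """Summarize per-frame quality scores (e.g. PSNR in dB) for a box plot."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(scores),
        "q3": q3,
        "max": max(scores),
    }
```

<p>One such summary per encoder and per video is enough to draw the graph above.</p>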
<p>VP8 does well in this graph, outperforming H.264 Baseline (using the x264 encoder) in most videos. Videos that have low movement (such as Akiyo) do particularly well, where VP8's least well encoded frame is better than H.264's average. At low resolution and bitrate, when compared to H.264 Baseline, VP8 does well.</p>
<p>An important aspect of comparing video codecs is to set the encoder settings correctly. I tried to match the settings between these three encoders as closely as I could. One important part of using x264 properly was to disable the psy-rd optimizations that can hurt PSNR or <a href="http://en.wikipedia.org/wiki/Structural_similarity">SSIM</a> scores, but, allegedly, improve subjective viewing performance.</p>
<p>When comparing VP8 to H.264 High profile, the story is different. <em>Note</em>: I'm switching from PSNR to SSIM here. This is to show that I used both in my thesis and not to misrepresent the results. VP8 performed similarly in the PSNR results, and my full thesis contains both graphs and analyses.</p>
<p><img alt="SSIM Comparison of H.264 High, VP8, and JM" src="https://www.seancassidy.me/static/images/" /></p>
<p>This is H.264 High profile, VP8 Best deadline, and the JM reference encoder on the same videos, with otherwise similar settings, compared in SSIM rather than PSNR. In this one, H.264 outperforms VP8 in nearly every video by a substantial margin. For these tests, I also compared them using Student's <em>t</em>-test, so I'm not just eyeballing this one: the null hypothesis was rejected soundly.</p>
<h1 id="more-interesting-graphs">More interesting graphs</h1>
<p>I also produced RD curves, and then computed <em>p</em>-values for them using Student's <em>t</em>-test. Here is the graph of the RD curve measured in PSNR for Stockholm, which is a 720p resolution video:</p>
<p><img alt="RD Curve for Stockholm 720p" src="https://www.seancassidy.me/static/images/" /></p>
<p>There is seemingly little difference between Baseline and VP8's Good deadline, but a larger difference between High profile and VP8's Best deadline. But was that difference statistically significant? Yes! The <em>p</em>-values that were computed for High versus Best were <math title="5.8075 \times 10^-5"><mstyle displaystyle="true" fontfamily="serif"><mn>5.8075</mn><mo>×</mo><msup><mn>10</mn><mrow><mo>-</mo><mn>5</mn></mrow></msup></mstyle></math> for PSNR and <math fontfamily="serif"><mn>0.00039</mn></math> for SSIM. Both of these values met the statistical significance threshold of <math fontfamily="serif"><mi>α</mi><mo>=</mo><mn>0.1</mn></math> set earlier.</p>
<p>Comparing frame-by-frame statistics and rate distortion curves is good, but what if there are differences and interesting behavior that these coarse measures hide? I generated some graphs that plotted frame-by-frame results. These are the SSIM values for five videos stitched together: BlueSky, Tractor, Riverbed, Pedestrian Area, and Rush Hour, which can be found on <a href="5">xiph.org</a>. The vertical lines in the graph mark the scene changes.</p>
<p><img alt="SSIM Measurement for 1080p video" src="https://www.seancassidy.me/static/images/" /></p>
<p>VP8, in this example, does rather well against both H.264 High profile and Baseline profile at the end, does alright in the middle, and does poorly during the zoom-in that is featured in Tractor. I thought that this was surprising, as VP8 did poorly in a one-on-one comparison in Rush Hour alone. It turned out that the way the RD optimizer worked in VP8 (at the time) was that it tended to allocate too much bandwidth to the end of videos, even in two pass mode. This explains the dramatic performance boost. I imagine that they have fixed this by now.</p>
<p>This is one reason not to blindly accept frame-by-frame PSNR or SSIM values. They do not tell a full story. Especially when the <a href="http://tools.ietf.org/agenda/86/slides/slides-86-rtcweb-9.pdf">graph doesn't include a key</a> (see the last slide). I feel that these graphs are more likely to mislead than to elucidate.</p>
<h1 id="intra-frame-analysis">Intra-frame analysis</h1>
<p>An important aspect of video coding is the compression of key frames so that their presence takes the minimum amount of bandwidth necessary for high quality reference pictures and accurate inter-prediction. For this test, every frame was set to be a key frame and the bitrate controlled by quantization factor and the output bitrate plotted.</p>
<p>This graph is of H.264 Baseline compared to VP8's Good deadline. The error bars represent the standard deviation and the point is the average value for SSIM.</p>
<p><img alt="Intra-frame comparison of H.264 Baseline and VP8" src="https://www.seancassidy.me/static/images/" /></p>
<p>VP8 does very well in this comparison, outperforming H.264 Baseline on average in every comparison. But was it statistically significant? No. The standard deviation was high enough, and the difference low enough that we can't reject the null hypothesis (that they are similar in quality). </p>
<p>To comment about the statistical significance of the PSNR and SSIM results, Student's <em>t</em>-test and Welch's <em>t</em>-test were used. First, a Welch's <em>t</em>-test was used to measure whether or not the variances were equal. The standard deviation, <math fontfamily="serif"><mi>σ</mi></math>, was tested. If the resulting <em>p</em>-value was less than the chosen value for <math fontfamily="serif"><mi>α</mi><mo>=</mo><mn>0.1</mn></math>, the null hypothesis that the variances were equal was rejected. If the variances were unequal, Welch's <em>t</em>-test was used for the average PSNR or SSIM values corresponding to those standard deviations. Otherwise, Student's <em>t</em>-test was used, which assumes equal variances.</p>
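<p>The difference between the two tests comes down to how the standard error and degrees of freedom are computed. The sketch below uses the textbook formulas and only the standard library; it is not the code from my thesis. The function name is hypothetical, and the decision about whether the variances are equal is passed in as a flag rather than computed here. It returns the <em>t</em> statistic and degrees of freedom, leaving the <em>p</em>-value lookup to a statistics package.</p>

```python
import math
import statistics


def t_statistic(a, b, equal_variances):
    """Two-sample t statistic and degrees of freedom.

    equal_variances=True  -> Student's t-test (pooled variance)
    equal_variances=False -> Welch's t-test (Welch-Satterthwaite df)
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances

    if equal_variances:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        std_err = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / std_err, df
```

<p>When the two samples have the same size and variance, both branches agree; they diverge as the variances or sample sizes differ.</p>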
<p>But what about H.264 High profile, which includes the 8-by-8 DCT that should surely improve its intra-coding performance?</p>
<p><img alt="Intra-frame comparison of H.264 High profile and VP8" src="https://www.seancassidy.me/static/images/" /></p>
<p>These are nearly identical. Obviously both <em>t</em>-tests found no significant difference between these.</p>
<h1 id="conclusions">Conclusions</h1>
<p>VP8 is a powerful modern video codec that is suitable for individuals and organizations that seek a patent-free alternative to H.264. Its quality on medium resolution web videos is comparable with H.264, and it excels at low resolution and low bitrate videos. Compared to H.264 Baseline, VP8 outperforms it in quality for the same bitrate.</p>
<p>It underperforms on higher resolution video, such as HD video, due to its simpler segmentation scheme, which reduces the effectiveness of its adaptive quantization and adaptive loop filter selection. VP8's entropy coder is approximately as efficient as CABAC, but is somewhat simpler, partly because it does not need to adapt after every bit. VP8's intra prediction is sophisticated and performs as well as H.264 High profile on intra prediction tests.</p>
<p>The main reason for both its strong lower-resolution results and its weaker higher-resolution results is VP8's equivalent of H.264's flexible macroblock ordering (FMO). VP8 can add an identification number to each macroblock, numbered 1 through 4, and encode the macroblocks in each numbered segment similarly. The segments do not need to be contiguous, unlike H.264's slices without FMO. This offers superior quality at lower resolutions, where the number of segments is not an impediment. At higher resolutions, having only four segments seriously limits the compression possible with this method.</p>
<p>For VP9, it would be a significant improvement to allow more segments.</p>
<h1 id="more-information">More information</h1>
<p>This post is really a small subset of the testing and results I gathered for my thesis. If you'd like to read more about this, you can read <a href="https://ritdml.rit.edu/bitstream/handle/1850/14525/SCassidyThesis11-2011.pdf?sequence=1">my thesis</a>. It has a lot more detail, including a detailed description of VP8, my encoding parameters, statistical methodology and more. I'm also interested in what you think of my research, so please <a href="https://www.seancassidy.me/pages/about.html">contact me</a> with any comments.</p>On Accepting Interview Question Answers2013-03-07T19:37:00-08:002013-03-07T19:37:00-08:00Sean Cassidytag:www.seancassidy.me,2013-03-07:/on-accepting-interview-question-answers.html<p>A friend of mine used to ask this interview question:</p>
<div class="codehilite"><pre><span></span><span class="err">Given a gigantic stream of unsorted data, calculate </span>
<span class="err">the median. You cannot store it in memory.</span>
</pre></div>
<p>If this question was asking for the mean, life would be easy! Candidates normally struggle for a long time while having increasingly more obvious …</p><p>A friend of mine used to ask this interview question:</p>
<div class="codehilite"><pre><span></span><span class="err">Given a gigantic stream of unsorted data, calculate </span>
<span class="err">the median. You cannot store it in memory.</span>
</pre></div>
<p>If this question was asking for the mean, life would be easy! Candidates normally struggle for a long time while having increasingly more obvious (obvious to the interviewer, that is) hints thrown at them, such as, "What do you need to get the median?" When the candidate says, "The sorted data set," the retort is, "All of it?" Sometimes more requirements are thrown in up front, such as variance or skewness.</p>
<p>And so it goes for maybe 15 minutes, until the interviewer says that he doesn't really care about the exact median, and an estimate will do just fine. What tools are good for making estimates of very large things? Sampling, of course. If you randomly sample the data set, you can then do all sorts of statistics on the sample, and it should be fairly accurate. That is the "accepted" answer to the question.</p>
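<p>One way to take a random sample from a stream of unknown length is reservoir sampling. The interview question does not name a sampling technique, so this choice is mine, and the function name and parameters below are made up for illustration. The sketch keeps a fixed-size sample in memory and takes the median of that.</p>

```python
import random
import statistics


def sampled_median(stream, sample_size, seed=None):
    """Estimate the median of a stream from a uniform random sample.

    Uses reservoir sampling (Algorithm R), so only `sample_size` items are
    ever held in memory regardless of how long the stream is.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < sample_size:
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = item
    if not reservoir:
        raise ValueError("stream was empty")
    return statistics.median(reservoir)
```

<p>The same reservoir can be reused for other statistics, such as percentiles or variance, which is part of why sampling is the expected answer.</p>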
<p>One candidate, however, came up with something rather interesting. What if you tracked an estimate of median, and incremented it for every new datum larger than it, and decremented every time it was less? It wouldn't be sensitive to outliers since it uses counts, rather than values, to move the median. Unfortunately, that's where the thought process ended due to the inevitable question, "What about other statistics like the 90th percentile or standard deviation?", but I thought it was a flash of genius.</p>
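<p>The candidate's idea fits in a few lines. This is a minimal sketch of the fixed-step version as described above, not the LiveStats code; the function name, the choice to start from the first value, and the default step size are my assumptions.</p>

```python
def streaming_median(stream, step=1.0):
    """Track a running median estimate using only comparisons.

    The estimate moves up by `step` for each value above it and down by
    `step` for each value below it, so outliers count no more than any
    other value.
    """
    iterator = iter(stream)
    try:
        estimate = float(next(iterator))  # assumed starting point: first value
    except StopIteration:
        raise ValueError("stream was empty") from None
    for value in iterator:
        if value > estimate:
            estimate += step
        elif value < estimate:
            estimate -= step
    return estimate
```

<p>With a fixed step the estimate keeps jittering around the true median by roughly the step size, and a small step converges slowly if the starting point is far off.</p>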
<p>So, what to do when confronted with a new algorithm? Code it up, of course. I wrote <a href="https://bitbucket.org/scassidy/livestats/src/a67552cc6b72e82a7ab56efc62fe1385ced24548/median.py">this Python module</a> which effectively tracks the median within one one-hundredth of a percent on large data sets. The key insight, for me, was recognizing that the amount to increment or decrement should be variable. In fact, it should be the smallest difference between the sample median and any data point ever seen. This allows it to scale to larger data sets more effectively, and it approaches the median more quickly than a static small increment would.</p>
<p>And that's how it sat for almost a year until I discovered a rather interesting paper: The <a href="http://www.cs.wustl.edu/~jain/papers/ftp/psqr.pdf">P-Square Algorithm for Dynamic Calculation of Quantiles and Histograms without Storing Observations</a> by Raj Jain and Imrich Chlamtac, which has a rather sophisticated algorithm for arbitrary quantile searching.</p>
<p>The algorithm is this: given a quantile you wish to track (such as 0.9 for the 90th percentile), track five data points. Two are the minimum and maximum. The central one is the best guess of the quantile in question. The other two are to fit a polynomial in order to give a <a href="http://en.wikipedia.org/wiki/Orders_of_approximation#Second-order">second degree approximation</a>. The paper discusses how second degree fits work substantially better than linear fits. I wanted to find more papers on this topic, but unfortunately the ACM and IEEE wish to slow down research with paywalls.</p>
<p>This was pretty cool, though. When consulting <a href="http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewnes">the Internet</a> I couldn't find many solutions which did online statistics generation. I decided to generalize my small class, bring in the P2 algorithm, and make it an actual usable project that I call <a href="https://bitbucket.org/scassidy/livestats">LiveStats</a>. I hope that it's useful. One of the remaining gaps I see is lack of online mode generation. I've found a few sources on this, but none that I am completely satisfied with. If you find one, please let me know.</p>
<p>So, while many interview questions are interesting, they don't come with "the one and only" answer. In fact, some useful tools can result from thinking outside of the constraints of the given answer. It's also important to remember to recognize flashes of insight that may not be down the path you thought the problem should go. Explore it, and some fascinating conversation may result.</p>Rate Limiting per User2013-03-03T13:39:00-08:002013-03-03T13:39:00-08:00Sean Cassidytag:www.seancassidy.me,2013-03-03:/rate-limiting-per-user.html<p>When writing the HTTP API for <a href="https://bitbucket.org/scassidy/dinet">DiNet</a>, I had a problem that many services must deal with: one user maliciously generating traffic and denying service to legitimate users. In fact, the problem is much more severe in a flood network like DiNet (the diluvian part of the name is not …</p><p>When writing the HTTP API for <a href="https://bitbucket.org/scassidy/dinet">DiNet</a>, I had a problem that many services must deal with: one user maliciously generating traffic and denying service to legitimate users. In fact, the problem is much more severe in a flood network like DiNet (the diluvian part of the name is not a misnomer), as one user can generate a lot of traffic on routers.</p>
<p>Further, I didn't want to use a leaky bucket style algorithm because that would penalize all users for one user's malicious behavior. So, I decided upon this algorithm: N map counters.</p>
<ol>
<li>Each time interval, a new map is created, and the oldest map is destroyed.</li>
<li>All requests go to the newest map. They look up the user ID (or IP address) in the map, and increment the counter.</li>
<li>Each request that comes in performs a lookup on each of the N maps, and gets that user's counter from each one.</li>
<li>This counter is scaled by the age of the map, so that the oldest is scaled by 1/N, the second oldest 2/N, and the latest is scaled by 1.</li>
<li>The scaled results are added together and compared with a static limit. </li>
</ol>
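<p>The steps above can be sketched in Python as follows. The real implementation is the C code linked below; this is only an illustration, and the class and method names are invented. To keep the sketch deterministic, it assumes the caller invokes <code>rotate()</code> once per time interval (for example from a timer) instead of reading the clock itself.</p>

```python
from collections import deque


class MapRateLimiter:
    """Per-user rate limiting with N age-weighted counter maps (sketch)."""

    def __init__(self, num_maps, limit):
        self.num_maps = num_maps
        self.limit = limit
        # Oldest map on the left, newest on the right. maxlen makes the
        # deque drop the oldest map automatically when a new one is added.
        self.maps = deque([{}], maxlen=num_maps)

    def rotate(self):
        """Call once per time interval: add a new map, drop the oldest."""
        self.maps.append({})

    def score(self, key):
        """Sum the counters for key, scaling the oldest by 1/N up to 1 for the newest."""
        offset = self.num_maps - len(self.maps)  # keeps the newest map at weight 1
        total = 0.0
        for position, counters in enumerate(self.maps, start=1):
            total += counters.get(key, 0) * (position + offset) / self.num_maps
        return total

    def allow(self, key):
        """Record one request for key and report whether it is within the limit."""
        newest = self.maps[-1]
        newest[key] = newest.get(key, 0) + 1
        return self.score(key) <= self.limit
```

<p>With <code>MapRateLimiter(num_maps=2, limit=3)</code>, a user's fourth request in one interval is refused, and after one <code>rotate()</code> those four requests count as two, so the user can make one more request before hitting the limit again.</p>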
<p>This can also be done for the total number of requests to have a global limit in case of a DDoS rather than a single user/IP address attack. The scaling algorithm, currently linear, might work better as an exponential or polynomial distribution. </p>
<p>The time interval can be any amount of time. I think that it would depend on your service what you would choose. Half a minute to a minute sounds about right to me. The total number of buckets is another interesting tuning parameter. Somewhere in the neighborhood of 2-5 sounds like a good guess. Testing and application idiosyncrasies will determine the ideal value.</p>
<p>I've <a href="https://bitbucket.org/scassidy/dinet/src/88e72c724052e2e0d0d32d651ec2fecebea2a867/ratelimit.c?at=master">coded this up</a> and it seems to be working well for the DiNet HTTP API. It uses the fantastic <a href="http://troydhanson.github.com/uthash/">uthash</a> library for its maps.</p>Write your own Data Structures2013-03-02T17:48:00-08:002013-03-02T17:48:00-08:00Sean Cassidytag:www.seancassidy.me,2013-03-02:/write-your-own-data-structures.html<p>One of the fundamental skills a software engineer needs in order to be successful today is a mastery of Google-fu. If you do not reference the accumulated knowledge and experience of the Internet on a daily basis your work will suffer. However, I think we, as a community of engineers …</p><p>One of the fundamental skills a software engineer needs in order to be successful today is a mastery of Google-fu. If you do not reference the accumulated knowledge and experience of the Internet on a daily basis your work will suffer. However, I think we, as a community of engineers and thinkers, should take a step back when someone says, <a href="http://blog.framebase.io/post/43973262180/the-best-programmers-are-the-quickest-to-google">"The best programmers are the first to Google."</a></p>
<p>What this, in essence, says is that there is little-to-no benefit to learning tricky algorithms and complicated data structures. That instead, you should focus your work on using these tools to solve problems. What is missed is that knowing how these algorithms and data structures work, on a fundamental level, will improve your ability to solve problems dramatically. Further, you will be less likely to stray into unknown unknowns and the bad side of the Dunning-Kruger effect.</p>
<p>The Dunning-Kruger effect is the phenomenon that the less knowledgeable you are, the more you think you know. In effect, you fail to recognize the scope of knowledge. There is so much stuff to learn, that merely by learning more, you recognize how little you know. When I look at Coursera, for example, I often must stop myself from signing up for every other course. There's so much to learn, and so little time to learn it all. One thing is for sure: you'll never be one of the best programmers if you do not take the time to learn.</p>
<p>So, when you choose to download someone's solution to rate limiting, say, you have missed a learning opportunity. This is sometimes of no concern, with deadlines what they often are. But if it becomes the main way you work, you will be a victim of the bad side of the Dunning-Kruger effect: you will begin to not even know certain solutions are available. You would not have missed the obvious application of a <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</a> nor would you have attempted to breadth-first search a cyclic graph if you had spent more time learning and less time copying and pasting.</p>
<p>My advice is to take some time to implement a data structure in your language of choice. Pick one that isn't in your standard library, but might be useful to you. You'll learn several things:</p>
<ol>
<li>Data structures need to be fast, fast, fast</li>
<li>It's harder than it looks</li>
<li>How to write a proper API with contracts</li>
<li>Correctness is paramount</li>
<li>Writing tests to complete code coverage</li>
</ol>
<p>And, it's fun. That's why I have a <a href="https://bitbucket.org/scassidy/data-structures">small project</a> where I collect data structures I've written in C. I learned a great deal from implementing them, and I've used them several times since.</p>
<p>One last word: of course you should use all the tools available to you, and you'll never have time to implement every data structure and algorithm yourself. Should you use your own "thread-safe" queue in production instead of Doug Lea's excellent <a href="http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html">ConcurrentLinkedQueue</a>? No. Try it for fun, and maybe you'll be the next Doug Lea. He certainly didn't copy and paste that code from someone else.</p>