Preston Lee's Blog

Tag: development

Keeping Multiple Development Machines Synchronized with Homeboy
Most of us developer types have at least two machines we use routinely, and managing that can be a chore. Specifically, I usually want to do the following every time I sit down at a machine to hack on something:
- Keep config files like .bash_profile and .gitconfig synchronized. (This requires scripting since Dropbox ignores hidden files.)
- Patch all OS-level libraries using the native package management system on OSX, Ubuntu and CentOS.
- Update my database daemons and other system services.
- Upgrade development libraries.
- Pull upstream project source code (‘git pull origin master’) into all my clones.
Doing all this every time I hop to a different machine was tedious, so I wrote a few BASH scripts to help. More importantly, I recently packaged them into a public GitHub project and released it! Homeboy is a set of small, plain BASH scripts. After following the simple installation instructions, you just run homeboy every morning:

$ homeboy

This will run the updates you specify, though I’ve only included the stuff I use regularly: namely brew (OSX), rvm (OSX and Linux), apt (Ubuntu), yum (Red Hat/CentOS) etc. I ask that you submit pull requests to add support for updating Perl, Python, MacPorts etc. The synchronization mechanism works by zipping the specified list of files into a .zip in a synchronized directory managed by Dropbox, SugarSync etc. “Pushing” your current set of files to Dropbox is done via:

$ homeboy-push

After pushing, the next time ‘homeboy’ is run on any configured machine, the .zip file will be unzipped into your home directory. It’s really not complicated, but it saves time by sparing you from making the same change a bunch of times across different machines and platforms, and from the subtle differences that creep in when you do.

When using the git options, homeboy assumes you have a single directory where all your clones are kept, such as ~/Developer/git. Every subdirectory that looks like a git clone will have ‘git pull origin master’ run inside it.
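Homeboy itself is plain BASH, so the following is only a rough Ruby sketch of the logic described above, not the project's actual code. The helper names git_clones and pull_command are made up for illustration, and "looks like a git clone" is assumed to mean "contains a .git entry".

```ruby
require "tmpdir"
require "fileutils"

# Returns the immediate subdirectories of root that contain a .git entry.
# (Assumption: that is what "looks like a git clone" means.)
def git_clones(root)
  Dir.children(root)
     .map { |name| File.join(root, name) }
     .select { |path| File.directory?(path) && File.exist?(File.join(path, ".git")) }
     .sort
end

# Builds (but does not run) the pull command for one clone.
def pull_command(clone_dir)
  ["git", "-C", clone_dir, "pull", "origin", "master"]
end

# Demonstration against a throwaway directory tree, so nothing real is touched.
demo_clones = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "project_a", ".git"))
  FileUtils.mkdir_p(File.join(root, "notes")) # not a clone, should be skipped
  git_clones(root).map { |path| File.basename(path) }
end

puts demo_clones.inspect
puts pull_command("/home/me/Developer/git/project_a").join(" ")
```

To actually pull, each command could be handed to system(*pull_command(dir)). The sketch stops short of that so it can be run safely anywhere.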

Pretty silly stuff, right? But hey, all I do is run `homeboy’ every morning I plan on doing development work on a machine, and sip on a cup coffee while everything is brought up to date. 🙂

Please help test and submit pull requests!
2013.05.14
Quick FASTQ File Parsing Via Memory Mapping In C/C++

I recently had a need to speedily parse through 8GiB+ .fastq text files to calculate a simple statistic of genomic data. My initial “pfastqcount” implementation in Ruby worked fine, but with many files to process took longer than I had hoped in addition to consuming an alarming amount of CPU. I ended up reimplementing the pfastqcount command-line program in C, which takes one or more .fastq files, memory maps them, and creates the statistic. Simply dropping my algorithm down to raw C significantly sped up the process and reduced CPU usage, especially coming from an interpreted language. If any of you bioinformaticians find the need to implement a FASTQ data processing algorithm in C, I encourage you to fork the project and use it as a template. The project is Apache 2.0 licensed for your convenience and publicly available on GitHub.
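For readers who haven't met the format, here is a minimal Ruby sketch of walking FASTQ records. It is not the pfastqcount code, and it reads line by line rather than memory mapping. The post doesn't say which statistic was computed, so counting reads and bases is purely an assumption, as is the common four-line record layout (header, sequence, "+" separator, quality). The fastq_counts name is hypothetical.

```ruby
require "stringio"

# Counts records and total bases in a FASTQ stream.
# Assumes each record is exactly four lines: @header, sequence, +, quality.
def fastq_counts(io)
  reads = 0
  bases = 0
  io.each_slice(4) do |header, sequence, separator, _quality|
    # Stop at trailing blank lines.
    break if header.nil? || header.strip.empty?
    unless header.start_with?("@") && separator.to_s.start_with?("+")
      raise ArgumentError, "malformed FASTQ record near read #{reads + 1}"
    end
    reads += 1
    bases += sequence.to_s.strip.length
  end
  { reads: reads, bases: bases }
end

# Tiny in-memory sample standing in for a real .fastq file.
sample = StringIO.new(<<~FASTQ)
  @read1
  GATTACA
  +
  !!!!!!!
  @read2
  ACGT
  +
  !!!!
FASTQ

counts = fastq_counts(sample)
puts "#{counts[:reads]} reads, #{counts[:bases]} bases"
```

The same loop works on a real file via File.open(path) { |f| fastq_counts(f) }, though on multi-gigabyte inputs it will show exactly the kind of slowness that motivated the C rewrite.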

2011.11.17
Ruby: On The Perl Origins Of “&&” And “||” Versus “and” And “or”.
Overview

Avdi Grimm has a recent post noting the use of “and” and “or” Ruby keywords as essentially control flow operators, and hinting at their Perl origins. This instantly recalled several mental notes of old Perl programs, so I thought I’d put out a few quick notes on the Perl equivalents for Ruby programmers not versed in Perl 5.

Perl Semantics

Perl actually borrows the precedence rules of “&&” and “||” from C, though I’m not entirely convinced the Perl and Ruby semantics of “and” and “or” are identical. We have to remember that Larry Wall is a linguist, and that some of Perl’s idiosyncrasies are due more to human considerations than machine. Programming Perl has several pages of great content on “and” and “or”. For example, here’s an excerpt from the circa 2000 version of Programming Perl:

But you can’t just up and replace instances of || with or. Suppose you change this:

$xyz = $x || $y || $z;

to this:

$xyz = $x or $y or $z; # WRONG

That wouldn’t do the same thing at all! The precedence of the assignment is higher than “or” but lower than ||, so it would always assign $x to $xyz, and then do the ors. To get the same effect as ||, you’d have to write:

$xyz = ($x or $y or $z);

The moral of the story is that you must still learn precedence (or use parentheses) no matter which variety of logical operator you use.

What Avdi says about precedence and control flow seems correct, though the reason for it, in Perl at least, is to differentiate between two subtly distinct human linguistic semantics. For example, consider the following two English statements.
1. I’m either traveling by car or just staying home.
2. I’m either traveling by car or by train.
At first glance the semantics seem identical, but on closer inspection they are completely different. Let’s look at each.

I’m either traveling by car or just staying home.

In Perl, this is the equivalent of:

$traveling_by_car = can_use_car($me) or stay_home($me);

The semantics of or is to describe consequence (aka control flow) in a specific order. Either I’m traveling_by_car(), or else screw the whole thing and I’ll just stay_home() instead, in which case it makes little difference what the value of $traveling_by_car is. In other words, the semantics of the assignment trump that of the consequence, and should the consequence occur (which in this case probably has side effects), the assignment probably doesn’t matter. This is why Perl users often use or in the following context instead of ||:

open FOO, $file or die "a horrible death: $!";
@lines = <FOO> or die "$file is empty?";
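
The assignment-then-consequence shape carries over to Ruby in this particular case. A minimal sketch, assuming a hypothetical lookup helper and settings hash that are not from the post:

```ruby
# Hypothetical helper: returns the configured value, or nil when the key is missing.
def lookup(settings, key)
  settings[key]
end

settings = { "editor" => "vim" }

# Parses as (editor = lookup(...)) or raise(...): assignment first, consequence second.
editor = lookup(settings, "editor") or raise KeyError, "no editor configured"

begin
  # The lookup returns nil, so the assignment happens and then the raise fires.
  shell = lookup(settings, "shell") or raise KeyError, "no shell configured"
rescue KeyError => e
  failure = e.message
end

puts editor
puts failure
puts shell.inspect
```

Here shell ends up nil, but once the raise has fired that value hardly matters, which is the point being made about consequence.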

I’m either traveling by car or by train.

In English, this is a misleading statement of truth for several reasons.
1. What we mean is that one of the two options is true, but one and only one. As programmers we know this better as an xor statement (A is true if and only if B is false, and vice versa), but since there is no colloquial English equivalent of xor, we instinctively infer the meaning from the speaker’s misstatement.
2. As programmers we tend to look at || as a short-circuit operator evaluated left to right. But this English statement defines no explicit evaluation order. This could have been written “I’m either traveling by train or by car” (notice “car” and “train” are reversed), but it means the same thing. In other words, we are evaluating the arguments for truth, not consequence. Thus, order should not affect truth.
Closing Thoughts

In most real-world Ruby contexts these operators will work interchangeably so long as precedence order is considered. But even so, keep in mind that mixing the two styles in different contexts is not necessarily a sign of inconsistency. Using the appropriate operator may actually be a sign of maturity: a way of communicating slightly different semantics in an otherwise logically equivalent context.
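
The precedence pitfall from the Programming Perl excerpt can be reproduced in Ruby. This is a small sketch with made-up variable names, showing only the assignment case and nothing broader about how the two languages compare:

```ruby
x = nil
y = "fallback"
z = "last resort"

# || binds tighter than =, so the whole chain is evaluated before assigning.
with_pipes = x || y || z

# `or` binds looser than =, so this parses as (with_or = x) or y or z.
with_or = x or y or z

# Parentheses restore the || behaviour.
with_parens = (x or y or z)

puts with_pipes.inspect   # "fallback"
puts with_or.inspect      # nil
puts with_parens.inspect  # "fallback"
```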
2010.08.04
Offering Developers Startup Equity, A Dialog

I have this conversation about once a month, generally with a well-intentioned dreamer new to the software space who doesn’t understand why I can’t accept projects for equity. I may be exaggerating slightly, but it sure feels this way… 🙂

Preston: Hi Bill, nice to meet you. How can we help you develop your online venture?

Bill: I have a unique web startup opportunity worth $4B and am accepting HTML experts to implement it.

Preston: [immediately suspicious of the phrase “HTML expert”] Ok, you have my attention. What’s the business plan?

Bill: It’s essentially a combination of eBay, Facebook…

Preston: [senses where this is going]

Bill: …Slashdot and TheSuperficial.

Preston: It’s a news and auction site for celebrity social networks?

Bill: No no no, it’s more like Google meets MySpace.

Preston: Like.. Orkut?

Bill: Kinda, but simpler.

Preston: [completely confused] Back to the business plan part for a minute. Could you tell me about the nature of the business? Is this an ad-based site?

Bill: No.

Preston: Ahh, ok. Some sort of subscription thing then like Salon or TheOnion?

Bill: No way. Users hate paying for stuff. It’ll affect our bottom line. We’re going to keep it free for everybody. And green. We should probably add a database of sites using ecological products. And videos, of course.

Preston: [now confident of where this is going] Let me restate the question. Where did that $4B figure come from?

Bill: YouTube was bought out for $18B. Google will be all over this after we capture 10% market share.

Preston: [completely ignores the issues with those two sentences] I see. To be completely honest, I should share a couple general thoughts. [brings up telephone script #4 from personal wiki] We haven’t talked about budgets at all, but I assume this is an equity-share idea, and I’m really honored you thought of us. There are a lot of great people out there, and I’m happy and thankful to have stood out. Unfortunately, we’re not accepting equity-based projects at this time for two primary reasons. First–and again all in frank honesty–we have the technical, business and other resources to implement these things on our own without external partners. We have a lot of great ideas, and it makes the most sense for us to pursue them internally. Secondly, it’s our goal to treat employees the way we all want to be treated: with respect, recognition and great benefits. That comes with cash flow requirements we just can’t meet with equity-heavy relationships. I’m going to email you some contact information for other resources you may want to follow up with directly, and I think you’ll find that reputable software engineering shops will share these two sentiments in common as a matter of prudence. We look forward to working with you in the future, however, and we’ll keep in touch periodically to check up on you!

[exchange of pleasantries]

2009.02.11
What If Ruby Had Final Variables Like Java Or Erlang?

After a long confusing Ruby debate today at OpenRain on the merits of functional, Erlang-esque write-once-read-many variables, I’m going to step onto the podium and just say it… Ruby should get “final” or “const” variables in a similar semantic style to Java, except at runtime. Rather than ramble on for 12 paragraphs explaining exactly how this might work, read this fictitious Ruby code snippet instead. (Optional: Also check out the chapter on “final” in Hardcore Java.)

Final variables like this are really just an inline TDD mechanism.

Allowing local stack data to be constant provides no functional enhancements to the software, but alleviates the need for certain types of tests by using the compiler and/or runtime to assert certain memory is immutable. The “friend_best” method variant in the code snippet would obviously break most existing Ruby programs, but ups the bar for defensive programming by preventing many common bugs out-of-the-box while still providing support for traditional Ruby variables. At the very least we should have something like “friend_better”. Adding this information to the parse tree will also make it easier for IDEs to provide features more easily implemented for static languages.
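
For comparison, here is a sketch of what stock Ruby offers at runtime today. It is not the fictitious snippet referred to above, and the variable names are invented. It only shows that freeze protects an object while the variable holding it can still be rebound, which is the gap a “final” local would close.

```ruby
# freeze makes the object immutable at runtime...
friends = ["alice", "bob"].freeze

begin
  friends << "carol"
rescue FrozenError => e # FrozenError was added in Ruby 2.5
  mutation_error = e.class
end

# ...but the variable itself is not final: rebinding is silently allowed.
friends = ["mallory"]

puts mutation_error
puts friends.inspect
```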

TDD/BDD is in–no qualms about it–but we can make our code safer, cleaner and more concise by applying some of the lessons learned by our statically-typed language cousins over the last few decades.

2009.01.01
iPhone Developers May Now Speak… Almost

Apple announced this morning that the NDA preventing developers from holding open development discussions will be lifted. While details of the new agreement are not yet available, we are already beginning to see changes in the iPhone development landscape. Details on the first Phoenix iPhone Developer Group meeting will be announced tomorrow morning on the OpenRain blog!

Publishers are also rejoicing, as many have been effectively sitting on completed books in anticipation of today. iPhone SDK Development by The Pragmatic Programmers is already available for immediate electronic download, and an O’Reilly representative has informed me that O’Reilly Media has just released iPhone Forensics.

It begins.

2008.10.01
Identifying Senior Software Engineers: Six Critical Differences
For HR and legal purposes, most development companies classify Software Engineers into ranks from I to IV (or V). The higher the rank, the higher the responsibilities, expectations, independence and pay grade. To cut it as an interviewer and manager, you’ll need to classify people accurately with a minimum amount of direct personal exposure: a non-ideal but practical requirement of most hiring processes.

While we don’t regularly use titles at OpenRain, we nevertheless have to distinguish senior talent. The core issue is, “How do you objectively identify ‘senior’ engineering qualities?” Today we’ll focus on several key factors always present in quality engineers, independent of language and platform.

Instinct

He/She has developed extraordinary technical relevance filtering to the point of being able to scroll through a never-before-seen 500 line file in a language they don’t know, and tell you:
- how complicated the code is.
- where potential bugs are.
Even with no formal knowledge of code smells or design patterns, a senior developer can sense ugly code and architecture from a mile away even if they don’t yet know exactly why.

Foresight

Long-term implications are always on the mind of the Senior Software Engineer. They’ve been through the end-to-end development process (from requirements gathering to product maintenance and end-of-life) numerous times, know what issues are going to arise and will point out a suitable solution long before the symptoms start to appear. (This quality thus becomes most apparent after delivery when work is bombarded with never-before-seen use cases.) The truly elite developer is often hard to identify because they’re solving the important issues before anyone else notices the problem. (Ben is a primary example of this extraordinary perceptiveness.)

Results Focus

Knowledge without application leads to arrogance without insight. Senior developers are always focused on results which stand the test of time and can easily see through posers who fluff their way through status meetings.

Communication

New developers seldom understand the required differences of communication between different types of stakeholders. Newbies tend to treat all stakeholders as authoritative figures, and are quick to lose direction when exposed to people with differing incentives. The criticality of non-verbal developer communication is also apparent to the senior engineer. For example, a green engineer may see issue tracking as micro-management, automated testing as an ideological obsession, and project planning as administrative overhead, but these are all monumentally important aspects required to keep all developers and stakeholders in a real-time communications loop, since many do not directly interact. A senior engineer will see these concepts as empowering and often get grouchy when they are absent, because not having clear priorities and documentation introduces roadblocks to results.

Time & Priority Management

A senior engineer can more-or-less tell you what their schedule looks like a week out, even if it’s not written down, and won’t be hesitant to express any issues with workload ahead of time.

Estimation

New software engineers seem to invariably produce time estimates orders of magnitude off from reasonable numbers. The issue largely appears to be one of Foresight, since accurate estimates are often best produced by benchmarking current requirements against past similar project experiences: a task more easily accomplished with experience. This issue is an arguing point against the “Customer is Always Available” aspect of eXtreme Programming, since a green developer is generally more likely to over-commit to a workload than a senior.
2008.08.20
Parallels Server Pricing: Redux

After a few grumpy emails between myself and our Account Manager, I’m happy to report that we have purchased the GA release and it’s working well. If you are using Parallels Server for internal development purposes and not for hosting, they will extend a more reasonable price per machine: $200 + $50/year maintenance. I think that’s a very reasonable price point for our usage, and am happy to pay it.

This likely has more to do with meeting end-of-Q2 sales quotas than attracting my dinky business, but regardless, a win is a win! Thanks!

2008.06.25
Parallels Desktop Coherence Mode Rocks: OS X/Windows XP Screenshot

I tried Parallels Desktop’s Coherence mode today, and was so blown away I had to blog about it immediately.

The above image has not been doctored. It’s my normal OS X desktop with Windows XP running in coherence mode. When activated, the window around the XP virtualization session vanishes, the XP taskbar integrates into your OS X desktop, and XP application windows are free to float around. With Parallels Tools installed each XP application has a dock item which can be Command-Tabbed to. If you look closely you can see I’m running IE 6 next to Safari, both natively, without the visual distraction of the virtualization window. This is a huge usability landmark. Thank you Parallels!

Try it yourself by selecting the View -> Coherence menu option when running Parallels Desktop.

(Question: Does VMware currently have a feature like this?)

2007.04.23