Home Contact Sitemap

Preston Lee

Founder, CEO, OpenRain.com

About

During the day I run OpenRain in Phoenix: a Ruby on Rails development shop. During the night... well... I still have to run OpenRain, but when I find time to breathe, I exhale here. If you'd like to get in touch professionally, don't hesitate to contact me via the OpenRain website.

Peace,

Preston

OpenSolaris ZFS vs. Linux ext3 RAID5

Preston Says: I asked Dan McClary for a big favor recently: use his general UNIX knowledge and graduate-level statistics voodoo to produce a report highlighting performance characteristic differencess between OpenSolaris ZFS and Linux RAID5 on a common, COTS hardware platform. The following analysis is his work, reformatted to fit your screen. You may download the PDF, HTML, graphs and original TeX source here.

A Brief Comparison of I/O Performance for RAIDZ1 and RAID-5 Filesystems
Dan McClary
June 28, 2007

Introduction

The following is a description of results obtained benchmarking I/O performance for two OS/filesystem combinations running identical hardware. The hardware used in the tests is as follows:

  • Motherboard: Asus M2NPV-VM.
  • CPU: AMD Athlon 64 X2 4800+ Dual Core Processor. 2.5GHz, 2?512KB, 1GHz Bus
  • Memory: 4 x 1GB via OCZ OCZ2G8002GK DDR2-800 PC2-6400
  • Drives: 4 x 500GB Western Digital Caviar SE 16 WD5000AAKS 7200RPM 16MB Cache SATA 3.0Gb/s

The Linux/RAID-5 combination uses a stock Ubuntu Server Edition installation, running kernel 2.6.19-generic, with RAID-5 configured via mdadm and formatted ext3. The Solaris/RAID-Z1 configuration is a stock installation of Solaris Developer Express Edition with zpool managing the zfs-formatted RAID-Z1 drives. Block size for all relevant tests is 4096 bytes.

Basic I/O testing is conducted using bonnie++ (version 1.03a), tiobench (version 0.3.3-5), and a series of BASH-scripted operations. Tests focus on I/O throughput and CPU usage for operations either much larger than available memory, and very large numbers of operations on small files. All figures, unless otherwise noted, chart mean performance with 2% deviation for large-file operations and 5% for small-file operations. These bounds well-exceed the 95% confidence interval, implying a range of high significance.

Large-File Operations

In dealing sequential reads and writes, particularly of large files, the Solaris/RAID-Z1 configuration displays much higher throughput than the Ubuntu/RAID-5 combination. Latency and CPU usage, however, appear to be higher than in the Ubuntu configuration. The reasons for these disparities are not determinable from the tests concluded, though one might venture that the management algorithm used by ZFS and each systems caching policies may play a part.

picture-2.png

Figures 1, 2, and 3 summarize large-file writing performance in the bonnie++ suite. In large writes, Solaris-ZFS displays marginally higher throughput and occasionally lower CPU usage. However, the disparities are not great enough to make a strict recommendation based solely on large-file writing performance.

picture-4.png

picture-5.png

picture-6.png

picture-7.png

picture-8.png

picture-9.png

Figures 4 and 5 illustrate throughput and CPU usage while reading large files in the bonnie++ suite. Generally, results are consistent between platforms, with the Ubuntu configuration showing a slight edge when reading 15,680MB files (though with an associated drop in CPU efficiency).

tiobench results for random reads and writes given in Tables 3 and 4 show the Ubuntu/RAID-5 configuration displaying both higher throughput and greater CPU efficiency. However, these results seem somewhat questionable given the results in section §3.

picture-10.png

picture-11.png

Small-File Operations

In examining the performance of both configurations on small files, both in the bonnie++ suite and from shell-executed commands, the most obvious statement that can be made is that the Solaris configuration displays greater CPU usage. This, though, may not be indicative of poor performance. Instead, it may be the result of an aggressive caching or other kernel-level policies. A more detailed study would be required to determine both the causes and effects of this result. In each test, 102,400 files of either 8 or 4KB were created.

picture-12.png

picture-13.png

Figures 6(a)-6(c) and 7(a)-7(c) illustrate bonnie++ performance for both configurations. In contrast to the tiobench results, the Solaris configuration generally displays slightly higher throughput (on the order of 1-2MB/s) than its counterpart. However, as previously noted, CPU usage is much higher.

Finally, Tables 5-6 lists measured times as given by the standard Unix command time when measuring command execution. In these results, there are some surprises. The Ubuntu configuration performs somewhat faster when executing a large write (using the command dd). However, the Solaris configuration is much faster when dealing with 100,000 sequential 8KB files. For reference, all file creation is done via dd, copying by cp and deletion by rm.

picture-14.png

picture-15.png

Conclusions

Few overarching conclusions can be drawn from the limited results of this study. Certainly, there are situations in which the Solaris/RAID-Z1 configuration appears to outperform the Ubuntu/RAID-5 configuration. Many questions remain regarding the large discrepancy in CPU usage for small-file operations. Likewise, the Ubuntu/RAID-5 configuration appears to perform slightly better in certain situations, though not overwhelmingly so. At best, under these default configurations, one can say that overall the Solaris configuration performs no worse, and indicates that it might perform better under live operating conditions. The latter, though, is largely speculation.

Indeed, from the analyst’s point of view, both configurations show reasonable performance. The desire to deploy either configuration in an enterprise setting suggests that significant-factor studies and robust parameter designs be conducted on, if not both candidates, whichever is most likely to be deployed. These studies would provide insight into why the discrepancies in current study exist, and more importantly, achieve optimized performance in the presences of significant uncontrollable factors (e.g. variable request-load).

Preston Says: Thanks for the outstanding work, Dan!

Share/Save/Bookmark

Tags: , , , , , ,

. 17 Jul 07 | Computer | Comments (5)

Software Engineering Curse Words

images-1.jpeg

Here lie terms frequently used in software development which I don’t particularly care for.

Programmer

Commercial software is as much about programming as building bridges is about installing steel I-beams. Writing actual code is only part of the engineering effort. When I see a job posting entitled “Java Programmer” I usually suspect that this is either (1) a low-level monkey position and/or (2) the person behind the post doesn’t really understand the scope of typical developer work.

Developers are required–much unlike the mechanical nature of an assembly line worker–to make decisions and assumptions about the external purpose and internal nature of their work, often part of a seemingly ingrokable ecosystem. Unless you have a micro managing boss or a heavy-weight process itemizing every last byte of work, you must personally exercise critical thinking, time management and interpersonal skills to balance your never ending stream of unclear and incompletely stated priorities. Being a successful programmer thus requires much more than programming knowledge.

The Point: The term “programmer” in an inaccurate trivialization of the real job. I prefer “Software Engineer” formally and “developer” in colloquial usage.

Senior

For HR purposes, “Senior” is a nice way of labeling someone as having a bit more knowledge, responsibility, general weight, and more income than a non-senior person. The problem is that both senior and non-senior developers tend to have very similar job duties; so aside from income, the criteria of senior personnel are inherently qualitative, subjective, relative to a particular domain (read: not necessarily guaranteed to transfer being projects), and/or effectively indistinguishable from non-senior status.

The effect is that, in a matrix organization, a new project may end up with n00bs who are senior, experts who are junior, and a pay structure which reflects an old project now completely irrelevant to the current situation. Senior and non-senior developers often work together as peers, and everybody quickly figures out who the real leaders are. And that’s frequently very different from the formal structure and correlating pay grade.

The Point: “Senior” tells me that you’re expecting to make more and are probably good at something, which may or may not be relevant to me. It’s not a global implication of elevated wisdom.

Architect

Most “software architects” I’ve met do far more operational and project management than architectural design work. This isn’t to say that they don’t or aren’t capable of making significant design contributions to the project, but that all the overhead of email and meetings between business/team/customer/whomever members sucks up so much time that lower level engineers have to either make the design decisions for the architect or block indefinitely as the architect plays Inbox-fu.

The Point: If you’re an “architect” who doesn’t have time to sit down with the engineers and talk about design, you’re really a technical manager who needs to officially delegate the design work to avoid becoming a bottleneck for the team.

Resource

I shudder whenever I hear or use this word, usually in a managerial, Mythical Man Monthian context trying to quantize everyone into tiny cube shaped units. I find it so important to account for individual character when planning and estimating that I consciously use the word “people” instead of “resources”; it’s a simple trick to force yourself into remembering the undeniable human individuality of the worker bee.

The Point: People aren’t Legos, so let’s not pretend they are.

Share/Save/Bookmark

Tags: , ,

. 06 Jul 07 | Computer | Comment (1)

Parallels Desktop Coherence Mode Rocks: OS X/Windows XP Screenshot

I tried Parallels Desktop’s Coherence mode today, and was so blown away I had to blog about it immediately.

parallels-coherence-mode-small.png

The above image has not been doctored. It’s my normal OS X desktop with Windows XP running in coherence mode. When activated, the window around the XP virtualization session vanishes, the XP taskbar integrates into your OS X desktop, and XP application windows are free to float around. With Parallels Tools installed each XP application has a dock item which can be Command-Tabbed to. If you look closely you can see I’m running IE 6 next to Safari, both natively, without the visual distraction of the virtualization window. This is a huge usability landmark. Thank you Parallels!

Try it yourself by selecting the View -> Coherence menu option when running Parallels Desktop.

(Question: Does VMWare currently have a feature like this?)

Share/Save/Bookmark

Tags: , , , , ,

. 23 Apr 07 | Computer | Comment (1)

5 Roadblocks To Enterprise Rails Acceptance

rails.pngI love Rails for its pragmatic design and agile culture: two qualities not usually associated with the large, enterprisey systems of Fortune 500 companies. In my last formal position I was part of a small internal movement to drive the Rails train upward through the IT ranks, but the effort was met with limited success. The unfortunately reality is that Rails currently lacks several key qualities to which enterprise project leaders have become accustomed. Here are five reasons of varying significance to start us off.

Insane Query Support

Most documentation you read about ActiveRecord will take you through tidy, minimalistic examples which are squeaky clean and really fast. Complex queries, however, will be easier to do using Model.find_by_sql, which accepts a raw SQL query. Ordinary dynamic finds with deep loading behavior may require you to hard-code names in the query to avoid issues with the generated SQL. ActiveRecord is way easier to use, but far from Hibernate. I’d say that over 95% of the queries issued by a larger application are of trivial or medium complexity, but a lot of time and your best developers go into that last 5%, and this is where the heavier OR/M frameworks start looking better than ActiveRecord.

Distributed Transactions

The rise in SOA interest over the last couple years has led to more applications using multiple data sources. While it is possible to nest transactions, “Rails doesn’t support distributed two-phase commits (which is the jargon term for the protocol that lets databases synchronize with each other).” (From Agile Development with Rails, 2nd Edition.) In many situations, simply nesting transactions will suffice; however, many situations should really have the safely and reliability of two-phase semantics, and this factor alone could be a deal breaker.

Data Integrity

Database Designers (DBDs) like FOREIGN KEY constraints, CHECKs, high levels of normalization, and are the natural enemy of null fields. In other words, DBDs don’t like Rails. While I’m certainly no Pedantic Data Nazi (PDN?), there should at least be a basic set of built-in mechanisms for generating such simple self-defenses against naughty applications. Frankly I’m surprised that the community isn’t pushing harder for solid constraint support within migrations.

IDEs

This isn’t technically an issue with Rails itself, but a roadblock to its adoption nonetheless. Most Rails developers (including myself) appear to be using TextMate. A smaller population use RDT, Emacs, or numerous other packages. But there isn’t yet an application which comes close to the basic core feature of the popular Java and .Net IDEs. The currently broken breakpointer is another swift kick in the pants. What I can do with Eclipse on a remote application server isn’t in the same universe of functionality as the Rails breakpointer, even when it worked.

Top-Down Push

For whatever reason, CTOs and CIOs haven’t yet become seriously interested in Rails, and without this air of implicit exploratory approval, managers seem reluctant to give in to antsy developers. I would love to see Rails become a flagship of agile enterprise projects, but that’s not going to happen until management sees the real ROI of a project done by experienced Rails developers.

None of these things are insurmountable, but there are many more challenges to overcome if Rails will ever sit on the same application servers as Java and .Net. What challenges have you faced with Rails at your organization?

Share/Save/Bookmark

Tags: , , , , ,

. 17 Apr 07 | Computer | Comments (17)

JXTA: Not The Solution To Java Peer Discovery

sun_jxta.gifOnly developers with hair should use JXTA, because those with bald or shaven heads won’t have anything to violently rip from their skulls while they develop with it. I have been, and continue to be excited by, JXTA’s potential, but have been very disappointed at the pace at which a project progresses when using it. JXTA’s capabilities, on the PowerPoint level, are impressive. It facilitates a great deal of networking features necessary for peer-to-peer operation and service discovery. So what’s my beef? A couple major areas off the top o’ me head..

Documentation

There isn’t exactly a massive community using JXTA. There are limitless possibilities of the platform and a few significant projects that use it, but it’s not exactly a common-place technology. That’s ok. Communities need time to grow. But to build a better mousetrap, people must understand why yours is better, and how to use it properly. At first I suspected I had jumped into the system at a particularly odd moment, but most of the documentation I’ve read is either out of date, or, in the case of much of the code itself, completely missing. This may come as a surprise to the good folks at jxta.org who provide many links to JXTA articles, but as a developer new to the platform sitting down and getting started, you’ll find yourself confused by deprecated and changed APIs without a clear understanding of the Right Way to do things. The - popular - books - are - long - outdated.

Testing

As an advocate of test-driven development, my application unit tests attempt to cover the interactions between multiple peers on the JXTA network. Doing so requires instantiating multiple cores within the same Java unit test process, and being able to reset them to initial states between test cases. Unfortunately, JXTA is designed as a singleton, which as we already know is not a friendly pattern to test-driven development. Couple this unfortunate design with the general difficulties of multi-threaded unit testing, and you’ll either be spending vast amount of time with your unit tests, or forgoing the complicated ones completely. Probably the latter. So what’s the solution? I’m not exactly sure, but I’ve started working on one.

Journeta

Currently code named “Journeta”, that goal is to create a greatly simplified, zero-configuration-required peer discovery and communication Java library for “trusted” networks. No configuration files, hefty learning curve or even constructor arguments, but no security or over-the-internet functionality either. (At least at the library level.) While I haven’t been actively developing it this year, I started the project last year over at OpenRain, and anticipate releasing a build sometime this summer. Let me know if you’re interested and I’ll ping you when we release a demo.

Share/Save/Bookmark

Tags: , , ,

. 14 Apr 07 | Computer | Comments (7)