Several months ago I took up a monster of a project I had no clear idea how to tackle. What was abundantly clear was that I would need to gain a broad-reaching understanding of company policies, internal systems, and open source technologies to succeed. I had one quarter to deliver meaningfully. In an act of desperation, I reached for my coding agent as a research assistant.
I have previously released artifacts for Apache HBase using macOS and Windows 11 + WSL2. Now I am running a native Linux installation, and so I again have some minor details to work through. This install is built on systemd, which is of minor concern. More interestingly, I decided to drop Docker and instead use Podman and crun as my interface to Linux containers.
While working on HBase bug fixes and feature development, it’s often quite convenient to test changes on a local-mode HBase. This is done by running HBase right out of your developer sandbox. Though a lot of HBase development happens on Macs these days, it’s a system designed first to run on Linux. That means there are a couple minor annoyances for non-Linux users. Let me show you how I work around one of them.

Between HBaseCon and Hadoop Summit I took a short trip to Europe. I got to spend some more time working alongside Nicolas and meet some of the Scaled Risk crew. I also took a small holiday through the hills of Romania! Along the way, I was invited to present for both the Paris HPC Meetup and the London HBase Meetup.
Every year at Hadoop Summit there’s a little un-conference, called the Birds of a Feather sessions, or BoF for short. These are topical meetups that take place after the conference proceedings and are open to non-attendees. This year I helped organize the HBase BoF, along with Subash D’Souza.

The Latency Talk Nicolas and I gave at HBaseCon has been accepted for Hadoop Summit San Jose. If you missed us at HBaseCon, you get one more opportunity! We’re speaking on June 4th at 3:25p.
See you in June!
Edit: Unfortunately, Nicolas was unable to make it so I presented solo. I hope I did his section justice.

HBaseCon was another fantastic conference this year! It’s a great resource for information about and around HBase, no matter where you are along your path. This year I presented a talk along with a colleague of mine, Nicolas Liochon of Scaled Risk fame. Our topic: HBase as an online, low-latency system.
The HBase BlockCache is an important structure for enabling low latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how do you know when to use one over the other? There’s a little bit of guidance floating around out there, but nothing concrete. It’s high time the HBase community changed that! I did some benchmarking of these implementations, and I’d like to share the results with you here.
Note that this is my second post on the BlockCache. In my previous post, I provide an overview of the BlockCache in general as well as brief details about each of the implementations. I’ll assume you’ve read that one already.
Edit: The sequel post, BlockCache Showdown, is now available!
HBase is a distributed database built around the core concepts of an ordered write log and a log-structured merge tree. As with any database, optimized I/O is a critical concern for HBase. When possible, the priority is to not perform any I/O at all. This means that memory utilization and caching structures are of utmost importance. To this end, HBase maintains two cache structures: the “memory store” and the “block cache”. The memory store, implemented as the MemStore, accumulates data edits as they’re received, buffering them in memory. The block cache, an implementation of the BlockCache interface, keeps data blocks resident in memory after they’re read.
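For reference, the block cache is steered from hbase-site.xml. A minimal sketch, assuming an HBase 0.96-era deployment — these property names are real, but the example values (and the units of the BucketCache size) are illustrative and vary by version, so check your release’s documentation:

```xml
<!-- hbase-site.xml -->
<!-- Fraction of the JVM heap given to the default on-heap LruBlockCache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
<!-- To try one of the alternative implementations, enable the BucketCache. -->
<!-- "offheap" is one option; it also requires a MaxDirectMemorySize JVM flag. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
```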
This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.
“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”
This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive. The whole mess was tested to work on a tiny little 5-node cluster running HDP-1.3.2, which means Hive 0.11.0 and HBase 0.94.6.1.
This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is now available.
One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself. This is a topic that we did not get to cover in HBase in Action; perhaps these notes will become the basis for the 2nd edition ;) These notes are applicable to Hive 0.11.x used in conjunction with HBase 0.94.x. They should be largely applicable to 0.12.x + 0.96.x, though I haven’t tested everything yet.
I spent last week in NYC at this year’s Strata+Hadoop World, where I was invited to speak. The title of this talk is the same as the talk I gave at the Big Data Deep Dive in May, but the content received a thorough overhaul. Thanks to all the attendees and friends who gave me great advice on this first go-around. Hopefully the improvements were helpful.

Wow, what a busy summer. In addition to Hadoop Summit, HBaseCon, and a little holiday, I managed to squeeze the foundation patches for a client-managed data type API into HBase 0.95.2. I also received word that my proposal to speak at Strata/NYC was accepted!
My work on adding data types to HBase has come along far enough that ambiguities in the conversation are finally starting to shake out. These were issues I’d hoped to address through initial design documentation and a draft specification. Unfortunately, it’s not until there’s real code implemented that the finer points are addressed concretely. I’d like to take a step back from the code for a moment to initiate the conversation again and hopefully clarify some points about how I’ve approached this new feature.
Edit: this entry has been cross-posted onto the Apache HBase blog. You might find more comments and discussion over there.
I find Cascalog’s choice of name for the lazy-generator to be a bit of a misnomer. That is, it’s not actually lazy! The lazy-generator entirely consumes your lazy-seq into a temporary tap. This necessary inconvenience results in a convenient side-effect, however.
I had the honor of presenting to a full house at FOSS4G-NA 2013 this May. This is a rough transcript of that presentation. Just like my talk at the Big Data Deep Dive, no recording was made, as far as I’m aware. So just like that transcript, this is a recitation from memory.
The deck is available on slideshare, and embedded at the bottom of the post.
In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most anticipated new features is YARN, an entirely new way to think about deploying applications across your Hadoop cluster. It’s easy to think of YARN as the infrastructure necessary to turn Hadoop into a cloud-like runtime for deploying and scaling data-centric applications. Early examples of such applications are rare, but two noteworthy examples are Knitting Boar and Storm on YARN. Hadoop2 will also ship a MapReduce implementation built on top of YARN that is binary compatible with applications written for MapReduce on Hadoop-1.x.
The HBase project is raring to get onto this new platform as well. Hadoop2 will be a fully supported deployment environment for the HBase 0.96 release. There are still lots of bugs to squish and the build lights aren’t green yet. That’s where you come in!
I was invited to speak at the Seattle Technical Forum’s first “Big Data Deep Dive”. The event was very well organized and all three presentations dove-tailed into each other quite well. No recording was made of the event, so this is a transcription of my talk based on notes and memory.
The deck is available on slideshare, and embedded at the bottom of the post.
Aside from re-skinning the place, I’ve been pretty quiet here lately. I’m busy working on my type system experiment (HBASE-8089) and simplifying interoperability between HBase and Pig (PIG-2786, PIG-3285), Hive (HIVE-2055, HIVE-2379), and HCatalog (HCAT-621). I’m also preparing some talks for later next month. The first one will be here in Seattle (Bellevue, really) and the second in Minneapolis. If you’re able to make either one, do step up and introduce yourself.

You use git and have a Dropbox account, right? Here’s a little trick I use from time to time for archiving Git repositories. Create a bare repository in your Dropbox account and push a mirror. Now you can delete your local sandbox, but you’ll still have the full history available if you need it later. Sure, you could set up private repos on Github, but that’ll become expensive fast, while Dropbox is free, at least from the beginning.
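In case that’s too hand-wavy, here’s a sketch of the whole round trip. The temp directories stand in for your Dropbox folder and working sandbox, so substitute your real paths (e.g. ~/Dropbox/archive):

```shell
# Stand-ins for your Dropbox folder and working sandbox; substitute real paths.
DROPBOX="$(mktemp -d)"
WORK="$(mktemp -d)"

# A throwaway repository with one commit, playing the part of your project.
cd "$WORK"
git init -q .
git -c user.name=me -c user.email=me@example.com commit -q --allow-empty -m "initial"

# 1. Create a bare repository inside the Dropbox folder.
git init -q --bare "$DROPBOX/myproject.git"

# 2. Push a full mirror: every branch, tag, and ref.
git remote add dropbox "$DROPBOX/myproject.git"
git push -q --mirror dropbox

# 3. Delete the sandbox whenever; later, resurrect it from the archive.
git clone -q "$DROPBOX/myproject.git" "$WORK/restored"
git -C "$WORK/restored" log --oneline   # the full history is intact
```

Because `--mirror` pushes every ref, the archive is a complete copy, not just the current branch; Dropbox then syncs the bare repository like any other folder.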
Yesterday I spoke at this month’s Seattle Scalability Meetup. My topic didn’t deviate too far from what was originally posted. Here are the slides. If you were able to join us yesterday, please take a moment to leave some feedback.
With Posterous shutting their doors, I’m finally motivated to reexamine the web space I don’t really maintain. The whole point of choosing Posterous was to have a minimal barrier to posting. To that end, the string of short-text-plus-images posts proves the format effective. In search of a replacement, I’m not excited about anything I’ve found. However, since finishing the book, I have a number of ideas and half-written pieces to share. So, it’s time to make something work.
HBasecon2012, the first of its kind, happened on Wednesday. I had the honor of presenting a lightning talk at the end of one of the Applications tracks. I shared a little of what I’ve learned over the last couple months in the new-to-me domain of GIS. I think the talk went well, despite my nerves, because I had many good questions from the audience. I look forward to continuing the work and providing more details the next time around.
Seattle Log: the 18th day of January in this year of our Lord, two-thousand and twelve.
It just happens I took a photo of yesterday’s blue sky. Today is quite the contrast.
If a window is smashed in the night and no one is awake to hear it, does it make a sound?
I just returned from a couple weeks of work in San Francisco. On the way down, I snapped some shots as we were departing SEA. Through the magic of white-balance correction I’ve managed to pull out a few of the nicer ones; I’m pleased with the results. Enjoy.
Salmon and asparagus beside clams and leeks drowned in Chardonnay. The asparagus ended up a little over-cooked but the seafood was perfect. I now need to make good on 1.5c of delicious clam juice. Yum!
Yesterday I was gifted these lovely flowers, fresh from Pike’s Market. Lacking any kind of vase, I found a use for the gurgling pot.
This project has been a long time coming. Last week I purchased a 2009 Yamaha FZ6R from a friend of a friend. While technically not my first bike, it’s the first one I’ve ridden (as opposed to worked on). I went for my first ride on Saturday and stopped by a friend’s house along the way. He’s also an amateur photographer and kindly snapped a few shots for me.
A recent post in Read Write Web calls out a short and sweet quote which hits close to home:
For when I can’t find this later:
I had the good fortune of being invited out for Drum and Bass night on Saturday – who knew Temple Billiards has a basement AND a weekly show with a thriving community? There was a lot more nuance to the beats than I expected though it could have been louder. The people were really friendly and very into it. No one seemed to mind me using my camera and I enjoyed playing in the low-light.
Great live show. Ravenous energy consumed the crowd. Authentic sound.
I start a new consulting position on Wednesday. This employer requires that I submit to a background check, including providing my last 7 years of residences. I had most of the information compiled from a previous bureaucratic encounter, so I just had to add a couple more addresses.
Here’s a couple charts from my new toy. I really like having access to the statistics regarding my runs. With the ability to evaluate past performance and make calculated changes to improve future performance, I might consider calling this “training”.
Only cook the pasta to 2/3 done. Pull and place in a bowl of room-temp water. By the time you’ve cooked all your pasta, they’ll have absorbed enough additional liquid to be “cooked”. Plus, they’re far less likely to rip while handling. Plus, they won’t stick together between the pot and assembly.
Three hours until my YC interview and all I can think to do is write a blog post. As you well know, I don’t really blog. This morning, however, I’m compelled to put thoughts to ether. Thoughts surrounding the sequence of decisions which have led me to this spot: sitting in a hotel room in the Silicon Valley, wearing a waffled hotel robe, drinking pretty decent hotel coffee, preparing to go in front of people upon whom I’ve had an internet crush for the last 6 years and compel them to give me - over all the other people they will interview today - a boost in starting my company.
Turns out the Fit Sport comes stock with some nice rims which all the kids are boosting these days. Guy at the dealership says the only reason they had the wheels in stock is because they see 4-5 of these a month.
VMWare buys Spring? I’m quite sure the sky is falling. Who’d have thought Spring would be worth $400+ mil?
Word is, Posterous is the shite. Here’s to learning. Also, the option of expressing myself to the tubes in more than 140 characters is new and exciting.
As if I needed to rub it in, here’s a shot from my back porch. And my family wonders why I won’t move back to the Midwest, HA!


