Wow, what a busy summer. In addition to Hadoop Summit, HBaseCon, and a little holiday, I managed to squeeze the foundation patches for a client-managed data type API into HBase 0.95.2. I also received word that my proposal to speak at Strata/NYC was accepted!
Data Types != Schema
My work on adding data types to HBase has come along far enough that ambiguities in the conversation are finally starting to shake out. These were issues I’d hoped to address through initial design documentation and a draft specification. Unfortunately, it’s not until there’s real code implemented that the finer points are addressed in concrete. I’d like to take a step back from the code for a moment to initiate the conversation again and hopefully clarify some points about how I’ve approached this new feature.
Edit: this entry has been cross-posted onto the Apache HBase blog. You might find more comments and discussion over there.
Cascalog’s Not So Lazy-generator
I find Cascalog’s choice of name for the lazy-generator to be a bit of a misnomer. That is, it’s not actually lazy! The lazy-generator consumes your entire lazy-seq into a temporary tap. This necessary inconvenience results in a convenient side-effect, however.
Transcript of Bring Cartography to the Cloud With Apache Hadoop
I had the honor of presenting to a full house at FOSS4G-NA 2013 this May. This is a rough transcript of that presentation. Just like my talk at the Big Data Deep Dive, no recording was made, as far as I’m aware. So just like that transcript, this is a recitation from memory.
The deck is available on slideshare, and embedded at the bottom of the post.
How to Contribute to HBase and Hadoop2
In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most anticipated new features is YARN, an entirely new way to think about deploying applications across your Hadoop cluster. It’s easy to think of YARN as the infrastructure necessary to turn Hadoop into a cloud-like runtime for deploying and scaling data-centric applications. Early examples of such applications are rare, but two noteworthy examples are Knitting Boar and Storm on YARN. Hadoop2 will also ship a MapReduce implementation built on top of YARN that is binary compatible with applications written for MapReduce on Hadoop-1.x.
The HBase project is raring to get onto this new platform as well. Hadoop2 will be a fully supported deployment environment for the HBase 0.96 release. There are still lots of bugs to squish and the build lights aren’t green yet. That’s where you come in!
Transcript of HBase for Architects Presentation
I was invited to speak at the Seattle Technical Forum’s first “Big Data Deep Dive”. The event was very well organized and all three presentations dovetailed into each other quite well. No recording was made of the event, so this is a transcription of my talk based on notes and memory.
The deck is available on slideshare, and embedded at the bottom of the post.
Speaking This May
Aside from re-skinning the place, I’ve been pretty quiet here lately. I’m busy working on my type system experiment (HBASE-8089) and simplifying interoperability between HBase and Pig (PIG-2786, PIG-3285), Hive (HIVE-2055, HIVE-2379), and HCatalog (HCAT-621). I’m also preparing some talks for later next month. The first one will be here in Seattle (Bellevue, really) and the second in Minneapolis. If you’re able to make either one, do step up and introduce yourself.
Dropbox as a Git Archive
You use Git and have a Dropbox account, right? Here’s a little trick I use from time to time for archiving Git repositories. Create a bare repository in your Dropbox folder and push a mirror to it. Now you can delete your local sandbox, but you’ll still have the full history available if you need it later. Sure, you could set up private repos on GitHub, but that’ll become expensive fast, while Dropbox is free, at least to start with.
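A minimal sketch of that trick, wrapped in a small shell function. The name `archive_repo` and both of its arguments are hypothetical, chosen for illustration only; the sketch assumes your Dropbox folder is an ordinary directory on the local disk and that your Git is recent enough to support `git -C`.

```shell
#!/bin/sh
# archive_repo REPO_DIR ARCHIVE_ROOT
# Hypothetical helper (not part of Git or Dropbox): mirrors the repository
# at REPO_DIR into a bare repository named <repo>.git under ARCHIVE_ROOT.
archive_repo() {
    repo_dir=$1
    archive_root=$2
    name=$(basename "$repo_dir")
    bare="$archive_root/$name.git"

    # Create the bare repository that will hold the archived history.
    git init --bare --quiet "$bare" || return 1

    # Resolve to an absolute path so the push below works from any directory.
    bare=$(cd "$bare" && pwd) || return 1

    # Push every ref (branches, tags, and the rest) into the archive.
    git -C "$repo_dir" push --mirror --quiet "$bare" || return 1

    # Print where the archive ended up.
    echo "$bare"
}
```

Assuming a sandbox at `~/src/myproject` and an archive folder at `~/Dropbox/git` (both placeholder paths), `archive_repo ~/src/myproject ~/Dropbox/git` would create `~/Dropbox/git/myproject.git`. To get a working copy back later, `git clone ~/Dropbox/git/myproject.git` should do. It is probably wise to let Dropbox finish syncing before deleting the local sandbox.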
HBase Clients at Seattle Scalability Meetup
Yesterday I spoke at this month’s Seattle Scalability Meetup. My topic didn’t deviate too far from what was originally posted. Here are the slides. If you were able to join us yesterday, please take a moment to leave some feedback.
So Long Posterous
With Posterous shutting its doors, I’m finally motivated to reexamine the web space I don’t really maintain. The whole point of choosing Posterous was to have a minimal barrier to posting. To that extent, the string of short-text-plus-images posts proves the format effective. In search of a replacement, I’m not excited about anything I’ve found. However, since finishing the book, I have a number of ideas and half-writings to share. So, it’s time to make something work.