Open Source Hadoop with HBase to Provide Scalable Platform


When dealing with vast amounts of data you need a scalable distributed storage system. All of my database-driven web sites use MySQL, and for smaller databases you can run the site's search directly against MySQL. But you soon find out that a single database is not enough to deliver fast search on a site or, as in my case, to offer a search service over hundreds of millions of records of crawled niche data.

Recently Google hosted a Conference on Scalability in Seattle, where they talked about MapReduce, Bigtable, and other distributed systems for large datasets. Listed here are the talks, which are now available on Google Video:

(Kudos to Greg Linden for compiling the list of videos.)

The videos provide some technical detail, while Marissa Mayer's talk provides some insight into Google's big-picture plans.

Google's technology, however, is closed, so if you're interested in a solution that you can actually use, turning to open source projects is the way to go. This is where Hadoop with HBase comes in.


Hadoop is a framework for running applications on large clusters of commodity hardware. There's a lot of development going into Hadoop right now, mostly led by Doug Cutting and Owen O'Malley of Yahoo. In my experience, if you implement Hadoop you really need to stay on top of it and tweak it to suit your needs. To show how young Hadoop is, the current stable release is 0.13.0.
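Hadoop applications are typically written in the MapReduce style mentioned in the talks above. To give a feel for that programming model, here is a minimal in-memory sketch in plain Python. It is not Hadoop's API and does nothing distributed; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and the framework would normally run the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (the framework does this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in-process over an iterable of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop runs on clusters", "clusters of commodity hardware"]))
```

The point of the split is that each map call only looks at one record and each reduce call only looks at one key, which is what lets a framework spread the work over commodity machines.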

HBase is a distributed storage system for structured data, designed for storing very large amounts of data in a distributed environment. Its intent is to be similar in function to Google's Bigtable, which is used with the Google File System. HBase will provide Bigtable-like capabilities on top of Hadoop.
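Google describes Bigtable as a sparse, sorted map indexed by row key, column, and timestamp. As a rough illustration of what "Bigtable-like" means, here is a toy single-process version in Python. This is only a sketch of the data model, not HBase's client API; the class `ToyTable`, its methods, and the sample row and column names are all hypothetical.

```python
import time


class ToyTable:
    """Toy Bigtable-style table: (row key, column, timestamp) -> value, held in memory."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first
        self._rows = {}

    def put(self, row, column, value, timestamp=None):
        """Store a new version of a cell; older versions are kept."""
        ts = time.time() if timestamp is None else timestamp
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column):
        """Return the newest value for a cell, or None if the cell is empty."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield (row key, newest cells) for rows in [start_row, stop_row), in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                cells = self._rows[row]
                yield row, {column: versions[0][1] for column, versions in cells.items()}


if __name__ == "__main__":
    table = ToyTable()
    table.put("com.example/index", "contents:html", "<html>old</html>", timestamp=1)
    table.put("com.example/index", "contents:html", "<html>new</html>", timestamp=2)
    print(table.get("com.example/index", "contents:html"))
```

Keeping rows sorted by key is what makes range scans cheap, which matters for something like crawled data where related pages can be given neighbouring keys.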

While these projects are still in their infancy, the open source model is leading to rapid development of these technologies.
