Friday 30 January 2015

Hadoop History

Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open-source web search engine that was itself a part of the Lucene project. Nutch was started in 2002, and a working crawler and search system quickly emerged. However, its architecture would not scale to the billions of pages on the web.

In 2003, Google published a paper describing the architecture of its distributed filesystem, the Google File System (GFS). GFS solved the storage needs of the very large files generated as part of the web crawl and indexing process.

In 2004, based on the GFS architecture, the Nutch team began writing an open-source implementation called the Nutch Distributed Filesystem (NDFS). In the same year, Google published the paper that introduced MapReduce. By 2005, the Nutch developers had a working MapReduce implementation in the Nutch project, and most of the major Nutch algorithms had been ported to run using MapReduce and NDFS.

In February 2006, NDFS and the MapReduce implementation were moved out of Nutch to form an independent subproject of Lucene called Hadoop. At around the same time, Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008 when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.
In January 2008, Hadoop was made its own top-level project at Apache, confirming its success and its diverse, active community. By this time, Hadoop was being used by many other companies besides Yahoo!, such as Last.fm, Facebook, and the New York Times.
In April 2008, Hadoop broke a world record to become the fastest system to sort a terabyte of data. Running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds (just under 3½ minutes), beating the previous year’s winner of 297 seconds.

