Big Data Analytics Takes a Step Towards Apache Spark
Posted on Oct 21, 2014
Hadoop and MapReduce were never designed for real-time, highly interactive business data analytics. While these two tools went a long way towards simplifying the analysis of big data, they are, in some ways, the victims of their own success.
Once they could mine huge datasets for insights, analysts wanted more complex reporting, more interactive ad hoc reporting, and real-time online processing. How could all of this be built on top of a toolset designed for batch jobs?
Enter Apache Spark, an open-source project that runs on top of Hadoop and uses in-memory processing to deliver the real-time analysis that the MapReduce batch processing model was never built to provide.
Spark began as a research project here in the Bay Area, at UC Berkeley's AMPLab, in 2009, and was open-sourced in 2010. Princeton University, Klout, Foursquare, Conviva, Quantifind, Airbnb, Yahoo! Research and others have contributed code.
The Spark community claims that applications built on the platform run up to 100 times faster than Hadoop MapReduce when working in memory, and up to 10 times faster when working on disk.
Now, Silicon Valley movers and shakers are rushing to migrate their big data analytics platforms to Spark. For example, Platfora, a San Mateo-based startup that has raised $65 million in funding, announced at the Strata+Hadoop World conference in New York City last week that it expects to have its platform migrated to Spark in the first half of the new year.
You can read more about Platfora's plans on the big data analytics blog on its website.