MS Software Engineering student at San Jose State University
What is Apache Spark?
[From Wikipedia] Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).
It is currently one of the most active projects in the Hadoop ecosystem, with wide speculation that it could eventually replace Hadoop entirely. You can read more about it at:
About ElasticBox
[From elasticbox.com] ElasticBox makes it as easy as possible to develop, deploy, and manage applications for any cloud infrastructure. Public, private, or hybrid cloud deployments across AWS, Google Compute, Azure, OpenStack, CloudStack, and VMware all need just a few clicks.
ElasticBox enables you to write an application once and deploy it on any cloud architecture without being locked into any one specific cloud. You can read more about it at:
Assuming you are familiar with Apache Spark and the three ElasticBox concepts (providers, boxes, and instances), let us walk through deploying Apache Spark via ElasticBox.
Note: