Spark Internals and Design Basics

Apache Spark is an open-source, general-purpose cluster computing engine built for speed, ease of use, and sophisticated analytics, and it is primarily used for Big Data workloads. It emerged as an alternative to Hadoop MapReduce, whose programming model makes running Machine Learning algorithms complex and tedious. Spark introduces two main abstractions: resilient distributed datasets (RDDs) and parallel operations on them. Spark is written in the Scala programming language and runs on the Java Virtual Machine (JVM). It exploits Scala's functional programming features to keep its API elegant and simple.
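The functional style Spark borrows from Scala is easy to see in miniature: the `map`, `filter`, and `reduce` operations that RDDs expose deliberately mirror Scala's own collection methods. The sketch below uses only plain Scala collections (no Spark dependency or cluster), with a comment showing what the equivalent RDD pipeline would look like; `sc` in that comment is assumed to be a `SparkContext`.

```scala
// A plain-Scala sketch of the functional pipeline style that Spark's
// RDD API mirrors. No Spark dependency is required to run this.
object RddStyleSketch {
  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3, 4, 5)

    // With Spark, this pipeline would read (sc being a SparkContext):
    //   sc.parallelize(data).map(_ * 2).filter(_ > 4).reduce(_ + _)
    val result = data
      .map(_ * 2)      // transformation: double each element
      .filter(_ > 4)   // transformation: keep values greater than 4
      .reduce(_ + _)   // action: sum the remaining values

    println(result)    // 6 + 8 + 10 = 24
  }
}
```

In Spark, the `map` and `filter` steps would be lazy transformations that merely build a lineage graph, and only the `reduce` action would trigger distributed execution; with plain collections, each step runs eagerly.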