Spark Internals and Design Basics

Apache Spark is an open-source, general-purpose cluster computing engine built around speed, ease of use, and sophisticated analytics, and is used primarily for Big Data workloads. It emerged as an alternative to Hadoop MapReduce, which is complex and tedious to use for running Machine Learning algorithms. Spark introduces two main abstractions: resilient distributed datasets (RDDs) and parallel operations. Spark is written in the Scala programming language and runs on the Java Virtual Machine (JVM). It makes good use of Scala's functional programming features for elegance and simplicity.

IPv6: Floating IPs and Duplicate Address Detection

The very nature of floating IPs can lead to some classic quirks in a distributed system's network. This discussion focuses mainly on IPv6 and on how its duplicate address detection mechanism can clash with the floating IP technique.

Floating IPs are common in Highly Available or Scaled-out Distributed Systems. The basic idea is to have a transient IP address that can move from one node to another, keeping the change of serving node transparent on the access side of the network. For instance, if there are two server machines, each represented by a unique IP, and one of them goes down, its IP address “floats” to the other server, which will henceforth process the client requests. This technique is widely used to provide a seamless transition from one serving node to another in case of failures. One such implementation is present in OpenStack Nova.
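The two-server example above can be sketched as a small simulation. This is only a toy model under stated assumptions: the `Node` class, the `fail_over` and `serving_node` helpers, and the `2001:db8::` documentation addresses are all hypothetical names chosen for illustration, and nothing here reflects how OpenStack Nova actually implements floating IPs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical server that can hold one or more IP addresses."""
    name: str
    alive: bool = True
    addresses: set = field(default_factory=set)


def fail_over(ip, failed, standby):
    """Move ip from a failed node to a live standby node."""
    if not standby.alive:
        raise RuntimeError("no live node available to take over " + ip)
    failed.addresses.discard(ip)
    standby.addresses.add(ip)


def serving_node(ip, nodes):
    """Return the live node that currently answers for ip, or None."""
    for node in nodes:
        if node.alive and ip in node.addresses:
            return node
    return None


# Two servers, each represented by its own address.
server_a = Node("server-a", addresses={"2001:db8::a"})
server_b = Node("server-b", addresses={"2001:db8::b"})

# server-a goes down, so its address "floats" to server-b.
server_a.alive = False
fail_over("2001:db8::a", server_a, server_b)
```

After the failover, `serving_node("2001:db8::a", [server_a, server_b])` returns `server_b`, so a client that keeps using the same address never sees the change of serving node.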

On the other hand, Duplicate Address Detection (DAD) is a mechanism to identify whether the same IP is assigned to multiple nodes in a local network. Under IPv6 it is implemented as part of the Neighbor Discovery Protocol (NDP), using the Neighbor Solicitation (NS) and Neighbor Advertisement (NA) messages. The operation applies to all the IPs that are local to the link. More specifically:

  • all the IPs that fall under the link-local address-family
  • all the IPs that fall under global address-family but are present on the same LAN (one hop away on the link)
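The NS/NA exchange described above can be sketched roughly as follows. In real IPv6, the Neighbor Solicitation for a tentative address is sent to that address's solicited-node multicast group (the prefix `ff02::1:ff00:0/104` plus the low 24 bits of the address), and a Neighbor Advertisement in reply means the address is already taken. The `Link` class and its methods are hypothetical and only model a single LAN segment in memory; they are not a real network stack.

```python
import ipaddress


def solicited_node_multicast(addr):
    """Map a unicast IPv6 address to its solicited-node multicast group."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


class Link:
    """Toy model of one LAN segment (assumption: one hop, no packet loss)."""

    def __init__(self):
        self.owners = {}  # assigned address -> name of the owning node

    def neighbor_solicit(self, addr):
        """Simulate an NS for addr.

        Returns the name of the node that would answer with an NA,
        or None if nobody on the link claims the address.
        """
        return self.owners.get(ipaddress.IPv6Address(addr))

    def assign(self, node, addr):
        """Run DAD for addr, then assign it to node only if it is unique."""
        if self.neighbor_solicit(addr) is not None:
            return "duplicate"  # an NA came back: address already in use
        self.owners[ipaddress.IPv6Address(addr)] = node
        return "assigned"


link = Link()
first = link.assign("server-a", "2001:db8::aa:bbcc")   # "assigned"
second = link.assign("server-b", "2001:db8::aa:bbcc")  # "duplicate"
group = solicited_node_multicast("2001:db8::aa:bbcc")  # ff02::1:ffaa:bbcc
```

In this toy model, a second node that tries to bring up an address while the first node still holds it gets `"duplicate"`, which is the kind of situation a floating address can run into if it moves before the previous holder has released it.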
