Spark RDD: groupByKey vs reduceByKey

Let's look at two different ways to compute word counts, one using reduceByKey and the other using groupByKey:

val words = Array(
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a",
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a",
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a",
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a",
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a",
  "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c", "a")
val pairs = sc.parallelize(words).map(word => (word, 1))

val wordCountsWithGroup = pairs.groupByKey().map(t => (t._1, t._2.sum)).collect()
val wordCountsWithReduce = pairs.reduceByKey(_ + _).collect()

While both of these functions produce the correct answer, the reduceByKey example works much better on a large dataset: Spark combines the values for each key within every partition before shuffling, whereas groupByKey sends every (word, 1) pair across the network...
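The difference can be sketched without a cluster. The Java program below is a local simulation, not Spark code and not the Spark API. It splits a repeating a, b, c, a input into a fixed number of contiguous slices (an assumption standing in for RDD partitions) and counts how many records each strategy would push through the shuffle. The groupByKey-style path moves every (word, 1) pair, while the reduceByKey-style path first sums values per key inside each slice, so at most one record per key per slice is shuffled. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local, single-JVM simulation of shuffle volume; this is NOT Spark code.
public class ShuffleSketch {

    // Hypothetical helper: builds an input by repeating the pattern a, b, c, a.
    static String[] repeatPattern(int groups) {
        String[] pattern = {"a", "b", "c", "a"};
        String[] words = new String[groups * pattern.length];
        for (int i = 0; i < words.length; i++) {
            words[i] = pattern[i % pattern.length];
        }
        return words;
    }

    // Assumption: the input is split into n contiguous slices, one per partition.
    static List<List<String>> partition(String[] words, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int start = (int) ((long) i * words.length / n);
            int end = (int) ((long) (i + 1) * words.length / n);
            parts.add(new ArrayList<>(Arrays.asList(words).subList(start, end)));
        }
        return parts;
    }

    // groupByKey-style: every (word, 1) pair crosses the shuffle; summing happens afterwards.
    static int groupByKeyShuffled(List<List<String>> parts) {
        int shuffled = 0;
        for (List<String> part : parts) {
            shuffled += part.size();
        }
        return shuffled;
    }

    // reduceByKey-style: values are summed per key inside each partition first
    // (a map-side combine), so one record per distinct key per partition is shuffled.
    static int reduceByKeyShuffled(List<List<String>> parts) {
        int shuffled = 0;
        for (List<String> part : parts) {
            Map<String, Integer> local = new HashMap<>();
            for (String word : part) {
                local.merge(word, 1, Integer::sum);
            }
            shuffled += local.size();
        }
        return shuffled;
    }

    // The final word counts are the same whichever strategy is used.
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = repeatPattern(22);
        List<List<String>> parts = partition(words, 4);
        System.out.println("counts: " + wordCounts(words));
        System.out.println("groupByKey-style shuffle: " + groupByKeyShuffled(parts) + " records");
        System.out.println("reduceByKey-style shuffle: " + reduceByKeyShuffled(parts) + " records");
    }
}
```

With 22 repetitions (88 words) split into 4 slices, the groupByKey-style path shuffles 88 records while the reduceByKey-style path shuffles 12 (three distinct keys in each of four slices). The gap widens as the number of records per key grows.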

Best practices to avoid NullPointerException

Table of Contents
1. Call equals() and equalsIgnoreCase() on a known String
2. Prefer valueOf() over toString() where both return the same result
3. Use null-safe methods and libraries
4. Avoid returning null from a method; return an empty collection or empty array instead

Call equals() and equalsIgnoreCase() on a known String
Always call the equals() method on a known String that is not null. Since the equals() method...
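A compact sketch of the four tips in plain Java follows. It assumes the JDK alone stands in for the "null-safe methods and libraries" of tip 3: java.util.Objects is one such JDK class, and the article may have other libraries in mind. The class name NullSafetySketch and the methods isYes, describe, sameName and lookupTags are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class NullSafetySketch {

    // Tip 1: call equalsIgnoreCase() on the known, non-null literal.
    // "yes".equalsIgnoreCase(null) returns false instead of throwing.
    static boolean isYes(String input) {
        return "yes".equalsIgnoreCase(input);
    }

    // Tip 2: String.valueOf(Object) returns "null" for a null reference,
    // whereas value.toString() would throw NullPointerException.
    // Note: a bare null literal passed to String.valueOf resolves to the
    // char[] overload and throws; the declared Object type here avoids that.
    static String describe(Object value) {
        return String.valueOf(value);
    }

    // Tip 3: a null-safe JDK method; true if both are null, false if only one is.
    static boolean sameName(String a, String b) {
        return Objects.equals(a, b);
    }

    // Tip 4: return an empty list rather than null (hypothetical lookup).
    static List<String> lookupTags(String key) {
        if (key == null || key.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(key.toLowerCase());
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(isYes(missing));           // false
        System.out.println(describe(missing));        // null
        System.out.println(sameName(null, null));     // true
        System.out.println(lookupTags(null).size());  // 0
    }
}
```

Collections.emptyList() returns a shared immutable list, so callers can loop over the result without a null check, but they cannot add elements to it.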