This blog post explains how to skip header and footer rows in Hive. Hive can ignore a given number of rows at the top and bottom of a file using the TBLPROPERTIES clause. The TBLPROPERTIES clause provides various table-level settings that can be adjusted as needed. …
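As a rough illustration of the idea above, the sketch below builds a Hive DDL statement that sets `skip.header.line.count` and `skip.footer.line.count` in TBLPROPERTIES. The helper name `build_hive_ddl`, the table name, the columns, and the location are all hypothetical and chosen only for this example; the comma-delimited row format is likewise an assumption.

```python
def build_hive_ddl(table, columns, location, skip_header=1, skip_footer=0):
    """Return a CREATE EXTERNAL TABLE statement that skips header/footer rows.

    `columns` is a list of (name, hive_type) pairs. All names used by the
    caller below are illustrative assumptions, not taken from the post.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES (\n"
        f"  'skip.header.line.count'='{skip_header}',\n"
        f"  'skip.footer.line.count'='{skip_footer}'\n"
        ")"
    )


# Hypothetical table: skip one header row and two footer rows.
ddl = build_hive_ddl(
    "employees",
    [("id", "INT"), ("name", "STRING")],
    "/data/employees",
    skip_header=1,
    skip_footer=2,
)
print(ddl)
```

The generated statement would still need to be run through a Hive client; this snippet only assembles the text.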

The retention period for records in Kafka is configurable, and the default is 7 days. Retention can be set per topic, so each topic in the cluster can have its own retention period. The retention attribute is available in the server.properties file of the Apache Kafka distribution. The attribute …
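The excerpt is cut off before it names the attribute, so the following is only a sketch under an assumption: that the settings meant are Kafka's broker-wide `log.retention.hours` (in server.properties) and the topic-level override `retention.ms`. The helper `retention_settings` is hypothetical; it simply converts a number of days into the values those two settings would take.

```python
MS_PER_HOUR = 60 * 60 * 1000


def retention_settings(days):
    """Convert a retention period in days into Kafka-style setting values."""
    hours = days * 24
    return {
        # Broker-wide default, assumed to be the attribute the post refers to.
        "log.retention.hours": hours,
        # Topic-level override, expressed in milliseconds.
        "retention.ms": hours * MS_PER_HOUR,
    }


settings = retention_settings(7)
for key, value in settings.items():
    print(f"{key}={value}")
```

For the 7-day default mentioned above, this works out to 168 hours, or 604800000 milliseconds.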

A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. Some important points: Distributed systems are designed to spread the load across the system and process it simultaneously. To …
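The message-passing idea can be sketched in a single process, with threads standing in for networked machines and queues standing in for the network. This is an analogy only, not a real distributed system; the `worker` function and the squaring task are made up for illustration.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive messages, process them, and send results back."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work for this worker
            break
        outbox.put(message * message)


inbox, outbox = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(inbox, outbox)) for _ in range(3)]
for thread in workers:
    thread.start()

# Distribute the load: five messages shared among three workers.
for number in range(1, 6):
    inbox.put(number)
for _ in workers:
    inbox.put(None)  # one sentinel per worker
for thread in workers:
    thread.join()

# Arrival order depends on scheduling, so sort before comparing.
results = sorted(outbox.get() for _ in range(5))
print(results)
```

The components never share state directly; they coordinate only through the messages placed on the queues.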

Apache Spark is an analytics engine that can process huge data volumes much faster than MapReduce, because the data is kept in memory within Spark’s own processing framework. That is why it has been catching the attention of both professionals and the press. It was first developed at AMPLab …

What kinds of issues do you face when using a cluster? 1. Lack of configuration management. 2. Poor allocation of resources. 3. Lack of a dedicated network. 4. Lack of monitoring and metrics. 5. Not knowing which log files contain which information. 6. Inadvertent introduction of single points of failure. Cluster issues …

Table of Contents: 1. What is Lazy Evaluation in Spark 2. Advantages of Lazy Evaluation in Spark transformation. What is Lazy Evaluation in Spark: As the name suggests, lazy evaluation in Spark means that execution will not start until an action is triggered. In Spark, the picture of …
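To make the "nothing runs until an action is triggered" idea concrete, here is a small plain-Python analogy. It does not use Spark at all: `LazyPipeline` is a hypothetical class invented for this sketch, where `map` and `filter` only record steps and `collect` plays the role of the action.

```python
class LazyPipeline:
    """Records transformations and runs them only when collect() is called."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: remember the step, do no work yet.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: remember the step, do no work yet.
        return LazyPipeline(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: apply every recorded step and materialize the result.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # lets us observe when the function actually runs
    return x * 2


pipeline = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has executed
result = pipeline.collect()
print(calls_before_action, result)
```

Building the pipeline calls `double` zero times; only `collect()` triggers the work, which is the behaviour the excerpt describes for Spark transformations and actions.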

Table of Contents: 1. Traditional approach 2. Typed pattern 3. Functional approach to pattern matching 4. Pattern matching and collection: the look-alike approach. Traditional approach: One way to use Scala’s pattern matching looks similar to the switch-case structure in C and Java: each case entry uses an integer or any scalar …

Table of Contents: 1. Statically Typed Languages 2. Dynamically Typed Languages. Statically Typed Languages: The type of every variable is known at compile time. Code is more verbose. Bugs get caught quickly and easily. E.g.: Java, C, C++. Dynamically Typed Languages: The type of every variable is known at run time. …
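Python is a dynamically typed language, so it can illustrate the second category directly. In this minimal sketch (all names are illustrative), the same variable holds values of different types at different times, and a type mismatch only surfaces when the offending line actually runs.

```python
value = 42
first_type = type(value).__name__  # the type is attached to the value at run time

value = "forty-two"  # rebinding to a different type is allowed
second_type = type(value).__name__


def add(a, b):
    return a + b


# A statically typed language would typically reject this call before the
# program runs; here the error appears only when the line executes.
try:
    add(1, "2")
    caught_type_error = False
except TypeError:
    caught_type_error = True

print(first_type, second_type, caught_type_error)
```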

Sometimes you may not know the value of your variable immediately. You can only assign your variable’s value at some later point in time during the execution of your application. Let’s assume that you need to declare a variable called msg of type String, but you won’t initialize it just yet. We’ve …