Hadoop has become a core part of the data warehouse stack at most companies. It is used for a variety of use cases: search and web indexing, machine learning, analytics and reporting, and so on. Most organizations build Hadoop clusters alongside traditional data warehousing technologies such as Teradata, Vertica, and other OLAP systems. A very common use case is processing huge amounts of data (by huge I mean terabytes and petabytes!) with Hadoop and loading the aggregated summary data into the OLAP engines.

However, keep in mind that Hadoop is designed, among other things, for fault tolerance. To keep a job moving forward when machines fail or lag, Hadoop may spawn the same task (map or reduce) on more than one node if it thinks the originally spawned task is running slowly compared to the rest. This is known as speculative execution. A task can be slow because of bad memory, a failing disk, or other hardware issues (or simply because it handles more data than the other map or reduce tasks), and Hadoop tries to have a free node perform the work of the slow task.

Here is a good description of speculative execution from the Yahoo Developer Network Blog:

One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.

By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.
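The "redundant copies, first one wins" idea described above can be sketched in a few lines of plain Python (a toy simulation, not Hadoop code): the same task is submitted to several workers, whichever copy finishes first provides the definitive result, and the remaining copies are abandoned.

```python
# Toy sketch of speculative execution: run redundant copies of the
# same task, take the first result, abandon the rest.
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task(worker_id, data):
    # Simulate workers of different speeds (e.g. one with a slow disk
    # controller); every copy computes the same answer on the same input.
    time.sleep(random.uniform(0.01, 0.2))
    return worker_id, sum(data)

data = list(range(10))
with ThreadPoolExecutor(max_workers=3) as pool:
    # Schedule redundant copies of the identical task on three workers.
    futures = [pool.submit(run_task, w, data) for w in range(3)]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    winner_id, result = done.pop().result()   # first copy to finish wins
    # Hadoop would tell the other TaskTrackers to discard their outputs;
    # here we simply cancel any copy that has not started yet.
    for f in pending:
        f.cancel()

print(result)   # all copies compute the same value: 45
```

Because every copy processes identical input, it does not matter which one wins; that is exactly why the platform can safely discard the losers.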

Data export from MapReduce to a database:

A problem can arise when speculative execution is enabled for MapReduce jobs that write directly to a database: because the same input may be processed multiple times, each of those parallel task attempts might write the exact same records to the database before the redundant attempts are killed, leaving duplicate rows behind.
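A plain-Python toy illustration (not Hadoop code) of the hazard: if two copies of the same task each flush their output to a table with no uniqueness constraint before one of them is killed, every record lands twice.

```python
# Toy illustration of duplicate writes from speculative task copies.
records = [("user", 1), ("user", 2)]
database = []  # stands in for a DB table with no unique constraint

def task_attempt(recs, db):
    # Each attempt blindly INSERTs its output rows.
    for r in recs:
        db.append(r)

task_attempt(records, database)  # original attempt
task_attempt(records, database)  # speculative copy of the same task

print(len(database))  # 4 rows instead of 2: every record is duplicated
```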

Disable speculative execution to avoid this:

Speculative execution is enabled by default. You can disable it for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution configuration options, respectively, to false.
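As a sketch, the same settings can be placed in the job configuration (shown here with the classic pre-2.x property names used in this post; in Hadoop 2.x and later these were renamed mapreduce.map.speculative and mapreduce.reduce.speculative):

```xml
<!-- mapred-site.xml (or per-job configuration) -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

The trade-off is that a genuinely slow node can now delay the whole job, so disable speculation only for jobs whose side effects (such as database writes) are not idempotent.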
