Online Safety Community

To understand the goals of MapReduce, it helps to know which scenarios it is optimized for. The MapReduce programming model was designed for processing data that exhibits “data parallelism”: the ability to compute many independent operations in any order (King). In parallel processing, commutative operations are operations whose order of execution does not affect the result. Commutativity can apply to complex operations and even whole processes, as long as they do not manipulate the same memory (that is, neither one writes to data that the other reads or writes). For example, in the figure below, as long as foo(a) and bar(b) do not manipulate the same variable, they can run in parallel on different threads. However, the write operation must wait for both foo() and bar() to complete. The figure illustrates this dependency graph between foo(a), bar(b), and the write command.

Figure 1 – Parallelism Dependency Graph
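The dependency graph in Figure 1 can be sketched in Python with a thread pool. This is a minimal illustration, assuming `foo`, `bar`, and `write` are hypothetical placeholder functions (they are not defined in the original post): `foo(a)` and `bar(b)` share no state, so they are submitted to run concurrently, while `write` blocks on both results, which is the single synchronization point in the graph.

```python
from concurrent.futures import ThreadPoolExecutor


def foo(a):
    # Hypothetical independent task: reads only its own argument.
    return a * 2


def bar(b):
    # Hypothetical independent task: shares no state with foo().
    return b + 10


def write(x, y, sink):
    # Depends on both foo() and bar(), so it must run after both finish.
    sink.append((x, y))


def run_graph(a, b):
    sink = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # foo(a) and bar(b) have no edge between them in the graph,
        # so they may execute in either order or at the same time.
        fut_foo = pool.submit(foo, a)
        fut_bar = pool.submit(bar, b)
        # result() blocks until each task completes: this is the
        # synchronization point that write() depends on.
        write(fut_foo.result(), fut_bar.result(), sink)
    return sink


print(run_graph(3, 4))  # [(6, 14)]
```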

One goal of parallel program design is to identify the logical “tasks”, or units of work, that can run in parallel as threads. Parallel programming techniques require developers to work out and enforce these dependency graphs, which become much more complex as the amount of shared data and the number of ordered operations increase. Techniques such as locks and barriers, critical sections, semaphores, monitors, RPC, and rendezvous have been proposed to aid in the design of multithreaded and distributed programs. In parallel and distributed processing, good task design tries to eliminate as many synchronization points as possible, although some will always be required. Patterns such as “Master/Worker” and “Producer/Consumer” give developers established ways to structure parallel processing across threads.
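The “Producer/Consumer” pattern can be sketched with Python's standard library, using the thread-safe `queue.Queue` as the shared buffer. This is an illustrative assumption rather than code from the original post, and the function names are hypothetical: one producer thread enqueues work, several consumer threads square each item, a `threading.Lock` turns the update of the shared result list into a critical section, and `join()` serves as the final synchronization point.

```python
import queue
import threading

# Hypothetical end-of-stream marker; object() cannot collide with real items.
SENTINEL = object()


def producer(q, items, num_consumers):
    # Enqueue every unit of work for the consumers to pick up.
    for item in items:
        q.put(item)
    # One sentinel per consumer so that every consumer thread can exit.
    for _ in range(num_consumers):
        q.put(SENTINEL)


def consumer(q, results, lock):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        value = item * item  # independent work, no shared state touched
        # The lock makes the update to the shared list a critical section.
        with lock:
            results.append(value)


def run_pipeline(items, num_consumers=3):
    q = queue.Queue(maxsize=8)  # bounded buffer: producer blocks when full
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=producer, args=(q, items, num_consumers))]
    threads += [
        threading.Thread(target=consumer, args=(q, results, lock))
        for _ in range(num_consumers)
    ]
    for t in threads:
        t.start()
    # join() is the final synchronization point: wait for every thread.
    for t in threads:
        t.join()
    return results


# Consumers finish in a nondeterministic order, so sort before printing.
print(sorted(run_pipeline(range(5))))  # [0, 1, 4, 9, 16]
```

Because the queue is FIFO and the sentinels are enqueued after all real items, every item is consumed before any consumer sees its stop signal.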

MapReduce provides a programming model that hides many of these complexities of parallel processing from the software engineer. The MapReduce implementation performs much of the “wiring” associated with parallel processing, leaving the developer to implement two relatively simple functions, map() and reduce(). Using MapReduce does come with constraints that make it less appropriate for some tasks. The model is optimized for tasks in which a large number of key/value input pairs must be processed largely independently of one another. The map() function must be commutative in the sense defined above: its invocations must produce the same overall result regardless of the order in which they run, so that the implementation can execute them in parallel. This allows MapReduce to parallelize work across hundreds or even thousands of CPUs.
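A minimal, single-machine Python sketch of this model, using word count as the example job. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and a thread pool stands in for the cluster of machines a real MapReduce implementation would use (in CPython this shows the structure rather than a CPU speedup). The developer writes only `map_fn` and `reduce_fn`; `run_mapreduce` plays the role of the framework's “wiring”, including the grouping step between the two phases.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(document):
    # Emit an intermediate (key, value) pair for every word. Each call reads
    # only its own input, so calls can run in any order or concurrently.
    return [(word, 1) for word in document.lower().split()]


def reduce_fn(key, values):
    # Combine every value emitted for one key into a single result.
    return key, sum(values)


def run_mapreduce(documents, workers=4):
    # Map phase: apply map_fn to each input independently, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, documents))

    # Shuffle phase: group intermediate values by key. This is the barrier:
    # no reduce can start until every map call has finished.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key is reduced independently of the others.
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(reduce_fn, keys, [groups[k] for k in keys]))
    return dict(reduced)


docs = ["the quick fox", "the lazy dog", "The fox"]
print(run_mapreduce(docs))
# {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Reordering `docs` (for example, `list(reversed(docs))`) yields the same counts, because each `map_fn` call is independent of the others and summing the grouped values does not depend on their order.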




Understanding Data Parallelism in MapReduce


Tags: program, Implementation, Mapreduce

Started by gracylayla Mar 14.




© 2018 Created by Safety Community.
