
Topics in Parallel and Distributed Computing : Introducing Concurrency in Undergraduate Courses.

Prasad, Sushil K.

Topics in Parallel and Distributed Computing : Introducing Concurrency in Undergraduate Courses. - 1st ed. - 1 online resource (359 pages)

Front Cover -- Topics in Parallel and Distributed Computing: Introducing Concurrency in Undergraduate Courses -- Contents -- Contributors -- Editor and author biographical sketches -- Symbol or phrase --
Chapter 1: Editors' introduction and road map -- 1.1 Why This Book? -- 1.2 Chapter introductions -- 1.2.1 Part One-For Instructors -- 1.2.2 Part Two-For Students -- 1.3 How to Find a Topic or Material for a Course -- 1.4 Invitation to write for volume 2 --
Part 1: For instructors --
Chapter 2: Hands-on parallelism with no prerequisites and little time using Scratch -- 2.1 Contexts for Application -- 2.2 Introduction to Scratch -- 2.3 Parallel Computing and Scratch -- 2.3.1 Parallelism and Communication for Clean Solutions -- 2.3.2 A Challenge of Parallelism: Race Conditions -- 2.3.3 Blocking and Nonblocking Commands -- 2.3.4 Shared and Private Variables -- 2.4 Conclusion -- References --
Chapter 3: Parallelism in Python for novices -- 3.1 Introduction -- 3.2 Background -- 3.2.1 Target Audience of this Chapter -- 3.2.2 Goals for the Reader -- 3.2.3 Tools -- Python -- The multiprocessing module -- 3.3 Student prerequisites -- 3.3.1 Motivating Examples Involving Parallelism Concepts -- 3.3.2 Python Programming -- 3.4 General approach: parallelism as a medium -- 3.5 Course materials -- 3.6 Processes -- 3.6.1 Spawning a Process -- Sample key ideas -- 3.6.2 Spawning Multiple Processes -- Sample key ideas -- Sample key ideas -- 3.6.3 Spawning Multiple Processes Using Pool -- 3.6.4 Anonymous Processes -- Sample key ideas -- 3.6.5 Specifying Process Names -- 3.6.6 Using a Lock to Control Printing -- Sample key ideas -- 3.6.7 Digging Holes -- Sample key ideas -- 3.7 Communication -- 3.7.1 Communicating Via a Queue -- Sample key ideas -- 3.7.2 Extended Communication Via a Queue -- Sample key ideas -- 3.7.3 The Join Method -- Sample key ideas -- 3.7.4 Obtaining a Result from a Single Child -- Sample key ideas -- 3.7.5 Using a Queue to Merge Multiple Child Process Results -- Sample key ideas -- 3.7.6 Mergesort Using Process Spawning and Queue Objects -- 3.7.7 Sharing a Data Structure -- 3.8 Speedup -- 3.8.1 Timing the Summation of Random Numbers -- 3.8.2 Using Join to Time a Child Process -- 3.8.3 Comparing Parallel and Sequential Performance -- 3.9 Further examples using the Pool/map paradigm -- 3.9.1 Monte Carlo Estimation of π -- 3.9.2 Integration by Riemann Sum -- 3.9.3 Mergesort -- 3.10 Conclusion -- References --
Chapter 4: Modules for introducing threads -- 4.1 Introduction -- Target audience -- Threads and OpenMP -- Topics covered -- Using these modules -- Computing environment -- 4.2 Prime Counting -- 4.2.1 Using this Module -- 4.2.2 Sequential Code -- 4.2.3 Step-by-Step Parallelization -- Creating multiple threads -- Joining the threads -- Fixing the race condition -- Privatizing the counter -- Improving load balance -- 4.2.4 Follow-up Assignment -- 4.3 Mandelbrot -- 4.3.1 Using this Module -- 4.3.2 Sequential Program -- 4.3.3 Step-by-Step Parallelization with OpenMP -- Pragma on the outer loop -- Fixing the race condition -- Swapping the loops -- Pragma on the inner loop -- Dynamic scheduling -- 4.3.4 Parallelization Using Explicit Threads -- References --
Chapter 5: Introducing parallel and distributed computing concepts in digital logic -- 5.1 Number representation -- 5.2 Logic gates -- 5.2.1 Fan-Out and Fan-In of Gates -- 5.2.2 Tristate Gates and Buses -- 5.3 Combinational logic synthesis and analysis -- 5.3.1 Timing Analysis -- 5.3.2 Karnaugh Maps -- 5.4 Combinational building blocks -- 5.4.1 Adders -- 5.4.2 Multiplexers -- 5.5 Counters and registers -- 5.5.1 Counters -- 5.5.2 Shift Registers -- 5.6 Other digital logic topics -- 5.6.1 Latches and Flip-Flops -- 5.6.2 Finite State Machines -- 5.6.3 Verilog -- 5.6.4 Programmable Logic Devices to Field-Programmable Gate Arrays -- 5.6.5 Practical Considerations -- References --
Chapter 6: Networks and MPI for cluster computing -- 6.1 Why Message Passing/MPI? -- 6.1.1 Shared Memory Cache Coherent Nonuniform Memory Access Architecture Does Not Extend to an Extreme Scale -- 6.1.2 Message Passing Provides Scalability for "Grand Challenge" Problems -- Need more tightly coupled systems than distributed computing efforts -- 6.1.3 SPMD Model Enables Reasoning About and Programming O(Million) Process Programs -- 6.1.4 HPC in the Cloud Compared with Traditional HPC -- 6.2 The Message Passing Concept -- 6.2.1 Point-to-Point Communication -- Basic idea of communicating processes -- Unique addresses for each process -- Send-receive pairs illustrated by an example -- Using buffers for storage -- Point-to-point communications and message size -- Complete forms of MPI_Send and MPI_Recv and message tags -- 6.2.2 Introduction to Collective Communication -- Barrier synchronization -- Other types of collectives -- Suggested classroom activity -- 6.2.3 BSP Model Illustrated with a Simple Example -- 6.2.4 Nonblocking Communication -- 6.3 High-Performance Networks -- 6.3.1 Differences from Commodity Networks -- 6.3.2 Introduction to RDMA -- 6.3.3 Main Memory and RDMA -- 6.3.4 Ethernet RDMA -- 6.4 Advanced Concepts -- 6.4.1 Hybrid Programming with MPI and OpenMP -- 6.4.2 Supercomputers: Enabling HPC -- References --
Part 2: For students --
Chapter 7: Fork-join parallelism with a data-structures focus -- 7.1 Meta-introduction: an instructor's view of this material -- 7.1.1 Where This Material Fits in a Changing Curriculum -- 7.1.2 Six Theses on a Successful Approach to this Material -- 7.1.3 How to Use These Materials-And Improve Them -- 7.2 Introduction -- 7.2.1 More Than One Thing At Once -- 7.2.2 Parallelism Versus Concurrency Control -- 7.2.3 Basic Threads and Shared Memory -- 7.2.4 Other Models -- 7.3 Basic fork-join parallelism -- 7.3.1 A Simple Example: Okay Idea, Inferior Style -- 7.3.2 Why Not Use One Thread Per Processor? -- 7.3.3 Divide-and-Conquer Parallelism -- 7.3.4 The Java ForkJoin Framework -- 7.3.5 Reductions and Maps -- 7.3.6 Data Structures Besides Arrays -- 7.4 Analyzing fork-join algorithms -- 7.4.1 Work and Span -- Defining work and span -- Defining speedup and parallelism -- The ForkJoin framework bound -- 7.4.2 Amdahl's Law -- 7.4.3 Comparing Amdahl's Law and Moore's Law -- 7.5 Fancier fork-join algorithms: prefix, pack, sort -- 7.5.1 Parallel Prefix Sum -- 7.5.2 Pack -- 7.5.3 Parallel Quicksort -- 7.5.4 Parallel Mergesort -- Acknowledgments -- Reference --
Chapter 8: Shared-memory concurrency control with a data-structures focus -- 8.1 Introduction -- 8.2 The Programming Model -- 8.3 Synchronization with Locks -- 8.3.1 The Need for Synchronization -- 8.3.2 Locks -- 8.3.3 Locks in Java -- 8.4 Race Conditions: Bad Interleavings and Data Races -- 8.4.1 Bad Interleavings: An Example with Stacks -- 8.4.2 Data Races: Wrong Even When They Look Right -- Inability to reason in the presence of data races -- Partial explanation of why data races are disallowed -- The grand compromise -- Avoiding data races -- A more likely example -- 8.5 Concurrency Programming Guidelines -- 8.5.1 Conceptually Splitting Memory into Three Parts -- More thread-local memory -- More immutable memory -- 8.5.2 Approaches to Synchronization -- 8.6 Deadlock -- 8.7 Additional Synchronization Primitives -- 8.7.1 Reader/Writer Locks -- 8.7.2 Condition Variables -- 8.7.3 Other Primitives -- Acknowledgments -- Reference --
Chapter 9: Parallel computing in a Python-based computer science course -- 9.1 Parallel programming -- 9.1.1 Parallelizable Loops -- 9.1.2 Barriers -- 9.1.3 Outline -- 9.2 Parallel reduction -- 9.2.1 Reducing in Parallel when n ≤ p -- 9.2.2 Reducing in Parallel when n > p -- 9.3 Parallel scanning -- 9.3.1 Scanning in Parallel when n ≤ p -- 9.3.2 Scanning in Parallel when n > p -- 9.3.3 Inclusive Scans in Parallel -- 9.4 Copy-scans -- 9.4.1 Copy-Scanning in Parallel when n ≤ p -- 9.4.2 Copy-Scanning in Parallel when n > p -- 9.5 Partitioning in parallel -- 9.5.1 Meld Operations -- 9.5.2 Permute Operations -- 9.5.3 Partitioning -- 9.5.4 Analysis -- 9.6 Parallel quicksort -- 9.6.1 Analysis -- 9.7 How to perform segmented scans and reductions -- 9.7.1 Segmented Scans -- 9.7.2 Segmented Inclusive Scans -- 9.7.3 Segmented Copy-Scans -- 9.7.4 Segmented Reductions -- 9.8 Comparing sequential and parallel running times -- References --
Chapter 10: Parallel programming illustrated through Conway's Game of Life -- 10.1 Introduction -- 10.1.1 Conway's Game of Life -- 10.1.2 Programming the Game of Life -- 10.1.3 General Thoughts on Parallelism -- 10.2 Parallel variants -- 10.2.1 Data Parallelism -- Vector instructions -- Vector pipelining -- GPUs -- 10.2.2 Loop-Based Parallelism -- 10.2.3 Coarse-Grained Data Parallelism -- Shared memory parallelism -- Distributed memory parallelism -- Distributed memory programming -- Task scheduling -- 10.3 Advanced topics -- 10.3.1 Data Partitioning -- 10.3.2 Combining Work, Minimizing Communication -- 10.3.3 Load Balancing -- 10.4 Summary -- References --
Appendix A: Chapters and topics -- Index -- Back Cover.

9780128039380


Parallel processing (Electronic computers).


Electronic books.

QA76.58 .T675 2015

004.3/5
