
High Performance Parallelism Pearls Volume One : Multicore and Many-Core Programming Approaches.

Material type: Text
Publisher: San Diego : Elsevier Science & Technology, 2014
Copyright date: ©2015
Edition: 1st ed.
Description: 1 online resource (549 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9780128021996
Additional physical formats: Print version: High Performance Parallelism Pearls Volume One
DDC classification:
  • 004.35
LOC classification:
  • QA76.642 .R456 2015eb
Contents:
Front Cover -- High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches -- Copyright -- Contents -- Contributors -- Acknowledgments -- Foreword -- Humongous computing needs: Science years in the making -- Open standards -- Keen on many-core architecture -- Xeon Phi is born: Many cores, excellent vector ISA -- Learn highly scalable parallel programming -- Future demands grow: Programming models matter -- Preface -- Inspired by 61 cores: A new era in programming -- Chapter 1: Introduction -- Learning from successful experiences -- Code modernization -- Modernize with concurrent algorithms -- Modernize with vectorization and data locality -- Understanding power usage -- ISPC and OpenCL anyone? -- Intel Xeon Phi coprocessor specific -- Many-core, neo-heterogeneous -- No "Xeon Phi" in the title, neo-heterogeneous programming -- The future of many-core -- Downloads -- Chapter 2: From "Correct" to "Correct & Efficient": A Hydro2D Case Study with Godunov's Scheme -- Scientific computing on contemporary computers -- Modern computing environments -- CEA's Hydro2D -- A numerical method for shock hydrodynamics -- Euler's equation -- Godunov's method -- Where it fits -- Features of modern architectures -- Performance-oriented architecture -- Programming tools and runtimes -- Our computing environments -- Paths to performance -- Running Hydro2D -- Hydro2D's structure -- Computation scheme -- Data structures -- Measuring performance -- Optimizations -- Memory usage -- Thread-level parallelism -- Arithmetic efficiency and instruction-level parallelism -- Data-level parallelism -- Summary -- The coprocessor vs the processor -- A rising tide lifts all boats -- Performance strategies -- Chapter 3: Better Concurrency and SIMD on HBM -- The application: HIROMB-BOOS-Model -- Key usage: DMI -- HBM execution profile.
Overview for the optimization of HBM -- Data structures: Locality done right -- Thread parallelism in HBM -- Data parallelism: SIMD vectorization -- Trivial obstacles -- Premature abstraction is the root of all evil -- Results -- Profiling details -- Scaling on processor vs. coprocessor -- Contiguous attribute -- Summary -- References -- Chapter 4: Optimizing for Reacting Navier-Stokes Equations -- Getting started -- Version 1.0: Baseline -- Version 2.0: ThreadBox -- Version 3.0: Stack memory -- Version 4.0: Blocking -- Version 5.0: Vectorization -- Intel Xeon Phi coprocessor results -- Summary -- Chapter 5: Plesiochronous Phasing Barriers -- What can be done to improve the code? -- What more can be done to improve the code? -- Hyper-Thread Phalanx -- What is nonoptimal about this strategy? -- Coding the Hyper-Thread Phalanx -- How to determine thread binding to core and HT within core? -- The Hyper-Thread Phalanx hand-partitioning technique -- A lesson learned -- Back to work -- Data alignment -- Use aligned data when possible -- Redundancy can be good for you -- The plesiochronous phasing barrier -- Let us do something to recover this wasted time -- A few "left to the reader" possibilities -- Xeon host performance improvements similar to Xeon Phi -- Summary -- Chapter 6: Parallel Evaluation of Fault Tree Expressions -- Motivation and background -- Expressions -- Expression of choice: Fault trees -- An application for fault trees: Ballistic simulation -- Example implementation -- Syntax and parsing results -- Creating evaluation arrays -- Evaluating the expression array -- Using ispc for vectorization -- Other considerations -- Summary -- Chapter 7: Deep-Learning Numerical Optimization -- Fitting an objective function -- Objective functions and principal components analysis -- Software and example data -- Training data -- Runtime results.
Scaling results -- Summary -- Chapter 8: Optimizing Gather/Scatter Patterns -- Gather/scatter instructions in Intel® architecture -- Gather/scatter patterns in molecular dynamics -- Optimizing gather/scatter patterns -- Improving temporal and spatial locality -- Choosing an appropriate data layout: AoS versus SoA -- On-the-fly transposition between AoS and SoA -- Amortizing gather/scatter and transposition costs -- Summary -- Chapter 9: A Many-Core Implementation of the Direct N-Body Problem -- N-Body simulations -- Initial solution -- Theoretical limit -- Reduce the overheads, align your data -- Optimize the memory hierarchy -- Improving our tiling -- What does all this mean to the host version? -- Summary -- Chapter 10: N-Body Methods -- Fast N-body methods and direct N-body kernels -- Applications of N-body methods -- Direct N-body code -- Performance results -- Summary -- Chapter 11: Dynamic Load Balancing Using OpenMP 4.0 -- Maximizing hardware usage -- The N-Body kernel -- The offloaded version -- A first processor combined with coprocessor version -- Version for processor with multiple coprocessors -- Chapter 12: Concurrent Kernel Offloading -- Setting the context -- Motivating example: particle dynamics -- Organization of this chapter -- Concurrent kernels on the coprocessor -- Coprocessor device partitioning and thread affinity -- Offloading from OpenMP host program -- Offloading from MPI host program -- Case study: concurrent Intel MKL dgemm offloading -- Persistent thread groups and affinities on the coprocessor -- Concurrent data transfers -- Case study: concurrent MKL dgemm offloading with data transfers -- Force computation in PD using concurrent kernel offloading -- Parallel force evaluation using Newton's 3rd law -- Implementation of the concurrent force computation -- Performance evaluation: before and after -- The bottom line.
Chapter 13: Heterogeneous Computing with MPI -- MPI in the modern clusters -- MPI task location -- Single-task hybrid programs -- Selection of the DAPL providers -- The first provider ofa-v2-mlx4_0-1u -- The second provider ofa-v2-scif0 and the impact of the intra-node fabric -- The last provider, also called the proxy -- Hybrid application scalability -- Load balance -- Task and thread mapping -- Summary -- Acknowledgments -- Chapter 14: Power Analysis on the Intel® Xeon Phi™ Coprocessor -- Power analysis 101 -- Measuring power and temperature with software -- Creating a power and temperature monitor script -- Creating a power and temperature logger with the micsmc tool -- Power analysis using IPMI -- Hardware-based power analysis methods -- A hardware-based coprocessor power analyzer -- Summary -- Chapter 15: Integrating Intel Xeon Phi Coprocessors into a Cluster Environment -- Early explorations -- Beacon system history -- Beacon system architecture -- Hardware -- Software environment -- Intel MPSS installation procedure -- Preparing the system -- Installation of the Intel MPSS stack -- Generating and customizing configuration files -- MPSS upgrade procedure -- Setting up the resource and workload managers -- TORQUE -- Prologue -- Epilogue -- TORQUE/coprocessor integration -- Moab -- Improving network locality -- Moab/coprocessor integration -- Health checking and monitoring -- Scripting common commands -- User software environment -- Future directions -- Summary -- Acknowledgments -- Chapter 16: Supporting Cluster File Systems on Intel® Xeon Phi™ Coprocessors -- Network configuration concepts and goals -- A look at networking options -- Steps to set up a cluster enabled coprocessor -- Coprocessor file systems support -- Support for NFS -- Support for Lustre® file system -- Support for Fraunhofer BeeGFS® (formerly FHGFS) file system.
Support for Panasas® PanFS® file system -- Choosing a cluster file system -- Summary -- Chapter 17: NWChem: Quantum Chemistry Simulations at Scale -- Introduction -- Overview of single-reference CC formalism -- NWChem software architecture -- Global Arrays -- Tensor Contraction Engine -- Engineering an offload solution -- Offload architecture -- Kernel optimizations -- Performance evaluation -- Summary -- Acknowledgments -- Chapter 18: Efficient Nested Parallelism on Large-Scale Systems -- Motivation -- The benchmark -- Baseline benchmarking -- Pipeline approach-flat_arena class -- Intel® TBB user-managed task arenas -- Hierarchical approach-hierarchical_arena class -- Performance evaluation -- Implication on NUMA architectures -- Summary -- Chapter 19: Performance Optimization of Black-Scholes Pricing -- Financial market model basics and the Black-Scholes formula -- Financial market mathematical model -- European option and fair price concepts -- Black-Scholes formula -- Options pricing -- Test infrastructure -- Case study -- Preliminary version-Checking correctness -- Reference version-Choose appropriate data structures -- Reference version-Do not mix data types -- Vectorize loops -- Use fast math functions: erff() vs. cdfnormf() -- Equivalent transformations of code -- Align arrays -- Reduce precision if possible -- Work in parallel -- Use warm-up -- Using the Intel Xeon Phi coprocessor-"No effort" port -- Use Intel Xeon Phi coprocessor: Work in parallel -- Use Intel Xeon Phi coprocessor and streaming stores -- Summary -- Chapter 20: Data Transfer Using the Intel COI Library -- First steps with the Intel COI library -- COI buffer types and transfer performance -- Applications -- Summary -- Chapter 21: High-Performance Ray Tracing -- Background -- Vectorizing ray traversal -- The Embree ray tracing kernels -- Using Embree in an application.
Performance.


Description based on publisher-supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
