Heterogeneous Computing with OpenCL 2.0.

Material type: Text
Publisher: San Diego : Elsevier Science & Technology, 2015
Copyright date: ©2015
Edition: 1st ed
Description: 1 online resource (330 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9780128016497
Additional physical formats: Print version: Heterogeneous Computing with OpenCL 2.0
DDC classification:
  • 005.2752
LOC classification:
  • QA76.642.H484 2015
Contents:
Front Cover -- Heterogeneous Computing with OpenCL 2.0 -- Copyright -- Contents -- List of Figures -- List of Tables -- Foreword -- Acknowledgments -- Chapter 1: Introduction -- 1.1 Introduction to Heterogeneous Computing -- 1.2 The Goals of This Book -- 1.3 Thinking Parallel -- 1.4 Concurrency and Parallel Programming Models -- 1.5 Threads and Shared Memory -- 1.6 Message-Passing Communication -- 1.7 Different Grains of Parallelism -- 1.7.1 Data Sharing and Synchronization -- 1.7.2 Shared Virtual Memory -- 1.8 Heterogeneous Computing with OpenCL -- 1.9 Book Structure -- References -- Chapter 2: Device architectures -- 2.1 Introduction -- 2.2 Hardware Trade-offs -- 2.2.1 Performance Increase with Frequency, and its Limitations -- 2.2.2 Superscalar Execution -- 2.2.3 Very Long Instruction Word -- 2.2.4 SIMD and Vector Processing -- 2.2.5 Hardware Multithreading -- 2.2.6 Multicore Architectures -- 2.2.7 Integration: Systems-on-Chip and the APU -- 2.2.8 Cache Hierarchies and Memory Systems -- 2.3 The Architectural Design Space -- 2.3.1 CPU Designs -- Low-power CPUs -- Mainstream desktop CPUs -- Server CPUs -- 2.3.2 GPU Architectures -- Handheld GPUs -- At the high end: AMD Radeon R9 290X and NVIDIA GeForce GTX 780 -- 2.3.3 APU and APU-like Designs -- 2.4 Summary -- References -- Chapter 3: Introduction to OpenCL -- 3.1 Introduction -- 3.1.1 The OpenCL Standard -- 3.1.2 The OpenCL Specification -- 3.2 The OpenCL Platform Model -- 3.2.1 Platforms and Devices -- 3.3 The OpenCL Execution Model -- 3.3.1 Contexts -- 3.3.2 Command-Queues -- 3.3.3 Events -- 3.3.4 Device-Side Enqueuing -- 3.4 Kernels and the OpenCL Programming Model -- 3.4.1 Compilation and Argument Handling -- 3.4.2 Starting Kernel Execution on a Device -- 3.5 OpenCL Memory Model -- 3.5.1 Memory Objects -- Buffers -- Images -- Pipes -- 3.5.2 Data Transfer Commands -- 3.5.3 Memory Regions --
3.5.4 Generic Address Space -- 3.6 The OpenCL Runtime with an Example -- 3.6.1 Complete Vector Addition Listing -- 3.7 Vector Addition Using an OpenCL C++ Wrapper -- 3.8 OpenCL for CUDA Programmers -- 3.9 Summary -- Reference -- Chapter 4: Examples -- 4.1 OpenCL Examples -- 4.2 Histogram -- 4.3 Image Rotation -- 4.4 Image Convolution -- 4.5 Producer-Consumer -- 4.6 Utility Functions -- 4.6.1 Reporting Compilation Errors -- 4.6.2 Creating a Program String -- 4.7 Summary -- Chapter 5: OpenCL runtime and concurrency model -- 5.1 Commands and the Queuing Model -- 5.1.1 Blocking Memory Operations -- 5.1.2 Events -- 5.1.3 Command Barriers and Markers -- 5.1.4 Event Callbacks -- 5.1.5 Profiling Using Events -- 5.1.6 User Events -- 5.1.7 Out-of-Order Command-Queues -- 5.2 Multiple Command-Queues -- 5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges -- 5.3.1 Synchronization -- 5.3.2 Work-Group Barriers -- 5.3.3 Built-In Work-Group Functions -- 5.3.4 Predicate Evaluation Functions -- 5.3.5 Broadcast Functions -- 5.3.6 Parallel Primitive Functions -- 5.4 Native and Built-In Kernels -- 5.4.1 Native kernels -- 5.4.2 Built-in kernels -- 5.5 Device-Side Queuing -- 5.5.1 Creating a Device-Side Queue -- 5.5.2 Enqueuing Device-Side Kernels -- Dynamic local memory -- Enforcing dependencies using events -- 5.6 Summary -- Reference -- Chapter 6: OpenCL host-side memory model -- 6.1 Memory Objects -- 6.1.1 Buffers -- 6.1.2 Images -- 6.1.3 Pipes -- 6.2 Memory Management -- 6.2.1 Managing Default Memory Objects -- 6.2.2 Managing Memory Objects with Allocation Options -- 6.3 Shared Virtual Memory -- 6.4 Summary -- Chapter 7: OpenCL device-side memory model -- 7.1 Synchronization and Communication -- 7.1.1 Barriers -- 7.1.2 Atomics -- 7.2 Global Memory -- 7.2.1 Buffers -- 7.2.2 Images -- 7.2.3 Pipes -- 7.3 Constant Memory -- 7.4 Local Memory --
7.5 Private Memory -- 7.6 Generic Address Space -- 7.7 Memory Ordering -- 7.7.1 Atomics Revisited -- 7.7.2 Fences -- 7.8 Summary -- Chapter 8: Dissecting OpenCL on a heterogeneous system -- 8.1 OpenCL on an AMD FX-8350 CPU -- 8.1.1 Runtime Implementation -- 8.1.2 Vectorizing Within a Work-Item -- 8.1.3 Local Memory -- 8.2 OpenCL on the AMD Radeon R9 290X GPU -- 8.2.1 Threading and the Memory System -- 8.2.2 Instruction Set Architecture and Execution Units -- 8.2.3 Resource Allocation -- 8.3 Memory Performance Considerations in OpenCL -- 8.3.1 Global Memory -- 8.3.2 Local Memory as a Software-Managed Cache -- 8.4 Summary -- References -- Chapter 9: Case study: Image clustering -- 9.1 Introduction -- 9.2 The Feature Histogram on the CPU -- 9.2.1 Sequential Implementation -- 9.2.2 OpenMP parallelization -- 9.3 OpenCL Implementation -- 9.3.1 Naive GPU Implementation: GPU1 -- 9.3.2 Coalesced Memory Accesses: GPU2 -- 9.3.3 Vectorizing Computation: GPU3 -- 9.3.4 Move SURF Features to Local Memory: GPU4 -- 9.3.5 Move Cluster Centroids to Constant Memory: GPU5 -- 9.4 Performance Analysis -- 9.4.1 GPU Performance -- 9.5 Conclusion -- References -- Chapter 10: OpenCL profiling and debugging -- 10.1 Introduction -- 10.2 Profiling OpenCL Code Using Events -- 10.3 AMD CodeXL -- 10.4 Profiling Using CodeXL -- 10.4.1 Collecting OpenCL Application Traces -- 10.4.2 Host API Trace View -- 10.4.3 Summary Pages View -- 10.4.4 Collecting GPU Kernel Performance Counters -- 10.4.5 CPU Performance Profiling Using CodeXL -- 10.5 Analyzing Kernels Using CodeXL -- 10.5.1 KernelAnalyzer Statistics and ISA Views -- 10.5.2 KernelAnalyzer Analysis View -- 10.6 Debugging OpenCL Kernels Using CodeXL -- 10.6.1 API-Level Debugging -- 10.6.2 Kernel Debugging -- Multi-Watch-viewing data during kernel debugging -- 10.7 Debugging Using printf -- 10.8 Summary --
Chapter 11: Mapping high-level programming languages to OpenCL 2.0 -- 11.1 Introduction -- 11.2 A Brief Introduction to C++ AMP -- 11.2.1 C++ AMP array_view -- 11.2.2 C++ AMP parallel_for_each, or Kernel Invocation -- Functors as kernels -- Captured variables as kernel arguments -- The restrict(amp) modifier -- 11.3 OpenCL 2.0 as a Compiler Target -- 11.4 Mapping Key C++ AMP Constructs to OpenCL -- 11.5 C++ AMP Compilation Flow -- 11.6 Compiled C++ AMP Code -- 11.7 How Shared Virtual Memory in OpenCL 2.0 Fits in -- 11.8 Compiler Support for Tiling in C++ AMP -- 11.8.1 Dividing the Compute Domain -- 11.8.2 Specifying the Address Space and Barriers -- 11.9 Address Space Deduction -- 11.10 Data Movement Optimization -- 11.10.1 discard_data() -- 11.10.2 array_view<const T, N> -- 11.11 Binomial Options: A Full Example -- 11.12 Preliminary Results -- 11.13 Conclusion -- Reference -- Chapter 12: WebCL: Enabling OpenCL acceleration of Web applications -- 12.1 Introduction -- 12.2 Programming with WebCL -- 12.3 Synchronization -- 12.4 Interoperability with WebGL -- 12.5 Example Application -- 12.6 Security Enhancement -- 12.7 WebCL on the Server -- 12.8 Status and Future of WebCL -- References -- Works Cited -- Chapter 13: Foreign lands: Plugging OpenCL in -- 13.1 Introduction -- 13.2 Beyond C and C++ -- 13.3 Haskell OpenCL -- 13.3.1 Module Structure -- 13.3.2 Environments -- 13.3.3 Reference Counting -- 13.3.4 Platform and Devices -- 13.3.5 The Execution Environment -- Contexts -- Command queues -- Buffers -- Creating an OpenCL program object -- The OpenCL kernel -- Full source code example for vector addition -- 13.4 Summary -- References -- Index.

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

© 2024 Resource Centre. All rights reserved.