High Performance Computing : 4th Latin American Conference, CARLA 2017, Buenos Aires, Argentina, and Colonia Del Sacramento, Uruguay, September 20-22, 2017, Revised Selected Papers.
Material type:
- text
- computer
- online resource
ISBN: 9783319733531
Classification: 4.3; TK7885-7895
Intro -- Preface -- Organization -- Contents -- HPC Infrastructures and Datacenters -- A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems -- 1 Introduction -- 2 Motivation -- 2.1 Mapping -- 2.2 Machine/Deep Learning -- 2.3 Program Behaviors and CPU Scheduling -- 3 Scheduling Model -- 3.1 Deep Learning Mapper (DLM) -- 3.2 Overheads -- 4 Evaluation -- 4.1 Methodology -- 5 Related Work -- 6 Future Work and Conclusion -- References -- Power Consumption Characterization of Synthetic Benchmarks in Multicores -- 1 Introduction -- 2 Related Work -- 3 Methodology for Power Consumption Evaluation -- 3.1 Overview of the Proposed Methodology -- 3.2 Benchmarks -- 3.3 Multicore Hosts and Power Monitoring Setup -- 3.4 Design of Experiments -- 4 Experimental Results -- 4.1 Single Benchmark Executions -- 4.2 Combined Benchmark Executions -- 4.3 Performance Evaluation -- 4.4 Energy Efficiency Analysis -- 5 Conclusions and Future Work -- References -- Initial Experiences from TUPAC Supercomputer -- 1 Introduction -- 2 Projects and Users -- 2.1 Scientific Projects -- 2.2 Industrial Projects -- 3 Cluster Operations -- 3.1 Resource Management -- 3.2 Infrastructure Monitoring -- 3.3 User Support -- 4 Usage of TUPAC -- 5 Conclusions -- References -- HPC Industry and Education -- romeoLAB: A High Performance Training Platform for HPC, GPU and Deep Learning -- 1 Introduction -- 2 Related Work and Motivations -- 2.1 Online Tools for Specific Code Development -- 2.2 Online Tools for Educational Purposes -- 2.3 romeoLAB Motivations -- 3 A Web-Based Solution in a HPC Cluster -- 3.1 User View of the Lab Starting Process -- 3.2 Server Part -- 3.3 Jupyter Resources -- 3.4 Additional Tools -- 3.5 Security -- 4 Features and Usages -- 4.1 The Web-Portal Use Cases -- 5 Practical Usages -- 6 Discussions and Future Work -- 7 Conclusion -- References.
GPU, Multicores, Accelerators -- Analysis and Characterization of GPU Benchmarks for Kernel Concurrency Efficiency -- 1 Introduction -- 2 Related Work -- 3 Benchmark Suites -- 4 Methodology -- 5 Experimental Results -- 5.1 Experimental Environment -- 5.2 Individual Analysis -- 5.3 Global Analysis -- 5.4 Discussion -- 6 Concluding Remarks -- References -- Parallel Batch Self-Organizing Map on Graphics Processing Unit Using CUDA -- Abstract -- 1 Introduction -- 2 SOM Algorithm -- 3 Related Work -- 4 Parallel Batch-SOM on CUDA -- 5 Comparison and Results -- 6 Conclusion and Future Work -- Acknowledgments -- References -- Performance Prediction of Acoustic Wave Numerical Kernel on Intel Xeon Phi Processor -- 1 Introduction -- 2 Acoustic Wave Equation -- 3 Testbed and Machine Learning Methodology -- 3.1 Feature Vectors -- 3.2 Machine Learning Model -- 4 Experiments -- 4.1 Preliminary Results -- 4.2 Performance Prediction -- 5 Related Works -- 6 Conclusion -- References -- Evaluating the NVIDIA Tegra Processor as a Low-Power Alternative for Sparse GPU Computations -- 1 Introduction -- 2 Accelerated Solution of Sparse Linear Systems with ILUPACK -- 2.1 Computation of the Preconditioner -- 2.2 Application of the Preconditioner During the Iterative Solve -- 3 Proposal -- 3.1 Exploiting the Data Parallelism in ILUPACK -- 3.2 Threshold Based Version -- 4 Experimental Evaluation -- 4.1 Experimental Setup -- 4.2 Results -- 5 Final Remarks and Future Work -- References -- HPC Applications and Tools -- Benchmarking Performance: Influence of Task Location on Cluster Throughput -- 1 Introduction -- 2 Related Work -- 3 Characterization of HPC Facilities -- 3.1 Benchmarking -- 3.2 Infrastructure Characterization -- 3.3 Influence of Node Sharing on Memory Access Time -- 4 Results -- 4.1 Cluster Performance -- 4.2 NAS Benchmarking -- 4.3 Dedicated Nodes Cluster Setup.
4.4 Sensitivity to the Clusters Setup -- 5 Conclusions -- References -- PRIMULA: A Framework Based on Finite Elements to Address Multi Scale and Multi Physics Problems -- Abstract -- 1 Introduction -- 2 PRIMULA General Features -- 3 Scalability Results -- 4 An Example of Field: PLATE Fuel -- 5 Conclusions -- Acknowledgment -- References -- FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC -- 1 Research Direction -- 2 Background on Function-as-a-Service -- 2.1 Programming Models and Runtimes -- 2.2 Providers and Performance -- 3 Scientific Computing Experiments with Functions -- 3.1 Mathematics: Calculation of π -- 3.2 Computer Graphics: Face Detection -- 3.3 Cryptology: Password Cracking -- 3.4 Meteorology: Precipitation Forecast -- 4 Findings -- 5 Summary and Repeatability -- References -- AccaSim: An HPC Simulator for Workload Management -- 1 Introduction -- 2 Workload Management System in HPC -- 3 AccaSim -- 3.1 Architecture and Main Features -- 3.2 Implementation, Instantiation and Customization -- 4 Case Study -- 5 Related Work -- 6 Conclusions -- References -- SherlockFog: Finding Opportunities for MPI Applications in Fog and Edge Computing -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 SherlockFog: A Distributed Experimental Framework to Enable Fog and Edge Computing -- 3.2 Features of SherlockFog -- 3.3 Considerations When Using SherlockFog -- 3.4 Underlying Topology -- 4 Validation -- 4.1 Latency Emulation -- 4.2 Token Ring -- 4.3 MPI Token Ring -- 5 Results -- 6 Conclusions -- References -- Big Data and Data Management -- IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart Grid Application -- 1 Introduction -- 2 Related Work and Discussion -- 3 Architecture and Implementation -- 3.1 Cloud Layer -- 3.2 Edge Layer -- 3.3 Sensor Layer -- 3.4 Communication Protocol.
3.5 Measurement Algorithm -- 4 Evaluation -- 4.1 Communication Evaluation -- 4.2 Application Evaluation -- 5 Conclusion and Future Works -- References -- Model-R: A Framework for Scalable and Reproducible Ecological Niche Modeling -- 1 Introduction -- 2 Model-R Framework -- 2.1 Frontend -- 3 Modeling Process and Backend -- 4 Reproducibility -- 5 Case Study and Evaluation -- 6 Related Work -- 7 Conclusion -- References -- Parallel and Distributed Algorithms -- Task Scheduling for Processing Big Graphs in Heterogeneous Commodity Clusters -- 1 Introduction -- 2 Related Work -- 3 Graph Processing Frameworks -- 3.1 Fork-Join for Graphs -- 3.2 Pregel -- 3.3 DPM -- 4 Scheduling Strategies -- 5 Experiments -- 5.1 Twitter Followee Recommender -- 5.2 Dataset -- 5.3 Scenarios -- 5.4 Results -- 6 Conclusions -- References -- Exploring Application-Level Message-Logging in Scalable HPC Programs -- 1 Resilience in HPC Applications -- 2 Application-Level Message-Logging -- References -- Accelerated Numerical Optimization with Explicit Consideration of Model Constraints -- Abstract -- 1 Introduction -- 2 Particle Swarm Optimization -- 3 PSO Implementation -- 4 Results -- 5 Conclusions and Future Work -- Acknowledgments -- References -- Parallel Processing of Intra-cranial Electroencephalogram Readings on Distributed Memory Systems -- 1 Introduction -- 2 Related Work -- 2.1 Historic Review -- 2.2 Distributed Approaches to Process iEEG Readings -- 3 Distributed Approaches for Processing iEEG Readings -- 3.1 Working with iEEG Data in a Distributed Environment -- 3.2 Data and Functional Distribution Approaches -- 4 Two Implementations for Processing iEEG Readings on Distributed Computing Systems -- 4.1 The Proposed Processing Algorithm -- 4.2 Proposed Implementation Using a Message Passing Approach -- 4.3 Map-Reduce Implementation Using Apache Hadoop.
5 Experimental Evaluation -- 5.1 Execution Environments -- 5.2 Evaluation Metrics -- 5.3 Computational Efficiency Analysis -- 6 Conclusions and Future Work -- References -- Support Vector Machine Acceleration for Intel Xeon Phi Manycore Processors -- 1 Introduction -- 2 Hardware and Software Platform -- 2.1 Manycore Processors and Intel® Xeon Phi™ -- 2.2 Intel® C++ Compiler -- 2.3 Intel® Math Kernel Library -- 3 Related Work -- 4 LIBSVM Implementation for Intel® Xeon Phi™ -- 4.1 Coarse-Grain Parallelism Using OpenMP -- 4.2 Compiling with Intel® C++ Compiler -- 4.3 Integration with Intel® MKL -- 5 Experimental Analysis -- 5.1 Execution Platform -- 5.2 Problem Instances -- 5.3 Coarse-Grain Parallelization -- 5.4 Vectorized Dot Product Computation -- 5.5 Two-Level Parallelization Approach -- 6 Conclusions and Future Work -- References -- Performance Improvements of a Parallel Multithreading Self-gravity Algorithm -- 1 Introduction -- 2 Spatial Domain Decomposition Techniques and the Parallel Self-gravity Implementation on ESyS-Particle -- 2.1 Spatial Domain Decomposition Techniques -- 2.2 A Parallel Algorithm for Self-gravity Calculation -- 2.3 Self-gravity Implementation on ESyS-Particle -- 3 Improvements of the Baseline Implementation -- 3.1 Reducing the Execution Time of the Self-gravity Computation -- 3.2 Profiling the Self-gravity Calculation -- 4 Experimental Evaluation -- 4.1 Description of the Test Scenario and Instances -- 4.2 Profiling of the Optimized Version -- 4.3 Performance Evaluation Results -- 5 Conclusions -- References -- A Fast GPU Convolution/Superposition Method for Radiotherapy Dose Calculation -- 1 Introduction -- 2 Theory -- 2.1 Radiological Path -- 2.2 Convolution/Superposition -- 2.3 Collapsed Cone Approximation -- 3 The Parallel Collapsed Cone Kernel Algorithm -- 3.1 Ray Tracing -- 3.2 Discrete CCK Algorithm.
3.3 GPU Implementation.
Description based on publisher supplied metadata and other sources.
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.