Spark Cookbook : With over 60 Recipes on Spark, Covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX Libraries This Is the Perfect Spark Book to Always Have by Your Side.
Material type:
- text
- computer
- online resource
ISBN: 9781783987078
Dewey classification: 005.756
LC classification: QA76.9.D32 .Y384 2015
Cover -- Copyright -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface
Chapter 1: Getting Started with Apache Spark -- Introduction -- Installing Spark from binaries -- Building the Spark source code with Maven -- Launching Spark on Amazon EC2 -- Deploying on a cluster in standalone mode -- Deploying on a cluster with Mesos -- Deploying on a cluster with YARN -- Using Tachyon as an off-heap storage layer
Chapter 2: Developing Applications with Spark -- Introduction -- Exploring the Spark shell -- Developing Spark applications in Eclipse with Maven -- Developing Spark applications in Eclipse with SBT -- Developing a Spark application in IntelliJ IDEA with Maven -- Developing a Spark application in IntelliJ IDEA with SBT
Chapter 3: External Data Sources -- Introduction -- Loading data from the local filesystem -- Loading data from HDFS -- Loading data from HDFS using a custom InputFormat -- Loading data from Amazon S3 -- Loading data from Apache Cassandra -- Loading data from relational databases
Chapter 4: Spark SQL -- Introduction -- Understanding Catalyst optimizer -- Creating HiveContext -- Inferring schema using case classes -- Programmatically specifying the schema -- Loading and saving data using the Parquet format -- Loading and saving data using the JSON format -- Loading and saving data from relational databases -- Loading and saving data from an arbitrary source
Chapter 5: Spark Streaming -- Introduction -- Word count using Streaming -- Streaming Twitter data -- Streaming using Kafka
Chapter 6: Getting Started with Machine Learning using MLlib -- Introduction -- Creating vectors -- Creating a labeled point -- Creating matrices -- Calculating summary statistics -- Calculating correlation -- Doing hypothesis testing -- Creating machine learning pipelines using ML
Chapter 7: Supervised Learning with MLlib Regression -- Introduction -- Using linear regression -- Understanding cost function -- Doing linear regression with lasso -- Doing ridge regression
Chapter 8: Supervised Learning with MLlib - Classification -- Introduction -- Doing classification using logistic regression -- Doing binary classification using SVM -- Doing classification using decision trees -- Doing classification using Random Forests -- Doing classification using Gradient Boosted Trees -- Doing classification with Naïve Bayes
Chapter 9: Unsupervised Learning -- Introduction -- Clustering using k-means -- Dimensionality reduction with principal component analysis -- Dimensionality reduction with singular value decomposition
Chapter 10: Recommender Systems -- Introduction -- Collaborative filtering using explicit feedback -- Collaborative filtering using implicit feedback
Chapter 11: Graph Processing Using GraphX -- Introduction -- Fundamental operations on graphs -- Using PageRank -- Finding connected components -- Performing neighborhood aggregation
Chapter 12: Optimizations and Performance Tuning -- Introduction -- Optimizing memory -- Using compression to improve performance -- Using serialization to improve performance -- Optimizing garbage collection -- Optimizing level of parallelism -- Understanding future of optimization - project Tungsten
Index.
Description based on publisher supplied metadata and other sources.
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.