TY  - BOOK
AU  - Yadav, Rishi
TI  - Spark Cookbook: With over 60 Recipes on Spark, Covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX Libraries This Is the Perfect Spark Book to Always Have by Your Side
SN  - 9781783987078
AV  - QA76.9.D32 .Y384 2015
U1  - 005.756
PY  - 2015///
CY  - Birmingham
PB  - Packt Publishing, Limited
KW  - Big data
KW  - Data mining -- Computer programs
KW  - SPARK (Electronic resource)
KW  - Electronic books
N1  - Cover -- Copyright -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Getting Started with Apache Spark -- Introduction -- Installing Spark from binaries -- Building the Spark source code with Maven -- Launching Spark on Amazon EC2 -- Deploying on a cluster in standalone mode -- Deploying on a cluster with Mesos -- Deploying on a cluster with YARN -- Using Tachyon as an off-heap storage layer -- Chapter 2: Developing Applications with Spark -- Introduction -- Exploring the Spark shell -- Developing Spark applications in Eclipse with Maven -- Developing Spark applications in Eclipse with SBT -- Developing a Spark application in IntelliJ IDEA with Maven -- Developing a Spark application in IntelliJ IDEA with SBT -- Chapter 3: External Data Sources -- Introduction -- Loading data from the local filesystem -- Loading data from HDFS -- Loading data from HDFS using a custom InputFormat -- Loading data from Amazon S3 -- Loading data from Apache Cassandra -- Loading data from relational databases -- Chapter 4: Spark SQL -- Introduction -- Understanding Catalyst optimizer -- Creating HiveContext -- Inferring schema using case classes -- Programmatically specifying the schema -- Loading and saving data using the Parquet format -- Loading and saving data using the JSON format -- Loading and saving data from relational databases -- Loading and saving data from an arbitrary source -- Chapter 5: Spark Streaming -- Introduction -- Word count using Streaming -- Streaming Twitter data -- Streaming using Kafka -- Chapter 6: Getting Started with Machine Learning using MLlib -- Introduction -- Creating vectors -- Creating a labeled point -- Creating matrices -- Calculating summary statistics -- Calculating correlation -- Doing hypothesis testing -- Creating machine learning pipelines using ML -- Chapter 7: Supervised Learning with MLlib - Regression -- Introduction -- Using linear regression -- Understanding cost function -- Doing linear regression with lasso -- Doing ridge regression -- Chapter 8: Supervised Learning with MLlib - Classification -- Introduction -- Doing classification using logistic regression -- Doing binary classification using SVM -- Doing classification using decision trees -- Doing classification using Random Forests -- Doing classification using Gradient Boosted Trees -- Doing classification with Naïve Bayes -- Chapter 9: Unsupervised Learning -- Introduction -- Clustering using k-means -- Dimensionality reduction with principal component analysis -- Dimensionality reduction with singular value decomposition -- Chapter 10: Recommender Systems -- Introduction -- Collaborative filtering using explicit feedback -- Collaborative filtering using implicit feedback -- Chapter 11: Graph Processing Using GraphX -- Introduction -- Fundamental operations on graphs -- Using PageRank -- Finding connected components -- Performing neighborhood aggregation -- Chapter 12: Optimizations and Performance Tuning -- Introduction -- Optimizing memory -- Using compression to improve performance -- Using serialization to improve performance -- Optimizing garbage collection -- Optimizing level of parallelism -- Understanding future of optimization - project Tungsten -- Index
UR  - https://ebookcentral.proquest.com/lib/orpp/detail.action?docID=2120230
ER  -