Spark Cookbook : With over 60 Recipes on Spark, Covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX Libraries This Is the Perfect Spark Book to Always Have by Your Side.

By:
Material type: Text
Publisher: Birmingham : Packt Publishing, Limited, 2015
Copyright date: ©2015
Edition: 1st ed
Description: 1 online resource (226 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9781783987078
Subject(s):
Genre/Form:
Additional physical formats: Print version: Spark Cookbook
DDC classification:
  • 005.756
LOC classification:
  • QA76.9.D32 .Y384 2015
Online resources:
Contents:
Cover -- Copyright -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Getting Started with Apache Spark -- Introduction -- Installing Spark from binaries -- Building the Spark source code with Maven -- Launching Spark on Amazon EC2 -- Deploying on a cluster in standalone mode -- Deploying on a cluster with Mesos -- Deploying on a cluster with YARN -- Using Tachyon as an off-heap storage layer -- Chapter 2: Developing Applications with Spark -- Introduction -- Exploring the Spark shell -- Developing Spark applications in Eclipse with Maven -- Developing Spark applications in Eclipse with SBT -- Developing a Spark application in IntelliJ IDEA with Maven -- Developing a Spark application in IntelliJ IDEA with SBT -- Chapter 3: External Data Sources -- Introduction -- Loading data from the local filesystem -- Loading data from HDFS -- Loading data from HDFS using a custom InputFormat -- Loading data from Amazon S3 -- Loading data from Apache Cassandra -- Loading data from relational databases -- Chapter 4: Spark SQL -- Introduction -- Understanding Catalyst optimizer -- Creating HiveContext -- Inferring schema using case classes -- Programmatically specifying the schema -- Loading and saving data using the Parquet format -- Loading and saving data using the JSON format -- Loading and saving data from relational databases -- Loading and saving data from an arbitrary source -- Chapter 5: Spark Streaming -- Introduction -- Word count using Streaming -- Streaming Twitter data -- Streaming using Kafka -- Chapter 6: Getting Started with Machine Learning using MLlib -- Introduction -- Creating vectors -- Creating a labeled point -- Creating matrices -- Calculating summary statistics -- Calculating correlation -- Doing hypothesis testing -- Creating machine learning pipelines using ML.
Chapter 7: Supervised Learning with MLlib - Regression -- Introduction -- Using linear regression -- Understanding cost function -- Doing linear regression with lasso -- Doing ridge regression -- Chapter 8: Supervised Learning with MLlib - Classification -- Introduction -- Doing classification using logistic regression -- Doing binary classification using SVM -- Doing classification using decision trees -- Doing classification using Random Forests -- Doing classification using Gradient Boosted Trees -- Doing classification with Naïve Bayes -- Chapter 9: Unsupervised Learning -- Introduction -- Clustering using k-means -- Dimensionality reduction with principal component analysis -- Dimensionality reduction with singular value decomposition -- Chapter 10: Recommender Systems -- Introduction -- Collaborative filtering using explicit feedback -- Collaborative filtering using implicit feedback -- Chapter 11: Graph Processing Using GraphX -- Introduction -- Fundamental operations on graphs -- Using PageRank -- Finding connected components -- Performing neighborhood aggregation -- Chapter 12: Optimizations and Performance Tuning -- Introduction -- Optimizing memory -- Using compression to improve performance -- Using serialization to improve performance -- Optimizing garbage collection -- Optimizing level of parallelism -- Understanding future of optimization - project Tungsten -- Index.

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
