Spark

4 books

Learning Spark, 2nd Edition

Lightning-Fast Data Analytics

by Jules S. Damji, Brooke Wenig, Tathagata Das and Denny Lee

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.

Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:

Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow

About the book

4.36/5 on Goodreads

ISBN 9781492050049

Published in 2020

397 pages

O'Reilly Media

Spark GraphX in Action

by Michael S. Malak and Robin East

Spark GraphX in Action starts out with an overview of Apache Spark and the GraphX graph processing API. This example-based tutorial then teaches you how to configure GraphX and how to use it interactively. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

About the book

3.56/5 on Goodreads

ISBN 9781617292521

Published in 2016

280 pages

Manning Publications

Spark in Action, 2nd Edition

Covers Apache Spark 3 with Examples in Java, Python, and Scala

by Jean-Georges Perrin

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.

About the book

3.96/5 on Goodreads

ISBN 9781617295522

Published in 2020

576 pages

Manning Publications

Spark: The Definitive Guide

Big Data Processing Made Simple

by Bill Chambers and Matei Zaharia

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You'll explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark's scalable machine-learning library.

Get a gentle overview of big data and Spark
Learn about DataFrames, SQL, and Datasets (Spark's core APIs) through worked examples
Dive into Spark's low-level APIs, RDDs, and execution of SQL and DataFrames
Understand how Spark runs on a cluster
Debug, monitor, and tune Spark clusters and applications
Learn the power of Structured Streaming, Spark's stream-processing engine
Learn how you can apply MLlib to a variety of problems, including classification or recommendation

About the book

4.14/5 on Goodreads

ISBN 9781491912218

Published in 2018

603 pages

O'Reilly Media