
[02] – Installing Apache Spark Locally

kerrache.massipssa

In this tutorial, you will learn how to install Apache Spark / PySpark 3.5.4 on your local machine using the pre-built Apache Spark binaries. The guide provides step-by-step instructions for Windows, macOS, and Linux.

Additionally, if you’re interested in deploying Apache Spark on Kubernetes, check out this tutorial: Deploy Apache Spark with Kubernetes (K8s).

Prerequisites #

Before installing Apache Spark, ensure you meet the following requirements:

  • Java 8, 11, or 17 is required for Spark 3.5.x to run.
  • If you plan to write Spark jobs in Scala, you need to install Scala (2.12 or 2.13 for Spark 3.5.x).
  • If you plan to write Spark jobs in Python, you need to install Python (3.8 or later for Spark 3.5.x).
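
You can check which prerequisites are already installed from a terminal (the Scala check is only relevant if you plan to use Scala):

java -version
python3 --version
scala -version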

Install Apache Spark #

Step 1: Download Spark #

  1. Visit the official Apache Spark website.
  2. Select:
  • Spark version (latest stable)
  • Hadoop version (choose pre-built for Hadoop 3.x if unsure)
  3. Download the archive as a .tgz file.
  4. Extract the archive:
tar -xvzf spark-*.tgz
mv spark-* ~/spark
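
Alternatively, you can download the archive directly from the command line. A sketch, assuming version 3.5.4 pre-built for Hadoop 3 (check the download page for the current version and mirror URL):

curl -O https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz

Then extract and move it as shown above.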

Step 2: Set Environment Variables #

Now, add the Spark bin directory to your environment variables so that the spark-shell, pyspark, and spark-submit commands are available from any terminal.

Windows (PowerShell) #

  1. Open PowerShell.
  2. Set the environment variables for the current session:
$env:SPARK_HOME="C:\path\to\spark"
$env:PATH+=";$env:SPARK_HOME\bin"

💡 SPARK PATH

Replace C:\path\to\spark with the actual location where you extracted the Spark binary on your system. For example, if you extracted Spark to C:\Spark, update the path accordingly.
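
Variables set with $env: apply only to the current PowerShell session. To persist SPARK_HOME across sessions, one option is to store it as a user-level variable; a minimal sketch (adjust the path to your install location):

[Environment]::SetEnvironmentVariable("SPARK_HOME", "C:\path\to\spark", "User")

New PowerShell windows will then pick it up; you still need to ensure $env:SPARK_HOME\bin is on your PATH (for example via System Properties > Environment Variables).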

Linux and macOS #

  1. Open a terminal and edit ~/.bashrc (or ~/.zshrc if you use zsh, the default shell on macOS):
nano ~/.bashrc
  2. Add:
export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
  3. Apply the changes:
source ~/.bashrc
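
To confirm the variables are set correctly, a quick check (assuming the installation paths used above):

echo $SPARK_HOME
spark-submit --version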

Install PySpark (for Python users) #

pip install pyspark
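
To verify that the package is importable, a quick check from the terminal:

python -c "import pyspark; print(pyspark.__version__)"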

Verify Installation #

To confirm that Apache Spark was installed correctly, check that both spark-shell (for Scala) and pyspark (for Python) start without errors.

  • Check Spark Shell

Run the spark-shell command to start the Scala Spark shell:

spark-shell

  • Check PySpark

Run the pyspark command to start the PySpark shell (Python API):

pyspark

  • Check Spark UI

While spark-shell or pyspark is running, open your preferred browser and navigate to http://localhost:4040. This should display the Spark UI, which provides insights into running jobs, stages, and tasks.
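
To see some activity in the Spark UI, run a small job from the shell you started. This one-liner works in both the Scala and Python shells, since the spark session object is predefined in each:

spark.range(1000000).count()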

Conclusion #

By following this guide, you have successfully installed and configured Apache Spark locally on your machine. Whether you’re using Windows, macOS, or Linux, you now have a functional Spark environment ready for big data processing and analytics.

To deepen your knowledge, explore Spark’s core features, including RDDs, DataFrames, and SQL operations. If you plan to run Spark in a distributed environment, consider learning about YARN, Kubernetes, and cloud deployments.

Updated on March 5, 2025


