In this tutorial, you will learn how to install Apache Spark / PySpark 3.5.4 on your local machine using the Apache Spark binary. The guide provides step-by-step instructions for Windows, macOS, and Linux.
Additionally, if you’re interested in deploying Apache Spark on Kubernetes, check out this tutorial: Deploy Apache Spark with Kubernetes (K8s).
Prerequisites #
Before installing Apache Spark, ensure you meet the following requirements:
- Java 8 or later is required for Spark to run.
- If you plan to write Spark jobs in Scala, you need to install Scala.
- If you plan to write Spark jobs in Python, you need to install Python.
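You can quickly confirm the prerequisites by running java -version and python --version (or python3 --version, depending on your system) in a terminal; Java should report version 8 or later.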
Install Apache Spark #
Step 1: Download Spark #
- Visit the official Apache Spark website.
- Select:
- Spark version (latest stable)
- Hadoop version (choose pre-built for Hadoop 3.x if unsure)
- Download the package as a .tgz file.
- Extract the archive and move it to ~/spark:
tar -xvzf spark-3.5.4-bin-hadoop3.tgz
mv spark-3.5.4-bin-hadoop3 ~/spark
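You can confirm the extraction by listing ~/spark; the folder should contain directories such as bin, conf, jars, and python.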
Step 2: Set Environment Variables #
Next, add the Spark bin directory to your environment variables so that the spark-shell, pyspark, and spark-submit commands are available from any terminal.
Windows (PowerShell) #
- Open PowerShell.
- Set environment variables:
$env:SPARK_HOME="C:\path\to\spark"
$env:PATH+=";$env:SPARK_HOME\bin"
💡 SPARK PATH
Replace C:\path\to\spark with the actual location where you extracted the Spark binary on your system. For example, if you extracted Spark to C:\Spark, update the path accordingly.
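Also note that variables set with $env: only apply to the current PowerShell session. To make them permanent, define SPARK_HOME and the extra PATH entry as user environment variables, for example through the System Properties dialog or the setx command.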
macOS and Linux #
- Open the terminal and edit ~/.bashrc or ~/.zshrc:
nano ~/.bashrc
- Add:
export SPARK_HOME=~/spark
export PATH=$SPARK_HOME/bin:$PATH
- Apply changes:
source ~/.bashrc
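To confirm the variables took effect, open a new terminal and run echo $SPARK_HOME followed by spark-submit --version; the latter should print the installed Spark version.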
Install PySpark (for Python users) #
If you plan to work with Spark from Python, install the PySpark package with pip:
pip install pyspark
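Beyond the interactive shells covered below, you can also verify PySpark from a standalone script. Here is a minimal sketch (the file name and app name are arbitrary) that starts a local SparkSession, builds a tiny DataFrame, and prints the Spark version:

```python
# verify_pyspark.py - minimal smoke test for a local PySpark installation
from pyspark.sql import SparkSession

# Start a Spark session that runs locally, using all available CPU cores
spark = (
    SparkSession.builder
    .appName("InstallCheck")
    .master("local[*]")
    .getOrCreate()
)

# Build a tiny in-memory DataFrame and display it
df = spark.createDataFrame([(1, "spark"), (2, "pyspark")], ["id", "name"])
df.show()

print("Spark version:", spark.version)
spark.stop()
```

Run it with python verify_pyspark.py; it should print a two-row table and the Spark version without raising any errors.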
Verify Installation #
To confirm that Apache Spark was installed correctly, check that both spark-shell (for Scala) and pyspark (for Python) run without errors.
- Check Spark Shell
Run the spark-shell command to start the Scala Spark shell:
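If the installation is correct, spark-shell prints a welcome banner with the Spark version and leaves you at a scala> prompt; type :quit to exit.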

- Check PySpark
Run the pyspark command to check PySpark (the Python API):
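PySpark shows a similar startup banner and drops you into a Python prompt with a ready-made SparkSession available as the variable spark; use exit() to leave the shell.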

- Check Spark UI
While spark-shell or pyspark is running, open your preferred browser and navigate to http://localhost:4040. This should display the Spark UI, which provides insights into running jobs, stages, and tasks.
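If port 4040 is already taken by another Spark application, Spark binds the UI to the next free port (4041, 4042, and so on) and prints the chosen address in the shell output.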

Conclusion #
By following this guide, you have successfully installed and configured Apache Spark locally on your machine. Whether you’re using Windows, macOS, or Linux, you now have a functional Spark environment ready for big data processing and analytics.
To deepen your knowledge, explore Spark’s core features, including RDDs, DataFrames, and SQL operations. If you plan to run Spark in a distributed environment, consider learning about YARN, Kubernetes, and cloud deployments.
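As a first taste of those APIs, here is a small, purely illustrative PySpark snippet (the data and names are made up) that touches the DataFrame, SQL, and RDD interfaces:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("FirstSteps").master("local[*]").getOrCreate()

# DataFrame API: build a small in-memory dataset and filter it
people = spark.createDataFrame(
    [("Ada", 36), ("Linus", 54), ("Grace", 29)],
    ["name", "age"],
)
people.filter(people.age > 30).show()

# SQL: register the DataFrame as a temporary view and query it with SQL
people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()

# RDD API: the lower-level resilient distributed dataset is still accessible
print(people.rdd.map(lambda row: row.name.upper()).collect())

spark.stop()
```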