Blog

Deploy Apache Airflow with Kubernetes

Apache Airflow is an open-source platform designed for orchestrating complex workflows. Airflow’s Directed Acyclic Graphs (DAGs) provide a visual representation of pipeline tasks and their dependencies...

How does Adaptive Query Execution (AQE) fix your Spark performance issues?

By kerrache.massipssa | March 8, 2025 | Apache Spark

In Apache Spark versions before 3.0, the most common performance issues are data skew and inadequate partitioning, which cause uneven data distribution, and suboptimal query plan choices, where Spark...

Deploy Apache Spark with Kubernetes (K8s)

By kerrache.massipssa | January 25, 2025 | Apache Spark, Kubernetes

Apache Spark is one of the most widely used distributed engines for processing large amounts of data. Multiple tools can be used to run Spark: Spark Standalone, Apache Hadoop YARN, Apache Mesos, and...

Optimize Apache Spark SQL Queries using Predicate Pushdown

By kerrache.massipssa | January 25, 2025 | Apache Spark

In this article, we will explore how leveraging Predicate Pushdown can enhance the performance of your Spark SQL queries, providing insights into a powerful optimization technique for efficient data...

Why You Should Avoid Using UDFs in PySpark

By kerrache.massipssa | January 25, 2025 | Apache Spark

In Apache Spark, it’s well known that using User-Defined Functions (UDFs), especially with PySpark, can severely degrade your application’s performance. In this article, we’ll explore why and...

Ten Essential Dockerfile Commands

By kerrache.massipssa | January 25, 2025 | Docker

Docker is a powerful containerization technology that allows you to package and distribute applications along with their dependencies in a consistent and portable way. One of the key components of...

Spark Internal Execution Plans

By kerrache.massipssa | January 25, 2025 | Apache Spark

When working with Apache Spark, it’s crucial to understand the concepts of logical and physical plans, as they play a pivotal role in the execution of your data processing tasks. In this blog post, we...

Dynamic Secrets: HashiCorp Vault, PostgreSQL and Python

By kerrache.massipssa | January 20, 2025 | Other Topics

It is standard security practice to isolate secrets from code, and developers should not concern themselves with the origin of these secrets. This is where HashiCorp Vault comes in to centralize those...

Apache Spark Partitioning and Bucketing

By kerrache.massipssa | January 20, 2025 | Apache Spark

One of Apache Spark’s key features is its ability to efficiently distribute data across a cluster of machines and process it in parallel. This parallelism is crucial for achieving high performance in...
