Whether it’s recent news or just new to you, every two weeks the Data Planet serves up fascinating insights and resources from the data analytics and BI world.
Our snack-size summaries skip straight to the point.
This week’s edition of the Data Planet includes:
- How Event-Driven Architecture Solves Modern Web App Problems
- The Changing Face of ETL: Event-Driven Architecture for Data Engineers
- Logs & Offsets: (Near) Real Time ELT with Apache Kafka + Snowflake
- Software Spotlight: Apache Kafka
How Event-Driven Architecture Solves Modern Web App Problems
This article is geared toward app developers, but it provides plenty of information that's useful to data engineers, too. Event-driven architecture (EDA) has gained popularity at more forward-looking companies, so it's good to have some level of awareness of it.
The author walks you through the issues that have driven innovation in web development and explains how EDA aims to solve them. A very interesting read, indeed.
Read the article
The Changing Face of ETL: Event-Driven Architecture for Data Engineers
Put aside some time for this great video, which explains the basics of events for both software engineers and data engineers and shows how events can unify architectures as never before. The presenter argues that the way we used to think no longer works in a modern world where data proliferates exponentially and we need to access multiple sources. His promise is to show “how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.”
Watch the video: “The Changing Face of ETL: Event-Driven Architecture for Data Engineers”
The Future of ETL Isn’t What It Used to Be
If you don’t have time for the preceding video, read this short, older post instead. It shows how EDA can reshape how we think about data: the authors propose that all data should pass through a streaming platform. If you browse enterprise software websites, you’ll usually come across a diagram with the vendor’s product in the middle and all the arrows pointing at it. “Buy our product and it shall be the cornerstone on which your business will be built.” In Confluent/Kafka’s case, there may be some truth to that: it provides a platform that both developers and data engineers can tap into.
Read “The Future of ETL Isn’t What It Used to Be”
Logs & Offsets: (Near) Real Time ELT with Apache Kafka + Snowflake
If you’re interested in learning how a complex organization built a “robust and accessible data warehouse,” where all the data is available to all employees, this is the article for you. It even approaches “data democratization” in a very practical way.
The case study is unique in that it was written first-hand by the organization that moved from a slow batch process to a new, low-latency process, implementing the platforms themselves. You’ll get a candid, real-world glimpse into what does and doesn’t work.
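The “logs and offsets” in the title refer to Kafka’s core abstraction: an append-only log that each consumer reads at its own position (the offset), committing progress as it goes. That is what makes near-real-time, incremental loading possible instead of slow batch re-reads. As an illustration only (this is not the article’s code, and names like `EventLog` and `WarehouseLoader` are made up), here is a minimal pure-Python sketch of the idea:

```python
class EventLog:
    """Toy append-only log, standing in for a Kafka topic partition."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        """Return (offset, record) pairs at or after the given offset."""
        return list(enumerate(self._records))[offset:]


class WarehouseLoader:
    """Toy consumer that loads new records and commits its offset,
    so it resumes where it left off instead of re-reading everything."""

    def __init__(self, log):
        self.log = log
        self.committed_offset = 0  # next offset to read
        self.loaded = []           # stands in for rows landed in the warehouse

    def poll(self):
        for offset, record in self.log.read_from(self.committed_offset):
            self.loaded.append(record)          # "load" the record
            self.committed_offset = offset + 1  # commit progress


log = EventLog()
log.append({"user": "a", "action": "login"})
log.append({"user": "b", "action": "purchase"})

loader = WarehouseLoader(log)
loader.poll()  # loads both existing records

log.append({"user": "a", "action": "logout"})
loader.poll()  # loads only the new record, thanks to the committed offset
```

Each `poll` only touches records past the committed offset, which is the essence of the near-real-time ELT approach the case study describes.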
Read the article
Read Onebridge’s “What You Need to Know About Data Democratization”
Software Spotlight: Apache Kafka
To be honest, Apache Kafka isn’t a software package you can pick up and run on a whim. Its complexity approaches that of Apache Spark/Databricks.
The most succinct explanation of Apache Kafka is that it’s an event streaming platform that enables real-time communication between applications. The platform is wildly popular at larger companies, and competitors have sprung up in recent years (Azure Event Hubs, AWS Kinesis, Google Pub/Sub).
While Kafka is an older platform (as big data technologies go), its maintainers have made a number of significant enhancements over the last couple of years. One of the things that keeps Kafka popular is the huge ecosystem of products built around it.
Get a Quick Intro to Apache Kafka
Watch Short Videos to See How Apache Kafka Works