Find out how data professionals at organizations across the globe responded to questions about data, governance, integration, and the challenges they face. The survey results are full of interesting insights and statistics, and it’s worth the download. Fivetran financed this survey, compensating respondents for participation, so keep that in mind.
Key takeaways:
Get the Results: 2021 Data Engineers Global Survey
Even in the simpler times of the 1990s, you couldn’t store all the data you needed for analytics in a single data warehouse. Data lakes didn’t solve the problem either. Imagine the challenge today, with a tsunami of data.
Data virtualization takes a very different approach. Get a glimpse into the merits of a data virtualization layer in this short but informative article.
See Why No Single Data Repository Can Be Your Silver Bullet
The Power of Denodo: An Inside Look at The Leading Data Virtualization Platform
In ETL, the assumed direction of data is from source systems to the data warehouse or data lake. Reverse ETL is about taking data from your data warehouse and putting it back into operational systems.
Cleaning and organizing data in a data warehouse often takes hundreds to thousands of hours, so it makes sense to leverage that effort for more than reporting. But there are a couple of problems with this approach.
First is latency. Most data warehouses are built on the assumption that data will be loaded in big batches, so anything pushed back to operational systems is only as fresh as the last load. Depending on your business processes, this latency could be an issue. Second, data warehouses don’t work well as transactional systems. Warehouses are designed for a small number of large queries, whereas transactional systems handle frequent small ones.
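To make the idea concrete, here is a minimal sketch of a reverse ETL batch sync in Python. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, a plain dictionary stands in for an operational system such as a CRM, and the table, column, and function names (`customer_metrics`, `lifetime_value`, `reverse_etl_sync`) are hypothetical. Real tools would call the operational system's API instead.

```python
import sqlite3


def build_warehouse():
    """Create a stand-in 'warehouse' with one cleaned, aggregated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_metrics ("
        "customer_id INTEGER PRIMARY KEY, lifetime_value REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_metrics VALUES (?, ?)",
        [(1, 1200.0), (2, 85.5), (3, 430.0)],
    )
    conn.commit()
    return conn


def reverse_etl_sync(warehouse, crm_records):
    """Copy warehouse metrics onto matching operational records.

    One large read from the warehouse, then many small writes to the
    operational side. Returns the number of records updated.
    """
    rows = warehouse.execute(
        "SELECT customer_id, lifetime_value "
        "FROM customer_metrics ORDER BY customer_id"
    ).fetchall()

    updated = 0
    for customer_id, lifetime_value in rows:
        record = crm_records.get(customer_id)
        if record is None:
            # The operational system has no such customer; skip it.
            continue
        record["lifetime_value"] = lifetime_value
        updated += 1
    return updated


# Hypothetical operational data: the CRM only knows customers 1 and 2.
warehouse = build_warehouse()
crm_records = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
updated = reverse_etl_sync(warehouse, crm_records)
print(f"Updated {updated} CRM records")
```

Because the sync runs as a batch, the operational records are only as current as the most recent run, which is the latency concern described above.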
Reverse ETL is still a new concept, and the market will probably take a few years to sort it out. Meanwhile, companies like Hightouch are offering solutions, and we’re interested to see where this might go.
See the Connection: Modern Data Warehouses and Reverse ETL
The Tidyverse is a set of highly regarded R packages designed to give R developers a better user experience. Most seasoned R developers have probably used dplyr and ggplot2. But there are other interesting packages as well, including the following, which are described on the Tidyverse site.