Sometimes you have to be a little mad to be a genius, or so it’s said. That may be the case for those who choose to build their own bespoke ETL, or data integration, solution. With the many off-the-shelf solutions available, why would anyone choose to go through that extra effort? There are many who would say it’s madness. But when the performance outmatches any commercially available solution, it may very well be genius.
We highlight five reasons to consider building your own ETL. Ultimately, the choice you make may determine whether you are seen as a genius or not.
Getting from there to here
We’ve come a long way since the early days of data warehousing in the last century. Yes, the 1990s! The hardware we rely on has improved in spectacular ways. The software has taken advantage of those advancements, providing us with new and better options for moving and storing the mountains of data collected by businesses worldwide. Data engineering has become a specialty by necessity. The complexity of integrating, moving, storing and securing data has grown along with everything else.
In the beginning, we had SQL and whatever procedural language the database vendor provided. While this code may have been wrapped in another programming language, the majority of the coding was done in SQL and the vendor’s procedural language, with orchestration handled by the wrapper. Over time, software makers started to see value in developing applications dedicated to extracting data and loading it into data warehouses, and the ETL wars were launched.
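For readers who have never seen that early pattern, here is a minimal sketch of it: the transformation lives in SQL, and a thin wrapper language handles the orchestration. Everything in it is an assumption made for illustration, not something from a real system: Python and an in-memory SQLite database stand in for the wrapper and the warehouse, and the table names (`stg_sales`, `fact_daily_sales`), the sample rows and the function names are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for an extract from an operational system.
SOURCE_ROWS = [
    ("2024-01-05", "widget", 3, 9.99),
    ("2024-01-05", "gadget", 1, 24.50),
    ("2024-01-06", "widget", 2, 9.99),
    ("2024-01-06", "widget", 1, 9.99),
]

# The transform is plain SQL: aggregate staged rows into a daily fact table.
TRANSFORM_SQL = """
CREATE TABLE fact_daily_sales AS
SELECT sale_date,
       product,
       SUM(qty) AS total_qty,
       SUM(qty * unit_price) AS revenue
FROM stg_sales
GROUP BY sale_date, product
"""


def extract():
    """Extract step: a real job would read from a source system or file."""
    return list(SOURCE_ROWS)


def load_staging(conn, rows):
    """Load step: land the raw rows in a staging table without changes."""
    conn.execute(
        "CREATE TABLE stg_sales "
        "(sale_date TEXT, product TEXT, qty INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", rows)


def run_pipeline():
    """Orchestration: the wrapper only sequences the steps."""
    conn = sqlite3.connect(":memory:")
    load_staging(conn, extract())
    conn.execute(TRANSFORM_SQL)
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = run_pipeline()
    query = "SELECT * FROM fact_daily_sales ORDER BY sale_date, product"
    for row in connection.execute(query):
        print(row)
```

Even at this size, the sketch leaves out what commercial tools supply out of the box, such as scheduling, error handling, logging and restartability. A bespoke solution has to build all of that itself.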
Today a variety of tools is available to fit almost any scenario. The large selection is almost overwhelming. Gartner started producing research on this subject to help organizations make informed choices.
The who and why of ETL solutions
Although exact stats are lacking, it’s safe to say that most organizations use some kind of off-the-shelf tool to support the development, deployment and execution of their data delivery systems. The tool may be a simple cloud-based point-and-click front end, or it may be delivered through a full-service analytics provider. It could be one of the many highly complex tools that have been evolving over the past 10 to 15 years. It could be one of the latest cloud-based solutions that offer hybrid options to meet the more diverse needs of a complex global company. The range of implementations is tremendous.
Then we have the minority who are choosing to roll their own ETL. The types of users vary as much as their reasons. Some are large organizations that run huge big data systems and want to optimize for scale. Others are resellers of applications with embedded analytics that require their own data delivery solutions built into their apps. And there is also the “no ETL” faction who choose to instead leverage advances in data virtualization to integrate their data.
5 good reasons for choosing to build your own ETL
With this historical backdrop to provide some context, let’s look at why an organization might choose a bespoke ETL solution.
- You are a huge organization with a deep bench of highly skilled developers and have a need for extremely high performing processes. You also have an efficient, mature software development process.
- You are not a huge organization but have skilled developers with deep knowledge of the business and no concerns about needing to replace these resources any time soon.
- Your requirements are extremely complex and you are unable to find a tool that will accommodate them. There is nothing more flexible than writing everything yourself.
- You need extreme levels of performance and tight integration with a single data source. Using as few products as possible is the best answer, but it still requires highly skilled developers.
- You have serious objections to additional vendor dependencies and follow the “less is more” philosophy. A word of caution: don’t choose the custom ETL approach expecting to save money, because you won’t.
If you have elected to code your own, perhaps you have other reasons that are just as valid. You may be a genius.
There is a much longer list of reasons why you should use an off-the-shelf tool. But that is the subject for another post.
Are you considering how best to integrate your data? Do you fit the profile for a bespoke ETL solution? Or are you stymied by which of the many off-the-shelf solutions is best suited to your needs? We can help.
Senturus has deployed most of the major ETL tools out there including SQL Server Integration Services (SSIS), IBM DataStage, Microsoft Power Query, Informatica, Cognos Data Manager, Tableau Prep, and Alteryx. We would love to discuss your data integration challenge and determine the best solution for you. Get in touch with us.