ETL is more than tools or an attempt at organized chaos.
Posted: Sat Jul 10, 2010 2:54 pm
ETL is more than tools or an attempt at organized chaos. Good ETL is designed to replace chaos with organized structure. And while good ETL does require good tools like DataStage, more than tools it requires a company's dedication to a data governance policy and to an enterprise data management practice. Otherwise, it is nothing more than vapor.
Imagine, if you will, that someone wants to get some data for something, just so that they can move on to the next task of that something as fast as possible. Maybe they are creating a web page or some other type of application. All they need is the data. Now imagine that that someone is actually many folks in many different silos.
Now, imagine that you have silos A through G, and those silos all want access to ETL. Imagine that you need to move something from silo A to silo C.
In such a scenario, you have three very undesirable options using ETL.
Option 1: Give silo A access to parts of silo C.
Option 2: Give silo C access to parts of silo A.
Option 3: Produce flat files or other intermediate data sets from A which are fed to someone in C to process (very inefficient).
Now, none of the options above is good. Option 1 spreads access across many systems to people who shouldn't have had access to begin with (IMHO). Option 2 has the same problem. Option 3, well, that is just inefficient, and it is something that good ETL is designed to do away with.
So, what are we left with? Silos of data managed by people who don't care about ETL and who are not ETL experts or developers, yet are trying to perform as such. That is both a security challenge and a major inefficiency. Not to mention, imagine that you had to use ETL just one time in your area. Now you have to support a couple of jobs that you don't have the capability or knowledge to support.
Thus, ETL should be handled by a centralized staff of people who are dedicated to ETL standards, methodologies, metadata management, etc. This group of people does the administration, architecting, and developing of the ETL jobs. They are experts. Thus, ETL is not designed to destroy silos, but to allow the people who work with such datasets to stop focusing on the data movement aspects and focus on what they do best. Maybe that is writing a .NET program or a C program. But surely, not ETL.
Thoughts? Is it really this simple, but at the same time difficult for people to understand? Or is it just me who believes in such fantasies?