Page 1 of 1

what's a staging area

Posted: Thu Aug 14, 2003 7:24 am
by spracht
I stumbled across the term 'staging area' a couple of times, and I have no faint idea what that is. Could somebody please explain what it means? Thanks in advance!

Stephan

Posted: Thu Aug 14, 2003 7:46 am
by luca
Hi !
I understand a "staging area" like a point and place within your ETL process that allows you to store a temporary "image" (most of the time it is done using sequential files) of your data set at that stage of the process (staging point). E.g. after extraction, before loading, ...
This is done in order to facilitate a process of recovery in case your ETL process went wrong or to enable restart capabilities.
Hope this quick explanation will help.
Cheers.
Luca

Posted: Thu Aug 14, 2003 8:46 am
by kduke
Stephan

You need to read the data warehouse gods Kimball and Inman. The concepts of data mart, staging area and a thousand other terms are defined. If you are doing ETL then the staging area is where the transformation part occurs. Normally this is a database. Thats the way Informatica would do it. DataStage is called that because the staging happens within its integrated database if used properly. I normally do not consider sequetial files part of the staging area but I guess they really are.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Thu Aug 14, 2003 9:29 am
by luca
Obviously my understanding of staging area can be improved ;o)
Can we consider that Hashed files, sequential files and temporary tables are the support of the staging area?
Can we say that the staging area is what we find between the Source Systems or ODS and the DW?
Thanks for your clarifications.

Posted: Thu Aug 14, 2003 9:31 am
by kcbland
Sandbox. Your playground for transforming the source data into the target data. Source --> Sandbox --> Target. You should use sequential staging files (Kimball) because they're compressible, transportable, viewable, searchable, loadable, blah blah blah. It doesn't preclude using hash files and work tables, but it's in your best interest to keep and archive these for an operational window that meets your recoverability SLA and audit capabilities. The sandbox exists between sources and persistent staging database (induction database), PSD and ODS, ODS and EDW, EDW and marts. It's a philosophy, a way of life.

You'll hear people say the Sandbox isn't necessary. However, restart and recoverability are major concerns for almost every client. Not landing ready-to-load files is a big boo-boo. Ask a client whose database corrupts and they need to go to a 5 day old backup. Now you have to catch up 5 days worth of data. Is it better to re-process 5 days, or re-load 5 days? Re-loading wins EVERY TIME. If you're doing atomic level tables or slowly changing dimensions, missing 5 days could mean missing 5 days worth of change detecting and recording, where re-loading doesn't have this problem. This is unacceptable, losing data.

Kenneth Bland

Posted: Thu Aug 14, 2003 9:33 am
by kduke
Luca

I apologize if you think I was trying to be little what you said. I just never thought of it that way. I think your second answer is very accurate.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Thu Aug 14, 2003 9:48 am
by luca
Kim,
Don't have to apologize, I did not take your answer in a bad way. It just maid me realize that my answer was more about Staging Points which so far I didn't really differentiate from Staging area. I'm happy having learned this!

Posted: Thu Aug 14, 2003 9:50 am
by roy
Hi,
Actually When I say Staging area it is a real live DB which is the final DB in which the data was loaded to before the production.

at this point you can check for integrity of data and if all is well you only need to transfer the data as is from the staging area to the production ( in delta or compleete methods depending on FACT/DIMENSION and more ).

This way of working in DWH allowes you to simulate the entire load process as if it was a 1 huge transaction performed in parallel.
after which you can eithere move the results to the production or dump the whole thing leaving your production uneffected.

That was, as Ken says, my 2 cents worth
to know more is to read/learn more and this place is not the way for all that info.

Regards,

Roy R.

Posted: Thu Aug 14, 2003 11:07 am
by spracht
Roy

I apologize, if the topic should be misplaced here [B)]. But thanks to all for the responses it provoked nonetheless [:)].

Stephan

Posted: Thu Aug 14, 2003 12:28 pm
by mhester
Stephan,

I don't think that you should apologize nor do I believe this is the wrong place for this information. On the contrary, I believe it is this kind of question that evokes good discussion and a better understanding of a term or methodology.

As you can see from the posts, many folks have an idea of what a "staging area" is and all are slightly different. I believe that you would find, as many answers as there are projects.

It is not reasonable to expect that everyone will read Kimball or Inman and if they did they may not fully understand the implications of doing something a certain way. That's where this forum helps. It helps by allowing developers to give their side (or experience) of the story so that an informed decision can be made with the best interests of the client in mind.

Again, it does not have to be so structured or rigid that you are locked into a certain methodology. I have been with many customers where hybrids of different philosophies are used and all worked well. It's also based on your experience and those practices that have worked in the past (failures and successes).

Regards,

Michael Hester

Posted: Fri Aug 15, 2003 6:18 pm
by roy
Hi,
I ment to go further into this would be to much cause this isn't a class-room [:)]

Never ment the question was out of place [:)]
also I wasn't sure if there is a need (probably yes) and how far into this should I get.

next time I should be more clear [:I]

Regards,

Roy R.