what's a staging area

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

what's a staging area

Post by spracht »

I stumbled across the term 'staging area' a couple of times, and I have no faint idea what that is. Could somebody please explain what it means? Thanks in advance!

Stephan
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

Hi !
I understand a "staging area" like a point and place within your ETL process that allows you to store a temporary "image" (most of the time it is done using sequential files) of your data set at that stage of the process (staging point). E.g. after extraction, before loading, ...
This is done in order to facilitate a process of recovery in case your ETL process went wrong or to enable restart capabilities.
Hope this quick explanation will help.
Cheers.
Luca
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Stephan

You need to read the data warehouse gods Kimball and Inman. The concepts of data mart, staging area and a thousand other terms are defined. If you are doing ETL then the staging area is where the transformation part occurs. Normally this is a database. Thats the way Informatica would do it. DataStage is called that because the staging happens within its integrated database if used properly. I normally do not consider sequetial files part of the staging area but I guess they really are.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

Obviously my understanding of staging area can be improved ;o)
Can we consider that Hashed files, sequential files and temporary tables are the support of the staging area?
Can we say that the staging area is what we find between the Source Systems or ODS and the DW?
Thanks for your clarifications.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sandbox. Your playground for transforming the source data into the target data. Source --> Sandbox --> Target. You should use sequential staging files (Kimball) because they're compressible, transportable, viewable, searchable, loadable, blah blah blah. It doesn't preclude using hash files and work tables, but it's in your best interest to keep and archive these for an operational window that meets your recoverability SLA and audit capabilities. The sandbox exists between sources and persistent staging database (induction database), PSD and ODS, ODS and EDW, EDW and marts. It's a philosophy, a way of life.

You'll hear people say the Sandbox isn't necessary. However, restart and recoverability are major concerns for almost every client. Not landing ready-to-load files is a big boo-boo. Ask a client whose database corrupts and they need to go to a 5 day old backup. Now you have to catch up 5 days worth of data. Is it better to re-process 5 days, or re-load 5 days? Re-loading wins EVERY TIME. If you're doing atomic level tables or slowly changing dimensions, missing 5 days could mean missing 5 days worth of change detecting and recording, where re-loading doesn't have this problem. This is unacceptable, losing data.

Kenneth Bland
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Luca

I apologize if you think I was trying to be little what you said. I just never thought of it that way. I think your second answer is very accurate.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

Kim,
Don't have to apologize, I did not take your answer in a bad way. It just maid me realize that my answer was more about Staging Points which so far I didn't really differentiate from Staging area. I'm happy having learned this!
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Actually When I say Staging area it is a real live DB which is the final DB in which the data was loaded to before the production.

at this point you can check for integrity of data and if all is well you only need to transfer the data as is from the staging area to the production ( in delta or compleete methods depending on FACT/DIMENSION and more ).

This way of working in DWH allowes you to simulate the entire load process as if it was a 1 huge transaction performed in parallel.
after which you can eithere move the results to the production or dump the whole thing leaving your production uneffected.

That was, as Ken says, my 2 cents worth
to know more is to read/learn more and this place is not the way for all that info.

Regards,

Roy R.
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Roy

I apologize, if the topic should be misplaced here [B)]. But thanks to all for the responses it provoked nonetheless [:)].

Stephan
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Stephan,

I don't think that you should apologize nor do I believe this is the wrong place for this information. On the contrary, I believe it is this kind of question that evokes good discussion and a better understanding of a term or methodology.

As you can see from the posts, many folks have an idea of what a "staging area" is and all are slightly different. I believe that you would find, as many answers as there are projects.

It is not reasonable to expect that everyone will read Kimball or Inman and if they did they may not fully understand the implications of doing something a certain way. That's where this forum helps. It helps by allowing developers to give their side (or experience) of the story so that an informed decision can be made with the best interests of the client in mind.

Again, it does not have to be so structured or rigid that you are locked into a certain methodology. I have been with many customers where hybrids of different philosophies are used and all worked well. It's also based on your experience and those practices that have worked in the past (failures and successes).

Regards,

Michael Hester
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I ment to go further into this would be to much cause this isn't a class-room [:)]

Never ment the question was out of place [:)]
also I wasn't sure if there is a need (probably yes) and how far into this should I get.

next time I should be more clear [:I]

Regards,

Roy R.
Post Reply