Loading data with no incremental jobs - Urgent Info Required

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

spalsikar
Participant
Posts: 10
Joined: Mon Aug 02, 2004 8:08 am

Loading data with no incremental jobs - Urgent Info Required

Post by spalsikar »

Could anybody please suggest or pass on some information about how they have dealt with a situation like the one below?

We are in the process of loading data from PeopleSoft Financials into the PeopleSoft Data Warehouse. Some of the tables are huge: millions of rows are already sitting in them, and millions more are added on a daily basis. For these tables we would like to run an incremental load every night (after we go live) instead of a destructive load, which we cannot finish within our batch window. Ascential has ONLY delivered destructive jobs for these tables, not incremental jobs; to create incremental jobs we would have to add the LASTUPDDTTM field to the PeopleSoft (source) database, and going that route means a huge customization, which we do not want to do.

Is there any other way that you can deal with this type of situation?
Thanks in advance.

~Shashi
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You have no solution other than writing your own ETL. You probably have the same opinion of the current ETL offering that I have.

Look at it as an opportunity to develop a solution to market back to Ascential/Peoplesoft.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
spalsikar
Participant
Posts: 10
Joined: Mon Aug 02, 2004 8:08 am

Post by spalsikar »

Thank you for the response Kenneth.

I was looking at the SEQ_FDM84_88_JRNL_HEADER_D_I job. This record doesn't have LASTUPDDTTM on the source database, yet this specific job is used as both incremental and destructive. Upon further investigation I found that the Update action in the target DRS stage is set to "Insert new rows or Update existing rows", which sounds to me like an incremental load type, and that PeopleSoft/Ascential populates LASTUPDDTTM on the target database with DSJobStartTimestamp, which means it is looking at the timestamp when loading the rows.

I was wondering if we can use the same method to create incremental jobs for the huge transaction tables. I am not sure about the performance, but it looks like we at least have an option.
~Shashi
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

If you have sorted versions of the current and previous full load files, you can use the unix comm command to identify the rows unique to each file. Next, load these rows into hash files and compare one file to the other to determine inserts, updates, and deletes.

Periodically, you may want to export the table and compare the current full load file against that export instead of the previous full load file, just in case some of your changes from previous runs were rejected by the database.
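The compare step described above can be sketched as follows. This is a minimal Python illustration rather than DataStage code: dicts keyed on an assumed primary key stand in for the hash files, and the sample keys and values are made up.

```python
# Sketch of the delta detection described above: classify rows from two
# full loads into inserts, updates, and deletes. Each load is a list of
# (primary_key, row_image) pairs; the sample data is hypothetical.

def classify_changes(previous, current):
    """Compare two full loads keyed on primary key and return the
    incremental change set as (inserts, updates, deletes)."""
    prev = dict(previous)
    curr = dict(current)
    inserts = {k: v for k, v in curr.items() if k not in prev}
    deletes = {k: v for k, v in prev.items() if k not in curr}
    updates = {k: v for k, v in curr.items() if k in prev and prev[k] != v}
    return inserts, updates, deletes

# Yesterday's and today's full extracts of a journal-header table (made up)
prev_load = [("J001", "1000.00"), ("J002", "250.00")]
curr_load = [("J001", "1000.00"), ("J002", "300.00"), ("J003", "75.00")]
inserts, updates, deletes = classify_changes(prev_load, curr_load)
```

With comm itself, rows unique to the current file are the inserts plus the new images of updated rows, and rows unique to the previous file are the deletes plus the old images; the hash-file lookup on the key is what separates the two cases.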
venkatrranga
Participant
Posts: 15
Joined: Thu Sep 23, 2004 2:18 pm

Peoplesoft/Ascential Incremental Load

Post by venkatrranga »

I am looking at a similar exercise for one of my clients. We have the same issue, and I spent a considerable amount of time looking into how to change the extracts to incremental.

At a very high level, this is what I found: the logic is different for each dataset you are trying to change to incremental. A couple of things you may want to look at closely:

1. PS loads FCs and DMs in destructive mode. If you want to change an FC to incremental mode, you need to look closely at the dependent dimensions.

2. How to replace zero SID values in dimensions when the corresponding data becomes available.

3. How do you identify the incremental rows themselves (don't depend on the LASTUPDDTTM timestamp)?

4. Is change mode allowed in the HCM application?



Regards
Venkat Ranga
spalsikar
Participant
Posts: 10
Joined: Mon Aug 02, 2004 8:08 am

Post by spalsikar »

Thank you for your response, Venkat Ranga.

I would really appreciate it if you could elaborate a bit.

1. PS loads FCs and DMs in destructive mode. If you want to change an FC to incremental mode, you need to look closely at the dependent dimensions.
-- What does FC stand for? Did you mean Datamart or Datamap when you mentioned DM?
2. How to replace zero SID values in dimensions when the corresponding data becomes available.

4. Is change mode allowed in the HCM application?

-- What does HCM stand for?

You mentioned not to rely on the LASTUPDDTTM timestamp; is there any specific reason?

Sorry, this tool is still new to us.

Thanks in advance.
Shashi
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: Loading data with no incremental jobs - Urgent Info Required

Post by peternolan9 »

spalsikar wrote: Could anybody please suggest or pass on some information about how they have dealt with a situation like the one below?

We are in the process of loading data from PeopleSoft Financials into the PeopleSoft Data Warehouse. Some of the tables are huge: millions of rows are already sitting in them, and millions more are added on a daily basis. For these tables we would like to run an incremental load every night (after we go live) instead of a destructive load, which we cannot finish within our batch window. Ascential has ONLY delivered destructive jobs for these tables, not incremental jobs; to create incremental jobs we would have to add the LASTUPDDTTM field to the PeopleSoft (source) database, and going that route means a huge customization, which we do not want to do.

Is there any other way that you can deal with this type of situation?
Thanks in advance.

~Shashi
Hi Shashi,
Dropping and recreating a DW as a way to load it went out of fashion in 1994, when the very first propagation products came onto the market....

You will have seen by this time that you must re-write the code.....I just did an Oracle Apps install with DS and we wrote the code from scratch...
Best Regards
Peter Nolan
www.peternolan.com
venkatrranga
Participant
Posts: 15
Joined: Thu Sep 23, 2004 2:18 pm

Post by venkatrranga »

spalsikar wrote: Thank you for your response, Venkat Ranga.

I would really appreciate it if you could elaborate a bit.

1. PS loads FCs and DMs in destructive mode. If you want to change an FC to incremental mode, you need to look closely at the dependent dimensions.
-- What does FC stand for? Did you mean Datamart or Datamap when you mentioned DM?
2. How to replace zero SID values in dimensions when the corresponding data becomes available.

4. Is change mode allowed in the HCM application?

-- What does HCM stand for?

You mentioned not to rely on the LASTUPDDTTM timestamp; is there any specific reason?

Sorry, this tool is still new to us.

Thanks in advance.
Shashi

FC is the suffix given to all fact tables in PeopleSoft HCM.
DM is the suffix given to all dimension tables in PeopleSoft HCM.
CR is the suffix given to all dimensions containing a current active row in PeopleSoft HCM.

HCM is PeopleSoft Human Capital Management.

Regards
Venkat Ranga
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

Venkat,
"You mentioned not to rely on the LASTUPDDTTM timestamp, is there any specific reason?"

I think we discussed this around May last year on this forum..... The problem is that in Oracle (bless its little cotton socks) the last update timestamp is the time the row was updated by Oracle in memory, and NOT the time it was committed to the database.

I was totally, totally amazed at that little nugget of information....but the Oracle guys tell me it's true......

I can't think of any other database where this is true......every other database makes the last update timestamp the time the row is committed to the database and made available to other applications....


The implications of this are obvious: you are scanning for changed rows, and a row changes after your 'last extract timestamp' but is not yet committed, so you will not see the change. And if you use just >= last extract timestamp, you will miss that row the next time around as well, because by then your last extract timestamp has moved past it. Like I said, just unbelievably amazing if you ask me....

To get around this you must, as a minimum:

1. Always use BETWEEN lastextracttimestamp AND thisextracttimestamp.
2. Set your this-extract timestamp before you start the extract.

What we do is wait five minutes between the time we set the this-extract timestamp and when we start scanning, so in Oracle Apps the database has five minutes to commit all changes. We could achieve the same result by asking the operational system to quiesce and commit all changes, or by having them tell us when they have committed all changes and kicking off our extract then.

If the database is Oracle, you can't just scan on the last update timestamp....
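The two rules above can be sketched as a small bookkeeping routine. This is a hedged Python illustration, not DataStage code: the function and variable names are assumptions, and the five-minute settle delay is the figure from the post.

```python
# Sketch of the windowed-extract bookkeeping described above: fix both
# window bounds before scanning, then delay the scan so in-flight
# transactions have time to commit.
from datetime import datetime, timedelta

SETTLE = timedelta(minutes=5)  # settle delay from the post; adjust to taste

def next_extract_window(last_extract_ts, now):
    """Fix the upper bound of the BETWEEN window *before* the extract
    starts (rule 2), and report when it is safe to begin scanning."""
    this_extract_ts = now
    start_scanning_at = now + SETTLE
    # Rule 1: the extract query then uses
    #   WHERE last_upd BETWEEN :last_extract_ts AND :this_extract_ts
    return this_extract_ts, start_scanning_at

last_ts = datetime(2005, 1, 10, 1, 0, 0)
this_ts, scan_at = next_extract_window(last_ts, datetime(2005, 1, 11, 1, 0, 0))
```

Rows committed after this_extract_ts fall into the next run's window instead of being lost, which is the point of fixing the bound up front.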
Best Regards
Peter Nolan
www.peternolan.com
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

peternolan9 wrote: The implications of this are obvious: you are scanning for changed rows, and a row changes after your 'last extract timestamp' but is not yet committed, so you will not see the change. And if you use just >= last extract timestamp, you will miss that row the next time around as well. Like I said, just unbelievably amazing if you ask me....
What you think is unbelievable :wink: is just a side effect. Oracle guarantees that the data you see in a query is fixed at the moment you open the query, and that only that data will be returned.

So if another session has not committed its data before you open your query, then even if the other session commits its data during your query (with a timestamp that should be processed), your query will never see that data.

The benefit of this is that you can never read dirty data, so Oracle implements more than the SQL standard requires. It also makes sure that no reader can ever be blocked or deadlocked by a writer (readers in Oracle do not take row locks).

But this is becoming a bit off topic :lol:

Ogmios
In theory there's no difference between theory and practice. In practice there is.
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

Hi Ogmios,
But what you describe is a 'repeatable read', and with other databases this is an option: you get to choose whether you want to see updates that are committed after the start of the query. Are you saying that with Oracle this is not an option, but how it works all the time?

Also, I thought row-level locks were an option in Oracle..... there is a legend that SAP asked Oracle for row-level locks and they provided them...

In any case, our solution of setting the lastextracttimestamp and thisextracttimestamp is still safe. I thought it a bit excessive, but better safe than sorry.....
Best Regards
Peter Nolan
www.peternolan.com
tej2004
Participant
Posts: 3
Joined: Fri Sep 03, 2004 8:37 am

Post by tej2004 »

Shashi,
I observe that the source record JRNL_HEADER does have a timestamp field, DTTM_STAMP_SEC, coming from the source, which you can compare with LASTUPDDTTM in the target for an incremental load. Otherwise, every row coming into the PSFT warehouse (staging area) from the source database is always timestamped with LASTUPDDTTM, which, as you correctly mentioned, is DSJobStartTimestamp in the case of DataStage jobs.
But if you do not have any such timestamp coming from the source, then as you correctly mentioned, the Update action in the target DRS stage is the option.
vardhan354
Participant
Posts: 76
Joined: Wed Jan 25, 2006 6:42 pm

Post by vardhan354 »

Hey,

You can use the CRC32 function in your job logic to get rid of the timestamp dependency for your incremental loading.
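For illustration, the idea looks roughly like this in Python rather than DS BASIC, with `zlib.crc32` standing in for the DataStage CRC32 routine and made-up sample columns:

```python
# Sketch of CRC32-based change detection: checksum the concatenated
# non-key columns, store the checksum with the target row, and compare
# on the next load to spot changed rows without a last-update timestamp.
import zlib

def row_checksum(non_key_fields):
    """Return an unsigned 32-bit CRC over the pipe-joined columns."""
    payload = "|".join(non_key_fields).encode("utf-8")
    return zlib.crc32(payload) & 0xFFFFFFFF

# Hypothetical old and new images of the same journal row
old_row = ["1000.00", "USD", "POSTED"]
new_row = ["1000.00", "USD", "UNPOSTED"]
changed = row_checksum(old_row) != row_checksum(new_row)
```

Note that CRC32 is only 32 bits, so two different rows can occasionally produce the same checksum; a matching checksum is strong evidence the row is unchanged, not proof.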

Bye,

Vish.
ETL
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Coming in late to the thread, let me address something in the original question. With the Oracle buyout of PeopleSoft, you are not going to get much in the way of PeopleSoft product upgrades, which kind of gives you a license to customise your DataStage jobs.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Coming in late... you guys do realize you are over a year late, don'tcha?
-craig

"You can never have too many knives" -- Logan Nine Fingers