
Loading data with no incremental jobs - Urgent Info Required

Posted: Thu Jan 06, 2005 9:25 am
by spalsikar
Could anybody please suggest or pass on some info about how they dealt with a situation like the one below.

We are in the process of loading data from PeopleSoft Financials into the PeopleSoft Data Warehouse. Some of our tables are huge: they already hold millions of rows, and millions more are added daily. For these tables we would like to run an incremental load every night (after we go live) instead of a destructive load, which we cannot finish in our batch window. Ascential has ONLY delivered destructive jobs for these tables, not incremental ones. To create incremental jobs we would have to add the LASTUPDDTTM field to the PeopleSoft (source) database, and going that route means a huge customization that we do not want to make.

Is there any other way that you can deal with this type of situation?
Thanks in advance.

~Shashi

Posted: Thu Jan 06, 2005 9:37 am
by kcbland
You have no solution other than writing your own ETL. You probably have the same opinion of the current ETL offering that I have.

Look at it as an opportunity to develop a solution to market back to Ascential/Peoplesoft.

Posted: Thu Jan 06, 2005 3:01 pm
by spalsikar
Thank you for the response Kenneth.

I was looking at the SEQ_FDM84_88_JRNL_HEADER_D_I job. This record does not have LASTUPDDTTM on the source database, yet this specific job is being used as both incremental and destructive. Upon further investigation I found that the Update action in the target DRS stage is set to "Insert new rows or Update existing rows", which sounds to me like an incremental load type, and PeopleSoft/Ascential populates LASTUPDDTTM on the target database with DSJobStartTimestamp, which means it is looking at that timestamp when loading rows.

I was wondering if we can use the same method to create incremental jobs for the huge transaction tables. I am not sure about the performance, but it looks like we at least have an option.
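As a rough illustration of what that "Insert new rows or Update existing rows" action amounts to, here is a minimal upsert sketch using sqlite3 as a stand-in for the DRS target. The table and column names are hypothetical (loosely based on the JRNL_HEADER record discussed above); the one detail taken from the thread is that LASTUPDDTTM on the target is stamped with the job start timestamp.

```python
# Hypothetical upsert sketch: insert the row if its key is new, otherwise
# update it in place, stamping LASTUPDDTTM with the job start timestamp.
import sqlite3
from datetime import datetime

job_start = datetime(2005, 1, 6, 21, 0, 0).isoformat()  # stand-in for DSJobStartTimestamp

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jrnl_header_d (
    journal_id  TEXT PRIMARY KEY,
    amount      REAL,
    lastupddttm TEXT)""")

def upsert(row):
    """Update the existing row for this key; if none exists, insert it."""
    cur = conn.execute(
        "UPDATE jrnl_header_d SET amount = ?, lastupddttm = ? WHERE journal_id = ?",
        (row["amount"], job_start, row["journal_id"]))
    if cur.rowcount == 0:  # no existing row with this key: insert instead
        conn.execute(
            "INSERT INTO jrnl_header_d VALUES (?, ?, ?)",
            (row["journal_id"], row["amount"], job_start))

upsert({"journal_id": "J001", "amount": 100.0})
upsert({"journal_id": "J001", "amount": 250.0})  # same key: updated, not duplicated

rows = conn.execute("SELECT journal_id, amount FROM jrnl_header_d").fetchall()
print(rows)  # [('J001', 250.0)]
```

Note that this keeps the target idempotent across reruns, which is what makes the job usable in both incremental and destructive modes.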
~Shashi

Posted: Fri Jan 07, 2005 5:05 pm
by chucksmith
If you have sorted versions of the current and previous full-load files, you can use the unix comm command to identify the rows unique to each file. Next, you can load these rows into hashed files and compare one file to the other to determine inserts, updates, and deletes.

Periodically, you may want to export the table and compare the current full-load file against that export instead of against the previous full-load file, just in case some of your changes from previous runs were rejected by the database.
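The compare-two-extracts idea can be sketched in a few lines of Python (in place of comm plus hashed files): key each row of the previous and current extracts, then classify the differences. The pipe-delimited layout with the key in the first field is an assumption for illustration only.

```python
# Classify rows as inserts, updates, or deletes by comparing the previous
# full extract with the current one, keyed on the first field.

def load_keyed(lines):
    """Map key -> full row for one extract file (key is the first | field)."""
    return {line.split("|", 1)[0]: line for line in lines}

prev = load_keyed(["1|alice|100", "2|bob|200", "3|carol|300"])
curr = load_keyed(["2|bob|250", "3|carol|300", "4|dave|400"])

inserts = [curr[k] for k in curr.keys() - prev.keys()]   # new keys
deletes = [prev[k] for k in prev.keys() - curr.keys()]   # vanished keys
updates = [curr[k] for k in curr.keys() & prev.keys()    # shared keys,
           if curr[k] != prev[k]]                        # changed content

print(sorted(inserts))  # ['4|dave|400']
print(sorted(deletes))  # ['1|alice|100']
print(sorted(updates))  # ['2|bob|250']
```

For tables with millions of rows you would stream the sorted files rather than hold both extracts in memory, which is exactly what comm gives you for free.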

Peoplesoft/Ascential Incremental Load

Posted: Tue Jan 11, 2005 1:42 pm
by venkatrranga
I am working on a similar exercise for one of my clients. We have the same issue, and I spent a considerable amount of time looking into how to change the extracts to incremental.

At a very high level, this is what I found: the logic is different for each dataset you are trying to change to incremental. A couple of things you may want to look at closely:

1. PS loads the FC and DM tables in destructive mode. If you want to change an FC to incremental mode, you need to look closely at its depending dimensions.

2. How to replace zero-SID values in dimensions when the corresponding data becomes available.

3. How do you identify the incremental rows themselves? (Don't depend on the LASTUPDDTTM timestamp.)

4. Is change mode allowed in the HCM application?



Regards
Venkat Ranga

Posted: Tue Jan 11, 2005 1:57 pm
by spalsikar
Thank you for your response Venkat Ranga,

I would really appreciate it if you could elaborate a bit.

1. PS loads the FC and DM tables in destructive mode. If you want to change an FC to incremental mode, you need to look closely at its depending dimensions.
-- What does FC stand for? Did you mean Datamart or Datamap when you mentioned DM?
2. How to replace zero-SID values in dimensions when the corresponding data becomes available.

4. Is change mode allowed in the HCM application?

-- What does HCM stand for?

You mentioned not to rely on the LASTUPDDTTM timestamp; is there any specific reason?

Sorry, this tool is still new for us.

Thanks in advance.
Shashi

Re: Loading data with no incremental jobs -Urgent Info Requi

Posted: Sat Jan 15, 2005 5:26 am
by peternolan9
spalsikar wrote:Could anybody please suggest or pass on some info about how they dealt with a situation like the one below.

We are in the process of loading data from PeopleSoft Financials into the PeopleSoft Data Warehouse. Some of our tables are huge: they already hold millions of rows, and millions more are added daily. For these tables we would like to run an incremental load every night (after we go live) instead of a destructive load, which we cannot finish in our batch window. Ascential has ONLY delivered destructive jobs for these tables, not incremental ones. To create incremental jobs we would have to add the LASTUPDDTTM field to the PeopleSoft (source) database, and going that route means a huge customization that we do not want to make.

Is there any other way that you can deal with this type of situation?
Thanks in advance.

~Shashi
Hi Shashi,
dropping and recreating a DW as a way of loading it went out of fashion in 1994, when the very first propagation products came onto the market....

You will have seen by this time that you must re-write the code..... I just did an Oracle Apps install with DS, and we wrote the code from scratch...

Posted: Sat Jan 22, 2005 6:36 pm
by venkatrranga
spalsikar wrote:Thank you for your response Venkat Ranga,

I would really appreciate it if you could elaborate a bit.

1. PS loads the FC and DM tables in destructive mode. If you want to change an FC to incremental mode, you need to look closely at its depending dimensions.
-- What does FC stand for? Did you mean Datamart or Datamap when you mentioned DM?
2. How to replace zero-SID values in dimensions when the corresponding data becomes available.

4. Is change mode allowed in the HCM application?

-- What does HCM stand for?

You mentioned not to rely on the LASTUPDDTTM timestamp; is there any specific reason?

Sorry, this tool is still new for us.

Thanks in advance.
Shashi

FC is the suffix given to all fact tables in PeopleSoft HCM.
DM is the suffix given to all dimension tables in PeopleSoft HCM.
CR is the suffix given to all dimensions that contain a current active row in PeopleSoft HCM.

HCM is PeopleSoft's Human Capital Management application.

Regards
Venkat Ranga

Posted: Sun Jan 23, 2005 5:29 am
by peternolan9
Venkat,
"You mentioned not to rely on the LASTUPDTTM time stamp, is there any specific reason?"

I think we discussed this around May last year on this forum..... The problem is that in Oracle (bless its little cotton socks) the last update timestamp is the time the row was updated by Oracle in memory, and NOT the time it was committed to the database.

I was totally, totally amazed at that little nugget of information....but the Oracle guys tell me it's true......

I can't think of any other database where this is true......every other database makes the last update timestamp the time the row is committed to the database and made available to other applications....


The implications of this are obvious: you are scanning for changed rows, a row changes after your 'last extract timestamp' but is not yet committed, so you will not see the change. And if you use just >= last extract timestamp, you will miss this row the next time around as well. Like I said, just unbelievably amazing if you ask me....

To get around this you must, as a minimum:

1. Always use BETWEEN lastextracttimestamp AND thisextracttimestamp.
2. Set your this-extract timestamp before you start the extract.

What we do is wait 5 minutes between setting the this-extract timestamp and starting the scan; that gives the Oracle Apps database 5 minutes to commit all changes. We could achieve the same result by asking the operational system to quiesce and commit all changes, or by having them tell us when all changes are committed and kicking off our extract at that point.
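The two rules above can be sketched in a few lines of Python. Everything here is illustrative (the row layout, timestamps, and column name are made up); the point is the half-open window [last_extract_ts, this_extract_ts), with this_extract_ts fixed before scanning starts, so rows stamped after the window simply wait for the next run instead of being lost. The 5-minute commit grace period between setting the timestamp and scanning is elided.

```python
# Windowed incremental extract: pick up rows whose last-update timestamp
# falls in [last_extract_ts, this_extract_ts). The upper bound is exclusive
# so a row stamped exactly at the boundary is never read twice.
from datetime import datetime

def extract_window(rows, last_extract_ts, this_extract_ts):
    """Return the rows belonging to this incremental run's window."""
    return [r for r in rows
            if last_extract_ts <= r["lastupddttm"] < this_extract_ts]

rows = [
    {"id": 1, "lastupddttm": datetime(2005, 1, 6, 23, 50)},  # before window
    {"id": 2, "lastupddttm": datetime(2005, 1, 7, 1, 30)},   # in window
    {"id": 3, "lastupddttm": datetime(2005, 1, 7, 2, 10)},   # after: next run
]
last_ts = datetime(2005, 1, 7, 0, 0)
this_ts = datetime(2005, 1, 7, 2, 0)   # set BEFORE scanning starts
changed = extract_window(rows, last_ts, this_ts)
print([r["id"] for r in changed])  # [2]
```

After a successful run, this_ts becomes the next run's last_ts, so row 3 is picked up next time around rather than missed.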

If the database is Oracle, you can't just scan on last update timestamp....

Posted: Sun Jan 23, 2005 7:39 am
by ogmios
The implications of this are obvious: you are scanning for changed rows, a row changes after your 'last extract timestamp' but is not yet committed, so you will not see the change. And if you use just >= last extract timestamp, you will miss this row the next time around as well. Like I said, just unbelievably amazing if you ask me....
What you think is unbelievable :wink: is just a side effect. Oracle guarantees that the data you see in a query is fixed at the moment you open the query, and only that data will be returned.

So if another session has not committed its data before you open your query, even if the other session commits its data during your query (with a timestamp that should be processed) your query will never see this data.

The benefit of this is that you can never read dirty data, so Oracle implements more than the SQL standard requires. It also ensures that a reader can never be deadlocked (readers in Oracle do not take locks).

But this is becoming a bit off topic :lol:

Ogmios

Posted: Mon Jan 24, 2005 8:13 am
by peternolan9
Hi Ogmios,
but what you describe is a 'repeatable read', and with other databases this is an option: you get to choose whether you want to see updates that are committed after the start of the query. Are you saying that with Oracle this is not an option, but how it works all the time?

Also, I thought row-level locks were an option in Oracle..... there is a legend that SAP asked Oracle for row-level locks and they provided them...

In any case, our solution of setting the lastextracttimestamp and thisextracttimestamp is still safe. It's a bit excessive, I thought, but better safe than sorry.....

Posted: Mon Jan 24, 2005 10:41 pm
by tej2004
Shashi,
I observe that the source record JRNL_HEADER does have a timestamp field, DTTM_STAMP_SEC, coming from the source, which you can compare with LASTUPDDTTM in the target for an incremental load. Otherwise, every row coming into the PSFT warehouse (staging area) from the source database is always timestamped with LASTUPDDTTM, which, as you correctly mentioned, is DSJobStartTimestamp in the case of DataStage jobs.
But if you do not have any such timestamp coming from the source, then, as you correctly mentioned, the Update action in the target DRS is the option.

Posted: Thu Mar 09, 2006 4:19 pm
by vardhan354
Hey

You can use the CRC32 function in your job logic to get rid of the timestamp dependency for your incremental loading.
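A minimal sketch of the CRC32 idea, assuming you keep the CRC of each key's last-loaded row: only rows whose CRC changed, or whose key is new, flow on to the target. The pipe-joined column layout is illustrative; DataStage's own CRC32 routine is being stood in for by Python's zlib.crc32.

```python
# Timestamp-free change detection: compare each incoming row's CRC32
# against the CRC stored for that key on the previous load.
import zlib

def row_crc(row):
    """CRC32 over the concatenated non-key columns (first field is the key)."""
    return zlib.crc32("|".join(row[1:]).encode())

# CRCs remembered from the previous load, keyed by business key.
previous_crcs = {"J001": row_crc(["J001", "open", "100"])}

incoming = [
    ["J001", "open", "150"],   # content changed -> picked up
    ["J002", "open", "200"],   # new key -> picked up
]
delta = [r for r in incoming
         if previous_crcs.get(r[0]) != row_crc(r)]
print([r[0] for r in delta])  # ['J001', 'J002']
```

One caveat worth noting: CRC32 is a 32-bit checksum, so distinct rows can occasionally collide; some shops use a stronger hash for very large tables.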

Bye,

Vish.

Posted: Thu Mar 09, 2006 7:07 pm
by vmcburney
Coming in late to the thread, let me address something in the original question. With the Oracle buyout of PeopleSoft you are not going to get much in the way of PeopleSoft product upgrades, which kind of gives you a license to customise your DataStage jobs.

Posted: Thu Mar 09, 2006 7:10 pm
by chulett
Coming in late... you guys do realize you are over a year late, don'tcha?