Write a constraint to load rows based on last row #

DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Write a constraint to load rows based on last row #

Post by DS_MJ »

Hello:

I need to load data into DB2:

- During the very first load, store the last record number.

- During the next load, check for the stored last record number and then start loading data after that record number.

There is a primary key column called REC_NUM which sequentially stores the record number.

I need to write a constraint that checks the last REC_NUM and then starts loading only the rows where REC_NUM is greater than that last REC_NUM.

How do I store the last REC_NUM, and how do I then use it in my constraint as described above?

Thanks.
Thanks in advance,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Decide if you want to do a select max on the table or store the last number after each run. I recommend always doing a select max prior to running your job, as that will reliably work and you don't have to worry about synchronizing the stored max with the actual value in case of data corrections or load issues.

Some ideas:
1. Use a stage variable in the constraint to filter out the unwanted rows. In the stage variable's initial value, use a function call to go out, query the required table, and return the appropriate value back to the stage variable. The stage variable derivation is just itself, i.e. FRED = FRED (see the first sketch after this list).

2. Use a job to select the max value (either directly from the table or from your storage point) and spool it to a text file. In your job control (Batch, Sequence, whatever), read the text file and pass the value in as a parameter to your transformation job. Put the job parameter into the filtering constraint (see the second sketch after this list).
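
A minimal DataStage BASIC sketch of the kind of routine idea 1 could call from the stage variable's initial value. The routine body, the database name MYDB, and shelling out to the db2 command-line client are all assumptions for illustration, not the only way to fetch the max:

Code:

* Hypothetical transform routine body (argument: TableName); whatever is
* left in Ans is what the stage variable gets initialized to.
* Assumes the db2 command-line client is on the server's PATH and the
* target database is called MYDB -- adjust for your environment.
Cmd = "db2 connect to MYDB > /dev/null && db2 -x 'SELECT MAX(REC_NUM) FROM " : TableName : "'"
Call DSExecute("UNIX", Cmd, Output, SysRet)
MaxVal = Trim(Convert(@FM, "", Output))   ;* flatten command output to one value
If SysRet = 0 And Num(MaxVal) Then
   Ans = MaxVal                           ;* last loaded REC_NUM
End Else
   Ans = 0                                ;* first load, or the query failed
End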
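
And a sketch of the job-control flavour in idea 2, again in DataStage BASIC. The file path, job name, and parameter name here are hypothetical:

Code:

* Hypothetical job control: read the spooled max and pass it to the
* transformation job as a parameter named Max_Value.
MaxValue = 0                              ;* default: treat as first load
OpenSeq "/data/work/max_rec_num.txt" To FileVar Then
   ReadSeq Line From FileVar Then MaxValue = Trim(Line)
   CloseSeq FileVar
End Else
   Null                                   ;* no spool file yet: first load
End

hJob = DSAttachJob("LoadNewRows", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "Max_Value", MaxValue)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
ErrCode = DSDetachJob(hJob)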
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

:idea: If you initialize a stage variable it doesn't actually need a derivation. If there is no derivation, the initial value remains accessible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

ray.wurlod wrote::idea: If you initialize a stage variable it doesn't actually need a derivation. If there is no derivation, the initial value remains accessible.
Never noticed, never tried. Makes things easier, though.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Thank you kcbland and ray.wurlod for the suggestions.

I am still trying out your suggestions, and if I need further guidance I will definitely touch base.

Once again thank you.

MJ
Thanks in advance,
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Rather than trying to work out how to do this, I would be asking why.

Why would you want to keep processing the same rows over and over? You could end up reading a million rows just to write out the last 100.

Why can't new records be written to another file and, if there is a need to keep everything in one file, append them to a history file after processing?
Jim Paradies
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hi jzparad:

Well, what you say is also one of the things that I am trying out.

But as kcbland pointed out (which is also one of my concerns), doing a select max prior to running the job will work reliably, and one doesn't have to worry about synchronizing a stored max with the actual value in case of data corrections or load issues.
Thanks in advance,
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

MJ,

I'm not too sure how your system hangs together. By the sound of it, you are receiving a full extract from a source system somewhere. What I would be investigating is whether the source system can do delta extracts instead. From the little that you've told us, this should not be hard, as there is a record number that could be used to easily identify which records have already been sent.

If this isn't possible, then the next best solution would be to do as Ken suggested and do a "select max". This would be quick, as the column you're looking at is the primary key. Once you've got that, you have a number of options, including passing it to another job as a parameter (as Ken suggested) or, if you want to keep everything in one job, writing it to a hash file which can then be used as a lookup to determine whether the incoming record should be loaded.
Jim Paradies
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hello:

I have been able to break the job into two jobs.

In the first job I do the aggregation of Col_X, then save the max into a hash file.

In the second job I look up the hash file for the last max value. In the constraint I have put: if the source column is greater than the hash file column, then go ahead and insert into DB2 (without clearing).

The first job works fine and stores the max value into the hash file. The second job finishes fine (without errors), but no rows get inserted into DB2.

What could be wrong? Why are the remaining rows not getting inserted into DB2? The remaining rows have a Col_X value greater than the one in the hash file, so it should be inserting these rows into DB2.

Code:


                           Hash File
                              |
                              |
                             \/
SAP Extract R/3 ---------> Transform -----------> DB2

Thanks in advance,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's the key field in your hashed file? Hopefully not just the max value. You'll need to 'hard code' a key value, like an 'X', and carry the max key in a data field in the hashed file. Then always look up that same row by hard-coding the key expression, and compare your row number to the looked-up max number.
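
For illustration only, a hedged DataStage BASIC sketch of that single-row layout; in the real jobs the Hashed File stage handles this through its metadata, and the file name MaxRecNumHash and the variable names here are assumptions:

Code:

* Hypothetical sketch of the single-row hashed file described above.
* MaxRecNum is assumed to already hold the result of the select max;
* the file must be visible to the project (i.e. have a VOC entry).
Open "MaxRecNumHash" To HashFile Then
   Rec = ""
   Rec<1> = MaxRecNum             ;* data field 1 carries the max REC_NUM
   Write Rec On HashFile, "X"     ;* constant key 'X': always exactly one row
End Else
   Call DSLogFatal("Cannot open MaxRecNumHash", "StoreMaxSketch")
End

* The lookup side reads that same constant key back on every run:
Read Rec From HashFile, "X" Then MaxRecNum = Rec<1> Else MaxRecNum = 0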
-craig

"You can never have too many knives" -- Logan Nine Fingers
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

"By the way, it's hashed file, not hash file.
A hash file is a tool for reducing the size of a block of hash." - Ray Wurlod
I stand corrected, it's 'hashed file'. :oops:
Thanks in advance,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, but did you fix your problem?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

chulett wrote:Yes, but did you fix your problem?
Hi chulett:

I apologize for replying so late... but you must have figured out that I am still facing some challenges. :( Plus I had to finish other stuff first, which I am happy to say was not a problem.

My target is DB2 8.1
DataStage : 7.5.2A
DS Server on : Linux
I work from DS Client

Anyway, this is what I am doing. I have two jobs.

In the First job I do the following:

Code:


                           DB2 UDB API
                                |
                                |
                                \/
Abap Extract R/3 -------> Transform --------> Hashed File

From the ABAP Extract stage I extract COL_A (the PK column in SAP), then in the DB2 UDB API stage I use a custom query to find MAX(COL_A) and put that row in the hashed file. The hashed file parameter is set to #Max_Value#.
Second job:

Code:


                              Hashed File
                                 |
                                 |
                                 \/
Abap Extract R/3 -------> Transform --------> DB2_UDB_API


Here, in the ABAP Extract R/3 stage's where clause, I put COL_A > #Max_Value#

The ABAP Extract R/3 stage shows the following generated SQL:

SELECT
ZZZ~COL_A
ZZZ~COL_B
ZZZ~COL_C
ZZZ~COL_D
......
ZZZ~COL_n
FROM
ZZZ AS ZZZ
WHERE
ZZZ~COL_A > DS_JOB_PARAM@Max_Value

There are no syntax errors.

In the Transform stage I put the constraint ABAP_extract_COL_A > HASHED_COL_A.

Output all the columns.

The job runs but never finishes, and it doesn't abort with errors either. I have to force the job to stop from Director.

Can you explain:
1. Why don't the parameters in the ABAP Extract work? In another job, when I hard-code the values it works, but when I use parameters it doesn't.

2. Why does the job run yet not pass any rows, abort, or finish?
Thanks in advance,
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hello:

I am glad to say that my job is working. :)

Thanks to all.
Thanks in advance,