Write a constraint to load rows based on last row #

DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Write a constraint to load rows based on last row #

Post by DS_MJ »

Hello:

I need to load data into DB2:

- During the very first load, store the last record number.

- During the next load, check for the stored last record number and then start loading data after that record number.

There is a primary key column called REC_NUM which sequentially stores the record number.

I need to write a constraint that checks the last REC_NUM and then starts loading only the rows where REC_NUM is greater than that last REC_NUM.

How do I store the last REC_NUM, and how do I then use it in my constraint as described above?

Thanks.
Thanks in advance,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Decide if you want to do a select max on the table or store the last number after each run. I recommend always doing a select max prior to running your job, as that will reliably work and you don't have to worry about synchronizing the stored max with the actual value in case of data corrections or load issues.

Some ideas:
1. Use a stage variable in the constraint to filter out the unwanted rows. In the stage variable's initial value, use a function call to go out, query the required table, and return the appropriate value back to the stage variable. The stage variable derivation is just itself, i.e. FRED = FRED (see the first sketch after this list).

2. Use a job to select the max value (either directly from the table or from your storage point) and spool it to a text file. In your job control (Batch, Sequence, whatever), read the text file and pass the value in as a parameter to your transformation job. Put the job parameter into the filtering constraint (see the second sketch after this list).
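
A minimal DataStage BASIC sketch of the kind of routine idea 1 could call from the stage variable's initial value. The routine body, the database name MYDB, and shelling out to the db2 command-line client are all assumptions for illustration, not the only way to fetch the max:

Code:

* Hypothetical transform routine body (argument: TableName); whatever is
* left in Ans is what the stage variable gets initialized to.
* Assumes the db2 command-line client is on the server's PATH and the
* target database is called MYDB -- adjust for your environment.
Cmd = "db2 connect to MYDB > /dev/null && db2 -x 'SELECT MAX(REC_NUM) FROM " : TableName : "'"
Call DSExecute("UNIX", Cmd, Output, SysRet)
MaxVal = Trim(Convert(@FM, "", Output))   ;* flatten command output to one value
If SysRet = 0 And Num(MaxVal) Then
   Ans = MaxVal                           ;* last loaded REC_NUM
End Else
   Ans = 0                                ;* first load, or the query failed
End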
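
And a sketch of the job-control flavour in idea 2, again in DataStage BASIC. The file path, job name, and parameter name here are hypothetical:

Code:

* Hypothetical job control: read the spooled max and pass it to the
* transformation job as a parameter named Max_Value.
MaxValue = 0                              ;* default: treat as first load
OpenSeq "/data/work/max_rec_num.txt" To FileVar Then
   ReadSeq Line From FileVar Then MaxValue = Trim(Line)
   CloseSeq FileVar
End Else
   Null                                   ;* no spool file yet: first load
End

hJob = DSAttachJob("LoadNewRows", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "Max_Value", MaxValue)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
ErrCode = DSDetachJob(hJob)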
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

:idea: If you initialize a stage variable it doesn't actually need a derivation. If there is no derivation, the initial value remains accessible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

ray.wurlod wrote::idea: If you initialize a stage variable it doesn't actually need a derivation. If there is no derivation, the initial value remains accessible.
Never noticed, never tried. Makes things easier, though.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Thank you kcbland and ray.wurlod for the suggestions.

I am still trying out your suggestions, and if I need further guidance I will definitely touch base.

Once again thank you.

MJ
Thanks in advance,
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Rather than trying to work out how to do this, I would be asking why.

Why would you want to keep processing the same rows over and over? You could end up reading a million rows just to write out the last 100.

Why can't new records be written to another file and, if there is a need to keep everything in one file, append them to a history file after processing?
Jim Paradies
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hi jzparad:

Well, what you say is also one of the things that I am trying out.

But as kcbland pointed out (which is also one of my concerns), doing a select max prior to running the job will work reliably, and one doesn't have to worry about synchronizing a stored max with the actual value in case of data corrections or load issues.
Thanks in advance,
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

MJ,

I'm not too sure how your system hangs together. By the sound of it, you are receiving a full extract from a source system somewhere. What I would be investigating is whether the source system can do delta extracts instead. From the little that you've told us, this should not be hard, as there is a record number that could be used to easily identify which records have already been sent.

If this isn't possible, then the next best solution would be to do as Ken suggested and do a "select max". This would be quick, as the column you're looking at is the primary key. Once you've got that, you have a number of options, including passing it to another job as a parameter (as Ken suggested) or, if you want to keep everything in one job, writing it to a hash file which can then be used as a lookup to determine whether the incoming record should be loaded.
Jim Paradies
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hello:

I have been able to break the job into two jobs.

In the first job I do the aggregation of Col_X, then save the max into a hash file.

In the second job I look up the hash file for the last max value. In the constraint I have put: if the source column is greater than the hash file column, then go ahead and insert into DB2 (without clearing).

The first job works fine and stores the max value into the hash file. The second job finishes fine (without errors), but no rows get inserted into DB2.

What could be wrong? Why are the remaining rows not getting inserted into DB2? The remaining rows have a Col_X value greater than the one in the hash file, so it should be inserting these rows into DB2.

Code:


                           Hash File
                              |
                              |
                             \/
SAP Extract R/3 ---------> Transform -----------> DB2

Thanks in advance,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's the key field in your hashed file? Hopefully not just the max value. You'll need to 'hard code' a key value, like an 'X', and carry the max key in a data field in the hashed file. Then always look up that same row by hard-coding the key expression, and compare your row number to the looked-up max number.
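
For illustration only, a hedged DataStage BASIC sketch of that single-row layout; in the real jobs the Hashed File stage handles this through its metadata, and the file name MaxRecNumHash and the variable names here are assumptions:

Code:

* Hypothetical sketch of the single-row hashed file described above.
* MaxRecNum is assumed to already hold the result of the select max;
* the file must be visible to the project (i.e. have a VOC entry).
Open "MaxRecNumHash" To HashFile Then
   Rec = ""
   Rec<1> = MaxRecNum             ;* data field 1 carries the max REC_NUM
   Write Rec On HashFile, "X"     ;* constant key 'X': always exactly one row
End Else
   Call DSLogFatal("Cannot open MaxRecNumHash", "StoreMaxSketch")
End

* The lookup side reads that same constant key back on every run:
Read Rec From HashFile, "X" Then MaxRecNum = Rec<1> Else MaxRecNum = 0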
-craig

"You can never have too many knives" -- Logan Nine Fingers
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

"By the way, it's hashed file, not hash file.
A hash file is a tool for reducing the size of a block of hash." - Ray Wurlod
I stand corrected, it's 'hashed file'. :oops:
Thanks in advance,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, but did you fix your problem?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

chulett wrote:Yes, but did you fix your problem?
Hi chulett:

I apologize for replying so late... but you must have figured out that I am still facing some challenges. :( Plus I had to finish other stuff first, which I am happy to say was not a problem.

My target is DB2 8.1
DataStage : 7.5.2A
DS Server on : Linux
I work from DS Client

Anyway, this is what I am doing. I have two jobs.

In the First job I do the following:

Code:


                           DB2 UDB API
                                |
                                |
                                \/
Abap Extract R/3 -------> Transform --------> Hashed File

From the ABAP Extract stage I extract COL_A (the PK column in SAP), then in the DB2 UDB API stage I use a custom query to find MAX(COL_A) and put that row in the hashed file. The hashed file parameter is set to #Max_Value#.
Second job:

Code:


                              Hashed File
                                 |
                                 |
                                 \/
Abap Extract R/3 -------> Transform --------> DB2_UDB_API


Here, in the ABAP Extract R/3 stage's where clause, I put COL_A > #Max_Value#

The ABAP Extract R/3 stage shows the following generated SQL:

SELECT
ZZZ~COL_A
ZZZ~COL_B
ZZZ~COL_C
ZZZ~COL_D
......
ZZZ~COL_n
FROM
ZZZ AS ZZZ
WHERE
ZZZ~COL_A > DS_JOB_PARAM@Max_Value

There are no syntax errors.

In the Transform stage I put the constraint ABAP_extract_COL_A > HASHED_COL_A.

Output all the columns.

The job runs but never finishes, and it doesn't abort with errors either. I have to force the job to stop from Director.

Can you explain:
1. Why don't the parameters in the ABAP Extract work? In another job, when I hard-code the values it works, but when I use parameters it doesn't.

2. Why does the job run yet not pass any rows, abort, or finish?
Thanks in advance,
DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Post by DS_MJ »

Hello:

I am glad to say that my job is working. :)

Thanks to all.
Thanks in advance,