Update Strategy

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Update Strategy

Post by pkomalla »

Hi Guys,

I have a requirement to update one of the keys in the fact table.its a part of combo key in the table. Now this key is a dimension key and this dimension table used to have a slowly changing dimension till last year .
now it was taken out due to the business requirements. the deal is to fix this dimension key in the fact table .

ex:

dealer delaer dim

12345 678
12345 7334
12345 94794

so these are the dimension keys for dealer 12345 in the dimension table and the fact table but now each record in the fact table should be having 94794 as the dim key for that dealer all along in the fact table.

So to do this i have to update the existing dealer dimension keys with the max delaer dim from the dealer dimension table.

Now scenario is we have about 220 million records to be updated.
So when i tried to update them for some packet of records of 600000
it took me about five hours .
what i did was took a day of records and updated them with the max dim key for that coloumn matching on whole of the primary key ,but the primary key also consists of the coloumn whic i am updating.

i have a question here can i update them by taking a dealer and his dim and just send this record to the fact table to update all some one milion records for this dealer to have the correct key

just want to know any suggestions to do it more efficient and fast way.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How many distinct dealer values are there? Seems like you could build a hashed file with the 'dealer' as the key and the max dim as the data element with a 'select max' style query.

Then send each distinct dealer value down a job that uses that hashed file and pick up the max for that key. Send it to your database and update dealer dim to the looked up value for each distinct dealer. That will update all dim values for each dealer value. You could even add an additional piece of DML to say "and dealer_dim <> max_dim" so it only changed the ones that really need changing.

Or use some sort of 'bulk' update functionality which I assume exists in DB2, something akin to MERGE in Oracle.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Post by pkomalla »

Yes Exactly i was asking the same thing if i can do the same thing because i have only 5000 distinct dealers i can extract all the distinct delaers from my fact table and also look uo with a file from dealer table with all the dealers and the max dealer dim for that dealer.

Only thing i wanted to check is that one record with a dealer fom source does a lookup gets the max dealer dim and then it should update some million records for that delaer with this dim in the fact table.i wanted to know if someone else has done the similar thing before to have an idea of time the job will take to update this mass amount of rows is the job runs for all the dealers it shoul update more than 300 million records
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is probably most easily done using SQL directly in the database. Sure DataStage could be used, but so could a fairly straightforward UPDATE (using a correlated subquery containing MAX).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course, but this isn't CorrelatedSubqueryXchange.com. :wink:

No-one can tell you how long it will take. Run it and then let us know. Either way, job or SQL.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Post by pkomalla »

Sure i will and update you guys on monday

Thanks
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Post by pkomalla »

Hi

I tried doing this with the datastage job and for one delaer it took about two minutes to update every record for that dealer .But when i tried it with around 1500 delaers job is in pending remote request stage in the database and its taking time , as the users are querying on the transaction table i had to abort the job and kill the process in the database.

So what i am thinking is to create a loop and pass all these delaers to the start loop one by one so that the job runs for once for each dealer may be it will help my case,i went through the posts for this looping and tried to build a job but having a hard time to pass the value of the dealer from the command activity stage to the query in the database stage in the job .

i tried doing this sending #Execute_Command_41.$CommandOutput# as a parameter for the job but still it doesnt recognize , i am definitely going wrong some where actually can someone please explain how to use the variable created in the user variable stage for this purpose.

job design

uservariables_activity -----> execute command activity --------> start loop --------> job activity --------> end loop to start loop.

also in the start loop i used unix new line character \n as a delimiter but its not able to recognize that there are foour dealer codes in the file , it only says one iteration but when i use comma and use this in the delimited values Field(Execute_Command_41.$CommandOutput, @FM, 1) then its doing three iterations where as i have four dealercodes , so the job should run four times. but please let me know how and what to pass to the query in the database stage in the job to let it know for which dealer code its running.


Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use the expression builder to construct your expression (you don't need the "#" characters).
Check, using another Execute Command activity to echo the previously calculated return value, that all four names are being picked up. Check also that there is a terminator following the fourth.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Post by pkomalla »

Ray,


Which of these should i use in the parameter for my query

SELECT
DISTINCT main.I_CURR
FROM DLROB.SMRATRN main,
(select i_curr, max(i_dlr_dim) i_dlr_dim
from dlrob.smrdlrd
group by i_curr
having count (*) > 1
) temp
where main.i_curr = temp.i_curr
and main.i_curr=#delimiter#
in the source stage in the job



Left(Execute_Command_41.$CommandOutput, Len(Execute_Command_41.$CommandOutput) - 1)

StartLoop_Activity_17.$Counter
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The first contains the entire list. The $Counter activity variable contains the single value for the current iteration through the loop.

Please perform the checks I asked you to perform.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pkomalla
Premium Member
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Post by pkomalla »

Ray,


I got it woking , i am able to pick up the value in the query now the only thing is the terminator. job ran 4 times loaded 4 records in a sample file everything is fine but the sequencer aborted after the fourth run saying
Error calling DSSetParam(delimiter), code=-4
[ParamValue/Limitvalue is not appropriate]

Now i used #Execute_Command_41.$CommandOutput# in the start loop delimited values coloumn and for the variable that i created i used StartLoop_Activity_17.$Counter .

Result looks good except the abort after the fourth run

the sample fiel i created has 06504000,08655000,08767000,23017000 these values exactly.


Any suggestions please

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's a fifth, empty line, caused by the final line terminator.
Take the leftmost (N-1) characters of the command output, where the command output contains N characters.

Code: Select all

Left(EC.$CommandOutput, Len(EC.$CommandOutput)-1)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply