Update Strategy
Moderators: chulett, rschirm, roy
Hi Guys,
I have a requirement to update one of the keys in the fact table. It is part of a combination key in the table. This key is a dimension key, and the dimension table it comes from was handled as a slowly changing dimension until last year. That SCD handling was removed due to business requirements, so now the task is to fix this dimension key in the fact table.
Example:

Code: Select all
dealer    dealer dim
12345     678
12345     7334
12345     94794

These are the dimension keys for dealer 12345 in both the dimension table and the fact table, but now every record in the fact table for that dealer should carry 94794 as the dimension key. To do this, I have to update the existing dealer dimension keys with the max dealer dim from the dealer dimension table.
The scenario is that we have about 220 million records to update. When I tried updating them in batches of 600,000 records, it took about five hours. What I did was take a day's worth of records and update them with the max dim key for that column, matching on the whole primary key - but the primary key also contains the column I am updating.
My question: can I instead take one dealer and his max dim, and send that single record to the fact table to update the million or so records for that dealer with the correct key?
Any suggestions on doing this in a more efficient and faster way would be appreciated.
Thanks
How many distinct dealer values are there? Seems like you could build a hashed file with the 'dealer' as the key and the max dim as the data element with a 'select max' style query.
Then send each distinct dealer value down a job that uses that hashed file and pick up the max for that key. Send it to your database and update dealer dim to the looked up value for each distinct dealer. That will update all dim values for each dealer value. You could even add an additional piece of DML to say "and dealer_dim <> max_dim" so it only changed the ones that really need changing.
Or use some sort of 'bulk' update functionality which I assume exists in DB2, something akin to MERGE in Oracle.
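The hashed-file approach described above can be sketched outside DataStage. Here is a minimal illustration using Python with sqlite3 as a stand-in for the warehouse (the table and column names are invented for the example; DB2 syntax and a real DataStage hashed file will differ):

```python
import sqlite3

# In-memory stand-in for the warehouse; table/column names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dealer_dim (dealer INTEGER, dealer_dim INTEGER);
CREATE TABLE fact (dealer INTEGER, dealer_dim INTEGER, amount REAL);
INSERT INTO dealer_dim VALUES (12345, 678), (12345, 7334), (12345, 94794);
INSERT INTO fact VALUES (12345, 678, 10.0), (12345, 7334, 20.0), (12345, 94794, 30.0);
""")

# Step 1: the "hashed file" -- one max dim per dealer,
# built with a 'select max' style query.
max_dim = dict(con.execute(
    "SELECT dealer, MAX(dealer_dim) FROM dealer_dim GROUP BY dealer"))

# Step 2: one UPDATE per distinct dealer, touching only rows that
# actually need changing (the "dealer_dim <> max_dim" guard).
for dealer, dim in max_dim.items():
    con.execute(
        "UPDATE fact SET dealer_dim = ? WHERE dealer = ? AND dealer_dim <> ?",
        (dim, dealer, dim))
con.commit()

rows = con.execute("SELECT DISTINCT dealer_dim FROM fact").fetchall()
print(rows)  # every fact row now carries the max dim: [(94794,)]
```

The inequality guard matters at this scale: with 5,000 dealers but 220 million fact rows, skipping rows that already hold the max dim avoids needless writes and logging.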
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Yes, exactly - I was asking about the same approach. Since I have only 5,000 distinct dealers, I can extract all the distinct dealers from my fact table and look them up against a file built from the dealer table that holds each dealer and its max dealer dim.
The only thing I wanted to check: one record with a dealer from the source does a lookup, gets the max dealer dim, and then has to update some million records for that dealer in the fact table. Has anyone done something similar before? I'd like an idea of how long the job will take to update this mass of rows - if the job runs for all the dealers, it should update more than 300 million records.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
This is probably most easily done using SQL directly in the database. Sure DataStage could be used, but so could a fairly straightforward UPDATE (using a correlated subquery containing MAX).
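The single-statement approach described here - an UPDATE with a correlated subquery containing MAX - can be sketched as follows. This is a minimal illustration with Python's sqlite3 standing in for DB2 (table names are invented; DB2's locking and optimizer behavior on 220 million rows is a separate concern):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dealer_dim (dealer INTEGER, dealer_dim INTEGER);
CREATE TABLE fact (dealer INTEGER, dealer_dim INTEGER);
INSERT INTO dealer_dim VALUES (12345, 678), (12345, 7334), (12345, 94794);
INSERT INTO fact VALUES (12345, 678), (12345, 7334);
""")

# One UPDATE with a correlated subquery containing MAX: every fact row
# is rewritten to its dealer's highest dimension key.
con.execute("""
UPDATE fact
SET dealer_dim = (SELECT MAX(d.dealer_dim)
                  FROM dealer_dim d
                  WHERE d.dealer = fact.dealer)
""")
con.commit()

result = con.execute("SELECT dealer_dim FROM fact").fetchall()
print(result)  # -> [(94794,), (94794,)]
```

Running one set-based statement in the database avoids shipping 220 million rows through an ETL tool at all; the trade-off is a long-running transaction on a table that users are actively querying.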
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi
I tried doing this with a DataStage job, and for one dealer it took about two minutes to update every record for that dealer. But when I tried it with around 1,500 dealers, the job sat in a pending remote request state in the database and was taking a long time; since users were querying the transaction table, I had to abort the job and kill the process in the database.
So what I am thinking is to create a loop and pass the dealers to the Start Loop activity one by one, so that the job runs once per dealer - maybe that will help my case. I went through the posts on looping and tried to build a job, but I am having a hard time passing the dealer value from the Execute Command activity to the query in the database stage of the job.
I tried sending #Execute_Command_41.$CommandOutput# as a parameter for the job, but it still doesn't recognize it. I am definitely going wrong somewhere - can someone please explain how to use the variable created in the User Variables activity for this purpose?
Job design:

Code: Select all
uservariables_activity -----> execute command activity --------> start loop --------> job activity --------> end loop (back to start loop)

Also, in the Start Loop activity I used the UNIX newline character \n as a delimiter, but it doesn't recognize that there are four dealer codes in the file - it reports only one iteration. When I use a comma instead, with Field(Execute_Command_41.$CommandOutput, @FM, 1) in the delimited values, it does three iterations even though I have four dealer codes, so the job should run four times. Please let me know how and what to pass to the query in the database stage so it knows which dealer code it is running for.
Thanks
Use the expression builder to construct your expression (you don't need the "#" characters).
Check, using another Execute Command activity to echo the previously calculated return value, that all four names are being picked up. Check also that there is a terminator following the fourth.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
Which of these should I use as the parameter for my query

Code: Select all
SELECT DISTINCT main.I_CURR
FROM DLROB.SMRATRN main,
     (SELECT i_curr, MAX(i_dlr_dim) i_dlr_dim
      FROM dlrob.smrdlrd
      GROUP BY i_curr
      HAVING COUNT(*) > 1
     ) temp
WHERE main.i_curr = temp.i_curr
  AND main.i_curr = #delimiter#

in the source stage in the job:

Code: Select all
Left(Execute_Command_41.$CommandOutput, Len(Execute_Command_41.$CommandOutput) - 1)
StartLoop_Activity_17.$Counter
The first contains the entire list. The $Counter activity variable contains the single value for the current iteration through the loop.
Please perform the checks I asked you to perform.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
I got it working - I am able to pick up the value in the query now. The only remaining issue is the terminator. The job ran 4 times and loaded 4 records into a sample file; everything is fine except that the sequencer aborted after the fourth run, saying:

Code: Select all
Error calling DSSetParam(delimiter), code=-4
[ParamValue/Limitvalue is not appropriate]

I used #Execute_Command_41.$CommandOutput# in the Start Loop delimited values column, and for the variable I created I used StartLoop_Activity_17.$Counter. The result looks good except for the abort after the fourth run.
The sample file I created contains exactly these values: 06504000,08655000,08767000,23017000.
Any suggestions, please?
Thanks
There's a fifth, empty line, caused by the final line terminator.
Take the leftmost (N-1) characters of the command output, where the command output contains N characters.
Code: Select all
Left(EC.$CommandOutput, Len(EC.$CommandOutput)-1)
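The effect of that trailing terminator is easy to reproduce. A minimal Python sketch, using the four sample dealer codes from earlier in the thread:

```python
# Command output with one dealer code per line; the shell appends a
# final newline after the last value.
output = "06504000\n08655000\n08767000\n23017000\n"

# Splitting as-is produces a fifth, empty element -- the source of the
# DSSetParam error on the last loop iteration.
print(output.split("\n"))
# -> ['06504000', '08655000', '08767000', '23017000', '']

# Equivalent of Left(output, Len(output) - 1): drop the final
# terminator before splitting.
trimmed = output[:-1]
print(trimmed.split("\n"))
# -> ['06504000', '08655000', '08767000', '23017000']
```

After the trim, the loop gets exactly four values and the sequencer never tries to run the job with an empty parameter.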
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.