output column value should not be modified

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Sandhya.Arumugam
Participant
Posts: 7
Joined: Tue Oct 18, 2005 1:16 am

Post by Sandhya.Arumugam »

Hi,
I have built a server job that inserts new records or updates existing ones. The hash file is also updated in the same job. I'm doing this using a lookup on the key column. I have several transformations like the one below for individual columns:

If input is A Then "All"
Else If input is B Then "Ball"
Else don't change the value in the output table.

I have implemented this by pulling all these output columns into the hash file and inserting the corresponding column value when the above condition is not satisfied. But is this possible without pulling those columns into the hash file? Twenty-five columns have to be left unmodified in this manner, and the hash file has to be updated within the job using the same transformations, which I feel is not an efficient way of handling it.

I have tried building a routine using Case statements, but the output column is set to NULL if the condition is not satisfied. If statements also don't seem to work.

Is there any way of leaving the output value untouched without using lookups?
Is there another approach to this?
Any suggestions would be of great help.

Thanks,
Sandy
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I couldn't exactly follow all of your issues, but here goes.

If you reference a row from a hash file and write back to that same row, you must write back all columns. This is because hash files store a row of data as a single contiguous text string. To not change a column value while changing others means you still have to write the entire row back. Therefore, derive the updating row completely and write the entire row back.

If you're worried about performance from reading/writing all columns, you may consider using the CRC32 function and just storing that value. During a reference comparison, you compute the CRC32 of your new row and compare it against the CRC32 value stored in the hash file. If they're the same, the data doesn't need to be updated. If they're different, then perform the update. If you actually need to use the current value in deciding the derivation, then this method is not for you.
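
A rough sketch of the idea using stage variables (the link and column names here are made up):

Code:

* Stage variable svNewCRC -- checksum over the columns of interest:
CRC32(in.Col1 : '|' : in.Col2 : '|' : in.Col3)

* Stage variable svChanged -- compare against the CRC stored in the
* hash file (lkpCRC is the reference link):
svNewCRC <> lkpCRC.StoredCRC

* Constraint on the update link: send the row only when it changed
svChanged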

Then again, how bad is your performance? Have you measured your process and absolutely identified this as the source of the bottleneck? If you're transforming and loading the database simultaneously in the same job, you're most likely waiting on database overhead more than anything else. Try removing the output link to the database (or disabling it with a constraint) and see how fast your job goes. You'll probably find that your design in general is the issue, not the hash file.
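
For that test, the constraint on the database link can simply be set so it never fires, for example:

Code:

* Constraint on the output link to the database -- @FALSE sends no
* rows, leaving only the transformation and hash file work:
@FALSE
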
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sandhya.Arumugam
Participant
Posts: 7
Joined: Tue Oct 18, 2005 1:16 am

Post by Sandhya.Arumugam »

Thanks for the reply. I will follow your suggestion, use the CRC32 function to update the hash file, and test the performance.

But my real concern is the condition below:

If status = 'A' Then target column = 'aaa'
Else If status = 'B' Then target column = 'bbb'

There is no closing Else condition, i.e. from the source table I need to validate only two values, A or B.
Now the question: if any other value comes, then the target column should not be overwritten (i.e. the previously existing value should be retained). I built a routine with a Case statement, but if the condition is not satisfied an EMPTY STRING is returned, which overwrites the existing value in the table. Is there a workaround for this condition without using a hash file lookup? Is this possible within the transformer derivation or using routines?
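
To illustrate, a sketch of such a routine (argument names are hypothetical): without a default Case, Ans is never assigned and comes back as an empty string. The only in-routine fix is to default to the current value, which has to be passed in as an argument, and that again means a lookup.

Code:

* Hypothetical server routine: StatusArg is the source status and
* CurrentVal is the existing target value (it must come from a lookup).
Begin Case
   Case StatusArg = 'A' ; Ans = 'aaa'
   Case StatusArg = 'B' ; Ans = 'bbb'
   Case @TRUE ; Ans = CurrentVal  ;* default: retain the old value
End Case
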
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

If I have understood it correctly, then the solution is as follows.
Assume the job design below and three columns, ColA, ColB and ColC, which will take the dynamic values:

Code:

             Hash(1)
              |(ref)
              |
Source----> Tfm---> Hash(1)

Each column on the output link to the hash file will have a derivation as follows:
ColA = If status = 'A' then 'aaa' else value from hash file
ColB = If status = 'B' then 'bbb' else value from hash file
ColC = If status = 'C' then 'ccc' else value from hash file

This will ensure that if the value is anything other than A, B or C, the same data is written to the hash file again. Alternatively, for performance, you can put a constraint on the outgoing link, such as (Status = 'A' Or Status = 'B' Or Status = 'C').
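
To make that concrete, the derivations might look like this in the transformer (link names are assumed; lkpHash is the reference link to the hash file):

Code:

If in.Status = 'A' Then 'aaa' Else lkpHash.ColA
If in.Status = 'B' Then 'bbb' Else lkpHash.ColB
If in.Status = 'C' Then 'ccc' Else lkpHash.ColC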

Hope this helps.
Regards,
S. Kirtikumar.
Sandhya.Arumugam
Participant
Posts: 7
Joined: Tue Oct 18, 2005 1:16 am

Post by Sandhya.Arumugam »

Hi Kirtikumar,

The solution you have given is exactly what I have already done. But there are 25 such columns, so I have to pull 25 columns from the target table into the hash file, and the hash file is also being updated in the same job. Also, the input columns have complex derivations to derive the output column values.

What I need is a condition in the Else part of the If-Then-Else construct that would leave the target output column with its previous value. Is there any other approach to this problem without using lookups? This is the design I'm following:

Code:

      Hash(ref from output table)
                |
Source---->   Tfm---> Output table
                |
      Hash(update same hash file)

Thanks,
Sandy
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sandhya.Arumugam wrote:Is there any other approach to this problem without using lookups.
No, not really - if we're talking a pure DS solution. Any output column will be updated in the target with whatever value you provide. In order for you to allow the value to remain unchanged in certain circumstances, you'll need to do exactly what you are doing - look it up and pass it back.

The only other solution off the top of my head would need to involve a Stored Procedure. It could be written to take all 25 values and understand that a null value means to not change the current value. You'd either need to conditionally issue up to 25 update statements (slow) or build a dynamic DML statement that includes only the columns with non-null values.
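
The dynamic statement could just as easily be assembled before it is sent to the database; a rough sketch of that idea in DataStage BASIC (table, column and key names are all made up, and an empty string is treated as null for simplicity):

Code:

* Hypothetical sketch: build an UPDATE that includes only the
* columns whose incoming value is non-empty (real code would also
* test IsNull() on the arguments).
ColNames = 'COL1' : @FM : 'COL2' : @FM : 'COL3'
ColVals  = Arg1   : @FM : Arg2   : @FM : Arg3
SetList = '' ; Sep = ''
For i = 1 To DCount(ColNames, @FM)
   If ColVals<i> # '' Then
      SetList := Sep : ColNames<i> : " = '" : ColVals<i> : "'"
      Sep = ', '
   End
Next i
If SetList # '' Then
   Ans = 'UPDATE target_tbl SET ' : SetList : " WHERE KEY_COL = '" : KeyArg : "'"
End Else
   Ans = ''   ;* nothing changed, no statement needed
End
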
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sandhya.Arumugam
Participant
Posts: 7
Joined: Tue Oct 18, 2005 1:16 am

Post by Sandhya.Arumugam »

Thanks everybody for the replies. I will try working around it: I'll try the procedure, else I'll have to stick to the lookup concept, implement the CRC32 idea and test the performance. :(

Thanks a Lot!!!!!
Any other suggestions would also be of great help.
chulett wrote: The only other solution off the top of my head would need to involve a Stored Procedure. It could be written to take all 25 values and understand that a null value means to not change the current value. You'd either need to conditionally issue up to 25 update statements (slow) or build a dynamic DML statement that includes only the columns with non-null values.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I think you totally missed my point.
kcbland wrote: Then again, how bad is your performance? Have you measured your process and absolutely identified this as the source of the bottleneck? If you're transforming and loading the database simultaneously in the same job, you're most likely waiting on database overhead more than anything else. Try removing the output link to the database (or disabling it with a constraint) and see how fast your job goes. You'll probably find that your design in general is the issue, not the hash file.
You're worried about the Titanic sinking so you're tossing deck chairs overboard. Yeah, it might slow down the sinking, but the real problem is that iceberg sized hole in the side of the ship.

First determine whether performance is an issue because of a fat hash file or because the database connection is impeding transformation performance. A dedicated transformation job with no database/network I/O, plus a bulk-loader job for the pure inserts and a dedicated stream job for the pure updates, will outperform an all-in-one job most of the time. Putting your fat hash file on a diet will improve performance, but you really need to confirm that the lowest-hanging fruit is the hash file, not the database/network I/O.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sandhya.Arumugam
Participant
Posts: 7
Joined: Tue Oct 18, 2005 1:16 am
Contact:

Post by Sandhya.Arumugam »

The lowest-hanging fruit is indeed the hash file, as I'm looking up and updating the hash file within the same server job. If the lookup is not found, the hash file is also updated with the new record.

As I now have to pull these 25 columns, I also have to update the hash file if any older records have changed. So it would be a performance issue, as around a million records have to be loaded as the history load and there will be around 30,000 new records on the daily load. If I can avoid pulling these 25 columns into the lookup, then performance would not be in question.

Presently I'm working on writing DML statements to handle this situation, in order to avoid using the fat lookup and to avoid updating older records in the hash file within the same job.

Thanks,
Sandy