Reading and writing to a table in the same job

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


Post by Xanadu »

Hello...
I have a job that writes to a table A..

The setup looks like this:
Table A --(ref lookup)--> TFM --(writes to A)--> Table A --(writes hash)--> Hash_table_A

The transformer does a lookup against Table A and writes to Table A. The data from Table A is then written to the hash file.

My question is: would it be better if the lookup were against the hash file instead of the table? I thought this might give rise to a race condition. For example, if a row is still in the process of being written to the hash file when the lookup occurs, the data is already in the table but not yet in the hash file, so the lookup might fail, right?
(I set the array size parameter to 1 when writing to the table, so that the lookup does not fail for consecutive records.)
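To make the worry concrete, here is a rough Python-style sketch of the timing issue (the dict merely stands in for the hash file, the list for Table A, and the keys are made up for illustration):

Code:

# Sketch of the race: the reference lookup targets the hash file, but rows
# reach the hash file only after they have reached Table A, so a key seen
# twice in the same run is not caught. The dict stands in for Hash_table_A,
# the list for Table A.
table_a = []                            # rows already written to Table A
hash_a = {}                             # rows that have reached the hash file
incoming = ["k1", "k2", "k1"]           # the third row repeats an earlier key

for key in incoming:
    if key not in hash_a:               # lookup against the (lagging) hash file
        table_a.append(key)             # row lands in Table A ...
        # ... but nothing has copied it into hash_a yet, so the second "k1"
        # is not found and slips through as a duplicate.

print(table_a)                          # ['k1', 'k2', 'k1']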

Any comments.. ?
Thanks
-Xan

Post by chucksmith »

I would do all of this work in a hash file. Preload it with the rows from the table that you need, or all rows if its size makes sense.

Add a column to the hash file to indicate if this is a new or existing row.

Now, run your job, inserting new rows in the hash file and flagging them as new.

After the previous step completes, select all rows from the hash file with the new flag and insert them into your table.

Doing it this way also gives you the option of doing a bulk load.
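As a rough illustration of that flow (a plain Python dict plays the hash file and an is_new field plays the extra flag column; every name here is made up, not an actual EPM job):

Code:

# Sketch of the suggested pattern: do the lookup and the insert in the hash
# file, flag new rows, then push only the flagged rows to the table afterwards.
hash_a = {"existing_key": {"is_new": False}}      # preloaded from the table
incoming = ["existing_key", "new_key", "new_key"]

for key in incoming:
    if key not in hash_a:                         # lookup and write in one place,
        hash_a[key] = {"is_new": True}            # so the repeated "new_key" is caught

rows_for_table = [k for k, v in hash_a.items() if v["is_new"]]
print(rows_for_table)                             # ['new_key'] -- only new rows get loaded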

Post by Xanadu »

I would do all of this work in a hash file. Preload it with the rows from the table that you need, or all rows if its size makes sense.
--> I cannot preload it: if you notice, the table that loads this hash file is being built in the same job that writes this file...

Doing it this way also gives you the option of doing a bulk load.
--> I am working on PSFT EPM, and to maintain consistency across the other jobs we are using only DRS stages.

Post by sumitgulati »

A lookup against the hash file will definitely be faster.

Well, here is what I would suggest:

Code:

TableA --> HashA
             |
            TFM --> HashA --> TableA

Load everything from Table A into Hash A in the same job. Then do the lookup against the hash file in a transformer and load the same hash file. Remember that you should have caching disabled. Then from this hash file you can load the data into the table.
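Roughly, in Python-like terms (the dict is only a stand-in for HashA, and the immediate write stands in for "write caching disabled"; the names are made up):

Code:

# Sketch of the proposed design: preload HashA from TableA, look up and write
# the same HashA row by row (no write cache, so each insert is visible to the
# next lookup), then load TableA from HashA at the end.
table_a = {"k1": "old"}                 # existing rows in Table A
hash_a = dict(table_a)                  # TableA --> HashA preload

for key, value in [("k2", "new"), ("k2", "new again")]:
    if key not in hash_a:               # reference lookup against HashA
        hash_a[key] = value             # written immediately, so the repeated
                                        # "k2" is found on its second pass

table_a.update(hash_a)                  # HashA --> TableA load at the end
print(table_a)                          # {'k1': 'old', 'k2': 'new'}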

Hope this helps.

Regards,
Sumit

Post by Xanadu »

Well, this brings up two open points:

1) Since you are working on EPM jobs too, you may notice that the structure I gave is from those jobs. If you look at my structure again, the input link to the table has an array size of 1, but the output link from that table to the hash file has an array size of 32767, which brings me back to the same race condition. I guess I was not clear, since neither of you covered this aspect...

One row is being written to the hash file (it is already in the table). Wouldn't the lookup fail on that row? If I keep the array size as 1 on the table's output link to the hash file, would that record be written immediately after it enters the table? Would that solve the problem?

2) Also, if you notice the caching aspect, the existing jobs have problems in two ways: when writing to hash files, write caching is enabled, which is erroneous, and to make matters worse, the hash file is preloaded for the reference lookup. If it is preloaded, what happens to all the records that are loaded during this particular run of the job?
For example: say there are 100 rows in the table and in the hash file initially, 10 new rows are being written in this incremental load, and the 5th and 7th records are duplicates of each other. While processing the 7th record, the lookup is still performed only against the pre-existing 100 records. Alright, that can be solved by not preloading the hash file, but even then the problem is not solved: the 5th record has been written to the table and is still being written to the hash file (it is not yet in the hash file), so again the lookup fails, right?
Would it help if I put the array size as 1 on both the input and output links of the target table?

Thanks sumit
-Xan

Post by sumitgulati »

Xan, the design you gave may work with the array size and transaction size set to 1, but it will be slow because you are doing the lookup against the table.
To do the lookup against the hash file, you may have to consider the design I proposed.

To answer your questions:

1) One row is being written to the hash file (it is already in the table). Wouldn't the lookup fail on that row? If I keep the array size as 1 on the table's output link to the hash file, would that record be written immediately after it enters the table? Would that solve the problem?

See, this is how your design will work: if the array size and transaction size on the table's input link are 1, every record will immediately be reflected in the reference table. The hash file load will happen only once the Table A load is completely over. So just making the array size 1 on the table's output link will not work.

2) If there are 100 rows in the table and in the hash file initially, 10 new rows are being written in this incremental load, and the 5th and 7th records are duplicates... while processing the 7th record, the lookup is still performed only against the pre-existing 100 records...

In your design, the 5th record will get reflected in the lookup if you set the array size and transaction size to 1 on the table's input link, and also have caching disabled.
In my design as well, the 5th record will get reflected in the lookup if caching is disabled.

DataStage Gurus, please correct me if I am wrong.

Regards,
Sumit

Post by Xanadu »

See, this is how your design will work: if the array size and transaction size on the table's input link are 1, every record will immediately be reflected in the reference table. The hash file load will happen only once the Table A load is completely over. So just making the array size 1 on the table's output link will not work.
--> Right now the transaction size is 0. You mean that if I put the transaction size as 1, each new row is written to the hash file immediately?

2) If there are 100 rows in the table and in the hash file initially, 10 new rows are being written in this incremental load, and the 5th and 7th records are duplicates... while processing the 7th record, the lookup is still performed only against the pre-existing 100 records...

In your design, the 5th record will get reflected in the lookup if you set the array size and transaction size to 1 on the table's input link, and also have caching disabled.
--> Yeah, I would have caching disabled and also preloading of the hash file disabled.
I would put the array size as 1 on both the input and output links of the table, and the transaction size as 1 on the input link.
(Also, how is the transaction size related? Wouldn't putting the array size at 1 be sufficient? I write each record to the table but I don't commit it; wouldn't this new record still be included in the lookup? And also, when I set the transaction size to 0, isn't the transaction committed every time the data is written? Here the array size is 1, so the transaction would be committed after each record, or am I confused?)
With this I am sure the new row would be in the table by the time I do the lookup, but I am not sure it would be in the hash file yet.
So putting the array size and transaction size as 1 and using a table lookup is robust but slow... hmmm...

thanks sumit

Post by sumitgulati »

See, this is how your design will work: if the array size and transaction size on the table's input link are 1, every record will immediately be reflected in the reference table. The hash file load will happen only once the Table A load is completely over. So just making the array size 1 on the table's output link will not work.
--> Right now the transaction size is 0. You mean that if I put the transaction size as 1, each new row is written to the hash file immediately?

#### I guess I did not make my point clear. Here is an example of how rows will get inserted with your design, assuming an array size of 1. If you are processing 100 new records, all 100 records will get inserted into Table A one by one. The output link execution will start only once the input link execution is completely over, which means your hash file load will start only once all 100 records have been written into Table A. Even if the array size of the output link is 1, a record that goes into Table A will not go into the hash file immediately; it will wait for the input link execution to finish.
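A toy Python-style illustration of that ordering (two strictly sequential phases; the row count and names are arbitrary, and this only models the behaviour described above):

Code:

# Toy model of the link ordering: the Table A -> hash file output link runs
# only after the input link has finished, so no array or transaction size
# setting makes rows appear in the hash file any earlier.
new_rows = ["row_%d" % i for i in range(100)]

# Phase 1: input link -- all 100 rows go into Table A first, one by one.
table_a = []
for row in new_rows:
    table_a.append(row)

# Phase 2: output link -- only now does the hash file load begin.
hash_a = {row: True for row in table_a}

print(len(table_a), len(hash_a))        # 100 100, but phase 2 started after phase 1 ended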

2) If there are 100 rows in the table and in the hash file initially, 10 new rows are being written in this incremental load, and the 5th and 7th records are duplicates... while processing the 7th record, the lookup is still performed only against the pre-existing 100 records...

In your design, the 5th record will get reflected in the lookup if you set the array size and transaction size to 1 on the table's input link, and also have caching disabled.
--> Yeah, I would have caching disabled and also preloading of the hash file disabled.
I would put the array size as 1 on both the input and output links of the table, and the transaction size as 1 on the input link.
(Also, how is the transaction size related? Wouldn't putting the array size at 1 be sufficient? I write each record to the table but I don't commit it; wouldn't this new record still be included in the lookup? And also, when I set the transaction size to 0, isn't the transaction committed every time the data is written? Here the array size is 1, so the transaction would be committed after each record, or am I confused?)
With this I am sure the new row would be in the table by the time I do the lookup, but I am not sure it would be in the hash file yet.
So putting the array size and transaction size as 1 and using a table lookup is robust but slow... hmmm...
#### A transaction size of 0 means that a commit will happen only once all the records have been processed into the table. Say you are processing 100 records with an array size of 10 and a transaction size of 0: the database commit will happen only once all 100 records are written into the table (10 at a time). However, if you have both the array size and transaction size set to 1, the commit will happen after every single record is written into the table.
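A quick Python-style sketch of that commit arithmetic (simplified: rows travel in batches of array_size, and a commit fires once every transaction_size rows, with 0 meaning a single commit at the end; this is only a model, not the DRS stage itself):

Code:

# Simplified model of commit cadence: rows are sent in arrays of array_size,
# and a commit is issued every transaction_size rows (0 = commit only at the end).
def commit_points(total_rows, array_size, transaction_size):
    commits, written = [], 0
    while written < total_rows:
        written += min(array_size, total_rows - written)   # one array of rows sent
        if transaction_size and written % transaction_size == 0:
            commits.append(written)                        # commit after these rows
    if not commits or commits[-1] != total_rows:
        commits.append(total_rows)                         # final commit at the end
    return commits

print(commit_points(100, 10, 0))        # [100]  -> one commit after all 100 rows
print(len(commit_points(100, 1, 1)))    # 100    -> a commit after every single row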

Regards,
Sumit

Post by Xanadu »

Even if the array size of the output link is 1, a record that goes into Table A will not go into the hash file immediately; it will wait for the input link execution to finish.

--> Thanks again, Sumit. Now, if I put the transaction size as 1, would the problem persist even then? Or is the data written to the hash file once it is committed?

Also, for the lookup to be successful, is it mandatory for the data to be committed? If the array size is 1, I write a record (not yet committed) and then look up that record; it should be in the database, and the lookup should be successful, right?

Post by sumitgulati »

Even if the array size of the output link is 1, a record that goes into Table A will not go into the hash file immediately; it will wait for the input link execution to finish.

--> Thanks again, Sumit. Now, if I put the transaction size as 1, would the problem persist even then? Or is the data written to the hash file once it is committed?
#### There is no such thing as transaction size on the output link; only the table's input link has a transaction size. The answer still remains the same: even if both the array size and transaction size on the input link are 1, the hash file load will start only once the input link execution is over. The data will not immediately get reflected in the hash file.

Also, for the lookup to be successful, is it mandatory for the data to be committed? If the array size is 1, I write a record (not yet committed) and then look up that record; it should be in the database, and the lookup should be successful, right?
#### I do not know if the commit is mandatory. I cannot even build a sample job to test it because I don't have access to DataStage right now.

Regards,
Sumit

Post by Xanadu »

#### There is no such thing as transaction size on the output link; only the table's input link has a transaction size. The answer still remains the same: even if both the array size and transaction size on the input link are 1, the hash file load will start only once the input link execution is over. The data will not immediately get reflected in the hash file.
--> Yup, that was a typo regarding the transaction size on the output link.
Yeah, I tested it just now as well: the hash file is written at the end, so even if the array size and transaction size are 1, the hash file does not get updated immediately. So there is no point doing a lookup from the hash file; I will just do a table lookup and create an index on the key column.
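For what it's worth, the pattern settled on here looks roughly like this in Python terms (the set stands in for Table A, and the row-by-row visibility is what an array size and transaction size of 1 are meant to buy; the keys are made up):

Code:

# Sketch of the fallback approach: look up directly against the table itself,
# with each row written (and committed) before the next lookup, so duplicates
# inside the same run are caught.
table_a = {"k1"}                        # rows already in Table A
for key in ["k2", "k2", "k1"]:          # incremental load with a repeated "k2"
    if key not in table_a:              # lookup straight against the table
        table_a.add(key)                # visible to the very next lookup

print(sorted(table_a))                  # ['k1', 'k2'] -- the duplicate was rejected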

#### I do not know if the commit is mandatory. I cannot even build a sample job to test it because I don't have access to DataStage right now.
--> Yup, this is something we have to research further, but for now I am playing it safe: I have set both the array size and transaction size to 1.
I will change the jobs I designed to look up from the table instead of from hash files.

Thanks so much, Sumit.