Records with duplicate key values to be loaded into the hash

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Records with duplicate key values to be loaded into the hash

Post by pmadhavi »

Hi

We have a mainframe file that has to be loaded into SQL Server.
There are many intermediate processes before it reaches the database. The file has 5 key columns, and some rows duplicate those key values. We need to load all the records into a hashed file, but since they are key columns, the hashed file won't allow duplicates to be loaded, so some records are getting discarded in this process. I created a new column and made it a key, thinking that the combination would now be unique and all the records would be loaded without any problem. But even now I can see some record loss.
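For context: a hashed file stores exactly one record per key value, and a second write with the same key destructively overwrites the first rather than being rejected, which is why duplicates can vanish without a warning. A minimal Python sketch of that behaviour (illustrative analogue only, not DataStage BASIC; the sample rows are made up):

```python
# Illustrative analogue of hashed-file write semantics: one record per key,
# last write wins, and no warning is raised for the overwritten row.
rows = [
    ("A", "B", "C", "D", "E", "row 1 data"),
    ("A", "B", "C", "D", "E", "row 2 data"),  # same 5 key values as row 1
    ("X", "B", "C", "D", "E", "row 3 data"),
]

hashed_file = {}
for row in rows:
    key = row[:5]           # the 5 key columns form the record key
    hashed_file[key] = row  # a duplicate key silently overwrites the earlier record

print(len(rows), "rows written,", len(hashed_file), "rows kept")  # 3 written, 2 kept
```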

Please let me know why this is happening and how to solve it.

Thanks in Advance.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

Here is a workaround -- the assumption I made is that you are not using this hashed file as a lookup; if you are, then this won't work.

Create a column in the hashed file called counter or something similar, make it a key, and populate it with the system variable @INROWNUM or @OUTROWNUM. These system variables track the row number flowing through the job and hence are always unique. This will allow everything to be written to the hashed file.
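The effect of the counter key can be sketched in Python (an illustrative analogue; @INROWNUM itself is a DataStage BASIC system variable, and the sample rows are made up):

```python
# Illustrative analogue of the workaround: append a row counter (playing the
# role of @INROWNUM) to the key, so every write lands on a distinct key and
# no record is overwritten.
rows = [
    ("A", "B", "C", "D", "E", "row 1 data"),
    ("A", "B", "C", "D", "E", "row 2 data"),  # duplicate natural key
    ("X", "B", "C", "D", "E", "row 3 data"),
]

hashed_file = {}
for inrownum, row in enumerate(rows, start=1):
    key = row[:5] + (inrownum,)  # counter makes every composite key unique
    hashed_file[key] = row

print(len(hashed_file))  # all 3 rows retained
```

The catch is the one stated above: a downstream lookup would now have to know the counter value to find a row, which is what makes the file awkward to use as a reference lookup.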
shrey3a
Premium Member
Posts: 234
Joined: Sun Nov 21, 2004 10:41 pm

Re: Records with duplicate key values to be loaded into the

Post by shrey3a »

What is the transformer derivation for the new key column you have added? Set it to @INROWNUM and see if it works.

[quote="pmadhavi"]Hi

We have a mainframe file that has to be loaded into SQL Server.
There are many intermediate processes before it reaches the database. The file has 5 key columns, and some rows duplicate those key values. We need to load all the records into a hashed file, but since they are key columns, the hashed file won't allow duplicates to be loaded, so some records are getting discarded in this process. I created a new column and made it a key, thinking that the combination would now be unique and all the records would be loaded without any problem. But even now I can see some record loss.

Please let me know why this is happening and how to solve it.

Thanks in Advance.[/quote]
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

We are going to use the hashed file as a lookup in the next job.
I created a column, populated it with @OUTROWNUM, and made it a key along with the other key columns, but I am still not able to populate all the records into the hashed file; a few records are getting discarded.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

When you say discarded, do you mean it throws an error in the log file and does not write the row, or that the row simply does not appear in the hashed file?

Can you please let us know whether the @OUTROWNUM values are unique in the hashed file you created?
shrey3a
Premium Member
Posts: 234
Joined: Sun Nov 21, 2004 10:41 pm

Re: Records with duplicate key values to be loaded into the

Post by shrey3a »

Please post the job design.

[quote="pmadhavi"]We are going to use the hashed file as a lookup in the next job.
I created a column, populated it with @OUTROWNUM, and made it a key along with the other key columns, but I am still not able to populate all the records into the hashed file; a few records are getting discarded.[/quote]
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

It is not throwing any warning in the log. It's just not populating the rows to the hashed file.

Yes, @OUTROWNUM generated unique values for all the records.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates. Since it can be opened in Excel, that should be easy.
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

DeepakCorning wrote: That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates. Since it can be opened in Excel, that should be easy.
Sorry -- I found some duplicate rows in the SEQ file. It's because when the lookup is found, it populates the same row ID created in the previous stage, so even the row ID gets repeated. I changed it as follows: I added a new column, ROWID, just before loading to the hashed file and made it the key.

Now I am able to get all the records into the hashed file. In the next job I am going to use it as a lookup, but I don't want to look up on ROWID; it was used just to make sure that all the rows made it into the hashed file. Since it is a key column, I need to supply a value for it in the transformer stage lookup. Can I give a dummy value? If I don't give anything, it throws a compilation error.

Please help me.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

DeepakCorning wrote: Here is a workaround -- the assumption is that you are not using this hashed file as a lookup; if you are, then this won't work.
Right now I cannot think of any way to make this hashed file usable as a lookup; as I clearly mentioned, it was just a workaround to bring in the complete data.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The only way is to use a UV stage, and declare the actual value (not the ROWID value) as Key in your metadata. This will be terribly slow since it's a non-key-based lookup. You can improve performance by creating an index on the hashed file. Search the forum for CREATE.INDEX command.
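To sketch the distinction in an illustrative Python analogue (not DataStage BASIC, and the sample data is made up): a key-based hashed-file lookup returns at most one row per key, while a UV-stage query on non-key columns can return every matching row, with an index on those columns playing the role CREATE.INDEX plays for the hashed file:

```python
from collections import defaultdict

# Hashed file keyed on a surrogate ROWID: every row survives loading.
hashed_file = {
    1: ("A", "B", "row 1"),
    2: ("A", "B", "row 2"),  # duplicate natural key, distinct ROWID
    3: ("X", "B", "row 3"),
}

# A secondary index on the natural key columns: the analogue of CREATE.INDEX,
# letting a non-key lookup find all matching rows without scanning the file.
index = defaultdict(list)
for rowid, row in hashed_file.items():
    index[row[:2]].append(rowid)

matches = [hashed_file[r] for r in index[("A", "B")]]
print(len(matches))  # both rows with the duplicate natural key are returned
```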
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.