Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:16 am
by pmadhavi
Hi

We have a mainframe file which has to be loaded into SQL Server, and there are several processes in between before it reaches the database. There are 5 key columns, and the mainframe file has rows that are duplicates across them. We need to load all the records into a hashed file, but since those are the key columns, the hashed file won't allow duplicates, so some records are getting discarded in the process. I created a new column and made it a key, thinking the combination would now be unique and all the records would load into the hashed file without any problem, but I can still see some record loss.

Please let me know why this is happening and how to solve it.

Thanks in Advance.

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:35 am
by DeepakCorning
Here is a workaround -- the assumption I made is that you are not using this hashed file as a lookup; if you are, this won't work.

Create a column in the hashed file called COUNTER or something similar, make it a key, and populate it with the system variable @INROWNUM or @OUTROWNUM. These system variables follow the row number flowing through the job and hence are always unique. This will allow everything to be written to the hashed file.
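
As a minimal sketch of that Transformer setup (link and column names here are hypothetical), the column grid on the link into the Hashed File stage would look something like this, with the new COUNTER column marked as a key alongside the existing five:

[code]
Column Name   Key   Derivation
-----------   ---   ----------------
KEY_COL_1     Yes   InLink.KEY_COL_1
KEY_COL_2     Yes   InLink.KEY_COL_2    <- and so on for the five original keys
COUNTER       Yes   @OUTROWNUM          <- new surrogate key, unique per row
[/code]

Because @OUTROWNUM increments for every row the Transformer writes, the composite key is unique even when the five business keys repeat, so the hashed file keeps every record instead of destructively overwriting the duplicates.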

Re: Records with duplicate key values to be loaded into the

Posted: Mon Oct 09, 2006 10:36 am
by shrey3a
What is the transformer derivation for the new key column you have added? Set it to @INROWNUM and see if it works.

[quote="pmadhavi"]Hi

We have a mainframe file which has to be loaded to SQL server.
We have so many processes in between before loading it into the database. There are 5 key columns which have duplicate rows in mainframe file. We need to load all the records to a hashed file. But I think as they are key columns, hashed file wont allow duplicates to be loaded. So some records are getting discarded in this process. I created a new column and made it as a key thinking that now the combination is unique and all the records will be loaded into the hashed file without any problem. But even now I can see some records loss.

Please let me know why is this happening. and How to solve this problem.

Thanks in Advance.[/quote]

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:42 am
by pmadhavi
We are going to use the hashed file as a lookup in the next job.
I created a column, populated it with @OUTROWNUM, and made it a key along with the other key columns, but I am still not able to populate all the records into the hashed file. A few records are getting discarded.

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:44 am
by DeepakCorning
When you say discarded, do you mean that it throws an error in the log file and does not write, or that it simply does not populate?

Can you please let us know whether the @OUTROWNUM values are unique in the hashed file you created?
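
If it helps, one quick check (a sketch, assuming the hashed file was created in the project account with a VOC pointer, and MY_HASH and COUNTER are hypothetical names) is to compare the stored record count against the rows-written figure in the job log, from the UniVerse (TCL) prompt:

[code]
COUNT MY_HASH
LIST MY_HASH COUNTER SAMPLE 10
[/code]

If COUNT returns fewer records than the log says were written, rows with duplicate keys are overwriting each other; the LIST gives a quick eyeball of the counter values themselves.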

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:46 am
by shrey3a
Please post the job design.

[quote="pmadhavi"]We are going to use the hashed file as look up in the next job.
I created a column and populated @OUTROWNUM to the column and also made the column as key along with other key columns. but still I am not able to populate all the records into the hashed file. few records are getting discarded.[/quote]

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 10:48 am
by pmadhavi
It is not throwing any warnings in the log; it's just not populating the rows into the hashed file.

Yes, the @OUTROWNUM column generated unique values for all the records.

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 11:02 am
by DeepakCorning
That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates; since it can be opened in Excel, that should be easy.

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 1:05 pm
by pmadhavi
DeepakCorning wrote: That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates; since it can be opened in Excel, that should be easy.
Sorry, I found some duplicate rows in the sequential file. It's because when the lookup succeeds, it populates the same row ID created in the previous stage, so even the row ID is getting repeated. I have now changed it: I added a new column, ROWID, just before loading to the hashed file and made it the key.

Now I am able to get all the records into the hashed file. In the next job I am going to use it as a lookup, but I don't want to look up on ROWID; it was only there to make sure that all the rows made it into the hashed file. Since it is a key column, I need to supply a value for it in the Transformer stage lookup. Can I give a dummy value? If I don't give anything, it throws a compilation error.

Please help me.

Re: Records with duplicate key values to be loaded into the hashed file

Posted: Mon Oct 09, 2006 1:21 pm
by DeepakCorning
DeepakCorning wrote: Here is a workaround -- the assumption I made is that you are not using this hashed file as a lookup; if you are, this won't work.
Right now I cannot think of any way to make this hashed file work as a lookup; as I clearly mentioned, it was just a workaround to bring in the complete data.

Posted: Mon Oct 09, 2006 3:42 pm
by ray.wurlod
The only way is to use a UV stage, and declare the actual value (not the ROWID value) as Key in your metadata. This will be terribly slow since it's a non-key-based lookup. You can improve performance by creating an index on the hashed file; search the forum for the CREATE.INDEX command.
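
For reference, a rough sketch of those commands from the UniVerse (TCL) prompt in the project account (the file and column names are hypothetical, and the hashed file needs a VOC pointer):

[code]
CREATE.INDEX MY_HASH LOOKUP_COL
BUILD.INDEX MY_HASH LOOKUP_COL
LIST.INDEX MY_HASH ALL
[/code]

CREATE.INDEX defines a secondary index on the dictionary field, BUILD.INDEX populates it, and LIST.INDEX confirms it is in place; the UV stage can then resolve the non-key lookup through the index instead of scanning the whole file.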