Records with duplicate key values to be loaded into the hash

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Records with duplicate key values to be loaded into the hash

Post by pmadhavi »

Hi

We have a mainframe file that has to be loaded into SQL Server.
There are many intermediate processes before it reaches the database. The file has 5 key columns, and some rows duplicate those key values. We need to load all the records into a hashed file, but since they are key columns, the hashed file won't allow duplicates to be loaded, so some records are getting discarded in this process. I created a new column and made it a key, thinking that the combination would now be unique and all the records would be loaded without any problem. But even now I can see some record loss.
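For context: a hashed file stores exactly one record per key value, and a second write with the same key destructively overwrites the first rather than being rejected, which is why duplicates can vanish without a warning. A minimal Python sketch of that behaviour (illustrative analogue only, not DataStage BASIC; the sample rows are made up):

```python
# Illustrative analogue of hashed-file write semantics: one record per key,
# last write wins, and no warning is raised for the overwritten row.
rows = [
    ("A", "B", "C", "D", "E", "row 1 data"),
    ("A", "B", "C", "D", "E", "row 2 data"),  # same 5 key values as row 1
    ("X", "B", "C", "D", "E", "row 3 data"),
]

hashed_file = {}
for row in rows:
    key = row[:5]           # the 5 key columns form the record key
    hashed_file[key] = row  # a duplicate key silently overwrites the earlier record

print(len(rows), "rows written,", len(hashed_file), "rows kept")  # 3 written, 2 kept
```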

Please let me know why this is happening and how to solve it.

Thanks in Advance.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

Here is a workaround -- the assumption I made is that you are not using this hashed file as a lookup; if you are, then this won't work.

Create a column in the hashed file called counter or something similar, make it a key, and populate it with the system variable @INROWNUM or @OUTROWNUM. These system variables track the row number flowing through the job and hence are always unique. This will allow everything to be written to the hashed file.
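The effect of the counter key can be sketched in Python (an illustrative analogue; @INROWNUM itself is a DataStage BASIC system variable, and the sample rows are made up):

```python
# Illustrative analogue of the workaround: append a row counter (playing the
# role of @INROWNUM) to the key, so every write lands on a distinct key and
# no record is overwritten.
rows = [
    ("A", "B", "C", "D", "E", "row 1 data"),
    ("A", "B", "C", "D", "E", "row 2 data"),  # duplicate natural key
    ("X", "B", "C", "D", "E", "row 3 data"),
]

hashed_file = {}
for inrownum, row in enumerate(rows, start=1):
    key = row[:5] + (inrownum,)  # counter makes every composite key unique
    hashed_file[key] = row

print(len(hashed_file))  # all 3 rows retained
```

The catch is the one stated above: a downstream lookup would now have to know the counter value to find a row, which is what makes the file awkward to use as a reference lookup.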
shrey3a
Premium Member
Posts: 234
Joined: Sun Nov 21, 2004 10:41 pm

Re: Records with duplicate key values to be loaded into the

Post by shrey3a »

What is the transformer derivation for the new key column you have added? Set it to @INROWNUM and see if it works.

[quote="pmadhavi"]Hi

We have a mainframe file that has to be loaded into SQL Server.
There are many intermediate processes before it reaches the database. The file has 5 key columns, and some rows duplicate those key values. We need to load all the records into a hashed file, but since they are key columns, the hashed file won't allow duplicates to be loaded, so some records are getting discarded in this process. I created a new column and made it a key, thinking that the combination would now be unique and all the records would be loaded without any problem. But even now I can see some record loss.

Please let me know why this is happening and how to solve it.

Thanks in Advance.[/quote]
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

We are going to use the hashed file as a lookup in the next job.
I created a column, populated it with @OUTROWNUM, and made it a key along with the other key columns, but I am still not able to populate all the records into the hashed file; a few records are getting discarded.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

When you say discarded, do you mean it throws an error in the log file and does not write the row, or that the row simply does not appear in the hashed file?

Can you please let us know whether the @OUTROWNUM values are unique in the hashed file you created?
shrey3a
Premium Member
Posts: 234
Joined: Sun Nov 21, 2004 10:41 pm

Re: Records with duplicate key values to be loaded into the

Post by shrey3a »

Please post the job design.

[quote="pmadhavi"]We are going to use the hashed file as a lookup in the next job.
I created a column, populated it with @OUTROWNUM, and made it a key along with the other key columns, but I am still not able to populate all the records into the hashed file; a few records are getting discarded.[/quote]
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

It is not throwing any warning in the log. It's just not populating the rows to the hashed file.

Yes, @OUTROWNUM generated unique values for all the records.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates. Since it can be opened in Excel, that should be easy.
pmadhavi
Charter Member
Posts: 92
Joined: Fri Jan 27, 2006 2:54 pm

Re: Records with duplicate key values to be loaded into the

Post by pmadhavi »

DeepakCorning wrote: That's quite strange; if one of the key columns is unique, then no rows should fall out of the hashed file. Let me try to replicate the issue and see whether that is possible.

What happens if you use a sequential file instead? Populate it and check whether there are any duplicates. Since it can be opened in Excel, that should be easy.
Sorry -- I found some duplicate rows in the SEQ file. It's because when the lookup is found, it populates the same row ID created in the previous stage, so even the row ID gets repeated. I changed it as follows: I added a new column, ROWID, just before loading to the hashed file and made it the key.

Now I am able to get all the records into the hashed file. In the next job I am going to use it as a lookup, but I don't want to look up on ROWID; it was used just to make sure that all the rows made it into the hashed file. Since it is a key column, I need to supply a value for it in the transformer stage lookup. Can I give a dummy value? If I don't give anything, it throws a compilation error.

Please help me.
Thanks,
Madhavi
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Records with duplicate key values to be loaded into the

Post by DeepakCorning »

DeepakCorning wrote: Here is a workaround -- the assumption is that you are not using this hashed file as a lookup; if you are, then this won't work.
Right now I cannot think of any way to make this hashed file usable as a lookup; as I clearly mentioned, it was just a workaround to bring in the complete data.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The only way is to use a UV stage, and declare the actual value (not the ROWID value) as Key in your metadata. This will be terribly slow since it's a non-key-based lookup. You can improve performance by creating an index on the hashed file. Search the forum for CREATE.INDEX command.
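To sketch the distinction in an illustrative Python analogue (not DataStage BASIC, and the sample data is made up): a key-based hashed-file lookup returns at most one row per key, while a UV-stage query on non-key columns can return every matching row, with an index on those columns playing the role CREATE.INDEX plays for the hashed file:

```python
from collections import defaultdict

# Hashed file keyed on a surrogate ROWID: every row survives loading.
hashed_file = {
    1: ("A", "B", "row 1"),
    2: ("A", "B", "row 2"),  # duplicate natural key, distinct ROWID
    3: ("X", "B", "row 3"),
}

# A secondary index on the natural key columns: the analogue of CREATE.INDEX,
# letting a non-key lookup find all matching rows without scanning the file.
index = defaultdict(list)
for rowid, row in hashed_file.items():
    index[row[:2]].append(rowid)

matches = [hashed_file[r] for r in index[("A", "B")]]
print(len(matches))  # both rows with the duplicate natural key are returned
```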
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.