Maximum number of KEY Fields in a Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Maximum number of KEY Fields in a Hash File

Post by bapajju »

Hello,

We are writing records to a hash file that has 70 KEY fields, but we are unable to write data to it. We get the error messages below.

1. EDWSTGCsksINS_debugjob..Hashed_File_42.All_Ins_to_HASH1: ds_uvput() - Write failed for record id '500.....

and

2. Program "JOB.618388182.DT.1418661486.TRANS1": Line 508, Attempted WRITE with record ID larger than file/table maximum record ID size of 768 characters.

Any clue on this? Is there a limit on the size or number of key fields?


Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, there is a maximum set in the uvconfig file and guess what? It's 768. :shock:

Code:

# MAXKEYSIZE - This sets the maximum number of characters that can
#   be used within a primary key.  The minimum value of 255 retains
#   the historical maximum key length as well as the historical encoding
#   of record locks within a 64-byte entry.  Any other value, which must
#   be a multiple of 64 between 256 and 2048, will extend the maximum key
#   length to that value.  Additionally, the full record id will be stored
#   within the record lock entry.  This value, once raised, can only be
#   reset by calling Ascential Support. Do NOT change this from the default
#   value of 768 without understanding its effect on the record lock table.
MAXKEYSIZE 768
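
For a sense of scale: as I understand it, the record id of a hashed file is all of the key columns concatenated together (plus separator characters when there is more than one key), so with 70 key columns you have an average budget of roughly 768 / 70, about 10 characters per key value, before the write blows up.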
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... and can not be tuned any higher.

Prove to me that you can not obtain uniqueness with fewer than 70 columns. This is a ridiculous amount! The most I've ever seen in very many years in database technology is nine. Sheesh! :roll:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Thanks for the response, Ray. I do understand what you are saying.

IT IS RIDICULOUS :)

I am not using the hash file for uniqueness. I am trying to use the hash file TO REPLACE the Link Collector, which sometimes throws a "MUTEX" error, and I am trying to avoid that. I am bringing data from two transformer stages, collecting all the records, and passing them to a single target. To pass all the records from both sources I have made all the fields key fields; in this way I will not eliminate any record.

Is there any other way to replace link collector?

Thanks in Advance
ray.wurlod wrote:... and can not be tuned any higher.

Prove to me that you can not obtain uniqueness with fewer than 70 columns. This is a ridiculous amount! The most I've ever seen in very many years in database ...
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

bapajju wrote:
Is there any other way to replace link collector?

Thanks in Advance
You could just write them to sequential files and concatenate them in UNIX.
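A minimal sketch of that approach as a DS BASIC after-job subroutine, assuming hypothetical file paths (DSExecute and DSLogWarn are standard server routine calls; everything else here is a placeholder):

Code:

* Sketch of an after-job subroutine body: concatenate the two
* extracts into a single file for loading. Paths are hypothetical.
Cmd = "cat /data/stage/src1.out /data/stage/src2.out > /data/stage/merged.out"
Call DSExecute("UNIX", Cmd, Output, SysRet)
If SysRet <> 0 Then
   * Log a warning with the captured shell output if cat failed
   Call DSLogWarn("Concatenation failed: " : Output, "ConcatExtracts")
End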

And besides, surely you don't need all the columns to be keys to make a record unique - I think that is what Ray was alluding to.
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Out of the 70 fields, any one field (except for the first) can change while the rest remain the same, and I do not know which field might change.
ShaneMuir wrote:
bapajju wrote:
Is there any other way to replace link collector?

Thanks in Advance
You could just write them to sequential files and concatenate them in UNIX.

And besides, surely you don't need all the columns to be keys to make a record unique - I think that is what Ray was alluding to.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you have an English language specification of what you are trying to accomplish, that you could share with us?

There's usually another way, but we don't want to waste time suggesting things that aren't what's needed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Ray,

Let me explain what I am trying to achieve.

1. I have two sources.
2. I am trying to process these records into a single target table (TERADATA) through one DataStage job.
3. I am trying to merge these records and push them into the target table.
4. There might be a case where the two sources send exactly the same record, but I do not want to eliminate either of them. I want to process both records.

Please let me know if I can achieve this through a single job.

Thanks in advance


ray.wurlod wrote:Do you have an English language specification of what you are trying to accomplish, that you could share with us?

There's usually another way, but we don't want to waste time suggesting things that ...
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

bapajju wrote:
I am not using the hash file for uniqueness. I am trying to use the hash file TO REPLACE the Link Collector, which sometimes throws a "MUTEX" error, and I am trying to avoid that. I am bringing data from two transformer stages, collecting all the records, and passing them to a single target. To pass all the records from both sources I have made all the fields key fields; in this way I will not eliminate any record.
If you want to replace the Link Collector, that is the worst way to do it. Not only is there a size limitation on the record ID, but writing to a hashed file is also an expensive affair: every record has to go through the hashing algorithm, which takes a long time with so many keys, and that time grows with the number of records. Secondly, hashed files have a 2GB size limitation by default...

Suggestion 1: Enable row buffering in inter-process mode and increase the time-out to a higher value. You will have to do some R&D to find the optimum time-out.

Suggestion 2: Write to two sequential files and, in the after-job script or an after-stage routine, append one file to the other.
Shantanu Choudhary
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

bapajju wrote:Ray,

Let me explain what I am trying to achieve.

1. I have two sources.
2. I am trying to process these records into a single target table (TERADATA) through one DataStage job.
3. I am trying to merge these records and push them into the target table.
4. There might be a case where the two sources send exactly the same record, but I do not want to eliminate either of them. I want to process both records.

What is the key on the target table? Surely if you have two records the same they can't both be loaded? What processing are you doing?

It would seem to me that you need to create a new key (e.g. a generated number) to append to the front of the row, if you want to keep duplicates.
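A minimal sketch of that idea, with hypothetical column and prefix names; @OUTROWNUM is a standard server-job system variable that counts the rows written down the link:

Code:

* Hypothetical output-column derivations, one per Transformer.
* The source prefix keeps the two streams unique even when all
* the data columns are identical.
* Transformer1, derivation for generated column GEN_KEY:
"SRC1-" : @OUTROWNUM
* Transformer2, derivation for generated column GEN_KEY:
"SRC2-" : @OUTROWNUM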
Yuan_Edward
Participant
Posts: 73
Joined: Tue May 10, 2005 6:21 pm
Location: Sydney

Post by Yuan_Edward »

A simple way (probably not a good way) is to put two input links to one TeraData OCI, something like:

source1=>Transformer1=>
TeraData OCI
source2=>Transformer2=>
bapajju wrote:Ray,

Let me explain what I am trying to achieve.

1. I have two sources.
2. I am trying to process these records into a single target table (TERADATA) through one DataStage job.
3. I am trying to merge these records and push them into the target table.
4. There might be a case where the two sources send exactly the same record, but I do not want to eliminate either of them. I want to process both records.

Please let me know if I can achieve this through a single job.

Thanks in advance


ray.wurlod wrote:Do you have an English language specification of what you are trying to accomplish, that you could share with us?

There's usually another way, but we don't want to waste time suggesting things that ...
Edward Yuan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Simple is often the most effective. But, with this design, how do you handle the duplicate key? Or are you generating surrogate (artificial) keys into the target table?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
talk2shaanc
Charter Member
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Yuan_Edward wrote:A simple way (probably not a good way) is to put two input links to one TeraData OCI, something like:

source1=>Transformer1=>
TeraData OCI
source2=>Transformer2=>
I don't know about Teradata, but DB2 and Oracle establish a lock before writing to the table. So in your scenario there will be contention for that lock: while one link (process) is writing to the table under its lock, the other will wait, and it may abort after some time.
There is also a chance that your table or tablespace ends up in an unusable state; in DB2 it would go into Load Pending state, and I don't remember what the equivalent is called in Oracle.
Shantanu Choudhary
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

I do not think declaring all the columns as keys is the solution here. The simplest approach I can think of is to declare an extra column in the target and assign it a counter (maybe a row counter), the same concept as a surrogate key generator.
So if source 1 has 10 rows and source 2 has 10 rows, the target will have 20 rows with unique counter values.