
configuration and partitioning

Posted: Sun Jun 05, 2005 3:44 am
by kcshankar
Hi,
My job contains:
1. Source file
2. Lookup file
3. Lookup stage
4. Target file
I want to run my job in a 2*2 configuration.
When I do that, I am not getting the total output (in rows; for 10 input rows I get only 6).

If I run the same job with either
a. the default configuration, or
b. the SAME partition type and the 2*2 configuration,
I get all the records.


Can anyone explain what is happening?


Thanks in advance
kcs

Posted: Sun Jun 05, 2005 4:16 am
by roy
Hi & welcome aboard :),

Can you explain more about the differences between the 2 runs you mentioned?

Posted: Sun Jun 05, 2005 4:52 am
by kcshankar
Hi Roy,
There are 2 jobs. I am having a problem in the 2nd job.

Job1.

Taking the A1 file as source, doing a lookup against the C file, and creating a dataset, A1C.
Key --- custid
Partition type --- hash on custid
Configuration type --- 2*2.

job2.

Taking the A1C file (the output from job1) as source, doing a lookup against the A2 file, and creating a dataset, A1A2C.
Key --- acctid
Partition type --- hash on acctid
Configuration type --- 2*2.
The result is that I am not getting all the records.


The 2nd job works fine under either of the following conditions:

a. the default configuration, or
b. SAME partitioning with the 2*2 configuration.




Thanks in advance
kcs

Posted: Sun Jun 05, 2005 5:39 am
by SriKara
What does "configuration 2*2" mean? :(

Posted: Mon Jun 06, 2005 6:09 am
by dsxdev
Hi Shankar,
What is 2*2? Is it a 2-node configuration file you are referring to?
Anyway, I see you are hash partitioning on acctid and custid.

Are you by any chance sorting with the unique option?

That could be one reason for losing records.
When you use default partitioning, a sort with the unique option is not active.
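As an aside, the effect of a sort with a unique option can be sketched in plain Python (this is an illustration of the general mechanism, not DataStage code; the field names and data are made up):

```python
# Hypothetical illustration: a sort with a "unique" option keeps only one
# row per sort key, silently dropping every later row with the same key.

def sort_unique(rows, key):
    """Sort rows on `key` and keep only the first row seen per key value."""
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: r[key]):
        if row[key] not in seen:   # only the first row per key survives
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"acctid": 10, "amt": 5},
        {"acctid": 10, "amt": 7},  # duplicate acctid: this row is dropped
        {"acctid": 11, "amt": 3}]

result = sort_unique(rows, "acctid")
print(len(rows), "rows in,", len(result), "rows out")
```

So if a sort-unique were active anywhere in the flow, duplicate keys alone would explain a shrinking row count.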

Posted: Mon Jun 06, 2005 7:07 am
by kcshankar
friends,
Thanks for your replies.

Dev, I am not using the sort or unique options.
Fine, but then what is happening when the same job runs in a 2-node configuration with the partitioning type SAME?



Thanks in advance
kcs

Posted: Mon Jun 06, 2005 7:21 am
by SriKara
When you use the partitioning type SAME, the hash partitioning performed in the first job is preserved,
i.e. the dataset used in the second job still has the same hash partitioning on custid.

But when you perform hash partitioning on acctid in the second job, records are dropped. Can you check whether there are any records with duplicate acctids?

Also, on the reference link into the Lookup stage in the second job, have you specified any particular partitioning property?
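The partition-alignment point matters here: on a multi-node run, each node can only look up against the reference rows it holds, so if the stream and the reference are hash partitioned on different keys, matching rows can land on different nodes and the lookup misses. A minimal plain-Python sketch of that mechanism (not DataStage code; the 2-node setup, field names, and data are made up):

```python
# Hypothetical 2-node simulation of a parallel lookup, showing why rows
# disappear when the stream and the reference are not partitioned on the
# same key.

def hash_partition(rows, key, nodes=2):
    """Send each row to node hash(row[key]) % nodes."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return parts

def parallel_lookup(stream_parts, ref_parts, key):
    """Each node joins its stream rows against only its own reference rows."""
    out = []
    for spart, rpart in zip(stream_parts, ref_parts):
        ref = {r[key]: r for r in rpart}
        out += [{**s, **ref[s[key]]} for s in spart if s[key] in ref]
    return out

source = [{"custid": c, "acctid": a} for c, a in
          [(1, 10), (2, 11), (3, 12), (4, 13)]]
reference = [{"acctid": a, "branch": b} for a, b in
             [(10, "N"), (11, "S"), (12, "E"), (13, "W")]]

# Aligned: both sides hash partitioned on the lookup key (acctid) -> no loss.
aligned = parallel_lookup(hash_partition(source, "acctid"),
                          hash_partition(reference, "acctid"), "acctid")

# Misaligned: the stream keeps an earlier partitioning on custid (as SAME
# would preserve), while the reference is partitioned on acctid -> matching
# rows can sit on different nodes, so some lookups find nothing.
misaligned = parallel_lookup(hash_partition(source, "custid"),
                             hash_partition(reference, "acctid"), "acctid")

print("aligned:", len(aligned), "rows; misaligned:", len(misaligned), "rows")
```

The same logic explains why a single-node (default) configuration always returns all rows: with one node, every stream row sees every reference row.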