configuration and partitioning


kcshankar
Charter Member
Posts: 91
Joined: Mon Jan 10, 2005 2:06 am

Post by kcshankar »

Hi,
My job contains:
1. Source file
2. Lookup file
3. Lookup stage
4. Target file
I want to run my job in a 2*2 configuration.
When I do that, I am not getting the full output (e.g. for 10 input rows I am getting only 6).

If I run the same job in
a. the default configuration, or
b. the 2*2 configuration with the SAME partition type,
I am getting all the records :? .

Can anyone explain what is happening? :?:

Thanks in advance
kcs
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi & welcome aboard :),

Can you explain more about the differences between the 2 runs you mentioned?
Last edited by roy on Wed Sep 21, 2005 11:09 pm, edited 1 time in total.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kcshankar
Charter Member
Posts: 91
Joined: Mon Jan 10, 2005 2:06 am

Post by kcshankar »

Hi Roy,
There are 2 jobs. I am having a problem in the 2nd job.

Job 1:

Takes the A1 file as source, does a lookup on the C file, and creates a dataset, A1C.
Key: custid
Partition type: Hash on custid
Configuration type: 2*2

Job 2:

Takes the A1C dataset (output from job 1) as source, does a lookup on the A2 file, and creates a dataset, A1A2C.
Key: acctid
Partition type: Hash on acctid
Configuration type: 2*2
Result: I am not getting all the records.

The 2nd job works fine under either of the following conditions:

a. default config
or
b. SAME partition, 2*2 config.

Thanks in advance
kcs
SriKara
Premium Member
Posts: 30
Joined: Wed Jun 01, 2005 8:40 am
Location: UK

Post by SriKara »

What does "configuration 2 * 2" mean? :(
dsxdev
Participant
Posts: 92
Joined: Mon Sep 20, 2004 8:37 am

Post by dsxdev »

Hi Shankar,
What is 2*2? Is it a 2-node configuration file you are referring to?
Anyway, I see you are hash partitioning on acctid and custid.

Are you by any chance sorting and using the unique option?

This could be one reason for losing the records.
When you use the default partitioning, the "perform sort" with "unique" option is not active.
Happy DataStaging
kcshankar
Charter Member
Posts: 91
Joined: Mon Jan 10, 2005 2:06 am

Post by kcshankar »

Friends,
Thanks for your replies.

Dev, I am not using the Sort or Unique options.
Fine, but then what is happening when the same job runs in a 2-node configuration with the partitioning type set to SAME?

Thanks in advance
kcs
SriKara
Premium Member
Posts: 30
Joined: Wed Jun 01, 2005 8:40 am
Location: UK

Post by SriKara »

When you use the partitioning type "Same", the hash partitioning performed in the first job is preserved,
i.e. the dataset used in the second job still has the same hash partitioning on custid.

But when you perform hash partitioning on acctid in the second job, the records are dropped. Can you check whether there are any records with duplicate acctids?

Also, on the reference link to the Lookup stage in the second job, have you specified any particular partitioning property?
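
To show why the reference link's partitioning matters, here is a small Python simulation (not DataStage code). It assumes that "2*2" means a 2-node configuration, that each node's lookup sees only its own partition of the reference data, and that unmatched rows are dropped. The function names and the ten sample acctid values are made up for illustration; this is one possible explanation of the symptom, not a confirmed diagnosis of the job above.

```python
from collections import defaultdict


def hash_partition(rows, key, nodes):
    """Keyed partitioning: rows with the same key value land on the same node."""
    parts = defaultdict(list)
    for row in rows:
        # Keys are integers here, so modulo stands in for a real hash function.
        parts[row[key] % nodes].append(row)
    return parts


def round_robin_partition(rows, nodes):
    """Non-keyed partitioning: rows are dealt out in arrival order."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire_partition(rows, nodes):
    """Every node receives a full copy of the rows."""
    return {n: list(rows) for n in range(nodes)}


def lookup(stream_parts, ref_parts, key, nodes):
    """Per-node lookup that drops stream rows with no match on the same node."""
    out = []
    for n in range(nodes):
        ref_keys = {r[key] for r in ref_parts.get(n, [])}
        out.extend(r for r in stream_parts.get(n, []) if r[key] in ref_keys)
    return out


# Hypothetical sample data: 10 stream rows, and a reference file holding
# the same 10 acctids in a different arrival order.
stream = [{"acctid": i} for i in range(1, 11)]
reference = [{"acctid": i} for i in (2, 1, 4, 3, 6, 5, 7, 8, 9, 10)]

# 2 nodes, stream hashed on acctid, reference NOT partitioned on acctid.
two_node_mismatch = lookup(hash_partition(stream, "acctid", 2),
                           round_robin_partition(reference, 2), "acctid", 2)

# 2 nodes, reference copied in full to every node.
two_node_entire = lookup(hash_partition(stream, "acctid", 2),
                         entire_partition(reference, 2), "acctid", 2)

# 2 nodes, stream and reference both hashed on acctid.
two_node_same_key = lookup(hash_partition(stream, "acctid", 2),
                           hash_partition(reference, "acctid", 2), "acctid", 2)

# 1 node: partitioning cannot separate a row from its reference record.
one_node = lookup(hash_partition(stream, "acctid", 1),
                  round_robin_partition(reference, 1), "acctid", 1)

print(len(two_node_mismatch), len(two_node_entire),
      len(two_node_same_key), len(one_node))  # prints: 6 10 10 10
```

In this sketch, rows go missing only when the stream is partitioned on the key and the reference data is partitioned some other way. With a single node, with the full reference data on every node, or with both links hashed on the same key, all 10 rows come through.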