Hi,
My job contains:
1. Source file
2. Lookup file
3. Lookup Stage
4. Target file
I want to run my job in a 2x2 configuration.
When I do that, I am not getting the total output (in rows; i.e., for 10 rows I am getting only 6).
If I run the same job in
a. the default configuration, or
b. a 2x2 configuration with partition type SAME,
I get all the records.
Hi Roy,
There are 2 jobs. I am having a problem in the 2nd job.
Job 1:
Taking the A1 file as source, doing a lookup on the C file, and creating a dataset, A1C.
Key: CustId
Partition type: Hash, based on CustId
Configuration: 2x2
Job 2:
Taking the A1C file (output from Job 1) as source, doing a lookup on the A2 file, and creating a dataset, A1A2C.
Key: acctid
Partition type: Hash, based on acctid
Configuration: 2x2
The result is that I am not getting all the records.
The 2nd job works fine under either of the conditions mentioned earlier (default configuration, or SAME partitioning).
Dev, I am not using the Sort or Unique options.
Fine. What is happening when the same job is run in a 2-node configuration with the partitioning type set to SAME?
When you use the partitioning type SAME, the hash partitioning performed in the first job is preserved,
i.e. the dataset used in the second job still has the same hash partitioning on custid.
But when you hash-partition on acctid in the second job, the records are dropped. Can you check whether there are any records with duplicate acctids?
Also, on the reference link to the Lookup stage in the second job, have you specified any particular partitioning property?
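To make the dropped-rows symptom concrete, here is a minimal Python sketch (not DataStage code; all data, key values, and the partitioning function are invented for illustration) of a 2-node partitioned lookup. When the source stream is hash-partitioned on one key while the reference data is laid out by a different key, a row and its match can land on different nodes, and the lookup silently fails for that row:

```python
# Illustration only: simulate a 2-node partitioned lookup. With both sides
# hash-partitioned on the same key, every row finds its match; with the
# source partitioned on custid and the reference on acctid, matching rows
# can end up on different nodes and are dropped.
NODES = 2

def part_key(value):
    # Deterministic stand-in for a hash partitioner.
    return sum(ord(ch) for ch in str(value)) % NODES

def hash_partition(rows, key):
    parts = [[] for _ in range(NODES)]
    for row in rows:
        parts[part_key(row[key])].append(row)
    return parts

# Invented sample data: source rows (like A1C) and reference rows (like A2).
source = [{"custid": c, "acctid": a}
          for c, a in [(1, "X"), (2, "Y"), (3, "Z"), (4, "W")]]
reference = [{"acctid": a, "balance": b}
             for a, b in [("X", 10), ("Y", 20), ("Z", 30), ("W", 40)]]

def partitioned_lookup(src_parts, ref_parts):
    # Each node builds a lookup table only from its own reference partition;
    # a source row with no match on its node is dropped (lookup failure).
    out = []
    for node in range(NODES):
        table = {r["acctid"]: r for r in ref_parts[node]}
        for row in src_parts[node]:
            if row["acctid"] in table:
                out.append({**row, **table[row["acctid"]]})
    return out

# Both sides hashed on acctid: all 4 source rows survive.
ok = partitioned_lookup(hash_partition(source, "acctid"),
                        hash_partition(reference, "acctid"))

# Source hashed on custid, reference laid out by acctid: matches are split
# across nodes, so fewer rows come out, just like the 10-in/6-out symptom.
bad = partitioned_lookup(hash_partition(source, "custid"),
                         hash_partition(reference, "acctid"))

print(len(ok), len(bad))
```

The sketch only models the partition mismatch itself; whether SAME fixes it in a given job also depends on how the reference link of the Lookup stage is partitioned, which is why the question above about the reference link's partitioning property matters.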