Hi, I am trying to familiarise myself with PX and have the following problem.
I have 2 files, both with 50 records (to keep it easy):
CustData (keyed by custid(int) 1-50)
CustLookup (keyed by custid(int) 1-50)
PX job 1: load CustLookup into a lookup dataset, modulus partitioned on custid
PX job 2: read CustData -->> tfm with modulus partitioning on custid
lookup stage ref CustLookup - reject any not-found records
- output found records
Results:
PX job 2 outputs 12 records and rejects 38
the 12 records are all in partition 0 (custid = 4,8,12 etc)
the 38 records are in partitions 1-3 (custid = 1,2,3,5,6,7 etc)
I have a .apt config with 4 nodes
Question is:
After a bit of sleuth work using the data above, I have worked out that I am not partitioning correctly. But both the lookup dataset and the lookup stage are partitioned by mod(custid) (I have checked this with a Peek stage), so how come my lookup stage only seems to find data on partition 0?
I KNOW I am missing something, but any tips / gotchas / pithy comments and abuse are welcome. I have spent an evening on this when I should have been down the pub, so I would like to get it sorted.
Thanks in advance
Fridge
Stupid Beginners Ques re lookup stage - HELP I am to dense
Re: Stupid Beginners Ques re lookup stage - HELP I am to den
fridge wrote: PX job 1 load CustLookup into lookup dataset modulas partioned on custid
PX job 2 read CUstDate -->> tfm with partion mod on cust id
Please do not use manual partitioning. The Lookup stage is designed to handle partitioning on its own (in fact, by default it uses "Entire" partitioning). If you specify a particular partitioning, that overrides the lookup's default behavior.
Manual partitioning should only be used when the documentation/help files do not say that the stage handles it for you. From the help files:
"There are some special partitioning considerations for lookup stages. You need to ensure that the data being looked up in the lookup table is in the same partition as the input data referencing it. One way of doing this is to partition the lookup tables using the Entire method. Another way is to partition it in the same way as the input data (although this implies sorting of the data)."
The Entire method is applied by default. Set $APT_DUMP_SCORE if you want to observe the behavior of the nodes.
This response assumes you are using 7.x. Lookup behavior is different in 6.x.
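To make the help-file point concrete, here is a rough Python model of a partition-local lookup. It is only a sketch, not how the PX engine is implemented: the function names are made up for illustration, and the "all on partition 0" case is just one hypothetical way to end up with the 12 found / 38 rejected split from the original post on 4 nodes.

```python
from collections import defaultdict

NODES = 4  # same node count as the 4-node config in the question


def modulus_partition(keys, nodes=NODES):
    """Send each key to partition (key % nodes)."""
    parts = defaultdict(set)
    for k in keys:
        parts[k % nodes].add(k)
    return parts


def entire_partition(keys, nodes=NODES):
    """Give every partition a full copy of the reference keys."""
    return {p: set(keys) for p in range(nodes)}


def single_partition(keys, nodes=NODES):
    """Hypothetical failure case: all reference rows land on partition 0."""
    parts = {p: set() for p in range(nodes)}
    parts[0] = set(keys)
    return parts


def lookup(input_parts, ref_parts):
    """Each input row can only see reference rows in its own partition."""
    found, rejected = [], []
    for p, rows in input_parts.items():
        ref = ref_parts.get(p, set())
        for k in rows:
            (found if k in ref else rejected).append(k)
    return sorted(found), sorted(rejected)


cust_ids = range(1, 51)
input_parts = modulus_partition(cust_ids)

scenarios = [
    ("entire", entire_partition(cust_ids)),
    ("modulus", modulus_partition(cust_ids)),
    ("all on partition 0", single_partition(cust_ids)),
]
for name, ref_parts in scenarios:
    found, rejected = lookup(input_parts, ref_parts)
    print(f"{name}: found={len(found)} rejected={len(rejected)}")
# entire: found=50 rejected=0
# modulus: found=50 rejected=0
# all on partition 0: found=12 rejected=38
```

In this model the lookup works whenever the reference rows sit in the same partition as the input rows that need them, which is what both options in the help text achieve. It fails as soon as the two sides are laid out differently.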
Just make it standard practice for everything -- do not partition/sort data unless you have to. And when you think you have to, run a nice small test with a bunch of randomized data to confirm that it is indeed required.
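As a rough idea of what such a small randomized test might look like outside DataStage, here is a Python sketch. Everything in it is an assumption for illustration (the helper names, 4 partitions, 200 random keys); it models a partition-local lookup and compares an Entire-style reference against a reference partitioned differently from the input.

```python
import random


def lookup_rejects(input_parts, ref_parts):
    """Count input keys with no match in the reference data of their own partition."""
    return sum(
        1
        for p, rows in enumerate(input_parts)
        for k in rows
        if k not in ref_parts[p]
    )


def random_partition(keys, nodes, rng):
    """Scatter keys over partitions at random (stand-in for an unknown upstream layout)."""
    parts = [[] for _ in range(nodes)]
    for k in keys:
        parts[rng.randrange(nodes)].append(k)
    return parts


def run_trial(seed, nodes=4, size=200):
    """Return (rejects with Entire reference, rejects with mismatched reference)."""
    rng = random.Random(seed)
    keys = rng.sample(range(1, 10_000), size)  # unique random keys
    input_parts = random_partition(keys, nodes, rng)
    entire_ref = [set(keys) for _ in range(nodes)]  # full copy on every partition
    mismatched_ref = [set(p) for p in random_partition(keys, nodes, rng)]
    return (
        lookup_rejects(input_parts, entire_ref),
        lookup_rejects(input_parts, mismatched_ref),
    )


for seed in range(5):
    entire_rejects, mismatched_rejects = run_trial(seed)
    print(f"seed={seed} entire={entire_rejects} mismatched={mismatched_rejects}")
```

With the Entire-style reference the reject count is always zero in this model, however the input is partitioned. The mismatched reference rejects whichever rows happen to land on a different partition from their reference row.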
Ascential tries to minimize the need for manual control over this aspect, especially starting at 7.x.
I have seen developers sort and partition data going to datasets. Other developers working on jobs downstream would then get very unusual results and spend days trying to fix them before turning to me for help. Once I traced the error up the flow to this and removed the sort/partitioning, the problems suddenly disappeared.