Search found 65 matches

by wuruima
Wed Jul 30, 2014 10:21 pm
Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
Topic: Q. apt file Node setting
Replies: 3
Views: 1901

Q. apt file Node setting

Dear all, may I ask a quick question? If I have 4 file systems (each one is 10G), and I set the APT file like this: node "node1" { fastname "xxx" pools "" resource disk "/Node1/DataSets81" {pools ""} resource scratchdisk "/Node1/Scratch81" {p...
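A minimal sketch of how the full 4-node configuration file might look, assuming the remaining three file systems follow the same /NodeN/DataSets81 and /NodeN/Scratch81 path pattern and that all four logical nodes run on the same server (the fastname "xxx" is the placeholder from the post; everything beyond node1 is an assumption, not taken from it):

{
  node "node1"
  {
    fastname "xxx"
    pools ""
    resource disk "/Node1/DataSets81" {pools ""}
    resource scratchdisk "/Node1/Scratch81" {pools ""}
  }
  node "node2"
  {
    fastname "xxx"
    pools ""
    resource disk "/Node2/DataSets81" {pools ""}
    resource scratchdisk "/Node2/Scratch81" {pools ""}
  }
  node "node3"
  {
    fastname "xxx"
    pools ""
    resource disk "/Node3/DataSets81" {pools ""}
    resource scratchdisk "/Node3/Scratch81" {pools ""}
  }
  node "node4"
  {
    fastname "xxx"
    pools ""
    resource disk "/Node4/DataSets81" {pools ""}
    resource scratchdisk "/Node4/Scratch81" {pools ""}
  }
}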
by wuruima
Thu Nov 07, 2013 8:20 pm
Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
Topic: datastage sort best performance
Replies: 9
Views: 11086

ray.wurlod wrote: Did you increase the memory using the Sort stage? ...
:o I didn't change any stage settings for my jobs, let me have a try...
by wuruima
Wed Nov 06, 2013 3:01 am
Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
Topic: datastage sort best performance
Replies: 9
Views: 11086

You could avoid splitting the file by using the "multiple readers per node" capability. Use a Sort stage to sort the individual partitions (partition by the first sort key so that the results are correct), ... I find that using the sort function in the Aggregator instead of a separate Sort stage, perfo...
by wuruima
Tue Nov 05, 2013 9:47 pm
Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
Topic: datastage sort best performance
Replies: 9
Views: 11086

Thanks for your reply; yes, the big file is a sequential file. I did some testing to find the best practice, and found that if I split this big file into 4 smaller files, use 4 Aggregator stages to do the pre-sort/sum for each file after reading, and then use a Funnel to collect all 4 links and use t...
by wuruima
Mon Nov 04, 2013 10:33 pm
Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
Topic: datastage sort best performance
Replies: 9
Views: 11086

datastage sort best performance

Dear all, I would like to sort a file of more than 80,000,000 records. I find that if I use one file (named file1) to hold all these records, and build a job (4 nodes in the config file) to read, sort, sum the results, and output, the job takes a long time to read. However, if I split this file1 into 4 files (file1, file2, fil...
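Not DataStage code, but a minimal Python sketch of the pattern described in this thread, assuming a hypothetical comma-separated key,amount record layout and four split files named file1..file4: each chunk is pre-aggregated in parallel (the role of the 4 Aggregator stages), the partial results are funnelled together, and only the much smaller summed set is sorted at the end.

from collections import Counter
from multiprocessing import Pool

# Hypothetical split files standing in for file1..file4 from the post.
CHUNKS = ["file1", "file2", "file3", "file4"]

def pre_aggregate(path):
    # Sum the amount per key within one chunk, like one Aggregator stage.
    totals = Counter()
    with open(path) as f:
        for line in f:
            key, amount = line.rstrip("\n").split(",")
            totals[key] += float(amount)
    return totals

if __name__ == "__main__":
    # Pre-aggregate the four chunks in parallel (the four input links).
    with Pool(len(CHUNKS)) as pool:
        partials = pool.map(pre_aggregate, CHUNKS)

    # "Funnel" the partial results together and do the final sum.
    final = Counter()
    for part in partials:
        final.update(part)  # Counter.update adds the per-key totals

    # The final sort now runs over distinct keys, not 80,000,000 rows.
    for key in sorted(final):
        print(key, final[key])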