Search found 107 matches
- Thu Sep 01, 2011 2:26 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
Yes Sandeep, your understanding of my job design is right. 1. Data should be sorted on the key columns being joined, with hash partitioning applied. I added the sort stages after the input of the sequential file stages. The first join produces the same record count as befo...
- Thu Sep 01, 2011 1:11 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
1) The OS line is meant to be the OS your DataStage instance is running on. The OS is AIX 5.3. 2) My job design: a sequential file input stage feeds into a transformer to filter based on some conditions and then feeds into the first join stage. A second sequential file stage serves as the right li...
- Thu Sep 01, 2011 7:22 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage takes a long time
- Replies: 9
- Views: 13663
Join stage takes a long time
I am designing parallel jobs to convert business logic from SAS to DataStage to improve the total run time. In one of the jobs, we have two flat files as input, each with about 40,000 records. A join stage (inner join) is used, and it produces about 1,120,000 records. The ...
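An inner join that turns two 40,000-row inputs into roughly 1,120,000 rows means the join key repeats on both sides: each key contributes (left occurrences) × (right occurrences) output rows. A minimal sketch of that arithmetic, using made-up keys rather than the poster's actual data:

```python
from collections import Counter

def inner_join_row_count(left_keys, right_keys):
    """Predict inner-join output size: each key emits
    (occurrences on the left) x (occurrences on the right) rows."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(left[k] * right[k] for k in left if k in right)

# "a" appears 2x1 times, "b" appears 1x2 times: 2 + 2 = 4 output rows.
print(inner_join_row_count(["a", "a", "b"], ["a", "b", "b"]))  # -> 4
```

Running this count on the real key columns would show whether the 28-fold row explosion comes from duplicate keys rather than from a stage misconfiguration.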
- Thu Sep 01, 2011 7:06 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Thanks Sandeep. But one of the key columns can be nullable according to the business logic, so I cannot replace null with a value. The aggregator drops some records, and I then use a filter stage and a join stage to get the non-key columns back. After the join stage, I get the same record co...
- Wed Aug 31, 2011 2:16 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
- Wed Aug 31, 2011 1:01 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Thanks DSGuru. I followed Shrikant's suggestion from the link below: http://www.dsxchange.com/viewtopic.php?t=106508 I am able to get the correct record count of the unique records after the filter stage, but the aggregator stage drops some of the non-unique records. Any suggestions on why the aggega...
- Wed Aug 31, 2011 10:36 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Need to retain unique records
- Replies: 9
- Views: 4702
Need to retain unique records
Hello, I have a requirement where I need to retain unique records (records that are present only once in the source) and discard all records that appear two or more times in the source. If a record appears more than once, I need to discard all of those records, not just remove duplicates of that partic...
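The requirement above (keep only keys that occur exactly once; drop every record of any key that repeats, not just the extra copies) can be sketched outside DataStage as a count-then-filter pass, which mirrors the aggregator-plus-filter design discussed in this thread. Record layout and key choice here are illustrative, not the poster's:

```python
from collections import Counter

def unique_only(records, key):
    """Keep only records whose key occurs exactly once; if a key
    appears two or more times, drop every record with that key."""
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

rows = [("1", "x"), ("2", "y"), ("1", "z"), ("3", "w")]
# Key "1" appears twice, so both of its records are discarded.
print(unique_only(rows, key=lambda r: r[0]))  # -> [('2', 'y'), ('3', 'w')]
```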
- Thu Aug 25, 2011 3:06 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to error in writeBlock - could not write
- Replies: 2
- Views: 4925
An update: with $APT_DISABLE_COMBINATION set to False, I get the following info when I run the job: APT_CombinedOperatorController(1),0: Numeric string expected for stage variable 'StageVar0_StageVar2'. Use default value. So I changed $APT_DISABLE_COMBINATION to True, and when I run the job, I get the...
- Fri Aug 19, 2011 5:03 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to error in writeBlock - could not write
- Replies: 2
- Views: 4925
Job aborts due to error in writeBlock - could not write
Hello, the parallel job has a sequential file input and a lookup stage that looks up two files and outputs data to a sequential file. The following error happens occasionally. The job might abort the first time, but if the job is recompiled and run again it runs fine, and the process can be repeat...
- Thu Aug 18, 2011 9:58 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
- Wed Aug 17, 2011 4:03 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
That was quick, Ray. My inputs are flat (sequential) files, not tables as previously mentioned; sorry about that. I used a Copy stage after the Sequential file stage to create an additional column that is a copy of the key column, but it looks like the same input column cannot be mapped to mo...
- Wed Aug 17, 2011 3:37 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Join stage: Need to retain the column key from both tables
- Replies: 4
- Views: 1865
Join stage: Need to retain the column key from both tables
Hello,
I have a join stage joining data from Table A and Table B based on join key Column A. Is there a way by which Column A from both Table A and Table B can be retained at the output of the Join stage?
Thanks.
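The question above (keeping Column A from both sides of an inner join) maps to the Copy-stage idea raised later in the thread: copy the key into a new column on one input before joining, so both versions survive. A minimal pure-Python sketch with hypothetical column names (`col_a`, `a_val`, `b_val` are made up for illustration):

```python
def join_keep_both_keys(left, right, key):
    """Inner join two lists of dicts on `key`, copying the right-hand
    key into a new field so both copies survive the join."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        for r in index.get(l[key], []):
            row = dict(l)
            # Bring over the right side's non-key columns...
            row.update({k: v for k, v in r.items() if k != key})
            # ...and keep the right side's key under a new name.
            row[key + "_right"] = r[key]
            out.append(row)
    return out

a = [{"col_a": 1, "a_val": "x"}]
b = [{"col_a": 1, "b_val": "p"}]
print(join_keep_both_keys(a, b, "col_a"))
# -> [{'col_a': 1, 'a_val': 'x', 'b_val': 'p', 'col_a_right': 1}]
```

For an inner join the two key values are equal by definition, so the copy only becomes informative in outer joins, where one side may be null.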
- Wed Aug 17, 2011 10:31 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
There are about 13 million records in one of the files and about 30,000 records in the other file and the order of the links does not seem to matter as I get the same error even after changing the link ordering. So, I guess I have a lot of duplicates in either file. I split the 13 million record fil...
- Wed Aug 17, 2011 7:21 am
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
Please see below for the results from running the ulimit command inside the job. Command: ulimit -S My output: unlimited Command: ulimit -H My output: unlimited Command: ulimit -a time(seconds) unlimited file(blocks) unlimited data(kbytes) 786432 stack(kbytes) 4194304 memory(kbytes) 32768 coredump(blo...
- Tue Aug 16, 2011 3:45 pm
- Forum: IBM® DataStage Enterprise Edition (Formerly Parallel Extender/PX)
- Topic: Job aborts due to heap size allocation problem
- Replies: 4
- Views: 2535
Job aborts due to heap size allocation problem
Hello, the parallel job has a join stage that is joining millions of records from two data sets, and the job aborts with the following error: Join_8,2: The current soft limit on the data segment (heap) size (805306368) is less than the hard limit (2147483647), consider increasing the heap size limit ...
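The error above is about the process's soft limit on the data segment (here 805306368 bytes, about 768 MB) being below the hard limit (2147483647, about 2 GB); a process may raise its own soft limit up to the hard limit without privileges. A sketch of inspecting and raising that limit with Python's standard `resource` module (values and behavior are OS-dependent; this is illustrative, not the DataStage fix itself, which is typically done via ulimit or LDR_CNTRL on AIX):

```python
import resource

# RLIMIT_DATA is the data-segment (heap) limit named in the error.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print(f"soft={soft} hard={hard}")

# Raise the soft limit to the hard limit; a process is always
# allowed to move its soft limit anywhere up to the hard limit.
if soft != hard:
    resource.setrlimit(resource.RLIMIT_DATA, (hard, hard))
```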