File Split logic ..

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Developer9
Premium Member
Premium Member
Posts: 187
Joined: Thu Apr 14, 2011 5:10 pm

File Split logic ..

Post by Developer9 »

Hi ,

I have the following requirement I am developing Seq/parallel jobs.

Code: Select all

Dep_Group logic --Per group limit -5000 employees (records with Dep_Indicator BCF,DS)
Non_Dep_Group --No grouping required -Assign default '0' (records with Non_Dep_Indicator DW)
Input File

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP
1|2500|10|BCF,DS|1||		
2|2500|10|BCF,DS|1||					
3|2500|10|DS|2|DW|0
4|2500|10|BCF,DS|2||					
5|2500|10|BCF,DS|3||					
6|2500|10|DS|3|DW|0
Expected output

Code: Select all

Dep_DSC_Deparment_GROUP.txt|ExpectedAreas in each file
BCF_10_1.txt (1,2)
BCF_10_2.txt (4)
BCF_10_3.txt (5)
DS_10_1.txt (1,2)
DS_10_2.txt (3,4)
DS_10_3.txt (5,6)
DW_10_0.txt (3,6)
Current logic developed :

Code: Select all

uservariable_group_ctr (max_group counter) -->start loop --->split job (parallel job) -->end loop -exit loop
split job (M I -pass group number as constraints and generic file Dep_DSC_Deparment_GROUP.txt with parameterized )

Code: Select all

input file --xfm -->output file
Above development approach is based on the grouping requirement but how can I achieve splitting files as a whole (grouping and non grouping i.e default '0' records)

Please suggest me some ideas to implement.

Thank you
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

First, it would help to state your original requirements in plain text. You've clearly done a detailed analysis, but I can't follow your train of thought for it. Besides, what you have may contain flaws that no one can help find if we don't know what the original requirements look like.

As I understand it so far, you want to direct a row to a file based on grouping criteria. Further, you have complex groupings that allow a row to be directed to more than one file. My general suggestion is to create an output file of all rows, then query the file for each file-distinct grouping.

This covers two potential problems: you get the row in one file but not in the others to which it should go; you use direct selection criteria. Nothing beats SQL type coding for a clear link between code and requirement.

EDIT: I'm very big on design that has easy access for change. My client users are very big on changing their minds or discovering "new" requirements. Putting your split process in a distinct job -- reading the full file to generate the grouping files -- is an easy place to make changes to grouping criteria and to add new groupings.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Developer9
Premium Member
Premium Member
Posts: 187
Joined: Thu Apr 14, 2011 5:10 pm

Post by Developer9 »

Hi FranklinE,

Code: Select all

Edited earlier post 
The original requirement to group the employees per Area based on maximum limit set for each group .(<= 4800 per group )

Source File

Code: Select all

Employee Number,Area,Department
For instance
Area 1 -2500 employees ,
Area 2 -2300 employees
Area 3 -2000 employees
Area 4 -2000 employees

There is a chance that Area 3 may present in two output files

File 1--Area 1,2,3 (2500+2300+200 =5000)
File 2 -Area 3,4 (1800+2000=3800 )

Make sure that in any case Particular Area (ex:3) should not be present in more than one file.but how do we calculate we are reaching the limit for each file.

Expected

File1 --Area 1,2 (2500+2300 =4800 )
File2 --Area 3,4 (2000+2000 =4000 )


I will provide more details as things progress in my development.

Thank you
Last edited by Developer9 on Fri Aug 26, 2016 10:15 am, edited 1 time in total.
Developer9
Premium Member
Premium Member
Posts: 187
Joined: Thu Apr 14, 2011 5:10 pm

Re: File Split logic ..

Post by Developer9 »

Hi,

I used transformer loop concept to convert a single row for data with repeating values in a single column (Dep_DSC_INDICATOR) to multiple output rows.

Transformer loop job

Code: Select all

SEQ (INPUT) ---XFM --SEQ (OUTPUT)
Logic in Transformer

Code: Select all

--Stage Variable 'svDepDelim'
Dcount(Seq_AreaList_Out.Dep_DSC_INDICATOR,",")
--lOOP condition
@ITERATION <= svDepDelim
--Loop Variable
lvDepLoop  --FIELD(Seq_AreaList_Out.Dep_DSC_INDICATOR, ";", @ITERATION, 1)
--Output link metadata and derivations
Dep_DSC_INDICATOR -- lvDepLoop
Output File: OnlyDep_DSC_INDICATOR field is split as below

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP 
1|2500|10|BCF|1||
1|2500|10|DS|1||
3|2500|10|DS|2|DW|0
5|2500|10|BCF|3||
5|2500|10|DS|3||
2|2500|10|BCF|1||
2|2500|10|DS|1||
4|2500|10|BCF|2||
4|2500|10|DS|2||
6|2500|10|DS|3|DW|0
My Question is ,
1. How can I generate output based on NON_Dep_DSC_INDICATOR field based on current value of DW or if it has DW,DX instead for Area 3 ,6-records .

Please advise me from this point on wards.
Thanks
Last edited by Developer9 on Fri Aug 19, 2016 10:40 am, edited 1 time in total.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Developer9
Premium Member
Premium Member
Posts: 187
Joined: Thu Apr 14, 2011 5:10 pm

Post by Developer9 »

chulett wrote::!: Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.
@chulett,I made the change to earlier post to remove earlier post content.

Thanks for the suggestion ..I hope I did what you were expecting me to do :)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oh... so close. Yes, removing the excessive quote was part of the deal but so was not continuing to do it going forward. And then you quoted me asking you to stop quoting. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why are you using different delimiter characters in the DCount() and Field() functions?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Developer9
Premium Member
Premium Member
Posts: 187
Joined: Thu Apr 14, 2011 5:10 pm

Post by Developer9 »

@ray,

I overlooked while posting in the forum its the same in both functions as per my current code .

Code: Select all

';'
I am still figuring out the way to pass more than one field loop variables .But had some progress on the (Dep_DSC_INDICATOR) file splitting through Sequence loop activity .


Approach I followed

Code: Select all

Department Counter(Loop1)  --when multiple departments in file (1 -Here in example - 10)
Group Counter (Loop2) --Calculate maximum group number for each department (3) --Loop for 3 times (1,2,3)
But I need to add one more loop for existing job for split based on Dep_DSC_INDICATOR

Code: Select all

Delimiter Counter (Loop3)--2 (BCF;DS) get First occurance and use in file mask .
So I am expecting target files in this format.

Code: Select all

BCF_10_1.txt
BCF_10_2.txt
BCF_10_3.txt
Please let me know any ideas for implementing for this scenario.
Post Reply