Page 1 of 1
File Split logic ..
Posted: Wed Aug 17, 2016 4:16 pm
by Developer9
Hi ,
I have the following requirement I am developing Seq/parallel jobs.
Code: Select all
Dep_Group logic --Per group limit -5000 employees (records with Dep_Indicator BCF,DS)
Non_Dep_Group --No grouping required -Assign default '0' (records with Non_Dep_Indicator DW)
Input File
Code: Select all
Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP
1|2500|10|BCF,DS|1||
2|2500|10|BCF,DS|1||
3|2500|10|DS|2|DW|0
4|2500|10|BCF,DS|2||
5|2500|10|BCF,DS|3||
6|2500|10|DS|3|DW|0
Expected output
Code: Select all
Dep_DSC_Deparment_GROUP.txt|ExpectedAreas in each file
BCF_10_1.txt (1,2)
BCF_10_2.txt (4)
BCF_10_3.txt (5)
DS_10_1.txt (1,2)
DS_10_2.txt (3,4)
DS_10_3.txt (5,6)
DW_10_0.txt (3,6)
Current logic developed :
Code: Select all
uservariable_group_ctr (max_group counter) -->start loop --->split job (parallel job) -->end loop -exit loop
split job (M I -pass group number as constraints and generic file Dep_DSC_Deparment_GROUP.txt with parameterized )
Above development approach is based on the grouping requirement but how can I achieve splitting files as a whole (grouping and non grouping i.e default '0' records)
Please suggest me some ideas to implement.
Thank you
Posted: Thu Aug 18, 2016 7:36 am
by FranklinE
First, it would help to state your original requirements in plain text. You've clearly done a detailed analysis, but I can't follow your train of thought for it. Besides, what you have may contain flaws that no one can help find if we don't know what the original requirements look like.
As I understand it so far, you want to direct a row to a file based on grouping criteria. Further, you have complex groupings that allow a row to be directed to more than one file. My general suggestion is to create an output file of all rows, then query the file for each file-distinct grouping.
This covers two potential problems: you get the row in one file but not in the others to which it should go; you use direct selection criteria. Nothing beats SQL type coding for a clear link between code and requirement.
EDIT: I'm very big on design that has easy access for change. My client users are very big on changing their minds or discovering "new" requirements. Putting your split process in a distinct job -- reading the full file to generate the grouping files -- is an easy place to make changes to grouping criteria and to add new groupings.
Posted: Thu Aug 18, 2016 11:42 am
by Developer9
Hi FranklinE,
The original requirement to group the employees per Area based on maximum limit set for each group .(<= 4800 per group )
Source File
For instance
Area 1 -2500 employees ,
Area 2 -2300 employees
Area 3 -2000 employees
Area 4 -2000 employees
There is a chance that
Area 3 may present in two output files
File 1--Area 1,2,3 (2500+2300+200 =5000)
File 2 -Area 3,4 (1800+2000=3800 )
Make sure that in any case
Particular Area (ex:3) should not be present in more than one file.but
how do we calculate we are reaching the limit for each file.
Expected
File1 --Area 1,2 (2500+2300 =4800 )
File2 --Area 3,4 (2000+2000 =4000 )
I will provide more details as things progress in my development.
Thank you
Re: File Split logic ..
Posted: Thu Aug 18, 2016 8:23 pm
by Developer9
Hi,
I used transformer loop concept to convert a single row for data with repeating values in a single column
(Dep_DSC_INDICATOR) to multiple output rows.
Transformer loop job
Logic in Transformer
Code: Select all
--Stage Variable 'svDepDelim'
Dcount(Seq_AreaList_Out.Dep_DSC_INDICATOR,",")
--lOOP condition
@ITERATION <= svDepDelim
--Loop Variable
lvDepLoop --FIELD(Seq_AreaList_Out.Dep_DSC_INDICATOR, ";", @ITERATION, 1)
--Output link metadata and derivations
Dep_DSC_INDICATOR -- lvDepLoop
Output File: Only
Dep_DSC_INDICATOR field is split as below
Code: Select all
Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP
1|2500|10|BCF|1||
1|2500|10|DS|1||
3|2500|10|DS|2|DW|0
5|2500|10|BCF|3||
5|2500|10|DS|3||
2|2500|10|BCF|1||
2|2500|10|DS|1||
4|2500|10|BCF|2||
4|2500|10|DS|2||
6|2500|10|DS|3|DW|0
My Question is ,
1. How can I generate output based on
NON_Dep_DSC_INDICATOR field based on current value of DW or if it has DW,DX instead for Area 3 ,6-records .
Please advise me from this point on wards.
Thanks
Posted: Fri Aug 19, 2016 6:37 am
by chulett
Developer9 - please use the
Reply to topic option as there's
absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.
Thanks.
Posted: Fri Aug 19, 2016 10:43 am
by Developer9
chulett wrote:
Developer9 - please use the
Reply to topic option as there's
absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.
Thanks.
@chulett,I made the change to earlier post to remove earlier post content.
Thanks for the suggestion ..I hope I did what you were expecting me to do
![Smile :)](./images/smilies/icon_smile.gif)
Posted: Fri Aug 19, 2016 11:57 am
by chulett
Oh...
so close. Yes, removing the excessive quote was part of the deal but so was not continuing to do it going forward. And then you quoted me asking you to stop quoting.
![Wink :wink:](./images/smilies/icon_wink.gif)
Posted: Fri Aug 19, 2016 3:56 pm
by ray.wurlod
Why are you using different delimiter characters in the DCount() and Field() functions?
Posted: Fri Aug 19, 2016 5:28 pm
by Developer9
@ray,
I overlooked while posting in the forum its the same in both functions as per my current code .
I am still figuring out the way to pass
more than one field loop variables .But had some progress on the
(Dep_DSC_INDICATOR) file splitting through Sequence loop activity .
Approach I followed
Code: Select all
Department Counter(Loop1) --when multiple departments in file (1 -Here in example - 10)
Group Counter (Loop2) --Calculate maximum group number for each department (3) --Loop for 3 times (1,2,3)
But I need to add one more loop for existing job for split based on Dep_DSC_INDICATOR
Code: Select all
Delimiter Counter (Loop3)--2 (BCF;DS) get First occurance and use in file mask .
So I am expecting target files in this format.
Code: Select all
BCF_10_1.txt
BCF_10_2.txt
BCF_10_3.txt
Please let me know any ideas for implementing for this scenario.