Page 1 of 1

File Split logic ..

Posted: Wed Aug 17, 2016 4:16 pm
by Developer9
Hi ,

I have the following requirement I am developing Seq/parallel jobs.

Code: Select all

Dep_Group logic --Per group limit -5000 employees (records with Dep_Indicator BCF,DS)
Non_Dep_Group --No grouping required -Assign default '0' (records with Non_Dep_Indicator DW)
Input File

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP
1|2500|10|BCF,DS|1||		
2|2500|10|BCF,DS|1||					
3|2500|10|DS|2|DW|0
4|2500|10|BCF,DS|2||					
5|2500|10|BCF,DS|3||					
6|2500|10|DS|3|DW|0
Expected output

Code: Select all

Dep_DSC_Deparment_GROUP.txt|ExpectedAreas in each file
BCF_10_1.txt (1,2)
BCF_10_2.txt (4)
BCF_10_3.txt (5)
DS_10_1.txt (1,2)
DS_10_2.txt (3,4)
DS_10_3.txt (5,6)
DW_10_0.txt (3,6)
Current logic developed :

Code: Select all

uservariable_group_ctr (max_group counter) -->start loop --->split job (parallel job) -->end loop -exit loop
split job (M I -pass group number as constraints and generic file Dep_DSC_Deparment_GROUP.txt with parameterized )

Code: Select all

input file --xfm -->output file
Above development approach is based on the grouping requirement but how can I achieve splitting files as a whole (grouping and non grouping i.e default '0' records)

Please suggest me some ideas to implement.

Thank you

Posted: Thu Aug 18, 2016 7:36 am
by FranklinE
First, it would help to state your original requirements in plain text. You've clearly done a detailed analysis, but I can't follow your train of thought for it. Besides, what you have may contain flaws that no one can help find if we don't know what the original requirements look like.

As I understand it so far, you want to direct a row to a file based on grouping criteria. Further, you have complex groupings that allow a row to be directed to more than one file. My general suggestion is to create an output file of all rows, then query the file for each file-distinct grouping.

This covers two potential problems: you get the row in one file but not in the others to which it should go; you use direct selection criteria. Nothing beats SQL type coding for a clear link between code and requirement.

EDIT: I'm very big on design that has easy access for change. My client users are very big on changing their minds or discovering "new" requirements. Putting your split process in a distinct job -- reading the full file to generate the grouping files -- is an easy place to make changes to grouping criteria and to add new groupings.

Posted: Thu Aug 18, 2016 11:42 am
by Developer9
Hi FranklinE,

Code: Select all

Edited earlier post 
The original requirement to group the employees per Area based on maximum limit set for each group .(<= 4800 per group )

Source File

Code: Select all

Employee Number,Area,Department
For instance
Area 1 -2500 employees ,
Area 2 -2300 employees
Area 3 -2000 employees
Area 4 -2000 employees

There is a chance that Area 3 may present in two output files

File 1--Area 1,2,3 (2500+2300+200 =5000)
File 2 -Area 3,4 (1800+2000=3800 )

Make sure that in any case Particular Area (ex:3) should not be present in more than one file.but how do we calculate we are reaching the limit for each file.

Expected

File1 --Area 1,2 (2500+2300 =4800 )
File2 --Area 3,4 (2000+2000 =4000 )


I will provide more details as things progress in my development.

Thank you

Re: File Split logic ..

Posted: Thu Aug 18, 2016 8:23 pm
by Developer9
Hi,

I used transformer loop concept to convert a single row for data with repeating values in a single column (Dep_DSC_INDICATOR) to multiple output rows.

Transformer loop job

Code: Select all

SEQ (INPUT) ---XFM --SEQ (OUTPUT)
Logic in Transformer

Code: Select all

--Stage Variable 'svDepDelim'
Dcount(Seq_AreaList_Out.Dep_DSC_INDICATOR,",")
--lOOP condition
@ITERATION <= svDepDelim
--Loop Variable
lvDepLoop  --FIELD(Seq_AreaList_Out.Dep_DSC_INDICATOR, ";", @ITERATION, 1)
--Output link metadata and derivations
Dep_DSC_INDICATOR -- lvDepLoop
Output File: OnlyDep_DSC_INDICATOR field is split as below

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP 
1|2500|10|BCF|1||
1|2500|10|DS|1||
3|2500|10|DS|2|DW|0
5|2500|10|BCF|3||
5|2500|10|DS|3||
2|2500|10|BCF|1||
2|2500|10|DS|1||
4|2500|10|BCF|2||
4|2500|10|DS|2||
6|2500|10|DS|3|DW|0
My Question is ,
1. How can I generate output based on NON_Dep_DSC_INDICATOR field based on current value of DW or if it has DW,DX instead for Area 3 ,6-records .

Please advise me from this point on wards.
Thanks

Posted: Fri Aug 19, 2016 6:37 am
by chulett
:!: Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.

Posted: Fri Aug 19, 2016 10:43 am
by Developer9
chulett wrote::!: Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.
@chulett,I made the change to earlier post to remove earlier post content.

Thanks for the suggestion ..I hope I did what you were expecting me to do :)

Posted: Fri Aug 19, 2016 11:57 am
by chulett
Oh... so close. Yes, removing the excessive quote was part of the deal but so was not continuing to do it going forward. And then you quoted me asking you to stop quoting. :wink:

Posted: Fri Aug 19, 2016 3:56 pm
by ray.wurlod
Why are you using different delimiter characters in the DCount() and Field() functions?

Posted: Fri Aug 19, 2016 5:28 pm
by Developer9
@ray,

I overlooked while posting in the forum its the same in both functions as per my current code .

Code: Select all

';'
I am still figuring out the way to pass more than one field loop variables .But had some progress on the (Dep_DSC_INDICATOR) file splitting through Sequence loop activity .


Approach I followed

Code: Select all

Department Counter(Loop1)  --when multiple departments in file (1 -Here in example - 10)
Group Counter (Loop2) --Calculate maximum group number for each department (3) --Loop for 3 times (1,2,3)
But I need to add one more loop for existing job for split based on Dep_DSC_INDICATOR

Code: Select all

Delimiter Counter (Loop3)--2 (BCF;DS) get First occurance and use in file mask .
So I am expecting target files in this format.

Code: Select all

BCF_10_1.txt
BCF_10_2.txt
BCF_10_3.txt
Please let me know any ideas for implementing for this scenario.