File Split logic ..

Developer9 · Post by **Developer9** » Wed Aug 17, 2016 4:16 pm

Hi ,

I have the following requirement I am developing Seq/parallel jobs.

Dep_Group logic --Per group limit -5000 employees (records with Dep_Indicator BCF,DS)
Non_Dep_Group --No grouping required -Assign default '0' (records with Non_Dep_Indicator DW)

Input File

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP
1|2500|10|BCF,DS|1||		
2|2500|10|BCF,DS|1||					
3|2500|10|DS|2|DW|0
4|2500|10|BCF,DS|2||					
5|2500|10|BCF,DS|3||					
6|2500|10|DS|3|DW|0

Expected output

Code: Select all

Dep_DSC_Deparment_GROUP.txt|ExpectedAreas in each file
BCF_10_1.txt (1,2)
BCF_10_2.txt (4)
BCF_10_3.txt (5)
DS_10_1.txt (1,2)
DS_10_2.txt (3,4)
DS_10_3.txt (5,6)
DW_10_0.txt (3,6)

Current logic developed :

Code: Select all

uservariable_group_ctr (max_group counter) -->start loop --->split job (parallel job) -->end loop -exit loop

split job (M I -pass group number as constraints and generic file Dep_DSC_Deparment_GROUP.txt with parameterized )

Code: Select all

input file --xfm -->output file

Above development approach is based on the grouping requirement but how can I achieve splitting files as a whole (grouping and non grouping i.e default '0' records)

Please suggest me some ideas to implement.

Thank you

FranklinE · Post by **FranklinE** » Thu Aug 18, 2016 7:36 am

First, it would help to state your original requirements in plain text. You've clearly done a detailed analysis, but I can't follow your train of thought for it. Besides, what you have may contain flaws that no one can help find if we don't know what the original requirements look like.

As I understand it so far, you want to direct a row to a file based on grouping criteria. Further, you have complex groupings that allow a row to be directed to more than one file. My general suggestion is to create an output file of all rows, then query the file for each file-distinct grouping.

This covers two potential problems: you get the row in one file but not in the others to which it should go; you use direct selection criteria. Nothing beats SQL type coding for a clear link between code and requirement.

EDIT: I'm very big on design that has easy access for change. My client users are very big on changing their minds or discovering "new" requirements. Putting your split process in a distinct job -- reading the full file to generate the grouping files -- is an easy place to make changes to grouping criteria and to add new groupings.

Developer9 · Post by **Developer9** » Thu Aug 18, 2016 11:42 am

Hi FranklinE,

Code: Select all

Edited earlier post

The original requirement to group the employees per Area based on maximum limit set for each group .(<= 4800 per group )

Source File

Code: Select all

Employee Number,Area,Department

For instance
Area 1 -2500 employees ,
Area 2 -2300 employees
Area 3 -2000 employees
Area 4 -2000 employees

There is a chance that Area 3 may present in two output files

File 1--Area 1,2,3 (2500+2300+200 =5000)
File 2 -Area 3,4 (1800+2000=3800 )

Make sure that in any case Particular Area (ex:3) should not be present in more than one file.but how do we calculate we are reaching the limit for each file.

Expected

File1 --Area 1,2 (2500+2300 =4800 )
File2 --Area 3,4 (2000+2000 =4000 )

I will provide more details as things progress in my development.

Thank you

Developer9 · Post by **Developer9** » Thu Aug 18, 2016 8:23 pm

Hi,

I used transformer loop concept to convert a single row for data with repeating values in a single column (Dep_DSC_INDICATOR) to multiple output rows.

Transformer loop job

Code: Select all

SEQ (INPUT) ---XFM --SEQ (OUTPUT)

Logic in Transformer

Code: Select all

--Stage Variable 'svDepDelim'
Dcount(Seq_AreaList_Out.Dep_DSC_INDICATOR,",")
--lOOP condition
@ITERATION <= svDepDelim
--Loop Variable
lvDepLoop  --FIELD(Seq_AreaList_Out.Dep_DSC_INDICATOR, ";", @ITERATION, 1)
--Output link metadata and derivations
Dep_DSC_INDICATOR -- lvDepLoop

Output File: OnlyDep_DSC_INDICATOR field is split as below

Code: Select all

Area|Employee_count|Deparment|Dep_DSC_INDICATOR|Dep_GROUP|NON_Dep_DSC_INDICATOR|NON_Dep_GROUP 
1|2500|10|BCF|1||
1|2500|10|DS|1||
3|2500|10|DS|2|DW|0
5|2500|10|BCF|3||
5|2500|10|DS|3||
2|2500|10|BCF|1||
2|2500|10|DS|1||
4|2500|10|BCF|2||
4|2500|10|DS|2||
6|2500|10|DS|3|DW|0

My Question is ,
1. How can I generate output based on NON_Dep_DSC_INDICATOR field based on current value of DW or if it has DW,DX instead for Area 3 ,6-records .

Please advise me from this point on wards.
Thanks

chulett · Post by **chulett** » Fri Aug 19, 2016 6:37 am

Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.

Developer9 · Post by **Developer9** » Fri Aug 19, 2016 10:43 am

chulett wrote: Developer9 - please use the Reply to topic option as there's absolutely no need to quote everything you are replying to. Saves me from having to come along and clean up as well.

Thanks.

@chulett,I made the change to earlier post to remove earlier post content.

Thanks for the suggestion ..I hope I did what you were expecting me to do

chulett · Post by **chulett** » Fri Aug 19, 2016 11:57 am

Oh... so close. Yes, removing the excessive quote was part of the deal but so was not continuing to do it going forward. And then you quoted me asking you to stop quoting.

ray.wurlod · Post by **ray.wurlod** » Fri Aug 19, 2016 3:56 pm

Why are you using different delimiter characters in the DCount() and Field() functions?

Developer9 · Post by **Developer9** » Fri Aug 19, 2016 5:28 pm

@ray,

I overlooked while posting in the forum its the same in both functions as per my current code .

Code: Select all

';'

I am still figuring out the way to pass more than one field loop variables .But had some progress on the (Dep_DSC_INDICATOR) file splitting through Sequence loop activity .

Approach I followed

Code: Select all

Department Counter(Loop1)  --when multiple departments in file (1 -Here in example - 10)
Group Counter (Loop2) --Calculate maximum group number for each department (3) --Loop for 3 times (1,2,3)

But I need to add one more loop for existing job for split based on Dep_DSC_INDICATOR

Code: Select all

Delimiter Counter (Loop3)--2 (BCF;DS) get First occurance and use in file mask .

So I am expecting target files in this format.

Code: Select all

BCF_10_1.txt
BCF_10_2.txt
BCF_10_3.txt

Please let me know any ideas for implementing for this scenario.

DSXchange

File Split logic ..

File Split logic ..

Re: File Split logic ..