Field-wise record count for different files

Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Field-wise record count for different files

Post by Cherukuri »

Hi,

Very good wishes to all!

Could you please assist with the DataStage question below?

My question:

Source file:

Field1 Field2 Field3
abc 12f sLm
1234 hjd 12d
Hyd 34
Chn

My target file should contain the field-wise record counts, as below:
Field1 has 4 values, Field2 has 3 values, and Field3 has 2 values, so the target file should look like this:

Target File:

Field1 Field2 Field3
4 3 2

My source file can change to have more or fewer columns, so the target file should always have the same number of columns as the source, each holding its record count. Please help me resolve this.

Thanks and Regards,
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Except for the fields per row changing, this is a simple aggregation exercise. Set a count to 0 for null and 1 for not null and sum them. Do you know the maximum number of fields that could show up?
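
Just to illustrate that idea outside of DataStage (a sketch of the counting logic only, not the Transformer/Aggregator job itself), the 0-or-1-and-sum approach looks like this in awk, assuming a tab-delimited source with a header row and exactly three fields (source.txt is a made-up name):

Code:

awk -F'\t' '
NR == 1 { next }                                            # skip the header row
{ for (i = 1; i <= 3; i++) if ($i != "") count[i]++ }       # add 1 only when the field is populated
END { printf "%d\t%d\t%d\n", count[1], count[2], count[3] } # the sums, written as one output row
' source.txt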
-craig

"You can never have too many knives" -- Logan Nine Fingers
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Field-wise record count for different files

Post by Cherukuri »

Hi Craig,

Thank you for your response.

The maximum number of columns is 150.

Thanks and Regards,
Cheru.
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'll have to ponder this to see if there is a more elegant solution out there; hopefully others will chime in as well.

Off the top of my head, you could 'brute force' it by handling all 150 columns; it will just be a little tedious to set up but won't affect performance. Best to use a Server job IMHO, or a Server Sequential File stage in your Parallel job, to handle the 'missing' columns, something the PX version can't do. Either that, or read each record as one long varchar and then parse it yourself.

The kicker to me is constraining the output to match the input, field-wise. I'm thinking you could capture the highest field number with a populated column name, but you'd have to read that first record as data to do that. Seems to me you could then write those column headers back out as the first line of the file (don't let the stage create them for you), trim the count record down to an appropriate size based on the number of fields in it, and write it out as a single string.
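
To make that concrete outside of DataStage (again only a sketch under assumptions: tab-delimited data, the column names on the first line, and made-up file names source.txt and target.txt), the header row itself can drive how many counts get written back out:

Code:

awk -F'\t' -v OFS='\t' '
NR == 1 { n = NF; print; next }                       # learn the field count from the header and echo it as-is
{ for (i = 1; i <= n; i++) if ($i != "") count[i]++ } # count populated values per column
END {
    for (i = 1; i <= n; i++)
        printf "%d%s", count[i], (i < n ? OFS : "\n") # one count per input column, same width as the source
}
' source.txt > target.txt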

Something like that anywho... just pulling things out of thin air right now. Need to get on the road and spend an hour in snowbound traffic to get into work. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: India

Post by anbu »

Source --> Transformer1 --> Pivot Enterprise --> Transformer2 --> Aggregator --> Target

Transformer1: Create a derivation like this for each of your fields, hard-coding that field's own name in the Else branch:
If NullToEmpty(Field1) = '' Then '' Else 'Field1'

Pivot Enterprise: Do horizontal pivoting with a Derivation listing all of your fields. The output of the pivot will be a single field.

Transformer2: Remove the null/empty rows.

Aggregator: Group by the field generated from the Pivot and count the rows in each group.
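
For anyone who wants to see the end-to-end shape of that flow without building the job, here is a rough awk analogue (an illustrative sketch only, assuming tab-delimited input with a header row, reading from stdin); it produces the same vertical, one-row-per-field-name output the Aggregator would:

Code:

awk -F'\t' '
NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }   # header names play the role of the hard-coded Else values
{ for (i = 1; i <= NF; i++) if ($i != "") cnt[name[i]]++ } # pivot to (name, value), drop empties, count
END { for (f in cnt) print f "\t" cnt[f] }                 # one row per field name with its count (order not guaranteed)
'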

If your source is a sequential file, then you have to add field delimiters to pad every record out to 150 fields. If you are on a Unix server, then try something like this in the Filter option:

Code:

# pad short records with trailing delimiters so every row has exactly 150 tab-separated fields
awk -F'\t' ' NF < 150 { for (i = NF + 1; i <= 150; i++) $0 = $0 FS } 1 '
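As a quick sanity check from a shell (using 3 fields instead of 150 so the sample data from the first post fits), the same padding logic fills the short rows out to a fixed field count:

Code:

printf 'abc\t12f\tsLm\nHyd\t34\nChn\n' |
awk -F'\t' ' NF < 3 { for (i = NF + 1; i <= 3; i++) $0 = $0 FS } 1 '
# every output line now carries exactly two tab delimiters, i.e. three fields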
You are the creator of your destiny - Swami Vivekananda