Variable column numbers (metadata) during File Import

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Vikas Jain
Participant
Posts: 15
Joined: Tue Dec 13, 2005 12:38 am

Variable column numbers (metadata) during File Import

Post by Vikas Jain »

Hi all,
I have to import a file with an unknown number of columns in a parallel job. By an unknown number of columns, I mean that the metadata for the same file may differ from one run to the next. If I use a Sequential File stage with RCP enabled and define only one column, all the other columns are rejected. If I define the maximum possible number of columns (say 60), the whole record is rejected whenever it contains fewer than 60. I am not sure whether any of the other file stages can handle this either.
Is there any way I can get this working?
I also read a couple of posts in the forum but could not find a solution. Kindly help if you have any approach.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Caveat: I only have a DS 7.5.x Server perspective.

One approach: read the entire line as a single long string, then use a loop plus the Field function to extract each delimited field, exiting the loop when no fields remain.
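
A minimal sketch of that first approach, assuming a Server BASIC routine where the whole record has been read into a single string called InRec and the delimiter is a tilde (both are assumptions for illustration, not from the original post):

Code:

* Sketch only: walk the delimited fields of one long input string.
* InRec and the "~" delimiter are assumed names/values for illustration.
Delim = "~"
NumFields = Dcount(InRec, Delim)
For i = 1 To NumFields
   ThisField = Field(InRec, Delim, i)
   * ... per-column processing for ThisField goes here ...
Next i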

Second approach: use a sed or awk script to normalize your file so that every record has the same (maximum) number of columns. If your delimiter is a tilde, the snippet below prints the number of tildes in each of the first 5 lines of a file; adjust as necessary.

Code:

for f in "$@"
do
   # print the file name, then the tilde count for each of its first 5 lines
   printf '%s\n' "$f"
   head -5 "$f" | awk -F'~' '{print NF-1}'
done
Then append enough tildes to each short record to pad it out to the desired number of columns.
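
As a hedged illustration of that padding step (the target of 60 columns and the file names are placeholders, not from the original post):

Code:

# Append tildes to each record until it has maxf fields.
# maxf=60, input.txt and padded.txt are placeholder values; adjust to
# the column count found with the counting snippet above.
awk -F'~' -v maxf=60 '{
   out = $0
   for (i = NF; i < maxf; i++) out = out "~"
   print out
}' input.txt > padded.txt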

HTH...
Some people are the Michael Jordans of DataStage. I'm more like Muggsy Bogues :-)