Page 1 of 1

how to count the number of delimiters in a reord

Posted: Tue Dec 12, 2006 4:10 pm
by kollurianu
Hi All,

iam getting a pipe delimited file.

iam getting warnings saying that too many columns in a record , looks like the some of the field values are having pipes , so inorder to eliminate those records , i want pass the record only when it has right number of delimiters , so inorder to check the number of delimiters in the record , what is better way to do this?

i was guessing DCOUNT function might be helpful , but not as it would count two consequtive delimiters as one delimiter.

any inputs , most welcome.

thank you all.

Posted: Tue Dec 12, 2006 4:14 pm
by kollurianu
i just re read the fundtionality of DCOUNT , i guess that is going to help.

Thank you all.

Posted: Tue Dec 12, 2006 4:37 pm
by DSguru2B
Yes. Count and DCOUNT both will work. You need to read your row as a single column and something like DCOUNT(in.Link,"|") = x in contraint where x is the number of pipes your looking for.

Posted: Tue Dec 12, 2006 7:30 pm
by chalasaniamith
first question back to you
is it a fixed length file or a variable length file?

if its a fixed width file you can get the length of each column right.Then check the length of the column and check for | and replace by space or some thing default value you want.
then you wont have any problem

let me know if it works

Re: how to count the number of delimiters in a reord

Posted: Tue Dec 12, 2006 7:35 pm
by I_Server_Whale
kollurianu wrote:Hi All,

iam getting a pipe delimited file.

Posted: Tue Dec 12, 2006 7:39 pm
by chalasaniamith
sorry about that

Posted: Tue Dec 12, 2006 8:31 pm
by DSguru2B
A pipe delimited would automatically mean that its not fixed. A fixed width file has no delimiter, just that every column's length is defined and the values are picked based upon that column length.

Use AWK

Posted: Wed Dec 13, 2006 12:35 am
by Krishna Nair
This is the ideal scenario to go for Unix tools like AWK scripting.
This can be easily done in awk.

Posted: Wed Dec 13, 2006 12:49 am
by chulett
... and easily done in DataStage as well.

Posted: Wed Dec 13, 2006 1:12 am
by ray.wurlod
More easily done in DataStage for those who don't know or don't have awk.

Posted: Wed Dec 13, 2006 7:49 am
by chulett
Exactly. One tool to rule them all... :wink:

Posted: Fri Dec 15, 2006 4:08 am
by aakashahuja
Since this is a troubleshooting activity, so u can use

awk < YOUR_FILE_PATH_AND_NAME ' BEGIN {FS="<YOUR_DELIMITER_CHAR>"} {print NR " : " NF+1}'

The above will give you the line number followed by its lengths. This can be cumbersome if your file is too big. IN that case you can simply use

awk < YOUR_FILE_PATH_AND_NAME ' BEGIN {FS="<YOUR_DELIMITER_CHAR>"} {print NF+1}'
# The above will give you the culprit length. Lets says it is BadLen

then use

awk < YOUR_FILE_PATH_AND_NAME ' BEGIN {FS="<YOUR_DELIMITER_CHAR>"} {print NF+1}' | grep -n BadLen

This will give you the line number which has less or more number of
columns.

Hope this helps.

Cheers

Posted: Fri Dec 15, 2006 6:50 am
by DSguru2B
Once again. Stick to DataStage. You will have much lesses pieces to bind and maintain.

Posted: Sat Dec 16, 2006 6:40 pm
by Raghavendra
As your file is a piped delimited file you will be knowing the number of pipes (delimiters) per record. That is; you might know the number of columns present in the record.You can use an Index function to check the number of delimiters here.

Check whether you can use for pipe symbol.

Index function syntax is Index(string,substring,instance)

Posted: Sat Dec 16, 2006 6:44 pm
by ray.wurlod
Index() is the wrong function. It reports the character position in the string of the nth occurrence of the substring.

Count() or DCount() is the correct function for counting instances, or fields delimited by instances of a delimiter, respectively.