Sequential files, filter command and \000 characters

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Sequential files, filter command and \000 characters

Post by ArndW »

I have a fixed width sequential file that contains \000 (null) characters coming from a mainframe. From UNIX, I can execute

Code: Select all

cat TestFile.txt | tr '\000' '?'
and it successfully replaces all nulls with the question mark character. But I cannot get that to work from the filter command in the sequential file stage.

If I use a sequential file stage without a filter on that fixed width file it reads the file correctly. If I put in a filter command that merely echos the input ("cat -") it also works correctly.

But if I put in the tr command "tr '\000' '?'" as listed above, it doesn't work: the null characters are stripped out of the string, the job gives warnings due to short reads on the fixed-length records, and the records containing nulls are skipped.

If I change the command to replace all spaces with question marks (using \040 instead of \000) then the tr command replaces spaces correctly but the nulls are still automagically removed. This tells me that the problem isn't with quote characters and backslashes and 'escaping' strings in the filter stage.

The files in question are large, so I cannot easily shell out to make a cleansed temporary copy, and the incoming data format is fixed. I need to handle the nulls (and other non-displayable characters) outside of DataStage because, in addition to the tr command, I also need to remove the last trailing character using the

Code: Select all

sed '$s/.$//'
but the sed command strips out \000 and I cannot find a way to change that behaviour on sed.
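
A quick way to see what a given sed does with NUL bytes is to push a small sample through it and inspect the bytes with od; this is only a diagnostic sketch, and the result will vary by platform and sed implementation:

Code: Select all

printf 'ab\000cd\n' | sed '$s/.$//' | od -c
If a \0 shows up in the od output, that sed preserves NULs; if not, it is stripping them as described above.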

I don't think this problem is restricted to V8, and I was curious whether anyone has any ideas on what else I might try. Will Perl allow character substitutions?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Try enclosing your cat and tr commands in a shell script and let the script spit out the result to stdout. Call that script in the filter command and see if that does the trick.
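
A minimal sketch of what such a wrapper might look like, assuming the filter command can call the script by its full path (the script name here is made up):

Code: Select all

#!/bin/ksh
# nullfix.sh - read the fixed-width file from stdin, replace NUL bytes with '?', write to stdout
cat - | tr '\000' '?'
The filter command would then just be the path to nullfix.sh.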
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Now it gets even stranger. If I do a "cat testfile.txt | tr '\000' '?' > testfile2.txt" on a file from my default shell, sh, or from a ksh, it works as expected: the output file has the same length but with the \000 characters replaced. DataStage here is set up so that the default shell is ksh, and I've verified that it opens a ksh session for external commands. But if I write a DS routine and use either DSExecute('UNIX'...) or DataStage TCL to shell out to UNIX and issue the same tr command as above, the nulls are stripped out. I cannot for the life of me see a difference. It does seem to be something to do with UNIX as opposed to DataStage, though.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

It's tough when the gurus themselves have a problem. I wish I could help. :roll:
gateleys
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I wrote a simple script which does the tr command. When I call the script from my shell session it works correctly. When I call the same script from DataStage (either from a Job Sequence or from the Before-Job tab in a job) it doesn't work the same way and all nulls are stripped out. I have not been able to reproduce the null stripping from the command line. Can anyone think of an environment setting that might affect this behaviour?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Here is an easy example, if anyone here has a minute or two to see whether they have the same problem (perhaps someone not on V8 or AIX):

1. From your UNIX command line enter

Code: Select all

echo "\000" | tr '\000' '?' | wc
This should give an output of 1 1 2 (1 line, 1 word, and 2 characters, since echo appends a terminating LF character).

2. Write a dummy job and, in the before-job tab, enter the exact same command using ExecSH.

In my case I get an output of 1 0 1 when executed from DataStage, because the null character is dropped. This also happens if the command is in a script called from DataStage. Can anyone reproduce this on their system?

Many thanks!
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

I have tried it. In my case both outputs are the same:

Code: Select all

BeforeJob (ExecSH): Executed command: echo "\000" | tr '\000' '?' | wc
*** Output from command was: ***
1 1 2
I am on HP-UX and DataStage 7.5.1.
Hope it helps.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Maveric - Many thanks for your help!!!

It does narrow down the problem: it is no longer a general issue common to all implementations, so now we'll look at OS- and/or version-related settings.

It would be great if someone could try just the Job portion on AIX and 7.5x - Aakash, are you around?
dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

Post by dohertys »

Hi Arnd! I don't think Aakash is around today.

I get...

: Executed command: echo "\000" | tr '\000' '?' | wc
*** Output from command was: ***
1 1 2
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Thanks Steve!!!

Your result means that it isn't an AIX issue (unless this box has some odd configuration setup); so it looks like it is a V8 issue. Now, in order for DSXchange to do much of IBM Support's work, we need one more volunteer from a Version 8 system on some other platform :)

I've opened up a support call for this and thanks to your help can add this information to the ticket!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It seems to be related to the way that DataStage calls up ksh in Version 8. If, from my shell, I execute the following two commands, I see that the second one doesn't work as expected. I have no idea why the two commands run differently.

Code: Select all

echo "\000" | tr '\000' '?' | wc
ksh -c echo "\000" | tr '\000' '?' | wc
I wonder if this is restricted to AIX? Is anyone willing to experiment on their system and post their results?
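
One thing worth noting about that second line: only echo is actually passed to ksh as the -c command string ("\000" becomes the sub-shell's $0), so the rest of the pipeline sees an empty line rather than a NUL. A form that keeps the whole echo inside the sub-shell (assuming that shell's echo expands \000 the same way) would be:

Code: Select all

ksh -c 'echo "\000"' | tr '\000' '?' | wc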
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

IBM Support has unraveled the mystery.
In DS 7.5.2, the following tr command is being called:
/usr/bin/tr

In DS 8.0.1, the following tr command is being called:
/usr/ucb/tr
It seems that the /usr/ucb directory is being prepended to the PATH, and the tr command in that directory acts differently from the default one. Explicitly setting /usr/bin/tr in the filter command solved the problem.
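
For anyone wondering what the corrected filter ends up looking like, here is a sketch that chains the full-path tr with the trailing-character sed step from the first post (the exact combination is an assumption):

Code: Select all

/usr/bin/tr '\000' '?' | sed '$s/.$//'
Since tr runs first and replaces every NUL with '?', the sed step no longer sees any \000 bytes, so its NUL-stripping behaviour stops being an issue.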
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Hey ArndW,

Assuming you got this figured out with your last post, what did you end up using for your final code? I am having a similar issue but my files contain all kinds of non-printing characters. I'm just wondering if I can learn from your example.

Thanks!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Very simple replacement string if the system has Perl

Post by jdmiceli »

Hi all,

I know this is kind of old and already resolved, but just in case anyone is interested: if the OS you are on has Perl installed (standard on most Unix/Linux installations), then the following command can be run as a before/after command (as a RunSH) wherever you need it in your process:

Code: Select all

perl -ple 's/\000/\|/g;s/[[:^print:]]//g' -i filename
What this does is replace any instance of a null character (\000) with a pipe; you can of course change that to anything you want if you use a different delimiter. The second statement takes any other non-printing characters and simply removes them. The '-i' makes the script work on the file in place (so it is destructive in that respect). If you would prefer it not to edit in place and to write another file instead, then modify the code as follows:

Code: Select all

perl -ple 's/\000/\|/g;s/[[:^print:]]//g' filename > newfilename
I don't know if this will help anyone, but a one-liner fits in well with simple things like this. Standard parameter rules apply, by the way, so you should be able to control the filenames as you need to.
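
For example, a before/after ExecSH (or RunSH) call could take the file name from a job parameter; the parameter name here is hypothetical:

Code: Select all

perl -ple 's/\000/\|/g;s/[[:^print:]]//g' -i #SourceFile#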

Hope this helps!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"