Reading in a binary file

Posted: Mon Apr 17, 2006 9:26 am
by EJRoufs
For my first time ever using DataStage, I am trying to read in a binary file. Is there an easy way to do that? I'm guessing (hoping) there is, but still looking for the way. So thought I'd throw this question out there while I continue my search.

Thanks for any info! :)

Posted: Mon Apr 17, 2006 9:45 am
by kcbland
Define "binary". Are you talking about an EBCDIC file? See the CFF stage.

Remember, a Server Sequential stage expects to either read a delimited row ending with a line terminator (CR/LF, LF, user-defined) or a fixed width row with optional line terminator (read to line terminator or via set block size).

Reading a binary file, you're probably not dealing with "rows" but blocks of characters. If you can read N chars at a time and deal with them, then you can use the Sequential stage. However, you get into situations where there are character values that are reserved by DS, and you run a risk in doing so.

If your data cannot be mapped into rows with columns, please clarify what you need to do.
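
As an illustration of the "read N chars at a time" idea outside DataStage, here is a minimal Python sketch. The block size of 4 and the helper name `read_blocks` are assumptions for the demo only; nothing in this thread says what the real record length is.

```python
import io

def read_blocks(stream, block_size):
    """Yield successive fixed-size blocks; the last one may be short."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block

# Demo on an in-memory stream. A real file would be opened with open(path, "rb").
sample = io.BytesIO(bytes(range(10)))
blocks = list(read_blocks(sample, 4))
print([b.hex() for b in blocks])
```

Opening the file in binary mode matters here: no byte value is treated as a line terminator or reserved character, which sidesteps the risk described above.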

Posted: Mon Apr 17, 2006 9:49 am
by EJRoufs
Nope, not EBCDIC. I've read in EBCDIC files from the mainframe before with no problems... I just convert it to ASCII with a Transformer stage. This is different. They are sending me a server file (ASCII), but they said it was a Binary file, so it basically looks like garbage to me.

Posted: Mon Apr 17, 2006 10:01 am
by kcbland
What's the structure of the binary file, did they tell you how to read it?

Posted: Mon Apr 17, 2006 10:05 am
by EJRoufs
Nope. That's normal with a lot of the files I get. I just have to start taking a look at them and kind of figure it out.... basically, I'm "winging" it. :) And this one is no different, except that I can't tell what anything is, because it looks like garbage. ;)

Posted: Mon Apr 17, 2006 10:10 am
by kcbland
Well, how do you know what constitutes a "row" of data, when it begins and ends, what attributes make up a column and meaning, etc? Sounds like you need to crack a few heads. :x

You could invent a Karnac (sp?) stage that figures out the answer without knowing the question. Sounds like a Douglas Adams story to me.

Posted: Mon Apr 17, 2006 10:18 am
by EJRoufs
I am usually lucky enough that most of our data starts out with a Production Date or such, so if I can find those dates, I can usually determine row sizes by that.

The Karnac Stage definitely would be useful. Could make millions! :)
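
The date trick mentioned above can be sketched roughly as follows in Python. This assumes the dates are stored as ASCII text in YYYYMMDD form, which is only a guess about this file; the pattern and the sample records are invented for the demo.

```python
import re
from collections import Counter

# Assumed format: ASCII dates like 20060417 at the start of each record.
DATE_RE = re.compile(rb"20\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])")

def guess_record_length(data):
    """Return the most common distance between date-like matches, or None."""
    offsets = [m.start() for m in DATE_RE.finditer(data)]
    gaps = Counter(b - a for a, b in zip(offsets, offsets[1:]))
    return gaps.most_common(1)[0][0] if gaps else None

# Made-up data: an 8-byte date followed by 12 bytes of non-text payload.
records = [
    f"200604{17 + i}".encode("ascii") + bytes([0x00, 0x01, 0xFE, 0x7F]) * 3
    for i in range(5)
]
data = b"".join(records)
guess = guess_record_length(data)
print(guess)
```

If the dates are stored in a binary or packed form instead, this search finds nothing, which is itself a hint that the layout has to come from whoever produced the file.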

Posted: Mon Apr 17, 2006 3:03 pm
by ray.wurlod
Surely "they" produced the file using some kind of metadata or design. Get "them" to supply the file layout(s) to you.

"Binary" is OK if the data are fixed-width columns. For example, a 32-bit integer will occupy four bytes. You can declare it as Char(4) and use Oconv() to turn it into a string of ASCII decimal digits. Or a CFF stage can handle binary data directly.
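
To show what turning a four-byte field into decimal digits involves, here is a small Python sketch. It is not the Oconv() call itself. The byte order and signedness of the real file are unknown, so the sketch tries both byte orders on a made-up value.

```python
def decode_int32(raw):
    """Interpret 4 bytes as a signed integer in both byte orders."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "big": int.from_bytes(raw, "big", signed=True),
        "little": int.from_bytes(raw, "little", signed=True),
    }

raw = bytes([0x00, 0x00, 0x01, 0x00])  # hypothetical field contents
decoded = decode_int32(raw)
print(decoded)
```

Usually only one of the two readings gives plausible values across many records, which helps to pin down the byte order when nobody supplies the layout.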

Posted: Tue Apr 18, 2006 12:59 am
by ArndW
It usually helps quite a lot when you know which hardware and which language generated the file. For example, with COBOL, when you have "garbage" you can try an EBCDIC conversion to see if it makes more sense, and if that doesn't work you know that the binary is probably going to be some form of COMPutational format. If you don't see the typical binary patterns for signed numbers then you are stuck (unless you assume that the text is in some UTF-8 encoding for a dead language that some professor managed to convince the ISO committee to include in the character set).
If it is PL/1 you will have excess-128 numeric representations for integers and the first words of VarChar strings and floating points are usually easy to detect by their typical format for the exponent portion.
But the only thing to do is to try to get your source metadata - there is no Rosetta Stone stage in 7.x.
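
The "try an EBCDIC conversion and see if it makes more sense" step can be approximated with a quick Python check. This is a rough sketch: the code pages tried (`cp037`, `cp500`) and the "share of printable ASCII characters" score are assumptions chosen for illustration, not a reliable detector.

```python
def printable_ratio(text):
    """Fraction of characters that fall in the printable ASCII range."""
    if not text:
        return 0.0
    return sum(" " <= ch <= "~" for ch in text) / len(text)

def try_encodings(data, encodings=("ascii", "cp037", "cp500")):
    """Decode the bytes under each candidate encoding and score the result."""
    scores = {}
    for name in encodings:
        text = data.decode(name, errors="replace")
        scores[name] = printable_ratio(text)
    return scores

# Made-up sample: readable text encoded as EBCDIC (code page 037).
sample = "PRODUCTION DATE 20060417".encode("cp037")
scores = try_encodings(sample)
print(scores)
```

A high score for an EBCDIC code page suggests character data from a mainframe. Low scores everywhere point towards binary numeric fields, and then the source metadata is needed.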

Posted: Fri Jun 02, 2006 10:48 am
by jdmiceli
EJRoufs wrote:Nope, not EBCDIC. I've read in EBCDIC files from the mainframe before with no problems... I just convert it to ASCII with a Transformer stage. This is different. They are sending me a server file (ASCII), but they said it was a Binary file, so it basically looks like garbage to me.

Is it possible that the file is UUEncoded? It would look like garbage.
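
One quick way to test the uuencode idea is to look for the `begin <mode> <name>` header line that uuencoded files start with. Here is a hedged Python sketch using the standard `binascii` module; the file name `sample.bin` and the payload are made up for the demo.

```python
import binascii
import re

BEGIN_RE = re.compile(rb"^begin [0-7]{3,4} \S+")

def looks_uuencoded(data):
    """True if the first line looks like a uuencode 'begin' header."""
    lines = data.splitlines()
    return bool(lines) and BEGIN_RE.match(lines[0]) is not None

def uudecode_body(data):
    """Decode the lines between the 'begin' header and the 'end' line."""
    out = bytearray()
    for line in data.splitlines()[1:]:
        if line.strip() == b"end":
            break
        if not line.strip():
            continue  # skip blank lines rather than decode them
        out += binascii.a2b_uu(line)
    return bytes(out)

payload = b"hello, binary world"
encoded = b"begin 644 sample.bin\n" + binascii.b2a_uu(payload) + b"`\nend\n"
print(looks_uuencoded(encoded), uudecode_body(encoded))
```

If the header check fails, the file is probably not uuencoded and the layout question above still stands.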

Don't know if that helps,
Bestest,