Importing a sequential file and treating like a table

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

Importing a sequential file and treating like a table

Post by palmeal »

I have created a sequential file (file.txt) that contains data for 4 columns (comma separated). There is no header row in the file (i.e. no column names).

I want to treat the file like a table in either an ODBC or SYBASE_OC stage. I have imported the file using Import..Sequential File Definitions and defined the 4 columns accordingly.

I want to select data from the file and output it to another file - to keep things simple I am just trying to write out the exact same file until I get things working.

On the output tab of the SYBASE_OC stage I added file.txt to the Table Names: box and loaded the file definition on the columns tab.

So the simple query looks like this: SELECT series_id,ptfo_cd,curr_cd,catn_cd FROM file.txt;

When I hit View Data I get the following message:

GetMeasurmentData1..Sybase_OC_0: ct_results has fetched an error
SQL statement: SELECT series_id,ptfo_cd,curr_cd,catn_cd FROM series.txt
GetMeasurmentData1..Sybase_OC_0.DSLink1: DSP.Open GCI $DSP.Open error -100.

I get the same error if I select "use column derivation field".

I tried the same test on the ODBC stage and the error returned when I hit View Data was:

GetMeasurmentData1..ODBC_3.DSLink4: DSD.BCIOpenR call to SQLTables failed.
SQLSTATE=HY000, DBMS.CODE=2762
[DataStage][SQL Client][ODBC][DataDirect][ODBC Sybase Wire Protocol driver][SQL Server]The 'CREATE TABLE' command is not allowed within a multi-statement transaction in the 'tempdb' database.

This happens regardless of the transaction handling I use - the error message doesn't seem to fit the query that I am running.


Am I incorrect in the way that I am trying to use a sequential file as a table ?
If I am then how can I use a sequential file as an input table or do I use a different type of file ?
Ideally I want to join this table to other tables actually on a Sybase server.

As always, any help is appreciated :)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Importing a sequential file and treating like a table

Post by chulett »

palmeal wrote:Am I incorrect in the way that I am trying to use a sequential file as a table ?
Totally. :wink: As far as I know, only Oracle has the ability to treat a flat file as an 'external' table. In any case, they aren't close to being the same thing.
If I am then how can I use a sequential file as an input table or do I use a different type of file ? Ideally I want to join this table to other tables actually on a Sybase server.
You 'join' it either by loading it into another Sybase table and joining the two, or by putting it into a hash file lookup and doing the 'join' via the lookup constraint.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try using the ODBC driver for text files rather than the ODBC driver for Sybase. I vaguely recall, however, that one requirement to use this driver is that the text file has column headings in line 1. This may thwart you.
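If the text-file driver does turn out to require column names in line 1, a small pre-processing step can prepend them before the file is read. A minimal sketch in Python - the file layout and column names come from the original post, but `add_header` itself is illustrative, not anything DataStage provides:

```python
import csv
import io

def add_header(src_text, columns):
    """Return CSV content with a header row prepended.

    src_text: the headerless comma-separated data as one string.
    columns:  the column names to write as line 1.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)   # header row the text driver expects
    out.write(src_text)        # original data, unchanged
    return out.getvalue()
```

In practice this could run once over `file.txt` (e.g. in a before-job routine or a shell step) so the ODBC text driver sees `series_id,ptfo_cd,curr_cd,catn_cd` as its first line.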
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
palmeal

Post by palmeal »

cheers for the comments but I'm still having problems with this.

I've created an ODBC stage and connected to my Sybase server - the output (one row to keep it simple) is written to a hashed file. I have selected the hashed file using a directory path rather than an account name. This creates two files in a new directory of the name I specified in the File Name box.

I then imported the hashed file into Table Definitions using Universe File Definitions - ideally I was hoping there would be an ODBC File Definition in my import menu options. Is this what you meant to try, Ray, when you said

"Try using the ODBC driver for text files rather than the ODBC driver for Sybase"

I then tried to read this file into an ODBC stage so that I could use it to join to other Sybase tables. When I hit View Data I don't see the content of the file. Unsurprisingly, when I try to read the file using a Universe stage I can see the data, but a Universe stage will not let me join to a Sybase database defined using ODBC.

Am I missing an import option when I do Table Definitions --> Import ?

So far I have learned that I can't elegantly use Sybase stored procedures that "do something", and that I can't treat flat files as tables in joins against other Sybase tables. The option to load flat file data into a Sybase table and do the joins that way will not work, as there may be many processes running at one time populating the same table. The only thing I can see that gives me an option to continue what I am doing is hashed files, if I can get them working.

I've been told that version 7.5.1, which supports Sybase stored procedures, is not an option for me until at least May, so my options are somewhat limited. I have to use this tool to do the task in hand, so hopefully with a little more help I will get there. :lol:
ray.wurlod

Post by ray.wurlod »

No, what I meant was that you should define an ODBC data source name (DSN) that uses the ODBC driver for text files to connect to the text file in question. This is one of the drivers supplied in the branded_odbc directory; consult the help there to determine which one (I don't have access to UNIX datastage at the moment).
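For what it's worth, a DSN in odbc.ini generally looks something like the fragment below. This is purely illustrative - the actual driver library file name and the exact keys vary by branded_odbc release, so check the documentation shipped with the drivers:

```ini
; Illustrative only - driver file name and keys depend on the
; branded_odbc release installed on the DataStage server.
[TextFileSource]
Driver=/path/to/branded_odbc/lib/<text-driver>.so
Database=/data/files
```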
palmeal

Post by palmeal »

Thanks Ray - once America wakes up I'll post a question to my colleagues who look after the server.

Is there anything else that I can do with hashed files to get things moving for me in terms of connecting to sybase servers ?
ray.wurlod

Post by ray.wurlod »

Yes - stop believing that hashed files connect to anything. They won't help you connect to Sybase.

Hashed files are nothing more than an implementation mechanism for database tables, used in databases such as UniVerse, UniData, D3 and others. And, of course, the DataStage Engine.

You need either to set up an ODBC connection, or to use Sybase Open Client (the Sybase OC stage or Dynamic Relational Stage) to have DataStage connect to Sybase.

At run time, DataStage acts as a client to the Sybase database server, so you must have Sybase client software (or an ODBC driver) installed on the DataStage server machine.
chulett

Post by chulett »

But does any of this help the OP treat a flat file like a table? :?

I don't think so and assume my advice regarding how to proceed still stands - in addition to yours, of course. :wink:
palmeal

Post by palmeal »

Thanks for the continued assistance Ray/Craig - I understand, Ray, what you say about UniVerse not helping me with Sybase, but I am missing the link with the hashed files that Craig suggested. How can I use hashed files to help me out here - "putting it into a hash file lookup and do the 'join' via the lookup constraint" ?

I have ODBC and Sybase OC, and have requested the ODBC driver for text files, but as said above I need to be able to treat a file/data set as a table that I can join to other Sybase tables.

Have any other Sybase users come up against these restrictions re stored procedures/flat files as tables and if so how have you got around them ?
As said before I will not be able to upgrade to 7.5.1 until at least May - if I can get around these problems then it will save me writing my own staging system in VB.NET.
chulett

Post by chulett »

palmeal wrote:Have any other Sybase users come up against these restrictions re stored procedures/flat files as tables and if so how have you got around them ?
Please don't take this the wrong way, but these are not so much 'restrictions' as a lack of understanding on your part.

The only way you can treat this flat file as a table and truly join it with a Sybase table is to get it up into Sybase and make it another table. I really didn't understand your reason why you couldn't simply create a 'temporary' / work table somewhere out of the way in Sybase, load the contents of the flat file into it and then just join the two tables like normal. :? I do this quite a bit with Oracle to great success. However, assuming that you can't, then you are looking at a full DataStage solution.

My suggestion with the hash file is pretty basic DataStage stuff. Rather than load the flat file's contents into Sybase, load them into a hash file. Define the key fields in the hash file to match the fields you will need to join on, everything else from the file simply becomes data fields there.

Set this new hash file up as a Reference Lookup in a job and stream in the contents of the main Sybase table. A reference lookup - by definition - does an 'equi-join', but the end result is an outer join between the input stream and the contents of the hash file. Whether it's a 'left outer' or a 'right outer' depends on which side of the join you consider the hash file to sit on. :wink: The other difference is that it happens one record at a time as they flow through the job rather than in the database where all you see is the end result.

Define a constraint in the transformer performing the lookup to only pass out rows where the lookup succeeded. You have now, in essence, joined the two tables and turned the outer join into an inner join. Inside the transformer, you can also bring in any appropriate data fields from the hash file when the lookup succeeds, unless all you are doing with the sequential file data is using it to constrain the output of the join.
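A rough sketch of that reference-lookup behaviour, with the hash file standing in as a Python dict keyed on the join field. All names and values here are illustrative, not the poster's actual data; the point is how the constraint turns the outer join into an inner join:

```python
# The 'hash file': reference rows keyed on the join column.
ref = {"A": {"cat": "X"}, "B": {"cat": "Y"}}

def reference_lookup(stream, lookup, inner_only=False):
    """One-row-at-a-time lookup, like a transformer reference link.

    Without the constraint (inner_only=False) every stream row passes
    and unmatched rows get None reference fields - an outer join.
    With inner_only=True, rows whose lookup failed are dropped - the
    constraint that makes it an inner join.
    """
    out = []
    for row in stream:
        match = lookup.get(row["key"])
        if match is None and inner_only:
            continue  # constraint: only pass rows where the lookup succeeded
        out.append({**row, "cat": match["cat"] if match else None})
    return out
```

For example, streaming `[{"key": "A"}, {"key": "C"}]` through with `inner_only=True` emits only the `"A"` row, with its `cat` field brought in from the lookup.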

:!: Caveat: I don't know enough about your data and the volumes you'd be processing to know if this is a good approach or not. While it will work, it may not be the most efficient approach if your input volume is high and the output volume is a small fraction of that. If that is the case, then you are much better off letting the database do the join work for you by loading the flat file into a work table in Sybase.

I hope that helps get you 'on the path'. :wink:
palmeal

Post by palmeal »

Thanks Craig for the great attention to detail. I am not offended in any way by your comments and do acknowledge that my understanding of DataStage is not what it should be. I've been at this solidly for 3 weeks now and, as I'm training myself, I know that I will go off at a tangent now and again and show a lack of understanding. I've almost talked management into getting me some training - a bit like shutting the barn door after the horse has bolted !!

The reason I don't think I can use a temporary (not tempdb) / work table to load my flat files into is that I may be running multiple processes concurrently - it makes it hard to identify which data rows belong to which running process. I would have to add in a process id column.
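For what it's worth, the process-id idea can be sketched simply: each concurrent load tags its rows with a unique run id before inserting into the shared work table, then selects (and later deletes) only its own rows. The names below are made up for illustration:

```python
import uuid

def tag_rows(rows, run_id=None):
    """Return (rows with a run_id column added, the run_id used).

    Each concurrent load calls this once, so its rows in the shared
    work table can be told apart from every other load's rows.
    """
    run_id = run_id or uuid.uuid4().hex
    return [{**row, "run_id": run_id} for row in rows], run_id
```

The join against the work table would then include `WHERE run_id = :this_run` so each process sees only the rows it loaded.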

I'll have a play with your other suggestions to see what I come up with but the data volumes will be large for the initial population of the Warehouse (50GB) but quite small on a nightly basis so hashed files may be too slow.
chulett

Post by chulett »

palmeal wrote:I'll have a play with your other suggestions to see what I come up with but the data volumes will be large for the initial population of the Warehouse (50GB) but quite small on a nightly basis so hashed files may be too slow.
Cool. Certainly couldn't hurt to 'play' and at the very least expand your knowledge of DataStage. Also, you might be surprised how fast certain job designs can be. :wink:

I've got one right now that does something similar because the join performance in my source database can be horrendous at times. So I put a small bit of information into a hash and then stream in upwards of 10 million rows a night from the source. It's a pretty simple query, and records generally run at 8,000 to 10,000 rows per second through the job, including the lookup to the (preloaded to memory) hash.

Have fun! :lol: