Joining two sequential files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

easwaran
Participant
Posts: 7
Joined: Mon Dec 30, 2002 9:22 am
Location: USA

Joining two sequential files

Post by easwaran »

Dear all,
I have two sequential files that share a common column as a key. Can a join stage be used to join these two files and load the extracted data into an Oracle database?
Also, can I join a sequential file to an Oracle table and process it further?
Any help will be appreciated.

--Easwaran
DSGuru
Participant
Posts: 16
Joined: Fri Feb 15, 2002 4:46 pm

Post by DSGuru »

Yes, you can.
Use the Merge stage to merge the two sequential files and send the output to a Transformer, then use an OCI lookup to look up data from Oracle. The output of the Transformer can then insert the data into the Oracle table.

Good luck.

Leon
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Do you want to join the files or merge them?

As DSGuru mentioned, you can merge the two files to form a union data stream. However, if you want to join the two files on a shared key then you will need to load one of them into a hash file first. Sequential files cannot be used as lookups in DataStage jobs.

Your job may look like this:

Seq File 1 ----> Hash File 1
Oracle     ----> Hash File 2

                 Hash File 1   Hash File 2
                      |             |
Seq File 2 ----> Transformer ----> Oracle

The job starts by loading the two hash files: one holds the sequential-file lookup data, the other the Oracle lookup data. The second sequential file is then processed through the Transformer and written to Oracle.
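The hash-file lookup Vincent describes behaves like a keyed in-memory map: load the lookup file once, then probe it row by row as the driving file streams through the Transformer. A minimal Python sketch of the idea (the file contents and column layout here are invented for illustration, not taken from the thread):

```python
import csv
import io

# Lookup data that would be loaded into Hash File 1 (assumed layout: key, description)
lookup_csv = "1,apples\n2,oranges\n"
# Driving sequential file, Seq File 2 (assumed layout: key, quantity)
stream_csv = "1,10\n2,20\n3,30\n"

# Build the "hash file": an in-memory map keyed on the join column
hash_file = {row[0]: row[1] for row in csv.reader(io.StringIO(lookup_csv))}

# Process the driving file row by row, as the Transformer would,
# dropping rows where the lookup fails (an inner-join constraint)
joined = []
for key, qty in csv.reader(io.StringIO(stream_csv)):
    if key in hash_file:
        joined.append((key, hash_file[key], qty))

print(joined)   # key 3 has no match, so it is excluded
```

The point of the pattern is that only one file needs to fit in the lookup structure; the other can be arbitrarily large and is processed as a stream.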

regards
Vincent
easwaran
Participant
Posts: 7
Joined: Mon Dec 30, 2002 9:22 am
Location: USA

Post by easwaran »

Thanks Leon and Vincent. I was looking for the answer given by Vincent.
Since there is a size limit on hash files, I have to join the two flat files on the key column. If that is not possible, can I load one into an Oracle staging table and use the flat file as a lookup, or do I have to load both files into Oracle staging tables?
One more question: I have three hash files; c1 is the key in hash1, c1 and c3 are the keys in hash2, and c3 is the key in hash3.
Can I link all three to a single Transformer and join them?

Thanks in advance.

--easwaran
DSGuru
Participant
Posts: 16
Joined: Fri Feb 15, 2002 4:46 pm

Post by DSGuru »

Many people mistake the Merge stage for a simple append stage, but it is more than that. It supports many join types: inner join, complete set, right outer join, left outer join, and so on; these joins require one or more common keys. The Merge stage takes exactly two sequential files as input, and joining them is a little tricky because you need to include all the keys from both files in your output. That's why I suggested Merge -> Transformer -> OCI, so you can drop the unwanted columns in the Transformer.

To get a union data stream (like SQL1 UNION ALL SQL2 in Oracle), you simply append sequential files. For example, to include all rows from file1 and file2, use two Sequential File stages: the first stage is the file you want to append from, and the second is the file you want to append to. So if you want file2 appended to file1, the first stage is file2 and the second stage is file1; in the second stage select the "Append to" option. That gives you one data stream: file1(result) = file1 + file2. It's not the case you're looking for, though.
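The "Append to" behaviour Leon describes (a union-all stream, no keys or matching involved) can be sketched in Python using file append mode; the file name and row contents are assumptions for the example:

```python
# Appending file2 to file1 gives one union-all stream: file1(result) = file1 + file2.
file1_rows = ["r1\n", "r2\n"]   # invented sample rows
file2_rows = ["r3\n"]

with open("file1.txt", "w") as f:       # first write file1's own rows
    f.writelines(file1_rows)
with open("file1.txt", "a") as f:       # append mode, like the "Append to" option
    f.writelines(file2_rows)

with open("file1.txt") as f:
    result = f.readlines()
print(result)   # all rows from both files, in order, duplicates kept
```

Unlike a keyed join, every row from both inputs survives, which is why this is the wrong tool when a common key should control the match.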

About your situation: do you want all rows from file2 with lookup fields from file1 and file3, or an inner join of all three files?
If you want all rows from file2, then for simplicity just load file2 into an Oracle staging table with the truncate-and-insert option, then load file1 and file3 into the same table with the update option.
If you want to inner join the three files, then hash two of them (file1 and file3), take file2 into a Transformer, and look up against those hash files. This also works if you want all rows from file2, with no constraint; for an inner join of the three files you need to add a constraint that excludes rows where a lookup fails.
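The three-file inner join in the last paragraph, with a constraint that drops rows on lookup failure, can be sketched in Python with dictionaries standing in for the two hash files (keys and values are invented for illustration):

```python
# file1 is hashed on c1, file3 on c3; file2 drives the job and carries both keys.
file1 = {"a": "f1-row"}                         # hash lookup keyed on c1
file3 = {"x": "f3-row", "y": "f3-row2"}         # hash lookup keyed on c3
file2 = [("a", "x"), ("a", "z"), ("b", "y")]    # driving rows of (c1, c3)

inner = []
for c1, c3 in file2:
    hit1, hit3 = file1.get(c1), file3.get(c3)
    # Constraint: keep the row only if BOTH lookups succeed (inner join);
    # removing this test gives "all rows from file2" instead.
    if hit1 is not None and hit3 is not None:
        inner.append((c1, c3, hit1, hit3))

print(inner)   # only ("a", "x") matches in both lookups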

Good Luck.

Leon
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Hey easwaran, I'm still a bit confused but I'll give it a go.

Firstly, you can have as many hash file lookups as you want in a single transform. They can all use different keys. In your transform you'll see each hash file on the left hand side of the diagram, just drag your fields from your input file onto each of your hash file lookup key fields.
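The multiple-lookups-with-different-keys arrangement can be illustrated with dictionaries standing in for the three hash files; the key columns c1 and c3 follow easwaran's question, but the data values are made up. Note that hash2's composite key is just a tuple of both columns:

```python
# Three lookups with different key shapes, probed from one "Transformer":
hash1 = {"k1": "h1"}              # keyed on c1
hash2 = {("k1", "k9"): "h2"}      # keyed on (c1, c3) - a composite key
hash3 = {"k9": "h3"}              # keyed on c3

def transform(row):
    # Each lookup gets its own key expression built from the same input row,
    # like dragging input fields onto each hash file's key fields.
    c1, c3 = row["c1"], row["c3"]
    return (hash1.get(c1), hash2.get((c1, c3)), hash3.get(c3))

print(transform({"c1": "k1", "c3": "k9"}))
```

Each lookup is independent, which is why a single Transformer can drive any number of hash files with unrelated keys.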

You can bulk load your sequential files into Oracle staging tables and then join them together in a SQL select statement. Since at least one of your lookups is already in Oracle you may find this faster than creating hash file lookups. You will need indexes on those tables to optimise the join.
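The staging-table approach can be sketched with SQLite standing in for Oracle so the example is self-contained; the table names, column names, and sample rows are all assumptions:

```python
import sqlite3

# Two "staging tables" loaded from the two sequential files
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stage_file1 (c1 TEXT PRIMARY KEY, val1 TEXT)")
con.execute("CREATE TABLE stage_file2 (c1 TEXT, val2 TEXT)")
con.executemany("INSERT INTO stage_file1 VALUES (?, ?)", [("a", "x"), ("b", "y")])
con.executemany("INSERT INTO stage_file2 VALUES (?, ?)", [("a", "10"), ("c", "30")])

# Index the join column, as Vincent suggests, then join in SQL
con.execute("CREATE INDEX ix_f2_c1 ON stage_file2 (c1)")
rows = con.execute(
    "SELECT f1.c1, f1.val1, f2.val2 "
    "FROM stage_file1 f1 JOIN stage_file2 f2 ON f1.c1 = f2.c1"
).fetchall()
print(rows)   # only the key present in both tables survives the inner join
```

The database does the keyed matching, so neither file has to fit in a hash structure; the trade-off is the extra load step and the need for indexes on the join columns.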

Regarding the "size limit in Hash files": a standard hash file can hold up to 2GB of data. If you need a bigger hash file, search the forum archive for the subject "Hash file maximum size" and you will find Ray Wurlod's instructions on creating hash files over 2GB.

regards
Vincent
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What is the "size limit in hashed files" that is concerning you?
A hashed file can be as large as the operating system permits a file to be; depending on the OS this can be up to 9 million TB.
(The default hashed file has a maximum size of 2GB, but this is not a ceiling except on 32-bit operating systems such as Windows. But there is even a workaround, called a Distributed file, in this case.)


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
easwaran
Participant
Posts: 7
Joined: Mon Dec 30, 2002 9:22 am
Location: USA

Post by easwaran »

Thanks for all the inputs.
Ray Wurlod mentions that a hash file can be as big as 9 million TB, but on our UNIX operating system we cannot FTP files larger than 2GB; it has a 2GB limitation. We also tried on Windows 2000: if the hash file grows beyond 2GB, the hashing process stops and the file gets corrupted. Hence we have been forced to load all the data into Oracle staging tables, and then use the staging tables as the input source for further processing.
If there is a way to get distributed hash files, please explain how to do that.
One additional quick question: are Ardent DataStage and Ascential DataStage the same or different?
Thanks in advance
Easwaran
kjanes
Participant
Posts: 144
Joined: Wed Nov 06, 2002 2:16 pm

Post by kjanes »

There is a parameter in the "uvconfig" file (under DataStage/DSEngine) that by default is set to 32-bit, even when installed on UNIX/AIX.

The parameter is 64BIT_FILES. Set it to 1 to exceed the 2GB limit; we ran into this same problem. Save the changes to uvconfig, then stop and restart your DataStage server.

Kevin Janes
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Changing 64BIT_FILES is not recommended. Instead, create your hashed files with the 64BIT keyword when necessary. Don't do it for hashed files that will never be anywhere near 2GB.
Secondly, 64-bit support must exist in the operating system. Windows operating systems do NOT have 64-bit support. In most UNIX versions, 64-bit support must be explicitly enabled via a kernel setting. The fact that you cannot FTP more than 2GB may be an operating system limit, or a limit imposed by the FTP program you are using. Check with your UNIX administrator.
Third, Kevin has missed one vital step in reconfiguring; after changing uvconfig you must build a new image by executing uvregen, before re-starting DataStage.

Quick History Lesson
The company that initially released DataStage was called VMARK Software. It merged with UniData to become Ardent Software. Later Ardent sold part of its business (the non-DataStage part) to Informix while the DataStage part re-invented itself as Ascential Software.
Thus, Ardent DataStage and Ascential DataStage are the same beast.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Ardent owned DataStage; they were bought by Informix, IBM then bought the Informix database products, and the remaining half of the company became a new company called Ascential. Ascential now focuses on data integration. Ray could probably even tell you who owned DataStage before Ardent!

Why are you trying to ftp your hash files? Hash files should only exist on the DataStage server. The common process is to ftp your sequential files onto your DataStage server, load the lookup data into Hash files on that same server, then load the data files into your database. Don't ftp your hash files!



Vincent McBurney
Data Integration Services
www.intramatix.com
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Whoops, Ray beat me to it. Answered the question while I was still typing.

Vincent McBurney
Data Integration Services
www.intramatix.com
kjanes
Participant
Posts: 144
Joined: Wed Nov 06, 2002 2:16 pm

Post by kjanes »

Ray,
we ran into a situation where we were creating sequential output files that exceeded 2GB and the job would just blow up. Ascential Advanced Engineering had us change the 64BIT flag in uvconfig to fix the problem, and it has worked without any problems ever since. We hammer on DataStage with over 500 weekly jobs that process significant amounts of data in about 12 hours; we are not a small shop. We use Server and /390 and have pushed the limit on DataStage's capabilities as far as complex coding goes. In addition, we were a test site for Viper and P/E.

We have not found that setting the 64-bit switch has any adverse effects so far. Have you experienced anything different?

-KJ

Kevin Janes
kjanes
Participant
Posts: 144
Joined: Wed Nov 06, 2002 2:16 pm

Post by kjanes »

My replies are specifically about output/sequential files, not hash files. A 2GB hash file seems fairly large; can it be useful at that size in most cases? I think we shy away from hash files when they are excessively large.

-KJ

Kevin Janes
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The 64BIT_FILES flag ONLY affects hashed files (and therefore UniVerse tables).
We recommend leaving it off because the majority of hashed files (the ones DataStage creates for the Repository for example) don't have to be anywhere near 2GB, and therefore should not need to carry the overhead of processing 64-bit internal pointers unnecessarily. Of course, it WILL work, but it's unnecessary work.
I suspect Advanced Engineering changed the 64BIT_FILES flag in your case because the volumes of data you were processing meant that some hashed files were likely to exceed 2GB, and it was easier (and cheaper for you, if they're charging by the hour) to change a global setting rather than to determine which hashed files actually needed this capability.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518