Merge 2 files into 1

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you checked out the Merge stage?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How did you set it up? You don't connect your files to the stage, you reference the files directly inside the stage and only need an output link.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

chulett wrote:Have you checked out the Merge stage?
Can the Merge stage take hashed files as inputs? I think you will need to stream them out into sequential files first.

If you can guarantee that one of the hashed files will contain all of the key values that are in the other, then you could stream that hashed file out into a Transformer that uses the other hashed file as a reference link to pick up the other values from it.

If you can't guarantee that one is a strict superset or equivalent set to the other, then you will need to either stream them both out to sequential files and use the Merge stage, or stream them each out using the other as a reference lookup and then through a Link Collector, then a Sort stage, and then do duplicate removal with key change logic in a transformer.
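
For what it's worth, the second option above (two lookup passes, a collector, a sort, then key-change duplicate removal) can be sketched outside DataStage. This is only a rough Python illustration, assuming plain dicts stand in for the hashed files; the function names, column names, and sample data are all made up for the example and are not DataStage APIs.

```python
# Dicts stand in for hashed files: key -> {column name: value}.
# Every name and value below is a hypothetical stand-in for illustration.

def lookup_pass(stream, reference, ref_columns):
    """Stream one file, using the other as a reference lookup."""
    rows = []
    for key, cols in stream.items():
        found = reference.get(key, {})
        row = dict(cols)
        for name in ref_columns:
            row[name] = found.get(name)  # None when the lookup fails
        rows.append((key, row))
    return rows


def merge_both_ways(file1, cols1, file2, cols2):
    # "Link Collector": gather the rows from both lookup passes.
    collected = lookup_pass(file1, file2, cols2) + lookup_pass(file2, file1, cols1)
    # "Sort stage": bring rows with the same key next to each other.
    collected.sort(key=lambda item: item[0])
    # Key-change logic: emit a row only when the key differs from the previous row.
    output = []
    previous_key = object()  # sentinel that equals no real key
    for key, row in collected:
        if key != previous_key:
            output.append((key, row))
        previous_key = key
    return output


file1 = {"a": {"x": 1}, "b": {"x": 2}}
file2 = {"b": {"y": 20}, "c": {"y": 30}}
merged = merge_both_ways(file1, ["x"], file2, ["y"])
for key, row in merged:
    print(key, row)
```

Keys found in only one file survive with `None` in the columns that the other file would have supplied, which is the behaviour you lose with a single one-way lookup.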
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, I missed the fact that your sources were hashed files and thought they were flat files for some reason. D'oh! Should have read the body closer and not gone strictly off the subject line.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Jessie
Participant
Posts: 16
Joined: Wed Mar 07, 2007 2:11 pm

Post by Jessie »

I'm new to DataStage. Is there any detailed documentation? The Help doesn't help much.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

Jessie wrote:I'm new to DataStage. Is there any detailed documentation? The Help doesn't help much.
I seem to remember that there are two different "Merge" stages, so make sure the documentation you read is for the Server version, not the PX or EE version. The Server version specifies two input files (and is very annoying to use); the PX version takes two stream input links. I assume you are creating Server jobs as you are posting in the Server forum.

Also there's another gotcha with the Merge stage, which I've already documented in this forum. It doesn't treat its input files in the same way that the Sequential File stage does - Sequential File does Excel-style quote-doubling so the string A Rusty 6" Nail gets written out as "A Rusty 6"" Nail", whereas if this file is used as one of the inputs to the Merge stage it will remove the quotes after the 6 entirely. I think this can be fixed by removing the \ escape character in the Merge Stage dialog, but I'm not sure.
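
To make the "Excel-style quote-doubling" convention concrete, here is a small sketch using Python's standard `csv` module rather than DataStage itself. It only illustrates the file format described above; it says nothing about how the Merge stage parses its inputs, and the second field (`42`) is just filler for the example.

```python
import csv
import io

value = 'A Rusty 6" Nail'

# Write one row using the csv module's defaults (doublequote=True):
# a field containing the quote character is quoted and the embedded
# quote is doubled.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow([value, "42"])
line = buffer.getvalue().rstrip("\n")
print(line)

# A reader that follows the same convention recovers the original string.
row = next(csv.reader(io.StringIO(line)))
print(row[0])
```

Any reader that does not treat `""` inside a quoted field as one literal quote will come back with a different string, which is the kind of mismatch described above.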
Phil Hibbs | Capgemini
Technical Consultant
Jessie
Participant
Posts: 16
Joined: Wed Mar 07, 2007 2:11 pm

Post by Jessie »

Does the file need an extension like .txt when I create a sequential file?

I have two sequential files for the Merge stage, but the job doesn't pass validation. The error is 'Link property retrieval error'. Any idea?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The extension doesn't matter. Can you copy/paste the full errors you are seeing in your logs, please?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Jessie
Participant
Posts: 16
Joined: Wed Mar 07, 2007 2:11 pm

Post by Jessie »

test2..Sequential_File_2.IDENT1: DSD.StageRun Active stage starting, tracemode = 0.

test2..Merge_1: Stage Properties
> First File Path = [/home/tttt/S_al.txt]
> Second File Path = [/home/tttt/S_Med.txt]
> Working Directory = [/home/tttt/S_temp]
> Stage Trace Level = [1]

test2..Merge_1: Error opening first input file

test2..Merge_1: Link property retrieval error

Attempting to Cleanup after ABORT raised in stage test2..Merge_1

Job test2 aborted.


Is there a better way to copy the log? I did it one entry at a time.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could print the full detail log to a file first, then work from there. If you Reset the aborted job, is a 'From previous run...' message added to the log?
-craig

"You can never have too many knives" -- Logan Nine Fingers
raj158347
Participant
Posts: 26
Joined: Thu Apr 19, 2007 5:15 am
Location: Chennai

Post by raj158347 »

Hi,
You can merge two hashed files into one by writing one of them out to a sequential file and using a Transformer.

Step 1:
Write the primary hashed file out to a sequential file.

Step 2:
Use a Transformer to merge the files:
Sequential file as the input link, hashed file as the reference link.
The output is a hashed file laid out like your final file, with K1, K2, C1, C2, C3, C4, C5, C6.
K1, K2, C1, C2, C3 are taken from the sequential file.
C4, C5, C6 are taken from the hashed file reference link.


Limitation:
Any rows in the hashed file whose keys are not in the sequential file (file 1) will not be output.
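
The limitation is easier to see in a toy example. A minimal Python sketch, assuming dicts keyed on (K1, K2) stand in for the two files; the helper name and the sample values are invented for illustration and are not part of DataStage.

```python
# Rows keyed on (K1, K2). The primary file carries C1..C3,
# the reference (hashed) file carries C4..C6. All data is made up.
primary = {
    ("k1a", "k2a"): ("c1a", "c2a", "c3a"),
    ("k1b", "k2b"): ("c1b", "c2b", "c3b"),
}
reference = {
    ("k1a", "k2a"): ("c4a", "c5a", "c6a"),
    ("k1z", "k2z"): ("c4z", "c5z", "c6z"),  # key missing from primary
}


def merge_with_lookup(stream, lookup):
    """Drive the output from `stream`; pick up C4..C6 by key from `lookup`."""
    not_found = (None, None, None)
    output = {}
    for key, first_three in stream.items():
        output[key] = first_three + lookup.get(key, not_found)
    return output


result = merge_with_lookup(primary, reference)
for key, columns in result.items():
    print(key, columns)
```

The row keyed `("k1z", "k2z")` never appears in `result`, because only keys from the streamed file drive the output.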

Regards
Raj
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

              HashedFile2
                   |
                   | 
                   V
HashedFile1 ---> Transformer ---> Target
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

You don't need a sequential file as the input link. A hashed file can be used as the input link just as well, and it can do the lookup against the other hashed file.
Birendra
Jessie
Participant
Posts: 16
Joined: Wed Mar 07, 2007 2:11 pm

Post by Jessie »

ray.wurlod wrote:

              HashedFile2
                   |
                   |
                   V
HashedFile1 ---> Transformer ---> Target
...

What is this? :(
I'm too new to DataStage.
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

Jessie wrote:
ray.wurlod wrote:

              HashedFile2
                   |
                   | 
                   V
HashedFile1 ---> Transformer ---> Target
...
what is it? :(
I'm too new to Datastage,
Hello,

What Ray Wurlod is suggesting is that, if you have two hashed files and you want to merge them, you design your job like this:

I/P: Hash File 1
O/P: Target (whatever your target is)
Lookup: Hash file 2

Inside the Transformer, map the columns you want to the output link in the layout you asked for: Key1, Key2, 22, 33, 44, 55...

If the keys are the same and you want to join on them, you can do so. Otherwise, maybe you can create a dummy field (with a constant default value) as the key in the lookup hashed file, hard-code that value in the Transformer's lookup key expression, and bring all the other fields through to the output. There are a lot of ways to do it, and I am sure there are a lot of other exotic solutions.

Hope this helps...
Vivek Gadwal

Experience is what you get when you didn't get what you wanted