Merge 2 files into 1

chulett · Post by **chulett** » Mon May 12, 2008 9:23 am

Have you checked out the Merge stage?

chulett · Post by **chulett** » Mon May 12, 2008 10:01 am

How did you set it up? You don't connect your files to the stage; you reference the files directly inside the stage and only need an output link.

PhilHibbs · Post by **PhilHibbs** » Mon May 12, 2008 10:47 am

chulett wrote:Have you checked out the Merge stage?

Can the Merge stage take hashed files as inputs? I think you will need to stream them out into sequential files first.

If you can guarantee that one of the hashed files will contain all of the key values that are in the other, then you could stream that hashed file out into a Transformer that uses the other hashed file as a reference link to pick up the other values from it.

If you can't guarantee that one is a strict superset or equivalent set to the other, then you will need to either stream them both out to sequential files and use the Merge stage, or stream them each out using the other as a reference lookup and then through a Link Collector, then a Sort stage, and then do duplicate removal with key change logic in a transformer.
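As a rough illustration of that second option (each file streamed out with the other as a reference lookup, then Link Collector, Sort, and key-change duplicate removal), here is a minimal Python sketch. It is not DataStage code: the dicts standing in for the hashed files, the column names, and the helper `stream_with_lookup` are all hypothetical and only show the logic.

```python
# Hypothetical stand-ins for the two hashed files: key tuple -> column values.
hashed1 = {("K1", "A"): {"C1": 1, "C2": 2}, ("K1", "B"): {"C1": 3, "C2": 4}}
hashed2 = {("K1", "B"): {"C3": 30}, ("K1", "C"): {"C3": 40}}

ALL_COLUMNS = ("C1", "C2", "C3")  # assumed output layout


def stream_with_lookup(source, reference):
    """Yield (key, row) for every source row, enriched from the reference.

    A failed lookup leaves the reference columns as None, much like a
    reference link that finds no match.
    """
    for key, row in source.items():
        merged = {col: None for col in ALL_COLUMNS}
        merged.update(reference.get(key, {}))
        merged.update(row)
        yield key, merged


# "Link Collector": both streams combined into one.
collected = list(stream_with_lookup(hashed1, hashed2))
collected += list(stream_with_lookup(hashed2, hashed1))

# "Sort stage": order by key so duplicates become adjacent.
collected.sort(key=lambda pair: pair[0])

# "Key change" logic: keep a row only when its key differs from the previous one.
merged_rows = []
previous_key = None
for key, row in collected:
    if key != previous_key:
        merged_rows.append((key, row))
    previous_key = key

for key, row in merged_rows:
    print(key, row)
```

Keys present in both files arrive twice after the collector, once from each stream, which is why the sort and the key-change check are needed.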

chulett · Post by **chulett** » Mon May 12, 2008 11:04 am

Sorry, I missed the fact that your sources were hashed files and thought they were flat files for some reason. D'oh! Should have read the body closer and not gone strictly off the subject line.

Jessie · Post by **Jessie** » Mon May 12, 2008 11:07 am

I'm new to DataStage. Is there any detailed documentation? The Help doesn't help much.

PhilHibbs · Post by **PhilHibbs** » Mon May 12, 2008 11:15 am

Jessie wrote:I'm new to DataStage. Is there any detailed documentation? The Help doesn't help much.

I seem to remember that there are two different "Merge" stages, so make sure the documentation you read is for the Server version, not the PX or EE version. The Server version specifies two input files (and is very annoying to use); the PX version takes two stream input links. I assume you are creating Server jobs as you are posting in the Server forum.

Also there's another gotcha with the Merge stage, which I've already documented in this forum. It doesn't treat its input files in the same way that the Sequential File stage does - Sequential File does Excel-style quote-doubling so the string A Rusty 6" Nail gets written out as "A Rusty 6"" Nail", whereas if this file is used as one of the inputs to the Merge stage it will remove the quotes after the 6 entirely. I think this can be fixed by removing the \ escape character in the Merge Stage dialog, but I'm not sure.
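To make the quote-doubling convention concrete, here is a small Python sketch using the standard `csv` module, which follows the same Excel-style rule by default. It only illustrates the convention; it says nothing about how the Merge stage actually parses its input, and the last line merely mimics the symptom described above.

```python
import csv
import io

value = 'A Rusty 6" Nail'

# Write one field the Excel way: wrap it in quotes and double any embedded quote.
buffer = io.StringIO()
csv.writer(buffer).writerow([value])
written = buffer.getvalue().rstrip("\r\n")
print(written)

# A reader that understands quote-doubling recovers the original string.
read_back = next(csv.reader(io.StringIO(buffer.getvalue())))[0]
print(read_back)

# Mimic of the reported symptom only: dropping every quote loses the inch mark.
stripped = written.replace('"', "")
print(stripped)
```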

Jessie · Post by **Jessie** » Mon May 12, 2008 12:54 pm

Does a sequential file need an extension like .txt when I create it?

I have two sequential files for the 'Merge' stage, but the job cannot pass validation. The error is 'Link property retrieval error'. Any idea?

chulett · Post by **chulett** » Mon May 12, 2008 2:29 pm

The extension doesn't matter. Can you copy/paste the full errors you are seeing in your logs, please?

Jessie · Post by **Jessie** » Mon May 12, 2008 2:53 pm

test2..Sequential_File_2.IDENT1: DSD.StageRun Active stage starting, tracemode = 0.

test2..Merge_1: Stage Properties
> First File Path = [/home/tttt/S_al.txt]
> Second File Path = [/home/tttt/S_Med.txt]
> Working Directory = [/home/tttt/S_temp]
> Stage Trace Level = [1]

test2..Merge_1: Error opening first input file

test2..Merge_1: Link property retrieval error

Attempting to Cleanup after ABORT raised in stage test2..Merge_1

Job test2 aborted.

Is there a better way to copy the log? I copied the entries one by one.

Thanks

chulett · Post by **chulett** » Mon May 12, 2008 2:56 pm

You could print the full detail log to a file first, then work from there. If you Reset the aborted job, is a 'From previous run...' message added to the log?

raj158347 · Post by **raj158347** » Tue May 13, 2008 2:09 am

Hi,
You can merge two hashed files into one by converting one of them into a sequential file and using a Transformer.

Step 1:
Convert the primary hashed file into a sequential file.

Step 2:
Use a Transformer to merge the files, with the sequential file as the input and the other hashed file as the reference.
The output will be a hashed file like your final file, with K1, K2, C1, C2, C3, C4, C5, C6.
K1, K2, C1, C2, C3 should be taken from the sequential file.
C4, C5, C6 should be taken from the hashed file reference link.

Limitation:
If the hashed file contains keys that are not in the sequential file (file 1), those rows will not be output.

Regards
Raj
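The limitation Raj mentions can be seen in a small Python sketch of the lookup logic. This is not DataStage code; the sample rows and the names `seq_rows`, `hash_ref`, and `REF_COLUMNS` are made up for illustration, with only the K1, K2, C1..C6 layout taken from the post.

```python
# Hypothetical rows streamed from the sequential file (the primary input).
seq_rows = [
    {"K1": 1, "K2": "a", "C1": "x1", "C2": "x2", "C3": "x3"},
    {"K1": 2, "K2": "b", "C1": "y1", "C2": "y2", "C3": "y3"},
]

# Hypothetical hashed file used as the reference link, keyed on (K1, K2).
hash_ref = {
    (1, "a"): {"C4": "p4", "C5": "p5", "C6": "p6"},
    (3, "c"): {"C4": "q4", "C5": "q5", "C6": "q6"},
}

REF_COLUMNS = ("C4", "C5", "C6")

# Transformer logic: one output row per input row, C4..C6 from the lookup.
output = []
for row in seq_rows:
    ref = hash_ref.get((row["K1"], row["K2"]), {})
    out = dict(row)
    for col in REF_COLUMNS:
        out[col] = ref.get(col)  # None when the lookup finds nothing
    output.append(out)

# Keys that exist only in the reference never reach the output.
seen = {(r["K1"], r["K2"]) for r in seq_rows}
dropped_keys = sorted(set(hash_ref) - seen)
print(output)
print(dropped_keys)
```

Because the job is driven by the input link, key (3, "c") is never read and so never written, which is the case that needs the superset guarantee discussed earlier in the thread.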

ray.wurlod · Post by **ray.wurlod** » Tue May 13, 2008 4:55 am

              HashedFile2
                   |
                   | 
                   V
HashedFile1 ---> Transformer ---> Target

bkumar103 · Post by **bkumar103** » Tue May 13, 2008 5:26 am

You don't need a sequential file as the input link. A hashed file can be used as the input link just as well, and it can do a lookup against another hashed file.

Jessie · Post by **Jessie** » Tue May 13, 2008 3:08 pm

ray.wurlod wrote:
              HashedFile2
                   |
                   |
                   V
HashedFile1 ---> Transformer ---> Target
...

What is it?

I'm too new to DataStage.

vivekgadwal · Post by **vivekgadwal** » Tue May 13, 2008 3:49 pm

Jessie wrote:
ray.wurlod wrote:
              HashedFile2
                   |
                   | 
                   V
HashedFile1 ---> Transformer ---> Target
...
What is it?
I'm too new to DataStage.

Hello,

What Ray Wurlod is suggesting is that, if you have two hashed files and want to merge them, you design your job like this:

I/P: Hash File 1
O/P: Target (whatever your target is)
Lookup: Hash file 2

Inside the transformer, get all your desired rows and put them into the output just like the way you wanted: Key1, Key2, 22, 33, 44, 55...

If the keys are the same and you want to join them, you can do so. Otherwise, maybe you can create a dummy field (with a constant default value) in the lookup hashed file and, inside the Transformer, hard-code the field with that value and get all the other fields into the output. There are a lot of ways to do it, and I am sure there are plenty of other exotic solutions.

Hope this helps...