Capture input XML file name

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

collabxchange
Premium Member
Posts: 34
Joined: Thu Aug 28, 2014 8:48 pm
Location: United States

Post by collabxchange »

Hi,
I have a simple job with the following design:

XML Stage ---> Column Gen ---> NZ

I am reading the input file using a wildcard, e.g. customer*.txt, where the actual file name could be customer_206789_USA.txt.

Is there an easy way to capture the actual file name as a column in the design so that I can populate it in the target table?
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes. Don't use the XML Stage to read the file directly. Instead, use the External Source stage, which can run a Unix list (ls) command with whatever wildcards you want (I generally use a column called "filenames", VarChar, length 250). It will then send the list of filenames downstream, one per row. The XML Stage is set up to take those filenames as input and parse each file, and you also have the filename in a column to do whatever you need with it.
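
To make the idea concrete outside DataStage, here is a minimal Python sketch of what the External Source stage receives from an `ls` with a wildcard: one path per matching file, each of which would become one row in the single "filenames" column. The function name `list_matching_files` is hypothetical and this is not DataStage code; it only mirrors the shape of the data flowing downstream.

```python
import glob
import os
from typing import List


def list_matching_files(directory: str, pattern: str) -> List[str]:
    """Return one path per file in `directory` whose name matches a
    shell-style wildcard such as "customer*.txt".

    This mimics the output of `ls <directory>/customer*.txt`: each path
    corresponds to one row in the "filenames" column sent downstream.
    """
    # glob applies simple shell-style *, ?, [...] wildcard rules
    matches = glob.glob(os.path.join(directory, pattern))
    # ls lists names in sorted order by default; sort for a predictable order
    return sorted(matches)


if __name__ == "__main__":
    for row in list_matching_files(".", "customer*.txt"):
        print(row)  # one filename per row, like the External Source output
```

Each returned path is one document for the XML Stage to parse, and because the path travels as an ordinary column value, it is still available to map to the target table.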

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)

Post by collabxchange »

Ernie, thank you for your response. As you suggested, I changed the design to read the input XML file names with the External Source stage, but I can't figure out how to map them in the XML stage.

http://i.imgur.com/mUWL6Wo.png?1

http://i.imgur.com/oWpoeO9.png?1

http://i.imgur.com/20UtWmd.png?1

http://i.imgur.com/LLevtwo.png?2

How do I map R from the input to the output in the 3rd screenshot above? As the 4th screenshot shows, it is greyed out.

Post by eostic »

I'm not sure what you are trying to do with it. Are you trying to make it part of your final target XML document? This is all part of the science (art?) of working with the XML stage: you first need to map all the lowest-level repeating nodes to the appropriate output list (blue icon), and then bring in any "higher-level" repeating detail.

It's hard to tell just from looking at these screenshots.

Ernie

Post by collabxchange »

Basically, I am trying to do the following:

- Look for XML files matching a pattern in a directory.

- Read each file and output to 2 different links.
Output link 1: the file name and the count of records (which comes in a column in the XML file).

Output link 2: the data/records from the XML file, as per the schema file we are specifying in the Column Generator.

Thank you

Post by eostic »

I'm not sure what you mean by "number of records". There will always be only "one record" for each parseable XML document. There may, of course, be many filenames sent into the stage, one for each document.

Ernie

Post by collabxchange »

Sorry, I should have been a little clearer.

The number of records comes in a field called "count" in the footer/trailer, and that is what we want to capture.

Essentially, the 1st output will capture the file name and that one column, and the 2nd output will capture all records from the XML file.

Thanks

Post by eostic »

In your screenshots, it looks like you are trying to map "batch id" to the regular output link along with the filename. That is most likely the mismatch: batch id comes from within the XML and is not part of the same list as your filename. Try removing that mapping first and see whether you get the result you are looking for, albeit without that column.

Ernie

Post by collabxchange »

The problem is that I don't know what to match it against. The External Source outputs only 1 column, i.e. File_Name. The value in the File_Name column is the name of the file I actually want to read. Its structure can be:

Header:
Source_system
Batch_Id
...
Record/Data:
R
Footer:
Count

So what I want to do is read the XML file named in the "File_Name" column and then output ALL the above columns on output link 1 as:
Source_system Batch_Id R Count

Output link 2 will have:

File_Name Count

The XML stage is not allowing me to map R to R because it cannot find R in the input. Clear as mud? :?
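
A minimal Python sketch of the two outputs described above may help show why the stage complains. It parses a document with the header/record/footer outline and builds both links. The element names (`Batch`, `Header`, `Record`, `Footer`) and the sample values are hypothetical, taken loosely from the outline; a real file's tags and nesting come from its XSD. This is plain ElementTree, not the XML stage.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical document following the Header / Record / Footer outline.
SAMPLE = """<Batch>
  <Header><Source_system>SYS_A</Source_system><Batch_Id>1001</Batch_Id></Header>
  <Record><R>row-1</R><R>row-2</R><R>row-3</R></Record>
  <Footer><Count>3</Count></Footer>
</Batch>"""

Link1Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]
Link2Row = Tuple[str, Optional[str]]


def split_links(file_name: str, xml_text: str) -> Tuple[List[Link1Row], List[Link2Row]]:
    root = ET.fromstring(xml_text)
    # Header and footer values occur once per document.
    source_system = root.findtext("Header/Source_system")
    batch_id = root.findtext("Header/Batch_Id")
    count = root.findtext("Footer/Count")
    # Link 1: one row per repeating R, with the once-per-document values repeated.
    link1 = [(source_system, batch_id, r.text, count) for r in root.findall("Record/R")]
    # Link 2: exactly one row per document, i.e. per filename.
    link2 = [(file_name, count)]
    return link1, link2
```

For the sample, link 1 gets three rows (one per R) while link 2 gets one row for the filename. The two outputs live at different levels of repetition, and the filename input only exists at the "one per document" level, so R has nothing to map from on that list.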

Post by eostic »

I know what you "want" to do :) but first, try just having a separate link that dumps only the filename. Get that working first.

Map the input link's blue icon to the "other" output link, and make sure you can get the filename out on that link and your parsed XML on the other link.

Ernie

Post by eostic »

Just to make sure I understand your objective...

You want one link to carry forward the filename from the input link.

You want another link to carry the parsed xml.

But you also want "one" of the columns from the parsed XML to go out on the filename link, on the same row?

If that's the case, first make sure both links are working as I outlined above: one link with your filename and the other with your parsed XML. Test that and make sure it does exactly what you want. I can't emphasize this enough, because if that isn't working, you may be debugging the wrong thing.

Assuming that works, the typical problem here is that the Stage believes you are "mismatching" the occurrences of a hierarchy. Your filename is "one row", a single value, while the "column" being parsed from the XML could (potentially) be thousands of rows. I'm guessing that the column you want is high level and at the top of your document, but depending on your XSD, the Stage doesn't know that. So you have to fool it into thinking that the dimensions of the filename and the column from your XML are identical.

There are probably a lot of ways to do this, but I would probably work with the list-to-group function and change what the stage thinks about the repeatability of these entries.

Click on your xmlParser step, then go to the right side of the page and click the "output" tab. Then highlight the blue icon for your incoming link, the one with the column containing your input filenames. Then right-click and select "list-to-group".

Then add an Aggregate step after the xmlParser step. Use the desired column from your XML, probably with a scope of "top". THEN, at the output of the Aggregate step, do the same thing with the result list: highlight it and change it using the right-click option "list to group".

Now the two columns should have the same dimension. I'm not looking at the stage assembly right now, but I suspect you will then map the blue "top" icon to your general output link and can then assign both the filename column and the column from your XML that was aggregated earlier.
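
As a rough analogy only (this does not model the Assembly Editor, and "list-to-group" has no direct equivalent here), the Python sketch below shows why collapsing the parsed column to a single value lets it share a row with the filename. The helper names `collapse` and `filename_row` are hypothetical; `collapse` plays the role of an Aggregate step set to "first" or "last", which just picks up a value without transforming it.

```python
from typing import Optional, Sequence, Tuple


def collapse(values: Sequence[str], how: str = "first") -> Optional[str]:
    """Reduce a repeating list to a single value, like an Aggregate step
    set to "first" or "last": the value itself is passed through unchanged."""
    if not values:
        return None
    if how == "first":
        return values[0]
    if how == "last":
        return values[-1]
    raise ValueError(f"unsupported aggregation: {how!r}")


def filename_row(file_name: str, parsed_values: Sequence[str],
                 how: str = "first") -> Tuple[str, Optional[str]]:
    """Pair one filename (a single value) with one parsed column.

    Without collapsing, parsed_values could hold thousands of entries,
    so there would be no single value to put on the filename's row.
    """
    return (file_name, collapse(parsed_values, how))
```

For example, `filename_row("customer_206789_USA.txt", ["1001"])` gives `("customer_206789_USA.txt", "1001")`. For a header-level field that appears once per document, "first" and "last" return the same value, which is why either choice works.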

Let us know how that goes.

Ernie

Post by collabxchange »

Ernie, thank you very much. Yes, your understanding of the objective is correct. :)

The DS environment has been acting up since this morning, so I am not able to develop or test anything at the moment. I will try your recommended approach sometime over the weekend or on Monday and let you know.

Again, thank you very much for all your help.

Post by eostic »

OK, good luck. I forgot to mention: on the Aggregate step, be sure to select "first" or "last". You just want it to pick up the character value and not do anything else with it.

Ernie