Page 1 of 2

Need help to generate schema files dynamically for txt files

Posted: Sun May 31, 2015 1:07 am
by JohnFrancisK
Hi, I have a requirement like there will be set of input .txt files with same set of columns but columns order will be different in each input .txt file. I need to create one reusable job to convert the columns order into one particular column order for all the input .txt files.


For Ex:

I/P:
1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A D C H G B F E
3rd Input .txt file Columns Order is B G E H A C F D
....
Nth Input .txt file Columns Order is E H D A C F B G (in some order)

O/P:


1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A B C D E F G H
3rd Input .txt file Columns Order is A B C D E F G H
....
Nth Input .txt file Columns Order is A B C D E F G H

Can someone pls help me on this requirement?

Thanks in advance!

Posted: Sun May 31, 2015 4:01 pm
by chulett
So... I'm wondering how dynamic this all is. How will you know which files have what column layout? It is simply numbered as you've shown (the first is always X, the second always Y, etc. whatever first and second mean at any given time) or is there a naming standard that tells you? Do you need to discover it individually for each file? Something completely different? :?

Let's start there.

Posted: Sun May 31, 2015 4:52 pm
by ray.wurlod
Do the files have column headings? (If yes, that will make the task a lot easier).

Posted: Sun May 31, 2015 4:54 pm
by ray.wurlod
Do the files have column headings? (If yes, that will make the task a lot easier).

That said, I don't believe there's going to be an easy "one job" solution. You will almost certainly be up for doing some coding, either a routine or a Build stage.

Posted: Sun May 31, 2015 6:19 pm
by JohnFrancisK
Thanks for your reply...

Yes. All the input files have column headings.


Please let me know how to approach/solution on this......

Posted: Sun May 31, 2015 9:51 pm
by ray.wurlod
My solution would use two jobs - one to re-order the columns into the standard format, and the other to process that file.

The first job would read the file as a single VarChar column, including reading the heading row as data (do not check the "first line is column headings check box), and store the column headings and positions from row 0 into stage variables. Its output would use the positional information and the Field() function to direct columns appropriately onto the output.

Posted: Mon Jun 01, 2015 5:01 am
by abhilashnair
Read the file with a single column and then have a column import stage. In column import select option for schema file and have a parameter for the same.

In this way you will have a single multi instance job but you will have 'n' schema file for n files. When you invoke the job just pass the parameter for the specific schema file. You can re-use the job

Posted: Mon Jun 01, 2015 3:16 pm
by ray.wurlod
That solution would involve n! schema files to accommodate every possible schema file. 10! = 3,628,800.

Posted: Tue Jun 02, 2015 9:46 am
by tonystark622
Another solution to try would be to use the TEXT file ODBC driver and the ODBC stage. As long as you have column headers that are always the same, even if the columns aren't in the same order it should work just by doing "select a,b,c,d,e,f,g,h from text file name". I'm not sure of the exact syntax of the select for text files, but some google research should give you what you need.

I hope this helps,
Tony

Posted: Tue Jun 02, 2015 9:55 am
by chulett
That's an interesting though... treat them like a database table. 8)

Posted: Tue Jun 16, 2015 3:33 pm
by JohnFrancisK
Hi Abhilash, I tried your solution. If the input file column header order and the schema file column order are same then only its processing the input file and loading the data into output file.

Could you please help me on the details how i can reorder the columns with this approach?

Thanks in advance!

John

Posted: Tue Jun 16, 2015 3:56 pm
by JohnFrancisK
Hi Ray,

I tried your solution. i was able to read the input file as Single Varchar column and able to store the column heading and positions into stage variables.

After that i stuck to proceed further (on how to iterate through all the rows and direct column appropriately onto to the output using Field() function)....

could you please give some more insight on this?

Thanks in advance!

John

Posted: Tue Jun 16, 2015 4:37 pm
by ray.wurlod
Basically it's a whole heap of If..Then..Else expressions that test the field number associated with the specific column heading and, where appropriate, use the Field() function to extract that particular substring out of the single VarChar input column.

Posted: Tue Jun 16, 2015 4:40 pm
by JohnFrancisK
Sure Ray. I will try the same.

Thanks,
John :)

Posted: Tue Jun 16, 2015 8:52 pm
by chulett
John, a couple of things.

Use the Reply to topic link rather than quoting everything from the post you are replying to using Reply with quote unless it makes sense to do so. Also it doesn't make a whole lot of sense to use BBCode tags like for quoting, bold, italics and other things and then check the option to "Disable BBCode in this post"... pretty much defeats the purpose. :wink: