Best way to search through a DataSet

Posted: Fri Jun 07, 2013 7:21 pm
by JPalatianos
Hi,
I have been asked by our development team whether there is an alternate/better way to search through a Data Set. I originally pointed them to the Data Set Management utility, and they came back with "searching through millions of rows would take hours with the limited row display". Besides dumping to a text file or a staging table, is there a way to easily query a Data Set for debugging purposes?

We are running version 8.0.1 on Windows and are in the process of upgrading to 8.7 on Windows.

Thanks - - John

Posted: Sat Jun 08, 2013 12:53 am
by ray.wurlod
Short answer: no.

Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.

Posted: Wed Jun 12, 2013 10:42 pm
by SURA
What Ray said is the best way. You can also use the orchadmin command to dump the data into a text file, then open it in Excel or use grep (MKS Toolkit) to find the value you are looking for in that text file.

Again, it depends on your data volume, so you need to decide.
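
For the text-file route, a minimal Python sketch of the search step might look like the following. It assumes the Data Set has already been dumped to a delimited text file; the `|` delimiter, the column position and the function name `search_dump` are illustrative assumptions, not anything orchadmin guarantees. It reads one row at a time, so it should cope with millions of rows without loading the file into memory.

```python
import csv
from typing import Iterable, Iterator, List


def search_dump(lines: Iterable[str], column: int, wanted: str,
                delimiter: str = "|") -> Iterator[List[str]]:
    """Yield each row whose field at index `column` equals `wanted`.

    `lines` can be an open text file or any iterable of strings, so rows
    are processed one at a time rather than read in all at once.
    The delimiter is an assumption; match it to how the dump was written.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        # Skip short or blank rows instead of raising IndexError.
        if column < len(row) and row[column].strip() == wanted:
            yield row


if __name__ == "__main__":
    # Small in-memory sample standing in for a dumped text file.
    sample = ["1|SMITH|NY", "2|JONES|NJ", "3|SMITH|CT"]
    for row in search_dump(sample, column=1, wanted="SMITH"):
        print(row)
```

To search a real dump, pass an open file instead of the sample list, for example `open("dump.txt", newline="")` (a hypothetical file name).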

Posted: Thu Jun 13, 2013 5:40 am
by miwinter
Use the debugger: set a breakpoint on the Data Set's output link, with a condition equivalent to your search on the data you're interested in.

Posted: Wed Jul 10, 2013 1:41 pm
by JPalatianos
I appreciate all the suggestions!!

Posted: Wed Jul 10, 2013 2:40 pm
by rameshrr3
I vote hands down for orchadmin with the dump option, piped to grep. The Data Set Management utility does not scale. If you are on one of the newer versions, you can also use the debugger.
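
Where grep is not installed (for example on a Windows engine without MKS Toolkit), the same pipe-and-filter idea could be sketched in Python. This is only an illustration: the function name `grep_command_output` is made up here, and the orchadmin command line shown in the comment is an assumption to be checked against your version. The demo runs a stand-in command so the sketch works on any machine.

```python
import re
import subprocess
import sys
from typing import Iterator, List


def grep_command_output(command: List[str], pattern: str) -> Iterator[str]:
    """Run `command` and lazily yield the stdout lines that match `pattern`.

    Output is read line by line from a pipe, so nothing is written to disk
    and the full dump is never held in memory.
    """
    regex = re.compile(pattern)
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            if regex.search(line):
                yield line.rstrip("\n")
        # Surface a failed command instead of silently returning no matches.
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, command)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On a DataStage engine
    # host the command might instead be something like
    # ["orchadmin", "dump", "/path/to/data.ds"] (hypothetical path).
    demo = [sys.executable, "-c", "print('1 SMITH'); print('2 JONES')"]
    for hit in grep_command_output(demo, r"SMITH"):
        print(hit)
```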

Posted: Thu Jul 11, 2013 1:14 am
by sendmkpk
ray.wurlod wrote:Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.
So, Ray, did you mean that we could write the Data Set using one configuration file and read it with another? How is that possible?

reg
praveen

Posted: Thu Jul 11, 2013 3:04 am
by ray.wurlod
Yes, that's what I'm saying. A copy of the configuration file used to write the Data Set is stored in its descriptor file, and this can be used to read the Data Set (the data then have to be automatically repartitioned onto the nodes of the currently active configuration file). DataStage looks after that for you. If you prefer to use the orchadmin command, specify the -x option.