BigIntegrate in Hadoop - Dataset stage vs BDFS stage
Posted: Sat Mar 17, 2018 9:34 pm
Hi,
We are currently using the Data Set stage to create intermediate files in HDFS. The descriptor file is created on the edge node (Linux), and the data files reside on the data nodes (HDFS). My questions are:
1) Is there any I/O performance overhead because the descriptor file is not in HDFS while the data files are?
2) Is the BDFS stage better than the Data Set stage, since it is a Hadoop-native stage?
I need your expert advice on this.
Thanks in advance!