adarsh
2010-12-06 08:44:49 UTC
Dear all,
Thanks for your replies; with your help I was able to configure Hadoop integrated with SGE on a 10-node cluster.
I overcame all the difficulties I faced during configuration.
Still, I have some doubts.
1. I loaded data files of different sizes into Hadoop (24 MB, 2 GB and 20 GB). When I issued the command ./qhost -F | grep hdfs, it showed the data paths. But when I ran any SGE job on these data files, it executed on only 1 execution daemon.
That is fine for small files, but the 20 GB file is distributed across all 10 nodes, so I expected the wordcount job to run TaskTrackers on all of them. Instead, it shows only one execution daemon.
I checked through the Web UI and the logs, and only one execution daemon is running.
This forces all the data to be transferred to one node, which takes too much time.
What is the benefit, then? Hadoop is made for distributed processing.
Is this a problem with our configuration? (I configured all.q on all execution daemons.)
Is it possible to run a job on several hosts concurrently (which is what Hadoop is used for) through a single queue or different queues?
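To make the expectation concrete, here is a rough Python sketch of how many HDFS blocks (and hence map tasks) each file should produce. It assumes the Hadoop 0.20-era default block size of 64 MB and the default of one map task per block-sized input split; the actual block size on this cluster is not stated, so treat the numbers as illustrative only.

```python
import math

# Assumption: Hadoop 0.20-era default HDFS block size (dfs.block.size) of 64 MB.
BLOCK_SIZE_MB = 64


def num_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


if __name__ == "__main__":
    # File sizes from the post, treating 1 GB as 1024 MB.
    for label, size_mb in [("24 MB", 24), ("2 GB", 2 * 1024), ("20 GB", 20 * 1024)]:
        blocks = num_blocks(size_mb)
        # With default split settings, each block-sized split becomes one map task.
        print(f"{label}: {blocks} block(s), i.e. up to {blocks} map task(s)")
```

Under these assumptions the 24 MB file fits in a single block (so one task on one host is expected), while the 20 GB file spans 320 blocks, which is why a single execution daemon looks wrong for it.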
Thanks & Regards
Adarsh Sharma
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302383
To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].