Hi,
I'm Chinese as well.
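BDE mainly provides one-click creation of Hadoop clusters on vSphere infrastructure. You define the cluster's configuration in a JSON file.
Below is an example JSON file. Under nodeGroups, master defines the Hadoop master node, worker defines the data nodes, and client defines the clients for whatever services you need. Each node group also specifies the resources it uses, such as CPU, memory, and storage. haFlag turns on vSphere HA for the group, so that a node that runs into a problem is restarted automatically. configuration is the entry point for changing the Hadoop cluster's settings.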
{
  "nodeGroups":[
    {
      "name": "master",
      "roles": [
        "hadoop_namenode",
        "hadoop_resourcemanager"
      ],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      },
      "haFlag": "on",
      "configuration": {
        "hadoop": {
        }
      }
    },
    {
      "name": "worker",
      "roles": [
        "hadoop_datanode",
        "hadoop_nodemanager"
      ],
      "instanceNum": 3,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 50
      },
      "haFlag": "off",
      "configuration": {
        "hadoop": {
        }
      }
    },
    {
      "name": "client",
      "roles": [
        "hadoop_client",
        "hive",
        "hive_server",
        "pig"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 3748,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 50
      },
      "haFlag": "off",
      "configuration": {
        "hadoop": {
        }
      }
    }
  ],
  // we suggest running convert-hadoop-conf.rb to generate "configuration" section and paste the output here
  "configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml
        // note: any value (int, float, boolean, string) must be enclosed in double quotes and here is a sample:
        // "io.file.buffer.size": "4096"
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
        // "HADOOP_SECONDARYNAMENODE_OPTS": "",
        // "HADOOP_JOBTRACKER_OPTS": "",
        // "HADOOP_TASKTRACKER_OPTS": "",
        // "HADOOP_CLASSPATH": "",
        // "JAVA_HOME": "",
        // "PATH": ""
      },
      "yarn-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
      },
      "yarn-env.sh": {
        // "YARN_OPTS": "",
        // "YARN_HEAPSIZE": "",
        // "JAVA_HEAP_MAX": "",
        // "YARN_RESOURCEMANAGER_OPTS": "",
        // "YARN_RESOURCEMANAGER_HEAPSIZE": "",
        // "YARN_NODEMANAGER_OPTS": "",
        // "YARN_NODEMANAGER_HEAPSIZE": "",
        // "YARN_PROXYSERVER_OPTS": "",
        // "YARN_PROXYSERVER_HEAPSIZE": "",
        // "YARN_CLIENT_OPTS": "",
        // "YARN_ROOT_LOGGER": "",
        // "YARN_CLASSPATH": ""
      },
      "log4j.properties": {
        // "hadoop.root.logger": "INFO,RFA",
        // "log4j.appender.RFA.MaxBackupIndex": "10",
        // "log4j.appender.RFA.MaxFileSize": "100MB",
        // "hadoop.security.logger": "DEBUG,DRFA"
      },
      "fair-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
        // "text": "the full content of fair-scheduler.xml in one line"
      },
      "capacity-scheduler.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
      }
    }
  }
}
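If you want to sanity-check a spec like the one above before handing it to BDE, a small script can add up what it asks for. This is only a rough sketch and not part of BDE: the helper names load_spec and summarize are made up for illustration, the sample is a cut-down copy of the node groups above, and I'm assuming the // comments always sit on their own lines, as they do in the example.

```python
import json


def load_spec(text):
    """Parse a BDE-style cluster spec after dropping full-line // comments.

    Assumption: comments occupy whole lines, as in the example spec.
    """
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def summarize(spec):
    """Return per-group rows plus overall totals of VMs, vCPUs, memory and disk."""
    rows = []
    for group in spec["nodeGroups"]:
        count = group["instanceNum"]
        rows.append({
            "name": group["name"],
            "vms": count,
            "cpus": count * group["cpuNum"],
            "mem_mb": count * group["memCapacityMB"],
            "disk_gb": count * group["storage"]["sizeGB"],
            "ha": group.get("haFlag", "off"),
        })
    keys = ("vms", "cpus", "mem_mb", "disk_gb")
    total = {k: sum(row[k] for row in rows) for k in keys}
    return rows, total


# Cut-down copy of the node groups from the example spec.
SAMPLE = """
{
  "nodeGroups": [
    {"name": "master", "instanceNum": 1, "cpuNum": 2, "memCapacityMB": 7500,
     "storage": {"type": "SHARED", "sizeGB": 50}, "haFlag": "on"},
    {"name": "worker", "instanceNum": 3, "cpuNum": 2, "memCapacityMB": 7500,
     "storage": {"type": "LOCAL", "sizeGB": 50}, "haFlag": "off"},
    {"name": "client", "instanceNum": 1, "cpuNum": 1, "memCapacityMB": 3748,
     "storage": {"type": "LOCAL", "sizeGB": 50}, "haFlag": "off"}
  ],
  // full-line comments like this one are dropped before parsing
  "configuration": {"hadoop": {}}
}
"""

rows, total = summarize(load_spec(SAMPLE))
for row in rows:
    print(row)
print("total:", total)
```

For the example spec this comes to 5 VMs, 9 vCPUs, 33748 MB of memory and 250 GB of disk, which is what the vSphere side has to be able to supply.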
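BDE allocates resources according to the JSON file the user provides. The basic steps are:
1. Work out from the available resources where each Hadoop node will be placed, for example on which host and which storage.
2. Clone the required Hadoop nodes from the BDE template VM and put them on the hosts chosen in step 1.
3. Power on the Hadoop nodes and run the initial configuration (networking, storage). Once every node has an IP and an FQDN, the infrastructure the Hadoop cluster needs is ready.
4. BDE then uses the App manager chosen by the user to deploy the Hadoop services automatically and start them as needed.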
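To give a feel for what the placement step means, here is a toy sketch. I don't know the algorithm BDE actually uses, so treat this purely as an illustration: place_nodes, the host names and the host capacities are all invented, and the rule (put each node on the host that still has the most free memory) is just one simple strategy, not BDE's.

```python
def place_nodes(nodes, hosts):
    """Toy greedy placement, NOT BDE's real algorithm.

    nodes: list of (name, cpus, mem_mb) tuples.
    hosts: dict mapping host name -> {"cpus": int, "mem_mb": int} capacity.
    Each node goes to the host with the most free memory that can still fit it;
    ties are broken by host name.
    """
    free = {host: dict(cap) for host, cap in hosts.items()}
    placement = {}
    for name, cpus, mem_mb in nodes:
        candidates = [
            host for host in sorted(free)
            if free[host]["cpus"] >= cpus and free[host]["mem_mb"] >= mem_mb
        ]
        if not candidates:
            raise RuntimeError(f"no host can fit {name}")
        # max() returns the first maximal element, so ties go to the lower name.
        best = max(candidates, key=lambda host: free[host]["mem_mb"])
        free[best]["cpus"] -= cpus
        free[best]["mem_mb"] -= mem_mb
        placement[name] = best
    return placement


# Node sizes follow the example spec; the two hosts are hypothetical.
nodes = [
    ("master", 2, 7500),
    ("worker-0", 2, 7500),
    ("worker-1", 2, 7500),
    ("worker-2", 2, 7500),
    ("client", 1, 3748),
]
hosts = {
    "host-a": {"cpus": 8, "mem_mb": 20000},
    "host-b": {"cpus": 8, "mem_mb": 20000},
}

placement = place_nodes(nodes, hosts)
for node, host in placement.items():
    print(node, "->", host)
```

A real placement also has to consider datastores, networks and HA constraints, which this sketch ignores.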
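The advantage of BDE is that users can create and delete Hadoop clusters whenever they need to, without having to prepare much infrastructure (host, network, storage) each time, which greatly reduces the workload for IT. As far as I know, the largest cluster among current BDE users has about 256 Hadoop data nodes, and it runs very stably.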