Compiling, Installing, and Using Hue
Introduction
There are many big data frameworks, and solving a single problem usually involves several of them. Yet each framework ships its own web UI monitoring interface on its own port, e.g. HDFS (9870), YARN (8088), and the MapReduce JobHistory Server (19888). A single unified web UI for managing the commonly used frameworks makes big data development, monitoring, and operations far more convenient. Hue was created precisely to solve this one-web-UI-per-framework problem.
Build and Install
Hue official site: https://gethue.com/
Hue user manual: https://docs.gethue.com/
Official installation docs: https://docs.gethue.com/administrator/installation/install/
Hue download: Hue - The open source SQL Assistant for Data Warehouses
Download via the Hue download link above (the address that used to be listed below it is obsolete).
Packages used
- CentOS 7
- Hue 4.5
- Node.js v10.6.0 (per the official recommendation; newer versions have build problems)
Hue source package
Link: https://pan.baidu.com/s/10UPgRfejKpwdV6qT4WuJog
Extraction code: yyds
(shared from a Baidu Netdisk super member V5 account)
npm
First download and install Node.js; the details are omitted here (remember to add it to your PATH):
wget https://nodejs.org/dist/v14.15.4/node-v14.15.4-linux-x64.tar.xz
tar -xf node-v14.15.4-linux-x64.tar.xz
Configure the environment variables:
sudo vi /etc/profile.d/my_env.sh
#NPM_HOME
NPM_HOME=/home/bigdata/node-v14.15.4-linux-x64
export PATH=$PATH:$NPM_HOME/bin:$NPM_HOME/sbin
source /etc/profile.d/my_env.sh
Configure the Taobao npm mirror:
npm config set registry https://registry.npm.taobao.org
Check that the registry switch worked:
npm config get registry
If npm misbehaves, use cnpm instead:
npm install -g cnpm --registry=https://registry.npm.taobao.org
cd /usr/bin
ln -s /usr/local/node/bin/cnpm cnpm
Build
tar -zxvf hue-4.5.0.tgz
Install the dependencies (ideally build on a machine that has never had MySQL installed):
# Python support is required (Python 2.7 / Python 3.5)
python --version
# Install the libraries needed to build Hue on CentOS
sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
The dependencies above apply only to CentOS/RHEL 7.x; for other systems see https://docs.gethue.com/administrator/installation/dependencies/
The node where Hue is built should preferably not have MySQL installed, or version conflicts may occur.
The build needs internet access; a poor network causes all kinds of odd problems.
Edit the hue.ini file
# [desktop]
http_host=node2
http_port=8000
time_zone=Asia/Shanghai
server_user=bigdata
server_group=bigdata
default_user=bigdata
app_blacklist=search
# [[database]]. Hue records its metadata in SQLite by default; switch it to MySQL
engine=mysql
host=master
port=3306
user=root
password=root
# database name
name=hue
# around line 1003: path to the Hadoop configuration files
hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop
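hue.ini uses configobj-style nesting, where [desktop] is a top-level section and [[database]] is nested one level below it. A quick way to sanity-check an edited fragment is a minimal depth-aware parser; this is an illustrative sketch, not Hue's actual configuration loader:

```python
# Minimal sketch of a parser for configobj-style ini nesting:
# [section] is depth 1, [[subsection]] is depth 2, and so on.
def parse_hue_ini(text):
    tree, path = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue                      # skip blanks and comments
        if line.startswith('['):
            depth = len(line) - len(line.lstrip('['))
            name = line.strip('[]')
            path = path[:depth - 1] + [name]   # re-root at this depth
        elif '=' in line:
            key, _, value = line.partition('=')
            node = tree
            for part in path:                  # walk/create the section chain
                node = node.setdefault(part, {})
            node[key.strip()] = value.strip()
    return tree

fragment = """
[desktop]
http_host=node2
http_port=8000
[[database]]
engine=mysql
name=hue
"""
conf = parse_hue_ini(fragment)
print(conf['desktop']['http_host'])           # node2
print(conf['desktop']['database']['engine'])  # mysql
```

If the nested lookup fails with a KeyError, the bracket depth of a section header in your edited file is probably wrong.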
Build Hue
# Enter the hue source directory and build. PREFIX specifies where Hue will be installed
cd hue-4.5.0
make apps
If you hit a MySQL-related build error:
yum install mysql-devel
Then delete everything in the target directory given as the build destination:
PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target
cd /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target
rm -rf ./*
If a libxslt error appears:
sudo yum install -y libxslt-devel
If an sqlite3 error appears, search for the matching dependency:
sudo yum search sqlite3
and install what it finds:
sudo yum install -y libsqlite3x.x86_64
sudo yum install -y libsqlite3x-devel.x86_64
sudo yum install -y gmp-devel.x86_64
Build again:
PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target make install
After a short wait: congratulations, the build succeeded!
The built package cannot simply be copied to other machines, because many absolute paths are baked in at build time; it only works elsewhere if the environment is identical.
tar -zcvf hue.tar.gz hue
Integration
HDFS
Modify the Hadoop configuration
Add the following to hdfs-site.xml:
<!-- HUE -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
Add the following to core-site.xml:
<!-- HUE -->
<property>
    <name>hadoop.proxyuser.bigdata.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.bigdata.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
</property>
Create an httpfs-site.xml file with the following configuration:
<configuration>
    <!-- HUE -->
    <property>
        <name>httpfs.proxyuser.bigdata.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>httpfs.proxyuser.bigdata.groups</name>
        <value>*</value>
    </property>
</configuration>
Note: after changing the HDFS configuration, scp the files to every machine in the cluster and restart the HDFS service.
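Once HDFS is back up, you can verify that HttpFS accepts the proxy user by hitting its REST API. The sketch below only builds the request URL; the host, user name, and port are the values assumed in this guide (HttpFS listens on 14000 by default):

```python
# Build a WebHDFS-over-HttpFS URL for verifying proxy-user access.
# LISTSTATUS is a read-only operation, so this is safe to run repeatedly.
def httpfs_url(host, path='/', user='bigdata', port=14000):
    return (f'http://{host}:{port}/webhdfs/v1{path}'
            f'?op=LISTSTATUS&user.name={user}')

print(httpfs_url('master1'))
# http://master1:14000/webhdfs/v1/?op=LISTSTATUS&user.name=bigdata
```

Fetch that URL with curl; a JSON FileStatuses response means HttpFS and the proxy-user settings work.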
Edit the Hue configuration
cd /home/bigdata/apache-maven-3.8.6/hue-4.5.0/desktop/conf
vi hue.ini
# [desktop]
http_host=node2
http_port=8000
time_zone=Asia/Shanghai
server_user=bigdata
server_group=bigdata
default_user=bigdata
app_blacklist=search
# [[database]]. Hue records its metadata in SQLite by default; switch it to MySQL
engine=mysql
host=master
port=3306
user=root
password=root
# database name
name=hue
# around line 1003: path to the Hadoop configuration files
hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop
Create the database
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Initialize the Hue database
# initialize the database
cd /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target/hue/build/env/bin
./hue syncdb
./hue migrate
# check the data
Start Hue
/data/hue/build/env/bin/supervisor
Complete configuration
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <!-- Filesystem URI used to reach the NameNode over the hdfs protocol -->
        <name>fs.defaultFS</name>
        <!-- Name of the HDFS HA cluster -->
        <value>hdfs://bigdatacluster</value>
    </property>
    <property>
        <!-- Directory where the Hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/module/hadoop-3.1.3/data</value>
    </property>

    <!-- Static user for HDFS web UI logins: bigdata -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>bigdata</value>
    </property>

    <!-- Trash -->
    <property>
        <name>fs.trash.interval</name>
        <value>1</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>1</value>
    </property>

    <!-- Hosts from which the bigdata superuser may act as a proxy -->
    <property>
        <name>hadoop.proxyuser.bigdata.hosts</name>
        <value>*</value>
    </property>
    <!-- Groups whose users the bigdata superuser may impersonate -->
    <property>
        <name>hadoop.proxyuser.bigdata.groups</name>
        <value>*</value>
    </property>
    <!-- Users the bigdata superuser may impersonate -->
    <property>
        <name>hadoop.proxyuser.bigdata.users</name>
        <value>*</value>
    </property>

    <!-- ZooKeeper servers that zkfc connects to -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node1:2181,node2:2181,node3:2181</value>
    </property>

    <!-- Hue -->
    <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hdfs.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.httpfs.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.httpfs.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
    </property>

</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- NameNode data directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/name</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/data</value>
    </property>
    <!-- JournalNode edits directory -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${hadoop.tmp.dir}/jn</value>
    </property>
    <!-- Nameservice of the fully distributed cluster; matches fs.defaultFS in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>bigdatacluster</value>
    </property>
    <!-- NameNodes in the cluster -->
    <property>
        <name>dfs.ha.namenodes.bigdatacluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- NameNode RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.bigdatacluster.nn1</name>
        <value>master1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.bigdatacluster.nn2</name>
        <value>master2:8020</value>
    </property>
    <!-- NameNode HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.bigdatacluster.nn1</name>
        <value>master1:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.bigdatacluster.nn2</name>
        <value>master2:9870</value>
    </property>
    <!-- Where NameNode metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node1:8485;node2:8485;node3:8485/bigdatacluster</value>
    </property>
    <!-- Failover proxy class: lets clients determine which NameNode is active -->
    <property>
        <name>dfs.client.failover.proxy.provider.bigdatacluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method, so only one NameNode responds at a time -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- Fencing over ssh requires key-based login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/bigdata/.ssh/id_rsa</value>
    </property>

    <!-- Blacklist of excluded hosts -->
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/bigdata/module/hadoop-3.1.3/etc/blacklist</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- HUE -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- Enable JVM reuse -->
    <property>
        <name>mapreduce.job.jvm.numtasks</name>
        <value>10</value>
        <description>How many tasks to run per jvm; if set to -1 there is no limit</description>
    </property>

    <!--
    <property>
        <name>mapreduce.job.tracker</name>
        <value>hdfs://master1:8001</value>
        <final>true</final>
    </property>
    -->

    <property>
        <!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
    </property>

    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master1:10020</value>
    </property>
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master1:19888</value>
    </property>

</configuration>
yarn-site.xml
<?xml version="1.0"?>

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Declare the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>
    <!-- Logical list of ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- ========== rm1 ========== -->
    <!-- rm1 hostname -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master1</value>
    </property>
    <!-- rm1 web UI address -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master1:8088</value>
    </property>
    <!-- rm1 internal communication address -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master1:8032</value>
    </property>
    <!-- Address AMs use to request resources from rm1 -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>master1:8030</value>
    </property>
    <!-- Address NodeManagers connect to -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master1:8031</value>
    </property>

    <!-- ========== rm2 ========== -->
    <!-- rm2 hostname -->
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>master2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>master2:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>master2:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>master2:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>master2:8031</value>
    </property>

    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node1:2181,node2:2181,node3:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <!-- Environment variable inheritance -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log aggregation server address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://master1:19888/jobhistory/logs</value>
    </property>
    <!-- Retain logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    <!-- Whether to check each task's physical memory use and kill tasks that exceed their allocation; default true -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>

    <!-- Whether to check each task's virtual memory use and kill tasks that exceed their allocation; default true -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>24576</value>
    </property>

</configuration>
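The yarn.nodemanager.resource.memory-mb value of 24576 caps each NodeManager at 24 GB of schedulable memory. A rough capacity calculation, assuming the Hadoop default scheduler minimum allocation of 1024 MB (that default is an assumption, not read from this cluster's config):

```python
# Rough per-NodeManager container math under assumed scheduler defaults.
nm_memory_mb = 24576        # yarn.nodemanager.resource.memory-mb above
min_allocation_mb = 1024    # yarn.scheduler.minimum-allocation-mb (Hadoop default)
max_containers = nm_memory_mb // min_allocation_mb
print(f'{nm_memory_mb / 1024:.0f} GB per NodeManager, '
      f'at most {max_containers} containers of 1 GiB each')
# 24 GB per NodeManager, at most 24 containers of 1 GiB each
```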
httpfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <!-- HUE -->
    <property>
        <name>httpfs.proxyuser.bigdata.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>httpfs.proxyuser.bigdata.groups</name>
        <value>*</value>
    </property>

</configuration>
capacity-scheduler.xml
<configuration>

    <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
        <description>
            Maximum number of applications that can be pending and running.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.3</value>
        <description>
            Maximum percent of resources in the cluster which can be used to run
            application masters i.e. controls number of concurrent running
            applications.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
        <description>
            The ResourceCalculator implementation to be used to compare
            Resources in the scheduler.
            The default i.e. DefaultResourceCalculator only uses Memory while
            DominantResourceCalculator uses dominant-resource to compare
            multi-dimensional resources such as Memory, CPU etc.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>high,low</value>
        <description>
            The queues at the this level (root is the root queue).
        </description>
    </property>

    <!-- Queue capacity shares -->
    <property>
        <name>yarn.scheduler.capacity.root.high.capacity</name>
        <value>70</value>
        <description>Default queue target capacity.</description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.capacity</name>
        <value>30</value>
        <description>Default queue target capacity.</description>
    </property>

    <!-- User limit factor -->
    <property>
        <name>yarn.scheduler.capacity.root.high.user-limit-factor</name>
        <value>1</value>
        <description>
            Default queue user limit a percentage from 0.0 to 1.0.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.user-limit-factor</name>
        <value>1</value>
        <description>
            Default queue user limit a percentage from 0.0 to 1.0.
        </description>
    </property>

    <!-- Maximum capacity and running state -->
    <property>
        <name>yarn.scheduler.capacity.root.high.maximum-capacity</name>
        <value>100</value>
        <description>
            The maximum capacity of the default queue.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.state</name>
        <value>RUNNING</value>
        <description>
            The state of the default queue. State can be one of RUNNING or STOPPED.
        </description>
    </property>

    <!-- Submission ACLs -->
    <property>
        <name>yarn.scheduler.capacity.root.high.acl_submit_applications</name>
        <value>*</value>
        <description>
            The ACL of who can submit jobs to the default queue.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.acl_submit_applications</name>
        <value>*</value>
        <description>
            The ACL of who can submit jobs to the default queue.
        </description>
    </property>

    <!-- Administration ACLs -->
    <property>
        <name>yarn.scheduler.capacity.root.high.acl_administer_queue</name>
        <value>*</value>
        <description>
            The ACL of who can administer jobs on the default queue.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.acl_administer_queue</name>
        <value>*</value>
        <description>
            The ACL of who can administer jobs on the default queue.
        </description>
    </property>

    <!-- Priority ACLs -->
    <property>
        <name>yarn.scheduler.capacity.root.high.acl_application_max_priority</name>
        <value>*</value>
        <description>
            The ACL of who can submit applications with configured priority.
            For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.acl_application_max_priority</name>
        <value>*</value>
        <description>
            The ACL of who can submit applications with configured priority.
            For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
    </property>

    <!-- Maximum application lifetime -->
    <property>
        <name>yarn.scheduler.capacity.root.high.maximum-application-lifetime</name>
        <value>-1</value>
        <description>
            Maximum lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            This will be a hard time limit for all applications in this
            queue. If positive value is configured then any application submitted
            to this queue will be killed after exceeds the configured lifetime.
            User can also specify lifetime per application basis in
            application submission context. But user lifetime will be
            overridden if it exceeds queue maximum lifetime. It is point-in-time
            configuration.
            Note : Configuring too low value will result in killing application
            sooner. This feature is applicable only for leaf queue.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.maximum-application-lifetime</name>
        <value>-1</value>
        <description>
            Maximum lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            This will be a hard time limit for all applications in this
            queue. If positive value is configured then any application submitted
            to this queue will be killed after exceeds the configured lifetime.
            User can also specify lifetime per application basis in
            application submission context. But user lifetime will be
            overridden if it exceeds queue maximum lifetime. It is point-in-time
            configuration.
            Note : Configuring too low value will result in killing application
            sooner. This feature is applicable only for leaf queue.
        </description>
    </property>

    <!-- Default application lifetime -->
    <property>
        <name>yarn.scheduler.capacity.root.high.default-application-lifetime</name>
        <value>-1</value>
        <description>
            Default lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            If the user has not submitted application with lifetime value then this
            value will be taken. It is point-in-time configuration.
            Note : Default lifetime can't exceed maximum lifetime. This feature is
            applicable only for leaf queue.
        </description>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.low.default-application-lifetime</name>
        <value>-1</value>
        <description>
            Default lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            If the user has not submitted application with lifetime value then this
            value will be taken. It is point-in-time configuration.
            Note : Default lifetime can't exceed maximum lifetime. This feature is
            applicable only for leaf queue.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
        <description>
            Number of missed scheduling opportunities after which the CapacityScheduler
            attempts to schedule rack-local containers.
            When setting this parameter, the size of the cluster should be taken into account.
            We use 40 as the default value, which is approximately the number of nodes in one rack.
            Note, if this value is -1, the locality constraint in the container request
            will be ignored, which disables the delay scheduling.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
        <value>-1</value>
        <description>
            Number of additional missed scheduling opportunities over the node-locality-delay
            ones, after which the CapacityScheduler attempts to schedule off-switch containers,
            instead of rack-local ones.
            Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
            attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
            after 40+20=60 missed opportunities.
            When setting this parameter, the size of the cluster should be taken into account.
            We use -1 as the default value, which disables this feature. In this case, the number
            of missed opportunities for assigning off-switch containers is calculated based on
            the number of containers and unique locations specified in the resource request,
            as well as the size of the cluster.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value></value>
        <description>
            A list of mappings that will be used to assign jobs to queues
            The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
            Typically this list will be used to map users to queues,
            for example, u:%user:%user maps all users to queues with the same name
            as the user.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
        <description>
            If a queue mapping is present, will it override the value specified
            by the user? This can be used by administrators to place jobs in queues
            that are different than the one specified by the user.
            The default is false.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
        <value>1</value>
        <description>
            Controls the number of OFF_SWITCH assignments allowed
            during a node's heartbeat. Increasing this value can improve
            scheduling rate for OFF_SWITCH containers. Lower values reduce
            "clumping" of applications on particular nodes. The default is 1.
            Legal values are 1-MAX_INT. This config is refreshable.
        </description>
    </property>

    <property>
        <name>yarn.scheduler.capacity.application.fail-fast</name>
        <value>false</value>
        <description>
            Whether RM should fail during recovery if previous applications'
            queue is no longer valid.
        </description>
    </property>

</configuration>
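The CapacityScheduler requires that sibling queue capacities under the same parent sum to 100 percent (here root.high=70 and root.low=30). A quick sketch of checking that invariant before restarting YARN:

```python
# Sibling queue capacities under one parent must total 100 percent,
# or the CapacityScheduler refuses to start. Values from the config above.
queues = {'high': 70, 'low': 30}
total = sum(queues.values())
assert total == 100, f'root child capacities sum to {total}, expected 100'
print('root queues OK:', queues)
```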
yarn-env.sh
# this mainly fixes "java not found" problems
export JAVA_HOME=/home/bigdata/module/jdk1.8.0_161
hadoop-server.sh
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit ;
fi
case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo "Starting journalnode on node1"
    ssh node1 "hdfs --daemon start journalnode"
    echo "Starting journalnode on node2"
    ssh node2 "hdfs --daemon start journalnode"
    echo "Starting journalnode on node3"
    ssh node3 "hdfs --daemon start journalnode"

    echo " --------------- Starting hdfs ---------------"
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- Starting yarn ---------------"
    ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/start-yarn.sh"

    echo " --------------- Starting historyserver ---------------"
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
    echo " --------------- Starting httpfs ---------------"
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh start"
    # recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon start httpfs
;;
"stop")
    echo " --------------- Stopping httpfs ---------------"
    # recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon stop httpfs
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh stop"
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- Stopping historyserver ---------------"
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"

    echo " --------------- Stopping yarn ---------------"
    ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- Stopping hdfs ---------------"
    ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-dfs.sh"

    echo "Stopping journalnode on node1"
    ssh node1 "hdfs --daemon stop journalnode"
    echo "Stopping journalnode on node2"
    ssh node2 "hdfs --daemon stop journalnode"
    echo "Stopping journalnode on node3"
    ssh node3 "hdfs --daemon stop journalnode"
;;
*)
    echo "Input Args Error..."
;;
esac
Hue configuration for integrating with the HDFS and YARN clusters
hue.ini

[hadoop]

# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://master1:8020

# NameNode logical name.
## logical_name=

# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
# The corresponding webhdfs/httpfs service must be started separately
webhdfs_url=http://master1:14000/webhdfs/v1

# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false

# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True

# Directory of the Hadoop configuration
hadoop_conf_dir=/home/bigdata/module/hadoop-3.1.3/etc/hadoop
hadoop_bin=/home/bigdata/module/hadoop-3.1.3/bin
hadoop_hdfs_home=/home/bigdata/module/hadoop-3.1.3

# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]

[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=cluster-yarn1

# The port where the ResourceManager IPC listens on
resourcemanager_port=8032

# Whether to submit jobs to this cluster
submit_to=True

# Resource Manager logical name (required for HA)
logical_name=rm1

# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false

# URL of the ResourceManager API
resourcemanager_api_url=http://master1:8088

# URL of the ProxyServer API
proxy_api_url=http://master1:8088

# URL of the HistoryServer API
history_server_api_url=http://master1:19888

# URL of the Spark History Server
## spark_history_server_url=http://localhost:18088

# Change this if your Spark History Server is Kerberos-secured
## spark_history_server_security_enabled=false

# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True

# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.

[[[ha]]]
# Resource Manager logical name (required for HA)
logical_name=rm2

# Un-comment to enable
submit_to=True

# URL of the ResourceManager API
resourcemanager_api_url=http://master2:8088
history_server_api_url=http://master1:19888
# ...

When connecting to Hive, increase the connection timeout, otherwise tasks fail very easily:
server_conn_timeout=3600
Integrating Hue with HBase
HBase start/stop script (the Thrift server must be started, mainly for Hue's benefit)

case $1 in
"start"){
    for i in master2
    do
        echo " -------- starting hbase on $i -------"
        ssh $i "/home/bigdata/module/hbase-2.4.9/bin/start-hbase.sh"
        ssh $i "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh start thrift"
    done
};;
"stop"){
    for i in master2
    do
        echo " -------- stopping hbase on $i -------"
        ssh $i "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh stop thrift"
        ssh $i "/home/bigdata/module/hbase-2.4.9/bin/stop-hbase.sh"
    done
};;
esac

Hue configuration

[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname. If hbase.thrift.ssl.enabled in hbase-site is set to true, https will be used otherwise it will use http
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
hbase_clusters=(Cluster|node3:9090)

# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/home/bigdata/module/hbase-2.4.9/conf

# Hard limit of rows or columns per row fetched before truncating.
## truncate_limit = 500

# Should come from hbase-site.xml, do not set. 'framed' is used to chunk up responses, used with the nonblocking server in Thrift but is not supported in Hue.
# 'buffered' used to be the default of the HBase Thrift Server. Default is buffered when not set in hbase-site.xml.
## thrift_transport=buffered

# Choose whether Hue should validate certificates received from the server.
## ssl_cert_ca_verify=true
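The hbase_clusters value packs the cluster name and its Thrift endpoint into the '(name|host:port)' format described in the comment above. A small sketch of splitting such entries, useful when scripting checks across several clusters (the parser itself is illustrative, not Hue's):

```python
import re

# Split Hue hbase_clusters entries of the form '(Name|host:port)',
# possibly comma-separated, into structured records.
def parse_hbase_clusters(value):
    out = []
    for name, host, port in re.findall(r'\(([^|]+)\|([^:]+):(\d+)\)', value):
        out.append({'name': name, 'host': host, 'port': int(port)})
    return out

print(parse_hbase_clusters('(Cluster|node3:9090)'))
# [{'name': 'Cluster', 'host': 'node3', 'port': 9090}]
```

Port 9090 is the HBase Thrift server's default, which is why the start script above launches the thrift daemon alongside HBase.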
Reference
https://blog.csdn.net/yxluojiecpp/article/details/126828755
PHP中文网 06-13