Hive on Spark (Switching the Compute Engine to Spark)

Replace Hive's default MapReduce compute engine with Spark.

This article assumes Hive is already installed, configured, and has had its metastore initialized.

Disable the YARN memory checks

Edit Hadoop's yarn-site.xml configuration file. Disabling the physical/virtual memory checks prevents YARN from killing Spark containers that briefly exceed their memory limits:

<property>

<name>yarn.nodemanager.pmem-check-enabled</name>

<value>false</value>

</property>

<property>

<name>yarn.nodemanager.vmem-check-enabled</name>

<value>false</value>

</property>

After modifying the Hadoop configuration, distribute it to every node and restart the cluster.
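A minimal sketch of that step, assuming the three-node cluster (master, slave1, slave2) referenced later in this article and Hadoop installed under /opt/module/hadoop:

scp /opt/module/hadoop/etc/hadoop/yarn-site.xml slave1:/opt/module/hadoop/etc/hadoop/
scp /opt/module/hadoop/etc/hadoop/yarn-site.xml slave2:/opt/module/hadoop/etc/hadoop/
# restart YARN so the new settings take effect
stop-yarn.sh
start-yarn.sh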

Upload the Spark installation package and extract it

Be sure to choose a Spark version that matches your Hive and Hadoop versions, to avoid version-compatibility problems.

tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/

mv /opt/module/spark-3.0.0-bin-hadoop3.2/ /opt/module/spark

配置Spark环境变量

export SPARK_HOME=/opt/module/spark

export PATH=$PATH:$SPARK_HOME/bin

source /etc/profile.d/myenv.sh
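As a quick sanity check that the environment variables took effect, the Spark version can be printed:

spark-submit --version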

Set the Spark default parameters in spark-defaults.conf

spark.master yarn

spark.eventLog.enabled true

spark.eventLog.dir hdfs://master:9000/spark/logs

spark.executor.memory 1g

spark.driver.memory 1g
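For Hive on Spark, this file has to be on Hive's classpath; a common approach (an assumption here, not stated above) is to create it in Hive's conf directory, e.g. /opt/module/hive/conf if Hive follows the same layout as the other components:

# hypothetical path; adjust to your actual HIVE_HOME
cat > /opt/module/hive/conf/spark-defaults.conf <<'EOF'
spark.master                   yarn
spark.eventLog.enabled         true
spark.eventLog.dir             hdfs://master:9000/spark/logs
spark.executor.memory          1g
spark.driver.memory            1g
EOF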

Upload the jars from the pure (without-hadoop) Spark package to HDFS

The without-hadoop ("pure") build is used so that the jars uploaded to HDFS do not bundle Hadoop classes that could conflict with the versions already running on the cluster.

# Create the required HDFS directories first

hdfs dfs -mkdir -p /spark/jars

hdfs dfs -mkdir -p /spark/logs

hdfs dfs -put /opt/module/spark/spark-3.0.0-bin-without-hadoop/jars/* /spark/jars
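To confirm the upload worked, list the target directory (it should contain the Spark jars):

hdfs dfs -ls /spark/jars | head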

Edit the hive-site.xml configuration file

<!-- Hive on Spark -->

<property>

<name>spark.yarn.jars</name>

<value>hdfs://master:9000/spark/jars</value>

</property>

<property>

<name>hive.execution.engine</name>

<value>spark</value>

</property>
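Optionally (this property is not part of the configuration above, but it is a standard Hive setting), the Spark client connect timeout can be raised if the very first query fails while the Spark session is still starting on YARN:

<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>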

Edit the spark-env.sh configuration file (under $SPARK_HOME/conf)

export JAVA_HOME=/opt/module/jdk

export YARN_CONF_DIR=/opt/module/hadoop/etc/hadoop

export HADOOP_CONF_DIR=/opt/module/hadoop/etc/hadoop

export SPARK_HOME=/opt/module/spark

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181
-Dspark.deploy.zookeeper.dir=/spark/zookeeper"

SPARK_HISTORY_OPTS="
-Dspark.history.fs.logDirectory=hdfs://master:9000/spark/logs
-Dspark.history.fs.cleaner.enabled=true"

The hadoop command must already be on the PATH (i.e., the Hadoop environment variables must be configured), otherwise $(hadoop classpath) resolves to nothing.
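A quick way to verify this before relying on it in spark-env.sh:

which hadoop
hadoop classpath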

Test

create database tmp;

use tmp;

create table student (id int, name string);

insert into table student values(1,'abc');
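The insert statement should show Spark stages in its output instead of MapReduce jobs; the active engine can also be checked from the Hive CLI:

set hive.execution.engine;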

If the error org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session appears, check whether export SPARK_DIST_CLASSPATH=$(hadoop classpath) in spark-env.sh actually took effect.
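One way to check (a sketch; it assumes bash and the paths used above) is to source spark-env.sh and print the variable, and then to look up the YARN application that the failed session tried to start:

source /opt/module/spark/conf/spark-env.sh
echo $SPARK_DIST_CLASSPATH
# list recent YARN applications to find the failed Spark session and inspect its logs
yarn application -list -appStates ALL | tail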