Wednesday, August 21, 2013

Apache Hadoop - Configure an automatic process for starting Hive Server

If you want to use third-party software such as Tableau to connect to your Hadoop Hive server, you need a Hive server that is up, running, and listening for incoming connections.

You can start the Hive thrift service by typing the following command:

# hive --service hiveserver

However, the above command terminates when you exit your terminal session, so you need to keep the Hive service running after you log out. To move the Hive service into the background and detach it from the session, type the following command (the variable assignment must come before nohup, otherwise nohup tries to execute it as a program):

# HIVE_PORT=10000 nohup hive --service hiveserver &
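
The backgrounding step can also be wrapped in a small function that sends the server output to a log file of your choosing. The following is only a minimal sketch: the function name start_detached and the log path in the usage example are hypothetical, and it assumes hive is on your PATH.

```shell
# start_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of Hive): runs COMMAND under nohup,
# appends its stdout and stderr to LOGFILE, and prints the PID of the
# background process so the caller can record it.
start_detached() {
    logfile=$1
    shift
    # Redirect all three standard streams so nothing stays tied to the terminal.
    nohup "$@" >>"$logfile" 2>&1 </dev/null &
    echo $!
}

# Usage example (assumed paths):
#   export HIVE_PORT=10000
#   pid=$(start_detached /tmp/hiveserver.log hive --service hiveserver)
#   echo "Hive server started with PID $pid"
```

Redirecting the output explicitly matters because nohup otherwise writes to a nohup.out file in the current directory when run from a terminal.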

For long-term use, you will want Hive to start automatically along with the cluster itself. I created a simple init script for that purpose.

OS: CentOS 6.x

#!/bin/bash
# chkconfig: 2345 95 20
# description: Hive Thrift server start/stop script
# processname: hive_thrift

# Append date to log file
NOW=$(date +"%m-%d-%Y")
LOG=/var/log/hive/hive_thrift_server.$NOW.log


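# Load the Red Hat init function library if it is present
# (the file does not exist on Debian or Ubuntu)
[ -r /etc/init.d/functions ] && . /etc/init.d/functions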


# Port the Hive Thrift server listens on
export HIVE_PORT=10000

# Start the Hive Thrift server
start() {
        # Create the log directory if it does not exist yet
        mkdir -p "$(dirname "$LOG")"
        echo "[$(date +"%m-%d-%Y-%r")]: Starting Hive Thrift server: " > "$LOG"
        /usr/bin/hive --service hiveserver > /dev/null &
        PID=$!
        echo "[$(date +"%m-%d-%Y-%r")]: Hive Thrift server started" >> "$LOG"
        echo "[$(date +"%m-%d-%Y-%r")]: Process ID is $PID" >> "$LOG"
        echo >> "$LOG"
        cat "$LOG"
}



# Stop the Hive Thrift server
stop() {
        echo >> $LOG
        echo "[$(date +"%m-%d-%Y-%r")]: Stopping Hive Thrift server: " >> $LOG
        echo "[$(date +"%m-%d-%Y-%r")]: Stopping Hive Thrift server: "
        # -t prints bare PIDs; -s TCP:LISTEN selects the listening server, not connected clients
        pid=$(lsof -t -i tcp:$HIVE_PORT -s TCP:LISTEN | head -n 1)
        if [ -n "$pid" ] && [ -f "/proc/$pid/exe" ]; then
            echo "[$(date +"%m-%d-%Y-%r")]: Kill Hive Thrift server with pid=$pid" >> $LOG
            echo "[$(date +"%m-%d-%Y-%r")]: Kill Hive Thrift server with pid=$pid"
            kill -9 "$pid"
            echo "[$(date +"%m-%d-%Y-%r")]: Hive Thrift server stopped" >> $LOG
            echo "[$(date +"%m-%d-%Y-%r")]: Hive Thrift server stopped"
        else
            echo "[$(date +"%m-%d-%Y-%r")]: Hive Thrift server is not running, no need to kill" >> $LOG
            echo "[$(date +"%m-%d-%Y-%r")]: Hive Thrift server is not running, no need to kill"
        fi
}

# Status of Hive Thrift server
status() {
       # Look up only the process listening on the Hive port
       pid=$(lsof -t -i tcp:$HIVE_PORT -s TCP:LISTEN | head -n 1)
       if [ -n "$pid" ] && [ -f "/proc/$pid/exe" ]; then
           echo "Hive Thrift server is running, process ID is $pid"
       else
           echo "Hive Thrift server is not running."
       fi

}


### main logic ###
case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  status)
        status
        ;;
  restart|reload|condrestart)
        stop
        start
        ;;
  *)
        echo $"Usage: $0 {start|stop|status|restart}"
        exit 1

esac
exit 0
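
The script above finds the server by asking lsof who owns port 10000. An alternative is to record the PID at start time and check it later. The following is a minimal sketch of that PID-file technique, not part of the script above: the function names and the /var/run/hive_thrift.pid path are hypothetical.

```shell
# Hypothetical PID-file helpers; the default path is an assumption.
PIDFILE=${PIDFILE:-/var/run/hive_thrift.pid}

# Record the PID of the process that was just started in the background.
save_pid() {
    echo "$1" > "$PIDFILE"
}

# Succeed only if the PID file exists and names a process that can be signalled.
is_running() {
    [ -s "$PIDFILE" ] || return 1
    kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Stop the recorded process: send SIGTERM first, wait up to 10 seconds,
# and fall back to SIGKILL only if it is still there.
stop_recorded() {
    is_running || { rm -f "$PIDFILE"; return 0; }
    pid=$(cat "$PIDFILE")
    kill "$pid" 2>/dev/null
    tries=0
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null
    rm -f "$PIDFILE"
}
```

In start() you would call save_pid $! right after launching the server, and stop() and status() would use stop_recorded and is_running. This avoids the dependency on lsof, at the cost of a stale PID file if the server dies on its own.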


To make it an init startup script, save it as hive_thrift, make it executable, and register it:
# chmod +x hive_thrift
# cp hive_thrift /etc/init.d/
# chkconfig --level 345 hive_thrift on
For Debian or Ubuntu:
# chmod +x hive_thrift
# cp hive_thrift /etc/init.d/
# update-rc.d hive_thrift defaults
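
If the same setup has to work on both families of distributions, the choice between the two registration tools can be made at run time. The following is a minimal sketch: registration_command is a hypothetical helper that only prints the command it would use, so you can review it before running it as root.

```shell
# registration_command NAME
# Hypothetical helper: prints the command that registers the init script
# NAME on this system, based on which tool is installed. It runs nothing.
registration_command() {
    name=$1
    if command -v chkconfig >/dev/null 2>&1; then
        # Red Hat family (CentOS, RHEL, Fedora)
        echo "chkconfig --level 345 $name on"
    elif command -v update-rc.d >/dev/null 2>&1; then
        # Debian family (Debian, Ubuntu)
        echo "update-rc.d $name defaults"
    else
        echo "no supported init registration tool found" >&2
        return 1
    fi
}

# Usage example:
#   registration_command hive_thrift
```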
