This section walks you through setting up your environment to execute Hydrograph jobs remotely. To execute jobs remotely, you need to build the engine and server components of the Hydrograph project. For details on building these components, see Local Development Install.
Before setting up Hydrograph in your remote execution environment, ensure that the following prerequisites are installed:
Hydrograph is made up of three components: the developer UI, the XML custom code, and the backend execution. For a general overview of the core Hydrograph components, see High Level Architecture. In addition to the backend engine, there is a Hydrograph server that is responsible for monitoring job execution. For the purpose of deploying Hydrograph in a remote environment, only the server and engine projects are relevant.
The Hydrograph engine is responsible for reading the Hydrograph job XML and creating Spark flows. Once you build the hydrograph.engine project, you will get the dependent libraries and the following Hydrograph jars:
In order to set up the engine properly in your remote environment you’ll need to use the following directory structure:
hydrograph-engine
|_ configs
|_ libs
|_ scripts
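The layout above can be created in one step. The following is a minimal sketch, assuming a POSIX shell; the function name `create_engine_layout` and its default base path are illustrative and are not part of Hydrograph.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Hydrograph): creates the engine
# directory layout under the base directory given as the first argument.
create_engine_layout() {
    base_dir="${1:-hydrograph-engine}"
    for sub in configs libs scripts; do
        # -p creates missing parents and is a no-op if the directory exists
        mkdir -p "$base_dir/$sub" || return 1
    done
    # Print the base directory so callers can capture it
    printf '%s\n' "$base_dir"
}
```

For example, `create_engine_layout ./hydrograph-engine` creates the three subdirectories, after which the built JARs, property files, and scripts can be copied into place.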
The configs directory contains the configuration files for the Hydrograph engine, along with the property files from hydrograph.engine/hydrograph.engine.spark/src/main/resources/ and hydrograph.engine/hydrograph.engine.core/src/main/resources/.
The libs directory should contain all of the JAR files for the Hydrograph engine modules, along with any additional dependencies.
The scripts directory contains the scripts that are used to execute Hydrograph jobs.
| **Property** | **Example** | **Description** |
|---|---|---|
| HYDROGRAPH_HOME | HYDROGRAPH_HOME=/code/hydrograph/spark-engine | Update this path only if you change the base directory from the default. |
| SPARK_LIB | SPARK_LIB=/opt/spark | Location of the Spark installation directory. |
A sample script is available [here]
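The two properties above appear to be plain shell variable assignments. As a minimal sketch, assuming a POSIX shell, they might be set and sanity-checked before a job is launched as follows. The values are the examples from the table; `check_dir` and `check_hydrograph_env` are hypothetical helpers, not part of Hydrograph.

```shell
#!/bin/sh
# Example values from the table above; adjust them to your environment.
export HYDROGRAPH_HOME="${HYDROGRAPH_HOME:-/code/hydrograph/spark-engine}"
export SPARK_LIB="${SPARK_LIB:-/opt/spark}"

# Hypothetical helper: $1 is the variable name (for the message), $2 its value.
check_dir() {
    if [ -d "$2" ]; then
        echo "$1=$2 (ok)"
    else
        echo "$1=$2 (directory not found)" >&2
        return 1
    fi
}

# Hypothetical pre-flight check: returns non-zero if either location is missing.
check_hydrograph_env() {
    status=0
    check_dir HYDROGRAPH_HOME "$HYDROGRAPH_HOME" || status=1
    check_dir SPARK_LIB "$SPARK_LIB" || status=1
    return "$status"
}
```

Calling `check_hydrograph_env` at the top of an execution script makes a misconfigured path fail early with a readable message rather than partway through a job.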
Hydrograph provides a way to track the execution of jobs and to view the sample data generated during job execution. These features are available in the Hydrograph UI tool, but to use them you need to enable the services that keep track of job execution and provide the generated data to the Hydrograph UI.
To set up the Hydrograph server, create a directory named ‘server’ under the base directory. This directory will contain the libraries, logs, and configuration folders required by the server. Create the following directory structure:
hydrograph-server
|_ bin
|_ configs
|_ libs
|_ scripts
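Before starting the server, it can be useful to confirm that this layout is complete. The following is a minimal sketch, assuming a POSIX shell; `missing_server_dirs` is a hypothetical helper and is not part of Hydrograph.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hydrograph): prints the name of each
# expected server subdirectory that is missing under the given base directory.
missing_server_dirs() {
    base_dir="${1:-hydrograph-server}"
    for sub in bin configs libs scripts; do
        [ -d "$base_dir/$sub" ] || echo "$sub"
    done
}
```

`missing_server_dirs /path/to/hydrograph-server` prints nothing when all four subdirectories exist; any names it does print can be created with `mkdir -p`.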
Execution service, view data service, and all required dependency jars are placed under this directory.
The Hydrograph service jar is placed in this directory.
This directory contains some of the essential configuration files needed to customize server behavior.
1. ServiceConfig.properties
Configurations related to port and Kerberos are placed in this file.
Key properties to configure:
2. mail.properties
The Hydrograph server sends notifications to registered participants for certain events, such as the server shutting down abruptly.
This configuration file lists the recipients of those notifications.
Properties to configure:
3. log4j.properties
This file contains configurations for logging. Some properties that you might want to update include:
4. hydrographViewDataService-exec.sh
Use this script to start or stop the view data service. A sample of the script can be found [here]