Automic Workload Automation

Hadoop / Spark Integration

  • 1.  Hadoop / Spark Integration

    Posted 08-12-2015 12:38 PM
    Is anyone here using Spark with Automic and/or Hadoop, and if so, what has your experience been?

    Thanks


  • 2.  Hadoop / Spark Integration

    Posted 08-13-2015 02:49 PM
    Hey Fred, there's a new Hadoop Agent available for running Pig, Hive, and MapReduce jobs. Here's a link to the documentation for it.

    There's also a plugin on the Marketplace with template jobs for running Hadoop from the Command Line.

    Unfortunately I haven't done anything with Spark yet.


  • 3.  Hadoop / Spark Integration

    Posted 11-14-2016 04:25 PM
    Update:

    Currently, the Hadoop agent supports the following implementations:

    HDFS
    HIVE
    MAPREDUCEJAR
    MAPREDUCESTREAM
    PIG

    Apache Spark is not implemented. If you need it, please submit a feature enhancement request so it can be considered for the next release of the Hadoop agent -> http://ideas.automic.com/

    As a workaround, you can use the Windows or Unix agent to drive Apache Spark from the command line, for example with a Python script. The Spark documentation is here -> http://spark.apache.org/docs/latest/.

    According to that documentation:
    Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

    Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
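
    For anyone trying the workaround, here is a minimal sketch of a Python wrapper that a Windows or Unix agent job could call. It is not an Automic feature, just one way to do it. Assumptions on my part: spark-submit is on the agent user's PATH, the agent treats a non-zero exit code as a failed job, and the function names, script names, and default master below are made up for illustration.

```python
import os
import shutil
import subprocess


def java_available(env=None):
    """Return True if JAVA_HOME is set or a java executable is on PATH.

    Mirrors the prerequisite quoted from the Spark docs. The env parameter
    is a hypothetical hook so the check can be exercised without touching
    the real environment.
    """
    env = os.environ if env is None else env
    if env.get("JAVA_HOME"):
        return True
    return shutil.which("java", path=env.get("PATH")) is not None


def build_spark_submit_cmd(script, master="local[*]", script_args=()):
    """Build the spark-submit argument list for a PySpark script.

    Assumes spark-submit is on PATH; use the full path to
    bin/spark-submit if it is not.
    """
    return ["spark-submit", "--master", master, script, *script_args]


def run_spark_job(script, master="local[*]", script_args=()):
    """Run the job and return its exit code so the scheduler can react to it."""
    if not java_available():
        print("Java not found: set JAVA_HOME or add java to PATH")
        return 1
    cmd = build_spark_submit_cmd(script, master, script_args)
    print("Running:", " ".join(cmd))
    # No shell=True: the arguments are passed as a list, so no quoting issues.
    return subprocess.run(cmd).returncode


# In a real job script you would finish with something like:
#   import sys
#   sys.exit(run_spark_job("my_job.py", "local[2]", sys.argv[1:]))
```

    The point of returning the exit code is that the wrapper exits with whatever spark-submit returned, so a failed Spark job should show up as a failed job in Automic rather than ending OK.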