Creating Oozie workflows: manual configuration and configuration through Hue


Creating an Oozie workflow

For the workflow execution commands, see this blog: https://www.jianshu.com/p/6cb3a4b78556. You can also run `oozie help` to view the built-in help.

Manually configuring Oozie's workflow

The job.properties file stores the parameters that are referenced in the workflow.xml file.
job.properties

# Note: variable names must not contain special characters, otherwise Spark will fail to resolve them
# oozie.wf.application.path must point to a path on HDFS, because the whole cluster needs to access it

nameNode=hdfs://txz-data0:9820
resourceManager=txz-data0:8032
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/share/lib/spark2/jars/,${nameNode}/share/lib/spark2/python/lib/,${nameNode}/share/lib/spark2/hive-site.xml
oozie.wf.application.path=${nameNode}/workflow/data-factory/download_report_voice_and_upload/Workflow
oozie.action.sharelib.for.spark=spark2

archive=${nameNode}/envs/py3.tar.gz#py
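# The alias after '#' (py) is the directory name the archive is exposed under in the YARN container's
# working directory; pysparkPath below points to the Python interpreter inside that unpacked archive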

# If dryrun is true, the current workflow is only tested and the corresponding job is not actually recorded
dryrun=false

sparkMaster=yarn-cluster
sparkMode=cluster
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
pysparkPath=py/py3/bin/python3

The workflow.xml file

<!--
    This provides the parameters for the Oozie workflow; the variables used here come from the job.properties file by default
-->

<workflow-app xmlns='uri:oozie:workflow:1.0' name='download_parquet_from_data0_upload_online'>

    <global>
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
    </global>

    <start to='spark-node' />

    <action name='spark-node'>
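        <!-- The #fragment on <file> localizes the script under its basename so it matches <jar>;
             spark.yarn.appMasterEnv.PYSPARK_PYTHON points the driver at the interpreter inside the unpacked archive -->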
        <spark xmlns="uri:oozie:spark-action:1.0">
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>report_voice_download_pyspark</name>
            <jar>${sparkScriptBasename}</jar>
            <spark-opts>
                --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${pysparkPath}
            </spark-opts>
            <file>${sparkScript}#${sparkScriptBasename}</file>
            <archive>${archive}</archive>
        </spark>

        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>
            Workflow failed, error
            message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
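
The PySpark script referenced above (download_parquet_from_data0_upload_online.py) is not shown in this post. As a rough, hypothetical sketch of its shape, assuming it simply reads a parquet dataset from HDFS and writes it to an online target (the paths below are placeholders, not the real ones):

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # App name matches the <name> element of the spark action
    spark = SparkSession.builder.appName("report_voice_download_pyspark").getOrCreate()

    # Read the report-voice parquet data from HDFS (placeholder path)
    df = spark.read.parquet("hdfs://txz-data0:9820/path/to/report_voice_parquet")

    # ... any filtering / transformation would go here ...

    # Write the result to the upload target (placeholder path)
    df.write.mode("overwrite").parquet("hdfs://txz-data0:9820/path/to/online_target")

    spark.stop()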

Place the two configuration files (job.properties and workflow.xml) on a local disk, for example in the folder /home/workflow/.

Run the command `oozie job -oozie http://txz-data0:11000/oozie -config /home/workflow/job.properties -run` to run this workflow.
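
The -run command prints a job id, whose status you can then check with `oozie job -oozie http://txz-data0:11000/oozie -info <job-id>`. The Oozie server also exposes a web-services API on the same port, so the status can be checked from a script. A minimal sketch, assuming the server above and a placeholder job id:

import requests

OOZIE_URL = "http://txz-data0:11000/oozie"
job_id = "0000000-000000000000000-oozie-oozi-W"   # placeholder: use the id printed by -run

# GET /oozie/v1/job/<job-id>?show=info returns the workflow job's metadata as JSON
resp = requests.get(f"{OOZIE_URL}/v1/job/{job_id}", params={"show": "info"})
resp.raise_for_status()
info = resp.json()

print(info["status"])                      # e.g. RUNNING, SUCCEEDED, KILLED
for action in info.get("actions", []):
    print(action["name"], action["status"])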

A workflow configured by hand like this is not visible in Hue, so you can instead build the workflow in Hue and then configure a Schedule on top of it. For the specific configuration, see this blog: https://blog.csdn.net/qq_22918243/article/details/89204111