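Apache DolphinScheduler: Docker Compose Explained
Apache DolphinScheduler (DS for short) is a distributed, decentralized, easily extensible visual DAG workflow task scheduling system. It consists of a web UI and several services, and it depends on PostgreSQL and ZooKeeper. Its own service modules include api, alert, master, and worker (a logger service runs inside the worker). For detailed deployment steps, see: Deploying Dolphin Scheduler with Docker
The project ships an official docker-compose.yml in the docker/docker-swarm/ directory. This article uses v1.3.8 as the example and walks through the contents of that docker-compose.yml. The Compose file for this version is based on the apache/dolphinscheduler:1.3.8 Docker image. For how the DS image is built, see my earlier post: Apache Dolphin Scheduler - Dockerfile Explained. Most of the configuration changes and the startup flow are encapsulated in the Dockerfile.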
Docker Compose
version: "3.1"

services:

  # PostgreSQL
  dolphinscheduler-postgresql:
    image: postgres:11.12
    environment:
      # Set the time zone
      TZ: Asia/Shanghai
      # PostgreSQL settings
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
      POSTGRES_DB: dolphinscheduler
    # Data volume
    volumes:
      - dolphinscheduler-postgresql:/var/lib/postgresql/data
    # Restart policy: restart the container when it exits, unless it was stopped explicitly
    restart: unless-stopped
    # Network
    networks:
      - dolphinscheduler

  # ZooKeeper
  dolphinscheduler-zookeeper:
    image: zookeeper:3.6.3
    environment:
      TZ: Asia/Shanghai
      # ZooKeeper settings
      ZOO_DATA_LOG_DIR: /data
      ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
    volumes:
      - dolphinscheduler-zookeeper:/data
    restart: unless-stopped
    networks:
      - dolphinscheduler

  # DS service modules
  dolphinscheduler-api:
    image: apache/dolphinscheduler:1.3.8
    command: api-server
    ports:
      - 12345:12345
    environment:
      TZ: Asia/Shanghai
    # Load external environment variables
    env_file: config.env.sh
    # Health check
    healthcheck:
      test: ["CMD", "/root/checkpoint.sh", "ApiApplicationServer"]
      interval: 30s
      timeout: 5s
      retries: 3
    # Depends on PostgreSQL and ZooKeeper
    depends_on:
      - dolphinscheduler-postgresql
      - dolphinscheduler-zookeeper
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    restart: unless-stopped
    networks:
      - dolphinscheduler

  dolphinscheduler-alert:
    image: apache/dolphinscheduler:1.3.8
    command: alert-server
    environment:
      TZ: Asia/Shanghai
    env_file: config.env.sh
    healthcheck:
      test: ["CMD", "/root/checkpoint.sh", "AlertServer"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      - dolphinscheduler-postgresql
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
    restart: unless-stopped
    networks:
      - dolphinscheduler

  dolphinscheduler-master:
    image: apache/dolphinscheduler:1.3.8
    command: master-server
    environment:
      TZ: Asia/Shanghai
    env_file: config.env.sh
    healthcheck:
      test: ["CMD", "/root/checkpoint.sh", "MasterServer"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      - dolphinscheduler-postgresql
      - dolphinscheduler-zookeeper
    volumes:
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
    restart: unless-stopped
    networks:
      - dolphinscheduler

  dolphinscheduler-worker:
    image: apache/dolphinscheduler:1.3.8
    command: worker-server
    environment:
      TZ: Asia/Shanghai
    env_file: config.env.sh
    healthcheck:
      test: ["CMD", "/root/checkpoint.sh", "WorkerServer"]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      - dolphinscheduler-postgresql
      - dolphinscheduler-zookeeper
    volumes:
      - dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - dolphinscheduler-shared-local:/opt/soft
      - dolphinscheduler-resource-local:/dolphinscheduler
    restart: unless-stopped
    networks:
      - dolphinscheduler

# Networks used by this Compose file
networks:
  dolphinscheduler:
    driver: bridge

# Volumes used by this Compose file
volumes:
  dolphinscheduler-postgresql:
  dolphinscheduler-zookeeper:
  dolphinscheduler-worker-data:
  dolphinscheduler-logs:
  dolphinscheduler-shared-local:
  dolphinscheduler-resource-local:
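Every service defines the TZ environment variable, which sets the container time zone to Asia/Shanghai. The restart policy of every service is unless-stopped: the container is restarted whenever it exits, unless it was explicitly stopped.
The end of the yml file declares the networks and volumes that the Compose file uses.
All services share one network, dolphinscheduler, whose driver is set to bridge. bridge is already the default; it is meant for applications that are deployed in separate containers on the same host and need to communicate with each other.
Each DS service module loads the same standalone environment file, config.env.sh, through env_file.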
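healthcheck defines the health check: it calls checkpoint.sh inside the container with the service name as its argument and checks whether that Java process exists. Checks run 30s apart with a timeout of 5s; a check that exceeds the timeout counts as failed. retries is set to 3: once that many consecutive checks have failed, the container is marked unhealthy.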
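The retry rule can be modeled in a few lines of Python. This is only a sketch of the counting logic described above, not Docker's actual implementation; the function name health_status and its boolean input format are hypothetical and exist purely for illustration.

```python
def health_status(results, retries=3):
    """Replay a sequence of probe results and return the final status.

    results: iterable of booleans, True if a probe passed (exit code 0
             within the timeout), False if it failed or timed out.
    retries: number of consecutive failures that marks the container unhealthy.
    """
    status = "starting"
    consecutive_failures = 0
    for passed in results:
        if passed:
            # Any passing probe resets the failure counter.
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status


print(health_status([True, False, False, True]))   # healthy
print(health_status([True, False, False, False]))  # unhealthy
```

With interval: 30s and retries: 3, a service whose Java process has died would therefore be reported unhealthy after roughly three probe intervals.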
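PostgreSQL: POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB define the PostgreSQL user name, the password, and a database named dolphinscheduler, respectively.
ZooKeeper: the environment variable ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons adds these four commands to the whitelist. A four-letter command that is not whitelisted is rejected with a message such as: stat is not executed because it is not in the whitelist. Note that stat itself is not among the four commands listed here, so it would still be rejected with this configuration.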
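To check the whitelist from code, a four-letter command can be sent over a plain TCP socket. The following is a minimal sketch using only the Python standard library; the helper name four_letter_word is hypothetical, and it assumes the ZooKeeper client port is reachable from where the script runs (the Compose file above does not publish port 2181 to the host).

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, calling four_letter_word("dolphinscheduler-zookeeper", 2181, "ruok") from a container attached to the dolphinscheduler network should return imok when the server is running, while a command outside the whitelist should return the "not in the whitelist" message instead.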
conf configuration
#============================================================================
# Database
#============================================================================
# postgresql
DATABASE_TYPE=postgresql
DATABASE_DRIVER=org.postgresql.Driver
DATABASE_HOST=dolphinscheduler-postgresql
DATABASE_PORT=5432
DATABASE_USERNAME=root
DATABASE_PASSWORD=root
DATABASE_DATABASE=dolphinscheduler
DATABASE_PARAMS=characterEncoding=utf8
# mysql
# DATABASE_TYPE=mysql
# DATABASE_DRIVER=com.mysql.jdbc.Driver
# DATABASE_HOST=dolphinscheduler-mysql
# DATABASE_PORT=3306
# DATABASE_USERNAME=root
# DATABASE_PASSWORD=root
# DATABASE_DATABASE=dolphinscheduler
# DATABASE_PARAMS=useUnicode=true&characterEncoding=UTF-8

#============================================================================
# ZooKeeper
#============================================================================
ZOOKEEPER_QUORUM=dolphinscheduler-zookeeper:2181
ZOOKEEPER_ROOT=/dolphinscheduler

#============================================================================
# Common
#============================================================================
# common opts
DOLPHINSCHEDULER_OPTS=
# common env
DATA_BASEDIR_PATH=/tmp/dolphinscheduler
RESOURCE_STORAGE_TYPE=HDFS
RESOURCE_UPLOAD_PATH=/dolphinscheduler
FS_DEFAULT_FS=file:///
FS_S3A_ENDPOINT=s3.xxx.amazonaws.com
FS_S3A_ACCESS_KEY=xxxxxxx
FS_S3A_SECRET_KEY=xxxxxxx
HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE=false
JAVA_SECURITY_KRB5_CONF_PATH=/opt/krb5.conf
LOGIN_USER_KEYTAB_USERNAME=hdfs@HADOOP.COM
LOGIN_USER_KEYTAB_PATH=/opt/hdfs.keytab
KERBEROS_EXPIRE_TIME=2
HDFS_ROOT_USER=hdfs
RESOURCE_MANAGER_HTTPADDRESS_PORT=8088
YARN_RESOURCEMANAGER_HA_RM_IDS=
YARN_APPLICATION_STATUS_ADDRESS=http://ds1:8088/ws/v1/cluster/apps/%s
# skywalking
SKYWALKING_ENABLE=false
SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
SW_GRPC_LOG_SERVER_HOST=127.0.0.1
SW_GRPC_LOG_SERVER_PORT=11800
# dolphinscheduler env
HADOOP_HOME=/opt/soft/hadoop
HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
SPARK_HOME1=/opt/soft/spark1
SPARK_HOME2=/opt/soft/spark2
PYTHON_HOME=/usr/bin/python
JAVA_HOME=/usr/local/openjdk-8
HIVE_HOME=/opt/soft/hive
FLINK_HOME=/opt/soft/flink
DATAX_HOME=/opt/soft/datax

#============================================================================
# Master Server
#============================================================================
MASTER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
MASTER_EXEC_THREADS=100
MASTER_EXEC_TASK_NUM=20
MASTER_DISPATCH_TASK_NUM=3
MASTER_HOST_SELECTOR=LowerWeight
MASTER_HEARTBEAT_INTERVAL=10
MASTER_TASK_COMMIT_RETRYTIMES=5
MASTER_TASK_COMMIT_INTERVAL=1000
MASTER_MAX_CPULOAD_AVG=-1
MASTER_RESERVED_MEMORY=0.3

#============================================================================
# Worker Server
#============================================================================
WORKER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
WORKER_EXEC_THREADS=100
WORKER_HEARTBEAT_INTERVAL=10
WORKER_MAX_CPULOAD_AVG=-1
WORKER_RESERVED_MEMORY=0.3
WORKER_GROUPS=default

#============================================================================
# Alert Server
#============================================================================
ALERT_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m
# xls file
XLS_FILE_PATH=/tmp/xls
# mail
MAIL_SERVER_HOST=
MAIL_SERVER_PORT=
MAIL_SENDER=
MAIL_USER=
MAIL_PASSWD=
MAIL_SMTP_STARTTLS_ENABLE=true
MAIL_SMTP_SSL_ENABLE=false
MAIL_SMTP_SSL_TRUST=
# wechat
ENTERPRISE_WECHAT_ENABLE=false
ENTERPRISE_WECHAT_CORP_ID=
ENTERPRISE_WECHAT_SECRET=
ENTERPRISE_WECHAT_AGENT_ID=
ENTERPRISE_WECHAT_USERS=

#============================================================================
# Api Server
#============================================================================
API_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m

#============================================================================
# Logger Server
#============================================================================
LOGGER_SERVER_OPTS=-Xms512m -Xmx512m -Xmn256m
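config.env.sh defines the configuration in use. It is passed into the containers through env_file, and its values override the default configuration inside the container.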
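The override behaviour can be illustrated with a small parser for KEY=VALUE lines. This is a simplified sketch, not the parser Compose uses: parse_env_file is a hypothetical helper, the defaults dictionary stands in for whatever defaults the image applies (its values are assumptions), and quoting rules are ignored.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # simplified: ignore lines without '='
        env[key.strip()] = value
    return env


sample = """\
# postgresql
DATABASE_HOST=dolphinscheduler-postgresql
DATABASE_PARAMS=characterEncoding=utf8
MASTER_SERVER_OPTS=-Xms1g -Xmx1g -Xmn512m
DOLPHINSCHEDULER_OPTS=
"""

# Hypothetical image defaults, used only to show the override.
defaults = {"DATABASE_HOST": "127.0.0.1", "DATABASE_PORT": "5432"}
effective = {**defaults, **parse_env_file(sample)}

print(effective["DATABASE_HOST"])    # dolphinscheduler-postgresql
print(effective["DATABASE_PORT"])    # 5432
print(effective["DATABASE_PARAMS"])  # characterEncoding=utf8
```

Keys present in the file replace the defaults, keys absent from the file keep them, and a line such as DOLPHINSCHEDULER_OPTS= yields an empty string rather than being dropped.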
References
Networking overview
ZooKeeper four-letter commands
ZooKeeper four-letter command reports that the command is not in the whitelist
The "env_file" configuration option