DataCROP Maize Processing Engine Worker Deployment

Use this page when following the manual per-repository setup. If you use Maize MVP, the worker is deployed by the MVP script; refer here only for customization or troubleshooting. See Maize Setup for the two setup options.

This is a demo deployment instance for the Maize DataCROP version. It deploys a Worker responsible for handling tasks within the DataCROP Workflow Management Engine. The deployment consists of a single container.

Overview

The deployment uses Apache Airflow with the CeleryExecutor for distributed task execution within the DataCROP system. Below is an explanation of the different components and configurations defined in the docker-compose.yml file.

Airflow Worker Setup

  • The airflow-worker service is set up using Airflow’s CeleryExecutor to manage distributed task execution.
  • The worker communicates with two backing services (see the configuration sketch after this list):
    • Redis: Used as the message broker for Celery.
    • PostgreSQL: Used as the Celery result backend for storing task results.
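
The exact connection settings live in docker-compose.yml and the .env file. As an illustrative sketch only (host names, database name, and credentials below are placeholders, not values from this repository), a CeleryExecutor worker is typically configured through Airflow's standard environment variables:

      # Run tasks through Celery instead of a local executor
      AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      # Celery message broker: Redis over TLS
      AIRFLOW__CELERY__BROKER_URL=rediss://<redis-host>:${REDIS_TLS_PORT}/0
      # Celery result backend: PostgreSQL on the central Airflow host
      AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://<db-user>:<db-password>@<postgres-host>:5432/airflow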

Volumes

The following directories are mounted into the Airflow worker container to persist data and provide necessary resources (they can be pre-created on the host as shown after the list):

  • DAGs: Task definitions are stored in the ./dags folder.
  • Logs: Logs generated by Airflow are stored in the ./logs folder.
  • Data: Input and output data for tasks are stored in the ./data folder.
  • Models: Model data is stored in the ./models folder.
  • Plugins: Airflow plugins can be added via the ./plugins folder.
  • .env: The .env file is used to handle dynamic environment variables.
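
Before the first start, it can help to create these folders on the host so that Docker does not create them as root-owned directories. A minimal sketch (standard shell commands, assuming you are in the repository root next to docker-compose.yml):

      mkdir -p ./dags ./logs ./data ./models ./plugins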

Requirements

Prerequisites

Before proceeding, ensure that you have followed the setup instructions for the Airflow Processing Engine.

After completing the setup, follow these steps to configure your environment variables:

  1. In the Processing Engine Worker repository, edit the .env file and ensure that all necessary environment variables are set correctly for your deployment. The current values from maize-processing-engine-worker/.env are shown below; sensitive secrets are redacted, so keep using the real values already present in your .env.

     # HOST              ||  DC.C
     AIRFLOW_IP=<AIRFLOW_HOST_IP>
     AIRFLOW_WEB_SECRET_KEY=[REDACTED – keep existing value in your .env]
     AIRFLOW_FERNET_KEY=[REDACTED – keep existing value in your .env]
     HOST_IP=<WORKER_HOST_IP>
     _PIP_ADDITIONAL_REQUIREMENTS=''
     AIRFLOW_UID=1002
     AIRFLOW_GID=0
    
     # WORKER            ||  DC.W
     WORKER_NAME=<WORKER_NAME>
     WORKER_SSL_KEY_FILE=/security/${WORKER_NAME}/${WORKER_NAME}-key.pem
     WORKER_SSL_CERT_FILE=/security/${WORKER_NAME}/${WORKER_NAME}.pem
     WORKER_SSL_CERT_STORE=/security/ca/rootCA.pem
    
     # Please check the GID of the docker group on the host
     DOCKER_GID=988
    
     # REDIS             ||  DC.C
     REDIS_TLS_PORT=6379
     REDIS_TLS_CERT_FILE=/security/redis/redis.pem
     REDIS_TLS_KEY_FILE=/security/redis/redis-key.pem
     REDIS_TLS_CA_CERT_FILE=/security/ca/rootCA.pem
     REDIS_TLS_CLIENT_CERT_FILE=/security/redis/redis-client.pem
     REDIS_TLS_CLIENT_KEY_FILE=/security/redis/redis-client-key.pem
    
     # CELERY            ||  DC.C
     CELERY_WEB_UNAME=[REDACTED – keep existing value in your .env]
     CELERY_WEB_PSSWD=[REDACTED – keep existing value in your .env]
    

    Adjust only if your deployment differs (e.g., different IPs or worker name); do not publish or rotate the redacted secrets already set in your .env.
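
    The .env comments above ask you to check the GID of the docker group on the host. One way to look up suitable values for DOCKER_GID and AIRFLOW_UID (standard Linux commands; adjust to your environment):

      # GID of the host's docker group (value for DOCKER_GID)
      getent group docker | cut -d: -f3
      # UID of the user that should own the mounted folders (value for AIRFLOW_UID)
      id -u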

Once these parameters are correctly set, you can proceed with the deployment.

Start the Application

  1. Navigate to the source directory containing the docker-compose.yml file.
  2. Run the following command:

     docker compose up -d
    
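If you want to watch the worker start up, you can follow its logs. A minimal sketch, assuming the service is named airflow-worker as described in the overview above:

      docker compose logs -f airflow-worker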

Verify that everything is up and running

Wait for the services to start, then run the following commands:

  • Check that the container is running (replace [worker_name] with the actual worker name that you specified in the .env file):

      docker ps --filter name=[worker_name] --format "table {{.Image}}\t{{.Names}}"
    

     You should see output similar to the following:

      IMAGE                                        NAMES
      [worker_name]-airflow-worker                [worker_name]
    
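  • Optionally, inspect the worker container's logs to confirm it started cleanly and connected to the broker (a generic Docker command; replace [worker_name] as above):

      docker logs --tail 50 [worker_name]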

Make Sure Everything Works

  1. Open a browser and navigate to the Flower web app (http://{Your IP}:5555/workers).
  2. Enter the Celery credentials provided by your organization.
  3. After successful authentication, you will be redirected to the workers page, where the newly created worker should appear in the workers table. If its status is marked as online, the setup completed successfully (a command-line check is sketched after this list).
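
If you prefer the command line, Flower also exposes a JSON API. A minimal sketch, assuming basic-auth credentials and the default port (adjust to your setup):

      curl -u <celery_user>:<celery_password> http://{Your IP}:5555/api/workers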

Stop Everything

Navigate to the source directory and run the following command:

      docker compose down
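
If you also want to remove any named volumes declared in the compose file (this deletes their data), add the -v flag:

      docker compose down -v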