DataCROP Maize Model Repository Deployment
Use this page when following the manual per-repository setup. If you use Maize MVP, the model repository is deployed by the MVP script; refer here only for customization or troubleshooting. See Maize Setup for the two setup options.
This is a demo deployment instance for the Maize DataCROP version. It deploys the DataCROP Model Repository infrastructure, consisting of the WME server plus supporting containers (MongoDB and the Elastic Stack services used by Logstash/Kibana pipelines).
Requirements
Prerequisites
Before proceeding, make sure you have completed the following steps:
- Airflow Setup:
  - Ensure that you have followed the setup instructions for both the Airflow Processing Engine and the Processing Engine Worker. These components need to be properly configured and running before deploying the Maize DataCROP Model Repository (a quick reachability check is sketched below).
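Before moving on, it can help to confirm that both components are actually reachable from the deployment host. A minimal sketch, assuming the standard Airflow webserver `/health` endpoint and the `WORKER_API_PORT` shown later in `.env`; the IPs here are placeholders for your own values:

```sh
# Quick reachability check (substitute the IPs you will also put in .env).
# The Airflow webserver exposes a /health endpoint; the worker API's
# endpoints are deployment-specific, so we only probe its TCP port.
AIRFLOW_HOST=192.0.2.10   # placeholder for your VM_WME_IP
WORKER_HOST=192.0.2.11    # placeholder for your VM_WORKER_IP
curl -s "http://${AIRFLOW_HOST}:8080/health"
nc -z "$WORKER_HOST" 8090 && echo "worker API port reachable"
```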
After completing the setup, follow these steps to configure your environment variables:
- Navigate to your environment variable file (e.g., `.env` or the relevant configuration file for your deployment).
- Update the file with the correct values for your infrastructure. Below are the current values from `maize-model-repository/.env` and `docker-compose.yml`; sensitive secrets are redacted, so keep using the real values already present in your `.env`.

```env
# Application
SERVER_PORT=9090
MAX_FILE_SIZE=200MB
MAX_REQUEST_SIZE=500MB

# Workflow Management Engine
VM_WME_IP=<YOUR_IP>
VM_WORKER_IP=<YOUR_IP>
WEBSERVER_DAGS_FOLDER=/path/to/maize-processing-engine-airflow/dags
WORKER_API_PORT=8090

# Harbor
HARBOR_URL=harbor.example.com/
HARBOR_USERNAME=<HARBOR_USERNAME>
HARBOR_TOKEN=[REDACTED – keep existing value in your .env]

# MongoDB
MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=[REDACTED – keep existing value in your .env]
MONGO_USERNAME=root
MONGO_PASSWORD=[REDACTED – keep existing value in your .env]
MONGO_DATABASE=registry
MONGO_PORT=27017
MONGO_HOST=<MONGO_HOST>

# Kafka
KAFKA_ENABLED=false
KAFKA_BOOTSTRAP_SERVERS=<KAFKA_BOOTSTRAP_SERVERS>

# Logstash
LOGSTASH_CONFIG_FOLDER=/app/logstash/config/
LOGSTASH_PIPELINE_FOLDER=/app/logstash/pipeline/

# Keycloak
KEYCLOAK_ISSUER_URI=https://keycloak.example.com/realms/YOUR-REALM
KEYCLOAK_PROVIDER=<KEYCLOAK_PROVIDER>
KEYCLOAK_CLIENT_NAME=<KEYCLOAK_CLIENT_NAME>
KEYCLOAK_CLIENT_ID=<KEYCLOAK_CLIENT_ID>
KEYCLOAK_CLIENT_SECRET=[REDACTED – keep existing value in your .env]
KEYCLOAK_SCOPE=openid,offline_access,profile,roles
KEYCLOAK_USER_NAME_ATTR=preferred_username
KEYCLOAK_JWK_SET_URI=https://keycloak.example.com/realms/YOUR-REALM/protocol/openid-connect/certs

# Credentials Encryption
CREDENTIALS_ENCRYPTION_KEY=<BASE64_32_BYTE_KEY>

# Elastic Stack
ELASTIC_VERSION=8.15.3
ELASTIC_PASSWORD=[REDACTED – keep existing value in your .env]
LOGSTASH_INTERNAL_PASSWORD=[REDACTED – keep existing value in your .env]
KIBANA_SYSTEM_PASSWORD=[REDACTED – keep existing value in your .env]
METRICBEAT_INTERNAL_PASSWORD=[REDACTED – keep existing value in your .env]
FILEBEAT_INTERNAL_PASSWORD=[REDACTED – keep existing value in your .env]
HEARTBEAT_INTERNAL_PASSWORD=[REDACTED – keep existing value in your .env]
MONITORING_INTERNAL_PASSWORD=[REDACTED – keep existing value in your .env]
BEATS_SYSTEM_PASSWORD=[REDACTED – keep existing value in your .env]

# Airflow (WME integration)
AIRFLOW_BASE_URL=http://<AIRFLOW_HOST>:8080/api/v1
AIRFLOW_USERNAME=<AIRFLOW_USERNAME>
AIRFLOW_PASSWORD=[REDACTED – keep existing value in your .env]
```
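Two quick sanity checks on the file itself: `CREDENTIALS_ENCRYPTION_KEY` expects a base64-encoded 32-byte key, which `openssl` can generate, and a simple `grep` can flag placeholders that still need real values. Both commands are suggestions, not part of the official setup:

```sh
# Generate a base64-encoded 32-byte key for CREDENTIALS_ENCRYPTION_KEY
# (assumes openssl is installed).
openssl rand -base64 32

# Flag any angle-bracket placeholders left in .env that still need real values.
grep -nE '<[A-Z_]+>' .env
```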
Defaults created by Initialize resources
When a Workflow Editor user clicks Settings → Initialize resources, the Model Repository seeds a baseline catalog (only if the resources don’t already exist). This includes default data interface types and processor definitions.
Default data interface types
These interface types are intentionally aligned with the editor’s automatic Logstash pipeline creation: the editor uses these type names and fields to generate Logstash input/output configuration automatically.
- `elasticsearch`: `hosts: 167.235.128.77:9200`, `user: logstash_internal`, `password: ${LOGSTASH_INTERNAL_PASSWORD}`, `index: test_index`
- `kafka`: `bootstrap_servers: 167.235.128.77:9092`, `topic_id: giannis_processed`
- `http`: `url: http://localhost:8080/api`, `port: 8080`, `http_method: post`, `format: json`, `user: ""`, `password: ""`
- `mongodb`: `uri: mongodb://localhost:27017/mydb`, `database: mydb`, `collection: mycollection`
- `s3`: `bucket: my-bucket`, `region: eu-central-1`, `endpoint: ""` (empty means AWS S3; otherwise can be e.g. `http://minio:9000`), `access_key_id: ""`, `secret_access_key: ""`, `prefix: logs/`
- `redis`: `host: localhost`, `port: 6379`, `data_type: list` (supported: `list`, `channel`, `pattern_channel`), `key: mylist`, `password: ""`
- `rabbitmq`: `host: localhost`, `port: 5672`, `user: guest`, `password: guest`, `queue: myqueue` (input), `exchange: myexchange` (output), `exchange_type: direct` (supported: `direct`, `topic`, `fanout`), `vhost: /`
- `mqtt`: `broker: tcp://localhost`, `port: 1883`, `topic: sensor/data`, `username: ""`, `password: ""`, `client_id: modul4r-client`, `qos: 0`, `clean_session: true` (not Logstash-compatible by default)
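Since these defaults embed deployment-specific endpoints, a quick reachability sketch can save debugging later. The hosts and credentials below are the seeded example values from the list above; substitute your own:

```sh
# Probe the Elasticsearch endpoint used by the seeded elasticsearch type.
curl -s -u "logstash_internal:${LOGSTASH_INTERNAL_PASSWORD}" \
  "http://167.235.128.77:9200"

# Probe the Kafka broker port used by the seeded kafka type.
nc -z 167.235.128.77 9092 && echo "kafka broker port reachable"
```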
Default processor definitions
Initialize resources also creates these processor definitions (if missing):
- `Apache Kafka` (Data Persistence, 0.1): provisions Kafka + AKHQ.
- `Kibana Pipeline` (Datacrop Service, 1.0): enables the built-in Kibana pipeline (`active=true`).
- `Logstash Pipeline` (Datacrop Service, 1.0): enables the built-in Logstash pipeline (`active=true`); `logstash_filter` defaults to empty and expects filter plugin content only (no `filter {}` wrapper; see the illustration below).
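For the `Logstash Pipeline` definition, note that `logstash_filter` takes only the inner plugin content. A minimal illustration, held in a shell variable for clarity; `mutate` is a standard Logstash filter plugin, and the field values are made up:

```sh
# Valid value for logstash_filter: filter plugin content only,
# e.g. a bare mutate block, NOT wrapped in "filter { ... }".
LOGSTASH_FILTER='mutate { add_field => { "stage" => "maize-demo" } }'
```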
Optional: Predefining processor definitions (before initialization)
In addition to the defaults above, deployers can ship predefined processor definitions that will be imported when a Workflow Editor user clicks Initialize resources. This lets each deployed instance come up with a customized processor catalog.
How to use
- Create `config/extra-processors.json` (use the template file as a starting point):

```sh
cp config/extra-processors.example.json config/extra-processors.json
```

- Edit `config/extra-processors.json`: Kafka is just an example in the template; rename the processor `name` and/or replace the entry with your own processors (a validation snippet follows this list).
- Ensure the file is mounted into the Model Repository container (already present in `docker-compose.yml`):

```yaml
./config/extra-processors.json:/app/config/extra-processors.json:ro
```
- Deploy the Model Repository, then in the Workflow Editor go to Settings → Initialize resources (see Workflow Editor Setup).
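Before the deploy step, it is worth confirming that the file parses and has the expected shape. One way to do this, assuming `jq` is installed:

```sh
# Fails loudly on malformed JSON; otherwise prints the processor names
# that will be imported on "Initialize resources".
jq -r '.processors[].name' config/extra-processors.json
```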
File format (schema)
The server expects a root object with a `processors` array:
- Root: `{ "processors": [ ... ] }`
- Each processor: `name`, `description`, `processorType`, `version`, `copyright`, `processorLocation`, `fontAwesomeIcon`, `projectName`, `containerImage`
  - `parameters`: a list of `{ "name", "description", "type", "defaultValue" }` objects
Example (from `config/extra-processors.example.json`):

```json
{
"processors": [
{
"name": "Kafka Example",
"description": "This processor is used for building a Kafka cluster alongside the akhq frontend for visualizations",
"processorType": "Data Persistence",
"version": "0.1",
"copyright": "Apache",
"processorLocation": "Local Deployment",
"fontAwesomeIcon": "fa-solid fa-bus",
"projectName": "test",
"containerImage": "",
"parameters": [
{
"name": "KAFKA_NETWORK",
"description": "",
"type": "String",
"defaultValue": "kafka-network"
},
{
"name": "KAFKA_DATA",
"description": "",
"type": "String",
"defaultValue": "kafka-data"
},
{
"name": "KAFKA_HOSTNAME",
"description": "",
"type": "String",
"defaultValue": "kafka"
},
{
"name": "KAFKA_CONTAINER_NAME",
"description": "",
"type": "String",
"defaultValue": "kafka"
},
{
"name": "KAFKA_EXTERNAL_PORT",
"description": "",
"type": "String",
"defaultValue": "9092"
},
{
"name": "KAFKA_EXTERNAL_HOSTNAME_OR_IP",
"description": "",
"type": "String",
"defaultValue": "167.235.128.77"
},
{
"name": "KAFKA_INTERNAL_PORT",
"description": "",
"type": "String",
"defaultValue": "9094"
},
{
"name": "KAFKA_INTERNAL_HOSTNAME_OR_IP",
"description": "",
"type": "String",
"defaultValue": "kafka"
},
{
"name": "CLUSTER_ID",
"description": "kraft mode cluster id",
"type": "String",
"defaultValue": "cluster-id"
},
{
"name": "AKHQ_CONTAINER_NAME",
"description": "",
"type": "String",
"defaultValue": "akhq"
},
{
"name": "AKHQ_IMAGE",
"description": "",
"type": "String",
"defaultValue": "0.24.0"
},
{
"name": "AKHQ_PORT",
"description": "",
"type": "String",
"defaultValue": "8081"
},
{
"name": "AKHQ_CONNECTION_NAME_PREFIX",
"description": "",
"type": "String",
"defaultValue": "kafka-connection"
}
]
}
]
}
```
Behavior notes
- If the file is missing or invalid, initialization continues without failing.
- If a processor definition with the same `name` already exists, it is skipped (not overwritten).
- Changing `name` (for example, adding `v2`) creates a separate processor definition.
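To see how an import went on your instance, the container logs are the first place to look. The exact log wording is implementation-specific; `wme-container` is the container name used later on this page:

```sh
# Grep the Model Repository logs for processor-related import messages.
docker logs wme-container 2>&1 | grep -i "processor"
```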
Notes about default values created during initialization
The Initialize resources action creates the default interface templates and processor definitions listed above with deployment-specific defaults (for example IPs/ports for Elasticsearch/Kafka). Review and update the created entities in the UI after initialization if the defaults do not match your environment.
Once these parameters are correctly set, you can proceed with the deployment.
Starting the Application
- Navigate to the source directory containing the `Dockerfile` and `docker-compose.yml` files.
- Run the following commands:

```sh
docker build -t wme .
docker compose up -d
```
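To follow the startup before verifying individual containers, the standard Compose commands are enough:

```sh
# List the services defined in docker-compose.yml and their state,
# then tail the logs of all of them while they come up.
docker compose ps
docker compose logs -f
```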
Verifying the Deployment
Wait for the services to start, then run the following commands:
- Check if the WME container is running:

```sh
docker ps --filter name=wme-container --format "table {{.Image}}\t{{.Names}}"
```

You should see the following output:

```
IMAGE     NAMES
wme       wme-container
```

- Check if the MongoDB container is running:

```sh
docker ps --filter name=mongo --format "table {{.Image}}\t{{.Names}}"
```

You should see the following output:

```
IMAGE          NAMES
mongo:latest   mongodb-container
```
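Beyond the container being listed, you can check that MongoDB actually answers authenticated commands. A sketch, assuming the image ships `mongosh` and using the root credentials from `.env`:

```sh
# Ping the server through an authenticated shell session.
docker exec mongodb-container mongosh --username root \
  --password "$MONGO_PASSWORD" --authenticationDatabase admin \
  --eval 'db.runCommand({ ping: 1 })'
```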
Stopping the Application
To stop the containers, run the following command:
```sh
docker compose down
```
Cleaning everything up
To also remove volumes and orphaned containers, run the following command (at your own risk; this permanently deletes persisted data such as the MongoDB volume):
```sh
docker compose down --volumes --remove-orphans
```