Before diving into production-ready Celery configuration, let’s first understand why we use Celery at all. Whenever an application runs data-intensive or long-running tasks, performance suffers and users have to wait until the work is completed.
That was tolerable in legacy systems, where users were accustomed to waiting even for a single page to load.
Modern users expect pages to load instantaneously. To achieve this we reach for solutions such as multiprocessing, multithreading, asynchronous functions using async/await, or message queues.
At a high level, message queuing is pretty simple. A process, called the Producer, publishes Messages to a Queue, where they are stored until a Consumer process is ready to consume them.
Celery reduces this load by running parts of the functionality as deferred tasks, either on the same server as other tasks or on a different server. These workers can then make changes in the database, update the UI via webhooks or callbacks, add items to the cache, process files, send emails, queue future tasks, and more! All while our main web server remains free to respond to user requests.
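To make the producer/queue/consumer roles concrete, here is a minimal sketch; the tasks module name, the send_welcome_email task, and the local RabbitMQ broker URL are illustrative assumptions, not from the original post:

# tasks.py -- minimal sketch; names and broker URL are illustrative
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_id):
    # Runs on a worker process, outside the web request/response cycle.
    print(f"Sending welcome email to user {user_id}")

# Producer side (e.g. inside a web view): publish a message and return immediately.
# send_welcome_email.delay(42)

A worker started with celery -A tasks worker acts as the consumer, picking messages off the queue as they arrive.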
But life is not as simple as it seems! Introducing Celery will not make all of these problems disappear. Celery is an open-source library with an enormous amount of configuration: it works beautifully in development, where there is little load, but deploy the default configuration to production and you get chaos.
Here is a production-ready Celery configuration that will keep your production environment stable.
Production-Ready Celery Configuration
1. Gossip, Mingle and Events
These Celery worker command-line arguments can decrease message rates substantially. Place them after the word ‘worker’ in your command, because the order of Celery options is strictly enforced in Celery 5.0. For example:
celery -A my_celery_app worker --without-heartbeat --without-gossip --without-mingle
Without these arguments, Celery will send hundreds of diagnostic and redundant heartbeat messages per second. Unfortunately, details about these settings have been removed from the current documentation, but the implementation has not changed. Read more about Celery worker functionality in the documentation.
Task events can also be disabled with a configuration parameter in your settings.py:
worker_send_task_events = False
2. Celery Ignore Results and celery backend
If you want to keep track of the tasks’ states, Celery needs to store or send the states somewhere. For this example, we use the RPC result backend, which sends states back as transient messages. The backend is specified via the backend argument to Celery (or via the result_backend setting if you choose to use a configuration module):
app = Celery('tasks', backend='rpc://', broker='pyamqp://')
Or if you want to use Redis as the result backend, but still use RabbitMQ as the message broker (a popular combination):
app = Celery('tasks', backend='redis://localhost', broker='pyamqp://')
For every task, Celery internally stores the task state in the celery_taskmeta table. Before starting a task it reads the table, and after processing it updates the row, which decreases performance.
Disabling the result backend and ignoring task results will improve the application’s performance, and you can build your own custom logic for storing results where you need them.
By default no result backend is configured in Celery; to make sure results are ignored as well, use the configuration below:
task_ignore_result = True
in a configuration file, or you can disable it per task:
@app.task(ignore_result=True)
def task_with_no_result():
    # ... code without a return value ...
    pass
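If you still need task outcomes after disabling the result backend, one option is to persist them yourself at the end of the task. A minimal sketch, assuming a local Redis instance and a hypothetical task_results key prefix:

import json

import redis
from celery import Celery

app = Celery('tasks', broker='pyamqp://')
r = redis.Redis(host='localhost', port=6379)  # assumed local Redis

@app.task(bind=True, ignore_result=True)
def process_order(self, order_id):
    result = {'order_id': order_id, 'status': 'processed'}
    # Store the outcome under our own key instead of celery_taskmeta,
    # with a one-hour expiry so stale results clean themselves up.
    r.set(f'task_results:{self.request.id}', json.dumps(result), ex=3600)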
3. Worker soft time limit/hard limit
Generally, Celery has no time limit for tasks; a task can run for an indefinite amount of time, which can make your message queue unresponsive. So it is good practice to set both a soft and a hard time limit.
Hard Limit
Global variable in config.py
task_time_limit=60 # task will be killed after 60 seconds
Task-specific settings
@app.task(time_limit=60)
def task_with_time_limit():
    pass
Task hard time limit in seconds. The worker processing the task will be killed and replaced with a new one when this is exceeded.
Soft Limit
Global variable in config.py
task_soft_time_limit=60
# task will raise exception SoftTimeLimitExceeded after 60 seconds
Task-specific settings
@app.task(soft_time_limit=50, time_limit=60)
def task_with_time_limit():
    pass
The task can catch this to clean up before the hard time limit comes:
from celery.exceptions import SoftTimeLimitExceeded

@app.task(soft_time_limit=50, time_limit=60)
def mytask():
    try:
        return do_work()
    except SoftTimeLimitExceeded:
        cleanup_in_a_hurry()
4. acks_late
By default, Celery acknowledges a task message before executing it. This prevents a task from running twice after an unexpected shutdown, which is a sane default because we cannot guarantee that every task every developer writes is safe to run twice. But if you proactively write idempotent and atomic tasks, turning on the task_acks_late setting will not harm your application and will instead make it more robust, since a task that dies mid-execution will be redelivered. If not all tasks can be written that way, you can set acks_late per task:
Global variable in Config.py
task_acks_late = True  # task messages will be acknowledged after the task has been executed, not just before (the default behavior)
Task-specific configurations
@app.task(acks_late=True)
def task_with_acks_late():
    pass
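For acks_late to be safe, the task body should tolerate being executed twice. A sketch of one common pattern, where Payment (a Django-style model with a unique constraint on external_id) and notify_accounting are hypothetical names for illustration:

@app.task(acks_late=True)
def record_payment(payment_id):
    # The unique constraint turns a redelivered message into a no-op
    # instead of a duplicate row, making the task idempotent.
    payment, created = Payment.objects.get_or_create(external_id=payment_id)
    if created:
        notify_accounting(payment)  # hypothetical downstream side effect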
5. -Ofair
By default, preforking Celery workers distribute tasks to their worker processes as soon as they are received, regardless of whether the process is currently busy with other tasks.
Say you have 20 tasks, each taking 1 second to finish, and you set up 4 worker processes to run through them:
celery worker -A ... -Q random-tasks --concurrency=4
This will take about 5 seconds to finish: 4 subprocesses, 5 tasks each.
But if the first task (task 1 of 20) takes 10 seconds instead of 1, how long does the whole queue take? Not 10 seconds but 14.
That is because the tasks get distributed evenly up front, so each subprocess gets 5 of the 20 tasks; the process holding the 10-second task still has 4 one-second tasks queued behind it, so it finishes at 10 + 4 = 14 seconds, while the other three processes sit idle after 5.
-Ofair disables this behavior: tasks are handed to worker processes only as they become available, instead of being pre-assigned.
celery worker -A ... -Ofair -Q random-tasks --concurrency=4
6. Prefetch limit
Default: 4.
This is how many messages to prefetch at a time, multiplied by the number of concurrent processes. The default is 4 (four messages for each process). The default setting is usually a good choice; however, if you have very long-running tasks waiting in the queue and you have to start the workers, note that the first worker to start will initially receive four times the number of messages, so tasks may not be fairly distributed among the workers.
To disable prefetching, set worker_prefetch_multiplier to 1. Changing that setting to 0 will allow the worker to keep consuming as many messages as it wants.
For tasks that perform network operations, it is best to set the prefetch multiplier to 1. If the task messages are small and do not touch the network, it is best to increase the limit to a somewhat larger number, such as 10.
Global configuration in config.py
worker_prefetch_multiplier = 10
# each worker process takes 10 messages from the queue at a time, which improves throughput for short tasks
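Conversely, for the long-running, network-bound case described above, the safer setting would be:

worker_prefetch_multiplier = 1
# each worker process reserves only one message beyond what it is executing,
# so slow tasks are not hoarded by a busy worker while others sit idle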
7. Worker concurrency
This is how many processes, threads, or green threads should process tasks at the same time.
If your workload is CPU-bound, limit it to the number of cores you have (this is the default); more will only slightly decrease performance.
celery worker -A ... -Q random-tasks --concurrency=4
But if you’re doing I/O (outgoing HTTP requests, talking to a database, or any other external service), then you can increase it a lot and gain a lot of performance; 200–500 is not unheard of.
From the Celery documentation: the prefork pool can make use of multiple processes, but the number is often limited to a few processes per CPU. With Eventlet you can efficiently spawn hundreds or thousands of green threads. In an informal test with a feed-hub system, the Eventlet pool could fetch and process hundreds of feeds every second, while the prefork pool spent 14 seconds processing 100 feeds. Note that this is one of the applications that asynchronous I/O is especially good at (asynchronous HTTP requests). You may want a mix of both Eventlet and prefork workers, routing tasks according to compatibility or whatever works best.
celery -A proj worker -P eventlet -c 1000
Summarizing everything in code
celery_config.py
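A sketch of what celery_config.py could look like, assembling the settings discussed above; the broker URL and the specific limit values are placeholders to adapt to your workload:

# celery_config.py -- sketch assembling the settings above
broker_url = 'pyamqp://guest@localhost//'  # placeholder broker URL

# 1. Reduce worker event chatter (gossip/mingle are disabled on the command line)
worker_send_task_events = False

# 2. Ignore task results; add custom result handling where needed
task_ignore_result = True

# 3. Soft limit raises SoftTimeLimitExceeded; hard limit kills the worker process
task_soft_time_limit = 50
task_time_limit = 60

# 4. Acknowledge messages only after execution (tasks must be idempotent)
task_acks_late = True

# 6. Prefetch multiplier: 1 for long/network-bound tasks, ~10 for short ones
worker_prefetch_multiplier = 1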
celery_app.py
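And a sketch of celery_app.py wiring that configuration in; the app name and task are placeholders:

# celery_app.py -- sketch; app and task names are placeholders
from celery import Celery

app = Celery('my_celery_app')
app.config_from_object('celery_config')  # load the settings module above

@app.task
def example_task():
    pass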
Command to run
CPU-bound task
celery -A <task> worker -l info -n <worker name> -c 4 -Ofair -Q <queue name> --without-gossip --without-mingle --without-heartbeat
I/O-bound task
celery -A <task> worker -l info -n <worker name> -Ofair -Q <queue name> -P eventlet -c 1000 --without-gossip --without-mingle --without-heartbeat
Conclusion
So this was my curated list of production-ready Celery configuration for Celery workers. I hope it has helped you or will help you. If you want to discuss this or anything else tech-related, you can contact me here or on the Contact Page.