Before diving into production-ready Celery configuration, let’s first understand why we use Celery at all. Whenever an application runs data-intensive or long-running tasks, performance suffers and users have to wait until the work is completed.
That was tolerable in legacy systems, where users were accustomed to waiting even for a single page to load.
Modern users expect pages to load instantaneously. To achieve this we reach for solutions such as multiprocessing, multithreading, asynchronous functions using async/await, or message queues.
At a high level, message queuing is pretty simple. A process, called the Producer, publishes Messages to a Queue, where they are stored until a Consumer process is ready to consume them.
Celery reduces this load by running parts of the functionality as deferred tasks, either on the same server as other tasks or on a different server. These workers can then make changes in the database, update the UI via webhooks or callbacks, add items to the cache, process files, send emails, queue future tasks, and more! All while our main web server remains free to respond to user requests.
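To make the producer/queue/consumer roles concrete, here is a minimal sketch; the tasks module name, the send_welcome_email task, and the local RabbitMQ broker URL are illustrative assumptions, not from the original post:

# tasks.py -- minimal sketch; names and broker URL are illustrative
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_id):
    # Runs on a worker process, outside the web request/response cycle.
    print(f"Sending welcome email to user {user_id}")

# Producer side (e.g. inside a web view): publish a message and return immediately.
# send_welcome_email.delay(42)

A worker started with celery -A tasks worker acts as the consumer, picking messages off the queue as they arrive.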
But life is not as simple as it seems! Introducing Celery will not make all of these problems disappear. Celery is an open-source library with an enormous amount of configuration: it works beautifully in development, where there is little load, but deploy the default configuration to production and you get chaos.
Here is a production-ready Celery configuration that will keep your production environment stable.
Production-Ready Celery Configuration
1. Gossip, Mingle and Events
These Celery worker command-line arguments can decrease message rates substantially. Place them after the word ‘worker’ in your command, because the order of Celery options is strictly enforced in Celery 5.0. For example:
celery -A my_celery_app worker --without-heartbeat --without-gossip --without-mingle
Without these arguments, Celery will send hundreds of diagnostic and redundant heartbeat messages per second. Unfortunately, details about these settings have been removed from the current documentation, but the implementation has not changed. Read more about Celery worker functionality in the documentation.
Task events can also be disabled with a configuration parameter in your settings.py:
worker_send_task_events = False
2. Celery Ignore Results and celery backend
If you want to keep track of the tasks’ states, Celery needs to store or send the states somewhere. For this example, we use the RPC result backend, which sends states back as transient messages. The backend is specified via the backend argument to Celery (or via the result_backend setting if you choose to use a configuration module):
app = Celery('tasks', backend='rpc://', broker='pyamqp://')
Or if you want to use Redis as the result backend, but still use RabbitMQ as the message broker (a popular combination):
app = Celery('tasks', backend='redis://localhost', broker='pyamqp://')
For every task, Celery internally stores the task state in the celery_taskmeta table. Before starting a task it reads the table, and after processing it updates the row, which decreases performance.
Disabling the result backend and ignoring task results will improve the application’s performance, and you can build your own custom logic for storing results where you need them.
By default no result backend is configured in Celery; to make sure results are ignored as well, use the configuration below:
task_ignore_result = True
in a configuration file, or you can disable it per task:
@app.task(ignore_result=True)
def task_with_no_result():
    # ... code without a return value ...
    pass
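If you still need task outcomes after disabling the result backend, one option is to persist them yourself at the end of the task. A minimal sketch, assuming a local Redis instance and a hypothetical task_results key prefix:

import json

import redis
from celery import Celery

app = Celery('tasks', broker='pyamqp://')
r = redis.Redis(host='localhost', port=6379)  # assumed local Redis

@app.task(bind=True, ignore_result=True)
def process_order(self, order_id):
    result = {'order_id': order_id, 'status': 'processed'}
    # Store the outcome under our own key instead of celery_taskmeta,
    # with a one-hour expiry so stale results clean themselves up.
    r.set(f'task_results:{self.request.id}', json.dumps(result), ex=3600)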
3. Worker soft time limit/hard limit
Generally, Celery has no time limit for tasks; a task can run for an indefinite amount of time, which can make your message queue unresponsive. So it is good practice to set both a soft and a hard time limit.
Hard Limit
Global variable in config.py
task_time_limit=60 # task will be killed after 60 seconds
Task-specific settings
@app.task(time_limit=60)
def task_with_time_limit():
    pass
Task hard time limit in seconds. The worker processing the task will be killed and replaced with a new one when this is exceeded.
Soft Limit
Global variable in config.py
task_soft_time_limit=60
# task will raise exception SoftTimeLimitExceeded after 60 seconds
Task-specific settings
@app.task(soft_time_limit=50, time_limit=60)
def task_with_time_limit():
    pass
The task can catch this to clean up before the hard time limit comes:
from celery.exceptions import SoftTimeLimitExceeded

@app.task(soft_time_limit=50, time_limit=60)
def mytask():
    try:
        return do_work()
    except SoftTimeLimitExceeded:
        cleanup_in_a_hurry()
4. acks_late
By default, Celery acknowledges a task message before executing it. This prevents a task from running twice after an unexpected shutdown, which is a sane default because we cannot guarantee that every task every developer writes is safe to run twice. But if you proactively write idempotent and atomic tasks, turning on the task_acks_late setting will not harm your application and will instead make it more robust, since a task that dies mid-execution will be redelivered. If not all tasks can be written that way, you can set acks_late per task:
Global variable in Config.py
task_acks_late = True  # task messages will be acknowledged after the task has been executed, not just before (the default behavior)
Task-specific configurations
@app.task(acks_late=True)
def task_with_acks_late():
    pass
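For acks_late to be safe, the task body should tolerate being executed twice. A sketch of one common pattern, where Payment (a Django-style model with a unique constraint on external_id) and notify_accounting are hypothetical names for illustration:

@app.task(acks_late=True)
def record_payment(payment_id):
    # The unique constraint turns a redelivered message into a no-op
    # instead of a duplicate row, making the task idempotent.
    payment, created = Payment.objects.get_or_create(external_id=payment_id)
    if created:
        notify_accounting(payment)  # hypothetical downstream side effect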
5. -Ofair
By default, preforking Celery workers distribute tasks to their worker processes as soon as they are received, regardless of whether the process is currently busy with other tasks.
Say you have 20 tasks, each taking 1 second to finish, and you set up 4 worker processes to run through them:
celery worker -A ... -Q random-tasks --concurrency=4
This will take about 5 seconds to finish: 4 subprocesses, 5 tasks each.
But if the first task (task 1 of 20) takes 10 seconds instead of 1, how long does the whole queue take? Not 10 seconds but 14.
That is because the tasks get distributed evenly up front, so each subprocess gets 5 of the 20 tasks; the process holding the 10-second task still has 4 one-second tasks queued behind it, so it finishes at 10 + 4 = 14 seconds, while the other three processes sit idle after 5.
-Ofair disables this behavior: tasks are handed to worker processes only as they become available, instead of being pre-assigned.
celery worker -A ... -Ofair -Q random-tasks --concurrency=4
6. Prefetch limit
Default: 4.
This is how many messages to prefetch at a time, multiplied by the number of concurrent processes. The default is 4 (four messages for each process). The default setting is usually a good choice; however, if you have very long-running tasks waiting in the queue and you have to start the workers, note that the first worker to start will initially receive four times the number of messages, so tasks may not be fairly distributed among the workers.
To disable prefetching, set worker_prefetch_multiplier to 1. Changing that setting to 0 will allow the worker to keep consuming as many messages as it wants.
For tasks that perform network operations, it is best to set the prefetch multiplier to 1. If the task messages are small and do not touch the network, it is best to increase the limit to a somewhat larger number, such as 10.
Global configuration in config.py
worker_prefetch_multiplier = 10
# each worker process takes 10 messages from the queue at a time, which improves throughput for short tasks
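Conversely, for the long-running, network-bound case described above, the safer setting would be:

worker_prefetch_multiplier = 1
# each worker process reserves only one message beyond what it is executing,
# so slow tasks are not hoarded by a busy worker while others sit idle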
7. Worker concurrency
This is how many processes, threads, or green threads should process tasks at the same time.
If your workload is CPU-bound, limit it to the number of cores you have (this is the default); more will only slightly decrease performance.
celery worker -A ... -Q random-tasks --concurrency=4
But if you’re doing I/O (outgoing HTTP requests, talking to a database, or any other external service), then you can increase it a lot and gain a lot of performance; 200–500 is not unheard of.
From the Celery documentation: the prefork pool can make use of multiple processes, but the number is often limited to a few processes per CPU. With Eventlet you can efficiently spawn hundreds or thousands of green threads. In an informal test with a feed-hub system, the Eventlet pool could fetch and process hundreds of feeds every second, while the prefork pool spent 14 seconds processing 100 feeds. Note that this is one of the applications that asynchronous I/O is especially good at (asynchronous HTTP requests). You may want a mix of both Eventlet and prefork workers, routing tasks according to compatibility or whatever works best.
celery -A proj worker -P eventlet -c 1000
Summarizing everything in code
celery_config.py
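A sketch of what celery_config.py could look like, assembling the settings discussed above; the broker URL and the specific limit values are placeholders to adapt to your workload:

# celery_config.py -- sketch assembling the settings above
broker_url = 'pyamqp://guest@localhost//'  # placeholder broker URL

# 1. Reduce worker event chatter (gossip/mingle are disabled on the command line)
worker_send_task_events = False

# 2. Ignore task results; add custom result handling where needed
task_ignore_result = True

# 3. Soft limit raises SoftTimeLimitExceeded; hard limit kills the worker process
task_soft_time_limit = 50
task_time_limit = 60

# 4. Acknowledge messages only after execution (tasks must be idempotent)
task_acks_late = True

# 6. Prefetch multiplier: 1 for long/network-bound tasks, ~10 for short ones
worker_prefetch_multiplier = 1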
celery_app.py
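And a sketch of celery_app.py wiring that configuration in; the app name and task are placeholders:

# celery_app.py -- sketch; app and task names are placeholders
from celery import Celery

app = Celery('my_celery_app')
app.config_from_object('celery_config')  # load the settings module above

@app.task
def example_task():
    pass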
Command to run
CPU-bound task
celery -A <task> worker -l info -n <worker name> -c 4 -Ofair -Q <queue name> --without-gossip --without-mingle --without-heartbeat
I/O-bound task
celery -A <task> worker -l info -n <worker name> -Ofair -Q <queue name> -P eventlet -c 1000 --without-gossip --without-mingle --without-heartbeat
Conclusion
So this was my curated list of production-ready Celery configuration for Celery workers. I hope it has helped you or will help you. If you want to discuss this or anything else tech-related, you can contact me here or on the Contact Page.