Asynchronous Workers
Sentry comes with a built-in queue to process tasks in a more asynchronous fashion. For example, when an event comes in, instead of writing it to the database immediately, Sentry sends a job to the queue so that the request can return right away, and the background workers handle actually saving that data.
Sentry relies on the Celery library for managing workers.
Registering a Task
Sentry configures tasks with a special decorator that gives us more explicit control over the callable.
```python
from sentry.tasks.base import instrumented_task


@instrumented_task(
    name="sentry.tasks.do_work",
    queue="important_queue",
    default_retry_delay=60 * 5,
    max_retries=None,
)
def do_work(kind_of_work, **kwargs):
    # ...
```
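Assuming `instrumented_task` returns a standard Celery task (the argument value below is illustrative), callers enqueue work through the usual Celery methods:

```python
# Send a message to the queue; the workers pick it up asynchronously.
do_work.delay(kind_of_work="cleanup")

# apply_async offers more control, e.g. delaying execution by 30 seconds.
do_work.apply_async(kwargs={"kind_of_work": "cleanup"}, countdown=30)
```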
There are a few important points:
The task name must be declared.
The task name is how Celery identifies messages (requests) and determines which function and worker should handle them. If tasks are not named explicitly, Celery derives a name from the module and function names, which ties the name to the location of the code and makes it more brittle for future code maintenance.
Tasks must accept `**kwargs` to handle rolling compatibility.
This ensures tasks will accept any message which happens to be in the queue, rather than failing on unknown arguments. It helps with rolling changes backwards and forwards: deploys are not instant, and messages may be produced by multiple versions of the code with different sets of arguments.
While this allows rolling forwards and backwards without complete task failures, care must still be taken for workers to handle messages with both old and new arguments when the arguments change. Accepting `**kwargs` reduces the number of required changes in such a migration a little and gives operators some more flexibility, but message loss because of unknown arguments is still not acceptable.
When adding an argument to an existing task, make a PR that adds the new parameter with a safe default, and wait for it to deploy before merging a PR that starts to use the value. The new argument should be optional to allow for rollbacks; a sketch of this pattern follows below.
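As a minimal sketch of that first PR (the task and the `batch_size` parameter here are illustrative, not Sentry's real ones):

```python
from sentry.tasks.base import instrumented_task


@instrumented_task(name="sentry.tasks.do_work", queue="important_queue")
def do_work(kind_of_work, batch_size=None, **kwargs):
    # PR 1: the new parameter defaults to a safe value, so messages
    # produced by old code (which omit it) keep working.
    if batch_size is None:
        batch_size = 100
    # ...
```

Only after this version is fully deployed should a second PR start passing `batch_size`; if that second PR is rolled back, the default keeps the task working with the old messages.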
Tasks should automatically retry on failure.
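The `default_retry_delay` and `max_retries` options shown earlier configure this. As a sketch of explicitly retrying on a transient failure, assuming `instrumented_task` forwards options such as `bind=True` to Celery's task decorator (`do_sync` is a hypothetical helper):

```python
from sentry.tasks.base import instrumented_task


@instrumented_task(
    name="sentry.tasks.sync_data",
    queue="important_queue",
    default_retry_delay=60 * 5,
    max_retries=3,
    bind=True,  # assumption: passed through to Celery, exposing `self`
)
def sync_data(self, data_id, **kwargs):
    try:
        do_sync(data_id)  # hypothetical helper that may fail transiently
    except TimeoutError as exc:
        # Re-enqueue the message; Celery honours the delay and retry
        # limits configured above.
        raise self.retry(exc=exc)
```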
Task arguments should be primitive types and small.
Task arguments are serialised into the message sent across the broker, and the workers need to deserialise them again. Doing this with complex types is brittle and should be avoided. For example, prefer passing an ID that the task can use to load the data from cache rather than passing the data itself.
Similarly, serialising large values into the message results in large messages, large queues, and more (de)serialisation overhead, so this should be avoided to keep the message brokers and workers running efficiently.
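A sketch of the preferred shape (the task name and cache helper are illustrative):

```python
from sentry.tasks.base import instrumented_task


@instrumented_task(name="sentry.tasks.process_event", queue="events")
def process_event(event_id, **kwargs):
    # The message carries only a small primitive ID; the worker loads
    # the full payload itself instead of receiving it over the broker.
    event = load_event_from_cache(event_id)  # hypothetical helper
    # ...


# Producer side: enqueue the ID, not the (potentially large) event object.
process_event.delay(event_id=42)
```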
The task's module must be added to `CELERY_IMPORTS`.
Celery workers look up tasks by name, and a worker can only resolve a name if it has imported the module containing the decorated task function, because importing is what registers the task by name. Thus every module containing a task must be added to the `CELERY_IMPORTS` setting in `src/sentry/conf/server.py`.
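A sketch of that registration (the surrounding entries are elided, and the module name is the illustrative one from above):

```python
# src/sentry/conf/server.py
CELERY_IMPORTS = (
    # ...
    "sentry.tasks.do_work",
    # ...
)
```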
Running a Worker
Workers can be run by using the Sentry CLI.
```shell
$ sentry run worker
```
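If your configuration does not live in the default location, point the CLI at it with `SENTRY_CONF`, just as for the cron process below:

```shell
$ SENTRY_CONF=/etc/sentry sentry run worker
```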
Starting the Cron Process
Sentry schedules routine jobs via a cron process:
```shell
$ SENTRY_CONF=/etc/sentry sentry run cron
```
Configuring the Broker
Sentry supports two primary brokers, which may be chosen depending on your workload: RabbitMQ and Redis.
Redis
The default broker is Redis, and it will work in most situations. The primary limitation of using Redis is that all pending work must fit in memory.

```python
BROKER_URL = "redis://localhost:6379/0"
```
If your Redis connection requires a password for authentication, you need to use the following format:
BROKER_URL = "redis://:password@localhost:6379/0"
RabbitMQ
If you run with a high workload, or have concerns about fitting the pending workload in memory, then RabbitMQ is an ideal candidate for backing Sentry’s workers.
BROKER_URL = "amqp://guest:guest@localhost:5672/sentry"