.. _rq: http://python-rq.org
.. _deployment:
Architecture and Deploying
==========================
Pulp's architecture has three components to it: a REST API, a content serving application, and the
tasking system. Each component can be horizontally scaled for both high availability and/or
additional capacity for that part of the architecture.
REST API
--------
Pulp's REST API is a Django application which runs under any WSGI server. It serves the following
things:
* The REST API hosted at ``/pulp/api/v3/``
* The browse-able documentation at ``/pulp/api/v3/docs/``
* Any viewsets or views provided by plugins
* Static content used by Django, e.g. images used by the browse-able API. This is not Pulp content.
.. note::
A simple, but limited way to run the REST API as a standalone service using the built-in Django
runserver. The ``pulpcore-manager`` command is ``manage.py`` configured with the
``DJANGO_SETTINGS_MODULE="pulpcore.app.settings"``. Run the simple webserver with::
$ pulpcore-manager runserver 24817
.. warning::
Until Role-Based Access Control is added to Pulp, REST API is not safe for multi-user use.
Sensitive credentials can be read by any user, e.g. ``Remote.password``, ``Remote.client_key``.
The REST API can be deployed with any any WSGI webserver like a normal Django application. See the
`Django deployment docs `_ for more
information.
Content Serving Application
---------------------------
An aiohttp.server based application that serves content to clients. The content could be
:term:`Artifacts` already downloaded and saved in Pulp, or
:term:`on-demand content units`. When serving
:term:`on-demand content units` the downloading also happens from within this
component as well.
.. note::
Pulp installs a script that lets you run the content serving app as a standalone service as
follows:::
$ pulp-content
The content serving application can be deployed like any aiohttp.server application. See the
`aiohttp Deployment docs `_ for more
information.
Tasking System
--------------
Pulp's tasking system has two components: a resource manager and workers, all of which are run using
`rq`_.
Worker
Pulp workers perform most tasks "run" by the tasking system including long-running tasks like
synchronize and short-running tasks like a Distribution update. Each worker handles one task at a
time, and additional workers provide more concurrency. Workers auto-name and are auto-discovered,
so they can be started and stopped without notifying Pulp.
Resource Manager
A different type of Pulp worker that plays a coordinating role for the tasking system. You must
run exactly one of these for Pulp to operate correctly. The ``resource-manager`` is identified by
configuring using exactly the name ``resource-manager`` with the ``-n 'resource_manager'`` option.
*N* ``resource-manager`` rq processes can be started with 1 being active and *N-1* being passive.
The *N-1* will exit and should be configured to auto-relaunch with either systemd, supervisord, or
k8s.
.. note::
Pulp serializes tasks that are unsafe to run in parallel, e.g. a sync and publish operation on
the same repo should not run in parallel. Generally tasks are serialized at the "repo" level, so
if you start *N* workers you can process *N* repo sync/modify/publish operations concurrently.
Static Content
--------------
When browsing the REST API or the browsable documentation with a web browser, for a good experience,
you'll need static content to be served.
In Development
^^^^^^^^^^^^^^
If using the built-in Django webserver and your settings.yaml has ``DEBUG: True`` then static
content is automatically served for you.
In Production
^^^^^^^^^^^^^
Collect all of the static content into place using the ``collectstatic`` command. The
``pulpcore-manager`` command is ``manage.py`` configured with the
``DJANGO_SETTINGS_MODULE="pulpcore.app.settings"``. Run ``collectstatic`` as follows::
$ pulpcore-manager collectstatic
Hardware requirements
---------------------
.. note::
This section is updated based on your feedback. Feel free to share what your experience is https://pulpproject.org/help/
.. note::
These are empirical guidelines to give an idea how to estimate what you need. It hugely
depends on the scale of the setup (how much content you need, how many repositories you plan
to have), frequency (how often you run various tasks) and the workflows (which tasks you
perform, which plugin you use) of each specific user.
CPU
***
CPU count is recommended to be equal to the number of pulp workers. It allows to perform N
repository operations concurrently. E.g. 2 CPUs, one can sync 2 repositories concurrently.
RAM
***
Out of all operations the highest memory consumption task is likely synchronization of a remote
repository. Publication can also be memory consuming, however it depends on the plugin.
For each worker, the suggestion is to plan on 1GB to 3GB. E.g. 4 workers would need 4GB to 12 GB
For the database, 1GB is likely enough.
The range for the workers is quite wide because it depends on the plugin. E.g. for RPM plugin, a
setup with 2 workers will require around 8GB to be able to sync large repositories. 4GB is
likely not enough for some repositories, especially if 2 workers both run sync tasks in parallel.
Disk
****
For disk size, it depends on how one is using Pulp and which storage is used.
Pulp behaviour
^^^^^^^^^^^^^^
* Pulp de-duplicates content.
* There are different policies for downloading content. It is possible not to store any content
at all.
* If plugin needs to generate metadata for a repository, it will be in the artifact storage,
even if the download policy is configured not to save any content.
* Pulp verifies downloaded artifact checksums locally and artifacts are downloaded/verified in
parallel, so some local storage is needed, even if the download policy is configured not to save
any content and an external storage, like S3, is used.
Empirical estimation
^^^^^^^^^^^^^^^^^^^^
* If S3 is used as a backend for artifact storage, it is not required to have a large local
storage. 30GB should be enough in the majority of cases.
* If no content is planned to be stored in the artifact storage, aka only sync from
remote source and only with the ``streamed`` policy, some storage needs to be allocated for
metadata. It depends on the plugin, the size of a repository and the number of different
publications. 5GB should be enough for medium-large installation.
* If content is downloaded ``on_demand``, aka only packages that clients request from Pulp. A
good estimation would be 30% of the whole repository size, including futher updates to the
content. That the most common usage pattern. If clients use all the packages from a repository,
it would use 100% of the repository size.
* If all content needs to be downloaded, the size of all repositories together is needed.
Since Pulp de-duplicates content, this calculation assumes that all repositories have unique
content.
* Any additional content, one plans to upload to or import into Pulp, needs to be counted as well.
* DB size needs to be taken into account as well.
E.g. For syncing remote repositories with ``on_demand`` policy and using local storage, one
would need 50GB + 30% of size of all the repository content + the DB.