Tuning and Monitoring
======================

Tuning
------

WSGI Processes
^^^^^^^^^^^^^^

By default, each Apache server on which Pulp is deployed starts 3 WSGI processes to serve the REST API. The number of processes, along with other settings, can be adjusted on the ``WSGIDaemonProcess`` statement in ``/etc/httpd/conf.d/pulp.conf``. See the Apache ``mod_wsgi`` documentation for details.

For tuning purposes, treat Pulp's REST API as a low-traffic web application that has occasional spikes in memory use when returning large data sets. Most of Pulp's heavy lifting is offloaded to Celery workers.

Pulp and Mongo Database
^^^^^^^^^^^^^^^^^^^^^^^

Pulp uses Mongo to manage repository information as well as content metadata. Mongo can run on the same machine as Pulp, but for larger production deployments we recommend running it on dedicated hardware. At this time, Pulp can be used with `replication `_ but does not support sharding.

If searches for content perform poorly, adding an index to the collection responsible for that content type may improve performance. Each content type has a collection called `unit_`. More about index creation can be found here_.

.. _here: http://docs.mongodb.org/manual/core/index-creation/

.. _process_recycling:

Memory Issues
^^^^^^^^^^^^^

Pulp workers do not release all unused memory back to the system once tasks are complete. This is a known issue with the version of Python that Pulp uses. To work around this problem, Pulp supports worker process recycling: a worker process is terminated after a configurable number of tasks and replaced with a new one, which releases the old process's unused memory back to the system. This does not interfere with your usage of Pulp, but it does add a small runtime overhead to the tasking system from regularly killing and respawning processes.

See the ``PULP_MAX_TASKS_PER_CHILD`` variable in your ``/etc/default/pulp_workers`` file to enable this feature.
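The recycling idea can be seen in isolation with Python's standard-library ``multiprocessing.Pool``, whose ``maxtasksperchild`` argument makes each worker exit after a fixed number of tasks and has the pool start a replacement. This is a minimal sketch, not Pulp or Celery code: ``handle_task`` and ``run_with_recycling`` are hypothetical names, and ``maxtasksperchild`` only conceptually plays the role of ``PULP_MAX_TASKS_PER_CHILD``.

```python
import multiprocessing
import os


def handle_task(item):
    """Stand-in for a real task; returns the PID of the worker that ran it."""
    # A real task might build large temporary data structures here; that memory
    # is only reliably returned to the OS when this worker process exits.
    return os.getpid()


def run_with_recycling(n_tasks, max_tasks_per_child):
    """Run n_tasks on a single worker that is replaced after every max_tasks_per_child tasks.

    Returns the worker PID that handled each task, in submission order.
    """
    # "fork" is POSIX-only; it roughly mirrors how prefork worker pools start children.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=1, maxtasksperchild=max_tasks_per_child) as pool:
        # chunksize=1 makes every item a separate task, so the limit counts items.
        return pool.map(handle_task, range(n_tasks), chunksize=1)


if __name__ == "__main__":
    pids = run_with_recycling(n_tasks=6, max_tasks_per_child=2)
    # Expect three distinct PIDs, each handling two consecutive tasks.
    print(pids)
```

A lower limit returns memory to the system sooner but forks more often, which is the runtime overhead trade-off described above.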
After adjusting the configuration value, restart your ``pulp_workers`` processes.

Monitoring
----------

Monitoring for outages
^^^^^^^^^^^^^^^^^^^^^^

While Pulp has a number of processes, users interact with Pulp through httpd. At a minimum, your monitoring system should alert on the following issues:

* ``httpd`` is not responsive on ports 80 or 443
* storage volumes associated with Pulp are about to run out of space
* Mongo is not responsive
* Apache Qpid or RabbitMQ is not responsive

You may also want to alert if no Pulp workers are available. This is optional: a worker outage affects long-running background tasks such as syncing and publishing, but does not affect content downloads by consumer systems.

Please consult the documentation of your monitoring software for information on how to check for these types of issues.

Monitoring for performance issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Performance issues fall into a number of categories, but the following statistics are typical ones to collect and review periodically:

* work queue depth
* repository sync time
* repository publish time
* concurrent ``httpd`` connections to ports 80 and 443
* storage volume space usage

Many of these statistics can be collected and viewed using tools like `Celery Flower `_ or `Munin `_.
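The outage checks for ``httpd`` ports and storage volumes listed under "Monitoring for outages" can be prototyped with standard-library Python before being wired into a real monitoring system. This is a minimal sketch, not a Pulp tool: all function names are hypothetical, a successful TCP connection is only a crude stand-in for "responsive", and the 10% free-space threshold is an arbitrary assumption. The same connection check can also be pointed at Mongo (default port 27017) and at the AMQP port of Qpid or RabbitMQ (default 5672).

```python
import shutil
import socket


def port_is_responsive(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # Opening and immediately closing a connection only proves the port is
        # accepting connections, not that the service behind it is healthy.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_services(services, timeout=3.0):
    """Given (label, host, port) tuples, return the labels whose port did not accept a connection."""
    return [label for label, host, port in services
            if not port_is_responsive(host, port, timeout)]


def volume_free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem containing path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_volumes(paths, min_free=0.10):
    """Return the paths whose filesystem has less than min_free of its space free."""
    # min_free=0.10 (10%) is an arbitrary example threshold, not a Pulp recommendation.
    return [path for path in paths if volume_free_fraction(path) < min_free]
```

For example, ``check_services([("httpd HTTPS", "pulp.example.com", 443)])`` returns an empty list when the port accepts connections, and ``check_volumes(["/var/lib/pulp"])`` returns the path if less than 10% of its filesystem is free; the host name and path here are placeholders for your own deployment.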