Profiling the Stages API Performance

Pulp can collect performance statistics about a Stages API pipeline as it runs. The data is recorded to a sqlite3 database in the /var/lib/pulp/debug folder.

This can be enabled with the PROFILE_STAGES_API = True setting in the Pulp settings file. Once enabled, Pulp writes a sqlite3 database, named after the UUID of the task it runs in, to the /var/lib/pulp/debug/ folder.
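
Because each database is named after a task UUID, it can be handy to locate the most recent one before summarizing it. The following is a minimal sketch, not part of Pulp: the helper name newest_profile_db is hypothetical, and it assumes the profile databases sit directly in the debug folder.

```python
from pathlib import Path


def newest_profile_db(debug_dir="/var/lib/pulp/debug"):
    """Return the most recently modified file in debug_dir, or None if empty.

    Hypothetical helper: assumes each profile database is a plain file
    stored directly in the debug folder.
    """
    files = [p for p in Path(debug_dir).iterdir() if p.is_file()]
    # Pick the file with the largest modification time.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

The returned path can then be passed to the summary command described in the next section.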

Summarizing Performance Data

pulpcore-manager includes a command that displays the pipeline along with summary statistics. After generating a sqlite3 performance database, use the stage-profile-summary command like this:

$ pulpcore-manager stage-profile-summary /var/lib/pulp/debug/2dcaf53a-4b0f-4b42-82ea-d2d68f1786b0

Profiling API Machinery

class pulpcore.plugin.stages.ProfilingQueue(stage_uuid, *args, **kwargs)

A customized subclass of asyncio.Queue that records time in the queue and between queues.

This profiler records data on items as they are inserted into and removed from Queues. The data is stored on each item in a dictionary attribute called ‘extra_data’. If an item does not have this attribute, the ProfilingQueue adds it.

The following statistics are computed for each Queue and the stage that it feeds into:

  • waiting time - The number of seconds an item waited in the Queue for this stage.

  • service time - The number of seconds an item received service in this stage.

  • queue_length - The number of waiting items in the queue, measured before each new arrival.

  • interarrival_time - The number of seconds since the previous arrival to this Queue.

See the create_profile_db_and_connection() docs for more info on the database tables and layout.
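
To illustrate how such statistics can be gathered, here is a minimal sketch of an asyncio.Queue subclass that timestamps items on arrival. It is not Pulp's implementation: the class name TimedQueue and its attributes are hypothetical, it keeps the measurements on the queue rather than in each item's ‘extra_data’, and it relies on CPython's asyncio.Queue routing put() and get() through put_nowait() and get_nowait().

```python
import asyncio
import time


class TimedQueue(asyncio.Queue):
    """Hypothetical sketch: record per-item waiting time and arrival stats."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.last_arrival = None
        self.arrivals = []  # (queue_length, interarrival_time) per arrival
        self.waits = []     # waiting time in seconds per removed item

    def put_nowait(self, item):
        now = time.perf_counter()
        length = self.qsize()  # measured before the new arrival is added
        # Store the arrival timestamp alongside the item.
        super().put_nowait((now, item))
        interarrival = None if self.last_arrival is None else now - self.last_arrival
        self.last_arrival = now
        self.arrivals.append((length, interarrival))

    def get_nowait(self):
        enqueued_at, item = super().get_nowait()
        # Waiting time is the gap between arrival and removal.
        self.waits.append(time.perf_counter() - enqueued_at)
        return item
```

Service time is not covered here; measuring it would additionally require noting when the consuming stage finishes with an item.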

Parameters
  • stage_uuid (uuid.UUID) – The uuid of the stage this ProfilingQueue delivers work into.

  • args (tuple) – unused positional arguments

  • kwargs (dict) – unused keyword arguments

stages.create_profile_db_and_connection()

Create a profile db from this task's UUID and a sqlite3 connection to that database.

The database produced has three tables with the following SQL format:

The stages table stores info about the pipeline itself and stores 3 fields:

  • uuid - the uuid of the stage

  • name - the name of the stage

  • num - the number of the stage starting at 0

The traffic table stores 3 fields:

  • uuid - the uuid of the stage this queue feeds into

  • waiting_time - the amount of time the item is waiting in the queue before it enters the stage.

  • service_time - the service time the item spent in the stage.

The system table stores 3 fields:

  • uuid - the uuid of the stage this queue feeds into

  • length - the number of items in this queue, measured just before each arrival.

  • interarrival_time - the amount of time since the last arrival.
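
As a rough illustration of this layout, the sketch below builds the three tables in an in-memory sqlite3 database and computes per-stage averages. The table and field names come from the description above; the column types, the summarize helper, and the sample rows are assumptions made for the example and may differ from what Pulp actually creates.

```python
import sqlite3

# Column types are assumed; only the table and field names come from the docs.
SCHEMA = """
CREATE TABLE stages (uuid TEXT, name TEXT, num INTEGER);
CREATE TABLE traffic (uuid TEXT, waiting_time REAL, service_time REAL);
CREATE TABLE system (uuid TEXT, length INTEGER, interarrival_time REAL);
"""

SUMMARY_SQL = """
SELECT s.num,
       s.name,
       (SELECT AVG(t.waiting_time) FROM traffic t WHERE t.uuid = s.uuid),
       (SELECT AVG(t.service_time) FROM traffic t WHERE t.uuid = s.uuid),
       (SELECT AVG(y.length) FROM system y WHERE y.uuid = s.uuid)
FROM stages s
ORDER BY s.num
"""


def summarize(conn):
    """Return (num, name, avg_wait, avg_service, avg_length) per stage."""
    return conn.execute(SUMMARY_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows for a single stage.
conn.execute("INSERT INTO stages VALUES ('u0', 'first', 0)")
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)",
                 [("u0", 1.0, 2.0), ("u0", 3.0, 4.0)])
conn.executemany("INSERT INTO system VALUES (?, ?, ?)",
                 [("u0", 0, 0.5), ("u0", 2, 0.5)])
rows = summarize(conn)
```

The same summarize function could be pointed at a real profile database by passing its path to sqlite3.connect, provided the actual schema matches these assumptions.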