Importers¶
Overview¶
The fundamental role of an importer is to bring new units into a Pulp repository. The typical method is the sync process through which the contents of an external source (yum repository, Puppet Forge, etc.) are downloaded and inventoried on the Pulp server. Additionally, the importer is also responsible for handling uploaded units (inventorying and persistence on disk) and any logic involved with copying units between repositories.
Operations cannot be performed on an importer until it is attached to a repository. When adding an importer to a repository, the importer’s configuration will be stored in the Pulp server and provided to the importer in each operation. More information on how this configuration functions can be found in the configuration section of this guide.
Only one importer may be attached to a repository at a time.
The Plugin Conventions page describes behavior and APIs common to both importers and distributors.
Note
Currently, the API for the base class is not published. The code can
be found at Importer
in platform/src/pulp/plugins/importer.py
.
Implementation¶
Each importer must subclass the pulp.plugins.importer.Importer
class. That class defines
the operations an importer may be requested to perform on a repository. Not every method must
be overridden in the subclass. Some, such as the lifecycle methods
will have no effect. Others, such as upload_unit
, will raise an exception indicating the
operation is not supported by the importer if not overridden.
Warning
The importer instance is not reused between invocations. Any state maintained in the importer is only valid during the current operation’s execution. If state is required across multiple operations, the plugin’s scratchpad should be used to store the necessary information.
There are two methods in the Importer
class that must be overridden in order for the
importer to work:
Metadata¶
The importer implementation must implement the metadata
method as
described here.
Configuration Validation¶
The importer implementation must implement the validate_config
method as
described here.
Functionality¶
There are a number of abilities an importer implementation can support. All of these are optional; it is possible to have an importer that handles uploaded units but has no support for synchronizing against an external repository.
The sections below will cover an overview of each feature. More information on the specifics of how to implement them are found in the docstrings for each method.
Warning
Importers that implement a sync method must also implement support for cancelling the sync.
Synchronize an External Respository¶
Methods: sync_repo
, cancel_sync_repo
One of the most common uses of an importer is to download content from an external source and inventory it in the Pulp server. The importer serves as an adapter between the Pulp server and the external repository, using whatever protocols are necessary.
While the importer is responsible for downloading the unit, it is up to Pulp to determine
the absolute path on disk to store it. The importer provides a relative path for where it
would like to store the unit, taking into account enough information to create a unique
path. This is passed to the conduit’s init_unit
call which allows Pulp to derive the
absolute path on the server to store it. The path will be in the returned
pulp.plugins.model.Unit
object in the storage_path
attribute.
Plugin implementations for repository sync will obviously vary wildly. Below is a short outline of a common sync process.
- Call the conduit’s
get_units
method to understand what units are already associated with the repository being synchronized. - For each new unit to add to the Pulp server and associate with the repository,
the plugin takes the following steps:
- Calls the conduit’s
init_unit
which takes unit specific metadata and allows Pulp to populate any calculated/derived values for the unit. The result of this call is an object representation of the unit. - Uses the
storage_path
field in the returned unit to save the bits for the unit to disk. - Calls the conduit’s
save_unit
which creates/updates Pulp’s knowledge of the content unit and creates an association between the unit and the repository - If necessary, calls the conduit’s
link_unit
to establish any relationships between units.
- Calls the conduit’s
- For units previously associated with the repository (known from
get_units
) that should no longer be, calls the conduit’sremove_unit
to remove that association.
Note
It is valid for a unit to be purely metadata and not have a corresponding file. In these
cases, simply specify a relative path of None
to the init_unit
call and ignore the
step about using the storage_path
.
The conduit defines a set_progress
call that should be used throughout the process
to update the Pulp server with details on what has been accomplished and what remains to be
done. The Pulp server does not require these calls. The progress message must be JSON-serializable
(primitives, lists, dictionaries) but is otherwise entirely at the discretion of the plugin writer.
The most recent progress report is saved in the database and made available to users as a means
to track the progress of the sync.
When implementing the sync functionality, the importer’s cancel_sync_repo
method must be
implemented as well. This call will be made on the same instance performing the sync, therefore
it is valid to use an instance variable as a flag the sync process uses to determine if it should
continue to proceed.
Upload Units¶
Method: upload_unit
The Pulp server provides the infrastructure for users to upload units into a repository. It is the job of the importer to take the steps necessary to:
- Generate and save the inventoried representation of the unit.
- Determine the appropriate relative path at which to store the unit.
- Move the unit from the provided temporary location to the final storage path as provided by Pulp.
The conduit provides the init_unit
and save_unit
calls as described in Synchronize an External Respository.
Refer to that section for more information on usage.
Import Units¶
Method: import_units
The Pulp server provides an API for selecting units to copy from one repository to another. The
importer’s import_units
method is called on the destination repository to handle the
copy.
There are two approaches to handling this method:
- In most cases, the unit can be shared between the two repositories. A new association is created
between the destination repository and the original database representation of the unit. This
approach is accomplished by simply calling the conduit’s
save_unit
method for each unit to be copied. - In certain cases, the same unit cannot be safely referenced by both repositories. A new unit
must be created using the
init_unit
method and then saved to the repository withsave_unit
in the same way as in Synchronize an External Respository.
Note
Take note if which attributes on the unit are required for use when importing. It is then possible to specify in the associate request’s unit association criteria which fields should be loaded, which result in reduced RAM use during the import process, especially for units with a lot of metadata.
Remove Units¶
Method: remove_units
When a user unassociates units from a repository, the Pulp server will make the necessary database
changes to reflect the change. The remove_units
method is called on the repository’s importer
to allow the importer to perform any clean up steps is may need to make, such as removing any
data it may have been storing about the unit from the working directory. In most cases, this method
does not need to be overridden.
Warning
This call should not remove the unit from its final location specified by Pulp. Pulp will handle the deletion of the file itself during its orphan clean up process.