Pulp Import and Export

Warning

Support for import and export is provided as a tech preview in Pulp 3. Functionality may not work or may be incomplete. Also, backwards compatibility when upgrading is not guaranteed.

Overview

There is a use-case for extracting the content and Artifacts for a set of RepositoryVersions, out of a running instance of Pulp and into a file that can then be transferred to another Pulp instance and imported. This is not the Pulp-to-Pulp sync case; the assumption is that the receiving Pulp instance is network-isolated.

The high-level workflow for this use case is

1. On the Upstream Pulp instance, an Exporter is defined for the set of Repositories that are to be exported to some Downstream Pulp instance.

2. That Exporter is requested to produce and execute an Export for the current RepositoryVersions of the specified Repositories

  1. The resulting .tar.gz Export is transferred to the appropriate Downstream.

4. On the Downstream Pulp instance, an Importer is defined, that maps the incoming Upstream Repositories to matching Downstream Repositories.

5. That Importer is requested to produce and execute an Import, pointing to the provided export file from the Upstream.

In order to minimize space utilization, import/export operates on sets of Repositories. This allows the Export operation to export shared Artifacts only once per-export, rather than once for each Repository being exported.

Note

Export will not operate on RepositoryVersions that have been synchronized using policy=on_demand. Artifacts must actually exist in order to be exported - this is, after all the only way for the Downstream Pulp instance to gain access to them!

Note

Import and Export strictly control which directories may be read from/written to via the settings options ALLOWED_IMPORT_PATHS and ALLOWED_EXPORT_PATHS. These default to empty, if not explicitly set attempts to import or export will fail with a validation error like

"Path '/tmp/exports/' is not an allowed export path"

Definitions

Upstream

Pulp instance whose RepositoryVersions we want to export

Downstream

Pulp instance that will be importing those RepositoryVersions

ModelResource

entity that understands how to map the metadata for a specific Model owned/controlled by a plugin to an exportable file-format (see django-import-export)

Exporter

resource that exports content from Pulp for a variety of different use cases

PulpExporter

kind-of Exporter, that is specifically used to export data from an Upstream for consumption by a Downstream

PulpExport

specific instantiation/run of a PulpExporter

Export file

compressed tarfile containing database content and Artifacts for RepositoryVersions, generated during execution of an Export

PulpImporter

resource that accepts an Upstream PulpExporter export file, and manages the process of importing the content and Artifacts included

PulpImport

specific instantiation/run of a PulpImporter

Repository-mapping

configuration file that provides the ability to map an Upstream Repository, to a Downstream Repository, into which the Upstream’s RepositoryVersion should be imported by a PulpImporter

Import order

for complicated repository-types, managing relationships requires that ModelResources be imported in order. Plugins are responsible for specifying the import-order of the ModelResources they own

Exporting

Note

The following examples assume a Pulp instance that includes the pulp_file and pulp_rpm plugins. They also assume that the http and jq packages are installed.

These workflows are executed on an Upstream Pulp instance.

Creating an Exporter

In this workflow, you define an Exporter for a set of Repositories. This Exporter can be invoked repeatedly to regularly export the current RepositoryVersion of each of the specified Repositories.

First, let’s make a pair of Repositories named zoo and isofile, and save their UUIDs as ZOO_UUID and ISOFILE_UUID

Set up ‘zoo’ repository”:

# Create the repository
export ZOO_HREF=$(http POST :/pulp/api/v3/repositories/rpm/rpm/ name=zoo | jq -r '.pulp_href')
#
# add a remote
http POST :/pulp/api/v3/remotes/rpm/rpm/ name=zoo url=https://fixtures.pulpproject.org/rpm-signed/  policy='immediate'
#
# find remote's href
export REMOTE_HREF=$(http :/pulp/api/v3/remotes/rpm/rpm/ | jq -r ".results[] | select(.name == \"zoo\") | .pulp_href")
#
# sync the repository to give us some content
http POST :$ZOO_HREF'sync/' remote=$REMOTE_HREF

Set up ‘isofile’ repository:

# create the repository
ISOFILE_HREF=$(http POST :/pulp/api/v3/repositories/file/file/ name=isofile | jq -r '.pulp_href')
#
# add remote
http POST :/pulp/api/v3/remotes/file/file/ name=isofile url=https://fixtures.pulpproject.org/file/PULP_MANIFEST
#
# find remote's href
REMOTE_HREF=$(http :/pulp/api/v3/remotes/file/file/ | jq -r ".results[] | select(.name == \"isofile\") | .pulp_href")
#
# sync the repository to give us some content
http POST :$ISOFILE_HREF'sync/' remote=$REMOTE_HREF

Now that we have Repositories with content, let’s define an Exporter named test-exporter that will export these Repositories to the directory /tmp/exports/:

export EXPORTER_HREF=$(http POST :/pulp/api/v3/exporters/core/pulp/ \
    name=test-exporter                                              \
    repositories:=[\"${ISOFILE_HREF}\",\"${ZOO_HREF}\"]             \
    path=/tmp/exports/ | jq -r '.pulp_href')
http GET :${EXPORTER_HREF}

Exporting Content

Once we have an Exporter defined, we invoke it to generate an export-file in the directory specified by that Exporter’s path attribute:

http POST :${EXPORTER_HREF}exports/

The resulting Export writes to a .tar.gz file, in the directory pointed to by the Exporter’s path, with a name that follows the convention export-<export-UUID>-YYYYmmdd_HHMM.tar.gz.

It will also produce a “table of contents” file describing the file (or files, see Exporting Chunked Files below) for later use verifying and importing the results of the export:

ls /tmp/exports
export-32fd25c7-18b2-42de-b2f8-16f6d90358c3-20200416_2000.tar.gz
export-32fd25c7-18b2-42de-b2f8-16f6d90358c3-20200416_2000-toc.json
python -m json.tool /tmp/exports/export-32fd25c7-18b2-42de-b2f8-16f6d90358c3-20200416_2000-toc.json
    {
    "meta": {
        "chunk_size": 0, # chunk_size in bytes, or 0 if an export did not use the chunk_size parameter
        "file": "export-32fd25c7-18b2-42de-b2f8-16f6d90358c3-20200416_2000.tar.gz",
        "global_hash": "eaef962943915ecf6b5e45877b162364284bd9c4f367d9c96d18c408012ef424"
    },
    "files": {
        "export-32fd25c7-18b2-42de-b2f8-16f6d90358c3-20200416_2000.tar.gz": "eaef962943915ecf6b5e45877b162364284bd9c4f367d9c96d18c408012ef424"
    }
}

These export files can now be transferred to a Downstream Pulp instance, and imported.

Note

In the event of any failure during an export, the process will clean up any partial export-files that may have been generated. Export-files can be very large; this will preserve available space in the export-directory.

Exporting Specific Versions

By default, the latest-versions of the Repositories specified in the Exporter are exported. However, you can export specific RepositoryVersions of those Repositories if you wish using the versions= parameter on the /exports/ invocation.

Following the above example - let’s assume we want to export the “zero’th” RepositoryVersion of the repositories in our Exporter.:

http POST :${EXPORTER_HREF}exports/ \
    versions:=[\"${ISO_HREF}versions/0/\",\"${ZOO_HREF}versions/0/\"]

Note that the “zero’th” RepositoryVersion of a Repository is created when the Repository is created, and is empty. If you unpack the resulting Export tar.gz you will find, for example, that there is no artifacts/ directory and an empty ArtifactResource.json file:

cd /tmp/exports
tar xvzf export-930ea60c-97b7-4e00-a737-70f773ebbb14-20200511_2005.tar.gz
    versions.json
    pulpcore.app.modelresource.ArtifactResource.json
    pulpcore.app.modelresource.RepositoryResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulpcore.app.modelresource.ContentResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulpcore.app.modelresource.ContentArtifactResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.PackageResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.ModulemdResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.ModulemdDefaultsResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.PackageGroupResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.PackageCategoryResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.PackageEnvironmentResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.PackageLangpacksResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.UpdateRecordResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.DistributionTreeResource.json
    repository-3c1ec06a-b0d6-4d04-9f99-32bfc0a499a9_0/pulp_rpm.app.modelresource.RepoMetadataFileResource.json
    repository-958ae747-c19d-4820-828c-87452f1a5b8d_0/pulpcore.app.modelresource.ContentResource.json
    repository-958ae747-c19d-4820-828c-87452f1a5b8d_0/pulpcore.app.modelresource.ContentArtifactResource.json
    repository-958ae747-c19d-4820-828c-87452f1a5b8d_0/pulp_file.app.modelresource.FileContentResource.json
python -m json.tool pulpcore.app.modelresource.ArtifactResource.json
    []

Exporting Incrementally

By default, PulpExport exports all of the content and artifacts of the RepositoryVersions being exported. A common use-case is to do regular transfers of content from an Upstream to a Downstream Pulp instance. While you can export everything every time, it is an inefficient use of time and disk storage to do so; exporting only the “entities that have changed” is a better choice. You can accomplish this by setting the full parameter on the /exports/ invocation to False:

http POST :${EXPORTER_HREF}exports/ full=False

This results in an export of all content-entities, but only Artifacts that have been added since the last_export of the same Exporter.

You can override the use of last_export as the starting point of an incremental export by use of the start_versions= parameter. Building on our example Exporter, if we want to do an incremental export of everything that’s happened since the second RepositoryVersion of each Repository, regardless of what happened in our last export, we would issue a command such as the following:

http POST :${EXPORTER_HREF}exports/ \
    full=False                      \
    start_versions:=[\"${ISO_HREF}versions/1/\",\"${ZOO_HREF}versions/1/\"]

This would produce an incremental export of everything that had been added to our Repositories between RepositoryVersion ‘1’ and the current_version RepositoryVersions of our Repositories.

Finally, if we need complete control over incremental exporting, we can combine the use of start_versions= and versions= to produce an incremental export of everything that happened after start_versions= up to and including versions=:

http POST :${EXPORTER_HREF}exports/                                         \
    full=False                                                              \
    start_versions:=[\"${ISO_HREF}versions/1/\",\"${ZOO_HREF}versions/1/\"] \
    versions:=[\"${ISO_HREF}versions/3/\",\"${ZOO_HREF}versions/3/\"]

Note

Note that specifying start_versions= without specifying full=False (i.e., asking for an incremental export) is an error, since it makes no sense to specify a ‘starting version’ for a full export.

Exporting Chunked Files

By default, PulpExport streams data into a single .tar.gz file. Since Repositories can contain a lot of artifacts and content, that can result in a file too large to be copied to transport media. In this case, you can specify a maximum-file-size, and the export process will chunk the tar.gz into a series of files no larger than this.

You accomplish this by setting the chunk_size parameter to the desired maximum number of bytes. This parameter takes an integer, or size-units of KB, MB, or GB. Files appear in the Exporter.path directory, with a four-digit sequence number suffix:

http POST :/pulp/api/v3/exporters/core/pulp/1ddbe6bf-a6c3-4a88-8614-ad9511d21b94/exports/ chunk_size="10KB"
    {
        "task": "/pulp/api/v3/tasks/da3350f7-0102-4dd5-81e0-81becf3ffdc7/"
    }
ls -l /tmp/exports/
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0000
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0001
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0002
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0003
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0004
    10K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0005
    2.3K export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0006
    1168 export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325-toc.json

The “table of contents” lists all the resulting files and their checksums:

python -m json.tool /tmp/exports/export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325-toc.json
{
    "meta": {
        "chunk_size": 10240,
        "file": "export-8c1891a3-ffb5-41a7-b141-51daa0e38a18-20200717_1947.tar.gz",
        "global_hash": "eaef962943915ecf6b5e45877b162364284bd9c4f367d9c96d18c408012ef424"
    },
    "files": {
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0000": "8156874798802f773bcbaf994def6523888922bde7a939bc8ac795a5cbb25b85",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0001": "e52fac34b0b7b1d8602f5c116bf9d3eb5363d2cae82f7cc00cc4bd5653ded852",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0002": "df4a2ea551ff41e9fb046e03aa36459f216d4bcb07c23276b78a96b98ae2b517",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0003": "27a6ecba3cc51965fdda9ec400f5610ff2aa04a6834c01d0c91776ac21a0e9bb",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0004": "f35c5a96fccfe411c074463c0eb0a77b39fa072ba160903d421c08313aba58f8",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0005": "13458b10465b01134bde49319d6b5cba9948016448da9d35cb447265a25e3caa",
        "export-780822a4-d280-4ed0-a53c-382a887576a6-20200522_2325.tar.gz.0006": "a1986a0590943c9bb573c7d7170c428457ce54efe75f55997259ea032c585a35"
    }
}

Updating an Exporter

You can update an Exporter to modify a subset of its fields:

http PATCH :${EXPORTER_HREF} path=/tmp/newpath

Importing

Creating the importer

The first step to importing a Pulp export archive is to create an importer:

http :/pulp/api/v3/importers/core/pulp/ name="test"

By default, Pulp will map Repositories in the export to Repositories in Pulp by name. This can be overriden by supplying a repo mapping that maps names from the Pulp export to the names of repos in Pulp. For example, suppose the name of the repo in the Pulp export achive was ‘source’ and the repo in Pulp was ‘dest’. The following command would set up this mapping:

http :/pulp/api/v3/importers/core/pulp/ name="test" repo_mapping:="{\"source\": \"dest\"}"

Note

Pulp import expects destination repository/repositories to be present at the import process.

After the importer is created, a POST request to create an import will trigger the import process.

You can import an exported .tar.gz directly using the path parameter:

http POST :/pulp/api/v3/importers/core/pulp/f8acba87-0250-4640-b56b-c92597d344b7/imports/ \
  path="/data/export-113c8950-072b-432a-9da6-24da1f4d0a02-20200408_2015.tar.gz"

Or you can point the importer at the “table of contents” file that was produced by an export. If the TOC file is in the same directory as the export-files it points to, the import process will:

  • verify the checksum(s) of all export-files,

  • reassemble a chunked-export into a single .tar.gz

  • remove chunks as they are used (in order to conserve disk space)

  • verify the checksum of the resulting reassembled .tar.gz

and then import the result:

http POST :/pulp/api/v3/importers/core/pulp/f8acba87-0250-4640-b56b-c92597d344b7/imports/ \
  toc="/data/export-113c8950-072b-432a-9da6-24da1f4d0a02-20200408_2015-toc.json"

Note

The directory containing the file pointed to by path or toc must be defined in the ALLOWED_IMPORT_PATHS setting or the import will fail.

The command to create an import will return a task that can be used to monitor the import. You can also see a history of past imports:

http :/pulp/api/v3/importers/core/pulp/f8acba87-0250-4640-b56b-c92597d344b7/imports/

Pre-validating import parameters

There are a number of things that can keep an import from being successful, ranging from a specified export-file not being available to bad JSON specified for repo_mapping. You can pre-validate your proposed import using the import-check command:

http POST :/pulp/api/v3/importers/core/pulp/import-check/ \
  path=/tmp/export-file-path toc=/tmp/export-toc-path repo_mapping:="{\"source\": \"dest\"}"

import-check will validate that:

  • paths are in ALLOWED_IMPORT_PATHS

  • containing directory exists

  • containing directory is readable

  • path/toc file(s) exist and are readable

  • for TOC, containing directory is writeable

  • repo_mapping is valid JSON

import-check is a low-overhead synchronous call. It does not attempt to do validations that require database access or long-running tasks such as verifying checksums. All parameters are optional.

Note

For path and toc, if the ALLOWED_IMPORT_PATHS check fails, no further information will be given.