Configure ephemeral volumes

Ephemeral volumes are ideal for workflows, such as raster analytics, that need to load data into a temporary space for processing. Administrators can create persistent volume templates that generate multiple ephemeral volumes on demand for the pods of a specified service deployment. Each pod in that deployment then uses its own ephemeral volume, so each pod has its own storage to draw from. When the pods are deleted and the ephemeral volumes are no longer needed, the volumes are removed as well.

Use ephemeral volumes for raster analytics

Some raster analysis tools distribute computation across multiple worker pods and write temporary data while performing analysis. When processing large amounts of data, it is recommended to configure ephemeral volumes to provide increased disk space to store temporary data as it is processed.

The disk space needed to store temporary data varies among raster analysis capabilities. Across all of them, however, it is proportional to the number of cells in the raster being processed and is evenly distributed across the configured worker pods. More disk space may be required for complex analyses and larger inputs.

Your cluster configuration must have the necessary disk space allocated for storing the temporary files associated with running a given analysis tool. Temporary data is managed internally by each tool and deleted after the processing has finished. The ephemeral storage configuration that you provide for this will be used as a persisted volume template that gets applied to the raster processing service deployment.

Storage guidelines for raster analytics

To support distributed raster analysis, particularly when running large analyses, it is recommended to configure ephemeral volumes. When deciding how much storage to allocate, keep in mind that the disk space required varies with the number of cells in the raster to process and the number of worker pods configured.

For example, to process a raster of 2.5 billion cells (50,000 rows and 50,000 columns) you may need 30 gigabytes (GB) of disk space when using the Fill tool. To process a raster of 1 billion cells (roughly 30,000 rows and 30,000 columns) you may need 12 GB of disk space when using the Fill tool. In both cases, the total disk space required is evenly distributed across the number of pods configured. If you have allocated 10 pods for the RasterProcessing service, each pod requires 3 GB to process 2.5 billion cells or 1.2 GB to process 1 billion cells. In this case, the ephemeral storage for the RasterProcessing service must therefore be set up with 3 GB or 1.2 GB of disk space, respectively. Each pod that spins up gets this amount of ephemeral disk space. If you have set up autoscaling, use the maximum number of pods as the basis for this calculation.

The total disk space required to process a raster of 2.5 billion cells for different types of tools is:

  • 17 to 35 GB for hydrology analysis
  • 20 to 80 GB for distance analysis
  • 30 to 33 GB for generalization analysis

Some use cases may require more disk space, depending on the complexity of the analysis and the additional inputs and outputs specified. In such cases, to process an input raster of 2.5 billion cells, up to 90 GB of disk space may be used in a hydrology analysis workflow, and up to 170 GB in a distance analysis workflow.

When determining your organization's requirements for configuring ephemeral volumes, consider the following recommendations:

  1. First, determine how many pods are needed to support your workflow. For example, if your RasterProcessing service is configured with 10 pods, then distribute your total space needed across these pods. If the service is enabled with autoscaling, then use the maximum number of pods as the basis for this calculation.
  2. Next, determine an approximate amount of disk space needed for processing, based on the size (number of rows and columns) of your raster dataset.
  3. Divide the approximate total disk space by the number of pods allocated for the RasterProcessing service. The result is a good general estimate of the storage to configure for the ephemeral volume. When the ephemeral volume is attached to your RasterProcessing service, each pod dynamically requests this storage when it spins up. For example, if you have a total disk space requirement of 30 GB and 10 pods running in the RasterProcessing service deployment, you would configure 3 GB for each ephemeral volume.
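
The arithmetic in these recommendations can be sketched in a few lines of Python. This is only an illustrative estimate, not part of ArcGIS Enterprise: the function names are hypothetical, the linear scaling uses the Fill tool figure quoted above (30 GB for 2.5 billion cells) as its reference point, and rounding the result up to a whole number of Gi is an assumption made to stay on the safe side.

```python
import math

# Reference point taken from the Fill tool example above:
# about 30 GB of temporary space for a raster of 2.5 billion cells.
FILL_REF_CELLS = 2_500_000_000
FILL_REF_GB = 30.0


def estimate_total_gb(cells, ref_cells=FILL_REF_CELLS, ref_gb=FILL_REF_GB):
    """Scale the reference figure linearly with the number of cells."""
    return cells * ref_gb / ref_cells


def per_pod_gb(total_gb, max_pods):
    """Split the total evenly across pods (use the autoscaling maximum)."""
    if max_pods < 1:
        raise ValueError("max_pods must be at least 1")
    return total_gb / max_pods


def to_storage_request(gb):
    """Round up to a whole-number size string such as '3Gi' (assumed convention)."""
    return f"{math.ceil(gb)}Gi"


if __name__ == "__main__":
    total = estimate_total_gb(50_000 * 50_000)  # 2.5 billion cells
    share = per_pod_gb(total, max_pods=10)
    print(total, share, to_storage_request(share))
```

For other tools, replace the reference figure with a value from the ranges listed above, and prefer the upper end of the range when the analysis is complex.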

Configure for raster analytics

The following steps configure ephemeral volumes in support of raster analytics. You will use the ArcGIS Enterprise Administrator API Directory to create a persisted volume template and apply it to the RasterProcessing service deployment.

  1. Sign in to the ArcGIS Enterprise Administrator API Directory as an administrator.
  2. Click System > Volumes > Configurations.
  3. Click Create Volume Configuration.
  4. In the volume configuration JSON, include the specification for your ephemeral volume. If needed, work with your IT administrator for this specification.

    {
      "name": "<user-provided-name>",
      "type": "PVC_TEMPLATE",
      "spec": {
        "storageClassName": "<user-provided-storageclass-name>",
        "resources": {"requests": {"storage": "<user-provided-size, e.g. 3Gi>"}},
        "accessModes": ["ReadWriteOnce"],
        "volumeMode": "Filesystem",
        "volumeName": "<user-provided-optional-volume-name>"
      }
    }
    

  5. Once the volume configuration is created, locate and copy its associated VolumeID.
  6. From the ArcGIS Enterprise Administrator API root, click Services > System > RasterProcessing (DPServer) > Scaling.
  7. Copy the deploymentId from the RasterProcessing (DPServer) service scaling JSON.
  8. From the root of the ArcGIS Enterprise Administrator API, click System > Deployments and search for the deploymentId above.
  9. Click the deploymentId for the RasterProcessing (DPServer) service.
  10. Click Edit Deployment.
  11. In the JSON, locate the replicas property.
  12. After the replicas property, add the volume specification, including the VolumeID that you copied when creating the volume configuration:
     
    "volumes": [{
      "purpose": "GIS_SERVICE_TEMP",
      "volumeConfigId": "<volumeId>"
    }],
    
  13. Click Submit. Optionally, enable the option to Run asynchronously.

    The service deployment will take a few minutes to restart.

  14. To verify that the ephemeral volume has been configured successfully, work with your cluster administrator to check the new persistent volume claims (PVCs) that are created for each pod. These PVCs bind to persistent volumes (PVs) that are created dynamically in your cluster according to the registered volume configuration.

You can now use ephemeral volumes to store temporary data for your raster analytics workflows.
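
If you prefer to prepare the two JSON snippets from the steps above programmatically before pasting them into the Administrator API Directory, the following Python sketch shows one way to do so. The helper names are hypothetical, the sketch does not call the Administrator API, and it assumes that the optional volumeName can simply be left out and that the dictionary passed to add_temp_volume is the object that contains the replicas property.

```python
import json


def build_volume_config(name, storage_class, size, volume_name=None):
    """Build the PVC_TEMPLATE volume configuration shown in step 4."""
    spec = {
        "storageClassName": storage_class,
        "resources": {"requests": {"storage": size}},
        "accessModes": ["ReadWriteOnce"],
        "volumeMode": "Filesystem",
    }
    if volume_name:  # assumption: the optional name may be omitted
        spec["volumeName"] = volume_name
    return {"name": name, "type": "PVC_TEMPLATE", "spec": spec}


def add_temp_volume(deployment, volume_id):
    """Return a copy of the deployment JSON with the volume entry from step 12."""
    updated = dict(deployment)
    volumes = list(updated.get("volumes", []))
    volumes.append({"purpose": "GIS_SERVICE_TEMP", "volumeConfigId": volume_id})
    updated["volumes"] = volumes
    return updated


if __name__ == "__main__":
    # Placeholder values; substitute your own name, storage class, and VolumeID.
    config = build_volume_config("raster-temp", "<storageclass-name>", "3Gi")
    print(json.dumps(config, indent=2))
    print(json.dumps(add_temp_volume({"replicas": 10}, "<volumeId>"), indent=2))
```

Generating the JSON this way keeps the storage size in step 4 in sync with the per-pod estimate and avoids hand-editing mistakes such as unbalanced braces.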