Writing Quest Service Plugin¶
Quest is designed to be extensible. It has three types of plugins: (1) Data Service plugins, (2) Filter plugins, and (3) I/O plugins. They all work in a similar way, but this documentation focuses on the details of the first type (Data Service plugins). Because there is no standard interface for accessing the many web services that provide various types of data, the Quest service plugins are used as adapters that translate the specific interface of each service into a common Quest API for searching, accessing, and downloading data. To make a new data source accessible through Quest, we need to create a new service plugin for that source.
Data Service Plugins¶
Services are channels for finding and importing data into Quest. Data services are organized by provider. A provider is composed of one or more services, and every service must belong to a provider. For example, the U.S. Geological Survey (USGS) provides various data products. A subset of these products, the National Elevation Datasets (NED), has been grouped into a usgs_ned provider. That provider has four services, one for each NED data product (1 arc second, 1/3 arc second, 1/9 arc second, and Alaska 2 arc second). Creating a new Data Service plugin involves subclassing both the ProviderBase class and the ServiceBase class. To illustrate this process, we will provide code examples that create an example web service provider (ExampleProvider) that contains two services (ExampleService1 and ExampleService2).
1. Provider Base Class¶
The ProviderBase class acts as the gateway to all of the services that are part of a provider. Most of the code resides in the abstract base class, so subclassing it is very simple, and involves specifying a few attributes. For example:
from .base import ProviderBase


class ExampleProvider(ProviderBase):
    service_base_class = None  # TODO: This will be implemented in the next step
    display_name = 'Example Web Provider'
    description = 'Example ProviderBase subclass for Quest'
    organization_name = 'Example Data Provider Organization'
    organization_abbr = 'EDPO'
This is all that is required to subclass the ProviderBase. Note that the attribute service_base_class was left as None. This attribute refers to a base class that is the parent of all of the services that belong to this provider. The ProviderBase will find all of the subclasses of the class specified by service_base_class and register them as services of the provider. The next step is therefore to create a service base class (see 2. Service Base Class).
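The registration described above can be pictured with a small, self-contained sketch. It assumes discovery works through Python's built-in `__subclasses__()` method, which the text does not confirm; the `Toy*` classes are hypothetical stand-ins and are not part of Quest.

```python
class ToyServiceBase:
    """Hypothetical stand-in for a provider's service base class."""
    service_name = None


class ToyService1(ToyServiceBase):
    service_name = 'toy-1'


class ToyService2(ToyServiceBase):
    service_name = 'toy-2'


class ToyProvider:
    """Hypothetical provider; this is not Quest's ProviderBase."""
    service_base_class = ToyServiceBase

    @property
    def services(self):
        # Map each direct subclass of the base class to its service_name.
        return {
            cls.service_name: cls
            for cls in self.service_base_class.__subclasses__()
        }


provider = ToyProvider()
print(sorted(provider.services))
```

Note that `__subclasses__()` only returns direct subclasses, so a real implementation that supports deeper class hierarchies would need to walk them recursively.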
2. Service Base Class¶
A data service plugin must subclass the ServiceBase class (or one of its subclasses, see Specialized Service Base Subclasses) to act as the base class for all services in the plugin. This ServiceBase subclass is registered in the provider as the service_base_class attribute. As an example we will create an ExampleServiceBase class that subclasses the ServiceBase class:
from .base import ProviderBase, ServiceBase


class ExampleServiceBase(ServiceBase):
    service_name = None
    display_name = None
    description = None
    service_type = None
    unmapped_parameters_available = None
    geom_type = None
    datatype = None
    geographical_areas = None
    bounding_boxes = None
    smtk_template = None
    _parameter_map = None

    def download(self, feature, file_path, dataset, **params):
        pass  # TODO: This will be implemented later

    def get_features(self, **kwargs):
        pass  # TODO: This will be implemented later


class ExampleProvider(ProviderBase):
    service_base_class = ExampleServiceBase
    ...
Note
The ExampleServiceBase class must be defined above the ExampleProvider class so that it can be referenced when assigning the service_base_class attribute in the ExampleProvider.
The content of ExampleServiceBase has not yet been fully implemented. The above example simply illustrates the structure. All of the attributes and methods shown in the ExampleServiceBase will need to be implemented either in this class directly or in the services that subclass it. The specifics of how this is done will be different for each plugin, but the next step, 3. Service Classes, demonstrates one way to do it.
Specialized Service Base Subclasses¶
There are a couple of special cases that apply to services from various providers. To allow all of these services to use the same codebase, a couple of other base classes are available that can be used in place of the ServiceBase.
TimePeriodServiceBase¶
This base class simply adds two parameters, a start date and an end date, that represent the time period for the data being requested (see d. Specify the Download Options).
SingleFileServiceBase¶
This base class implements the download method for services that simply expose a download URL linking to a single zip file that contains the data.
3. Service Classes¶
After a ServiceBase subclass has been created (in our example, ExampleServiceBase), the next step is to create a class for each specific service. While the specifics of this step can vary significantly between plugins, the overall structure and process are similar and are broken down into the following sub-steps:
a. Required Service Class Attributes
b. Implement the get_features Method
c. Implement the download Method
d. Specify the Download Options
Continuing the example from above, we will create two service classes that each subclass the ExampleServiceBase. We’ll first focus on assigning all of the required class attributes.
a. Required Service Class Attributes¶
service_name (String): A unique identifier for the service. It should contain only alphanumeric characters, _ or -. There should be no spaces.
display_name (String): A displayable version of the service name (may contain spaces) for use in GUIs.
description (String): A brief description of the service that will be available in the service’s metadata.
service_type (String): A keyword that indicates the type of data that the service provides. Must be one of geo-discrete, geo-seamless, or geo-typical. (# TODO: provide link to description of service types in the docs)
unmapped_parameters_available (Bool): Whether or not the service offers additional parameters beyond those listed in the _parameter_map.
geom_type (String): Describes what type of geometry represents the locations of the data (for geo-discrete services only). Must be Point, Line, or Polygon. Leave as None for services of any type other than geo-discrete.
datatype (String): Represents the type of data that is accessible from the service. Must be timeseries, raster, or other.
geographical_areas (List): A list of descriptive words that represent the areas where data is available (e.g. ['North America', 'Europe']). Should be left as None for geo-typical service types.
bounding_boxes (List): A list of bounding boxes represented as tuples in the form (x-min, y-min, x-max, y-max). For example [(-180, -90, 180, 90)].
smtk_template (String): The name of the SMTK template file that describes the download options for the service.
_parameter_map (Dict): A mapping from the parameter names used by the service to the controlled vocabulary parameter names in Quest.
In some cases the attributes will be the same for both services, so they can be assigned in the ExampleServiceBase class. The remaining attributes, which differ between the two services, will be assigned in the service classes themselves:
from .base import ProviderBase, ServiceBase


class ExampleServiceBase(ServiceBase):
    service_name = None
    display_name = None
    description = None
    service_type = 'geo-discrete'
    unmapped_parameters_available = False
    geom_type = 'Point'
    datatype = 'timeseries'
    geographical_areas = ['Worldwide']
    bounding_boxes = [
        [-180, -90, 180, 90],
    ]
    smtk_template = None

    def get_features(self, **kwargs):
        pass  # TODO: This will be implemented later

    def download(self, feature, file_path, dataset, **params):
        pass  # TODO: This will be implemented later


class ExampleService1(ExampleServiceBase):
    service_name = 'example-1'
    display_name = 'Example Service 1'
    description = 'First example service'
    _parameter_map = {}


class ExampleService2(ExampleServiceBase):
    service_name = 'example-2'
    display_name = 'Example Service 2'
    description = 'Second example service'
    _parameter_map = {}


class ExampleProvider(ProviderBase):
    service_base_class = ExampleServiceBase
    ...
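The constraints listed for the required attributes can be checked mechanically. The following is a minimal sketch of such a check, not something Quest is known to provide: check_service_attributes, GoodService, and BadService are hypothetical names, and only the rules stated in the attribute list are encoded.

```python
import re

VALID_SERVICE_TYPES = {'geo-discrete', 'geo-seamless', 'geo-typical'}
VALID_GEOM_TYPES = {'Point', 'Line', 'Polygon'}
VALID_DATATYPES = {'timeseries', 'raster', 'other'}


def check_service_attributes(cls):
    """Return a list of problems found in a service class's attributes."""
    problems = []
    name = getattr(cls, 'service_name', None)
    # Only letters, digits, underscores, and hyphens are allowed (no spaces).
    if not isinstance(name, str) or not re.fullmatch(r'[A-Za-z0-9_-]+', name):
        problems.append('service_name must use only letters, digits, _ or -')
    service_type = getattr(cls, 'service_type', None)
    if service_type not in VALID_SERVICE_TYPES:
        problems.append('service_type is not a recognized keyword')
    geom_type = getattr(cls, 'geom_type', None)
    # geom_type is only required for geo-discrete services.
    if service_type == 'geo-discrete' and geom_type not in VALID_GEOM_TYPES:
        problems.append('geom_type must be Point, Line, or Polygon')
    if getattr(cls, 'datatype', None) not in VALID_DATATYPES:
        problems.append('datatype must be timeseries, raster, or other')
    return problems


class GoodService:
    # Hypothetical class with valid attribute values.
    service_name = 'example-1'
    service_type = 'geo-discrete'
    geom_type = 'Point'
    datatype = 'timeseries'


class BadService(GoodService):
    service_name = 'example 1'  # invalid: contains a space
    datatype = 'vector'         # invalid: not an allowed datatype


print(check_service_attributes(GoodService))
print(check_service_attributes(BadService))
```

A check like this could be run in a plugin's unit tests to catch a misspelled keyword before the service is registered.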
b. Implement the get_features Method¶
The purpose of the get_features method is to extract key metadata from the service that describes what data is available from that service. For geo-discrete services this would include a list of locations where the service has data, in addition to other key metadata at each location. The return value of get_features should be a Pandas DataFrame indexed by a unique id (known as the service_id) with the following columns:
display_name: (will be set to service_id if not provided)
description: (will be set to ‘’ if not provided)
service_id: a unique id that is used by the web service to identify the data
For geo-discrete services the DataFrame must also include a representation of the features’ geometry. Any of the following options are valid ways to specify the geometry:
geometry: a geojson string or Shapely object
latitude and longitude: two columns with the decimal degree coordinates of a point
geometry_type, latitudes, and longitudes: Point, Line, or Polygon with a list of coordinates
bbox: tuple with order (lon min, lat min, lon max, lat max)
All other fields that the DataFrame contains will be accumulated into a dict and placed in a column called metadata.
As with the attributes, the get_features method may be implemented in the service classes (e.g. ExampleService1 and ExampleService2), in the base class (e.g. ExampleServiceBase), or in some combination of both.
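As an illustration of the return value described above, here is a minimal sketch of a get_features implementation written as a standalone function. It assumes pandas is installed; the station records are made up, where a real plugin would request them from the web service. The geometry is given through the latitude and longitude option, and the extra elevation_m column is the kind of field that, according to the description above, would end up in the metadata column.

```python
import pandas as pd


def get_features(**kwargs):
    """Hypothetical stand-in for a service's get_features method."""
    # Made-up records; a real plugin would query the web service here.
    records = [
        {'service_id': 'station-1', 'display_name': 'Station 1',
         'description': 'First made-up station',
         'latitude': 32.30, 'longitude': -90.87, 'elevation_m': 45.0},
        {'service_id': 'station-2', 'display_name': 'Station 2',
         'description': 'Second made-up station',
         'latitude': 40.02, 'longitude': -105.27, 'elevation_m': 1624.0},
    ]
    features = pd.DataFrame(records)
    # Index by service_id while keeping it available as a column too.
    return features.set_index('service_id', drop=False)


features = get_features()
print(features[['display_name', 'latitude', 'longitude']])
```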
c. Implement the download Method¶
The download method is responsible for retrieving the data from the data source using the specified download options, saving it to disk, and then returning a dictionary of key metadata. The download method should accept several arguments:
feature: the service_id for the feature that is associated with the data to be downloaded
file_path: the path to the directory on disk where Quest expects the data to be written
dataset: the Quest dataset id associated with the data to be downloaded
**params: keyword arguments for the dataset options
After downloading the data and saving it to disk, this method should return a dictionary with the following keys:
metadata: any metadata that was returned by the data source when it was downloaded, in the form of a dict
file_path: the final file path (including the filename) where the data file was written
file_format: the format that the file was written in (to be used to determine which I/O plugin to use to read the file)
datatype: a string representing the type of data. Must be timeseries, raster, or other.
parameter: a string representing the parameter of the data
unit: a string representing the units of the data
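The shape of a download implementation can be sketched as follows. This is a standalone function under stated assumptions, not Quest's code: it writes made-up values to a CSV file instead of calling a web service, and the 'csv' file format label, the default parameter name, and the unit are placeholders chosen for illustration.

```python
import csv
import os
import tempfile


def download(feature, file_path, dataset, **params):
    """Hypothetical download that writes placeholder data to disk."""
    rows = [('2020-01-01', 1.2), ('2020-01-02', 1.5)]  # made-up values
    os.makedirs(file_path, exist_ok=True)
    # file_path is a directory; the returned path includes the filename.
    final_path = os.path.join(file_path, dataset + '.csv')
    with open(final_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['date', 'value'])
        writer.writerows(rows)
    return {
        'metadata': {'source_feature': feature, 'options': params},
        'file_path': final_path,
        'file_format': 'csv',  # placeholder label, not a confirmed Quest format
        'datatype': 'timeseries',
        'parameter': params.get('parameter', 'streamflow'),
        'unit': 'm3/s',  # placeholder unit
    }


result = download('station-1', tempfile.mkdtemp(), 'd0123', parameter='streamflow')
print(sorted(result))
```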
d. Specify the Download Options¶
Data source APIs often allow various options to be specified that determine what data to download, what format it should be in, and so on.
The download options that are needed for each service are defined using the Python library Param. This library enables parameters to have features like type and range checking, documentation strings, default values, etc. Refer to the Param documentation for more information.