Core Concepts

Quest is a python library designed to automate the following data management tasks:

  • Discovery

  • Retrieval

  • Organization

  • Transformation

  • Archival

At the heart of all of these tasks are datasets. Each of the tasks listed above involves finding, getting, storing, changing, or sharing a dataset. The underlying concepts for how Quest accomplishes these five tasks will be described below and are grouped into the following three sections:

Local Data Organization

Quest uses a hierarchical structure to organize and manage datasets, and data sources. The dataset hierarchy begins with projects which contains collections which have datasets. A more detailed description of each level is given below.

Projects

A Quest Project is the base organizing factor. The first time Quest is started a default project is created. Only one project can be active at a time and currently the api does not allow copying data from one project to another.

Physically, a project maps to a folder on the computer. All data and metadata associated with a project is stored under the project folder. The metadata is stored in a sqlite database.

Collections

Collections are a way of organizing data within a project. Collection names are unique and the collection name maps directly to a folder name in the project folder.

Datasets

These are the actual individual data files or in some cases a folder of data. Datasets have associated metadata that is stored in the project directory.

Data Transformations

Quest facilitates transforming data through the use of tools. Some examples of the kinds of transformations that Quest can do include merging datasets, aggregating data within a dataset, or changing the format that the data is stored in.

Tools

Quest tools are a way to perform some kind of operation on data. It is important to note that a tool will never perform “in-place” changes the datasets that it operates on. This means that datasets that are passed to a tool will remain unchanged, and the tool will create new datasets that have the transformed data. New tools can be added to Quest through Tool Plugins.

Tools define a set of options that a user must specify when using the tool.

Data Repositories

When Quest is used to search for data it searches among all of the data repositories or data providers that are registered with Quest. Similar to Tools Providers are added to Quest as plugins (see Provider Plugins). Providers contain one or more services. Services provide an interface for a single data product. Each service has a Catalogs, which stores metadata about the datasets that are available from that service and is what enables Quest to search for data.

Providers

Data providers are the top level source of data. Providers are composed of one or more Services, and typically represent an organization or specific part of an organization that provides data. In Quest, providers are a way of grouping related services.

Services

A data service is a specific type or channel of data that is offered from a Providers, and are the primary means of ingesting data into Quest.

Catalogs

Catalog Entries

Catalog Entries are a unique identifiers that indicate a group of datasets. Typically, these are geospatial locations, i.e., monitoring stations, counties, lakes, roads at which data exists. Features can also just be a tag or name to group data that does not have a geospatial component (i.e. geotypical datasets). Features are always either part of a collection or part of a web service.