Datasets

Making a Request

To retrieve data from an IMF database, you’ll need the database ID and any relevant filter parameters. Here’s a basic example using the Primary Commodity Price System (PCPS) database:

import imfp

# Get parameters and their valid codes
params = imfp.imf_parameters("PCPS")

# Fetch annual coal price index data
df = imfp.imf_dataset(
    database_id="PCPS",
    freq=["A"],  # Annual frequency
    commodity=["PCOAL"],  # Coal prices
    unit_measure=["IX"],  # Index
    start_year=2000,
    end_year=2015
)

This example creates two objects we’ll use in the following sections:

  • params: A dictionary of parameters and their valid codes
  • df: The retrieved data frame containing our requested data

Decoding Returned Data

When you retrieve data using imfp.imf_dataset, the returned data frame contains columns that correspond to the parameters you specified in your request. However, these columns use input codes (short identifiers) rather than human-readable descriptions. To make your data more interpretable, you can replace these codes with their corresponding text descriptions using the parameter information from imfp.imf_parameters, so that codes like “A” (Annual) or “W00” (World) become self-explanatory labels.

For example, suppose we want to decode the freq (frequency), ref_area (geographical area), and unit_measure (unit) columns in our dataset. We’ll merge the parameter descriptions into our data frame:

# Decode frequency codes (e.g., "A" → "Annual")
df = df.merge(
    # Select code-description pairs
    params['freq'][['input_code', 'description']],
    # Match codes in the data frame
    left_on='freq',
    # ...to codes in the parameter data
    right_on='input_code',
    # Keep all data rows
    how='left'
).drop(columns=['freq', 'input_code']
).rename(columns={"description": "freq"})

# Decode geographic area codes (e.g., "W00" → "World")
df = df.merge(
    params['ref_area'][['input_code', 'description']],
    left_on='ref_area',
    right_on='input_code',
    how='left'
).drop(columns=['ref_area', 'input_code']
).rename(columns={"description":"ref_area"})

# Decode unit codes (e.g., "IX" → "Index")
df = df.merge(
    params['unit_measure'][['input_code', 'description']],
    left_on='unit_measure',
    right_on='input_code',
    how='left'
).drop(columns=['unit_measure', 'input_code']
).rename(columns={"description":"unit_measure"})

After decoding, the data frame is much more human-interpretable. This transformation makes the data more accessible for analysis and presentation, while maintaining all the original information.

Understanding the Data Frame

Also note that the returned data frame has additional mysterious-looking codes as values in some columns.

Codes in the time_format column are ISO 8601 duration codes. In this case, “P1Y” means “periods of 1 year.” The unit_mult column represents the number of zeroes you should add to the value column. For instance, if value is in millions, then the unit multiplier will be 6. If in billions, then the unit multiplier will be 9.