Requesting Datasets

Making a Request

To retrieve data from an IMF database, you’ll need the database ID and any relevant filter parameters. Here’s a basic example using the Primary Commodity Price System (PCPS) database:

import imfp

# Get parameters and their valid codes
params = imfp.imf_parameters("PCPS")

# Fetch annual coal price index data
df = imfp.imf_dataset(
    database_id="PCPS",
    freq=["A"],  # Annual frequency
    commodity=["PCOAL"],  # Coal prices
    unit_measure=["IX"],  # Index
    start_year=2000,
    end_year=2015
)

This example creates two objects we’ll use in the following sections:

params: A dictionary mapping each parameter name to a data frame of its valid codes (input_code) and their descriptions (description)
df: The retrieved data frame containing our requested data
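Because params maps each parameter name to a table of valid codes, you can check your filter values against it before calling imf_dataset. The sketch below uses a small hand-built dictionary as a stand-in for params (the real one comes from imf_parameters, and the codes shown are illustrative only). invalid_codes is a hypothetical helper, not an imfp function.

```python
import pandas as pd

# Hypothetical stand-in for imfp.imf_parameters("PCPS"): each key is a
# parameter name, each value a data frame with "input_code" and
# "description" columns. These rows are illustrative, not real PCPS output.
params = {
    "freq": pd.DataFrame({
        "input_code": ["A", "M"],
        "description": ["Annual", "Monthly"],
    }),
    "unit_measure": pd.DataFrame({
        "input_code": ["IX"],
        "description": ["Index"],
    }),
}


def invalid_codes(params, parameter, codes):
    """Return the requested codes that are not listed as valid for `parameter`.

    Hypothetical helper for illustration; it is not part of imfp.
    """
    valid = set(params[parameter]["input_code"])
    return [code for code in codes if code not in valid]


print(invalid_codes(params, "freq", ["A", "X"]))      # ['X']
print(invalid_codes(params, "unit_measure", ["IX"]))  # []
```

An empty list means every requested code appears in the parameter's table.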

Decoding Returned Data

When you retrieve data with imf_dataset, the returned data frame contains columns named after the database's parameters, including ones you did not filter on, such as ref_area in this example. These columns hold input codes (short identifiers) rather than human-readable descriptions. To make the data easier to interpret, you can replace each code with its text description from the parameter information returned by imf_parameters, so that a code like “A” becomes the label “Annual”.

For example, suppose we want to decode the freq (frequency), ref_area (geographical area), and unit_measure (unit) columns in our dataset. We’ll merge the parameter descriptions into our data frame:

# Decode frequency codes (e.g., "A" → "Annual")
df = df.merge(
    # Select code-description pairs
    params['freq'][['input_code', 'description']],
    # Match codes in the data frame
    left_on='freq',
    # ...to codes in the parameter data
    right_on='input_code',
    # Keep all data rows
    how='left'
).drop(columns=['freq', 'input_code']
).rename(columns={"description": "freq"})

# Decode geographic area codes into area names
df = df.merge(
    params['ref_area'][['input_code', 'description']],
    left_on='ref_area',
    right_on='input_code',
    how='left'
).drop(columns=['ref_area', 'input_code']
).rename(columns={"description":"ref_area"})

# Decode unit codes (e.g., "IX" → "Index")
df = df.merge(
    params['unit_measure'][['input_code', 'description']],
    left_on='unit_measure',
    right_on='input_code',
    how='left'
).drop(columns=['unit_measure', 'input_code']
).rename(columns={"description":"unit_measure"})

df.head()

	commodity	time_format	time_period	obs_value	freq	ref_area	unit_measure
0	PCOAL	P1Y	2000	39.3510230293202	Annual	All Countries, excluding the IO	Index
1	PCOAL	P1Y	2001	49.3378587284039	Annual	All Countries, excluding the IO	Index
2	PCOAL	P1Y	2002	39.4949091648006	Annual	All Countries, excluding the IO	Index
3	PCOAL	P1Y	2003	43.2878876950788	Annual	All Countries, excluding the IO	Index
4	PCOAL	P1Y	2004	82.9185858052862	Annual	All Countries, excluding the IO	Index

After decoding, the freq, ref_area, and unit_measure columns hold readable descriptions instead of codes. This makes the data easier to analyze and present while preserving the information the codes carried.
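The three merges above repeat one pattern. A more compact sketch swaps the merge for pandas' Series.map with a code-to-description dictionary, decodes several columns in a loop, and leaves each column in its original position. decode_columns is a hypothetical helper, and the small data frames below are made-up stand-ins for df and params.

```python
import pandas as pd


def decode_columns(df, params, columns):
    """Replace the codes in `columns` with their descriptions from `params`.

    Hypothetical helper (not part of imfp). Codes missing from the lookup
    become NaN, matching what a left merge would produce.
    """
    out = df.copy()
    for column in columns:
        table = params[column]
        # Build a code -> description lookup for this parameter
        lookup = dict(zip(table["input_code"], table["description"]))
        out[column] = out[column].map(lookup)
    return out


# Illustrative stand-ins for the real imf_dataset / imf_parameters output
df = pd.DataFrame({
    "freq": ["A", "A"],
    "unit_measure": ["IX", "IX"],
    "time_period": ["2000", "2001"],
})
params = {
    "freq": pd.DataFrame({"input_code": ["A"], "description": ["Annual"]}),
    "unit_measure": pd.DataFrame({"input_code": ["IX"], "description": ["Index"]}),
}

decoded = decode_columns(df, params, ["freq", "unit_measure"])
print(decoded)
```

Unlike the merge approach, map never adds rows: if a parameter table listed the same input_code twice, the dictionary would keep only the last description, whereas a merge would duplicate the matching data rows.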

Understanding the Data Frame

The returned data frame also contains other cryptic-looking codes as values in some columns.

Codes in the time_format column are ISO 8601 duration codes. Here, “P1Y” denotes a period of one year, meaning each observation covers a single year. See Time Period Conversion for more information on reconciling time periods.

When present, the unit_mult column gives the number of zeroes to append to the obs_value column, that is, the power of 10 by which each value should be multiplied. For instance, if values are reported in millions, the unit multiplier is 6; if in billions, it is 9. See Unit Multiplier Adjustment for more information on reconciling units.
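As a sketch of that adjustment, you can rescale values to base units by multiplying obs_value by 10 raised to unit_mult. The rows below are invented, and the sketch assumes both columns may arrive as text, so it converts them with pd.to_numeric first; check the dtypes of your own data frame.

```python
import pandas as pd

# Made-up example rows: unit_mult is the power of 10 each value is
# expressed in (6 = millions, 9 = billions).
df = pd.DataFrame({
    "obs_value": ["1.5", "2.25"],
    "unit_mult": ["6", "9"],
})

# Convert both columns to numbers, then compute value * 10 ** unit_mult.
# A float base (10.0) avoids integer-power errors if a multiplier is negative.
df["obs_value_scaled"] = (
    pd.to_numeric(df["obs_value"]) * 10.0 ** pd.to_numeric(df["unit_mult"])
)
print(df)
```

Writing the result to a new column keeps the original values available for comparison.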