Suggestions for Usage

Determining Data Availability

Unfortunately, many of the indicators listed as available in the lists of input codes returned by imf_parameters() are not actually available. This is a deficiency of the API rather than the library; someone at the IMF presumably intended to provide these indicators at some point, but never got around to it.

The only way to be certain whether an indicator is available is to make a request to the API and see if it succeeds. If not, you will receive an error message indicating that no data was found for your parameters. In general, if you see this message, you should try making a less restrictive version of your request. For instance, if your request returns no data for an indicator for a given country and time period, you can omit the country or time period parameter and try again. If you still get no data, that indicator is not actually available through the API.

While it is not fully predictable which indicators will be available, as a general rule you can expect to get unadjusted series but not adjusted ones. For instance, real and per capita GDP are not available (although they are listed) through the API, but nominal GDP is. The API does, however, make available all the adjustment variables you would need to adjust the data yourself. See the Common Data Transformations section below for examples of how to make adjustments.

Working with Large Data Frames

Inspecting Data

imfp outputs data in pandas DataFrames, so you will want to use the pandas package for its functions for viewing and manipulating this object type.

For large datasets, you can use the pandas library’s info() method to get a quick summary of the data frame, including the number of rows and columns, the count of non-missing values, the column names, and the data types.

import imfp
import pandas as pd

# Set float format to 2 decimal places for pandas display output
pd.set_option('display.float_format', lambda x: '%.2f' % x)

df: pd.DataFrame = imfp.imf_dataset(
    database_id="PCPS",
    commodity=["PCOAL"],
    unit_measure=["IX"],
    start_year=2000, end_year=2001
)

# Quick summary of DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 34 entries, 0 to 23
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   freq          34 non-null     object
 1   ref_area      34 non-null     object
 2   commodity     34 non-null     object
 3   unit_measure  34 non-null     object
 4   unit_mult     34 non-null     object
 5   time_format   34 non-null     object
 6   time_period   34 non-null     object
 7   obs_value     34 non-null     object
dtypes: object(8)
memory usage: 2.4+ KB

Alternatively, you can use the head() method to view the first 5 rows of the data frame.

# View first 5 rows of DataFrame
df.head()

	freq	ref_area	commodity	unit_measure	time_format	time_period	obs_value
0	A	W00	PCOAL	IX	P1Y	2000	39.3510230293202
1	A	W00	PCOAL	IX	P1Y	2001	49.3378587284039
0	Q	W00	PCOAL	IX	P3M	2000-Q1	36.2253202528364
1	Q	W00	PCOAL	IX	P3M	2000-Q2	38.3233649120962
2	Q	W00	PCOAL	IX	P3M	2000-Q3	39.5896503864216

Cleaning Data

Numeric Conversion

All data is returned from the IMF API as a text (object) data type, so you will want to cast numeric columns to numeric.

# Numeric columns
numeric_cols = ["unit_mult", "obs_value"]

# Cast numeric columns
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

Categorical Conversion

You can also convert string columns to categorical types for better memory usage.

# Convert categorical columns like ref_area and indicator to category type
categorical_cols = [
  "freq",
  "ref_area",
  "commodity",
  "unit_measure",
  "time_format"
]

df[categorical_cols] = df[categorical_cols].astype("category")

NA Removal

After conversion, you may want to drop any rows with missing values.

# Drop rows with missing values
df = df.dropna()

Time Period Conversion

The time_period column can be more difficult to work with, because it may be differently formatted depending on the frequency of the data.

Annual data will be formatted as a four-digit year, such as “2000”, which can be trivially converted to numeric.

However, quarterly data will be formatted as “2000-Q1”, and monthly data will be formatted like “2000-01”.

You can use the pandas library’s to_datetime() method with the format="mixed" argument to convert this column to a datetime object in a format-agnostic way:

# Convert time_period to datetime
df["datetime"] = pd.to_datetime(df["time_period"], format="mixed")
df[["freq", "datetime"]].head()

	freq	datetime
0	A	2000-01-01
1	A	2001-01-01
0	Q	2000-01-01
1	Q	2000-04-01
2	Q	2000-07-01

Alternatively, you can split the time_period column into separate columns for year, quarter, and month, and then convert each to a numeric value:

# Split time_period into separate columns
df["year"] = df["time_period"].str.extract(r"(\d{4})")[0]
df["quarter"] = df["time_period"].str.extract(r"Q(\d{1})")[0]
df["month"] = df["time_period"].str.extract(r"-(\d{2})")[0]

# Convert year, quarter, and month to numeric
df["year"] = pd.to_numeric(df["year"])
df["quarter"] = pd.to_numeric(df["quarter"])
df["month"] = pd.to_numeric(df["month"])

df[["time_period", "year", "quarter", "month"]].head()

	time_period	year	quarter	month
0	2000	2000	NaN	NaN
1	2001	2001	NaN	NaN
0	2000-Q1	2000	1.00	NaN
1	2000-Q2	2000	2.00	NaN
2	2000-Q3	2000	3.00	NaN

Summarizing Data

After converting columns to numeric, you can use the describe() function to get a quick summary of the statistical properties of these, including the count of rows, the mean, the standard deviation, the minimum and maximum values, and the quartiles.

# Statistical summary
df.describe()

	unit_mult	obs_value	datetime	year	quarter	month
count	34.00	34.00	34	34.00	8.00	24.00
mean	0.00	44.34	2000-11-28 21:52:56.470588288	2000.50	2.50	6.50
min	0.00	35.59	2000-01-01 00:00:00	2000.00	1.00	1.00
25%	0.00	39.22	2000-06-08 12:00:00	2000.00	1.75	3.75
50%	0.00	44.35	2000-12-16 12:00:00	2000.50	2.50	6.50
75%	0.00	50.11	2001-05-24 06:00:00	2001.00	3.25	9.25
max	0.00	51.48	2001-12-01 00:00:00	2001.00	4.00	12.00
std	0.00	5.76	NaN	0.51	1.20	3.53

Viewing Data

For large data frames, it can be useful to view the data in a browser window. To facilitate this, you can define a View() function as follows. This function will save the data frame to a temporary HTML file and open it in your default web browser.

import tempfile
import webbrowser

# Define a simple function to view data frame in a browser window
def View(df: pd.DataFrame):
    html = df.to_html()
    with tempfile.NamedTemporaryFile('w', 
    delete=False, suffix='.html') as f:
        url = 'file://' + f.name
        f.write(html)
    webbrowser.open(url)

# Call the function
View(df)

Common Data Transformations

The International Financial Statistics (IFS) database provides key macroeconomic aggregates that are frequently needed when working with other IMF datasets. Here, we’ll demonstrate how to use three fundamental indicators—GDP, price deflators, and population statistics—to transform your data.

These transformations are essential for:

Converting nominal to real dollar values
Calculating per capita metrics
Harmonizing data across different frequencies
Adjusting for different unit scales

For a complete, end-to-end example of these transformations in a real analysis workflow, see Jenny Xu’s superb demo notebook.

Fetching IFS Adjusters

First, let’s retrieve the key adjustment variables from the IFS database:

# Fetch GDP Deflator (Index, Annual)
deflator = imfp.imf_dataset(
    database_id="IFS",
    indicator="NGDP_D_SA_IX",
    freq="Q",
    start_year=2010
)

# Fetch Population Estimates (Annual)
population = imfp.imf_dataset(
    database_id="IFS",
    indicator="LP_PE_NUM",
    freq="A",
    start_year=2010
)

# Fetch Exchange Rate (Annual)
exchange_rate = imfp.imf_dataset(
    database_id="IFS", 
    indicator="ENDE_XDC_USD_RATE",
    freq="Q",
    # start_year=2010 currently breaks this query for some reason
)

We’ll also retireve a nominal GDP series to be adjusted:

# Fetch Nominal GDP (Domestic currency, annual)
nominal_gdp = imfp.imf_dataset(
    database_id="IFS", 
    indicator="NGDP_XDC",
    freq="A",
    start_year=2010
)

Key IFS Indicators:

NGDP_D_SA_IX: GDP deflator index (seasonally adjusted)
LP_PE_NUM: Population estimates
ENDE_XDC_USD_RATE: Exchange rate (domestic currency per USD)
NGDP_XDC: Nominal GDP in domestic currency

Harmonizing Frequencies

When working with data of different frequencies, you’ll often need to harmonize them. For example, population and national GDP are available at an annual frequency, while the GDP deflator and exchange rates can only be obtained at a monthly or quarterly frequency. There are two common approaches:

Using Q4 values: This approach is often used for stock variables (measurements taken at a point in time) and when you want to align with end-of-year values:

# Keep only Q4 observations for annual comparisons
deflator = deflator[deflator['time_period'].str.contains("Q4")]
exchange_rate = exchange_rate[exchange_rate['time_period'].str.contains("Q4")]

# Extract just the year from the time period for Q4 data
deflator['time_period'] = deflator['time_period'].str[:4]
exchange_rate['time_period'] = exchange_rate['time_period'].str[:4]

Calculating annual averages: This approach is more appropriate for flow variables (measurements over a period) and when you want to smooth out seasonal variations:

# Alternative: Calculate annual averages
deflator = deflator.groupby(
    ['ref_area', deflator['time_period']], 
    as_index=False
).agg({
    'obs_value': 'mean'
})

Choose the appropriate method based on your specific analysis needs and the economic meaning of your variables.

Unit Multiplier Adjustment

IMF data often includes a unit_mult column that indicates the scale of the values (e.g., millions, billions). We can write a helper function to apply these scaling factors:

def apply_unit_multiplier(df):
    """Convert to numeric, adjust values using IMF's scaling factors, and drop
    missing values"""
    df['obs_value'] = pd.to_numeric(df['obs_value'])
    df['unit_mult'] = pd.to_numeric(df['unit_mult'])
    df['adjusted_value'] = df['obs_value'] * 10 ** df['unit_mult']
    df = df.dropna(subset=["obs_value"])
    return df

# Apply to each dataset
deflator = apply_unit_multiplier(deflator)
population = apply_unit_multiplier(population)
exchange_rate = apply_unit_multiplier(exchange_rate)
nominal_gdp = apply_unit_multiplier(nominal_gdp)

Merging Datasets

After harmonizing unit scales, we can combine the datasets using pd.DataFrame.merge() with ref_area and time_period as keys:

merged = (
    nominal_gdp.merge(
        deflator,
        on=['ref_area', 'time_period'],
        suffixes=('_gdp', '_deflator')
    )
    .merge(
        population,
        on=['ref_area', 'time_period']
    )
    .merge(
        exchange_rate,
        on=['ref_area', 'time_period'],
        suffixes=('_population', '_exchange_rate')
    )
)

Calculating Real Values

With the merged dataset, we can now calculate real GDP and per capita values:

# Convert nominal to real GDP
merged['real_gdp'] = (
    (merged['adjusted_value_gdp'] / merged['adjusted_value_deflator']) * 100
)

# Calculate per capita values (using population obs_value)
merged['real_gdp_per_capita'] = merged['real_gdp'] / merged['adjusted_value_population']

# Display the first 5 rows of the transformed data
merged[['time_period', 'real_gdp', 'real_gdp_per_capita']].head()

	time_period	real_gdp	real_gdp_per_capita
0	2011	3551364668794.20	82248.89
1	2012	3792317490278.53	85521.44
2	2013	3967309320845.69	87155.33
3	2014	4144699824894.22	88751.48
4	2015	4417512779956.08	92265.37

Exchange Rate Adjustment

Note that this result is still in the domestic currency of the country. If you need to convert to a common currency, you can use the exchange rate data from the IFS database.

# Because 'adjusted_value_exrate' is local-currency-per-USD,
# dividing local-currency real GDP by it yields GDP in USD.
merged["real_gdp_usd"] = (
    merged["real_gdp"] / merged["adjusted_value_exchange_rate"]
)

# (Optional) real GDP per capita in USD
merged["real_gdp_usd_per_capita"] = (
    merged["real_gdp_usd"] / merged["adjusted_value_population"]
)

# Inspect results
merged[["time_period","ref_area","real_gdp","real_gdp_usd","real_gdp_usd_per_capita"]].head()

	time_period	ref_area	real_gdp	real_gdp_usd	real_gdp_usd_per_capita
0	2011	KE	3551364668794.20	41747313843.78	966.86
1	2012	KE	3792317490278.53	44096304805.05	994.43
2	2013	KE	3967309320845.69	45965960945.06	1009.80
3	2014	KE	4144699824894.22	45796944714.40	980.66
4	2015	KE	4417512779956.08	43177131580.22	901.81