Automating Trade Data Collection: UN Comtrade & World Bank API

2025   ·  


Motivation

Trade researchers often spend several days collecting, cleaning, and aligning datasets before running a single regression (or finding that we download the wrong data!). Thanks to advancements in data processing and the availability of official APIs from reliable sources, this tedious task can now be streamlined efficiently.

I’ve developed a Python-based automation script integrating UN Comtrade, World Bank, and CEPII data to automatically download, merge, and harmonize multi-country trade datasets for econometric or gravity analysis.

Key Features

  • Simple, unified setup — built for researchers, not programmers; define settings once to work across all APIs.

  • Smart summary table — auto-generates Excel reports with summaries, stats, missing data, counts, and more.

  • Smart “all” mode — Enables large-scale batch downloads while maintaining correct country–year–pair consistency and avoiding API query limits.

  • Unified ISO definition — Country codes (ISO3) apply consistently to all data sources: UN Comtrade, World Bank, and CEPII.

  • Integrated output — Automatically merges World Bank macro data, UN Comtrade trade data, and CEPII gravity data into .csv.

Where to Start

Before we begin, this process needs a few setup steps:

  1. Obtain a UN Comtrade API key (Free Version)
    • Sign up at UN Comtrade Database or UN Comtrade Developer.

    • Go to My Account → My API portal → Profile → Primary key and copy your subscription key.

    • Note: The key approval may take several days; contact UN Comtrade support if needed.

  2. Download script, mapping file, and prepared CEPII from my GitHub repository.
    project/
       ├── in/        → prepared CEPII dataset
       ├── map/       → API_Mapping.xlsx
       ├── src/       → main Python script
       └── docs/      → ISO lists, code references, etc.
    

This entire code is based on Python packages — you must install all of them first.

  • Install Python from python.org

  • Install required packages

    pip install pandas wbgapi tqdm comtradeapicall openpyxl
    

    If you encounter issues with comtradeapicall, see the official UN Comtrade API page.

Configuration

This script’s flexibility lies in the CONFIG section, allowing full customization of the workflow without changing any core logic.

Key configuration settings
Parameter Description
YEARS Years to download (e.g., range(2010,2021) or [2015]), applied to both APIs.
REPORTERS / PARTNERS ISO3 format (e.g., ["THA","USA"]), or use ["all"] for all countries. applied to both APIs.
MAPPING_PATH Optional path to the Excel mapping file, required if USE_WB_VARIABLE_NAME = False to map readable variable names.
MERGE_DATA (True/False) to automatically merge UN Comtrade, World Bank, and CEPII datasets after download.
WB_VARIABLES World Bank indicators to extract; see API_Mapping.xlsx sheet WorldBank_DataList for available data details.
WB_MAPPING_SHEET Optional sheet in (API_Mapping.xlsx) for translating codes to names.
USE_WB_VARIABLE_NAME If False, replace codes with names; if True, keep original codes.
UN_PARAMS Dictionary of UN Comtrade API parameters; see the next section.
CEPII_PATH Path to CEPII symmetric dataset (CEPII_Symmetric.csv); skipped if missing.
CEPII_VARIABLES CEPII variables to include; see CEPII for available data.


💡 Tips for Simplicity
Download everything from my GitHub repository and modify only:
  • Paths for Output, API_Mapping.xlsx, and CEPII_Symmetric.csv
  • YEARS
  • REPORTERS / PARTNERS

UN Comtrade Parameters

Each parameter inside the UN_PARAMS dictionary defines how the script requests data from the UN Comtrade API.

UN Comtrade API Parameters
Parameter Description
typeCode Data type ("C" = goods, "S" = services). Default is "C".
freqCode Frequency ("A" = annual, "M" = monthly). Default is "A".
clCode Classification system ("HS", "SITC", "BEC", "EBOPS", "EB10").
cmdCode Product or commodity code (e.g., "TOTAL", "AG2", "AG6").
flowCode Trade flow ("M" = import, "X" = export). Use one flow per request.
partner2Code Secondary partner (default "0" = world). See UNComtrade_countryname.json for bloc codes.
customsCode Customs regime ("C00" = total trade, default).
motCode Mode of transport ("0" = all modes).
maxRecords Maximum number of records per call (default 500000).
format_output Output format ("JSON" recommended).
includeDesc If True, include product and country names in the output.


Note: For more detail, please refer to UN Comtrade API

Output and Summary

The script automatically generates clean, ready-to-use .csv files:

Main Output Files
File Description
WorldBank_Data.csv World Bank indicators by country-year.
UNComtrade_AG2_YYYY.csv UN Comtrade data per year.
UNComtrade_AG2_Merged.csv Combined multi-year dataset.
Merged_Trade_WB_CEPII.csv Merged trade, macro, and CEPII data.
Summary_Report.xlsx Excel summary with stats, mappings, coverage, product counts, and trade combinations.

Acknowledgment

This tool integrates publicly available data from:

  • UN Comtrade API and comtradeapicall — official package for automated trade data retrieval
    (PyPIGitHub)

  • World Bank API — provider of World Development Indicators (GDP, CPI, population, trade shares, etc.)
    (Website)

  • CEPII Gravity Dataset — bilateral economic and geographic variables for trade analysis
    (Website)

About the Author

Pattawee Puangchit
Ph.D. Candidate in Agricultural Economics, Purdue University
Graduate Research Assistant at GTAP
Research focus: International trade policy, tariffs and non-tariff measures, quasi-experimental analysis, and CGE modeling
Website | GitHub



Other Articles

Here are some more articles you might like to read next:

  • U.S. Reciprocal Tariffs: A CGE Analysis of Global Economic Impacts
  • U.S. Reciprocal Tariff Measures Monitoring (2025)