Automating Trade Data Collection: UN Comtrade & World Bank API
Motivation
Trade researchers often spend several days collecting, cleaning, and aligning datasets before running a single regression (or finding that we download the wrong data!). Thanks to advancements in data processing and the availability of official APIs from reliable sources, this tedious task can now be streamlined efficiently.
I’ve developed a Python-based automation script integrating UN Comtrade, World Bank, and CEPII data to automatically download, merge, and harmonize multi-country trade datasets for econometric or gravity analysis.
Key Features
-
Simple, unified setup — built for researchers, not programmers; define settings once to work across all APIs.
-
Smart summary table — auto-generates Excel reports with summaries, stats, missing data, counts, and more.
-
Smart “all” mode — Enables large-scale batch downloads while maintaining correct country–year–pair consistency and avoiding API query limits.
-
Unified ISO definition — Country codes (ISO3) apply consistently to all data sources: UN Comtrade, World Bank, and CEPII.
-
Integrated output — Automatically merges World Bank macro data, UN Comtrade trade data, and CEPII gravity data into
.csv.
Where to Start
Before we begin, this process needs a few setup steps:
- Obtain a UN Comtrade API key (Free Version)
-
Sign up at UN Comtrade Database or UN Comtrade Developer.
-
Go to My Account → My API portal → Profile → Primary key and copy your subscription key.
-
Note: The key approval may take several days; contact UN Comtrade support if needed.
-
- Download script, mapping file, and prepared CEPII from my GitHub repository.
project/ ├── in/ → prepared CEPII dataset ├── map/ → API_Mapping.xlsx ├── src/ → main Python script └── docs/ → ISO lists, code references, etc.
This entire code is based on Python packages — you must install all of them first.
-
Install Python from python.org
-
Install required packages
pip install pandas wbgapi tqdm comtradeapicall openpyxlIf you encounter issues with
comtradeapicall, see the official UN Comtrade API page.
Configuration
This script’s flexibility lies in the CONFIG section, allowing full customization of the workflow without changing any core logic.
| Parameter | Description |
|---|---|
YEARS | Years to download (e.g., range(2010,2021) or [2015]), applied to both APIs. |
REPORTERS / PARTNERS | ISO3 format (e.g., ["THA","USA"]), or use ["all"] for all countries. applied to both APIs. |
MAPPING_PATH | Optional path to the Excel mapping file, required if USE_WB_VARIABLE_NAME = False to map readable variable names. |
MERGE_DATA | (True/False) to automatically merge UN Comtrade, World Bank, and CEPII datasets after download. |
WB_VARIABLES | World Bank indicators to extract; see API_Mapping.xlsx sheet WorldBank_DataList for available data details. |
WB_MAPPING_SHEET | Optional sheet in (API_Mapping.xlsx) for translating codes to names. |
USE_WB_VARIABLE_NAME | If False, replace codes with names; if True, keep original codes. |
UN_PARAMS | Dictionary of UN Comtrade API parameters; see the next section. |
CEPII_PATH | Path to CEPII symmetric dataset (CEPII_Symmetric.csv); skipped if missing. |
CEPII_VARIABLES | CEPII variables to include; see CEPII for available data. |
💡 Tips for Simplicity
- Paths for Output,
API_Mapping.xlsx, andCEPII_Symmetric.csv YEARS-
REPORTERS/PARTNERS
UN Comtrade Parameters
Each parameter inside the UN_PARAMS dictionary defines how the script requests data from the UN Comtrade API.
| Parameter | Description |
|---|---|
typeCode | Data type ("C" = goods, "S" = services). Default is "C". |
freqCode | Frequency ("A" = annual, "M" = monthly). Default is "A". |
clCode | Classification system ("HS", "SITC", "BEC", "EBOPS", "EB10"). |
cmdCode | Product or commodity code (e.g., "TOTAL", "AG2", "AG6"). |
flowCode | Trade flow ("M" = import, "X" = export). Use one flow per request. |
partner2Code | Secondary partner (default "0" = world). See UNComtrade_countryname.json for bloc codes. |
customsCode | Customs regime ("C00" = total trade, default). |
motCode | Mode of transport ("0" = all modes). |
maxRecords | Maximum number of records per call (default 500000). |
format_output | Output format ("JSON" recommended). |
includeDesc | If True, include product and country names in the output. |
Note: For more detail, please refer to UN Comtrade API
Output and Summary
The script automatically generates clean, ready-to-use .csv files:
| File | Description |
|---|---|
WorldBank_Data.csv | World Bank indicators by country-year. |
UNComtrade_AG2_YYYY.csv | UN Comtrade data per year. |
UNComtrade_AG2_Merged.csv | Combined multi-year dataset. |
Merged_Trade_WB_CEPII.csv | Merged trade, macro, and CEPII data. |
Summary_Report.xlsx | Excel summary with stats, mappings, coverage, product counts, and trade combinations. |
Acknowledgment
This tool integrates publicly available data from:
-
UN Comtrade API and
comtradeapicall— official package for automated trade data retrieval
(PyPI • GitHub) -
World Bank API — provider of World Development Indicators (GDP, CPI, population, trade shares, etc.)
(Website) -
CEPII Gravity Dataset — bilateral economic and geographic variables for trade analysis
(Website)
About the Author
Pattawee Puangchit
Ph.D. Candidate in Agricultural Economics, Purdue University
Graduate Research Assistant at GTAP
Research focus: International trade policy, tariffs and non-tariff measures, quasi-experimental analysis, and CGE modeling
Website | GitHub
Other Articles
Here are some more articles you might like to read next: