FactbookXML - All textual information of the CIA World Factbook in one XML file

© 2007-2014 Michael Schierl, <schierlm at users dot sourceforge dot net>

Introduction

Latest version

Build info

Download date: Dec 31, 2014
Most recent file date in archive: Jun 24, 2014
Date of parsing: Dec 31, 2014

Note

As of 2014-09-30, the factbook.zip available for download is identical to the 2014-06-24 version. The only "news" items that were added do not seem to imply any changes to the data, so there are no new factbook downloads for this quarter :)

As of 2014-12-31, neither the downloadable factbook.zip nor the "news" have changed. Therefore, no new factbook downloads for this quarter either :)

Latest news

Aug 15, 2014 What country has the highest life expectancy in the world? Under the References tab go to the Guide to Country Comparisons and click on the People and Society category and find the "Life expectancy at birth" entry. One more click will give you the answer.

Jul 10, 2014 Can you name the largest country in Central America - which also contains the largest freshwater lake in Central America? Check the Regional and World Maps under the References tab to find the answer.

Jun 26, 2014 There are dozens of monarchies scattered about the globe, but only one of them falls in the Pacific region (Oceania). Can you name this archipelagic country? Hint: It lies in western Polynesia and was formerly referred to as the Friendly Islands. Its capital is Nuku'alofa, which translates as "the abode of love."

There are many reasons why you might want to parse data from the CIA World Factbook, which is luckily in the public domain, but unluckily written in crude HTML (with lots of syntax errors) that is hard to parse. For example, you might want to analyze fields that are not available as "Country comparison" pages, such as the "Languages" field (Field ID 2098). In my case, I wanted to build an easily browsable PDA version of the CIA factbook that is both small (less than 10MB) and optimized for the simple HTML syntax supported by MobiPocket or Plucker.

This became more important in 2009, when the CIA redesigned the World Factbook website. The website was even shinier and brighter afterwards, but its size grew enormously and it picked up more and more HTML errors in the process. That made parsing more complicated, both for users who wanted to parse data for statistical analysis and for tools that tried to convert it into a PDA version.

To be fair, there is a text version of the CIA factbook available, but it does not contain all the data (field definitions, news, and appendix information are missing, among other things), and in some cases the information provided in the text version is different (less frequently updated). [Note: this got better after the redesign in 2009.]

But it is still much easier to take the information you need from an XML file. The content is still text (i.e. the language information is not decoded to language codes, but kept as language names), but it covers all the country profiles, country comparisons, appendices, definitions, news, FAQ and region information.

Contents

You can find all the files mentioned here in the Data directory.

FactbookXML consists of an XML schema file that describes the format, a gzipped XML file that contains the actual data, and several files produced from this XML file (by FactbookXML-Converter):

A parser log that lists several inconsistencies in the parsed files and an online mobile version are also available.
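
As a starting point for exploring the gzipped XML file, the following Python sketch streams through such a file and counts how often each element name occurs, without unpacking it to disk first. The function name count_tags and the sample document are made up for illustration. The real element names are defined by the XML schema file and are not assumed here.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(gz_source):
    """Count element names in a gzipped XML file.

    gz_source may be a file path or a binary file object.
    The file is parsed as a stream to keep memory use low.
    """
    counts = Counter()
    with gzip.open(gz_source, "rb") as stream:
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # discard the finished element's text and children
    return counts


if __name__ == "__main__":
    # Tiny made-up document: these tag names are placeholders,
    # NOT the element names used by the real FactbookXML schema.
    sample = b"<root><item>a</item><item>b</item><note/></root>"
    buf = io.BytesIO(gzip.compress(sample))
    for tag, n in count_tags(buf).most_common():
        print(tag, n)
```

To run it against the real data, pass the path of the downloaded gzipped XML file (whatever it is named in the Data directory) instead of the in-memory sample. If the file uses XML namespaces, ElementTree reports tags in the form {namespace}name.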