Finding the data you need is often only the start of a long process of getting it into the format you require for analysis or presentation. The range of problems is daunting. Some of the more common problems of downloading and formatting data are listed below, with links to suggestions on how they can be solved. Unfortunately, a single download may combine several of these problems. It is not possible to present examples of every problem found in every database, but the examples and solutions examined here demonstrate some of the more complex ones. Users should be able to select the parts of the proposed solutions that apply to the data they have downloaded. Most of the examples are drawn from data downloaded from organizations in the UN family.
The identified problems occur at every stage of the process and include 1) difficulties downloading data; 2) awkwardly designed tables, including ones with more than one data element per column; 3) difficulty linking data from different sources because country or sub-national codes are missing or geographic names are misspelled in the downloaded data; and 4) the formatting needed to prepare spreadsheet tables for uploading into GIS software. In addition to these formatting problems, a few examples of spreadsheet data applications are provided, along with links to online training and other potentially useful information.
Using GIS: OpenStreetMap (OSM) and QGIS are both free, open source Geographic Information Systems (GIS) applications that can be used on at least two levels. Basic mapping from satellite imagery in OpenStreetMap and making maps with QGIS are relatively straightforward, but each application also has extremely powerful features that allow it to be used as a much more sophisticated tool for design and analysis. The Using GIS section presents the basic functions of the software and introduces some of the more powerful aspects of the two applications. Click here for an introduction to open source GIS.
Downloading Data: This section provides information and examples on downloading data from some of the important international data sites. The sites chosen illustrate some of the more common problems involved in downloading data. The main sites reviewed are the World Bank, the FAO, the UN Population Division, the International Aid Transparency Initiative (IATI), OpenStreetMap (OSM)/Geofabrik and the Demographic and Health Surveys (DHS). Click here.
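One download problem that is easy to illustrate is an exported file in which the real column headings are preceded by a few lines of notes. The sketch below is only an illustration, not the method used on this site: the function name `read_table_skipping_preamble`, the layout of the sample file and all of its values are made up for the example. It looks for the line whose first cell matches a heading you expect and reads the table from there.

```python
import csv
import io


def read_table_skipping_preamble(text, header_marker):
    """Return data rows as dicts, starting at the line whose first cell is header_marker.

    Hypothetical helper for illustration only.
    """
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        if row and row[0].strip() == header_marker:
            header = [cell.strip() for cell in row]
            # Keep only rows that contain at least one non-empty cell.
            return [
                dict(zip(header, r))
                for r in rows[i + 1:]
                if any(cell.strip() for cell in r)
            ]
    raise ValueError(f"No header row starting with {header_marker!r} was found")


# Made-up sample: two note lines, blank lines, then the real table.
SAMPLE = '''"Data Source","Example export"

"Last Updated","(date)"

"Country Name","Country Code","2019","2020"
"Kenya","KEN","10","11"
"Ghana","GHA","20","21"
'''

records = read_table_skipping_preamble(SAMPLE, "Country Name")
for record in records:
    print(record["Country Code"], record["2020"])
```

Searching for a known heading, rather than skipping a fixed number of lines, keeps the sketch usable when the number of note lines varies from one download to the next.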
Formatting Tables: This section considers a number of issues, including modifying raw data from the online data sources to make it suitable for use in spreadsheets, databases and other application software. Special attention is given to data problems that require considerable modification before the tables are suitable for preparing reports and linking to other databases. The main issue is downloaded tables in which more than one data element is included in a column. Data downloads from UNDP, FAO, and UNAIDS are used as examples. Click here.
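As a minimal sketch of the "more than one data element per column" problem, suppose a cell holds an estimate followed by a low and high bound in square brackets. That cell layout, the function name `split_estimate` and the sample values are assumptions made for this illustration and are not taken from any particular organization's download. The code splits such a cell into three numeric fields.

```python
import re

# One number, then "[low - high]"; numbers may contain thousands commas.
_NUMBER = r"\d[\d,]*(?:\.\d+)?"
_PATTERN = re.compile(
    rf"^\s*(?P<value>{_NUMBER})\s*\[\s*(?P<low>{_NUMBER})\s*-\s*(?P<high>{_NUMBER})\s*\]\s*$"
)


def split_estimate(cell):
    """Split 'value [low - high]' into three floats; return None fields if no match."""
    match = _PATTERN.match(cell)
    if match is None:
        return {"value": None, "low": None, "high": None}
    return {
        key: float(text.replace(",", ""))
        for key, text in match.groupdict().items()
    }


parsed = split_estimate("1,200 [1,000 - 1,500]")
print(parsed)
missing = split_estimate("n/a")
print(missing)
```

Cells that do not fit the assumed layout come back with empty fields instead of raising an error, so they can be reviewed by hand afterwards.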
Linking Tables: Much analysis requires linking data tables, often tables provided by different sources. Linking national and sub-national data from different organizations is a serious problem because, while the International Organization for Standardization (ISO) provides a recognized set of spellings for geographic names as well as a set of codes, very few of the organizations providing data adhere to the ISO standards for country and sub-national administrative areas. Even within the UN, data from one organization can be difficult to link to data from another. This section suggests solutions and provides look-up tables that facilitate linking for some of the major databases. Click here.
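The look-up table idea can be sketched in a few lines. The example below is an assumption-laden illustration and not the look-up tables provided by this site: the helper names `normalize_name`, `to_iso3` and `link_tables`, the short list of spelling variants and the sample rows are all made up. Each name is reduced to a simplified form, mapped to an ISO 3166-1 three-letter country code, and the two tables are then joined on that code.

```python
import unicodedata


def normalize_name(name):
    """Lower-case, strip accents and punctuation, and collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(cleaned.split())


# Tiny illustrative look-up table: spelling variants -> ISO 3166-1 alpha-3 code.
LOOKUP = {
    normalize_name(variant): code
    for variant, code in {
        "Côte d'Ivoire": "CIV",
        "Ivory Coast": "CIV",
        "United Republic of Tanzania": "TZA",
        "Tanzania": "TZA",
        "Kenya": "KEN",
    }.items()
}


def to_iso3(name):
    """Return the ISO code for a country name, or None if it is not in LOOKUP."""
    return LOOKUP.get(normalize_name(name))


def link_tables(left, right, name_field="country"):
    """Join two lists of dict rows on ISO code; also return unmatched left names."""
    right_by_code = {}
    for row in right:
        code = to_iso3(row[name_field])
        if code is not None:
            right_by_code[code] = row
    linked, unmatched = [], []
    for row in left:
        code = to_iso3(row[name_field])
        if code in right_by_code:
            linked.append({"iso3": code, **right_by_code[code], **row})
        else:
            unmatched.append(row[name_field])
    return linked, unmatched


# Made-up rows from two hypothetical sources that spell the same country differently.
left = [{"country": "COTE D'IVOIRE", "population": 1}, {"country": "Atlantis", "population": 2}]
right = [{"country": "Côte d’Ivoire", "aid": 5}]
linked, unmatched = link_tables(left, right)
print(linked)
print(unmatched)
```

Returning the unmatched names alongside the linked rows makes it easy to see which spellings still need to be added to the look-up table.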
Sample Applications: It is beyond the scope of this site to provide extensive examples of reporting based on downloaded data. However, this section provides a few samples of spreadsheet reporting possibilities. It also links to, or provides, training material on using open source and other software to develop useful graphics (for example, population pyramids). Click here.
Online Courses: This page provides links to online training modules for the software suggested by this site. Getting data and even manipulating it are one thing, but it is often necessary to know the subject matter, or the basic mathematics or statistics involved, to really understand what you are doing. In the past few years many universities have started making coursework available free online. One "school", the Khan Academy, provides free short online courses on a wide range of subjects in 28 languages. For links to these courses, click here.