A single downloaded table can provide valuable information, but the usefulness of the table is greatly enhanced when it is possible to link that table to other tables from other sources containing related data. An example would be linking the UN under-five mortality table to a WHO table on numbers of physicians and nurses. While it is conceptually possible to construct a model in which the country name itself would act as the field with which to link tables together, the spellings vary so widely (see examples below) that this is not feasible--an abbreviation, a missing comma or accent or even an extra space in a country name means that successful country name linking requires a great deal of manual input. The situation at the sub-national level is even more difficult because the large number of entities and the often difficult and unfamiliar spellings of the names makes linking difficult.

Country Names-- One of the very frustrating problems in dealing with data bases from different sources is linking the data by country name. Each organization has its own spellings and in some cases the spellings differ from table to table within the same organization. This means that users must often create their own tables to match country names from each of the sources they are using.

The table below shows a sample of country name spellings used by the UN, the World Bank (IBRD), the US Census Bureau and the United Nations Development Programme (UNDP). The entries in orange indicate spellings of country names which are not consistent with those used by the UN.

This necessary standards to link for joining national tables and subnational tables. The solution is to establish and use internationally accepted codes for these areas. These codes do exist--the International Standards Organization country codes ISO 3166-1 and country subdivision codes ISO 3166-2  fill this need ( ). The problem is that these codes are proprietary-- the codes are sold by ISO and it is not clear what the legal situation on using the codes is if you are not a subscriber.

In addition to joining tables, users who want to integrate spreadssheet data into GIS applications must also link the data to a field in the GIS appliction's "attribute" table.A third level of linking involves integrating spreadsheet data into GIS shapefile data--in this case it is necessary examine the attribute tables in the main country outline file to see which type of code is used.

The most straightforward method of creating the links described below is to use database software such as Microsoft Access, LibreOffice Base or if you have considerable database experience one of the more powerful database applications (MariaDB, MySQL, etc). If you do not have access to or experience with database software the links can be set up using the VLOOKUP function in Excel or LibreOffice CALC. There are many sites with instructions on how to use the VLOOKUP function in Excel including The instructions for using VLOOKUP in Excel also work in LibreOffice CALC.

NATIONAL LEVEL -- Building lookup tables at the national level is fairly straight forward. The UN has identified the ISO 3166-1 country level codes as being the UN standard. These codes appear to have been accepted in principle by most of the UN agencies as standards as well. The ISO standard is made up of a three character alpha  code, a two character alpha code and a the three digit numerical code (in text format). Different organizations have chosen to use different versions of the codes --The Word Bank uses the three character alpha code, the US Census Bureau uses the two character version and the UN itself uses the three digit code.However, in many cases, down-loaded data tables may include country names with no codes at all. Many organizations publish the codes on their web sites in conjunction with their choice  of spelling for country names--but in almost all cases they are not set up to be downloaded. To deal with these issues this website has created a number of lookup tables linking the various forms of country names used by different organizations to the ISO codes and has created sample look up tables. These are available at mainlu.html.

SUBNATIONAL LEVEL-- No organization has identified the subnational ISO 3166-2 as an official code. These codes consist of a the alpha2 ISO national code, a hyphen, and a two character subnational code. For example the province of Manitoba in Canada would be "CA-MB" . As no organization is officially using the codes, users have to set up their own set of codes when linking tables. When you are dealing with sub-national data from only one country it may be possible to join the area names--this is particularly straightforward if the units have a standard spelling of their name. Another option would be to create a set of codes by sorting the subnational unit names in alphabetical order and then giving them a number code. The number of geographic units is usually relatively small and assigning the codes to the other tables can often be done rather quickly. It was not realistic to try to prepare a list of possible codes for the subnational units in all countries.

GIS TABLES --Loading data in a shapefile format into a GIS application is usually straightforward. The data is already geocoded. The problem arises when data in spreadsheet format is loaded into the GIS application. The spreadsheet can be dragged and dropped into the application, but it is essential that the spreadsheet file has a data column which matches the geographic column in the "attribute table" of the GIS application. If the GIS table is sorted by country the ISO three character code is often used--in which case the lookup tables developed for NATIONAL LEVEL linking should be sufficient. If the file is organized at the sub-national level the problem is more complicated. Incorporating spreadsheets into GIS data is discussed at linkGIS.html.

Copyright 1998/2021 GRI    Updated 8 March 2021