Most of the data downloads available follow a rule of including only one element of data per field. However, some of the most important data bases are badly flawed-- loading a single field with a mixture of more than one data elements, formatting numeric data as text, and including extraneous characters. These non-standard pages have to be modified before they can be linked to other pages for analysis or even used as standalone tables for generating charts and graphs. Sometimes the problem seems to be an oversight, but most of the non-standard data formats seem to have been carefully planned.The examples have been selected from downloads from three organizations, the United Nations Development Programme (UNDP), the Food and Agriculture Organization (FAO) and UNAIDS.
Reformatting the data will require use of a number of the functions available in both Excel and LibreOffice Calc as well as Find/Replace and Copy/Paste. The functions are classified mainly as text functions and include: TRIM, FIND, LEFT, RIGHT, MID LEN, VALUE. These functions when selected provide tips on the data required.The three examples will include 1) the inclusion of an extraneous character, 2) a simple example including a text character and a numeric character in the same field and 3) a more complex example involving three variables and a number of extraneous text characters in the same field. These are only samples, but cover most of the variations that can be expected.
Extraneous Character-- This example is taken from the UNDP. The UNDP online data tables include an extraneous character (a miniature flag) in the country field. UNDP provides data in at least two different formats--this data is available at http://hdr.undp.org/en/data.
. In the download the flags have been removed.
While it is not visible in the graphic, when the flag was removed, it left a hard space in front of the country name which the spreadsheet and database software interpret as a character-- which prevents automatic linking to the country field of non-UNDP data or even UNDP data in other formats. To see steps necessary to correct the problem click here
Simple Example-- This example is taken from FAO's FAOSTAT with data generated as a "pivot table". Note that for the country column the country name and county code are shown together. The same applies to the element and crop fields and the year column headers as well.
To see the steps necessary to correct the problem in each of the three fields click here
Complex Example--The complex example is drawn from UNAIDS.In this case a single field contains three different data fields plus a variety of other characters. This particular excerpt shows data for new HIV Infections.
To see the steps necessary to correct the problem click here