REFORMATTING TEXT TO NUMBERS

A number of international organizations, including WHO and UNAIDS,  have created spreadsheets for downloading in which numeric data is formatted as text. This means that the data must converted to numbers before the data can be used in calculations. In some cases the conversion requires only a selection of all the problematic data and converting it to a numeric format using the format menu available when you right click on the data--this would apply when there are no text characters in the relevant fields.  When there is a single text character--for example when a number of the fields contain data which resembles "<356" or even "<=356" the characters can be removed using the search and replace function of the spreadsheet then converting the remaining text to a number. These solutions apply to either Excel or LibreOffice Calc. [For instructions on "search and replace" in Excel click here. There are instructions for the "find and replace" function in LibreOffice Calc here.]

Dealing with fields which contain two or more separate data elements combined with text characters creates a more difficult situation which requires the use of spreadsheet string functions. Such a field might look like "20.4 [4.4-42.6]" where the data represents percentages--the first figure 20.4 is the estimated percentage of the population which is obese while 4.4 and 42.6 are the estimated lower and upper bounds. It is not possible to consider all of the potential variations of this problem, but using this data as a sample will introduce the relevant string functions and demonstrate the methodology.

Using Spreadsheet String Functions: The sample solution presented below will work with either Excel or LibreOffice Calc which both use the same string functions.

Converting the contents of a cell which contains data in a format like "20.4 [4.4-42.6]" requires copying each of the data elements "20.4", "4.4" and "42.6" into a separate cell, converting them into a numeric value and because they are percentages formatting them so the they can be used as percentages in calculations.

The basic steps are to:

  1. Find the start and end position of the wanted data
  2. Extract the wanted data
  3. Convert it to a number
  4. Convert it to a percentage
  5. Save it as a number value in a cell

These activities will require the use of a number of spreadsheet functions. We use FIND AND LEN to determine the first and last positions of the data in the string, we use LEFT, RIGHT, and MID to extract the data. We use VALUE to convert the string to a numeric function and we use the arithmetic operators, + (add) - (subtract), * (multiply) and / (divide) for any necessary calculations. More information on the string functions in spreadsheets can be found at here . The explanations apply to both Excel and LibreOffice Calc.

A sample spreadsheet in XLS format with the calculations can be downloaded here. The data is taken from a WHO database on obesity at http://apps.who.int/gho/data/node.main.A896?lang=en. The formulas from the sample can be copied directly into the source spreadsheet to carry out the described calculations.

The first step is to select and extract the left hand value from column I to column J as a number. The formula in column J for row 7 is =VALUE(LEFT(I7,(FIND("[",I7)-1))) to simplify further calculations the remainder of the data in column I is copied to column K with the following expression in cell K7 =MID(I7,(FIND("[",I7)+1),(LEN(I7))-(FIND("[",I7))-1) the data in K7 is then parsed into cells L7 and M7 with the formulas =VALUE(LEFT(K7,(FIND("-",K7)-1))) and =VALUE(RIGHT(K7,(LEN(K7)-FIND("-",K7)))).  The data in J7, L7 and M7 are then copied as values into O7, P7 and Q7 respectively and each divided by 100 to turn them into percentages.

This process will seem complicated to the inexperienced user but by using the definitions of the string functions and breaking the process into steps the process soon becomes more understandable.

Copyright 1998/2015 GRI    Updated 5 May 2015