Online tool to clean and transform Excel and Google Sheets data.

1) Copy (Ctrl+c) your data from Excel or Google Sheets. 2) Paste it on this page (Ctrl+v). 3) Adapt the JavaScript process function. 4) Copy/download the result.

How it Works

On the Data tab you paste tab-separated-values (TSB) copied from Excel or Google Sheets. The data gets converted to an array of hash objects, each representing a row. The row objects will have as keys the column header names if you've toggle on the auto-detect headers option. Otherwise the keys will be the column index starting at 0.

The JavaScript process function maps data from Input to Output. Once written, you can reuse it on other data sets that uses the same logic for cleaning and transforming data.

Function process will be passed as arguments input and columns. Column headers (or indices) are passed in array columns for convenience and lookup. The process function must return an array of rows where each row is a hash. It will get displayed in the Output table. You will be able to copy or download a CSV of the output.


Helpers

Libraries Underscore.js, underscore.string and Moment.js are available when you write the process function. Checkout the Tips tab for common cleaning patterns.


Data Confidentiality

Data you paste in Data Clean and code you write is kept on your computer in local storage. It is not uploaded to our servers unless you decide to share with colleagues and use the Save link to save your session in order to share with a colleague or save for later.


Security

The JavaScript function is run in a sandbox environment using a web worker. This prevents malicious code from running on your computer. It also allows you to stop processing in case there is an infinit loop within the process function.


BETA Software

Data Clean is new and still in BETA. If you find bugs or have suggestions, please open a GitHub issue.


Change Log

2018-11-30

Initial BETA release.


Hire Me!

Want help to write the process function to clean and transform your data? Email me and I'll be happy to quote you.

martindrapeau@gmail.com


Let's say we have a table of game matches. This is the input table.

Match Date Result
Twins vs Yankees 17-10-03 4 - 8
Rockies vs Diamondbacks 17-10-04 8 - 11
Red Sox vs Astros 17-10-05 2 - 8

We would like to split up the teams and match results. We also want to format the date differently. To something like this:

Home Away Date Home Score Away Score Winner
Twins Yankees 3 Oct 2017 4 8 Yankees
Rockies Diamondbacks 4 Oct 2017 8 11 Diamondbacks
Red Sox Astros 5 Oct 2017 2 8 Astros

Function call process(input, columns) will be passed:

input

[
	{
		"Match": "Twins vs Yankees",
		"Date": "17-10-03",
		"Result": "4 - 8"
	},
	{
		"Match": "Rockies vs Diamondbacks",
		"Date": "17-10-04",
		"Result": "8 - 11"
	},
	{
		"Match": "Red Sox vs Astros",
		"Date": "17-10-05",
		"Result": "2 - 8"
	}
]

columns

["Match", "Date", "Result"]

We could write the process function as such:

function process(input, columns) {
  var output = [];
  input.forEach(function(inRow, r) {
    var outRow = {};
    
    var teams = inRow.Match.split('vs');
    outRow.Home = teams[0].trim();
    outRow.Away = teams[1].trim();
    
    var date = moment(inRow.Date, 'YY-MM-DD');
    outRow.Date = date.format('MMM Do YYYY');
    
    var scores = inRow.Result.split('-');
    outRow['Home Score'] = parseInt(scores[0].trim(), 10);
    outRow['Away Score'] = parseInt(scores[1].trim(), 10);
    
    outRow.Winner = outRow['Home Score'] > outRow['Away Score'] ? outRow.Home : outRow.Away;
    if (outRow['Home Score'] == outRow['Away Score']) outRow.Winner = 'Tie';

    output.push(outRow);
  });
  return output;
}

The output will be an array of rows; a list of hash objects like this:

[
  {
    "Home": "Twins",
    "Away": "Yankees",
    "Date": "Oct 3rd 2017",
    "Home Score": 4,
    "Away Score": 8,
    "Winner": "Yankees"
  },
  {
    "Home": "Rockies",
    "Away": "Diamondbacks",
    "Date": "Oct 4th 2017",
    "Home Score": 8,
    "Away Score": 11,
    "Winner": "Diamondbacks"
  },
  {
    "Home": "Red Sox",
    "Away": "Astros",
    "Date": "Oct 5th 2017",
    "Home Score": 2,
    "Away Score": 8,
    "Winner": "Astros"
  }
]

The Output table will contain a tabular representation of this data. You can copy/paste it back to Excel or download it as CSV.


Hire Me!

Want help to write the process function to clean and transform your data? Email me and I'll be happy to quote you.

martindrapeau@gmail.com


Cleaning Dates

Moment.js is available in global function moment() to help you parse and format dates.

// Parse a string as date and verify it's valid
moment('2012-01-12').isValid(); // true
moment('lorem ipsum').isValid(); // false

// Parse a known date format and convert to another format
moment('09-12-24', 'YY-MM-DD').format('MMM Do YYYY'); // 'Dec 24th 2009'

By default, the locale loaded is English en.


Handling strings

Lots can be done with vanilla JavaScript

// Split a string based on a separator
'Red Sox vs Astros'.split('vs'); // ['Red Sox', 'Astros']

// Split a string based on a case insensitive separator
'Red Sox VS Astros'.split(new RegExp('\\s+' + 'vs' + '\\s+', 'i')); // ['Red Sox', 'Astros']

// Convert to lower case
'RED SOX vs ASTROS'.toLowerCase(); // 'red sox vs astros'

Underscore.string functions are available in the global namespace s to help you to handle strings.

// Remove diacritics (accents) from a string
s.cleanDiacritics('Élévation'); // Elevation

// Break a string into an array of words
s.words('Hello my friend'); // ['Hello', 'my', 'friend']

// Reformat as an English title
s.titlize('RED SOX vs ASTROS'); // 'Red Sox Vs Astros'

// Clean a string before comparison
s.cleanDiacritics(' Élévation ').toLowerCase().trim() == 'elevation'; // true

Hire Me!

Want help to write the process function to clean and transform your data? Email me and I'll be happy to quote you.

martindrapeau@gmail.com