Data Janitor Tips

Libraries Underscore.js, underscore.string and Moment.js are loaded and available along with vanilla JavaScript to help you transform and clean data.

In addition, you can validate an email with _.isEmail(email). It will return true or false.


Cleaning dates

Moment.js is available in global function moment() to help you parse and format dates.

// Parse a string as date and verify it's valid
moment('2012-01-12').isValid(); // true
moment('lorem ipsum').isValid(); // false

// Parse a known date format and convert to another format
moment('09-12-24', 'YY-MM-DD').format('MMM Do YYYY'); // 'Dec 24th 2009'

By default, the locale loaded is English en. You can use the locale method to switch to another language:

// Change to French and parse a date
moment.locale('fr'); // Returns 'fr'
var d = moment('25 février 2018', 'DD MMM YYYY');
d.isValid(); // true

// Format in English
d.locale('en');
d.format('LL'); // February 25, 2018

Validating emails

Use _.isEmail(email) to check if an email is valid.

_.isEmail('jean françois@example.com') // false
_.isEmail('martindrapeau@gmail.com') // true

Handling strings

Lots can be done with vanilla JavaScript

// Split a string based on a separator
'Red Sox vs Astros'.split('vs'); // ['Red Sox', 'Astros']

// Split a string based on a case insensitive separator
'Red Sox VS Astros'.split(new RegExp('\\s+' + 'vs' + '\\s+', 'i')); // ['Red Sox', 'Astros']

// Convert to lower case
'RED SOX vs ASTROS'.toLowerCase(); // 'red sox vs astros'

Underscore.string functions are available in the global namespace s to help you to handle strings.

// Remove diacritics (accents) from a string
s.cleanDiacritics('Élévation'); // Elevation

// Break a string into an array of words
s.words('Hello my friend'); // ['Hello', 'my', 'friend']

// Reformat as an English title
s.titlize('RED SOX vs ASTROS'); // 'Red Sox Vs Astros'

// Clean a string before comparison
s.cleanDiacritics(' Élévation ').toLowerCase().trim() == 'elevation'; // true

Reordering columns

Columns are ordered as you create them. For instance, consider this:

// Input
[{
  "Last Name": "Doe",
  "Dob": "25 March 2001",
  "First Name": "John"
}]

// Process function
function process(input, columns) {
  var output = [];
  input.forEach(function(inRow, r) {
    var outRow = {};
    outRow['First Name'] = inRow['First Name'];
    outRow['Last Name'] = inRow['Last Name'];
    outRow['Dob'] = inRow['Dob'];
      
    output.push(outRow);
  });
  return output;
}

// Output
[{
  "First Name": "John",
  "Last Name": "Doe",
  "Dob": "25 March 2001"
}]

Sorting rows

Assuming we have this input table:

Id Holiday Day of week Date
1 New Year's Day Tuesday January 1, 2019
2 Good Friday Friday April 19, 2019
3 Easter Monday Monday April 22, 2019
4 Victoria Day/National Patriots Day Monday May 20, 2019
5 Quebec national Holiday Monday June 24, 2019

Using Underscore.js's sortBy() function to sort on a column.
Note: This will return a copy of the input array. Hence we need to reassign it.

// Sort by Id in ascending order
input = _.sortBy(input, 'Id');

// Sort by Id in desceding order
input = _.sortBy(input, function(o) {
  return -o.Id;
});

// Sort by date
input = _.sortBy(input, function(o) {
  return moment(o.Date).unix();
});

To sort by name, alphanumerically, we may combine JavaScript's Array.prototype.sort() method with underscore.string's naturalCmp(string1, string2) function.
Note: In this case the array is sorted in place - no need to reassign it.

input.sort(function(o1, o2) {
  return s.naturalCmp(o1.Holiday, o2.Holiday);
});

Throwing errors

If you encounter unfixable data, you can stop processing by throwing an error. For example, if an email is bad:

if (!_.isEmail(inRow.Email)) throw 'Not an email ' + inRow.Email + ' on row ' + r;

Data Janitor trow an error