Do you constantly clean and transform data in Excel, repeating the same steps over and over? Don't you wish you could save those operations and reuse them in other Excel files?
Data Janitor helps you automate and save data cleaning recipes right in your browser. Data Janitor has tons of helpers to handle dates, strings and numbers.
In Excel or Google Sheets, copy an entire worksheet (Ctrl+C). In Data Janitor, paste (Ctrl+V) that data on the Data tab. The data is converted to an array of hash objects, each representing a row. If you have toggled on the auto-detect headers option, the keys of each row object are the column header names. Otherwise, the keys are the column indices, starting at 0.
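To make the row shape concrete, here is a minimal sketch of how a pasted grid might map to row objects. The function name toRowObjects and the sample data are hypothetical. This is not Data Janitor's actual parsing code, only an illustration of the two key schemes (header names versus 0-based indices).

```javascript
// Hypothetical illustration of the row shape; not Data Janitor's actual parser.
// `grid` is an array of rows, each an array of cell strings.
function toRowObjects(grid, detectHeaders) {
  if (grid.length === 0) return [];
  // With header detection on, the first row supplies the keys;
  // otherwise the keys are column indices starting at 0.
  const keys = detectHeaders ? grid[0] : grid[0].map((_, i) => i);
  const body = detectHeaders ? grid.slice(1) : grid;
  return body.map((cells) => {
    const row = {};
    keys.forEach((key, i) => {
      row[key] = cells[i];
    });
    return row;
  });
}

// Assumed sample data: one header row and one data row.
const grid = [
  ["name", "email"],
  ["Ada", "ada@example.com"],
];

// One row object keyed by "name" and "email".
console.log(toRowObjects(grid, true));
// Two row objects keyed by 0 and 1; the header row is treated as data.
console.log(toRowObjects(grid, false));
```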
The JavaScript process function maps data from Input to Output.
Once written, you can reuse it on other data sets that need the same cleaning and transformation logic.
The process function is passed two arguments: input and columns.
Column headers (or indices) are passed in the columns array for convenience and lookup.
The process function must return an array of rows where each row is a hash.
The returned rows are displayed in the Output table. You can copy the output or download it as a CSV.
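As a minimal sketch of this contract, the function below takes input and columns and returns a new array of row objects. The column names (name, amount) and the sample rows are assumptions made for illustration. The function is named cleanRows here only so that it also runs under Node, where process is already a global. In Data Janitor the same logic would go in the process function.

```javascript
// Sketch of a cleaning function with the same (input, columns) shape as process.
// The "name" and "amount" columns are assumed for this example.
function cleanRows(input, columns) {
  return input.map((row) => {
    const out = {};
    // Use `columns` to visit every field and trim stray whitespace.
    columns.forEach((col) => {
      const value = row[col];
      out[col] = typeof value === "string" ? value.trim() : value;
    });
    // Convert a currency-like string such as "$1,200.50" to a number.
    out.amount = Number(String(out.amount).replace(/[$,]/g, ""));
    return out;
  });
}

const sampleInput = [
  { name: "  Ada ", amount: "$1,200.50" },
  { name: "Grace", amount: "300" },
];
const result = cleanRows(sampleInput, ["name", "amount"]);
console.log(result);
```

Building a new object per row, rather than editing the input rows in place, keeps the pasted data unchanged, so the same input can be run again after the code is adjusted.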
The Underscore.js, underscore.string and Moment.js libraries are available when you write the process function.
In addition, you can validate an email with _.isEmail(email). It returns true or false.
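One way to use an email check is to flag each row rather than drop it, so invalid addresses can be reviewed in the output. The sketch below is an illustration under assumptions: isEmail is a loose stand-in written so the example runs outside Data Janitor (inside the app you would call _.isEmail instead), and the email and email_valid column names are hypothetical.

```javascript
// Stand-in for _.isEmail so the sketch runs outside Data Janitor.
// The regex is a deliberately loose assumption, not the app's actual check.
function isEmail(value) {
  return typeof value === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Returns copies of the rows with a hypothetical "email_valid" column added.
function flagEmails(input) {
  return input.map((row) =>
    Object.assign({}, row, { email_valid: isEmail(row.email) })
  );
}

const rows = [{ email: "ada@example.com" }, { email: "not-an-email" }];
const flagged = flagEmails(rows);
console.log(flagged.map((r) => r.email_valid)); // [ true, false ]
```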
Check out the Tips section for common cleaning patterns.
Data you paste and code you write in the Sandbox session are kept on your computer, in the browser's local storage. They are not uploaded to the server.
Use the Save link to save your session to the server. You can share the link to that session with co-workers or bookmark it for later.
You can delete a saved session from the server at any time. This will not delete the data from the computers of people you have shared it with. You will need to ask them to delete the saved session as well.
The JavaScript function is run in a sandbox environment using a web worker.
This prevents malicious code from running on your computer.
It also allows you to stop processing in case there is an infinite loop within the process function.
Data Janitor is new and still in BETA. If you find bugs or have suggestions, please open a GitHub issue.
Jan 26 2019
Removed 64k limit on download button.
Dec 31 2018
Added ability to name saved sessions.
Dec 25 2018
Expose sessions in UI.
Dec 19 2018
Added ability to save session and request service.
Nov 11 2018
Initial BETA release.