Data hygiene is not sexy, but it is necessary whether your data comes from manual entry, scraping, or coded sources. Microsoft Excel's look has not changed in the last decade. I still consider myself a novice because no one person will ever use all of the program's capabilities.
I learned the essential Excel skills early on to help track my calls, emails, and submittals. I wasn't great, but my templates were better than the rest, and I shared them with the whole team once I had made improvements based on their suggestions. They moved from time saver to SOP, complete with standardized file- and folder-naming practices.
I could copy and paste data to make weekly reports (freeing up my boss's schedule so he could work on his golf swing and enjoy those websites the rest of us could not access). The same data we used internally paid off again when we needed to show the work performed. Excel became another always-open program on my computer.
Taking Excel Cleanup to Another Level
Four years ago, I bought Kutools for Excel to help when I was having trouble writing my own macros. I paid out of pocket for lifetime updates to this Excel add-in, and it has paid for itself many times over. Here's an example of how this tool has worked wonders for me:
We were splitting one company in two, and the ATS needed to be shut down; two new instances would take its place. Neither "hydra" ATS could contain data from the past, but no one wanted to lose years of valuable records. We were told we could use the data so long as we did not have to pay for the export. Past attempts had failed because no one researched what "ODBC timeout expired" meant. In layman's terms, the system crashed if a query took more than a minute to compile. The solution was to request smaller amounts of data at a time. I found 8,000 records was a safe batch, which worked out to about two months of data in most business units.
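The batching idea can be sketched in plain Python. This is a minimal, hypothetical sketch, not the actual export process: the date range, window size, and function name are all invented for illustration, with the roughly two-month window standing in for the tested 8,000-record limit.

```python
from datetime import date, timedelta

# Hypothetical sketch: pull records in small date windows so no single
# query exceeds the ODBC timeout. The ~two-month window size stands in
# for the tested limit of about 8,000 records per pull.

def month_windows(start, end, months_per_window=2):
    """Yield (window_start, window_end) pairs covering start..end."""
    current = start
    while current < end:
        # Advance roughly two months (approximated as 30-day months here)
        nxt = min(current + timedelta(days=30 * months_per_window), end)
        yield current, nxt
        current = nxt

# Each window becomes one small export request instead of one giant query.
windows = list(month_windows(date(2015, 1, 1), date(2015, 7, 1)))
```

Each pair in `windows` would then drive one small export request, and the last window is clipped so the pulls cover the full range with no gaps.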
Now the mission was to make the combined data behave exactly as the original had. I had to sew together the smaller reports, making sure the labels and formats were exact matches. The scale of that job gave me a reason to learn what Kutools could do, and I use it to this day because I forced myself not to do things manually.
Top 11 Ways This Tool Saves Time
The name of each function hyperlinks to online documentation if you want more detail.
Anyone sourcing in Europe should stop here and grab their boss's AMEX card. The tool comes with a good number of presets to normalize names quickly, and you can customize those rules to your liking. Accented characters break email patterns, make research more difficult, and create duplicates.
2. Find and Highlight Duplicates
Deleting duplicates always seemed too drastic. Highlighting them instead lets you quickly sort by color and make the manual decisions when you don't have a unique ID.
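The "highlight, don't delete" idea translates to any tool. Here is a minimal Python sketch of the same approach; the rows and field names are invented for the example.

```python
from collections import Counter

# Minimal sketch of "highlight, don't delete": flag every row whose key
# appears more than once, then let a human review. Field names are invented.
rows = [
    {"name": "Ana Silva", "email": "ana@x.com"},
    {"name": "Ana Silva", "email": "ana@x.com"},
    {"name": "Bob Ray",   "email": "bob@y.com"},
]

counts = Counter((r["name"], r["email"]) for r in rows)
for r in rows:
    # Mark the whole duplicate group, not just the second occurrence,
    # so every copy surfaces when you sort by the flag.
    r["is_dup"] = counts[(r["name"], r["email"])] > 1
```

Sorting by `is_dup` then mirrors sorting by highlight color in the spreadsheet: the duplicates bubble to the top, and nothing has been destroyed.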
Easy to use, and works fast.
4. Extract Text Between Two Symbols
Quick web scraping sometimes yields sloppy-looking results that follow a standard pattern. This is a quick and accurate fix.
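The same extraction can be sketched with a regular expression. This is an illustrative sketch only; the delimiters and sample string are assumptions, not a claim about how the add-in works internally.

```python
import re

# Minimal sketch: pull the text between two delimiters, the same job the
# "extract text between symbols" dialog does. The < > delimiters and the
# sample string below are invented for illustration.
def between(text, left, right):
    """Return the first substring found between left and right, or ''."""
    match = re.search(re.escape(left) + r"(.*?)" + re.escape(right), text)
    return match.group(1) if match else ""

scraped = "Jane Doe <jane.doe@example.com> | Recruiter"
email = between(scraped, "<", ">")  # "jane.doe@example.com"
```

The non-greedy `(.*?)` stops at the first closing symbol, which is usually what you want on scraped lines that repeat the same pattern.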
You should learn how to concatenate because it opens infinite possibilities, but this feature makes it super easy for anyone to use. Preview sample text on the right for a fast, error-proof result.
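To show why concatenation opens so many doors, here is a tiny sketch of one common sourcing use: assembling an email-address guess from name parts. The names and domain are made up.

```python
# Minimal sketch of the CONCAT idea: build a value from pieces.
# The name parts and domain below are invented examples.
first, last, domain = "Jane", "Doe", "example.com"
email_guess = f"{first.lower()}.{last.lower()}@{domain}"
```

Swap in any separator or pattern and the same one-liner covers thousands of rows.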
Even the fanciest formula fails with one extra space.
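A two-line sketch shows exactly how one stray space breaks an exact match, and the fix; the sample values are invented.

```python
# Minimal sketch of why one stray space breaks matching, and the fix.
a = "jane.doe@example.com "
b = "jane.doe@example.com"

assert a != b                  # the extra space makes an exact match fail
assert a.strip() == b.strip()  # normalizing whitespace first fixes it

# Applied to a whole column of values before any lookup or join:
emails = ["  ana@x.com", "bob@y.com ", "cara@z.com"]
clean = [e.strip() for e in emails]
```

Doing this normalization before any lookup, join, or duplicate check is what keeps the "fancy formula" alive.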
Create a checklist of the files in a folder, with links directly to each file
Data-cleanup hell starts with problems like this. The function comes with enough options to help you find a workable solution.
A real lifesaver when you are working with data connected to formulas: change the data without breaking everything.
Sew worksheets together with this brilliant solution when you can only get 50 or 100 results at a time.
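The stitching step itself is simple once every chunk shares the same header row. Here is a minimal sketch with two invented CSV chunks standing in for the 50- or 100-row exports; it is an illustration of the idea, not the add-in's method.

```python
import csv
import io

# Minimal sketch of "sewing" many small exports into one table, assuming
# each chunk shares the same header row. The two strings below stand in
# for separate 50- or 100-row export files.
chunk_a = "name,email\nAna,ana@x.com\n"
chunk_b = "name,email\nBob,bob@y.com\n"

combined = []
for chunk in (chunk_a, chunk_b):
    reader = csv.DictReader(io.StringIO(chunk))
    # DictReader keys each row by the header, so matching labels line up
    # even if the raw column order ever differed between exports.
    combined.extend(reader)
```

Keying rows by header name rather than position is the safety net: it is exactly the "labels and formats must be exact matches" rule from the ATS story, enforced by the code.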
It is not perfect, but when you are moving quickly, it works very well.
Here are the other tools I have experience with that are purpose-built for data cleanup. They automate these and many more complex tasks. If you regularly pull large amounts of data in one format and always need to export to a common format, these tools are worth exploring in my opinion.
OpenRefine (formerly known as Google Refine) is free and runs locally on your computer. In my experience, it is very slow and crash-prone.
Trifacta Wrangler calls what they do "data munging." It uses matching algorithms to identify problems like anomalies, duplicates, and format changes. Once you write a good reformatting script, you can rerun it anytime against files or folders. The free version is limited by file size, so you will have to invest if you end up loving it.
I hope this is another example of the value non-sourcing tools have if you want to excel at your craft.