Data Extraction Not Data Distraction

Every candidate who is hired is discovered in some way: they apply for a job, or they are sourced from a career website like Dice or Monster, a social networking website, a search engine query, and so on. They are then added to a list in a CRM, ATS, or spreadsheet to be “worked” by a recruiter.

There has been a lot of discussion lately around what a talent pipeline is. Glen Cathey called it a virtual bench at Talent42, and Shally Steckerl recently wrote an article on pipeline building. There are many opinions on the topic, but what is certain is that everyone has to find leads, then contact them and convert them to active candidates, either “just in time” or in the future.

The harder the role, or the more educated or experienced the recruiter or sourcer, the greater the likelihood that the lead/candidate information will be generated online using research techniques. Most likely, the data will then be extracted manually to a system or spreadsheet to be worked. I see the virtual hands going up, especially from vendors.

“Todd, there are tools that can do that.” Yes, that is (kind of) true, but here is the rub:

  • They’re new, clunky, and buggy.
  • Everyone is passionate about their solution, but are they passionate about helping you be passionate about their solution?
  • They cost money (a lot) or will very soon.
  • You are not in control of your search and data extraction.

Options for Extraction 

Since joining Madrona, I’ve been on a quest to become a decent coder in at least JavaScript. Leveraging JavaScript, Node.js, some jQuery, and Cheerio (http://nodejs.org/, https://github.com/cheeriojs/cheerio) for data extraction could be a solution, but I would argue that knowledge and time would be stumbling blocks. Across all the classes I have taught, most of the people attending were limited in their sourcing knowledge, let alone at a level of coding proficiency to pull off something like that. Also, extracting data from websites that require a username and password, like Facebook, complicates the extraction process.

So what is the answer? How can I source and extract the data without a major time suck?

My session at SourceCon 2014 in Denver will cover a bit of the Node.js/Cheerio method, but I will primarily focus on leveraging the Kimono Labs product for data extraction and creating your own API, and on addressing issues such as pagination (selecting multiple pages containing data) with a simple and effective point-and-click solution. I will show at SourceCon how I extracted over 2,000 attendees, with their names, titles, companies, and social media links, from one website with the tool. I will also show how to export that data, embed it in a website, or even create your own mobile app.
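For anyone curious what pagination and export look like when done by hand rather than by pointing and clicking, here is a minimal sketch in plain Node.js JavaScript. It is not how Kimono works internally. The listing URL, the `page` query parameter, and the `name`/`title`/`company` fields are all assumptions made up for illustration, and the sample records are invented. The sketch only builds the list of page URLs and turns already-extracted records into CSV; fetching and parsing each page is left out.

```javascript
// Hypothetical listing URL; a real site will have its own paging scheme.
const BASE_URL = 'https://example.com/attendees';

// Build one URL per results page (page numbers start at 1).
function buildPageUrls(baseUrl, pageCount) {
  const urls = [];
  for (let page = 1; page <= pageCount; page++) {
    const url = new URL(baseUrl);
    url.searchParams.set('page', String(page)); // assumed parameter name
    urls.push(url.toString());
  }
  return urls;
}

// Quote a CSV field when it contains a comma, a double quote, or a newline.
function csvEscape(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Turn an array of records into CSV text with a header row.
function toCsv(records, columns) {
  const header = columns.join(',');
  const rows = records.map((record) =>
    columns.map((column) => csvEscape(record[column])).join(',')
  );
  return [header, ...rows].join('\n');
}

// Invented sample records standing in for extracted attendee data.
const sample = [
  { name: 'Jane Doe', title: 'Engineer, Platform', company: 'Acme' },
  { name: 'John Roe', title: 'Recruiter', company: 'Example "Inc"' },
];

console.log(buildPageUrls(BASE_URL, 3));
console.log(toCsv(sample, ['name', 'title', 'company']));
```

The CSV text can be written to a file and opened in a spreadsheet to be worked. Even this small amount of code shows why a point-and-click tool is attractive: every site pages its results differently, so the URL-building step has to be rewritten for each one.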

Last year, at SourceCon in Seattle, I showed how to use Memonic for Firefox to extract Facebook Graph Search results. This time, I have a Firefox and Chrome tool (Nimbus Clipper) that will easily extract the list to be worked and won’t be hindered by the login security. I hope you’ll join me!


Don’t let data extraction be a distraction, own it like a boss. –Todd

image credit: bigstock

Todd Davis

Todd Davis took his first AIRS certification test in 1998 and was the first external Internet Recruiter, as they were called at the time, hired by Microsoft. He has had the opportunity to work with Google, Yahoo!, Starbucks, and others during his career. He has trained sourcers and recruiters at various companies and has recently spoken at the Talent42 conference and at the Sourcing 7 (S7) organization in Seattle. He is currently a Senior Talent Acquisition Professional at Madrona Venture Group.