DataMiner, Excel, and OneTab, Oh My!

There’s been some great conversation lately about how to use coding and automation to make your life easier as you recruit, and I wanted to add my voice. Here’s how you can combine a few free tools to save yourself some time on your sourcing, even with no (or very little) understanding of how to code. You won’t need to install any software packages for this, but you will need access to DataMiner.io, DataMiner’s Recipe Creator, a spreadsheet editor like Google Sheets or Excel, and OneTab. I’m sure someone with a bit more experience might have suggestions for how to do this faster or better, and I look forward to learning more in that conversation!

One of my favorite places to look for technically talented folks is GitHub. I’m sure by now you need no introduction to it. In an earlier article, I talked about how you can use Google X-Ray searches to find qualified candidates; consider this the natural sequel and next level to that piece. One of the things I love about GitHub is that it’s a hub for open-source tools and technologies.

This is going to be targeted to a 101 audience, so forgive the explanation: open-source software is software whose source code is freely available for anyone to use, inspect, and modify. If you’re unfamiliar with it, I encourage you to take a look at NumFOCUS’s Open Source Guide. Open-source software hosted on GitHub often means that anyone can go to the repository and take a look at how something is built. Developers can, and usually do, make a copy or clone of the code and make their own tweaks to it. Those tweaks can either go on to live their own online lives or later be merged back into the “master” branch of the repository, or repo. GitHub has all sorts of version control features built in, but from the recruiting side of things, those are less important to go into. If you’re looking for developers, it’s often worthwhile to see who is coding, commenting, and contributing to these software packages.

However, sometimes you’re looking for people who know how to code with an open-source tool, but maybe don’t tweak the actual source code all that much. That’s where the “issues” tab comes in. Anyone with a GitHub account can submit an issue to a repository, though there are often FAQs and guidelines to read first. As a result, a GitHub repository often becomes a built-in user community. Best of all for a recruiter, these conversations happen on an open website; they aren’t locked behind users-only paywalls.

Here’s a real example: Angular is a web development framework that many companies love. The developers who code and contribute to it are undoubtedly fantastic, but if you’re looking for a front-end developer to code up your business’s website, you might not need to find someone who’s actively contributing to the master repository. There’s also a relatively small pool of contributors: only about 500 developers have ever committed anything to the repository, and just about 20 people have made more than 100 commits. On the other hand, as of this writing, there are over 1800 open issues and over 10000 closed issues. That means that there are several thousand developers who have used Angular and noticed something wrong with it and then brought it to someone’s attention. I think it’s fair to assume that they’re actively using whatever framework, library, or tool you’re looking at and potentially know it very well, even if they aren’t on LinkedIn, haven’t given you a resume, or are otherwise hard to find.

Here’s where it gets tricky. When you look into the issues tab, you can see the author’s username, but you can’t click directly on their profile, and you can’t sort by all authors. If you click someone’s username, you’ll go to a page of all of their issues. For example, if GroverTheCat (not a real user) opened an issue, and you clicked on his name, you’d get to https://github.com/angular/angular/issues/created_by/GroverTheCat. There are a few manual options here – you could search for GroverTheCat in the GitHub built-in search feature, type in github.com/GroverTheCat, or manually edit the URL after you open it, but they’re all very time-consuming.
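If you do have a little coding comfort, the manual URL edit described above can be sketched in a few lines of Python. This is only an illustration, not part of the no-code workflow: the function name is my own invention, and it assumes the URL follows the issues/created_by/username pattern shown in the GroverTheCat example.

```python
from urllib.parse import urlparse


def profile_url_from_issues_url(issues_url: str) -> str:
    """Convert .../issues/created_by/<username> into https://github.com/<username>.

    Hypothetical helper: assumes the URL pattern shown in the article.
    """
    # Split the URL path into its non-empty segments.
    segments = [part for part in urlparse(issues_url).path.split("/") if part]
    # "created_by" must be present and must be followed by a username.
    if "created_by" not in segments[:-1]:
        raise ValueError(f"No 'created_by/<username>' segment in: {issues_url}")
    # The username is the segment right after "created_by".
    username = segments[segments.index("created_by") + 1]
    return f"https://github.com/{username}"


# GroverTheCat is the article's made-up user, not a real account.
print(profile_url_from_issues_url(
    "https://github.com/angular/angular/issues/created_by/GroverTheCat"
))
# prints https://github.com/GroverTheCat
```

That works for one URL at a time, which is exactly why the rest of this walkthrough scrapes the usernames in bulk instead.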

Here’s where you can save some time by automation! I used DataMiner and its Recipe Creator. DataMiner is an extension that allows you to scrape websites by using “recipes” that tell it what to pull from each page, and it comes with a pretty impressive library of public recipes. If you want to customize it, though, I strongly recommend installing its Recipe Creator side-extension. This will let you quickly select exactly what information will be scraped. They also have features to scroll down to the end of a page automatically and to click to the next page automatically, but be aware of the size of your data set and how long the jobs you send might take if you select them.

The Recipe Creator is exceptionally intuitive and will walk you through exactly what you need to do to get the information you need. DataMiner exports to Excel or CSV files, so think about whether you need your information in rows or columns. I wrote a recipe to find all of the usernames on the page and store them in rows. This is a reasonably intuitive recipe to write, and DataMiner’s support page explains it excellently.

After you’ve written your recipe, refresh the page, open DataMiner, and select your recipe. Run a few tests to see how it works, and then you can run the recipe. If you want to learn more about DataMiner, I encourage you to check out their user community and how-to guides.

After you run the recipe, you’ll have a CSV or Excel file with a bunch of usernames, and you might have pulled in some other information. Mine, for instance, had a few single-digit numbers included that needed to be cleaned up. Here’s where you can pop in some quick Excel editing. There’s probably a macro to make this even faster, but this process is still reasonably efficient.

First, using Excel’s built-in features, you can remove duplicates (in my version, this is a feature under the ‘data’ heading in their home ribbon). This way, no matter how many issues a developer has opened, you only look at their profile once. Then, sort your rows and manually remove the noise – in this case, those single digit numbers. You now should have a clean list of just usernames, and you’re in the home stretch.
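For anyone curious what that cleanup looks like outside a spreadsheet, here is a minimal Python sketch of the same two steps: drop duplicates, then drop the numeric noise. The function name and sample values are made up for illustration, and it assumes that any cell containing only digits is noise (a real username made entirely of digits would be thrown away too, so check your list).

```python
def clean_usernames(raw_values):
    """Remove blanks, digit-only noise, and duplicate usernames."""
    seen = set()
    cleaned = []
    for value in raw_values:
        name = value.strip()
        # Assumption: digit-only cells are scraping noise, not usernames.
        if not name or name.isdigit():
            continue
        # GitHub usernames are case-insensitive, so compare in lowercase.
        key = name.lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
    # Sort alphabetically, ignoring case, like a spreadsheet sort would.
    return sorted(cleaned, key=str.lower)


# Hypothetical scraped column: two made-up users plus noise.
scraped = ["GroverTheCat", "3", "groverthecat", " ExampleDev ", "", "7"]
print(clean_usernames(scraped))
```

The first spelling of each username is the one that is kept, so each developer shows up once no matter how many issues they opened.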

Again, there’s probably a macro that can do this next step even faster, but you can turn those usernames into clickable URLs in a few steps. Just insert a column to the left of your usernames and populate it with the start of the GitHub profile URL: “www.github.com/”. The trailing forward slash is essential, and there should be no spaces.

You can combine these columns with a quick concatenate function. This might be different depending on your spreadsheet editor and its version. In Excel, you can use the CONCAT function, but in Google Sheets, it’s CONCATENATE. Don’t be afraid to try both! In the column to the right of your usernames, enter “=CONCAT” or “=CONCATENATE”, select the two cells in that row that you want to combine, and then fill the formula down the column. If you had the right function, you should now have a column full of URLs.
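The same concatenation can be sketched in Python if you would rather script it. This is an illustration under a few assumptions of mine: the function names and column headers are invented, and I have added “https://” in front of the prefix so the results are complete URLs.

```python
import csv
import io

# Assumption: same prefix as the spreadsheet column, with a scheme added.
PROFILE_PREFIX = "https://www.github.com/"


def profile_rows(usernames, prefix=PROFILE_PREFIX):
    """Pair each username with its profile URL (prefix + username)."""
    return [(name, prefix + name) for name in usernames]


def to_csv_text(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["username", "profile_url"])  # hypothetical headers
    writer.writerows(rows)
    return buffer.getvalue()


# GroverTheCat is the article's made-up user, not a real account.
rows = profile_rows(["GroverTheCat"])
print(to_csv_text(rows))
```

Saving that text to a .csv file gives you the same two columns you would have built by hand.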

Here’s where the fun part comes in. Make sure you have OneTab (or a similar tab management tool) installed first! If your computer runs slowly, I recommend closing all of your other open tabs (OneTab will restore them if you use it right) and opening a new window. Open up a Multiple URL Opener site (I used this one). In chunks of about 30 URLs at a time, paste over the URLs from your spreadsheet, then open them all. If your computer doesn’t slow down, you can go through them one by one and close them as you identify folks, or you can use OneTab to collapse all of the profiles into lists quickly. I recommend naming your groups by the repository or language you went through.
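The chunks-of-30 step can also be sketched in Python, in case you want to skip the URL opener site. Treat this as a rough sketch: the function names are mine, the batch size of 30 simply mirrors the number above, and the URLs in the example are placeholders. The open_batch function uses Python’s built-in webbrowser module and really will open a tab per URL, so it is defined here but deliberately not called.

```python
import webbrowser


def chunk_urls(urls, size=30):
    """Split a list of URLs into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[start:start + size] for start in range(0, len(urls), size)]


def open_batch(batch):
    """Open every URL in one batch as a new tab in the default browser."""
    for url in batch:
        webbrowser.open_new_tab(url)


# Placeholder URLs for illustration only, not real profiles.
urls = [f"https://www.github.com/example-user-{n}" for n in range(65)]
batches = chunk_urls(urls)
print([len(batch) for batch in batches])
# prints [30, 30, 5]
```

Calling open_batch(batches[0]) would open the first 30 profiles; collapse them with OneTab before moving on to the next batch.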

Now all that’s left is to go through and cross-reference profiles – happy sourcing!

Sarah Goldberg is a Senior Sourcing and Research Specialist at Objective Paradigm. With three years of experience in sourcing, Sarah focuses on leveraging data and research to provide the best “sourcery” - candidates, processes, and tools - for her clients. Sarah focuses on finding IT talent for FinTech and start up companies in Chicago. In her free time, she is a co-organizer of Chicago Queer Tech Club. She earned a Bachelor of Arts with Honors in Classical Languages and English Literature at the University of Chicago.
