
How to Learn Programming for Sourcers, Part 1: Ways to Grab All Email Addresses Without Paid Software

Jul 25, 2017
This article is part of a series called Editor's Pick.

This post is for those who weren’t lucky enough to be among the first 50 people who signed up for Steven Jiang’s free programming course for sourcers and recruiters, which began in mid-July. Those 50 are on their way to learning how to code their own solutions to typical recruiting problems.

After Jiang first came up with this idea during a brainstorm we had in preparation for the new Programmers Track at the upcoming SourceCon Austin conference, he agreed that I could share highlights from the course with a wider audience since he necessarily had to cap the enrollment.

So, in this series of how-to posts between now and SourceCon, I will take you through some of the key learnings of the course, plus some other related programming tips that occur to me, and you’ll be in a much better position to absorb what’ll be presented during our new five-session track in Austin.

Before you stop reading with an “I’m too old, I don’t have time, [insert other lame rationalization (ahem, reason) here] to learn something new,” let me tell you a short story: I was working in campus marketing in the early ’90s when we landed our first World Wide Web client, a new portal geared to college students. Even though the public internet was in its infancy (computers connected over slow dial-up modems, and the top providers were AOL and Prodigy), I sensed this was going to be an exciting field and transitioned. I was in my early 30s when I first learned to write raw HTML. Shortly after, I entered the recruiting world. It wasn’t until my early 40s that I first learned how to code a bookmarklet using JavaScript (very useful for sourcers; don’t worry, we’ll cover that in this post), and not until my late 40s that I first learned Excel VBA coding (introducing you to that in a minute, too). I didn’t even take my first formal programming course (in data science, learning Python; yep, you’re getting that here, too!) until nine months ago, in my early 50s!

Today, it’s much easier, with so many free tutorials online and even structured free courses with interactive, hands-on interfaces that present everything in bite-size chunks. One example is Introduction to Python on Codecademy (yes, I’m recommending https://www.codecademy.com/learn/python if you’re not part of Jiang’s class), which you can start, stop, and continue whenever you can slip in 15 minutes or more. Don’t tell me you can’t fit that into your schedule periodically. And you’ll be motivated to do it more often and/or longer as you start to see your competency grow. I’ll never be a top coder, but I’m now good enough to create some useful stuff, and I know where to go for relevant example code whenever I’m blocked. Ready to join me? Here goes…

  1. Standard Python + regular expressions: One of the useful things Jiang showed in his first class was how to grab all email addresses off a web page in Python. First, he had people save the web page to their computer (you can do this with File > Save Page As in your browser, or just select all (Ctrl+a), copy (Ctrl+c), and paste (Ctrl+v) into a blank Word doc). To keep it simple, he just showed how to print the results to the console (display on screen) with a web app that doesn’t require installing anything: your Python code runs in a web-based interface, similar to what’s used in the aforementioned Codecademy course. You can see his well-commented code at https://github.com/SourceConProgrammers/extractemail/blob/master/extractemail.py
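To make the idea concrete, here is a minimal sketch of the regular-expression approach. It is not Jiang’s code (see the link above for that). It assumes Python 3, a page saved as saved_page.html (a hypothetical filename), and a deliberately simple pattern that will miss some unusual addresses.

```python
import re
from pathlib import Path

# A deliberately simple pattern (an assumption for illustration, not Jiang's):
# local part, "@", domain, then a top-level domain of two or more letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email addresses in text, in order of first appearance."""
    seen = set()
    emails = []
    for match in EMAIL_PATTERN.findall(text):
        key = match.lower()  # treat Glenn@X.com and glenn@x.com as the same address
        if key not in seen:
            seen.add(key)
            emails.append(match)
    return emails


if __name__ == "__main__":
    # "saved_page.html" is a hypothetical filename; point it at the page you saved.
    page_path = Path("saved_page.html")
    if page_path.exists():
        page_text = page_path.read_text(encoding="utf-8", errors="ignore")
        for email in extract_emails(page_text):
            print(email)
    else:
        print("Save a web page as saved_page.html next to this script first.")
```

Each address is printed once, even if it appears several times on the page.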

To build on that, here’s how to (A) install Python and (B) a nice code editor, Atom, on your computer, which will let you do a lot more:

A. To get the full version of Python, go to continuum.io/downloads and download Anaconda (full versions for Windows, Mac, and Linux are free). You should do it on your personal computer unless you have admin rights on your work computer. It does take up a lot of disk space so you can install the Miniconda option instead if need be. It offers Python v 2.7 (a/k/a python2.7) or 3.5 (a/k/a python3) as your default, though you can switch between them (2.7 is probably safest).

B. You’ll also want a robust free code editor that works on Windows, Mac, and Linux, such as Atom (atom.io), which recently added full GitHub integration (https://github.atom.io/). Once Atom is installed and launched, click Install Packages on the welcome screen (or press Ctrl+Shift+P), type Script in the search box (“Script – run code in Atom” should be the first result), and click the Install button. (See https://www.quora.com/How-can-I-run-Python-in-Atom if you want to configure the version of Python to run your script with.)

Close and relaunch Atom after installing Script. Now you can create a new file in Atom with the shortcut Ctrl+n, save it with Ctrl+s, open an existing file with Ctrl+o, and run your code to test it with Shift+Ctrl+b at any point. Those commands are for Windows PCs; check the built-in help or www.atom.io/docs for the Mac equivalents. The docs explain a lot more that you can do, and www.youtube.com/watch?v=AtMRdxJTmPE is among the many videos that go further.

2. Now I’ll give you a version of Jiang’s code that saves the output of the Python script to a CSV file instead of printing to screen so that you can open the list of emails immediately in Excel, import into Google Sheets or a CRM, etc. It’s available at https://github.com/gutmach/SourceConExtras/blob/master/emailextract-gg.py which you are welcome to copy. As indicated in the comments, make sure to change the folder paths and filenames listed in lines three, six and seven to where you’ll have them on your computer.
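If you’d like to see the general shape of the CSV step before opening the link, here is a minimal sketch. It is not the script at the URL above. It assumes Python 3 and the same simple email pattern as an illustration, and the function name save_emails_to_csv and any file paths you pass in are hypothetical.

```python
import csv
import re

# Simple illustrative pattern (an assumption, not the one in the linked script).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def save_emails_to_csv(page_text, csv_path):
    """Write each unique email found in page_text to csv_path; return how many."""
    # Lowercase before de-duplicating, then sort so the output order is stable.
    emails = sorted({match.lower() for match in EMAIL_PATTERN.findall(page_text)})
    # newline="" keeps the csv module from writing blank rows on Windows.
    with open(csv_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["email"])  # header row so the column is labeled in Excel
        for email in emails:
            writer.writerow([email])
    return len(emails)
```

You would call it with the text of your saved page and the path where you want the CSV to land, then open that file in Excel or import it into Google Sheets.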

3. JavaScript (bookmarklet): Wouldn’t it be convenient to have this functionality just sitting in your browser, where you could run it in a single click? That’s what bookmarklets are: bookmarks that can sit in your favorites/bookmarks bar (or deeper in favorite/bookmark folders), but they contain JavaScript, so they can do things, not just take you to another website.

So I created a version that works similarly to the Python ones but in JavaScript. The actual code is at https://github.com/gutmach/SourceConExtras/blob/master/find_MailTo_emails.js, and it is explained line-by-line at https://github.com/gutmach/SourceConExtras/blob/master/find_MailTo_emails_explained.js, but the best way to install it in your browser is to:

  • Make a new favorite/bookmark in your favorites/bookmarks bar for any web page.
  • Highlight and copy the code from the first .js page (starting with javascript: )
  • Right click on the new bookmark/favorite and select Edit. Now remove the URL populating that bookmark and paste in the code (i.e., what began HTTP… is now gone and replaced with javascript:…) and then click OK to save/close it.

You can create any bookmarklet that way: just replace the URL of a regular bookmark/favorite with the JavaScript code starting with javascript: and save it. I made it slightly easier in the case of this particular bookmarklet, though: you can drag the FindAllEmailsOnPage hyperlink in the third paragraph of www.recruiting-online.com/bookmarklets.html into your browser’s favorites/bookmarks bar. That page also explains how to pull my bonus sets of free bookmarklets into your browser, which are very useful to sourcers!

The only major differences you may have noticed between the Python and JavaScript versions are that the bookmarklet:

  • Acts on the web page itself (no downloading to a file required), and
  • Returns what’s inside the quotation marks of the link’s href attribute (part of the HTML tag, visible only in the source code). This is advantageous where the page displays the hyperlinked words Email me but the href is mailto:glenn@whatever.com. In that case, this JavaScript grabs the actual email address rather than the words “Email me.” However, it ignores unlinked email addresses, which the Python versions grab just fine.
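To illustrate the second difference, here is a rough Python sketch of what “grab the address from the mailto: link” means. It is not a translation of the bookmarklet. It uses Python 3’s built-in html.parser module, and the names MailtoCollector and find_mailto_emails are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collect addresses from href="mailto:..." attributes on <a> tags."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the "mailto:" prefix and any "?subject=..." suffix.
                address = unquote(value[len("mailto:"):].split("?", 1)[0]).strip()
                if address and address not in self.emails:
                    self.emails.append(address)


def find_mailto_emails(html_text):
    """Return the addresses found in mailto: links, ignoring unlinked text."""
    collector = MailtoCollector()
    collector.feed(html_text)
    return collector.emails
```

Given a page containing an “Email me” link plus a plain-text address, this returns only the linked address, which mirrors the trade-off described in the list above.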

Can all of the advantages be combined in one Python or JavaScript script, so you grab both email types on any given web page without having to download a file first? Of course, but we’ll save that for a future lesson, or maybe one of you can share a solution in the meantime! (Feel free to post it to the SourceConExtras GitHub if so.)
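In the meantime, here is one possible direction in Python, offered as a sketch under assumptions rather than a finished solution. A mailto: address appears in the page’s raw HTML source, so running the regular expression over the source (instead of the visible text) picks up both linked and unlinked addresses, and urllib can fetch the source directly. The function names are hypothetical, the pattern is the same simple one assumed earlier, and pages that build their content with JavaScript, require a login, or obfuscate addresses will not work this way.

```python
import re
from urllib.request import urlopen

# Simple illustrative pattern (an assumption); it will miss unusual addresses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_source(html_source):
    """Return unique addresses in raw HTML, covering mailto: links and plain text."""
    seen = set()
    emails = []
    for match in EMAIL_PATTERN.findall(html_source):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            emails.append(match)
    return emails


def emails_on_page(url, timeout=10):
    """Fetch url and return the addresses found in its HTML source."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html_source = response.read().decode(charset, errors="ignore")
    return emails_in_source(html_source)
```

Calling emails_on_page with the address of a public page returns a list you could print or write to a CSV file.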

  4. Another great way to run something like this email grabber would be in Excel. Did you know Excel has its own built-in programming language? It’s called VBA, and the code is saved in what are called macros. These macros can conveniently be saved into Excel files you can share (filetype .xlsm but otherwise indistinguishable from the default .xlsx), and you can run them with just a couple of mouse clicks. So you could, for example, populate a spreadsheet, cell by cell, with a whole list of URLs of web pages that you want to grab emails from. The Excel VBA macro could then do the same kind of processing as the previous Python and JavaScript solutions, populating an additional column with the emails from each page in the same Excel file, or creating a new file and populating it there!

Since I’m running long, I’ll save that for next time, too, but I hope I’ve gotten you excited about taking the leap into learning some coding for sourcing purposes!

Disclaimer: I’m not a serious programmer, and I’m very busy managing a team of sourcers at a large company, so I don’t have time to provide support to answer install and usage questions. Please ask your questions in the Sourcers Who Code group on Facebook, on Codecademy.com’s Forums, or on StackOverflow.com and hopefully, someone with more time can answer them. Or grab one of the presenters in the Programmers Track at SourceCon Austin!
