Back in the good old days of internet talent sourcing, you could find all kinds of amazing attendee lists and org chart data, armed only with Boolean strings and a search engine. I hear people ask why this can’t be done any longer, but the answer should be obvious. The system admins who did not follow proper security protocols or keep up with security leaks are now working in a different industry, maybe at a fast food joint where their incompetence will only cost the company a free meal.
Technology cuts both ways: what they use to conceal, we can uncover with new-school methods and determination.
Docker and the concept of “containers” are a fairly new, open-source way to package and run applications in isolated environments. If you are recruiting in this space, you know this is an emerging skill, and many smart people are learning the technology outside their workplace. Since 2014, Docker has held an annual Dockercon event in the United States, with subsequent events in Europe as well. The latest event, held in Seattle last June, had more than 5,000 attendees.
The attendees of Dockercon are serious technology people working for global enterprise companies, and the organizers are not going to accidentally leave this data on the website. Instead, the event uses a conference app built by a third-party company that lets attendees pre-register and communicate before, during, and after the event. Now that I have identified the app, I need to somehow get in. If you imagined some super-secret trick out of Mr. Robot, you would be wrong. I installed the app.
Once the app was installed, I was given the option to connect my LinkedIn and Twitter profiles to help pre-populate my profile. After creating my account, I found the web version of the app. The third party renders the content with HTML5, which means you can’t right-click on anything, and pages built this way are typically harder to scrape than those built with other technologies. I could just use the search function and look up attendees one by one, but that is not why you are still reading. You want all the data, you greedy people, don’t you? Me too.
I created these visualizations from the data I extracted, using Google’s new Data Studio tool. For free, you can build a handful of reports and share them the same way you share Google Sheets and Docs. The tool was built primarily for analytics data, but it also works well as a live dashboard for Google Sheets data.
The first page of the dashboard includes some analysis I pulled together from the complete attendee scrape. I had to remove 400+ duplicate and partial profiles to improve data quality. Here are the CliffsNotes, but you really want to check out page one of my Dockercon dashboard.
67% provided phone numbers
1% connected the app to their LinkedIn or Twitter account
Engineer appears in four of the top five job titles
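The cleanup and the headline numbers above come down to a few lines of code. Here is a minimal Python sketch, assuming the export has already been turned into a list of dictionaries. The field names (`id`, `name`, `title`, `phone`, `social_connected`), the sample records, and the rule for what counts as a “partial” profile are all hypothetical; the app’s real fields will differ.

```python
from collections import Counter

# Hypothetical sample records; a real export will use different field names.
attendees = [
    {"id": 1, "name": "Ana Diaz", "title": "Software Engineer", "phone": "555-0100", "social_connected": False},
    {"id": 2, "name": "Ben Ode", "title": "DevOps Engineer", "phone": "", "social_connected": True},
    {"id": 1, "name": "Ana Diaz", "title": "Software Engineer", "phone": "555-0100", "social_connected": False},
    {"id": 3, "name": "", "title": "", "phone": "", "social_connected": False},
    {"id": 4, "name": "Cy Park", "title": "Product Manager", "phone": "555-0199", "social_connected": False},
]

# Assumption: a profile missing any of these fields counts as "partial".
REQUIRED_FIELDS = ("name", "title")


def clean_profiles(records):
    """Drop repeated IDs (keeping the first complete copy) and partial profiles."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("id") in seen:
            continue
        if not all(rec.get(field) for field in REQUIRED_FIELDS):
            continue
        seen.add(rec["id"])
        cleaned.append(rec)
    return cleaned


def percent(part, whole):
    """Whole-number percentage, guarding against an empty list."""
    return round(100 * part / whole) if whole else 0


def summarize(records):
    """Compute dashboard-style headline numbers for the cleaned records."""
    top_titles = Counter(r["title"] for r in records).most_common(5)
    return {
        "total": len(records),
        "pct_with_phone": percent(sum(1 for r in records if r.get("phone")), len(records)),
        "pct_social_connected": percent(sum(1 for r in records if r.get("social_connected")), len(records)),
        "top_titles": top_titles,
        "engineer_titles_in_top5": sum(1 for title, _ in top_titles if "engineer" in title.lower()),
    }


cleaned = clean_profiles(attendees)
summary = summarize(cleaned)

if __name__ == "__main__":
    print(summary)
```

Checking for duplicates before checking completeness means a partial copy of a profile never blocks a later complete copy with the same ID.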
Page two of my dashboard demonstrates some of the filtering options available on this platform. You can click a column header to change the sort order and page through additional results with the pagination arrows in the bottom right.
Back to the scraping autopsy. I used two Chrome tools to grab the data, both made by Postman. The first is a Chrome app simply called Postman. Chrome apps, unlike extensions, are usually just links to websites, but this one offers functionality not available on the website, and it is required. The second, Postman Interceptor, is a Chrome extension that captures your browser traffic and lets you work with it in the Postman app. This is especially important here because it passes along your browser cookies and header data as you interact with websites.
Set the extension to “Request Capture,” then open the Postman app to see the skeleton of the websites you visit. The list of GET and POST API requests on the left keeps growing as you browse. If you have other tabs open, you will notice that sites like Gmail, Twitter, and Facebook are constantly receiving updates in the background.
To sort through the noise, use the search box in the upper right to look for the target domain. Focus on the GET requests and ignore anything that mentions “metrics” or “admin.” Test a request by clicking “Send,” then scroll down to view the body, cookies, and headers it returned. You are looking at the real Henry Gray’s Anatomy version of the web.
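The same filtering rules can be expressed in a few lines of Python if you have a batch of captured requests to sift through. This is a minimal sketch: the list of `(method, url)` pairs is a hypothetical stand-in for Postman’s captured history, and `events.example.com` is a placeholder for the conference app’s domain.

```python
from urllib.parse import urlparse

# Placeholder for the conference app's API host (hypothetical).
TARGET_DOMAIN = "events.example.com"
NOISE_WORDS = ("metrics", "admin")


def candidate_requests(captured, domain=TARGET_DOMAIN):
    """Keep GET requests to the target domain (or its subdomains), skipping metrics/admin URLs."""
    keep = []
    for method, url in captured:
        host = urlparse(url).hostname or ""
        if method.upper() != "GET":
            continue
        if host != domain and not host.endswith("." + domain):
            continue
        if any(word in url.lower() for word in NOISE_WORDS):
            continue
        keep.append(url)
    return keep


# Hypothetical captured traffic, mixing the target app with background noise.
captured = [
    ("GET", "https://events.example.com/api/attendees?page=1"),
    ("POST", "https://events.example.com/api/messages"),
    ("GET", "https://events.example.com/api/metrics/ping"),
    ("GET", "https://mail.google.com/sync"),
    ("GET", "https://api.events.example.com/admin/settings"),
]

if __name__ == "__main__":
    for url in candidate_requests(captured):
        print(url)
```

Matching on the parsed hostname rather than a raw substring keeps a URL that merely mentions the domain in a query string from slipping through.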
Look at the “pretty” version of the JSON results and you will see how the data is structured. The easiest method I’ve found to get this data is to copy the resulting JSON by clicking the double-square icon on the right side. I paste that into a plain text file (copying and pasting can be slow with large amounts of data, so don’t assume your computer is frozen). Next, you need to convert this JSON data into something useful for humans. I have had luck with this free web tool, but many others exist. Choose the XLS export option and open the file in Excel.
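If you would rather not paste data into a third-party converter, Python’s standard library can do a similar job. This sketch assumes the copied JSON is an array of attendee objects, possibly with nested objects, and flattens nested keys into dotted column names such as `company.name`. The sample structure is hypothetical, and the output is CSV rather than XLS, which Excel opens directly.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; lists become one cell."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # Simple values are joined; nested objects inside lists are kept as JSON text.
            flat[name] = "; ".join(
                json.dumps(v) if isinstance(v, (dict, list)) else str(v) for v in value
            )
        else:
            flat[name] = value
    return flat


def json_to_csv_text(json_text):
    """Convert a JSON array of objects into CSV text with one column per flattened key."""
    rows = [flatten(record) for record in json.loads(json_text)]
    # Union of keys in first-seen order, so records with missing fields still line up.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample of copied JSON.
sample = """[
  {"id": 101, "name": "Ana Diaz", "company": {"name": "Acme"}, "tags": ["docker", "k8s"]},
  {"id": 102, "name": "Ben Ode", "title": "DevOps Engineer"}
]"""

if __name__ == "__main__":
    print(json_to_csv_text(sample))
```

To open the result in Excel, write the returned text to a `.csv` file opened with `newline=""` so the csv module’s line endings are not doubled on Windows. Missing fields become empty cells, which is what lets sparse profiles share one header row.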
Take note of URL patterns as you click through the site. Spotting the pattern let me convert each attendee’s ID number into a direct link to their profile, something I could not find any other way. I was happy to see the 513 people who worked for exhibitors at the event and the 3,387 phone numbers listed.
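Once you have spotted the pattern, the ID-to-link step is plain string templating. In this sketch the template `https://events.example.com/attendees/{id}` and the sample records are hypothetical; substitute whatever pattern you observe in the address bar when you open a profile in the web app.

```python
# Hypothetical URL pattern; replace it with the one you observe in the web app.
PROFILE_URL_TEMPLATE = "https://events.example.com/attendees/{id}"


def add_profile_links(records, template=PROFILE_URL_TEMPLATE):
    """Return copies of the records, each with a profile_url built from its ID."""
    return [{**rec, "profile_url": template.format(id=rec["id"])} for rec in records]


# Hypothetical records carrying only an ID and a name.
records = [{"id": 101, "name": "Ana Diaz"}, {"id": 102, "name": "Ben Ode"}]
linked = add_profile_links(records)

if __name__ == "__main__":
    for rec in linked:
        print(rec["name"], rec["profile_url"])
```

Building new dictionaries instead of modifying the originals leaves the raw export untouched, so you can rerun the step if the pattern turns out to be different.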
Data munging or wrangling is not for everyone in sourcing, but when I send a hot list like this to my team they know they are working from data no one else has access to.