Sssshhhh, I have a secret, but you can’t tell anyone, deal?
As Thomas Powers said: “The first rule in keeping secrets is nothing on paper: paper can be lost or stolen or simply inherited by the wrong people; if you really want to keep something secret, don’t write it down.”
We can customize that quote a little bit by replacing “paper” with “computer.” Every company is working hard to protect their data, and they are training their people on how to do this adequately. Companies are also using many tools to prevent data leaks that could potentially hurt their business or reveal their plans and strategies to their competitors or investors.
There are many ways for this information to be leaked on the internet. Sometimes confidential information is leaked intentionally, sometimes it is leaked because of human error, and sometimes it is leaked because of a technical error. However, as you know, “Everything posted online is there forever, even after it has been deleted.” So, when you combine the possibility of technical and human error, the result is that confidential information could easily appear on Google.
And we sourcers love secrets!
Confidential Information
As sourcers, we always depend on our carefully selected search keywords, and targeting confidential information is no exception. We need to target the keywords that are most relevant to our search.
There are many keywords that you can target like: confidential; internal use only; not for distribution; not for public distribution; classified; document is private, etc. Don’t forget that these words are only applicable to a search for English documents. If you are living in France, Germany or another country, you should use these words and phrases in the relevant language.
Creating the string when you are searching for confidential data is very easy; just generate the best list of the keywords for that search.
Example: (“confidential” OR “internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”)
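If you keep your keywords in a list, you can generate that string instead of typing it by hand. The snippet below is just a minimal sketch: the function name or_group and the KEYWORDS list are my own illustrative choices, and I use straight quotes because they are the safest thing to paste into a search box.

```python
def or_group(phrases):
    """Quote each phrase and join the phrases with OR inside parentheses."""
    quoted = [f'"{p.strip()}"' for p in phrases if p.strip()]
    return "(" + " OR ".join(quoted) + ")"


# Illustrative keyword list, taken from the phrases mentioned above.
KEYWORDS = [
    "confidential",
    "internal use only",
    "not for distribution",
    "not for public distribution",
    "classified",
]

query = or_group(KEYWORDS)
print(query)
```

Adding or removing a phrase is then a one-line change in the list, and the brackets and quotes always stay balanced.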
You can add more operators, such as intitle: with the current year. The string below should find pages that have 2018 in the page title.
Example: (“confidential” OR “internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”) intitle:2018
If you would like to target more years, you can just add more intitle: operators.
Example: (“confidential” OR “internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”) (intitle:2017 OR intitle:2018)
You can also combine the inurl: operator with the intitle: operator.
Example: (“confidential” OR “internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”) (intitle:2018 OR inurl:2018)
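The year variations above follow one pattern, so they are easy to generate. Here is a rough sketch, assuming a helper I am calling year_clause (a made-up name, not a feature of any search tool) that pairs each year with each operator you pass in.

```python
def year_clause(years, operators=("intitle",)):
    """Build an OR group such as (intitle:2017 OR intitle:2018)."""
    terms = [f"{op}:{year}" for year in years for op in operators]
    if len(terms) == 1:
        # A single term needs no parentheses.
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


# Shortened keyword group, used here only to keep the output readable.
base = '("internal use only" OR "not for distribution")'

print(base, year_clause([2018]))
print(base, year_clause([2017, 2018]))
print(base, year_clause([2018], operators=("intitle", "inurl")))
```

When the new year comes, you only change the list of years instead of editing every saved string.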
You can also try to target file types. That’s where the filetype: operator comes in handy.
Example: filetype:pdf (“confidential” OR “internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”)
When you use a more general keyword like “confidential,” you are going to get lots of results that contain it, and many of them will not be relevant. That’s why I choose not to use it in my strings.
Example: filetype:pdf (“internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”)
If you would like to use some advanced strings, just add more operators or keywords.
Example: filetype:pdf (“internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”) (sourcing AND recruitment)
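If you want to run the same search across several file types, one string per type is usually easier to review than one giant string. The sketch below assumes a hypothetical helper named filetype_queries. The extensions other than pdf are my own additions for illustration, so swap in whatever formats matter to you.

```python
def filetype_queries(keyword_group, filetypes, extra_terms=()):
    """Return one search string per file type, with optional extra keywords."""
    extras = " ".join(extra_terms)
    queries = []
    for ext in filetypes:
        parts = [f"filetype:{ext}", keyword_group]
        if extras:
            parts.append(extras)
        queries.append(" ".join(parts))
    return queries


group = ('("internal use only" OR "not for distribution" OR '
         '"not for public distribution" OR "classified")')

# pdf comes from the examples above; docx and pptx are assumed extras.
for q in filetype_queries(group, ["pdf", "docx", "pptx"],
                          extra_terms=["(sourcing AND recruitment)"]):
    print(q)
```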
You can also use the site: operator to target specific sites.
Example: site:com (“internal use only” OR “not for distribution” OR “not for public distribution” OR “classified”)
This string is targeting .com domains. If you would like to target, for example, domains only from Germany, you will just replace site:com with site:de.
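Swapping the domain by hand gets tedious once you cover more than a couple of countries. Here is a small sketch that builds one string per domain and turns each into a clickable Google link. The helper names site_query and google_url are my own, and the link format simply passes the string in the q parameter of the regular Google search page.

```python
from urllib.parse import quote_plus


def site_query(domain, keyword_group):
    """Prefix the keyword group with a site: restriction."""
    return f"site:{domain} {keyword_group}"


def google_url(query):
    """URL-encode the search string into a Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(query)


group = '("internal use only" OR "not for distribution")'

# com and de come from the text above; add other domains as needed.
for domain in ["com", "de"]:
    q = site_query(domain, group)
    print(q)
    print(google_url(q))
```

The encoding step matters because quotes, colons, and brackets are not safe to paste into a URL as they are.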
Conclusion
The best way to keep a secret is to pretend there isn’t one. Don’t add keywords like confidential, internal use only, not for distribution, or not for public distribution to the documents you want to keep private. If you need to use these words or phrases in your own documents or presentations, there is a simple trick that you can use to protect these documents. Replace the text “internal use only” and similar warnings with an image of that text.
Search engine robots that index your domain are much less likely to read text inside an image than regular text in the file. Some search engines can run OCR (optical character recognition) in certain cases, for example on scanned PDFs, so this is not a guarantee. But when the robot can’t read the text in the image, people searching for accidentally leaked confidential information are not going to find your file through that phrase.
Your material will still have the “Internal use only” warning on it, but you will make it a little bit harder for others to find those files through search engines if the search engines are going to index them.