CompTIA Pentest+ PT0-002 – Section 4: Passive Reconnaissance Part 5

  • January 23, 2023

32. Public Repositories (OBJ 2.1)

As you’ve seen by now, data is everywhere online if you just know where to look. Some other great places to look when you’re doing your reconnaissance are public source code repositories and website archives. Public source code repositories are websites that allow developers to work together in an agile way to create software very quickly. Because of this, they’re very collaboration-focused. They allow people to work together regardless of location and to share data and source code.

Now, these repositories can be set up as either public or private, and a lot of times files that start out as private get misclassified as public, and then they’re available online for anyone to find. So as you’re doing your open-source intelligence and reconnaissance work, be sure to check source code repositories like GitHub, Bitbucket, and SourceForge, because you’re going to be able to find a lot of good data there.

For example, if you go into SourceForge, you’re going to be able to see discussion forums and issue tracking for a particular piece of code or software. GitHub is also extremely popular, because developers need these repositories to have their code staged and ready to be pulled into production, especially if they’re using cloud-based services like AWS, Azure, or Google Cloud, or even something like a Docker image.

Now, the way developers normally use these repositories is that they create a public or private repository on one of these hosting sites to house their code. From there, each developer can check out code, work on it, and then check it back in and have it committed back to the repository once they’re satisfied with their changes.

At that point, the project leader or maintainer can verify the code is good and ready to go to production, and then it can be put into the continuous integration and continuous deployment pipeline so it moves from your development servers to your staging servers, and finally into production.

Now, as a penetration tester, you can often find a lot of really important data inside of these public source code repositories. For example, if somebody has a repository set to public instead of private, you’re going to be able to see the entire source code, and then you can analyze it and find vulnerabilities that you may be able to attack later on during your exploitation phase. In addition to that, some developers make some really common insecure coding mistakes, like putting API keys, usernames, and passwords directly into their code instead of pulling those things from a more secure location. If they do this and you can see the code because it’s been exposed, you’re going to be able to find not just vulnerabilities but also authentication credentials. So again, you can see why this information is really valuable to a penetration tester.

Now, the other thing we want to talk about in this lesson is the fact that there are website archives and website caches out there.
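Before moving on to archives, the hard-coded credential mistake described above can be sketched in a few lines. This is a minimal illustration, not a full secret scanner: the three regular expressions, the find_secrets helper, and the sample source lines are all assumptions made up for this example, and real tools use far larger rule sets.

```python
import re

# Hypothetical, deliberately small rule set for illustration only.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" plus 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A password-like name assigned a quoted literal, e.g. db_password = "..."
    "hardcoded_password": re.compile(
        r"(?:password|passwd|pwd)\s*[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
    # An api_key / api-key / apikey name assigned a quoted literal.
    "hardcoded_api_key": re.compile(
        r"api[_-]?key\s*[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for rule_name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


# Made-up source file contents; the AWS value is a well-known placeholder key.
SAMPLE = "\n".join([
    'db_user = "admin"',
    'db_password = "S3cret!"',
    'API_KEY = "abc123"',
    'aws_key = "AKIAIOSFODNN7EXAMPLE"',
    'token = os.environ["TOKEN"]',  # read from the environment, so not flagged
])

if __name__ == "__main__":
    for line_number, rule_name in find_secrets(SAMPLE):
        print(f"line {line_number}: {rule_name}")
```

Notice that the last sample line, which pulls its value from the environment instead of embedding it, is the "more secure location" pattern and is not flagged.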

Now, just because the website is secure today doesn’t mean it was secure three weeks ago, three months ago, or three years ago. So if you go to a site like the Wayback Machine, you can look at what that website looked like in previous versions. If they had some sort of a public disclosure at some point, you can go back and see that and then grab that detail and that information.

Additionally, if you’re searching with something like Google, you can use the keyword cache, a colon, and the website name to pull up the cached version of that website. Where this is really helpful is if there was a company that had some sort of a disclosure or some kind of a bad event happen. For example, maybe somebody put up a Word document, a PowerPoint, or an Excel file on the website that had confidential information in it.

Well, if that was cached by Google or archived by the Wayback Machine, you can actually go back and find those files and then use them as part of your penetration test. There are lots of different things you can do with this type of data.
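As a rough sketch of how you might look up an older copy of a page, the helpers below only build Wayback Machine URLs and do not make any network requests. The two URL patterns reflect the Internet Archive's public endpoints as I understand them, and the helper names and example.com are placeholders introduced for this example.

```python
from urllib.parse import urlencode

# Assumed public Internet Archive endpoints (not taken from the lesson itself).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SNAPSHOT_PREFIX = "https://web.archive.org/web"


def availability_url(site: str, timestamp: str = "") -> str:
    """Build a query asking for the snapshot closest to a YYYYMMDD timestamp."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def snapshot_url(site: str, timestamp: str) -> str:
    """Build a direct link to the archived copy nearest the given timestamp."""
    return f"{SNAPSHOT_PREFIX}/{timestamp}/{site}"


if __name__ == "__main__":
    # example.com is a placeholder target, not a site from the lesson.
    print(availability_url("example.com", "20200101"))
    print(snapshot_url("example.com", "20200101"))
```

Opening the second URL in a browser should redirect to whichever capture is closest to that date, if one exists.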

But the point here is, just because something was deleted last week doesn’t mean it’s gone forever. That thing is still in some archive somewhere on the internet, especially with Google, which is continually caching websites, and the Wayback Machine.

The last thing I want to discuss in this lesson is the fact that you can search for images not just by keywords in an image search engine, but also by taking a picture and finding out where that picture appears.

This is really helpful if you’re trying to find pictures of people or identify people from their pictures. For example, maybe during your reconnaissance you were able to get a picture of somebody that you found on a website, but you weren’t sure who that employee was. Well, if you upload that picture into an image search engine, it can actually scour the internet and see where that picture originally appeared.

And it might identify that it was on that company’s website or that person’s LinkedIn profile or other places where that picture might show up. For example, the picture that I use on all of my course images is used for a lot of different things. So if you found that picture but you didn’t know who I was and you searched for that picture, you’re going to be able to find out that it’s a picture of Jason Dion. In addition to that, you’ll find my YouTube channel, my Twitter profile, my LinkedIn profile, my Facebook page, and lots of other things online that use that exact same picture. So by using that single picture, you’re now able to find a lot of other places where that person is hanging out on the web, and you may be able to use that as part of your future attacks.

33. Search Engine Analysis (OBJ 2.1)

When I talk about search engine analysis and Google hacking, I’m not talking about hacking Google’s servers. That’s not what we mean here, and this often gets students confused. Google hacking is an open-source intelligence technique that uses Google search operators to locate vulnerable web servers and applications. Basically, we can do advanced searches using Google to find all sorts of information, because Google is one of the best search engines out there.

Now, there are lots of different things that we can do when we’re doing these advanced operations. We can use things like quotes, NOT, AND and OR, scope, and URL modifiers. Let’s talk about each of these really quickly.

First, we have quotes. When you use double quotes, you get to specify the exact phrase and make a search much more precise. For example, if I wanted to search for the name Jason Dion and I just typed Jason space Dion into the Google search, you’re going to end up finding everything that relates to Jason or Dion or both. But if you put quote, Jason Dion, end quote, you’re only going to find results that have Jason Dion together like that. Now, that’s kind of a silly example, but you can see how this could be very useful as you’re trying to narrow down the results for a specific phrase or term that you’re looking for.

The next thing we want to talk about is the NOT operator. The NOT operator uses the minus sign in front of a word or quoted phrase to take that out of your search results. For example, if I wanted to find all of the examples of PenTest+, but I didn’t want anything from Dion Training, I could type in PenTest+, a space, then minus and Dion Training in quotes, and that will give me all the PenTest+ results without any of the Dion Training results.

The next thing we want to talk about is ANDs and ORs. These are logical operators, and they’re used to require both search terms if you use the AND, or either search term if you use the OR. As I said, Jason Dion in quotes gives me Jason and Dion only if they’re right next to each other. But what if I want every result out there that has Jason and Dion in it, but not necessarily together? For instance, maybe I wanted Jason, middle name, Dion or something like that. Well, I could just put Jason, the AND symbol, and then Dion. Or if I wanted to get Jason or Dion, I could use Jason, O-R for OR, and then Dion, or use the pipe character, the vertical bar, which is also the character for OR.

The next advanced search term we want to talk about is defining your scope.
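Before getting into scope, the quote, NOT, AND, and OR behavior covered so far can be modeled with a toy filter over a handful of page titles. To be clear, this is not how Google evaluates a query; the matches and search helpers and the PAGES list are invented here purely to show what each operator is meant to keep or throw away.

```python
import re


def matches(text, *, phrase="", all_of=(), any_of=(), none_of=()):
    """Toy model of the operators: quotes, AND, OR, and NOT (minus)."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9+]+", lowered))
    # Quotes: the exact phrase must appear as-is (simple substring check).
    if phrase and phrase.lower() not in lowered:
        return False
    # AND: every term must be present somewhere in the text.
    if any(term.lower() not in words for term in all_of):
        return False
    # OR: at least one of the terms must be present.
    if any_of and not any(term.lower() in words for term in any_of):
        return False
    # NOT: drop the result if an excluded word or phrase appears.
    if any(term.lower() in lowered for term in none_of):
        return False
    return True


# Made-up page titles standing in for search results.
PAGES = [
    "Jason Dion teaches PenTest+",
    "Jason Michael Dion",
    "Dion Training PenTest+ course",
    "Another PenTest+ study guide",
]


def search(**operators):
    """Return the pages that survive the given operators."""
    return [page for page in PAGES if matches(page, **operators)]


if __name__ == "__main__":
    print(search(phrase="Jason Dion"))
    print(search(all_of=["Jason", "Dion"]))
    print(search(any_of=["Jason", "Dion"]))
    print(search(all_of=["PenTest+"], none_of=["Dion Training"]))
```

The second search is the "Jason, middle name, Dion" case: it keeps "Jason Michael Dion", which the quoted phrase search rejects.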

Now, you can have different keywords that are used to select the scope of the search, things like site, filetype, related, allintitle, allinurl, or allinanchor. For each of these, you’ll put in the keyword, like filetype, a colon, and then the value you want. For example, if I wanted to search for just PDF files, I could type filetype colon PDF, and then whatever my search term was, and I would get only results that are PDFs containing that search term.

Lastly, we can use a URL modifier. These modifiers can be added to the URL of the results page to affect your results. These are things like &pws=0, which means don’t give me personalized results, or &filter=0, which says don’t filter the results, or &tbs=li:1, which says do not autocorrect my search terms, search for it exactly as I typed it. All of these are different ways to modify your URL.
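To make the scope keywords and URL modifiers concrete, here is a small sketch that assembles a search URL from its parts. The build_search_url helper and its parameter names are hypothetical, example.com is a placeholder, and the code only builds a string; it does not send anything to Google.

```python
from urllib.parse import urlencode


def build_search_url(terms, *, site="", filetype="", exclude=(),
                     personalized=False, filtered=False):
    """Assemble a Google search URL from terms, scope keywords, and modifiers."""
    parts = list(terms)
    if filetype:
        parts.append(f"filetype:{filetype}")  # scope: only this file type
    if site:
        parts.append(f"site:{site}")          # scope: only this website
    parts.extend(f"-{word}" for word in exclude)  # NOT operator (minus sign)

    params = {"q": " ".join(parts)}
    if not personalized:
        params["pws"] = "0"     # modifier from the lesson: no personalized results
    if not filtered:
        params["filter"] = "0"  # modifier from the lesson: do not filter results
    # urlencode turns spaces into "+" and colons into "%3A".
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real engagement target.
    print(build_search_url(["password"], filetype="xls", site="example.com"))
```

Notice how the colon after each scope keyword shows up as %3A once the query is placed in a URL, which is worth recognizing when you read a search string rather than type one.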

Now, another thing we need to talk about when we talk about search hacking and Google hacking is the Google Hacking Database, or GHDB. The Google Hacking Database provides search strings that are already optimized for locating vulnerable websites and services. This site is maintained by Offensive Security, and it contains an entire database of pre-built search strings that you can use. We like to call these Google dorks. You can see here on the screen we have a couple of examples, and notice, they’re basically giving you information that you can use. For example, if you have intitle:webview login alcatel lucent, that might be something to identify Alcatel-Lucent switches and routers that have a login page available online. Or maybe we have something like intitle:index of wp-security-audit-log. That may be something that provides us with interesting results if we’re a penetration tester or a bad actor.

So let’s put all this information together with a quick example. Here on the screen, you can see a Google search that I’m running using some of these advanced operators. For example, you see the Google search address and then this big, long list. Let’s break that down, one piece at a time. First, we have filetype. The filetype keyword is one of those advanced operators that says I’m only looking for this type of file. And then you see %3Axls. The %3A is the colon, written in URL-encoded form, and xls says I want an Excel spreadsheet file. So whatever I’m searching for, it had better come back as a spreadsheet. Next, you see I have the word password. That’s what I’m searching for. This is my search term. And then you see I have site. This is the next modifier.

I only want to find things at a particular website. And then, again, I have %3A, which is the colon, followed by the name of that website. So what this is telling me, if I put all this together, is: find me any files of file type XLS, an Excel spreadsheet, that contain the word password and are hosted on that site. Now, if you run this search, you’re going to come back with zero results. And that’s a good thing. It means my staff and I are not hosting any Excel spreadsheets with the word password on our website. But again, this is a silly example to show you how you can break down these different things.

Now, as a penetration tester, do you need to know how to do this? You absolutely need to be able to create these yourself when you’re doing reconnaissance on a target and trying to find out information about them. You also need to understand how to read queries like the one shown here, because you could very well see something like this on the exam.
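Reading a search string like the one described above is mostly a matter of undoing the URL encoding and separating the operators from the plain search terms. The sketch below does that with Python's standard library; the explain_search_url helper is an invented name, and example.com stands in for the site in the lesson's query.

```python
from urllib.parse import parse_qs, urlsplit


def explain_search_url(url: str) -> dict:
    """Decode a search URL and split its query into operators and plain terms."""
    # parse_qs undoes the encoding: "+" becomes a space and "%3A" becomes ":".
    query = parse_qs(urlsplit(url).query)["q"][0]
    operators, terms = {}, []
    for token in query.split():
        name, separator, value = token.partition(":")
        if separator and value:
            operators[name] = value   # e.g. filetype:xls, site:example.com
        else:
            terms.append(token)       # an ordinary search term
    return {"terms": terms, "operators": operators}


if __name__ == "__main__":
    # Placeholder URL in the same shape as the query walked through above.
    url = "https://www.google.com/search?q=filetype%3Axls+password+site%3Aexample.com"
    print(explain_search_url(url))
```

Read back in plain English, the decoded result says: Excel spreadsheets, containing the word password, hosted on the named site.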

Now, let me give you a couple of quick exam tips. When it comes to the exam, what do you need to know from this? Well, on the exam, you do not need to put these queries together yourself, but if I give you a search string, like that Google search I gave you earlier, you should be able to read that and then pick the results you would expect based on that. This is just a basic level of analysis that you should be able to do if you’re doing an assessment.
