Get Search Engine Results with Python


For the first step of our search engine scraping project, we’re going to do the obvious thing: enter a search and return the search engine results.

We’ll start by importing the Selenium web driver, which will allow us to open and manipulate a web browser. We’ll create a function with only one argument, the search term that we’d like to query. Let’s create a variable and store our search engine website as a string. Next we’ll open up the Firefox driver and navigate to the URL we assigned.

Let’s do this manually for a second. Here’s the search box that we’ll enter our query into. Now we just need to uniquely identify the element. Since the search box has its own id, it’s easy for us to find that single element. We can pass our argument, search_term, into the search box input. And since we don’t have any other form fields, we can submit the form without any other interaction.

Let’s check to see what we need to be looking for in the search engine results. We can see that there are ads at the top of the page that look similar to the results themselves. We’ll need to keep that in mind to avoid returning ads as search results. The link itself is nested inside an h3 tag. So if it weren’t for the ads, we could just search for all h3 tags that have a link. But to be safe, we’ll account for both situations, when ads are and are not displayed, by first searching for a special ordered list class that only displays when ads are present. If that doesn’t yield results, then we’ll just try finding all the h3 links.

We’ll create an empty list to store our potential search results. Let’s set up a for loop. For each individual link in the set of links that was returned to us, we want to find the href attribute. I’m going to include a print statement, so as our project grows in complexity, we can see the printout of links being added. The important part is adding the links to our empty results list. We’re done with the browser, so we can go ahead and close it. And now we can return the list of our links.

Let’s test it out! Call the function get_results with a search term of your choosing. It happens quickly, but the browser opening, navigating, and closing again is all controlled by your code! And we can check back on the terminal, and we can see, yep, the top ten results have printed to the console. We didn’t utilize the return value at this point, but we will as we add to the project.

Now that we’ve confirmed the code works properly, I want to change the Firefox browser to PhantomJS. PhantomJS lets us execute an actual browser but without the graphical interface, since we probably don’t care to have a new Firefox window pop up every time we run our code. And, it works!

If you like what you saw and want to learn more about Python and automation, subscribe today!
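The steps described in the video can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code shown on screen: the transcript never names the search engine, the search box id, or the ordered list class, so SEARCH_URL, SEARCH_BOX_ID and AD_LIST_CLASS are hypothetical placeholders to be replaced with values found by inspecting the real page. The collect_hrefs helper, the explicit wait, and the try/finally are also additions that are not mentioned in the video, and the locator calls assume Selenium 4.

```python
# Hypothetical placeholders: the transcript does not name the site or its markup.
SEARCH_URL = "https://search.example.com/"
SEARCH_BOX_ID = "search-box"
AD_LIST_CLASS = "results-with-ads"


def collect_hrefs(links):
    """Print and return the href attribute of each link element."""
    results = []
    for link in links:
        href = link.get_attribute("href")
        print(href)  # progress output, as described in the video
        results.append(href)
    return results


def get_results(search_term):
    """Run one search in Firefox and return the result URLs as a list."""
    # Imported inside the function so collect_hrefs stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(SEARCH_URL)

        # The search box has its own id, so a single lookup is enough.
        search_box = browser.find_element(By.ID, SEARCH_BOX_ID)
        search_box.send_keys(search_term)
        search_box.submit()

        # Assumption (not in the video): wait up to 10 seconds for an h3 link.
        WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "h3 a"))
        )

        # First try the ordered list that only exists when ads are shown...
        links = browser.find_elements(By.CSS_SELECTOR, f"ol.{AD_LIST_CLASS} h3 a")
        if not links:
            # ...otherwise fall back to every h3 link on the page.
            links = browser.find_elements(By.CSS_SELECTOR, "h3 a")
        return collect_hrefs(links)
    finally:
        # Close the browser whether or not the search succeeded.
        browser.quit()
```

With real values filled in, calling get_results("web scraping") should open Firefox, run the search, print each link, and return the links as a list. Because find_elements returns an empty list rather than raising when nothing matches, a plain if is enough for the fallback in this sketch.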

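A note on the PhantomJS step: PhantomJS is no longer actively maintained and, as far as I know, Selenium 4 no longer ships a PhantomJS driver, so webdriver.PhantomJS() is unavailable there. One way to get the same no-window behaviour is Firefox's own headless mode. The helper names below are hypothetical, and the sketch assumes Selenium 4 with geckodriver available.

```python
def headless_firefox_options():
    """Build Firefox options that start the browser without a visible window."""
    # Imported lazily so that defining these helpers does not require Selenium.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # Firefox's command-line switch for headless mode
    return options


def open_headless_firefox():
    """Return a Firefox driver that runs with no graphical interface."""
    from selenium import webdriver

    return webdriver.Firefox(options=headless_firefox_options())
```

Swapping a plain webdriver.Firefox() call for open_headless_firefox() should leave the rest of the scraping code unchanged, since both return the same kind of driver object.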
30 thoughts on “Get Search Engine Results with Python”

  1. Great tutorial, but it is a bit fast. Also, although it's quickly inferred, there is no introduction to the project (goal, intent, overview). I wouldn't mind a couple more minutes with a bit more elaboration.

  2. Dude..
    What the hell are you waiting for ?
    You got a great accent..
    Straight to the point..
    Short Video..
    And very confident..

    Make some kickass logo and start doing some tutorials!
    You'll grow big in no time !

  3. Great job, this was really useful to me. As a note, the try except block isn't needed; you can just pick the special ordered list class, since it shows up even when there aren't ads.

  4. What if I have a long list of queries in a CSV file? How can I read the file and search the queries one by one?

  5. Hi, what program are you using? I am using IDLE from the Python website, and when I run the code it opens Firefox and searches, then opens a blank CLI interface and nothing happens. I don't get any results.

  6. os.path.basename(self.path), self.start_error_message)
    selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable needs to be in PATH.

  7. Nice video! By the way, I tried searching "*.sony.*" and no results were shown. I think that the * and . were filtered. Any fix for this?

  8. Hi. Thank you for great videos.
    In your other video you show how to limit the number of returned links. The thing is, I want to have around the first 200 links. Would you please give me some idea or tip, or make a video on how I can do that? Thanks

  9. Thank you. I would love to follow the follow-up tutorials but cannot get this first step to fully work. I have the same problem as Mayed Hamad. Firefox opens, searches with the keyword, and returns results, but no links. I am on Windows 7, Python 3.6, with geckodriver set on the path. It seems geckodriver starts but then stalls. Any help or advice would be most appreciated. Spiros

  10. Great tutorial, to the point, good accent, no wastage of time…….. Please do some more tutorials

  11. I keep getting a syntax error even though I typed it exactly like yours. I put it through an error checker and it keeps saying no new line after EOF.

  12. Awesome! Thanks! I'm on Mac Jupyter and not getting any errors however, I have only empty results:: Out[30]: [ ]. Do I have to install some module?
