[C#] Using Google API for Image Search
kamito
Post: #1

Introduction

As part of the research that I am doing for my thesis, I had to perform a lot of image search queries against the most popular search engines. The Yahoo! guys provide us with [link hidden], which also supports image search and was really useful to me. Google, however, for some obscure reason, provides only [link hidden] and does not provide anything for image search. A few weeks ago, I came across [link hidden] by [link hidden]. So I thought, why not do something similar for Google image search? Well, I did that, and here it is. The regular expressions used are far more complex than the one used in the translation service, but in the end, it's basically the same idea.

What's in the source files?

The source files include two projects. The Ilan.Google.API project includes a DLL which can be used to query Google image search programmatically. The Ilan.Test.Google.API project includes a simple application that enables you to run a query and display all the resulting images dynamically on the form. When you double-click a thumbnail the full original image is displayed. This application is aimed at showing how simple it is to use the API. I wouldn't recommend it as a real searching tool (at least as is) because:

  1. It fetches all images (thumbnails) on one single thread. Although it does let you view the full image before all the thumbnails have been downloaded, a "real" application would have to download several thumbnails at the same time to significantly boost performance.
  2. Only simple queries are supported (space-separated words). Special characters are not handled. If you need to support more complicated queries, you'll have to parse the query and transform it to a format that complies with the URL.


QuickStart - How do I use the API?

If you're not interested in how it works, but just want to use the library, this section is especially for you :-)
  • Add a reference to the Ilan.Google.API library in your project.
  • Add the using directive:
    Code: [not available in this copy]
  • When you need to run a query, make sure it conforms to the URL format supported by Google. To do so, I suggest you check the query on Google Image Search and look at how they build the URL. For instance, if you only need to support simple queries of space-separated words, you just need to transform the query into a list of words separated by the plus (+) sign. For example, the query "apple cake" must be transformed to "apple+cake". Notice that a run of several space characters must be transformed to a single + sign, so I wouldn't recommend a simple string.Replace call. You could use the regular expression that I have used in my demo project:

    Code: [not available in this copy]
  • To run the query (say for the first 50 results), use the SearchService.SearchImages method:

    Code: [not available in this copy]
  • The response object holds *all* of the first 50 results for the given query (or fewer if there are no more results). For example, you can retrieve the URL of the first image through the response.Results array:

    Code: [not available in this copy]
  • The parameters for the SearchService.SearchImages method are:
    • (string) query - The query to be sent.
    • (int) startPosition - The index of the first item to be retrieved (must be positive).
    • (int) resultsRequested - The number of results to be retrieved (must be between 1 and (1000 - startPosition)).
    • (bool) filterSimilarResults - Set to 'true' if you want Google to automatically omit similar entries. Set to 'false' if you want to retrieve every matching image.
    • [optional: (SafeSearchFiltering) safeSearch - Indicates what level of SafeSearch to use.]
  • Well, I think that should be it. Just remember that Google does not return anything beyond the first 1000 results for a query, so if you try to get a result past the first 1000, the library throws an exception...
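
The query transformation described in the list above could be sketched roughly as follows. This is not necessarily the regular expression from the demo project (that code is not visible here); the class and method names are made up for illustration, and special characters are deliberately left unescaped, in line with the "simple queries only" caveat.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Ilan.Google.API.
public static class QueryFormatter
{
    // Trims the query, then collapses every run of whitespace into a single '+'.
    // Special characters are NOT URL-escaped here.
    public static string ToGoogleQuery(string query)
    {
        if (query == null) throw new ArgumentNullException("query");
        return Regex.Replace(query.Trim(), @"\s+", "+");
    }
}
```

Calling QueryFormatter.ToGoogleQuery("apple   cake") gives "apple+cake", which a plain string.Replace(" ", "+") would not, because it would emit one + per space.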

Returned objects

The SearchResponse and SearchResult classes are pretty straightforward. A query returns one SearchResponse, which holds the total number of available results for the query as well as an array of SearchResult objects, each representing a separate image returned by Google. Each SearchResult object holds the URL of the thumbnail of the image (located somewhere at Google) and the URL of the actual image (at its source).
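
To make that shape concrete, here is a minimal sketch of what such a pair of classes might look like. The type and member names below are assumptions for illustration only; the real SearchResponse and SearchResult members are in the hidden source and may well differ.

```csharp
using System;

// Hypothetical stand-in for SearchResult: one image returned by the search.
public sealed class ImageResult
{
    public string ThumbnailUrl { get; private set; } // thumbnail hosted by the search engine
    public string ImageUrl { get; private set; }     // original image at its source

    public ImageResult(string thumbnailUrl, string imageUrl)
    {
        ThumbnailUrl = thumbnailUrl;
        ImageUrl = imageUrl;
    }
}

// Hypothetical stand-in for SearchResponse: the total count plus the fetched results.
public sealed class ImageSearchResponse
{
    public long TotalResultsAvailable { get; private set; }
    public ImageResult[] Results { get; private set; }

    public ImageSearchResponse(long totalResultsAvailable, ImageResult[] results)
    {
        if (results == null) throw new ArgumentNullException("results");
        TotalResultsAvailable = totalResultsAvailable;
        Results = results;
    }
}
```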

Building the query URL

After sending a few queries to Google using [link hidden], it turns out that you can run a simple query for "apple cake" with a plain URL: [link hidden].

Digging a little further, you can fetch results at a certain position by adding "&start=". So, if you want to fetch results 21 to 40 (i.e. the second page if you were using their web site), the URL should be [link hidden].

Note that the index of the images is 0-based, so to start with the 21st result you must specify "&start=20". Then, I found out that there is a default filter that omits results if they resemble the previous results. If you want to disable this filter you need to add "&filter=0". A quick test will show you that "&filter=1" turns the filter on. To see the effect of the filter, I suggest you run the following two queries, which return the results starting at result number 900:
  • http://images.google.com/images?q=apple+cake&start=900&filter=0
  • http://images.google.com/images?q=apple+cake&start=900&filter=1

Finally, pedrito68 pointed out to me that you can also choose the SafeSearch mode by adding "&safe=...". Google's SafeSearch blocks web pages containing explicit sexual content from appearing in search results. There are three options: "active" (filter both explicit text and explicit images), "moderate" (filter explicit images only, which is the default behavior) and "off" (do not filter the search results).

I have tried to find a way to set the number of returned results, but didn't succeed. I keep getting 20 results at a time. More on this later...

So, to sum up the querying part, building the URL for a query, given the query, start position and filter can be done as follows:

Code: [not available in this copy]
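
Since the original snippet is not visible here, the following is only a sketch of how the URL could be assembled from the pieces described in this section. The base address and the q, start, filter and safe parameters come from the text above; the class, method and argument names are my own, and the query is assumed to be already in "apple+cake" form.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not the library's actual implementation.
public static class ImageSearchUrl
{
    private const string BaseUrl = "http://images.google.com/images";

    // formattedQuery: words already joined with '+'.
    // start: 0-based index of the first result.
    // safe: "active", "moderate", "off", or empty to omit the parameter.
    public static string Build(string formattedQuery, int start, bool filterSimilar, string safe = "")
    {
        if (string.IsNullOrEmpty(formattedQuery)) throw new ArgumentException("Query is empty.", "formattedQuery");
        if (start < 0) throw new ArgumentOutOfRangeException("start");

        var url = new StringBuilder(BaseUrl);
        url.Append("?q=").Append(formattedQuery);
        url.Append("&start=").Append(start.ToString(CultureInfo.InvariantCulture));
        url.Append("&filter=").Append(filterSimilar ? "1" : "0");
        if (!string.IsNullOrEmpty(safe))
        {
            url.Append("&safe=").Append(safe);
        }
        return url.ToString();
    }
}
```

With these assumptions, Build("apple+cake", 900, false) reproduces the first of the two test URLs listed above.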

Sending the query and retrieving the result

Sending the query and retrieving the HTML file returned is rather simple and pretty common:

Code: [not available in this copy]
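
For readers who cannot see the snippet, a typical way to do this with HttpWebRequest (the class named in the Conclusions) might look like the sketch below. The class and method names are hypothetical, and UTF-8 is an assumption about the response encoding. Note also that on recent .NET versions HttpWebRequest is marked obsolete in favor of HttpClient, so treat this as period-appropriate rather than recommended.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not the library's actual implementation.
public static class HtmlFetcher
{
    // Reads a whole stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends a GET request and returns the response body as a string.
    // Assumes the body is UTF-8; a real client should honor the declared charset.
    public static string Fetch(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            return ReadAll(stream, Encoding.UTF8);
        }
    }
}
```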

Extracting information from the retrieved HTML

Here comes the ugly part. We have to parse the HTML and extract the number of available results for the query as well as information about each one of the retrieved images. After analyzing the HTML I got from Google, I managed to find a recurring pattern that tells you accurately where each of these pieces of information is located in the HTML. Needless to say, if Google changes the format of the returned HTML, the parsing will fail! Of course, I relied on regular expressions to parse the text. Following are the different patterns used in the API:

Code: [not available in this copy]

This pattern is used to retrieve information about each image. The URL of the original image is captured into the "imgurl" group, the URL of the thumbnail is captured into the "images" group and the width and height of the thumbnail image are captured in the "width" and "height" groups respectively.
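
Because the actual pattern is hidden, here is a sketch of the technique only: a regular expression with named groups applied to a made-up anchor/thumbnail markup. The group names (imgurl, images, width, height) follow the description above, but the markup, the pattern and the class names are all invented for illustration and will not match Google's real HTML.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical holder for one parsed entry; not the library's SearchResult.
public sealed class ParsedImage
{
    public string ImageUrl = "";
    public string ThumbnailUrl = "";
    public int Width;
    public int Height;
}

public static class ImageEntryParser
{
    // Matches the invented markup:
    // <a href='/imgres?imgurl=URL&...'><img src='THUMB' width='W' height='H'>
    private static readonly Regex EntryPattern = new Regex(
        @"imgurl=(?<imgurl>[^&']+)[^>]*>\s*<img\s+src='(?<images>[^']+)'" +
        @"\s+width='(?<width>[0-9]+)'\s+height='(?<height>[0-9]+)'",
        RegexOptions.IgnoreCase);

    // Returns one ParsedImage per match, in document order.
    public static List<ParsedImage> Parse(string html)
    {
        var entries = new List<ParsedImage>();
        foreach (Match m in EntryPattern.Matches(html))
        {
            entries.Add(new ParsedImage
            {
                ImageUrl = m.Groups["imgurl"].Value,
                ThumbnailUrl = m.Groups["images"].Value,
                Width = int.Parse(m.Groups["width"].Value, CultureInfo.InvariantCulture),
                Height = int.Parse(m.Groups["height"].Value, CultureInfo.InvariantCulture)
            });
        }
        return entries;
    }
}
```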

Code: [not available in this copy]

This pattern is used to retrieve additional information about each image: the original image's width, height and size (in groups "width", "height" and "size" respectively). I didn't find a way to use the same pattern for all of the images' information. I guess there is one, but I gave up searching after a while...

Code: [not available in this copy]

This pattern is used to retrieve the total number of results available for the query (can be found on the upper right portion of the HTML when you look at the result of a query). I have also extracted the last result index - to find when there are no more results.
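
As an illustration of that last step, the sketch below pulls the last result index and the total out of a summary line. The line format ("Results 21 - 40 of about 1,230 ...") is an assumption made up for this example, as are the class and method names; the author's real pattern targets whatever Google's HTML actually contained at the time.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not the library's actual implementation.
public static class ResultCountParser
{
    // Assumed summary format: "Results <first> - <last> of [about] <total>".
    private static readonly Regex CountPattern = new Regex(
        @"Results\s+(?<first>[0-9][0-9,]*)\s*-\s*(?<last>[0-9][0-9,]*)\s+of\s+(?:about\s+)?(?<total>[0-9][0-9,]*)",
        RegexOptions.IgnoreCase);

    // Returns false when the summary line is not found.
    public static bool TryParse(string text, out long lastIndex, out long total)
    {
        lastIndex = 0;
        total = 0;
        Match m = CountPattern.Match(text);
        if (!m.Success) return false;
        lastIndex = ParseNumber(m.Groups["last"].Value);
        total = ParseNumber(m.Groups["total"].Value);
        return true;
    }

    // Strips thousands separators before parsing.
    private static long ParseNumber(string digits)
    {
        return long.Parse(digits.Replace(",", ""), CultureInfo.InvariantCulture);
    }
}
```

When lastIndex is no smaller than total, there is nothing left to fetch.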

Since I'm not a regular expressions pro, if you want more information and a better understanding of how the patterns work, I suggest reading the following: [link hidden], [link hidden], and of course you must check out [link hidden].

What about the 20-results-per-query limit?

That's straightforward. Once you know the "start=" portion of the URL, you can run the queries in a loop until you reach the requested number of results.
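
A rough sketch of such a loop, under a few assumptions: the page size is the 20 results observed above, the start offset is 0-based as in the "&start=" parameter, and a page is fetched through a delegate so the loop can be tried without touching the network. All names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the library's actual implementation.
public static class ResultPager
{
    public const int PageSize = 20;     // results returned per request, as observed in the post
    public const int MaxResults = 1000; // results beyond this are not returned

    // fetchPage takes a 0-based start offset and returns the items found on that page.
    public static List<string> Collect(int start, int requested, Func<int, IList<string>> fetchPage)
    {
        if (start < 0) throw new ArgumentOutOfRangeException("start");
        if (requested < 1 || start + requested > MaxResults) throw new ArgumentOutOfRangeException("requested");

        var results = new List<string>();
        int position = start;
        while (results.Count < requested)
        {
            IList<string> page = fetchPage(position);
            foreach (string item in page)
            {
                if (results.Count == requested) break;
                results.Add(item);
            }
            if (page.Count < PageSize) break; // short page: no more results available
            position += PageSize;
        }
        return results;
    }
}
```

In a real client, fetchPage would build the URL for the given offset, download the HTML and parse out the image entries.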

And the 1000 results limit?

Hmmm, sorry. I didn't find a way to work around that one. I assume, however, that in virtually all applications 1000 results should be more than enough. Besides, by the time you get to the last results, most of them are totally irrelevant to the actual query anyway...

Thanks and apologies

I wish to thank [name hidden] and his [link hidden]. I have used regular expressions a few times in the past, but mostly with very simple patterns. The expressions used here are by far the most complicated ones I have ever written, and I couldn't have written and tested them successfully without the help of "The Regulator". Which brings me to the apologies: there is a good chance that the patterns I'm using could be simplified. If you find a way to simplify them (with no performance penalty), then please let me know, and I'll update the code. Finally, I would like to thank "pedrito68", who has provided me with very useful comments (and code) based on the first version of this API, which I have added in the current version (see History section).

Conclusions

The Google Image Search API is essentially a tool that you can use if you need to perform an image search against Google programmatically. Since it parses the HTML returned by Google, if the format of this HTML changes, the library's implementation will have to change accordingly. The implementation is rather simple. It shows a simple example of how to send a URL to a web server (using the HttpWebRequest object) and retrieve the HTML returned by the web server. It also uses regular expressions (using the System.Text.RegularExpressions.Regex class) with some pretty complicated patterns to extract the interesting data from the HTML. Finally, the demo application shows how to use the API.

On a personal note: I have been using this API for the past few days to run over 40,000 single-word queries. It has proven to be very accurate, and the regular expressions never broke. One very interesting feature is that, unlike the regular SDKs, it does not suffer from any quota. For instance, Google's web search API won't let you run more than 1000 queries with the same key in a single day (24 hours). Similarly, Yahoo! has a limit of 5000 queries per day. It might be good to adapt this API to provide regular search capabilities and work around Google's 1000-queries-per-day limitation, or adapt it to Yahoo! and work around their limitation...

Credits to Ilan Assayag.
05-31-2011 08:57 PM

codebreaker911
Post: #2
Thanks for the nice tutorial.
11-25-2011 04:39 PM

Inventor
Post: #3
Simple and nice, thanks for posting.
11-26-2011 10:42 PM

JinXeR
Post: #4
Yeh very simple thanks ill put it on mah applications
12-12-2011 04:39 AM

dharma06
Post: #5
Hi kamito, I couldn't access the download address. Could you please contact me and send me the source code? I have to complete a project to graduate from my university and I need the source code; I couldn't write it myself. Please help me...
02-23-2012 01:54 PM

hackstock90
Post: #6
thanks man really nice tut :)
02-29-2012 07:29 PM