My Thoughts: October 2013

Saturday, October 5

Extract and Verify the text from image using Selenium WebDriver


Firstly WebDriver does not support the functionality of extracting text from an image , at least as of now  :) .
So if we would like to extract and verify text from an image then we should use OCR (Optical Character Recognition) technology.

Coming to OCR , here is one of the nice article , and it says :

OCR software extracts all the information from the image into easily editable text format.Optical character recognition (OCR) is a system of converting scanned printed/handwritten image files into its machine readable text format. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical to characters. 

There are good no.of free OCR software tools . If your preferred program is Java then you can use one of the Java OCR libraries to extract text from an image. I used ASPRISE OCR java library in this article. To work with  ASPRISE OCR library , follow the below simple two steps.

  1. Download "Asprise OCR" libraries , depending on the operating system you are using .
  2. Unzip the downloaded  folder and add the aspriseOCR jar file to your working directory . If you want you can download the single jar file from here .
  3. Also Copy the "AspriseOCR.dll" file from unzipped downloaded folder and save it  under "C:\Windows\System32" .
Now you are ready to go  :P .

Look at the below sample image .



And here is the sample code to read the text from above image :

import java.awt.Image;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
import com.asprise.util.ocr.OCR;

public class ExtractImage {
 
 WebDriver driver;
 
 @BeforeTest
  public void setUpDriver() {
   driver = new FirefoxDriver();
  }
 
 @Test
 public void start() throws IOException{
   
 /*Navigate to http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html page
  * and get the image source attribute
  *  
  */  
 driver.get("http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html");
 String imageUrl=driver.findElement(By.xpath("//*[@id='post-body-5614451749129773593']/div[1]/div[1]/div/a/img")).getAttribute("src");
 System.out.println("Image source path : \n"+ imageUrl);

 URL url = new URL(imageUrl);
 Image image = ImageIO.read(url);
 String s = new OCR().recognizeCharacters((RenderedImage) image);
 System.out.println("Text From Image : \n"+ s);
 System.out.println("Length of total text : \n"+ s.length());
 driver.quit();
    
 /* Use below code If you want to read image location from your hard disk   
  *   
   BufferedImage image = ImageIO.read(new File("Image location"));   
   String imageText = new OCR().recognizeCharacters((RenderedImage) image);  
   System.out.println("Text From Image : \n"+ imageText);  
   System.out.println("Length of total text : \n"+ imageText.length());   
      
   */ 
}

}

Here is the output of the above program:

Image source path :
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgB6j3Yn6-LlRNqfQp1QvtINI0H9YbPfzY228Qmt72DdJhfy3h9ort0tySJcwdX9KVZGOzR8XHHrIIRo18_VpmMsn8LtZuYAn1ulTP48R2qZKtDIp3JNX5LlR8IAE0a2CC0GN4ZWXOy3qnV/s1600/love.jpg

Never M2suse the O, ne
Who Likes You
Never Say Busy To Th,e One
Who Needs You
Never cheat The One
Who ReaZZy Trust You,
Never foJnget The One
Who Zways Remember You.

Length of total text :
175



Thats for now. Have a great weekend ..!! :)

Reference :
http://asprise.com/product/ocr/javadoc/index.html