First of all, WebDriver does not support extracting text from an image, at least as of now :)
So if we would like to extract and verify text from an image, we should use OCR (Optical Character Recognition) technology.
Coming to OCR, here is a nice article on it, which says:
OCR software extracts all the information from the image into easily editable text format. Optical character recognition (OCR) is a system of converting scanned printed/handwritten image files into machine-readable text format. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical to characters.
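To make the "comparing it with fonts stored in its database" idea concrete, here is a toy sketch of template matching. It is only an illustration under stated assumptions: the 3x5 glyph "database" and the class name TinyTemplateOcr are made up for this example, and real OCR engines (including Asprise) are far more sophisticated than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy template matcher; an illustration only, not how any real OCR library works. */
final class TinyTemplateOcr {

    // Hypothetical 3x5 "font database": '#' is ink, '.' is background, rows are concatenated.
    private static final Map<Character, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put('I', "###" + ".#." + ".#." + ".#." + "###");
        TEMPLATES.put('L', "#.." + "#.." + "#.." + "#.." + "###");
        TEMPLATES.put('O', "###" + "#.#" + "#.#" + "#.#" + "###");
    }

    /** Returns the template character whose pixels agree most often with the given glyph. */
    static char recognize(String glyph) {
        char best = '?';
        int bestScore = -1;
        for (Map.Entry<Character, String> entry : TEMPLATES.entrySet()) {
            String template = entry.getValue();
            int score = 0;
            for (int i = 0; i < template.length() && i < glyph.length(); i++) {
                if (template.charAt(i) == glyph.charAt(i)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // An 'O' with one noisy pixel in the middle row is still closest to the 'O' template.
        String noisyO = "###" + "#.#" + "###" + "#.#" + "###";
        System.out.println(recognize(noisyO));
    }
}
```

A noisy glyph is matched to the nearest template rather than an exact one, which is also why OCR results can contain misread characters.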
There are a good number of free OCR software tools. If your preferred language is Java, you can use one of the Java OCR libraries to extract text from an image. I used the Asprise OCR Java library in this article. To work with the Asprise OCR library, follow the three simple steps below.
- Download the "Asprise OCR" libraries for the operating system you are using.
- Unzip the downloaded folder and add the aspriseOCR jar file to your working directory. If you want, you can download the single jar file from here.
- Also copy the "AspriseOCR.dll" file from the unzipped folder and save it under "C:\Windows\System32".
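If the native library is not in a folder the JVM searches, loading it fails with an UnsatisfiedLinkError. The sketch below is one way to check this before running the test: it looks for the library file in every entry of java.library.path. The class and method names (NativeLibraryLocator, findOnPath) are made up for this example; only standard JDK calls are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: looks for a native library file in a path list such as java.library.path. */
final class NativeLibraryLocator {

    /** Returns every existing file named fileName found in the directories of pathList. */
    static List<File> findOnPath(String fileName, String pathList) {
        List<File> hits = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return hits;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // mapLibraryName converts "AspriseOCR" to the platform file name (AspriseOCR.dll on Windows).
        String fileName = System.mapLibraryName("AspriseOCR");
        List<File> hits = findOnPath(fileName, System.getProperty("java.library.path"));
        System.out.println(hits.isEmpty()
                ? fileName + " not found on java.library.path"
                : "Found: " + hits);
    }
}
```

If nothing is found, either copy the DLL into one of the listed folders or start the JVM with -Djava.library.path pointing at the folder that contains it.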
And here is the sample code to read the text from the image above:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.net.URL;

import javax.imageio.ImageIO;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

import com.asprise.util.ocr.OCR;

public class ExtractImage {

    WebDriver driver;

    @BeforeTest
    public void setUpDriver() {
        driver = new FirefoxDriver();
    }

    @Test
    public void start() throws IOException {
        /*
         * Navigate to the http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html page
         * and get the image source attribute.
         */
        driver.get("http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html");
        String imageUrl = driver.findElement(By.xpath("//*[@id='post-body-5614451749129773593']/div[1]/div[1]/div/a/img")).getAttribute("src");
        System.out.println("Image source path : \n" + imageUrl);

        URL url = new URL(imageUrl);
        // ImageIO.read returns a BufferedImage, which is already a RenderedImage.
        BufferedImage image = ImageIO.read(url);
        String s = new OCR().recognizeCharacters(image);
        System.out.println("Text From Image : \n" + s);
        System.out.println("Length of total text : \n" + s.length());
        driver.quit();

        /*
         * Use the code below if you want to read the image from your hard disk:
         *
         * BufferedImage image = ImageIO.read(new File("Image location"));
         * String imageText = new OCR().recognizeCharacters(image);
         * System.out.println("Text From Image : \n" + imageText);
         * System.out.println("Length of total text : \n" + imageText.length());
         */
    }
}
Here is the output of the above program:
Image source path :
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgB6j3Yn6-LlRNqfQp1QvtINI0H9YbPfzY228Qmt72DdJhfy3h9ort0tySJcwdX9KVZGOzR8XHHrIIRo18_VpmMsn8LtZuYAn1ulTP48R2qZKtDIp3JNX5LlR8IAE0a2CC0GN4ZWXOy3qnV/s1600/love.jpg
Never M2suse the O, ne
Who Likes You
Never Say Busy To Th,e One
Who Needs You
Never cheat The One
Who ReaZZy Trust You,
Never foJnget The One
Who Zways Remember You.
Length of total text :
175
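The OCR output above contains a few misread characters, so an exact string comparison against the expected text would fail. Below is a minimal sketch of a tolerant check based on edit (Levenshtein) distance. The class name FuzzyTextVerifier, the expected string and the 0.8 threshold are assumptions for illustration; they are not part of the original test.

```java
import java.util.Locale;

/** Hypothetical helper for comparing OCR output with expected text while tolerating small errors. */
final class FuzzyTextVerifier {

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Lower-cases, collapses runs of whitespace (including newlines) and trims. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    /** Returns a value between 0.0 (nothing in common) and 1.0 (identical after normalization). */
    static double similarity(String expected, String actual) {
        String e = normalize(expected);
        String a = normalize(actual);
        int longest = Math.max(e.length(), a.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(e, a) / longest;
    }

    public static void main(String[] args) {
        String expected = "Never Misuse The One"; // assumed expected text, for illustration only
        String fromOcr = "Never M2suse the O, ne"; // first line of the OCR output shown above
        double score = similarity(expected, fromOcr);
        System.out.println("Similarity: " + score);
        System.out.println(score >= 0.8 ? "Text verified" : "Text mismatch"); // 0.8 is an arbitrary threshold
    }
}
```

In a TestNG test, the final check could become an assertion on the similarity score instead of a print statement.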
That's it for now. Have a great weekend!! :)
Reference :
http://asprise.com/product/ocr/javadoc/index.html
Good post, but I find one minor fault with the example code. You never actually make use of WebDriver in there even though you instantiate it. Might as well just omit WebDriver from the example. Or make a better example that uses WebDriver to navigate to some page, fetch the URL of an image via its src attribute, and then invoke the OCR code from your example above.
Thanks for the input, David. :)
Even I felt the same. I will modify the code soon.
I have updated the code now.
Hi Vamshi, the above code gives me an error asking me to add the "DevIL.dll" file. I tried again after adding this DLL but it still won't work. Can you please tell me whether you had already added this DLL, or is there something else? :(
Hi Niyati,
I did not use the "DevIL.dll" file. The only extra jar file I used for this program is "aspriseOCR.jar".
Please make sure you have added the correct "aspriseOCR" jar file. They have separate jar files for every OS and for 32/64-bit machines. Download the correct file and try again. :)
Wish you luck :P
Also check the below article:
https://forums.oracle.com/thread/1302537
Got your point. Thank you so much for such a nice explanation :)
Hello,
In "Internet Explorer" it does not allow me to click or type a value in a text box, so can you give me some ideas for that?
Pavan,
Can you try version "2.37.0.0" of IEDriver? I tried that version and it is working fine for me. If you are still having an issue, please send the URL you are trying it on.
Hi Vamsi,
The datepicker is displayed, but the date is not being selected. It enters the loop and I am able to print the date to the console.
cell.findElement(By.linkText(String.valueOf(day))).click(); this particular code is not performing the required action.
Hi Kannan,
I am on Firefox 24 and using the Selenium 2.35 jars. The code works for me without any issues. Which versions of Firefox and the jars are you using?
Did you modify the code in any way, like changing the website?
Hi Vamshi,
I am using the same browser and jars as you mentioned above. I tried using this code for my application, so I made it generic instead of clicking on a particular day. It was not working, but I changed the code as below and it started working.
if (cell.getText().equals(String.valueOf(day))){ // day is the input day we are providing
cell.findElement(By.linkText(cell.getText())).click();
break;
}
I would also like to know if you can help me with reading a PDF that is embedded within the same web page, instead of being downloaded or opened in a separate browser window. For example, when you sign in to a web application, you get redirected to a page where the agreement PDF is displayed inside a frame; it can be downloaded, or read and agreed to in order to proceed.
Thanks,
Kannan
If the PDF file is embedded within the web page, then try to get the exact location of the PDF file from the page source. Hope the link below will help you.
http://www.mythoughts.co.in/2012/05/webdriverselenium2-extract-text-from.html
Hi Vamshi, I too got the same error. Please tell me how to resolve the above error.
Hi, I just followed the same instructions in this code, but at run time I get the below error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: no AspriseOCR in java.library.path
at java.lang.ClassLoader.loadLibrary(Unknown Source)
at java.lang.Runtime.loadLibrary0(Unknown Source)
at java.lang.System.loadLibrary(Unknown Source)
at com.asprise.util.ocr.OCR.loadLibrary(OCR.java:247)
at com.asprise.util.ocr.OCR.(OCR.java:56)
at Test2.main(Test2.java:29)
Hi Srikanth, my bad. I forgot to add the step of copying the "AspriseOCR.dll" file to "C:\Windows\System32".
Add a copy of the AspriseOCR.dll file to C:\Windows\System32 and the errors will go away. :)
Hi Anjala, my bad.
I forgot to add the step of copying the "AspriseOCR.dll" file to "C:\Windows\System32".
Add a copy of the AspriseOCR.dll file to C:\Windows\System32 and the errors will go away. :)
Thank you so much Vamshi, got the resolution. Also, we need to copy the following DLL files [DevIL.dll, AspriseJTwain.dll, ILU.dll] into the System32 location as well; otherwise we will get the below error:
*************
FAILED: start
java.lang.UnsatisfiedLinkError: C:\Windows\System32\AspriseOCR.dll: Can't find dependent libraries
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)...
*************
Very helpful. You can also do this with OneNote; read this: http://www.superpctricks.com/2013/11/how-to-extract-text-from-image-with.html
Thanks a lot, it really helped me.
Hi Vamshi,
We can also use driver.findElement(locator).sendKeys(Keys.RETURN); (where locator is a By)
to select the dates.
Hi guys, I tried all the ways specified above but still get the same error. Can anyone help me out, please?
I downloaded the Asprise OCR zip file and unzipped it.
Copied the DLL files (AspriseOCR.dll, DevIL.dll and ILU.dll) to the System32 folder.
Added the aspriseOCR jar file to Eclipse as an external jar.
Wrote a script to read the text inside the image.
Added D:/Aspires to the PATH environment variable; this folder contains the unzipped data.
Also added D:\ASpires\aspriseOCR.jar to the CLASSPATH environment variable.
Now when I run the script:
WebDriver fire = new FirefoxDriver();
fire.get("http://www.mythoughts.co.in/2013/10/extract-and-verify-text-from-image.html");
String urlOfImage=fire.findElement(By.xpath("//*[@id='post-body-5614451749129773593']/div[1]/div[1]/div/a/img")).getAttribute("src");
System.out.println("image text"+urlOfImage);
URL url = new URL(urlOfImage);
Image i0 = ImageIO.read(url);
String s = new OCR().recognizeEverything((RenderedImage) i0); // exception got is here
System.out.println(s);
Exception in thread "main" java.lang.UnsatisfiedLinkError: no AspriseOCR in java.library.path
Can anyone please help me correct it?
Can you please try adding the "AspriseOCR.dll" file to "C:\Windows\System32" and running it again?
Hi,
Nice blog, but I have one question.
Suppose I have a document; how can I select text from that document using Selenium WebDriver?
Refer to the attached screenshot for an example of selecting text from a document.
Please reply ASAP.
Can you add the screenshot again?
Is it possible to extract the text from a scanned JPG file stored on the system using Selenium?
If it is possible, can you please post the code for that?
Hi, thanks for your great post.
Is it possible to extract the text from a scanned JPG file stored on the system using Selenium?
If it is possible, can you please post the code for that?
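For an image that is already on disk, Selenium is not needed at all; the commented-out part of the sample code in the post reads the file with ImageIO and hands it to the OCR library. Here is a small sketch of the loading step on its own. The class name LocalImageLoader is made up for this example, and the Asprise call is left as a comment because it needs aspriseOCR.jar and its native library to run.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

/** Hypothetical helper: loads an image from disk so it can be passed to an OCR library. */
final class LocalImageLoader {

    /** Loads the file, failing with a clear message if it is missing or not a readable image. */
    static BufferedImage load(File file) throws IOException {
        if (!file.isFile()) {
            throw new IOException("No such image file: " + file);
        }
        // ImageIO.read returns null when no registered reader understands the file format.
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("Unsupported or corrupt image: " + file);
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java LocalImageLoader <path-to-image>");
            return;
        }
        BufferedImage image = load(new File(args[0]));
        System.out.println("Loaded " + image.getWidth() + "x" + image.getHeight() + " image");
        // With aspriseOCR.jar on the classpath, the call from the post would follow here:
        // String text = new com.asprise.util.ocr.OCR().recognizeCharacters(image);
    }
}
```

How well the text of a scanned JPG is recognized still depends on the scan quality.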
Hi,
I attached the screenshot last time; did you not get it?
Refer to the attached screenshot.
OCR technology has its own limitations. It won't always give the good results we are looking for (bad, I know :) )
http://www.primafact.com/what-is-ocr-2/
http://www.meridianoutpost.com/resources/articles/ocr-limitations.php
Hey Akash,
Just want to know, why do you want to select this text? :)
Are you working on a specific test case, or are you exploring Selenium in depth? :)
If you are looking for copy and paste, you can do it using Keys with sendKeys.
Please use this code wherever you find two jQuery calendars (here I used a Hashtable):
// Map the month names shown in the datepicker header to zero-based indexes.
Hashtable<String, Integer> h = new Hashtable<String, Integer>();
h.put("January", 0);
h.put("February", 1);
h.put("March", 2);
h.put("April", 3);
h.put("May", 4);
h.put("June", 5);
h.put("July", 6);
h.put("August", 7);
h.put("September", 8);
h.put("October", 9);
h.put("November", 10);
h.put("December", 11);
int expMonth = 5;      // expected month (1-12)
int expYear = 2014;    // expected year
String expDate = "15"; // expected day of the month; example value, not given in the original snippet
// Calendar month and year
String calMonth = null;
String calYear = null;
boolean dateNotFound = true;
while (dateNotFound)
{
    calMonth = driver.findElement(By.className("ui-datepicker-month")).getText(); // get the text of the month
    calYear = driver.findElement(By.className("ui-datepicker-year")).getText();
    // parseInt converts the year text to an integer
    if (h.get(calMonth) + 1 == expMonth && expYear == Integer.parseInt(calYear))
    {
        String block = "//div[@class='monthBlock first']/table/tbody/tr/td"; // this is the first calendar
        selectDate(expDate, block);
        dateNotFound = false;
    }
    else if (h.get(calMonth) + 1 < expMonth && expYear == Integer.parseInt(calYear) || expYear > Integer.parseInt(calYear))
    {
        String block = "//div[@class='monthBlock last']/table/tbody/tr/td"; // this is the second calendar
        selectDate(expDate, block); // passing the date and the calendar
        dateNotFound = false; // otherwise it will loop forever
    }
    else if (h.get(calMonth) + 1 > expMonth && expYear == Integer.parseInt(calYear) || expYear < Integer.parseInt(calYear))
    {
        System.out.println("Please enter a date later than the current date");
        dateNotFound = false;
    }
}
// The statements above belong inside a test method; selectDate below is a helper in the same class
// and uses a static WebDriver field named driver.
//Thread.sleep(3000);
public static void selectDate(String date, String block) throws IOException
{
    String monthblock = block;
    List<WebElement> dateWidget = driver.findElements(By.xpath(monthblock));
    for (WebElement cell : dateWidget)
    {
        // Selects the date
        if (cell.getText().equals(date))
        {
            cell.findElement(By.linkText(date)).click();
            break;
        }
    }
    driver.findElement(By.id("SearchBtn")).submit();
    driver.quit();
}
Hi All,
How can we select a date dynamically for months that are not displayed in the current view? For example, it's April and I want to select a date in July, which is not in the current datepicker view. How can we achieve that?
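One possible approach, sketched here under assumptions, is to work out how many months lie between the month the datepicker currently shows and the target month, click the "next" control that many times, and then pick the day as in the code above. The class and method names (DatePickerMath, parseDisplayed, clicksNeeded) are made up for this example; only the month arithmetic is shown so that it runs without a browser.

```java
import java.time.Month;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical helper: computes how far a datepicker must be moved to reach a target month. */
final class DatePickerMath {

    /** Parses header texts such as "April" and "2014" (as read from the datepicker) into a YearMonth. */
    static YearMonth parseDisplayed(String monthText, String yearText) {
        Month month = Month.valueOf(monthText.trim().toUpperCase(Locale.ROOT));
        return YearMonth.of(Integer.parseInt(yearText.trim()), month);
    }

    /** Positive: click "next" that many times. Negative: click "previous". Zero: already visible. */
    static long clicksNeeded(YearMonth displayed, YearMonth target) {
        return ChronoUnit.MONTHS.between(displayed, target);
    }

    public static void main(String[] args) {
        // In a WebDriver test these two texts would come from the elements with the class names
        // "ui-datepicker-month" and "ui-datepicker-year" used in the code above.
        YearMonth displayed = parseDisplayed("April", "2014");
        YearMonth target = YearMonth.of(2014, Month.JULY);
        long clicks = clicksNeeded(displayed, target);
        System.out.println("Click 'next' " + clicks + " time(s)");
        // Assumption: the page has a "next month" control, e.g. an element with class "ui-datepicker-next":
        // for (long i = 0; i < clicks; i++) { driver.findElement(By.className("ui-datepicker-next")).click(); }
    }
}
```

Because the month header is re-read from the page in a real test, it is safer to locate the "next" element again before each click rather than reuse a stored reference.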