Verifying PDF content is also part of testing, but WebDriver (Selenium 2) has no direct methods for it.
To extract PDF content, we can use the Apache PDFBox API.
Download the JAR files and add them to your Eclipse classpath. Then you are ready to extract text from a PDF file. :)
Here is a sample script that extracts text from the PDF file below.
http://www.votigo.com/pdf/corp/CASE_STUDY_EarthBox.pdf
    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.net.URL;
    import java.util.concurrent.TimeUnit;

    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.util.PDFTextStripper;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.testng.Reporter;
    import org.testng.annotations.BeforeTest;
    import org.testng.annotations.Test;

    public class ReadPdfFile {

        WebDriver driver;

        @BeforeTest
        public void setUpDriver() {
            driver = new FirefoxDriver();
            Reporter.log("I am done");
        }

        @Test
        public void start() throws IOException {
            driver.get("http://votigo.com/overview_collateral.pdf");
            driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
            URL url = new URL(driver.getCurrentUrl());
            BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());

            // parse() -- This will parse the stream and populate the COSDocument object.
            // COSDocument object -- This is the in-memory representation of the PDF document.
            PDFParser parser = new PDFParser(fileToParse);
            parser.parse();

            // getPDDocument() -- This will get the PD document that was parsed.
            // When you are done with this document you must call close() on it to release resources.
            // PDFTextStripper() -- This class will take a PDF document and strip out all of the text,
            // ignoring the formatting and such.
            String output = new PDFTextStripper().getText(parser.getPDDocument());
            System.out.println(output);
            parser.getPDDocument().close();
            driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
        }
    }

Here is the output of the above program:
EarthBox a Day Giveaway Objectives EarthBox wanted to engage their Facebook audience with an Earth Day promotion that would also increase their Facebook likes. They needed a simple solution that would allow them to create a sweepstakes application themselves. Solution EarthBox utilized the Votigo platform to create a like- gated sweepstakes. Utilizing a theme and uploading a custom graphic they were able to create a branded promotion. Details • 1 prize awarded each day for the entire Month of April • A grand prize given away on Earth Day • Daily winner announcements on Facebook • Promoted through email newsletter blast Results (4 weeks) • 6,550 entries Facebook
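Once the text is extracted, the verification step itself is plain string checking. The sketch below is one possible way to do it and is not part of the original script: `PdfTextCheck` and its methods are hypothetical helper names. It assumes the extracted text may contain line breaks and repeated spaces wherever the PDF wraps a line, so it collapses whitespace on both sides before comparing.

```java
// Hypothetical helper for illustration; not part of PDFBox, Selenium, or TestNG.
final class PdfTextCheck {

    private PdfTextCheck() {
    }

    // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // True if the expected phrase occurs in the extracted text,
    // ignoring differences in line wrapping and spacing.
    static boolean containsPhrase(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }
}
```

In the test above, this could be used as `Assert.assertTrue(PdfTextCheck.containsPhrase(output, "EarthBox a Day Giveaway"))` with TestNG's `org.testng.Assert`, in place of only printing the text.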
Hi Vamshi,
This is really good code, but in my case it fails with this error:
***********************************************************
FAILED: start
java.lang.NoClassDefFoundError: org/apache/fontbox/afm/AFMParser
************************************************************
I might be doing something wrong, but can you please suggest a solution?
Thanks,
Shubham
Hi Shubham,
Sorry I couldn't be of much help. I tried, but I could not find what the actual issue was. (It might be the JARs or the PDF you are trying to read.)
it works... thanks
I ran the same code and it is working fine for me.
Just want to know, did you make any changes before you ran the script?
Hi, thanks a lot for this wonderful code.
When I run it I get the error below.
Can you please try to help me with this?
Sep 27, 2012 9:29:38 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 98877 is wrong. Fall back to reading stream until 'endstream'.
FAILED: start
org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 98877 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
at com.test.PDF.PDF_Reader.start(PDF_Reader.java:38)
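The exception message above names a system property, `org.apache.pdfbox.baseParser.pushBackSize`, as the thing to increase. A minimal sketch of setting it from code follows. Only the property name comes from the error message; the class `PushBackConfig`, its method, and the 1 MB value in the usage note are illustrative assumptions. It also assumes the PDFBox version in use reads the property when its parser classes are first loaded, so the call would have to happen before any `PDFParser` is created.

```java
// Hypothetical helper for illustration; only the property name is taken from the exception text.
final class PushBackConfig {

    static final String PUSHBACK_PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    private PushBackConfig() {
    }

    // Sets the push back buffer size (in bytes) as a JVM system property.
    static void setPushBackSize(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("bytes must be positive: " + bytes);
        }
        System.setProperty(PUSHBACK_PROPERTY, Integer.toString(bytes));
    }
}
```

For example, `PushBackConfig.setPushBackSize(1024 * 1024)` at the start of `setUpDriver()` would request a buffer larger than the 98877 bytes reported in the trace. Passing `-Dorg.apache.pdfbox.baseParser.pushBackSize=1048576` on the JVM command line is an alternative that does not depend on call order.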
hi,
Your code works fine, but your download link is not working, so please use the link below so the program will run (pdfbox-app-1.8.2.jar).
http://www.bizdirusa.com/mirrors/apache/pdfbox/1.8.2/pdfbox-1.8.2-src.zip
Hi Vamsi Kurra Ji,
Wonderful post. Thanks for sharing this with us. Please keep posting articles regularly and share your knowledge and experience with us w.r.t. Selenium WebDriver and other topics. Thank you.
thanks :)
Hi Vamshi,
The application that I am automating has a set of pages in which the user provides certain information. All of this information is shown on the next page as a PDF embedded within a container / frame. Each piece of text in the PDF is captured as a separate element in Firebug, but I couldn't identify the container itself. I tried CSS, FirePath, etc., with no luck. More interesting still: after accepting the PDF on this page (clicking a checkbox and then Continue), the same PDF opens again on the next page with an option to e-sign, where the user clicks to e-sign the document (within the PDF) and the user name is displayed in the signature area of the PDF. We have a test e-sign verification created for us.
Any suggestions?
Thanks,
Kannan V
Kannan,
It sounds like it is not exactly a PDF. It seems to be an iframe (like "read sample" at http://www.flipkart.com/my-journey-transforming-dreams-into-actions/p/itmdmzw9yszr94r5?pid=9788129124913&ref=1c9b59ba-12ea-470b-8bb2-aa7f6c1d15ea).
If it is a PDF, are you able to see the exact location of the PDF in the HTML source? If yes, your problem is solved.
Thanks for your response. I spoke to the dev team. It is not an iframe. They are just creating an object, taking the data submitted on the earlier pages, and showing it in that format. It is saved as a PDF when you click Save in the container. I can share a screenshot of that page if you can share your email.
You can reach me at vamshikurra@gmail.com
http://www.java2s.com/Code/Jar/f/Downloadfontbox182jar.htm
I faced the same issue. I downloaded fontbox from this website and now it is working fine. Thanks!
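A `NoClassDefFoundError` like the one reported earlier in the thread means a class was missing from the classpath at run time. A quick way to confirm which JAR is missing, before running the whole test, is to ask the class loader for the class named in the error. The sketch below is only a diagnostic aid; `ClasspathCheck` is a hypothetical helper name, not part of PDFBox or FontBox.

```java
// Hypothetical diagnostic helper for illustration.
final class ClasspathCheck {

    private ClasspathCheck() {
    }

    // Returns true if the named class can be found by this class's class loader.
    // The class is located but not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `ClasspathCheck.isOnClasspath("org.apache.fontbox.afm.AFMParser")` with the class name taken from the error message should return `false` when the fontbox JAR is not on the classpath and `true` once it has been added.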
Hello Vamsi,
I just checked your code on an application that I am automating and guess what? It works perfectly. I don't have much experience working with TestNG, but it looks like a very useful tool. Thanks for sharing, and regards from Mexico!
Thank you Vamshi..