How can I count words in PDF files?
Thread starter: suesimons

Neil Coffey (United Kingdom; French to English) | PractiCount contained a trojan according to Kaspersky | Sep 1, 2011
Just to warn people: I just downloaded PractiCount from the abovementioned web site and according to Kaspersky it contained a trojan.

Improving your PDF word count | Nov 4, 2011
I am not a translator, but I do prepare documents in Adobe InDesign from which PDFs are created and then sent to our translation bureau for translation. Disagreements over word counts prompted me to seek out more reliable methods of counting words.
As others have commented, you can copy text from a PDF and then paste it into Word for an accurate word count. If you are using Adobe Reader, go to the "View" drop-down menu, select the "Page display" option, and then the "Enable scrolling" option from that drop-down menu (this ensures that the entire PDF is selected when you copy).
Hit Ctrl+A, then Ctrl+C (in Windows), to select the entire PDF and copy it. Open Microsoft Word and hit Ctrl+V to paste. In my experience, any text (including text in images) should get picked up and pasted. Scroll through the document to check for anomalies. The letter combinations "fi" and "fl" often cause words to be split in two (a known problem caused by ligatures, where the letter pair is stored as a single glyph). A search and replace sequence will take care of this.
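For anyone who would rather script the ligature clean-up than run search and replace by hand, here is a minimal Python sketch. It assumes the pasted text contains the Unicode ligature characters (such as U+FB01 for "fi") followed by a stray space; the function names are made up for illustration, and the rejoin step is only a heuristic that can wrongly merge two words when the first one genuinely ends in a ligature.

```python
import re
import unicodedata

# Unicode "Alphabetic Presentation Forms" ligatures (ff, fi, fl, ffi, ffl, ...)
LIGATURES = "\ufb00-\ufb06"


def rejoin_after_ligature(text: str) -> str:
    """Heuristic: drop spaces/tabs inserted right after a ligature character."""
    return re.sub(rf"([{LIGATURES}])[ \t]+(?=[a-z])", r"\1", text)


def clean_pasted_text(text: str) -> str:
    """Rejoin split words, then expand ligatures into plain letters.

    NFKC normalization also rewrites other compatibility characters
    (for example non-breaking spaces), which is usually harmless for counting.
    """
    return unicodedata.normalize("NFKC", rejoin_after_ligature(text))


def count_words(text: str) -> int:
    """Count whitespace-separated tokens, roughly what a word processor reports."""
    return len(text.split())


pasted = "The of\ufb01 ce \ufb01 le"           # "office file" split at the ligatures
print(count_words(pasted))                     # 5
print(count_words(clean_pasted_text(pasted)))  # 3
```

It is still worth scrolling through the cleaned text afterwards, since the heuristic cannot tell a stray space from a real one.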
If you have Adobe Acrobat Pro (as I do), you have an added tool called the Redaction Tool, which lets you block out text that you do not want counted. This tool was intended to black out confidential information in a document, but in doing so it also removes those text blocks from the word count. This is important if you want to exclude things like numbers in tables, repetitive headers/footers, repetitive table row stubs/column headers, etc.
To use the tool, open the "Advanced" drop-down menu, then select "Redaction", and you will see a number of tools under that menu (you can drag this toolbar to your screen). Use the "Mark for redactions" button to go through and draw red rectangles around text that you want redacted (Note: make sure you have crosshairs showing on things like tables or images, so that you are able to select and block the text). You can create boxes anywhere, so if you have text that wraps around objects, just use extra boxes to mark different sections of the text.
When you've finished "marking for redaction", hit the "Apply Redactions" button and everything in the boxes will turn black.
If you now copy and paste into Word, none of the text in the black boxes will be there, so it will not be counted.
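For those without Acrobat Pro, a similar kind of filtering can be roughly approximated in a script. To be clear, this is not the Redaction Tool, just a Python sketch that assumes you already have the text of each page as a string. It skips tokens that look like plain numbers and counts a line that recurs verbatim on several pages (a likely header or footer) only once. The function names and the `repeat_threshold` parameter are invented for illustration.

```python
import re
from collections import Counter


def is_number(token: str) -> bool:
    """True for tokens such as 12, 1,200, 3.5 or 40%."""
    return re.fullmatch(r"[+\-]?\d[\d.,]*%?", token) is not None


def count_translatable_words(pages: list[str], repeat_threshold: int = 2) -> int:
    # Find lines that appear on at least `repeat_threshold` different pages.
    pages_per_line = Counter()
    for page in pages:
        unique_lines = {line.strip() for line in page.splitlines() if line.strip()}
        pages_per_line.update(unique_lines)
    repeated = {line for line, n in pages_per_line.items() if n >= repeat_threshold}

    total = 0
    already_counted = set()
    for page in pages:
        for line in page.splitlines():
            line = line.strip()
            if not line:
                continue
            if line in repeated:
                if line in already_counted:
                    continue              # repeated header/footer: skip the repeats
                already_counted.add(line)  # but count its first occurrence once
            total += sum(1 for tok in line.split() if not is_number(tok))
    return total


pages = [
    "Annual Report\nSales rose 12 percent\n1,200 3,400",
    "Annual Report\nCosts fell sharply",
]
print(count_translatable_words(pages))  # 8
```

Whether repeated headers should be counted once or not at all is something to agree with the client; the sketch counts them once.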
Adobe Acrobat Pro is expensive (I need it for many other purposes), but it does save time and money when sending PDF documents to translation.

Tony M (France; French to English) | A different experience | Nov 4, 2011
I had a problem with this recently, where a client who had failed to agree in advance the wordcount for a PDF job complained at the invoicing stage that I had charged for more words than there should have been.
I tried 3 different ways of counting the words:
1) 'Select all' text and copy to Word
2) OCR using Abbyy Finereader and output as 'editable text'
3) OCR using Abbyy Finereader and output as RTF
On a document of around 3000 words, the spread of results was nigh on 20%, and this could not be accounted for just by the relatively small number of words included within images — which were counted in the OCR versions but not, of course, in the direct copy.
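To put numbers on this kind of disagreement, here is a small Python sketch. The three counts are made up for illustration (they are not the actual figures from this job), and "spread" is taken here to mean the gap between the highest and lowest count relative to the lowest, which is only one possible definition.

```python
def relative_spread(counts: list[int]) -> float:
    """Gap between the highest and lowest count, as a fraction of the lowest."""
    lowest, highest = min(counts), max(counts)
    return (highest - lowest) / lowest


def split_the_difference(counts: list[int]) -> int:
    """Midpoint between the lowest and highest count, as a compromise figure."""
    return round((min(counts) + max(counts)) / 2)


# Hypothetical results for: copy to Word, OCR as editable text, OCR as RTF
counts = [2800, 3100, 3350]
print(f"{relative_spread(counts):.1%}")  # 19.6%
print(split_the_difference(counts))      # 3075
```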
In the end, we split the difference and agreed on a compromise figure, but I have warned this customer that next time they send me a PDF file, they must either furnish the extracted document with it, or accept my wordcount.

Catherine Muir (Australia; Indonesian to English) | Use ABBYY FineReader to count words in image-based PDF | Aug 27, 2012
I just received a request for a quote to translate 6 PDFs, a total of 67 pages. The quality of the images was poor. Nonetheless, using ABBYY FineReader 11, in just a few minutes I was able to produce a ROUGH .docx file and then run the word count from within MS Word, resulting in a ROUGH count of 16,175 words.
I transmitted the file to the agency that requested the quote, with my basic per source word quote, including an element for conversion of the image-based PDF to a usable Word document, along with my estimated turnaround time and payment terms, and suggested they come back to me to negotiate if interested.
If I get the job, I will do a proper conversion and validation in FineReader 11, create a .doc file, run CodeZapper on it and then import it into my DVX2 Pro TenT program for translation. (I use .doc files because DVX2 Pro seems to work better with them than with .docx files.)
Many agencies have no concept whatsoever of what it takes to convert a PDF made from a bad photocopy of a document into a professional translation. It takes a lot of time, not like putting a dime in a jukebox and out pops a song!
For those who might be considering using an online program to provide a word count, be careful: confidentiality might be the victim.
Cheers and best wishes for the upcoming change of season!
Catherine Muir
Freelance Translator, Indonesian>English
Mildura VIC Australia

Catherine Muir (Australia; Indonesian to English) | word count feature in DVX2 Pro | Jan 8, 2013
Another way to do it, for those translators using DVX2 Pro, is built into the program. Of course, you have to have a good, clean file to start with, either a verified conversion of a PDF (I use ABBYY FineReader 11 for this stage) or a DOC/DOCX that has been corrected to remove extra spaces, etc. DVX2 Pro will produce a detailed report in either an MS Word-like count or a DVX count (so far I haven't found them to differ), which can be saved as an RTF and provided to a client to verify your word count.
FWIW, I downloaded a trial version of Adobe Acrobat XI, which supposedly had a more direct route to go from an image-based PDF (such as a crappy photocopy that had been scanned and emailed) to a Word document, but I found it no better than my FineReader 11, so I didn't buy it when the trial period ended. (You'd think the creator of the PDF would be able to provide good software to unravel a PDF, wouldn't you???)
Also FWIW, the word count I get in DVX2 Pro has proven to be less than what the client has estimated, so they're happy. As the Indonesian saying goes, "Asal Bapak senang." (Meaning, "As long as the boss is happy...")
Cheers,
Catherine Muir

Counting words in PDF documents | Jul 14, 2024
Hi suesimons,
Counting words in PDF documents can vary depending on whether the text is selectable or if it's an image-based PDF. Here are a few methods commonly used by translators and writers:
Selectable Text PDFs: If you can select and copy text from your PDF:
Open the PDF in a viewer.
Select all (Ctrl+A), copy (Ctrl+C), and paste (Ctrl+V) the text into a word processor like Microsoft Word.
Use Word's word count feature to get the total number of words.
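The copy-and-paste steps above can also be scripted. The following is a minimal Python sketch, assuming the third-party pypdf package is installed (`pip install pypdf`) and that the PDF has a real text layer. The file name in the usage comment is hypothetical, and the result will not necessarily match Word's count exactly, because Word and this script split words slightly differently.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def pdf_word_count(path: str) -> int:
    """Extract the text layer of every page and count the words."""
    # Imported here so count_words() works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty string for pages without a text layer.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return count_words(text)


# Usage (hypothetical file name):
# print(pdf_word_count("source_document.pdf"))
```

A result of zero, or close to it, usually means the PDF is image-based and needs OCR first.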
Image-based PDFs: For PDFs where text is scanned and embedded as images:
Use OCR (Optical Character Recognition) software to extract text. Programs like Abbyy FineReader or Adobe Acrobat's built-in OCR can help convert images to selectable text.
Once converted, you can proceed with copying the text to a word processor and counting the words.
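As a rough scripted alternative to the commercial OCR programs named above, the same two steps can be sketched in Python with the open-source Tesseract engine. This is a swapped-in tool, not ABBYY FineReader or Acrobat. It assumes the pytesseract and pdf2image packages are installed along with the Tesseract and Poppler programs they rely on, and the `dpi` and `lang` defaults are illustrative choices. As with any OCR, treat the result as a rough count, especially for poor scans.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def ocr_pdf_word_count(path: str, dpi: int = 300, lang: str = "eng") -> int:
    """Render each page to an image, OCR it, and count the recognized words."""
    # Third-party packages, imported here so count_words() works without them.
    from pdf2image import convert_from_path
    import pytesseract

    page_images = convert_from_path(path, dpi=dpi)
    text = "\n".join(
        pytesseract.image_to_string(image, lang=lang) for image in page_images
    )
    return count_words(text)


# Usage (hypothetical file name):
# print(ocr_pdf_word_count("scanned_contract.pdf"))
```

Because this runs locally, the document never leaves your machine, which matters when confidentiality is a concern.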
Online Tools: Alternatively, you can use online tools designed to count characters and words in text. You can find a useful tool here to help you with this process.
Each method has its advantages depending on the complexity and format of the PDF. Feel free to explore these options based on your specific needs.
Hope this helps!