What is the best way to translate a PDF document?
Thread poster: André Moreira
André Moreira
Austria
Member (2019)
English to Portuguese
Oct 18, 2019

Greetings,
I realize that PDF documents can be more troublesome to finish in the later stages of structure and format editing, since the exported text often, if not always, comes with imperfections and a lack of precision in paragraph positioning, fonts, image positioning, lines, etc.
What is the best way to avoid these problems, if there is one?
What is the best way to translate a PDF?
Thanks.

[Edited at 2019-10-18 22:16 GMT]


 
Stepan Konev
Russian Federation
English to Russian
Tidy up the converted file Oct 18, 2019

1. OCR the PDF file.
2. Copy all and paste into Notepad.
3. Copy all in Notepad and paste back into MS Word.
4. Apply formatting and insert all pictures as per the original layout.
Steps 2 and 3 relieve you of the formatting-related headache (tag soup).
That's it.
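
For anyone who would rather script steps 2 and 3 than go through Notepad, here is a minimal sketch in Python. It assumes the OCR output was saved as a .docx file, and the function name `docx_to_plain_text` is made up for this example. It only keeps the paragraph text and throws away all run-level formatting, which is roughly what the Notepad round trip does. It ignores tabs, manual line breaks and text boxes, so treat it as a starting point rather than a finished tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_plain_text(docx_bytes: bytes) -> str:
    """Return the paragraphs of a .docx as plain text, one paragraph per line.

    Hypothetical helper: it drops fonts, styles and the run fragmentation
    ("tag soup") that OCR tools tend to produce, keeping only the characters.
    """
    # A .docx file is a zip archive; the main text lives in word/document.xml
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every run in the paragraph, ignoring run properties
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

You would read the file with `open("ocr_output.docx", "rb").read()` (the file name is just an example), pass the bytes to the function, and paste the result into a clean Word document before doing step 4.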


Vadim Kadyrov
Ukraine
English to Russian
The best way Oct 19, 2019

is to never touch a PDF file for translation, since this format was never intended to be edited, translated, or manipulated in any way. The format was invented to provide documents and information 'as is', for reference, etc.

Still, the only way is to try to OCR the PDF file and then use a CAT tool to translate the resulting doc file.
Of course, you will have to adjust the formatting, for an extra fee, of course.

But this is a workaround, an imperfect way to do the impossible, i.e. to try to translate something that was never intended to be translated, only to be given as a reference.


Achmad Fuad Lubis
Indonesia
English to Indonesian
Using Adobe Acrobat XI to convert a PDF document to other formats for translation Oct 21, 2019

Try this link for help: https://helpx.adobe.com/acrobat/11/using/exporting-pdfs-file-formats.html

 
Hamish Young
New Zealand
Chinese to English
Use OCR software Oct 22, 2019

Almost everything I work on comes in PDF format, and I like it that way. However, clients seldom expect the format of the target text to match the source exactly. I would first check whether this is actually a requirement of the job. If so, it is quite normal to add an extra charge to cover the time spent on formatting. In particular, if the PDF file contains unformattable text that you find difficult to work with, you should run the file through OCR software to extract the text. This will also help with formatting, since many OCR tools can reproduce the format of a PDF almost perfectly in Word. A good OCR tool is more useful than a CAT tool for PDF files.


Dan Lucas
United Kingdom
Member (2014)
Japanese to English
No need to choose one or the other Oct 22, 2019

Hamish Young wrote:
A good OCR tool is more useful than a CAT tool for PDF files.

I think I understand your sentiment, but they are not mutually exclusive.

If I use FineReader to perform OCR on a document, that doesn't mean that I cannot or should not use a CAT tool on the resulting (typically Word) file. They are equally useful.

Sometimes I cannot obtain a readable document for my CAT tool without OCR, but by the same token I wouldn't attempt a document of any size without my CAT tool even if a readable version of the source file were available.

Regards,
Dan


Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Assume the worst, hope for the best Oct 23, 2019

André Moreira wrote:
I realize that PDF documents can be more troublesome to finish in the later stages of structure and format editing, since the exported text often comes with imperfections and a lack of precision in paragraph positioning, fonts, image positioning, lines, etc. What is the best way to avoid these problems, if there is one?


If you intend to recreate the entire file manually, then it's a matter of preference whether you want to recreate it in the source language first, and then translate it, or first translate it, and then recreate it in the target language.

If you intend to use e.g. OCR and simply fix the formatting errors, then I recommend that you fix those errors before you start translating, to ensure that the text is suitable for translating in a CAT tool. If you choose to "finish in the later stages of structure and format editing" only after the translation is done, then (a) the text won't be CAT tool friendly while you translate it and (b) it will be difficult to schedule your time, since you can't predict how long it is going to take to fix those errors. If you fix the errors before you start translating, then it's easier to determine halfway through the project how long the project will take to complete.
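
To illustrate the scheduling point, here is a rough sketch of the kind of halfway estimate that becomes possible once the formatting has already been fixed. It assumes translation speed stays roughly constant for the rest of the job, which is only reasonable when no unpredictable formatting work is still waiting at the end. The function name and the figures are invented for the example.

```python
def projected_total_hours(hours_spent: float, words_done: int, words_total: int) -> float:
    """Linearly extrapolate the total project time from progress so far.

    Assumption: the remaining words take the same time per word as the
    words already translated (no formatting clean-up left for the end).
    """
    if words_done <= 0 or words_done > words_total:
        raise ValueError("words_done must be between 1 and words_total")
    return hours_spent * words_total / words_done


# Example: 6 hours spent on the first 3,000 words of a 6,000-word file
estimate = projected_total_hours(6.0, 3000, 6000)
remaining = estimate - 6.0
```

If formatting repairs are postponed until after translation, this kind of estimate leaves out an unknown amount of work, which is exactly problem (b).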

When you get a PDF file, you should assume the worst: assume that you'll need to recreate everything from scratch. But then try the various ways to speed up the process, e.g. if you have OCR or PDF conversion tools, try them out to see if they produce usable results.

If the text can't be copied (i.e. it is non-editable), then sometimes you can OCR it, but sometimes it's faster to just type it yourself or give it to a typist to type for you. Often, typists are also able to take care of much of the formatting as part of the typing service.


Kyle Corbitt
United States
Spanish to English
OCR/MT combined tool May 30, 2023

I realize this thread is years old, but for people who find it via search, as I did, I just wanted to mention that I had this exact problem and decided to build a tool to make it easier. Basically, the tool I built lets you upload a scanned document; it then uses OCR to find all the text fields and replaces them with editable text boxes populated using MT. This saves a ton of time on the initial prep/formatting work and lets you focus just on fixing any errors in the translations.

You can find it at https://translato.ai/ and I'd really appreciate any feedback on how I can make it better!


 
Matthias Brombach
Germany
Member (2007)
Dutch to German
Trados Studio May 31, 2023

I assume you have to deal with a "dead" PDF (none of the text can be copied, as described further above)? If it's not dead, you can open it in Trados Studio and later save it as an MS Word file.

Kyle Corbitt
United States
Spanish to English
Dead PDF May 31, 2023

Matthias Brombach wrote:

I assume you have to deal with a "dead" PDF (none of the text can be copied, as described further above)? If it's not dead, you can open it in Trados Studio and later save it as an MS Word file.


Yes, exactly. I'm talking about PDFs that come from a scan, not the (relatively easier to manage) PDFs that contain electronically readable text.


 
Michael Beijer
United Kingdom
Member (2009)
Dutch to English
ABBYY FineReader does an amazing job Jun 2, 2023

1. ABBYY FineReader PDF 15 (convert to editable .docx)
2. translate in memoQ

… works very well.


Tom in London
United Kingdom
Member (2008)
Italian to English
the best way Jun 2, 2023

André Moreira wrote:

What is the best way to translate a PDF?


The best way is to tell your client politely that before you can even begin the translation you will need the document in Word, and to give them two options:

1. They provide the Word conversion.
2. You produce the conversion yourself, but this will require additional time and there will be an additional time-based charge at €XXX per hour.


Josep Vives (X)
Spain
English to Spanish
Ahh, PDF, the jewel of the translation crown Jun 2, 2023

Funny facts:

1. PDF is a format intended to protect information and avoid all sorts of manipulation, i.e. copying and translating a text (true story!)
2. The Internet is full of apps that will let you convert a PDF into a Word file for a small fee (I've usually seen 50 cents per file)
3. Back in the day there was a tool called "ABBYY PDF converter" that cost around 50 EUR. It worked so-so and, as usual, it was useless when the PDF was handwritten, a scan, or an image of poor quality (i.e. the letters could not be recognized clearly by the software), etc. I used to own a license for that program, but I've lost the number and they don't have any client support as far as I know.

The handwritten/image-scan PDF still happens today and, not surprisingly, not everybody will agree to pay for the extra work of processing those (they can only be processed manually). That's one of the few cases where the per-word rate works against you.

My advice to you is: Have a lovely day while translating your PDFs! (as in The Sweetest Thing)


 
Dan Lucas
United Kingdom
Member (2014)
Japanese to English
Is this ChatGPT-generated spam? Oct 7, 2023

John Smith wrote:
Stuff

Maybe mods can look at deleting it...

Dan

