MemoQ scrambling texts?
Thread poster: psicutrinius
psicutrinius  Identity Verified
Spain
Local time: 14:08
Member (2008)
Spanish to English
+ ...
Feb 10, 2018

Hello all.

I have been working on quite a long translation (over 10 MB, so no preview was available), first in Word 2007 format (I asked the client to try that in the first place). I found that, whenever there was a table, memoQ did not seem to recognise the cells: it segmented the text horizontally, taking the first line of every cell in a row, instead of finishing one cell line by line and then moving on to the next. Also, the page layout went at least slightly awry from the beginning (and since the drift is cumulative over 200 pages, you can imagine what the last 30 translated pages looked like).

There were quite a number of these tables, so the additional work was considerable (and, which was the real problem, it meant overrunning the agreed delivery time). I therefore asked for the PDF (a PDF/A, thus editable) and tried to use the TM on it, but, first, the segmentation had changed completely and, second, the problem with the cells persisted.

Any solutions? Or, at the very least, any mitigations? Plus, of course, I would love to know the reasons. If this is due to (for instance) the original being a series of cut-and-pasted Word texts from different authors, as is the case for most technical manuals, possibly in different versions of the software, or with updates "grafted in" the same way, all of which are then converted to PDF, I would like to know how to spot it.

Thanks in advance.


 
Kevin Fulton  Identity Verified
United States
Local time: 09:08
German to English
DTP issue Feb 10, 2018

I had a similar problem a few months ago. I received a PDF of a brochure from which I easily (or so I thought) extracted the text to MS Word. There were tables and bullet lists that got all mixed up. Basically, I had to cut and paste to reassemble the text. I've encountered this before on a much smaller scale (2 pages). A colleague suggested that the PDF was created from a document that had been assembled using a desktop publishing program rather than from an MS Word document.

 
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:08
Member (2005)
English to Spanish
+ ...
Incorrect preparation Feb 11, 2018

This sounds like a Word file made by saving a PDF file coming from some other application, not a native Word file.

If you examine the source Word 2007 file you have, can you use the arrow keys on the keyboard to continue from the end of a line in a table cell to the next line, or is each line an individual object? The latter would be the case if the file was saved from a PDF without any proper OCR process. You will get a clearer picture of the true structure of the Word file if you enable the Show All icon in the ribbon to reveal hidden characters.
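For a 200-page file, the same check can be roughly automated. What follows is only a minimal sketch, not a memoQ or Word feature: it assumes the file is a .docx (a ZIP archive whose main part is word/document.xml), and the function names and the "no closing punctuation" heuristic are my own illustrative assumptions. It counts, per table cell, the paragraphs that look like a sentence cut off at a line end, and counts text boxes, which PDF converters tend to produce in large numbers.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
SENTENCE_END = (".", ":", ";", "?", "!")


def inspect_document_xml(xml_bytes):
    """Return (per-cell reports, text box count) for a document.xml payload."""
    root = ET.fromstring(xml_bytes)
    reports = []
    for cell in root.iter(f"{{{W}}}tc"):
        # Only paragraphs that are direct children of this cell
        paragraphs = cell.findall("w:p", NS)
        texts = [
            "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            for p in paragraphs
        ]
        texts = [t for t in texts if t.strip()]
        # Heuristic (assumption): a non-final paragraph without closing
        # punctuation is probably one sentence split across several lines.
        suspect = sum(
            1 for t in texts[:-1] if not t.rstrip().endswith(SENTENCE_END)
        )
        reports.append({"paragraphs": len(texts), "suspect_breaks": suspect})
    textboxes = sum(1 for _ in root.iter(f"{{{W}}}txbxContent"))
    return reports, textboxes


def inspect_docx(path):
    """Open a .docx file and inspect its main document part."""
    with zipfile.ZipFile(path) as archive:
        return inspect_document_xml(archive.read("word/document.xml"))
```

If most cells report several suspect breaks, or the text box count is high, the file was probably converted from a PDF rather than written in Word. The heuristic will misfire on cells that legitimately hold short unpunctuated items, so treat the numbers as a hint only.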

Possible solutions are, if the actual source file is indeed a PDF:
- If the PDF is an object-based file, i.e. you can select the text using Acrobat Reader: Use software like Iceni Infix to produce a proper interpretation of the contents of the PDF. Infix detects the tables and creates full sentences in each cell, as it should be. Once Infix has processed the PDF file, you can produce an XML file you can import into memoQ, translate, and then reimport with Infix into the PDF again. With some additional formatting work in Infix, the result is relatively OK.

- If the PDF is a scanned document: Use an OCR tool like ABBYY FineReader to produce a Word file that resembles the PDF document more closely, although this is still the less desirable situation and will not produce a clean final document anyway.

I hope to have helped a little bit!