Problems with target document export (json, xlsx)
Thread poster: Elisabeth Richard
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
Jan 20, 2020

Hi,

I have two files to export, a JSON file and an XLSX file, and each has a problem on export (though not the same one).
Please note this was my first project using CafeTran, so I might have unknowingly caused this, although I have no idea how, nor how to fix it.

the xlsx file:
Most of the spaces between segments have been replaced by characters: usually a single letter, sometimes a number, and sometimes a couple of characters and a space. These characters only show in the exported Excel document, not in CafeTran. Since this is a 20k-word file, fixing this by hand is extremely time-consuming and I was hoping for a better solution.

the json file:
The last 600 or so segments of the document are missing when I open the file in Excel. When I try to open it in Mozilla, it doesn't work, and I get a message saying there is a formatting error somewhere, probably a missing comma. At first, the error message in Mozilla helped me identify and fix a missing comma, but then I got another one pointing to a mistake elsewhere. This time I was unable to work out where the error was, because the error message speaks about columns, and I don't have any columns in my documents. In any case, I don't know whether the two problems are connected.
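
A note on the "columns" in that kind of message: JSON parsers generally report an error position as a line number plus a column, where the column is the character position within that line, not a spreadsheet column. Here is a minimal Python sketch of locating such an error, assuming the exported file can be read as text. The sample string and the function name `locate_json_error` are made up for illustration and have nothing to do with CafeTran itself.

```python
import json


def locate_json_error(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based; colno counts characters within the line.
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical sample: the comma between the two entries is missing.
sample = '{\n  "title": "Bonjour"\n  "body": "Texte"\n}'
print(locate_json_error(sample))
```

The parser stops at the first error only, so after each fix the check has to be run again until it returns `None`.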

For the JSON file, I've found a workaround: the customer asked me to send him the file in TXT format instead, which I was able to do by copy-pasting my translation from a bilingual file. So this problem is now less urgent, but I would still really like to be able to send the file to my client in the correct format.

Thanks in advance for your help.


 
Jean Dimitriadis
Jean Dimitriadis  Identity Verified
English to French
+ ...
  Jan 20, 2020

Hello Elisabeth,

I usually don't recommend learning the ropes while working on deadline sensitive projects.

As just another CafeTran user, here are a few pointers. Please excuse my brevity, as something else requires my attention.

TL;DR: When such file issues are involved, it is probably better to use the support ticket route, as you can securely and privately attach any relevant file or screenshot, or export the whole CafeTran package, and share/discuss it with the developer. To export a package to share, use Project > Export and Exchange > To Package.

That said here are some thoughts:

- JSON files cannot be opened in Excel, unless JSON content was added in an Excel spreadsheet. A text editor can be used to open it.

- To my knowledge, CafeTran does not offer a file filter for JSON files. Which filter type did you use during project creation?

- For Excel files (XLSX), I assume you haven't used the filter options (see https://github.com/idimitriadis0/TheCafeTranFiles/wiki/4-File-formats#ms-excel-xlsx and https://github.com/idimitriadis0/TheCafeTranFiles/wiki/2-Menu-and-Interface#filter-options-tab ). In that case, the entire Excel content has been imported. Was there a specific column to be translated? If you do use filter options, be extra careful, however, as wrong settings can prevent export or produce unwanted results.

- For Excel files, I've found that the "Sentence" segmentation, which is fine in many cases, is not the best for this file type. To test the segmentation, you can apply a different segmentation setting, such as Paragraph or Document, in Preferences > General (reachable via the Edit menu or the Dashboard button) and then create a new project. I am wondering whether this may be causing problems at export. The Excel export issue you describe does not "speak" to me as such.

- If you have already translated all the content, you should be able to reuse the Project TM or export the project segments to a TMX for reuse, in case you determine that you have used incorrect settings at project creation. To export a memory: Project > Export > To TMX memory. Then, in a new project with the TMX memory attached, Translate > Insert all exact matches.

- Can you describe how you have created the project? Did you create a multiple document project? Two separate projects? Added the two files individually?
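
Before relying on an exported TMX to repopulate a new project, it can help to check what it actually contains. The following is a minimal Python sketch, assuming the export is a standard TMX file with `tu`, `tuv` and `seg` elements. The sample content, the language codes and the function names are hypothetical, and the layout of a real CafeTran export may differ in its details.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _pick(segments, lang):
    """Return the text whose language code starts with lang ('sv' matches 'sv-se')."""
    for code, text in segments.items():
        if code.startswith(lang.lower()):
            return text
    return None


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) segment pairs found in TMX text."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text around inline tags.
                segments[code] = "".join(seg.itertext())
        source, target = _pick(segments, source_lang), _pick(segments, target_lang)
        if source is not None and target is not None:
            pairs.append((source, target))
    return pairs


# Hypothetical two-language sample standing in for an exported memory.
sample = """<tmx version="1.4">
  <header srclang="sv-SE"/>
  <body>
    <tu>
      <tuv xml:lang="sv-SE"><seg>Hej</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>"""
print(read_tmx_pairs(sample, "sv", "fr"))
```

Comparing the number of pairs with the segment count of the project gives a quick indication of whether the memory is complete.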

Official support documentation: https://cafetran.freshdesk.com/support/solutions (the above shared links are also referenced).


 
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
TOPIC STARTER
some answers Jan 20, 2020

Thanks Jean

- The project was not particularly time-sensitive; I've actually been working on it for about a month. It probably would have been better to test CafeTran with a smaller project, but this was what I had at the time and overall, I'm very happy with the software. It's just extremely uncomfortable when you tell your client you're ready to deliver and it turns out you can't. I've been fiddling with this since Friday to no avail.

- Also, for an unknown reason, I'm unable to connect to the CafeTran website or to reset my password, so I couldn't create a ticket. Sorry about that.

- This json file opens in Excel. It contains only one column and the source document shows all the rows, but the target document doesn't.

- I did not use any filter for the json file, and as I said, the content is on one column.

- I didn't use any filter options for the excel file either, no.

- I'll remember not to use the sentence segmentation next time, thanks.

- Thanks for the info on exporting the TM, and starting over. I might just try that

- I created the project as a single document (excel) project as the client sent the rest much later. Then I added more documents (a couple excel and the one json) to this same project one by one.

I want to mention the other excel documents turned out just fine. The one I'm having a problem with was the first (and largest) one.

Come to think of it, I might have an idea what could have caused this. I'm not sure why, but at some point, the source language file was missing. This initially prevented me from exporting the target document, of course. So after checking the forums for the same kind of issues, I tried copy pasting the source document into the project folder again and manually adding the source language extension to it. At first I thought it had worked just fine, until I spellchecked the excel file and saw all the extra characters. Btw, I don't know if this information is useful or not, but the extra characters seem to only include characters from my source language.


 
Jean Dimitriadis
Jean Dimitriadis  Identity Verified
English to French
+ ...
Suggestion Jan 20, 2020

Elisabeth Richard wrote:

[…]

Come to think of it, I might have an idea what could have caused this. I'm not sure why, but at some point, the source language file was missing. This initially prevented me from exporting the target document, of course. So after checking the forums for the same kind of issues, I tried copy pasting the source document into the project folder again and manually adding the source language extension to it. At first I thought it had worked just fine, until I spellchecked the excel file and saw all the extra characters. Btw, I don't know if this information is useful or not, but the extra characters seem to only include characters from my source language.


Thank you for your answer and for sharing this information. Yes, that might actually explain a few things.

When you create a native project in CafeTran, it creates a folder, where it copies the source file(s), creates the bilingual XLIFF which will be used for the translation after applying segmentation settings, along with any enabled ProjectTM or ProjectTerms, etc.

The copied source files are crucial for CafeTran to successfully export from XLF back to the native format. They should be left untouched/unedited.

My guess is that the issue comes from there.

---

So here is my suggestion:

- As I understand it, only one Excel file stands between you and calling this project "finished/delivered": the one that is currently being exported incorrectly. In that case, it makes sense to work from this project directly.
- First, please consider backing up the relevant project folder that CafeTran has created. This will help revert to a previous state if needed.
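
The backup step can of course be done by hand in the file manager. For those who prefer to script it, here is a minimal Python sketch, assuming the project lives in an ordinary folder on disk. The function name and the backup naming scheme are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_project_folder(project_dir):
    """Copy project_dir to a timestamped sibling folder and return its path."""
    source = Path(project_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    destination = source.with_name(f"{source.name}-backup-{stamp}")
    # copytree refuses to overwrite, so an existing backup is never clobbered.
    shutil.copytree(source, destination)
    return destination
```

The copy is made next to the original folder, so the original project is left untouched and can simply be swapped back in if something goes wrong.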

Preliminaries (rationale: making sure settings will not interfere with the "actual action", see below):
- Go to Preferences > Auto-propagation and disable all these settings. You can enable them back afterwards. Explanation for what each setting does: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/1-Preferences#auto-propagation
- Go to Preferences > Auto-assembling, and switch off all settings, especially Fuzzy match auto-correction. You can enable them back afterwards. Explanation for what each setting does: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/1-Preferences#auto-assembling
- With the project open, do you have a translation memory open? The ProjectTM perhaps? If so, go to Memory > Import segments from project and select the ProjectTM. This will ensure the ProjectTM is updated with the latest version of your translated segments (assuming the TM options are set to Keep newer duplicates, which is the default). [Note: If you do not have a ProjectTM, export the TM as explained in my previous message and open it via the Memory menu. The exported TM should already have all project segments].

Actual action:
- Add the same Excel file again to your project, as you did before, renaming the file slightly to make sure you recognize which is which.
- Since auto-propagation is disabled, no same segments should be propagated.
- Normally, your newly imported file has no or few translations, only the segmented source segments. To add your translations, use Translate > Insert all exact matches. Since segmentation settings have not changed, all segments should have exact segment matches, so everything should be populated with Context Matches (CM) and/or Exact Matches [this is why using the TM is better than auto-propagation to populate existing translations in such a case].
- Go through the imported file to make sure everything is OK, and export the target file. Review it as well for completeness, etc.
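
To illustrate why unchanged segmentation matters for this procedure, here is a conceptual Python sketch of populating targets by exact source match. It is not CafeTran's implementation, just the general idea, and the function name and sample data are hypothetical.

```python
def insert_exact_matches(source_segments, memory):
    """Return (targets, missing).

    targets holds one entry per source segment, with None where the memory
    has no identical source text; missing lists the indexes of those gaps.
    """
    targets = [memory.get(segment) for segment in source_segments]
    missing = [i for i, target in enumerate(targets) if target is None]
    return targets, missing


# Hypothetical memory and freshly segmented source file.
memory = {"Hej.": "Bonjour.", "Tack.": "Merci."}
targets, missing = insert_exact_matches(["Hej.", "Hej då.", "Tack."], memory)
print(targets, missing)
```

Any segment whose source text differs from what is stored in the memory, for example because segments were merged or split by hand in the first pass, comes back empty and has to be handled separately.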

Do let me know if this worked! If not, I hope the developer can chime in, or respond to your direct support ticket email.

Jean

[Edited at 2020-01-20 18:57 GMT]


 
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
TOPIC STARTER
in theory, yes Jan 20, 2020

Thank you so much for taking the time to reply in such a detailed way Jean.

So I've done exactly what you suggested and in theory, it absolutely works and at first I was thrilled! But then I realized that since I have done such heavy rewriting of the source text, starting over is definitely not a good idea for me. I think I'm just going to have to erase all those characters manually directly from the excel file!


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
What does the file look like? Jan 21, 2020

Elisabeth Richard wrote:

I think I'm just going to have to erase all those characters manually directly from the excel file!


Can you describe how the cells/file look? Perhaps you can post a screenshot (if necessary, with sensitive data masked)? Perhaps there is some way to fix this.


 
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
TOPIC STARTER
screenshot Jan 21, 2020

Thanks for your interest Hans,

I actually solved the problem by manually erasing all the extra characters, but since figuring out what could have gone wrong or how it could have been solved better might one day be useful to someone else, here is a screenshot of the excel file (I assumed that's the file you were asking about).
Basically, when a cell contains several sentences, there is often (but not always) one character (and sometimes several) replacing the space that should go between the period marking the end of a sentence and the capital letter marking the beginning of the next sentence. The target language is French, and the source language is Swedish. The extra characters are Swedish characters.

Capture d’écran 2020-01-21 à 10.23.01


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Thanks Jan 21, 2020

Thank you for the screenshot. I'm quite sure that the developer will have something to say about the cause and how to avoid it.

FYI:

If you can (and are allowed to) open the Excel file in LibreOffice, you can use regular expressions to search and replace.

E.g. a string like:
des dates.dComme ces

Can be found using:

Screenshot 2020-01-21 at 10.58.43

Note that the replacement expression is not correct (I'd have to dive into that). But I know for sure that automatic correction via these regular expressions is possible.
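
For readers who would rather script the cleanup, here is a minimal Python sketch of the same idea. It is not the expression from the screenshot. It assumes the pattern described in this thread: a sentence-ending mark, then one to three stray letters or digits (optionally followed by a space), directly before the capital letter of the next sentence. The name `remove_stray_characters` is made up for illustration.

```python
import re

# A sentence-ending mark, then 1 to 3 stray letters or digits and an optional
# space, immediately before the capital letter that starts the next sentence.
STRAY = re.compile(r"([.!?])[^\W_]{1,3} ?(?=[A-ZÀ-ÖØ-Þ])")


def remove_stray_characters(text):
    """Replace the stray characters with the single space they displaced."""
    return STRAY.sub(r"\1 ", text)


print(remove_stray_characters("des dates.dComme ces"))  # des dates. Comme ces
```

A pattern like this can also hit legitimate text, such as web addresses or codes written in capitals, so it is safer to review each change than to replace everything blindly.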

For (maybe) next time ...


 
Jean Dimitriadis
Jean Dimitriadis  Identity Verified
English to French
+ ...
  Jan 21, 2020

Hello Elisabeth,

I'm sorry to hear you had to resort to manual editing in the end.

However, I am not sure what you mean by "heavy rewriting of the source text".


 
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
TOPIC STARTER
Good question Jan 21, 2020

Hi again Jean,

The reason why the TM didn't work for me is that:

- There were many instances in which the segmentation was wrong: sentences that were cut in the middle because of abbreviations or because the dot is also used as a mathematical symbol (multiplier) in my target document. Going through the 20+k words to remerge all those was going to take me a long time anyway.

- But what's worse is that the client inadvertently duplicated some text: they'd have the same text in places where it was obvious you needed different texts. I informed my client of this as I went and they told me what the right text was meant to be in each instance, but without sending me a new source document. Going through all of our exchanges to make sure I didn't miss any such occurrence would have required too much time and concentration.

All in all, fixing this by hand took me less than 2 hours. I wasted over twice that trying to fix it in a more "efficient" way. Oh well.


 
Elisabeth Richard
Elisabeth Richard  Identity Verified
France
Local time: 23:34
Member (2018)
English to French
+ ...
TOPIC STARTER
Thanks Hans Jan 21, 2020

That seems very interesting, but I don't understand the code in the search and replace fields. Do you know where I can find some documentation on that?

 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Regular expressions Jan 21, 2020

Elisabeth Richard wrote:

That seems very interesting, but I don't understand the code in the search and replace fields. Do you know where I can find some documentation on that?


Tutorial: https://www.regular-expressions.info/tutorial.html

Tester: https://www.freeformatter.com/java-regex-tester.html

Practical Usage of Regular Expressions: An introduction to regexes for translators: https://www.amazon.com/Practical-Usage-Regular-Expressions-introduction/dp/1985752921


 
Jean Dimitriadis
Jean Dimitriadis  Identity Verified
English to French
+ ...
 Abbreviations and Segmentation Rules Jan 21, 2020

Ouch! I see…

Elisabeth Richard wrote:

[…]
There were many instances in which the segmentation was wrong: sentences that were cut in the middle because of abbreviations or because the dot is also used as a mathematical symbol (multiplier) in my target document. Going through the 20+k words to remerge all those was going to take me a long time anyway.


Well, I am not sure about those multipliers, but there are a few ways to minimize the segmentation issues you describe in future projects.

The following methods are NOT mutually exclusive, they can be combined:

1. Abbreviations. CafeTran offers a user-friendly way to deal with segmentation exceptions such as abbreviations, without messing with SRX segmentation rules. You can define abbreviations to improve segmentation of the source text. CafeTran joins segments at abbreviations automatically.

In a (newly created) project, you can use Resources > Abbreviations > Scan project for abbreviations and add any relevant item by simply clicking on it in the results tab.

At any point, you can also manually add abbreviations via Resources > Abbreviations > Add selection to abbreviations.

Please note that segmentation occurs during project creation. To apply these abbreviations, you can either replace the document or create a new project (etc.). Otherwise, in existing projects, when you go to a segment that breaks at a newly defined abbreviation, it is automatically merged with the next one at that point.
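
The effect of an abbreviation list on segmentation can be illustrated with a small Python sketch. This is only a conceptual model, not how CafeTran implements it. The Swedish abbreviations and the function name `segment` are chosen for illustration.

```python
import re

# Illustrative Swedish abbreviations; a real list would come from the project.
ABBREVIATIONS = ("t.ex.", "bl.a.", "dvs.", "osv.")


def segment(text, abbreviations=ABBREVIATIONS):
    """Split after ., ! or ? followed by whitespace, then rejoin any segment
    that ends with a known abbreviation to the segment after it."""
    known = tuple(abbreviations)
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    segments = []
    for piece in pieces:
        if segments and segments[-1].lower().endswith(known):
            segments[-1] = f"{segments[-1]} {piece}"
        else:
            segments.append(piece)
    return segments


text = "Vi säljer frukt, t.ex. äpplen. Välkommen!"
print(segment(text, ()))  # naive split: breaks after "t.ex."
print(segment(text))      # with the abbreviation list: two segments
```

One limitation of this simple model is that a sentence that genuinely ends with an abbreviation is also joined to the next one.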

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/2-Menu-and-Interface#resources--abbreviations-submenu

2. Segmentation rules (SRX). The other method consists of using an SRX file and defining segmentation rules (break rules and break exceptions) for a specific source language. This method is slightly more involved; however, many such abbreviations (and language-specific rules) are already set in readily available SRX files you can find online.

For example, here is what such rules look like for Swedish in the OmegaT SRX:

https://i.imgur.com/rYOTFeW.png
https://i.imgur.com/B4gWHel.png
https://i.imgur.com/DhbP1Am.png
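
As a rough text counterpart to break rules and break exceptions, here is a minimal Python sketch of ordered rules, each made of a "before" and an "after" pattern, where the first rule that matches at a position decides whether to break there. It only mimics the general idea behind SRX. The rules shown are illustrative and are not taken from the OmegaT or CafeTran SRX files.

```python
import re

# Each rule: (is_break, pattern_before_position, pattern_after_position).
# Rules are tried in order and the first one that matches decides, so the
# exception is listed before the general break rule.
RULES = [
    (False, r"\b(?:t\.ex|bl\.a|dvs|osv)\.", r"\s"),  # exception: no break here
    (True, r"[.!?]", r"\s+"),                        # break after ., ! or ?
]


def split_with_rules(text, rules=RULES):
    """Split text into segments using ordered break rules and exceptions."""
    compiled = [
        (is_break, re.compile(f"(?:{before})\\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # endpos=pos makes \Z anchor the "before" pattern at this position.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


print(split_with_rules("Vi har bl.a. kaffe och te. Välkommen in!"))
```

Putting the exception first is what keeps "bl.a." from ending a segment; with the break rule alone, the same text splits into three pieces.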

CafeTran ships with its own editable SRX file, which has very few rules, but you can also copy an external SRX file in the appropriate location (see link below), and just use that!

You can refer to the "Suggestion" in the following reference document link for Preferences > General > Segmentation: https://github.com/idimitriadis0/TheCafeTranFiles/wiki/1-Preferences#segmentation

When you use an SRX file in Preferences > General, CafeTran automatically recognizes the correct source language to use (and you can also set it manually).

Cheers!

Jean

[Edited at 2020-01-21 11:52 GMT]


 

