DGT translation memories
Thread starter: Dominique Pivard
Dominique Pivard
Local time: 21:46
Finnish to French
Feb 6, 2013

Old news already, but here is how to create DGT TM's in any language pair (23 EU languages available):

http://wordfast.fi/blog/cat-tools/2013/02/06/how-to-create-dgt-translation-memories/
or
http://youtu.be/GNj07W2ZqhQ?hd=1

The sample Finnish-Slovenian TMX used in the video has more than 2 million translation units (though probably lots of duplicates).
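To check how many of those units really are duplicates, a streaming count along the following lines might help. This is only a sketch, not part of the workflow in the linked post: the function names `tu_fingerprint` and `count_tus` are hypothetical, and "duplicate" is assumed to mean identical language codes and segment text (inline markup is ignored).

```python
import hashlib
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_fingerprint(tu):
    """Hash the (language, segment text) pairs of one translation unit."""
    parts = []
    for tuv in tu.iter("tuv"):
        # Older TMX versions use a plain "lang" attribute instead of xml:lang.
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        parts.append(lang + "\t" + text.strip())
    return hashlib.sha1("\n".join(sorted(parts)).encode("utf-8")).digest()


def count_tus(source):
    """Return (total units, duplicate units) for a TMX path or file object."""
    seen = set()
    total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tu":
            total += 1
            seen.add(tu_fingerprint(elem))
            elem.clear()  # drop the unit's children and text once counted
    return total, total - len(seen)
```

Calling `count_tus("fi-sl.tmx")` (the file name is just an example) returns a pair such as `(total, duplicates)`. Only 20-byte digests are kept per unique unit, so memory use should stay modest even for a couple of million units.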


 
Tomás Cano Binder, BA, CT
Spain
Local time: 20:46
Member (2005)
English to Spanish
2 million in English-Spanish Feb 6, 2013

Thanks a lot, Dominique! This is great information and I appreciate it. I downloaded the package and created an English-Spanish TM. It contains nearly 2 million segments as well. I also plan to create one for my other main pair, German-Spanish.

I am keeping this memory as background information for EU-related translations on our memoQ server here.

The extraction took just over an hour on my machine.


 
Dominique Pivard
Local time: 21:46
Finnish to French
TOPIC STARTER
memoQ Feb 6, 2013

Tomás Cano Binder, CT wrote:
The extraction took just over an hour on my machine.

You must have a much faster computer than the one I used (a three-year-old laptop with an AMD processor and 4 GB of RAM)!

Let me know how the import into memoQ goes, because when I tried, I wasn't able to complete it. Not a problem for me, because I'm searching the DGT (and other very large TM's) with dtSearch, but I think memoQ (and probably several other tools) may have problems dealing with TM's that big.


 
Meta Arkadia
Local time: 02:46
English to Indonesian
No problems, and problems Feb 6, 2013

Dominique Pivard wrote:
but I think memoQ (and probably several other tools) may have problems dealing with TM's that big.

I use the DGT for GER>DUT as one of three TMs in CafeTran without problems. I assigned 6 GB of RAM to Java, and DGT "pre-translates" (another strange "Igor term" which means auto-assemble) as a database with a low priority. I also set it to Read Only.
Searching within DGT provides instant results.

Not that I don't have problems, though: http://www.proz.com/forum/apple_mac_operating_systems/242687-automated_search_help_needed.html but they have nothing to do with DGT, and everything with searching in databases.

Cheers,

Hans


 
Michael Beijer
United Kingdom
Local time: 19:46
Member
Dutch to English
re: importing large (DGT) TMXs into memoQ Feb 6, 2013

In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).

Michael


http://www.emeditor.com/

[Edited at 2013-02-06 10:06 GMT]
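For anyone who would rather script the split described in the post above than do it by hand in a text editor, here is a rough Python sketch. It is an assumption-laden illustration, not a tested memoQ workflow: `count_units` and `split_tmx` are hypothetical names, the file is assumed to follow the usual `tmx > header + body > tu` layout, and `1.4` is only a fallback if the root element carries no `version` attribute. Any DOCTYPE line or comments in the original are not copied to the parts.

```python
import math
import xml.etree.ElementTree as ET


def count_units(tmx_path):
    """Count <tu> elements without building the whole tree."""
    total = 0
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag == "tu":
            total += 1
            elem.clear()
    return total


def _close_part(out):
    """Finish one output file so that it is well-formed XML."""
    out.write("</body>\n</tmx>\n")
    out.close()


def split_tmx(tmx_path, out_prefix, parts=2):
    """Write up to `parts` smaller TMX files and return their paths."""
    per_part = max(1, math.ceil(count_units(tmx_path) / parts))
    version, header_xml = "1.4", ""
    out, written, paths = None, 0, []
    for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                version = elem.get("version", version)
            continue
        if elem.tag == "header":
            # Reuse the original header in every part.
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None or written == per_part:
                if out is not None:
                    _close_part(out)
                paths.append(f"{out_prefix}_{len(paths) + 1}.tmx")
                out = open(paths[-1], "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f'<tmx version="{version}">\n{header_xml}\n<body>\n')
                written = 0
            out.write(ET.tostring(elem, encoding="unicode"))
            written += 1
            elem.clear()  # free the unit once it has been written
    if out is not None:
        _close_part(out)
    return paths
```

A call such as `split_tmx("dgt_nl-en.tmx", "dgt_nl-en_part")` (file names are examples) would write `dgt_nl-en_part_1.tmx` and `dgt_nl-en_part_2.tmx`. Raising `parts` gives smaller files for tools that prefer several short imports. The parts are always written as UTF-8, whatever the encoding of the source file.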


 
Dominique Pivard
Local time: 21:46
Finnish to French
TOPIC STARTER
Other tools? Feb 6, 2013

Michael Beijer wrote:
In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).

Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TM's?

Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?


 
Stanislav Pokorny
Czech Republic
Local time: 20:46
English to Czech
Studio positive Feb 6, 2013

Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011?


No problems with Studio; took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.


 
FarkasAndras
Local time: 20:46
English to Hungarian
Studio struggles above 2M Feb 6, 2013

Stanislav Pokorny wrote:

Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011?


No problems with Studio; took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.


I've done a couple of tests with Studio (2009 only). It slows down as the number of segments goes up. So it might do 100,000 segments in 2 minutes and 1 million segments in an hour and a half (random figure), but it will take six hours to import two million. In my experience, about two million is the upper limit. I tried to import 6 million TUs once, and killed it after sixteen hours. It was not even halfway done IIRC. Maybe 2011 brought improvements in this regard; I will test it soon.
I'm not sure if lookup performance is better with multiple smaller TMs, but I suspect it might be.
In any case, the size of the DGT-TM is right about where Studio starts to crap out.

I asked about this in a separate thread here: http://www.proz.com/forum/cat_tools_technical_help/237113-very_large_tms_~10_million_tu.html


 
Michael Beijer
United Kingdom
Local time: 19:46
Member
Dutch to English
as far as I can remember... Feb 6, 2013

Dominique Pivard wrote:

Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TM's?

Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?


Hi Dominique,

1. I can't remember exactly how long it took for each half (of 330MB), maybe around 40 minutes or so each (on a 64-bit desktop with a 3.07 GHz i7, 16GB of RAM and an SSD).

2. I imported the 2nd half into the same TM as the 1st half.

3. I have LSC (longest substring concordance) switched off. I find it never has anything useful to report. Incidentally, I also have Predictive Typing & AutoPick (and the Muse) switched off, as I find they just get in my way when translating.

4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.

Michael

[Edited at 2013-02-06 13:54 GMT]


 
Grzegorz Gryc
Local time: 20:46
French to Polish
DVX Feb 6, 2013

Michael Beijer wrote:

4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.


As a big DVX fan, I can but confirm that a large TMX import in DVX is a PITA.
Now, after some months, I don't remember exactly, but the header of the DGT TMX is/was incorrect and the file can't be imported "as is"; it was necessary to edit it in a decent text editor.
A good practice is to import a smaller TMX, compact the DVMDB, then import another smaller TMX, compact the DVMDB, etc.

Cheers
GG


 
Meta Arkadia
Local time: 02:46
English to Indonesian
A very short screencast of DGT GER-DUT Feb 7, 2013

in CafeTran. I go to the next segment, DGT (and my other TMs and glossaries) Auto-Assembles. Next, I select a word to search in DGT (and other resources).
The screencast is short because, er, CT doesn't take much time to arrive at the desired results…

http://www.screencast.com/t/E4IDfKcMueF

Cheers,

Hans


 
trhanslator (X)
No indexing? Feb 7, 2013

Do you mean that CafeTran with no indexing of the TM is that fast?

How about opening the TMX file, how many hours did that take?


 
Meta Arkadia
Local time: 02:46
English to Indonesian
Seconds Feb 7, 2013

trhanslator wrote:
Do you mean that CafeTran with no indexing of the TM is that fast?

Well, yes. But it's set to "pre-translate", and that explains the very fast auto-assemble results. However, searching within the DGT is fast as well, as you can see in my miserably short screencast.

How about opening the TMX file, how many hours did that take?


Seconds. With 6 GB of RAM assigned to Java. And CafeTran loads TMs in RAM.

Cheers,

Hans


 
Michael Beijer
United Kingdom
Local time: 19:46
Member
Dutch to English
@Hans (Meta Arkadia): Feb 7, 2013

And how about the number of TMs that CafeTran can access in a project simultaneously? In memoQ I have around 8,000,000 segments across all of my connected TMs and experience no slowdowns. How does this work in CT?

Michael

[Edited at 2013-02-07 11:06 GMT]


 
Meta Arkadia
Local time: 02:46
English to Indonesian
Eight million? WOW! Feb 7, 2013

Michael Beijer wrote:
And how about the number of TMs that CafeTran can access in a project simultaneously?

I never tried more than three TMs (.tmx) and two glossaries (tab-delimited .txt) at the same time, Michael. And that doesn't present any problems. However, the total number of TUs never came close to 8 million. I don't think I can even try it, because I probably don't have that number of TUs in one language pair. I hope somebody else can answer your question.

Cheers,

Hans


 