DGT translation memories | Thread poster: Dominique Pivard
2 million in English-Spanish | Feb 6, 2013
Thanks a lot, Dominique! This is great information and I appreciate it. I downloaded the package and generated the English-Spanish TMX. It also contains nearly 2 million segments. I also plan to generate my other main pair, German-Spanish.
I am keeping this memory as background information for EU-related translations on our memoQ server here.
The extraction took just over an hour on my machine.
Tomás Cano Binder, CT wrote:
The extraction took just over an hour on my machine.
You must have a much faster computer than the one I used (a three-year-old laptop with an AMD processor and 4 GB of RAM)!
Let me know how the import into memoQ goes, because when I tried, I wasn't able to complete it. It's not a problem for me, because I'm searching the DGT (and other very large TMs) with dtSearch, but I think memoQ (and probably several other tools) may have problems dealing with TMs that big.
Meta Arkadia | English to Indonesian + ... | No problems, and problems | Feb 6, 2013
Dominique Pivard wrote:
but I think memoQ (and probably several other tools) may have problems dealing with TMs that big.
I use the DGT for GER>DUT as one of three TMs in CafeTran without problems. I assigned 6 GB of RAM to Java, and DGT "pre-translates" (another strange "Igor term" which means auto-assemble) as a database with a low priority. I also set it to Read Only.
Searching within DGT provides instant results.
Not that I don't have problems, though: http://www.proz.com/forum/apple_mac_operating_systems/242687-automated_search_help_needed.html but they have nothing to do with DGT, and everything to do with searching in databases.
Cheers,
Hans
Michael Beijer | United Kingdom | Member | Dutch to English + ... | re: importing large (DGT) TMXs into memoQ | Feb 6, 2013
In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).
Michael
http://www.emeditor.com/
[Edited at 2013-02-06 10:06 GMT]
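The "cut it in half" step can also be scripted instead of done by hand in an editor. Below is a minimal Python sketch, assuming a well-formed TMX file with one `<header>` followed by `<body>`; the function names `split_tmx` and `count_tus` and the `<prefix>_N.tmx` output naming are hypothetical, and the output has not been tested against memoQ's importer. It streams the file with `iterparse` so the whole TMX is never held in memory, and writes each part as UTF-8 with a copy of the original header. If the source header is malformed, parsing will fail and the file still has to be fixed first.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def _stream(path):
    """Yield ("tmx" | "header" | "tu", element) while streaming a TMX file.

    Each finished <tu> is detached from <body> once the caller has used it,
    so memory stays roughly flat even for multi-million-segment files.
    """
    body = None
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                yield "tmx", elem  # attributes such as version exist at start
            elif elem.tag == "body":
                body = elem
        elif elem.tag == "header":
            yield "header", elem
        elif elem.tag == "tu":
            yield "tu", elem
            body.remove(elem)  # drop the processed TU from the tree


def count_tus(path):
    """Count translation units without loading the whole file."""
    return sum(1 for kind, _ in _stream(path) if kind == "tu")


def _close_part(out):
    out.write("</body>\n</tmx>\n")
    out.close()


def split_tmx(src_path, out_prefix, parts=2):
    """Split src_path into `parts` TMX files with roughly equal TU counts.

    Returns the list of written paths: <out_prefix>_1.tmx, <out_prefix>_2.tmx, ...
    """
    total = count_tus(src_path)            # first pass: how many TUs?
    per_part = max(1, -(-total // parts))  # ceiling division
    version, header_xml = "1.4", ""
    paths, out, written = [], None, 0
    for kind, elem in _stream(src_path):   # second pass: copy TUs out
        if kind == "tmx":
            version = elem.get("version", "1.4")
        elif kind == "header":
            elem.tail = "\n"
            header_xml = ET.tostring(elem, encoding="unicode")
        else:  # a <tu>
            if out is None or written == per_part:
                if out is not None:
                    _close_part(out)
                path = f"{out_prefix}_{len(paths) + 1}.tmx"
                paths.append(path)
                out = open(path, "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f"<tmx version={quoteattr(version)}>\n{header_xml}<body>\n")
                written = 0
            elem.tail = "\n"  # normalise whitespace between TUs
            out.write(ET.tostring(elem, encoding="unicode"))
            written += 1
    if out is not None:
        _close_part(out)
    return paths
```

For example, `split_tmx("dgt_en_es.tmx", "dgt_en_es", parts=2)` (hypothetical file names) would write `dgt_en_es_1.tmx` and `dgt_en_es_2.tmx` for importing in two goes; a larger `parts` value gives smaller chunks. The trade-off of this design is that the source file is read twice (once to count, once to copy) in exchange for never holding it in memory.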
Michael Beijer wrote:
In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).
Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TMs?
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?
Studio positive | Feb 6, 2013
Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011?
No problems with Studio; it took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.
Studio struggles above 2M | Feb 6, 2013
Stanislav Pokorny wrote:
Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011?
No problems with Studio; took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.
I've done a couple of tests with Studio (2009 only). It slows down as the number of segments goes up. So it might do 100,000 segments in 2 minutes and 1 million segments in an hour and a half (made-up figures), but it will take six hours to import two million. In my experience, about two million is the upper limit. I tried to import 6 million TUs once, and killed it after sixteen hours. It was not even halfway done, IIRC. Maybe 2011 brought improvements in this regard; I will test it soon.
I'm not sure if lookup performance is better with multiple smaller TMs but I suspect it might.
In any case, the size of the DGT-TM is right about where Studio starts to crap out.
I asked about this in a separate thread here: http://www.proz.com/forum/cat_tools_technical_help/237113-very_large_tms_~10_million_tu.html
Michael Beijer | United Kingdom | Member | Dutch to English + ... | as far as I can remember... | Feb 6, 2013
Dominique Pivard wrote:
Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TMs?
Have you tried importing the DGT TMX into other tools, e.g. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?
Hi Dominique,
1. I can't remember exactly how long it took for each half (330 MB each), maybe around 40 minutes each (on a 64-bit desktop with a 3.07 GHz i7, 16 GB of RAM and an SSD).
2. I imported the 2nd half into the same TM as the 1st half.
3. I have LSC (longest substring concordance) switched off. I find it never has anything useful to report. Incidentally, I also have Predictive Typing & AutoPick (and the Muse) switched off, as I find they just get in my way when translating.
4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.
Michael
[Edited at 2013-02-06 13:54 GMT]
Michael Beijer wrote:
4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.
As a big DVX fan, I can only confirm that a large TMX import in DVX is a PITA :(
It was some months ago and I don't remember exactly, but the header of the DGT TMX is/was incorrect and the file couldn't be imported "as is"; it had to be edited in a decent text editor.
A good practice is to import a smaller TMX, compact the DVMDB, then import another smaller TMX, compact the DVMDB, etc.
Cheers
GG
Meta Arkadia | English to Indonesian + ... | A very short screencast of DGT GER-DUT in CafeTran | Feb 7, 2013
I go to the next segment, and DGT (along with my other TMs and glossaries) auto-assembles. Next, I select a word to search for in DGT (and other resources).
The screencast is short because, er, CT doesn't take much time to arrive at the desired results…
http://www.screencast.com/t/E4IDfKcMueF
Cheers,
Hans
Do you mean that CafeTran with no indexing of the TM is that fast?
And what about opening the TMX file? How many hours did that take?
Meta Arkadia | English to Indonesian + ...
trhanslator wrote:
Do you mean that CafeTran with no indexing of the TM is that fast?
Well, yes. But it's set to "pre-translate", and that explains the very fast auto-assemble results. However, searching within the DGT is fast as well, as you can see in my miserably short screencast.
And what about opening the TMX file? How many hours did that take?
Seconds. With 6 GB of RAM assigned to Java. And CafeTran loads TMs in RAM.
Cheers,
Hans
Michael Beijer | United Kingdom | Member | Dutch to English + ... | @Hans (Meta Arkadia): | Feb 7, 2013
And how about the number of TMs that CafeTran can access in a project simultaneously? In memoQ I have around 8,000,000 segments across all of my connected TMs and experience no slowdowns. How does this work in CT?
Michael
[Edited at 2013-02-07 11:06 GMT]
Meta Arkadia | English to Indonesian + ... | Eight million? WOW! | Feb 7, 2013
Michael Beijer wrote:
And how about the number of TMs that CafeTran can access in a project simultaneously?
I have never tried more than three TMs (.tmx) and two glossaries (tab-delimited .txt) at the same time, Michael, and that doesn't present any problems. However, the total number of TUs never came close to 8 million. I don't think I can even try it, because I probably don't have that number of TUs in one language pair. I hope somebody else can answer your question.
Cheers,
Hans