
Blog Translation Project Musings: Historical Conversations

The initial design of the blog translation project was overly complex: first parse the Markdown, then protect its structure with placeholders, and finally send the result to a large model for translation. All of that was unnecessary. Large models already recognize Markdown syntax and can translate the raw content directly while preserving its formatting.
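As a minimal sketch of the simpler approach, assuming a local OpenAI-compatible server (LM Studio's default address is used here for illustration) and with the system prompt wording being hypothetical:

```python
# Minimal sketch: send raw Markdown straight to the model, no pre-parsing.
# Assumes a local OpenAI-compatible server (e.g. LM Studio at
# http://localhost:1234/v1); the endpoint and prompt are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

SYSTEM_PROMPT = (
    "You are a translator. Translate the user's Markdown from Chinese to "
    "English. Preserve all Markdown syntax, code blocks, and links exactly. "
    "Output only the translated Markdown."
)

def translate_markdown(markdown_text: str) -> str:
    """Translate a whole Markdown document in a single call."""
    response = client.chat.completions.create(
        model="google/gemma-3-4b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": markdown_text},
        ],
        temperature=0.2,  # a low temperature keeps formatting stable
    )
    return response.choices[0].message.content
```

With this shape, the only moving part left is the prompt itself, which is exactly where the effort ended up going.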

Our work therefore shifted from debugging code to debugging the model's prompt.

Model: google/gemma-3-4b
Hardware: Nvidia RTX 3060 12GB

We deliberately chose a non-thinking model, since thinking models are inefficient at translation tasks. Comparing the 4B and 12B variants, Gemma 3's 4B model proved sufficient for translation; the 12B model offered no significant quality advantage while running far slower: 11.32 tok/sec for 12B versus 75.21 tok/sec for 4B.
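The throughput figures above came from our own runs; as a rough, hypothetical sketch of how such a number can be measured against the same local endpoint (assuming the server reports token usage in the response, as LM Studio does):

```python
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def measure_tok_per_sec(markdown_text: str, model: str) -> float:
    """Time one translation call and report generated tokens per second,
    using the server-reported completion token count."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Translate to English, preserving Markdown:\n\n"
                       f"{markdown_text}",
        }],
    )
    elapsed = time.perf_counter() - start
    # usage.completion_tokens counts only generated tokens, so this
    # measures decode throughput rather than total request latency.
    return response.usage.completion_tokens / elapsed
```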