Text segmentation and filters applied during file import
In this guide: What happens when you upload or import a document to Redokun? What is text segmentation?
💡 Text segmentation: The process of dividing written text into meaningful units, such as words, phrases, sentences, paragraphs, or topics.
When you upload a document to Redokun, the software runs a process called segmentation that breaks the text into smaller "segments".
The goal of creating these segments is to make translation easier and faster. Here's how:
- The translator can focus on translating smaller units of text rather than having to look at the entire written content.
- Having smaller logical units of text makes it easier to reuse their translations in future projects. This is harder to achieve with larger units of text, which tend to differ in word choice, word order, and length.
Types of segmentation available in Redokun
Redokun offers two types of segmentation:
- Paragraph segmentation (default): the text is segmented into paragraphs.
- Sentence segmentation: the paragraphs are further segmented into sentences.
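To make the difference between the two types concrete, here is a minimal sketch in Python. It is not Redokun's actual implementation: the function names are hypothetical, paragraphs are assumed to be separated by line breaks, and sentences are detected with a naive punctuation rule that would mishandle abbreviations such as "e.g.".

```python
import re


def paragraph_segments(text: str) -> list[str]:
    """Treat every non-empty line as one paragraph segment (assumption)."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def sentence_segments(text: str) -> list[str]:
    """Split each paragraph further at sentence-ending punctuation."""
    segments = []
    for paragraph in paragraph_segments(text):
        # Naive rule: break after ., ! or ? when whitespace follows.
        segments.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return segments


text = "Open the box. Remove the device.\nCharge it for two hours."
paragraphs = paragraph_segments(text)  # 2 segments
sentences = sentence_segments(text)    # 3 segments
```

In this example the same text yields two paragraph segments or three sentence segments.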
Which type of segmentation should you use?
You can choose between the two types of text segmentation.
Redokun uses paragraph segmentation by default because it makes the translation process less complex, especially for non-professional translators. They are usually not trained to work through the complexities of a full text, so paragraph segments give them a more "digestible" form.
On the other hand, sentence segmentation produces around 10-30% more segments on average, but it also enhances the reusability of past translations.
We suggest using sentence segmentation if you are working with translation vendors or professional translators.
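The reusability argument can be illustrated with a small sketch. It assumes a hypothetical translation memory that only reuses exact matches, stored here as a plain Python dictionary; the names and sample sentences are made up for illustration and do not describe how Redokun matches segments internally.

```python
def reuse_rate(segments, memory):
    """Return the fraction of segments that have an exact match in memory."""
    if not segments:
        return 0.0
    hits = sum(1 for segment in segments if segment in memory)
    return hits / len(segments)


# Translations saved from an earlier project, at both granularities.
paragraph_memory = {
    "Open the box. Remove the device.": "Apri la scatola. Rimuovi il dispositivo.",
}
sentence_memory = {
    "Open the box.": "Apri la scatola.",
    "Remove the device.": "Rimuovi il dispositivo.",
}

# New document: the first sentence is unchanged, the second one is new.
new_paragraphs = ["Open the box. Connect the cable."]
new_sentences = ["Open the box.", "Connect the cable."]

paragraph_rate = reuse_rate(new_paragraphs, paragraph_memory)
sentence_rate = reuse_rate(new_sentences, sentence_memory)
```

With paragraph segments, a single changed sentence means the whole paragraph no longer matches, so nothing is reused. With sentence segments, the unchanged sentence still finds its past translation.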
How to change segmentation type
To change the segmentation type:
- Go to the Settings.
- Go to the Segmentation tab.
- Select or deselect Use sentence segmentation.
- Click on
Text segmentation and wrapping
Redokun creates a new segment whenever it detects a hard return or a tab character.
If you need to wrap your text, consider using the soft-return character (Shift + Return). With a soft return, Redokun keeps the text in a single segment.
If you want Redokun to show your translator where the text has been wrapped, go to Advanced Settings >
For more information, here is a more precise explanation of the soft return (from Wikipedia):
A soft return or soft wrap is the break resulting from line wrap or word wrap (whether automatic or manual), whereas a hard return or hard wrap is an intentional break, creating a new paragraph. Soft wrapping allows line lengths to adjust automatically with adjustments to the width of the user's window or margin settings. [...] Manual soft breaks are unnecessary when word wrap is done automatically, so hitting the "Enter" key usually produces a hard return.
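The wrapping rule described in this section can be sketched as follows. This is an illustration only: it assumes the hard return is a newline character and the soft return is the Unicode line separator (U+2028), which may differ from how your authoring tool or Redokun represents them, and the function name is hypothetical.

```python
import re

# Assumption: U+2028 (LINE SEPARATOR) stands in for a soft return.
SOFT_RETURN = "\u2028"


def split_on_hard_breaks(text: str) -> list[str]:
    """Start a new segment at every hard return or tab; keep soft returns."""
    parts = re.split(r"[\n\t]", text)
    return [part for part in parts if part.strip()]


# A hard return in the middle of a sentence produces two segments.
hard = split_on_hard_breaks("Easy to install\nand easy to use")

# The same text wrapped with a soft return stays in one segment.
soft = split_on_hard_breaks("Easy to install" + SOFT_RETURN + "and easy to use")

# A tab also starts a new segment.
tabbed = split_on_hard_breaks("Width\t120 mm")
```

The first call shows why wrapping a sentence with a hard return is a problem: the translator receives two incomplete segments instead of one full sentence.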
Filters applied during the import
Redokun's engine automatically filters out numbers, percentages, units of measure, and major currencies so that you don't have to translate them. This is especially helpful when translating documents with large data tables.
|Type|Examples|
|---|---|
|Numbers|11 1.1 1,1|
|Percentage|11% 1,1% 1.1%|
|Units of measure|11mm 11ft 1.1W 1,1Kg|
|Technical data|230/1/50 100/10,200/20|
|Currencies|11$ 11€ 11£|
|Phone numbers|1-888-555-8888 +39 0423 1780033|
|Non-alphanumeric values|# \| \ |
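As a rough illustration of this kind of filter, the sketch below classifies a segment with regular expressions and treats it as translatable only when no pattern matches the whole segment. The patterns, the short list of units, and the function names are assumptions made for this example; they are not Redokun's actual rules and cover only the sample values listed above.

```python
import re
from typing import Optional

# A number with optional decimal or thousands separators, e.g. 11, 1.1, 10,200.
NUMBER = r"\d+(?:[.,]\d+)*"

# Checked in order; the first pattern matching the whole segment wins.
PATTERNS = {
    "percentage": re.compile(rf"{NUMBER}\s?%"),
    "currency": re.compile(rf"{NUMBER}\s?[$€£]"),
    "unit": re.compile(rf"{NUMBER}\s?(?:mm|cm|m|ft|W|Kg|kg)"),  # sample units only
    "technical": re.compile(rf"{NUMBER}(?:/{NUMBER})+"),
    "number": re.compile(NUMBER),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
    "symbol": re.compile(r"[^\w\s]+"),
}


def classify(segment: str) -> Optional[str]:
    """Return the name of the first pattern matching the whole segment."""
    text = segment.strip()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    return None


def needs_translation(segment: str) -> bool:
    """A segment needs translation only if no filter pattern matches it."""
    return classify(segment) is None
```

For example, `classify("1,1Kg")` returns `"unit"` and `needs_translation("Easy to install")` returns `True`. Because the match must cover the whole segment, a phrase such as "11 mm wide" is still sent to the translator.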
Best practices for InDesign and Word documents
If you often work with InDesign or Word documents, we have created these free resources to help you improve your workflow:
- InDesign: ebook
- Word: blog post + ebook