Segmentation: Guidelines
1. Space and punctuation are segmentation markers;
2. Two- or three-character phrases are segmentation units;
3. Four-character idioms are segmentation units;
4. Idioms of five or more characters may or may not be segmentation units;
5. Idiomatic terms are segmentation units;
6. Abbreviations are segmentation units;
7. A segmentation unit followed by the suffix "er" is itself a segmentation unit;
8. Roman strings remain as is;
9. Foreign loanwords are not segmented;
10. Ambiguous cases depend on context.