Segmentation Guidelines
 
 
1. 	Spaces and punctuation marks are segmentation markers;
2. 	Two- and three-character phrases are segmentation units;
3. 	Four-character idioms are segmentation units;
4. 	Idioms of five or more characters may or may not be segmentation units;
5. 	Idiomatic terms are segmentation units;
6. 	Abbreviations are segmentation units;
7. 	A segmentation unit followed by the suffix -er (儿) is a segmentation unit;
8. 	Roman-script strings are left as they are;
9. 	Foreign loanwords are not segmented;
10. Ambiguous cases are resolved by context.