Linux Chinese Word processing?

marcelbdt
August 20, 2008 at 09:45 AM posted in General Discussion

For various reasons, I'm trying to learn Python, and as an exercise I'm writing a short Python module to analyse a Chinese text using open source material. It should give a list of characters ordered by frequency, along with relevant combinations and words (with frequency information if it exists). A poor man's Wenlin, so to speak. As a dictionary I use CEDICT; for frequencies there are various lists on the net. One thing I'd like to have is a traditional <-> simplified converter. I know it can't be made perfect, but maybe someone has already made a module for that?
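
To make the frequency part concrete, here is a minimal sketch of the character count, assuming Python 3 and the standard library only. The names `is_cjk` and `char_frequencies` are made up for illustration, and the check covers only the main CJK Unified Ideographs block (U+4E00 to U+9FFF), so rare characters in the extension blocks would be skipped.

```python
from collections import Counter


def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block.

    Assumption: the extension blocks are ignored to keep the sketch short.
    """
    return '\u4e00' <= ch <= '\u9fff'


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first.

    Punctuation, Latin letters, digits and whitespace are dropped.
    """
    counts = Counter(ch for ch in text if is_cjk(ch))
    return counts.most_common()


if __name__ == '__main__':
    sample = '我爱你,你爱我吗?我'
    for ch, n in char_frequencies(sample):
        print(ch, n)
```

Counting words and combinations would need a segmentation step on top of this (for example, greedy longest match against the dictionary headwords), which is not shown here.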

Actually, it should not be so hard to make an open source Chinese text reader by combining one of the many text editors with the dictionary information. Like Chinese-perakun, but built into a standard editor. Has anyone already done just that?

marcelbdt
August 21, 2008 at 05:29 AM

Thanks Andrew! It looks like a nice solution, certainly more elegant than what I had come up with. But as I said, I'm new to Python. I'll do some experimenting with your code, especially adding semantic information to the class "Entry". I'll be back when I have got further (but I don't have that much time for this, so it might take a while).

andrew_c
August 20, 2008 at 01:48 PM

A while back I wrote a simplified character -> pinyin converter that I used to pinyinize emails from my in-laws in bulk. It's in Python and uses CEDict: http://cds.gmu.edu/~acorriga/chinese/parse_cedict.py I imagine that it could be adapted to translate between simplified and traditional characters.
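
As a rough illustration of that adaptation (this is not the script linked above, just a sketch written for this thread), the dictionary lines can be turned into a character table. It assumes the usual CEDICT line layout of `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, and the names `parse_line`, `build_trad_to_simp` and `to_simplified` are hypothetical. Only the traditional -> simplified direction is shown, since that direction is mostly many-to-one; going the other way is ambiguous for a number of characters and needs word-level context.

```python
import re

# Assumed CEDICT layout: Traditional Simplified [pinyin] /gloss/gloss/
CEDICT_LINE = re.compile(r'^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$')


def parse_line(line):
    """Return (trad, simp, pinyin, glosses), or None for comments and junk."""
    if line.startswith('#'):
        return None
    m = CEDICT_LINE.match(line)
    if m is None:
        return None
    trad, simp, pinyin, glosses = m.groups()
    return trad, simp, pinyin, glosses.split('/')


def build_trad_to_simp(lines):
    """Build a per-character traditional -> simplified table.

    Characters are paired position by position within each entry;
    the first mapping seen for a character wins.
    """
    table = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        trad, simp = entry[0], entry[1]
        if len(trad) != len(simp):
            continue  # cannot align characters, skip this entry
        for t, s in zip(trad, simp):
            table.setdefault(t, s)
    return table


def to_simplified(text, table):
    """Convert character by character; unknown characters pass through."""
    return ''.join(table.get(ch, ch) for ch in text)
```

With the real dictionary, the table could be built by passing an open file object (opened with `encoding='utf-8'`) to `build_trad_to_simp`, since it only needs an iterable of lines.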

Regarding your second question, I haven't done anything like that, but I agree that it would be nice to have a Perakun-like or HanziBar-like plugin for a text editor. It would be great if something like this existed for all of GNOME or KDE, so that it worked uniformly across the desktop. As a cheap fix, I just edit Chinese text in Firefox with HanziBar running.

Obviously it's not Python, but there's ZDT, which is an open source plug-in for Eclipse for learning Chinese. Perhaps that may be of use?