Strategic CPod Lesson Selection
I've come up with an idea that I'm just now starting to use and am really optimistic about. It is a way to figure out which ChinesePod lesson is the best one to study next. I've written some very rough programs (bash scripts) that make an informed decision. In a nutshell, they figure out which lesson's unlearned vocabulary is the most frequently used.
To do this, it requires 3 things:
1. An exhaustive list of your vocabulary. This is not easy to get unless you use Pleco, Anki or some other program that can track your vocabulary.
2. Chinese character & bigram frequency data, freely available online.
3. ChinesePod vocabulary lists for each lesson that you want to choose between. (This is not easy to get; it would be nice if ChinesePod made it available to us in a more straightforward manner.)
So the script reads in your vocabulary and the frequency data, and then looks at the vocab list in each lesson. For each lesson, it figures out which words or characters you have not yet learned, and then looks up all those words/characters in the frequency table. It then adds those frequencies up and divides the total by the number of new words. The results can then be sorted to find the highest-scoring lesson, which is the lesson you should study next.
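To make the scoring step concrete, here is a minimal sketch in Python (not the bash scripts themselves, which are not shown in this thread). The function names, lesson titles and frequency numbers are all made up for illustration. It also assumes that an item missing from the frequency table counts as 0, and that a lesson with nothing new scores 0; the description above does not say how those cases are handled.

```python
def score_lesson(lesson_vocab, known, freq):
    """Average frequency of the lesson items that are not yet learned."""
    # Drop duplicates (keeping order), then keep only unlearned items.
    new_items = [w for w in dict.fromkeys(lesson_vocab) if w not in known]
    if not new_items:
        return 0.0  # assumption: nothing new to learn means score 0
    # Assumption: items absent from the frequency table count as 0.
    total = sum(freq.get(w, 0) for w in new_items)
    return total / len(new_items)


def rank_lessons(lessons, known, freq):
    """Return (lesson, score) pairs, highest score first."""
    scores = {name: score_lesson(vocab, known, freq)
              for name, vocab in lessons.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data, for illustration only.
known = {"你", "好"}
freq = {"你": 900, "好": 800, "喝": 300, "茶": 120, "鲸": 4}
lessons = {
    "Ordering Tea": ["你", "喝", "茶"],
    "Whale Watching": ["好", "鲸"],
}

for name, score in rank_lessons(lessons, known, freq):
    print(f"{name}: {score:.1f}")
```

With this sample data, "Ordering Tea" has two new items (喝 and 茶) averaging 210.0, while "Whale Watching" has one new item scoring 4.0, so the tea lesson comes first. The same function works for single characters or bigrams, depending on what the vocabulary lists and frequency table contain.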
Currently, I'm using 2 separate programs to do this for single characters and bigrams. I don't have a frequency list for words of arbitrary length, but it sure would be useful if I could find one.
Has anyone else done anything like this?
koujiacheng December 22, 2009, 09:54 PM
bash is the default shell on Linux. I'm sure it will run on a Mac, but you need to install bash (v4 or higher).
I don't think it's possible to attach files here, but maybe we can find a page somewhere where I can post the scripts & frequency data.
go_manly December 23, 2009, 01:15 PM
I think there is another thing worth considering when deciding ideal lesson order. If lessons were structured well, they should contain some of the new vocabulary from the last/recent lessons, thus providing some consolidation of recently learned vocabulary.
I believe this is one of the shortcomings of the CPod method: there is no immediate consolidation, other than random expansion sentences that lack context and contain vocabulary which hasn't been encountered previously.
koujiacheng December 23, 2009, 09:05 PM
andrew_c: the key is bash v4. v3 will not work because it does not support associative arrays, which I make use of in the script.
go_manly: presumably if you are able to generate a comprehensive list of your vocabulary, then you are using some type of SRS system that will quiz you on previously learned words so that they don't fall out of your memory. Tracking previously learned words in the lessons would make the program a lot more difficult and would also, to some extent, duplicate the functionality of SRS. From what I understand, the Pimsleur audio lessons use that kind of progression, but the disadvantage is that it's difficult to know at what level to jump in and start listening.
With the lesson selector, everyone's recommendations might be different since they will be starting with their own vocabulary base.
I'm reposting this over on the pleco discussion boards and have attached the scripts there:
koujiacheng December 23, 2009, 09:26 PM
pretzellogic: I know that GNU tools are available for Windows, so the scripts should be able to run there. I have no knowledge of VBS, so I'm not sure how easy it would be to convert them, but the scripts are there if you feel inclined to do it.
RJ December 23, 2009, 10:09 PM
SRS - spaced repetition; the words you know best are reviewed the least often
Pleco - a Chinese dictionary
Anki - a great SRS flashcard program (free)
bigram frequency - how often two-character words appear
VBS - VBScript, an active scripting language for programmers
bash - Linux shell (again for programmers)
Your dictionary and list of words used may actually help here. Although I tend to think we may be trying too hard. Is the benefit (if there is one) really worth the effort?