Strategic CPod Lesson Selection

December 22, 2009, 06:43 PM posted in General Discussion

I've come up with an idea that I've just started using and am really optimistic about. It's a way to figure out which ChinesePod lesson is the best one to study next. I've written some very rough programs (bash scripts) that make an informed decision. In a nutshell, they figure out which lesson's unlearned vocabulary is used most frequently.

To do this, it requires 3 things:

1.  An exhaustive list of your vocabulary. This is not easy to get unless you use Pleco, Anki, or some other program that tracks your vocabulary.

2.  Chinese character and bigram frequency data, which is freely available online.

3.  ChinesePod vocabulary lists for each lesson you want to choose between. (These are not easy to get; it would be nice if ChinesePod made them available in a more straightforward way.)
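For anyone who wants to try the same idea outside of bash, here is a minimal Python sketch of reading the three inputs. The file layouts are assumptions, not the formats the original scripts use: the vocabulary export and each lesson list are taken to have one word per line, and the frequency list is assumed to have "item<TAB>count" on each line. The function names and the file name in the comment are hypothetical.

```python
from typing import Dict, Iterable, Set


def load_word_list(lines: Iterable[str]) -> Set[str]:
    """Inputs 1 and 3: one word per line, blank lines ignored (assumed format)."""
    return {line.strip() for line in lines if line.strip()}


def load_frequencies(lines: Iterable[str]) -> Dict[str, int]:
    """Input 2: 'item<TAB>count' per line (assumed frequency-list layout).

    Lines without a numeric second column, such as headers or comments,
    are skipped.
    """
    freq: Dict[str, int] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        try:
            freq[parts[0].strip()] = int(parts[1])
        except ValueError:
            continue  # header or comment line
    return freq


# Reading from disk (the file name is hypothetical):
# with open("known_words.txt", encoding="utf-8") as f:
#     known = load_word_list(f)
```

Reading the files as UTF-8 matters here, since the vocabulary and frequency lists are in Chinese characters.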

So the script reads in your vocabulary and the frequency data, then looks at the vocabulary list for each lesson. For each lesson, it finds the words or characters you have not yet learned and looks each of them up in the frequency table. It adds those frequencies up and divides the total by the number of new words, giving the average frequency of the lesson's new vocabulary. The results can then be sorted, and the highest-scoring lesson is the one you should study next.
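The scoring step described above can be sketched in Python roughly as follows. This is an illustration of the idea, not the original bash code. Two choices are assumptions: an item missing from the frequency table counts as 0, and a lesson with no new vocabulary scores 0. The frequency numbers in the example are toy values.

```python
from typing import Dict, List, Set, Tuple


def score_lesson(lesson_vocab: Set[str], known: Set[str], freq: Dict[str, int]) -> float:
    """Average frequency of the lesson items you have not learned yet."""
    new_items = lesson_vocab - known
    if not new_items:
        return 0.0  # assumption: nothing new to learn means a score of 0
    # Items missing from the frequency table are counted as 0 (assumption).
    total = sum(freq.get(item, 0) for item in new_items)
    return total / len(new_items)


def rank_lessons(
    lessons: Dict[str, Set[str]], known: Set[str], freq: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Lessons sorted from highest to lowest score."""
    scores = [(name, score_lesson(vocab, known, freq)) for name, vocab in lessons.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: lesson A has one new word (谢谢), lesson B has two (谢谢, 电脑).
known = {"你好"}
freq = {"你好": 100, "谢谢": 50, "电脑": 10}
lessons = {"A": {"你好", "谢谢"}, "B": {"谢谢", "电脑"}}
print(rank_lessons(lessons, known, freq))  # [('A', 50.0), ('B', 30.0)]
```

Dividing by the number of new words means a lesson with a few very common new words can outrank a lesson with many moderately common ones, which matches the "average frequency" reading of the description.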

Currently, I'm using two separate programs to do this, one for single characters and one for bigrams. I don't have a frequency list for words of arbitrary length, but it sure would be useful if I could find one.
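One possible way to feed a lesson's word list into the character table and the bigram table is to break each word into its characters and its adjacent character pairs. This Python sketch is an assumption about how that split could work, not a description of the original scripts. It also shows why a list for words of arbitrary length would help: a three-character word is only approximated by its overlapping pairs.

```python
from typing import Iterable, Set


def characters_of(words: Iterable[str]) -> Set[str]:
    """Every distinct character appearing in the words."""
    return {ch for word in words for ch in word}


def bigrams_of(words: Iterable[str]) -> Set[str]:
    """Every distinct pair of adjacent characters inside a single word.

    A three-character word such as 图书馆 yields 图书 and 书馆, so longer
    words are only approximated by their overlapping pairs.
    """
    return {word[i:i + 2] for word in words for i in range(len(word) - 1)}


lesson_words = ["图书馆", "电脑", "好"]
print(sorted(characters_of(lesson_words)))
print(sorted(bigrams_of(lesson_words)))
```

Pass a list rather than a generator if you call both functions on the same words, since a generator can only be consumed once.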

Has anyone else done anything like this?

December 22, 2009, 08:25 PM

That is some pretty sweet code, man. I don't know what a bash script is. Can I run that (or something it compiles to) on my Mac? Are you willing to share?

December 22, 2009, 09:54 PM

Bash is the default shell on Linux. I'm sure it will run on a Mac, but you need to install Bash version 4 or higher.

I don't think it's possible to attach files here, but maybe we can find a page somewhere where I can post the scripts and frequency data.

December 23, 2009, 05:21 AM

According to Google, the default shell on a Mac is also Bash. You should have no problem running it, Simon.

December 23, 2009, 10:23 AM

Will it run in VBS on Windows? Or rather, can it be converted to VBS easily?


December 23, 2009, 01:15 PM

I think there is another thing worth considering when deciding on the ideal lesson order. If lessons were structured well, they would contain some of the new vocabulary from the last few lessons, providing some consolidation of recently learned vocabulary.

I believe this is one of the shortcomings of the CPod method: a lack of immediate consolidation, other than random expansion sentences that lack context and contain vocabulary that hasn't been encountered before.
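The consolidation point could, in principle, be measured too: for a candidate lesson, check what fraction of its vocabulary was introduced in the last few lessons you studied. The Python sketch below is only an illustration of that idea. The `window` parameter (how many recent lessons count as "recent") and both function names are hypothetical, and none of this is part of the scripts discussed in this thread.

```python
from typing import List, Set


def recent_vocab(previous_lessons: List[Set[str]], window: int) -> Set[str]:
    """Union of the vocabulary from the last `window` lessons studied."""
    if window <= 0:
        return set()  # guard: a slice of [-0:] would return every lesson
    recent: Set[str] = set()
    for vocab in previous_lessons[-window:]:
        recent |= vocab
    return recent


def reuse_ratio(lesson_vocab: Set[str], recent: Set[str]) -> float:
    """Fraction of a lesson's vocabulary that was introduced recently."""
    if not lesson_vocab:
        return 0.0
    return len(lesson_vocab & recent) / len(lesson_vocab)


# Lessons studied so far, oldest first (toy data).
history = [{"你好"}, {"谢谢"}, {"电脑"}]
print(reuse_ratio({"电脑", "手机"}, recent_vocab(history, 2)))
```

A higher ratio would mean more review of recent words; how to weigh that against the frequency score is left open.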

December 23, 2009, 09:05 PM

andrew_c: The key is Bash v4. Version 3 will not work because it does not support associative arrays, which I use in the script.

go_manly: Presumably, if you are able to generate a comprehensive list of your vocabulary, then you are using some type of SRS program that quizzes you on previously learned words so they don't fall out of your memory. Accounting for previously learned words would make the program a lot more complicated and would, to some extent, duplicate the functionality of SRS. From what I understand, the Pimsleur audio lessons use this kind of progression, but the disadvantage is that it's difficult to know at what level to jump in and start listening.

With the lesson selector, everyone's recommendations may differ, since each person starts from their own vocabulary base.

I'm reposting this over on the Pleco discussion boards and have attached the scripts there:

December 23, 2009, 09:26 PM

pretzellogic: I know that GNU tools are available for Windows, so it should be able to run. I have no knowledge of VBS, so I'm not sure how easy it would be to convert, but the scripts are there if you feel inclined to do it.

December 23, 2009, 09:55 PM


I have no idea what SRS means (or Bash, or Pleco, or Anki, or bigram frequency, or VBS).

December 23, 2009, 10:09 PM


SRS: spaced repetition system. The words you know best are reviewed the least often.

Pleco: a Chinese dictionary.

Anki: a great SRS flashcard program (free).

Bigram frequency: how often two-character words appear.

VBS: VBScript, an Active Scripting language (for programmers).

Bash: the Linux shell (again, for programmers).

Your dictionary and list of words used may actually help here, although I tend to think we may be trying too hard. Is the benefit (if there is one) really worth the effort?
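To make "bigram frequency" concrete, here is a small Python sketch of how such counts could be produced from a text: take each run of Chinese characters and count every pair of adjacent characters. It is only an illustration; published frequency lists are built from large corpora and may segment text differently. The character range used (U+4E00 to U+9FFF, the basic CJK Unified Ideographs block) is a simplifying assumption. Note that this counts character pairs, not true words, so a pair like 欢你 gets counted even though it is not a word.

```python
import re
from collections import Counter

# Runs of characters from the basic CJK Unified Ideographs block (assumption:
# rarer extension blocks and punctuation are treated as separators).
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_counts(text: str) -> Counter:
    """Count adjacent character pairs within each run of Chinese characters."""
    counts: Counter = Counter()
    for run in HAN_RUN.findall(text):
        for i in range(len(run) - 1):
            counts[run[i:i + 2]] += 1
    return counts


counts = bigram_counts("我喜欢你，我喜欢他。")
print(counts.most_common(2))
```

Punctuation such as the full-width comma splits the text into separate runs, so no pair is counted across it.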

December 26, 2009, 08:11 AM

So has anyone tried this out?  The output is kinda ugly and the scripts are not well documented.  Let me know if you guys have any questions.