cb's Japanese Frequency List Sorter
--------------------------------------------------------------------------------

Description and Usage:
----------------------

This tool sorts a list of Japanese words or kanji based on their frequency.

The list can be simple like this:

合図
挨拶
愛情

Or the list can have multiple columns like this:

合図	あいず	sign, signal
挨拶	あいさつ	greeting
愛情	あいじょう	love, affection

In the latter case, the columns must be separated by tab characters.

If the lines contain multiple columns, you will need to specify which column
contains the word or kanji. In both of the above cases, the word is in column 1.
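For illustration, the column lookup might work like the following minimal
Python sketch. It assumes 1-based column numbers (as in "column 1" above) and
treats a line without tabs as a single column. extract_key is a hypothetical
helper, not part of the tool.

```python
def extract_key(line: str, column: int) -> str:
    """Return the word or kanji in `column` (1-based) of a tab-separated line.

    Hypothetical helper: a line with no tabs is treated as one column.
    """
    fields = line.rstrip("\r\n").split("\t")
    if not 1 <= column <= len(fields):
        raise ValueError(
            f"line has {len(fields)} column(s); cannot read column {column}"
        )
    return fields[column - 1]


print(extract_key("挨拶\tあいさつ\tgreeting", 1))  # 挨拶
print(extract_key("愛情", 1))                       # 愛情
```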

Select the frequency report to sort against with the "Frequency report to
sort against" option. Three pre-made frequency reports are available. They were
generated with cb's Japanese Text Analysis Tool (JTAT) from a corpus of 5000+
Japanese novels. To sort a list of kanji, select "Kanji Frequency Report". To
sort a list of words, select either "Word Frequency Report (MeCab)" or "Word
Frequency Report (JParser)". Advanced users may generate their own frequency
report with JTAT and use it by selecting the "(user-specified frequency report)"
option and providing the path to the report.

The sorted output list, ordered from most to least frequent, would look like this:

挨拶
愛情
合図
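The ordering itself can be sketched in Python as a descending sort on each
word's count. This sketch assumes the selected report has already been loaded
into a dict mapping each word to its count (the loading step and the report
file format are not shown), and it assumes words missing from the report sort
last; the tool's own handling of such words may differ. The counts are the
"number of times encountered" values from the example output in this readme.

```python
def sort_by_frequency(words: list[str], counts: dict[str, int]) -> list[str]:
    """Sort words from most to least frequent.

    Assumption: a word absent from the report gets a count of 0, so it
    sinks to the end. sorted() is stable even with reverse=True, so words
    with equal counts keep their input order.
    """
    return sorted(words, key=lambda w: counts.get(w, 0), reverse=True)


counts = {"挨拶": 22018, "愛情": 11568, "合図": 10860}
print(sort_by_frequency(["合図", "挨拶", "愛情"], counts))
# ['挨拶', '愛情', '合図']
```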

To output the entire line from the input file instead of just the word,
check the "Output entire line" option. With this option selected, the
sorted output list would look like this:

挨拶	あいさつ	greeting
愛情	あいじょう	love, affection
合図	あいず	sign, signal
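A hypothetical Python sketch of the same sort for multi-column input: the sort
is keyed on the selected column, and the result holds either just that column
or the whole line. Column numbers are assumed to be 1-based, and counts is an
assumed word-to-count dict built from the selected report.

```python
def sort_lines(
    lines: list[str],
    counts: dict[str, int],
    column: int = 1,
    output_entire_line: bool = False,
) -> list[str]:
    """Sort tab-separated lines by the frequency of the word in `column`."""

    def key_of(line: str) -> str:
        # 1-based column lookup; the trailing newline is stripped first.
        return line.rstrip("\r\n").split("\t")[column - 1]

    # Most frequent first; words missing from `counts` (count 0) go last.
    ordered = sorted(lines, key=lambda ln: counts.get(key_of(ln), 0), reverse=True)
    if output_entire_line:
        return [ln.rstrip("\r\n") for ln in ordered]
    return [key_of(ln) for ln in ordered]


lines = [
    "合図\tあいず\tsign, signal",
    "挨拶\tあいさつ\tgreeting",
    "愛情\tあいじょう\tlove, affection",
]
counts = {"挨拶": 22018, "愛情": 11568, "合図": 10860}
for row in sort_lines(lines, counts, output_entire_line=True):
    print(row)
```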

You may append frequency information to the output list by checking one or
more of the following options:

  1) Append number of times encountered

     This number is taken from the selected frequency report.

  2) Append Frequency Group

     Description of Frequency Group from Japanese Text Analysis Tool:

     "All words in the analysis that share the exact same frequency
     will be assigned to a numbered Frequency Group, with group 1
     containing the most common word(s), group 2 containing the
     next most common word(s), and so on."

  3) Append Frequency Rank

     Description of Frequency Rank from Japanese Text Analysis Tool:

     "For a given word, the Frequency Rank is the total
     number of words in the analysis that are more frequent
     than the given word, plus 1. For example, if the given word has a
     Frequency Rank of 500, then there are 499 other words in the
     analysis that are more frequent than the given word."
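Both definitions can be computed from a word-to-count mapping. The following
Python sketch implements them as quoted above: the Frequency Group is a dense
rank over the distinct counts, and the Frequency Rank is 1 plus the number of
words with a strictly higher count. It illustrates the definitions only and is
not JTAT's code; groups_and_ranks and the toy counts are hypothetical.

```python
from bisect import bisect_right


def groups_and_ranks(counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each word to (Frequency Group, Frequency Rank)."""
    # Group: distinct counts numbered 1, 2, 3, ... from most to least common.
    distinct = sorted(set(counts.values()), reverse=True)
    group_of = {c: i + 1 for i, c in enumerate(distinct)}

    # Rank: 1 + number of words whose count is strictly greater.
    ascending = sorted(counts.values())
    result = {}
    for word, c in counts.items():
        more_frequent = len(ascending) - bisect_right(ascending, c)
        result[word] = (group_of[c], more_frequent + 1)
    return result


# Hypothetical toy data: two words tie for most frequent.
toy = {"a": 10, "b": 10, "c": 7, "d": 3}
print(groups_and_ranks(toy))
# {'a': (1, 1), 'b': (1, 1), 'c': (2, 3), 'd': (3, 4)}
```

Because tied words share one group but each still counts toward the rank of
every less frequent word, a word's rank is never smaller than its group.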

With all three options checked, the sorted output list would look like this
(columns: word, number of times encountered, Frequency Group, Frequency Rank):

挨拶	22018	1243	1256
愛情	11568	2271	2338
合図	10860	2389	2465
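Each appended value appears to be a tab-separated column, in the order the
options are listed above. A minimal Python sketch of building one output row
under that assumption (format_row and its flags are hypothetical names):

```python
def format_row(
    word: str,
    stats: tuple[int, int, int],
    append_count: bool = True,
    append_group: bool = True,
    append_rank: bool = True,
) -> str:
    """Join the word and the selected statistics with tabs.

    `stats` is (times encountered, Frequency Group, Frequency Rank).
    """
    count, group, rank = stats
    fields = [word]
    if append_count:
        fields.append(str(count))
    if append_group:
        fields.append(str(group))
    if append_rank:
        fields.append(str(rank))
    return "\t".join(fields)


print(repr(format_row("挨拶", (22018, 1243, 1256))))
# '挨拶\t22018\t1243\t1256'
```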


How to Install and Launch:
--------------------------
1) Unzip cb's Japanese Frequency List Sorter.
2) In the unzipped directory, double-click JapaneseFrequencyListSorter.exe.
   (Note: Requires .NET Framework 3.5.)


Contact:
--------
Christopher Brochtrup
cb4960@gmail.com


--------------------------------------------------------------------------------

Version History:
----------------
[Version 2.3 (10-October-2015)]
- Added "Output entire line" option.
- Added "Append number of times encountered" option.
- Added "Append Frequency Group" option.
- Added "Append Frequency Rank" option.
- Updated word_freq_report_mecab.txt and kanji_freq_report.txt.

[Version 2.2 (27-October-2013)]
- Updated word_freq_report_mecab.txt.

[Version 2.1 (14-July-2013)]
- Updated word_freq_report_mecab.txt.
- Added default output file.

[Version 2.0 (27-May-2012)]
- Added option to specify which frequency report to compare against.
- Added settings.txt.

[Version 1.0 (24-May-2012)]
- Initial version.


Source: readme.txt, updated 2015-10-11