cb's Japanese Frequency List Sorter
--------------------------------------------------------------------------------
Description and Usage:
----------------------
This tool sorts a list of Japanese words or kanji based on their frequency.
The list can be simple like this:
合図
挨拶
愛情
Or the list can have multiple columns like this:
合図 あいず sign, signal
挨拶 あいさつ greeting
愛情 あいじょう love, affection
In the latter case, the columns must be separated by tabs.
If the lines contain multiple columns, you will need to specify which column
contains the word or kanji. In both of the above cases, the word is in column 1.
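
As a rough illustration of the column setting (this is not the tool's actual code), the sketch below splits each line on tabs and picks the word from a 1-based column number. The function name extract_words and its parameters are hypothetical and exist only for this example.

```python
def extract_words(lines, column=1):
    """Return the word found in the given 1-based column of each tab-separated line."""
    words = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if column <= len(fields):
            words.append(fields[column - 1])  # convert 1-based column to 0-based index
    return words


lines = [
    "合図\tあいず\tsign, signal",
    "挨拶\tあいさつ\tgreeting",
    "愛情\tあいじょう\tlove, affection",
]
print(extract_words(lines, column=1))  # ['合図', '挨拶', '愛情']
```

A single-column list works the same way, because a line without tabs is treated as one field in column 1.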
You will need to select the frequency report to sort against with the
"Frequency report to sort against" option. Three good, pre-made frequency reports
are available for use. They were generated using cb's Japanese Text Analysis Tool
(JTAT) on a corpus of 5000+ Japanese novels. If you want to sort a list of kanji,
select "Kanji Frequency Report". If you want to sort a list of words, select either
"Word Frequency Report (MeCab)" or "Word Frequency Report (JParser)". Advanced
users may generate their own frequency report with JTAT and use it by selecting
the "(user-specified frequency report)" option and providing the path of the report.
For the example lists above, the sorted output list would look like this:
挨拶
愛情
合図
To output the entire line from the input file instead of just the word,
check the "Output entire line" option. With this option selected, the
sorted output list would look like this:
挨拶 あいさつ greeting
愛情 あいじょう love, affection
合図 あいず sign, signal
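
The sorting behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: the frequency report is assumed to be already loaded into a dictionary of word to count (the counts used here are the ones shown in the example output later in this readme), and words missing from the report are assumed to sort last. The name sort_by_frequency and its parameters are hypothetical.

```python
def sort_by_frequency(lines, freq, column=1, entire_line=False):
    """Sort tab-separated lines so that the most frequent word comes first."""
    def count_of(line):
        word = line.split("\t")[column - 1]
        return freq.get(word, 0)  # assumption: words not in the report count as 0

    ordered = sorted(lines, key=count_of, reverse=True)
    if entire_line:
        return ordered  # like the "Output entire line" option
    return [line.split("\t")[column - 1] for line in ordered]


# Counts taken from the example output in this readme.
freq = {"挨拶": 22018, "愛情": 11568, "合図": 10860}
lines = [
    "合図\tあいず\tsign, signal",
    "挨拶\tあいさつ\tgreeting",
    "愛情\tあいじょう\tlove, affection",
]
print(sort_by_frequency(lines, freq))                    # ['挨拶', '愛情', '合図']
print(sort_by_frequency(lines, freq, entire_line=True))  # whole lines, same order
```

The sketch assumes every line has at least as many columns as the selected column number.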
You may append frequency information to the output list by checking one of
the following options:
1) Append number of times encountered
   This number is taken from the selected frequency report.
2) Append Frequency Group
   Description of Frequency Group from Japanese Text Analysis Tool:
   "All words in the analysis that share the exact same frequency
   will be assigned to a numbered Frequency Group, with group 1
   containing the most common word(s), group 2 containing the
   next most common word(s), and so on."
3) Append Frequency Rank
   Description of Frequency Rank from Japanese Text Analysis Tool:
   "For a given word, the Frequency Rank is the total
   number of words in the analysis that are more frequent
   than the given word + 1. For example, if the given word has a
   Frequency Rank of 500, then there are 499 other words in the
   analysis that are more frequent than the given word."
With all 3 options checked, the sorted output list would look like this:
挨拶 22018 1243 1256
愛情 11568 2271 2338
合図 10860 2389 2465
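
To make the difference between Frequency Group and Frequency Rank concrete, here is a small sketch that derives both from a table of counts, following the two definitions quoted above. It is only an illustration: the function name and the toy words w1..w4 with their counts are made up for this example and do not come from any frequency report.

```python
def frequency_groups_and_ranks(freq):
    """Map each word to (count, group, rank) using the definitions quoted above."""
    result = {}
    group = 0
    rank = 0
    seen = 0          # number of words processed so far
    previous = None   # count of the previously processed word
    for word, count in sorted(freq.items(), key=lambda item: item[1], reverse=True):
        seen += 1
        if count != previous:
            group += 1    # a new distinct frequency starts the next group
            rank = seen   # (seen - 1) words are strictly more frequent
            previous = count
        result[word] = (count, group, rank)
    return result


# Toy counts (hypothetical); w2 and w3 share the same frequency.
toy = {"w1": 50, "w2": 30, "w3": 30, "w4": 10}
for word, info in frequency_groups_and_ranks(toy).items():
    print(word, *info)
# w1 50 1 1
# w2 30 2 2
# w3 30 2 2
# w4 10 3 4
```

Because tied words share a group but each still counts toward the rank of less frequent words, the group number can never exceed the rank.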
How to Install and Launch:
--------------------------
1) Unzip cb's Japanese Frequency List Sorter.
2) In the unzipped directory, simply double-click JapaneseFrequencyListSorter.exe.
(Note: Requires .NET Framework 3.5.)
Contact:
--------
Christopher Brochtrup
cb4960@gmail.com
--------------------------------------------------------------------------------
Version History:
----------------
[Version 2.3 (10-October-2015)]
- Added "Output entire line" option.
- Added "Append number of times encountered" option.
- Added "Append Frequency Group" option.
- Added "Append Frequency Rank" option.
- Updated word_freq_report_mecab.txt and kanji_freq_report.txt.
[Version 2.2 (27-October-2013)]
- Updated word_freq_report_mecab.txt.
[Version 2.1 (14-July-2013)]
- Updated word_freq_report_mecab.txt.
- Added default output file.
[Version 2.0 (27-May-2012)]
- Added option to specify which frequency report to compare against.
- Added settings.txt.
[Version 1.0 (24-May-2012)]
- Initial version.