I am trying to count the occurrences of all words in a file. However, I want to exclude certain words: short words (i.e. fewer than 3 characters) and words listed in a blacklist file. I also want to count words that are capitalized (e.g. proper names). I am not 100% sure where to draw the line on capitalization; for example, should the first word of a sentence be counted differently?
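One possible approach, as a rough sketch only: split the text into alphabetic words, drop the short and blacklisted ones, and count what is left case-sensitively. The function names count_words and capitalized_counts are made up for this illustration, and the blacklist file is assumed to hold one word per line. The sketch does not try to tell sentence-initial words apart from proper names; that decision is left open.

```shell
# count_words BLACKLIST_FILE
# Reads text on stdin and prints "count word" lines, most frequent first.
# Words shorter than 3 characters and words listed in BLACKLIST_FILE
# (assumed format: one word per line) are skipped. Counting is case-sensitive.
count_words() {
    # Turn every run of non-letters into a single newline: one word per line.
    tr -cs '[:alpha:]' '\n' |
    awk -v bl="$1" '
        BEGIN { while ((getline w < bl) > 0) skip[w] = 1 }
        length($0) >= 3 && !($0 in skip) { count[$0]++ }
        END { for (w in count) print count[w], w }
    ' |
    LC_ALL=C sort -k1,1nr -k2,2
}

# capitalized_counts
# Filters the output of count_words, keeping only words that start
# with an uppercase letter.
capitalized_counts() {
    grep -E '^[0-9]+ [[:upper:]]'
}
```

Usage would look something like `count_words blacklist.txt < input.txt` for all counts, or `count_words blacklist.txt < input.txt | capitalized_counts` for the capitalized words only (both file names are placeholders).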
I have a few questions. For example, if I have a file that contains lines of text, how do I write a script that will
find and print out every word in the file, one word per line,
and then find and print out the most frequently occurring word (case sensitive) and the number of
occurrences of that word in the file?
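A minimal sketch of both steps, assuming a "word" is any run of non-whitespace characters (so attached punctuation stays part of the word). The function names split_words and most_common are invented for this example. If several words tie for the highest count, only one of them is printed.

```shell
# split_words: read text on stdin, print every word on its own line.
split_words() {
    # Map all whitespace to newlines, squeeze repeats, drop any empty line.
    tr -s '[:space:]' '\n' | grep -v '^$'
}

# most_common: read text on stdin, print "word count" for the most
# frequent word. The comparison is case-sensitive ("The" and "the" differ).
most_common() {
    split_words |
    LC_ALL=C sort |          # group identical words together
    uniq -c |                # prefix each distinct word with its count
    LC_ALL=C sort -k1,1nr |  # highest count first
    head -n 1 |
    awk '{ print $2, $1 }'
}
```

Something like `split_words < file.txt` would then list the words, and `most_common < file.txt` would report the top word and its count (file.txt is a placeholder name).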
cat "$@" | while read -r line
do
    for word in $line
    do
        echo "$word" | circling-the-square
        # here's where I need to add the if statement:
        # if the word contains one of the four characters [!?.,],
        # then also echo that punctuation mark
    done
done
circling-the-square is a Python script based on Norvig's spelling corrector.
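For the if statement asked about in the loop above, a case pattern is one way to test for the four marks. This is only a sketch, and emit_punctuation is a hypothetical helper name. One detail worth knowing: in a shell pattern, a "!" placed first inside brackets negates the set, so the "!" is listed last here.

```shell
# emit_punctuation WORD
# If WORD contains any of the marks ? . , ! print the marks it contains
# (in the order they appear); otherwise print nothing.
emit_punctuation() {
    case $1 in
        *[?.,!]*)
            # Delete every character except the four punctuation marks.
            marks=$(printf '%s' "$1" | tr -cd '?.,!')
            printf '%s\n' "$marks"
            ;;
    esac
}
```

Inside the loop, a call such as `emit_punctuation "$word"` placed right after the echo line would print the mark whenever the word carries one.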
I want to print all occurrences of a particular pattern in a file. The catch is that the pattern match is partial: if any word in the file contains the pattern, that complete word has to be printed. If multiple words on a line match the pattern, then all of those words need to be printed.
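A small sketch of one way to do this with awk, assuming the pattern is a literal substring (not a regular expression) and that words are separated by whitespace. The function name words_containing is made up for illustration. Each matching word is printed on its own line.

```shell
# words_containing PATTERN
# Reads text on stdin and prints every whitespace-separated word that
# contains PATTERN as a literal substring, one word per line.
# Note: awk -v interprets backslash escapes in the value, so a PATTERN
# containing backslashes would need extra care.
words_containing() {
    awk -v pat="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, pat) > 0)
                print $i
    }'
}
```

For example, `words_containing foo < file.txt` would print every word in file.txt that has "foo" somewhere inside it (file.txt is a placeholder name).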