
Script to count word occurrences, but exclude some?

linux-howto

http://www.unix.com – I am trying to count the occurrences of all words in a file. However, I want to exclude certain words: short words (i.e. fewer than 3 characters) and words contained in a blacklist file. I would also like to count words that are capitalized (e.g. proper names). I am not 100% sure where to draw the line on capitalization: do we count the first word of a sentence differently? What about a word that would be capitalized in the middle of a sentence, e.g. a name? Working on the other parts is therefore more important, but any other input would be appreciated. I have put together a command to do the w (HowTos)
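As a rough illustration of the filtering described in the question, here is a minimal Python sketch; it is not the poster's command, which is cut off in this excerpt. The function names `count_words` and `count_capitalized`, the `min_len` parameter, and the rule "a capitalized word counts only if it is not the first word of a sentence" are all assumptions made for the example. Sentence boundaries are guessed from `.`, `!` and `?`, so abbreviations such as "e.g." will confuse it.

```python
import re
from collections import Counter

# A word is a run of letters, optionally followed by an apostrophe part ("don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_words(text, blacklist=(), min_len=3):
    """Count words case-insensitively, skipping short and blacklisted words."""
    blocked = {w.lower() for w in blacklist}
    counts = Counter()
    for word in WORD_RE.findall(text):
        key = word.lower()
        if len(key) < min_len or key in blocked:
            continue
        counts[key] += 1
    return counts


def count_capitalized(text, blacklist=(), min_len=3):
    """Count capitalized words that are NOT the first word of a sentence.

    Assumption: a mid-sentence capital hints at a proper name, while a
    sentence-initial capital tells us nothing, so it is ignored here.
    """
    blocked = {w.lower() for w in blacklist}
    counts = Counter()
    # Naive sentence split on ., ! and ? (an assumption, not a real tokenizer).
    for sentence in re.split(r"[.!?]+", text):
        words = WORD_RE.findall(sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if not word[0].isupper():
                continue
            if len(word) < min_len or word.lower() in blocked:
                continue
            counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat saw Alice. Alice and the cat met Bob in a park. Bob ran."
    stop_words = ["the", "and"]  # stands in for the contents of the blacklist file
    for word, n in count_words(sample, stop_words).most_common():
        print(n, word)
    print(dict(count_capitalized(sample, stop_words)))
```

With a real blacklist file, the list passed as `blacklist` could be built by reading the file and splitting it on whitespace. Whether sentence-initial capitals should be counted is left open in the question, so the rule used here is only one possible choice.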