How to remove duplicate lines inside a text file?

http://unix.stackexchange.com – A huge (up to 2 GiB) text file of mine contains about 100 exact duplicates of every line in it. The duplicates are useless in my case, as the file is a CSV-like data table. I need to remove all the repetitions while preserving the original order of the lines (preferably; the order can be sacrificed for a significant performance boost). Each line in the result must be unique: if there are 100 identical lines (usually the duplicates are spread across the file and are not neighbours), only one of them should be left. I have written a program in Scala (consider it Java if you don't know Scala) to …
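A common way to handle this is to read the file once, keep a hash set of the lines already written, and emit each line only on its first occurrence. This preserves the original order. The sketch below is in Java, since the question mentions Scala/Java, and it is not the asker's program. The class name `DedupLines` and its command-line arguments are hypothetical. It assumes UTF-8 input and that the distinct lines fit in the JVM heap. That assumption seems plausible here: with roughly 100 copies of each line, the unique content is about 1/100 of the file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class DedupLines {
    // Copies lines from `in` to `out`, skipping any line seen before.
    // Only the first occurrence of each line is written, so order is preserved.
    // Assumption: all distinct lines fit in memory (they are held in `seen`).
    public static long dedup(BufferedReader in, Writer out) throws IOException {
        Set<String> seen = new HashSet<>();
        long written = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Set.add returns false when the line is already in the set.
            if (seen.add(line)) {
                out.write(line);
                out.write('\n'); // readLine strips terminators; normalize to \n
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DedupLines <input> <output>");
            System.exit(1);
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        // Buffered, streaming I/O: the input file itself is never loaded whole.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            long n = dedup(in, out);
            System.err.println(n + " unique lines written");
        }
    }
}
```

Compile with `javac DedupLines.java` and run, for example, `java DedupLines data.csv data-unique.csv` (the file names are hypothetical). You can request a larger heap with `-Xmx`. If the order can be sacrificed, GNU `sort -u` avoids the in-memory set entirely, because it performs an external merge sort that spills to temporary files.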