Fast elimination of duplicate lines across multiple files

http://unix.stackexchange.com – I have a huge amount of data in which each (data) line should be unique. There is one folder in which this is already true: about 15 GB split into roughly 170 files of 1,000,000 lines each. Let's call that folder foo. Now there is a second folder (bar) with even more data: within each of its files there are no duplicate entries, but the intersection of two files in bar is not necessarily empty. Each file there has roughly 15k lines (and there are several thousand files in bar). Right now I'm using awk 'NR==FNR{a[$0]=$0;next}!a[$0]' foo/file bar/file > tmp mv
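
For reference, a minimal sketch of that awk idiom applied to one pair of files, assuming the truncated mv is meant to move the filtered result back over the bar file; the array name seen, the ($0 in seen) test (which handles empty lines more robustly than !a[$0]), and the mv destination are illustrative assumptions, not the asker's exact command:

    # First pass (NR==FNR): while reading foo/file, record every line as an array key.
    # Second pass: while reading bar/file, print only the lines not recorded before.
    awk 'NR==FNR { seen[$0]; next } !($0 in seen)' foo/file bar/file > tmp
    mv tmp bar/file   # assumed destination: overwrite bar/file with its unique lines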