Clustering data by matching columns

http://www.unix.com – I am stuck with my DNA clustering analysis, and I thought this forum would be a great help with the data manipulation. Please help me. I have a table with 91 columns. First, I want to trim the table so that it keeps only the rows in which every column value is a single character: A, T, G, C or 0. Any row containing a value such as AA, AAG, AATG, Y or K has to be filtered out. I figured the regular expression will be something like [0ATGC], anchored so that it matches the whole field (^[0ATGC]$). Next, I want to compare all the columns pairwise and group the columns that have exactly the same values. The intermediate table output is not required. Example in (HowTos)
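The two steps described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions that the post does not confirm: the table is whitespace-delimited, has no header row, and every row has the same number of fields. The function names (`filter_rows`, `group_identical_columns`, `cluster_columns`) and the small sample table are made up for the example. Rather than comparing every pair of columns, the sketch uses each column's full tuple of values as a dictionary key, so identical columns land in the same group.

```python
import re
from collections import defaultdict

# A field is valid only if it is exactly one character from 0, A, T, G, C.
VALID = re.compile(r"[0ATGC]")


def filter_rows(rows):
    """Keep only rows in which every field is a single valid character."""
    return [row for row in rows if all(VALID.fullmatch(f) for f in row)]


def group_identical_columns(rows):
    """Return lists of 1-based column numbers whose values agree in every row."""
    groups = defaultdict(list)
    # zip(*rows) transposes the table, so each item is one whole column.
    for number, column in enumerate(zip(*rows), start=1):
        groups[column].append(number)
    return list(groups.values())


def cluster_columns(lines):
    """Split lines on whitespace (assumed delimiter), filter, then group."""
    rows = [line.split() for line in lines if line.strip()]
    return group_identical_columns(filter_rows(rows))


# Hypothetical 4-column sample; the real table would have 91 columns.
sample = [
    "A T A G",
    "C 0 C T",
    "AA T A G",  # dropped: AA is longer than one character
    "G G G 0",
    "T Y T C",   # dropped: Y is not in the allowed set
]
clusters = cluster_columns(sample)
print(clusters)  # [[1, 3], [2], [4]]
```

Here columns 1 and 3 end up in one group because they hold the same values in every remaining row. For a real file, the lines could be read with `open(path)` and passed to `cluster_columns` in the same way; a different delimiter would need a different `split` call.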