Extract regions of a file based on start and end positions

linux-howto

http://www.unix.com – Hi, I have a file, file1, containing many long sequences, each preceded by a unique header line. file2 is a three-column list: header name, start position, end position. I'd like to extract from file1 the sequence regions specified in file2. Based on a post elsewhere, I found this code:

Code:
awk 'NR==FNR{if($0~/^>/){i=substr($0,2);getline};a[i]=a[i] $0;next}{print ">" $1 ORS substr(a[$1], $2, $3-$2+1)}' file1 FS=\\t file2

But with the files I have, regions are extracted for only a subset of the specified sequences. file1 (my real file is much longer, >47000 lines, and each sequence is mu (HowTos)
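
The excerpt does not say why only some regions come out, so the following is only a minimal sketch of the same approach with two guards added. It assumes, without confirmation from the post, that carriage returns from Windows line endings or names in file2 that match no header in file1 might be involved. The function name extract_regions is hypothetical, and positions are taken to be 1-based and inclusive, as in the one-liner from the post.

```shell
# extract_regions FASTA REGIONS  (hypothetical helper name)
# REGIONS is tab-separated: header name, start, end (1-based, inclusive).
extract_regions() {
  awk '
    { sub(/\r$/, "") }                 # assumption: strip CR from Windows line endings
    NR == FNR {                        # first file: collect sequences by header
      if (/^>/) id = substr($0, 2)     # header text without the leading ">"
      else seq[id] = seq[id] $0        # join wrapped sequence lines
      next
    }
    !($1 in seq) {                     # second file: name with no matching header
      print "no sequence for: " $1 > "/dev/stderr"
      next
    }
    { print ">" $1 ORS substr(seq[$1], $2, $3 - $2 + 1) }
  ' "$1" FS='\t' "$2"
}
```

Called as extract_regions file1 file2, it prints each extracted region to standard output and reports every name from file2 that has no matching header on standard error, which shows which entries are being skipped.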