editing a 26 gigabyte text file

view full story

http://forums.fedoraforum.org – I did a pg_dump of our data warehouse, and it is 26 gigabytes. I wanted to load this data into a different database product (one that forked from Postgresql, so its syntax is very similar). I needed to change the syntax of the "COPY" command in the pg_dump output, remove references to "public", and change the table owner name. sed did these for me (albeit slowly). The output from pg_dump is a file called "pg-dump-output.sql". pg_dump produces this: Code: ... COPY table_name (col1, col2, col3, col4, col5) FROM stdin; ... ALTER TABLE public.table_name OWNER TO postgres_owner; I needed: Code: ... COPY table_name FROM stdin; ... ALTER TABLE table_name OWNER TO otherdb_owner; Here is how I did it: Code: $ sed -e "s:\(COPY [a-z_]\+\) ([a-zA-Z0-9_, ]\+):\1:" -e "s:public\.::" -e "s:OWNER TO postgres_owner:OWNER TO otherdb_owner:" -i pg-dump-output.sql The last part of this problem is this: pg_dump puts 14 lines of unnecessary junk at the start of the file (unnecessary for the target database, at least) ... Code: -- -- PostgreSQL database dump -- -- Started on 2009-07-08 12:46:10 UTC SET client_encoding = 'UTF8'; SET standard_conforming_strings = off; SET check_function_bodies = false; SET client_min_messages = warning; SET escape_string_warning = off; SET search_path = public, pg_catalog; I wanted to trim that off -- without re-writing the entire 26 gigabyte file. I came up with a solution that worked for me, but am not sure it is the 'best' one. I thought I'd post here to see if anyone else had a better idea. First, use 'wc' to find out how many bytes these unnecessary lines take up: Code: $ head -14 pg-dump-output.sql | wc -     14      37    286 - Then I used the "offset" feature of losetup to create a loopback file that points 287 bytes into the beginning of the 26 gigabyte file: Code: $ sudo losetup -f -o 287 /somedir/otherdir/pg-dump-output.sql $ sudo chmod 644 /dev/loop0 I then used /dev/loop0 as the input 'file' for the SQL command shell for the other database. Thoughts? (HowTos)