Commandline Fu: Find and Replace strings in files without opening them using ‘sed’

One thing you discover when moving from Windows to Linux is just how much you can accomplish from the console [aka commandline / terminal] in Linux. There are withdrawal pains at first, of course. Things seem arduous and difficult, you have to look up the the syntax of different commands over and over, and  you want your GUI back. Little by little though, it strikes you just how much time you’re saving.

Consider this scenario: You’re working on an app to automate data migration for a MySQL database, for example to update a QA database with data from the production instance. You need to extract the data as ‘INSERT’ statements, probably using another command-line tool, mysqldump. Some of that same data will already exist in the QA copy though, causing conflicts when you try load the PROD extract. Fortunately the MySQL developers thought of this sort of thing and provided a ‘REPLACE INTO’ command; it works just like ‘INSERT INTO’ except that it updates any data that already exists in the destination instead of trying to insert it again. However, mysqldump writes out ‘INSERT’ statements, not ‘REPLACE’ statements.

Enter the ‘sed‘ command in bash. sed is a stream editor for filtering and transforming text. Using sed in conjunction with mysqldump and bash’s powerful piping and redirection capabilities, you can do all 3 of these things in one fell swoop:

  1. Use mysqldump to extract your data in a format that’s easily loaded into another database;
  2. Find every occurrence of the phrase ‘INSERT INTO’ in the extract and replace it with ‘REPLACE INTO’ using sed;
  3. Redirect the modified output from sed into a file (it would normally go to the screen, which is probably less than useful).

Using these commands you can do this all with one line of text at the command prompt (ignore wrapping and type on a single line):

$ mysqldump --raw=true --skip-opt --column-names=false -n -t -e -c --hex-blob
  | sed -e 's/INSERT INTO/REPLACE INTO/g' > data_extract.sql;

Pretty cool, huh?

Here’s what’s going on. First, mysqldump extracts the data (I’ll explain all the switches further down). Next, bash’s pipe operator ( “|” ) tells the command interpreter to send the output of the preceding command to another program before displaying it on the console. We sent it to sed, and gave sed an expression telling it to replace every ‘INSERT INTO’ occurrence with ‘REPLACE INTO’. Lastly, bash’s redirect operator ( “>” ) sends the output of everything leading up to it into a file named data_extract.sql instead of showing it on the screen. Voilà! You have a file you can import conflict-free into your QA database.

Using ‘-e’ with sed means an expression will immediately follow. The pattern for find-and-replace expressions is ‘s/pattern/replacement/[flags]’. We used ‘/g’ for flags, which means replace all occurrences of pattern with replacement. (See here for a more in-depth tutorial on sed.)

Lastly, here’s a bit of explanation on what all those arguments to mysqldump were all about. mysqldump can extract a database’s structure, data, or both. You control the specifics with arguments, some examples being:

# -c = complete insert (insert using explicit column names)
# -d = nodata
# -e = extended inserts (multiple rows per INSERT statement,
       instead of one by one INSERTs)
# -n = --no-create-db - don't create db in destination
       (i.e. use existing)
# -t = --no-create-info = skip create table statements
# -p = ask for password, -psecret = --password=secret
# --skipt-opt: see below, gets rid of MyISAM only "diable keys"
#		(ALWAYS put BEFORE -c and -e!!!)
# --skip-triggers
# -q = quick stream, don't buffer entire dataset (good for large tables)
# -uroot = switch to root user
# --hex-blob = convert binary to 0xHEX notation
	same format as:
        select CONCAT('0x', HEX(UNHEX(REPLACE(UUID(), '-', ''))));
#  --single-transaction is a much better option than locking for InnoDB,
	because it does not need to lock the tables at all.
	To dump big tables, you should combine this option with --quick.
# --opt, --skip-opt (PUT BEFORE -c and -e)
	This option is shorthand; it is the same as specifying
	--add-drop-table --add-locks --create-options --disable-keys
	--extended-insert --lock-tables --quick --set-charset. It should
	give you a fast dump operation and produce a dump file that can be
	reloaded into a MySQL server quickly.  As of MySQL 4.1, --opt is on
	by default, but can be disabled with --skip-opt. To disable only
	certain of the options enabled by --opt, use their --skip forms; for
	example, --skip-add-drop-table or --skip-quick.

Happy terminals!

  1. sed -e ‘s/INSERT INTO/REPLACE INTO/g’ would apply the changes in the same file. Neater, no ?

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: