Usually, the way you want to run an sql script
through psql is 1) with the ability to feed it variables
that get evaluated by the script, and 2) with the expectation
that the script dies after the first error it encounters
(rather than continuing on, as is the default).
Here's a sample that shows these two traits in action.
First, make a wrapper shell script to control how you feed your
sql script to psql. In this example, we assume there are
exactly two required arguments to the shell script, both of which
we want to pass along to our sql script.
#!/bin/bash
set -e
set -u
if [ $# != 2 ]; then
echo "please enter a db host and a table suffix"
exit 1
fi
export DBHOST=$1
export TSUFF=$2
psql \
-X \
-U user \
-h $DBHOST \
-f /path/to/sql/file.sql \
--echo-all \
--set AUTOCOMMIT=off \
--set ON_ERROR_STOP=on \
--set TSUFF=$TSUFF \
--set QTSUFF=\'$TSUFF\' \
mydatabase
psql_exit_status=$?
if [ $psql_exit_status != 0 ]; then
echo "psql failed while trying to run this sql script" 1>&2
exit $psql_exit_status
fi
echo "sql script successful"
exit 0
A few notes on the above shell script:
TSUFF is fed to psql as both a bare value and a value wrapped
in single quotes, so that we can use the value in table or schema
names (TSUFF, unquoted) or in strings (QTSUFF, quoted)
We use -X to ensure we do not use the current unix user's .psqlrc
file
We echo all messages to the console, so we know what sql is being
executed (handy in the face of failures)
We use ON_ERROR_STOP to stop our sql script as soon as something
goes wrong
We set AUTOCOMMIT off so that we do not get a commit
after every statement in file.sql; instead, wherever there is a "commit;"
in file.sql, there will be a commit. If you want
the entire script to run as one transaction, put a single "commit;" at
the end of your sql script (and nowhere else); otherwise, the script will
run successfully but its work will not be committed (it will roll back) at the end!
See also the next tip on the --single-transaction flag.
Here are the contents of /path/to/sql/file.sql:
begin;
drop index this_index_:TSUFF;
commit;
begin;
create table new_table_:TSUFF (
greeting text not null default '');
commit;
begin;
insert into new_table_:TSUFF (greeting)
values ('Hello from table ' || :QTSUFF);
commit;
Using the PG* environment variables to make your shell scripts more terse.
For unit testing at a place I once worked, this came in handy for
database setup/teardown scripts. We would use the PG* environment variables in
our shell scripts, so that instead of running psql with -U for user and -h for host,
we would let it pick up PGUSER and PGHOST from the environment the
shell script was running in. For safety, we would use bash's ability to provide a default
for each PG* environment variable as we set it. This worked quite well.
#!/bin/bash
set -e
set -u
# Set these environmental variables to override them,
# but they have safe defaults.
export PGHOST=${PGHOST-localhost}
export PGPORT=${PGPORT-5432}
export PGDATABASE=${PGDATABASE-my_database}
export PGUSER=${PGUSER-my_user}
export PGPASSWORD=${PGPASSWORD-my_password}
RUN_PSQL="psql -X --set AUTOCOMMIT=off --set ON_ERROR_STOP=on "
${RUN_PSQL} <<SQL
select blah_column
from blahs
where blah_column = 'foo';
rollback;
SQL
Run a bunch of sql commands in a single transaction.
Oftentimes, you will have a text file filled with SQL commands that you want to execute
as a single transaction, and it's easy to forget to put the "commit;" as the last command
in the file. There's a way around this, using the --single-transaction flag. Say your file contains:
insert into foo (bar) values ('baz');
insert into yikes (mycol) values ('hello');
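You can then feed that file to psql with the --single-transaction flag (a sketch; the file path and connection details are hypothetical):
psql -X -U myuser -h myhost --set ON_ERROR_STOP=on --single-transaction -f /path/to/insert_commands.sql mydb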
and both inserts will be wrapped in a begin/commit.
Make running lots of multi-line sql statements prettier
#!/bin/bash
set -e
set -u
RUN_ON_MYDB="psql -X -U myuser -h myhost --set ON_ERROR_STOP=on --set AUTOCOMMIT=off mydb"
$RUN_ON_MYDB <<SQL
drop schema if exists my_new_schema cascade;
create schema my_new_schema;
create table my_new_schema.my_new_table (like my_schema.my_table);
create table my_new_schema.my_new_table2 (like my_schema.my_table2);
commit;
SQL
$RUN_ON_MYDB <<SQL
create index my_new_table_id_idx on my_new_schema.my_new_table(id);
create index my_new_table2_id_idx on my_new_schema.my_new_table2(id);
commit;
SQL
Note too that you can use fun bash tricks to assign to multiline variables
and feed those to psql later:
CREATE_MY_TABLE_SQL=$(cat <<EOF
create table foo (
id bigint not null,
name text not null);
EOF
)
$RUN_ON_MYDB <<SQL
$CREATE_MY_TABLE_SQL
commit;
SQL
How to get a single scalar select value into a shell variable
CURRENT_ID=$($PSQL -X -U $PROD_USER -h myhost -P t -P format=unaligned $PROD_DB -c "select max(id) from users")
let NEXT_ID=CURRENT_ID+1
echo "next user.id is $NEXT_ID"
echo "about to reset user id sequence on other database"
$PSQL -X -U $DEV_USER $DEV_DB -c "alter sequence user_ids restart with $NEXT_ID"
How to get a one-row select into bash variables named for each column in the row
#!/bin/bash
set -e
set -u
PSQL=/usr/bin/psql
DB_USER=myuser
DB_HOST=myhost
DB_NAME=mydb
$PSQL \
-X \
-h $DB_HOST \
-U $DB_USER \
-c "select username, password, first_name, last_name from users" \
--single-transaction \
--set AUTOCOMMIT=off \
--set ON_ERROR_STOP=on \
--no-align \
-t \
--field-separator ' ' \
--quiet \
$DB_NAME | while read -a Record ; do
username=${Record[0]}
password=${Record[1]}
first_name=${Record[2]}
last_name=${Record[3]}
echo "USER: $username $password $first_name $last_name"
done
How to use bash to do a job in chunks using a control table
Let's pretend you have a job so large that you want to do it a bit at a time.
You decide you can do your job an item at a time, which will be easier on the
database than one long-running query. You create a table called
my_schema.items_to_process, which has the item_id of each item you want to
process, and you add a column to that items_to_process table called done, which
defaults to false. You can then use a script that gets each not-done item from
items_to_process, processes it, then updates that item to done = true in
items_to_process. A bash script to do that would look something like this:
#!/bin/bash
set -e
set -u
PSQL="/u99/pgsql-9.1/bin/psql"
DNL_TABLE="items_to_process"
#DNL_TABLE="test"
FETCH_QUERY="select item_id from my_schema.${DNL_TABLE} where done is false limit 1"
process_item() {
local item_id=$1
local dt=$(date)
echo "[${dt}] processing item_id $item_id"
$PSQL -X -U myuser -h myhost -c "insert into my_schema.thingies select thingie_id, salutation, name, ddr from thingies where item_id = $item_id and salutation like 'Mr.%'" mydb
}
item_id=$($PSQL -X -U myuser -h myhost -P t -P format=unaligned -c "${FETCH_QUERY}" mydb)
dt=$(date)
while [ -n "$item_id" ]; do
process_item $item_id
echo "[${dt}] marking item_id $item_id as done..."
$PSQL -X -U myuser -h myhost -c "update my_schema.${DNL_TABLE} set done = true where item_id = $item_id" mydb
item_id=$($PSQL -X -U myuser -h myhost -P t -P format=unaligned -c "${FETCH_QUERY}" mydb)
dt=$(date)
done
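For reference, the items_to_process control table described above might look something like this (only the column names come from the description; the item_id type is an assumption):
create table my_schema.items_to_process (
    item_id bigint primary key,
    done boolean not null default false
);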
How to copy a table from one database to another
There are many ways to do this, but the psql client's \copy command
can be used to copy data from one table to another. Let's say you have
two databases, olddb and newdb. olddb has a table called users, and
newdb has a new, empty table called users that needs the data
from olddb's users. Here's how to accomplish this
on the command line, piping the output of one psql session into
a second psql session:
psql \
-X \
-U user \
-h oldhost \
-d olddb \
-c "\\copy users to stdout" \
| \
psql \
-X \
-U user \
-h newhost \
-d newdb \
-c "\\copy users from stdin"
A NOTE ON FAILING: I've gotten in the habit of putting two things at the top of
all of my bash scripts:
# fail on uninitialized vars rather than treating them as null
set -u
# fail on the first program that returns $? != 0
set -e
However, when two programs run at once (one piping into the other), the exit status of the last
program ends up in $?, and the exit status of the first program gets lost. (Slight lie there: The exit status
of the last program in the example above is also put in ${PIPESTATUS[1]}, and the exit status of
the first program is put in ${PIPESTATUS[0]}.) When I want the shell script to fail on the failure
of any program in the pipeline, I then set three things at the top of my shell script:
set -u
set -e
set -o pipefail
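Here's a minimal sketch of that, reusing the copy pipeline from above; with pipefail set, a failure of the first psql (not just the last one) gives the whole pipeline a non-zero exit status, and set -e stops the script:
#!/bin/bash
set -u
set -e
set -o pipefail
# If the first psql fails (say, oldhost is unreachable), pipefail makes the
# pipeline's exit status non-zero, and set -e aborts the script right here.
psql -X -U user -h oldhost -d olddb -c "\\copy users to stdout" \
| psql -X -U user -h newhost -d newdb -c "\\copy users from stdin"
echo "copy succeeded"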
A MORE DIFFICULT EXAMPLE: Let's say your users table in olddb has three columns:
first_name
middle_name
last_name
But your users table in newdb has two columns:
first_name
last_name
You can still use psql's copy command to get what you want:
psql \
-X \
-U user \
-h oldhost \
-d olddb \
-c "\\copy (select first_name, last_name from users) to stdout" \
| \
psql \
-X \
-U user \
-h newhost \
-d newdb \
-c "\\copy users from stdin"
Shell script to dump contents of bytea column to a file
Please note that bytea columns, in pg 9.0 and above, display as hex,
with an irritating leading '\x', which can be removed using pg's
substring function.
#!/bin/bash
set -e
set -u
psql \
-P t \
-P format=unaligned \
-X \
-U myuser \
-h myhost \
-c "select substring(my_bytea_col::text from 3) from my_table where id = 12" \
mydb \
| xxd -r -p > dump.txt
Shell script to show stats of a particular table in a particular database
#!/bin/bash
set -e
set -u
if [ -z "${1:-}" ]; then
echo "Usage: $0 table [db]"
exit 1
fi
SCMTBL="$1"
SCHEMANAME="${SCMTBL%%.*}" # everything before the dot (or SCMTBL if there is no dot)
TABLENAME="${SCMTBL#*.}" # everything after the dot (or SCMTBL if there is no dot)
if [ "${SCHEMANAME}" = "${TABLENAME}" ]; then
SCHEMANAME="public"
fi
if [ -n "${2:-}" ]; then
DB="$2"
else
DB="my_default_db"
fi
PSQL="psql -U my_default_user -h my_default_host -d $DB -x -c "
$PSQL "
select '-----------' as \"-------------\",
schemaname,
tablename,
attname,
null_frac,
avg_width,
n_distinct,
correlation,
most_common_vals,
most_common_freqs,
histogram_bounds
from pg_stats
where schemaname='$SCHEMANAME'
and tablename='$TABLENAME';
" | grep -v "\-\[ RECORD "