Quick way to find duplicate entries in MySQL table

I recently needed a script that could find duplicate entries in a particular table so I wrote the following bash script…

#!/bin/bash

SERVER=db1.some.domain.com
PORT=9090
USER=root
PASS=somepassword
TABLE=sometable
FIELD=somefield
CSV=$HOME/domain/bin/duplicate-emails.csv
DB=somedatabase

mysql -h$SERVER --port=$PORT -u$USER --password="$PASS" $DB -e "SELECT $FIELD FROM $TABLE" | sed 's/\t/|/g' > $CSV

#sort fields alphabetically, use uniq to return duplicates, use sort again to output sorted list of duplicate entries to file
sort $CSV | uniq -d -i | sort -o $CSV

The script does the following…

  1. Exports all entries for a particular field into a file.
  2. Uses sort to sort the fields alphabetically.
  3. Uses uniq to output just the duplicate entries.
  4. Uses sort again to output the sorted duplicates into a file.

Feel free to use and if you have any suggestions on better/quicker/tidier ways of doing it, please leave a comment.

4 thoughts on “Quick way to find duplicate entries in MySQL table

    1. Thanks Daryl, I have used queries like that in the past. I was just wary in this case as it was a large and busy production table and I didn’t want to hog it while the query ran. However looking at the query I could have used.. something like..

      EXPLAIN SELECT field, count(id) as cnt FROM table GROUP BY field HAVING cnt > 1;

      It doesn’t look anymore expensive as using the SELECT in the script above… perhaps because the field is indexed.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s