I recently needed a script that could find duplicate entries in a particular table so I wrote the following bash script…
[source language="bash"]
#!/bin/bash
SERVER=db1.some.domain.com
PORT=9090
USER=root
PASS=somepassword
TABLE=sometable
FIELD=somefield
CSV=$HOME/domain/bin/duplicate-emails.csv
DB=somedatabase
# -N skips the column-name header row so it doesn't end up in the CSV
mysql -N -h"$SERVER" --port="$PORT" -u"$USER" --password="$PASS" "$DB" -e "SELECT $FIELD FROM $TABLE" | sed 's/\t/|/g' > "$CSV"
# sort the fields alphabetically (folding case so uniq -i can group case variants),
# use uniq to return only the duplicates, then sort again into the output file
sort -f "$CSV" | uniq -d -i | sort -o "$CSV"
[/source]
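As a quick sanity check, the `sed` step can be dry-run on some hypothetical multi-column output. (With a single selected field there are no tabs to replace, so it only matters once you export more than one column.) This sketch assumes GNU sed, which understands `\t` in patterns:

```shell
# Hypothetical two-column sample standing in for mysql's tab-separated output;
# sed converts the tab delimiters to pipes.
printf 'bob@example.com\t42\nalice@example.com\t7\n' | sed 's/\t/|/g'
# prints:
# bob@example.com|42
# alice@example.com|7
```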
The script does the following…
- Exports all entries for a particular field into a file.
- Uses sort to sort the fields alphabetically.
- Uses uniq to output just the duplicate entries.
- Uses sort again to output the sorted duplicates into a file.
Feel free to use it, and if you have any suggestions for better/quicker/tidier ways of doing it, please leave a comment.
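One subtlety worth noting: `uniq -d -i` compares case-insensitively, but it can only collapse case variants that `sort` has already placed on adjacent lines, which calls for a case-folding sort (`sort -f`). A minimal sketch on made-up addresses:

```shell
# Made-up sample data; LC_ALL=C pins the collation so the result is deterministic.
printf 'alice@example.com\nBOB@example.com\nbob@example.com\n' |
  LC_ALL=C sort -f | uniq -d -i
# prints: BOB@example.com
```

Without `-f`, a case-sensitive sort can leave `BOB@example.com` and `bob@example.com` on non-adjacent lines, and `uniq` would miss the duplicate.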

Seems like you could do this in pure MySQL with a GROUP BY on the field, a COUNT, and HAVING COUNT(*) > 1.
Thanks Daryl, I have used queries like that in the past. I was just wary in this case as it was a large and busy production table and I didn't want to hog it while the query ran. However, looking at the query I could have used, something like:
EXPLAIN SELECT field, COUNT(id) AS cnt FROM table GROUP BY field HAVING cnt > 1;
It doesn't look any more expensive than the SELECT in the script above, perhaps because the field is indexed.
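For readers without a database handy, what that GROUP BY/HAVING query computes can be emulated in the shell on made-up data (the table and field names in the thread are placeholders, not a real schema):

```shell
# Shell equivalent of: SELECT field, COUNT(*) FROM table GROUP BY field HAVING COUNT(*) > 1
# uniq -c prefixes each distinct value with its count; awk keeps counts above 1.
printf 'a@x.com\nb@x.com\na@x.com\na@x.com\n' |
  sort | uniq -c | awk '$1 > 1 {print $2, $1}'
# prints: a@x.com 3
```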