I recently needed a script that could find duplicate entries in a particular table so I wrote the following bash script…
```bash
#!/bin/bash

SERVER=db1.some.domain.com
PORT=9090
USER=root
PASS=somepassword
TABLE=sometable
FIELD=somefield
CSV="$HOME/domain/bin/duplicate-emails.csv"
DB=somedatabase

# Export all values of the field; --skip-column-names keeps the header
# row out of the data, and sed turns any tabs into pipes
mysql -h"$SERVER" --port="$PORT" -u"$USER" --password="$PASS" --skip-column-names "$DB" \
  -e "SELECT $FIELD FROM $TABLE" | sed 's/\t/|/g' > "$CSV"

# Sort fields alphabetically (-f folds case so uniq -i sees case variants
# as adjacent), use uniq to return duplicates, then use sort again to
# output a sorted list of duplicate entries back to the file
sort -f "$CSV" | uniq -d -i | sort -o "$CSV"
```
The script does the following…
- Exports all entries for a particular field into a file.
- Uses sort to order the entries alphabetically.
- Uses uniq to output just the duplicate entries (see the toy example below).
- Uses sort again to write the sorted duplicates back into the same file.
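For illustration, here is how the sort/uniq stage behaves on a toy input (the addresses are made up):

```bash
# sort -f folds case so that uniq -i sees case variants side by side
printf 'a@x.com\nB@x.com\nb@x.com\na@x.com\nc@x.com\n' | sort -f | uniq -d -i
# prints:
# a@x.com
# B@x.com
```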
Feel free to use it, and if you have any suggestions on better/quicker/tidier ways of doing it, please leave a comment.
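If you save it as, say, find-duplicates.sh (the name is arbitrary), it runs with no arguments and leaves the results in the CSV file:

```bash
chmod +x find-duplicates.sh
./find-duplicates.sh
head ~/domain/bin/duplicate-emails.csv   # inspect the duplicates it found
```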
Seems like you could do this in pure MySQL with a GROUP BY on the field, a COUNT, and HAVING COUNT(*) > 1.
Thanks Daryl, I have used queries like that in the past. I was just wary in this case, as it was a large and busy production table and I didn’t want to hog it while the query ran. However, looking at the query I could have used… something like:
```sql
EXPLAIN SELECT field, COUNT(id) AS cnt FROM table GROUP BY field HAVING cnt > 1;
```
It doesn’t look any more expensive than the SELECT in the script above… perhaps because the field is indexed.
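Dropping the EXPLAIN, the full in-MySQL version of the script's pipeline would be something like this (field and table names are placeholders, matching the EXPLAIN above):

```sql
-- List each duplicated value with how many times it appears;
-- ORDER BY replaces the final sort in the shell pipeline
SELECT field, COUNT(id) AS cnt
FROM table
GROUP BY field
HAVING cnt > 1
ORDER BY field;
```

If the column uses a case-insensitive collation (MySQL's defaults typically are), the GROUP BY also matches the uniq -i behaviour of the script.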