I recently needed a script that could find duplicate entries in a particular table so I wrote the following bash script…
```bash
#!/bin/bash

SERVER=db1.some.domain.com
PORT=9090
USER=root
PASS=somepassword
TABLE=sometable
FIELD=somefield
CSV="$HOME/domain/bin/duplicate-emails.csv"
DB=somedatabase

# Export all values of the field into a file, converting any tabs to pipes.
# -N (--skip-column-names) keeps the column-header row out of the output,
# so it can't show up as a false "entry" in the results.
mysql -N -h"$SERVER" --port="$PORT" -u"$USER" --password="$PASS" "$DB" \
    -e "SELECT $FIELD FROM $TABLE" | sed 's/\t/|/g' > "$CSV"

# Sort the fields alphabetically (-f folds case, so uniq -i sees
# case-insensitive duplicates as adjacent lines), use uniq to return
# only the duplicates, then sort again to write the sorted list of
# duplicate entries back to the file.
sort -f "$CSV" | uniq -d -i | sort -o "$CSV"
```
The script does the following…
- Exports all entries for a particular field into a file.
- Uses sort to sort the fields alphabetically.
- Uses uniq to output just the duplicate entries.
- Uses sort again to output the sorted duplicates into a file.
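The sort/uniq step can be tried on its own with a few sample lines (the email addresses below are made up for illustration). Note that `uniq -d` only spots duplicates that are adjacent, which is why the input has to be sorted first; `sort -f` folds case so that `uniq -i` treats addresses differing only in capitalisation as the same entry.

```shell
# Hypothetical sample data: one address appears twice with different casing.
printf 'bob@example.com\nAlice@example.com\nalice@example.com\ncarol@example.com\n' > emails.csv

# Sort case-insensitively, then print each duplicated entry once.
sort -f emails.csv | uniq -d -i
```

Running this prints the duplicated address a single time, no matter how many copies (or case variants) of it appear in the file.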
Feel free to use it, and if you have any suggestions on better/quicker/tidier ways of doing it, please leave a comment.