LaTeX Reference Checker
Bash Script to Find Unused Labels and Missing Citations
Overview
When preparing research manuscripts, it’s common to accumulate unused labels, orphaned citations, and commented-out content. This bash script provides a comprehensive audit of your LaTeX document, identifying:
- Unreferenced figures, tables, and equations
- Unused bibliography entries
- Missing citations (cited but not in the .bib file)
All while properly ignoring commented lines.
Key Feature: The script preserves the sequence in which labels appear in your document, making it easier to locate and manage them.
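As a minimal sketch of how document order can be preserved while dropping duplicates, the snippet below uses the same awk idiom the script relies on. The function name dedupe_in_order and the sample labels are made up for illustration.

```bash
#!/bin/bash
# Keep the first occurrence of each input line, in the order it was read.
# dedupe_in_order is a hypothetical helper name, not part of the script.
dedupe_in_order() {
    awk '!seen[$0]++'
}

# Sample labels (invented for this example), with one duplicate
labels=$(printf '%s\n' 'fig:setup' 'fig:results' 'fig:setup' 'fig:ablation')

echo "Document order:"
printf '%s\n' "$labels" | dedupe_in_order

echo "Alphabetical (sort -u), for comparison:"
printf '%s\n' "$labels" | sort -u
```

The awk version keeps fig:setup ahead of fig:results because that is where it first appears, whereas sort -u reorders the list alphabetically.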
The Problem
During manuscript development, you might encounter:
| Issue | Example | Impact |
|---|---|---|
| Unreferenced labels | \label{fig:old_analysis} never used | Clutters document, confuses reviewers |
| Commented labels counted | % \label{tab:removed} still detected | False positives in checks |
| Missing citations | \cite{smith2023} but not in .bib | Undefined-citation warnings, [?] in the output |
| Unused bibliography | References added but never cited | Inflates reference count |
Manual checking across 50+ pages with multiple revisions becomes impractical.
The Solution
Features
| Feature | Description |
|---|---|
| Comment-aware | Ignores both full-line and inline comments |
| Sequence-preserving | Shows labels in document order |
| Comprehensive | Checks figures, tables, equations, and citations |
| Multiple reference styles | Supports \ref, \autoref, and \eqref |
| Lightweight | Bash plus standard command-line tools (GNU grep with -P, sed, awk) |
| Fast | Processes typical manuscripts in <1 second |
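To illustrate what "comment-aware" means in practice, here is a small sketch of two-step comment stripping using only sed. The function name strip_tex_comments and the sample lines are assumptions for this example; it does not handle every corner case (for instance a literal \\ immediately followed by a comment).

```bash
#!/bin/bash
# Sketch of two-step comment removal (strip_tex_comments is a hypothetical name).
strip_tex_comments() {
    # 1) delete lines whose first non-blank character is %
    # 2) cut inline comments, keeping the character before an unescaped %
    sed -e '/^[[:space:]]*%/d' \
        -e 's/\([^\\]\)%.*$/\1/'
}

# Sample input: a commented-out label, an inline comment, and an escaped \%
strip_tex_comments <<'EOF'
% \label{tab:removed}
\label{fig:kept}% inline note
Accuracy was 95\% on the test set.
EOF
```

The commented-out label disappears, the inline note is cut, and the escaped percent sign in the last line is left alone.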
Installation and Usage
- Save the script: copy the code below and save it as check_latex_refs.sh
- Make it executable: run chmod +x check_latex_refs.sh
- Run the checker: execute ./check_latex_refs.sh manuscript.tex
Step 1: Create the Script
VS Code: Open the integrated terminal with Ctrl+` (backtick).
Copy the following code and save it as check_latex_refs.sh:
#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <latex_file.tex>"
    exit 1
fi
TEXFILE=$1
# Remove comments in two steps:
# Step 1: Remove lines that start with % (including whitespace before %)
# Step 2: Remove inline comments (everything after % that's not \%)
TEXCONTENT=$(grep -v '^\s*%' "$TEXFILE" | sed 's/\([^\\]\)%.*$/\1/')
echo "Checking references in: $TEXFILE"
echo "========================================"
# Tables (preserving order, excluding comments)
echo -e "\n=== TABLES ==="
TAB_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{tab:\K[^}]+')
TAB_LABELS_UNIQ=$(echo "$TAB_LABELS" | awk '!seen[$0]++')
# Catches both \ref and \autoref
TAB_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto)?ref\{tab:\K[^}]+' | awk '!seen[$0]++')
TAB_COUNT=$(echo "$TAB_LABELS_UNIQ" | grep -v '^$' | wc -l)
REF_COUNT=$(echo "$TAB_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $TAB_COUNT"
echo "Total refs: $REF_COUNT"
echo "Unreferenced tables:"
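while IFS= read -r label; do
    [ -z "$label" ] && continue    # nothing to check when no labels were found
    # -F compares the label as a fixed string, not a regular expression
    if ! echo "$TAB_REFS" | grep -qxF -- "$label"; then
        echo " tab:$label"
    fi
done <<< "$TAB_LABELS_UNIQ"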
# Figures (preserving order, excluding comments)
echo -e "\n=== FIGURES ==="
FIG_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{fig:\K[^}]+')
FIG_LABELS_UNIQ=$(echo "$FIG_LABELS" | awk '!seen[$0]++')
# Catches both \ref and \autoref
FIG_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto)?ref\{fig:\K[^}]+' | awk '!seen[$0]++')
FIG_COUNT=$(echo "$FIG_LABELS_UNIQ" | grep -v '^$' | wc -l)
FIGREF_COUNT=$(echo "$FIG_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $FIG_COUNT"
echo "Total refs: $FIGREF_COUNT"
echo "Unreferenced figures:"
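while IFS= read -r label; do
    [ -z "$label" ] && continue    # nothing to check when no labels were found
    # -F compares the label as a fixed string, not a regular expression
    if ! echo "$FIG_REFS" | grep -qxF -- "$label"; then
        echo " fig:$label"
    fi
done <<< "$FIG_LABELS_UNIQ"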
# Equations (preserving order, excluding comments)
echo -e "\n=== EQUATIONS ==="
EQ_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{eq:\K[^}]+')
EQ_LABELS_UNIQ=$(echo "$EQ_LABELS" | awk '!seen[$0]++')
# Catches \ref, \autoref, and \eqref
EQ_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto|eq)?ref\{eq:\K[^}]+' | awk '!seen[$0]++')
EQ_COUNT=$(echo "$EQ_LABELS_UNIQ" | grep -v '^$' | wc -l)
EQREF_COUNT=$(echo "$EQ_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $EQ_COUNT"
echo "Total refs: $EQREF_COUNT"
echo "Unreferenced equations:"
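while IFS= read -r label; do
    [ -z "$label" ] && continue    # nothing to check when no labels were found
    # -F compares the label as a fixed string, not a regular expression
    if ! echo "$EQ_REFS" | grep -qxF -- "$label"; then
        echo " eq:$label"
    fi
done <<< "$EQ_LABELS_UNIQ"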
# Citations (excluding comments)
echo -e "\n=== CITATIONS ==="
BIBFILE=$(echo "$TEXCONTENT" | grep -oP '\\bibliography\{\K[^}]+' | head -1)
if [ -n "$BIBFILE" ]; then
    [[ "$BIBFILE" != *.bib ]] && BIBFILE="${BIBFILE}.bib"
    if [ -f "$BIBFILE" ]; then
        echo "Using bibliography file: $BIBFILE"
        BIB_ENTRIES=$(grep -oP '@\w+\{\K[^,]+' "$BIBFILE" | awk '!seen[$0]++')
        # Split multi-key citations on commas and trim whitespace around each key
        CITATIONS=$(echo "$TEXCONTENT" | grep -oP '\\cite[tp]?\{\K[^}]+' | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | awk '!seen[$0]++')
        BIB_COUNT=$(echo "$BIB_ENTRIES" | grep -v '^$' | wc -l)
        CITE_TOTAL=$(echo "$TEXCONTENT" | grep -oP '\\cite[tp]?\{[^}]+\}' | wc -l)
        CITE_UNIQ=$(echo "$CITATIONS" | grep -v '^$' | wc -l)
        echo "Total bib entries: $BIB_COUNT"
        echo "Total citation commands: $CITE_TOTAL"
        echo "Unique cited keys: $CITE_UNIQ"
        echo "Uncited bibliography entries:"
        while IFS= read -r entry; do
            [ -z "$entry" ] && continue
            # -F compares keys as fixed strings, not regular expressions
            if ! echo "$CITATIONS" | grep -qxF -- "$entry"; then
                echo " $entry"
            fi
        done <<< "$BIB_ENTRIES"
        echo "Missing bibliography entries (cited but not in .bib):"
        while IFS= read -r cite; do
            [ -z "$cite" ] && continue
            if ! echo "$BIB_ENTRIES" | grep -qxF -- "$cite"; then
                echo " $cite"
            fi
        done <<< "$CITATIONS"
    else
        echo "Bibliography file not found: $BIBFILE"
    fi
else
    echo "No bibliography file found in document"
fi
echo -e "\n========================================"
echo "Check complete!"
Step 2: Make it Executable
chmod +x check_latex_refs.sh
Step 3: Run the Checker
./check_latex_refs.sh manuscript.tex
Replace manuscript.tex with your actual LaTeX filename.
Example Output
Here’s what the script reports for a typical manuscript:
Checking references in: paper.tex
========================================
=== TABLES ===
Total labels: 11
Total refs: 8
Unreferenced tables:
tab:appendix_data
tab:extra_results
tab:summary_stats
=== FIGURES ===
Total labels: 6
Total refs: 6
Unreferenced figures:
=== EQUATIONS ===
Total labels: 15
Total refs: 12
Unreferenced equations:
eq:supplementary
eq:variance
eq:alt_form
=== CITATIONS ===
Using bibliography file: references.bib
Total bib entries: 45
Total citation commands: 52
Unique cited keys: 38
Uncited bibliography entries:
smith2020old
jones2019unused
brown2018extra
Missing bibliography entries (cited but not in .bib):
nguyen2023missing
========================================
Check complete!
The script detects references made with \ref{fig:one}, \autoref{tab:results}, or \eqref{eq:main} - all are counted correctly.
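The per-prefix logic (collect labels, collect references, report the difference) can also be expressed as one reusable function. The following is a sketch under a few assumptions: the name unreferenced_labels is hypothetical, the input on stdin is already comment-stripped, and the function prints full keys such as tab:a. It uses grep -E rather than -P, so it does not depend on PCRE support.

```bash
#!/bin/bash
# Sketch: list labels with a given prefix that are never referenced.
# unreferenced_labels is a hypothetical helper, not part of the original script.
# Reads comment-stripped LaTeX on stdin; $1 is the prefix (tab, fig, eq, sec).
unreferenced_labels() {
    local prefix=$1 content labels refs
    content=$(cat)
    # Labels in document order, duplicates removed
    labels=$(printf '%s\n' "$content" \
        | grep -oE "\\\\label\{${prefix}:[^}]+\}" \
        | sed -e 's/^\\label{//' -e 's/}$//' | awk '!seen[$0]++')
    # Keys referenced through \ref, \autoref, or \eqref
    refs=$(printf '%s\n' "$content" \
        | grep -oE "\\\\(auto|eq)?ref\{${prefix}:[^}]+\}" \
        | sed -e 's/^.*{//' -e 's/}$//' | sort -u)
    [ -n "$labels" ] || return 0
    if [ -z "$refs" ]; then
        printf '%s\n' "$labels"
        return 0
    fi
    # -v -x -F: keep labels that equal none of the referenced keys
    printf '%s\n' "$labels" | grep -vxF -e "$refs" || true
}

# Small invented sample to show the behaviour
sample='\label{tab:a} \label{fig:x} \label{tab:b}
See \autoref{tab:b} and \ref{fig:x}.
\label{tab:c}'
printf '%s\n' "$sample" | unreferenced_labels tab
```

With a helper like this, supporting another prefix becomes a single extra call instead of a copied block.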
Advanced Usage
Check Multiple Files
Create check_all.sh:
#!/bin/bash
for file in *.tex; do
    echo "File: $file"
    ./check_latex_refs.sh "$file"
    echo ""
done
Generate Timestamped Reports
# Create dated log
./check_latex_refs.sh paper.tex > "audit_$(date +%Y%m%d_%H%M%S).log"
# Compare before/after revision
./check_latex_refs.sh paper.tex > before_revision.log
# ... make changes ...
./check_latex_refs.sh paper.tex > after_revision.log
diff before_revision.log after_revision.log
Git Pre-commit Hook
Add to .git/hooks/pre-commit:
#!/bin/bash
# Run check
./check_latex_refs.sh main.tex > ref_check.log
# Count unreferenced items
UNREFERENCED=$(grep -c "^ " ref_check.log)
if [ "$UNREFERENCED" -gt 0 ]; then
    echo "Warning: $UNREFERENCED unreferenced items found"
    cat ref_check.log
    # A hook's stdin may not be attached to the terminal, so read from /dev/tty
    read -p "Continue commit? (y/n) " -n 1 -r < /dev/tty
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        exit 1
    fi
fi
Makefile Integration
# Makefile (recipe lines must start with a tab character)
.PHONY: all check clean audit
all: manuscript.pdf check
manuscript.pdf: manuscript.tex
	pdflatex manuscript.tex
	bibtex manuscript
	pdflatex manuscript.tex
	pdflatex manuscript.tex
check:
	@./check_latex_refs.sh manuscript.tex
clean:
	rm -f *.aux *.log *.bbl *.blg *.out *.toc
audit:
	@./check_latex_refs.sh manuscript.tex > audit_$(shell date +%Y%m%d).log
	@cat audit_$(shell date +%Y%m%d).log
Usage:
make # Compile and check
make check # Run reference check only
make audit # Generate timestamped audit
Extensions and Customizations
Add Line Numbers
Modify the unreferenced item display to show line numbers:
while IFS= read -r label; do
    if ! echo "$TAB_REFS" | grep -qxF -- "$label"; then
        # -F matches the backslash and braces literally; head -1 keeps the first hit
        LINE=$(grep -nF "\\label{tab:$label}" "$TEXFILE" | head -1 | cut -d: -f1)
        echo " tab:$label (line $LINE)"
    fi
done <<< "$TAB_LABELS_UNIQ"
Output:
Unreferenced tables:
tab:appendix_data (line 145)
tab:extra_results (line 203)
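One caveat with the line-number lookup above is that it searches the raw file, so a commented-out copy of the same label could be reported instead of the live one. A possible refinement, sketched here with a hypothetical helper named label_line, skips comments while still counting original line numbers:

```bash
#!/bin/bash
# Sketch: print the line number of the first uncommented \label{...}.
# label_line is a hypothetical helper; $1 = .tex file, $2 = full key (e.g. tab:results).
label_line() {
    # The label is passed through the environment so awk does not
    # reinterpret the backslash as an escape sequence.
    NEEDLE="\\label{$2}" awk '
        /^[ \t]*%/ { next }    # skip full-line comments (NR still counts them)
        {
            line = $0
            # cut an inline comment, keeping the character before the %
            if (match(line, /[^\\]%/)) line = substr(line, 1, RSTART)
            if (index(line, ENVIRON["NEEDLE"])) { print NR; exit }
        }
    ' "$1"
}
```

It could then replace the grep pipeline in the loop, for example LINE=$(label_line "$TEXFILE" "tab:$label").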
Add Section Labels
Add this after the equations section:
# Sections (preserving order, excluding comments)
echo -e "\n=== SECTIONS ==="
SEC_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{sec:\K[^}]+')
SEC_LABELS_UNIQ=$(echo "$SEC_LABELS" | awk '!seen[$0]++')
SEC_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto)?ref\{sec:\K[^}]+' | awk '!seen[$0]++')
SEC_COUNT=$(echo "$SEC_LABELS_UNIQ" | grep -v '^$' | wc -l)
SECREF_COUNT=$(echo "$SEC_REFS" | grep -v '^$' | wc -l)
echo "Total section labels: $SEC_COUNT"
echo "Total section refs: $SECREF_COUNT"
echo "Unreferenced sections:"
while IFS= read -r label; do
    [ -z "$label" ] && continue    # nothing to check when no labels were found
    if ! echo "$SEC_REFS" | grep -qxF -- "$label"; then
        echo " sec:$label"
    fi
done <<< "$SEC_LABELS_UNIQ"
Color Output
Add ANSI color codes for better visibility:
# Add at top of script (after shebang)
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Use in output
echo -e "${BLUE}=== TABLES ===${NC}"
echo "Total labels: $TAB_COUNT"
echo "Total refs: $REF_COUNT"
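# Collect the unreferenced labels first: equal counts alone do not prove that
# every label is referenced (a \ref may point at a label that does not exist)
UNREF_TABS=$(while IFS= read -r label; do
    [ -z "$label" ] && continue
    echo "$TAB_REFS" | grep -qxF -- "$label" || echo "tab:$label"
done <<< "$TAB_LABELS_UNIQ")
if [ -z "$UNREF_TABS" ]; then
    echo -e "${GREEN}All tables referenced!${NC}"
else
    echo -e "${RED}Unreferenced tables:${NC}"
    while IFS= read -r item; do
        echo -e " ${YELLOW}${item}${NC}"
    done <<< "$UNREF_TABS"
fi
VS Code Task Integration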
Add to .vscode/tasks.json:
{
"version": "2.0.0",
"tasks": [
{
"label": "Check LaTeX References",
"type": "shell",
"command": "./check_latex_refs.sh",
"args": ["${file}"],
"problemMatcher": [],
"presentation": {
"reveal": "always",
"panel": "new"
}
}
]
}
Comparison with Alternatives
| Tool | Pros | Cons | Best For |
|---|---|---|---|
| This script | Fast, customizable, comment-aware | Single file only | Quick audits, CI/CD |
| refcheck package | LaTeX-integrated, visual markers | Must recompile | During writing |
| chktex | Comprehensive linting | Verbose output | Deep analysis |
| VS Code LaTeX Workshop | Real-time, editor-integrated | Editor-specific | Active editing |
Limitations and Workarounds
- Single file only: Doesn’t traverse \input{} or \include{} commands
- Standard prefixes: Assumes tab:, fig:, eq:, sec: conventions
- Reference commands: Detects only \ref, \autoref, and \eqref (not, for example, \cref or \pageref)
- Citation commands: Only detects \cite, \citep, \citet
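The citation limitation also covers commands with optional arguments, such as \citep[see][p. 5]{key}, and starred forms. One way this might be relaxed is sketched below; extract_cite_keys is a hypothetical helper, and the pattern assumes every command of interest starts with \cite and that the optional arguments contain no braces.

```bash
#!/bin/bash
# Sketch: extract unique citation keys, in order of first use, from LaTeX on stdin.
# extract_cite_keys is a hypothetical helper, not part of the original script.
extract_cite_keys() {
    # \cite, \citep, \citet, \citeauthor, ... with optional * and [..] arguments
    grep -oE '\\cite[a-zA-Z]*\*?(\[[^]]*\])*\{[^}]+\}' \
        | sed -e 's/.*{//' -e 's/}$//' \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | awk 'NF && !seen[$0]++'
}

# Invented sample sentence
printf '%s\n' 'See \citep[see][p. 5]{smith2020, jones2019} and \citet*{smith2020}.' \
    | extract_cite_keys
```

The output could be assigned to CITATIONS in place of the existing pipeline; commands that do not begin with \cite (for instance biblatex's \textcite) would still need their own pattern.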
Workarounds
For multi-file projects:
# Combine files first
cat main.tex chapter*.tex appendix.tex > combined.tex
./check_latex_refs.sh combined.tex
rm combined.tex
For custom prefixes:
# Modify the grep patterns in the script
# Change tab: to tbl:
grep -oP '\\label\{tbl:\K[^}]+'
For biblatex:
# Change BIBFILE detection:
BIBFILE=$(echo "$TEXCONTENT" | grep -oP '\\addbibresource\{\K[^}]+' | head -1)
Troubleshooting
Issue: Script shows no output
# Check file exists and has content
ls -la manuscript.tex
wc -l manuscript.tex
# Check for labels
grep -F '\label{' manuscript.tex | head -5
# Run with debug mode
bash -x check_latex_refs.sh manuscript.tex
Issue: Bibliography not found
# Check bibliography command exists
grep "bibliography" manuscript.tex
# Verify .bib file location
ls -la *.bib
Issue: Commented lines still appear
# Try alternative comment removal
TEXCONTENT=$(sed -e '/^[[:space:]]*%/d' -e 's/\([^\\]\)%.*$/\1/' "$TEXFILE")
# Or use Perl (the lookbehind also catches a % at the very start of a line)
TEXCONTENT=$(perl -pe 's/(?<!\\)%.*$//' "$TEXFILE")
Quick Reference
| Task | Command |
|---|---|
| Run checker | ./check_latex_refs.sh manuscript.tex |
| Make executable | chmod +x check_latex_refs.sh |
| Save to log | ./check_latex_refs.sh paper.tex > audit.log |
| Check multiple | for f in *.tex; do ./check_latex_refs.sh "$f"; done |
| Debug mode | bash -x check_latex_refs.sh manuscript.tex |
- Run the checker before every submission
- Use timestamped logs to track cleanup progress
- Integrate with Git hooks for automatic checking
- Customize prefixes for your project conventions