Thursday, April 12, 2007

Finding Database Tables used in ASP files - grep.exe sort.exe uniq.exe

We have a few websites sharing a single database.
We need to split the database, so that each website uses an separate database.

The problem was to find the set of tables used by a single website.

Instead of manually going through all the *.asp files,
I harness the power of Unix command line tools: grep

We use a standard naming convention for all database table names.
We prefix all database table names with "tbl".

Open Windows Command Prompt:
cd C:\websites\www
grep -o -r tbl[A-Za-z0-9]* *


grep command usage:
Usage: grep [OPTION]... PATTERN [FILE] ...
Search for PATTERN in each FILE or standard input.
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
-E, --extended-regexp PATTERN is an extended regular expression
-F, --fixed-strings PATTERN is a set of newline-separated strings
-G, --basic-regexp PATTERN is a basic regular expression
-P, --perl-regexp PATTERN is a Perl regular expression
-e, --regexp=PATTERN use PATTERN as a regular expression
-f, --file=FILE obtain PATTERN from FILE
-i, --ignore-case ignore case distinctions
-w, --word-regexp force PATTERN to match only whole words
-x, --line-regexp force PATTERN to match only whole lines
-z, --null-data a data line ends in 0 byte, not newline

-s, --no-messages suppress error messages
-v, --invert-match select non-matching lines
-V, --version print version information and exit
--help display this help and exit
--mmap use memory-mapped input if possible

Output control:
-m, --max-count=NUM stop after NUM matches
-b, --byte-offset print the byte offset with output lines
-n, --line-number print line number with output lines
--line-buffered flush output on every line
-H, --with-filename print the filename for each match
-h, --no-filename suppress the prefixing filename on output
--label=LABEL print LABEL as filename for standard input
-o, --only-matching show only the part of a line matching PATTERN
-q, --quiet, --silent suppress all normal output
--binary-files=TYPE assume that binary files are TYPE
TYPE is 'binary', 'text', or 'without-match'
-a, --text equivalent to --binary-files=text
-I equivalent to --binary-files=without-match
-d, --directories=ACTION how to handle directories
ACTION is 'read', 'recurse', or 'skip'
-D, --devices=ACTION how to handle devices, FIFOs and sockets
ACTION is 'read' or 'skip'
-R, -r, --recursive equivalent to --directories=recurse
--include=PATTERN files that match PATTERN will be examined
--exclude=PATTERN files that match PATTERN will be skipped.
--exclude-from=FILE files that match PATTERN in FILE will be skipped.
-L, --files-without-match only print FILE names containing no match
-l, --files-with-matches only print FILE names containing matches
-c, --count only print a count of matching lines per FILE
-Z, --null print 0 byte after FILE name

Context control:
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-NUM same as --context=NUM
--colour[=WHEN] use markers to distinguish the matching string
WHEN may be `always', `never' or `auto'.
-U, --binary do not strip CR characters at EOL (MSDOS)
-u, --unix-byte-offsets report offsets as if CRs were not there (MSDOS)

`egrep' means `grep -E'. `fgrep' means `grep -F'.
With no FILE, or when FILE is -, read standard input. If less than
two FILEs given, assume -h. Exit status is 0 if match, 1 if no match,
and 2 if trouble.

Display only Table Names (without FileName):
grep -o -r -h tbl[A-Za-z0-9]* *


Removing duplicate Table Names:
grep -o -r -h tbl[A-Za-z0-9]* * | sort.exe | uniq.exe


Finding only ASP files:
grep --include=*.asp -o -r tbl[A-Za-z0-9]* *