Regular Expressions in SQL Learning SQL by Alan Beaulieu Recall using the LIKE operator. Wildcard Character Matches ------------ ------ _ Exactly one character % Any number of characters (including zero) Search Expression Interpretation ------------- ------ F% Strings beginning with "F" %t Strings ending with "t" %bas% Strings containing the substring "bas" __t_ Four-character string with a "t" in the third position The following search expression specifies strings containing an "a" in the second position and followed by an "e" at any other position in the string. sqlite> select Name, BaseSalary from salaries where Name like '_a%e%'; The following query finds all employees whose name begins with "X" or "Z". sqlite> select Name, Institution from salaries where Name like 'X%' or Name like 'Z%'; If the wildcard characters don't provide enough flexibility, you can use regular expressions to build search expressions. A regular expression is, basically, "a search expression on steroids". In SQLite on mathvnc, Dr. Mertz has installed regexp. The next step is to load it. sqlite> .load /usr/lib/sqlite3/pcre.so The following query returns the same thing as previous query. sqlite> select Name, Institution from salaries where Name regexp '^[XZ]'; In MySQL, it is already preinstalled. REGULAR EXPRESSIONS There are several operators for regular expressions, listed here: http://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html Carrot The carrot means: ^ matches at the start of a string The following two queries return different records. sqlite> select * from salaries where Name regexp '^ZHI'; sqlite> select * from salaries where Name regexp '^Zhi'; Dollar Sign The dollar sign means: $ matches at the end of a string sqlite> select Name, Position from salaries where Name regexp 'AN$'; sqlite> select Name, Position from salaries where Name regexp 'an$'; error: returns names that end with yua, Zwi. Dot The dot means: . matches any one character '.u' matches any single character followed by a 'u' in the string. Using concatenation, we can make a regular expression such as 'Z.I' which matches any three-character sequence that begins with 'Z' and ends with 'I'. sqlite> select Name, Position from salaries where Name regexp 'Z.I'; sqlite> select Name, Position from salaries where Name regexp '.u'; Character List The [ ] means: [ ] matches any one character from inside the list sqlite> select Name, Institution from salaries where Name regexp '[xz]'; Alternation Operator The | means: | matches one of several different possibilities The following query matches any string that starts with 'P' or contains any of the lowercase English vowels. sqlite> select Name, Position from salaries where Name regexp '^P|[aeiouy]'; Several Consecutive Occurrences The * means: * matches 0 or more occurrences in a row The + means: + matches 1 or more occurrences in a row The ? means: ? matches 0 or 1 occurrences Specifying Number of Occurrences There are three ways of doing this: {n} exactly n repetitions {n,} n or more repetitions {n,m} between n and m repetitions