Pattern matching in C

Pattern matching with the "*" and "?" wildcards is the common way to compare paths under Windows. Many applications, led by DOS, cmd.exe and explorer.exe (the Windows Explorer), allow users to use wildcards in paths. This lets users select a group of files matching specific criteria, which is often used in the "open document" dialog of Windows applications.

Matching a string against a pattern is very simple if you have a framework or a high-level language like PHP or C# (or any other .NET Framework language). Those already provide classes or functions for regular expressions (RegEx) and thus offer very complex string comparison, matching and replacement features out of the box.

The same task can become rather complicated when you are (for whatever reason) limited to pure C and don't want to, or just can't, include complex libraries. As I had the same problem and luckily solved it, I decided to share the surprisingly simple C code for matching a string against a pattern containing wildcards.

The C code below matches a wide-char string (wchar_t, which is 16-bit Unicode on Windows) against a wide-char pattern. The return value is true if the string matches, otherwise false. The code can easily be adapted to other encodings like 8-bit ANSI if required.

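#include <stdbool.h>
#include <wchar.h>

// Returns true if wzString matches wzPattern ('*' and '?' are wildcards).
bool matchWideString( const wchar_t *wzString, const wchar_t *wzPattern ){
  switch (*wzPattern){
    case L'\0':
      // End of pattern: match only if the string is exhausted as well.
      return !*wzString;
    case L'*':
      // '*' either matches nothing (skip it) or consumes one character and stays.
      return matchWideString(wzString, wzPattern+1) ||
             ( *wzString && matchWideString(wzString+1, wzPattern) );
    case L'?':
      // '?' matches any single character, but not the end of the string.
      return *wzString &&
             matchWideString(wzString+1, wzPattern+1);
    default:
      // Literal character: must be identical. A mismatch also covers an
      // exhausted string, so the recursion never reads past its end.
      return (*wzPattern == *wzString) &&
             matchWideString(wzString+1, wzPattern+1);
  }
}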
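
To illustrate the remark about other encodings, here is a minimal sketch of how the same function might look for 8-bit strings. The name matchString and the sz prefixes are my own choice for this illustration, not part of any library. Note that it compares byte by byte, so with a multi-byte encoding such as UTF-8 a "?" would match a single byte rather than a whole character.

```c
#include <stdbool.h>

/* Hypothetical 8-bit counterpart of matchWideString (name is illustrative). */
bool matchString(const char *szString, const char *szPattern) {
  switch (*szPattern) {
    case '\0':
      /* End of pattern: match only if the string has ended too. */
      return !*szString;
    case '*':
      /* Try matching zero characters first, then consume one and retry. */
      return matchString(szString, szPattern + 1) ||
             (*szString && matchString(szString + 1, szPattern));
    case '?':
      /* Any single byte, but not the terminating zero. */
      return *szString &&
             matchString(szString + 1, szPattern + 1);
    default:
      /* Literal byte must be identical. */
      return (*szPattern == *szString) &&
             matchString(szString + 1, szPattern + 1);
  }
}
```

It is called just like the wide-char version, only without the L prefix on the literals, e.g. matchString("text_document.doc", "*.doc").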

Rules:

The algorithm...
...is case-sensitive
...doesn't require wildcards at all
...accepts ? to match a single character
...accepts * to match zero or more characters
...accepts ?* to match one or more characters :)
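
Since Windows paths are usually compared without regard to case, a case-insensitive variant may be handy. The following is only a sketch under a few assumptions: the name matchWideStringNoCase is made up for this example, and it relies on the standard towlower() from <wctype.h>, whose behaviour depends on the current C locale (in the default "C" locale only the ASCII letters are guaranteed to be folded).

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical case-insensitive variant (name is illustrative). */
bool matchWideStringNoCase(const wchar_t *wzString, const wchar_t *wzPattern) {
  switch (*wzPattern) {
    case L'\0':
      /* End of pattern: match only if the string has ended too. */
      return !*wzString;
    case L'*':
      /* Try matching zero characters first, then consume one and retry. */
      return matchWideStringNoCase(wzString, wzPattern + 1) ||
             (*wzString && matchWideStringNoCase(wzString + 1, wzPattern));
    case L'?':
      /* Any single character, but not the terminating zero. */
      return *wzString &&
             matchWideStringNoCase(wzString + 1, wzPattern + 1);
    default:
      /* Compare the lower-cased forms of both characters. */
      return (towlower((wint_t)*wzPattern) == towlower((wint_t)*wzString)) &&
             matchWideStringNoCase(wzString + 1, wzPattern + 1);
  }
}
```

With this variant, matchWideStringNoCase( L"test", L"TEST" ) matches, while the wildcard rules stay the same as above.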

Examples:

matchWideString( L"Blah", L"?la?" ); // returns true
matchWideString( L"text_document.doc", L"*.doc" ); // returns true
matchWideString( L"text_document.txt", L"*.doc" ); // returns false
matchWideString( L"test", L"TEST" ); // returns false (remember it's case-sensitive!)
matchWideString( L"test", L"test" ); // returns true