Text Search Algorithm in C++ - Explanation, Example and Code

The Text Search algorithm is a fundamental method in text processing and information retrieval. This algorithm allows us to locate and identify occurrences of a substring (or pattern) within a larger piece of text.

How It Works

  1. Start with a larger piece of text (or document) and a substring to search for.
  2. Iterate through each character of the text sequentially.
  3. Compare the substring with a portion of the text that has the same length as the substring. If a match is found, record the current position.
  4. Move to the next position and continue the comparison.

Example

Consider the text: "Let's search for a substring within this text to see how the algorithm works."

And the substring to search for: "substring"

  1. Start at position 0. Compare Let' with "substring". No match.
  2. Move to position 1. Compare et's with "substring". No match.
  3. Move to position 2. Compare t's " with "substring". No match.
  4. Move to position 3. Compare 's s with "substring". No match.
  5. Move to position 4. Compare se with "substring". No match.
  6. Move to position 5. Compare sea with "substring". No match.
  7. Move to position 6. Compare earc with "substring". No match.
  8. Move to position 7. Compare arch with "substring". No match.
  9. Move to position 8. Compare rch " with "substring". No match.
  10. Move to position 9. Compare ch w with "substring". No match.
  11. Move to position 10. Compare h wi with "substring". No match.
  12. Move to position 11. Compare " wit with "substring". No match.
  13. Move to position 12. Compare with with "substring". No match.
  14. Move to position 13. Compare ithi with "substring". No match.
  15. Move to position 14. Compare thin with "substring". No match.
  16. Move to position 15. Compare hinn with "substring". No match.
  17. Move to position 16. Compare in t with "substring". No match.
  18. Move to position 17. Compare n th with "substring". No match.
  19. Move to position 18. Compare " thi with "substring". No match.
  20. Move to position 19. Compare this with "substring". No match.
  21. Move to position 20. Compare his " with "substring". No match.
  22. Move to position 21. Compare is t with "substring". No match.
  23. Move to position 22. Compare s te with "substring". No match.
  24. Move to position 23. Compare ubst with "substring". No match.
  25. Move to position 24. Compare bstr with "substring". No match.
  26. Move to position 25. Compare stre with "substring". No match.
  27. Move to position 26. Compare trin with "substring". No match.
  28. Move to position 27. Compare ring with "substring". Match found, record position 27.

The substring "substring" is found at position 27 within the text.

Example Code in C++

#include <iostream>
#include <string>
#include <vector>

std::vector<int> textSearch(const std::string& text, const std::string& query) {
    std::vector<int> positions;

    for (int i = 0; i <= text.length() - query.length(); ++i) {
        int j = 0;
        while (j < query.length() && text[i + j] == query[j]) {
            ++j;
        }
        if (j == query.length()) {
            positions.push_back(i);
        }
    }

    return positions;
}

int main() {
    std::string text = "Let's search for a substring within this text to see how the algorithm works.";
    std::string query = "substring";

    std::vector<int> result = textSearch(text, query);

    std::cout << "Substring found at positions: ";
    for (int pos : result) {
        std::cout << pos << " ";
    }
    std::cout << std::endl;

    return 0;
}

In this example, the textSearch function is used to find the positions of the substring "substring" within the text. The result will be a vector containing the starting positions of the matches.