How to parse a string in Java?
In natural language processing, text is usually parsed before it is analyzed. Here, parsing means dividing the text into a discrete set of words. These discrete parts are known as tokens, and each token carries a specific semantic meaning.
In Java, the “StringTokenizer” class provides this parsing functionality. This initial step is called lexical analysis, so the code that performs it is called a lexer (lexical analyzer) or scanner.
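As a minimal sketch of this idea, the following example splits a sentence into tokens with “StringTokenizer”. The class name BasicTokenizerDemo, the helper method tokenize, and the sample sentence are made up for illustration; only StringTokenizer, hasMoreTokens() and nextToken() come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class BasicTokenizerDemo {

    // Hypothetical helper: collects every token of the text into a list,
    // splitting on the tokenizer's default whitespace delimiters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Java, makes, parsing, simple]
        System.out.println(tokenize("Java makes parsing simple"));
    }
}
```

Collecting the tokens into a list is just one convenient choice here; the tokens could equally be processed one by one inside the loop.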
The “StringTokenizer” class has the following constructors:
- StringTokenizer(String str)
- StringTokenizer(String str, String delimiters)
- StringTokenizer(String str, String delimiters, boolean delimAsToken)
What parsing mechanism is built into these constructors?
Each constructor takes the string you want to tokenize; the second and third also take a delimiter string. Every character in the delimiter string is treated as a separate delimiter, and delimiters mark where one token ends and the next begins. The first constructor uses the default delimiter set: the space, tab, newline, carriage-return, and form-feed characters. The second and third constructors use the delimiters the caller passes in. The third constructor differs from the first two in one more way: the first two never return delimiters as tokens, whereas the third returns each delimiter character as a token of its own when its boolean argument is true (when it is false, it behaves like the second constructor).
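The difference between the three constructors is easiest to see side by side. The sketch below tokenizes the same input with each of them. The class name ConstructorComparison, its helper methods, and the sample input "a=1;b=2" are assumptions chosen for illustration, not part of the “StringTokenizer” API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ConstructorComparison {

    // Hypothetical helper: reads all remaining tokens into a list.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // First constructor: default whitespace delimiters.
    static List<String> withDefaultDelimiters(String text) {
        return drain(new StringTokenizer(text));
    }

    // Second constructor: caller-supplied delimiters, which are not returned.
    static List<String> withDelimiters(String text, String delimiters) {
        return drain(new StringTokenizer(text, delimiters));
    }

    // Third constructor with the flag set to true: delimiters are returned as tokens.
    static List<String> withDelimitersAsTokens(String text, String delimiters) {
        return drain(new StringTokenizer(text, delimiters, true));
    }

    public static void main(String[] args) {
        String text = "a=1;b=2";
        // The input contains no whitespace, so it stays one token: [a=1;b=2]
        System.out.println(withDefaultDelimiters(text));
        // '=' and ';' each act as a delimiter: [a, 1, b, 2]
        System.out.println(withDelimiters(text, "=;"));
        // Delimiters are kept as one-character tokens: [a, =, 1, ;, b, =, 2]
        System.out.println(withDelimitersAsTokens(text, "=;"));
    }
}
```

Note that the delimiter argument "=;" is read as two separate delimiter characters, not as the two-character sequence "=;".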