TTokeniser

TTokeniser is a simple class for splitting strings into useful bits.

Example

  | T := TTokeniser.Create('=','Rover=Dog|black|collie|7|||');
  | if T.AnyMore then Memo.Lines.Add('['+T.NextBit+']');   // Get Rover
  | T.SetSplitter('|');                                    // Change splitter
  | while T.AnyMore do begin                               // work through rest
  |   Memo.Lines.Add('['+T.NextBit+']');                   // bits about Rover
  | end;
  |
  | T.Initialise('~-.0123456789','12.34 up to 56.78');     // reuse tokeniser
  | while T.AnyMore do begin                               // result is just
  |   Memo.Lines.Add('['+T.NextBit+']');                   // the two numbers
  | end;
  |
  | T.Free;                                                // don't forget!

Splitter flexibility

'' ... whitespace (ascii <= 32)
'a' ... split if character a found
'abc' ... split if characters a or b or c found

'-...' ... prefix with minus sign to invert
eg '-0123456789' will split on the next non-digit

'~...' ... prefix with tilde to squash consecutive splits
eg '~' treats '        ' as a single run of whitespace
eg '~-0123456789' will break on any non-digit or run of non-digits
(Compare splitting '12.34     56.78' with and without ~)
~ must precede - if present
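
The comparison suggested above can be tried with a few lines of code. This is only a sketch: the unit name Tokeniser in the uses clause is an assumption (substitute whichever unit declares TTokeniser in your copy), and ShowBits is a helper made up for this illustration.

```pascal
program SquashDemo;
{$APPTYPE CONSOLE}

uses
  Tokeniser;  // ASSUMED unit name: use the unit that declares TTokeniser

// Hypothetical helper: print every bit the given splitter produces.
procedure ShowBits(const Splitter, Source: string);
var
  T: TTokeniser;
begin
  T := TTokeniser.Create(Splitter, Source);
  try
    while T.AnyMore do
      WriteLn('[', T.NextBit, ']');
  finally
    T.Free;
  end;
end;

begin
  WriteLn('Without ~');
  ShowBits('-0123456789', '12.34     56.78');   // every non-digit splits on its own
  WriteLn('With ~');
  ShowBits('~-0123456789', '12.34     56.78');  // a run of non-digits splits once
end.
```

Without the tilde, expect empty bits where the spaces sit side by side; with it, only the digit groups should come back.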

Notes

There is notionally 'more' when the last char is a splitter
Splitting ('1,2,3') with ',' gives 1 2 3
Splitting ('1,2,3,') with ',' gives 1 2 3 blank

Splitting only works on single characters, so to extract, say, the minimum from 'MAX=198 MIN=124 AVG=155' you have to split on '~= ', find the bit 'MIN', then take the next bit as your number.
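
A minimal sketch of that MIN lookup, using only the calls documented here. The unit name Tokeniser is an assumption, and MinValue is just a local name chosen for the example.

```pascal
program MinDemo;
{$APPTYPE CONSOLE}

uses
  Tokeniser;  // ASSUMED unit name: use the unit that declares TTokeniser

var
  T: TTokeniser;
  MinValue: string;
begin
  MinValue := '';
  // '~= ' splits on '=' or space and squashes consecutive splitters
  T := TTokeniser.Create('~= ', 'MAX=198 MIN=124 AVG=155');
  try
    while T.AnyMore do
      if T.NextBit = 'MIN' then
      begin
        // the bit straight after the label is its value
        if T.AnyMore then
          MinValue := T.NextBit;
        Break;
      end;
  finally
    T.Free;
  end;
  WriteLn('MIN is ', MinValue);  // MinValue stays empty if no MIN label was found
end.
```

The comparison with 'MIN' is case sensitive here; adjust it if your labels vary in case.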

Handy utility functions

Split String to array
Join Array to String

Function reference

function Join(const ItemCount:integer;const Splitter:string; const StrArray:array of string):string;
Stick the first ItemCount elements of the array together into one string, separated by Splitter.
eg join(2,'!',['foo','bar','fox']) --> 'foo!bar'
Converse of Split()
function Split(const aString,Splitter:string; var StrArray:array of string):integer;
Split the given string into an array
Splitter can be a simple character or more sophisticated
See TTokeniser.SetSplitter for details
The return value is the number of elements put into StrArray
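A short sketch of Split and Join used as a round trip. Assumptions: the unit name Tokeniser, that both functions live in that unit, and that Split fills the array from its first element onwards. What Split does when the array is too small is not stated here, so the array is made generously large.

```pascal
program SplitJoinDemo;
{$APPTYPE CONSOLE}

uses
  Tokeniser;  // ASSUMED unit name: use the unit that declares Split and Join

var
  Parts: array[0..7] of string;   // deliberately larger than needed
  Count, I: Integer;
begin
  // Split returns how many elements it put into Parts
  Count := Split('Dog|black|collie|7', '|', Parts);
  for I := 0 to Count - 1 do
    WriteLn(I, ': ', Parts[I]);

  // Join the same elements back together with a different separator
  WriteLn(Join(Count, ',', Parts));
end.
```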
function divide(const Source:String; const targets:array of string;var parts:array of string):string;
Divide the source up by a sequence of string tokens
Example
   srce := 'SELECT something FROM else ORDER BY sort';
   targetsarray := ['SELECT','FROM','WITH','ORDER','BY']; // Note: ORDER BY in two bits
   beforeselect := divide(srce,targetsarray,partsarray);
   fromclause   := partsarray[1];                         // 'cos FROM is targetsarray[1]
The function's return value is anything found before the first target
The results in parts match the targets: parts[n] is the bit that follows targets[n]
Whitespace is the lower level splitter
Case is ignored when looking for targets

constructor TTokeniser.Create;
default constructor
constructor TTokeniser.Create(const Splitter,aString:string);
create ready to go

function TTokeniser.AnyMore:boolean;
Return true if there is more string left. NB If the last splitting character was the last character of the source string then there is 'more' even though it is a blank.
Splitting ('1,2,3') with ',' gives 1 2 3
Splitting ('1,,3,') with ',' gives 1 blank 3 blank
procedure TTokeniser.Initialise(const Splitter,aString:string);
Set up string and splitter but don't do anything yet
function TTokeniser.isCharASplitter(var ix:integer):boolean;
Return true if the character at given index is a splitter
function TTokeniser.LastSplitChar:string;
Return the character actually used to split.
Example
  T := TTokeniser.create('-0123456789','12.34/56.789');
  N := T.NextBit;
  if T.LastSplitChar = '.' then N := N + '.'+T.NextBit;
  (do something with N as a floating point number)
function TTokeniser.NextBit(const Splitter:string):string;
Split again but with a different splitter
function TTokeniser.NextBit:string;
'Same again' split
function TTokeniser.Position:integer;
Report where the last split was found
-1 = not initialised or past end
0 = not started
1 = 1st character
eg after splitting 'foo+bar' with '+' this would return 4
function TTokeniser.Remainder:string;
Get the rest of the string. Note: this does not affect the index, so Remainder followed by NextBit will work fine.
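A small sketch of peeking with Remainder and then carrying on. The unit name Tokeniser is an assumption; everything else is as documented in this reference.

```pascal
program RemainderDemo;
{$APPTYPE CONSOLE}

uses
  Tokeniser;  // ASSUMED unit name: use the unit that declares TTokeniser

var
  T: TTokeniser;
begin
  T := TTokeniser.Create('=', 'Rover=Dog|black|collie');
  try
    WriteLn('Name: ', T.NextBit);        // first bit, split on '='
    WriteLn('Rest: ', T.Remainder);      // peek at what is left; index not moved
    WriteLn('Next: ', T.NextBit('|'));   // carries on from the same place
  finally
    T.Free;
  end;
end.
```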
procedure TTokeniser.reset;
Set index back to start
procedure TTokeniser.SetSplitter(const Splitter:string);
Set up splitter CHARACTER(S)
'' ... white space
'-' ... not white space (!)
'a' ... character 'a'
'abc' ... any of a,b or c
'-xyz' ... anything but x,y or z
Prefix with tilde (~) to treat consecutive splitters as one
eg '~o' on 'book' --> b k whereas just 'o' --> b blank k


The master version of this code is at www.eminent.demon.co.uk