String processing

Before we start, try out the following function and procedure:

function InputBox ( const Caption, Prompt, Default : string ) : string;

procedure ShowMessage ( const Text : string ) ;

String declaration

The typical type for working with text is the string:

var
  s : string;

The next picture shows how this (basic) type of string is stored in memory. The first element (index zero) holds the length of the string:

#4 'A' 'l' 'í' 'k' ? ? ? ? ? ?

There are more string types. The picture above shows exactly the shortstring: its element with index zero is one byte holding the length, so a shortstring can hold at most 255 characters. From the Pascal point of view the ansistring works the same way: we can address each letter as in an array, using square brackets for the index, starting from 1. The difference is that an ansistring variable is a pointer to dynamically allocated memory, and the length is stored as an integer in front of the first character rather than in element zero; the elements themselves are bytes. For the widestring, all elements are words (16 bits), and the UTF-16 encoding is used. The unicodestring type (Delphi 2009 and later) also uses UTF-16. For UTF-8, where characters take a different number of bytes, there is the utf8string type, but indexing by character is then more complicated (and slower). The string itself is a generic type: equal to shortstring in Delphi 1, to ansistring from Delphi 2 to Delphi 2007, and to unicodestring since Delphi 2009. For more information about strings, see the documentation of the Delphi vendor.

For compatibility with the C language family, the pchar type was introduced. This solution has all the problems of C: it is only a pointer to a null-terminated string (meaning that #0 is the last element of the string). You have to allocate the memory yourself while the program is running, and this type of string cannot contain any character with code zero, except the last one, which must be zero. In practice, any work with memory allocation and pointers can be hard to debug: if you point to a wrong address and overwrite, for example, part of the Delphi environment (the operating system should prevent overwriting parts of the system itself, which was possible in Windows 98), the only safe solution is to restart the computer, or at least to log the user out and log in as another user if the computer cannot be restarted (a server).

Older versions of Delphi reserved memory for the maximum size of the string, which consumed a lot of memory; newer versions size the string dynamically, which is slower. For this reason it is recommended to set the maximum length of the string (at most 255). This is similar to an array declaration, but we set only the upper limit:

var
  s : string[20];

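This is the maximum length; the actual length may vary from zero to this maximum, leaving the rest of the reserved memory unused. The length is counted in elements (bytes for the shortstring and ansistring, 16-bit words for the widestring and unicodestring), which equals the number of characters only when every character fits into one element.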
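The layout described above can be checked with a small console program. This is only a sketch: the program name and the sample text are made up for illustration, and the text is written without diacritics so that the result does not depend on the encoding of the source file.

```pascal
program ShortStringDemo;
{$APPTYPE CONSOLE}
var
  s: string[20];              // shortstring with a maximum of 20 characters
begin
  s := 'Alik';
  WriteLn(Length(s));         // actual length: 4
  WriteLn(Ord(s[0]));         // the length byte at index zero: 4
  WriteLn(SizeOf(s));         // reserved memory: 20 characters + 1 length byte = 21

  // Anything beyond the declared maximum is silently cut off.
  s := s + ' is a very clever little dog';
  WriteLn(Length(s));         // 20
  WriteLn(s);
end.
```

Note that s[0] is accessible only for the shortstring; for the other string types use Length.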

If the "range check" is switched on during compilation ({$R+}), an index outside the limits is detected and reported as an error instead of touching memory outside the variable; but the program becomes slower. The default is no range check (see the compiler options).

If a string or array variable is declared as local and you write outside its index limits, you can overwrite the return addresses of the running program. The program then jumps to some strange address in memory, probably in a different application or in the system; the typical result is a system error.
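The effect of the range check can be tried in a console program like the following sketch. The array, the index value and the message text are illustrative assumptions; the conditional directive on the second line only matters if you compile with Free Pascal instead of Delphi.

```pascal
program RangeCheckDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$R+}                          // switch the range check on for this file
uses
  SysUtils;                    // declares ERangeError
var
  a: array[1..5] of Integer;
  i: Integer;
begin
  i := 6;                      // one position behind the end of the array
  try
    a[i] := 1;                 // with $R+ this raises ERangeError
    WriteLn('Written outside the array!');
  except
    on E: ERangeError do
      WriteLn('Range check error caught');
  end;
end.
```

With {$R-} the assignment is not checked and silently writes outside the array, which is exactly the situation described above.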

String joining

We can assign a new value to a string with the standard assignment command ( := ). Let us have the variables a, b and c, declared as string. Then we can use:

a := b;

To join two strings, write:

a := b + c;

The result (a) is created by attaching the second string after the end of the first string; this operation is usually called concatenation.

Repetition: a value can be assigned by writing the text between apostrophes:

a := 'Alík';

Special characters can be written either by their code (with the "#" (hash) prefix) or, for control characters (mainly codes 1 to 26), with the ^ sign. Both are written outside the apostrophes: write the closing apostrophe, then the required character without spaces, then another apostrophe (again, no spaces) and the rest of the string. For example, the tab character (used to create columns, often in text prepared for Excel) has code 9; we can write

b := '3,29'#9'2,56'#9'4,12';

This (in fact, a constant definition) cannot be split across lines in any way, so we have to forget the C method with a backslash at the end of the line. If you really need to write a very long string, you have to trust Borland and use the + sign (plus) to write it as an expression. The compiler usually joins constant strings during compilation, as an optimization.
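The notations above can be compared in a short console program. It is a sketch only; the variable names follow the text, and the long sentence is made up for illustration.

```pascal
program StringConstDemo;
{$APPTYPE CONSOLE}
var
  a, b, c: string;
begin
  b := '3,29'#9'2,56'#9'4,12';     // tab written by its code
  c := '3,29'^I'2,56'^I'4,12';     // the same tab written as Ctrl+I
  if b = c then
    WriteLn('Both notations give the same string');
  WriteLn(Length(b));              // 12 visible characters + 2 tabs = 14

  // A long constant is split across lines with the + sign.
  a := 'This is the first part of a long text, ' +
       'and this is the second part.';
  WriteLn(a);
end.
```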

String comparing

You can compare two strings simply by

a < b

This compares the strings character by character until it finds a difference or reaches the end of one of the strings; in the latter case the shorter string is the lower one. Characters are compared by their codes, so 'A'<'B', but also 'Z'<'a', because the whole lowercase alphabet lies above the uppercase one. For linguistically correct comparison we need the Windows API; we will return to this later.
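A few comparisons, as a console sketch with made-up sample words, show the rules described above:

```pascal
program CompareDemo;
{$APPTYPE CONSOLE}
var
  a, b: string;
begin
  a := 'Zebra';
  b := 'apple';
  WriteLn(a < b);      // TRUE: 'Z' (code 90) is below 'a' (code 97)

  a := 'abc';
  b := 'abcd';
  WriteLn(a < b);      // TRUE: equal up to the end of the shorter string

  a := 'abc';
  b := 'abD';
  WriteLn(a < b);      // FALSE: 'c' (code 99) is above 'D' (code 68)
end.
```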

The "copy" function

For example

s:=copy('Good day to all',6,3);

copies part of the first parameter, starting at the 6th character, with a length of three characters ('day'). More typically, these positions hold a variable name or an expression rather than constants.

If the rest of the string is shorter than the requested length, only the rest of the string is returned. For example, if the length of the string is declared as 30, then the command

s:=copy(s1,4,999);

copies everything from the 4th character to the end of the string (or nothing, if there are fewer than 4 characters).
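The behaviour at the end of the string can be verified with the following console sketch (the positions 13 and 20 are chosen only for illustration):

```pascal
program CopyDemo;
{$APPTYPE CONSOLE}
var
  s1, s: string;
begin
  s1 := 'Good day to all';

  s := Copy(s1, 6, 3);       // three characters from position 6
  WriteLn(s);                // day

  s := Copy(s1, 13, 999);    // more than is available: only the rest is returned
  WriteLn(s);                // all

  s := Copy(s1, 20, 5);      // start behind the end: empty string
  WriteLn(Length(s));        // 0
end.
```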

The "length" function

returns the actual length of the string:

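l := length('Good day to all');

- returns 15. For most Pascal string types, the length is stored together with the string in memory (at its beginning) and is simply read; for the pchar, the value has to be computed by a function that scans for the terminating #0. For a UTF-8 string, the length is the number of bytes, so counting the characters also requires a scan.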

The "pos" function

Looks "for a needle in a haystack": the first parameter is the substring we are searching for, the second is the string to search in, usually a variable. We can look for a single character as well as for a piece of text, for example:

i:=pos('txt', s);

If the substring (the first parameter) occurs in the string (the second), the function returns the position of its first occurrence, otherwise zero. The function is often used just to test for any occurrence: if it returns zero, there is none. For example, to look for the substring a, use:

if pos(a,s)>0 then showmessage('We have it!');
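The following console sketch combines pos with copy; the file name 'report.txt' is just sample data.

```pascal
program PosDemo;
{$APPTYPE CONSOLE}
var
  s: string;
  i: Integer;
begin
  s := 'report.txt';

  WriteLn(Pos('txt', s));      // 8: the substring starts at the 8th character
  WriteLn(Pos('doc', s));      // 0: not found

  // Typical combination: find a separator, then copy the part in front of it.
  i := Pos('.', s);            // 7
  if i > 0 then
    WriteLn(Copy(s, 1, i - 1)); // report
end.
```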

The "delete" procedure

This is a procedure; a procedure returns no result, so it cannot be part of an expression.

It deletes characters from a string variable (the first parameter), starting at the given position (the second parameter); the third parameter tells how many characters to delete. If that is more than what remains, everything to the end of the string is deleted. The string has to be a variable; the other two parameters can be numbers, variables or any integer expressions.

For example:

s := 'Good day to all';
delete(s,6,7);

will leave s with the value of 'Good all'.
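The same call, together with the "too many characters" case, as a console sketch:

```pascal
program DeleteDemo;
{$APPTYPE CONSOLE}
var
  s: string;
begin
  s := 'Good day to all';
  Delete(s, 6, 7);         // removes 'day to ' (positions 6 to 12)
  WriteLn(s);              // Good all

  Delete(s, 5, 999);       // count too large: deletes to the end of the string
  WriteLn(s);              // Good
  WriteLn(Length(s));      // 4
end.
```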

Task: Create a form with two edits and a button. Assume that the user writes a sentence (a few words separated by spaces) into the first edit. When the button is pressed, move the first word of the sentence to the second edit (and erase it from the sentence, including the space).

The "insert" procedure

... inserts a string (the first parameter; it can be a string variable, text in apostrophes, or an expression) into the variable given as the second parameter, at the position given by the third parameter (an integer variable or expression). The characters from that position onward move behind the inserted text:

s:='Good day to all';
insert('very',s,6);

results in 'Good veryday to all' (note the missing space).
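A console sketch of the same call, followed by a variant that includes the space in the inserted text:

```pascal
program InsertDemo;
{$APPTYPE CONSOLE}
var
  s: string;
begin
  s := 'Good day to all';
  Insert('very', s, 6);
  WriteLn(s);              // Good veryday to all

  s := 'Good day to all';
  Insert('very ', s, 6);   // the trailing space is part of the inserted text
  WriteLn(s);              // Good very day to all
end.
```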

The Memo component

If you need to work with simple text (multiple lines, but no formatting), you can use the Memo component. In this component, each line is a separate string (in fact, the lines are stored in the Lines property, which is of the TStrings type).

Comments:

TMemo.ScrollBars
- default: ssNone, recommended: ssBoth.
- if you have no vertical scrollbar and there are too many lines, you can still move through them with the up/down cursor keys.
- if you have no horizontal scrollbar, then longer lines loaded into the memo are broken to fit its width. There is no way back from this, even if the width changes later (this can happen while the program is running, for example by the "align-bottom" activation). The breaking of the lines is permanent.

TMemo.Lines
- the main property, giving access to most of the memo functions.
Use it for direct access to the strings:
- - Memo1.Lines[5], Memo1.Lines[j]; strictly speaking, the name of the array property should be included, as in Memo1.Lines.Strings[5], but Strings is the default property, which makes the first variant possible.
- the lines are numbered from zero; the actual number of lines is in TMemo.Lines.Count (so the last index is Count - 1).

- - TMemo.Lines.Add
- The structure of the lines is fully dynamic. The Add method adds a new line at the end; Insert (for example, TMemo.Lines.Insert) adds a line at any position (the first parameter is the index of the new string, the second is the string itself; if the first parameter is bigger than Count, the string is added at the end of the structure, as the last line). The Delete method deletes a line from the text (the following lines move up).

- - TMemo.Lines.Clear
- clears the whole content of the memo. Often used at the start, followed by Add.

- - TMemo.Lines.LoadFromFile
- loads completely new content from a file. A text file is assumed. If there is no horizontal scrollbar, the lines may be rewrapped immediately.

- - TMemo.Lines.SaveToFile
- saves the whole content to a file.

For the last two methods, we can use dialogs to choose the FileName parameter.
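The methods listed above can be tried without a form. The following sketch assumes a TStringList (from the Classes unit; newer Delphi versions may need System.Classes) as a stand-in for Memo1.Lines, because both are descendants of TStrings and share these methods. The file name lines_demo.txt is hypothetical, and the conditional directive only matters for Free Pascal.

```pascal
program LinesDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes;
var
  Lines: TStringList;          // stand-in for Memo1.Lines
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Clear;
    Lines.Add('first');        // index 0
    Lines.Add('third');        // index 1
    Lines.Insert(1, 'second'); // 'third' moves to index 2
    Lines.Delete(0);           // removes 'first', the rest moves up

    // Indexes run from 0 to Count - 1.
    // Unlike the memo, TStringList raises an exception for an index out of range.
    for i := 0 to Lines.Count - 1 do
      WriteLn(i, ': ', Lines[i]);

    Lines.SaveToFile('lines_demo.txt');    // hypothetical file name
    Lines.Clear;
    Lines.LoadFromFile('lines_demo.txt');
    WriteLn(Lines.Count);      // 2
  finally
    Lines.Free;
  end;
end.
```

In a form, replace Lines by Memo1.Lines and drop the Create and Free calls, because the memo owns its own list.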


TOpenDialog.Execute, TSaveDialog.Execute
- the easy way to select where to save to or where to load from. Both components are on the Dialogs tab of the component palette. The icon placed on the form is only symbolic; it is not visible when the program runs. To use a dialog, add a button and call the Execute method in its click handler. Execute is a function: the result is true if the user presses Enter or OK (this is possible only after a valid selection), or false if Cancel or the ESC key is pressed. After success, the FileName property of the dialog is set. To run the appropriate action only after the user chooses OK, we should test the Execute result; so, the typical phrase is:

if OpenDialog1.Execute
then Memo1.Lines.LoadFromFile(OpenDialog1.FileName);

The same applies to saving. Before running the program, it is important to set the dialog properties:

The Filter property
Double-click the property or click the "dots" to open the filter editor. The universal filter is *.*; while no filter is set, no files will be visible in the dialog (load, save). For our case (text files), we can write:

Text files *.txt
Initialization files *.ini
All files *.*

The DefaultExt property
When filled in, it is used if the chosen name contains no dot ( . ) character. For example, if set to "txt" (written without the leading dot), "list_of_students" will be saved as "list_of_students.txt", while "list.of.students" will be saved without an added extension.

The InitialDir property
If set, the dialog opens in this directory. It is left empty in most applications, because we do not know the user's directory structure.

Task: Create a form with a memo, a save dialog, an open dialog, and buttons for load and save, and try to edit some text files.