Forth Multiprocessing


C. H. Ting and T. G. Tsuei, August 1998


1. Introduction

Generally, computers hate each other. You can connect all kinds of dumb devices to a computer with an RS232 cable: terminals, printers, mice, disk drives, and so on. But if you try to connect two computers with an RS232 cable, they will either NAK each other to death or not talk at all. This is because all computers are programmed to be masters: they issue commands to other devices and send large amounts of data to them. They are not designed to take orders from other computers smarter than themselves.

How can we get computers to talk to each other? The hard way is to have an operating system with network capability. Computers are connected to the network and talk according to the protocols defined in the network. However, network operating systems are monsters that overpower any application intended to be shared among computers.

Forth is a language eminently suitable for computers to talk to each other. Networks can be constructed easily if the member computers have Forth kernels. Because Forth is a high-level, machine-independent language, different computers can be connected together regardless of the CPUs in the member computers.

2. Computer-Human Interface

Forth is the best computer-human interface. Its text interpreter accepts a line of commands and then executes the commands in sequence. Its compiler builds new commands and associates them with lists of existing commands. These are the simplest ways for humans to make computers do useful things. In natural languages, complicated syntax and grammatical rules can be instilled for human consumption, but they are not necessary when we have to deal with computers. In fact, syntax and grammatical rules are very difficult for the computer to handle satisfactorily.

Most implementations of the Forth text interpreter are designed to interact with the user. Therefore, characters typed on the computer keyboard are echoed back to the display screen, and after a command line is processed, a prompt for the next line or an error message is also sent to the screen to continue the interaction with the user.

In most Forth systems, the following rules are followed:

1. Each character received is echoed back.

2. Backspace causes the last entered character to be deleted.

3. The command line is processed when a carriage return is received.

4. "OK" indicates command line executed correctly.

5. "?" indicates an unknown command.
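The five rules above can be sketched as a toy interpreter loop. This is only an illustration in Python, not code from any Forth system: the two-word dictionary `WORDS`, the function names, and the exact spacing of the "OK" and "?" messages are all assumptions made for the example.

```python
BACKSPACE = "\b"
CARRIAGE_RETURN = "\r"

# Hypothetical word set; a real Forth dictionary is far larger.
WORDS = {
    "+": lambda s: s.append(s.pop() + s.pop()),
    "DUP": lambda s: s.append(s[-1]),
}

def interpret(line, stack):
    """Execute one command line and return the response for the screen."""
    for token in line.split():
        if token in WORDS:
            WORDS[token](stack)
        else:
            try:
                stack.append(int(token))
            except ValueError:
                return f" {token} ?\n"      # rule 5: unknown command
    return " OK\n"                          # rule 4: line executed correctly

def interactive_session(keys):
    """Feed keystrokes to the interpreter; return (screen text, data stack)."""
    stack, line, screen = [], [], []
    for key in keys:
        if key == BACKSPACE:                # rule 2: delete the last character
            if line:
                line.pop()
                screen.append("\b \b")
        elif key == CARRIAGE_RETURN:        # rule 3: process the command line
            screen.append(interpret("".join(line), stack))
            line.clear()
        else:                               # rule 1: echo each character
            line.append(key)
            screen.append(key)
    return "".join(screen), stack

screen, stack = interactive_session("2 4\b3 +\r")
print(repr(screen), stack)
```

Here the typist enters `2 4`, backspaces over the `4`, and finishes the line as `2 3 +`, so the interpreter adds 2 and 3.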

3. Computer-Computer Interface

When we link computers together, the first thing we have to do is to define precisely how they are to talk to each other. This is the protocol of the computer-computer interface. The protocol may be very simple or very elaborate. Most often the protocol defines how computers pass messages to one another. How computers process these messages must be programmed into the applications which handle the inter-computer communication, and depends on how the computers are linked: point-to-point, or in networks of different topologies.

The computer-computer interface is different from the computer-human interface. The most important differences are that generally the input characters are not echoed, it is not necessary to send carriage-returns and line-feeds, and the human readable prompt and error messages must be packaged into the messages sent from one computer to another.

The Forth text interpreter can be easily modified for point-to-point computer-computer interface. It can be the simplest way to connect computers together. Here are the most important modifications to get Forth computers to talk to each other through a serial RS232 cable:

1. Computers only send Forth commands to one another.

2. Characters are not echoed.

3. Command lines are not acknowledged.

4. Prompts like 'OK' and error indicators like '?' must be suppressed.

5. 'OK' and '?' are optionally defined as vectored words so that a command expecting these responses can handle the responses by executing them properly.

With these simple guidelines, two computers can talk to each other smoothly, and many computers can be connected to form a network. There is no restriction as to what kind of Forth commands are sent from one computer to another. Because colon definitions can be sent as well, the network is adaptive and its capability is extensible. The network can be implemented on any multiprocessor architecture with any network topology.
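A minimal sketch of a node following the five modifications above, simulated in Python rather than written in Forth. The class `Node`, its `receive` method, and the `on_ok` / `on_error` hooks are hypothetical names chosen for the illustration; the hooks stand in for the vectored 'OK' and '?' words and do nothing by default.

```python
class UnknownWord(Exception):
    """Raised when a token is neither a known word nor a number."""

class Node:
    def __init__(self):
        self.stack = []
        # Hypothetical starting dictionary.
        self.words = {
            "+": lambda node: node.stack.append(node.stack.pop() + node.stack.pop()),
            "*": lambda node: node.stack.append(node.stack.pop() * node.stack.pop()),
            "DUP": lambda node: node.stack.append(node.stack[-1]),
        }
        # Vectored responses: silent unless a sender installs a handler.
        self.on_ok = lambda: None
        self.on_error = lambda token: None

    def execute(self, tokens):
        for token in tokens:
            if token in self.words:
                self.words[token](self)
            else:
                try:
                    self.stack.append(int(token))
                except ValueError:
                    raise UnknownWord(token) from None

    def receive(self, line):
        """Process one command line: no echo, no prompt, no acknowledgement."""
        tokens = line.split()
        try:
            if len(tokens) >= 3 and tokens[0] == ":" and tokens[-1] == ";":
                # A colon definition extends this node's dictionary.
                name, body = tokens[1], tokens[2:-1]
                self.words[name] = lambda node, body=body: node.execute(body)
            else:
                self.execute(tokens)
        except UnknownWord as err:
            self.on_error(str(err))
        else:
            self.on_ok()

node = Node()
node.receive(": SQUARE DUP * ;")        # the sender teaches the node a new word
node.receive("7 SQUARE")                # silent: nothing is sent back

log = []
node.on_ok = lambda: log.append("OK")   # re-vector the responses
node.on_error = lambda token: log.append(token + " ?")
node.receive("1 2 +")
node.receive("BOGUS")
print(node.stack, log)
```

The first two lines are processed silently. Only after the sender re-vectors the two hooks do the responses become visible, which corresponds to guideline 5.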

4. eForth for Multiprocessing

The eForth model was designed with multiprocessing in mind. Hence the echoing of characters and the 'OK' prompting are vectored through user variables, and it is very easy to suppress character echo and the 'OK' prompt. The command FILE makes exactly these changes so that the host computer can download files to the eForth system. It also puts the eForth system into the proper mode as a node in a computer network.

Assume we are building a computer network with only point-to-point connections. To make an eForth system a true node in such a network, the eForth system must have a multitasking system supporting multiple serial ports, with each port controlled by a concurrent task in the multitasker.

To make this computer network do useful work, each eForth node must have the required word set for the application. This application word set can be compiled from the local disk, or can be downloaded through the network. After all the nodes are thus initialized, they can send commands to each other and divide the work load among themselves. If the results are to be collected by a node handling the output, they can be passed through the net in the form of Forth commands. Small amounts of intermediate numerical data can be passed on the stacks. Large amounts of data can be passed between two computers using DUMP-like commands.

The central problem and challenge in programming a multiprocessing system is to divide the application so that the work load is shared evenly among the processing nodes. There is no set rule on how this division of labor can be accomplished. It all depends on the application, the network configuration, and the algorithm used in the application.

Forth provides a generalized framework to program a multiprocessing system. After an application is solved on a single computer, the application code can be loaded on all the nodes in the multiprocessing system. Commands are then sent to nodes to execute subtasks which can be executed in parallel. Results are collected. Nodes are synchronized if necessary. The next batch of commands is sent out for parallel processing. These activities are repeated until the application is solved completely.
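The send, collect, and synchronize cycle described above can be sketched as follows. This is a Python simulation under stated assumptions: the words `SUMSQ` and `RESULT`, the classes `Worker` and `Master`, and the sum-of-squares workload are all invented for the illustration, and the simulated nodes run one after another rather than truly in parallel.

```python
class Worker:
    """A simulated node: executes one command line, replies with a command line."""
    def __init__(self):
        self.stack = []
        self.words = {"SUMSQ": self.sumsq}   # hypothetical application word

    def sumsq(self):
        # ( lo hi -- sum ) sum of i*i for lo <= i < hi
        hi = self.stack.pop()
        lo = self.stack.pop()
        self.stack.append(sum(i * i for i in range(lo, hi)))

    def receive(self, line):
        for token in line.split():
            if token in self.words:
                self.words[token]()
            else:
                self.stack.append(int(token))
        # The result travels back as a Forth-style command, not as a status report.
        return f"{self.stack.pop()} RESULT"

class Master:
    def __init__(self, workers):
        self.workers = workers
        self.total = 0
        self.pending = 0

    def receive(self, line):
        value, word = line.split()
        if word != "RESULT":
            raise ValueError(f"unexpected command {word}")
        self.total += int(value)
        self.pending -= 1

    def run_batch(self, lo, hi):
        """Split [lo, hi) into one subtask per worker and gather the results."""
        count = len(self.workers)
        step = (hi - lo + count - 1) // count
        self.pending = count
        for k, worker in enumerate(self.workers):
            start = min(lo + k * step, hi)
            stop = min(start + step, hi)
            self.receive(worker.receive(f"{start} {stop} SUMSQ"))
        if self.pending != 0:                # synchronization point
            raise RuntimeError("a node has not reported back")
        return self.total

master = Master([Worker() for _ in range(4)])
print(master.run_batch(0, 10))
```

With four workers the range 0..9 is split into the subtasks `0 3 SUMSQ`, `3 6 SUMSQ`, `6 9 SUMSQ`, and `9 10 SUMSQ`. A real system would send these over separate serial ports and wait for the replies.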

If the application is too big to be loaded on each and every node, the application can be parceled out so that each node is loaded with only a part of the application. Now the node issuing a command must be sure that the receiving node has that command in its dictionary. This will require more careful planning and scheduling of commands, and it is part of the discipline in programming the multiprocessing system for a specific application.
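One way to keep track of which node holds which commands is for the issuing node to record the names it has downloaded to each receiver. The sketch below is a Python illustration only; `Scheduler`, `download`, `send`, and `words_defined` are hypothetical names, and the scan for colon definitions is deliberately naive (it would be fooled by a `:` inside a comment or string).

```python
def words_defined(source):
    """Return the names defined by colon definitions in Forth source text."""
    tokens = source.split()
    return {tokens[i + 1] for i, token in enumerate(tokens[:-1]) if token == ":"}

class Scheduler:
    def __init__(self):
        self.loaded = {}    # node name -> set of word names downloaded to it
        self.outbox = []    # (node, line) pairs that would go out on the wire

    def download(self, node, source):
        """Send source text to a node and remember which words it now has."""
        self.loaded.setdefault(node, set()).update(words_defined(source))
        self.outbox.append((node, source))

    def send(self, node, word, *args):
        """Send a command only if the receiving node is known to have it."""
        if word not in self.loaded.get(node, set()):
            raise LookupError(f"{node} has no definition of {word}")
        line = " ".join([*map(str, args), word])
        self.outbox.append((node, line))
        return line

scheduler = Scheduler()
scheduler.download("A", ": SQUARE DUP * ;")
print(scheduler.send("A", "SQUARE", 7))
```

Sending `SQUARE` to a node that never received the definition raises `LookupError` on the issuing side, before anything reaches the wire.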

5. An Example

An integrated wafer transfer system is constructed with 4 robot arms and two wafer transfer units. All six units are connected to a host computer, which schedules and coordinates the operations of all the units. Each unit has two RS232 ports: one is used to connect to the host computer, and the other is used optionally as a local diagnostic port.

It is a typical centralized multiprocessor system.




Step  Left Arm                      Wafer Transfer System  Right Arm
1     Pod Present (empty cassette)
2                                                          Pod Present (full cassette)
3                                                          Load Cassette
4                                   Cassette Present
5                                   Load Wafers
6                                                          Unload Cassette
7                                   Cassette Removed
8     Load Cassette
9                                   Cassette Present
10                                  Unload Wafers
11    Unload Cassette
12                                  Cassette Removed
13                                                         Pod Removed (empty cassette)
14    Pod Removed (full cassette)



1. The first pod placed on an arm platform must have an empty cassette; placing it starts a transfer cycle.

2. A pod can be removed from its arm platform after the cassette is returned. Which pod is removed first is not important.

5.1 A Centralized Multiprocessing System

In our current design, there is a host computer controlling both the arms and the Wafer Transfer System (WTS). When a pod is placed on an arm platform, the arm sends a PodPresent message to the host computer through the serial RS232 line. When the host computer receives the second PodPresent message from the second arm, it starts to issue commands to the arm to load the full cassette on the Wafer Transfer System. When the cassette is placed on the transfer platform on the WTS, the WTS sends a CassettePresent message to the host computer. The host computer then sends the command to load wafers from the cassette to the WTS, and then commands the first arm to unload the cassette. After the cassette is removed from the WTS platform, the WTS sends the CassetteRemoved message to the host. The host then commands the second arm to load an empty cassette on the WTS platform. Then the WTS unloads wafers into the empty cassette, and the second arm unloads the full cassette.

The host computer is the master. The arms and the WTS are slaves. All the intelligence resides in the host computer. The arms and the WTS merely report status and carry out commands issued by the host computer. Programming the host computer becomes a project by itself. It is a fairly substantial program because of the multiple serial communication links to the slaves and also the graphical user interface required so that unsophisticated technicians can operate the system.
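The host-side logic described in this section can be sketched as a small event handler: status messages come in, commands go out. This Python sketch is an assumption-laden illustration, not the actual host program. The class `Host`, the phase names, and the `PodRemoved` message spelling are invented here; which arm holds the full cassette is passed in as a parameter rather than detected.

```python
class Host:
    """Central scheduler: turns status messages into commands for the units."""
    def __init__(self, full_arm, empty_arm):
        self.full_arm = full_arm      # arm whose pod holds the full cassette
        self.empty_arm = empty_arm    # arm whose pod holds the empty cassette
        self.pods = set()             # arms that currently report a pod
        self.phase = "waiting"        # hypothetical phase names

    def handle(self, source, message):
        """Return the list of (destination, command) pairs to send next."""
        key = (self.phase, message)
        if message == "PodPresent":
            self.pods.add(source)
            if self.phase == "waiting" and len(self.pods) == 2:
                self.phase = "loading_full"
                return [(self.full_arm, "LoadCassette")]
        elif message == "PodRemoved":
            self.pods.discard(source)
            if self.phase == "done" and not self.pods:
                self.phase = "waiting"
        elif key == ("loading_full", "CassettePresent"):
            self.phase = "removing_full"
            return [("WTS", "LoadWafers"), (self.full_arm, "UnloadCassette")]
        elif key == ("removing_full", "CassetteRemoved"):
            self.phase = "loading_empty"
            return [(self.empty_arm, "LoadCassette")]
        elif key == ("loading_empty", "CassettePresent"):
            self.phase = "removing_empty"
            return [("WTS", "UnloadWafers"), (self.empty_arm, "UnloadCassette")]
        elif key == ("removing_empty", "CassetteRemoved"):
            self.phase = "done"
        return []

host = Host(full_arm="Right", empty_arm="Left")
messages = [
    ("Left", "PodPresent"), ("Right", "PodPresent"),
    ("WTS", "CassettePresent"), ("WTS", "CassetteRemoved"),
    ("WTS", "CassettePresent"), ("WTS", "CassetteRemoved"),
    ("Right", "PodRemoved"), ("Left", "PodRemoved"),
]
for source, message in messages:
    for destination, command in host.handle(source, message):
        print(f"host -> {destination}: {command}")
```

Even in this reduced form, every decision lives in the host; the arms and the WTS only report status and obey.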

5.2 A Less-Centralized Multiprocessing System

The arms and the WTS are not dummies. They have lots of intelligence built in to handle mechanical motions and also serial communication channels to both the host computer and to human operators. Since each unit has two serial ports, it is possible to connect them directly to perform the required function. The two serial ports on the WTS can be connected to the two arms. With some small modifications in the existing software, an autonomous multiprocessing system can be built to satisfy all the performance requirements.

The centralized multiprocessing model could be implemented on the Wafer Transfer System, because it can communicate with both arms, while the arms do not communicate directly. Assume that the Wafer Transfer System has two serial ports: the left port is connected to the left arm and the right port is connected to the right arm. We can build a Finite State Machine (FSM) in the Wafer Transfer System, which has the following states:


0. Idle, no pod on either arm platform.

1. Pod on right arm platform.

2. Pod on left arm platform.

3. Transfer wafers from left to right.

4. Transfer wafers from right to left.

5. Pods on both arm platforms.

6. One pod left on the right arm platform.

7. One pod left on the left arm platform.

In State 0, a pod placed on one of the arm platforms moves the FSM to State 1 or State 2, depending on which platform the pod is placed on. In State 1, the second pod placed on the left platform moves the FSM to State 4. Similarly, in State 2, the second pod placed on the right platform moves the FSM to State 3. After the wafers are transferred in either State 3 or 4, the FSM moves to State 5, 6, or 7, depending on whether the empty pod is removed or not. In State 5, if one pod is removed, the FSM moves to State 6 or 7. Removing the last pod moves the FSM back to State 0.

States 1 and 2 must be distinguishable from States 6 and 7, respectively, to avoid switch bouncing problems and to make sure that pods can be removed unintentionally and put back on the platform without causing the FSM to get into the wrong queue.
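The state machine above can be written as a transition table. The sketch below is in Python for illustration; the event names are invented, and several details are assumptions rather than statements from the text: a finished transfer always lands in State 5 first, a lone pod removed in State 1 or 2 returns to State 0, a pod put back in State 6 or 7 returns to State 5, and any event not listed for a state is ignored.

```python
# (state, event) -> next state. State numbers follow the list above.
TRANSITIONS = {
    (0, "pod_on_right"): 1,
    (0, "pod_on_left"): 2,
    (1, "pod_on_left"): 4,      # second pod arrives: transfer right to left
    (2, "pod_on_right"): 3,     # second pod arrives: transfer left to right
    (3, "transfer_done"): 5,
    (4, "transfer_done"): 5,
    (5, "pod_off_left"): 6,     # one pod left on the right platform
    (5, "pod_off_right"): 7,    # one pod left on the left platform
    (6, "pod_off_right"): 0,
    (7, "pod_off_left"): 0,
    # Assumptions not spelled out in the text:
    (1, "pod_off_right"): 0,    # lone pod removed before a cycle starts
    (2, "pod_off_left"): 0,
    (6, "pod_on_left"): 5,      # pod put back after the transfer: no new cycle
    (7, "pod_on_right"): 5,
}

def step(state, event):
    """Return the next state; events not listed for a state are ignored."""
    return TRANSITIONS.get((state, event), state)

def run(events, state=0):
    for event in events:
        state = step(state, event)
    return state

cycle = ["pod_on_left", "pod_on_right", "transfer_done",
         "pod_off_right", "pod_off_left"]
print(run(cycle))
```

Because States 6 and 7 are separate from States 1 and 2, a pod that is lifted and put back after the transfer returns the machine to State 5 instead of starting a second transfer.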

5.3 A Distributed Multiprocessing System

In a fully distributed multiprocessing system, all three sub-systems (the left arm, the right arm, and the WTS) work at the same level as peers. However, the connections are the same as in the Less-Centralized System: the WTS has its right serial port connected to the right arm and its left serial port connected to the left arm. They send commands to each other. The commands need to be enhanced so that they effectively coordinate the actions in each unit with the other units.

The events occurring in sequence are as follows:

1. Left Arm detects an empty cassette, and sends a CassettePresent command to WTS.

2. WTS re-vectors the CassettePresent command to one that moves the cassette from Right Arm to Left Arm.

3. Right Arm detects a full cassette, and sends a CassettePresent command to WTS.

4. WTS executes new CassettePresent command, and sends LoadCassette command to Right Arm.

5. Right Arm loads cassette on WTS platform, and then sends LoadWafer command to WTS.

6. WTS loads wafers, and then sends UnloadCassette command to Right Arm. After wafers are loaded, WTS sends LoadCassette command to Left Arm.

7. Right Arm removes cassette from WTS.

8. Left Arm loads an empty cassette on WTS platform, and then sends UnloadWafer command to WTS.

9. WTS unloads wafers into the empty cassette, and then sends UnloadCassette command to Left Arm.

10. Left Arm removes the full cassette from WTS platform.

11. WTS restores the original CassettePresent command.

12. Left Arm waits until the full cassette is removed. Right Arm waits until the empty cassette is removed.

13. The system returns to idle state.



All the synchronization and coordination are achieved by sending commands from one node to another. Status reporting is not necessary. Most multiprocessing problems can be solved this way.
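The event sequence above can be simulated with three peer nodes that only exchange commands. This Python sketch is an illustration under assumptions: the classes `Network`, `Arm`, and `WTS`, the message queue, and the way an arm chooses between LoadWafer and UnloadWafer (by the cassette it holds) are invented for the example. The WTS also restores its CassettePresent vector as soon as it has issued the last command, since in this scheme nothing reports the cassette's removal back to it.

```python
from collections import deque

class Network:
    """Delivers queued commands one at a time and records a trace."""
    def __init__(self):
        self.nodes, self.queue, self.trace = {}, deque(), []

    def add(self, node):
        self.nodes[node.name] = node
        node.net = self

    def send(self, sender, dest, command):
        self.queue.append((sender, dest, command))

    def run(self):
        while self.queue:
            sender, dest, command = self.queue.popleft()
            self.trace.append(f"{sender} -> {dest}: {command}")
            self.nodes[dest].words[command](sender)

class Arm:
    def __init__(self, name):
        self.name = name
        self.cassette = None
        self.words = {"LoadCassette": self.load, "UnloadCassette": self.unload}

    def detect(self, cassette):
        self.cassette = cassette                      # "empty" or "full"
        self.net.send(self.name, "WTS", "CassettePresent")

    def load(self, sender):
        follow_up = "LoadWafer" if self.cassette == "full" else "UnloadWafer"
        self.net.send(self.name, "WTS", follow_up)

    def unload(self, sender):
        # The cassette comes back in the opposite condition.
        self.cassette = "empty" if self.cassette == "full" else "full"

class WTS:
    name = "WTS"

    def __init__(self):
        self.words = {"CassettePresent": self.first_cassette,
                      "LoadWafer": self.load_wafer,
                      "UnloadWafer": self.unload_wafer}

    def first_cassette(self, sender):
        self.empty_arm = sender
        self.words["CassettePresent"] = self.second_cassette   # re-vector

    def second_cassette(self, sender):
        self.full_arm = sender
        self.net.send(self.name, self.full_arm, "LoadCassette")

    def load_wafer(self, sender):
        self.net.send(self.name, self.full_arm, "UnloadCassette")
        self.net.send(self.name, self.empty_arm, "LoadCassette")

    def unload_wafer(self, sender):
        self.net.send(self.name, self.empty_arm, "UnloadCassette")
        self.words["CassettePresent"] = self.first_cassette    # restore vector

net, left, right, wts = Network(), Arm("Left"), Arm("Right"), WTS()
for node in (left, right, wts):
    net.add(node)
left.detect("empty")
right.detect("full")
net.run()
print("\n".join(net.trace))
```

The trace contains eight commands and no status reports; at the end the left arm holds the full cassette, the right arm holds the empty one, and the WTS is ready for the next cycle.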

6. Another Example

Let's build a computer to play Go at a professional level. There are many computer Go programs, but so far they are well below the sophistication exhibited by professional Go players. My approach is to build a multiprocessing system in which each node analyzes one aspect of the game and can be programmed independently. As these nodes mature, the whole system will play stronger and stronger games, eventually catching up with human professional Go players.

The function of each node is as follows:

Interface Computer       Display game, accept new stones, pass new stones to
                         other nodes, and select the best counter move

Beginning Game Computer  Compare new stones to stored beginning game patterns
                         and determine the best next move

End Game Computer        Compare new stones to stored end game patterns and
                         determine the best next move

Mid Game Computer        Analyze the new stones and determine the best mid
                         game move

Corner Game Computer     Compare new stones to stored corner game patterns
                         and determine the best next move

Strategy Computer        Analyze the overall game pattern and determine the
                         best strategy of playing

Tactics Computer         Analyze the game pattern near the new stone to
                         determine the best tactics of playing


It is quite clear that the corner game, the beginning game, and the end game in Go are fundamentally data base problems. There is a vast literature on these aspects of Go, and the literature can be converted into data bases to guide playing. These data bases will raise the level of computer Go to professional grade. Mid game struggle, strategy, and tactics are much more difficult to codify, but much work has already been done in most current computer Go programs. We can certainly take advantage of the existing technology and improve on it.

Each of the node computers will be loaded with the appropriate data base or analysis program upon boot-up. Subsequently, the interface computer will accept a stone from the human player and pass the location of this stone to all the node computers. Each node computer will search its own data base or analyze the stone and report the best counter move to the interface computer. The interface computer will select the most advantageous move and place the stone. This move is also broadcast to all the node computers. The interface computer then waits for the next stone from the human player and repeats the process until the end of the game.

A star-shaped network topology is thus perfectly suitable for this application. The stone positions can be passed among the nodes using the command-only protocol natural to a Forth system. Interprocessor communication is held to a minimum, and the whole multiprocessing system operates optimally because the problem of playing Go can be divided to keep many computers equally busy.
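The broadcast, propose, and select cycle of the star network can be sketched as follows. This Python simulation is illustrative only: the command words `STONE`, `PROPOSE`, and `PLACED`, the numeric score attached to each proposal, and the two toy evaluators are all assumptions, and a real node would search a pattern data base or run an analysis program instead.

```python
class AnalysisNode:
    """One spoke of the star: keeps the stones and proposes a counter move."""
    def __init__(self, name, evaluate):
        self.name = name
        self.evaluate = evaluate    # hypothetical: stones -> (x, y, score)
        self.stones = []

    def receive(self, line):
        x, y, word = line.split()
        self.stones.append((int(x), int(y)))
        if word == "STONE":         # the opponent's stone: reply with a proposal
            mx, my, score = self.evaluate(self.stones)
            return f"{mx} {my} {score} PROPOSE"
        if word == "PLACED":        # our own move: just record it
            return ""
        raise ValueError(f"unknown command {word}")

class InterfaceComputer:
    """The hub: broadcasts stones, gathers proposals, selects the best one."""
    def __init__(self, nodes):
        self.nodes = nodes

    def play(self, x, y):
        proposals = []
        for node in self.nodes:
            mx, my, score, word = node.receive(f"{x} {y} STONE").split()
            if word != "PROPOSE":
                raise ValueError(f"unexpected reply {word}")
            proposals.append((int(score), (int(mx), int(my)), node.name))
        score, move, source = max(proposals)
        for node in self.nodes:     # broadcast the chosen counter move
            node.receive(f"{move[0]} {move[1]} PLACED")
        return move, source

# Toy evaluators standing in for real analysis.
def corner_eval(stones):
    return (3, 3, 40)

def tactics_eval(stones):
    x, y = stones[-1]
    return (x, y + 1, 60)

hub = InterfaceComputer([AnalysisNode("corner", corner_eval),
                         AnalysisNode("tactics", tactics_eval)])
print(hub.play(15, 3))
```

Each exchange is a single short command line per node, which is why the traffic on each spoke stays small no matter how much analysis a node performs internally.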

7. Conclusion

As the price of personal computers keeps going down and the processing power of these computers keeps going up, connecting many personal computers together to form a network or a multiprocessing system is trivial, and we can build super computers at very small cost. The real challenge is to program the multiprocessing system.

Parallel processing using a multiprocessing system is a difficult problem, and it has occupied many computer scientists in developing algorithms, methodology, and architecture to handle it for various applications. Although Forth has never been considered as a parallel processing language, its modular structure can be very useful when an application is to be adapted to a multiprocessing system. If the application is already solved in Forth, it is a simple matter to divide the work load and distribute it among many processors. Very small changes are necessary to modify the Forth interpreter to work as a node in a network of computers.

The assertion that a multiprocessing system can function properly by passing only commands among the processor nodes cannot be proven conclusively. However, indications show that it works in many specific examples. It would be nice if we could prove this theorem. In the meantime, we can use it to explore lots of interesting applications.