A Preliminary Study of PHP Multiprocesses-Two or Three Events of Interprocess Communication

[Original Address:https://blog.ti-node.com/blog …]

Often the purpose of starting multiple processes is to work together to speed up efficiency. As mentioned earlier, the memory spaces between different processes are isolated from each other, which means that process A cannot read or write any data content in process B, and vice versa. However, sometimes, there must be a mutual notification mechanism between multiple processes, which is called “timely communication” in workplace terms. Everyone is doing different parts of the same thing together. It is very important to “communicate in time” with each other.

Therefore, InterProcess Communication was born, with the English abbreviation IPC, full name interprocess communication.
Common inter-process communication methods include: pipes (nameless and famous), message queues, semaphores, shared memory and socket. The last method is not mentioned today, but will be discussed in php socket programming later, with emphasis on the first four methods.

Pipeline is a common thing on *NIX, and everyone uses it when using linux at ordinary times. A simple understanding is |. For example, ps -aux|grep php is a pipeline, which is similar to the communication between ps process and grep process. Pipeline is a half-duplex mode (now there are systems that already support full-duplex pipelines), that is, data can only be transmitted along one direction of the pipeline, and data cannot be transmitted in reverse on the same pipeline. There are two types of pipes, one is named unnamed pipe and the other is named named pipe. unnamed pipes can only be used between two processes with common ancestors. simple understanding is that they can only be used for communication between parent processes and their child processes, but named pipes can be used for communication between any two unrelated processes (this named pipe will be demonstrated later).

It should be pointed out that message queue, semaphore and shared memory all belong to XSI IPC(XSI can be considered as a superset of POSIX standard, which is roughly understood as C++ to C). These three IPC generally have two “names” in *NIX to name them, one is called identifier and the other is called key. The identifier is a non-negative integer. Whenever an IPC structure is created and then destroyed, the identifier will be +1 until the maximum integer value of the integer, and then recalculated from 0. Since it is used for multi-process communication, when many processes use XSI IPC, they need to use a name to find the corresponding IPC before they can read and write it (the term is that multiple processes converge on the same IPC structure), so POSIX suggests that whenever an IPC structure is created, a key should be specified to be associated with it. In a word, the identifier is the internal name of xsipc and the key is the external name of xsipc.

There are roughly three ways to aggregate multiple processes on XSI IPC:

  • Use the specified key IPC_PRIVATE to create an IPC structure, then save the returned identifier to a file, and then the processes communicate by reading the identifier in this file. Use public header files. The disadvantage of this is more IO operations.
  • Write the commonly recognized key into the public header file. Disadvantages of doing this: This key may already be associated with an IPCi structure, so errors may occur when using this key to create the structure, and then the existing IPC structure must be deleted and recreated.
  • Identify a file pathname and project ID, and then use ftok to convert these two parameters into a key. This will be the way we use it.

The XSI IPC structure has a corresponding permission structure called ipc_perm, which defines the creator, owner, etc. of the IPC structure.

One of multi-process communication: named pipe. In php, the function that creates a pipeline is called posix_mkfifo (). After the pipeline is created, it is actually a file, and then you can use any function related to reading and writing files to operate on it. The code roughly demonstrates:

<?php
// 管道文件绝对路径
$pipe_file = __DIR__.DIRECTORY_SEPARATOR.'test.pipe';
// 如果这个文件存在,那么使用posix_mkfifo()的时候是返回false,否则,成功返回true
if( !file_exists( $pipe_file ) ){
  if( !posix_mkfifo( $pipe_file, 0666 ) ){
    exit( 'create pipe error.'.PHP_EOL );
  }
}
// fork出一个子进程
$pid = pcntl_fork();
if( $pid < 0 ){
  exit( 'fork error'.PHP_EOL );
} else if( 0 == $pid ) {
  // 在子进程中
  // 打开命名管道,并写入一段文本
  $file = fopen( $pipe_file, "w" );
  fwrite( $file, "helo world." );
  exit;
} else if( $pid > 0 ) {
  // 在父进程中
  // 打开命名管道,然后读取文本
  $file = fopen( $pipe_file, "r" );
  // 注意此处fread会被阻塞
  $content = fread( $file, 1024 );
  echo $content.PHP_EOL;
  // 注意此处再次阻塞,等待回收子进程,避免僵尸进程
  pcntl_wait( $status );
}

The operation results are as follows:

Multiprocess communication ii: message queue. Many people have even heard of this, but the impression often stays on kafka, rabbitmq and other network message queue software used for server decoupling. Message queue is a linked list of messages (a common data structure), but this message queue is stored in the system kernel (not in user state). Generally, our external program uses a key to read and write the message queue. In PHP, the message queue operation is completed through msg_* series functions.

<?php
// 使用ftok创建一个键名,注意这个函数的第二个参数“需要一个字符的字符串”
$key = ftok( __DIR__, 'a' );
// 然后使用msg_get_queue创建一个消息队列
$queue = msg_get_queue( $key, 0666 );
// 使用msg_stat_queue函数可以查看这个消息队列的信息,而使用msg_set_queue函数则可以修改这些信息
//var_dump( msg_stat_queue( $queue ) );  
// fork进程
$pid = pcntl_fork();
if( $pid < 0 ){
  exit( 'fork error'.PHP_EOL );
} else if( $pid > 0 ) {
  // 在父进程中
  // 使用msg_receive()函数获取消息
  msg_receive( $queue, 0, $msgtype, 1024, $message );
  echo $message.PHP_EOL;
  // 用完了记得清理删除消息队列
  msg_remove_queue( $queue );
  pcnlt_wait( $status );
} else if( 0 == $pid ) {
  // 在子进程中
  // 向消息队列中写入消息
  // 使用msg_send()向消息队列中写入消息,具体可以参考文档内容
  msg_send( $queue, 1, "helloword" );
  exit;
}

The operation results are as follows:

However, the two functions msg_send () and msg_receive () are worthy of further study, and each parameter of these two functions is very worthy of further study and attempt. The problem of space will not be described in detail here.

Multiprocess Communication III: Semaphores and Shared Memory. Shared memory is the fastest way to communicate between processes, because n processes do not need data replication, but directly control the same data. In fact, semaphores and shared memory are inseparable. They are also used in combination. * Some *NIX books do not even recommend novice users to use this method of interprocess communication easily, because it is a solution that is prone to deadlock. Shared memory, as its name implies, is an area in a lump of memory that allows multiple processes to read and write. The biggest problem here lies in the problem of data synchronization. For example, when one changes data, another process cannot read it, otherwise, problems will arise. Therefore, in order to solve this problem, semaphores are introduced. Semaphores are counters that are used in conjunction with shared memory. Generally, the process is as follows:

  • The current process gets the semaphore of the shared memory that will be used.
  • If the semaphore is greater than 0, it means that the shared resource can be used, and then the process decrements the semaphore by 1
  • If the semaphore is 0, the process enters the sleep state until the semaphore is greater than 0, and the process wake-up starts from 1

If a process no longer uses the current shared resource, it will reduce the semaphore by 1. In this place, the detection of semaphores and minus 1 are atomic, which means that the two operations must succeed together. This is implemented by the system kernel.

In php, semaphores and shared memory have a total of these functions:

Among them, sem_Is a semaphore related function, shm_Is a shared memory related function.

<?php
// sem key
$sem_key = ftok( __FILE__, 'b' );
$sem_id = sem_get( $sem_key );
// shm key
$shm_key = ftok( __FILE__, 'm' );
$shm_id = shm_attach( $shm_key, 1024, 0666 );
const SHM_VAR = 1;
$child_pid = [];
// fork 2 child process
for( $i = 1; $i <= 2; $i++ ){
  $pid = pcntl_fork();
  if( $pid < 0 ){
    exit();
  } else if( 0 == $pid ) {
    // 获取锁
    sem_acquire( $sem_id );
    if( shm_has_var( $shm_id, SHM_VAR ) ){
      $counter = shm_get_var( $shm_id, SHM_VAR );
      $counter += 1;
      shm_put_var( $shm_id, SHM_VAR, $counter );
    } else {
      $counter = 1;
      shm_put_var( $shm_id, SHM_VAR, $counter );
    }
    // 释放锁,一定要记得释放,不然就一直会被阻锁死
    sem_release( $sem_id );
    exit;
  } else if( $pid > 0 ) {
    $child_pid[] = $pid;
  }
}
while( !empty( $child_pid ) ){
  foreach( $child_pid as $pid_key => $pid_item ){
    pcntl_waitpid( $pid_item, $status, WNOHANG );
    unset( $child_pid[ $pid_key ] );
  }
}
// 休眠2秒钟,2个子进程都执行完毕了
sleep( 2 );
echo '最终结果'.shm_get_var( $shm_id, SHM_VAR ).PHP_EOL;
// 记得删除共享内存数据,删除共享内存是有顺序的,先remove后detach,顺序反过来php可能会报错
shm_remove( $shm_id );
shm_detach( $shm_id );

The operation results are as follows:

Specifically, if sem is not used, the above operation results will produce 1 instead of 2 under certain probability. However, as long as sem is added, 100% is guaranteed to be 2, and no other values will appear.

[Original Address:https://blog.ti-node.com/blog …]