PHP socket — select system call

[Original Address:https://blog.ti-node.com/blog …]

On <A Preliminary Study of PHP socket-Start with a Simple socket Server> in turn explains the three progressive servers:

  • Servers that can only serve one client
  • An additional server that can serve multiple clients with fork.
  • A server that serves multiple clients with a pre-fork derived process.

The basic principle of the process model of the last server is very similar to apache.
In fact, the biggest problem with this model is that it needs to estimate the number of processes according to the actual business, but it still needs a large number of processes to solve the problem. CPU may be wasted on switching between processes, and there may be a cluster phenomenon (simple understanding is that 100 processes are waiting for a client connection, one client comes but all processes are woken up, but in the end only one process serves this client, and the remaining 99 processes are wasted). So, is there a solution that can make a small number of processes serve multiple clients?
The answer is in <A Preliminary Study of PHP socket-Some Boring Theories about IO> mentioned in the “io multiplexing”. multiplexing refers to multiple clients connecting socket, multiplexing refers to multiplexing a few processes, multiplexing itself still belongs to synchronous communication mode, but the results appear asynchronous, which is worth noting. currently there are three commonly used multiplexing schemes, which are:

  • Select, the earliest solution
  • Poll is an upgraded version of select
  • Epoll, the current final resolution version, is responsible for solving the c10k problem.

Today, I’m talking about select, which is itself a Linux system call. In Linux, everything is a file, socket is no exception. Every time Linux opens a file system, it returns a tag corresponding to the file called a file descriptor. The file descriptor is a non-negative integer, and when the number of file descriptors reaches the maximum, it will return to decimal places to start again (digression: according to tradition, In general, the standard input is 0, the standard output is 1, and the standard error is 2). The read-write operation of a file is to use the read-write operation of a file descriptor. The number of file descriptors that a process can operate is limited. Different systems have different numbers. In linux, the control can be adjusted by adjusting ulimit.
First, let’s give a simple example to illustrate the function and function of select. When double 11 arrived, you bought a lot of sneakers for the shaolin soccer team, and there were 10 couriers to deliver them to you. Then you kept calling the 10 couriers, and you felt a little tired. Ah Mui felt very sorry for you. So Ah Mui said, “You don’t have to worry about this. You go and concentrate on practicing omega supreme’s legs. When any couriers arrive, I’ll tell you”. When one of the couriers arrived, Ah Mui shouted to you, “Come down, there are couriers!” “However, Ah Mui is a simpleton. She does not tell you which pair of shoes were delivered by express, but only tells you that the express has arrived. Therefore, you can only check the status of all the express orders in turn to confirm which one has signed for them.
The above example is interpreted by combining terms, that is, you are the server software, ah mui is the select, 10 couriers are 10 clients (i.e. 10 connection socket fd). ah mui is responsible for managing these 10 connection socket fd for you, and when any one FD responds, that is, data can be read or sent. Select will tell you that there are fd that can read and write, but select will not tell you which fd can read and write, so you must look around all fd to see which fd is readable or writable.
It’s time for a wave of mechanical memory:
When you start select, you need to add three different sets of socket fd as parameters of select. Traditionally, this set of fd is called fd_set. The three sets of fd_set are in turn readable set, writable set and exception set. The three sets of fd_set are maintained by the system kernel. Whenever there are readable or writable sets or exceptions in the three fd_set monitored and managed by select, The caller will be notified. After the caller calls select, the caller will be blocked by select, waiting for the occurrence of readable and writable events. Once there is readable and writable or abnormal occurrence, it is necessary to copy all three fd_set from kernel state to user state, and then the caller traverses all fd through polling. Take out the readable and writable fd or abnormal fd and take corresponding actions. If a caller ignores an operable fd, the next time the remaining fd is operable, it will return the last unprocessed fd of the caller to the caller again. That is to say, when traversing the fd, the ignored FD is still readable and writable until the caller cares.
The above is all my personal understanding and summary. Any mistakes can be pointed out, and I hope it will not mislead people. The following is a wave of select system calls to be operated through php code examples. In php, you can operate select system calls through stream_select or socket_select. The following is a code demonstration of socket_select:

<?php

// BEGIN 创建一个tcp socket服务器
$host = '0.0.0.0';
$port = 9999;
$listen_socket = socket_create( AF_INET, SOCK_STREAM, SOL_TCP );
socket_bind( $listen_socket, $host, $port );
socket_listen( $listen_socket );
// END 创建服务器完毕 

// 也将监听socket放入到read fd set中去,因为select也要监听listen_socket上发生事件
$client = [ $listen_socket ];
// 先暂时只引入读事件,避免有同学晕头
$write = [];
$exp = [];

// 开始进入循环
while( true ){
  $read = $client;
  // 当select监听到了fd变化,注意第四个参数为null
  // 如果写成大于0的整数那么表示将在规定时间内超时
  // 如果写成等于0的整数那么表示不断调用select,执行后立马返回,然后继续
  // 如果写成null,那么表示select会阻塞一直到监听发生变化
  if( socket_select( $read, $write, $exp, null ) > 0 ){
    // 判断listen_socket有没有发生变化,如果有就是有客户端发生连接操作了
    if( in_array( $listen_socket, $read ) ){
      // 将客户端socket加入到client数组中
      $client_socket = socket_accept( $listen_socket );
      $client[] = $client_socket;
      // 然后将listen_socket从read中去除掉
      $key = array_search( $listen_socket, $read );
      unset( $read[ $key ] );
    }
    // 查看去除listen_socket中是否还有client_socket
    if( count( $read ) > 0 ){
      $msg = 'hello world';
      foreach( $read as $socket_item ){
      // 从可读取的fd中读取出来数据内容,然后发送给其他客户端
      $content = socket_read( $socket_item, 2048 );
      // 循环client数组,将内容发送给其余所有客户端
      foreach( $client as $client_socket ){
        // 因为client数组中包含了 listen_socket 以及当前发送者自己socket,所以需要排除二者
        if( $client_socket != $listen_socket && $client_socket != $socket_item ){
          socket_write( $client_socket, $content, strlen( $content ) );
        }
      }
      }
    }
  } 
  // 当select没有监听到可操作fd的时候,直接continue进入下一次循环
  else {
    continue;
  }
  
}

Save the file as server.php, then execute PHP server.php to run the service, open three terminals at the same time, execute telnet 127.0.0.9999, and then enter “I am DOG” in any Telnet terminal! , and look at the other two telnet windows, isn’t it great?
Under incomplete screenshots:

Haven’t you realized the problem? If we see that there are three telnet clients connected to the server and can send messages to each other, but we only need one process to service three clients, if you like, you can open more telnet, but the server only needs one process to complete, this is where IO multiplexes diao!
Finally, we focus on analyzing some socket_select functions. let’s look at the prototype of this function:

int socket_select ( array &$read , array &$write , array &$except , int $tv_sec [, int $tv_usec = 0 ] )

It is worth noting that the three parameters of $read, $write and $except are preceded by a&,which means that the three parameters are of reference type and can be rewritten. In the above code case, when the server code is executed for the first time, we will put all fd that need to be monitored into the read array. However, when the system goes through select, the contents of this array will change. From all the original read fds to only readable read fds, this is why a client array is declared, then a read array is declared, and then read = client.. if we directly regard client as the parameter of socket_select, Then the contents of the client array will be modified. if five users are stored in the client array and only one is readable, only the readable fd will be left in the client after socket_select, and the other four clients will be lost. at this time, the client’s performance is that the connection is lost without any reason.

[Original Address:https://blog.ti-node.com/blog …]