Programmer Notes | How to Write High Performance Java Code

  concurrency, java, jvm, thread pool

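I. Concurrency

A typical symptom of creating too many threads is the error "Unable to create new native thread ……".

Question 1: How much memory does it take to create a thread in Java?

Each thread has its own stack (sized by -Xss; the default is typically 1 MB on 64-bit Linux HotSpot) and shares the heap with all other threads.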
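Besides the global -Xss option, a stack size can be requested for a single thread through the four-argument Thread constructor. The sketch below is illustrative: the class StackSizeDemo, the recursion probe and the chosen sizes are assumptions, and per the Javadoc the requested size is only a hint that some platforms ignore.

```java
public class StackSizeDemo {

    // Illustrative probe (not from the original article): recursion depth before StackOverflowError
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    /** Runs the probe in a thread created with the requested stack size in bytes. */
    public static int probeDepth(long stackSizeBytes) throws InterruptedException {
        int[] result = new int[1];
        Runnable probe = () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // deepest frame reached with this stack size
            }
        };
        // Thread(ThreadGroup, Runnable, String, long stackSize): stackSize is a hint only
        Thread t = new Thread(null, probe, "stack-probe", stackSizeBytes);
        t.start();
        t.join(); // join() makes result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("256 KB stack: depth " + probeDepth(256 * 1024));
        System.out.println("4 MB stack:   depth " + probeDepth(4L * 1024 * 1024));
    }
}
```

A larger stack usually allows deeper recursion, but every thread reserves that much address space, which is one reason a machine runs out of native threads.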

Question 2: How many threads can a machine create?

It depends on the CPU, memory, operating system, JVM, and application server.

The following sample compares using a thread pool with creating a new thread for every task:

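// Difference between a thread pool and creating a new thread per task
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPool {

    public static int times = 100; // 100, 1000, 10000

    public static ArrayBlockingQueue<Runnable> arrayWorkQueue = new ArrayBlockingQueue<>(1000);
    public static ExecutorService threadPool = new ThreadPoolExecutor(
            5,                // corePoolSize: number of core threads in the pool
            10,               // maximumPoolSize
            60,               // keepAliveTime for idle non-core threads
            TimeUnit.SECONDS,
            arrayWorkQueue,
            new ThreadPoolExecutor.DiscardOldestPolicy() // when full, drop the oldest queued task
    );

    public static void useThreadPool() throws InterruptedException {
        long start = System.currentTimeMillis();
        for (int i = 0; i < times; i++) {
            threadPool.execute(() -> System.out.println("Say something..."));
        }
        threadPool.shutdown();
        // Block until all tasks have finished instead of busy-waiting on isTerminated()
        threadPool.awaitTermination(1, TimeUnit.MINUTES);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
    }

    public static void createNewThread() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < times; i++) {
            new Thread(() -> System.out.println("Say something...")).start();
        }
        // Note: this measures only creating and starting the threads, not their completion
        long end = System.currentTimeMillis();
        System.out.println(end - start);
    }

    public static void main(String[] args) throws InterruptedException {
        createNewThread();
        // useThreadPool();
    }
}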

Running both variants with different numbers of tasks gives the following timings:

Tasks    Without thread pool   With thread pool
100      16 ms                 5 ms
1000     90 ms                 28 ms
10000    1329 ms               164 ms

Conclusion: do not create threads with new Thread(); use a thread pool instead.

Disadvantages of not using a thread pool:

  • Every thread creation is expensive.
  • Threads are unmanaged: it is easy to create an unbounded number of threads, causing OOM errors and system crashes.
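One way to avoid unbounded thread creation is a bounded ThreadPoolExecutor whose rejection policy pushes work back to the caller. This is a sketch under assumptions: sizing from Runtime.availableProcessors() and a queue capacity of 100 are arbitrary choices, and the class BoundedPool is illustrative. Unlike the DiscardOldestPolicy in the benchmark above, which silently drops queued tasks when saturated, CallerRunsPolicy runs the rejected task on the submitting thread, which also slows the producer down.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPool {

    /** Illustrative factory: fixed upper bounds on both threads and queued tasks. */
    public static ThreadPoolExecutor newBoundedPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(
                cores,                                      // core threads
                cores * 2,                                  // hard upper limit on threads
                60, TimeUnit.SECONDS,                       // idle non-core threads exit after 60 s
                new ArrayBlockingQueue<>(100),              // bounded queue: no unbounded growth
                new ThreadPoolExecutor.CallerRunsPolicy()); // saturated: the caller runs the task
    }

    /** Submits n tasks and returns how many actually ran. */
    public static int runTasks(int n) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10_000)); // no task is discarded
    }
}
```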

1.1 Issues to Watch for When Using Thread Pools

To avoid deadlocks, use CAS (compare-and-swap) instead of locks wherever possible.

The following example simulates optimistic locking (CAS) in plain Java:

public class CASLock {
  
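     public static volatile int money = 2000;
  
     public static boolean add2(int oldm, int newm) {
         try {
             Thread.sleep(2000); // simulate work done without holding any lock
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         // The compare and the update must happen atomically; otherwise both
         // threads can pass the check before either one writes
         synchronized (CASLock.class) {
             if (money == oldm) {
                 money = money + newm;
                 return true;
             }
             return false;
         }
     }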
  
     public synchronized static void add1(int newm) {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         money = money + newm;
     }
  
     public static void add(int newm) {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         money = money + newm;
     }
  
     public static void main(String args[]) {
         Thread one = new Thread() {
             public void run() {
                 //add(5000)
                 while (true) {
                     if (add2(money, 5000)) {
                         break;
                     }
                 }
             }
         };
         Thread two = new Thread() {
             public void run() {
                 //add(7000)
                 while (true) {
                     if (add2(money, 7000)) {
                         break;
                     }
                 }
             }
         };
         one.start();
         two.start();
         try {
             one.join();
             two.join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println(money);
     }
 }
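The JDK provides a real atomic compare-and-set in java.util.concurrent.atomic, so the simulation above is not needed in practice. A minimal sketch of the same two deposits with AtomicInteger.compareAndSet in a retry loop; the class name CasDeposit is illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDeposit {

    private static final AtomicInteger money = new AtomicInteger(2000);

    /** Optimistic update: read, compute, and retry if another thread changed the value. */
    public static void add(int amount) {
        while (true) {
            int old = money.get();
            if (money.compareAndSet(old, old + amount)) {
                return; // nobody modified money between get() and compareAndSet()
            }
            // another thread won the race: loop and retry with the fresh value
        }
    }

    public static int deposit() throws InterruptedException {
        money.set(2000); // same starting balance as the CASLock example
        Thread one = new Thread(() -> add(5000));
        Thread two = new Thread(() -> add(7000));
        one.start();
        two.start();
        one.join();
        two.join();
        return money.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(deposit()); // prints 14000
    }
}
```

money.addAndGet(amount) performs the same retry internally in one call; the explicit loop only makes the optimistic retry visible.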

Notes on using ThreadLocal

ThreadLocalMap uses a weak reference to the ThreadLocal as the key. If a ThreadLocal has no external strong reference, it will be reclaimed at the next GC, leaving an Entry with a null key in the ThreadLocalMap. The values of such entries can no longer be accessed. If the current thread keeps running for a long time (for example, a thread in a thread pool), these values stay reachable through the strong reference chain Thread Ref -> Thread -> ThreadLocalMap -> Entry -> value and are never reclaimed, causing a memory leak.

The following example shows correct use of ThreadLocal, including calling remove() when the thread is done with the value:

// ThreadLocal usage example
import java.util.Random;

public class ThreadLocalApp {
  
     public static final ThreadLocal<int[]> threadLocal = new ThreadLocal<>();
  
     public static void muti2() {
         int i[] = (int[]) threadLocal.get();
         i[1] = i[0] * 2;
         threadLocal.set(i);
     }
  
     public static void muti3() {
         int i[] = (int[]) threadLocal.get();
         i[2] = i[1] * 3;
         threadLocal.set(i);
     }
  
     public static void muti5() {
         int i[] = (int[]) threadLocal.get();
         i[3] = i[2] * 5;
         threadLocal.set(i);
     }
  
     public static void main(String args[]) {
         for (int i = 0; i < 5; i++) {
             new Thread() {
                 public void run() {
                     try {
                         int start = new Random().nextInt(10);
                         int end[] = {0, 0, 0, 0};
                         end[0] = start;
                         threadLocal.set(end);
                         ThreadLocalApp.muti2();
                         ThreadLocalApp.muti3();
                         ThreadLocalApp.muti5();
                         System.out.println(end[0] + "  " + end[1] + "  " + end[2] + "  " + end[3]);
                     } finally {
                         threadLocal.remove(); // always clear the value, even if an exception is thrown
                     }
                 }
             }.start();
         }
     }
 }
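The leak matters most with thread pools, whose worker threads live as long as the pool. A sketch under assumptions: a fixed pool of 2 threads, a hypothetical per-request context string, and the illustrative class ThreadLocalInPool. Each task reports what it found before setting its own value and clears the value in finally, so no task ever sees a value left by a previous task on the same worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalInPool {

    // Hypothetical per-request context; null means "not set for this task"
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Runs tasks on a small pool; returns what each task found before setting its value. */
    public static List<String> run(int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                results.add(pool.submit(() -> {
                    String leftover = CONTEXT.get(); // anything a previous task left behind
                    CONTEXT.set("request-" + id);
                    try {
                        return leftover;             // real work would read CONTEXT here
                    } finally {
                        CONTEXT.remove();            // always clear, even if the work throws
                    }
                }));
            }
            List<String> seen = new ArrayList<>();
            for (Future<String> f : results) {
                seen.add(f.get());
            }
            return seen;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10)); // every entry is null
    }
}
```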

1.2 Thread Interaction: Problems Caused by Thread-Unsafe Code

The classic problem: a HashMap infinite loop drives the CPU to 100%.

The following code simulates the HashMap infinite loop:

// HashMap infinite loop example (HashMap is not thread-safe)
import java.util.HashMap;

public class HashMapDeadLoop {
  
     private HashMap<Integer, Integer> hash = new HashMap<>();
  
     public HashMapDeadLoop() {
         Thread t1 = new Thread() {
             public void run() {
                 for (int i = 0; i < 100000; i++) {
                     hash.put(i, i); // autoboxing replaces the deprecated new Integer(i)
                 }
                 System.out.println("t1 over");
             }
         };
  
         Thread t2 = new Thread() {
             public void run() {
                 for (int i = 0; i < 100000; i++) {
                     hash.put(i, i);
                 }
                 System.out.println("t2 over");
             }
         };
         t1.start();
         t2.start();
     }
  
     public static void main(String[] args) {
         for (int i = 0; i < 1000; i++) {
             new HashMapDeadLoop();
         }
         System.out.println("end");
     }
 }
For an analysis of this problem, see https://coolshell.cn/articles/9606.html

After the HashMap infinite loop occurs, the thread stack shows the following:

// Thread stack produced by the HashMap infinite loop
"Thread-281" #291 prio=5 os_prio=31 tid=0x00007f9f5f8de000 nid=0x5a37 runnable [0x0000700006349000]
   java.lang.Thread.State: RUNNABLE
       at java.util.HashMap$TreeNode.split(HashMap.java:2134)
       at java.util.HashMap.resize(HashMap.java:713)
       at java.util.HashMap.putVal(HashMap.java:662)
       at java.util.HashMap.put(HashMap.java:611)
       at com.example.demo.HashMapDeadLoop$2.run(HashMapDeadLoop.java:26)
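The usual fix is to share a ConcurrentHashMap instead of a HashMap (or to confine each map to a single thread). A sketch of the same two-writer scenario with the 100,000 keys from the example above; the class name ConcurrentMapFill is illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapFill {

    /** Two threads insert the same keys concurrently into a thread-safe map. */
    public static int fill() throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 100000; i++) {
                map.put(i, i); // safe even while the table is being resized
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fill()); // 100000: both threads wrote the same keys
    }
}
```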

The Spring 3.1 deadlock problem

The following code simulates a deadlock:

// Deadlock example
public class DeadLock {
     public static Integer i1 = 2000;
     public static Integer i2 = 3000;
     public static synchronized Integer getI2() {
         try {
             Thread.sleep(3000);
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         return i2;
     }
     public static void main(String args[]) {
         Thread one = new Thread() {
             public void run() {
                 synchronized (i1) {
                     try {
                         Thread.sleep(3000);
                     } catch (InterruptedException e) {
                         e.printStackTrace();
                     }
                     synchronized (i2) {
                         System.out.println(i1 + i2);
                     }
                 }
             }
         };
         one.start();
         Thread two = new Thread() {
             public void run() {
                 synchronized (i2) {
                     try {
                         Thread.sleep(3000);
                     } catch (InterruptedException e) {
                         e.printStackTrace();
                     }
                     synchronized (i1) {
                         System.out.println(i1 + i2);
                     }
                 }
             }
         };
         two.start();
     }
 }

After the deadlock occurs, the thread dump shows the following:

// Stack dump produced during the deadlock
"Thread-1":
       at com.example.demo.DeadLock$2.run(DeadLock.java:47)
       - waiting to lock  (a java.lang.Integer)
       - locked  (a java.lang.Integer)
"Thread-0":
       at com.example.demo.DeadLock$1.run(DeadLock.java:31)
       - waiting to lock  (a java.lang.Integer)
       - locked  (a java.lang.Integer)
Found 1 deadlock.
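The deadlock arises because the two threads acquire i1 and i2 in opposite orders. A common fix is a single global lock order. A sketch under assumptions: LOCK_A and LOCK_B are illustrative dedicated lock objects (which also avoids using boxed Integer values as monitors), and the class name OrderedLocks is hypothetical.

```java
public class OrderedLocks {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();
    private static int i1 = 2000;
    private static int i2 = 3000;

    /** Every thread takes LOCK_A before LOCK_B, so no cycle of waits can form. */
    private static int sumUnderBothLocks() {
        synchronized (LOCK_A) {
            synchronized (LOCK_B) {
                return i1 + i2;
            }
        }
    }

    /** Runs two threads; returns true if both finished within the timeout. */
    public static boolean runBoth() throws InterruptedException {
        Thread one = new Thread(() -> System.out.println(sumUnderBothLocks()));
        Thread two = new Thread(() -> System.out.println(sumUnderBothLocks()));
        one.start();
        two.start();
        one.join(5000);
        two.join(5000);
        return !one.isAlive() && !two.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished without deadlock: " + runBoth());
    }
}
```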

1.3 Optimization Example Based on JUC

To optimize a counter, we implement it in three ways (synchronized, ReentrantLock, and AtomicInteger) and compare their performance.

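// Sample code: a counter implemented with synchronized, ReentrantLock and AtomicInteger
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;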
public class SynchronizedTest {
  
     public static int threadNum = 100;
     public static int loopTimes = 10000000;
  
     public static void userSyn() {
         // Shared counter incremented by all threads
         Syn syn = new Syn();
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         //syn.increaseLock();
                         syn.increase();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userSyn" + "-" + syn + " : " + (System.currentTimeMillis() - l) + "ms");
     }
  
     public static void useRea() {
         // Shared counter incremented by all threads
         Syn syn = new Syn();
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         syn.increaseLock();
                         //syn.increase();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userRea" + "-" + syn + " : " + (System.currentTimeMillis() - l) + "ms");
     }
     public static void useAto() {
         // Worker threads
         Thread[] threads = new Thread[threadNum];
         // Record the start time
         long l = System.currentTimeMillis();
         for (int i = 0; i < threadNum; i++) {
             threads[i] = new Thread(new Runnable() {
                 @Override
                 public void run() {
                     for (int j = 0; j < loopTimes; j++) {
                         Syn.ai.incrementAndGet();
                     }
                 }
             });
             threads[i].start();
         }
         // Wait for all threads to finish
         try {
             for (int i = 0; i < threadNum; i++)
                 threads[i].join();
         } catch (InterruptedException e) {
             e.printStackTrace();
         }
         System.out.println("userAto" + "-" + Syn.ai + " : " + (System.currentTimeMillis() - l) + "ms");
     }
  
     public static void main(String[] args) {
         SynchronizedTest.userSyn();
         SynchronizedTest.useRea();
         SynchronizedTest.useAto();
     }
 }
  
 class Syn {
     private int count = 0;
     public final static AtomicInteger ai = new AtomicInteger(0);
  
     private Lock lock = new ReentrantLock();
  
     public synchronized void increase() {
         count++;
     }
  
     public void increaseLock() {
         lock.lock();
         try {
             count++;
         } finally {
             lock.unlock(); // always release the lock, even if an exception is thrown
         }
     }
  
     @Override
     public String toString() {
         return String.valueOf(count);
     }
 }

Conclusion: in this test, with high concurrency and many iterations, ReentrantLock was faster than synchronized, and AtomicInteger performed best of all.
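The benchmark does not include java.util.concurrent.atomic.LongAdder (Java 8+), which spreads updates over internal cells and is designed for counters that many threads update but rarely read. A minimal sketch of the same counter pattern; the class LongAdderCounter and the thread and loop counts are illustrative, and nothing here claims how LongAdder would rank in the measurements above.

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderCounter {

    /** threads x loops increments on one LongAdder; sum() is read after all threads join. */
    public static long count(int threads, int loops) throws InterruptedException {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < loops; j++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("useAdder-" + adder.sum() + " : " + (System.currentTimeMillis() - start) + "ms");
        return adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        count(100, 100_000);
    }
}
```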

II. Communications

2.1 Database Connection Pool Efficiency

  • Always close the connection (that is, release it back to the pool) in a finally block, so it is returned even when an exception is thrown.
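Since Java 7, try-with-resources generates that finally block and closes resources in reverse order on every path. With a pooled DataSource, close() on the connection returns it to the pool. A sketch, assuming a javax.sql.DataSource supplied by whatever pool you use; the class PooledQuery and its method are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class PooledQuery {

    /** Illustrative helper: runs a single-value query; all three resources are always closed. */
    public static long queryForLong(DataSource ds, String sql) throws SQLException {
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getLong(1) : 0L;
        } // rs, ps and conn are closed here, even if an exception was thrown
    }
}
```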

2.2 OIO/NIO/AIO

                    OIO        NIO            AIO
Type                Blocking   Non-blocking   Non-blocking
Difficulty of use   Simple     Complex        Complex
Reliability         Poor       High           High
Throughput          Low        High           High

Conclusion: when performance requirements are strict, use NIO for communication wherever possible.
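The core NIO idea is that one thread waits on a Selector and serves whichever channels are ready, instead of blocking one thread per connection. A minimal sketch under assumptions: a single-message echo server on an ephemeral localhost port, a blocking client thread to exercise it, and the illustrative class NioEchoSketch; a production server would also handle partial writes and per-connection buffers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class NioEchoSketch {

    /** Echoes one message through a single-threaded, selector-based server on localhost. */
    public static String roundTrip(String message) throws Exception {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();

            // Test client on its own thread, using an ordinary blocking channel
            String[] reply = new String[1];
            Thread client = new Thread(() -> {
                try (SocketChannel ch = SocketChannel.open(addr)) {
                    ch.write(ByteBuffer.wrap(payload));
                    ByteBuffer buf = ByteBuffer.allocate(payload.length);
                    while (buf.hasRemaining() && ch.read(buf) >= 0) {
                        // keep reading until the whole echo has arrived
                    }
                    buf.flip();
                    reply[0] = StandardCharsets.UTF_8.decode(buf).toString();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            client.start();

            // Server loop: one thread serves every ready channel without blocking on any
            while (client.isAlive()) {
                if (selector.select(100) == 0) {
                    continue; // timeout: re-check whether the client is done
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel conn = server.accept();
                        if (conn != null) {
                            conn.configureBlocking(false);
                            conn.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel conn = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (conn.read(buf) < 0) {
                            conn.close(); // peer closed: close our side too
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) {
                            conn.write(buf); // echo back what was read
                        }
                    }
                }
            }
            for (SelectionKey k : selector.keys()) {
                k.channel().close(); // release accepted connections before the selector closes
            }
            client.join();
            return reply[0];
        }
    }
}
```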

2.3 TIME_WAIT (Client) and CLOSE_WAIT (Server) Problems

Symptom: requests fail frequently.

Count connections by state: netstat -n | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'

  • TIME_WAIT: this side closed the connection actively; it can be mitigated by tuning kernel parameters.
  • CLOSE_WAIT: the peer closed the connection and this side has not closed it yet (passive close).
  • ESTABLISHED: the connection is open and communicating.

Solution: force the connection to close after phase 2 completes.
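CLOSE_WAIT piles up when the peer has sent its FIN (so read() returns -1) but the application never calls close() on its own socket. A sketch of the server side with plain blocking sockets on localhost; the class CloseWaitDemo and the one-shot handler are illustrative assumptions. try-with-resources guarantees close() once the peer has finished.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseWaitDemo {

    /** Accepts one connection, drains it until the peer closes, then closes our side. */
    public static boolean handleOne() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // Test client: connects, sends a few bytes and closes, so the server receives a FIN
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.getOutputStream().write("bye".getBytes(StandardCharsets.UTF_8));
            }
            Socket accepted = server.accept();
            try (Socket conn = accepted) {
                InputStream in = conn.getInputStream();
                byte[] buf = new byte[256];
                while (in.read(buf) != -1) {
                    // process data; -1 means the peer has closed its end
                }
            } // without this close(), the server-side socket would stay in CLOSE_WAIT
            return accepted.isClosed();
        }
    }
}
```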

2.4 Serial, Persistent (Long-Lived), and Pipelined Connections

Conclusion:

Pipelined connections perform best. Persistent connections improve on serial connections by avoiding the cost of opening and closing a connection for every request.

Restrictions on pipelined connections:

1. Do not pipeline if the HTTP client cannot confirm that the connection is persistent (pipelining is usually used server-to-server rather than by end-user clients);

2. Responses must be returned in the same order as the requests;

3. Only idempotent requests may be pipelined.
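On the client side, the java.net.http.HttpClient added in Java 11 keeps HTTP/1.1 connections alive and can reuse them for later requests to the same host, which gives the persistent-connection benefit described above. A sketch under assumptions: a throwaway local server from the JDK's com.sun.net.httpserver package, an arbitrary /ping path, and the illustrative class KeepAliveDemo. The handler logs each request's client port; repeated ports indicate a reused connection, but reuse is not guaranteed, so the code only returns the status codes.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeepAliveDemo {

    /** Sends `count` sequential requests through one shared HttpClient; returns status codes. */
    public static List<Integer> run(int count) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        List<Integer> clientPorts = new ArrayList<>();
        server.createContext("/ping", exchange -> {
            synchronized (clientPorts) {
                clientPorts.add(exchange.getRemoteAddress().getPort()); // same port = same connection
            }
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length); // known length keeps the connection reusable
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            // One client instance for all requests: it owns the connection pool
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ping");
            List<Integer> statuses = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                HttpResponse<String> resp = client.send(
                        HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                statuses.add(resp.statusCode());
            }
            synchronized (clientPorts) {
                System.out.println("client ports seen by server: " + clientPorts);
            }
            return statuses;
        } finally {
            server.stop(0);
        }
    }
}
```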

III. Database Operations

Queries must use indexes (pay special attention to queries filtered by time).

Single vs. batch operations

Note: many programmers casually use single-row operations when writing code, but when performance matters, batch operations are required.
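With JDBC, batching means queuing rows with PreparedStatement.addBatch() and sending them with executeBatch(), ideally inside a single transaction. A sketch under assumptions: a hypothetical table users(name), an arbitrary batch size of 500, the illustrative class BatchInsert, and a Connection owned and closed by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {

    private static final int BATCH_SIZE = 500; // arbitrary; tune for your driver and row size

    /** Inserts all names in batches within one transaction; returns the number of executeBatch() calls. */
    public static int insertNames(Connection conn, List<String> names) throws SQLException {
        int batches = 0;
        conn.setAutoCommit(false); // one commit instead of one per row
        // "users(name)" is a hypothetical table used only for illustration
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO users(name) VALUES (?)")) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(1, names.get(i));
                ps.addBatch();                  // queue the row on the client side
                if ((i + 1) % BATCH_SIZE == 0) {
                    ps.executeBatch();          // send a full batch
                    batches++;
                }
            }
            if (names.size() % BATCH_SIZE != 0) {
                ps.executeBatch();              // send the remainder
                batches++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
        return batches;
    }
}
```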

IV. JVM

4.1 General Steps for Troubleshooting High CPU Usage

  • top: find the process with the highest CPU usage.
  • top -H -p <pid>: find the threads in that process with the highest CPU usage.
  • Record the IDs of the threads that consume the most CPU.
  • printf "%x\n" <tid>: convert each thread ID to hexadecimal.
  • jstack <pid>: dump the stack information of the process.
  • Search the dump for the hexadecimal thread ID (nid) to find the threads consuming the most CPU.
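top -H reports native thread IDs in decimal, while the jstack format shown in this article prints them as hexadecimal nid values (check the format of your own JDK's dumps). The conversion and lookup can be sketched in Java, assuming the jstack output has already been saved as text; the class NidLookup and its helper are illustrative, and the sample line is the HashMap thread header quoted earlier.

```java
public class NidLookup {

    /** Illustrative helper: returns the jstack header line for a decimal native thread id, or null. */
    public static String findThread(String jstackOutput, long nativeTid) {
        String nid = "nid=0x" + Long.toHexString(nativeTid); // same as printf "%x"
        for (String line : jstackOutput.split("\n")) {
            if (line.contains(nid + " ")) { // trailing space avoids matching a longer nid
                return line.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String dump = "\"Thread-281\" #291 prio=5 os_prio=31 tid=0x00007f9f5f8de000 "
                + "nid=0x5a37 runnable [0x0000700006349000]\n"
                + "   java.lang.Thread.State: RUNNABLE\n";
        System.out.println(findThread(dump, 23095)); // 23095 decimal == 0x5a37
    }
}
```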

4.2 General Steps for Troubleshooting High Memory Usage (OOM)

  • Use jstat (for example jstat -gcutil <pid>) to check the number of full GCs (FGC) and the time they consume. The more full GCs and the longer they take, the more likely there is a problem.
  • Run jmap -heap repeatedly to check old-generation occupancy; the larger the change, the more likely the program has a problem.
  • Run jmap -histo:live repeatedly, save each output to a file, and compare the loaded objects between files. The classes whose counts differ usually point to the problem.
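The GC counts and times that jstat reports are also available in-process through the platform MXBeans, which can help when command-line tools are not at hand. A minimal sketch; the class GcStats is illustrative, and collector names such as "G1 Old Generation" or "PS MarkSweep" depend on the JVM and garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcStats {

    /** Prints count and accumulated time for every collector; returns the total count. */
    public static long printGcStats() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            System.out.printf("%-25s count=%d time=%dms%n", gc.getName(), count, gc.getCollectionTime());
            if (count > 0) {
                total += count;
            }
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%dKB committed=%dKB%n", heap.getUsed() / 1024, heap.getCommitted() / 1024);
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so the counters are likely non-zero
        printGcStats();
    }
}
```

Sampling these values periodically and watching whether the old-generation collector's count and time keep climbing mirrors the jstat check above.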

4.3 Single-Core CPU Spike Caused by GC

If a single CPU core's utilization is high, start by investigating GC.

4.4 Common Causes of High sy (System CPU) Usage

  • Frequent thread context switches
  • Too many threads
  • Heavy lock contention

4.5 High iowait

If iowait CPU usage is very high, check the code that performs I/O, for example by converting OIO to NIO.

4.6 Jitter Problem

Cause: converting bytecode into machine code (JIT compilation) consumes CPU time; when a large amount of bytecode is being compiled, the CPU stays at a high level for a long time.

Symptom: the daemon threads "C2 CompilerThread1" and "C2 CompilerThread0" have the highest CPU usage.

Solution: make sure the compiler threads get an appropriate share of the CPU.
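To check whether JIT compilation is consuming CPU, the accumulated compilation time can be read from the CompilationMXBean, alongside the per-thread view from top -H and jstack. A minimal sketch; the class JitTime is illustrative, getCompilationMXBean() returns null when the JVM has no compiler, and the time is only available if monitoring is supported.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitTime {

    /** Returns total JIT compilation time in ms, or -1 if the JVM cannot report it. */
    public static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        System.out.println("JIT compiler: " + jit.getName());
        return jit.getTotalCompilationTime(); // approximate, accumulated over the JVM's lifetime
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalCompilationMillis();
        Thread.sleep(1000);
        long after = totalCompilationMillis();
        // A value that keeps growing quickly under load suggests the compiler threads are busy
        System.out.println("compilation time grew by " + (after - before) + " ms in 1 s");
    }
}
```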

Author: Liang Xin

Source: Yixin Institute of Technology