[case14] Case Analysis of gc Time Too Long Caused by Frequent Object Generation

  jvm

Order

This paper mainly analyzes a case that frequently generates objects and causes gc time to be too long.

Symptoms and analysis

The gc time is too long, the average gc pause time is nearly 4 seconds, and 13% of the gc is over 10 seconds, which is terrible. Some gc logs are as follows:

[PSYoungGen: 457878K->126656K(489472K)] 1746043K->1453131K(1887744K), 12.1965757 secs] [Times: user=5.59 sys=0.52, real=12.19 secs] 
154415.774: [GC (Allocation Failure) 
Desired survivor size 212860928 bytes, new threshold 1 (max 15)
[PSYoungGen: 376192K->65968K(484864K)] 1702667K->1392499K(1883136K), 0.1665513 secs] [Times: user=0.10 sys=0.00, real=0.17 secs] 
154416.838: [GC (Allocation Failure) 
Desired survivor size 235929600 bytes, new threshold 1 (max 15)
[PSYoungGen: 341424K->196182K(445952K)] 1667955K->1523034K(1844224K), 1.7996294 secs] [Times: user=0.89 sys=0.03, real=1.80 secs] 
154419.456: [GC (Allocation Failure) 
Desired survivor size 225968128 bytes, new threshold 1 (max 15)
[PSYoungGen: 434262K->121776K(468480K)] 1761114K->1486938K(1866752K), 23.0844304 secs] [Times: user=10.75 sys=0.81, real=23.09 secs] 
154442.541: [Full GC (Ergonomics) [PSYoungGen: 121776K->0K(468480K)] [ParOldGen: 1365162K->208108K(1398272K)] 1486938K->208108K(1866752K), [Metaspace: 93615K->93615K(1132544K)], 23.5955214 secs] [Times: user=3.76 sys=5.30, real=23.59 secs] 
154504.670: [GC (Allocation Failure) 
Desired survivor size 217579520 bytes, new threshold 1 (max 15)
[PSYoungGen: 200553K->106368K(486400K)] 408662K->314476K(1884672K), 1.0664613 secs] [Times: user=0.39 sys=0.13, real=1.06 secs] 
154507.542: [GC (Allocation Failure) 
Desired survivor size 218103808 bytes, new threshold 1 (max 15)
[PSYoungGen: 372096K->144182K(478208K)] 580204K->425927K(1876480K), 5.7999561 secs] [Times: user=1.12 sys=1.55, real=5.80 secs] 
154514.037: [GC (Allocation Failure) 
Desired survivor size 213909504 bytes, new threshold 1 (max 15)
[PSYoungGen: 409910K->87920K(489984K)] 691655K->407999K(1888256K), 10.1020217 secs] [Times: user=4.46 sys=0.61, real=10.11 secs] 
154563.240: [GC (Allocation Failure) 
Desired survivor size 213385216 bytes, new threshold 1 (max 15)
[PSYoungGen: 328380K->65952K(485888K)] 648460K->386087K(1884160K), 0.0918412 secs] [Times: user=0.04 sys=0.01, real=0.09 secs] 
154564.037: [GC (Allocation Failure) 
Desired survivor size 219676672 bytes, new threshold 1 (max 15)
[PSYoungGen: 342944K->153558K(478208K)] 663079K->474022K(1876480K), 3.1948641 secs] [Times: user=0.72 sys=0.69, real=3.19 secs] 
154568.135: [GC (Allocation Failure) 
Desired survivor size 212336640 bytes, new threshold 1 (max 15)
[PSYoungGen: 423382K->98528K(484352K)] 743846K->457302K(1882624K), 13.4085860 secs] [Times: user=6.04 sys=0.69, real=13.41 secs] 

After passing through the memory under jmap dump, use mat to share and view thread_overview.
图片描述

It can be seen that ElasticeSearchJessalthindicator. dohealthcheck holds many objects that have not been released.

Debugging and Reproduction

Local recurrence

spring-context-4.3.7.RELEASE-sources.jar! /org/springframework/jmx/export/SpringModelMBean.java

    /**
     * Switches the {@link Thread#getContextClassLoader() context ClassLoader} for the
     * managed resources {@link ClassLoader} before allowing the invocation to occur.
     * @see javax.management.modelmbean.ModelMBean#invoke
     */
    @Override
    public Object invoke(String opName, Object[] opArgs, String[] sig)
            throws MBeanException, ReflectionException {

        ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
        try {
            Thread.currentThread().setContextClassLoader(this.managedResourceClassLoader);
            return super.invoke(opName, opArgs, sig);
        }
        finally {
            Thread.currentThread().setContextClassLoader(currentClassLoader);
        }
    }

spring-boot-actuator-1.4.5.RELEASE-sources.jar! /org/springframework/boot/actuate/endpoint/jmx/DataEndpointMBean.java

@ManagedResource
public class DataEndpointMBean extends EndpointMBean {

    /**
     * Create a new {@link DataEndpointMBean} instance.
     * @param beanName the bean name
     * @param endpoint the endpoint to wrap
     * @param objectMapper the {@link ObjectMapper} used to convert the payload
     */
    public DataEndpointMBean(String beanName, Endpoint<?> endpoint,
            ObjectMapper objectMapper) {
        super(beanName, endpoint, objectMapper);
    }

    @ManagedAttribute(description = "Invoke the underlying endpoint")
    public Object getData() {
        return convert(getEndpoint().invoke());
    }

}

Only when the Enable JMX Agent is turned on by the local idea can it be reproduced.

healthEndpoint

spring-boot-admin-server-1.4.6-sources.jar! /de/codecentric/boot/admin/registry/StatusUpdateApplicationListener.java

    private long updatePeriod = 10_000L;

    public void startStatusUpdate() {
        if (scheduledTask != null && !scheduledTask.isDone()) {
            return;
        }

        scheduledTask = taskScheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                statusUpdater.updateStatusForAllApplications();
            }
        }, updatePeriod);
        LOGGER.debug("Scheduled status-updater task for every {}ms", updatePeriod);

    }

spring-boot-admin-server-1.4.6-sources.jar! /de/codecentric/boot/admin/registry/StatusUpdater.java

    public void updateStatusForAllApplications() {
        long now = System.currentTimeMillis();
        for (Application application : store.findAll()) {
            if (now - statusLifetime > application.getStatusInfo().getTimestamp()) {
                updateStatus(application);
            }
        }
    }

    public void updateStatus(Application application) {
        StatusInfo oldStatus = application.getStatusInfo();
        StatusInfo newStatus = queryStatus(application);

        Application newState = Application.create(application).withStatusInfo(newStatus).build();
        store.save(newState);

        if (!newStatus.equals(oldStatus)) {
            publisher.publishEvent(
                    new ClientApplicationStatusChangedEvent(newState, oldStatus, newStatus));
        }
    }

    protected StatusInfo queryStatus(Application application) {
        LOGGER.trace("Updating status for {}", application);

        try {
            @SuppressWarnings("unchecked")
            ResponseEntity<Map<String, Object>> response = restTemplate.getForEntity(
                    application.getHealthUrl(), (Class<Map<String, Object>>) (Class<?>) Map.class);
            LOGGER.debug("/health for {} responded with {}", application, response);

            if (response.hasBody() && response.getBody().get("status") instanceof String) {
                return StatusInfo.valueOf((String) response.getBody().get("status"));
            } else if (response.getStatusCode().is2xxSuccessful()) {
                return StatusInfo.ofUp();
            } else {
                return StatusInfo.ofDown();
            }

        } catch (Exception ex) {
            if ("OFFLINE".equals(application.getStatusInfo().getStatus())) {
                LOGGER.debug("Couldn't retrieve status for {}", application, ex);
            } else {
                LOGGER.warn("Couldn't retrieve status for {}", application, ex);
            }
            return StatusInfo.ofOffline();
        }
    }

You can see that the admin-server has registered a scheduled task, calling /health regularly.

Problem analysis

ElasticsearchHealthIndicatorConfiguration

spring-boot-actuator-1.4.5.RELEASE-sources.jar! /org/springframework/boot/actuate/autoconfigure/ElasticsearchHealthIndicatorConfiguration.java

class ElasticsearchHealthIndicatorConfiguration {

    @Configuration
    @ConditionalOnBean(Client.class)
    @ConditionalOnEnabledHealthIndicator("elasticsearch")
    @EnableConfigurationProperties(ElasticsearchHealthIndicatorProperties.class)
    static class ElasticsearchClientHealthIndicatorConfiguration extends
            CompositeHealthIndicatorConfiguration<ElasticsearchHealthIndicator, Client> {

        private final Map<String, Client> clients;

        private final ElasticsearchHealthIndicatorProperties properties;

        ElasticsearchClientHealthIndicatorConfiguration(Map<String, Client> clients,
                ElasticsearchHealthIndicatorProperties properties) {
            this.clients = clients;
            this.properties = properties;
        }

        @Bean
        @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator")
        public HealthIndicator elasticsearchHealthIndicator() {
            return createHealthIndicator(this.clients);
        }

        @Override
        protected ElasticsearchHealthIndicator createHealthIndicator(Client client) {
            return new ElasticsearchHealthIndicator(client, this.properties);
        }

    }

    @Configuration
    @ConditionalOnBean(JestClient.class)
    @ConditionalOnEnabledHealthIndicator("elasticsearch")
    static class ElasticsearchJestHealthIndicatorConfiguration extends
            CompositeHealthIndicatorConfiguration<ElasticsearchJestHealthIndicator, JestClient> {

        private final Map<String, JestClient> clients;

        ElasticsearchJestHealthIndicatorConfiguration(Map<String, JestClient> clients) {
            this.clients = clients;
        }

        @Bean
        @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator")
        public HealthIndicator elasticsearchHealthIndicator() {
            return createHealthIndicator(this.clients);
        }

        @Override
        protected ElasticsearchJestHealthIndicator createHealthIndicator(
                JestClient client) {
            return new ElasticsearchJestHealthIndicator(client);
        }

    }

}

ElasticsearchHealthIndicator

spring-boot-actuator-1.4.5.RELEASE-sources.jar! /org/springframework/boot/actuate/health/ElasticsearchHealthIndicator.java

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        List<String> indices = this.properties.getIndices();
        ClusterHealthResponse response = this.client.admin().cluster()
                .health(Requests.clusterHealthRequest(indices.isEmpty() ? allIndices
                        : indices.toArray(new String[indices.size()])))
                .actionGet(this.properties.getResponseTimeout());

        switch (response.getStatus()) {
        case GREEN:
        case YELLOW:
            builder.up();
            break;
        case RED:
        default:
            builder.down();
            break;
        }
        builder.withDetail("clusterName", response.getClusterName());
        builder.withDetail("numberOfNodes", response.getNumberOfNodes());
        builder.withDetail("numberOfDataNodes", response.getNumberOfDataNodes());
        builder.withDetail("activePrimaryShards", response.getActivePrimaryShards());
        builder.withDetail("activeShards", response.getActiveShards());
        builder.withDetail("relocatingShards", response.getRelocatingShards());
        builder.withDetail("initializingShards", response.getInitializingShards());
        builder.withDetail("unassignedShards", response.getUnassignedShards());
    }

spring-boot-actuator-1.4.5.RELEASE-sources.jar! /org/springframework/boot/actuate/health/ElasticsearchHealthIndicatorProperties.java

@ConfigurationProperties(prefix = "management.health.elasticsearch", ignoreUnknownFields = false)
public class ElasticsearchHealthIndicatorProperties {

    /**
     * Comma-separated index names.
     */
    private List<String> indices = new ArrayList<String>();

    /**
     * Time, in milliseconds, to wait for a response from the cluster.
     */
    private long responseTimeout = 100L;

    //......

}

If there are specified indicators, their health conditions, such as /index1,index2/_stats, will be queried; if not, all indicators will be queried, which is a potential pit.

ElasticsearchJestHealthIndicator

spring-boot-actuator-1.4.5.RELEASE-sources.jar! /org/springframework/boot/actuate/health/ElasticsearchJestHealthIndicator.java

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        JestResult aliases = this.jestClient.execute(new Stats.Builder().build());
        JsonElement root = this.jsonParser.parse(aliases.getJsonString());
        JsonObject shards = root.getAsJsonObject().get("_shards").getAsJsonObject();
        int failedShards = shards.get("failed").getAsInt();
        if (failedShards != 0) {
            builder.outOfService();
        }
        else {
            builder.up();
        }
    }

Jest’s last http request was /_all/_stats. This is the problem. Query all _all statistics. For a large elasticsearch platform, the returned data is huge, with nearly 5,000 pieces of data. The returned json pure files need more than 20 m. This, coupled with regular task /health query, leads to the continuous generation of new objects. ygc is very frequent, resulting in memory leakage.

Summary

The timing monitoring application health is a good thing, but we must pay attention to the frequency, and also pay attention to the specific implementation. for example, this one of elasticsearch will be trapped without any attention, which is equivalent to generating a certain amount of garbage at regular intervals. the frequency exceeds the speed of garbage recovery, which is similar to memory leakage and brings great burden to the application gc.

doc