How do I process pages fetched with PHP? Can I keep just the DOM structure and strip out the CSS and JS?


With regular expressions, the rules have to be rewritten every time the page changes.
Is there a better way, such as extracting the DOM from the page first?

I think what you need is PHP's DOM extension … it is enabled by default, so you don't have to worry about installing it. …

I don't know what your actual use case is … so let me give you a simple example. …

<?php
/* i heard that you need DOM ..? */
$doc = new DOMDocument();

/* i wrote a simple page ... change it to a curl result ... */
$doc->loadHTML( <<<HTML_SECTION
<html><head><title>Sunyanzi's Test</title></head>
<body>
  <h1>Hello World</h1>
  <a href="http://segmentfault.com/" id="onlylink">Hey Welcome</a>
</body></html>
HTML_SECTION
);

/* now we should try to get something ... */
$h1Elements = $doc->getElementsByTagName( 'h1' );

/* this line prints "Hello World" ... */
foreach( $h1Elements as $h1Node ) 
    echo $h1Node->nodeValue, PHP_EOL;

/* and this line prints "http://segmentfault.com/" ... */
echo $doc->getElementById( 'onlylink' )->getAttribute( 'href' ), PHP_EOL;

/* now i will introduce something advanced ... using XPath ... */
$xpath = new DOMXPath( $doc );

/* also prints "http://segmentfault.com/" ... locate via h1 ... */
echo $xpath->evaluate(
    'string(//h1[text()="Hello World"]/following-sibling::a/@href)'
    ), PHP_EOL;
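
Coming back to the original question of keeping the DOM structure and dropping the CSS and JS … here is a minimal sketch of one way to do it with the same DOM classes. The sample markup, the file names `site.css` and `app.js`, and the `track()` handler are all made up for illustration. What counts as "CSS and JS" is also my assumption: `<script>`, `<style>`, stylesheet `<link>` elements, inline `style` attributes and `on*` event attributes.

```php
<?php
// Hypothetical stand-in for a fetched page; replace it with your curl result.
$html = <<<'HTML_SECTION'
<html><head><title>Demo</title>
<style>h1 { color: red; }</style>
<link rel="stylesheet" href="site.css">
<script src="app.js"></script>
</head>
<body>
  <h1 style="margin:0" onclick="track()">Hello World</h1>
  <script>console.log("hi");</script>
</body></html>
HTML_SECTION;

// Real-world markup is rarely valid, so collect parser warnings
// internally instead of printing them, then discard them.
libxml_use_internal_errors( true );
$doc = new DOMDocument();
$doc->loadHTML( $html );
libxml_clear_errors();

$xpath = new DOMXPath( $doc );

// Remove <script>, <style> and stylesheet <link> elements.
// iterator_to_array() copies the result list first, as a defensive
// snapshot before the tree is modified.
$nodes = $xpath->query( '//script | //style | //link[@rel="stylesheet"]' );
foreach( iterator_to_array( $nodes ) as $node ) {
    $node->parentNode->removeChild( $node );
}

// Remove inline style attributes and on* event handler attributes.
$attrs = $xpath->query( '//@style | //@*[starts-with(name(), "on")]' );
foreach( iterator_to_array( $attrs ) as $attr ) {
    $attr->ownerElement->removeAttributeNode( $attr );
}

// What is left is the bare structure and text.
$clean = $doc->saveHTML();
echo $clean;
```

The `link[@rel="stylesheet"]` test only matches that exact attribute value, so a page that writes `rel="Stylesheet"` or lists several `rel` tokens would need a looser predicate.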

Basically … once you have mastered XPath … you will find the DOM is much more flexible than regular expressions. …
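
To make the flexibility point a bit more concrete … here is a small sketch, with invented markup, where the attribute order and the nesting differ from link to link. A regex written for one layout would likely miss the other; the XPath query does not care.

```php
<?php
// Hypothetical markup: attribute order and inner tags vary between links.
$html = <<<'HTML_SECTION'
<html><body>
  <div class="nav main"><a id="home" href="/">Home</a></div>
  <ul>
    <li><a href="/posts/1" class="post">First <b>post</b></a></li>
    <li><a class="post" href="/posts/2">Second post</a></li>
  </ul>
</body></html>
HTML_SECTION;

// Keep parser warnings out of the output.
libxml_use_internal_errors( true );
$doc = new DOMDocument();
$doc->loadHTML( $html );
libxml_clear_errors();

$xpath = new DOMXPath( $doc );

// Collect href => link text for every "post" link inside a list item.
// textContent flattens nested tags such as <b> into plain text.
$links = [];
foreach( $xpath->query( '//li/a[@class="post"]' ) as $a ) {
    $links[ $a->getAttribute( 'href' ) ] = trim( $a->textContent );
}
print_r( $links ); // "/posts/1" => "First post", "/posts/2" => "Second post"

// XPath functions work too: count every link that has an href.
$count = (int) $xpath->evaluate( 'count(//a[@href])' );
echo $count, PHP_EOL; // prints 3
```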

PHP's ability to process XML goes far beyond what you might imagine … it is worth reading the manual when you have time …