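For the past couple of days I've been writing a crawler in PHP.
The target site's markup all lives inside JS, written as ASCII-only Unicode escape sequences (\u003c for <, \u003e for >, and so on), as shown below: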
"\u003cdiv class=\"grid-container footer-container\"\u003e\n \u003cdiv class=\"grid footer\"\u003e\n \u003cdiv class=\"grid__item md--one-quarter\"\u003e\n \u003cp class=\"footer__title\"\u003eSupport\u003c/p\u003e\n \u003cul class=\"footer__nav\"\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/about\"\u003e\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/contact\"\u003eContact us\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/cookies\"\u003eCookie Policy\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/terms\"\u003eTerms of use\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/privacy\"\u003ePrivacy Policy\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/ub/refunds\"\u003eReturns Policy\u003c/a\u003e\u003c/li\u003e\n \u003c/ul\u003e\n \u003cp class=\"footer__title\"\u003eSafe and Secure\u003c/p\u003e\n \u003cimg class=\"footer__ssl\" src=\"/themes/compare-modular/images/v2/ssl-secured.svg\" alt=\"\"\u003e\n \u003c/div\u003e\n \u003cdiv class=\"grid__item md--one-half\"\u003e\n \u003cp class=\"footer__title\"\u003eShop\u003c/p\u003e\n \u003cul class=\"footer__nav footer__nav--categories\"\u003e\n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=Home\u0026#43;Furniture\u0026#43;DIY\"\u003eHome, Furniture \u0026amp; DIY\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=Garden\u0026#43;Patio\"\u003eGarden \u0026amp; Patio\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=TV\u0026#43;Speakers\"\u003eSound \u0026amp; Vision\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=sports\u0026#43;equipment\"\u003eSports Goods\u003c/a\u003e\u003c/li\u003e\n \n \n \n 
\u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=makeup\u0026#43;skincare\"\u003eHealth \u0026amp; Beauty\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=cars\u0026#43;motorcycles\"\u003eCars, Motorcycles \u0026amp; Vehicles\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=jewellery\u0026#43;watches\"\u003eJewellery \u0026amp; Watches\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?=mobile\u0026#43;phones\"\u003eSmartphones\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=Toys\u0026#43;Games\"\u003eToys \u0026amp; Games\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=baby\"\u003eBaby\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=child\u0026#43;clothes\"\u003eKids Clothes \u0026amp; Shoes\u003c/a\u003e\u003c/li\u003e\n \n \n \n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"/search?q=clothing\u0026#43;accessories\"\u003eClothing \u0026amp; Accessories\u003c/a\u003e\u003c/li\u003e\n \n \n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv class=\"grid__item md--one-quarter\"\u003e\n \u003cp class=\"footer__title\"\u003eOur Address\u003c/p\u003e\n \u003cul class=\"footer__nav\"\u003e\n \u003cli class=\"footer__nav__item\"\u003e\u003ca href=\"https://www.redbrain.com/\"\u003eRedBrain\u003c/a\u003e\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003eSuite 14, Cathedral House\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003e5 Beacon Street\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003eLichfield\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003eStaffordshire\u003c/li\u003e\n \u003cli class=\"footer__nav__item\"\u003eWS13 7AA\u003c/li\u003e\n \u003c/ul\u003e\n 
\u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"grid-container footer-container footer-container--2\"\u003e\n \u003cdiv class=\"grid\"\u003e\n \u003cdiv class=\"grid__item\"\u003e\n \u003cdiv class=\"footer footer--2\"\u003e\n \u003ca class=\"footer--2__link\" href=\"/\"\u003e\u003cimg class=\"footer--2__logo\" src=\"/images/logo.svg\" alt=\"\"\u003e\u003c/a\u003e\n \u003cp\u003e© RedBrain 2018\u003c/p\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\n\n\n
So how do I convert it back?
After a lot of searching around, I found a little trick.
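// Rewrite each \uXXXX escape (always exactly 4 hex digits) as the numeric HTML entity &#xXXXX;
$str = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $str);
// Decode entities to UTF-8; note this also decodes entities already in the markup (&amp;, &#43;)
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');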
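The regex above only touches \uXXXX escapes, so \n and \" in the raw text need separate handling. Since the value looks like a JavaScript/JSON string literal, one alternative is json_decode, which handles \uXXXX, \n and \" in one pass and leaves HTML entities such as &amp; as they were. This is only a minimal sketch: it assumes the literal is complete (both surrounding double quotes present) and is valid JSON, which not every JS string is. The names decodeJsString and $raw are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: decode a JS/JSON string literal (including its
// surrounding double quotes) into plain UTF-8 text.
function decodeJsString(string $literal): string
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed input.
    $decoded = json_decode($literal, false, 512, JSON_THROW_ON_ERROR);
    if (!is_string($decoded)) {
        throw new InvalidArgumentException('Expected a JSON string literal.');
    }
    return $decoded;
}

// Shortened sample in the same shape as the data above (assumed input).
// Single quotes keep the backslashes literal on the PHP side.
$raw = '"\u003cp class=\"footer__title\"\u003eGarden \u0026amp; Patio\u003c/p\u003e\n"';
echo decodeJsString($raw);
// prints: <p class="footer__title">Garden &amp; Patio</p>  (followed by a newline)
```

Unlike the html_entity_decode route, &amp; stays encoded here, so the result is still valid HTML to hand to a parser.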
After conversion, the result looks like this:
<div class="grid-container footer-container">
<div class="grid footer">
<div class="grid__item md--one-quarter">
<p class="footer__title">Support</p>
<ul class="footer__nav">
<li class="footer__nav__item"><a href="/about"></a></li>
<li class="footer__nav__item"><a href="/contact">Contact us</a></li>
<li class="footer__nav__item"><a href="/cookies">Cookie Policy</a></li>
<li class="footer__nav__item"><a href="/terms">Terms of use</a></li>
<li class="footer__nav__item"><a href="/privacy">Privacy Policy</a></li>
<li class="footer__nav__item"><a href="/ub/refunds">Returns Policy</a></li>
</ul>
<p class="footer__title">Safe and Secure</p>
<img class="footer__ssl" src="/themes/compare-modular/images/v2/ssl-secured.svg" alt="">
</div>
<div class="grid__item md--one-half">
<p class="footer__title">Shop</p>
<ul class="footer__nav footer__nav--categories">
<li class="footer__nav__item"><a href="/search?q=Home+Furniture+DIY">Home, Furniture & DIY</a></li>
<li class="footer__nav__item"><a href="/search?q=Garden+Patio">Garden & Patio</a></li>
<li class="footer__nav__item"><a href="/search?q=TV+Speakers">Sound & Vision</a></li>
<li class="footer__nav__item"><a href="/search?q=sports+equipment">Sports Goods</a></li>
<li class="footer__nav__item"><a href="/search?q=makeup+skincare">Health & Beauty</a></li>
<li class="footer__nav__item"><a href="/search?q=cars+motorcycles">Cars, Motorcycles & Vehicles</a></li>
<li class="footer__nav__item"><a href="/search?q=jewellery+watches">Jewellery & Watches</a></li>
<li class="footer__nav__item"><a href="/search?=mobile+phones">Smartphones</a></li>
<li class="footer__nav__item"><a href="/search?q=Toys+Games">Toys & Games</a></li>
<li class="footer__nav__item"><a href="/search?q=baby">Baby</a></li>
<li class="footer__nav__item"><a href="/search?q=child+clothes">Kids Clothes & Shoes</a></li>
<li class="footer__nav__item"><a href="/search?q=clothing+accessories">Clothing & Accessories</a></li>
</ul>
</div>
<div class="grid__item md--one-quarter">
<p class="footer__title">Our Address</p>
<ul class="footer__nav">
<li class="footer__nav__item"><a href="https://www.redbrain.com/">RedBrain</a></li>
<li class="footer__nav__item">Suite 14, Cathedral House</li>
<li class="footer__nav__item">5 Beacon Street</li>
<li class="footer__nav__item">Lichfield</li>
<li class="footer__nav__item">Staffordshire</li>
<li class="footer__nav__item">WS13 7AA</li>
</ul>
</div>
</div>
</div>
<div class="grid-container footer-container footer-container--2">
<div class="grid">
<div class="grid__item">
<div class="footer footer--2">
<a class="footer--2__link" href="/"><img class="footer--2__logo" src="/images/logo.svg" alt=""></a>
<p>© RedBrain 2018</p>
</div>
</div>
</div>
</div>
And that's it, done.