A Golang Crawler for Douban Movie Top250: The Simplest Example

  Data analysis, golang, Web crawler

Crawling Douban Movie Top250

Writing a crawler is routine work, and the data it collects is fun to explore. Let's start with the simplest, most basic crawler.

Project address: https://github.com/go-crawler …


Our target site is Douban Movie Top250, which most readers will already know well.

This time, eight fields are selected for simple summary analysis. The specific fields are as follows:


A quick analysis of the target source

  • Each page lists 25 entries.
  • The list is paginated (10 pages in total), and the paging follows a regular pattern.
  • The order of the data fields in each entry is regular and does not change.
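Because the paging is regular, the page addresses can also be computed rather than discovered. The following is a minimal sketch under stated assumptions: the list address `https://movie.douban.com/top250`, the `start` offset parameter, and the helper name `pageURLs` are illustrative assumptions, not taken from the project.

```go
package main

import "fmt"

const (
	perPage   = 25 // entries per page, from the analysis above
	pageCount = 10 // total pages, from the analysis above

	// baseURL is an assumed list address, not taken from the project.
	baseURL = "https://movie.douban.com/top250"
)

// pageURLs builds one URL per page, assuming each page is addressed by a
// "start" offset (0, 25, 50, ...). This parameter name is an assumption.
func pageURLs() []string {
	urls := make([]string, 0, pageCount)
	for p := 0; p < pageCount; p++ {
		urls = append(urls, fmt.Sprintf("%s?start=%d", baseURL, p*perPage))
	}
	return urls
}

func main() {
	for _, u := range pageURLs() {
		fmt.Println(u)
	}
}
```

The crawler in this post reads the page links from the paginator instead, which keeps working even if the offset scheme changes.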


Because the amount of data is small, our crawling steps are as follows:

  • Parse the first page to get the list of all pages
  • Loop through the pages and parse the movie information on each one
  • Put the crawled movie information into storage
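The three steps above can be sketched as one small driver function. This is only an outline under assumptions: `crawl` and its three callbacks (`listPages`, `parsePage`, `store`) are hypothetical names, and the callbacks are faked in `main` so the sketch runs without network access or a database.

```go
package main

import (
	"fmt"
	"log"
)

// Movie is a simplified stand-in for the real record type.
type Movie struct {
	Title string
}

// crawl lists the pages, parses the movies on each page, and passes every
// movie to store. It returns how many movies were stored.
// All three callbacks are hypothetical placeholders for the real code.
func crawl(
	listPages func() ([]string, error),
	parsePage func(url string) ([]Movie, error),
	store func(m Movie) error,
) (int, error) {
	pages, err := listPages()
	if err != nil {
		return 0, fmt.Errorf("list pages: %w", err)
	}

	stored := 0
	for _, p := range pages {
		movies, err := parsePage(p)
		if err != nil {
			return stored, fmt.Errorf("parse %q: %w", p, err)
		}
		for _, m := range movies {
			if err := store(m); err != nil {
				return stored, fmt.Errorf("store %q: %w", m.Title, err)
			}
			stored++
		}
	}
	return stored, nil
}

func main() {
	// Fake callbacks standing in for fetching, parsing, and storage.
	listPages := func() ([]string, error) { return []string{"page-1", "page-2"}, nil }
	parsePage := func(url string) ([]Movie, error) {
		return []Movie{{Title: url + "/movie-a"}, {Title: url + "/movie-b"}}, nil
	}
	var saved []Movie
	store := func(m Movie) error { saved = append(saved, m); return nil }

	n, err := crawl(listPages, parsePage, store)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("stored movies:", n, "saved slice length:", len(saved))
}
```

Passing the steps in as functions keeps the loop independent of how pages are fetched or where the records end up.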


$ go get -u github.com/PuerkitoBio/goquery


$ go run main.go

Code snippets

1. Get all pages

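// Requires the strconv package and github.com/PuerkitoBio/goquery.
func ParsePages(doc *goquery.Document) (pages []Page) {
    // Add the first page by hand, with an empty URL.
    pages = append(pages, Page{Page: 1, Url: ""})
    doc.Find("#content > div > div.article > div.paginator > a").Each(func(i int, s *goquery.Selection) {
        // The link text is the page number; the href is the page's URL.
        page, _ := strconv.Atoi(s.Text())
        url, _ := s.Attr("href")

        pages = append(pages, Page{
            Page: page,
            Url:  url,
        })
    })

    return pages
}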
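ParsePages stores each link's `href` as it appears in the page, and the first page gets an empty URL. Before fetching, those values need to be turned into full addresses. A minimal sketch using the standard `net/url` package follows; the base address, the sample `href` value, and the helper name `absoluteURL` are assumptions for illustration, and the `Page` struct is inferred from the function above.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// Page mirrors the struct used by ParsePages; the field set is inferred
// from that function.
type Page struct {
	Page int
	Url  string
}

// absoluteURL resolves a page's href (which may be empty or relative)
// against the base address of the list.
func absoluteURL(base string, p Page) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base: %w", err)
	}
	ref, err := url.Parse(p.Url)
	if err != nil {
		return "", fmt.Errorf("parse href: %w", err)
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	// Assumed base address and href format, for illustration only.
	base := "https://movie.douban.com/top250"
	pages := []Page{
		{Page: 1, Url: ""},
		{Page: 2, Url: "?start=25&filter="},
	}
	for _, p := range pages {
		u, err := absoluteURL(base, p)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("page %d: %s\n", p.Page, u)
	}
}
```

An empty href resolves to the base address itself, so the hand-added first page needs no special case.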

2. Parse Douban movie information

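func ParseMovies(doc *goquery.Document) (movies []Movie) {
    // Compile the digit pattern once instead of once per list item.
    digits := regexp.MustCompile("[0-9]")

    doc.Find("#content > div > div.article > ol > li").Each(func(i int, s *goquery.Selection) {
        title := s.Find(".hd a span").Eq(0).Text()

        // The lines that define DescInfo are missing from this snippet.
        // Assumed reconstruction: split the first description paragraph by line.
        desc := strings.TrimSpace(s.Find(".bd p").Eq(0).Text())
        DescInfo := strings.Split(desc, "\n")
        if len(DescInfo) < 2 {
            return // unexpected layout; skip this item
        }

        // The second line has the form "year / area / tag".
        movieDesc := strings.Split(DescInfo[1], "/")
        if len(movieDesc) < 3 {
            return // unexpected layout; skip this item
        }
        year := strings.TrimSpace(movieDesc[0])
        area := strings.TrimSpace(movieDesc[1])
        tag := strings.TrimSpace(movieDesc[2])

        star := s.Find(".bd .star .rating_num").Text()

        // Keep only the digits of the rating-count label.
        comment := strings.TrimSpace(s.Find(".bd .star span").Eq(3).Text())
        comment = strings.Join(digits.FindAllString(comment, -1), "")

        quote := s.Find(".quote .inq").Text()

        // The construction of movie is also missing from this snippet. The field
        // names below are assumed, and the full struct may hold more fields.
        movie := Movie{Title: title, Year: year, Area: area, Tag: tag, Star: star, Comment: comment, Quote: quote}

        log.Printf("i: %d, movie: %v", i, movie)

        movies = append(movies, movie)
    })

    return movies
}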
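The string handling in ParseMovies can be tried on its own, without fetching any page. The sketch below isolates the two steps: splitting a "year / area / tag" line and reducing a rating-count label to its digits. The helper names `splitDesc` and `commentCount` are hypothetical, and the sample inputs in `main` are made up for illustration, not real crawled values.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digits matches runs of ASCII digits; it is compiled once at package level.
var digits = regexp.MustCompile(`[0-9]+`)

// commentCount keeps only the digits of a label and joins them together.
func commentCount(label string) string {
	return strings.Join(digits.FindAllString(label, -1), "")
}

// splitDesc splits a "year / area / tag" line into trimmed parts.
// ok is false when the line has fewer than three parts.
func splitDesc(line string) (year, area, tag string, ok bool) {
	parts := strings.Split(line, "/")
	if len(parts) < 3 {
		return "", "", "", false
	}
	return strings.TrimSpace(parts[0]),
		strings.TrimSpace(parts[1]),
		strings.TrimSpace(parts[2]),
		true
}

func main() {
	// Made-up sample inputs.
	year, area, tag, ok := splitDesc(" 1994 / USA / Crime Drama ")
	fmt.Println(year, area, tag, ok)
	fmt.Println(commentCount("12345 ratings"))
}
```

Checking the number of parts before indexing avoids a panic when an entry's description does not follow the usual layout.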





What do you think of this data? I'm really curious =)