MongoDB reaches 80,000 inserts per second on a single instance

3 years ago

新, 韵

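This article covers Go 1.7 + mgo and MongoDB 3.2 running on CentOS 7 on AWS EC2.

MongoDB is often regarded as slow, and sharding is said to be complex without delivering outstanding performance. Our benchmarks on AWS EC2, however, show that it performs quite respectably. Because the cloud makes scaling up easy, even a single unsharded instance should be enough for most websites.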

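Benchmark conditions

The client uses Go 1.7 + mgo and the server uses MongoDB 3.2. Each runs on its own EC2 instance, and the benchmark inserts 200,000 documents of 1 KB each over the network.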
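As a rough check on the "1 KB" figure, the size of each document can be worked out by hand from the BSON layout. The sketch below assumes a single string field `d` with a 970-byte payload, as in the client code, plus a 12-byte ObjectId `_id` that the server is assumed to add when the client does not send one. The helper names are made up for illustration.

```go
package main

import "fmt"

// bsonStringElemSize returns the encoded size of a BSON string element:
// 1 type byte + field name + NUL + 4-byte length + payload + NUL.
func bsonStringElemSize(name string, payloadLen int) int {
	return 1 + len(name) + 1 + 4 + payloadLen + 1
}

// bsonObjectIDElemSize returns the encoded size of a BSON ObjectId element:
// 1 type byte + field name + NUL + 12-byte ObjectId.
func bsonObjectIDElemSize(name string) int {
	return 1 + len(name) + 1 + 12
}

// bsonDocSize adds the 4-byte document length prefix and the trailing
// NUL terminator to the sum of the element sizes.
func bsonDocSize(elemSizes ...int) int {
	total := 4 + 1
	for _, s := range elemSizes {
		total += s
	}
	return total
}

func main() {
	const lenDummydata = 970 // payload length used by the benchmark client

	sent := bsonDocSize(bsonStringElemSize("d", lenDummydata))
	stored := bsonDocSize(
		bsonObjectIDElemSize("_id"), // assumption: _id is added server-side
		bsonStringElemSize("d", lenDummydata),
	)
	fmt.Println("bytes sent per document:", sent)
	fmt.Println("bytes stored per document (before compression):", stored)
}
```

Under these assumptions the 970-byte payload lands at roughly 1,000 bytes per document, which is consistent with the comment on `lenDummydata` in the client.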

The client code is shown below.

package main

import (
  "fmt"
  "strings"
  "os"
  "strconv"
  "time"
  mgo "gopkg.in/mgo.v2"
)

const lenDummydata = 970   // create 1Kbyte-size bson

type InsertDoc struct {
  D string `bson:"d"`
}

func main() {
  // read optional command-line arguments (defaults below)
  maxSessions := 100  // arg #1
  maxGoroutines := 10000  // arg #2, should be a multiple of maxSessions
  numInserts := 200000  // arg #3, should be a multiple of maxGoroutines
  if len(os.Args) >= 2 {
    maxSessions, _ = strconv.Atoi(os.Args[1])
  }
  if len(os.Args) >= 3 {
    maxGoroutines, _ = strconv.Atoi(os.Args[2])
  }
  if len(os.Args) >= 4 {
    numInserts, _ = strconv.Atoi(os.Args[3])
  }
  fmt.Println("maxSessions :=", maxSessions, "maxGoroutines :=", maxGoroutines, "numInserts :=", numInserts)

  // clear mongodb
  session, err := mgo.Dial("mongodb://172.31.18.129/")
  if err != nil {
    panic(err)
  }
  COL := session.DB("test").C("COL")
  COL.DropCollection()
  // create InsertDoc
  var doc InsertDoc
  doc.D = strings.Repeat("0", lenDummydata)
  // start timer
  time_start := time.Now()

  // open maxSessions independent mgo sessions, one per outer goroutine
  for i := 0; i < maxSessions; i++ {
    go func() {
      session, err := mgo.Dial("mongodb://172.31.18.129/")
      if err != nil {
        panic(err)
      }
      COL := session.DB("test").C("COL")
      // share each session among maxGoroutines/maxSessions worker goroutines
      for j := 0; j < maxGoroutines/maxSessions; j++ {
        go func() {
          // each worker inserts numInserts/maxGoroutines documents
          for k := 0; k < numInserts/maxGoroutines; k++ {
            err := COL.Insert(&doc)
            if err != nil {
              panic(err)
            }
          }
        }()
      }
    }()
  }
  // wait all docs inserted
  n := 0
  for n < numInserts {
    n, _ = COL.Count()
    time.Sleep(10 * time.Millisecond)
  }
  // print result
  fmt.Println(n, "docs inserted")
  duration := time.Since(time_start).Seconds() * 1000
  fmt.Println("TOTAL:", int(duration), "msec", int(float64(numInserts)*1000/duration), "docs/s")
}

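Run the benchmark with `go run mongo_bench.go`, passing the number of DB sessions, the number of goroutines, and the number of inserts as arguments. Trailing arguments may be omitted, in which case the defaults in the code (100, 10000, and 200000) apply.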
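The client ignores parse errors and relies on integer division, so if the insert count is not evenly divisible, fewer documents are written than the wait loop expects and the program never exits. A minimal sketch of stricter argument handling could look like this; `benchConfig` and `parseArgs` are hypothetical names, not part of the original program.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// benchConfig holds the three positional arguments of the benchmark.
type benchConfig struct {
	maxSessions   int
	maxGoroutines int
	numInserts    int
}

// parseArgs applies the defaults for omitted trailing arguments and rejects
// values that the nested loops cannot divide evenly.
func parseArgs(args []string) (benchConfig, error) {
	cfg := benchConfig{maxSessions: 100, maxGoroutines: 10000, numInserts: 200000}
	targets := []*int{&cfg.maxSessions, &cfg.maxGoroutines, &cfg.numInserts}
	for i, arg := range args {
		if i >= len(targets) {
			break // ignore extra arguments
		}
		v, err := strconv.Atoi(arg)
		if err != nil || v <= 0 {
			return cfg, fmt.Errorf("argument %d (%q) must be a positive integer", i+1, arg)
		}
		*targets[i] = v
	}
	if cfg.maxGoroutines%cfg.maxSessions != 0 {
		return cfg, fmt.Errorf("maxGoroutines (%d) must be a multiple of maxSessions (%d)",
			cfg.maxGoroutines, cfg.maxSessions)
	}
	if cfg.numInserts%cfg.maxGoroutines != 0 {
		return cfg, fmt.Errorf("numInserts (%d) must be a multiple of maxGoroutines (%d)",
			cfg.numInserts, cfg.maxGoroutines)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d goroutines per session, %d inserts per goroutine\n",
		cfg.maxGoroutines/cfg.maxSessions, cfg.numInserts/cfg.maxGoroutines)
}
```

With the defaults, this reports 100 goroutines per session and 20 inserts per goroutine.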

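The usual way to wait for goroutines to finish is sync.WaitGroup, but simply polling the number of documents written to MongoDB seemed to be faster. This approach behaves more like asynchronous writes.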

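The collection has only the default index, and MongoDB journaling is enabled. In other trials, adding an index slowed writes slightly (by a few percent). In addition, snappy compression shrinks the actual data by about 30%, which eases the constraint imposed by storage performance. This effect could be reduced by filling the documents with random data, but given the overhead of random number generation we did not do so this time.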

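Results and discussion
The results are summarized in the table below.
I compared benchmark results while varying the AWS EC2 instance type, the number of DB sessions, and the number of goroutines.

(d/g means d: number of DB sessions, g: number of goroutines)
(The client and the database use the same instance type)
Average inserts per second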

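| Instance type | Cores | 1/1 | 1/100 | 100/100 | 100/10000 |
|---|---|---|---|---|---|
| t2.small | 1 | 2,014 | 24,547 | 9,917 | 10,293 |
| t2.medium | 2 | 2,716 | 23,114 | 21,986 | 18,665 |
| t2.xlarge | 4 | 3,671 | 18,623 | 59,356 | 33,101 |
| t2.2xlarge | 8 | 3,727 | 21,828 | 84,428 | 58,290 |
| m4.4xlarge | 16 | 4,629 | 19,986 | 74,102 | 69,072 |

The results vary with the number of DB sessions and the number of parallel goroutines. In particular, with a single DB session MongoDB can use only one core even in a multi-core environment, so be aware that performance does not scale in that configuration.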

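Setting both the number of DB sessions and the number of goroutines to 100 appears to give fairly good results. With that setting we reached more than 80,000 inserts per second on t2.2xlarge.

We also tried m4.4xlarge, but it was no faster. The gain from t2.xlarge to t2.2xlarge was also less than proportional to the core count, which may mean that a bottleneck other than the CPU has been reached. If anyone manages more than 100,000 inserts per second under similar conditions, please let us know.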
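To make the scaling argument concrete, the speedup between successive instance sizes can be computed from the 100/100 column of the table. The sketch below only does that arithmetic; the `result` type and `speedup` helper are illustrative names.

```go
package main

import "fmt"

// result is one row of the 100/100 column from the table.
type result struct {
	instance string
	cores    int
	inserts  float64 // average inserts per second
}

// speedup returns how many times faster `to` is than `from`.
func speedup(from, to float64) float64 {
	return to / from
}

func main() {
	rows := []result{
		{"t2.small", 1, 9917},
		{"t2.medium", 2, 21986},
		{"t2.xlarge", 4, 59356},
		{"t2.2xlarge", 8, 84428},
		{"m4.4xlarge", 16, 74102},
	}
	// Each step doubles the core count, so a CPU-bound workload that scaled
	// perfectly would show a speedup close to 2.0 on every line.
	for i := 1; i < len(rows); i++ {
		prev, cur := rows[i-1], rows[i]
		fmt.Printf("%s -> %s: cores x%d, throughput x%.2f\n",
			prev.instance, cur.instance, cur.cores/prev.cores,
			speedup(prev.inserts, cur.inserts))
	}
}
```

The step from t2.xlarge to t2.2xlarge comes out at roughly 1.4 times the throughput for twice the cores, and the step to m4.4xlarge is below 1.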
