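Elasticsearch Series: Backup and Restore for Production Data

Preface

Any component running in production that stores data needs scheduled backups and disaster recovery as a matter of course. Backup solutions for MySQL databases are already very mature, and Elasticsearch likewise has a mature solution for data backup and restore. Let's take a look.

Overview

This article covers the routine operations for backing up, restoring, and upgrading the data of a production Elasticsearch cluster.

The curl command

curl is an essential tool on Linux. In a production Elasticsearch deployment there is no guarantee that Kibana will be reachable, but every Elasticsearch RESTful API can be called with curl.

curl has another advantage: some operations need a series of requests to complete, and we can wrap those related requests in a shell script so they are easy to reuse later.

Commands that must run on a schedule are handled the same way: wrap the series of operations in a shell script and trigger it with crontab, which ships with Linux.

The operations that follow are all done with curl. If you paste a complete curl request into Kibana's Dev Tools, Kibana automatically converts it into the request format we have seen before, which is very convenient.

A request on the Linux command line: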

[esuser@elasticsearch02 ~]$ curl -XGET 'http://elasticsearch02:9200/music/children/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match_all": {}
  }
}
'

When the complete command is pasted into Dev Tools, it automatically becomes:

GET /music/children/_search
{
  "query": {
    "match_all": {}
  }
}

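This tool is really powerful. The reverse direction works as well: the Dev Tools Console has a "Copy as cURL" action that turns a Console request back into a curl command.

For curl commands that carry a request body, remember to add -H 'Content-Type: application/json'. The ?pretty parameter formats the response for readability.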
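The rule about the Content-Type header can be folded into a small wrapper so that scripts do not have to repeat it. The following is only a sketch: the function name es_request, the ES_URL variable, and the DRY_RUN switch are made up for illustration and are not part of Elasticsearch or curl.

```shell
#!/bin/sh
# Assumed base URL for illustration; point it at your own cluster.
ES_URL="${ES_URL:-http://elasticsearch02:9200}"

# es_request METHOD PATH [BODY]
# Adds the Content-Type header only when a body is supplied.
# With DRY_RUN=1 the curl command line is printed instead of executed.
es_request() {
  _method="$1"; _path="$2"; _body="${3:-}"
  if [ -n "$_body" ]; then
    set -- -s -X "$_method" "$ES_URL$_path" -H 'Content-Type: application/json' -d "$_body"
  else
    set -- -s -X "$_method" "$ES_URL$_path"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl $*"
  else
    curl "$@"
  fi
}
```

For example, es_request GET '/music/children/_search?pretty' '{"query":{"match_all":{}}}' sends the same search as the command above.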

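Data backup

We know that an Elasticsearch index is split into multiple shards stored on disk. Shards are divided into primary shards and replica shards, which protects against data loss and keeps data available when individual nodes fail. But in a major incident, such as a data center power failure that takes down every node in the cluster, replicas are not enough, so we need to back up the data regularly ahead of time, just in case.

Since the data is stored as files on disk, there are many choices of storage medium: local disk, NAS, file storage servers (such as FastDFS or HDFS), and various cloud storage services (Amazon S3, Alibaba Cloud OSS), among others.

Elasticsearch provides the snapshot API for backups. It can store the cluster's current state and data in another location, either a local path or a network path, and it supports incremental backups. Decide the backup frequency according to the data volume; incremental backups are fast.

Creating a backup repository

We will use the local directory /home/esuser/esbackup as the repository location. On a multi-node cluster this path must be a shared file system mounted at the same location on every master and data node.

First, add the following to the elasticsearch.yml configuration file:

path.repo: /home/esuser/esbackup

Then restart Elasticsearch.

After it starts successfully, send the request to create the repository: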

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/_snapshot/esbackup?pretty' -H 'Content-Type: application/json' -d '
{
    "type": "fs", 
    "settings": {
        "location": "/home/esuser/esbackup",
        "max_snapshot_bytes_per_sec" : "50mb", 
        "max_restore_bytes_per_sec" : "50mb"
    }
}
'
{"acknowledged":true}
[esuser@elasticsearch02 ~]$

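Parameter descriptions:

type: the repository type. The request uses fs, meaning a shared file system.
location: the repository path. It must be the path registered under path.repo in elasticsearch.yml, or a path inside it; otherwise the request fails.
max_snapshot_bytes_per_sec: the per-node rate limit for writing data from Elasticsearch to the repository (backup). In 6.x the default is 40mb per second.
max_restore_bytes_per_sec: the per-node rate limit for writing data from the repository to Elasticsearch (restore). In 6.x the default is also 40mb per second.

Set the two throttling parameters according to the actual network. If the backup directory is on the same LAN, they can be set higher to speed up backup and restore.

There is also a query command to view the repository information: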

[esuser@elasticsearch02 ~]$ curl -XGET 'http://elasticsearch02:9200/_snapshot/esbackup?pretty'

{"esbackup":{"type":"fs","settings":{"location":"/home/esuser/esbackup","max_restore_bytes_per_sec":"50mb","max_snapshot_bytes_per_sec":"50mb"}}}

[esuser@elasticsearch02 ~]$

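Creating a repository on HDFS

Integrating with the Hadoop ecosystem is a well-recommended approach for big data workloads, and backups can be stored on HDFS, Hadoop's distributed file system. You will need to set up the Hadoop cluster yourself; there are supplementary notes at the end of this article for reference.

On the Elasticsearch side, the repository-hdfs plugin must be installed on every node. Our Elasticsearch version is 6.3.1, so the matching plugin is repository-hdfs-6.3.1.zip, and we use Hadoop 2.8.1.

Download and install the plugin:

./elasticsearch-plugin install https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-hdfs/repository-hdfs-6.3.1.zip

If the production servers cannot reach the internet, download the zip on another machine first, upload it to the production server, and install it directly from the local zip file (there is no need to unzip it):

./elasticsearch-plugin install file:///opt/elasticsearch/repository-hdfs-6.3.1.zip

Remember to restart the Elasticsearch node after the installation completes.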
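After the restart it can be worth confirming that the plugin was actually loaded. The sketch below assumes the default column layout of the _cat/plugins API (node name, component, version); the helper name has_plugin is made up for illustration.

```shell
#!/bin/sh
# has_plugin NAME
# Reads `_cat/plugins` output on stdin (columns: node name, component, version)
# and succeeds if the named plugin is listed for at least one node.
has_plugin() {
  awk -v p="$1" '$2 == p { found = 1 } END { exit (found ? 0 : 1) }'
}
```

Possible usage: curl -s 'http://elasticsearch02:9200/_cat/plugins' | has_plugin repository-hdfs && echo "repository-hdfs is installed". Running ./elasticsearch-plugin list on the node is an offline alternative.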

Check the node status:

[esuser@elasticsearch02 ~]$ curl -XGET 'http://elasticsearch02:9200/_cat/nodes?v'

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.17.137           38          95   2    0.03    0.03     0.05 mdi       *      node-1

Creating the HDFS repository

First check the cluster's total document count and shard summary:

[esuser@elasticsearch02 ~]$ curl -XGET 'http://elasticsearch02:9200/_count?pretty' -H 'Content-Type: application/json' -d '
 {
     "query": {
         "match_all": {}
     }
}'


{
  "count" : 5392,
  "_shards" : {
    "total" : 108,
    "successful" : 108,
    "skipped" : 0,
    "failed" : 0
  }
}

Create an HDFS repository named hdfsbackup:

[esuser@elasticsearch02 ~]$ curl -XPUT  'http://elasticsearch02:9200/_snapshot/hdfsbackup?pretty' -H 'Content-Type: application/json' -d '
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://elasticsearch02:9000/",
    "path": "/home/esuser/hdfsbackup",
    "conf.dfs.client.read.shortcircuit": "false",
    "max_snapshot_bytes_per_sec": "50mb",
    "max_restore_bytes_per_sec": "50mb"
  }
}'

{
  "acknowledged" : true
}

Verifying the repository

Once the repository has been created, you can check it with the verify command:

[esuser@elasticsearch02 ~]$ curl -XPOST 'http://elasticsearch02:9200/_snapshot/hdfsbackup/_verify?pretty'
{
  "nodes" : {
    "A1s1uus7TpuDSiT4xFLOoQ" : {
      "name" : "node-1"
    }
  }
}

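Backing up indices

Once the repository has been created and verified, you can call the snapshot API to back up indices.

If no index name is given, all currently open indices are backed up. There is also a wait_for_completion parameter that controls whether the request waits for the backup to finish before responding. The default is false: the request returns immediately after it is submitted, and the backup runs asynchronously in the background. If it is set to true, the request becomes synchronous and only responds once the backup has finished. The default is recommended, because a full backup can sometimes take 1 to 2 hours.

Example 1: back up all indices, with the snapshot named snapshot_20200122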

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122?pretty'
{
  "accepted" : true
}

Example 2: back up the data of the music index, with the snapshot named snapshot_20200122_02 and wait_for_completion set to true

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02?wait_for_completion=true&pretty' -H 'Content-Type: application/json' -d '
{
  "indices": "music",
  "ignore_unavailable": true,
  "include_global_state": false,
  "partial": true
}'


{
  "snapshot" : {
    "snapshot" : "snapshot_20200122_02",
    "uuid" : "KRXnzc6XSWagCQO92EQx6A",
    "version_id" : 6030199,
    "version" : "6.3.1",
    "indices" : [
      "music"
    ],
    "include_global_state" : false,
    "state" : "SUCCESS",
    "start_time" : "2020-01-22T07:11:06.594Z",
    "start_time_in_millis" : 1579677066594,
    "end_time" : "2020-01-22T07:11:07.313Z",
    "end_time_in_millis" : 1579677067313,
    "duration_in_millis" : 719,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  }
}

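The parameters in this command:

indices: index names. Multiple names are allowed, separated by ",", and wildcards are supported.
ignore_unavailable: true or false. If true, any index listed in indices that does not exist is skipped and the backup proceeds normally. The default is false, in which case the backup fails if an index does not exist.
include_global_state: true or false. Whether to back up the cluster's global state.
partial: true or false. Whether to allow backing up only some of an index's shards. The default is false: if some primary shards of an index are unavailable, the backup fails.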
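When the same request is issued from scripts, the body can be generated instead of typed by hand. This is a minimal sketch; snapshot_body is a made-up helper name, and the fallback values follow the defaults described above (with include_global_state assumed to default to true, as in the Elasticsearch documentation).

```shell
#!/bin/sh
# snapshot_body INDICES [IGNORE_UNAVAILABLE] [INCLUDE_GLOBAL_STATE] [PARTIAL]
# Prints a JSON body for the create-snapshot request.
# Omitted flags fall back to false / true / false respectively.
snapshot_body() {
  printf '{"indices":"%s","ignore_unavailable":%s,"include_global_state":%s,"partial":%s}\n' \
    "$1" "${2:-false}" "${3:-true}" "${4:-false}"
}
```

The body from Example 2 would then be produced by snapshot_body music true false true and passed to curl with -d "$(snapshot_body music true false true)".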

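Backups taken with the snapshot API are incremental. When a snapshot runs, Elasticsearch analyzes the index files of the snapshots already in the repository and, building on the previous snapshot, copies only index files that were created or changed since then. This lets multiple snapshots be stored compactly in the repository, saving a lot of space, and the snapshot process does not block Elasticsearch reads or writes.

Likewise, because a snapshot is a point-in-time copy, data written to an index after the snapshot starts is not reflected in that snapshot. A snapshot contains a copy of the indices and can optionally include the global cluster metadata, which holds the cluster-wide settings and templates.

Only one snapshot operation can run at a time. While a shard is being snapshotted it cannot be moved to another node, which affects shard rebalancing. The shard can be moved to another node only after the snapshot finishes.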
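Because only one snapshot can run at a time, a script may want to check for a running snapshot before starting a new one. The sketch below relies on GET /_snapshot/_status, which lists the snapshots currently running; the helper name no_snapshot_running is an assumption, and the JSON check is a deliberately simple text match rather than a real parser.

```shell
#!/bin/sh
# no_snapshot_running
# Reads the response of GET /_snapshot/_status on stdin and succeeds
# when its "snapshots" array is empty, i.e. nothing is currently running.
no_snapshot_running() {
  tr -d ' \n\r\t' | grep -qF '"snapshots":[]'
}
```

Possible usage: curl -s 'http://elasticsearch02:9200/_snapshot/_status' | no_snapshot_running || echo "a snapshot is still running".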

Listing snapshots

List all snapshots in the repository:

curl -XGET 'http://elasticsearch02:9200/_snapshot/hdfsbackup/_all?pretty'

View a single snapshot:

[esuser@elasticsearch02 ~]$ curl -XGET 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02?pretty'
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_20200122_02",
      "uuid" : "KRXnzc6XSWagCQO92EQx6A",
      "version_id" : 6030199,
      "version" : "6.3.1",
      "indices" : [
        "music"
      ],
      "include_global_state" : false,
      "state" : "SUCCESS",
      "start_time" : "2020-01-22T07:11:06.594Z",
      "start_time_in_millis" : 1579677066594,
      "end_time" : "2020-01-22T07:11:07.313Z",
      "end_time_in_millis" : 1579677067313,
      "duration_in_millis" : 719,
      "failures" : [ ],
      "shards" : {
        "total" : 5,
        "failed" : 0,
        "successful" : 5
      }
    }
  ]
}

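Deleting a snapshot

To delete a snapshot, always use the delete command. Never go into the repository directory on the server and rm files yourself: snapshots are incremental and share files with one another, so removing files by hand is very likely to corrupt the backup data. Let the standard command do the work: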

[esuser@elasticsearch02 ~]$ curl -XDELETE 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122?pretty'
{
  "acknowledged" : true
}
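Since deletion should always go through the API, a retention job could pick old snapshots by name and delete them one at a time. The following sketch assumes snapshots follow the snapshot_YYYYMMDD naming used in this article (optionally with a _NN suffix); expired_snapshots is a made-up helper that only selects names and deletes nothing by itself.

```shell
#!/bin/sh
# expired_snapshots CUTOFF_YYYYMMDD
# Reads snapshot names on stdin, one per line, and prints those named
# snapshot_YYYYMMDD or snapshot_YYYYMMDD_NN whose date is before the cutoff.
# Names that do not follow this pattern are ignored.
expired_snapshots() {
  awk -v cutoff="$1" '
    $1 ~ /^snapshot_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9](_[0-9]+)?$/ {
      if (substr($1, 10, 8) + 0 < cutoff + 0) print $1
    }'
}

# Example wiring (not executed here):
#   curl -s 'http://elasticsearch02:9200/_cat/snapshots/hdfsbackup?h=id' \
#     | expired_snapshots 20200115 \
#     | while read -r name; do
#         curl -s -XDELETE "http://elasticsearch02:9200/_snapshot/hdfsbackup/$name"
#       done
```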

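Checking backup progress

How long a backup takes depends on the data volume. Setting wait_for_completion to true returns the result synchronously, but that is impractical when the backup runs for a long time. We would rather let the backup run in the background and check on it now and then. This still uses the snapshot GET command, with _status appended. While the backup is running, the response shows when it started, how many shards are being backed up, and so on: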

curl -XGET 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02/_status?pretty'
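Checking "now and then" can be automated with a polling loop. This sketch polls the plain GET snapshot endpoint shown earlier (not _status), because its single "state" field (IN_PROGRESS, SUCCESS, PARTIAL, FAILED) is easier to extract with text tools. The function names, the ES_URL variable, and the 30-second default interval are assumptions for illustration.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://elasticsearch02:9200}"   # assumed cluster address

# snapshot_state
# Reads a GET /_snapshot/<repo>/<snapshot> response on stdin and prints
# the value of its "state" field.
snapshot_state() {
  tr -d ' \n\r\t' | sed -n 's/.*"state":"\([A-Z_]*\)".*/\1/p'
}

# wait_for_snapshot REPO SNAPSHOT [INTERVAL_SECONDS]
# Polls until the snapshot leaves IN_PROGRESS.
# Returns 0 on SUCCESS and 1 on any other final state.
wait_for_snapshot() {
  _repo="$1"; _snap="$2"; _interval="${3:-30}"
  while :; do
    _state=$(curl -s "$ES_URL/_snapshot/$_repo/$_snap" | snapshot_state)
    echo "$(date '+%F %T') $_snap: ${_state:-unknown}"
    case "$_state" in
      SUCCESS) return 0 ;;
      IN_PROGRESS) sleep "$_interval" ;;
      *) return 1 ;;
    esac
  done
}
```

Possible usage: wait_for_snapshot hdfsbackup snapshot_20200122_02 60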

Cancelling a backup

A backup in progress can be cancelled with the delete command:

curl -XDELETE 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122?pretty'

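This command does two things:

If the backup is still in progress, it cancels the backup and deletes the partially written data.
If the backup has already finished, it simply deletes the snapshot.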

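Data restore

Production backups run on a schedule, with the frequency depending on the actual data volume: some run once a day, some every 4 hours. A simple approach is to wrap the backup command in a shell script and run it on a schedule with Linux crontab.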
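A scheduled backup along these lines might look like the sketch below. The script path, log path, and schedule in the crontab comment are hypothetical, and the date-based snapshot name simply follows the snapshot_YYYYMMDD convention used in this article.

```shell
#!/bin/sh
# Hypothetical /home/esuser/bin/es_backup.sh
ES_URL="${ES_URL:-http://elasticsearch02:9200}"   # assumed cluster address
REPO="${REPO:-hdfsbackup}"                        # repository created earlier

# snapshot_name [YYYYMMDD]
# Builds the snapshot name; defaults to today's date.
snapshot_name() {
  printf 'snapshot_%s\n' "${1:-$(date +%Y%m%d)}"
}

# run_backup
# Starts an asynchronous snapshot of all open indices.
run_backup() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$(snapshot_name)?pretty"
}

# Only start a backup when invoked as: es_backup.sh run
if [ "${1:-}" = "run" ]; then
  run_backup
fi

# Hypothetical crontab entry (added with `crontab -e`), daily at 01:30:
# 30 1 * * * /home/esuser/bin/es_backup.sh run >> /home/esuser/es_backup.log 2>&1
```

With more than one backup per day the name would need a time component as well, since snapshot names must be unique within a repository.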

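With backups in place, the restore operation comes into play whenever the data becomes corrupted or the backed-up data is otherwise needed.

Regular restore

Data is restored with the restore command, for example: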

[esuser@elasticsearch02 ~]$ curl -XPOST 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02/_restore?pretty'
{
  "accepted" : true
}

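Note that an existing index can only be restored over if it is closed and has the same number of shards as the index in the snapshot; otherwise the request fails. Indices that do not exist in the cluster are created by the restore. The command to close an index: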

[esuser@elasticsearch02 ~]$ curl -XPOST  'http://elasticsearch02:9200/music/_close?pretty'

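After the restore completes, the index is automatically reopened.

A few optional parameters are also available:

[esuser@elasticsearch02 ~]$ curl -XPOST 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02/_restore?pretty' -H 'Content-Type: application/json' -d '
{
  "indices": "music",
  "ignore_unavailable": true,
  "include_global_state": true
}'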

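By default every index in the snapshot is restored; the indices parameter selects which indices to restore. wait_for_completion can be used here too, and ignore_unavailable, partial, and include_global_state behave as they do for backups, so they are not repeated here.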
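The restore API also accepts rename_pattern and rename_replacement, which are not covered above. They restore an index under a different name, so the live index does not have to be closed and the restored copy can be inspected first. A minimal sketch, with restore_body as a made-up helper and restored_ as an arbitrary prefix:

```shell
#!/bin/sh
# restore_body INDICES PREFIX
# Prints a restore request body that restores each selected index under
# PREFIX + original name (for example music -> restored_music).
restore_body() {
  printf '{"indices":"%s","ignore_unavailable":true,"include_global_state":false,"rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' \
    "$1" "$2"
}
```

Possible usage: curl -XPOST 'http://elasticsearch02:9200/_snapshot/hdfsbackup/snapshot_20200122_02/_restore?pretty' -H 'Content-Type: application/json' -d "$(restore_body music restored_)"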

Monitoring restore progress

As with backups, progress is checked with a GET request, this time against the recovery API:

curl -XGET 'http://elasticsearch02:9200/music/_recovery?pretty'

Here music is the index name.

Cancelling a restore

As with backups, deleting the index that is being restored cancels the restore:

curl -XDELETE 'http://elasticsearch02:9200/music'

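Cluster upgrade

We are currently on version 6.3.1, while the latest version on the official site is already 7.5.2. Unless there is a major change you need or a serious bug report, an upgrade is generally unnecessary; upgrades carry risk, so release with care.

Here are the general steps in brief. Proceed carefully:

Read the official documentation for the latest version to see what changes between the current version and the target version, including new features and bug fixes.
Upgrade the development or test environment first, upgrading the plugins to matching versions as well. Only after it has run stably for several project release cycles should a production upgrade be considered.
Take a full backup of the data before upgrading, so there is a way back if the upgrade fails.
Request a maintenance window for the production upgrade, then upgrade and verify one node at a time.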
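For the node-by-node step, the usual 6.x rolling-upgrade routine is to disable shard allocation before stopping a node and re-enable it once the node has rejoined. The sketch below only builds the two settings bodies; allocation_body is a made-up helper, and the numbered outline in the comments is an assumption based on the 6.x upgrade guide, so check the guide for your exact versions before relying on it.

```shell
#!/bin/sh
# allocation_body disable|enable
# Prints the cluster-settings body that turns shard allocation off
# (before stopping a node) or resets it to the default (after it rejoins).
allocation_body() {
  case "$1" in
    disable) printf '%s\n' '{"persistent":{"cluster.routing.allocation.enable":"none"}}' ;;
    enable)  printf '%s\n' '{"persistent":{"cluster.routing.allocation.enable":null}}' ;;
    *) echo "usage: allocation_body disable|enable" >&2; return 1 ;;
  esac
}

# Outline for one node (not executed here), with ES=http://elasticsearch02:9200:
#   1. curl -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' -d "$(allocation_body disable)"
#   2. curl -XPOST "$ES/_flush/synced"        # synced flush, available in 6.x
#   3. stop the node, upgrade it and its plugins, start it again
#   4. curl -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' -d "$(allocation_body enable)"
#   5. curl "$ES/_cat/health?v"               # wait for green before the next node
```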

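Supplement: setting up a Hadoop cluster

A commonly recommended practice for Elasticsearch backups is to store them on Hadoop's HDFS. Here we set up a Hadoop cluster for demonstration purposes. Please read up on Hadoop basics on your own; readers who already know them can skip this part.

Version:
hadoop 2.8.1

Virtual machines

This demo uses a three-node Hadoop cluster built on elasticsearch02, elasticsearch03, and elasticsearch04.

Download and extract

Download hadoop-2.8.1.tar.gz from the official site and extract it into the /opt/hadoop directory.

Setting environment variables

The demo environment has root access, so we use the simplest method: edit /etc/profile, add the variables, and remember to source the file afterwards.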


[root@elasticsearch02 ~]# vi /etc/profile

# Append at the end of the file
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.1
export PATH=${HADOOP_HOME}/bin:$PATH

[root@elasticsearch02 ~]# source /etc/profile

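Create the Hadoop data directory. We start Hadoop as the esuser account, so create the directory under /home/esuser, for example /home/esuser/hadoopdata.
Edit the Hadoop configuration files in /opt/hadoop/hadoop-2.8.1/etc/hadoop. Mostly this means adding settings. The files involved are:

core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
slaves (note: we choose elasticsearch02 as the master and the other two as slaves)

Example changes (each property goes inside the file's <configuration> element):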

core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://elasticsearch02:9000</value>
</property>

hdfs-site.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/esuser/hadoopdata/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/esuser/hadoopdata/datanode</value>
</property>

yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>elasticsearch02</value>
</property>

mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

slaves

elasticsearch03
elasticsearch04

Copy the configured files to the other two machines:

scp -r /opt/hadoop/hadoop-2.8.1 esuser@elasticsearch03:/opt/hadoop/hadoop-2.8.1
scp -r /opt/hadoop/hadoop-2.8.1 esuser@elasticsearch04:/opt/hadoop/hadoop-2.8.1

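The files are fairly large, so the copy takes a while. When it finishes, set the HADOOP_HOME environment variable on elasticsearch03 and elasticsearch04 as well.

Starting the cluster

Format the namenode: on the Hadoop master node (elasticsearch02), run hdfs namenode -format from the HADOOP_HOME/sbin directory.

Run the start script from the same directory: ./start-dfs.sh
Startup opens ssh connections to elasticsearch03 and elasticsearch04; enter the esuser password when prompted, or set up passwordless ssh beforehand.

We only need the HDFS service, so the other components do not have to be started.

To verify the startup, run jps on each of the three machines and check the processes. If all went well, you should see:
elasticsearch02: NameNode, SecondaryNameNode
elasticsearch03: DataNode
elasticsearch04: DataNode

Also open the Hadoop master's console in a browser at http://192.168.17.137:50070/dfshealth.html#tab-overview to see the overview and datanodes pages.

If the datanodes page shows 2 nodes, the cluster started successfully. If it shows only one or none, check the logs under /opt/hadoop/hadoop-2.8.1/logs.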
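The same check can be done from the command line with hdfs dfsadmin -report. The sketch below assumes the report contains a line of the form "Live datanodes (N):", which is what Hadoop 2.x prints; live_datanodes is a made-up helper name.

```shell
#!/bin/sh
# live_datanodes
# Reads `hdfs dfsadmin -report` output on stdin and prints the number
# taken from the "Live datanodes (N):" line.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}
```

Possible usage on the master node: [ "$(hdfs dfsadmin -report | live_datanodes)" = "2" ] && echo "both datanodes are up"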

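Fixing "Error: JAVA_HOME is not set and could not be found"

JAVA_HOME was clearly set, and export showed it, yet startup kept failing. A likely cause is that start-dfs.sh launches the daemons over ssh, and those non-interactive shells do not load the variable from /etc/profile. Rather than fight it, set it directly in /opt/hadoop/hadoop-2.8.1/etc/hadoop/hadoop-env.sh: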

export JAVA_HOME="/opt/jdk1.8.0_211"

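Summary

This article used Hadoop's distributed file storage as the backdrop to explain how to back up and restore Elasticsearch data. Cluster version upgrades are more complicated in practice and depend heavily on the project itself, so only the points to watch are mentioned here, without a detailed worked example. If you do need to upgrade, proceed with caution, verify thoroughly, test fully in the test environment before going to production, and remember to back up your data.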


Source: this content originally appeared on cnblogs (博客園).
