ELK

Data Science

ELK Stack

ELASTICSEARCH

https://www.elastic.co/kr/products/elasticsearch

Lucene 기반 검색 엔진 HTTP 웹 인터페이스와 JSON 문서 객체를 이용해 분산된 multitenant 가능한 검색을 지원한다.

LOGSTASH

https://www.elastic.co/kr/products/logstash

server-side 처리 파이프라인, 다양한 소스에서 동시에 Data를 수집-변환 후 Stash로 전송

KIBANA

https://www.elastic.co/kr/products/kibana

ELASTICSEARCH data를 시각화하고 탐색을 지원

ELASTICSEARCH Install on ubuntu

Install JAVA 8

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java8-installer
java -version

Install ELASTICSEARCH

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.3.1.deb
dpkg -i elasticsearch-5.3.1.deb
sudo systemctl enable elasticsearch.service

Install path: /usr/share/elasticsearch
Config file: /etc/elasticsearch
Init script: /etc/init.d/elasticsearch

ELASTICSEARCH Start | Stop | Check

sudo service elasticsearch start
sudo service elasticsearch stop
curl -XGET 'localhost:9200' # check run

ELASTICSEARCH config (External network)

Allow All Host (AWS 같은 클라우드 서비스를 사용하는 경우 외부에서 접속하기 때문에 네트워크 설정 필요)

vi /etc/elasticsearch/elasticsearch.yml

network.host: 0.0.0.0

이런 경우 앞으로 사용될 localhost를 각자의 IP로 변경한다. locahlost:9200 -> 119.10.10.10:9200

ELASTICSEARCH Basic Concepts

ELASTICSEARCH vs RDB

ELASTICSEARCH	RDB
Index	Database
Type	Table
Document	Row
Field	Column
Mapping	Schema

http://d2.naver.com/helloworld/273788

RDB	ELASTICSEARCH
Database	Index
Table	Type
Row	Document
Column	Field
Schema	Mapping
Index	Everything is indexed
SQL	Query

ELASTICSEARCH CRUD

ELASTICSEARCH에 Document(Row)를 삽입, 삭제, 조회

ELASTICSEARCH	RDB	CRUD
GET	SELECT	READ
PUT	UPDATE	UPDATE
POST	INSERT	CREATE
DELETE	DELETE	DELETE

Verify Index

curl -XGET localhost:9200/classes
curl -XGET localhost:9200/classes?pretty

?pretty: JSON fotmatting

Create Index

curl -XPUT localhost:9200/classes

Success: {"acknowledged":true,"shards_acknowledged":true}

Delete Index

curl -XDELETE localhost:9200/classes

Success: {"acknowledged":true}

Create Document

curl -XPOST localhost:9200/classes/class/1/ -d '{"title":"A", "professor":"J"}'

Success: {"_index":"classes","_type":"class","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}

Create Document from File

curl -XPOST localhost:9200/classes/class/1/ -d @oneclass.json

Fail: {"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],"type":"null_pointer_exception","reason":null},"status":500}
Success: {"_index":"classes","_type":"class","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"created":false}

oneclass.json

{
  "title" : "Machine Learning",
  "Professor" : "Minsuk Heo",
  "major" : "Computer Science",
  "semester" : ["spring", "fall"],
  "student_count" : 100,
  "unit" : 3,
  "rating" : 5
}

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch01/oneclass.json

ELASTICSEARCH Update

ELASTICSEARCH에 Document(Row)를 수정

curl -XPOST localhost:9200/classes/class/1/ -d '{"title":"Algorithm", "professor":"John"}'

classes : Index ( RDB's Database )
class : Type ( RDB's Table )
1 : Document ( RDB's Row )

Add Field

RDB's Column

Way 1. Basic

curl -XPOST localhost:9200/classes/class/1/_update -d '{"doc":{"unit":1}}'
curl -XGET localhost:9200/classes/class/1?pretty

Before	After
{"title":"Algorithm", "professor":"John"}	{"title":"Algorithm", "professor":"John", "unit": 1}

Way 2. Script

curl -XPOST localhost:9200/classes/class/1/_update -d '{"script":"ctx._source.unit +=5"}'

ELASTICSEARCH Bulk

File로 된 여러 개의 Documents를 한 번에 저장

curl -XPOST localhost:9200/_bulk?pretty --data-binary @classes.json

Get classes.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch02/classes.json#

ELASTICSEARCH Mapping

RDB's Schema. 효율적인 검색을 위해서 데이터의 타입을 정의하는 것

Step 1. Create Index

curl -XPUT localhost:9200/classes

Error

{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "index [classes/5mBFGPEwRpOdJP08VNDJug] already exists",
        "index_uuid": "5mBFGPEwRpOdJP08VNDJug",
        "index": "classes"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "index [classes/5mBFGPEwRpOdJP08VNDJug] already exists",
    "index_uuid": "5mBFGPEwRpOdJP08VNDJug",
    "index": "classes"
  },
  "status": 400
}

curl -XDELETE localhost:9200/classes
curl -XPUT localhost:9200/classes

Success

{
  "classes" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1493388598544",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "N2GHTYgFQqOy_DoP9zuISA",
        "version" : {
          "created" : "5030199"
        },
        "provided_name" : "classes"
      }
    }
  }
}

Mapping 없이 Index만 생성했기 때문에, classes.mappings 의 값이 빈 객체이다.

Step 2. Create Mapping

2-1 Get classesRating_mapping.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch02/classesRating_mapping.json

2-2 Mapping

curl -XPUT localhost:9200/classes/class/_mapping -d @classesRating_mapping.json

2-3 Verify Mapping

curl -XGET localhost:9200/classes?pretty

Success

{
  "classes" : {
    "aliases" : { },
    "mappings" : {
      "class" : {
        "properties" : {
          "major" : {
            "type" : "text"
          },
          "professor" : {
            "type" : "text"
          },
          "rating" : {
            "type" : "integer"
          },
          "school_location" : {
            "type" : "geo_point"
          },
          "semester" : {
            "type" : "text"
          },
          "student_count" : {
            "type" : "integer"
          },
          "submit_date" : {
            "type" : "date",
            "format" : "yyyy-MM-dd"
          },
          "title" : {
            "type" : "text"
          },
          "unit" : {
            "type" : "integer"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1493388598544",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "N2GHTYgFQqOy_DoP9zuISA",
        "version" : {
          "created" : "5030199"
        },
        "provided_name" : "classes"
      }
    }
  }
}

2-4 Add Documents

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch02/classes.json

curl -XPOST localhost:9200/_bulk?pretty --data-binary @classes.json

Verify Documents

curl -XGET localhost:9200/classes/class/1?pretty

ELASTICSEARCH Search

Get simple_basketball.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/simple_basketball.json#

Add Documents

curl -XPOST localhost:9200/_bulk --data-binary @simple_basketball.json

Search

Search - Basic

curl -XGET localhost:9200/basketball/record/_search?pretty

Success

{
  "took" : 254,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "basketball",
        "_type" : "record",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "team" : "Chicago Bulls",
          "name" : "Michael Jordan",
          "points" : 20,
          "rebounds" : 5,
          "assists" : 8,
          "submit_date" : "1996-10-11"
        }
      },
      {
        "_index" : "basketball",
        "_type" : "record",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "team" : "Chicago Bulls",
          "name" : "Michael Jordan",
          "points" : 30,
          "rebounds" : 3,
          "assists" : 4,
          "submit_date" : "1996-10-11"
        }
      }
    ]
  }
}

Search - Uri

curl XGET localhost:9200/basketball/record/_search?q=points:30&pretty

Success

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "basketball",
        "_type": "record",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "team": "Chicago Bulls",
          "name": "Michael Jordan",
          "points": 30,
          "rebounds": 3,
          "assists": 4,
          "submit_date": "1996-10-11"
        }
      }
    ]
  }
}

Search - Request body

curl XGET localhost:9200/basketball/record/_search -d '{"query":{"term":{"points":30}}}'

curl XGET localhost:9200/basketball/record/_search -d '{
  "query": {
    "term": {
      "points": 30
    }
  }
}'

search-request-body

ELASTICSEARCH Aggregation(Metric)

평균, 합, 최소, 최대 등 산술 분석을 제공

Add Documents

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/simple_basketball.json

curl -XPOST localhost:9200/_bulk --data-binary @simple_basketball.json

Average

curl -XGET localhost:9200/_search?pretty --data-binary @avg_points_aggs.json

Get avg_points_aggs.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/avg_points_aggs.json

{
  "size": 0,
  "aggs": {
    "avg_score": {
      "avg": {
        "field": "points"
      }
    }
  }
}

Max

curl -XGET localhost:9200/_search?pretty --data-binary @max_points_aggs.json

Get max_points_aggs.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/max_points_aggs.json

{
  "size": 0,
  "aggs": {
    "max_score": {
      "max": {
        "field": "points"
      }
    }
  }
}

Min

curl -XGET localhost:9200/_search?pretty --data-binary @min_points_aggs.json

Get min_points_aggs.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/min_points_aggs.json

{
  "size": 0,
  "aggs": {
    "min_score": {
      "min": {
        "field": "points"
      }
    }
  }
}

Sum

curl -XGET localhost:9200/_search?pretty --data-binary @sum_points_aggs.json

Get sum_points_aggs.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/sum_points_aggs.json

{
  "size": 0,
  "aggs": {
    "sum_score": {
      "sum": {
        "field": "points"
      }
    }
  }
}

Stats

평균, 합, 최소, 최대 한 번에 도출

curl -XGET localhost:9200/_search?pretty --data-binary @stats_points_aggs.json

Get stats_points_aggs.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch03/stats_points_aggs.json

{
  "size": 0,
  "aggs": {
    "stats_score": {
      "stats": {
        "field": "points"
      }
    }
  }
}

ELASTICSEARCH Aggregation(Bucket)

RDB's Group by. 데이터를 일정 기준으로 묶어서 결과를 도출.

Create Basketball Index

curl -XPUT localhost:9200/basketball

Mapping Basketball

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch04/basketball_mapping.json

curl -XPUT localhost:9200/basketball/record/_mapping -d @basketball_mapping.json

Add Basketball Documents

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch04/twoteam_basketball.json

curl -XPOST localhost:9200/_bulk --data-binary @twoteam_basketball.json

Term Aggs - Group by(team)

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch04/terms_aggs.json

curl -XGET localhost:9200/_search?pretty --data-binary @terms_aggs.json

Stats Aggs - Group by(team)

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch04/stats_by_team.json

curl -XGET localhost:9200/_search?pretty --data-binary @stats_by_team.json

KIBANA Install on ubuntu

Install KIBANA

wget https://artifacts.elastic.co/downloads/kibana/kibana-5.3.1-amd64.deb

dpkg -i kibana-5.3.1.deb

Config

vi /etc/kibana/kibana.yml

#server.host: "localhost"

#elasticsearch.url: "http://localhost:9200"

Network

AWS 같은 클라우드 서비스를 사용하는 경우 localhost가 아닌 내부 IP 설정

ifconfig | grep inet

It will returns

inet addr:192.169.212.10 Bcast:192.169.212.255 Mask:255.255.255.0 inet6 addr: fe80::d00d:d9ff:fecc:9dfd/64 Scope:Link inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host

server.host: 192.169.212.10

Start KIBANA

sudo /usr/share/kibana/bin/kibana

KIBANA Management

ELASTICSEARCH 저장된 데이터 중 어떤 Index( RDB's Database )를 시각화할지 정한다.

Step 1. Set Basketball Data

curl -XDELETE localhost:9200/basketball

curl -XPUT localhost:9200/basketball

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch04/basketball_mapping.json

curl -XPUT localhost:9200/basketball/record/_mappin -d @basketball_mapping.json

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch05/bulk_basketball.json

curl -XPOST localhost:9200/_bulk --data-binary @bulk_basketball.json

Step 2. Access KIBANA

AWS 같은 클라우드를 사용하면 localhost 를 각자의 IP 주소로 바꾼다.

http://localhost:5601

Go To Management Tab (http://localhost:5601/app/kibana#/management/kibana/index?_g=())

Create Index Patterns

Index name or pattern
- basketball*
Time-field name
- submit_date

Created

KIBANA Discover

ELASTICSEARCH 의 저장된 데이터를 JSON, Table 형식으로 보여준다. Filter를 이용해서 원하는 정보만 볼 수 있다.

Go To Discover Tab (http://localhost:5601/app/kibana#/discover)

Step 1. Time Range

Step 2. Select Item

Step 3. Filtering

Step 4. Toggle

KIBANA Visualize 1

막대 차트와 파이 차트로 데이터 시각화를 실습한다.

Go To Visualize Tab (http://localhost:5601/app/kibana#/visualize)

Vertical bar chart

metrics

Y-Axis
- Aggregation
  - Average
- Field
  - points
- Custom Label
  - avg

buckets

X-Axis
- Aggregation
  - Terms
- Field
  - name.keyword
- Order by
  - metric: avg

Result

Pie bar chart

metrics

Slice Size
- Aggregation
  - Sum
- Field
  - points

buckets

Split Slices
- Aggregation
  - Terms
- Field
  - team.keyword
- Order by
  - metric: Sum of points

Result

KIBANA Visualize 2

Map 차트를 이용한 데이터 시각화를 실습한다.

Create Mapping

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch02/classesRating_mapping.json

curl -XPUT localhost:9200/classes/class/_mapping -d @classesRating_mapping.json

Add Documents

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch02/classes.json#

curl -XPUT localhost:9200/_bulk?pretty --data-binary @classes.json

Verify Documents

curl -XGET localhost:9200/classes/class/1?pretty

Kibana Management

Index name or pattern
- classes*
Time-field name
- submit_date

Kibana Visualize - Tile map

buckets

Geo Coordinates
- Aggregation
  - Geohash
- Field
  - school_location

KIBANA Dashboard

Kibana Visualize

Vertical bar chart -> classes

metrics

Y-Axis
- Aggregation
  - Average
- Field
  - points

buckets

X-Axis
- Aggregation
  - Terms
- Field
  - Professor.keyword
- Order
  - Descending
- Size
  - 16

Result

Save

Create Dashboard

Go To Dashboard Tab (http://localhost:5601/app/kibana#/dashboards)

Create a dashboard -> Add -> Select Dashboard -> Save

LOGSTASH Install on ubuntu

LOGSTASH는 ELK 스택에서 Input에 해당한다. 다양한 형태의 데이터를 받아들여서 사용자가 지정한 형식에 맞게 필터링한 후 ELASTICSEARCH로 보낸다.

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.”

Install LOGSTASH

Java must required first!!

wget https://artifacts.elastic.co/downloads/logstash/logstash-5.3.1.deb
dpkg -i logstash-5.3.1.deb

Install path: /usr/share/logstash

Config LOGSTASH

vi logstash-simple.conf

input {
        stdin { }
}
output {
        stdout { }
}

Run LOGSTASH

sudo /usr/share/logstash/bin/logstash -f ./logstash-simple.conf

Practical data analysis using ELK 1 - Population

현재까지 구성된 ELK 스택을 이용해서 세계 인구 분석을 실습한다.

Collect Datas

Datas site

https://catalog.data.gov/dataset

Population analysis Datas

https://catalog.data.gov/dataset/population-by-country-1980-2010-d0250

Get Ready-to-use Datas

Site에서 받은 데이터는 약간의 수정이 필요하다. 아래의 데이터는 바로 사용할 수 있는 데이터.

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch06/populationbycountry19802010millions.csv

Check ELASTICSEARCH & KIBANA are running

Check KIBANA

ps -ef | grep kibana

Running

root     29968 29933  9 16:58 pts/0    00:00:06 /usr/share/kibana/bin/../node/bin/node --no-warnings /usr/share/kibana/bin/../src/cli
root     30036 30018  0 16:59 pts/1    00:00:00 grep --color=auto kibana

Stopped

root     29957 29933  0 16:57 pts/0    00:00:00 grep --color=auto kibana

Restart

sudo /usr/share/kibana/bin/kibana

Check ELASTICSEARCH

service elasticsearch status

## OR

curl -XGET 'localhost:9200'

Running

● elasticsearch.service - Elasticsearch ....

## OR

{
  "name" : "lPQjk0j",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "MORDslcmSKywLz4ReZtsIA",
  "version" : {
    "number" : "5.3.1",
    "build_hash" : "5f9cf58",
    "build_date" : "2017-04-17T15:52:53.846Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.2"
  },
  "tagline" : "You Know, for Search"
}

Stopped

curl: (7) Failed to connect to localhost port 9200: Connection refused

Restart

sudo service elasticsearch start

Config LOGSTASH

받은 파일을 LOGSTASH를 이용해서 필터링한 후 ELASTICSEARCH에 넣어준다.

vi logstash.conf

input {
  file {
    path => "/home/minsuk/Documents/git-repo/BigData/ch06/populationbycountry19802010millions.csv"
    start_position => "beginning"  
    sincedb_path => "/dev/null"  
  }
}
filter {
  csv {
      separator => ","
      columns => ["Country","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010"]
  }
  mutate {convert => ["1980", "float"]}
  mutate {convert => ["1981", "float"]}
  mutate {convert => ["1982", "float"]}
  mutate {convert => ["1983", "float"]}
  mutate {convert => ["1984", "float"]}
  mutate {convert => ["1985", "float"]}
  mutate {convert => ["1986", "float"]}
  mutate {convert => ["1987", "float"]}
  mutate {convert => ["1988", "float"]}
  mutate {convert => ["1989", "float"]}
  mutate {convert => ["1990", "float"]}
  mutate {convert => ["1991", "float"]}
  mutate {convert => ["1992", "float"]}
  mutate {convert => ["1993", "float"]}
  mutate {convert => ["1994", "float"]}
  mutate {convert => ["1995", "float"]}
  mutate {convert => ["1996", "float"]}
  mutate {convert => ["1997", "float"]}
  mutate {convert => ["1998", "float"]}
  mutate {convert => ["1999", "float"]}
  mutate {convert => ["2000", "float"]}
  mutate {convert => ["2001", "float"]}
  mutate {convert => ["2002", "float"]}
  mutate {convert => ["2003", "float"]}
  mutate {convert => ["2004", "float"]}
  mutate {convert => ["2005", "float"]}
  mutate {convert => ["2006", "float"]}
  mutate {convert => ["2007", "float"]}
  mutate {convert => ["2008", "float"]}
  mutate {convert => ["2009", "float"]}
  mutate {convert => ["2010", "float"]}
}
output {  
    elasticsearch {
        hosts => "localhost"
        index => "population"
    }
    stdout {}
}

input -> file -> path
- Edit Your own file path
- e.g.) "/root/populationbycountry19802010millions.csv"

OR Download logstash.conf file

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch06/logstash.conf

Run LOGSTASH output to ELASTICSEARCH

/usr/share/logstash/bin/logstash -f ./logstash.conf

Go KIBANA

http://localhost:5601/app/kibana#/management?_g=()

Add pattern

Discover Tab

http://localhost:5601/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-1y,mode:quick,to:now))&_a=(columns:!(_source),index:population,interval:auto,query:'',sort:!('@timestamp',desc))))

Visualize Tab

Practical data analysis using ELK 2 - Stock

현재까지 구성된 ELK 스택을 이용해서 주식 분석을 실습한다.

http://blog.webkid.io/visualize-datasets-with-elk/

Collcet Datas

Datas site

https://finance.yahoo.com

Stock analysis Datas - Facebook

https://finance.yahoo.com/quote/FB/history?period1=1336316400&period2=1494082800&interval=1d&filter=history&frequency=1d

Get Ready-to-use Datas

Site에서 받은 데이터는 약간의 수정이 필요하다. 아래의 데이터는 바로 사용할 수 있는 데이터.

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch06/table.csv

Check ELASTICSEARCH & KIBANA are running

Check ELASTICSEARCH

service elasticsearch status

Check KIBANA

ps -ef | grep kibana

Config LOGSTASH

vi logstash_stock.conf

input {
  file {
    path => "/home/minsuk/Documents/git-repo/BigData/ch06/table.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"    
  }
}
filter {
  csv {
      separator => ","
      columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
  }
  mutate {convert => ["Open", "float"]}
  mutate {convert => ["High", "float"]}
  mutate {convert => ["Low", "float"]}
  mutate {convert => ["Close", "float"]}
}
output {  
    elasticsearch {
        hosts => "localhost"
        index => "stock"
    }
    stdout {}
}

input -> file -> path
- Edit Your own file path
- e.g.) "/root/table.csv"

OR Download logstash_stock.conf file

wget https://raw.githubusercontent.com/minsuk-heo/BigData/master/ch06/logstash_stock.conf