Morphological analysis and POS tagging

Morphological analysis is the identification of the structure of morphemes and other linguistic units, such as root words, affixes, or parts of speech.

POS (part-of-speech) tagging is the process of marking up morphemes in a phrase, based on their definitions and contexts. For example:

가방에 들어가신다 -> 가방/NNG + 에/JKM + 들어가/VV + 시/EPH + ㄴ다/EFN

POS tagging with KoNLPy

KoNLPy offers several classes for POS tagging. All share the same input-output structure: the input is a phrase, and the output is a list of tagged morphemes.

For detailed usage instructions see the tag Package.
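As a rough illustration of that shared interface, the sketch below calls `pos` on any tagger object and renders the result in the `morpheme/TAG + morpheme/TAG` notation. `StubTagger`, `render`, and `tag_and_render` are hypothetical names used only for this example. With KoNLPy installed, an instance such as `konlpy.tag.Kkma()` could be passed in place of the stub, since each class exposes `pos(phrase)` and returns a list of `(morpheme, tag)` tuples.

```python
from typing import List, Protocol, Tuple

Tagged = List[Tuple[str, str]]


class PosTagger(Protocol):
    """Anything with a KoNLPy-style pos method."""

    def pos(self, phrase: str) -> Tagged: ...


class StubTagger:
    """Hypothetical stand-in so the sketch runs without KoNLPy or a JVM."""

    def pos(self, phrase: str) -> Tagged:
        # Hard-coded, Kkma-style analysis of "가방에 들어가신다".
        return [("가방", "NNG"), ("에", "JKM"), ("들어가", "VV"),
                ("시", "EPH"), ("ㄴ다", "EFN")]


def render(tagged: Tagged) -> str:
    """Join (morpheme, tag) pairs as 'morpheme/TAG + morpheme/TAG'."""
    return " + ".join(f"{morph}/{tag}" for morph, tag in tagged)


def tag_and_render(tagger: PosTagger, phrase: str) -> str:
    """Tag a phrase with any pos-capable object and format the result."""
    return render(tagger.pos(phrase))


print(tag_and_render(StubTagger(), "가방에 들어가신다"))
```

Because the helper depends only on the `pos` method, swapping one tagging class for another should not require changes elsewhere in the calling code.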

See also

Korean POS tags comparison chart

Compare POS tags between several Korean analytic projects. (In Korean)

Comparison between POS tagging classes

We now carry out time and performance analyses of the pos method for each of the classes in the tag Package. The experiments were run on an Intel i7 CPU with 4 cores, Python 2.7, and KoNLPy 0.4.1.

Time analysis [1]

  1. Loading time: Class loading time, including dictionary loads.

  2. Execution time: Time for executing the pos method for each class, with 100K characters.

    When tested on inputs with varying numbers of characters, the execution times of all classes increase in an exponential manner.

    [Figure: _images/time.png]
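The two measurements described above could be taken along the following lines. This is a minimal sketch, not the script used for the reported experiments: `time_loading`, `time_pos`, and `WhitespaceTagger` are hypothetical names, and the stand-in tagger exists only so the example runs without KoNLPy. With KoNLPy installed, a class such as `konlpy.tag.Mecab` could be passed as the factory instead.

```python
import time
from typing import Callable, List, Tuple


def time_loading(factory: Callable[[], object]) -> Tuple[object, float]:
    """Return (instance, seconds spent constructing it)."""
    start = time.perf_counter()
    tagger = factory()
    return tagger, time.perf_counter() - start


def time_pos(tagger, text: str, sizes: List[int]) -> List[Tuple[int, float]]:
    """Time tagger.pos on prefixes of text with the given character counts."""
    results = []
    for n in sizes:
        start = time.perf_counter()
        tagger.pos(text[:n])
        results.append((n, time.perf_counter() - start))
    return results


class WhitespaceTagger:
    """Hypothetical stand-in: splits on whitespace and tags everything 'X'."""

    def pos(self, phrase: str):
        return [(token, "X") for token in phrase.split()]


tagger, load_seconds = time_loading(WhitespaceTagger)
timings = time_pos(tagger, "나는 밥을 먹는다 " * 1000, [10, 100, 1000])

print(f"loading: {load_seconds:.6f}s")
for n, seconds in timings:
    print(f"{n:>6} chars: {seconds:.6f}s")
```

Timing several input sizes rather than a single one is what makes the growth trend visible. A single wall-clock reading per size is noisy, so repeating each measurement and averaging would be a reasonable refinement.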

Performance analysis

In place of a formal performance evaluation, we compare the results of each class on several sample sentences.

  1. “아버지가방에들어가신다”

    We can check the spacing algorithm through this example. Ideally, an analyzer would parse this sentence as 아버지가 + 방에 + 들어가신다 (My father enters the room), rather than 아버지 + 가방에 + 들어가신다 (My father goes in the bag). Hannanum and Komoran are cautious about spacing uncertain terms, and default the whole phrase to nouns. Kkma is more confident, but produces the undesirable parse. For this example, Mecab shows the best result.

| Hannanum | Kkma | Komoran | Mecab | Twitter |
|---|---|---|---|---|
| 아버지가방에들어가 / N | 아버지 / NNG | 아버지가방에들어가신다 / NNP | 아버지 / NNG | 아버지 / Noun |
| 이 / J | 가방 / NNG | | 가 / JKS | 가방 / Noun |
| 시ㄴ다 / E | 에 / JKM | | 방 / NNG | 에 / Josa |
| | 들어가 / VV | | 에 / JKB | 들어가신 / Verb |
| | 시 / EPH | | 들어가 / VV | 다 / Eomi |
| | ㄴ다 / EFN | | 신다 / EP+EC | |
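A side-by-side comparison like the one above can be assembled from the `pos` outputs of several classes. The sketch below is one possible way to do it, using `itertools.zip_longest` to pad the shorter analyses with empty cells. The `results` dictionary hard-codes three of the analyses shown above so the example runs without KoNLPy, and `side_by_side` is a hypothetical helper name.

```python
from itertools import zip_longest

# Analyses of "아버지가방에들어가신다", copied from the comparison above.
results = {
    "Kkma": [("아버지", "NNG"), ("가방", "NNG"), ("에", "JKM"),
             ("들어가", "VV"), ("시", "EPH"), ("ㄴ다", "EFN")],
    "Mecab": [("아버지", "NNG"), ("가", "JKS"), ("방", "NNG"),
              ("에", "JKB"), ("들어가", "VV"), ("신다", "EP+EC")],
    "Komoran": [("아버지가방에들어가신다", "NNP")],
}


def side_by_side(results):
    """Return a header row followed by one row per morpheme position."""
    names = list(results)
    columns = [[f"{morph} / {tag}" for morph, tag in results[name]]
               for name in names]
    rows = [names]
    # zip_longest pads analyzers that produced fewer morphemes with "".
    rows.extend(list(row) for row in zip_longest(*columns, fillvalue=""))
    return rows


for row in side_by_side(results):
    print(" | ".join(row))
```

Rows are aligned by position only, so cells in the same row do not necessarily cover the same part of the sentence.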

  2. “나는 밥을 먹는다” vs “하늘을 나는 자동차”

    If we focus on “나는” in both sentences, we can see whether an analyzer considers the context of words. “나는” in the first sentence should be 나/N + 는/J, and in the second sentence 나(-ㄹ다)/V + 는/E. Kkma properly analyzes the latter “나는” as a verb, whereas the rest treat it as a noun.

“나는 밥을 먹는다”

| Hannanum | Kkma | Komoran | Mecab | Twitter |
|---|---|---|---|---|
| 나 / N | 나 / NP | 나 / NP | 나 / NP | 나 / Noun |
| 는 / J | 는 / JX | 는 / JX | 는 / JX | 는 / Josa |
| 밥 / N | 밥 / NNG | 밥 / NNG | 밥 / NNG | 밥 / Noun |
| 을 / J | 을 / JKO | 을 / JKO | 을 / JKO | 을 / Josa |
| 먹 / P | 먹 / VV | 먹 / VV | 먹 / VV | 먹는 / Verb |
| 는다 / E | 는 / EPT | 는다 / EC | 는다 / EC | 다 / Eomi |
| | 다 / EFN | | | |

“하늘을 나는 자동차”

| Hannanum | Kkma | Komoran | Mecab | Twitter |
|---|---|---|---|---|
| 하늘 / N | 하늘 / NNG | 하늘 / NNG | 하늘 / NNG | 하늘 / Noun |
| 을 / J | 을 / JKO | 을 / JKO | 을 / JKO | 을 / Josa |
| 나 / N | 날 / VV | 나 / NP | 나 / NP | 나 / Noun |
| 는 / J | 는 / ETD | 는 / JX | 는 / JX | 는 / Josa |
| 자동차 / N | 자동차 / NNG | 자동차 / NNG | 자동차 / NNG | 자동차 / Noun |
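The context check described above can be expressed as a small comparison: if an analyzer assigns the same tags to “나는” in both sentences, it has not used the surrounding words. The sketch below hard-codes the Kkma and Mecab outputs from the comparison so that it runs without KoNLPy. The helper names and the fixed morpheme positions (0 to 1 in the first sentence, 2 to 3 in the second) are assumptions that hold only for these particular outputs.

```python
# Analyses of "나는 밥을 먹는다", copied from the comparison above.
sentence1 = {
    "Kkma": [("나", "NP"), ("는", "JX"), ("밥", "NNG"), ("을", "JKO"),
             ("먹", "VV"), ("는", "EPT"), ("다", "EFN")],
    "Mecab": [("나", "NP"), ("는", "JX"), ("밥", "NNG"), ("을", "JKO"),
              ("먹", "VV"), ("는다", "EC")],
}

# Analyses of "하늘을 나는 자동차", copied from the comparison above.
sentence2 = {
    "Kkma": [("하늘", "NNG"), ("을", "JKO"), ("날", "VV"), ("는", "ETD"),
             ("자동차", "NNG")],
    "Mecab": [("하늘", "NNG"), ("을", "JKO"), ("나", "NP"), ("는", "JX"),
              ("자동차", "NNG")],
}


def tags_at(tagged, start, count=2):
    """Return the tags of `count` consecutive morphemes from `start`."""
    return [tag for _, tag in tagged[start:start + count]]


def distinguishes_naneun(name):
    """True if the analyzer tags "나는" differently in the two sentences."""
    # Assumed positions: morphemes 0-1 in sentence 1, 2-3 in sentence 2.
    return tags_at(sentence1[name], 0) != tags_at(sentence2[name], 2)


for name in sentence1:
    print(name, distinguishes_naneun(name))
```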

  3. “아이폰 기다리다 지쳐 애플공홈에서 언락폰질러버렸다 6+ 128기가실버ㅋ”

    How does each analyzer deal with slang, or with terms that are not included in its dictionary?

| Hannanum | Kkma | Komoran | Mecab | Twitter |
|---|---|---|---|---|
| 아이폰 / N | 아이 / NNG | 아이폰 / NNP | 아이폰 / NNP | 아이폰 / Noun |
| 기다리 / P | 폰 / NNG | 기다리 / VV | 기다리 / VV | 기다리 / Verb |
| 다 / E | 기다리 / VV | 다 / EC | 다 / EC | 다 / Eomi |
| 지치 / P | 다 / ECS | 지치 / VV | 지쳐 / VV+EC | 지쳐 / Verb |
| 어 / E | 지치 / VV | 어 / EC | 애플 / NNP | 애플 / Noun |
| 애플공홈 / N | 어 / ECS | 애플 / NNP | 공 / NNG | 공홈 / Noun |
| 에서 / J | 애플 / NNP | 공 / NNG | 홈 / NNG | 에서 / Josa |
| 언락폰질러버렸다 / N | 공 / NNG | 홈 / NNG | 에서 / JKB | 언락폰 / Noun |
| 6+ / N | 홈 / NNG | 에서 / JKB | 언락 / NNG | 질 / Verb |
| 128기가실벜 / N | 에서 / JKM | 언 / NNG | 폰 / NNG | 러 / Eomi |
| | 언락 / NNG | 락 / NNG | 질러버렸 / VV+EC+VX+EP | 버렸 / Verb |
| | 폰 / NNG | 폰 / NNG | 다 / EC | 다 / Eomi |
| | 질르 / VV | 지르 / VV | 6 / SN | 6 / Number |
| | 어 / ECS | 어 / EC | + / SY | + / Punctuation |
| | 버리 / VXV | 버리 / VX | 128 / SN | 128 / Number |
| | 었 / EPT | 었 / EP | 기 / NNG | 기 / Noun |
| | 다 / ECS | 다 / EC | 가 / JKS | 가 / Josa |
| | 6 / NR | 6 / SN | 실버 / NNP | 실버 / Noun |
| | + / SW | + / SW | ㅋ / UNKNOWN | ㅋ / KoreanParticle |
| | 128 / NR | 128기가실벜 / NA | | |
| | 기가 / NNG | | | |
| | 실버 / NNG | | | |
| | ㅋ / UN | | | |
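One way to see how each analyzer handled out-of-dictionary terms is to list the morphemes it could not classify. The sketch below does this for the final few morphemes of each analysis shown above. It assumes that `UN` (Kkma), `NA` (Komoran), and `UNKNOWN` (Mecab) mark unanalyzable or unknown tokens, which should be checked against each tag set before relying on it. `UNKNOWN_TAGS` and `unknown_tokens` are hypothetical names.

```python
# Assumption: these tags mark unknown or unanalyzable tokens in the
# Kkma, Komoran, and Mecab tag sets respectively.
UNKNOWN_TAGS = {"UN", "NA", "UNKNOWN"}

# Final morphemes of each analysis, copied from the comparison above.
tails = {
    "Hannanum": [("6+", "N"), ("128기가실벜", "N")],
    "Kkma": [("128", "NR"), ("기가", "NNG"), ("실버", "NNG"), ("ㅋ", "UN")],
    "Komoran": [("6", "SN"), ("+", "SW"), ("128기가실벜", "NA")],
    "Mecab": [("실버", "NNP"), ("ㅋ", "UNKNOWN")],
    "Twitter": [("실버", "Noun"), ("ㅋ", "KoreanParticle")],
}


def unknown_tokens(tagged):
    """Return the morphemes whose tag is in UNKNOWN_TAGS."""
    return [morph for morph, tag in tagged if tag in UNKNOWN_TAGS]


for name, tagged in tails.items():
    print(name, unknown_tokens(tagged))
```

An empty list does not mean the analysis is correct: Hannanum, for example, reports no unknown tokens here because it labels the unsegmented strings as nouns.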

Note

If you would like to run the experiments yourself, run this code from your local machine.

[1] Please note that these are comparisons among KoNLPy classes, and not the original distributions.