WEKO3

Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer

https://doi.org/10.19000/0002000332
File PhD_Thesis_Eronen.pdf (1.8 MB)
Item type Thesis or Dissertation(1)
Publication date 2022-09-06
Title
Title 素性密度及びクロスリンガルゼロショット転移学習による多言語のネットいじめ自動検出の改良に関する研究
Language ja
Title
Title Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer
Language en
Language
Language eng
Resource type
Resource http://purl.org/coar/resource_type/c_db06
Type doctoral thesis
ID registration
ID registration 10.19000/0002000332
ID registration type JaLC
Access rights
Access rights open access
Access rights URI http://purl.org/coar/access_right/c_abf2
Author
en Eronen Juuso Kalevi Kristian
ja エロネン ユーソ カレビ クリスティアン
Abstract
Description type Abstract
Description In this thesis, I study two methods for improving multilingual automatic cyberbullying
detection. First, I study the effectiveness of Feature Density (FD) using different linguistically-backed
feature preprocessing methods in order to estimate dataset complexity, which in turn is
used to comparatively estimate the potential performance of machine learning (ML) classifiers
prior to any training. I hypothesize that estimating dataset complexity allows for the reduction
of the number of required experiment iterations, making it possible to optimize the resource-intensive
training of ML models, which is becoming a serious issue due to the increases in available
dataset sizes and the ever-rising popularity of models based on Deep Neural Networks (DNN).
The constantly increasing need for more powerful computational resources is also
affecting the environment due to the alarmingly growing amount of CO2 emissions caused by training
large-scale ML models. I use cyberbullying datasets collected for multiple languages, namely
English, Japanese and Polish. The difference in linguistic complexity of the datasets allows me to
additionally discuss the efficacy of linguistically-backed word preprocessing.
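The abstract does not define Feature Density. As a minimal sketch, assume it is the ratio of unique features to all feature occurrences in a corpus, which is one common formulation and may differ from the definition used in the thesis. The names `feature_density` and `lowercase_tokens` and the toy corpus are hypothetical, and whitespace tokenization only stands in for the linguistically-backed preprocessing mentioned above.

```python
from collections import Counter
from typing import Callable, Iterable, List


def feature_density(
    documents: Iterable[str],
    extract_features: Callable[[str], List[str]] = str.split,
) -> float:
    """Assumed definition: unique features / total feature occurrences."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(extract_features(doc))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus contains no features")
    return len(counts) / total


def lowercase_tokens(text: str) -> List[str]:
    """A stand-in preprocessing step: lowercase, then split on whitespace."""
    return text.lower().split()


# Hypothetical toy corpus, not taken from the thesis datasets.
toy_corpus = ["You are so dumb", "you are great", "Great game today"]

fd_raw = feature_density(toy_corpus)                      # raw tokens
fd_lower = feature_density(toy_corpus, lowercase_tokens)  # lowercased tokens
print(f"raw: {fd_raw:.2f}, lowercased: {fd_lower:.2f}")
```

Swapping the `extract_features` callable is how different preprocessing variants could be compared on the same corpus before any classifier is trained.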
Second, I study the selection of transfer languages for automatic abusive language detection.
I demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language
detection. This makes it possible to use existing data from higher-resource languages to build
better detection systems for languages lacking data. The datasets are from eight different languages
from three language families. I measure the distance between the languages using several language
similarity measures, especially by quantifying the World Atlas of Language Structures. I show
that there is a correlation between linguistic similarity and classifier performance, making it
possible to choose an optimal transfer language for zero-shot abusive language detection.
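The abstract does not say how the World Atlas of Language Structures is quantified. One possible approach, sketched here purely as an assumption, is to treat each language as a set of categorical typological features, take the share of differing values over the features defined for both languages as a distance, and correlate that distance with transfer scores. The feature names, the languages `lang_a` to `lang_c`, and the scores are hypothetical placeholders, not WALS entries or results from the thesis.

```python
import math
from typing import Dict, Optional, Sequence

FeatureVector = Dict[str, Optional[str]]


def typological_distance(a: FeatureVector, b: FeatureVector) -> float:
    """Share of differing values over features defined for both languages."""
    shared = [k for k in a if a[k] is not None and b.get(k) is not None]
    if not shared:
        raise ValueError("no features are defined for both languages")
    differing = sum(1 for k in shared if a[k] != b[k])
    return differing / len(shared)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical feature values; None marks a feature missing for a language.
target: FeatureVector = {"word_order": "SVO", "adposition": "prepositions",
                         "case_marking": "none", "gender": "yes"}
candidates: Dict[str, FeatureVector] = {
    "lang_a": {"word_order": "SVO", "adposition": "prepositions",
               "case_marking": "none", "gender": None},
    "lang_b": {"word_order": "SVO", "adposition": "prepositions",
               "case_marking": "suffixes", "gender": "no"},
    "lang_c": {"word_order": "SOV", "adposition": "postpositions",
               "case_marking": "suffixes", "gender": "no"},
}
# Placeholder zero-shot scores for illustration only.
placeholder_scores = {"lang_a": 0.8, "lang_b": 0.7, "lang_c": 0.6}

distances = {name: typological_distance(target, vec)
             for name, vec in candidates.items()}
names = sorted(candidates)
r = pearson([distances[n] for n in names],
            [placeholder_scores[n] for n in names])
closest = min(distances, key=distances.get)
print(distances, f"r = {r:.2f}", f"closest = {closest}")
```

Under this sketch, a strongly negative `r` would mean that transfer scores drop as typological distance grows, so the closest candidate would be the first transfer language to try.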
Next, I demonstrate that this method is also generally applicable to multiple Natural Language
Processing tasks, specifically sentiment analysis, named entity recognition and dependency parsing.
I show that there is also a correlation between linguistic similarity and zero-shot cross-lingual
transfer performance for these tasks, allowing me to select an ideal transfer language in order to
aid with the problem of dealing with languages that do not currently have a sufficient amount
of data. Lastly, I show that the World Atlas of Language Structures can be quantified into an
effective linguistic similarity method.
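Language en
Bibliographic information
p. 1, Issue date 2022-09
Author version flag
Language en
Value ETD
Degree name
Language ja
Degree name Doctor of Engineering (博士(工学))
Degree-granting institution
Degree-granting institution identifier scheme kakenhi
Degree-granting institution identifier 10106
Language ja
Degree-granting institution name Kitami Institute of Technology (北見工業大学)
Degree number
Degree number Kō No. 203 (甲第203号)
Graduate school / major
Language ja
Graduate school / major Manufacturing Engineering (生産基盤工学専攻)
Date of degree conferral
Date of degree conferral 2022-09-06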
Versions

Ver.1 2022-10-05 06:40:25.048226