Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer
https://doi.org/10.19000/0002000332
Item type | Thesis or Dissertation (1)
---|---
Publication date | 2022-09-06
Title (ja) | 素性密度及びクロスリンガルゼロショット転移学習による多言語のネットいじめ自動検出の改良に関する研究
Title (en) | Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer
Language | eng
Resource type | doctoral thesis (http://purl.org/coar/resource_type/c_db06)
Identifier registration | 10.19000/0002000332 (JaLC)
Access rights | open access (http://purl.org/coar/access_right/c_abf2)
Author | エロネン ユーソ カレビ クリスティアン
Abstract
In this thesis, I study two different methods for improving multilingual automatic cyberbullying detection. First, I study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training. I hypothesize that estimating dataset complexity allows for a reduction in the number of required experiment iterations, making it possible to optimize the resource-intensive training of ML models, which is becoming a serious issue due to growing dataset sizes and the ever-rising popularity of models based on Deep Neural Networks (DNN). The constantly increasing need for more powerful computational resources also affects the environment through the alarmingly growing amount of CO2 emissions caused by training large-scale ML models. I use cyberbullying datasets collected for multiple languages, namely English, Japanese and Polish. The difference in linguistic complexity between these datasets additionally allows me to discuss the efficacy of linguistically-backed word preprocessing.

Second, I study the selection of transfer languages for automatic abusive language detection. I demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection, which makes it possible to use existing data from higher-resource languages to build better detection systems for languages lacking data. The datasets cover eight languages from three language families. I measure the distance between the languages using several language similarity measures, in particular one obtained by quantifying the World Atlas of Language Structures (WALS). I show that there is a correlation between linguistic similarity and classifier performance, which makes it possible to choose an optimal transfer language for zero-shot abusive language detection.
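As a rough illustration of the Feature Density idea described above, the sketch below assumes FD is computed as the number of unique features divided by the total number of feature occurrences in a dataset, so it can be calculated before any model is trained. The two featurizers (`tokens`, `token_bigrams`) are hypothetical stand-ins for the linguistically-backed preprocessing methods compared in the thesis, not its actual pipeline, and the toy documents are made up.

```python
from typing import Callable, Iterable, List


# Hypothetical featurizers standing in for the thesis's preprocessing methods.
def tokens(doc: str) -> List[str]:
    """Lowercased whitespace tokens."""
    return doc.lower().split()


def token_bigrams(doc: str) -> List[str]:
    """Adjacent token pairs joined with '_'."""
    toks = tokens(doc)
    return [f"{a}_{b}" for a, b in zip(toks, toks[1:])]


def feature_density(docs: Iterable[str], featurize: Callable[[str], List[str]]) -> float:
    """Assumed FD definition: unique features / all feature occurrences.

    Returns 0.0 when the dataset yields no features at all.
    """
    all_features: List[str] = []
    for doc in docs:
        all_features.extend(featurize(doc))
    if not all_features:
        return 0.0
    return len(set(all_features)) / len(all_features)


# Toy dataset, made up for illustration.
docs = ["you are so dumb", "you are great", "so great"]
for name, fn in [("tokens", tokens), ("bigrams", token_bigrams)]:
    print(name, round(feature_density(docs, fn), 3))
# tokens 0.556   (5 unique / 9 total)
# bigrams 0.833  (5 unique / 6 total)
```

Comparing such values across preprocessing variants or datasets costs almost nothing next to training a model; how FD relates to eventual classifier performance is the empirical question the thesis investigates, so the sketch assumes no particular direction.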
Next, I demonstrate that this method is also generally applicable to multiple Natural Language Processing (NLP) tasks, specifically sentiment analysis, named entity recognition and dependency parsing. I show that there is also a correlation between linguistic similarity and zero-shot cross-lingual transfer performance for these tasks, which allows me to select an ideal transfer language and thereby helps address languages that currently lack a sufficient amount of data. Lastly, I show that the World Atlas of Language Structures can be quantified into an effective linguistic similarity measure.
Abstract language: en
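The abstract's claims that WALS can be quantified into a similarity measure, and that such similarity correlates with zero-shot transfer performance, can be illustrated with a small sketch. It assumes one simple quantification (the share of features on which two languages have the same value, counted only over features documented for both) and uses Pearson correlation; both are assumptions, and the measures used in the thesis may differ. Feature IDs, language names, and scores are hypothetical placeholders, not WALS data or thesis results.

```python
from math import sqrt
from typing import Dict, List, Optional

# A language profile maps a WALS-style feature ID to its categorical value.
Profile = Dict[str, str]


def wals_similarity(a: Profile, b: Profile) -> Optional[float]:
    """Fraction of matching values over features defined for both languages.

    Returns None when the two profiles share no features.
    """
    shared = set(a) & set(b)
    if not shared:
        return None
    return sum(a[f] == b[f] for f in shared) / len(shared)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def pick_transfer_language(target: Profile, candidates: Dict[str, Profile]) -> str:
    """Return the candidate source language most similar to the target."""
    scored = {}
    for name, profile in candidates.items():
        sim = wals_similarity(target, profile)
        if sim is not None:  # skip candidates with no overlapping features
            scored[name] = sim
    if not scored:
        raise ValueError("no candidate shares any feature with the target")
    return max(scored, key=scored.get)


# Hypothetical placeholder profiles, NOT real WALS entries.
target = {"f1": "a", "f2": "b", "f3": "c", "f4": "d"}
candidates = {
    "src_1": {"f1": "a", "f2": "b", "f3": "c", "f4": "x"},  # 3 of 4 shared match
    "src_2": {"f1": "a", "f2": "y", "f3": "z"},             # 1 of 3 shared match
    "src_3": {"f1": "q", "f2": "r", "f4": "s"},             # 0 of 3 shared match
}
sims = [wals_similarity(target, candidates[k]) for k in ("src_1", "src_2", "src_3")]
made_up_scores = [0.70, 0.55, 0.40]  # illustrative zero-shot scores, not thesis results

print(pick_transfer_language(target, candidates))  # src_1
print(round(pearson(sims, made_up_scores), 3))
```

With real data, the profiles would come from WALS feature values and the scores from actual zero-shot experiments; a strong positive correlation between the two is what would justify choosing the most similar language as the transfer source.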
Bibliographic information | p. 1, issued 2022-09
---|---
Author version flag | ETD
Degree name | 博士(工学) (Doctor of Engineering)
Degree-granting institution | 北見工業大学 (Kitami Institute of Technology)
Degree-granting institution identifier (scheme: kakenhi) | 10106
Degree number | 甲第203号
Graduate school / major | 生産基盤工学専攻 (Production Infrastructure Engineering)
Degree conferral date | 2022-09-06