Web-based Safari Review System Development using Microblog Analyzed Data
https://doi.org/10.19000/0002000492
Name / File | License |
---|---|
PhD_Thesis_SILAA _Sept4.pdf (5.8 MB) | |

Item type | Thesis or Dissertation(1) |
---|---|
Publication date | 2023-10-03 |
Title (en) | Web-based Safari Review System Development using Microblog Analyzed Data |
Title (ja) | マイクロブログの解析データを利用したWebベースのサファリレビューシステム開発 |
Language | eng |
Resource type | doctoral thesis |
Resource type identifier | http://purl.org/coar/resource_type/c_db06 |
Identifier registration | 10.19000/0002000492 |
Identifier registration type | JaLC |
Access rights | open access |
Access rights URI | http://purl.org/coar/access_right/c_abf2 |
Author | Victor Alex Silaa |
Abstract (en) | In this study, I propose using online microblogs as supplements to reviews and demonstrate their applicability through a tourist support system designed to provide additional opinions and up-to-date points of interest for less-known tourist spots. To realize this proposal, I use techniques based on Information Extraction (IE), Artificial Intelligence (AI), and Natural Language Processing (NLP). The proposed approach has three parts. First, I use geotagged tweets. Tweets that contain geolocation information are considered geotagged and are therefore treated as possible on-site opinions from tourists. The main challenge is confirming the authenticity of the extracted tweets, so this stage uses location clustering and classification techniques. Specifically, the extracted geotagged tweets are clustered using their location information and then annotated with specific features used by machine learning-based classifiers. For the machine learning (ML) algorithm, I adopt a fine-tuned, transformer-based BERT model, which uses token context information for better classification. Second, I study the geolocatability of ungeotagged tweets so that they can also serve as review alternatives. Ungeotagged tweets carry no geolocation information, so they are difficult to associate with a specific location. Furthermore, Twitter data is typically noisy, with ungrammatical or informal phrasing and non-standard vocabulary; this causes feature sparsity, which lowers classifier performance. To address this, I propose a two-stage process: a transformer-based model first classifies the primary tweets, and a combination of impact words, such as location mentions and event mentions, is then used to infer the location. Additionally, I evaluate a range of pre-processing techniques for text categorization to identify the combination that best improves prediction accuracy. The classification framework developed here relies on a fine-tuned transformer neural network model that learns from tweet content and predicts the locations from which the tweets were sent; its application is limited to detecting widely known general locations, such as tourist spots. A pre-trained DistilBERT language model outperformed the other tested models, achieving an average F1 score of 0.84 across the datasets produced by the different pre-processing techniques. I also evaluated how impact words, such as location mentions and event mentions, affect geolocation estimation by comparing model accuracy when impact words are included and when they are removed. To investigate this effect, I first computed word weights using TF-IDF and then created a likelihood wordlist. Model accuracy improved by as much as 6% when impact words were included compared with when they were removed, which suggests that impact words have a positive influence on geolocatability. I also found incorrectly weighted impact words that degraded model performance; eliminating them improved the model's F1 score by 3%. Third, I demonstrate the applicability of these two approaches by designing a tourist support system that maps the extracted opinions to their respective tourist spots as tourist information. |
Bibliographic information | p. 1, date of issue 2023-09 |
Author version flag (en) | ETD |
Degree name (ja) | 博士(工学) (Doctor of Engineering) |
Degree-granting institution identifier scheme | kakenhi |
Degree-granting institution identifier | 10106 |
Degree-granting institution (ja) | 北見工業大学 (Kitami Institute of Technology) |
Degree number | 甲第211号 |
Graduate school / major | 生産基盤工学専攻 (Manufacturing Engineering) |
Date of degree conferral | 2023-09-05 |
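
The abstract states that geotagged tweets are clustered by their location information but does not name the clustering algorithm. As an illustration only, the following Python sketch swaps in DBSCAN with great-circle (haversine) distance, written with the standard library. The function names, the `eps_km` radius, the `min_pts` threshold and the sample coordinates are all hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points: List[LatLon], eps_km: float, min_pts: int) -> List[int]:
    """Hypothetical helper: label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        # Brute-force scan; the point itself is included (distance 0).
        return [j for j in range(len(points))
                if haversine_km(points[i], points[j]) <= eps_km]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbours)
    return [label if label is not None else -1 for label in labels]


# Hypothetical geotagged tweet coordinates (lat, lon); not data from the thesis.
sample_points = [
    (-2.330, 34.830), (-2.335, 34.835), (-2.328, 34.826),  # one dense group
    (-3.240, 35.490), (-3.245, 35.487),                    # a second group
    (-6.800, 39.280),                                      # an isolated tweet
]

if __name__ == "__main__":
    print(dbscan(sample_points, eps_km=2.0, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

Tweets labelled -1 fall outside every dense group and could be treated as less likely on-site opinions, while each cluster could then be matched to nearby tourist-spot coordinates. For larger datasets, scikit-learn's `DBSCAN` with `metric="haversine"` (coordinates and `eps` in radians) provides the same algorithm with spatial indexing.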
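
The abstract mentions evaluating a range of pre-processing techniques for tweet classification without listing them. The Python sketch below assumes a few common tweet-normalization steps (removing URLs and @mentions, dropping the `#` of hashtags, lowercasing) and combines them into named pipelines, each producing its own dataset variant. The step names, pipeline names and sample tweet are hypothetical.

```python
import re
from typing import Callable, Dict, List

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def strip_urls(text: str) -> str:
    return URL_RE.sub(" ", text)


def strip_mentions(text: str) -> str:
    return MENTION_RE.sub(" ", text)


def unhash(text: str) -> str:
    # Keep the hashtag word (it may itself be a location mention) but drop '#'.
    return HASHTAG_RE.sub(r"\1", text)


def lowercase(text: str) -> str:
    return text.lower()


# Hypothetical pipelines: each one yields a differently pre-processed dataset.
PIPELINES: Dict[str, List[Callable[[str], str]]] = {
    "raw": [],
    "no_urls_mentions": [strip_urls, strip_mentions],
    "full": [strip_urls, strip_mentions, unhash, lowercase],
}


def preprocess(text: str, steps: List[Callable[[str], str]]) -> str:
    for step in steps:
        text = step(text)
    return " ".join(text.split())  # collapse the whitespace left by removals


if __name__ == "__main__":
    tweet = "Amazing lions @guide_tz #Serengeti https://t.co/abc"  # made-up example
    for name, steps in PIPELINES.items():
        print(f"{name}: {preprocess(tweet, steps)}")
```

Each variant would then be used to fine-tune and evaluate the same classifier, so the pipelines can be compared by F1 score.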
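
The abstract says that word weights were computed with TF-IDF and turned into a likelihood wordlist, but it does not give the exact formulation. The Python sketch below is one assumed reading: all tweets from one location are treated as a single document, words are weighted with unsmoothed TF-IDF, and the highest-weighted words per location form the wordlist. A helper then removes those words from a tweet so that a model could be compared with and without impact words. The function names, `top_k` and the sample tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_by_location(tweets_by_loc: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF per word, treating all tweets of one location as a single document."""
    docs = {loc: Counter(w for t in tweets for w in t.lower().split())
            for loc, tweets in tweets_by_loc.items()}
    n_docs = len(docs)
    df = Counter(w for counts in docs.values() for w in counts)  # document frequency
    scores: Dict[str, Dict[str, float]] = {}
    for loc, counts in docs.items():
        total = sum(counts.values())
        # tf = relative frequency; idf = ln(N / df), so words used at every location score 0
        scores[loc] = {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
    return scores


def likelihood_wordlist(scores: Dict[str, Dict[str, float]], top_k: int = 1) -> Dict[str, List[str]]:
    """Top-k positively weighted words per location (ties broken alphabetically)."""
    wordlist = {}
    for loc, weights in scores.items():
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        wordlist[loc] = [w for w, s in ranked[:top_k] if s > 0]
    return wordlist


def remove_impact_words(text: str, impact_words: Set[str]) -> str:
    """Drop impact words from a tweet, e.g. to compare accuracy with and without them."""
    return " ".join(w for w in text.split() if w.lower() not in impact_words)


# Made-up tweets for illustration; not data from the thesis.
sample = {
    "serengeti": ["lions at serengeti today", "great migration serengeti"],
    "ngorongoro": ["ngorongoro crater view", "rhino near ngorongoro today"],
}

if __name__ == "__main__":
    scores = tfidf_by_location(sample)
    wordlist = likelihood_wordlist(scores)
    print(wordlist)  # {'serengeti': ['serengeti'], 'ngorongoro': ['ngorongoro']}
    impact = {w for words in wordlist.values() for w in words}
    print(remove_impact_words("Lions at Serengeti today", impact))  # Lions at today
```

In this toy data the top-weighted words are the location names themselves, which fits the abstract's notion of location mentions as impact words; with real tweets, stop-word removal or a smoothed IDF would likely change the ranking.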